Maximum Acyclic Subgraph - Multiple Sequence Alignment
MaximumAcyclicSubgraph
Directory structure:
1 {C++}
2 Color ColorBasedOnMapping(string kmer){
3  if (kmerSize==3) {
4  ...
5  return ...
6  }
7  // return missing in this case
8 }

Commenting

Versions

1 ## Refactor
2  Do not reinvent the wheel, e.g. reuse a suitable existing list of well-distinguishable RGB values
3  Do not use DOS linebreaks.
4 
5 ### Color palette (Lucas)
6 1) The one-to-one correspondence between (DNA) letter and number (e.g. of type `char`) should be specified only once.
7 
8 ```C++
9  int charRead(char character){
10  if (character=='A')
11  return 0;
12  else if(character == 'C')
13  return 1;
14  else if(character == 'T')
15  return 2;
16  else if(character == 'G')
17  return 3;
18  }

and

1 {C++}
2  string alphabet = "ACTG";

2) Color palette should not require to include a graph => modularize better

3) Color map: assignment k-mer <-> color need not be stable across different problem instances.

4) kmercolor.cc references Graph::getSimpleKOfKmer() that does not exist.

Graph construction (Sebastian)

1) Lines too long. GitHub needs a horizontal scrollbar to show the code. 2) Tabs instead of space, inconsistent alignment of comments

1 {C++}
2 std::vector<std::array<int,3>> listOfEdges; // vector of edges (as array with size 3)
3 std::vector<std::string> stringListSequence; // vector of strings for the sequences
4 std::vector<int> numberOfKmer; // vector amount of k-mer

3) Graph does not need to know k.

4) Edges should probably be pairs of Nodes, rather than array<int,3>. This abstraction allows e.g. to consider edges that are not between neighboring sequences.

5) The sequence itself need not be displayed. It is irrelevant for the game or RL problem and probably clutters the image. std::vector<std::string> stringListSequence;

Visualization (Moritz)

1) Lines too long. Tabs instead of space. Comment style.

2) Variable name: kMer is expected to be a string of lenght k, not an int.

3) What is wrong here?

1 {C++}
2 const vector<int>& Graph::getNumberOfKmer(){
3  int numStrings=stringListSequence.size(); // amount of sequences
4  for(int i=0; i<numStrings; i++){ // for-loop to get number of k-mer
5  numberOfKmer.push_back(splitString.at(i).size());
6  }
7  return numberOfKmer;
8 }

4) splitString does not seem to be initialized before splitString.at(i).push_back() is called

5) k-mers can have less than k characters at sequence end - don't allow that (statistical significance)

6) Don't make the secret, undocumented assumption that k-mers cannot overlap.

1 {C++}
2 for(int j=0; j<stringListSequence.at(i).length(); j=j+kMer){

Rather, introduce a variable unsigned shift

1 <-------- k ------->
2 --------------------
3 <-shift->
4  --------------------
5  <-shift->
6  --------------------

and default it to k.

State + Actions (Maurice)

1) Avoid multiple vectors that are supposed to be of the same length(?).

1 {C++}
2  std::vector <std::array<int,3>> edges;
3  std::vector <bool> consistentSubset; //shows whether edge is selected in current subset (true=selected)
4  std::vector <bool> selectable; //shows whether edge is selectable in the current subset (true=selectable)

Rather, define a new tuple, class or struct and use a single vector, e.g. EdgeChoice (Edge e, bool chosen, bool selectable)

2) Unintuitive name: void state::possibleActions(void)

3) Do not use l as variable name.

4) Algorithm in state::possibleActions needs documentation. Also the consistency check of a pair of edges should be moved to a separate function.