Melvin's digital garden

Algorithms and complexity of assembling microbial genomes

created#201006021752 #seminar #genome_assembly #niranjan_nagarajan

Pan genomics: bacteria and variants

Metagenomics: bacteria and others in its environment

Assembly using optical maps

Fragmented genomes is difficult to analyse. Technologies such as mate pairs, landmark maps helps to determine the order of the contigs.

Restriction enzymes cut the DNA at specific sites. Optical scaffolding have sizing errors and missing fragments.

[http://www.cbcb.umd.edu/soma/|SOMA] is a software for assembling microbial genomes using optimal restriction map alignment.

Matching = scoring of a single match to minimize missing sites and maximize alignment score, solved by DP

Global placement = matching all the contigs the restriction map

Significance of a placement checked using permutation test.

Repeats? Checked using F-test

Graph models and complexity

Assembly models:

  • shortest common superstring
  • path in overlap/De Bruijn/string graph

Studied parametric complexity of Hamiltonian path on string graph. Read length $r$ and overlap length $k$

  • no repeats longer than $k$ -> easy
  • all long repeats -> Chinese Postman problem
  • unique solution? there is an algorithm to test
  • counting number of solutions is #P complete
  • have an algorithm to enumerate subpaths that are part of every CP path
  • short repeats -> NP hard
  • some other results on hardness for different combinations of r and k

Reassortment in flu genomes

Flu virus has a segmented genome, shuffling of segments leads to pandemics

Difficulties: many strains, low confidence branches

Incompatible splits in phylogenetic tree is evidence for reassortment

Distribution of trees (ML, MCMC) -> distribution of splits

Construct a bipartite graph where the node set are splits for genome G and genome H. Add an edge if the two splits are incompatible (incompatibility graph).

Find bicliques to detect reassortment.

Enumerating maximal bicliques using consensus generation. If (A,B) and (C,D) are maximal bicliques, then (A u B, C n D) is also a maximal biclique.

Links to this note