Melvin's digital garden

Chauve2006

CREATED: 200801070356 LINK: url:~/Modules/Literature/Chauve2006.pdf TITLE: On common interval with errors

** Definitions

  • permutation/sequence
  • multiple genomes
  • quorum parameter
  • allowing errors

** Concepts

  • character set
  • set distance: symmetric set distance, set transformation distance
  • d-neighbourhood (absolution), $\epsilon$-neighbourhood (relative)
  • occurrences
  • location set
  • covering
  • d-cluster, $\epsilon$-cluster
  • maximality of a cluster
  • equivalence of clusters

** Algorithms *** common intervals as character sets Generate all possible character sets that could be a representative and checks if each of these character sets is a representative for a set of approximate occurrences that cover at least q sequences. Running time is $O(k^2 n^{2d + 5})$. *** common intervals as cliques in graphs Construct a vertex-labeled bipartite graph $G = (U \cup V, R)$ where U contains one vertex u for each character set in C^all and V contains one vertex v for each character set in N_d(C^all). Two vertices u and v are connected by an edge if the distance between them is not larger than d

Can reduce the problem to finding all maximal cliques, maximal cliques as equivalence classes of clusters *** common intervals as binary arrays Represent a set as a binary array and define a d-center of a set of binary arrays

** Discussion

  • cluster maximality is not efficiently taken into account
  • maximality is central to filter the enormous amount of output

Links to this note