Melvin's digital garden

Bocker2008

CREATED: 200812140545 LINK: url:~/Modules/Literature/Bocker2008.pdf Title: Computation of Median Gene Clusters ** Forces acting on genomes random shuffling: good for phylogenetic inference

selective pressure: good for functional inference

** Assumption gene order not considered

multiple occurrence of gene possible

** Median of character sets $D(CS_1, CS_2)$ = symmetric set distance distance threshold, $\delta$, minimum cluster size, $s$

Find all $M \subseteq \Sigma$ such that $\sum D(M, CS(S_{i,j})) \le \delta$ and $|M| \ge s$

Search space is all $O(n^{2k})$ substrings of $S_1, S_2, \ldots, S_k$

** Algorithm Main idea: at least one CS has distance $\le \delta/k$ from the median

Compute cluster filters, extension of Connecting Intervals algorithm

Compute k-tuples containing a cluster filter, worst case $O(n^k)$

Compute median by majority vote

** Experiment comparison with ILP by Rahmann and Klau

five gamme proteobacteria

** Future ranking of gene clusters

webserver

more genomic datasets

model extension to include quorum paramter

Links to this note