She2009
CREATED: 201004261537 LINK: url:/home/melvin/Modules/Literature/She2009.pdf
Output of BLAST are HSP (high scoring pairs), pairs of regions from query to target.
Suppose query is a gene, want to group target HSP into genes. Difficulty due to multiple paralogous genes.
Cui2007 group HSP based on distance. Assume HSP in same gene are nearer than HSP between two genes, not true for paralogous genes in tandem cluster. Use of ad-hoc distance threshold to separate adjacent genes.
Approach is to model HSP in a DAG and partition HSP by computing shortest path in DAG. Makes use of novel length metric in the DAG. Similar to problem of determining optimal storage of steel beams.