Melvin's digital garden

Wall2003

Reciprocal smallest distance algorithm

Blast search often returns as highest scoring hit a protein that is not the nearest phylogenetic neighbor of the query sequence.

Query sequence i

Retrieve a set of hits H with $E < 10^{-20}.$

Aligh each sequence in H with i, if alignable region is at least 0.80 of the alignment’s total length then use PAML to obtain a maximum likelihood estimate of the number of amino acid substitutions separating the two sequences, given an empirical amino acid substitution rate matrix (gamma distribution with shape parameter alpha = 1.53). The sequence with the shortest distance is retained. Call this sequence j.

Use j as a query to see if i has the shortest distance when starting from j. If so, (i,j) is a true orthologous pair.

Links to this note