Melvin's digital garden

Vilella2009

CREATED: 201003251732 LINK: url:~/Modules/Literature/Vilella2009.pdf TITLE: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates

Presents a computational framework for computing gene trees. Proposes new metrics and benchmarks several tree-building methods.

Until April 2006, Ensembl used the best reciprocal BLAST hit (BRH) method. In June 2006, it was replaced with a phylogenetically sound, gene tree-based approach.

Programs for generating genome-wide orthology predictions:

  • Inparanoid (Remm2001)
  • MSOAR (Fu2007)
  • OrthoMCL (Li2003)
  • HomoloGene (Wheeler2008)
  • TreeFam (Li2006)
  • PhyOP (Goodstadt2006)
  • PhiGs (Dehal2006)

The first four do not use gene trees; the last three do.

Ensembl uses TreeBeST, which incorporates species-tree-aware penalization of topologies.

PhyML (Guindon2003) is a maximum likelihood method for tree building.

Method

  1. Consider only protein-coding genes; for each gene, use the longest translation.
  2. All-vs-all comparison using WUBLASTP.
  3. Build a graph with proteins as nodes. A connection between two nodes is retained when the pair is a best reciprocal hit (BRH) or has a BLAST score ratio (BSR) > 1/3, where
     BSR(p1,p2) = score(p1,p2)/max(self-score(p1), self-score(p2))
  4. Extract connected components from the graph. If a component has more than 750 members, repeat steps 3 and 4 at higher stringency (BSR threshold increased by 0.1).
  5. Multiple alignment using MUSCLE (Edgar2004).
  6. Gene tree building and reconciliation using TreeBeST.
  7. Inference of orthologs and paralogs.
  8. Compute pairwise d_N/d_S (nonsynonymous substitutions/synonymous substitutions) for pairs of genes from closely related species, using codeml from the PAML package (Yang2007).
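
A minimal Python sketch of the graph-building and component-extraction steps above, to make the BSR formula and the re-clustering rule concrete. This is not the EnsemblCompara code. Assumptions of mine: BLAST scores arrive as a dict keyed by (query, target), BRH is only evaluated between proteins of different species, BRH edges are computed once and kept during re-clustering, and the function names and toy proteins (A1, A2, ...) are hypothetical.

```python
from collections import defaultdict


def blast_score_ratio(scores, p1, p2):
    """BSR(p1, p2) = score(p1, p2) / max(self-score(p1), self-score(p2))."""
    return scores[(p1, p2)] / max(scores[(p1, p1)], scores[(p2, p2)])


def best_reciprocal_hits(scores, species):
    """Pairs of proteins from different species that are each other's best hit."""
    best = {}  # (query, target species) -> (best target, score)
    for (query, target), score in scores.items():
        if species[query] == species[target]:
            continue  # assumption: BRH is only evaluated between species
        key = (query, species[target])
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    pairs = set()
    for (query, _), (target, _) in best.items():
        back = best.get((target, species[query]))
        if back is not None and back[0] == query:
            pairs.add(frozenset((query, target)))
    return pairs


def connected_components(nodes, edges):
    """Connected components of an undirected graph, via depth-first search."""
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def cluster(proteins, scores, brh, threshold=1 / 3, max_size=750, step=0.1):
    """Keep BRH or BSR > threshold edges, then split oversized components."""
    proteins = set(proteins)
    edges = set()
    for p1, p2 in scores:
        if p1 == p2 or p1 not in proteins or p2 not in proteins:
            continue
        pair = frozenset((p1, p2))
        if pair in brh or blast_score_ratio(scores, p1, p2) > threshold:
            edges.add(pair)
    clusters = []
    for component in connected_components(proteins, edges):
        # Re-cluster oversized components at a stricter BSR threshold;
        # stop raising it once it would exceed 1 (my own stopping rule).
        if len(component) > max_size and threshold + step <= 1.0:
            clusters.extend(
                cluster(component, scores, brh, threshold + step, max_size, step)
            )
        else:
            clusters.append(component)
    return clusters


# Toy data: symmetric scores, self-score 100 for every protein.
species = {"A1": "human", "A2": "mouse", "B1": "human", "B2": "mouse", "C": "human"}
scores = {(p, p): 100.0 for p in species}
for p1, p2, s in [("A1", "A2", 90.0), ("B1", "B2", 80.0),
                  ("A1", "B1", 20.0), ("A1", "C", 50.0)]:
    scores[(p1, p2)] = scores[(p2, p1)] = s

brh = best_reciprocal_hits(scores, species)
families = cluster(species, scores, brh)
```

In the toy data, A1 and C are linked only through BSR = 0.5, so they fall in one family at the default threshold but separate once a small `max_size` forces the threshold above 0.5.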

TreeBeST runs a number of independent phylogenetic methods (DNA, codon, and protein maximum likelihood) and then creates a combined tree that penalizes duplications and deletions relative to a known species tree.
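
Once the tree is reconciled against the species tree, each internal node is either a speciation or a duplication, and the ortholog/paralog inference step of the method follows from those labels. A small Python sketch, assuming the usual convention that two genes are orthologs when their last common ancestor is a speciation node and paralogs when it is a duplication node. The nested-tuple tree format and the gene names are hypothetical, and the labels are taken as given rather than computed the way TreeBeST does.

```python
from itertools import product


def leaves(node):
    """Gene names below a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    _event, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node):
    """Return (orthologs, paralogs) as sets of unordered gene pairs."""
    orthologs, paralogs = set(), set()
    if isinstance(node, str):
        return orthologs, paralogs
    event, left, right = node
    # Every pair split across the two children has this node
    # as its last common ancestor.
    pairs = {frozenset(pair) for pair in product(leaves(left), leaves(right))}
    if event == "speciation":
        orthologs |= pairs
    elif event == "duplication":
        paralogs |= pairs
    else:
        raise ValueError(f"unknown event label: {event!r}")
    for child in (left, right):
        child_orthologs, child_paralogs = classify_pairs(child)
        orthologs |= child_orthologs
        paralogs |= child_paralogs
    return orthologs, paralogs


# Internal nodes are (event, left, right); a duplication in the human
# lineage after the human/mouse split.
tree = ("speciation", ("duplication", "human_A1", "human_A2"), "mouse_A")
orthologs, paralogs = classify_pairs(tree)
```

Here both human copies come out as orthologs of the mouse gene, and as paralogs of each other.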

Metrics: coverage in homology relationships, duplication consistency score, gene synteny metric
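
This note does not spell out the duplication consistency score. Assuming it is the fraction of species shared by the two subtrees under a duplication node (intersection over union of their species sets), which is how I recall the paper defining it, a sketch looks like this. The function name is my own.

```python
def duplication_consistency(left_species, right_species):
    """Species shared by both child subtrees of a duplication node,
    divided by all species seen in either subtree (assumed definition)."""
    left, right = set(left_species), set(right_species)
    union = left | right
    if not union:
        raise ValueError("a duplication node needs at least one species below it")
    return len(left & right) / len(union)


# A duplication whose two copies are both found in every species scores 1;
# one with no species in common between the copies scores 0.
well_supported = duplication_consistency({"human", "mouse"}, {"human", "mouse"})
dubious = duplication_consistency({"human"}, {"mouse"})
```

Under this definition a low score flags duplication nodes that would require many subsequent gene losses to explain.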

Conclusion: TreeBeST was the best method on the criteria of duplication consistency and synteny consistency.

Comparison of ortholog sets between EnsemblCompara, Inparanoid, HomoloGene, PhyOP, and TreeFamCurated for certain pairs of species.

Projection of GO terms via orthology links.
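
A rough sketch of what projecting GO terms across orthology links could look like. This is an illustration only: the rule used here (copy every term from an annotated gene to each of its orthologs) and the gene names are assumptions of mine, and the actual Ensembl projection applies filters that are not recorded in this note.

```python
def project_go_terms(annotations, ortholog_links):
    """annotations: gene -> set of GO ids.
    ortholog_links: iterable of (source_gene, target_gene) pairs.
    Returns target gene -> set of GO ids projected onto it."""
    projected = {}
    for source, target in ortholog_links:
        terms = annotations.get(source, set())
        if terms:
            projected.setdefault(target, set()).update(terms)
    return projected


# Hypothetical genes; the GO ids are ATP binding and protein phosphorylation.
annotations = {"human_kinase": {"GO:0005524", "GO:0006468"}}
links = [("human_kinase", "mouse_kinase"), ("human_unannotated", "mouse_other")]
projected = project_go_terms(annotations, links)
```

Projected terms are returned separately from existing annotations so they can be tagged as inferred rather than mixed in with direct evidence.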
