Melvin's digital garden

Dehal2005

CREATED: 200906220258

LINK: url:~/Modules/Literature/Dehal2005.pdf

** Introduction

Supporting 2R, ref 4-8

Supporting 1R, ref 9-11

Refuting WGD altogether, ref 12,13

2R is supported by the Hox clusters, which follow the 4:1 rule; however, fewer than 5% of homologous gene families follow the 4:1 rule

Phylogenies of vertebrate multigene families do not show the (AB)(CD) relationship expected from two sequential rounds of duplication
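A minimal sketch of what the (AB)(CD) test amounts to, assuming a four-paralog gene tree written as nested tuples. The function names and the tuple encoding are illustrative assumptions, not the method used in the paper.

```python
def leaf_count(node):
    """Number of leaves under a node (leaves are strings, clades are tuples)."""
    if isinstance(node, str):
        return 1
    return sum(leaf_count(child) for child in node)


def is_ab_cd(tree):
    """True if the root splits four paralogs into two pairs: (A,B),(C,D)."""
    if isinstance(tree, str) or len(tree) != 2:
        return False
    return all(
        not isinstance(child, str) and leaf_count(child) == 2
        for child in tree
    )


symmetric = (("A", "B"), ("C", "D"))   # topology expected under 2R
ladder = ("A", ("B", ("C", "D")))      # sequential, non-2R-like topology
print(is_ab_cd(symmetric), is_ab_cd(ladder))
```

Two rounds of duplication without losses would give the symmetric shape; a ladder-like tree is what the note means by families that do not show the relationship.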

** Challenges

only a minority of duplicated gene pairs will adopt a new function (neofunctionalization)

or partition old functions (subfunctionalization)

most are lost due to mutations

no complete genome sequence from an outgroup that is closely related to vertebrates; all methods of phylogenetic reconstruction are less accurate with more distant relatives such as Drosophila and yeast

no method to accurately and comprehensively cluster genes into homologous families

** Tree based

genome sequences available for Ciona intestinalis, Takifugu rubripes, mouse and human

graph based clustering of gene sets of the four chordates

build gene trees: a multiple sequence alignment and a maximum-likelihood evolutionary tree were constructed for each cluster

compare gene tree with species tree

46.6% of ancestral chordate genes appear in duplicate in one or more lineages

34.5% having at least one duplication before the divergence of fish from tetrapods

23.5% having at least one duplication afterwards

3753 gene duplications are placed at the base of the Vertebrata
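The step of comparing a gene tree with the species tree can be sketched with a simple species-overlap heuristic: a node is treated as a duplication when its two child clades share a species, and it is timed by which species sit below it. This is an assumed simplification for illustration; the leaf encoding (`species|gene`), the species labels and the function names are hypothetical and not taken from the paper.

```python
FISH = {"fugu"}
TETRAPODS = {"human", "mouse"}


def species_below(node):
    """Set of species under a node; leaves look like 'human|geneX'."""
    if isinstance(node, str):
        return {node.split("|")[0]}
    left, right = node
    return species_below(left) | species_below(right)


def classify(species):
    """Time a duplication by the species found beneath it."""
    if "ciona" in species:
        return "before Ciona-vertebrate split"
    if species & FISH and species & TETRAPODS:
        return "before fish-tetrapod split"
    return "after fish-tetrapod split"


def duplications(node, found=None):
    """Collect a timing label for every node whose child clades share a species."""
    if found is None:
        found = []
    if isinstance(node, str):
        return found
    left, right = node
    left_sp, right_sp = species_below(left), species_below(right)
    if left_sp & right_sp:  # same species on both sides: duplication
        found.append(classify(left_sp | right_sp))
    duplications(left, found)
    duplications(right, found)
    return found


early = ("ciona|c1", (("human|h1", "fugu|f1"),
                      (("human|h2", "mouse|m2"), "fugu|f2")))
late = ("fugu|f1", ("human|h1", "human|h2"))
print(duplications(early), duplications(late))
```

The heuristic cannot tell a late duplication from an early one followed by gene loss in the other lineage, so counts obtained this way are only a rough guide.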

** Map based

examined the relative positions, in the human genome, of the paralogs resulting from the 3753 early duplication events

identify all cases where >= 2 different early-duplicating genes lie within a 100-gene window; then, for each case, scan all other places in the genome with a sliding window and count the positions at which their respective paralogs lie within 50 genes upstream and 50 genes downstream of that point
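The sliding-window count described above might look roughly like this. It is a sketch under stated assumptions: genes are numbered by their order along one concatenated genome, `paralog_pos` is a hypothetical mapping from a gene index to the indices of its paralogs, and chromosome boundaries are ignored.

```python
def window_matches(paralog_pos, center, n_genes, half=50, min_shared=2):
    """Positions j whose +/-half window holds paralogs of >= min_shared
    duplicated genes found in the +/-half window around `center`."""
    lo, hi = center - half, center + half
    anchors = [g for g in paralog_pos if lo <= g <= hi]
    if len(anchors) < min_shared:
        return []
    hits = []
    for j in range(n_genes):
        if lo - half <= j <= hi + half:  # window would overlap the anchor window
            continue
        shared = sum(
            any(j - half <= p <= j + half for p in paralog_pos[g])
            for g in anchors
        )
        if shared >= min_shared:
            hits.append(j)
    return hits


def count_regions(hits):
    """Merge runs of consecutive positions into distinct matching regions."""
    return sum(1 for k, j in enumerate(hits) if k == 0 or j != hits[k - 1] + 1)


# Toy genome of 40 genes with a tiny window (half=2) so the result is easy to follow.
toy = {5: [20], 6: [21], 20: [5], 21: [6]}
hits = window_matches(toy, center=5, n_genes=40, half=2)
print(hits, count_regions(hits))
```

In this toy case the anchor region has one matching region elsewhere, i.e. a 2-fold pattern; three matching regions would correspond to the 4-fold pattern discussed in the note.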

the 4-fold pattern is the most prevalent, but accounts for only 25% of the genome

evaluated whether the 4-fold matchings fall into tetra-paralogons

identify paralogons containing the same set of at least two duplicated gene pairs, while allowing a maximum of 100 unduplicated genes in between (similar to ref 10).

identified 2953 paralogous human gene pairs that resulted from 1912 genes that duplicated prior to the divergence of fish and tetrapods.

Of these, 32.4% are still in 386 detectable paralogons comprising 772 individual genomic segments containing from 2 to 42 gene pairs.

Of these 772 segments, 454 comprise tetra-paralogons in which overlapping sets of paralogs fall into 4-fold groups.
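A quick consistency check on the figures above, assuming the 32.4% refers to the 2953 gene pairs and that each paralogon is a pair of genomic segments (both are my reading of the note, not stated explicitly):

```python
pairs_total = 2953
paralogons = 386
segments = 772
tetra_segments = 454

segments_from_paralogons = paralogons * 2            # 2 segments per paralogon
pairs_in_paralogons = round(0.324 * pairs_total)     # gene pairs still in paralogons
tetra_share = round(100 * tetra_segments / segments, 1)

print(segments_from_paralogons)  # 772
print(pairs_in_paralogons)       # 957
print(tetra_share)               # 58.8
```

So roughly 957 of the early-duplicated pairs remain in detectable paralogons, and a little under 59% of the segments take part in tetra-paralogons.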

in contrast, when looking at gene pairs that arose from a duplication event after the divergence of fish and mammals, only 11% are detected in paralogons.

looking at tandemly duplicated genes (paralogs on the same chromosome, separated by fewer than 10 genes), 50% of these later human gene pairs arose from tandem duplication, compared to 6% of the human gene pairs that arose before the divergence of the fish and tetrapod lineages
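The tandem-duplicate criterion can be sketched as follows, assuming each gene has a hypothetical `(chromosome, gene_order_index)` location and reading "separated by fewer than 10 genes" as fewer than 10 intervening genes. The names and toy data are illustrative only.

```python
def is_tandem(loc_a, loc_b, max_between=10):
    """Same chromosome and fewer than `max_between` genes in between."""
    chrom_a, idx_a = loc_a
    chrom_b, idx_b = loc_b
    return chrom_a == chrom_b and abs(idx_a - idx_b) - 1 < max_between


def tandem_fraction(pairs, location):
    """Fraction of paralog pairs that meet the tandem criterion."""
    if not pairs:
        return 0.0
    n_tandem = sum(is_tandem(location[a], location[b]) for a, b in pairs)
    return n_tandem / len(pairs)


location = {
    "g1": ("chr1", 10),
    "g2": ("chr1", 12),
    "g3": ("chr2", 5),
    "g4": ("chr1", 300),
}
pairs = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g4")]
print(tandem_fraction(pairs, location))
```

Running this separately on pairs dated before and after the fish-tetrapod divergence would give the two percentages being compared in the note.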
