Published by Noel Hodges
1
Gene tree discordance and multispecies coalescent models
Noah Rosenberg
December 21, 2007
James Degnan, Randa Tao, David Bryant, Mike DeGiorgio
2
Gene trees and species trees
Different genes may produce different inferences about species relationships.
3
Coalescent model for evolution within species, conditional on the species tree
Hudson (1983, Evolution); Tajima (1983, Genetics); Nei (1987, Molecular Evolutionary Genetics, book); Pamilo & Nei (1988, Molecular Biology and Evolution); Takahata (1989, Genetics); Wu (1991, Genetics); Hudson (1992, Genetics); Maddison (1997, Systematic Biology)
[Figure: species tree with branch lengths T2 and T3]
4
Assumptions of the multispecies coalescent model conditional on a species tree
1. Coalescences occur within species, with the same rate for each pair of lineages.
2. The rate of coalescence is proportional to the number of pairs of lineages.
3. When species splits are encountered (going backward in time), lineages from all groups descended from the split are allowed to coalesce.
[Figure: species tree with branch lengths T2 and T3]
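The three assumptions can be turned into a small backward-in-time simulation. This is our own sketch, not code from the talk. Time is measured in coalescent units. With k lineages, the waiting time to the next coalescence is assumed to be exponential with rate k(k-1)/2, the number of pairs, and the coalescing pair is chosen uniformly. The function name `coalesce_in_branch` and the nested-tuple representation of gene lineages are hypothetical illustrative choices.

```python
import random
from math import comb


def coalesce_in_branch(lineages, length, rng):
    """Run the coalescent backward in time along one species tree branch.

    lineages: gene lineages (any objects) entering the bottom of the branch.
    length:   branch length in coalescent units (float("inf") for the root branch).
    Returns the lineages that exit the top of the branch.
    """
    lineages = list(lineages)
    elapsed = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Assumption 2: the total coalescence rate is the number of pairs, k(k-1)/2.
        elapsed += rng.expovariate(comb(k, 2))
        if elapsed > length:
            break  # the species split is reached before the next coalescence
        # Assumption 1: every pair of lineages is equally likely to coalesce.
        i, j = sorted(rng.sample(range(k), 2))
        merged = (lineages[i], lineages[j])
        del lineages[j]  # delete the larger index first so index i stays valid
        del lineages[i]
        lineages.append(merged)
    return lineages


rng = random.Random(1)
# Two tip branches of length 0.5 feed their surviving lineages into the
# ancestral (root) branch, where, by assumption 3, all of them may coalesce.
left = coalesce_in_branch(["A1", "A2"], 0.5, rng)
right = coalesce_in_branch(["B1"], 0.5, rng)
root = coalesce_in_branch(left + right, float("inf"), rng)
print(root)
```

Because the root branch is infinitely long, every run ends with a single lineage there. On a tip branch, the two sampled copies from species A may or may not coalesce, depending on the exponential draw.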
5
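The probability that i lineages have j ancestors T coalescent time units in the past (T = t/N) is
$$g_{ij}(T) = \sum_{k=j}^{i} e^{-k(k-1)T/2}\,\frac{(2k-1)(-1)^{k-j}\, j_{(k-1)}\, i_{[k]}}{j!\,(k-j)!\,i_{(k)}}, \qquad 1 \le j \le i,$$
where $a_{[k]} = a(a-1)\cdots(a-k+1)$ and $a_{(k)} = a(a+1)\cdots(a+k-1)$, with $a_{[0]} = a_{(0)} = 1$.
Takahata and Nei (1985, Genetics); Tavaré (1984, Theoretical Population Biology)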
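The sketch below evaluates $g_{ij}(T)$ directly from the alternating-sum formula of Tavaré (1984), using the rising and falling factorials defined above. The helper names `rising`, `falling`, and `g` are ours. For large i and very small T, the alternating terms cancel heavily, so exact rational arithmetic or a different method may be needed in that regime.

```python
from math import exp, factorial


def rising(a, k):
    """Rising factorial a_(k) = a(a+1)...(a+k-1), with a_(0) = 1."""
    out = 1
    for m in range(k):
        out *= a + m
    return out


def falling(a, k):
    """Falling factorial a_[k] = a(a-1)...(a-k+1), with a_[0] = 1."""
    out = 1
    for m in range(k):
        out *= a - m
    return out


def g(i, j, T):
    """Probability that i lineages have exactly j ancestors T coalescent units ago."""
    if not 1 <= j <= i:
        return 0.0
    total = 0.0
    for k in range(j, i + 1):
        coef = ((2 * k - 1) * (-1) ** (k - j) * rising(j, k - 1) * falling(i, k)
                / (factorial(j) * factorial(k - j) * rising(i, k)))
        total += exp(-k * (k - 1) * T / 2) * coef
    return total


print(g(2, 1, 1.0))  # 1 - exp(-1), about 0.632
```

Two simple checks follow from the formula. First, $g_{21}(T) = 1 - e^{-T}$. Second, $g_{ii}(T) = e^{-i(i-1)T/2}$, the probability that no coalescence occurs.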
6
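Probability of a concordant gene tree topology
For 3 taxa, the probability of concordance is a sum of two terms:
1. The probability that the gene tree is determined in the 2-species phase (A and B coalesce on the internal branch of length T), or $1 - e^{-T}$.
2. One third of the probability that the gene tree is determined in the ancestral phase, or $\frac{1}{3}e^{-T}$. In that phase all three lineages are present, and each of the three pairs is equally likely to coalesce first.
The probability of concordance therefore equals $(1 - e^{-T}) + \frac{1}{3}e^{-T} = 1 - \frac{2}{3}e^{-T}$.
[Figure: species tree ((AB)C) with internal branch length T, and concordant and discordant gene trees]
Hudson (1983, Evolution); Nei (1987, Molecular Evolutionary Genetics); Tajima (1983, Genetics)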
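The two-term argument for $1 - \frac{2}{3}e^{-T}$ can be checked by Monte Carlo. The sketch below simulates the two phases directly for species tree ((AB)C): an exponential waiting time with rate 1 for the single pair (A, B) on the internal branch, then a uniformly chosen first pair among three lineages. The function names are hypothetical, and the simulation is only a numerical illustration of the formula, not a separate derivation.

```python
import random
from math import exp


def p_concordant(T):
    """Closed form: (1 - e^-T) + (1/3) e^-T = 1 - (2/3) e^-T."""
    return 1 - (2 / 3) * exp(-T)


def simulate_concordance(T, reps, seed=0):
    """Monte Carlo estimate for species tree ((AB)C) with internal branch length T."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Two-species phase: one pair (A, B), so coalescence occurs at rate 1.
        if rng.expovariate(1.0) < T:
            hits += 1
        # Ancestral phase: three lineages; only the pair (A, B) out of the
        # three equally likely pairs yields the concordant topology ((AB)C).
        elif rng.randrange(3) == 0:
            hits += 1
    return hits / reps


for T in (0.1, 0.5, 2.0):
    print(T, round(p_concordant(T), 4), simulate_concordance(T, 100_000))
```

As T approaches 0, the concordance probability approaches 1/3, the value under random joining of three lineages.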
7
Probability of the matching gene tree ((AB)C): $1 - \frac{2}{3}e^{-T}$
Probability of a particular discordant gene tree ((BC)A): $\frac{1}{3}e^{-T}$
8
It would be desirable to have a general computation of the probability that a particular species tree topology with branch lengths gives rise to a particular gene tree topology
9
Gene tree probabilities under the multispecies coalescent model
Consider a species tree S (topology and branch lengths).
Consider a gene tree G (topology only).
A coalescent history gives the list of species tree branches on which gene tree coalescences occur.
[Figure: species tree S and gene tree G with tips A, B, C]
JH Degnan & LA Salter, Evolution 59: 24-37 (2005)
10
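The list of coalescent histories for an example with five taxa
[Figure: species tree (tips A, B, C, D, E) and gene tree (tips A, C, B, D, E), with a table of coalescent histories giving the species tree branch (numbered 1 to 4) on which each gene tree coalescence occurs: (AC), ((AC)B), (DE), (((AC)B)(DE))]
Probability: $g_{ij}(T)$ is the probability that i lineages coalesce to j lineages during time T.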
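Combining coalescent histories with $g_{ij}(T)$ gives the gene tree probability. The expression below is our summary of the form used by Degnan & Salter (2005). The symbols $u_b(h)$, $v_b(h)$, $w_b(h)$, $d_b(h)$ and $H_S(g)$ are introduced here for illustration and should be checked against the paper. The product runs over the n-2 internal branches and the root branch, whose length is infinite. Tip branches carry a single lineage and contribute a factor of 1.

$$P(G = g \mid S) = \sum_{h \in H_S(g)} \prod_{b=1}^{n-1} \frac{w_b(h)}{d_b(h)}\, g_{u_b(h)\,v_b(h)}(T_b), \qquad d_b(h) = \prod_{i=0}^{u_b(h)-v_b(h)-1} \binom{u_b(h)-i}{2}$$

Here $H_S(g)$ is the set of coalescent histories of g on S. The quantities $u_b(h)$ and $v_b(h)$ are the numbers of lineages at the bottom and top of branch b. The denominator $d_b(h)$ counts the sequences of pairwise coalescences that take $u_b(h)$ lineages to $v_b(h)$, and $w_b(h)$ counts how many of those sequences are consistent with g. For three taxa and the matching gene tree ((AB)C), the two histories give

$$P(G = ((AB)C) \mid S) = (1 - e^{-T}) \cdot 1 + e^{-T} \cdot \tfrac{1}{3} = 1 - \tfrac{2}{3}e^{-T}.$$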
11
Computing the probabilities of gene trees
- What are the properties of the number of coalescent histories?
Using the probabilities of gene trees
- Is it possible for the most likely gene tree to disagree with the species tree?
- How do species tree inference algorithms behave when applied to multiple gene trees?
12
The number of coalescent histories
13
The number of coalescent histories for the matching gene tree
[Figure: species tree with tips A to F]
$A_{S,m}$ is the number of coalescent histories for the matching gene tree when the species tree root is subdivided into m pieces.
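A sketch of a recursion for $A_{S,m}$. It assumes the form $A_{\text{leaf},m} = 1$ and $A_{S,m} = \sum_{k=1}^{m} A_{L,k+1}\,A_{R,k+1}$, where L and R are the two subtrees below the root. This is our reconstruction. It reproduces the Catalan numbers for caterpillars and the value 13 quoted for n = 5, but it should be checked against the published derivation. Shapes are written as nested tuples of the hypothetical placeholder `"leaf"`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def histories(tree, m=1):
    """Coalescent histories A_{S,m} for the matching gene tree of shape `tree`.

    Assumed recursion: A_{leaf,m} = 1 and A_{S,m} = sum_{k=1}^{m} A_{L,k+1} * A_{R,k+1}.
    """
    if tree == "leaf":
        return 1
    left, right = tree
    return sum(histories(left, k + 1) * histories(right, k + 1)
               for k in range(1, m + 1))


def caterpillar(n):
    """Fully asymmetric shape with n leaves."""
    tree = "leaf"
    for _ in range(n - 1):
        tree = (tree, "leaf")
    return tree


cherry = ("leaf", "leaf")
print([histories(caterpillar(n)) for n in range(2, 10)])  # Catalan numbers C_{n-1}
print(histories((cherry, cherry)))  # symmetric 4-taxon shape
```

Subdividing the root into m pieces lets each subtree root be placed on any of the k+1 pieces above it. This is why the recursion calls the subtrees with k+1 rather than m.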
14
The number of coalescent histories for trees with at most 5 taxa
15
Number of coalescent histories for special shapes with n taxa (values for n = 2, 3, …, 9)
Caterpillar shape: Catalan number $C_{n-1}$ (Degnan 2005): 1, 2, 5, 14, 42, 132, 429, 1430, …
Shape in which the number of taxa in the left subtree is l: -, -, -, 13, 42, 138, 462, 1573, …
16
The number of coalescent histories for up to 11 taxa
17
Ratio of the largest to the smallest number of coalescent histories for n taxa
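The sketch below computes this ratio for small n. It enumerates every unlabeled rooted binary shape with n leaves and counts matching-gene-tree coalescent histories for each. The count uses an assumed recursion $A_{S,m} = \sum_{k=1}^{m} A_{L,k+1}\,A_{R,k+1}$, which is our reconstruction and reproduces the Catalan numbers for caterpillars. The function names and the `"leaf"` placeholder are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shapes(n):
    """All unlabeled rooted binary shapes with n leaves, as nested tuples."""
    if n == 1:
        return ("leaf",)
    out = []
    for l in range(1, n // 2 + 1):
        for i, a in enumerate(shapes(l)):
            for j, b in enumerate(shapes(n - l)):
                if l == n - l and j < i:
                    continue  # equal-sized sides: skip mirror images
                out.append((a, b))
    return tuple(out)


@lru_cache(maxsize=None)
def histories(tree, m=1):
    """Matching-gene-tree coalescent histories, assuming
    A_{leaf,m} = 1 and A_{S,m} = sum_{k=1}^{m} A_{L,k+1} * A_{R,k+1}."""
    if tree == "leaf":
        return 1
    left, right = tree
    return sum(histories(left, k + 1) * histories(right, k + 1) for k in range(1, m + 1))


for n in range(4, 12):
    counts = [histories(s) for s in shapes(n)]
    print(n, len(counts), max(counts), min(counts), max(counts) / min(counts))
```

The number of shapes grows slowly (2, 3, 6, 11, 23, ... for n = 4, 5, 6, 7, 8), so exhaustive enumeration is practical only for small n.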
18
Which types of shapes have the most coalescent histories?
The number of coalescent histories for trees with 8 taxa
[Figure: 8-taxon shapes with the most and the fewest coalescent histories]
19
Caterpillar-like shapes with n taxa, based on 4- and 5-taxon subtrees
Numbers of coalescent histories: $C_{n-1}$; $\sim \frac{5}{4} C_{n-1} = 1.25\,C_{n-1}$; $\sim \frac{23}{16} C_{n-1} = 1.4375\,C_{n-1}$
20
Largest values for caterpillar-like shapes based on 7- and 8-taxon subtrees
$\sim \frac{1381}{256} C_{n-1} = 5.39453125\,C_{n-1}$; $\sim \frac{189}{64} C_{n-1} = 2.953125\,C_{n-1}$
21
Can a non-matching gene tree have more coalescent histories than the matching gene tree?
Caterpillar species tree: 1430 coalescent histories for the matching gene tree, 1441 coalescent histories for a non-matching gene tree
22
Computing the probabilities of gene trees
- What are the properties of the number of coalescent histories?
Using the probabilities of gene trees
- Is it possible for the most likely gene tree to disagree with the species tree?
- How do species tree inference algorithms behave when applied to multiple gene trees?
23
For n>3 taxa, can species trees be discordant with the gene trees they are most likely to produce?
24
The labeled history for a gene tree is its sequence of coalescence events.
Two labeled histories produce the same labeled topology ((AB)(CD)): joining A with B before C with D, or C with D before A with B.
[Figure: the two labeled histories]
Randomly joining pairs of lineages leads to a uniform distribution over the set of possible labeled histories.
The number of labeled histories possible for four taxa is $\binom{4}{2}\binom{3}{2}\binom{2}{2} = 6 \cdot 3 \cdot 1 = 18$.
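Random joining can be made concrete by enumerating every labeled history for four taxa and tallying the labeled topology each one produces. This is a brute-force sketch. The helper names are hypothetical, and topologies are written as canonical Newick-like strings whose children are sorted so that each labeled topology has exactly one spelling.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def join(x, y):
    """Canonical string for the clade formed by joining subtrees x and y."""
    a, b = sorted((x, y))
    return f"({a},{b})"


def labeled_histories(lineages):
    """Yield the final topology produced by every sequence of pairwise joins."""
    if len(lineages) == 1:
        yield lineages[0]
        return
    for i, j in combinations(range(len(lineages)), 2):
        rest = [t for k, t in enumerate(lineages) if k not in (i, j)]
        yield from labeled_histories(rest + [join(lineages[i], lineages[j])])


counts = Counter(labeled_histories(["A", "B", "C", "D"]))
total = sum(counts.values())
probs = {topo: Fraction(c, total) for topo, c in counts.items()}
print(total, len(counts))  # 18 labeled histories, 15 labeled topologies
print(probs["((A,B),(C,D))"], probs["(((A,B),C),D)"])  # 1/9 1/18
```

Each symmetric topology collects two of the 18 equally likely histories, while each caterpillar topology collects only one.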
25
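If the branch lengths of the species tree are sufficiently short, all coalescences are likely to occur more anciently than the species tree root, so gene trees arise approximately by random joining of lineages.
[Figure: species tree (tips A, B, C, D) with branch lengths T2 and T3, and gene trees formed above the root]
Each of the 18 labeled histories then has probability 1/18. A symmetric topology has two labeled histories (combined probability 1/9), and a caterpillar topology has one (probability 1/18).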
26
Gene tree frequency distribution for a species tree with tips A, B, C, D (matching gene tree: (((AB)C)D))

Gene tree      Probability
((AB)(CD))     0.132
((AC)(BD))     0.094
((AD)(BC))     0.094
(((AB)C)D)     0.125
(((AB)D)C)     0.100
(((AC)B)D)     0.070
(((AC)D)B)     0.062
(((AD)B)C)     0.032
(((AD)C)B)     0.032
(((BC)A)D)     0.070
(((BC)D)A)     0.062
(((BD)A)C)     0.032
(((BD)C)A)     0.032
(((CD)A)B)     0.032
(((CD)B)A)     0.032
27
[Figure: (T2, T3) plane, in units of N generations, showing the region in which the species tree is (((AB)C)D) but the most likely gene tree is not (((AB)C)D)]
Species tree is (((AB)C)D), but the most likely gene tree is ((AB)(CD)).
A species tree topology produces anomalous gene trees (AGTs) if branch lengths can be chosen so that the most likely gene tree topology differs from the species tree topology.
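A Monte Carlo sketch of this anomaly for species tree (((AB)C)D). The sketch names the two internal branch lengths x (ancestral to AB) and y (ancestral to ABC). We do not assume which of the slides' T2 and T3 corresponds to which. The branch lengths 0.05 and 3.0 are illustrative choices, and the printed values are simulation estimates, not the exact probabilities shown in the table above. The helper names are hypothetical.

```python
import random
from collections import Counter
from math import comb


def join(x, y):
    """Canonical string for the clade formed by joining subtrees x and y."""
    a, b = sorted((x, y))
    return f"({a},{b})"


def coalesce(lineages, length, rng):
    """Coalesce lineages within one branch of the given length (coalescent units)."""
    lineages = list(lineages)
    t = 0.0
    while len(lineages) > 1:
        t += rng.expovariate(comb(len(lineages), 2))  # rate = number of pairs
        if t > length:
            break
        i, j = rng.sample(range(len(lineages)), 2)  # uniformly chosen pair
        merged = join(lineages[i], lineages[j])
        lineages = [s for k, s in enumerate(lineages) if k not in (i, j)] + [merged]
    return lineages


def gene_tree_freqs(x, y, reps, seed=0):
    """Estimated gene tree topology frequencies for species tree (((A,B),C),D)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        lin = coalesce(["A", "B"], x, rng)                   # branch ancestral to AB
        lin = coalesce(lin + ["C"], y, rng)                  # branch ancestral to ABC
        lin = coalesce(lin + ["D"], float("inf"), rng)       # root branch
        counts[lin[0]] += 1
    return {topo: c / reps for topo, c in counts.items()}


short = gene_tree_freqs(0.05, 0.05, 100_000)
print(short["((A,B),(C,D))"], short["(((A,B),C),D)"])
```

With both internal branches short, the symmetric gene tree ((A,B),(C,D)) is estimated to be more frequent than the matching caterpillar. With long branches, the matching gene tree dominates.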
28
Does the 4-taxon symmetric species tree topology produce anomalous gene trees?
[Figure: symmetric species tree (tips A, B, C, D) with branch lengths T2 and T3, and gene trees with combined probability 1/9 and probability 1/18]
29
3 species: no anomalous gene trees.
4 species: asymmetric, but not symmetric, species trees have AGTs.
5 or more species?
[Figure: probability of the concordant gene tree and probability of a particular discordant gene tree]
30
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
A labeled topology for n taxa is n-maximally probable if its probability under random branching is greater than or equal to that of any other labeled topology with n taxa.
Proof: For n > 4, suppose a species tree topology is not n-maximally probable. If its branches are short enough, it produces AGTs that are n-maximally probable. As the branch lengths approach zero, the gene tree distribution approaches the random-branching distribution, under which any n-maximally probable topology is strictly more probable than the species tree topology.
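Which topologies are n-maximally probable can be checked numerically. Under random branching, every labeled topology with a given shape has the same number of labeled histories, so it suffices to compare shapes. Instead of enumerating histories, this sketch uses the standard closed-form count of orderings of a tree's internal nodes: (n-1)! divided by the product, over internal nodes v, of the number of internal nodes in the subtree at v. The function names and the `"leaf"` placeholder are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def shapes(n):
    """All unlabeled rooted binary shapes with n leaves, as nested tuples."""
    if n == 1:
        return ("leaf",)
    out = []
    for l in range(1, n // 2 + 1):
        for i, a in enumerate(shapes(l)):
            for j, b in enumerate(shapes(n - l)):
                if l == n - l and j < i:
                    continue  # avoid listing a shape and its mirror image twice
                out.append((a, b))
    return tuple(out)


def internal_nodes(tree):
    return 0 if tree == "leaf" else 1 + internal_nodes(tree[0]) + internal_nodes(tree[1])


def rankings(tree):
    """Labeled histories of one labeling of this shape: (n-1)! divided by the
    product over internal nodes v of the number of internal nodes at or below v."""
    denom, stack = 1, [tree]
    while stack:
        t = stack.pop()
        if t != "leaf":
            denom *= internal_nodes(t)
            stack.extend(t)
    return factorial(internal_nodes(tree)) // denom


def total_histories(n):
    """All labeled histories for n taxa: n!(n-1)!/2^(n-1)."""
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


for n in range(4, 9):
    best = max(shapes(n), key=rankings)
    print(n, best, Fraction(rankings(best), total_histories(n)))
```

For n = 4 this picks the symmetric shape (probability 2/18 = 1/9). For n = 5 it picks the shape ((AB)C)(DE), with probability 3/180 = 1/60. When several shapes tie, `max` reports only the first one.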
31
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
Proof (continued): Suppose a species tree topology is n-maximally probable. For n > 8, an inductive argument reduces the problem to the case of n = 5, 6, 7, or 8. For n = 5, 6, 7, or 8 taxa, it remains to show that the n-maximally probable species tree topologies produce AGTs.
32
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
Proof (continued): For n = 5, the n-maximally probable species tree topology produces AGTs.
33
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
Proof (continued): For n = 5, 6, 7, or 8, the n-maximally probable species tree topologies produce AGTs.
34
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
Proof (continued): For n > 8, one of the two most basal subtrees has between 5 and n-1 taxa inclusive. The larger of the two has at least n/2 > 4 taxa, and each has at most n-1.
Choose branch lengths to produce an AGT for that subtree, and make the branch lengths long for the other subtree. An inductive argument for n > 8 thus reduces the problem to the case of n = 5, 6, 7, or 8.
35
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
Proof (summary):
- If the species tree topology is not n-maximally probable, it has n-maximally probable AGTs.
- For n > 8, induction reduces the problem to the case of n = 5, 6, 7, or 8.
- By example, n-maximally probable species tree topologies produce AGTs for n = 5, 6, 7, or 8.
This completes the proof.
36
Some properties of anomalous gene trees
37
Anomalous gene trees can have the same unlabeled shape as the species tree.
[Figure: species tree with tips A, B, C, D, E and anomalous gene tree with tips D, E, C, A, B]
38
There exist mutually anomalous sets of tree topologies (“wicked forests”).
39
AGTs can occur if some, but not all, species tree branches are short.
[Figure: species tree with branch lengths T2, T3, T4]
40
Does the severity of AGTs increase with more taxa?
[Figure: AGT region in the (T2, T3) plane, in units of N generations]
Maximal value of a shared branch length (all internal branches of equal length) that still produces AGTs: 0.1568
41
Does the severity of AGTs increase with more taxa?
42
Number of AGTs for the 4-taxon asymmetric species tree
43
Number of AGTs for 5-taxon species trees
44
Does the number of AGTs increase with more taxa?
45
What implications do gene tree probabilities have for phylogenetic inference algorithms?
46
Most commonly observed gene tree topology: statistically inconsistent in estimating the species tree
[Figure: species tree and estimated species tree (tips A, B, C, D) across the (T2, T3) plane, in units of N generations]
47
Estimated gene tree of the concatenated sequence: statistically inconsistent in estimating the species tree
48
Maximum likelihood based on the frequency distribution of gene tree topologies: statistically consistent even when anomalous gene trees exist
[Figure: the gene tree frequency distribution from slide 26, marking the matching gene tree (((AB)C)D), probability 0.125, and the anomalous gene tree ((AB)(CD)), probability 0.132]
49
Consensus among gene tree topologies
- Majority-rule consensus
- Greedy consensus
- Rooted triple consensus (R*)
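A sketch of only the first step of rooted triple consensus, as we understand it. For every set of three taxa, the step tallies the rooted triple that each gene tree induces and keeps the most frequent one. Three-taxon species trees have no anomalous gene trees, so the most frequent triple is expected to match the species tree for each trio. Assembling the triples into one tree (for example, with a BUILD-style algorithm) is omitted. The function names are hypothetical, the gene tree counts below are made up for illustration, and ties are broken arbitrarily by `Counter.most_common`.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Set of tip labels in a nested-tuple tree."""
    return {tree} if isinstance(tree, str) else leaves(tree[0]) | leaves(tree[1])


def rooted_triple(tree, a, b, c):
    """The pair among a, b, c forming the cherry of the induced 3-taxon subtree."""
    trio = {a, b, c}
    while not isinstance(tree, str):
        left, right = leaves(tree[0]) & trio, leaves(tree[1]) & trio
        if len(left) == 2:
            return frozenset(left)
        if len(right) == 2:
            return frozenset(right)
        tree = tree[0] if len(left) == 3 else tree[1]  # all three lie on one side
    raise ValueError("taxa not all present in the tree")


def majority_triples(gene_trees):
    """For every three-taxon subset, the most frequent rooted triple across gene trees."""
    taxa = sorted(leaves(gene_trees[0]))
    result = {}
    for trio in combinations(taxa, 3):
        tally = Counter(rooted_triple(g, *trio) for g in gene_trees)
        result[trio] = tally.most_common(1)[0][0]
    return result


# Made-up sample in which ((A,B),(C,D)) is the single most frequent gene tree.
genes = ([((("A", "B"), "C"), "D")] * 5 + [(("A", "B"), ("C", "D"))] * 6
         + [((("A", "C"), "B"), "D")] * 2)
triples = majority_triples(genes)
print(triples)
```

In this toy sample, the most frequent whole gene tree is ((A,B),(C,D)), yet every majority triple agrees with (((A,B),C),D).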
50
Tree obtained by agglomeration using minimum pairwise coalescence times across a large number of loci (“Glass tree”)
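A sketch of this agglomeration. It assumes single linkage on the per-pair minimum coalescence time across loci, which is our reading of the slide. The input format (one dict per locus mapping a pair of taxa to its coalescence time) and the function name `glass_tree` are hypothetical, and the two loci in the example are made-up numbers.

```python
from itertools import combinations


def glass_tree(taxa, locus_times):
    """Agglomerate taxa using the minimum pairwise coalescence time across loci.

    locus_times: list of dicts mapping frozenset({x, y}) to that pair's
    coalescence time at one locus. Returns a nested-tuple tree.
    """
    # For every pair of taxa, keep the smallest coalescence time over all loci.
    dist = {frozenset(p): min(lt[frozenset(p)] for lt in locus_times)
            for p in combinations(taxa, 2)}
    clusters = {t: frozenset([t]) for t in taxa}  # subtree -> taxa it contains

    def cdist(a, b):
        # Single linkage: the smallest minimum time between members of a and b.
        return min(dist[frozenset((x, y))] for x in clusters[a] for y in clusters[b])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: cdist(*p))
        clusters[(a, b)] = clusters.pop(a) | clusters.pop(b)
    return next(iter(clusters))


loci = [
    {frozenset("AB"): 1.2, frozenset("AC"): 2.5, frozenset("BC"): 2.5,
     frozenset("AD"): 3.1, frozenset("BD"): 3.1, frozenset("CD"): 3.1},
    {frozenset("AB"): 0.9, frozenset("AC"): 1.8, frozenset("BC"): 1.8,
     frozenset("AD"): 4.0, frozenset("BD"): 4.0, frozenset("CD"): 4.0},
]
print(glass_tree(["A", "B", "C", "D"], loci))  # ('D', ('C', ('A', 'B')))
```

Taking the minimum over many loci targets the earliest possible coalescence for each pair, which cannot be older than the corresponding species divergence.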
51
Summary
- There exist algorithms for computing gene tree probabilities on species trees.
- The number of coalescent histories increases quickly, so algorithmic improvements in gene tree probability computations are likely possible.
- A species tree can disagree with the gene tree that it is most likely to produce.
- This severe discordance only gets worse with more taxa: with 5 or more species, every species tree topology produces anomalous gene trees for some branch lengths.
- However, some algorithms can infer the correct species tree even when gene tree discordance is extreme.
52
Acknowledgments
David Bryant, Mike DeGiorgio, James Degnan, Randa Tao
National Science Foundation DEB-0716904