1 Tricks for trees: Having reconstructed phylogenies what can we do with them? DIMACS, June 2006 Mike Steel Allan Wilson Centre for Molecular Ecology and.


1 Tricks for trees: Having reconstructed phylogenies, what can we do with them? DIMACS, June 2006. Mike Steel, Allan Wilson Centre for Molecular Ecology and Evolution, Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand

2 Where are phylogenetic trees used? Evolutionary biology: species relationships, dating divergences, speciation processes, molecular evolution. Ecology: classifying new species, biodiversity, co-phylogeny, migration of populations. Epidemiology: systematics, processes, dynamics. Extras: linguistics, stemmatology, psychology.

3 Phylogenetic trees [Definition] A phylogenetic X-tree is a tree T=(V,E) with a set X of labelled leaves, and all other vertices unlabelled and of degree ≥ 3. If all non-leaf vertices have degree exactly 3 then T is binary.

4 Trees and splits [Figure: tree on leaves 1 to 6] Partial order. Buneman’s Theorem.
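
Buneman's theorem is usually stated as follows: a collection of splits of X is the split set of some phylogenetic X-tree exactly when the splits are pairwise compatible. The sketch below only illustrates the pairwise test. The leaf set 1 to 6 follows the slide, but the particular splits and the names `compatible`, `pairwise_compatible` and `split` are assumptions made for illustration.

```python
from itertools import combinations

X = frozenset("123456")  # leaf labels as on the slide


def split(side):
    """Build the split side | X - side (hypothetical helper)."""
    side = frozenset(side)
    return (side, X - side)


def compatible(s1, s2):
    """Splits A|B and C|D are compatible when at least one of the four
    intersections A&C, A&D, B&C, B&D is empty."""
    (a, b), (c, d) = s1, s2
    return any(not (p & q) for p in (a, b) for q in (c, d))


def pairwise_compatible(splits):
    """True when every pair of splits in the collection is compatible."""
    return all(compatible(s, t) for s, t in combinations(splits, 2))


# Illustrative inputs (not taken from the slide's figure).
tree_like = [split("12"), split("123"), split("56")]
conflicting = [split("12"), split("23")]

print(pairwise_compatible(tree_like))
print(pairwise_compatible(conflicting))
```

The first collection passes the test and so corresponds to a tree; the second cannot, because 12|3456 and 23|1456 overlap on all four intersections.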

5 Quartet trees A quartet tree is a binary phylogenetic tree on 4 leaves (say, x, y, w, z), written xy|wz. A phylogenetic X-tree T displays xy|wz if there is an edge in T whose deletion separates {x,y} from {w,z}. [Figure: the quartet tree xy|wz and a larger tree with leaves r, y, z, u, x, s, w]
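
The "displays" definition can be checked directly by deleting each edge in turn and looking at the two leaf sets that result. The following is a minimal sketch under assumptions: the five-leaf tree, the internal vertex names p, q, r, and the function names are hypothetical and are not the tree drawn on the slide.

```python
def edge_splits(edges, leaves):
    """Yield the two leaf sets obtained by deleting each edge of a tree."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    for u, v in edges:
        # Collect everything reachable from u without crossing the edge u-v.
        seen, stack = {u}, [u]
        while stack:
            node = stack.pop()
            for nxt in adjacency[node]:
                if nxt in seen or {node, nxt} == {u, v}:
                    continue
                seen.add(nxt)
                stack.append(nxt)
        side = seen & leaves
        yield side, leaves - side


def displays(edges, leaves, quartet):
    """True if some edge separates the first pair of the quartet from the second."""
    pair1, pair2 = (set(p) for p in quartet)
    return any(
        (pair1 <= a and pair2 <= b) or (pair1 <= b and pair2 <= a)
        for a, b in edge_splits(edges, leaves)
    )


# Hypothetical caterpillar tree: cherries (x, y) and (w, z), with u in between.
LEAVES = {"x", "y", "u", "w", "z"}
EDGES = [("x", "p"), ("y", "p"), ("p", "q"), ("u", "q"),
         ("q", "r"), ("w", "r"), ("z", "r")]

print(displays(EDGES, LEAVES, ("xy", "wz")))
print(displays(EDGES, LEAVES, ("xw", "yz")))
```

This brute-force version runs one traversal per edge, which is fine for small trees; a practical implementation would compute all splits in a single pass.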

6 Corresponding notions for rooted trees: clusters (in place of splits); triples (in place of quartets).

7 How are trees useful in epidemiology? Systematics and reconstruction: How are different types/strains of a virus related? When, where, and how did they arise? What is their likely future evolution? What was the ancestral sequence?

8 How are trees useful in epidemiology? Processes and dynamics (“Phylodynamics”): How do viruses change with time in a population (population size etc.)? What is their rate of mutation, recombination, selection? Within-host dynamics: How do viruses evolve in a single patient? How is this related to the progression of the disease? How much compartmental variation exists?

10 What do the shapes of these trees tell us about the processes governing their evolution? E.g. population dynamics, selection. Coalescent prediction.

11 Tree shapes (non-metric) [Figure labels: leaves a, b, c, d, e; George Yule]

13 Why do trees on the same taxa disagree?
1. Model violation
   1. “true model” differs from “assumed model”
   2. “true model = assumed model” but estimation method not appropriate to model
   3. model true but too parameter rich (non-identifiability)
2. Sampling error (and factors that make it worse!)
3. Alignment error
4. Evolutionary processes
   1. Lineage sorting
   2. Recombination
   3. Horizontal gene transfer; hybrid taxa
   4. Gene duplication and loss

14 Sampling error that’s hard to deal with [Figure: tree marked with “?” and times T1, T2, T3, T4 along a time axis]

15 Example: Deep divergence in the Metazoan phylogeny (from Huson and Bryant, 2006)

16 Models: finite state Markov process [Figure: two trees on leaves 1, 2, 3, 4, drawn with leaf orders 1 2 3 4 vs 1 3 2 4]

17 Models [Figure: two trees on leaves 1, 2, 3, 4] “Site saturation”: subdividing long edges offers only a partial remedy (trade-off).
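
One way to see what "site saturation" means is to take the simplest finite state Markov process, a symmetric two-state model, and track the probability that the two ends of an edge are in different states. The choice of the two-state model, the parameter names `rate` and `time`, and the function name are assumptions for illustration; the slides do not specify a model. Under this model the probability is (1 - exp(-2 * rate * time)) / 2, which climbs towards 1/2 as the edge gets longer, so very long edges carry almost no signal.

```python
import math


def prob_differ(rate, time):
    """Probability that the end states of an edge differ under a symmetric
    two-state Markov process with substitution rate `rate` per unit time."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * time))


# Probability of a difference for increasingly long edges (rate fixed at 1).
for t in (0.0, 0.1, 0.5, 1.0, 5.0, 50.0):
    print(t, prob_differ(1.0, t))
```

As `time` grows the values level off just below 0.5, at which point the state at one end of the edge is close to independent of the state at the other.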

18 Why do trees on the same taxa disagree?
1. Model violation
   1. “true model” differs from “assumed model”
   2. “true model = assumed model” but estimation method not appropriate to model
   3. model true but too parameter rich (non-identifiability)
2. Sampling error (and factors that make it worse!)
3. Alignment
4. Evolutionary processes
   1. Lineage sorting
   2. Recombination
   3. Horizontal gene transfer; hybrid taxa
   4. Gene duplication and loss

19 Gene trees vs species trees Theorem (J. H. Degnan and N. A. Rosenberg, 2006): For n ≥ 5, for any species tree topology, there are branch lengths and population sizes for which the most likely gene tree is different from the species tree. Reference: Degnan, J. H., and N. A. Rosenberg. 2006. Discordance of species trees with their most likely gene trees. PLoS Genetics 2(5): e68. [Figure: tree on a, b, c]
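
For contrast with the theorem, the three-taxon case can be worked out in closed form under the standard coalescent: if the internal branch of the species tree has length T in coalescent units, the gene tree matches the species tree with probability 1 - (2/3)exp(-T), and each of the two other topologies has probability (1/3)exp(-T). The sketch below just evaluates these expressions; the function name and the labels in the returned dictionary are assumptions made for illustration.

```python
import math


def three_taxon_gene_tree_probs(internal_length):
    """Gene tree topology probabilities for a three-taxon species tree whose
    internal branch has the given length in coalescent units."""
    no_coalescence = math.exp(-internal_length)
    return {
        "matching": 1.0 - (2.0 / 3.0) * no_coalescence,
        "discordant_1": no_coalescence / 3.0,
        "discordant_2": no_coalescence / 3.0,
    }


for T in (0.0, 0.1, 1.0, 3.0):
    print(T, three_taxon_gene_tree_probs(T))
```

With three taxa the matching topology is never less likely than either alternative, so a most likely gene tree that disagrees with the species tree needs more taxa.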

20 Example [Figure: tree on Orangutan, Gorilla, Chimpanzee, Human, with a “?”; adapted from the Tree of Life website, University of Arizona]

21 Distinguishing between signals: lineage sorting vs sampling error vs HGT [Figure: two trees on A, B, C, drawn with leaf orders A B C and A C B]

22 Why do trees on the same taxa disagree?
1. Model violation
   1. “true model” differs from “assumed model”
   2. “true model = assumed model” but estimation method not appropriate to model
   3. model true but too parameter rich (non-identifiability)
2. Sampling error (and factors that make it worse!)
3. Alignment
4. Evolutionary processes
   1. Lineage sorting
   2. Recombination
   3. Horizontal gene transfer; hybrid taxa
   4. Gene duplication and loss

23 Given a tree what questions might we want to answer?
- How reliable is a split?
- Where is the root of the tree?
- Relative ranking of vertices? Dating?
- How well supported is the resolution of some ‘deep divergence’?
- What model best describes the evolution of the sequences (molecular clock? dS/dN ratio constant? etc.)
Statistical approaches:
- Non-parametric bootstrap
- Parametric bootstrap
- Likelihood ratio tests
- Bayesian posterior probabilities
- Tests (KH, SH, SOWH)
Reference: Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49: 652-670.

24 [Figure from Steve Thompson, Florida State University]

25 Example

26 Non-parametric bootstrap
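
The resampling step of the non-parametric bootstrap can be sketched as follows: alignment columns are drawn with replacement until the replicate has as many sites as the original, and a tree would then be re-estimated from each replicate. The toy alignment, the taxon names and the function name are assumptions for illustration, and the tree-building step is deliberately left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns sampled with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(sequence[c] for c in columns)
        for taxon, sequence in alignment.items()
    }


# Hypothetical toy alignment (not data from the presentation).
ALIGNMENT = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTTCGT",
    "taxon3": "ACCTACGA",
    "taxon4": "TCCTACGA",
}

rng = random.Random(2006)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(ALIGNMENT, rng) for _ in range(3)]
for replicate in replicates:
    print(replicate)
```

The support for a split is then the fraction of replicate trees that contain it. Passing a seeded `random.Random` instance keeps the sketch deterministic without touching the global random state.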

28 Dealing with incompatibility: consensus trees. Strict; majority rule; semistrict consensus.
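
Strict and majority-rule consensus can both be described in terms of which clusters (or splits) recur across the input trees. The sketch below assumes each rooted tree is given as a set of its non-trivial clusters; the three example trees and the function names are hypothetical, and the semistrict variant is not covered.

```python
from collections import Counter


def strict_clusters(trees):
    """Clusters present in every input tree."""
    return set.intersection(*(set(tree) for tree in trees))


def majority_clusters(trees, threshold=0.5):
    """Clusters present in more than `threshold` of the input trees."""
    counts = Counter(cluster for tree in trees for cluster in tree)
    return {c for c, n in counts.items() if n / len(trees) > threshold}


# Three hypothetical rooted trees on taxa a, b, c, d, as cluster sets.
AB, ABC, ABD = frozenset("ab"), frozenset("abc"), frozenset("abd")
TREES = [{AB, ABC}, {AB, ABD}, {AB, ABC}]

print(strict_clusters(TREES))
print(majority_clusters(TREES))
```

Here only {a, b} survives the strict consensus, while the majority-rule consensus also keeps {a, b, c} because it appears in two of the three trees. Clusters that occur in more than half of the trees are always pairwise compatible, so the result still describes a tree.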

29 Consensus networks Take the splits that are in at least x% of the trees and represent them by a graph. Splits graph G(Σ) (Dress and Huson): each split is represented by a class of ‘parallel’ edges. Simplest example (n=4).

30 Chloroplast JSA tree [Figure: tree with tips annotated by the codes (NS), (SS), (N), (A), (C,S) and combinations such as (N,NS)]

31 Nuclear ITS tree [Figure: tree with tips annotated by the codes (NS), (SS), (N), (A) and combinations such as (NS,N) and (SS,NS)]

32 Consensus network (ITS tree + JSA tree) [Figure labels: I, II, III, R. nivicola]

33 Maximum agreement subtrees: concept; computational complexity.

34 Comparing trees: splits metric (Robinson-Foulds); statistical aspects; tree rearrangement operations and the graph of trees (rSPR); cophylogeny.
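
The splits (Robinson-Foulds) metric counts the splits that occur in exactly one of the two trees. A minimal sketch, assuming each unrooted tree is supplied as its set of non-trivial splits: every split is stored as the side that does not contain a fixed reference taxon, so that the same split always gets the same representation. The taxon set, the two example trees and the function names are hypothetical.

```python
def canonical(side, taxa):
    """Represent a split by the side that excludes the reference taxon min(taxa)."""
    side = frozenset(side)
    return taxa - side if min(taxa) in side else side


def rf_distance(splits1, splits2):
    """Number of splits found in exactly one of the two trees."""
    return len(splits1 ^ splits2)


TAXA = frozenset("12345")
# Hypothetical trees: T1 has splits 12|345 and 123|45, T2 has 13|245 and 123|45.
T1 = {canonical(s, TAXA) for s in ("12", "123")}
T2 = {canonical(s, TAXA) for s in ("13", "123")}

print(rf_distance(T1, T2))
```

The two trees share 123|45 and differ in one split each, giving a distance of 2. Some authors halve this count, so the convention should be checked before comparing published values.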

35 Co-phylogeny (M. Charleston)

36 Supertrees: compatibility concept; compatibility of rooted trees (BUILD); why do we want to do this? Extension: higher order taxa, dates. Methods for handling incompatible trees (MRP; mincut variants; minflip).
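
BUILD (Aho, Sagiv, Szymanski and Ullman) decides whether a set of rooted triples is compatible and, if so, returns a tree displaying them. The version below is a minimal sketch rather than an efficient implementation: triples are written (a, b, c) meaning ab|c, trees are returned as nested frozensets, and the function name and example triples are assumptions made for illustration.

```python
def build(leaves, triples):
    """Return a rooted tree (nested frozensets of leaf labels) displaying
    every triple (a, b, c), read as ab|c, or None if none exists."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    relevant = [t for t in triples if set(t) <= leaves]
    # Merge a and b for every triple ab|c; the resulting blocks are the
    # connected components of the auxiliary graph used by BUILD.
    block = {x: {x} for x in leaves}
    for a, b, _ in relevant:
        if block[a] is not block[b]:
            merged = block[a] | block[b]
            for x in merged:
                block[x] = merged
    components = {frozenset(s) for s in block.values()}
    if len(components) == 1:
        return None  # the leaves cannot be separated at the root
    children = []
    for component in components:
        subtree = build(component, relevant)
        if subtree is None:
            return None
        children.append(subtree)
    return frozenset(children)


print(build("abcd", [("a", "b", "c"), ("c", "d", "a")]))
print(build("abc", [("a", "b", "c"), ("b", "c", "a")]))
```

The first call returns the tree with clusters {a, b} and {c, d}; the second returns None because ab|c and bc|a cannot both hold. Using frozensets makes the result independent of the order in which components are visited.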

37 Compatibility A set Q of quartets is compatible if there is a phylogenetic X-tree T that displays each quartet of Q. Example: Q = {12|34, 13|45, 14|26} on leaves 1 to 6. Complexity?

38 Supertrees: compatibility concept; compatibility of rooted trees (BUILD); why do we want to do this? Extension: higher order taxa, dates. Methods for handling incompatible trees (MRP; mincut variants; minflip).

39 Phylogenetic networks Consensus setting: consensus networks. Minimizing hybrid/reticulate vertices. Supernetworks: Z-closure, filtering.

40 Networks can represent: reticulate evolution (e.g. hybrid species); phylogenetic uncertainty (i.e. possible alternative trees). Z-closure: given T1,…,Tk on overlapping sets of species, let Σ be the set of splits of these trees, construct spcl2(Σ), and construct the ‘splits graph’ of the resulting splits that are ‘full’. [Figure: trees and a network on leaves a, b, c, d]

41 Split closure operation (Meacham 1986) [Figure: the partial splits A1|B1 and A2|B2 are replaced by A1|B1∪B2 and A1∪A2|B2]
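
A single application of the split closure operation can be sketched in a few lines. The side conditions used here (A1∩A2, A2∩B1 and B1∩B2 all non-empty, A1∩B2 empty) are the form of the Z-rule as it is usually described in the Z-closure literature; they are an assumption, since they are not recoverable from this transcript. The function name and the example splits are also hypothetical.

```python
def z_rule(split1, split2):
    """Apply the Z-rule to partial splits A1|B1 and A2|B2.

    Returns the two extended splits A1 | B1 u B2 and A1 u A2 | B2 when the
    (assumed) side conditions hold, and None otherwise.
    """
    (a1, b1), (a2, b2) = split1, split2
    if a1 & a2 and a2 & b1 and b1 & b2 and not (a1 & b2):
        return (a1, b1 | b2), (a1 | a2, b2)
    return None


# Hypothetical partial splits on overlapping taxon sets.
S1 = (frozenset("ab"), frozenset("cd"))  # ab|cd on {a, b, c, d}
S2 = (frozenset("bc"), frozenset("de"))  # bc|de on {b, c, d, e}

print(z_rule(S1, S2))
```

In this example ab|cd and bc|de are extended to ab|cde and abc|de, both of which now cover all five taxa. A full closure would apply the rule repeatedly, in every orientation of each pair of splits, until nothing changes.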

44 Reconstructing ancestral sequences Methods (MP, likelihood, Bayesian). Quiz: MP for a balanced tree = majority state? Information-theoretic considerations.
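
The quiz can be explored with the bottom-up pass of Fitch's parsimony algorithm, which gives the set of states that are optimal at the root. The sketch below is one possible construction, not necessarily the example intended in the talk: it recursively builds a balanced binary tree of depth 5 (32 leaves) over two hypothetical states A and B in which A is the majority leaf state, and then compares that with the Fitch root set. All function names are assumptions for illustration.

```python
def fitch_root_set(tree):
    """Bottom-up Fitch pass: intersection of the child sets if it is
    non-empty, otherwise their union. Leaves are single-character strings."""
    if isinstance(tree, str):
        return {tree}
    left, right = (fitch_root_set(child) for child in tree)
    return (left & right) or (left | right)


def leaf_states(tree):
    """List the leaf states of a nested-tuple tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [state for child in tree for state in leaf_states(child)]


def all_a(depth):
    """Balanced tree in which every leaf is in state A."""
    return "A" if depth == 0 else (all_a(depth - 1), all_a(depth - 1))


def root_b(depth):
    """Balanced tree (depth >= 1) whose Fitch root set is {B}."""
    if depth == 1:
        return ("B", "B")
    return (root_b(depth - 1), root_ab(depth - 1))


def root_ab(depth):
    """Balanced tree (depth >= 1) whose Fitch root set is {A, B}."""
    if depth == 1:
        return ("A", "B")
    return (all_a(depth - 1), root_b(depth - 1))


tree = root_b(5)
states = leaf_states(tree)
print(len(states), states.count("A"), states.count("B"))
print(fitch_root_set(tree))
```

In this construction 19 of the 32 leaves are in state A, yet the only most parsimonious root state is B, so on a large enough balanced tree the MP estimate need not agree with the majority state.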

45 Statistics of parsimony (clustering on a tree)

