1 Tricks for trees: Having reconstructed phylogenies what can we do with them? DIMACS, June 2006 Mike Steel Allan Wilson Centre for Molecular Ecology and.

Presentation transcript:

1 Tricks for trees: Having reconstructed phylogenies what can we do with them? DIMACS, June 2006 Mike Steel Allan Wilson Centre for Molecular Ecology and Evolution Biomathematics Research Centre University of Canterbury, Christchurch, New Zealand

2 Where are phylogenetic trees used? Evolutionary biology – species relationships, dating divergences, speciation processes, molecular evolution. Ecology – classifying new species; biodiversity, co-phylogeny, migration of populations. Epidemiology – systematics, processes, dynamics. Extras – linguistics, stemmatology, psychology.

3 Phylogenetic trees [Definition] A phylogenetic X-tree is a tree T=(V,E) whose leaves are labelled by the set X, with all other vertices unlabelled and of degree ≥ 3. If all non-leaf vertices have degree 3 then T is binary.
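The definition above can be checked mechanically. The following is a minimal sketch, assuming the usual convention that interior vertices have degree at least 3 and that the tree is given as a list of undirected edges; the function name classify_x_tree and the edge-list representation are illustrative choices, not part of the talk.

```python
from collections import defaultdict

def classify_x_tree(edges, X):
    """Return 'binary', 'non-binary', or None if this is not a phylogenetic X-tree."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    # A tree has exactly |V| - 1 edges and is connected.
    if len(edges) != len(nodes) - 1:
        return None
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for b in adj[a]:
            if b not in seen:
                seen.add(b)
                stack.append(b)
    if seen != nodes:
        return None
    # The leaves (degree 1) must be exactly the label set X.
    leaves = {v for v in nodes if len(adj[v]) == 1}
    if leaves != set(X):
        return None
    interior_degrees = [len(adj[v]) for v in nodes - leaves]
    if any(d < 3 for d in interior_degrees):
        return None
    return "binary" if all(d == 3 for d in interior_degrees) else "non-binary"
```

For example, the tree with edges x-r, y-r, r-s, w-s, z-s on X = {x, y, w, z} is classified as binary, while a star with four leaves is non-binary.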

4 Trees and splits Partial order: Buneman’s Theorem

5 Quartet trees A quartet tree is a binary phylogenetic tree on 4 leaves (say, x, y, w, z), written xy|wz. A phylogenetic X-tree T displays xy|wz if there is an edge in T whose deletion separates {x,y} from {w,z}. [Figure: quartet tree on x, y, w, z; second tree with labels r, y, z, u, x, s, w]
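The "displays" condition can be tested directly from the definition: delete each edge in turn and see whether the two pairs fall on opposite sides. This is a small sketch under the assumption that the tree is supplied as an undirected edge list; the names side_of and displays are hypothetical helpers, and the edge-by-edge search is chosen for clarity rather than speed.

```python
from collections import defaultdict

def side_of(adj, u, v):
    """Vertices reachable from u once the edge {u, v} has been deleted."""
    seen, stack = {u}, [u]
    while stack:
        a = stack.pop()
        for b in adj[a]:
            if a == u and b == v:
                continue  # do not cross the deleted edge
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def displays(edges, pair1, pair2):
    """True if some edge of the tree separates pair1 from pair2."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    p1, p2 = set(pair1), set(pair2)
    for u, v in edges:
        side = side_of(adj, u, v)
        if (p1 <= side and not (p2 & side)) or (p2 <= side and not (p1 & side)):
            return True
    return False
```

On the quartet tree with edges x-r, y-r, r-s, w-s, z-s, the call displays(edges, ("x", "y"), ("w", "z")) succeeds, whereas the pairing xw|yz is not displayed.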

6 Corresponding notions for rooted trees: clusters (in place of splits); triples (in place of quartets).

7 How are trees useful in epidemiology? Systematics and reconstruction How are different types/strains of a virus related? When, where, and how did they arise? What is their likely future evolution? What was the ancestral sequence?

8 How are trees useful in epidemiology? Processes and dynamics (“Phylodynamics”) How do viruses change with time in a population? Population size etc. What is their rate of mutation, recombination, selection? Within-host dynamics: How do viruses evolve in a single patient? How is this related to the progression of the disease? How much compartmental variation exists?

10 What do the shapes of these trees tell us about the processes governing their evolution? Eg. Population dynamics, selection Coalescent prediction

11 Tree shapes (non-metric). George Yule. [Figure: tree shapes on leaves a, b, c, d, e]

13 Why do trees on the same taxa disagree? 1. Model violation: (a) “true model” differs from “assumed model”; (b) “true model = assumed model” but estimation method not appropriate to model; (c) model true but too parameter rich (non-identifiability). 2. Sampling error (and factors that make it worse!) 3. Alignment error. 4. Evolutionary processes: (a) lineage sorting; (b) recombination; (c) horizontal gene transfer, hybrid taxa; (d) gene duplication and loss.

14 Sampling error that’s hard to deal with [Figure: trees T1, T2, T3, T4 arranged along a time axis, with a “?”]

15 Example: Deep divergence in the Metazoan phylogeny From Huson and Bryant, 2006

16 Models vs Finite state Markov process

17 Models vs “site saturation” subdividing long edges only offers a partial remedy (trade-off).

18 Why do trees on the same taxa disagree? 1. Model violation: (a) “true model” differs from “assumed model”; (b) “true model = assumed model” but estimation method not appropriate to model; (c) model true but too parameter rich (non-identifiability). 2. Sampling error (and factors that make it worse!) 3. Alignment. 4. Evolutionary processes: (a) lineage sorting; (b) recombination; (c) horizontal gene transfer, hybrid taxa; (d) gene duplication and loss.

19 Gene trees vs species trees Theorem (J. H. Degnan and N. A. Rosenberg): For n ≥ 5, for any species tree on n taxa, there are branch lengths and population sizes for which the most likely gene tree is different from the species tree. Reference: Discordance of species trees with their most likely gene trees. PLoS Genetics 2(5), e68, May 2006. [Figure: tree on a, b, c]
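For intuition about lineage sorting, the three-taxon case has a standard closed form that is not on the slide: under the coalescent, a gene tree matches a rooted three-taxon species tree with probability 1 - (2/3)exp(-t), where t is the internal branch length in coalescent units. The sketch below simply evaluates that formula; the function name and the choice of units are assumptions made for illustration.

```python
import math

def prob_concordant_three_taxa(t):
    """Probability that a gene tree matches a rooted 3-taxon species tree.

    t is the internal branch length in coalescent units (an assumption of
    this sketch; converting from generations requires the population size).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)
```

As t shrinks to zero the probability falls to 1/3, so all three gene tree topologies become equally likely; with three taxa the matching topology is never less likely than the others, which is why the anomalous behaviour in the theorem needs larger trees.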

20 Example [Figure: tree on Orangutan, Gorilla, Chimpanzee and Human, with a “?”] Adapted from the Tree of Life website, University of Arizona.

21 Distinguishing between signals: lineage sorting vs sampling error vs HGT. [Figure: two trees with leaf orders A B C and A C B]

22 Why do trees on the same taxa disagree? 1. Model violation: (a) “true model” differs from “assumed model”; (b) “true model = assumed model” but estimation method not appropriate to model; (c) model true but too parameter rich (non-identifiability). 2. Sampling error (and factors that make it worse!) 3. Alignment. 4. Evolutionary processes: (a) lineage sorting; (b) recombination; (c) horizontal gene transfer, hybrid taxa; (d) gene duplication and loss.

23 Given a tree, what questions might we want to answer? How reliable is a split? Where is the root of the tree? Relative ranking of vertices? Dating? How well is some ‘deep divergence’ resolved? What model best describes the evolution of the sequences (molecular clock? dS/dN ratio constant? etc.) Statistical approaches: non-parametric bootstrap; parametric bootstrap; likelihood ratio tests; Bayesian posterior probabilities; tests (KH, SH, SOWH). Reference: Goldman, N., J. P. Anderson, and A. G. Rodrigo. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49:

24 From Steve Thompson, Florida State Uni

25 Example

26 Non-parametric bootstrap
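The resampling step of the non-parametric bootstrap can be sketched as follows: alignment columns (sites) are drawn with replacement until a replicate of the original length is obtained, and a tree would then be re-estimated from each replicate. The dictionary representation of the alignment and the function name bootstrap_alignment are assumptions of this sketch; tree estimation itself is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    # Sample site indices with replacement; the same indices are used for every taxon.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}
```

A typical use would call this a few hundred times with the same rng and record how often each split of the original tree reappears among the replicate trees.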

28 Dealing with incompatibility: consensus trees (strict, majority rule, semistrict consensus).
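Strict and majority-rule consensus can both be expressed in terms of how often each split occurs across the input trees. The sketch below assumes every tree is given as a set of hashable splits on the same taxon set; the helper split and the function consensus_splits are illustrative names, and the semistrict variant is not covered.

```python
from collections import Counter

def split(a, b):
    """An unordered bipartition A|B, hashable so it can be counted."""
    return frozenset((frozenset(a), frozenset(b)))

def consensus_splits(trees, strict=False):
    """Splits of the strict or majority-rule consensus tree.

    trees: list of sets of splits, all on the same taxa.
    strict=True keeps splits present in every tree;
    otherwise splits present in more than half of the trees are kept.
    """
    counts = Counter(s for tree in trees for s in set(tree))
    k = len(trees)
    if strict:
        return {s for s, c in counts.items() if c == k}
    return {s for s, c in counts.items() if 2 * c > k}
```

Splits found in more than half of the trees are always pairwise compatible, so the majority-rule output corresponds to a single tree.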

29 Consensus networks Take the splits that are in at least x% of the trees and represent them by a graph. Splits graph G(Σ) (Dress and Huson): each split is represented by a class of ‘parallel’ edges. Simplest example (n=4).
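The thresholding step can be sketched as below, together with the standard pairwise compatibility test for splits (A|B and C|D are compatible exactly when at least one of the four intersections A∩C, A∩D, B∩C, B∩D is empty). Incompatible pairs among the retained splits are what force a network rather than a tree. The names make_split, compatible and network_splits are hypothetical, and drawing the splits graph itself is not attempted.

```python
from collections import Counter
from itertools import combinations

def make_split(side, taxa):
    """Canonical ordered pair (A, B), with A the side holding the smallest taxon."""
    side, taxa = frozenset(side), frozenset(taxa)
    rest = taxa - side
    return (side, rest) if min(taxa) in side else (rest, side)

def compatible(s1, s2):
    """Two splits are compatible if some pair of their sides is disjoint."""
    return any(not (p & q) for p in s1 for q in s2)

def network_splits(trees, x):
    """Splits in at least x% of the trees, plus the incompatible pairs among them."""
    counts = Counter(s for tree in trees for s in set(tree))
    k = len(trees)
    kept = [s for s, c in counts.items() if 100 * c >= x * k]
    clashes = [(s, t) for s, t in combinations(kept, 2) if not compatible(s, t)]
    return kept, clashes
```

With n=4 and two trees displaying ab|cd and ac|bd, a 50% threshold keeps both splits and reports them as incompatible, the case that is drawn as a box with two classes of parallel edges.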

30 Chloroplast JSA tree [Figure: tips labelled (NS), (SS), (A), (N), (N,NS), (C,S)]

31 Nuclear ITS tree [Figure: tips labelled (SS), (NS), (NS,N), (A), (N), (SS,NS)]

32 Consensus network (ITS tree + JSA tree) [Figure labels: I, II, III, R. nivicola]

33 Maximum agreement subtrees Concept Computational complexity

34 Comparing trees Splits metric (Robinson-Foulds) Statistical aspects. Tree rearrangement operations – the graph of trees (rSPR). Cophylogeny
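The splits (Robinson-Foulds) metric counts the splits that occur in exactly one of the two trees. A minimal sketch, assuming both trees are on the same taxa and are given as sets of hashable splits; the helper names are illustrative, and some authors halve this count.

```python
def split(a, b):
    """An unordered bipartition A|B, hashable so it can be stored in a set."""
    return frozenset((frozenset(a), frozenset(b)))

def rf_distance(splits1, splits2):
    """Size of the symmetric difference of the two split sets."""
    return len(set(splits1) ^ set(splits2))
```

For two trees on five taxa with splits {ab|cde, abc|de} and {ab|cde, abd|ce}, the distance is 2: one split is unique to each tree.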

35 Co-phylogeny (M. Charleston)

36 Supertrees Compatibility concept Compatibility of rooted trees (BUILD) Why do we want to do this? Extension – higher order taxa, dates Methods for handling incompatible trees (MRP; mincut variants; minflip)
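For rooted trees, compatibility can be decided by the BUILD algorithm of Aho et al. The version below is a simplified sketch working on rooted triples ab|c, written as tuples (a, b, c): join a and b for every triple whose three leaves are all present, split the leaf set into connected components, and recurse; a single component means the triples are incompatible. The nested-tuple output and the name build are choices made for this illustration, and labels are assumed to be sortable.

```python
def build(leaves, triples):
    """Return a rooted tree (nested tuples) displaying all triples, or None."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Only triples with all three leaves in the current set constrain this step.
    relevant = [t for t in triples if set(t) <= leaves]
    for a, b, _c in relevant:
        parent[find(a)] = find(b)
    comps = {}
    for x in leaves:
        comps.setdefault(find(x), set()).add(x)
    if len(comps) == 1:
        return None  # the triples cannot all be displayed by one tree
    subtrees = []
    for comp in sorted(comps.values(), key=min):
        sub = build(comp, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)
```

For instance, the triples ab|c and cd|a on leaves a, b, c, d yield the tree ((a, b), (c, d)), while ab|c together with ac|b is rejected.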

37 Compatibility A set Q of quartets is compatible if there is a phylogenetic X-tree T that displays each quartet of Q. Example: Q={12|34, 13|45, 14|26}. Complexity?

38 Supertrees Compatibility concept Compatibility of rooted trees (BUILD) Why do we want to do this? Extension – higher order taxa, dates Methods for handling incompatible trees (MRP; mincut variants; minflip)

39 Phylogenetic networks Consensus setting: consensus networks Minimizing hybrid/reticulate vertices Supernetworks – Z closure, filtering

40 Networks can represent: reticulate evolution (e.g. hybrid species); phylogenetic uncertainty (i.e. possible alternative trees). Z-closure: given T1,…,Tk on overlapping sets of species, let Σ be the set of splits of these trees, construct spcl2(Σ), and construct the ‘splits graph’ of the resulting splits that are ‘full’. [Figure: trees on leaves a, b, c, d]

41 Split closure operation (Meacham 1986): A1|B1, A2|B2 → A1|B1∪B2, A1∪A2|B2
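A single application of the split closure rule can be sketched as follows. The side condition used here (A1∩A2, A2∩B1 and B1∩B2 all non-empty, A1∩B2 empty) is the form in which the rule is usually stated for Z-closure; it is not legible on the slide, so treat it as an assumption of this sketch. Partial splits are represented as pairs of frozensets, and z_rule is a hypothetical name.

```python
def z_rule(s1, s2):
    """Apply the split closure rule to partial splits s1 = (A1, B1), s2 = (A2, B2).

    Returns the pair (A1 | B1 ∪ B2, A1 ∪ A2 | B2) when the rule applies,
    and None otherwise.
    """
    (a1, b1), (a2, b2) = s1, s2
    if (a1 & a2) and (a2 & b1) and (b1 & b2) and not (a1 & b2):
        return (a1, b1 | b2), (a1 | a2, b2)
    return None
```

Applied to ab|cd and bc|de, the rule produces ab|cde and abc|de, both of which are splits of the caterpillar tree on a, b, c, d, e that displays the two inputs.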

44 Reconstructing ancestral sequences Methods (MP, Likelihood, Bayesian) Quiz. MP for a balanced tree = majority state? Information-theoretic considerations
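As one concrete MP method, the bottom-up pass of Fitch's algorithm computes, for a single character on a rooted binary tree, the set of most-parsimonious root states and the minimum number of changes. The sketch assumes the tree is given as nested pairs of leaf names and the character as a dictionary from leaf name to state; both the representation and the leaf names below are illustrative.

```python
def fitch(tree, states):
    """Return (set of MP states at the root, minimum number of changes).

    tree: a leaf name, or a pair (left, right) of subtrees.
    states: dict mapping each leaf name to its observed state.
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets: take the union and count one extra change.
    return left_set | right_set, left_cost + right_cost + 1

# A balanced 8-leaf tree with three leaves in state "0" and five in state "1".
tree = ((("l1", "l2"), ("l3", "l4")), (("l5", "l6"), ("l7", "l8")))
states = {"l1": "0", "l2": "1", "l3": "0", "l4": "0",
          "l5": "1", "l6": "1", "l7": "1", "l8": "1"}
root_states, changes = fitch(tree, states)
```

In this small example the root set is {"0", "1"} with 2 changes, even though state "1" is in the majority at the leaves, which bears on the quiz question above.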

45 Statistics of parsimony (clustering on a tree)