Processing & Testing Phylogenetic Trees. Rooting.

Rooting

Three approaches to rooting:
1. Outgroup rooting: based on external information.
2. Midpoint rooting: direct a posteriori use of the ultrametricity assumption.
3. Largest-genetic-variability-group rooting: indirect a posteriori use of the ultrametricity assumption.

Rooting with an outgroup. [Figure: unrooted tree of plant, fungus, and animal.] Are fungi more closely related to animals or to plants?

Rooting with an outgroup. [Figure: the same unrooted tree of plant, fungus, and animal.] Add an outgroup, e.g., a bacterium.

[Figure: placing the root on the branch leading to the bacterial outgroup turns the unrooted tree into a rooted tree, in which animal and fungus form a monophyletic group. Labels: "Unrooted tree", "Rooted tree", "Monophyletic group", "bacterial outgroup".]
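A minimal Python sketch of outgroup rooting, assuming the unrooted tree is stored as a hypothetical list of edges between named nodes (the internal nodes `X` and `Y` are made-up labels, and the edge list is illustrative input, not a result). The root is placed on the branch that joins the outgroup to the rest of the tree, and the ingroup is then read off as a nested tuple.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency lists from (u, v) edges of an unrooted tree."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def subtree(adj, node, parent):
    """Nested-tuple subtree below `node`, directed away from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:  # degree-1 node: a leaf (taxon)
        return node
    return tuple(subtree(adj, child, node) for child in children)

def root_with_outgroup(edges, outgroup):
    """Place the root on the outgroup's branch; return (outgroup, ingroup_subtree)."""
    adj = build_adjacency(edges)
    if len(adj[outgroup]) != 1:
        raise ValueError(f"{outgroup!r} is not a leaf of the tree")
    (attach,) = adj[outgroup]  # the internal node the outgroup hangs from
    return (outgroup, subtree(adj, attach, outgroup))

# Hypothetical unrooted tree: X and Y are unnamed internal nodes.
edges = [("X", "plant"), ("X", "bacterium"), ("X", "Y"), ("Y", "animal"), ("Y", "fungus")]
print(root_with_outgroup(edges, "bacterium"))
# -> ('bacterium', ('plant', ('animal', 'fungus')))
```

Choosing a different outgroup only changes which branch carries the root; the unrooted topology itself is untouched.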

Midpoint rooting
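Midpoint rooting places the root halfway along the longest path between any two leaves, so that the two most divergent lineages are equidistant from the root, as the ultrametricity assumption would require. The Python sketch below assumes a hypothetical adjacency-dict representation `{node: {neighbor: branch_length}}` and finds the longest path by brute force from every leaf, which is adequate for small trees; the function names and branch lengths are illustrative.

```python
def build_tree(edges):
    """Adjacency dict {node: {neighbor: branch_length}} from (u, v, length) edges."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj

def farthest_from(adj, start):
    """Depth-first search from `start`; return (distance, path) to the farthest node."""
    best = (0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[0]:
            best = (dist, path)
        for nbr, length in adj[node].items():
            if nbr != parent:
                stack.append((nbr, node, dist + length, path + [nbr]))
    return best

def midpoint_root(adj):
    """Return (u, v, x): the root lies on branch u-v at distance x from u."""
    leaves = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    # Longest leaf-to-leaf path, found by brute force from every leaf.
    diameter, path = max((farthest_from(adj, leaf) for leaf in leaves), key=lambda t: t[0])
    half = diameter / 2.0
    walked = 0.0
    for u, v in zip(path, path[1:]):
        if walked + adj[u][v] >= half:
            return u, v, half - walked
        walked += adj[u][v]
    raise ValueError("tree has no branches")

# Hypothetical tree; X and Y are internal nodes. The longest path is B-X-Y-C (length 8).
tree = build_tree([("A", "X", 1.0), ("B", "X", 2.0), ("X", "Y", 1.0),
                   ("Y", "C", 5.0), ("Y", "D", 1.0)])
print(midpoint_root(tree))  # ('Y', 'C', 1.0): 4 units from both B and C
```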

Largest-genetic-variability-group rooting: the group with the largest genetic variation is assumed to be the most ancient.
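One hedged way to make "largest variation" concrete is to score each candidate group by its mean pairwise genetic distance and treat the highest-scoring group as the most ancient. In the sketch below the groups, taxon names, and distances are hypothetical, and mean pairwise distance is only one possible measure of variability.

```python
from itertools import combinations

def mean_pairwise_distance(members, dist):
    """Average distance over all unordered pairs of taxa in `members`."""
    pairs = list(combinations(members, 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

def most_variable_group(groups, dist):
    """Name of the group with the largest mean pairwise distance."""
    return max(groups, key=lambda name: mean_pairwise_distance(groups[name], dist))

# Hypothetical groups and pairwise distances (substitutions per site).
groups = {"group1": ["a", "b", "c"], "group2": ["d", "e"]}
dist = {
    frozenset({"a", "b"}): 0.10,
    frozenset({"a", "c"}): 0.12,
    frozenset({"b", "c"}): 0.08,
    frozenset({"d", "e"}): 0.03,
}
print(most_variable_group(groups, dist))  # group1
```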

Species divergence times

If we know $T_1$ and the rate of evolution, then we can infer $T_2$. If we know $T_2$ and the rate of evolution, then we can infer $T_1$.

If $T_1$ is known

If $T_2$ is known
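A worked sketch, assuming a strict molecular clock: two sequences that split $T$ time units ago have accumulated a distance $K = 2rT$, where $r$ is the substitution rate per site per lineage. A calibrated divergence then gives $r = K_1/(2T_1)$, and the other divergence follows as $T_2 = K_2/(2r)$; the same two steps run in reverse when $T_2$ is the known one. The function names and all numbers below are hypothetical (distances in substitutions per site, times in millions of years).

```python
def rate_from_calibration(k, t):
    """Rate per site per lineage, assuming K = 2rT (a strict molecular clock)."""
    return k / (2.0 * t)

def divergence_time(k, r):
    """Time since two sequences split, given their distance K and rate r."""
    return k / (2.0 * r)

# Hypothetical calibration: a pair known to have split T1 = 10 Myr ago differs by K1 = 0.02.
r = rate_from_calibration(0.02, 10.0)
# A second pair with distance K2 = 0.06 then split about 30 Myr ago.
t2 = divergence_time(0.06, r)
print(round(t2, 6))  # 30.0
```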

Dating divergence events requires paleontological calibrations. This is a complicated problem.

Topological comparisons

Topological comparisons entail measuring the similarity or dissimilarity among tree topologies. The need to compare topologies may arise when dealing with trees that have been inferred from analyses of different sets of data or from different types of analysis of the same data set. When two trees derived from different data sets or different methodologies are identical, they are said to be congruent. Congruence can sometimes be partial, i.e., limited to some parts of the trees, with other parts being incongruent.

Penny and Hendy's topological distance ($d_T$)

A commonly used measure of dissimilarity between two tree topologies. The measure is based on tree partitioning:

$d_T = 2c$

where $c$ is the number of partitions resulting in different divisions of the OTUs in the two tree topologies under consideration.
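A sketch of $d_T$ in Python, assuming trees are written as nested tuples of taxon names (a hypothetical input format) on the same set of taxa. Each internal branch defines a bipartition of the OTUs; counting the bipartitions present in one tree but not the other (their symmetric difference) gives $d_T$, which equals $2c$ when both trees are fully bifurcating. This is the same quantity often called the Robinson–Foulds distance.

```python
def leaf_set(tree):
    """All taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions, each stored as the side lacking a reference taxon."""
    taxa = leaf_set(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial (single-leaf) splits
            found.add(side)
        return clade

    walk(tree)
    return found

def penny_hendy_distance(t1, t2):
    """Number of bipartitions found in exactly one of the two trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must share the same taxa")
    return len(splits(t1) ^ splits(t2))

# Hypothetical trees that differ only in where B and C attach.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(penny_hendy_distance(t1, t2))  # 2, i.e. c = 1 differing partition
```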

Trees inferred from the analysis of a particular data set are called fundamental trees, i.e., they summarize the phylogenetic information in a data set. Consensus trees are trees that summarize the phylogenetic information in a set of fundamental trees.

Strict consensus tree

In a strict consensus tree, all conflicting branching patterns are collapsed into multifurcations.

Majority-rule consensus tree

In an X% majority-rule consensus tree, a branching pattern that occurs with a frequency of X% or more is adopted. When X = 100%, the majority-rule consensus tree is identical to the strict consensus tree.
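Both consensus rules can be sketched by counting how often each cluster (the set of taxa below an internal branch) occurs across the fundamental trees. Here each tree is assumed to be already given as its set of clusters, a simplified hypothetical input; clusters missing from the output are the ones collapsed into multifurcations. A threshold of 1.0 gives the strict consensus. Thresholds above 0.5 are the usual majority-rule choice, because any two clusters that each occur in more than half of the trees must co-occur in some tree and are therefore compatible.

```python
from collections import Counter

def consensus_clusters(trees, threshold):
    """Clusters present in at least `threshold` (a fraction in (0, 1]) of the trees."""
    counts = Counter(cluster for tree in trees for cluster in set(tree))
    return {cluster for cluster, k in counts.items() if k / len(trees) >= threshold}

def fs(*taxa):
    """Shorthand for a cluster of taxon names."""
    return frozenset(taxa)

# Three hypothetical fundamental trees on taxa A-D, each given as its clusters.
trees = [
    {fs("A", "B"), fs("A", "B", "C")},  # (((A,B),C),D)
    {fs("A", "B"), fs("A", "B", "D")},  # (((A,B),D),C)
    {fs("A", "B"), fs("A", "B", "C")},  # (((A,B),C),D)
]
strict = consensus_clusters(trees, 1.0)    # {AB}: tree ((A,B),C,D)
majority = consensus_clusters(trees, 0.6)  # {AB, ABC}: tree (((A,B),C),D)
```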

A tree is an evolutionary hypothesis

Q: How can we ascertain that the methodology we have used yields reliable results? A: We can test the methodology on a phylogeny that is known for certain to be true, and compare the inferred phylogeny with the true phylogeny.

Caminalcules are a group of artificial organisms (belonging to the genus Caminalculus) that were invented by Dr. Joseph H. Camin of the University of Kansas. Interested in how taxonomists group species, he designed these creatures to show an evolutionary pattern of divergence and diversification in morphology. There are 29 extant “species” of Caminalculus and 48 fossil forms. The Caminalcules first appeared in print in the journal Systematic Zoology (now Systematic Biology) in 1983, four years after Camin's death in 1979. The first four papers on Caminalcules were written by Robert R. Sokal.

[Figure: Joseph H. Camin (1922–1979).]

[Figure: extant and extinct Caminalcules.]

Assessing tree reliability

Phylogenetic reconstruction is a problem of statistical inference. One must assess the reliability of the inferred phylogeny and its component parts. Questions:
(1) How reliable is the tree?
(2) Which parts of the tree are reliable?
(3) Is this tree significantly better than another one?

Bootstrapping

A statistical technique that uses intensive random resampling of data to estimate a statistic whose underlying distribution is unknown.

1. Characters are resampled with replacement to create many bootstrap replicate data sets (pseudosamples).
2. Each bootstrap replicate data set is analyzed.
3. The frequency of occurrence of a group (its bootstrap proportion) is a measure of support for that group.
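A minimal Python sketch of these three steps. The example that follows in the slides uses parsimony; to keep the sketch short, it swaps in UPGMA clustering on uncorrected p-distances as the tree-building step, which assumes a clock-like (ultrametric) tree. The function names and the toy alignment are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def upgma_clusters(seqs):
    """Clusters (frozensets of names) produced by UPGMA on p-distances."""
    names = sorted(seqs)
    d = {(a, b): p_distance(seqs[a], seqs[b]) for a in names for b in names}
    clusters = [frozenset([n]) for n in names]
    found = set()

    def avg(c1, c2):
        # UPGMA distance between clusters = mean of all leaf-to-leaf distances.
        return sum(d[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        found.add(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return found

def bootstrap_support(seqs, replicates=100, seed=0):
    """Bootstrap proportion of every cluster seen across the replicates."""
    rng = random.Random(seed)
    names = sorted(seqs)
    n_sites = len(seqs[names[0]])
    counts = Counter()
    for _ in range(replicates):
        # Step 1: resample alignment columns with replacement (one pseudosample).
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        pseudo = {n: "".join(seqs[n][c] for c in cols) for n in names}
        # Step 2: analyze the pseudosample; step 3: tally the groups it contains.
        counts.update(upgma_clusters(pseudo))
    return {cluster: k / replicates for cluster, k in counts.items()}

alignment = {  # hypothetical toy alignment
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "TGCATGCATG",
    "D": "TGCATGCATA",
}
support = bootstrap_support(alignment)
print(support[frozenset({"A", "B"})])  # 1.0
```

In this toy alignment A and B differ only at the last site while every cross-pair differs at all sites, so the group (A, B) is recovered in every pseudosample; real data would give intermediate proportions.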

Bootstrapping: an example

Ciliate SSU rDNA, parsimony bootstrap. [Partition table: each row marks with asterisks the taxa grouped by one partition, together with its bootstrap frequency. Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Euplotes (8), Tetrahymena (9), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7).]

Reduction of a phylogenetic tree by collapsing the internal branches whose bootstrap values are lower than a critical value (C). [Figure: (a) gene tree for a tubulin gene; (b) the same tree reduced with C = 50%; (c) reduced with C = 90%.]
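The reduction can be sketched as follows, assuming a hypothetical `Node` type that stores an internal node's children and the bootstrap value (%) of the branch above it. Any internal branch with support below `c` is collapsed by splicing that node's children into its parent, which creates multifurcations; the tree and support values are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class Node:
    children: List["Tree"]
    support: Optional[float] = None  # bootstrap value (%) of the branch above this node

Tree = Union[str, Node]  # a leaf is just a taxon name

def collapse(tree: Tree, c: float) -> Tree:
    """Copy of `tree` with every internal branch whose support is below c collapsed."""
    if isinstance(tree, str):
        return tree
    kept: List[Tree] = []
    for child in tree.children:
        child = collapse(child, c)
        if isinstance(child, Node) and child.support is not None and child.support < c:
            kept.extend(child.children)  # splice grandchildren into this node
        else:
            kept.append(child)
    return Node(kept, tree.support)

def newick(tree: Tree) -> str:
    """Topology only, in Newick-like notation (no support values or lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(newick(ch) for ch in tree.children) + ")"

# Hypothetical tree: (A,B) has 70% support, (C,D) 95%, ((C,D),E) 40%.
root = Node([Node(["A", "B"], 70), Node([Node(["C", "D"], 95), "E"], 40)])
print(newick(collapse(root, 50)))  # ((A,B),(C,D),E)
print(newick(collapse(root, 90)))  # (A,B,(C,D),E)
```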

Tests for two competing trees

All these tests use the null hypothesis that the differences between two trees (A and B) are no greater than expected by chance (i.e., from sampling error).

Likelihood ratio test

Likelihood of hypothesis 1 = $L_1$
Likelihood of hypothesis 2 = $L_2$

$\Delta = 2(\ln L_1 - \ln L_2)$

Compare $\Delta$ to the $\chi^2$ distribution or to a simulated distribution.
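A sketch of the statistic and two ways to turn it into a p-value, assuming the log-likelihoods have already been computed by some phylogenetics program. The $\chi^2$ tail shown is for one degree of freedom, which applies only to nested models differing by one free parameter; competing tree topologies are not nested, so a simulated null distribution (for example from parametric bootstrapping) is generally the safer comparison. The function names and numbers are hypothetical.

```python
import math

def lrt_statistic(ln_l1, ln_l2):
    """Likelihood ratio statistic 2(ln L1 - ln L2)."""
    return 2.0 * (ln_l1 - ln_l2)

def chi2_sf_df1(x):
    """P(X >= x) for a chi-square variable with 1 degree of freedom."""
    # X = Z**2 for standard normal Z, so P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

def empirical_p_value(observed, simulated):
    """Fraction of statistics from a simulated null distribution that are >= observed."""
    return sum(s >= observed for s in simulated) / len(simulated)

stat = lrt_statistic(-1000.0, -1002.0)  # hypothetical log-likelihoods
print(stat)                        # 4.0
print(round(chi2_sf_df1(stat), 4)) # 0.0455
```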

Reliability of phylogenetic methods

Phylogenetic methods can also be evaluated in terms of their general performance, particularly their:
- consistency: whether they approach the truth as more data are added
- efficiency: how much data they can handle, and how quickly
- robustness: how sensitive they are to violations of their assumptions

Problems with long branches

When some branches are long, most methods may yield erroneous trees. For example, the maximum-parsimony method tends to cluster long branches together. This phenomenon is called long-branch attraction; the region of branch-length space in which it makes parsimony inconsistent is known as the Felsenstein zone.
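A small simulation can make long-branch attraction concrete. The sketch assumes a hypothetical four-taxon tree ((A,B),(C,D)) in which A and C sit on long branches, and a Jukes–Cantor-style model in which each branch changes the base with a fixed probability (0.5 on long branches, 0.05 on short ones, both made-up values). For four taxa, maximum parsimony prefers the split supported by the most informative sites (two pairs of different bases), so counting those sites is enough. With these parameters parsimony groups the two long branches, A with C, instead of recovering the true split.

```python
import random

BASES = "ACGT"

def evolve(state, p_diff, rng):
    """One branch: with probability p_diff the base changes to one of the other three."""
    if rng.random() < p_diff:
        return rng.choice([b for b in BASES if b != state])
    return state

def simulate_site(p_long, p_short, rng):
    """One site on the true tree ((A,B),(C,D)); A and C are on long branches."""
    u = rng.choice(BASES)          # internal node joining A and B
    v = evolve(u, p_short, rng)    # internal node joining C and D (short internal branch)
    return (evolve(u, p_long, rng), evolve(u, p_short, rng),
            evolve(v, p_long, rng), evolve(v, p_short, rng))

def parsimony_choice(n_sites, p_long=0.5, p_short=0.05, seed=1):
    """Return the split favored by parsimony and the informative-site counts."""
    rng = random.Random(seed)
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_sites):
        a, b, c, d = simulate_site(p_long, p_short, rng)
        # Only sites with two pairs of different bases are informative for four
        # taxa; each one costs one step on its own split and two on the others.
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    return max(support, key=support.get), support

choice, support = parsimony_choice(5000)
print(choice)  # AC|BD: the two long branches are wrongly grouped
```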