But what if there is a large amount of homoplasy in the data?

Presentation transcript:

But what if there is a large amount of homoplasy in the data? Sequence data may have multiple substitutions at the same site due to a fast-evolving gene or long internal branches. Use a model of evolution to correct for different rates of substitution or unequal base frequencies. Maximum-likelihood phylogenetic analysis

Multiple changes at a single site - hidden changes. Seq 1: AGCGAG, Seq 2: GCGGAC. [Figure: number of changes (1, 2, 3) at a single site between Seq 1 and Seq 2]


Models of sequence evolution
• Simplest model = Jukes-Cantor: assumes all substitutions are equally likely; corrects for multiple substitutions by using a natural log transformation
• A bit more complicated = Kimura 2-parameter: assumes different rates for transitions and transversions, but equal base frequencies
• More complicated models: HKY, General Time Reversible (GTR)
• Use a model that best fits your data
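The natural log transformation used by the Jukes-Cantor model can be sketched in a few lines. This is a minimal illustration, not a replacement for a phylogenetics package: the function names are made up for this example, and the two sequences are the ones shown on the hidden-changes slide.

```python
import math


def p_distance(seq1, seq2):
    """Uncorrected proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("AGCGAG", "GCGGAC")  # 4 of the 6 sites differ
d = jukes_cantor_distance(p)
print(f"observed p = {p:.3f}, Jukes-Cantor corrected d = {d:.3f}")
```

The corrected distance is always at least as large as the observed proportion, because it adds back the changes that were overwritten by later substitutions at the same site.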

Phylogenetic Inference Using Maximum Likelihood A model of sequence evolution and the estimation of its parameters allow the placement of probabilities on different types of substitutional change. Likelihood analysis focuses on the data, not the tree. It is the probability of the data given a tree and a model of evolution: L = P(D | T, M)

Seq 1 ATATC Seq 2 CTAGC The Likelihood (i.e. the probability of observing the data) is a sum over all possible assignments of nucleotides to the internal nodes

Phylogenetic Inference Using Maximum Likelihood Calculate the likelihood for each base position in the sequence and combine across all base positions (multiply the per-site likelihoods or, equivalently, sum their logs). The ML tree is the tree that produces the highest likelihood. The search evaluates not only the branching structure of the tree but also the branch lengths, using tree-searching strategies similar to those used in parsimony analysis. This is important because, under a model-based approach, mutational change is more probable along longer branches than along shorter branches. Can be extremely computationally intensive.
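The per-site calculation can be sketched for the smallest possible case: two sequences joined at a single internal node. The sketch assumes the Jukes-Cantor model with equal base frequencies and branch lengths measured in expected substitutions per site; the function names and the branch lengths tried are illustrative assumptions, while the two sequences come from the slide.

```python
import math

BASES = "ACGT"


def jc69_prob(x, y, t):
    """Jukes-Cantor probability that base x becomes base y along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def site_likelihood(a, b, t1, t2):
    """Sum over every possible base at the internal node joining tips a and b."""
    return sum(0.25 * jc69_prob(x, a, t1) * jc69_prob(x, b, t2) for x in BASES)


def log_likelihood(seq1, seq2, t1, t2):
    """Sites are treated as independent, so per-site log-likelihoods are summed."""
    return sum(math.log(site_likelihood(a, b, t1, t2)) for a, b in zip(seq1, seq2))


seq1, seq2 = "ATATC", "CTAGC"
for t in (0.05, 0.2, 0.5, 2.0):  # assumed branch lengths, same on both branches
    print(f"branch length {t}: log-likelihood = {log_likelihood(seq1, seq2, t, t):.3f}")
```

Trying several branch lengths shows why the search has to evaluate them: the same topology gives a different likelihood for each set of lengths. A real analysis repeats this sum over all internal nodes of a larger tree, which is where the computational cost comes from.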


The larger the data set (both in numbers of species and characters) the longer the analysis.

Phylogenetic Inference Using Maximum Likelihood Important point about ML: The model you choose to use can have a large impact on the resulting ML tree. If you flip a coin and get a head, what is its likelihood? If it’s a two-sided, fair coin (your model), the likelihood is 0.5. If it’s a two-headed coin (your model), the likelihood is 1.0.
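The coin example can be written out as a tiny calculation. This sketch treats "the model" as nothing more than the probability of heads and uses the binomial formula; the function name is an assumption made for illustration.

```python
import math


def likelihood(n_heads, n_flips, p_heads):
    """Probability of the observed flips (the data) given a coin model p_heads."""
    n_tails = n_flips - n_heads
    return math.comb(n_flips, n_heads) * p_heads**n_heads * (1.0 - p_heads) ** n_tails


# Same data (one flip, one head), two different models.
print("fair coin:      ", likelihood(1, 1, 0.5))
print("two-headed coin:", likelihood(1, 1, 1.0))
```

The data are identical in both calls; only the model changes, and the likelihood changes with it.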

Phylogenetic Inference Using Distance Data Possible alternative to computationally intensive parsimony and ML searches. Distances attempt to summarize the differences between two sequences in a single measurement. Can use an uncorrected distance between taxa, based on the observed number of nucleotide differences between pairs of taxa. Alternatively, you can use a distance based on an evolutionary model to account for multiple substitutions between taxa.

Phylogenetic Inference Using Distance Data The simplest form of phylogeny construction using distance data is UPGMA: Unweighted Pair Group Method with Arithmetic Mean

Original distance matrix:
       A    B    C    D    E
B      2
C      4    4
D      6    6    6
E      6    6    6    4
F      8    8    8    8    8

After joining the closest pair (A and B):
       A,B  C    D    E
C      4
D      6    6
E      6    6    4
F      8    8    8    8
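The clustering behind the distance matrix above can be sketched as follows. This is a minimal, unoptimized illustration using the slide's distances for taxa A to F; the function name and data layout are assumptions, and ties between equally close pairs are broken by whichever pair is encountered first.

```python
from itertools import combinations


def upgma(leaf_dist):
    """Cluster leaves by UPGMA.

    leaf_dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns the merges in order as (cluster_1, cluster_2, node_height).
    """
    leaves = sorted({leaf for pair in leaf_dist for leaf in pair})
    clusters = [(leaf,) for leaf in leaves]
    merges = []

    def average_distance(c1, c2):
        # Arithmetic mean over all leaf pairs drawn from the two clusters.
        total = sum(leaf_dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        # Under the clock assumption the new node sits halfway between the clusters.
        merges.append((c1, c2, average_distance(c1, c2) / 2))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


# Lower-triangle distance matrix from the slide (columns A, B, C, D, E).
columns = "ABCDE"
rows = {"B": [2], "C": [4, 4], "D": [6, 6, 6], "E": [6, 6, 6, 4], "F": [8, 8, 8, 8, 8]}
leaf_dist = {
    frozenset((row, columns[i])): value
    for row, values in rows.items()
    for i, value in enumerate(values)
}

merges = upgma(leaf_dist)
for c1, c2, height in merges:
    print(f"join {','.join(c1)} + {','.join(c2)} at height {height}")
```

The first join is A with B, which produces the reduced matrix shown on the slide. Computing cluster distances directly from the original leaf distances is slower than updating the matrix after each join, but it keeps the sketch short.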

Phylogenetic Inference Using Distance Data UPGMA is fast, but it assumes a molecular clock: that all taxa have roughly equal substitution rates. Quite a few other distance-based methods do not make this assumption; the most commonly used is Neighbor-Joining. In general, distance-based methods perform OK, but not as well as methods using likelihood.

Assessing The Robustness Of Our Trees We’ve seen a number of ways to construct phylogenetic trees, but how much confidence do we place on the reconstructed relationships? We can use a number of methods to assess the robustness of particular branches in our trees: bootstrapping, jackknifing, and the decay index.

Bootstrapping and Jackknifing Multiple new data sets are made by resampling from the original data set. Bootstrapping: sampling done with replacement. Jackknifing: sampling done without replacement. The resampled data sets are subjected to phylogenetic analysis. The proportion of times a clade appears in the trees across all replicate data sets is called its bootstrap proportion.
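The two resampling schemes can be sketched on alignment columns. This is a minimal illustration under stated assumptions: the alignment is a plain dictionary of equal-length strings (the two sequences shown on an earlier slide), the function names are made up, and the jackknife keeps an arbitrary half of the sites.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Draw as many columns as the original has, WITH replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def jackknife_alignment(alignment, rng, keep_fraction=0.5):
    """Keep a subset of columns, drawn WITHOUT replacement."""
    n_sites = len(next(iter(alignment.values())))
    n_keep = max(1, int(n_sites * keep_fraction))
    columns = sorted(rng.sample(range(n_sites), n_keep))
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


alignment = {"Seq 1": "ATATC", "Seq 2": "CTAGC"}
rng = random.Random(42)  # fixed seed so the replicates are reproducible
print(bootstrap_alignment(alignment, rng))
print(jackknife_alignment(alignment, rng))
```

Each replicate would then be run through the same tree-building method as the original data. Whole columns are resampled so that every taxon keeps its own base at each chosen site.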

Taken from Baldauf, S. L. Phylogeny for the faint of heart: a tutorial. Trends in Genetics 19:345-351.

Bootstrapping and Jackknifing Clades that receive a high bootstrap proportion are considered to be better supported by the data than clades with a lower bootstrap proportion. 70% or greater is good, but many phylogeneticists will only consider branches with ≥90% as being strongly supported. Jackknifing is seldom done anymore, but adheres to the same general patterns for support.
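Counting how often a clade appears across replicate trees can be sketched as below. The four replicate trees are invented purely for illustration, each tree is reduced to the set of clades it contains, and the function names are assumptions; the 70% and 90% cut-offs are the ones quoted above.

```python
from collections import Counter


def bootstrap_proportions(replicate_trees):
    """Percentage of replicate trees in which each clade appears."""
    counts = Counter(clade for tree in replicate_trees for clade in tree)
    return {clade: 100.0 * n / len(replicate_trees) for clade, n in counts.items()}


def describe_support(percent):
    """Apply the rule-of-thumb thresholds from the slide."""
    if percent >= 90:
        return "strongly supported"
    if percent >= 70:
        return "good"
    return "weak"


# Hypothetical replicate trees (made-up data), each given as a set of clades.
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
replicate_trees = [{AB, CD}, {AB, CD}, {AB, CD}, {AB, AC}]

support = bootstrap_proportions(replicate_trees)
for clade, percent in sorted(support.items(), key=lambda item: -item[1]):
    print(f"{''.join(sorted(clade))}: {percent:.0f}% ({describe_support(percent)})")
```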

Bootstrapping and Jackknifing Can perform with any type of phylogenetic analysis: parsimony, ML, distance-based. Important to emphasize that a bootstrap does not reveal the probability that a particular clade is true, but only how well it is supported by the particular dataset.

Decay analysis Assesses whether a clade is found in a less parsimonious tree. The decay index is the difference in tree length between the overall most parsimonious tree and the most parsimonious tree that does not contain the clade of interest. [Figure: three trees of taxa 1-4 with tree lengths 20, 21, and 22; the clade marked * is present in the trees of length 20 and 21 but not in the tree of length 22.] Decay Index for * is 22-20=2
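The decay index calculation can be sketched directly from that definition. The tree lengths (20, 21, 22) are the ones on the slide; which taxa make up the starred clade cannot be recovered from the transcript, so the clade {1, 2} and the other clade memberships below are assumptions made only to have something to look up.

```python
def decay_index(trees, clade):
    """Length of the shortest tree lacking the clade minus the shortest tree overall.

    trees is a list of (tree_length, set_of_clades) pairs.
    """
    shortest_overall = min(length for length, _ in trees)
    lengths_without = [length for length, clades in trees if clade not in clades]
    if not lengths_without:
        raise ValueError("every tree contains the clade; search longer trees")
    return min(lengths_without) - shortest_overall


starred = frozenset({1, 2})  # hypothetical membership of the clade marked *
trees = [
    (20, {starred, frozenset({3, 4})}),
    (21, {starred, frozenset({1, 2, 3})}),
    (22, {frozenset({2, 3}), frozenset({1, 4})}),
]
print("decay index:", decay_index(trees, starred))
```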

Decay Indices: Interpretation Generally, the higher the decay index, the better the relative support for a group. Unlike bootstrap proportions (BPs), decay indices are not scaled (0-100), and it is less clear what counts as an acceptable decay index. The magnitudes of decay indices and BPs are generally correlated (i.e. they tend to agree).

Phylogenetic Evaluation of Homology and Character Evolution Some salamander families are external fertilizers, some have internal fertilization. Internal fertilization has long been thought to have arisen only once in the phylogeny of salamanders. Phylogenetic analysis of morphological data supports this hypothesis.

Phylogenetic Evaluation of Homology and Character Evolution Phylogenetic analysis of nuclear rRNA data provides an alternative tree, suggesting that internally fertilizing salamanders are not a monophyletic group. Parsimony analysis weakly supports this hypothesis.

Maximum-Likelihood Analysis of Molecular Data Strongly Supports Non-monophyly of Internal Fertilization [Figure: ML trees from rRNA and mtDNA, with each lineage labeled I (internal fertilization) or E (external fertilization)]

Phylogenetic Evaluation of Homology and Character Evolution Molecular data sets can produce trees that differ from those based on morphological data. Independent molecular data sets indicate that internally fertilizing salamanders do not form a monophyletic group. This implies multiple reversals from internal to external fertilization. Phylogenetic analysis of new data sets can give new perspectives on the evolution of characters that were not previously available.

Molecular Clocks The mutation rate in genes is relatively constant across species. Based on assumptions of the neutral theory: the majority of mutations that occur are selectively neutral, so they should accumulate at a rate equal to the mutation rate. To apply a molecular clock in practice, you assume that mutation rates do not vary across species. This typically limits the application of a molecular clock to a set of closely related species.

Molecular Clocks
1) Construct a tree (Outgroup, Species 1, Species 2, Species 3, Species 4).
2) Date a node in the tree. A fossil for Species 4 is ~1 MY old, so you know that the divergence between Species 3 and Species 4 is at least 1 MY old.
3) Calculate divergence. Species 3 and Species 4 show 2% sequence divergence.
4) Calculate a rate. R = 2%/1 MY

Molecular Clocks 5) Apply the rate to other nodes in the tree. [Figure: tree of Outgroup and Species 1-4 with nodes dated at 5 MY, 2 MY, and 1 MY] Best applied when dates are available for multiple nodes. Can utilize solid geological information as well as fossil information. Must be aware of possible non-clock behavior of genes.
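The calibration arithmetic in these steps can be sketched as follows. The 2% divergence and ~1 MY fossil date are from the slides; the 4% and 10% divergences for the deeper nodes are not given in the transcript and are assumed here because they are the values that would yield the 2 MY and 5 MY dates shown on the tree. Function names are illustrative.

```python
def clock_rate(percent_divergence, calibration_age_my):
    """Rate in percent sequence divergence per million years (MY)."""
    return percent_divergence / calibration_age_my


def node_age(percent_divergence, rate):
    """Estimated node age in MY, assuming the rate is constant across the tree."""
    return percent_divergence / rate


# Calibration: Species 3 vs Species 4, 2% divergence, fossil ~1 MY old.
rate = clock_rate(2.0, 1.0)

# Assumed divergences for the two deeper nodes (not stated in the transcript).
assumed_divergences = {"node joining Species 2": 4.0, "node joining Species 1": 10.0}
for label, divergence in assumed_divergences.items():
    print(f"{label}: {divergence}% / {rate}% per MY = {node_age(divergence, rate)} MY")
```

Because the fossil only gives a minimum age for the calibration node, the rate is an upper bound and the ages derived from it are minimum estimates.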

Phylogeny of North American Black Basses Near et al., 2003. Evolution 57:1610–1621. The previous hypothesis was that speciation within the genus Micropterus occurred during the Pleistocene. Micropterus has a very good fossil record. Calibration of a molecular clock and calculation of divergence times among species reveals that most species diverged well before the Pleistocene.

Phylogenetics and Historical Biogeography Uses phylogenetic trees and information from geography, geology, and environment to try to understand how organisms have gotten to where they are today. Example from European and Turkish salamanders

Genetic divergence (~20 MY) correlates with the geological divergence between Europe and Turkey during the Miocene.