Why Models of Sequence Evolution Matter

Presentation transcript:

Why Models of Sequence Evolution Matter
Differences accumulate linearly with time for only a very short time after two taxa diverge. Two pairs of taxa with different divergence times may have a similar number of differences between them: this is saturation.
Figure: number of differences between each pair of taxa vs. genetic distance between those two taxa. The x-axis is a proxy for time since divergence between the two taxa.
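The saturation curve can be sketched under the Jukes-Cantor (JC69) model, which the later slides use for corrected distances. That choice is an assumption made here: equal base frequencies and equal substitution rates. Under JC69, the expected proportion of differing sites after d substitutions per site is p = 3/4 (1 - e^(-4d/3)). The minimal sketch below tabulates this. For small d, p is close to d (the linear phase). As d grows, p levels off near 0.75.

```python
import math


def expected_p_distance(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Small d: p is roughly equal to d. Large d: p approaches 0.75, the level
# reached when two sequences are effectively random with respect to each other.
for d in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:4.2f}   expected p = {expected_p_distance(d):.3f}")
```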

Models of sequence evolution expect multiple hits
Even though there have been 4 substitutions, comparing these two lineages reveals only 3 differences.
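Multiple hits are easy to reproduce in a toy simulation. The sketch below starts from a random ancestral sequence and applies substitutions at randomly chosen sites. Each substitution changes the current base to one of the other three, so a site can be hit more than once. It then counts how many differences are still visible. The sequence length, seed and helper names are hypothetical. The observed count can never exceed the true number of substitutions, and it falls further behind as substitutions accumulate.

```python
import random

BASES = "ACGT"


def simulate_substitutions(length, n_subs, seed=1):
    """Return (ancestor, descendant) after n_subs substitutions at random sites.

    Each substitution picks a site uniformly at random and replaces its current
    base with one of the other three, so the same site can be hit repeatedly.
    """
    rng = random.Random(seed)
    ancestor = [rng.choice(BASES) for _ in range(length)]
    descendant = list(ancestor)
    for _ in range(n_subs):
        site = rng.randrange(length)
        descendant[site] = rng.choice([b for b in BASES if b != descendant[site]])
    return "".join(ancestor), "".join(descendant)


def count_differences(s1, s2):
    """Number of sites at which two aligned sequences differ (what we can observe)."""
    return sum(1 for x, y in zip(s1, s2) if x != y)


for n in (5, 25, 50, 100, 200):
    anc, desc = simulate_substitutions(100, n)
    print(f"{n:3d} substitutions -> {count_differences(anc, desc):3d} observed differences")
```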

Branch-length Information
Let’s assume a true tree and simulate these sequences on it:
A ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA
B ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT
C ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT
D AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC
The most parsimonious tree, which unites A and D, is 37 steps, while the true tree is 38 steps. Increasing the number of replicates: this keeps happening. Increasing the number of base pairs: it happens with certainty.
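The step counts can be checked with Fitch parsimony. Four taxa have only three unrooted topologies, so a minimal sketch can score each one directly on the 40 sites shown. This assumes those sites are the whole data set, and the helper names are illustrative. It gives 37 steps for AD|BC (A united with D), 38 for AC|BD and 39 for AB|CD. The 38-step true tree is therefore the one pairing A with C.

```python
SEQS = {
    "A": "ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA",
    "B": "ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT",
    "C": "ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT",
    "D": "AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC",
}


def fitch_join(s1, s2):
    """One Fitch step: intersect the state sets if possible (0 steps), else union (1 step)."""
    common = s1 & s2
    return (common, 0) if common else (s1 | s2, 1)


def quartet_length(seqs, pair1, pair2):
    """Parsimony length of the unrooted quartet pair1|pair2, e.g. ("A", "B") | ("C", "D")."""
    steps = 0
    for site in range(len(seqs["A"])):
        left, c1 = fitch_join({seqs[pair1[0]][site]}, {seqs[pair1[1]][site]})
        right, c2 = fitch_join({seqs[pair2[0]][site]}, {seqs[pair2[1]][site]})
        _, c3 = fitch_join(left, right)  # join the two cherries across the internal branch
        steps += c1 + c2 + c3
    return steps


TOPOLOGIES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

scores = {name: quartet_length(SEQS, p1, p2) for name, (p1, p2) in TOPOLOGIES.items()}
print(scores)
```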

Branch-length Information
Now let’s subject the sequences to a minimum evolution (ME) search. First, we need to convert the character-by-taxon matrix into a matrix of pairwise distances. Entries above the diagonal are p-distances, and entries below the diagonal are JC distances.
     A       B       C       D
A    -       0.400   0.400   0.575
B    0.572   -       0.200   0.525
C    0.572   0.233   -       0.475
D    1.091   0.903   0.752   -
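The JC distances below the diagonal come from the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3). Below is a short sketch of that conversion; the function and variable names are illustrative. For example, p = 0.400 gives about 0.572 and p = 0.575 gives about 1.091. The correction is undefined once p reaches 0.75, the saturation level.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor (JC69) corrected distance from a p-distance: -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# p-distances from the upper triangle of the matrix
P_DIST = {
    ("A", "B"): 0.400, ("A", "C"): 0.400, ("A", "D"): 0.575,
    ("B", "C"): 0.200, ("B", "D"): 0.525, ("C", "D"): 0.475,
}

for (x, y), p in P_DIST.items():
    print(f"{x}-{y}: p = {p:.3f}   JC = {jc_distance(p):.3f}")
```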

Branch-length Information
The optimum tree is the true tree. The branch-length estimates are poor, but under these conditions they would converge on the true values with more data. So in the simulation, the methods that are agnostic with respect to multiple hits (MP, and ME using p-distances) incorrectly unite the long-branch taxa (A and D), whereas ML can avoid long-branch attraction.
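The ME half of that claim can be checked on the p-distances. This is a minimal sketch that assumes ordinary least squares (OLS) branch lengths. OLS is one common way to score ME trees, but the slides do not say which variant was used. For four taxa, the OLS tree length of split ij|kl has a closed form, (d_ij + d_kl)/2 + (d_ik + d_il + d_jk + d_jl)/4, so no search is needed. On the p-distances, the shortest tree is AD|BC, which unites the two long-branch taxa.

```python
def quartet_me_lengths(dist):
    """OLS tree length for each unrooted quartet topology of taxa A-D.

    For the split pq|rs the OLS tree length is
    (d_pq + d_rs)/2 + (d_pr + d_ps + d_qr + d_qs)/4; ME prefers the smallest.
    """
    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    splits = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
    lengths = {}
    for (p, q), (r, s) in splits:
        cross = d(p, r) + d(p, s) + d(q, r) + d(q, s)
        lengths[f"{p}{q}|{r}{s}"] = (d(p, q) + d(r, s)) / 2 + cross / 4
    return lengths


# p-distances (upper triangle of the distance matrix)
P_DIST = {
    ("A", "B"): 0.400, ("A", "C"): 0.400, ("A", "D"): 0.575,
    ("B", "C"): 0.200, ("B", "D"): 0.525, ("C", "D"): 0.475,
}

lengths = quartet_me_lengths(P_DIST)
for split, total in sorted(lengths.items(), key=lambda kv: kv[1]):
    print(f"{split}: {total:.4f}")
print("ME tree:", min(lengths, key=lengths.get))
```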

Long-branch attraction
Let’s assume that both short-branch taxa have an A. There are four possibilities for the states at the other two (long-branch) terminals:
1) Both long-branch taxa could have A.
2 & 3) One of the long-branch taxa has a substitution to nucleotide X (X = G, C, or T).

Long-branch attraction
4) There could be substitutions to X1 and X2 along both long branches. If X1 ≠ X2, the site is uninformative. If X1 = X2, the site is misleading.
X1 X2   X1 X2   X1 X2
C  C    G  C    T  C
C  G    G  G    T  G
C  T    G  T    T  T
So 1/3 of all possibilities result in convergence.
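A minimal sketch shows how often the convergent case arises. It computes, under JC69, the exact expected frequency of each kind of two-state site (two taxa share x, the other two share y) on a four-taxon tree. The tree has A and C as sisters and long branches leading to A and D. The branch lengths and helper names are hypothetical choices for illustration. With these lengths, sites that group A with D are expected to be far more frequent than sites supporting the true AC|BD split. As a result, a method that simply counts such sites is pulled toward the wrong tree.

```python
import itertools
import math

BASES = "ACGT"


def jc_transition(same, t):
    """JC69 probability of the same base (same=True) or one specific other base after time t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def pattern_prob(pattern, br):
    """Probability of site pattern (A, B, C, D) on the unrooted tree AC|BD under JC69.

    Internal node u joins A and C, node v joins B and D; br maps "A", "B", "C",
    "D" and "internal" to branch lengths. Sums over the unobserved states at u and v.
    """
    a, b, c, d = pattern
    total = 0.0
    for u in BASES:
        for v in BASES:
            total += (
                0.25  # equilibrium frequency of the state at u
                * jc_transition(u == v, br["internal"])
                * jc_transition(u == a, br["A"])
                * jc_transition(u == c, br["C"])
                * jc_transition(v == b, br["B"])
                * jc_transition(v == d, br["D"])
            )
    return total


def split_support(br):
    """Expected frequency of two-state sites favouring each of the three splits."""
    support = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for x, y in itertools.permutations(BASES, 2):
        support["AB|CD"] += pattern_prob((x, x, y, y), br)
        support["AC|BD"] += pattern_prob((x, y, x, y), br)
        support["AD|BC"] += pattern_prob((x, y, y, x), br)
    return support


# Hypothetical branch lengths: long branches to A and D, short elsewhere.
felsenstein_zone = {"A": 1.0, "D": 1.0, "B": 0.05, "C": 0.05, "internal": 0.05}
for split, prob in split_support(felsenstein_zone).items():
    print(f"{split}: {prob:.4f}")
```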

The Importance of Branch Lengths
Fitch optimization assigns the state set {A,C,G} to node 2. Because a large number of terminals have A, this is a slowly evolving site. Each of the three reconstructions requires two steps:
C at node 2: a transversion to A along short branch a, no change along b, and a change to G along g.
G at node 2: a transition to A along short branch a, no change along g, and a change to C along b.
A at node 2: no change along short branch a, a change to C along b, and a change to G along g.
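The sketch below shows why parsimony cannot choose between these reconstructions. It counts unit-cost steps for each candidate state at node 2. As a simplification, it assumes that node 2's three neighbours are fixed at A (across branch a), C (across branch b) and G (across branch g). The `NEIGHBOURS` encoding is hypothetical. States A, C and G all cost two steps, which matches the Fitch set {A,C,G}. Only T costs more.

```python
BASES = "ACGT"

# Hypothetical encoding of the site: node 2 connects to the A-bearing side
# through short branch a, to a C terminal through long branch b and to a
# G terminal through long branch g.
NEIGHBOURS = {"a": "A", "b": "C", "g": "G"}


def steps_for_state(state):
    """Changes needed on branches a, b and g if node 2 is assigned `state`."""
    return sum(state != neighbour for neighbour in NEIGHBOURS.values())


costs = {s: steps_for_state(s) for s in BASES}
best = min(costs.values())
optimal_states = {s for s, c in costs.items() if c == best}
print(costs)           # step count for each candidate state at node 2
print(optimal_states)  # the equally parsimonious states
```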

The Importance of Branch Lengths
ML can express a preference here, where parsimony cannot, because ML accounts for branch lengths when calculating reconstruction probabilities. No change along a short branch combined with changes along both long branches is more likely than a change along the short branch combined with no change along one of the long branches. All reconstructions are permitted and accounted for, but the reconstruction with A at node 2 (and node 1) contributes the most to the single-site likelihood.
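Here is the same site under ML. This minimal sketch weights each state at node 2 by its JC69 transition probabilities along branches a, b and g. As before, it assumes the neighbours are fixed at A, C and G, whereas a full likelihood calculation would also sum over node 1. The branch lengths (a = 0.01, b = g = 0.5) and function names are hypothetical. A change along the short branch is much less probable than a change along a long one. As a result, the A reconstruction gets most of the weight, although every state still contributes.

```python
import math

BASES = "ACGT"


def jc_transition(same: bool, t: float) -> float:
    """JC69 probability of the same base (same=True) or one specific other base after time t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def reconstruction_weights(a: float, b: float, g: float) -> dict:
    """Share of the single-site likelihood contributed by each state at node 2.

    Simplifying assumption: node 2's neighbours are fixed at A (across short
    branch a), C (across long branch b) and G (across long branch g).
    """
    raw = {
        s: 0.25  # equilibrium frequency of state s
        * jc_transition(s == "A", a)
        * jc_transition(s == "C", b)
        * jc_transition(s == "G", g)
        for s in BASES
    }
    total = sum(raw.values())
    return {s: w / total for s, w in raw.items()}


weights = reconstruction_weights(a=0.01, b=0.5, g=0.5)
for state, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"node 2 = {state}: {w:.3f}")
```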