Why Models of Sequence Evolution Matter

Figure: number of differences between each pair of taxa plotted against the genetic distance between those two taxa. The x-axis is a proxy for time since the two taxa diverged. Differences accumulate linearly with time for only a very short time after two taxa diverge; after that, new substitutions increasingly fall on sites that have already changed. As a result, two pairs of taxa with different divergence times may show a similar number of differences between them. This is saturation.
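A minimal sketch of saturation, assuming the Jukes-Cantor (JC69) model; the slide does not name a model. Under JC69 the expected proportion of differing sites after d substitutions per site is p = 0.75(1 - exp(-4d/3)), which is close to d when d is small but flattens toward 0.75 as d grows. Inverting the formula gives the JC-corrected distance. The function names are hypothetical.

```python
import math


def jc_expected_p(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_distance(p: float) -> float:
    """Invert JC69: estimate substitutions per site from an observed p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75 (saturated sequences)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Observed differences track d at first, then bunch up just below 0.75.
    for d in (0.05, 0.25, 0.5, 1.0, 2.0, 4.0):
        p = jc_expected_p(d)
        print(f"d = {d:4.2f}  expected p = {p:.3f}  JC-corrected d = {jc_distance(p):.3f}")
```

The printed table shows why raw difference counts cannot separate a divergence of d = 2 from one of d = 4, while the corrected distance can, at least in expectation.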

Multiple Hits
Even though there have been 4 substitutions, when we compare these two lineages we can detect only 3 differences, because more than one substitution occurred at the same site. Models of sequence evolution expect multiple hits.
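A toy illustration of the counting problem, using a hypothetical four-site ancestor and a hypothetical substitution history in which site 0 is hit once on each lineage. Four substitutions happen in total, but the two descendants differ at only three sites.

```python
def evolve(seq: str, substitutions: list[tuple[int, str]]) -> str:
    """Apply (site, new_base) substitutions in order and return the descendant sequence."""
    bases = list(seq)
    for site, new_base in substitutions:
        if bases[site] == new_base:
            raise ValueError(f"site {site} already has {new_base}; not a substitution")
        bases[site] = new_base
    return "".join(bases)


def count_differences(seq1: str, seq2: str) -> int:
    """Observed differences, the quantity behind an uncorrected p-distance."""
    return sum(a != b for a, b in zip(seq1, seq2))


ancestor = "ACGT"
# Hypothetical history: site 0 changes on BOTH lineages, so it shows only one difference.
lineage1 = [(0, "G"), (2, "T")]
lineage2 = [(0, "C"), (3, "A")]

seq1 = evolve(ancestor, lineage1)
seq2 = evolve(ancestor, lineage2)
n_substitutions = len(lineage1) + len(lineage2)
n_differences = count_differences(seq1, seq2)
print(seq1, seq2, n_substitutions, n_differences)
```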

Branch-length Information
Let's assume a true tree. The simulated sequences are:
A ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA
B ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT
C ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT
D AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC
Under parsimony, this tree (which groups A with D) is 37 steps, whereas the true tree is 38 steps. Increasing the number of replicates: the wrong tree keeps being recovered. Increasing the number of base pairs: the wrong tree is recovered with certainty.
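A minimal sketch that scores all three unrooted four-taxon trees with Fitch parsimony, reading each sequence in the alignment above as 40 bp (the trailing characters joined on). Rooting a quartet on its internal branch does not change its parsimony length. The names `ALIGNMENT`, `SPLITS`, `fitch_join` and `quartet_length` are hypothetical.

```python
# Alignment as read from the slide, 40 bp per taxon.
ALIGNMENT = {
    "A": "ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA",
    "B": "ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT",
    "C": "ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT",
    "D": "AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC",
}

# The three possible unrooted topologies for four taxa, written as splits.
SPLITS = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def fitch_join(s1: set[str], s2: set[str]) -> tuple[set[str], int]:
    """Fitch rule: intersect if possible (0 steps), otherwise take the union (1 step)."""
    common = s1 & s2
    if common:
        return common, 0
    return s1 | s2, 1


def quartet_length(aln: dict[str, str], split) -> int:
    """Parsimony length of the unrooted quartet, rooted on its internal branch."""
    (w, x), (y, z) = split
    length = 0
    for site in range(len(aln[w])):
        left, steps_left = fitch_join({aln[w][site]}, {aln[x][site]})
        right, steps_right = fitch_join({aln[y][site]}, {aln[z][site]})
        _, steps_root = fitch_join(left, right)
        length += steps_left + steps_right + steps_root
    return length


scores = {name: quartet_length(ALIGNMENT, split) for name, split in SPLITS.items()}
print(scores)
```

This reading reproduces the slide's counts: AD|BC is 37 steps and AC|BD is 38 (AB|CD is 39). Only three sites are parsimony-informative here, two favoring AD|BC and one favoring AC|BD, so the one-step difference comes down to a single site.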

Branch-length Information
Now let's subject the sequences to a minimum-evolution (ME) search. First, we need to convert the character-by-taxon matrix into a matrix of pairwise distances among A, B, C and D.
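The conversion step can be sketched as follows, assuming uncorrected p-distances (the proportion of differing sites) and an ordinary least-squares version of ME; the slide does not say which ME variant was used. For a quartet (w,x)|(y,z), least-squares branch lengths give a total tree length of (d_wx + d_yz)/2 + (d_wy + d_wz + d_xy + d_xz)/4, so ME prefers the split with the smallest within-pair distances. All names are hypothetical.

```python
from itertools import combinations

ALIGNMENT = {
    "A": "ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA",
    "B": "ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT",
    "C": "ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT",
    "D": "AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC",
}

SPLITS = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def p_distance(s1: str, s2: str) -> float:
    """Uncorrected distance: proportion of differing sites (blind to multiple hits)."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def distance_matrix(aln: dict[str, str]) -> dict[frozenset, float]:
    """Pairwise p-distances keyed by unordered taxon pair."""
    return {frozenset(pair): p_distance(aln[pair[0]], aln[pair[1]])
            for pair in combinations(sorted(aln), 2)}


def quartet_ols_length(dists: dict[frozenset, float], split) -> float:
    """Total length of quartet (w,x)|(y,z) with least-squares branch lengths."""
    (w, x), (y, z) = split

    def d(a: str, b: str) -> float:
        return dists[frozenset((a, b))]

    within = d(w, x) + d(y, z)
    across = d(w, y) + d(w, z) + d(x, y) + d(x, z)
    return within / 2 + across / 4


dists = distance_matrix(ALIGNMENT)
lengths = {name: quartet_ols_length(dists, split) for name, split in SPLITS.items()}
best = min(lengths, key=lengths.get)
print(lengths, best)
```

On this alignment the pairs AB, AC, AD, BC, BD and CD differ at 16, 16, 23, 8, 21 and 19 of 40 sites. A and D are the most distant pair, yet pairing them leaves B and C together at only 8 differences, so the AD|BC split has the smallest within-pair total and this ME sketch groups the long-branch taxa.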

Branch-length Information
So in the simulation, the methods that are agnostic with respect to multiple hits (maximum parsimony (MP), and ME using p-distances) incorrectly unite the long-branch taxa (A and D). ML can avoid long-branch attraction: its optimum tree is the true tree. However, we're getting pretty lousy estimates of branch lengths; under these conditions, branch-length estimates would converge on their true values with more data.

Long-branch attraction
Let's assume that there is an A in both short-branch taxa. There are four possibilities for the states at the other two terminals:
1) Both long-branch taxa have A.
2) and 3) One of the long-branch taxa has a substitution to nucleotide $X$ ($X$ = G, C, or T).
4) There are substitutions to $X_1$ and $X_2$ along both long branches. If $X_1 \neq X_2$, the site is uninformative. If $X_1 = X_2$, the site is misleading: it supports grouping the two long-branch taxa together.
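To see why case 4 with $X_1 = X_2$ can swamp the true signal, here is a minimal sketch that computes exact site-pattern probabilities on a four-taxon true tree ((L1,S1),(L2,S2)) under JC69, a model assumed here rather than taken from the slide. It sums over the states of both internal nodes and compares patterns that support the true tree with patterns that group the two long-branch taxa. The branch lengths in the example call are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(i: str, j: str, t: float) -> float:
    """JC69 probability that base i becomes base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def pattern_prob(l1, s1, l2, s2, t_long, t_short, t_internal) -> float:
    """P(site pattern) on the true tree ((L1,S1),(L2,S2)), summed over internal states."""
    total = 0.0
    for u, v in product(BASES, repeat=2):  # u joins L1 and S1; v joins L2 and S2
        total += (0.25 * jc_prob(u, l1, t_long) * jc_prob(u, s1, t_short)
                  * jc_prob(u, v, t_internal)
                  * jc_prob(v, l2, t_long) * jc_prob(v, s2, t_short))
    return total


def informative_probs(t_long, t_short, t_internal):
    """Return (P of patterns supporting the true tree, P of long-branch-grouping patterns)."""
    supporting = misleading = 0.0
    for l1, s1, l2, s2 in product(BASES, repeat=4):
        p = pattern_prob(l1, s1, l2, s2, t_long, t_short, t_internal)
        if l1 == s1 and l2 == s2 and l1 != l2:
            supporting += p  # each long-branch taxon matches its true sister
        elif l1 == l2 and s1 == s2 and l1 != s1:
            misleading += p  # long branches share X1 = X2; short branches share another base
    return supporting, misleading


sup, mis = informative_probs(t_long=2.0, t_short=0.01, t_internal=0.01)
print(f"supporting true tree: {sup:.4f}  grouping long branches: {mis:.4f}")
```

With these hypothetical lengths the misleading patterns are far more probable than the supporting ones, so as sites accumulate parsimony is pulled ever more firmly toward the wrong tree. With all branches equal (for example 0.1 each), the comparison favors the true tree again.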

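The Importance of Branch Lengths
Fitch optimization assigns the state set {A} to node 1 and {A,C,G} to node 2. With a large number of terminals showing A, this is a slowly evolving site. Parsimony allows three reconstructions, each requiring two changes:
1) C at node 2: a transversion to A along the short branch, no change along one long branch, and a change to G along the other long branch.
2) G at node 2: a transition to A along the short branch, no change along one long branch, and a change to C along the other long branch.
3) A at node 2: no change along the short branch, a change to C along one long branch, and a change to G along the other long branch.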
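Fitch optimization for this site can be sketched as below. The tree is hypothetical: three A terminals stand in for "a large number", and because node 2 joins three branches, the code uses Hartigan's counting generalization of Fitch's rule, which reduces to the ordinary intersect-or-union rule at two-child nodes.

```python
from collections import Counter

# Hypothetical single-site tree rooted at node 2: node 1 (three A terminals standing in
# for "a large number"), plus the two long-branch terminals C and G.
NODE_1 = ("A", "A", "A")
NODE_2 = (NODE_1, "C", "G")


def fitch(node):
    """Return (state set, minimum number of changes) for a subtree.

    Leaves are single-letter strings; internal nodes are tuples of children.
    Keep the states found in the most child sets and add
    (number of children - that count) changes.
    """
    if isinstance(node, str):
        return {node}, 0
    child_results = [fitch(child) for child in node]
    counts = Counter(state for child_set, _ in child_results for state in child_set)
    best = max(counts.values())
    states = {s for s, n in counts.items() if n == best}
    cost = sum(c for _, c in child_results) + len(node) - best
    return states, cost


print(fitch(NODE_1), fitch(NODE_2))
```

Node 2 receives {A,C,G} at a total cost of 2 steps, so parsimony scores all three reconstructions the same and cannot choose among them.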

The Importance of Branch Lengths
ML can express a preference here, whereas parsimony cannot, because ML accounts for branch lengths when calculating reconstruction probabilities. No change along the short branch combined with changes along both long branches is more likely than a change along the short branch combined with no change along one of the long branches. All reconstructions are permitted and accounted for, but the reconstruction with A at node 2 (and node 1) contributes the most to the single-site likelihood.
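The preference can be sketched numerically. This assumes JC69, a root prior of 1/4 per base, and hypothetical branch lengths (short branches 0.05, long branches 1.0, three A terminals below node 1); none of these values come from the slide. Each state at node 2 is scored by its contribution to the single-site likelihood, summing over every state at node 1, so all reconstructions are included.

```python
import math

BASES = "ACGT"


def jc_prob(i: str, j: str, t: float) -> float:
    """JC69 probability that base i becomes base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def node2_reconstruction_probs(t_short, t_long, t_a_leaf, n_a_leaves):
    """Share of the single-site likelihood contributed by each state at node 2.

    Hypothetical tree: node 2 joins node 1 (short branch) and the C and G
    terminals (long branches); node 1 carries n_a_leaves terminals with A.
    """
    # Conditional likelihood of node 1's subtree for each state x at node 1.
    below_node1 = {x: jc_prob(x, "A", t_a_leaf) ** n_a_leaves for x in BASES}
    contrib = {}
    for s in BASES:
        via_short = sum(jc_prob(s, x, t_short) * below_node1[x] for x in BASES)
        contrib[s] = 0.25 * via_short * jc_prob(s, "C", t_long) * jc_prob(s, "G", t_long)
    site_likelihood = sum(contrib.values())
    return {s: c / site_likelihood for s, c in contrib.items()}


probs = node2_reconstruction_probs(t_short=0.05, t_long=1.0, t_a_leaf=0.05, n_a_leaves=3)
for state, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(state, round(p, 3))
```

With these values A at node 2 takes by far the largest share, C and G tie, and T still receives a small nonzero share, matching the point that every reconstruction is counted but they are not weighted equally.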