An Equivalence of Maximum Parsimony and Maximum Likelihood revisited


Mareike Fischer and Bhalchandra Thatte
MIEP, 10 – 12 June 08, Montpellier

The Problem
The growing amount of DNA data means that stochastic models and methods are needed for analysis.
Maximum parsimony (MP) and maximum likelihood (ML) are two of the most frequently discussed methods.
MP and ML can perform differently (e.g. in the so-called ‘Felsenstein Zone’).
But: when are MP and ML equivalent? This talk follows the approach of Tuffley & Steel.

The N_r-Model
Given: r character states c_1, …, c_r.
No distinction is made between character states (fully symmetric model).
The probability p_e of a transition on edge e satisfies 0 ≤ p_e ≤ 1/r.
Transition events on different edges are independent.
Note: for r = 4, this is the Jukes-Cantor model.
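The model above can be written down as a transition matrix per edge. The following is a minimal sketch, assuming that p_e is the probability of changing to one specific other state (an interpretation chosen here because it makes the bound p_e ≤ 1/r the fully randomizing case); the function name `nr_transition_matrix` is hypothetical.

```python
def nr_transition_matrix(r, p_e):
    """Transition matrix of one edge in a fully symmetric r-state model.

    Assumption: p_e is the probability of changing to one specific other
    state, so the probability of no change is 1 - (r - 1) * p_e.
    """
    if not 0 <= p_e <= 1 / r:
        raise ValueError("p_e must lie in [0, 1/r]")
    stay = 1 - (r - 1) * p_e
    return [[stay if i == j else p_e for j in range(r)] for i in range(r)]


# r = 4 gives a Jukes-Cantor-type matrix; at p_e = 1/r every entry equals 1/r.
jukes_cantor = nr_transition_matrix(4, 0.1)
saturated = nr_transition_matrix(4, 0.25)
```

At p_e = 1/r the end state of an edge is independent of its start state, which is why such edges behave like edges of infinite length.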

The Equivalence Result
Tuffley and Steel (1997): MP and ML with no common mechanism are equivalent in the sense that both choose the same tree(s).
Note: ‘no common mechanism’ means that the transition probabilities can vary from site to site.

Linearity of the Likelihood Function
An extension g_f of a character f agrees with f on the leaves, but also assigns character states to the ancestral nodes.
Example: r = 2, f = (c_1, c_1, c_1, c_2): 8 different extensions!
[Figure: tree with leaves 1, 2, 3, 4 labelled by f: c_1, c_1, c_1, c_2]
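The count of 8 can be checked by enumeration. This is a small sketch under the assumption of a rooted binary tree on four leaves, which has three ancestral nodes; the node names "u", "v", "w" and the encoding of c_1, c_2 as 0, 1 are illustrative choices, not taken from the slides.

```python
from itertools import product


def extensions(f, internal_nodes, r):
    """Yield every extension of f: f on the leaves plus a state per ancestral node."""
    for combo in product(range(r), repeat=len(internal_nodes)):
        g = dict(f)
        g.update(zip(internal_nodes, combo))
        yield g


# f = (c1, c1, c1, c2) on leaves 1..4, with states encoded as 0 and 1.
f = {1: 0, 2: 0, 3: 0, 4: 1}
all_ext = list(extensions(f, ["u", "v", "w"], r=2))
print(len(all_ext))  # 2**3 = 8
```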

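Linearity of the Likelihood Function
Note that
$$P(f) = \sum_{g_f} P(g_f)$$
and
$$P(g_f) = \frac{1}{r} \prod_{e:\ g_f \text{ changes on } e} p_e \prod_{e:\ g_f \text{ does not change on } e} \bigl(1 - (r-1)\,p_e\bigr).$$
Thus, P(f) is linear in each p_e!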
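Linearity in a single p_e can be checked numerically: for a function that is linear in p_e, the value at the midpoint of an interval equals the average of the values at its endpoints. The sketch below assumes a uniform root distribution, the per-state reading of p_e, and a hypothetical tree ((1,2),(3,4)) with ancestral nodes "u", "v", "w"; exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

# Hypothetical rooted tree: root u with children v and w; v -> leaves 1, 2; w -> leaves 3, 4.
EDGES = [("u", "v"), ("u", "w"), ("v", 1), ("v", 2), ("w", 3), ("w", 4)]
INTERNAL = ["u", "v", "w"]


def likelihood(f, p, r=2):
    """P(f) as the sum over all extensions of (1/r) times one factor per edge."""
    total = Fraction(0)
    for combo in product(range(r), repeat=len(INTERNAL)):
        g = dict(f)
        g.update(zip(INTERNAL, combo))
        prob = Fraction(1, r)  # assumed uniform distribution at the root
        for a, b in EDGES:
            prob *= p[(a, b)] if g[a] != g[b] else 1 - (r - 1) * p[(a, b)]
        total += prob
    return total


f = {1: 0, 2: 0, 3: 0, 4: 1}
base = {e: Fraction(1, 5) for e in EDGES}


def midpoint_gap(edge, lo=Fraction(0), hi=Fraction(1, 2)):
    """Difference between P at the midpoint and the average of P at the endpoints."""
    def at(value):
        q = dict(base)
        q[edge] = value
        return likelihood(f, q)
    return at((lo + hi) / 2) - (at(lo) + at(hi)) / 2


gaps = [midpoint_gap(e) for e in EDGES]
print(all(g == 0 for g in gaps))
```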

Maximum of the Likelihood Function
Functions h: [0,t]^k → R that are linear in each variable attain their maximum at a corner of the box [0,t]^k (here t = 1/r).
Thus, we can assume w.l.o.g. that ML chooses a tree T with p_e = 0 or p_e = 1/r for all edges e of T!
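The corner property can be illustrated on a toy function. The following sketch uses an arbitrary, made-up function h that is linear in each of two variables and compares its maximum over a fine grid of the box with its maximum over the four corners only; it is an illustration, not the likelihood function itself.

```python
from fractions import Fraction
from itertools import product


def h(x, y):
    """An arbitrary function that is linear in x for fixed y and in y for fixed x."""
    return Fraction(3, 10) - Fraction(1, 2) * x + Fraction(1, 5) * y + Fraction(11, 10) * x * y


t = Fraction(1, 2)
grid = [t * i / 50 for i in range(51)]  # 51 evenly spaced points in [0, t]

grid_max = max(h(x, y) for x, y in product(grid, repeat=2))
corner_max = max(h(x, y) for x, y in product([Fraction(0), t], repeat=2))
print(grid_max == corner_max)
```

Searching the 2^k corners instead of the whole box is what makes the later counting argument possible.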

Bound of the Likelihood Function
Let T be the ML-tree and let k be the number of ∞-edges, i.e. edges with p_e = 1/r.
As before, we have
$$P(f \mid T) = \sum_{g_f} P(g_f).$$
Note that P(g_f) = 0 if g_f requires a substitution on an edge of length 0 (p_e = 0)!
Note that if P(g_f) ≠ 0, then P(g_f) = (1/r)^{k+1}!
Therefore, for N = #{g_f : P(g_f) ≠ 0},
$$\max P(f \mid T) = N \left(\frac{1}{r}\right)^{k+1}.$$

Bound for the Likelihood Function
So, for N = #{g_f : P(g_f) ≠ 0} and k = #{∞-edges}, we have:
$$\max P(f \mid T) = N \left(\frac{1}{r}\right)^{k+1}.$$
Wanted: an upper bound for N.
Delete the ∞-edges; k+1 connected components remain, M of which are labelled (i.e. contain at least one leaf).
And: PS(f,T) ≤ M − 1.
[Figure: tree with its ∞-edges marked; here k = 4, giving k+1 components, M of them labelled]
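The component argument can be made concrete by brute force. In the sketch below, an extension has nonzero probability only if it is constant along every edge that is not an ∞-edge, so each unlabelled component is free to take any of the r states and N can be at most r^(k+1−M). The tree, the node names, and the particular choice of four ∞-edges are hypothetical and are not the example drawn on the slide.

```python
from itertools import product

CHILDREN = {"u": ["v", "w"], "v": [1, 2], "w": [3, 4]}
EDGES = [(a, b) for a, kids in CHILDREN.items() for b in kids]
INTERNAL = list(CHILDREN)
LEAVES = [1, 2, 3, 4]


def count_nonzero_extensions(f, inf_edges, r):
    """N: extensions with no state change on any zero-length edge."""
    zero_edges = [e for e in EDGES if e not in inf_edges]
    count = 0
    for combo in product(range(r), repeat=len(INTERNAL)):
        g = dict(f)
        g.update(zip(INTERNAL, combo))
        if all(g[a] == g[b] for a, b in zero_edges):
            count += 1
    return count


def component_counts(inf_edges):
    """Return (number of components, number of labelled components) after deleting inf_edges."""
    parent = {v: v for v in INTERNAL + LEAVES}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for a, b in EDGES:
        if (a, b) not in inf_edges:
            parent[find(a)] = find(b)  # merge the two endpoints of a zero-length edge
    total = len({find(v) for v in parent})
    labelled = len({find(leaf) for leaf in LEAVES})
    return total, labelled


r = 2
f = {1: 0, 2: 0, 3: 0, 4: 1}
inf_edges = {("u", "v"), ("v", 1), ("v", 2), ("w", 4)}  # illustrative choice with k = 4
k = len(inf_edges)
total, M = component_counts(inf_edges)
N = count_nonzero_extensions(f, inf_edges, r)
print(k, total, M, N)
```

In this configuration the only unlabelled component is the single node "v", which accounts for the factor r in N.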

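Equivalence of MP and ML
Altogether: each of the k + 1 − M unlabelled components can take any of the r states, so
$$N \le r^{\,k+1-M}.$$
So we have:
$$\max P(f \mid T) = N \left(\frac{1}{r}\right)^{k+1} \le \left(\frac{1}{r}\right)^{M} \le \left(\frac{1}{r}\right)^{PS(f,T)+1}.$$
But obviously also
$$\max P(f \mid T) \ge \left(\frac{1}{r}\right)^{PS(f,T)+1},$$
as the most parsimonious extension of f requires exactly PS(f,T) changes. And thus
$$\max P(f \mid T) = \left(\frac{1}{r}\right)^{PS(f,T)+1}.$$
So, applied to one character f, MP and ML are equivalent!
In a sequence under ‘no common mechanism’, each likelihood can be maximized independently, and thus
$$\max P(f_1, \dots, f_n \mid T) = \prod_{i=1}^{n} \left(\frac{1}{r}\right)^{PS(f_i,T)+1}.$$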
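The single-character statement can be checked by brute force on a small tree: compute the parsimony score with Fitch's algorithm, maximize the likelihood over all corners p_e ∈ {0, 1/r}, and compare the result with (1/r)^(PS+1). This is a sketch under the same assumptions as before (uniform root distribution, per-state p_e, a hypothetical tree ((1,2),(3,4))); it checks one tree shape only and is not a proof.

```python
from fractions import Fraction
from itertools import product

CHILDREN = {"u": ["v", "w"], "v": [1, 2], "w": [3, 4]}
EDGES = [(a, b) for a, kids in CHILDREN.items() for b in kids]
INTERNAL = list(CHILDREN)


def likelihood(f, p, r):
    """P(f | T) for edge parameters p, summed over all extensions of f."""
    total = Fraction(0)
    for combo in product(range(r), repeat=len(INTERNAL)):
        g = dict(f)
        g.update(zip(INTERNAL, combo))
        prob = Fraction(1, r)
        for a, b in EDGES:
            prob *= p[(a, b)] if g[a] != g[b] else 1 - (r - 1) * p[(a, b)]
        total += prob
    return total


def fitch(f, node="u"):
    """Fitch's algorithm: return (candidate state set, number of changes) below node."""
    if node not in CHILDREN:
        return {f[node]}, 0
    (s1, n1), (s2, n2) = (fitch(f, child) for child in CHILDREN[node])
    if s1 & s2:
        return s1 & s2, n1 + n2
    return s1 | s2, n1 + n2 + 1


def max_corner_likelihood(f, r):
    """Maximum of P(f | T) over all assignments p_e in {0, 1/r}."""
    corners = product([Fraction(0), Fraction(1, r)], repeat=len(EDGES))
    return max(likelihood(f, dict(zip(EDGES, c)), r) for c in corners)


r = 2
f = {1: 0, 2: 0, 3: 0, 4: 1}
ps = fitch(f)[1]
print(ps, max_corner_likelihood(f, r), Fraction(1, r) ** (ps + 1))
```

Restricting the search to corners relies on the linearity argument; a grid search over the full box would be needed to check that step independently.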

Bounded edge lengths
Modification of the model: the transition probabilities are subject to an upper bound u: 0 ≤ p_e ≤ u < 1/r.
Then, MP and ML are not equivalent!

Example: Bounded edge lengths for r = 2
Then, PS(f_1|T_1) = PS(f_2|T_2) = 1 and PS(f_1|T_2) = PS(f_2|T_1) = 2, so MP is indecisive between T_1 and T_2!
Also, P(f_1|T_1) = P(f_2|T_2), but max P(f_2|T_1) = 2u^2(1−u)^2 > u^2 = max P(f_1|T_2), so ML favors T_1 over T_2!
Therefore, MP and ML are not equivalent in this setting!
Note that by repeating f_1 n times and f_2 (n+c) times (c > 0), a strong counterexample can be constructed!
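The remark about repeating the characters can be spelled out as follows. This is a sketch, assuming that the stated equality of likelihoods refers to the maximized values and that sites are maximized independently (no common mechanism); the shorthand a, b, d is introduced here only for this calculation.

```latex
% Shorthand (introduced for this sketch only):
%   a = \max P(f_1 \mid T_1) = \max P(f_2 \mid T_2), \quad
%   b = \max P(f_2 \mid T_1), \quad d = \max P(f_1 \mid T_2), \quad b > d > 0.
\begin{align*}
  PS(T_1) &= n \cdot 1 + (n+c) \cdot 2 = 3n + 2c, \\
  PS(T_2) &= n \cdot 2 + (n+c) \cdot 1 = 3n + c < PS(T_1), \\
  \frac{\max L(T_1)}{\max L(T_2)}
          &= \frac{a^{n}\, b^{\,n+c}}{d^{\,n}\, a^{\,n+c}}
           = \left(\frac{b}{d}\right)^{n} \left(\frac{b}{a}\right)^{c}.
\end{align*}
% Since b/d > 1, the ratio exceeds 1 for fixed c once n is large enough:
% MP then strictly prefers T_2 while ML strictly prefers T_1.
```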

Under a molecular clock, MP and ML are not equivalent!
Example: here, p_e = (1 − P_e)/2, and the ‘height’ P of the tree is fixed: P = P_1 P_2 = P_3 P_4 P_5.
In this setting, MP is indecisive between T_1 and T_2, but ML favors T_1.
Note that under a clock, the maximum of the likelihood can occur in the interior of the box [0,1/r]^k!
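The substitution p_e = (1 − P_e)/2 is convenient because, for r = 2, the quantities P_e multiply along a path. The sketch below checks this for two consecutive edges; reading P_1 P_2 and P_3 P_4 P_5 as products along root-to-leaf paths is an assumption here, and the function names are hypothetical.

```python
from fractions import Fraction


def compose(p1, p2):
    """Net change probability over two consecutive edges in the symmetric 2-state model."""
    return p1 * (1 - p2) + (1 - p1) * p2


def to_P(p):
    """The slide's reparametrization, inverted: P_e = 1 - 2 * p_e."""
    return 1 - 2 * p


p1, p2 = Fraction(1, 10), Fraction(3, 10)
print(to_P(compose(p1, p2)) == to_P(p1) * to_P(p2))
```

Fixing the product along every root-to-leaf path couples the edge parameters, which is consistent with the maximum no longer being forced to a corner of the box.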

Summary
Even under the assumption of no common mechanism, MP and ML do not have to be equivalent!
Small changes to the model assumptions, such as bounded edge lengths or a molecular clock, suffice to break the equivalence.

Thanks…
… to my supervisor Mike Steel,
… to the organizers of this conference,
… to the Allan Wilson Centre for financing my research,
… to YOU for listening or at least waking up early enough to read this message.