Mareike Fischer How many characters are needed to reconstruct the true tree? Mareike Fischer and Mike Steel Future Directions in Phylogenetic Methods and.


How many characters are needed to reconstruct the true tree?
Mareike Fischer and Mike Steel
Future Directions in Phylogenetic Methods and Models, 17–21 Dec 07

The Problem
Given: Sequence of characters (e.g. DNA)
Wanted: Reconstruction of the ‘true’ tree
Solution: Maximum Parsimony, Maximum Likelihood, etc.
But: Is the sequence long enough for a reliable reconstruction?

Previous Approaches
1. Churchill, von Haeseler, Navidi (1992): 4-taxa scenario.
Observations:
- The probability of reconstructing the true tree increases with the length of the interior edge.
- “Bringing the outer nodes closer to the central branch can increase [this probability] dramatically.”
[Figure: Rec. Prob. plotted against int. edge; label: more characters]

Previous Approaches
2. Yang (1998): 4-taxa scenario, interior edge ‘fixed’ at 5% of tree length. 5 different tree shapes were investigated.
Observations:
- ‘Farris Zone’: MP better
- ‘Felsenstein Zone’: ML better
- The optimal length for the interior edge ranges between [value not preserved in transcript] and [value not preserved in transcript].
[Figure: Rec. Prob. plotted against tree length]

Our Approach
Limitation: Most previous approaches are based on simulations.
Our approach: Mathematical analysis of the influence of branch lengths on tree reconstruction.
We investigate MP first and consider other methods afterwards.

Already known
Steel and Székely (2002):
[Figure: 4-taxon tree with interior edge of length x and four pendant edges of length y]
Here, the number k of characters needed to reconstruct the true tree grows at rate [formula not preserved in transcript].
But what happens if we fix the ratio (y := px), and then take the value of x that minimizes k?

Our Approach
Setting: 4 taxa, pendant edges of length px (with p > 1), short interior edge of length x, 2-state symmetric model.
[Figure: 4-taxon tree with interior edge of length x and pendant edges of length px]
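To make the setting concrete, the following sketch enumerates the 16 leaf patterns of such a tree by brute force. It assumes that an edge of length l changes state with probability (1 - e^(-2l))/2, that the true tree is 12|34, and that the root state is uniform; the function names are hypothetical and none of this is taken from the slides.

```python
import math
from itertools import product

def change_prob(length):
    """Probability that the two ends of an edge differ (2-state symmetric model)."""
    return 0.5 * (1.0 - math.exp(-2.0 * length))

def pattern_probs(x, p):
    """Return {leaf pattern: probability} for the assumed true tree 12|34.

    Leaves 1 and 2 hang off interior node u, leaves 3 and 4 off interior node v.
    Pendant edges have length p*x, the interior edge has length x.
    """
    q = change_prob(p * x)   # pendant edges
    r = change_prob(x)       # interior edge
    probs = {}
    for leaves in product((0, 1), repeat=4):
        total = 0.0
        for u, v in product((0, 1), repeat=2):
            # Uniform state at u, then one step along the interior edge to v.
            term = 0.5 * (r if u != v else 1.0 - r)
            for leaf, parent in zip(leaves, (u, u, v, v)):
                term *= q if leaf != parent else 1.0 - q
            total += term
        probs[leaves] = total
    return probs

def split_support(x, p):
    """Probabilities that a character has the form aabb (12|34) or abab (13|24)."""
    probs = pattern_probs(x, p)
    p12 = probs[(0, 0, 1, 1)] + probs[(1, 1, 0, 0)]
    p13 = probs[(0, 1, 0, 1)] + probs[(1, 0, 1, 0)]
    return p12, p13

if __name__ == "__main__":
    p12, p13 = split_support(x=0.05, p=3.0)
    print(f"P(aabb) = {p12:.6f}, P(abab) = {p13:.6f}")
```

Only the patterns aabb, abab and abba distinguish the three possible 4-taxon topologies under parsimony, which is why the sketch singles out the first two.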

Main Result
For ‘reliable’ MP reconstruction:
- k grows at least at rate $p^2$.
- For the optimal value of x, k grows at rate $p^2$.
[Bounds on k: formulas not preserved in transcript]

The Standard Normal Distribution
The constants $c_\varepsilon$ and $c_\varepsilon'$ determine the size $\varepsilon$ of the area under the curve of the Standard Normal Distribution.
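As a small illustration of how such a constant can be obtained numerically, the sketch below assumes that c is the point with upper-tail area ε under the standard normal curve. The slides do not preserve the exact definition, so this reading (and the function name) is an assumption.

```python
from statistics import NormalDist

def upper_quantile(eps):
    """Return c with P(N(0, 1) > c) = eps (assumed meaning of the constant)."""
    return NormalDist().inv_cdf(1.0 - eps)

if __name__ == "__main__":
    for eps in (0.05, 0.025, 0.01):
        print(f"eps = {eps}: c = {upper_quantile(eps):.3f}")
```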

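Idea of Proof: 1. Applying the CLT.
Set $X_i$ [definition not preserved in transcript], i.i.d., and $Z_k$ [definition not preserved in transcript].
Then (by the CLT) [statement not preserved in transcript].
Note that the true tree $T_1$ will be favored over $T_2$ if and only if $Z_k > 0$.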
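The CLT step can be sketched numerically under one plausible reading of the missing definitions: each X_i is +1 if character i supports T_1, -1 if it supports T_2 and 0 otherwise, and Z_k = X_1 + ... + X_k. These definitions and the function names below are assumptions, not the slides' own formulas.

```python
import math
from statistics import NormalDist

def success_probability(k, p_plus, p_minus):
    """Normal approximation to P(Z_k > 0) for Z_k = X_1 + ... + X_k (assumed).

    p_plus = P(X_1 = 1), p_minus = P(X_1 = -1); requires p_plus > p_minus.
    """
    mu = p_plus - p_minus                 # E[X_1]
    var = p_plus + p_minus - mu * mu      # Var[X_1]
    return NormalDist().cdf(math.sqrt(k) * mu / math.sqrt(var))

def characters_needed(eps, p_plus, p_minus):
    """Smallest k for which the approximation reaches 1 - eps."""
    mu = p_plus - p_minus
    var = p_plus + p_minus - mu * mu
    c = NormalDist().inv_cdf(1.0 - eps)
    return math.ceil(c * c * var / (mu * mu))

if __name__ == "__main__":
    k = characters_needed(0.05, p_plus=0.10, p_minus=0.05)
    print(k, success_probability(k, 0.10, 0.05))
```

Under this approximation the required k is proportional to Var[X_1] / E[X_1]^2, so a small difference between the two support probabilities drives k up quadratically.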

Idea of Proof: 2. The Hadamard Representation
Since the $X_i$ are i.i.d., $\mu_k$ and $\sigma_k$ depend only on k and the probabilities $P(X_1 = 1)$ and $P(X_1 = -1)$.
These probabilities can be computed using the ‘Hadamard Representation’: [formulas not preserved in transcript] (Here, $\theta = e^{-2x}$.)
Note that $P(X_1 = 1)$ and $P(X_1 = -1)$ only depend on x and p.
Thus, for fixed p, the ratio [formula not preserved in transcript] can be used to find a value of x that minimizes k.
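For orientation, here is what a direct calculation gives under the stated setting. This is a reconstruction, not the slide's own formula: it assumes the true tree $T_1$ is 12|34, that $X_1 = 1$ for a character of the form aabb and $X_1 = -1$ for one of the form abab (so $T_2$ is 13|24), and that an edge of length $l$ changes state with probability $(1 - e^{-2l})/2$. With $\theta = e^{-2x}$, each pendant edge of length $px$ contributes a factor $\theta^{p}$.

```latex
\begin{align*}
P(X_1 = 1)  &= \tfrac{1}{8}\left(1 + 2\theta^{2p} - 4\theta^{2p+1} + \theta^{4p}\right),\\
P(X_1 = -1) &= \tfrac{1}{8}\left(1 - \theta^{2p}\right)^{2},\\
E[X_1] = P(X_1 = 1) - P(X_1 = -1) &= \tfrac{1}{2}\,\theta^{2p}\,(1 - \theta).
\end{align*}
```

By symmetry the pattern abba has the same probability as abab, so the choice of $T_2$ among the two wrong topologies does not matter here.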

The Hadamard Representation

Idea of Proof: 2. The $X_i$ are i.i.d.
Since the $X_i$ are i.i.d., we have [formulas not preserved in transcript].

Summary and Extension
For MP, the number k of characters needed to reliably reconstruct the true tree grows at rate $p^2$.
Can other methods do better (e.g. rate p)?
- No! [Can be shown using the ‘Hellinger distance’.]
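The quadratic rate can be probed numerically with a rough sketch. It assumes the normal approximation k ≈ c² Var[X_1] / E[X_1]², with X_1 equal to +1 for aabb characters and -1 for abab characters on a true tree 12|34, and uses closed-form moments derived under the 2-state symmetric model; all function names are hypothetical.

```python
import math
from statistics import NormalDist

def moments(x, p):
    """Mean and variance of X_1 under the assumptions stated in the lead-in."""
    a2 = math.exp(-4.0 * p * x)     # theta^(2p), with theta = exp(-2x)
    b = math.exp(-2.0 * x)          # theta
    mu = 0.5 * a2 * (1.0 - b)       # P(X_1 = 1) - P(X_1 = -1)
    second = (1.0 + a2 * a2 - 2.0 * a2 * b) / 4.0   # P(X_1 = 1) + P(X_1 = -1)
    return mu, second - mu * mu

def k_needed(x, p, eps=0.05):
    """Approximate number of characters for success probability 1 - eps."""
    mu, var = moments(x, p)
    c = NormalDist().inv_cdf(1.0 - eps)
    return c * c * var / (mu * mu)

def best_k(p, eps=0.05):
    """Minimise k over a log-spaced grid of pendant lengths t = p*x in [1e-4, 1]."""
    grid = [10.0 ** (-4.0 + 4.0 * i / 2000) for i in range(2001)]
    return min(k_needed(t / p, p, eps) for t in grid)

if __name__ == "__main__":
    for p in (10, 100, 1000):
        print(f"p = {p}: min k / p^2 = {best_k(p) / p**2:.3f}")
```

If the p² rate holds under this approximation, the printed ratio should level off rather than grow or vanish as p increases. The grid is capped at pendant length 1 because longer pendant edges only make the required k larger in this model.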

The Hellinger Distance
[Formula not preserved in transcript]
S: set of site patterns
p, q: probability distributions
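A minimal sketch of the quantity, assuming the common definition in which the squared Hellinger distance is the sum over site patterns s in S of (sqrt(p(s)) - sqrt(q(s)))². Normalizations differ between texts (some include a factor 1/2), and the slide's own formula is not preserved, so the convention below is an assumption.

```python
import math

def hellinger_squared(p, q):
    """Squared Hellinger distance between two distributions given as dicts.

    Uses sum_s (sqrt(p(s)) - sqrt(q(s)))**2 without a factor 1/2.
    Patterns missing from a dict are treated as having probability 0.
    """
    patterns = set(p) | set(q)
    return sum(
        (math.sqrt(p.get(s, 0.0)) - math.sqrt(q.get(s, 0.0))) ** 2
        for s in patterns
    )

if __name__ == "__main__":
    dist_1 = {"aabb": 0.5, "abab": 0.5}
    dist_2 = {"aabb": 1.0}
    print(hellinger_squared(dist_1, dist_2))
```

With this convention the value is 0 for identical distributions and 2 for distributions with disjoint support.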

Outlook
Questions for future work:
- What happens when you approach the ‘Felsenstein Zone’?
- What happens in general with different tree shapes or more taxa?

Thanks…
… to my supervisor Mike Steel,
… to the Newton Institute for organizing this great conference,
… to the Allan Wilson Centre for financing my research,
… to YOU for listening or at least waking up early enough to read this message.

The only true tree…
… is a Christmas tree. (And it does not even require reconstruction!)
Merry Christmas!