Parsimony Small Parsimony and Search Algorithms Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.

Slides:



Advertisements
Similar presentations
Chapter 5: Tree Constructions
Advertisements

Parsimony Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
. Phylogenetic Trees (2) Lecture 13 Based on: Durbin et al 7.4, Gusfield , Setubal&Meidanis 6.1.
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Parsimony based phylogenetic trees Sushmita Roy BMI/CS 576 Sep 30 th, 2014.
 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.
Molecular Evolution Revised 29/12/06
Tree Reconstruction.
Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate.
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
UPGMA and FM are distance based methods. UPGMA enforces the Molecular Clock Assumption. FM (Fitch-Margoliash) relieves that restriction, but still enforces.
CISC667, F05, Lec14, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (I) Maximum Parsimony.
. Phylogenetic Trees - Parsimony Tutorial #12 Next semester: Project in advanced algorithms for phylogenetic reconstruction (236512) Initial details in:
Building phylogenetic trees Jurgen Mourik & Richard Vogelaars Utrecht University.
BME 130 – Genomes Lecture 26 Molecular phylogenies I.
NJ was originally described as a method for approximating a tree that minimizes the sum of least- squares branch lengths – the minimum – evolution criterion.
Multiple sequence alignment
Phylogeny Tree Reconstruction
. Phylogenetic Trees - Parsimony Tutorial #11 © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger.
BNFO 602 Phylogenetics Usman Roshan. Summary of last time Models of evolution Distance based tree reconstruction –Neighbor joining –UPGMA.
. Comput. Genomics, Lecture 5b Character Based Methods for Reconstructing Phylogenetic Trees: Maximum Parsimony Based on presentations by Dan Geiger, Shlomo.
Probabilistic methods for phylogenetic trees (Part 2)
Building Phylogenies Parsimony 2.
Building Phylogenies Parsimony 1. Methods Distance-based Parsimony Maximum likelihood.
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Gene Ontology and Functional Enrichment Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Phylogenetic trees Sushmita Roy BMI/CS 576
Terminology of phylogenetic trees
Parsimony and searching tree-space Phylogenetics Workhop, August 2006 Barbara Holland.
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
1 Dan Graur Molecular Phylogenetics Molecular phylogenetic approaches: 1. distance-matrix (based on distance measures) 2. character-state.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
Phylogenetic trees School B&I TCD Bioinformatics May 2010.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Using traveling salesman problem algorithms for evolutionary tree construction Chantal Korostensky and Gaston H. Gonnet Presentation by: Ben Snider.
Evolutionary tree reconstruction
Bayes estimators for phylogenetic reconstruction Ruriko Yoshida.
More statistical stuff CS 394C Feb 6, Today Review of material from Jan 31 Calculating pattern probabilities Why maximum parsimony and UPGMA are.
Thursday, May 9 Heuristic Search: methods for solving difficult optimization problems Handouts: Lecture Notes See the introduction to the paper.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2015.
Maximum Likelihood Given competing explanations for a particular observation, which explanation should we choose? Maximum likelihood methodologies suggest.
1 Alignment Matrix vs. Distance Matrix Sequence a gene of length m nucleotides in n species to generate an… n x m alignment matrix n x n distance matrix.
Subtree Prune Regraft & Horizontal Gene Transfer or Recombination.
Phylogenetic Trees - Parsimony Tutorial #13
1 CAP5510 – Bioinformatics Phylogeny Tamer Kahveci CISE Department University of Florida.
Parsimony and searching tree-space. The basic idea To infer trees we want to find clades (groups) that are supported by synapomorpies (shared derived.
Probabilistic methods for phylogenetic tree reconstruction BMI/CS 576 Colin Dewey Fall 2015.
Probabilistic Approaches to Phylogenies BMI/CS 576 Sushmita Roy Oct 2 nd, 2014.
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
394C: Algorithms for Computational Biology Tandy Warnow Jan 25, 2012.
Phylogenetic Trees - Parsimony Tutorial #12
Phylogenetic basis of systematics
Distance based phylogenetics
Character-Based Phylogeny Reconstruction
Methods of molecular phylogeny
Hierarchical clustering approaches for high-throughput data
Inferring phylogenetic trees: Distance and maximum likelihood methods
BNFO 602 Phylogenetics Usman Roshan.
BNFO 602 Phylogenetics – maximum parsimony
CS 581 Tandy Warnow.
BNFO 602 Phylogenetics – maximum likelihood
BNFO 602 Phylogenetics Usman Roshan.
Backtracking and Branch-and-Bound
Phylogeny.
CS 394C: Computational Biology Algorithms
Algorithms for Inferring the Tree of Life
Presentation transcript:

Parsimony Small Parsimony and Search Algorithms Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

 The parsimony principle:  Find the tree that requires the fewest evolutionary changes!  A fundamentally different method:  Search rather than reconstruct  Parsimony algorithm 1.Construct all possible trees 2.For each site in the alignment and for each tree count the minimal number of changes required 3.Add sites to obtain the total number of changes required for each tree 4.Pick the tree with the lowest score A quick review

 The parsimony principle:  Find the tree that requires the fewest evolutionary changes!  A fundamentally different method:  Search rather than reconstruct  Parsimony algorithm 1.Construct all possible trees 2.For each site in the alignment and for each tree count the minimal number of changes required 3.Add sites to obtain the total number of changes required for each tree 4.Pick the tree with the lowest score A quick review Too many! The small parsimony problem

 We divided the problem of finding the most parsimonious tree into two sub-problems:  Large parsimony: Find the topology which gives best score  Small parsimony: Given a tree topology and the state in all the tips, find the minimal number of changes required  Large parsimony is “NP-hard”  Small parsimony can be solved quickly using Fitch’s algorithm Large vs. Small Parsimony

 Input: 1.A tree topology: The Small Parsimony Problem humanchimpgorillalemurgibbonbonobo HumanCACT ChimpTACT BonoboAGCC GorillaAGCA GibbonGACT LemurTAG T  Output: The minimal number of changes required: parsimony score 2.State assignments for all tips: humanchimpgorillalemurgibbonbonobo CTGTAA (but in fact, we will also find the most parsimonious assignment for all internal nodes)

 Execute independently for each character:  Two phases: 1.Bottom-up phase: Determine the set of possible states for each internal node 2.Top-down phase: Pick a state for each internal node Fitch’s algorithm humanchimpgorillalemurgibbonbonobo CTGTAA 2 1

1.Initialization: R i = {s i } for all tips 2.Traverse the tree from leaves to root (“post-order”) 3.Determine R i of internal node i with children j, k: 1. Fitch’s algorithm: Bottom-up phase (Determine the set of possible states for each internal node) humanchimpgorillalemurgibbonbonobo CTGTAA 1 C,T G,T G,T,A T T,A Let s i denote the state of node i and R i the set of possible states of node i

1.Initialization: R i = {s i } 2.Traverse the tree from leaves to root (“post-order“) 3.Determine R i of internal node i with children j, k: 1. Fitch’s algorithm: Bottom-up phase (Determine the set of possible states for each internal node) humanchimpgorillalemurgibbonbonobo CTGTAA 1 C,T G,T G,T,A T Parsimony-score = # union operations Parsimony-score = 4 T,A

1.Pick arbitrary state in R root to be the state of the root,s root 2.Traverse the tree from root to leaves (“pre-order”) 3.Determine s i of internal node i with parent j: 2. Fitch’s algorithm: Top-down phase (Pick a state for each internal node) humanchimpgorillalemurgibbonbonobo CTGTAA C,T G,T G,T,A T Parsimony-score = 4 2 T,A

T T T T A 1.Pick arbitrary state in R root to be the state of the root,s root 2.Traverse the tree from root to leaves (“pre-order”) 3.Determine s i of internal node i with parent j: 2. Fitch’s algorithm: Top-down phase (Pick a state for each internal node) humanchimpgorillalemurgibbonbonobo CTGTAA Parsimony-score = 4 2

And now back to the “big” parsimony problem … How do we find the most parsimonious tree amongst the many possible trees?

 Exhaustive search: Up to 8-10 leaves (10k-2m unrooted trees, 135k-34m rooted) Guaranteed results  Branch-and-bound*: Up to leaves Guaranteed results!!! * Branch-and-bound is a clever way of ruling out most trees as they are built, so you can evaluate more trees by exhaustive search.  Heuristic search (e.g. hill-climb): 20+ leaves May not find correct solution. Searching tree space

Hill-climbing Rejected related tree Starting tree Different trees Parsimony score Accepted related tree Final tree still possible that best tree is here A “greedy” algorithm

Nearest-Neighbor Interchange (NNI) Sub-tree 1.Find a tree with some score. 2.At each internal branch consider the two alternative arrangements of the 4 sub-trees. 3.Keep the tree that has the best score. 4.Repeat.

three (of many) places where NNI can be considered

Hill-climbing with NNI Rejected NNI tree Starting tree Different trees Parsimony score Accepted NNI tree Final tree still possible that best tree is here A “greedy” algorithm

How can we improve this algorithm and increase our chances of finding the optimal tree?

1)Construct all possible trees or search the space of possible trees using NNI hill-climb 2)For each site in the alignment and for each tree count the minimal number of changes required using Fitch’s algorithm 3)Add all sites up to obtain the total number of changes for each tree 4)Pick the tree with the lowest score or search until no better tree can be found The parsimony algorithm

Parsimony Trees: 1)Construct all possible trees or search the space of possible trees 2)For each site in the alignment and for each tree count the minimal number of changes required using Fitch’s algorithm 3)Add all sites up to obtain the total number of changes for each tree 4)Pick the tree with the lowest score Phylogenetic trees: Summary Distance Trees: 1)Compute pairwise corrected distances. 2)Build tree by sequential clustering algorithm (UPGMA or Neighbor- Joining). 3)These algorithms don't consider all tree topologies, so they are very fast, even for large trees. Maximum-Likelihood Trees: 1)Tree evaluated for likelihood of data given tree. 2)Uses a specific model for evolutionary rates (such as Jukes-Cantor). 3)Like parsimony, must search tree space. 4)Usually most accurate method but slow.

Branch confidence How certain are we that this is the correct tree? Can be reduced to many simpler questions - how certain are we that each branch point is correct? For example, at the circled branch point, how certain are we that the three subtrees have the correct content : subtree1 - QUA025, QUA013 subtree2 - QUA003, QUA024, QUA023 subtree3 - everything else

Most commonly used branch support test: 1.Randomly sample alignment sites. 2.Use sample to estimate the tree. 3.Repeat many times. (sample with replacement means that a sampled site remains in the source data after each sampling, so that some sites will be sampled more than once) Bootstrap support

For each branch point on the computed tree, count what fraction of the bootstrap trees have the same subtree partitions (regardless of topology within the subtrees). For example at the circled branch point, what fraction of the bootstrap trees have a branch point where the three subtrees include: subtree1 - QUA025, QUA013 subtree2 - QUA003, QUA024, QUA023 subtree3 - everything else This fraction is the bootstrap support for that branch. Bootstrap support

low-confidence branches are marked Original tree figure with branch supports (here as fractions, also common to give % support)