Phylogenetic Trees - Parsimony Tutorial #12

Presentation transcript:

Phylogenetic Trees - Parsimony
Tutorial #12

Next semester: Project in advanced algorithms for phylogenetic reconstruction (236512).
Initial details in:
- Come to me for more details -

Phylogenetic Reconstruction

We would like to study the evolutionary history of species.

Distance-based approach:
- Calculate (ML) pairwise (evolutionary) distances between species
- Find the edge-weighted tree that best describes this metric
- Major drawback: loss of information when the data is reduced to pairwise distances

Character-based approach:
- Consider the character vector of each species:
  - morphological characters
  - bio-molecular characters
- Optimization criteria:
  - parsimony
  - likelihood / posterior probability

3 Most Parsimonious Tree

Parsimony score: the number of character changes (mutations) along the evolutionary tree (a tree that also carries labels on its internal vertices).

Most parsimonious tree: the tree with the minimal parsimony score (Minimal Evolution Principle).

Example: [Figure: two labeled trees whose node labels are drawn from AAA, AGA, AAG and GGA; one has Score = 4, the other Score = 3.]
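The definition of the parsimony score can be made concrete with a minimal sketch. The tree shape below (two internal nodes `u` and `v` joined by an edge) and its labels are assumptions chosen for illustration; they are not recovered from the slide's figure.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def parsimony_score(edges, labels):
    """Sum of character changes over all edges of a fully labeled tree."""
    return sum(hamming(labels[u], labels[v]) for u, v in edges)


# Hypothetical labeled tree: leaves a, b, c, d and internal nodes u, v.
labels = {"a": "AGA", "d": "GGA", "b": "AAA", "c": "AAG",
          "u": "AGA", "v": "AAA"}
edges = [("u", "a"), ("u", "d"), ("u", "v"), ("v", "b"), ("v", "c")]

print(parsimony_score(edges, labels))  # prints 3
```

Changing the internal labels (or the topology) changes the score, which is exactly what the small and large parsimony problems optimize over.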

4 Small vs. Large Parsimony

We break the problem into two:
1. Small parsimony: given the topology, find the best assignment of states to the internal nodes
2. Large parsimony: find the topology that gives the best score

- Large parsimony is NP-hard
- We will show solutions to small parsimony (Fitch's and Sankoff's algorithms)

Input to small parsimony: a tree with character-state assignments to the leaves.

Example: [Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]
A: CAGGTA
B: CAGACA
C: CGGGTA
D: TGCACT
E: TGCGTA

5 Fitch's Algorithm

Execute independently for each character (dynamic programming framework):
1. Bottom-up phase: determine the set of possible states for each internal node
2. Top-down phase: pick a state for each internal node

[Figure: the Aardvark / Bison / Chimp / Dog / Elephant tree with leaf sequences CAGGTA, CAGACA, CGGGTA, TGCACT, TGCGTA; the third character (G, G, G, C, C) is highlighted, and arrows mark phases 1 and 2.]
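As a rough sketch of "execute independently for each character", the code below scores one topology by running Fitch's set rule (intersection if non-empty, otherwise union, counting one change per union) on every alignment column and summing the counts. The five sequences are the ones from the example; the topology `(((A, B), C), (D, E))` is an assumption, since the slide's tree drawing is not available.

```python
def fitch_cost(tree, column):
    """Return (state set, number of unions) for a single character.

    A leaf is a taxon name; an internal node is a pair (left, right).
    `column` maps each taxon name to its state for this character.
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch_cost(tree[0], column)
    right_set, right_cost = fitch_cost(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum the per-character Fitch costs over all alignment columns."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_cost(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(length)
    )


seqs = {"A": "CAGGTA", "B": "CAGACA", "C": "CGGGTA",
        "D": "TGCACT", "E": "TGCGTA"}
topology = ((("A", "B"), "C"), ("D", "E"))  # assumed topology

print(parsimony_score(topology, seqs))  # prints 8
```

Because the columns are handled independently, the total work grows linearly with the number of characters.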

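6 Fitch's Algorithm: Bottom-up phase

Determine the set of possible states for each internal node.
- Initialization: $R_i = \{s_i\}$ for every leaf $i$ with state $s_i$
- Do a post-order (from leaves to root) traversal of the tree
- Determine $R_i$ of internal node $i$ with children $j, k$:

$$R_i = \begin{cases} R_j \cap R_k & \text{if } R_j \cap R_k \neq \emptyset \\ R_j \cup R_k & \text{otherwise} \end{cases}$$

Parsimony score = number of union operations.

[Figure: example tree whose node sets include T, CT, AGT and GT; score = 3.]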
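The bottom-up phase can be sketched in a few lines. The nested-tuple tree below is a hypothetical topology over leaf states C, T, A, G, T, T, picked so that it also yields three union operations; it is not necessarily the tree drawn on the slide.

```python
def fitch_bottom_up(tree):
    """Return (state set, number of union operations) for a nested-tuple tree.

    A leaf is a one-character string; an internal node is a pair (left, right).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_set, left_cost = fitch_bottom_up(tree[0])
    right_set, right_cost = fitch_bottom_up(tree[1])
    common = left_set & right_set
    if common:
        # Non-empty intersection: no extra mutation is needed at this node.
        return common, left_cost + right_cost
    # Empty intersection: take the union and count one mutation.
    return left_set | right_set, left_cost + right_cost + 1


tree = (("C", "T"), (("A", ("G", "T")), "T"))  # assumed topology
root_set, score = fitch_bottom_up(tree)
print(sorted(root_set), score)  # prints ['T'] 3
```

The recursion visits children before their parent, which is the post-order traversal the slide asks for.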

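7 Fitch's Algorithm: Top-down phase

Pick states for each internal node.
- Pick an arbitrary state in $R_{root}$ for the root
- Do a pre-order (from root to leaves) traversal of the tree
- Determine $s_j$ of internal node $j$ with parent $i$:

$$s_j = \begin{cases} s_i & \text{if } s_i \in R_j \\ \text{an arbitrary state in } R_j & \text{otherwise} \end{cases}$$

Complexity: $O(mnk)$, where $m$ = #characters, $n$ = #taxa/nodes, $k$ = #states.

[Figure: the same example tree, score = 3.]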
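A minimal sketch of both phases together, again on a hypothetical tree over leaf states C, T, A, G, T, T. "Arbitrary" choices are made with `min` here purely so the output is reproducible; any member of the set would do.

```python
def bottom_up(tree):
    """Annotate a nested-tuple tree with Fitch sets: (set, left, right)."""
    if isinstance(tree, str):
        return ({tree}, None, None)
    left, right = bottom_up(tree[0]), bottom_up(tree[1])
    common = left[0] & right[0]
    return (common or (left[0] | right[0]), left, right)


def top_down(node, parent_state=None):
    """Pick a state per node; leaves come back as plain states."""
    states, left, right = node
    # Keep the parent's state when allowed, otherwise pick from the set.
    state = parent_state if parent_state in states else min(states)
    if left is None:
        return state
    return (state, top_down(left, state), top_down(right, state))


def changes(labeled):
    """Count edges whose two endpoints carry different states."""
    if isinstance(labeled, str):
        return 0
    state, left, right = labeled
    total = 0
    for child in (left, right):
        child_state = child if isinstance(child, str) else child[0]
        total += (child_state != state) + changes(child)
    return total


tree = (("C", "T"), (("A", ("G", "T")), "T"))  # assumed topology
labeled = top_down(bottom_up(tree))
print(labeled)
print(changes(labeled))  # prints 3
```

The number of changes in the labeled tree matches the number of union operations counted in the bottom-up phase.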

8 Weighted Parsimony: Sankoff's algorithm

Each mutation $a \leftrightarrow b$ has its own cost $S(a,b)$.
1. Bottom-up phase: determine $R_i(s)$, the cost of the optimal state assignment for the subtree of $i$ when $i$ is assigned state $s$
2. Top-down phase: pick optimal states for each internal node

Fitch's algorithm as a special case: $R_i$ is the set of states that yield a minimal-cost subtree of $i$.

Same as the algorithm for optimal lifted tree alignment (Tutorial #4).

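9 Sankoff's Algorithm: Bottom-up phase

Determine $R_i(s)$ for each internal node.
- Initialization, for every leaf $i$ with state $s_i$:

$$R_i(s) = \begin{cases} 0 & \text{if } s = s_i \\ \infty & \text{otherwise} \end{cases}$$

- Do a post-order (from leaves to root) traversal of the tree
- Determine $R_i$ of internal node $i$ with children $j, k$:

$$R_i(s) = \min_{s'} \left[ R_j(s') + S(s,s') \right] + \min_{s'} \left[ R_k(s') + S(s,s') \right]$$

- Remember the pointers $s \to s'$
- There is a natural generalization for non-binary trees

[Figure: example tree with leaf states C, T, A, G, T, T.]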
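The bottom-up recursion can be sketched as follows. The state alphabet `ACGT`, the transition/transversion weights in `ts_tv_cost`, and the tree topology are all assumptions made for illustration. Summing over every child makes the same code work for non-binary nodes.

```python
import math

STATES = "ACGT"  # assumed alphabet


def sankoff(tree, cost):
    """Return {s: minimal cost of the subtree when its root has state s}.

    A leaf is a one-character string; an internal node is a tuple of children.
    """
    if isinstance(tree, str):
        return {s: (0 if s == tree else math.inf) for s in STATES}
    child_tables = [sankoff(child, cost) for child in tree]
    return {
        s: sum(min(table[t] + cost(s, t) for t in STATES)
               for table in child_tables)
        for s in STATES
    }


def unit_cost(a, b):
    """Unweighted parsimony: every substitution costs 1."""
    return 0 if a == b else 1


def ts_tv_cost(a, b):
    """Hypothetical weights: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    return 1 if {a, b} in ({"A", "G"}, {"C", "T"}) else 2


tree = (("C", "T"), (("A", ("G", "T")), "T"))  # assumed topology
print(min(sankoff(tree, unit_cost).values()))   # prints 3
print(min(sankoff(tree, ts_tv_cost).values()))  # prints 5
```

Each table entry needs a minimum over all child states, which is where the extra factor of k in the running time comes from.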

10 Sankoff's Algorithm: Top-down phase

Pick states for each internal node.
- Select the minimal-cost character for the root (the $s$ minimizing $R_{root}(s)$)
- Do a pre-order (from root to leaves) traversal of the tree:
  - For internal node $j$ with parent $i$, select the state that produced the minimal cost at $i$ (use the pointers kept in the 1st stage)

Complexity: $O(mnk^2)$, where $m$ = #characters, $n$ = #taxa/nodes, $k$ = #states.

[Figure: example tree with leaf states C, T, A, G, T, T.]
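One way to keep and follow the pointers is sketched below, assuming unit costs, the alphabet `ACGT`, and a hypothetical tree. Ties are broken by alphabet order, which is an arbitrary choice.

```python
import math

STATES = "ACGT"  # assumed alphabet


def cost(a, b):
    """Assumed substitution cost S(a, b): 1 per change."""
    return 0 if a == b else 1


def bottom_up(tree):
    """Return (table, pointers, children) for a nested-tuple tree.

    table[s] is the subtree cost when the node has state s, and
    pointers[s] lists the best state of each child given s.
    """
    if isinstance(tree, str):
        table = {s: (0 if s == tree else math.inf) for s in STATES}
        return table, {}, []
    children = [bottom_up(child) for child in tree]
    table, pointers = {}, {}
    for s in STATES:
        best = [min(STATES, key=lambda t: child[0][t] + cost(s, t))
                for child in children]
        pointers[s] = best
        table[s] = sum(child[0][t] + cost(s, t)
                       for child, t in zip(children, best))
    return table, pointers, children


def top_down(node, state):
    """Follow the stored pointers from `state` down to the leaves."""
    _, pointers, children = node
    if not children:
        return state
    return (state,) + tuple(
        top_down(child, t) for child, t in zip(children, pointers[state])
    )


tree = (("C", "T"), (("A", ("G", "T")), "T"))  # assumed topology
root = bottom_up(tree)
root_state = min(STATES, key=root[0].get)
print(root_state, root[0][root_state])  # prints T 3
print(top_down(root, root_state))
```

Storing the pointers during the first phase means the second phase is a plain traversal with no further minimization.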

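11 Fitch's Algorithm as a special case of Sankoff's algorithm

Unweighted parsimony:
- Sankoff's algorithm: $R_i(s)$ is the cost of the optimal subtree of $i$ when $i$ is assigned state $s$
- Fitch's algorithm:
  - $Score(i)$ is the cost of the optimal state assignment for the subtree of $i$, i.e. $Score(i) = \min_s R_i(s)$
  - $R_i$ is the set of optimal states for the root of the subtree of $i$, i.e. $R_i = \{ s : R_i(s) = Score(i) \}$

We need to show that:
1. An optimal tree assigns node $i$ a state from $R_i$.
2. Fitch's bottom-up recursive formula for $R_i$ is correct (check for yourselves):

$$R_i = \begin{cases} R_j \cap R_k & \text{if } R_j \cap R_k \neq \emptyset \\ R_j \cup R_k & \text{otherwise} \end{cases}$$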
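The second claim can be spot-checked numerically (this is not a proof). The sketch below compares, at every node of a hypothetical tree, the Fitch set with the set of states that minimize the Sankoff table under unit costs. The alphabet `ACGT` and the tree are assumptions.

```python
import math

STATES = "ACGT"  # assumed alphabet


def fitch_set(tree):
    """Fitch's bottom-up set for the root of a nested-tuple binary tree."""
    if isinstance(tree, str):
        return {tree}
    left, right = fitch_set(tree[0]), fitch_set(tree[1])
    return (left & right) or (left | right)


def sankoff_table(tree):
    """Sankoff's table with unit costs: (s != t) is 0 or 1."""
    if isinstance(tree, str):
        return {s: (0 if s == tree else math.inf) for s in STATES}
    tables = [sankoff_table(child) for child in tree]
    return {
        s: sum(min(tb[t] + (s != t) for t in STATES) for tb in tables)
        for s in STATES
    }


def argmin_states(table):
    """States that achieve the minimal cost in a Sankoff table."""
    best = min(table.values())
    return {s for s, value in table.items() if value == best}


def subtrees(tree):
    """Yield the tree and every subtree below it."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)


tree = (("C", "T"), (("A", ("G", "T")), "T"))  # assumed topology
agree = all(fitch_set(t) == argmin_states(sankoff_table(t))
            for t in subtrees(tree))
print(agree)  # prints True
```

Swapping the unit cost for unequal weights generally breaks the agreement, which is why the weighted version needs the full cost tables.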

12 Fitch's Algorithm as a special case of Sankoff's algorithm (cont.)

Unweighted parsimony:
- $Score(i)$ is the cost of the optimal state assignment for the subtree of $i$
- $R_i$ is the set of optimal states for the root of the subtree of $i$

We need to show that:
1. An optimal tree assigns node $i$ a state from $R_i$.
   - Trivially true for the root
   - Assume (to the contrary) that in an optimal assignment some node $j$, with parent $i$, is assigned $s_j \notin R_j$
   - $s_j \notin R_j \Rightarrow R_j(s_j) \geq Score(j) + 1$, because the parsimony score is an integer
   - Switching from $s_j$ to some $s \in R_j$ saves at least 1 inside the subtree of $j$ and adds at most 1 on the edge $(i, j)$, so it does not raise the parsimony score

[Figure: path from the root through $i$ to $j$.]

Why is this not the case for the weighted version?

13 Exploring the Space of Trees

We saw how to find the optimal state assignment for a given tree topology.
We need to explore the space of topologies.

Given $n$ sequences there are $(2n-3)!!$ possible rooted trees and $(2n-5)!!$ possible unrooted trees.

taxa (n)    # rooted trees    # unrooted trees
8           135,135           10,395
10          34,459,425        2,027,025

(Only these rows of the original table are recoverable.)
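The two counting formulas are easy to evaluate directly. The sketch below is a plain implementation of the double factorial; the function names are chosen here for illustration.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n):
    """Number of rooted binary tree topologies on n labeled taxa."""
    return double_factorial(2 * n - 3)


def unrooted_trees(n):
    """Number of unrooted binary tree topologies on n labeled taxa."""
    return double_factorial(2 * n - 5)


for n in (4, 6, 8, 10):
    print(n, rooted_trees(n), unrooted_trees(n))
# prints:
# 4 15 3
# 6 945 105
# 8 135135 10395
# 10 34459425 2027025
```

The counts grow super-exponentially in n, so exhaustive enumeration is only feasible for a handful of taxa.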

14 Exploring the Space of Trees

Possible solutions:
1. Heuristic solutions for "traveling" through "topology-space"
2. Find a (basic) topology using distance-based methods (NJ)

Notice another problem:
- We obtain state assignments to taxa using multiple alignment
- We obtain an optimal MA using the topology of the phylogenetic tree (e.g. CLUSTAL)

Solution: again, use some initial topology (via NJ).

[Figure: a multiple alignment whose columns are the characters $C_1, C_2, \ldots, C_m$.]