CISC667, F05, Lec14, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (I) Maximum Parsimony.

Slides:



Advertisements
Similar presentations
Parsimony Small Parsimony and Search Algorithms Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Advertisements

Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
. Phylogenetic Trees (2) Lecture 13 Based on: Durbin et al 7.4, Gusfield , Setubal&Meidanis 6.1.
. Class 9: Phylogenetic Trees. The Tree of Life Evolution u Many theories of evolution u Basic idea: l speciation events lead to creation of different.
An Introduction to Phylogenetic Methods
Parsimony based phylogenetic trees Sushmita Roy BMI/CS 576 Sep 30 th, 2014.
. Phylogenetic Trees (2) Lecture 13 Based on: Durbin et al 7.4, Gusfield , Setubal&Meidanis 6.1.
 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.
Phylogenetics - Distance-Based Methods CIS 667 March 11, 2204.
Phylogenetic reconstruction
Molecular Evolution Revised 29/12/06
Tree Reconstruction.
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
Phylogeny Tree Reconstruction
. Phylogenetic Trees - Parsimony Tutorial #12 Next semester: Project in advanced algorithms for phylogenetic reconstruction (236512) Initial details in:
Building phylogenetic trees Jurgen Mourik & Richard Vogelaars Utrecht University.
Branch lengths Branch lengths (3 characters): A C A A C C A A C A C C Sum of branch lengths = total number of changes.
CISC667, F05, Lec15, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (II) Distance-based methods.
Realistic evolutionary models Marjolijn Elsinga & Lars Hemel.
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
Phylogeny Tree Reconstruction
. Phylogenetic Trees - Parsimony Tutorial #11 © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger.
. Comput. Genomics, Lecture 5b Character Based Methods for Reconstructing Phylogenetic Trees: Maximum Parsimony Based on presentations by Dan Geiger, Shlomo.
CISC667, F05, Lec16, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (III) Probabilistic methods.
Probabilistic methods for phylogenetic trees (Part 2)
Building Phylogenies Parsimony 1. Methods Distance-based Parsimony Maximum likelihood.
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
. Phylogenetic Trees (2) Lecture 13 Based on: Durbin et al 7.4, Gusfield , Setubal&Meidanis 6.1.
CISC667, F05, Lec8, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Multiple Sequence Alignment Scoring Dynamic Programming algorithms Heuristic algorithms.
Motif finding : Lecture 2 CS 498 CXZ. Recap Problem 1: Given a motif, finding its instances Problem 2: Finding motif ab initio. –Paradigm: look for over-represented.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Phylogenetic analyses Kirsi Kostamo. The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among.
Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003
BINF6201/8201 Molecular phylogenetic methods
P HYLOGENETIC T REE. OVERVIEW Phylogenetic Tree Phylogeny Applications Types of phylogenetic tree Terminology Data used to build a tree Building phylogenetic.
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
1 Generalized Tree Alignment: The Deferred Path Heuristic Stinus Lindgreen
COMPUTATIONAL MODELS FOR PHYLOGENETIC ANALYSIS K. R. PARDASANI DEPTT OF APPLIED MATHEMATICS MAULANA AZAD NATIONAL INSTITUTE OF TECHNOLOGY (MANIT) BHOPAL.
1 Chapter 7 Building Phylogenetic Trees. 2 Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances –UPGMA method.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Phylogenetic trees School B&I TCD Bioinformatics May 2010.
BINF6201/8201 Molecular phylogenetic methods
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Phylogenetics II.
Phylogenetic Tree Reconstruction
Phylogenetic Prediction Lecture II by Clarke S. Arnold March 19, 2002.
Building phylogenetic trees. Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances  UPGMA method (+ an example)
Introduction to Phylogenetics
Ch.6 Phylogenetic Trees 2 Contents Phylogenetic Trees Character State Matrix Perfect Phylogeny Binary Character States Two Characters Distance Matrix.
Algorithms in Computational Biology11Department of Mathematics & Computer Science Algorithms in Computational Biology Building Phylogenetic Trees.
Lecture 6A – Introduction to Trees & Optimality Criteria Branches: n-taxa -> 2n-3 branches 1, 2, 4, 6, & 7 are external (leaves) 3 & 5 are internal branches.
Introduction to Phylogenetic trees Colin Dewey BMI/CS 576 Fall 2015.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2015.
Maximum Likelihood Given competing explanations for a particular observation, which explanation should we choose? Maximum likelihood methodologies suggest.
Phylogeny Ch. 7 & 8.
Phylogenetic Trees - Parsimony Tutorial #13
1 CAP5510 – Bioinformatics Phylogeny Tamer Kahveci CISE Department University of Florida.
Probabilistic methods for phylogenetic tree reconstruction BMI/CS 576 Colin Dewey Fall 2015.
Probabilistic Approaches to Phylogenies BMI/CS 576 Sushmita Roy Oct 2 nd, 2014.
CSCE555 Bioinformatics Lecture 13 Phylogenetics II Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:
Phylogenetic trees. 2 Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features.
Phylogenetic Trees - Parsimony Tutorial #12
Phylogenetic basis of systematics
Character-Based Phylogeny Reconstruction
BNFO 602 Phylogenetics Usman Roshan.
BNFO 602 Phylogenetics – maximum parsimony
CS 581 Tandy Warnow.
Phylogeny.
Presentation transcript:

CISC667, F05, Lec14, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (I) Maximum Parsimony

CISC667, F05, Lec14, Liao2 –Mutation, selection, Only the Fittest Survive. –Speciation. At one extreme, a single gene mutation may lead to speciation. [Nature 425(2003)679] –Phylogeny: evolutionary relation among species, often represented as a tree structure. Evolution

CISC667, F05, Lec14, Liao3 Question: how to infer phylogeny? - Based on morphological features - Based on molecular features Gene trees Phylogenetic trees (using 16s rRNA) Criteria for selecting features –Ubiquitous –Relatively stable Reconciliation between gene trees and species trees –Orthology genes

CISC667, F05, Lec14, Liao4 Trees (binary) –Unrooted vs rooted –Leaves versus internal nodes –For an unrooted tree with n leaves # of nodes (including leaves) is 2n – 2. # of edges is 2n -3 Can lead to 2n-3 rooted trees, by adding a root at any edge.

CISC667, F05, Lec14, Liao5 For example,

CISC667, F05, Lec14, Liao6 How many different configurations can a tree of n leaves have? Assume the tree is unrooted. Grow the tree by adding one leaf at a time n = 2, there is 1 edge to break. n = 3, there are 3 edges to break => 3 different configurations n = 4, there are 5 edges to break => 5 different configurations … n = n, there are (2n-3) edges to break => (2n-5) 1·3 ·5 ·7 ·… ·(2n-3) = (2n-3)!! The number of possible configurations as a function of the tree size increases very fast.

CISC667, F05, Lec14, Liao7 Parsimony –Based on sequence alignment. –Assign a cost to a given tree –Search through the topological space of all trees for the best tree: the one that has the lowest cost. For example, given an alignment of four sequences AAG AAA GGA AGA If the number of mutations is used as a measure of cost, then the leftmost tree in the following is the best tree. AAGAGAAAAGGA AAAAGA AAA AAGGGAAAAAGA AAAAGA AAA AAGAAAGGAAGA AAAAGA AAA

CISC667, F05, Lec14, Liao8 Algorithm: traditional parsimony [Fitch 1971] // given an alignment A, and a tree T with leaves labeled. // each position in A is treated independently C = 0; // the total cost for (u = 1 to |A|) { // u is the position index into the alignment A initialization: set C u = 0 and k = 2n -1 // C u is the cost and k is the node index recursion: to obtain the set R k // contains candidate residues assigning to node k if k is leaf node: set R k = x u // residue at position u else compute R i, R j for the daughter nodes i, j of k if (R i  R j ) is not empty: set R k = R i  R j else set R k = R i  R j C u = C u + 1 termination: C = C + C u } minimal cost of tree = C. AA A G {A} {A,G} {A} AAGAAAGGAAGA

CISC667, F05, Lec14, Liao9 Trackback phase: –Randomly choose a residue from R 2n-1 and proceed down the tree. –if a residue is chosen from the set R k Choose the same residue from the daughter set R i if possible, otherwise pick a residue at random from R i. Choose the same residue from the daughter set R j if possible, otherwise pick a residue at random from R j. For example, A B A {A} {A,B} 2 B B A A A A 2 B A X X B A A A B 2 B X X B A B B B 2 B X X A A Traceback cannot find this tree, though it is equally optimal as the other two trees.

CISC667, F05, Lec14, Liao10 Algorithm: Weighted parsimony [Sankoff & Cedergren 1983] // given an alignment A, a tree T with the leaves labeled, and a substitute matrix S. // each position in A is treated independently C = 0; // the total cost for (u = 1 to |A|) { // u is the position index into the alignment A initialization: set k = 2n -1 // k is the node index, currently pointing to the root recursion: Compute S k (a) // the minimal cost for assigning residue a to node k if k is leaf node: if a = x u k then S k (a) = 0 else S k (a) =  else // k is not a leaf node compute S i (a), S j (a) for all a at the daughter nodes i, j of k set S k (a) = min b [S i (b) +S(a,b)] + min b [S j (b) +S(a,b)] set l k (a) = argmin b [S i (b) +S(a,b)], r k (a) = argmin b [S j (b) +S(a,b)]. // for traceback termination: C = C+ min a S 2n-1 (a). } minimal cost of tree = C. A B A {1,2} {1,1} {2,2} 2 B

CISC667, F05, Lec14, Liao11 Both algorithms run in O(nm), where n is number of sequences and m is the sequence length in terms of number of residues. Weighted parsimony, when using S(a,a) = 0 for all a and S(a,b) = 1 for all a ≠ b, gives the same cost as that for the traditional parsimony. Traceback in weighted parsimony can find assignments that are missed in the traditional parsimony. The cost from the traditional parsimony is independent of the position for the root node. Therefore, the cost can be computed using unrooted trees. Still the number trees to search using parsimony grows huge as the number of leaves increases. It is proved that finding the most parsimonious tree is an NP- hard problem. Branch-and-bound –Guarantee to find the optimal tree –Worse-case complexity is the same as exhaustive search.

CISC667, F05, Lec14, Liao12 Assessing the trees: the bootstrap “Plug-in” sampling with replacement –Given an alignment of, say, one hundred columns. –Randomly select one column from the original alignment as the first column, and repeat this process until one hundred columns are selected forming a new alignment of one hundred columns. –Use this artificially created alignment for parsimony analysis, a new tree is found. –Repeat this whole process many times (say 1000). –The frequency with which a chosen phylogenetic feature appears is used as a measure of the confidence we have in this feature.

CISC667, F05, Lec14, Liao13 Software packages and databases for phylogenetic trees Phylip by Felsenstein ( PAUP ( MacClad ( TreeBase (