Phylogenetic Trees (2), Lecture 13. Based on: Durbin et al. 7.4, Gusfield 17.1-17.3, Setubal & Meidanis 6.1.


2 Character-based methods for constructing phylogenies
In this approach, trees are constructed by comparing the characters of the corresponding species. Characters may be morphological (e.g. tooth structure) or molecular (e.g. nucleotides in homologous DNA sequences). One common approach is Maximum Parsimony.
Common assumptions:
- Independence of characters (no interactions).
- The best tree is the one that requires the fewest changes.

3 Character-based methods: Input data
species  characters C1 C2 C3 C4 … Cm
dog      AACAGGTCTTCGAGGCCC
horse    AACAGGCCTATGAGACCC
frog     AACAGGTCTTTGAGTCCC
human    AACAGGTCTTTGATGACC
pig      AACAGTTCTTCGATGGCC
         *****  **  **   **
(* marks the columns that are identical in all five species.)
Each character (column) is processed independently. The green character (column 14 of the sequences as transcribed) separates human and pig from frog, horse and dog. The red character (column 11) separates dog and pig from frog, horse and human. We seek a tree that best explains all characters simultaneously.
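To make the idea of a column splitting the species concrete, here is a minimal sketch that groups the five species by their state at one column. The helper name `split_by_character` and the 0-based column index are assumptions made for this illustration; they do not come from the lecture.

```python
from collections import defaultdict

# Sequences as transcribed on the slide, one string per species.
SEQS = {
    "dog":   "AACAGGTCTTCGAGGCCC",
    "horse": "AACAGGCCTATGAGACCC",
    "frog":  "AACAGGTCTTTGAGTCCC",
    "human": "AACAGGTCTTTGATGACC",
    "pig":   "AACAGTTCTTCGATGGCC",
}

def split_by_character(seqs, col):
    """Group species by the state they show at 0-based column `col`."""
    groups = defaultdict(set)
    for species, seq in seqs.items():
        groups[seq[col]].add(species)
    return dict(groups)

# Column 14 (index 13): human and pig versus dog, horse and frog.
print(split_by_character(SEQS, 13))
# Column 11 (index 10): dog and pig versus horse, frog and human.
print(split_by_character(SEQS, 10))
```

The two splits disagree about where pig belongs, which is why a tree has to be chosen that explains all columns at once.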

4 1. Maximum Parsimony
A character-based method.
Input:
- h sequences (one per species), all of length k.
Goal:
- Find a tree with the input sequences at its leaves, and an assignment of sequences to internal nodes, such that the total number of substitutions is minimized.

5 Example
Input: four nucleotide sequences, AAG, AAA, GGA and AGA, taken from four species.
By the parsimony principle, we seek a tree that minimizes the total number of symbol substitutions between species and their ancestors in the phylogenetic tree. Here is one possible tree.
[Figure: a tree with the four input sequences at its leaves and sequences assigned to its internal nodes. Total #substitutions = 4.]

6 Example Continued
There are many possible assignments of sequences to the internal nodes of this tree. For example:
[Figure: two labeled trees on the same leaves. Left: total #substitutions = 3. Right: total #substitutions = 4.]
The left tree is preferred over the right tree. The total number of changes is called the parsimony score.
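The parsimony score of a fully labeled tree is just the sum of per-edge differences, which the sketch below computes. The topology in `EDGES` and the two internal labelings are hypothetical (the slide's figure is not recoverable from the transcript); only the four leaf sequences come from the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def parsimony_score(edges, labels):
    """Total number of substitutions over all (parent, child) edges."""
    return sum(hamming(labels[u], labels[v]) for u, v in edges)

# Hypothetical topology: root -> x, y; x -> s1, s2; y -> s3, s4.
EDGES = [("root", "x"), ("root", "y"),
         ("x", "s1"), ("x", "s2"),
         ("y", "s3"), ("y", "s4")]
LEAVES = {"s1": "AAG", "s2": "AAA", "s3": "GGA", "s4": "AGA"}

# Two different assignments of sequences to the internal nodes.
labeling_1 = {**LEAVES, "root": "AAA", "x": "AAA", "y": "AGA"}
labeling_2 = {**LEAVES, "root": "AAA", "x": "AAA", "y": "GGA"}

print(parsimony_score(EDGES, labeling_1))  # 3
print(parsimony_score(EDGES, labeling_2))  # 4
```

On this assumed topology the first labeling is the more parsimonious of the two, mirroring the 3 versus 4 comparison in the example.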

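7 Example With One-Letter Sequences
- Suppose we have five species, three of which have ‘C’ and two of which have ‘T’ at a specified position.
- A minimal tree has only one evolutionary change.
[Figure: a tree whose vertices are labeled C or T, with a single edge carrying the change between T and C.]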

8 Parsimony-Based Reconstruction
Two separate components:
1. A procedure to find the minimum number of changes needed to explain the data for a given tree topology, where species are assigned to leaves.
2. A search through the space of trees.
We will see efficient algorithms for (1); (2) is hard.

9 Example of Input for a Given Tree
A (Aardvark): CAGGTA
B (Bison):    CAGACA
C (Chimp):    CGGGTA
D (Dog):      TGCACT
E (Elephant): TGCGTA
The tree and the assignment of strings to its leaves are given; we need only assign strings to the internal vertices.

10 Fitch Algorithm: Maximum Parsimony for a Given Tree
Input: a rooted binary tree with characters at the leaves.
Output: a most parsimonious assignment of states to internal vertices.
Work on each position independently. Make one pass from the leaves to the root, and another pass from the root to the leaves.
[Figure: example tree with leaf states A, C and T, and candidate sets such as A/T and A/C at internal vertices.]

11 Fitch’s Algorithm, in more detail
1. Traverse the tree from leaves to root, and fix a set of possible states (e.g. nucleotides) for each internal vertex.
2. Traverse the tree from root to leaves, and pick a unique state for each internal vertex.

12 Fitch’s Algorithm: Phase 1
Do a post-order (from leaves to root) traversal of the tree, and assign to each vertex a set of possible states. Each leaf has a unique possible state, given by the input.
The set of possible states R_i of an internal node i with children j and k is given by:
R_i = R_j ∩ R_k if R_j ∩ R_k ≠ ∅, and R_i = R_j ∪ R_k otherwise.

13 Fitch’s Algorithm: Phase 1
Claim (to be proved soon): # of substitutions in an optimal solution = # of union operations.
[Figure: example tree with leaf states T, C, A and G, and the state sets computed at its internal vertices.]
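The first phase can be sketched as follows, using the standard Fitch rule: a vertex gets the intersection of its children's sets when that intersection is non-empty, and their union otherwise. The tree encoding (a dict from internal vertex to its two children), the function name `fitch_phase1`, and the five-leaf example are assumptions made for this illustration.

```python
def fitch_phase1(tree, leaf_state, root):
    """Post-order pass of Fitch's algorithm for one character.

    tree: dict mapping each internal vertex to a (left, right) pair.
    leaf_state: dict mapping each leaf to its observed state.
    Returns (sets, unions): the candidate state set of every vertex and
    the number of union operations performed.
    """
    sets = {}
    unions = 0

    def visit(v):
        nonlocal unions
        if v not in tree:                      # leaf: single known state
            sets[v] = {leaf_state[v]}
            return
        left, right = tree[v]
        visit(left)
        visit(right)
        common = sets[left] & sets[right]
        if common:                             # children agree on some state
            sets[v] = common
        else:                                  # disagreement: one change
            sets[v] = sets[left] | sets[right]
            unions += 1

    visit(root)
    return sets, unions

# Hypothetical example tree: root -> (u, v), u -> (a, b), v -> (c, w), w -> (d, e).
TREE = {"root": ("u", "v"), "u": ("a", "b"), "v": ("c", "w"), "w": ("d", "e")}
STATES = {"a": "A", "b": "A", "c": "C", "d": "T", "e": "T"}

sets, unions = fitch_phase1(TREE, STATES, "root")
print(sets["v"], sets["root"], unions)
```

Here two unions are performed (at v and at the root), matching the fact that three distinct leaf states need at least two changes.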

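14 Fitch’s Algorithm: Phase 2
Do a pre-order (from root to leaves) traversal of the tree. Pick any state of the root's set for the root, then select the state r_j of an internal node j with parent i as follows:
r_j = r_i if r_i ∈ R_j, and otherwise r_j is an arbitrary state in R_j.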

15 Fitch’s Algorithm: Phase 2
[Figure: the example tree after phase 2, with one state selected at each internal vertex.]
The algorithm could also select C as the assignment to the root. All other assignments are unique.
Complexity: O(nk), where n is the number of leaves and k is the number of states. For m characters the complexity is O(nmk).
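Both phases together can be sketched as below. Phase 2 uses the simple rule of keeping the parent's state whenever it belongs to the child's set; the tie-breaking by `min`, the tree encoding and the function name `fitch` are choices made for this illustration, not part of the lecture.

```python
def fitch(tree, leaf_state, root):
    """Fitch's algorithm for one character on a rooted binary tree.

    tree: dict mapping each internal vertex to a (left, right) pair.
    Returns (state, cost): one state per vertex and the parsimony score.
    """
    # Order vertices so that every parent appears before its children.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        if v in tree:
            stack.extend(tree[v])

    # Phase 1: children before parents, build candidate sets.
    sets, cost = {}, 0
    for v in reversed(order):
        if v not in tree:
            sets[v] = {leaf_state[v]}
            continue
        left, right = tree[v]
        common = sets[left] & sets[right]
        if common:
            sets[v] = common
        else:
            sets[v] = sets[left] | sets[right]
            cost += 1

    # Phase 2: parents before children, pick one state per vertex.
    state = {root: min(sets[root])}            # any member of the root set works
    for v in order:
        if v in tree:
            for child in tree[v]:
                if state[v] in sets[child]:
                    state[child] = state[v]    # no change on this edge
                else:
                    state[child] = min(sets[child])
    return state, cost

# Hypothetical example tree and leaf states.
TREE = {"root": ("u", "v"), "u": ("a", "b"), "v": ("c", "w"), "w": ("d", "e")}
STATES = {"a": "A", "b": "A", "c": "C", "d": "T", "e": "T"}

state, cost = fitch(TREE, STATES, "root")
changes = sum(state[p] != state[c] for p in TREE for c in TREE[p])
print(state, cost, changes)
```

Counting the edges whose endpoints differ gives the same number as the union count from phase 1, which is the claim the proof establishes.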

16 Proof of Fitch’s Algorithm
We will show that Fitch’s algorithm minimizes the parsimony score for every character.
Definitions:
- For a leaf-labeled tree T, let T* be an optimal assignment of labels to the internal nodes of T, and let T*(v) be the label it assigns to internal node v.
- Let T_v be the subtree rooted at v.

17 Claim: The first phase of Fitch keeps at v the set of states S(v) such that s ∈ S(v) iff there exists an optimal assignment T_v* with T_v*(v) = s.
Proof: By induction on the tree height h.
Basis: h = 1.
I. If both children have the same state: zero changes.
II. Otherwise: exactly one change.
[Figure: children A and A under a parent labeled A (no change); children A and B under a parent labeled A or B (one change).]

18 Induction step: Assume the claim holds for height k; we prove it for height k+1.
Let p1 and p2 be the optimal costs of the subtrees rooted at v’s two children.
- If the intersection of the children’s lists is not empty, then the optimal score is p1 + p2, and it is achieved by labeling v with any member of the intersection, and only in this way.
- Otherwise, the optimal score is p1 + p2 + 1, and it is achieved by labeling v with any member of the union of the lists, and only in this way.
[Figure: two examples. Children lists {A,B} and {C,D} give {A,B,C,D}; children lists {A,B} and {B,C} give {B}.]

19 Generalization: Weighted Parsimony (Sankoff’s algorithm)
Weighted parsimony score:
- Each change from a to b is weighted by a score c(a,b).
- The weighted parsimony score reduces to the parsimony score when c(a,a) = 0 and c(a,b) = 1 for all b other than a.

20 Weighted Parsimony on a Given Tree
Each position is independent and is computed by itself, using dynamic programming.
S(j,b) is the optimal score of the subtree rooted at j when j has the character b.
- If i is a node with children j and k, then S(i,a) = min_b (S(j,b) + c(a,b)) + min_b’ (S(k,b’) + c(a,b’)).
[Figure: node i with children j and k, annotated with S(i,a), S(j,b) and S(k,b’).]

21 Evaluating Parsimony Scores
Dynamic programming on a given tree.
Initialization: for each leaf i, set S(i,a) = 0 if i is labeled by a; otherwise S(i,a) = ∞.
Iteration: if i is a node with children j and k, then S(i,a) = min_x (S(j,x) + c(a,x)) + min_y (S(k,y) + c(a,y)).
Termination: the cost of the tree is min_x S(r,x), where r is the root.
Comment: To reconstruct an optimal assignment, we need to keep, in each node i and for each character a, the two characters x and y that minimize the cost when i has character a.
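The recurrence above can be sketched as a short recursive function. The tree encoding, the function name `sankoff`, the example tree and the transition/transversion cost table are all assumptions made for this illustration; only the recurrence itself comes from the slide.

```python
import math

def sankoff(tree, leaf_state, root, alphabet, cost):
    """Weighted parsimony score of one character on a rooted binary tree.

    tree: dict mapping each internal vertex to a (left, right) pair.
    cost: function c(a, b) giving the weight of a change from a to b.
    Returns (best, S) where S[v][a] is the optimal score of the subtree
    rooted at v when v is assigned state a.
    """
    S = {}

    def visit(i):
        if i not in tree:
            # Leaf: 0 for the observed state, infinity for all others.
            S[i] = {a: (0 if a == leaf_state[i] else math.inf) for a in alphabet}
            return
        j, k = tree[i]
        visit(j)
        visit(k)
        S[i] = {
            a: min(S[j][x] + cost(a, x) for x in alphabet)
               + min(S[k][y] + cost(a, y) for y in alphabet)
            for a in alphabet
        }

    visit(root)
    return min(S[root].values()), S

def unit_cost(a, b):
    """c(a,a) = 0 and c(a,b) = 1: ordinary parsimony."""
    return 0 if a == b else 1

PURINES = {"A", "G"}

def ti_tv_cost(a, b):
    """Illustrative weights: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2

# Hypothetical example tree and leaf states.
TREE = {"root": ("u", "v"), "u": ("a", "b"), "v": ("c", "w"), "w": ("d", "e")}
STATES = {"a": "A", "b": "A", "c": "C", "d": "T", "e": "T"}

print(sankoff(TREE, STATES, "root", "ACGT", unit_cost)[0])
print(sankoff(TREE, STATES, "root", "ACGT", ti_tv_cost)[0])
```

With unit costs the result agrees with the unweighted parsimony score of the same tree; each vertex does O(k^2) work for an alphabet of size k, which is where the O(nk^2) bound comes from.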

22 Cost of Evaluating Parsimony for binary trees
For a tree with n nodes and a single character with k values, the complexity is O(nk^2). When there are m such characters, it is O(nmk^2).

23 2. Finding the right tree: The Perfect Phylogeny Problem
The algorithms of Fitch and Sankoff assume that the tree is known. Finding the optimal tree is harder.
Recall the general problem:
Input: A set of species, specified by strings of characters.
Output: A tree T, and an assignment of species to the leaves of T, with minimum parsimony score.
A restricted variant of this problem is the Perfect Phylogeny problem.

24 2. The Perfect Phylogeny Problem
Basic assumption for the perfect phylogeny problem: a character is a significant property which distinguishes between species (e.g. dental structure). Hence, characters in evolutionary trees should be “homoplasy-free”, as we define next.

25 Homoplasy-free characters 1
Characters in phylogenetic trees should avoid reversal transitions:
- A species regains a state its direct ancestor has lost.
- Well-known reversals:
  - Teeth in birds.
  - Legs in snakes.

26 Homoplasy-free characters 2
…and also avoid convergence transitions:
- Two species possess the same state while their least common ancestor possesses a different state.
- A well-known convergence: the marsupials.

28 Characters as Colorings
A coloring of a tree T=(V,E) is a mapping C: V → [set of colors].
A partial coloring of T is a mapping defined on a subset of the vertices U ⊆ V, that is, C: U → [set of colors].

29 Characters as Colorings (2)
Each character defines a (partial) coloring of the corresponding phylogenetic tree:
Species ≡ Vertices
States ≡ Colors

30 Convex Colorings (and Characters)
Let T=(V,E) be a colored tree, and let d be a color. The d-carrier is the minimal subtree of T containing all vertices colored d.
Definition: A (partial/total) coloring of a tree is convex iff the d-carriers of distinct colors are pairwise disjoint.
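A direct way to test convexity is to compute each d-carrier and check that no vertex belongs to two of them. The sketch below finds a carrier by repeatedly pruning leaves that are not colored d; the adjacency-list encoding, the function names `carrier` and `is_convex`, and the four-vertex path used as an example are assumptions made for this illustration.

```python
def carrier(adj, coloring, d):
    """Vertex set of the minimal subtree containing all vertices colored d.

    adj: dict mapping each vertex of the tree to a list of its neighbors.
    coloring: dict mapping some (or all) vertices to a color.
    """
    alive = set(adj)
    deg = {v: len(adj[v]) for v in adj}
    # Start from current leaves that are not colored d and prune inward.
    stack = [v for v in adj if deg[v] <= 1 and coloring.get(v) != d]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.discard(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
                if deg[w] <= 1 and coloring.get(w) != d:
                    stack.append(w)
    return alive

def is_convex(adj, coloring):
    """True iff the carriers of distinct colors are pairwise disjoint."""
    covered = set()
    for d in set(coloring.values()):
        c = carrier(adj, coloring, d)
        if covered & c:
            return False
        covered |= c
    return True

# Example tree: the path a - b - c - d.
PATH = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

print(is_convex(PATH, {"a": "R", "b": "R", "d": "B"}))  # True
print(is_convex(PATH, {"a": "R", "b": "B", "c": "R"}))  # False
```

In the second coloring the R-carrier is the path a, b, c, which runs through the B-colored vertex b, so the two carriers intersect.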

31 Convexity ⇔ Homoplasy Freedom
A character is homoplasy-free (avoids reversal and convergence transitions)
⇕
The corresponding (partial) coloring is convex.

32 The Perfect Phylogeny Problem
- Input: a set of species, and many characters.
- Question: is there a tree T containing the species as vertices, in which all the characters (colorings) are convex?

33 The Perfect Phylogeny Problem (pure graph-theoretic setting)
Input: Partial colorings (C_1,…,C_k) of a set of vertices U (in the example: 3 total colorings, left, center and right, each using two colors).
Problem: Is there a tree T=(V,E) such that U ⊆ V and, for i=1,…,k, C_i is a convex (partial) coloring of T?
[Figure: the vertex set U shown three times, colored red (R) and blue (B) by the three example colorings.]
The problem is NP-hard in general, and in P for some special cases. Next we show a polynomial-time algorithm for the case of binary characters.

34 Perfect Phylogeny for directed binary characters
Input: a matrix whose rows correspond to objects (species) and whose columns correspond to characters. Each character has two states: 0 (absent) or 1 (present). WLOG, for each character there is a species which possesses it.
Question: Is there a perfect phylogeny tree for the given species in which all the characters have value 0 at some specified internal vertex (the root)?
   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  0  1
D   0  0  1  1  0
E   0  1  0  0  0
[Figure: a tree rooted at (00000) with leaves A (11000), E (01000), D (00110), C (11001) and B (00100).]

35 Perfect Phylogeny for directed binary characters
By the definition, for each character C there is one edge on which it is converted from 0 to 1. In the tree below, the edge on which character C2 is converted to 1 is marked. The resulting tree is convex for this character.
Character C2: A=1, B=0, C=1, D=0, E=1.
[Figure: tree with leaves A, E, D, C, B; the edge on which C2 changes from 0 to 1 is marked.]

36 Directed Perfect Phylogeny for a 0-1 Matrix
Proof of the observation (sketch): we need to show that [I and II hold] ⇒ [each character is convex on T].
[I and II hold] ⇒ for each character C there is one edge on which it is converted from 0 to 1 ⇒ the species possessing each character C induce a connected subtree of T.
Character C2: A=1, B=0, C=1, D=0, E=1.
[Figure: tree with leaves A, E, D, C, B; the edge on which character C2 is converted to 1 is marked.]

37 The directed, binary Perfect Phylogeny Problem
A tree is a directed perfect phylogeny for a given 0-1 matrix M iff we can map each character to an edge such that the edge labeled by Ci represents the change of character Ci’s state from 0 to 1. Below we show such a tree for the given matrix:
   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  0  1
D   0  0  1  1  0
E   0  1  0  0  0
[Figure: tree with leaves A, E, D, C, B and edges labeled C1, C2, C3, C4, C5.]

38 Efficient algorithm for the Binary Perfect Phylogeny Problem
Definition: Given a 0-1 matrix M, let O_k = {j : M_jk = 1}; that is, O_k is the set of objects that have character Ck.
Theorem: M has a directed perfect phylogenetic tree iff the sets {O_i} are laminar, i.e. for all i, j, either O_i and O_j are disjoint, or one includes the other.
Laminar:
   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  0  1
D   0  0  1  1  0
E   0  1  0  0  0
Not laminar:
   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  1
C   1  1  0  0  1
D   0  0  1  1  0
E   0  1  0  0  1
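The laminarity condition can be checked directly by comparing every pair of sets O_i, O_j. The sketch below does this in the obvious quadratic way; the matrix encoding (object name mapped to a 0/1 string) and the function names `character_sets` and `is_laminar` are assumptions made for this illustration.

```python
def character_sets(matrix):
    """Return the list [O_1, ..., O_m]: for each column, the set of objects with a 1."""
    m = len(next(iter(matrix.values())))
    return [{obj for obj, row in matrix.items() if row[k] == "1"} for k in range(m)]

def is_laminar(sets):
    """True iff every two sets are disjoint or one includes the other."""
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            a, b = sets[i], sets[j]
            if a & b and not (a <= b or b <= a):
                return False
    return True

# The two matrices from the slide, one 0/1 string per object.
LAMINAR = {"A": "11000", "B": "00100", "C": "11001", "D": "00110", "E": "01000"}
NOT_LAMINAR = {"A": "11000", "B": "00101", "C": "11001", "D": "00110", "E": "01001"}

print(is_laminar(character_sets(LAMINAR)))      # True
print(is_laminar(character_sets(NOT_LAMINAR)))  # False
```

In the second matrix, O_1 = {A, C} and O_5 = {B, C, E} share C but neither contains the other, which is what breaks laminarity.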

39 Proof ⇒: Assume M has a directed perfect phylogeny, and let i, j be given. Consider the edges labeled i and j.
Case 1: There is a root-to-leaf path containing both edges. Then one of O_i, O_j includes the other (C2 and C1 below).
Case 2: Not case 1. Then O_i and O_j are disjoint (C2 and C3).
[Figure: tree with leaves A, E, D, C, B and edges labeled C1, C2, C3, C4, C5.]

40 Proof (cont.) ⇐: Assume that for all i, j, either O_i and O_j are disjoint, or one includes the other. We prove by induction on the number of characters that M has a perfect phylogenetic tree.
Basis: one character. Then there are at most two distinct objects, one with and one without this character.
Example: character C1 with A=1, B=0.
[Figure: the tree for objects A and B.]

41 Proof (cont.) ⇐: Induction step: Assume correctness for n-1 characters, and consider a matrix with n characters (non-zero columns). WLOG assume that O_1 is not contained in O_j for j > 1. Let S_1 be the set of objects j for which M_j1 = 1, and let S_2 be the remaining objects. Then for each character C, either all the objects possessing C are contained in S_1, or all of them are contained in S_2 (prove!). By induction there are trees T_1 and T_2 for S_1 and S_2. Combining them as below gives the desired tree.
   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  0  1
D   0  0  1  1  0
E   1  0  0  0  0
S_1 = {A, C, E}, S_2 = {B, D}
[Figure: T_1 and T_2 combined into one tree, with an edge labeled 1 (character C1).]
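The induction translates into a construction. The sketch below does not follow the proof step by step; it swaps in an equivalent and common shortcut: sort the characters by decreasing |O_k| and insert each object's characters, in that order, into a trie rooted at the all-zero vertex. The function name `build_perfect_phylogeny`, the integer vertex ids and the returned `(edges, where)` format are assumptions made for this illustration.

```python
def build_perfect_phylogeny(matrix):
    """Build a directed perfect phylogeny for a 0-1 matrix, or return None.

    matrix: dict mapping each object name to a 0/1 string (one char per character).
    Returns (edges, where): edges is a list of (parent, child, k) triples, meaning
    character k changes from 0 to 1 on that edge; where maps each object to the
    vertex that carries it. Vertex 0 is the all-zero root.
    """
    objects = sorted(matrix)
    m = len(matrix[objects[0]])
    sets = [{o for o in objects if matrix[o][k] == "1"} for k in range(m)]

    # A tree exists only if the sets are laminar.
    for i in range(m):
        for j in range(i + 1, m):
            a, b = sets[i], sets[j]
            if a & b and not (a <= b or b <= a):
                return None

    # Larger sets first, so an object's characters are met from the root down.
    order = sorted(range(m), key=lambda k: -len(sets[k]))

    child = {}          # (vertex, character) -> child vertex
    edges, where = [], {}
    next_id = 1
    for o in objects:
        node = 0
        for k in order:
            if matrix[o][k] == "1":
                if (node, k) not in child:
                    child[(node, k)] = next_id
                    edges.append((node, next_id, k))
                    next_id += 1
                node = child[(node, k)]
        where[o] = node
    return edges, where

# Matrix from the induction-step slide (characters are 0-based in the output).
M = {"A": "11000", "B": "00100", "C": "11001", "D": "00110", "E": "10000"}
edges, where = build_perfect_phylogeny(M)
print(edges)
print(where)
```

Each character labels exactly one edge. An object may end up at an internal vertex (E here); if all species must be leaves, a pendant leaf can be attached to that vertex without changing any character.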