1
Perfect Phylogeny; MLE for Phylogeny (Lecture 14)
Based on: Setubal & Meidanis 6.2, Durbin et al. 8.1.
2
Some announcements: The final exam will take place on Friday, , 0900, at Taub 8. Allowed material: course and tutorial slides, plus the textbooks of the course (Durbin et al., Setubal & Meidanis, Gusfield). Lab offered next semester: algorithms for constructing phylogenetic trees:
3
2. The perfect phylogeny problem
A character is assumed to be a property which distinguishes between species (e.g. dental structure). A character state is a value of the character (e.g. the human dental structure). Problem: Given a set of species, specified by their characters, reconstruct their evolutionary tree.
4
The Perfect Phylogeny Problem (pure graph theoretic setting)
Input: Partial colorings (C1,…,Ck) of a set of vertices U (in the example: 3 total colorings, left, center and right, each using two colors; the vertices are RBR, RRR, BBR, RRB). Problem: Is there a tree T=(V,E) such that U ⊆ V and, for i=1,…,k, Ci is a convex (partial) coloring of T? The problem is NP-hard in general, and in P for some special cases.
5
Perfect Phylogeny for directed binary characters
Input: a matrix whose rows correspond to objects (species) and whose columns correspond to characters. Each character has two states: 0 (absent) or 1 (present). Question: Is there a directed perfect phylogeny tree for the given species, in which all the characters have value 0 at the root (00000)?
    C1 C2 C3 C4 C5
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  0
(Figure: a tree rooted at 00000 with leaves E (01000), B (00100), D (00110), A (11000), C (11001).)
6
Perfect Phylogeny for directed binary characters
By definition, for each character C there is one edge on which it changes from 0 to 1. In the tree below, the edge on which character C2 changes to 1 is marked. The resulting tree is convex for this character.
(Figure: the matrix and its tree, with the edge on which C2 changes to 1 marked and the leaves below it labeled 1.)
7
Perfect Phylogeny for directed binary characters
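A tree is a directed perfect phylogeny for a given 0-1 matrix M iff each character Ci can be mapped to an edge such that the edge labeled Ci represents the change of Ci's state from 0 to 1, so that the characters of each object are exactly the labels on the path from the root to its leaf. Below we show such a tree for the given matrix:
(Figure: the matrix and a tree with leaves A, B, C, D, E whose edges are labeled C1, C2, C3, C4, C5.)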
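The edge-labeling condition can be checked mechanically. The sketch below is only an illustration: the tree is stored as a child-to-parent dictionary, each internal node is named after the character labeling the edge that enters it (a convention assumed here, not taken from the slides), and the data mirror the leaf vectors of the example (A = 11000, B = 00100, C = 11001, D = 00110, E = 01000).

```python
# Illustrative data: child -> parent. An internal node is named after the
# character labeling the edge that enters it (an assumed naming convention).
PARENT = {
    "C2": "root", "C3": "root", "C1": "C2", "C5": "C1", "C4": "C3",
    "A": "C1", "B": "C3", "C": "C5", "D": "C4", "E": "C2",
}
# Characters with state 1 for each object (the rows of the 0-1 matrix).
ONES = {
    "A": {"C1", "C2"}, "B": {"C3"}, "C": {"C1", "C2", "C5"},
    "D": {"C3", "C4"}, "E": {"C2"},
}

def characters_on_path(parent, leaf):
    """Collect the edge labels met while walking from a leaf up to the root."""
    labels = set()
    node = parent[leaf]
    while node != "root":
        labels.add(node)
        node = parent[node]
    return labels

def is_directed_perfect_phylogeny(parent, ones):
    """True iff every object's 1-characters are exactly its root-path labels."""
    return all(characters_on_path(parent, leaf) == chars
               for leaf, chars in ones.items())

print(is_directed_perfect_phylogeny(PARENT, ONES))  # True
```

Because every character names exactly one node, each character changes from 0 to 1 on exactly one edge by construction.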
8
Efficient algorithm for the Binary Perfect Phylogeny Problem
Definition: Given a 0-1 matrix M, let Ok = {j : Mjk = 1}, i.e. Ok is the set of objects that have character Ck. Theorem: M has a perfect phylogenetic tree iff the sets {Oi} are laminar, i.e. for all i, j, either Oi and Oj are disjoint, or one includes the other.
(Figure: two example matrices over characters C1 to C5 and objects A to E, one laminar and one not laminar.)
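A direct translation of the definition into code may help. This is a minimal sketch that tests every pair of columns, so it is slower than the O(mn) method of the later slides; the matrix M is an illustrative example consistent with the leaf vectors 11000, 00100, 11001, 00110, 01000, and the function names are hypothetical.

```python
from itertools import combinations

def object_sets(matrix):
    """Return O_k for every column k: the set of objects whose entry is 1."""
    n_cols = len(next(iter(matrix.values())))
    return [{name for name, row in matrix.items() if row[k] == 1}
            for k in range(n_cols)]

def is_laminar(sets):
    """True iff every pair of sets is disjoint or nested."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(sets, 2))

# Illustrative matrix: rows are objects, columns are characters C1..C5.
M = {
    "A": [1, 1, 0, 0, 0],
    "B": [0, 0, 1, 0, 0],
    "C": [1, 1, 0, 0, 1],
    "D": [0, 0, 1, 1, 0],
    "E": [0, 1, 0, 0, 0],
}
print(is_laminar(object_sets(M)))  # True

# Giving B the character C2 makes O2 = {A, B, C, E} overlap O3 = {B, D}.
M_bad = dict(M, B=[0, 1, 1, 0, 0])
print(is_laminar(object_sets(M_bad)))  # False
```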
9
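Proof: Assume M has a perfect phylogeny, and let Ci, Cj be given.
Consider the edges labeled Ci and Cj. Case 1: There is a root-to-leaf path containing both edges. Then one of Oi, Oj includes the other, because the leaves below the lower edge are also below the upper edge (C2 and C1 in the figure). Case 2: There is no such path. Then Oi and Oj are disjoint (C2 and C3 in the figure).
(Figure: the example tree, with edges labeled C1 to C5 and leaves A to E.)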
10
Proof (cont.): Assume that for all i, j, either Oi and Oj are disjoint, or one includes the other. We prove by induction on the number of characters that M has a perfect phylogenetic tree. Basis: one character. Then there are at most two kinds of objects, those with and those without this character.
(Figure: a one-column matrix in which A has C1 = 1 and B does not, and its tree, in which the edge labeled C1 leads to A.)
11
Proof (cont.): Induction step: Assume correctness for n-1 characters, and consider a matrix with n characters (non-zero columns). WLOG assume that O1 is not properly contained in Oj for any j > 1. Let S1 be the set of objects j for which Mj1 = 1 (so S1 = O1), and let S2 be the remaining objects. Then each of the other characters belongs to objects in S1 or in S2, but not both (prove!). By induction there are trees T1 and T2 for S1 and S2. Combining them as in the figure gives the desired tree.
(Figure: for the example matrix, S1 = {A, C, E} and S2 = {B, D}; an edge labeled 1 leads from the root to T1, and T2 hangs from the root.)
12
Efficient Implementation
1. Sort the columns (characters) by decreasing value when read as binary numbers (time complexity: O(mn), using radix sort). Claim: If the binary value of column i is larger than that of column j, then Oi is not a proper subset of Oj. Proof: At the most significant row in which the two columns differ, column i has 1 and column j has 0, so that object belongs to Oi but not to Oj.
(Figure: the example matrix before sorting, with columns C1 C2 C3 C4 C5, and after sorting, with columns C2 C1 C3 C5 C4.)
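As a rough illustration of step 1, the sketch below reads each column top to bottom as a binary number and sorts by decreasing value. It uses Python's built-in comparison sort rather than the radix sort named on the slide, so it does not reach the O(mn) bound; the matrix and the name `sort_columns` are illustrative assumptions.

```python
def sort_columns(rows, col_names):
    """Order columns by decreasing binary value, reading each column top to bottom."""
    def value(k):
        bits = "".join(str(row[k]) for row in rows)
        return int(bits, 2)
    order = sorted(range(len(col_names)), key=value, reverse=True)
    sorted_names = [col_names[k] for k in order]
    sorted_rows = [[row[k] for k in order] for row in rows]
    return sorted_names, sorted_rows

# Illustrative matrix: rows A, B, C, D, E; columns C1..C5.
ROWS = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [0, 1, 0, 0, 0],  # E
]
names, rows = sort_columns(ROWS, ["C1", "C2", "C3", "C4", "C5"])
print(names)  # ['C2', 'C1', 'C3', 'C5', 'C4']
```

The column values here are 20, 21, 10, 2 and 4 for C1 to C5, which gives the order C2, C1, C3, C5, C4.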
13
Efficient Implementation(2)
2. Make a backwards linked list of the 1's in each row (the leftmost 1 in each row points at itself). Time complexity: O(mn). Claim: If the columns are sorted, then the set of columns is laminar iff, for each column i, all the links leaving column i point at the same column. This can be checked in O(mn) time.
(Figure: the sorted matrix, with columns C2 C1 C3 C5 C4, and its backward links.)
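The link test of step 2 might be sketched as follows, assuming the columns are already sorted by decreasing binary value. Links are stored per column as the set of column indices they point at; the function names and the two small matrices are illustrative.

```python
def backward_links(sorted_rows):
    """links[c] = set of columns that the 1's in column c point back to.

    Each 1 points at the previous 1 in its row; the leftmost 1 points at itself.
    """
    links = [set() for _ in sorted_rows[0]]
    for row in sorted_rows:
        prev = None
        for c, bit in enumerate(row):
            if bit == 1:
                links[c].add(c if prev is None else prev)
                prev = c
    return links

def laminar_by_links(sorted_rows):
    """True iff all links leaving each column point at the same column."""
    return all(len(targets) <= 1 for targets in backward_links(sorted_rows))

# Illustrative matrix with columns already sorted as C2 C1 C3 C5 C4.
SORTED_ROWS = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 1, 0],  # C
    [0, 0, 1, 0, 1],  # D
    [1, 0, 0, 0, 0],  # E
]
print(backward_links(SORTED_ROWS))    # [{0}, {0}, {2}, {1}, {2}]
print(laminar_by_links(SORTED_ROWS))  # True

# Two overlapping, non-nested columns: the links of column 1 disagree.
print(laminar_by_links([[1, 1], [1, 0], [0, 1]]))  # False
```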
14
Examples
(Figure: two matrices over objects A to E with their backward links, one not laminar and one laminar.)
15
Efficient Implementation(3)
3. When the matrix is laminar, the tree edges corresponding to characters are defined by the backward links in the matrix. The remaining edges and the leaves are determined by the characters of each object. This needs O(mn) time.
(Figure: the sorted matrix, with columns C2 C1 C3 C5 C4, and the resulting tree with edges labeled C1 to C5 and leaves A to E.)
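One possible way to turn the links into a tree is sketched below, assuming the matrix is laminar and its columns are sorted. The tree is returned as a child-to-parent dictionary in which the node entered by the edge labeled Ck is itself named Ck; this naming, like the function name, is an assumption made for the illustration.

```python
def build_tree(sorted_rows, col_names, row_names):
    """Return {child: parent}; assumes a laminar matrix with sorted columns."""
    parent = {}
    for name, row in zip(row_names, sorted_rows):
        prev = "root"
        for c, bit in enumerate(row):
            if bit == 1:
                # The backward link of this 1 gives the parent of the edge's node.
                parent[col_names[c]] = prev
                prev = col_names[c]
        # The object hangs as a leaf below its rightmost character.
        parent[name] = prev
    return parent

# Illustrative sorted matrix (columns C2 C1 C3 C5 C4, rows A..E).
SORTED_ROWS = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 1, 0],  # C
    [0, 0, 1, 0, 1],  # D
    [1, 0, 0, 0, 0],  # E
]
tree = build_tree(SORTED_ROWS, ["C2", "C1", "C3", "C5", "C4"], list("ABCDE"))
for child in sorted(tree):
    print(child, "<-", tree[child])
```

In this example the root has two character edges, C2 and C3; C1 lies below C2, C5 below C1, and C4 below C3.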
16
A scenario where Maximum Parsimony (and Perfect Phylogeny) are misleading
Consider a model with 4 letters (DNA), where the probability of a substitution is proportional to time. In the following topology, leaves 2 and 3 are likely to carry the same letter as the origin, but leaves 1 and 4 are likely to differ from it. In this case, the Maximum Parsimony principle may be useless or misleading.
(Figure: a four-leaf tree with leaves 1, 2, 3, 4; the origin and leaves 2 and 3 carry the letter A.)
17
Parsimony may be useless/misleading
Assume the (likely) scenario where leaves 2 and 3 are the same. There are 4 combinations of substitutions for leaves 1 and 4 (I, II, III: uninformative; IV: misinformative). In the first three, all three topologies obtain the same parsimony score. In the fourth, a wrong topology scores best.
18
Case I: Parsimony is useless
(Figure: all four leaves carry A; the three possible topologies on leaves 1, 2, 3, 4 are shown.) Score=0 Score=0 Score=0
19
Case II: Parsimony is useless
(Figure: one of leaves 1 and 4 carries G, the other leaves carry A; the three possible topologies are shown.) Score=1 Score=1 Score=1
20
Case III: Parsimony is useless
(Figure: leaves 1 and 4 carry two different letters, G and C, and leaves 2 and 3 carry A; the three possible topologies are shown.) Score=2 Score=2 Score=2
21
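Case IV: Parsimony is misleading
(Figure: leaves 1 and 4 both carry C, and leaves 2 and 3 carry A; the three possible topologies are shown.) Score=2 Score=2 Score=1 (the topology that pairs leaves 1 and 4 scores 1)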
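The scores of the four cases can be reproduced with a small Fitch-style count on the three four-leaf topologies. This is a sketch under stated assumptions: the letters given to leaves 1 and 4 in each case are read off the case descriptions (which of the two leaves carries G in cases II and III is chosen arbitrarily), and `quartet_score` is a hypothetical helper name.

```python
def quartet_score(split, states):
    """Fitch parsimony score of the unrooted quartet ((a,b),(c,d)) for one site."""
    score = 0
    sides = []
    for x, y in split:
        sx, sy = {states[x]}, {states[y]}
        if sx & sy:
            sides.append(sx & sy)      # the pair agrees: no change needed
        else:
            sides.append(sx | sy)      # the pair disagrees: one change
            score += 1
    if not (sides[0] & sides[1]):      # a change on the central edge
        score += 1
    return score

TOPOLOGIES = [((1, 4), (2, 3)), ((1, 2), (3, 4)), ((1, 3), (2, 4))]
CASES = {
    "I":   {1: "A", 2: "A", 3: "A", 4: "A"},
    "II":  {1: "G", 2: "A", 3: "A", 4: "A"},
    "III": {1: "G", 2: "A", 3: "A", 4: "C"},
    "IV":  {1: "C", 2: "A", 3: "A", 4: "C"},
}
for name, states in CASES.items():
    print(name, [quartet_score(t, states) for t in TOPOLOGIES])
# I [0, 0, 0]
# II [1, 1, 1]
# III [2, 2, 2]
# IV [1, 2, 2]
```

Only in case IV do the topologies differ, and the one that scores best is the one pairing leaves 1 and 4.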
22
Parsimony is correct only in rare cases
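Parsimony will infer the correct topology only in the rare case of a change on the central edge, or in the even rarer case of a parallel change from A to C on the pendant edges to 1 and 2.
(Figure: two four-leaf trees on leaves 1, 4, 3, 2, one with a change on the central edge and one with parallel changes from A to C.)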
23
3. Maximum Likelihood Approach
Consider the phylogenetic tree to be a stochastic process. A simple model assumes that on each edge, the likelihood of a transition from character a to character b is given by a parameter b|a. The likelihood of a letter a at the root is qa. Given the complete tree, its probability is defined by the values of the b|a's and the qa's.
(Figure: a tree over the sequences AGA, GGA, AAA, AAG.)
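For a single site, the probability of a complete tree (all nodes labeled) under this model is the root term qa multiplied by one b|a term per edge. The sketch below illustrates that product; the dictionary `trans[(b, a)]` stands in for the b|a parameters, and the numeric values (0.7 for no change, 0.1 for each change, uniform q) are made up for the example.

```python
def complete_tree_probability(states, edges, q, trans):
    """Probability of a fully labeled tree for one site.

    states: node -> letter; edges: list of (parent, child) pairs;
    q[a]: root probability of a; trans[(b, a)]: probability of a -> b on an edge.
    """
    children = {child for _, child in edges}
    roots = [node for node in states if node not in children]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    prob = q[states[roots[0]]]
    for parent, child in edges:
        prob *= trans[(states[child], states[parent])]
    return prob

# Hypothetical parameter values, chosen only so that each row sums to 1.
LETTERS = "ACGT"
q = {a: 0.25 for a in LETTERS}
trans = {(b, a): (0.7 if a == b else 0.1) for a in LETTERS for b in LETTERS}

states = {"root": "A", "x": "A", "y": "G"}
edges = [("root", "x"), ("root", "y")]
print(complete_tree_probability(states, edges, q, trans))  # about 0.0175
```

Here the value is 0.25 × 0.7 × 0.1. For sequences of several sites, one would typically multiply the per-site values, assuming the sites evolve independently.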
24
Maximum Likelihood Approach(2)
When the data consists only of the leaf sequences (but the topology is fixed): write down the likelihood of the data (leaf sequences) given the tree, and use EM to estimate the b|a parameters. When the tree is not given: search for the tree that maximizes Prob(data | tree, parameters estimated by EM).
(Figure: a tree with leaf sequences AGA, GGA, AAA, AAG.)
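When only the leaves are observed, the likelihood of one site sums the complete-tree probability over all labelings of the internal nodes. A standard way to compute this sum is a bottom-up recursion (often called pruning), sketched below for fixed parameters; the EM estimation step is not shown. The tree encoding (nested tuples with single-letter leaves), the function names and the parameter values are assumptions for the illustration.

```python
ALPHABET = "ACGT"

def conditional(node, trans):
    """Return {a: P(observed leaves below node | node has letter a)}."""
    if isinstance(node, str):                 # a leaf holds its observed letter
        return {a: (1.0 if a == node else 0.0) for a in ALPHABET}
    result = {a: 1.0 for a in ALPHABET}
    for child in node:
        below = conditional(child, trans)
        for a in ALPHABET:
            result[a] *= sum(trans[(b, a)] * below[b] for b in ALPHABET)
    return result

def site_likelihood(tree, q, trans):
    """Likelihood of the leaf letters of one site, summed over internal letters."""
    cond = conditional(tree, trans)
    return sum(q[a] * cond[a] for a in ALPHABET)

# Hypothetical parameters: trans[(b, a)] is the probability of a -> b on an edge.
q = {a: 0.25 for a in ALPHABET}
trans = {(b, a): (0.7 if a == b else 0.1) for a in ALPHABET for b in ALPHABET}

# One site of a four-leaf tree with an assumed topology ((leaf, leaf), (leaf, leaf)).
print(site_likelihood((("A", "G"), ("A", "A")), q, trans))
```

A search over topologies would call such a function once per candidate tree, after fitting the parameters for that tree.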