Perfect Phylogeny Tutorial #10


1 Perfect Phylogeny Tutorial #10
© Ilan Gronau. Original slides by Shlomo Moran.

2 Perfect Phylogeny The underlying model:
A character-vector is given for every species in S. Each character represents some observable trait and takes values from a finite set. Basic underlying assumption: characters are homoplasy-free.

3 Homoplasy-Free Characters
Homoplasy-free means no reversals and no convergence. Homoplasy-free characters induce a convex coloring of the phylogenetic tree.
The Perfect Phylogeny Problem: given character-vectors for S, find (if one exists) a phylogenetic tree T over S (S is the leaf-set of T), together with convex character assignments to all vertices of T.
This problem is generally NP-hard.

4 Directed Binary Perfect Phylogeny
Directed binary characters: 1 means the property exists, 0 means the property doesn't exist. Initially (at the root) all properties do not exist.
Input: a binary coloring (C1,…,Cm) of a set S (an n×m binary matrix M).
Problem: find a phylogenetic tree T over S (if one exists), such that for j=1,…,m the partial coloring induced by Cj is convex in T, and the root has state 0 in all characters.
We will present a polynomial-time solution.

5 Example
[Figure: input matrix (n species A–E as rows, m characters C1, C2, C3, … as columns) and a possible output tree with a zero root. Vertex labels shown: (00000), (11000), (01000), (00100); leaf vectors shown: (11000), (00100), (01000), (00110), (11001).]

6 An Important Observation
A tree is a directed perfect phylogeny for a given 0/1 matrix iff we can map each character to an edge/vertex on which this character was “turned on”.
[Figure: example 0/1 matrix (rows A–E, columns C1–C5) next to a tree with leaves A–E whose edges are labeled C1–C5; one edge is marked “origin of C2”.]

7 Laminar Matrices Definitions:
Oj: the set of objects that have character Cj (Oj = {i : Mij = 1}).
A collection of sets {S1,…,Sk} is laminar if for all i, j, either Si and Sj are disjoint, or one includes the other.
Theorem: A binary matrix M has a perfect phylogenetic tree iff the collection {O1,…,Om} is laminar.
[Figure: two example matrices (rows A–E, columns C1–C5), one labeled “Laminar” and one labeled “Not Laminar”.]
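The definition above can be checked directly, pair by pair. The following is a minimal sketch, not the efficient method of the later slides: the function names and both matrices are illustrative assumptions (the first matrix is chosen to be consistent with the leaf vectors listed on slide 5, but the assignment of rows to species is a guess).

```python
from itertools import combinations

def character_sets(matrix):
    """Return O_j = {i : matrix[i][j] == 1} for every column j."""
    n_cols = len(matrix[0]) if matrix else 0
    return [{i for i, row in enumerate(matrix) if row[j] == 1}
            for j in range(n_cols)]

def is_laminar(sets):
    """True if every pair of sets is either disjoint or nested."""
    for a, b in combinations(sets, 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True

# Assumed example: rows A..E, columns C1..C5.
M = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [0, 1, 0, 0, 0],  # E
]

# Assumed counterexample: giving E character C5 makes O1 and O5 overlap
# without either containing the other.
M_BAD = [row[:] for row in M]
M_BAD[4] = [0, 1, 0, 0, 1]

print(is_laminar(character_sets(M)))      # laminar case
print(is_laminar(character_sets(M_BAD)))  # non-laminar case
```

This pairwise test compares all pairs of columns, so it is slower than the O(mn) procedure described on the implementation slides; it is meant only to make the definition concrete.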

8 Proof of Theorem
(⇒) Assume M has a perfect phylogeny. Consider the edges labeled Ci and Cj: if there is a root-to-leaf path containing both edges (C1, C2 in the figure), then Oi includes Oj or vice versa. Otherwise, Oi and Oj are disjoint (C1, C3 in the figure).
[Figure: tree with edges labeled C1–C5 and leaves A–E.]

9 Proof of Theorem (cont)
(⇐) Assume that the collection {O1,…,Ok} is laminar. We prove by induction on the number of characters k that M has a perfect phylogenetic tree.
Basis: one character. There are at most two (distinct) objects, one with and one without this character.
[Figure: one-column matrix for C1 over objects A and B, and the corresponding rooted tree with an edge labeled C1.]

10 Proof of Theorem (cont)
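Assume that the collection {O1,…,Ok} is laminar.
Induction step: assume correctness for k-1 characters. Consider a matrix with k characters (non-zero columns), and assume WLOG that O1 is not properly contained in Oj for any j > 1. Partition the elements into two sets S1 = O1 and S2 = S\O1.
Claim: Any two elements s1 (from S1), s2 (from S2) do not share any character. (Why is this? A shared character Cj would give a set Oj that meets O1 but is not contained in it, so by laminarity Oj would properly contain O1, contradicting the choice of O1.)
By induction there are trees T1 and T2 for S1 and S2.
[Figure: example matrix (rows A–E, columns C1–C5) with S1 = {A, C, E} and S2 = {B, D}, and a tree with subtrees T1 and T2 and an edge labeled C1.]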

11 Efficient Implementation
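Sort the columns (characters) according to decreasing binary value.
Claim: If the binary value of column i is larger than that of column j, then Oi is not a proper subset of Oj.
Proof: If Oi were a proper subset of Oj, then Cj would have a 1 in every row where Ci has one, plus at least one more, so its binary value would be strictly larger. Hence Ci > Cj means the 1’s in Ci are not properly covered by the 1’s in Cj.
[Figure: the example matrix before sorting (columns C1 C2 C3 C4 C5) and after sorting (columns C2 C1 C3 C5 C4).]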
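The sorting step might look like the sketch below, which reads each column top to bottom as a binary number (first row as the most significant bit). The function name and the matrix are assumptions for illustration; the matrix is chosen so that the result agrees with the sorted order C2 C1 C3 C5 C4 shown in the slide figure. It uses Python's built-in comparison sort for brevity rather than the radix sort mentioned in the summary slide.

```python
def sorted_column_order(matrix):
    """Return column indices ordered by decreasing binary value.

    Each column is read top to bottom as a binary number,
    with the first row as the most significant bit.
    """
    def value(j):
        v = 0
        for row in matrix:
            v = (v << 1) | row[j]
        return v

    return sorted(range(len(matrix[0])), key=value, reverse=True)

# Assumed example: rows A..E, columns C1..C5 (0-based indices 0..4).
M = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [0, 1, 0, 0, 0],  # E
]

order = sorted_column_order(M)
print(["C%d" % (j + 1) for j in order])  # ['C2', 'C1', 'C3', 'C5', 'C4']
```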

12 Efficient Implementation (cont)
Make a backwards linked list of the 1’s in each row.
Claim: If the columns are sorted, then the set of columns is laminar iff for each column i, all the links leaving column i point at the same column.
If the matrix is laminar then these pointers define the inclusion hierarchy. (Why is this?)
[Figure: the sorted matrix (columns C2 C1 C3 C5 C4, rows A–E), shown without and with the backward links.]
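A minimal sketch of the link test follows. Instead of materializing linked lists, it walks each row in sorted column order and records, for every 1, the column of the previous 1 in that row (or a root sentinel if there is none). The names `ROOT`, `sorted_column_order`, `backward_links` and both matrices are assumptions made for this illustration.

```python
ROOT = -1  # sentinel: no earlier 1 in this row, so the link points at the root

def sorted_column_order(matrix):
    """Column indices by decreasing binary value (first row = top bit)."""
    def value(j):
        return int("".join(str(row[j]) for row in matrix), 2)
    return sorted(range(len(matrix[0])), key=value, reverse=True)

def backward_links(matrix):
    """Map each non-zero column to the column its links point at.

    Returns None as soon as two rows give different targets for the
    same column, which by the claim means the matrix is not laminar.
    """
    order = sorted_column_order(matrix)
    target = {}
    for row in matrix:
        prev = ROOT
        for j in order:
            if row[j] == 1:
                # First visit stores the link; later visits must agree.
                if target.setdefault(j, prev) != prev:
                    return None
                prev = j
    return target

# Assumed example: rows A..E, columns C1..C5 (0-based indices 0..4).
M = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [0, 1, 0, 0, 0],  # E
]
M_BAD = [row[:] for row in M]
M_BAD[4] = [0, 1, 0, 0, 1]  # assumed non-laminar variant

print(backward_links(M))      # C1 -> C2, C5 -> C1, C4 -> C3, C2 and C3 -> root
print(backward_links(M_BAD))  # None
```

In the laminar case the returned mapping is the inclusion hierarchy: each column points at the column whose object set is the smallest one containing its own.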

13 Efficient Implementation (cont)
If the matrix is laminar, compute the inclusion hierarchy. Reconstruct the topology of the phylogenetic tree and the ancestral character states.
[Figure: the sorted matrix (columns C2 C1 C3 C5 C4), the inclusion hierarchy over C1–C5, and the resulting tree with leaves A–E. Vertex labels shown include (00000), (10000), (11000), (00100), (01000), (00110), (11001).]

14 Efficient Implementation - Summary
1. Sort the columns (characters) according to decreasing binary value.
2. Make a backwards linked list of the 1’s in each row.
3. If the matrix is laminar, compute the inclusion hierarchy.
4. Reconstruct the topology of the phylogenetic tree and the ancestral character states.
Complexity: O(mn), using radix (bucket) sort in stage 1.
[Figure: the example matrix before sorting (columns C1 C2 C3 C4 C5) and after sorting (columns C2 C1 C3 C5 C4).]
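The four stages can be put together as in the sketch below. It is an illustration under stated assumptions rather than the slides' own implementation: the function name, the returned dictionaries, and the example matrix are hypothetical, each character is represented by the vertex at the lower end of the edge where it is turned on, and the built-in sort is used instead of radix sort, so the sketch does not achieve the O(mn) bound.

```python
ROOT = -1  # sentinel for the all-zero root vertex

def build_perfect_phylogeny(names, matrix):
    """Return (parent, leaf_parent, state), or None if no tree exists.

    parent[j]      : character vertex (or ROOT) directly above character j
    leaf_parent[s] : character vertex (or ROOT) that species s hangs from
    state[v]       : 0/1 vector at vertex v (ancestral character states)
    """
    n_cols = len(matrix[0])

    # Stage 1: sort columns by decreasing binary value.
    def value(j):
        return int("".join(str(row[j]) for row in matrix), 2)
    order = sorted(range(n_cols), key=value, reverse=True)

    # Stages 2 and 3: follow the backward links row by row; agreeing
    # links give the inclusion hierarchy, disagreeing links mean failure.
    parent, leaf_parent = {}, {}
    for name, row in zip(names, matrix):
        prev = ROOT
        for j in order:
            if row[j] == 1:
                if parent.setdefault(j, prev) != prev:
                    return None
                prev = j
        leaf_parent[name] = prev  # leaf attaches below its last character

    # Stage 4: ancestral states. A parent always precedes its child in
    # the sorted order, so its state is ready when the child needs it.
    state = {ROOT: [0] * n_cols}
    for j in order:
        if j in parent:
            s = list(state[parent[j]])
            s[j] = 1
            state[j] = s
    return parent, leaf_parent, state

# Assumed example: rows A..E, columns C1..C5 (0-based indices 0..4).
names = ["A", "B", "C", "D", "E"]
M = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
parent, leaf_parent, state = build_perfect_phylogeny(names, M)
for name in names:
    print(name, "".join(map(str, state[leaf_parent[name]])))
```

For this assumed matrix, every leaf's reconstructed state equals its input row, which is a quick sanity check that the tree is consistent with the data.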

