1
Based on the paper by D.Huson, S.Nettles, T.Warnow
Disk-Covering Method
[Figure: phylogenetic tree of Orangutan, Gorilla, Chimpanzee and Human, from the Tree of Life Website, University of Arizona]
Presented by Galiya S., Eduard S.
2
Phylogenetic Tree
[Figure from Desert Vista High School, Phoenix, Arizona]
A phylogenetic tree is a tree showing the evolutionary interrelationships among various species.
3
Jukes-Cantor model
Definition 1: Let T be a fixed rooted tree with leaves labeled 1,…,n. The Jukes-Cantor model makes the following assumptions:
1. The possible states for each site are A, C, T, G.
2. The sequence length is an input parameter and, for each site, the state at the root is drawn from a distribution (typically uniform).
3. The sites evolve identically and independently (i.i.d.) down the tree from the root.
[Figure: a tree with the sequences AGACTT, GGACTT and AGGCCT; a single column is labeled "site"]
4
Jukes-Cantor model (cont.)
4. For each edge e = (u, v) with u the parent of v, if the state of a site at v differs from its state at u, then each of the three remaining states is equally likely at v.
[Figure: tree with the sequences GGGCAT, AGCCCT, GCACTT, AGACTT, GGACTT, AGGCCT and an edge e from u to v. Example based on CIPRES ppt, University of Texas at Austin.]
5
Jukes-Cantor model (cont.)
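5. Each edge e of the tree T is associated with a Poisson random variable giving the number of mutations of a randomly selected site on that edge.
6. Each edge has an expectancy, the expected number of mutations on that edge.
Multiple changes at a single site are hidden changes. Example: seq1 AGTCAG and seq2 AGTCAC differ at one site, yet more than one change may have occurred at that site.
[Figure: the sequences AGTCAC, AGTCAG, AGTCTG on a tree, and a table of the number of changes between Seq1 and Seq2]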
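Assumptions 4 to 6 can be illustrated with a small simulation of one edge. This is only a sketch under stated assumptions: the function names, the value 0.5 for the expected number of mutations, and the use of Knuth's method for Poisson sampling are illustrative choices, not taken from the slides.

```python
import math
import random

STATES = "ACGT"

def poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def evolve_along_edge(parent_seq, lam, rng):
    """Evolve every site independently along a single edge.

    lam is the expected number of mutations per site on this edge
    (a hypothetical parameter name). Each mutation moves the site to
    one of the three other states with equal probability.
    """
    child = []
    for state in parent_seq:
        for _ in range(poisson(lam, rng)):
            state = rng.choice([s for s in STATES if s != state])
        child.append(state)
    return "".join(child)

rng = random.Random(0)
parent = "AGACTT"
child = evolve_along_edge(parent, 0.5, rng)
print(parent, "->", child)
```

Because a site can mutate more than once, it may return to its original state; such changes are hidden when only the two end sequences are compared.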
6
Definition 2
Split: Removing an edge e from an unrooted phylogenetic tree T partitions the leaf set S of the tree into two non-empty sets. This partition is the split induced by e, written (A, B).
Example: [Figure: a tree T on S = {1,2,3,4,5} with an edge e]
Definition 2: Let T be the unrooted true tree and T' the unrooted inferred tree, both with leaves labeled 1,…,n. Define C(T) as the set of splits induced by the internal edges e of T.
7
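Definition 2 (cont.)
Here C(T) denotes the set of splits induced by the internal edges of T.
Any split in $C(T) \setminus C(T')$ is called a false negative (FN).
Any split in $C(T') \setminus C(T)$ is called a false positive (FP).
An edge e of T is recovered in T' if its split appears in C(T').
Example: [Figure: true tree T and inferred tree T' on leaves 1 to 5, each with internal edges e1 and e2; one edge of T is marked FN and one edge of T' is marked FP]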
8
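Definition 2 (cont.)
FN rate: the number of false negatives divided by the number of internal edges of T.
FP rate: the number of false positives divided by the number of internal edges of T'.
Example: [Figure: T and T' on leaves 1 to 5, each with two internal edges; one edge is marked FN and one is marked FP]
FN rate = 0.5 = 50%, FP rate = 0.5 = 50%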
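The false negative and false positive rates can be sketched by treating each tree as its set of internal splits. The representation below (a split as an unordered pair of leaf sets) and the two example trees are assumptions made for illustration; the rates are normalized by the number of internal splits of the true and inferred tree, which matches the 50% values in the example.

```python
def split(side, leaves):
    """Canonical form of a split: the unordered pair of its two sides."""
    side = frozenset(side)
    return frozenset({side, frozenset(leaves) - side})

def error_rates(true_splits, inferred_splits):
    """Return (FN rate, FP rate); both split sets are assumed non-empty."""
    false_neg = true_splits - inferred_splits
    false_pos = inferred_splits - true_splits
    return (len(false_neg) / len(true_splits),
            len(false_pos) / len(inferred_splits))

leaves = {1, 2, 3, 4, 5}
# Hypothetical true tree: internal splits 12|345 and 123|45.
c_true = {split({1, 2}, leaves), split({4, 5}, leaves)}
# Hypothetical inferred tree: internal splits 12|345 and 124|35.
c_inferred = {split({1, 2}, leaves), split({3, 5}, leaves)}

fn_rate, fp_rate = error_rates(c_true, c_inferred)
print(fn_rate, fp_rate)
```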
9
Additive matrix
A dissimilarity matrix is a symmetric matrix that is 0 on the diagonal.
Definition 3: A matrix D is called additive if there exists a tree T with positive edge weighting w such that $D_{ij} = \sum_{e \in P_{ij}} w(e)$, where $P_{ij}$ is the path in T between leaves i and j.
Given an additive matrix D, the tree T can be uniquely reconstructed in polynomial time.
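To make the definition concrete, the sketch below builds the additive matrix of a small edge-weighted tree by summing edge weights along leaf-to-leaf paths. The tree, the internal node names "x" and "y", and the function name are hypothetical.

```python
from collections import defaultdict

def additive_matrix(edges, leaves):
    """Return D with D[i][j] = total weight of the tree path between leaves i and j."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def distances_from(src):
        # Depth-first traversal; in a tree every node is reached by exactly one path.
        dist, stack = {src: 0.0}, [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        return dist

    matrix = {}
    for i in leaves:
        dist = distances_from(i)
        matrix[i] = {j: dist[j] for j in leaves}
    return matrix

# Hypothetical tree: leaves 1, 2 hang off x; leaves 3, 4 hang off y.
edges = [(1, "x", 1), (2, "x", 2), ("x", "y", 2), (3, "y", 1), (4, "y", 3)]
D = additive_matrix(edges, [1, 2, 3, 4])
print(D[1][3], D[2][4])
```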
10
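True distance
Reminder: Let T be the unrooted true tree and let $P_{ij}$ be the path in T between leaves i and j. We represent the evolutionary process by a set of Poisson processes, one per edge.
[Figure: a path from i to j through three edges with variables Xe1, Xe2, Xe3]
$X_{ij} = X_{e_1} + X_{e_2} + X_{e_3}$
The expected value of $X_{ij}$ is called the true distance between i and j. The matrix of true distances is an additive matrix.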
11
Hamming Distance
$H(i, j)$ is the number of sites at which sequences i and j differ; it is called the Hamming distance.
$k$ is the sequence length.
$h(i, j) = H(i, j) / k$ is the normalized Hamming distance.
Example: s1 = CAACCCCGGT, s2 = TAATTTCGGT. H(s1, s2) = 4, k = 10, h(s1, s2) = 4/10 = 0.4
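The two quantities are easy to compute directly. A minimal sketch, with function names chosen for illustration, reproducing the example above:

```python
def hamming(seq_a, seq_b):
    """Number of sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def normalized_hamming(seq_a, seq_b):
    """Hamming distance divided by the sequence length k."""
    return hamming(seq_a, seq_b) / len(seq_a)

s1, s2 = "CAACCCCGGT", "TAATTTCGGT"
print(hamming(s1, s2), normalized_hamming(s1, s2))
```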
12
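Distance correction
The Jukes-Cantor distance correction for each two leaves i, j is: $d_{ij} = -\frac{3}{4} \ln\left(1 - \frac{4}{3} h_{ij}\right)$
If $h_{ij} \ge \frac{3}{4}$, then $d_{ij}$ is undefined. Afterwards, compute the maximum defined Jukes-Cantor distance, multiply that value by the number n of leaves, and replace all undefined values with the result.
Example: [Figure: the 4 leaves with sequences TCAAG, TTGGA, TTGCC, TGGCC and the matrix d, with the note "Replace * with * 0.778"]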
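The correction step can be sketched as follows, using the standard Jukes-Cantor formula d = -3/4 ln(1 - 4h/3) on normalized Hamming distances h. The 3-by-3 input matrix is made up for illustration, and the sketch assumes at least one defined off-diagonal entry.

```python
import math

def jc_distance(h):
    """Jukes-Cantor correction of a normalized Hamming distance; None if undefined."""
    if h >= 0.75:
        return None  # the logarithm's argument would be <= 0
    if h == 0:
        return 0.0
    return -0.75 * math.log(1.0 - 4.0 * h / 3.0)

def corrected_matrix(h_matrix):
    """Correct every entry, then replace undefined entries by n * (largest defined value)."""
    n = len(h_matrix)
    d = [[jc_distance(h) for h in row] for row in h_matrix]
    largest = max(x for row in d for x in row if x is not None)
    fill = n * largest
    return [[fill if x is None else x for x in row] for row in d]

# Hypothetical normalized Hamming distances; 0.8 is too large to correct.
h = [[0.0, 0.2, 0.8],
     [0.2, 0.0, 0.4],
     [0.8, 0.4, 0.0]]
d = corrected_matrix(h)
for row in d:
    print([round(x, 3) for x in row])
```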
13
The error Definition 7: Let be a real number. Then: and Example: q=3.2
1 3 1.2 1.5 2.8 3.1 1.1 0.2 0.4 0.3
14
Threshold Graph
Let d be an n×n dissimilarity matrix and let q be any real number. The threshold graph Thresh(d,q) is defined as:
Vertex set is {1,2,…,n}.
The edges are: (i,j) is an edge if and only if $d_{ij} \le q$.
For example: q = 4.5 [Figure: the graph Thresh(d,4.5)]
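A minimal sketch of the threshold graph, returning its edge set with vertices numbered from 1 as on the slide. The matrix is a made-up example, not the one from the figure.

```python
def threshold_graph(d, q):
    """Edges (i, j), i < j, of Thresh(d, q): pairs whose dissimilarity is at most q."""
    n = len(d)
    return {(i + 1, j + 1)
            for i in range(n)
            for j in range(i + 1, n)
            if d[i][j] <= q}

# Hypothetical dissimilarity matrix on four leaves.
d = [[0, 3, 5, 6],
     [3, 0, 4, 7],
     [5, 4, 0, 2],
     [6, 7, 2, 0]]
print(sorted(threshold_graph(d, 4.5)))
```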
15
Triangulated graph
Definition: A graph is triangulated if no subset of nodes induces a cycle of size four or more.
[Figure taken from Wikipedia]
16
Disk Covering Method A generic disk-covering method has four steps:
1. Decomposition: Compute a decomposition of the dataset into overlapping subsets.
2. Solution: Construct trees on the subsets using a base method.
3. Merge: Use a supertree method to merge the trees on the subsets into a tree on the full dataset.
4. Refinement: Compute the asymmetric median tree of all possible supertrees.
The example above is based on CIPRES ppt, University of Texas at Austin.
17
Simplicial elimination order
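Lemma: A simplicial elimination order is an ordering $v_1, \dots, v_n$ of the vertices of G such that, for each i, the set $X_i = \{v_j : j > i,\ (v_i, v_j) \in E\}$ forms a clique.
Every triangulated graph G has a simplicial elimination ordering.
The maximal cliques in G are of the form $\{v_i\} \cup X_i$.
This ordering can be found in polynomial time, so the maximal cliques of G can also be found in polynomial time.
Example: [Figure: a graph with vertices including 3, 5, 7, 8]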
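One simple way to obtain such an ordering is to repeatedly remove a vertex whose remaining neighbours form a clique. The sketch below uses this greedy approach (a plain polynomial-time method, not the fastest known) and then reads the maximal cliques off the ordering as the sets formed by a vertex and its later neighbours. The adjacency-dictionary representation and the example graphs are assumptions.

```python
from itertools import combinations

def simplicial_elimination_order(adj):
    """Return a simplicial elimination ordering, or None if the graph is not triangulated."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while remaining:
        for v, nbrs in remaining.items():
            # v is simplicial if its remaining neighbours are pairwise adjacent.
            if all(b in remaining[a] for a, b in combinations(nbrs, 2)):
                break
        else:
            return None  # no simplicial vertex left
        order.append(v)
        for u in remaining.pop(v):
            remaining[u].discard(v)
    return order

def maximal_cliques(adj, order):
    """Candidates are {v} plus the neighbours of v later in the order; keep the maximal ones."""
    position = {v: i for i, v in enumerate(order)}
    candidates = [frozenset({v} | {u for u in adj[v] if position[u] > position[v]})
                  for v in order]
    return {c for c in candidates if not any(c < other for other in candidates)}

# Hypothetical triangulated graph: a triangle 1-2-3 with a pendant vertex 4 on 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
order = simplicial_elimination_order(graph)
cliques = maximal_cliques(graph, order)
print(order, cliques)

# A 4-cycle has no simplicial vertex, so it is not triangulated.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(simplicial_elimination_order(cycle))
```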
18
Constructing Tq
Input: d, a dissimilarity matrix; a real number q > 0.
Output: reconstructed tree, Tq.
1. Compute Thresh(d,q).
2. Triangulate Thresh(d,q). (Polynomial complexity)
3. Compute Buneman trees for all maximal cliques in the triangulated Thresh(d,q).
4. Merge the subtrees into a supertree.
Overall complexity: polynomial.
19
Intersection graph
An intersection graph is an undirected graph formed from a family of sets by choosing one vertex for each set and connecting two vertices when the corresponding sets have a non-empty intersection.
[Figure taken from Wikipedia]
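A minimal sketch of the construction, where each vertex name maps to its set. The names and sets are made up for illustration.

```python
from itertools import combinations

def intersection_graph(family):
    """family maps a vertex name to its set; returns the edges of the intersection graph."""
    return {(a, b)
            for a, b in combinations(sorted(family), 2)
            if family[a] & family[b]}

# Hypothetical family of three sets.
family = {"u": {1, 2}, "v": {2, 3}, "w": {4}}
print(intersection_graph(family))
```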
20
Triangulating Thresh(d,q) Complexity
Lemma: If d is an additive matrix, then Thresh(d,q) is triangulated.
Proof: Let d be an arbitrary additive matrix, and let (T,w) be the edge-weighted tree uniquely associated with d. Let q > 0. Add intermediate vertices to the edges of T and re-weight the edges so that the path lengths between leaf pairs are unchanged, but for every pair of leaves u and v in T, if $d_{uv} \le q$ then there is a node x in the enlarged tree T' at distance at most q/2 from both u and v.
[Figure: the tree T' and a subtree of T']
21
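Triangulating Thresh(d,q) Complexity (cont.)
Now let $T_u$ denote the subtree of T' consisting of the nodes at distance at most q/2 from u. Note that $T_u \cap T_v \ne \emptyset$ if and only if $d_{uv} \le q$, and so Thresh(d,q) is identical to the intersection graph of the $T_u$ as u ranges over the leaves of T. The intersection graph of a family of subtrees of a tree is always triangulated; consequently Thresh(d,q) is triangulated.
[Figure: a tree T, its intersection graph and Thresh(d,q). Taken from Wikipedia]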
22
Supertree Construction Algorithm (SCA)
Step 1: First obtain a simplicial elimination ordering $v_1, \dots, v_n$ for G. Compute $C_i = \{v_i\} \cup X_i$, where $X_i = \{v_j : j > i,\ (v_i, v_j) \in E\}$. For each $C_i$, find a maximal clique C containing $C_i$ and compute a tree $t_i$ for $C_i$ by deleting the leaves in $C - C_i$ from $T_C$.
Step 2: Construct the starting tree $T_{n-2}$. For i = n-3, n-4, …, 1, compute the tree $T_i$ formed by merging $t_i$ and $T_{i+1}$ using the Strict Consensus Subtree Merger method.
Example: C = {1,2,3,4}, C2 = {2,3,4}, C − C2 = {1}; the leaves left are {2,3,4}.
23
Strict Consensus Subtree Merger
This method contracts a minimum set of edges in each tree in order to make them identical on the subtree they induce on their shared leaves. We denote that subtree by X and call it the backbone.
Merging two trees is done by attaching the pieces of each tree appropriately to the different edges of the backbone.
The situation in which some piece of each tree attaches onto the same edge of the backbone is called a collision.
[Figure: two input trees on leaves 1,2,3,7,4 and 1,2,3,4,6,5, their backbone on leaves 1,2,3,4, and the merged tree on leaves 1 to 7]
24
Short Quartet Definition
Let (T,w) be a binary tree, edge-weighted by w and leaf-labeled by the set of species. Let e be an edge in T that is not incident to a leaf of T. Around e there are four subtrees A, B, C, D. Let a, b, c, d be the leaves of the subtrees A, B, C, D, respectively, that are closest to e, where distances are measured along paths in T. We call {a,b,c,d} a short quartet around e. The collection of all short quartets around internal edges of T forms the set of short quartets of T.
[Figure: the edge e with the four subtrees A, B, C, D and their leaves a, b, c, d]
25
Gsq Definition
Consider the additive distance matrix associated to T.
The graph Gsq on the vertex set S = {1,2,…,n} is defined by: (i,j) is an edge if i and j are in the same short quartet.
Examples: [Figure: trees T with marked leaf pairs i, j]
26
Proof of Tq correctness
Theorem: Let T be a leaf-labeled tree. Let G be a triangulated graph such that […]. Let […] be the collection of Buneman trees computed on the maximal cliques of G, assume this collection reconstructs the correct subtrees, and let T* be the tree obtained by applying SCA to G and this collection. Then T* = T.
Proof: We will show that under these conditions each Ti is identical to T restricted to the same vertices, and that no collision occurs.
Part I: Let T be a tree whose leaves are labeled by S. Let G be a triangulated graph on S, and let […], where […] is a tree on leaf set A for every maximal clique A in G. Let $v_1, \dots, v_n$ be a simplicial elimination ordering of G. We show that for every i […].
Base: This is true since we assumed that all Buneman trees are correct.
27
Proof of Tq correctness (cont.)
Let us assume […] for some […]. […] forms the leaf set of the backbone of the strict consensus merger of […]. So we get […]. Consequently there is no edge contraction when we compute the backbone.
Part II: There can be a collision only if the backbone contains an edge onto which both […] and some other […] attach; denote this edge by e. Thus, some subtree t' of Ti is attached onto e. Denote the leaf set of t' by […]. Let P be the path in T corresponding to edge e and let its endpoints be a and b. Let T0 be the subtree of T obtained by deleting all the nodes in T that are separated from a by the deletion of b, and vice versa. Let […] be the leaves of T0. The following are true:
1. […] and all leaves in t' are also in […].
2. […] restricted to […] is path connected.
3. […]
28
Proof of Tq correctness (cont.)
Now, let P' be a path lying in […] from […] to some node in Y. Let y be the first node in Y on the path P'. By (3), […] also lies entirely in […], so […]. Consequently […]. But this contradicts the earlier assumption that […].
29
Experimental Results - Buneman
The FN rate of DCM-Buneman is lower than that of Buneman for every sequence length.
The FP rate of DCM-Buneman is slightly higher than that of Buneman: 3% and 0%, respectively.
The FN rate of DCM-Buneman reaches 5% at sequence length 10,000; Buneman does not reach this value.
30
Experimental Results - NJ
The FN and FP rates of DCM-NJ are significantly lower than those of NJ.
DCM-NJ drops below 5% at sequence length 250.
DCM-NJ can reconstruct the true tree at sequence lengths beyond 900.
31
Distance Methods
A distance matrix D is a symmetric, non-negative matrix with zero diagonal.
The goal is a phylogenetic tree T such that the distances between species in T approximate the distances in D.
We now describe some distance methods.
32
Buneman
Input: a dissimilarity matrix d.
Output: tree T.
1. The topology on every four-leaf subset is inferred using the Four-Point Method (FPM):
Input: a 4×4 dissimilarity matrix on i, j, k, l.
Output: if $d_{ij} + d_{kl} < \min\{d_{ik} + d_{jl},\ d_{il} + d_{jk}\}$, the topology ij | kl is returned (i, j are separated from k, l by an edge e). If the smallest of the three sums is attained more than once, for example $d_{ij} + d_{kl} = \min\{d_{ik} + d_{jl},\ d_{il} + d_{jk}\}$, a star tree is returned.
[Figure: the quartet tree ij | kl with internal edge e, and the star tree on i, j, k, l]
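The Four-Point Method can be sketched as a comparison of the three pairing sums. The function name, the return format and the test matrices are illustrative assumptions; exact equality is used to detect ties, which is only appropriate for exact inputs such as the integer matrices here.

```python
def four_point_method(d, i, j, k, l):
    """Return the pairing with the strictly smallest sum, or "star" if there is a tie."""
    sums = {
        ((i, j), (k, l)): d[i][j] + d[k][l],
        ((i, k), (j, l)): d[i][k] + d[j][l],
        ((i, l), (j, k)): d[i][l] + d[j][k],
    }
    best = min(sums.values())
    winners = [pairing for pairing, total in sums.items() if total == best]
    return winners[0] if len(winners) == 1 else "star"

# Hypothetical additive matrix of a tree in which leaves 0, 1 are separated from 2, 3.
d = [[0, 3, 4, 6],
     [3, 0, 5, 7],
     [4, 5, 0, 4],
     [6, 7, 4, 0]]
print(four_point_method(d, 0, 1, 2, 3))

# All pairwise distances equal: no pairing is strictly smallest.
flat = [[0, 2, 2, 2],
        [2, 0, 2, 2],
        [2, 2, 0, 2],
        [2, 2, 2, 0]]
print(four_point_method(flat, 0, 1, 2, 3))
```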
33
Buneman (cont.)
Let Q be the set of four-leaf trees defined by the FPM. The Buneman tree is the maximally resolved tree T satisfying: for all quartets i, j, k, l, if T restricted to i, j, k, l induces a binary tree, then the tree in Q on i, j, k, l is the same binary tree.
Lemma 1: Let d be an input dissimilarity matrix and let T be the Buneman tree defined by d. Then C(T) is the set of splits (A, B) such that ij | kl is in Q for all i, j in A and all k, l in B.
Complexity: polynomial time.
Example: A = {1,2,3}, B = {4,5}. Q contains 1,2 | 4,5, 1,3 | 4,5 and 2,3 | 4,5, so C(T) = {(A,B)}.
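For small inputs, the splits of the Buneman tree can be found by brute force. This sketch assumes that a split (A, B) is accepted exactly when every quartet with two leaves in A and two in B passes the strict four-point test. It enumerates all bipartitions, so it is exponential and only meant to illustrate the condition, not the polynomial-time method. The five-leaf matrix comes from a made-up tree with unit edge weights.

```python
from itertools import combinations

def buneman_splits(d, leaves):
    """Brute-force set of splits (A, B), |A|, |B| >= 2, supported by all their quartets."""
    leaves = sorted(leaves)

    def supported(a_side, b_side):
        return all(
            d[i][j] + d[k][l] < min(d[i][k] + d[j][l], d[i][l] + d[j][k])
            for i, j in combinations(a_side, 2)
            for k, l in combinations(b_side, 2)
        )

    first, rest = leaves[0], leaves[1:]
    splits = set()
    # A always contains the smallest leaf, so each bipartition is visited once.
    for size in range(1, len(rest) - 1):
        for extra in combinations(rest, size):
            a_side = (first,) + extra
            b_side = tuple(x for x in rest if x not in extra)
            if supported(a_side, b_side):
                splits.add((frozenset(a_side), frozenset(b_side)))
    return splits

# Hypothetical tree: 1, 2 attached to x; 3 attached to y; 4, 5 attached to z; path x-y-z.
pairs = {(1, 2): 2, (1, 3): 3, (1, 4): 4, (1, 5): 4, (2, 3): 3,
         (2, 4): 4, (2, 5): 4, (3, 4): 3, (3, 5): 3, (4, 5): 2}
labels = [1, 2, 3, 4, 5]
d = {a: {b: 0 for b in labels} for a in labels}
for (a, b), w in pairs.items():
    d[a][b] = d[b][a] = w

for a_side, b_side in buneman_splits(d, labels):
    print(sorted(a_side), "|", sorted(b_side))
```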
34
Neighbor-Joining
Input: a distance matrix d.
Output: unrooted binary tree T.
Algorithm description: For every pair of species, the algorithm determines a score based on the distance matrix. At each step it joins the pair with the minimum score, making a subtree whose root replaces the two chosen species in the matrix. The distances to this new node are then recalculated. This is repeated until only two nodes remain. Finally, the remaining two vertices are connected with an edge.
Complexity: polynomial time, $O(n^3)$.
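The description above can be sketched with the usual neighbor-joining score, (n - 2) d(i, j) minus the row sums of i and j, and the usual update formulas. The slides do not spell out the score, so that formula, the function name, the internal node names "u0", "u1", … and the example matrix are assumptions.

```python
from itertools import combinations

def neighbor_joining(d, labels):
    """Return the tree as a list of (node, node, branch_length) edges."""
    dist = {a: dict(d[a]) for a in labels}
    active = list(labels)
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        total = {a: sum(dist[a][b] for b in active if b != a) for a in active}
        # Join the pair with the minimum score.
        i, j = min(
            combinations(active, 2),
            key=lambda p: (n - 2) * dist[p[0]][p[1]] - total[p[0]] - total[p[1]],
        )
        new = f"u{next_id}"
        next_id += 1
        len_i = dist[i][j] / 2 + (total[i] - total[j]) / (2 * (n - 2))
        len_j = dist[i][j] - len_i
        edges += [(i, new, len_i), (j, new, len_j)]
        # Recalculate distances from the new node to every other active node.
        dist[new] = {}
        for k in active:
            if k not in (i, j):
                dist[new][k] = dist[k][new] = (dist[i][k] + dist[j][k] - dist[i][j]) / 2
        active = [k for k in active if k not in (i, j)] + [new]
    a, b = active
    edges.append((a, b, dist[a][b]))
    return edges

# Hypothetical additive matrix: leaves 1, 2 on one side, 3, 4 on the other.
pairs = {(1, 2): 3, (1, 3): 4, (1, 4): 6, (2, 3): 5, (2, 4): 7, (3, 4): 4}
labels = [1, 2, 3, 4]
d = {a: {b: 0 for b in labels} for a in labels}
for (a, b), w in pairs.items():
    d[a][b] = d[b][a] = w

edges = neighbor_joining(d, labels)
for edge in edges:
    print(edge)
```

Each round scans all pairs, which costs O(n^2), and there are about n rounds, which gives the O(n^3) bound quoted above.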
35
THE END!