Robustness to Noise in Distance-Based Phylogenetic Reconstruction Methods. Tutorial #13. © Ilan Gronau.


1 . Robustness to Noise in Distance-Based Phylogenetic Reconstruction Methods Tutorial #13 © Ilan Gronau.

2 . Distance-Based Phylogenetic Reconstruction
The distance-based approach:
- Estimate evolutionary distances between every two species.
- Reconstruct the phylogenetic tree that best fits the dissimilarity matrix.
You saw in class:
- A phylogenetic tree is uniquely defined by its induced metric. (Metrics that can be realized by some tree are called additive.)
- There are efficient methods for reconstructing this tree.
Today we ask: can you recover the topology of a tree from a noisy version of its additive metric?
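A standard way to test whether a matrix is additive is the four-point condition: for every four taxa, the two largest of the three pairwise sums must be equal. The sketch below assumes D is symmetric, nonnegative, and satisfies the triangle inequality; the names `is_additive`, `symmetric`, and the tolerance `tol` are illustrative choices, not part of the tutorial.

```python
from itertools import combinations

def symmetric(upper):
    """Build a full symmetric dict-of-dicts matrix from upper-triangle entries."""
    D = {}
    for (x, y), d in upper.items():
        D.setdefault(x, {x: 0.0})[y] = d
        D.setdefault(y, {y: 0.0})[x] = d
    return D

def is_additive(D, taxa, tol=1e-9):
    """Four-point condition: in every quartet the two largest sums coincide."""
    for a, b, c, d in combinations(taxa, 4):
        sums = sorted([D[a][b] + D[c][d], D[a][c] + D[b][d], D[a][d] + D[b][c]])
        if sums[2] - sums[1] > tol:
            return False
    return True

# Quartet tree ((a,b),(c,d)) with all five edges of weight 1.
tree_metric = symmetric({("a", "b"): 2, ("c", "d"): 2, ("a", "c"): 3,
                         ("a", "d"): 3, ("b", "c"): 3, ("b", "d"): 3})
# The same matrix with one entry perturbed: no longer realizable by a tree.
noisy = symmetric({("a", "b"): 2, ("c", "d"): 2, ("a", "c"): 3.4,
                   ("a", "d"): 3, ("b", "c"): 3, ("b", "d"): 3})

print(is_additive(tree_metric, "abcd"), is_additive(noisy, "abcd"))
```

A noisy matrix usually fails this test, which is why the question on this slide is about recovering the topology rather than the metric itself.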

3 . The Phylogenetic “Error-Correction Code” [Atteson ‘99]
[Figure: a tree over the taxa a to h; the additive metrics of trees $T_1$, $T_2$, $T_3$, each surrounded by a ball]
A phylogenetic tree is uniquely defined by its induced metric. How dense is this “space”? Can we tolerate some small noise?
All matrices in a ball surrounding each additive metric uniquely define the topology of the tree.
The radius of each ball depends on the weight of the minimal edge in the “center tree”.

4 . The Phylogenetic “Error-Correction Code” [Atteson ‘99]
$l_\infty$ norm (worst-case noise): a dissimilarity matrix $D$ is near-additive if there is a binary tree $T$ s.t. $\|D, D_T\|_\infty < \tfrac{1}{2} w_{\min}(T)$.
- Near-additive matrices uniquely define a tree topology.
- We will show how to reconstruct this topology (using DLCA).
- There is a simple example (on four-species trees) indicating that you cannot tolerate more noise than $\tfrac{1}{2} w_{\min}(T)$: a matrix $D$ with $\|D, D_{T_1}\|_\infty = \tfrac{1}{2} w_{\min}(T_1)$ and $\|D, D_{T_2}\|_\infty = \tfrac{1}{2} w_{\min}(T_2)$, where $T_1$ and $T_2$ have different topologies!
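The slide's four-species figure did not survive in this transcript, so the following is one possible instance of such an example, not necessarily the one shown. It assumes two quartet trees with all edge weights equal to 1 (so $w_{\min} = 1$) and takes $D$ to be the entrywise midpoint of their metrics; `quartet_metric` and `linf` are illustrative helper names.

```python
from itertools import combinations

def quartet_metric(pair1, pair2, leaf_w=1.0, inner_w=1.0):
    """Additive metric of the quartet tree whose cherries are pair1 and pair2."""
    d = {}
    for x, y in combinations(sorted(pair1 + pair2), 2):
        same_cherry = {x, y} in (set(pair1), set(pair2))
        d[x, y] = 2 * leaf_w + (0.0 if same_cherry else inner_w)
    return d

def linf(d1, d2):
    """Largest entrywise difference between two matrices."""
    return max(abs(d1[k] - d2[k]) for k in d1)

D_T1 = quartet_metric(("a", "b"), ("c", "d"))   # topology ab|cd
D_T2 = quartet_metric(("a", "c"), ("b", "d"))   # topology ac|bd
D = {k: (D_T1[k] + D_T2[k]) / 2 for k in D_T1}  # midpoint matrix

w_min = 1.0
print(linf(D, D_T1) / w_min, linf(D, D_T2) / w_min)
```

The midpoint is at distance exactly $\tfrac{1}{2} w_{\min}$ from both trees, so no algorithm can decide between the two topologies from $D$ alone.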

5 . Deepest LCA Neighbor Joining
Input: a dissimilarity matrix $D$ over $S$.
Output: a phylogenetic tree over $S$.
a) Choose a root $r \in S$.
b) Calculate LCA-depths from $r$: $L(i,j) = \tfrac{1}{2}\big(D(r,i) + D(r,j) - D(i,j)\big)$ for all $i,j \in S \setminus \{r\}$.
Stopping condition: if $L = [w]$ (a single remaining element $x$), return the tree $T$ consisting of the single edge $(r,x)$ of weight $w$.
Otherwise:
1. Choose a ‘mutually deepest’ pair $(i,j)$: $L(i,j) = \max_{k \ne i} L(i,k) = \max_{k \ne j} L(k,j)$.
2. Replace $i,j$ with a new element $v$, and reduce $L$ (convex reduction):
   $L(v,v) = L(i,j)$
   For $k \ne v$: $L(v,k) = \alpha L(i,k) + (1-\alpha) L(j,k)$, with $0 \le \alpha \le 1$.
3. Recursively execute the algorithm on the reduced matrix.
4. Add $i,j$ as daughter nodes of $v$ with edges of weight $w(v,i) = \max\{0, L(i,i) - L(i,j)\}$ and $w(v,j) = \max\{0, L(j,j) - L(i,j)\}$.
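The steps above can be sketched in Python as follows. This is a minimal iterative version of the recursive procedure, under a few assumptions of my own: $D$ is a symmetric dict-of-dicts with zero diagonal, $\alpha$ defaults to ½, and each new internal node is named by the pair of children it joins. The matrix used in the usage lines is the five-taxon example from this tutorial.

```python
from itertools import combinations

def dlca(D, root, alpha=0.5):
    """Return the reconstructed tree as a list of (parent, child, weight) edges."""
    taxa = [t for t in D if t != root]
    # Stage (b): LCA-depths from the root.
    L = {i: {j: 0.5 * (D[root][i] + D[root][j] - D[i][j]) for j in taxa}
         for i in taxa}
    edges = []
    while len(L) > 1:
        nodes = list(L)
        deepest = {i: max(L[i][k] for k in nodes if k != i) for i in nodes}
        # Step 1: a pair whose entry is the off-diagonal maximum of both rows.
        i, j = next((a, b) for a, b in combinations(nodes, 2)
                    if L[a][b] == deepest[a] == deepest[b])
        v, depth = (i, j), L[i][j]
        # Step 4: attach i and j below v (weights clipped at 0).
        edges.append((v, i, max(0.0, L[i][i] - depth)))
        edges.append((v, j, max(0.0, L[j][j] - depth)))
        # Step 2: convex reduction of rows i and j into the row of v.
        row = {k: alpha * L[i][k] + (1 - alpha) * L[j][k]
               for k in nodes if k != i and k != j}
        del L[i], L[j]
        for k in L:
            del L[k][i], L[k][j]
            L[k][v] = row[k]
        row[v] = depth
        L[v] = row
    # Stopping condition: a single element x remains; connect it to the root.
    x = next(iter(L))
    edges.append((root, x, L[x][x]))
    return edges

taxa = "ABCDE"
rows = [[0, 10, 7, 12, 7], [10, 0, 7, 14, 9], [7, 7, 0, 11, 6],
        [12, 14, 11, 0, 7], [7, 9, 6, 7, 0]]
D = {x: dict(zip(taxa, row)) for x, row in zip(taxa, rows)}
edges = dlca(D, root="E")
for parent, child, weight in edges:
    print(parent, "->", child, weight)
```

The global off-diagonal maximum of a symmetric matrix is always attained by a mutually deepest pair, so the search in step 1 cannot come up empty.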

6 . Deepest LCA Neighbor Joining
Sketch of consistency proof (shown in class):
If $D$ is additive, consistent with tree $T$, then $L = LCA(D,r)$ contains the distances of all taxon-pair LCAs from $r$.
- A ‘mutually deepest’ taxon-pair $(i,j)$ is a neighbor-pair (a cherry).
- The reduction computes the ‘real’ LCA-depths corresponding to $v$, the parent of $(i,j)$:
  - $L(v,v) = L(i,j)$ ($v$ is the LCA of $i$ and $j$).
  - For $k \ne v$, $L(v,k) = L(i,k) = L(j,k)$.

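7 . Deepest LCA Neighbor Joining - Example
[Figure: tree over the taxa A, B, C, D, E with edge weights 4, 1, 2, 2, 6, 1, 5; internal nodes labeled B/C and A/B/C; the root is marked]
D is additive:
D:   A   B   C   D   E
A    0  10   7  12   7
B        0   7  14   9
C            0  11   6
D                0   7
E                    0
LCA-depths from root E:
L:   A   B   C   D
A    7   3   3   1
B    3   9   4   1
C    3   4   6   1
D    1   1   1   7
Row maxima (off-diagonal): A: 3, B: 4, C: 4, D: 1.
(B,C) is the only mutually deepest pair.
We can tolerate noise smaller than ±½.
In general we can tolerate any noise which maintains the off-diagonal maximum in every row.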
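The numbers in this example can be checked with a few lines of Python. The sketch assumes the root is E (consistent with the diagonal of $L$, which equals the distances to E) and uses $L(i,j) = \tfrac{1}{2}(D(r,i) + D(r,j) - D(i,j))$; the helper `dist` is just a lookup over the upper triangle.

```python
taxa = ["A", "B", "C", "D", "E"]
upper = {("A", "B"): 10, ("A", "C"): 7, ("A", "D"): 12, ("A", "E"): 7,
         ("B", "C"): 7, ("B", "D"): 14, ("B", "E"): 9,
         ("C", "D"): 11, ("C", "E"): 6, ("D", "E"): 7}

def dist(x, y):
    """Symmetric lookup into the upper-triangle table, with a zero diagonal."""
    if x == y:
        return 0
    return upper[(x, y)] if (x, y) in upper else upper[(y, x)]

root = "E"
others = [t for t in taxa if t != root]
L = {i: {j: (dist(root, i) + dist(root, j) - dist(i, j)) / 2 for j in others}
     for i in others}

# Off-diagonal maximum of every row, then the pairs that attain it in both rows.
row_max = {i: max(L[i][k] for k in others if k != i) for i in others}
deepest_pairs = [(i, j) for a, i in enumerate(others) for j in others[a + 1:]
                 if L[i][j] == row_max[i] == row_max[j]]

for i in others:
    print(i, [L[i][j] for j in others], "row max:", row_max[i])
print("mutually deepest:", deepest_pairs)
```

Row A attains its maximum at both B and C, but neither pair is mutually deepest because rows B and C have the larger maximum 4.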

8 . Robustness of DLCA
Theorem: If $\|D, D_T\|_\infty < \tfrac{1}{2} w_{\min}(T)$, then the tree returned by DLCA on input $D$ has the same topology as $T$ (for any selection of root).
Let $L$ be the matrix calculated in stage (b) ($L = LCA(D,r)$).
Let $L_T$ be the “true” LCA matrix ($L_T = LCA(D_T,r)$).
1. We show that $L$ weakly preserves the order of each row in $L_T$: $L_T(i,j) > L_T(i,k) \Rightarrow L(i,j) > L(i,k)$.
2. We prove by induction that this implies that the recursive procedure outputs a tree with the same topology as $T$.

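9 . Robustness of DLCA (cont)
1. If $\|D, D_T\|_\infty < \tfrac{1}{2} w_{\min}(T)$, then $L$ weakly preserves the order of each row in $L_T$ ($L_T(i,j) > L_T(i,k) \Rightarrow L(i,j) > L(i,k)$).
[Figure: tree $T$ in which an internal path of weight $w \ge w_{\min}(T)$ separates $\{r,k\}$ from $\{i,j\}$]
$L_T(i,j) > L_T(i,k)$
$\Rightarrow \tfrac{1}{2}(D_T(r,i) + D_T(r,j) - D_T(i,j)) > \tfrac{1}{2}(D_T(r,i) + D_T(r,k) - D_T(i,k))$
$\Rightarrow D_T(r,j) - D_T(i,j) > D_T(r,k) - D_T(i,k)$
$\Rightarrow D_T(r,j) + D_T(i,k) > D_T(r,k) + D_T(i,j)$
$\Rightarrow D_T(r,j) + D_T(i,k) \ge D_T(r,k) + D_T(i,j) + 2 w_{\min}(T)$   (4-point condition)
$\Rightarrow D(r,j) + D(i,k) > D(r,k) + D(i,j)$   (since $\|D, D_T\|_\infty < \tfrac{1}{2} w_{\min}(T)$)
$\Rightarrow D(r,j) - D(i,j) > D(r,k) - D(i,k)$
$\Rightarrow \tfrac{1}{2}(D(r,i) + D(r,j) - D(i,j)) > \tfrac{1}{2}(D(r,i) + D(r,k) - D(i,k))$
$\Rightarrow L(i,j) > L(i,k)$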
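The claim can also be checked numerically. The sketch below reuses the five-taxon example matrix from this tutorial (where $w_{\min} = 1$), adds symmetric noise of magnitude at most $0.49\, w_{\min}$ to every off-diagonal entry, and tests that every strict row inequality of $L_T$ survives in $L$, for every choice of root. The function names, the seed, and the number of trials are arbitrary choices; a passing run illustrates the statement and is not a substitute for the proof.

```python
import random

def lca_matrix(D, taxa, r):
    """L(i,j) = (D(r,i) + D(r,j) - D(i,j)) / 2 over all taxa other than r."""
    others = [t for t in taxa if t != r]
    return {i: {j: (D[r][i] + D[r][j] - D[i][j]) / 2 for j in others}
            for i in others}

def preserves_row_order(L_true, L_noisy):
    """Every strict inequality within a row of L_true also holds in L_noisy."""
    return all(L_noisy[i][j] > L_noisy[i][k]
               for i in L_true for j in L_true for k in L_true
               if L_true[i][j] > L_true[i][k])

taxa = "ABCDE"
rows = [[0, 10, 7, 12, 7], [10, 0, 7, 14, 9], [7, 7, 0, 11, 6],
        [12, 14, 11, 0, 7], [7, 9, 6, 7, 0]]
D_T = {x: dict(zip(taxa, row)) for x, row in zip(taxa, rows)}
w_min = 1.0
rng = random.Random(0)

def noisy_copy(D, bound):
    """Perturb each off-diagonal entry by at most `bound`, keeping symmetry."""
    N = {x: dict(row) for x, row in D.items()}
    for a in range(len(taxa)):
        for b in range(a + 1, len(taxa)):
            x, y = taxa[a], taxa[b]
            N[x][y] = N[y][x] = D[x][y] + rng.uniform(-bound, bound)
    return N

ok = all(preserves_row_order(lca_matrix(D_T, taxa, r),
                             lca_matrix(noisy_copy(D_T, 0.49 * w_min), taxa, r))
         for r in taxa for _ in range(200))
print(ok)
```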

10 . Robustness of DLCA (cont)
2. If $L$ weakly preserves the order of each row in $L_T$, then the recursive procedure returns a tree with the same topology as $T$.
Assume: $L_T(i,j) > L_T(i,k) \Rightarrow L(i,j) > L(i,k)$. The base case is immediate.
a) The pair $(i',j')$ chosen in step (1) is a neighbor-pair in $T$:
$(i',j')$ is a mutually deepest pair in $L$
$\Rightarrow$ for every $k \ne i',j'$: $\max\{L(i',k), L(j',k)\} \le L(i',j')$
$\Rightarrow$ for every $k \ne i',j'$: $\max\{L_T(i',k), L_T(j',k)\} \le L_T(i',j')$   (contrapositive of the assumption)
$\Rightarrow$ $i'$ and $j'$ are neighbors in $T$   (shown in class)

11 . Robustness of DLCA (cont)
2. If $L$ weakly preserves the order of each row in $L_T$, then the recursive procedure returns a tree with the same topology as $T$.
Assume: $L_T(i,j) > L_T(i,k) \Rightarrow L(i,j) > L(i,k)$.
a) The pair $(i',j')$ chosen in step (1) is a neighbor-pair in $T$.
b) The reduced matrix $L'$ calculated in step (2) weakly preserves the order of each row in the reduced $L'_T$.
Assume $L'_T(i,j) > L'_T(i,k)$.
- If $i,j,k \ne v$ (the new vertex), then $L'(i,j) > L'(i,k)$ by the initial assumption.
- If $i = v$, then $L'_T(v,j) = L_T(i',j) = L_T(j',j)$ and $L'_T(v,k) = L_T(i',k) = L_T(j',k)$
  $\Rightarrow \min\{L_T(i',j), L_T(j',j)\} > \max\{L_T(i',k), L_T(j',k)\}$
  $\Rightarrow L(i',j) > L(i',k)$ and $L(j',j) > L(j',k)$   (assumption applied to rows $i'$ and $j'$)
  $\Rightarrow L'(v,j) > L'(v,k)$   (convex reduction: both sides are combined with the same $\alpha$)
- Can be similarly shown when $j = v$ or $k = v$.

12 . Robustness of DLCA (cont)
2. If $L$ weakly preserves the order of each row in $L_T$, then the recursive procedure returns a tree with the same topology as $T$.
Assume: $L_T(i,j) > L_T(i,k) \Rightarrow L(i,j) > L(i,k)$.
a) The pair $(i',j')$ chosen in step (1) is a neighbor-pair in $T$.
b) The reduced matrix $L'$ calculated in step (2) weakly preserves the order of each row in the reduced $L'_T$.
c) The induction hypothesis implies that the tree (over $(S \setminus \{i',j'\}) \cup \{v\}$) returned by the recursive call in step (3) has the same topology as $T$ (with $i',j'$ replaced by $v$).
d) In step (4) we add $i'$ and $j'$ as daughter nodes of $v$, and the resulting tree has the same topology as $T$.
Q.E.D.

13 . Robustness of Other Algorithms
Many other algorithms also reconstruct the correct topology given near-additive input:
- Other neighbor joining algorithms: Saitou and Nei’s NJ, AddTree …
- All quartet-based algorithms.
Atteson defines two reconstruction radii:
- An algorithm A has $l_\infty$-radius of $\varepsilon$ iff it is guaranteed to return the binary tree $T$ given $D$ s.t. $\|D, D_T\|_\infty < \varepsilon \cdot w_{\min}(T)$.
- An algorithm A has edge $l_\infty$-radius of $\varepsilon$ iff it correctly reconstructs all edges in $T$ of weight $> (1/\varepsilon) \cdot \|D, D_T\|_\infty$.
edge $l_\infty$-radius $\le$ $l_\infty$-radius $\le \tfrac{1}{2}$

14 . Generalized Robustness
An algorithm A has edge $l_\infty$-radius of $\varepsilon$ iff it correctly reconstructs all edges in $T$ of weight $> (1/\varepsilon) \cdot \|D, D_T\|_\infty$.
- DLCA has optimal edge $l_\infty$-radius of $\tfrac{1}{2}$.
- NJ has edge $l_\infty$-radius of $\tfrac{1}{4}$.
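To make the definition concrete, the sketch below lists which internal edges are covered by the guarantee for a given noise level and edge radius. It uses the two internal edges of the five-taxon example tree from this tutorial (weights 1 and 2), written as splits; the function name `guaranteed_edges` and the noise levels are illustrative assumptions. The result is only a guarantee: an algorithm may well recover edges outside this set.

```python
def guaranteed_edges(edge_weights, noise, edge_radius):
    """Edges of weight > noise / edge_radius, i.e. those the definition covers."""
    threshold = noise / edge_radius
    return sorted(e for e, w in edge_weights.items() if w > threshold)

# Internal edges of the example tree, named by the split they induce.
internal = {"BC|ADE": 1.0, "ABC|DE": 2.0}

for noise in (0.4, 0.75):
    print("noise", noise,
          "| radius 1/2:", guaranteed_edges(internal, noise, 0.5),
          "| radius 1/4:", guaranteed_edges(internal, noise, 0.25))
```

With noise 0.75 the matrix is no longer near-additive (0.75 ≥ ½·$w_{\min}$), yet an edge radius of ½ still guarantees the heavier internal edge.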

