Reconstruction on trees and Phylogeny 1

1 Reconstruction on trees and Phylogeny 1
Elchanan Mossel, U.C. Berkeley. Supported by Microsoft Research and the Miller Institute. 9/18/2018

2 General plan
Study stochastic processes on bounded-degree trees. The vertices of a tree $T$ are labeled by random variables. We are interested in asymptotic problems where $|T| \to \infty$.

3 The reconstruction problem
We discuss two related problems. In both, we want to reconstruct (estimate) unknown parameters from observations. The first is the “reconstruction problem”: we are given the tree and the values of the random variables at the leaves, and we want to reconstruct the value of the random variable at a specific vertex (the “root”). This is algorithmically “easy”, but when does it “work”?

4 Phylogeny
Here the tree is unknown. We are given a sequence of collections of random variables at the leaves (the “species”), and the collections are i.i.d. We want to reconstruct the (unrooted) tree.

5 Phylogeny
Algorithmically “hard”.

6 Lecture plan
Talk 1 [GW 19th cent.; M-Steel 2003]: Introduction. The “random cluster” model: reconstruction. The random cluster model: phylogeny.
Talk 2 [Hi77, EKPS2000, M1998, MSW2003]: The Ising = CFN model: reconstruction.
Talk 3 [M2003]: The Ising = CFN model: phylogeny.
Talk 4 [M2003, ≥ 2004]: General Markov model. Open problems etc.

7 Trees
(3-)regular trees.
Binary: all internal degrees are 3 (bifurcating speciation; the results remain valid if all internal degrees are $\ge 3$, or $\ge b+1$).
General trees.

8 Trees
In biology, all internal degrees are $\ge 3$. Given a set of species (labeled vertices) $X$, an X-tree is a tree whose set of leaves is $X$. Two X-trees $T_1$ and $T_2$ are identical if there is a graph isomorphism between $T_1$ and $T_2$ that is the identity map on $X$.
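To make “identical X-trees” concrete, here is a small Python sketch. It assumes a tree is stored as an adjacency dict with string vertex names (a representation chosen only for illustration) and compares two X-trees through their splits, the leaf bipartitions obtained by deleting each edge. It relies on the standard splits-equivalence theorem: X-trees with no vertices of degree 2 are identical exactly when they induce the same set of splits, so the internal vertex names play no role.

```python
def splits(adj, leaves):
    """Nontrivial splits of an unrooted tree: for each edge, the bipartition of
    the leaf set obtained by deleting that edge (one-leaf splits are skipped)."""
    leaves = frozenset(leaves)
    result = set()
    for u in adj:
        for v in adj[u]:
            if u > v:
                continue  # handle each edge {u, v} once
            # Depth-first search from u that never crosses the edge to v.
            side, stack, seen = set(), [u], {u, v}
            while stack:
                x = stack.pop()
                if x in leaves:
                    side.add(x)
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            a = frozenset(side)
            b = leaves - a
            if len(a) > 1 and len(b) > 1:
                result.add(frozenset((a, b)))
    return result


def same_x_tree(adj1, adj2, leaves):
    """True when the two X-trees induce the same splits of the leaf set."""
    return splits(adj1, leaves) == splits(adj2, leaves)


# Quartet trees on X = {a, b, c, d}; internal vertex names are arbitrary.
T1 = {"a": ["u"], "b": ["u"], "u": ["a", "b", "w"],
      "w": ["u", "c", "d"], "c": ["w"], "d": ["w"]}
T2 = {"a": ["u"], "c": ["u"], "u": ["a", "c", "w"],
      "w": ["u", "b", "d"], "b": ["w"], "d": ["w"]}
T3 = {"a": ["p"], "b": ["p"], "p": ["a", "b", "q"],
      "q": ["p", "c", "d"], "c": ["q"], "d": ["q"]}
print(same_x_tree(T1, T3, "abcd"))  # True: ab|cd in both
print(same_x_tree(T1, T2, "abcd"))  # False: ab|cd versus ac|bd
```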

9 The “random cluster” model
Infinite set $A$ of colors (in “real life”, $|A|$ is large, e.g. gene order). The model is defined on an unrooted tree $T = (V, E)$, where each edge $e$ has a (non-mutation) probability $\theta(e)$. Character: perform percolation, with edge $e$ open with probability $\theta(e)$. All vertices $v$ in the same open cluster have the same color $\sigma(v)$, and different clusters get different colors. This is the “random cluster” model.
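The model is easy to simulate. The Python sketch below samples one character by percolating the edges and coloring each open cluster; calling it repeatedly gives i.i.d. characters, as in the phylogeny problem. The adjacency-dict tree, the dict of edge probabilities keyed by `frozenset` edges, and the use of integers in place of the infinite color set $A$ are illustrative assumptions, not part of the slides.

```python
import itertools
import random


def random_cluster_character(adj, theta, rng):
    """Sample one character of the random cluster model on a tree.

    Each edge {x, y} is open independently with probability theta[frozenset((x, y))];
    every open cluster then receives its own fresh color. Integers stand in for the
    infinite color set A, so distinct clusters never share a color.
    """
    color = {}
    fresh = itertools.count()
    for start in adj:
        if start in color:
            continue
        c = next(fresh)  # new cluster, new color
        color[start] = c
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                # In a tree each edge is examined at most once here, so every
                # edge is open or closed independently, as the model requires.
                if y not in color and rng.random() < theta[frozenset((x, y))]:
                    color[y] = c
                    stack.append(y)
    return color


# Quartet tree ab|cd; every edge open with probability 0.8 (illustrative values).
quartet = {"a": ["u"], "b": ["u"], "u": ["a", "b", "w"],
           "w": ["u", "c", "d"], "c": ["w"], "d": ["w"]}
edges = {frozenset((x, y)) for x in quartet for y in quartet[x]}
theta = {e: 0.8 for e in edges}
rng = random.Random(0)
characters = [random_cluster_character(quartet, theta, rng) for _ in range(5)]
```

Only the leaf colors of each character would be observed in the phylogeny setting; the colors of internal vertices are kept here so the samples can be inspected.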

10 Galton-Watson

11 Galton-Watson Theorem
For the random cluster model on a rooted binary tree. If (e) > ½ +  for all e, then for all v 2 T, with probability at least s() = 2  / (½ + )2, there exists u 2 T (below v), with (v) = (u). If (e) < ½ -  for all e, then the probability that such u 2 T exists is at most 3 (1 – 2 )d(v, T) 9/18/2018

12 Reconstruction on random clusters
For the random cluster model on a rooted binary tree. If (e) > ½ +  for all e, then for all v 2 T, we may reconstruct (v) with probability ¸ (½ + )2s2(e) from T (below v). Proof: v If (e) < ½ -  for all e, then the probability of reconstructing (v) is · 3 (1 – 2 )d(v, T). Proof: True even given more info (open/closed edges). 9/18/2018

13 Phylogeny from log characters for R.C.
Th1 [M-Steel 2003]: Suppose that $T$ is an X-tree on $n$ leaves and that $1/2 + \delta < \theta(e) < 1 - \delta$ for all $e$. Then $k = (2\log n - \log\varepsilon)/(16\delta^5) = O(\log n - \log\varepsilon)$ characters suffice to reconstruct the topology with probability $\ge 1 - \varepsilon$.
Definition: A cherry is a pair of leaves at distance 2.
Fact: Every X-tree has at least one cherry.
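For a known tree the definition of a cherry translates directly into code. The Python sketch below (adjacency-dict representation assumed for illustration) groups the leaves by their unique neighbor; every pair within a group is at distance 2 and hence a cherry. In the example caterpillar, leaf e belongs to no cherry, yet the tree still has two cherries, in line with the Fact above.

```python
def cherries(adj):
    """All cherries of a tree: pairs of leaves at distance 2, i.e. pairs of
    leaves whose unique neighbors coincide."""
    leaves = [x for x in adj if len(adj[x]) == 1]
    by_neighbor = {}
    for x in leaves:
        by_neighbor.setdefault(adj[x][0], []).append(x)
    return {frozenset((a, b))
            for group in by_neighbor.values()
            for i, a in enumerate(group)
            for b in group[i + 1:]}


# A 5-leaf caterpillar: a, b hang from u; e hangs from v; c, d hang from w.
caterpillar = {"a": ["u"], "b": ["u"], "u": ["a", "b", "v"],
               "v": ["u", "e", "w"], "e": ["v"],
               "w": ["v", "c", "d"], "c": ["w"], "d": ["w"]}
print(cherries(caterpillar))
```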

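14 Testing cherries
If $x, y$ is a cherry, then there is no character $\sigma$ with leaves $x', y' \in \partial T \setminus \{x, y\}$ such that $\sigma(x) = \sigma(x') \ne \sigma(y) = \sigma(y')$: both open paths would pass through the common neighbor of $x$ and $y$, forcing $\sigma(x) = \sigma(y)$. If $x, y$ is not a cherry, then for each character $\sigma$ the probability that $\exists\, x', y' \in \partial T \setminus \{x, y\}$ with $\sigma(x) = \sigma(x') \ne \sigma(y) = \sigma(y')$ is at least $r = \delta s^2/16$, where $s = s(\delta) = 2\delta/(1/2 + \delta)^2$. Repeating for $k$ characters, we may find all cherries with error probability bounded by $n^2(1 - r)^k$.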
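The test can be phrased as a filter over leaf pairs: a pair remains a candidate cherry only if no character is a witness, i.e. no character has $\sigma(x) = \sigma(x') \ne \sigma(y) = \sigma(y')$ with $x', y'$ outside $\{x, y\}$. In the Python sketch below, each character is assumed to be a dict from leaves to colors, and the function names are hypothetical. The helper `characters_needed` solves $n^2(1 - r)^k \le \varepsilon$ for $k$ (a union bound over the fewer than $n^2$ leaf pairs), taking the per-character detection probability $r$ as an input rather than computing it from δ.

```python
import math


def is_witness(sigma, leaves, x, y):
    """True if character sigma has leaves x', y' outside {x, y} with
    sigma(x) = sigma(x') != sigma(y) = sigma(y'); impossible when x, y is a cherry."""
    others = {sigma[z] for z in leaves if z not in (x, y)}
    return sigma[x] != sigma[y] and sigma[x] in others and sigma[y] in others


def candidate_cherries(characters, leaves):
    """Pairs of leaves for which no character is a witness."""
    leaves = sorted(leaves)
    return {frozenset((x, y))
            for i, x in enumerate(leaves) for y in leaves[i + 1:]
            if not any(is_witness(sig, leaves, x, y) for sig in characters)}


def characters_needed(n, eps, r):
    """Smallest k with n**2 * (1 - r)**k <= eps."""
    return math.ceil((2 * math.log(n) - math.log(eps)) / -math.log1p(-r))


# One character on the quartet ab|cd: edges a-u, u-b, c-w, w-d open, u-w closed.
sigma = {"a": 1, "b": 1, "c": 2, "d": 2}
print(candidate_cherries([sigma], "abcd"))
print(characters_needed(n=100, eps=0.01, r=0.05))  # 270
```

With no characters every pair is still a candidate; each informative character can only remove non-cherries, never true cherries.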

15 From cherries to trees
We wish to continue by replacing each cherry $(u, v)$ with the vertex $w$ at distance 1 from both $u$ and $v$. Problem: we may not know the color of $w$. But for each character $\sigma$, with probability at least $(1/2 + \delta)^2 s(\delta)^2$ we can reconstruct $\sigma(w)$. Now we can repeat.
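A single reduction step can be sketched as follows. This is our own illustrative version, not taken from the slides: the function `merge_cherry`, the use of `None` for an unknown color, and the two inference rules are assumptions, and the sketch is not claimed to achieve the bound stated above. Because $u$ and $v$ are leaves of the current tree whose only neighbor is $w$, any open path leaving $u$ or $v$ passes through $w$; so $\sigma(w)$ is known whenever $u$ and $v$ share a color, or whenever the color of $u$ (or $v$) reappears on another current leaf.

```python
def merge_cherry(sigma, leaves, u, v, w):
    """Replace the cherry {u, v} by its common neighbor w in one character.

    sigma maps current leaves to colors (None = unknown). sigma(w) is inferred only
    when the data force it: if u and v share a color, or if u's (or v's) color appears
    on another current leaf, the connecting open path runs through w.
    """
    others = [z for z in leaves if z not in (u, v)]

    def seen_elsewhere(c):
        return c is not None and any(sigma[z] == c for z in others)

    su, sv = sigma[u], sigma[v]
    if su is not None and su == sv:
        sw = su
    elif seen_elsewhere(su):
        sw = su
    elif seen_elsewhere(sv):
        sw = sv
    else:
        sw = None  # sigma(w) cannot be reconstructed from this character
    new_sigma = {z: sigma[z] for z in others}
    new_sigma[w] = sw
    return new_sigma, others + [w]


sigma = {"a": 1, "b": 2, "c": 1, "d": 3}
print(merge_cherry(sigma, ["a", "b", "c", "d"], "a", "b", "u"))
# ({'c': 1, 'd': 3, 'u': 1}, ['c', 'd', 'u'])
```

Applying this to every character for each detected cherry yields a smaller set of “leaves” on which the cherry test can be run again; characters where $\sigma(w)$ stays unknown are simply less informative in later rounds.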

16 Poly. lower bound for R.C. Phylogeny
Th1 [M-Steel 2003]: Suppose that $n = 3 \times 2^q$ and $T$ is a uniformly chosen $(q+1)$-level 3-regular X-tree, and that $\theta(e) < \theta$ for all $e$, where $\theta < 1/2$. Then in order to reconstruct the tree with probability $\ge 0.1$, the number of characters must satisfy $k \ge (2\theta)^{-q+1}/100 = \Omega(n^{-\log_2(2\theta)})$. Proof: it suffices to prove the same bound given the topology of the bottom $q - L$ levels and the status of the edges there.
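To see why the bound $k \ge (2\theta)^{-q+1}/100$ is polynomial in $n$: with $2^q = n/3$ we get $(2\theta)^{-q} = (n/3)^{-\log_2(2\theta)}$, and the exponent $-\log_2(2\theta)$ is positive for $\theta < 1/2$ and grows without bound as $\theta \to 0$. The Python sketch below evaluates the bound next to $n^{-\log_2(2\theta)}$ for a few illustrative parameter values; for $\theta = 1/4$ the bound is exactly $n/600$.

```python
import math


def lower_bound_characters(q, theta):
    """The theorem's bound k >= (2*theta)**(-q + 1) / 100, for theta < 1/2."""
    return (2 * theta) ** (-q + 1) / 100


def degree_in_n(theta):
    """Exponent of n = 3 * 2**q in the bound: (2*theta)**(-q) = (n/3)**(-log2(2*theta))."""
    return -math.log2(2 * theta)


for theta in (0.25, 0.4):
    for q in (10, 15, 20):
        n = 3 * 2 ** q
        print(theta, n, lower_bound_characters(q, theta), n ** degree_in_n(theta))
```

Each extra level (doubling $n$) multiplies the bound by $1/(2\theta) = 2^{-\log_2(2\theta)}$, which is what a power law in $n$ with that exponent predicts.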

17 Poly. lower bound for R.C. Phylogeny
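Proof: let $X$ denote the unknown topology of the top $L$ levels; the bottom $q - L$ levels are known for each of the $k$ characters. If, for all $k$ characters, the random cluster “dies” within the bottom $q - L$ levels, then $X$ is independent of the data. This happens with probability $\ge 1 - k\,2^L (2\theta)^{q-L}$.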
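The factor $(2\theta)^{q-L}$ is the expected number of open paths running $q - L$ levels down a binary tree from a vertex, so by Markov's inequality it bounds the probability that the vertex's cluster reaches that depth. One reading of the remaining factors is a union bound over the $k$ characters and the $2^L$ vertices at level $L$ (the count as written on the slide). The Python sketch below, with illustrative parameter values, computes the exact survival probability by iterating the offspring generating function and evaluates the union bound.

```python
def survival_to_depth(theta, m):
    """P(the open cluster of a vertex reaches m levels below it) in a rooted binary
    tree with every edge open w.p. theta: 1 - f^(m)(0), f(q) = (1 - theta + theta*q)**2."""
    q = 0.0
    for _ in range(m):
        q = (1 - theta + theta * q) ** 2
    return 1 - q


def prob_all_clusters_die_bound(k, L, q, theta):
    """The slide's union bound 1 - k * 2**L * (2*theta)**(q - L)."""
    return 1 - k * 2 ** L * (2 * theta) ** (q - L)


theta, q, L, k = 0.3, 40, 10, 50
print(survival_to_depth(theta, q - L), (2 * theta) ** (q - L))
print(prob_all_clusters_die_bound(k, L, q, theta))
```

The exact survival probability always sits below $(2\theta)^m$, so the union bound is conservative.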

18 Phylogeny: Conjectures and results
Statistical physics | Phylogeny
Binary tree in ordered phase | conj. $k = O(\log n)$
Binary tree in unordered phase | conj. $k = \mathrm{poly}(n)$
Percolation, critical $\theta = 1/2$ | Random cluster [M-Steel 2003]
Ising model, critical $2\theta^2 = 1$ | CFN [M 2003]
Sub-critical representation | High mutation [M 2003]
Problems: How general? What is the critical point? (extremality vs. spectral)

