Reconstruction on trees and Phylogeny 1

Reconstruction on trees and Phylogeny 1 Elchanan Mossel, U.C. Berkeley mossel@stat.berkeley.edu, http://www.cs.berkeley.edu/~mossel/ Supported by Microsoft Research and the Miller Institute 9/18/2018

General plan: Study stochastic processes on bounded-degree trees. The vertices of a tree T are labeled by random variables. We are interested in asymptotic problems where $|T| \to \infty$.

The reconstruction problem: We discuss two related problems. In both, we want to reconstruct (estimate) unknown parameters from observations. The first is the “reconstruction problem”: we are given the tree and the values of the random variables at the leaves, and we want to reconstruct the value of the random variable at a specific vertex (the “root”). Algorithmically “easy”, but when does it “work”?

Phylogeny: Here the tree is unknown. We are given a sequence of collections of random variables at the leaves (“species”). The collections are i.i.d.! We want to reconstruct the (unrooted) tree.

Phylogeny: algorithmically “hard”.

Lecture plan
Talk 1 [GW 19th cent.; M-Steel 2003]: Introduction. The “random cluster” model: reconstruction. The random cluster model: phylogeny.
Talk 2 [Hi77, EKPS2000, M1998, MSW2003]: The Ising = CFN model: reconstruction.
Talk 3 [M2003]: The Ising = CFN model: phylogeny.
Talk 4 [M2003, ≥ 2004]: General Markov model. Open problems etc.

Trees: (3-)regular trees. Binary: all internal degrees are 3 (bifurcating speciation; results remain valid if degrees are $\ge 3$, or $\ge b+1$). General trees.

Trees: In biology, all internal degrees are $\ge 3$. Given a set X of species (labeled vertices), an X-tree is a tree whose set of leaves is X. Two X-trees $T_1$ and $T_2$ are identical if there is a graph isomorphism between $T_1$ and $T_2$ that is the identity map on X.
[Figure: X-trees on the leaf set {a, b, c, d}, with internal vertices labeled u, v, w.]
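One standard way to test whether two X-trees are identical in this sense, assuming no internal vertex has degree 2 (as in the biological setting, where internal degrees are at least 3), is to compare their splits: each edge cuts X into two parts, and by the splits-equivalence theorem (Buneman) two such X-trees are identical exactly when they induce the same set of splits. The Python sketch below illustrates that technique; it is not taken from the slides, trees are assumed to be adjacency dicts, and the names `splits` and `same_x_tree` are hypothetical.

```python
def splits(adj, leaves):
    """Non-trivial splits of an X-tree.

    adj: dict mapping each vertex to the set of its neighbors.
    leaves: the set X of leaf labels (assumed sortable, e.g. strings).
    Each split A | X - A is stored as the side A that avoids a fixed
    reference leaf, so both orientations of an edge give the same key.
    """
    ref = min(leaves)
    result = set()
    for u in adj:
        for v in adj[u]:
            # Leaves reachable from v without crossing the edge {u, v}.
            side, stack, seen = set(), [v], {u, v}
            while stack:
                z = stack.pop()
                if z in leaves:
                    side.add(z)
                for nb in adj[z]:
                    if nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
            # Skip trivial splits that cut off a single leaf.
            if ref not in side and 1 < len(side) < len(leaves) - 1:
                result.add(frozenset(side))
    return result


def same_x_tree(adj1, adj2, leaves):
    """Identical as X-trees iff the split sets agree (no degree-2 vertices)."""
    return splits(adj1, leaves) == splits(adj2, leaves)


# Example: the quartet ab|cd, the same tree with renamed internal
# vertices, and the different quartet ac|bd.
X = {"a", "b", "c", "d"}
ab_cd = {"a": {"u"}, "b": {"u"}, "c": {"w"}, "d": {"w"},
         "u": {"a", "b", "w"}, "w": {"c", "d", "u"}}
ab_cd_renamed = {"a": {"p"}, "b": {"p"}, "c": {"q"}, "d": {"q"},
                 "p": {"a", "b", "q"}, "q": {"c", "d", "p"}}
ac_bd = {"a": {"u"}, "c": {"u"}, "b": {"w"}, "d": {"w"},
         "u": {"a", "c", "w"}, "w": {"b", "d", "u"}}
```

Renaming internal vertices does not change the splits, which is exactly why the comparison ignores everything except the identity map on X.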

The “random cluster” model: An infinite set A of colors (“real life”: large |A|, e.g. gene order). Defined on an unrooted tree T = (V, E). Edge e has (non-mutation) probability $\theta(e)$. Character: perform percolation, where edge e is open with probability $\theta(e)$. All the vertices v in the same open cluster have the same color $\sigma_v$. Different clusters get different colors. This is the “random cluster” model.
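A minimal simulation sketch of one character, assuming for illustration a complete rooted binary tree of a given depth and the same open probability θ on every edge. Fresh integers stand in for the infinite color set A: walking down from the root, an open edge copies the parent's color and a closed edge starts a new cluster with a never-used color, which is equivalent to percolating first and then giving each open cluster its own color. The function name `random_cluster_leaves` is hypothetical.

```python
import itertools
import random


def random_cluster_leaves(depth, theta, rng=None):
    """Sample one character of the random cluster model and return the
    leaf colors of a complete rooted binary tree of the given depth.

    Illustrative assumptions: every edge has the same open probability
    theta, and fresh integers play the role of the infinite color set A.
    """
    rng = rng or random.Random()
    fresh = itertools.count()      # never runs out of unused colors
    level = [next(fresh)]          # the root's cluster gets color 0
    for _ in range(depth):
        children = []
        for color in level:
            for _child in range(2):
                # Open edge: the child joins its parent's cluster.
                # Closed edge: the child starts a new cluster with a new color.
                children.append(color if rng.random() < theta else next(fresh))
        level = children
    return level
```

For example, `random_cluster_leaves(4, 0.8, random.Random(0))` returns the 16 leaf colors of one character; passing an explicit `random.Random` makes runs reproducible.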

Galton-Watson

Galton-Watson Theorem: For the random cluster model on a rooted binary tree: If $\theta(e) > 1/2 + \epsilon$ for all e, then for all $v \in T$, with probability at least $s(\epsilon) = 2\epsilon/(1/2+\epsilon)^2$, there exists a leaf $u \in \partial T$ below v with $\sigma(v) = \sigma(u)$. If $\theta(e) < 1/2 - \epsilon$ for all e, then the probability that such a $u \in \partial T$ exists is at most $3(1-2\epsilon)^{d(v,\partial T)}$.
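The constant $s(\epsilon)$ can be checked numerically. Below v the open cluster grows as a Galton-Watson process with Binomial(2, θ) offspring, so the probability $q_d$ of extinction within d generations satisfies $q_{d+1} = (1 - \theta + \theta q_d)^2$. For θ > 1/2 the limit is the smaller fixed point $((1-\theta)/\theta)^2$, so the survival probability is $(2\theta-1)/\theta^2$, which equals $2\epsilon/(1/2+\epsilon)^2$ at θ = 1/2 + ε. For θ < 1/2 the expected number of open descendants at depth d is $(2\theta)^d$, which bounds the survival probability. This Python sketch assumes every edge has the same θ (the extreme case of the theorem's hypotheses); the function names are illustrative.

```python
def survival_to_depth(theta, depth):
    """P(v's open cluster reaches `depth` generations below v) on a rooted
    binary tree whose edges are open independently with probability theta."""
    q = 0.0  # q_d = P(cluster extinct within d generations); q_0 = 0
    for _ in range(depth):
        # Each child edge is closed, or open with the child's cluster dying out.
        q = (1 - theta + theta * q) ** 2
    return 1 - q


def s(eps):
    """Closed form from the theorem: s(eps) = 2 eps / (1/2 + eps)^2."""
    return 2 * eps / (0.5 + eps) ** 2
```

Calling `survival_to_depth(0.5 + eps, 10_000)` for a few values of `eps` and comparing with `s(eps)` shows the two agree up to floating-point error; at finite depth the survival probability is larger, as the theorem needs.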

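Reconstruction on random clusters: For the random cluster model on a rooted binary tree: If $\theta(e) > 1/2 + \epsilon$ for all e, then for all $v \in T$ we may reconstruct $\sigma(v)$ from the leaves $\partial T$ below v with probability $\ge (1/2+\epsilon)^2 s(\epsilon)^2$. Proof: with probability $\ge (1/2+\epsilon)^2$ both edges from v to its children are open and, independently, with probability $\ge s(\epsilon)^2$ each child shares its color with some leaf below it. Then the same color appears below both children, and since different clusters get different colors, that color is $\sigma(v)$. If $\theta(e) < 1/2 - \epsilon$ for all e, then the probability of reconstructing $\sigma(v)$ is $\le 3(1-2\epsilon)^{d(v,\partial T)}$. Proof: true even given more information (which edges are open or closed).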
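The argument for the first statement suggests a simple estimator, sketched below in Python under an illustrative assumption: v's subtree is a complete binary tree whose leaf colors are listed left to right, so the first half lies below one child and the second half below the other. Because different clusters get different colors, any color seen in both halves belongs to a cluster containing v. The function name `reconstruct_root` is hypothetical.

```python
def reconstruct_root(leaf_colors):
    """Estimate sigma(v) from the colors of the leaves below v.

    leaf_colors: colors of the 2^d leaves of v's complete binary subtree,
    left to right (first half below one child, second half below the other).
    Returns the reconstructed color, or None when no color is shared.
    """
    half = len(leaf_colors) // 2
    common = set(leaf_colors[:half]) & set(leaf_colors[half:])
    # A shared color can only come from the unique cluster containing v.
    if len(common) > 1:
        raise ValueError("data inconsistent with the random cluster model")
    return next(iter(common), None)
```

Returning `None` rather than guessing mirrors the infinite-color setting: when no color is shared, the leaves carry no information about $\sigma(v)$.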

Phylogeny from log characters for R.C. Th1 [M-Steel, 2003]: Suppose that T is an X-tree on n leaves and for all e, $1/2 + \epsilon < \theta(e) < 1 - \epsilon$. Then $k = (2\log n - \log\delta)/(16\epsilon^5) = O(\log n - \log\delta)$ characters (for fixed $\epsilon$) suffice to reconstruct the topology with probability $\ge 1 - \delta$. [Figure: colors of leaves.] Definition: A cherry is a pair of leaves at distance 2. Fact: Every X-tree has at least one cherry.
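When the tree is known, its cherries are simply the pairs of leaves that share a neighbor. The Python sketch below lists them, assuming the tree is given as an adjacency dict with sortable leaf labels; the function name `cherries` is hypothetical.

```python
from itertools import combinations


def cherries(adj):
    """All cherries (pairs of leaves at distance 2) of a tree.

    adj: dict mapping each vertex to the set of its neighbors.
    Leaf labels are assumed sortable, so each pair is reported once, in order.
    """
    leaves = {v for v, nbrs in adj.items() if len(nbrs) == 1}
    found = set()
    for v, nbrs in adj.items():
        # Two leaves hanging off the same vertex v are at distance 2.
        for x, y in combinations(sorted(nbrs & leaves), 2):
            found.add((x, y))
    return found


# Example: the quartet tree ab|cd has the two cherries (a, b) and (c, d).
quartet = {"a": {"u"}, "b": {"u"}, "c": {"w"}, "d": {"w"},
           "u": {"a", "b", "w"}, "w": {"c", "d", "u"}}
```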

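Testing cherries: If x, y is a cherry, then there is no character $\sigma$ with leaves $x', y' \in \partial T \setminus \{x, y\}$ such that $\sigma(x) = \sigma(x') \neq \sigma(y) = \sigma(y')$: the paths from x to x' and from y to y' both pass through the common neighbor of x and y, so both open clusters would contain it, forcing $\sigma(x) = \sigma(y)$. If x, y is not a cherry, then for each character $\sigma$ the probability that there exist $x', y' \in \partial T \setminus \{x, y\}$ with $\sigma(x) = \sigma(x') \neq \sigma(y) = \sigma(y')$ is at least $r = \epsilon s^2/16$, where $s = s(\epsilon) = 2\epsilon/(1/2+\epsilon)^2$. Repeating for k characters, we may find all cherries with error probability bounded by $n^2(1-r)^k$.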
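Turning the union bound around gives the number of characters this step needs: $n^2(1-r)^k \le \delta$ holds as soon as $k \ge (2\ln n - \ln\delta)/(-\ln(1-r))$, and since $-\ln(1-r) \ge r$ this is at most about $(2\ln n - \ln\delta)/r$, i.e. logarithmic in n. A small Python sketch of that arithmetic; the function name and the example values are illustrative.

```python
import math


def s(eps):
    """s(eps) = 2 eps / (1/2 + eps)^2, as on the slides."""
    return 2 * eps / (0.5 + eps) ** 2


def characters_for_cherries(n, eps, delta):
    """Smallest k with n^2 (1 - r)^k <= delta, where r = eps * s(eps)^2 / 16."""
    r = eps * s(eps) ** 2 / 16
    # math.log1p(-r) computes ln(1 - r) accurately even for tiny r.
    return math.ceil((2 * math.log(n) - math.log(delta)) / -math.log1p(-r))
```

For instance, `characters_for_cherries(1000, 0.25, 0.01)` and `characters_for_cherries(10**6, 0.25, 0.01)` differ by much less than a factor of 2, reflecting the logarithmic dependence on n.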

From cherries to trees: We wish to continue by replacing each cherry (u, v) with the vertex w adjacent to both u and v. Problem: we may not know the color of w. But for each character $\sigma$, with probability at least $(1/2+\epsilon)^2 s(\epsilon)^2$ we can reconstruct $\sigma(w)$. Now we can repeat.
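One possible cherry-picking loop is sketched below in Python. It is an illustration under stated assumptions, not the construction from [M-Steel, 2003]: pairs of current leaves are scanned in a fixed order and the first pair passing the cherry test is joined; $\sigma(w)$ is reconstructed from a color seen below both u and v, as in the reconstruction argument; when that fails, w gets a placeholder color (a negative integer, assuming the data never uses negative colors) that matches nothing; and the last three or fewer current leaves are attached to a final center vertex. The function names and the internal vertex names w0, w1, … are hypothetical.

```python
from itertools import combinations, count


def passes_cherry_test(x, y, colors, leaves):
    """False if some character has color(x) = color(x') != color(y) = color(y')
    for current leaves x', y' outside {x, y}; True otherwise."""
    others = [z for z in leaves if z not in (x, y)]
    for col in colors:
        if (col[x] != col[y]
                and any(col[z] == col[x] for z in others)
                and any(col[z] == col[y] for z in others)):
            return False
    return True


def build_tree(characters):
    """Cherry-picking sketch. characters: list of dicts, leaf -> color.
    Returns the edge list of the reconstructed unrooted tree."""
    leaves = sorted(characters[0])
    colors = [dict(ch) for ch in characters]  # colors at the current leaves
    below = {x: [x] for x in leaves}          # original leaves under each current leaf
    fresh = count(start=-1, step=-1)          # placeholder colors (assumed unused)
    edges, next_id = [], 0
    while len(leaves) > 3:
        pair = next(((x, y) for x, y in combinations(leaves, 2)
                     if passes_cherry_test(x, y, colors, leaves)), None)
        if pair is None:
            raise ValueError("no pair passes the cherry test")
        u, v = pair
        w, next_id = f"w{next_id}", next_id + 1
        edges += [(w, u), (w, v)]
        for ch, col in zip(characters, colors):
            shared = {ch[leaf] for leaf in below[u]} & {ch[leaf] for leaf in below[v]}
            # A color seen below both u and v is the color of w's cluster.
            col[w] = next(iter(shared)) if shared else next(fresh)
        below[w] = below.pop(u) + below.pop(v)
        leaves = [z for z in leaves if z not in pair] + [w]
    center = f"w{next_id}"
    edges += [(center, z) for z in leaves]
    return edges
```

The placeholder colors can only hide equalities, never create them, so a true cherry of the current tree still passes the test; the probabilistic bounds above govern how often a non-cherry slips through.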

Poly. lower bound for R.C. phylogeny Th1 [M-Steel, 2003]: Suppose that $n = 3 \cdot 2^q$ and T is a uniformly chosen (q+1)-level 3-regular X-tree. For all e, $\theta(e) < \theta$, where $\theta < 1/2$. Then in order to reconstruct the tree with probability $\ge 0.1$, the number of characters must satisfy $k \ge (2\theta)^{-q+1}/100 = \Omega(n^{-(\log_2\theta + 1)})$. Proof: it suffices to prove the same bound when we are also given the topology of the bottom $q-L$ levels and the status (open or closed) of the edges there.

Poly. lower bound for R.C. phylogeny: Proof: [Figure: the top L levels of T, with unknown topology X; the bottom $q-L$ levels, known; k characters.] If for all k characters the random cluster “dies” within the bottom $q-L$ levels, then X is independent of the data. This happens with probability $\ge 1 - k\,2^L(2\theta)^{q-L}$.
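The estimate in the proof is a union bound: the open cluster of a vertex reaches d levels down with probability at most the expected number of open descendants at depth d, which is at most $(2\theta)^d$, and one sums over the level-L vertices and the k characters. The Python sketch below evaluates that bound and checks the exponent in the $\Omega(\cdot)$: with $n = 3 \cdot 2^q$, $(2\theta)^{-q} = (n/3)^{-(\log_2\theta + 1)}$. The function names and parameter values are illustrative.

```python
import math


def prob_all_clusters_die(k, L, q, theta):
    """Union bound from the proof (all theta(e) < theta): probability that, for
    all k characters, no cluster crosses the bottom q - L levels is at least
    1 - k * 2^L * (2 theta)^(q - L)."""
    return 1 - k * 2 ** L * (2 * theta) ** (q - L)


def exponent(theta):
    """Exponent a with (2 theta)^(-q) = (n / 3)^a when n = 3 * 2^q."""
    return -(math.log2(theta) + 1)
```

Since $\theta < 1/2$ gives $\log_2\theta < -1$, the exponent is positive, so the required number of characters grows polynomially in n.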

Phylogeny: Conjectures and results
Statistical physics | Phylogeny
Binary tree in ordered phase | conjecture: $k = O(\log n)$
Binary tree, unordered | conjecture: $k = \mathrm{poly}(n)$
Percolation, critical $\theta = 1/2$ | Random cluster [M-Steel 2003]
Ising model, critical $2\theta^2 = 1$ | CFN [M-2003]
Sub-critical representation | High mutation [M-2003]
Problems: How general? What is the critical point? (extremality vs. spectral)