Genome evolution: a sequence-centric approach Lecture 4: Beyond Trees. Inference by sampling Pre-lecture draft – update your copy after the lecture!

Course outline: probabilistic models, inference, parameter estimation, genome structure, mutations, population, inferring selection. (Background: probability, calculus/matrix theory, some graph theory, some statistics.) Covered so far: CT Markov chains, simple tree models, HMMs and variants, dynamic programming, EM.

What we can do so far (not much..): given a set of genomes (sequences) and a phylogeny, we can align them to generate a set of loci (not covered), estimate an ML simple tree model (EM), and infer posteriors over ancestral sequences. Inferring a phylogeny is generally hard, but quite trivial given entire genomes that are evolutionarily close to each other. Multiple alignment is quite difficult when ambiguous, and again easy when the genomes are similar. EM is improving and converging, but tends to get stuck at local maxima, so the initial condition is critical; for a simple tree this is not a real problem. Inference is easy and accurate for trees.

Loci independence does not make sense. Flanking effects (selection on codes): each variable h_i^j (species i, locus j) depends on its sequence neighbors h_i^{j-1}, h_i^{j+1}, not only on its phylogenetic parent h_pa(i)^j. Regional effects: the parental context h_pa(i)^{j-1}, h_pa(i)^j, h_pa(i)^{j+1}, h_pa(i)^{j+2} also matters, e.g., CpG deamination, G+C content, and transcription factor binding sites.

Bayesian Networks: defining the joint probability for a set of random variables given 1) a directed acyclic graph and 2) conditional probabilities. Definition: the descendants of a node X are those accessible from it via a directed path. Claim/Definition: in a Bayesian net, a node is independent of its non-descendants given its parents (the Markov property for BNs). Claim: P(x_1,...,x_n) = Π_i P(x_i | pa_i). Proof: we use a topological order on the graph (what is this?). Claim: the Up-Down algorithm is correct for trees. Proof: given a node, the distributions of the evidence on the two subtrees are independent... (whiteboard/exercise)
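To make the factorization claim concrete, here is a minimal sketch (a toy illustration, not from the lecture) that evaluates the joint probability of a full assignment on a hypothetical binary chain A -> B -> C with made-up CPDs:

```python
# Toy BN: the chain A -> B -> C; all CPD numbers are hypothetical.
parents = {"A": [], "B": ["A"], "C": ["B"]}
order = ["A", "B", "C"]  # a topological order of the DAG
# cpd[node][tuple of parent values] = distribution over the node's values 0/1
cpd = {
    "A": {(): [0.6, 0.4]},
    "B": {(0,): [0.9, 0.1], (1,): [0.3, 0.7]},
    "C": {(0,): [0.8, 0.2], (1,): [0.2, 0.8]},
}

def joint(assign):
    """P(assignment) = product over nodes of P(x_i | pa_i)."""
    p = 1.0
    for x in order:
        pa_vals = tuple(assign[u] for u in parents[x])
        p *= cpd[x][pa_vals][assign[x]]
    return p

print(joint({"A": 1, "B": 0, "C": 1}))  # 0.4 * 0.3 * 0.2 = 0.024
```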

Stochastic Processes and Stationary Distributions. (Figure: a process model evolving over time t, contrasted with a stationary model.)

Dynamic Bayesian Networks: a synchronous discrete-time process. (Figure: time slices T=1 through T=5, with conditional probabilities linking consecutive slices.)

Context-dependent Markov Processes: the context (e.g., AAA vs. AAC, as in AAACAAGAA) determines the rate matrix of the Markov process, and any dependency structure can make sense, including loops. When the context is changing, computing probabilities is difficult; think of the hidden variables as entire trajectories. This is the setting of Continuous Time Bayesian Networks (Koller-Nodelman).

Modeling simple context in the tree: PhyloHMM (Siepel-Haussler 2003). Each variable h_i^j is conditioned on its phylogenetic parent h_pa(i)^j and on its left neighbor h_i^{j-1}. Is this heuristically approximating a CTBN? Where exactly does it fail? (whiteboard/exercise)

So why does inference become hard (for real, not just in the worst case, and even in a crude heuristic like PhyloHMM)? We know how to work out chains or trees; put together, however, the dependencies cannot be controlled (even given its parents, a path can be found from each node to everywhere else).

General approaches to approximate inference. Exact algorithms (see Pearl 1988 and beyond) are out of the question in our case. Sampling: replace the marginal probability (an integration over the whole space) with an integration over a sample. Variational methods: approximate P(h|s) by simpler factors Q_1, Q_2, Q_3, Q_4 and optimize the q_i. Generalized message passing.

Sampling from a BN. Naively: if we could sample from Pr(h,s), then Pr(s) ≈ (#samples with s) / (#samples). Forward sampling: use a topological order on the network; repeatedly select a node whose parents are already determined and sample from its conditional distribution. How do we sample from the CPD? (whiteboard/exercise)
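A forward-sampling sketch on the same hypothetical chain A -> B -> C as above (toy CPDs): each node is sampled only after its parents have been determined, and a naive marginal estimate is obtained by counting:

```python
import random

# Same toy chain A -> B -> C as above; all CPD numbers are hypothetical.
parents = {"A": [], "B": ["A"], "C": ["B"]}
order = ["A", "B", "C"]
cpd = {
    "A": {(): [0.6, 0.4]},
    "B": {(0,): [0.9, 0.1], (1,): [0.3, 0.7]},
    "C": {(0,): [0.8, 0.2], (1,): [0.2, 0.8]},
}

def forward_sample():
    """Walk the topological order; each node's parents are set before it."""
    assign = {}
    for x in order:
        dist = cpd[x][tuple(assign[u] for u in parents[x])]
        assign[x] = 0 if random.random() < dist[0] else 1
    return assign

# Naive estimate of Pr(C = 1): the fraction of samples where the event holds.
samples = [forward_sample() for _ in range(10000)]
print(sum(s["C"] == 1 for s in samples) / len(samples))
```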

Focus on the observations. Naive sampling is terribly inefficient: why? (whiteboard/exercise; a word on sampling error.) Why don't we constrain the sampling to fit the evidence s? This can be done, but then we are no longer sampling from P(h,s), and not from P(h|s) either (why?). Two tasks: P(s) and P(f(h)|s); how do we approach each/both?

Likelihood weighting: set weight = 1 and use a topological order on the network. Select a node whose parents are already determined; if it carries no evidence, sample from its conditional distribution; otherwise set weight *= P(x_i | pa_i) and add the evidence value to the sample. Report the weight together with the sample. Then Pr(h|s) ≈ (total weight of samples with h) / (total weight).
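A likelihood-weighting sketch under the same toy-chain assumption: evidence nodes are clamped rather than sampled, and each sample carries the product of its evidence CPD values as a weight:

```python
import random

# Same hypothetical chain A -> B -> C; we observe C = 1.
parents = {"A": [], "B": ["A"], "C": ["B"]}
order = ["A", "B", "C"]
cpd = {
    "A": {(): [0.6, 0.4]},
    "B": {(0,): [0.9, 0.1], (1,): [0.3, 0.7]},
    "C": {(0,): [0.8, 0.2], (1,): [0.2, 0.8]},
}

def lw_sample(evidence):
    """One weighted sample: clamp evidence nodes, multiply in their CPD value."""
    assign, weight = {}, 1.0
    for x in order:
        dist = cpd[x][tuple(assign[u] for u in parents[x])]
        if x in evidence:
            assign[x] = evidence[x]
            weight *= dist[evidence[x]]  # weight *= P(x_i = s_i | pa_i)
        else:
            assign[x] = 0 if random.random() < dist[0] else 1
    return assign, weight

# Pr(A = 1 | C = 1) ~ (weight of samples with A = 1) / (total weight).
samples = [lw_sample({"C": 1}) for _ in range(10000)]
print(sum(w for a, w in samples if a["A"] == 1) / sum(w for _, w in samples))
```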

Generalizing likelihood weighting: importance sampling. We want E_P[f] for some function f (think 1(h_i)). We will use a proposal distribution Q, requiring Q(x) > 0 whenever P(x) > 0, and rely on the identity E_P[f] = E_Q[f(x) P(x)/Q(x)]. Q should combine or approximate P and f, even if we cannot sample from P (imagine that you would like to sample from P(h|s) to recover Pr(h_i|s)).

Correctness of likelihood weighting: importance sampling. Sample x^1,...,x^M from Q. Unnormalized importance sampling: estimate E_P[f] by (1/M) Σ_m f(x^m) w(x^m), i.e., we sample with a weight w = P/Q. To minimize the variance, use a Q distribution proportional to the target function (think of the variance for f = 1: we are left with the variance of w). (whiteboard/exercise)
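For illustration, an unnormalized importance sampling sketch on a hypothetical one-dimensional problem where both the target P and the proposal Q have known densities, so the weight w = P/Q is computed exactly:

```python
import math
import random

def p(x):
    """Target density: a standard normal (we want E_P[f])."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def q(x):
    """Proposal density: uniform on [-5, 5]."""
    return 0.1 if -5 <= x <= 5 else 0.0

def f(x):
    return x * x  # any function; here E_P[f] = 1 (the variance of N(0,1))

# Sample from Q, weight by w = P/Q, and average f * w.
xs = [random.uniform(-5, 5) for _ in range(100000)]
print(sum(f(x) * p(x) / q(x) for x in xs) / len(xs))  # should be close to 1.0
```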

Normalized importance sampling. When sampling from P(h|s) we don't know P, so we cannot compute w = P/Q. We do know the unnormalized P'(h) = P(h,s) = P(h|s) P(s) = α P(h|s), where α = P(s) is the unknown normalizer. So we will use sampling to estimate both terms: with w = P'/Q, estimate E_P[f] by Σ_m f(x^m) w(x^m) / Σ_m w(x^m). (whiteboard/exercise)

Normalized importance sampling: how many samples? The normalized estimator is biased (but consistent), in contrast to the unbiased unnormalized estimator. The variance of the normalized estimator (not proved) behaves roughly like Var_P[f] (1 + Var_Q[w]) / M; compared to an ideal sampler from P(h), the ratio M / (1 + Var_Q[w]) represents how effective your sample was so far. Sampling from P(h|s) itself could generate posteriors quite rapidly; if you estimate Var(w), you know how close your sample is to this ideal. (whiteboard/exercise)
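A sketch of the normalized estimator together with an effective-sample-size diagnostic derived from the weights; the unnormalized target (standing in for P(h,s)) and the proposal Q are hypothetical toys:

```python
import math
import random

def p_tilde(x):
    """Unnormalized target, playing the role of P(h,s): N(0,1) minus its normalizer."""
    return math.exp(-x * x / 2)

def q(x):
    """Proposal density N(1, 2^2), deliberately mismatched with the target."""
    return math.exp(-((x - 1.0) ** 2) / 8.0) / (2.0 * math.sqrt(2 * math.pi))

def f(x):
    return x  # E_P[f] = 0 for the target above

xs = [random.gauss(1.0, 2.0) for _ in range(100000)]   # sample from Q
ws = [p_tilde(x) / q(x) for x in xs]                   # unnormalized weights
est = sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws)  # ratio of two estimates
ess = sum(ws) ** 2 / sum(w * w for w in ws)            # effective sample size
print(est, ess)  # est near 0; ess well below len(xs) flags a poor Q
```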

Back to likelihood weighting: our proposal distribution Q is defined by fixing the evidence and ignoring the CPDs of variables with evidence. It is like forward sampling from a network in which all edges going into evidence nodes were eliminated. The weights are w = P(h,s)/Q(h) = Π over evidence nodes of P(s_i | pa_i). The importance sampling machinery now translates to likelihood weighting: an unnormalized version to estimate P(s), where Q forces s; an unnormalized version to estimate P(h|s), where Q forces s and h_i; and a normalized version to estimate P(h|s).

Limitations of forward sampling: likelihood weighting is effective when the observed nodes sit upstream (near the roots) of the unobserved ones, but not when the evidence lies downstream, since samples are then generated blindly to the evidence and most weights end up negligible.

Markov Chain Monte Carlo (MCMC). We don't know how to sample from P(h) = P(h|s) (or from any complex distribution, for that matter). The idea: think of P(h|s) as the stationary distribution of a reversible Markov chain. Find a process with transition probabilities for which P(h|s) is stationary, then sample a trajectory. Theorem: if the process is irreducible (you can reach from anywhere to anywhere with p > 0), then, with C a counter of visits, C(h)/N converges to P(h|s) as the trajectory length N grows, regardless of the starting state (start from anywhere!).

The Metropolis(-Hastings) Algorithm. Why reversible? Because detailed balance, P(x) T(x→y) = P(y) T(y→x), makes it easy to define the stationary distribution in terms of the transitions. So how can we find appropriate transition probabilities? Define a proposal distribution Q(x→y) and an acceptance probability A(x→y) = min(1, P(y)Q(y→x) / P(x)Q(x→y)). What is the big deal? We reduce the problem to computing ratios between P(x) and P(y).
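A minimal Metropolis sketch with a symmetric random-walk proposal, so Q(y→x)/Q(x→y) cancels and only the ratio P(y)/P(x) of a (hypothetical, unnormalized) target is ever computed:

```python
import math
import random

def p_tilde(x):
    """Unnormalized target: a symmetric mixture of two bumps at -2 and +2."""
    return math.exp(-(x - 2) ** 2) + math.exp(-(x + 2) ** 2)

def mh_chain(n, x0=0.0, step=1.0):
    x, chain = x0, []
    for _ in range(n):
        y = x + random.gauss(0.0, step)  # symmetric proposal Q(x -> y)
        if random.random() < min(1.0, p_tilde(y) / p_tilde(x)):
            x = y                        # accept; otherwise keep the old state
        chain.append(x)
    return chain

chain = mh_chain(50000)
kept = chain[5000:]  # discard a burn-in prefix
print(sum(kept) / len(kept))  # near 0 by the symmetry of the target
```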

Acceptance ratio for a BN. We must compute min(1, P(y)/P(x)), e.g. min(1, Pr(h'|s)/Pr(h|s)). This is usually quite easy since Pr(h'|s) = Pr(h',s)/Pr(s), so the unknown Pr(s) cancels in the ratio. For example, if the proposal distribution changes only one variable h_i, what would the ratio be? We affected only the CPDs of h_i and its children. Definition: the minimal Markov blanket of a node in a BN includes its parents, its children, and its children's parents. To compute the ratio, we care only about the values of h_i and its Markov blanket. (whiteboard/exercise: what is a Markov blanket?)

Gibbs sampling. A very similar algorithm (in fact, a special case of the Metropolis algorithm): start from any state h and iterate: choose a variable H_i, then form h^{t+1} by sampling a new h_i from Pr(h_i | h^t_{-i}, s), i.e., conditioned on all the other variables. This is a reversible process with our target stationary distribution. Gibbs sampling is easy to implement for BNs: the conditional Pr(h_i | rest) is proportional to Pr(h_i | pa_i) times the product of Pr(h_j | pa_j) over the children j of i, so it depends only on the Markov blanket of h_i.
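A Gibbs sampling sketch on the same toy chain A -> B -> C with evidence C = 1; each hidden node is resampled from a conditional that touches only its Markov blanket:

```python
import random

# Same hypothetical CPDs as before; we observe C = 1 and sample A, B.
cpd = {
    "A": {(): [0.6, 0.4]},
    "B": {(0,): [0.9, 0.1], (1,): [0.3, 0.7]},
    "C": {(0,): [0.8, 0.2], (1,): [0.2, 0.8]},
}

def gibbs(n_sweeps, c_obs=1):
    state = {"A": 0, "B": 0, "C": c_obs}  # start from an arbitrary state
    hits = 0
    for _ in range(n_sweeps):
        # Resample A given its Markov blanket (its child B):
        # Pr(A=a | rest) is proportional to Pr(A=a) * Pr(B | A=a)
        scores = [cpd["A"][()][a] * cpd["B"][(a,)][state["B"]] for a in (0, 1)]
        state["A"] = 0 if random.random() < scores[0] / sum(scores) else 1
        # Resample B given its Markov blanket (parent A, child C):
        # Pr(B=b | rest) is proportional to Pr(B=b | A) * Pr(C | B=b)
        scores = [cpd["B"][(state["A"],)][b] * cpd["C"][(b,)][state["C"]]
                  for b in (0, 1)]
        state["B"] = 0 if random.random() < scores[0] / sum(scores) else 1
        hits += state["A"]
    return hits / n_sweeps  # estimate of Pr(A = 1 | C = 1); burn-in ignored

print(gibbs(50000))
```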

Sampling in practice. How much time until convergence to P? (The burn-in time.) We sample while fixing the evidence, starting from anywhere but waiting some time (burn-in) before starting to collect data; only then is the chain mixing and samples are recorded. Consecutive samples are still correlated! Should we keep only every n-th sample? A problematic state space is one that is loosely connected. (whiteboard/exercise: examples of bad spaces)
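As a rough mixing diagnostic (an illustration, not part of the lecture), one can compute the lag-k autocorrelation of a scalar summary of the chain; slow decay suggests a longer burn-in and/or thinning:

```python
def autocorr(chain, lag):
    """Lag-k autocorrelation of a sequence of scalar samples."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean)
              for t in range(n - lag)) / (n - lag)
    return cov / var

# A toy chain that switches value only every 50 steps mixes slowly:
demo = [(t // 50) % 2 for t in range(2000)]
print(autocorr(demo, 1), autocorr(demo, 25), autocorr(demo, 100))
```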