Approximate Inference by Sampling

Approximate Inference by Sampling. Readings: K&F: 10.1, 10.2, 10.3. Graphical Models – 10-708. Ajit Singh, Carnegie Mellon University, November 3rd, 2006.

Approximate inference overview. There are very many approximate inference algorithms for PGMs. We will focus on three representative ones: sampling (today), variational inference (continues next class), and loopy belief propagation with its generalization, generalized belief propagation.

Goal. Often we want to compute expectations E_P[f(X)] given samples x[1], …, x[M] drawn from a distribution P.
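As a minimal sketch of this idea, the expectation is estimated by the empirical mean over the samples (the Bernoulli example and its parameter 0.3 are made up for illustration):

```python
import random

def monte_carlo_expectation(f, samples):
    """Estimate E_P[f(X)] as the empirical mean of f over samples drawn from P."""
    return sum(f(x) for x in samples) / len(samples)

# Example: estimate P(X = 1) for a Bernoulli(0.3) variable, i.e. E[1{X = 1}].
random.seed(0)
samples = [1 if random.random() < 0.3 else 0 for _ in range(100_000)]
estimate = monte_carlo_expectation(lambda x: x == 1, samples)
```

Any probability query can be phrased this way by taking f to be an indicator function.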

Forward Sampling. Generates samples from the joint distribution by sampling the nodes in topological order: first generate a topological ordering, then sample each node according to the ordering, from its CPD given the already-sampled values of its parents. Each draw is a sample from a multinomial (next slide).
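A sketch of forward sampling on a tiny Difficulty → Grade ← Intelligence network (the CPD numbers here are invented for illustration, not taken from the slides):

```python
import random

# Toy Bayesian network with hypothetical CPDs: D and I are roots, G depends on both.
P_D = {"easy": 0.6, "hard": 0.4}
P_I = {"low": 0.7, "high": 0.3}
P_G = {  # P(Grade | Difficulty, Intelligence)
    ("easy", "low"):  {"A": 0.3,  "B": 0.7},
    ("easy", "high"): {"A": 0.9,  "B": 0.1},
    ("hard", "low"):  {"A": 0.05, "B": 0.95},
    ("hard", "high"): {"A": 0.5,  "B": 0.5},
}

def sample_multinomial(dist):
    """Draw one value from a dict mapping outcome -> probability."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value  # guard against floating-point rounding

def forward_sample():
    """Sample every node in topological order (D and I before G)."""
    d = sample_multinomial(P_D)
    i = sample_multinomial(P_I)
    g = sample_multinomial(P_G[(d, i)])
    return {"D": d, "I": i, "G": g}

random.seed(1)
samples = [forward_sample() for _ in range(50_000)]
```

Because parents are always sampled before children, each sample is an exact draw from the joint distribution.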

Forward Sampling. Estimate marginals by counting: P(Y = y) ≈ #(Y = y) / N. With evidence, P(Y = y | E = e) ≈ #(Y = y, E = e) / #(E = e). Rejection sampling: throw away samples that do not match the evidence. Sample efficiency: how often do we expect to see a record with E = e?
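A sketch of rejection sampling on an assumed two-node Rain → WetGrass network (the probabilities are made up for illustration):

```python
import random

# Toy network: Rain -> WetGrass, with hypothetical parameters.
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}

def forward_sample():
    rain = random.random() < P_RAIN
    wet = random.random() < P_WET_GIVEN_RAIN[rain]
    return rain, wet

def rejection_sample_posterior(n):
    """Estimate P(Rain = true | Wet = true): keep only samples matching the evidence."""
    kept = [rain for rain, wet in (forward_sample() for _ in range(n)) if wet]
    return sum(kept) / len(kept), len(kept)

random.seed(2)
estimate, n_kept = rejection_sample_posterior(200_000)
# Exact posterior by Bayes' rule: 0.2*0.9 / (0.2*0.9 + 0.8*0.1) = 0.18/0.26
```

Note the sample-efficiency problem the slide raises: only about a 0.26 fraction of the samples match the evidence here, and the fraction shrinks as the evidence gets rarer.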

Idea. What if we just fix the value of the evidence nodes? What is the expected number of records with Intelligence = low?

Likelihood Weighting. In forward sampling we might get, say, D = easy, I = low regardless of the evidence. Instead, clamp each evidence node to its observed value and weight the sample: x_i[m] is sampled from its CPD for each non-evidence node, and w[m] is the product of P(z_i | parents(Z_i)) over the evidence nodes Z_i. The likelihood-weighting estimator is P(y | e) ≈ Σ_m w[m] · 1(y[m] = y) / Σ_m w[m], a weighted average of the samples; w[m] corrects for the fact that the evidence was fixed rather than sampled. Replacing sampling (of the evidence nodes, as in forward sampling) with exact computation lowers variance. But what if P(E = e) is very low? Then the weights w[m] are small and the estimate can still be poor. Likelihood weighting is a special case of importance sampling.
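A sketch of likelihood weighting on the same assumed Rain → WetGrass toy network (parameters invented for illustration), with evidence Wet = true:

```python
import random

# Toy network: Rain -> WetGrass; evidence is Wet = true (hypothetical numbers).
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}

def weighted_sample():
    """Sample non-evidence nodes forward; clamp evidence and record its likelihood."""
    rain = random.random() < P_RAIN     # non-evidence node, sampled as usual
    weight = P_WET_GIVEN_RAIN[rain]     # Wet is clamped to true; contributes P(wet | rain)
    return rain, weight

def likelihood_weighting(n):
    """P(Rain = true | Wet = true) as a weighted average of indicator values."""
    pairs = [weighted_sample() for _ in range(n)]
    numerator = sum(w for rain, w in pairs if rain)
    denominator = sum(w for _, w in pairs)
    return numerator / denominator

random.seed(3)
estimate = likelihood_weighting(100_000)
# Exact posterior: 0.18 / 0.26, as in the rejection-sampling example
```

Unlike rejection sampling, no sample is ever discarded: every draw contributes with weight P(e | parents).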

Importance Sampling. What if you cannot easily sample? Examples: the posterior P(Y = y | E = e) of a Bayesian network where the evidence itself is a rare event, or a Markov network with cycles, from which sampling is always hard (see homework 4). Pick some distribution Q(X) that is easier to sample from. Assume that if P(x) > 0 then Q(x) > 0, and hope that D(P||Q) is small.

Importance Sampling. Unnormalized importance sampling estimates E_P[f] as (1/M) Σ_m f(x[m]) · P(x[m]) / Q(x[m]) with x[m] ~ Q; this is an unbiased estimator. How should we choose Q when P is a Bayesian network? And what happens if the proposal distribution is bad?
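A sketch of the unnormalized estimator on continuous densities, with P = N(0, 1) and a wider proposal Q = N(0, 2) (an illustrative choice; any Q positive wherever P is positive works):

```python
import math
import random

def pdf_normal(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def importance_estimate(f, n):
    """Unnormalized importance sampling: E_P[f] ~ mean of f(x) * P(x)/Q(x), x ~ Q."""
    total = 0.0
    for _ in range(n):
        x = random.gauss(0.0, 2.0)                         # sample from Q = N(0, 2)
        w = pdf_normal(x, 0, 1) / pdf_normal(x, 0, 2)      # importance weight P(x)/Q(x)
        total += w * f(x)
    return total / n                                       # unbiased for E_P[f]

random.seed(4)
estimate = importance_estimate(lambda x: x ** 2, 200_000)  # E_P[X^2] = 1 under N(0, 1)
```

A bad proposal shows up as high weight variance: if Q were narrower than P, a few rare samples would carry enormous weights and the estimate would be unstable.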

Mutilated BN Proposal. To generate a proposal distribution for a Bayesian network, remove the parents of every evidence node and give each evidence node Z_i = z_i the distribution P(Z_i = z_i) = 1. Sampling from this mutilated network is equivalent to likelihood weighting.

Forward Sampling Approaches. Forward sampling, rejection sampling, and likelihood weighting are all forward samplers. They require a topological ordering, which limits us to Bayesian networks and tree Markov networks. Unnormalized importance sampling can be done on cyclic Markov networks, but it is expensive (see homework 4). Limitation: fixing an evidence node only allows it to directly affect its descendants.

Markov Blanket Approaches. Forward samplers compute the weight of X_i given an assignment to its ancestors in a topological ordering. Markov blanket samplers compute the weight of X_i given an assignment to its Markov blanket.

Markov Blanket Samplers. Work on any type of graphical model covered in the course so far. Example: a spin glass / Potts model on a grid, where the Markov blanket of a node is just its grid neighbors. The weight in a Markov blanket sampler is P(X_i | MB(X_i)).

Gibbs Sampling. Let X be the non-evidence variables. Generate an initial assignment x(0). For t = 1..T: set x(t) = x(t-1); then for each X_i in X, let u_i be the value of the variables X − {X_i} in sample x(t), compute P(X_i | u_i), sample x_i(t) from P(X_i | u_i), and set the value of X_i to x_i(t) in x(t). Samples are taken from x(0), …, x(T). The key is computing P(X_i | u_i); burn-in is discussed later.
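The loop above can be sketched on a toy joint over two binary variables (the table entries are made up for illustration; the conditionals are read off the joint, which is exactly what a real Gibbs sampler computes from the model's factors):

```python
import random

# Hypothetical joint distribution over (A, B), each binary.
JOINT = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def conditional(var, other_value):
    """P(X_var = 1 | the other variable = other_value), computed from the joint."""
    if var == 0:  # condition A on B
        p1, p0 = JOINT[(1, other_value)], JOINT[(0, other_value)]
    else:         # condition B on A
        p1, p0 = JOINT[(other_value, 1)], JOINT[(other_value, 0)]
    return p1 / (p0 + p1)

def gibbs(n_steps):
    a, b = 0, 0                 # arbitrary initial assignment x(0)
    samples = []
    for _ in range(n_steps):    # one sweep = resample each variable in turn
        a = 1 if random.random() < conditional(0, b) else 0
        b = 1 if random.random() < conditional(1, a) else 0
        samples.append((a, b))
    return samples

random.seed(5)
samples = gibbs(200_000)
p_a1 = sum(a for a, _ in samples) / len(samples)
# True marginal: P(A = 1) = 0.1 + 0.4 = 0.5
```

Each step resamples one variable from its full conditional while holding the rest fixed, so in the long run the visited states are distributed according to the joint.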

Computing P(Xi | ui). The major task in designing a Gibbs sampler is deriving P(X_i | u_i). Use conditional independence: X_i ⊥ X_j | MB(X_i) for all X_j in X − MB(X_i) − {X_i}, so P(X_i | u_i) = P(X_i | MB(X_i)), which is proportional to the product of the factors (CPDs or potentials) whose scope contains X_i.

Pairwise Markov Random Field
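For a concrete pairwise case, here is a sketch of the Gibbs full conditional for an Ising model with edge potentials exp(J · x_i · x_j) on ±1 spins (J = 0.5 is an assumed coupling, and the helper names are hypothetical):

```python
import math
import random

J = 0.5  # assumed coupling strength

def p_spin_up(neighbor_spins):
    """P(X_i = +1 | MB(X_i)) for the Ising model: a sigmoid of the neighbor sum.

    P(x_i = s | nbrs) is proportional to exp(J * s * sum(nbrs)), so
    P(+1 | nbrs) = exp(f) / (exp(f) + exp(-f)) = 1 / (1 + exp(-2f)), f = J * sum(nbrs).
    """
    field = J * sum(neighbor_spins)
    return 1.0 / (1.0 + math.exp(-2.0 * field))

def gibbs_update(i, spins, neighbors):
    """Resample one spin from its full conditional (one step of a Gibbs sweep)."""
    nbr_values = [spins[j] for j in neighbors[i]]
    spins[i] = 1 if random.random() < p_spin_up(nbr_values) else -1
```

On a grid, `neighbors[i]` would list the four adjacent nodes, which is exactly the Markov blanket mentioned on the earlier slide.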

Markov Chain Interpretation. The state space consists of assignments to X, and the conditionals P(x_i | u_i) define the transition probabilities (neighboring states differ in only one variable). Given the transition matrix you could compute the exact stationary distribution, but it is typically impossible to store the transition matrix. Gibbs does not need to store the transition matrix!

Convergence. Not all of the samples x(0), …, x(T) are independent: consider a single marginal; each draw from P(x_i | u_i) depends on the previous state of the chain. Burn-in: discard the initial samples, taken before the chain has approximately reached its stationary distribution. Thinning: keep only every k-th sample to reduce correlation between the samples that are kept.
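Burn-in and thinning amount to a simple slice of the raw chain (the parameter values 1000 and 10 are illustrative, not prescribed by the slides):

```python
def post_process(chain, burn_in=1000, thin=10):
    """Discard the first `burn_in` samples, then keep every `thin`-th one."""
    return chain[burn_in::thin]

# Stand-in for a raw chain of Gibbs samples x(0)..x(4999).
chain = list(range(5000))
kept = post_process(chain)
```

In practice burn-in and thin are chosen heuristically, e.g. from trace plots or autocorrelation estimates.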

MAP by Sampling. Generate samples from the posterior; for each X_i, the (marginal) MAP estimate is the majority assignment across the samples.
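A sketch of the majority vote (note this gives the marginal MAP per variable, as on the slide, not the joint MAP assignment; the sample values are invented):

```python
from collections import Counter

def marginal_map(samples):
    """samples: list of dicts {var: value}; return the majority value per variable."""
    result = {}
    for var in samples[0]:
        counts = Counter(s[var] for s in samples)
        result[var] = counts.most_common(1)[0][0]
    return result

# Hypothetical posterior samples over two binary variables.
samples = [{"A": 1, "B": 0}, {"A": 1, "B": 1}, {"A": 0, "B": 1}]
estimate = marginal_map(samples)
```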

What you need to know. Forward sampling approaches: forward sampling / rejection sampling generate samples from P(X) or P(X | e); likelihood weighting / importance sampling handle sampling where the evidence is rare, and fixing variables lowers the variance of the samples compared to rejection sampling; these are useful on Bayesian networks and tree Markov networks. Markov blanket approaches: Gibbs sampling works on any graphical model where we can sample from P(X_i | rest); the Markov chain interpretation; samples come from the stationary distribution once the Markov chain converges; convergence heuristics, burn-in, thinning.