Inference on Relational Models Using Markov Chain Monte Carlo Brian Milch Massachusetts Institute of Technology UAI Tutorial July 19, 2007.


2 Example 1: Bibliographies
Citation string: "Russell, Stuart and Norvig, Peter. Articial Intelligence. Prentice-Hall, ..."
Underlying objects: researchers Stuart Russell and Peter Norvig; paper "Artificial Intelligence: A Modern Approach"
[S. Russell and P. Norvig (1995). Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Prentice Hall.]

3 Example 2: Aircraft Tracking
[Figure: noisy radar blip coordinates observed at times t = 1, 2, 3]

4 Inference on Relational Structures
[Figure: alternative groupings of citation strings ("Rus...", "AI...", "AI: A...") into underlying authors and papers (Russell, Norvig, Roberts, Seuss, Shakespeare, "Hamlet", "Tempest", ...), each grouping weighted by its posterior probability]

5 Markov Chain Monte Carlo (MCMC)
Markov chain s_1, s_2, ... over worlds where evidence E is true
Approximate P(Q | E) as fraction of s_1, s_2, ... that satisfy query Q
[Figure: chain of states moving within the evidence set E, some landing in the query region Q]

6 Outline
Probabilistic models for relational structures
– Modeling the number of objects
– Three mistakes that are easy to make
Markov chain Monte Carlo (MCMC)
– Gibbs sampling
– Metropolis-Hastings
– MCMC over events
Case studies
– Citation matching
– Multi-target tracking

7 Simple Example: Clustering
[Figure: histogram of wingspans (cm) with two clusters, around μ = 22 and μ = 49]

8 Simple Bayesian Mixture Model
Number of latent objects is known to be k
For each latent object i, have parameter θ_i
For each data point j, have object selector C_j and observable value X_j

9 BN for Mixture Model
[Figure: Bayes net with parameter nodes θ_1, θ_2, ..., θ_k, selector nodes C_1, C_2, C_3, ..., C_n, and observation nodes X_1, X_2, X_3, ..., X_n; each X_j depends on C_j and on all the θ_i]

10 Context-Specific Dependencies
[Figure: same Bayes net with the selectors fixed (e.g. C_1 = 2, C_2 = 1, C_3 = 2); each X_j then depends only on the selected parameter θ_{C_j}]

11 Extensions to Mixture Model
Random number of latent objects k, with distribution p(k) such as:
– Uniform({1, …, 100})
– Geometric(0.1)
– Poisson(10)
(Geometric and Poisson: k unbounded!)
Random distribution θ for selecting objects
– p(θ | k) ~ Dirichlet(α_1, ..., α_k) (Dirichlet: distribution over probability vectors)
– Still symmetric: each α_i = α/k

12 Existence versus Observation
A latent object can exist even if no observations correspond to it
– Bird species may not be observed yet
– Aircraft may fly over without yielding any blips
Two questions:
– How many objects correspond to observations?
– How many objects are there in total?
Observed 3 species, each 100 times: probably no more exist
Observed 200 species, each 1 or 2 times: probably more exist

13 Expecting Additional Objects
P(ever observe new species | seen r so far) is bounded by P(k > r)
So as # species observed → ∞, probability of ever seeing more → 0
What if we don't want this?
[Figure: r observed species; will we observe more later?]

14 Dirichlet Process Mixtures
Set k = ∞, let θ be an infinite-dimensional probability vector with a stick-breaking prior [Ferguson 1983; Sethuraman 1994]
Another view: define the prior directly on partitions of data points, allowing an unbounded number of blocks
Drawback: can't ask about the number of unobserved latent objects (always infinite)
[Figure: stick-breaking weights θ_1, θ_2, θ_3, θ_4, θ_5, ...]
[tutorials: Jordan 2005; Sudderth 2006]

15 Outline
Probabilistic models for relational structures
– Modeling the number of objects
– Three mistakes that are easy to make
Markov chain Monte Carlo (MCMC)
– Gibbs sampling
– Metropolis-Hastings
– MCMC over events
Case studies
– Citation matching
– Multi-target tracking

16 Mistake 1: Ignoring Interchangeability
Which birds are in species S1? Latent object indices are interchangeable
– Posterior on selector variable C_B1 is uniform
– Posterior on θ_S1 has a peak for each cluster of birds
Really care about the partition of observations, e.g. {{1, 3}, {2}, {4, 5}} for birds B1–B5, which corresponds to instantiations (1, 2, 1, 3, 3), (1, 2, 1, 4, 4), (1, 4, 1, 3, 3), (2, 1, 2, 3, 3), …
A partition with r blocks corresponds to k! / (k−r)! instantiations of the C_j variables

17 Ignoring Interchangeability, Cont'd
Say k = 4. What's the prior probability that B1, B3 are in one species, B2 in another?
Multiplying probabilities for C_B1, C_B2, C_B3 gives (1/4) × (1/4) × (1/4)
Not enough! Partition {{B1, B3}, {B2}} corresponds to 12 instantiations of the C's:
(S1, S2, S1), (S1, S3, S1), (S1, S4, S1), (S2, S1, S2), (S2, S3, S2), (S2, S4, S2), (S3, S1, S3), (S3, S2, S3), (S3, S4, S3), (S4, S1, S4), (S4, S2, S4), (S4, S3, S4)
A partition with r blocks corresponds to kPr = k! / (k−r)! instantiations
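The count on this slide is easy to verify by brute force; a quick sketch, with integer labels 0–3 standing in for the four species:

```python
from itertools import product

# Count instantiations of (C_B1, C_B2, C_B3) over k = 4 species labels
# that induce the partition {{B1, B3}, {B2}}: B1 and B3 share a label,
# B2 has a different one.
k = 4
count = sum(1 for c1, c2, c3 in product(range(k), repeat=3)
            if c1 == c3 and c1 != c2)
print(count)        # 12 instantiations
print(k * (k - 1))  # kPr with r = 2 blocks: also 12
```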

18 Mistake 2: Underestimating the Bayesian Ockham's Razor Effect
Say k = 4. Are B1 and B2 in the same species, given X_B1 = 50 and X_B2 = 52?
Maximum-likelihood estimation would yield one species with μ = 50 and another with μ = 52
But a Bayesian model trades off likelihood against the prior probability of getting those μ values
[Figure: wingspan axis (cm) with observations at 50 and 52]

19 Bayesian Ockham's Razor
Observations: X_B1 = 50, X_B2 = 52
H1: Partition is {{B1, B2}}: probability ≈ 1.3 × 10^…
H2: Partition is {{B1}, {B2}}: probability ≈ 7.5 × 10^… = 0.01
Lesson: Don't use more latent objects than necessary to explain your data [MacKay 1992]

20 Mistake 3: Comparing Densities Across Dimensions
Observations: X_B1 = 50, X_B2 = 52 (wingspan in cm)
H1: Partition is {{B1, B2}}, μ = 51: density ≈ 1.5 × 10^…
H2: Partition is {{B1}, {B2}}, μ_B1 = 50, μ_B2 = 52: density ≈ 4.8 × 10^…
H1 wins by a greater margin

21 What If We Change the Units?
Observations: X_B1 = 0.50, X_B2 = 0.52 (wingspan in m; density of Uniform(0, 1) is 1!)
H1: Partition is {{B1, B2}}, μ = 0.51: density ≈ 15
H2: Partition is {{B1}, {B2}}, μ_B1 = 0.50, μ_B2 = 0.52: density ≈ 48
Now H2 wins by a landslide

22 Lesson: Comparing Densities Across Dimensions
Densities don't behave like probabilities (e.g., they can be greater than 1)
Heights of density peaks in spaces of different dimension are not comparable
Work-arounds:
– Find the most likely partition first, then the most likely parameters given that partition
– Find the region in parameter space where most of the posterior probability mass lies
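The unit-change effect behind Mistake 3 can be reproduced in a few lines. This sketch assumes Uniform priors on the mean (Uniform(0, 100) in cm, Uniform(0, 1) in m), which may differ from the exact prior behind the slides' numbers:

```python
# Density of a Uniform prior on the cluster mean, in two unit systems.
# In cm: Uniform(0, 100) -> density 0.01 everywhere on its support.
# In m:  Uniform(0, 1)   -> density 1.0, a 100x change from rescaling alone.
def uniform_pdf(x, lo, hi):
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

print(uniform_pdf(51.0, 0.0, 100.0))  # 0.01 (per cm)
print(uniform_pdf(0.51, 0.0, 1.0))    # 1.0  (per m)
# A hypothesis with one extra mean parameter picks up one extra factor
# of 100 when switching cm -> m, so ratios across dimensions can flip.
```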

23 Outline
Probabilistic models for relational structures
– Modeling the number of objects
– Three mistakes that are easy to make
Markov chain Monte Carlo (MCMC)
– Gibbs sampling
– Metropolis-Hastings
– MCMC over events
Case studies
– Citation matching
– Multi-target tracking

24 Why Not Exact Inference?
Number of possible partitions is superexponential in n
Variable elimination?
– Summing out θ_i couples all the C_j's
– Summing out C_j couples all the θ_i's
[Figure: the mixture-model Bayes net again]

25 Markov Chain Monte Carlo (MCMC)
Start in arbitrary state (possible world) s_1 satisfying evidence E
Sample s_2, s_3, ... according to transition kernel T(s_i, s_{i+1}), yielding a Markov chain
Approximate p(Q | E) by the fraction of s_1, s_2, …, s_L that are in Q
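The estimator on this slide is just an indicator average along the chain. A minimal sketch with a hypothetical two-state kernel (not a model from the talk), where the evidence holds in both states and the query Q is "state = 1":

```python
import random

random.seed(0)

# Toy transition kernel T on states {0, 1}.
T = {0: [0.5, 0.5], 1: [0.2, 0.8]}

s = 0
hits, L = 0, 100_000
for _ in range(L):
    s = random.choices([0, 1], weights=T[s])[0]
    hits += (s == 1)

# The stationary distribution of this kernel is (2/7, 5/7), so the
# fraction of samples satisfying Q should approach 5/7 ~ 0.714.
print(hits / L)
```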

26 Why a Markov Chain?
Why use a Markov chain rather than sampling independently?
– Stochastic local search for high-probability s
– Once we find such an s, explore around it

27 Convergence
Stationary distribution π is such that π(s′) = Σ_s π(s) T(s, s′)
If the chain is ergodic (can get anywhere from anywhere, and is aperiodic), then:
– It has a unique stationary distribution π
– Fraction of s_1, s_2, ..., s_L in Q converges to π(Q) as L → ∞
We'll design T so that π(s) = p(s | E)
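The stationarity condition π(s′) = Σ_s π(s) T(s, s′) can be checked numerically for a small kernel; a sketch with a made-up 3-state ergodic chain:

```python
# Verify pi T = pi for a small ergodic kernel (each row sums to 1).
T = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]

# Find the stationary distribution by power iteration.
pi = [1 / 3, 1 / 3, 1 / 3]
for _ in range(1000):
    pi = [sum(pi[s] * T[s][t] for s in range(3)) for t in range(3)]

# Applying T once more should leave pi unchanged.
check = [sum(pi[s] * T[s][t] for s in range(3)) for t in range(3)]
print(all(abs(a - b) < 1e-12 for a, b in zip(pi, check)))  # True
print(round(sum(pi), 6))                                   # 1.0
```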

28 Gibbs Sampling
Order the non-evidence variables V_1, V_2, ..., V_m
Given state s, sample from T as follows:
– Let s′ = s
– For i = 1 to m:
  sample v_i from p(V_i | s′_−i), the conditional for V_i given the other variables in s′
  let s′ = (s′_−i, V_i = v_i)
– Return s′
Theorem: stationary distribution is p(s | E) [Geman & Geman 1984]

29 Gibbs on a Bayesian Network
Conditional for V depends only on the factors that contain V
So condition on V's Markov blanket mb(V): parents, children, and co-parents
[Figure: node V with its Markov blanket highlighted]

30 Gibbs on Bayesian Mixture Model
Given current state s:
– Resample each θ_i given the prior and {X_j : C_j = i in s}
– Resample each C_j given X_j and θ_{1:k} (a context-specific Markov blanket)
[Neal 2000]
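The two resampling steps can be sketched concretely for a toy 1-D mixture. Everything here is an illustrative assumption (k = 2, unit observation variance, flat prior on each μ, uniform prior on clusters, made-up wingspan data), not the model from the talk:

```python
import math
import random

random.seed(1)

X = [20.1, 21.5, 19.4, 48.8, 50.2, 49.1]  # toy wingspans (cm)
k = 2
mu = [min(X), max(X)]                     # data-driven initialization
C = [0] * len(X)

def normal_pdf(x, m):
    # Unit-variance Gaussian density.
    return math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)

for sweep in range(100):
    # Resample each C_j given X_j and mu_{1:k} (uniform prior on clusters).
    for j, x in enumerate(X):
        w = [normal_pdf(x, mu[i]) for i in range(k)]
        C[j] = random.choices(range(k), weights=w)[0]
    # Resample each mu_i given its cluster {X_j : C_j = i}; with a flat
    # prior the posterior is Normal(cluster mean, 1/n).
    for i in range(k):
        pts = [x for x, c in zip(X, C) if c == i]
        if pts:
            mu[i] = random.gauss(sum(pts) / len(pts), 1 / math.sqrt(len(pts)))

print([round(m, 1) for m in sorted(mu)])  # means near 20 and 49
```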

31 Sampling Given Markov Blanket
If V is discrete, just iterate over values, normalize, and sample from the discrete distribution
If V is continuous:
– Simple if child distributions are conjugate to V's prior: posterior has the same form as the prior with different parameters
– In general, even sampling from p(v | s_−V) can be hard [see the BUGS software]
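As an example of the conjugate case: with a Normal prior on a cluster mean and Normal observations, the posterior over the mean is again Normal, with parameters available in closed form. The numbers below (a vague Normal(0, 100) prior and unit observation variance) are illustrative assumptions:

```python
# Conjugate Normal-Normal update: prior mu ~ Normal(m0, v0) and
# observations x_j ~ Normal(mu, v). The posterior over mu is Normal,
# so a Gibbs step can sample it directly.
def normal_posterior(m0, v0, v, xs):
    n = len(xs)
    xbar = sum(xs) / n
    v_post = 1.0 / (1.0 / v0 + n / v)           # posterior variance
    m_post = v_post * (m0 / v0 + n * xbar / v)  # posterior mean
    return m_post, v_post

# Two observations near 51: the posterior concentrates between them.
m, vp = normal_posterior(m0=0.0, v0=100.0, v=1.0, xs=[50.0, 52.0])
print(round(m, 3), round(vp, 3))  # 50.746 0.498
```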

32 Convergence Can Be Slow
[Figure: wingspan data (cm) that should form two clusters, but μ_1 = 20 and μ_2 = 90: species 2 is far away]
C_j's won't change until μ_2 is in the right area
μ_2 does an unguided random walk as long as no observations are associated with it
– Especially bad in high dimensions

33 Outline
Probabilistic models for relational structures
– Modeling the number of objects
– Three mistakes that are easy to make
Markov chain Monte Carlo (MCMC)
– Gibbs sampling
– Metropolis-Hastings
– MCMC over events
Case studies
– Citation matching
– Multi-target tracking

34 Metropolis-Hastings
Define T(s_i, s_{i+1}) as follows:
– Sample s′ from proposal distribution q(s′ | s)
– Compute acceptance probability
  α = min(1, [p(s′ | E) q(s | s′)] / [p(s | E) q(s′ | s)])
  (relative posterior probabilities times backward/forward proposal probabilities)
– With probability α, let s_{i+1} = s′; else let s_{i+1} = s_i
Can show that p(s | E) is the stationary distribution for T [Metropolis et al. 1953; Hastings 1970]
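A minimal Metropolis-Hastings sketch, targeting an unnormalized standard Normal with a symmetric random-walk proposal, so the q-ratio in α cancels; the target and proposal here are illustrative choices, not from the talk:

```python
import math
import random

random.seed(0)

def p_tilde(s):
    # Unnormalized target density: a standard Normal without its constant.
    return math.exp(-0.5 * s * s)

s = 0.0
samples = []
for _ in range(50_000):
    s_prop = s + random.gauss(0.0, 1.0)   # symmetric proposal, so the
    ratio = p_tilde(s_prop) / p_tilde(s)  # q terms cancel in alpha
    if random.random() < min(1.0, ratio):
        s = s_prop                        # accept
    samples.append(s)                     # a reject keeps the old state

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(mean, 2), round(var, 2))      # near 0 and 1
```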

35 Metropolis-Hastings
Benefits
– Proposal distribution can propose big steps involving several variables
– Only need to compute the ratio p(s′ | E) / p(s | E), ignoring normalization factors
– Don't need to sample from conditional distributions
Limitations
– Proposals must be reversible, else q(s | s′) = 0
– Need to be able to compute q(s | s′) / q(s′ | s)

36 Split-Merge Proposals
Choose two observations i, j
If C_i = C_j = c, then split cluster c:
– Get unused latent object c′
– For each observation m such that C_m = c, change C_m to c′ with probability 0.5
– Propose new values for θ_c, θ_c′
Else merge clusters c_i and c_j:
– For each m such that C_m = c_j, set C_m = c_i
– Propose new value for θ_{c_i}
[Jain & Neal 2004]
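The assignment bookkeeping of these moves (leaving out the parameter proposals and the acceptance test) can be sketched as follows; this is a simplified, hypothetical version in which observation j is always moved to the new cluster so that a split actually separates i and j:

```python
import random

random.seed(2)

def split_or_merge(C, i, j):
    # One split-merge move on an assignment list C (ints as labels),
    # ignoring parameter proposals and the Metropolis-Hastings test.
    C = list(C)
    if C[i] == C[j]:
        c, c_new = C[i], max(C) + 1  # grab an unused label
        for m in range(len(C)):
            if C[m] == c and m not in (i, j) and random.random() < 0.5:
                C[m] = c_new
        C[j] = c_new                 # force i and j apart
    else:
        ci, cj = C[i], C[j]
        for m in range(len(C)):
            if C[m] == cj:           # merge j's cluster into i's
                C[m] = ci
    return C

print(split_or_merge([0, 1, 0, 1, 1], 0, 1))  # merge case: [0, 0, 0, 0, 0]
```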

37 Split-Merge Example
[Figure: wingspan data (cm) with two clusters, μ_1 = 20 and μ_2 = 90]
Split two birds off from species 1
Resample μ_2 to match these two birds (e.g. μ_2 = 27)
Move is likely to be accepted

38 Mixtures of Kernels
If T_1, …, T_m all have stationary distribution π, then so does any mixture Σ_i w_i T_i (weights w_i summing to 1)
Example: mixture of split-merge and Gibbs moves
Point: faster convergence

39 Outline
Probabilistic models for relational structures
– Modeling the number of objects
– Three mistakes that are easy to make
Markov chain Monte Carlo (MCMC)
– Gibbs sampling
– Metropolis-Hastings
– MCMC over events
Case studies
– Citation matching
– Multi-target tracking

40 MCMC States in Split-Merge
Not complete instantiations!
– No parameters for unobserved species
States are partial instantiations of random variables, e.g. k = 12, C_B1 = S2, C_B2 = S8, μ_S2 = 31, μ_S8 = 84
– Each state corresponds to an event: the set of outcomes satisfying the description

41 MCMC over Events
Markov chain over events σ, with stationary distribution proportional to p(σ)
Theorem: Fraction of visited events in Q converges to p(Q | E) if:
– Each σ is either a subset of Q or disjoint from Q
– Events form a partition of E
[Milch & Russell 2006]

42 Computing Probabilities of Events
Engine needs to compute the ratio of event probabilities efficiently (without summations)
Use instantiations that include all active parents of the variables they instantiate
Then the probability is a product of CPDs, one factor per instantiated variable

43 States That Are Even More Abstract
Typical partial instantiation: k = 12, C_B1 = S2, C_B2 = S8, μ_S2 = 31, μ_S8 = 84
– Specifies particular species numbers, even though species are interchangeable
Let states be abstract partial instantiations: ∃x ∃y ≠ x [k = 12, C_B1 = x, C_B2 = y, μ_x = 31, μ_y = 84]
See [Milch & Russell 2006] for conditions under which we can compute probabilities of such events

44 Outline
Probabilistic models for relational structures
– Modeling the number of objects
– Three mistakes that are easy to make
Markov chain Monte Carlo (MCMC)
– Gibbs sampling
– Metropolis-Hastings
– MCMC over events
Case studies
– Citation matching
– Multi-target tracking

45 Representative Applications
Tracking cars with cameras [Pasula et al. 1999]
Segmentation in computer vision [Tu & Zhu 2002]
Citation matching [Pasula et al. 2003]
Multi-target tracking with radar [Oh et al. 2004]

46 Citation Matching Model
#Researcher ~ NumResearchersPrior();
Name(r) ~ NamePrior();
#Paper ~ NumPapersPrior();
FirstAuthor(p) ~ Uniform({Researcher r});
Title(p) ~ TitlePrior();
PubCited(c) ~ Uniform({Paper p});
Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));
[Pasula et al. 2003; Milch & Russell 2006]

47 Citation Matching
Elaboration of the generative model shown earlier
Parameter estimation
– Priors for names, titles, citation formats learned offline from labeled data
– String corruption parameters learned with Monte Carlo EM
Inference
– MCMC with split-merge proposals
– Guided by "canopies" of similar citations
– Accuracy stabilizes after ~20 minutes
[Pasula et al., NIPS 2002]

48 Citation Matching Results Four data sets of ~ citations, referring to ~ papers

49 Cross-Citation Disambiguation
Wauchope, K. Eucalyptus: Integrating Natural Language Input with a Graphical User Interface. NRL Report NRL/FR/ (1994).
Is "Eucalyptus" part of the title, or is the author named K. Eucalyptus Wauchope?
Kenneth Wauchope (1994). Eucalyptus: Integrating natural language input with a graphical user interface. NRL Report NRL/FR/ , Naval Research Laboratory, Washington, DC, 39pp.
The second citation makes it clear how to parse the first one

50 Preliminary Experiments: Information Extraction
P(citation text | title, author names) modeled with a simple HMM
For each paper: recover title, author surnames, and given names
Fraction whose attributes are recovered perfectly in the last MCMC state:
– among papers with one citation: 36.1%
– among papers with multiple citations: 62.6%
Can use inferred knowledge for disambiguation

51 Multi-Object Tracking
[Figure: radar blips over time, including a false detection and an unobserved object]

52 State Estimation for "Aircraft"
#Aircraft ~ NumAircraftPrior();
State(a, t)
  if t = 0 then ~ InitState()
  else ~ StateTransition(State(a, Pred(t)));
#Blip(Source = a, Time = t) ~ NumDetectionsCPD(State(a, t));
#Blip(Time = t) ~ NumFalseAlarmsPrior();
ApparentPos(r)
  if (Source(r) = null) then ~ FalseAlarmDistrib()
  else ~ ObsCPD(State(Source(r), Time(r)));

53 Aircraft Entering and Exiting
#Aircraft(EntryTime = t) ~ NumAircraftPrior();
Exits(a, t)
  if InFlight(a, t) then ~ Bernoulli(0.1);
InFlight(a, t)
  if t < EntryTime(a) then = false
  elseif t = EntryTime(a) then = true
  else = (InFlight(a, Pred(t)) & !Exits(a, Pred(t)));
State(a, t)
  if t = EntryTime(a) then ~ InitState()
  elseif InFlight(a, t) then ~ StateTransition(State(a, Pred(t)));
#Blip(Source = a, Time = t)
  if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t));
…plus the last two statements from the previous slide

54 MCMC for Aircraft Tracking
Uses the generative model from the previous slide (although not with BLOG syntax)
[Figures by Songhwai Oh: examples of Metropolis-Hastings proposals]
[Oh et al., CDC 2004]

55 Aircraft Tracking Results
Estimation error: MCMC has the smallest error, and hardly degrades at all as tracks get dense
Running time: MCMC is nearly as fast as the greedy algorithm; much faster than MHT
[Oh et al., CDC 2004; figures by Songhwai Oh]

56 Toward General-Purpose Inference
Currently, each new application requires new code for:
– Proposing moves
– Representing MCMC states
– Computing acceptance probabilities
Goal:
– User specifies model and proposal distribution
– General-purpose code does the rest

57 General MCMC Engine [Milch & Russell 2006]
Custom proposal distribution (Java class): proposes MCMC state s′ given s_n; computes the ratio q(s_n | s′) / q(s′ | s_n)
Model (in declarative language): defines p(s)
General-purpose engine (Java code): computes the acceptance probability based on the model; sets s_{n+1}
MCMC states are partial worlds; arbitrary proposals are handled efficiently using context-specific structure

58 Summary
Models for relational structures go beyond standard probabilistic inference settings
MCMC provides a feasible path for inference
Open problems
– More general inference
– Adaptive MCMC
– Integrating discriminative methods

59 References
Blei, D. M. and Jordan, M. I. (2005) "Variational inference for Dirichlet process mixtures". J. Bayesian Analysis 1(1).
Casella, G. and Robert, C. P. (1996) "Rao-Blackwellisation of sampling schemes". Biometrika 83(1).
Ferguson, T. S. (1983) "Bayesian density estimation by mixtures of normal distributions". In Rizvi, M. H. et al., eds., Recent Advances in Statistics: Papers in Honor of Herman Chernoff on His Sixtieth Birthday. Academic Press, New York.
Geman, S. and Geman, D. (1984) "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images". IEEE Trans. on Pattern Analysis and Machine Intelligence 6.
Gilks, W. R., Thomas, A. and Spiegelhalter, D. J. (1994) "A language and program for complex Bayesian modelling". The Statistician 43(1).
Gilks, W. R., Richardson, S., and Spiegelhalter, D. J., eds. (1996) Markov Chain Monte Carlo in Practice. Chapman and Hall.
Green, P. J. (1995) "Reversible jump Markov chain Monte Carlo computation and Bayesian model determination". Biometrika 82(4).

60 References
Hastings, W. K. (1970) "Monte Carlo sampling methods using Markov chains and their applications". Biometrika 57.
Jain, S. and Neal, R. M. (2004) "A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model". J. Computational and Graphical Statistics 13(1).
Jordan, M. I. (2005) "Dirichlet processes, Chinese restaurant processes, and all that". Tutorial at the NIPS Conference.
MacKay, D. J. C. (1992) "Bayesian interpolation". Neural Computation 4(3).
MacEachern, S. N. (1994) "Estimating normal means with a conjugate style Dirichlet process prior". Communications in Statistics: Simulation and Computation 23.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953) "Equations of state calculations by fast computing machines". J. Chemical Physics 21.
Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2005) "BLOG: Probabilistic models with unknown objects". In Proc. 19th Int'l Joint Conf. on AI.
Milch, B. and Russell, S. (2006) "General-purpose MCMC inference over relational structures". In Proc. 22nd Conf. on Uncertainty in AI.

61 References
Neal, R. M. (2000) "Markov chain sampling methods for Dirichlet process mixture models". J. Computational and Graphical Statistics 9.
Oh, S., Russell, S. and Sastry, S. (2004) "Markov chain Monte Carlo data association for general multi-target tracking problems". In Proc. 43rd IEEE Conf. on Decision and Control.
Pasula, H., Russell, S. J., Ostland, M., and Ritov, Y. (1999) "Tracking many objects with many sensors". In Proc. 16th Int'l Joint Conf. on AI.
Pasula, H., Marthi, B., Milch, B., Russell, S., and Shpitser, I. (2003) "Identity uncertainty and citation matching". In Advances in Neural Information Processing Systems 15, MIT Press.
Richardson, S. and Green, P. J. (1997) "On Bayesian analysis of mixtures with an unknown number of components". J. Royal Statistical Society B 59.
Sethuraman, J. (1994) "A constructive definition of Dirichlet priors". Statistica Sinica 4.
Sudderth, E. (2006) "Graphical models for visual object recognition and tracking". Ph.D. thesis, Dept. of EECS, Massachusetts Institute of Technology, Cambridge, MA.
Tu, Z. and Zhu, S.-C. (2002) "Image segmentation by data-driven Markov chain Monte Carlo". IEEE Trans. Pattern Analysis and Machine Intelligence 24(5).