Automatic Inference in PyBLOG


Automatic Inference in PyBLOG
Nimar Arora, Rodrigo de Salvo Braz, Erik Sudderth, Stuart Russell

Outline
- Motivation
- Syntax
- Semantics
- Distribution Properties
- Example – LDA
- Results
- Inverting Deterministic Relations and Automatic Blocking
- Example – Simplified Citation Matching
- Conclusions

Motivation
BLOG has three kinds of inference:
- Likelihood-weighted sampling
- Sampling each variable separately by proposing from its own distribution (i.e. independent of the likelihood)
- User-provided proposers for all the variables in the model
We need automatic, efficient inference for:
- Variables with conjugate priors and likelihoods, discrete variables with finite support, etc.
- Groups of variables with deterministic relations
We also need finer-grained, user-specified proposers:
- For individual variables
- For small blocks of variables

Syntax
Regular Python code is interspersed with special functions marked with decorators:
- @var_dist: declares a random variable; the return value of the function is the distribution of that random variable
- @var: declares a random variable whose value is the return value of the function
One can query the posterior distribution of a random variable given observations on other random variables that have a distribution (except for special cases, cf. blocking).

Example:

    @var_dist
    def c(i):
        return Bernoulli(.5)

    @var_dist
    def mu(k):
        if k == 1:
            return Normal(100, 1)
        else:
            return Normal(250, 1)

    @var_dist
    def y(i):
        return Normal(mu(c(i)))

    @var
    def z(n):
        return sum(c(i) == 0 for i in range(n))

    query([z(4)], [y(0) == 200, y(1) == 20, y(2) == 150, y(3) == 1000])

[Figure: Bayes net for this example with nodes mu(0), mu(1), c(0)..c(3), y(0)..y(3), and z(4); y(0)..y(3) are the observed variables and z(4) is the queried variable.]

Semantics
Similar to BLOG:
- Each invocation of a function marked with @var or @var_dist is a random variable. Example: c(0), c(1), c(2), ...
- A world is an assignment of a fixed value to every possible random variable.
- A PyBLOG program defines a distribution over these possible worlds.
- In practice we only care about partial worlds.
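To make the possible-worlds view concrete, here is a minimal sketch, not PyBLOG code, of how the density of one partial world from the Syntax example factorizes into the densities of its instantiated variables given their parents. It uses scipy.stats and assumes a unit variance for y, which the slide leaves unspecified.

    import scipy.stats as st

    # A hypothetical partial world: only the variables needed to evaluate y(0).
    partial_world = {("c", 0): 1, ("mu", 1): 101.3, ("y", 0): 200.0}

    def log_density(world):
        logp = 0.0
        logp += st.bernoulli.logpmf(world[("c", 0)], 0.5)             # c(0) ~ Bernoulli(.5)
        logp += st.norm.logpdf(world[("mu", 1)], 100, 1)              # mu(1) ~ Normal(100, 1)
        logp += st.norm.logpdf(world[("y", 0)], world[("mu", 1)], 1)  # y(0) ~ Normal(mu(c(0)), 1), with c(0) = 1
        return logp

    print(log_density(partial_world))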

Distribution Properties
A distribution must provide the following properties:
- Evaluate the density (or mass) at a point
- Sample a random value
To enable Gibbs sampling, the following properties are also required:
- A likelihood for the distribution's parameters. For example, Normal(10, x) has likelihood ScaledGamma(.5, 50).
- The support of finite, discrete variables, e.g. Bernoulli has support {0, 1}.
In addition, likelihoods should be "multipliable" and "normalizable".
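As an illustration only, the class and method names below are assumptions rather than PyBLOG's actual API, a distribution object exposing these properties might look like this:

    import random

    class Bernoulli:
        def __init__(self, p):
            self.p = p

        def density(self, x):
            # mass at a point, x in {0, 1}
            return self.p if x == 1 else 1.0 - self.p

        def sample(self):
            # draw a random value
            return 1 if random.random() < self.p else 0

        def support(self):
            # finite discrete support, enumerable for Gibbs proposals
            return [0, 1]

    class BetaLikelihood:
        # conjugate likelihood for the Bernoulli parameter p
        def __init__(self, a, b):
            self.a, self.b = a, b

        def multiply(self, other):
            # "multipliable": the product of two Beta kernels is a Beta kernel
            return BetaLikelihood(self.a + other.a - 1, self.b + other.b - 1)

        def normalize(self):
            # "normalizable": the kernel can be turned back into a proper distribution
            return ("Beta", self.a, self.b)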

Smoothed LDA (Blei, Ng and Jordan, 2003)
[Plate diagram: alpha -> theta -> z -> w <- beta_k <- eta, with plates over the N words in each of M documents and the k topics.]

Smoothed LDA

    @var_dist
    def theta(d):
        return Dirichlet([alpha0 / k for i in range(k)])   # conjugate prior-likelihood

    @var_dist
    def z(d, i):
        return Categorical(theta(d))                       # finite support

    @var_dist
    def w(d, i):
        return Categorical(beta(z(d, i)))

    @var_dist
    def beta(t):
        return Dirichlet([eta0 / V for i in range(V)])     # conjugate prior-likelihood

    k = 100      # number of topics
    V = 11000    # vocabulary size
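For concreteness, a toy sketch of how this model might be conditioned and queried, following the query() pattern from the Syntax slide; the document ids, word ids, and queried variables below are made up:

    # two hypothetical documents given as lists of word ids in [0, V)
    docs = {0: [3, 17, 17, 42], 1: [8, 8, 3]}

    evidence = [w(d, i) == word
                for d, words in docs.items()
                for i, word in enumerate(words)]

    # query the topic of the first word of document 0 and that document's topic proportions
    query([z(0, 0), theta(0)], evidence)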

LDA Results
Memory consumed: ~5 KB per item of evidence.

Inverting Deterministic Relations and Automatic Blocking
When a random variable that is a deterministic function of other random variables is given as evidence, we need to block-sample its parents. We can do this automatically if the deterministic function can be inverted.
For example, suppose Z = X + Y (string concatenation) and Z is given as evidence. We can propose (X, Y) from among all partitions of Z.
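A minimal sketch of such an inverse for string concatenation (the function name is illustrative): every split point of the evidence string yields one candidate (X, Y) pair for the block proposal.

    def invert_concat(z):
        # all (x, y) pairs with x + y == z
        return [(z[:i], z[i:]) for i in range(len(z) + 1)]

    # invert_concat("abc") -> [('', 'abc'), ('a', 'bc'), ('ab', 'c'), ('abc', '')]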

Simplified Citation Matching

    @var_dist
    def numpubs():
        return RoundedLogNormal(100, 1)

    @var_dist
    def pubcited(c):
        return UniformInt(0, numpubs())

    @var_dist
    def format(c):
        return Bernoulli(.5)

    @var_dist
    def author(p):
        return authdist

    @var_dist
    def title(p):
        return titledist

    @var
    def citetext(c):
        p = pubcited(c)
        if format(c) == 0:
            return author(p) + title(p)
        else:
            return title(p) + author(p)

When citetext(c) is given as evidence, PyBLOG inverts the + operator and automatically blocks pubcited(c), format(c), author(pubcited(c)), and title(pubcited(c)).
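Purely as an illustration of the blocked proposal described above, not the actual PyBLOG mechanism, and ignoring the joint resampling of pubcited(c): for an observed citation string, enumerating both formats and every author/title split gives the candidate block assignments.

    def block_proposals(citation):
        # candidate joint assignments to format(c) and the (author, title) split
        proposals = []
        for i in range(len(citation) + 1):
            left, right = citation[:i], citation[i:]
            proposals.append({"format": 0, "author": left, "title": right})
            proposals.append({"format": 1, "title": left, "author": right})
        return proposals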

Conclusions
- It's easy for probabilistic languages to specify a probabilistic model; it's more interesting if efficient inference can be done without "much" user intervention.
- In PyBLOG, for efficient inference the user only has to specify a few properties of a distribution or likelihood function.
- Deterministic functions require special handling.