Markov chain Monte Carlo with people

Markov chain Monte Carlo with people
Tom Griffiths
Department of Psychology, Cognitive Science Program, University of California, Berkeley

Two deep questions
- What are the biases that guide human learning? (the prior probability distribution on hypotheses, P(h))
- What do mental representations look like? (the distribution over objects x in category c, P(x|c))
Goal: develop ways to sample from these distributions

Outline
- Markov chain Monte Carlo
- Sampling from the prior (with Mike Kalish, Steve Lewandowsky, Saiwing Yeung)
- Sampling from category distributions (with Adam Sanborn)

Markov chains
- x(t+1) is independent of the history given x(t)
- Transition matrix: T = P(x(t+1)|x(t))
- Converges to a stationary distribution under easily checked conditions (i.e., if the chain is ergodic)
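
A minimal sketch of these two ideas on a hypothetical three-state chain (the matrix entries are made up): repeatedly applying the transition matrix to any starting distribution converges to the stationary distribution.

```python
import numpy as np

# A hypothetical 3-state transition matrix: T[i, j] = P(x(t+1) = j | x(t) = i)
T = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

p = np.array([1.0, 0.0, 0.0])   # start with all mass on state 0
for _ in range(50):
    p = p @ T                   # one step of the chain: p(t+1) = p(t) T
print(p)                        # converged distribution

# The stationary distribution is the left eigenvector of T with eigenvalue 1
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
print(pi / pi.sum())            # matches p above
```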

Markov chain Monte Carlo
- Sample from a target distribution P(x) by constructing a Markov chain for which P(x) is the stationary distribution
- Two main schemes: Gibbs sampling and the Metropolis-Hastings algorithm

Gibbs sampling
For variables x = x1, x2, …, xn and target P(x):
Draw xi(t+1) from P(xi|x-i), where x-i = x1(t+1), x2(t+1), …, xi-1(t+1), xi+1(t), …, xn(t)

Gibbs sampling
[Figure: illustration of Gibbs sampling, after MacKay (2002)]
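
As a concrete illustration (my example, not from the slides): a Gibbs sampler for a bivariate Gaussian with correlation rho, where each full conditional is itself Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                        # assumed correlation of the target bivariate Gaussian
x1, x2 = 0.0, 0.0
samples = []
for t in range(10000):
    # draw x1 from P(x1 | x2): Gaussian with mean rho*x2, variance 1 - rho^2
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))
    # draw x2 from P(x2 | x1), using the value of x1 just drawn
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))
    samples.append((x1, x2))

print(np.corrcoef(np.array(samples[1000:]).T))  # empirical correlation ≈ rho
```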

Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970)
Step 1: propose a state (we assume a symmetric proposal), Q(x(t+1)|x(t)) = Q(x(t)|x(t+1))
Step 2: decide whether to accept, with probability given by either
- the Metropolis acceptance function: A(x(t), x(t+1)) = min(1, P(x(t+1))/P(x(t))), or
- the Barker acceptance function: A(x(t), x(t+1)) = P(x(t+1)) / (P(x(t+1)) + P(x(t)))

[Figure: step-by-step Metropolis-Hastings walkthrough on a density p(x), showing proposed moves accepted or rejected with probabilities such as A(x(t), x(t+1)) = 0.5 and A(x(t), x(t+1)) = 1]
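
A runnable sketch of the algorithm just illustrated, using a symmetric Gaussian random-walk proposal and a standard Gaussian target (both choices are mine, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(0)
log_p = lambda x: -0.5 * x ** 2          # unnormalized log target: standard Gaussian

x, samples = 0.0, []
for t in range(10000):
    proposal = x + rng.normal(0, 1.0)    # symmetric proposal Q
    ratio = np.exp(log_p(proposal) - log_p(x))
    accept = min(1.0, ratio)             # Metropolis acceptance function
    # Barker alternative: accept = ratio / (1.0 + ratio)
    if rng.uniform() < accept:
        x = proposal                     # accept: move to the proposed state
    samples.append(x)                    # on rejection, the chain stays put

print(np.mean(samples), np.std(samples)) # ≈ 0 and 1 for the standard Gaussian
```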

Outline
- Markov chain Monte Carlo
- Sampling from the prior (with Mike Kalish, Steve Lewandowsky, Saiwing Yeung)
- Sampling from category distributions (with Adam Sanborn)

Iterated learning (Kirby, 2001)
What are the consequences of learners learning from other learners?

Analyzing iterated learning
[Diagram: a chain of learners, each inferring a hypothesis from data and generating data for the next]
- PL(h|d): probability of inferring hypothesis h from data d
- PP(d|h): probability of generating data d from hypothesis h

Iterated Bayesian learning
Assume learners sample from their posterior distribution:
PL(h|d) = PP(d|h) P(h) / Σh' PP(d|h') P(h')

Analyzing iterated learning
- The sequence h1, h2, h3, … forms a Markov chain on hypotheses, with transitions h → h' occurring with probability Σd PL(h'|d) PP(d|h)
- The sequence d0, d1, d2, … likewise forms a Markov chain on data

Stationary distributions
- The Markov chain on h converges to the prior, P(h)
- The Markov chain on d converges to the “prior predictive distribution”, P(d) = Σh PP(d|h) P(h)
(Griffiths & Kalish, 2005)

Explaining convergence to the prior
- Intuitively: the data act once, but the prior acts at every generation
- Formally: iterated learning with Bayesian agents is a Gibbs sampler on the joint distribution P(d,h) (Griffiths & Kalish, 2007)
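
A toy simulation of this result (my construction, not from the talk): hypotheses are coin biases on a grid with an arbitrary non-uniform prior, data are coin flips, and each learner samples a hypothesis from its posterior before generating data for the next learner. The chain of hypotheses ends up distributed according to the prior.

```python
import numpy as np

rng = np.random.default_rng(0)
hs = np.linspace(0.05, 0.95, 19)       # hypothesis grid: coin bias h
prior = hs ** 2                        # an arbitrary non-uniform prior P(h)
prior /= prior.sum()
n = 10                                 # coin flips per generation

h, visits = rng.choice(hs, p=prior), []
for gen in range(50000):
    d = rng.binomial(n, h)             # production: d ~ PP(d|h)
    like = hs ** d * (1 - hs) ** (n - d)
    post = like * prior                # posterior ∝ likelihood × prior
    h = rng.choice(hs, p=post / post.sum())   # learning: h ~ PL(h|d)
    visits.append(h)

visits = np.array(visits)
emp = np.array([(visits == v).mean() for v in hs])
print(np.round(emp, 3))                # empirical distribution over generations...
print(np.round(prior, 3))              # ...matches the prior P(h)
```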

Serial reproduction (Bartlett, 1932)
- Participants see stimuli, then reproduce them from memory
- The reproductions of one participant become the stimuli for the next
- Stimuli were interesting rather than controlled, e.g., “War of the Ghosts”

Iterated function learning (Kalish, Griffiths, & Lewandowsky, 2007)
- Each learner sees a set of (x, y) pairs (the data)
- Makes predictions of y for new x values (from a hypothesis about the function)
- Those predictions become the data for the next learner

Function learning experiments
[Figure: trial structure showing stimulus, response slider, and feedback]
Examine iterated learning with different initial data

[Figure: chains of learned functions across iterations 1-9, seeded with different initial data]

Iterated predicting the future (Lewandowsky, Griffiths & Kalish, 2009)
- Each learner sees a value of t (the data), e.g., “A movie has made $30 million so far”
- Makes a prediction of ttotal (the hypothesis), e.g., “$60 million total”
- The next value of t is drawn uniformly from (0, ttotal)
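
A sketch of this process under an assumed power-law prior on ttotal (the prior, grid, and starting point are illustrative, not the paper's). Because the learner samples ttotal from its posterior and the next t is uniform on (0, ttotal), the chain of predictions converges to the prior on ttotal.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.arange(1.0, 201.0)             # possible ttotal values (illustrative)
prior = grid ** -1.5                     # assumed power-law prior on ttotal
prior /= prior.sum()

t, totals = 30.0, []
for gen in range(50000):
    # posterior: P(ttotal | t) ∝ P(ttotal) / ttotal for ttotal > t (uniform likelihood)
    post = np.where(grid > t, prior / grid, 0.0)
    ttotal = rng.choice(grid, p=post / post.sum())   # learner's sampled prediction
    totals.append(ttotal)
    t = rng.uniform(0.0, ttotal)         # next learner sees t ~ Uniform(0, ttotal)

# the histogram of sampled ttotal values should approach the prior
```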

Chains of predictions
[Figure: chains of ttotal values across iterations for movie grosses and poem lengths (Lewandowsky, Griffiths & Kalish, 2009)]

Stationary distributions (Lewandowsky, Griffiths & Kalish, 2009)

Iterated causal learning (Yeung & Griffiths, in press)
- Each trial shows contingency data: cause present (c+) or absent (c-) crossed with effect present (e+) or absent (e-)
- Elicits estimates of the background strength w0 and the causal strength w1
- The data on the next trial are sampled using these values

Eliciting strength judgments
- Generative: “Of 100 people who did not express the effect, how many would express the effect in the presence of the cause?”
- Preventive: “Of 100 people who expressed the effect, how many would express the effect in the presence of the cause?”
- Judgments modeled following Lu, Yuille, Liljeholm, Cheng, and Holyoak (2008)

Priors on causal strength: the “Sparse and Strong” prior (Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2008)

Comparing models to data (Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2008)

Experiment design (for each subject)
- four iterated learning chains
- one independent learning “chain”
(Yeung & Griffiths, in press)

[Figure: the empirical prior recovered by iterated learning alongside the Sparse and Strong prior (Yeung & Griffiths, in press)]

Comparing models to data (Yeung & Griffiths, in press)

Empirical fit vs. SS prior fit (Yeung & Griffiths, in press)

Other empirical results…
- Concept learning (Griffiths, Christian, & Kalish, 2008)
- Reproduction from memory (Xu & Griffiths, 2008)
- Estimating linguistic frequencies (Reali & Griffiths, 2009)
- Systems of color terms (Xu, Dowman, & Griffiths, 2010)

Outline
- Markov chain Monte Carlo
- Sampling from the prior (with Mike Kalish, Steve Lewandowsky, Saiwing Yeung)
- Sampling from category distributions (with Adam Sanborn)

Sampling from categories
[Figure: a “frog distribution” P(x|c) over stimuli]

A task
Ask subjects which of two alternatives comes from a target category: which animal is a frog?

A Bayesian analysis of the task
Assume: [equation giving the posterior probability that each alternative belongs to the target category]

Response probabilities
If people probability match to the posterior, the response probability is equivalent to the Barker acceptance function for the target distribution p(x|c): P(choose x*) = p(x*|c) / (p(x*|c) + p(x|c))

Collecting the samples
[Figure: a sequence of trials (Trial 1, Trial 2, Trial 3, …), each asking “Which is the frog?”; the chosen stimulus becomes the current state for the next trial]
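
In outline, the experiment runs a Metropolis chain whose acceptance decision is the subject's choice, with Barker's rule emerging from probability matching. A schematic of the logic, with a simulated subject standing in for a real one (the names, the Gaussian stand-in for P(x|c), and the simulated-choice rule are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
p = lambda x: np.exp(-0.5 * ((x - 50) / 10) ** 2)   # stand-in for P(x|c)

def subject_chooses(a, b):
    """Simulated subject who probability-matches to the posterior:
    chooses a with probability p(a) / (p(a) + p(b)) -- Barker's rule."""
    return rng.uniform() < p(a) / (p(a) + p(b))

x, chain = 40.0, []
for trial in range(5000):
    proposal = x + rng.normal(0, 5.0)    # propose an alternative stimulus
    # present {x, proposal}; whichever the subject picks becomes the new state
    if subject_chooses(proposal, x):
        x = proposal
    chain.append(x)

print(np.mean(chain), np.std(chain))     # ≈ 50 and 10: samples from P(x|c)
```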

Verifying the method

Training
Subjects were shown schematic fish of different sizes and trained on whether they came from the ocean (a uniform distribution) or a fish farm (a Gaussian distribution)

Between-subject conditions

Choice task
Subjects judged which of the two fish came from the fish farm (i.e., the Gaussian distribution)

Examples of subject MCMC chains

Estimates from all subjects
- Estimated means and standard deviations are significantly different across groups
- Estimated means are accurate, but standard deviation estimates are high; this could be due to perceptual noise or response gain

Sampling from natural categories
- Examined distributions for four natural categories: giraffes, horses, cats, and dogs
- Presented stimuli as nine-parameter stick figures (Olman & Kersten, 2004)

Choice task

Samples from Subject 3 (projected onto a plane found by linear discriminant analysis)

Mean animals by subject
[Figure: each subject's mean stick figure for giraffe, horse, cat, and dog]

Marginal densities (aggregated across subjects)
- Giraffes are distinguished by neck length, body height, and body tilt
- Horses are like giraffes, but with shorter bodies and nearly uniform necks
- Cats have longer tails than dogs

Relative volume of categories (convex hull content divided by minimum enclosing hypercube content):
Giraffe 0.00004, Horse 0.00006, Cat 0.00003, Dog 0.00002
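
The reported ratio can be computed along these lines (a sketch; I assume the enclosing hypercube is axis-aligned with side equal to the largest coordinate range, and the function name is mine):

```python
import numpy as np
from scipy.spatial import ConvexHull

def relative_volume(samples):
    """Convex hull content divided by the content of the minimum
    axis-aligned enclosing hypercube (samples: n_points x n_dims).
    Needs many more points than dimensions for the hull to exist."""
    hull = ConvexHull(samples)               # hull.volume is its content
    side = np.ptp(samples, axis=0).max()     # side of the enclosing hypercube
    return hull.volume / side ** samples.shape[1]
```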

Discrimination method (Olman & Kersten, 2004)

Parameter space for discrimination
Restricted so that most random draws were animal-like

MCMC and discrimination means

Conclusion
- Markov chain Monte Carlo provides a way to sample from subjective probability distributions
- Many interesting questions can be framed in terms of such distributions: inductive biases (priors) and mental representations (category distributions)
- Other MCMC methods may provide further empirical methods… Gibbs sampling for categories, adaptive MCMC, …