Normative models of human inductive inference. Tom Griffiths, Department of Psychology and Cognitive Science Program, University of California, Berkeley.


Perception is optimal (Körding & Wolpert, 2004)

Cognition is not

Optimality and cognition Can optimal solutions to computational problems shed light on human cognition?

Optimality and cognition Can optimal solutions to computational problems shed light on human cognition? Can we explain aspects of cognition as the result of sensitivity to natural statistics? What kind of representations are extracted from those statistics?

Optimality and cognition Can optimal solutions to computational problems shed light on human cognition? Can we explain aspects of cognition as the result of sensitivity to natural statistics? What kind of representations are extracted from those statistics? Joint work with Josh Tenenbaum

Natural statistics: images of natural scenes → sparse coding → neural representation (Olshausen & Field, 1996)

Predicting the future
How often is Google News updated? t = time since last update; t_total = total time between updates. What should we guess for t_total given t?

Reverend Thomas Bayes

Bayes’ theorem
P(h|d) = P(d|h) P(h) / Σ_h′ P(d|h′) P(h′)
posterior probability = likelihood × prior probability, normalized by a sum over the space of hypotheses (h: hypothesis, d: data)
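The theorem can be sketched numerically over a discrete hypothesis space; the priors and likelihoods below are illustrative numbers, not values from the talk:

```python
# Sketch: Bayes' theorem over a two-hypothesis space.
# Priors and likelihoods are illustrative, not from the talk.
priors = {"h1": 0.7, "h2": 0.3}        # P(h)
likelihoods = {"h1": 0.1, "h2": 0.6}   # P(d|h) for the observed data d

# P(h|d) = P(d|h) P(h) / sum over h' of P(d|h') P(h')
evidence = sum(likelihoods[h] * priors[h] for h in priors)
posterior = {h: likelihoods[h] * priors[h] / evidence for h in priors}
```

Although h1 has the higher prior, the data favor h2 strongly enough that the posterior reverses the ordering.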

Bayes’ theorem (h: hypothesis, d: data)

Bayesian inference
p(t_total | t) ∝ p(t | t_total) p(t_total)
posterior probability ∝ likelihood × prior

Bayesian inference
p(t_total | t) ∝ p(t | t_total) p(t_total)
Assuming a random sample (0 < t < t_total), the likelihood is p(t | t_total) = 1/t_total, so p(t_total | t) ∝ (1/t_total) p(t_total).
posterior probability ∝ likelihood × prior
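The prediction rule on this slide can be sketched on a grid. The power-law exponent, grid bound, and function name below are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Sketch: posterior over t_total given t, with likelihood 1/t_total
# (random sampling, t_total > t) and an assumed power-law prior
# p(t_total) ~ t_total^(-gamma).  gamma=1.5 and grid_max are illustrative.
def predict_t_total(t, gamma=1.5, grid_max=1000.0, n=100000):
    t_total = np.linspace(t, grid_max, n)          # hypotheses: t_total >= t
    posterior = (1.0 / t_total) * t_total ** (-gamma)  # likelihood * prior
    posterior /= posterior.sum()
    cdf = np.cumsum(posterior)
    return t_total[np.searchsorted(cdf, 0.5)]      # posterior median prediction

# For power-law priors the posterior median scales linearly with t
m = predict_t_total(10.0)
```

With a power-law prior the median works out to t * 2^(1/gamma), which is why predictions from such priors scale multiplicatively with the observed t.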

The effects of priors

Evaluating human predictions
Different domains with different priors:
–a movie has made $60 million [power-law]
–your friend quotes from line 17 of a poem [power-law]
–you meet a 78 year old man [Gaussian]
–a movie has been running for 55 minutes [Gaussian]
–a U.S. congressman has served for 11 years [Erlang]
Prior distributions were derived from actual data. Five values of t were used for each domain, and people predicted t_total.

[Figure: people’s predictions compared with the parametric prior, the empirical prior, and Gott’s rule]

Predicting the future
People produce accurate predictions for the duration and extent of everyday events
People are sensitive to the statistics of their environment in making these predictions:
–form of the prior (power-law or exponential)
–distribution given that form (parameters)

Optimality and cognition Can optimal solutions to computational problems shed light on human cognition? Can we explain aspects of cognition as the result of sensitivity to natural statistics? What kind of representations are extracted from those statistics? Joint work with Adam Sanborn

Categories are central to cognition

Sampling from categories Frog distribution P(x|c)

Markov chain Monte Carlo
Sample from a target distribution P(x) by constructing a Markov chain for which P(x) is the stationary distribution. The Markov chain converges to its stationary distribution, providing outcomes that can be used like samples from P(x).

Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970)
Step 1: propose a state (we assume a symmetric proposal), Q(x^(t+1) | x^(t)) = Q(x^(t) | x^(t+1))
Step 2: decide whether to accept, with probability
A(x^(t), x^(t+1)) = min(1, p(x^(t+1)) / p(x^(t))) (Metropolis acceptance function)
A(x^(t), x^(t+1)) = p(x^(t+1)) / (p(x^(t+1)) + p(x^(t))) (Barker acceptance function)
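The two steps above can be sketched in a few lines; this is a minimal illustration with an assumed Gaussian proposal, not code from the talk:

```python
import random, math

random.seed(0)

def metropolis(log_p, x0, n_steps, step=1.0):
    """Sketch of the Metropolis algorithm with a symmetric Gaussian
    proposal Q, so Q(x'|x) = Q(x|x') and the Hastings correction drops out.
    Uses the Metropolis acceptance function A = min(1, p(x')/p(x))."""
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step)      # Step 1: propose a state
        # Step 2: accept with probability min(1, p(proposal)/p(x))
        if math.log(random.random()) < log_p(proposal) - log_p(x):
            x = proposal
        samples.append(x)   # a rejected proposal repeats the current state
    return samples

# Target: unnormalized standard normal; the chain's stationary
# distribution should then have mean 0 and variance 1
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=50000)
```

Note that only the ratio p(x′)/p(x) is needed, so the target density never has to be normalized.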

Metropolis-Hastings algorithm
[Illustration: sampling from a density p(x). Proposed moves to higher-probability states are accepted with probability A(x^(t), x^(t+1)) = 1; proposed moves to lower-probability states are accepted with reduced probability, e.g. A(x^(t), x^(t+1)) = 0.5.]

A task
Ask subjects which of two alternatives comes from a target category. Which animal is a frog?

A Bayesian analysis of the task Assume:

Response probabilities
If people probability-match to the posterior, the response probability is equivalent to the Barker acceptance function for the target distribution p(x|c).
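The logic of this design can be sketched as a simulation; the "subject" below is an idealized probability-matcher, and the category distribution is an assumed illustration, not data from the experiments:

```python
import random, math

random.seed(1)

def barker_choice_chain(log_p, x0, n_trials, step=1.0):
    """Sketch of the MCMC-with-people design: on each trial the 'subject'
    chooses between the current stimulus x and a proposal x*, picking x*
    with probability p(x*|c) / (p(x*|c) + p(x|c)), i.e. the Barker
    acceptance function.  The chosen stimuli then form a Markov chain
    whose stationary distribution is the category distribution p(x|c)."""
    x, chain = x0, []
    for _ in range(n_trials):
        x_star = x + random.gauss(0.0, step)   # symmetric proposal
        # probability matching to the posterior = Barker acceptance
        a = 1.0 / (1.0 + math.exp(log_p(x) - log_p(x_star)))
        if random.random() < a:
            x = x_star
        chain.append(x)
    return chain

# Hypothetical category: fish lengths distributed N(5, 1), unnormalized
chain = barker_choice_chain(lambda x: -0.5 * (x - 5.0) ** 2,
                            x0=0.0, n_trials=40000)
```

After a burn-in period, the chain of chosen stimuli can be summarized to estimate the mean and variance of the subject's category distribution, which is how the fish and stick-figure experiments are analyzed.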

Collecting the samples
Which is the frog? (Trial 1, Trial 2, Trial 3)

Verifying the method

Training
Subjects were shown schematic fish of different sizes and trained on whether they came from the ocean (uniform distribution) or a fish farm (Gaussian distribution).

Between-subject conditions

Choice task Subjects judged which of the two fish came from the fish farm (Gaussian) distribution

Examples of subject MCMC chains

Estimates from all subjects
Estimated means and standard deviations are significantly different across groups. Estimated means are accurate, but standard deviation estimates are high
–this result could be due to perceptual noise or response gain

Sampling from natural categories
Examined distributions for four natural categories: giraffes, horses, cats, and dogs. Stimuli were presented as nine-parameter stick figures (Olman & Kersten, 2004).

Choice task

Samples from Subject 3 (projected onto plane from LDA)

Mean animals by subject
[Figure: mean giraffe, horse, cat, and dog for subjects S1–S8]

Marginal densities (aggregated across subjects) Giraffes are distinguished by neck length, body height and body tilt Horses are like giraffes, but with shorter bodies and nearly uniform necks Cats have longer tails than dogs

Markov chain Monte Carlo with people Normative models can guide the design of experiments to measure psychological variables Markov chain Monte Carlo (and other methods) can be used to sample from subjective probability distributions –category distributions –prior distributions

Conclusion Optimal solutions to computational problems can shed light on human cognition We can explain aspects of cognition as the result of sensitivity to natural statistics We can use optimality to explore representations extracted from those statistics

Relative volume of categories
Convex hull content divided by minimum enclosing hypercube content, for giraffe, horse, cat, and dog.

Discrimination method (Olman & Kersten, 2004)

Parameter space for discrimination Restricted so that most random draws were animal-like

MCMC and discrimination means

Iterated learning (Kirby, 2001)
Each learner sees data, forms a hypothesis, and produces the data given to the next learner. With Bayesian learners, the distribution over hypotheses converges to the prior (Griffiths & Kalish, 2005).

Explaining convergence to the prior
[Diagram: each learner samples a hypothesis from P_L(h|d), generates data from P_P(d|h), and passes those data to the next learner, who again applies P_L(h|d).]
Intuitively: the data act once, the prior many times. Formally: iterated learning with Bayesian agents is a Gibbs sampler on P(d,h) (Griffiths & Kalish, in press).
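This convergence result can be sketched on a toy hypothesis space; the coin biases, prior, and amount of data below are illustrative choices, not from the papers:

```python
import random
from math import comb

random.seed(2)

# Sketch of iterated learning with Bayesian agents: hypotheses are coin
# biases, and each learner passes m coin flips to the next learner.
thetas = [0.2, 0.5, 0.8]   # hypothesis space: P(heads) under each h
prior = [0.6, 0.3, 0.1]    # P(h), an illustrative prior
m = 5                      # flips of data passed between learners

def sample_discrete(ps):
    r, acc = random.random(), 0.0
    for i, p in enumerate(ps):
        acc += p
        if r < acc:
            return i
    return len(ps) - 1

counts = [0, 0, 0]
k = 3  # arbitrary initial data: number of heads in m flips
for _ in range(200000):
    # Learner: sample h from the posterior P(h|d) -- one Gibbs step
    post = [comb(m, k) * t ** k * (1 - t) ** (m - k) * p
            for t, p in zip(thetas, prior)]
    z = sum(post)
    h = sample_discrete([p / z for p in post])
    # Teacher: generate data for the next learner from P(d|h)
    k = sum(random.random() < thetas[h] for _ in range(m))
    counts[h] += 1

# Alternating these two steps is a Gibbs sampler on P(d, h), so the
# long-run frequency of each hypothesis converges to the prior P(h)
freqs = [c / sum(counts) for c in counts]
```

The likelihood never disappears, but because each generation resamples h while d is regenerated, only the prior determines the stationary marginal over hypotheses.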

Iterated function learning (Kalish, Griffiths, & Lewandowsky, in press)
Each learner sees a set of (x, y) pairs and makes predictions of y for new x values. Those predictions are the data for the next learner.

Function learning experiments
[Display: stimulus, response slider, feedback]
Examine iterated learning with different initial data.