Rational process models Tom Griffiths Department of Psychology Cognitive Science Program University of California, Berkeley.

A strategy for studying the mind
[Diagram: input → learning algorithm → output]
Determine the inductive biases of this human learning algorithm

Marr’s three levels
Computation: “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?”
Representation and algorithm: “What is the representation for the input and output, and the algorithm for the transformation?”
Implementation: “How can the representation and algorithm be realized physically?”
[Each level constrains the one below it.]

A problem
[Diagram: input → learning algorithm → output]

A problem
[Diagram: input → learning algorithm → output]
How do inductive biases and representations in our models connect to mental and neural processes?

Marr’s three levels
Computation: “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?”
Representation and algorithm: “What is the representation for the input and output, and the algorithm for the transformation?”
Implementation: “How can the representation and algorithm be realized physically?”
[Each level constrains the one below it.]

The relationship may not be transparent…
Assume people perform Bayesian inference, but with a subset of all hypotheses
– hypotheses are generated from a distribution q(h)
Behavior will still be consistent with Bayesian inference…
…but with a prior in which h receives probability proportional to p(h)q(h)
– the prior confounds two psychological factors: plausibility and ease of generation
(Bonawitz & Griffiths, 2010)

A simple demonstration
A causal learning task in which people form predictions and evaluate explanations
Within subjects, manipulate whether hypothesis generation is required
– Phase 1: make predictions (requires generation)
– Phase 2: provide all hypotheses, ask for evaluation
Between subjects, manipulate ease of generation using a priming vignette
– should only influence Phase 1 judgments
(Bonawitz & Griffiths, 2010)

Primes
Strong prime: Teachers at an elementary school taught their students a game for two children to play. They observed the results of pairs of students playing the game and tried to come up with a way to predict (for any given pair of students) who was going to win. At first it was difficult for the teachers to notice anything that would help them correctly predict the outcomes of the games. Then the teachers started organizing the children by height, and the pattern of results quickly became apparent. The teachers were able to use the height of the children to make very accurate predictions as to who (for any given pair of students) was going to win the game.
Neutral prime: The same vignette, except that the teachers organized the children by the children’s shirt color, and used the color of the children’s shirts to make their predictions.
(Bonawitz & Griffiths, 2010)

Phase 1
[Task display: blocks are slid together and one lights up (e.g., “Block w lights up!”). Participants consider what will happen if two blocks, such as w and s or w and k, touch: Will each block light? Yes/No, with confidence rated from “Not sure” to “Very sure”. Observations accumulate across trials.]

Phase 2: Evaluating theories
Please rate how good you find the following explanation: The blocks cannot be organized. They will light or not light randomly, but only one block can be lit at a time. (Not good … Very good)
Please rate how good you find the following explanation: The blocks can be organized by “strength”; the stronger blocks will light the weaker ones, ordered from strongest to weakest: s, k, m, y, w/g. (Not good … Very good)

Results
[Charts: estimated priors for Phase 1 and Phase 2; the estimated prior would depend on the task.]
(Bonawitz & Griffiths, 2010)

Why should we care? This example shows we should be careful in the psychological interpretation of priors We should be similarly careful in the psychological interpretation of representations The only way to understand how computational level models constrain algorithmic level models is to begin to connect the two…

A top-down approach… The goal is approximating Bayesian inference From computer science and statistics, find good methods for solving this problem See if these methods can be connected to psychological processes The result: “rational process models” (Sanborn et al., 2006; Shi et al., 2008)

The Monte Carlo principle
E_p[f(x)] ≈ (1/n) Σ_i f(x^(i)), where the x^(i) are sampled from p(x)
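As a minimal sketch of this principle (the function names and the uniform example are illustrative, not from the talk), the expectation of f under p is approximated by averaging f over draws from p:

```python
import random

def monte_carlo_expectation(f, sample_p, n=100_000):
    """Estimate E_p[f(x)] by averaging f over n samples drawn from p."""
    return sum(f(sample_p()) for _ in range(n)) / n

# Toy example: E[x^2] for x ~ Uniform(0, 1) is exactly 1/3.
random.seed(0)
estimate = monte_carlo_expectation(lambda x: x * x, random.random)
```

With 100,000 samples the estimate lands within about a thousandth of the exact value 1/3.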

Sampling as psychological process Sampling can be done using existing “toolbox” –storage/retrieval, simulation, activation/competition Many existing process models use sampling: –Luce choice rule (Luce, 1959) –stochastic models of reaction time –used to explain correlation perception, confidence intervals, utility curves, … (Kareev, 2000; Fiedler & Juslin, 2006; Stewart et al., 2006)

Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

Importance sampling
E_p[f(x)] ≈ Σ_i w^(i) f(x^(i)) / Σ_i w^(i), where the x^(i) are sampled from q(x) and w^(i) = p(x^(i))/q(x^(i))
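A self-normalized importance sampler can be sketched as follows (a hypothetical toy example of my own, not one used in the talk): samples come from the proposal q, and each is weighted by p/q:

```python
import random

def importance_sampling(f, p_density, q_density, sample_q, n=100_000):
    """Estimate E_p[f(x)] using samples from a proposal q, with
    self-normalized weights w_i = p(x_i) / q(x_i)."""
    xs = [sample_q() for _ in range(n)]
    ws = [p_density(x) / q_density(x) for x in xs]
    return sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws)

# Toy target p(x) = 2x on [0, 1] (a triangular density), proposal
# q = Uniform(0, 1). The exact value of E_p[x] is 2/3.
random.seed(0)
est = importance_sampling(lambda x: x,
                          p_density=lambda x: 2 * x,
                          q_density=lambda x: 1.0,
                          sample_q=random.random)
```

Self-normalizing (dividing by the sum of the weights) means p only needs to be known up to a constant, which is exactly the situation in Bayesian inference.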

Approximating Bayesian inference
P(h|d) ∝ P(d|h) P(h): sample hypotheses from the prior P(h), then weight each by its likelihood P(d|h)

Exemplar models
Assume decisions are made by storing previous events in memory, then activating them by similarity
For example, categorization:
P(c|x) ≈ Σ_i s(x, x^(i)) I(x^(i) ∈ c) / Σ_i s(x, x^(i))
where the x^(i) are exemplars, s(x, x^(i)) is similarity, and I(x^(i) ∈ c) is 1 if x^(i) is from category c
(e.g., Nosofsky, 1986)

Exemplar models
Assume decisions are made by storing previous events in memory, then activating them by similarity
General version:
f̂(x) ≈ Σ_i s(x, x^(i)) f(x^(i)) / Σ_i s(x, x^(i))
where the x^(i) are exemplars, s(x, x^(i)) is similarity, and f(x^(i)) is the quantity of interest
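The exemplar rule above can be sketched in a few lines (the function name, the 1-D stimuli, and the exponential similarity function are illustrative assumptions, not the talk's specification): each stored exemplar is activated by its similarity to the probe, and a category's share of total activation gives the response probability:

```python
import math

def exemplar_prob(x, exemplars, labels, category, width=1.0):
    """Exemplar-style category probability: activate every stored exemplar
    by its similarity to probe x (exponential decay with distance), then
    return the share of total activation contributed by `category`."""
    sims = [math.exp(-abs(x - e) / width) for e in exemplars]
    in_cat = sum(s for s, c in zip(sims, labels) if c == category)
    return in_cat / sum(sims)

# Hypothetical 1-D stimuli: category A clusters near 0, category B near 5.
exemplars = [0.0, 0.5, 1.0, 4.5, 5.0, 5.5]
labels = ['A', 'A', 'A', 'B', 'B', 'B']
p_a = exemplar_prob(0.2, exemplars, labels, 'A')
```

A probe near 0 is almost entirely activated by the A exemplars, so p_a comes out close to 1; the probabilities over categories sum to 1 by construction.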

Equivalence
Bayes can be approximated using exemplar models, storing hypotheses sampled from the prior

Approximating Bayesian inference Sample from prior, weight by likelihood Can be implemented in an exemplar model –store “exemplars” of hypotheses through experience –activate in proportion to likelihood (=“similarity”) (Shi, Feldman, & Griffiths, 2008)
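Putting these pieces together, here is a hedged sketch (the coin-flip example is hypothetical, chosen for having a known exact answer): hypotheses play the role of stored "exemplars" drawn from the prior, and the likelihood plays the role of similarity:

```python
import random
from math import comb

def posterior_mean_by_importance(likelihood, sample_prior, n=50_000):
    """Approximate a posterior mean: draw hypotheses from the prior,
    weight each by its likelihood, and take the weighted average."""
    hs = [sample_prior() for _ in range(n)]
    ws = [likelihood(h) for h in hs]
    return sum(w * h for w, h in zip(ws, hs)) / sum(ws)

# Hypothetical example: infer a coin's bias h from 7 heads in 10 flips
# under a Uniform(0, 1) prior. The exact posterior is Beta(8, 4),
# whose mean is 8 / 12.
random.seed(0)
lik = lambda h: comb(10, 7) * h**7 * (1 - h)**3
est = posterior_mean_by_importance(lik, random.random)
```

The estimate closely matches the exact posterior mean 8/12; the same weighted sum computes any posterior expectation, which is what makes the exemplar-model equivalence work.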

The universal law of generalization (Shi, Griffiths, Feldman, & Sanborn, 2010)

The number game (Shi, Feldman, & Griffiths, 2008)

Making connections (Shi & Griffiths, 2009)

Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

Win-stay, lose-shift algorithms
A class of algorithms that has been explored in both computer science and psychology
Our version: after observing d_n, keep hypothesis h with probability c · P(d_n|h); otherwise sample h from P(h|d_1, …, d_n)
Then h is a sample from P(h|d_1, …, d_n) for all n, where c can depend on d_1, …, d_(n-1) and is chosen so that c · P(d_n|h) ∈ [0, 1]
(Bonawitz, Denison, Griffiths, & Gopnik, in press)
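A small simulation (a toy coin-bias problem of my own construction, not the blicket-detector task) illustrates the guarantee on this slide: if the keep-probability is c · P(d_n|h) and a "shift" resamples from the exact posterior on all data so far, then the end state of each chain is distributed as P(h|d_1, …, d_n):

```python
import random

random.seed(0)
HYPS = [0.2, 0.5, 0.8]                # hypothetical coin biases
PRIOR = {h: 1 / 3 for h in HYPS}
DATA = [1, 1, 0, 1]                   # observed flips (1 = heads)

def lik(h, d):
    return h if d == 1 else 1 - h

def exact_posterior(data):
    post = {h: PRIOR[h] for h in HYPS}
    for d in data:
        for h in HYPS:
            post[h] *= lik(h, d)
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

def sample_from(dist):
    r, acc = random.random(), 0.0
    for h, p in dist.items():
        acc += p
        if r < acc:
            return h
    return HYPS[-1]

def wsls_chain(data):
    """Win-Stay, Lose-Shift: after each datum d, keep the current h with
    probability c * P(d | h); otherwise resample from the posterior on
    all data so far. c is chosen so the keep-probability stays in [0, 1]."""
    h = sample_from(PRIOR)
    for n, d in enumerate(data):
        c = 1.0 / max(lik(hp, d) for hp in HYPS)
        if random.random() >= c * lik(h, d):          # "lose": shift
            h = sample_from(exact_posterior(data[:n + 1]))
    return h

# The distribution of end states over many chains should match the
# exact posterior P(h | DATA).
counts = {h: 0 for h in HYPS}
for _ in range(20_000):
    counts[wsls_chain(DATA)] += 1
freqs = {h: c / 20_000 for h, c in counts.items()}
```

The empirical frequencies over 20,000 chains agree with the exact posterior to within sampling noise, for any valid choice of c.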

Two interesting special cases
Case 1: decide using only the current likelihood
Case 2: minimize the rate of switching
(Bonawitz, Denison, Griffiths, & Gopnik, in press)

Blicket Detector

[Stimuli for two conditions: Go First and No Go First.]

Choice probabilities
[Charts of choice probabilities in the Go First and No Go First conditions.]
(Bonawitz, Denison, Griffiths, & Gopnik, in press)

Switch probabilities
[Charts comparing people’s switch probabilities with two models: Win-Stay, Lose-Shift (r = 0.78) and independent sampling (r = 0.38).]
(Bonawitz, Denison, Griffiths, & Gopnik, in press)

Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

Updating distributions over time…
Computational costs are compounded when data are observed incrementally…
– recompute P(h|d_1, …, d_n) after observing each d_n
Exploit “yesterday’s posterior is today’s prior”
Repeatedly applying importance sampling yields an algorithm known as a “particle filter”

Particle filter
[Diagram: samples from P(h|d_1,…,d_(n-1)) → weight by P(d_n|h) → weighted atoms of P(h|d_1,…,d_n) → resample → samples from P(h|d_1,…,d_n)]

Dynamic hypotheses
[Diagram: hypotheses h_1 → h_2 → h_3 → h_4 evolve over time, each generating a datum d_1, d_2, d_3, d_4.]

Particle filters
[Diagram: samples from P(h_3|d_1,…,d_3) → sample h_4 from P(h_4|h_3) → samples from P(h_4|d_1,…,d_3) → weight by P(d_4|h_4) → weighted atoms of P(h_4|d_1,…,d_4) → resample → samples from P(h_4|d_1,…,d_4)]
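The pipeline in this diagram can be sketched as a bootstrap particle filter for a toy two-state hidden Markov model (the states and the transition and emission probabilities are invented for illustration): propagate each particle through the transition model, weight by the likelihood of the new observation, resample, repeat. With many particles the filtering estimate should match the exact forward algorithm:

```python
import random

random.seed(0)
STATES = [0, 1]
TRANS = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}          # P(h_t | h_{t-1})
EMIT = {0: {'a': 0.8, 'b': 0.2}, 1: {'a': 0.3, 'b': 0.7}}   # P(d_t | h_t)
OBS = ['a', 'a', 'b', 'b', 'b']

def forward_exact(obs):
    """Exact filtering posterior P(h_t | d_1..d_t) via the forward algorithm."""
    p = {0: 0.5, 1: 0.5}
    for d in obs:
        p = {s: sum(p[r] * TRANS[r][s] for r in STATES) * EMIT[s][d]
             for s in STATES}
        z = sum(p.values())
        p = {s: v / z for s, v in p.items()}
    return p

def particle_filter(obs, n=20_000):
    """Bootstrap particle filter: propagate particles through the transition
    model, weight by the likelihood of the new observation, then resample."""
    particles = [random.choice(STATES) for _ in range(n)]
    for d in obs:
        particles = [1 - h if random.random() < TRANS[h][1 - h] else h
                     for h in particles]                  # sample h_t | h_{t-1}
        weights = [EMIT[h][d] for h in particles]         # weight by P(d | h_t)
        particles = random.choices(particles, weights=weights, k=n)  # resample
    return sum(particles) / n   # fraction of particles in state 1

pf_est = particle_filter(OBS)
```

With 20,000 particles the estimated P(h_t = 1 | d_1..d_t) agrees with the exact forward recursion to within Monte Carlo error; shrinking n reproduces the resource limits the next slides exploit.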

The promise of particle filters
A general scheme for defining rational process models of updating over time (Sanborn et al., 2006)
Models limited memory, and produces order effects (cf. Kruschke, 2006)
Used to define rational process models of…
– categorization (Sanborn et al., 2006)
– associative learning (Daw & Courville, 2008)
– changepoint detection (Brown & Steyvers, 2009)
– sentence processing (Levy et al., 2009)

Simple garden-path sentences The woman brought the sandwich from the kitchen tripped

Particle filter with probabilistic grammars
S → NP VP (1.0)      V → brought (0.4)
NP → N (0.8)         V → broke (0.3)
NP → N RRC (0.2)     V → tripped (0.3)
RRC → Part N (1.0)   Part → brought (0.1)
VP → V N (1.0)       Part → broken (0.7)
N → woman (0.7)      Part → tripped (0.2)
N → sandwiches (0.3) Adv → quickly (1.0)
[Diagram: incremental parses of “woman brought sandwiches … tripped”, with particles carrying the main-verb (VP) and reduced-relative (RRC) analyses.]

Simple garden-path sentences
The woman brought the sandwich from the kitchen tripped
MAIN VERB reading: it was the woman who brought the sandwich
REDUCED RELATIVE reading: the woman was brought the sandwich

Solving a puzzle
A-S: Tom heard the gossip wasn’t true.
A-L: Tom heard the gossip about the neighbors wasn’t true.
U-S: Tom heard that the gossip wasn’t true.
U-L: Tom heard that the gossip about the neighbors wasn’t true.
Ambiguity increases difficulty…
…but so does the length of the ambiguous region (Frazier & Rayner, 1982; Tabor & Hutchins, 2004)
Our prediction: parse failures at the disambiguating region should increase with sentence difficulty

Resampling-induced drift In ambiguous region, observed words aren’t strongly informative Due to resampling, probabilities of different constructions will drift (more so with time)

Model results

Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

Markov chain Monte Carlo
Typically assumes all data are available; considers one hypothesis at a time and stochastically explores the space of hypotheses
A possible account of perceptual bistability (Gershman, Vul, & Tenenbaum, 2009)
A way to explain how children explore the space of possible causal theories (Ullman, Goodman, & Tenenbaum, 2010)

Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

Marr’s three levels
Computation: “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?”
Representation and algorithm: “What is the representation for the input and output, and the algorithm for the transformation?”
Implementation: “How can the representation and algorithm be realized physically?”
[Each level constrains the one below it.]

Mechanistic pluralism
Computation: “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?”
Representation and algorithm: “What is the representation for the input and output, and the algorithm for the transformation?” (WSLS, importance sampling, particle filtering, MCMC, …)
Implementation: “How can the representation and algorithm be realized physically?” (RBF networks, associative memories, probabilistic coding, …)

Conclusions
Bayesian models of cognition give us a way to identify human inductive biases
But the relationship of the priors and representations in these models to mental and neural processes is not necessarily transparent
Monte Carlo methods provide a source of rational process models that could connect these levels, and perhaps lead us to expect many mechanisms