
1 Rational process models Tom Griffiths Department of Psychology Cognitive Science Program University of California, Berkeley

2 A strategy for studying the mind (figure: input → learning algorithm → output) Determine the inductive biases of this human learning algorithm

3 Marr’s three levels Computation “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?” Representation and algorithm “What is the representation for the input and output, and the algorithm for the transformation?” Implementation “How can the representation and algorithm be realized physically?” (each level constrains the level below)

4 A problem (figure: input → learning algorithm → output)

5 A problem (figure: input → learning algorithm → output) How do inductive biases and representations in our models connect to mental and neural processes?

6 Marr’s three levels Computation “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?” Representation and algorithm “What is the representation for the input and output, and the algorithm for the transformation?” Implementation “How can the representation and algorithm be realized physically?” (each level constrains the level below)

7 Relationship may not be transparent… Assume people perform Bayesian inference, but with a subset of all hypotheses –hypotheses are generated from a distribution q(h) Behavior will still be consistent with Bayesian inference… …but with a prior where h receives probability proportional to p(h)q(h) –the prior confounds two psychological factors: plausibility and ease of generation (Bonawitz & Griffiths, 2010)

8 A simple demonstration A causal learning task, where people form predictions and evaluate explanations Within-subjects, manipulate whether hypothesis generation is required –Phase 1: make predictions (requires generation) –Phase 2: provide all hypotheses, ask to evaluate Between-subjects, manipulate ease of generation using a priming vignette –should only influence Phase 1 judgments (Bonawitz & Griffiths, 2010)

9 Primes
Strong prime: Teachers at an elementary school taught their students a game for two children to play. They observed the results of pairs of students playing the game and tried to come up with a way to predict (for any given pair of students) who was going to win the game. At first it was difficult for the teachers to notice anything that would help them correctly predict the outcomes of the games. Then the teachers started organizing the children by height and the pattern of results quickly became apparent. The teachers were able to use the height of the children to make very accurate predictions as to who (for any given pair of students) was going to win the game.
Neutral prime: Teachers at an elementary school taught their students a game for two children to play. They observed the results of pairs of students playing the game and tried to come up with a way to predict (for any given pair of students) who was going to win the game. At first it was difficult for the teachers to notice anything that would help them correctly predict the outcomes of the games. Then the teachers started organizing the children by shirt color and the pattern of results quickly became apparent. The teachers were able to use the color of the children’s shirts to make very accurate predictions as to who (for any given pair of students) was going to win the game.
(Bonawitz & Griffiths, 2010)

10 Phase 1: Making predictions (figure: blocks m and w are slid together and block w lights up; outcomes for other pairs have also been observed). Participants predict outcomes for new pairs: “Consider what will happen if w and s touch. Will w light? Will s light?” and “Consider what will happen if w and k touch. Will w light? Will k light?”, each with a confidence rating from 1 (not sure) to 7 (very sure).

11 Phase 2: Evaluating theories. Participants rate how good they find each explanation on a scale from 1 (not good) to 7 (very good): (1) “The blocks can not be organized. They will light or not light randomly, but only one block can be lit at a time.” (2) “The blocks can be organized by ‘strength’. The stronger blocks will light the weaker ones: s (strongest), k, m, y, w/g (weakest).”

12 Results Estimated prior would depend on task (figure: Phase 1 and Phase 2 results) (Bonawitz & Griffiths, 2010)

13 Why should we care? This example shows we should be careful in the psychological interpretation of priors We should be similarly careful in the psychological interpretation of representations The only way to understand how computational level models constrain algorithmic level models is to begin to connect the two…

14 A top-down approach… The goal is approximating Bayesian inference From computer science and statistics, find good methods for solving this problem See if these methods can be connected to psychological processes The result: “rational process models” (Sanborn et al., 2006; Shi et al., 2008)

15

16 The Monte Carlo principle: E_{p(x)}[f(x)] ≈ (1/n) ∑_{i=1}^{n} f(x^(i)), where the x^(i) are sampled from p(x)
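To make the Monte Carlo principle concrete, here is a minimal Python sketch; the distribution and function are illustrative choices, not from the talk. The expectation of f under p is approximated by averaging f over samples drawn from p.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the talk): p(x) is a standard normal,
# f(x) = x^2, so E_p[f(x)] = 1 exactly.
def f(x):
    return x ** 2

n = 100_000
samples = rng.normal(loc=0.0, scale=1.0, size=n)   # x^(i) ~ p(x)
estimate = f(samples).mean()                        # (1/n) * sum_i f(x^(i))

print(estimate)  # close to the true expectation, 1.0
```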

17 Sampling as psychological process Sampling can be done using existing “toolbox” –storage/retrieval, simulation, activation/competition Many existing process models use sampling: –Luce choice rule (Luce, 1959) –stochastic models of reaction time –used to explain correlation perception, confidence intervals, utility curves, … (Kareev, 2000; Fiedler & Juslin, 2006; Stewart et al., 2006)

18 Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

19 Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

20 Importance sampling (figure: target distribution p(x) and proposal distribution q(x))

21 Approximating Bayesian inference Sample from the prior, weight by the likelihood
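A small sketch of this idea, using a hypothetical three-hypothesis problem with made-up prior and likelihood values: hypotheses are sampled from the prior, each sample is weighted by the likelihood of the data, and the normalized weights approximate the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: three hypotheses with a known prior and
# a likelihood for a single observed datum d.
hypotheses = np.array([0, 1, 2])
prior      = np.array([0.5, 0.3, 0.2])   # p(h)
likelihood = np.array([0.1, 0.6, 0.3])   # p(d | h)

# Importance sampling with the prior as the proposal:
# sample h^(i) ~ p(h), then weight each sample by p(d | h^(i)).
n = 50_000
idx = rng.choice(hypotheses, size=n, p=prior)
weights = likelihood[idx]

# Normalized weights approximate the posterior p(h | d).
approx_posterior = np.array([weights[idx == h].sum() for h in hypotheses])
approx_posterior /= approx_posterior.sum()

exact_posterior = prior * likelihood / (prior * likelihood).sum()
print(approx_posterior, exact_posterior)  # the two should be close
```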

22 Exemplar models Assume decisions are made by storing previous events in memory, then activating by similarity For example, categorization: P(c | x) = ∑_i s(x, x^(i)) I(x^(i) ∈ c) / ∑_i s(x, x^(i)), where the x^(i) are exemplars, s(x, x^(i)) is similarity, and I(x^(i) ∈ c) is 1 if x^(i) is from category c (e.g., Nosofsky, 1986)

23 Exemplar models Assume decisions are made by storing previous events in memory, then activating by similarity General version: f̂(x) = ∑_i s(x, x^(i)) f(x^(i)) / ∑_i s(x, x^(i)), where the x^(i) are exemplars, s(x, x^(i)) is similarity, and f(x^(i)) is the quantity of interest

24 Equivalence Bayes can be approximated using exemplar models, storing hypotheses sampled from the prior: with h^(i) ~ p(h) as the “exemplars” and the likelihood p(d | h^(i)) as the similarity, the exemplar-model average ∑_i p(d | h^(i)) f(h^(i)) / ∑_i p(d | h^(i)) is an importance-sampling approximation to the posterior expectation of f(h)

25 Approximating Bayesian inference Sample from prior, weight by likelihood Can be implemented in an exemplar model –store “exemplars” of hypotheses through experience –activate in proportion to likelihood (=“similarity”) (Shi, Feldman, & Griffiths, 2008)
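A sketch of this equivalence under simple, illustrative assumptions (a Gaussian prior and likelihood chosen purely for convenience): the stored “exemplars” are hypotheses sampled from the prior, “similarity” is the likelihood, and the exemplar-model weighted average approximates the posterior expectation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setup: continuous hypotheses h with a standard normal prior,
# and a Gaussian likelihood for a single observed datum d.
def likelihood(d, h):
    return np.exp(-0.5 * (d - h) ** 2) / np.sqrt(2 * np.pi)

d = 1.5
prior_samples = rng.normal(0.0, 1.0, size=20_000)   # "exemplars" = hypotheses sampled from the prior

def exemplar_estimate(f):
    """Exemplar-model form: sum_i s(d, h^(i)) f(h^(i)) / sum_i s(d, h^(i)),
    with similarity s set to the likelihood p(d | h^(i))."""
    s = likelihood(d, prior_samples)
    return (s * f(prior_samples)).sum() / s.sum()

# Posterior mean of h given d; for this conjugate toy case it is d/2 = 0.75.
print(exemplar_estimate(lambda h: h))
```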

26 The universal law of generalization (Shi, Griffiths, Feldman, & Sanborn, 2010)

27 The number game (Shi, Feldman, & Griffiths, 2008)

28 Making connections (Shi & Griffiths, 2009)

29 Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

30 Win-stay, lose-shift algorithms A class of algorithms that have been explored in both computer science and psychology Our version: After observing d_n, keep hypothesis h with probability c · P(d_n | h); else sample h from P(h | d_1, …, d_n) Then h is a sample from P(h | d_1, …, d_n) for all n, where c can depend on d_1, …, d_{n-1} and is chosen so that c · P(d_n | h) ∈ [0, 1] (Bonawitz, Denison, Griffiths, & Gopnik, in press)
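A sketch of this scheme on a toy problem, assuming the keep probability c · P(d_n | h) as written above. The hypothesis space is a made-up set of three Bernoulli hypotheses, and resampling “from the posterior” is done exactly, which is only feasible in small toy problems.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy setup: 3 hypotheses, each assigning a different probability
# to observing outcome 1 on a trial (Bernoulli likelihoods).
prior = np.full(3, 1 / 3)
theta = np.array([0.2, 0.5, 0.8])        # P(d = 1 | h)
data  = rng.binomial(1, 0.8, size=30)    # data generated from hypothesis 2

def posterior(data_so_far):
    lik = np.prod(np.where(data_so_far[:, None] == 1, theta, 1 - theta), axis=0)
    post = prior * lik
    return post / post.sum()

h = rng.choice(3, p=prior)               # start with a sample from the prior
for n, d in enumerate(data, start=1):
    lik_h = theta[h] if d == 1 else 1 - theta[h]
    c = 1.0 / max(theta.max(), (1 - theta).max())   # keeps c * P(d_n | h) in [0, 1]
    if rng.random() < c * lik_h:
        pass                                        # "win": keep the current hypothesis
    else:
        h = rng.choice(3, p=posterior(data[:n]))    # "lose": resample from the posterior

print("final hypothesis:", h)
```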

31 Two interesting special cases Case 1: Decide using only current likelihood Case 2 : Minimize the rate of switching (Bonawitz, Denison, Griffiths, & Gopnik, in press)

32 Blicket Detector

33

34 Go First and No Go First conditions (figure)

35 Choice probabilities (figure: Go First and No Go First conditions) (Bonawitz, Denison, Griffiths, & Gopnik, in press)

36 Switch probabilities (figure: people’s switch probabilities compared with model predictions; win-stay, lose-shift: r = 0.78; independent sampling: r = 0.38) (Bonawitz, Denison, Griffiths, & Gopnik, in press)

37 Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

38 Updating distributions over time… Computational costs are compounded when data are observed incrementally… –recompute P(h | d_1, …, d_n) after observing d_n Exploit “yesterday’s posterior is today’s prior” Repeatedly using importance sampling results in an algorithm known as a “particle filter”

39 Particle filter: samples from P(h | d_1, …, d_{n-1}) → weight by P(d_n | h) → weighted atoms approximating P(h | d_1, …, d_n) → (resample) → samples from P(h | d_1, …, d_n)

40 Dynamic hypotheses (figure: a state-space model in which hypotheses h_1, h_2, h_3, h_4 evolve over time and generate data d_1, d_2, d_3, d_4)

41 Particle filters: samples from P(h_3 | d_1, …, d_3) → sample h_4 from P(h_4 | h_3) → samples from P(h_4 | d_1, …, d_3) → weight by P(d_4 | h_4) → weighted atoms approximating P(h_4 | d_1, …, d_4) → (resample) → samples from P(h_4 | d_1, …, d_4)
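A minimal particle filter sketch following slides 39–41. The state-space model here (a Gaussian random walk observed with noise) is an illustrative assumption, not the model from the talk; the loop is the propagate–weight–resample cycle shown above.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical linear-Gaussian model for illustration:
# h_n = h_{n-1} + transition noise, d_n = h_n + observation noise.
T, n_particles = 50, 1000
true_h = np.cumsum(rng.normal(0, 0.5, size=T))
data = true_h + rng.normal(0, 1.0, size=T)

particles = np.zeros(n_particles)   # h_0 = 0 in this toy model
estimates = []
for d_n in data:
    # Propagate: sample h_n from P(h_n | h_{n-1}) for each particle.
    particles = particles + rng.normal(0, 0.5, size=n_particles)
    # Weight: w_i proportional to P(d_n | h_n^(i)).
    weights = np.exp(-0.5 * (d_n - particles) ** 2)
    weights /= weights.sum()
    # Resample: turn weighted atoms back into unweighted samples from P(h_n | d_1, ..., d_n).
    particles = rng.choice(particles, size=n_particles, p=weights)
    estimates.append(particles.mean())

print(np.corrcoef(estimates, true_h)[0, 1])  # the filter tracks the hidden state
```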

42 The promise of particle filters A general scheme for defining rational process models of updating over time (Sanborn et al., 2006) Model limited memory, and produce order effects (cf. Kruschke, 2006) Used to define rational process models of… –categorization (Sanborn et al., 2006) –associative learning (Daw & Courville, 2008) –changepoint detection (Brown & Steyvers, 2009) –sentence processing (Levy et al., 2009)

43 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped

44 Particle filter with probabilistic grammars
S → NP VP (1.0)      V → brought (0.4)
NP → N (0.8)         V → broke (0.3)
NP → N RRC (0.2)     V → tripped (0.3)
RRC → Part N (1.0)   Part → brought (0.1)
VP → V N (1.0)       Part → broken (0.7)
N → woman (0.7)      Part → tripped (0.2)
N → sandwiches (0.3) Adv → quickly (1.0)
(figure: incremental parse of “woman brought sandwiches … tripped” using these rules, with the reduced relative clause analysis)
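One convenient way to encode the grammar above is as a dictionary of rules with their probabilities; the small sampler below simply generates sentences from it, including reduced-relative (N RRC) structures. This is only an illustration of the grammar as data, not the parser used by Levy et al.

```python
import random

random.seed(0)

# The probabilistic grammar from slide 44, encoded as
# {nonterminal: [(probability, right-hand side), ...]}.
grammar = {
    "S":    [(1.0, ["NP", "VP"])],
    "NP":   [(0.8, ["N"]), (0.2, ["N", "RRC"])],
    "RRC":  [(1.0, ["Part", "N"])],
    "VP":   [(1.0, ["V", "N"])],
    "N":    [(0.7, ["woman"]), (0.3, ["sandwiches"])],
    "V":    [(0.4, ["brought"]), (0.3, ["broke"]), (0.3, ["tripped"])],
    "Part": [(0.1, ["brought"]), (0.7, ["broken"]), (0.2, ["tripped"])],
    "Adv":  [(1.0, ["quickly"])],
}

def sample(symbol):
    """Expand a symbol by sampling rules until only words remain."""
    if symbol not in grammar:
        return [symbol]
    probs, rhss = zip(*grammar[symbol])
    rhs = random.choices(rhss, weights=probs, k=1)[0]
    return [word for s in rhs for word in sample(s)]

for _ in range(3):
    print(" ".join(sample("S")))  # sentences generated by the grammar, some with RRC readings
```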

45 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich)

46 Solving a puzzle
A-S: Tom heard the gossip wasn't true.
A-L: Tom heard the gossip about the neighbors wasn't true.
U-S: Tom heard that the gossip wasn't true.
U-L: Tom heard that the gossip about the neighbors wasn't true.
Ambiguity increases difficulty… …but so does the length of the ambiguous region (Frazier & Rayner, 1982; Tabor & Hutchins, 2004) Our prediction: parse failures at the disambiguating region should increase with sentence difficulty

47 Resampling-induced drift In ambiguous region, observed words aren’t strongly informative Due to resampling, probabilities of different constructions will drift (more so with time)

48 Model results

49 Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

50 Markov chain Monte Carlo Typically assumes all data are available, considers one hypothesis at a time and stochastically explores the space of hypotheses A possible account of perceptual bistability (Gershman, Vul, & Tenenbaum, 2009) A way to explain how children explore the space of possible causal theories (Ullman, Goodman, & Tenenbaum, 2010)
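A minimal Metropolis–Hastings sketch of this kind of exploration, over a hypothetical discrete hypothesis space with made-up posterior scores: hold all the data fixed, keep one hypothesis at a time, propose a neighbouring hypothesis, and accept it with the standard Metropolis rule. Time spent on each hypothesis approximates the posterior.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical discrete hypothesis space: integers 0..9 with an unnormalized
# posterior score p(h | d) proportional to these values.
score = np.array([1, 2, 4, 8, 16, 8, 4, 2, 1, 1], dtype=float)

h = 0                       # current hypothesis
samples = []
for _ in range(50_000):
    proposal = (h + rng.choice([-1, 1])) % 10       # propose a neighbouring hypothesis
    accept_prob = min(1.0, score[proposal] / score[h])
    if rng.random() < accept_prob:
        h = proposal                                # stochastically move through the space
    samples.append(h)

# Proportion of time spent on each hypothesis approximates the posterior.
counts = np.bincount(samples, minlength=10) / len(samples)
print(np.round(counts, 3), np.round(score / score.sum(), 3))
```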

51 Monte Carlo as mechanism Importance sampling (Shi, Griffiths, Feldman, & Sanborn, 2010) “Win-stay, lose-shift” algorithms (Bonawitz, Denison, Griffiths, & Gopnik, in press) Particle filters (Brown & Steyvers, 2009; Daw & Courville, 2008; Levy et al., 2009; Sanborn et al., 2010) Markov chain Monte Carlo (Gershman, Vul, & Tenenbaum, 2009; Ullman, Goodman, & Tenenbaum, 2010)

52 Marr’s three levels Computation “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?” Representation and algorithm “What is the representation for the input and output, and the algorithm for the transformation?” Implementation “How can the representation and algorithm be realized physically?” (each level constrains the level below)

53 Mechanistic pluralism Computation “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?” Representation and algorithm “What is the representation for the input and output, and the algorithm for the transformation?” Implementation “How can the representation and algorithm be realized physically?” WSLS Importance sampling Particle filtering MCMC … RBF networks Associative memories Probabilistic coding …

54 Conclusions Bayesian models of cognition give us a way to identify human inductive biases But, the relationship of priors and representations in these models to mental and neural processes is not necessarily transparent Monte Carlo methods provide a source of rational process models that could connect these levels, and perhaps lead us to expect many different mechanisms

55

56

