Bayesian models of human learning and reasoning

Presentation transcript:

Bayesian models of human learning and reasoning. Josh Tenenbaum, MIT Department of Brain and Cognitive Sciences, Computer Science and AI Lab (CSAIL). Acknowledgments: Tom Griffiths, Charles Kemp, the Computational Cognitive Science group at MIT, and all the researchers whose work I’ll discuss.

Collaborators: Tom Griffiths, Charles Kemp, Chris Baker, Noah Goodman, Vikash Mansinghka, Amy Perfors, Lauren Schmidt, Pat Shafto. [Speaker note: game plan is biology; if 5 minutes are left at the end, cover word learning and theory acquisition; if no time is left, cover only theory acquisition.]

The probabilistic revolution in AI Principled and effective solutions for inductive inference from ambiguous data: Vision Robotics Machine learning Expert systems / reasoning Natural language processing Standard view: no necessary connection to how the human brain solves these problems.

Bayesian models of cognition Visual perception [Weiss, Simoncelli, Adelson, Richards, Freeman, Feldman, Kersten, Knill, Maloney, Olshausen, Jacobs, Pouget, ...] Language acquisition and processing [Brent, de Marcken, Niyogi, Klein, Manning, Jurafsky, Keller, Levy, Hale, Johnson, Griffiths, Perfors, Tenenbaum, …] Motor learning and motor control [Ghahramani, Jordan, Wolpert, Kording, Kawato, Doya, Todorov, Shadmehr, …] Associative learning [Dayan, Daw, Kakade, Courville, Touretzky, Kruschke, …] Memory [Anderson, Schooler, Shiffrin, Steyvers, Griffiths, McClelland, …] Attention [Mozer, Huber, Torralba, Oliva, Geisler, Movellan, Yu, Itti, Baldi, …] Categorization and concept learning [Anderson, Nosofsky, Rehder, Navarro, Griffiths, Feldman, Tenenbaum, Rosseel, Goodman, Kemp, Mansinghka, …] Reasoning [Chater, Oaksford, Sloman, McKenzie, Heit, Tenenbaum, Kemp, …] Causal inference [Waldmann, Sloman, Steyvers, Griffiths, Tenenbaum, Yuille, …] Decision making and theory of mind [Lee, Stankiewicz, Rao, Baker, Goodman, Tenenbaum, …]

Everyday inductive leaps How can people learn so much about the world from such limited evidence? Learning concepts from examples “horse” “horse” “horse”

Learning concepts from examples “tufa”

Everyday inductive leaps How can people learn so much about the world from such limited evidence? Kinds of objects and their properties The meanings of words, phrases, and sentences Cause-effect relations The beliefs, goals and plans of other people Social structures, conventions, and rules

Modeling Goals Principled quantitative models of human behavior, with broad coverage and a minimum of free parameters and ad hoc assumptions. Explain how and why human learning and reasoning works, in terms of (approximations to) optimal statistical inference in natural environments. A framework for studying people’s implicit knowledge about the structure of the world: how it is structured, used, and acquired. A two-way bridge to state-of-the-art AI and machine learning.

The approach: from statistics to intelligence. 1. How does background knowledge guide learning from sparsely observed data? Bayesian inference. 2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories. 3. How is background knowledge itself acquired? Hierarchical probabilistic models, with inference at multiple levels of abstraction; flexible nonparametric models in which complexity grows with the data.

Outline Predicting everyday events Learning concepts from examples The big picture

Basics of Bayesian inference. Bayes’ rule: P(h | d) = P(d | h) P(h) / Σh' P(d | h') P(h'). An example. Data: John is coughing. Some hypotheses: (1) John has a cold; (2) John has lung cancer; (3) John has a stomach flu. The likelihood P(d | h) favors 1 and 2 over 3. The prior probability P(h) favors 1 and 3 over 2. The posterior probability P(h | d) favors 1 over 2 and 3.
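A minimal sketch of this computation in Python (the specific probabilities are illustrative stand-ins, not values from the talk):

```python
# Illustrative Bayes' rule computation for the coughing example.
# The likelihoods and priors below are made-up numbers chosen only
# to reproduce the qualitative ordering described on the slide.

hypotheses = ["cold", "lung cancer", "stomach flu"]
prior      = {"cold": 0.20, "lung cancer": 0.001, "stomach flu": 0.10}   # P(h)
likelihood = {"cold": 0.80, "lung cancer": 0.70,  "stomach flu": 0.05}   # P(d = coughing | h)

unnormalized = {h: likelihood[h] * prior[h] for h in hypotheses}
Z = sum(unnormalized.values())
posterior = {h: unnormalized[h] / Z for h in hypotheses}                  # P(h | d)

for h in hypotheses:
    print(f"P({h} | coughing) = {posterior[h]:.3f}")
# The posterior favors "cold": it scores well on both likelihood and prior.
```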

Bayesian inference in perception and sensorimotor integration (Weiss, Simoncelli & Adelson 2002) (Kording & Wolpert 2004)

Everyday prediction problems (Griffiths & Tenenbaum, 2006) You read about a movie that has made $60 million to date. How much money will it make in total? You see that something has been baking in the oven for 34 minutes. How long until it’s ready? You meet someone who is 78 years old. How long will they live? Your friend quotes to you from line 17 of his favorite poem. How long is the poem? You meet a US congressman who has served for 11 years. How long will he serve in total? You encounter a phenomenon or event with an unknown extent or duration, ttotal, at a random time or value t < ttotal. What is the total extent or duration ttotal?

Bayesian analysis. P(ttotal | t) ∝ P(t | ttotal) P(ttotal). Assume a random sample: P(t | ttotal) ∝ 1/ttotal for 0 < t < ttotal, and 0 otherwise. What form should the prior P(ttotal) take? e.g., an uninformative (Jeffreys) prior, P(ttotal) ∝ 1/ttotal.

Bayesian analysis. Posterior probability: P(ttotal | t) ∝ 1/ttotal × 1/ttotal (random-sampling likelihood × “uninformative” prior). Best guess for ttotal: the value t* such that P(ttotal > t* | t) = 0.5. This yields Gott’s Rule: guess t* = 2t.
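A numerical sketch of this derivation (the observed value and the upper cutoff used to normalize the improper prior are arbitrary choices for illustration):

```python
import numpy as np

# Numerical check of Gott's rule (illustrative sketch): with a random-sampling
# likelihood P(t | t_total) = 1/t_total for t < t_total, and an "uninformative"
# prior P(t_total) ~ 1/t_total, the posterior median is t* = 2t.
# T_MAX is an arbitrary large cutoff that makes the improper prior normalizable.

t_obs = 30.0
T_MAX = 1e6
grid = np.linspace(t_obs, T_MAX, 2_000_000)
dx = grid[1] - grid[0]

posterior = (1.0 / grid) * (1.0 / grid)   # likelihood * prior, up to a constant
posterior /= posterior.sum() * dx         # normalize on the grid

cdf = np.cumsum(posterior) * dx
t_star = grid[np.searchsorted(cdf, 0.5)]  # posterior median
print(t_star)                             # close to 2 * t_obs = 60
```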

Evaluating Gott’s Rule You read about a movie that has made $78 million to date. How much money will it make in total? “$156 million” seems reasonable. You meet someone who is 35 years old. How long will they live? “70 years” seems reasonable. Not so simple: You meet someone who is 78 years old. How long will they live? You meet someone who is 6 years old. How long will they live?

Priors P(ttotal) based on empirically measured durations or magnitudes for many real-world events in each class: Median human judgments of the total duration or magnitude ttotal of events in each class, given that they are first observed at a duration or magnitude t, versus Bayesian predictions (median of P(ttotal|t)).
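A rough sketch of how the same prediction rule behaves under different domain priors; the prior shapes and parameters below are illustrative assumptions, not the empirically measured priors used in the study:

```python
import numpy as np

# Illustrative sketch: one Bayesian prediction rule, two domain priors. The
# parameter values are made up; the point is that the posterior median behaves
# differently under a heavy-tailed power-law prior (e.g., movie grosses) than
# under a roughly Gaussian prior (e.g., life spans).

def posterior_median(t_obs, prior_pdf, t_max=1e4, n=1_000_000):
    grid = np.linspace(t_obs, t_max, n)
    dx = grid[1] - grid[0]
    post = (1.0 / grid) * prior_pdf(grid)      # random-sampling likelihood * prior
    post /= post.sum() * dx
    cdf = np.cumsum(post) * dx
    return grid[np.searchsorted(cdf, 0.5)]

power_law = lambda t: t ** -1.5                                   # heavy-tailed prior
gaussian  = lambda t: np.exp(-0.5 * ((t - 75.0) / 15.0) ** 2)     # prior centered at 75

print(posterior_median(10.0, power_law))   # roughly a fixed multiple of the observation
print(posterior_median(10.0, gaussian))    # pulled up toward the prior mean (~75)
print(posterior_median(90.0, gaussian))    # only slightly above the observation
```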

You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign? How long did the typical pharaoh reign in ancient Egypt?

Summary: prediction Predictions about the extent or magnitude of everyday events follow Bayesian principles. Contrast with Bayesian inference in perception, motor control, memory: no “universal priors” here. Predictions depend rationally on priors that are appropriately calibrated for different domains. Form of the prior (e.g., power-law or exponential) Specific distribution given that form (parameters) Non-parametric distribution when necessary. In the absence of concrete experience, priors may be generated by qualitative background knowledge.

Learning concepts from examples. Word learning: “tufa” (three labeled examples). Property induction: Cows have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. Therefore, all mammals have T9 hormones. Versus: Cows have T9 hormones. Sheep have T9 hormones. Goats have T9 hormones. Therefore, all mammals have T9 hormones.

The computational problem (cf. semi-supervised learning): a species-by-features matrix over Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, and Elephant, plus a new-property column that is observed for only a few species and marked “?” for the rest. The 85 features are from Osherson et al.; e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘quadrapedal’, …

Similarity-based models. (Figure: human judgments of argument strength versus model predictions, for arguments such as “Cows have property P. Elephants have property P. Horses have property P. Therefore, all mammals have property P.” and “Gorillas have property P. Mice have property P. Seals have property P. Therefore, all mammals have property P.”)

Beyond similarity-based induction. Reasoning based on dimensional thresholds (Smith et al., 1993): “Poodles can bite through wire, therefore German shepherds can bite through wire” versus “Dobermans can bite through wire, therefore German shepherds can bite through wire.” Reasoning based on causal relations (Medin et al., 2004; Coley & Shafto, 2003): “Salmon carry E. Spirus bacteria, therefore grizzly bears carry E. Spirus bacteria” versus “Grizzly bears carry E. Spirus bacteria, therefore salmon carry E. Spirus bacteria.”

Bayesian property induction. Each hypothesis h is a candidate extension of the new property, i.e., a subset of the species (“Horses have T9 hormones”, “Rhinos have T9 hormones”, “Cows have T9 hormones”, …), with prior probability P(h). Given observed examples X, the prediction P(Y | X) that species Y has the property is obtained by averaging over the hypotheses consistent with X, weighted by their posterior probabilities. (Figure: the hypothesis space over the ten mammal species, with prior P(h) and prediction P(Y | X).)
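A minimal sketch of this hypothesis-averaging computation (the species come from the slides; the uniform prior is a placeholder for the structured priors discussed below):

```python
import itertools

# Minimal sketch of Bayesian property induction: hypotheses h are subsets of
# species that could share the novel property; the prediction P(y has it | X)
# averages over hypotheses consistent with the observed positive examples X.
# The prior here is a placeholder (uniform); in the talk it comes from a
# structured model, e.g., a diffusion process over a tree.

species = ["horse", "cow", "chimp", "gorilla", "mouse",
           "squirrel", "dolphin", "seal", "rhino", "elephant"]
hypotheses = [frozenset(c) for r in range(1, len(species) + 1)
              for c in itertools.combinations(species, r)]
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}

def predict(target, observed):
    """P(target has the property | all species in `observed` have it)."""
    consistent = [h for h in hypotheses if observed <= h]
    z = sum(prior[h] for h in consistent)
    return sum(prior[h] for h in consistent if target in h) / z

print(predict("cow", {"horse"}))       # generalization from a single example
print(predict("dolphin", {"horse"}))   # identical under a uniform prior: the prior does no work yet
```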

Where does the prior come from? Why not just enumerate all logically possible hypotheses along with their relative prior probabilities? (Figure: the hypothesis space over the ten mammal species, with prior P(h).)

Different sources for priors. “Chimps have T9 hormones, therefore gorillas have T9 hormones” draws on taxonomic similarity. “Poodles can bite through wire, therefore Dobermans can bite through wire” draws on jaw strength. “Salmon carry E. Spirus bacteria, therefore grizzly bears carry E. Spirus bacteria” draws on food web relations.

Hierarchical Bayesian Framework. F: form (background knowledge), e.g., a tree with species at the leaf nodes, with prior P(form). S: structure, e.g., a particular tree over mouse, squirrel, chimp, gorilla, …, with P(structure | form). D: data, the observed features F1, F2, F3, F4, … and the partially observed “Has T9 hormones” property, with P(data | structure).

The value of structural form knowledge: inductive bias

Hierarchical Bayesian Framework applied to property induction: the same form, structure, and data hierarchy, with the partially observed “Has T9 hormones” property as the query at the data level.

P(D|S): How the structure constrains the data of experience Define a stochastic process over structure S that generates hypotheses h. Intuitively, properties should vary smoothly over structure. Smooth: P(h) high Not smooth: P(h) low

P(D|S): how the structure constrains the data of experience. A Gaussian process (~ random walk, diffusion) over the structure [Zhu, Ghahramani & Lafferty 2003]: a continuous value y varies smoothly over the graph and is passed through a threshold to yield the binary property h.
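A small sketch of this kind of smoothness prior, assuming a graph-Laplacian covariance and a toy four-species tree (both placeholders, not the model’s actual structures or parameters):

```python
import numpy as np

# Sketch of a smoothness prior over a graph (assumed details): properties are
# generated by a Gaussian process whose covariance is derived from the graph
# Laplacian (cf. Zhu, Ghahramani & Lafferty 2003), then thresholded to give a
# binary property. The tiny 4-species tree and the parameters are placeholders.

species = ["mouse", "squirrel", "chimp", "gorilla"]
edges = [(0, 1), (2, 3), (0, 2)]       # a toy tree connecting the four species

n = len(species)
L = np.zeros((n, n))                    # graph Laplacian
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

K = np.linalg.inv(L + 0.1 * np.eye(n))        # GP covariance: smooth over the graph
rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(n), K)   # latent continuous property values
h = y > 0                                     # thresholded binary property

print(dict(zip(species, h)))   # neighboring species tend to agree, so "smooth" hypotheses get high prior
```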

Structure S and data D: a species-by-features matrix (Species 1 through Species 10 shown). 85 features for 50 animals (Osherson et al.); e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’, …

[cf. Lawrence, 2004; Smola & Kondor, 2003]

Structure S and data D with a new property: the same species-by-features matrix plus a new-property column, observed for some species and marked “?” for the rest.

(Figure: human judgments versus model predictions under a tree-structured prior and a 2D spatial prior, for arguments such as “Cows have property P. Elephants have property P. Horses have property P. Therefore, all mammals have property P.” and “Gorillas have property P. Mice have property P. Seals have property P. Therefore, all mammals have property P.”)

Testing different priors: the inductive bias can be correct, wrong, absent, or too strong. (Figure: model fits to human judgments under each kind of bias.)

A connectionist alternative (Rogers and McClelland, 2004) Species Features Emergent structure: clustering on hidden unit activation vectors

Reasoning about spatially varying properties “Native American artifacts” task

Theory-based priors for different property types. “Has T9 hormones”: structure is a taxonomic tree, paired with a diffusion process. “Can bite through wire”: structure is a directed chain, paired with a drift process. “Carry E. Spirus bacteria”: structure is a directed network, paired with a noisy transmission process. Each theory (structure plus process) generates a hypothesis space over the classes A through G.

Reasoning with two property types (Shafto, Kemp, Bonawitz, Coley & Tenenbaum): “Given that X has property P, how likely is it that Y does?” For a biological property, induction follows a taxonomic tree over the species (kelp, herring, tuna, mako shark, sand shark, dolphin, human); for a disease property, it follows a web (food web) over the same species.

Summary so far A framework for modeling human inductive reasoning as rational statistical inference over structured knowledge representations Qualitatively different priors are appropriate for different domains of property induction. In each domain, a prior that matches the world’s structure fits people’s judgments well, and better than alternative priors. A language for representing different theories: graph structure defined over objects + probabilistic model for the distribution of properties over that graph. Remaining question: How can we learn appropriate theories for different domains?

Hierarchical Bayesian Framework with multiple candidate forms. F: form, a chain, a tree, or a low-dimensional space. S: structure, an arrangement of the species (mouse, squirrel, chimp, gorilla) under that form. D: data, the observed features F1, F2, F3, F4 for each species.

Discovering structural forms. The same species (snake, turtle, crocodile, robin, ostrich, bat, orangutan) can be organized in different ways: as a tree (compare Linnaeus’s taxonomy), or as a linear “great chain of being” running from rock and plant up through the animals to angel and God.

People can discover structural forms, in scientific discoveries and in children’s cognitive development: the hierarchical structure of category labels, the clique structure of social groups, the cyclical structure of seasons or days of the week, a transitive structure for value, a tree structure for biological species, a periodic structure for chemical elements. Historical examples: the “great chain of being” (1579); Linnaeus’s Systema Naturae (1735), with its nested taxonomy (Kingdom Animalia, Phylum Chordata, Class Mammalia, Order Primates, Family Hominidae, Genus Homo, Species Homo sapiens); and the tree of species (1837).

Typical structure learning algorithms assume a fixed structural form. Flat clusters: K-Means, mixture models, competitive learning. Line: Guttman scaling, ideal point models. Circle: circumplex models. Tree: hierarchical clustering, Bayesian phylogenetics. Grid: Self-Organizing Map, generative topographic mapping. Euclidean space: MDS, PCA, factor analysis.

The ultimate goal: a “Universal Structure Learner” that maps data to an appropriate representation, subsuming K-Means, hierarchical clustering, factor analysis, Guttman scaling, circumplex models, self-organizing maps, and so on.

A “universal grammar” for structural forms: each form is paired with a generative process that produces it. (Figure: forms and their generating processes.)

Hierarchical Bayesian Framework: the prior over structures, P(S | F), favors simplicity; the likelihood of the data, P(D | S), favors smoothness of properties over the structure [Zhu et al., 2003]. (Same form, structure, and data hierarchy over mouse, squirrel, chimp, gorilla, with features F1, F2, F3, F4.)

Model fitting Evaluate each form in parallel For each form, heuristic search over structures based on greedy growth from a one-node seed:
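A schematic of this search loop; `seed`, `score`, and `grow_candidates` are hypothetical callables standing in for the model’s actual form grammar and marginal-likelihood computation:

```python
# Schematic of the structure search described on the slide: evaluate each
# candidate form, and for each form grow a structure greedily from a one-node
# seed. `seed`, `score`, and `grow_candidates` are hypothetical placeholders
# for the real form grammar and marginal-likelihood computation.

def greedy_structure_search(data, seed, score, grow_candidates, max_iters=100):
    structure = seed(data)                         # all objects attached to a single node
    best = score(structure, data)
    for _ in range(max_iters):
        candidates = grow_candidates(structure)    # e.g., split one node according to the form
        if not candidates:
            break
        scored = [(score(s, data), s) for s in candidates]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score <= best:                      # no candidate improves the score: stop
            break
        structure, best = top, top_score
    return structure, best

# Conceptually, every form is evaluated in parallel and the best-scoring
# (form, structure) pair is kept:
# results = {name: greedy_structure_search(data, f.seed, f.score, f.grow)
#            for name, f in forms.items()}
```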

Structural forms from relational data: a primate troop (“x beats y”) yields a dominance hierarchy; the Bush administration (“x told y”) yields a tree; prison inmates (“x likes y”) yield cliques; the Kula islands (“x trades with y”) yield a ring.

Development of structural forms as more data are observed

Beyond “Nativism” versus “Empiricism” “Nativism”: Explicit knowledge of structural forms for core domains is innate. Atran (1998): The tendency to group living kinds into hierarchies reflects an “innately determined cognitive structure”. Chomsky (1980): “The belief that various systems of mind are organized along quite different principles leads to the natural conclusion that these systems are intrinsically determined, not simply the result of common mechanisms of learning or growth.” “Empiricism”: General-purpose learning systems without explicit knowledge of structural form. Connectionist networks (e.g., Rogers and McClelland, 2004). Traditional structure learning in probabilistic graphical models.

Summary: learning from examples. Bayesian inference over hierarchies of structured representations provides a framework to understand core questions of human learning: What is the content and form of human knowledge, at multiple levels of abstraction? How does abstract domain knowledge guide learning of new concepts? How is abstract domain knowledge learned? What must be built in? How can domain-general learning mechanisms acquire domain-specific representations? How can probabilistic inference work together with symbolic, flexibly structured representations? (Figure: the form, structure, and data hierarchy.)

Learning word meanings Bayesian inference over tree-structured hypothesis space: (Xu & Tenenbaum; Schmidt & Tenenbaum) “tufa” “tufa” “tufa”

Learning word meanings within the same hierarchy of principles, structure, and data. Principles include the shape bias, the taxonomic principle, the contrast principle, the basic-level bias, and the assumption of representative examples.

Learning causal relations Abstract Principles Structure Data (Griffiths, Tenenbaum, Kemp et al.)

Causal learning with prior knowledge (Griffiths, Sobel, Tenenbaum & Gopnik). The “backwards blocking” paradigm: an initial trial in which objects A and B together produce the effect, followed by a trial in which A alone produces it.
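One way to sketch a Bayesian account of backwards blocking; the hypothesis space, prior strength, and detector reliabilities below are illustrative assumptions:

```python
# Illustrative Bayesian account of backwards blocking (parameters are made up).
# Hypotheses: which of the objects A, B are causes. The prior over each object
# being a cause encodes background knowledge about how common causes are; the
# likelihood assumes a cause produces the effect reliably.

p_cause = 0.3                     # prior probability that any given object is a cause
hypotheses = {(a, b): (p_cause if a else 1 - p_cause) * (p_cause if b else 1 - p_cause)
              for a in (0, 1) for b in (0, 1)}

def likelihood(activated, present, h):
    """P(effect occurs | objects present, hypothesis h)."""
    caused = any(h[i] for i in present)
    return (0.99 if caused else 0.01) if activated else (0.01 if caused else 0.99)

def update(posterior, activated, present):
    post = {h: p * likelihood(activated, present, h) for h, p in posterior.items()}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

def p_b_is_cause(posterior):
    return sum(p for (a, b), p in posterior.items() if b)

post = dict(hypotheses)
post = update(post, activated=True, present=(0, 1))   # initial AB trial
print(p_b_is_cause(post))                             # B now looks like a plausible cause
post = update(post, activated=True, present=(0,))     # A-alone trial
print(p_b_is_cause(post))                             # belief in B drops back toward the prior
```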

First-order probabilistic theories for causal learning

Learning causal relations: Structure and Data, where the data are a patients-by-conditions matrix of has(patient, condition) observations.

Abstract causal theories. Classes = {C}, Laws = {C → C}. Classes = {R, D, S}, Laws = {R → D, D → S}. Classes = {R, D, S}, Laws = {S → D}.

Learning causal relations. R (risk factors): working in a factory, smoking, stress, high fat diet, … D (diseases): flu, bronchitis, lung cancer, heart disease, … S (symptoms): headache, fever, coughing, chest pain, … Abstract principles: Classes = {R, D, S}, Laws = {R → D, D → S}. Structure. Data: a patients-by-conditions matrix of has(patient, condition) observations.

(Figure: the true graphical model G over 16 variables; structures learned from 20, 80, and 1000 samples, both for a model that learns only the graph G from data D and for a model that also learns an abstract theory, i.e., an assignment of the variables to classes z and a matrix of class-level edge probabilities. Mansinghka, Kemp, Tenenbaum & Griffiths, UAI 2006.)

“Universal Grammar” as the top of a hierarchical generative model for language: P(grammar | UG) generates a grammar (hierarchical phrase structure grammars, e.g., CFG, HPSG, TAG); P(phrase structure | grammar) generates a phrase structure; P(utterance | phrase structure) generates an utterance; P(speech | utterance) generates the speech signal. (Jurafsky; Levy & Jaeger; Klein & Manning; Perfors et al., …)

Vision as probabilistic parsing “Analysis by Synthesis” (Han & Zhu, 2006)

Goal-directed action (production and comprehension) (Wolpert et al., 2003)

Understanding goal-directed actions (Heider & Simmel; Csibra & Gergely). Principle of rationality: an intentional agent plans actions to achieve its goals most efficiently given its environmental constraints. (Schema: constraints and goals jointly generate actions.)

Goal inference as inverse probabilistic planning (Baker, Tenenbaum & Saxe): rational planning, formalized as a (PO)MDP, maps constraints and goals to actions; inverting this generative model maps observed actions back to inferred goals. (Figure: model predictions versus human judgments.)
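A toy sketch of the inverse-planning idea, using a one-dimensional corridor and a Boltzmann-rational agent rather than the full (PO)MDP model from the talk; all names and parameters are illustrative:

```python
import numpy as np

# Toy sketch of goal inference as inverse planning (not the model from the talk):
# an agent moves along a 1-D corridor toward one of two goals. The forward model
# assumes Boltzmann-rational action choice (actions that bring the agent closer
# to its goal are exponentially more likely); goal inference inverts this
# forward model with Bayes' rule.

goals = {"left": 0, "right": 9}   # goal locations on the corridor
actions = (-1, +1)
beta = 2.0                        # rationality / noise parameter

def action_probs(pos, goal_pos):
    """Boltzmann-rational probability of stepping left vs. right."""
    utilities = np.array([-abs((pos + a) - goal_pos) for a in actions])
    expu = np.exp(beta * utilities)
    return expu / expu.sum()

def goal_posterior(trajectory):
    post = {g: 0.5 for g in goals}                       # uniform prior over goals
    for pos, nxt in zip(trajectory, trajectory[1:]):
        a = actions.index(nxt - pos)
        for g, gpos in goals.items():
            post[g] *= action_probs(pos, gpos)[a]        # likelihood of the observed step
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

print(goal_posterior([5, 6, 7]))   # two rightward steps: "right" becomes far more probable
```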

The big picture What we need to understand: the mind’s ability to build rich models of the world from sparse data. Learning about objects, categories, and their properties. Causal inference Language comprehension and production Scene understanding Understanding other people’s actions, plans, thoughts, goals What do we need to understand these abilities? Bayesian inference in probabilistic generative models Hierarchical models, with inference at all levels of abstraction Structured representations: graphs, grammars, logic Flexible representations, growing in response to observed data

Open directions and challenges Effective methods for learning structured knowledge How to balance expressiveness and learnability? … flexibility and constraint? More precise relation to psychological processes To what extent do mental processes implement boundedly rational methods of approximate inference? Relation to neural computation How to implement structured representations in brains? Understanding failure cases Are these simply “not Bayesian”, or are people using a different model? How do we avoid circularity?

The “standard model” of learning in neuroscience: supervised and unsupervised learning.

Learning grounded causal models (Goodman, Mansinghka & Tenenbaum). A child learns that petting the cat leads to purring, while pounding leads to growling. But how are the symbolic event concepts over which causal links are defined learned in the first place? (Figure: candidate groupings of raw events a, b, c into event concepts.)

The chicken-and-egg problem of structure learning and feature selection A raw data matrix:

The chicken-and-egg problem of structure learning and feature selection Conventional clustering (CRP mixture):

Learning multiple structures to explain different feature subsets (Shafto, Kemp, Mansinghka, Gordon & Tenenbaum, 2006) CrossCat: System 1 System 2 System 3

The “nonparametric safety-net”. (Figure: a true graphical model G over 12 variables; structures learned from 40, 100, and 1000 samples, for a model that learns only the graph G from data D versus one that also learns an abstract theory with classes z over the variables.)

Bayesian prediction: P(ttotal | tpast) ∝ (1/ttotal) × P(ttotal), the random-sampling likelihood times a domain-dependent prior. What is the best guess for ttotal? Compute t* such that P(ttotal > t* | tpast) = 0.5. We compared the median of the Bayesian posterior with the median of subjects’ judgments… but what about the distribution of subjects’ judgments?

Sources of individual differences. Individuals’ judgments could be noisy. Individuals’ judgments could be optimal, but with different priors; e.g., each individual may have seen only a sparse sample of the relevant population of events. Individuals’ inferences about the posterior could be optimal, but their judgments could be based on probability (or utility) matching rather than maximizing.
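A small sketch of the difference between reporting a posterior summary (maximizing) and probability matching; the posterior shape and number of simulated responders are arbitrary stand-ins:

```python
import numpy as np

# Illustrative contrast between two response strategies given the same posterior:
# a "maximizing" responder always reports the posterior median, while a
# "probability matching" responder samples a response from the posterior.
# Across many responders, matching reproduces the shape of the posterior.

rng = np.random.default_rng(0)
grid = np.linspace(1, 500, 5000)
posterior = grid ** -2.0                  # stand-in posterior over t_total
posterior /= posterior.sum()

median = grid[np.searchsorted(np.cumsum(posterior), 0.5)]
maximizing = np.full(1000, median)                         # every responder gives the same answer
matching = rng.choice(grid, size=1000, p=posterior)        # responses spread out like the posterior

print(np.median(maximizing), np.median(matching))          # similar group medians...
print(np.std(maximizing), np.std(matching))                # ...but very different spread
```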

Individual differences in prediction. (Figure: the proportion of judgments falling below the predicted value, plotted against the quantile of the Bayesian posterior distribution P(ttotal | tpast).)

Individual differences in prediction, averaged over all prediction tasks: movie run times, movie grosses, poem lengths, life spans, terms in Congress, and cake baking times. (Figure: the same quantile comparison, averaged across tasks.)