1
Bayesian models of inductive learning and reasoning
Josh Tenenbaum
MIT Department of Brain and Cognitive Sciences
Computer Science and AI Lab (CSAIL)
2
Collaborators: Tom Griffiths, Charles Kemp, Pat Shafto, Vikash Mansinghka, Amy Perfors, Lauren Schmidt, Chris Baker, Noah Goodman
3
Everyday inductive leaps
How can people learn so much about the world from such limited evidence?
– Learning concepts from examples: “horse”
4
Learning concepts from examples: “tufa”
5
Everyday inductive leaps
How can people learn so much about the world from such limited evidence?
– Kinds of objects and their properties
– The meanings of words, phrases, and sentences
– Cause-effect relations
– The beliefs, goals, and plans of other people
– Social structures, conventions, and rules
6
The solution: prior knowledge (inductive bias).
7
The solution: prior knowledge (inductive bias).
– How does background knowledge guide learning from sparsely observed data?
– What form does background knowledge take, across different domains and tasks?
– How is background knowledge itself acquired?
The challenge: can we answer these questions in precise computational terms?
8
Modeling goals
– Principled quantitative models of human inductive inferences, with broad coverage and a minimum of free parameters and ad hoc assumptions.
– An understanding of how and why human learning and reasoning works, as a species of rational (approximately optimal) statistical inference given the structure of natural environments.
– A two-way bridge to artificial intelligence and machine learning.
9
Bayesian inference
Bayes’ rule: P(h|d) = P(d|h) P(h) / Σ_{h′ ∈ H} P(d|h′) P(h′)
An example:
– Data: John is coughing
– Some hypotheses: (1) John has a cold; (2) John has lung cancer; (3) John has a stomach flu
– Likelihood P(d|h) favors 1 and 2 over 3
– Prior probability P(h) favors 1 and 3 over 2
– Posterior probability P(h|d) favors 1 over 2 and 3
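A minimal sketch of this example in Python; the probability values are illustrative assumptions, not numbers from the talk:

```python
hypotheses = ["cold", "lung cancer", "stomach flu"]
prior = {"cold": 0.5, "lung cancer": 0.01, "stomach flu": 0.49}    # P(h)
likelihood = {"cold": 0.8, "lung cancer": 0.7, "stomach flu": 0.1} # P(d=coughing | h)

# Bayes' rule: P(h|d) = P(d|h) P(h) / sum over h' of P(d|h') P(h')
evidence = sum(likelihood[h] * prior[h] for h in hypotheses)
posterior = {h: likelihood[h] * prior[h] / evidence for h in hypotheses}

for h in hypotheses:
    print(f"P({h} | coughing) = {posterior[h]:.3f}")
# "cold" wins: the likelihood favors cold and lung cancer, the prior favors
# cold and stomach flu, and only cold scores well on both.
```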
10
The Bayesian modeling toolkit
1. How does background knowledge guide learning from sparsely observed data? Bayesian inference.
2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories.
3. How is background knowledge itself acquired? Hierarchical probabilistic models, with inference at multiple levels of abstraction; flexible nonparametric models in which complexity grows with the data.
11
A case study: learning about objects and their properties
“Property induction” or “category-based induction” (Rips, 1975; Osherson, Smith et al., 1990). Three example arguments, illustrating effects of “similarity”, “typicality”, and “diversity”:
– Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. Therefore, horses have T9 hormones.
– Gorillas have T9 hormones. Chimps have T9 hormones. Monkeys have T9 hormones. Baboons have T9 hormones. Therefore, horses have T9 hormones.
– Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. Therefore, flies have T9 hormones.
12
Experiments on property induction (Osherson, Smith, Wilkie, Lopez, Shafir, 1990)
– 20 subjects rated the strength of 45 arguments of the form: X1 have property P (e.g., “Cows have T4 hormones”). X2 have property P. X3 have property P. Therefore, all mammals have property P. [General argument]
– 20 subjects rated the strength of 36 arguments of the form: X1 have property P. X2 have property P. Therefore, horses have property P. [Specific argument]
13
Property induction as a computational problem
[Figure: a species × features matrix with a new-property column of question marks, for Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant.]
85 features for 50 animals (Osherson & Wilkie feature rating task), e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’, …
14
Similarity-based models
[Figure: scatter plots of model predictions against human judgments; each point represents one argument of the form “X1 have property P. X2 have property P. X3 have property P. Therefore, all mammals have property P.”]
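For reference, a sketch of the simplest such model, the max-similarity rule; the similarity numbers are illustrative assumptions, and Osherson et al.’s full similarity-coverage model also includes a coverage term:

```python
# Argument strength under max-similarity: the conclusion's maximum
# similarity to any premise category. `sim` is an assumed similarity table.
sim = {
    ("horse", "gorilla"): 0.3, ("horse", "seal"): 0.2,
    ("horse", "squirrel"): 0.2,
}

def max_sim_strength(premises, conclusion):
    """Strength of 'premises have P, therefore conclusion has P'."""
    return max(sim[(conclusion, x)] for x in premises)

print(max_sim_strength(["gorilla", "seal", "squirrel"], "horse"))  # 0.3
```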
15
Beyond similarity in induction
– Reasoning based on dimensional thresholds (Smith et al., 1993): “Poodles can bite through wire. Therefore, German shepherds can bite through wire.” is judged stronger than “Dobermans can bite through wire. Therefore, German shepherds can bite through wire.”
– Reasoning based on causal relations (Medin et al., 2004; Coley & Shafto, 2003): “Salmon carry E. Spirus bacteria. Therefore, grizzly bears carry E. Spirus bacteria.” is judged stronger than “Grizzly bears carry E. Spirus bacteria. Therefore, salmon carry E. Spirus bacteria.”
16
The Bayesian modeling toolkit
1. How does background knowledge guide learning from sparsely observed data? Bayesian inference.
2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories.
3. How is background knowledge itself acquired? Hierarchical probabilistic models, with inference at multiple levels of abstraction; flexible nonparametric models in which complexity grows with the data.
17
Model overview
[Figure: a hierarchical model. F: form (e.g., “tree with species at leaf nodes”) generates S: structure (a tree over mouse, squirrel, chimp, gorilla), which generates D: data (observed features F1–F4 plus the partially observed property “has T9 hormones”), via P(form), P(structure | form), and P(data | structure).]
18
Feature rating data (Osherson and Wilkie)
People were given 48 animals and 85 features, and asked to rate whether each animal had each feature.
– E.g., elephant: gray, hairless, tough skin, big, bulbous, long legs, tail, chew teeth, tusks, smelly, walks, slow, strong, muscle, quadrapedal, inactive, vegetation, grazer, oldworld, bush, jungle, ground, timid, smart, group.
[Figure: the species × features matrix with a new-property column of question marks, as before.]
19
Model overview
[Figure: the form → structure → data hierarchy, as above.]
20
Hypotheses h and prior P(h)
[Figure: candidate extensions of the new property over Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant, e.g., “Horses have T9 hormones”, “Rhinos have T9 hormones”, “Cows have T9 hormones”, …; X marks the observed species, Y the queried species.]
21
Prediction P(Y | X)
[Figure: the same hypothesis space; observing X (e.g., “Horses have T9 hormones. Rhinos have T9 hormones.”) eliminates inconsistent hypotheses, and the prediction P(Y | X) averages the surviving hypotheses, weighted by the prior P(h); a sketch of this computation follows.]
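A minimal sketch of this computation over a toy hypothesis space; the uniform prior is an illustrative assumption, and the result shows why it is a bad one: with a flat prior over all logically possible extensions, the prediction is uninformative.

```python
import itertools

species = ["horse", "cow", "chimp", "gorilla", "mouse",
           "squirrel", "dolphin", "seal", "rhino", "elephant"]

# Toy hypothesis space: every subset of species is a candidate extension
# of the new property, with a uniform prior. (In the talk the prior
# instead comes from structured domain knowledge.)
hypotheses = [frozenset(s) for n in range(len(species) + 1)
              for s in itertools.combinations(species, n)]
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}

def predict(observed, query):
    """P(query has P | observed have P): average over hypotheses
    consistent with the observations."""
    consistent = [h for h in hypotheses if observed <= h]
    z = sum(prior[h] for h in consistent)
    return sum(prior[h] for h in consistent if query in h) / z

X = frozenset({"horse", "rhino"})
print(predict(X, "cow"))   # 0.5 under the uniform prior: uninformative,
                           # which is why a structured prior is needed
```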
22
Where does the prior come from?
Why not just enumerate all logically possible hypotheses along with their relative prior probabilities?
[Figure: the full hypothesis lattice over Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant.]
23
The need for inductive bias
Without a highly constrained prior, learning from sparse data is impossible.
An analogy: learning a smooth probability density by local interpolation (kernel density estimation).
[Figure: kernel density estimates for N = 5, N = 100, and N = 500 samples; only the larger samples recover the true density.]
24
The need for inductive bias
Without a highly constrained prior, learning from sparse data is impossible.
An analogy: learning a smooth probability density by local interpolation (kernel density estimation).
Assuming an appropriately structured form for the density (e.g., Gaussian) leads to better generalization from sparse data.
[Figure: with N = 5 samples, a fitted Gaussian recovers the true density far better than kernel density estimation.]
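To make the analogy concrete, a minimal sketch, assuming standard-normal data and SciPy’s gaussian_kde for the interpolation-based estimate: with small N the constrained Gaussian fit generalizes far better, and the two converge as N grows.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)
true_dist = norm(loc=0.0, scale=1.0)
grid = np.linspace(-4, 4, 200)

for n in (5, 100, 500):
    x = true_dist.rvs(size=n, random_state=rng)
    kde = gaussian_kde(x)                  # local interpolation, weak bias
    mu, sigma = x.mean(), x.std(ddof=1)    # strongly constrained Gaussian fit
    kde_err = np.mean(np.abs(kde(grid) - true_dist.pdf(grid)))
    par_err = np.mean(np.abs(norm(mu, sigma).pdf(grid) - true_dist.pdf(grid)))
    print(f"N={n:4d}  KDE error={kde_err:.4f}  Gaussian-fit error={par_err:.4f}")
```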
25
Knowledge-based priors
Different kinds of prior knowledge are relevant for different properties:
– “Chimps have T9 hormones. Therefore, gorillas have T9 hormones.” (taxonomic similarity)
– “Poodles can bite through wire. Therefore, Dobermans can bite through wire.” (jaw strength)
– “Salmon carry E. Spirus bacteria. Therefore, grizzly bears carry E. Spirus bacteria.” (food web relations)
26
The value of structural form knowledge: a more abstract level of inductive bias
27
Model overview
[Figure: the form → structure → data hierarchy, as above.]
28
P(D|S): how the structure constrains the data of experience
Define a stochastic process over structure S that generates candidate property extensions h.
– Intuition: properties should vary smoothly over structure.
[Figure: a smooth property extension over the tree has high P(h); a non-smooth one has low P(h).]
29
P(D|S): how the structure constrains the data of experience
A Gaussian prior over property extensions, h ~ N(0, K), with K⁻¹ = Δ, the graph Laplacian of S with edge weights w_ij = 1/d_ij, where d_ij is the length of the edge between i and j (d_ij = ∞, i.e. w_ij = 0, if i and j are not connected) (Zhu, Lafferty & Ghahramani, 2003). Equivalently, p(h) ∝ exp(−½ Σ_{i<j} w_ij (h_i − h_j)²), so extensions that vary smoothly over the graph receive high prior probability.
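A minimal numerical sketch of this prior; the chain graph, edge lengths, and the diagonal regularization constant are illustrative assumptions (the raw Laplacian is singular, so some regularization, as in Zhu et al., 2003, is needed to get a proper covariance):

```python
import numpy as np

# Nodes 0-3 form a chain 0-1-2-3 (e.g., four species on one branch).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]   # (i, j, d_ij)
n = 4

W = np.zeros((n, n))
for i, j, d in edges:
    W[i, j] = W[j, i] = 1.0 / d                   # w_ij = 1 / d_ij
L = np.diag(W.sum(axis=1)) - W                    # graph Laplacian

# Regularize the Laplacian to obtain a proper covariance K.
K = np.linalg.inv(L + np.eye(n) / 4.0)

# Conditional Gaussian prediction: observe h at node 0, predict nodes 1-3.
obs, hidden = [0], [1, 2, 3]
h_obs = np.array([1.0])                           # "species 0 has the property"
mean_hidden = K[np.ix_(hidden, obs)] @ np.linalg.solve(K[np.ix_(obs, obs)], h_obs)
print(mean_hidden)   # predictions decay smoothly with distance from node 0
```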
30
[Figure: Structure S, a tree over Species 1–10, generating Data D, a species × features matrix.]
85 features for 50 animals (Osherson et al.), e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’, …
31
Modeling feature covariance based on distance in a graph (Zhu et al., 2003; cf. Sattath & Tversky, 1977)
32
Modeling feature covariance based on distance in two-dimensional space (Lawrence, 2004; Smola & Kondor, 2003; cf. Shepard, 1987)
33
[Figure: Structure S over Species 1–10 generating Data D, now including a new-property column of question marks alongside the 85 observed features for 50 animals (Osherson et al.).]
34
[Figure: model predictions versus human judgments for arguments such as “Gorillas/Mice/Seals have property P; therefore all mammals (or cows, elephants, horses) have property P”, under a tree-structured prior and a 2D spatial prior; the tree prior fits better for these biological properties.]
35
Testing different priors
[Figure: model fits under priors with the correct inductive bias, a wrong bias, a too-weak bias, and a too-strong bias; the correct bias fits human judgments best.]
36
A connectionist alternative (Rogers and McClelland, 2004)
[Figure: a feedforward network mapping species to features; structure is emergent, via clustering on hidden-unit activation vectors.]
37
Spatially varying properties
Geographic inference task: “Given that a certain kind of Native American artifact has been found in sites near city X, how likely is the same artifact to be found near city Y?”
[Figure: model fits under tree and 2D priors; here the 2D spatial prior fits better.]
38
Theory-based hypotheses
Property type                   | Structure        | Process
“has T9 hormones”               | taxonomic tree   | diffusion process
“can bite through wire”         | directed chain   | drift process
“carry E. Spirus bacteria”      | directed network | noisy transmission
[Figure: example hypotheses generated over classes A–G by each structure + process pair.]
39
“Given that A has property P, how likely is it that B does?”
[Figure: a food web and a taxonomic tree over kelp, herring, tuna, sand shark, mako shark, dolphin, human; model fits for a biological property (e.g., P = “has X cells”, tree prior) and a disease property (e.g., P = “has X disease”, web prior).]
40
Summary so far
A framework for modeling human inductive reasoning as rational statistical inference over structured knowledge representations:
– Qualitatively different priors are appropriate for different domains of property induction.
– In each domain, a prior that matches the world’s structure fits people’s judgments well, and better than alternative priors.
– A language for representing different theories: graph structure defined over objects + probabilistic model for the distribution of properties over that graph.
Remaining question: how can we learn appropriate structures for different domains?
41
Model overview
[Figure: the hierarchy with the form level made explicit: F may be a tree, a 2D space, or a chain, each generating a different structure S over mouse, squirrel, chimp, gorilla, which in turn generates the feature data D.]
42
Discovering structural forms
[Figure: the same seven animals (ostrich, robin, crocodile, snake, bat, orangutan, turtle) organized three ways: as a partition of clusters, as a tree, and as a linear order.]
43
Discovering structural forms
[Figure: the tree matches Linnaeus’s taxonomy; the linear order, extended with plant, rock, angel, and God, matches the “great chain of being”.]
44
People can discover structural forms
Scientific discoveries:
– Tree structure for biological species (Systema Naturae, 1735: Kingdom Animalia, Phylum Chordata, Class Mammalia, Order Primates, Family Hominidae, Genus Homo, Species Homo sapiens; Darwin’s branching diagram, 1837)
– The “great chain of being” (1579)
– Periodic structure for chemical elements
Children’s cognitive development:
– Hierarchical structure of category labels
– Clique structure of social groups
– Cyclical structure of seasons or days of the week
– Transitive structure for value
45
Typical structure learning algorithms assume a fixed structural form
Flat clusters:   K-means, mixture models, competitive learning
Line:            Guttman scaling, ideal point models
Circle:          circumplex models
Tree:            hierarchical clustering, Bayesian phylogenetics
Euclidean space: MDS, PCA, factor analysis
Grid:            self-organizing map, generative topographic mapping
46
The ultimate goal: a “universal structure learner”
Data → Representation, subsuming K-means, hierarchical clustering, factor analysis, Guttman scaling, circumplex models, self-organizing maps, …
47
Hypothesis space of structural forms
Order, chain, ring, partition, hierarchy, tree, grid, cylinder
48
A “universal grammar” for structural forms
[Figure: each form paired with the generative process (graph grammar) that grows it.]
49
Node-replacement graph grammars
[Figure: the production for the line (chain) form, and the first step of a derivation from a single seed node.]
50
Node-replacement graph grammars
[Figure: the derivation continues; each application of the production splits one node into two connected nodes.]
51
Node-replacement graph grammars
[Figure: a further derivation step, growing a longer chain.]
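As a rough illustration of the line production, a sketch using a simplified list representation of a chain (an assumption made for brevity; the talk’s grammars operate on general graphs):

```python
# Line production: replace one node with two connected nodes; in a chain
# stored as a list, the old node keeps its left neighbor and the new node
# takes over the right neighbor.
def grow_chain(chain, index):
    """Split node at `index` of a chain (a list of node labels) in two."""
    new = max(chain) + 1
    return chain[:index + 1] + [new] + chain[index + 1:]

# Derivation: start from a single seed node and apply the production.
chain = [0]
chain = grow_chain(chain, 0)   # [0, 1]
chain = grow_chain(chain, 0)   # [0, 2, 1]
print(chain)
```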
52
[Figure: scoring candidate structures for the mouse/squirrel/chimp/gorilla feature data under tree, clusters, and linear forms. P(S | F) favors simplicity; P(D | S) favors smoothness (Zhu et al., 2003).]
53
[Figure: the same comparison under tree, grid, and linear forms; again P(S | F) favors simplicity and P(D | S) favors smoothness (Zhu et al., 2003).]
54
Learning algorithm
– Evaluate each form in parallel.
– For each form, heuristic search over structures based on greedy growth from a one-node seed (see the sketch below).
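A schematic sketch of that search; `seed`, `score`, and `node_splits` are assumed helper callables, not functions from the talk: `score(S, F, D)` stands for log P(D|S) + log P(S|F) + log P(F), and `node_splits(S, F)` enumerates the one-node splits licensed by form F’s graph grammar.

```python
def fit_form(form, data, seed, score, node_splits):
    """Greedy structure growth for one form, from a one-node seed."""
    structure = seed(form)
    best = score(structure, form, data)
    improved = True
    while improved:
        improved = False
        for candidate in node_splits(structure, form):
            s = score(candidate, form, data)
            if s > best:                       # keep any improving split
                structure, best, improved = candidate, s, True
                break                          # regrow from the new structure
    return structure, best

# Each form is evaluated independently, and the best-scoring form wins:
# best_form = max(FORMS, key=lambda F: fit_form(F, D, seed, score, node_splits)[1])
```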
55
[Figure: structures discovered from real data, e.g., an animals × features matrix and a judges × cases matrix.]
56
[Figure: a structure discovered from similarity data over objects.]
57
Structural forms from relational data
– Primate troop, “x beats y” → dominance hierarchy (order)
– Bush administration, “x told y” → tree
– Prison inmates, “x likes y” → cliques
– Kula islands, “x trades with y” → ring
58
Using structural forms: Inductive bias for learning about new objects
59
Lab studies of learning structural forms
– Training: observe messages passed between employees (a, b, c, …) in a company.
– Transfer test: predict messages sent to and from new employees x and y.
[Figure legend: links observed in training versus links observed in the transfer test.]
60
Development of structural forms as more data are observed: the “blessing of abstraction”.
61
Beyond “Nativism” versus “Empiricism”
“Nativism”: explicit knowledge of structural forms for core domains is innate.
– Atran (1998): the tendency to group living kinds into hierarchies reflects an “innately determined cognitive structure”.
– Chomsky (1980): “The belief that various systems of mind are organized along quite different principles leads to the natural conclusion that these systems are intrinsically determined, not simply the result of common mechanisms of learning or growth.”
“Empiricism”: general-purpose learning systems without explicit knowledge of structural form.
– Connectionist networks (e.g., Rogers and McClelland, 2004).
– Traditional structure learning in probabilistic graphical models.
62
Conclusion
Bayesian inference over hierarchies of structured representations provides a framework for understanding core questions of human cognition:
– What is the content and form of human knowledge, at multiple levels of abstraction?
– How does abstract domain knowledge guide learning of new concepts?
– How is abstract domain knowledge learned? What must be built in?
– How can domain-general learning mechanisms acquire domain-specific representations? How can probabilistic inference work together with symbolic, flexibly structured representations?
[Figure: the form → structure → data hierarchy, as above.]
63
Learning word meanings
[Figure: a principles → structure → data hierarchy; the principles level includes the whole-object principle, the shape bias, the taxonomic principle, the contrast principle, and the basic-level bias.]
64
Causal learning and reasoning (Griffiths, Tenenbaum, et al.)
[Figure: the same principles → structure → data hierarchy, applied to causal knowledge.]
65
Language (cf. Chater and Manning, 2006)
A hierarchy of generative models:
“Universal Grammar” → P(grammar | UG) → grammar (hierarchical phrase structure grammars, e.g., CFG, HPSG, TAG) → P(phrase structure | grammar) → phrase structure → P(utterance | phrase structure) → utterance → P(speech | utterance) → speech signal
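To make the middle of this hierarchy concrete, a toy probabilistic context-free grammar (entirely illustrative, not a grammar from the talk); sampling from it corresponds to drawing a phrase structure and utterance given a grammar:

```python
import random

# Each rule maps a nonterminal to weighted right-hand sides.
rules = {
    "S":  [(["NP", "VP"], 1.0)],
    "NP": [(["the", "N"], 1.0)],
    "VP": [(["V", "NP"], 0.5), (["V"], 0.5)],
    "N":  [(["dog"], 0.5), (["cat"], 0.5)],
    "V":  [(["sees"], 0.5), (["sleeps"], 0.5)],
}

def sample(symbol="S"):
    if symbol not in rules:                  # terminal symbol
        return [symbol]
    expansions, weights = zip(*rules[symbol])
    rhs = random.choices(expansions, weights=weights)[0]
    return [word for s in rhs for word in sample(s)]

print(" ".join(sample()))   # e.g., "the dog sees the cat"
```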
66
(Han & Zhu, 2006; c.f., Zhu, Yuanhao & Yuille NIPS 06 ) Vision as probabilistic parsing
67
Goal-directed action (production and comprehension) (Wolpert, Doya and Kawato, 2003)
68
The big picture
What we need to understand: the mind’s ability to build rich models of the world from sparse data.
– Learning about objects, categories, and their properties
– Language comprehension and production
– Scene understanding
– Causal inference
– Understanding other people’s actions, plans, thoughts, goals
What do we need to understand these abilities?
– Bayesian inference in probabilistic generative models
– Hierarchical models, with inference at all levels of abstraction
– Structured representations: graphs, grammars, logic
– Flexible representations, growing in response to observed data