1
Todo
–How to handle the intro quote (“Yet I’d like to think… I’m not the only one”).
–Make sure to practice talking through the shape bias part: how to explain the variables in the HBM; how I will characterize this model (almost a toy model, but it illustrates the idea and serves as a warmup); can a Dirichlet-multinomial represent the difference between 1 2 1 2 1 2 and 1 2 3 4 5 6?
–Transition from shape bias to property induction?
–One slide on “beyond similarity-based induction”: work out a time check for skipping; think through what I’ll say, including learning based on instruction.
–Practice talking about the relational learning experiment, and how the relational structural form model is defined.
–Practice talking about bottom-up causal learning: the intuitions in the blicket detector experiment and the disease-symptom case.
–Figure out how to explain Zhu, Lafferty & Ghahramani quickly.
–Nail the conclusion!
2
Bayesian models of human inductive learning Josh Tenenbaum MIT Department of Brain and Cognitive Sciences Computer Science and AI Lab (CSAIL)
3
Charles Kemp Pat Shafto Vikash Mansinghka Amy Perfors Lauren Schmidt Chris Baker Noah Goodman Lab members Tom Griffiths (alum) Funding: AFOSR Cognition and Decision Program, AFOSR MURI, DARPA IPTO, NSF, ONR MURI, NTT Communication Sciences Laboratories, James S. McDonnell Foundation
4
The probabilistic revolution in AI Principled and effective solutions for inductive inference from ambiguous data: –Vision –Robotics –Machine learning –Expert systems / reasoning –Natural language processing Standard view: no necessary connection to how the human brain solves these problems. “Many people in machine learning get into the field because they are interested in how humans learn, rather than how convex functions are optimized, and how we can get machines to be more like humans.”
5
Everyday inductive leaps How can people learn so much about the world from such limited evidence? –Learning concepts from examples “horse”
6
Learning concepts from examples “tufa”
7
Everyday inductive leaps How can people learn so much about the world from such limited evidence? –Kinds of objects and their properties –The meanings of words, phrases, and sentences –Cause-effect relations –The beliefs, goals and plans of other people –Social structures, conventions, and rules
8
The solution Strong prior knowledge (inductive bias).
9
The solution Strong prior knowledge (inductive bias). –How does background knowledge guide learning from sparsely observed data? –What form does the knowledge take, across different domains and tasks? –How is that knowledge itself learned? –How can inductive biases be so strong yet so flexible? Our goal is a computational framework for answering these questions.
10
Notes on slide before. Highlight the issue of inductive bias, balancing flexibility and constraint. Put this text after the third question on the previous slide, setting up the fourth. In principle, inductive biases don’t have to be learned: in ML, they are often thought of as hard-wired, engineered complements to the data-driven component; in cog sci, as innate knowledge. But some have to be learned. Some of the important ones aren’t present in the youngest children, but appear later, and are clearly influenced by experience. We are also ready to give them up and adopt new biases seemingly very quickly, e.g., prism adaptation, physics adaptation. The third and fourth questions – the problem of learning good inductive biases, and exploiting strong biases while maintaining flexibility – are key ones for ML, and they may be key to distinctively human learning: the cognitive niche. As best as we can tell, other animals can be very smart and often have very clever inductive biases, but these are more or less hard-wired through evolution; they think about the same things they have always thought about. Exceptions to this trend are the most human-like ways that animals act. Continue to consider the order of the three questions; the issue is flow, both in the intro and in the talk transitions (from shape bias to the rest). Also, the tradeoff in representational richness vs. learnability (BUT PROBABLY KEEP THIS ONLY IMPLICIT, NOT EXPLICIT).
11
1. How does background knowledge guide learning from sparsely observed data? Bayesian inference, with priors based on background knowledge. 2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, relational schemas, theories. 3. How is background knowledge itself learned? Hierarchical Bayesian models, with inference at multiple levels of abstraction. 4. How can inductive biases be so strong yet so flexible? Nonparametric models, growing in complexity as the data require. The approach : from statistics to intelligence
12
Notes on slide before. All these themes are familiar in contemporary ML, but we are mixing them up in some slightly new ways, driven by the problems of human learning. Even if your primary interest isn’t in human learning, I hope there might be some lessons here about how to design more human-like ML systems.
13
Outline Three case studies in inductive learning. 1.Word learning 2.Property induction 3.Causal learning
14
Outline Three case studies in inductive learning. 1.Word learning 2.Property induction 3.Causal learning
15
The “shape bias” in word learning (Landau, Smith, Jones 1988). This is a dax. Show me the dax… English-speaking children typically show the shape bias at 24 months, but not at 20 months. The shape bias is a useful inductive constraint: the majority of early words are labels for object categories, and shape may be the best cue to object category membership.
16
Is the shape bias learned? Smith et al. (2002) trained 17-month-olds on labels for 4 artificial categories (“wib”, “lug”, “zup”, “div”). After 8 weeks of training (20 min/week), 19-month-olds show the shape bias: This is a dax. Show me the dax… (“Transfer learning”, “learned attentional bias”)
17
Transfer to real-world vocabulary The puzzle: The shape bias is a powerful inductive constraint, yet can be learned from very little data.
18
Learning about feature variability (“wib”, “lug”, “zup”, “div”). The intuition: –Shape varies across categories but is relatively constant within categories. –Other features (size, color, texture) vary both across and within nameable object categories.
19
Learning about feature variability. Marbles of different colors: …
21
A hierarchical model. Level 1: Bag proportions (mostly red, mostly brown, mostly blue, … mostly yellow, mostly green?). Level 2: Bags in general (“Color varies across bags but not much within bags”). Data: observed draws from each bag (…).
22
A hierarchical Bayesian model. Level 1: Bag proportions. Level 2: Bags in general. Level 3: Prior expectations on bags in general. Data: … Simultaneously infer the quantities at all three levels.
23
A hierarchical Bayesian model. Level 1: Bag proportions (“Bag 1 is mostly red”). Level 2: Bags in general. Level 3: Prior expectations on bags in general. Data: …
24
A hierarchical Bayesian model. Level 1: Bag proportions (“Bag 2 is mostly yellow”). Level 2: Bags in general. Level 3: Prior expectations on bags in general. Data: …
25
A hierarchical Bayesian model. Level 1: Bag proportions. Level 2: Bags in general (“Color varies across bags but not much within bags”). Level 3: Prior expectations on bags in general. Data: …
26
Learning the shape bias. Training: “wib”, “lug”, “zup”, “div”. Assuming independent Dirichlet-multinomial models for each dimension, we learn that… –Shape varies across categories but not within categories. –Texture, color, size vary across and within categories.
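The following is a minimal sketch (not the exact model from the talk) of how a Dirichlet-multinomial analysis can recover that one dimension is nearly constant within categories while another varies freely; the toy data, value coding, and grid over the concentration parameter are illustrative assumptions.

```python
# Minimal sketch: per-dimension Dirichlet-multinomial, with a grid search over
# the concentration parameter alpha. Small alpha = values nearly constant
# within a category; large alpha = values vary freely within a category.
import numpy as np
from scipy.special import gammaln

def dir_mult_loglik(counts, alpha, K):
    """Marginal log-likelihood of one category's value counts under a
    symmetric Dirichlet(alpha/K) prior on its value distribution."""
    n = counts.sum()
    return (gammaln(alpha) - gammaln(alpha + n)
            + np.sum(gammaln(alpha / K + counts) - gammaln(alpha / K)))

# Toy training data: 4 categories x 2 exemplars, coded over K = 8 values.
K = 8
shape = [[0, 0], [1, 1], [2, 2], [3, 3]]   # constant within each category
color = [[0, 5], [2, 7], [1, 4], [3, 6]]   # varies within each category

def best_alpha(dimension):
    alphas = np.logspace(-2, 2, 50)
    scores = [sum(dir_mult_loglik(np.bincount(v, minlength=K), a, K)
                  for v in dimension) for a in alphas]
    return alphas[int(np.argmax(scores))]

print("shape: alpha ~", best_alpha(shape))   # small: shape is a reliable cue
print("color: alpha ~", best_alpha(color))   # large: color varies within categories
```

With more dimensions and categories, the same comparison yields the shape bias: new labels are extended by shape, because shape is the dimension inferred to have low within-category variability.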
27
Learning the shape bias: training, then test. This is a dax. Show me the dax…
28
Limitations This model makes several simplifying assumptions. –A single multinomial variable codes for the “dimension” of shape, with each state corresponding to a nameable shape category. –Focuses on learning names for object categories, not other kinds of words. Are these assumptions oversimplifications? –They may be plausible for a 1-year-old, and necessary to explain how quickly the shape bias can be learned. –The model can also be extended, replacing these assumptions with more powerful learning capacities.
29
Notes on limitations. This is a very simple model; it leaves out or oversimplifies many things. –Assumes that there is a single multinomial variable coding for shape, with each state corresponding to a nameable shape category. –Assumes that we are only learning names for object categories, not other kinds of words. –But it is not so clear whether these assumptions really are so oversimplified in the case of human children. The shape bias is an abstract inductive constraint, useful for learning more specific knowledge at the level of individual category labels, but it itself can be learned remarkably quickly. Perhaps that is because it depends on higher-level inductive biases, already in place, in the form of these assumptions. Even by age 12 months, there is evidence that infants are particularly interested in objects, categorize objects by basic-level shape categories, and many of their first words are names for objects. –But it is also possible to extend the model so it does not depend on these assumptions… next slide.
30
Extensions. Learning with weaker shape representations. Learning to transfer selectively, dependent on knowledge of ontological kinds. –By age ~3, children know that a shape bias is appropriate for solid object categories (ball, book, toothbrush, …), while a material bias is appropriate for nonsolid substance categories (juice, sand, toothpaste, …). (Figure: training and test matrices over shape features: holes, curvature, edges, aspect ratio; and other features: main color, color distribution, oriented texture, roughness.)
31
Modeling selective transfer. Let k_i be the ontological kind of category i. Given k_i, we could learn a separate Dirichlet-multinomial model for each ontological kind. (Figure: variability in solidity, shape, and material within kind 1, solid categories named by shape like “dax”, and within kind 2, non-solid categories named by material like “toof”; categories “dax”, “zav”, “fep”, “wif”, “wug”, “toof”.)
32
Notes on slide before Say this, and if possible figure out how to put it on the slide: Feature variability for kind 1: Solidity: fixed across categories (all solid) Shape: variable across categories but fixed within categories Color, texture: variable within categories. Feature variability for kind 2: Solidity: fixed across categories (all nonsolid) Shape: variable within categories Color, texture: variable across categories but fixed within categories.
33
Learning to transfer selectively. Chicken-and-egg problem: we don’t know the partition into ontological kinds. The input: categories such as “zav”, “dax”, “wif”, “wug”, some solid and some non-solid. Solution: define a nonparametric prior over this partition (c.f. Roy & Kaelbling, IJCAI 07).
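As a concrete illustration of “a nonparametric prior over this partition”, here is a minimal Chinese restaurant process (CRP) sketch; the category names and the concentration value are illustrative, and this is not necessarily the exact prior used in the talk.

```python
# Minimal sketch: sample a partition of categories into ontological kinds
# from a Chinese restaurant process (CRP). The number of kinds is not fixed
# in advance; it grows as the data require.
import numpy as np

def sample_crp_partition(n_items, concentration=1.0, seed=0):
    rng = np.random.default_rng(seed)
    assignments = []
    for _ in range(n_items):
        counts = np.bincount(assignments) if assignments else np.array([])
        probs = np.append(counts, concentration).astype(float)  # old kinds + a new kind
        probs /= probs.sum()
        assignments.append(int(rng.choice(len(probs), p=probs)))
    return assignments

categories = ["dax", "zav", "fep", "wif", "wug", "toof"]
for name, kind in zip(categories, sample_crp_partition(len(categories))):
    print(name, "-> kind", kind)
```

In the full model the kind assignments are not sampled blindly like this but inferred jointly with the per-kind Dirichlet-multinomial models, so categories that share variability profiles end up in the same kind.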
34
Outline Three case studies in inductive learning. 1.Word learning 2.Property induction 3.Causal learning
35
Property induction. How likely is the conclusion, given the premises? (“Similarity”, “Typicality”, “Diversity”.) Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. → Horses have T9 hormones. Gorillas have T9 hormones. Chimps have T9 hormones. Monkeys have T9 hormones. Baboons have T9 hormones. → Horses have T9 hormones. Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. → Flies have T9 hormones.
36
The computational problem (c.f. “transfer learning”, “semi-supervised learning”): given a species × features matrix, predict which species have a new, sparsely observed property. 85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,… (Figure: feature matrix for Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant, with a new property column to fill in.)
37
Hierarchical Bayesian Framework. F: form, with prior P(form): e.g., a tree with species at leaf nodes. S: structure, with P(structure | form): a particular tree over mouse, squirrel, chimp, gorilla. D: data, with P(data | structure): features F1, F2, F3, F4, … and a new property (“Has T9 hormones”: ??????).
38
The value of structural form knowledge: a more abstract level of inductive bias
39
Hierarchical Bayesian Framework applied to property induction. F: form: a tree with species at leaf nodes. S: structure: a particular tree over mouse, squirrel, chimp, gorilla. D: data: features F1–F4, plus “Has T9 hormones”: ??????.
40
P(D|S): How the structure constrains the data of experience. Define a stochastic process over structure S that generates candidate property extensions h. –Intuition: properties should vary smoothly over structure. (Figure: a smooth extension has P(h) high; a non-smooth extension has P(h) low.)
41
P(D|S): How the structure constrains the data of experience [Zhu, Lafferty & Ghahramani 2003]. A Gaussian process over S (~ random walk, diffusion) generates a continuous function y, which is thresholded to give the property extension h.
43
A graph-based prior (Zhu, Lafferty & Ghahramani, 2003). Let d_ij be the length of the edge between i and j (d_ij = ∞ if i and j are not connected). A Gaussian prior y ~ N(0, K), with K^(-1) = Δ + (1/σ²) I, where Δ is the graph Laplacian defined from edge weights 1/d_ij.
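A minimal numerical sketch of this construction, with an illustrative four-node chain and an assumed value of σ; it shows that property assignments that respect the graph get higher prior probability than ones that do not.

```python
# Minimal sketch: build K = (Laplacian + I/sigma^2)^(-1) from weighted edges
# and compare the (unnormalized) Gaussian log prior of a smooth vs. a jagged
# property assignment over the graph.
import numpy as np

def graph_covariance(edges, n, sigma=2.0):
    W = np.zeros((n, n))
    for i, j, d in edges:
        W[i, j] = W[j, i] = 1.0 / d          # edge weight = 1 / edge length
    laplacian = np.diag(W.sum(axis=1)) - W
    return np.linalg.inv(laplacian + np.eye(n) / sigma**2)

def log_prior(y, K):
    return -0.5 * y @ np.linalg.solve(K, y)   # log N(y; 0, K) up to a constant

# Chain: mouse - squirrel - chimp - gorilla, unit edge lengths.
K = graph_covariance([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)], n=4)
smooth = np.array([1.0, 1.0, -1.0, -1.0])    # respects the chain
jagged = np.array([1.0, -1.0, 1.0, -1.0])    # ignores the chain
print(log_prior(smooth, K), ">", log_prior(jagged, K))
```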
44
Structure S and Data D. (Figure: feature matrix over Species 1–10.) 85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…
46
[c.f., Lawrence, 2004; Smola & Kondor 2003]
47
Structure S and Data D, now with a new property column to predict alongside the features. (Figure: feature matrix over Species 1–10 plus the new property.) 85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, …
48
(Figure: model predictions vs. human judgments, Tree vs. 2D, for arguments such as: Gorillas have property P. Mice have property P. Seals have property P. All mammals have property P. / Cows have property P. Elephants have property P. Horses have property P.)
49
Testing different priors. (Figure panels, by inductive bias: correct bias, wrong bias, too weak bias, too strong bias.)
50
A connectionist alternative (Rogers and McClelland, 2004): a network mapping species to features, with emergent structure from clustering on hidden unit activation vectors.
51
Learning about spatial properties. Geographic inference task: “Given that a certain kind of Native American artifact has been found in sites near city X, how likely is the same artifact to be found near city Y?” (Model comparison: Tree vs. 2D.)
52
Beyond similarity-based induction. “Given that A has property P, how likely is it that B does?” Biological property (e.g., P = “has X cells”): a tree over Kelp, Human, Dolphin, Sand shark, Mako shark, Tuna, Herring. Disease property (e.g., P = “has X disease”): a web (food web) over the same species.
53
Hierarchical Bayesian Framework. F: form: Tree, Space, or Chain. S: structure: the particular tree, spatial configuration, or chain over mouse, squirrel, chimp, gorilla. D: data: features F1–F4.
54
Discovering structural forms. (Figure: alternative structures over Ostrich, Robin, Crocodile, Snake, Bat, Orangutan, Turtle.)
55
Discovering structural forms. (Figure: the same species organized two ways: a Linnaean tree over Ostrich, Robin, Crocodile, Snake, Bat, Orangutan, Turtle, versus the “Great chain of being”, a linear order that also includes Plant, Rock, Angel, and God.)
56
People can discover structural forms. Scientific discoveries: tree structure for biological species (1837); Systema Naturae (Linnaeus, 1735: Kingdom Animalia, Phylum Chordata, Class Mammalia, Order Primates, Family Hominidae, Genus Homo, Species Homo sapiens); periodic structure for chemical elements; the “great chain of being” (1579). Children’s cognitive development: hierarchical structure of category labels; clique structure of social groups; cyclical structure of seasons or days of the week; transitive structure for value.
57
Typical structure learning algorithms assume a fixed structural form:
Flat clusters: K-means, mixture models, competitive learning
Line: Guttman scaling, ideal point models
Tree: hierarchical clustering, Bayesian phylogenetics
Circle: circumplex models
Euclidean space: MDS, PCA, factor analysis
Grid: self-organizing map, generative topographic mapping
58
The ultimate goal: a “Universal Structure Learner” that takes data in and returns the right representation, subsuming K-means, hierarchical clustering, factor analysis, Guttman scaling, circumplex models, self-organizing maps, ···
59
A “universal grammar” for structural forms Form Process
60
Hierarchical Bayesian Framework. F: form. S: structure. D: data (features F1–F4 for mouse, squirrel, chimp, gorilla). P(S | F) favors simplicity; P(D | S) favors smoothness [Zhu et al., 2003].
62
Structural forms from relational data:
Primate troop (“x beats y”): dominance hierarchy
Bush administration (“x told y”): tree
Prison inmates (“x likes y”): cliques
Kula islands (“x trades with y”): ring
63
Using structural forms: Inductive bias for learning about new objects
64
Lab studies of learning structural forms. Training: observe messages passed between employees (a, b, c, …) in a company. Transfer test: predict messages sent to and from new employees x and y. (Figure legend: link observed in training; link observed in transfer test.)
65
Development of structural forms as more data are observed “blessing of abstraction”
66
Beyond “Nativism” versus “Empiricism” “Nativism”: Explicit knowledge of structural forms for core domains is innate. –Atran (1998): The tendency to group living kinds into hierarchies reflects an “innately determined cognitive structure”. –Chomsky (1980): “The belief that various systems of mind are organized along quite different principles leads to the natural conclusion that these systems are intrinsically determined, not simply the result of common mechanisms of learning or growth.” “Empiricism”: General-purpose learning systems without explicit knowledge of structural form. –Connectionist networks (e.g., Rogers and McClelland, 2004). –Traditional structure learning in probabilistic graphical models.
67
Outline Three case studies in inductive learning. 1.Word learning 2.Property induction 3.Causal learning
68
Learning causal relations: inferring a Bayesian network from data.
69
Learning causal relations with an abstract theory above the Bayesian network (c.f. first-order probabilistic models, BLOG). One theory: two types of variables, Contact(Object, Machine) and Active(Machine); Contact can cause Activation; machines are (near) deterministic. Another theory: three types of variables, Behavior(X), Disease(X), Symptom(X); Behaviors can cause Diseases; Diseases can cause Symptoms.
70
Learning with a uniform prior on network structures. True network over 12 attributes; sample 75 observations (patients × attributes) as the observed data…
71
Learning with a block-structured prior on network structures (Mansinghka et al., UAI 06). True network over 12 attributes; sample 75 observations (patients × attributes) as the observed data… (Figure: latent class assignments z over attributes 1–12, with class-level edge probabilities such as 0.8, 0.75, 0.01, 0.0.)
72
(Mansinghka, Kemp, Tenenbaum & Griffiths, UAI 06.) True structure of Bayesian network N over variables 1–16. Comparison, at 20, 80, and 1000 samples, of learning Data D → Network N directly versus learning Data D → Network N together with an abstract theory (classes Z: c1, c2, and class-level edge probabilities such as 0.4, 0.0): the “blessing of abstraction”.
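To make the abstract-theory level concrete, here is a minimal sketch of a block-structured prior over network structures: variables carry latent class labels, and the probability of an edge depends only on the classes of its endpoints. The class assignments and edge probabilities below are illustrative, not the values from the UAI 06 paper.

```python
# Minimal sketch: sample a directed network from a block-structured prior.
# eta[a, b] is the probability of an edge from a class-a variable to a
# class-b variable; the sampled network inherits the class-level pattern.
import numpy as np

rng = np.random.default_rng(0)
n_vars = 12
classes = np.array([0] * 6 + [1] * 6)        # latent class of each variable
eta = np.array([[0.0, 0.8],                  # class 0 -> class 1 edges are likely
                [0.0, 0.0]])                 # all other edges are unlikely

edge_probs = eta[classes][:, classes]        # per-pair edge probabilities
network = rng.random((n_vars, n_vars)) < edge_probs
np.fill_diagonal(network, False)
print(network.astype(int))
```

Inference runs the other way: given sparse data, the learner infers the classes and class-level probabilities along with the network, and the abstract level is often pinned down by less data than any individual edge.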
73
The flexibility of a nonparametric prior. True structure of Bayesian network N over variables 1–12, with no block structure (a single class c1, edge probability ~0.1). Comparison, at 40, 100, and 1000 samples, of learning Data D → Network N directly versus together with the abstract theory (classes Z).
74
Summary: modeling human inductive learning as Bayesian inference over hierarchies of flexibly structured representations. Word learning: abstract knowledge that shape varies across categories but not within categories, while texture, color, size vary within categories (“dax”, “zav”, “fep”). Property induction: abstract knowledge of structure (e.g., a tree over mouse, squirrel, chimp, gorilla) constraining feature data F1–F4. Causal learning: abstract knowledge of classes of variables (B, D, S) and causal laws (B → D, D → S).
75
Conclusions Computational tools for studying core questions of human learning (and building more human-like ML?) –What is the content and form of human knowledge, at multiple levels of abstraction? –How does abstract domain knowledge guide new learning? –How can abstract domain knowledge itself be learned? –How can inductive biases be so strong yet so flexible? A different way to think about the development of natural (or artificial?) cognitive systems. –Powerful abstractions can be learned “from the top down”, together with or prior to learning more concrete knowledge. Go beyond the traditional dichotomies of cog sci (and AI). –How can domain-general learning mechanisms acquire domain- specific representations? –How can structured symbolic representations be acquired by statistical learning?
77
Extra slides
78
The same hierarchy for language (c.f. Chater and Manning, 2006): “Universal Grammar” → Grammar (hierarchical phrase structure grammars, e.g., CFG, HPSG, TAG), P(grammar | UG) → Phrase structure, P(phrase structure | grammar) → Utterance, P(utterance | phrase structure) → Speech signal, P(speech | utterance).
79
Vision as probabilistic parsing (Han & Zhu, 2006; c.f. Zhu, Yuanhao & Yuille, NIPS 06).
80
Learning word meanings. Principles (whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias) → Structure → Data.
81
Word learning (“tufa”): Bayesian inference over a tree-structured hypothesis space (Xu & Tenenbaum; Schmidt & Tenenbaum).
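A minimal sketch of the kind of inference involved, using the size principle: examples are assumed to be sampled from the word's true extension, so smaller hypotheses consistent with the examples gain likelihood as examples accumulate. The three nested hypotheses and item names are illustrative (the actual model uses clusters in a similarity-derived tree), and a uniform prior is assumed for simplicity.

```python
# Minimal sketch: Bayesian word learning with nested hypotheses and the
# size principle. One example leaves broad uncertainty; three examples of
# the same narrow kind concentrate the posterior on the subordinate class.
hypotheses = {
    "subordinate (just the tufas)": {"tufa1", "tufa2", "tufa3"},
    "basic (tufa-like plants)": {"tufa1", "tufa2", "tufa3", "plantA", "plantB"},
    "superordinate (all plants)": {"tufa1", "tufa2", "tufa3", "plantA",
                                   "plantB", "treeA", "treeB", "treeC"},
}

def posterior(examples):
    scores = {}
    for name, extension in hypotheses.items():
        if all(x in extension for x in examples):
            scores[name] = (1.0 / len(extension)) ** len(examples)  # size principle
        else:
            scores[name] = 0.0                                      # inconsistent
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

print(posterior(["tufa1"]))
print(posterior(["tufa1", "tufa2", "tufa3"]))
```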
82
Causal learning with prior knowledge (Griffiths, Sobel, Tenenbaum & Gopnik). “Backwards blocking” paradigm: Initial → AB trial → A trial.
83
Learning grounded causal models (Goodman, Mansinghka & Tenenbaum). A child learns that petting the cat leads to purring, while pounding leads to growling. But how to learn these symbolic event concepts over which causal links are defined?
84
The big picture What we need to understand: the mind’s ability to build rich models of the world from sparse data. –Learning about objects, categories, and their properties. –Causal inference –Understanding other people’s actions, plans, thoughts, goals –Language comprehension and production –Scene understanding What do we need to understand these abilities? –Bayesian inference in probabilistic generative models –Hierarchical models, with inference at all levels of abstraction –Structured representations: graphs, grammars, logic –Flexible representations, growing in response to observed data
85
Overhypotheses:
Syntax: Universal Grammar (Chomsky)
Phonology: faithfulness constraints, markedness constraints (Prince, Smolensky)
Word learning: shape bias, principle of contrast, whole object bias (Markman)
Folk physics: objects are unified, bounded and persistent bodies (Spelke)
Predicability: M-constraint (Keil)
Folk biology: taxonomic principle (Atran)
...
86
The chicken-and-egg problem of structure learning and feature selection: a raw data matrix.
87
The chicken-and-egg problem of structure learning and feature selection: conventional clustering (CRP mixture).
88
Learning multiple structures to explain different feature subsets (Shafto, Kemp, Mansinghka, Gordon & Tenenbaum, 2006). CrossCat: System 1, System 2, System 3.
89
Beyond similarity-based induction. Inference based on dimensional thresholds (Smith et al., 1993): Poodles can bite through wire → German shepherds can bite through wire; Dobermans can bite through wire → German shepherds can bite through wire. Inference based on causal relations (Medin et al., 2004; Coley & Shafto, 2003): Salmon carry E. Spirus bacteria → Grizzly bears carry E. Spirus bacteria; Grizzly bears carry E. Spirus bacteria → Salmon carry E. Spirus bacteria.
90
Hypotheses about property type and form of background knowledge:
“has T9 hormones”: taxonomic tree + diffusion process
“can bite through wire”: directed chain + drift process
“carry E. Spirus bacteria”: directed network + noisy transmission
(Figure: example structures over Classes A–G for each form.)
91
Beyond similarity-based induction (Shafto, Kemp, Bonawitz, Coley & Tenenbaum). “Given that X has property P, how likely is it that Y does?” Biological property: a tree over Kelp, Human, Dolphin, Sand shark, Mako shark, Tuna, Herring. Disease property: a web (food web) over the same species.
92
Node-replacement graph grammars Production (Line) Derivation
95
Model fitting. Evaluate each form in parallel; for each form, run a heuristic search over structures based on greedy growth from a one-node seed.
96
Synthetic 2D data: continuous features drawn from a Gaussian field over points arranged as Flat, Line, Ring, Tree, and Grid configurations. Model selection results: log posterior probabilities for each form on each dataset.
97
(Figure: model selection scores for Flat, Line, Ring, Tree, Grid against the true form.)
98
The flexibility of a nonparametric prior. True structure of graphical model G over variables 1–12. Comparison, at 40, 100, and 1000 samples, of learning Data D → Graph G directly versus together with the abstract theory Z (classes z and class-level edge probabilities).
99
Goal-directed action (production and comprehension) (Wolpert et al., 2003)
100
Goal inference as inverse probabilistic planning (Baker, Tenenbaum & Saxe). Constraints and goals → rational planning ((PO)MDP) → actions; inverting this model yields goal inferences. (Figure: model predictions vs. human judgments.)
101
The causal blocks world (Tenenbaum and Niyogi, 2003).
102
Learning curves. (Figure: model predictions.)
103
(Figure panels: Easy, Medium, Hard.)
104
Individual differences in concept learning
105
Why probability matching? Optimal behavior under some (evolutionarily natural) circumstances: –Optimal betting theory, portfolio theory. –Optimal foraging theory. –Competitive games. –Dynamic tasks (changing probabilities or utilities). Side-effect of algorithms for approximating complex Bayesian computations: –Markov chain Monte Carlo (MCMC): instead of integrating over complex hypothesis spaces, construct a sample of high-probability hypotheses. –Judgments from individual (independent) samples can on average be almost as good as using the full posterior distribution.
106
Markov chain Monte Carlo (Metropolis-Hastings algorithm)
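Since the slide names the algorithm, a minimal generic Metropolis-Hastings sketch may help; the one-dimensional target density and step size are illustrative, not tied to any specific model in the talk.

```python
# Minimal sketch: Metropolis-Hastings with a symmetric Gaussian random-walk
# proposal, sampling from an unnormalized two-mode target density.
import numpy as np

def target_logp(x):
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def metropolis_hastings(n_samples, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = 0.0, []
    for _ in range(n_samples):
        proposal = x + step * rng.normal()
        log_accept = target_logp(proposal) - target_logp(x)   # symmetric proposal
        if np.log(rng.random()) < log_accept:                 # accept w.p. min(1, ratio)
            x = proposal
        samples.append(x)
    return np.array(samples)

samples = metropolis_hastings(5000)
print("mean:", samples.mean(), "std:", samples.std())
```

The connection to the previous slide: if judgments are based on one or a few such samples rather than the full chain, behavior can look like probability matching.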
107
Bayesian inference in perception and sensorimotor integration (Weiss, Simoncelli & Adelson, 2002; Kording & Wolpert, 2004).
108
Learning concepts from examples. Property induction: Cows have T9 hormones. Sheep have T9 hormones. Goats have T9 hormones. → All mammals have T9 hormones. Cows have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. → All mammals have T9 hormones. Word learning: “tufa”.
109
Clustering models for relational data Social networks: block models Does person x respect person y? Does prisoner x like prisoner y?
110
Learning systems of concepts with infinite relational models (Kemp, Tenenbaum, Griffiths, Yamada & Ueda, AAAI 06). Biomedical predicate data from UMLS (McCrae et al.): –134 concepts: enzyme, hormone, organ, disease, cell function... –49 predicates: affects(hormone, organ), complicates(enzyme, cell function), treats(drug, disease), diagnoses(procedure, disease) …
111
Learning a medical ontology, e.g.: Diseases affect Organisms; Chemicals interact with Chemicals; Chemicals cause Diseases.
112
Clustering arbitrary relational systems International relations circa 1965 (Rummel) –14 countries: UK, USA, USSR, China, …. –54 binary relations representing interactions between countries: exports to( USA, UK ), protests( USA, USSR ), …. –90 (dynamic) country features: purges, protests, unemployment, communists, # languages, assassinations, ….
114
Learning a hierarchical ontology
115
Relational data in the hierarchical Bayesian framework. F: form: people cluster into cliques. S: structure: the clique assignments of people 1–8. D: data: the relation “x likes y”.
117
Bayesian models of cognition Visual perception [Weiss, Simoncelli, Adelson, Richards, Freeman, Feldman, Kersten, Knill, Maloney, Olshausen, Jacobs, Pouget,...] Language acquisition and processing [Brent, de Marcken, Niyogi, Klein, Manning, Jurafsky, Keller, Levy, Hale, Johnson, Griffiths, Perfors, Tenenbaum, …] Motor learning and motor control [Ghahramani, Jordan, Wolpert, Kording, Kawato, Doya, Todorov, Shadmehr, …] Associative learning [Dayan, Daw, Kakade, Courville, Touretzky, Kruschke, …] Memory [Anderson, Schooler, Shiffrin, Steyvers, Griffiths, McClelland, …] Attention [Mozer, Huber, Torralba, Oliva, Geisler, Yu, Itti, Baldi, …] Categorization and concept learning [Anderson, Nosofsky, Rehder, Navarro, Griffiths, Feldman, Tenenbaum, Rosseel, Goodman, Kemp, Mansinghka, …] Reasoning [Chater, Oaksford, Sloman, McKenzie, Heit, Tenenbaum, Kemp, …] Causal inference [Waldmann, Sloman, Steyvers, Griffiths, Tenenbaum, Yuille, …] Decision making and theory of mind [Lee, Stankiewicz, Rao, Baker, Goodman, Tenenbaum, …]
118
Concept learning (“tufa”): Bayesian inference over a tree-structured hypothesis space (Xu & Tenenbaum; Schmidt & Tenenbaum).
119
Some questions How confident are we that a tree-structured model is the best way to characterize this learning task? How do people construct an appropriate tree- structured hypothesis space? What other kinds of structured probabilistic models may be needed to explain other inductive leaps that people make, and how do people acquire these different structured models? Are there general unifying principles that explain our capacity to learn and reason with structured probabilistic models across different domains?
120
Basics of Bayesian inference. Bayes’ rule: P(h|d) = P(d|h) P(h) / Σ_h′ P(d|h′) P(h′). An example: –Data: John is coughing. –Some hypotheses: 1. John has a cold; 2. John has lung cancer; 3. John has a stomach flu. –Likelihood P(d|h) favors 1 and 2 over 3. –Prior probability P(h) favors 1 and 3 over 2. –Posterior probability P(h|d) favors 1 over 2 and 3.
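A minimal numeric rendering of this example; the prior and likelihood values below are made up purely for illustration.

```python
# Minimal sketch: posterior over three hypotheses for the datum "John is
# coughing", using Bayes' rule with illustrative numbers.
priors = {"cold": 0.5, "lung cancer": 0.01, "stomach flu": 0.3}
likelihoods = {"cold": 0.9, "lung cancer": 0.9, "stomach flu": 0.1}  # P(coughing | h)

evidence = sum(priors[h] * likelihoods[h] for h in priors)
posterior = {h: priors[h] * likelihoods[h] / evidence for h in priors}
print(posterior)   # "cold" wins: favored by both prior and likelihood
```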
121
Experiments on property induction (Osherson, Smith, Wilkie, Lopez, Shafir, 1990). 20 subjects rated the strength of 45 general arguments: X1 have property P (e.g., Cows have T4 hormones). X2 have property P. X3 have property P. → All mammals have property P. 20 subjects rated the strength of 36 specific arguments: X1 have property P. X2 have property P. → Horses have property P.
122
Feature rating data (Osherson and Wilkie). People were given 48 animals and 85 features, and asked to rate whether each animal had each feature. E.g., elephant: 'gray', 'hairless', 'toughskin', 'big', 'bulbous', 'longleg', 'tail', 'chewteeth', 'tusks', 'smelly', 'walks', 'slow', 'strong', 'muscle', 'quadrapedal', 'inactive', 'vegetation', 'grazer', 'oldworld', 'bush', 'jungle', 'ground', 'timid', 'smart', 'group'.
123
Beyond similarity-based induction. Reasoning based on dimensional thresholds (Smith et al., 1993): Poodles can bite through wire → German shepherds can bite through wire; Dobermans can bite through wire → German shepherds can bite through wire. Reasoning based on causal relations (Medin et al., 2004; Coley & Shafto, 2003): Salmon carry E. Spirus bacteria → Grizzly bears carry E. Spirus bacteria; Grizzly bears carry E. Spirus bacteria → Salmon carry E. Spirus bacteria.
124
Different sources for priors. Chimps have T9 hormones → Gorillas have T9 hormones: taxonomic similarity. Poodles can bite through wire → Dobermans can bite through wire: jaw strength. Salmon carry E. Spirus bacteria → Grizzly bears carry E. Spirus bacteria: food web relations.
125
1. How does background knowledge guide learning from sparsely observed data? Bayesian inference: 2. How is background knowledge itself acquired? Hierarchical probabilistic models, with inference at multiple levels of abstraction. Flexible nonparametric models in which complexity grows with the data. 3. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories. The approach : from statistics to intelligence
127
Draft slides for shape bias in “icml07-draft”
128
Draft slides for property induction and causal learning in “icml07-draft-2”
129
Big question: Do we have slides on “beyond similarity”? CURRENTLY, NONE OR JUST ONE. Points beyond similarity, and also other ways of learning (e.g., being told directly about relations). Consider boiling down to one slide, a la AFOSR, on fw. No matter what, don’t make the examples interactive. Or, put these slides in the “grand tour” of the conclusion.
130
Conclusions Computational tools for studying core questions of human learning (and building more human-like ML?) –What is the content and form of human knowledge, at multiple levels of abstraction? –How does abstract domain knowledge guide new learning? –How can abstract domain knowledge itself be learned? –How can inductive biases be so strong yet so flexible? Go beyond the traditional dichotomies of cog sci (and AI). –Instead of “nature vs. nurture”: powerful abstractions can be learned “from the top down”, together with or prior to learning more concrete knowledge. –Instead of “domain-general” vs. “domain-specific”: domain-general learning mechanisms can acquire domain-specific knowledge representations. –Instead of “statistics” vs. “structure”: structured symbolic representations can be acquired by statistical learning.