1 Learning overhypotheses with hierarchical Bayesian models
Charles Kemp, Amy Perfors, Josh Tenenbaum (Developmental Science, 2007)

2 Learning word meanings from examples
“horse” “horse” “horse” I’m going to tell you about a broad research program… The problems that intrigue me are all things which people do effortlessly and for the most part quite well, but which we still don’t know how to get computers to do -- which is a sign that we don’t understand the computational basis of how people do these things.

3 The “shape bias” in word learning (Landau, Smith, Jones 1988)
This is a dax. Show me the dax… English-speaking children have a “shape bias”, picking the object with the same shape. The shape bias is a useful inductive constraint or “overhypothesis”: the majority of early words are labels for object categories, and shape may be the best cue to object category membership.

4 What is the relation between y and x?

5 What is the relation between y and x?

6 What is the relation between y and x?

7 Overhypotheses
Syntax: Universal Grammar (Chomsky)
Phonology: faithfulness constraints, markedness constraints (Prince, Smolensky)
Word learning: shape bias, principle of contrast, whole-object bias (Markman)
Folk physics: objects are unified, bounded and persistent bodies (Spelke)
Predicability: M-constraint (Keil)
Folk biology: taxonomic principle (Atran)
…

8 Overhypotheses
1. How do overhypotheses guide learning from sparsely observed data?
2. What form do overhypotheses take, across different domains and tasks?
3. How are overhypotheses themselves acquired?
4. How can overhypotheses provide constraints yet maintain flexibility, balancing assimilation and accommodation?

9 The “shape bias” in word learning (Landau, Smith, Jones 1988)
This is a dax. Show me the dax… English-speaking children have a “shape bias” at 24 months of age, but 20-month-olds do not…

10 Is the shape bias learned?
Smith et al. (2002) trained 17-month-olds on labels for 4 artificial categories: “wib”, “lug”, “zup”, “div”. After 8 weeks of training (20 min/week), 19-month-olds show the shape bias (“This is a dax. Show me the dax…”): a “learned attentional bias”, or “transfer learning”.

11 Transfer to real-world vocabulary
The puzzle: The shape bias is a powerful inductive constraint, yet can be learned from very little data.

12 Learning about feature variability
“wib” “lug” “zup” “div”
The intuition:
- Shape varies across categories but is relatively constant within categories.
- Other features (size, color, texture) vary both across and within nameable object categories.

13 Learning about feature variability
Marbles of different colors: ?

14 Learning about feature variability
Marbles of different colors: ?
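The transcript drops the marble figures, but the question these slides pose is: after drawing a few marbles from a bag, what color will the next one be? A minimal sketch of the single-bag answer, assuming a symmetric Dirichlet prior and hypothetical draw counts (both illustrative choices, not from the slides):

```python
import numpy as np

# Single-bag Dirichlet-multinomial: with a symmetric Dirichlet(alpha) prior
# over the bag's color proportions theta, the posterior predictive is
#   P(next = k | counts) = (counts[k] + alpha) / (n + K * alpha).
colors = ["red", "yellow", "brown", "blue", "green"]  # hypothetical palette
alpha = 1.0                           # assumed prior strength (illustrative)
counts = np.array([4, 0, 0, 0, 0])    # e.g. four red draws from this bag

pred = (counts + alpha) / (counts.sum() + len(colors) * alpha)
for color, p in zip(colors, pred):
    print(f"P(next = {color:6s}) = {p:.2f}")
```

By itself, one bag pulls the prediction only weakly toward red; the next slides show how experience with other bags can sharpen it.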

15 A hierarchical model
“Color varies across bags but not much within bags”
Level 2: Bags in general
Level 1: Bag proportions (mostly red, mostly yellow, mostly brown, mostly blue, mostly green?)
Data

16 A hierarchical Bayesian model
Simultaneously infer:
Level 3: Prior expectations on bags in general
Level 2: Bags in general
Level 1: Bag proportions
Data

17 A hierarchical Bayesian model
Level 3: Prior expectations on bags in general
Level 2: Bags in general
Level 1: Bag proportions (“Bag 1 is mostly red”)
Data

18 A hierarchical Bayesian model
Level 3: Prior expectations on bags in general
Level 2: Bags in general
Level 1: Bag proportions (“Bag 2 is mostly yellow”)
Data

19 A hierarchical Bayesian model
Level 3: Prior expectations on bags in general
Level 2: Bags in general (“Color varies across bags but not much within bags”)
Level 1: Bag proportions
Data
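To make slides 16-19 concrete, here is a minimal empirical-Bayes sketch of the hierarchical Dirichlet-multinomial model: each bag's proportions are drawn as theta_i ~ Dirichlet(alpha * beta), and "color varies across bags but not much within bags" corresponds to a small learned concentration alpha. The bag counts, the fixed uniform beta, and the grid search over alpha are illustrative assumptions, not details given in the slides:

```python
import numpy as np
from scipy.special import gammaln

# Level 1: each bag i has color proportions theta_i.
# Level 2: theta_i ~ Dirichlet(alpha * beta); a small alpha means bags are
#          near-pure (low within-bag variability), a large alpha the opposite.
# Level 3: here just a grid of candidate alphas (a crude uniform prior).

def log_marginal(counts, alpha, beta):
    """log P(counts | alpha, beta) for one bag (Dirichlet-multinomial),
    dropping the multinomial coefficient, which is constant in alpha."""
    a = alpha * beta
    return (gammaln(alpha) - gammaln(alpha + counts.sum())
            + np.sum(gammaln(a + counts) - gammaln(a)))

# Hypothetical training data: five bags of 20 draws each, nearly pure.
bags = np.array([
    [19, 1, 0, 0, 0],   # mostly red
    [0, 20, 0, 0, 0],   # mostly yellow
    [1, 0, 19, 0, 0],   # mostly brown
    [0, 0, 0, 20, 0],   # mostly blue
    [0, 1, 0, 0, 19],   # mostly green
])
beta = np.full(5, 0.2)  # fix the Level-2 mean to uniform for simplicity

# Level-2 inference: pick the alpha that best explains all bags at once.
grid = np.logspace(-2, 2, 200)
scores = [sum(log_marginal(b, a, beta) for b in bags) for a in grid]
alpha_hat = grid[int(np.argmax(scores))]
print(f"inferred alpha = {alpha_hat:.3f} (small: bags are near-pure)")

# Level-1 prediction for a NEW bag after a single red draw:
new = np.array([1, 0, 0, 0, 0])
pred = (new + alpha_hat * beta) / (new.sum() + alpha_hat)
print(f"P(next draw from the new bag is red) = {pred[0]:.3f}")
```

With the learned small alpha, one red draw from a new bag is enough to predict that the bag is mostly red, which is exactly the transfer the slides describe.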

20 Learning the shape bias
Training: “wib”, “lug”, “zup”, “div”; “wib”, “lug”, “div”
Assuming independent Dirichlet-multinomial models for each dimension…

21 Learning the shape bias
Training: “wib”, “lug”, “zup”, “div”; “wib”, “lug”, “div”
Assuming independent Dirichlet-multinomial models for each dimension, we learn that:
- Shape varies across categories but not within categories.
- Texture, color, size vary both across and within categories.
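Under the slide's assumption of independent Dirichlet-multinomial models per dimension, the same machinery as in the marble sketch learns a separate concentration for each dimension. A sketch with hypothetical exemplar counts over four feature values per dimension (all numbers illustrative):

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(counts, alpha, beta):
    # Dirichlet-multinomial evidence for one category on one dimension.
    a = alpha * beta
    return (gammaln(alpha) - gammaln(alpha + counts.sum())
            + np.sum(gammaln(a + counts) - gammaln(a)))

def infer_alpha(category_counts, beta):
    # Empirical Bayes: the alpha that best explains all categories jointly.
    grid = np.logspace(-2, 2, 200)
    scores = [sum(log_marginal(c, a, beta) for c in category_counts)
              for a in grid]
    return grid[int(np.argmax(scores))]

# Hypothetical exemplar counts per category (rows) over feature values.
# Shape: each category concentrates on a single shape (like near-pure bags).
shape = np.array([[5, 0, 0, 0], [0, 5, 0, 0], [0, 0, 5, 0], [0, 0, 0, 5]])
# Color: every category spreads across color values.
color = np.array([[2, 1, 1, 1], [1, 2, 1, 1], [1, 1, 2, 1], [1, 1, 1, 2]])

beta = np.full(4, 0.25)
print(f"alpha_shape = {infer_alpha(shape, beta):.2f}  (small: diagnostic)")
print(f"alpha_color = {infer_alpha(color, beta):.2f}  (large: not diagnostic)")
```

A small alpha on the shape dimension is the shape bias: one labeled exemplar pins down a new category's shape, so the learner extends the label by shape at test.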

22 Learning the shape bias
Training, then test: “This is a dax. Show me the dax…”

23 Extensions
- Learning with weaker shape representations: shape features (holes, curvature, edges, aspect ratio) vs. other features (main color, color distribution, oriented texture, roughness). Training, then test: Category 5?
- Learning to transfer selectively, dependent on knowledge of ontological kinds. By age ~3, children know that a shape bias is appropriate for solid object categories (ball, book, toothbrush, …), while a material bias is appropriate for nonsolid substance categories (juice, sand, toothpaste, …).

24 Modeling selective transfer
Let z_i be the ontological kind of category i. Given z, we could learn a separate Dirichlet-multinomial model for each ontological kind:
- Kind 1 (solid: “dax”, “zav”, “fep”, labeled by shape): variability in solidity, shape, material within kind 1.
- Kind 2 (non-solid: “wif”, “wug”, “toof”, labeled by material): variability in solidity, shape, material within kind 2.
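To make "a separate Dirichlet-multinomial model for each ontological kind" concrete, here is a sketch that fits a per-dimension concentration within each kind, given a known assignment z. The kind grouping, counts, and feature dimensions are all hypothetical:

```python
import numpy as np
from scipy.special import gammaln

def infer_alpha(category_counts, beta):
    # Same empirical-Bayes grid search as in the earlier sketches.
    grid = np.logspace(-2, 2, 200)
    def lm(c, a):
        ab = a * beta
        return (gammaln(a) - gammaln(a + c.sum())
                + np.sum(gammaln(ab + c) - gammaln(ab)))
    scores = [sum(lm(c, a) for c in category_counts) for a in grid]
    return grid[int(np.argmax(scores))]

beta = np.full(4, 0.25)

# Hypothetical exemplar counts, grouped by a KNOWN kind assignment z.
# Solid categories: consistent shape within a category, varied material.
solid_shape    = np.array([[5, 0, 0, 0], [0, 5, 0, 0], [0, 0, 5, 0]])
solid_material = np.array([[2, 1, 1, 1], [1, 2, 1, 1], [1, 1, 2, 1]])
# Non-solid categories: the reverse pattern.
nonsolid_shape    = solid_material
nonsolid_material = solid_shape

for kind, dims in [("solid",     {"shape": solid_shape,
                                  "material": solid_material}),
                   ("non-solid", {"shape": nonsolid_shape,
                                  "material": nonsolid_material})]:
    for dim, data in dims.items():
        print(f"{kind:9s} {dim:8s} alpha = {infer_alpha(data, beta):7.2f}")
```

The fitted concentrations come out small for shape among solids and small for material among non-solids, which is the kind-specific bias the slide describes.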

25 Learning to transfer selectively
Chicken-and-egg problem: We don’t know the partition into ontological kinds.
The input: “wug”, “wif”, “wug”, “dax”, “dax”, “dax”, “zav”, “wif”, “wif”, “zav”, “zav”, “wug” (solid and non-solid exemplars).
Solution: Define a nonparametric prior over this partition (cf. Roy & Kaelbling, IJCAI 07).
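The transcript doesn't name the nonparametric prior, but a standard choice for a prior over partitions is the Chinese restaurant process (the prior behind Dirichlet process mixtures). A minimal forward sampler, with an illustrative concentration gamma:

```python
import numpy as np

def crp_sample(n, gamma, rng):
    """Sample one partition of n categories from a Chinese restaurant process.

    Category i joins an existing kind with probability proportional to that
    kind's current size, or founds a new kind with probability proportional
    to gamma. Returns a list of kind indices, one per category.
    """
    z = [0]  # the first category founds the first kind
    for _ in range(1, n):
        sizes = np.bincount(z)                         # current kind sizes
        probs = np.append(sizes, gamma).astype(float)  # last slot = new kind
        probs /= probs.sum()
        z.append(int(rng.choice(len(probs), p=probs)))
    return z

rng = np.random.default_rng(0)
for _ in range(3):
    print(crp_sample(6, gamma=1.0, rng=rng))  # one sampled partition per line
```

Inference would alternate between resampling z given the per-kind models and refitting the models given z (e.g., by Gibbs sampling), so the solid/non-solid split is discovered from the data rather than supplied.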

26 Summary
- Inductive constraints or “overhypotheses” are critical for learning so much so fast.
- New overhypotheses can be learned by children, often very early in development. They are not just innate, nor the result of gradual abstraction from many specific experiences.
- Hierarchical Bayesian models (HBMs) may help explain both the role of overhypotheses in learning and how overhypotheses may themselves be acquired from experience (even relatively little experience): the “blessing of abstraction”.
- Overhypotheses must constrain learning yet also be flexible, capable of revision, extension, or growth. Nonparametric HBMs can navigate this “assimilation vs. accommodation” tradeoff.

