1 Josh Tenenbaum Statistical learning of abstract knowledge:
Representation change and growth in Bayesian models of cognitive development. Josh Tenenbaum, MIT Department of Brain and Cognitive Sciences, Computer Science and AI Lab (CSAIL). Acknowledgments: Tom Griffiths, Charles Kemp, the Computational Cognitive Science group at MIT, and all the researchers whose work I'll discuss.

2 Collaborators Amy Perfors Charles Kemp Pat Shafto Vikash Mansinghka Noah Goodman Tom Griffiths Funding: AFOSR Cognition and Decision Program, AFOSR MURI, DARPA IPTO, NSF, ONR MURI, NTT Communication Sciences Laboratories, James S. McDonnell Foundation

3 Everyday inductive leaps
How can people learn so much about the world from such limited evidence? Learning words for kinds of objects “horse” “horse” “horse”

4 Learning words on planet Gazoob
“tufa”
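The inductive leap in the "tufa" example can be sketched as Bayesian concept learning with the size principle: smaller, more specific hypotheses assign higher likelihood to consistent examples. A minimal illustration (the extensions, object names, and uniform prior below are invented for the sketch):

```python
# Sketch of Bayesian concept generalization with the size principle.
# Hypothetical extensions on "planet Gazoob" (names invented).
hypotheses = {
    "tufas_only": {"t1", "t2", "t3"},
    "all_plants": {"t1", "t2", "t3", "p1", "p2", "p3", "p4", "p5"},
}
prior = {"tufas_only": 0.5, "all_plants": 0.5}
examples = ["t1", "t2", "t3"]  # three objects labeled "tufa"

def posterior(h):
    # Likelihood of sampling each consistent example from h: 1/|h| per example.
    lik = 1.0
    for x in examples:
        lik *= (1.0 / len(hypotheses[h])) if x in hypotheses[h] else 0.0
    return prior[h] * lik

z = sum(posterior(h) for h in hypotheses)
probs = {h: posterior(h) / z for h in hypotheses}
print(round(probs["tufas_only"], 2))  # the narrow hypothesis dominates
```

Three examples drawn from the small extension are far more probable than three drawn from the large one, which is why a few "tufas" suffice to pin down the category.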

5 Everyday inductive leaps
How can people learn so much about the world from such limited evidence? Kinds of objects and their properties The meanings of words, phrases, and sentences Cause-effect relations The beliefs, goals and plans of other people Social structures, conventions, and rules

6 The solution Strong prior knowledge (inductive constraints / bias / “the right representation”).

7 The solution Strong prior knowledge (inductive constraints / bias / “the right representation”). How does abstract knowledge guide learning from sparsely observed data? How could abstract knowledge itself be learned? What form does abstract knowledge take, across different domains and tasks? How to balance strong inductive constraints with representational flexibility? (Related: how to trade off accommodation and assimilation?)

8 The solution A computational toolkit for studying abstract knowledge.
Bayesian inference, with hypothesis spaces and priors derived from abstract knowledge. Hierarchical Bayesian models, with inference at multiple levels of abstraction. Probabilistic models defined over structured representations: graphs, grammars, predicate logic, relational schemas, theories. Nonparametric models, growing in complexity as the data require.

9 Outline Three case studies Key workshop questions
Learning constraints on word meanings Discovering the structural form of object category representations Learning causal theories Key workshop questions How can “aha!” representational changes be produced by rational computational mechanisms? What can we say about “meta-representations”? What might be the fixed innate substrate over which representations change? How can we discover novel “theoretical entities”?

10 The shape bias in word learning (Landau, Smith, Jones 1988)
This is a dax. Show me the dax… A useful inductive constraint: many early words are labels for object categories, and shape may be the best cue to object category membership. English-speaking children typically show the shape bias at 24 months, but not at 20 months.

11 Is the shape bias learned?
Smith et al. (2002) trained 17-month-olds on labels for 4 artificial categories: “wib”, “lug”, “zup”, “div”. After 8 weeks of training (20 min/week), 19-month-olds show the shape bias: This is a dax. Show me the dax…

12 Transfer to real-world vocabulary
The puzzle: The shape bias is a powerful inductive constraint, yet can be learned from very little data.

13 Learning abstract knowledge about feature variability
“wib” “lug” “zup” “div” The intuition: Shape varies across categories but is relatively constant within nameable categories. Other features (size, color, texture) vary both within and across nameable object categories.

14 Learning about feature variability
?

15 Learning about feature variability
?

16 A hierarchical model
Color varies across bags but not much within bags. Level 2: Bags in general. Level 1: Bag proportions (mostly red, mostly yellow, mostly brown, mostly green, mostly blue?). Data.

17 A hierarchical Bayesian model
Simultaneously infer: Level 3: Prior expectations on bags in general. Level 2: Bags in general. Level 1: Bag proportions. Data.

18 A hierarchical Bayesian model
Level 3: Prior expectations on bags in general (within-bag variability) Level 2: Bags in general (overall population distribution) Level 1: Bag proportions Data

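A minimal computational sketch of this hierarchical inference (not the authors' implementation): each bag's color proportions are drawn from a shared symmetric Dirichlet prior, and the Level-3 knowledge, how variable colors are within a bag, is the prior's concentration parameter, fit here by a simple grid search on the Dirichlet-multinomial marginal likelihood. The bag counts are invented for illustration.

```python
import math

def dm_log_marginal(counts, alpha):
    """Log marginal likelihood of one bag's color counts under a
    symmetric Dirichlet(alpha) prior on the bag's color proportions."""
    k, n = len(counts), sum(counts)
    out = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out

# Invented data: three bags, each dominated by a different color.
bags = [[18, 1, 1], [1, 18, 1], [1, 1, 18]]

# Level-3 inference: pick the concentration alpha that best explains all bags.
# Small alpha = bags near-pure in one color; large alpha = colors well mixed.
grid = [0.1, 0.5, 1.0, 5.0, 20.0]
best_alpha = max(grid, key=lambda a: sum(dm_log_marginal(b, a) for b in bags))
print(best_alpha)  # a small value: the learner infers low within-bag variability
```

Having inferred a small alpha, the model expects a new bag to be near-pure in whatever color its first drawn marble shows, exactly the one-shot generalization the slides describe.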

20 Learning about feature variability (Kemp, Perfors & Tenenbaum)
Prior expectations on categories in general Categories in general Individual categories Data

21 Learning the shape bias
“wib” “lug” “zup” “div” Assuming independent Dirichlet-multinomial models for each dimension … … we learn that: Shape varies across categories but not within categories. Texture, color, size vary across and within categories. Training
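The same point in miniature, with invented exemplars and a deliberately cruder statistic than the Dirichlet-multinomial model: compare how many distinct values each feature dimension takes within a labeled category. The dimension that is constant within categories but varied across them becomes the generalization bias.

```python
# Rough sketch of the shape-bias intuition (exemplar data invented).
# Each labeled category has exemplars described as (shape, color, size).
training = {
    "wib": [("round", "red", "big"), ("round", "blue", "small"), ("round", "red", "small")],
    "lug": [("flat", "blue", "big"), ("flat", "green", "big"), ("flat", "red", "small")],
    "zup": [("spiky", "green", "small"), ("spiky", "red", "big"), ("spiky", "blue", "big")],
    "div": [("cubic", "red", "big"), ("cubic", "green", "small"), ("cubic", "blue", "small")],
}

def within_category_diversity(dim):
    """Mean number of distinct values a dimension takes inside one category."""
    return sum(len({ex[dim] for ex in exs}) for exs in training.values()) / len(training)

dims = {"shape": 0, "color": 1, "size": 2}
bias = min(dims, key=lambda d: within_category_diversity(dims[d]))
print(bias)  # "shape": constant within categories, so it guides word extension
```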

22 Second-order generalization test
This is a dax. Show me the dax… Training Test

23 Outline Three case studies Key workshop questions
Learning constraints on word meanings Discovering the structural form of object category representations Learning causal theories Key workshop questions How can “aha!” representational changes be produced by rational computational mechanisms? What can we say about “meta-representations”? What might be the fixed innate substrate over which representations change? How can we discover novel “theoretical entities”? The “blessing of abstraction” in Hierarchical Bayesian Models: abstractions can be learned “top down”, together with more concrete knowledge that they constrain, and sometimes from surprisingly little data.

24 Outline Three case studies Key workshop questions
Learning constraints on word meanings Discovering the structural form of object category representations Learning causal theories Key workshop questions How can “aha!” representational changes be produced by rational computational mechanisms? What can we say about “meta-representations”? What might be the fixed innate substrate over which representations change? How can we discover novel “theoretical entities”?

25 Structural constraints

26 The discovery of structural form
Examples across domains: BIOLOGY (mouse, squirrel, chimp, gorilla); POLITICS (Clinton, Giuliani, Edwards, McCain); COLOR; FRIENDSHIP; CHEMISTRY; cf. Ben Kuipers’s “spatial hypothesis”.

27 People can discover structural forms
Scientific discoveries and children’s cognitive development alike involve discovering structural forms: hierarchical structure of category labels; clique structure of social groups; cyclical structure of seasons or days of the week; transitive structure for comparative relations; tree structure for biological species; periodic structure for chemical elements. Historical landmarks: the “great chain of being” (1579); Systema Naturae (1735), with Kingdom Animalia, Phylum Chordata, Class Mammalia, Order Primates, Family Hominidae, Genus Homo, Species Homo sapiens; the tree of species (1837).

28 “Universal Structure Learner”
A modest goal: a single learner mapping Data → Representation, subsuming K-means, hierarchical clustering, factor analysis, Guttman scaling, circumplex models, self-organizing maps, ···

29 Hierarchical Bayesian Framework (Kemp & Tenenbaum)
Three levels: F: form (e.g., a tree), with a prior favoring simplicity; S: structure over objects (mouse, squirrel, chimp, gorilla), with a likelihood favoring smoothness [Zhu et al., 2003]; D: data (features F1–F4 of each object).

30 A “universal grammar” for structural forms
A “meta-representation”: a table pairing each structural Form with the generative Process that grows structures of that form.

31 Learning algorithm Evaluate each form in parallel
For each form, heuristic search over structures based on greedy growth from a one-node seed:
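To make the greedy-growth idea concrete, here is a toy sketch for a single form (a chain): grow the structure one object at a time, always choosing the insertion point that best fits the observed dissimilarities. The cost function and data are invented for illustration; the actual framework scores whole structures by Bayesian marginal likelihood and evaluates many forms in parallel.

```python
import itertools

# Invented dissimilarities: 5 objects that truly lie on a line 0-1-2-3-4.
def dissim(a, b):
    return abs(a - b)

objects = [3, 0, 4, 1, 2]  # presented in scrambled order

def chain_cost(chain):
    """Mismatch between chain positions and observed dissimilarities."""
    return sum((abs(i - j) - dissim(chain[i], chain[j])) ** 2
               for i, j in itertools.combinations(range(len(chain)), 2))

# Greedy growth from a one-node seed: insert each new object at the
# position that yields the lowest-cost chain so far.
chain = [objects[0]]
for obj in objects[1:]:
    candidates = [chain[:k] + [obj] + chain[k:] for k in range(len(chain) + 1)]
    chain = min(candidates, key=chain_cost)

print(chain)  # recovers the underlying linear order
```

The same greedy scheme generalizes to other forms by changing the set of legal growth moves (e.g., node splits for trees, insertions around a cycle for rings).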

32 (Figure: structures discovered from feature data: animals × features; judges × cases.)

33 (Figure: structures discovered from similarity data: objects × objects.)

34 Development of structural forms as more data are observed
The “blessing of abstraction”

35 Structural forms from relational data
Dominance hierarchy: primate troop (“x beats y”). Tree: Bush administration (“x told y”). Cliques: prison inmates (“x likes y”). Ring: Kula islands (“x trades with y”).

36 Lab studies of learning structural forms
Training: Observe messages passed between employees (a, b, c, …) in a company. Transfer test: Predict messages sent to and from new employees x and y. Link observed in training Link observed in transfer test

37 Outline Three case studies Key workshop questions
Learning constraints on word meanings Discovering the structural form of object category representations Learning causal theories Key workshop questions How can “aha!” representational changes be produced by rational computational mechanisms? What can we say about “meta-representations”? What might be the fixed innate substrate over which representations change? How can we discover novel “theoretical entities”? Simple, general, flexible meta-representations, such as a “universal grammar” of structural forms, may be worth exploring.

38 Outline Three case studies Key workshop questions
Learning constraints on word meanings Discovering the structural form of object category representations Learning causal theories Key workshop questions How can “aha!” representational changes be produced by rational computational mechanisms? What can we say about “meta-representations”? What might be the fixed innate substrate over which representations change? How can we discover novel “theoretical entities”?

39 Unsupervised theory learning

40 Unsupervised theory learning
A theory: Theoretical entities: magnet (with north pole and south pole), magnetic object, nonmagnetic object. Theoretical laws: magnets interact with each other (north poles and south poles attract; all others repel); magnets and magnetic objects interact; magnetic objects do not interact with each other; nonmagnetic objects interact with nothing. (cf. Aaron Sloman)
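As a minimal illustration (object names invented), the theory above can be written as explicit laws over typed theoretical entities, showing how it licenses predictions about unobserved interactions:

```python
# Typed entities (names invented for the sketch).
TYPES = {"m1": "magnet", "m2": "magnet", "paperclip": "magnetic",
         "nail": "magnetic", "eraser": "nonmagnetic"}

def interacts(x, y):
    """Theoretical laws from the slide: magnets interact with magnets and
    with magnetic objects; magnetic objects do not interact with each
    other; nonmagnetic objects interact with nothing."""
    tx, ty = TYPES[x], TYPES[y]
    if "nonmagnetic" in (tx, ty):
        return False
    if tx == "magnet" or ty == "magnet":
        return True
    return False  # two merely-magnetic objects

print(interacts("m1", "paperclip"))    # True
print(interacts("paperclip", "nail"))  # False
print(interacts("m1", "eraser"))       # False
```

Once an object's type is inferred from one observed force, all of its other interactions follow, which is the inductive power the next slide describes.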

41 What makes these theories interesting?
Powerful inductive constraints E.g., if you feel a force between object x and an object known to be magnetic (but not itself a magnet), then you know that x is a magnet and can predict all its interactions. Meaning is holistic – hard to see how learning could get off the ground… Theoretical entities and theoretical laws are only defined in terms of each other… a ‘chicken-and-egg’ problem. Theoretical entities cannot be defined in terms of pre-existing (observable) predicates, nor independently from other theoretical entities. The theory as a whole takes its meaning from the predictions it makes about observables, and each theoretical entity takes its meaning from the role that it plays in this conceptual system.

42 What makes these theories interesting?
Powerful inductive constraints: e.g., if you feel a force between object x and an object known to be magnetic (but not itself a magnet), then you know that x is a magnet and can predict all its interactions. Meaning is holistic, so it is hard to see how learning could get off the ground: theoretical entities and theoretical laws are only defined in terms of each other. Theoretical entities cannot be defined in terms of pre-existing (observable) predicates, nor independently from other theoretical entities. The theory as a whole takes its meaning from the predictions it makes about observables, and each theoretical entity takes its meaning from the role it plays in this conceptual system. (Diagram: the traditional approach to abstraction vs. theory discovery.)

43 Theories as abstract relational systems of concepts
Force, mass, acceleration, friction, tension, energy, momentum Parent, child, spouse, family, marriage Buy, sell, transaction, transfer, exchange Alive, dead, inanimate, energy, growth Goal, intent, means, obstacle, ally, assist, hinder

44 A simple version of the problem
Causal blocks world (Tenenbaum & Niyogi, 2003).


48 Towards a formal model of theory discovery
Input: an observed binary relation activates(object, object) over 15 objects. Output: a partition of the objects into classes (A, B, C) under which the relation matrix, re-sorted by class, becomes block-structured.

49 Infinite Relational Model (IRM) (Kemp, Griffiths, Tenenbaum, Yamada, & Ueda)
(Figure: the IRM partitions objects O into classes via assignments z; each pair of classes has an interaction probability (e.g., 0.1 or 0.9), and sorting the relation by class reveals its block structure.)
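The IRM's core computation can be sketched as follows (a minimal illustration, not the authors' code): given a candidate partition z, the relation's likelihood factors into independent Beta-Bernoulli blocks, one per ordered pair of classes, so partitions that expose block structure score higher. The objects and relation below are invented.

```python
import itertools
import math

def beta_bernoulli_log_ml(ones, zeros, a=1.0, b=1.0):
    """Log marginal likelihood of a block's links under a Beta(a, b) prior."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + math.lgamma(a + ones) + math.lgamma(b + zeros)
            - math.lgamma(a + b + ones + zeros))

def score(R, z):
    """Sum block log marginals over all ordered class pairs."""
    total = 0.0
    for ca, cb in itertools.product(set(z.values()), repeat=2):
        pairs = [(x, y) for x in z for y in z
                 if z[x] == ca and z[y] == cb and x != y]
        ones = sum(R[(x, y)] for x, y in pairs)
        total += beta_bernoulli_log_ml(ones, len(pairs) - ones)
    return total

# Toy relation: class-A objects activate class-B objects, nothing else.
objs = ["a1", "a2", "b1", "b2"]
R = {(x, y): int(x.startswith("a") and y.startswith("b"))
     for x in objs for y in objs if x != y}

good = {"a1": 0, "a2": 0, "b1": 1, "b2": 1}  # respects the true classes
lumped = {o: 0 for o in objs}                # one undifferentiated class
print(score(R, good) > score(R, lumped))     # True: structure is rewarded
```

The full IRM places a Chinese-restaurant-process prior over partitions and searches or samples over z, so the number of classes grows with the data rather than being fixed in advance.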

50 Model fitting

51 The causal blocks world (Tenenbaum and Niyogi, 2003)

52 Model predictions
(Figure: log Bayes factor as a function of the number of objects observed (3, 6, 9), for easy, medium, and hard conditions.)

53 Model predictions
(Figure: training and test configurations; predicted probability of lighting up as a function of the number of objects observed.)

54 Learning an ontology
Data from UMLS (McCray et al.): 134 concepts (enzyme, hormone, organ, disease, cell function, …) and 49 predicates (affects(hormone, organ), complicates(enzyme, cell function), treats(drug, disease), diagnoses(procedure, disease), …).

55 Learning an ontology
Theoretical entities and theoretical laws discovered: Diseases affect Organisms; Chemicals interact with Chemicals; Chemicals cause Diseases.

56 Learning a hierarchical ontology (Roy, Kemp, Mansinghka & Tenenbaum)

57 Theory discovery in complex relational domains
International relations circa 1965 (Rummel) 14 countries: UK, USA, USSR, China, …. 54 binary relations representing interactions between countries: exports to( USA, UK ), protests( USA, USSR ), …. 90 (dynamic) country features: purges, protests, unemployment, communists, # languages, assassinations, ….

58

59 Learning causal networks
Example: a network of diseases, effects, and causes. Missing a key level of abstraction: the abstract conceptual framework (theoretical entities and laws) that constrains the causal structures applicable in a given domain.

60 Abstract causal theories
Theoretical entities: C. Theoretical laws: C → C. Theoretical entities: B, D, S. Theoretical laws: B → D, D → S. Theoretical entities: B, D, S. Theoretical laws: S → D.

61 Using a causal theory Given current causal network beliefs . . .
. . . and some new observed data: Correlation between “working in factory” and “chest pain”.

62 The theory constrains possible hypotheses:
And rules out others. This allows strong inferences about causal structure from very limited data, very different from conventional Bayes net learning.
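A toy count makes the constraint vivid. Reading the entity types as, say, behaviors (B), diseases (D), and symptoms (S), with laws permitting only B → D and D → S edges (the variable names and typing are invented for this sketch):

```python
# Variables typed by theoretical entity (assignments invented for illustration).
variables = {"smoking": "B", "factory_work": "B",
             "cancer": "D", "chest_pain": "S"}

# Theoretical laws: which entity types may causally influence which.
allowed_law = {("B", "D"), ("D", "S")}

all_edges = [(x, y) for x in variables for y in variables if x != y]
theory_edges = [(x, y) for x, y in all_edges
                if (variables[x], variables[y]) in allowed_law]

# Each subset of permitted edges is a candidate causal structure.
unconstrained = 2 ** len(all_edges)
constrained = 2 ** len(theory_edges)
print(len(all_edges), len(theory_edges))  # 12 vs 3 permitted edges
print(unconstrained // constrained)       # the hypothesis space shrinks 512-fold
```

With only 3 permitted edges instead of 12, a single observed correlation (factory work with chest pain) points at the few theory-consistent pathways rather than the full space of networks.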

63 Learning a block-structured prior on network structures (Mansinghka et al., UAI 06)
(Figure: the “meta-representation”: class assignments z and a matrix of class-pair edge probabilities (e.g., 0.0, 0.8, 0.01, 0.75); the true network over attributes 1–12; 75 sampled patient observations; the observed data.)

64 Learning with a uniform prior on network structures:
(Figure: the same true network over attributes 1–12, 75 sampled patient observations, and observed data, now learned under a uniform prior.)

65 The blessing of abstraction (Mansinghka, Kemp, Tenenbaum, Griffiths, UAI 06)
(Figure: the true structure of Bayesian network N over 16 variables. As the number of samples grows, the model simultaneously infers the abstract theory (classes Z, with class-pair edge probabilities such as c1 → c2: 0.4) and the network N generating data D. The blessing of abstraction, or “You can have your palette and learn it too!”)

66 The flexibility of a nonparametric prior
(Figure: a true Bayesian network N over 12 variables with no class structure; the nonparametric prior falls back to a single class c1 with a low edge probability (0.1), and network N is still learned from data D as the number of samples grows.)

67 Conclusions Modeling human inductive learning as Bayesian inference over hierarchies of flexibly structured representations, spanning word learning, property induction, and causal learning. Abstract knowledge (e.g., “shape varies across categories but not within; texture, color, size vary both within and across”; theory entities B, D, S with laws B → D, D → S) constrains structure (e.g., a tree over mouse, squirrel, chimp, gorilla; word labels “dax”, “zav”, “fep”), which in turn constrains the observed data.

68 Conclusions Key workshop questions
How can “aha!” representational changes be produced by rational computational mechanisms? The blessing of abstraction in HBMs. What can we say about “meta-representations”? What might be the fixed innate substrate over which representations change? We may go surprisingly far with HBMs defined over simple and general but fixed languages: grammars for structural forms, relational schemas. How can we discover novel “theoretical entities”? Nonparametric Bayesian learning of relational systems of concepts.

69 Conclusions Looking forward
A new way to think about cognitive development – beyond “nature vs. nurture”, “symbolic vs. statistical”, … How to learn theories with more expressive meta-representations (e.g., predicate logic)? Can we give general-purpose, computationally tractable and psychologically plausible search algorithms?

70 The end

71 A single way of structuring a domain rarely describes all its features…
Raw data matrix:

72 A single way of structuring a domain rarely describes all its features…
Conventional clustering: infinite (CRP) mixture

73 Discovering multiple representations to explain different features (Shafto et al.; Shafto, Mansinghka, Tenenbaum, Yamada & Ueda, 2007) CrossCat: System 1 System 2 System 3

74 Analysis of US House votes 1989-90
101 senators × 638 issues: CrossCat discovers multiple systems of classes over issues, e.g., social hot-button issues; environment & agriculture; core democratic platform; law and order; big military; …

