Published by Hilary Hodge. Modified over 6 years ago.
1
Todo One slide on “beyond similarity-based induction”
Finish slides.
Make sure to practice talking through the shape bias part: how to explain the variables in the HBM; how I will characterize this model, almost a toy model, but it illustrates the idea and serves as a warmup.
Can Dirichlet-multinomial represent the difference between and ?
One slide on “beyond similarity-based induction”.
Work out timecheck for skipping.
Think through what I’ll say, including learning based on instruction.
Practice talking about the relational learning experiment.
Practice talking about bottom-up causal learning: the intuitions in the blicket detector experiment and the disease-symptom case.
Look at how to get the number of block model slides down to just one.
2
Bayesian models of human inductive learning
Josh Tenenbaum MIT Department of Brain and Cognitive Sciences Computer Science and AI Lab (CSAIL) Acknowledgments: Tom Griffiths, Charles Kemp, The Computational Cognitive Science group at MIT All the researchers whose work I’ll discuss.
3
Lab members
Chris Baker, Noah Goodman, Tom Griffiths (alum), Charles Kemp, Vikash Mansinghka, Amy Perfors, Lauren Schmidt, Pat Shafto
Game plan: biology. If 5 minutes left at the end, do words and theory acquisition. If no time left, just do theory acquisition.
Funding: AFOSR Cognition and Decision Program, AFOSR MURI, DARPA IPTO, NSF, HSARPA, NTT Communication Sciences Laboratories, James S. McDonnell Foundation
4
The probabilistic revolution in AI
Principled and effective solutions for inductive inference from ambiguous data: Vision Robotics Machine learning Expert systems / reasoning Natural language processing Standard view: no necessary connection to how the human brain solves these problems. “Many people in machine learning get into the field because they are interested in how humans learn, rather than how convex functions are optimized, and how we can get machines to be more like humans.”
5
Everyday inductive leaps
How can people learn so much about the world from such limited evidence? Learning concepts from examples “horse” “horse” “horse”
6
Learning concepts from examples
“tufa”
7
Everyday inductive leaps
How can people learn so much about the world from such limited evidence? Kinds of objects and their properties The meanings of words, phrases, and sentences Cause-effect relations The beliefs, goals and plans of other people Social structures, conventions, and rules
8
The solution Strong prior knowledge (inductive bias).
9
The solution Strong prior knowledge (inductive bias).
How does background knowledge guide learning from sparsely observed data? What form does the knowledge take, across different domains and tasks? How is that knowledge itself acquired? How can inductive biases be so strong yet so flexible? Our goal is a computational framework for answering these questions.
10
Notes on slide before
Highlight the issue of inductive bias, balancing flexibility and constraint. Put this text after the third question on the previous slide, setting up the fourth.
In principle, inductive biases don’t have to be learned. In ML, they are often thought of as hard-wired, engineered complements to the data-driven component; in cog sci, as innate knowledge. But some have to be learned. Some of the important ones aren’t present in the youngest children, but appear later, and are clearly influenced by experience. We are also ready to give them up and adopt new biases seemingly very quickly, e.g., prism adaptation, physics adaptation.
The third and fourth questions – the problem of learning good inductive biases, and exploiting strong biases while maintaining flexibility – are key ones for ML. And they may be key to distinctively human learning: the cognitive niche. As best we can tell, other animals can be very smart, and often have very clever inductive biases. But more or less these are hard-wired through evolution; they think about the same things they have always thought about. Exceptions to this trend are the most human-like ways that animals act.
Continue to consider the order of the three questions. The issue is flow, both in the intro and in the talk transitions (from the shape bias to the rest). Also, the tradeoff in representational richness vs. learnability (BUT PROBABLY KEEP THIS ONLY IMPLICIT, NOT EXPLICIT).
11
The approach: from statistics to intelligence
1. How does background knowledge guide learning from sparsely observed data? Bayesian inference, with priors based on background knowledge. 2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories. 3. How is background knowledge itself acquired? Hierarchical Bayesian models, with inference at multiple levels of abstraction. 4. How can inductive biases be so strong yet so flexible? Nonparametric models, growing in complexity as the data require.
12
Notes on slide before
All these themes are familiar in contemporary ML, but we are mixing them up in some slightly new ways, driven by the problems of human learning. Even if your primary interest isn’t in human learning, I hope there might be some lessons here about how to design more human-like ML systems.
13
Outline Three case studies in inductive learning. Word learning
Property induction Causal learning
14
The “shape bias” in word learning (Landau, Smith, Jones 1988)
This is a dax. Show me the dax… English-speaking children show the shape bias at 24 months, but not at 20 months. The shape bias is a useful inductive constraint: the majority of early words are labels for object categories, and shape may be the best cue to object-category membership.
15
Is the shape bias learned?
Smith et al. (2000) trained 17-month-olds on labels for 4 artificial categories: “wib”, “lug”, “zup”, “div”. After 8 weeks of training (20 min/week), 19-month-olds show the shape bias (“transfer learning”): This is a dax. Show me the dax…
16
Transfer to real-world vocabulary
17
Two things that are remarkable here
How much the shape bias helps in learning new words from examples. How it itself can be learned from only a few examples! Maybe have these points appear on the right side of the previous slide...?
18
Learning about feature variability
Marbles of different colors: … ?
19
Learning about feature variability
Marbles of different colors: … ?
20
A hierarchical Bayesian model
Color varies across bags but not much within bags Level 2: Bags in general Level 1: Bag proportions mostly red mostly yellow mostly brown mostly blue mostly green … Data …
21
A hierarchical Bayesian model
Level 3: Prior expectations on bags in general Level 2: Bags in general Level 1: Bag proportions … Data …
22
A hierarchical Bayesian model
Level 3: Prior expectations on bags in general Level 2: Bags in general x “Bag 1 is mostly red” Level 1: Bag proportions … Data …
23
A hierarchical Bayesian model
Level 3: Prior expectations on bags in general x Level 2: Bags in general “Bag 2 is mostly yellow” Level 1: Bag proportions … Data …
24
A hierarchical Bayesian model
Level 3: Prior expectations on bags in general Level 2: Bags in general “Color varies across bags but not much within bags” Level 1: Bag proportions … Data …
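The Level 1/Level 2 inference above can be sketched in a few lines of code. This is a minimal toy version, not the model from the talk: it assumes a symmetric Dirichlet at Level 1 and a coarse uniform grid prior over a single concentration parameter alpha at Level 2 (all names and numbers below are illustrative assumptions). A small alpha means bags are nearly pure, so inferring a small alpha is inferring that "color varies across bags but not within bags".

```python
import math

def bag_loglik(counts, alpha, n_colors):
    """Log marginal likelihood of one bag's color counts under a
    symmetric Dirichlet-multinomial with concentration alpha."""
    n = sum(counts)
    ll = math.lgamma(alpha) - math.lgamma(alpha + n)
    for c in counts:
        ll += math.lgamma(alpha / n_colors + c) - math.lgamma(alpha / n_colors)
    return ll

# Three observed bags of 20 marbles, each nearly a single color
# (mostly red / mostly yellow / mostly brown).
bags = [[20, 0, 0, 0, 0], [0, 20, 0, 0, 0], [0, 0, 20, 0, 0]]

# Level-2 inference: score candidate concentrations on a coarse grid
# (uniform prior, so the posterior is proportional to the likelihood).
grid = [0.1, 1.0, 10.0, 100.0]
scores = {a: sum(bag_loglik(b, a, 5) for b in bags) for a in grid}
best = max(scores, key=scores.get)
print(best)  # 0.1: the pure-bags data favor low within-bag variability
```

Given this inferred alpha, a single draw from a new bag (one red marble) already predicts the whole bag is mostly red, which is the marble analogue of the shape bias.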
25
Learning the shape bias
“wib” “lug” “zup” “div” Training Assume independent Dirichlet-multinomial models for each dimension. Should learn: Shape varies across categories but not within categories. Texture, color, size vary within categories.
26
Learning the shape bias
Training This is a dax. Show me the dax… Test
27
Insert slide on limitations (parallel to next slide)
This is a very simple model. It leaves out or oversimplifies many things. It assumes that there is a single multinomial variable coding for each nameable shape category, and that we are learning names only for object categories, not other kinds of words. But it is not so clear whether these assumptions really are so oversimplified in the case of human children. The shape bias is an abstract inductive constraint, useful for learning more specific knowledge at the level of individual category labels, but it can itself be learned remarkably quickly. Perhaps that is because it depends on higher-level inductive biases, already in place, in the form of these assumptions. Even by age 12 months, there is evidence that infants are particularly interested in objects, categorize objects by basic-level shape categories, and many of their first words are names for objects. But it is also possible to extend the model so it does not depend on these assumptions… next slide.
28
Extensions
Learning with weaker shape representations: shape features (holes, curvature, edges, aspect ratio) vs. other features (main color, color distribution, oriented texture, roughness).
Learning to transfer selectively, dependent on ontological kinds. By age three, children know that a shape bias is useful for solid object categories (ball, book, toothbrush, …), while a material bias is useful for nonsolid substance categories (juice, sand, toothpaste, …). [Figure: training categories and test category 5, with ? marks for the held-out judgments]
29
Learning to transfer selectively
Let k_i be the ontological kind of category i. Given k_i, we could learn a separate Dirichlet-multinomial model for each ontological kind: variability in solidity, shape, material within kind 1; variability in solidity, shape, material within kind 2. “dax” shape “toof” material solid non-solid “dax” “zav” “fep” “wif” “wug” “toof”
30
Notes on slide before
Say this, and if possible figure out how to put it on the slide:
Feature variability for kind 1:
Solidity: fixed across categories (all solid).
Shape: variable across categories but fixed within categories.
Color, texture: variable within categories.
Feature variability for kind 2:
Solidity: fixed across categories (all nonsolid).
Shape: variable within categories.
Color, texture: variable across categories but fixed within categories.
31
Learning to transfer selectively
Chicken-and-egg problem: We don’t know the partition into ontological kinds. Solution: Define a nonparametric prior over this partition. The input: “wug” “wif” “wug” “dax” “dax” “dax” “zav” “wif” “wif” “zav” solid “zav” “wug” non-solid
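One standard construction for such a nonparametric prior over partitions is the Chinese restaurant process. The sketch below illustrates the general technique under that assumption; it is not the exact model from the talk, and `crp_partition` and its arguments are hypothetical names.

```python
import random

def crp_partition(n_items, alpha, rng):
    """Sample a partition of items into kinds from a Chinese restaurant
    process: item i joins existing kind k with probability
    counts[k] / (i + alpha), or starts a new kind with prob alpha / (i + alpha)."""
    assignments = []
    counts = []                      # items per kind so far
    for i in range(n_items):
        weights = counts + [alpha]   # last slot = open a new kind
        r = rng.random() * (i + alpha)
        acc = 0.0
        k = 0
        for k, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if k == len(counts):
            counts.append(1)         # new kind
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

rng = random.Random(0)
part = crp_partition(10, 1.0, rng)
print(part)  # e.g. a few categories share kinds, some sit alone
```

Because the number of kinds is unbounded but rich-get-richer, the prior lets the data decide how many ontological kinds (solid, nonsolid, …) to posit.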
32
Transition - outline slide - knowledge? More natural tasks
33
Outline Three case studies in inductive learning. Word learning
Property induction Causal learning
34
Property induction How likely is the conclusion, given the premises?
Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. → Flies have T9 hormones.
Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. → Horses have T9 hormones.
Gorillas have T9 hormones. Chimps have T9 hormones. Monkeys have T9 hormones. Baboons have T9 hormones. → Horses have T9 hormones.
(“Similarity”, “Typicality”, “Diversity”)
35
The computational problem
? Horse Cow Chimp Gorilla Mouse Squirrel Dolphin Seal Rhino Elephant ? Features New property “transfer learning”, “semi-supervised learning” 85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…
36
Hierarchical Bayesian Framework
Background knowledge as a hierarchy:
F: form. A tree with species at the leaf nodes. P(form)
S: structure. A particular tree over mouse, squirrel, chimp, gorilla. P(structure | form)
D: data. Features F1, F2, F3, F4 over mouse, squirrel, chimp, gorilla, plus the new property “Has T9 hormones” (?). P(data | structure)
37
The value of structural form knowledge: a more abstract level of inductive bias
38
Hierarchical Bayesian Framework
F: form Tree with species at leaf nodes mouse squirrel chimp gorilla S: structure hormones Has T9 F1 F2 F3 F4 mouse squirrel chimp gorilla ? D: data … Property induction
39
P(D|S): How the structure constrains the data of experience
Define a stochastic process over structure S that generates candidate property extensions h. Intuition: properties should vary smoothly over structure. Smooth: P(h) high Not smooth: P(h) low
40
P(D|S): How the structure constrains the data of experience
Gaussian Process (~ random walk, diffusion) [Zhu, Lafferty & Ghahramani 2003] y Threshold h
41
P(D|S): How the structure constrains the data of experience
Gaussian Process (~ random walk, diffusion) [Zhu, Lafferty & Ghahramani 2003] y Threshold h
42
A graph-based prior
Let dij be the length of the edge between i and j (dij = ∞ if i and j are not connected). Form the graph Laplacian Δ from edge weights wij = 1/dij, and place a Gaussian prior y ~ N(0, Σ) with Σ⁻¹ = Δ + σ⁻²I (Zhu, Lafferty & Ghahramani, 2003).
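This prior can be computed directly. The sketch below uses a hypothetical 4-node chain graph and the regularized covariance Σ = (Δ + I/σ²)⁻¹, a common way to make a Zhu-et-al.-style graph prior proper; treat the exact regularization term as an assumption.

```python
import numpy as np

# Hypothetical 4-node chain: 0 - 1 - 2 - 3, unit edge lengths d_ij = 1.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
n, sigma2 = 4, 1.0

# Graph Laplacian Delta with edge weights w_ij = 1 / d_ij.
Delta = np.zeros((n, n))
for i, j, d in edges:
    w = 1.0 / d
    Delta[i, i] += w
    Delta[j, j] += w
    Delta[i, j] -= w
    Delta[j, i] -= w

# Prior covariance Sigma = (Delta + I / sigma^2)^{-1}; y ~ N(0, Sigma),
# so properties y vary smoothly over the graph.
Sigma = np.linalg.inv(Delta + np.eye(n) / sigma2)
print(Sigma[0, 1], Sigma[0, 3])  # adjacent nodes covary more than distant ones
```

Thresholding a draw y from this Gaussian yields a binary property extension h that respects the graph: exactly the smooth hypotheses the slides describe as having high prior probability.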
43
Structure S Data D Features
Species 1 Species 2 Species 3 Species 4 Species 5 Species 6 Species 7 Species 8 Species 9 Species 10 Features 85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…
45
[c.f., Lawrence, 2004; Smola & Kondor 2003]
46
Structure S Data D Species 1 Species 2 Species 3 Species 4 Species 5 Species 6 Species 7 Species 8 Species 9 Species 10 ? Features New property 85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…
47
Cows have property P. Elephants have property P. Horses have property P. Tree 2D Gorillas have property P. Mice have property P. Seals have property P. All mammals have property P.
48
Testing different priors
Inductive bias Correct bias Wrong bias Too weak bias Too strong bias
49
A connectionist alternative (Rogers and McClelland, 2004)
Species Features Emergent structure: clustering on hidden unit activation vectors
50
Learning about spatial properties
Geographic inference task: “Given that a certain kind of native American artifact has been found in sites near city X, how likely is the same artifact to be found near city Y?” 2D Tree
51
Big question: Do we have slides on “beyond similarity”?
CURRENTLY, NONE OR JUST ONE. Points beyond similarity, and also other ways of learning (e.g., being told directly about relations). Consider boiling down to one slide, a la AFOSR, on fw. No matter what, don’t make the examples interactive. Or, put these slides in the “grand tour” of the conclusion.
52
Beyond similarity-based induction
“Given that A has property P, how likely is it that B does?” Biological property Disease Tree Web e.g., P = “has X cells” Herring Tuna Mako shark Sand shark Dolphin e.g., P = “has X disease” Human Kelp Kelp Human Dolphin Sand shark Mako shark Tuna Herring
53
Hierarchical Bayesian Framework
F: form Chain Tree Space chimp gorilla squirrel mouse mouse squirrel chimp gorilla mouse squirrel S: structure gorilla chimp F1 F2 F3 F4 D: data mouse squirrel chimp gorilla
54
Discovering structural forms
Snake Turtle Crocodile Robin Ostrich Bat Orangutan
Snake Turtle Crocodile Robin Bat Ostrich Orangutan
Ostrich Robin Crocodile Snake Turtle Bat Orangutan
55
Discovering structural forms
Snake Turtle Crocodile Robin Ostrich Bat Orangutan
“Great chain of being”: Rock, Plant, Snake, Turtle, Crocodile, Robin, Bat, Ostrich, Orangutan, Angel, God
Linnaeus: Ostrich, Robin, Crocodile, Snake, Turtle, Bat, Orangutan
56
People can discover structural forms
Scientific discoveries Children’s cognitive development Hierarchical structure of category labels Clique structure of social groups Cyclical structure of seasons or days of the week Transitive structure for value Tree structure for biological species Periodic structure for chemical elements “great chain of being” Systema Naturae Kingdom Animalia Phylum Chordata Class Mammalia Order Primates Family Hominidae Genus Homo Species Homo sapiens (1579) (1735) (1837)
57
Typical structure learning algorithms assume a fixed structural form
Flat Clusters Line Circle K-Means Mixture models Competitive learning Guttman scaling Ideal point models Circumplex models Tree Grid Euclidean Space Hierarchical clustering Bayesian phylogenetics Self-Organizing Map Generative topographic mapping MDS PCA Factor Analysis
58
“Universal Structure Learner”
The ultimate goal “Universal Structure Learner” K-Means Hierarchical clustering Factor Analysis Guttman scaling Circumplex models Self-Organizing maps ··· Data Representation
59
A “universal grammar” for structural forms
Process Form Process
60
Hierarchical Bayesian Framework
F: form Favors simplicity Favors smoothness [Zhu et al., 2003] mouse squirrel chimp gorilla S: structure F1 F2 F3 F4 D: data mouse squirrel chimp gorilla
62
Structural forms from relational data
Dominance hierarchy Tree Cliques Ring Primate troop Bush administration Prison inmates Kula islands “x beats y” “x told y” “x likes y” “x trades with y”
63
Lab studies of learning structural forms
Training: Observe messages passed between employees (a, b, c, …) in a company. Transfer test: Predict messages sent to and from new employees x and y. Link observed in training Link observed in transfer test
64
Notes on prior slide Make sure to explain how the relational structural form model is defined. Make sure to get exp description fluent.
65
Development of structural forms as more data are observed
“blessing of abstraction”
66
Beyond “Nativism” versus “Empiricism”
“Nativism”: Explicit knowledge of structural forms for core domains is innate. Atran (1998): The tendency to group living kinds into hierarchies reflects an “innately determined cognitive structure”. Chomsky (1980): “The belief that various systems of mind are organized along quite different principles leads to the natural conclusion that these systems are intrinsically determined, not simply the result of common mechanisms of learning or growth.” “Empiricism”: General-purpose learning systems without explicit knowledge of structural form. Connectionist networks (e.g., Rogers and McClelland, 2004). Traditional structure learning in probabilistic graphical models.
67
Outline Three case studies in inductive learning. Word learning
Property induction Causal learning
68
Learning causal relations
Bayesian network Data
69
Learning causal relations
Two types of variables: Contact(Object,Machine), Active(Machine) Contact can cause Activation Machines are (near) deterministic Three types of variables: Behavior(X), Disease(X), Symptom(X) Behaviors can cause Diseases Diseases can cause Symptoms Abstract theory (c.f. First-Order Probabilistic Models, BLOG) Bayesian network Data
70
Learning with a uniform prior on network structures:
True network Sample 75 observations… attributes (1-12) patients observed data
71
Learning a block-structured prior on network structures (Mansinghka et al., UAI 06): classes z with class-pair edge probabilities h (e.g., 0.0, 0.8, 0.01, 0.75, …). True network; sample 75 observations; attributes (1-12); patients; observed data.
72
“blessing of abstraction”
True structure of Bayesian network N over attributes 1-16; performance plotted against # of samples. Hierarchy: Abstract theory (classes Z, class-pair edge probabilities h, e.g. h(c1, c2) = 0.4 and 0.0 for the other class pairs) → class (Z) → edge (N) → Network N → Data D, compared against a flat model edge (G) → Data D. The abstract theory is learned from fewer samples than the concrete network: the “blessing of abstraction”. (Mansinghka, Kemp, Tenenbaum, Griffiths, UAI 06)
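The block-structured (class-based) prior on networks can be sketched by sampling edges whose probabilities depend only on the classes of the endpoint variables. The classes and probabilities below are invented for illustration, not the values from the experiment.

```python
import random

rng = random.Random(1)

# Hypothetical abstract theory: behaviors (B) can cause diseases (D),
# and diseases can cause symptoms (S); all other class pairs get 0.
classes = ['B', 'B', 'D', 'D', 'S', 'S']
eta = {('B', 'D'): 0.8, ('D', 'S'): 0.8}  # class-pair edge probabilities

edges = [(i, j)
         for i in range(len(classes)) for j in range(len(classes))
         if i != j and rng.random() < eta.get((classes[i], classes[j]), 0.0)]
print(edges)  # every sampled edge respects the theory's causal laws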
73
Summary
Modeling human inductive learning as Bayesian inference over hierarchies of flexibly structured representations: word learning, property induction, causal learning.
Abstract knowledge: “Shape varies across categories but not within categories; texture, color, size vary within categories.” Classes of variables: B, D, S; causal laws: B → D, D → S.
Structure and Data levels: mouse, squirrel, chimp, gorilla; word labels “dax”, “zav”, “fep”; features F1, F2, F3, F4.
74
Insights We have the computational tools to begin studying core questions of human learning (and building human-like ML?) What is the content and form of human knowledge, at multiple levels of abstraction? How does abstract domain knowledge guide new learning? How can abstract domain knowledge itself be learned? How can inductive biases be so strong yet so flexible? A different way to think about the development of natural (or artificial?) cognitive systems. Powerful inductive biases can be learned, and can be learned “from the top down”, together with or prior to learning more concrete knowledge. Beyond traditional dichotomies of cog sci (and AI). How can domain-general learning mechanisms acquire domain-specific representations? How can statistical learning work together with symbolic, flexibly structured representations?
76
Extra slides
77
“Universal Grammar” Grammar Phrase structure Utterance Speech signal
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG) P(phrase structure | grammar) P(utterance | phrase structure) P(speech | utterance) (c.f. Chater and Manning, 2006) P(grammar | UG) Grammar Phrase structure Utterance Speech signal
78
Vision as probabilistic parsing
(Han & Zhu, 2006; c.f., Zhu, Yuanhao & Yuille NIPS 06 )
79
Learning word meanings
Whole-object principle Shape bias Taxonomic principle Contrast principle Basic-level bias Principles Structure Data
80
Word learning Bayesian inference over tree-structured hypothesis space: (Xu & Tenenbaum; Schmidt & Tenenbaum) “tufa” “tufa” “tufa”
81
Causal learning with prior knowledge (Griffiths, Sobel, Tenenbaum & Gopnik)
“Backwards blocking” paradigm: initial AB trials, then A trials.
82
Learning grounded causal models (Goodman, Mansinghka & Tenenbaum)
A child learns that petting the cat leads to purring, while pounding leads to growling. But how to learn these symbolic event concepts over which causal links are defined? a b c a b c a b c a b c
83
The big picture What we need to understand: the mind’s ability to build rich models of the world from sparse data. Learning about objects, categories, and their properties. Causal inference Understanding other people’s actions, plans, thoughts, goals Language comprehension and production Scene understanding What do we need to understand these abilities? Bayesian inference in probabilistic generative models Hierarchical models, with inference at all levels of abstraction Structured representations: graphs, grammars, logic Flexible representations, growing in response to observed data
84
Overhypotheses
Syntax: Universal Grammar (Chomsky)
Phonology: faithfulness constraints, markedness constraints (Prince, Smolensky)
Word learning: shape bias, principle of contrast, whole-object bias (Markman)
Folk physics: objects are unified, bounded and persistent bodies (Spelke)
Predicability: M-constraint (Keil)
Folk biology: taxonomic principle (Atran)
85
The chicken-and-egg problem of structure learning and feature selection
A raw data matrix:
86
The chicken-and-egg problem of structure learning and feature selection
Conventional clustering (CRP mixture):
87
Learning multiple structures to explain different feature subsets (Shafto, Kemp, Mansinghka, Gordon & Tenenbaum, 2006) CrossCat: System 1 System 2 System 3
88
Beyond similarity-based induction
Inference based on dimensional thresholds: (Smith et al., 1993) Inference based on causal relations: (Medin et al., 2004; Coley & Shafto, 2003) Poodles can bite through wire. German shepherds can bite through wire. Dobermans can bite through wire. German shepherds can bite through wire. Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria. Salmon carry E. Spirus bacteria.
89
Form of background knowledge
Form of background knowledge, by property type:
“has T9 hormones”: taxonomic tree + diffusion process
“can bite through wire”: directed chain + drift process
“carry E. Spirus bacteria”: directed network + noisy transmission
Hypotheses: property extensions over classes A-G generated by each process.
90
Beyond similarity-based induction
“Given that X has property P, how likely is it that Y does?” Herring Biological property Tuna Mako shark Sand shark Dolphin Human Disease property Kelp Tree Web Sand shark (Shafto, Kemp, Bonawitz, Coley & Tenenbaum) Kelp Herring Tuna Mako shark Human Dolphin
91
Node-replacement graph grammars
Production (Line) Derivation
92
Node-replacement graph grammars
Production (Line) Derivation
93
Node-replacement graph grammars
Production (Line) Derivation
94
Model fitting Evaluate each form in parallel
For each form, heuristic search over structures based on greedy growth from a one-node seed:
95
Synthetic 2D data
Data: continuous features drawn from a Gaussian field over these points. Model selection results: log posterior probabilities for Flat, Line, Ring, Tree, Grid.
96
True Flat Line Ring Tree Grid Scores
97
The “nonparametric safety-net”
True structure of graphical model G: a ring over 12 nodes; performance plotted against # of samples. Compared models: a flat prior over edges (edge (G) → Graph G → Data D) versus an abstract theory (Abstract theory Z → class (z) → Graph G → Data D).
98
Goal-directed action (production and comprehension)
(Wolpert et al., 2003)
99
Goal inference as inverse probabilistic planning
Goal inference as inverse probabilistic planning (Baker, Tenenbaum & Saxe): constraints and goals feed rational planning ((PO)MDP), which produces actions; goal inference inverts this mapping. Human judgments vs. model predictions.
100
Individual differences in concept learning
101
Why probability matching?
Optimal behavior under some (evolutionarily natural) circumstances. Optimal betting theory, portfolio theory Optimal foraging theory Competitive games Dynamic tasks (changing probabilities or utilities) Side-effect of algorithms for approximating complex Bayesian computations. Markov chain Monte Carlo (MCMC): instead of integrating over complex hypothesis spaces, construct a sample of high-probability hypotheses. Judgments from individual (independent) samples can on average be almost as good as using the full posterior distribution.
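The MCMC point can be illustrated with a minimal Metropolis-Hastings chain over three hypotheses (the target probabilities below are toy numbers, not from the slide): the chain visits hypotheses roughly in proportion to their posterior probability, so reading off individual samples produces probability matching rather than maximizing.

```python
import math
import random

def mh_chain(logp, states, n_steps, rng):
    """Metropolis-Hastings with a symmetric (uniform) proposal:
    accept a proposed state y with probability min(1, p(y)/p(x))."""
    x = states[0]
    samples = []
    for _ in range(n_steps):
        y = rng.choice(states)                      # propose uniformly
        if math.log(rng.random()) < logp[y] - logp[x]:
            x = y                                   # accept
        samples.append(x)
    return samples

target = {'h1': 0.7, 'h2': 0.2, 'h3': 0.1}          # toy posterior
logp = {h: math.log(p) for h, p in target.items()}
rng = random.Random(0)
samples = mh_chain(logp, list(target), 20000, rng)
freqs = {h: samples.count(h) / len(samples) for h in target}
print(freqs)  # empirical visit frequencies approximate the target
```

A judgment read from one such sample is correct most of the time, and the frequencies of judgments across trials (or people) match the posterior, which is the probability-matching signature.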
102
Markov chain Monte Carlo
(Metropolis-Hastings algorithm)
103
Bayesian inference in perception and sensorimotor integration
(Weiss, Simoncelli & Adelson 2002) (Kording & Wolpert 2004)
104
Learning concepts from examples
“tufa” Word learning “tufa” “tufa” Property induction Cows have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. All mammals have T9 hormones. Cows have T9 hormones. Sheep have T9 hormones. Goats have T9 hormones. All mammals have T9 hormones.
105
Clustering models for relational data
Social networks: block models Does prisoner x like prisoner y? Does person x respect person y?
106
Learning systems of concepts with infinite relational models (Kemp, Tenenbaum, Griffiths, Yamada & Ueda, AAAI 06) concept predicate concept Biomedical predicate data from UMLS (McCrae et al.): 134 concepts: enzyme, hormone, organ, disease, cell function ... 49 predicates: affects(hormone, organ), complicates(enzyme, cell function), treats(drug, disease), diagnoses(procedure, disease) …
107
Learning a medical ontology
e.g., Diseases affect Organisms Chemicals interact with Chemicals Chemicals cause Diseases
108
Clustering arbitrary relational systems
International relations circa 1965 (Rummel) 14 countries: UK, USA, USSR, China, …. 54 binary relations representing interactions between countries: exports to( USA, UK ), protests( USA, USSR ), …. 90 (dynamic) country features: purges, protests, unemployment, communists, # languages, assassinations, ….
110
Learning a hierarchical ontology
111
Relational Data F: form S: structure D: data 1 2 4 5 7 8 3 6
People cluster into cliques S: structure 1 2 4 5 7 8 3 6 D: data = “x likes y”
113
Bayesian models of cognition
Visual perception [Weiss, Simoncelli, Adelson, Richards, Freeman, Feldman, Kersten, Knill, Maloney, Olshausen, Jacobs, Pouget, ...] Language acquisition and processing [Brent, de Marcken, Niyogi, Klein, Manning, Jurafsky, Keller, Levy, Hale, Johnson, Griffiths, Perfors, Tenenbaum, …] Motor learning and motor control [Ghahramani, Jordan, Wolpert, Kording, Kawato, Doya, Todorov, Shadmehr, …] Associative learning [Dayan, Daw, Kakade, Courville, Touretzky, Kruschke, …] Memory [Anderson, Schooler, Shiffrin, Steyvers, Griffiths, McClelland, …] Attention [Mozer, Huber, Torralba, Oliva, Geisler, Yu, Itti, Baldi, …] Categorization and concept learning [Anderson, Nosofsky, Rehder, Navarro, Griffiths, Feldman, Tenenbaum, Rosseel, Goodman, Kemp, Mansinghka, …] Reasoning [Chater, Oaksford, Sloman, McKenzie, Heit, Tenenbaum, Kemp, …] Causal inference [Waldmann, Sloman, Steyvers, Griffiths, Tenenbaum, Yuille, …] Decision making and theory of mind [Lee, Stankiewicz, Rao, Baker, Goodman, Tenenbaum, …]
114
Concept learning Bayesian inference over tree-structured hypothesis space: (Xu & Tenenbaum; Schmidt & Tenenbaum) “tufa” “tufa” “tufa”
115
Some questions How confident are we that a tree-structured model is the best way to characterize this learning task? How do people construct an appropriate tree-structured hypothesis space? What other kinds of structured probabilistic models may be needed to explain other inductive leaps that people make, and how do people acquire these different structured models? Are there general unifying principles that explain our capacity to learn and reason with structured probabilistic models across different domains?
116
Basics of Bayesian inference
Bayes’ rule: P(h|d) = P(d|h) P(h) / Σh′ P(d|h′) P(h′). An example. Data: John is coughing. Some hypotheses: John has a cold; John has lung cancer; John has a stomach flu. Likelihood P(d|h) favors 1 and 2 over 3. Prior probability P(h) favors 1 and 3 over 2. Posterior probability P(h|d) favors 1 over 2 and 3.
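A quick numerical version of the example, with invented numbers chosen only to match the slide's qualitative pattern (likelihood favors hypotheses 1 and 2, prior favors 1 and 3):

```python
# P(coughing | h): likelihood favors cold and lung cancer over stomach flu.
lik = {'cold': 0.8, 'lung cancer': 0.8, 'stomach flu': 0.1}
# P(h): prior favors cold and stomach flu over lung cancer.
prior = {'cold': 0.5, 'lung cancer': 0.01, 'stomach flu': 0.49}

# Bayes' rule: posterior proportional to likelihood times prior.
unnorm = {h: prior[h] * lik[h] for h in prior}
Z = sum(unnorm.values())
post = {h: round(p / Z, 3) for h, p in unnorm.items()}
print(post)  # {'cold': 0.875, 'lung cancer': 0.018, 'stomach flu': 0.107}
```

Only "cold" scores well on both factors, so it dominates the posterior, exactly the pattern the slide describes.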
117
Experiments on property induction (Osherson, Smith, Wilkie, Lopez, Shafir, 1990)
20 subjects rated the strength of 45 arguments: X1 have property P. (e.g., Cows have T4 hormones.) X2 have property P. X3 have property P. All mammals have property P [General argument] 20 subjects rated the strength of 36 arguments: X1 have property P. Horses have property P [Specific argument]
118
Feature rating data (Osherson and Wilkie)
People were given 48 animals, 85 features, and asked to rate whether each animal had each feature. E.g., elephant: 'gray' 'hairless' 'toughskin' 'big' 'bulbous' 'longleg' 'tail' 'chewteeth' 'tusks' 'smelly' 'walks' 'slow' 'strong' 'muscle’ 'quadrapedal' 'inactive' 'vegetation' 'grazer' 'oldworld' 'bush' 'jungle' 'ground' 'timid' 'smart' 'group'
119
Beyond similarity-based induction
Reasoning based on dimensional thresholds: (Smith et al., 1993) Reasoning based on causal relations: (Medin et al., 2004; Coley & Shafto, 2003) Poodles can bite through wire. German shepherds can bite through wire. Dobermans can bite through wire. German shepherds can bite through wire. Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria. Salmon carry E. Spirus bacteria.
120
Different sources for priors
Chimps have T9 hormones. Gorillas have T9 hormones. Taxonomic similarity Poodles can bite through wire. Dobermans can bite through wire. Jaw strength Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria. Food web relations
121
The approach: from statistics to intelligence
1. How does background knowledge guide learning from sparsely observed data? Bayesian inference: 2. How is background knowledge itself acquired? Hierarchical probabilistic models, with inference at multiple levels of abstraction. Flexible nonparametric models in which complexity grows with the data. 3. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories.
123
Draft slides for shape bias in “icml07-draft”
125
Semi-supervised learning: similarity-based approach
“has T9 hormones”, “is a tufa”
126
Semi-supervised learning: similarity-based approach
“has T9 hormones”, “is a tufa”
127
Similarity-based models
Human judgments of argument strength Model predictions Cows have property P. Elephants have property P. Horses have property P. All mammals have property P. Gorillas have property P. Mice have property P. Seals have property P. All mammals have property P.
128
P(D|S): How the structure constrains the data of experience
Define a stochastic process over structure S that generates hypotheses h. For generic properties, the prior should favor hypotheses that vary smoothly over the structure. Many properties of biological species were actually generated by such a process (i.e., mutation + selection). Smooth: P(h) high. Not smooth: P(h) low.
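A minimal generative sketch of this idea, assuming a mutation-style process (the tree, flip probability, and function name are all illustrative): a binary property is copied from parent to child along each branch and flips with small probability, so sampled hypotheses tend to be smooth over the tree.

```python
import random

# Illustrative mutation-style process: sample a binary property that
# tends to vary smoothly over a tree, by copying the parent's value
# down each edge and flipping it with small probability.

def sample_smooth_property(tree, root, flip_prob=0.1, rng=None):
    """tree: dict mapping each node to a list of its children."""
    rng = rng or random.Random(0)
    values = {root: rng.random() < 0.5}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            flip = rng.random() < flip_prob
            values[child] = values[node] ^ flip  # inherit, maybe mutate
            stack.append(child)
    return values

tree = {"mammal": ["primate", "ungulate"],
        "primate": ["chimp", "gorilla"],
        "ungulate": ["horse", "cow"]}
vals = sample_smooth_property(tree, "mammal")
```

With a low flip probability, most sampled hypotheses assign the same value to siblings such as horse and cow, which is exactly the smoothness this prior is meant to favor.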
129
Cows have property P. Elephants have property P. Horses have property P. Gorillas have property P. Mice have property P. Seals have property P. → All mammals have property P. [Panel labels: Tree, 2D]
130
Property induction: “Given that X1, …, Xn have property P, how likely is it that Y does?” [Panels: Tree vs. 2D priors, for biological arguments (Horses → All mammals) and spatial arguments (Minneapolis → Houston)] (Kemp & Tenenbaum)
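The question above can be answered by hypothesis averaging, here as an almost-toy sketch (the hypothesis sets and prior weights are invented for illustration; a real model would derive them from the tree or 2D structure):

```python
# Bayesian property induction by hypothesis averaging (toy version):
# a hypothesis h is the set of species that share property P.
# P(Y has P | examples) = prior mass of consistent hypotheses that
# contain Y, divided by prior mass of all consistent hypotheses.

def generalization_prob(examples, query, hypotheses):
    """hypotheses: list of (species_set, prior_weight) pairs."""
    consistent = [(h, w) for h, w in hypotheses if set(examples) <= h]
    total = sum(w for _, w in consistent)
    if total == 0:
        return 0.0
    hit = sum(w for h, w in consistent if query in h)
    return hit / total

# Invented hypothesis space; weights loosely favor coherent groups.
hyps = [
    ({"horse", "cow", "elephant"}, 0.2),
    ({"horse", "cow", "elephant", "seal"}, 0.1),
    ({"horse", "cow"}, 0.1),
    ({"horse"}, 0.05),
]

p = generalization_prob(["horse", "cow"], "elephant", hyps)  # ≈ 0.75
```

Swapping in a prior generated over a tree versus a 2D space changes the hypothesis weights, and hence the generalization curves, which is what the Tree vs. 2D comparisons test.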
131
Perhaps no summary needed – Just go straight to next slide.
Summary so far A framework for modeling human inductive reasoning as rational statistical inference over structured knowledge representations Qualitatively different priors are appropriate for different domains of property induction. In each domain, a prior that matches the world’s structure fits people’s judgments well, and better than alternative priors. A language for representing different theories: graph structure defined over objects + probabilistic model for the distribution of properties over that graph. Remaining question: How can we learn appropriate theories for different domains? Slide will have to be modified if we don’t use other non-sim-based priors.
132
Many sources of priors
Chimps have T9 hormones. Gorillas have T9 hormones. (Similarity)
Poodles can bite through wire. Dobermans can bite through wire. (Jaw strength)
Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria. (Food web relations)
133
People can discover structural forms
Scientists Tree structure for living kinds (Linnaeus) Periodic structure for chemical elements (Mendeleev) Children Hierarchical structure of category labels Clique structure of social groups Cyclical structure of seasons or days of the week Transitive structure for value
134
Learning causal relations
Abstract theory: Diseases can cause Symptoms. Behaviors can cause Diseases. Objects can activate Machines. Activation requires contact. Machines are (near) deterministic. (cf. First-Order Probabilistic Models, BLOG)
Abstract theory → Causal structure → Data
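The blicket-detector intuition can be written down as a small Bayesian model (the prior, noise level, and trial data below are illustrative assumptions): each hypothesis specifies which blocks are blickets, and the detector activates, near-deterministically, iff at least one blicket is on it.

```python
from itertools import product

# Toy blicket-detector model: enumerate hypotheses about which blocks
# are blickets, score each by prior x likelihood, and normalize.

def blicket_posterior(blocks, trials, prior_blicket=0.3, noise=0.01):
    """trials: list of (blocks_placed_on_detector, detector_activated)."""
    scores = {}
    for bits in product([0, 1], repeat=len(blocks)):
        h = frozenset(b for b, on in zip(blocks, bits) if on)
        p = 1.0
        for b in blocks:  # independent prior per block
            p *= prior_blicket if b in h else (1 - prior_blicket)
        for placed, activated in trials:
            causes = bool(h & set(placed))  # some blicket on detector?
            p_act = 1 - noise if causes else noise
            p *= p_act if activated else 1 - p_act
        scores[h] = p
    z = sum(scores.values())
    return {h: p / z for h, p in scores.items()}

# Backward blocking: A+B activate the detector, then A alone does.
post = blicket_posterior(["A", "B"],
                         [(("A", "B"), True), (("A",), True)])
p_A = sum(p for h, p in post.items() if "A" in h)
p_B = sum(p for h, p in post.items() if "B" in h)
```

Under these made-up numbers, A is almost certainly a blicket while B falls back toward its prior: the second trial explains away B's apparent role in the first, the backward-blocking pattern seen in the experiments.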
135
First-order probabilistic theories for causal inference
136
The probabilistic revolution in AI
Principled and effective solutions for inductive inference from ambiguous data: Vision Robotics Machine learning Expert systems / reasoning Natural language processing Standard view: no necessary connection to how the human brain solves these problems. Yet, “Many people in machine learning get into the field because they are interested in how humans learn, rather than how convex functions are optimized :-), and how we can get machines to be more like humans.”
137
[Figure residue: block-model matrix over latent classes c1, c2 (link probabilities 0.0, 0.4) and objects 1–16]
138
“dax” “zav” “fep” “wif” “wug” “toof”
139
The approach: from statistics to intelligence
1. How does background knowledge guide learning from sparsely observed data? Bayesian inference: P(h | d) ∝ P(d | h) P(h).
2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas.
3. How is background knowledge itself acquired? Hierarchical Bayesian models, with inference at multiple levels of abstraction. Flexible nonparametric models in which complexity grows with the data.
140
Central themes
Bayesian inference: Bayesian models rely on a prior, and can therefore incorporate structured background knowledge.
Structured representations: structured representations provide an inductive bias that is crucial when learning from sparse or noisy data.
Multiple levels of abstraction: hierarchical models explain how structured representations can be acquired.