1
Part III: Hierarchical Bayesian Models
2
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG). [hierarchy: Universal Grammar → Grammar → Phrase structure → Utterance → Speech signal]
3
Vision (Han and Zhu, 2006)
4
Word learning. [hierarchy: Principles (whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias) → Structure → Data]
5
Hierarchical Bayesian models Can represent and reason about knowledge at multiple levels of abstraction. Have been used by statisticians for many years.
6
Hierarchical Bayesian models Can represent and reason about knowledge at multiple levels of abstraction. Have been used by statisticians for many years. Have been applied to many cognitive problems: causal reasoning (Mansinghka et al., 06), language (Chater and Manning, 06), vision (Fei-Fei, Fergus, Perona, 03), word learning (Kemp, Perfors, Tenenbaum, 06), decision making (Lee, 06)
7
Outline A high-level view of HBMs A case study: Semantic knowledge
8
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG). [hierarchy: Universal Grammar → Grammar → Phrase structure → Utterance → Speech signal, with P(grammar | UG), P(phrase structure | grammar), P(utterance | phrase structure), P(speech | utterance)]
9
Hierarchical Bayesian model: P(G|U), P(s|G), P(u|s). [diagram: Universal Grammar U → Grammar G → phrase structures s_1 … s_6 → utterances u_1 … u_6]
10
Hierarchical Bayesian model: a hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy: P({u_i}, {s_i}, G | U) = P({u_i} | {s_i}) P({s_i} | G) P(G | U). [diagram: U → G → s_1 … s_6 → u_1 … u_6]
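To make the factorization concrete, here is a minimal sketch of sampling from a toy three-level hierarchy. The grammars G1/G2, the structure set, and all probabilities below are invented for illustration; they are not the tutorial's model.

```python
# Toy three-level hierarchy illustrating
# P({u_i}, {s_i}, G | U) = P({u_i} | {s_i}) P({s_i} | G) P(G | U).
# All names and probabilities here are invented for illustration.
import random

UNIVERSAL = ["G1", "G2"]             # U: grammars allowed a priori

def sample_grammar():                # P(G | U): uniform over allowed grammars
    return random.choice(UNIVERSAL)

def sample_structure(G):             # P(s | G): the grammar biases structures
    bias = 0.8 if G == "G1" else 0.3
    return "s_left" if random.random() < bias else "s_right"

def sample_utterance(s):             # P(u | s): structure largely fixes the utterance
    return s.replace("s_", "u_") if random.random() < 0.9 else "u_noise"

def sample_corpus(n=6):
    G = sample_grammar()
    structures = [sample_structure(G) for _ in range(n)]
    utterances = [sample_utterance(s) for s in structures]
    return G, structures, utterances

print(sample_corpus())
```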
11
Knowledge at multiple levels 1. Top-down inferences: how does abstract knowledge guide inferences at lower levels? 2. Bottom-up inferences: how can abstract knowledge be acquired? 3. Simultaneous learning at multiple levels of abstraction
12
Top-down inferences: given grammar G and a collection of utterances, construct a phrase structure for each utterance. [diagram: U → G → s_1 … s_6 → u_1 … u_6]
13
Top-down inferences: infer {s_i} given {u_i} and G: P({s_i} | {u_i}, G) ∝ P({u_i} | {s_i}) P({s_i} | G). [diagram: U → G → s_1 … s_6 → u_1 … u_6]
14
Bottom-up inferences: given a collection of phrase structures, learn a grammar G. [diagram: U → G → s_1 … s_6 → u_1 … u_6]
15
Bottom-up inferences: infer G given {s_i} and U: P(G | {s_i}, U) ∝ P({s_i} | G) P(G | U). [diagram: U → G → s_1 … s_6 → u_1 … u_6]
16
Simultaneous learning at multiple levels: given a set of utterances {u_i} and innate knowledge U, construct a grammar G and a phrase structure for each utterance. [diagram: U → G → s_1 … s_6 → u_1 … u_6]
17
Simultaneous learning at multiple levels: a chicken-or-egg problem: given a grammar, phrase structures can be constructed; given a set of phrase structures, a grammar can be learned. [diagram: U → G → s_1 … s_6 → u_1 … u_6]
18
Simultaneous learning at multiple levels: infer G and {s_i} given {u_i} and U: P(G, {s_i} | {u_i}, U) ∝ P({u_i} | {s_i}) P({s_i} | G) P(G | U). [diagram: U → G → s_1 … s_6 → u_1 … u_6]
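In the toy model above, everything is discrete, so this simultaneous posterior can be computed by brute-force enumeration; a sketch, with the same hypothetical distributions:

```python
# Brute-force posterior P(G, {s_i} | {u_i}, U) in the toy model:
# enumerate every grammar and every assignment of phrase structures,
# score each by P({u_i}|{s_i}) P({s_i}|G) P(G|U), and normalize.
from itertools import product

GRAMMARS = ["G1", "G2"]
STRUCTURES = ["s_left", "s_right"]

def p_g(G):                                   # P(G | U): uniform
    return 0.5

def p_s(s, G):                                # P(s | G): grammar-dependent bias
    bias = 0.8 if G == "G1" else 0.3
    return bias if s == "s_left" else 1 - bias

def p_u(u, s):                                # P(u | s): noisy rendering of s
    return 0.9 if u == s.replace("s_", "u_") else 0.1

def posterior(utterances):
    scores = {}
    for G in GRAMMARS:
        for structs in product(STRUCTURES, repeat=len(utterances)):
            p = p_g(G)
            for s, u in zip(structs, utterances):
                p *= p_s(s, G) * p_u(u, s)
            scores[(G, structs)] = p
    Z = sum(scores.values())
    return {k: v / Z for k, v in scores.items()}

post = posterior(["u_left", "u_left", "u_right"])
best = max(post, key=post.get)
print(best, round(post[best], 3))
```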
19
Hierarchical Bayesian model: P(G|U), P(s|G), P(u|s). [diagram: Universal Grammar U → Grammar G → phrase structures s_1 … s_6 → utterances u_1 … u_6]
20
Knowledge at multiple levels 1. Top-down inferences: how does abstract knowledge guide inferences at lower levels? 2. Bottom-up inferences: how can abstract knowledge be acquired? 3. Simultaneous learning at multiple levels of abstraction
21
Outline A high-level view of HBMs A case study: Semantic knowledge
22
Folk biology. R (principles): the relationships between living kinds are well described by tree-structured representations. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data): observations such as “Gorillas have hands”.
23
Folk biology. R (principles): structural form: tree. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
24
Outline A high-level view of HBMs A case study: Semantic knowledge – Property induction – Learning structured representations – Learning the abstract organizing principles of a domain
25
Property induction. R (principles): structural form: tree. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
26
Property induction. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data). Approach: work with the distribution P(D|S,R).
27
Property induction. Argument 1: Horses have T4 cells. Elephants have T4 cells. Therefore, all mammals have T4 cells. Argument 2: Horses have T4 cells. Seals have T4 cells. Therefore, all mammals have T4 cells. Previous approaches: Rips (75), Osherson et al. (90), Sloman (93), Heit (98)
28
Bayesian property induction: hypotheses. [figure: the hypothesis space of candidate extensions]
29
Bayesian property induction: hypotheses. [figure: the hypothesis space of candidate extensions]
30
Premises D: Horses have T4 cells. Elephants have T4 cells. Conclusion C: Cows have T4 cells.
31
Choosing a prior. Chimps have T4 cells. Gorillas have T4 cells. (taxonomic similarity) Poodles can bite through wire. Dobermans can bite through wire. (jaw strength) Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria. (food web relations)
32
Bayesian property induction. A challenge: we have to specify the prior, which typically includes many numbers. An opportunity: the prior can capture knowledge about the problem.
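A sketch of the generic computation behind Bayesian property induction: the probability of the conclusion is the prior mass of the hypotheses (candidate extensions of the property) that contain both the premise categories and the conclusion category. The hypothesis space and prior below are toy stand-ins for a prior generated by P(h|S,R).

```python
# Sketch of Bayesian property induction: P(c has property | D) is the
# normalized prior mass of hypotheses consistent with the premises D
# that also contain c. The extensions and weights are toy stand-ins.

HYPOTHESES = {                       # extension -> prior probability
    frozenset({"horse", "elephant", "cow"}): 0.3,
    frozenset({"horse", "elephant", "cow", "seal"}): 0.2,  # "all mammals" (toy)
    frozenset({"horse", "elephant"}): 0.4,
    frozenset({"seal"}): 0.1,
}

def p_conclusion(c, D):
    consistent = {h: p for h, p in HYPOTHESES.items() if D <= h}
    Z = sum(consistent.values())
    return sum(p for h, p in consistent.items() if c in h) / Z

D = {"horse", "elephant"}            # premises: horses and elephants have T4 cells
print(p_conclusion("cow", D))        # P(cows have T4 cells | D)
```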
33
Property induction. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
34
Biological properties. Structure: living kinds are organized into a tree. Stochastic process: nearby species in the tree tend to share properties.
35
Structure: [figure: a tree structure over many species]
37
Stochastic process: nearby species in the tree tend to share properties; in other words, properties tend to be smooth over the tree. [figure: example properties, smooth vs. not smooth]
38
Hypotheses generated by the stochastic process. [figure: sample properties over the tree]
39
Generating a property: draw a continuous value y that tends to be smooth over the tree, then threshold y to obtain the binary property h.
40
[figure: the structure S]
41
The diffusion process: generate y ~ N(0, K), where the covariance K encourages y to be smooth over the graph S, and set h_i = θ(y_i), where θ(y_i) is 1 if y_i ≥ 0 and 0 otherwise.
42
p(y|S,R): generating a property (Zhu, Lafferty, Ghahramani 03). Let y_i be the feature value at node i; values at neighboring nodes i and j are encouraged to agree, e.g. p(y|S,R) ∝ exp( −½ Σ_{i,j} w_ij (y_i − y_j)² ), so y tends to be smooth over the graph.
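A runnable sketch of this generative process, using a small chain graph in place of the tree and a covariance built from the graph Laplacian; the 0.1 ridge term is an arbitrary regularization choice, not something specified on the slides.

```python
# Diffusion process sketch: y ~ N(0, K) with K built from the graph
# Laplacian of S so y varies smoothly over the graph; h_i = θ(y_i).
# The chain graph and the 0.1 ridge term are illustrative choices.
import numpy as np

A = np.array([[0, 1, 0, 0],                # adjacency of a 4-node chain
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A             # graph Laplacian
K = np.linalg.inv(L + 0.1 * np.eye(4))     # covariance; ridge keeps it invertible

rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(4), K)  # smooth continuous values
h = (y >= 0).astype(int)                     # threshold: the binary property
print(y.round(2), h)
```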
43
Biological properties. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data). Approach: work with the distribution P(D|S,R).
44
Premises D: Horses have T4 cells. Elephants have T4 cells. Conclusion C: Cows have T4 cells.
45
Results (Osherson et al): model predictions vs. human judgments for arguments such as “Dolphins have property P. Seals have property P. Therefore, horses have property P.” and “Cows have property P. Elephants have property P. Therefore, horses have property P.” [scatter plots: model vs. human]
46
Results: model predictions vs. human judgments for arguments such as “Cows have property P. Elephants have property P. Therefore, all mammals have property P.” and “Gorillas have property P. Mice have property P. Seals have property P. Therefore, all mammals have property P.” [scatter plots: model vs. human]
47
Spatial model. R (principles): structural form: 2D space; stochastic process: diffusion. S (structure): [mouse, squirrel, chimp, gorilla embedded in a 2D space]. D (data).
48
Structure: [figure: a 2D spatial representation of the species]
50
Tree vs. 2D. [scatter plots: tree + diffusion and 2D + diffusion models against human judgments for the “horse” and “all mammals” argument sets]
51
Biological properties. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
52
Three inductive contexts. R: tree + diffusion process (“has T4 cells”); chain + drift process (“can bite through wire”); network + causal transmission (“carries E. Spirus bacteria”). S: [figures: classes A–G organized as a tree, a chain, and a directed network]
53
Threshold properties: “can bite through wire”, “has skin that is more resistant to penetration than most synthetic fibers” (Osherson et al; Blok et al). [figure: hippo, cat, lion, camel, elephant, poodle, collie, doberman ordered along a single dimension]
54
Threshold properties. Structure: the categories can be organized along a single dimension. Stochastic process: categories towards one end of the dimension are more likely to have the novel property.
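A sketch of one way to realize such a process over a single dimension: each category has a position x, and the probability of having the novel property grows toward one end. The logistic link, slope, and positions below are illustrative assumptions, not the slides' exact definition of the drift process.

```python
# Drift-style sketch for threshold properties: each category sits at a
# position x on one dimension, and P(property) rises toward one end.
# Positions, slope, and the logistic link are illustrative.
import math

positions = {"poodle": 0.2, "collie": 0.4, "doberman": 0.8, "lion": 0.9}

def p_property(x, slope=5.0, threshold=0.6):
    return 1 / (1 + math.exp(-slope * (x - threshold)))  # higher x -> more likely

for animal, x in sorted(positions.items(), key=lambda kv: kv[1]):
    print(f"{animal}: {p_property(x):.2f}")
```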
55
Results: “has skin that is more resistant to penetration than most synthetic fibers” (Blok et al; Smith et al). [scatter plots: 1D + drift and 1D + diffusion models vs. human judgments]
56
Three inductive contexts. R: tree + diffusion process (“has T4 cells”); chain + drift process (“can bite through wire”); network + causal transmission (“carries E. Spirus bacteria”). S: [figures: classes A–G organized as a tree, a chain, and a directed network]
57
Causally transmitted properties (Medin et al; Shafto and Coley): Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria. [figure: salmon → grizzly bear transmission]
58
Causally transmitted properties. Structure: the categories can be organized into a directed network. Stochastic process: properties are generated by a noisy transmission process.
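A sketch of noisy transmission over a toy food web: a property (e.g., a disease) enters at a random species and spreads along directed links, each link transmitting independently with some probability. The web and the transmission rate are illustrative assumptions.

```python
# Sketch of noisy causal transmission over a directed food web: the
# property enters at a random species and propagates along predation
# links, each transmitting independently. Rates are illustrative.
import random

FOOD_WEB = {                 # edges point from prey to predator
    "kelp": ["herring"],
    "herring": ["salmon", "seal"],
    "salmon": ["grizzly bear"],
    "seal": [],
    "grizzly bear": [],
}

def sample_property(p_transmit=0.7, rng=random):
    infected = {rng.choice(list(FOOD_WEB))}       # random point of origin
    frontier = list(infected)
    while frontier:
        prey = frontier.pop()
        for predator in FOOD_WEB[prey]:
            if predator not in infected and rng.random() < p_transmit:
                infected.add(predator)
                frontier.append(predator)
    return infected

print(sample_property())
```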
59
Experiment: disease properties (Shafto et al). [figures: the island and mammals food webs]
60
Results: disease properties. [scatter plots: web + transmission model vs. human judgments, mammals and island conditions]
61
Three inductive contexts. R: tree + diffusion process (“has T4 cells”); chain + drift process (“can bite through wire”); network + causal transmission (“carries E. Spirus bacteria”). S: [figures: classes A–G organized as a tree, a chain, and a directed network]
62
Property induction. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data). Approach: work with the distribution P(D|S,R).
63
Conclusions: property induction. Hierarchical Bayesian models help to explain how abstract knowledge can be used for induction.
64
Outline A high-level view of HBMs A case study: Semantic knowledge – Property induction – Learning structured representations – Learning the abstract organizing principles of a domain
65
Structure learning. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
66
Structure learning. R (principles): structural form: tree; stochastic process: diffusion. S (structure): ? D (data). Goal: find the S that maximizes P(S|D,R).
67
Structure learning. R (principles): structural form: tree; stochastic process: diffusion. S (structure): ? D (data). Goal: find the S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R).
68
Structure learning. R (principles): structural form: tree; stochastic process: diffusion. S (structure): ? D (data). Goal: find the S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R), where P(D|S,R) is the distribution previously used for property induction.
69
Generating features over the tree. [figure: sampled features on the tree over mouse, squirrel, chimp, gorilla]
70
Generating features over the tree. [figure: sampled features on the tree over mouse, squirrel, chimp, gorilla]
71
Structure learning. R (principles): structural form: tree; stochastic process: diffusion. S (structure): ? D (data). Goal: find the S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R).
72
P(S|R): generating structures. [figures: tree structures over mouse, squirrel, chimp, gorilla that are consistent with R, and a non-tree structure that is inconsistent with R]
73
P(S|R): generating structures. [figures: a complex structure and a simple structure over mouse, squirrel, chimp, gorilla]
74
P(S|R): generating structures. P(S|R) = 0 if S is inconsistent with R; otherwise P(S|R) ∝ θ^|S|, where |S| is the number of nodes in S and θ < 1. Each structure is weighted by the number of nodes it contains, so structures with fewer nodes receive higher prior probability. [figures: structures over mouse, squirrel, chimp, gorilla]
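In log space this prior is just a per-node penalty; a minimal sketch, with theta as a free parameter:

```python
# Structure prior sketch: P(S|R) = 0 for structures inconsistent with R,
# otherwise proportional to theta ** |S|; in log space each node costs
# log(theta). theta < 1 is a free parameter here.
import math

def log_prior(num_nodes, consistent_with_R, theta=0.5):
    if not consistent_with_R:
        return float("-inf")
    return num_nodes * math.log(theta)

print(log_prior(4, True), log_prior(7, True), log_prior(4, False))
```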
75
Structure learning (R: principles, S: structure, D: data). P(S|D,R) will be high when: the features in D vary smoothly over S, and S is a simple graph (a graph with few nodes). Aim: find the S that maximizes P(S|D,R) ∝ P(D|S) P(S|R).
76
Structure learning (R: principles, S: structure, D: data). P(S|D,R) will be high when: the features in D vary smoothly over S, and S is a simple graph (a graph with few nodes). Aim: find the S that maximizes P(S|D,R) ∝ P(D|S) P(S|R).
77
Structure learning example (Osherson et al): participants rated the goodness of 85 features for 48 animals. E.g., elephant: gray, hairless, toughskin, big, bulbous, longleg, tail, chewteeth, tusks, smelly, walks, slow, strong, muscle, quadrapedal, inactive, vegetation, grazer, oldworld, bush, jungle, ground, timid, smart, group.
78
Biological data. [figure: animals × features matrix]
79
Tree: [figure: the tree structure learned from the biological data]
80
Spatial model. R (principles): structural form: 2D space; stochastic process: diffusion. S (structure): [mouse, squirrel, chimp, gorilla embedded in a 2D space]. D (data).
81
2D space: [figure: the 2D representation learned from the biological data]
82
Conclusions: structure learning Hierarchical Bayesian models provide a unified framework for the acquisition and use of structured representations
83
Outline A high-level view of HBMs A case study: Semantic knowledge – Property induction – Learning structured representations – Learning the abstract organizing principles of a domain
84
Learning structural form. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
85
Which form is best? [figure: feature matrix over ostrich, robin, crocodile, snake, bat, orangutan, turtle]
86
Structural forms: order, chain, ring, partition, hierarchy, tree, grid, cylinder.
87
Learning structural form. R (principles): structural form F, which could be a tree, 2D space, ring, …; stochastic process: diffusion. S (structure): ? D (data). Goal: find the S and F that maximize P(S,F|D).
88
Learning structural form. R (principles): structural form F; stochastic process: diffusion. S (structure): ? D (data). Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(F) is a uniform distribution on the set of forms.
89
Learning structural form. R (principles): structural form F; stochastic process: diffusion. S (structure): ? D (data). Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(D|S) is the distribution used for property induction.
90
Learning structural form. R (principles): structural form F; stochastic process: diffusion. S (structure): ? D (data). Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(S|F) is the distribution used for structure learning.
91
P(S|F): generating structures from forms. P(S|F) = 0 if S is inconsistent with F; otherwise P(S|F) ∝ θ^|S|, where |S| is the number of nodes in S and θ < 1. Each structure is weighted by the number of nodes it contains. [figures: structures over mouse, squirrel, chimp, gorilla]
92
P(S|F): generating structures from forms. Simpler forms are preferred. [plot: P(S|F) over all possible graph structures S, for the chain form vs. the grid form; example structures over nodes A, B, C, D]
93
Learning structural form (F: form, S: structure, D: data). Goal: find the S and F that maximize P(S,F|D).
94
Learning structural form (F: form, S: structure, D: data). P(S,F|D) will be high when: the features in D vary smoothly over S; S is a simple graph (a graph with few nodes); F is a simple form (a form that can generate only a few structures). Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F).
95
Learning structural form (F: form, S: structure, D: data). P(S,F|D) will be high when: the features in D vary smoothly over S; S is a simple graph (a graph with few nodes); F is a simple form (a form that can generate only a few structures). Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F).
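A sketch of the resulting scoring function for form learning, with a uniform P(F) over a candidate set of forms; the log-likelihood values passed in are placeholders for the P(D|S) defined on the earlier slides, and theta is the same free node-penalty parameter as before.

```python
# Form learning as search: score each (form F, structure S) pair by
# log P(D|S) + log P(S|F) + log P(F), with uniform P(F) over candidates.
import math

FORMS = ["partition", "chain", "ring", "tree", "grid"]

def score(log_lik_D_given_S, num_nodes_S, consistent_with_F, theta=0.5):
    if not consistent_with_F:
        return float("-inf")
    log_p_F = -math.log(len(FORMS))               # uniform prior over forms
    log_p_S_given_F = num_nodes_S * math.log(theta)
    return log_lik_D_given_S + log_p_S_given_F + log_p_F

# e.g., a 6-node tree that fits the data well beats a 9-node grid that
# fits only slightly better, because the extra nodes cost prior mass:
print(score(-40.0, 6, True), score(-39.0, 9, True))
```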
96
Form learning: biological data. 33 animals, 110 features. [figure: animals × features matrix]
97
Form learning: Biological Data
98
Supreme Court (Spaeth): votes on 1,600 cases (1987–2005).
99
Color (Ekman)
100
Outline A high-level view of HBMs A case study: Semantic knowledge – Property induction – Learning structured representations – Learning the abstract organizing principles of a domain
101
Where do priors come from?
102
Stochastic process: diffusion. [tree over mouse, squirrel, chimp, gorilla]
103
Structural form: tree; stochastic process: diffusion. [tree over mouse, squirrel, chimp, gorilla]
104
Structural form: tree; stochastic process: diffusion. [tree over mouse, squirrel, chimp, gorilla]
105
Where do structural forms come from? [forms: order, chain, ring, partition, hierarchy, tree, grid, cylinder]
106
[table: Form / Process, pairing each structural form with the process that generates it]
107
Node-replacement graph grammars. Production (chain); derivation. [figure: the chain production and a derivation that applies it]
108
Node-replacement graph grammars. Production (chain); derivation. [figure: the chain production and a derivation that applies it]
109
Node-replacement graph grammars. Production (chain); derivation. [figure: the chain production and a derivation that applies it]
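A minimal sketch of what the chain production does: replacing a node with two nodes joined by an edge, applied repeatedly, derives ever-longer chains. The node-list/edge-list representation below is an illustrative choice, not the slides' formalism.

```python
# Chain-form grammar sketch: the production replaces a node with two
# nodes joined by an edge, so repeated derivation yields longer chains.
# Applying it at an end node, as here, is enough to generate every chain.
def derive_chain(steps):
    nodes, edges, next_id = [0], [], 1
    for _ in range(steps):
        edges.append((nodes[-1], next_id))   # split the end node in two
        nodes.append(next_id)
        next_id += 1
    return nodes, edges

print(derive_chain(3))    # nodes [0, 1, 2, 3] forming the chain 0-1-2-3
```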
110
Where do structural forms come from? [table: Form / Process]
111
The complete space of grammars. [figure: grammars 1 through 4096]
112
When can we stop adding levels? When the knowledge at the top level is simple or general enough that it can be plausibly assumed to be innate.
113
Conclusions: Hierarchical Bayesian models provide a unified framework which can explain how abstract knowledge is used for induction, and explain how abstract knowledge can be acquired.
114
Learning abstract knowledge. Applications of hierarchical Bayesian models at this conference: 1. Semantic knowledge: Schmidt et al., learning the M-constraint. 2. Syntax: Perfors et al., learning that language is hierarchically organized. 3. Word learning: Kemp et al., learning the shape bias.