1
Part III: Hierarchical Bayesian Models
2
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG). [hierarchy: Universal Grammar → Grammar → Phrase structure → Utterance → Speech signal]
3
Vision (Han and Zhu, 2006)
4
Word learning. [hierarchy: Principles (whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias) → Structure → Data]
5
Hierarchical Bayesian models Can represent and reason about knowledge at multiple levels of abstraction. Have been used by statisticians for many years.
6
Hierarchical Bayesian models Can represent and reason about knowledge at multiple levels of abstraction. Have been used by statisticians for many years. Have been applied to many cognitive problems: causal reasoning (Mansinghka et al., 06), language (Chater and Manning, 06), vision (Fei-Fei, Fergus, Perona, 03), word learning (Kemp, Perfors, Tenenbaum, 06), decision making (Lee, 06)
7
Outline A high-level view of HBMs A case study: Semantic knowledge
8
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG). [hierarchy: Universal Grammar → Grammar → Phrase structure → Utterance → Speech signal, with P(grammar | UG), P(phrase structure | grammar), P(utterance | phrase structure), P(speech | utterance)]
9
Hierarchical Bayesian model: P(G|U), P(s|G), P(u|s). [diagram: Universal Grammar U → Grammar G → phrase structures s_1 … s_6 → utterances u_1 … u_6]
10
Hierarchical Bayesian model: a hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy: P({u_i}, {s_i}, G | U) = P({u_i} | {s_i}) P({s_i} | G) P(G | U). [diagram: U → G → s_1 … s_6 → u_1 … u_6]
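To make the factorization concrete, here is a minimal sketch of sampling from a toy three-level hierarchy. The grammars G1/G2, the structure set, and all probabilities below are invented for illustration; they are not the tutorial's model.

```python
# Toy three-level hierarchy illustrating
# P({u_i}, {s_i}, G | U) = P({u_i} | {s_i}) P({s_i} | G) P(G | U).
# All names and probabilities here are invented for illustration.
import random

UNIVERSAL = ["G1", "G2"]             # U: grammars allowed a priori

def sample_grammar():                # P(G | U): uniform over allowed grammars
    return random.choice(UNIVERSAL)

def sample_structure(G):             # P(s | G): the grammar biases structures
    bias = 0.8 if G == "G1" else 0.3
    return "s_left" if random.random() < bias else "s_right"

def sample_utterance(s):             # P(u | s): structure largely fixes the utterance
    return s.replace("s_", "u_") if random.random() < 0.9 else "u_noise"

def sample_corpus(n=6):
    G = sample_grammar()
    structures = [sample_structure(G) for _ in range(n)]
    utterances = [sample_utterance(s) for s in structures]
    return G, structures, utterances

print(sample_corpus())
```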
11
Knowledge at multiple levels 1. Top-down inferences: how does abstract knowledge guide inferences at lower levels? 2. Bottom-up inferences: how can abstract knowledge be acquired? 3. Simultaneous learning at multiple levels of abstraction
12
Top-down inferences: given grammar G and a collection of utterances, construct a phrase structure for each utterance. [diagram: U → G → s_1 … s_6 → u_1 … u_6]
13
Top-down inferences: infer {s_i} given {u_i} and G: P({s_i} | {u_i}, G) ∝ P({u_i} | {s_i}) P({s_i} | G). [diagram: U → G → s_1 … s_6 → u_1 … u_6]
14
Bottom-up inferences: given a collection of phrase structures, learn a grammar G. [diagram: U → G → s_1 … s_6 → u_1 … u_6]
15
Bottom-up inferences: infer G given {s_i} and U: P(G | {s_i}, U) ∝ P({s_i} | G) P(G | U). [diagram: U → G → s_1 … s_6 → u_1 … u_6]
16
Simultaneous learning at multiple levels: given a set of utterances {u_i} and innate knowledge U, construct a grammar G and a phrase structure for each utterance. [diagram: U → G → s_1 … s_6 → u_1 … u_6]
17
Simultaneous learning at multiple levels: a chicken-or-egg problem: given a grammar, phrase structures can be constructed; given a set of phrase structures, a grammar can be learned. [diagram: U → G → s_1 … s_6 → u_1 … u_6]
18
Simultaneous learning at multiple levels: infer G and {s_i} given {u_i} and U: P(G, {s_i} | {u_i}, U) ∝ P({u_i} | {s_i}) P({s_i} | G) P(G | U). [diagram: U → G → s_1 … s_6 → u_1 … u_6]
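In the toy model above, everything is discrete, so this simultaneous posterior can be computed by brute-force enumeration; a sketch, with the same hypothetical distributions:

```python
# Brute-force posterior P(G, {s_i} | {u_i}, U) in the toy model:
# enumerate every grammar and every assignment of phrase structures,
# score each by P({u_i}|{s_i}) P({s_i}|G) P(G|U), and normalize.
from itertools import product

GRAMMARS = ["G1", "G2"]
STRUCTURES = ["s_left", "s_right"]

def p_g(G):                                   # P(G | U): uniform
    return 0.5

def p_s(s, G):                                # P(s | G): grammar-dependent bias
    bias = 0.8 if G == "G1" else 0.3
    return bias if s == "s_left" else 1 - bias

def p_u(u, s):                                # P(u | s): noisy rendering of s
    return 0.9 if u == s.replace("s_", "u_") else 0.1

def posterior(utterances):
    scores = {}
    for G in GRAMMARS:
        for structs in product(STRUCTURES, repeat=len(utterances)):
            p = p_g(G)
            for s, u in zip(structs, utterances):
                p *= p_s(s, G) * p_u(u, s)
            scores[(G, structs)] = p
    Z = sum(scores.values())
    return {k: v / Z for k, v in scores.items()}

post = posterior(["u_left", "u_left", "u_right"])
best = max(post, key=post.get)
print(best, round(post[best], 3))
```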
19
Hierarchical Bayesian model: P(G|U), P(s|G), P(u|s). [diagram: Universal Grammar U → Grammar G → phrase structures s_1 … s_6 → utterances u_1 … u_6]
20
Knowledge at multiple levels 1. Top-down inferences: how does abstract knowledge guide inferences at lower levels? 2. Bottom-up inferences: how can abstract knowledge be acquired? 3. Simultaneous learning at multiple levels of abstraction
21
Outline A high-level view of HBMs A case study: Semantic knowledge
22
Folk biology. R (principles): the relationships between living kinds are well described by tree-structured representations. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data): observations such as “Gorillas have hands”.
23
Folk biology. R (principles): structural form: tree. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
24
Outline A high-level view of HBMs A case study: Semantic knowledge – Property induction – Learning structured representations – Learning the abstract organizing principles of a domain
25
Property induction. R (principles): structural form: tree. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
26
Property induction. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data). Approach: work with the distribution P(D|S,R).
27
Property induction. Argument 1: Horses have T4 cells. Elephants have T4 cells. Therefore, all mammals have T4 cells. Argument 2: Horses have T4 cells. Seals have T4 cells. Therefore, all mammals have T4 cells. Previous approaches: Rips (75), Osherson et al. (90), Sloman (93), Heit (98)
28
Bayesian property induction: hypotheses. [figure: the hypothesis space of candidate extensions]
29
Bayesian property induction: hypotheses. [figure: the hypothesis space of candidate extensions]
30
Premises D: Horses have T4 cells. Elephants have T4 cells. Conclusion C: Cows have T4 cells.
31
Choosing a prior. Chimps have T4 cells. Gorillas have T4 cells. (taxonomic similarity) Poodles can bite through wire. Dobermans can bite through wire. (jaw strength) Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria. (food web relations)
32
Bayesian property induction. A challenge: we have to specify the prior, which typically includes many numbers. An opportunity: the prior can capture knowledge about the problem.
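A sketch of the generic computation behind Bayesian property induction: the probability of the conclusion is the prior mass of the hypotheses (candidate extensions of the property) that contain both the premise categories and the conclusion category. The hypothesis space and prior below are toy stand-ins for a prior generated by P(h|S,R).

```python
# Sketch of Bayesian property induction: P(c has property | D) is the
# normalized prior mass of hypotheses consistent with the premises D
# that also contain c. The extensions and weights are toy stand-ins.

HYPOTHESES = {                       # extension -> prior probability
    frozenset({"horse", "elephant", "cow"}): 0.3,
    frozenset({"horse", "elephant", "cow", "seal"}): 0.2,  # "all mammals" (toy)
    frozenset({"horse", "elephant"}): 0.4,
    frozenset({"seal"}): 0.1,
}

def p_conclusion(c, D):
    consistent = {h: p for h, p in HYPOTHESES.items() if D <= h}
    Z = sum(consistent.values())
    return sum(p for h, p in consistent.items() if c in h) / Z

D = {"horse", "elephant"}            # premises: horses and elephants have T4 cells
print(p_conclusion("cow", D))        # P(cows have T4 cells | D)
```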
33
Property induction. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
34
Biological properties. Structure: living kinds are organized into a tree. Stochastic process: nearby species in the tree tend to share properties.
35
Structure: [figure: a tree structure over many species]
37
Stochastic process: nearby species in the tree tend to share properties; in other words, properties tend to be smooth over the tree. [figure: example properties, smooth vs. not smooth]
38
Hypotheses generated by the stochastic process. [figure: sample properties over the tree]
39
Generating a property: draw a continuous value y that tends to be smooth over the tree, then threshold y to obtain the binary property h.
40
[figure: the structure S]
41
The diffusion process: generate y ~ N(0, K), where the covariance K encourages y to be smooth over the graph S, and set h_i = θ(y_i), where θ(y_i) is 1 if y_i ≥ 0 and 0 otherwise.
42
p(y|S,R): generating a property (Zhu, Lafferty, Ghahramani 03). Let y_i be the feature value at node i; values at neighboring nodes i and j are encouraged to agree, e.g. p(y|S,R) ∝ exp( −½ Σ_{i,j} w_ij (y_i − y_j)² ), so y tends to be smooth over the graph.
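A runnable sketch of this generative process, using a small chain graph in place of the tree and a covariance built from the graph Laplacian; the 0.1 ridge term is an arbitrary regularization choice, not something specified on the slides.

```python
# Diffusion process sketch: y ~ N(0, K) with K built from the graph
# Laplacian of S so y varies smoothly over the graph; h_i = θ(y_i).
# The chain graph and the 0.1 ridge term are illustrative choices.
import numpy as np

A = np.array([[0, 1, 0, 0],                # adjacency of a 4-node chain
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A             # graph Laplacian
K = np.linalg.inv(L + 0.1 * np.eye(4))     # covariance; ridge keeps it invertible

rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(4), K)  # smooth continuous values
h = (y >= 0).astype(int)                     # threshold: the binary property
print(y.round(2), h)
```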
43
Biological properties. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data). Approach: work with the distribution P(D|S,R).
44
Premises D: Horses have T4 cells. Elephants have T4 cells. Conclusion C: Cows have T4 cells.
45
Results (Osherson et al): model predictions vs. human judgments for arguments such as “Dolphins have property P. Seals have property P. Therefore, horses have property P.” and “Cows have property P. Elephants have property P. Therefore, horses have property P.” [scatter plots: model vs. human]
46
Results: model predictions vs. human judgments for arguments such as “Cows have property P. Elephants have property P. Therefore, all mammals have property P.” and “Gorillas have property P. Mice have property P. Seals have property P. Therefore, all mammals have property P.” [scatter plots: model vs. human]
47
Spatial model. R (principles): structural form: 2D space; stochastic process: diffusion. S (structure): [mouse, squirrel, chimp, gorilla embedded in a 2D space]. D (data).
48
Structure: [figure: a 2D spatial representation of the species]
50
Tree vs. 2D. [scatter plots: tree + diffusion and 2D + diffusion models against human judgments for the “horse” and “all mammals” argument sets]
51
Biological properties. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
52
Three inductive contexts. R: tree + diffusion process (“has T4 cells”); chain + drift process (“can bite through wire”); network + causal transmission (“carries E. Spirus bacteria”). S: [figures: classes A–G organized as a tree, a chain, and a directed network]
53
Threshold properties: “can bite through wire”, “has skin that is more resistant to penetration than most synthetic fibers” (Osherson et al; Blok et al). [figure: hippo, cat, lion, camel, elephant, poodle, collie, doberman ordered along a single dimension]
54
Threshold properties. Structure: the categories can be organized along a single dimension. Stochastic process: categories towards one end of the dimension are more likely to have the novel property.
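A sketch of one way to realize such a process over a single dimension: each category has a position x, and the probability of having the novel property grows toward one end. The logistic link, slope, and positions below are illustrative assumptions, not the slides' exact definition of the drift process.

```python
# Drift-style sketch for threshold properties: each category sits at a
# position x on one dimension, and P(property) rises toward one end.
# Positions, slope, and the logistic link are illustrative.
import math

positions = {"poodle": 0.2, "collie": 0.4, "doberman": 0.8, "lion": 0.9}

def p_property(x, slope=5.0, threshold=0.6):
    return 1 / (1 + math.exp(-slope * (x - threshold)))  # higher x -> more likely

for animal, x in sorted(positions.items(), key=lambda kv: kv[1]):
    print(f"{animal}: {p_property(x):.2f}")
```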
55
Results: “has skin that is more resistant to penetration than most synthetic fibers” (Blok et al; Smith et al). [scatter plots: 1D + drift and 1D + diffusion models vs. human judgments]
56
Three inductive contexts. R: tree + diffusion process (“has T4 cells”); chain + drift process (“can bite through wire”); network + causal transmission (“carries E. Spirus bacteria”). S: [figures: classes A–G organized as a tree, a chain, and a directed network]
57
Causally transmitted properties (Medin et al; Shafto and Coley): Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria. [figure: salmon → grizzly bear transmission]
58
Causally transmitted properties. Structure: the categories can be organized into a directed network. Stochastic process: properties are generated by a noisy transmission process.
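A sketch of noisy transmission over a toy food web: a property (e.g., a disease) enters at a random species and spreads along directed links, each link transmitting independently with some probability. The web and the transmission rate are illustrative assumptions.

```python
# Sketch of noisy causal transmission over a directed food web: the
# property enters at a random species and propagates along predation
# links, each transmitting independently. Rates are illustrative.
import random

FOOD_WEB = {                 # edges point from prey to predator
    "kelp": ["herring"],
    "herring": ["salmon", "seal"],
    "salmon": ["grizzly bear"],
    "seal": [],
    "grizzly bear": [],
}

def sample_property(p_transmit=0.7, rng=random):
    infected = {rng.choice(list(FOOD_WEB))}       # random point of origin
    frontier = list(infected)
    while frontier:
        prey = frontier.pop()
        for predator in FOOD_WEB[prey]:
            if predator not in infected and rng.random() < p_transmit:
                infected.add(predator)
                frontier.append(predator)
    return infected

print(sample_property())
```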
59
Experiment: disease properties (Shafto et al). [figures: the island and mammals food webs]
60
Results: disease properties. [scatter plots: web + transmission model vs. human judgments, mammals and island conditions]
61
Three inductive contexts. R: tree + diffusion process (“has T4 cells”); chain + drift process (“can bite through wire”); network + causal transmission (“carries E. Spirus bacteria”). S: [figures: classes A–G organized as a tree, a chain, and a directed network]
62
Property induction. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data). Approach: work with the distribution P(D|S,R).
63
Conclusions: property induction. Hierarchical Bayesian models help to explain how abstract knowledge can be used for induction.
64
Outline A high-level view of HBMs A case study: Semantic knowledge – Property induction – Learning structured representations – Learning the abstract organizing principles of a domain
65
Structure learning. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
66
Structure learning. R (principles): structural form: tree; stochastic process: diffusion. S (structure): ? D (data). Goal: find the S that maximizes P(S|D,R).
67
Structure learning. R (principles): structural form: tree; stochastic process: diffusion. S (structure): ? D (data). Goal: find the S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R).
68
Structure learning. R (principles): structural form: tree; stochastic process: diffusion. S (structure): ? D (data). Goal: find the S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R), where P(D|S,R) is the distribution previously used for property induction.
69
Generating features over the tree. [figure: sampled features on the tree over mouse, squirrel, chimp, gorilla]
70
Generating features over the tree. [figure: sampled features on the tree over mouse, squirrel, chimp, gorilla]
71
Structure learning. R (principles): structural form: tree; stochastic process: diffusion. S (structure): ? D (data). Goal: find the S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R).
72
P(S|R): generating structures. [figures: tree structures over mouse, squirrel, chimp, gorilla that are consistent with R, and a non-tree structure that is inconsistent with R]
73
P(S|R): generating structures. [figures: a complex structure and a simple structure over mouse, squirrel, chimp, gorilla]
74
P(S|R): generating structures. P(S|R) = 0 if S is inconsistent with R; otherwise P(S|R) ∝ θ^|S|, where |S| is the number of nodes in S and θ < 1. Each structure is weighted by the number of nodes it contains, so structures with fewer nodes receive higher prior probability. [figures: structures over mouse, squirrel, chimp, gorilla]
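In log space this prior is just a per-node penalty; a minimal sketch, with theta as a free parameter:

```python
# Structure prior sketch: P(S|R) = 0 for structures inconsistent with R,
# otherwise proportional to theta ** |S|; in log space each node costs
# log(theta). theta < 1 is a free parameter here.
import math

def log_prior(num_nodes, consistent_with_R, theta=0.5):
    if not consistent_with_R:
        return float("-inf")
    return num_nodes * math.log(theta)

print(log_prior(4, True), log_prior(7, True), log_prior(4, False))
```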
75
Structure learning (R: principles, S: structure, D: data). P(S|D,R) will be high when: the features in D vary smoothly over S, and S is a simple graph (a graph with few nodes). Aim: find the S that maximizes P(S|D,R) ∝ P(D|S) P(S|R).
76
Structure learning (R: principles, S: structure, D: data). P(S|D,R) will be high when: the features in D vary smoothly over S, and S is a simple graph (a graph with few nodes). Aim: find the S that maximizes P(S|D,R) ∝ P(D|S) P(S|R).
77
Structure learning example (Osherson et al): participants rated the goodness of 85 features for 48 animals. E.g., elephant: gray, hairless, toughskin, big, bulbous, longleg, tail, chewteeth, tusks, smelly, walks, slow, strong, muscle, quadrapedal, inactive, vegetation, grazer, oldworld, bush, jungle, ground, timid, smart, group.
78
Biological data. [figure: animals × features matrix]
79
Tree: [figure: the tree structure learned from the biological data]
80
Spatial model. R (principles): structural form: 2D space; stochastic process: diffusion. S (structure): [mouse, squirrel, chimp, gorilla embedded in a 2D space]. D (data).
81
2D space: [figure: the 2D representation learned from the biological data]
82
Conclusions: structure learning Hierarchical Bayesian models provide a unified framework for the acquisition and use of structured representations
83
Outline A high-level view of HBMs A case study: Semantic knowledge – Property induction – Learning structured representations – Learning the abstract organizing principles of a domain
84
Learning structural form. R (principles): structural form: tree; stochastic process: diffusion. S (structure): [tree over mouse, squirrel, chimp, gorilla]. D (data).
85
Which form is best? [figure: feature matrix over ostrich, robin, crocodile, snake, bat, orangutan, turtle]
86
Structural forms: order, chain, ring, partition, hierarchy, tree, grid, cylinder.
87
Learning structural form. R (principles): structural form F, which could be a tree, 2D space, ring, …; stochastic process: diffusion. S (structure): ? D (data). Goal: find the S and F that maximize P(S,F|D).
88
Learning structural form. R (principles): structural form F; stochastic process: diffusion. S (structure): ? D (data). Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(F) is a uniform distribution on the set of forms.
89
Learning structural form. R (principles): structural form F; stochastic process: diffusion. S (structure): ? D (data). Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(D|S) is the distribution used for property induction.
90
Learning structural form. R (principles): structural form F; stochastic process: diffusion. S (structure): ? D (data). Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(S|F) is the distribution used for structure learning.
91
P(S|F): generating structures from forms. P(S|F) = 0 if S is inconsistent with F; otherwise P(S|F) ∝ θ^|S|, where |S| is the number of nodes in S and θ < 1. Each structure is weighted by the number of nodes it contains. [figures: structures over mouse, squirrel, chimp, gorilla]
92
P(S|F): generating structures from forms. Simpler forms are preferred. [plot: P(S|F) over all possible graph structures S, for the chain form vs. the grid form; example structures over nodes A, B, C, D]
93
Learning structural form (F: form, S: structure, D: data). Goal: find the S and F that maximize P(S,F|D).
94
Learning structural form (F: form, S: structure, D: data). P(S,F|D) will be high when: the features in D vary smoothly over S; S is a simple graph (a graph with few nodes); F is a simple form (a form that can generate only a few structures). Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F).
95
Learning structural form (F: form, S: structure, D: data). P(S,F|D) will be high when: the features in D vary smoothly over S; S is a simple graph (a graph with few nodes); F is a simple form (a form that can generate only a few structures). Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F).
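A sketch of the resulting scoring function for form learning, with a uniform P(F) over a candidate set of forms; the log-likelihood values passed in are placeholders for the P(D|S) defined on the earlier slides, and theta is the same free node-penalty parameter as before.

```python
# Form learning as search: score each (form F, structure S) pair by
# log P(D|S) + log P(S|F) + log P(F), with uniform P(F) over candidates.
import math

FORMS = ["partition", "chain", "ring", "tree", "grid"]

def score(log_lik_D_given_S, num_nodes_S, consistent_with_F, theta=0.5):
    if not consistent_with_F:
        return float("-inf")
    log_p_F = -math.log(len(FORMS))               # uniform prior over forms
    log_p_S_given_F = num_nodes_S * math.log(theta)
    return log_lik_D_given_S + log_p_S_given_F + log_p_F

# e.g., a 6-node tree that fits the data well beats a 9-node grid that
# fits only slightly better, because the extra nodes cost prior mass:
print(score(-40.0, 6, True), score(-39.0, 9, True))
```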
96
Form learning: biological data. 33 animals, 110 features. [figure: animals × features matrix]
97
Form learning: Biological Data
98
Supreme Court (Spaeth): votes on 1,600 cases (1987–2005).
99
Color (Ekman)
100
Outline A high-level view of HBMs A case study: Semantic knowledge – Property induction – Learning structured representations – Learning the abstract organizing principles of a domain
101
Where do priors come from?
102
Stochastic process: diffusion. [tree over mouse, squirrel, chimp, gorilla]
103
Structural form: tree; stochastic process: diffusion. [tree over mouse, squirrel, chimp, gorilla]
104
Structural form: tree; stochastic process: diffusion. [tree over mouse, squirrel, chimp, gorilla]
105
Where do structural forms come from? [forms: order, chain, ring, partition, hierarchy, tree, grid, cylinder]
106
[table: Form / Process, pairing each structural form with the process that generates it]
107
Node-replacement graph grammars. Production (chain); derivation. [figure: the chain production and a derivation that applies it]
108
Node-replacement graph grammars. Production (chain); derivation. [figure: the chain production and a derivation that applies it]
109
Node-replacement graph grammars. Production (chain); derivation. [figure: the chain production and a derivation that applies it]
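A minimal sketch of what the chain production does: replacing a node with two nodes joined by an edge, applied repeatedly, derives ever-longer chains. The node-list/edge-list representation below is an illustrative choice, not the slides' formalism.

```python
# Chain-form grammar sketch: the production replaces a node with two
# nodes joined by an edge, so repeated derivation yields longer chains.
# Applying it at an end node, as here, is enough to generate every chain.
def derive_chain(steps):
    nodes, edges, next_id = [0], [], 1
    for _ in range(steps):
        edges.append((nodes[-1], next_id))   # split the end node in two
        nodes.append(next_id)
        next_id += 1
    return nodes, edges

print(derive_chain(3))    # nodes [0, 1, 2, 3] forming the chain 0-1-2-3
```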
110
Where do structural forms come from? [table: Form / Process]
111
The complete space of grammars. [figure: grammars 1 through 4096]
112
When can we stop adding levels? When the knowledge at the top level is simple or general enough that it can be plausibly assumed to be innate.
113
Conclusions: Hierarchical Bayesian models provide a unified framework which can explain how abstract knowledge is used for induction, and explain how abstract knowledge can be acquired.
114
Learning abstract knowledge. Applications of hierarchical Bayesian models at this conference: 1. Semantic knowledge: Schmidt et al., learning the M-constraint. 2. Syntax: Perfors et al., learning that language is hierarchically organized. 3. Word learning: Kemp et al., learning the shape bias.