Finding structure in data

Josh Tenenbaum MIT MLSS 2010

The big question
How does the mind get so much out of so little?
Perceiving the world from sense data
Learning about kinds of objects and their properties
Inferring causal relations
Learning the meanings of words, phrases, and sentences
Learning and using intuitive theories of physics, psychology, biology, …
Learning social structures, conventions, and rules
Example of intuitive psychology: inferring the mental states of other people (beliefs, desires, preferences) from observing their actions.
The goal: a general-purpose computational framework for understanding how people make these inferences, and how they can be successful.

The approach
How much structure exists? What kind of structure exists?
1. How does knowledge guide learning and inference from sparse data?
Bayesian inference in probabilistic generative models.
2. What form does human knowledge take, across different domains and tasks?
Probabilities defined over a range of structured representations: spaces, clusters, graphs, grammars, predicate logic, programs.
3. How is more abstract knowledge acquired, balancing complexity versus fit and constraint versus flexibility?
Hierarchical models, with inference at multiple levels (“learning to learn”).
Nonparametric (“infinite”) models, growing in complexity and adapting their structure as the data require.

Property induction
“Similarity”, “Typicality”, “Diversity”
Gorillas have T9 hormones. Seals have T9 hormones. Anteaters have T9 hormones.
Gorillas have T9 hormones. Seals have T9 hormones. Horses have T9 hormones.
Gorillas have T9 hormones. Chimps have T9 hormones. Monkeys have T9 hormones. Baboons have T9 hormones. Horses have T9 hormones.

Hierarchical Bayesian Framework (Kemp & Tenenbaum, Psych Review, 2009)
[Figure: a three-level hierarchy. F: form (tree with species at leaf nodes), with prior P(form); S: structure (a tree over mouse, squirrel, chimp, gorilla), with P(structure | form); D: data (features F1 to F4 and the new property “has T9 hormones”, unknown for some species), with P(data | structure).]

Experiments on property induction (Osherson, Smith, Wilkie, Lopez, Shafir, 1990)
20 subjects rated the strength of 45 arguments [General argument]:
X1 have property P. (e.g., Cows have T4 hormones.)
X2 have property P.
X3 have property P.
All mammals have property P.
20 subjects rated the strength of 36 arguments [Specific argument]:
X1 have property P.
Horses have property P.

Feature rating data (Osherson and Wilkie)
People were given 48 animals and 85 features, and asked to rate whether each animal had each feature. E.g., elephant: 'gray' 'hairless' 'toughskin' 'big' 'bulbous' 'longleg' 'tail' 'chewteeth' 'tusks' 'smelly' 'walks' 'slow' 'strong' 'muscle' 'quadrapedal' 'inactive' 'vegetation' 'grazer' 'oldworld' 'bush' 'jungle' 'ground' 'timid' 'smart' 'group', …

The computational problem
Horses have T9 hormones. Rhinos have T9 hormones. Cows have T9 hormones. ?
[Figure: a species × features matrix over horse, cow, chimp, gorilla, mouse, squirrel, dolphin, seal, rhino, and elephant, plus a column for the new property with unknown values (?).]
Cf. semi-supervised learning, sparse matrix completion.

Similarity-based induction (Osherson, Smith, Wilkie, Lopez, Shafir, 1990)
[Figure: human judgments of argument strength vs. model predictions, for arguments such as “Cows have property P. Elephants have property P. Horses have property P. All mammals have property P.” and “Gorillas have property P. Mice have property P. Seals have property P. All mammals have property P.”]
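A minimal sketch of the similarity-based idea follows. It makes two assumptions. First, argument strength is taken to be the maximum similarity between the conclusion category and any premise category. Second, similarity is the Jaccard overlap of feature sets. The feature sets are hypothetical toy data, not the Osherson and Wilkie ratings. The sketch also leaves out the coverage term of the full similarity-coverage model.

```python
# Hypothetical toy feature sets (not the real Osherson & Wilkie ratings).
FEATURES = {
    "horse": {"big", "hooves", "grazer", "tail", "fourlegs"},
    "cow": {"big", "hooves", "grazer", "tail", "fourlegs", "domestic"},
    "elephant": {"big", "grazer", "tail", "fourlegs", "tusks", "gray"},
    "dolphin": {"swims", "ocean", "smart", "fins"},
    "seal": {"swims", "ocean", "fins", "tail"},
}


def jaccard(a, b):
    """Similarity of two feature sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)


def similarity_strength(premises, conclusion):
    """Strength of 'premises have P, so conclusion has P', scored as the
    maximum similarity between the conclusion and any single premise."""
    target = FEATURES[conclusion]
    return max(jaccard(FEATURES[p], target) for p in premises)


if __name__ == "__main__":
    print(similarity_strength(["cow", "elephant"], "horse"))  # cows are close to horses
    print(similarity_strength(["dolphin", "seal"], "horse"))  # marine mammals are not
```

A pure similarity score like this cannot tell a diverse premise set from a redundant one. That gap is what the coverage term, and the structured priors later in the talk, are meant to address.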

Beyond similarity-based induction
Reasoning based on dimensional thresholds (Smith et al., 1993):
Poodles can bite through wire. German shepherds can bite through wire.
Dobermans can bite through wire. German shepherds can bite through wire.
Reasoning based on causal relations (Medin et al., 2004; Coley & Shafto, 2003):
Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria.
Grizzly bears carry E. Spirus bacteria. Salmon carry E. Spirus bacteria.

Hierarchical Bayesian Framework
[Figure: F: form (tree with species at leaf nodes); S: structure (a tree over mouse, squirrel, chimp, gorilla); D: data (features F1 to F4 and “has T9 hormones”, unknown for some species). Highlighted: property induction.]

X: Horses have T9 hormones. Rhinos have T9 hormones. Cows have T9 hormones.
Y: ?
[Figure: hypotheses h about which of horse, cow, chimp, gorilla, mouse, squirrel, dolphin, seal, rhino, and elephant have the property, each with a prior P(h).]

X: Horses have T9 hormones. Rhinos have T9 hormones. Cows have T9 hormones.
Y: ?
Prediction: P(Y | X)
[Figure: hypotheses h about which of horse, cow, chimp, gorilla, mouse, squirrel, dolphin, seal, rhino, and elephant have the property, each with a prior P(h).]
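Treat each hypothesis h as the set of species that have the new property. Assume, as in standard Bayesian models of generalization, that the premises X only rule out hypotheses that do not contain them. The prediction is then

$P(Y \mid X) = \sum_{h \supseteq X,\ Y \in h} P(h) \Big/ \sum_{h \supseteq X} P(h)$

A minimal sketch with a hypothetical prior over a handful of hypotheses (the sets and weights are illustrative only):

```python
from fractions import Fraction

# Hypothetical prior over a few candidate extensions h of the new property.
PRIOR = {
    frozenset({"horse", "cow", "rhino"}): Fraction(2, 10),
    frozenset({"horse", "cow", "rhino", "elephant"}): Fraction(3, 10),
    frozenset({"horse", "cow", "rhino", "elephant", "chimp", "gorilla"}): Fraction(1, 10),
    frozenset({"chimp", "gorilla"}): Fraction(2, 10),
    frozenset({"dolphin", "seal"}): Fraction(2, 10),
}


def predict(premises, query, prior=PRIOR):
    """P(query has the property | every premise species has it)."""
    x = set(premises)
    # Keep only hypotheses that contain all the premise species.
    consistent = {h: p for h, p in prior.items() if x <= h}
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("no hypothesis with nonzero prior contains the premises")
    # Renormalized prior mass of the hypotheses that also contain the query.
    return sum(p for h, p in consistent.items() if query in h) / total


if __name__ == "__main__":
    print(predict(["horse", "rhino", "cow"], "elephant"))  # Fraction(2, 3)
    print(predict(["horse", "rhino", "cow"], "chimp"))     # Fraction(1, 6)
```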

Where does the prior come from?
[Figure: hypotheses h over horse, cow, chimp, gorilla, mouse, squirrel, dolphin, seal, rhino, and elephant, each with a prior P(h).]
A simple empiricist answer? Remember how often in the past you’ve seen each possible feature pattern, and set priors for the corresponding hypotheses proportionately.
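The empiricist answer can be sketched as frequency counting over previously seen feature patterns. The four species and the past features below are hypothetical.

```python
from collections import Counter
from fractions import Fraction

SPECIES = ("horse", "cow", "chimp", "gorilla")

# Hypothetical past features: each tuple marks which of SPECIES have that feature.
PAST_FEATURES = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (1, 1, 1, 1),
    (0, 0, 1, 1),
    (0, 1, 0, 0),
]


def empirical_prior(features):
    """P(h) proportional to how often feature pattern h has been seen before."""
    counts = Counter(features)
    n = len(features)
    return {pattern: Fraction(c, n) for pattern, c in counts.items()}


if __name__ == "__main__":
    for pattern, p in empirical_prior(PAST_FEATURES).items():
        print(pattern, p)
```

Any pattern that was never observed, such as (1, 0, 1, 0), gets prior zero. This shows why the simple empiricist prior breaks down when data are sparse.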

The need for inductive bias
Without constraints or a prior on the hypothesis space, learning from sparse data is impossible. An analogy: learning a smooth probability density by local interpolation (kernel density estimation) generalizes poorly from a few samples, whereas assuming an appropriately structured form for the density (e.g., Gaussian) leads to better generalization from sparse data. [Figure: both approaches fit to N = 5 samples.]
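A minimal sketch of the density-estimation analogy compares two models fit to the same five points. One is a narrow-bandwidth Gaussian kernel density estimate (local interpolation). The other is a single maximum-likelihood Gaussian (a structured form). Both are scored on held-out points. The training values, test values, and the bandwidth of 0.2 are hand-picked assumptions.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


def kde_logpdf(x, samples, bandwidth):
    """Log density at x of a Gaussian-kernel density estimate."""
    dens = sum(math.exp(normal_logpdf(x, s, bandwidth)) for s in samples)
    return math.log(dens / len(samples))


def gaussian_fit(samples):
    """Maximum-likelihood mean and standard deviation."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, math.sqrt(var)


# Hypothetical N = 5 training sample and held-out test points.
TRAIN = [-1.2, -0.4, 0.1, 0.5, 1.3]
TEST = [-2.0, -0.8, 0.0, 0.9, 1.8]


def heldout_loglik(bandwidth=0.2):
    """Total held-out log-likelihood: (structured Gaussian, local KDE)."""
    mean, sd = gaussian_fit(TRAIN)
    gauss = sum(normal_logpdf(x, mean, sd) for x in TEST)
    kde = sum(kde_logpdf(x, TRAIN, bandwidth) for x in TEST)
    return gauss, kde


if __name__ == "__main__":
    print(heldout_loglik())
```

A wider bandwidth would narrow the gap. Choosing the bandwidth is itself a form of inductive bias.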

Theory-based priors
Taxonomic similarity: Chimps have T9 hormones. Gorillas have T9 hormones.
Jaw strength: Poodles can bite through wire. Dobermans can bite through wire.
Food web relations: Salmon carry E. Spirus bacteria. Grizzly bears carry E. Spirus bacteria.

P(h): Taxonomic similarity
Choose a structured representation S. Here, assume S is a tree with basic categories at leaf nodes: an intuitive taxonomic tree (Collins & Quillian, 1969). Biological species were actually generated by a branching process (evolution).

P(h): Taxonomic similarity
P(D|S): Define a stochastic process over structure S that generates possible feature vectors f. Intuitively, properties should vary smoothly over structure. Many properties of biological species were actually generated by such a process (i.e., mutation + selection).
[Figure: a smooth feature, for which P(f | S) is high, and a non-smooth feature, for which P(f | S) is low.]
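To make "smooth features are more probable" concrete, the sketch below uses a simple symmetric mutation process on a hypothetical four-leaf tree. The root value is a fair coin, and each branch flips the binary feature with probability `flip`. It is a discrete stand-in, not the Gaussian process used in the talk, and it models mutation only, not selection. P(f | S) is computed exactly by summing over the unobserved values at the root and internal nodes.

```python
from itertools import product

# Hypothetical tree: root -> A -> {mouse, squirrel}; root -> B -> {chimp, gorilla}.
CHILDREN = {
    "root": ["A", "B"],
    "A": ["mouse", "squirrel"],
    "B": ["chimp", "gorilla"],
}
LEAVES = ["mouse", "squirrel", "chimp", "gorilla"]


def edge_prob(parent_val, child_val, flip):
    """P(child value | parent value) under a symmetric mutation rate."""
    return flip if parent_val != child_val else 1.0 - flip


def p_feature_given_tree(leaf_values, flip=0.1):
    """P(f | S): marginalize over the hidden values at root, A, and B."""
    total = 0.0
    for root_val, a_val, b_val in product((0, 1), repeat=3):
        values = dict(zip(LEAVES, leaf_values), root=root_val, A=a_val, B=b_val)
        p = 0.5  # the root value is 0 or 1 with equal probability
        for parent, kids in CHILDREN.items():
            for kid in kids:
                p *= edge_prob(values[parent], values[kid], flip)
        total += p
    return total


if __name__ == "__main__":
    print(p_feature_given_tree((1, 1, 0, 0)))  # smooth: follows the tree split
    print(p_feature_given_tree((1, 0, 1, 0)))  # not smooth: cuts across it
```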

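A graph-based prior
Let $d_{ij}$ be the length of the edge between objects $i$ and $j$ ($d_{ij} = \infty$ if $i$ and $j$ are not connected in S), and let $f_i$ be the value of the feature for object $i$. A Gaussian prior $f \sim N(0, \Sigma)$, with $\Sigma^{-1} = \Delta + I/\sigma^2$, where $\Delta$ is the graph Laplacian of S with edge weights $1/d_{ij}$, favors features that vary smoothly over S:
$p(f \mid S) \propto \exp\left(-\tfrac{1}{4}\sum_{i,j} \frac{(f_i - f_j)^2}{d_{ij}} - \frac{1}{2\sigma^2}\sum_i f_i^2\right)$
(Zhu, Lafferty & Ghahramani, 2003)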
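A minimal sketch of the graph-based prior builds the Laplacian $\Delta$ from edges with weights $1/d_{ij}$. It then evaluates the unnormalized log prior $-\tfrac{1}{2} f^\top(\Delta + I/\sigma^2) f$. The graph (a four-node chain with unit edge lengths) and $\sigma^2 = 1$ are hypothetical choices.

```python
def laplacian(n, edges):
    """Graph Laplacian with weight 1/d for each edge (i, j, d); absent edges have d = inf."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j, d in edges:
        w = 1.0 / d
        lap[i][j] -= w
        lap[j][i] -= w
        lap[i][i] += w
        lap[j][j] += w
    return lap


def smoothness_penalty(f, edges):
    """f^T Δ f, which equals the sum over edges of (f_i - f_j)^2 / d_ij
    (half of the ordered-pair sum that appears in the formula above)."""
    n = len(f)
    lap = laplacian(n, edges)
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))


def log_prior_unnormalized(f, edges, sigma2=1.0):
    """-1/2 f^T (Δ + I/σ²) f, the log prior up to an additive constant."""
    return -0.5 * (smoothness_penalty(f, edges) + sum(x * x for x in f) / sigma2)


# Hypothetical chain 0 - 1 - 2 - 3 with unit edge lengths.
CHAIN = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]

if __name__ == "__main__":
    print(smoothness_penalty([1, 1, -1, -1], CHAIN))  # one change along the chain
    print(smoothness_penalty([1, -1, 1, -1], CHAIN))  # changes on every edge
```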

Results (Osherson et al., Smith et al.)
[Figure: data vs. model predictions for specific arguments such as “Cows have property P. Elephants have property P. Horses have property P.” and “Dolphins have property P. Seals have property P. Horses have property P.”]

Results
[Figure: data vs. model predictions for general arguments such as “Gorillas have property P. Mice have property P. Seals have property P. All mammals have property P.” and “Cows have property P. Elephants have property P. Horses have property P. All mammals have property P.”]

[Figure: structure learning in the hierarchical framework. F: form (tree with species at leaf nodes); S: structure (a tree over mouse, squirrel, chimp, gorilla); D: data (features F1 to F4 and “has T9 hormones”).]

Structure S, Data D
[Figure: a structure S over horse, cow, chimp, gorilla, mouse, squirrel, dolphin, seal, rhino, and elephant, and the corresponding species × features data D.]
85 features for 50 animals (Osherson et al.), e.g., for elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’, …

[Figure: a structure S over species 1, 2, 3, …, with observed features f and a new property whose values are unknown (?) for some species.]

[Figure: F: form (clusters of species); S: structure (clusters over mouse, squirrel, chimp, gorilla); D: data (features F1 to F4 and “has T9 hormones”).]

Best Cluster Structure (DP mixture)
[Figure: the best partition of the 50 animals (beaver, otter, rat, …, blue whale, walrus, killer whale) into clusters, found by a DP mixture model.]
Structure is a generative model for object-feature matrices.
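A DP mixture places a Chinese restaurant process (CRP) prior on partitions of the objects. A partition with clusters of sizes $n_1, \dots, n_K$ over $n$ objects has probability $\alpha^K \, \Gamma(\alpha)/\Gamma(\alpha + n) \, \prod_k (n_k - 1)!$. The sketch below computes only this partition prior; the feature likelihood of the full mixture model is omitted. The concentration parameter α is a free, assumed value.

```python
import math
from collections import Counter


def crp_log_prob(assignments, alpha=1.0):
    """Log probability of a partition (given as cluster labels) under the CRP."""
    n = len(assignments)
    sizes = Counter(assignments).values()
    k = len(sizes)
    log_p = k * math.log(alpha) + math.lgamma(alpha) - math.lgamma(alpha + n)
    # lgamma(size) equals log((size - 1)!)
    log_p += sum(math.lgamma(size) for size in sizes)
    return log_p


if __name__ == "__main__":
    # All five partitions of three objects; their probabilities sum to 1.
    for labels in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2)]:
        print(labels, math.exp(crp_log_prob(labels)))
```

Larger α favors more clusters. Because the number of clusters is not fixed in advance, the model can grow its complexity as the data require.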

[Figure: data vs. model predictions for a tree prior and a cluster prior, on arguments such as “Cows have property P. Elephants have property P. Horses have property P.” and “Gorillas have property P. Mice have property P. Seals have property P. All mammals have property P.”]

Results with DP mixture model
[Figure: argument strength for arguments with the premise “Persian cats have property X”, plotted by conclusion animal (e.g., otters, pigs, blue whales).]
Structure is a generative model for object-feature matrices.

[Figure: F: form (low-dimensional space of species); S: structure (gorilla, chimp, mouse, squirrel placed in the space); D: data (features F1 to F4 and “has T9 hormones”).]

[cf. Lawrence; Smola & Kondor]

[Figure: data vs. model predictions for a tree prior and a 2D spatial prior, on arguments such as “Cows have property P. Elephants have property P. Horses have property P.” and “Gorillas have property P. Mice have property P. Seals have property P. All mammals have property P.”]

Reasoning about spatially varying properties
Geographic inference task: e.g., “Given that a certain kind of Native American artifact has been found in sites near city X, how likely is the same artifact to be found near city Y?”
[Figure: fits of the 2D and tree models.]
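A 2D spatial prior treats a property as a latent value that varies smoothly over a map. The sketch below shows the resulting prediction under stated assumptions: a zero-mean Gaussian process with a squared-exponential kernel, and a single observation at city X. This kernel is an assumption and not necessarily the one used in the talk. The city coordinates, length scale, and small noise term are hypothetical.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two 2D locations."""
    sq_dist = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def predict_at(y, x, f_x, length_scale=1.0, noise=1e-6):
    """GP posterior mean of the latent value at y after observing f(x) = f_x."""
    k_xx = sq_exp_kernel(x, x, length_scale) + noise
    return sq_exp_kernel(x, y, length_scale) / k_xx * f_x


# Hypothetical city coordinates on an arbitrary map scale.
CITIES = {"X": (0.0, 0.0), "near": (0.5, 0.0), "far": (3.0, 0.0)}

if __name__ == "__main__":
    for name in ("near", "far"):
        print(name, predict_at(CITIES[name], CITIES["X"], 1.0))
```

Predictions fall off with distance from X, which is the qualitative pattern the 2D model expresses.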

Do people learn explicit structures of different forms?
A connectionist alternative (Rogers and McClelland, 2004): [Figure: a network from species to features.] Emergent structure: clustering on hidden-unit activation vectors.

Do people learn explicit structures of different forms?
A sparse graphical models alternative (Lake and Tenenbaum, 2010).

Property type                 Theory structure     Process
“has T9 hormones”             taxonomic tree       diffusion process
“can bite through wire”       directed chain       drift process
“carry E. Spirus bacteria”    directed network     noisy transmission
[Figure: example structures over Classes A to G, and properties generated over each.]
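The noisy-transmission idea for disease properties can be sketched on a small food chain. Two assumptions define the model. Each species catches the disease from a background cause with probability `base`. Each prey-to-predator link independently transmits it with probability `transmit`. The chain and both probabilities are hypothetical; the species are loosely borrowed from the food-web example in this talk. The sketch enumerates every combination of background causes and active links, so the joint distribution is exact.

```python
from itertools import product

# Hypothetical food chain; disease passes from prey to predator.
SPECIES = ["kelp", "herring", "tuna", "mako"]
EDGES = [("kelp", "herring"), ("herring", "tuna"), ("tuna", "mako")]


def joint_infections(base=0.1, transmit=0.5):
    """Map each infection pattern (tuple over SPECIES) to its probability."""
    dist = {}
    for bg in product((0, 1), repeat=len(SPECIES)):
        for active in product((0, 1), repeat=len(EDGES)):
            p = 1.0
            for b in bg:
                p *= base if b else 1.0 - base
            for a in active:
                p *= transmit if a else 1.0 - transmit
            infected = dict(zip(SPECIES, bg))
            # EDGES is in topological order, so a source's status is final
            # before its outgoing edge is processed.
            for (src, dst), a in zip(EDGES, active):
                if a and infected[src]:
                    infected[dst] = 1
            key = tuple(infected[s] for s in SPECIES)
            dist[key] = dist.get(key, 0.0) + p
    return dist


def conditional(query, given, dist):
    """P(query infected | given infected)."""
    qi, gi = SPECIES.index(query), SPECIES.index(given)
    num = sum(p for k, p in dist.items() if k[qi] and k[gi])
    den = sum(p for k, p in dist.items() if k[gi])
    return num / den


if __name__ == "__main__":
    d = joint_infections()
    print(conditional("tuna", "herring", d))  # predator of herring: 0.55
    print(conditional("kelp", "herring", d))  # prey of herring: lower
```

The asymmetry is the point. Learning that herring carry the disease raises the probability for their predators more than for their prey.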

Reasoning with linear-threshold properties
Example property: “has skin that is more resistant to penetration than most synthetic fibers”
[Figure: fits of 1D + drift, 1D + diffusion, and Tree + diffusion models on data sets from Blok et al. (4 colleges, 5 colleges) and Smith et al. (“adapts to dark”, “thick skin”); species shown include hippo, camel, cat, elephant, and lion.]

Reasoning with two property types
“Given that X has property P, how likely is it that Y does?”
[Figure: a tree for a biological property and a food web for a disease property, over herring, tuna, mako shark, sand shark, dolphin, human, and kelp.] (Shafto, Kemp, Bonawitz, Coley & Tenenbaum)

Summary so far
A framework for modeling human inductive reasoning as rational statistical inference over structured knowledge representations.
Qualitatively different priors are appropriate for different domains of property induction.
In each domain, a prior matching the world’s structure fits people’s judgments well, and better than alternative priors.
A language for representing different theories: relational structure defined over objects + probabilistic model for the distribution of properties over that structure.
The deepest learning questions remain: What is the appropriate form of structure for each domain? When and how should we learn multiple structures, to capture different aspects of a domain?

Hierarchical Bayesian framework
[Figure: structure learning. F: form (tree, space, or chain); S: structure over objects X1 to X6; D: data (features of X1 to X6).]

Learning structural forms
People can discover structural forms…
Scientists and children, e.g., the hierarchical structure of category labels, the cyclical structure of seasons or days of the week, the clique structure of social networks.
… but standard learning algorithms assume fixed forms:
Principal components analysis: low-dimensional spatial structure
Hierarchical clustering: tree structure
k-means clustering, mixture models: flat partition
[Figure: structures discovered by Linnaeus (Kingdom Animalia, Phylum Chordata, Class Mammalia, Order Primates, Family Hominidae, Genus Homo, Species Homo sapiens), Darwin, and Mendeleev.]

Goal: A universal framework for unsupervised learning
[Figure: Data → “Universal Learner” → Representation, alongside specialized methods: K-Means, hierarchical clustering, factor analysis, Guttman scaling, circumplex models, self-organizing maps, ···]

Hypothesis space of structural forms (Kemp & Tenenbaum, PNAS 2008)
[Figure: structural forms, each paired with the generative process that produces it.]

Node-replacement graph grammars
[Figure: the production for the line (chain) form, and a derivation built by applying it repeatedly.]
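A minimal sketch of a node-replacement production for the chain form. It assumes the production replaces a node by two nodes joined by an edge. The first node keeps the original's incoming edge, and the second takes its outgoing edge. This is a hypothetical rule in the spirit of the slide; the exact production in the paper may differ. Starting from one node, any sequence of applications yields a chain.

```python
import random


def split_node(edges, target, new_node):
    """Chain production: replace `target` by target -> new_node.
    `target` keeps its incoming edge; its outgoing edge moves to `new_node`."""
    rewired = [(new_node, v) if u == target else (u, v) for u, v in edges]
    return rewired + [(target, new_node)]


def derive_chain(n_nodes, seed=0):
    """Start from a single node and apply the production until n_nodes exist."""
    rng = random.Random(seed)
    nodes, edges = [0], []
    while len(nodes) < n_nodes:
        target = rng.choice(nodes)
        new_node = len(nodes)
        edges = split_node(edges, target, new_node)
        nodes.append(new_node)
    return nodes, edges


if __name__ == "__main__":
    print(derive_chain(6))
```

Other forms would use other productions. For example, a ring could be produced by the same splitting rule applied to an initial self-loop.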

A hierarchical Bayesian approach (Kemp & Tenenbaum, PNAS 2008)
F: form, with prior P(F); S: structure over objects X1 to X6, with P(S | F) favoring simplicity; D: data (features of X1 to X6), with P(D | S) favoring smoothness (fit to the data). Combining the levels:
$P(S, F \mid D) \propto P(D \mid S)\, P(S \mid F)\, P(F)$
How much structure exists? What kind of structure exists?
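A toy sketch of comparing forms with these three factors: $P(F \mid D) \propto P(F) \sum_S P(S \mid F)\, P(D \mid S)$. Here the sum runs over a small, enumerated set of candidate structures per form. The log scores are hypothetical numbers, not outputs of a real structure search. The paper's own procedure may instead search for the single highest-scoring structure rather than summing.

```python
import math

# Hypothetical candidates: for each form, (log P(S | F), log P(D | S)) per structure.
CANDIDATES = {
    "partition": [(-2.0, -60.0), (-3.0, -55.0)],
    "chain": [(-4.0, -50.0), (-5.0, -48.0)],
    "tree": [(-6.0, -40.0), (-7.0, -39.5)],
}


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def form_posterior(candidates, log_prior_form=None):
    """P(F | D) over forms, uniform P(F) unless a prior is given."""
    forms = list(candidates)
    if log_prior_form is None:
        log_prior_form = {f: -math.log(len(forms)) for f in forms}
    scores = {
        f: log_prior_form[f] + log_sum_exp([lp_s + lp_d for lp_s, lp_d in candidates[f]])
        for f in forms
    }
    z = log_sum_exp(list(scores.values()))
    return {f: math.exp(s - z) for f, s in scores.items()}


if __name__ == "__main__":
    print(form_posterior(CANDIDATES))
```

Simpler forms score well on P(S | F) but fit worse. The posterior trades the two off.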

[Figure: example data sets: animals × features, cases × judges, and objects × objects similarity matrices.]

How many different ways to structure a domain?

Unsupervised categorization with a nonparametric clustering model (cf. Anderson, 1990; Griffiths et al.).

“CrossCat”: nonparametric clustering over features, with a different clustering of objects for each feature cluster (Shafto, Kemp, Mansinghka, Tenenbaum, submitted).
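A minimal sketch of the generative side of a CrossCat-style model follows; inference is not shown. It assumes CRP priors for both the feature partition and each view's object partition, and a uniformly drawn Bernoulli rate per feature and object cluster. These are illustrative assumptions, not necessarily the paper's exact choices.

```python
import random


def sample_crp(n, alpha, rng):
    """Sample cluster labels for n items from a Chinese restaurant process."""
    labels, sizes = [], []
    for _ in range(n):
        # Join cluster k with weight sizes[k], or open a new one with weight alpha.
        k = rng.choices(range(len(sizes) + 1), weights=sizes + [alpha])[0]
        if k == len(sizes):
            sizes.append(0)
        sizes[k] += 1
        labels.append(k)
    return labels


def sample_crosscat(n_objects, n_features, alpha=1.0, seed=0):
    """Binary object-by-feature matrix: features are grouped into views,
    and each view has its own clustering of the objects."""
    rng = random.Random(seed)
    view_of_feature = sample_crp(n_features, alpha, rng)
    n_views = max(view_of_feature) + 1
    object_clusters = [sample_crp(n_objects, alpha, rng) for _ in range(n_views)]
    matrix = [[0] * n_features for _ in range(n_objects)]
    for j, v in enumerate(view_of_feature):
        clusters = object_clusters[v]
        # One Bernoulli rate per object cluster for this feature.
        rates = [rng.random() for _ in range(max(clusters) + 1)]
        for i, c in enumerate(clusters):
            matrix[i][j] = int(rng.random() < rates[c])
    return matrix, view_of_feature, object_clusters


if __name__ == "__main__":
    m, views, clusters = sample_crosscat(8, 5, seed=1)
    print(views)
    print(clusters)
```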

Conclusions
How does the mind get so much from so little?
A framework for studying the nature, use, and acquisition of abstract knowledge:
Bayesian inference in probabilistic generative models.
Probabilistic models defined over a range of structured representations: spaces, graphs, grammars, predicate logic, schemas, programs.
Hierarchical models, with inference at multiple levels of abstraction.
Nonparametric models, adapting their complexity to the data and balancing constraint with flexibility.
Inspiration and tools for building new, more human-like machine-learning systems.
What kind of structure exists in a domain? More general, unifying versions of “how much structure exists?”: How many clusters? How many levels in a taxonomy? How many dimensions? How complex is each dimension? How many different structures?
[Related work shown: discovery of structural form; CrossCat; feature discovery; finding the words.]
