Statistical learning meets abstract knowledge: A challenge for the probabilistic-connectionist interface Josh Tenenbaum MIT.


1 Statistical learning meets abstract knowledge: A challenge for the probabilistic-connectionist interface Josh Tenenbaum MIT

2 Probabilistic models meet connectionism
Where and how do they meet up naturally? Where are the challenges? How can we address them? Focus on a shared challenge for both approaches in making contact with the rest of cognitive science – non-modelers as well as modelers from other traditions.

3 (1986)

4 … (1986)
father(Christopher, Arthur)     father(Andrew, James)
mother(Penelope, Arthur)        mother(Christine, James)
husband(Christopher, Penelope)  husband(Andrew, Christine)
wife(Penelope, Christopher)     wife(Christine, Andrew)
son(Arthur, Christopher)        son(Arthur, Penelope)
daughter(Victoria, Christopher) daughter(Victoria, Penelope)

5 (1986) Enter Bayes?

6 Probabilistic models meet connectionism
Probabilistic and connectionist approaches to rules for learning & inference are easy to relate.
- Learning by backprop with weight decay ~ Bayesian MAP estimation. (Hinton, McClelland, Rumelhart, …)
- Inference by spreading activation ~ Bayesian belief propagation. (Jordan, Weiss, …)
- Basic Bayesian inferences can be implemented in semi-realistic neural circuits. (Pouget, Rao, Zemel, …)
- Some simple Bayesian cognitive models can be approximated in connectionist-style architectures. (Shepard, Tenenbaum, Kruschke, Griffiths, Shi, …)
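The first equivalence can be checked in a few lines: for a one-parameter linear model, least squares with an L2 weight-decay penalty and MAP estimation under a zero-mean Gaussian prior yield identical weights when the penalty equals the noise-to-prior variance ratio. A minimal sketch (the data points and variances below are made up for illustration):

```python
# Sketch: weight decay ~ Gaussian-prior MAP estimation.
# Model: y = w*x + Gaussian noise (variance s2); prior w ~ N(0, t2).

def ridge_fit(xs, ys, lam):
    # argmin_w  sum_i (y_i - w x_i)^2 + lam * w^2   (weight decay)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def map_fit(xs, ys, s2, t2):
    # argmax_w  prod_i N(y_i | w x_i, s2) * N(w | 0, t2)
    # closed form: w = sum(x y) / (sum(x^2) + s2 / t2)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + s2 / t2)

xs = [1.0, 2.0, 3.0]            # toy data, invented for this sketch
ys = [2.1, 3.9, 6.2]
s2, t2 = 0.5, 2.0
# Weight decay with lam = s2 / t2 matches the MAP estimate exactly.
assert abs(ridge_fit(xs, ys, s2 / t2) - map_fit(xs, ys, s2, t2)) < 1e-12
```

The same correspondence holds per-weight in a multi-layer network, which is the sense in which backprop with weight decay approximates MAP learning.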

7 Probabilistic models meet connectionism
Probabilistic and connectionist approaches to knowledge representation seem very different. Connectionist models emphasize core representational commitments, inspired by neuroscience: distributed representations, graded weight vectors, … Probabilistic models make few (if any) representational commitments. Pick your favorite representation…
- Euclidean spaces ~ Gaussian mixture models
- Binary feature spaces ~ Bernoulli mixture models
- Trees ~ Bayesian hierarchical clustering
- Directed graphs ~ Bayesian networks
- Undirected graphs ~ Markov random fields, Gaussian fields
- Continuous functions ~ Gaussian processes
- Schemas ~ Plates, probabilistic relational models (PRMs)
- Grammars ~ probabilistic context-free grammars
- Predicate logic ~ Markov logic, BLOG, Bayesian logic programs
- Functional programs ~ IBAL, Church

8 Probabilistic models meet connectionism
Probabilistic and connectionist approaches to knowledge representation seem very different. Connectionist models emphasize core representational commitments, inspired by neuroscience. Probabilistic models make few (if any) representational commitments: pick your favorite representation…
To many cognitive scientists who take knowledge representation seriously, neither approach is very satisfying:
- Connectionist representations seem too limited, too weak to capture human language, concepts, theories, …
- Probabilistic models seem too unconstrained. They may work with sophisticated representations, but then those representations "do all the work" and must be "wired in by hand".

9 The knowledge representation challenge
Both approaches have a hard road to travel…
- Connectionism: start with representations that seem too limited and try to show how apparently sophisticated knowledge emerges, given the right learning mechanisms and training experience.
- Probabilistic models: start with representations that seem too unconstrained or dependent on hand-coding and then show how they may become constrained or learned automatically.
… I'll argue for the latter strategy, but note this is a subjective argument, dependent on my explanatory goals.

10 The knowledge representation challenge
Focus on two aspects of knowledge representation that appear critical for explaining the power of human cognition:
- Symbolic structure
- Multiple levels of abstraction
The key challenge for cognitive modeling is to show how powerful mechanisms of learning and inference (probabilistic, connectionist, …) can operate with these "powerful" representations.

11 An example
Semantic taxonomy (Collins & Quillian, 1969):
Taxonomic tree: Canaries are birds. Birds have wings. Canaries have wings.
Abstract laws:
- category membership is transitive: is_a(X, Y) ← is_a(X, Z) & is_a(Z, Y)
- properties inherit to subcategories: has_a(X, P) ← is_a(X, Y) & has_a(Y, P)
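The two abstract laws can be run directly as a forward-chaining closure over explicit facts. A toy sketch (the `close` helper and the pair encoding of facts are my own, not from the talk):

```python
# Sketch: forward-chaining closure implementing
#   is_a(X, Y)  <- is_a(X, Z) & is_a(Z, Y)     (transitivity)
#   has_a(X, P) <- is_a(X, Y) & has_a(Y, P)    (property inheritance)

def close(is_a, has_a):
    """Apply both laws to a fixed point over sets of (x, y) pairs."""
    is_a, has_a = set(is_a), set(has_a)
    changed = True
    while changed:
        changed = False
        for (x, z) in list(is_a):
            for (z2, y) in list(is_a):
                if z == z2 and (x, y) not in is_a:
                    is_a.add((x, y)); changed = True
            for (y, p) in list(has_a):
                if z == y and (x, p) not in has_a:
                    has_a.add((x, p)); changed = True
    return is_a, has_a

is_a, has_a = close({("canary", "bird"), ("bird", "animal")},
                    {("bird", "wings")})
assert ("canary", "wings") in has_a   # canaries have wings (inheritance)
assert ("canary", "animal") in is_a   # transitivity of is_a
```

This is exactly the behavior the slide credits to the taxonomic representation: the conclusions follow for any categories and properties, not just the trained ones.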

12 Broader transfer
Deduction:
  Blickets are gazzers. Gazzers have wugs.
  Blickets have wugs.
Induction:
  Blickets are gazzers. Blickets have wugs.
  Gazzers have wugs?

  Blickets are gazzers. Blickets have wugs. Crubs are gazzers. Crubs have wugs. Delks are gazzers. Delks have wugs.
  Gazzers have wugs.
Abduction ("categorization as explanation"):
  Blickets have P1 and P2. Gazzers have P3 and P4. Wugs have P1 and P2. Wugs do not have P3 or P4.
  Wugs are blickets.

  Blickets have P1 and P2. Gazzers have P3 and P4. Wugs have P1.
  Wugs are blickets?
Abstract laws:
  is_a(X, Y) ← is_a(X, Z) & is_a(Z, Y)
  has_a(X, P) ← is_a(X, Y) & has_a(Y, P)

13 Other examples of probabilistic inference with abstract symbolic representations
Language understanding: "If a burkle tumps that one of its gazzers is about to blick one of its wugs, then the burkle will prin that gazzer."
Social relations:
  Is the mother of Boris's father his grandmother?
  Is the father of Boris's sister his father?
  Is the son of Boris's sister his son?
Reasoning about quantities:
  Some blickets are gazzers. All gazzers are wugs. Some blickets are wugs.
  Most blickets are gazzers. Most gazzers are wugs. Some blickets are wugs?
  Some blickets are gazzers. Some gazzers are wugs. *Most blickets are wugs.
Causal reasoning:
  X causes Y. Y causes Z. X occurs. Z occurs.
  X causes Y. Y causes Z. Z occurs. X occurs?
  X causes Y. Y prevents Z. Z occurs. X does not occur.
  X causes Y. Z prevents Y. X occurs. Y occurs?

14 Other examples of probabilistic inference with abstract symbolic representations
Intuitive physics. Intuitive psychology.

15 The knowledge representation challenge
How can powerful mechanisms of learning and inference operate over abstract symbolic knowledge representations? Several prominent connectionist models attempt to capture this kind of behavior as an emergent property…
- Hinton's models for learning relational concepts (e.g., learning family trees).
- The Rumelhart, Smolensky, McClelland & Hinton schema model.
- Models of semantic cognition by Rumelhart & Todd and Rogers & McClelland.
… but the jury is still out. (N.B.: I'm deliberately leaving off Smolensky's recent work, because it is more "implementational" than "emergentist".)

16 Learning family relationships (Hinton, 1986)
Tests of generalization: 112 possible facts of the form r(X, Y). A network trained on 108 examples can generalize well to the other 4 facts, but doesn't work well with less training...
father(Christopher, Arthur)     father(Andrew, James)
mother(Penelope, Arthur)        mother(Christine, James)
husband(Christopher, Penelope)  husband(Andrew, Christine)
wife(Penelope, Christopher)     wife(Christine, Andrew)
son(Arthur, Christopher)        son(Arthur, Penelope)
daughter(Victoria, Christopher) daughter(Victoria, Penelope)

17 Learning semantic taxonomies (Rogers & McClelland, 2004)

18 Learning semantic taxonomies (Rogers & McClelland, 2004)
Broader transfer?
Deduction:
  Blickets are gazzers. Gazzers have wugs.
  Blickets have wugs.
Induction:
  Blickets are gazzers. Blickets have wugs. Crubs are gazzers. Crubs have wugs. Delks are gazzers. Delks have wugs.
  Gazzers have wugs.

19 Learning semantic taxonomies (Rogers & McClelland, 2004)
Broader transfer? No.
Deduction:
  Blickets are gazzers. Gazzers have wugs.
  Blickets have wugs.
Induction:
  Blickets are gazzers. Blickets have wugs. Crubs are gazzers. Crubs have wugs. Delks are gazzers. Delks have wugs.
  Gazzers have wugs.
Basic common-sense inferences are not supported because there is no representation of the abstract laws that hold for any categories X, Y and properties P:
  is_a(X, Y) ← is_a(X, Z) & is_a(Z, Y)
  has_a(X, P) ← is_a(X, Y) & has_a(Y, P)
(c.f., Fodor & Pylyshyn, Marcus)

20 The knowledge representation challenge
How can powerful mechanisms of statistical learning and inference operate over abstract symbolic knowledge representations? Several prominent connectionist models attempt to capture this kind of behavior as an emergent property…
- Hinton's models for learning relational concepts (e.g., learning family trees).
- The Rumelhart, Smolensky, McClelland & Hinton schema model.
- Models of semantic cognition by Rumelhart & Todd and Rogers & McClelland.
… but the jury is still out. More recently, probabilistic approaches attempt to tackle these problems head on. Arguably more satisfying, if still very incomplete…

21 Modeling semantic cognition as logical dimensionality reduction (Katz, Goodman, Kemp, Kersting, Tenenbaum, Cog Sci 08) Use abstract symbolic knowledge to define a probabilistic generative model for observable facts, based on a sparser set of unobservable “core” domain facts. [Analogous to generative models for dimensionality reduction] Use Bayesian inference to work backwards from observed facts to core domain structure.
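A drastically simplified sketch of the inversion idea (a single latent core fact with made-up noise parameters; this is my illustration, not the actual Cog Sci '08 model): the more observed surface facts the core fact explains, the higher its posterior.

```python
# Toy "logical dimensionality reduction": a latent core fact
# h = is_a(blicket, gazzer) generates observable property facts
# with noise; Bayes inverts from observations back to h.

def posterior(obs, prior_h=0.5, p_true=0.9, p_false=0.1):
    """obs: 1/0 observations of 'blickets have wug_i', where each
    property is known to hold of gazzers.  If h is true each fact
    is observed with prob p_true, otherwise with prob p_false."""
    score_h1 = prior_h          # prior x likelihood for h = true
    score_h0 = 1 - prior_h      # prior x likelihood for h = false
    for o in obs:
        score_h1 *= p_true if o else 1 - p_true
        score_h0 *= p_false if o else 1 - p_false
    return score_h1 / (score_h1 + score_h0)

# More gazzer-properties observed on blickets -> stronger belief
# in the unobservable core is_a link.
assert posterior([1]) < posterior([1, 1, 1])
```

The real model does this jointly over many candidate core facts and abstract laws, but the direction of inference is the same: from a dense observable surface to a sparse latent core.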

22

23

24

25

26

27

28

29

30 Broader transfer (using Church, Goodman et al. 2008)
Deduction:
  Blickets are gazzers. Gazzers have wugs.
  Blickets have wugs.  (P = 0.995)
Induction:
  Blickets are gazzers. Blickets have wugs. Crubs are gazzers. Crubs have wugs.
  Gazzers have wugs.  (P(gen.) increases with the number of examples)
Abduction (categorization):
  Blickets have P1 and P2. Gazzers have P3 and P4. Wugs have P1 and P2. Wugs do not have P3 or P4.
  Wugs are blickets.  (P = 0.975)
  Blickets have P1 and P2. Gazzers have P3 and P4. Wugs have P1.
  Wugs are blickets.  (P = 0.57)
  Blickets have P1 and P2. Gazzers have P3 and P4.
  Wugs are blickets.  (P = 0.33)
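The rising induction curve can be illustrated with a toy Bayesian model (my own simplification, not the Church program behind these numbers): a universal hypothesis "all gazzers have wugs" competes with an alternative under which each observed subtype has wugs only by chance, so belief in the generalization grows with the number of consistent examples.

```python
# Toy model of the P(gen.) vs. # examples curve.
# H: "all gazzers have wugs" (predicts every positive example).
# Alternative: each gazzer subtype has wugs independently, prob 0.5.

def p_generalize(n, prior=0.5):
    """P(H | n subtypes all observed to have wugs)."""
    score_h = prior * 1.0                # H predicts every example
    score_alt = (1 - prior) * 0.5 ** n   # chance explanation decays
    return score_h / (score_h + score_alt)

probs = [round(p_generalize(n), 3) for n in range(1, 5)]
assert probs == sorted(probs)   # belief rises monotonically
```

This is the qualitative shape of the plotted curve; the slide's actual probabilities come from inference in the Church program.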

31

32

33

34

35 Search is still hard, but…
- Not the only option for learning: e.g., cultural transmission.
- Not impossible: e.g., inductive logic programming (ILP).

36 Algorithmic discovery of structural form (Kemp and Tenenbaum, PNAS 2008)
A hierarchical generative model for structure discovery:
  P(F) — F: the structural form (dimensional, tree-structured, clustered, …), with a prior favoring simplicity
  P(S | F) — S: a structure of that form over the entities X1 … X7
  P(D | S) — D: the observed feature data, scored by fit to the structure
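The scoring in this framework multiplies a prior over forms, a prior over structures given the form, and the likelihood of the data. A sketch with hypothetical numbers (the candidate scores below are invented for illustration, not from the paper):

```python
# Sketch: comparing candidate (form, structure) pairs by
# log P(F) + log P(S | F) + log P(D | S).
import math

def log_score(log_prior_form, log_prior_structure, log_likelihood):
    return log_prior_form + log_prior_structure + log_likelihood

# Hypothetical candidates: a tree that fits the data well vs. a
# simpler clustering that fits it poorly (numbers made up).
tree    = log_score(math.log(1/3), math.log(0.01), -40.0)
cluster = log_score(math.log(1/3), math.log(0.05), -55.0)
best = max([("tree", tree), ("cluster", cluster)], key=lambda fs: fs[1])
assert best[0] == "tree"
```

The interesting work is in defining P(S | F) so that simpler structures of each form are preferred, letting the data decide which form wins.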

37

38 Grammar framework
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG):
  P(grammar | framework) → Grammar
  P(phrase structure | grammar) → Phrase structure
  P(utterance | phrase structure) → Utterance
  P(speech | utterance) → Speech signal
(c.f., Chater & Manning, 2006)
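The middle levels of this hierarchy can be illustrated by sampling from a tiny probabilistic context-free grammar (the grammar and its probabilities are invented for this sketch; the nonsense words echo earlier slides):

```python
# Sketch: generating an utterance from a toy PCFG, i.e. sampling
# P(phrase structure | grammar) and reading off the terminal string.
import random

GRAMMAR = {                     # hypothetical rules: (prob, rhs)
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["the", "N"]), (0.3, ["N"])],
    "VP": [(1.0, ["V", "NP"])],
    "N":  [(0.5, ["blicket"]), (0.5, ["gazzer"])],
    "V":  [(1.0, ["tumps"])],
}

def sample(symbol, rng):
    """Expand a symbol top-down; terminals are returned as-is."""
    if symbol not in GRAMMAR:
        return [symbol]
    r, acc = rng.random(), 0.0
    for p, rhs in GRAMMAR[symbol]:
        acc += p
        if r <= acc:
            return [w for s in rhs for w in sample(s, rng)]
    return [w for s in GRAMMAR[symbol][-1][1] for w in sample(s, rng)]

utterance = sample("S", random.Random(0))
assert "tumps" in utterance     # every S derivation contains the verb
```

Parsing runs the same model in reverse: from an utterance back to its most probable phrase structure, and, one level up, from corpora back to the grammar.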

39 Vision as probabilistic parsing
“Analysis by Synthesis” (Han & Zhu, 2006; Yuille, Zhu, Geman, Feldman, …)

40

41 Causal learning and reasoning
Principles → Structure → Data
(Griffiths & Tenenbaum; Kemp, Goodman, Tenenbaum; Griffiths & Lucas)
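Reasoning both forwards and backwards over a causal structure falls out of enumeration over the joint distribution it defines. A toy sketch of the chain X → Y → Z with made-up parameters (not a model from the cited work):

```python
# Sketch: forward and backward inference over the causal chain X -> Y -> Z.
from itertools import product

P_X = 0.3           # hypothetical base rate of the root cause
P_CAUSE = 0.9       # prob. a present cause produces its effect
P_BG = 0.05         # background rate of each effect

def joint(x, y, z):
    """P(X=x, Y=y, Z=z) under the chain structure."""
    py = P_CAUSE if x else P_BG
    pz = P_CAUSE if y else P_BG
    return ((P_X if x else 1 - P_X)
            * (py if y else 1 - py)
            * (pz if z else 1 - pz))

def p_x_given_z():
    """P(X=1 | Z=1): reasoning backwards along the chain."""
    num = sum(joint(1, y, 1) for y in (0, 1))
    den = sum(joint(x, y, 1) for x, y in product((0, 1), repeat=2))
    return num / den

assert p_x_given_z() > P_X   # observing the effect raises belief in X
```

The same enumeration answers the slide-13 queries ("Z occurs. X occurs?") once the structure and parameters are fixed; the learning problem is recovering that structure from data.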

42 Goal-directed action (production and comprehension)
(Wolpert et al., 2003; Baker, Saxe, Goodman, Tenenbaum; Buchsbaum, Gopnik, Griffiths)

43 Conclusions
To understand cognition, we will need to understand how statistical inference and learning can operate over structured (symbolic, abstract) representations – or some functional equivalent thereof. This is a deep problem for both Bayesian and connectionist models, and for their interface. Two very different approaches to the challenge:
- Connectionist: start with representations that seem too limited and try to show how apparently sophisticated knowledge emerges, given the right learning mechanisms and training experience.
- Bayesian: start with representations that seem too unconstrained or hand-coded and then show how they may become constrained or learned automatically.
At least for this challenge, the top-down approach seems more immediately promising. More generally, when is a top-down, bottom-up, or middle-out approach most promising?

44 Discussion: on emergence and reduction
Reductionist programs in science usually flow top-down:
- First, a non-mechanistic theory achieves striking quantitative success at predicting key macroscopic phenomena.
- Then, a more mechanistic theory is developed, reducing key concepts of the former theory to emergent properties of more fundamental structures or processes.
Examples from physics, chemistry, biology:
- Thermodynamics -> statistical mechanics
- Chemistry -> atomic physics -> quantum mechanics
- Newtonian theory of gravity -> general relativity
- Mendelian genetics -> DNA (molecular biology of the genome)
The top-down approach is arguably more relevant to cognitive science than to physics, because brains are designed (selected) for functionality at the macro level.
Potential pitfalls:
- Premature reduction: the early Greek atomists, or "continuous trait" inheritance theorists (Mendel's rivals, e.g., Galton).
- Preconceptions about plausible mechanisms: Einstein's "God does not play dice with the universe."
Concerns for connectionism: we are far from an adequate macro theory of cognition, and the most basic aspects of neural mechanism are not settled (e.g., rate code or spike code? graded or discrete synaptic plasticity?).

