The causal matrix: Learning the background knowledge that makes causal learning possible. Josh Tenenbaum, MIT Department of Brain and Cognitive Sciences.


The causal matrix: Learning the background knowledge that makes causal learning possible. Josh Tenenbaum, MIT Department of Brain and Cognitive Sciences, Computer Science and AI Lab (CSAIL). Acknowledgments: Tom Griffiths, Charles Kemp, the Computational Cognitive Science group at MIT, and all the researchers whose work I’ll discuss.

Collaborators: Tom Griffiths, Noah Goodman, Vikash Mansinghka, Charles Kemp.

Learning causal relations (structure) from data. Goal: computational models that explain how people learn causal relations from data.

A Bayesian approach. Given observed data d and causal hypotheses h (candidate networks over variables X1 … X4): 1. What is the most likely network h given observed data d? 2. How likely is there to be a link X4 → X2? (e.g., Griffiths & Tenenbaum, 2005; Steyvers et al., 2003)
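The two questions above can be sketched computationally. This is a hedged toy illustration (the hypotheses, priors, and likelihoods below are invented numbers, not those from the slides): score each candidate network by Bayes' rule, then answer the link query by model averaging, i.e., summing posterior mass over every network containing the link.

```python
# Toy Bayesian structure inference: posterior over hypotheses, then
# model averaging for a link query. All numbers are illustrative.

def posterior(hypotheses):
    """hypotheses: dict name -> (prior, likelihood). Returns the normalized posterior."""
    scores = {h: prior * lik for h, (prior, lik) in hypotheses.items()}
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

def link_probability(post, has_link):
    """P(link | d): total posterior mass on hypotheses that contain the link."""
    return sum(p for h, p in post.items() if has_link(h))

# Two candidate networks; only one contains the link X4 -> X2.
hyps = {"with_link": (0.5, 0.2), "without_link": (0.5, 0.1)}
post = posterior(hyps)
p_link = link_probability(post, lambda h: h == "with_link")  # 0.1 / 0.15 = 2/3
```

Question 1 is the argmax of `post`; question 2 is `p_link`, which averages over all structures rather than committing to the single best one.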

What’s missing from this account?
- Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses
  - Abstract classes of variables and mechanisms
  - Causal laws defined over these classes
- Causal variables: constituents of causal hypotheses
  - Which variables are relevant
  - How variables ground out in perceptual and motor experience
- Causal understanding: domain-general properties of causal models
  - Directionality
  - Locality (sparsity, minimality)
  - Intervention

The approach
What we want to understand: How are these different aspects of background knowledge represented, used to support causal learning, and themselves acquired?
- Abstract domain-specific frameworks or causal schemas
- Causal variables grounded in sensorimotor experience
- Domain-general causal understanding
What we need to answer these questions:
- Bayesian inference in probabilistic generative models.
- Probabilities defined over structured representations: graphs, grammars, predicate logic.
- Hierarchical probabilistic models, with inference at multiple levels of abstraction.
- Flexible representations, growing in response to observed data.

Outline
- Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses
  - Abstract classes of variables and mechanisms
  - Causal laws defined over these concepts
- Causal variables: constituents of causal hypotheses
  - Which variables are relevant
  - How variables ground out in perceptual and motor experience
- Causal understanding: domain-general properties of causal models
  - Directionality
  - Locality (sparsity, minimality)
  - Intervention

Causal machines (Gopnik, Sobel, Schulz et al.): “See this? It’s a blicket machine. Blickets make it go.” … “Oooh, it’s a blicket! Let’s put this one on the machine.”

“Backward blocking” (Sobel, Tenenbaum & Gopnik, 2004)
Initially: nothing on detector, detector silent (A=0, B=0, E=0).
Trial 1: A and B on detector, detector active (A=1, B=1, E=1).
Trial 2: A alone on detector, detector active (A=1, B=0, E=1).
4-year-olds judge whether each object is a blicket:
- A: a blicket (100% say yes)
- B: probably not a blicket (34% say yes)

Possible hypotheses? Candidate causal graphs over A, B, and E: all combinations of links between the blocks and the detector.

Bayesian causal learning. With a uniform prior on hypotheses and a generic parameterization, the probability of being a blicket is close to chance for both objects (A: 0.32 / 0.34; B: 0.32 / 0.34).

A stronger hypothesis space generated by abstract domain knowledge:
- Links can only exist from blocks to detectors.
- Blocks are blickets with prior probability q.
- Blickets always activate detectors; detectors never activate on their own (i.e., deterministic OR parameterization, no hidden causes).
Priors: P(h00) = (1 – q)², P(h01) = (1 – q)q, P(h10) = q(1 – q), P(h11) = q².
Likelihoods, for h00 (no links), h01 (B → E), h10 (A → E), h11 (both) respectively:
P(E=1 | A=0, B=0): 0, 0, 0, 0
P(E=1 | A=1, B=0): 0, 0, 1, 1
P(E=1 | A=0, B=1): 0, 1, 0, 1
P(E=1 | A=1, B=1): 0, 1, 1, 1
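This theory-based hypothesis space is small enough to run directly. The sketch below (using the slide's deterministic-OR parameterization; the particular value of q is an illustrative choice) enumerates the four hypotheses, conditions on the backward-blocking trials, and recovers the qualitative result: A is certainly a blicket, while B falls back to its prior probability q.

```python
# Backward blocking under the theory-based hypothesis space:
# each block is a blicket with prior probability q, and blickets
# deterministically activate the detector (OR over present blickets).

from itertools import product

def blicket_posteriors(trials, q, n_objects=2):
    """trials: list of (object_states, detector_state).
    Returns P(object i is a blicket | trials) for each object."""
    weights, total = {}, 0.0
    for h in product([0, 1], repeat=n_objects):  # h[i] = 1 iff object i is a blicket
        prior = 1.0
        for b in h:
            prior *= q if b else (1 - q)
        lik = 1.0
        for states, e in trials:
            predicted = 1 if any(b and s for b, s in zip(h, states)) else 0
            lik *= 1.0 if predicted == e else 0.0  # deterministic OR likelihood
        weights[h] = prior * lik
        total += weights[h]
    return [sum(w for h, w in weights.items() if h[i]) / total
            for i in range(n_objects)]

# Trial 1: A and B on detector -> active. Trial 2: A alone -> active.
q = 1 / 6  # "rare" condition, e.g. 2 of 12 pre-training objects were blickets
pA, pB = blicket_posteriors([((1, 1), 1), ((1, 0), 1)], q)
# A is certainly a blicket; B's posterior equals its prior, q.
```

The surviving hypotheses after both trials are exactly h10 and h11, so P(A) = 1 and P(B) = q²/(q(1 − q) + q²) = q, which is why manipulating the prior (rare vs. common) changes judgments about B but not A.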

Manipulating prior probability (Tenenbaum, Sobel, Griffiths & Gopnik). Procedure: initial phase, then an AB trial, then an A trial.

Inferences from ambiguous data
I. Pre-training phase: blickets are rare.
II. Two trials: A B → detector active, then B C → detector active.
After each trial, adults judge the probability that each object is a blicket.

The same domain theory generates the hypothesis space for three objects. The hypotheses h000, h100, h010, h001, h110, h011, h101, h111 correspond to the possible subsets of {A, B, C} with links to E. Likelihoods: P(E=1 | A, B, C; h) = 1 if A = 1 and the link A → E exists, or B = 1 and B → E exists, or C = 1 and C → E exists; else 0.
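Extending the same enumeration to three objects reproduces the ambiguous-data inference. A sketch, with q chosen to match a "rare" pre-training phase (an illustrative assumption):

```python
# Three-object version: enumerate the eight hypotheses h000..h111
# (which of A, B, C link to E), with the deterministic-OR likelihood.

from itertools import product

def posteriors_3(trials, q):
    """trials: list of ((A, B, C) states, detector state). Returns P(blicket) per object."""
    weights, total = {}, 0.0
    for h in product([0, 1], repeat=3):
        prior = 1.0
        for b in h:
            prior *= q if b else (1 - q)
        lik = 1.0
        for states, e in trials:
            predicted = 1 if any(b and s for b, s in zip(h, states)) else 0
            lik *= 1.0 if predicted == e else 0.0
        weights[h] = prior * lik
        total += weights[h]
    return [sum(w for h, w in weights.items() if h[i]) / total for i in range(3)]

# Trial 1: A and B on detector -> active. Trial 2: B and C on detector -> active.
q = 1 / 6
pA, pB, pC = posteriors_3([((1, 1, 0), 1), ((0, 1, 1), 1)], q)
```

With q = 1/6 the object appearing in both trials (B) comes out far more probable (36/41 ≈ 0.88) than A or C (11/41 ≈ 0.27 each): a single rare blicket that explains both activations is preferred to two.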

“Rare” condition: First observe 12 objects on detector, of which 2 set it off.

4-year-olds (with Dave Sobel)
I. “Backward blocking” (Trial 1: A B → detector active; Trial 2: A → detector active). “Is this a blicket?” Rare condition: A 100%, B 25%. Common condition: A 100%, B 81%.
II. Two trials: A B → detector active, then B C → detector active. “Is this a blicket?” 87%, 56%, 56% for the three objects.

Formalizing framework theories: theory → causal structure → event data.

Formalizing framework theories, by analogy with language: grammar → phrase structure → utterance (e.g., “You shot the wumpus.”), in parallel with theory → causal structure → event data.

A framework theory for detectors: probabilistic first-order logic


Alternative framework theories: Classes = {C}, Laws = {C → C}. Classes = {R, D, S}, Laws = {R → D, D → S}. Classes = {R, D, S}, Laws = {S → D}.

The abstract theory constrains possible causal hypotheses and rules out others. This allows strong inferences about causal structure from very limited data, and is very different from conventional Bayes net learning.

Learning with a uniform prior on network structures: from the true network over 12 attributes, sample 75 observations (patients) to form the observed data.

Learning a block-structured prior on network structures (Mansinghka et al., 2006): the 12 attributes are assigned to classes z, with class-pair edge probabilities (e.g., 0.75–0.8 between some class pairs, 0.0 or 0.01 elsewhere); again, 75 observations (patients) are sampled from the true network.

(Mansinghka, Kemp, Tenenbaum & Griffiths, UAI 2006) True structure of graphical model G over variables 1–16. Compare learning the graph G alone (edges only) with learning G together with an abstract theory: class assignments z for the variables plus class-pair edge probabilities h (e.g., h(c1, c2) = 0.4, other entries 0.0), both generating Graph G → Data D. Performance is compared at 20, 80, and 1000 samples.
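The block-structured prior can be sketched as a generative process. This is a hedged toy version (the class assignments and probabilities are invented, and the simplification to independent directed edges is mine): each variable gets a class, and edge probabilities live at the level of class pairs, so abstract "laws" constrain which specific links can appear.

```python
# Toy block-structured graph prior: variable i belongs to class z[i],
# and a directed edge i -> j appears with probability eta[z[i]][z[j]].

import random

def sample_graph(z, eta, rng):
    """z: class of each variable; eta: class-pair edge probabilities.
    Returns the sampled edge set as (i, j) pairs."""
    n = len(z)
    edges = set()
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < eta[z[i]][z[j]]:
                edges.add((i, j))
    return edges

rng = random.Random(0)
z = [0, 0, 1, 1, 1]             # two classes over five variables (invented)
eta = [[0.0, 0.9],              # class 0 may cause class 1 ...
       [0.0, 0.0]]              # ... and nothing else
g = sample_graph(z, eta, rng)
# Every sampled edge runs from a class-0 variable to a class-1 variable.
```

Because the theory (z, eta) has far fewer degrees of freedom than the graph itself, it can be pinned down from less data, which is one way to see the "blessing of abstraction" discussed later.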

Human learning of abstract causal frameworks: Lien & Cheng (2000); Shanks & Darby (1998); Tenenbaum & Niyogi (2003); Schulz, Goodman, Tenenbaum & Jenkins (submitted); Kemp, Goodman & Tenenbaum (in progress).

The causal blocks world (Tenenbaum & Niyogi, 2003).

Learning curves and model predictions.

Animal learning of abstract causal frameworks? The same hierarchy applies: theory → causal structure → event data.

Outline
- Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses
  - Abstract classes of variables and mechanisms
  - Causal laws defined over these concepts
- Causal variables: constituents of causal hypotheses
  - Which variables are relevant
  - How variables ground out in perceptual and motor experience
- Causal understanding: domain-general properties of causal models
  - Directionality
  - Locality (sparsity, minimality)
  - Intervention

The problem
A child learns that petting the cat leads to purring, while pounding leads to growling. But what are the origins of the symbolic event concepts (“variables”) over which causal links are defined?
- Option 1: Variables are innate.
- Option 2 (“clusters then causes”): Variables are learned first, independently of causal relations, through a kind of bottom-up perceptual clustering.
- Option 3: Variables are learned together with causal relations.

A hierarchical Bayesian framework for learning grounded causal models (Goodman, Mansinghka & Tenenbaum, CogSci 07). Hypotheses: causal models defined over candidate variables. Data: sensory observations at successive times t and t′.

“Alien control panel” experiment: Conditions A, B, and C.

Mean responses vs. model. Blue bars: human proportion of responses; red bars: model posterior probability.

Outline
- Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses
  - Abstract classes of variables and mechanisms
  - Causal laws defined over these concepts
- Causal variables: constituents of causal hypotheses
  - Which variables are relevant
  - How variables ground out in perceptual and motor experience
- Causal understanding: domain-general properties of causal models
  - Directionality
  - Locality (sparsity, minimality)
  - Intervention

Domain-general causal understanding. Ways of describing the world: correlations; temporally directed associative strengths; Bayesian networks (minimal structure fitting the conditional dependencies); causal Bayesian networks (Bayes nets plus interventions). Possible alternative models connect the variables x, y, z and a, b, c in different ways; blocks are open collections of variables.

Domain-general causal understanding: an abstract schema for causal learning in any domain, with a world variable W and an action variable A in each system (System 1, System 2, System 3, …, System X). This schema is essentially equivalent to Pearl-style learning for causal Bayes nets. Blocks are open collections of variables.

Some alternatives: frameworks with additional variables V in each system, or with different patterns of connection between the W and A blocks. Blocks are open collections of variables.

Can a Bayesian learner infer the correct domain-general properties of causality, using data from multiple systems, while simultaneously learning how each system works? (Goodman & Tenenbaum) The learner observes samples from System 1 through System N.

Yes.

Specific-system learning examples illustrate the blessing of abstraction.

Summary
What we want to understand: How are different aspects of background knowledge represented, used to support causal learning, and themselves acquired?
- Abstract domain-specific frameworks or causal schemas
- Causal variables grounded in sensorimotor experience
- Domain-general causal understanding
What we need to answer these questions:
- Bayesian inference in probabilistic generative models.
- Probabilities defined over structured representations: graphs, grammars, predicate logic.
- Hierarchical probabilistic models, with inference at multiple levels of abstraction.
- Flexible representations, growing in response to observed data.

Insights
- Aspects of background knowledge which have been either taken for granted or presumed to be innate could in fact be learned from data by rational inferential means, together with specific causal relations.
- Domain-specific frameworks or schemas and domain-general properties of causality could be learned by similar means.
- Abstract causal knowledge can in some cases be learned more quickly and more easily than specific concrete causal relations (the “blessing of abstraction”).

Bayesian Occam’s Razor (MacKay, 2003; Ghahramani tutorials). For any model M, summing over all possible data sets d: Σ_d p(D = d | M) = 1. By this law of “conservation of belief”, a model that can predict many possible data sets must assign each of them low probability.
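Conservation of belief is easy to demonstrate on a toy space of data sets. In this assumed example (three coin flips, in the spirit of MacKay 2003), a flexible model and a constrained model each distribute exactly one unit of probability over the eight possible outcomes:

```python
# Two models over all 2^3 = 8 sequences of three coin flips.
# The flexible model (fair coin) predicts everything weakly;
# the constrained model (heavily biased coin) predicts a few
# sequences strongly and must starve the rest.

from itertools import product

def model_probs(p_heads):
    """P(D = d | M) for every length-3 flip sequence (1 = heads)."""
    return {d: (p_heads ** sum(d)) * ((1 - p_heads) ** (3 - sum(d)))
            for d in product([0, 1], repeat=3)}

flexible = model_probs(0.5)      # M1: uniform over all 8 sequences
constrained = model_probs(0.95)  # M2: mass concentrated on mostly-heads data

# Both obey sum_d P(D = d | M) = 1, so M2 pays for its sharp
# predictions by assigning near-zero probability elsewhere.
```

If the observed data fall where M2 concentrated its mass, M2 wins the Bayesian comparison; otherwise it loses badly, which is exactly the automatic Occam's razor the slide describes.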

Learning causation from contingencies (e.g., “Does injecting this chemical cause mice to express a certain gene?”):

                 C present (c+)   C absent (c−)
E present (e+)         a                c
E absent  (e−)         b                d

Subjects judge the extent to which C causes E (rated on a scale from 0 to 100).
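Two standard quantities computed from this contingency table, used as benchmarks in the causal-induction literature (hedged: ΔP and Cheng's 1997 causal power are reference models, not necessarily the account this slide endorses), can be sketched directly; the mouse counts below are invented for illustration.

```python
# Delta-P and Cheng's causal power from a 2x2 contingency table
# with cells a (c+, e+), b (c+, e-), c (c-, e+), d (c-, e-).

def delta_p(a, b, c, d):
    """Delta-P = P(e+ | c+) - P(e+ | c-)."""
    return a / (a + b) - c / (c + d)

def causal_power(a, b, c, d):
    """Cheng's (1997) generative causal power: Delta-P / (1 - P(e+ | c-))."""
    return delta_p(a, b, c, d) / (1 - c / (c + d))

# Invented example: 6 of 8 injected mice express the gene, vs. 2 of 8 controls.
dp = delta_p(6, 2, 2, 6)          # 0.75 - 0.25 = 0.5
power = causal_power(6, 2, 2, 6)  # 0.5 / 0.75 = 2/3
```

Griffiths & Tenenbaum's "causal support" model, cited earlier in the talk, instead scores the table by Bayesian comparison of graphs with and without the C → E link; ΔP and causal power are the point-estimate alternatives it is compared against.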

Learning more complex structures
- Tenenbaum et al., Griffiths & Sobel: detectors with more than two objects and noisy mechanisms
- Steyvers et al., Sobel & Kushnir: active learning with interventions (cf. Tong & Koller; Murphy)
- Lagnado & Sloman: learning from interventions on continuous dynamical systems

Inferring hidden causes: the “stick ball” machine (Kushnir, Schulz, Gopnik & Danks, 2003). Conditions compare a common unobserved cause, independent unobserved causes, and one observed cause, each shown for varying numbers of trials.

Bayesian learning with unknown number of hidden variables (Griffiths et al 2006)

Model fits with a = 0.3, w = 0.8: correlation with human judgments r = 0.94 across the common-unobserved-cause, independent-unobserved-causes, and one-observed-cause conditions.

Inferring latent causes in classical conditioning (Courville, Daw, Gordon & Touretzky, 2003). Stimuli: A (noise), X (tone), B (click), US (shock). Training: A–US, A–X, B–US. Test: X alone vs. X–B.

Summary: causal inference & learning
Human causal induction can be explained using core principles of graphical models:
- Bayesian inference (explaining away, screening off)
- Bayesian structure learning (Occam’s razor, model averaging)
- Active learning with interventions
- Identifying latent causes

Summary: causal inference & learning
Crucial constraints on hypothesis spaces come from abstract prior knowledge, or “intuitive theories”: What are the variables? How can they be connected? How are their effects parameterized? Big open questions: How can these theories be described formally? How can these theories be learned?

Learning causal relations: abstract principles → structure → data (Griffiths, Tenenbaum, Kemp et al.).

“Universal Grammar”: a hierarchy from UG through grammar, phrase structure, and utterance to the speech signal, using hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG), with conditional distributions P(grammar | UG), P(phrase structure | grammar), P(utterance | phrase structure), and P(speech | utterance). (Jurafsky; Levy & Jaeger; Klein & Manning; Perfors et al., …)

So far, we fixed the CBN framework in order to learn the individual systems (System 1 … System N) from the observed samples.

But perhaps we can learn the causal framework itself, along with the systems, from the same data?

So, yes, there is a Bayesian way to evaluate the likelihood of a framework. But what space of frameworks should we consider?

Block-structured causal frameworks
We can consider different kinds of block relations: “may connect”, “must connect”, “may connect once”, “breaks other arrows”. This gives us many frameworks: fully connected; DAG; CBN; exogenous actions; soft interventions.

The approach
1. How does background knowledge guide causal learning from sparsely observed data? Bayesian inference: P(h | d) ∝ P(d | h) P(h).
2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories.
3. How can background knowledge itself be learned, perhaps together with specific causal relations? Hierarchical probabilistic models, with inference at multiple levels of abstraction; flexible nonparametric models in which complexity grows with the data.

The approach
What we want to understand:
- How do these different aspects of background knowledge guide learning of causal relations from sparsely observed data?
- What form does this background knowledge take?
- How could this background knowledge itself be learned, together with or prior to learning causal relations?
What we need to understand these abilities:
- Bayesian inference in probabilistic generative models.
- Probabilities defined over structured representations: graphs, grammars, predicate logic.
- Hierarchical probabilistic models, with inference at multiple levels of abstraction.
- Flexible representations, growing in response to observed data.