The causal matrix: Learning the background knowledge that makes causal learning possible Josh Tenenbaum MIT Department of Brain and Cognitive Sciences.

1 The causal matrix: Learning the background knowledge that makes causal learning possible Josh Tenenbaum MIT Department of Brain and Cognitive Sciences Computer Science and AI Lab (CSAIL) Acknowledgments: Tom Griffiths, Charles Kemp, the Computational Cognitive Science group at MIT, and all the researchers whose work I’ll discuss.

2 Collaborators Tom Griffiths, Noah Goodman, Vikash Mansinghka, Charles Kemp

3 Learning causal relations
Goal: Computational models that explain how people learn causal structure from data. [Diagram: causal structure generating observed data.]

4 A Bayesian approach Data d Causal hypotheses h
1. What is the most likely network h given observed data d? 2. How likely is there to be a link X1 → X2? (e.g., Griffiths & Tenenbaum, 2005; Steyvers et al., 2003)
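As a toy sketch of both questions (with invented hypotheses and numbers, not any specific model from the talk): the posterior over networks answers question 1 via its maximum, and question 2 by summing the posterior over networks that contain the link.

```python
# Toy illustration of the two Bayesian questions. Given a prior P(h) over
# candidate networks and a likelihood P(d | h) for the observed data,
# question 1 asks for the most probable network, and question 2 marginalizes
# over networks containing a particular link. All numbers here are made up.
hypotheses = {
    "X1 -> X2": {"prior": 0.25, "lik": 0.40, "has_link": True},
    "X2 -> X1": {"prior": 0.25, "lik": 0.10, "has_link": False},
    "no link":  {"prior": 0.50, "lik": 0.05, "has_link": False},
}

z = sum(h["prior"] * h["lik"] for h in hypotheses.values())
posterior = {name: h["prior"] * h["lik"] / z for name, h in hypotheses.items()}

map_network = max(posterior, key=posterior.get)          # question 1
p_link = sum(posterior[name] for name, h in hypotheses.items()
             if h["has_link"])                           # question 2
```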

5 What’s missing from this account?
Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses
- Abstract classes of variables and mechanisms
- Causal laws defined over these classes
Causal variables: constituents of causal hypotheses
- Which variables are relevant
- How variables ground out in perceptual and motor experience
Causal understanding: domain-general properties of causal models
- Directionality
- Locality (sparsity, minimality)
- Intervention

6 The approach
What we want to understand: How are these different aspects of background knowledge represented, used to support causal learning, and themselves acquired?
- Abstract domain-specific frameworks or causal schemas
- Causal variables grounded in sensorimotor experience
- Domain-general causal understanding
What we need to answer these questions:
- Bayesian inference in probabilistic generative models
- Probabilities defined over structured representations: graphs, grammars, predicate logic
- Hierarchical probabilistic models, with inference at multiple levels of abstraction
- Flexible representations, growing in response to observed data

7 Outline
Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses
- Abstract classes of variables and mechanisms
- Causal laws defined over these concepts
Causal variables: constituents of causal hypotheses
- Which variables are relevant
- How variables ground out in perceptual and motor experience
Causal understanding: domain-general properties of causal models
- Directionality
- Locality (sparsity, minimality)
- Intervention

8 Causal Machines (Gopnik, Sobel, Schulz et al.)
Oooh, it’s a blicket! Let’s put this one on the machine. See this? It’s a blicket machine. Blickets make it go.

9 “Backward blocking” (Sobel, Tenenbaum & Gopnik, 2004)
Initially: nothing on the detector – detector silent (A=0, B=0, E=0). Trial 1: A and B on the detector – detector active (A=1, B=1, E=1). Trial 2: A alone on the detector – detector active (A=1, B=0, E=1). 4-year-olds then judge whether each object is a blicket. A: a blicket (100% say yes); B: probably not a blicket (34% say yes).

10 Possible hypotheses? [Figure: the candidate causal graphs linking A, B, and E.]

11 Bayesian causal learning
With a uniform prior on hypotheses and a generic parameterization, the probability of being a blicket: A: 0.32, 0.34; B: 0.32, 0.34.

12 A stronger hypothesis space generated by abstract domain knowledge
Links can only exist from blocks to detectors. Blocks are blickets with prior probability q. Blickets always activate detectors; detectors never activate on their own (i.e., deterministic OR parameterization, no hidden causes). Priors over the four hypotheses: P(h00) = (1 – q)², P(h01) = (1 – q)q, P(h10) = q(1 – q), P(h11) = q². [Table: likelihoods P(E=1 | A, B) under each hypothesis, given by the deterministic OR.]
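This theory is small enough to enumerate directly. A minimal sketch (my own illustration, not the original model code) shows the backward-blocking result: A is certainly a blicket, and B falls back to its prior q.

```python
from itertools import product

# Sketch of the detector theory above: each block is a blicket with prior
# probability q, and the detector activates iff at least one blicket is on
# it (deterministic OR, no hidden causes).
def blicket_posterior(q, trials):
    """trials: list of ((block states), detector response) pairs.
    Returns the posterior probability that each block is a blicket."""
    n = len(trials[0][0])
    post, z = [0.0] * n, 0.0
    for h in product([0, 1], repeat=n):   # h[i] = 1 iff block i is a blicket
        prior = 1.0
        for b in h:
            prior *= q if b else 1 - q
        consistent = all(int(any(b and on for b, on in zip(h, blocks))) == e
                         for blocks, e in trials)
        if consistent:                    # likelihood is 1 or 0
            z += prior
            for i, b in enumerate(h):
                post[i] += prior * b
    return [p / z for p in post]

# Backward blocking: nothing -> silent; A,B -> active; A alone -> active.
trials = [((0, 0), 0), ((1, 1), 1), ((1, 0), 1)]
pA, pB = blicket_posterior(q=0.1, trials=trials)
```

Working through the enumeration by hand: only h10 and h11 survive the data, so P(A is a blicket) = 1 and P(B is a blicket) = q² / (q(1 – q) + q²) = q.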

13 Manipulating prior probability (Tenenbaum, Sobel, Griffiths, & Gopnik)
[Figure: inferred probability of being a blicket after the initial, AB, and A trials, as the prior probability q is manipulated.]

14 Inferences from ambiguous data
I. Pre-training phase: blickets are rare. II. Two trials: Trial 1 – A and B on detector, detector active; Trial 2 – B and C on detector, detector active. After each trial, adults judge the probability that each object is a blicket.

15 Same domain theory generates hypothesis space for 3 objects:
Hypotheses: h000, h100, h010, h001, h110, h011, h101, h111, one for each subset of {A, B, C} that are blickets, with a link from each blicket to E. Likelihoods: P(E=1 | A, B, C; h) = 1 if A = 1 and A→E exists, or B = 1 and B→E exists, or C = 1 and C→E exists; else 0.

16 “Rare” condition: First observe 12 objects on detector, of which 2 set it off.
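The ambiguous two-trial data can be scored by direct enumeration. A self-contained sketch (my own illustration, assuming the deterministic-OR detector theory and q = 2/12 from the rare pre-training):

```python
from itertools import product

# Posterior that each of A, B, C is a blicket after the trials
# "A,B on detector -> active" and "B,C on detector -> active",
# under the deterministic-OR detector theory with blicket base rate q.
def blicket_posterior(q, trials):
    n = len(trials[0][0])
    post, z = [0.0] * n, 0.0
    for h in product([0, 1], repeat=n):   # h[i] = 1 iff object i is a blicket
        prior = 1.0
        for b in h:
            prior *= q if b else 1 - q
        if all(int(any(b and on for b, on in zip(h, blocks))) == e
               for blocks, e in trials):  # deterministic-OR likelihood
            z += prior
            for i, b in enumerate(h):
                post[i] += prior * b
    return [p / z for p in post]

q = 2 / 12                                # "rare" condition base rate
trials = [((1, 1, 0), 1), ((0, 1, 1), 1)]
pA, pB, pC = blicket_posterior(q, trials)
# B, present on both active trials, is the most likely blicket; A and C tie.
```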

17 4-year-olds (w/ Dave Sobel)
I. “Backward blocking” (Trial 1: A and B on detector; Trial 2: A alone). “Is this a blicket?” A: 100% (Rare), 100% (Common); B: [values missing]. II. Two trials: A and B on detector, then B and C on detector. “Is this a blicket?” [first value missing], 56%, 56%.

18 Formalizing framework theories
[Diagram: framework theory → causal structure → event data.]

19 Formalizing framework theories
Analogy to language: grammar → phrase structure → utterance (“You shot the wumpus.”), paralleling framework theory → causal structure → event data.

20 A framework theory for detectors: probabilistic first-order logic

21 Formalizing framework theories
[Diagram: framework theory → causal structure → event data.]

22 Alternative framework theories
Classes = {C}; Laws = {C → C}. Classes = {R, D, S}; Laws = {R → D, D → S}. Classes = {R, D, S}; Laws = {S → D}.

23 The abstract theory constrains possible hypotheses:
…and rules out others. This allows strong inferences about causal structure from very limited data, very different from conventional Bayes net learning.

24 Learning with a uniform prior on network structures:
True network over attributes 1–12; sample 75 observations (patients) as the observed data. [Figure.]

25 Learning a block-structured prior on network structures (Mansinghka et al., 2006): [Figure: class assignments z and a class-level edge-probability matrix h (entries such as 0.8 and 0.75), alongside the true network and 75 sampled patient observations over attributes 1–12.]

26 True structure of graphical model G: Graph G Data D Abstract Theory
Abstract theory: class assignments z (with prior class(z)) and a class-level edge-probability matrix h, e.g. over classes (c1, c2), h(c1, c2) = 0.4 and all other entries 0.0. Graph G is drawn via edge(G) given the theory, and Data D are sampled from G. [Figure: recovery of the true model as the number of samples grows.] (Mansinghka, Kemp, Tenenbaum, Griffiths, UAI 2006)
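A minimal sketch of this generative story (my own illustration, not the UAI ’06 code): node classes plus a class-level edge-probability matrix determine which directed edges are likely, so the learned prior concentrates on block-structured graphs.

```python
import random

# Sketch of a block-structured prior on directed graphs: each node has a
# class, and the probability of an edge i -> j depends only on the classes
# of i and j. The class assignments and matrix below are made up.
def sample_graph(classes, h, rng):
    """classes: class index per node; h[a][b]: P(edge from class a to class b).
    Returns an adjacency matrix (no self-edges)."""
    n = len(classes)
    return [[1 if i != j and rng.random() < h[classes[i]][classes[j]] else 0
             for j in range(n)]
            for i in range(n)]

rng = random.Random(0)
classes = [0, 0, 1, 1]           # two classes, two nodes each
h = [[0.0, 0.9], [0.0, 0.0]]     # class-0 nodes tend to cause class-1 nodes
g = sample_graph(classes, h, rng)
# Every sampled edge respects the class-level law: only class 0 -> class 1.
```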

27 Human learning of abstract causal frameworks
Lien & Cheng (2000) Shanks & Darby (1998) Tenenbaum & Niyogi (2003) Schulz, Goodman, Tenenbaum & Jenkins (submitted) Kemp, Goodman & Tenenbaum (in progress)

28 The causal blocks world (Tenenbaum and Niyogi, 2003)
[Figure: a scene from the causal blocks world, with blocks F, L, and A.]

29 Learning curves and model predictions. [Figure.]

30 Animal learning of abstract causal frameworks?
[Diagram: framework theory → causal structure → event data, over variables O, W, C, G, F, L, A.]

31 Outline
Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses
- Abstract classes of variables and mechanisms
- Causal laws defined over these concepts
Causal variables: constituents of causal hypotheses
- Which variables are relevant
- How variables ground out in perceptual and motor experience
Causal understanding: domain-general properties of causal models
- Directionality
- Locality (sparsity, minimality)
- Intervention

32 The problem A child learns that petting the cat leads to purring, while pounding leads to growling. But what are the origins of these symbolic event concepts (“variables”) over which causal links are defined? Option 1: Variables are innate. Option 2 (“clusters, then causes”): Variables are learned first, independently of causal relations, through a kind of bottom-up perceptual clustering. Option 3: Variables are learned together with causal relations.

33 A hierarchical Bayesian framework for learning grounded causal models (Goodman, Mansinghka & Tenenbaum, CogSci 07). [Figure: hypotheses are grounded causal models; data are sensorimotor observations at times t and t'.]

34 “Alien control panel” experiment
[Figure: example control-panel stimuli for Conditions A, B, and C.]

35 Mean responses vs. model
Blue bars: human proportion of responses Red bars: model posterior probability

36 Outline
Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses
- Abstract classes of variables and mechanisms
- Causal laws defined over these concepts
Causal variables: constituents of causal hypotheses
- Which variables are relevant
- How variables ground out in perceptual and motor experience
Causal understanding: domain-general properties of causal models
- Directionality
- Locality (sparsity, minimality)
- Intervention

37 Domain-general causal understanding
World: correlations; temporally directed associative strengths. Bayesian networks: minimal structure fitting conditional dependencies. Causal Bayesian networks: BNs + interventions. [Figure: a network over variables a, b, c, x, y, z, with possible alternative models.]

38 Domain-general causal understanding
An abstract schema for causal learning in any domain, essentially equivalent to Pearl-style learning for CBNs. [Figure: Systems 1, 2, 3, …, X, each defined over variable blocks W and A; blocks are open collections of variables.]

39 Some alternatives [Figure: alternative frameworks over variable blocks V, W, and A; blocks are open collections of variables.]

40 Some alternatives [Figure: further alternative frameworks over variable blocks W and A.]

41 Can a Bayesian learner infer the correct domain-general properties of causality, using data from multiple systems, while simultaneously learning how each system works? [Figure: Systems 1 ... N, each generating observed samples.] (Goodman & Tenenbaum)

42 Yes. [Figure.]

43 Specific-system learning examples, illustrating the “blessing of abstraction.” [Figure.]

44 Summary
What we want to understand: How are different aspects of background knowledge represented, used to support causal learning, and themselves acquired?
- Abstract domain-specific frameworks or causal schemas
- Causal variables grounded in sensorimotor experience
- Domain-general causal understanding
What we need to answer these questions:
- Bayesian inference in probabilistic generative models
- Probabilities defined over structured representations: graphs, grammars, predicate logic
- Hierarchical probabilistic models, with inference at multiple levels of abstraction
- Flexible representations, growing in response to observed data

45 Insights Aspects of background knowledge which have been either taken for granted or presumed to be innate could in fact be learned from data by rational inferential means, together with specific causal relations. Domain-specific frameworks or schemas and domain-general properties of causality could be learned by similar means. Abstract causal knowledge can in some cases be learned more quickly and more easily than specific concrete causal relations (the “blessing of abstraction”).


47 Bayesian Occam’s Razor
For any model M, the probabilities assigned to all possible data sets d must sum to one: Σd P(D = d | M) = 1. This law of “conservation of belief” means a model that can predict many possible data sets must assign each of them low probability. [Figure: P(D = d | M) across data sets for a simple model M1 and a flexible model M2.] (MacKay, 2003; Ghahramani tutorials)
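A toy numeric illustration of this (invented numbers, not from MacKay): two models spread unit probability mass over the same ten possible data sets; the simple model concentrates, the flexible one spreads thin, so data falling where the simple model predicts favor it.

```python
# Bayesian Occam's razor in miniature: both models' predictions over the
# ten possible data sets sum to one ("conservation of belief"). The simple
# model bets on few data sets; the flexible one hedges across all of them.
simple   = [0.45, 0.45, 0.10, 0, 0, 0, 0, 0, 0, 0]
flexible = [0.10] * 10

assert abs(sum(simple) - 1.0) < 1e-9
assert abs(sum(flexible) - 1.0) < 1e-9

observed = 0   # index of the data set we actually saw
# If the data land where the simple model concentrated its mass, the
# simple model wins the marginal-likelihood comparison.
bayes_factor = simple[observed] / flexible[observed]
```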

48 Learning causation from contingencies
e.g., “Does injecting this chemical cause mice to express a certain gene?”
                 C present (c+)   C absent (c-)
E present (e+)         a                c
E absent (e-)          b                d
Subjects judge the extent to which C causes E (rated on a scale from 0 to 100).
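Two classical rational measures computable from this table are ΔP and Cheng's (1997) generative causal power (the Bayesian "causal support" model of Griffiths & Tenenbaum is a further alternative, not shown here). A small sketch with an invented data set:

```python
# Delta-P and causal power from the contingency counts a, b, c, d
# (a = e+ with c+, b = e- with c+, c = e+ with c-, d = e- with c-).
def delta_p(a, b, c, d):
    """Difference in P(e+) with vs. without the candidate cause."""
    return a / (a + b) - c / (c + d)

def causal_power(a, b, c, d):
    """Cheng's generative causal power: Delta-P rescaled by the headroom
    left above the background rate P(e+ | c-)."""
    p_background = c / (c + d)
    return delta_p(a, b, c, d) / (1 - p_background)

# Invented example: 6 of 8 injected mice express the gene, 2 of 8 controls.
dp = delta_p(6, 2, 2, 6)
cp = causal_power(6, 2, 2, 6)
```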

49 Learning more complex structures
Tenenbaum et al., Griffiths & Sobel: detectors with more than two objects and noisy mechanisms. Steyvers et al., Sobel & Kushnir: active learning with interventions (cf. Tong & Koller; Murphy). Lagnado & Sloman: learning from interventions on continuous dynamical systems.

50 Inferring hidden causes
The “stick ball” machine: conditions distinguish a common unobserved cause, independent unobserved causes, and one observed cause, each defined by different frequencies of trial types. (Kushnir, Schulz, Gopnik & Danks, 2003)

51 Bayesian learning with unknown number of hidden variables
(Griffiths et al 2006)

52 Model predictions for the three conditions (common unobserved cause; independent unobserved causes; one observed cause), with parameters a = 0.3, w = 0.8; correlation with human judgments r = 0.94.

53 Inferring latent causes in classical conditioning (Courville, Daw, Gordon, Touretzky 2003)
e.g., A = noise, X = tone, B = click, US = shock. Training trials: A–US, A–X, B–US. Test trials: X, X–B.

54 Summary: causal inference & learning
Human causal induction can be explained using core principles of graphical models:
- Bayesian inference (explaining away, screening off)
- Bayesian structure learning (Occam’s razor, model averaging)
- Active learning with interventions
- Identifying latent causes
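“Explaining away” falls out of Bayes’ rule in a two-cause network. A toy example with invented numbers: two independent causes A and B (each with prior 0.3) and an effect E that occurs iff A or B occurs; seeing E raises belief in A, but additionally seeing B drives belief in A back to its prior.

```python
from itertools import product

# Explaining away: P(A | E) > P(A), but P(A | E, B) drops back to the prior
# because B already accounts for E. E is a deterministic OR of A and B.
p = 0.3   # prior probability of each cause

def prob_A(observed):
    """P(A=1 | observed), where observed maps variable names to values."""
    num = den = 0.0
    for a, b in product([0, 1], repeat=2):
        vals = {"A": a, "B": b, "E": int(a or b)}
        if all(vals[k] == v for k, v in observed.items()):
            w = (p if a else 1 - p) * (p if b else 1 - p)
            den += w
            num += w * a
    return num / den

p_a_given_e = prob_A({"E": 1})              # raised above the prior
p_a_given_e_and_b = prob_A({"E": 1, "B": 1})   # B explains E away
```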

55 Summary: causal inference & learning
Crucial constraints on hypothesis spaces come from abstract prior knowledge, or “intuitive theories”. What are the variables? How can they be connected? How are their effects parameterized? Big open questions… How can these theories be described formally? How can these theories be learned?

56 Learning causal relations
[Diagram: abstract principles → structure → data.] (Griffiths, Tenenbaum, Kemp et al.)

57 “Universal Grammar” Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG). The generative chain: P(grammar | UG) gives a grammar; P(phrase structure | grammar) gives a phrase structure; P(utterance | phrase structure) gives an utterance; P(speech | utterance) gives the speech signal. (Jurafsky; Levy & Jaeger; Klein & Manning; Perfors et al., …)

58 We’ve fixed the CBN framework in order to learn Systems 1 ... N from their observed samples. [Diagram: CBN framework at the top; Systems 1 ... N below, each with observed samples.]

59 But perhaps we can learn the causal framework too? [Diagram: a causal framework over Systems 1 ... N, each with observed samples.]

60 So, yes, there is a Bayesian way to evaluate the likelihood of a framework... but what space of frameworks? [Diagram: a causal framework over Systems 1 ... N, each with observed samples.]

61 Block Structured Causal Frameworks
We can consider different kinds of block relations: “may connect,” “must connect,” “may connect once,” “breaks other arrows.” This gives us many frameworks: Fully Connected; DAG; CBN; Exogenous Actions; Soft Interventions. [Diagram: each framework expressed as relations among variable blocks V, W, and A.]

62 The approach
1. How does background knowledge guide causal learning from sparsely observed data? Bayesian inference: P(h | d) ∝ P(d | h) P(h).
2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories.
3. How can background knowledge itself be learned, perhaps together with specific causal relations? Hierarchical probabilistic models, with inference at multiple levels of abstraction. Flexible nonparametric models in which complexity grows with the data.

63 The approach
What we want to understand:
- How do these different aspects of background knowledge guide learning of causal relations from sparsely observed data?
- What form does this background knowledge take?
- How could this background knowledge itself be learned, together with or prior to learning causal relations?
What we need to understand these abilities:
- Bayesian inference in probabilistic generative models
- Probabilities defined over structured representations: graphs, grammars, predicate logic
- Hierarchical probabilistic models, with inference at multiple levels of abstraction
- Flexible representations, growing in response to observed data

