Theory-based causal induction Tom Griffiths Brown University Josh Tenenbaum MIT.


1 Theory-based causal induction Tom Griffiths Brown University Josh Tenenbaum MIT

2 Three kinds of causal induction

3 contingency data

4 “To what extent does C cause E?” (rate on a scale from 0 to 100)

                 E present (e+)   E absent (e-)
C present (c+)         a                b
C absent (c-)          c                d

5 Three kinds of causal induction contingency data physical systems

6 AB The stick-ball machine (Kushnir, Schulz, Gopnik, & Danks, 2003)

7 Three kinds of causal induction contingency data physical systems perceived causality

8 Michotte (1963)

9

10 Three kinds of causal induction
contingency data: bottom-up covariation information
physical systems: top-down mechanism knowledge
perceived causality: object physics module

11 Three kinds of causal induction
contingency data, physical systems, perceived causality
a continuum from less constrained (requiring more data) to more constrained (requiring less data)
all combine prior knowledge + statistical inference

12 prior knowledge + statistical inference

13 Theory-based causal induction
Theory generates a Hypothesis space of causal graphs over X, Y, Z; each hypothesis generates Data:
Case  X Y Z
1     1 0 1
2     0 1 1
3     1 1 1
4     0 0 0
...
Bayesian inference runs from the data back up to the hypotheses

14 An analogy to language
Theory generates Hypothesis space generates Data (as on the previous slide)
Grammar generates Parse trees generates Sentence (“The quick brown fox …”)

15 Outline contingency data physical systems perceived causality

16 Outline contingency data physical systems perceived causality

17 “To what extent does C cause E?” (rate on a scale from 0 to 100)

                 E present (e+)   E absent (e-)
C present (c+)         a                b
C absent (c-)          c                d

18 Buehner & Cheng (1997): “To what extent does the chemical cause gene expression?” (rate on a scale from 0 to 100)

                 E present (e+)   E absent (e-)
C present (c+)         6                2
C absent (c-)          4                4

(C = chemical injected, E = gene expressed)
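From such a table, the contingency (ΔP) and Cheng's causal power can be computed directly. A minimal Python sketch using the counts above (variable names are ours):

```python
# Contingency counts from the slide: a, b = E present/absent given C present;
# c, d = E present/absent given C absent.
a, b, c, d = 6, 2, 4, 4

p_e_c = a / (a + b)       # P(e+ | c+) = 0.75
p_e_notc = c / (c + d)    # P(e+ | c-) = 0.5

delta_p = p_e_c - p_e_notc                 # contingency (Delta-P)
causal_power = delta_p / (1 - p_e_notc)    # Cheng's (1997) causal power

print(delta_p, causal_power)  # 0.25 0.5
```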

19 Humans Buehner & Cheng (1997) Showed participants all combinations of P(e+|c+) and P(e+|c-) in increments of 0.25

20 Buehner & Cheng (1997): showed participants all combinations of P(e+|c+) and P(e+|c-) in increments of 0.25
Curious phenomenon, the “frequency illusion”: why do people’s judgments change when the cause does not change the probability of the effect?

21 Causal graphical models Framework for representing, reasoning, and learning about causality (also called Bayes nets) (Pearl, 2000; Spirtes, Glymour, & Scheines, 1993) Becoming widespread in psychology (Glymour, 2001; Gopnik et al., 2004; Lagnado & Sloman, 2002; Tenenbaum & Griffiths, 2001; Steyvers et al., 2003; Waldmann & Martignon, 1998)

22 Causal graphical models X Y Z Variables

23 Causal graphical models X Y Z Variables Structure

24 Causal graphical models X Y Z Variables Structure Conditional probabilities P(Z|X,Y) P(X) P(Y) Defines probability distribution over variables (for both observation, and intervention)
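The factorized joint distribution a causal graphical model defines can be made concrete. A minimal Python sketch for the X → Z ← Y structure on the slide (the probability tables are illustrative, not from the talk):

```python
# A three-variable causal graphical model X -> Z <- Y, with P(X), P(Y),
# and P(Z = 1 | X, Y) given as tables (values are illustrative).
p_x = {0: 0.7, 1: 0.3}
p_y = {0: 0.6, 1: 0.4}
p_z1_given_xy = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}

def joint(x, y, z):
    """P(X=x, Y=y, Z=z) factorizes along the graph structure."""
    pz1 = p_z1_given_xy[(x, y)]
    return p_x[x] * p_y[y] * (pz1 if z == 1 else 1 - pz1)

# The joint distribution sums to 1 over all assignments:
total = sum(joint(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1))
print(total)
```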

25 Causal graphical models Provide a basic framework for representing causal systems But… where is the prior knowledge?

26 Hamadeh et al. (2002), Toxicological Sciences
chemicals: Clofibrate, Wyeth 14,643, Gemfibrozil, Phenobarbital
genes: p450 2B1, Carnitine Palmitoyl Transferase 1

27 Hamadeh et al. (2002), Toxicological Sciences
chemicals: Clofibrate, Wyeth 14,643, Gemfibrozil, Phenobarbital, and an unknown chemical X
genes: p450 2B1, Carnitine Palmitoyl Transferase 1

28 Hamadeh et al. (2002), Toxicological Sciences
Chemical X produces the same pattern of gene expression (+ + +) as the peroxisome proliferators
chemicals: Clofibrate, Wyeth 14,643, Gemfibrozil, Phenobarbital; genes: p450 2B1, Carnitine Palmitoyl Transferase 1

29 Beyond causal graphical models Prior knowledge produces expectations about: –types of entities –plausible relations –functional form This cannot be captured by graphical models A theory consists of three interrelated components: a set of phenomena that are in its domain, the causal laws and other explanatory mechanisms in terms of which the phenomena are accounted for, and the concepts in terms of which the phenomena and explanatory apparatus are expressed. (Carey, 1985)

30 Causal theory Causal graphical model Observed data

31 Specific theories versus “framework theories” Wellman (1990; Gelman & Wellman, 1992) Causal theory Causal graphical model Observed data “Specific theories are detailed scientific formulations about a delimited set of phenomena.”

32 Specific theories versus “framework theories” Wellman (1990; Gelman & Wellman, 1992) Causal theory Causal graphical model Observed data “Framework theories outline the ontology and the basic causal systems for their specific theories, thereby defining a coherent form of reasoning about a particular set of phenomena.” “Specific theories are detailed scientific formulations about a delimited set of phenomena.”

33 Theory-based causal induction
Components of theory: Ontology, Plausible relations, Functional form
Generates: Variables, Structure, Conditional probabilities
A causal theory is a hypothesis space generator
Hypotheses are evaluated by Bayesian inference: P(h|data) ∝ P(data|h) P(h)

34 Theory
Ontology
–Types: Chemical, Gene, Mouse
–Predicates: Injected(Chemical,Mouse), Expressed(Gene,Mouse)
Variables: E = 1 if effect occurs (mouse expresses gene), else 0; C = 1 if cause occurs (mouse is injected), else 0; B = background cause

35 Theory
Plausible relations
–For any Chemical c and Gene g, with prior probability p: For all Mice m, Injected(c,m) → Expressed(g,m)
P(Graph 1) = p, P(Graph 0) = 1 - p
No hypotheses with E → C, B → C, C → B, ...

36 Theory
Ontology
–Types: Chemical, Gene, Mouse
–Predicates: Injected(Chemical,Mouse), Expressed(Gene,Mouse)
Plausible relations
–For any Chemical c and Gene g, with prior probability p: For all Mice m, Injected(c,m) → Expressed(g,m)
Functional form of causal relations

37 Functional form: Generic
Structures: Graph 1 = (B → E, C → E); Graph 0 = (B → E)
Parameterization:
  C B   Graph 1: P(E=1|C,B)   Graph 0: P(E=1|C,B)
  0 0        p00                   p0
  1 0        p10                   p0
  0 1        p01                   p1
  1 1        p11                   p1

38 Functional form: “Noisy-OR”
Structures: Graph 1 = (B → E, C → E); Graph 0 = (B → E)
w0, w1: strength parameters for B, C
Parameterization:
  C B   Graph 1: P(E=1|C,B)    Graph 0: P(E=1|C,B)
  0 0        0                      0
  1 0        w1                     0
  0 1        w0                     w0
  1 1   w1 + w0 - w1*w0             w0

39 Theory
Ontology
–Types: Chemical, Gene, Mouse
–Predicates: Injected(Chemical,Mouse), Expressed(Gene,Mouse)
Constraints on causal relations
–For any Chemical c and Gene g, with prior probability p: For all Mice m, Injected(c,m) → Expressed(g,m)
Functional form of causal relations
–Causes of Expressed(g,m) are independent probabilistic mechanisms, with causal strengths wi. An independent background cause is always present with strength w0.

40 Evaluating a causal relationship
P(Graph 1) = p, P(Graph 0) = 1 - p
P(Graph 1 | D) = P(D | Graph 1) P(Graph 1) / Σi P(D | Graph i) P(Graph i)
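This structure comparison can be approximated numerically. A hedged Python sketch: Graph 1 (noisy-OR over background and cause) versus Graph 0 (background only), with uniform priors over the strength parameters and a simple grid approximation, both of which are our assumptions:

```python
# Compare Graph 1 (B -> E, C -> E, noisy-OR) against Graph 0 (B -> E only)
# on contingency counts (a, b, c, d). Grid and uniform priors are assumptions.

def lik(a, b, c, d, p1, p0):
    # p1 = P(e+ | c+), p0 = P(e+ | c-)
    return p1**a * (1 - p1)**b * p0**c * (1 - p0)**d

grid = [(i + 0.5) / 20 for i in range(20)]  # bin midpoints on [0, 1]

def marginal_graph1(a, b, c, d):
    # Average the likelihood over noisy-OR strengths w0 (background), w1 (cause).
    return sum(lik(a, b, c, d, w0 + w1 - w0 * w1, w0)
               for w0 in grid for w1 in grid) / len(grid) ** 2

def marginal_graph0(a, b, c, d):
    # C has no effect: P(e+) = w0 whether or not C is present.
    return sum(lik(a, b, c, d, w0, w0) for w0 in grid) / len(grid)

a, b, c, d = 6, 2, 4, 4   # the Buehner & Cheng example counts
m1, m0 = marginal_graph1(a, b, c, d), marginal_graph0(a, b, c, d)
posterior_graph1 = m1 / (m1 + m0)   # with P(Graph 1) = P(Graph 0) = 1/2
print(posterior_graph1)
```

With these counts the posterior modestly favors Graph 1, matching the positive ΔP in the data.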

41 [Figure: human judgments compared with the Bayesian model, ΔP, and causal power (Cheng, 1997)]

42 Generativity is essential
Predictions result from “ceiling effect”
–ceiling effects only matter if you believe a cause increases the probability of an effect
–follows from use of Noisy-OR (after Cheng, 1997)
[Figure: Bayesian model predictions as a function of P(e+|c+) and P(e+|c-), in increments of 2/8]

43 Noisy-AND-NOT causes decrease probability of their effects Noisy-OR causes increase probability of their effects Generic probability differs across conditions Generativity is essential

44 [Figure: human judgments compared with Noisy-OR, Generic, and Noisy-AND-NOT models]

45 Manipulating functional form Noisy-AND-NOT causes decrease probability of their effects appropriate for preventive causes Noisy-OR causes increase probability of their effects appropriate for generative causes Generic probability differs across conditions appropriate for assessing differences
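The two directional parameterizations can be written out explicitly. A minimal Python sketch (the strength values are illustrative; w0 is the background strength, w1 the cause's strength):

```python
def noisy_or(w0, w1, c):
    # Generative: background and cause independently produce the effect.
    return 1 - (1 - w0) * (1 - w1) ** c

def noisy_and_not(w0, w1, c):
    # Preventive: the cause independently blocks the background's effect.
    return w0 * (1 - w1) ** c

# With the cause present (c = 1), noisy-OR raises P(E = 1) above the
# baseline w0, while noisy-AND-NOT lowers it:
w0, w1 = 0.5, 0.4
print(noisy_or(w0, w1, 0), noisy_or(w0, w1, 1))            # 0.5 0.7
print(noisy_and_not(w0, w1, 0), noisy_and_not(w0, w1, 1))  # 0.5 0.3
```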

46 Manipulating functional form
[Figure: model fits in the Generative, Preventive, and Difference conditions using Noisy-OR, Noisy-AND-NOT, and Generic parameterizations]

47 Causal induction from contingency data The simplest case of causal learning: a single cause-effect relationship and plentiful data Nonetheless, exhibits complex effects of prior knowledge (in the assumed functional form) These effects reflect appropriate causal theories

48 Outline contingency data physical systems perceived causality

49 AB The stick-ball machine (Kushnir, Schulz, Gopnik, & Danks, 2003)

50 Inferring hidden causal structure
Can people accurately infer hidden causal structure from small amounts of data?
Kushnir et al. (2003): four kinds of structure: A causes B, B causes A, common cause, separate causes

51 Inferring hidden causal structure (Kushnir, Schulz, Gopnik, & Danks, 2003)
Common unobserved cause condition (events shown 4x and 2x)
Candidate structures: A causes B, B causes A, common cause, separate causes

52 Inferring hidden causal structure (Kushnir, Schulz, Gopnik, & Danks, 2003)
Common unobserved cause condition (events shown 4x and 2x); independent unobserved causes condition (events shown 1x and 2x)
Candidate structures: A causes B, B causes A, common cause, separate causes

53 Inferring hidden causal structure (Kushnir, Schulz, Gopnik, & Danks, 2003)
Common unobserved cause condition (events shown 4x and 2x); independent unobserved causes condition (events shown 1x and 2x); one observed cause condition (events shown 2x and 4x)
Candidate structures: A causes B, B causes A, common cause, separate causes

54 [Figure: probability assigned to each structure (separate causes, common cause, A causes B, B causes A) in the common unobserved cause, independent unobserved causes, and one observed cause conditions]

55 Theory
Ontology
–Types: Ball, HiddenCause, Trial
–Predicates: Moves(Ball, Trial), Active(HiddenCause, Trial)
Plausible relations
–For any Ball a and Ball b (a ≠ b), with prior probability p: For all Trials t, Moves(a,t) → Moves(b,t)
–For some HiddenCause h and Ball b, with prior probability q: For all Trials t, Active(h,t) → Moves(b,t)
Functional form of causal relations
–Causes result in Moves(b,t) with probability ω. Otherwise, Moves(b,t) occurs with probability 0.
–Active(h,t) occurs with probability α.
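A simplified sketch of how such a theory scores the four candidate structures. Our assumptions: each ball has its own hidden cause firing with probability alpha per trial, causal links transmit deterministically, and the trial counts are illustrative rather than the exact Kushnir et al. stimuli:

```python
from itertools import product

alpha = 0.3  # probability a hidden cause activates on a given trial (assumed)

def trial_dist(structure):
    """P(outcome) for one trial; outcome in {'both','a_only','b_only','neither'}."""
    dist = {'both': 0.0, 'a_only': 0.0, 'b_only': 0.0, 'neither': 0.0}
    for hA, hB, hC in product([0, 1], repeat=3):  # hC is the shared hidden cause
        p = 1.0
        for h in (hA, hB, hC):
            p *= alpha if h else 1 - alpha
        a = bool(hA) or (structure == 'common' and bool(hC))
        b = bool(hB) or (structure == 'common' and bool(hC))
        if structure == 'a->b' and a:
            b = True
        if structure == 'b->a' and b:
            a = True
        key = 'both' if a and b else 'a_only' if a else 'b_only' if b else 'neither'
        dist[key] += p
    return dist

# Illustrative data: both balls move together 4x, each moves alone 2x.
data = {'both': 4, 'a_only': 2, 'b_only': 2, 'neither': 0}

def likelihood(structure):
    d, out = trial_dist(structure), 1.0
    for outcome, n in data.items():
        out *= d[outcome] ** n
    return out

for s in ('common', 'separate', 'a->b', 'b->a'):
    print(s, likelihood(s))
```

With deterministic links, the directed structures cannot produce a ball moving alone on the wrong side, so they score zero here and the common hidden cause wins.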

56

57 Hypotheses [figure: the candidate causal structures, each with its probability expressed as a product of the activation and strength parameters and their complements]

58 [Figure: probability assigned to each structure (separate causes, common cause, A causes B, B causes A) in the common unobserved cause, independent unobserved causes, and one observed cause conditions]

59 Other physical systems From blicket detectors… …to lemur colonies Oooh, it’s a blicket!

60 Outline contingency data physical systems perceived causality

61 Michotte (1963) Affected by… –timing of events –velocity of balls –proximity

62 Nitro X Affected by… –timing of events –velocity of balls –proximity (joint work with Liz Baraff)

63

64

65

66 Test trials Show explosions involving multiple cans –allows inferences about causal structure For each trial, choose one of: –chain reaction –spontaneous explosions –other

67

68

69

70 Theory
Ontology
–Types: Can, HiddenCause
–Predicates: ExplosionTime(Can), ActivationTime(HiddenCause)
Constraints on causal relations
–For any Can y and Can x, with prior probability 1: ExplosionTime(y) → ExplosionTime(x)
–For some HiddenCause c and Can x, with prior probability 1: ActivationTime(c) → ExplosionTime(x)
Functional form of causal relations
–Explosion at ActivationTime(c), and after an appropriate delay from ExplosionTime(y), with probability set by a rate parameter. Otherwise explosions occur with probability 0.
–Low probability of hidden causes activating.

71 Using the theory

72 What kind of explosive is this?

73

74 Parameters: spontaneity, volatility, rate

75 Using the theory What kind of explosive is this? What caused what?

76 Using the theory What kind of explosive is this? What caused what? What is the causal structure?

77

78 Testing a prediction of the theory Evidence for a hidden cause should increase with the number of simultaneous explosions Four groups of 16 participants saw displays using m = 2, 3, 4, or 6 cans For each trial, choose one of: –chain reaction –spontaneous explosions –other coded for reference to hidden cause

79 χ²(3) = 11.36, p < .01
[Figure: probability of identifying a hidden cause as a function of the number of canisters]
Gradual transition from few to most participants identifying the hidden cause

80 Further predictions Explains chain reaction inferences Attribution of causality should be sensitive to interaction between time and distance Simultaneous explosions that occur sooner provide stronger evidence for common cause

81 Three kinds of causal induction
contingency data, physical systems, perceived causality
a continuum from less constrained (requiring more data) to more constrained (requiring less data)
all combine prior knowledge + statistical inference

82 Combining knowledge and statistics How do people... –identify causal relationships from small samples? –learn hidden causal structure with ease? –reason about complex dynamic causal systems? Constraints from knowledge + powerful statistics Key ideas: –prior knowledge expressed in causal theory –theory generates hypothesis space for inference

83 Further questions Are there unifying principles across theories?

84 Functional form
Stick-balls: Causes result in Moves(b,t) with probability ω. Otherwise, Moves(b,t) occurs with probability 0.
Nitro X: Explosion at ActivationTime(c), and after an appropriate delay from ExplosionTime(y), with probability set by a rate parameter. Otherwise explosions occur with probability 0.
1. Each force acting on a system has an opportunity to change its state
2. Without external influence a system will not change its state

85 Further questions Are there unifying principles across theories? How are theories learned?

86 Learning causal theories
Theory (Ontology, Plausible relations, Functional form) generates Hypothesis space; hypotheses generate Data; Bayesian inference runs from the data back to the hypotheses

87 Learning causal theories
Theory (Ontology, Plausible relations, Functional form) generates Hypothesis space; hypotheses generate Data

88 Learning causal theories
With many datasets: the theory generates the hypothesis space, hypotheses generate each dataset, and Bayesian inference runs from all of the data back to the theory

89 Further questions Are there unifying principles across theories? How are theories learned? What is an appropriate prior over theories?

90

91 Causal induction with rates Different functional form results in models that apply to different kinds of data Rate: number of times effect occurs in time interval, in presence and absence of cause Does the electric field cause the mineral to emit particles?

92 Theory
Ontology
–Types: Mineral, Field, Time
–Predicates: Emitted(Mineral,Time), Active(Field,Time)
Plausible relations
–For any Mineral m and Field f, with prior probability p: For all Times t, Active(f,t) → Emitted(m,t)
Functional form of causal relations
–Causes of Emitted(m,t) are independent probabilistic mechanisms, with causal strengths wi. An independent background cause is always present with strength w0.
–Implies number of emissions is a Poisson process, with rate at time t given by w0 + Active(f,t)·w1.
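Inference under this rate theory can be sketched as a Poisson model comparison. The emission counts, rate grids, and uniform priors below are our assumptions, not values from the talk:

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

n_off, n_on = 2, 12   # particles emitted in unit intervals, field off vs. on
rates = [i / 2 + 0.25 for i in range(40)]   # candidate rates 0.25 ... 19.75

# H0: the field has no effect; a single rate w0 explains both intervals.
m0 = sum(poisson_pmf(n_off, w0) * poisson_pmf(n_on, w0)
         for w0 in rates) / len(rates)

# H1: the field adds w1 to the baseline, per the rate w0 + Active(f,t)*w1.
m1 = sum(poisson_pmf(n_off, w0) * poisson_pmf(n_on, w0 + w1)
         for w0 in rates for w1 in rates) / len(rates) ** 2

bayes_factor = m1 / m0   # > 1 favors a causal effect of the field
print(bayes_factor)
```

With a large jump in the emission count, the Bayes factor comes out well above 1, favoring a causal effect of the field.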

93 Causal induction with rates (N = 150)
[Figure: human judgments compared with the Bayesian model, ΔR, and causal power]

94

95

96

97 Learning causal theories
T1: bacteria die at random
T2: bacteria die at random, or in waves
P(wave|T2) > P(wave|T1)
Having inferred the existence of a new force, need to find a mechanism...

98

99 Lemur colonies A researcher in Madagascar is studying the effects of environmental resources on the location of lemur colonies. She has studied twelve different parts of Madagascar, and is trying to establish which areas show evidence of being affected by the distribution of resources in order to decide where she should focus her research.

100

101 Human data
[Figure: judgments of whether a resource is present as a function of changes in spread, location, ratio, and number, relative to a uniform baseline]

102 Theory
Ontology
–Types: Colony, Resource
–Predicates: Location(Colony), Location(Resource)
Plausible relations
–For any Colony c and Resource r, with probability p: Location(r) → Location(c)
Functional form of causal relations
–Without a hidden cause, Location(c) is uniform
–With a hidden cause r, Location(c) is Gaussian with mean Location(r) and covariance matrix Σ
–Location(r) is uniform
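A one-dimensional sketch of the resource inference. Colony locations on [0, 1] are either uniform (no resource) or Gaussian around an unknown resource location; the spread sigma, the example locations, and the grid over the resource location are all our illustrative assumptions:

```python
import math

sigma = 0.05  # assumed spread of colonies around a resource

def gaussian_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def marginal_resource(locations, n_grid=200):
    # Average the Gaussian likelihood over a uniform prior on the resource location.
    total = 0.0
    for i in range(n_grid):
        mu = (i + 0.5) / n_grid
        lik = 1.0
        for x in locations:
            lik *= gaussian_pdf(x, mu, sigma)
        total += lik
    return total / n_grid

def marginal_uniform(locations):
    return 1.0  # uniform density 1 on [0, 1] for every colony

clustered = [0.48, 0.50, 0.52, 0.49, 0.51]
spread = [0.1, 0.3, 0.5, 0.7, 0.9]

print(marginal_resource(clustered) > marginal_uniform(clustered))  # True
print(marginal_resource(spread) < marginal_uniform(spread))        # True
```

Tightly clustered colonies make the resource hypothesis far more likely than the uniform one; widely spread colonies do the opposite.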

103 Is there a resource?
No: uniform
Yes: uniform + regularity
Sum over all structures; sum over all regularities
[Figure: colony locations (x’s) with a candidate resource location C]

104 Human data vs. Bayesian model
[Figure: judgments and model predictions as a function of changes in spread, location, ratio, and number, relative to a uniform baseline]

105

106 Schulz & Gopnik (in press)
A B C   E
1 0 0   0
0 1 0   0
0 0 1   1
1 1 1   1

107 Biology (“Ahchoo!”) Schulz & Gopnik (in press)
A B C   E
1 0 0   0
0 1 0   0
0 0 1   1
1 1 1   1

108 Biology (“Ahchoo!”) and Psychology (“Eek!”)
A B C   E
1 0 0   0
0 1 0   0
0 0 1   1
1 1 1   1

109 Common functional form
A theory of sneezing
–a flower is a cause with some prior probability
–no sneezing without a cause
–causes each produce sneezing with some fixed probability
A theory of fear
–an animal is a cause with some prior probability
–no fear without a cause
–a cause produces fear with some fixed probability

110 Children: choose just C, never just A or just B
A B C   E
1 0 0   0
0 1 0   0
0 0 1   1
1 1 1   1

111 Common functional form
A B C   E
1 0 0   0
0 1 0   0
0 0 1   1
1 1 1   1
Children: choose just C, never just A or just B
[Figure: candidate causal structures with their probabilities]

112 Common functional form
A B C   E
1 0 0   0
0 1 0   0
0 0 1   1
1 1 1   1
Children: choose just C, never just A or just B
Bayes: just C is preferred, never just A or just B
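The Bayesian account of "just C" can be enumerated directly. A minimal sketch under the sneezing theory above, with our own parameter names: nu is the prior probability an entity is a cause, eps the probability a present cause produces the effect (both values illustrative):

```python
from itertools import product

nu, eps = 0.5, 0.9   # assumed prior probability of being a cause; causal strength

# Trials from the slide: (A present, B present, C present), effect observed
trials = [((1, 0, 0), 0), ((0, 1, 0), 0), ((0, 0, 1), 1), ((1, 1, 1), 1)]

def likelihood(causes):
    """P(data | which of A, B, C are causes); noisy-OR, no background cause."""
    lik = 1.0
    for present, effect in trials:
        p_none = 1.0   # probability that no present cause produces the effect
        for is_cause, here in zip(causes, present):
            if is_cause and here:
                p_none *= 1 - eps
        p_effect = 1 - p_none
        lik *= p_effect if effect else 1 - p_effect
    return lik

posterior = {}
for causes in product([0, 1], repeat=3):   # which of (A, B, C) are causes
    prior = 1.0
    for c in causes:
        prior *= nu if c else 1 - nu
    posterior[causes] = prior * likelihood(causes)

z = sum(posterior.values())
posterior = {k: v / z for k, v in posterior.items()}
print(posterior[(0, 0, 1)], posterior[(1, 0, 0)])
```

"Just A" and "just B" get zero posterior (they cannot explain the C-alone trial), while "just C" dominates, matching the children's choices.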

113 Inter-domain causation Physical: noise-making machine –A & B are magnetic buttons, C is talking Psychological: confederate giggling –A & B are silly faces, C is a switch Procedure: –baseline: which could be causes? –trials: same contingencies as Experiment 3 –test: which are causes? (Schulz & Gopnik, in press, Experiment 4)

114 Inter-domain causation
A theory with inter-domain causes
–intra-domain entities are causes with probability p1
–inter-domain entities are causes with probability p0
–no effect occurs without a cause
–causes produce effects with some fixed probability
Lower prior probability for inter-domain causes (i.e. p0 much lower than p1)

115 A problem with priors? If lack of mechanism results in lower prior probability, shouldn’t inferences change? Intra-domain causes (Experiment 3): –biological: 78% took C –psychological: 67% took C Inter-domain causes (Experiment 4): –physics: 75% took C –psychological: 81% took C

116
A B C   E
1 0 0   0
0 1 0   0
0 0 1   1
1 1 1   1
Priors over candidate structures (A, B intra-domain with prior p1; C inter-domain with prior p0):
(1-p0)(1-p1)², p0(1-p1)², p0·p1(1-p1), p0·p1², (1-p0)(1-p1)p1, (1-p0)p1²

117
A B C   E
1 0 0   0
0 1 0   0
0 0 1   1
1 1 1   1
Structures including C: p0(1-p1)², p0·p1(1-p1), p0·p1²

118
A B C   E
1 0 0   0
0 1 0   0
0 0 1   1
1 1 1   1
Structures including C: p0(1-p1)², p0·p1(1-p1), p0·p1²

119 A direct test of inter-domain priors Ambiguous causes: –A and C together produce E –B and C together produce E –A and B and C together produce E For C intra-domain, choose C (Sobel et al., in press) For C inter-domain, should choose A and B

120

121 The plausibility matrix Grounded predicates Plausibility of relation Identifies plausible causal graphs

122 Entities: c1, c2, c3, g1, g2, g3; Predicates: Injected, Expressed
M = plausibility matrix (rows: potential causes; columns: potential effects):

              Expressed(g1)  Expressed(g2)  Expressed(g3)
Injected(c1)       1              1              1
Injected(c2)       1              1              1
Injected(c3)       1              1              1

(all other entries of M are 0)
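Generating this matrix from the theory is mechanical. A minimal Python sketch (the entity names follow the slide; the layout and helper names are ours):

```python
chemicals = ['c1', 'c2', 'c3']
genes = ['g1', 'g2', 'g3']
predicates = [f'Injected({c})' for c in chemicals] + \
             [f'Expressed({g})' for g in genes]

def plausible(cause, effect):
    # The theory licenses only Injected(chemical) -> Expressed(gene) relations.
    return cause.startswith('Injected') and effect.startswith('Expressed')

# Rows index potential causes, columns potential effects.
M = [[1 if plausible(p, q) else 0 for q in predicates] for p in predicates]

for row in M:
    print(row)
```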

123 The Chomsky hierarchy (Chomsky, 1956)
Languages                      Machines
Type 0 (computable)            Turing machine
Type 1 (context sensitive)     Bounded TM
Type 2 (context free)          Push-down automaton
Type 3 (regular)               Finite state automaton
Languages in each class are a strict subset of those in higher classes

124 Grammaticality and plausibility
Grammar: indicates admissibility of (infinitely many) sentences generated from terminals
Theory: indicates plausibility of (infinitely many) relations generated from grounded predicates

125

