Download presentation
Presentation is loading. Please wait.
Published byJason Hopkins Modified over 6 years ago
1
Workshop Files
2
Center for Causal Discovery: Summer Short Course/Datathon - 2018
June 11-15, 2018 Carnegie Mellon University
3
Goals Basic working knowledge of graphical causal models
Basic working knowledge of Tetrad 6 Basic understanding of search algorithms “Fully started” on using CCD algorithms/tools on real data, preferably your own. Grow the community of researchers, users, and students interested in causal discovery in biomedical research
4
Monday: Basics of Graphical Causal Models, Tetrad
Morning: 9 AM – Noon, Baker Hall A51 : Giant Eagle Auditorium Introduction – Greg Cooper Representing/Modeling Causal Systems Causal Graphs/Interventions Parametric Models Instantiated Models Afternoon: 1:30 PM – 4 PM, Baker Hall A51 : Giant Eagle Auditorium Estimation, Inference, and Model fit Real data examples: Charitable Giving College Plans Dinner: On your own
5
Tuesday: Basics of Search, Break-out Sessions
Morning: 9 AM – Noon, Baker Hall A51 : Giant Eagle Auditorium Searching for Causal Systems Model Equivalence Basic Search Algorithms Hands-on Work Afternoon: 1:30 PM – 4 PM, Baker Hall A51 breakout rooms CCD Software and Bridges Supercomputer (Jeremy Espino) Break-out Sessions 1: Brain/fMRI Cancer Protein signaling: single cell cytometry 4 – 6 PM - Reception: Hors d’oeuvres etc., Lobby outside A51 Dinner: On your own
6
Wednesday: Latent Variables, etc., Break-out Sessions
Morning: 9 AM – Noon, Baker Hall A51 : Giant Eagle Auditorium Latent Variable Model Search Mixed Data (Continuous and Discrete) Afternoon: 1:30 PM – 3:30 PM, Baker Hall A51 breakout rooms Break-out Sessions – 2 Dinner: On your own
7
Thursday: Research Area Overviews, Datathon
Morning: 9 AM – Noon, Baker Hall A51 : Giant Eagle Auditorium fMRI – Brain Cancer: Genomic Drivers Protein Signaling Pathways: Single Cell Data Consultation sign-up Short Course : Ends at Noon Datathon – Noon – 8 PM, Baker Hall A51: Giant Eagle Auditorium
8
Friday: DataThon Morning: 9 AM – Noon, Baker Hall A51 : Giant Eagle Auditorium Breakfast with Q&A (9 -10 AM) Data Hacking (10-12) Lunch, 12-1: on your own Afternoon: 1:00 PM – 4 PM, Giant Eagle Auditorium: Datathon 1:00 – 3:00 Data Hacking & Visualization 3:00 Participant Presentations
9
Questions?
10
Overview of CCD Greg Cooper, MD, PhD
11
Causal Discovery
12
25 Years of Advances 1992 2017: tremendous progress in:
Mathematically representing causal networks Discovery algorithms for finding causal networks from a combination of data and knowledge. These methods apply to biomedical data. Statistics, Computer Science, Philosophy, etc. America, Europe, Japan
13
Modern Theory of Statistical Causal Models
Graphical Models Intervention & Manipulation Modern Theory of Statistical Causal Models Potential Outcome Models Testable Constraints (e.g., Independence) Counterfactuals 1
14
Causal Inference Requires More than Probability
Prediction from Observation ≠ Prediction from Intervention P(Lung Cancer 1960 = yes | Tar-stained fingers = yes) ≠ P(Lung Cancer 1960 = yes | Tar-stained fingers do(1950 = yes)) In general: P(Y=y | X=x, Z=z) ≠ P(Y=y | do(X=x), Z=z)) Causal Prediction vs. Statistical Prediction: Non-experimental data (observational study) Background Knowledge P(Y,X,Z) P(Y=y | X=x, Z=z) Causal Structure P(Y=y | do(X=x, Z=z)
15
P(Stained Teeth = yes) =
.5
16
P(Smoker= yes) = .5
17
P(Lung Cancer = yes) = .5
18
Conditioning on Stained Teeth = no
19
P(Lung Cancer = yes | Stained Teeth = no) =
.25
20
Lung Cancer _||_ Stained Teeth
21
Manipulating White Teeth
22
Pm(Lung cancer = yes | do(Stained Teeth= no)) =
.5
23
Lung Cancer _||_m Stained Teeth
P(LC = yes) = = Pm(LC= yes | do(Stained Teeth = no)) = .5 Lung Cancer _||_m Stained Teeth
25
Causal Estimation vs. Causal Search
Estimation (Potential Outcomes) Causal Question: Effect of Zidovudine on Survival among HIV-positive men (Hernan, et al., 2000) Problem: confounders (CD4 lymphocyte count) vary over time, and they are dependent on previous treatment with Zidovudine Estimation method discussed: marginal structural models Assumptions: Treatment measured reliably Measured covariates sufficient to capture major sources of confounding Model of treatment given the past is accurate Output: Effect estimate with confidence intervals Fundamental Problem: estimation/inference is conditional on the model 1
26
Causal Estimation vs. Causal Search
Search (Causal Graphical Models) Causal Question: which genes regulate flowering in Arbidopsis Problem: over 25,000 potential genes. Method: graphical model search Assumptions: RNA microarray measurement reasonable proxy for gene expression Causal Markov assumption Etc. Output: Suggestions for follow-up experiments Fundamental Problem: model space grows super-exponentially with the number of variables 1
27
Number of variables (nodes)
The Number of Causal Model Structures as a Function of the Number of Measured Variables* Number of variables (nodes) Number of Causal Model Structures 1 2 3 25 4 543 5 29,281 6 3,781,503 7 1.1 x 109 8 7.8 x 1011 9 1.2 x 1015 10 4.2 x 1018 * Assumes there are no latent variables and no directed cycles.
28
Causal Search Causal Search:
Find/compute all the causal models that are indistinguishable given background knowledge and data Represent features common to all such models Multiple Regression is often the wrong tool for Causal Search: Example: Foreign Investment & Democracy
29
Foreign Investment Does Foreign Investment in 3rd World Countries inhibit Democracy? Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49, N = 72 PO degree of political exclusivity CV lack of civil liberties EN energy consumption per capita (economic development) FI level of foreign investment 1
30
Foreign Investment Correlations po fi en cv po 1.0 fi -.175 1.0
31
Case Study: Foreign Investment
Regression Results po = *fi *en *cv SE (.058) (.059) (.060) t P Interpretation: foreign investment increases political repression 1
32
Case Study: Foreign Investment Alternative Models
There is no model with testable constraints (df > 0) that is not rejected by the data, in which FI has a positive effect on PO.
33
Outline Representing/Modeling Causal Systems Causal Graphs
Parametric Models Bayes Nets Structural Equation Models Generalized SEMs
34
Causal Graphs Causal Graph G = {V,E}
Each edge X Y represents a direct causal claim: X is a direct cause of Y relative to V Years of Education Income 1. don’t define causality - but will introduce axioms to connect probability to causality 2. many fields proceed without agreement on definition - probability, “force” in mechanics, interpretation of quantum mechanics, etc. 3. a number of different kinds of graphs represent probability distributions and independence - advantage of directed graphs is also represents causal relations 4. will introduce several extensions Income Skills and Knowledge Years of Education
35
Omitteed Common Causes
Causal Graphs Omitteed Causes Not Cause Complete Income Skills and Knowledge Years of Education Common Cause Complete Omitteed Common Causes 1. don’t define causality - but will introduce axioms to connect probability to causality 2. many fields proceed without agreement on definition - probability, “force” in mechanics, interpretation of quantum mechanics, etc. 3. a number of different kinds of graphs represent probability distributions and independence - advantage of directed graphs is also represents causal relations 4. will introduce several extensions Income Skills and Knowledge Years of Education
36
Tetrad: Complete Causal Modeling Tool
37
Tetrad Demo & Hands-On Smoking Stained_Teeth LC
Build and Save two acyclic causal graphs: Build the Smoking graph picture above Build your own graph with 4 variables Build 2 causal graphs - one for smoking, yf, lc
38
Modeling Ideal Interventions
Interventions on the Effect Post Pre-experimental System Room Temperature Sweaters On
39
Modeling Ideal Interventions
Interventions on the Cause Post Pre-experimental System Sweaters On Room Temperature
40
Interventions & Causal Graphs
Model an ideal intervention by adding an “intervention” variable outside the original system as a direct cause of its target. Pre-intervention graph Intervene on Income “Hard” Intervention Fat Hand - intervention - cholesterol drug -- arythmia “Soft” Intervention
41
Interventions & Causal Graphs
X5 X2 X1 Pre-intervention Graph X4 X3 X6 Intervention: hard intervention on both X1, X4 Soft intervention on X3 X1 X2 X3 X4 X6 X5 I S Fat Hand - intervention - cholesterol drug -- arythmia Post-Intervention Graph?
42
Interventions & Causal Graphs
X5 X2 X1 Pre-intervention Graph X4 X3 X6 Intervention: hard intervention on both X1, X4 Soft intervention on X3 X1 X2 X3 X4 X6 X5 I S Fat Hand - intervention - cholesterol drug -- arythmia Post-Intervention Graph?
43
Interventions & Causal Graphs
X5 X2 X1 Pre-intervention Graph X4 X3 X6 Intervention: hard intervention on X3 Soft interventions on X6, X4 I S X1 X2 X3 X4 X6 X5 Fat Hand - intervention - cholesterol drug -- arythmia Post-Intervention Graph?
44
Interventions & Causal Graphs
Smoking Pre-intervention Graph Stained_Teeth LC Trek between Stained_Teeth and LC In Pre-Intervention Graph? Treks Association Yes Stained_Teeth _||_ LC Smoking Paint Teeth White Stained_Teeth LC Fat Hand - intervention - cholesterol drug -- arythmia Trek between Stained_Teeth and LC In Post-Intervention Graph? Treks Association No Stained_Teeth _||_m LC
45
Parametric Models
46
Instantiated Models
47
Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S,YF, L) = P(S) P(YF | S) P(LC | S)
48
Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S) P(YF | S) P(LC | S) = f() All variables binary [0,1]: = {1, 2,3,4,5, } P(S = 0) = 1 P(S = 1) = 1 - 1 P(YF = 0 | S = 0) = 2 P(LC = 0 | S = 0) = 4 P(YF = 1 | S = 0) = 1- 2 P(LC = 1 | S = 0) = 1- 4 P(YF = 0 | S = 1) = 3 P(LC = 0 | S = 1) = 5 P(YF = 1 | S = 1) = 1- 3 P(LC = 1 | S = 1) = 1- 5
49
Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S,YF, LC) = P(S) P(YF | S) P(LC | S) = f() All variables binary [0,1]: = {1, 2,3,4,5, } P(S,YF, LC) = P(S) P(YF | S) P(LC | YF, S) = f() All variables binary [0,1]: = {1, 2,3,4,5, 6,7, }
50
Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S,YF, L) = P(S) P(YF | S) P(LC | S) P(S = 0) = .7 P(S = 1) = .3 P(YF = 0 | S = 0) = .99 P(LC = 0 | S = 0) = .95 P(YF = 1 | S = 0) = .01 P(LC = 1 | S = 0) = .05 P(YF = 0 | S = 1) = .20 P(LC = 0 | S = 1) = .80 P(YF = 1 | S = 1) = .80 P(LC = 1 | S = 1) = .20 P(S=1,YF=1, LC=1) = ?
51
Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S,YF, L) = P(S) P(YF | S) P(LC | S) P(S = 0) = .7 P(S = 1) = .3 P(YF = 0 | S = 0) = .99 P(LC = 0 | S = 0) = .95 P(YF = 1 | S = 0) = .01 P(LC = 1 | S = 0) = .05 P(YF = 0 | S = 1) = .20 P(LC = 0 | S = 1) = .80 P(YF = 1 | S = 1) = .80 P(LC = 1 | S = 1) = .20 P(S=1,YF=1, LC=1) = P(S=1) P(YF=1 | S=1) P(LC = 1 | S=1) P(S=1,YF=1, LC=1) = * * .20 = .048
52
Calculating the effect of a hard interventions
P(YF,S,L) = P(S) P(YF|S) P(L|S) Pm (YF,S,L) = P(S) P(L|S) P(YF| I)
53
Calculating the effect of a hard intervention
P(S,YF, L) = P(S) P(YF | S) P(LC | S) P(S=1,YF=1, LC=1) = * * = .048 P(YF =1 | I ) = .5 Pm (S=1,do(YF=1), LC=1) = ? Pm (S=1, do(YF=1), LC=1) = P(S) P(YF | I) P(LC | S) Pm (S=1, do(YF=1), LC=1) = .3 * * = .03
54
Calculating the effect of a soft intervention
P(YF,S,L) = P(S) P(YF|S) P(L|S) Pm (YF,S,L) = P(S) P(L|S) P(YF| S, Soft)
55
Tetrad Demo & Hands-On Use the DAG you built for Smoking, YF, and LC
Define the Bayes PM (# and values of categories for each variable) Attach a Bayes IM to the Bayes PM Fill in the Conditional Probability Tables (make the values plausible). Build 2 causal graphs - one for smoking, yf, lc
56
Updating
57
Tetrad Demo Use the IM just built of Smoking, YF, LC
Update LC on evidence: YF = yes Update LC on evidence: do(YF = yes)
58
Structural Equation Models
Causal Graph Structural Equations For each variable X V, an assignment equation: X := fX(immediate-causes(X), eX) 1. example of recursive structural equation model without correlated errors 2. can show that assumption of independence of errors guarantees correctness of probabilitic interpretation 3. this represents both probability and causality Exogenous Distribution: Joint distribution over the exogenous vars : P(e)
59
Linear Structural Equation Models
Path diagram Causal Graph Equations: Education := Education Income :=Educationincome Longevity :=EducationLongevity Exogenous Distribution: P(ed, Income,Income ) - i≠j ei ej (pairwise independence) - no variance is zero 1. example of recursive structural equation model without correlated errors 2. can show that assumption of independence of errors guarantees correctness of probabilitic interpretation 3. this represents both probability and causality Structural Equation Model: V = BV + E E.g. (ed, Income,Income ) ~N(0,2) 2 diagonal, - no variance is zero
60
Hands On Parameterize a DAG as a SEM. Add an IM
Make all coefficients positive Examine “Implied Covariance Matrix” and “Implied Correlation Matrix” Attach another IM Standardized SEM
61
Lunch
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.