Tetrad 1)Main website: 2)Download:

Presentation transcript:

Tetrad
1) Main website:
2) Download:
   a) JNLP version: Tetrad 5.3.0
   b) Jar file: Tetrad (6/10/2016 Version 1)
3) Data files:

Center for Causal Discovery: Summer Short Course/Datathon
June 13-18, 2015, Carnegie Mellon University

Goals
1) Basic working knowledge of graphical causal models
2) Basic working knowledge of Tetrad V
3) Basic understanding of search algorithms
4) “Fully started” on using CCD algorithms/tools on real data, preferably your own
5) Provide us with useful feedback on:
   1) The intro to graphical models/search with Tetrad segments
   2) The breakout sessions
   3) Follow up after the workshop: integrating CCD tools into your own research
6) Form community of researchers, users, and students interested in causal discovery in biomedical research

Monday: Basics of Graphical Causal Models, Tetrad
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
1. Introduction
2. Representing/Modeling Causal Systems
   a) Causal Graphs/Interventions
   b) Parametric Models
   c) Instantiated Models
Afternoon: 1:30 PM – 4 PM, Baker Hall A51: Giant Eagle Auditorium
1. Estimation, Inference, and Model fit
2. Case Study: Charitable Giving
Dinner: On your own

Tuesday: Basics of Search, Break-out Sessions
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
1. D-separation & Model Equivalence
2. Searching for Causal Systems
Afternoon: 1:30 PM – 4 PM, Baker Hall A51 → breakout rooms
1. Break-out Session 1:
   A. Brain/fMRI
   B. Cancer
   C. Lung Disease
Dinner: On your own

Wednesday: Latent Variables, etc., Break-out Sessions
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
1. Latent Variable Model Search
2. Measurement
Afternoon: 1:30 PM – 3:30 PM, Baker Hall A51 → breakout rooms
1. Break-out Session 2
Evening: O’Hara Student Center (Pitt), 2nd Floor Ballroom
1. 5:30 – 6:15 Poster Session
2. 6:15 – 8:00 Dinner (keynote speaker: Greg Cooper)

Thursday: Research Area Overviews, Break-out Sessions
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
1. fMRI – Brain
2. Cancer: Genomic Drivers
3. Lung Disease Pathways
4. Genetic Regulatory Network Search
Afternoon: 1:30 PM – 4 PM, Baker Hall A51 → breakout rooms
1. Break-out Sessions 3
Dinner: On your own

Friday: Wrap-up, DataThon
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
1. Break-out Group Reports
2. General Debrief Q&A
3. Evaluations
Afternoon: 1:30 PM – 4 PM, Giant Eagle Auditorium: Datathon
   1:00 Intro
   1:30 Team Introductions
   2:00 Data Prep
   3:00 Supercomputing Resources
   3:30 – 6:00 Data Analysis
Dinner: 6-8 PM: Pizza

Saturday: DataThon
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
1. 9 AM: Breakfast and Q&A
2. 10 AM – Noon: Data hacking
Noon – 1 PM: Lunch: on your own
Afternoon: 1:00 PM – 3 PM, Giant Eagle Auditorium
   1:00 – 3:00: Data Hacking
   3:00: Participant Presentations

Questions?

Causation and Statistics
Francis Bacon, Galileo Galilei, Charles Spearman, Udny Yule, Sewall Wright, Sir Ronald A. Fisher, Jerzy Neyman, …
Graphical Causal Models: Judea Pearl
Potential Outcomes: Don Rubin, Jamie Robins

Modern Theory of Statistical Causal Models
   Counterfactuals
   Testable Constraints (e.g., Independence)
   Graphical Models
   Intervention & Manipulation
   Potential Outcome Models

Causal Inference Requires More than Probability
Prediction from Observation ≠ Prediction from Intervention
In general: $P(Y=y \mid X=x, Z=z) \neq P(Y=y \mid X_{set}=x, Z=z)$
Example: $P(\text{Lung Cancer 1960} = y \mid \text{Tar-stained fingers 1950} = no) \neq P(\text{Lung Cancer 1960} = y \mid \text{Tar-stained fingers 1950}_{set} = no)$
Causal Prediction vs. Statistical Prediction:
   Statistical prediction: non-experimental data (observational study) → $P(Y, X, Z)$ → $P(Y=y \mid X=x, Z=z)$
   Causal prediction: non-experimental data plus background knowledge → causal structure → $P(Y=y \mid X_{set}=x, Z=z)$

Estimation vs. Search: Estimation (Potential Outcomes)
Causal Question: effect of Zidovudine on survival among HIV-positive men (Hernan, et al., 2000)
Problem: confounders (CD4 lymphocyte count) vary over time, and they are dependent on previous treatment with Zidovudine
Estimation method discussed: marginal structural models
Assumptions:
   Treatment measured reliably
   Measured covariates sufficient to capture major sources of confounding
   Model of treatment given the past is accurate
Output: effect estimate with confidence intervals
Fundamental Problem: estimation/inference is conditional on the model

Estimation vs. Search: Search (Causal Graphical Models)
Causal Question: which genes regulate flowering in Arabidopsis?
Problem: over 25,000 potential genes
Method: graphical model search
Assumptions:
   RNA microarray measurement is a reasonable proxy for gene expression
   Causal Markov assumption
   Etc.
Output: suggestions for follow-up experiments
Fundamental Problem: model space grows super-exponentially with the number of variables
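To give a feel for "super-exponentially", the sketch below counts labeled directed acyclic graphs using Robinson's recurrence, a standard combinatorial result that is not part of the slides. The function name `count_dags` is made up for this illustration.

```python
from math import comb

def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:
    a(m) = sum_{k=1..m} (-1)^(k+1) * C(m, k) * 2^(k*(m-k)) * a(m-k), a(0) = 1."""
    counts = [1]  # a(0) = 1: the empty graph
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
        counts.append(total)
    return counts[n]

for n in range(1, 8):
    print(n, count_dags(n))
```

Three variables already admit 25 DAGs, four admit 543, and five admit 29,281, so exhaustive enumeration is out of reach long before 25,000 genes.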

Causal Search
1. Find/compute all the causal models that are indistinguishable given background knowledge and data
2. Represent features common to all such models
Multiple Regression is often the wrong tool for Causal Search. Example: Foreign Investment & Democracy

Foreign Investment
Does Foreign Investment in 3rd World Countries inhibit Democracy?
Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49.
N = 72
PO: degree of political exclusivity
CV: lack of civil liberties
EN: energy consumption per capita (economic development)
FI: level of foreign investment

Foreign Investment: Correlations
[Table: pairwise correlations among po, fi, en, cv; only the diagonal entry po/po = 1.0 survives in this transcript]

Case Study: Foreign Investment
Regression Results
po = .227*fi - .176*en + .880*cv
SE: (.058) (.059) (.060)
t and P: [values not preserved in this transcript]
Interpretation: foreign investment increases political repression
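The t values did not survive in the transcript. As a rough check, and assuming the usual definition of a t-ratio as coefficient divided by standard error, they can be approximated from the rounded numbers on the slide. The dictionary names below are illustrative only.

```python
# Coefficients and standard errors as printed on the slide (rounded to 3 decimals).
coefficients = {"fi": 0.227, "en": -0.176, "cv": 0.880}
standard_errors = {"fi": 0.058, "en": 0.059, "cv": 0.060}

# Assumption: t = estimate / standard error.
t_ratios = {name: coefficients[name] / standard_errors[name] for name in coefficients}

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.2f}")
```

This gives roughly 3.91 for fi, -2.98 for en, and 14.67 for cv. Because the inputs are rounded, the original slide's values may differ slightly.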

Case Study: Foreign Investment
Alternative Models
There is no model with testable constraints (df > 0) that is not rejected by the data, in which FI has a positive effect on PO.

Outline
Representing/Modeling Causal Systems
1) Causal Graphs
2) Parametric Models
   a) Bayes Nets
   b) Structural Equation Models
   c) Generalized SEMs

Causal Graphs
Causal Graph G = {V, E}
Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V
[Figure: example causal graphs over Years of Education, Skills and Knowledge, and Income]

Causal Graphs
[Figure: two panels over Years of Education, Skills and Knowledge, and Income. Panel titles: Not Cause Complete; Common Cause Complete. Labels: Omitted Causes; Omitted Common Causes]

Tetrad: Complete Causal Modeling Tool


Tetrad Demo & Hands-On
Build and save two acyclic causal graphs:
1) Build the Smoking graph pictured above
2) Build your own graph with 4 variables
[Figure: graph over Smoking, YF, LC]

Modeling Ideal Interventions: Interventions on the Effect
[Figure: Room Temperature and Sweaters On; pre-experimental system vs. post-intervention system]

Modeling Ideal Interventions: Interventions on the Cause
[Figure: Room Temperature and Sweaters On; pre-experimental system vs. post-intervention system]

Interventions & Causal Graphs
Model an ideal intervention by adding an “intervention” variable outside the original system as a direct cause of its target.
[Figure: pre-intervention graph; intervene on Income with a “Soft” Intervention vs. a “Hard” Intervention]
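A minimal sketch of this graph operation, under the common convention that a hard intervention removes every original edge into its target while a soft intervention leaves them in place. The representation (a dict from node to its set of parents), the function name `intervene`, the `I(...)` naming scheme, and the example edges are all assumptions made for illustration; they are not Tetrad's API.

```python
def intervene(parents, target, kind):
    """Return a post-intervention copy of a graph stored as {node: set of parents}."""
    if kind not in ("hard", "soft"):
        raise ValueError("kind must be 'hard' or 'soft'")
    post = {node: set(pa) for node, pa in parents.items()}
    intervention_node = f"I({target})"  # hypothetical naming scheme
    post[intervention_node] = set()      # exogenous: no causes inside the system
    if kind == "hard":
        post[target] = {intervention_node}   # cut every original edge into the target
    else:
        post[target].add(intervention_node)  # keep the original causes, add one more
    return post

# Illustrative graph (assumed edges): Education -> Skills -> Income
graph = {"Education": set(), "Skills": {"Education"}, "Income": {"Skills"}}
hard = intervene(graph, "Income", "hard")
soft = intervene(graph, "Income", "soft")
print(sorted(hard["Income"]))  # ['I(Income)']
print(sorted(soft["Income"]))  # ['I(Income)', 'Skills']
```

Returning a copy keeps the pre-intervention graph available for comparison, which is what the exercises on the following slides ask for.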

Interventions & Causal Graphs
Intervention: hard intervention on both X1 and X4; soft intervention on X3
Pre-intervention Graph → Post-Intervention Graph?
[Figure: pre- and post-intervention graphs over X1, X2, X3, X4, X5, X6, with intervention nodes I, I, and S]


Interventions & Causal Graphs
Intervention: hard intervention on X3; soft interventions on X6 and X4
Pre-intervention Graph → Post-Intervention Graph?
[Figure: pre- and post-intervention graphs over X1, X2, X3, X4, X5, X6, with intervention nodes I, S, and S]

Parametric Models

Instantiated Models

Causal Bayes Networks
The joint distribution factors according to the causal graph:
$P(S, YF, LC) = P(S)\,P(YF \mid S)\,P(LC \mid S)$

Causal Bayes Networks
The joint distribution factors according to the causal graph:
$P(S)\,P(YF \mid S)\,P(LC \mid S) = f(\theta)$
All variables binary [0,1]: $\theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$
$P(S = 0) = \theta_1$    $P(S = 1) = 1 - \theta_1$
$P(YF = 0 \mid S = 0) = \theta_2$    $P(LC = 0 \mid S = 0) = \theta_4$
$P(YF = 1 \mid S = 0) = 1 - \theta_2$    $P(LC = 1 \mid S = 0) = 1 - \theta_4$
$P(YF = 0 \mid S = 1) = \theta_3$    $P(LC = 0 \mid S = 1) = \theta_5$
$P(YF = 1 \mid S = 1) = 1 - \theta_3$    $P(LC = 1 \mid S = 1) = 1 - \theta_5$

Causal Bayes Networks
The joint distribution factors according to the causal graph. All variables binary [0,1].
$P(S, YF, LC) = P(S)\,P(YF \mid S)\,P(LC \mid S) = f(\theta)$, with $\theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$
$P(S, YF, LC) = P(S)\,P(YF \mid S)\,P(LC \mid YF, S) = f(\theta)$, with $\theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \theta_7\}$
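The parameter counts of 5 and 7 follow from giving each variable one free parameter per configuration of its parents when all variables are binary. A small sketch of that count, where the function name and the dict-based graph encoding are illustrative assumptions:

```python
def free_parameters(parents, cardinality):
    """Count free parameters of a discrete Bayes net.

    parents: {node: list of parent nodes}; cardinality: {node: number of categories}.
    Each node needs (categories - 1) parameters per joint configuration of its parents.
    """
    total = 0
    for node, pa in parents.items():
        configurations = 1
        for p in pa:
            configurations *= cardinality[p]
        total += configurations * (cardinality[node] - 1)
    return total

binary = {"S": 2, "YF": 2, "LC": 2}
common_cause = {"S": [], "YF": ["S"], "LC": ["S"]}       # P(S) P(YF|S) P(LC|S)
with_yf_edge = {"S": [], "YF": ["S"], "LC": ["YF", "S"]}  # P(S) P(YF|S) P(LC|YF,S)

print(free_parameters(common_cause, binary))  # 5
print(free_parameters(with_yf_edge, binary))  # 7
```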

Causal Bayes Networks
The joint distribution factors according to the causal graph: P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
P(S = 0) = .7    P(S = 1) = .3
P(YF = 0 | S = 0) = .99    P(LC = 0 | S = 0) = .95
P(YF = 1 | S = 0) = .01    P(LC = 1 | S = 0) = .05
P(YF = 0 | S = 1) = .20    P(LC = 0 | S = 1) = .80
P(YF = 1 | S = 1) = .80    P(LC = 1 | S = 1) = .20
P(S=1, YF=1, LC=1) = ?

Causal Bayes Networks
The joint distribution factors according to the causal graph: P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
P(S = 0) = .7    P(S = 1) = .3
P(YF = 0 | S = 0) = .99    P(LC = 0 | S = 0) = .95
P(YF = 1 | S = 0) = .01    P(LC = 1 | S = 0) = .05
P(YF = 0 | S = 1) = .20    P(LC = 0 | S = 1) = .80
P(YF = 1 | S = 1) = .80    P(LC = 1 | S = 1) = .20
P(S=1, YF=1, LC=1) = P(S=1) P(YF=1 | S=1) P(LC=1 | S=1) = .3 * .80 * .20 = .048

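Calculating the effect of a hard intervention
Pre-intervention: $P(YF, S, LC) = P(S)\,P(YF \mid S)\,P(LC \mid S)$
Post-intervention: $P_m(YF, S, LC) = P(S)\,P(YF \mid I)\,P(LC \mid S)$
The intervention replaces $P(YF \mid S)$ with $P(YF \mid I)$.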

Calculating the effect of a hard intervention
$P(S, YF, LC) = P(S)\,P(YF \mid S)\,P(LC \mid S)$
$P(S=1, YF=1, LC=1) = .3 \times .8 \times .2 = .048$
$P_m(S=1, YF_{set}=1, LC=1) = \;?$
$P_m(S, YF, LC) = P(S)\,P(YF \mid I)\,P(LC \mid S)$, with $P(YF=1 \mid I) = .5$
$P_m(S=1, YF_{set}=1, LC=1) = .3 \times .5 \times .2 = .03$
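The same arithmetic as a short sketch, using the conditional probabilities from the Causal Bayes Networks slides and the slide's value P(YF=1 | I) = .5. Variable and function names are illustrative only.

```python
# Conditional probability tables from the slides, indexed as table[s][value].
P_S = {0: 0.7, 1: 0.3}
P_YF_GIVEN_S = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.20, 1: 0.80}}
P_LC_GIVEN_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}
# Distribution that the hard intervention imposes on YF (the slide uses .5).
P_YF_GIVEN_I = {0: 0.5, 1: 0.5}

def joint(s, yf, lc):
    """Pre-intervention joint: P(S) P(YF|S) P(LC|S)."""
    return P_S[s] * P_YF_GIVEN_S[s][yf] * P_LC_GIVEN_S[s][lc]

def joint_hard_yf(s, yf, lc):
    """Manipulated joint: P(YF|S) is replaced by P(YF|I)."""
    return P_S[s] * P_YF_GIVEN_I[yf] * P_LC_GIVEN_S[s][lc]

print(round(joint(1, 1, 1), 3))          # 0.048
print(round(joint_hard_yf(1, 1, 1), 3))  # 0.03
```

Only the factor for the intervened variable changes; the factors for S and LC are carried over untouched.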

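Calculating the effect of a soft intervention
Pre-intervention: $P(YF, S, LC) = P(S)\,P(YF \mid S)\,P(LC \mid S)$
Post-intervention: $P_m(YF, S, LC) = P(S)\,P(YF \mid S, Soft)\,P(LC \mid S)$
The intervention replaces $P(YF \mid S)$ with $P(YF \mid S, Soft)$, so YF still depends on S.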

Tetrad Demo & Hands-On
1) Use the DAG you built for Smoking, YF, and LC
2) Define the Bayes PM (# and values of categories for each variable)
3) Attach a Bayes IM to the Bayes PM
4) Fill in the Conditional Probability Tables (make the values plausible)

Updating

Tetrad Demo
1) Use the IM just built of Smoking, YF, LC
2) Update LC on evidence: YF = 1
3) Update LC on evidence: YF set = 1
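For comparison with what Tetrad reports, the two updates can be worked by hand. The sketch below reuses the conditional probabilities from the earlier Causal Bayes Networks slides (your own IM will have different values) and computes both quantities by enumeration over S; the function names are made up for this example.

```python
# Conditional probability tables from the earlier slides, indexed as table[s][value].
P_S = {0: 0.7, 1: 0.3}
P_YF_GIVEN_S = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.20, 1: 0.80}}
P_LC_GIVEN_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}

def p_lc_given_yf_observed(lc, yf):
    """P(LC = lc | YF = yf): ordinary conditioning, so YF carries information about S."""
    numerator = sum(P_S[s] * P_YF_GIVEN_S[s][yf] * P_LC_GIVEN_S[s][lc] for s in (0, 1))
    denominator = sum(P_S[s] * P_YF_GIVEN_S[s][yf] for s in (0, 1))
    return numerator / denominator

def p_lc_given_yf_set(lc):
    """P(LC = lc | YF set): the edge S -> YF is cut, so LC keeps its marginal distribution."""
    return sum(P_S[s] * P_LC_GIVEN_S[s][lc] for s in (0, 1))

print(round(p_lc_given_yf_observed(1, 1), 4))  # 0.1957
print(round(p_lc_given_yf_set(1), 4))          # 0.095
```

Observing yellow fingers roughly doubles the probability of lung cancer relative to its marginal of .095, while setting yellow fingers leaves it unchanged, because in this graph YF is not a cause of LC.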

Structural Equation Models
Causal Graph
Structural Equations: for each variable $X \in V$, an assignment equation $X := f_X(\text{immediate-causes}(X), \varepsilon_X)$
Exogenous Distribution: joint distribution over the exogenous variables, $P(\varepsilon)$

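Linear Structural Equation Models
Causal Graph / Path diagram
Equations:
$\text{Education} := \varepsilon_{\text{Education}}$
$\text{Income} := \beta_1\,\text{Education} + \varepsilon_{\text{Income}}$
$\text{Longevity} := \beta_2\,\text{Education} + \varepsilon_{\text{Longevity}}$
Exogenous Distribution: $P(\varepsilon_{\text{Education}}, \varepsilon_{\text{Income}}, \varepsilon_{\text{Longevity}})$
   $\forall i \neq j:\ \varepsilon_i \perp \varepsilon_j$ (pairwise independence)
   no variance is zero
E.g. $(\varepsilon_{\text{Education}}, \varepsilon_{\text{Income}}, \varepsilon_{\text{Longevity}}) \sim N(0, \Sigma^2)$, with $\Sigma^2$ diagonal and no variance zero
Structural Equation Model: $V = BV + E$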
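A minimal simulation sketch for this three-variable linear SEM, assuming standard normal, mutually independent errors and evaluating the assignments in causal order (for an acyclic model this is equivalent to solving V = BV + E). The function names and the coefficient values in the usage line are illustrative assumptions, not Tetrad defaults.

```python
import random

def linear_sem(e_education, e_income, e_longevity, b1, b2):
    """Evaluate the structural equations for one unit, given its error terms."""
    education = e_education
    income = b1 * education + e_income
    longevity = b2 * education + e_longevity
    return education, income, longevity

def simulate(n, b1, b2, seed=0):
    """Draw n units with independent N(0, 1) errors (an assumed parameterization)."""
    rng = random.Random(seed)
    return [
        linear_sem(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1), b1, b2)
        for _ in range(n)
    ]

rows = simulate(1000, b1=0.5, b2=0.3)
print(rows[0])
```

Keeping the equations in their own function separates the causal structure from the choice of error distribution, which is the part the Generalized SEM later relaxes.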

Tetrad Demo & Hands-On
1) Attach a SEM PM to your 3-4 variable graph
2) Attach a SEM IM to the SEM PM
3) Change the coefficient values
4) Attach a Standardized SEM IM to the SEM PM, or the SEM IM

Simulated Data

Tetrad Demo & Hands-On
1) Simulate data from both your SEM IM and your Bayes IM

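Generalized SEM
1) The Generalized SEM is a generalization of the linear SEM.
2) Allows for arbitrary connection functions
3) Allows for arbitrary distributions
4) Simulation from cyclic models is supported
Causal Graph
SEM Equations:
$\text{Education} := \varepsilon_{\text{Education}}$
$\text{Income} := \beta_1\,\text{Education} + \varepsilon_{\text{Income}}$
$\text{Longevity} := \beta_2\,\text{Education} + \varepsilon_{\text{Longevity}}$
$(\varepsilon_{\text{Education}}, \varepsilon_{\text{Income}}, \varepsilon_{\text{Longevity}}) \sim N(0, \Sigma^2)$
Generalized SEM Equations:
$\text{Education} := \varepsilon_{\text{Education}}$
$\text{Income} := \beta_1\,\text{Education}^2 + \varepsilon_{\text{Income}}$
$\text{Longevity} := \beta_2\,\ln(\text{Education}) + \varepsilon_{\text{Longevity}}$
$(\varepsilon_{\text{Education}}, \varepsilon_{\text{Income}}, \varepsilon_{\text{Longevity}}) \sim U(0, 1)$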
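The generalized equations can be simulated the same way as the linear ones, with non-linear functions and uniform errors swapped in. This is a sketch under stated assumptions: independent U(0, 1) errors, drawn from (0, 1] so that ln(Education) is always defined, and made-up coefficient values. None of the names come from Tetrad.

```python
import math
import random

def generalized_sem(e_education, e_income, e_longevity, b1, b2):
    """Evaluate the generalized structural equations for one unit."""
    education = e_education
    income = b1 * education ** 2 + e_income
    longevity = b2 * math.log(education) + e_longevity
    return education, income, longevity

def draw_error(rng):
    """Uniform draw on (0, 1]; excludes 0 so the logarithm never fails."""
    return 1.0 - rng.random()

def simulate(n, b1, b2, seed=0):
    rng = random.Random(seed)
    return [
        generalized_sem(draw_error(rng), draw_error(rng), draw_error(rng), b1, b2)
        for _ in range(n)
    ]

rows = simulate(1000, b1=2.0, b2=1.0)
print(rows[0])
```

Since Education lies in (0, 1], ln(Education) is never positive, so with a positive β2 the Longevity values are pulled downward; a real model would rescale or shift the variables.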

Hands On
1) Create a DAG.
2) Parameterize it as a Generalized SEM.
3) In the PM, select “show error terms” from the Tools menu. Click on an error term and change its distribution to Uniform.
4) Make at least one function non-linear.
5) Make at least one function interactive.
6) Save the session as “generalizedSEM”.

Extra Slides

A Few Causal Discovery Highlights

Autism: ASD vs. NT
Catherine Hanson, Rutgers
Usual Approach: Search for differential recruitment of brain regions

ASD vs. NT
Causal Modeling Approach: Examine connectivity of ROIs
   Face processing network
   Theory of Mind network
   Action understanding network

Results FACE TOM ACTION

What was Learned
Face processing: ASD ≈ NT
Theory of Mind: ASD ≠ NT
Action understanding: ASD ≠ NT when faces involved

Genetic Regulatory Networks: Arabidopsis
Marloes Maathuis, ETH (Zurich)

Genetic Regulatory Networks
Micro-array data (~25,000 variables) → Causal Discovery → Candidate Regulators of Flowering time → Greenhouse experiments on flowering time

Genetic Regulatory Networks
Which genes affect flowering time in Arabidopsis thaliana? (Stekhoven et al., Bioinformatics, 2012)
~25,000 genes
Modification of PC (stability)
Among 25 genes in final ranking:
   5 known regulators of flowering
   20 remaining genes:
      For 13 of 20, seeds available
      9 of 13 yielded replicates
      4 of 9 affected flowering time
Other techniques are little better than chance

Other Applications
Educational Research: Online Courses, MOOCs (the “Doer” effect), Cog. Tutors
Economics: Causes of Meat Prices, Effects of International Trade
Lead and IQ
Stress, Depression, Religiosity
Climate Change Modeling
The Effects of Welfare Reform
Etc.