Decision Analysis Lecture 13


Decision Analysis Lecture 13 Tony Cox My e-mail: tcoxdenver@aol.com Course web site: http://cox-associates.com/DA/

Five Challenges
1. Learning: How to infer ("learn") correct DAG models from observational data? Under what conditions is this possible ("identifiability")? When possible, how to do it (learning algorithms)? How to characterize remaining uncertainties?
2. Inference: How to use a DAG model to infer probable values of unobserved variables from values of observed variables?
3. Prediction: How to use a DAG to predict how changing X would change Y? ("Manipulative causality")
4. Attribution: How to use a DAG to attribute effects on Y to X? How to define and estimate direct, total, controlled direct, natural direct, natural indirect, and mediated effects of X on Y?
5. Generalization: How to generalize answers from study population(s) to other populations? (The "transportability" or "external validity" question.)

Key steps
1. Use Bayesian network (BN) learning algorithms to identify plausible causal DAG structures. What DAG structure(s) best explain the data? What parts of the DAG structure are identified by the data?
2. Quantify conditional probability tables (CPTs) using BN learning algorithms or CART trees: estimate the direct effects of parents at each node.
3. Validate the quantified causal BNs.
4. Use causal BNs to answer practical questions: prediction, attribution, optimization, explanation, inference, generalization.

Learning causal DAGs from data (“Structure learning”)

Example: What affects what? How much? Which interactions matter? CDC Behavioral Risk Factor Surveillance System (BRFSS) and EPA data (Cox, 2017). www.ncbi.nlm.nih.gov/pubmed/28208075

Causal analytics: Algorithms to learn causal models from data and apply them to answer queries. Pipeline: Data → Causal analytics → Model → Monte Carlo simulation → Predictions

How to get from data to causal predictions… objectively? Probabilistic causal prediction: Doing X will change conditional probability distribution of Y, given covariates Z Goal: Manipulative causation (vs. associational, counterfactual, predictive, computational, etc.) Data: Observed (X, Y, Z) values Challenge: How will changing X change the probabilities of Y values?

Model uncertainty undermines valid causal predictions from data. How would cutting exposure concentration C in half affect future response rate R?

Community | Concentration, C | Income, I | Response rate, R
A | 4 | 100 | 8
B | 8 | 60 | 16
C | 12 | 20 | 24

Model 1: R = 2C. If this is a valid structural equation (causal model), then ∆R = 2∆C. The corresponding DAG is: C → R

Non-identifiability: Multiple models fit the data but make different predictions. How would cutting exposure concentration C in half affect future response rate R? There is no way to determine this from the historical data alone.

Community | Concentration, C | Income, I | Mortality rate, R
A | 4 | 100 | 8
B | 8 | 60 | 16
C | 12 | 20 | 24

Model 1: R = 2C (with I = 140 – 10C)
Model 2: R = 35 – 0.5C – 0.25I
Model 3: R = 28 – 0.2I (with C = 14 – 0.1I)

So decreasing C could decrease R, increase it, or leave it unchanged.
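The ambiguity can be checked mechanically. A minimal Python sketch, taking Community B's concentration to be 8 (the value implied by all three models): each model reproduces the observed data exactly, yet the models disagree sharply about the effect of halving C.

```python
# Hypothetical sketch: the three regression models are from the slide; data
# rows are (C, I, R) triples per community. B's concentration (8) is the
# value implied by all three fitted models.
data = {"A": (4, 100, 8), "B": (8, 60, 16), "C": (12, 20, 24)}

model1 = lambda C, I: 2 * C                    # R = 2C
model2 = lambda C, I: 35 - 0.5 * C - 0.25 * I  # R = 35 - 0.5C - 0.25I
model3 = lambda C, I: 28 - 0.2 * I             # R = 28 - 0.2I

# Each model reproduces every observed response exactly.
for C, I, R in data.values():
    assert model1(C, I) == R and model2(C, I) == R and model3(C, I) == R

# Predicted response in community A if concentration is cut in half (C = 2),
# holding I at its observed value: R halves, increases, or stays unchanged.
C, I, R = data["A"]
print(model1(C / 2, I), model2(C / 2, I), model3(C / 2, I))  # 4 9.0 8.0
```

The same historical data thus support predictions of a decrease, an increase, or no change, which is exactly the non-identifiability problem the slide describes.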

Implications
Ambiguous associations obscure the dependency of outputs on decision inputs, making sound modeling and causal inference more difficult. Causal conclusions are then not purely data-driven (hypothesis → data → conclusion); instead, they conflate data and modeling assumptions (hypothesis/model/assumptions + data → conclusions). Sound (objective, trustworthy, well-justified, independently repeatable, verifiable) inference is undermined when conclusions rest on untested assumptions. Ambiguous associations are common in practice. Wanted: a way to reach valid, robust (model-independent) conclusions from data that can be fully specified before seeing the data.

Example: Identifiability of causal DAG structures from data. Principle: effects are not conditionally independent of their direct causes. This can be used as a screen for possible causes in a multivariate database. Suppose we had an "oracle" (e.g., a perfect CART tree or BN learning algorithm) for detecting conditional independence. Which of these could it distinguish among?
1. X → Z → Y (e.g., exposure → lifestyle → health)
2. Z → X → Y (e.g., lifestyle → exposure → health)
3. X → Y ← Z (e.g., exposure → health ← lifestyle)
4. X → Y → Z (e.g., exposure → health → lifestyle)
5. X ← Z → Y (e.g., exposure ← lifestyle → health)

Identifiability of causal DAG structures from data
1. X → Z → Y (e.g., exposure → lifestyle → health)
2. Z → X → Y (e.g., lifestyle → exposure → health)
3. X → Y ← Z (e.g., exposure → health ← lifestyle)
4. X → Y → Z (e.g., exposure → health → lifestyle)
5. X ← Z → Y (e.g., exposure ← lifestyle → health)
In 1 and 5, but not the rest, X and Y are conditionally independent given Z: the Markov equivalence class can be identified. In 4, but not the rest, X and Z are conditionally independent given Y. In 2, but not the rest, Z and Y are conditionally independent given X. In 3, X and Z are unconditionally independent but conditionally dependent given Y.
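A small exact-enumeration sketch of the "oracle" idea, using hypothetical CPT numbers: for a chain X → Z → Y, X and Y are conditionally independent given Z (but marginally dependent); for a collider X → Y ← Z, X and Z are marginally independent but become dependent once Y is conditioned on.

```python
from itertools import product

def joint_chain():
    # X -> Z -> Y (structure 1); CPT numbers are hypothetical, chosen faithful.
    P = {}
    for x, z, y in product([0, 1], repeat=3):
        px = 0.4 if x else 0.6
        pz = (0.9 if z else 0.1) if x else (0.2 if z else 0.8)
        py = (0.7 if y else 0.3) if z else (0.25 if y else 0.75)
        P[(x, z, y)] = px * pz * py
    return P

def joint_collider():
    # X -> Y <- Z (structure 3): X and Z marginally independent by construction.
    P = {}
    for x, z, y in product([0, 1], repeat=3):
        pz = 0.3 if z else 0.7
        p1 = 0.9 if (x and z) else 0.2      # P(Y=1 | x, z)
        P[(x, z, y)] = 0.5 * pz * (p1 if y else 1 - p1)
    return P

def independent(P, i, j, given):
    # Test P(vi, vj | s) = P(vi | s) P(vj | s) for every assignment s of `given`.
    def marg(fix):
        return sum(p for k, p in P.items() if all(k[a] == v for a, v in fix.items()))
    for vals in product([0, 1], repeat=len(given) + 2):
        s = dict(zip(given, vals[2:]))
        ps = marg(s)
        pij = marg({i: vals[0], j: vals[1], **s})
        pi = marg({i: vals[0], **s})
        pj = marg({j: vals[1], **s})
        if abs(pij * ps - pi * pj) > 1e-12:
            return False
    return True

X, Z, Y = 0, 1, 2
chain, coll = joint_chain(), joint_collider()
assert independent(chain, X, Y, [Z])      # chain: X _||_ Y | Z
assert not independent(chain, X, Y, [])   # ...but marginally dependent
assert independent(coll, X, Z, [])        # collider: X _||_ Z marginally
assert not independent(coll, X, Z, [Y])   # ...dependent given Y ("explaining away")
```

Running the same tests against all five structures reproduces the pattern of distinguishable conditional independencies listed above.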

Partial identifiability of DAGs based on conditional independence. Conditional independence (CI) typically allows some, but not all, DAG structures to be rejected based on data: for example, X → Y → Z and X ← Y ← Z imply the same CI relations. Testable implications of DAG structures based on CI can be enumerated using current software such as dagitty (Textor, 2015). http://www.dagitty.net/dags.html http://www.dagitty.net/manual-2.x.pdf

Other principles for identifying DAG structure from data:
Likelihood principle (bnlearn score-based algorithms): choose the DAG model that maximizes the likelihood of the data.
Composition principle: if X → Y → Z, then dz/dx = (dz/dy)(dy/dx).
Time series analysis: information flows from causes to their effects over time (transfer entropy; Yin & Yao, 2016, www.nature.com/articles/srep29192).
Measurement error analysis in nonlinear models: effect = f(cause) + error (LiNGAM software, https://arxiv.org/ftp/arxiv/papers/1408/1408.2038.pdf).
Homogeneity/invariance principles (Li et al., 2015, https://pdfs.semanticscholar.org/a051/9a2c6b85ca65d0df037142f550cf87d4e43f.pdf).
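The composition principle can be checked numerically with hypothetical structural equations g and h standing in for the mechanisms along a chain X → Y → Z:

```python
# Numerical check of the composition principle for a chain X -> Y -> Z.
# The structural equations g and h below are hypothetical examples.
def g(x):            # y = g(x), so dy/dx = 2x
    return x ** 2

def h(y):            # z = h(y), so dz/dy = 3
    return 3 * y + 1

x0, eps = 2.0, 1e-6
dy_dx = (g(x0 + eps) - g(x0 - eps)) / (2 * eps)          # central difference
dz_dx = (h(g(x0 + eps)) - h(g(x0 - eps))) / (2 * eps)    # total derivative
dz_dy = 3.0
assert abs(dz_dx - dz_dy * dy_dx) < 1e-4                 # dz/dx = (dz/dy)(dy/dx)
```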

LiNGAM basic idea. A plot of y = f(x) + error looks different from a plot of y = f(x + error). The direction with well-behaved (independent, homoskedastic) residuals identifies the correct causal model if effect = f(cause) + error (and f is nonlinear and/or the error is non-Gaussian). [Figure: homoskedastic vs. heteroskedastic residual plots.] Shimizu et al., UAI, 2009. https://arxiv.org/ftp/arxiv/papers/1408/1408.2038.pdf
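A sketch of the LiNGAM idea under stated assumptions: x causes y, and the noise is non-Gaussian (uniform). OLS residuals from the regression run in the true causal direction are independent of the regressor; the reverse regression's residuals are not. Dependence is probed here with a crude proxy (correlation between squared residuals and squared regressor), not LiNGAM's actual independence tests.

```python
import random

random.seed(0)
n = 20000
x = [random.uniform(-0.5, 0.5) for _ in range(n)]
e = [random.uniform(-0.5, 0.5) for _ in range(n)]
y = [xi + ei for xi, ei in zip(x, e)]               # true model: x -> y

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    da = sum((u - ma) ** 2 for u in a)
    db = sum((v - mb) ** 2 for v in b)
    return num / (da * db) ** 0.5

def ols_residuals(dep, reg):
    md, mr = sum(dep) / len(dep), sum(reg) / len(reg)
    b = sum((d - md) * (r - mr) for d, r in zip(dep, reg)) \
        / sum((r - mr) ** 2 for r in reg)
    return [d - md - b * (r - mr) for d, r in zip(dep, reg)]

def residual_dependence(dep, reg):
    res = ols_residuals(dep, reg)
    return abs(corr([r * r for r in res], [v * v for v in reg]))

dep_fwd = residual_dependence(y, x)   # regress effect on cause: near zero
dep_rev = residual_dependence(x, y)   # regress cause on effect: clearly nonzero
assert dep_fwd < dep_rev              # the causal direction looks "cleaner"
```

With Gaussian noise both directions would look equally clean, which is why LiNGAM requires non-Gaussianity (or nonlinearity of f).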

CAT’s DAG structure learning using bnlearn package (default settings) Arrows are not causal: Income and Smoking are not causes of Male and Smoking is not a cause of Hispanic.

Causal DAG learned when Age, Sex, and Hispanic are specified as sources Causal DAG structure emerges from knowledge-based constraints on directions of arrows

Different learning algorithms produce slightly different DAG models Age directly affects Income Use ensemble of DAGs produced using different algorithms and random subsets of data for maximum robustness of conclusions

Different learning algorithms produce slightly different DAG models In all models, Heart disease depends on the same parents and Income and Hispanic are ancestors of PM2.5 and Heart disease.

Discovering DAG structure resolves ambiguous associations. How would cutting exposure concentration C in half affect future response rate R? There is no way to determine this from the historical data alone.

Community | Concentration, C | Income, I | Mortality rate, R
A | 4 | 100 | 8
B | 8 | 60 | 16
C | 12 | 20 | 24

Model 1: R = 2C (with I = 140 – 10C); DAG: I ← C → R
Model 2: R = 35 – 0.5C – 0.25I; DAG: C → R ← I
Model 3: R = 28 – 0.2I (with C = 14 – 0.1I); DAG: C ← I → R

So decreasing C could decrease R, increase it, or leave it unchanged.

Discovering DAG structure resolves ambiguous associations. How would cutting exposure concentration C in half affect future response rate R? There is no way to determine this from the historical data alone.

Community | Concentration, C | Income, I | Mortality rate, R
A | 4 | 100 | 8
B | 8 | 60 | 16
C | 12 | 20 | 24

Model 1 (Income ← Conc → Mortality): mortality would be halved.
Model 2 (Conc → Mortality ← Income): mortality would increase.
Model 3 (Conc ← Income → Mortality): mortality would not change.

Validating DAG structures Compare results from multiple algorithms using different principles Conditional independence constraints, likelihood scores, LiNGAM, composition Cross-validation Occam’s Window: Bayesian Model-Averaging ensemble modeling for DAG models Internal consistency checks of effects estimates using multiple adjustment sets

Wrap-up on causal DAG structure-learning. Practical algorithms are available now to learn BN DAG structures from data; for the example data set, run time is < 1 second. To obtain BNs in which arrows have causal interpretations, knowledge-based constraints must be imposed: Which variables are sources or sinks? Are any arrows required, allowed, or forbidden? Different DAG-learning algorithms give slightly different DAGs, but the main results are robust.

Key steps
1. Use Bayesian network (BN) learning algorithms to identify plausible causal DAG structures. What DAG structure(s) best explain the data? What parts of the DAG structure are identified by the data?
2. Quantify conditional probability tables (CPTs) using BN learning algorithms or CART trees: estimate the direct effects of parents at each node.
3. Validate the quantified causal BNs.
4. Use causal BNs to answer practical questions: prediction, attribution, optimization, explanation, inference, generalization.

Basics of dagitty software: d-separation and conditional independence http://www.dagitty.net/dags.html http://www.dagitty.net/manual-2.x.pdf

dagitty functions: List empirically testable implications (pairwise marginal and conditional independencies involving only observed quantities). Identify (minimal) adjustment sets: what must and must not be conditioned on to estimate dependencies along causal paths.

Three connection types (Pearl, 1988; Charniak, 1991)
Linear: A → B → C. Example: family-out → dog-out → hear-bark. A and C are conditionally independent, given B.
Diverging: A ← B → C. Example: light-on ← family-out → dog-out. B is a confounder (of the association between A and C).
Converging (B is a "collider"): A → B ← C. Example: family-out → dog-out ← bowel-problem. A and C are conditionally dependent, given B! ("Explaining away" the observed value of B.)

d-connection. A path from A to C is d-connecting (given the facts/observations in E) if every interior node is of one of the three connection types and is:
Not in E if it is linear or diverging. Rationale: don't create conditional independence! Not A → E → C; not A ← E → C.
In E (or has a descendant in E) if it is converging. Rationale: conditioning creates dependence via competing explanations. Yes to: A → E ← C.
Two nodes with no d-connecting path between them are d-separated.

Interpreting d-connection “Roughly speaking, two nodes are d-connected if there is a causal path between them, or there is evidence that renders the two nodes correlated with each other.” (Charniak, 1991) X and Y are d-connected if and only if they are not conditionally independent of each other given E Causal path: All arrows point from exposure (X) toward response (Y) Much more general than traditional path analysis or linear structural equation model (SEM) Allows nonlinear dependencies, high-order interactions, non-Gaussian errors, inter-individual heterogeneity

Checking d-separation. List all undirected paths from X to Y. If any non-collider vertex on a path is in the evidence set E (i.e., is conditioned on), then that path is blocked: A → E → C, A ← E → C. If any collider on a path is not in E (and has no descendant in E), then that path is blocked: A → B ← C with B not in E. X and Y are d-separated (no causal influence flows from X to Y) if every undirected path between them is blocked.

Examples of d-separation. Consider the DAG X → U ← V → W ← Y.
X and V are unconditionally independent.
X and V are dependent, conditioned on U.
U and W are unconditionally dependent.
U and W are CI given V.
X and Y are unconditionally independent.
X and Y are not CI given {U, W}.
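These independence facts can be verified mechanically. A minimal d-separation checker in Python, assuming the example DAG is X → U ← V → W ← Y (the structure consistent with the independence facts listed above):

```python
# Minimal d-separation sketch: enumerate simple undirected paths and apply
# the blocking rules (conditioned non-collider blocks; unconditioned collider
# with no conditioned descendant blocks).
parents = {"X": [], "V": [], "Y": [], "U": ["X", "V"], "W": ["V", "Y"]}

def descendants(node):
    kids = [c for c, ps in parents.items() if node in ps]
    out = set(kids)
    for k in kids:
        out |= descendants(k)
    return out

def paths(a, b, visited):
    # Generate all simple undirected paths from a to b.
    if a == b:
        yield [a]
        return
    nbrs = set(parents[a]) | {c for c, ps in parents.items() if a in ps}
    for nxt in nbrs - visited:
        for rest in paths(nxt, b, visited | {nxt}):
            yield [a] + rest

def d_separated(a, b, evidence):
    E = set(evidence)
    for path in paths(a, b, {a}):
        blocked = False
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            collider = prev in parents[node] and nxt in parents[node]
            if collider:
                if node not in E and not (descendants(node) & E):
                    blocked = True    # closed collider blocks this path
            elif node in E:
                blocked = True        # conditioned chain/fork node blocks
        if not blocked:
            return False              # an open path d-connects a and b
    return True

assert d_separated("X", "V", [])              # collider U blocks X-U-V
assert not d_separated("X", "V", ["U"])       # conditioning on U opens it
assert not d_separated("U", "W", [])          # fork U <- V -> W is open
assert d_separated("U", "W", ["V"])           # conditioning on V blocks it
assert d_separated("X", "Y", [])              # both colliders closed
assert not d_separated("X", "Y", ["U", "W"])  # both colliders opened
```

The six assertions reproduce exactly the six facts listed in the slide.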

d-connectivity calculations support dagitty functions List empirically testable implications (pairwise marginal and conditional independencies involving only observed quantities) Identify (minimal) adjustment sets: What to condition on and what not to, to estimate dependencies along causal paths Leave paths unblocked to estimate total effects, path coefficients in linear model

Adjustment sets for total effects. Key idea: an adjustment set Z for estimating the total causal effect of X on Y must block all non-causal paths but no causal paths. X = exposure, Y = response, Z = covariates to adjust for. Adjustment set(s) can be computed automatically from the DAG; the algorithm solves the variable-selection problem for regression modeling. If such a set exists, then adjusting for it (by conditioning on the variables in this set as covariates) allows the total effect of X on Y to be non-parametrically identified. Adjustment formula: P(y | do(x)) = Σ_{z ∈ Z} P(y | x, z) P(z). P(z) can be obtained from a BN solver such as Netica; P(y | x, z) can be estimated by CART trees, random forests, etc.
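A toy discrete illustration of the adjustment formula, with hypothetical CPT numbers for a confounded model Z → X, Z → Y, X → Y: naive conditioning on X mixes the causal effect with confounding, while summing over the confounder's marginal distribution recovers the interventional probability.

```python
# Back-door adjustment sketch; all probability numbers are hypothetical.
p_z = {0: 0.5, 1: 0.5}                          # P(Z = z)
p_x_given_z = {0: 0.2, 1: 0.8}                  # P(X = 1 | z)
p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.5,       # P(Y = 1 | x, z)
                (1, 0): 0.3, (1, 1): 0.9}

# Naive conditioning: P(Y = 1 | X = 1), which mixes effect with confounding.
num = sum(p_z[z] * p_x_given_z[z] * p_y_given_xz[(1, z)] for z in (0, 1))
den = sum(p_z[z] * p_x_given_z[z] for z in (0, 1))
naive = num / den                               # about 0.78 with these CPTs

# Adjustment formula: P(y | do(x)) = sum over z of P(y | x, z) P(z).
adjusted = sum(p_y_given_xz[(1, z)] * p_z[z] for z in (0, 1))  # 0.60

assert abs(naive - adjusted) > 0.05   # confounding biases the naive estimate
```

Here Z satisfies the back-door criterion (it blocks the non-causal path X ← Z → Y without blocking X → Y), so the adjusted value is the interventional probability.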

What effects do we want to identify and estimate? Consider X → Z → Y with a direct arrow X → Y (Z is a mediator).
Total effect of X on Y: how would changing the value of X from x to x′ change the distribution of Y? What is Pr(y | do(x′))?
Direct effect of X on Y: how would changing X from x to x′ change the distribution of Y if Z is held fixed? Pr(y | do(x′), do(z)). Controlled direct effect = E(Y | do(x′), do(z)) – E(Y | do(x), do(z)).
Pure or natural direct effect: effect on E(Y) of changing X from x to x′ (from do(x) to do(x′)) while holding Z fixed at the value it had under do(x).
Natural indirect effect: change in E(Y) if X is held fixed at x (i.e., do(x)) and Z adjusts to the values it would have under do(x′).
Mediated effect: effect of changing X on Y via changing Z.
Pearl, 2009. http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf

What effects can we identify and estimate? Some effects can be uniquely identified from associational data and a DAG structure “Identifiable” = uniquely determined by data and assumptions Typical DAG model assumptions: Model shows all relevant variables and conditional independence and dependence relationships Causal Markov and faithfulness assumptions (Spirtes and Zhang, 2014, https://arxiv.org/pdf/1502.00829.pdf) Other effects may not be identifiable without more data, e.g., time series and interventions Special algorithms and software can enumerate all identifiable effects, given a DAG model Pearl’s do-calculus; dagitty software (Textor, 2015, https://cran.r-project.org/web/packages/dagitty/dagitty.pdf )

Interpret BN structure using dagitty: List testable implications Click DAG under B Bayesian Network to generate dagitty interpretations of DAG. Automatically shows testable implications, identifiable effects, and minimal adjustment sets for estimating them with minimal bias _||_ = “is conditionally independent of”

Interpret BN structure using dagitty: List identifiable path coefficients Click DAG under B Bayesian Network to generate dagitty interpretations of DAG. identifiable path coefficients

Interpret BN structure using dagitty: List path coefficients identifiable via IV Click DAG under B Bayesian Network to generate dagitty interpretations of a DAG. Lists path coefficients identifiable via instrumental variables (IVs) Instrumental variable (IV): Change X but not Y directly, see how Y moves

Dagitty lists total effects identifiable via regression modeling Click DAG under B Bayesian Network: Identifiable total effects Selects variables to include Solves automated variable selection problem

Wrap-up on dagitty and d-separation. The graph property of d-connectivity describes how information can flow between variables, linking conditional independence to DAG structure. Algorithms and software now allow automatic interpretation of DAG models in terms of which effects can be identified from data, and how: what to condition on (automated variable selection); path coefficients and regression coefficients; non-parametric structural equations and conditional probability tables (CPTs).

Making DAG learning easier: Causal Analytics Toolkit (CAT) http://cox-associates.com/downloads/

Learning goals for this section: see how to apply R packages to carry out causal analytics based on information-theoretic principles and algorithms. The Causal Analytics Toolkit (CAT) wraps R packages: BN learning algorithms, CART trees, randomForest ensembles, partial dependence plots. Apply these to example data.

Principles of most successful causal effect estimation algorithms.
Information principle: causes are informative about (help to predict) their effects, so exploit predictive analytics algorithms. Use DAGs, trees, Random Forests, etc. to find informative variables.
Propagation principle: changes in causes help to predict and explain changes in their effects; information flows from causes to their effects over time.
Use non-parametric effects estimates: CART trees estimate conditional probabilities directly from data, with no parametric model, avoiding errors from modeling biases, specification errors, and uncertain assumptions, and allowing nonlinearities and interactions. Average over ensembles of hundreds of non-parametric estimates/predictions.
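The information principle can be illustrated with an exact mutual-information computation on two hypothetical joint distributions: a cause carries positive information about its effect, while an unrelated variable carries none.

```python
import math
from itertools import product

# Exact mutual information I(A;B) in bits from an enumerated joint
# distribution {(a, b): probability}. Numbers below are hypothetical.
def mutual_info(joint):
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# X -> Y: Y copies X with 10% flip noise, P(X=1) = 0.5.
cause_effect = {(x, y): 0.5 * (0.9 if x == y else 0.1)
                for x, y in product([0, 1], repeat=2)}
# W is independent of Y.
unrelated = {(w, y): 0.25 for w, y in product([0, 1], repeat=2)}

assert mutual_info(cause_effect) > 0.4        # informative (about 0.53 bits)
assert abs(mutual_info(unrelated)) < 1e-12    # zero information
```

Ranking candidate predictors by information measures like this is, in spirit, what the tree- and BN-based screens in CAT do.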

CAT applies R and Python packages via Excel Load data in Excel, click Excel to R to send it to R Los Angeles air basin 1461 days, 2007-2010 (Lopiano et al., 2015, thanks to Stan Young for data) PM2.5 data from CARB Elderly mortality (“AllCause75”) from CA Department of Health Daily min and max temps & max relative humidity from ORNL and EPA Risk question: Does PM2.5 exposure increase elderly mortality risk? If so, by how much? P(AllCause75 | PM2.5) = ? E(AllCause75 | PM2.5) = ?

Using CAT to examine associations: Plotting the data. Send data from Excel to R: highlight columns, click on "Excel to R". Select columns to analyze: click on column headers (Ctrl-click toggles selection). Click on Plots to view frequency distributions, scatter plots, correlations, smooth regression curves. PM2.5 is slightly negatively associated with mortality.

Using CAT to examine associations: Plotting more data. Send data from Excel to R: highlight columns, click on "Excel to R". Select columns: click on column headers (Ctrl-click toggles selection). Click on Plots to view frequency distributions, scatter plots, correlations, smooth regression curves. Temperature is positively associated with PM2.5. Temperature is negatively associated with mortality.

Other visualizations of correlations: corrgram

Other visualizations for partial and total correlations: qgraph network Red = negative Green = positive Thicker = stronger Closer = stronger Assumes linear relationships.

Nonparametric BN networks detect many more dependencies than partial (linear) correlations

Basic ideas of information-based causal analytics:
Use a (DAG) network to show which variables provide direct information about each other; arrows between variables show they are informative about each other.
Learn network structure directly from data (scoring algorithms, constraint algorithms, hybrids).
Carefully check conclusions: in non-parametric analyses we trust! Do power analyses using simulation.
Interpret neighbors in the network as potential direct causes (satisfying a necessary condition).
Use partial and total dependence plots learned from data (based on averaging over many trees) to quantify relations between inputs and dependent variables.

Run BN structure discovery algorithms Click B Bayesian Network to generate DAG structure. Only variables connected to response variable by an arrow are identified as potential direct causes Multiple pathways between two variables reveal potential direct and indirect effects Example: Direct and indirect paths between tmax and AllCause75.

By contrast, regression estimates total associations, given an assumed model Click on Automatic under Regression Models CAT selects and runs appropriate regression models, reports results Quasi-Poisson regression model shows significant positive total C-R association between PM2.5 and elderly mortality (AllCause75) Significant negative total association between temperature and elderly mortality

Confirm or refute/refine BN structure with additional non-parametric tests Conditioning on very different values of a direct cause should cause the distribution of the response variable to change If the response variable does not change, then any association between them may be due to indirect pathways (e.g., confounding)


Key steps
1. Use Bayesian network (BN) learning algorithms to identify plausible causal DAG structures. What DAG structure(s) best explain the data? What parts of the DAG structure are identified by the data?
2. Quantify conditional probability tables (CPTs) using BN learning algorithms or CART trees: estimate the direct effects of parents at each node.
3. Validate the quantified causal BNs.
4. Use causal BNs to answer practical questions: prediction, attribution, optimization, explanation, inference, generalization.

Quantify total causal relations using CART trees, Random Forest, etc. Procedure: Grow tree or random forest for response variable using only its parents in causal DAG model

Quantify CPTs using CART trees, Random Forest, etc. Procedure: Grow tree or random forest for response variable using only its parents in causal DAG model Conditional probability of heart disease varies 10-fold based on age, sex, smoking, and income

Identify implications for effects of actions Reducing smoking most reduces heart disease risk among men over 68 and people in lower income groups (1-4)

Quantify direct causal relations. Procedure: to quantify direct (potentially causal) relations after controlling for other variables, use a partial dependence plot for the response vs. the (potential) cause. The RandomForest algorithm averages multiple independent conditional probability predictions of the outcome for each value of the cause. Rationale: the DAG structure shows that the relation might be causal (the cause helps to predict the response); partial dependence estimates the size of the potential effect. Data-based simulation of conditional expected values generates the curve.
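A sketch of the partial-dependence computation, using a known function in place of a fitted Random Forest (all names and numbers hypothetical): PD(c) averages the model's prediction over the observed covariate values while the cause is forced to c, so its slope recovers the direct effect even when the cause is confounded with a covariate.

```python
import random

random.seed(1)
z = [random.gauss(0, 1) for _ in range(5000)]            # covariate
c_obs = [0.8 * zi + random.gauss(0, 0.6) for zi in z]    # cause, correlated with z

def model(c, zi):
    # Hypothetical fitted model; the true direct effect of c is 2.
    return 2.0 * c + 3.0 * zi

def partial_dependence(c):
    # Force the cause to c for every row; average over observed covariates.
    return sum(model(c, zi) for zi in z) / len(z)

# The PD slope recovers the direct effect of c...
pd_slope = partial_dependence(1.0) - partial_dependence(0.0)

# ...while a naive regression of outcomes on observed c is inflated, because
# c and z are correlated (confounding leaks z's effect into c's coefficient).
y = [model(ci, zi) for ci, zi in zip(c_obs, z)]
mc, my = sum(c_obs) / len(c_obs), sum(y) / len(y)
naive_slope = sum((ci - mc) * (yi - my) for ci, yi in zip(c_obs, y)) \
    / sum((ci - mc) ** 2 for ci in c_obs)

assert abs(pd_slope - 2.0) < 1e-9
assert naive_slope > 3.0           # inflated (about 4.4 with these settings)
```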

Key steps
1. Use Bayesian network (BN) learning algorithms to identify plausible causal DAG structures. What DAG structure(s) best explain the data? What parts of the DAG structure are identified by the data?
2. Quantify conditional probability tables (CPTs) using BN learning algorithms or CART trees: estimate the direct effects of parents at each node.
3. Validate the quantified causal BNs.
4. Use causal BNs to answer practical questions: prediction, attribution, optimization, explanation, inference, generalization.

Validating quantified DAG models Compare results from multiple algorithms using different principles Conditional independence constraints, likelihood scores, LiNGAM, composition Cross-validation Occam’s Window: Bayesian Model-Averaging ensemble modeling for DAG models Internal consistency checks of effects estimates using multiple adjustment sets

Validate quantified relations in hold-out samples. CAT currently quantifies uncertainty using bootstrap and cross-validation approaches for Random Forest ensembles. Averaging over many trees reduces MSE (mean squared prediction error).

Validation test: Do variables not joined by arrows have flat PDPs? Example: Boston vs. Los Angeles. PM2.5 has a positive regression coefficient as a predictor of AllCause75 in both data sets, but PM2.5 is not a significant causal predictor in either data set. The C-R function learned from Boston does not apply in LA (and vice versa).

Key steps
1. Use Bayesian network (BN) learning algorithms to identify plausible causal DAG structures. What DAG structure(s) best explain the data? What parts of the DAG structure are identified by the data?
2. Quantify conditional probability tables (CPTs) using BN learning algorithms or CART trees: estimate the direct effects of parents at each node.
3. Validate the quantified causal BNs.
4. Use causal BNs to answer practical questions: prediction, attribution, optimization, explanation, inference, generalization.

Use BN/influence diagram (ID) to optimize actions and decisions

Estimating implied dependencies. Quantify total effects (or dependency plots) via BN algorithms. Procedure: compose the DAG's CPTs along causal paths using Monte Carlo simulation (Gibbs sampling for E(Y | do(x), z) and P(z)). Quantify direct effects via partial dependence plots.
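A minimal sketch of the Monte Carlo composition step, with hypothetical CPTs: to estimate E[Y | do(x)], sample the covariate z from P(z), then sample y from P(y | x, z), and average. The simulation should match the exact sum over z.

```python
import random

random.seed(2)
p_z1 = 0.4                                  # P(Z = 1); hypothetical
p_y1 = {(0, 0): 0.1, (0, 1): 0.5,           # P(Y = 1 | x, z); hypothetical
        (1, 0): 0.3, (1, 1): 0.9}

def sample_do(x, n=100000):
    # Forward-sample the causal model with X forced to x.
    hits = 0
    for _ in range(n):
        z = 1 if random.random() < p_z1 else 0
        y = 1 if random.random() < p_y1[(x, z)] else 0
        hits += y
    return hits / n

# Exact composition: E[Y | do(x=1)] = sum over z of P(z) * P(Y=1 | 1, z).
exact = sum((p_z1 if z else 1 - p_z1) * p_y1[(1, z)] for z in (0, 1))
assert abs(sample_do(1) - exact) < 0.01     # simulation agrees with the sum
```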

Extensions: latent variables; measurement error; transportability; detecting unmeasured confounders; dynamic simulation; decision optimization (influence diagrams).

Testing DAGs for hidden ("latent") variables and confounders. To test for effects of unobserved ("hidden" or "latent") confounders, partition the study population into disjoint subsets: men vs. women, younger vs. older. If the mortality rate in one subset appears as a direct cause of mortality in the other, then there is probably an omitted confounder that affects both.

Detecting hidden/omitted variables

Detecting unmeasured confounders. Year is not a direct cause of elderly mortality in women (F75).

Transportability: Causal laws and mechanisms hold across settings. Example model (or theory) structure for causes of response: a directed acyclic graph (DAG) structure with a quantified Pr(mortality | age, sex, exposure), i.e., a conditional C-R relation or conditional probability table (CPT). The response is conditionally independent of other variables, given the values of its direct parents in this network ("DAG model"). A valid causal model or law (CPT) describing underlying mechanisms should be the same in all studies: it can be "transported" (generalized) across applications, and does not change based on arrows into age, sex, or exposure. Otherwise, the causal theory needs to be expanded.

Example: Testing transportability Partial dependence relations between exposure (PM2.5) and mortality counts in two different cities look very different.

Summary of CAT’s causal analytics Screen for total, partial, and temporal associations and information relations Learn BN network structure from data Estimate quantitative dependence relations among neighboring variables Use partial dependence plots (Random Forest ensemble of non-parametric trees) Use trees to quantify multivariate dependencies on multiple neighbors simultaneously Validate on hold-out samples Check internal consistency (dagitty, www.dagitty.net/dags.html), transportability, possible omitted variables

Wrap-up on CAT Modern software makes it easy to apply information-based causal analytics to data Entire analysis process can be automated Click on “Analyze” in CAT Minimizes roles of modeling choices, p-hacking, confirmation bias, etc. Limited but useful outputs: Possible causal relationships detected and quantified directly from data Predictive causality, not necessarily manipulative

Inference engines and influence diagrams based on BN technology

Five Challenges
1. Learning: How to infer ("learn") correct DAG models from observational data? Under what conditions is this possible ("identifiability")? When possible, how to do it (learning algorithms)? How to characterize remaining uncertainties?
2. Inference: How to use a DAG model to infer probable values of unobserved variables from values of observed variables?
3. Prediction and optimization: How to use a DAG to predict how changing X would change Y? To optimize the choice of X? ("Manipulative causality")
4. Attribution: How to use a DAG to attribute effects on Y to X? How to define and estimate direct, total, controlled direct, natural direct, natural indirect, and mediated effects of X on Y?
5. Generalization: How to generalize answers from study population(s) to other populations? (The "transportability" or "external validity" question.)

Inference: Design of a probabilistic inference expert system (BN). The system stores a joint PDF for many variables, Pr(X1, X2, …, Xn), as a causal Bayesian network. The user specifies a query (a subset of variables, Q) and some evidence (observed or assumed values of other variables, K). The system calculates Pr(Q | K) to answer the query. Method: Pr(Q | K) = Pr(Q and K)/Pr(K); Pr(K) is obtained from the joint PDF by marginalizing out (summing over all values of) the other variables. Monte Carlo: in practice, sample many times from the joint PDF to estimate the ratio Pr(Q and K)/Pr(K).
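The query mechanism can be sketched by exact enumeration on the deck's dog-out network (structure from Charniak, 1991: family-out → light-on, family-out → dog-out ← bowel-problem, dog-out → hear-bark; the CPT numbers here are hypothetical, not Charniak's):

```python
from itertools import product

p_f, p_b = 0.15, 0.01                 # P(family-out), P(bowel-problem)
p_l = {1: 0.6, 0: 0.05}               # P(light-on | family-out)
p_d = {(1, 1): 0.99, (1, 0): 0.9,     # P(dog-out | family-out, bowel-problem)
       (0, 1): 0.97, (0, 0): 0.3}
p_h = {1: 0.7, 0: 0.01}               # P(hear-bark | dog-out)

def joint(f, b, l, d, h):
    # Joint PDF factored along the DAG (see the factorization slide below).
    pick = lambda p, v: p if v else 1 - p
    return (pick(p_f, f) * pick(p_b, b) * pick(p_l[f], l)
            * pick(p_d[(f, b)], d) * pick(p_h[d], h))

def query(q_var, evidence):
    # Pr(Q | K) = Pr(Q and K) / Pr(K), marginalizing out all other variables.
    names = ["f", "b", "l", "d", "h"]
    num = den = 0.0
    for vals in product([0, 1], repeat=5):
        world = dict(zip(names, vals))
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(*vals)
            den += p
            if world[q_var] == 1:
                num += p
    return num / den

prior = query("f", {})                     # no evidence: just P(family-out)
posterior = query("f", {"h": 1, "l": 1})   # heard barking and light is on
assert abs(prior - p_f) < 1e-9
assert posterior > prior                   # evidence raises P(family-out)
```

Real BN engines avoid the exponential enumeration via message passing or sampling, but the Pr(Q and K)/Pr(K) logic is the same.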

“Marginalizing Out.” Given a joint PDF Pr(K, U), the marginal distribution of K is: Pr(K = k) = Σ_u Pr(K = k, U = u), where K = known/assumed values and U = unknown/uncertain values. In general, given a joint PDF for many variables, Pr(X1, X2, …, Xn), we can “marginalize out” the PDF for any variable (or the joint PDF for any subset of variables) by summing/integrating over all values of all other variables.

Key BN idea: Factoring a Joint PDF. For two variables, we have Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y | X = x); more briefly, Pr(x, y) = Pr(x) Pr(y | x). For n variables, we have (perhaps after renumbering variables so that all of a variable’s direct parents precede it in the new ordering): Pr(x1, x2, …, xn) = Pr(x1) Pr(x2 | x1) Pr(x3 | x1, x2) … Pr(xn | x1, x2, …, xn-1). Most of these terms simplify for sparse DAGs! Example: Find Pr(x1, x2, x3, x4) in the DAG X1 → X2 → X3 ← X4. A: Pr(x1, x2, x3, x4) = Pr(x1) Pr(x2 | x1) Pr(x4) Pr(x3 | x2, x4)
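The factorization can be verified numerically with hypothetical CPTs for the example DAG X1 → X2 → X3 ← X4: the DAG product defines a proper distribution, and the implied conditional independence (X3 depends on X1 only through X2) holds exactly.

```python
from itertools import product

# Hypothetical CPTs for X1 -> X2 -> X3 <- X4.
p1, p4 = 0.3, 0.6                         # P(X1 = 1), P(X4 = 1)
p2 = {1: 0.8, 0: 0.2}                     # P(X2 = 1 | x1)
p3 = {(1, 1): 0.9, (1, 0): 0.5,           # P(X3 = 1 | x2, x4)
      (0, 1): 0.4, (0, 0): 0.1}

def pick(p, v):
    return p if v else 1 - p

# Pr(x1, x2, x3, x4) = Pr(x1) Pr(x2 | x1) Pr(x4) Pr(x3 | x2, x4)
joint = {(a, b, c, d):
         pick(p1, a) * pick(p2[a], b) * pick(p4, d) * pick(p3[(b, d)], c)
         for a, b, c, d in product([0, 1], repeat=4)}

assert abs(sum(joint.values()) - 1.0) < 1e-12    # a proper distribution

# Check P(x3 = 1 | x1, x2, x4) = P(x3 = 1 | x2, x4): the x1 value is irrelevant.
for b, d in product([0, 1], repeat=2):
    for a in (0, 1):
        num = sum(p for (A, B, C, D), p in joint.items()
                  if A == a and B == b and D == d and C == 1)
        den = sum(p for (A, B, C, D), p in joint.items()
                  if A == a and B == b and D == d)
        assert abs(num / den - p3[(b, d)]) < 1e-12
```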