
1 Math 6330: Statistical Consulting Class 6
Tony Cox, University of Colorado at Denver. Course web site:

2 Readings on Bayesian Networks
Charniak (1991), pages 50-53; build the network in Figure 2.
Pearl (2009), Sections 1 and 2 (through page 102).
Daly and Shen (2007), Methods to Accelerate the Learning of Bayesian Network Structures.

3 Causal questions
Retrospective (evaluation): How would Y (or its probability distribution) have been different if X had been different? Would Y have occurred if X had not occurred? Answers usually depend on the assumptions we make about why X would have been different.
Prospective (decision optimization): What will happen to Y (or its probability distribution) if we change X? How sure can we be?
Explanatory: Why does Y have the value (or probability distribution) that it has? To what extent is it because of the value of X?

4 Implications among types of causation
attributive: etiologic fraction, population attributable risk, probability of causation, burden of disease
refutationist: quasi-experiments, weight of evidence
regularity/associational: relative risk (RR), odds ratio (OR), regression coefficients
computational/exogeneity: Simon-Iwasaki causal ordering
mechanistic: structural equations, simulation, causal pathways
manipulative: do-calculus, dynamic causal models
predictive: transfer entropy, Granger causality, statistical dependence, DAG graph models, causal Bayesian networks, mediation
counterfactual/potential outcomes: propensity scores, marginal structural models, instrumental variables, intervention studies

5 Types of effects
Direct effect: how a change in X changes Y if all other variables are held fixed
Total effect: how a change in X changes Y if all other variables are allowed to respond
Mediated effect: how a change in X changes Y by changing a mediator Z
Transient and comparative-statics effects
Example: the effect of a change in volume on pressure in an ideal gas, P = nRT/V (a sketch of the direct vs. total distinction follows)
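The direct vs. total distinction can be made concrete symbolically. Below is a minimal sketch in Python using sympy; the adiabatic relation T(V) = c*V**(1 - gamma) used to let temperature respond to volume is an illustrative assumption, not part of the slide.

import sympy as sp

# Ideal-gas law P = nRT/V
n, R, T, V, c, gamma = sp.symbols('n R T V c gamma', positive=True)
P = n * R * T / V

# Direct effect: differentiate holding every other variable (here T) fixed.
direct = sp.diff(P, V)                       # -n*R*T/V**2

# Total effect: let T respond to V. As an illustrative assumption, use
# the adiabatic relation T(V) = c * V**(1 - gamma) and differentiate.
P_total = P.subs(T, c * V**(1 - gamma))
total = sp.simplify(sp.diff(P_total, V))     # -c*gamma*n*R*V**(-gamma - 1)

print("direct effect:", direct)
print("total effect :", total)

The direct effect is the partial derivative with T fixed; the total effect propagates the change in V through T as well, and the two generally differ.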

6 Associations are unreliable guides to causation

7 Non-causal associations between X and Y
Confounding: X ← Z → Y. Failing to condition on Z leads to spurious association between X and Y. This leads many statisticians to "control for" possible confounding by putting all variables on the right-hand side of a regression model.
Selection (Berkson): X → Z ← Y. Conditioning on Z leads to spurious association between X and Y. (Both cases are simulated in the sketch below.)
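Both mechanisms are easy to see in a minimal numpy simulation; the linear Gaussian structure, effect sizes, and selection threshold are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Confounding: X <- Z -> Y. No causal link between X and Y,
# yet they are marginally correlated through the common cause Z.
Z = rng.normal(size=N)
X = Z + rng.normal(size=N)
Y = Z + rng.normal(size=N)
print("confounding, corr(X, Y):", round(np.corrcoef(X, Y)[0, 1], 3))   # ~0.5

# Selection (Berkson): X -> Z <- Y. X and Y are independent, but
# conditioning on the collider Z (here, selecting Z > 1) induces
# a spurious negative association.
X2 = rng.normal(size=N)
Y2 = rng.normal(size=N)
Z2 = X2 + Y2
keep = Z2 > 1.0
print("Berkson, corr(X, Y | selected):",
      round(np.corrcoef(X2[keep], Y2[keep])[0, 1], 3))                 # ~ -0.4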

8 Example of selection bias
Suppose that the only workers who continue to work in an industry are those who (a) are accustomed to high exposures, or (b) are very healthy.
DAG: High exposure → Stay ← Healthy
Then, among workers who stay, high exposure is associated with lower health, even if exposure does not increase risk.
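A minimal simulation of this hypothetical workforce (all effect sizes and thresholds are arbitrary) shows the induced negative association:

import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Hypothetical workforce: exposure has NO causal effect on health.
exposure = rng.binomial(1, 0.5, size=N)   # 1 = high exposure
health = rng.normal(size=N)               # latent health score
tolerant = rng.binomial(1, 0.5, size=N)   # accustomed to high exposure

# Workers stay if they tolerate high exposure OR are very healthy:
# High exposure -> Stay <- Healthy (Stay is a collider).
stay = ((exposure == 1) & (tolerant == 1)) | (health > 1.0)

# Among stayers, the high-exposure group looks LESS healthy, because
# unhealthy low-exposure workers have left the sample.
print("mean health | stay, high exposure:",
      round(health[stay & (exposure == 1)].mean(), 3))
print("mean health | stay, low exposure :",
      round(health[stay & (exposure == 0)].mean(), 3))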

9 Non-causal associations between measured X and measured Y values
X ← Z → Y, with Y = Z
x = measured z + small error
y = measured z + large error
Then a regression model may identify X, but not Z, as a significant predictor of Y, even though Z, and not X, is a direct cause of Y.
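One way to realize this setup in a simulation (a sketch; the error scales are arbitrary): the analyst regresses y on a precisely measured proxy of z and a noisily measured version of z itself.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
N = 2_000

z = rng.normal(size=N)                        # true direct cause of y
y = z + rng.normal(scale=0.5, size=N)         # Y depends on Z only

# The analyst never sees true z, only error-laden measurements:
x_meas = z + rng.normal(scale=0.1, size=N)    # small measurement error
z_meas = z + rng.normal(scale=2.0, size=N)    # large measurement error

fit = sm.OLS(y, sm.add_constant(np.column_stack([x_meas, z_meas]))).fit()
print(fit.params)    # [const, x_meas, z_meas]: x_meas absorbs the signal
print(fit.pvalues)   # x_meas highly significant; z_meas typically not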

10 Non-causal associations between X and Y
X ← Z → Y
Y = Z², X = Z²
Then a linear regression model may identify X, but not Z, as a significant predictor of Y, even though Z, and not X, is a direct cause of Y.
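A minimal simulation (noise scales arbitrary) of how the shared nonlinearity fools a linear model:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
N = 2_000

z = rng.normal(size=N)                      # true direct cause
y = z**2 + rng.normal(scale=0.1, size=N)    # Y = Z^2 (plus small noise)
x = z**2 + rng.normal(scale=0.1, size=N)    # X = Z^2 (plus small noise)

# corr(z, z**2) is ~0 for symmetric z, so in a LINEAR regression z looks
# useless while x, which shares the nonlinearity, soaks up the signal.
fit = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()
print(fit.params)    # [const, x, z]: x near 1, z near 0
print(fit.pvalues)   # x highly significant; z typically not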

11 Identifiability of causal impacts
Principle: effects are not conditionally independent of their direct causes. We can use this as a screen for possible causes in a multivariate database.
Suppose we had an "oracle" (e.g., a perfect CART tree or BN learning algorithm) for detecting conditional independence. Which of these could it distinguish among?
1. X → Z → Y (e.g., exposure → lifestyle → health)
2. Z → X → Y (e.g., lifestyle → exposure → health)
3. X → Y ← Z (e.g., exposure → health ← lifestyle)
4. X → Y → Z (e.g., exposure → health → lifestyle)
5. X ← Z → Y (e.g., exposure ← lifestyle → health)

12 Identifiability of causal impacts
1. X → Z → Y (e.g., exposure → lifestyle → health)
2. Z → X → Y (e.g., lifestyle → exposure → health)
3. X → Y ← Z (e.g., exposure → health ← lifestyle)
4. X → Y → Z (e.g., exposure → health → lifestyle)
5. X ← Z → Y (e.g., exposure ← lifestyle → health)
In 1 and 5, but not the rest, X and Y are conditionally independent given Z; only their Markov equivalence class can be identified.
In 4, but not the rest, X and Z are conditionally independent given Y.
In 2, but not the rest, Z and Y are conditionally independent given X.
In 3, X and Z are unconditionally independent but conditionally dependent given Y.
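As a stand-in for the conditional-independence oracle, partial correlation works for linear Gaussian versions of these structures. A minimal sketch contrasting structure 1 (a chain) with structure 3 (a collider):

import numpy as np

def partial_corr(a, b, given):
    # Correlation of a and b after linearly removing `given` from each.
    G = np.column_stack([np.ones_like(given), given])
    ra = a - G @ np.linalg.lstsq(G, a, rcond=None)[0]
    rb = b - G @ np.linalg.lstsq(G, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(4)
N = 100_000

# Structure 1 (chain X -> Z -> Y): X and Y correlated, but X _||_ Y | Z.
X = rng.normal(size=N)
Z = X + rng.normal(size=N)
Y = Z + rng.normal(size=N)
print("chain   : corr(X,Y) =", round(np.corrcoef(X, Y)[0, 1], 3),
      "  pcorr(X,Y|Z) =", round(partial_corr(X, Y, Z), 3))

# Structure 3 (collider X -> Y <- Z): X _||_ Z marginally, but
# conditioning on Y makes them dependent.
X = rng.normal(size=N)
Z = rng.normal(size=N)
Y = X + Z + rng.normal(size=N)
print("collider: corr(X,Z) =", round(np.corrcoef(X, Z)[0, 1], 3),
      "  pcorr(X,Z|Y) =", round(partial_corr(X, Z, Y), 3))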

13 Quasi-experiments: Refuting non-causal explanations with control groups
Example: Do delinquency interventions work?

14 Threats to validity of causal inferences

15 Generalizability of findings
Invariance of causal laws across contexts
"Transportability" of causal effect estimates
Threats to external validity in quasi-experiments (QEs)

16 Overview of causal analytics techniques
Causal graph models: path diagrams, structural equation models; (causal) Bayesian networks, DBNs, influence diagrams (IDs)
Time series methods: Granger causality (causes help to predict effects); transfer entropy (information flows from causes to their effects)
Hybrid techniques: inferring causal graph models from time series data
Systems dynamics simulation models

17 Path analysis
[Path diagram: input variables connected to output variables by directed paths.]
Allows estimation of direct, indirect, and total effects.

18 Path analysis (cont.)
[Path diagram: input variables connected to output variables by directed paths.]
Causal hypotheses are provided as inputs; effect strengths are estimated as outputs (see the sketch below).
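A minimal numpy sketch of that input/output logic. The diagram X → Z → Y with a direct path X → Y, and all coefficients, are hypothetical: path coefficients are estimated equation by equation, and indirect and total effects follow by multiplying and summing along paths.

import numpy as np

rng = np.random.default_rng(5)
N = 50_000

# Hypothetical diagram and coefficients:
#   X -> Z (path a), Z -> Y (path b), X -> Y direct (path c)
a, b, c = 0.8, 0.5, 0.3
X = rng.normal(size=N)
Z = a * X + rng.normal(size=N)
Y = c * X + b * Z + rng.normal(size=N)

# Estimate path coefficients by least squares, one equation per node.
a_hat = np.linalg.lstsq(X[:, None], Z, rcond=None)[0][0]
c_hat, b_hat = np.linalg.lstsq(np.column_stack([X, Z]), Y, rcond=None)[0]

print("direct effect of X on Y:", round(c_hat, 3))            # ~0.3
print("indirect effect (via Z):", round(a_hat * b_hat, 3))    # ~0.4
print("total effect           :", round(c_hat + a_hat * b_hat, 3))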

19 Time series: Granger causality
X is a Granger cause of Y if the future of Y is not conditionally independent of the history of X, given the history of Y.
The test is based on time series regression and an F-test for non-independence.
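A minimal sketch using statsmodels' grangercausalitytests; the one-step-lag data-generating process is an arbitrary illustration.

import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(6)
T = 500

# Hypothetical process in which x leads y by one step:
#   y_t = 0.5*y_{t-1} + 0.8*x_{t-1} + noise
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

# grangercausalitytests tests whether column 2 helps predict column 1;
# it prints regression and F-test results for each lag up to maxlag.
data = np.column_stack([y, x])                # "does x Granger-cause y?"
res = grangercausalitytests(data, maxlag=2)
f_stat, p_val, _, _ = res[1][0]['ssr_ftest']  # F-test at lag 1
print(f"lag 1: F = {f_stat:.1f}, p = {p_val:.3g}")  # small p: reject non-causality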

20 Granger test example

21 Granger causality F-tests: asymmetry

22 From: Disruption of Frontal–Parietal Communication by Ketamine, Propofol, and Sevoflurane
Anesthesiology. 2013;118(6): doi: /ALN.0b013e f5. Figure legend: Schematic illustration of transfer entropy. Symbolic transfer entropy measures the causal influence of a source signal X on a target signal Y, and is based on information theory. The information transfer from X to Y is measured by the difference of two mutual information values, I[Y_F; X_P, Y_P] − I[Y_F; Y_P], where X_P, Y_P, and Y_F are, respectively, the past of the source and target signals and the future of the target signal. The difference corresponds to information transferred from the past of the source signal X_P to the future of the target signal Y_F, and not from the past of the target signal itself. The average over all vector points measures the information transferred from the source signal to the target signal. The vector points are symbolized with the ranks of their components: e.g., a vector point (30, 78, 51) is symbolized to (1, 3, 2) by ranking components in ascending order.
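A self-contained sketch of symbolic transfer entropy along the lines of this legend. The window length m = 3, the one-step future, and the toy lagged data are arbitrary choices, and this is not the article's implementation; ranks here start at 0 rather than 1.

import numpy as np
from collections import Counter

def symbolize(x, m=3):
    # Each length-m window is replaced by the ordinal pattern (ranks)
    # of its components, e.g. (30, 78, 51) -> (0, 2, 1).
    return [tuple(np.argsort(np.argsort(x[i:i + m])))
            for i in range(len(x) - m + 1)]

def H(symbols):
    # Plug-in Shannon entropy (bits) of a sequence of hashable symbols.
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symbolic_te(x, y, m=3):
    # TE(X -> Y) = I[Y_F; X_P, Y_P] - I[Y_F; Y_P], written out in joint
    # entropies: H(Y_F,Y_P) - H(Y_P) - H(Y_F,X_P,Y_P) + H(X_P,Y_P).
    xp = symbolize(x, m)[:-1]   # past of the source signal
    yp = symbolize(y, m)[:-1]   # past of the target signal
    yf = symbolize(y, m)[1:]    # future (one step ahead) of the target
    return (H(list(zip(yf, yp))) - H(yp)
            - H(list(zip(yf, xp, yp))) + H(list(zip(xp, yp))))

# Toy check: x drives y with a one-step lag, so the estimate should be
# asymmetric (plug-in entropies are biased; only the asymmetry matters).
rng = np.random.default_rng(7)
x = rng.normal(size=5000)
y = np.roll(x, 1) + 0.5 * rng.normal(size=5000)
print("TE(x -> y):", round(symbolic_te(x, y), 3))
print("TE(y -> x):", round(symbolic_te(y, x), 3))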

23 Algorithmic challenges
Learning: learn a causal graph from data, comprising structure (DAG) learning and CPT estimation (Dirichlet prior and Bayesian estimation; Monte Carlo sampling); a sketch of the Dirichlet CPT estimate follows
Inference: use the causal graph to draw inferences about the probabilities of variables, given observations
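For the CPT-estimation step, a minimal sketch of the Dirichlet-prior Bayesian estimate; the counts and the prior strength alpha are hypothetical.

import numpy as np

def cpt_posterior_mean(counts, alpha=1.0):
    # With a symmetric Dirichlet(alpha) prior over the K child states,
    # the posterior mean probability of state k given n_k observed
    # counts is (n_k + alpha) / (n + K*alpha).
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

# Hypothetical counts of a binary child for one parent configuration,
# e.g. observed [yes, no] = [3, 1]:
print(cpt_posterior_mean([3, 1]))   # ~[0.667, 0.333]
print(cpt_posterior_mean([0, 0]))   # no data: falls back to the uniform prior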

24 How to get from data to causal predictions… objectively?
Deterministic causal prediction: doing X will make Y happen to people of type Z
Probabilistic causal prediction: doing X will change the conditional probability distribution of Y, given covariates Z
Goal: manipulative causation (vs. associational, counterfactual, predictive, computational, etc.)
Data: observed (X, Y, Z) values
Challenge: How will changing X change Y?

