Math 6330: Statistical Consulting Class 6

Tony Cox, tcoxdenver@aol.com, University of Colorado at Denver. Course web site: http://cox-associates.com/6330/

Readings on Bayesian Networks
Charniak (1991), pages 50-53, http://www.aaai.org/ojs/index.php/aimagazine/article/viewFile/918/836 (build the network in Figure 2)
Pearl (2009), Sections 1 and 2 (through page 102), http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf
Daly and Shen (2007), Methods to Accelerate the Learning of Bayesian Network Structures, https://pdfs.semanticscholar.org/e7d3/029e84a1775bb12e7e67541beaf2367f7a88.pdf

Causal questions
Retrospective (evaluation): How would Y (or its probability distribution) have been different if X had been different? Would Y have occurred if X had not occurred? Answers usually depend on the assumptions we make about why X would have been different.
Prospective (decision optimization): What will happen to Y (or its probability distribution) if we change X? How sure can we be?
Explanatory: Why does Y have the value (or probability distribution) that it has? To what extent is it because of the value of X?

Implications among types of causation (diagram labels, in order): attributive, etiologic fraction, population attributable risk, probability of causation, burden of disease; refutationist, quasi-experiments, weight of evidence; regularity; associational, relative risk (RR), odds ratio (OR), regression coefficients; computational/exogeneity, Simon-Iwasaki causal ordering; mechanistic, structural equations, simulation, causal pathways; manipulative, do-calculus, dynamic causal models; predictive, transfer entropy, Granger causality, statistical dependence; DAG graph models, causal Bayesian networks, mediation; counterfactual/potential outcomes, propensity scores, marginal structural models, instrumental variables, intervention studies.

Types of effects
Direct effect: how a change in X changes Y if all other variables are held fixed.
Total effect: how a change in X changes Y if all other variables are allowed to respond.
Mediated effect: how a change in X changes Y by changing a mediator Z.
Transient and comparative statics effects.
Example: effect of a change in volume on pressure in an ideal gas, P = nRT/V.
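The three definitions can be made concrete with a hypothetical linear model in which X affects Y both directly and through the mediator Z (the coefficients a, b, c are illustrative symbols, not from the slide). The last line is the ideal-gas example with n and T held fixed.

```latex
\begin{aligned}
Z &= cX + \varepsilon_Z, \qquad Y = aX + bZ + \varepsilon_Y \\
\text{direct effect} &= \left.\frac{\partial Y}{\partial X}\right|_{Z\ \text{fixed}} = a \\
\text{mediated effect} &= \frac{\partial Y}{\partial Z}\,\frac{\partial Z}{\partial X} = b\,c \\
\text{total effect} &= \frac{dY}{dX} = a + b\,c \\
\left.\frac{\partial P}{\partial V}\right|_{n,\,T\ \text{fixed}} &= -\frac{nRT}{V^{2}}
\end{aligned}
```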

Associations are unreliable guides to causation

Non-causal associations between X and Y
Confounding: X ← Z → Y. Failing to condition on Z leads to a spurious association between X and Y. This leads many statisticians to "control for" possible confounding by putting all variables on the right-hand side of a regression model.
Selection (Berkson): X → Z ← Y. Conditioning on Z leads to a spurious association between X and Y.
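A minimal simulation of the confounding case, assuming a hypothetical linear-Gaussian model in which Z drives both X and Y and X has no effect on Y (the coefficient 2.0 and the sample size are arbitrary). Regressing Y on X alone yields a spurious slope; adding Z to the right-hand side removes it. The same move would create a spurious association if Z were a collider, as in the Berkson case.

```python
import numpy as np

def ols_slopes(target, *cols):
    """Least-squares slopes of target on cols (an intercept is fitted and dropped)."""
    design = np.column_stack([np.ones(len(target)), *cols])
    return np.linalg.lstsq(design, target, rcond=None)[0][1:]

rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)            # common cause Z
x = z + rng.normal(size=n)        # X <- Z
y = 2.0 * z + rng.normal(size=n)  # Z -> Y; X has no effect on Y

slope_x_only = ols_slopes(y, x)[0]        # about 1.0: spurious association
slope_x_given_z = ols_slopes(y, x, z)[0]  # about 0.0 once Z is conditioned on
print(f"Y on X: {slope_x_only:.3f}; Y on X and Z: {slope_x_given_z:.3f}")
```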

Example of selection bias
Suppose that the only workers who continue to work in an industry are those who (a) are accustomed to high exposures or (b) are very healthy.
DAG: High exposure → Stay ← Healthy
Then, among workers who stay, high exposure is associated with lower health, even if exposure does not increase risk.
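The worker example can be checked by simulation. This sketch assumes binary exposure and health drawn independently (so exposure has no effect on health) and a hypothetical stay rule: a worker stays if exposure is high or the worker is very healthy. Restricting attention to stayers conditions on the collider "Stay".

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
high_exposure = rng.random(n) < 0.5
very_healthy = rng.random(n) < 0.5     # independent of exposure by construction
stay = high_exposure | very_healthy    # hypothetical selection rule: (a) or (b)

def health_gap(exposed, healthy):
    """P(very healthy | high exposure) - P(very healthy | low exposure)."""
    return healthy[exposed].mean() - healthy[~exposed].mean()

gap_all = health_gap(high_exposure, very_healthy)                  # about 0.0
gap_stayers = health_gap(high_exposure[stay], very_healthy[stay])  # about -0.5
print(f"all workers: {gap_all:+.3f}; stayers only: {gap_stayers:+.3f}")
```

Among stayers, every low-exposure worker must be very healthy, while only about half of high-exposure workers are, which produces the spurious negative gap.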

Non-causal associations between measured X and measured Y values
X ← Z → Y, with Y = Z. X measures Z with small error (X = Z + small error), while Z itself is measured with large error (measured Z = Z + large error).
Then a regression model may identify X, but not Z, as a significant predictor of Y, even though Z, and not X, is the direct cause of Y.
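A sketch of this measurement-error effect, under the reading that the analyst sees X = Z + small error and a noisy record of Z with large error (the error scales 0.1 and 1.0 are arbitrary assumptions). Least squares puts almost all of the weight on X, because X is the better proxy for the true cause Z.

```python
import numpy as np

def ols_slopes(target, *cols):
    """Least-squares slopes of target on cols (an intercept is fitted and dropped)."""
    design = np.column_stack([np.ones(len(target)), *cols])
    return np.linalg.lstsq(design, target, rcond=None)[0][1:]

rng = np.random.default_rng(2)
n = 10_000
z = rng.normal(size=n)                     # true direct cause of Y
y = z.copy()                               # Y = Z
x = z + 0.1 * rng.normal(size=n)           # X: Z plus small error
z_recorded = z + 1.0 * rng.normal(size=n)  # Z as measured: large error

coef_x, coef_z = ols_slopes(y, x, z_recorded)
print(f"coefficient on X: {coef_x:.3f}; on measured Z: {coef_z:.3f}")
```

In population terms the coefficients are about 0.98 on X and about 0.01 on measured Z.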

Non-causal associations between X and Y
X ← Z → Y, with Y = Z² and X = Z².
Then a linear regression model may identify X, but not Z, as a significant predictor of Y, even though Z, and not X, is the direct cause of Y.
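With Y = Z² and X = Z², Y equals X exactly, so least squares gives X a coefficient of 1 and Z a coefficient of 0; for a symmetric Z, Y also has no linear association with Z on its own. A minimal sketch, assuming Z is standard normal:

```python
import numpy as np

def ols_slopes(target, *cols):
    """Least-squares slopes of target on cols (an intercept is fitted and dropped)."""
    design = np.column_stack([np.ones(len(target)), *cols])
    return np.linalg.lstsq(design, target, rcond=None)[0][1:]

rng = np.random.default_rng(3)
n = 10_000
z = rng.normal(size=n)  # direct cause, symmetric around 0
x = z ** 2              # X = Z^2
y = z ** 2              # Y = Z^2: caused by Z, not by X

coef_x, coef_z = ols_slopes(y, x, z)  # 1 and 0: Y is exactly linear in X
slope_z_alone = ols_slopes(y, z)[0]   # near 0: no linear association with Z
print(coef_x, coef_z, slope_z_alone)
```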

Identifiability of causal impacts
Principle: effects are not conditionally independent of their direct causes. We can use this as a screen for possible causes in a multivariate database.
Suppose we had an "oracle" (e.g., a perfect CART tree or BN learning algorithm) for detecting conditional independence. Which of these could it distinguish among?
1. X → Z → Y (e.g., exposure → lifestyle → health)
2. Z → X → Y (e.g., lifestyle → exposure → health)
3. X → Y ← Z (e.g., exposure → health ← lifestyle)
4. X → Y → Z (e.g., exposure → health → lifestyle)
5. X ← Z → Y (e.g., exposure ← lifestyle → health)

Identifiability of causal impacts (cont.)
1. X → Z → Y (e.g., exposure → lifestyle → health)
2. Z → X → Y (e.g., lifestyle → exposure → health)
3. X → Y ← Z (e.g., exposure → health ← lifestyle)
4. X → Y → Z (e.g., exposure → health → lifestyle)
5. X ← Z → Y (e.g., exposure ← lifestyle → health)
In 1 and 5, but not the rest, X and Y are conditionally independent given Z. The oracle can identify this Markov equivalence class, but it cannot distinguish 1 from 5.
In 4, but not the rest, X and Z are conditionally independent given Y.
In 2, but not the rest, Z and Y are conditionally independent given X.
In 3, X and Z are unconditionally independent but conditionally dependent given Y.
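A rough stand-in for the oracle, assuming linear-Gaussian versions of the five structures (unit coefficients and unit noise are arbitrary choices). A partial correlation below a hypothetical tolerance of 0.02 is treated as conditional independence. Under these assumptions it reproduces the pattern of independences listed for the five structures, including identical results for 1 and 5.

```python
import numpy as np

def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing each on `given`."""
    design = np.column_stack([np.ones(len(given)), given])
    resid_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    resid_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(resid_a, resid_b)[0, 1]

def simulate(structure, n=50_000, seed=1):
    """Linear-Gaussian data (X, Y, Z) for structures 1-5 on the slide."""
    rng = np.random.default_rng(seed)

    def noise():
        return rng.normal(size=n)

    if structure == 1:    # X -> Z -> Y
        x = noise()
        z = x + noise()
        y = z + noise()
    elif structure == 2:  # Z -> X -> Y
        z = noise()
        x = z + noise()
        y = x + noise()
    elif structure == 3:  # X -> Y <- Z (collider at Y)
        x = noise()
        z = noise()
        y = x + z + noise()
    elif structure == 4:  # X -> Y -> Z
        x = noise()
        y = x + noise()
        z = y + noise()
    else:                 # 5: X <- Z -> Y
        z = noise()
        x = z + noise()
        y = z + noise()
    return x, y, z

def independences(x, y, z, tol=0.02):
    """Independence statements that hold, treating |correlation| < tol as zero."""
    found = set()
    if abs(partial_corr(x, y, z)) < tol:
        found.add("X _||_ Y | Z")
    if abs(partial_corr(x, z, y)) < tol:
        found.add("X _||_ Z | Y")
    if abs(partial_corr(z, y, x)) < tol:
        found.add("Z _||_ Y | X")
    if abs(np.corrcoef(x, z)[0, 1]) < tol:
        found.add("X _||_ Z")
    return found

results = {s: independences(*simulate(s)) for s in range(1, 6)}
for s, found in results.items():
    print(s, sorted(found))
```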

Quasi-experiments: refuting non-causal explanations with control groups. Example: Do delinquency interventions work? http://www.slideshare.net/horatjitra/research-design-and-validity

Threats to validity of causal inferences http://spectrum.troy.edu/renckly/week6a.htm

Generalizability of findings: invariance of causal laws across contexts; “transportability” of causal effect estimates; threats to external validity in quasi-experiments (QEs).

Overview of causal analytics techniques
Causal graph models: path diagrams, structural equation models; (causal) Bayesian networks, dynamic Bayesian networks (DBNs), influence diagrams (IDs)
Time series methods: Granger causality (causes help to predict effects); transfer entropy (information flows from causes to their effects)
Hybrid techniques: inferring causal graph models from time series data
Systems dynamics simulation models

Path analysis (figure: input path diagram and output estimates). Allows estimation of direct, indirect, and total effects. http://crab.rutgers.edu/~goertzel/pathanal.htm

Path analysis (cont.) (figure: input and output). Causal hypotheses are provided as inputs; effect strengths are estimated as outputs. http://crab.rutgers.edu/~goertzel/pathanal.htm
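A sketch of path analysis as described: the hypothesized diagram (here, hypothetically, X → Z → Y plus X → Y) is the input, and each arrow is estimated by regressing a variable on its parents in the diagram. Direct, indirect, and total effects then follow from the estimated path coefficients. The data are simulated with a true direct effect of 0.5 and an indirect effect of 1.5 × 2.0 = 3.0.

```python
import numpy as np

def fit_paths(data, parents):
    """Estimate each arrow of a hypothesized path diagram by OLS of the child on its parents."""
    coefs = {}
    for child, pars in parents.items():
        design = np.column_stack([np.ones(len(data[child]))] + [data[p] for p in pars])
        beta = np.linalg.lstsq(design, data[child], rcond=None)[0][1:]
        for p, b in zip(pars, beta):
            coefs[(p, child)] = b
    return coefs

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
z = 1.5 * x + rng.normal(size=n)            # X -> Z
y = 0.5 * x + 2.0 * z + rng.normal(size=n)  # X -> Y and Z -> Y

# Input: causal hypotheses (parents of each endogenous variable).
paths = fit_paths({"X": x, "Z": z, "Y": y}, {"Z": ["X"], "Y": ["X", "Z"]})

# Output: effect strengths.
direct = paths[("X", "Y")]
indirect = paths[("X", "Z")] * paths[("Z", "Y")]
total = direct + indirect
print(f"direct {direct:.2f}, indirect {indirect:.2f}, total {total:.2f}")
```

The estimates are only as good as the input diagram: if the hypothesized arrows are wrong, the same arithmetic still returns numbers.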

Time series: Granger causality. X is a Granger cause of Y if the future of Y is not conditionally independent of the history of X, given the history of Y. The test is based on a time series regression and an F test of whether the lagged X terms add predictive power.
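A minimal version of the F test, written directly with NumPy least squares rather than a statistics package; the lag order (2) and the simulated series are assumptions for illustration. The restricted model predicts Y from its own lags, and the unrestricted model adds lags of X. Running the test in both directions shows the asymmetry.

```python
import numpy as np

def lag_matrix(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags, ..., T-1."""
    T = len(series)
    return np.column_stack([series[lags - k:T - k] for k in range(1, lags + 1)])

def rss(target, design):
    """Residual sum of squares of an OLS fit."""
    coef = np.linalg.lstsq(design, target, rcond=None)[0]
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(cause, effect, lags=2):
    """F statistic for H0: lags of `cause` do not help predict `effect` beyond its own lags."""
    target = effect[lags:]
    restricted = np.column_stack([np.ones(len(target)), lag_matrix(effect, lags)])
    unrestricted = np.column_stack([restricted, lag_matrix(cause, lags)])
    rss_r, rss_u = rss(target, restricted), rss(target, unrestricted)
    df_den = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / df_den)

rng = np.random.default_rng(42)
T = 2_000
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()  # X drives Y with a one-step delay

f_x_to_y = granger_f(x, y)  # large: X's history helps predict Y
f_y_to_x = granger_f(y, x)  # typically near 1: no help in the reverse direction
print(f"F(X -> Y) = {f_x_to_y:.1f}; F(Y -> X) = {f_y_to_x:.2f}")
```

A p-value would come from the F(lags, df_den) distribution, for example via `scipy.stats.f.sf`.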

Granger test example http://davegiles.blogspot.com/2011/04/testing-for-granger-causality.html

Granger causality F-tests: asymmetry. http://epilepsyu.com/blog/tag/granger-causality-test/

From: Disruption of Frontal–Parietal Communication by Ketamine, Propofol, and Sevoflurane. Anesthesiology. 2013;118(6):1264-1275. doi:10.1097/ALN.0b013e31829103f5
Figure legend: Schematic illustration of transfer entropy. Symbolic transfer entropy measures the causal influence of source signal X on target signal Y and is based on information theory. The information transfer from signal X to Y is measured by the difference of two mutual information values, I(Y_F; X_P, Y_P) - I(Y_F; Y_P), where X_P, Y_P, and Y_F are, respectively, the past of the source signal, the past of the target signal, and the future of the target signal. The difference corresponds to information transferred from the past of the source signal X_P to the future of the target signal Y_F, and not from the past of the target signal itself. The average over all vector points measures the information transferred from the source signal to the target signal. The vector points are symbolized by the ranks of their components: e.g., the vector point (30, 78, 51) is symbolized as (1, 3, 2), ranking components in ascending order.
Date of download: 2/16/2017. Copyright © 2017 American Society of Anesthesiologists. All rights reserved.
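A small plug-in estimator of symbolic transfer entropy (in bits), following the legend's definition with one-step pasts and rank symbolization of 3-sample windows. The window length, the one-step history, and the simulated signals (Y copies X one sample later, plus tiny noise) are assumptions; plug-in estimates are biased upward for short series, so this is only a sketch, not the paper's implementation.

```python
import numpy as np
from collections import Counter

def symbolize(vector):
    """Rank each component (1 = smallest), e.g. (30, 78, 51) -> (1, 3, 2)."""
    order = np.argsort(vector, kind="stable")
    ranks = np.empty(len(vector), dtype=int)
    ranks[order] = np.arange(1, len(vector) + 1)
    return tuple(int(r) for r in ranks)

def symbol_series(signal, dim=3):
    """Symbolize every window of `dim` consecutive samples."""
    return [symbolize(signal[i:i + dim]) for i in range(len(signal) - dim + 1)]

def entropy(samples):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_info(a, b):
    """I(A; B) = H(A) + H(B) - H(A, B)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def transfer_entropy(source, target):
    """I(Y_F; X_P, Y_P) - I(Y_F; Y_P) with one-step pasts; X = source, Y = target."""
    y_future, y_past, x_past = target[1:], target[:-1], source[:-1]
    return mutual_info(y_future, list(zip(x_past, y_past))) - mutual_info(y_future, y_past)

rng = np.random.default_rng(0)
x = rng.normal(size=5_001)
y = np.empty_like(x)
y[0] = rng.normal()
y[1:] = x[:-1] + 0.001 * rng.normal(size=5_000)  # Y copies X one sample later

sx, sy = symbol_series(x), symbol_series(y)
te_x_to_y = transfer_entropy(sx, sy)  # large: X's past carries Y's future
te_y_to_x = transfer_entropy(sy, sx)  # small, apart from estimation bias
print(f"TE(X -> Y) = {te_x_to_y:.2f} bits; TE(Y -> X) = {te_y_to_x:.2f} bits")
```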

Algorithmic challenges
Learning: learn a causal graph from data
Structure (DAG) learning
CPT (conditional probability table) estimation: Dirichlet prior and Bayesian estimation; Monte Carlo sampling
Inference: use the causal graph to draw inferences about probabilities of variables given observations
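A sketch of CPT estimation and inference for a hypothetical two-node network X → Y with binary variables. Each CPT row is estimated as its posterior mean under a symmetric Dirichlet prior, (count + α) / (row total + K·α), and inference uses Bayes' rule to get P(X | Y = y). The counts are made-up illustration data, and α = 1 is an assumed default.

```python
from collections import Counter

def estimate_cpt(pairs, x_values, y_values, alpha=1.0):
    """Posterior mean of P(Y=y | X=x) under a symmetric Dirichlet(alpha) prior on each row."""
    counts = Counter(pairs)
    cpt = {}
    for x in x_values:
        row_total = sum(counts[(x, y)] for y in y_values)
        for y in y_values:
            cpt[(x, y)] = (counts[(x, y)] + alpha) / (row_total + alpha * len(y_values))
    return cpt

def estimate_marginal(xs, x_values, alpha=1.0):
    """Posterior mean of P(X=x) under a symmetric Dirichlet(alpha) prior."""
    counts = Counter(xs)
    total = len(xs) + alpha * len(x_values)
    return {x: (counts[x] + alpha) / total for x in x_values}

def posterior_x(prior, cpt, y, x_values):
    """Inference in X -> Y: P(X=x | Y=y) by Bayes' rule."""
    joint = {x: prior[x] * cpt[(x, y)] for x in x_values}
    norm = sum(joint.values())
    return {x: p / norm for x, p in joint.items()}

# Hypothetical observations of (X, Y), both binary.
data = [(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 5 + [(0, 0)] * 55
cpt = estimate_cpt(data, x_values=(0, 1), y_values=(0, 1))
prior = estimate_marginal([x for x, _ in data], x_values=(0, 1))
post = posterior_x(prior, cpt, y=1, x_values=(0, 1))
print(cpt, post)
```

For larger networks, exact enumeration like `posterior_x` becomes expensive, which is one reason Monte Carlo sampling is used for inference.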

How to get from data to causal predictions… objectively?
Deterministic causal prediction: doing X will make Y happen to people of type Z.
Probabilistic causal prediction: doing X will change the conditional probability distribution of Y, given covariates Z.
Goal: manipulative causation (vs. associational, counterfactual, predictive, computational, etc.)
Data: observed (X, Y, Z) values.
Challenge: how will changing X change Y?
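A sketch contrasting association with manipulative causation, assuming a hypothetical binary system in which the covariate Z influences both X and Y, and doing X raises P(Y = 1) by 0.1. The observational difference overstates the effect. Simulating the intervention, or averaging within-stratum differences over Z (valid here only because Z is, by construction, the sole common cause), recovers it.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(n, do_x=None):
    """Hypothetical system: Z -> X, Z -> Y, X -> Y (all binary)."""
    z = rng.random(n) < 0.5
    if do_x is None:
        x = rng.random(n) < np.where(z, 0.8, 0.2)  # observed: Z influences X
    else:
        x = np.full(n, do_x)                       # do(X = do_x): Z no longer matters
    y = rng.random(n) < 0.1 + 0.1 * x + 0.5 * z    # true effect of X on P(Y=1) is 0.1
    return z, x, y

n = 200_000
z, x, y = simulate(n)
naive = y[x].mean() - y[~x].mean()  # association: about 0.4
adjusted = sum(
    (z == level).mean() * (y[x & (z == level)].mean() - y[~x & (z == level)].mean())
    for level in (False, True)
)  # stratified on Z: about 0.1
interventional = simulate(n, do_x=True)[2].mean() - simulate(n, do_x=False)[2].mean()  # about 0.1
print(f"naive {naive:.3f}, adjusted {adjusted:.3f}, do(X) {interventional:.3f}")
```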