Summary: connecting the question to the analysis(es) Jay S. Kaufman, PhD McGill University, Montreal QC 26 February 2016 3:40 PM – 4:20 PM National Academy.

Presentation transcript:

Summary: connecting the question to the analysis(es). Jay S. Kaufman, PhD, McGill University, Montreal QC. 26 February 2016, 3:40 PM – 4:20 PM. National Academy of Sciences, 2101 Constitution Ave NW, Washington, DC, USA.

Causal inference is necessary for medical and public policy decision-making because we hope to optimize some outcome. Causal inference is about inherently unobservable things (i.e., the future under different scenarios). Because we can’t directly observe what we want to know, we model it. There are good models and bad models. From 1999 to 2009, the number of Americans who fell into a swimming pool and drowned each year is correlated with the number of films in which Nicolas Cage appeared that year. Shall we reduce the number of pool drownings by keeping Cage off the screen?

Statistical models are used to estimate relationships between variables in observational data sets. [Figure: Y plotted against X with a fitted line; intercept β0, slope β1.] But it is mechanistic knowledge or structural assumptions, not statistical considerations, that allow us to infer causal effects from these relationships.
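As a minimal sketch of what estimating such a relationship involves, the following computes the least-squares intercept and slope for a single exposure. The data and the function name `fit_line` are hypothetical, chosen only to show the arithmetic. Nothing in the calculation says whether the slope has a causal interpretation.

```python
# Least-squares intercept (beta0) and slope (beta1) for one exposure X.
# The data are hypothetical and serve only to illustrate the arithmetic.

def fit_line(x, y):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
beta0, beta1 = fit_line(x, y)
print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}")  # beta0 = 1.06, beta1 = 1.97
```

The same numbers would come out whether X causes Y, Y causes X, or both share a common cause.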

Consider two different bivariate associations: 1) Relation between ecologic levels of TV viewing and ecologic rates of dental caries, by country. [Figure: frequency measure of dental caries plotted against hours of television viewing per capita.] β = expected change in outcome per unit change in exposure.

Consider two different bivariate associations: 2) Relation between ecologic levels of refined sugar consumption and ecologic rates of dental caries, by country. [Figure: frequency measure of dental caries plotted against daily grams of refined sugar consumed per capita.] β = expected change in outcome per unit change in exposure. Adjust for (condition on) level of socioeconomic development, and find that Pr(Y|SET[X=x]) is null in scenario #1 and non-null in scenario #2.
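The contrast between the two scenarios can be sketched with a small made-up data set. Everything below is hypothetical: the eight "countries", the binary development level D, and the construction in which caries depends on sugar only while TV viewing merely tracks development. The crude TV slope is positive but vanishes within levels of D; the sugar slope survives stratification.

```python
# Hypothetical country-level rows: (D, tv_hours, sugar_grams, caries).
# D is development level (0 = lower, 1 = higher). Caries is built to depend
# on sugar only; TV viewing rises with development but has no effect.
countries = [
    (0, 1, 20, 4.0), (0, 2, 20, 4.0), (0, 1, 40, 6.0), (0, 2, 40, 6.0),
    (1, 3, 60, 8.0), (1, 4, 60, 8.0), (1, 3, 80, 10.0), (1, 4, 80, 10.0),
]
TV, SUGAR, CARIES = 1, 2, 3  # column positions

def slope(x, y):
    """Least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def crude_slope(rows, col):
    """Slope of caries on the chosen exposure, ignoring development."""
    return slope([r[col] for r in rows], [r[CARIES] for r in rows])

def stratified_slopes(rows, col):
    """Slope of caries on the chosen exposure within each level of D."""
    return {d: crude_slope([r for r in rows if r[0] == d], col) for d in (0, 1)}

tv_crude = crude_slope(countries, TV)
tv_within = stratified_slopes(countries, TV)
sugar_crude = crude_slope(countries, SUGAR)
sugar_within = stratified_slopes(countries, SUGAR)

print(f"TV:    crude {tv_crude:.2f}, within D {[round(abs(v), 2) for v in tv_within.values()]}")
print(f"Sugar: crude {sugar_crude:.2f}, within D {[round(v, 2) for v in sugar_within.values()]}")
```

Stratifying on D is only one way to condition on development; a regression with D as a covariate would serve the same purpose in this toy setting.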

Read Pr(Y|SET[X=x]) as: Pr(Y|SET[X=x_1]) versus Pr(Y|SET[X=x_2]), where x_1 and x_2 are two levels at which you can intervene to set the exposure, and the contrast is usually a difference or ratio. Clearly, quantities intermediate between exposure and outcome are not "confounders"; they are just part of the mechanism through which the exposure has the effect that it has.

For example: X (cigarette tax) → Z (cigarette consumption) → Y (lung cancer mortality). The causal effect of manipulating cigarette tax is Pr(Y|SET[X=$1]) versus Pr(Y|SET[X=$2]). If these are the only three variables relevant to this problem, this causal effect is estimated without bias by the contrast of the observed probabilities, Pr(Y|X=$1) versus Pr(Y|X=$2), NOT by adjusting for the intermediate Z. In fact, the adjusted effect would be null, which may be very far from the truth.
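A small exact calculation illustrates the point. All probabilities below are hypothetical, as is the coding of each variable as binary (X: low or high tax, Z: low or high consumption, Y: death or not). Because Y depends on X only through Z, the unadjusted contrast shows the effect of the tax, while the contrast within levels of Z is zero.

```python
from itertools import product

# Hypothetical probabilities for the chain X -> Z -> Y (illustrative only).
P_X = {0: 0.5, 1: 0.5}               # X: 0 = low tax, 1 = high tax
P_Z1_GIVEN_X = {0: 0.6, 1: 0.3}      # Pr(Z=1 | X): high cigarette consumption
P_Y1_GIVEN_Z = {0: 0.01, 1: 0.05}    # Pr(Y=1 | Z): Y depends on X only via Z

def bern(p, value):
    """Probability that a Bernoulli(p) variable equals value."""
    return p if value == 1 else 1 - p

# Joint distribution Pr(X=x, Z=z, Y=y) implied by the chain.
joint = {
    (x, z, y): P_X[x] * bern(P_Z1_GIVEN_X[x], z) * bern(P_Y1_GIVEN_Z[z], y)
    for x, z, y in product((0, 1), repeat=3)
}

def pr_y1(x, z=None):
    """Pr(Y=1 | X=x), or Pr(Y=1 | X=x, Z=z) when z is given."""
    keep = [k for k in joint if k[0] == x and (z is None or k[1] == z)]
    numerator = sum(joint[k] for k in keep if k[2] == 1)
    denominator = sum(joint[k] for k in keep)
    return numerator / denominator

crude = pr_y1(1) - pr_y1(0)
adjusted = {z: pr_y1(1, z) - pr_y1(0, z) for z in (0, 1)}

print(f"unadjusted risk difference: {crude:+.3f}")  # -0.012
print("risk differences within levels of Z are zero:",
      all(abs(v) < 1e-12 for v in adjusted.values()))
```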

That’s exactly why we use this graphical language for encoding subject matter knowledge about causality. A way of communicating structural assumptions. Non-parametric. Cannot be deduced from the data.

Compare a graphical model with a typical parametric epidemiologic model, such as logistic regression. [Diagram: nodes X, Z, Y, and U.] The graphical model asserts only that Y = f(X, Z, ε_Y), and that X = f(U, ε_X) and Z = f(U, ε_Z). The logistic regression model makes MANY assertions, including the multiplicative interaction of X and Z, and the linearity of the ln(odds) of Y across all values of X and Z.
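The logistic regression equation from the slide did not survive in this transcript. A typical main-effects form is shown here as an assumption, not as the speaker's exact equation; the coefficient names are illustrative.

```latex
\ln\frac{\Pr(Y=1 \mid X, Z)}{1-\Pr(Y=1 \mid X, Z)} = \beta_0 + \beta_1 X + \beta_2 Z
\qquad\Longrightarrow\qquad
\mathrm{OR}_{X=1,Z=1\ \text{vs}\ X=0,Z=0} = e^{\beta_1 + \beta_2} = e^{\beta_1}\, e^{\beta_2}
```

Under this form the log odds change by the same amount for every unit of X and of Z, and the odds ratio for joint exposure is the product of the two separate odds ratios. Both are parametric commitments that the graph does not make.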

On the other hand, a graphical model can represent many structural relations that cannot be encoded in a typical statistical model. [Diagram: nodes Z, X, and Y.] The graphical model asserts that X = f(Z, ε_X). The logistic regression model cannot easily represent this constraint, even if it is known by the investigators to be true on subject matter grounds (e.g., Z = SEX, X = SMOKING).

Confounding
Confounding is a divergence between two kinds of conditional probability distributions of Y: the distribution given that we find X at the value x (estimable from the data), and the distribution given that we intervene to force X to take the value x. Confounding is the distinction between seeing and doing. [Diagram: nodes Z, X, and Y.]
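The divergence between seeing and doing can be computed exactly in a toy population. The structure assumed here (Z causes both X and Y, and X causes Y) and every probability are hypothetical; the slide's diagram is not preserved, so this is one plausible reading rather than the speaker's example.

```python
# Hypothetical structure: Z -> X, Z -> Y, and X -> Y (Z is a common cause).
P_Z1 = 0.5                         # Pr(Z=1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}    # Pr(X=1 | Z=z)

def p_y1(x, z):
    """Pr(Y=1 | X=x, Z=z) under the hypothetical structural model."""
    return 0.1 + 0.2 * x + 0.4 * z

def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1

def p_x_given_z(x, z):
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1 - p1

def seeing(x):
    """Pr(Y=1 | X=x): strata of Z are weighted by Pr(Z=z | X=x)."""
    p_x = sum(p_x_given_z(x, z) * p_z(z) for z in (0, 1))
    return sum(p_y1(x, z) * p_x_given_z(x, z) * p_z(z) / p_x for z in (0, 1))

def doing(x):
    """Pr(Y=1 | SET[X=x]): strata of Z are weighted by the marginal Pr(Z=z)."""
    return sum(p_y1(x, z) * p_z(z) for z in (0, 1))

seen = seeing(1) - seeing(0)
done = doing(1) - doing(0)
print(f"seeing: {seen:.2f}")  # 0.44
print(f"doing:  {done:.2f}")  # 0.20
```

The observed contrast is more than twice the interventional one because people found at X=1 are also more likely to have Z=1.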

Identification: Can we express the Y|SET(X=x) quantity in terms of observables?
Estimation: What is the actual numerical value of the contrast E(Y|SET(X=1)) - E(Y|SET(X=0))?
[Diagram: nodes Z, X, and Y.]
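One standard identification result, stated here under the assumptions that the measured Z is sufficient to control confounding of the X-Y relation, that Z is not affected by X, and that every level of X occurs within every level of Z, is the adjustment (standardization) formula. Whether those assumptions hold for the diagram on the slide cannot be checked from the transcript.

```latex
E\bigl(Y \mid \mathrm{SET}(X=x)\bigr) = \sum_{z} E(Y \mid X=x, Z=z)\,\Pr(Z=z)

E\bigl(Y \mid \mathrm{SET}(X=1)\bigr) - E\bigl(Y \mid \mathrm{SET}(X=0)\bigr)
  = \sum_{z} \bigl[\,E(Y \mid X=1, Z=z) - E(Y \mid X=0, Z=z)\,\bigr]\Pr(Z=z)
```

The right-hand sides contain only observable quantities, which is what identification means. Estimation then replaces each term with its sample analogue or a model-based prediction.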

Most causal inference methods assume that you have no unmeasured confounders: regression, propensity scores, marginal structural models, and G-methods (SNMs, G-formula, etc.). “Quasi-experimental” methods use structural assumptions to achieve identification even in the presence of unmeasured confounding: instrumental variables, regression discontinuity, fixed effects, and differences in differences.
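As one illustration of a quasi-experimental contrast, the arithmetic of differences in differences can be sketched with four made-up group means. The structural assumption doing the work is that, absent treatment, both groups would have changed by the same amount; stable differences between the groups, measured or not, then cancel. The numbers and names below are hypothetical.

```python
# Hypothetical mean outcomes; all four numbers are made up for illustration.
means = {
    ("treated", "pre"): 10.0,
    ("treated", "post"): 14.0,
    ("control", "pre"): 8.0,
    ("control", "post"): 9.0,
}

def diff_in_diff(m):
    """Change in the treated group minus change in the control group."""
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change

effect = diff_in_diff(means)
print(effect)  # 3.0
```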

Some causal inference methods achieve identification based on extrapolation of a parametric model. Semi-parametric methods (e.g., propensity scores, IPTW, TMLE) rely less on model form. Letting a computer pick the model reduces “wish bias”. Non-parametric methods (e.g., matching) require no model at all. Doubly robust methods require that at least one model be right, but not both. Computer-intensive methods (e.g., bootstrapping) reduce reliance on distributional assumptions.
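The doubly robust property can be checked exactly in a toy population. The sketch below uses the augmented inverse-probability-weighted (AIPW) form, which is one common doubly robust estimator and not necessarily the one the speaker had in mind. The population, the true effect of 0.2, and the deliberately wrong models are all hypothetical. Expectations are taken over the exact joint distribution, so no sampling error is involved.

```python
from itertools import product

# Hypothetical population: Z -> X, Z -> Y, X -> Y. True risk difference is 0.2.
P_Z1 = 0.5
TRUE_E = {0: 0.2, 1: 0.8}  # true propensity Pr(X=1 | Z=z)

def true_m(x, z):
    """True outcome regression Pr(Y=1 | X=x, Z=z)."""
    return 0.1 + 0.2 * x + 0.4 * z

def right_e(z):
    """Correct propensity model."""
    return TRUE_E[z]

def wrong_m(x, z):
    """Misspecified outcome model: ignores Z (uses crude Pr(Y=1 | X=x))."""
    return 0.62 if x == 1 else 0.18

def wrong_e(z):
    """Misspecified propensity model: ignores Z (uses marginal Pr(X=1))."""
    return 0.5

def bern(p, value):
    return p if value == 1 else 1 - p

# Exact joint distribution over (z, x, y).
joint = {
    (z, x, y): bern(P_Z1, z) * bern(TRUE_E[z], x) * bern(true_m(x, z), y)
    for z, x, y in product((0, 1), repeat=3)
}

def aipw(m, e):
    """AIPW risk difference using outcome model m and propensity model e."""
    total = 0.0
    for (z, x, y), p in joint.items():
        mu1 = m(1, z) + x * (y - m(1, z)) / e(z)
        mu0 = m(0, z) + (1 - x) * (y - m(0, z)) / (1 - e(z))
        total += p * (mu1 - mu0)
    return total

results = {
    "outcome right, propensity wrong": aipw(true_m, wrong_e),
    "outcome wrong, propensity right": aipw(wrong_m, right_e),
    "both wrong": aipw(wrong_m, wrong_e),
}
for label, value in results.items():
    print(f"{label}: {value:.2f}")
```

With either model correct the estimator returns the true 0.20; with both wrong it returns the confounded 0.44.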

Summary: Models can be used to parameterize associations between treatment and response variables. Decision makers need to interpret associations causally, to predict the change in Y that will occur under specific interventions on X. The validity of this causal interpretation is threatened by both systematic and random errors. The systematic errors are all functions of causal structure which cannot be deduced from the data. Most of the “complex” methods described are only “complex” because of time.