
SIMPSON’S PARADOX

Any statistical relationship between two variables may be reversed by including additional factors in the analysis. (Pearson et al. 1899; Yule 1903; Simpson 1951)

Application: The adjustment problem. Which factors should be included in the analysis?

EXAMPLES OF SIMPSON’S REVERSAL

e.g., UC Berkeley's alleged sex bias in graduate admission (Science ). Overall data showed a higher rate of admission among male applicants but, broken down by department, the data showed a slight bias in favor of admitting female applicants.

e.g., "reverse regression" ( ): Should one, in salary discrimination cases, compare salaries of equally qualified men and women or, instead, compare qualifications of equally paid men and women? (The two comparisons can lead to opposite conclusions.)

Practical dilemma: Why break down by department? How about by some other variable Z?
Find Z such that P(y | do(x)) = Σ_z P(y | x, z) P(z)
Solution: The back-door algorithm (Chapter 3).
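The adjustment formula above can be sketched in a few lines of Python. This is only an illustration: the counts are hypothetical (they are not taken from the slides), the helper name `adjusted` is made up for this example, and the calculation is meaningful as P(y | do(x)) only under the causal assumption that Z (here, gender) is a valid adjustment set, which the data alone cannot confirm.

```python
from fractions import Fraction

# Hypothetical counts, for illustration only: (x, z) -> (recovered, total)
counts = {
    ("drug", "male"): (7, 10),
    ("drug", "female"): (9, 30),
    ("no-drug", "male"): (18, 30),
    ("no-drug", "female"): (2, 10),
}

def adjusted(counts, x):
    """Return sum over z of P(y | x, z) * P(z), with P(z) taken from the whole sample."""
    n = sum(total for _, total in counts.values())
    result = Fraction(0)
    for z in {z for (_, z) in counts}:
        # P(z): share of all subjects that belong to stratum z
        p_z = Fraction(sum(t for (_, zz), (_, t) in counts.items() if zz == z), n)
        recovered, total = counts[(x, z)]
        # P(y | x, z): recovery rate inside the stratum
        result += Fraction(recovered, total) * p_z
    return result

print(adjusted(counts, "drug"), adjusted(counts, "no-drug"))
```

With these made-up counts the adjusted comparison favors the drug, even though the pooled recovery rates point the other way. Which of the two comparisons to trust is exactly the question the back-door criterion is meant to answer.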

PEARSON’S SHOCK: “SPURIOUS CORRELATION”

“We are thus forced to the conclusion that a mixture of heterogeneous groups, each of which exhibits in itself no organic correlation, will exhibit a greater or less amount of correlation. This correlation may properly be called spurious, yet as it is almost impossible to guarantee the absolute homogeneity of any community, our results for correlation are always liable to an error, the amount of which cannot be foretold. To those who persist on looking upon all correlation as cause and effect, the fact that correlation can be produced between two quite uncorrelated characters A and B by taking an artificial mixture of two closely allied races, must come as rather a shock.” [Pearson, Lee & Bramley-Moore (1899)]

1. Causation = perfect correlation
2. “Not all correlations are correlations” (Aldrich 1994)

SIMPSON’S PARADOX (1951 – 1994)

[Tables: recovery (R, not-R) by treatment (T, not-T) for males (M), plus the same table for females (not-M), equals the combined table.]

Legend: T = treated, not-T = not treated; R = recovered, not-R = dead; M = males, not-M = females.

Easy question ( ): When / why the reversal?
Harder questions (1994): Is the treatment useful? Which table to consult? Why is Simpson’s reversal a paradox?

SIMPSON’S REVERSAL

Group behavior:
Pr(recovery | drug, male) > Pr(recovery | no-drug, male)
Pr(recovery | drug, female) > Pr(recovery | no-drug, female)

Overall behavior:
Pr(recovery | drug) < Pr(recovery | no-drug)
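A small numeric check can make the reversal concrete. The counts below are hypothetical, chosen only so that the inequalities above hold; the names `data`, `rate`, and `pooled` are introduced here for illustration and do not come from the slides.

```python
from fractions import Fraction

# Hypothetical counts, for illustration only: group -> {arm: (recovered, total)}
data = {
    "male": {"drug": (7, 10), "no-drug": (18, 30)},
    "female": {"drug": (9, 30), "no-drug": (2, 10)},
}

def rate(recovered, total):
    """Exact recovery rate as a fraction."""
    return Fraction(recovered, total)

def pooled(arm):
    """Recovery rate for one arm after merging the two groups."""
    recovered = sum(data[g][arm][0] for g in data)
    total = sum(data[g][arm][1] for g in data)
    return rate(recovered, total)

# Group behavior: the drug arm does better inside every group.
group_favors_drug = all(
    rate(*data[g]["drug"]) > rate(*data[g]["no-drug"]) for g in data
)
# Overall behavior: the pooled comparison points the other way.
overall_favors_drug = pooled("drug") > pooled("no-drug")

print(group_favors_drug, overall_favors_drug)
```

The reversal arises here because the drug arm is mostly female (the group with the lower recovery rate) while the no-drug arm is mostly male, so the pooled rates are weighted differently in the two arms.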

TO ADJUST OR NOT TO ADJUST?

[Figure: two causal diagrams over Treatment (X) and Recovery (Y) with a third variable Z. In the first, Z is Gender; in the second, Z is a mediating factor.]

THE INEVITABLE CONCLUSION: THE PARADOX STEMS FROM CAUSAL INTERPRETATION

TWO PROOFS:
1. Surprise surfaces only when we speak about “efficacy,” not about evidence for prediction.
2. When two causal models generate the same statistical data, and in one we decide to use the drug yet in the other not to use it, our decision must be driven by causal and not by statistical considerations. Thus, there is no statistical criterion to warn us against consulting the wrong table.

Q. Can temporal information help?
A. No! See Figure 6.3 (c).

WHY TEMPORAL INFORMATION DOES NOT HELP

[Figure: four causal diagrams, (a) through (d), over C (Treatment), E (Recovery), and F. F is Gender in (a) and Blood Pressure in (b).]

In (c), F may occur before or after C, and the correct answer is to consult the combined table. In (d), F may occur before or after C, and the correct answer is to consult the F-specific tables.

WHY SIMPSON’S PARADOX EVOKES SURPRISE

1. People think causes, not proportions.
2. "Reversal" is possible in the calculus of proportions but impossible in the calculus of causes.

CAUSAL CALCULUS PROHIBITS REVERSAL

Group behavior:
Pr(recovery | do{drug}, male) > Pr(recovery | do{no-drug}, male)
Pr(recovery | do{drug}, female) > Pr(recovery | do{no-drug}, female)

Assumption:
Pr(male | do{drug}) = Pr(male | do{no-drug})

Overall behavior:
Pr(recovery | do{drug}) > Pr(recovery | do{no-drug})

THE SURE THING PRINCIPLE

Theorem: An action C that increases the probability of an event E in each subpopulation must also increase the probability of E in the population as a whole, provided that the action does not change the distribution of the subpopulations.
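The theorem can be checked with a short derivation. What follows is a sketch, not a quotation from the slides: it writes F for the variable indexing the subpopulations (as in the earlier diagrams), uses do(C) and do(¬C) for taking and withholding the action, and reads "the action does not change the distribution of the subpopulations" as P(F | do(C)) = P(F | do(¬C)).

```latex
\begin{align*}
P(E \mid do(C)) - P(E \mid do(\neg C))
  &= \sum_{F} P(E \mid do(C), F)\, P(F \mid do(C))
   - \sum_{F} P(E \mid do(\neg C), F)\, P(F \mid do(\neg C)) \\
  &= \sum_{F} \bigl[\, P(E \mid do(C), F) - P(E \mid do(\neg C), F) \,\bigr]\, P(F \mid do(C)) \\
  &> 0 .
\end{align*}
```

The second line uses the assumption to merge the two sums under a common weight. The last line holds because every bracketed term is positive by hypothesis and the weights are nonnegative and sum to one. In the purely statistical version the weights are P(F | C) and P(F | ¬C), which may differ, and that difference is what leaves room for a reversal.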