Professor B. Jones University of California, Davis.



 Pitfalls and Paradoxes  The Concept of a “Lurker”

 Bottom of the ninth, down by 1 run  Two Outs  Runners on second and third  …and the pitcher is up  You have only two players left  …and this is the National League.  What will you do?

 Player 1: 110 hits from 500 at bats.  Player 2: 280 hits from 1200 at bats.  Their “batting average”  Player 1: 110/500=.220  Player 2: 280/1200=.233  Who would you choose?  On batting average, Player 2 > Player 1
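The arithmetic on this slide can be checked with a short sketch. It assumes the pairing implied by the averages and the conclusion on the slide (Player 1: 110/500, Player 2: 280/1200); the names `totals` and `batting_average` are illustrative only.

```python
# Overall totals as (hits, at_bats), assuming the pairing implied by the
# averages quoted on the slide.
totals = {
    "Player 1": (110, 500),
    "Player 2": (280, 1200),
}


def batting_average(hits, at_bats):
    """Batting average is hits divided by at bats."""
    return hits / at_bats


averages = {name: batting_average(h, ab) for name, (h, ab) in totals.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")

# The player with the higher overall average.
best = max(averages, key=averages.get)
print("Higher overall average:", best)
```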

 Both players are switch-hitters (they can bat from the left or right side of the plate)  We’ll go “money ball” and play the best match-up  The data: at bats and hits for Player 1 and Player 2, each split by side of the plate (From Right, From Left). [Table values not preserved in this transcript.]

 What happened?  Not accounting for switch hitting, Player 2 is preferred to Player 1  When accounting for switch hitting, Player 1 is preferred to Player 2  Worse! From either side of the plate, we would conclude Player 1 is better than Player 2 even though Player 2’s overall batting average is higher!
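The reversal described here can be reproduced with a small sketch. The per-side numbers below are HYPOTHETICAL: the slide's own table values are not available, so these were invented only so that they add up to overall totals of 110/500 for Player 1 and 280/1200 for Player 2.

```python
# HYPOTHETICAL split data as (hits, at_bats) per side of the plate.
# Invented for illustration; they sum to 110/500 and 280/1200 overall.
splits = {
    "Player 1": {"right": (80, 400), "left": (30, 100)},
    "Player 2": {"right": (19, 100), "left": (261, 1100)},
}


def average(hits, at_bats):
    return hits / at_bats


def overall(player):
    """Pool both sides of the plate, then compute the average."""
    hits = sum(h for h, _ in splits[player].values())
    at_bats = sum(ab for _, ab in splits[player].values())
    return average(hits, at_bats)


for side in ("right", "left"):
    p1 = average(*splits["Player 1"][side])
    p2 = average(*splits["Player 2"][side])
    print(f"From {side}: Player 1 {p1:.3f} vs Player 2 {p2:.3f}")

print(f"Overall: Player 1 {overall('Player 1'):.3f} "
      f"vs Player 2 {overall('Player 2'):.3f}")
```

With these invented numbers, Player 1 is ahead from each side, yet Player 2 is ahead overall, because most of Player 2's at bats come from the side where both players hit better.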

 University Admission Statistics  1000 women apply, 1000 men apply  Admission Rate:  Women: 510/1000=51 percent  Men: 800/1000=80 percent  Conclusion?  Evidence of gender bias?  This was the basis of the U.C. Berkeley gender bias case in the 1970s Source:

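 Students apply to one of two colleges, College A or College B.  The Admissions Data: for College A, College B, and the Total, the number who Applied, the number Accepted, and the admission Rate, shown separately for Female and Male applicants. [Most table values not preserved in this transcript; the surviving entries are 20 and 100% in the College B row.]  Findings?  Admission Rate for each college is higher for women than men.  Overall admission rate is higher for men.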
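The two findings on this slide can be illustrated with a sketch. The per-college counts below are HYPOTHETICAL: they are not taken from the slide's table, and were chosen only to reproduce overall rates of 51 percent for women and 80 percent for men with 1000 applicants each.

```python
# HYPOTHETICAL counts as (applied, accepted) per gender and college.
# Chosen so the totals come to 510/1000 (women) and 800/1000 (men).
admissions = {
    "Female": {"A": (980, 490), "B": (20, 20)},
    "Male": {"A": (200, 80), "B": (800, 720)},
}


def rate(applied, accepted):
    return accepted / applied


def total_rate(gender):
    """Pool both colleges, then compute the admission rate."""
    applied = sum(a for a, _ in admissions[gender].values())
    accepted = sum(acc for _, acc in admissions[gender].values())
    return rate(applied, accepted)


for college in ("A", "B"):
    women = rate(*admissions["Female"][college])
    men = rate(*admissions["Male"][college])
    print(f"College {college}: women {women:.0%}, men {men:.0%}")

print(f"Overall: women {total_rate('Female'):.0%}, "
      f"men {total_rate('Male'):.0%}")
```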

 Two preceding examples illustrate Simpson’s Paradox  Named for E.H. Simpson (based on 1951 paper)  Phenomenon has been known since at least 1899 (and Yule 1903 published a paper on it).  Why a paradox?  The result is counterintuitive.

 The Paradox:  A “reversal result”  The relationship between two variables found within sub-groups differs in direction when the sub-groups are combined  Batting Averages on Left/Right Side vs. Overall  Gender admissions by college vs. Overall Gender Admission Rate  Consider admissions data again.

 Our example  The “model”: Admission Rate=f(Gender)  Gender Bias Hypothesis: Admission rates of women will be lower than men.  Y=Admission Rate; X=Gender  Data seem consistent with the hypothesis.  The Problem:  There is a third variable; what is it?  College to which students applied (A vs. B)  Z=College

 The Problem is Simple  (A) There is a strong association between Y and Z  One college (B) is easier to “get into” than the other college (A)  (B) There is a strong association between X and Z  Women tend to apply to the harder college (A) at higher rates; men tend to apply to the easier college (B) at higher rates.  Therefore, because of (A) and (B), there is a strong connection between Y and X  This connection, however, is spurious.
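One way to see how conditions (A) and (B) combine is to write each gender's overall admission rate as a weighted average of the within-college rates, weighted by where that gender applies. The sketch below uses HYPOTHETICAL counts (not the slide's table) that total 510/1000 for women and 800/1000 for men; the variable names are illustrative.

```python
# HYPOTHETICAL counts as (applied, accepted); totals are 510/1000 and 800/1000.
data = {
    "Female": {"A": (980, 490), "B": (20, 20)},
    "Male": {"A": (200, 80), "B": (800, 720)},
}

# (A) Association between Y and Z: pooled admission rate of each college.
college_rate = {}
for z in ("A", "B"):
    applied = sum(data[g][z][0] for g in data)
    accepted = sum(data[g][z][1] for g in data)
    college_rate[z] = accepted / applied

# (B) Association between X and Z: share of each gender's applications
# that go to each college.
share = {}
for g, by_college in data.items():
    n = sum(applied for applied, _ in by_college.values())
    share[g] = {z: applied / n for z, (applied, _) in by_college.items()}

# Overall rate = sum over colleges of (application share) x (within-college rate).
overall = {
    g: sum(share[g][z] * (data[g][z][1] / data[g][z][0]) for z in data[g])
    for g in data
}

print("College rates:", {z: round(r, 3) for z, r in college_rate.items()})
print("Share applying to A:", {g: share[g]["A"] for g in share})
print("Overall rates:", {g: round(r, 2) for g, r in overall.items()})
```

In this invented data, women's overall rate is pulled toward the low rate of College A because nearly all of their applications go there, even though their rate is higher than men's within each college.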

 The Nature of the Problem  [Diagram relating Gender (X), Admission Rate (Y), and College (Z)]

 Beware the Lurker Variable  Lurker Variable:  A lurking variable (confounding factor or variable, or simply a confound or confounder) is a "hidden" variable in a statistical or research model that affects the variables in question but is not known or acknowledged, and thus (potentially) distorts the resulting data. This hidden third variable causes the two measured variables to falsely appear to be in a causal relation. Such a relation between two observed variables is termed a spurious relationship. (Source: http://en.wikipedia.org/wiki/Confounder)  The Problem: Z is a confounder. If we had accounted for Z, we would have arrived at different conclusions.

 Berkeley, 1973  Gender bias not found when accounting for departmental admission rates  Interestingly, it was found that women tended to apply to more difficult graduate programs than men.  Across departments, graduate admission rates were higher for women.  Not accounting for departmental differences, gender bias appeared

 Combining sub-groups (aggregation) can lead to serious inferential problems  ESPECIALLY if the presence of lurking variables is not accounted for  Large samples with lots of subgroups can lead to these kinds of problems  Simpson’s Paradox is a real concern  …but often not recognized.