

Statistics That Deceive

Simpson’s Paradox
• It is a widely accepted rule that the larger the data set, the better the results
• Simpson’s Paradox demonstrates that a great deal of care has to be taken when combining smaller data sets into a larger one
• Sometimes the conclusion drawn from the larger data set is the opposite of the conclusion drawn from each of the smaller data sets

Example: Simpson’s Paradox

Baseball batting statistics for two players:

              First Half   Second Half   Total Season
Carson           .400         .250          .264
Kennington       .350         .200          .336

How could Carson beat Kennington in each half individually, yet have a lower total-season batting average?

Example Continued

              First Half      Second Half     Total Season
Carson        4/10 (.400)     25/100 (.250)   29/110 (.264)
Kennington    35/100 (.350)   2/10 (.200)     37/110 (.336)

We weren’t told how many at-bats each player had in each half. Carson’s dismal second half and Kennington’s great first half (100 at-bats each) carry far more weight than the other two values (10 at-bats each).
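The arithmetic behind the batting table can be checked with a short Python sketch. The hits and at-bats come from the slide; the names `records`, `batting_average`, and `season_average` are illustrative choices, not part of the original presentation.

```python
from fractions import Fraction

# (hits, at_bats) for each half of the season, as given in the table.
records = {
    "Carson": {"first": (4, 10), "second": (25, 100)},
    "Kennington": {"first": (35, 100), "second": (2, 10)},
}


def batting_average(hits, at_bats):
    """Exact batting average as a fraction: hits divided by at-bats."""
    return Fraction(hits, at_bats)


def season_average(halves):
    """Pool hits and at-bats over both halves, then divide once."""
    total_hits = sum(hits for hits, _ in halves.values())
    total_at_bats = sum(at_bats for _, at_bats in halves.values())
    return batting_average(total_hits, total_at_bats)


for player, halves in records.items():
    first = float(batting_average(*halves["first"]))
    second = float(batting_average(*halves["second"]))
    season = float(season_average(halves))
    print(f"{player}: first {first:.3f}, second {second:.3f}, season {season:.3f}")
# Carson: first 0.400, second 0.250, season 0.264
# Kennington: first 0.350, second 0.200, season 0.336
```

The season figure is not the mean of the two half-season averages. It pools the raw counts, so the half with 100 at-bats dominates the half with 10.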

Another Example

Average college physics grades for students in an engineering program:

                      HS Physics   No HS Physics
Number of Students        50             5
Average Grade             80            70

Average college physics grades for students in a liberal arts program:

                      HS Physics   No HS Physics
Number of Students         5            50
Average Grade             95            85

It appears that in both programs, taking high school physics improves your college physics grade by 10 points.

Example continued

In order to get better results, let’s combine our datasets. In particular, let’s combine all the students who took high school physics: the engineering students who took high school physics together with the liberal arts students who took high school physics. Likewise, combine the engineering students who did not take high school physics with the liberal arts students who did not.

But be careful! You can’t just take the average of the two averages, because the two groups being combined contain different numbers of students.

Example continued

Average college physics grades for students who took high school physics:

               # Students   Grade   Weighted Grade
Engineering        50         80    50/55*80 = 72.7
Lib Arts            5         95     5/55*95 =  8.6
Total              55               Average (72.7 + 8.6) = 81.3

Average college physics grades for students who did not take high school physics:

               # Students   Grade   Weighted Grade
Engineering         5         70     5/55*70 =  6.4
Lib Arts           50         85    50/55*85 = 77.3
Total              55               Average (6.4 + 77.3) = 83.7

Did the students who did not have high school physics actually do better?
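The weighting step can be sketched in a few lines of Python. The student counts and grades are the ones on the slides; the helper name `weighted_mean` is a hypothetical choice for illustration.

```python
def weighted_mean(counts, grades):
    """Average of `grades`, each weighted by its share of the total count."""
    total = sum(counts)
    return sum(n / total * g for n, g in zip(counts, grades))


# Order within each list: engineering, liberal arts.
took_hs_physics = weighted_mean([50, 5], [80, 95])
no_hs_physics = weighted_mean([5, 50], [70, 85])

# The mistake the slides warn against: averaging the two averages directly.
naive_took = (80 + 95) / 2
naive_no = (70 + 85) / 2

print(f"weighted: took HS physics {took_hs_physics:.2f}, did not {no_hs_physics:.2f}")
print(f"naive:    took HS physics {naive_took:.2f}, did not {naive_no:.2f}")
# weighted: took HS physics 81.36, did not 83.64
# naive:    took HS physics 87.50, did not 77.50
```

Without intermediate rounding the weighted averages are 81.36 and 83.64. The slide figures of 81.3 and 83.7 come from adding terms that were already rounded to one decimal place. The naive average of averages points the other way entirely, which is why the group sizes cannot be ignored.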

Example another way

Average college physics grades for students who took high school physics:

               # Students   Grade   Grade Pts
Engineering        50         80       4000
Lib Arts            5         95        475
Total              55                  4475

Average = 4475/55 = 81.36 (the 81.3 found earlier, up to rounding)

Average college physics grades for students who did not take high school physics:

               # Students   Grade   Grade Pts
Engineering         5         70        350
Lib Arts           50         85       4250
Total              55                  4600

Average = 4600/55 = 83.64 (the 83.7 found earlier, up to rounding)

Did the students who did not have high school physics actually do better?

The Problem
• There are two problems with combining the data:
  - Each combined table is dominated by one type of student: 50 of the 55 students who took high school physics are engineering students, while 50 of the 55 who did not are liberal arts students
  - The engineering students had a more rigorous physics class than the liberal arts students, so the program is a hidden variable
• So be very careful when you combine data into a larger set
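Both problems show up when the comparison is run within each program and then on the pooled data. The sketch below uses the counts and grades from the physics example; the dictionary layout and the function names are assumptions made for illustration.

```python
# (program, took_hs_physics) -> (number of students, average grade)
data = {
    ("Engineering", True): (50, 80),
    ("Engineering", False): (5, 70),
    ("Liberal Arts", True): (5, 95),
    ("Liberal Arts", False): (50, 85),
}
programs = ("Engineering", "Liberal Arts")


def within_program_gap(program):
    """Grade advantage of HS-physics students inside a single program."""
    return data[(program, True)][1] - data[(program, False)][1]


def pooled_average(took_hs):
    """Average grade over both programs: total grade points / total students."""
    rows = [value for (_, hs), value in data.items() if hs == took_hs]
    students = sum(n for n, _ in rows)
    grade_points = sum(n * grade for n, grade in rows)
    return grade_points / students


def engineering_share(took_hs):
    """Fraction of a pooled group that comes from the engineering program."""
    total = sum(data[(p, took_hs)][0] for p in programs)
    return data[("Engineering", took_hs)][0] / total


gaps = {p: within_program_gap(p) for p in programs}
pooled_gap = pooled_average(True) - pooled_average(False)

print(gaps)                                # {'Engineering': 10, 'Liberal Arts': 10}
print(f"pooled gap: {pooled_gap:.2f}")     # pooled gap: -2.27
print(f"engineering share: {engineering_share(True):.0%} vs {engineering_share(False):.0%}")
```

High school physics is worth 10 points inside each program, yet the pooled gap is negative. The pooled HS-physics group is mostly engineering students from the harder class, and the pooled no-HS-physics group is mostly liberal arts students.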

More …
• There are many real examples of this type of situation, which leads to an apparent contradiction
• The deceptive result comes down to this [remember this]: if you view the same data in 2 different ways, or break it into 2 different parts, you CAN get different results!