ISP 121 Statistics That Deceive


Simpson’s Paradox

It is a widely accepted rule of thumb that the larger the data set, the better the results. Simpson’s Paradox demonstrates that a great deal of care has to be taken when combining smaller data sets into a larger one. Sometimes the conclusion drawn from the larger data set is the opposite of the conclusion drawn from each of the smaller data sets.

Example: Simpson’s Paradox

Average college physics grades for students in an engineering program:

                      HS Physics   None
Number of Students    50           5
Average Grade         80           70

Average college physics grades for students in a liberal arts program:

                      HS Physics   None
Number of Students    5            50
Average Grade         95           85

It appears that in both programs, taking high school physics improves your college physics grade by 10 points.

Example continued

In order to get better results, let’s combine our data sets. In particular, let’s combine all the students who took high school physics: the engineering students who took high school physics together with the liberal arts students who took high school physics. Likewise, combine the engineering students who did not take high school physics with the liberal arts students who did not take high school physics. But be careful! You can’t just take the average of the two averages, because each data set has a different number of students. Each average has to be weighted by the number of students behind it.

Example continued

Average college physics grades for students who took high school physics:

               # Students   Grade   Grade Pts
Engineering    50           80      4000
Lib Arts       5            95      475
Total          55                   4475

Average = 4475/55 = 81.4

Average college physics grades for students who did not take high school physics:

               # Students   Grade   Grade Pts
Engineering    5            70      350
Lib Arts       50           85      4250
Total          55                   4600

Average = 4600/55 = 83.6

Did the students who did not have high school physics actually do better?
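The weighted averages above can be checked with a short Python sketch. The dictionary keys and the function name pooled_average are illustrative choices, not names from the slides; the student counts and average grades are the ones given in the example.

```python
# Per-group data from the example: (number of students, average grade).
# The keys are illustrative labels, not names used in the slides.
groups = {
    ("engineering", "hs_physics"): (50, 80),
    ("engineering", "none"): (5, 70),
    ("liberal_arts", "hs_physics"): (5, 95),
    ("liberal_arts", "none"): (50, 85),
}


def pooled_average(background):
    """Average grade across both programs, weighted by number of students."""
    rows = [value for (_, b), value in groups.items() if b == background]
    total_students = sum(n for n, _ in rows)
    total_points = sum(n * avg for n, avg in rows)  # grade points
    return total_points / total_students


hs = pooled_average("hs_physics")
no_hs = pooled_average("none")
print(round(hs, 1), round(no_hs, 1))  # prints 81.4 83.6
```

Within each program the high school physics group scores 10 points higher, yet the pooled average is lower, because most of the pooled high school physics group comes from the lower-scoring engineering program.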

The Problem

There are two problems with combining the data:
- Each combined table is dominated by one type of student (engineering students among those who took high school physics, liberal arts students among those who did not).
- The engineering students had a more rigorous physics class than the liberal arts students, so the program is a hidden variable.

So be very careful when you combine data into a larger set.

IT 121 Statistics That Deceive, Part Two

Tumors and Cancer

Most people associate tumors with cancer, but not all tumors are cancerous. Tumors caused by cancer are malignant. Non-cancerous tumors are benign.

Mammograms

Suppose your patient has a breast tumor. Is it cancerous? Probably not. Studies have shown that only about 1 in 100 breast tumors turn out to be malignant. Nonetheless, you order a mammogram. Suppose the mammogram comes back positive. Does the patient have cancer?

Accuracy

Earlier mammogram screening was 85% accurate. An accuracy of 85% would lead you to think that if you tested positive, there is a pretty good chance that you have cancer. But this is not true.

Actual Results

Consider a study in which mammograms are given to 10,000 women with breast tumors. Assume that 1% of the tumors are malignant (100 women actually have cancer, 9900 have benign tumors).

Actual Results

Mammogram screening correctly identifies 85% of the 100 malignant tumors as malignant (85 women). These are called true positives. The other 15% (15 women) get negative results even though they actually have cancer. These are called false negatives.

Actual Results

Mammogram screening correctly identifies 85% of the 9900 benign tumors as benign. Thus it gives negative (benign) results for 85% of 9900, or 8415 women. These are called true negatives. The other 15% of the 9900 (1485 women) get positive results, in which the mammogram incorrectly suggests their tumors are malignant. These are called false positives.

Note: Start with 10,000 samples. 1 in 100 are malignant, so that gives you 100 total malignant. 99 in 100 are benign, so that gives you 9900 total benign. We are told that a mammogram is 85% accurate, so 85% of 100 gives you 85 true positives. Likewise, 85% of 9900 gives you 8415 true negatives.
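The bookkeeping in the note can be reproduced with a few lines of Python. This is a minimal sketch under the slides’ assumption that the same 85% accuracy applies to malignant and benign tumors alike; the variable names are illustrative.

```python
total = 10_000
malignant = total // 100      # 1 in 100 tumors are malignant -> 100
benign = total - malignant    # the remaining 9900 are benign

# Assumption from the slides: 85% accuracy for both groups.
true_positives = malignant * 85 // 100          # malignant, flagged malignant
false_negatives = malignant - true_positives    # malignant, missed
true_negatives = benign * 85 // 100             # benign, reported benign
false_positives = benign - true_negatives       # benign, flagged malignant

print(true_positives, false_negatives, true_negatives, false_positives)
# prints 85 15 8415 1485
```

Integer arithmetic is used so that the counts come out as exact whole numbers of women.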

Results

Overall, the mammogram screening gives positive results to 85 women who actually have cancer and to 1485 women who do not have cancer. The total number of positive results is 1570. Only 85 of these are true positives, so the fraction of positive results that actually indicate cancer is 85/1570.

Results

Thus, the chance that a positive result really means cancer is only 5.4%. Therefore, when your patient’s mammogram comes back positive, you should reassure her that there’s still only a small chance that she has cancer.
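The same 5.4% figure can be written in terms of probabilities rather than counts. The slides do not name the technique, but this is Bayes’ rule applied to a 1% malignancy rate and an 85% accuracy in both directions; the event labels (cancer, benign, +) are notation introduced here.

```latex
\begin{align*}
P(\text{cancer} \mid +)
  &= \frac{P(+ \mid \text{cancer})\,P(\text{cancer})}
          {P(+ \mid \text{cancer})\,P(\text{cancer}) + P(+ \mid \text{benign})\,P(\text{benign})} \\
  &= \frac{0.85 \times 0.01}{0.85 \times 0.01 + 0.15 \times 0.99}
   = \frac{0.0085}{0.1570}
   \approx 0.054
\end{align*}
```

Multiplying the numerator and denominator by 10,000 recovers the counts used above, 85/1570.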

Another Question

Suppose you are a doctor seeing a patient with a breast tumor. Her mammogram comes back negative. Based on the numbers above, what is the chance that she has cancer?

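Answer

Of the 8430 negative results (15 false negatives plus 8415 true negatives), 15 belong to women who have cancer. The chance is 15/8430, or about 0.0018, or slightly less than 2 in 1000.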
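The negative-result case can also be computed directly from the malignancy rate and the test accuracy. The function below is a hypothetical helper written for illustration, and it assumes, as the slides do, that one accuracy figure applies to both malignant and benign tumors.

```python
def chance_of_cancer_given_negative(prevalence, accuracy):
    """Fraction of negative results that belong to patients with cancer.

    prevalence: fraction of tumors that are malignant
    accuracy:   fraction of tumors of either kind classified correctly
                (a simplifying assumption taken from the slides)
    """
    false_negative = prevalence * (1 - accuracy)   # malignant but missed
    true_negative = (1 - prevalence) * accuracy    # benign, reported benign
    return false_negative / (false_negative + true_negative)


p = chance_of_cancer_given_negative(prevalence=0.01, accuracy=0.85)
print(round(p, 4))  # prints 0.0018
```

Changing the prevalence argument shows how strongly the answer depends on how rare malignant tumors are to begin with.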