
1 LSP 121 Statistics That Deceive

2 Simpson’s Paradox
It is commonly assumed that the larger the data set, the better the results. Simpson’s Paradox demonstrates that a great deal of care has to be taken when combining smaller data sets into a larger one. Sometimes the conclusion drawn from the larger data set is the opposite of the conclusion drawn from each of the smaller data sets.

3 Example: Simpson’s Paradox
Baseball batting statistics for two players:

            First Half   Second Half   Total Season
Player A    .400         .250          .264
Player B    .350         .200          .336

How could Player A beat Player B in both halves individually, but then have a lower total season batting average?

4 Example Continued
We weren’t told how many at bats each player had:

            First Half      Second Half     Total Season
Player A    4/10 (.400)     25/100 (.250)   29/110 (.264)
Player B    35/100 (.350)   2/10 (.200)     37/110 (.336)

Player A’s dismal second half and Player B’s great first half each cover 100 at bats, so they carry far more weight in the season totals than the other two values, which cover only 10 at bats each.
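The arithmetic behind the table can be checked with a short Python sketch. The hits and at bats are the ones given on the slide; the dictionary layout and the function name `season_average` are illustrative choices, not part of the original presentation.

```python
# (hits, at_bats) for each half of the season, taken from the slide's table.
stats = {
    "Player A": {"first": (4, 10), "second": (25, 100)},
    "Player B": {"first": (35, 100), "second": (2, 10)},
}

def season_average(halves):
    """Pool hits and at bats across both halves before dividing."""
    total_hits = sum(hits for hits, _ in halves.values())
    total_at_bats = sum(at_bats for _, at_bats in halves.values())
    return total_hits / total_at_bats

for player, halves in stats.items():
    first = halves["first"][0] / halves["first"][1]
    second = halves["second"][0] / halves["second"][1]
    season = season_average(halves)
    print(f"{player}: first {first:.3f}, second {second:.3f}, season {season:.3f}")
```

Player A wins each half, yet the pooled season average favors Player B, because each player's 100 at-bat half dominates his total.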

5 Another Example
Average college physics grades for students in an engineering program:

                      Taken HS physics   No HS physics
Number of Students    50                 5
Average Grade         80                 70

Average college physics grades for students in a liberal arts program:

                      Taken HS physics   No HS physics
Number of Students    5                  50
Average Grade         95                 85

It appears that in both programs, taking high school physics improves your college physics grade by 10 points.

6 Example continued
In order to get better results, let’s combine our data sets. In particular, let’s combine all the students who took high school physics: the engineering students who took it together with the liberal arts students who took it. Likewise, combine the engineering students who did not take high school physics with the liberal arts students who did not. But be careful! You can’t just take the average of the two averages, because each data set has a different number of values. Each average has to be weighted by the number of students behind it.

7 Example continued
Average college physics grades for students who took high school physics:

               # Students   Avg Grade   Weighted Grade
Engineering    50           80          50/55 * 80 = 72.7
Lib Arts       5            95          5/55 * 95 = 8.6
Total          55                       Average (72.7 + 8.6) = 81.3

Average college physics grades for students who did not take high school physics:

               # Students   Avg Grade   Weighted Grade
Engineering    5            70          5/55 * 70 = 6.4
Lib Arts       50           85          50/55 * 85 = 77.3
Total          55                       Average (6.4 + 77.3) = 83.7

Did the students who did not have high school physics actually do better?
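The weighted averages can be reproduced with a minimal Python sketch. The student counts and grades come from the slides; the helper name `pooled_average` and the list layout are assumptions made for illustration. Computed without intermediate rounding, the combined averages come out to about 81.4 and 83.6, which differ slightly from the 81.3 and 83.7 obtained by adding the already-rounded weighted grades.

```python
def pooled_average(groups):
    """Weighted mean of group averages; groups is a list of (n_students, avg_grade)."""
    total_students = sum(n for n, _ in groups)
    return sum(n * avg for n, avg in groups) / total_students

# (number of students, average grade) for engineering, then liberal arts.
took_hs_physics = [(50, 80), (5, 95)]
no_hs_physics = [(5, 70), (50, 85)]

# Naive (unweighted) average of the two averages, shown only for contrast.
naive_took = sum(avg for _, avg in took_hs_physics) / len(took_hs_physics)
naive_no = sum(avg for _, avg in no_hs_physics) / len(no_hs_physics)

print(f"took HS physics: naive {naive_took:.1f}, weighted {pooled_average(took_hs_physics):.1f}")
print(f"no HS physics:   naive {naive_no:.1f}, weighted {pooled_average(no_hs_physics):.1f}")
```

The naive averages (87.5 versus 77.5) preserve the 10-point advantage seen within each program, while the weighted averages reverse it.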

8 The Problem
Two problems with combining the data:
- Each combined table was dominated by one type of student: most of the students who took high school physics were in engineering, and most of those who did not were in liberal arts.
- The engineering students had a more rigorous physics class than the liberal arts students, so the program is a hidden variable.
So be very careful when you combine data into a larger set.

