Published by Angel Walsh. Modified over 9 years ago.
1
1.9 Comparing Two Data Sets
2
Revisiting Go For the Gold!
3a) Whose slope is larger?
The women’s distance is growing faster (0.01875 m/year > 0.01353 m/year).
3
Revisiting Go For the Gold!
3b) Residuals?
The men’s residuals seem slightly more scattered.
The women’s residuals have more of a pattern.
4
Revisiting Go For the Gold!
3b) r values?
The women have a strong positive linear correlation (r = 0.83).
The men have a weaker but still strong positive linear correlation (r = 0.70).
5
Revisiting Go For the Gold!
3b) r² values?
About 69% of the variation in the women’s distances is explained by the linear relationship with year (r² = 0.69).
The men’s fit is much less reliable: only 49% of the variation in the men’s distances is explained by year (r² = 0.49).
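The r² figures on this slide follow from squaring the r values on the previous slide. A minimal sketch of that arithmetic, where the function name `r_squared` is only an illustrative choice:

```python
def r_squared(r):
    """Coefficient of determination for a simple linear fit, given Pearson's r."""
    return r * r


# r values quoted on the slides
women_r = 0.83
men_r = 0.70

for label, r in (("women", women_r), ("men", men_r)):
    explained = r_squared(r)
    print(f"{label}: r = {r:.2f}, r^2 = {explained:.2f}, "
          f"unexplained = {1 - explained:.2f}")
```

The unexplained share (1 minus r²) is the part of the variation the straight line does not account for.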
6
Revisiting Go For the Gold!
3b) Predicted values?
Both predicted values for 2012 are off by more than 1 m!
Neither model is extremely reliable. The women’s model seems to be generally more reliable, although the prediction is worse in the women’s case.
7
Revisiting Go For the Gold!
3c) y-intercepts?
In year 0, the men and women would be jumping backwards!
The intercepts do not have much meaning.
They show the limits to the predictive value of this model.
8
Revisiting Go For the Gold!
3d) When will they jump equal distances?
Find the year when their distances are the same.
Men: y = 0.01353x - 18.43
Women: y = 0.01875x - 30.3
0.01353x - 18.43 = 0.01875x - 30.3
0.01353x - 0.01875x = -30.3 + 18.43
-0.00522x = -11.87
x ≈ 2273.9
The men and women will jump equal distances by the year 2274. Don’t wait up.
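The same calculation can be checked numerically. This is a small sketch that solves m1·x + b1 = m2·x + b2 for x, using the slopes and intercepts from the slide; the function name `intersection_x` is hypothetical.

```python
def intersection_x(m1, b1, m2, b2):
    """Return the x where y = m1*x + b1 and y = m2*x + b2 meet."""
    if m1 == m2:
        raise ValueError("parallel lines never intersect")
    # m1*x + b1 = m2*x + b2  ->  x * (m1 - m2) = b2 - b1
    return (b2 - b1) / (m1 - m2)


# Regression lines from the slide: distance (m) as a function of year
men_slope, men_intercept = 0.01353, -18.43
women_slope, women_intercept = 0.01875, -30.3

year = intersection_x(men_slope, men_intercept, women_slope, women_intercept)
print(f"Lines cross at x = {year:.1f}")
```

Because the crossing point lies far outside the years covered by the data, it is an extrapolation and should be read with the same caution as the y-intercepts.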
9
4. Comparing the two sets of data
For each metre the men’s distance increases, the women’s increases by 0.95 m.
When the men jump 0 m, the women jump -1.1 m. Backwards?
Or just that the women’s distances will generally be less than the men’s, which seems reasonable.
Remember we are comparing like quantities.
The y-intercept lowers the line.
10
4. Comparing the two sets of data
r = 0.81: strong positive linear correlation.
r² = 0.66: the remaining 34% of the variation in the women’s distances is not explained by the men’s distances.
Residuals: scattered, so a linear fit is a good model.
We could probably use it to predict women’s distances based on the men’s (or vice versa).
Knowing the men’s distance in 2012 is 8.31 m, the women’s distance should be 6.8 m (the actual distance is 7.12 m).
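The 2012 prediction can be reproduced from the slope (0.95) and y-intercept (-1.1) given on the previous slide. A minimal sketch, assuming the model is women = 0.95 × men - 1.1 with those rounded coefficients; the function name is illustrative only.

```python
def predict_women_from_men(men_distance, slope=0.95, intercept=-1.1):
    """Predict the women's distance (m) from the men's distance (m)."""
    return slope * men_distance + intercept


men_2012 = 8.31          # men's distance in 2012, from the slide
women_actual_2012 = 7.12  # actual women's distance in 2012, from the slide

predicted = predict_women_from_men(men_2012)
residual = women_actual_2012 - predicted  # actual minus predicted

print(f"predicted women's distance: {predicted:.1f} m")
print(f"residual: {residual:.2f} m")
```

A positive residual means the actual distance was longer than the model predicted.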
11
5. Effect of Outliers
12
Be careful to only remove the data from the men’s or women’s side.
Should we remove it at all?
– Typo or other human error?
– Is the sample representative of the population?
– Is this merely a bad regression model?
13
5. Effect of Outliers
Note that removing the outlier increases the correlation for the men’s model but decreases it for the women’s!
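Removing a single point can move r in either direction, depending on where the point sits relative to the rest of the data. The sketch below illustrates this with two small made-up data sets. They are hypothetical and are not the long jump data from this presentation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical set A: a clean upward trend plus one point far off the trend.
a_x = [1, 2, 3, 4, 5, 6]
a_y = [2, 4, 6, 8, 10, 3]   # last point is the outlier

# Hypothetical set B: a weak cluster plus one distant point that creates a trend.
b_x = [1, 2, 3, 4, 20]
b_y = [2, 1, 2, 1, 20]      # last point is the outlier

for name, xs, ys in (("A", a_x, a_y), ("B", b_x, b_y)):
    with_outlier = pearson_r(xs, ys)
    without_outlier = pearson_r(xs[:-1], ys[:-1])
    print(f"set {name}: r with outlier = {with_outlier:.2f}, "
          f"without = {without_outlier:.2f}")
```

In set A the outlier fights the trend, so dropping it raises r. In set B the outlier is what produces the apparent trend, so dropping it lowers r.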
14
5. Effect of Outliers
Consider the residuals.
The linear model is still not a good one for this data.
A logarithmic model is probably better.
15
5. Effect of Outliers
The point is a likely outlier for the men’s data, but probably not for the women’s data.
16
5. Effect of Outliers
Removing the outliers doesn’t affect the men vs women model too much.
The predicted value for the women’s distance in 2012 is the same: 6.8 m.