1
M140 Tutorial 9 Eugenie Chung Covering Unit 9
2
Case study: John Wilkinson works as an analyst at a well-known sports betting company. He is looking at some sports data.
3
Recap Unit 5: Scatter plots
The points will not lie exactly on a straight line. Give the overall impression: positive or negative relationship; linear or non-linear; strong or weak relationship; any unusual points.
4
Correlation coefficient
By looking only at the pattern in the scatter plot, we can get a rough idea of the relationship between two variables. To quantify the relationship, however, we need a statistic that summarizes it. The correlation coefficient r lies between -1 (perfect negative correlation) and +1 (perfect positive correlation).
6
Properties of correlation coefficient
8
Case study-sprint record 100m vs 200m
John has gathered the sprint records of medallists from the Olympics and the World Championships. He wants to see how a sprinter's 100 m record is related to his or her 200 m time. He plotted the data for both men and women medallists. We have slightly more data for women, as more data are available for sprinters who compete in both races.
9
Scatterplot of 100 m vs 200 m: positive relationship?
10
Scatterplot of 100 m vs 200 m: no relationship?
11
Case study-sprint record 100m vs 200m
From the scatterplots, we can see that there is an apparent linear relationship between the 100 m and 200 m records for men but not for women. To verify this, let's calculate the correlation coefficient.
12
Step 1 Calculate: $\sum x = 9.58 + \dots + 9.99$, $\sum y = 19.19 + \dots + 19.8$
13
Step 1 Calculate: $\sum x = 107.82$, $\sum y = 216.22$, $\sum x^2 = 1057$, $\sum y^2 = 4250.97$
14
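Step 2
$\sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} = 1057 - \dots = \dots$
$\sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n} = \dots - \dots = \dots$
$\sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n} = \dots - \dots \times \dots = \dots$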
15
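Step 3 Correlation $= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \times \sum (y - \bar{y})^2}} = \frac{\dots}{\sqrt{\dots \times \dots}} \approx 0.861$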
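The three steps can be sketched in Python as a quick check on hand calculations. This is a minimal sketch, not the tutorial's own code: the function name `correlation_from_sums` is hypothetical, and the sample data are made up for illustration (they are not the sprint records).

```python
from math import sqrt

def correlation_from_sums(xs, ys):
    """Correlation coefficient r via the shortcut sums of Steps 1 to 3."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    # Step 1: the raw sums.
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Step 2: sums of squared deviations and of cross products.
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    # Step 3: the correlation coefficient.
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data, NOT the sprint records from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_from_sums(xs, ys)
print(round(r, 3))
```

With real data, rounding the intermediate sums (as in Σx² = 1057 above) can noticeably change the result, because Step 2 subtracts two nearly equal numbers.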
16
Summary
18
Minitab prints the p-value for the hypothesis test that the correlation is zero directly below the correlation. Since the p-value for the men's results is smaller than 0.05, there is sufficient evidence at the 5% significance level to suggest that the correlation is not zero. For the women's results, however, there is not enough evidence to suggest that the correlation is not zero.
19
Outliers and influential points
20
Outliers and influential points
The data for sprinters Maurice Greene and Justin Gatlin appear to lie far from the other points, so they are outliers. After removing the data for Usain Bolt, the recomputed correlation coefficient has a high p-value, which means there is no significant evidence to suggest that the relationship is positive. So we say the data from this sprinter are influential.
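One way to look for influential points is to recompute the correlation with each point left out in turn and see how much it moves. The sketch below assumes that approach; the helper names `pearson_r` and `leave_one_out_r` are hypothetical, and the data are made up so that a single extreme point creates almost all of the correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def leave_one_out_r(xs, ys):
    """Return the correlation obtained when each point is removed in turn."""
    results = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        results.append(pearson_r(rest_x, rest_y))
    return results

# Made-up data: four points with no linear trend plus one extreme point.
xs = [1.0, 2.0, 2.0, 1.0, 10.0]
ys = [1.0, 1.0, 2.0, 2.0, 10.0]

print("all points:", round(pearson_r(xs, ys), 3))
for i, r in enumerate(leave_one_out_r(xs, ys)):
    print(f"without point {i}:", round(r, 3))
```

A point whose removal changes r sharply, like the last one here, is a candidate influential point.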
21
Confidence intervals for z-test
In Unit 7, we learned about the z-test, which is useful for testing the population mean against a target value. Using the mean of a sample, we can get a point estimate of the population mean μ.
22
Calculating the confidence interval
23
100 m World Championships record
John collected the 100 m results of medallists in the World Championships. He wants to calculate a 95% confidence interval for the mean time of the medallists, for both men and women. For the male sprinters, we have the following: $n = 42$, $\bar{x} = \dots$, $s = \dots$, $\text{ESE} = s/\sqrt{n} = \dots/\sqrt{42} = \dots$
24
95% confidence interval: $n = 42$, $\bar{x} = \dots$, $s = \dots$, $\text{ESE} = s/\sqrt{n} = \dots/\sqrt{42} = \dots$ What is the 95% confidence interval? $(\bar{x} - 1.96\,\text{ESE},\ \bar{x} + 1.96\,\text{ESE}) = (9.911, \dots)$
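The interval formula can be sketched as a small Python function that works from the summary statistics. The function name `z_interval` is hypothetical, and the mean and standard deviation below are made-up placeholders, not the values from the men's data; only n = 42 comes from the slide.

```python
from math import sqrt

def z_interval(n, x_bar, s, z=1.96):
    """95% z-interval for a population mean: x_bar +/- z * ESE."""
    ese = s / sqrt(n)  # estimated standard error of the sample mean
    return x_bar - z * ese, x_bar + z * ese

# n = 42 as on the slide; x_bar and s are hypothetical placeholder values.
low, high = z_interval(n=42, x_bar=10.0, s=0.2)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The interval is symmetric about the sample mean, and it narrows as n grows because the ESE shrinks in proportion to 1/√n.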
30
Confidence interval for difference of two means
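Since the two intervals do not overlap, this indicates that there is a difference between the mean times of male and female medallists. The 95% confidence interval for the difference is $(\bar{x}_a - \bar{x}_b - 1.96\,\text{ESE},\ \bar{x}_a - \bar{x}_b + 1.96\,\text{ESE})$, where $\bar{x}_a - \bar{x}_b = \dots$ and $\text{ESE}^2 = s_m^2/42 + s_w^2/42 = \dots$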
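The same calculation for a difference of two means can be sketched as follows. The function name `diff_means_interval` is hypothetical, and the means and standard deviations are made-up placeholders rather than the medallists' values; only the sample sizes of 42 come from the slide.

```python
from math import sqrt

def diff_means_interval(mean_a, s_a, n_a, mean_b, s_b, n_b, z=1.96):
    """95% z-interval for the difference between two population means."""
    diff = mean_a - mean_b
    # ESE squared is the sum of the two squared standard errors.
    ese = sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
    return diff - z * ese, diff + z * ese

# Hypothetical summary statistics; n = 42 for both groups as on the slide.
low, high = diff_means_interval(11.0, 0.3, 42, 10.0, 0.2, 42)
excludes_zero = low > 0 or high < 0
print(f"95% CI for the difference: ({low:.3f}, {high:.3f})")
print("interval excludes zero:", excludes_zero)
```

If the interval excludes zero, the data suggest that the two population means differ.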
31
Confidence interval for difference of two means
32
Interval estimates from fitted lines
As discussed in Unit 5, the regression equation y = a + bx can be used to make predictions. The confidence interval for the mean response provides an interval for the position of the regression line. The prediction interval provides an interval for the prediction of a new value.
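The difference between the two intervals can be illustrated with the standard textbook formulas for a least squares line. The slides do not show these formulas, so treat this as a sketch under that assumption: the function name `line_intervals` is hypothetical, the data are made up, and the critical value `t_crit` has to be supplied by the reader from t tables (3.182 is the 97.5% point of the t distribution with 3 degrees of freedom).

```python
from math import sqrt

def line_intervals(xs, ys, x0, t_crit):
    """Fitted value at x0 with a CI for the mean response and a prediction interval.

    Assumes the usual simple linear regression formulas; t_crit is the
    critical t value for n - 2 degrees of freedom, looked up by the caller.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))    # residual standard deviation
    fitted = a + b * x0
    spread = 1 / n + (x0 - mean_x) ** 2 / sxx
    ci_half = t_crit * s * sqrt(spread)       # uncertainty in the line itself
    pi_half = t_crit * s * sqrt(1 + spread)   # plus scatter of a new value
    ci = (fitted - ci_half, fitted + ci_half)
    pi = (fitted - pi_half, fitted + pi_half)
    return fitted, ci, pi

# Made-up data, NOT the sprint records; n = 5, so n - 2 = 3 degrees of freedom.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted, ci, pi = line_intervals(xs, ys, x0=3, t_crit=3.182)
print("fitted:", round(fitted, 3))
print("95% CI:", tuple(round(v, 3) for v in ci))
print("95% PI:", tuple(round(v, 3) for v in pi))
```

The prediction interval is always wider than the confidence interval at the same x, because it adds the variability of an individual new observation to the uncertainty about where the line lies.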
33
Properties
34
Prediction intervals
35
Properties
36
100m vs 200m case study cont. Looking back at the men's sprint data, we can fit a model to predict the 200 m time for a given 100 m time.
37
100m vs 200m case study cont. For sprinters with an average 100 m time of 9.8, what is the 95% confidence interval for the mean 200 m time? What about the prediction interval?
39
100m vs 200m case study cont. 95% CI: ( , ) 95% PI: ( , )