Eugenie Chung Covering Unit 9


M140 Tutorial 9

Case study John Wilkinson works as an analyst at a well-known sports betting company. He is looking at some sports data.

Recap Unit 5: Scatter plots The points will not lie exactly on a straight line. Give the overall impression: positive or negative relationship; linear or non-linear; strong or weak relationship; any unusual points.

Correlation coefficient By only looking at the pattern in the scatter plot, we can have a rough idea of the relationship between two variables. However, to quantify the relationship, we need a statistic to summarize the relationship. The correlation coefficient r lies between -1 (perfect negative correlation) and +1 (perfect positive correlation).

Properties of correlation coefficient

Case study: sprint record 100 m vs 200 m John has gathered the sprint records of medallists from the Olympics and the World Championships. He wants to see how a sprinter's 100 m record is related to his or her 200 m time. He plotted the data for both men and women medallists. We have slightly more data for women, as more data are available for female sprinters who compete in both races.

Scatterplot of 100 vs 200m Positive relationship?

Scatterplot of 100 vs 200m No relationship??

Case study: sprint record 100 m vs 200 m From the scatterplots, there is an apparent linear relationship between the 100 m and 200 m records for men but not for women. To verify this, let's calculate the correlation coefficient.

Step 1 Calculate: $\sum x = 9.58 + \dots + 9.99$, $\sum y = 19.19 + \dots + 19.8$

Step 1 Calculate: $\sum x = 107.82$, $\sum y = 216.22$, $\sum x^2 = 1057$, $\sum y^2 = 4250.97$

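Step 2
$\sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} = 1057 - \frac{107.82^2}{11} = 0.171764$
$\sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n} = 4250.97 - \frac{216.22^2}{11} = 0.874855$
$\sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{\sum x \sum y}{n} = 2119.68 - \frac{107.82 \times 216.22}{11} = 0.333673$
The sums printed in Step 1 appear to be rounded, so re-evaluating these expressions from them gives slightly different values (about 0.168, 0.871 and 0.331).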

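Step 3 Correlation: $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \times \sum (y - \bar{y})^2}} = \dfrac{0.333673}{\sqrt{0.171764 \times 0.874855}} \approx 0.861$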
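Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the sprint calculation starts from the Step 2 results because the raw sums shown in Step 1 are not precise enough to reproduce them.

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Steps 2 and 3: turn raw sums into corrected sums, then into r."""
    sxx = sum_x2 - sum_x ** 2 / n       # sum of (x - x_bar)^2
    syy = sum_y2 - sum_y ** 2 / n       # sum of (y - y_bar)^2
    sxy = sum_xy - sum_x * sum_y / n    # sum of (x - x_bar)(y - y_bar)
    return correlation_from_corrected(sxx, syy, sxy)

def correlation_from_corrected(sxx, syy, sxy):
    """Step 3 only: r = Sxy / sqrt(Sxx * Syy)."""
    return sxy / math.sqrt(sxx * syy)

# Men's sprint data, using the corrected sums quoted in Step 2.
r_men = correlation_from_corrected(0.171764, 0.874855, 0.333673)
print(round(r_men, 3))

# Small sanity check on made-up points (1, 2), (2, 4), (3, 6),
# which lie exactly on a line, so r should be 1.
r_line = correlation_from_sums(3, 6, 12, 14, 56, 28)
print(r_line)
```

The first value agrees with the 0.861 obtained in Step 3.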

Summary

Minitab prints the p-value for the hypothesis test that the correlation is zero directly beneath the correlation coefficient. Since the p-value for the men's results is smaller than 0.05, there is sufficient evidence at the 5% significance level that the correlation is not zero. For the women's results, however, there is not enough evidence to suggest that the correlation is not zero.

Outliers and influential points

Outliers and influential points The data for sprinters Maurice Greene and Justin Gatlin appear to lie far from the other points, so they are outliers. When the data for Usain Bolt are removed, the correlation coefficient becomes 0.473 with a high p-value, so there is no longer significant evidence of a positive relationship. We therefore say that the data from this sprinter are influential.
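The effect of removing one sprinter on the significance test can be checked by hand. The sketch below assumes the p-value comes from the usual t-test for a Pearson correlation, with t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom; the slides do not state which test Minitab uses. The critical values are the standard two-sided 5% points from t-tables, and the sample sizes are taken as n = 11 for the full men's data (as in Step 2) and n = 10 once one sprinter is removed.

```python
import math

# Two-sided 5% critical values of Student's t, copied from standard tables
# (assumption: keyed by degrees of freedom n - 2).
T_CRIT_5_PERCENT = {8: 2.306, 9: 2.262}

def correlation_t_statistic(r, n):
    """t statistic for testing whether the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t_all = correlation_t_statistic(0.861, 11)        # all 11 male sprinters
t_reduced = correlation_t_statistic(0.473, 10)    # one sprinter removed

for label, t, n in [("all men", t_all, 11), ("one removed", t_reduced, 10)]:
    significant = abs(t) > T_CRIT_5_PERCENT[n - 2]
    print(f"{label}: t = {t:.2f}, significant at 5%: {significant}")
```

Under these assumptions the full data set gives a t statistic well above the critical value, while the reduced data set falls below it, which matches the conclusion drawn from the p-values.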

Confidence intervals for z-test In Unit 7, we learned about the z-test, which is useful for testing the population mean against a target value. Using the mean of a sample, we can get a point estimate of the population mean μ.

Calculating the confidence interval

100 m World Championships record John collected the 100 m results of medallists in the World Championships. He wants to calculate a 95% confidence interval for the mean time of the medallists, for both men and women. For the male sprinters, we have the following: $n = 42$, $\bar{x} = 9.9502$, $s = 0.1297$, $\mathrm{ESE} = s/\sqrt{n} = 0.1297/\sqrt{42} = 0.02001314449$

95% confidence interval $n = 42$, $\bar{x} = 9.9502$, $s = 0.1297$, $\mathrm{ESE} = s/\sqrt{n} = 0.1297/\sqrt{42} = 0.02001314449$. What is the 95% confidence interval? $(\bar{x} - 1.96\,\mathrm{ESE},\ \bar{x} + 1.96\,\mathrm{ESE}) = (9.911, 9.9894)$
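The interval above can be reproduced with a short Python sketch. The function name is hypothetical, and the multiplier 1.96 is the one used on the slide.

```python
import math

def z_confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) for mean +/- z * ESE, where ESE = sd / sqrt(n)."""
    ese = sd / math.sqrt(n)
    return mean - z * ese, mean + z * ese

# Male 100 m medallists: n = 42, sample mean 9.9502, sample sd 0.1297.
lower_men, upper_men = z_confidence_interval(9.9502, 0.1297, 42)
print(f"({lower_men:.4f}, {upper_men:.4f})")
```

To the rounding shown, the result matches the interval (9.911, 9.9894).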

Confidence interval for difference of two means Since the two intervals do not overlap, there appears to be a difference between the mean times of male and female medallists. The interval for the difference is $(\bar{x}_a - \bar{x}_b - 1.96\,\mathrm{ESE},\ \bar{x}_a - \bar{x}_b + 1.96\,\mathrm{ESE})$, where $\bar{x}_a - \bar{x}_b = 10.9448 - 9.9502 = 0.9946$ and $\mathrm{ESE}^2 = s_m^2/42 + s_w^2/42 = 0.0004008 + 0.0008269 = 0.0012277$

Confidence interval for difference of two means
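As a rough check, the interval for the difference can be evaluated from the quantities already given: the two sample means and the two terms of ESE². The function name below is hypothetical, and the numerical result is computed here rather than taken from the slides.

```python
import math

def difference_confidence_interval(mean_a, mean_b, ese_squared, z=1.96):
    """Return (lower, upper) for (mean_a - mean_b) +/- z * sqrt(ese_squared)."""
    diff = mean_a - mean_b
    ese = math.sqrt(ese_squared)
    return diff - z * ese, diff + z * ese

# Women's mean 10.9448, men's mean 9.9502; ESE^2 is the sum of the two
# variance terms quoted for the male and female samples.
lower_diff, upper_diff = difference_confidence_interval(
    10.9448, 9.9502, 0.0004008 + 0.0008269
)
print(f"({lower_diff:.3f}, {upper_diff:.3f})")
```

The interval lies entirely above zero, which is consistent with the observation that the two separate intervals do not overlap.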

Interval estimates from fitted lines As discussed in Unit 5, the regression equation y = a + bx can be used to make predictions. The confidence interval for the mean response provides an interval for the position of the regression line. The prediction interval provides an interval for the prediction of a new value.
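For reference, the standard textbook forms of the two intervals at a chosen value $x_0$ are given below. These formulas are not shown in the slides, which rely on Minitab output, so the notation is an assumption: $\hat{y}_0 = a + b x_0$ is the fitted value, $s$ is the residual standard deviation, $S_{xx} = \sum (x - \bar{x})^2$, and $t^*$ is the appropriate critical value of the t-distribution with $n - 2$ degrees of freedom.

```latex
% Confidence interval for the mean response at x_0
\hat{y}_0 \pm t^* \, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Prediction interval for a single new observation at x_0
\hat{y}_0 \pm t^* \, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Residual standard deviation
s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}
```

The extra 1 under the square root is why a prediction interval is always wider than the confidence interval at the same $x_0$.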

Properties

Prediction intervals

Properties

100 m vs 200 m case study (continued) Looking back at the data for the male sprinters, we can fit a model to predict the 200 m time for a given 100 m time.

100 m vs 200 m case study (continued) For sprinters with a 100 m time of 9.8, what is the 95% confidence interval for the mean 200 m time? What about the prediction interval?

100 m vs 200 m case study (continued) 95% CI: (19.5446, 19.7611); 95% PI: (19.2779, 20.0278)
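These two intervals can be approximately reproduced from the summary values in Steps 1 and 2 of the correlation calculation. The sketch below assumes the fitted model is the ordinary least squares line through the 11 men's data points and uses the standard textbook interval formulas. The critical value 2.262 is taken from t-tables for 9 degrees of freedom, and all names are hypothetical.

```python
import math

# Summary values for the 11 male sprinters (Steps 1 and 2).
N = 11
SUM_X, SUM_Y = 107.82, 216.22
SXX, SYY, SXY = 0.171764, 0.874855, 0.333673
T_CRIT = 2.262  # assumption: tabulated two-sided 5% point of t, N - 2 = 9 df

def regression_intervals(x0):
    """Return (confidence interval, prediction interval) for the 200 m time at x0."""
    x_bar, y_bar = SUM_X / N, SUM_Y / N
    b = SXY / SXX                  # least squares slope
    a = y_bar - b * x_bar          # least squares intercept
    fit = a + b * x0               # fitted 200 m time at x0
    # Residual standard deviation from the residual sum of squares.
    s = math.sqrt((SYY - b * SXY) / (N - 2))
    spread = 1 / N + (x0 - x_bar) ** 2 / SXX
    ci_half = T_CRIT * s * math.sqrt(spread)
    pi_half = T_CRIT * s * math.sqrt(1 + spread)
    return (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)

ci, pi = regression_intervals(9.8)
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"95% PI: ({pi[0]:.4f}, {pi[1]:.4f})")
```

Under these assumptions the computed limits agree with the reported intervals to about three decimal places, and the prediction interval is noticeably wider than the confidence interval.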