Welcome to MM207 Unit 8 Seminar Dr. R. Correlation, Regression and Excel!
Correlation: the relationship between two variables, x and y.
Scatter Plot
–The x variable is on the horizontal axis
–The y variable is on the vertical axis
–The scatter plot shows the location of each (x, y) pair
Types of Relationships
–Positive: x and y move in the same direction
–Negative: x and y move in opposite directions
–Zero: no pattern of movement in x and y
Scatter Plot for Example 3 on Page 498
Calculating the Correlation Coefficient
The correlation coefficient is a numerical measurement that assesses the strength and direction of the linear relationship between paired data.
–r is the Pearson product-moment correlation coefficient
–-1 ≤ r ≤ 1; the correlation coefficient is always in this interval
But I will use Excel, as usual!
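For anyone curious what Excel is doing behind the scenes, here is a minimal sketch of the Pearson calculation in Python. It uses the standard computational formula for r, which the slides do not show. The function name `pearson_r` and the small data set are made up for illustration and are not from the textbook example.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired data (an assumption for illustration only)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 4))
```

The sign of r gives the direction of the relationship and its distance from 0 gives the strength, so a value near 1 or -1 means the points fall close to a straight line.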
Using Excel for the Correlation
Compare this to the results shown on page 501.
Statistical Significance of the Correlation Coefficient
Table 11, page A28, gives the critical values that the correlation coefficient must exceed to be significant. If the absolute value of the calculated r is greater than the value in Table 11, the correlation is significant at either α = 0.05 or α = 0.01. In our example there are 25 pairs of data, so we compare our r with the Table 11 critical values for n = 25 at α = 0.05 and α = 0.01. Our correlation is statistically significant at the 0.01 level. [page 503]
Statistical significance means there is evidence that the population correlation is not 0, and that is all it means.
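If you do not have the table handy, a critical value for r can be sketched from a t table instead. This assumes the table is built from the usual t test for a zero correlation, with n - 2 degrees of freedom, which the slides do not state. The two t values below are the two-tailed critical values for 23 degrees of freedom from a standard t table, and the function names are made up for illustration.

```python
import math

def critical_r(t_critical, n):
    """Convert a two-tailed critical t value (df = n - 2) into a critical |r|."""
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)

def is_significant(r, t_critical, n):
    """True when |r| exceeds the critical value implied by t_critical."""
    return abs(r) > critical_r(t_critical, n)

n = 25          # number of data pairs, as in the seminar example
t_05 = 2.069    # assumed: two-tailed t, alpha = 0.05, df = 23
t_01 = 2.807    # assumed: two-tailed t, alpha = 0.01, df = 23

r_crit_05 = critical_r(t_05, n)
r_crit_01 = critical_r(t_01, n)
print(round(r_crit_05, 3), round(r_crit_01, 3))
```

Note that the comparison uses the absolute value of r, so a strong negative correlation is judged the same way as a strong positive one.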
Linear Regression and Prediction
Assume that there is a tennis club that lets anyone play tennis, but it charges a flat fee for using the facility and then a per-hour fee for using the tennis courts. This club tells us that the flat fee is $10.00 and the per-hour cost is $5.00. Can you calculate the cost of playing 2 hours of tennis? Can you estimate the cost to play 10 hours of tennis?
Linear Regression and Prediction
To find out how much it will cost to play tennis, all you did was take $10 plus $5 times the number of hours, right? I might write this as Cost = rate * (hours) + base cost, which looks a lot like:
–Yhat = mx + b [page 514]
The formulas for m and b look like this:
–m = (n*Σxy - (Σx)*(Σy)) / (n*Σx^2 - (Σx)^2)
–b = ybar - m * xbar
But, I will be using Excel.
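To see the m and b formulas at work, here is a small sketch that feeds them tennis-club data and checks that they recover the $5 rate and the $10 base cost. The function name `fit_line` and the particular hours chosen are assumptions for illustration. The costs are generated from the club's pricing rule, so the points fall exactly on a line.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line Yhat = m*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * (sum_x / n)   # b = ybar - m * xbar
    return m, b

# Hypothetical visits: hours played and the cost under the club's rule
hours = [1, 2, 3, 4]
cost = [10 + 5 * h for h in hours]   # $10 flat fee plus $5 per hour

m, b = fit_line(hours, cost)
cost_2_hours = m * 2 + b
cost_10_hours = m * 10 + b
print(m, b, cost_2_hours, cost_10_hours)
```

With real data the points will not sit exactly on a line, and the same formulas then give the line that fits the points best in the least-squares sense.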
Using Scatter Plot for the Regression Line
–Right click on a dot
–Select “Format Trendline”
–Type is Linear
–Click Display Equation on chart
–Click Display R-squared on chart
Using Regression for the Regression Line
Using the Regression Line to make a Prediction
If Old Faithful erupts for 3.32 minutes, how long do we predict it will be before Old Faithful erupts again? Either procedure gives the same regression equation, Yhat = mx + b. Substitute x = 3.32 into that equation; the resulting Yhat is the predicted number of minutes until Old Faithful erupts again after an eruption lasting 3.32 minutes.
Regression by Calculator
Example 1, Page 514. Suppose we want to find a regression line for advertising expenses versus company sales. We have the following data.
Expenses   Sales
Regression Analysis
Table columns: x, y, x*y, x^2, y^2
Σx = 15.8, Σy = 1634, Σx*y = 3289.8, Σx^2 = 32.44, Σy^2 = 337,558
Regression
m = (n*Σxy - (Σx)*(Σy)) / (n*Σx^2 - (Σx)^2)
m = (8*(3289.8) - (15.8)*(1634)) / (8*(32.44) - (15.8)^2)
m = 501.2 / 9.88 ≈ 50.7287
b = (Σy - m*Σx)/n
b = (1634 - (50.7287)*(15.8))/8 ≈ 104.061
Thus, y = (50.729)*x + 104.061
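The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the function name `slope_intercept_from_sums` is made up for illustration. The inputs are the column sums from the regression analysis table, with n = 8 data pairs.

```python
def slope_intercept_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Slope m and intercept b of the regression line from the column sums."""
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Column sums from the regression analysis table (n = 8 pairs)
m, b = slope_intercept_from_sums(
    n=8, sum_x=15.8, sum_y=1634, sum_xy=3289.8, sum_x2=32.44
)
print(f"m = {m:.3f}, b = {b:.3f}")
```

Working from the sums mirrors the calculator approach. Carrying the unrounded m into the b step avoids a small rounding error in the intercept.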
Final Graph with Equation
(Thanks to Freakonomics by Steven D. Levitt)
HAVE A GREAT WEEK, EVERYONE!!