
Slide 1: Chapter 13 – Correlation and Simple Regression. Slides prepared by Jeff Heyl, Lincoln University. Concise Managerial Statistics, Kvanli / Pavur / Keeling. ©2006 Thomson/South-Western.

Slide 2: Bivariate Data. Figure 13.1: two scatter diagrams, (a) and (b), plotting square footage (hundreds, Y) against income (thousands, X).

Slide 3: Coefficient of Correlation. The strength of the linear relationship between two variables is measured by the coefficient of correlation, r:

r = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sqrt{\sum(x - \bar{x})^2 \sum(y - \bar{y})^2}} = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\left[\sum x^2 - (\sum x)^2/n\right]\left[\sum y^2 - (\sum y)^2/n\right]}}

Slide 4: Coefficient of Correlation Properties.
1. r ranges from -1.0 to 1.0.
2. The larger |r| is, the stronger the linear relationship.
3. The sign of r tells you whether the relationship between X and Y is positive (direct) or negative (inverse).
4. r = 1 or r = -1 implies that a perfect linear pattern exists between the two variables; they are perfectly correlated.

Slide 5: Sum of Squares.

SS_X = \text{sum of squares for } X = \sum(x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}

SS_Y = \text{sum of squares for } Y = \sum(y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}

SCP_{XY} = \text{sum of cross products for } XY = \sum(x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}

Slide 6: Sum of Squares (continued). Written in terms of these quantities, the correlation coefficient is

r = \frac{SCP_{XY}}{\sqrt{SS_X \, SS_Y}}
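
As a check on slides 5 and 6, here is a minimal numpy sketch that computes the three sums of squares and r both ways. The sample values are hypothetical, chosen only to resemble the income/square-footage setting; they are not the textbook's real estate data.

```python
import numpy as np

# Hypothetical income (X, $000s) vs. square footage (Y, hundreds) sample --
# illustrative values only, not the textbook's data
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

# Computational (shortcut) forms from slide 5
ss_x = np.sum(x**2) - np.sum(x)**2 / n
ss_y = np.sum(y**2) - np.sum(y)**2 / n
scp_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n

r = scp_xy / np.sqrt(ss_x * ss_y)
print(r)                          # correlation via the slide-6 formula
print(np.corrcoef(x, y)[0, 1])    # numpy's built-in agrees
```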

Slide 7: Scatter Diagram and Correlation Coefficient. (Figure 13.2)

Slide 8: Vertical Distances. Figure 13.3: a line L drawn through the scatter of square footage (Y) versus income (X), with the vertical distances d_1 through d_10 from each point to the line.

Slide 9: Least Squares Line. The least squares line is the line through the data that minimizes the sum of the squared vertical differences between the observations and the line:

\sum d^2 = d_1^2 + d_2^2 + d_3^2 + \dots + d_n^2

b_1 = \frac{SCP_{XY}}{SS_X} \qquad b_0 = \bar{y} - b_1 \bar{x}
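
A minimal sketch of the slide-9 formulas, reusing the same hypothetical sample as the sketch after slide 6, cross-checked against numpy's own least squares fit:

```python
import numpy as np

# Same hypothetical sample as in the sketch after slide 6
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

ss_x = np.sum(x**2) - np.sum(x)**2 / n
scp_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n

b1 = scp_xy / ss_x               # slope
b0 = y.mean() - b1 * x.mean()    # intercept
print(b0, b1)
print(np.polyfit(x, y, 1))       # numpy returns [slope, intercept]; values match
```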

Slide 10: Least Squares Line. Figure 13.6: the fitted line Ŷ = b_0 + b_1 X through the income/square-footage scatter; the vertical distance from a point to the line is Y - Ŷ, and the fitted value Ŷ at X = 50 is marked.

Slide 11: Sum of Squares of Error.

SSE = \sum d^2 = \sum(y - \hat{y})^2 = SS_Y - \frac{(SCP_{XY})^2}{SS_X}
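
A quick numeric check that the two SSE forms on slide 11 agree, on the same hypothetical sample as before:

```python
import numpy as np

# Same hypothetical sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

ss_x = np.sum(x**2) - np.sum(x)**2 / n
ss_y = np.sum(y**2) - np.sum(y)**2 / n
scp_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n

b1 = scp_xy / ss_x
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

print(np.sum((y - y_hat)**2))     # SSE from the residuals
print(ss_y - scp_xy**2 / ss_x)    # SSE from the shortcut; identical
```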

Slide 12: Least Squares Line for Real Estate Data. Figure 13.5: the fitted line Ŷ = 4.975 + .3539X through the real estate data; at X = 50 the fitted value is Ŷ = 22.67, while the observed value there is Y = 20.

Slide 13: Assumptions for the Simple Regression Model. The model is Y = β_0 + β_1 X + e, where:
1. The mean of each error component is zero.
2. Each error component (a random variable) follows an approximate normal distribution.
3. The variance of the error component is the same for each value of X.
4. The errors are independent of each other.

Slide 14: Assumption 1 for the Simple Regression Model. Figure 13.6: the population line Y = β_0 + β_1 X with error term e; the conditional means μ_{Y|35} and μ_{Y|50} lie on the line, and the errors at X = 35 and X = 50 average zero.

Slide 15: Violation of Assumption 3. Figure 13.7: the spread of the errors about the line Y = β_0 + β_1 X differs across X (errors shown at X = 35, 50, and 60), violating the equal-variance assumption.

Slide 16: Assumptions 1, 2, 3 for the Simple Regression Model. Figure 13.8: at X = 35, 50, and 60 the error distributions are normal, centered at zero, and have equal variance.

Slide 17: Estimating the Error Variance, σ_e².

s^2 = \hat{\sigma}_e^2 = \text{estimate of } \sigma_e^2 = \frac{SSE}{n - 2}

where SSE = \sum(y - \hat{y})^2 = SS_Y - \frac{(SCP_{XY})^2}{SS_X}
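
A short sketch of the slide-17 estimate on the same hypothetical sample; the n - 2 divisor reflects the two estimated coefficients:

```python
import numpy as np

# Same hypothetical sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
sse = np.sum((y - (b0 + b1 * x))**2)

s2 = sse / (n - 2)    # two degrees of freedom lost estimating b0 and b1
s = np.sqrt(s2)       # standard error of the estimate, used in the tests below
print(s2, s)
```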

Slide 18: Three Possible Populations. Figure 13.9: scatter of Y versus X under (a) β_1 = 0, (b) β_1 > 0, and (c) β_1 < 0.

Slide 19: Hypothesis Test on the Slope of the Regression Line (Two-Tailed Test).

H_0: β_1 = 0 (X provides no information)
H_a: β_1 ≠ 0 (X does provide information)

Test statistic: t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{b_1 - \beta_1}{s/\sqrt{SS_X}} (with β_1 = 0 under H_0)

Reject H_0 if |t| > t_{α/2, n-2}.

Slide 20: Hypothesis Test on the Slope of the Regression Line (One-Tailed Tests).

Test statistic: t = \frac{b_1}{s_{b_1}}

H_0: β_1 ≤ 0 vs. H_a: β_1 > 0 — reject H_0 if t > t_{α, n-2}
H_0: β_1 ≥ 0 vs. H_a: β_1 < 0 — reject H_0 if t < -t_{α, n-2}
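
A minimal sketch of the slope tests from slides 19-20, on the same hypothetical sample; α = .05 is an assumed significance level:

```python
import numpy as np
from scipy import stats

# Same hypothetical sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

ss_x = np.sum(x**2) - np.sum(x)**2 / n
b1, b0 = np.polyfit(x, y, 1)
s = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))

s_b1 = s / np.sqrt(ss_x)                    # standard error of the slope
t_stat = b1 / s_b1                          # H0: beta_1 = 0

alpha = 0.05
t_two = stats.t.ppf(1 - alpha / 2, n - 2)   # two-tailed critical value
t_one = stats.t.ppf(1 - alpha, n - 2)       # one-tailed critical value
print(t_stat, abs(t_stat) > t_two, t_stat > t_one)
```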

Slide 21: t Curve with 8 df. Figure 13.10: the t distribution with 8 degrees of freedom, with the rejection regions shaded in the tails and the critical value 1.860 marked.

Slide 22: Real Estate Example. (Figure 13.11)

Slide 23: Real Estate Example. (Figure 13.12)

Slide 24: Real Estate Example. (Figure 13.13)

Slide 25: Real Estate Example. (Figure 13.14)

Slide 26: Scatter Diagram. Figure 13.15: liquid assets (% of annual income, Y) plotted against age (X), with fitted line Ŷ = -.814 + .3526X.

Slide 27: Scatter Diagram (continued). Summary statistics for Figure 13.15:

SS_X = 1268.67, x̄ = 43.667
SS_Y = 348.92, ȳ = 14.583
SCP_XY = 447.33

r = \frac{SCP_{XY}}{\sqrt{SS_X \, SS_Y}} = .672

Slide 28: Confidence Interval for β_1. The (1 - α)·100% confidence interval for β_1 is

b_1 - t_{\alpha/2,\,n-2}\, s_{b_1} \quad\text{to}\quad b_1 + t_{\alpha/2,\,n-2}\, s_{b_1}
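
A sketch of a 95% confidence interval for β_1 on the same hypothetical sample (the 95% level is an assumption for illustration):

```python
import numpy as np
from scipy import stats

# Same hypothetical sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

ss_x = np.sum(x**2) - np.sum(x)**2 / n
b1, b0 = np.polyfit(x, y, 1)
s = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))
s_b1 = s / np.sqrt(ss_x)

t_crit = stats.t.ppf(0.975, n - 2)   # 95% confidence
print(b1 - t_crit * s_b1, b1 + t_crit * s_b1)
```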

Slide 29: Curvilinear Relationship. Figure 13.16: two scatter diagrams of Y versus X showing curved (nonlinear) patterns.

Slide 30: Measuring the Strength of the Model.

H_0: ρ = 0 (no linear relationship exists between X and Y)
H_a: ρ ≠ 0 (a linear relationship does exist)

r = \frac{SCP_{XY}}{\sqrt{SS_X \, SS_Y}} \qquad t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}
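
A sketch of the slide-30 test on the same hypothetical sample; this t equals the slope test's t statistic, and the two-sided p-value shown is an added convenience, not part of the slide:

```python
import numpy as np
from scipy import stats

# Same hypothetical sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
t_stat = r / np.sqrt((1 - r**2) / (n - 2))   # same t as the slope test
p_value = 2 * stats.t.sf(abs(t_stat), n - 2)
print(t_stat, p_value)
```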

Slide 31: Danger of Assuming Causality. A high statistical correlation does not imply causality. There are many situations in which variables are highly correlated because a factor not being studied affects the variables being studied.

Slide 32: Coefficient of Determination.

SSE = SS_Y - \frac{(SCP_{XY})^2}{SS_X}

r^2 = \text{coefficient of determination} = \frac{(SCP_{XY})^2}{SS_X \, SS_Y} = 1 - \frac{SSE}{SS_Y}

= percentage of explained variation in the dependent variable using the simple linear regression model
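
A quick check that the two r² forms on slide 32 agree, on the same hypothetical sample:

```python
import numpy as np

# Same hypothetical sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

ss_x = np.sum(x**2) - np.sum(x)**2 / n
ss_y = np.sum(y**2) - np.sum(y)**2 / n
scp_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
sse = ss_y - scp_xy**2 / ss_x

print(scp_xy**2 / (ss_x * ss_y))   # r^2, direct form
print(1 - sse / ss_y)              # r^2 as explained variation; identical
```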

Slide 33: Total Variation, SS_Y. Figure 13.17: a sample point (x, y) and the least squares line Ŷ = b_0 + b_1 X, with the vertical deviations y - ŷ (from the point to the line) and ŷ - ȳ (from the line to the mean).

Slide 34: Total Variation, SS_Y (continued). The total variation decomposes as

SS_Y = SSR + SSE, \qquad SSR = \frac{(SCP_{XY})^2}{SS_X}

Slide 35: Estimation and Prediction Using the Simple Linear Model. The least squares line can be used to estimate average values or predict individual values.

Slide 36: Confidence Interval for μ_{Y|x_0}. The (1 - α)·100% confidence interval for μ_{Y|x_0} is

\hat{Y} - t_{\alpha/2,\,n-2}\, s_{\hat{Y}} \quad\text{to}\quad \hat{Y} + t_{\alpha/2,\,n-2}\, s_{\hat{Y}}, \qquad s_{\hat{Y}} = s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_X}}
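
A sketch of the slide-36 interval on the same hypothetical sample; x_0 = 50 and the 95% level are assumptions for illustration:

```python
import numpy as np
from scipy import stats

# Same hypothetical sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

ss_x = np.sum(x**2) - np.sum(x)**2 / n
b1, b0 = np.polyfit(x, y, 1)
s = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))

x0 = 50.0                                # point of interest (illustrative)
y_hat = b0 + b1 * x0
s_yhat = s * np.sqrt(1/n + (x0 - x.mean())**2 / ss_x)
t_crit = stats.t.ppf(0.975, n - 2)
print(y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat)
```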

Slide 37: Confidence and Prediction Intervals. (Figure 13.18)

Slide 38: Confidence and Prediction Intervals. (Figure 13.19)

Slide 39: Confidence and Prediction Intervals. (Figure 13.20)

Slide 40: 95% Confidence Intervals. Figure 13.21: upper and lower confidence limits plotted around the fitted line Ŷ = 4.975 + .3539X (x̄ = 49.8; the values 12.33 and 20.27 are marked on the limits).

Slide 41: Prediction Interval for Y_{x_0}. The prediction interval is

\hat{Y} - t_{\alpha/2,\,n-2}\, s_{\hat{Y}} \quad\text{to}\quad \hat{Y} + t_{\alpha/2,\,n-2}\, s_{\hat{Y}}, \qquad s_{\hat{Y}}^2 = s^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_X}\right]
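
The same sketch adapted to slide 41's prediction interval; the only change from the slide-36 code is the extra "1 +" under the radical, which accounts for an individual Y's own scatter about the mean line:

```python
import numpy as np
from scipy import stats

# Same hypothetical sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

ss_x = np.sum(x**2) - np.sum(x)**2 / n
b1, b0 = np.polyfit(x, y, 1)
s = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))

x0 = 50.0
y_hat = b0 + b1 * x0
s_pred = s * np.sqrt(1 + 1/n + (x0 - x.mean())**2 / ss_x)
t_crit = stats.t.ppf(0.975, n - 2)
print(y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)
```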

Slide 42: 95% Confidence and Prediction Intervals. Figure 13.22: both interval bands around the fitted line (x̄ = 49.8); at the marked point the confidence limits are 12.33 and 20.27, and the wider prediction limits are 8.17 and 24.43.

Slide 43: Checking Model Assumptions.
1. The errors are normally distributed with a mean of zero.
2. The variance of the errors remains constant; for example, you should not observe larger errors associated with larger values of X.
3. The errors are independent.

Slide 44: Examination of Residuals. Figure 13.23: residuals (Y - Ŷ) plotted against X in panels (a) and (b).

Slide 45: Examination of Residuals. Figure 13.24: residuals (Y - Ŷ) plotted against time, for the years 1992 through 2001.

Slide 46: Autocorrelation and the Durbin-Watson Statistic.
- DW ranges from 0 to 4.
- The ideal value is 2.
- As DW decreases from 2, positive autocorrelation increases.
- As DW increases from 2, negative autocorrelation increases.

DW = \frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}
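
A minimal sketch of the DW computation; the time-ordered residual series below is hypothetical:

```python
import numpy as np

# Hypothetical time-ordered residuals from a fitted regression (illustrative)
e = np.array([1.2, 0.8, -0.3, -1.1, -0.6, 0.4, 1.0, 0.5, -0.2, -0.9])

dw = np.sum(np.diff(e)**2) / np.sum(e**2)   # np.diff gives e_t - e_{t-1}
print(dw)  # near 2: little autocorrelation; toward 0: positive; toward 4: negative
```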

Slide 47: Autocorrelation and the Durbin-Watson Statistic. (Figure 13.25)

Slide 48: Checking for Outliers. (Figure 13.26)

Slide 49: Identifying Outlying Values. Outlying sample values can be found by calculating the sample leverage:

h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{SS_X}, \qquad SS_X = \sum x^2 - (\sum x)^2/n

A sample is considered an outlier if its leverage is greater than 4/n or 6/n.
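
A sketch of the leverage calculation on the same hypothetical X values, flagged with the 4/n rule from the slide:

```python
import numpy as np

# Same hypothetical X sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
n = len(x)

ss_x = np.sum(x**2) - np.sum(x)**2 / n
h = 1/n + (x - x.mean())**2 / ss_x    # leverage of each observation

print(h)
print(np.where(h > 4/n)[0])           # indices flagged by the 4/n rule
```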

Slide 50: Identifying Outlying Values (continued). The standard deviation of the predicted Y value is

s_{\hat{Y}} = s\sqrt{h_i}

The confidence interval is \hat{Y} - t_{\alpha/2,\,n-2}\, s\sqrt{h_i} to \hat{Y} + t_{\alpha/2,\,n-2}\, s\sqrt{h_i}, and the prediction interval is \hat{Y} - t_{\alpha/2,\,n-2}\, s\sqrt{1 + h_i} to \hat{Y} + t_{\alpha/2,\,n-2}\, s\sqrt{1 + h_i}.

Slide 51: Real Estate Example. (Figure 13.27(a))

Slide 52: Real Estate Example. (Figure 13.27(b))

Slide 53: Identifying Outlying Values (continued). Unusually large or small values of the dependent variable (Y) can generally be detected using the sample standardized residuals. The estimated standard deviation of the i-th residual is s\sqrt{1 - h_i}, so

\text{standardized residual} = \frac{Y_i - \hat{Y}_i}{s\sqrt{1 - h_i}}

An observation is thought to have an outlying value of Y if its standardized residual is > 2 or < -2.
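
A sketch of the slide-53 screen on the same hypothetical sample, using the |standardized residual| > 2 rule:

```python
import numpy as np

# Same hypothetical sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

ss_x = np.sum(x**2) - np.sum(x)**2 / n
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid**2) / (n - 2))
h = 1/n + (x - x.mean())**2 / ss_x

std_resid = resid / (s * np.sqrt(1 - h))
print(np.where(np.abs(std_resid) > 2)[0])   # observations with outlying Y values
```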

Slide 54: Identifying Influential Observations. Cook's distance measure:

D_i = \frac{1}{2}\,(\text{standardized residual})^2\,\frac{h_i}{1 - h_i} = \frac{(Y_i - \hat{Y}_i)^2\, h_i}{2 s^2 (1 - h_i)^2}

You may conclude that the i-th observation is influential if the corresponding D_i measure is > .8.
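
A sketch of Cook's distance for the simple regression case, again on the hypothetical sample, flagged with the slide's .8 cutoff:

```python
import numpy as np

# Same hypothetical sample as before
x = np.array([28., 35., 40., 44., 49., 52., 57., 61., 66., 72.])
y = np.array([13., 16., 17., 18., 20., 21., 24., 25., 27., 30.])
n = len(x)

ss_x = np.sum(x**2) - np.sum(x)**2 / n
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid**2) / (n - 2))
h = 1/n + (x - x.mean())**2 / ss_x

d = resid**2 * h / (2 * s**2 * (1 - h)**2)   # Cook's D, simple regression form
print(np.where(d > 0.8)[0])                  # flagged as influential (> .8 rule)
```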

Slide 55: Leverages, Standardized Residuals, and Cook's Distance Measures. (Figure 13.28)

Slide 56: Summary of Figures 13.26 and 13.28 (Table 13.1).

Point | Outlying in X value (h_i > .4) | Outlying in Y value (|stand. res.| > 2) | Influential observation (D_i > .8)
A     | No                             | Yes                                     | No
B     | No                             | No                                      | No
C     | Yes                            | Yes                                     | Yes

Slide 57: Engine Capacity and MPG. (Figure 13.29)

Slide 58: Engine Capacity and MPG. (Figure 13.30)

Slide 59: Engine Capacity and MPG. (Figure 13.31)

Slide 60: Engine Capacity and MPG. (Figure 13.32)

Slide 61: Engine Capacity and MPG. (Figure 13.33)

