Chapter 14 – Correlation and Simple Regression


Chapter 14 – Correlation and Simple Regression Introduction to Business Statistics, 6e Kvanli, Pavur, Keeling Chapter 14 – Correlation and Simple Regression Slides prepared by Jeff Heyl, Lincoln University ©2003 South-Western/Thomson Learning™

Bivariate Data – Figure 14.1: scatter plot of square footage (hundreds, Y axis) against income (thousands, X axis), with points A and B marked.

Coefficient of Correlation The sample coefficient of correlation, r, measures the strength of the linear relationship that exists within a sample of n bivariate data:

r = ∑(x - x̄)(y - ȳ) / √[∑(x - x̄)² · ∑(y - ȳ)²]
  = [∑xy - (∑x)(∑y)/n] / √{[∑x² - (∑x)²/n] · [∑y² - (∑y)²/n]}
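The two forms of r above can be checked against each other numerically. A minimal sketch using a small hypothetical bivariate sample (not the textbook's real estate data):

```python
import math

# Hypothetical sample: income (thousands) and square footage (hundreds)
x = [26, 33, 38, 45, 49, 52, 58, 63, 70, 76]
y = [10, 12, 15, 16, 18, 19, 22, 24, 27, 30]
n = len(x)

# Definitional form: r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]
x_bar, y_bar = sum(x) / n, sum(y) / n
scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
r_def = scp_xy / math.sqrt(ss_x * ss_y)

# Computational (shortcut) form using raw sums
num = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
den = math.sqrt((sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) *
                (sum(yi ** 2 for yi in y) - sum(y) ** 2 / n))
r_comp = num / den
```

Both forms are algebraically identical, so they must agree up to floating-point rounding.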

Coefficient of Correlation Properties r ranges from -1.0 to 1.0 The larger |r | is, the stronger the linear relationship r near zero indicates that there is no linear relationship. X and Y are uncorrelated r = 1 or -1 implies that a perfect linear pattern exists between the two variables Values of r = 0, 1, or -1 are rare in practice

Coefficient of Correlation Properties The sign of r tells you whether the relationship between X and Y is a positive (direct) or a negative (inverse) relationship. The value of r tells you very little about the steepness of the line; however, a positive r does imply a positive slope, and a negative r implies a negative slope.

Various Values of r – Figure 14.2: scatter plots illustrating r = 0 (A), r = 1 (B), r = -1 (C), r = .9 (D), r = -.8 (E), and r = .5 (F).

Scatter Diagrams - Same r – Figure 14.3: two different scatter patterns that produce the same correlation coefficient.

Scatter Diagram and Correlation Coefficient Figure 14.4

Covariance The sample covariance between two variables, cov(X, Y), is a measure of the joint variation of the two variables X and Y and is defined to be

cov(X, Y) = [1 / (n - 1)] ∑(x - x̄)(y - ȳ) = SCPXY / (n - 1)

r = sample correlation between X and Y = cov(X, Y) / (sX sY)

Least Squares Line The least squares line is the line through the data that minimizes the sum of the squared vertical distances between the observations and the line:

∑d² = d₁² + d₂² + d₃² + … + dₙ²

b₁ = SCPXY / SSX
b₀ = ȳ - b₁x̄
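A minimal sketch of the slope and intercept formulas, again on a small hypothetical sample; a defining property of the least squares line (its residuals sum to zero) makes a handy check:

```python
# Hypothetical sample: income (thousands) and square footage (hundreds)
x = [26, 33, 38, 45, 49, 52, 58, 63, 70, 76]
y = [10, 12, 15, 16, 18, 19, 22, 24, 27, 30]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)

b1 = scp_xy / ss_x          # slope: b1 = SCPXY / SSX
b0 = y_bar - b1 * x_bar     # intercept: b0 = ȳ - b1·x̄

y_hat = [b0 + b1 * xi for xi in x]                  # fitted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # y - ŷ
```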

Vertical Distances – Figure 14.5: the vertical distances d₁ through d₁₀ from each (income, square footage) observation to an arbitrary line L.

Least Squares Line – Figure 14.6: the fitted line Ŷ = b₀ + b₁X through the (income, square footage) data, showing Ŷ for X = 50.

Sum of Squares of Error

SSE = ∑d² = ∑(y - ŷ)²
SSE = SSY - (SCPXY)² / SSX

Least Squares Line for Real Estate Data – Figure 14.7: Ŷ = 4.975 + .3539X (X = income, in thousands); at X = 50, Ŷ = 22.67 (square footage, in hundreds).

Assumptions for the Simple Regression Model
1. The mean of each error component is zero.
2. Each error component (a random variable) follows an approximate normal distribution.
3. The variance of the error component is the same for each value of X.
4. The errors are independent of each other.

Assumption 1 for the Simple Regression Model – Figure 14.8: the population line Y = β₀ + β₁X and the model Y = β₀ + β₁X + e, with conditional means µY|35 and µY|50 at incomes 35 and 50.

Violation of Assumption 3 – Figure 14.9: the error variance grows with X (shown at incomes 35, 50, and 60).

Assumptions 1, 2, 3 for the Simple Regression Model – Figure 14.10: at each income (35, 50, 60) the errors are centered at zero, normally distributed, and have equal variance.

Estimating the Error Variance, σe²

s² = σ̂e² = estimate of σe² = SSE / (n - 2)

where SSE = ∑(y - ŷ)² = SSY - (SCPXY)² / SSX

Three Possible Populations – Figure 14.11: (A) β₁ < 0, (B) β₁ = 0, (C) β₁ > 0.

Hypothesis Test on the Slope of the Regression Line (Two-Tailed Test)

H₀: β₁ = 0 (X provides no information)
Hₐ: β₁ ≠ 0 (X does provide information)

Test statistic: t = b₁ / sb₁

Reject H₀ if |t| > tα/2,n-2
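A sketch of the two-tailed slope test on hypothetical data; sb₁ = s/√SSX, and the critical value t.025,8 = 2.306 is hard-coded for n = 10 at α = .05:

```python
import math

# Hypothetical sample: income (thousands) and square footage (hundreds)
x = [26, 33, 38, 45, 49, 52, 58, 63, 70, 76]
y = [10, 12, 15, 16, 18, 19, 22, 24, 27, 30]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

b1 = scp_xy / ss_x
sse = ss_y - scp_xy ** 2 / ss_x       # SSE = SSY - (SCPXY)²/SSX
s = math.sqrt(sse / (n - 2))          # estimate of σe
s_b1 = s / math.sqrt(ss_x)            # standard error of the slope

t_stat = b1 / s_b1
t_crit = 2.306                        # tα/2,n-2 = t.025,8
reject_h0 = abs(t_stat) > t_crit      # two-tailed decision rule
```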

Hypothesis Test on the Slope of the Regression Line (One-Tailed Tests)

H₀: β₁ ≤ 0 versus Hₐ: β₁ > 0: reject H₀ if t > tα,n-2
H₀: β₁ ≥ 0 versus Hₐ: β₁ < 0: reject H₀ if t < -tα,n-2

Test statistic: t = b₁ / sb₁

t Curve with 8 df – Figure 14.12: rejection region to the right of t = 1.860.

Real Estate Example Figure 14.13

Real Estate Example Figure 14.14

Real Estate Example Figure 14.15

Scatter Diagram – Figure 14.16: liquid assets (% of annual income, Y) versus age (X), with fitted line Ŷ = -.814 + .3526X.

Confidence Interval for 1 The (1 - ) • 100% confidence interval for 1 is b1 - t/2,n-2sb to b1 + t/2,n-2sb 1

Curvilinear Relationship Figure 14.17

Measuring the Strength of the Model

H₀: ρ = 0 (no linear relationship exists between X and Y)
Hₐ: ρ ≠ 0 (a linear relationship does exist)

Test statistic: t = r / √[(1 - r²) / (n - 2)]
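A sketch of this test on hypothetical data. Algebraically, this t statistic is identical to the slope test statistic b₁/sb₁, which the final assertion checks:

```python
import math

# Hypothetical sample
x = [26, 33, 38, 45, 49, 52, 58, 63, 70, 76]
y = [10, 12, 15, 16, 18, 19, 22, 24, 27, 30]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

r = scp_xy / math.sqrt(ss_x * ss_y)
t_corr = r / math.sqrt((1 - r ** 2) / (n - 2))   # test of H0: ρ = 0

# Equivalent slope-based statistic b1 / sb1
b1 = scp_xy / ss_x
s = math.sqrt((ss_y - scp_xy ** 2 / ss_x) / (n - 2))
t_slope = b1 / (s / math.sqrt(ss_x))
```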

Danger of Assuming Causality A high statistical correlation does not imply causality. There are many situations in which variables are highly correlated only because a factor not being studied affects both of the variables being studied.

Coefficient of Determination

r² = coefficient of determination = (SCPXY)² / (SSX · SSY) = 1 - SSE / SSY
   = percentage of explained variation in the dependent variable using the simple linear regression model

where SSE = SSY - (SCPXY)² / SSX
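The two expressions for r² above can be verified against each other; a sketch on hypothetical data:

```python
# Hypothetical sample
x = [26, 33, 38, 45, 49, 52, 58, 63, 70, 76]
y = [10, 12, 15, 16, 18, 19, 22, 24, 27, 30]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

r2_direct = scp_xy ** 2 / (ss_x * ss_y)    # (SCPXY)² / (SSX · SSY)
sse = ss_y - scp_xy ** 2 / ss_x
r2_from_sse = 1 - sse / ss_y               # 1 - SSE/SSY
```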

Total Variation, SSY – Figure 14.18: a sample point (x, y), its deviation y - ȳ from the mean, and the fitted line Ŷ = b₀ + b₁X.

Total Variation, SSY

SSY = SSR + SSE
SSR = (SCPXY)² / SSX

Figure 14.18

Estimation and Prediction Using the Simple Linear Model The least squares line can be used to estimate average values or predict individual values

Confidence Interval for µY|x The (1 - α) · 100% confidence interval for µY|x is

Ŷ - tα/2,n-2 sŶ to Ŷ + tα/2,n-2 sŶ

where sŶ = s √[1/n + (x₀ - x̄)² / SSX]

Confidence Prediction Intervals Figure 14.19

95% Confidence Intervals – Figure 14.20: upper and lower confidence limits around Ŷ = 4.975 + .3539X, with x̄ = 49.8; limits of 12.33 and 20.27 are marked.

Prediction Interval for Y|x The prediction interval for an individual value of Y at x₀ is

Ŷ - tα/2,n-2 s √[1 + 1/n + (x₀ - x̄)² / SSX] to Ŷ + tα/2,n-2 s √[1 + 1/n + (x₀ - x̄)² / SSX]

where sŶ² = s² [1 + 1/n + (x₀ - x̄)² / SSX]
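A sketch contrasting the two intervals at a single x₀, on hypothetical data with t.025,8 = 2.306 hard-coded; the prediction interval is always the wider of the two because of the extra 1 under the radical:

```python
import math

# Hypothetical sample
x = [26, 33, 38, 45, 49, 52, 58, 63, 70, 76]
y = [10, 12, 15, 16, 18, 19, 22, 24, 27, 30]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

b1 = scp_xy / ss_x
b0 = y_bar - b1 * x_bar
s = math.sqrt((ss_y - scp_xy ** 2 / ss_x) / (n - 2))

x0 = 50
y_hat0 = b0 + b1 * x0
t_crit = 2.306                                                   # t.025,8

se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)        # for µY|x0
se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)    # for individual Y

ci = (y_hat0 - t_crit * se_mean, y_hat0 + t_crit * se_mean)
pi = (y_hat0 - t_crit * se_pred, y_hat0 + t_crit * se_pred)
```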

95% Confidence Intervals – Figure 14.21: prediction interval limits (8.17 and 24.43) lie outside the confidence interval limits (12.33 and 20.27), shown at x̄ = 49.8.

Checking Model Assumptions
1. The errors are normally distributed with a mean of zero.
2. The variance of the errors remains constant. For example, you should not observe larger errors associated with larger values of X.
3. The errors are independent.

Examination of Residuals – Figure 14.22: plot of the residuals Y - Ŷ.

Examination of Residuals – Figure 14.23: residuals Y - Ŷ plotted against time (1992 through 2001).

Checking for Outliers Figure 14.24

Identifying Outlying Values Outlying sample values can be found by calculating the sample leverage:

hᵢ = 1/n + (xᵢ - x̄)² / SSX

where SSX = ∑x² - (∑x)²/n

A sample value is considered an outlier if its leverage is greater than 4/n or 6/n.

Real Estate Example Figure 14.25(a)

Real Estate Example Figure 14.25(b)

Identifying Outlying Values Unusually large or small values of the dependent variable (Y) can generally be detected using the sample standardized residuals:

standardized residual = (yᵢ - ŷᵢ) / [s √(1 - hᵢ)]

where s √(1 - hᵢ) is the estimated standard deviation of the ith residual.

An observation is thought to have an outlying value of Y if its standardized residual is > 2 or < -2.

Identifying Influential Observations Cook's distance measure:

Dᵢ = (1/2) · (standardized residual)² · hᵢ / (1 - hᵢ)²

You may conclude the ith observation is influential if the corresponding Dᵢ measure is greater than .8
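The three diagnostics above (leverage, standardized residual, Cook's distance) can be sketched together on hypothetical data; a useful check is that the leverages in simple regression always sum to 2:

```python
import math

# Hypothetical sample
x = [26, 33, 38, 45, 49, 52, 58, 63, 70, 76]
y = [10, 12, 15, 16, 18, 19, 22, 24, 27, 30]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

b1 = scp_xy / ss_x
b0 = y_bar - b1 * x_bar
s = math.sqrt((ss_y - scp_xy ** 2 / ss_x) / (n - 2))

leverage, std_resid, cooks_d = [], [], []
for xi, yi in zip(x, y):
    h = 1 / n + (xi - x_bar) ** 2 / ss_x              # sample leverage
    sr = (yi - (b0 + b1 * xi)) / (s * math.sqrt(1 - h))
    d = (sr ** 2 / 2) * h / (1 - h) ** 2              # Cook's distance
    leverage.append(h)
    std_resid.append(sr)
    cooks_d.append(d)

# Flag observations by the rules of thumb on the slides
high_leverage = [i for i, h in enumerate(leverage) if h > 4 / n]
outlying_y = [i for i, sr in enumerate(std_resid) if abs(sr) > 2]
influential = [i for i, d in enumerate(cooks_d) if d > 0.8]
```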

Leverages, Standardized Residuals, and Cook’s Distance Measures Figure 14.26

Engine Capacity and MPG Figure 14.27

Engine Capacity and MPG Figure 14.28

Engine Capacity and MPG Figure 14.29

Engine Capacity and MPG Figure 14.30

Engine Capacity and MPG – Figure 14.31: frequency histogram of residuals, with class limits running from -8 to 5.5 in steps of 1.5.