Simple Linear Regression and Correlation


TOPIC 12: Simple Linear Regression and Correlation

Models
A model is a representation of some phenomenon. A mathematical model is a mathematical expression of some phenomenon and often describes relationships between variables. There are two types: deterministic models and probabilistic models.

Deterministic Models
Hypothesize exact relationships. Suitable when prediction error is negligible. Example: force is exactly mass times acceleration, F = m·a.

Probabilistic Models
Hypothesize two components: a deterministic component and a random error. Example: sales volume (y) is 10 times advertising spending (x) plus random error, y = 10x + ε. The random error may be due to factors other than advertising.
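
To make the two components concrete, here is a minimal simulation sketch of the sales example. The function name `simulate_sales`, the normal error distribution and the `noise_sd` parameter are assumptions made for illustration; the slide only says that the error is random.

```python
import random

def simulate_sales(ad_spend, rng, noise_sd=1.0):
    """Return (deterministic part, observed y) for the model y = 10x + error."""
    deterministic = 10 * ad_spend        # exact component from the slide
    error = rng.gauss(0.0, noise_sd)     # assumed: normally distributed random error
    return deterministic, deterministic + error

rng = random.Random(42)                  # fixed seed so the run is repeatable
for x in (1, 2, 3):
    det, y = simulate_sales(x, rng)
    print(f"x={x}: deterministic={det}, observed y={y:.2f}")
```

Setting `noise_sd=0.0` removes the random component and turns the sketch into a deterministic model.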

Regression Models (model map)
[Diagram: probabilistic models branch into regression models and correlation models.]

Regression Models
Answer the question 'What is the relationship between the variables?' An equation is used, with one numerical dependent (response) variable, which is what is to be predicted, and one or more numerical or categorical independent (explanatory) variables. Used mainly for prediction and estimation.

Regression Modeling Steps
1. Hypothesize the deterministic component.
2. Estimate the unknown model parameters.
3. Specify the probability distribution of the random error term.
4. Estimate the standard deviation of the error.
5. Evaluate the model.
6. Use the model for prediction and estimation.

Types of Regression Models
[Diagram: regression models split into simple (1 explanatory variable) and multiple (2+ explanatory variables); each of these can be linear or nonlinear.]
This classification is based on the number of explanatory variables and the nature of the relationship between x and y.

Linear Regression Model
The relationship between the variables is a linear function:
$y = \beta_0 + \beta_1 x + \varepsilon$
where y is the dependent (response) variable, x is the independent (explanatory) variable, $\beta_0$ is the population y-intercept, $\beta_1$ is the population slope, and $\varepsilon$ is the random error.

Population & Sample Regression Model
[Figure: a random sample is drawn from a population in which the relationship is unknown.]

Population Linear Regression Model
[Figure: observed values scattered around the line of means $E(y) = \beta_0 + \beta_1 x$. Each observed value is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\varepsilon_i$ is the random error.]

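Sample Linear Regression Model
[Figure: fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ through the sampled points. Each observed value is $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{\varepsilon}_i$, where $\hat{\varepsilon}_i$ is the random error; an unsampled observation is also marked.]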

Scatter Diagram
A plot of all $(x_i, y_i)$ pairs. Suggests how well the model will fit.
[Figure: scatter plot of y against x.]

Thinking Challenge
How would you draw a line through the points? How do you determine which line 'fits best'?
[Figure: scatter plot of y against x.]

Least Squares Method
'Best fit' means the differences between the actual y values and the predicted y values are a minimum. But positive differences offset negative ones, so least squares minimizes the sum of the squared differences (SSE):
$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
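
The point about offsetting differences can be checked numerically. The sketch below uses the advertising data from the Hasbro example in this deck and compares two candidate lines; the helper names `residuals` and `sse` and the choice of a flat comparison line are illustrative assumptions.

```python
x = [1, 2, 3, 4, 5]          # advertising
y = [1, 1, 2, 2, 4]          # sales

def residuals(b0, b1):
    """Actual y minus predicted y for the line y-hat = b0 + b1 * x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sse(b0, b1):
    """Sum of squared differences for the same line."""
    return sum(e ** 2 for e in residuals(b0, b1))

flat = (2.0, 0.0)            # horizontal line through the mean of y
fitted = (-0.1, 0.7)         # least squares line reported in the slides

for name, line in (("flat", flat), ("fitted", fitted)):
    print(name, "sum of differences:", round(sum(residuals(*line)), 10),
          "SSE:", round(sse(*line), 4))
```

Both lines have differences that sum to zero, so the plain sum cannot tell them apart, while SSE clearly prefers the fitted line.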

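Least Squares Graphically
[Figure: fitted line through four points, with the vertical deviations $\hat{\varepsilon}_1$, $\hat{\varepsilon}_2$, $\hat{\varepsilon}_3$, $\hat{\varepsilon}_4$ marked between each point and the line.]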

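Coefficient Equations
Prediction equation: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Slope: $\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}$
y-intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$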

Least Squares Example
You're a marketing analyst for Hasbro Toys. You gather the following data:
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Find the least squares line relating sales and advertising.

Parameter Estimation Solution Table
$x_i$    $y_i$    $x_i^2$    $y_i^2$    $x_i y_i$
1        1        1          1          1
2        1        4          1          2
3        2        9          4          6
4        2        16         4          8
5        4        25         16         20
Sum: 15  10       55         26         37

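Parameter Estimation Solution
$\hat{\beta}_1 = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \dfrac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = \dfrac{7}{10} = .70$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2 - (.70)(3) = -.10$
$\hat{y} = -.1 + .7x$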
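
The hand calculation for the Hasbro data can be reproduced in a few lines. This is a minimal sketch using the shortcut sums from the solution table; the function name `least_squares` is a hypothetical helper, not something named in the slides.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Shortcut forms: SSxy = sum(xy) - sum(x)sum(y)/n, SSxx = sum(x^2) - sum(x)^2/n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    slope = ss_xy / ss_xx
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope

ad_spend = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
b0, b1 = least_squares(ad_spend, sales)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

The printed coefficients should agree with the INTERCEP and ADVERT estimates in the computer output.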

Parameter Estimation Computer Output
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP   1    -0.1000              0.6350           -0.157              0.8849
ADVERT     1     0.7000              0.1914            3.656              0.0354
The INTERCEP estimate is $\hat{\beta}_0$ and the ADVERT estimate is $\hat{\beta}_1$.

Regression Line Fitted to the Data
[Figure: the fitted regression line drawn through the data, with Sales on the vertical axis and Advertising on the horizontal axis.]

Least Squares Thinking Challenge
You're an economist for the county cooperative. You gather the following data:
Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0
Find the least squares line relating crop yield and fertilizer.

Parameter Estimation Solution Table
$x_i$    $y_i$    $x_i^2$    $y_i^2$    $x_i y_i$
4        3.0      16         9.00       12
6        5.5      36         30.25      33
10       6.5      100        42.25      65
12       9.0      144        81.00      108
Sum: 32  24.0     296        162.50     218

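Parameter Estimation Solution
$\hat{\beta}_1 = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \dfrac{218 - \frac{(32)(24)}{4}}{296 - \frac{(32)^2}{4}} = \dfrac{26}{40} = .65$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 6 - (.65)(8) = .80$
$\hat{y} = .8 + .65x$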
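
As a cross-check on the fertilizer exercise, the sketch below fits the line using deviations from the means, which is algebraically equivalent to the shortcut sums in the solution table. It also confirms that the residuals of a least squares line sum to zero. Variable names are illustrative.

```python
fertilizer = [4, 6, 10, 12]
crop_yield = [3.0, 5.5, 6.5, 9.0]

n = len(fertilizer)
x_bar = sum(fertilizer) / n
y_bar = sum(crop_yield) / n

# Deviation forms of SSxy and SSxx
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fertilizer, crop_yield))
ss_xx = sum((x - x_bar) ** 2 for x in fertilizer)

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
fit_residuals = [y - (b0 + b1 * x) for x, y in zip(fertilizer, crop_yield)]

print(f"y-hat = {b0:.2f} + {b1:.2f} x")
print("sum of residuals:", round(sum(fit_residuals), 10))
```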

Regression Line Fitted to the Data
[Figure: the fitted regression line drawn through the data, with Yield (lb.) on the vertical axis and Fertilizer (lb.) on the horizontal axis.]

Linear Regression Assumptions
1. The mean of the probability distribution of the error, ε, is 0.
2. The probability distribution of the error has constant variance.
3. The probability distribution of the error, ε, is normal.
4. The errors are independent.

Error Probability Distribution
[Figure: probability distributions of the error drawn at x1, x2 and x3 along the line E(y) = β0 + β1x.]

Random Error Variation
The variation of the actual y from the predicted y, $\hat{y}$. Measured by the standard error of the regression model, which is the sample standard deviation of $\varepsilon$, denoted s. Affects several factors: parameter significance and prediction accuracy.

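Variation Measures
[Figure: for an observation $(x_i, y_i)$, the total sum of squares is built from $y_i - \bar{y}$, the unexplained sum of squares from $y_i - \hat{y}_i$, and the explained sum of squares from $\hat{y}_i - \bar{y}$.]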

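Estimation of $\sigma^2$
$s^2 = \dfrac{\mathrm{SSE}}{n - 2}$, where $\mathrm{SSE} = \sum (y_i - \hat{y}_i)^2$, and $s = \sqrt{s^2}$.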

Calculating SSE, $s^2$, s Example
You're a marketing analyst for Hasbro Toys. You gather the following data:
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Find SSE, $s^2$, and s.

Calculating SSE Solution
$x_i$    $y_i$    $\hat{y}_i = -.1 + .7x_i$    $y_i - \hat{y}_i$    $(y_i - \hat{y}_i)^2$
1        1        .6                           .4                   .16
2        1        1.3                          -.3                  .09
3        2        2.0                          0                    0
4        2        2.7                          -.7                  .49
5        4        3.4                          .6                   .36
SSE = 1.1

Calculating $s^2$ and s Solution
$s^2 = \dfrac{\mathrm{SSE}}{n - 2} = \dfrac{1.1}{5 - 2} = .36667$
$s = \sqrt{.36667} = .6055$
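
The SSE, $s^2$ and s figures for the Hasbro data can be verified with a short script. This is a minimal sketch that plugs in the coefficient estimates reported in the slides (-.1 and .7) rather than refitting the line.

```python
import math

ad_spend = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7                       # estimates taken from the slides

predicted = [b0 + b1 * x for x in ad_spend]
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, predicted))

n = len(sales)
s_squared = sse / (n - 2)                # two parameters estimated, so n - 2
s = math.sqrt(s_squared)

print(f"SSE = {sse:.2f}, s^2 = {s_squared:.5f}, s = {s:.4f}")
```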

Test of Slope Coefficient
Shows whether there is a linear relationship between x and y. Involves the population slope $\beta_1$. Hypotheses: $H_0: \beta_1 = 0$ (no linear relationship); $H_a: \beta_1 \neq 0$ (linear relationship). The theoretical basis is the sampling distribution of the slope.

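Sampling Distribution of Sample Slopes
[Figure: Sample 1 line and Sample 2 line plotted against the population line.]
All possible sample slopes: Sample 1: 2.5; Sample 2: 1.6; Sample 3: 1.8; Sample 4: 2.1; and so on, for a very large number of sample slopes.
[Figure: the sampling distribution of $\hat{\beta}_1$, centered at $\beta_1$ with standard deviation $S_{\hat{\beta}_1}$.]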

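Slope Coefficient Test Statistic
$t = \dfrac{\hat{\beta}_1}{S_{\hat{\beta}_1}}$, with $df = n - 2$, where $S_{\hat{\beta}_1} = \dfrac{s}{\sqrt{SS_{xx}}}$ and $SS_{xx} = \sum x_i^2 - \dfrac{(\sum x_i)^2}{n}$.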

Test of Slope Coefficient Example
You're a marketing analyst for Hasbro Toys. You find $\hat{\beta}_0 = -.1$, $\hat{\beta}_1 = .7$ and s = .6055.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Is the relationship significant at the .05 level of significance?

Solution Table
$x_i$    $y_i$    $x_i^2$    $y_i^2$    $x_i y_i$
1        1        1          1          1
2        1        4          1          2
3        2        9          4          6
4        2        16         4          8
5        4        25         16         20
Sum: 15  10       55         26         37

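Test Statistic Solution
$S_{\hat{\beta}_1} = \dfrac{s}{\sqrt{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}} = \dfrac{.6055}{\sqrt{55 - \frac{(15)^2}{5}}} = \dfrac{.6055}{\sqrt{10}} = .1914$
$t = \dfrac{\hat{\beta}_1}{S_{\hat{\beta}_1}} = \dfrac{.70}{.1914} \approx 3.66$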

Test of Slope Coefficient Solution
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
$\alpha = .05$
$df = 5 - 2 = 3$
Critical values: $t = \pm 3.182$ (area .025 in each tail; reject $H_0$ beyond either value)
Test statistic: $t \approx 3.66$
Decision: Reject $H_0$ at $\alpha = .05$
Conclusion: There is evidence of a relationship
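
The whole test can be traced end to end in code. This is a minimal sketch under one stated assumption: the critical value 3.182 is hard-coded from the slide (a t table lookup for α/2 = .025 with 3 degrees of freedom) rather than computed, so the script needs no statistics library.

```python
import math

x = [1, 2, 3, 4, 5]                      # advertising
y = [1, 1, 2, 2, 4]                      # sales
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))             # standard error of the regression
se_b1 = s / math.sqrt(ss_xx)             # standard error of the slope
t_stat = b1 / se_b1

T_CRITICAL = 3.182                       # from the slide: alpha/2 = .025, df = 3
reject_h0 = abs(t_stat) > T_CRITICAL     # two-sided test

print(f"se(b1) = {se_b1:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The standard error and t statistic should line up with the ADVERT row of the computer output.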

Test of Slope Coefficient Computer Output
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP   1    -0.1000              0.6350           -0.157              0.8849
ADVERT     1     0.7000              0.1914            3.656              0.0354
In the ADVERT row, the estimate is $\hat{\beta}_1$, 'Standard Error' is the estimated standard deviation of the sampling distribution, $S_{\hat{\beta}_1}$, the T column is $t = \hat{\beta}_1 / S_{\hat{\beta}_1}$, and Prob>|T| is the p-value.

Correlation Model Map
[Diagram: probabilistic models branch into regression models and correlation models.]

Correlation Models
Answer the question 'How strong is the linear relationship between two variables?' The measure used is the coefficient of correlation. The sample correlation coefficient is denoted r, and its values range from -1 to +1. It measures the degree of association and does not indicate a cause-effect relationship.

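Coefficient of Correlation
$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$
where
$SS_{xy} = \sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}$, $SS_{xx} = \sum x_i^2 - \dfrac{(\sum x_i)^2}{n}$, $SS_{yy} = \sum y_i^2 - \dfrac{(\sum y_i)^2}{n}$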

Coefficient of Correlation Values
[Figure: number line for r running from -1.0 (perfect negative correlation) through -.5, 0 (no linear correlation) and +.5 to +1.0 (perfect positive correlation). The degree of negative correlation increases toward -1.0 and the degree of positive correlation increases toward +1.0.]

Coefficient of Correlation Example
You're a marketing analyst for Hasbro Toys.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Calculate the coefficient of correlation.

Solution Table
$x_i$    $y_i$    $x_i^2$    $y_i^2$    $x_i y_i$
1        1        1          1          1
2        1        4          1          2
3        2        9          4          6
4        2        16         4          8
5        4        25         16         20
Sum: 15  10       55         26         37

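Coefficient of Correlation Solution
$r = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sqrt{\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}} = \dfrac{37 - \frac{(15)(10)}{5}}{\sqrt{\left(55 - \frac{(15)^2}{5}\right)\left(26 - \frac{(10)^2}{5}\right)}} = \dfrac{7}{\sqrt{(10)(6)}} = .904$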

Coefficient of Correlation Thinking Challenge
You're an economist for the county cooperative. You gather the following data:
Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0
Find the coefficient of correlation.

Solution Table
$x_i$    $y_i$    $x_i^2$    $y_i^2$    $x_i y_i$
4        3.0      16         9.00       12
6        5.5      36         30.25      33
10       6.5      100        42.25      65
12       9.0      144        81.00      108
Sum: 32  24.0     296        162.50     218

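Coefficient of Correlation Solution
$r = \dfrac{218 - \frac{(32)(24)}{4}}{\sqrt{\left(296 - \frac{(32)^2}{4}\right)\left(162.5 - \frac{(24)^2}{4}\right)}} = \dfrac{26}{\sqrt{(40)(18.5)}} = .956$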
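
Both correlation exercises can be checked with one small function. This is a sketch with a hypothetical helper name, `correlation`, that computes r from the sums of squares in deviation form.

```python
import math

def correlation(x, y):
    """Sample coefficient of correlation r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

r_sales = correlation([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])            # Hasbro data
r_yield = correlation([4, 6, 10, 12], [3.0, 5.5, 6.5, 9.0])        # fertilizer data

print(f"advertising vs. sales: r = {r_sales:.3f}")
print(f"fertilizer vs. yield:  r = {r_yield:.3f}")
```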

Coefficient of Determination
The proportion of variation 'explained' by the relationship between x and y.
$0 \le r^2 \le 1$
$r^2 = (\text{coefficient of correlation})^2$

Coefficient of Determination Example
You're a marketing analyst for Hasbro Toys. You know r = .904.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Calculate and interpret the coefficient of determination.

Coefficient of Determination Solution
$r^2 = (\text{coefficient of correlation})^2 = (.904)^2 = .817$
Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.

$r^2$ Computer Output
Root MSE   0.60553    R-square   0.8167
Dep Mean   2.00000    Adj R-sq   0.7556
C.V.       30.27650
R-square is $r^2$. Adj R-sq is $r^2$ adjusted for the number of explanatory variables and the sample size.
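
The R-square and Adj R-sq values can be reproduced from the Hasbro data. The sketch below computes $r^2$ two ways (as 1 - SSE/SS_yy and as the squared correlation) to show that they agree. The adjusted figure uses the standard formula $1 - (1 - r^2)(n - 1)/(n - k - 1)$ with k = 1 explanatory variable; the slides describe the adjustment but do not state this formula, so treat it as an assumption about what the software did.

```python
x = [1, 2, 3, 4, 5]                      # advertising
y = [1, 1, 2, 2, 4]                      # sales
n, k = len(x), 1                         # k = number of explanatory variables
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained

r_squared = 1 - sse / ss_yy                          # explained share of variation
r_squared_from_r = ss_xy ** 2 / (ss_xx * ss_yy)      # (coefficient of correlation)^2
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"R-square = {r_squared:.4f}, Adj R-sq = {adj_r_squared:.4f}")
```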

Prediction With Regression Models
Types of predictions: point estimates and interval estimates. What is predicted: the population mean response E(y) for a given x (a point on the population regression line), or an individual response ($y_i$) for a given x.

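What is Predicted?
[Figure: at $x = x_p$, the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x$ gives the prediction $\hat{y}$. It serves as the estimate both of the mean y, $E(y) = \beta_0 + \beta_1 x$, on the population line and of an individual $y_i$.]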

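Confidence Interval Estimate for Mean Value of y at $x = x_p$
$\hat{y} \pm t_{\alpha/2} \cdot s \sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$, with $df = n - 2$ and $SS_{xx} = \sum x_i^2 - \dfrac{(\sum x_i)^2}{n}$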

Factors Affecting Interval Width
1. Level of confidence $(1 - \alpha)$: width increases as confidence increases.
2. Data dispersion (s): width increases as variation increases.
3. Sample size: width decreases as sample size increases.
4. Distance of $x_p$ from the mean $\bar{x}$: width increases as distance increases.

Why Distance from Mean?
[Figure: a Sample 1 line and a Sample 2 line with different slopes, compared at two values x1 and x2; the predictions at x2 show greater dispersion than at x1.]
The closer to the mean, the less variability. This is due to the variability in estimated slope parameters.

Confidence Interval Estimate Example
You're a marketing analyst for Hasbro Toys. You find $\hat{\beta}_0 = -.1$, $\hat{\beta}_1 = .7$ and s = .6055.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Find a 95% confidence interval for the mean sales when advertising is $4.

Solution Table
$x_i$    $y_i$    $x_i^2$    $y_i^2$    $x_i y_i$
1        1        1          1          1
2        1        4          1          2
3        2        9          4          6
4        2        16         4          8
5        4        25         16         20
Sum: 15  10       55         26         37

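Confidence Interval Estimate Solution
The x to be predicted is $x_p = 4$, so $\hat{y} = -.1 + (.7)(4) = 2.7$. With $df = 3$, $t_{.025} = 3.182$.
$2.7 \pm (3.182)(.6055)\sqrt{\dfrac{1}{5} + \dfrac{(4 - 3)^2}{10}} = 2.7 \pm 1.055$
$1.645 \le E(y) \le 3.755$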
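
The interval for mean sales at $4 of advertising can be recomputed as follows. This is a minimal sketch: the function name `mean_response_interval` is hypothetical, and the t value 3.182 is passed in from the slides' t table (α/2 = .025, df = 3) rather than computed.

```python
import math

def mean_response_interval(x, y, x_p, t_crit):
    """Confidence interval for the mean of y at x = x_p in simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_p
    half_width = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width

ci_low, ci_high = mean_response_interval(
    [1, 2, 3, 4, 5], [1, 1, 2, 2, 4], x_p=4, t_crit=3.182
)
print(f"95% CI for mean sales at x = 4: ({ci_low:.3f}, {ci_high:.3f})")
```

Moving `x_p` away from the mean of x (here 3) widens the interval, which is the distance effect listed under the factors affecting interval width.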

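Prediction Interval of Individual Value of y at $x = x_p$
$\hat{y} \pm t_{\alpha/2} \cdot s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$, with $df = n - 2$
Note the 1 under the radical in the standard error formula. The effect of this extra term (the extra $S_{yx}$, written s here) is to increase the width of the interval. This will be seen in the interval bands.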

Why the Extra 'S'?
[Figure: at $x = x_p$, the prediction $\hat{y}$ on the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, the expected (mean) y on the population line $E(y) = \beta_0 + \beta_1 x$, and the individual y we're trying to predict, which lies a random error $\varepsilon$ away from the mean.]
The error in predicting some future value of y is the sum of two errors: 1. the error of estimating the mean of y, E(y | x); 2. the random error that is a component of the value of y to be predicted. Even if we knew the population regression line exactly, we would still make the error $\varepsilon$.

Prediction Interval Example
You're a marketing analyst for Hasbro Toys. You find $\hat{\beta}_0 = -.1$, $\hat{\beta}_1 = .7$ and s = .6055.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Predict the sales when advertising is $4. Use a 95% prediction interval.

Solution Table
$x_i$    $y_i$    $x_i^2$    $y_i^2$    $x_i y_i$
1        1        1          1          1
2        1        4          1          2
3        2        9          4          6
4        2        16         4          8
5        4        25         16         20
Sum: 15  10       55         26         37

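Prediction Interval Solution
The x to be predicted is $x_p = 4$, so $\hat{y} = -.1 + (.7)(4) = 2.7$. With $df = 3$, $t_{.025} = 3.182$.
$2.7 \pm (3.182)(.6055)\sqrt{1 + \dfrac{1}{5} + \dfrac{(4 - 3)^2}{10}} = 2.7 \pm 2.197$
$.503 \le y_{4} \le 4.897$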
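
The prediction interval differs from the confidence interval for the mean only by the extra 1 under the radical, which the sketch below makes explicit by computing both half-widths side by side. As before, the t value 3.182 is taken from the slides rather than computed, and the variable names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5]                      # advertising
y = [1, 1, 2, 2, 4]                      # sales
x_p, t_crit = 4, 3.182                   # t from the slides: alpha/2 = .025, df = 3

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

y_hat = b0 + b1 * x_p
leverage = 1 / n + (x_p - x_bar) ** 2 / ss_xx
mean_half = t_crit * s * math.sqrt(leverage)          # confidence interval for E(y)
pred_half = t_crit * s * math.sqrt(1 + leverage)      # prediction interval for one y

pi_low, pi_high = y_hat - pred_half, y_hat + pred_half
print(f"95% PI for sales at x = 4: ({pi_low:.3f}, {pi_high:.3f})")
print(f"half-widths: mean {mean_half:.3f} vs. individual {pred_half:.3f}")
```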

Interval Estimate Computer Output
Obs   Dep Var SALES   Pred Value   Std Err Predict   Low95% Mean   Upp95% Mean   Low95% Predict   Upp95% Predict
1     1.000           0.600        0.469             -0.892        2.092         -1.837           3.037
2     1.000           1.300        0.332              0.244        2.355         -0.897           3.497
3     2.000           2.000        0.271              1.138        2.861         -0.111           4.111
4     2.000           2.700        0.332              1.644        3.755          0.502           4.897
5     4.000           3.400        0.469              1.907        4.892          0.962           5.837
Row 4 gives the predicted y when x = 4. The Low95% Mean and Upp95% Mean columns are the confidence interval, the Low95% Predict and Upp95% Predict columns are the prediction interval, and Std Err Predict is $S_{\hat{y}}$.

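Confidence Intervals vs. Prediction Intervals
[Figure: fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ with confidence interval bands and prediction interval bands drawn around it.]
Note: 1. As we move farther from the mean, the bands get wider. 2. The prediction interval bands are wider. Why? Because of the extra $S_{yx}$.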

Any Questions?