1
Chapter 20 Linear and Multiple Regression
2
Empirical Models
Study of the relationship between two or more variables
Response variable: the dependent (output) variable
Predictor or explanatory variables: the independent (input) variables
Deterministic relationship: the outcome can be predicted precisely from the inputs (as in many laws of physics and chemistry)
Regression analysis: statistical tools used to model and explore relationships between variables when the relationship is not deterministic
3
Regression Analysis
Simple regression models: one explanatory variable
Multiple regression models: two or more explanatory variables
Either type of model can be linear or non-linear
4
Simple Linear Regression Model
Population regression line: $y = A + Bx + \epsilon$
y = response variable
A = y-intercept (population parameter)
B = slope (population parameter)
x = explanatory variable
ε = random error, which captures missing or omitted variables and random variation
Estimated regression equation: $\hat{y} = a + bx$
ŷ = estimated value of y for a given x; a and b are the estimates of A and B
5
Scatterplots and Least Squares Line
ε (residual): difference between the actual value of y and the value of y predicted by the population regression line
e: error for the estimated equation, $e = y - \hat{y}$
Sum of squared errors: $SSE = \sum e^2 = \sum (y - \hat{y})^2$
6
Scatterplots and Least Squares Line
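The least squares method finds the values of a and b that minimize SSE; a and b are called the least squares estimates of A and B:
$b = \frac{SS_{xy}}{SS_{xx}}, \quad a = \bar{y} - b\bar{x}$, where $SS_{xy} = \sum (x - \bar{x})(y - \bar{y})$ and $SS_{xx} = \sum (x - \bar{x})^2$
Excel: SLOPE(y, x) returns b, INTERCEPT(y, x) returns a, and FORECAST(x0, y, x) returns ŷ at x = x0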
7
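Scatterplots and Least Squares Line – Example 20.1
Income (x)  Food Exp (y)  (x - x̄)²    (y - ȳ)²   (x - x̄)(y - ȳ)
63          16              73.4694     0.0204      -1.2245
88          25            1127.0408    78.4490     297.3469
38          13             269.8980     9.8776      51.6327
70          19             242.4694     8.1633      44.4898
27           9             752.3265    51.0204     195.9184
51          15              11.7551     1.3061       3.9184
44          16             108.7551     0.0204       1.4898
Mean: x̄ = 54.4286, ȳ = 16.1429
Std Dev: s_x = 20.7594, s_y = 4.9809
Sum: SSxx = 2585.7143, SSyy = 148.8571, SSxy = 593.5714
$b = \frac{SS_{xy}}{SS_{xx}} = \frac{593.5714}{2585.7143} = 0.22956, \quad a = \bar{y} - b\bar{x} = 16.1429 - 0.22956(54.4286) = 3.648$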
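A minimal Python sketch of the least squares computation for Example 20.1. It assumes the seven (income, food expenditure) pairs of the example, with the food expenditure for income 44 taken as 16, the value that reproduces the sums and the Minitab coefficients. The helper names `least_squares` and `forecast` are illustrative only; `forecast` plays the role of Excel's FORECAST.

```python
# Least squares estimates for Example 20.1 (x = income, y = food expenditure).
# Assumption: the food expenditure for income 44 is 16.
xs = [63, 88, 38, 70, 27, 51, 44]
ys = [16, 25, 13, 19, 9, 15, 16]


def least_squares(x, y):
    """Return (a, b, ss_xx, ss_yy, ss_xy) for the estimated line y_hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx        # slope: estimate of B
    a = y_bar - b * x_bar    # intercept: estimate of A
    return a, b, ss_xx, ss_yy, ss_xy


def forecast(x0, a, b):
    """Predicted value y_hat at x = x0 (hypothetical helper, analogous to Excel's FORECAST)."""
    return a + b * x0


a, b, ss_xx, ss_yy, ss_xy = least_squares(xs, ys)
print(f"a = {a:.3f}, b = {b:.5f}")   # a = 3.648, b = 0.22956
print(f"y_hat at x = 55: {forecast(55, a, b):.3f}")
```

Computing deviations from the means, rather than the shortcut $\sum xy - (\sum x)(\sum y)/n$, keeps the arithmetic close to the table layout and is less prone to cancellation error.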
8
Scatterplots and Least Squares Line – Example 20.1
Scatterplot of food expenditure (y) versus income (x) (Minitab: Graph > Scatterplot)
9
Scatterplots and Least Squares Line – Example 20.1
Minitab: Stat > Regression
Regression Analysis: y versus x
The regression equation is y = 3.65 + 0.230 x

Predictor   Coef      SE Coef   T      P
Constant    3.648     1.802     2.02   0.099
x           0.22956   0.03122   7.35   0.001

S = 1.587   R-Sq = 91.5%   R-Sq(adj) = 89.8%

Analysis of Variance
Source          DF   SS       MS       F       P
Regression       1   136.26   136.26   54.08   0.001
Residual Error   5    12.60     2.52
Total            6   148.86
10
Interpretations of a and b
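Interpretation of a: the intercept on the y axis, i.e., the predicted value of y at x = 0. Use caution: when x = 0 lies outside the range of the observed x values, interpreting a is an extrapolation.
Interpretation of b: the slope, i.e., the change in y due to an increase of one unit in x.
b > 0 indicates a positive relationship; b < 0 indicates a negative relationship.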
11
Assumptions of the Regression Model
$y = A + Bx + \epsilon$
The random error ε has a mean equal to zero, so the mean of y for a given x is $\mu_{y|x} = A + Bx$.
The errors associated with different observations are independent.
For any given x, the distribution of errors is normal.
The distribution of population errors for each x has the same standard deviation, $\sigma_\epsilon$.
12
Standard Deviation of Random Errors
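For the population regression model $y = A + Bx + \epsilon$, $\sigma_\epsilon$ is the standard deviation of all errors ε.
Since $\sigma_\epsilon$ is unknown, it is estimated by the standard deviation of the errors for the sample data, $s_e$:
$s_e = \sqrt{\frac{SSE}{n-2}}$, where $SSE = \sum (y - \hat{y})^2 = SS_{yy} - b\,SS_{xy}$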
13
Standard Deviation of Errors – Example 20.2
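ŷ = 3.648 + 0.22956x
Income (x)  Food Exp (y)  ŷ         e = y - ŷ   e²
63          16            18.1105   -2.1105     4.4542
88          25            23.8494    1.1506     1.3238
38          13            12.3715    0.6285     0.3950
70          19            19.7174   -0.7174     0.5147
27           9             9.8464   -0.8464     0.7164
51          15            15.3558   -0.3558     0.1266
44          16            13.7489    2.2511     5.0675
Sum                                  0.0000    12.598
$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{12.598}{5}} = 1.5873$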
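The residual table for Example 20.2 can be reproduced with a short sketch. It assumes the same seven pairs (income 44 paired with food expenditure 16) and uses the divisor n - 2 from the formula for $s_e$; `residual_summary` is a hypothetical helper name.

```python
import math

xs = [63, 88, 38, 70, 27, 51, 44]
ys = [16, 25, 13, 19, 9, 15, 16]   # assumption: y = 16 when x = 44


def residual_summary(x, y):
    """Fit y_hat = a + b*x by least squares; return (fitted, residuals, SSE, s_e)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    fitted = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]   # e = y - y_hat
    sse = sum(e ** 2 for e in residuals)
    s_e = math.sqrt(sse / (n - 2))   # simple regression: n - 2 degrees of freedom
    return fitted, residuals, sse, s_e


fitted, residuals, sse, s_e = residual_summary(xs, ys)
for xi, yi, fi, ei in zip(xs, ys, fitted, residuals):
    print(f"{xi:3d} {yi:3d} {fi:8.4f} {ei:8.4f} {ei ** 2:8.4f}")
print(f"SSE = {sse:.4f}, s_e = {s_e:.4f}")
```

The residuals of a least squares line with an intercept always sum to zero (up to floating-point error), which is a quick check on hand calculations.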
14
Standard Deviation of Errors – Example 20.2
In the Minitab output for Example 20.1, S = 1.587 is se. The Residual Error row gives DF = n - 2 = 5, SS = SSE = 12.60, and MS = MSE = SSE/(n - 2) = 2.52, so $s_e = \sqrt{MSE} = \sqrt{2.52} = 1.587$.
15
Coefficient of Determination and Correlation
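Measures how well the explanatory variable explains the variation in the response variable in the regression model
Total sum of squares: $SST = \sum (y - \bar{y})^2$
Regression sum of squares: $SSR = \sum (\hat{y} - \bar{y})^2$
$SST = SSR + SSE$
Coefficient of determination: $R^2 = \frac{SSR}{SST}$ ($\rho^2$ for population data), with $0 \le R^2 \le 1$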
16
Coefficient of Determination -- Example 20.3
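Income (x)  Food Exp (y)  ŷ         (y - ŷ)² = e²  (y - ȳ)²   (ŷ - ȳ)²
63          16            18.1105    4.4542         0.0204     3.8716
88          25            23.8494    1.3238        78.4490    59.3915
38          13            12.3715    0.3950         9.8776    14.2228
70          19            19.7174    0.5147         8.1633    12.7774
27           9             9.8464    0.7164        51.0204    39.6453
51          15            15.3558    0.1266         1.3061     0.6195
44          16            13.7489    5.0675         0.0204     5.7311
Sum                                  SSE = 12.598   SST = 148.857  SSR = 136.259
ȳ = 16.1429
$R^2 = \frac{SSR}{SST} = \frac{136.259}{148.857} = 0.9154$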
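A sketch that checks SST = SSR + SSE numerically and forms R² for Example 20.3, again assuming the seven Example 20.1 pairs with y = 16 at x = 44; the function name `sums_of_squares` is illustrative.

```python
xs = [63, 88, 38, 70, 27, 51, 44]
ys = [16, 25, 13, 19, 9, 15, 16]   # assumption: y = 16 when x = 44


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least squares line y_hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    fitted = [a + b * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained by the line
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # left unexplained
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares(xs, ys)
r_squared = ssr / sst
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, R^2 = {r_squared:.4f}")
```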
17
Coefficient of Determination – Example 20.3
In the Minitab output for Example 20.1, SSR is the Regression SS (136.26) and SST is the Total SS (148.86), so $R^2 = \frac{136.26}{148.86} = 0.915$, reported as R-Sq = 91.5%.
18
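Correlation
Pearson product-moment correlation coefficient: measures the strength of the linear association between two variables
Correlation coefficient: ρ for population data, r for sample data, with $-1 \le \rho \le 1$ and $-1 \le r \le 1$
$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$; r has the same sign as b, and in simple regression $r^2 = R^2$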
19
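Correlation – Example 20.4
Using the sums from Example 20.1 (x = income, y = food expenditure): SSxx = 2585.7143, SSyy = 148.8571, SSxy = 593.5714
$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{593.5714}{\sqrt{2585.7143 \times 148.8571}} = \frac{593.5714}{620.4047} = 0.9567$
r is positive, like b, and $r^2 = 0.9154$ equals the coefficient of determination.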
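A minimal sketch of the sample correlation coefficient, assuming the Example 20.1 data (with y = 16 at x = 44); `pearson_r` is a hypothetical helper name.

```python
import math

xs = [63, 88, 38, 70, 27, 51, 44]
ys = [16, 25, 13, 19, 9, 15, 16]   # assumption: y = 16 when x = 44


def pearson_r(x, y):
    """Sample correlation r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

Because r and b share the numerator SSxy, they always have the same sign. On Python 3.10 or later, `statistics.correlation(xs, ys)` should give the same value.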
20
Multiple Regression Model
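Population regression line: $y = A + B_1x_1 + B_2x_2 + \cdots + B_kx_k + \epsilon$
y = response variable
A = constant term (population parameter)
B1, B2, …, Bk = regression coefficients of x1, x2, …, xk (population parameters)
x1, x2, …, xk = explanatory variables
ε = random error, which captures missing or omitted variables and random variation
Estimated regression equation: $\hat{y} = a + b_1x_1 + b_2x_2 + \cdots + b_kx_k$
ŷ = estimated value of y for given values of x1, x2, …, xk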
21
Least Squares Line
ε (residual): difference between the actual value of y and the value of y predicted by the population regression equation
e: error for the estimated equation, $e = y - \hat{y}$
Sum of squared errors: $SSE = \sum (y - \hat{y})^2$
The regression equation is obtained by choosing a, b1, …, bk to minimize SSE.
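One way to carry out this minimization is to solve the normal equations $(Z^\top Z)c = Z^\top y$, where each row of Z is $[1, x_1, \dots, x_k]$. The sketch below does this in plain Python with Gaussian elimination; it is not necessarily how Minitab computes the fit. The data are hypothetical, constructed so that y = 2 + 3x1 - x2 holds exactly, so the fit should recover those coefficients, and `fit_multiple_regression` is an illustrative name.

```python
def fit_multiple_regression(rows, y):
    """Least squares fit of y_hat = a + b1*x1 + ... + bk*xk.

    rows: list of observations, each a list [x1, ..., xk].
    Returns [a, b1, ..., bk] by solving the normal equations (Z^T Z) c = Z^T y,
    where each row of Z is [1, x1, ..., xk].
    """
    Z = [[1.0] + [float(v) for v in row] for row in rows]
    n, p = len(Z), len(Z[0])
    # Augmented matrix [Z^T Z | Z^T y].
    M = [
        [sum(Z[r][i] * Z[r][j] for r in range(n)) for j in range(p)]
        + [sum(Z[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:   # illustrative tolerance for a singular system
            raise ValueError("explanatory variables are linearly related; no unique fit")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    coef = [0.0] * p
    for i in range(p - 1, -1, -1):
        coef[i] = (M[i][p] - sum(M[i][j] * coef[j] for j in range(i + 1, p))) / M[i][i]
    return coef


# Hypothetical data: y = 2 + 3*x1 - x2 exactly, so the fit should return [2, 3, -1].
X = [[1, 10], [2, 12], [3, 11], [4, 15], [5, 13], [6, 14]]
y = [-5, -4, 0, -1, 4, 6]
print(fit_multiple_regression(X, y))
```

The ValueError branch corresponds to the assumption that the explanatory variables are not linearly related: if one x is an exact linear combination of the others, the normal equations have no unique solution.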
22
Assumptions of the Multiple Regression Model
The random error ε has a mean equal to zero.
The errors associated with different observations are independent.
The distribution of errors is normal.
The distribution of population errors for each set of x values has the same standard deviation, $\sigma_\epsilon$.
The explanatory variables are not linearly related to one another.
The random error has zero correlation with each explanatory variable xi.
23
Standard Deviation of Random Errors
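For the population, $\sigma_\epsilon$ is the standard deviation of all errors ε.
Since $\sigma_\epsilon$ is unknown, it is estimated by the standard deviation of the errors for the sample data, $s_e = \sqrt{\frac{SSE}{n-k-1}}$, where k is the number of explanatory variables.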
24
Coefficient of Multiple Determination
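Total sum of squares: $SST = \sum (y - \bar{y})^2$
Sum of squared errors: $SSE = \sum (y - \hat{y})^2$
Regression sum of squares: $SSR = \sum (\hat{y} - \bar{y})^2$
$SST = SSR + SSE$
Coefficient of multiple determination: $R^2 = \frac{SSR}{SST}$, with $0 \le R^2 \le 1$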
25
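Adjusted Coefficient of Multiple Determination and Correlation
Adjusted coefficient of multiple determination: $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$; unlike R², it does not automatically increase when an explanatory variable is added
Multiple correlation coefficient: $R = \sqrt{R^2}$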
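The adjusted coefficient $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$ is a one-line computation. This sketch checks it against the R-Sq and R-Sq(adj) values Minitab reports for Example 20.1 and for the Demand example; for the Demand example, n = 12 periods is an assumption read from the period numbering.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Example 20.1: n = 7 observations, k = 1 explanatory variable, R^2 = 0.9154.
print(f"{100 * adjusted_r_squared(0.9154, n=7, k=1):.1f}%")    # 89.8%
# Demand example: R^2 = 0.805 with k = 2 (Period, Promotion); n = 12 is assumed.
print(f"{100 * adjusted_r_squared(0.805, n=12, k=2):.1f}%")    # 76.2%
```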
26
Coefficient of Determination – Example 20.3
In the Minitab output for Example 20.1, R-Sq = 91.5% and R-Sq(adj) = 89.8%; with n = 7 and k = 1, $\bar{R}^2 = 1 - (1 - 0.9154)\frac{6}{5} = 0.898$.
27
Multiple Regression Model - Example
Period Promotion Demand 1 10 37 2 12 40 3 41 4 5 45 6 14 50 7 43 8 47 9 56 15 52 11 55 16 54
28
Multiple Regression Analysis - Example
Regression Analysis: Demand versus Period, Promotion
The regression equation has the form Demand = a + b1 Period + b2 Promotion
The coefficient table (Predictor, Coef, SE Coef, T, P) has rows for Constant, Period, and Promotion
R-Sq = 80.5%   R-Sq(adj) = 76.2%
29
Multiple Regression Analysis - Example
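Analysis of Variance (columns DF, SS, MS, F, P): Regression DF = 2, Residual Error DF = 9, Total DF = 11
Sequential sums of squares (Seq SS, DF = 1 each) are listed for Period and then Promotion
Unusual Observations are listed with columns Obs, Period, Demand, Fit, SE Fit, Residual, and St Resid; R denotes an observation with a large standardized residual.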
30
Multiple Regression Analysis
Test of overall significance of the set of regression coefficients B1, B2, …, Bk
Test on an individual regression coefficient Bi
Confidence interval for an individual regression coefficient Bi
31
Test of Overall Significance of Multiple Regression Model
Null hypothesis: H0: B1 = B2 = … = Bk = 0
Alternative hypothesis: H1: at least one of the Bi ≠ 0
Test statistic: $F_0 = \frac{SSR/k}{SSE/(n-k-1)} = \frac{MSR}{MSE}$, with degrees of freedom k and n - k - 1
Alt. hypothesis   P-value      Rejection criterion
H1                P(F > F0)    $F_0 > F_{\alpha,\,k,\,n-k-1}$
32
Test of Overall Significance of Simple Regression Model – Example
In the Analysis of Variance table for Example 20.1 (k = 1, n = 7), $F_0 = \frac{MSR}{MSE} = \frac{136.26}{2.52} \approx 54.1$ (reported as 54.08) with 1 and 5 degrees of freedom and P = 0.001. Since the p-value is below any usual α (for example, $F_0 > F_{.05,\,1,\,5} = 6.61$), H0: B = 0 is rejected.
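A sketch of the overall F test from the ANOVA sums of squares. Computing a p-value needs the F distribution, so this version instead takes the critical value $F_{\alpha,k,n-k-1}$ from a table as an input; `overall_f_test` is a hypothetical name. Using the rounded sums of squares gives 54.07 rather than Minitab's 54.08.

```python
def overall_f_test(ssr, sse, n, k, f_crit):
    """Return (F0, reject) for H0: B1 = ... = Bk = 0.

    f_crit is the table value F_{alpha, k, n-k-1}, supplied by the caller.
    """
    msr = ssr / k              # regression mean square
    mse = sse / (n - k - 1)    # residual mean square
    f0 = msr / mse
    return f0, f0 > f_crit


# Example 20.1 ANOVA: SSR = 136.26, SSE = 12.60, n = 7, k = 1.
# F_{.05, 1, 5} = 6.61 is taken from an F table.
f0, reject = overall_f_test(136.26, 12.60, n=7, k=1, f_crit=6.61)
print(f"F0 = {f0:.2f}, reject H0: {reject}")
```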
33
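Sampling Distribution of b
The mean of b equals B, so b is an unbiased estimator of B: $\mu_b = B$
Standard deviation of b: $\sigma_b = \frac{\sigma_\epsilon}{\sqrt{SS_{xx}}}$, estimated by $s_b = \frac{s_e}{\sqrt{SS_{xx}}}$
Under the model assumptions, b is normally distributed; when $\sigma_\epsilon$ is replaced by $s_e$, $\frac{b - B}{s_b}$ follows a t distribution with n - 2 degrees of freedom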
34
Test of Overall Significance of Simple Regression Model – Example
In the Minitab output for Example 20.1, b is the Coef of x (0.22956) and sb is its SE Coef (0.03122): $s_b = \frac{s_e}{\sqrt{SS_{xx}}} = \frac{1.5873}{\sqrt{2585.7143}} = 0.03122$.
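A sketch of $s_b = s_e/\sqrt{SS_{xx}}$ computed directly from the Example 20.1 data (assuming y = 16 at x = 44). It uses the computational form $SSE = SS_{yy} - b\,SS_{xy}$; `slope_and_standard_error` is an illustrative name.

```python
import math

xs = [63, 88, 38, 70, 27, 51, 44]
ys = [16, 25, 13, 19, 9, 15, 16]   # assumption: y = 16 when x = 44


def slope_and_standard_error(x, y):
    """Return (b, s_b) with s_b = s_e / sqrt(SSxx) for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx
    sse = ss_yy - b * ss_xy          # computational form of SSE
    s_e = math.sqrt(sse / (n - 2))
    return b, s_e / math.sqrt(ss_xx)


b, s_b = slope_and_standard_error(xs, ys)
print(f"b = {b:.5f}, s_b = {s_b:.5f}")
```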
35
Test on an Individual Regression Coefficient
Null hypothesis: H0: Bi = Bi0
Test statistic: $t_0 = \frac{b_i - B_{i0}}{s_{b_i}}$, with degrees of freedom n - k - 1
Alt. hypothesis     P-value            Rejection criterion
H1: Bi ≠ Bi0        2·P(t > |t0|)      $t_0 > t_{\alpha/2,\,n-k-1}$ or $t_0 < -t_{\alpha/2,\,n-k-1}$
H1: Bi > Bi0        P(t > t0)          $t_0 > t_{\alpha,\,n-k-1}$
H1: Bi < Bi0        P(t < t0)          $t_0 < -t_{\alpha,\,n-k-1}$
36
Test of Overall Significance of Simple Regression Model – Example
For H0: B = 0 in Example 20.1, $t_0 = \frac{b}{s_b} = \frac{0.22956}{0.03122} = 7.35$ with n - 2 = 5 degrees of freedom and P = 0.001, so H0 is rejected at any usual α. In simple regression this t test is equivalent to the overall F test: $t_0^2 = F_0$ (up to rounding, $7.35^2 \approx 54.0$ and F = 54.08).
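A sketch of the t test on an individual coefficient, following the rejection criteria in the table for H0: Bi = Bi0. As with the F test, the critical value comes from a t table and is passed in rather than computed; `t_test_coefficient` is a hypothetical name. The usage shows Example 20.1 (two-sided, $t_{.025,5} = 2.571$).

```python
def t_test_coefficient(b, s_b, t_crit, alternative="two-sided", b0=0.0):
    """Return (t0, reject) for H0: B = b0, with t0 = (b - b0) / s_b.

    t_crit is the table value t_{alpha/2, df} for a two-sided test or
    t_{alpha, df} for a one-sided test, supplied by the caller.
    """
    t0 = (b - b0) / s_b
    if alternative == "two-sided":
        reject = abs(t0) > t_crit
    elif alternative == "greater":
        reject = t0 > t_crit
    elif alternative == "less":
        reject = t0 < -t_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return t0, reject


# Example 20.1: b = 0.22956, s_b = 0.03122, df = n - 2 = 5; t_{.025, 5} = 2.571 from a t table.
t0, reject = t_test_coefficient(0.22956, 0.03122, t_crit=2.571)
print(f"t0 = {t0:.2f}, reject H0: {reject}, t0^2 = {t0 ** 2:.2f}")
```

The same function covers the left-tailed test of Example 20.5 with `alternative="less"` and $t_{.05,6} = 1.943$.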
37
Develop a Confidence Interval for an Individual Regression Coefficient
The $(1 - \alpha)100\%$ confidence interval for Bi is $b_i \pm t_{\alpha/2,\,n-k-1}\, s_{b_i}$
38
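Scatterplots and Least Squares Line – Example 20.5
Exp. (x)  Premium (y)  (x - x̄)²    (y - ȳ)²    (x - x̄)(y - ȳ)
5          92           39.0625      30.2500     -34.3750
2         127           85.5625    1640.2500    -374.6250
12         73            0.5625     182.2500     -10.1250
9         104            5.0625     306.2500     -39.3750
15         65           14.0625     462.2500     -80.6250
6          82           27.5625      20.2500      23.6250
25         62          189.0625     600.2500    -336.8750
16         87           22.5625       0.2500       2.3750
Mean: x̄ = 11.25, ȳ = 86.5
Std Dev: s_x = 7.4017, s_y = 21.5208
Sum: SSxx = 383.5000, SSyy = 3242.0000, SSxy = -850.0000
$b = \frac{-850}{383.5} = -2.2164, \quad a = \bar{y} - b\bar{x} = 86.5 - (-2.2164)(11.25) = 111.43$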
39
Standard Deviation of Errors – Example 20.5
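ŷ = 111.43 - 2.2164x
Exp. (x)  Premium (y)  ŷ          e = y - ŷ    e²
5          92          100.3527    -8.3527      69.7671
2         127          107.0020    19.9980     399.9218
12         73           84.8377   -11.8377     140.1307
9         104           91.4870    12.5130     156.5761
15         65           78.1884   -13.1884     173.9338
6          82           98.1362   -16.1362     260.3784
25         62           56.0241     5.9759      35.7111
16         87           75.9720    11.0280     121.6175
Sum                                 0.0000    1358.0365
$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{1358.0365}{6}} = 15.0446$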
40
Coefficient of Determination -- Example 20.5
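Exp. (x)  Premium (y)  ŷ          (y - ŷ)² = e²   (y - ȳ)²    (ŷ - ȳ)²
5          92          100.3527     69.7671        30.2500    191.8965
2         127          107.0020    399.9218      1640.2500    420.3302
12         73           84.8377    140.1307       182.2500      2.7633
9         104           91.4870    156.5761       306.2500     24.8698
15         65           78.1884    173.9338       462.2500     69.0828
6          82           98.1362    260.3784        20.2500    135.4022
25         62           56.0241     35.7111       600.2500    928.7793
16         87           75.9720    121.6175         0.2500    110.8394
Sum                              SSE = 1358.04   SST = 3242.00   SSR = 1883.96
ȳ = 86.5
$R^2 = \frac{SSR}{SST} = \frac{1883.96}{3242.00} = 0.5811$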
41
Correlation – Example 20.5
Using the sums from Example 20.5: SSxx = 383.5, SSyy = 3242, SSxy = -850
$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{-850}{\sqrt{383.5 \times 3242}} = -0.7623$
r is negative, like b, and $r^2 = 0.5811$ equals the coefficient of determination.
42
Test on an Individual Regression Coefficient – Example 20.5
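Null hypothesis: H0: B = 0
Alternative hypothesis: H1: B < 0, with α = .05
Test statistic: $t_0 = \frac{b - 0}{s_b} = \frac{-2.2164}{0.7682} = -2.885$, with degrees of freedom n - 2 = 6
Critical value: $-t_{.05,\,6} = -1.943$; p-value = .0139
Since $t_0 = -2.885 < -1.943$ (equivalently, p-value = .0139 < .05), reject H0.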
43
Confidence Interval for an Individual Regression Coefficient – Example 20.5
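α = 10%: $b \pm t_{\alpha/2,\,n-2}\, s_b = -2.2164 \pm t_{.05,\,6}(0.7682) = -2.2164 \pm 1.943(0.7682) = -2.2164 \pm 1.4926$
The 90% confidence interval for B is -3.709 to -0.724.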
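A sketch of the interval $b \pm t_{\alpha/2,\,n-2}\, s_b$ for Example 20.5, with the t value read from a table and passed in; `coefficient_interval` is an illustrative name.

```python
def coefficient_interval(b, s_b, t_crit):
    """Confidence interval b +/- t_crit * s_b, where t_crit = t_{alpha/2, df} from a t table."""
    margin = t_crit * s_b
    return b - margin, b + margin


# Example 20.5, alpha = 10%: b = -2.2164, s_b = 0.7682, t_{.05, 6} = 1.943.
low, high = coefficient_interval(-2.2164, 0.7682, t_crit=1.943)
print(f"{low:.3f} to {high:.3f}")
```

The whole interval lies below zero, which agrees with rejecting H0: B = 0 in favor of B < 0.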
44
Scatterplots and Least Squares Line – Example 20.5
45
Scatterplots and Least Squares Line – Example 20.5
Regression Analysis: Premium versus Exp.
The regression equation is Premium = 111 - 2.22 Exp.

Predictor   Coef      SE Coef   T       P
Constant    111.43    10.15     10.98   0.000
Exp.        -2.2164   0.7682    -2.89   0.028

S = 15.04   R-Sq = 58.1%   R-Sq(adj) = 51.1%

Analysis of Variance
Source          DF   SS       MS       F      P
Regression       1   1884.0   1884.0   8.32   0.028
Residual Error   6   1358.0    226.3
Total            7   3242.0
46
Residual Analysis (from Minitab)
Histogram of the residuals
Normal probability plot of the residuals
Residuals versus fitted values
Residuals versus order of the data
Residuals versus predictors
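The "St Resid" column and R flag in Minitab's Unusual Observations listing are based on standardized residuals. A sketch for simple regression, assuming the usual definition $e_i / (s_e\sqrt{1 - h_i})$ with leverage $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{SS_{xx}}$ and a flagging cutoff of 2 in absolute value; it uses the Example 20.5 data, and `standardized_residuals` is a hypothetical name.

```python
import math

# Example 20.5 data (x = experience, y = premium).
xs = [5, 2, 12, 9, 15, 6, 25, 16]
ys = [92, 127, 73, 104, 65, 82, 62, 87]


def standardized_residuals(x, y):
    """Return (fitted, residuals, leverages, standardized residuals) for y_hat = a + b*x.

    Standardized residual = e_i / (s_e * sqrt(1 - h_i)), with leverage
    h_i = 1/n + (x_i - x_bar)^2 / SSxx.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    s_e = math.sqrt(sum(e * e for e in resid) / (n - 2))
    lev = [1 / n + (xi - x_bar) ** 2 / ss_xx for xi in x]
    std = [e / (s_e * math.sqrt(1 - h)) for e, h in zip(resid, lev)]
    return fitted, resid, lev, std


fitted, resid, lev, std = standardized_residuals(xs, ys)
for xi, fi, ei, si in zip(xs, fitted, resid, std):
    flag = "R" if abs(si) > 2 else ""   # assumed cutoff: |standardized residual| > 2
    print(f"x = {xi:2d}  fit = {fi:8.3f}  resid = {ei:8.3f}  st resid = {si:6.2f} {flag}")
```

In simple regression the leverages always sum to 2, the number of estimated parameters; points far from x̄ (here x = 25) have the largest leverage and so the most influence on the fitted line. Plotting `resid` against `fitted` gives the residuals-versus-fits plot listed above.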
47
Cautions in Using Regression
Determining whether a model is good or bad: R² and the correlation coefficient alone are not enough
Watch for outliers and influential observations
Avoid multicollinearity
Take extra precautions when extrapolating