Review Session: Linear Regression

Correlation
Pearson's r
- Measures the strength and type of a relationship between the x and y variables
- Ranges from -1 to +1

Value | Relationship / Meaning
  +1  | Perfect direct relationship between x and y. As x gets bigger, so does y.
   0  | No relationship exists between x and y.
  -1  | Perfect inverse relationship between x and y. As x gets bigger, y gets smaller.

Correlation printout in Minitab
- Top number is the correlation
- Bottom number is the p-value

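Outside Minitab, the same two numbers can be reproduced with a short script. A minimal sketch using Python's scipy; the gmat/gpa values are invented placeholders, not the course dataset:

```python
# Minimal sketch: Pearson's r and its p-value, the two numbers in the
# Minitab printout. Data are invented placeholders.
from scipy.stats import pearsonr

gmat = [400, 450, 500, 550, 600, 650, 700]
gpa  = [2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 3.7]

r, p_value = pearsonr(gmat, gpa)   # top number, bottom number
print(f"Correlation: {r:.3f}")
print(f"P-Value:     {p_value:.3f}")
```
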
Simple Linear Regression
y = b0 + b1x1 + e
- y: response
- x1: predictor
- b0: constant (y-intercept)
- b1: coefficient for x1 (slope)
- e: error

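As a minimal sketch, the same model can be fit by ordinary least squares with Python's statsmodels; the gmat/gpa arrays are the same invented placeholders as above:

```python
# Minimal sketch: fitting y = b0 + b1*x1 + e by ordinary least squares.
# Data are invented placeholders, not the course dataset.
import numpy as np
import statsmodels.api as sm

gmat = np.array([400, 450, 500, 550, 600, 650, 700])
gpa  = np.array([2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 3.7])

X = sm.add_constant(gmat)    # adds the b0 (intercept) column
model = sm.OLS(gpa, X).fit()
print(model.params)          # [b0, b1]
```
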
Simple Linear Regression: Making a Point Prediction
y = b0 + b1x1 + e
GPA = 1.47 + 0.00323(GMAT)
For a person with a GMAT score of 400, what is the expected 1st-year GPA?
GPA = 1.47 + 0.00323(400)
    = 1.47 + 1.292
    = 2.76

Simple Linear Regression
y = b0 + b1x1 + e
GPA = 1.47 + 0.00323(GMAT)
What is the 95% CI for the GPA of a person with a GMAT score of 400?
GPA = 2.76, SE = 0.26
2.76 +/- 2(0.26)
95% CI = (2.24, 3.28)

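This arithmetic is easy to script. A sketch using the slide's fitted coefficients and the 2-standard-error rule of thumb:

```python
# Sketch of the slide's arithmetic: point prediction and rough 95% CI
# using the fitted equation GPA = 1.47 + 0.00323 * GMAT and SE = 0.26.
b0, b1, se = 1.47, 0.00323, 0.26

def predict_gpa(gmat):
    return b0 + b1 * gmat

gpa_hat = predict_gpa(400)                 # 2.762, about 2.76
ci = (gpa_hat - 2 * se, gpa_hat + 2 * se)  # about (2.24, 3.28) after rounding
print(round(gpa_hat, 2), ci)
```
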
Coefficient CIs and Testing
y = b0 + b1x1 + e
GPA = 1.47 + 0.00323(GMAT)
Find the 95% CI for the coefficients:
b0 = 1.47 +/- 2(0.22) = 1.47 +/- 0.44 = (1.03, 1.91)
b1 = 0.0032 +/- 2(0.0004) = 0.0032 +/- 0.0008 = (0.0024, 0.0040)

Coefficient Testing
y = b0 + b1x1 + e
GPA = 1.47 + 0.00323(GMAT)
The p-value for each coefficient is the result of a hypothesis test:
H0: b = 0
H1: b ≠ 0
If the p-value <= 0.05, reject H0 and keep the coefficient in the model.

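Continuing the statsmodels sketch, the coefficient CIs and p-values come straight from the fitted model; the fit is repeated here on the placeholder data so the snippet runs standalone:

```python
# Sketch: coefficient confidence intervals and the t-test of H0: b = 0.
import numpy as np
import statsmodels.api as sm

gmat = np.array([400, 450, 500, 550, 600, 650, 700])
gpa  = np.array([2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 3.7])
model = sm.OLS(gpa, sm.add_constant(gmat)).fit()

print(model.conf_int(alpha=0.05))  # 95% CI for each coefficient
print(model.pvalues)               # p-value of each H0: b = 0 test
# Keep a predictor when its p-value <= 0.05 (reject H0).
```
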
r² and R²
- Square of Pearson's r
- Little r² is for simple regression
- Big R² is used for multiple regression

R² interpretation:
- 0: no correlation
- 1: perfect correlation

Sample R² values
[Scatter plots illustrating fits with R² = 0.80, 0.60, 0.30, and 0.20]

Regression ANOVA
H0: b1 = b2 = ... = bk = 0
Ha: at least one b ≠ 0
The printout reports the F-statistic, df1, df2, and the p-value.
- If p <= 0.05, at least one of the b's is not zero
- If p > 0.05, it is possible that all of the b's are zero

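If the model was fit with statsmodels as in the earlier sketches, the ANOVA quantities are attributes of the fit; the placeholder fit is repeated so this runs on its own:

```python
# Sketch: the regression ANOVA F-test, which checks
# H0: b1 = b2 = ... = bk = 0 against Ha: at least one b != 0.
import numpy as np
import statsmodels.api as sm

gmat = np.array([400, 450, 500, 550, 600, 650, 700])
gpa  = np.array([2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 3.7])
model = sm.OLS(gpa, sm.add_constant(gmat)).fit()

print(f"F = {model.fvalue:.2f}, df1 = {model.df_model:.0f}, df2 = {model.df_resid:.0f}")
print(f"p-value = {model.f_pvalue:.4f}")  # p <= 0.05: at least one b is nonzero
```
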
Diagnostics - Residuals
- Residuals = errors
- Residuals should be normally distributed
- Residuals should have a constant variance
  - Homoscedasticity: no pattern in the residual distribution
  - Heteroscedasticity: the residual spread changes with the fitted values or with an independent variable (a pattern in the residual plot)
- Autocorrelation: residuals are correlated with one another (e.g., successive errors move together)
- Heteroscedasticity and autocorrelation indicate problems with the model
- Use the 4-in-one plot for these diagnostics (a rough equivalent is sketched below)

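Minitab draws the 4-in-one plot for you; a rough Python equivalent can be sketched with matplotlib and scipy. The model is refit from the earlier placeholder data so the snippet runs standalone:

```python
# Sketch: a rough equivalent of Minitab's 4-in-one residual plot.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
import statsmodels.api as sm

gmat = np.array([400, 450, 500, 550, 600, 650, 700])
gpa  = np.array([2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 3.7])
model = sm.OLS(gpa, sm.add_constant(gmat)).fit()
resid, fitted = model.resid, model.fittedvalues

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
scipy.stats.probplot(resid, plot=ax[0, 0])  # normal probability plot
ax[0, 1].scatter(fitted, resid)             # vs fits: a fan shape suggests heteroscedasticity
ax[0, 1].axhline(0, color="gray")
ax[1, 0].hist(resid)                        # histogram: should look roughly normal
ax[1, 1].plot(resid, marker="o")            # vs observation order: trends suggest autocorrelation
plt.tight_layout()
plt.show()
```
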
Adding a Power Transformation
Each "bump" or "U" shape in a scatter plot indicates that an additional power may be involved:
- 0 bumps: x
- 1 bump: x²
- 2 bumps: x³
The standard equation is y = b0 + b1x + b2x²
Don't forget: check that b1 and b2 are statistically significant, and that the model as a whole is also statistically significant.

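A sketch of fitting the standard equation with a squared term in statsmodels; the data are invented and deliberately curved:

```python
# Sketch: adding a squared term, y = b0 + b1*x + b2*x^2.
# Data are invented placeholders with a deliberate curve.
import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.8, 12.1, 16.2, 20.9, 26.1])

X = sm.add_constant(np.column_stack([x, x**2]))  # columns: 1, x, x^2
model2 = sm.OLS(y, X).fit()
print(model2.params)    # b0, b1, b2
print(model2.pvalues)   # check that b1 and b2 are significant
print(model2.f_pvalue)  # check that the model as a whole is significant
```
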
Categorical Variables
Occasionally it is necessary to add a categorical variable to a regression model.
Suppose that we have a car dealership, and we want to model the sale price based on the time on the lot and the salesperson (Tom, Dick, or Harry).
- The time on the lot is a continuous (numeric) variable.
- Salesperson is a categorical variable.

Categorical Variables
Categorical variables are modeled in regression using Boolean (0/1) logic:
- Yes = 1
- No = 0
Example: y = b0 + b_time*x_time + b_Tom*x_Tom + b_Dick*x_Dick

        x_Tom  x_Dick
Tom       1      0
Dick      0      1
Harry     0      0

Categorical Variables

        x_Tom  x_Dick
Tom       1      0
Dick      0      1
Harry     0      0

Harry is the baseline category for the model.
Tom's and Dick's performance will be gauged in relation to Harry, but not to each other.
Example: y = b0 + b_time*x_time + b_Tom*x_Tom + b_Dick*x_Dick

Categorical Variables
Interpretation:
- Tom's average sale price is b_Tom more than Harry's
- Dick's average sale price is b_Dick more than Harry's
y = b0 + b_time*x_time + b_Tom*x_Tom + b_Dick*x_Dick
- Tom   (x_Tom = 1, x_Dick = 0): y = b0 + b_time*x_time + b_Tom
- Dick  (x_Tom = 0, x_Dick = 1): y = b0 + b_time*x_time + b_Dick
- Harry (x_Tom = 0, x_Dick = 0): y = b0 + b_time*x_time

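A sketch of this dummy coding in Python with pandas and statsmodels; the column names follow the slides, and the data are invented placeholders:

```python
# Sketch: 0/1 dummy coding of the salesperson variable, Harry as baseline.
# Prices and days-on-lot are invented placeholders.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "price":       [21000, 19500, 22800, 18900, 20400, 23100],
    "days_on_lot": [12, 30, 5, 41, 22, 9],
    "salesperson": ["Tom", "Dick", "Harry", "Tom", "Harry", "Dick"],
})

# get_dummies creates one 0/1 column per person; keeping only Tom and
# Dick makes Harry the baseline, as on the slide.
dummies = pd.get_dummies(df["salesperson"])[["Tom", "Dick"]].astype(float)
X = sm.add_constant(pd.concat([df["days_on_lot"], dummies], axis=1))
fit = sm.OLS(df["price"], X).fit()
print(fit.params)  # b_time, b_Tom, b_Dick; the last two are relative to Harry
```
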
Multicollinearity
Multicollinearity: predictor variables are correlated with each other.
Multicollinearity results in instability in the estimation of the b's:
- P-values will be larger
- Confidence in the b's decreases or disappears (magnitude and sign may differ from the expected values)
- A small change in the data results in large variations in the coefficients
- Read section 11.11

VIF - Variance Inflation Factor
Measures the degree to which the confidence in the estimate of a coefficient is decreased by multicollinearity.
The larger the VIF, the greater the multicollinearity problem:
- If VIF > 10, there may be a problem
- If VIF >= 15, there may be a serious problem

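statsmodels can compute VIFs directly. A sketch; the predictor DataFrame is invented to stand in for a real multiple-regression design matrix:

```python
# Sketch: VIF per predictor with statsmodels. Columns are invented
# placeholders for a multiple-regression predictor matrix.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

X_df = pd.DataFrame({
    "days_on_lot": [12, 30, 5, 41, 22, 9, 17, 26],
    "list_price":  [22.0, 19.9, 23.5, 18.5, 21.0, 23.9, 20.5, 19.5],
    "mileage":     [30, 55, 18, 72, 41, 15, 35, 50],
})

for i, name in enumerate(X_df.columns):
    # VIF > 10: possible problem; VIF >= 15: serious problem
    print(name, variance_inflation_factor(X_df.values, i))
```
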
Model Selection
1. Start with everything.
2. Delete variables with high VIF factors, one at a time.
3. Delete variables one at a time, removing the one with the largest p-value.
4. Stop when all p-values are less than 0.05.

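Steps 3 and 4 amount to backward elimination on p-values. A sketch; it assumes y is the response and X is a pandas DataFrame of predictors with a "const" column (e.g., from sm.add_constant), both hypothetical names:

```python
# Sketch of steps 3-4: drop the predictor with the largest p-value,
# refit, and stop once every remaining p-value is <= 0.05.
import statsmodels.api as sm

def backward_eliminate(y, X, threshold=0.05):
    X = X.copy()
    while True:
        fit = sm.OLS(y, X).fit()
        pvals = fit.pvalues.drop("const", errors="ignore")  # keep the intercept
        if pvals.empty or pvals.max() <= threshold:
            return fit  # every remaining predictor is significant
        X = X.drop(columns=pvals.idxmax())  # delete the worst one first
```
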
Demand-Price Curve
The demand-price function is nonlinear: D = b0 * P^b1
A log transformation makes it linear: ln(D) = ln(b0) + b1*ln(P)
- Run the regression on the transformed variables.
- Plug the coefficients into the equation D = e^(b0) * P^b1, where b0 is the fitted constant.
- Make your projections with this last equation.

Demand-Price Curve
The demand-price function is nonlinear: d = k*p^b1
A log transformation makes it linear: ln(d) = b0 + b1*ln(p)
1. Create a variable for the natural log of demand and the natural log of each independent variable.
   - In Excel: =LN(demand), =LN(price), =LN(income), etc.
2. Run the regression on the transformed variables.
3. Place the coefficients in the equation: d = e^(constant) * p^b1 * i^b2
4. Simplify to: d = k*p^b1 * i^b2 (note that e^(constant) = k)
5. If income is not included, the equation is just: d = k*p^b1
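A sketch of the log-log fit for the price-only case (step 5); the price/demand numbers are invented placeholders:

```python
# Sketch: log-log demand-price regression, ln(d) = b0 + b1*ln(p).
# Price and demand values are invented placeholders.
import numpy as np
import statsmodels.api as sm

price  = np.array([5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
demand = np.array([980, 790, 660, 560, 490, 430])

X = sm.add_constant(np.log(price))
fit = sm.OLS(np.log(demand), X).fit()
b0, b1 = fit.params

k = np.exp(b0)                        # d = k * p^b1
print(f"d = {k:.1f} * p^({b1:.3f})")
print(k * 7.5 ** b1)                  # projected demand at a price of 7.5
```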