
1 Review Session Linear Regression

2 Correlation
Pearson's r
–Measures the strength and direction of the linear relationship between the x and y variables
–Ranges from -1 to +1
Value / Meaning:
+1: Perfect direct relationship between x and y. As x gets bigger, so does y.
 0: No relationship exists between x and y.
-1: Perfect inverse relationship between x and y. As x gets bigger, y gets smaller.

3 Correlation printout in Minitab
–Top number is the correlation (Pearson's r)
–Bottom number is the p-value
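
A quick way to reproduce the numbers in such a printout outside of Minitab is scipy's pearsonr, which returns the correlation and its p-value. This is a minimal sketch; the gmat/gpa values below are hypothetical, not the data behind the slide.

```python
# Minimal sketch: Pearson's r and its p-value, mirroring the Minitab printout.
from scipy.stats import pearsonr

gmat = [450, 520, 600, 380, 710, 490, 560, 630]   # hypothetical GMAT scores
gpa  = [2.8, 3.1, 3.4, 2.5, 3.8, 2.9, 3.2, 3.5]   # hypothetical 1st-year GPAs

r, p_value = pearsonr(gmat, gpa)   # "top number" and "bottom number"
print(f"Pearson r = {r:.3f}, p-value = {p_value:.3f}")
```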

4 Simple Linear Regression
y = b0 + b1*x1 + e
–y: response
–x1: predictor
–b0: constant (y-intercept)
–b1: coefficient for x1 (slope)
–e: error
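
As a companion to the slide's notation, here is a minimal sketch of estimating b0 and b1 with statsmodels. The gmat/gpa arrays are the same hypothetical values as above, not the slide's data set.

```python
# Minimal sketch: fit y = b0 + b1*x1 + e by ordinary least squares.
import numpy as np
import statsmodels.api as sm

gmat = np.array([450, 520, 600, 380, 710, 490, 560, 630], dtype=float)
gpa  = np.array([2.8, 3.1, 3.4, 2.5, 3.8, 2.9, 3.2, 3.5])

X = sm.add_constant(gmat)        # adds the column of 1s that carries b0
model = sm.OLS(gpa, X).fit()     # estimates b0 (intercept) and b1 (slope)
print(model.params)              # [b0, b1]
print(model.summary())           # SEs, t-tests, R-squared, ANOVA F-test
```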

5 Simple Linear Regression: Making a Point Prediction
y = b0 + b1*x1 + e
GPA = 1.47 + 0.00323(GMAT)
For a person with a GMAT score of 400, what is the expected 1st-year GPA?
GPA = 1.47 + 0.00323(400)
GPA = 1.47 + 1.292
GPA = 2.76
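
The same point prediction can be written as a one-line function; the coefficients below are the ones printed on the slide (GPA = 1.47 + 0.00323*GMAT).

```python
# Point prediction from the fitted equation on the slide.
b0, b1 = 1.47, 0.00323

def predict_gpa(gmat_score):
    """Expected 1st-year GPA for a given GMAT score."""
    return b0 + b1 * gmat_score

print(round(predict_gpa(400), 2))   # 1.47 + 1.292 = 2.762 -> 2.76
```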

6 Simple Linear Regression
y = b0 + b1*x1 + e
GPA = 1.47 + 0.00323(GMAT)
What is the 95% CI for the GPA of a person with a GMAT score of 400?
GPA = 2.76, SE = 0.26
2.76 +/- 2(0.26)
95% CI = (2.24, 3.28)
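
The slide's interval is just the point prediction plus or minus two standard errors; here is that arithmetic, using the values printed on the slide (2 serves as a rough t critical value).

```python
# Approximate 95% interval: prediction +/- 2 * SE (values from the slide).
y_hat, se = 2.76, 0.26
lower, upper = y_hat - 2 * se, y_hat + 2 * se
print(f"95% CI = ({lower:.2f}, {upper:.2f})")   # (2.24, 3.28)
```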

7 Coefficient CIs and Testing
y = b0 + b1*x1 + e
GPA = 1.47 + 0.00323(GMAT)
Find the 95% CI for the coefficients:
b0 = 1.47 +/- 2(0.22) = 1.47 +/- 0.44 = (1.03, 1.91)
b1 = 0.0032 +/- 2(0.0004) = 0.0032 +/- 0.0008 = (0.0024, 0.0040)
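
The same "estimate plus or minus two standard errors" recipe, applied to both coefficients with the estimates and SEs shown on the slide:

```python
# Approximate 95% CIs for the coefficients: estimate +/- 2 * SE.
coefs = {"b0": (1.47, 0.22), "b1": (0.0032, 0.0004)}   # (estimate, SE) from the slide

for name, (est, se) in coefs.items():
    lower, upper = est - 2 * se, est + 2 * se
    print(f"{name}: ({lower:.4f}, {upper:.4f})")
# b0: (1.0300, 1.9100)
# b1: (0.0024, 0.0040)
```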

8 Coefficient Testing
y = b0 + b1*x1 + e
GPA = 1.47 + 0.00323(GMAT)
The p-value for each coefficient is the result of a hypothesis test:
H0: b = 0
H1: b ≠ 0
If the p-value <= 0.05, reject H0 and keep the coefficient in the model.

9 r² and R²
–Square of Pearson's r
–Little r² is used for simple regression
–Big R² is used for multiple regression
R² interpretation:
0: No correlation
1: Perfect correlation

10 Sample R² values
[Scatter plots illustrating fits with R² = 0.80, R² = 0.60, R² = 0.30, and R² = 0.20]

11 Regression ANOVA
H0: b1 = b2 = … = bk = 0
Ha: at least one b ≠ 0
F-statistic, df1, df2 → p-value
–If p <= 0.05, at least one of the b's is not zero
–If p > 0.05, it is possible that all of the b's are zero
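
In statsmodels, the overall F-test shown in the ANOVA table is available directly on a fitted model. A minimal sketch, reusing the hypothetical gmat/gpa data from earlier:

```python
# Overall (ANOVA) F-test: H0 is that every slope coefficient is zero.
import numpy as np
import statsmodels.api as sm

gmat = np.array([450, 520, 600, 380, 710, 490, 560, 630], dtype=float)
gpa  = np.array([2.8, 3.1, 3.4, 2.5, 3.8, 2.9, 3.2, 3.5])

model = sm.OLS(gpa, sm.add_constant(gmat)).fit()
print(model.fvalue, model.df_model, model.df_resid)   # F-statistic, df1, df2
print(model.f_pvalue)   # if <= 0.05, at least one slope is nonzero
```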

12 Diagnostics - Residuals
Residuals = errors
–Residuals should be normally distributed
–Residuals should have a constant variance
–Homoscedasticity: no pattern in the residual plots; the variance is constant
–Heteroscedasticity: the spread of the residuals grows or shrinks with the fitted values or with an independent variable
–Autocorrelation: residuals are correlated with neighboring residuals (e.g., over time or observation order)
Heteroscedasticity and autocorrelation indicate problems with the model.
Use the 4-in-one plot for these diagnostics.
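
Minitab's 4-in-one plot can be approximated with matplotlib. A rough sketch, again on the hypothetical gmat/gpa data; the four panels are the normal probability plot, residuals vs. fits, a histogram of residuals, and residuals vs. observation order.

```python
# Rough analog of the 4-in-one residual diagnostic plot.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import statsmodels.api as sm

gmat = np.array([450, 520, 600, 380, 710, 490, 560, 630], dtype=float)
gpa  = np.array([2.8, 3.1, 3.4, 2.5, 3.8, 2.9, 3.2, 3.5])
model = sm.OLS(gpa, sm.add_constant(gmat)).fit()
resid, fitted = model.resid, model.fittedvalues

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
stats.probplot(resid, dist="norm", plot=ax[0, 0])              # normal probability plot
ax[0, 1].scatter(fitted, resid); ax[0, 1].axhline(0, ls="--")  # residuals vs. fits
ax[1, 0].hist(resid, bins=8)                                   # histogram of residuals
ax[1, 1].plot(resid, marker="o")                               # residuals vs. order
plt.tight_layout()
plt.show()
```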

13 Adding a Power Transformation
Each "bump" or "U" shape in a scatter plot indicates that an additional power may be involved:
–0 bumps: x
–1 bump: x²
–2 bumps: x³
Standard equation: y = b0 + b1*x + b2*x²
Don't forget: check that b1 and b2 are statistically significant, and that the model is also statistically significant.
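
A quadratic term is added by regressing y on both x and x². A minimal sketch with hypothetical data that rises and then falls (one "bump"):

```python
# Fit y = b0 + b1*x + b2*x^2 and check that the added power term is significant.
import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.8, 5.0, 5.6, 5.5, 4.9, 3.7, 2.0])   # one bump

X = sm.add_constant(np.column_stack([x, x**2]))   # columns: 1, x, x^2
model = sm.OLS(y, X).fit()
print(model.params)    # b0, b1, b2
print(model.pvalues)   # are b1 and b2 statistically significant?
print(model.f_pvalue)  # is the overall model significant?
```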

14 Categorical Variables
Occasionally it is necessary to add a categorical variable to a regression model.
Suppose that we have a car dealership, and we want to model the sale price based on the time on the lot and the salesperson (Tom, Dick, or Harry).
–The time on the lot is a continuous (numeric) variable.
–The salesperson is a categorical variable.

15 Categorical Variables
Categorical variables are modeled in regression using 0/1 (Boolean) indicator variables: Yes = 1, No = 0.
Example: y = b0 + b_time*x_time + b_Tom*x_Tom + b_Dick*x_Dick
–Tom: x_Tom = 1, x_Dick = 0
–Dick: x_Tom = 0, x_Dick = 1
–Harry: x_Tom = 0, x_Dick = 0

16 Categorical Variables
Example: y = b0 + b_time*x_time + b_Tom*x_Tom + b_Dick*x_Dick
–Tom: x_Tom = 1, x_Dick = 0
–Dick: x_Tom = 0, x_Dick = 1
–Harry: x_Tom = 0, x_Dick = 0
Harry is the baseline category for the model.
Tom's and Dick's performance will be gauged in relation to Harry, but not in relation to each other.

17 Categorical Variables
y = b0 + b_time*x_time + b_Tom*x_Tom + b_Dick*x_Dick
–Tom (x_Tom = 1, x_Dick = 0): y = b0 + b_time*x_time + b_Tom
–Dick (x_Tom = 0, x_Dick = 1): y = b0 + b_time*x_time + b_Dick
–Harry (x_Tom = 0, x_Dick = 0): y = b0 + b_time*x_time
Interpretation:
–Tom's average sale price is b_Tom more than Harry's
–Dick's average sale price is b_Dick more than Harry's
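
Putting slides 15-17 together in code: a minimal sketch of the dealership model with 0/1 dummy variables, built from hypothetical data. Harry gets no dummy column, so he is the baseline, and b_Tom and b_Dick are read relative to him.

```python
# Regression with a categorical predictor coded as 0/1 dummies (Harry = baseline).
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "price":       [21.5, 19.8, 22.1, 20.4, 18.9, 21.0],   # hypothetical sale prices
    "days_on_lot": [12, 30, 8, 25, 41, 15],
    "salesperson": ["Tom", "Dick", "Harry", "Tom", "Harry", "Dick"],
})

df["x_Tom"]  = (df["salesperson"] == "Tom").astype(int)
df["x_Dick"] = (df["salesperson"] == "Dick").astype(int)   # Harry: both dummies are 0

X = sm.add_constant(df[["days_on_lot", "x_Tom", "x_Dick"]].astype(float))
model = sm.OLS(df["price"], X).fit()
print(model.params)   # b0, b_time, b_Tom, b_Dick (Tom and Dick relative to Harry)
```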

18 Multicollinearity
Multicollinearity: predictor variables are correlated with each other.
Multicollinearity results in instability in the estimates of the b's:
–P-values will be larger
–Confidence in the b's decreases or disappears (magnitude and sign may differ from the expected values)
–A small change in the data results in large changes in the coefficients
–Read 11.11

19 VIF - Variance Inflation Factor
Measures the degree to which the confidence in the estimate of a coefficient is decreased by multicollinearity.
The larger the VIF, the greater the multicollinearity problem:
–If VIF > 10, there may be a problem
–If VIF >= 15, there may be a serious problem
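
The VIF for each predictor can be computed with statsmodels. A minimal sketch on made-up data in which x2 is deliberately almost a copy of x1, so its VIF comes out large:

```python
# Variance inflation factors for each predictor (large VIF = multicollinearity).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=50)   # nearly collinear with x1
x3 = rng.normal(size=50)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, i), 1))  # > 10 flags trouble
```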

20 Model Selection
1) Start with everything.
2) Delete variables with high VIF factors, one at a time.
3) Delete variables one at a time, removing the one with the largest p-value.
4) Stop when all p-values are less than 0.05.
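
Steps 3 and 4 of this procedure (repeatedly dropping the least significant variable) can be written as a short loop; the VIF screening in step 2 would be done first, for example with the VIF sketch above. This helper is hypothetical, not a library routine, and assumes y is a response Series and X a DataFrame of predictors that already includes a "const" column.

```python
# Backward elimination on p-values: drop the worst predictor until all p <= alpha.
import statsmodels.api as sm

def backward_eliminate(y, X, alpha=0.05):
    X = X.copy()
    while True:
        model = sm.OLS(y, X).fit()
        pvals = model.pvalues.drop("const", errors="ignore")
        worst = pvals.idxmax()                 # predictor with the largest p-value
        if pvals[worst] <= alpha:
            return model                       # every remaining p-value is <= alpha
        X = X.drop(columns=[worst])            # delete it and refit
```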

21 Demand Price Curve
The demand-price function is nonlinear: D = b0 * P^b1
A log transformation makes it linear: ln(D) = ln(b0) + b1*ln(P)
Run the regression on the transformed variables.
Plug the coefficients into the equation: D = e^constant * P^b1 (the fitted constant estimates ln(b0))
Make your projections with this last equation.

22 Demand Price Curve
The demand-price function is nonlinear: d = k*p^b1
A log transformation makes it linear: ln(d) = b0 + b1*ln(p)
1) Create a variable for the natural log of demand and the natural log of each independent variable.
   –In Excel: =ln(demand), =ln(price), =ln(income), etc.
2) Run the regression on the transformed variables.
3) Place the coefficients in the equation: d = e^constant * p^b1 * i^b2
4) Simplify to: d = k*p^b1 * i^b2 (note that e^constant = k)
5) If income is not included, then the equation is just: d = k*p^b1
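
The whole procedure in code, on hypothetical price/demand data and without the income term (step 5):

```python
# Log-log demand-price regression: fit ln(d) = b0 + b1*ln(p), then back-transform.
import numpy as np
import statsmodels.api as sm

price  = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
demand = np.array([950, 720, 600, 500, 430, 380, 340])   # hypothetical units sold

X = sm.add_constant(np.log(price))
model = sm.OLS(np.log(demand), X).fit()
b0, b1 = model.params
k = np.exp(b0)                                 # e^constant = k

def projected_demand(p):
    return k * p ** b1                         # d = k * p^b1

print(round(projected_demand(3.2)))            # projection at a price of 3.2
```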

