Regression Analysis, Part C: Confidence Intervals and Hypothesis Testing
Read Chapters 3, 4 and 5 of Forecasting and Time Series: An Applied Approach.
L01C MGS 8110 - Regression Inference
Regression Analysis Modules
Part A – Basic Model & Parameter Estimation
Part B – Calculation Procedures
Part C – Inference: Confidence Intervals & Hypothesis Testing
Part D – Goodness of Fit
Part E – Model Building
Part F – Transformed Variables
Part G – Standardized Variables
Part H – Dummy Variables
Part I – Eliminating Intercept
Part J – Outliers
Part K – Regression Example #1
Part L – Regression Example #2
Part N – Non-linear Regression
Part P – Non-linear Example
Overview of Part L01C
Confidence Intervals and Hypothesis Testing
For Yi prediction and Yi mean: formulas for the univariate and multivariate cases. Example calculations: 1) manual calculation in Excel and 2) SPSS.
For regression coefficients, bi: example calculations: 1) Data Analysis in Excel and 2) SPSS.
Hypothesis Testing
For the entire regression model: F-test.
Underlying Statistical Theory: Confidence Intervals and Hypothesis Testing
The Standard Error of a Regression Equation, Single Independent Variable
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k}} $$
where Yi is the actually observed value of the dependent variable, Ŷi is the predicted value from the fitted regression equation, p = 1 is the number of independent variables, k = p + 1 = 2 is the number of parameters (b0, b1), and n is the sample size used when calculating s.
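A minimal Python sketch of s = sqrt(SSE / (n − k)) for one independent variable. It assumes ordinary least squares for b0 and b1 and uses small hypothetical data (not the course example); the function names `fit_simple` and `std_error` are illustrative and are not Excel or SPSS features.

```python
import math


def fit_simple(x, y):
    """Least-squares intercept b0 and slope b1 for one independent variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def std_error(x, y, p=1):
    """Standard error of the regression, s = sqrt(SSE / (n - k)) with k = p + 1."""
    b0, b1 = fit_simple(x, y)
    n, k = len(y), p + 1
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained SS
    return math.sqrt(sse / (n - k))


# Hypothetical data, used only to illustrate the formula.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple(x, y)
s = std_error(x, y)
```

For this data the fitted line is Ŷ = 0.05 + 1.99X and SSE = 0.107, so s = sqrt(0.107 / 3).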
Confidence Interval for Individual Prediction, Single Independent Variable
$$ \hat{Y}_f \pm t_{\alpha/2,\,n-k}\, s \sqrt{1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} $$
where f denotes the future (forecasted) or predicted value, p = 1 is the number of independent variables, k = p + 1 = 2 is the number of parameters (b0, b1), n is the sample size used when calculating s, and 1 − α is the confidence level, typically .95, so α/2 = .025.
Confidence Interval for Mean Prediction, Single Independent Variable (1 of 2)
$$ \hat{Y}_f \pm t_{\alpha/2,\,n-k}\, s \sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} $$
where f denotes the future (forecasted) or predicted value, p = 1 is the number of independent variables, k = p + 1 = 2 is the number of parameters (b0, b1), n is the sample size used when calculating s, m is the sample size that is going to be used to calculate the mean value, and 1 − α is the confidence level, typically .95, so α/2 = .025.
Confidence Interval for Mean Prediction, Single Independent Variable (2 of 2)
When m = 1, the CI for the mean becomes the CI for an individual Y. When m = ∞ (so 1/m = 0), the CI for the mean becomes the CI for the general mean, that is, the mean of all Y values at Xf.
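A minimal Python sketch of the single-variable interval Ŷf ± t·s·sqrt(1/m + 1/n + (Xf − X̄)²/Σ(Xi − X̄)²), showing how m = 1 gives the individual-prediction CI and m = ∞ gives the general-mean CI. It assumes the caller supplies the critical value t(α/2, n − k) from a t table (no statistics library is used), and the data and the name `mean_prediction_ci` are hypothetical.

```python
import math


def mean_prediction_ci(x, y, x_f, t_crit, m=1):
    """CI for the mean of m future Y values at x_f, single independent variable.

    m = 1 gives the CI for an individual Y; m = math.inf gives the CI for the
    general mean, because 1 / math.inf == 0.0. t_crit is t(alpha/2, n - k).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # k = 2 parameters (b0, b1)
    y_hat = b0 + b1 * x_f
    half_width = t_crit * s * math.sqrt(1 / m + 1 / n + (x_f - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat + half_width


# Hypothetical data; 3.182 is t(.025) with n - k = 5 - 2 = 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
individual = mean_prediction_ci(x, y, x_f=3, t_crit=3.182, m=1)
mean_of_4 = mean_prediction_ci(x, y, x_f=3, t_crit=3.182, m=4)
general_mean = mean_prediction_ci(x, y, x_f=3, t_crit=3.182, m=math.inf)
```

All three intervals are centered on the same Ŷf; only the 1/m term changes, so the interval narrows as m grows.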
CI Manual Calculations, Single Independent Variable
CI Manual Calculations, Single Independent Variable (continued)
SPSS Data Analysis Calculations, Single Independent Variable
SPSS Data Analysis Calculations, Single Independent Variable (continued)
The Standard Error of a Regression Equation, Multivariate Case
$$ s = \sqrt{\frac{(Y - Xb)'(Y - Xb)}{n - k}} $$
where Y is the vector of actually observed values of the dependent variable, an [n x 1] matrix; X is the matrix of actually observed values of the independent variables, an [n x k] matrix whose first column is all 1s (for the intercept); b is the vector of calculated regression parameters, a [k x 1] matrix, b = (X′X)⁻¹(X′Y); p is the number of independent variables; k = p + 1 is the number of parameters, b0, b1, …, bp; and n is the sample size used when calculating s.
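A minimal pure-Python sketch of the matrix calculation b = (X′X)⁻¹(X′Y) and s = sqrt(e′e / (n − k)). It assumes small, well-conditioned data and inverts X′X by Gauss-Jordan elimination instead of relying on a matrix library; the helper names are hypothetical, and the data is constructed so that Y = 1 + 2X1 + 3X2 exactly, which makes the result easy to check.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    # (m x t) times (t x q) -> (m x q)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting; adequate for small, well-conditioned X'X."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_matrix(X, Y):
    """Return b = (X'X)^-1 X'Y, s = sqrt(e'e / (n - k)), and (X'X)^-1.

    X is n x k with a leading column of 1s; Y is n x 1 (a list of one-element rows).
    """
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    b = matmul(xtx_inv, matmul(Xt, Y))   # k x 1
    fitted = matmul(X, b)                # n x 1, Yhat = Xb
    sse = sum((Y[i][0] - fitted[i][0]) ** 2 for i in range(len(Y)))
    n, k = len(X), len(X[0])
    return b, math.sqrt(sse / (n - k)), xtx_inv


# Hypothetical data with p = 2 independent variables (k = 3 parameters).
X = [[1, 0, 1], [1, 1, 0], [1, 2, 2], [1, 3, 1], [1, 4, 3]]
Y = [[4], [3], [11], [10], [18]]
b, s, xtx_inv = fit_matrix(X, Y)
```

In practice Excel's Data Analysis tool or SPSS does this inversion; the sketch only makes the matrix formula concrete.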
Confidence Interval for Individual Predictions, Multivariate Case
$$ \hat{Y}_f \pm t_{\alpha/2,\,n-k}\, s \sqrt{1 + X_f'(X'X)^{-1}X_f}, \qquad \hat{Y}_f = X_f' b $$
where Xf is a vector of specified values for the independent variables, X′f = [1, Xf,1, Xf,2, …, Xf,p]; p is the number of independent variables; k = p + 1 is the number of parameters, b0, b1, …, bp; n is the sample size used when calculating s; and 1 − α is the confidence level, typically .95, so α/2 = .025.
Confidence Interval for Mean Predictions, Multivariate Case
$$ \hat{Y}_f \pm t_{\alpha/2,\,n-k}\, s \sqrt{X_f'(X'X)^{-1}X_f}, \qquad \hat{Y}_f = X_f' b $$
where Xf is a vector of specified values for the independent variables, X′f = [1, Xf,1, Xf,2, …, Xf,p]; p is the number of independent variables; k = p + 1 is the number of parameters, b0, b1, …, bp; n is the sample size used when calculating s; and 1 − α is the confidence level, typically .95, so α/2 = .025.
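A minimal Python sketch of the two multivariate intervals, Ŷf ± t·s·sqrt(1 + X′f(X′X)⁻¹Xf) for an individual prediction and Ŷf ± t·s·sqrt(X′f(X′X)⁻¹Xf) for the mean. It assumes b, s and (X′X)⁻¹ have already been computed (for example by Excel or SPSS) and that t(α/2, n − k) is supplied from a t table; the inputs below are hypothetical values for X = 1, 2, 3, 4, 5 with an intercept column.

```python
import math


def quad_form(v, A):
    """v' A v for a vector v and a square matrix A, both plain lists."""
    return sum(v[i] * A[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))


def multivariate_intervals(b, s, xtx_inv, x_f, t_crit):
    """x_f = [1, Xf1, ..., Xfp]; returns (individual CI, mean CI)."""
    y_hat = sum(b_j * x_j for b_j, x_j in zip(b, x_f))  # Yhat_f = X_f' b
    d = quad_form(x_f, xtx_inv)                         # X_f' (X'X)^-1 X_f
    half_individual = t_crit * s * math.sqrt(1 + d)
    half_mean = t_crit * s * math.sqrt(d)
    return ((y_hat - half_individual, y_hat + half_individual),
            (y_hat - half_mean, y_hat + half_mean))


b = [0.05, 1.99]
s = math.sqrt(0.107 / 3)
xtx_inv = [[1.1, -0.3], [-0.3, 0.1]]   # (X'X)^-1 for X = 1..5 with an intercept
individual, mean = multivariate_intervals(b, s, xtx_inv, x_f=[1, 5], t_crit=3.182)
```

With p = 1, X′f(X′X)⁻¹Xf reduces to 1/n + (Xf − X̄)²/Σ(Xi − X̄)²; here that is 0.2 + 0.4 = 0.6, matching the single-variable formulas.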
CI Manual Calculations, Multivariate Case
SPSS Data Analysis Calculations, Multivariate Case
SPSS Data Analysis Calculations, Multivariate Case (continued)
The Standard Error of a Regression Equation (review from previous slide)
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k}} $$
where Yi is the actually observed value of the dependent variable, Ŷi is the predicted value from the fitted regression equation, p = 1 is the number of independent variables, k = p + 1 = 2 is the number of parameters (b0, b1), and n is the sample size used when calculating s.
Skip’s Quick and Dirty Method to Estimate the Confidence Interval for a Regression Line
Procedure:
1. Select a range of X values from the minimum X to the maximum X.
2. Calculate the corresponding predicted values of Y, Ŷ.
3. Add and subtract 2 times the standard error of the regression, s, to and from the predicted values.
4. Optional: plot the two confidence-limit (CL) lines on the scatter plot.
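A minimal Python sketch of the quick-and-dirty band Ŷ ± 2s over the observed range of X. The multiplier 2 stands in for t(.025, n − k), which is close to 2 when n − k is reasonably large; unlike the exact interval, this band has constant width and does not widen as X moves away from X̄. The fitted line and s below are hypothetical, and the step count is an arbitrary choice.

```python
import math


def quick_and_dirty_band(b0, b1, s, x_min, x_max, steps=10):
    """Return (x, yhat, lower, upper) rows with confidence limits yhat -/+ 2s."""
    rows = []
    for i in range(steps + 1):
        x = x_min + (x_max - x_min) * i / steps  # evenly spaced X values
        y_hat = b0 + b1 * x
        rows.append((x, y_hat, y_hat - 2 * s, y_hat + 2 * s))
    return rows


# Hypothetical fitted line Yhat = 0.05 + 1.99 X over observed X from 1 to 5.
band = quick_and_dirty_band(0.05, 1.99, math.sqrt(0.107 / 3), 1, 5, steps=4)
for x, y_hat, lower, upper in band:
    print(f"{x:4.1f} {y_hat:7.3f} {lower:7.3f} {upper:7.3f}")
```

The lower and upper columns are the two CL lines to overlay on the scatter plot.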
Confidence Interval for Regression Coefficients, Single Independent Variable
$$ b_1 \pm t_{\alpha/2,\,n-k}\, \frac{s}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}, \qquad b_0 \pm t_{\alpha/2,\,n-k}\, s \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} $$
where p = 1 is the number of independent variables, k = p + 1 = 2 is the number of parameters (b0, b1), n is the sample size used when calculating s, and 1 − α is the confidence level, typically .95, so α/2 = .025.
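A minimal Python sketch of the single-variable coefficient intervals, using the standard errors s/sqrt(Σ(Xi − X̄)²) for b1 and s·sqrt(1/n + X̄²/Σ(Xi − X̄)²) for b0. It assumes hypothetical data and a t(α/2, n − k) value supplied by the caller; `coefficient_cis` is an illustrative name.

```python
import math


def coefficient_cis(x, y, t_crit):
    """CIs for b0 and b1 with one independent variable (k = 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)                       # standard error of the slope
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)  # standard error of the intercept
    return {
        "b0": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "b1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }


# Hypothetical data; 3.182 is t(.025) with 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
cis = coefficient_cis(x, y, t_crit=3.182)
```

For this data the slope interval excludes 0 while the intercept interval contains it.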
Confidence Interval for Regression Coefficients, Multivariate Case
$$ b_j \pm t_{\alpha/2,\,n-k}\, s \sqrt{c_{jj}}, \qquad j = 0, 1, \ldots, p $$
where $c_{jj}$ is the jth diagonal element of $(X'X)^{-1}$; p is the number of independent variables; k = p + 1 is the number of parameters, b0, b1, …, bp; n is the sample size used when calculating s; and 1 − α is the confidence level, typically .95, so α/2 = .025.
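A minimal Python sketch of bj ± t·s·sqrt(cjj), where cjj is the jth diagonal element of (X′X)⁻¹. It assumes b, s and (X′X)⁻¹ are already available (for example from Excel or SPSS output) and that t(α/2, n − k) comes from a t table; the inputs are hypothetical values for X = 1, 2, 3, 4, 5 with an intercept column.

```python
import math


def coefficient_cis(b, s, xtx_inv, t_crit):
    """b_j -/+ t_crit * s * sqrt(c_jj), where c_jj is the j-th diagonal of (X'X)^-1."""
    cis = []
    for j, b_j in enumerate(b):
        se_j = s * math.sqrt(xtx_inv[j][j])  # standard error of b_j
        cis.append((b_j - t_crit * se_j, b_j + t_crit * se_j))
    return cis


b = [0.05, 1.99]
s = math.sqrt(0.107 / 3)
xtx_inv = [[1.1, -0.3], [-0.3, 0.1]]   # (X'X)^-1 for X = 1..5 with an intercept
cis = coefficient_cis(b, s, xtx_inv, t_crit=3.182)
```

With p = 1, c11 = 1/Σ(Xi − X̄)² = 0.1 and c00 = 1/n + X̄²/Σ(Xi − X̄)² = 1.1, so these intervals agree with the single-variable formulas.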
Excel, Data Analysis Calculations, Multivariate Case
Excel, Data Analysis Calculations, Multivariate Case (continued)
SPSS Data Analysis Calculations, Multivariate Case
SPSS Data Analysis Calculations, Multivariate Case (continued)
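Hypothesis Test of a Regression Coefficient
$$ H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0, \qquad t = \frac{b_j}{s \sqrt{c_{jj}}} $$
Reject H0 when |t| > t(α/2, n − k), equivalently when the p-value (reported as "Sig." in SPSS and "P-value" in Excel) is less than α.
Here $c_{jj}$ is the jth diagonal element of $(X'X)^{-1}$; p is the number of independent variables; k = p + 1 is the number of parameters, b0, b1, …, bp; n is the sample size used when calculating s; and α is the significance level, typically .05 (a confidence level of 1 − α = .95), so α/2 = .025.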
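A minimal Python sketch of the two-sided t-test t = bj/(s·sqrt(cjj)) for each slope. Following the rule that the intercept is never tested, it loops over b1 through bp only. It assumes b, s and (X′X)⁻¹ are already computed and t(α/2, n − k) is supplied from a t table; the inputs and the name `slope_t_tests` are hypothetical.

```python
import math


def slope_t_tests(b, s, xtx_inv, t_crit):
    """t = b_j / (s * sqrt(c_jj)) for slopes b_1..b_p; the intercept b_0 is skipped."""
    results = {}
    for j in range(1, len(b)):
        se_j = s * math.sqrt(xtx_inv[j][j])
        t_stat = b[j] / se_j
        results[f"b{j}"] = (t_stat, abs(t_stat) > t_crit)  # True -> reject H0: beta_j = 0
    return results


b = [0.05, 1.99]
s = math.sqrt(0.107 / 3)
xtx_inv = [[1.1, -0.3], [-0.3, 0.1]]   # (X'X)^-1 for X = 1..5 with an intercept
tests = slope_t_tests(b, s, xtx_inv, t_crit=3.182)
```

Here the slope's t statistic is far larger than 3.182, so H0: β1 = 0 is rejected and the variable stays in the model.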
Excel, Data Analysis Calculations, Multivariate Case
SPSS Data Analysis Calculations, Multivariate Case
Summary
Never test the intercept (constant); this is discussed in more detail in L01I.
If Sig. is less than .05, keep the variable (the slope is significantly different from zero).
If Sig. is greater than .05, consider eliminating the variable from the model (the slope could be zero).
If you can’t remember these rules a year from now, look at the confidence interval: does it contain 0 (zero)? If it does, the slope could be zero; if it does not, keep the variable.
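A minimal Python sketch showing why the confidence-interval rule and the t-test rule always agree: the interval bj ± t·se excludes 0 exactly when |bj/se| > t. The (coefficient, standard error) pairs are hypothetical, with standard errors rounded for illustration, and the function names are illustrative.

```python
def ci_rule(b_j, se_j, t_crit):
    """Keep the variable if the CI b_j -/+ t_crit * se_j does not contain 0."""
    lower, upper = b_j - t_crit * se_j, b_j + t_crit * se_j
    return not (lower <= 0 <= upper)


def t_rule(b_j, se_j, t_crit):
    """Keep the variable if |t| = |b_j / se_j| exceeds t_crit (two-sided test)."""
    return abs(b_j / se_j) > t_crit


# Hypothetical (coefficient, standard error) pairs with t_crit = t(.025, 3) = 3.182.
examples = [(1.99, 0.0597), (0.05, 0.198), (-2.5, 0.5), (0.9, 0.4)]
decisions = [(ci_rule(b, se, 3.182), t_rule(b, se, 3.182)) for b, se in examples]
```

Each pair of decisions matches: keep, drop, keep, drop.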
F-test for Overall Model
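A minimal Python sketch of the overall F-test, F = MSR/MSE = (SSR/p)/(SSE/(n − k)), which tests H0: β1 = … = βp = 0 against the alternative that at least one slope is nonzero. It assumes the fitted values are already available and that the critical value F(α; p, n − k) is supplied from an F table (10.13 for α = .05 with 1 and 3 degrees of freedom); the data and the name `overall_f_test` are hypothetical.

```python
def overall_f_test(y, y_hat, p, f_crit):
    """F = MSR / MSE = (SSR / p) / (SSE / (n - k)), with k = p + 1."""
    n, k = len(y), p + 1
    y_bar = sum(y) / n
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained SS
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained SS
    f_stat = (ssr / p) / (sse / (n - k))
    return f_stat, f_stat > f_crit  # True -> reject H0: all slopes are zero


y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [2.04, 4.03, 6.02, 8.01, 10.00]   # fitted values from Yhat = 0.05 + 1.99 X
f_stat, reject = overall_f_test(y, y_hat, p=1, f_crit=10.13)
```

With a single independent variable, F equals the square of the slope's t statistic, so the F-test and the t-test of b1 reach the same conclusion.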
Excel, Data Analysis Calculations, Multivariate Case
SPSS Data Analysis Calculations, Multivariate Case
Review of ANOVA Analysis
[Figure: deviation of one observation split into parts.] Green = deviation of Yi from the mean, Yi − Ȳ. Blue, dashed = portion of that deviation explained by the regression equation, Ŷi − Ȳ. Red = portion still unexplained after fitting the regression equation, Yi − Ŷi.
Fundamental Concept of ANOVA Analysis
Residual analysis: Total = Unexplained + Explained.
It can be shown (the algebra is complex) that Total SS = Unexplained SS + Explained SS:
$$ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 $$
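A minimal Python sketch that checks Total SS = Unexplained SS + Explained SS numerically. The identity relies on a least-squares fit with an intercept, which makes the cross-product Σ(Yi − Ŷi)(Ŷi − Ȳ) equal to zero; the data and the name `anova_decomposition` are hypothetical.

```python
def anova_decomposition(x, y):
    """Return (SST, SSE, SSR) for a least-squares line with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    return sst, sse, ssr


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sst, sse, ssr = anova_decomposition(x, y)
```

For this data SST = 39.708, SSE = 0.107 and SSR = 39.601.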
Review of ANOVA Table (1 of 3): Terminology and Table Calculations
Review of ANOVA Table (2 of 3): Algebraic Explanation of Terms
Review of ANOVA Table (3 of 3): Calculation Formulas
Review of ANOVA Table (1 of 3): Matrix Explanation of Terms
Regression prediction compared with predicting the mean of Y.
Review of ANOVA Table (2 of 3): Alternative Matrix Explanation of Terms
Regression prediction compared with predicting 0 (zero).
Review of ANOVA Table (3 of 3): Alternative Matrix Explanation of Terms
Regression prediction compared with predicting the mean of Y and with predicting 0 (zero).
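A minimal Python sketch of the sums of squares written with matrix products. Compared with predicting 0, the uncorrected decomposition is Y′Y = b′X′Y + e′e; compared with predicting the mean of Y, nȲ² is subtracted from the total and regression terms, leaving the error term unchanged. It assumes b is the least-squares estimate for the given X and Y; the data and the name `matrix_sums_of_squares` are hypothetical.

```python
def matrix_sums_of_squares(X, Y, b):
    """Uncorrected (vs. 0) and corrected (vs. mean of Y) sums of squares, given least-squares b."""
    n = len(Y)
    yty = sum(y_i * y_i for y_i in Y)                                     # Y'Y
    xty = [sum(X[i][j] * Y[i] for i in range(n)) for j in range(len(b))]  # X'Y
    btxty = sum(b_j * v for b_j, v in zip(b, xty))                        # b'X'Y
    n_ybar_sq = n * (sum(Y) / n) ** 2                                     # n * Ybar^2
    return {
        "uncorrected": {"total": yty, "regression": btxty, "error": yty - btxty},
        "corrected": {"total": yty - n_ybar_sq, "regression": btxty - n_ybar_sq,
                      "error": yty - btxty},
    }


X = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
b = [0.05, 1.99]   # least-squares estimates for this hypothetical data
ss = matrix_sums_of_squares(X, Y, b)
```

The corrected values reproduce the familiar ANOVA table entries (39.708 = 39.601 + 0.107), while the uncorrected total is simply Y′Y = 220.91.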
Statistical Assumptions
0. The expected value of the residuals is zero, $E(e_i) = 0$. The algebraic equation is the correct functional form and accurately predicts $E(Y_{i,j})$ for all j.
Inference Assumptions
1. The residual variance is constant. That is, $\sigma_{i,j}^2 = \sigma^2$ for all $X_{i,j}$ and all i and j. The variance of the observations ($Y_{i,j}$) does not change as more observations are obtained and/or as different values of $X_j$ are observed.
2. The observations are statistically independent. That is, $Y_{i,j}$ is statistically independent of all other $Y_{i',j}$ values for all $i' \neq i$ (with j fixed). Knowing the current value of Y does not provide insight into the value of the next Y.
3. The residual errors are normally distributed. The $e_{i,j}$ terms are $N(0, \sigma^2)$.