Basics of Regression Analysis. Determination of three performance measures: estimation of the effect of each factor, explanation of the variability, and forecasting.


1 Basics of Regression Analysis

2 Determination of three performance measures: Estimation of the effect of each factor; Explanation of the variability; Forecasting Error

3 Two Predictor Variables Population Regression Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$, with e following $N(0, \sigma)$ Unknown parameters: $\beta_0, \beta_1, \beta_2$; $\sigma$

4 From Data to Estimates of Coefficients Principle: Least Squares Normal Equation Systems Estimates of Coefficients Mathematics Computing Algorithm

5 Least Squares Method: Simple Regression; Multiple Regression

6 Matrix Computation for b Normal Equation System: $(X^T X) b = X^T Y$ –See Text Appendix D.3 Solution for b: $b = (X^T X)^{-1} (X^T Y)$
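The normal equation system above can be solved numerically. The following is a minimal sketch, assuming NumPy is available; the data values and the variable names `x1`, `x2`, `y` are made up for illustration and are not from the slides.

```python
import numpy as np

# Hypothetical data: six observations, two predictors (made-up values).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 0.5 * x2  # exact linear relation, no disturbance

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve (X^T X) b = X^T Y as a linear system rather than forming the inverse.
b = np.linalg.solve(X.T @ X, X.T @ y)
```

Because `y` was built without a disturbance term, `b` should reproduce the coefficients used to generate it. Solving the system is usually preferred over computing the inverse explicitly, though both express the same formula.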

7 Standardized Regression Coefficients Definition: –$b_0 = 0$ –the beta coefficient, for k = 1, 2 Used to show relative weights of predictors.
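The slide's formula for the beta coefficient is not reproduced in this transcript. A common definition, assumed here, scales each slope by the ratio of sample standard deviations: beta_k = b_k * s_xk / s_y. The helper name `beta_coefficient` and the data are hypothetical.

```python
from statistics import stdev

def beta_coefficient(b_k, x_k, y):
    """Standardized (beta) coefficient, assuming beta_k = b_k * s_xk / s_y."""
    return b_k * stdev(x_k) / stdev(y)

# Made-up data in which y = 1 + 2 * x1 exactly, so the slope is 2.0.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
beta_1 = beta_coefficient(2.0, x1, y)
```

With a single predictor and a perfect fit the beta coefficient equals the correlation, which is 1 in this toy case.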

8 Estimation of $\sigma$: $s_e$, Standard Deviation of Disturbance e. Forecasting Equation: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$ SS of Residuals: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ Mean SS: $MSE = s_e^2 = SSE / (n - 3)$
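As a small worked example of the residual sums on this slide, the sketch below computes SSE, MSE, and s_e for the two-predictor case (three estimated coefficients). The function name and the numbers are illustrative; the fitted values are made up rather than obtained from a real least squares fit.

```python
import math

def residual_summary(y, y_hat, n_coef=3):
    """Return (SSE, MSE, s_e); n_coef = 3 matches the divisor n - 3."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    mse = sse / (len(y) - n_coef)
    return sse, mse, math.sqrt(mse)

y = [10.0, 12.0, 15.0, 19.0, 24.0]      # observed values (hypothetical)
y_hat = [11.0, 12.0, 14.0, 20.0, 23.0]  # fitted values (hypothetical)
sse, mse, s_e = residual_summary(y, y_hat)
```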

9 Standard Error of Coefficients The variance matrix of b, a (K+1) x 1 vector, is:
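The formula itself did not survive in this transcript. The usual estimate under the model of slide 3, assumed here, is $s_e^2 (X^T X)^{-1}$, with the standard errors given by the square roots of its diagonal. A minimal NumPy sketch on made-up data:

```python
import numpy as np

# Hypothetical data (made-up values): six observations, two predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 5.4, 9.2, 10.8, 15.1, 16.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimates
resid = y - X @ b
n, p = X.shape                          # p = K + 1 = 3 coefficients
s2 = resid @ resid / (n - p)            # MSE = SSE / (n - 3)

var_b = s2 * np.linalg.inv(X.T @ X)     # assumed form: s_e^2 (X^T X)^-1
se_b = np.sqrt(np.diag(var_b))          # standard error of each coefficient
```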

10 The Variability Explained First, determine the base variability for explanation by the regression. Unconditional mean model: $Y = \mu_y + e$, where e follows $N(0, \sigma_y)$ LS fit of the model: Pred_Y = $\bar{Y}$ SS of Residuals: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ MSS (DF = n-1): $SST / (n - 1)$

11 The Variability Explained – cont. Second, subtract the variability still left unexplained. In SS: $SSR = SST - SSE$ In Variance:

12 Creating ANOVA Table
Regression Model      | Unexplained Variability in SS | DF  | Unexplained Variability in Variance (MSE)
Unconditional         | SST                           | n-1 | SST / (n-1)
Conditional           | SSE                           | n-3 | SSE / (n-3)
Variability Explained | SSR = SST - SSE               | 2   |
Proportion Explained  |                               |     |
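The entries of the ANOVA table can be computed directly from the observations and the fitted values. The sketch below assumes the two-predictor case and takes the proportion explained in SS to be SSR / SST; the function name and data are hypothetical, and the fitted values are made up rather than taken from a real fit.

```python
def anova_two_predictors(y, y_hat):
    """ANOVA quantities for a regression with two predictors."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)                # unconditional SS
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # conditional SS
    ssr = sst - sse                                         # variability explained
    return {
        "SST": sst, "SSE": sse, "SSR": ssr,
        "MST": sst / (n - 1),   # unconditional variance, DF = n - 1
        "MSE": sse / (n - 3),   # conditional variance, DF = n - 3
        "R2": ssr / sst,        # proportion explained in SS (assumed definition)
    }

y = [10.0, 12.0, 15.0, 19.0, 24.0]
y_hat = [11.0, 12.0, 14.0, 20.0, 23.0]
table = anova_two_predictors(y, y_hat)
```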

13 Test of Significance F-test of significance t-test of significance –Two-sided alternative –One-sided alternative

14 F-Test of Significance of the variability explained by the regression $H_0: \beta_1 = \beta_2 = 0$ $H_a$: At least one coefficient is not 0 P-Value of F-stat = $P\{F(2, n-3) > \text{F-stat}\}$
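A sketch of the F-test computation, assuming the F statistic is formed in the usual way as (SSR / 2) / (SSE / (n - 3)). When the numerator DF is exactly 2, the upper tail of the F distribution has the closed form (1 + 2F / d)^(-d/2) with d the denominator DF, so no statistics library is needed here; for other numerator DF this shortcut does not apply. The function name and inputs are hypothetical.

```python
def f_test_two_predictors(ssr, sse, n):
    """F statistic and p-value for H0: beta1 = beta2 = 0 (two predictors only)."""
    df_den = n - 3
    f_stat = (ssr / 2.0) / (sse / df_den)
    # Closed-form upper tail of F(2, df_den); valid only for numerator DF = 2.
    p_value = (1.0 + 2.0 * f_stat / df_den) ** (-df_den / 2.0)
    return f_stat, p_value

f_stat, p_value = f_test_two_predictors(ssr=122.0, sse=4.0, n=5)
```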

15 t-Test of Significance of a variable, $X_1$ - two sided $H_0: \beta_1 = 0$ $H_a: \beta_1 \neq 0$ P-Value of t-stat = $P\{|t(n-3)| > |\text{t-stat}|\}$

16 One-Sided Test of Significance of a variable, $X_1$ $H_0: \beta_1 = 0$ $H_a: \beta_1 > 0$ (using the prior knowledge) P-Value of t-stat = $P\{t(n-3) > \text{t-stat}\}$

17 Forecasting Point forecasting Sources of forecasting error Interval forecasting

18 Forecasting at $x_m$ Data of X for regression; Value of X for prediction

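19 Sources of Forecasting Error Data: $Y|x_m = \beta_0 + \beta_1 x_{1m} + \beta_2 x_{2m} + e_m$ Forecast: $\hat{Y}_m = b_0 + b_1 x_{1m} + b_2 x_{2m}$ Forecast Error: $Y|x_m - \hat{Y}_m = (\beta_0 - b_0) + (\beta_1 - b_1) x_{1m} + (\beta_2 - b_2) x_{2m} + e_m$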

20 Computing Standard Errors
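The slide's formulas are not captured in this transcript. A standard result, assumed here, is that the forecast error at $x_m$ combines coefficient estimation error and the new disturbance, giving a standard error of $s_e \sqrt{1 + x_m^T (X^T X)^{-1} x_m}$. A minimal NumPy sketch on made-up data, with `x_m` a hypothetical prediction point:

```python
import numpy as np

# Hypothetical data (made-up values): six observations, two predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 5.4, 9.2, 10.8, 15.1, 16.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b
n, p = X.shape
s_e = np.sqrt(resid @ resid / (n - p))

x_m = np.array([1.0, 3.5, 3.0])          # leading 1 pairs with the intercept
point_forecast = x_m @ b
q = x_m @ np.linalg.inv(X.T @ X) @ x_m   # contribution of coefficient uncertainty
se_mean = s_e * np.sqrt(q)               # standard error of the estimated mean at x_m
se_forecast = s_e * np.sqrt(1.0 + q)     # adds the disturbance e_m of the new observation
```

An interval forecast would then take the point forecast plus or minus a t(n - 3) quantile times `se_forecast`.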

21 Forecasting Performance Analysis $R^2_{pred} = 1 - PRESS / SST$ PRESS = SS of $\{y_i - \hat{y}_{i(i)}\}$ (deleted residuals) Sample splitting –Analysis sample ($n_1$) –Validation sample ($n_2$)
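The deleted residuals do not require refitting the model n times: for least squares, each one equals the ordinary residual divided by (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix. The sketch below uses that identity on made-up data; variable names are illustrative.

```python
import numpy as np

# Hypothetical data (made-up values): six observations, two predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 5.4, 9.2, 10.8, 15.1, 16.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
h = np.diag(X @ XtX_inv @ X.T)        # leverages h_ii

deleted_resid = resid / (1.0 - h)     # y_i minus the prediction made without case i
press = np.sum(deleted_resid ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2_pred = 1.0 - press / sst
```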

22 Generalization to K Independent Variables Use n – K – 1 in place of n – 3 as the DF for t. Use K for the numerator DF and n – K – 1 for the denominator DF for F.

23 Diagnostics Assumptions for Disturbance Multi-collinearity Outliers and Influential Observations

24 Problematic Data Conditions Regression Coefficients Are Sensitive to: –Highly Collinear Independent Variables –Contamination By Outliers and Influential Observations

25 Detecting Outliers and Influential Data Outliers –Leverage (X-space): distance from the mean –Tresid (Y-space): forecasting error Influential Data –Idea: with / without comparison –Cook's D –Dfbetas –Dfits
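Two of these diagnostics can be sketched from the hat matrix. Leverage is taken here as its diagonal h_ii, and Cook's D uses the standard closed form e_i^2 h_ii / (p * MSE * (1 - h_ii)^2), which equals the scaled distance between the coefficient vectors fitted with and without case i. Data and names are made up.

```python
import numpy as np

# Hypothetical data (made-up values): six observations, two predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 5.4, 9.2, 10.8, 15.1, 16.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
n, p = X.shape
mse = resid @ resid / (n - p)

leverage = np.diag(X @ XtX_inv @ X.T)   # X-space distance; values sum to p
cooks_d = resid ** 2 * leverage / (p * mse * (1.0 - leverage) ** 2)
```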

26 Modeling Techniques Transformation of Variables –Log –Others Using Dummy Variables –Symbolic representation –Dummy variables for qualitative variables Using Scores for Ordinal Variables Selection of Independent Variables –Forecasting –Computer intensive –Analysis of correlation structure of independent variables

27 Dummy Variables DK = “If (X=k,1,0)” Can be used for nominal and also ordinal variables. # of DK = c-1, where c is the number of categories.
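The rule DK = If(X=k,1,0) with c-1 dummies can be sketched as follows. Treating the last category as the baseline is an arbitrary choice made for this illustration, and the function and category names are hypothetical.

```python
def dummy_code(values, categories):
    """Return c-1 indicator columns per row; the last category is the baseline."""
    levels = categories[:-1]
    return [[1 if v == k else 0 for k in levels] for v in values]

region = ["north", "south", "west", "south"]
rows = dummy_code(region, ["north", "south", "west"])
```

A row of all zeros identifies the baseline category, which is why only c-1 dummies are needed alongside an intercept.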

28 Using Scores for Ordinal Variable Scoring Systems – 1, 2, 3, …, c – -2, -1, 0, 1, 2 (c odd)
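Both scoring systems are easy to generate for any number of categories c. The function names below are hypothetical; the centered version is restricted to odd c, as on the slide.

```python
def integer_scores(c):
    """Scores 1, 2, ..., c."""
    return list(range(1, c + 1))

def centered_scores(c):
    """Scores symmetric around 0, e.g. -2..2 for c = 5; requires odd c."""
    if c % 2 == 0:
        raise ValueError("centered integer scores need an odd number of categories")
    half = c // 2
    return list(range(-half, half + 1))
```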

29 Implications of Variable Selection

30 Selection of Variables - 1 Backward elimination: All X’s → t-test → Final Regression. Stepwise (forward) inclusion: Best simple → Best two variables → Best …. variables, each step chosen by max increase in $R^2$.

31 Selection of Variables - 2 All Possible Regressions: with K independent variables, K simple regressions, K(K-1)/2 two-variable regressions, and so on up to 1 K-variable regression → Final Regression
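Enumerating all possible regressions amounts to listing every non-empty subset of the K candidate variables, 2^K - 1 in total. A minimal sketch using the standard library; the variable names are placeholders.

```python
from itertools import combinations

def all_subsets(names):
    """Yield every non-empty subset of candidate variables, smallest first."""
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            yield subset

names = ["X1", "X2", "X3", "X4"]   # K = 4 hypothetical predictors
subsets = list(all_subsets(names))
pairs = [s for s in subsets if len(s) == 2]
```

The count grows quickly with K, which is why this approach is described as computer intensive.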

32 Selection Criteria: $R^2$; Adj. $R^2$; $R^2_{pred}$; $s_e$; $C_p$
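Of these criteria, R² and adjusted R² follow directly from the ANOVA quantities. The sketch assumes the usual definitions R² = 1 - SSE / SST and Adj. R² = 1 - (SSE / (n - K - 1)) / (SST / (n - 1)); the function name and inputs are hypothetical.

```python
def r2_and_adjusted(sst, sse, n, k):
    """Return (R^2, adjusted R^2) for a regression with k predictors."""
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))
    return r2, adj_r2

r2, adj_r2 = r2_and_adjusted(sst=126.0, sse=4.0, n=5, k=2)
```

Unlike R², the adjusted version can fall when a weak predictor is added, because the DF penalty outweighs the small drop in SSE.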

33 $C_p$ (p = # of coefficients) Select a combination with $C_p$ close to p
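The slide does not show how C_p is computed. The sketch below assumes Mallows' form, C_p = SSE_p / MSE_full - (n - 2p), where SSE_p comes from the candidate model with p coefficients and MSE_full from the model with all candidate variables. The numbers are made up.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' C_p for a candidate model with p coefficients (assumed definition)."""
    return sse_p / mse_full - (n - 2 * p)

# Hypothetical candidate: intercept plus two predictors, n = 20 observations.
cp = mallows_cp(sse_p=34.0, mse_full=2.0, n=20, p=3)
```

Here C_p equals p, which by the rule on this slide would mark the candidate as a reasonable choice.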

34 What to Look for in Good Regression? Remember the three functions of regression –Estimation of the effect of each X –Explaining the variability of Y –Forecasting Population regressions are assumptions –Needs testing Data might be contaminated

35 Extensions For Other Variable Types of Y

36 Types of Variable Quantitative: Continuous, Discrete (counting). Qualitative: Ordinal, Nominal.

37 Generalized Linear Models (GLM) Regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$, with e following $N(0, \sigma)$ GLM Formulation: 1. Model for Y: Y is $N(\mu, \sigma)$ 2. Model for predictors (Link Function): $\mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

38 Forecasting Counting Data Model for Y: Poisson Distribution ($\mu$) Link Function:
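The link function on this slide is not reproduced in the transcript. The customary choice for a Poisson model, assumed here, is the log link, log(mu) = beta0 + beta1 X1 + beta2 X2, which keeps the forecast mean positive. The coefficients and function names below are hypothetical.

```python
import math

def poisson_mean(beta, x):
    """Mean count under an assumed log link: mu = exp(beta0 + sum(beta_k * x_k))."""
    eta = beta[0] + sum(b_k * x_k for b_k, x_k in zip(beta[1:], x))
    return math.exp(eta)

def poisson_pmf(k, mu):
    """Probability of observing the count k when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Made-up coefficients chosen so the linear predictor is 0 and the mean is 1.
mu = poisson_mean([0.0, 0.5, -0.5], [2.0, 2.0])
p_zero = poisson_pmf(0, mu)
```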

