Basics of Regression Analysis. Determination of three performance measures Estimation of the effect of each factor Explanation of the variability Forecasting.

Basics of Regression Analysis

Determination of three performance measures:
– Estimation of the effect of each factor
– Explanation of the variability
– Forecasting Error

Two Predictor Variables
Population Regression Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$, with $e$ following $N(0, \sigma)$
Unknown parameters: $\beta_0, \beta_1, \beta_2$; $\sigma$

From Data to Estimates of Coefficients
Principle: Least Squares
Normal Equation Systems
Estimates of Coefficients
– Mathematics
– Computing Algorithm

Least Squares Method
– Simple Regression
– Multiple Regression

Matrix Computation for b
Normal Equation System: $(X^T X)\, b = X^T Y$
– See Text Appendix D.3
Solution for b: $b = (X^T X)^{-1} (X^T Y)$
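
The normal equation system can be solved numerically without forming the inverse explicitly. The following is a minimal sketch in plain Python, assuming two predictors plus an intercept; the function names `solve` and `least_squares` and the data are made up for illustration and are not from the text.

```python
def solve(a, rhs):
    """Solve the square system a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix [a | rhs]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares(x1, x2, y):
    """Return [b0, b1, b2] from the normal equations (X^T X) b = X^T Y."""
    X = [[1.0, u, v] for u, v in zip(x1, x2)]  # design matrix with intercept column
    cols = list(zip(*X))                        # columns of X, i.e. rows of X^T
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


# Made-up data generated exactly from Y = 1 + 2*X1 + 3*X2 (no disturbance)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 + 2.0 * u + 3.0 * v for u, v in zip(x1, x2)]
b = least_squares(x1, x2, y)
print([round(v, 6) for v in b])
```

Because the data contain no disturbance, the estimates should recover the generating coefficients up to rounding.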

Standardized Regression Coefficients
Definition, for k = 1, 2:
– $b_0 = 0$
– the beta coefficient
Used to show relative weights of predictors.
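
The defining formula did not survive in this transcript. A common textbook definition, assumed here, scales each coefficient by the ratio of sample standard deviations, $b_k s_{x_k} / s_y$. The sketch below applies that assumed definition; `beta_coefficient`, the data, and the coefficient values are hypothetical.

```python
from statistics import stdev


def beta_coefficient(b_k, x_k, y):
    """Assumed definition: b_k scaled by s_xk / s_y (sample standard deviations)."""
    return b_k * stdev(x_k) / stdev(y)


# Made-up data and hypothetical unstandardized estimates (not fitted in this sketch)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [8.5, 8.0, 19.4, 17.8, 29.1, 27.9]
b1, b2 = 1.98125, 3.08125

beta1 = beta_coefficient(b1, x1, y)
beta2 = beta_coefficient(b2, x2, y)
print("beta1 =", round(beta1, 4), "beta2 =", round(beta2, 4))
```

The two standardized values are unit-free, so they can be compared directly to judge the relative weight of each predictor.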

Estimation of $\sigma$
$s_e$: Standard Deviation of Disturbance $e$
Forecasting Equation: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
SS of Residuals: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
Mean SS: $MSE = s_e^2 = SSE / (n-3)$
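
As a small worked illustration of these quantities, the sketch below computes SSE, MSE, and the standard deviation estimate from observed and fitted values. The function name `residual_summary` and the numbers are made up; the fitted values are hypothetical rather than the output of an actual fit.

```python
import math


def residual_summary(y, y_hat, n_coef=3):
    """Return (SSE, MSE, s_e); n_coef = 3 for two predictors plus an intercept."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    mse = sse / (len(y) - n_coef)
    return sse, mse, math.sqrt(mse)


y = [3.0, 5.0, 4.0, 8.0, 9.0]        # observed values (made up)
y_hat = [2.0, 5.0, 5.0, 8.0, 10.0]   # hypothetical fitted values
sse, mse, s_e = residual_summary(y, y_hat)
print(sse, mse, round(s_e, 4))
```

With n = 5 observations the residual DF is n - 3 = 2, so the three squared residuals of 1 give SSE = 3 and MSE = 1.5.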

Standard Error of Coefficients
The variance matrix of b, a (K+1) x 1 vector, is $\sigma^2 (X^T X)^{-1}$, estimated by $s_e^2 (X^T X)^{-1}$.
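
A standard error for each coefficient can be sketched as the square root of MSE times the matching diagonal entry of $(X^T X)^{-1}$, which is the usual least-squares result and is assumed here. The helper names and the data are made up for illustration.

```python
import math


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def coefficient_standard_errors(x1, x2, y):
    """Return (b, se) for the two-predictor model with intercept."""
    X = [[1.0, u, v] for u, v in zip(x1, x2)]
    cols = list(zip(*X))
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    b = solve(xtx, xty)
    resid = [yi - sum(bj * xij for bj, xij in zip(b, row)) for yi, row in zip(y, X)]
    mse = sum(r * r for r in resid) / (len(y) - 3)
    k = len(b)
    # j-th diagonal entry of (X^T X)^{-1}: solve (X^T X) v = e_j and take v[j]
    diag = [solve(xtx, [1.0 if i == j else 0.0 for i in range(k)])[j] for j in range(k)]
    return b, [math.sqrt(mse * d) for d in diag]


# Made-up data: roughly Y = 1 + 2*X1 + 3*X2 plus small disturbances
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [8.5, 8.0, 19.4, 17.8, 29.1, 27.9]
b, se = coefficient_standard_errors(x1, x2, y)
print([round(v, 5) for v in b], [round(v, 5) for v in se])
```

Dividing each estimate by its standard error gives the t-stat used in the tests of significance.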

The Variability Explained
First, determine the base variability for explanation by the regression.
Unconditional mean model: $Y = \mu_y + e$, where $e$ follows $N(0, \sigma_y)$
LS fit of the model: Pred_Y = $\bar{Y}$
SS of Residuals: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
MSS (DF = n-1): $s_y^2 = SST / (n-1)$

The Variability Explained – cont.
Second, determine the variability explained by subtracting the variability still left unexplained.
In SS: $SSR = SST - SSE$
In Variance: MSS - MSE

Creating ANOVA Table

Regression Model        Unexplained Variability in SS   DF     Unexplained Variability in Variance (MSE)
Unconditional           SST                             n-1    SST/(n-1)
Conditional             SSE                             n-3    SSE/(n-3)
Variability Explained   SSR = SST - SSE                 2
Proportion Explained
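
The entries of the ANOVA table can be sketched as below. The proportion explained is assumed to be $R^2 = SSR/SST$ in SS terms and adjusted $R^2 = 1 - MSE/MSS$ in variance terms, which are the usual definitions but are not spelled out in this transcript. The function name `anova` and the data are made up, and the fitted values are hypothetical.

```python
def anova(y, y_hat, k=2):
    """Return ANOVA quantities for a regression with k predictors."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # unconditional model
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # conditional model
    ssr = sst - sse                                        # variability explained
    return {
        "SST": sst, "SSE": sse, "SSR": ssr,
        "df_total": n - 1, "df_error": n - k - 1, "df_regression": k,
        "R2": ssr / sst,
        "adj_R2": 1 - (sse / (n - k - 1)) / (sst / (n - 1)),
    }


y = [3.0, 5.0, 4.0, 8.0, 9.0, 7.0]        # observed values (made up)
y_hat = [3.5, 4.5, 4.5, 7.5, 8.5, 7.5]    # hypothetical fitted values
table = anova(y, y_hat)
for name, value in table.items():
    print(name, round(value, 4))
```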

Test of Significance
F test of significance
T-Test of significance
– Two sided alternative
– One sided alternative

F-Test of Significance of the variability explained by the regression
$H_0$: $\beta_1 = \beta_2 = 0$
$H_a$: At least one coefficient is not 0
P-Value of F-stat = $P\{F(2, n-3) > \text{F-stat}\}$
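
A minimal sketch of this test follows, assuming the usual statistic F-stat = (SSR/2) / (SSE/(n-3)), which the transcript does not state. For a numerator DF of exactly 2 the upper tail has the closed form $P\{F(2, m) > f\} = (1 + 2f/m)^{-m/2}$, so no statistics library is needed; with other numerator DF a library routine would be required. The function name and the input numbers are made up.

```python
def f_test_two_predictors(sst, sse, n):
    """Return (F-stat, p-value) for H0: beta1 = beta2 = 0."""
    m = n - 3                          # denominator DF
    ssr = sst - sse
    f_stat = (ssr / 2) / (sse / m)     # assumed form: MSR / MSE
    # Closed form valid only when the numerator DF is 2
    p_value = (1 + 2 * f_stat / m) ** (-m / 2)
    return f_stat, p_value


f_stat, p_value = f_test_two_predictors(sst=28.0, sse=1.5, n=6)
print(round(f_stat, 4), round(p_value, 4))
```

A small p-value indicates that the variability explained is unlikely to arise if both coefficients were 0.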

t-Test of Significance of a variable, $X_1$ - two sided
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$
P-Value of t-stat = $P\{|t(n-3)| > |\text{t-stat}|\}$

One Sided Test of Significance of a variable, $X_1$
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 > 0$ (using the prior knowledge)
p-Value of t-stat = $P\{t(n-3) > \text{t-stat}\}$
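
The two-sided and one-sided p-values differ only in which tail area is taken. The sketch below evaluates the tail of the t distribution by numerically integrating its density with Simpson's rule, a stand-in chosen to avoid external libraries; the function names and the example t-stat are made up.

```python
import math


def t_density(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t_stat, df, steps=2000):
    """P{t(df) > t_stat}, using Simpson's rule on [0, |t_stat|] (steps must be even)."""
    a = abs(t_stat)
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central = total * h / 3      # P{0 < t < |t_stat|}
    upper = 0.5 - central        # P{t > |t_stat|}, by symmetry around 0
    return upper if t_stat >= 0 else 1.0 - upper


def p_two_sided(t_stat, df):
    return 2 * t_upper_tail(abs(t_stat), df)


def p_one_sided(t_stat, df):
    return t_upper_tail(t_stat, df)


t_stat, n = 2.5, 10              # hypothetical t-stat with n = 10 observations
print(round(p_two_sided(t_stat, n - 3), 4), round(p_one_sided(t_stat, n - 3), 4))
```

For a positive t-stat the one-sided p-value is half the two-sided one, which is why prior knowledge about the sign of the coefficient makes the test more sensitive.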

Forecasting
– Point forecasting
– Sources of forecasting error
– Interval forecasting

Forecasting at $x_m$
– Data of X for regression
– Value of X for prediction

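Sources of Forecasting Error
Data: $Y|x_m = \beta_0 + \beta_1 x_{1m} + \beta_2 x_{2m} + e_m$
Forecast: $\hat{Y}|x_m = b_0 + b_1 x_{1m} + b_2 x_{2m}$
Forecast Error: $Y|x_m - \hat{Y}|x_m = (\beta_0 - b_0) + (\beta_1 - b_1) x_{1m} + (\beta_2 - b_2) x_{2m} + e_m$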

Computing Standard Errors

Forecasting Performance Analysis
$R^2_{pred} = 1 - PRESS / SST$
PRESS = SS of $\{y_i - \hat{y}_{i(i)}\}$ (deleted residual)
Sample splitting
– Analysis sample ($n_1$)
– Validation sample ($n_2$)
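
A deleted residual can be obtained by refitting the regression without observation i and forecasting that observation. The sketch below does this by brute force for the two-predictor model; statistical packages typically use a leverage-based shortcut instead. The helper names and the data are made up.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit(X, y):
    """Least-squares coefficients from the normal equations."""
    cols = list(zip(*X))
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def press_r2(x1, x2, y):
    """Return (PRESS, R2_pred) using leave-one-out refits."""
    X = [[1.0, u, v] for u, v in zip(x1, x2)]
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    press = 0.0
    for i in range(len(y)):
        b = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])   # fit without observation i
        forecast = sum(bj * xj for bj, xj in zip(b, X[i]))
        press += (y[i] - forecast) ** 2                 # squared deleted residual
    return press, 1 - press / sst


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [8.5, 8.0, 19.4, 17.8, 29.1, 27.9]
press, r2_pred = press_r2(x1, x2, y)
print(round(press, 4), round(r2_pred, 4))
```

Because each forecast is made for an observation the fit has not seen, PRESS is never smaller than SSE from the full fit.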

Generalization to K Independent Variables
– Replace n - 3 with n - K - 1 as the DF for t.
– Use K for the numerator DF and n - K - 1 for the denominator DF for F.

Diagnostics
– Assumptions for Disturbance
– Multi-collinearity
– Outliers and Influential Observations

Problematic Data Conditions
Regression Coefficients Are Sensitive to:
– Highly Collinear Independent Variables
– Contamination By Outliers and Influential Observations

Detecting Outliers and Influential Data
Outliers
– Leverage (X-space): distance from the mean
– Tresid (Y-space): forecasting error
Influential Data
– Idea: with / without comparison
– Cook's D
– Dfbetas
– Dfits

Modeling Techniques
Transformation of Variables
– Log
– Others
Using Dummy Variables
– Symbolic representation
– Dummy variables for qualitative variables
Using Scores for Ordinal Variables
Selection of Independent Variables
– Forecasting
– Computer intensive
– Analysis of correlation structure of independent variables

Dummy Variables
DK = “If (X=k,1,0)”
Can be used for nominal and also ordinal variables.
# of DK = c-1, where c is the number of categories.
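
The "If (X=k,1,0)" rule can be sketched as below. Treating the last listed category as the baseline is an assumption made for this example; any one category can serve. The function name and the data are made up.

```python
def dummy_variables(values, categories):
    """Return c-1 indicator columns; the last category is the baseline (all zeros)."""
    kept = categories[:-1]
    return {f"D_{k}": [1 if v == k else 0 for v in values] for k in kept}


region = ["north", "south", "west", "south", "north"]   # made-up qualitative variable
dummies = dummy_variables(region, ["north", "south", "west"])
for name, column in dummies.items():
    print(name, column)
```

With c = 3 categories only two columns are produced; a third would be collinear with the intercept.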

Using Scores for Ordinal Variable
Scoring Systems
– 1, 2, 3, …, c
– -2, -1, 0, 1, 2 (c odd)

Implications of Variable Selection

Selection of Variables - 1
Backward elimination: All X's -> (T-test) -> Final Regression
Stepwise (forward) inclusion: Best simple -> (Max Increase in $R^2$) -> Best Two variables -> (Max Increase in $R^2$) -> Best …. variables

Selection of Variables - 2
All Possible Regression
K independent variables:
– K simple
– K(K-1)/2 two variable
– …
– 1 K variable
Final Regression
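
The candidate set for all possible regressions can be enumerated directly. The sketch below lists every non-empty subset of K predictors and counts them by size; the number of two-variable candidates equals the number of pairs, K(K-1)/2. The variable names are placeholders.

```python
from itertools import combinations


def all_subsets(names):
    """Yield every non-empty subset of predictor names, smallest subsets first."""
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            yield subset


names = ["X1", "X2", "X3", "X4"]     # K = 4 hypothetical predictors
subsets = list(all_subsets(names))
counts = {size: sum(1 for s in subsets if len(s) == size)
          for size in range(1, len(names) + 1)}
print(counts, "total:", len(subsets))
```

The total grows as 2^K - 1, which is why this approach is described as computer intensive.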

Selection Criteria
– $R^2$
– Adj. $R^2$
– $R^2_{pred}$
– $s_e$
– $C_p$

$C_p$ (p = # of coefficients)
Select a combination with $C_p$ close to p
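
The formula for this criterion is not in the transcript. The sketch below assumes the common form of Mallows' statistic, $C_p = SSE_p / MSE_{full} - (n - 2p)$, where p counts the coefficients including the intercept. The function name and all SSE values are hypothetical.

```python
def mallows_cp(sse_subset, mse_full, n, p):
    """Assumed form: Cp = SSE_p / MSE_full - (n - 2p)."""
    return sse_subset / mse_full - (n - 2 * p)


n = 20
mse_full = 32.0 / (n - 4)            # full model: 3 predictors + intercept, SSE = 32
candidates = {                        # subset -> (p, SSE), hypothetical values
    "X1": (2, 60.0),
    "X1, X2": (3, 35.0),
    "X1, X2, X3": (4, 32.0),
}
results = {}
for label, (p, sse) in candidates.items():
    cp = mallows_cp(sse, mse_full, n, p)
    results[label] = (p, cp)
    print(label, "p =", p, "Cp =", round(cp, 2), "gap =", round(abs(cp - p), 2))
```

The full model always has Cp equal to p under this formula, so the comparison is informative only among the smaller subsets.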

What to Look for in Good Regression?
Remember the three functions of regression
– Estimation of the effect of each X
– Explaining the variability of Y
– Forecasting
Population regressions are assumptions
– Needs testing
Data might be contaminated

Extensions For Other Variable Types of Y

Types of Variable
Variable
– Quantitative: Continuous, Discrete (counting)
– Qualitative: Ordinal, Nominal

Generalized Linear Models (GLM)
Regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$, with $e$ following $N(0, \sigma)$
GLM Formulation:
1. Model for Y: Y is $N(\mu, \sigma)$
2. Model for predictors (Link Function): $\mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

Forecasting Counting Data
Model for Y: Poisson Distribution ($\mu$)
Link Function:
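
The link function itself is missing from this transcript. The sketch below assumes the log link that is conventional for Poisson regression, $\log \mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, and shows how a point forecast of the mean count and the implied count probabilities would follow. The coefficient values and function names are hypothetical, and no fitting is performed.

```python
import math


def poisson_mean(b, x1, x2):
    """Assumed log link: log(mu) = b0 + b1*x1 + b2*x2, so mu = exp(linear predictor)."""
    return math.exp(b[0] + b[1] * x1 + b[2] * x2)


def poisson_pmf(k, mu):
    """P{Y = k} for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


b = [0.5, 0.2, -0.1]                  # hypothetical coefficients
mu = poisson_mean(b, x1=2.0, x2=1.0)  # forecast of the mean count
probs = [poisson_pmf(k, mu) for k in range(50)]
print(round(mu, 4), [round(p, 4) for p in probs[:5]])
```

The exponential keeps the forecast mean positive for any values of the predictors, which a linear link would not guarantee.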
