Published by Bernice Townsend. Modified over 9 years ago.
1
Basics of Regression Analysis
2
Determination of Three Performance Measures
–Estimation of the effect of each factor
–Explanation of the variability
–Forecasting error
3
Two Predictor Variables
Population regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$, with $e$ following $N(0, \sigma_e)$
Unknown parameters: $\beta_0, \beta_1, \beta_2$; $\sigma_e$
4
From Data to Estimates of Coefficients
Principle: Least Squares → (Mathematics) → Normal Equation System → (Computing Algorithm) → Estimates of Coefficients
5
Least Squares Method
–Simple Regression
–Multiple Regression
6
Matrix Computation for b
Normal equation system: $(X^T X)\, b = X^T Y$
–See Text Appendix D.3
Solution for b: $b = (X^T X)^{-1} (X^T Y)$
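A minimal sketch of the matrix computation for b, assuming NumPy and a small made-up data set (the numbers and the variable names `x1`, `x2`, `y` are illustrative only). It solves the normal equation system directly instead of forming the inverse, which is usually the more stable route.

```python
import numpy as np

# Hypothetical data: n = 8 observations, two predictors (values are made up).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([4.0, 3.5, 7.9, 6.8, 11.5, 12.6, 14.1, 16.3])

# Design matrix X: a column of ones for the intercept, then X1 and X2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve the normal equation system (X^T X) b = X^T Y for b.
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # b[0] is the intercept, b[1] and b[2] are the slopes
```

At the solution the residuals are orthogonal to every column of X, which is exactly what the normal equations state.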
7
Standardized Regression Coefficients
Definition: the standardized coefficient of $X_k$ is $b_k \, s_{X_k} / s_Y$, for k = 1, 2
–$b_0 = 0$
–the beta coefficient
Used to show relative weights of predictors.
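A sketch of the beta coefficients, assuming the usual definition (each slope rescaled by the ratio of the predictor's standard deviation to that of Y; the slide's own formula did not survive extraction). The data are made up. The cross-check regresses standardized Y on standardized X's, which should give a zero intercept and the same slopes.

```python
import numpy as np

# Hypothetical data (values are made up).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([4.0, 3.5, 7.9, 6.8, 11.5, 12.6, 14.1, 16.3])

X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)

# Sample standard deviations (ddof=1 divides by n - 1).
s_x = np.array([x1.std(ddof=1), x2.std(ddof=1)])
s_y = y.std(ddof=1)

# Beta coefficients: rescale each slope by s_xk / s_y.
beta = b[1:] * s_x / s_y


def standardize(v):
    """Center to mean 0 and scale to sample standard deviation 1."""
    return (v - v.mean()) / v.std(ddof=1)


# Cross-check: least squares on the standardized variables.
Z = np.column_stack([np.ones_like(x1), standardize(x1), standardize(x2)])
b_std = np.linalg.solve(Z.T @ Z, Z.T @ standardize(y))
print(beta, b_std)  # b_std[0] is 0 up to rounding; b_std[1:] matches beta
```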
8
Estimation of $s_e$, the Standard Deviation of Disturbance e
Forecasting equation: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
SS of residuals: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
Mean SS: $MSE = s_e^2 = SSE / (n-3)$
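The residual sum of squares and the estimate of the disturbance standard deviation can be sketched as follows, assuming NumPy and made-up data. The divisor n - 3 reflects the three estimated coefficients in the two-predictor model.

```python
import numpy as np

# Hypothetical data (values are made up).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([4.0, 3.5, 7.9, 6.8, 11.5, 12.6, 14.1, 16.3])

X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)

n = len(y)
y_hat = X @ b                 # forecasting equation applied to the sample
residuals = y - y_hat
sse = np.sum(residuals ** 2)  # SS of residuals
mse = sse / (n - 3)           # mean SS, with n - 3 degrees of freedom
s_e = np.sqrt(mse)            # estimated standard deviation of the disturbance
print(sse, mse, s_e)
```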
9
Standard Error of Coefficients
The variance matrix of b (b is (K+1) x 1) is $\mathrm{Var}(b) = \sigma_e^2 (X^T X)^{-1}$, estimated by $s_e^2 (X^T X)^{-1}$.
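A sketch of the standard errors of the coefficients, assuming the standard least squares result that the variance matrix of b is estimated by the MSE times the inverse of X^T X (the slide's formula was lost in extraction). Data are made up.

```python
import numpy as np

# Hypothetical data (values are made up).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([4.0, 3.5, 7.9, 6.8, 11.5, 12.6, 14.1, 16.3])

X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape                       # p = K + 1 = 3 coefficients
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

residuals = y - X @ b
mse = residuals @ residuals / (n - p)

var_b = mse * XtX_inv                # estimated (K+1) x (K+1) variance matrix
se_b = np.sqrt(np.diag(var_b))       # standard error of each coefficient
print(se_b)
```

The off-diagonal entries of `var_b` are the estimated covariances between coefficients; they grow when the predictors are highly collinear.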
10
The Variability Explained
First, determine the base variability for explanation by the regression.
Unconditional mean model: $Y = \mu_y + e$, with $e$ following $N(0, \sigma_y)$
LS fit of the model: $\text{Pred\_Y} = \bar{Y}$
SS of residuals: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
MSS (DF = n-1): $s_y^2 = SST / (n-1)$
11
The Variability Explained – cont.
Second, subtract the variability still left unexplained by the regression.
In SS: $SSR = SST - SSE$
In variance: $s_y^2 - s_e^2$
12
Creating ANOVA Table

Regression Model        Unexplained Variability in SS   DF     Unexplained Variability in Variance (MSE)
Unconditional           SST                             n-1    SST / (n-1)
Conditional             SSE                             n-3    SSE / (n-3)
Variability Explained   SSR = SST - SSE                 2
Proportion Explained    $R^2$ = SSR / SST
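The table entries can be computed as in the sketch below, assuming NumPy and made-up data. The adjusted R-squared line is an assumption about what the variance column's "proportion explained" refers to; it is the usual definition, not something the slide states.

```python
import numpy as np

# Hypothetical data (values are made up).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([4.0, 3.5, 7.9, 6.8, 11.5, 12.6, 14.1, 16.3])

X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
n = len(y)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)   # unconditional model, DF = n - 1
sse = np.sum((y - y_hat) ** 2)      # conditional model, DF = n - 3
ssr = sst - sse                     # variability explained, DF = 2

r2 = ssr / sst                                  # proportion explained in SS
adj_r2 = 1 - (sse / (n - 3)) / (sst / (n - 1))  # proportion explained in variance

print(f"Unconditional  SS={sst:.3f}  DF={n - 1}  MS={sst / (n - 1):.3f}")
print(f"Conditional    SS={sse:.3f}  DF={n - 3}  MS={sse / (n - 3):.3f}")
print(f"Explained      SS={ssr:.3f}  DF=2")
print(f"R2={r2:.4f}  Adj. R2={adj_r2:.4f}")
```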
13
Test of Significance
F-test of significance
t-test of significance
–Two-sided alternative
–One-sided alternative
14
F-Test of Significance of the variability explained by the regression
$H_0: \beta_1 = \beta_2 = 0$
$H_a$: at least one coefficient is not 0
P-value of F-stat = $P\{F(2, n-3) > \text{F-stat}\}$
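A sketch of the F statistic itself, assuming the usual definition as the ratio of the explained mean square (SSR / 2) to the MSE (SSE / (n - 3)); the slide gives only the p-value expression. Data are made up. The p-value needs an F-distribution tail probability, for which a statistics library such as SciPy (`scipy.stats.f.sf`) could be used; that step is left out here to keep the example NumPy-only.

```python
import numpy as np

# Hypothetical data (values are made up).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([4.0, 3.5, 7.9, 6.8, 11.5, 12.6, 14.1, 16.3])

X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
n = len(y)

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y - X @ b) ** 2)
ssr = sst - sse

# F statistic with 2 numerator DF and n - 3 denominator DF.
f_stat = (ssr / 2) / (sse / (n - 3))

# Equivalent form in terms of R^2.
r2 = ssr / sst
f_from_r2 = (r2 / 2) / ((1 - r2) / (n - 3))
print(f_stat, f_from_r2)
```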
15
t-Test of Significance of a variable, $X_1$ (two-sided)
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
P-value of t-stat = $P\{|t(n-3)| > |\text{t-stat}|\}$
16
One-Sided Test of Significance of a variable, $X_1$
$H_0: \beta_1 = 0$
$H_a: \beta_1 > 0$ (using the prior knowledge)
P-value of t-stat = $P\{t(n-3) > \text{t-stat}\}$
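For reference, the two p-values can be written side by side. The t statistic is assumed here to be the usual ratio of the estimate to its standard error, which the slides do not spell out; $\mathrm{se}(b_1)$ is notation introduced for this note.

```latex
t\text{-stat} = \frac{b_1}{\mathrm{se}(b_1)}, \qquad
p_{\text{two-sided}} = P\{\,|t_{n-3}| > |t\text{-stat}|\,\} = 2\,P\{\,t_{n-3} > |t\text{-stat}|\,\}, \qquad
p_{\text{one-sided}} = P\{\,t_{n-3} > t\text{-stat}\,\}
```

When the t-stat is positive, the one-sided p-value is half the two-sided one, which is why prior knowledge about the sign of the coefficient makes the test more powerful.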
17
Forecasting
–Point forecasting
–Sources of forecasting error
–Interval forecasting
18
Forecasting at $x_m$
–Data of X for regression
–Value of X for prediction
19
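Sources of Forecasting Error
Data: $Y|x_m = \beta_0 + \beta_1 x_{1m} + \beta_2 x_{2m} + e_m$
Forecast: $\hat{Y}_m = b_0 + b_1 x_{1m} + b_2 x_{2m}$
Forecast error: $Y|x_m - \hat{Y}_m = (\beta_0 - b_0) + (\beta_1 - b_1) x_{1m} + (\beta_2 - b_2) x_{2m} + e_m$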
20
Computing Standard Errors
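A sketch of the two standard errors involved in forecasting, assuming the standard least squares results: the estimation error of the coefficients contributes $s_e^2 \, x_m^T (X^T X)^{-1} x_m$ to the forecast variance, and the new disturbance $e_m$ contributes $s_e^2$. The data and the prediction point `x_m` are made up.

```python
import numpy as np

# Hypothetical data (values are made up).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([4.0, 3.5, 7.9, 6.8, 11.5, 12.6, 14.1, 16.3])

X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
s_e = np.sqrt(np.sum((y - X @ b) ** 2) / (n - p))

# Value of X for prediction: leading 1 for the intercept, then x_1m and x_2m.
x_m = np.array([1.0, 4.5, 5.0])

y_m_hat = x_m @ b                       # point forecast
h_m = x_m @ XtX_inv @ x_m               # leverage of the prediction point
se_mean = s_e * np.sqrt(h_m)            # error from estimating the coefficients
se_forecast = s_e * np.sqrt(1.0 + h_m)  # adds the new disturbance e_m
print(y_m_hat, se_mean, se_forecast)
```

An interval forecast would take the point forecast plus or minus a t(n - 3) quantile times `se_forecast`.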
21
Forecasting Performance Analysis
$R^2_{pred} = 1 - PRESS / SST$
$PRESS = \sum_{i=1}^{n} (y_i - \hat{y}_{i(i)})^2$, the SS of deleted residuals
Sample splitting
–Analysis sample ($n_1$)
–Validation sample ($n_2$)
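A sketch of PRESS computed literally, by refitting the regression n times with one observation deleted each time, assuming NumPy and made-up data. The second computation uses the well-known hat-matrix shortcut (deleted residual = ordinary residual divided by one minus the leverage), which is not on the slide but gives the same number without refitting.

```python
import numpy as np

# Hypothetical data (values are made up).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([4.0, 3.5, 7.9, 6.8, 11.5, 12.6, 14.1, 16.3])

X = np.column_stack([np.ones_like(x1), x1, x2])
n = len(y)

# Leave-one-out: predict y_i from a fit that excludes observation i.
press_loo = 0.0
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
    press_loo += (y[i] - X[i] @ b_i) ** 2

# Shortcut: deleted residual = e_i / (1 - h_ii), with H the hat matrix.
H = X @ np.linalg.inv(X.T @ X) @ X.T
e = y - H @ y
press_short = np.sum((e / (1.0 - np.diag(H))) ** 2)

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum(e ** 2)
r2_pred = 1.0 - press_loo / sst
print(press_loo, press_short, r2_pred)
```

PRESS is never smaller than SSE, so the predictive R-squared never exceeds the ordinary one.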
22
Generalization to K Independent Variables
Use n - K - 1 in place of n - 3 as the DF for t.
Use K as the numerator DF and n - K - 1 as the denominator DF for F.
23
Diagnostics
–Assumptions for Disturbance
–Multi-collinearity
–Outliers and Influential Observations
24
Problematic Data Conditions
Regression Coefficients Are Sensitive to:
–Highly Collinear Independent Variables
–Contamination By Outliers and Influential Observations
25
Detecting Outliers and Influential Data
Outliers
–Leverage (X-space): distance from the mean
–Tresid (Y-space): forecasting error
Influential Data
–Idea: with / without comparison
–Cook's D
–DFBETAS
–DFFITS
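A sketch of leverage and Cook's D, assuming the standard textbook formulas (the slide names the measures without defining them) and made-up data in which the last observation sits far out in X-space. The final lines carry out the "with / without" idea directly for the most influential point: refit without it and measure how far the coefficients move.

```python
import numpy as np

# Hypothetical data; the last observation is far from the others in X-space.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 12.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([4.0, 3.5, 7.9, 6.8, 11.5, 12.6, 14.1, 16.3])

X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape
XtX = X.T @ X
b = np.linalg.solve(XtX, X.T @ y)

# Leverage: diagonal of the hat matrix H = X (X^T X)^{-1} X^T.
h = np.diag(X @ np.linalg.inv(XtX) @ X.T)

e = y - X @ b
mse = e @ e / (n - p)

# Cook's D from residuals and leverages.
cooks_d = e ** 2 * h / (p * mse * (1.0 - h) ** 2)

# With / without comparison for the most influential observation.
i = int(np.argmax(cooks_d))
keep = np.arange(n) != i
b_without = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
shift = b - b_without
d_direct = shift @ XtX @ shift / (p * mse)
print(h, cooks_d, i, d_direct)
```

The leverages always sum to the number of coefficients, so an average leverage is p / n; points well above that deserve a look.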
26
Modeling Techniques
Transformation of Variables
–Log
–Others
Using Dummy Variables
–Symbolic representation
–Dummy variables for qualitative variables
Using Scores for Ordinal Variables
Selection of Independent Variables
–Forecasting
–Computer intensive
–Analysis of correlation structure of independent variables
27
Dummy Variables
$D_k$ = "If (X=k, 1, 0)"
Can be used for nominal and also ordinal variables.
# of $D_k$ = c-1, where c is the number of categories.
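A small sketch of the "If (X=k, 1, 0)" rule in plain Python. The function name `dummy_columns`, the `region` variable and its categories are hypothetical; the first category listed is treated as the baseline, so c categories yield c - 1 dummy variables.

```python
def dummy_columns(values, categories):
    """Return c - 1 indicator columns; categories[0] is the omitted baseline."""
    return {
        f"D_{k}": [1 if v == k else 0 for v in values]
        for k in categories[1:]
    }


# Hypothetical qualitative variable with c = 3 categories.
region = ["north", "south", "west", "south", "north"]
dummies = dummy_columns(region, ["north", "south", "west"])
print(dummies)
# {'D_south': [0, 1, 0, 1, 0], 'D_west': [0, 0, 1, 0, 0]}
```

Dropping one category avoids perfect collinearity with the intercept column; each remaining coefficient is then read as a difference from the baseline category.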
28
Using Scores for Ordinal Variable
Scoring Systems
– 1, 2, 3, …, c
– -2, -1, 0, 1, 2 (c odd)
29
Implications of Variable Selection
30
Selection of Variables - 1
Backward elimination: All X's → (t-test) → Final Regression
Stepwise (forward) inclusion: Best simple → (max increase in $R^2$) → Best two variables → (max increase in $R^2$) → Best … variables
31
Selection of Variables - 2
All Possible Regression
K independent variables
–K simple regressions
–K(K-1)/2 two-variable regressions
–1 K-variable regression
→ Final Regression
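The number of candidate regressions grows quickly with K. A short enumeration, using hypothetical predictor names and only the Python standard library, shows the counts per subset size:

```python
from itertools import combinations

predictors = ["X1", "X2", "X3", "X4"]  # hypothetical names, K = 4
K = len(predictors)

# Every non-empty subset of predictors is one candidate regression.
subsets_by_size = {
    size: list(combinations(predictors, size)) for size in range(1, K + 1)
}
counts = {size: len(subsets) for size, subsets in subsets_by_size.items()}
total = sum(counts.values())
print(counts, total)
# {1: 4, 2: 6, 3: 4, 4: 1} 15
```

The total is 2^K - 1, which is why all-possible-regressions is listed as computer intensive once K gets large.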
32
Selection Criteria
–$R^2$
–Adj. $R^2$
–$R^2_{PRED}$
–$s_e$
–$C_p$
33
$C_p$ (p = # of coefficients)
Select a combination with $C_p$ close to p.
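A sketch of the criterion, assuming the usual Mallows definition $C_p = SSE_p / MSE_{full} - (n - 2p)$, where the MSE comes from the model with all K predictors; the slide's own formula did not survive extraction. The data are simulated with a fixed seed, and the third predictor is deliberately irrelevant.

```python
import numpy as np
from itertools import combinations

# Simulated data (an assumption for illustration): X3 has no effect on Y.
rng = np.random.default_rng(0)
n = 30
Xs = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * Xs[:, 0] - 1.0 * Xs[:, 1] + rng.normal(size=n)
K = Xs.shape[1]


def sse_for(cols):
    """SSE of the least squares fit using an intercept plus the given columns."""
    X = np.column_stack([np.ones(n)] + [Xs[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r


mse_full = sse_for(range(K)) / (n - K - 1)

cp = {}
for size in range(1, K + 1):
    for cols in combinations(range(K), size):
        p = size + 1  # number of coefficients, including the intercept
        cp[cols] = sse_for(cols) / mse_full - (n - 2 * p)

for cols, value in cp.items():
    print(cols, len(cols) + 1, round(value, 2))  # subset, p, C_p
```

By construction the full model always has $C_p$ equal to p, so the criterion is only informative for the smaller subsets.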
34
What to Look for in Good Regression?
Remember the three functions of regression
–Estimation of the effect of each X
–Explaining the variability of Y
–Forecasting
Population regressions are assumptions
–Need testing
Data might be contaminated
35
Extensions For Other Variable Types of Y
36
Types of Variable
Variable
–Quantitative: Continuous, Discrete (counting)
–Qualitative: Ordinal, Nominal
37
Generalized Linear Models (GLM)
Regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$, with $e$ following $N(0, \sigma_e)$
GLM formulation:
1. Model for Y: Y is $N(\mu, \sigma_e)$
2. Model for predictors (link function): $\mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
38
Forecasting Counting Data
Model for Y: Poisson distribution ($\mu$)
Link function: $\ln \mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
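A sketch of fitting a Poisson model for count data, assuming the usual log link (the log of the Poisson mean is linear in the predictors) and iteratively reweighted least squares (IRLS) as the fitting method. Neither choice is confirmed by the slide, whose formulas were lost. The data are made up.

```python
import numpy as np

# Hypothetical count data (values are made up).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
x2 = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
y = np.array([1.0, 1.0, 3.0, 2.0, 6.0, 4.0, 11.0, 9.0])
X = np.column_stack([np.ones_like(x1), x1, x2])

# Start from a constant-mean model: intercept = log(mean of y), slopes = 0.
beta = np.zeros(X.shape[1])
beta[0] = np.log(y.mean())

for _ in range(50):
    eta = X @ beta                # linear predictor
    mu = np.exp(eta)              # inverse of the log link
    z = eta + (y - mu) / mu       # working response
    w = mu                        # working weights for the Poisson log link
    # Weighted normal equations: (X^T W X) beta = X^T W z
    beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    converged = np.max(np.abs(beta_new - beta)) < 1e-10
    beta = beta_new
    if converged:
        break

mu_hat = np.exp(X @ beta)         # fitted means, i.e. point forecasts of the counts
print(beta, mu_hat)
```

Each iteration is a weighted version of the normal equations used for ordinary regression, which is how the GLM framework extends least squares to other types of Y.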