Multiple Regression (1)

Presentation on theme: "Multiple Regression (1)"— Presentation transcript:

1 Multiple Regression (1)
Shakeel Nouman M.Phil Statistics Multiple Regression (1) By Shakeel Nouman M.Phil Statistics Govt. College University Lahore, Statistical Officer

2 11 Multiple Regression (1)
Using Statistics
The k-Variable Multiple Regression Model
The F Test of a Multiple Regression Model
How Good is the Regression
Tests of the Significance of Individual Regression Parameters
Testing the Validity of the Regression Model
Using the Multiple Regression Model for Prediction

3 11 Multiple Regression (2)
Qualitative Independent Variables
Polynomial Regression
Nonlinear Models and Transformations
Multicollinearity
Residual Autocorrelation and the Durbin-Watson Test
Partial F Tests and Variable Selection Methods
The Matrix Approach to Multiple Regression Analysis
Summary and Review of Terms

4 11-1 Using Statistics: Lines and Planes
Lines: Any two points (A and B), or an intercept and slope ($\beta_0$ and $\beta_1$), define a line on a two-dimensional surface.
Planes: Any three points (A, B, and C), or an intercept and coefficients of x1 and x2 ($\beta_0$, $\beta_1$, and $\beta_2$), define a plane in a three-dimensional space.

5 11-2 The k-Variable Multiple Regression Model
The population regression model of a dependent variable, Y, on a set of k independent variables, X1, X2, ..., Xk is given by:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$
where $\beta_0$ is the Y-intercept of the regression surface and each $\beta_i$, i = 1, 2, ..., k, is the slope of the regression surface (sometimes called the response surface) with respect to Xi.
Model assumptions:
1. $\varepsilon \sim N(0, \sigma^2)$, independent of other errors.
2. The variables Xi are uncorrelated with the error term.

6 Simple and Multiple Least-Squares Regression
In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line. In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane.

7 The Estimated Regression Relationship
The estimated regression relationship is
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
where $\hat{y}$ is the predicted value of Y, the value lying on the estimated regression surface. The terms $b_0, \ldots, b_k$ are the least-squares estimates of the population regression parameters $\beta_i$. The actual, observed value of Y is the predicted value plus an error:
$$y_j = b_0 + b_1 x_{1j} + b_2 x_{2j} + \cdots + b_k x_{kj} + e_j$$

8 Least-Squares Estimation: The 2-Variable Normal Equations
Minimizing the sum of squared errors with respect to the estimated coefficients b0, b1, and b2 yields the following normal equations:
$$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$$
$$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$$
$$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$$

9 Example 11-1
Columns of the computation table: Y, X1, X2, X1X2, X1^2, X2^2, X1Y, X2Y
Normal equations:
743 = 10b0 + 123b1 + 65b2
9382 = 123b0 + 1615b1 + 869b2
5040 = 65b0 + 869b1 + 509b2
Solving these three equations simultaneously gives the least-squares estimates b0, b1, and b2.
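The three normal equations of Example 11-1 form a small linear system in b0, b1, and b2. A minimal sketch of solving it follows, using exact fractions and Gauss-Jordan elimination. The function name `solve_linear_system` is an illustrative assumption, not part of the original slides, and any linear-algebra routine would serve equally well.

```python
from fractions import Fraction


def solve_linear_system(a, rhs):
    """Solve a square linear system by Gauss-Jordan elimination (exact fractions)."""
    n = len(a)
    # Build the augmented matrix [A | rhs] with exact rational arithmetic.
    m = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        # Find a row with a nonzero pivot in this column and swap it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot equals 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [float(m[i][n]) for i in range(n)]


# Coefficients and left-hand totals copied from the normal equations of Example 11-1.
coefficients = [[10, 123, 65], [123, 1615, 869], [65, 869, 509]]
totals = [743, 9382, 5040]

b0, b1, b2 = solve_linear_system(coefficients, totals)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

Exact fractions avoid rounding error during elimination; the results are converted to floats only at the end.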

10 Example 11-1: Using the Template
Regression results for Alka-Seltzer sales

11 Decomposition of the Total Deviation in a Multiple Regression Model
Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE

12 11-3 The F Test of a Multiple Regression Model
A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:
H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
H1: Not all the $\beta_i$ (i = 1, 2, ..., k) are 0

13 Using the Template: Analysis of Variance Table (Example 11-1)
[Figure: F distribution with 2 and 7 degrees of freedom, showing the critical point $F_{0.01} = 9.55$ ($\alpha = 0.01$) and the test statistic 86.34.]
The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we may conclude that the dependent variable is related to one or more of the independent variables.

14 11-4 How Good is the Regression

15 Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination
SST = SSR + SSE
Example 11-1: s = 1.911, R-sq = 96.1%, R-sq(adj) = 95.0%
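As a hedged illustration, the adjusted coefficient of determination can be computed from R-sq with the standard formula adjusted R-sq = 1 - (1 - R-sq)(n - 1)/(n - (k + 1)). The sketch below applies it to Example 11-1, assuming n = 10 observations (the coefficient of b0 in the first normal equation) and k = 2 independent variables. The function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k independent variables and n observations."""
    error_df = n - (k + 1)  # degrees of freedom for error
    total_df = n - 1        # degrees of freedom for total
    return 1.0 - (1.0 - r_squared) * total_df / error_df


# Example 11-1: R-sq = 96.1%, n = 10, k = 2 (n and k as assumed in the text above).
adj = adjusted_r_squared(0.961, n=10, k=2)
print(f"Adjusted R-sq = {adj:.1%}")  # prints "Adjusted R-sq = 95.0%"
```

The result agrees with the R-sq(adj) = 95.0% reported for Example 11-1.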

16 Measures of Performance in Multiple Regression and the ANOVA Table

17 11-5 Tests of the Significance of Individual Regression Parameters
Hypothesis tests about individual regression slope parameters:
(1) H0: $\beta_1 = 0$, H1: $\beta_1 \neq 0$
(2) H0: $\beta_2 = 0$, H1: $\beta_2 \neq 0$
...
(k) H0: $\beta_k = 0$, H1: $\beta_k \neq 0$

18 Regression Results for Individual Parameters

19 Example 11-1: Using the Template
Regression results for Alka-Seltzer sales

20 Using the Template: Example 11-2
Regression results for Exports to Singapore

21 11-6 Testing the Validity of the Regression Model: Residual Plots
Residuals vs M1: It appears that the residuals are randomly distributed with no pattern and with equal variance as M1 increases.

22 11-6 Testing the Validity of the Regression Model: Residual Plots
Residuals vs Price: It appears that the residuals grow in magnitude as Price increases, so the variance of the residuals is not constant.

23 Normal Probability Plot for the Residuals: Example 11-2
A linear trend in the normal probability plot indicates that the residuals are approximately normally distributed.

24 Investigating the Validity of the Regression: Outliers and Influential Observations
Outliers: [Figure: scatter of y against x with one outlier (*), showing the regression line with the outlier and the regression line without the outlier.]
Influential Observations: [Figure: a cluster of points with no relationship plus one point with a large value of xi (*), showing the regression line when all data are included.]

25 Outliers and Influential Observations: Example 11-2
Unusual Observations
Columns: Obs, M1, EXPORTS, Fit, Stdev.Fit, Residual, St.Resid
Two observations are flagged X and four are flagged R.
R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

26 11-7 Using the Multiple Regression Model for Prediction
[Figure: Estimated Regression Plane for Example 11-1, with axes Sales, Advertising, and Promotions.]

27 Prediction in Multiple Regression

28 11-8 Qualitative (or Categorical) Independent Variables (in Regression)
Example 11-3
Data columns: MOVIE, EARN, COST, PROM, BOOK

29 Picturing Qualitative Variables in Regression
A regression with one quantitative variable (X1) and one qualitative variable (X2): [Figure: Y against X1 with two parallel lines, the line for X2 = 0 with intercept b0 and the line for X2 = 1 with intercept b0 + b2.]
A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3): [Figure: two parallel planes over x1 and x2, separated vertically by b3.]

30 Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables
A regression with one quantitative variable (X1) and one qualitative variable with three categories, represented by two dummy variables (X2 and X3):
[Figure: Y against X1 with three parallel lines: the line for X2 = 0 and X3 = 0 with intercept b0, the line for X2 = 1 and X3 = 0 with intercept b0 + b2, and the line for X2 = 0 and X3 = 1 with intercept b0 + b3.]
A qualitative variable with r levels or categories is represented with (r-1) 0/1 (dummy) variables.
Table columns: Category, X2, X3. Categories: Adventure, Drama, Romance.
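The (r-1) dummy-variable rule can be sketched in a few lines. In this illustration the function name `dummy_encode` and the choice of the first category as the baseline (all dummies 0) are assumptions; the slide's own assignment of 0/1 codes to Adventure, Drama, and Romance is not reproduced here.

```python
def dummy_encode(values, categories):
    """Encode a qualitative variable with r categories as r-1 0/1 dummy variables.

    The first entry of `categories` is treated as the baseline (all dummies 0).
    This baseline choice is an assumption made for illustration.
    """
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    non_baseline = categories[1:]
    # One row per observation, one column per non-baseline category.
    return [[1 if v == c else 0 for c in non_baseline] for v in values]


categories = ["Adventure", "Drama", "Romance"]  # r = 3 levels
movies = ["Drama", "Adventure", "Romance", "Drama"]

for genre, row in zip(movies, dummy_encode(movies, categories)):
    print(genre, row)
```

Using r dummies together with an intercept would make the columns perfectly collinear, which is why one category is left as the baseline.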

31 Using Qualitative Variables in Regression: Example 11-4
Salary regressed on Education, Experience, and Gender:
Term: Constant, Education, Experience, Gender
(SE): (32.6), (45.1), (78.5), (212.4)
(t): (262.2), (21.0), (16.0), (-15.3)
On average, female salaries are $3256 below male salaries.
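Each t ratio is an estimated coefficient divided by its standard error. As a rough check, the sketch below assumes the Gender coefficient is -3256 (inferred from the statement that female salaries are $3256 lower, with Gender coded 1 for female; both are assumptions) and uses the reported standard error of 212.4.

```python
def t_ratio(estimate, standard_error):
    """t statistic for H0: the population coefficient equals zero."""
    return estimate / standard_error


# Assumed coefficient (-3256) and the reported standard error (212.4) for Gender.
gender_t = t_ratio(-3256.0, 212.4)
print(f"t = {gender_t:.1f}")  # prints "t = -15.3"
```

The value matches the t ratio of -15.3 shown for Gender, which supports reading the $3256 difference as the Gender coefficient.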

32 Interactions between Quantitative and Qualitative Variables: Shifting Slopes
A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):
[Figure: Y against X1 with two non-parallel lines: the line for X2 = 0 with intercept b0 and slope b1, and the line for X2 = 1 with intercept b0 + b2 and slope b1 + b3.]
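The shifting intercepts and slopes follow from a model of the form y-hat = b0 + b1 X1 + b2 X2 + b3 X1 X2, where X2 is a 0/1 dummy. The sketch below uses made-up coefficient values, chosen only for illustration, to show that the line for X2 = 1 has intercept b0 + b2 and slope b1 + b3.

```python
def predict(x1, x2, b0, b1, b2, b3):
    """Prediction from a model with an X1-by-X2 interaction term (X2 is a 0/1 dummy)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


# Hypothetical coefficients, not taken from any example in the slides.
b0, b1, b2, b3 = 10.0, 2.0, 5.0, 1.5

# Intercept = prediction at X1 = 0; slope = change in prediction per unit of X1.
intercept_x2_0 = predict(0, 0, b0, b1, b2, b3)
slope_x2_0 = predict(1, 0, b0, b1, b2, b3) - intercept_x2_0
intercept_x2_1 = predict(0, 1, b0, b1, b2, b3)
slope_x2_1 = predict(1, 1, b0, b1, b2, b3) - intercept_x2_1

print(f"X2 = 0: intercept {intercept_x2_0}, slope {slope_x2_0}")
print(f"X2 = 1: intercept {intercept_x2_1}, slope {slope_x2_1}")
```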

33 11-9 Polynomial Regression
One-variable polynomial regression model:
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \cdots + \beta_m X^m + \varepsilon$$
where m is the degree of the polynomial, the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.

34 Polynomial Regression: Example 11-5

35 Polynomial Regression: Other Variables and Cross-Product Terms
[Table: Variable, Estimate, Standard Error, T-statistic, with one row per X term and one row for the cross-product term.]

36 11-10 Nonlinear Models and Transformations: Multiplicative Model

37 Transformations: Exponential Model

38 Plots of Transformed Variables

39 Variance Stabilizing Transformations
Square root transformation: Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y.
Logarithmic transformation: Useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y.
Reciprocal transformation: Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.

40 Regression with Dependent Indicator Variables
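[Figure: the logistic function, an S-shaped curve of y against x bounded between 0 and 1.]
The logistic function:
$$E(Y \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
Transformation to linearize the logistic function, with $p = E(Y \mid X)$:
$$p' = \ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X$$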
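A minimal sketch of why the log-odds transformation linearizes the logistic function: applying ln(p / (1 - p)) to a logistic value p recovers the linear predictor. The coefficient values below are hypothetical, and the function names are illustrative.

```python
import math


def logistic(x, beta0, beta1):
    """Logistic function: maps the linear predictor beta0 + beta1*x into (0, 1)."""
    z = beta0 + beta1 * x
    return math.exp(z) / (1.0 + math.exp(z))


def logit(p):
    """Log-odds transformation, the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


beta0, beta1 = -3.0, 0.5  # hypothetical coefficients for illustration only

for x in [0, 2, 4, 6, 8, 10]:
    p = logistic(x, beta0, beta1)
    # The logit of p is linear in x: it equals beta0 + beta1*x.
    print(f"x = {x:2d}  p = {p:.4f}  logit(p) = {logit(p):+.4f}")
```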

41 11-11: Multicollinearity
Orthogonal X variables provide information from independent sources. No multicollinearity.
Perfectly collinear X variables provide identical information content. No regression.
Some degree of collinearity. Problems with regression depend on the degree of collinearity.
A high degree of negative collinearity also causes problems with regression.

42 Effects of Multicollinearity
Variances of regression coefficients are inflated.
Magnitudes of regression coefficients may be different from what is expected.
Signs of regression coefficients may not be as expected.
Adding or removing variables produces large changes in coefficients.
Removing a data point may cause large changes in coefficient estimates or signs.
In some cases, the F ratio may be significant while the t ratios are not.

43 Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors

44 Variance Inflation Factor
[Figure: Relationship between VIF and $R_h^2$, with VIF plotted against $R_h^2$.]

45 Variance Inflation Factor (VIF)
Observation: The VIF (Variance Inflation Factor) values for the variables Lend and Price are both greater than 5. This indicates that some degree of multicollinearity exists with respect to these two variables.
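The variance inflation factor for variable X_h is conventionally defined as VIF = 1 / (1 - R_h^2), where R_h^2 is the R-squared from regressing X_h on the other independent variables. The sketch below, with an illustrative function name and made-up R_h^2 values, shows that a VIF above 5 corresponds to R_h^2 above 0.8.

```python
def variance_inflation_factor(r_squared_h):
    """VIF for a variable whose regression on the other X variables has R^2 = r_squared_h."""
    if not 0.0 <= r_squared_h < 1.0:
        raise ValueError("r_squared_h must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_h)


# Illustrative R_h^2 values (not taken from the slides).
for r2 in [0.0, 0.5, 0.8, 0.9, 0.99]:
    print(f"R_h^2 = {r2:.2f}  ->  VIF = {variance_inflation_factor(r2):.1f}")
```

A VIF of 1 means the variable is uncorrelated with the others; the factor grows without bound as R_h^2 approaches 1.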

46 Solutions to the Multicollinearity Problem
Drop a collinear variable from the regression
Change the sampling plan to include elements outside the multicollinearity range
Transformations of variables
Ridge regression

47 11-12 Residual Autocorrelation and the Durbin-Watson Test
An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.
[Table: Lagged Residuals, with columns i, $\varepsilon_i$, $\varepsilon_{i-1}$, $\varepsilon_{i-2}$, $\varepsilon_{i-3}$, $\varepsilon_{i-4}$; * marks lagged values that are unavailable.]
The Durbin-Watson test (first-order autocorrelation):
H0: $\rho_1 = 0$
H1: $\rho_1 \neq 0$
The Durbin-Watson test statistic:
$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
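The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below computes it for two small made-up residual series; the function name is illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residual series for illustration.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every period: negative autocorrelation
persistent = [1.0, 1.0, 1.0, 1.0]     # no change between periods: positive autocorrelation

print(durbin_watson(alternating))  # prints 3.0
print(durbin_watson(persistent))   # prints 0.0
```

The statistic ranges from 0 to 4; values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation.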

48 Critical Points of the Durbin-Watson Statistic
$\alpha = 0.05$, n = Sample Size, k = Number of Independent Variables
[Table: lower and upper critical points dL and dU by sample size n, for k = 1, k = 2, k = 3, k = 4, and k = 5.]

49 Using the Durbin-Watson Statistic
Decision regions for the Durbin-Watson statistic, from 0 to 4:
0 to dL: Positive Autocorrelation
dL to dU: Test is Inconclusive
dU to 4-dU: No Autocorrelation
4-dU to 4-dL: Test is Inconclusive
4-dL to 4: Negative Autocorrelation
For n = 67, k = 4: 4-dU ≈ 2.27 and 4-dL ≈ 2.53 < 2.58, so H0 is rejected, and we conclude there is negative first-order autocorrelation.
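The five decision regions can be written as a small function. The sketch assumes that the slide's values 2.27 and 2.53 are 4-dU and 4-dL, which implies dU ≈ 1.73 and dL ≈ 1.47; these back-calculated bounds and the function name are assumptions made for illustration.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic d using the critical points dL and dU."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "no autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"


# Back-calculated bounds for n = 67, k = 4 (assumed: dL = 4 - 2.53, dU = 4 - 2.27).
d_lower, d_upper = 1.47, 1.73

print(durbin_watson_decision(2.58, d_lower, d_upper))  # prints "negative autocorrelation"
```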

50 11-13 Partial F Tests and Variable Selection Methods
Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$
Reduced model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
Partial F test:
H0: $\beta_3 = \beta_4 = 0$
H1: $\beta_3$ and $\beta_4$ not both 0
Partial F statistic:
$$F_{(r,\, n-(k+1))} = \frac{(SSE_R - SSE_F)/r}{MSE_F}$$
where $SSE_R$ is the sum of squared errors of the reduced model; $SSE_F$ is the sum of squared errors of the full model; $MSE_F$ is the mean square error of the full model [$MSE_F = SSE_F/(n-(k+1))$]; k is the number of independent variables in the full model; and r is the number of variables dropped from the full model.
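The partial F statistic compares the increase in the error sum of squares caused by dropping r variables with the mean square error of the full model. The sketch below uses made-up sums of squares and a made-up sample size; none of the numbers come from the slides, and the function name is illustrative.

```python
def partial_f(sse_reduced, sse_full, n, k, r):
    """Partial F statistic for dropping r of the k variables in the full model.

    n is the number of observations; MSE of the full model is SSE_F / (n - (k + 1)).
    """
    mse_full = sse_full / (n - (k + 1))
    return ((sse_reduced - sse_full) / r) / mse_full


# Hypothetical values: full model with k = 4 variables, r = 2 of them dropped.
f_stat = partial_f(sse_reduced=120.0, sse_full=80.0, n=25, k=4, r=2)
print(f"Partial F = {f_stat}")  # prints "Partial F = 5.0"
```

The statistic is compared with the critical point of the F distribution with r and n - (k + 1) degrees of freedom.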

51 Variable Selection Methods
All possible regressions: Run regressions with all possible combinations of independent variables and select the best model.
The reported p-value indicates that we should reject the null hypothesis H0: the slopes for Lend and Exch. are zero.

52 Variable Selection Methods
Stepwise procedures
Forward selection: Add one variable at a time to the model, on the basis of its F statistic.
Backward elimination: Remove one variable at a time, on the basis of its F statistic.
Stepwise regression: Adds variables to the model and subtracts variables from the model, on the basis of the F statistic.

53 Stepwise Regression
1. Compute the F statistic for each variable not in the model.
2. Is there at least one variable with p-value < Pin? If no, stop. If yes, enter the most significant (smallest p-value) variable into the model.
3. Calculate the partial F for all variables in the model.
4. Is there a variable with p-value > Pout? If yes, remove the variable. In either case, return to step 1.

54 Stepwise Regression: Using the Computer (MINITAB)
MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1' 'LEND' 'PRICE' 'EXCHANGE'
Stepwise Regression
F-to-Enter: F-to-Remove:
Response is EXPORTS on 4 predictors, with N = 67
Output rows: Step, Constant, M1, T-Ratio, PRICE, T-Ratio, S, R-Sq

55 Using the Computer: MINITAB
MTB > REGRESS 'EXPORTS' 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif;
SUBC> dw.
Regression Analysis
The regression equation expresses EXPORTS in terms of M1, LEND, PRICE, and EXCHANGE.
[Table: Predictor, Coef, Stdev, t-ratio, p, VIF, with rows Constant, M1, LEND, PRICE, EXCHANGE.]
R-sq = 82.5% R-sq(adj) = 81.4%
[Table: Analysis of Variance, with columns SOURCE, DF, SS, MS, F, p and rows Regression, Error, Total.]
Durbin-Watson statistic = 2.58

56 Using the Computer: SAS (continued)
Parameter Estimates
[Table: Variable, DF, Parameter Estimate, Standard Error, T for H0: Parameter=0, Prob > |T|, with rows INTERCEP, M1, LEND, PRICE, EXCHANGE.]
[Table: Variable, DF, Variance Inflation, with rows INTERCEP, M1, LEND, PRICE, EXCHANGE.]
Durbin-Watson D
(For Number of Obs.)
1st Order Autocorrelation

57 11-15: The Matrix Approach to Regression Analysis (1)

58 The Matrix Approach to Regression Analysis (2)

59 Author
Name: Shakeel Nouman
Religion: Christian
Domicile: Punjab (Lahore)
M.Phil (Statistics), GC University (degree awarded by GC University)
M.Sc (Statistics), GC University
Statistical Officer (BS-17), Economics & Marketing Division, Livestock Production Research Institute Bahadurnagar (Okara), Livestock & Dairy Development Department, Govt. of Punjab

