Topic 14: Inference in Multiple Regression


1 Topic 14: Inference in Multiple Regression

2 Outline
Review multiple linear regression
Inference of regression coefficients
–Application to book example
Inference of mean
–Application to book example
Inference of future observation
Diagnostics and remedies

3 Data for Multiple Regression
$Y_i$ is the response variable
$X_{i1}, X_{i2}, \ldots, X_{i,p-1}$ are the $p-1$ explanatory variables
$Y_i, X_{i1}, X_{i2}, \ldots, X_{i,p-1}$ are the data for case $i$, where $i = 1$ to $n$

4 Multiple Regression Model
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + e_i$
$Y_i$ is the value of the response variable for the $i$th case
$\beta_0$ is the intercept
$\beta_1, \beta_2, \ldots, \beta_{p-1}$ are the regression coefficients for the explanatory variables
$e_i$ are independent Normally distributed random errors with mean 0 and variance $\sigma^2$

5 Least Squares Solutions
$s^2 = \text{MSE} = \text{SSE}/(n-p)$
$s = \text{Root MSE}$
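
The formulas for the estimates themselves did not survive in this transcript. For reference, the standard matrix-form least squares results (as in KNNL Chapter 6) are sketched below. The notation is an assumption here rather than taken from the slide: $X$ is the $n \times p$ design matrix with a leading column of ones and $Y$ is the response vector.

```latex
\begin{align*}
b &= (X'X)^{-1}X'Y \\
\hat{Y} &= Xb, \qquad e = Y - \hat{Y} \\
\text{SSE} &= e'e, \qquad s^2 = \text{MSE} = \frac{\text{SSE}}{n-p}
\end{align*}
```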

6 ANOVA F-test
$H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$
$H_a: \beta_k \neq 0$ for at least one $k = 1, 2, \ldots, p-1$
Under $H_0$, $F = \text{MSR}/\text{MSE} \sim F(p-1, n-p)$
Reject $H_0$ if $F$ is large; in terms of the P-value, reject if the P-value ≤ 0.05

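7 Inference for individual regression coefficients
We can show $b \sim N(\beta, \sigma^2 (X'X)^{-1})$
Define $s^2(b) = \text{MSE}\,(X'X)^{-1}$, so that $s(b_k)$ is the square root of the diagonal element corresponding to $b_k$
Then $(b_k - \beta_k)/s(b_k) \sim t(n-p)$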

8 Significance Test for $\beta_k$
$H_0: \beta_k = 0$
Same test statistic $t^* = b_k / s(b_k)$
Still use $df_E$, which now equals $n-p$
P-value computed from the $t(n-p)$ distribution
This tests the significance of a variable given that the other variables are already in the model (i.e., fitted last)

9 Confidence interval for $\beta_k$
CI: $b_k \pm t_c\, s(b_k)$, where $t_c = t(.975, n-p)$
Same form as before, but $df_E$ now equals $n-p$
This interval gives the range of plausible values for $\beta_k$ given that the other variables are in the model

10 Example II (KNNL p 236)
Dwaine Studios, Inc. operates portrait studios in 21 cities of medium size
$Y_i$ is sales in city $i$
$X_1$: population aged 16 and under
$X_2$: per capita disposable income

11 Read in the data
    /* Read the three columns of the Dwaine Studios data file */
    data a1;
      infile '../data/ch06fi05.txt';
      input young income sales;
    run;

    proc print data=a1;
    run;

12 Partial Proc Print Results
    Obs   young   income   sales
      1    68.5     16.7   174.4
      2    45.2     16.8   164.4
      3    91.3     18.2   244.2
      4    47.8     16.3   154.6
      5    46.9     17.3   181.6

13 Proc Reg
    proc reg data=a1;
      model sales=young income;
    run;

14 Output
Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              2            24015         12008     99.10   <.0001
    Error             18        2180.9274      121.1626
    Corrected Total   20            26196

    Root MSE   11.00739    R-Square   0.917
At least one variable is helpful in predicting sales
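
As a quick check on the ANOVA table, the F statistic and its P-value can be recomputed from the printed mean squares. This is a minimal sketch, assuming the values as printed above (MSR = 12008, MSE = 121.1626) with 2 and 18 degrees of freedom. Because the printed inputs are rounded, the result should agree only approximately with the reported 99.10.

```sas
data _null_;
  msr = 12008;                 /* Model mean square, as printed (rounded) */
  mse = 121.1626;              /* Error mean square, as printed */
  f   = msr / mse;             /* F* = MSR / MSE */
  p   = 1 - probf(f, 2, 18);   /* upper-tail area of F(2, 18) */
  put f= p=;                   /* write both values to the log */
run;
```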

15 Output
Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1            -68.85707         60.01695     -1.15     0.2663
    young        1              1.45456          0.21178      6.87     <.0001
    income       1              9.36550          4.06396      2.30     0.0333
Both variables are helpful in explaining sales after the other is already in the model
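
The t value and two-sided P-value for a single coefficient can be reproduced by hand from its estimate and standard error. The sketch below does this for income, using the printed values and $df_E = n - p = 18$. The variable names (b, se, dfe) are chosen here for illustration only.

```sas
data _null_;
  b   = 9.36550;                       /* estimate for income */
  se  = 4.06396;                       /* its standard error */
  dfe = 18;                            /* error df: n - p = 21 - 3 */
  t   = b / se;                        /* t* = b_k / s(b_k) */
  p   = 2 * (1 - probt(abs(t), dfe));  /* two-sided P-value from t(18) */
  put t= p=;
run;
```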

16 CLB option
Used to get confidence intervals for each coefficient
    proc reg data=a1;
      model sales=young income/clb;
    run;

17 Output
Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   95% Confidence Limits
    Intercept    1            -68.85707         60.01695   -194.94801   57.23387
    young        1              1.45456          0.21178      1.00962    1.89950
    income       1              9.36550          4.06396      0.82744   17.90356
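
The CLB limits follow directly from the interval formula $b_k \pm t_c\, s(b_k)$. As a rough illustration, the data step below rebuilds the interval for young from the printed estimate and standard error. The variable names are hypothetical, and the limits should match the table up to rounding.

```sas
data _null_;
  b     = 1.45456;             /* estimate for young */
  se    = 0.21178;             /* its standard error */
  dfe   = 18;                  /* error df: n - p */
  tc    = tinv(0.975, dfe);    /* t critical value t(.975, 18) */
  lower = b - tc * se;
  upper = b + tc * se;
  put tc= lower= upper=;
run;
```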

18 What if just young fit?
Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   95% Confidence Limits
    Intercept    1             68.04536          9.46224     48.24066   87.85006
    young        1              1.83588          0.14641      1.52943    2.14233
The CIs for both the intercept and young change dramatically when young is the only explanatory variable
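
The transcript does not show the code behind this output. Presumably it is the earlier CLB call with income dropped from the model statement, along these lines:

```sas
proc reg data=a1;
  model sales=young/clb;   /* income omitted; young is the only predictor */
run;
```

With one predictor, the error degrees of freedom become n − 2 = 19, so the intervals use t(.975, 19) rather than t(.975, 18).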

19 Estimation of $E(Y_h)$
$X_h$ is now a vector of the form $(1, X_{h1}, X_{h2}, \ldots, X_{h,p-1})'$
We want a point estimate and a confidence interval for the subpopulation mean corresponding to the set of explanatory variables $X_h$

20 Theory for $E(Y_h)$
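
The body of this slide is missing from the transcript. The standard results for estimating a mean response in multiple regression (KNNL Chapter 6) are given below as a reference sketch, not as a reconstruction of the slide's exact wording.

```latex
\begin{align*}
\hat{Y}_h &= X_h' b \\
s^2(\hat{Y}_h) &= \text{MSE}\; X_h'(X'X)^{-1}X_h \\
\text{CI: } & \hat{Y}_h \pm t_c\, s(\hat{Y}_h), \qquad t_c = t(.975,\, n-p)
\end{align*}
```

For the Dwaine Studios fit, $n - p = 18$ and $t_c \approx 2.101$, so a case with predicted value 187.1841 and standard error 3.8409 gives roughly $187.18 \pm 2.101 \times 3.84$, or about (179.1, 195.3).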

21 Using CLM option
    proc reg data=a1;
      model sales=young income/clm;
      id young income;
    run;
The id statement adds young and income to the output table

22 CLM Output
Output Statistics
    Obs   young   income   Dependent   Predicted   Std Error        95% CL Mean
                            Variable       Value   Mean Predict
      1    68.5     16.7    174.4000    187.1841      3.8409    179.1146   195.2536
      2    45.2     16.8    164.4000    154.2294      3.5558    146.7591   161.6998
      3    91.3     18.2    244.2000    234.3963      4.5882    224.7569   244.0358
      4    47.8     16.3    154.6000    153.3285      3.2331    146.5361   160.1210
      5    46.9     17.3    181.6000    161.3849      4.4300    152.0778   170.6921
    ...
     21    52.3     16.0    166.5000    157.0644      4.0792    148.4944   165.6344

23 Prediction of $Y_h$
$X_h$ is still a vector of the form $(1, X_{h1}, X_{h2}, \ldots, X_{h,p-1})'$
We want a prediction of $Y_h$ based on a set of predictor values, with an interval that expresses the uncertainty in our prediction

24 Theory for $Y_h$
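
The body of this slide is also missing from the transcript. The standard prediction results (KNNL Chapter 6) are sketched below for reference. The only change from the mean-response case is the extra MSE term, which accounts for the variability of a single new observation around its mean.

```latex
\begin{align*}
\hat{Y}_h &= X_h' b \\
s^2(\text{pred}) &= \text{MSE} + s^2(\hat{Y}_h)
                  = \text{MSE}\left(1 + X_h'(X'X)^{-1}X_h\right) \\
\text{PI: } & \hat{Y}_h \pm t_c\, s(\text{pred}), \qquad t_c = t(.975,\, n-p)
\end{align*}
```

As a check against the example output: with MSE = 121.1626 and $s(\hat{Y}_h) = 3.8409$ for the first case, $s(\text{pred}) = \sqrt{121.1626 + 3.8409^2} \approx 11.66$, and $187.1841 \pm 2.101 \times 11.66$ gives approximately (162.7, 211.7), which agrees with the CLI limits reported for that case.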

25 Using the CLI option
    proc reg data=a1;
      model sales=young income/cli;
      id young income;
    run;
The id statement adds young and income to the output table

26 CLI Output
Output Statistics
    Obs   young   income   Dependent   Predicted   Std Error       95% CL Predict
                            Variable       Value   Mean Predict
      1    68.5     16.7    174.4000    187.1841      3.8409    162.6910   211.6772
      2    45.2     16.8    164.4000    154.2294      3.5558    129.9271   178.5317
      3    91.3     18.2    244.2000    234.3963      4.5882    209.3421   259.4506
    ...
     21    52.3     16.0    166.5000    157.0644      4.0792    132.4018   181.7270
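
The CLM and CLI options report intervals only for cases in the input data set. One common way to get intervals at a new $X_h$ is to append a case with a missing response: PROC REG leaves it out of the fit but still computes its predicted value and limits. The sketch below assumes a hypothetical new city with young = 65.4 and income = 17.6. These values and the data set names new and a2 are illustrative and are not taken from the slides.

```sas
/* Hypothetical new city: predictor values chosen only for illustration */
data new;
  young  = 65.4;
  income = 17.6;
  sales  = .;          /* missing response, so the case is not used in the fit */
run;

/* Append the new case to the original 21 cities */
data a2;
  set a1 new;
run;

proc reg data=a2;
  model sales=young income/clm cli;   /* both interval types in one table */
  id young income;
run;
```

The last row of the Output Statistics table then holds the point estimate, the confidence interval for the mean, and the prediction interval for the new case.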

27 Diagnostics
Look at the distribution of each variable
Look at the relationship between pairs of variables
Plot the residuals versus
–the predicted/fitted values
–each explanatory variable
–time (if available)
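
The first two checks can be done before fitting any model. One possible sketch, assuming the a1 data set read in earlier, is a scatter plot matrix with histograms on the diagonal plus a correlation table. The choice of procedures here is a suggestion, not the one used in the course program.

```sas
/* Pairwise scatter plots, with each variable's histogram on the diagonal */
proc sgscatter data=a1;
  matrix sales young income / diagonal=(histogram);
run;

/* Pairwise correlations among the response and predictors */
proc corr data=a1;
  var sales young income;
run;
```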

28 Diagnostics
Are the residuals approximately Normal?
–Look at a histogram
–Normal quantile plot
Is the variance constant?
–Plot the residuals vs anything that might be related to the variance (e.g., residuals vs predicted values and residuals vs each X)
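
The residual checks listed above can be sketched in SAS by saving the fitted values and residuals and then plotting them. The data set name diag and the variable names pred and resid are assumptions made for this example.

```sas
/* Fit the model and save fitted values and residuals */
proc reg data=a1;
  model sales=young income;
  output out=diag p=pred r=resid;
run;

/* Constant variance: residuals vs fitted values and vs each X */
proc sgplot data=diag;
  scatter x=pred y=resid;
  refline 0 / axis=y;
run;
proc sgplot data=diag;
  scatter x=young y=resid;
  refline 0 / axis=y;
run;
proc sgplot data=diag;
  scatter x=income y=resid;
  refline 0 / axis=y;
run;

/* Normality: histogram and normal quantile plot of the residuals */
proc univariate data=diag;
  var resid;
  histogram resid / normal;
  qqplot resid / normal(mu=est sigma=est);
run;
```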

31 Remedies
Similar remedies as simple regression
Transformations such as Box-Cox
Analyze with/without outliers
More detail in KNNL Ch 9 and 10
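
If a transformation of the response seems warranted, one way to explore the Box-Cox family in SAS is PROC TRANSREG. The sketch below is illustrative only: the lambda grid is an arbitrary choice, and nothing in the slides indicates that the Dwaine Studios data actually need a transformation.

```sas
/* Profile the Box-Cox parameter for sales over a grid of lambda values */
proc transreg data=a1;
  model boxcox(sales / lambda=-2 to 2 by 0.25) = identity(young income);
run;
```

The output lists the log likelihood for each lambda on the grid. A value near 1 suggests leaving the response untransformed.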

32 Background Reading
We finished Chapter 6.
The program used to generate the output for confidence intervals for means and prediction intervals is topic14.sas

