1 Stepwise Regression SAS

2 Download the Data
http://core.ecu.edu/psyc/wuenschk/StatData/StatData.htm

Each row is GPA, GRE_Q, GRE_V, MAT, AR (the order of the SAS INPUT statement):
3.2 625 540 65 2.7
4.1 575 680 75 4.5
3.0 520 480 65 2.5
2.6 545 520 55 3.1
3.7 520 490 75 3.6
4.0 655 535 65 4.3
4.3 630 720 75 4.6
2.7 500 500 75 3.0
and so on
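For readers following along in Python rather than SAS, the file is plain whitespace-delimited text. A minimal pandas sketch (the function name and path argument are ours; the column order follows the SAS INPUT statement on the next slide):

```python
# Illustrative sketch: reading the whitespace-delimited MultReg.dat in Python.
import pandas as pd

COLS = ["GPA", "GRE_Q", "GRE_V", "MAT", "AR"]  # order from the SAS INPUT statement

def read_grades(path):
    """Read the grades data file; no header row, columns separated by whitespace."""
    return pd.read_csv(path, sep=r"\s+", header=None, names=COLS)
```

Each row of the resulting DataFrame then holds one student's GPA and the four predictor scores.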

3 Download the SAS Code
http://core.ecu.edu/psyc/wuenschk/SAS/SAS-Programs.htm

data grades;
  infile 'C:\Users\Vati\Documents\StatData\MultReg.dat';
  input GPA GRE_Q GRE_V MAT AR;
PROC REG;
  a: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2
     selection=forward slentry=.05 details;
run;

4 Forward Selection, Step 1

Statistics for Entry, DF = 1,28
Variable  Tolerance  Model R-Square  F Value  Pr > F
GRE_Q     1.000000   0.3735          16.69    0.0003
GRE_V     1.000000   0.3381          14.30    0.0008
MAT       1.000000   0.3651          16.10    0.0004
AR        1.000000   0.3853          17.55    0.0003

All predictors have p < the slentry value of .05. AR has the lowest p, so AR enters first.

5 Step 2

Statistics for Entry, DF = 1,27
Variable  Tolerance  Model R-Square  F Value  Pr > F
GRE_Q     0.742099   0.5033          6.41     0.0174
GRE_V     0.835714   0.5155          7.26     0.0120
MAT       0.724599   0.4923          5.69     0.0243

All predictors have p < the slentry value of .05. GRE_V has the lowest p, so GRE_V enters second.

6 Step 3

Statistics for Entry, DF = 1,26
Variable  Tolerance  Model R-Square  F Value  Pr > F
GRE_Q     0.659821   0.5716          3.41     0.0764
MAT       0.670304   0.5719          3.42     0.0756

No predictor has p < .05, so forward selection terminates.
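The entry F statistics SAS reports in these steps can be reproduced from the tabled model R-squares alone. A small sketch (the function name is ours; the R-square values come from the slides above):

```python
# Partial F for adding one predictor, computed from model R-squares:
#   F = (R2_full - R2_reduced) / ((1 - R2_full) / df_error)
def entry_F(r2_full, r2_reduced, df_error):
    return (r2_full - r2_reduced) / ((1.0 - r2_full) / df_error)

# Step 1: AR enters an empty model (reduced R-square = 0), DF = 1,28.
print(round(entry_F(0.3853, 0.0, 28), 2))    # 17.55, matching the slide

# Step 3: adding GRE_Q to the AR + GRE_V model (R-square .5155), DF = 1,26.
# With these rounded R-squares the result is about 3.40; SAS's 3.41 uses
# unrounded values.
print(round(entry_F(0.5716, 0.5155, 26), 2))
```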

7 The Final Model

Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Standardized Estimate  Squared Semipartial Corr (Type II)
Intercept  1   0.49718             0.57652         0.86     0.3961    0                      .
GRE_V      1   0.00285             0.00106         2.69     0.0120    0.39470                0.13020
AR         1   0.32963             0.10483         3.14     0.0040    0.46074                0.17740

R² = .516, F(2, 27) = 14.36, p < .001
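As a sanity check, the overall F follows from R² alone. The .516 shown is rounded; the two-predictor R² of .5155 from Step 2 reproduces the tabled F exactly:

```python
# Overall model F: F = (R2/k) / ((1 - R2)/df_error), here k = 2, df_error = 27.
r2, k, df_error = 0.5155, 2, 27
F = (r2 / k) / ((1.0 - r2) / df_error)
print(round(F, 2))  # 14.36, as reported on the slide
```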

8 Backward Selection

b: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2
   selection=backward slstay=.05 details;
run;

We start out with a simultaneous multiple regression, including all predictors. Then we trim that model.

9 Step 1

Variable   Parameter Estimate  Standard Error  Type II SS  F Value  Pr > F
Intercept  -1.73811            0.95074         0.50153     3.34     0.0795
GRE_Q      0.00400             0.00183         0.71582     4.77     0.0385
GRE_V      0.00152             0.00105         0.31588     2.11     0.1593
MAT        0.02090             0.00955         0.71861     4.79     0.0382
AR         0.14423             0.11300         0.24448     1.63     0.2135

GRE_V and AR have p values that exceed the slstay value of .05. AR has the larger p, so it is dropped from the model.

10 Step 2

Statistics for Removal, DF = 1,26
Variable  Partial R-Square  Model R-Square  F Value  Pr > F
GRE_Q     0.1236            0.4935          8.39     0.0076
GRE_V     0.0340            0.5830          2.31     0.1405
MAT       0.1318            0.4852          8.95     0.0060

Only GRE_V has p > .05, so it is dropped from the model.
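The removal F statistics work the same way in reverse. In any row of the table, the current model's R² is the partial R-square plus the after-removal model R-square (here .0340 + .5830 = .6170). A small sketch (function name ours):

```python
# Removal F for dropping one predictor from the current model:
#   F = partial_R2 / ((1 - R2_current) / df_error)
def removal_F(partial_r2, r2_current, df_error):
    return partial_r2 / ((1.0 - r2_current) / df_error)

# Dropping GRE_V from the GRE_Q + GRE_V + MAT model (R2 = .6170), DF = 1,26.
print(round(removal_F(0.0340, 0.6170, 26), 2))  # 2.31, matching the slide
```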

11 Step 3

Statistics for Removal, DF = 1,27
Variable  Partial R-Square  Model R-Square  F Value  Pr > F
GRE_Q     0.2179            0.3651          14.11    0.0008
MAT       0.2095            0.3735          13.56    0.0010

No predictor has p > .05, so backward elimination halts.

12 The Final Model

Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Standardized Estimate  Squared Semipartial Corr (Type II)
Intercept  1   -2.12938            0.92704         -2.30    0.0296    0                      .
GRE_Q      1   0.00598             0.00159         3.76     0.0008    0.48438                0.21791
MAT        1   0.03081             0.00836         3.68     0.0010    0.47494                0.20950

R² = .5183, F(2, 27) = 18.87, p < .001

13 What the F Test?

Forward selection led to a model with AR and GRE_V. Backward selection led to a model with MAT and GRE_Q. I am getting suspicious about the utility of procedures like this.

14 Fully Stepwise Selection

c: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2
   selection=stepwise slentry=.08 slstay=.08 details;
run;

Like forward selection, but, once added to the model, a predictor is considered for elimination in subsequent steps.
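As an illustration of the algorithm (not of PROC REG itself), here is a hedged numpy sketch of fully stepwise selection. For simplicity it uses fixed F-to-enter/F-to-remove cutoffs rather than SAS's slentry/slstay p-value cutoffs; function names and the cutoff value are ours:

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    Xi = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    r = y - Xi @ beta
    return float(r @ r)

def stepwise(X, y, f_in=4.0, f_out=4.0):
    """Fully stepwise selection with F-to-enter / F-to-remove thresholds."""
    n, k = X.shape
    model = []
    for _ in range(4 * k + 4):              # safety cap against entry/removal cycling
        changed = False
        # Entry: add the best candidate whose partial F exceeds f_in.
        candidates = [j for j in range(k) if j not in model]
        if candidates:
            rss_cur = rss(X, y, model)
            best, best_F = None, f_in
            for j in candidates:
                rss_new = rss(X, y, model + [j])
                df_err = n - (len(model) + 2)   # intercept + current set + new predictor
                F = (rss_cur - rss_new) / (rss_new / df_err)
                if F > best_F:
                    best, best_F = j, F
            if best is not None:
                model.append(best)
                changed = True
        # Removal: drop the weakest predictor if its partial F falls below f_out.
        if len(model) > 1:
            rss_full = rss(X, y, model)
            df_err = n - (len(model) + 1)
            worst, worst_F = None, f_out
            for j in model:
                reduced = [m for m in model if m != j]
                F = (rss(X, y, reduced) - rss_full) / (rss_full / df_err)
                if F < worst_F:
                    worst, worst_F = j, F
            if worst is not None:
                model.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(model)
```

On data with a strong true signal, the loop settles on the signal predictors quickly; the cap simply guards against the entry/removal cycling that stepwise procedures can exhibit in pathological cases.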

15 Step 3

Steps 1 and 2 are identical to those of forward selection, but with slentry set to .08, MAT enters the model.

Statistics for Entry, DF = 1,26
Variable  Tolerance  Model R-Square  F Value  Pr > F
GRE_Q     0.659821   0.5716          3.41     0.0764
MAT       0.670304   0.5719          3.42     0.0756

16 Step 4

GRE_Q enters. Now we have every predictor in the model.

Statistics for Entry, DF = 1,25
Variable  Tolerance  Model R-Square  F Value  Pr > F
GRE_Q     0.653236   0.6405          4.77     0.0385

17 Step 5

Once GRE_Q is in the model, AR and GRE_V become eligible for removal. AR, with the larger p, goes first.

Statistics for Removal, DF = 1,25
Variable  Partial R-Square  Model R-Square  F Value  Pr > F
GRE_Q     0.0686            0.5719          4.77     0.0385
GRE_V     0.0303            0.6102          2.11     0.1593
MAT       0.0689            0.5716          4.79     0.0382
AR        0.0234            0.6170          1.63     0.2135

18 Step 6

AR is out; GRE_V is still eligible for removal, its p (.1405) exceeding slstay.

Statistics for Removal, DF = 1,26
Variable  Partial R-Square  Model R-Square  F Value  Pr > F
GRE_Q     0.1236            0.4935          8.39     0.0076
GRE_V     0.0340            0.5830          2.31     0.1405
MAT       0.1318            0.4852          8.95     0.0060

19 Step 7

At this point, no variable in the model is eligible for removal, and no variable outside the model is eligible for entry, so the procedure stops. The final model includes MAT and GRE_Q, the same as the final model from backward selection.

20 R-Square Selection

d: MODEL GPA = GRE_Q GRE_V MAT AR / selection=rsquare cp mse;
run;

Test all one-predictor models, all two-predictor models, and so on. The goal is to get the highest R² with fewer than all of the predictors.

21 One-Predictor Models

Number in Model  R-Square  C(p)     MSE      Variables in Model
1                0.3853    16.7442  0.22908  AR
1                0.3735    17.5642  0.23348  GRE_Q
1                0.3651    18.1490  0.23661  MAT
1                0.3381    20.0268  0.24667  GRE_V

22 One-Predictor Models

AR yields the highest R²: C(p) = 16.74, MSE = .229. Mallows says the best model will be the one with a small C(p) whose value is near p, the number of parameters in the model. Here p is 2: one predictor plus the intercept. Howell suggests that one keep adding predictors until MSE starts increasing.
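The statistics behind this selection can be sketched directly: for a subset with p parameters, Cp = SSE_p / MSE_full − (n − 2p), where MSE_full comes from the model with all predictors. A minimal numpy illustration over all subsets (function names ours); it also exhibits the "C(p) = p in the full model" identity used on a later slide:

```python
import numpy as np
from itertools import combinations

def sse(X, y, cols):
    """Error sum of squares of an OLS fit (with intercept) on the given columns."""
    Xi = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    r = y - Xi @ beta
    return float(r @ r)

def all_subsets_cp(X, y):
    """R-square and Mallows Cp for every non-empty predictor subset."""
    n, k = X.shape
    mse_full = sse(X, y, range(k)) / (n - k - 1)
    sst = float(((y - y.mean()) ** 2).sum())
    out = []
    for m in range(1, k + 1):
        for cols in combinations(range(k), m):
            e = sse(X, y, cols)
            p = m + 1                        # parameters: predictors + intercept
            cp = e / mse_full - (n - 2 * p)
            out.append((cols, 1.0 - e / sst, cp))
    return out
```

In the full model, SSE/MSE_full equals its error degrees of freedom, so Cp reduces to exactly p; that makes a handy check of the formula.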

23 Two-Predictor Models

Number in Model  R-Square  C(p)     MSE      Variables in Model
2                0.5830    4.9963   0.16116  GRE_Q MAT
2                0.5155    9.6908   0.18725  GRE_V AR
2                0.5033    10.5388  0.19196  GRE_Q AR
2                0.4935    11.2215  0.19575  GRE_V MAT
2                0.4923    11.3019  0.19620  MAT AR
2                0.4852    11.7943  0.19894  GRE_Q GRE_V

24 Two-Predictor Models

Compared to the best one-predictor model, the model with MAT and GRE_Q has
– considerably higher R²
– considerably lower C(p)
– a value of C(p), 5, close to the value of p, 3
– considerably lower MSE

25 Three-Predictor Models

Number in Model  R-Square  C(p)    MSE      Variables in Model
3                0.6170    4.6292  0.15369  GRE_Q GRE_V MAT
3                0.6102    5.1050  0.15644  GRE_Q MAT AR
3                0.5719    7.7702  0.17182  GRE_V MAT AR
3                0.5716    7.7888  0.17193  GRE_Q GRE_V AR

26 Three-Predictor Models

Adding GRE_V to the best two-predictor model (GRE_Q and MAT)
– slightly increases R² (from .58 to .62)
– reduces [C(p) – p] from 2 to .6
– reduces MSE from .16 to .15
None of these statistics impresses me much; I am inclined to take the GRE_Q, MAT model as being best.

27 Closer Look at MAT, GRE_Q, GRE_V

e: MODEL GPA = GRE_Q GRE_V MAT / STB SCORR2;
run;

Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Standardized Estimate  Squared Semipartial Corr (Type II)
Intercept  1   -2.14877            0.90541         -2.37    0.0253    0                      .
GRE_Q      1   0.00493             0.00170         2.90     0.0076    0.39922                0.12357
GRE_V      1   0.00161             0.00106         1.52     0.1405    0.22317                0.03404
MAT        1   0.02612             0.00873         2.99     0.0060    0.40267                0.13180
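Note that each predictor's t statistic here squares to the corresponding removal F reported earlier, since both test the same single-df partial effect. A one-line check for GRE_V:

```python
# For a single-df effect, t squared equals the partial (Type II) F.
t_gre_v = 1.52                 # from the parameter-estimates table above
print(round(t_gre_v ** 2, 2))  # 2.31, the removal F for GRE_V in backward Step 2
```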

28 Keep GRE_V or Not?

It does not have a significant partial effect in the model, so why keep it? Because it is free information: you get GRE_V and GRE_Q for the same price as GRE_Q alone. Equi donati dentes non inspiciuntur ("the teeth of a gift horse are not inspected").
– As (gift) horses age, their gums recede, making them look long in the tooth.

29 Add AR?

– R² increases from .617 to .640
– C(p) = p (always true in the full model)
– MSE drops from .154 to .150
But gathering the AR data is expensive. Stop gathering the AR data, unless it has some other value.

30 Conclusions

Read http://core.ecu.edu/psyc/wuenschk/StatHelp/Stepwise-Voodoo.htm

Treat all claims based on stepwise algorithms as if they were made by Saddam Hussein on a bad day with a headache, having a friendly chat with George Bush.

