Stepwise Regression SAS. Download the data at http://core.ecu.edu/psyc/wuenschk/StatData/StatData.htm.

Download the SAS code at http://core.ecu.edu/psyc/wuenschk/SAS/SAS-Programs.htm

data grades;
  infile 'C:\Users\Vati\Documents\StatData\MultReg.dat';
  input GPA GRE_Q GRE_V MAT AR;
PROC REG;
  a: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2 selection=forward slentry=.05 details;
run;

Forward Selection, Step 1

[Statistics for Entry table, DF = 1,28: Tolerance, Model R-Square, F Value, and Pr > F for GRE_Q, GRE_V, MAT, and AR]

All predictors have p < the slentry value of .05. AR has the lowest p, so AR enters first.
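A way to see what SAS is doing here (a sketch, assuming the grades data set read in above): at step 1, the entry F for each candidate is just the overall F from regressing GPA on that candidate alone.

proc reg data=grades;
  * Entry F for AR at step 1 = overall F of this one-predictor model;
  * Repeat with GRE_Q, GRE_V, and MAT to reproduce the whole entry table;
  model GPA = AR;
run;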

Step 2

[Statistics for Entry table, DF = 1,27: Tolerance, Model R-Square, F Value, and Pr > F for GRE_Q, GRE_V, and MAT]

All remaining predictors have p < the slentry value of .05. GRE_V has the lowest p, so GRE_V enters second.

Step 3

[Statistics for Entry table, DF = 1,26: Tolerance, Model R-Square, F Value, and Pr > F for GRE_Q and MAT]

No remaining predictor has p < .05, so forward selection terminates.

The Final Model

[Parameter Estimates table: Parameter Estimate, Standard Error, t Value, Pr > |t|, Standardized Estimate, and Squared Semipartial Corr Type II for Intercept, GRE_V, and AR]

R² = .516, F(2, 27) = 14.36, p < .001

Backward Selection

  b: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2 selection=backward slstay=.05 details;
run;

We start out with a simultaneous multiple regression including all predictors. Then we trim that model.

Step 1

[Table: Parameter Estimate, Standard Error, Type II SS, F Value, and Pr > F for Intercept, GRE_Q, GRE_V, MAT, and AR]

GRE_V and AR have p values that exceed the slstay value of .05. AR has the larger p, so it is dropped from the model.
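These removal F values are the Type II F tests for each predictor in the current model. A sketch of how to reproduce one by hand with PROC REG's TEST statement (again assuming the grades data set):

proc reg data=grades;
  model GPA = GRE_Q GRE_V MAT AR;
  * The F from this test is AR's Type II (removal) F in step 1;
  AR_removal: test AR = 0;
run;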

Step 2

[Statistics for Removal table, DF = 1,26: Partial R-Square, Model R-Square, F Value, and Pr > F for GRE_Q, GRE_V, and MAT]

Only GRE_V has p > .05, so it is dropped from the model.

Step 3

[Statistics for Removal table, DF = 1,27: Partial R-Square, Model R-Square, F Value, and Pr > F for GRE_Q and MAT]

No remaining predictor has p > .05, so backward elimination halts.

The Final Model

[Parameter Estimates table: Parameter Estimate, Standard Error, t Value, Pr > |t|, Standardized Estimate, and Squared Semipartial Corr Type II for Intercept, GRE_Q, and MAT]

R² = .5183, F(2, 27) = 18.87, p < .001
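To examine the selected model on its own, you can refit it without the SELECTION option (a sketch, using the same options as the earlier models):

proc reg data=grades;
  * Refit of the backward-selected model; estimates match the table above;
  model GPA = GRE_Q MAT / STB SCORR2;
run;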

What the F Test?

Forward selection led to a model with AR and GRE_V. Backward selection led to a model with MAT and GRE_Q. I am getting suspicious about the utility of procedures like this.

Fully Stepwise Selection

  c: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2 selection=stepwise slentry=.08 slstay=.08 details;
run;

Like forward selection, but once added to the model, a predictor is considered for elimination in subsequent steps.

Step 3

Steps 1 and 2 are identical to those of forward selection, but with slentry set to .08, MAT now enters the model.

[Statistics for Entry table, DF = 1,26: Tolerance, Model R-Square, F Value, and Pr > F for GRE_Q and MAT]

Step 4

GRE_Q enters. Now we have every predictor in the model.

[Statistics for Entry table, DF = 1,25: Tolerance, Model R-Square, F Value, and Pr > F for GRE_Q]

Step 5

Once GRE_Q is in the model, AR and GRE_V become eligible for removal.

[Statistics for Removal table, DF = 1,25: Partial R-Square, Model R-Square, F Value, and Pr > F for GRE_Q, GRE_V, MAT, and AR]

Step 6

AR is out; GRE_V is still eligible for removal.

[Statistics for Removal table, DF = 1,26: Partial R-Square, Model R-Square, F Value, and Pr > F for GRE_Q, GRE_V, and MAT]

Step 7

At this point, no variables in the model are eligible for removal, and no variables outside the model are eligible for entry. The final model includes MAT and GRE_Q, the same as the final model from backward selection.

R-Square Selection

  d: MODEL GPA = GRE_Q GRE_V MAT AR / selection=rsquare cp mse;
run;

Test all one-predictor models, all two-predictor models, and so on. The goal is to get the highest R² with fewer than all of the predictors.

One Predictor Models

[Table: Number in Model, R-Square, C(p), MSE, and Variables in Model for the one-predictor models AR, GRE_Q, MAT, and GRE_V]

One Predictor Models

AR yields the highest R², with C(p) = 16.74 and MSE = .229. Mallows says the best model will be one with a small C(p) whose value is near p (the number of parameters in the model). Here p is 2: one predictor and the intercept. Howell suggests one keep adding predictors until MSE starts increasing.
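For reference, the usual formula behind that rule (my addition, not from the original slides), where SSE_p is the error sum of squares of a candidate model with p parameters, MSE_full is the mean squared error of the model with all predictors, and n is the sample size:

$$C_p = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p)$$

A model with little bias should have C(p) close to p.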

Two Predictor Models

[Table: Number in Model, R-Square, C(p), MSE, and Variables in Model for the two-predictor models GRE_Q MAT; GRE_V AR; GRE_Q AR; GRE_V MAT; MAT AR; GRE_Q GRE_V]

Two Predictor Models

Compared to the best one-predictor model, the model with MAT and GRE_Q has:
– considerably higher R²
– considerably lower C(p)
– a value of C(p), 5, close to the value of p, 3
– considerably lower MSE

Three Predictor Models

[Table: Number in Model, R-Square, C(p), MSE, and Variables in Model for the three-predictor models GRE_Q GRE_V MAT; GRE_Q MAT AR; GRE_V MAT AR; GRE_Q GRE_V AR]

Three Predictor Models

Adding GRE_V to the best two-predictor model (GRE_Q and MAT):
– slightly increases R² (from .58 to .62)
– reduces [C(p) – p] from 2 to .6
– reduces MSE from .16 to .15

None of these statistics impresses me much; I am inclined to take the GRE_Q, MAT model as being best.

Closer Look at MAT, GRE_Q, GRE_V

  e: MODEL GPA = GRE_Q GRE_V MAT / STB SCORR2;
run;

[Parameter Estimates table: Parameter Estimate, Standard Error, t Value, Pr > |t|, Standardized Estimate, and Squared Semipartial Corr Type II for Intercept, GRE_Q, GRE_V, and MAT]

Keep GRE_V or Not?

It does not have a significant partial effect in the model, so why keep it? Because it is free info: you get GRE_V and GRE_Q for the same price as GRE_Q alone. Equi donati dentes non inspiciuntur – the teeth of a gift horse are not inspected. (As gift horses age, their gums recede, making them look long in the tooth.)

Add AR?

– R² increases from .617 to .640.
– C(p) = p (always true in the full model).
– MSE drops from .154 to .150.
– Getting AR data is expensive.

Stop gathering the AR data, unless it has some other value.
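Why C(p) = p always holds for the full model (a quick check from the formula given earlier, not on the original slide): for the full model, SSE_p = SSE_full and MSE_full = SSE_full / (n − p), so

$$C_p = \frac{SSE_{\text{full}}}{SSE_{\text{full}}/(n-p)} - (n - 2p) = (n - p) - (n - 2p) = p$$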

Conclusions

Read Stepwise-Voodoo.htm. Treat all claims based on stepwise algorithms as if they were made by Saddam Hussein on a bad day, with a headache, while having a friendly chat with George Bush.