Iowa State University, Department of Animal Science. PROC ROBUSTREG & Evaluating Regression Analyses with the Help of PROC RSQUARE. Animal Science 500.

Iowa State University, Department of Animal Science. PROC ROBUSTREG & Evaluating Regression Analyses with the Help of PROC RSQUARE. Animal Science 500, Lecture No. 10, October 5, 2010

PROC ROBUSTREG
- The purpose of robust regression is to detect outliers and provide stable results in the presence of outliers.
  - To achieve this stability, robust regression limits the influence of outliers.
- Outliers can be classified as:
  - Outliers in the y-direction (response direction)
  - Multivariate outliers in the x-space (i.e., outliers in the covariate space, also referred to as leverage points)
  - Outliers in both the y-direction and the x-space

PROC ROBUSTREG
- Two estimation methods are covered here:
  - M estimation: the method for outlier detection and robust regression when contamination is mainly in the response direction (y).
  - LTS estimation: the method used when data contamination occurs in the x-space.

PROC ROBUSTREG M-estimation
- The following ROBUSTREG statements analyze the data:

    proc robustreg data=stack;
       model y = x1 x2 x3 / diagnostics leverage;
       id x1;
       test x3;
    run;
    quit;

PROC ROBUSTREG M-estimation
For the statements above:
- The procedure does M estimation with the bisquare weight function (the default), and it uses the median method for estimating the scale parameter.
- The MODEL statement specifies the covariate effects.
- The DIAGNOSTICS option requests a table of outlier diagnostics.
- The LEVERAGE option adds leverage-point diagnostic results to this table for continuous covariate effects.
- The ID statement specifies that variable x1 is used to identify each observation in this table. If the ID statement is omitted, the observation number is used to identify the observations (which may be preferable in some cases).
- Tests of significance for covariate effects are obtained with the TEST statement, which lists the variable(s) to be tested.
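The defaults described above can also be written out explicitly. The following is a minimal sketch, not the lecture's own example: because the stack data set is not reproduced in this transcript, it assumes the SASHELP.CLASS sample data set that ships with SAS, and the choice of weight as the response is purely illustrative.

```sas
/* Sketch only: SASHELP.CLASS stands in for the lecture's STACK data. */
/* METHOD=M with WF=BISQUARE and SCALE=MED spells out the defaults    */
/* (bisquare weight function, median-based scale estimate).           */
proc robustreg data=sashelp.class method=m(wf=bisquare scale=med);
   model weight = height age / diagnostics leverage;
   id name;      /* label rows of the diagnostics table by student name */
   test age;     /* robust test of whether the age effect is zero       */
run;
quit;
```

Omitting the METHOD= option should give the same fit, since these are the documented defaults.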

PROC ROBUSTREG example output, M-estimation

The ROBUSTREG Procedure

Model Information
  Data Set                  WORK.STACK
  Dependent Variable        y
  Number of Covariates      3
  Number of Observations    21
  Method                    M Estimation

Summary Statistics (values not shown)
  Columns: Variable, Q1, Median, Q3, Mean, Standard Deviation, MAD
  Rows:    x1, x2, x3, y

PROC ROBUSTREG example output, M-estimation

The ROBUSTREG Procedure

Parameter Estimates (most values not shown)
  Columns: Parameter, DF, Estimate, Standard Error, 95% Confidence Limits, Chi-Square, Pr > ChiSq
  Rows:    Intercept, x1, x2, x3, Scale
  Pr > ChiSq is <.0001 for Intercept and for x1.

PROC ROBUSTREG example output, M-estimation

The ROBUSTREG Procedure

Diagnostics (values not shown)
  Columns: Obs, x1, Mahalanobis Distance, Robust MCD Distance, Leverage, Standardized Robust Residual, Outlier
  Flagged observations are marked with an asterisk (*).

PROC ROBUSTREG example output, M-estimation

Diagnostics Summary (values not shown)
  Columns: Observation Type, Proportion, Cutoff
  Rows:    Outlier, Leverage

PROC ROBUSTREG LTS-estimation
- The following statements invoke the ROBUSTREG procedure with the LTS estimation method:

    proc robustreg data=hbk fwls method=lts;
       model y = x1 x2 x3 / diagnostics leverage;
       id index;
    run;
    quit;

PROC ROBUSTREG output, LTS-estimation

The ROBUSTREG Procedure

Model Information
  Data Set                  WORK.HBK
  Dependent Variable        y
  Number of Covariates      3
  Number of Observations    75
  Method                    LTS Estimation

Summary Statistics (values not shown)
  Columns: Variable, Q1, Median, Q3, Mean, Standard Deviation, MAD
  Rows:    x1, x2, x3, y

PROC ROBUSTREG output, LTS-estimation

The ROBUSTREG Procedure

LTS Profile
  Total Number of Observations        75
  Number of Squares Minimized         57
  Number of Coefficients              4
  Highest Possible Breakdown Value    (value not shown)

PROC ROBUSTREG output, LTS-estimation

The ROBUSTREG Procedure

LTS Parameter Estimates (values not shown)
  Columns: Parameter, DF, Estimate
  Rows:    Intercept, x1, x2, x3, Scale (sLTS), Scale (Wscale)

PROC ROBUSTREG output, LTS-estimation

Diagnostics (values not shown)
  Columns: Obs, index, Mahalanobis Distance, Robust MCD Distance, Leverage, Standardized Robust Residual, Outlier
  Flagged observations are marked with an asterisk (*).

PROC ROBUSTREG output, LTS-estimation

Diagnostics Summary (values not shown)
  Columns: Observation Type, Proportion, Cutoff
  Rows:    Outlier, Leverage

PROC ROBUSTREG output, LTS-estimation

Parameter Estimates for Final Weighted Least Squares Fit (values not shown)
  Columns: Parameter, DF, Estimate, Standard Error, 95% Confidence Limits, Chi-Square, Pr > ChiSq
  Rows:    Intercept, x1, x2, x3, Scale

The final weighted least squares estimates are shown. These estimates are least squares estimates computed after deleting the detected outliers.

PROC RSQUARE
- The RSQUARE procedure selects optimal subsets of independent variables in a multiple regression analysis.
- Regression coefficients and a variety of statistics useful for model selection can be printed or output to a SAS data set.
- In SAS Version 6 and later, the RSQUARE procedure is subsumed by PROC REG.
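Because RSQUARE has been folded into PROC REG, the same kind of all-subsets search is requested in current SAS through the SELECTION=RSQUARE option of the MODEL statement. The following is a minimal sketch under that assumption; the SASHELP.CLASS sample data set and the choice of variables are illustrative and do not come from the lecture.

```sas
/* Sketch only: all-subsets regression via PROC REG.                 */
/* SASHELP.CLASS and the variable names are illustrative stand-ins.  */
proc reg data=sashelp.class;
   /* Fit every subset of the regressors and rank subsets of each    */
   /* size by R-square; ADJRSQ and CP add further selection criteria */
   model weight = height age / selection=rsquare adjrsq cp;
run;
quit;
```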

PROC RSQUARE
- General form:

    PROC RSQUARE options;
       MODEL dependents=independents / options;
       FREQ variable;
       WEIGHT variable;
       BY variables;
    RUN;
    QUIT;

PROC RSQUARE
- There must be one or more MODEL statements.
- The FREQ, WEIGHT, and BY statements can appear only once.
- The MODEL, FREQ, WEIGHT, and BY statements can appear in any order.

PROC RSQUARE Options
- The following options can be specified in the PROC statement:
  - DATA=SASdataset
    - Names the SAS data set to be used.
    - The data set can be an ordinary SAS data set or a TYPE=CORR, COV, or SSCP data set. If the DATA= option is omitted, RSQUARE uses the most recently created SAS data set.
  - SIMPLE | S
    - Prints means and standard deviations for every variable listed in a MODEL statement.

PROC RSQUARE Options
- The following options can be specified in the PROC statement:
  - CORR | C
    - Prints the correlation matrix for all variables in the analysis.
  - NOINT
    - Suppresses the intercept term from all models.
  - NOPRINT
    - Suppresses the regression printout.
  - OUTEST=SASdataset
    - Creates a TYPE=EST data set containing model-selection statistics and parameter estimates for the selected models.

PROC RSQUARE Options
- The options listed in the MODEL Statement section can also be used in the PROC RSQUARE statement.
- Any option specified in the PROC statement applies to every MODEL statement except those in which you specify a different value of the option.
- Optional statistics appear in the OUTEST= data set only if the corresponding options are specified in the PROC statement.

PROC RSQUARE MODEL Statement Options
- MODEL dependents=independents / options;
- Options are listed after the forward slash in the MODEL statement.
  - SELECT=n
    - Specifies the maximum number of subset models of each size to be printed or output to the OUTEST= data set.
    - If SELECT= is used without the B option, the variables in each model are listed in order of inclusion instead of the order in which they appear in the MODEL statement.
    - If SELECT= is omitted and the number of regressors is less than 11, all possible subsets are evaluated.
    - If SELECT= is omitted and the number of regressors is greater than 10, the number of subsets selected is at most equal to the number of regressors. A small value of SELECT= greatly reduces the CPU time required for large problems.

PROC RSQUARE MODEL Statement Options
- MODEL dependents=independents / options;
- Options are listed after the forward slash in the MODEL statement.
  - INCLUDE=i
    - Requests that the first i variables after the equal sign in the MODEL statement be included in every regression model.
    - By default, no variables are required to appear in every model.
  - START=n
    - Specifies the smallest number of regressors to be reported in a subset model. The default is one more than the value specified by the INCLUDE= option, or one if INCLUDE= is omitted.

PROC RSQUARE MODEL Statement Options
- MODEL dependents=independents / options;
- Options are listed after the forward slash in the MODEL statement.
  - STOP=n
    - Specifies the largest number of regressors to be reported in a subset model. The default is the number of regressors listed in the MODEL statement.
  - ADJRSQ
    - Computes R-square adjusted for degrees of freedom for each model selected.
  - CP
    - Computes Mallows' Cp statistic for each model selected.
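The subset-size options can be combined in one MODEL statement. The sketch below uses PROC REG with SELECTION=RSQUARE, where, as far as I know, the option corresponding to SELECT= is named BEST=. The SASHELP.CARS sample data set and the chosen regressors are assumptions for illustration only.

```sas
/* Sketch only: restrict the all-subsets search by model size.        */
/* SASHELP.CARS and these regressors are illustrative stand-ins.      */
proc reg data=sashelp.cars;
   model mpg_city = horsepower weight enginesize length wheelbase
         / selection=rsquare
           include=1      /* horsepower (first regressor) is in every model */
           start=2        /* smallest subset reported has 2 regressors      */
           stop=4         /* largest subset reported has 4 regressors       */
           best=3         /* report at most 3 models of each size           */
           adjrsq cp;     /* add adjusted R-square and Mallows' Cp          */
run;
quit;
```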

PROC RSQUARE MODEL Statement Options
- MODEL dependents=independents / options;
- Options are listed after the forward slash in the MODEL statement.
  - JP
    - Computes Jp, the estimated mean square error of prediction for each model selected, assuming that the values of the regressors are fixed and that the model is correct.
    - The Jp statistic is also called the final prediction error (FPE).
  - MSE
    - Computes the mean square error for each model selected.
  - SSE
    - Computes the error sum of squares for each model selected.
  - B
    - Computes estimated regression coefficients for each model selected.
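For reference, the statistics requested by ADJRSQ, CP, MSE, and JP are commonly defined as follows. The notation is an assumption introduced here, not taken from the slides: $n$ is the number of observations, $p$ is the number of parameters in the subset model including the intercept, $\mathrm{SSE}_p$ is the error sum of squares of the subset model, $R^2_p$ is its R-square, and $s^2$ is the mean square error of the full model containing all regressors.

```latex
\begin{align*}
\mathrm{MSE}_p &= \frac{\mathrm{SSE}_p}{n - p} \\[4pt]
R^2_{\mathrm{adj}} &= 1 - \frac{(n - 1)\,(1 - R^2_p)}{n - p}
  \quad \text{(model with intercept)} \\[4pt]
C_p &= \frac{\mathrm{SSE}_p}{s^2} - (n - 2p) \\[4pt]
J_p &= \frac{n + p}{n}\,\mathrm{MSE}_p
\end{align*}
```

Under these definitions, a subset model with little bias has $C_p$ close to $p$, and smaller values of $J_p$ indicate smaller estimated prediction error.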

PROC RSQUARE FREQ Statement
- FREQ variable;
- The FREQ statement can be used alongside the MODEL statement.
- FREQ treats the data set as if each observation appears n times, where n is the value of the FREQ variable for that observation.
- When the procedure determines the degrees of freedom for significance probabilities, the total number of observations is taken to be the sum of the FREQ variable.

PROC RSQUARE FREQ Statement
- If your data set includes a variable giving the frequency of occurrence of the other values in each observation, name that variable in the FREQ statement.

PROC RSQUARE WEIGHT Statement
- WEIGHT variable;
- The WEIGHT statement can be used alongside the MODEL statement.
- The WEIGHT statement names a variable in the input data set whose values are relative weights for a weighted least-squares fit. If the weight value is proportional to the reciprocal of the variance for each observation, then the weighted estimates are the best linear unbiased estimates (BLUE).
- The WEIGHT and FREQ statements have similar effects, except in the calculation of degrees of freedom.

PROC RSQUARE BY Statement
- BY variables;
- The BY statement can be used with PROC RSQUARE.
- It produces separate analyses for the groups of observations defined by the BY variables.
- When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.
- If the data have not already been sorted in ascending order:
  - Use PROC SORT with a similar BY statement to sort the data,
  - Or, where appropriate, use the NOTSORTED option,
  - Or use the DESCENDING option if the data were previously sorted from largest to smallest value for some other reason.
- Most likely you will need to sort the data.

PROC RSQUARE BY Statement

    PROC SORT DATA=New;
       BY variable1;
    RUN;

    PROC RSQUARE options;
       MODEL dependents=independents / options;
       FREQ variable;
       WEIGHT variable;
       BY variables;
    RUN;
    QUIT;
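The sort-then-analyze pattern can be sketched with concrete names. This example assumes PROC REG with SELECTION=RSQUARE in place of PROC RSQUARE, and it uses the SASHELP.CLASS sample data set with sex as the BY variable; the data set name class_sorted is hypothetical.

```sas
/* Sketch only: BY-group processing requires data sorted by the BY variable. */
/* SASHELP.CLASS is read-only, so the sorted copy is written to WORK.        */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

/* One all-subsets analysis is produced for each value of sex */
proc reg data=work.class_sorted;
   by sex;
   model weight = height age / selection=rsquare adjrsq cp;
run;
quit;
```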

PROC RSQUARE
- The goal in using PROC RSQUARE is to build the best, or most predictive, model.
- Topic of the next lecture: Model Development and Selection of Variables.