Introduction to SAS Essentials Mastering SAS for Data Analytics

Alan Elliott and Wayne Woodward SAS ESSENTIALS -- Elliott & Woodward

Chapter 14: ANALYSIS OF VARIANCE Part 2

LEARNING OBJECTIVES
• To be able to use SAS® procedures to perform analysis of covariance
• To be able to use SAS procedures to perform two-factor ANOVA using PROC GLM
• To be able to perform two-factor ANOVA using PROC MIXED
• To be able to use SAS procedures to perform repeated measures with a grouping factor

14.1 ANALYSIS OF COVARIANCE
An analysis of covariance (ANCOVA) is a combination of analysis of variance and regression. The covariate is a quantitative variable that is related to the dependent variable. The covariate is not controlled by the investigator but is some value intrinsic to the subject (or entity). In ANCOVA, the group means are adjusted by the covariate, and these adjusted means are compared with each other. Including this covariate variable in the model may explain some of the variability, resulting in a more powerful statistical test.

An Example ANCOVA Analysis
A fifth-grade math teacher randomly assigns the 18 students in her class to three different teaching methods for individualized instruction of a certain concept. The outcome of interest is the score on an exam covering the concept after the instruction period (FINAL). An exam of basic math skills (EXAM) is given to the students before beginning the instruction; this is the covariate. The variable FINAL is compared across the three teaching methods. The covariate EXAM is assumed to be linearly related to FINAL. The ANCOVA tests the null hypothesis that the FINAL means for the three methods, adjusted by EXAM, are not different.

Data for the Example
METHOD  EXAM  FINAL
1       16    38
        17    39
        15    41
        23    47
        12    33
        13    37
2       22 10 30 19 45 Etc…
Notice METHOD is the teaching method. EXAM is the exam given at the start of the class (the covariate), and FINAL is the final test outcome.

The ANCOVA Steps
In this text, this ANCOVA analysis is shown as a multistep approach.
1. Perform a test to determine if the FINAL-by-EXAM linear relationships by METHOD are parallel. That is, when you compare the regression lines for each of the three METHODs, the lines should theoretically be parallel. If the lines are sufficiently nonparallel, ANCOVA is not the appropriate analysis to perform. The statistical test used for parallelism is an F-test.
2. If the F-test in step 1 does not reject parallelism, then another F-test is used to compare the adjusted means across METHOD.
3. If there are differences in means, then appropriate multiple comparisons can be performed to determine which groups differ.
Do the Hands On Example p 329 (AGLM ANCOVA.SAS)

Steps for ANCOVA Analysis
Part 1: this model includes an interaction term (EXAM*METHOD) that tests for parallelism.

    PROC GLM;
      CLASS METHOD;
      MODEL FINAL=EXAM METHOD EXAM*METHOD;
      TITLE 'Analysis of Covariance Example';
    RUN;

Part 2: if the interaction term is not significant, drop it from the model. PROC GLM accepts one MODEL statement per invocation, so the reduced model is fit in a second PROC GLM step.

    PROC GLM;
      CLASS METHOD;
      MODEL FINAL=EXAM METHOD;
      LSMEANS METHOD/STDERR PDIFF ADJUST=SCHEFFE;
    RUN;
    QUIT;

The LSMEANS statement performs a post hoc test to compare the (adjusted) METHOD means (used if the H0 of equal means is rejected).

Step 1: Preliminary Results
The test for interaction is non-significant (which is good!). The test for interaction asks the question: are these lines close to parallel? When you do not reject H0 for EXAM*METHOD (when p>0.05), you conclude that the lines are close to parallel, so you can consider them parallel in the primary analysis.
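A plot of the fitted lines can complement the interaction F-test. The following is a minimal sketch, not the book's code: it assumes the example data are stored in a data set named ANCOVA (a hypothetical name, since the slides do not name the data set) with the variables METHOD, EXAM, and FINAL described earlier.

```sas
/* Hypothetical data set name ANCOVA. The variables METHOD, EXAM and
   FINAL are the ones used in the slides. */
PROC SGPLOT DATA=ANCOVA;
  /* One least-squares line of FINAL on EXAM for each teaching METHOD */
  REG X=EXAM Y=FINAL / GROUP=METHOD;
  TITLE 'FINAL by EXAM with a fitted line for each METHOD';
RUN;
```

If the three lines look roughly parallel, that agrees with a non-significant EXAM*METHOD term; the F-test remains the formal check.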

Step 2: Analysis of Covariance Analysis
With the interaction out of the model, you now test for a significant METHOD effect. In this case p<0.05, so you conclude that the (adjusted) means by METHOD are different. By assuming that the lines are parallel, the ANCOVA treats the regression lines for the three METHODs as sharing a common slope and differing only in their intercepts.

Post Hoc Comparisons
Because you conclude that the (adjusted) means by METHOD are different, you perform a post hoc multiple comparison test to determine which means are different. These results come from the statement:

    LSMEANS METHOD/STDERR PDIFF ADJUST=SCHEFFE;

From this, conclude that means 1 and 3 are different (p=0.0056), means 2 and 3 are different (p=0.0007), and means 1 and 2 are not different (p=0.684).

Overall Conclusion
Because a high FINAL score is the goal, there is evidence to support the contention that the adjusted mean for METHOD 3 is significantly higher than the adjusted means for METHODs 1 and 2, and thus that METHOD 3 is the preferred method.

Two-Factor ANOVA Using PROC GLM
A two-way ANOVA is an analysis that allows you to simultaneously evaluate the effects of two experimental variables (factors). Each factor is a "grouping" variable such as type of treatment, gender, brand, and so on. The two-way ANOVA tests determine whether the factors are important (significant) either separately (called main effects) or in combination (via an interaction), which is the combined effect of the two factors.

p x q factorial design
The dimensions of a factorial design depend on how many levels of each factor are used. For example, a design in which the first factor has two categories and the second has three categories is called a 2 x 3 (2 by 3) factorial design. Generally, the two-way ANOVA is called a p x q factorial design.

Understanding Fixed and Random Factors
FIXED FACTOR: When the factor levels completely define all the classifications of interest. (Example: gender)
RANDOM FACTOR: When the levels used in the experiment are randomly selected from the population of possible levels. (Example: three different doses of a drug, when there are many possible doses)
Two-way ANOVA models are classified as:
• Both factors fixed: Model I ANOVA
• Both factors random: Model II ANOVA
• One random, one fixed: Model III ANOVA

Interaction Effect
Interaction implies that the pattern of means across groups is inconsistent. If there is an interaction effect, main effects cannot be examined directly because the interaction effect shows that differences across one main effect are not consistent across all levels of the other factor.
[Figure: two panels of group means, labeled "Interaction" and "No Interaction"]

Typical Hypotheses for Model I Design
Test for an interaction effect:
H0: There is no interaction effect.
Ha: There is an interaction effect.
If there is no interaction effect, it makes sense to test hypotheses about main effects. The "main effects" hypotheses for factor A are as follows:
H0: Means are equal across levels of A summed over B.
Ha: Means are not equal across levels of A summed over B.
A similar hypothesis is used for B main effects.
Do Hands on Example p 335 (AGLM 2FACTOR.SAS)

Code for a Two-Factor ANOVA
The MODEL statement defines the two-way analysis with the two factors CONDITION and STATUS and an interaction term, CONDITION*STATUS:

    PROC GLM;
      CLASS CONDITION STATUS;
      MODEL RESPONSE=CONDITION STATUS CONDITION*STATUS;
      MEANS CONDITION STATUS;
    RUN;
    QUIT;

The MEANS statement displays simple statistics by factor.

Results for a Two-Way ANOVA
Results are in the TYPE III SS table output. Typically, you first check to see if there is a significant interaction effect. In this case p=0.79, so you conclude NO significant interaction effect. If there is no interaction, examine the main effects tests in the ANOVA table by comparing marginal means.

Examine Marginal Effects (since no interaction)
Marginal effects tests indicate no CONDITION effect (p=0.67) and a statistically significant STATUS effect (p=0.0056). You conclude that there is no difference in means across CONDITION, but there is a difference in means across STATUS.

Graphical Results for Two-Way ANOVA
Because there are only two categories of STATUS, we do not need to perform multiple comparisons. The clear conclusion is that those in STATUS1 (Mean loss = -6.9), i.e., those who had previously been trying to lose weight, had significantly more weight loss on average than those in STATUS2 (Mean loss= ).
[Figure: Mean loss for each level of STATUS]

14.2 GOING DEEPER: TWO-FACTOR ANOVA USING PROC MIXED
PROC MIXED performs mixed-model analysis of variance using features not available in PROC GLM.
PROC GLM
• Is designed primarily for fixed effects models.
• Calculates based on ordinary least squares and method of moments.
• Defines all effects as fixed and adjusts for random effects after estimation.
• Can perform mixed model analysis, but some results are not optimally calculated.
PROC MIXED
• Is designed for mixed effects models.
• Uses generalized least squares for fixed effects and restricted maximum likelihood (REML) to estimate variance components.
• Allows selection of correlation models.
Do Hands on Example p 338 (AMIXED1.SAS)

Code for PROC MIXED ANOVA
PROC MEANS displays simple statistics by factor; PROC MIXED fits the model.

    PROC MEANS MAXDEC=2 MEAN DATA=HEIGHTS;
      CLASS FAMILY GENDER;
      VAR HEIGHT;
    RUN;
    PROC MIXED;
      CLASS FAMILY GENDER;
      MODEL HEIGHT = GENDER;
      RANDOM FAMILY FAMILY*GENDER;
    RUN;

The CLASS statement names the categorical factors. Note how the RANDOM statement indicates which factors are random. (Any interaction with a random factor is random.)

How the PROC MIXED statement is constructed
    CLASS FAMILY GENDER;
    MODEL HEIGHT = GENDER;
    RANDOM FAMILY FAMILY*GENDER;

• FAMILY and GENDER appear in the CLASS statement because they are both grouping-type factors.
• The MODEL statement includes only the fixed factor GENDER.
• The RANDOM statement includes the random factors FAMILY and FAMILY*GENDER.

Output for Analysis (Means)
The first output is the table of means by factor.

Output Testing the Fixed Factor
This shows the results of a hypothesis test that there is no difference in means by GENDER, which is marginally non-significant (p = 0.07). SAS reports no test involving FAMILY.

What If RANDOM Had Not Been Properly Assigned
Consider the model statement

    MODEL HEIGHT = GENDER FAMILY GENDER*FAMILY;

which treats every effect as fixed. This model results in a DIFFERENT ANSWER! In this case GENDER is significant.
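To see the contrast directly, the two specifications can be run back to back. This is a sketch assembled from the statements shown on the slides. It assumes the HEIGHTS data set named in the PROC MEANS step, and it omits the RANDOM statement in the second step on the assumption that this is what an all-fixed analysis means here.

```sas
/* Mixed model from the slides: GENDER fixed,
   FAMILY and FAMILY*GENDER random */
PROC MIXED DATA=HEIGHTS;
  CLASS FAMILY GENDER;
  MODEL HEIGHT = GENDER;
  RANDOM FAMILY FAMILY*GENDER;
RUN;

/* For contrast only: every effect placed in the MODEL statement,
   so all are treated as fixed (assumed: no RANDOM statement) */
PROC MIXED DATA=HEIGHTS;
  CLASS FAMILY GENDER;
  MODEL HEIGHT = GENDER FAMILY GENDER*FAMILY;
RUN;
```

Comparing the "Type 3 Tests of Fixed Effects" tables from the two runs shows how the test for GENDER changes with the specification.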

14.3 GOING DEEPER: REPEATED MEASURES WITH A GROUPING FACTOR
A common design in medical research and other settings is to observe data for the same subject under different circumstances or over time (longitudinal). Although it is also possible to perform this analysis using PROC GLM, the approach used in PROC MIXED is preferred. In addition, a distinct advantage of using PROC MIXED is that it allows missing values in the design, whereas PROC GLM deletes an entire record from analysis if there is one missing observation. Do Hands On Example p 341 (AREPEAT1.SAS)

Data for Repeated Measures Analysis
Suppose you observe a response to a drug over 4 hours for seven subjects, three male and four female. In this case, the repeated measure, HOUR, is longitudinal.
SUB GENDER HOUR1 HOUR2 HOUR3 HOUR4
1 M 1.5 6 5.1 2 4 2.2 6.1 5.2 3 4.1 6.8 3.2 F 3.3 4.2 4.8 5 6.3 4.9 6.9 8.2 5.9 9.5 9.1 7 8.3 Missing 9.2

Create a Revised data set
The data must be manipulated to put it in the right format for the analysis.

    DATA REPMIXED(KEEP= SUBJECT GENDER TIME OUTCOME);
      INPUT SUBJECT GENDER $ HOUR1-HOUR4;
      OUTCOME = HOUR1; TIME = 1; OUTPUT;
      OUTCOME = HOUR2; TIME = 2; OUTPUT;
      OUTCOME = HOUR3; TIME = 3; OUTPUT;
      OUTCOME = HOUR4; TIME = 4; OUTPUT;
    DATALINES;
    1 M
    2 M
    3 M
    Etc…

The DATA step does the following:
(a) assigns each of the four HOUR values to the variable OUTCOME
(b) creates a variable called TIME that contains the time marker (1-4)
(c) for each of these assignments, outputs the variables to the new data set REPMIXED using the OUTPUT statement.
In this case the OUTCOME and TIME variables are created and output to a data set named REPMIXED, with one row per subject per hour.
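The same wide-to-long restructuring can be written more compactly with an ARRAY and a DO loop. This is a sketch under an assumption: the one-row-per-subject data already exist in a data set called WIDE (a hypothetical name) containing SUBJECT, GENDER, and HOUR1-HOUR4.

```sas
/* WIDE is a hypothetical data set with one row per subject:
   SUBJECT, GENDER, HOUR1-HOUR4 */
DATA REPMIXED(KEEP=SUBJECT GENDER TIME OUTCOME);
  SET WIDE;
  ARRAY HR{4} HOUR1-HOUR4;   /* refer to the four HOUR variables by index */
  DO TIME = 1 TO 4;
    OUTCOME = HR{TIME};      /* value observed at this hour */
    OUTPUT;                  /* write one row per subject per hour */
  END;
RUN;
```

Each input row produces four output rows, matching what the four explicit OUTPUT statements do. A missing HOUR value simply becomes a row with a missing OUTCOME.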

The revised data set
[Output: partial listing of the revised data set REPMIXED]

Basic Model for this Analysis
    PROC MIXED DATA=REPMIXED;
      CLASS GENDER TIME SUBJECT;
      MODEL OUTCOME=GENDER TIME GENDER*TIME;
      REPEATED / TYPE=UN SUB=SUBJECT;
    RUN;

All factors are FIXED in this analysis. The REPEATED statement tells SAS that the measurements are repeated within SUBJECT (there are four observations per subject). The TYPE=UN option tells SAS to use an unstructured covariance matrix.

Covariance Structures
The best fitting model for an analysis may depend on how the variance components are structured, that is, on which structure best explains the variance in the model. Some covariance structures include:
• AR(1) (Autoregressive) assumes that the correlation between measurements declines exponentially as the time between them increases. That is, measurements at TIME1 and TIME2 are more highly correlated than are measurements at TIME1 and TIME3.
• CS (Compound symmetry) assumes homogeneous variances and a correlation that is constant regardless of how far apart the measurements are.
• UN (Unstructured) allows all variances and covariances to be different.
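Written out for four repeated measures per subject, the three structures listed above have the following form. This is an illustrative sketch in correlation form: the symbols \sigma^2, \rho, and \sigma_{ij} are generic notation chosen here, and SAS may label the estimated parameters differently in its output.

```latex
% Requires amsmath for pmatrix. Four time points per subject.
\[
R_{\mathrm{CS}} = \sigma^2
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix},
\qquad
R_{\mathrm{AR(1)}} = \sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1
\end{pmatrix}
\]
\[
R_{\mathrm{UN}} =
\begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\
\sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} \\
\sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2
\end{pmatrix}
\]
```

CS and AR(1) each use two parameters, while UN uses ten for four time points, which is why fit criteria that penalize the number of parameters can favor the simpler structures.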

What is the best Covariance Structure?
Look at several possible models, each with a different TYPE= selected:

    PROC MIXED DATA=REPMIXED;
      CLASS GENDER TIME SUBJECT;
      MODEL OUTCOME=GENDER TIME GENDER*TIME;
      REPEATED / TYPE=UN SUB=SUBJECT;
    RUN;

Repeat the analysis, replacing the REPEATED statement with each of the following in turn:

    REPEATED / TYPE=CS SUB=SUBJECT;
    REPEATED / TYPE=AR(1) SUB=SUBJECT;
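Running the three models by hand works, but the fit statistics can also be collected in one place for comparison. The sketch below is one possible way to do that and is not from the book: the macro name FITCOV and the data set names FIT_UN, FIT_CS, FIT_AR1, and FITALL are hypothetical, and it relies on the PROC MIXED ODS table FitStatistics, which holds a description column (Descr) and a value column (Value).

```sas
/* Hypothetical helper: fit the repeated measures model with a given
   covariance TYPE and save the Fit Statistics table in data set OUT */
%MACRO FITCOV(TYPE, OUT);
  ODS OUTPUT FitStatistics=&OUT;
  PROC MIXED DATA=REPMIXED;
    CLASS GENDER TIME SUBJECT;
    MODEL OUTCOME=GENDER TIME GENDER*TIME;
    REPEATED / TYPE=&TYPE SUB=SUBJECT;
  RUN;
%MEND FITCOV;

%FITCOV(UN, FIT_UN)
%FITCOV(CS, FIT_CS)
%FITCOV(AR(1), FIT_AR1)

/* Stack the three tables and label each row with its structure */
DATA FITALL;
  LENGTH STRUCTURE $8;
  SET FIT_UN(IN=A) FIT_CS(IN=B) FIT_AR1(IN=C);
  IF A THEN STRUCTURE='UN';
  ELSE IF B THEN STRUCTURE='CS';
  ELSE STRUCTURE='AR(1)';
RUN;

/* Rows whose description starts with AIC (this also lists AICC) */
PROC PRINT DATA=FITALL;
  WHERE DESCR =: 'AIC';
RUN;
```

The printed rows give the AIC for each structure side by side; smaller values indicate a better fit.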

Results from Multiple Models
In this example, based on the AIC criterion, the AR(1) specification best fits the data. (Smaller is better.)
• AIC(unstructured) = 71.0
• AIC(compound symmetry) = 77.6
• AIC(autoregressive) = 70.2 (best result)
AIC is found in the Fit Statistics table of the output.
[Output: Fit Statistics table from the TYPE=AR(1) analysis]

Choose which Analysis to Use
The results from the TYPE=AR(1) analysis are used as the final model, since it had the best fit. Conclude that there is no interaction effect, that there is a TIME effect, and no GENDER effect (or a marginal GENDER effect.)

Visual Examination of the Results
Use PROC GPLOT to display the means by factor (using the AREPEAT2 code). First output the means to plot using PROC MEANS; then, using the means saved in the data set FORPLOT, plot the results using PROC GPLOT.

    PROC SORT DATA=REPMIXED; BY GENDER TIME;
    PROC MEANS NOPRINT; BY GENDER TIME;
      OUTPUT OUT=FORPLOT MEAN=;
    RUN;
    TITLE "Repeated Measures Example";
    PROC GPLOT;
      PLOT OUTCOME*GENDER=TIME;
      SYMBOL1 V=CIRCLE I=JOIN L=1 C=BLACK;
      SYMBOL2 V=DOT I=JOIN L=2 C=BLUE;
      SYMBOL3 V=STAR I=JOIN L=2 C=RED;
      SYMBOL4 V=SQUARE I=JOIN L=2 C=GREEN;
    RUN;
    QUIT;

Results of PROC GPLOT
Visual confirmation that there is a TIME difference: at least it appears that TIME2 (blue) is always lower than TIME3 (red).

A Second Plot
Use PROC PLOT to produce a second plot showing the same data in a different perspective…
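The transcript does not include the code for the second plot. One plausible "different perspective" is to put TIME on the horizontal axis and draw one line per GENDER. The sketch below assumes that reading; it reuses the FORPLOT data set of means created earlier and uses PROC GPLOT for consistency with the first plot.

```sas
/* Assumed second view: mean OUTCOME across TIME, one line per GENDER.
   FORPLOT holds the means by GENDER and TIME from the PROC MEANS step. */
TITLE "Repeated Measures Example: Means by TIME";
PROC GPLOT DATA=FORPLOT;
  PLOT OUTCOME*TIME=GENDER;
  SYMBOL1 V=CIRCLE I=JOIN L=1 C=BLACK;
  SYMBOL2 V=DOT I=JOIN L=2 C=BLUE;
RUN;
QUIT;
```

Two roughly parallel lines in this view would be consistent with the conclusion of no GENDER*TIME interaction.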

Comparisons
To determine which TIMEs are significantly different, include the following statement in the section of PROC MIXED code related to the AR(1) covariance structure.

    LSMEANS TIME/PDIFF;

The output indicates a significant difference in marginal means from TIME1 to TIME2 (difference = 1.36, p=0.0016), and so on.
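Put together with the model statements shown earlier for the AR(1) structure, the complete step would look roughly like this. It is assembled only from statements that appear in the slides; the indentation and comment are added here.

```sas
PROC MIXED DATA=REPMIXED;
  CLASS GENDER TIME SUBJECT;
  MODEL OUTCOME=GENDER TIME GENDER*TIME;
  REPEATED / TYPE=AR(1) SUB=SUBJECT;
  /* Pairwise differences among the TIME least squares means */
  LSMEANS TIME / PDIFF;
RUN;
```

The pairwise comparisons appear in the "Differences of Least Squares Means" table of the output.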

14.4 SUMMARY
This chapter discusses the use of PROC GLM to perform ANCOVA and two-factor ANOVA with fixed effects. We also introduce PROC MIXED and discuss its use on the two-factor mixed model and a repeated measures analysis with a grouping variable.
Continue to Chapter 15: NONPARAMETRIC ANALYSIS

These slides are based on the book:
Introduction to SAS Essentials Mastering SAS for Data Analytics, 2nd Edition
By Alan C. Elliott and Wayne A. Woodward
Paperback: 512 pages
Publisher: Wiley; 2 edition (August 3, 2015)
Language: English
ISBN-10: X
ISBN-13:
These slides are provided for you to use to teach SAS using this book. Feel free to modify them for your own needs. Please send comments about errors in the slides (or suggestions for improvements) to Thanks.
