Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS ESSENTIALS -- Elliott & Woodward1.

Chapter 14: ANALYSIS OF VARIANCE, Part 2

LEARNING OBJECTIVES
- To be able to use SAS® procedures to perform analysis of covariance
- To be able to use SAS procedures to perform two-factor ANOVA using PROC GLM
- To be able to perform two-factor ANOVA using PROC MIXED
- To be able to use SAS procedures to perform repeated measures with a grouping factor

14.1 ANALYSIS OF COVARIANCE
- An analysis of covariance (ANCOVA) is a combination of analysis of variance and regression.
- The covariate is a quantitative variable that is related to the dependent variable.
- The covariate is not controlled by the investigator but is some value intrinsic to the subject (or entity).
- In ANCOVA, the group means are adjusted by the covariate, and these adjusted means are compared with each other.
- Including this covariate in the model may explain some of the variability, resulting in a more powerful statistical test.

An Example ANCOVA Analysis
- A fifth-grade math teacher randomly assigns the 18 students in her class to three different teaching methods for individualized instruction of a certain concept.
- The outcome of interest is the score on an exam over the concept after the instruction period (FINAL). An exam of basic math skills (EXAM) is given to the students before the instruction begins; this is the covariate.
- The variable FINAL is compared across the three teaching methods. The covariate EXAM is assumed to be linearly related to FINAL.
- The ANCOVA tests the null hypothesis that the FINAL means for the three methods, adjusted by EXAM, are not different.

Data for the Example
- Columns: METHOD, EXAM, FINAL (the data rows are not reproduced in this transcript).
- METHOD is the teaching method, EXAM is the exam given at the start of the class (the covariate), and FINAL is the final test outcome.

The ANCOVA Steps…
- In this text, the ANCOVA analysis is shown as a multistep approach.
  1. Perform a test to determine if the FINAL-by-EXAM linear relationships by METHOD are parallel. That is, when you compare the regression lines for each of the three METHODs, the lines should theoretically be parallel. If the lines are sufficiently nonparallel, ANCOVA is not the appropriate analysis to perform. The statistical test used for parallelism is an F-test.
  2. If the F-test in step 1 does not reject parallelism, then another F-test is used to compare the adjusted means across METHOD.
  3. If there are differences in means, then appropriate multiple comparisons can be performed to determine which groups differ.
- Do the Hands On Example p 329 (AGLM ANCOVA.SAS)

Steps for ANCOVA Analysis
    * Part 1: the EXAM*METHOD interaction tests for parallel slopes;
    PROC GLM;
      CLASS METHOD;
      MODEL FINAL=EXAM METHOD EXAM*METHOD;
      TITLE 'Analysis of Covariance Example';
    RUN;
    * Interaction dropped: compare the adjusted METHOD means;
    PROC GLM;
      CLASS METHOD;
      MODEL FINAL=EXAM METHOD;
      LSMEANS METHOD/STDERR PDIFF ADJUST=SCHEFFE;
    RUN;
    QUIT;
- Part 1: this model includes an interaction term (EXAM*METHOD) that tests for parallelism. If the interaction term is not significant, drop it from the model.
- The LSMEANS statement performs a post hoc test to compare the (adjusted) METHOD means; it is used if H0 for equal means is rejected.

Step 1: Preliminary Results
- The test for interaction is non-significant (which is good!).
- The test for interaction asks the question: are these lines close to parallel?
- When you do not reject H0 for EXAM*METHOD (p > 0.05), you conclude that the lines are close to parallel, so you can treat them as parallel in the primary analysis.
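A visual check can complement the F-test for parallelism. The following is a minimal sketch, not part of the original slides: it assumes the ANCOVA data are stored in a data set called ANCOVA_DATA (a placeholder name) with the variables METHOD, EXAM, and FINAL. The REG statement with GROUP= fits a separate regression line for each METHOD, so clearly unequal slopes would be visible.

```sas
* Sketch only: ANCOVA_DATA is a hypothetical data set name;
PROC SGPLOT DATA=ANCOVA_DATA;
  * One fitted line of FINAL on EXAM per teaching method;
  REG X=EXAM Y=FINAL / GROUP=METHOD;
  TITLE 'FINAL by EXAM, separate regression line per METHOD';
RUN;
TITLE;  * clear the title so it does not carry into later steps;
```

With only 18 students the fitted lines will be noisy, so the plot is best read alongside the EXAM*METHOD test rather than instead of it.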

Step 2: Analysis of Covariance
- With the interaction out of the model, you now test for a significant METHOD effect. In this case p < 0.05, so you conclude that the (adjusted) means by METHOD are different.
- By assuming that the lines are parallel, the ANCOVA treats the regression lines for the three METHODs as sharing a common slope.
[Figure: parallel regression lines of FINAL on EXAM, one per METHOD]

Post Hoc Comparisons
- Because you conclude that the (adjusted) means by METHOD are different, you perform a post hoc multiple comparison test to determine which means are different. These results come from the statement:
    LSMEANS METHOD/STDERR PDIFF ADJUST=SCHEFFE;
- From this, conclude that means 1 and 3 are different (p=0.0056), means 2 and 3 are different (p=0.0007), and means 1 and 2 are not different (p=0.684).

Overall Conclusion
- Because a high FINAL score is the goal, there is evidence to support the contention that the adjusted mean for METHOD 3 is significantly higher than for METHODs 1 and 2, and thus that METHOD 3 is the preferred method.

Two-Factor ANOVA Using PROC GLM
- A two-way ANOVA is an analysis that allows you to simultaneously evaluate the effects of two experimental variables (factors).
- Each factor is a "grouping" variable such as type of treatment, gender, brand, and so on.
- The two-way ANOVA tests determine whether the factors are important (significant) either separately (called main effects) or in combination (via an interaction), which is the combined effect of the two factors.

p x q Factorial Design
- The dimensions of a factorial design depend on how many levels of each factor are used.
- For example, a design in which the first factor has two categories and the second has three categories is called a 2 x 3 (2 by 3) factorial design.
- Generally, the two-way ANOVA is called a p x q factorial design.

Understanding Fixed and Random Factors
- FIXED FACTOR: the factor levels completely define all the classifications of interest. (Example: gender)
- RANDOM FACTOR: the levels used in the experiment are randomly selected from the population of possible levels. (Example: three different doses of a drug, when there are many possible doses)
- Two-way ANOVA models are classified as:
  - Both factors fixed: Model I ANOVA
  - Both factors random: Model II ANOVA
  - One random, one fixed: Model III ANOVA

Interaction Effect
- Interaction implies that the pattern of means across groups is inconsistent.
- If there is an interaction effect, main effects cannot be examined directly, because the interaction shows that differences across the levels of one factor are not consistent across all levels of the other factor.
[Figure: two panels of group means, labeled "Interaction" and "No Interaction"]

Typical Hypotheses for Model I Design
- Test for an interaction effect:
    H0: There is no interaction effect.
    Ha: There is an interaction effect.
- If there is no interaction effect, it makes sense to test hypotheses about main effects. The "main effects" hypotheses for factor A are as follows:
    H0: Means are equal across levels of A summed over B.
    Ha: Means are not equal across levels of A summed over B.
- A similar hypothesis is used for B main effects.
- Do Hands On Example p 335 (AGLM 2FACTOR.SAS)

Code for a Two-Factor ANOVA
    PROC GLM;
      CLASS CONDITION STATUS;
      * Two main effects plus their interaction;
      MODEL RESPONSE=CONDITION STATUS CONDITION*STATUS;
      * Simple statistics by factor and by cell;
      MEANS CONDITION STATUS CONDITION*STATUS;
    RUN;
    QUIT;
- The MODEL statement defines the two-way analysis with the two factors CONDITION and STATUS and an interaction test, CONDITION*STATUS.
- The MEANS statement displays simple statistics by factor.

Results for a Two-Way ANOVA
- Results are in the Type III SS table of the output.
- Typically, you first check whether there is a significant interaction effect. In this case p=0.79, so you conclude that there is NO significant interaction effect.
- If there is no interaction, examine the main effects tests in the ANOVA table by comparing marginal means.
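A plot of the cell means is a common companion to the interaction test: roughly parallel profiles are consistent with no interaction. The sketch below is an illustration rather than the authors' code, and it assumes the two-factor data are in a data set named TWOFACTOR (a placeholder name) containing CONDITION, STATUS, and RESPONSE.

```sas
* Sketch only: TWOFACTOR is a hypothetical data set name;
PROC SGPLOT DATA=TWOFACTOR;
  * Mean RESPONSE at each CONDITION, one line per STATUS level;
  VLINE CONDITION / RESPONSE=RESPONSE STAT=MEAN GROUP=STATUS MARKERS;
  TITLE 'Cell means of RESPONSE by CONDITION and STATUS';
RUN;
TITLE;  * clear the title;
```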

Examine Marginal Effects (since no interaction)
- Marginal effects tests indicate no CONDITION effect (p=0.67) and a statistically significant STATUS effect (p=0.0056).
- You conclude that there is no difference in means across CONDITION, but there is a difference in means across STATUS.

Graphical Results for Two-Way ANOVA
- Because there are only two categories of STATUS, we do not need to perform multiple comparisons. The conclusion is clear: those in STATUS1 (mean loss = -6.9), i.e., those who had previously been trying to lose weight, had significantly more weight loss on average than those in STATUS2 (mean loss = -0.92).
[Figure: mean loss for each level of STATUS]

14.2 GOING DEEPER: TWO-FACTOR ANOVA USING PROC MIXED
- PROC MIXED performs mixed-model analysis of variance using features not available in PROC GLM.
- PROC GLM:
  - Is designed primarily for fixed effects models.
  - Calculates based on ordinary least squares and method of moments.
  - Defines all effects as fixed and adjusts for random effects after estimation.
  - Can perform mixed model analysis, but some results are not optimally calculated.
- PROC MIXED:
  - Is designed for mixed effects models.
  - Uses generalized least squares for fixed effects and restricted maximum likelihood (REML) to estimate variance components.
  - Allows selection of correlation models.
- Do Hands On Example p 338 (AMIXED1.SAS)

Code for PROC MIXED ANOVA
    * Display simple statistics by factor;
    PROC MEANS MAXDEC=2 MEAN DATA=HEIGHTS;
      CLASS FAMILY GENDER;
      VAR HEIGHT;
    RUN;
    PROC MIXED DATA=HEIGHTS;
      CLASS FAMILY GENDER;           * categorical factors;
      MODEL HEIGHT = GENDER;         * fixed factor only;
      RANDOM FAMILY FAMILY*GENDER;   * random factors;
    RUN;
- PROC MEANS displays simple statistics by factor.
- Note how the RANDOM statement indicates which factors are random. (Any interaction with a random factor is random.)

How the PROC MIXED Step Is Constructed
    CLASS FAMILY GENDER;
    MODEL HEIGHT = GENDER;
    RANDOM FAMILY FAMILY*GENDER;
- FAMILY and GENDER appear in the CLASS statement because they are both grouping-type factors.
- The MODEL statement includes only the fixed factor GENDER.
- The RANDOM statement includes the random factors FAMILY and FAMILY*GENDER.

Output for Analysis (Means)
- The first output is the means by factor.

Output Testing the Fixed Factor
- This shows the results of a hypothesis test that there is no difference in means by GENDER, which is marginally non-significant (p = 0.07).
- SAS reports no test involving FAMILY.

What If RANDOM Had Not Been Properly Assigned?
- Consider the MODEL statement
    MODEL HEIGHT = GENDER FAMILY GENDER*FAMILY;
- With no RANDOM statement, every effect is treated as fixed, and this model gives a DIFFERENT ANSWER: in this case GENDER is significant. (The output table is not reproduced in this transcript.)
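For comparison, PROC GLM can also be told which effects are random. This is a hedged sketch, not taken from the slides: it reuses the HEIGHTS data set and adds a RANDOM statement with the TEST option, which asks PROC GLM to build F-tests from the expected mean squares instead of testing every effect against the residual error.

```sas
* Sketch only: PROC GLM with FAMILY declared random after the fact;
PROC GLM DATA=HEIGHTS;
  CLASS FAMILY GENDER;
  * All effects are listed in MODEL, since GLM estimates them as fixed;
  MODEL HEIGHT = GENDER FAMILY GENDER*FAMILY;
  * TEST requests F-tests that use the appropriate error terms;
  RANDOM FAMILY GENDER*FAMILY / TEST;
RUN;
QUIT;
```

This illustrates the earlier point that PROC GLM adjusts for random effects after estimation, whereas PROC MIXED models them directly.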

14.3 GOING DEEPER: REPEATED MEASURES WITH A GROUPING FACTOR
- A common design in medical research and other settings is to observe data for the same subject under different circumstances or over time (longitudinal).
- Although it is also possible to perform this analysis using PROC GLM, the approach used in PROC MIXED is preferred.
- In addition, a distinct advantage of using PROC MIXED is that it allows missing values in the design, whereas PROC GLM deletes an entire record from analysis if there is one missing observation.
- Do Hands On Example p 341 (AREPEAT1.SAS)

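Data for Repeated Measures Analysis
- Suppose you observe a response to a drug over 4 hours for seven subjects, three male and four female. In this case, the repeated measure, HOUR, is longitudinal.
- Columns: SUB, GENDER, HOUR1, HOUR2, HOUR3, HOUR4. The table lists three male (M) rows followed by four female (F) rows, and the last row contains an entry marked Missing (followed by the value 9.2). The remaining data values are not reproduced in this transcript.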

Create a Revised Data Set
- The data must be manipulated to put them in the right format for the analysis. In this case, the variables OUTCOME and TIME are created and output to a data set named REPMIXED.
    DATA REPMIXED(KEEP= SUBJECT GENDER TIME OUTCOME);
      INPUT SUBJECT GENDER $ HOUR1-HOUR4;
      * One output record per subject per hour;
      OUTCOME = HOUR1; TIME = 1; OUTPUT;
      OUTCOME = HOUR2; TIME = 2; OUTPUT;
      OUTCOME = HOUR3; TIME = 3; OUTPUT;
      OUTCOME = HOUR4; TIME = 4; OUTPUT;
    DATALINES;
    1 M … Etc…
- The DATA step does the following:
  (a) assigns each of the four HOUR values to the variable OUTCOME;
  (b) creates a variable called TIME that contains the time marker (1-4);
  (c) for each of these assignments, outputs the variables to the new data set REPMIXED using the OUTPUT statement.
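The same wide-to-long restructuring can be written more compactly with an array and a DO loop. This is an alternative sketch, not the book's code: it assumes the one-record-per-subject data have already been read into a data set called WIDE (a hypothetical name) containing SUBJECT, GENDER, and HOUR1-HOUR4.

```sas
* Sketch only: WIDE is a hypothetical data set with one record per subject;
DATA REPMIXED(KEEP=SUBJECT GENDER TIME OUTCOME);
  SET WIDE;
  ARRAY H{4} HOUR1-HOUR4;   * refer to the four hourly values by index;
  DO TIME = 1 TO 4;
    OUTCOME = H{TIME};      * value observed at this hour;
    OUTPUT;                 * one output record per subject per hour;
  END;
RUN;
```

The result has the same structure as REPMIXED above, four records per subject, and a missing hourly value simply becomes a missing OUTCOME on that one record.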

The Revised Data Set
- The slide shows part of the revised data set REPMIXED. (The listing is not reproduced in this transcript.)

Basic Model for this Analysis
    PROC MIXED DATA=REPMIXED;
      CLASS GENDER TIME SUBJECT;
      MODEL OUTCOME=GENDER TIME GENDER*TIME;
      REPEATED / TYPE=UN SUB=SUBJECT;
    RUN;
- All factors are FIXED in this analysis.
- The REPEATED statement tells SAS that measurements are repeated within SUBJECT (there are 4 observations per subject).
- The TYPE=UN option tells SAS to use an unstructured covariance matrix.

Covariance Structures
- The best fitting model for an analysis may depend on how the variance components are structured, that is, on which structure best explains the variance in the model. Some covariance structures include:
  - AR(1) (autoregressive) assumes that nearby measurements are correlated and that the correlation declines exponentially as measurements get farther apart in time. That is, measurements at TIME1 and TIME2 are more highly correlated than are measurements at TIME1 and TIME3.
  - CS (compound symmetry) assumes homogeneous variances and correlations that are constant regardless of how far apart the measurements are.
  - UN (unstructured) allows all variances and covariances to be different.
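Written out for the four time points in this example, the three structures look as follows. This is an illustrative sketch in generic notation (the symbols $\sigma^2$, $\sigma_1$, $\rho$, and $\sigma_{ij}$ are introduced here and do not appear in the slides); each matrix is the within-subject covariance of the four OUTCOME values.

```latex
\[
\Sigma_{\mathrm{AR(1)}} = \sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1
\end{pmatrix},
\qquad
\Sigma_{\mathrm{CS}} =
\begin{pmatrix}
\sigma^2+\sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma^2+\sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma^2+\sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma^2+\sigma_1
\end{pmatrix},
\]
\[
\Sigma_{\mathrm{UN}} =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\sigma_{12} & \sigma_{22} & \sigma_{23} & \sigma_{24} \\
\sigma_{13} & \sigma_{23} & \sigma_{33} & \sigma_{34} \\
\sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_{44}
\end{pmatrix}.
\]
```

AR(1) and CS each use two parameters, while UN uses ten for four time points, which is why a fit criterion that penalizes the number of parameters is useful when choosing among them.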

What Is the Best Covariance Structure?
- Look at several possible models. Each model has a different TYPE= selected.
    * Unstructured;
    PROC MIXED DATA=REPMIXED;
      CLASS GENDER TIME SUBJECT;
      MODEL OUTCOME=GENDER TIME GENDER*TIME;
      REPEATED / TYPE=UN SUB=SUBJECT;
    RUN;
    * Compound symmetry;
    PROC MIXED DATA=REPMIXED;
      CLASS GENDER TIME SUBJECT;
      MODEL OUTCOME=GENDER TIME GENDER*TIME;
      REPEATED / TYPE=CS SUB=SUBJECT;
    RUN;
    * First-order autoregressive;
    PROC MIXED DATA=REPMIXED;
      CLASS GENDER TIME SUBJECT;
      MODEL OUTCOME=GENDER TIME GENDER*TIME;
      REPEATED / TYPE=AR(1) SUB=SUBJECT;
    RUN;

Results from Multiple Models
- In this example, based on the AIC criterion, the AR(1) specification best fits the data. (Smaller is better.)
    AIC (unstructured) = 71.0
    AIC (compound symmetry) = 77.6
    AIC (autoregressive) = 70.2 (best result)
- AIC is found in the Fit Statistics table of the output. The table shown on the slide is the one produced with the TYPE=AR(1) option.
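Rather than reading AIC off each listing, the fit statistics can be captured in a data set with ODS OUTPUT. This is a minimal sketch for the AR(1) run only; FIT_AR1 is a name chosen here for illustration, and the same pattern could be repeated for TYPE=UN and TYPE=CS.

```sas
* Sketch only: save the Fit Statistics table from the AR(1) model;
ODS OUTPUT FitStatistics=FIT_AR1;
PROC MIXED DATA=REPMIXED;
  CLASS GENDER TIME SUBJECT;
  MODEL OUTCOME=GENDER TIME GENDER*TIME;
  REPEATED / TYPE=AR(1) SUB=SUBJECT;
RUN;

* The AIC row appears alongside the other fit criteria;
PROC PRINT DATA=FIT_AR1;
RUN;
```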

Choose Which Analysis to Use
- The results from the TYPE=AR(1) analysis are used as the final model, since it had the best fit.
- Conclude that there is no interaction effect, that there is a TIME effect, and that there is no GENDER effect (or a marginal GENDER effect).

Visual Examination of the Results
- Use PROC GPLOT to display the means by factor. (Using AREPEAT2 code)
    * Output the means to plot using PROC MEANS;
    PROC SORT DATA=REPMIXED; BY GENDER TIME; RUN;
    PROC MEANS DATA=REPMIXED NOPRINT;
      BY GENDER TIME;
      VAR OUTCOME;
      OUTPUT OUT=FORPLOT MEAN=;
    RUN;
    * Plot the means stored in FORPLOT, one line per TIME;
    TITLE "Repeated Measures Example";
    PROC GPLOT DATA=FORPLOT;
      PLOT OUTCOME*GENDER=TIME;
      SYMBOL1 V=CIRCLE I=JOIN L=1 C=BLACK;
      SYMBOL2 V=DOT I=JOIN L=2 C=BLUE;
      SYMBOL3 V=STAR I=JOIN L=2 C=RED;
      SYMBOL4 V=SQUARE I=JOIN L=2 C=GREEN;
    RUN;
    QUIT;
- PROC MEANS outputs the means to the data set FORPLOT, and PROC GPLOT then plots them.

Results of PROC GPLOT
- Visual confirmation that there is a TIME difference: at least, it appears that TIME2 (blue) is always lower than TIME3 (red).

A Second Plot
- Use PROC PLOT to produce a second plot showing the same data from a different perspective…
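The code for the second plot is not included in this transcript. One plausible way to get a different perspective, sketched here as an assumption and not as the authors' code, is to swap the roles of TIME and GENDER in the PROC GPLOT call so that TIME is on the horizontal axis and there is one line per GENDER. It reuses the FORPLOT data set of means created above.

```sas
* Sketch only: TIME on the horizontal axis, one line per GENDER;
TITLE "Repeated Measures Example: Means over TIME by GENDER";
PROC GPLOT DATA=FORPLOT;
  PLOT OUTCOME*TIME=GENDER;
  SYMBOL1 V=CIRCLE I=JOIN L=1 C=BLACK;
  SYMBOL2 V=DOT I=JOIN L=2 C=BLUE;
RUN;
QUIT;
TITLE;  * clear the title;
```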

Comparisons
- To determine which TIMEs are significantly different, add the following statement to the PROC MIXED step that uses the AR(1) covariance structure:
    LSMEANS TIME/PDIFF;
- The output indicates a significant difference in marginal means from TIME1 to TIME2 (difference = 1.36, p=0.0016), and so on.
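Put together with the AR(1) model shown earlier, the complete step would look like the following sketch, which simply combines statements already given in the slides.

```sas
* AR(1) repeated measures model with pairwise TIME comparisons;
PROC MIXED DATA=REPMIXED;
  CLASS GENDER TIME SUBJECT;
  MODEL OUTCOME=GENDER TIME GENDER*TIME;
  REPEATED / TYPE=AR(1) SUB=SUBJECT;
  * PDIFF requests p-values for all pairwise differences of TIME means;
  LSMEANS TIME / PDIFF;
RUN;
```

PDIFF on its own reports unadjusted p-values; if a multiplicity adjustment is wanted, the LSMEANS statement also accepts an ADJUST= option (for example ADJUST=TUKEY), as used with ADJUST=SCHEFFE in the ANCOVA example.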

14.4 SUMMARY
- This chapter discusses the use of PROC GLM to perform ANCOVA and two-factor ANOVA with fixed effects. We also introduce PROC MIXED and discuss its use on the two-factor mixed model and a repeated measures analysis with a grouping variable.
- Continue to Chapter 15: NONPARAMETRIC ANALYSIS