Experimental design and statistical analyses of data
Lesson 4: Analysis of variance II
Topics: a posteriori tests, model control, and how to choose the best model
Example: growth of bean plants in four different media (Zn, Cu, Mn, Control) in a completely randomized design (one-way ANOVA). The response is the biomass (y) of each plant, with n_i plants per treatment and an overall mean across all treatments.
How to do it with SAS
DATA medium;
  /* 20 bean plants exposed to 4 different treatments (5 plants per treatment)
     Mn = extra manganese added to the soil
     Zn = extra zinc added to the soil
     Cu = extra copper added to the soil
     K  = control soil
     The dependent variable (mass) is the biomass of the plants at harvest */
  INPUT treat $ mass;   /* treat = treatment, mass = biomass of a plant */
  CARDS;
zn 61.7
zn 59.4
zn 60.5
zn 59.2
zn 57.6
cu 57.0
cu 58.4
cu 57.3
cu 57.8
cu 59.9
mn 62.3
mn 66.2
mn 65.2
mn 63.7
mn 64.1
k 58.1
k 56.3
k 58.9
k 57.4
k 56.1
;
PROC SORT;   /* sort the observations according to treatment */
  BY treat;
RUN;

/* compute average and 95% confidence limits for each treatment */
PROC MEANS N MEAN CLM;
  BY treat;
RUN;
PROC MEANS output (Analysis Variable: MASS): for each treatment (cu, k, mn, zn) the procedure reports N, the mean, and the lower and upper 95% confidence limits (numerical values omitted).
PROC GLM;
  CLASS treat;
  MODEL mass = treat /SOLUTION;   /* SOLUTION gives the estimated parameter values */
RUN;
General Linear Models Procedure
Class Level Information: TREAT, 4 levels (cu k mn zn); number of observations in data set = 20.
Dependent Variable: MASS. The ANOVA table reports DF, Sum of Squares, Mean Square, F Value, and Pr > F for Model, Error, and Corrected Total, together with R-Square, C.V., Root MSE, and the MASS mean, followed by the Type I SS and Type III SS for TREAT (numerical values omitted).
Parameter estimates (SOLUTION option): for INTERCEPT and for each level of TREAT (cu, k, mn, zn) the output gives the estimate, t for H0: Parameter = 0, Pr > |T|, and the standard error of the estimate; each estimate is flagged with the letter 'B'.
NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.
Because the model is over-parameterized, PROC GLM sets the parameter for the last level (zn) to zero, so the INTERCEPT estimates the zn mean and each TREAT estimate is the difference between that treatment's mean and the zn mean.
PROC GLM;
  CLASS treat;
  MODEL mass = treat /SOLUTION;   /* SOLUTION gives the estimated parameter values */
  /* Test for pairwise differences between treatments by linear contrasts
     (coefficients refer to the class levels in the order cu k mn zn) */
  CONTRAST 'Cu vs K'  treat  1 -1  0  0;
  CONTRAST 'Cu vs Mn' treat  1  0 -1  0;
  CONTRAST 'Cu vs Zn' treat  1  0  0 -1;
  CONTRAST 'K vs Mn'  treat  0  1 -1  0;
  CONTRAST 'K vs Zn'  treat  0  1  0 -1;
  CONTRAST 'Mn vs Zn' treat  0  0  1 -1;
  /* test whether the 3 treatments with added minerals differ from the control */
  CONTRAST 'K vs Cu, Mn Zn' treat 1 -3 1 1;
RUN;
Contrast output: for each contrast (Cu vs K, Cu vs Mn, Cu vs Zn, K vs Mn, K vs Zn, Mn vs Zn, K vs Cu, Mn Zn) the output reports DF, Contrast SS, Mean Square, F Value, and Pr > F (numerical values omitted).
PROC GLM;
  CLASS treat;
  MODEL mass = treat /SOLUTION;   /* SOLUTION gives the estimated parameter values */
  /* Test for differences between levels of treatment */
  MEANS treat / BON DUNCAN SCHEFFE TUKEY DUNNETT('k');
RUN;
Tukey's Studentized Range (HSD) Test for variable: MASS
NOTE: This test controls the type I experimentwise error rate.
Alpha = 0.05, Confidence = 0.95, df = 16 (MSE, critical value of the Studentized range, and minimum significant difference omitted). For each pair of treatments the output gives the difference between means with simultaneous lower and upper confidence limits; comparisons significant at the 0.05 level are indicated by '***'.
Significant: mn - zn, mn - cu, mn - k (and the reverse comparisons).
Not significant: zn - cu, zn - k, cu - k (and the reverse comparisons).

Bonferroni (Dunn) T tests for variable: MASS
NOTE: This test controls the type I experimentwise error rate but generally has a higher type II error rate than Tukey's for all pairwise comparisons.
Alpha = 0.05, Confidence = 0.95, df = 16 (MSE, critical value of T, and minimum significant difference omitted).
Significant: mn - zn, mn - cu, mn - k (and the reverse comparisons).
Not significant: zn - cu, zn - k, cu - k (and the reverse comparisons).

Scheffe's test for variable: MASS
NOTE: This test controls the type I experimentwise error rate but generally has a higher type II error rate than Tukey's for all pairwise comparisons.
Alpha = 0.05, Confidence = 0.95, df = 16 (MSE, critical value of F, and minimum significant difference omitted).
Significant: mn - zn, mn - cu, mn - k (and the reverse comparisons).
Not significant: zn - cu, zn - k, cu - k (and the reverse comparisons).

Dunnett's T tests for variable: MASS
NOTE: This test controls the type I experimentwise error rate for comparisons of all treatments against a control.
Alpha = 0.05, Confidence = 0.95, df = 16 (MSE, critical value of Dunnett's T, and minimum significant difference omitted).
Significant: mn - k, zn - k.
Not significant: cu - k.
Comparison between multiple tests

The slide tabulates the minimum significant difference for each test (Duncan, Dunnett, Tukey, Bonferroni, Scheffe; values omitted), increasing in that order from the most liberal to the most conservative test.

Duncan's test exaggerates the risk of Type I errors; Scheffe's test exaggerates the risk of Type II errors. Tukey's test is recommended as the best.
PROC GLM;
  CLASS treat;
  MODEL mass = treat /SOLUTION;   /* SOLUTION gives the estimated parameter values */
  /* Test for differences between different levels of treatment */
  MEANS treat / BON DUNCAN SCHEFFE TUKEY LINES;
RUN;
General Linear Models Procedure

Duncan's Multiple Range Test for variable: MASS
NOTE: This test controls the type I comparisonwise error rate, not the experimentwise error rate.
Alpha = 0.05, df = 16 (MSE and critical ranges omitted). Means with the same letter are not significantly different.
Duncan grouping: mn = A; zn = B; cu = B C; k = C.

Tukey's Studentized Range (HSD) Test for variable: MASS
NOTE: This test controls the type I experimentwise error rate, but generally has a higher type II error rate than REGWQ.
Alpha = 0.05, df = 16 (MSE, critical value of the Studentized range, and minimum significant difference omitted). Means with the same letter are not significantly different.
Tukey grouping: mn = A; zn = B; cu = B; k = B.

Bonferroni (Dunn) T tests for variable: MASS
NOTE: This test controls the type I experimentwise error rate, but generally has a higher type II error rate than REGWQ.
Alpha = 0.05, df = 16, Critical Value of T = 3.01 (MSE and minimum significant difference omitted). Means with the same letter are not significantly different.
Bonferroni grouping: mn = A; zn = B; cu = B; k = B.

Scheffe's test for variable: MASS
NOTE: This test controls the type I experimentwise error rate but generally has a higher type II error rate than REGWF for all pairwise comparisons.
Alpha = 0.05, df = 16 (MSE, critical value of F, and minimum significant difference omitted). Means with the same letter are not significantly different.
Scheffe grouping: mn = A; zn = B; cu = B; k = B.
PROC GLM;
  CLASS treat;
  MODEL mass = treat /SOLUTION;   /* SOLUTION gives the estimated parameter values */
  /* In unbalanced (and balanced) designs LSMEANS can be used: */
  LSMEANS treat /TDIFF PDIFF;
RUN;
The GLM Procedure: Least Squares Means
For each treatment (cu, k, mn, zn) the output lists the mass LSMEAN and an LSMEAN number, followed by the matrix of t values for H0: LSMean(i) = LSMean(j) with Pr > |t| (some p-values are shown as < .0001; the full matrix is omitted here).
NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
Is this P-value significant?
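The note above warns that the unadjusted PDIFF p-values do not protect the experimentwise error rate. As an aside, a minimal sketch of one way to obtain multiplicity-adjusted p-values directly from LSMEANS via the ADJUST= option (shown here with a Tukey adjustment; this is an addition for illustration, not part of the course code):

PROC GLM;
  CLASS treat;
  MODEL mass = treat;
  /* PDIFF gives pairwise p-values; ADJUST=TUKEY makes them
     experimentwise-adjusted instead of comparisonwise */
  LSMEANS treat / PDIFF ADJUST=TUKEY;
RUN;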
The sequential Bonferroni test

The sequential Bonferroni test is less conservative than the ordinary Bonferroni test.

Procedure: first order the k P-values in increasing order, and let P(i) denote the i-th P-value after ordering. Then compute

  α(i) = α / (k − i + 1),

where α is the significance level that would be used if there were only a single P-value (usually 0.05). If P(i) < α(i), the i-th P-value is significant.

(The slide's table lists i, P(i), α(i), and P(i) − α(i) and marks the significant P-values; the numerical values are omitted here.)
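A minimal sketch of this procedure in a SAS DATA step, using hypothetical P-values (not the course data):

DATA pvals;                        /* hypothetical P-values from k = 4 tests */
  INPUT test $ p;
  CARDS;
A 0.001
B 0.012
C 0.020
D 0.300
;
PROC SORT DATA=pvals;              /* order the P-values in increasing order */
  BY p;
RUN;
DATA seqbon;
  SET pvals NOBS=k;                /* k = total number of P-values */
  RETAIN failed 0;
  alpha_i = 0.05 / (k - _N_ + 1);  /* adjusted level for the i-th ordered P-value */
  IF p >= alpha_i THEN failed = 1; /* once one test fails, all larger P-values fail too */
  significant = (failed = 0);      /* 1 if significant, 0 otherwise */
RUN;
PROC PRINT DATA=seqbon; RUN;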
Model assumptions and model control

All GLMs are based on the assumption that
(1) ε is independently distributed,
(2) ε is normally distributed with mean = 0,
(3) the variance of ε (denoted σ²) is the same for all values of the independent variable(s) (variance homogeneity).

Mathematically this is written as: ε is iid ND(0; σ²), where iid = independently and identically distributed.
Transformation of data

Transformation of data serves two purposes:
(1) to remove variance heteroscedasticity,
(2) to make data more normal.

Usually a transformation meets both purposes, but if this is not possible, variance homoscedasticity is regarded as the most important, especially if sample sizes are large.
How to choose the appropriate transformation?
Power transformations have the form y* = y^p. We have to find a value of p such that the transformed values of y (denoted y*) are normally distributed and have a variance that is independent of the mean of y*. A useful method for finding p is to fit Taylor's power law to the data.
Taylor's power law

Taylor's power law states that the variance increases with the mean as a power function, s² = a·x̄^b, i.e. log s² = log a + b·log x̄. It can be shown that p = 1 − b/2 gives the appropriate transformation we are searching for.
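A minimal sketch of how the slope b could be estimated in SAS by regressing the log variance on the log mean; the data set and variable names (counts, sample, count) are hypothetical:

PROC MEANS DATA=counts NOPRINT;    /* mean and variance per sample */
  CLASS sample;
  VAR count;
  OUTPUT OUT=stats MEAN=m VAR=s2;
RUN;
DATA stats;
  SET stats;
  IF _TYPE_ = 1;                   /* keep only the per-sample rows */
  logm  = LOG10(m);
  logs2 = LOG10(s2);
RUN;
PROC REG DATA=stats;               /* the slope b gives p = 1 - b/2 */
  MODEL logs2 = logm;
RUN;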
If y is a proportion, i.e. 0 ≤ y ≤ 1, an appropriate transformation is often the arcsine square root transformation, y* = arcsin(√y).
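A small sketch of how these transformations could be applied in a DATA step; the data set and variable names (rawdata, y) are hypothetical:

DATA transformed;
  SET rawdata;
  y_log    = LOG10(y + 1);     /* counts: logarithmic transformation (p -> 0) */
  y_sqrt   = SQRT(y);          /* p = 0.5, often suitable for Poisson-like counts */
  y_arcsin = ARSIN(SQRT(y));   /* proportions between 0 and 1 */
RUN;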
Example: fitting Taylor's power law to counts of two mite species gave
T. urticae: log s² = log a + b·log x̄ (intercept, slope, and r² not reproduced here), giving y* = log(y + 1);
P. persimilis: log s² = log a + b·log x̄ (intercept, slope, and r² not reproduced here), giving y* = log(y + 1).
(A fitted slope b near 2 gives p = 1 − b/2 ≈ 0, which corresponds to the logarithmic transformation.)
Exponential growth

Deterministic model: dN/dt = rN, with r = b − d, where
  N = population size at time t
  r = net growth rate per capita (the instantaneous growth rate)
  b = birth rate per capita
  d = death rate per capita

Stochastic model: ΔN = (BΔt + ε) − (DΔt + δ), where
  ΔN = change in N during Δt
  B = birth rate, D = death rate
  ε = noise associated with births, δ = noise associated with deaths

The number of births during a time interval follows a Poisson distribution with mean BΔt. The number of deaths during a time interval is binomially distributed with parameters (θ, N), where θ = DΔt/N is the probability that an individual dies during Δt.
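A minimal simulation sketch of the stochastic model above in a DATA step; the parameter values (b, d, Δt, N0, number of steps) are hypothetical:

DATA simgrowth;
  b = 0.10;  d = 0.05;  dt = 1;           /* per-capita birth and death rates, time step */
  N = 20;                                  /* initial population size N0 */
  DO t = 0 TO 50 BY dt;
    OUTPUT;
    IF N <= 0 THEN LEAVE;                  /* stop if the population goes extinct */
    births = RAND('POISSON', b*N*dt);      /* births ~ Poisson with mean B*dt = b*N*dt */
    theta  = MIN(d*dt, 1);                 /* probability that an individual dies during dt */
    deaths = RAND('BINOMIAL', theta, N);   /* deaths ~ Binomial(theta, N) */
    N = N + births - deaths;
  END;
RUN;
PROC GPLOT DATA=simgrowth;                 /* plot the simulated trajectory */
  PLOT N*t;
RUN;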
Type I, II, III and IV SS Example: Mites in stored grain influenced by temperature (T) and humidity (H)
DATA mites;
  INFILE 'h:\lin-mod\besvar\opg1-1.prn' FIRSTOBS=2;
  INPUT pos $ depth T H Mites;
  /* pos   = position in store */
  /* depth = depth in m */
  /* T     = temperature of grain */
  /* H     = humidity of grain */
  /* Mites = number of mites in sampling unit */
  logMites = log10(Mites+1);   /* log transformation of Mites */
  T2 = T**2;                   /* square of temperature */
  H2 = H**2;                   /* square of humidity */
  TH = T*H;                    /* product of temperature and humidity */
PROC GLM;
  CLASS pos;
  MODEL logMites = T T2 H H2 TH /SOLUTION SS1 SS3;
RUN;
General Linear Models Procedure, Dependent Variable: LOGMITES
The ANOVA table reports DF, Sum of Squares, Mean Square, F Value, and Pr > F for Model, Error, and Corrected Total, together with R-Square, C.V., Root MSE, and the LOGMITES mean, followed by the parameter estimates (estimate, t for H0: Parameter = 0, Pr > |T|, and standard error) for INTERCEPT, T, T2, H, H2, and TH (numerical values omitted).
General Linear Models Procedure, Dependent Variable: LOGMITES
In addition to the overall ANOVA table (Model, Error, Corrected Total; R-Square, C.V., Root MSE, LOGMITES mean), the output lists the Type I SS and Type III SS with Mean Square, F Value, and Pr > F for each of T, T2, H, H2, and TH (numerical values omitted).
Example: the parameter β3, the coefficient of H in the model logMites = β0 + β1·T + β2·T² + β3·H + β4·H² + β5·T·H.

Type I SS (SS I) is used to compare the model
  logMites = β0 + β1·T + β2·T² + β3·H
with
  logMites = β0 + β1·T + β2·T²,
i.e. H is tested after the terms that precede it in the MODEL statement.

Type III SS (SS III) is used to compare the model
  logMites = β0 + β1·T + β2·T² + β3·H + β4·H² + β5·T·H
with
  logMites = β0 + β1·T + β2·T² + β4·H² + β5·T·H,
i.e. H is tested after all other terms in the model.
General Linear Models Procedure, Dependent Variable: LOGMITES
The output again lists the Type I SS and Type III SS (with Mean Square, F Value, and Pr > F) for T, T2, H, H2, and TH (numerical values omitted). The two types of SS lead to different conclusions for H:
H is significant if it is added after T and T² (Type I SS);
H is not significant if it is added after T, T², H², and TH (Type III SS).
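A small sketch illustrating this order dependence: re-running PROC GLM with the terms in a different order changes the Type I SS for H, but not its Type III SS (the re-ordering is an addition for illustration, not part of the original analysis):

PROC GLM DATA=mites;
  MODEL logMites = T T2 H H2 TH / SS1 SS3;   /* H entered after T and T2 */
RUN;
PROC GLM DATA=mites;
  MODEL logMites = H2 TH T T2 H / SS1 SS3;   /* H entered last */
RUN;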
How do we choose the best model?
DATA mites;
  INFILE 'h:\lin-mod\besvar\opg1-1.prn' FIRSTOBS=2;
  INPUT pos $ depth T H Mites;
  /* pos   = position in store */
  /* depth = depth in m */
  /* T     = temperature of grain */
  /* H     = humidity of grain */
  /* Mites = number of mites in sampling unit */
  logMites = log10(Mites+1);   /* log transformation of Mites */
  T2 = T**2;                   /* square of temperature */
  H2 = H**2;                   /* square of humidity */
  TH = T*H;                    /* product of temperature and humidity */
PROC STEPWISE;
  MODEL logMites = T T2 H H2 TH /MAXR;
RUN;
Maximum R-square Improvement for Dependent Variable LOGMITES
(Each step reports R-square, C(p), the ANOVA table for the current model, the parameter estimates with standard errors and Type II SS, and bounds on the condition number; the numerical values are omitted below.)

Step 1: Variable H2 entered. The above model is the best 1-variable model found.
Step 2: Variable T entered.
Step 3: Variable H2 removed, variable TH entered. The above model is the best 2-variable model found.
Step 4: Variable T2 entered. The above model is the best 3-variable model found.
Step 5: Variable H2 entered.
Step 6: Variable TH removed, variable H entered. The above model is the best 4-variable model found.
Step 7: Variable TH entered. The above model is the best 5-variable model found. No further improvement in R-square is possible.
Candidate models (for each model the slides list R², F, and P; values omitted):

Models with 1 variable: T; T²; H; H²; T*H

Models with 2 variables: T T²; T H; T H²; T T*H; T² H; T² H²; T² T*H; H H²; H T*H; H² T*H

Models with 3 variables: T T² H; T T² H²; T T² T*H; T H H²; T H T*H; T H² T*H; T² H H²; T² H T*H; T² H² T*H; H H² T*H

Models with 4 variables: T T² H H²; T T² H T*H; T T² H² T*H; T H H² T*H; T² H H² T*H

Models with 5 variables: T T² H H² T*H
Best models (for each model the slide lists R², F, P, and Mallows' C(p); values omitted):
1 variable: H²
2 variables: T, T*H
3 variables: T, T², T*H
4 variables: T, T², H, H²
5 variables: T, T², H, H², T*H
Judged by Mallows' C(p), one of these models may overall be considered the best model.
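As an aside, the R² and Mallows' C(p) values for the candidate subsets can also be obtained directly; a minimal sketch using PROC REG's SELECTION= option on the mites data (an alternative to PROC STEPWISE, not the code used above):

PROC REG DATA=mites;
  /* list the best models of each size with their R-square and Mallows' C(p) */
  MODEL logMites = T T2 H H2 TH / SELECTION=RSQUARE CP BEST=3;
RUN;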
Model control
DATA mites;
  INFILE 'h:\lin-mod\besvar\opg1-1.prn' FIRSTOBS=2;
  INPUT pos $ depth T H Mites;
  LogMites = log10(Mites+1);   /* transform dependent variable */
  T2 = T**2;                   /* square of temperature */
  H2 = H**2;                   /* square of humidity */
  TH = T*H;                    /* interaction between temperature and humidity */
PROC REG;                      /* multiple regression analysis */
  MODEL logMites = T T2 H H2 TH;
  OUTPUT OUT=new P=pred R=res;
RUN;
/* Model control */
PROC GPLOT;
  /* plot observed values against predicted values together with the line of equality */
  PLOT LogMites*pred pred*pred /OVERLAY;
  SYMBOL1 COLOR=blue VALUE=circle HEIGHT=1;
  SYMBOL2 COLOR=red INTERPOL=line WIDTH=1;
  /* plot residuals against the predicted values */
  PLOT res*pred;
  SYMBOL1 COLOR=blue VALUE=circle HEIGHT=1;
RUN;
Observed values of LogMites against predicted values
Residuals plotted against predicted values of LogMites
PROC UNIVARIATE FREQ PLOT NORMAL DATA=new;
  /* PROC UNIVARIATE gives information about the variables defined by VAR */
  /* FREQ, PLOT, NORMAL etc. are options:
     FREQ   = number of observations of a given value
     PLOT   = plots of the observations
     NORMAL = test of whether the variable is normally distributed */
  VAR res;   /* information about the residuals */
RUN;
Univariate Procedure, Variable = RES
Moments and quantiles of the residuals: N = 20, Mean = 0, Sum = 0, Range = 4.1, together with the standard deviation, variance, skewness, kurtosis, quantiles, sign test, and signed rank test (most numerical values omitted). The Shapiro-Wilk statistic (W:Normal) with its Pr < W tests
H0: the residuals are normally distributed.
Pr < W is the probability of getting a deviation from the normal distribution equal to or greater than the observed one by chance, given that H0 is true.
Extremes
Lowest: -2.08 (obs 20), -2.00 (obs 11), -1.26 (obs 10), -1.08 (obs 1), -1.06 (obs 7)
Highest: 0.90 (obs 13), 1.54 (obs 8), 1.82 (obs 5), 1.90 (obs 12), 2.02 (obs 16)
(The output also includes a stem-and-leaf plot and a box plot of the residuals.)
Normal probability plot of the residuals: the points should follow a straight line if the data are normally distributed.
Frequency table of the residual values (Value, Count, Cell %, and Cumulative %; values omitted).