Topic 20: Single Factor Analysis of Variance

Outline
Analysis of Variance
– One set of treatments (i.e., single factor)
  Cell means model
  Factor effects model
– Link to linear regression using indicator explanatory variables

One-Way ANOVA
The response variable Y is continuous
The explanatory variable is categorical
– We call it a factor
– The possible values are called levels
This approach is a generalization of the independent two-sample pooled t-test
In other words, it can be used when there are more than two treatments
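A quick connection not spelled out on the slide: for r = 2 groups, the ANOVA F statistic is exactly the square of the pooled two-sample t statistic, so the two methods agree in the two-treatment case.

\[
t \;=\; \frac{\bar{Y}_{1\cdot}-\bar{Y}_{2\cdot}}{s_p\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}},
\qquad
F \;=\; \frac{MSR}{MSE} \;=\; t^2,
\qquad
F(1,\,n_T-2) \;=\; \bigl[t(n_T-2)\bigr]^2 .
\]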

Data for One-Way ANOVA
Y is the response variable
X is the factor (it is qualitative/discrete)
– r is the number of levels
– often refer to these levels as groups or treatments
Y_ij is the jth observation in the ith group

Notation
For Y_ij we use
– i to denote the level of the factor
– j to denote the jth observation at factor level i
i = 1, ..., r levels of factor X
j = 1, ..., n_i observations for level i of factor X
– n_i does not need to be the same in each group

KNNL Example (p 685)
Y is the number of cases of cereal sold
X is the design of the cereal package
– there are 4 levels for X because there are 4 different package designs
i = 1 to 4 levels
j = 1 to n_i stores with design i (n_i = 5, 5, 4, 5)
Will use n if the n_i are the same across groups

Data for one-way ANOVA

data a1;
  infile 'c:../data/ch16ta01.txt';
  input cases design store;
proc print data=a1;
run;

The data
(proc print listing: columns Obs, cases, design, store)

Plot the data

symbol1 v=circle i=none;
proc gplot data=a1;
  plot cases*design;
run;

The plot

Plot the means

proc means data=a1;
  var cases;
  by design;
  output out=a2 mean=avcases;
proc print data=a2;
symbol1 v=circle i=join;
proc gplot data=a2;
  plot avcases*design;
run;
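As an aside that is not in the original slides, newer SAS releases can draw both of these plots with proc sgplot instead of gplot; a minimal sketch using the same data set a1:

proc sgplot data=a1;
  scatter x=design y=cases;   /* raw data, like the gplot of cases*design */
run;

proc sgplot data=a1;
  vline design / response=cases stat=mean markers;   /* group means joined by a line */
run;

The proc means step above is still useful, since its output data set a2 appears on the next slide.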

New Data Set
(proc print of a2: columns Obs, design, _TYPE_, _FREQ_, avcases)

Plot of the means

The Model
We assume that the response variable is
– Normally distributed with
  1. a mean that may depend on the level of the factor
  2. constant variance
All observations are assumed independent
NOTE: Same assumptions as linear regression, except there is no assumed linear relationship between X and E(Y|X)

Cell Means Model
A “cell” refers to a level of the factor
Y_ij = μ_i + ε_ij
– where μ_i is the theoretical mean or expected value of all observations at level (or cell) i
– the ε_ij are iid N(0, σ²), which means
– Y_ij ~ N(μ_i, σ²) and independent
This is called the cell means model
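For completeness (not shown on the slide), the least squares criterion for the cell means model; minimizing it over the μ_i gives the cell sample means used on the Estimates slide below.

\[
Q(\mu_1,\dots,\mu_r) \;=\; \sum_{i=1}^{r}\sum_{j=1}^{n_i}\bigl(Y_{ij}-\mu_i\bigr)^2,
\qquad
\hat{\mu}_i \;=\; \bar{Y}_{i\cdot} \;=\; \frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij}.
\]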

Parameters
The parameters of the model are
– μ_1, μ_2, …, μ_r
– σ²
Question (Version 1): Does our explanatory variable help explain Y?
Question (Version 2): Do the μ_i vary?
H_0: μ_1 = μ_2 = … = μ_r = μ (a constant)
H_a: not all the μ’s are the same

Estimates
Estimate μ_i by the mean of the observations at level i (the sample mean):
μ̂_i = Ȳ_i. = Σ_j Y_ij / n_i
For each level i, also get an estimate of the variance (the sample variance):
s_i² = Σ_j (Y_ij − Ȳ_i.)² / (n_i − 1)
We combine these to get an overall estimate of σ²
Same approach as the pooled t-test

Pooled estimate of σ²
If the n_i were all the same we would average the s_i²
– Do not average the s_i
In general we pool the s_i², giving weights proportional to the df, n_i − 1
The pooled estimate is
s² = MSE = Σ_i (n_i − 1) s_i² / Σ_i (n_i − 1)
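Worked out for the cereal example's group sizes (n_i = 5, 5, 4, 5 and n_T = 19), so the pooled estimate has n_T − r = 15 degrees of freedom:

\[
s^2 \;=\; \frac{\sum_i (n_i-1)\,s_i^2}{\sum_i (n_i-1)}
\;=\; \frac{4s_1^2 + 4s_2^2 + 3s_3^2 + 4s_4^2}{4+4+3+4},
\qquad
\sum_i (n_i-1) \;=\; n_T - r \;=\; 19-4 \;=\; 15 .
\]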

Running proc glm

proc glm data=a1;
  class design;
  model cases=design;
  means design;
  lsmeans design;
run;

Difference 1: Need to specify factor variables
Difference 2: Ask for mean estimates
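A side note not in the original slides: for a one-way design the same ANOVA table can also be produced with proc anova, which uses the same CLASS and MODEL syntax. The output discussed next is from the proc glm run above.

proc anova data=a1;
  class design;
  model cases=design;
  means design;
run;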

Output

Class Level Information
Class design: Levels 4, Values 1 2 3 4
Number of Observations Read: 19
Number of Observations Used: 19
Important to check these summaries!!!

SAS 9.3 default output for MEANS statement

MEANS statement output
(columns: Level of design, N, cases Mean, cases Std Dev)
Table of sample means and sample variances

SAS 9.3 default output for LSMEANS statement

LSMEANS statement output
(columns: design, cases LSMEAN, Standard Error, Pr > |t|; all four designs have Pr > |t| <.0001)
Provides estimates based on the model (i.e., constant variance)

Notation
Ȳ_i. = Σ_j Y_ij / n_i (sample mean for level i)
Ȳ.. = Σ_ij Y_ij / n_T (overall sample mean)
n_T = Σ_i n_i (total number of observations)

ANOVA Table

Source   df        SS                       MS
Model    r − 1     Σ_ij (Ȳ_i. − Ȳ..)²       SSR/df_R
Error    n_T − r   Σ_ij (Y_ij − Ȳ_i.)²      SSE/df_E
Total    n_T − 1   Σ_ij (Y_ij − Ȳ..)²       SST/df_T
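The table rests on the usual partitioning of the total sum of squares and its degrees of freedom (added here as a reminder):

\[
\underbrace{\sum_{ij}\bigl(Y_{ij}-\bar{Y}_{\cdot\cdot}\bigr)^2}_{SST}
\;=\;
\underbrace{\sum_{ij}\bigl(\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot}\bigr)^2}_{SSR}
\;+\;
\underbrace{\sum_{ij}\bigl(Y_{ij}-\bar{Y}_{i\cdot}\bigr)^2}_{SSE},
\qquad
n_T-1 \;=\; (r-1) + (n_T-r).
\]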

ANOVA SAS Output
(columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F; rows Model with DF 3, Error with DF 15, Corrected Total with DF 18; the Model line shows Pr > F <.0001)
(also reports R-Square, Coeff Var, Root MSE, and the cases Mean)

Expected Mean Squares
E(MSR) > E(MSE) when the group means are different
See KNNL p 694–698 for more details
In more complicated models, these tell us how to construct the F test
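For reference (the formulas are in the KNNL pages cited above but not reproduced on the slide), the one-way expected mean squares are

\[
E(MSE) \;=\; \sigma^2,
\qquad
E(MSR) \;=\; \sigma^2 + \frac{\sum_i n_i\,(\mu_i-\mu_\cdot)^2}{r-1},
\qquad
\mu_\cdot \;=\; \frac{\sum_i n_i\,\mu_i}{n_T},
\]

so E(MSR) = E(MSE) exactly when all the μ_i are equal.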

F test
F = MSR/MSE
H_0: μ_1 = μ_2 = … = μ_r
H_a: not all of the μ_i are equal
Under H_0, F ~ F(r−1, n_T−r)
Reject H_0 when F is large
Report the P-value
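Written out for the cereal example (r = 4, n_T = 19), the level-α decision rule is

\[
\text{reject } H_0 \;\text{ if }\; F^* > F(1-\alpha;\, r-1,\, n_T-r) \;=\; F(1-\alpha;\, 3,\, 15),
\qquad
\text{P-value} \;=\; P\{F(3,15) > F^*\}.
\]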

Maximum Likelihood Approach

proc glimmix data=a1;
  class design;
  model cases=design / dist=normal;
  lsmeans design;
run;

GLIMMIX Output

Model Information
Data Set: WORK.A1
Response Variable: cases
Response Distribution: Gaussian
Link Function: Identity
Variance Function: Default
Variance Matrix: Diagonal
Estimation Technique: Restricted Maximum Likelihood
Degrees of Freedom Method: Residual

GLIMMIX Output

Fit Statistics
-2 Res Log Likelihood: 84.12
AIC (smaller is better): 94.12
AICC (smaller is better):
BIC (smaller is better): 97.66
CAIC (smaller is better):
HQIC (smaller is better): 94.08
Pearson Chi-Square:
Pearson Chi-Square / DF: 10.55

GLIMMIX Output

Type III Tests of Fixed Effects
(columns: Effect, Num DF, Den DF, F Value, Pr > F; the design effect has Num DF 3, Den DF 15, and Pr > F <.0001)

Least Squares Means
(columns: design, Estimate, Standard Error, DF, t Value, Pr > |t|; all four designs have Pr > |t| <.0001)

Factor Effects Model
A reparameterization of the cell means model
A useful way of looking at more complicated models
Null hypotheses are easier to state
Y_ij = μ + τ_i + ε_ij
– the ε_ij are iid N(0, σ²)

Parameters
The parameters of the model are
– μ, τ_1, τ_2, …, τ_r
– σ²
The cell means model had r + 1 parameters
– r μ’s and σ²
The factor effects model has r + 2 parameters
– μ, the r τ’s, and σ²
– Cannot uniquely estimate all parameters

An example
Suppose r = 3; μ_1 = 10, μ_2 = 20, μ_3 = 30
What is an equivalent set of parameters for the factor effects model?
We need to have μ + τ_i = μ_i
– μ = 0, τ_1 = 10, τ_2 = 20, τ_3 = 30
– μ = 20, τ_1 = −10, τ_2 = 0, τ_3 = 10
– μ = 5000, τ_1 = −4990, τ_2 = −4980, τ_3 = −4970
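A one-line way to see why there are infinitely many equivalent choices (added for clarity): for any constant c,

\[
\mu = c, \qquad \tau_i = \mu_i - c \quad (i=1,\dots,r)
\]

satisfies μ + τ_i = μ_i, so the factor effects parameters are not identified without a constraint.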

Problem with factor effects?
These parameters are not estimable or not well defined (i.e., not unique)
There are many solutions to the least squares problem
The X΄X matrix for this parameterization does not have an inverse (perfect multicollinearity)
The parameter estimators here are flagged as biased (the “B” in SAS proc glm output)

Factor effects solution
Put a constraint on the τ_i
Common to assume Σ_i τ_i = 0
This effectively reduces the number of parameters by 1
Numerous other constraints are possible

Consequences
Regardless of the constraint, we always have μ_i = μ + τ_i
The constraint Σ_i τ_i = 0 implies
– μ = (Σ_i μ_i)/r (unweighted grand mean)
– τ_i = μ_i − μ (group effect)
The “unweighted” complicates things when the n_i are not all equal; see KNNL

Hypotheses
H_0: μ_1 = μ_2 = … = μ_r
H_a: not all of the μ_i are equal
are translated into
H_0: τ_1 = τ_2 = … = τ_r = 0
H_a: at least one τ_i is not 0

Estimates of parameters
With the constraint Σ_i τ_i = 0:
μ̂ = (Σ_i Ȳ_i.)/r
τ̂_i = Ȳ_i. − μ̂

Solution used by SAS
Recall, X΄X does not have an inverse
We can use a generalized inverse in its place
(X΄X)⁻ is the standard notation
There are many generalized inverses, each corresponding to a different constraint

Solution used by SAS
The (X΄X)⁻ used in proc glm corresponds to the constraint τ_r = 0
Recall that μ and the τ_i are not estimable
But the linear combinations μ + τ_i are estimable
These are estimated by the cell means

Cereal package example
Y is the number of cases of cereal sold
X is the design of the cereal package
i = 1 to 4 levels
j = 1 to n_i stores with design i

SAS coding for X
The CLASS statement generates r explanatory (indicator) variables
The ith explanatory variable is equal to 1 if the observation is from the ith group
In other words, the rows of X contain the indicator patterns
d1 d2 d3 d4 = 1 0 0 0 for design=1
d1 d2 d3 d4 = 0 1 0 0 for design=2
d1 d2 d3 d4 = 0 0 1 0 for design=3
d1 d2 d3 d4 = 0 0 0 1 for design=4
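To make the link to regression explicit (this sketch is not part of the original slides; the data set name a1dummy and the variables d1–d3 are made up for illustration), the same fit can be reproduced by building the indicator variables by hand and leaving design 4 as the reference level, which mirrors the τ_4 = 0 constraint used by proc glm:

data a1dummy;                 /* hypothetical copy of a1 with hand-made indicators */
  set a1;
  d1 = (design = 1);          /* 1 if design 1, else 0 */
  d2 = (design = 2);
  d3 = (design = 3);
run;

proc reg data=a1dummy;
  model cases = d1 d2 d3;     /* intercept = mean of design 4; slopes = differences from design 4 */
run;

The intercept then estimates the design 4 mean and each coefficient estimates the difference between that design's mean and the design 4 mean, matching the proc glm solution shown below.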

Some options

proc glm data=a1;
  class design;
  model cases=design / xpx inverse solution;
run;

Output

The X'X Matrix
(bordered matrix with rows and columns Int, d1, d2, d3, d4, cases)
Also contains X'Y (in the cases row and column)

Output

X'X Generalized Inverse (g2)
(bordered matrix with rows and columns Int, d1, d2, d3, d4, cases)

Output matrix
Actually, this bordered matrix is

[ (X΄X)⁻          (X΄X)⁻X΄Y            ]
[ Y΄X(X΄X)⁻       Y΄Y − Y΄X(X΄X)⁻X΄Y   ]

Parameter estimates are in the upper right corner; SSE is in the lower right corner (the last column on the previous page)

Parameter estimates

Parameter   Estimate   Std Err   t   Pr > |t|
Int           27.2 B                 <.0001
d1                 B                 <.0001
d2                 B                 <.0001
d3                 B
d4             0.0 B       .     .      .

Caution Message
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

Interpretation
If τ_r = 0 (in our case, τ_4 = 0), then the corresponding estimate should be zero
The intercept μ is estimated by the mean of the observations in group 4
Since μ + τ_i is the mean of group i, the τ_i are the differences between the mean of group i and the mean of group 4

Recall the means output
(columns: Level of design, N, Mean, Std Dev)

Parameter estimates based on means
μ̂ = Ȳ_4 = 27.2
τ̂_i = Ȳ_i − Ȳ_4 for i = 1, 2, 3
τ̂_4 = 27.2 − 27.2 = 0

Last slide
Read KNNL Chapter 16 up to …
We used the program topic20.sas to generate the output for today
Will focus more on the relationship between regression and one-way ANOVA in the next topic