Two-way ANOVA with significant interactions

Statistical Data Analysis - Lecture15 - 08/04/03

In the previous lecture we saw that the full model for a two-way ANOVA includes an interaction term.
Remember that an interaction term models the possible interaction of the two main effects: the effect of one factor on the response may depend on the level of the other.
If the interaction term is significant in our ANOVA table (i.e. there is evidence that it is non-zero), then we cannot say anything about the main effects in isolation, because they are confounded with the interaction.
The concept of confounding is important in statistics. Here it means that the two factors do not alter the response independently of each other. (The word "confounding" is also widely used for effects that the design of an experiment makes impossible to separate; factors whose effects depend on each other are more usually said to interact.)

Example
The data on the following slide come from an experiment carried out by a statistics class.
They planted 100 seeds in each of 48 boxes and applied two treatments:
The boxes were left uncovered or covered.
The students watered the seed boxes each day with a preset amount of water (coded levels 1 through 6).
So the 48 boxes were divided into 24 left uncovered and 24 covered. The watering levels were assigned in equal proportions, so 4 boxes out of each 24 received level 1, 4 received level 2, and so on.
Therefore the design is balanced (equal numbers of replicates, here 4, for each of the 2 x 6 = 12 treatment combinations).
At the end of two weeks the students counted how many seeds had germinated in each box.
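To make the layout concrete, the sketch below rebuilds the design described above (not the data) as a data frame and checks that it is balanced. The column names Light, Water and Rep are hypothetical; only the counts (2 covering levels, 6 watering levels, 4 boxes per combination) come from the slide.

```r
# Hypothetical reconstruction of the seed-box layout:
# 2 LIGHT levels x 6 WATER levels x 4 replicate boxes = 48 boxes.
design <- expand.grid(
  Rep   = 1:4,
  Water = factor(1:6),
  Light = factor(c("uncovered", "covered"))
)

nrow(design)                 # total number of boxes
table(design$Light)          # boxes per covering level

# Number of boxes in each LIGHT x WATER cell
cell_counts <- table(design$Light, design$Water)
cell_counts

# Balanced: every treatment combination has the same number of replicates
all(cell_counts == cell_counts[1, 1])
```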

Example
In this experiment:
The number of seeds that germinated is our response.
LIGHT and WATER are our two factors.
The factor LIGHT has two levels (uncovered/covered).
The factor WATER has six levels corresponding to the amount of water given.
Therefore our probability model is
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2),$$
where $Y_{ijk}$ is the number of seeds germinating in the $k$-th box with LIGHT level $i$ ($i = 1, 2$) and WATER level $j$ ($j = 1, \dots, 6$), $\alpha_i$ and $\beta_j$ are the main effects and $(\alpha\beta)_{ij}$ is the interaction.
In words we say that "the number of seeds that germinate depends on whether the box is covered or not and on the amount of water the box receives, plus some effect due to the interaction of these two factors. We expect the errors (residuals) in this model to be Normally distributed with mean zero and variance sigma squared."
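In R's formula notation the model above is written Count ~ Light * Water. The short sketch below, assuming the response and factors are named Count, Light and Water as in the R code later in these notes, shows that `*` expands to both main effects plus their interaction, and works out the degrees of freedom each term uses for I = 2 and J = 6 levels.

```r
# Light * Water is shorthand for Light + Water + Light:Water,
# i.e. the alpha_i, beta_j and (alpha beta)_ij terms of the model
f <- Count ~ Light * Water
term_labels <- attr(terms(f), "term.labels")
print(term_labels)   # the three terms: Light, Water and Light:Water

# Degrees of freedom used by each term with I = 2 and J = 6 levels
I <- 2
J <- 6
df_terms <- c(Light = I - 1, Water = J - 1, `Light:Water` = (I - 1) * (J - 1))
print(df_terms)      # 1, 5 and 5
```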

What do we know?
What do we know before we carry out this experiment?
Plants need light to live (to carry out photosynthesis).
Plants need water to live.
Plants do not survive (well) on water alone.
Plants do not survive at all on light alone.
Too much water is not good for plants that don't live in the water.
Therefore, we expect that there will be an interaction between the factors LIGHT and WATER.

Two-way interaction with ANOVA
[Data table: number of seeds germinated in each box, by LIGHT (uncovered/covered) and WATER level (1 to 6); not reproduced]

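Two-way ANOVA with missing values
From the table on the previous slide we can see that the experimenters did not get a result from the 4th uncovered replicate with watering level 5.
Some statistical packages have a little trouble with missing values or unbalanced designs. Minitab does; R does not. (R fits the model without complaint, but anova() reports sequential sums of squares, so with unbalanced data the main-effect rows depend on the order in which the factors enter the model.)
In general, an experiment with two factors with I and J levels respectively is balanced if every cell contains the same number of replicates, $n_{ij} = n/(IJ)$ for all $i$ and $j$, where $n$ is the total number of observations; in particular $n/(IJ)$ must be an integer. Here $n = 47$ and $47/12$ is not an integer, so the design is no longer balanced.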
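A minimal balance check, assuming a data frame with factor columns Light and Water (hypothetical names, since the original data file is not available). It rebuilds the planned 48-box layout, drops the one missing box described above, and confirms that n/(IJ) is then no longer an integer. The residual degrees of freedom n - IJ = 35 agree with the Residuals row of the ANOVA table on the next slides.

```r
# The design is balanced when every LIGHT x WATER cell has the same count
is_balanced <- function(df) {
  counts <- table(df$Light, df$Water)
  length(unique(as.vector(counts))) == 1
}

# Planned layout: 4 boxes for each of the 2 x 6 combinations
full <- expand.grid(Rep = 1:4, Water = factor(1:6),
                    Light = factor(c("uncovered", "covered")))

# Remove the 4th uncovered replicate at watering level 5
missing_one <- subset(full, !(Light == "uncovered" & Water == "5" & Rep == 4))

n <- nrow(missing_one)            # 47 observations
I <- nlevels(missing_one$Light)   # 2 levels
J <- nlevels(missing_one$Water)   # 6 levels
n / (I * J)                       # not an integer

is_balanced(full)                 # TRUE
is_balanced(missing_one)          # FALSE

# Residual degrees of freedom for the full interaction model
n - I * J                         # 35
```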

Two-way ANOVA in R
Read in the data:
seeds <- read.csv("seeds.csv")
Make sure the factors are coded as factors:
seeds$Light <- as.factor(seeds$Light)
seeds$Water <- as.factor(seeds$Water)
Fit the model (Light * Water gives both main effects and the interaction):
fit <- aov(Count ~ Light * Water, data = seeds)
Check the model with the diagnostic plots:
plot(fit)
Get the ANOVA table:
anova(fit)
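Because seeds.csv is not distributed with these notes, here is the same workflow run end to end on R's built-in ToothGrowth data set, swapped in purely as a stand-in: len is the response, supp is a two-level factor and dose has three levels, with 10 observations per combination. It shows the same steps (code the factors, fit with `*`, get the ANOVA table) on data anyone can load.

```r
# Stand-in example: ToothGrowth ships with base R (datasets package)
data(ToothGrowth)
tg <- ToothGrowth

# dose is stored as a number; treat it as a factor with 3 levels
tg$dose <- as.factor(tg$dose)

# Full two-way model: main effects plus interaction
fit_tg <- aov(len ~ supp * dose, data = tg)

# ANOVA table with Df, Sum Sq, Mean Sq, F value and Pr(>F)
tab <- anova(fit_tg)
print(tab)
```

With 60 observations in 2 x 3 = 6 cells, the residual degrees of freedom are 60 - 6 = 54.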

Analysis of Variance Table

Response: Count
             Df  Sum Sq Mean Sq  F value    Pr(>F)
Light         1     3.6     3.6   0.0857    0.7715
Water         5 28995.0  5799.0 136.7303 < 2.2e-16 ***
Light:Water   5  6017.8  1203.6  28.3779 2.174e-11 ***
Residuals    35  1484.4    42.4
---

From the ANOVA table we can see that the P-value for the test of the hypothesis that the interactions are zero is 2.174e-11, far below 0.001.
I.e. there is very strong evidence against the null hypothesis that the interactions are zero. This is what we expected to see.
However, do the model assumptions hold? Each response is a count of germinated seeds out of 100, so it is discrete and can only range from 0 (no seeds germinated) to 100 (all seeds germinated). Strictly, this is a clear violation of the assumption of Normality.
Does it matter?
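The F statistics and P-values in the table follow directly from the Df and Sum Sq columns: each Mean Sq is Sum Sq / Df, each F is that term's Mean Sq divided by the residual Mean Sq, and the P-value is the upper tail of an F distribution with (term Df, 35) degrees of freedom. The sketch below recomputes them from the numbers printed above; because those sums of squares are rounded, the Light F value comes out slightly different in the third significant figure.

```r
# Df and Sum Sq copied from the ANOVA table above
df_term <- c(Light = 1, Water = 5, `Light:Water` = 5)
ss_term <- c(Light = 3.6, Water = 28995.0, `Light:Water` = 6017.8)
df_res  <- 35
ss_res  <- 1484.4

ms_res  <- ss_res / df_res          # residual mean square (about 42.4)
ms_term <- ss_term / df_term        # mean square for each term
f_value <- ms_term / ms_res         # F = MS(term) / MS(residual)

# Upper-tail F probabilities give the P-values
p_value <- pf(f_value, df_term, df_res, lower.tail = FALSE)

cbind(Df = df_term, MeanSq = ms_term, F = f_value, P = p_value)
```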


Normality assumptions
The normal probability plot (norplot) of the residuals is approximately linear. Therefore the assumption of Normality, whilst not strictly true, is at least not violated too badly.
The least squares line from regressing the residuals on their Z-scores (normal scores) is y = -1.4E-15 + 5.56997x, or approximately y = 0 + 5.6x = 5.6x.
I.e. the residuals are approximately Normally distributed with mean 0 and standard deviation 5.6. Therefore +/- 3 sd = +/- 16.8.
We have one residual of -17.25 (z-score -17.25/5.6 = -3.08), so this is not too bad.
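The "regress residuals on Z-scores" check can be sketched in a few lines. Lacking the seed data, this uses the ToothGrowth model as a stand-in and takes normal scores from qnorm(ppoints(n)), one common choice of plotting positions (the norplot software used in the lecture may use slightly different ones). The intercept should be essentially zero and the slope estimates the residual standard deviation, just as the line y = -1.4E-15 + 5.56997x did for the seed data.

```r
# Stand-in model: ToothGrowth, two-way ANOVA with interaction
data(ToothGrowth)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
fit_tg <- aov(len ~ supp * dose, data = tg)

# Ordered residuals against expected standard normal order statistics
res <- sort(residuals(fit_tg))
z   <- qnorm(ppoints(length(res)))

# qqnorm(residuals(fit_tg)); qqline(residuals(fit_tg)) would draw the plot itself

# Least squares line: intercept near 0, slope estimates the residual sd
line_fit <- lm(res ~ z)
coef(line_fit)

sd_hat <- coef(line_fit)[[2]]
# Residuals beyond +/- 3 sd deserve a closer look
res[abs(res) > 3 * sd_hat]
```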


Pred-res plot
The pred-res plot (residuals against fitted values) is hard to interpret with only 4 observations per group.
We might be concerned about the water level 6 data: all the counts are zero, so those cells have no variability at all.
What happens when we analyse the experiment without it? Use the subset argument of aov (passed on to lm), e.g.
fit2 <- aov(Count ~ Light * Water, data = seeds, subset = Water != "6")
This makes little difference to the ANOVA table or the residual plots.
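Refitting without one level of a factor can be sketched on the ToothGrowth stand-in, with dropping dose 2 playing the role of dropping water level 6. Subsetting the data frame first and calling droplevels() removes the now-empty level, so it does not linger in tables or plots; the degrees of freedom then reflect the smaller design (2 x 2 = 4 cells, 40 observations).

```r
data(ToothGrowth)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Drop one level of the second factor and remove the empty level
tg2 <- droplevels(subset(tg, dose != "2"))

fit2_tg <- aov(len ~ supp * dose, data = tg2)
anova(fit2_tg)

# plot(fit2_tg, which = 1) draws the residuals vs fitted (pred-res) plot
```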

A graphical display of two-way data (interaction plots)
We can construct LSD plots for two-way data.
The process is just the same as for one-way data, but now we have a mean for each combination of the factor levels, i.e. $\bar{y}_{ij}$ for $i = 1, \dots, I$ and $j = 1, \dots, J$.
E.g. we have 2 levels for light and 6 levels for water, so we have 12 fitted means and standard deviations.
The degrees of freedom are now $\min(n_{ij} - 1)$ over all $i = 1, \dots, I$ and $j = 1, \dots, J$: 3 for our example when every cell has its 4 replicates (the cell with the missing value has only 3 observations).
However, it may be more informative to look at a plot of how the means vary with the combinations of the factors.
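The ingredients of a two-way LSD plot are the per-cell means, standard deviations and counts, plus the degrees of freedom min(n_ij - 1). A sketch on the ToothGrowth stand-in (2 x 3 = 6 cells of 10 rather than 12 cells of 4) computes them with tapply(); how the intervals are then drawn follows the one-way procedure from earlier lectures, which is not reproduced here.

```r
data(ToothGrowth)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# One mean, sd and count per supp x dose combination
cell_mean <- tapply(tg$len, list(tg$supp, tg$dose), mean)
cell_sd   <- tapply(tg$len, list(tg$supp, tg$dose), sd)
cell_n    <- tapply(tg$len, list(tg$supp, tg$dose), length)

cell_mean
cell_sd

# Degrees of freedom: smallest n_ij - 1 over all cells
df_lsd <- min(cell_n - 1)
df_lsd
```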

Interaction plots
Interaction plots show the mean for each combination of the levels of the factors.
The levels of one factor ($i = 1, \dots, I$) go on the x-axis and the cell means $\bar{y}_{ij}$ go on the y-axis. A point is then plotted for each cell mean.
It is traditional to join the means that share the same level $j$ of the other factor with a line, i.e. join up $\bar{y}_{1j}, \bar{y}_{2j}, \dots, \bar{y}_{Ij}$ for each $j$.
If the lines are parallel and do not cross, then there is little or no evidence of an interaction in the data.
If the lines cross or are clearly not parallel, then this is taken as evidence of an interaction; the ANOVA table tells us whether it is significant.
For the examples we're considering it makes no difference whether we use data means or fitted means, because in the full model with an interaction term the fitted value for each cell is that cell's sample mean.
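Base R draws interaction plots with interaction.plot(), whose arguments are the x-axis factor, the trace factor (one line per level) and the response; by default it plots cell means. The sketch uses the ToothGrowth stand-in and also checks the claim above that, under the full interaction model, fitted means and data means coincide.

```r
data(ToothGrowth)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# x-axis: levels of dose; one line per level of supp; y-axis: mean len
interaction.plot(x.factor = tg$dose, trace.factor = tg$supp,
                 response = tg$len,
                 xlab = "dose", ylab = "mean tooth length",
                 trace.label = "supp")

# Full interaction model: fitted value in each cell equals the cell mean
fit_tg <- aov(len ~ supp * dose, data = tg)
cell_means   <- tapply(tg$len, list(tg$supp, tg$dose), mean)
fitted_means <- tapply(fitted(fit_tg), list(tg$supp, tg$dose), mean)
all.equal(cell_means, fitted_means)
```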


Interpretation
We know from our ANOVA table that the interaction is significant. This is represented by the crossing (non-parallel) lines in the interaction plot.
We can see that the uncovered boxes have more successful germinations with less water than the covered boxes.
We can see that water level 6 is fatal regardless of light.
We can see that light is less important than water, in that we can achieve similar germination levels with more water and no light.
It looks like the covered boxes may be more robust to water level.
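When the interaction is significant, a common way to put statements like these into numbers is to compare the levels of one factor separately at each level of the other (simple effects). A sketch on the ToothGrowth stand-in computes the OJ minus VC difference in mean response at each dose; for the seed data the analogue would be uncovered minus covered at each water level. A difference that changes size or sign across levels is what the crossing lines are showing.

```r
data(ToothGrowth)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Cell means: rows are supp levels, columns are dose levels
cell_means <- tapply(tg$len, list(tg$supp, tg$dose), mean)

# Effect of supp (OJ minus VC) at each dose separately
supp_effect_by_dose <- cell_means["OJ", ] - cell_means["VC", ]
round(supp_effect_by_dose, 2)
```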