Stat
Today: Multiple comparisons, diagnostic checking, an example
After these notes, we will have looked at (skip figures 1.2 and 1.3, last two paragraphs of section 1.3), 1.6 (skip matrix notation and constraints), 1.7 (Tukey method only) and 1.9 (ignore H matrix notation on page 35), 2.1, 2.2
We will not do 1.5 or 1.8
Assignment 1:
Multiple Comparisons
In the previous example, we saw that there was a significant treatment effect. So what?
If an ANOVA is conducted and the analysis suggests that there is a significant treatment effect, then a reasonable question to ask is: which treatments differ from one another?
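Multiple Comparisons
We would like to see whether there is a difference between treatments i and j.
A two-sample t statistic can be used to do this: $t_{ij} = \dfrac{\bar{y}_{j\cdot} - \bar{y}_{i\cdot}}{\hat{\sigma}\sqrt{1/n_j + 1/n_i}}$, where $\hat{\sigma}^2$ is the ANOVA mean squared error, $n_i$ is the number of observations on treatment i, k is the number of treatments and N is the total number of observations.
For testing $H_0: \tau_i = \tau_j$ against $H_1: \tau_i \neq \tau_j$, reject $H_0$ if $|t_{ij}| > t_{N-k,\,\alpha/2}$.
Many of these tests are performed, one for each pair of treatments.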
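The pairwise comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are hypothetical (not the example from the notes), the function names are made up for this sketch, and the standard deviation is pooled over all treatments as the square root of the ANOVA mean squared error. The critical value still has to come from a t table with N - k degrees of freedom.

```python
from math import sqrt

def pooled_sigma(groups):
    """Pooled standard deviation: square root of SSE / (N - k)."""
    n_total = sum(len(g) for g in groups)
    sse = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        sse += sum((y - mean) ** 2 for y in g)
    return sqrt(sse / (n_total - len(groups)))

def pairwise_t(groups, i, j):
    """t statistic for treatment j minus treatment i (0-based indices)."""
    mean_i = sum(groups[i]) / len(groups[i])
    mean_j = sum(groups[j]) / len(groups[j])
    scale = pooled_sigma(groups) * sqrt(1 / len(groups[j]) + 1 / len(groups[i]))
    return (mean_j - mean_i) / scale

# Hypothetical data: 3 treatments, 4 observations each (not from the notes).
groups = [[10.0, 12.0, 11.0, 13.0],
          [14.0, 15.0, 13.0, 16.0],
          [11.0, 10.0, 12.0, 11.0]]
for i in range(len(groups)):
    for j in range(i + 1, len(groups)):
        print(f"treatments {i + 1} and {j + 1}: t = {pairwise_t(groups, i, j):.3f}")
```

With k treatments this loop produces k(k - 1)/2 statistics, which is why the number of tests grows quickly.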
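Multiple Comparisons
Many of these tests are performed, so the chance of at least one false rejection grows with the number of comparisons.
The error rate over the whole set of tests must be controlled.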
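A rough illustration of why the error rate needs controlling: if each of the k(k - 1)/2 pairwise tests is run at level alpha, the chance of at least one false rejection can be far above alpha. The sketch below assumes the tests are independent, which pairwise comparisons are not (they share data), so the numbers are only indicative. The function names are hypothetical.

```python
def number_of_pairs(k):
    """Number of distinct treatment pairs among k treatments."""
    return k * (k - 1) // 2

def chance_of_false_rejection(k, alpha=0.05):
    """P(at least one false rejection), ASSUMING independent tests."""
    return 1 - (1 - alpha) ** number_of_pairs(k)

for k in (2, 3, 5, 10):
    rate = chance_of_false_rejection(k)
    print(f"k = {k:2d}: {number_of_pairs(k):2d} tests, error rate about {rate:.3f}")
```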
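Tukey Method
Tests: declare treatments i and j different if $|t_{ij}| > q_{k,\,N-k,\,\alpha}/\sqrt{2}$, where $t_{ij}$ is the two-sample t statistic for treatments i and j and $q_{k,\,N-k,\,\alpha}$ is the upper $\alpha$ point of the studentized range distribution with parameters k and N - k.
Confidence interval: $\bar{y}_{j\cdot} - \bar{y}_{i\cdot} \pm \dfrac{1}{\sqrt{2}}\, q_{k,\,N-k,\,\alpha}\, \hat{\sigma} \sqrt{1/n_j + 1/n_i}$ for $\tau_j - \tau_i$, holding simultaneously for all pairs, with $\hat{\sigma}^2$ the ANOVA mean squared error.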
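A minimal sketch of Tukey intervals, assuming the usual form: difference in treatment means plus or minus (q / sqrt(2)) * sigma_hat * sqrt(1/n_j + 1/n_i), where q is the studentized range critical value. The summary statistics are hypothetical, and the critical value is passed in by hand; the value 3.95 used here is the approximate 5% tabled value for 3 treatments and 9 error degrees of freedom and should be checked against a table.

```python
from math import sqrt

def tukey_interval(mean_i, mean_j, n_i, n_j, sigma_hat, q_crit):
    """Simultaneous interval for (treatment j effect) - (treatment i effect)."""
    diff = mean_j - mean_i
    half_width = (q_crit / sqrt(2)) * sigma_hat * sqrt(1 / n_j + 1 / n_i)
    return diff - half_width, diff + half_width

def differs(interval):
    """Treatments are declared different when the interval excludes zero."""
    low, high = interval
    return low > 0 or high < 0

# Hypothetical summary: k = 3 treatments, 4 observations each, N - k = 9.
means = [11.5, 14.5, 11.0]
sigma_hat = sqrt(12.0 / 9)   # square root of a hypothetical mean squared error
q_crit = 3.95                # approximate tabled value; verify before use
for i in range(3):
    for j in range(i + 1, 3):
        ci = tukey_interval(means[i], means[j], 4, 4, sigma_hat, q_crit)
        print(i + 1, j + 1, tuple(round(v, 2) for v in ci), differs(ci))
```

Checking whether an interval excludes zero is equivalent to the Tukey test for that pair.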
Back to Example
Diagnostic Checking – Residual Analysis
To support the assumptions on which the analysis is based, we need to check for
–effects not captured by the model
–unequal variances
–non-Normality
–sequence effects
This should be done before hypothesis testing and multiple comparisons.
The data plot (limited data) shows no strong evidence of non-Normality or unequal variances.
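Diagnostic Checking
ANOVA model: $y_{ij} = \mu + \tau_i + e_{ij}$
Predicted response: $\hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{y}_{i\cdot}$, the mean of the observations on treatment i
Residual: $r_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i\cdot}$
The residual estimates the error $e_{ij}$.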
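In the one-way model the predicted response for an observation is its treatment mean, so each residual is the observation minus that mean. A small sketch with hypothetical data (the function name is made up for illustration):

```python
def fitted_and_residuals(groups):
    """Return per-treatment fitted values and residuals for a one-way layout."""
    fitted, residuals = [], []
    for g in groups:
        mean = sum(g) / len(g)              # predicted response for this treatment
        fitted.append([mean] * len(g))
        residuals.append([y - mean for y in g])
    return fitted, residuals

# Hypothetical data: 3 treatments, 4 observations each (not from the notes).
groups = [[10.0, 12.0, 11.0, 13.0],
          [14.0, 15.0, 13.0, 16.0],
          [11.0, 10.0, 12.0, 11.0]]
fitted, residuals = fitted_and_residuals(groups)
for f, r in zip(fitted, residuals):
    print("fitted", f[0], "residuals", r)
```

The residuals within each treatment always sum to zero, which is a quick check on the arithmetic.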
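Diagnostic Plots
Errors are assumed to be normally distributed
–Useful plot: normal probability plot of the residuals
Errors are assumed to be independent
–Useful plot: residuals against the time sequence in which the data were collected
Equal variances in each group
–Useful plot: residuals against the predicted values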
Normality Check Dot plot or histogram of residuals Normal probability plot of residuals (via software or by hand - see class handout)
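A normal probability plot can be built by hand by pairing the ordered residuals with normal scores. The sketch below assumes plotting positions (i - 0.5)/n; the class handout may use a different convention. The residuals are hypothetical and the function names are illustrative.

```python
from statistics import NormalDist

def normal_scores(n):
    """Standard normal quantiles at plotting positions (i - 0.5) / n."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def probability_plot_points(residuals):
    """(normal score, ordered residual) pairs; plot y against x."""
    ordered = sorted(residuals)
    return list(zip(normal_scores(len(ordered)), ordered))

# Hypothetical residuals from a one-way fit.
residuals = [-1.5, 0.5, -0.5, 1.5, -0.5, 0.5, -1.5, 1.5, 0.0, -1.0, 1.0, 0.0]
for score, r in probability_plot_points(residuals):
    print(f"{score:6.2f}  {r:5.2f}")
```

If the points fall close to a straight line, there is no strong evidence against Normality.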
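Independence Check
Plot the residuals in the time sequence in which the data were collected.
The x-axis denotes the sequence; the y-axis denotes the residual values.
Should observe: no systematic pattern, such as a trend or long runs of residuals with the same sign.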
Independence Check Suppose the sequence of the observations (going across rows from top to bottom in the tabled data) is 1, 2, 11, 9, 5, 7, 6, 3, 4, 12, 10, 8
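To plot residuals in time order, the residuals (stored in table order) need to be rearranged by their sequence numbers. The sketch below assumes the k-th number in the list above is the collection position of the k-th tabled observation; the residual values are hypothetical placeholders, not the residuals from the example.

```python
run_order = [1, 2, 11, 9, 5, 7, 6, 3, 4, 12, 10, 8]

def in_time_sequence(residuals, run_order):
    """Return (sequence number, residual) pairs sorted by sequence number."""
    if sorted(run_order) != list(range(1, len(residuals) + 1)):
        raise ValueError("run_order must be a permutation of 1..n")
    return sorted(zip(run_order, residuals))

# Hypothetical residuals, listed in the same order as the tabled data.
residuals = [-1.5, 0.5, -0.5, 1.5, -0.5, 0.5, -1.5, 1.5, 0.0, -1.0, 1.0, 0.0]
for seq, r in in_time_sequence(residuals, run_order):
    print(seq, r)
```

The printed pairs are the x and y coordinates of the sequence plot.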
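Equal Variances
A useful plot is: residuals against the predicted values (one column of points per treatment).
Should observe: roughly the same vertical spread of residuals in every group.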
Equal Variances
Comments The F-test is fairly robust – it is not very sensitive to departures from the assumption of Normal distributions. Often, simple transformations, such as the logarithm or square root, can make the Normal distribution assumption and the equal variance assumption more appropriate (Chapter 2)
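As a small illustration of how a transformation can help, consider hypothetical data in which the spread grows in proportion to the mean. On the raw scale the group standard deviations differ a hundredfold; after taking logarithms they agree. The data are constructed for this purpose and are not from the notes.

```python
from math import log
from statistics import stdev

def group_sds(groups):
    """Sample standard deviation of each treatment group."""
    return [stdev(g) for g in groups]

# Hypothetical data: each group is 10 times the previous one.
raw = [[10.0, 12.0, 14.0], [100.0, 120.0, 140.0], [1000.0, 1200.0, 1400.0]]
logged = [[log(y) for y in g] for g in raw]

print("raw scale SDs:", [round(s, 3) for s in group_sds(raw)])
print("log scale SDs:", [round(s, 3) for s in group_sds(logged)])
```

Real data will rarely behave this cleanly, so the residual plots should be rechecked after any transformation.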
Summary: Completely Randomized Design, One-Way ANOVA
Method: Random assignment of treatments to experimental units
ANOVA: Compare variation among treatments to variation within treatments to assess evidence of a difference among treatments
Investigate and identify differences among treatments, if any.
Act on the findings.
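The comparison of variation among treatments with variation within treatments can be sketched as follows. The data are hypothetical and the function name is illustrative; the F statistic would be compared with an F table on (k - 1, N - k) degrees of freedom.

```python
def one_way_anova(groups):
    """Sums of squares, mean squares and F statistic for a one-way layout."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_treatment = 0.0   # variation among treatment means
    ss_error = 0.0       # variation within treatments
    for g in groups:
        mean = sum(g) / len(g)
        ss_treatment += len(g) * (mean - grand_mean) ** 2
        ss_error += sum((y - mean) ** 2 for y in g)
    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n_total - k)
    return {"ss_treatment": ss_treatment, "ss_error": ss_error,
            "ms_treatment": ms_treatment, "ms_error": ms_error,
            "F": ms_treatment / ms_error}

# Hypothetical data: 3 treatments, 4 observations each (not from the notes).
groups = [[10.0, 12.0, 11.0, 13.0],
          [14.0, 15.0, 13.0, 16.0],
          [11.0, 10.0, 12.0, 11.0]]
table = one_way_anova(groups)
print({name: round(value, 3) for name, value in table.items()})
```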
Comment: One-Way Model
The one-way model, $y_{ij} = \mu + \tau_i + e_{ij}$, $e_{ij} \sim \mathrm{NID}(0, \sigma^2)$, can be and is applied to data obtained in ways other than a completely randomized design.
Example: starting salaries for MBAs at different companies. Company is not a treatment that is applied to experimental units.
Analyzing the data according to the above model can answer whether apparent differences between companies are real or could be just due to chance. The randomness involved comes from the randomness of the hiring and salary-determination processes, not from the random assignment of treatments to experimental units.
General Linear Model
The ANOVA model can be viewed as a special case of the general linear model, or regression model.
Suppose we have a response, y, which is thought to be related to p predictors (sometimes called explanatory variables or regressors).
Predictors: $x_1, x_2, \ldots, x_p$
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + e$
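To see how a one-way ANOVA fits into this framework, consider three treatments with means written as $\mu + \tau_i$ and code treatment membership with two indicator predictors. The baseline coding below (treatment 1 as reference) is one common choice, assumed here for illustration; other codings give the same fitted values.

```latex
\begin{aligned}
x_1 &= 1 \text{ if the observation is from treatment 2, else } 0, \\
x_2 &= 1 \text{ if the observation is from treatment 3, else } 0, \\
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e, \\
\text{so}\quad \beta_0 &= \mu + \tau_1, \qquad
\beta_1 = \tau_2 - \tau_1, \qquad
\beta_2 = \tau_3 - \tau_1 .
\end{aligned}
```

Under this coding, testing for no treatment effect is the same as testing $\beta_1 = \beta_2 = 0$.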
Example: Rainfall (Exercise 2.16)
In winter, a plastic rain gauge cannot be used to collect precipitation because it will freeze and crack. Instead, metal cans are used to collect snowfall, and the snow is allowed to melt indoors. The water is then poured into a plastic rain gauge and a measurement is recorded. An estimate of snowfall is obtained by multiplying this measurement by a fixed factor.
One observer questions this and decides to collect data to test the validity of this approach.
For each rainfall in a summer, she measures the rainfall (i) using a plastic rain gauge and (ii) using a metal can.
What is the current model being used?
Example: Rainfall (Exercise 2.16)
There seems to be a linear relationship.
We will use regression to establish the linear relationship between x and y.
What should the slope be?
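A minimal least squares sketch for fitting a straight line to paired readings. The x and y values below are hypothetical stand-ins (the exercise data are not reproduced here, and which instrument plays the role of x is left as in the exercise); the function name is illustrative.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical paired readings for five rainfalls.
x = [0.2, 0.5, 0.9, 1.4, 2.0]
y = [0.11, 0.22, 0.41, 0.60, 0.90]
intercept, slope = least_squares_line(x, y)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

The fitted slope can then be compared with the multiplier used in the current procedure, and the intercept with zero.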
Example: Rainfall (Exercise 2.16)