STA305 Week 2: The One-Factor Model

Slide 1: The One-Factor Model
A statistical model is used to describe data. It is an equation that shows the dependence of the response variable on the levels of the treatment factors.
Let $Y_{ij}$ be a random variable that represents the response obtained on the $j$-th observation of the $i$-th treatment.
Let $\mu$ denote the overall expected response.
The expected response for an experimental unit in the $i$-th treatment group is $\mu_i = \mu + \tau_i$.
$\tau_i$ is the deviation of the $i$-th treatment mean from the overall mean; it is referred to as the effect of treatment $i$.

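Slide 2
The model is
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, a, \quad j = 1, \dots, r_i,$$
where $\varepsilon_{ij}$ is the deviation of the individual's response from the treatment group mean. $\varepsilon_{ij}$ is known as the random or experimental error.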

Slide 3: Fixed Effects versus Random Effects
In some cases the treatments are specifically chosen by the experimenter from all possible treatments. The conclusions drawn from such an experiment apply only to these treatments and cannot be generalized to treatments not included in the experiment. This is called a fixed effects model.
In other cases, the treatments included in the experiment can be regarded as a random selection from the set of all possible treatments. In this situation, conclusions based on the experiment can be generalized to other treatments.
When the treatments are a random sample, the treatment effects $\tau_i$ are random variables. This model is called a random effects model or a components of variance model.
The random effects model will be studied after the fixed effects model.

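Slide 4: More about the Fixed Effects Model
As specified in slide 2, the model is
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij},$$
where the $\varepsilon_{ij}$ are i.i.d. with distribution $N(0, \sigma^2)$.
It follows that the response of experimental unit $j$ in treatment group $i$, $Y_{ij}$, is normally distributed with
$$E(Y_{ij}) = \mu + \tau_i, \qquad \mathrm{Var}(Y_{ij}) = \sigma^2.$$
In other words, the $Y_{ij}$ are independent with $Y_{ij} \sim N(\mu + \tau_i, \sigma^2)$.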

Slide 5: Treatment Effects
Recall that treatment effects have been defined as deviations from the overall mean, and so the model can be parameterized so that
$$\sum_{i=1}^{a} r_i \tau_i = 0.$$
In the special case where $r_1 = r_2 = \cdots = r_a = r$, this condition reduces to
$$\sum_{i=1}^{a} \tau_i = 0.$$
The hypothesis that there is no treatment effect can be expressed mathematically as:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$
$H_a:$ not all $\mu_i$ are equal
This can be expressed equivalently in terms of the $\tau_i$:
$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$
$H_a:$ not all $\tau_i$ are equal to 0

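Slide 6: "Dot" Notation
"Dot" notation will be used to denote treatment and overall totals, as well as treatment and overall means.
The sum of all observations in the $i$-th treatment group is denoted
$$Y_{i\cdot} = \sum_{j=1}^{r_i} Y_{ij}.$$
Similarly, the sum of all responses in all treatment groups is denoted
$$Y_{\cdot\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{r_i} Y_{ij}.$$
The treatment and overall means are
$$\bar{Y}_{i\cdot} = \frac{Y_{i\cdot}}{r_i}, \qquad \bar{Y}_{\cdot\cdot} = \frac{Y_{\cdot\cdot}}{n}, \qquad n = \sum_{i=1}^{a} r_i.$$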
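The dot notation maps directly onto sums and averages over a grouped data set. The sketch below is a Python illustration only (the course itself uses SAS); the scores are hypothetical and are not the data from the lecture.

```python
# Hypothetical scores for a = 3 treatment groups (NOT the course data).
data = {
    1: [70, 72, 68, 74, 66],
    2: [80, 84, 76, 82, 78],
    3: [75, 73, 77, 79, 71],
}

# Treatment totals Y_i. and treatment means Ybar_i.
totals = {i: sum(ys) for i, ys in data.items()}
means = {i: totals[i] / len(ys) for i, ys in data.items()}

# Overall total Y_.. and overall mean Ybar_.., with n = r_1 + ... + r_a
n = sum(len(ys) for ys in data.values())
grand_total = sum(totals.values())
grand_mean = grand_total / n

print("treatment totals:", totals)
print("treatment means:", means)
print("overall total:", grand_total, "overall mean:", grand_mean)
```

With these made-up scores the group means are 70, 80 and 75, and the overall mean is 75.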

Slide 7: Rationale for Analysis of Variance
Consider all of the data from the $a$ treatment groups as a whole. The variability in the data may come from two sources:
1) treatment means differ from the overall mean; this is called between-group variability.
2) within a given treatment group, individual observations differ from the group mean; this is called within-group variability.

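Slide 8: Total Sum of Squares
Total variation in the data set as a whole is measured by the total sum of squares. It is given by
$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{r_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2.$$
Each deviation from the overall sample mean can be expressed as the sum of 2 parts:
1) deviation of the observation from the group mean.
2) deviation of the group mean from the overall mean.
In other words,
$$Y_{ij} - \bar{Y}_{\cdot\cdot} = (Y_{ij} - \bar{Y}_{i\cdot}) + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}).$$
The $SS_T$ can then be written as
$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{r_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 + \sum_{i=1}^{a} r_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = SS_E + SS_{Treat}.$$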
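The step from the two-part deviation to the two sums of squares relies on a cross term vanishing. A sketch of that algebra, using the dot notation of slide 6 (the names $SS_E$ and $SS_{Treat}$ follow the later slides):

```latex
\begin{align*}
SS_T &= \sum_{i=1}^{a}\sum_{j=1}^{r_i}
        \bigl[(Y_{ij}-\bar{Y}_{i\cdot}) + (\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot})\bigr]^2 \\
     &= \sum_{i=1}^{a}\sum_{j=1}^{r_i} (Y_{ij}-\bar{Y}_{i\cdot})^2
      + \sum_{i=1}^{a} r_i(\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot})^2
      + 2\sum_{i=1}^{a} (\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot})
        \sum_{j=1}^{r_i} (Y_{ij}-\bar{Y}_{i\cdot}) \\
     &= SS_E + SS_{Treat},
\end{align*}
% the cross term is zero because, within each group,
% \sum_{j=1}^{r_i} (Y_{ij}-\bar{Y}_{i\cdot}) = Y_{i\cdot} - r_i\bar{Y}_{i\cdot} = 0.
```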

Slide 9: Expected Sums of Squares
Finding the expected values of the sums of squares for error and treatment will lead us to a test of the hypothesis of no treatment effect, i.e.,
$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$
We start by finding the expected value of $SS_E$…
We continue with the expected value of $SS_{Treat}$…
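One way the two expectations can be worked out is sketched below. It assumes the fixed effects model with i.i.d. $N(0, \sigma^2)$ errors and the constraint $\sum_i r_i \tau_i = 0$; the symbols $\bar{\varepsilon}_{i\cdot}$ and $\bar{\varepsilon}_{\cdot\cdot}$ (group and overall averages of the errors) are introduced here for the derivation only.

```latex
\begin{align*}
Y_{ij}-\bar{Y}_{i\cdot} &= \varepsilon_{ij}-\bar{\varepsilon}_{i\cdot}
\;\Longrightarrow\;
E(SS_E) = \sum_{i=1}^{a} E\Bigl[\sum_{j=1}^{r_i}
          (\varepsilon_{ij}-\bar{\varepsilon}_{i\cdot})^2\Bigr]
        = \sum_{i=1}^{a} (r_i-1)\sigma^2 = (n-a)\sigma^2, \\[4pt]
\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot} &= \tau_i + \bar{\varepsilon}_{i\cdot}-\bar{\varepsilon}_{\cdot\cdot}
\;\Longrightarrow\;
E(SS_{Treat}) = \sum_{i=1}^{a} r_i\tau_i^2
              + E\Bigl[\sum_{i=1}^{a} r_i\bar{\varepsilon}_{i\cdot}^{\,2}
              - n\,\bar{\varepsilon}_{\cdot\cdot}^{\,2}\Bigr] \\
 &\phantom{= \tau_i + \bar{\varepsilon}_{i\cdot}-\bar{\varepsilon}_{\cdot\cdot}
\;\Longrightarrow\; E(SS_{Treat})}
 = \sum_{i=1}^{a} r_i\tau_i^2 + a\sigma^2 - \sigma^2
 = (a-1)\sigma^2 + \sum_{i=1}^{a} r_i\tau_i^2 .
\end{align*}
% uses E(\bar{\varepsilon}_{i\cdot}^2) = \sigma^2 / r_i,
% E(\bar{\varepsilon}_{\cdot\cdot}^2) = \sigma^2 / n, and the fact that the
% cross term between \tau_i and the errors has expectation zero.
```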

Slide 10: Mean Squares
As we have seen in the calculation above, $MS_E = SS_E/(n - a)$ is an unbiased estimator of $\sigma^2$. $MS_E$ is called the mean square for error.
The degrees of freedom associated with $SS_E$ are $n - a$, and it follows that $E(MS_E) = \sigma^2$.
The mean square for treatment is defined to be $MS_{Treat} = SS_{Treat}/(a - 1)$.
The expected value of $MS_{Treat}$ is
$$E(MS_{Treat}) = \sigma^2 + \frac{\sum_{i=1}^{a} r_i \tau_i^2}{a - 1}.$$

Slide 11: Hypothesis Testing
Recall that our goal is to test whether there is a treatment effect. The hypothesis of interest is
$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$
$H_a:$ not all $\tau_i$ are equal to 0
Notice that if $H_0$ is true, then $E(MS_{Treat}) = \sigma^2 = E(MS_E)$.
On the other hand, if $H_0$ is false, then at least one $\tau_i \neq 0$, in which case $\sum_i r_i \tau_i^2 > 0$ and so $E(MS_{Treat}) > E(MS_E)$.
On average, then, the ratio $MS_{Treat}/MS_E$ should be small (close to 1) if $H_0$ is true, and large otherwise. We use this to develop a formal test.

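Slide 12: Cochran's Theorem
Let $Z_1, Z_2, \dots, Z_n$ be i.i.d. $N(0, 1)$. Suppose that
$$\sum_{i=1}^{n} Z_i^2 = Q_1 + Q_2 + \cdots + Q_s,$$
where $s \le n$ and $Q_j$ has $v_j$ d.f. A necessary and sufficient condition for the $Q_j$ to be independent of one another, and for $Q_j \sim \chi^2(v_j)$, is that $n = v_1 + v_2 + \cdots + v_s$.
Cochran's theorem implies that, when $H_0$ is true, $SS_E/\sigma^2$ and $SS_{Treat}/\sigma^2$ have independent $\chi^2$ distributions with $n - a$ and $a - 1$ d.f., respectively.
Recall: if $X_1$ and $X_2$ are two independent random variables with $\chi^2$ distributions on $v_1$ and $v_2$ d.f., then
$$\frac{X_1/v_1}{X_2/v_2} \sim F(v_1, v_2).$$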

Slide 13: Hypothesis Test for Treatment Effects
Cochran's theorem and the result just stated provide the tools to construct a formal hypothesis test of no treatment effects.
The hypotheses again are:
$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$
$H_a:$ not all $\tau_i$ are equal to 0
The test statistic is $F_{obs} = MS_{Treat}/MS_E$.
Note that if $H_0$ is true, then $F_{obs} \sim F(a - 1, n - a)$, so the P-value is $P(F(a - 1, n - a) > F_{obs})$.
We reject $H_0$ in favor of $H_a$ if P-value $< \alpha$.
Alternatively, reject $H_0$ in favor of $H_a$ if $F_{obs} > F_\alpha(a - 1, n - a)$, where $F_\alpha(a - 1, n - a)$ is the upper $100\alpha\%$ point of the $F(a - 1, n - a)$ distribution.
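The test statistic can be computed directly from the sums of squares. The following is a minimal Python sketch, not the course's SAS workflow; the scores are hypothetical, the function name `one_way_anova` is illustrative, and the critical value is the tabulated upper 5% point of $F(2, 12)$.

```python
def one_way_anova(groups):
    """Return (ss_treat, ss_error, df_treat, df_error, f_obs) for a list of samples."""
    a = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_treat = ss_treat / (a - 1)
    ms_error = ss_error / (n - a)
    return ss_treat, ss_error, a - 1, n - a, ms_treat / ms_error


# Hypothetical scores for three groups of five (NOT the course data).
groups = [
    [70, 72, 68, 74, 66],
    [80, 84, 76, 82, 78],
    [75, 73, 77, 79, 71],
]
F_CRIT = 3.89  # tabulated F_0.05(2, 12)

ss_treat, ss_error, df_treat, df_error, f_obs = one_way_anova(groups)
print(f"SS_Treat={ss_treat}, SS_E={ss_error}, F_obs={f_obs}")
print("reject H0" if f_obs > F_CRIT else "do not reject H0")
```

For these made-up scores the sketch gives $SS_{Treat} = 250$, $SS_E = 120$ and $F_{obs} = 12.5$, which exceeds the critical value. A P-value would need an F distribution function from a statistics library, which is left out here.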

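Slide 14: Analysis of Variance Table
The results of the calculations and the hypothesis testing are best summarized in an analysis of variance table.
The ANOVA table is given below.

Source       d.f.     SS          MS                               F
Treatments   a - 1    SS_Treat    MS_Treat = SS_Treat / (a - 1)    MS_Treat / MS_E
Error        n - a    SS_E        MS_E = SS_E / (n - a)
Total        n - 1    SS_T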

Slide 15: Estimable Functions of Parameters
A function of the model parameters is estimable if and only if it can be written as the expected value of a linear combination of the response variables.
In other words, every estimable function is of the form
$$E\Bigl(\sum_{i=1}^{a} \sum_{j=1}^{r_i} c_{ij} Y_{ij}\Bigr),$$
where the $c_{ij}$ are constants.
It can be shown from the previous sections that $\mu$, $\mu_i$, and $\sigma^2$ are estimable.

Slide 16: Example - Effectiveness of Three Methods for Teaching a Programming Language
A study was conducted to determine whether there is any difference in the effectiveness of 3 methods of teaching a particular programming language.
The factor levels (treatments) are the three teaching methods:
1) on-line tutorial
2) personal attention of instructor plus hands-on experience
3) personal attention of instructor, but no hands-on experience
Replication and randomization: 5 volunteers were randomly allocated to each of the 3 teaching methods, for a total of 15 study participants.
Response variable: after the programming instruction, a test was administered to determine how well the students had learned the programming language.
Research question: do the data provide any evidence that the instruction methods differ with respect to test score?
The data and the solutions are…

Slide 17: Conducting an ANOVA in SAS
There are several procedures in SAS that can be used to do an analysis of variance. PROC GLM (for general linear models) will be used in this course.
To do the analysis for the example on slide 16, start by creating a SAS dataset:

    data teach ;
      input method score ;
      cards ;
    [data lines not preserved in the transcript]
    ;
    run ;

Slide 18
Use this dataset to conduct an ANOVA using the following SAS code:

    proc glm data = teach ;
      class method ;
      model score = method / ss3 ;
    run ;
    quit ;

The output produced by this procedure is given in the next slide.

Slide 19
[SAS output for the teaching example; not preserved in the transcript]

Slide 20: Estimating Model Parameters
The ANOVA indicates whether there is a treatment effect; however, it doesn't provide any information about individual treatments or how treatments compare with each other.
To better understand the outcome of the experiment, it is useful to estimate the mean response for each treatment group.
It is also useful to obtain an estimate of how much variability there is within each treatment group.
This involves estimating model parameters.

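Slide 21: Variability
Recall that on slides 9 and 10 we showed that $MS_E$ is an unbiased estimator of $\sigma^2$.
Further, Cochran's theorem was used to show that $SS_E/\sigma^2 \sim \chi^2(n - a)$.
We can use this result to calculate a $100(1 - \alpha)\%$ confidence interval for $\sigma^2$. The CI is given by
$$\left( \frac{SS_E}{\chi^2_{\alpha/2}(n - a)}, \; \frac{SS_E}{\chi^2_{1 - \alpha/2}(n - a)} \right),$$
where $\chi^2_{\alpha/2}(n - a)$ and $\chi^2_{1 - \alpha/2}(n - a)$ are the upper and lower $100(\alpha/2)\%$ points of the $\chi^2$ distribution with $n - a$ d.f., respectively.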
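As a numerical illustration of the interval for $\sigma^2$, the Python sketch below plugs in hypothetical values ($SS_E = 120$ on 12 d.f., not the course data) together with tabulated $\chi^2$ percentage points for 12 d.f. The function name `sigma2_interval` is illustrative.

```python
# Hypothetical inputs (NOT the course data): SS_E = 120 on n - a = 12 d.f.
ss_error = 120.0
df_error = 12

# Tabulated chi-square percentage points for 12 d.f. and alpha = 0.05
CHI2_UPPER = 23.337  # upper 2.5% point
CHI2_LOWER = 4.404   # lower 2.5% point


def sigma2_interval(ss_e, chi2_upper, chi2_lower):
    """CI for sigma^2: (SS_E / upper point, SS_E / lower point)."""
    return ss_e / chi2_upper, ss_e / chi2_lower


lower, upper = sigma2_interval(ss_error, CHI2_UPPER, CHI2_LOWER)
print(f"MS_E = {ss_error / df_error:.2f}")
print(f"95% CI for sigma^2: ({lower:.2f}, {upper:.2f})")
```

The interval is not symmetric about $MS_E = 10$, since the $\chi^2$ distribution is skewed.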

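Slide 22: Overall Mean
As discussed in the beginning, the overall expected value is $\mu$.
Show that $\bar{Y}_{\cdot\cdot}$ is an unbiased estimator of $\mu$…
The variance of $\bar{Y}_{\cdot\cdot}$ is $\sigma^2/n$.
So the $100(1 - \alpha)\%$ confidence interval for $\mu$ is
$$\bar{Y}_{\cdot\cdot} \pm t_{\alpha/2}(n - a) \sqrt{MS_E / n}.$$
Further, a $100(1 - \alpha)\%$ confidence interval for $\mu_i$ is
$$\bar{Y}_{i\cdot} \pm t_{\alpha/2}(n - a) \sqrt{MS_E / r_i}.$$
It follows that $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$ is an unbiased estimator of the effect of treatment $i$, $\tau_i$.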

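Slide 23: Differences between Treatment Groups
Differences between specific treatment groups will be important from the researcher's point of view.
The expected difference in response between treatment groups $i$ and $j$ is $\mu_i - \mu_j = \tau_i - \tau_j$.
Since treatment groups are independent of each other, it follows that
$$\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot} \sim N\!\left(\tau_i - \tau_j, \; \sigma^2 \left(\frac{1}{r_i} + \frac{1}{r_j}\right)\right).$$
Therefore, a $100(1 - \alpha)\%$ confidence interval for $\tau_i - \tau_j$ is
$$(\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}) \pm t_{\alpha/2}(n - a) \sqrt{MS_E \left(\frac{1}{r_i} + \frac{1}{r_j}\right)}.$$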
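A confidence interval for a difference between two treatment groups can be sketched numerically as follows. This is a Python illustration with hypothetical summary statistics (not the course data); `diff_interval` is an illustrative name, and `T_CRIT` is the tabulated $t_{0.025}(12)$.

```python
import math

# Hypothetical summary statistics (NOT the course data).
mean_i, mean_j = 80.0, 75.0  # sample means of the two groups being compared
r_i, r_j = 5, 5              # replicates in each group
ms_error = 10.0              # MS_E on n - a = 12 d.f.
T_CRIT = 2.179               # tabulated t_{0.025}(12)


def diff_interval(mean_i, mean_j, r_i, r_j, ms_error, t_crit):
    """CI for tau_i - tau_j: difference +/- t * sqrt(MS_E * (1/r_i + 1/r_j))."""
    diff = mean_i - mean_j
    half_width = t_crit * math.sqrt(ms_error * (1 / r_i + 1 / r_j))
    return diff - half_width, diff + half_width


low, high = diff_interval(mean_i, mean_j, r_i, r_j, ms_error, T_CRIT)
print(f"95% CI for tau_i - tau_j: ({low:.3f}, {high:.3f})")
```

With these made-up numbers the interval lies entirely above zero, so the two groups would be judged different at the 5% level.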

Slide 24: Example - Methods for Teaching Programming Language, Continued
Back to the example of three teaching methods and their effect on programming test score.
Based on the ANOVA developed earlier, we found a significant difference between the three methods.
Which method had the highest average?
What is a 95% CI for the mean difference in test scores for the 2 instructor-based methods?

Slide 25: Comparisons Among Treatment Means
As mentioned above, ANOVA will indicate whether there is a significant effect of treatments overall, but it doesn't indicate which treatments are significantly different from each other.
There are a number of methods available for making pairwise comparisons of treatment means.

Slide 26: Least Significant Difference (LSD)
This method tests the hypothesis that all treatment pairs have the same mean against the alternative that at least one pair differs. That is, the hypotheses are:
$H_0: \mu_i - \mu_j = 0$ for all $i, j$
$H_a: \mu_i - \mu_j \neq 0$ for at least one pair $i, j$
In testing the difference between any two specific means, reject the null hypothesis if
$$|\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}| > t_{\alpha/2}(n - a) \sqrt{MS_E \left(\frac{1}{r_i} + \frac{1}{r_j}\right)}.$$
In the case where the design is balanced and $r_i = r$ for all $i$, the condition above becomes
$$|\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}| > t_{\alpha/2}(n - a) \sqrt{\frac{2\, MS_E}{r}}.$$

Slide 27
In other words, the smallest difference between the means that would be considered statistically significant is
$$LSD = t_{\alpha/2}(n - a) \sqrt{\frac{2\, MS_E}{r}}.$$
This quantity, LSD, is called the least significant difference.
The LSD method requires that the difference between each pair of means be compared to the LSD.
In cases where the difference is greater than the LSD, we conclude that the treatment means differ.
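The LSD rule for a balanced design amounts to comparing every absolute difference of treatment means with a single threshold. A minimal Python sketch, assuming hypothetical treatment means and $MS_E$ (not the course data) and the tabulated $t_{0.025}(12)$:

```python
import math
from itertools import combinations

# Hypothetical balanced design (NOT the course data).
means = {1: 70.0, 2: 80.0, 3: 75.0}  # treatment sample means
r = 5                                # replicates per treatment
ms_error = 10.0                      # MS_E on n - a = 12 d.f.
T_CRIT = 2.179                       # tabulated t_{0.025}(12)

# Least significant difference for a balanced design
lsd = T_CRIT * math.sqrt(2 * ms_error / r)

# Compare each pairwise difference of means with the LSD
significant = {
    (i, j): abs(means[i] - means[j]) > lsd
    for i, j in combinations(sorted(means), 2)
}

print(f"LSD = {lsd:.3f}")
for (i, j), flag in significant.items():
    print(f"treatments {i} and {j}: |diff| = {abs(means[i] - means[j]):.1f}, differ = {flag}")
```

Here every pairwise difference (10, 5 and 5) exceeds the LSD of about 4.36, so all three pairs would be declared different.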

Slide 28: Important Notes
As in any situation where a large number of significance tests are conducted, the possibility of finding a large difference due to chance alone increases.
Therefore, when the number of treatment groups is large, the probability of making this type of error is relatively large.
In other words, the probability of committing a Type I error will be increased above $\alpha$.
Further, although the ANOVA F-test might find a significant treatment effect, the LSD method might conclude that there are no 2 treatment means that are significantly different from each other.
This is because the ANOVA F-test considers the overall trend of the effect of treatment on the outcome, and is not restricted to pairwise comparisons.
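To give a feel for how quickly the overall Type I error can grow, the Python sketch below evaluates $1 - (1 - \alpha)^m$ for $m = a(a-1)/2$ pairwise tests. This formula assumes the tests are independent, which pairwise comparisons sharing the same $MS_E$ are not, so the numbers are only a rough guide. The function names are illustrative.

```python
import math


def n_pairs(a):
    """Number of pairwise comparisons among a treatment groups."""
    return math.comb(a, 2)


def familywise_error(a, alpha=0.05):
    """1 - (1 - alpha)^m for m = a(a-1)/2 tests, treating the tests as independent."""
    return 1 - (1 - alpha) ** n_pairs(a)


for a in (3, 5, 10):
    print(f"a = {a}: {n_pairs(a)} comparisons, "
          f"approximate chance of at least one false positive = {familywise_error(a):.3f}")
```

With 3 groups the approximate rate is about 0.14; with 10 groups (45 comparisons) it is about 0.90.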

Slide 29: Other Methods for Pairwise Comparisons
Other methods for conducting pairwise comparisons are available.
The methods that are implemented in PROC GLM in SAS include:
- Bonferroni
- Duncan's Multiple Range Test
- Dunnett's procedure
- Scheffé's method
- Tukey's test
- several others
Chapter 4 of Dean & Voss discusses some of these methods.

Slide 30: Pairwise Comparisons in SAS
Pairwise comparisons can be requested by including a means statement.
The code below requests means with LSD comparison:

    proc glm data = teach ;
      class method ;
      model score = method / ss3 ;
      means method / lsd cldiff ;
    run ;

The part of the output containing the pairwise comparisons is shown in the next slide.

Slides 31-32
[SAS output with the LSD pairwise comparisons; not preserved in the transcript]

Slide 33: Contrasts
The ANOVA test indicates only whether there is an overall trend for the treatment means to differ, and does not indicate specifically which treatments are the same, which are different, etc.
In the last few slides we looked at pairwise comparisons between treatment means.
However, comparisons that are of interest to the researcher may include more than just two groups. They can be linear combinations of means.

Slide 34: Example - Does Food Decrease Effectiveness of Pain Killers?
Researchers at a pain clinic want to know whether the effectiveness of two leading pain killers is the same when taken on an empty stomach as when taken with food.
A study with four treatment groups was designed:
1. aspirin with no food
2. aspirin with food
3. Tylenol with no food
4. Tylenol with food
In addition to determining whether there is a difference between the four treatment groups, the researchers want to determine whether there is a difference between taking medication with food and taking it without.
This second hypothesis can be expressed statistically as:
$H_0: \mu_1 + \mu_3 = \mu_2 + \mu_4$
$H_a: \mu_1 + \mu_3 \neq \mu_2 + \mu_4$

Slide 35
The point estimate of the difference between the fed and not-fed conditions is based on the sample means:
$$\bar{Y}_{1\cdot} + \bar{Y}_{3\cdot} - \bar{Y}_{2\cdot} - \bar{Y}_{4\cdot}.$$

Slide 36: Hypothesis Tests Using Contrasts
As in the example on the previous slides, the comparison of treatment means that is of interest might be a linear combination of means.
That is, the hypothesis of interest would be of the form
$H_0: c_1 \mu_1 + c_2 \mu_2 + \cdots + c_a \mu_a = 0$
$H_a: c_1 \mu_1 + c_2 \mu_2 + \cdots + c_a \mu_a \neq 0$
The $c_i$ are constants subject to the constraints: (i) $c_i \neq 0$ for at least one $i$, and (ii) $\sum_{i=1}^{a} c_i = 0$.
A test of this hypothesis can be constructed using the sample means for each treatment group.
The linear combination $c_1 \mu_1 + c_2 \mu_2 + \cdots + c_a \mu_a$ is called a contrast.

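Slide 37
If the assumptions of the model are satisfied, then
$$\sum_{i=1}^{a} c_i \bar{Y}_{i\cdot} \sim N\!\left(\sum_{i=1}^{a} c_i \mu_i, \; \sigma^2 \sum_{i=1}^{a} \frac{c_i^2}{r_i}\right).$$
If $\sigma^2$ were known, a test of $H_0$ could be done using
$$Z = \frac{\sum_{i} c_i \bar{Y}_{i\cdot}}{\sqrt{\sigma^2 \sum_{i} c_i^2 / r_i}}.$$
Since $\sigma^2$ is unknown, we use its unbiased estimate, $MS_E$, and conduct a t-test with $n - a$ d.f.
The test statistic is
$$t_{obs} = \frac{\sum_{i} c_i \bar{Y}_{i\cdot}}{\sqrt{MS_E \sum_{i} c_i^2 / r_i}}.$$
Recall: if $X$ is a random variable with a $t(v)$ distribution, then $X^2$ has an $F(1, v)$ distribution.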

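Slide 38
So an equivalent test statistic is
$$F_{obs} = t_{obs}^2 = \frac{\left(\sum_{i} c_i \bar{Y}_{i\cdot}\right)^2}{MS_E \sum_{i} c_i^2 / r_i}.$$
At level $\alpha$, reject $H_0$ in favour of $H_a$ if $F_{obs} > F_\alpha(1, n - a)$, or equivalently if $|t_{obs}| > t_{\alpha/2}(n - a)$.
The sum of squares for the contrast is
$$SS_{contrast} = \frac{\left(\sum_{i} c_i \bar{Y}_{i\cdot}\right)^2}{\sum_{i} c_i^2 / r_i}.$$
Each contrast has 1 d.f., so the mean square for the contrast is $MS_{contrast} = SS_{contrast}/1$.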

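Slide 39: Summary
The hypotheses:
$H_0: c_1 \mu_1 + c_2 \mu_2 + \cdots + c_a \mu_a = 0$
$H_a: c_1 \mu_1 + c_2 \mu_2 + \cdots + c_a \mu_a \neq 0$
Test statistic:
$$F_{obs} = \frac{MS_{contrast}}{MS_E} = \frac{\left(\sum_{i} c_i \bar{Y}_{i\cdot}\right)^2}{MS_E \sum_{i} c_i^2 / r_i}$$
Decision rule: reject $H_0$ if $F_{obs} > F_\alpha(1, n - a)$.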
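The contrast test can be traced through numerically. The Python sketch below uses hypothetical treatment means and $MS_E$ (not the course data) and an illustrative contrast comparing the first treatment with the average of the other two; `contrast_test` is an illustrative name and `F_CRIT` is the tabulated upper 5% point of $F(1, 12)$.

```python
import math

# Hypothetical summary statistics (NOT the course data).
means = [70.0, 80.0, 75.0]  # treatment sample means
reps = [5, 5, 5]            # replicates r_i
ms_error = 10.0             # MS_E on n - a = 12 d.f.
coeffs = [2, -1, -1]        # treatment 1 versus the average of treatments 2 and 3
F_CRIT = 4.75               # tabulated F_0.05(1, 12)


def contrast_test(coeffs, means, reps, ms_error):
    """Return (estimate, ss_contrast, t_obs, f_obs) for the contrast sum(c_i * mu_i)."""
    if sum(coeffs) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    scale = sum(c * c / r for c, r in zip(coeffs, reps))  # sum of c_i^2 / r_i
    ss_contrast = estimate ** 2 / scale
    t_obs = estimate / math.sqrt(ms_error * scale)
    f_obs = ss_contrast / ms_error  # MS_contrast / MS_E, since the contrast has 1 d.f.
    return estimate, ss_contrast, t_obs, f_obs


estimate, ss_contrast, t_obs, f_obs = contrast_test(coeffs, means, reps, ms_error)
print(f"estimate = {estimate}, SS_contrast = {ss_contrast:.2f}")
print(f"t_obs = {t_obs:.3f}, F_obs = {f_obs:.2f}")
print("reject H0" if f_obs > F_CRIT else "do not reject H0")
```

The sketch also makes the equivalence of the two forms of the test visible: the printed $F_{obs}$ equals $t_{obs}^2$ up to rounding.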

Slide 40: Orthogonal Contrasts
Very often more than one contrast will be of interest.
Further, it is possible that one research question will require more than one contrast, e.g., $H_0: \mu_1 = \mu_3$ and $\mu_2 = \mu_4$.
Ideally, we want tests about different contrasts to be independent of each other.
Suppose that the two contrasts of interest are $c_1 \mu_1 + c_2 \mu_2 + \cdots + c_a \mu_a$ and $d_1 \mu_1 + d_2 \mu_2 + \cdots + d_a \mu_a$.
These two contrasts are orthogonal to each other if and only if they satisfy
$$\sum_{i=1}^{a} \frac{c_i d_i}{r_i} = 0,$$
which reduces to $\sum_{i} c_i d_i = 0$ in a balanced design.
If there are $a$ treatments, then $SS_{Treat}$ can be decomposed into a set of $a - 1$ orthogonal contrasts, each with 1 d.f., as follows:
$$SS_{Treat} = SS_{contrast_1} + SS_{contrast_2} + \cdots + SS_{contrast_{a-1}}.$$
Unless $a = 2$, there will be more than one set of orthogonal contrasts.
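The decomposition of $SS_{Treat}$ into orthogonal contrasts can be checked numerically. The Python sketch below assumes a balanced design with hypothetical treatment means (not the course data); the two contrasts and the function names are illustrative.

```python
def ss_contrast(coeffs, means, r):
    """Sum of squares for a contrast in a balanced design with r replicates per group."""
    estimate = sum(c * m for c, m in zip(coeffs, means))
    return estimate ** 2 / (sum(c * c for c in coeffs) / r)


def is_orthogonal(c, d):
    """Balanced-design orthogonality condition: sum of c_i * d_i equals zero."""
    return sum(ci * di for ci, di in zip(c, d)) == 0


# Hypothetical balanced design (NOT the course data): a = 3 treatments, r = 5.
means = [70.0, 80.0, 75.0]
r = 5
grand_mean = sum(means) / len(means)  # equals the overall mean because the design is balanced
ss_treat = r * sum((m - grand_mean) ** 2 for m in means)

# A set of a - 1 = 2 orthogonal contrasts
c = [2, -1, -1]  # treatment 1 versus treatments 2 and 3
d = [0, 1, -1]   # treatment 2 versus treatment 3

parts = [ss_contrast(c, means, r), ss_contrast(d, means, r)]
print("orthogonal:", is_orthogonal(c, d))
print("contrast sums of squares:", parts)
print("sum of contrasts:", sum(parts), "SS_Treat:", ss_treat)
```

Replacing `d` with a non-orthogonal choice such as `[1, -1, 0]` would break the equality between the summed contrast sums of squares and $SS_{Treat}$.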

Slide 41: Example - Food / Pain Killers, Continued
Refer back to the example on slide 34.
The study was designed with 4 treatment groups.
The treatment sum of squares can be decomposed into 3 orthogonal contrasts.
Since the researcher is interested in the difference between fed and unfed, it makes sense to use the following contrasts:
[contrast definitions not preserved in the transcript]

Slide 42
Exercise: verify that each is in fact a contrast.
Exercise: verify that the contrasts are orthogonal.
Note that there is more than one way to decompose the treatment sum of squares into a set of orthogonal contrasts.
For example, instead of comparing aspirin and Tylenol, we might be interested in comparing food with no food.
In this case, compare (i) aspirin with food and Tylenol with food, (ii) aspirin without food and Tylenol without food, and (iii) the 2 food groups to the 2 no-food groups.

Slide 43: ANOVA Table for Orthogonal Contrasts
Contrasts to be used in an experiment must be chosen at the beginning of the study.
The hypotheses to be tested should not be selected after viewing the data.
Once the treatment SS has been decomposed using preplanned orthogonal contrasts, the ANOVA table can be expanded to show the decomposition, as shown in the next slide.

Slide 44
[Expanded ANOVA table with one row per orthogonal contrast; not preserved in the transcript]

Slide 45: Example - Pressure on a Torsion Spring
[Figure: diagram of a torsion spring]

Slide 46
The figure above shows a diagram of a torsion spring. Pressure is applied to the arms to close the spring.
A study has been designed to examine pressure on a torsion spring.
Five different angles between the arms of the spring will be studied to determine their impact on the pressure: 67°, 71°, 75°, 79°, and 83°.
Researchers are interested in whether there is an overall difference between the different angle settings.
In addition, they would like to study a set of orthogonal contrasts which compares the 2 smallest angles to each other and the 2 largest angles to each other.
The data collected are shown in the following slide.

Slide 47: Torsion Spring Data
[Data table not preserved in the transcript]

Slide 48: Solution
[Worked solution not preserved in the transcript]

Slide 49: Contrasts in SAS
To do the analysis for the last example, start by creating a SAS dataset:

    data torsion ;
      input angle pressure ;
      cards ;
    [data lines not preserved in the transcript]
    ;
    run ;

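Slide 50
Here is the additional code that is required to specify the contrasts of interest. The numeric coefficients that follow `angle` in each contrast statement were not preserved in the transcript:

    proc glm data = torsion ;
      class angle ;
      model pressure = angle / ss3 ;
      contrast '67-71' angle [coefficients] ;
      contrast '79-83' angle [coefficients] ;
      contrast 'sm vs lg' angle [coefficients] ;
      contrast 'mid vs oth' angle [coefficients] ;
    run ;
    quit ;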
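The four contrast labels suggest one natural set of coefficients over the angles 67°, 71°, 75°, 79°, 83°. The vectors below are an assumption inferred from the labels and from the description of the study, not values taken from the original slide. This Python sketch checks that the assumed vectors are contrasts and are mutually orthogonal, assuming a balanced design.

```python
from itertools import combinations

# ASSUMED coefficient vectors, inferred from the contrast labels only.
# Order of levels: 67, 71, 75, 79, 83 degrees.
contrasts = {
    "67-71": [1, -1, 0, 0, 0],       # two smallest angles against each other
    "79-83": [0, 0, 0, 1, -1],       # two largest angles against each other
    "sm vs lg": [1, 1, 0, -1, -1],   # two smallest versus two largest
    "mid vs oth": [1, 1, -4, 1, 1],  # middle angle versus the other four
}


def is_contrast(c):
    """Coefficients sum to zero and are not all zero."""
    return sum(c) == 0 and any(c)


def dot(c, d):
    """Sum of c_i * d_i, the balanced-design orthogonality check."""
    return sum(ci * di for ci, di in zip(c, d))


all_contrasts = all(is_contrast(c) for c in contrasts.values())
all_orthogonal = all(dot(c, d) == 0 for c, d in combinations(contrasts.values(), 2))

print("each vector is a contrast:", all_contrasts)
print("all pairs orthogonal:", all_orthogonal)
```

Reversing the sign of any vector would describe the same contrast test, so the signs shown are a convention rather than something the labels determine.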

Slide 51
The ANOVA part of the output is not shown here. The part of the output generated by the contrast statements looks like this:

    Contrast      DF    Contrast SS    Mean Square    F Value    Pr > F
    sm vs lg      [values not preserved in the transcript]
    mid vs oth    [values not preserved in the transcript]