Chapter 10 – Part II Analysis of Variance

1 Chapter 10 – Part II Analysis of Variance

2 Chapter Outline
- An introduction to experimental design and analysis of variance
- Analysis of variance and the completely randomized design

3 An Introduction to Experimental Design and Analysis of Variance
Statistical studies can be classified as either experimental or observational. In an experimental study, one or more factors are controlled so that data can be obtained about how the factors influence the variables of interest. In an observational study, no attempt is made to control the factors. Cause-and-effect relationships are easier to establish in experimental studies than in observational studies. Analysis of variance (ANOVA) can be used to analyze the data obtained from either experimental or observational studies.

4 An Introduction to Experimental Design and Analysis of Variance
Three types of experimental designs are introduced:
- A completely randomized design
- A randomized block design
- A factorial experiment

5 An Introduction to Experimental Design and Analysis of Variance
A factor is a variable that the experimenter has selected for investigation. A treatment is a level of a factor. For example, if location is a factor, then the treatments (levels) of location could be New York, Chicago, and Seattle. Experimental units are the objects of interest in the experiment. A completely randomized design is an experimental design in which the treatments are randomly assigned to the experimental units.
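The random assignment in a completely randomized design can be sketched in a few lines of Python. This is a minimal illustration rather than a procedure from the slides: the function name `completely_randomized_design`, the unit labels, and the treatment labels T1 to T3 are hypothetical, and the sketch assumes a balanced design in which every treatment is applied to the same number of units.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Randomly assign treatments to experimental units (balanced design).

    Assumes len(units) is a multiple of len(treatments), so that every
    treatment is applied to the same number of units.
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    reps = len(units) // len(treatments)
    # Each treatment label appears `reps` times; shuffling the labels
    # gives every unit the same chance of receiving any treatment.
    labels = [t for t in treatments for _ in range(reps)]
    rng = random.Random(seed)
    rng.shuffle(labels)
    return dict(zip(units, labels))

# Hypothetical example: 12 experimental units and 3 treatments
units = [f"unit{i}" for i in range(1, 13)]
assignment = completely_randomized_design(units, ["T1", "T2", "T3"], seed=1)
for unit, treatment in assignment.items():
    print(unit, "->", treatment)
```

Shuffling a balanced list of labels, rather than drawing a treatment independently for each unit, guarantees equal sample sizes while still making the assignment random.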

6 Analysis of Variance: A Conceptual Overview
Analysis of variance (ANOVA) can be used to test for the equality of three or more population means. Data obtained from observational or experimental studies can be used for the analysis. We want to use the sample results to test the following hypotheses:
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_a$: Not all population means are equal

7 Analysis of Variance: A Conceptual Overview
H0: 1=2=3= = k Ha: Not all population means are equal If H0 is rejected, we cannot conclude that all population means are different. Rejecting H0 means that at least two population means have different values.

8 Analysis of Variance: A Conceptual Overview
Assumptions for Analysis of Variance
- For each population, the response (dependent) variable is normally distributed.
- The variance of the response variable, denoted $\sigma^2$, is the same for all of the populations.
- The observations must be independent.

9 Analysis of Variance: A Conceptual Overview
Sampling Distribution of $\bar{x}$ Given H0 Is True
If H0 is true, all the populations have the same mean. It is also assumed that all the populations have the same variance. Therefore, all the sample means are drawn from the same sampling distribution, and as a result they tend to be close to one another and to the common population mean.

10 Analysis of Variance: A Conceptual Overview
Sampling Distribution of $\bar{x}$ Given H0 Is False
When H0 is false, the sample means are drawn from different populations. As a result, the sample means tend NOT to be close together; instead, each tends to be close to its own population mean.
[Figure: three separate sampling distributions of $\bar{x}$, centered at $\mu_1$, $\mu_2$, and $\mu_3$.]

11 Analysis of Variance
- Between-treatments estimate of population variance
- Within-treatments estimate of population variance
- Comparing the variance estimates: The F test
- ANOVA table

12 Between-Treatments Estimate of Population Variance $\sigma^2$
The estimate of $\sigma^2$ based on the variation of the sample means is called the mean square due to treatments and is denoted by MSTR:
$$\text{MSTR} = \frac{\text{SSTR}}{k-1} = \frac{\sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{\bar{x}}\right)^2}{k-1}$$
The numerator is called the sum of squares due to treatments (SSTR). The denominator is the degrees of freedom associated with SSTR.

13 Between-Treatments Estimate of Population Variance $\sigma^2$
$k$ is the number of treatments (number of samples)
$n_j$ is the number of observations in treatment $j$
$\bar{x}_j$ is the sample mean of treatment $j$
$\bar{\bar{x}}$ is the overall mean, i.e. the average value of ALL the observations from all the treatments
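The MSTR formula can be checked numerically. The sketch below is a minimal Python version assuming the data arrive as one list of observations per treatment; the function name `mstr` and the small data set are hypothetical.

```python
from statistics import mean

def mstr(samples):
    """Mean square due to treatments (between-treatments estimate of sigma^2).

    `samples` is a list of lists: one list of observations per treatment.
    """
    k = len(samples)
    grand_mean = mean([x for s in samples for x in s])  # mean of ALL observations
    # SSTR = sum over treatments j of n_j * (xbar_j - grand mean)^2
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    return sstr / (k - 1)  # divide by the SSTR degrees of freedom, k - 1

# Hypothetical data: three treatments, three observations each
print(mstr([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For these three samples the treatment means are 2, 5, and 8 and the overall mean is 5, so SSTR = 3(9 + 0 + 9) = 54 and MSTR = 54/2 = 27.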

14 Within-Treatments Estimate of Population Variance $\sigma^2$
The estimate of $\sigma^2$ based on the variation of the sample observations within each sample is called the mean square due to error and is denoted by MSE:
$$\text{MSE} = \frac{\text{SSE}}{n_T - k} = \frac{\sum_{j=1}^{k} (n_j - 1)\, s_j^2}{n_T - k}$$
The numerator is called the sum of squares due to error (SSE). The denominator is the degrees of freedom associated with SSE.

15 Within-Treatments Estimate of Population Variance $\sigma^2$
$k$ is the number of treatments (number of samples)
$n_j$ is the number of observations in treatment $j$
$s_j^2$ is the sample variance of treatment $j$
$n_T$ is the total number of observations from all the treatments
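The within-treatments estimate can be sketched the same way, pooling the sample variances with weights $n_j - 1$. Again a minimal Python illustration; the function name `mse` and the data are hypothetical, and `statistics.variance` uses the $n - 1$ divisor of the sample variance.

```python
from statistics import variance

def mse(samples):
    """Mean square due to error (within-treatments estimate of sigma^2).

    `samples` is a list of lists: one list of observations per treatment.
    """
    k = len(samples)
    n_t = sum(len(s) for s in samples)  # total number of observations
    # SSE = sum over treatments j of (n_j - 1) * s_j^2
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    return sse / (n_t - k)  # divide by the SSE degrees of freedom, n_T - k

# Hypothetical data: three treatments, three observations each
print(mse([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Each of these samples has variance 1, so SSE = 2(1) + 2(1) + 2(1) = 6 and MSE = 6/(9 - 3) = 1.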

16 Comparing the Variance Estimates: The F Test
Because the within-treatments estimate (MSE) of $\sigma^2$ involves only the sample variances, each of which is an unbiased estimate of the common population variance (by assumption, all the population variances are equal), MSE is a good estimate of $\sigma^2$ regardless of whether H0 is true. On the other hand, the between-treatments estimate (MSTR), which uses the sample means, is a good estimate of $\sigma^2$ only if H0 is true, because then all the sample means are drawn from the same sampling distribution. When H0 is false, the sample means are drawn from populations with different means ($\mu$), so they will not be close together, and MSTR will overestimate $\sigma^2$.
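This argument can be illustrated with a small Monte Carlo simulation. The sketch assumes normal populations with a common standard deviation $\sigma = 5$ ($\sigma^2 = 25$), $k = 3$ treatments, and $n = 5$ observations per treatment; the function names and these settings are illustrative choices, not part of the slides. Averaged over many simulated data sets, MSE should stay near 25 whether or not the means differ, while MSTR should stay near 25 only when the means are equal.

```python
import random

def mean_squares(samples):
    """Return (MSTR, MSE) for a list of samples, one list per treatment."""
    k = len(samples)
    n_t = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand = sum(sum(s) for s in samples) / n_t
    sstr = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    # Within-treatment squared deviations; equals the sum of (n_j - 1) * s_j^2
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return sstr / (k - 1), sse / (n_t - k)

def average_mean_squares(mus, sigma, n, reps=10000, seed=0):
    """Average MSTR and MSE over `reps` simulated data sets, each drawn
    from normal populations with means `mus` and common std. dev. `sigma`."""
    rng = random.Random(seed)
    total_mstr = total_mse = 0.0
    for _ in range(reps):
        samples = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus]
        mstr, mse = mean_squares(samples)
        total_mstr += mstr
        total_mse += mse
    return total_mstr / reps, total_mse / reps

h0 = average_mean_squares([60, 60, 60], sigma=5, n=5)  # H0 true
h1 = average_mean_squares([55, 68, 57], sigma=5, n=5)  # H0 false
print("H0 true:  avg MSTR = %.1f, avg MSE = %.1f" % h0)
print("H0 false: avg MSTR = %.1f, avg MSE = %.1f" % h1)
```

With the hypothetical unequal means 55, 68, and 57, the average MSTR lands far above 25; standard theory puts its expected value at $\sigma^2 + \sum_j n_j(\mu_j - \bar{\mu})^2/(k-1) = 25 + 490/2 = 270$, while the average MSE stays near 25 in both runs.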

17 Comparing the Variance Estimates: The F Test
If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with numerator degrees of freedom (d.f.) equal to $k - 1$ (the MSTR d.f.) and denominator d.f. equal to $n_T - k$ (the MSE d.f.). If H0 is true, MSTR/MSE should be close to 1, since both are good estimates of $\sigma^2$. If H0 is false, i.e. if the means of the k populations are not all equal, the ratio MSTR/MSE will tend to be larger than 1, since MSTR overestimates $\sigma^2$. Hence, we reject H0 if the value of MSTR/MSE is too large to have resulted by chance from the appropriate F distribution.

18 Comparing the Variance Estimates: The F Test
[Figure: sampling distribution of MSTR/MSE. The critical value $F_\alpha$ divides the MSTR/MSE axis into a "Do Not Reject H0" region on the left and a "Reject H0" region on the right, whose upper-tail area is $\alpha$.]

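19 ANOVA Table

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Treatments | SSTR | $k - 1$ | $\text{MSTR} = \frac{\text{SSTR}}{k-1}$ | $\frac{\text{MSTR}}{\text{MSE}}$ | |
| Error | SSE | $n_T - k$ | $\text{MSE} = \frac{\text{SSE}}{n_T - k}$ | | |
| Total | SST | $n_T - 1$ | | | |

SST is partitioned into SSTR and SSE: SST = SSTR + SSE. SST's degrees of freedom are partitioned the same way into SSTR's d.f. and SSE's d.f.: $n_T - 1 = (k - 1) + (n_T - k)$.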

20 ANOVA Table
SST divided by its degrees of freedom $n_T - 1$ is the overall sample variance that would be obtained if we treated the entire set of observations as one data set. With the entire data set treated as one sample, the total sum of squares is
$$\text{SST} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{\bar{x}}\right)^2$$
where $x_{ij}$ is the $i$th observation in treatment $j$.

21 ANOVA Table
ANOVA can be viewed as the process of partitioning the total sum of squares and the degrees of freedom into their corresponding sources: treatments and error. Dividing each sum of squares by the appropriate degrees of freedom provides the variance estimates. The F value (MSTR/MSE) is used to test the hypothesis of equal population means.
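The partition can be verified numerically. This minimal Python sketch computes SST, SSTR, and SSE for a hypothetical data set with unequal sample sizes, then checks that SST = SSTR + SSE and that SST divided by $n_T - 1$ equals the ordinary sample variance of the pooled data. The function name `sums_of_squares` and the data are illustrative assumptions.

```python
from statistics import variance

def sums_of_squares(samples):
    """Return (SST, SSTR, SSE) for a list of samples, one per treatment."""
    all_obs = [x for s in samples for x in s]
    grand = sum(all_obs) / len(all_obs)         # overall mean
    means = [sum(s) / len(s) for s in samples]  # treatment means
    sst = sum((x - grand) ** 2 for x in all_obs)
    sstr = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return sst, sstr, sse

# Hypothetical data with unequal sample sizes
data = [[3, 5, 4], [8, 6], [9, 11, 10, 12]]
k = len(data)
n_t = sum(len(s) for s in data)
sst, sstr, sse = sums_of_squares(data)
print(f"SST = {sst:.4f}, SSTR + SSE = {sstr + sse:.4f}")
print(f"d.f.: {n_t - 1} = {k - 1} + {n_t - k}")
# SST / (n_T - 1) is the ordinary sample variance of the pooled data
print(sst / (n_t - 1), variance([x for s in data for x in s]))
```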

22 Test for the Equality of k Population Means
Hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_a$: Not all population means are equal
Test Statistic
$F = \text{MSTR}/\text{MSE}$

23 Test for the Equality of k Population Means
Rejection Rule
p-Value Approach: Reject H0 if p-value < $\alpha$
Critical Value Approach: Reject H0 if F > $F_\alpha$
where the value of $F_\alpha$ is based on an F distribution with $k - 1$ numerator d.f. and $n_T - k$ denominator d.f.
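In practice the p-value and $F_\alpha$ come from F tables or statistical software. As a dependency-free sketch, the upper-tail area of an F distribution can be computed from the regularized incomplete beta function through the identity $P(F > f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\, d_1/2)$. The continued-fraction evaluation below follows the well-known modified Lentz scheme (as popularized by Numerical Recipes), and the critical value is found by bisection. All function names are hypothetical; a real analysis would normally rely on a tested library routine instead.

```python
import math

_TINY = 1e-300

def _nonzero(v):
    """Guard against division by zero inside the continued fraction."""
    return _TINY if abs(v) < _TINY else v

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the continued fraction on the side where it converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_sf(f, d1, d2):
    """Upper-tail area P(F > f) for an F distribution with (d1, d2) d.f."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def f_critical(alpha, d1, d2):
    """Critical value F_alpha with upper-tail area alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:  # expand until hi is in the rejection region
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_sf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(f"p-value for F = 9.55 with (2, 12) d.f.: {f_sf(9.55, 2, 12):.4f}")
print(f"F_.05 with (2, 12) d.f.: {f_critical(0.05, 2, 12):.2f}")
```

Both printed values should agree with the Reed Manufacturing results later in this chapter (p-value .0033 and $F_{.05} = 3.89$).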

24 Test for the Equality of k Population Means: An Observational Study
Example: Reed Manufacturing
Janet Reed would like to know if there is any significant difference in the mean number of hours worked per week for the department managers at her three manufacturing plants (in Buffalo, Pittsburgh, and Detroit). An F test will be conducted using $\alpha = .05$.

25 Test for the Equality of k Population Means: An Observational Study
Example: Reed Manufacturing
A simple random sample of five managers from each of the three plants was taken, and the number of hours worked by each manager in the previous week is shown on the next slide.
Factor: manufacturing plant
Treatments: Buffalo, Pittsburgh, Detroit
Experimental units: managers
Response variable: number of hours worked

26 Test for the Equality of k Population Means: An Observational Study
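| Observation | Plant 1 (Buffalo) | Plant 2 (Pittsburgh) | Plant 3 (Detroit) |
|---|---|---|---|
| 1 | 48 | 73 | 51 |
| 2 | 54 | 63 | 63 |
| 3 | 57 | 66 | 61 |
| 4 | 54 | 64 | 54 |
| 5 | 62 | 74 | 56 |
| Sample Mean | 55 | 68 | 57 |
| Sample Variance | 26.0 | 26.5 | 24.5 |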
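The summary rows of the table can be recomputed with Python's standard `statistics` module, whose `variance` uses the $n - 1$ divisor of the sample variance. A minimal sketch using the five observations per plant; the dictionary layout is just an illustrative choice.

```python
from statistics import mean, variance

# Hours worked in the previous week by the five sampled managers at each plant
hours = {
    "Buffalo": [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit": [51, 63, 61, 54, 56],
}

for plant, xs in hours.items():
    # statistics.variance uses the n - 1 divisor (sample variance)
    print(f"{plant}: mean = {mean(xs)}, variance = {variance(xs)}")
```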

27 Test for the Equality of k Population Means: An Observational Study
1. Develop the hypotheses.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: Not all the means are equal
where:
$\mu_1$ = mean number of hours worked per week by the managers at Plant 1
$\mu_2$ = mean number of hours worked per week by the managers at Plant 2
$\mu_3$ = mean number of hours worked per week by the managers at Plant 3

28 Test for the Equality of k Population Means: An Observational Study
2. Specify the level of significance. $\alpha = .05$
3. Compute the value of the test statistic.
Mean Square Due to Treatments
(When the sample sizes are all equal, the overall mean equals the average of the sample means.)
$\bar{\bar{x}} = (55 + 68 + 57)/3 = 60$
$\text{SSTR} = 5(55 - 60)^2 + 5(68 - 60)^2 + 5(57 - 60)^2 = 490$
$\text{MSTR} = 490/(3 - 1) = 245$

29 Test for the Equality of k Population Means: An Observational Study
3. Compute the value of the test statistic (cont.).
Mean Square Due to Error
$\text{SSE} = 4(26.0) + 4(26.5) + 4(24.5) = 308$
$\text{MSE} = 308/(15 - 3) = 25.667$
$F = \text{MSTR}/\text{MSE} = 245/25.667 = 9.55$
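The whole computation can be reproduced in a short script. This is a minimal sketch of a one-way ANOVA for equal or unequal sample sizes; the function name `one_way_anova` and its dictionary output are hypothetical choices, and the data are the hours from the Reed Manufacturing table.

```python
def one_way_anova(samples):
    """Return the ANOVA quantities for a list of samples, one per treatment."""
    k = len(samples)
    n_t = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n_t  # overall mean
    means = [sum(s) / len(s) for s in samples]  # treatment means
    sstr = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    mstr = sstr / (k - 1)
    mse = sse / (n_t - k)
    return {"SSTR": sstr, "SSE": sse, "SST": sstr + sse,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse,
            "df": (k - 1, n_t - k, n_t - 1)}

hours = [
    [48, 54, 57, 54, 62],  # Plant 1: Buffalo
    [73, 63, 66, 64, 74],  # Plant 2: Pittsburgh
    [51, 63, 61, 54, 56],  # Plant 3: Detroit
]
r = one_way_anova(hours)
print(f"SSTR = {r['SSTR']:.0f}, SSE = {r['SSE']:.0f}, SST = {r['SST']:.0f}")
print(f"MSTR = {r['MSTR']:.3f}, MSE = {r['MSE']:.3f}, F = {r['F']:.2f}")
print("d.f. (treatments, error, total):", r["df"])
```

Computing SSE directly from squared deviations gives the same value as pooling $(n_j - 1)s_j^2$, and it also works when the sample sizes differ.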

30 Test for the Equality of k Population Means: An Observational Study
ANOVA Table

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Treatments | 490 | 2 | 245 | 9.55 | .0033 |
| Error | 308 | 12 | 25.667 | | |
| Total | 798 | 14 | | | |

31 Test for the Equality of k Population Means: An Observational Study
p-Value Approach
4. Compute the p-value. With 2 numerator d.f. and 12 denominator d.f., the p-value for F = 9.55 is .0033.
5. Determine whether to reject H0. Because the p-value .0033 < $\alpha$ = .05, we reject H0. We have sufficient evidence to conclude that the mean number of hours worked per week by department managers is not the same at all 3 plants.
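For the special case of 2 numerator d.f., the F upper-tail area has a closed form, $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, which makes the p-value above easy to check without tables. A minimal sketch; the function name is hypothetical, and the formula does not apply to other numerator d.f.

```python
def f_pvalue_2_numerator_df(f, d2):
    """P(F > f) for an F distribution with 2 numerator d.f. and d2 denominator d.f.

    Closed form valid ONLY when the numerator d.f. equals 2.
    """
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)

f_stat = 245 / (308 / 12)  # MSTR / MSE from the Reed Manufacturing example
p_value = f_pvalue_2_numerator_df(f_stat, d2=12)
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")
```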

32 Test for the Equality of k Population Means: An Observational Study
Critical Value Approach
4. Determine the critical value and rejection rule. Based on an F distribution with 2 numerator d.f. and 12 denominator d.f., $F_{.05} = 3.89$. Reject H0 if F > 3.89 (critical value).
5. Determine whether to reject H0. Because F = 9.55 > 3.89, we reject H0. We have sufficient evidence to conclude that the mean number of hours worked per week by department managers is not the same at all 3 plants.
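Inverting the same 2-numerator-d.f. closed form gives the critical value directly: $F_\alpha = \frac{d_2}{2}\left(\alpha^{-2/d_2} - 1\right)$. A minimal sketch for checking $F_{.05}$ with 12 denominator d.f.; it is valid only when the numerator d.f. is 2, and the function name is hypothetical.

```python
def f_critical_2_numerator_df(alpha, d2):
    """F_alpha for 2 numerator d.f. and d2 denominator d.f.

    Obtained by solving alpha = (1 + 2F/d2) ** (-d2/2) for F;
    valid ONLY when the numerator d.f. equals 2.
    """
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)

crit = f_critical_2_numerator_df(0.05, d2=12)
f_stat = 9.55
print(f"F_.05 = {crit:.2f}; reject H0: {f_stat > crit}")
```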

