HAWKES LEARNING SYSTEMS math courseware specialists
Copyright © 2010 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved.
Chapter 14: Analysis of Variance (ANOVA)
14.2 Analysis of Variance (ANOVA)
Objective: To learn the meaning of ANOVA.
ANOVA
In regression analysis, we break the total variation in the dependent variable into two parts: SSR (the part explained by the model) and SSE (the part not explained by the model). We will do the same concerning the differences in means, with one piece being the sum of squares for treatments, SST (variation attributed to the treatments), and the other the sum of squares for error, SSE (variation not explained by the sum of squares for treatments).

The total variation can be summarized by the sample variance, which is given by

s^2 = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( x_{ij} - \bar{\bar{x}} \right)^2}{N - 1},

where n_j is the number of observations in the j-th treatment, k is the number of treatments, N = n_1 + n_2 + \cdots + n_k is the total number of observations in all samples, and \bar{\bar{x}} is the sample average (grand mean) of all of the observations.
ANOVA
The numerator of s^2 is called the total sum of squares, or SSTotal, since it describes the total variation among all of the sample observations, while the denominator gives the degrees of freedom associated with SSTotal, N - 1.

The total variation can then be divided into two components: variation attributed to treatments and random variation. The first component is called the sum of squares for treatments, or SST, which is given by

SST = \sum_{j=1}^{k} n_j \left( \bar{x}_j - \bar{\bar{x}} \right)^2.

SST is a summary of how far each treatment (population) mean differs from the grand mean, \bar{\bar{x}}, which is the mean of all the sample observations. The larger the difference between the treatment means and the grand mean, the more likely it is that the variation in the sample observations is due to treatments rather than to ordinary sampling variation. The degrees of freedom associated with SST is k - 1.
ANOVA
SST measures the variation between the treatments by comparing each treatment mean with the grand mean, \bar{\bar{x}}. The mean square for treatments, MST, is the sum of squares for treatments divided by its degrees of freedom:

MST = \frac{SST}{k - 1}.

MST represents the weighted average squared deviation of the treatment means from the grand mean, and it serves as a measure of the variability among the sample means.
ANOVA
The second component measures the random variation attributable to sampling and is called the sum of squares for error, or SSE. The mathematical expression for SSE is

SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( x_{ij} - \bar{x}_j \right)^2.

SSE is a summary of how much of the total variation in the sample data is not explained by SST; it is a measure of the variation within the treatments. In practice, SSE is easier to calculate as SSTotal minus SST. The degrees of freedom associated with SSE is N - k. The mean square for error, MSE, is SSE divided by its degrees of freedom:

MSE = \frac{SSE}{N - k}.
ANOVA
Thus we have the following results:

Source        Sum of Squares    Degrees of Freedom
Treatments    SST               k - 1
Error         SSE               N - k
Total         SSTotal           N - 1

These measures of variability are the fundamental pieces which will be used to develop a hypothesis test for determining whether or not there is a significant difference among the population means. This is why the test is called an analysis of variance, or ANOVA.
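The decomposition above can be computed directly. Below is a minimal Python sketch using three small hypothetical treatment samples; the data arrays and the use of NumPy are illustrative assumptions, not part of the original slides.

```python
# Sketch of the sums-of-squares decomposition for a one-way layout.
import numpy as np

groups = [np.array([10.0, 12.0, 11.0, 13.0]),   # treatment 1 (hypothetical)
          np.array([14.0, 15.0, 13.0, 16.0]),   # treatment 2 (hypothetical)
          np.array([12.0, 12.0, 14.0, 13.0])]   # treatment 3 (hypothetical)

k = len(groups)                      # number of treatments
N = sum(len(g) for g in groups)      # total number of observations
grand_mean = sum(g.sum() for g in groups) / N

ss_total = sum(((g - grand_mean) ** 2).sum() for g in groups)     # df = N - 1
sst = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # df = k - 1
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)            # df = N - k

mst = sst / (k - 1)   # mean square for treatments
mse = sse / (N - k)   # mean square for error

assert np.isclose(ss_total, sst + sse)   # SSTotal = SST + SSE
print(mst, mse)
```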
14.3 Assumptions of the Test
Objective: To learn the assumptions of the ANOVA test.
Assumptions of the Test
It is important to point out the assumptions upon which the test is based. The first assumption is that the distributions of all k populations of interest are approximately normal. The best way to determine whether or not this assumption is satisfied is to construct a histogram of the sample data for each of the k populations of interest. If each histogram appears approximately normal, then it is reasonable to proceed.
Assumptions of the Test
The second assumption is that the variances of the k populations of interest are equal. We can determine whether this assumption is reasonable by drawing box plots of the k samples of data and comparing the spread of the data for each sample. [Figure: side-by-side box plots of the k samples.] In the figure, the spread of the box plots is not exactly the same, but they are similar enough that the observed difference in spread is likely due to sampling variation, and it is safe to proceed.

The third assumption is that each of the k samples must be selected independently of the other samples and randomly from its respective population.
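These visual checks are easy to script. The sketch below is only an illustration under assumed data: it generates three hypothetical samples and draws the histograms and box plots described above, assuming NumPy and Matplotlib are available.

```python
# Rough sketch of the visual assumption checks: histograms and box plots.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
samples = [rng.normal(loc=12, scale=2, size=30) for _ in range(3)]  # hypothetical k = 3 samples

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for i, s in enumerate(samples):
    axes[i].hist(s, bins=8)            # check approximate normality of each sample
    axes[i].set_title(f"Sample {i + 1}")
axes[3].boxplot(samples)               # compare the spread across the k samples
axes[3].set_title("Box plots")
plt.tight_layout()
plt.show()
```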
14.4 The F-Test
Objective: To create an ANOVA table.
The F-Test
In a previous section we developed MST, a measure for summarizing the variability among the sample means, and MSE, a measure for summarizing the variability within the samples themselves. If the variability among the sample means is much larger than the variability within the sample observations, this will cause us to doubt the hypothesis that the population means are the same. The question is, "How large is large enough?" Consider the ratio of MST to MSE:

F = \frac{MST}{MSE}.

If the assumptions are met and we assume the population means are equal, then this quantity will have an F-distribution with k - 1 (numerator) degrees of freedom and N - k (denominator) degrees of freedom.
The F-Test
If the variability among the sample means is close to the variability within the sample observations, F will be close to 1. As the variability among the sample means increases relative to the variability within the sample observations, the value of F will become large. Thus F is a natural test statistic to use in determining whether or not a difference exists among the population means. We will reject the null hypothesis that the population means are equal for large values of the F-test statistic.
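As a rough sketch of how the ratio is used in practice, the snippet below computes F and the corresponding upper-tail p-value with SciPy. The numeric values of MST, MSE, k, and N are taken from the worked example later in this section and are only illustrative.

```python
# Minimal sketch of the F-ratio and its p-value; values are illustrative.
from scipy import stats

mst, mse = 3.25, 3.5          # mean squares (from the worked example below)
k, N = 3, 12                  # number of treatments and total observations

f_stat = mst / mse            # F = MST / MSE
# Under H0, F has an F-distribution with k-1 and N-k degrees of freedom.
p_value = stats.f.sf(f_stat, k - 1, N - k)   # P(F >= f_stat)
print(f_stat, p_value)
```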
The F-Test Summary
H_0: μ_1 = μ_2 = ⋯ = μ_k (the population means are all equal)
H_a: At least one μ_i is different.
Test statistic: F = \frac{MST}{MSE}, with k - 1 numerator and N - k denominator degrees of freedom.
Decision rule: Reject H_0 if the computed value of F is larger than the critical value F_α.
Computational Formulas
Unfortunately, the defining expressions for MST and MSE given earlier are difficult to use for actual calculations. The following computational formulas can be used for these summary measures. Let k be the total number of treatments, N the total number of observations, T_j the total of the observations in the j-th treatment, and T the total of all N observations. Then

SSTotal = \sum x^2 - \frac{T^2}{N}, \qquad SST = \sum_{j=1}^{k} \frac{T_j^2}{n_j} - \frac{T^2}{N}, \qquad SSE = SSTotal - SST,

MST = \frac{SST}{k - 1}, \qquad MSE = \frac{SSE}{N - k}.

Luckily, there are statistical packages which are able to calculate MST and MSE.
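For instance, SciPy's one-way ANOVA routine reports the F statistic (the ratio MST/MSE) and its p-value directly. The sample arrays below are hypothetical, included only to make the sketch runnable.

```python
# Minimal sketch of a one-way ANOVA with SciPy on hypothetical samples.
from scipy import stats

sample1 = [10, 12, 11, 13]   # hypothetical treatment samples
sample2 = [14, 15, 13, 16]
sample3 = [12, 12, 14, 13]

result = stats.f_oneway(sample1, sample2, sample3)
print(result.statistic, result.pvalue)   # F statistic and its p-value
```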
Example
A sales manager decides to randomly select four sales of robots over the last year for each of his three sales representatives and observes the actual selling price of each robot.

Sales Prices (in thousands of dollars)
          Salesperson #1    Salesperson #2    Salesperson #3
Totals    49                56                51              (overall total: 156)

Based on the results of the survey, can you conclude that there is a significant difference in the average sale price which the three sales reps have been able to negotiate? Use α = .05.
Solution
Step 1 (Define the hypothesis in plain English.)
The hypothesis is straightforward:
There is no difference in average sale price among the three sales reps.
There is a difference in average sale price among the three sales reps.

Step 2 (Select the appropriate statistical measure.)
Since the sales manager is interested in comparing the average sale prices of the three sales reps, the population parameters of interest are the true average sale prices for each sales rep:
μ_1 = true average sale price for sales rep #1,
μ_2 = true average sale price for sales rep #2,
μ_3 = true average sale price for sales rep #3.
Solution
Step 3 (Determine whether the hypothesis should be one-sided or two-sided.)
Based on the way the test statistic is constructed, we will reject the null hypothesis for large values of the test statistic, meaning that the variability among the sample means is much larger than the variability within the sample observations. Thus, the F-test is always a one-sided test.

Step 4 (State the hypothesis using the appropriate statistical measure.)
H_0: μ_1 = μ_2 = μ_3 (The average sale price is the same for all three sales reps.)
H_a: At least one μ_i is different.

Step 5 (Specify the level of the test.)
The level of the test is specified in the problem as α = .05.
Solution
Step 6 (Select the appropriate test statistic.)
There are three key questions which we must ask:
Are the sale prices for each of the sales reps normally distributed?
Do the sale prices for each of the sales reps have approximately equal variances?
Were the sample sale prices for each of the sales reps collected in an independent and random fashion?
Based on prior studies, you have reason to believe that the sale prices of the robots for each of the sales reps have an approximately normal distribution and that the variances of the three distributions are approximately equal. Also, you used random sampling to collect your data. Thus the appropriate test statistic is

F = \frac{MST}{MSE}.
Solution
Step 7 (Determine the critical value.)
The level of the test is α = .05, and there are k - 1 = 3 - 1 = 2 (numerator) degrees of freedom and N - k = 12 - 3 = 9 (denominator) degrees of freedom. Thus the critical value is F_{.05} = 4.26. We will reject H_0 if the computed value of the test statistic is larger than 4.26. [Figure: the rejection region lies to the right of 4.26 under the F-distribution with 2 and 9 degrees of freedom.]
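The critical value can also be obtained from software rather than a table; assuming SciPy is available, a one-line check:

```python
# Upper 5% critical value of the F-distribution with 2 and 9 degrees of freedom.
from scipy import stats

f_crit = stats.f.ppf(1 - 0.05, dfn=2, dfd=9)
print(round(f_crit, 2))   # approximately 4.26
```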
Solution
Step 8 (Compute the test statistic.)
Using the computational formulas presented previously, you can calculate that MST = 3.25 and MSE = 3.5. The resulting calculated value of the test statistic is

F = \frac{MST}{MSE} = \frac{3.25}{3.5} \approx 0.929.

Step 9 (Make the decision.)
Since the resulting value of the test statistic, 0.929, is less than the critical value of 4.26, we fail to reject the null hypothesis.
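The same comparison, written out as a small sketch using the values from Steps 7 and 8:

```python
# Compare the computed F statistic with the critical value from Step 7.
mst, mse = 3.25, 3.5
f_stat = mst / mse            # about 0.929
f_crit = 4.26                 # critical value F_.05 with (2, 9) degrees of freedom
reject_h0 = f_stat > f_crit   # False, so we fail to reject H0
print(round(f_stat, 3), reject_h0)
```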
Solution
Step 10 (State the conclusion in terms of the original question.)
There is not sufficient evidence at α = .05 to reject the null hypothesis. Thus, we cannot conclude that there is a difference in average sale price among the three sales reps.