Statistics Analysis of Variance.

Contents
Statistics in Practice
Introduction to Analysis of Variance
Analysis of Variance: Testing for the Equality of k Population Means
Multiple Comparison Procedures

STATISTICS in PRACTICE
Burke Marketing Services, Inc., is one of the most experienced market research firms in the industry. In one study, a firm retained Burke to evaluate potential new versions of a children’s dry cereal. Analysis of variance was the statistical method used to study the data obtained from the taste tests.

STATISTICS in PRACTICE (Continued)
The experimental design employed by Burke and the subsequent analysis of variance were helpful in making a product design recommendation.

Introduction to Analysis of Variance (ANOVA)
Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means. Data obtained from observational or experimental studies can be used for the analysis.

We want to use the sample results to test the following hypotheses:
H0: $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
Ha: Not all population means are equal

If H0 is rejected, we cannot conclude that all population means are different. Rejecting H0 means that at least two population means have different values.

Sample means are close together because there is only one sampling distribution when H0 is true.
[Figure: sampling distribution of $\bar{x}$ given H0 is true]

Sample means come from different sampling distributions and are not as close together when H0 is false.
[Figure: sampling distributions of $\bar{x}$ given H0 is false, with separate distributions centered at $\mu_3$, $\mu_1$, and $\mu_2$]

Assumptions:
1. For each population, the response variable is normally distributed.
2. The variance of the response variable, denoted $\sigma^2$, is the same for all of the populations.
3. The observations must be independent.
The TA will teach how to use MINITAB to test whether these three assumptions are valid.

Terminology: Treatment, Between-Treatments, Within-Treatments

Analysis of Variance: Testing for the Equality of k Population Means
Between-Treatments Estimate of Population Variance
Within-Treatments Estimate of Population Variance
Comparing the Variance Estimates: The F Test
ANOVA Table

Analysis of variance can be used to test for the equality of k population means. The hypotheses tested are
H0: $\mu_1 = \mu_2 = \cdots = \mu_k$
Ha: Not all population means are equal
where $\mu_j$ = mean of the jth population.

Sample data
$x_{ij}$ = value of observation i for treatment j
$n_j$ = number of observations for treatment j
$\bar{x}_j$ = sample mean for treatment j
$s_j^2$ = sample variance for treatment j
$s_j$ = sample standard deviation for treatment j

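Sample statistics
The sample mean for treatment j: $\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{ij}}{n_j}$
The sample variance for treatment j: $s_j^2 = \frac{\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{n_j - 1}$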

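The overall sample mean: $\bar{\bar{x}} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}}{n_T}$, where $n_T = n_1 + n_2 + \cdots + n_k$.
If the size of each sample is n, then $n_T = kn$ and $\bar{\bar{x}} = \frac{\sum_{j=1}^{k} \bar{x}_j}{k}$.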
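The sample statistics defined above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the three small samples and their labels are made-up illustrative values, not data from this presentation.

```python
from statistics import mean, variance

# Hypothetical data: one list of observations per treatment (made-up values).
samples = {
    "A": [4.0, 6.0, 8.0],
    "B": [5.0, 7.0, 9.0, 11.0],
    "C": [10.0, 12.0, 14.0],
}

# Per-treatment sample size, sample mean, and sample variance (n_j - 1 divisor).
n = {j: len(x) for j, x in samples.items()}
xbar = {j: mean(x) for j, x in samples.items()}
s2 = {j: variance(x) for j, x in samples.items()}

# Overall sample mean: all observations pooled, divided by n_T.
n_T = sum(n.values())
overall_mean = sum(sum(x) for x in samples.values()) / n_T

# Simple average of the treatment means; equals the overall mean only
# when every sample has the same size.
mean_of_means = mean(xbar.values())

print(n_T, xbar, s2, overall_mean, mean_of_means)
```

Because the hypothetical samples have unequal sizes, `overall_mean` and `mean_of_means` differ here, which is why the shortcut of averaging the treatment means is stated only for equal sample sizes.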

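Between-Treatments Estimate of Population Variance
The sum of squares due to treatments (SSTR): $\text{SSTR} = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$

Between-Treatments Estimate of Population Variance
The mean square due to treatments (MSTR): $\text{MSTR} = \frac{\text{SSTR}}{k - 1}$

Within-Treatments Estimate of Population Variance
The sum of squares due to error (SSE): $\text{SSE} = \sum_{j=1}^{k} (n_j - 1) s_j^2$
The mean square due to error (MSE): $\text{MSE} = \frac{\text{SSE}}{n_T - k}$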

Between-Treatments Estimate of Population Variance
A between-treatments estimate of $\sigma^2$ is called the mean square due to treatments and is denoted MSTR:
$\text{MSTR} = \frac{\text{SSTR}}{k - 1}$
The numerator is the sum of squares due to treatments and is denoted SSTR. The denominator, k - 1, represents the degrees of freedom associated with SSTR.

Within-Samples Estimate of Population Variance
The estimate of $\sigma^2$ based on the variation of the sample observations within each sample is called the mean square error and is denoted by MSE:
$\text{MSE} = \frac{\text{SSE}}{n_T - k}$
The numerator is the sum of squares due to error and is denoted SSE. The denominator, nT - k, represents the degrees of freedom associated with SSE.
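As a rough illustration of the two variance estimates, the sketch below computes SSTR, MSTR, SSE, and MSE from per-treatment summary statistics. The function name `between_and_within` and the input numbers are assumptions made for this example only.

```python
def between_and_within(n, xbar, s2):
    """Return (SSTR, MSTR, SSE, MSE) from per-treatment sizes, means, variances."""
    k = len(n)
    n_T = sum(n)
    # Overall mean as the size-weighted average of the treatment means.
    overall = sum(nj * xj for nj, xj in zip(n, xbar)) / n_T
    # Between-treatments sum of squares and its mean square (k - 1 d.f.).
    sstr = sum(nj * (xj - overall) ** 2 for nj, xj in zip(n, xbar))
    mstr = sstr / (k - 1)
    # Within-treatments sum of squares and its mean square (n_T - k d.f.).
    sse = sum((nj - 1) * s2j for nj, s2j in zip(n, s2))
    mse = sse / (n_T - k)
    return sstr, mstr, sse, mse


# Hypothetical summary statistics for k = 3 treatments (made-up values).
sstr, mstr, sse, mse = between_and_within(
    n=[3, 4, 3], xbar=[6.0, 8.0, 12.0], s2=[4.0, 20 / 3, 4.0]
)
print(sstr, mstr, sse, mse)
```

Working from summary statistics rather than raw observations mirrors how SSE is usually computed by hand: each sample variance is weighted by its own degrees of freedom.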

Comparing the Variance Estimates: The F Test
If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with MSTR d.f. equal to k - 1 and MSE d.f. equal to nT - k. If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates $\sigma^2$.

Test for the Equality of k Population Means
Hypotheses
H0: $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
Ha: Not all population means are equal
Test Statistic
F = MSTR/MSE

Test for the Equality of k Population Means
Rejection Rule:
p-value Approach: Reject H0 if p-value < $\alpha$
Critical Value Approach: Reject H0 if F > $F_\alpha$
where the value of $F_\alpha$ is based on an F distribution with k - 1 numerator d.f. and nT - k denominator d.f.
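The p-value approach needs the upper-tail area of the F distribution. The sketch below approximates it with the standard library only, using the identity P(F > f) = I_x(d2/2, d1/2) with x = d2/(d2 + d1 f), where I_x is the regularized incomplete beta function, and Simpson's rule for the integral. The function names are hypothetical, and the sketch assumes both degrees of freedom are at least 2; in practice a statistics package would normally supply this value.

```python
import math


def f_upper_tail(f, d1, d2, steps=2000):
    """Approximate P(F > f) for an F distribution with d1 and d2 d.f.

    Assumes d1 >= 2 and d2 >= 2 so the beta density is bounded on [0, x].
    `steps` must be even (Simpson's rule).
    """
    a, b = d2 / 2.0, d1 / 2.0
    x = d2 / (d2 + d1 * f)
    # Normalizing constant 1 / B(a, b), computed on the log scale.
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

    def density(t):
        return norm * t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3


def anova_decision(f_stat, k, n_T, alpha=0.05):
    """Return (p_value, reject_H0) for the test of k equal population means."""
    p_value = f_upper_tail(f_stat, k - 1, n_T - k)
    return p_value, p_value < alpha


# Hypothetical test statistic for k = 3 treatments and n_T = 15 observations.
p_value, reject = anova_decision(5.0, k=3, n_T=15)
print(p_value, reject)
```

The critical value approach leads to the same decision: rejecting when F exceeds the tabled upper-tail value is equivalent to the p-value falling below the chosen significance level.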

Sampling Distribution of MSTR/MSE
[Figure: sampling distribution of MSTR/MSE with the critical value $F_\alpha$ marked; "Do Not Reject H0" to the left of the critical value, "Reject H0" in the upper-tail rejection region of area $\alpha$ to the right]

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Treatment             SSTR             k - 1                MSTR          MSTR/MSE
Error                 SSE              nT - k               MSE
Total                 SST              nT - 1

SST is partitioned into SSTR and SSE. SST's degrees of freedom (d.f.) are partitioned into SSTR's d.f. and SSE's d.f.

ANOVA Table
SST divided by its degrees of freedom nT - 1 is the overall sample variance that would be obtained if we treated the entire set of observations as one data set. With the entire data set as one sample, the formula for computing the total sum of squares, SST, is:
$\text{SST} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 = \text{SSTR} + \text{SSE}$

ANOVA Table ANOVA can be viewed as the process of partitioning the total sum of squares and the degrees of freedom into their corresponding sources: treatments and error. Dividing the sum of squares by the appropriate degrees of freedom provides the variance estimates and the F value used to test the hypothesis of equal population means.
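The partitioning can be checked numerically. The sketch below computes the total sum of squares directly from pooled observations and compares it with the sum of the treatment and error components; the function name `anova_partition` and the data are made up for illustration.

```python
def anova_partition(samples):
    """Return (SST, SSTR, SSE) for a list of samples, one list per treatment."""
    pooled = [x for sample in samples for x in sample]
    grand = sum(pooled) / len(pooled)
    # Total variation: every observation against the overall mean.
    sst = sum((x - grand) ** 2 for x in pooled)
    # Between-treatments variation: treatment means against the overall mean.
    sstr = sum(len(s) * (sum(s) / len(s) - grand) ** 2 for s in samples)
    # Within-treatments variation: observations against their own treatment mean.
    sse = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
    return sst, sstr, sse


# Hypothetical data for k = 3 treatments (made-up values).
data = [[4.0, 6.0, 8.0], [5.0, 7.0, 9.0, 11.0], [10.0, 12.0, 14.0]]
sst, sstr, sse = anova_partition(data)

k = len(data)
n_T = sum(len(s) for s in data)

print(sst, sstr + sse)             # the two totals agree up to rounding
print(n_T - 1, (k - 1) + (n_T - k))  # degrees of freedom partition the same way
```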

Test for the Equality of k Population Means-- Example
National Computer Products, Inc. (NCP), manufactures printers and fax machines at plants located in Atlanta, Dallas, and Seattle. Objective: To measure how much employees at these plants know about total quality management. A random sample of six employees was selected from each plant and given a quality awareness examination.

Test for the Equality of k Population Means--Example
Let
$\mu_1$ = mean examination score for population 1
$\mu_2$ = mean examination score for population 2
$\mu_3$ = mean examination score for population 3

Hypotheses
H0: $\mu_1 = \mu_2 = \mu_3$
Ha: Not all population means are equal
In this example:
1. Dependent or response variable: examination score
2. Independent variable or factor: plant location
3. Levels of the factor or treatments: the values of a factor selected for investigation. In the NCP example, the three treatments (three populations) are Atlanta, Dallas, and Seattle.

Three assumptions:
1. For each population, the response variable is normally distributed. The examination scores (response variable) must be normally distributed at each plant.
2. The variance of the response variable, $\sigma^2$, is the same for all of the populations. The variance of examination scores must be the same for all three plants.
3. The observations must be independent. The examination score for each employee must be independent of the examination score for any other employee.

ANOVA Table
The p-value is less than $\alpha$ = .05, so we reject H0.

Example: Reed Manufacturing Janet Reed would like to know if there is any significant difference in the mean number of hours worked per week for the department managers at her three manufacturing plants (in Buffalo, Pittsburgh, and Detroit).

A simple random sample of five managers from each of the three plants was taken and the number of hours worked by each manager for the previous week is shown on the next slide. Conduct an F test using α = .05.

Observation        Plant 1 Buffalo   Plant 2 Pittsburgh   Plant 3 Detroit
1                  48                73                   51
2                  54                63                   63
3                  57                66                   61
4                  54                64                   54
5                  62                74                   56
Sample Mean        55                68                   57
Sample Variance    26.0              26.5                 24.5

p-Value and Critical Value Approaches
1. Develop the hypotheses.
H0: $\mu_1 = \mu_2 = \mu_3$
Ha: Not all the means are equal
where:
$\mu_1$ = mean number of hours worked per week by the managers at Plant 1
$\mu_2$ = mean number of hours worked per week by the managers at Plant 2
$\mu_3$ = mean number of hours worked per week by the managers at Plant 3

2. Specify the level of significance. $\alpha$ = .05
3. Compute the value of the test statistic.
Mean Square Due to Treatments (sample sizes are all equal):
$\bar{\bar{x}} = (55 + 68 + 57)/3 = 60$
SSTR = $5(55 - 60)^2 + 5(68 - 60)^2 + 5(57 - 60)^2 = 490$
MSTR = 490/(3 - 1) = 245
Mean Square Due to Error:
SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308
MSE = 308/(15 - 3) = 25.667
F = MSTR/MSE = 245/25.667 = 9.55

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Treatment             490              2                    245           9.55
Error                 308              12                   25.667
Total                 798              14

p-Value Approach
4. Compute the p-value. With 2 numerator d.f. and 12 denominator d.f., the p-value is .01 for F = 6.93. Therefore, the p-value is less than .01 for F = 9.55.
5. Determine whether to reject H0. The p-value < .05, so we reject H0. We have sufficient evidence to conclude that the mean number of hours worked per week by department managers is not the same at all three plants.

Critical Value Approach
4. Determine the critical value and rejection rule. Based on an F distribution with 2 numerator d.f. and 12 denominator d.f., $F_{.05}$ = 3.89. Reject H0 if F > 3.89.
5. Determine whether to reject H0. Because F = 9.55 > 3.89, we reject H0. We have sufficient evidence to conclude that the mean number of hours worked per week by department managers is not the same at all three plants.
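The Reed Manufacturing calculation can be reproduced in a few lines. This sketch assumes the fifteen observations are the values listed in the code, which are consistent with the sample variances 26.0, 26.5, and 24.5 and the sums of squares 490 and 308 used above. The closed-form p-value at the end is a special case that holds only when the numerator has 2 degrees of freedom (k = 3); it is not a general F-distribution formula.

```python
from statistics import mean, variance

# Hours worked per week, five managers per plant (assumed values; see lead-in).
hours = {
    "Buffalo": [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit": [51, 63, 61, 54, 56],
}

k = len(hours)
n_T = sum(len(x) for x in hours.values())
grand_mean = mean([v for x in hours.values() for v in x])

# Between-treatments and within-treatments sums of squares.
sstr = sum(len(x) * (mean(x) - grand_mean) ** 2 for x in hours.values())
sse = sum((len(x) - 1) * variance(x) for x in hours.values())

mstr = sstr / (k - 1)
mse = sse / (n_T - k)
f_stat = mstr / mse

# Upper-tail p-value; this closed form is valid only for 2 numerator d.f.
d2 = n_T - k
p_value = (1 + 2 * f_stat / d2) ** (-d2 / 2)

print(f"SSTR={sstr}, SSE={sse}, MSTR={mstr}, MSE={mse:.3f}")
print(f"F={f_stat:.2f}, p-value={p_value:.4f}")
```

Both decision rules agree on these numbers: the computed F exceeds the critical value 3.89, and the p-value falls below .01.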

Test for the Equality of k Population Means
Summary

Multiple Comparison Procedures
Suppose that analysis of variance has provided statistical evidence to reject the null hypothesis of equal population means. Fisher’s least significant difference (LSD) procedure can be used to determine where the differences occur.

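Fisher’s LSD Procedure
Hypotheses
H0: $\mu_i = \mu_j$
Ha: $\mu_i \neq \mu_j$
Test Statistic
$t = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{\text{MSE}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}$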

Fisher’s LSD Procedure
Rejection Rule
p-value Approach: Reject H0 if p-value < $\alpha$
Critical Value Approach: Reject H0 if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
where the value of $t_{\alpha/2}$ is based on a t distribution with nT - k degrees of freedom.
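A small sketch of the critical value approach for one pairwise comparison follows. It assumes the usual Fisher LSD test statistic, the difference in sample means divided by sqrt(MSE(1/n_i + 1/n_j)), and takes the critical value from a t table as an input rather than computing it. The function names and the numeric inputs are hypothetical.

```python
import math


def lsd_t_statistic(xbar_i, xbar_j, n_i, n_j, mse):
    """t statistic for comparing two treatment means using the pooled MSE."""
    standard_error = math.sqrt(mse * (1.0 / n_i + 1.0 / n_j))
    return (xbar_i - xbar_j) / standard_error


def reject_equal_means(t_stat, t_crit):
    """Two-tailed rule: reject H0 if t < -t_crit or t > t_crit."""
    return t_stat < -t_crit or t_stat > t_crit


# Hypothetical inputs: two samples of size 4 and MSE = 8 from an ANOVA with
# 12 error degrees of freedom, for which a t table gives t_.025 = 2.179.
t_stat = lsd_t_statistic(xbar_i=10.0, xbar_j=14.0, n_i=4, n_j=4, mse=8.0)
reject = reject_equal_means(t_stat, t_crit=2.179)
print(t_stat, reject)
```

Note that the error degrees of freedom come from the full ANOVA (nT - k), not from the two samples being compared.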

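Fisher’s LSD Procedure Based on the Test Statistic $\bar{x}_i - \bar{x}_j$
Hypotheses
H0: $\mu_i = \mu_j$
Ha: $\mu_i \neq \mu_j$
Test Statistic
$\bar{x}_i - \bar{x}_j$
Rejection Rule
Reject H0 if $|\bar{x}_i - \bar{x}_j| > \text{LSD}$, where $\text{LSD} = t_{\alpha/2} \sqrt{\text{MSE}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$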

Fisher’s LSD Procedure Based on the Test Statistic xi – xj -- Example
Example: Reed Manufacturing Recall that Janet Reed wants to know if there is any significant difference in the mean number of hours worked per week for the department managers at her three manufacturing plants.

Analysis of variance has provided statistical evidence to reject the null hypothesis of equal population means. Fisher’s least significant difference (LSD) procedure can be used to determine where the differences occur.

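For $\alpha$ = .05 and nT - k = 15 - 3 = 12 degrees of freedom, $t_{.025}$ = 2.179. The MSE value was computed earlier (MSE = 25.667), so
$\text{LSD} = 2.179 \sqrt{25.667\left(\frac{1}{5} + \frac{1}{5}\right)} = 6.98$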

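LSD for Plants 1 and 2
Hypotheses (A): H0: $\mu_1 = \mu_2$, Ha: $\mu_1 \neq \mu_2$
Rejection Rule: Reject H0 if $|\bar{x}_1 - \bar{x}_2| > 6.98$
Test Statistic: $|\bar{x}_1 - \bar{x}_2| = |55 - 68| = 13$
Conclusion: Because 13 > 6.98, the mean number of hours worked at Plant 1 is not equal to the mean number worked at Plant 2.

LSD for Plants 1 and 3
Hypotheses (B): H0: $\mu_1 = \mu_3$, Ha: $\mu_1 \neq \mu_3$
Rejection Rule: Reject H0 if $|\bar{x}_1 - \bar{x}_3| > 6.98$
Test Statistic: $|\bar{x}_1 - \bar{x}_3| = |55 - 57| = 2$
Conclusion: Because 2 < 6.98, there is no significant difference between the mean number of hours worked at Plant 1 and the mean number of hours worked at Plant 3.

LSD for Plants 2 and 3
Hypotheses (C): H0: $\mu_2 = \mu_3$, Ha: $\mu_2 \neq \mu_3$
Rejection Rule: Reject H0 if $|\bar{x}_2 - \bar{x}_3| > 6.98$
Test Statistic: $|\bar{x}_2 - \bar{x}_3| = |68 - 57| = 11$
Conclusion: Because 11 > 6.98, the mean number of hours worked at Plant 2 is not equal to the mean number worked at Plant 3.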
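The three comparisons can be run in one pass. This sketch assumes sample means of 55, 68, and 57 hours for Plants 1, 2, and 3 (consistent with the differences 13, 2, and 11 above), five managers per plant, MSE = 308/12, and the tabled value 2.179 for t with 12 degrees of freedom at the .025 upper tail.

```python
import math
from itertools import combinations

# Assumed sample means (hours per week) and common sample size per plant.
means = {"Plant 1": 55, "Plant 2": 68, "Plant 3": 57}
n = 5
mse = 308 / 12   # mean square error from the ANOVA table
t_crit = 2.179   # t_.025 with 12 d.f., taken from a t table

# Least significant difference for equal sample sizes.
lsd = t_crit * math.sqrt(mse * (1 / n + 1 / n))

# Compare every pair of plants against the LSD.
significant = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    significant[(a, b)] = diff > lsd
    print(f"{a} vs {b}: |diff| = {diff}, LSD = {lsd:.2f}, reject H0: {diff > lsd}")
```

With equal sample sizes a single LSD value serves every pair; with unequal sizes the LSD would have to be recomputed for each comparison.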

Type I Error Rates
The comparisonwise Type I error rate $\alpha$ indicates the level of significance associated with a single pairwise comparison. The experimentwise Type I error rate $\alpha_{EW}$ is the probability of making a Type I error on at least one of the $k(k - 1)/2$ pairwise comparisons. Treating the comparisons as independent gives
$\alpha_{EW} = 1 - (1 - \alpha)^{k(k - 1)/2}$

Type I Error Rates The experimentwise Type I error rate gets larger for problems with more populations (larger k).
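To see how quickly the experimentwise rate grows, the sketch below evaluates it for several values of k. It assumes that k treatments give k(k - 1)/2 pairwise comparisons and that the comparisons are treated as independent, so the result is an approximation; the function name is hypothetical.

```python
from math import comb


def experimentwise_error_rate(k, alpha=0.05):
    """Approximate probability of at least one Type I error across all pairs."""
    comparisons = comb(k, 2)  # k(k - 1)/2 pairwise comparisons
    return 1 - (1 - alpha) ** comparisons


rates = {k: experimentwise_error_rate(k) for k in range(3, 7)}
for k, rate in rates.items():
    print(f"k = {k}: {comb(k, 2)} comparisons, experimentwise rate = {rate:.4f}")
```

Even with only three populations the experimentwise rate is well above the comparisonwise .05, which is the motivation for procedures that control the overall error rate.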
