Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The F test – for comparing k means. Situation: we have k normal populations. Let μ_i and σ_i denote the mean and standard deviation of population i, i = 1, 2, 3, …, k. Note: we assume that the standard deviation is the same for each population: σ_1 = σ_2 = … = σ_k = σ.

We want to test H_0: μ_1 = μ_2 = … = μ_k against H_A: at least one pair of means differ.

The data: assume we have collected data from each of the k populations. Let x_i1, x_i2, x_i3, … denote the n_i observations from population i, i = 1, 2, 3, …, k. Let x̄_i denote the sample mean for population i, N = n_1 + n_2 + … + n_k the total sample size, and x̄ the overall (grand) mean.

One possible solution (incorrect): choose the populations two at a time, then perform a two sample t test of H_0: μ_i = μ_j against H_A: μ_i ≠ μ_j. Repeat this for every possible pair of populations.

The flaw with this procedure is that you are performing a collection of tests rather than a single test. If each test is performed with α = 0.05, then the probability that any one test makes a type I error is 5%, but the probability that the group of tests makes a type I error could be considerably higher than 5%. That is, suppose there is no difference in the means of the populations; the chance that this procedure declares a significant difference could be considerably higher than 5%.

The Bonferroni inequality: if N tests are performed, each with significance level α, then
P[group of N tests makes a type I error] ≤ 1 − (1 − α)^N.
Example: suppose α = 0.05 and N = 10; then
P[group of N tests makes a type I error] ≤ 1 − (0.95)^10 ≈ 0.40.
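As a minimal sketch (the function names here are my own, not from the slides), the quantity 1 − (1 − α)^N can be computed directly; for independent tests it is exactly the familywise error rate, which a small Monte Carlo simulation confirms:

```python
import random

def familywise_error(alpha, n_tests):
    # 1 - (1 - alpha)^N: the chance that at least one of N independent
    # level-alpha tests makes a type I error
    return 1 - (1 - alpha) ** n_tests

def simulate_familywise(alpha, n_tests, reps=100_000, seed=1):
    # Monte Carlo check: each test errs independently with probability alpha
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(n_tests))
        for _ in range(reps)
    )
    return hits / reps

print(round(familywise_error(0.05, 10), 4))   # 0.4013, matching the slide
print(round(simulate_familywise(0.05, 10), 2))
```

The simulated rate should agree with the closed form to about two decimal places.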

For this reason we are going to consider a single test for testing H_0: μ_1 = μ_2 = … = μ_k against H_A: at least one pair of means differ. Note: if k = 10, the number of pairs of means (and hence the number of tests that would have to be performed) is k(k − 1)/2 = 10(9)/2 = 45.

The F test

To test H_0: μ_1 = μ_2 = … = μ_k against H_A: at least one pair of means differ, use the test statistic
F = MS_Between / MS_Within.

SS_Between = Σ_i n_i (x̄_i − x̄)²
is called the Between Sum of Squares; it measures the variability between samples. The statistic k − 1 is known as the Between degrees of freedom, and
MS_Between = SS_Between / (k − 1)
is called the Between Mean Square.

SS_Within = Σ_i Σ_j (x_ij − x̄_i)²
is called the Within Sum of Squares. The statistic N − k is known as the Within degrees of freedom, and
MS_Within = SS_Within / (N − k)
is called the Within Mean Square.

Then
F = MS_Between / MS_Within = [SS_Between / (k − 1)] / [SS_Within / (N − k)].

The computing formula for F: compute
1) Σ_i Σ_j x_ij² (the sum of the squared observations)
2) T_i = Σ_j x_ij (the total for sample i)
3) G = Σ_i T_i (the grand total)
4) Σ_i T_i² / n_i
5) G² / N

Then
1) SS_Between = Σ_i T_i² / n_i − G² / N
2) SS_Within = Σ_i Σ_j x_ij² − Σ_i T_i² / n_i
3) F = [SS_Between / (k − 1)] / [SS_Within / (N − k)]
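The computing formulas above can be sketched in code. This is a minimal illustration with made-up data and helper names of my own (not from the slides):

```python
def one_way_f(groups):
    """groups: list of samples; returns (SS_Between, SS_Within, F)."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    sum_sq = sum(x * x for g in groups for x in g)   # 1) sum of x_ij^2
    totals = [sum(g) for g in groups]                # 2) sample totals T_i
    G = sum(totals)                                  # 3) grand total
    between_term = sum(t * t / len(g) for t, g in zip(totals, groups))  # 4)
    correction = G * G / N                           # 5) G^2 / N
    ss_between = between_term - correction
    ss_within = sum_sq - between_term
    f = (ss_between / (k - 1)) / (ss_within / (N - k))
    return ss_between, ss_within, f

# hypothetical weight-gain data for three groups
groups = [[101, 105, 98], [110, 115, 112], [90, 92, 95]]
ssb, ssw, f = one_way_f(groups)
print(ssb, ssw, round(f, 2))   # 602.0 50.0 36.12
```

Here F = (602/2)/(50/6) = 36.12, which with k − 1 = 2 and N − k = 6 degrees of freedom would be compared against the F critical value.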

The critical region for the F test: we reject H_0 if F ≥ F_α, where F_α is the critical point under the F distribution with ν_1 = k − 1 degrees of freedom in the numerator and ν_2 = N − k degrees of freedom in the denominator.

Example: in the following example we are comparing weight gains resulting from the following six diets:
1. Diet 1 - High Protein, Beef
2. Diet 2 - High Protein, Cereal
3. Diet 3 - High Protein, Pork
4. Diet 4 - Low Protein, Beef
5. Diet 5 - Low Protein, Cereal
6. Diet 6 - Low Protein, Pork

Hence

Thus, since F exceeds the critical value F_α, we reject H_0.

The ANOVA Table A convenient method for displaying the calculations for the F-test

ANOVA Table

Source    d.f.    Sum of Squares    Mean Square    F-ratio
Between   k − 1   SS_Between        MS_Between     MS_Between / MS_Within
Within    N − k   SS_Within         MS_Within
Total     N − 1   SS_Total

The Diet Example

Source    d.f.    Sum of Squares    Mean Square    F-ratio (p = )
Between
Within
Total

Equivalence of the F-test and the t-test when k = 2. The t-test:
t = (x̄_1 − x̄_2) / (s_p √(1/n_1 + 1/n_2)),
where s_p² is the pooled estimate of the common variance, with n_1 + n_2 − 2 degrees of freedom.

The F-test:
F = MS_Between / MS_Within, with 1 and n_1 + n_2 − 2 degrees of freedom.

Hence F = t², so for k = 2 the F-test and the (two-sided) t-test are equivalent.
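The k = 2 equivalence F = t² can be checked numerically. The sketch below uses made-up data and my own helper names, with the pooled-variance form of the t statistic:

```python
def pooled_t(x, y):
    # two-sample t statistic with pooled variance (equal-sigma assumption)
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    sp2 = ss / (nx + ny - 2)
    return (mx - my) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

def one_way_f_stat(groups):
    # F = MS_Between / MS_Within for k groups
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(v for g in groups for v in g) / N
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ssb / (k - 1)) / (ssw / (N - k))

x = [5.1, 4.9, 5.6, 5.0]
y = [6.0, 6.3, 5.8, 6.1]
t = pooled_t(x, y)
F = one_way_f_stat([x, y])
print(round(t * t, 6), round(F, 6))   # the two values agree: F = t^2
```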

Using SPSS. Note: the use of another statistical package, such as Minitab, is similar to using SPSS.

Assume the data is contained in an Excel file

Each variable is in a column:
1. Weight gain (wtgn)
2. Diet (diet)
3. Source of protein (Source)
4. Level of protein (Level)

After starting the SPSS program the following dialogue box appears:

If you select Opening an existing file and press OK, the following dialogue box appears:

The following dialogue box appears:

If the variable names are in the file, ask it to read the names. If you do not specify the Range, the program will identify the Range. Once you click OK, two windows will appear.

One that will contain the output:

The other containing the data:

To perform ANOVA select Analyze->General Linear Model-> Univariate

The following dialog box appears

Select the dependent variable and the fixed factors. Press OK to perform the analysis.

The Output

Comments: the F-test tests H_0: μ_1 = μ_2 = μ_3 = … = μ_k against H_A: at least one pair of means are different. If H_0 is accepted we conclude that all means are equal (not significantly different). If H_0 is rejected we conclude that at least one pair of means is significantly different. The F-test gives no information as to which pairs of means are different; one can then use two sample t tests to determine which pairs of means are significantly different.

Fisher's LSD (least significant difference) procedure:
1. Test H_0: μ_1 = μ_2 = μ_3 = … = μ_k against H_A: at least one pair of means are different, using the ANOVA F-test.
2. If H_0 is accepted we conclude that all means are equal (not significantly different); in this case, stop.
3. If H_0 is rejected we conclude that at least one pair of means is significantly different; follow this by using two sample t tests to determine which pairs of means are significantly different.
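Step 3 can be sketched as follows (helper names are mine, and a complete LSD analysis would also compare each |t| against the critical t with N − k degrees of freedom). The key point is that every pairwise t statistic shares the pooled MS_Within estimate:

```python
def lsd_t_statistics(groups):
    """Pairwise t statistics sharing the pooled MS_Within estimate.

    groups: list of samples; returns {(i, j): t} with 1-based group labels.
    """
    k = len(groups)
    N = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_within = ss_within / (N - k)   # pooled variance estimate, N - k d.f.
    t_stats = {}
    for i in range(k):
        for j in range(i + 1, k):
            se = (ms_within * (1 / len(groups[i]) + 1 / len(groups[j]))) ** 0.5
            t_stats[(i + 1, j + 1)] = (means[i] - means[j]) / se
    return t_stats

# hypothetical data: three small samples
t_stats = lsd_t_statistics([[1, 2, 3], [2, 3, 4], [10, 11, 12]])
for pair, t in sorted(t_stats.items()):
    print(pair, round(t, 3))
```

Pairs whose |t| exceeds the critical value are declared significantly different.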

Example: in the following example we are comparing weight gains resulting from the following six diets:
1. Diet 1 - High Protein, Beef
2. Diet 2 - High Protein, Cereal
3. Diet 3 - High Protein, Pork
4. Diet 4 - Low Protein, Beef
5. Diet 5 - Low Protein, Cereal
6. Diet 6 - Low Protein, Pork

Hence

Thus

The ANOVA Table

Source    d.f.    Sum of Squares    Mean Square    F-ratio (p = )
Between
Within
Total

Thus, since F exceeds the critical value, we reject H_0. Conclusion: there are significant differences amongst the k = 6 means.

Now we want to perform t tests to compare the k = 6 means, using the critical t value for 54 d.f.

Table of means and t test results. Critical value: t for 54 d.f.; t values that are significant are indicated in bold.

Conclusions:
1. There is no significant difference between diet 1 (high protein, beef) and diet 3 (high protein, pork).
2. There are no significant differences amongst diets 2, 4, 5 and 6 (i.e. high protein, cereal (diet 2) and the low protein diets (diets 4, 5 and 6)).
3. There are significant differences between diets 1 and 3 (high protein, meat) and the other diets (2, 4, 5, and 6).
Major conclusion: high protein diets result in a higher weight gain, but only if the source of protein is a meat source.

These are similar to the conclusions reached using exploratory techniques (examining box-plots).

[Box-plots of weight gain: High Protein vs. Low Protein, for Beef, Cereal and Pork]

Conclusions: weight gain is higher for the high protein meat diets. Increasing the level of protein increases weight gain, but only if the source of protein is a meat source. Carrying out the F-test and Fisher's LSD ensures the significance of the conclusions; differences observed with exploratory methods alone could have occurred by chance.

Comparing k Population Proportions: the χ² test for independence

The two sample test for proportions. The data can be displayed in the following table:

            Population 1    Population 2    Total
Success     x_1             x_2             x_1 + x_2
Failure     n_1 − x_1       n_2 − x_2       n_1 + n_2 − (x_1 + x_2)
Total       n_1             n_2             n_1 + n_2

This problem can be extended in two ways:
1. Increasing the number of populations (columns) from 2 to k (or c).
2. Increasing the number of categories (rows) from 2 to r.

           1       2       …     c       Total
1          x_11    x_12    …     x_1c    R_1
2          x_21    x_22    …     x_2c    R_2
⋮
r          x_r1    x_r2    …     x_rc    R_r
Total      C_1     C_2     …     C_c     N

The χ² test for independence

Situation We have two categorical variables R and C. The number of categories of R is r. The number of categories of C is c. We observe n subjects from the population and count x ij = the number of subjects for which R = i and C = j. R = rows, C = columns

Example: both systolic blood pressure (C) and serum cholesterol (R) were measured for a sample of n = 1237 subjects. Blood pressure and cholesterol were each divided into four ordered ranges of measured values.

Table: two-way frequency

The χ² test for independence. Define
E_ij = R_i C_j / n
= the expected frequency in the (i, j)-th cell in the case of independence.

Justification for E_ij = R_i C_j / n in the case of independence: let π_ij = P[R = i, C = j]. In the case of independence, π_ij = P[R = i] P[C = j] = π_i π_j, so the expected frequency in the (i, j)-th cell is n π_i π_j, which is estimated by n (R_i / n)(C_j / n) = R_i C_j / n.

To test H_0: R and C are independent against H_A: R and C are not independent, use the test statistic
χ² = Σ_i Σ_j (x_ij − E_ij)² / E_ij,
where x_ij = the observed frequency in the (i, j)-th cell and E_ij = the expected frequency in the (i, j)-th cell in the case of independence.

The sampling distribution of the test statistic when H_0 is true is the χ² distribution with degrees of freedom = (r − 1)(c − 1). Critical and acceptance region: reject H_0 if χ² ≥ χ²_α; accept H_0 if χ² < χ²_α.
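A minimal sketch of the whole procedure (the function name is mine), computing E_ij = R_i C_j / n and the χ² statistic for an observed table of counts:

```python
def chi_square_independence(table):
    """table: list of rows of observed counts x_ij.

    Returns (chi2, df, expected) where expected[i][j] = R_i * C_j / n.
    """
    row_totals = [sum(row) for row in table]           # R_i
    col_totals = [sum(col) for col in zip(*table)]     # C_j
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (x - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for x, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# hypothetical 2x2 table of counts
chi2, df, expected = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2, 4), df)
```

The computed χ² would then be compared against the χ²_α critical value for the stated degrees of freedom.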

Standardized residuals:
r_ij = (x_ij − E_ij) / √E_ij.
Degrees of freedom = (r − 1)(c − 1) = 9. The test statistic exceeds the critical value, so we reject H_0 using α = 0.05.

Another Example This data comes from a Globe and Mail study examining the attitudes of the baby boomers. Data was collected on various age groups

One question with responses Are there differences in weekly consumption of alcohol related to age?

Table: Expected frequencies

Table: Residuals Conclusion: There is a significant relationship between age group and weekly alcohol use

Examining the residuals allows one to identify the cells that indicate a departure from independence. Large positive residuals indicate cells where the observed frequencies were larger than expected under independence; large negative residuals indicate cells where the observed frequencies were smaller than expected under independence.
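These residuals can be sketched as (x_ij − E_ij)/√E_ij, the common Pearson-residual form, whose squares sum to the χ² statistic (my assumption for the slides' standardized residuals; the helper name is also mine):

```python
def standardized_residuals(table):
    """(x_ij - E_ij) / sqrt(E_ij) for each cell of an observed count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(x - r * c / n) / (r * c / n) ** 0.5
         for x, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]

res = standardized_residuals([[10, 20], [20, 10]])
# cells above their expected count get positive residuals, cells below negative
print([[round(v, 3) for v in row] for row in res])
```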

Another question with responses Are there differences in weekly internet use related to age? In an average week, how many times would you surf the internet?

Table: Expected frequencies

Table: Residuals Conclusion: There is a significant relationship between age group and weekly internet use

Echo (Age 20 – 29)

Gen X (Age 30 – 39)

Younger Boomers (Age 40 – 49)

Older Boomers (Age 50 – 59)

Pre Boomers (Age 60+)

Regression and Correlation: estimation by confidence intervals, hypothesis testing.