1 Analysis of Variance This technique is designed to test the null hypothesis that three or more group means are equal.

2 You may recall from several chapters back the idea of the variance and the associated idea called the standard deviation. Each is a measure of how spread out the values of a quantitative variable are. For a variable with a normal distribution, the square root of the variance, the standard deviation, helped us make probability statements about certain ranges of values of the variable. For what we want to do next we will focus more on the variance than on the standard deviation. [Slide figure: a normal curve with mean µ and standard deviation σ.]

3 When we do not know the population variance we saw that the sample variance is a good way to estimate it. If the sample size is n and the sample mean is x̄ (an x with a bar over it), then the sample variance is s² = Σ(xᵢ − x̄)² / (n − 1). So, to get the sample variance you take each sample data value, subtract the sample mean, and square the deviation. You add up, or take the summation of, the squared deviations across all the points and divide the result by n − 1.
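
Here is a minimal sketch of that computation in Python; the data values are invented just to show the arithmetic.

```python
# Invented sample data, used only to illustrate the sample variance formula.
data = [3.1, 2.8, 3.4, 3.0, 2.9]

n = len(data)
xbar = sum(data) / n                                  # sample mean
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)     # sum of squared deviations over n - 1

print("sample mean =", xbar, "sample variance =", s2)
```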

4 Say you have two variables, one quantitative and one qualitative. When the qualitative variable has three or more categories we can do Analysis of Variance. The basic idea is that we test whether the quantitative variable has the same mean for each value of the qualitative variable. As an example, say we have three majors. In each major there is a population mean GPA. The null hypothesis is that the mean of each population is the same. The alternative is that the means are not all the same. An underlying assumption is that the variance of each population is the same. Operationally we will take a sample and have GPA values from each of the majors. We will work with both the sample means and the sample variances to test the null hypothesis, and the test here is always a one-tailed (upper-tail) test.
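
To make the setup concrete for the sketches on later slides, here is a hypothetical sample of GPAs from three majors. The major names and values are invented; three groups of six give the same layout (3 groups, 18 observations in total) as the chapter example discussed on the final slide.

```python
# Hypothetical GPA samples from three majors (names and values are invented).
samples = {
    "Accounting": [3.1, 2.8, 3.4, 3.0, 2.9, 3.2],
    "Marketing":  [2.7, 3.0, 2.6, 2.9, 3.1, 2.8],
    "Finance":    [3.3, 3.5, 3.0, 3.2, 3.4, 3.1],
}
```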

5 [Slide figure: a single bell curve with the sample means x̄₁, x̄₂, x̄₃ all falling near the one population mean µ.] The null hypothesis in general is H₀: µ₁ = µ₂ = µ₃ = … = µₖ. Here we make some comments to help you get a feel for the test that will occur. If the null is true, then really we have only one large group, and you see here that each major's sample mean GPA is put on the number line and the population mean is just the single value µ. Here is where variances come into play. We will look at variances derived from the samples and we will make a statement about how good the sample information is expected to be in terms of estimating the population variance.

6 [Slide figure: three separate bell curves, with each sample mean x̄₁, x̄₂, x̄₃ near its own population mean µ₁, µ₂, µ₃.] Under the alternative hypothesis we would expect each group's distribution to be located in a different place, and I show here group 2's population mean being the farthest to the right. Note: on the last slide and this slide I have bell shaped curves and I am assuming the variance is the same for each curve. I do not know the value of the variance. There are several ways to estimate the variance and we will turn to that next, but I will want you to look back at each graph from time to time.

7 The between group estimate of the variance Essentially the between group estimate of the variance uses the sample mean from each group (you have at least identified the groups as possibly being distinct) and compares these means to the overall sample mean (ignoring group status). Operationally, the overall sample mean is subtracted from each group's sample mean, each difference is squared (and weighted by that group's sample size), the squared terms are added together, and the total is divided by the number of groups minus 1. Under the null hypothesis you see each sample mean under one bell curve, and thus the sample means are all in the same neighborhood. If the null is true, so that all groups have the same mean, this between group estimate of the population variance is deemed "good" because we are using information "inside" the bounds of the one distribution. Under the alternative hypothesis of different group means you see each sample mean under its own bell curve, and with three separate curves the values are spread out more. For illustrative purposes I have x̄₂ way to the right. Since the sample means are being compared to an overall sample mean, the between group estimate is deemed bad, and in fact "too big," because we are using information from samples that are in different neighborhoods (having different means). So depending on which hypothesis is true this estimate is good or bad, and we will exploit that in our test. But first we will discuss another idea.
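
A sketch of this between-group estimate using the hypothetical `samples` dictionary from slide 4 (this is the usual between-groups mean square, sometimes labeled MSB or MSTR):

```python
# Between-group estimate of the variance, continuing the hypothetical example.
all_values = [x for group in samples.values() for x in group]
grand_mean = sum(all_values) / len(all_values)        # overall sample mean, ignoring groups

k = len(samples)                                      # number of groups
ss_between = sum(
    len(group) * (sum(group) / len(group) - grand_mean) ** 2
    for group in samples.values()
)
ms_between = ss_between / (k - 1)                     # divide by number of groups minus 1
```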

8 The within group estimate of the variance Essentially the within group estimate pools the variation inside each group: the squared deviations of each observation from its own group's mean are added up across all the groups, and the total is divided by the overall sample size minus the number of groups. Since the population variance is assumed to be the same in each group, and since the within group estimate only uses variation within a group, this method of estimating the population variance is deemed good under either the null or the alternative hypothesis. Another way to see this is that although under the alternative hypothesis the means are not equal, we expect the shape of each distribution to be the same, and thus the sample variances within a given curve should be okay. So, this method of estimating the population variance is deemed "good" under either hypothesis.
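
A companion sketch of the within-group estimate, continuing the code from the previous slide:

```python
# Within-group estimate of the variance (continuing from the code above).
n_total = sum(len(group) for group in samples.values())        # overall sample size

ss_within = 0.0
for group in samples.values():
    group_mean = sum(group) / len(group)
    ss_within += sum((x - group_mean) ** 2 for x in group)     # deviations about the group's own mean

ms_within = ss_within / (n_total - k)                 # divide by overall sample size minus number of groups
```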

9 F statistic There is a statistic called the F statistic that is formed in this context by taking the ratio of the between group estimate to the within group estimate. We would have F = (between group estimate) divided by (within group estimate). Under the null of equal means we have, informally, the ratio F = (good)/(good), which should then be close to 1. Under the alternative, F = (too big)/(good), which should be greater than 1.
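
Continuing the sketch, the F statistic is just the ratio of the two estimates computed above:

```python
# Ratio of the between-group estimate to the within-group estimate.
F = ms_between / ms_within
print("F =", F)
```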

10 The F distribution The F distribution is skewed in such a way that it has a long tail on the right. Even when the null hypothesis is true (all group means are the same) it is possible, due to sampling variation, that the sample F could actually be greater than 1. But the farther the F is above 1, the less likely such values are to occur when the null is true. When we pick an alpha value to control the probability of a type I error, we pick a value of F from the table such that if the F calculated from the sample is greater than the tabled value (sometimes called the critical value) we reject the null hypothesis and go with the alternative. Otherwise the null hypothesis is not rejected.
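
A sketch of that decision rule, assuming SciPy is available (the degrees of freedom used here are spelled out on the next slide):

```python
from scipy import stats   # assumes SciPy is installed

alpha = 0.05
df_num = k - 1             # numerator degrees of freedom
df_den = n_total - k       # denominator degrees of freedom

f_critical = stats.f.ppf(1 - alpha, df_num, df_den)   # the "tabled" critical value
reject_null = F > f_critical                          # reject when the sample F exceeds the critical value
print("critical F =", f_critical, "reject the null?", reject_null)
```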

11 Summary We test differences in group means by working with properties of estimates of the population variance. If the group means are the same we expect the F statistic, made up of a ratio of two estimates of the population variance, to be fairly close to 1. So, when we get a sample F that is larger than the critical F we reject the null. Or, if the p-value for the F is less than alpha, we reject the null. Note: there are a large number of computations to be made in this section, so we turn to Excel again to do our calculations. Note that F has two df's – numerator and denominator degrees of freedom. The numerator degrees of freedom = k − 1, where k is the number of groups. The denominator degrees of freedom = n_T − k, where n_T is the total number of observations, n₁ + n₂ + … + nₖ.
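
Instead of (or as a check on) Excel, the whole calculation can also be done in one call with SciPy's one-way ANOVA routine, which returns the same F statistic along with its p-value:

```python
# One-call check of the manual computations above (assumes SciPy).
f_stat, p_value = stats.f_oneway(*samples.values())

print("F =", f_stat, " p-value =", p_value)
print("reject the null at alpha = 0.05?", p_value < alpha)
```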

12 F table – starting on page 655 In the chapter there is an example with 3 groups and 18 total observations. The critical F has 2 and 15 degrees of freedom. So you go over 2 columns first and then down 15 rows, which places us on page 656. With alpha = .05 the critical F is about 3.68. If our F from the sample information is larger than this we reject the null and conclude at least one group mean is different from the others. Similarly, if our p-value from the sample information is less than alpha we reject the null. Fortunately, Excel gives us both pieces of information.
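
The table lookup on this slide can be reproduced directly, again assuming SciPy:

```python
from scipy import stats

# Critical F with 2 and 15 degrees of freedom at alpha = 0.05.
print(stats.f.ppf(0.95, 2, 15))   # approximately 3.68
```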