Chapter 15 ANOVA.

Comparing Means for Several Populations
When we wish to test for differences in means for only one or two populations, we use one- or two-sample t inference. (We did two-sample t inference in MAT 212.)
Testing for differences among three or more populations, or across several different levels (values) of a variable, requires a different approach, called Analysis of Variance, or ANOVA.
ANOVA partitions the total sum of squares into two parts: within-treatment variability and between-treatment variability.
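In symbols, using the sum-of-squares notation that appears later in the chapter, the partition is SS(Total) = SST + SSE, where SST (the treatment sum of squares) captures the between-treatment variability and SSE (the error sum of squares) captures the within-treatment variability.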

Comparing Means for Several Populations
Example: Test 5 types of concrete for differences in moisture absorption. The 5 types of concrete are the five levels of the treatment.
Within variability seeks to quantify the variability in absorption within one particular type of concrete.
Between variability seeks to quantify the differences between the types of concrete.
ANOVA seeks to answer the question: are the differences among the 5 sample means larger than what would be expected from random variation alone?

Definitions
An experimental unit is an object, or subject, that produces a sample measurement.
The experimental conditions that define the different populations in a completely randomized design are called treatments.
Testing for differences in the treatments is equivalent to testing for differences in the population means.

Graphical demonstration: Employing two types of variability

Graphical demonstration: Employing two types of variability
[Figure: dot plots of samples from Treatments 1, 2, and 3 under two scenarios; only the captions are recoverable.]
A small variability within the samples makes it easier to draw a conclusion about the population means.
The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.

Assumptions for ANOVA
1. The samples are independent and random.
Selection of objects from any one population is unrelated to the selection of objects from any of the other populations. Selections are random (one individual has as much chance of being selected as another).
Examples: different groups of people (no person in more than one group), different types of music, different concentrations of chemicals, different models of automobiles.

Assumptions for ANOVA
2. All populations are normal.
3. Each population has the same standard deviation, σ (which implies the same variance, σ²). The value of this common standard deviation is not known before testing.
4. Each sample has a mean that can be calculated, and this sample mean serves as an estimate of its population mean.

Assumptions for ANOVA
The following assumptions are required for a one-way ANOVA:
The k populations are independent.
Each population is normally distributed.
Each population has a common standard deviation, σ.
Each population has a mean, μi, for i = 1, 2, …, k.
So we are testing whether all the treatment means are equal.
H0: μ1 = μ2 = … = μk
Ha: At least two of the population means are not equal.

Test Statistic
If the null hypothesis is true, we expect the k sample means to have reasonably similar values. In other words, if the population means are equal, we would expect the variability among the sample means to be relatively small. Variability among the sample means is one of the two quantities our test statistic compares.

Test Statistic
Even if the null hypothesis is true, we do not expect the sample means to be exactly the same, because chance plays a role in which experimental units end up in each sample. We therefore need to take into account the variability expected by chance among the sample means.

Test Statistic
This method is called "analysis of variance," or ANOVA, because we are comparing two sources of variation: the variability among the sample means and the variability expected by chance among the sample means when the null hypothesis is true.

Test Statistic
Our test statistic is called F:
F = (variability among the sample means) / (variability expected by chance)
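In the mean-square notation used later in the chapter, this ratio is F = MST / MSE, where MST = SST / (k − 1) and MSE = SSE / (N − k); when H0 is true, F follows an F distribution with k − 1 and N − k degrees of freedom, which is where the p-values in the Minitab output come from.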

Degrees of Freedom
Group (treatment) df = k − 1, where k is the number of groups.
Total df = N − 1, where N is the total number of experimental units in the experiment.
Error df = Total df − Group df, which simplifies to Error df = N − k.
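For example, in the tensile-strength output on an upcoming slide there are k = 6 machines and N = 24 measurements, so Group df = 6 − 1 = 5, Total df = 24 − 1 = 23, and Error df = 23 − 5 = 18, matching the DF column of that table.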

Technology
We will use Minitab (or StatCrunch or Excel) to do our calculations. A typical Minitab display is on the next slide.
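If none of those packages is at hand, the same calculation can be done in Python; the sketch below runs a one-way ANOVA with scipy.stats.f_oneway on made-up absorption values for five concrete types (the numbers are purely illustrative, not data from the textbook).

# One-way ANOVA in Python; the data below are invented for illustration
from scipy import stats

concrete_1 = [551, 457, 450, 731, 499, 632]
concrete_2 = [595, 580, 508, 583, 633, 517]
concrete_3 = [639, 615, 511, 573, 648, 677]
concrete_4 = [417, 449, 517, 438, 415, 555]
concrete_5 = [563, 631, 522, 613, 656, 679]

# f_oneway returns the F statistic and its p-value
f_stat, p_value = stats.f_oneway(concrete_1, concrete_2,
                                 concrete_3, concrete_4, concrete_5)
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")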

ANOVA Table: Tensile Strength for 6 Machines

Analysis of Variance for Tensile-Strength
Source    DF     SS     MS     F      P
Machine    5   5.34   1.07  0.31  0.902
Error     18  62.64   3.48
Total     23  67.98

SST = sum of squares for treatment = SS(Machine) = 5.34 (variability among the sample means), with k = 6 machines.
SSE = SS(Error) = 62.64 (variability due to chance).
Notice how much larger the "chance" variability is than the treatment variability. There is little to no evidence that the machines differ in mean tensile strength. Look at that HUGE p-value!
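As a sanity check on this output, the MS, F, and P columns can be recomputed from the SS and DF columns alone; a minimal sketch:

from scipy import stats

ss_machine, df_machine = 5.34, 5
ss_error, df_error = 62.64, 18

ms_machine = ss_machine / df_machine          # 1.07
ms_error = ss_error / df_error                # 3.48
f_stat = ms_machine / ms_error                # about 0.31

# p-value = area to the right of the F statistic under F(5, 18)
p_value = stats.f.sf(f_stat, df_machine, df_error)   # about 0.90
print(f_stat, p_value)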

Another Example
A sociologist conducts an experiment to compare the mean grade-point averages of first-year college students in four socioeconomic groups. The sociologist defines the four categories of interest to be: Poor, Lower Middle Class, Upper Middle Class, and Well-to-do. The experimenter knows that the populations of grade-point averages are normally distributed with equal standard deviations. At the end of the school year, the sociologist selects independent random samples of 10 grade-point averages for first-year students in each of the four socioeconomic groups. Do the data provide sufficient evidence to indicate a difference in mean grade-point averages for at least two of the four socioeconomic groups?

Socioeconomics and GPA
Treatments: the 4 socioeconomic groups
Response variable: GPA
H0: μ1 = μ2 = μ3 = μ4
H1: At least two of the population means are not equal
Decision rule: Accept H1 if the p-value < .05
Test statistic: F = (variability among the sample means) / (variability expected by chance)

Socioeconomics and GPA
Variability among the sample means = MST = SST / (k − 1)
Variability due to chance = MSE = SSE / (N − k)

One-way ANOVA: GPA versus Group
Source   DF     SS     MS     F      P
Group     3  1.519  0.506  2.99  0.044
Error    36  6.091  0.169
Total    39  7.610
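As a quick check of this table against the formulas on the previous slide: MST = 1.519 / 3 = 0.506, MSE = 6.091 / 36 = 0.169, and F = 0.506 / 0.169 ≈ 2.99, with Group df = 4 − 1 = 3 and Error df = 40 − 4 = 36 (four groups of 10 students each).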

Socioeconomics and GPA
F = 2.99, p-value = .044, and the p-value < .05.
There is sufficient evidence to say that there is a difference in the mean grade-point averages for at least two of the socioeconomic groups. We reach this conclusion at the 0.05 level of significance.
Since we accepted the alternative hypothesis, we now need to determine which means are different.

Socioeconomics and GPA
From the sample data, the Well-to-do have the highest mean GPA at 2.756, the Upper Middle Class are next at 2.717, followed by the Lower Middle Class at 2.520; the Poor have the lowest mean GPA at 2.264. But are these differences statistically significant?

Which means are different?
We need to test each of the following pairs of hypotheses:
Pair 1: H0: μ1 − μ2 = 0 vs. Ha: μ1 − μ2 ≠ 0
Pair 2: H0: μ1 − μ3 = 0 vs. Ha: μ1 − μ3 ≠ 0
Pair 3: H0: μ1 − μ4 = 0 vs. Ha: μ1 − μ4 ≠ 0
Pair 4: H0: μ2 − μ3 = 0 vs. Ha: μ2 − μ3 ≠ 0
Pair 5: H0: μ2 − μ4 = 0 vs. Ha: μ2 − μ4 ≠ 0
Pair 6: H0: μ3 − μ4 = 0 vs. Ha: μ3 − μ4 ≠ 0

Which means are different?
To test each pair of hypotheses, we are only testing two means for a difference between them. This is the two-sample t statistic that we used in Chapter 13 in MAT 212.
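For reference, a sketch of that statistic in its pooled-variance form (assuming that is the version from Chapter 13): t = (x̄1 − x̄2) / (sp √(1/n1 + 1/n2)), where sp is the pooled standard deviation of the two samples.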

Which means are different?
However, it takes less time to calculate a confidence interval for each pairwise difference and use these intervals to make our inferences:
If a confidence interval contains only positive numbers, we may conclude that the first mean is larger than the second.
If a confidence interval contains only negative numbers, we may conclude that the first mean is smaller than the second.
If a confidence interval contains zero, there is insufficient evidence to conclude which mean is larger.
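This sign rule is mechanical enough to write as a tiny helper; a sketch in Python (the function name is ours, not anything from StatCrunch):

def compare_interval(lower, upper):
    """Interpret a confidence interval for (mean1 - mean2)."""
    if lower > 0:
        return "mean1 > mean2"      # interval is entirely positive
    if upper < 0:
        return "mean1 < mean2"      # interval is entirely negative
    return "not significant"        # interval contains zero

# Example: the (Upper Middle minus Poor) interval from the output below
print(compare_interval(0.0799, 0.8261))   # prints: mean1 > mean2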

Which means are different?
To do this, we use StatCrunch and Tukey's multiple comparisons procedure, which adjusts the intervals so that the overall (family-wise) error rate stays at the chosen level. (The notes on the following slide are the conclusions drawn; they are not part of the StatCrunch output.)

Group = Lower Middle subtracted from:
Group          Lower     Upper
Poor          -0.6331   0.1131   (-, +)  not significant
Upper Middle  -0.1801   0.5661   (-, +)  not significant
Well-to-do    -0.1411   0.6051   (-, +)  not significant

Group = Poor subtracted from:
Group          Lower     Upper
Upper Middle   0.0799   0.8261   (+, +)  Upper Middle > Poor
Well-to-do     0.1189   0.8651   (+, +)  Well-to-do > Poor

Group = Upper Middle subtracted from:
Group          Lower     Upper
Well-to-do    -0.3341   0.4121   (-, +)  not significant

Socioeconomics and GPA
This shows that both the Upper Middle Class and the Well-to-do have a higher mean GPA than the Poor. There are no other statistically significant differences.
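The same comparisons could also be produced outside StatCrunch; the sketch below uses pairwise_tukeyhsd from statsmodels on simulated GPA data whose group means roughly match the sample means quoted earlier (the individual values are invented, since the raw data are not shown here).

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Simulated stand-in for the 40 GPAs (10 per group); illustrative only
rng = np.random.default_rng(1)
groups = np.repeat(["Poor", "Lower Middle", "Upper Middle", "Well-to-do"], 10)
gpa = np.concatenate([rng.normal(m, 0.4, 10) for m in (2.264, 2.520, 2.717, 2.756)])

# Tukey's procedure: one simultaneous 95% interval per pair of groups
result = pairwise_tukeyhsd(endog=gpa, groups=groups, alpha=0.05)
print(result)   # table of differences, intervals, and reject decisions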

ANOVA – What is expected from you?
Be able to complete each of the following exercises:
State the two hypotheses.
State the decision rule.
What is the test statistic, and what is its formula?
What is the observed value of this test statistic? Is the test valid?
What is the p-value?
State a conclusion.
If you accepted the alternative hypothesis, you then need to find out which means are different.

Another Example
Is hair color related to pain sensitivity? To study this, an experimenter divides men and women of various ages into four hair color categories: light blond, dark blond, light brunette, and dark brunette. There are six people in each of the four categories. Each participant in the study receives a pain threshold score based upon his or her performance in a pain sensitivity test (the higher the score, the lower the person's pain tolerance).

Hair Color vs. Pain Sensitivity
The treatment is hair color (four levels).
The response variable is the pain sensitivity score.
H0: μ1 = μ2 = μ3 = μ4
H1: At least two of the population mean scores are not equal
Decision rule: Accept H1 if the p-value < .05
Test statistic: F = (variability among the sample means) / (variability expected by chance)

Hair Color vs. Pain Sensitivity

One-way ANOVA: Score versus Hair Color
Source       DF      SS     MS     F      P
Hair Color    3   908.8  302.9  5.44  0.007
Error        20  1113.0   55.7
Total        23  2021.8

H0: μ(light blond) = μ(dark blond) = … = μ(dark brunette)
Ha: At least two population means are different.
F = 5.44, p-value = 0.007.
At the .05 level of significance, there is strong evidence to conclude that there is a difference among the mean pain thresholds for people with these four hair colors.

Minitab

One-way ANOVA: Score versus Hair Color
Source       DF      SS     MS     F      P
Hair Color    3   908.8  302.9  5.44  0.007
Error        20  1113.0   55.7
Total        23  2021.8

S = 7.460   R-Sq = 44.95%   R-Sq(adj) = 36.69%
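The extra summary statistics follow from the table: S = √MSE = √55.7 ≈ 7.46 is the pooled estimate of the common standard deviation σ, R-Sq = SS(Hair Color) / SS(Total) = 908.8 / 2021.8 ≈ 44.95%, and R-Sq(adj) = 1 − MSE / (SS(Total) / Total df) = 1 − 55.7 / (2021.8 / 23) ≈ 36.7%.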

Hair Color = Dark Blond subtracted from:
Hair Color        Lower    Upper
Dark Brunette   -15.818    2.151
Light Blond       1.349   19.318
Light Brunette   -6.151   11.818

Hair Color = Dark Brunette subtracted from:
Hair Color        Lower    Upper
Light Blond       8.182   26.151
Light Brunette    0.682   18.651

Hair Color vs. Pain Sensitivity
Examine Minitab's output to make the following table:

Pair             From    To     Conclusion
D Brun – D Blon  -15.8    2.1   NS (no difference)
L Blon – D Blon    1.3   19.3   L Blon > D Blon
L Brun – D Blon   -6.2   11.8   NS (no difference)
L Blon – D Brun    8.2   26.2   L Blon > D Brun
L Brun – D Brun    0.7   18.7   L Brun > D Brun
L Brun – L Blon  -16.5    1.5   NS (no difference)

Summarize the results.

When should we use the multiple comparison method?
The sample data are obtained from the k populations using a completely randomized design.
An analysis of variance F-test indicates that there are some differences among the k population means.
The objective is to determine which of the k population means differ. It is usually of interest to determine which mean might be the largest (or smallest).