Intermediate Applied Statistics STAT 460

Slides:

Advertisements

Similar presentations

Lesson #24 Multiple Comparisons. When doing ANOVA, suppose we reject H 0 :  1 =  2 =  3 = … =  k Next, we want to know which means differ. This does.

Advertisements

BPS - 5th Ed. Chapter 241 One-Way Analysis of Variance: Comparing Several Means.

Copyright © 2009 Pearson Education, Inc. Chapter 29 Multiple Regression.

Regression Part II One-factor ANOVA Another dummy variable coding scheme Contrasts Multiple comparisons Interactions.

One-Way ANOVA Multiple Comparisons.

MARE 250 Dr. Jason Turner Analysis of Variance (ANOVA)

© 2010 Pearson Prentice Hall. All rights reserved The Complete Randomized Block Design.

Part I – MULTIVARIATE ANALYSIS

ANOVA Determining Which Means Differ in Single Factor Models Determining Which Means Differ in Single Factor Models.

Lecture 14 – Thurs, Oct 23 Multiple Comparisons (Sections 6.3, 6.4). Next time: Simple linear regression (Sections )

Comparing Means.

Intro to Statistics for the Behavioral Sciences PSYC 1900

Analysis of Variance (ANOVA) MARE 250 Dr. Jason Turner.

PSY 1950 Post-hoc and Planned Comparisons October 6, 2008.

Lecture 9: One Way ANOVA Between Subjects

Two Groups Too Many? Try Analysis of Variance (ANOVA)

One-way Between Groups Analysis of Variance

Lecture 12 One-way Analysis of Variance (Chapter 15.2)

K-group ANOVA & Pairwise Comparisons ANOVA for multiple condition designs Pairwise comparisons and RH Testing Alpha inflation & Correction LSD & HSD procedures.

Comparing Several Means: One-way ANOVA Lesson 14.

Today Concepts underlying inferential statistics

Lecture 14: Thur., Feb. 26 Multiple Comparisons (Sections ) Next class: Inferences about Linear Combinations of Group Means (Section 6.2).

Statistical Methods in Computer Science Hypothesis Testing II: Single-Factor Experiments Ido Dagan.

1 Confidence Interval for Population Mean The case when the population standard deviation is unknown (the more common case).

Introduction to Analysis of Variance (ANOVA)

Linear Contrasts and Multiple Comparisons (Chapter 9)

Chapter 12 Inferential Statistics Gay, Mills, and Airasian

Psy B07 Chapter 1Slide 1 ANALYSIS OF VARIANCE. Psy B07 Chapter 1Slide 2 t-test refresher  In chapter 7 we talked about analyses that could be conducted.

1 Multiple Comparison Procedures Once we reject H 0 :   =   =...  c in favor of H 1 : NOT all  ’s are equal, we don’t yet know the way in which.

When we think only of sincerely helping all others, not ourselves,

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Comparing Three or More Means 13.

Stats Lunch: Day 7 One-Way ANOVA. Basic Steps of Calculating an ANOVA M = 3 M = 6 M = 10 Remember, there are 2 ways to estimate pop. variance in ANOVA:

Regression Part II One-factor ANOVA Another dummy variable coding scheme Contrasts Multiple comparisons Interactions.

January 31 and February 3,  Some formulae are presented in this lecture to provide the general mathematical background to the topic or to demonstrate.

Design and Analysis of Experiments Dr. Tai-Yue Wang Department of Industrial and Information Management National Cheng Kung University Tainan, TAIWAN,

t(ea) for Two: Test between the Means of Different Groups When you want to know if there is a ‘difference’ between the two groups in the mean Use “t-test”.

Sociology 5811: Lecture 14: ANOVA 2

Statistics 11 Confidence Interval Suppose you have a sample from a population You know the sample mean is an unbiased estimate of population mean Question:

ANOVA (Analysis of Variance) by Aziza Munir

Testing Multiple Means and the Analysis of Variance (§8.1, 8.2, 8.6) Situations where comparing more than two means is important. The approach to testing.

Everyday is a new beginning in life. Every moment is a time for self vigilance.

Chapter 10: Analyzing Experimental Data Inferential statistics are used to determine whether the independent variable had an effect on the dependent variance.

Statistics for the Social Sciences Psychology 340 Fall 2013 Tuesday, October 15, 2013 Analysis of Variance (ANOVA)

Regression Part II One-factor ANOVA Another dummy variable coding scheme Contrasts Multiple comparisons Interactions.

Chapter 19 Analysis of Variance (ANOVA). ANOVA How to test a null hypothesis that the means of more than two populations are equal. H 0 :  1 =  2 =

One-Way Analysis of Variance

STA MCP1 Multiple Comparisons: Example Study Objective: Test the effect of six varieties of wheat to a particular race of stem rust. Treatment:

Educational Research Chapter 13 Inferential Statistics Gay, Mills, and Airasian 10 th Edition.

Two-Sample Hypothesis Testing. Suppose you want to know if two populations have the same mean or, equivalently, if the difference between the population.

One-way ANOVA: - Comparing the means IPS chapter 12.2 © 2006 W.H. Freeman and Company.

Chapter 8 Parameter Estimates and Hypothesis Testing.

Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 13: One-way ANOVA Marshall University Genomics Core.

Chapter 12 Introduction to Analysis of Variance PowerPoint Lecture Slides Essentials of Statistics for the Behavioral Sciences Eighth Edition by Frederick.

Chapter 13 For Explaining Psychological Statistics, 4th ed. by B. Cohen 1 Chapter 13: Multiple Comparisons Experimentwise Alpha (α EW ) –The probability.

ANOVA P OST ANOVA TEST 541 PHL By… Asma Al-Oneazi Supervised by… Dr. Amal Fatani King Saud University Pharmacy College Pharmacology Department.

DTC Quantitative Methods Bivariate Analysis: t-tests and Analysis of Variance (ANOVA) Thursday 14 th February 2013.

MARE 250 Dr. Jason Turner Analysis of Variance (ANOVA)

Hypothesis test flow chart frequency data Measurement scale number of variables 1 basic χ 2 test (19.5) Table I χ 2 test for independence (19.9) Table.

1 Topic 9 – Multiple Comparisons Multiple Comparisons of Treatment Means Reading:

One-way ANOVA Example Analysis of Variance Hypotheses Model & Assumptions Analysis of Variance Multiple Comparisons Checking Assumptions.

Topic 22: Inference. Outline Review One-way ANOVA Inference for means Differences in cell means Contrasts.

Chapters Way Analysis of Variance - Completely Randomized Design.

Outline of Today’s Discussion 1.Independent Samples ANOVA: A Conceptual Introduction 2.Introduction To Basic Ratios 3.Basic Ratios In Excel 4.Cumulative.

Formula for Linear Regression y = bx + a Y variable plotted on vertical axis. X variable plotted on horizontal axis. Slope or the change in y for every.

Independent Samples ANOVA. Outline of Today’s Discussion 1.Independent Samples ANOVA: A Conceptual Introduction 2.The Equal Variance Assumption 3.Cumulative.

MARE 250 Dr. Jason Turner Analysis of Variance (ANOVA)

Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,

Posthoc Comparisons finding the differences. Statistical Significance What does a statistically significant F statistic, in a Oneway ANOVA, tell us? What.

Intermediate Applied Statistics STAT 460

I. Statistical Tests: Why do we use them? What do they involve?

Presentation transcript:

Intermediate Applied Statistics STAT 460 Lecture 10, 10/1/2004 Instructor: Aleksandra (Seša) Slavković sesa@stat.psu.edu TA: Wang Yu wangyu@stat.psu.edu

Failure Times Example from Sleuth, p. 171

One-way ANOVA One-way ANOVA: TIME versus COMPOUND Analysis of Variance for TIME Source DF SS MS F P COMPOUND 4 401.3 100.3 5.02 0.002 Error 45 899.2 20.0 Total 49 1300.5 Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev --+---------+---------+---------+---- 1 10 10.693 4.819 (------*------) 2 10 6.050 2.915 (------*------) 3 10 8.636 3.291 (-------*------) 4 10 9.798 5.806 (------*-------) 5 10 14.706 4.863 (------*------) --+---------+---------+---------+--- Pooled StDev = 4.470 4.0 8.0 12.0 16.0

ANOVA So far we have talked about testing the null hypothesis H0: μ1 = μ2 = μ3 = . . . = μk versus the alternative hypothesis HA: at least one pair of groups has μi ≠ μj Next, we want to know where the difference comes from?

Approaches to Comparisons Pairwise comparisons Multiple comparison with a control group Multiple comparison with the “best” group Linear contrasts Each of these procedures, (e.g. pairwise confidence intervals for the difference) are available in addition to corresponding tests, in most statistical packages.

All Pairwise Comparisons In this approach, we don’t just want to know whether there is any significant difference, we want to know which groups are significantly different from which other groups. The most obvious approach to testing pairwise comparisons is just to do a t-test on each pair of groups. The problem with this is sometimes called “compound uncertainty” or “capitalizing on chance.” Note: By significantly we mean systematically or demonstrably (statistical significance), not greatly or importantly (practical significance).

Fails 5% of time Fails 5% of time Fails 5% of time Fails 5% of time Fails 5% of time If there are five independent parts and each fail 5% of the time, then the whole machine fails 1-(0.95)5 = 23% of the time!

All Pairwise Comparisons If you have c tests, each of which has its individual false positive rate controlled at , then the total false positive rate may be as high as 1-(1-)c  c. If we want to do all possible pairwise comparisons then the number of tests will be

All Pairwise Comparisons If we want to do all possible pairwise comparisons, then the number of tests will be where k is the number of groups. If there are 5 groups then there are 10 tests!

All Pairwise Comparisons So we have to decide whether we want to control only individual type I error risk, or experiment-wide (or “familywise”) type I error risk. We have to be much more conservative to control the latter than we would need to be to control the former. In fact, we might have to make  so low that β (power) will become very high.

All Pairwise Comparisons One possible answer is to control individual type I error risk for planned comparisons (a few comparisons which were of special interest as you were planning the study) and control experiment-wide type I error risk for unplanned comparisons (those you are making after the study has been done just for exploratory purposes). Suppose for now that there aren’t any comparisons of special interest, we just want to look at all possible pairwise comparisons. Then we definitely should try to control experiment-wide error.

Method 1: Fisher’s Protected LSD (Least Significant Difference) A simple approach to trying to control experiment-wide error: 1. First do an overall F-test to see if there are any differences at all 2. If this test does not reject H0 then conclude that no differences can be found. 3. If it does reject H0 then go ahead and do all possible two-group t-tests. (You might use a standard deviation pooled over all groups instead of each pair separately, but otherwise act just as in the two-sample case.)

Method 1: Fisher’s Protected LSD (Least Significant Difference) The only problem with that approach is that it doesn’t work. At least according to some experts, the LSD procedure does not really control experiment-wide error. You still have the multiple comparisons problem.

Method 2: Bonferroni Correction One way to do this is to use a “Bonferroni correction” on . The idea here is that we first set a family-wise , say .05, and then figure out how small the individual * would need to be in order to keep the family-wise type I error rate at . For example, if *=.01 and c=5, then   .05.

Method 2: Bonferroni Correction Bonferroni Inequality: For independent events A1, …, Ak of equal probability,

Method 2: Bonferroni Correction So if we just set then the experiment-wide error rate will be controlled. An advantage of this approach is that it is a simple idea (just divide up your acceptable risk equally among all the tests you want to do). A disadvantage is that it can be inefficient. The can become quite small, e.g., .05/21  .002.

Method 3: Tukey’s (HSD) Tests Tukey’s “Honest Significant Difference” test takes a different approach that does not use the t or F distributions but the distribution of the “studentized range” or Q statistic. In a Tukey test, the Q distribution is used to find a boundary on differences between group averages , such that 95% of the time if the null hypothesis was true, no pair of groups would have a difference larger than this boundary. So any pair of groups with a difference larger than this critical value is declared significantly different.

Confidence Intervals Much like the usual t-intervals, the Bonferroni and Tukey confidence intervals for μ1-μ2 is where m is some multiplier. But m is chosen differently for each procedure. “margin of error” or “half-width” of interval best estimate for μ1-μ2

For a t-interval or LSD t-interval, For a Bonferroni t-interval, For a Tukey confidence interval, with α*= α/c

Method 4: Scheffe’s test Also available are Scheffé tests and intervals, which use but they lead to a test that may be too conservative.

How to compare specific groups in an ANOVA If you have a few planned comparisons of theoretical importance in mind, then you can just do them with a slightly adjusted t-test a different value for sP and a different degrees of freedom for the t statistic based on the fact that you pool variance over all groups, not just two. If you want to do many comparisons then you may wish to try some means of controlling compound uncertainty (i.e., correcting for multiple comparisons). In the case where you want to test for all pairwise differences, choices include Fisher’s LSD (too lenient), Bonferroni, and Tukey (the latter also known as Tukey-Kramer when the sample sizes are different). There are also various other procedures I didn’t (and won’t) mention.

Tukey's pairwise comparisons Family error rate = 0.0500 Individual error rate = 0.00670 Critical value = 4.02 Intervals for (column level mean) - (row level mean) 1 2 3 4 2 -1.040 10.326 3 -3.626 -8.269 7.740 3.097 4 -4.788 -9.431 -6.845 6.578 1.935 4.521 5 -9.696 -14.339 -11.753 -10.591 1.670 -2.973 -0.387 0.775 This is an abbreviation for, “We are 95% confident according to the Tukey procedure that the signed difference μ1-μ2 is between -1.040 hours and +10.326 hours.” So we can’t conclude that μ1≠μ2.

Conclude μ2≠μ5 and μ3≠μ5. More specifically, μ2<μ5 and μ3<μ5. Tukey's pairwise comparisons Family error rate = 0.0500 Individual error rate = 0.00670 Critical value = 4.02 Intervals for (column level mean) - (row level mean) 1 2 3 4 2 -1.040 10.326 3 -3.626 -8.269 7.740 3.097 4 -4.788 -9.431 -6.845 6.578 1.935 4.521 5 -9.696 -14.339 -11.753 -10.591 1.670 -2.973 -0.387 0.775

Tukey Simultaneous Tests Response Variable TIME All Pairwise Comparisons among Levels of COMPOUND COMPOUND = 1 subtracted from: Level Difference SE of Adjusted COMPOUND of Means Difference T-Value P-Value 2 -4.643 1.999 -2.322 0.1567 3 -2.057 1.999 -1.029 0.8406 4 -0.895 1.999 -0.448 0.9914 5 4.013 1.999 2.007 0.2792 COMPOUND = 2 subtracted from: 3 2.586 1.999 1.294 0.6965 4 3.748 1.999 1.875 0.3454 5 8.656 1.999 4.330 0.0008 COMPOUND = 3 subtracted from: 4 1.162 1.999 0.5812 0.9772 5 6.070 1.999 3.0363 0.0309 COMPOUND = 4 subtracted from: 5 4.908 1.999 2.455 0.1196

2 3 4 1 5 (“not significantly different”) 2 3 4 1 5 (“not significantly different”) (“not significantly different”)

Multiple Comparisons with a Control A different situation arises when you have a control group and are only interested in the comparison of other groups to the control group, not to one another. Then there is something like Tukey’s test, but modified, called Dunnett’s test.

Multiple Comparisons with a Control Tukey confidence intervals are for μi - μJ and there is one of them for every pair of distinct groups, which comes out to be k*(k-1)/n of them. Dunnett confidence intervals are for μi - μControl and there are only k-1 of them, one for each noncontrol group.

Multiple Comparisons with a Control Suppose compound 5 is thought of as a control group because it is the compound that has been in use previously. We want to know if any of the other compounds have different population mean survival times than compound 5.

Multiple Comparisons with a Control Dunnett's comparisons with a control Family error rate = 0.0500 Individual error rate = 0.0149 Critical value = 2.53 Control = level (5) of COMPOUND Intervals for treatment mean minus control mean Level Lower Center Upper -----+---------+---------+---------+-- 1 -9.074 -4.013 1.048 (------------*------------) 2 -13.717 -8.656 -3.595 (-----------*------------) 3 -11.131 -6.070 -1.009 (------------*-----------) 4 -9.969 -4.908 0.153 (------------*-----------) -----+---------+---------+---------+-- -12.0 -8.0 -4.0 0.0 For compounds 2 and 3, we are confident that their difference from the control is nonzero.

Multiple Comparisons with a Control Dunnett Simultaneous Tests Response Variable TIME Comparisons with Control Level COMPOUND = 5 subtracted from: Level Difference SE of Adjusted COMPOUND of Means Difference T-Value P-Value 1 -4.013 1.999 -2.007 0.1555 2 -8.656 1.999 -4.330 0.0003 3 -6.070 1.999 -3.036 0.0141 4 -4.908 1.999 -2.455 0.0597 Again we conclude that compounds 2 and 3 are different from the control group (actually, poorer).

Multiple comparison with the best There is also a kind of test called Hsu’s multiple comparison with the best (MCB). This test tries to decide which groups could plausibly have come from the population with the highest (or lowest, if you prefer) mean. The group with the highest (lowest) sample average is automatically the leading candidate, but any group that does not significantly differ from this group is still included as a possible best. So Hsu’s confidence intervals, one for each group, would be for μi -μbest. This approach is harder to understand than the others, perhaps because it seems to be doing more than one thing at the same time.

Next Lecture Linear Contrasts