1 Comparing multiple tests for separating populations Juliet Popper Shaffer Paper presented at the Fifth International Conference on Multiple Comparisons,

Presentation transcript:

1 Comparing multiple tests for separating populations Juliet Popper Shaffer Paper presented at the Fifth International Conference on Multiple Comparisons, Vienna, July 10, 2007

2 Outline Background. Original separation concepts. Revised separation concepts. Planned comparisons of different FDR- and FWER-controlling methods. Selected examples with FDR-controlling methods. Summary and description of further planned work.

3 Background I began thinking about this problem in the early 1970s, when I was approached by a faculty member with a rather common situation. He had compared the means of three treatments in an analysis of variance followed by pairwise tests. He found treatments 1 and 3 significantly different, but neither 1 and 2 nor 2 and 3 significantly different.

4 I pointed out that this was a rather common outcome. His response was “What am I supposed to do with that?” A good question: No clear interpretation. The pattern of results of pairwise tests is important.

5 Consider four treatments. Suppose the outcome of the pairwise tests is: (a) 1-4 significant, 1-3 significant, 2-4 significant. (b) 1-4 significant, 1-3 significant, 1-2 significant. (b) is clearly interpretable: treatment 1 is separated from treatments 2, 3, and 4. (a) is not.

6 [Figure: outcome patterns (a) and (b)]

7 Original separation concepts I developed a measure of interpretability of the outcome of pairwise tests and published the description with a comparison of FWER-controlling methods including a new one for comparing three treatments as “Complexity: An interpretability criterion for multiple comparisons” (JASA, 1981).

8 A pattern was defined as simple if it consisted of distinct groupings. The measure was the number of additional rejections necessary to make the pattern simple. For 3 treatments, this is a reasonable measure:

9 3 treatments: either no rejections or at least two rejections are necessary to achieve a simple pattern. Complexity = 2 if the overall test is significant but no pairwise differences are significant; 1 if one pairwise difference is significant; 0 if two or three pairwise differences are significant or nothing is significant. That is, given that overall equality is rejected and 1-3 would be rejected before 1-2 or 2-3, the simple patterns are: 1-3 and 1-2 rejected (2 rejections), 1-3 and 2-3 rejected (2 rejections), or all three pairs rejected (3 rejections).
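The three-treatment complexity measure on this slide can be written as a small function. This is a minimal sketch, not the code used in the 1981 paper: the function name `complexity3` and its arguments are hypothetical, and it assumes the rejections follow the ordering stated on the slide (1-3 is rejected before 1-2 or 2-3).

```python
def complexity3(overall_significant, n_pairwise_rejections):
    """Complexity for three treatments: the number of additional
    rejections needed to reach a simple pattern.

    Assumes rejections respect the ordering on the slide, so one
    rejection means 1-3 only, and two rejections mean 1-3 plus
    either 1-2 or 2-3.
    """
    if n_pairwise_rejections not in (0, 1, 2, 3):
        raise ValueError("three treatments allow 0 to 3 pairwise rejections")
    if n_pairwise_rejections >= 2:
        return 0  # already a simple pattern
    if n_pairwise_rejections == 1:
        return 1  # one more rejection would separate a group
    # No pairwise rejections: complex only if the overall test rejected.
    return 2 if overall_significant else 0
```

For example, a significant F test with no significant pairwise differences gives `complexity3(True, 0) == 2`, the least interpretable outcome.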

10 The results were interesting, and the F test followed by individual t tests resulted in greater average simplicity than the range test, when both controlled FWER. The study was limited to three treatments.

11 For more than 3 treatments, there is a multiplicity of patterns (e.g. 15 for four groups). It is also less clear that the measure used is best with more than three groups, and average complexity is certainly harder to interpret in that case. Furthermore, it seems desirable to distinguish true patterns from false patterns. If a pattern is false, a complex pattern is arguably more desirable than a simple pattern.

12 Since that time, the issue has been raised occasionally by others, so I decided to try again with a simpler way of dividing patterns. Also, with new concepts of error control, especially FDR, it seemed interesting to see whether clearer patterns would emerge with FDR-controlling methods.

13 Revised separation concepts I’ll discuss patterns of treatment means, although this can be generalized to other parameters. Following Hartley (1955), I’ll call sets of populations with equal means (usually assumed identical) clusters.

14 True Pattern: a set of $K$ clusters of sizes $n_1, n_2, \ldots, n_k, \ldots, n_K$ of $n$ true means. (If exact equality is considered impossible, think of virtual equality.) Observed outcome: Set of rejections of subset equality hypotheses. Outcome clusters: Subsets of sample means declared significantly different from all other means, with no subclusters within them.

15 True outcome clusters: Outcome clusters in which all true means within the cluster are greater than all true means below it and smaller than all true means above it. False outcome clusters: Outcome clusters that are not true. If there is no separation into clusters, the number of outcome clusters is defined as zero.
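One way to read the definition of outcome clusters is as the connected components of a graph that links every pair of means whose equality was not rejected: a subset is separated from all other means exactly when no unrejected pair crosses its boundary. The sketch below follows that reading, which is an interpretation and not something stated in the slides. The function name `outcome_clusters` and the 0-based indexing of means are assumptions for illustration.

```python
from itertools import combinations

def outcome_clusters(n, rejected):
    """Return the outcome clusters for n means given rejected pairs.

    `rejected` is an iterable of index pairs (i, j) whose equality
    hypothesis was rejected. Clusters are the connected components of
    the graph whose edges are the pairs NOT rejected. If everything
    falls in one component there is no separation, so the result is
    an empty list (zero outcome clusters).
    """
    rejected = {frozenset(pair) for pair in rejected}
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if frozenset((i, j)) not in rejected:
            parent[find(i)] = find(j)  # unrejected pair: same cluster

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = sorted(groups.values())
    return clusters if len(clusters) > 1 else []
```

With four treatments indexed 0 to 3, rejecting 1-4, 1-3, and 1-2 gives `outcome_clusters(4, [(0, 3), (0, 2), (0, 1)])`, which returns `[[0], [1, 2, 3]]`, while rejecting 1-4, 1-3, and 2-4 yields no clusters at all.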

16 Note that there may be rejections within a cluster, as long as they don’t separate it into subclusters. Pure true outcome clusters: True outcome clusters with no false rejections.

17 Note that there can be true rejections within a pure true outcome cluster if it contains true subclusters, as long as there are no false rejections within it. A false rejection is either rejecting equality when a pair is equal (Type I error), or rejecting equality when a pair is unequal but deciding the difference is in the wrong direction (Type III error).

18 False cluster rate: Expected value of the ratio of false observed clusters to total observed clusters, defined as zero if there are no observed clusters. Various measures of cluster power.
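For a single simulated outcome, the quantity inside the expectation can be computed as below; averaging it over replications would estimate the false cluster rate. This is a sketch under assumptions: the function names are hypothetical, "below" and "above" are read in terms of sample means, inequalities between true means are taken as strict, and a cluster whose sample means interleave with an outside mean is counted as false.

```python
def is_true_cluster(cluster, sample_means, true_means):
    """Check whether an outcome cluster is a true outcome cluster.

    Every mean outside the cluster must lie entirely below or above
    it in the sample, and its true mean must lie strictly on the
    same side of all true means inside the cluster.
    """
    inside = set(cluster)
    lo_s = min(sample_means[i] for i in cluster)
    hi_s = max(sample_means[i] for i in cluster)
    lo_t = min(true_means[i] for i in cluster)
    hi_t = max(true_means[i] for i in cluster)
    for o in range(len(sample_means)):
        if o in inside:
            continue
        if sample_means[o] < lo_s:
            if not true_means[o] < lo_t:
                return False
        elif sample_means[o] > hi_s:
            if not true_means[o] > hi_t:
                return False
        else:
            return False  # outside mean interleaves with the cluster
    return True

def false_cluster_ratio(clusters, sample_means, true_means):
    """Ratio of false to total observed clusters, 0 if there are none."""
    if not clusters:
        return 0.0
    n_false = sum(
        not is_true_cluster(c, sample_means, true_means) for c in clusters
    )
    return n_false / len(clusters)
```

With true clusters (2)(2), true means `[0, 0, 1, 1]`, and observed clusters `[[0], [1], [2, 3]]`, the first two observed clusters are false and the ratio is 2/3.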

19 Comparisons of different FDR- and FWER-controlling methods Note that it isn’t clear that more liberal methods will produce more true outcome clusters, more pure true outcome clusters, or a smaller false cluster rate. With the collaboration of Rhonda Kowalchek and Harvey Keselman, we are conducting a large study of cluster measures as well as standard error and power measures with several methods, all at nominal level .05 for either FWER or FDR control.

20 True mean configurations We’re looking at true configurations in which one mean is different from all K-1 others, and at various other cluster configurations of 3, 4, 8 and possibly 12 means. The work is still in progress.

21 Methods FWER-controlling: Tukey-Welsch multiple range test; modified Peritz multiple range test. FDR-controlling: Benjamini-Hochberg original step-up method (BH); Yekutieli-revised BH method with proven FDR control; Newman-Keuls method with empirical evidence and limited proofs of FDR control (NK).

22 The Newman-Keuls method (NK) is little used these days. It is a multiple range method. Let $M_1 < M_2 < \cdots < M_n$ be the sample means of populations $P_1, P_2, \ldots, P_n$ with true means $\mu_1, \mu_2, \ldots, \mu_n$, respectively. For simplicity I’ll describe the method assuming the populations are identical except for possible location shift.

23 Let $r_{j-i+1,\alpha}$ be the $\alpha$-critical value of the range of $j-i+1$ sample means. Then $H_{ij}: \mu_i = \mu_j$ is rejected if $M_{j'} - M_{i'} > r_{j'-i'+1,\alpha}$ for all $j' \ge j$, $i' \le i$. In other words, it is identical in form to the Tukey-Welsch multiple range method, but every subrange is tested for significance at the same level $\alpha$.
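The rejection rule on this slide can be sketched directly. This is a minimal illustration, not the simulation code from the study: the function name `newman_keuls` is hypothetical, means are indexed from 0, and `crit` is an assumed user-supplied function returning the level-α critical range for a given number of means (in practice, something like a studentized-range quantile times the standard error of a mean).

```python
def newman_keuls(sorted_means, crit):
    """Return the set of pairs (i, j), i < j, rejected by NK.

    `sorted_means` must be in increasing order. `crit(p)` is the
    critical value for the range of p sample means, all at the same
    level. H_ij is rejected only if every range containing means i
    through j exceeds its own critical value.
    """
    n = len(sorted_means)
    rejected = set()
    for i in range(n):
        for j in range(i + 1, n):
            if all(
                sorted_means[jp] - sorted_means[ip] > crit(jp - ip + 1)
                for ip in range(i + 1)   # i' <= i
                for jp in range(j, n)    # j' >= j
            ):
                rejected.add((i, j))
    return rejected
```

The nested `all` makes the procedure coherent by construction: a pair cannot be declared different unless every wider range containing it is also significant.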

24 BH and NK I’ll present some comparisons of these two FDR-controlling methods. Significant pairwise comparisons are ordered differently in these, since BH is based on individual pairwise p-values, and NK is a multiple-range-based method. This makes the comparison of cluster outcomes especially interesting.
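For contrast with the range-based rule, the BH step-up procedure works only from the pairwise p-values. The sketch below is a plain implementation of the original step-up rule; the function name and the default level `q=0.05` (matching the nominal level used in the study) are choices made for illustration.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Finds the largest rank k such that the k-th smallest p-value is
    at most k*q/m, and rejects the hypotheses with the k smallest
    p-values. Returns a list of booleans in the input order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank meeting its threshold
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject
```

Because the rule steps up from the largest p-value, a hypothesis can be rejected even if its own p-value misses its threshold, provided a larger p-value meets a later one. The resulting rejections need not form a coherent pattern across ranges, which is one reason the cluster outcomes can differ from those of NK.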

25 BH: The FWER increases with the number of populations being compared. NK: Beyond apparent FDR control, NK also controls the FWER at the nominal level α within each cluster. Thus either method can have the larger FWER, depending on the number of populations and the number of clusters.

26 True clusters (K-1)(1) BH apparently controls FDR according to simulation results. NK controls FDR, since it controls FWER in this case. With one true outcome cluster, it must be a pure true cluster. With two true outcome clusters, there may be one or two pure true clusters.

27 True clusters (K-1)(1) Simulation results indicate that there are more true outcome clusters and pure true outcome clusters with NK than with BH through most of the range, and the difference is greater with pure true outcome clusters. (When there are 1 or 2 means in each cluster, every true outcome cluster is a pure true outcome cluster.)

28-30 [Figures; images not included in transcript]

31 Two clusters, more than 1 mean in each The following slides show results for clusters (2)(2) and (2)(4).

32-34 [Figures: results for clusters (2)(2) and (2)(4); images not included in transcript]

35 False cluster rate The false cluster rate seems to be generally higher for NK than for BH, and in fact can get higher than might be desired for both. The worst case is that in which there are two means in each cluster, since then one Type I error may result in two false clusters, while that can’t happen with more than two means in a cluster.

36-40 [Figures; images not included in transcript]

41 Summary Gave the background for an interest in separating populations into clusters and previous ways of formulating the problem. Described new measures of population separation. Compared Newman-Keuls and Benjamini-Hochberg methods on these measures in two-cluster examples.

42 Further work More combinations of numbers of clusters and numbers of means within clusters will be examined. FWER-controlling methods will be compared among themselves and with FDR-controlling methods. F-type measures will be added. Nonparametric versions of the various methods will be examined. Proofs of properties will be extended if possible.