What To Do About the Multiple Comparisons Problem? Peter Z. Schochet February 2008.


Overview of Presentation
– Background
– Suggested testing guidelines

Background

Overview of the Problem
Multiple hypothesis tests are often conducted in impact studies:
– Outcomes
– Subgroups
– Treatment groups
Standard testing methods could yield:
– Spurious significant impacts
– Incorrect policy conclusions

Assume a Classical Hypothesis Testing Framework
– True impacts are fixed for the study population
– Test H0j: Impact_j = 0
– Reject H0j if the p-value of the t-test < α = .05
– The chance of finding a spurious impact is 5 percent for each test alone

But Suppose No True Impacts and the Tests Are Considered Together
[Table: probability that at least 1 t-test is statistically significant, by number of tests] a
a Assumes independent tests
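The footnote's independence assumption pins down the arithmetic behind the table: with J independent tests at α = .05 and no true impacts, the chance of at least one spurious significant result is 1 − (1 − .05)^J. A minimal sketch:

```python
# Probability that at least one of n independent t-tests rejects at
# alpha = .05 when every null hypothesis is true (no true impacts):
# FWER = 1 - (1 - alpha)^n

def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of >= 1 spurious significant result across n_tests
    independent tests when no true impacts exist."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n in (1, 5, 10, 20, 50):
    print(f"{n:3d} tests: {familywise_error_rate(n):.0%}")
# 1 test: 5%; 10 tests: 40%; 50 tests: 92%
```

The error rate climbs quickly: with 10 independent tests, a spurious finding is more likely than not to appear somewhere by 50 tests.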

Impact Findings Can Be Misrepresented
– Publishing bias
– A focus on stars

Adjustment Procedures Lower α Levels for Individual Tests
Control the combined error rate
Many available methods:
– Bonferroni: Compare p-values to (.05 / # of tests)
– Fisher's LSD, Holm (1979), Šidák (1967), Scheffé (1959), Hochberg (1988), Rom (1990), Tukey (1953)
– Resampling methods (Westfall and Young 1993)
– Benjamini-Hochberg (1995)
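Two of the listed procedures are simple enough to sketch directly; the p-values below are hypothetical, chosen only to show that Holm can reject where Bonferroni does not:

```python
# Bonferroni and Holm (1979) on a list of p-values (hypothetical data).

def bonferroni_reject(pvals, alpha=0.05):
    """Reject H0_j when p_j <= alpha / m; controls the familywise
    error rate under any dependence structure."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm_reject(pvals, alpha=0.05):
    """Holm step-down: compare the (i+1)-th smallest p-value to
    alpha / (m - i), stopping at the first failure; at least as
    powerful as Bonferroni with the same error control."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

pvals = [0.009, 0.012, 0.500, 0.500, 0.500]
print(sum(bonferroni_reject(pvals)))  # 1 rejection (threshold .01)
print(sum(holm_reject(pvals)))        # 2 rejections
```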

These Methods Reduce Statistical Power: The Chances of Finding Real Effects
[Table: simulated statistical power by number of tests, unadjusted vs. Bonferroni] a
a Assumes 1,000 treatments and 1,000 controls, 20 percent of all null hypotheses are true, and independent tests
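The direction of the table can be reproduced with a large-sample normal approximation rather than the slide's simulation; the effect size of 0.15 standard deviations below is illustrative, not from the slide:

```python
# Approximate two-sided power of a two-sample z-test on a
# standardized impact, with and without Bonferroni adjustment.
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def power(effect_sd, n_per_arm, alpha):
    """Power of a two-sided test of a true impact of effect_sd
    standard deviations, normal approximation."""
    se = sqrt(2.0 / n_per_arm)            # SE of impact in SD units
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    mu = effect_sd / se                   # noncentrality parameter
    return std_normal.cdf(mu - z_crit) + std_normal.cdf(-mu - z_crit)

unadjusted = power(0.15, 1000, 0.05)        # about 0.92
bonferroni = power(0.15, 1000, 0.05 / 10)   # 10 tests: about 0.71
print(f"unadjusted: {unadjusted:.2f}, Bonferroni: {bonferroni:.2f}")
```

Dividing α by the number of tests raises the critical value, so real effects of the same size are detected less often.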

Big Debate on Whether To Use Adjustment Procedures
What is the proper balance between Type I and Type II errors?

To Adjust or Not To Adjust?

February, July, December 2007 Advisory Panel Meetings Held at IES
Chairs:
– Phoebe Cottingham, IES
– Rob Hollister, Swarthmore
– Rebecca Maynard, U. of PA
Participants:
– Steve Bell, Abt
– Howard Bloom, MDRC
– John Burghardt, MPR
– Mark Dynarski, MPR
– Andrew Gelman, Columbia
– David Judkins, Westat
– Jeff Kling, Brookings
– David Myers, AIR
– Larry Orr, Abt
– Peter Schochet, MPR

Basic Principles for a Testing Strategy

The Multiplicity Problem Should Not Be Ignored
– Erroneous conclusions can result otherwise
– But need a strategy that balances Type I and Type II errors

Limiting the Number of Outcomes and Subgroups Can Help
– But not always possible or desirable
– Need a flexible strategy for confirmatory and exploratory analyses

Problem Should Be Addressed by First Structuring the Data
– Structure will depend on the research questions
– Adjustments should not be conducted blindly across all contrasts

Suggested Testing Guidelines

The Plan Must Be Specified Up Front
Rigor requires that the strategy be documented prior to data analysis

Delineate Separate Outcome Domains
– Based on a conceptual framework that relates the intervention to the outcomes
– Represent key clusters of constructs
– Domain items are likely to measure the same underlying trait
   - Test scores
   - Teacher practices
   - School attendance

Testing Strategy: Both Confirmatory and Exploratory Components
Confirmatory component:
– Addresses central study hypotheses
– Must adjust for multiple comparisons
– Must be specified in advance
Exploratory component:
– Identifies impacts or relationships for future study
– Findings should be regarded as preliminary

Confirmatory Analysis Has Two Potential Parts
1. Domain-specific analysis
2. Between-domain analysis

Domain-Specific Analysis

Test Impacts for Outcomes as a Group
Create a composite domain outcome:
– Weighted average of standardized outcomes
   - Simple average
   - Index
   - Latent factor
Conduct a t-test on the composite
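A minimal sketch of the simple-average version, with entirely hypothetical data: each outcome is standardized against the control group's mean and standard deviation, the z-scores are averaged into one composite per sample member, and a single large-sample test (a z-test here, standing in for the t-test) is run on the composite:

```python
# Composite domain outcome: average of control-standardized outcomes,
# followed by one two-sided large-sample test (all data hypothetical).
from math import sqrt
from statistics import NormalDist, mean, stdev

def composite(rows, control_rows):
    """Simple average of each outcome standardized to the control
    group's mean and SD; rows are tuples of domain outcomes."""
    stats = [(mean(col), stdev(col)) for col in zip(*control_rows)]
    return [mean((x - m) / s for x, (m, s) in zip(row, stats))
            for row in rows]

def z_test(treat, control):
    """Two-sided large-sample p-value for the difference in means."""
    se = sqrt(stdev(treat) ** 2 / len(treat)
              + stdev(control) ** 2 / len(control))
    z = (mean(treat) - mean(control)) / se
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical domain with two outcomes per sample member
treat = [(52.0, 1.0), (55.0, 0.0), (58.0, 1.0), (54.0, 1.0)]
ctrl = [(50.0, 0.0), (49.0, 1.0), (51.0, 0.0), (48.0, 0.0)]
p = z_test(composite(treat, ctrl), composite(ctrl, ctrl))
print(f"p-value for the composite: {p:.4f}")
```

Only this one p-value per domain enters the confirmatory multiplicity accounting, which is the point of the composite.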

What About Tests for Individual Domain Outcomes?
If impact on composite is significant:
– Test impacts for individual domain outcomes without multiplicity corrections
– Use only for interpretation
If impact on composite is not significant:
– Further tests are not warranted
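The rule above is a gatekeeping structure, and its logic fits in a few lines; the outcome names and p-values below are hypothetical:

```python
# Gatekeeping sketch: individual domain outcomes are examined only
# when the composite test is significant, and then without
# multiplicity corrections (for interpretation only).

def domain_followup(p_composite, p_individual, alpha=0.05):
    """Return the individual outcomes flagged for interpretation,
    or an empty list when the composite gate fails."""
    if p_composite >= alpha:
        return []                       # further tests not warranted
    return [name for name, p in p_individual.items() if p < alpha]

p_individual = {"reading": 0.01, "math": 0.20}   # hypothetical
print(domain_followup(0.03, p_individual))  # gate passes: ['reading']
print(domain_followup(0.40, p_individual))  # gate fails: []
```

The gate is what protects the uncorrected follow-up tests: they are reached only after the single composite test has already controlled the domain-level error rate.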

Between-Domain Analysis

Applicable If Studies Require Summative Evidence of Impacts
Constructing unified composites may not make sense:
– Domains measure different latent traits
Test domain composites individually using adjustment procedures

Testing Strategy Will Depend on the Research Questions
Are impacts significant in all domains?
– No adjustments are needed
Are impacts significant in any domain?
– Adjustments are needed
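For the "any domain" question, one of the adjustment procedures listed earlier, Benjamini-Hochberg (1995), is easy to sketch; it controls the false discovery rate rather than the familywise rate, and the domain-composite p-values below are hypothetical:

```python
# Benjamini-Hochberg (1995) step-up procedure applied to
# domain-composite p-values (hypothetical data).

def benjamini_hochberg(pvals, q=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    with p_(k) <= (k / m) * q; controls the false discovery rate
    for independent tests."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

# Hypothetical composite p-values for four outcome domains
print(benjamini_hochberg([0.004, 0.030, 0.035, 0.300]))
# [True, True, True, False]
```

Note the step-up behavior: 0.030 exceeds its own threshold (2/4 × .05 = .025) but is still rejected because a larger-ranked p-value (0.035 ≤ .0375) clears its threshold.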

Other Situations That Require Multiplicity Adjustments
1. Designs with multiple treatment groups
– Apply Tukey-Kramer, Dunnett, or resampling methods to domain composites
2. Subgroup analyses that are part of the confirmatory analysis
– Conduct F-tests for differences across subgroup impacts

Statistical Power
Studies must be designed to have sufficient statistical power for all confirmatory analyses
– Includes subgroup analyses

Reporting Must Link to the Study Protocols
Qualify confirmatory and exploratory analysis findings in reports:
– No one way to present adjusted and unadjusted p-values
– Confidence intervals may be helpful
– Emphasize confirmatory analysis results in the executive summary

Testing Approach Summary
– Pre-specify plan in the study protocols
– Structure the data
   - Delineate outcome domains
– Confirmatory analysis
   - Within and between domains
– Exploratory analysis
– Qualify findings appropriately