Optimality Considerations in Testing Massive Numbers of Hypotheses. Peter H. Westfall and Ananda Bandulasiri, Texas Tech University.

Hypotheses; FWE and FDR
H0i (point null) vs. H1i, i = 1, …, k; k is large!
A decision algorithm for classifying the k tests produces R total rejections, with V of them erroneous.
FWE = P(V > 0); FDR = E(V/R⁺), where R⁺ = max(R, 1).
To control FWE: Hochberg, Westfall-Young, …
To control FDR: Benjamini and Hochberg, …
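
The two step-up procedures named here can be sketched in a few lines of Python; the p-values below are hypothetical, with α = 0.05 throughout.

```python
def hochberg(pvals, alpha=0.05):
    """Hochberg step-up, which controls FWE: find the largest j with
    p_(j) <= alpha/(k - j + 1) and reject the j smallest p-values."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])  # indices, smallest p first
    cutoff = 0
    for j in range(1, k + 1):
        if pvals[order[j - 1]] <= alpha / (k - j + 1):
            cutoff = j
    reject = [False] * k
    for j in range(cutoff):
        reject[order[j]] = True
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up, which controls FDR: find the largest j with
    p_(j) <= alpha * j / k and reject the j smallest p-values."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])
    cutoff = 0
    for j in range(1, k + 1):
        if pvals[order[j - 1]] <= alpha * j / k:
            cutoff = j
    reject = [False] * k
    for j in range(cutoff):
        reject[order[j]] = True
    return reject

pvals = [0.001, 0.008, 0.012, 0.020, 0.09, 0.62]
print(sum(hochberg(pvals)), "FWE rejections;",
      sum(benjamini_hochberg(pvals)), "FDR rejections")
# prints: 3 FWE rejections; 4 FDR rejections
```

On this example the FDR-controlling procedure rejects one more hypothesis than the FWE-controlling one, the typical trade-off between the two error rates.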

Scale Up, wrt k
FWE-controlling methods do not scale up as k → ∞: they reject H0i only when pi ≲ α/k, a threshold that shrinks to zero.
FDR-controlling methods do scale up as k → ∞: they reject H0i when pi ≤ τk, where τk converges to a positive limit τ as k → ∞.
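
A quick numerical sketch of this scaling claim; the mixture used here (80% true nulls with uniform p-values, 20% alternatives with z ~ N(3, 1)) is an assumption for illustration. The FWE cutoff α/k shrinks to zero as k grows, while the realized BH cutoff stays bounded away from zero.

```python
import random
from statistics import NormalDist

random.seed(1)
nd = NormalDist()
alpha = 0.05

def bh_threshold(pvals, alpha):
    """Largest p-value rejected by the BH step-up (0.0 if none)."""
    thr = 0.0
    for j, p in enumerate(sorted(pvals), start=1):
        if p <= alpha * j / len(pvals):
            thr = p
    return thr

for k in (100, 1000, 10000):
    pvals = []
    for _ in range(k):
        if random.random() < 0.8:
            pvals.append(random.random())                 # true null: p ~ U(0,1)
        else:
            pvals.append(1 - nd.cdf(random.gauss(3, 1)))  # false null: one-sided p
    print(f"k={k}: FWE cutoff {alpha / k:.2e}, "
          f"realized BH cutoff {bh_threshold(pvals, alpha):.4f}")
```

With a fixed fraction of false nulls, the BH cutoff hovers around the same value at every k, while α/k drops by an order of magnitude each line.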

FDR Convergence as k → ∞
Critical t3 for FDR(0.05) is 4.93. Marginal (unadjusted) α: …

Application: EEG Responses to Light Stimuli
43 time-series responses; 62 scalp locations; 70 independent replicates; 5 treatments: (1) G60% (2) R90% (3) G80% (4) R100% (5) G100%

Average EEG Curves

Questions of Interest
Validity check: differences should exist between responses for different intensities.
Main question: are there differences between red and green stimuli? When? Where?
Number of tests: k = (10 treatment comparisons) × (62 scalp spots) × (43 time locations) = 26,660.

Histograms of t-statistics with null reference: G60 v R90; G60 v G80

Histograms of t-statistics with null reference: G60 v R100; G60 v G100

Histograms of t-statistics with null reference: R90 v G80; R90 v R100

Histograms of t-statistics with null reference: R90 v G100; G80 v R100

Histograms of t-statistics with null reference: G80 v G100; R100 v G100

Results
Westfall-Young FWE-controlling method: no significant R100 v G100 comparisons; significant comparisons for all other contrasts.
Benjamini-Hochberg FDR-controlling method: 23 significant R100 v G100 comparisons.
Claim: the FWE-controlling method gave the right answer.

A Comment
FDR scales up better as k → ∞, but that does not necessarily mean its results are “better,” even for large k.

Scale Up, wrt n
Model for test statistics Zi, i = 1, …, k:
Zi | δi ~ N(n^{1/2} δi, 1), where δi = μi/σxi is the standardized effect size and δi ~ F.
Suppose P(δi = 0) = 0. Then:
FDR does not scale up as n → ∞.
FWE might scale up, but only serendipitously, if n and k diverge at appropriate rates.
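
The claim can be illustrated by simulating the model above with a continuous δi ~ N(0, 1) (an assumed F), so that P(δi = 0) = 0 and every point null is strictly false: as n grows, BH rejects essentially everything, and FDR control stops discriminating.

```python
import math
import random
from statistics import NormalDist

random.seed(2)
nd = NormalDist()
k = 2000
deltas = [random.gauss(0, 1) for _ in range(k)]   # delta_i ~ F, continuous

def bh_reject_count(pvals, alpha=0.05):
    """Number of rejections made by the BH step-up."""
    cutoff = 0
    for j, p in enumerate(sorted(pvals), start=1):
        if p <= alpha * j / len(pvals):
            cutoff = j
    return cutoff

results = {}
for n in (10, 160, 2560):
    # Z_i | delta_i ~ N(sqrt(n) * delta_i, 1), two-sided p-values
    zs = [math.sqrt(n) * d + random.gauss(0, 1) for d in deltas]
    pvals = [2 * (1 - nd.cdf(abs(z))) for z in zs]
    results[n] = bh_reject_count(pvals) / k
    print(f"n={n}: BH rejects {results[n]:.0%} of the k={k} point nulls")
```

The rejection fraction climbs toward 100% with n: with no exactly-true nulls, "reject H0: δi = 0" eventually holds for every test, which is why the slide says FDR does not scale up in n.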

Efron’s Method
JASA (2004).
Estimate an “empirical null distribution,” f0(z), from the center of the histogram of z’s.
Estimate the combined distribution, f(z).
Estimate a “local FDR” for each zi as fdr(zi) = f0(zi)/f(zi).
Choose as “interesting” the cases with fdr(zi) < 0.1.

Discussion
P(δi = 0) > 0 is usually false, but it is a reasonable approximation for small n.
As n → ∞ we need more realistic models: point nulls are essentially never exactly true, and even if they were, there remain
unobserved covariates,
failed model assumptions, and
imperfect sampling procedures,
so the “empirical null” is sensible.

Results from Efron’s Method
Significant differences only for 2 v 3, 2 v 4, and 2 v 5.
No significant R100 v G100 comparisons (the right answer).

What is the “Right Answer”?
Methods that have “good” utility are “right.”
MCPs must have reasonable utility; otherwise they would have disappeared long ago.
DECISION THEORY: the right answer is to maximize utility/minimize loss.

Loss Functions

(Figure: loss plotted as a function of δ, showing the threshold δ0 and the constants k, C1, and C2.)

Optimal Decision Rule
Classify to:
“UE” when δ < -δ0
“OE” when δ > δ0
“NC” when |δ| < δ0,
regardless of the distribution of δ.
Problem: we observe x = δ + ε, not δ. Here the distributions of δ *do* matter.

Assumptions
x | δ ~ N(δ, σx²), with σx known.
δ ~ N(0, σ0²) with probability p0; δ ~ N(0, σ1²) with probability 1 − p0.

Baseline Parameters
Model parameters: σx = σ0 = 0.05, σ1 = 1.00, p0 = 0.80.
Loss function parameters: C1 = 0.4, C2 = 0.2, k = 0.99, δ0 = 0.223.
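
Under the two-component normal mixture on the Assumptions slide, the posterior of δ given x is again a mixture of normals, so the posterior probabilities of the UE/NC/OE regions that drive the optimal rule are available in closed form. A sketch, using the baseline numbers as best they can be read from the slide (in particular, the reading σx = σ0 = 0.05 is an assumption, as are the illustrative x values):

```python
import math
from statistics import NormalDist

# Baseline parameters as read from the slide; sigma_x reading is an assumption.
sigma_x, s0, s1, p0, d0 = 0.05, 0.05, 1.00, 0.80, 0.223

def region_posteriors(x):
    """Return posterior P(delta < -d0 | x), P(|delta| <= d0 | x), P(delta > d0 | x)
    under x | delta ~ N(delta, sigma_x^2) and the two-component normal prior."""
    ue = oe = 0.0
    weights, comps = [], []
    for pj, sj in ((p0, s0), (1 - p0, s1)):
        m = sj ** 2 + sigma_x ** 2                  # marginal variance of x, comp. j
        weights.append(pj * NormalDist(0, math.sqrt(m)).pdf(x))
        # Posterior of delta within component j is normal (conjugate update):
        comps.append(NormalDist(sj ** 2 / m * x,
                                math.sqrt(sj ** 2 * sigma_x ** 2 / m)))
    tot = sum(weights)
    for w, post in zip(weights, comps):
        ue += (w / tot) * post.cdf(-d0)             # UE region
        oe += (w / tot) * (1 - post.cdf(d0))        # OE region
    return ue, 1 - ue - oe, oe                      # NC is the remainder

for x in (0.0, 0.15, 0.3):
    ue, nc, oe = region_posteriors(x)
    print(f"x={x}: P(UE)={ue:.3f}  P(NC)={nc:.3f}  P(OE)={oe:.3f}")
```

Small |x| gives NC overwhelming posterior mass; once x clears roughly δ0, mass shifts sharply to OE. These region probabilities are the inputs the loss constants C1, C2, and k would weight in the expected-loss-minimizing rule.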

Test Stats & Multiple Tests
For testing H0i: δi = 0, for i = 1, …, k, p-values pi are computed from the test statistics ti.
FWE- and FDR-controlling methods use the pi; Efron’s method uses the ti.

Average Loss
Simulation: δ’s generated, then x’s | δ.
Independence across tests.
All combinations of:
p = (400, 2000, 10000, 50000)
n = (10, 20, 40, 80, 160)

Concluding Comments
Consider scale-up in both p and n.
FWE is often OK (serendipity).
Efron’s method is promising for scaling up both ways.
Recommendations: either
a. learn to recognize situations where FWE/FDR/Efron/… have good utility, or
b. bite the bullet and construct loss functions.