Strong Control of the Familywise Type I Error Rate in DNA Microarray Analysis Using Exact Step-Down Permutation Tests
Peter H. Westfall, Texas Tech University
Microarrays
[Figure: glass slide (1 cm²) carrying ~6,500 genes, each spot a different cDNA sequence]
Example
Group 1: Acute Myeloid Leukemia (AML), n1 = 11
Group 2: Acute Lymphoblastic Leukemia (ALL), n2 = 27
Data:
OBS   TYPE   G1   G2   G3   …   G
1     AML    (Gene expression levels)
2     AML    …
…     …      …
11    AML
12    ALL    …
…
38    ALL
Testing for 7000 Gene Expression Levels
Goal: Test $H_{0i}: F_{ALL,i} = F_{AML,i}$ for $i = 1, \ldots, 7000$. Here, “F” denotes the cdf.
Many choices for test statistics.
Multiplicity problem: If each test is done at $\alpha = .05$ and there are 6600 equivalent genes, then about $.05 \times 6600 = 330$ of them are expected to be declared “non-equivalent” by chance alone.
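The arithmetic behind the multiplicity problem can be checked in a few lines. This is a minimal sketch: the level and gene count come from the slide, while the independence assumption used for the familywise error rate is an added simplification (microarray genes are generally correlated) and the variable names are illustrative.

```python
alpha = 0.05        # per-test level from the slide
true_nulls = 6600   # genes assumed equivalent in the slide's example

# Expected number of equivalent genes wrongly declared "non-equivalent"
expected_false_rejections = alpha * true_nulls

# ASSUMPTION (not from the slide): the 6600 tests are independent.
# Under that simplification, the chance of at least one false rejection is:
fwe_if_independent = 1 - (1 - alpha) ** true_nulls

print(f"expected false rejections: {expected_false_rejections:.0f}")
print(f"P(at least one false rejection), if independent: {fwe_if_independent:.6f}")
```

Without any adjustment, a false discovery is therefore essentially certain, which is what motivates controlling the familywise error rate.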
Closed Testing to Control False Discoveries
Let $S = \{1, 2, \ldots, 7000\}$ (gene labels).
Let $K = \{i_1, \ldots, i_k\} \subseteq S$ denote a particular subset.
The Closed Testing Procedure:
1. Test $H_{0K}: F_{ALL,K} = F_{AML,K}$ for each $K \subseteq S$, using a valid $\alpha$-level test for each.
2. Reject $H_{0i}: F_{ALL,i} = F_{AML,i}$ if $H_{0K}$ is rejected for all $K \supseteq \{i\}$.
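The two steps of the closed testing procedure can be sketched by brute force for a handful of hypotheses. In this illustration the local test for each subset is a Bonferroni test on made-up p-values, used only as a simple stand-in for the permutation test in the talk; the function names `closed_test` and `bonferroni_local_p` are hypothetical.

```python
from itertools import combinations

def bonferroni_local_p(subset, p_values):
    """Stand-in local test for H_0K: Bonferroni on the p-values in K."""
    return min(1.0, len(subset) * min(p_values[i] for i in subset))

def closed_test(p_values, alpha, local_p):
    """Brute-force closed testing: feasible only for a few hypotheses."""
    labels = range(len(p_values))
    all_subsets = [K for k in range(1, len(p_values) + 1)
                   for K in combinations(labels, k)]
    # Step 1: test every intersection hypothesis H_0K at level alpha.
    rejected = {K for K in all_subsets if local_p(K, p_values) <= alpha}
    # Step 2: reject H_0i only if every subset containing i was rejected.
    return [all(K in rejected for K in all_subsets if i in K) for i in labels]

toy_p = [0.01, 0.02, 0.20, 0.03]   # made-up raw p-values for four genes
decisions = closed_test(toy_p, alpha=0.05, local_p=bonferroni_local_p)
print(decisions)
```

With Bonferroni local tests, closed testing reproduces Holm's step-down method, which gives an easy check on the output: Holm rejects only the first hypothesis here, because 3 × 0.02 exceeds 0.05 at the second step.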
Theorem: CTP Strongly Controls FWE
Proof: Suppose $H_{0j_1}, \ldots, H_{0j_m}$ all are true (unknown to you which ones). You may reject at least one only when you reject the intersection $H_{0j_1} \cap \cdots \cap H_{0j_m}$. Thus,
$$\mathrm{FWE} = P(\text{reject at least one of } H_{0j_1}, \ldots, H_{0j_m} \mid H_{0j_1}, \ldots, H_{0j_m} \text{ all are true}) \le P(\text{reject } H_{0j_1} \cap \cdots \cap H_{0j_m} \mid H_{0j_1}, \ldots, H_{0j_m} \text{ all are true}) = \alpha.$$
Exact Tests for Composite Hypotheses $H_{0K}$
Use the permutation distribution of $\min_{i \in K} p_i$, where $p_i = 2P(T_{38-2} > |t_i|)$ and $t_i$ is the two-sample t statistic comparing AML and ALL for gene $i$.
p-value = proportion of the $38!/(27!\,11!)$ permutations for which $\min_{i \in K} P_i^* \le \min_{i \in K} p_i$.
Note: Exact despite “massively singular” covariance matrix!
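A minimal sketch of the exact test for one subset K, on made-up data with 3 versus 4 samples so that all 35 relabelings can be enumerated. It relies on one simplification that is an assumption of this sketch: when every gene has the same degrees of freedom, $p_i$ is a decreasing function of $|t_i|$, so comparing $\min p$ values is equivalent to comparing $\max |t|$ values and no t-distribution routine is needed. The data and function names are illustrative only.

```python
from itertools import combinations
from math import sqrt

def abs_t(values, group1):
    """|pooled-variance two-sample t| for one gene; group1 is a set of sample indices."""
    x = [v for r, v in enumerate(values) if r in group1]
    y = [v for r, v in enumerate(values) if r not in group1]
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    return abs(m1 - m2) / sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))

def exact_min_p_test(genes, group1, subset):
    """Exact permutation p-value for H_0K, K = subset, via the max-|t| form of min-p."""
    n = len(genes[0])
    observed = max(abs_t(genes[j], group1) for j in subset)
    hits = total = 0
    for relabel in combinations(range(n), len(group1)):  # every possible relabeling
        total += 1
        permuted = max(abs_t(genes[j], set(relabel)) for j in subset)
        if permuted >= observed - 1e-12:                 # tolerance for float ties
            hits += 1
    return hits / total

# Made-up toy data: 3 genes, 7 samples; samples 0-2 play the role of one group.
genes = [
    [9.1, 8.7, 9.5, 5.0, 5.4, 4.8, 5.2],   # clearly separated gene
    [2.0, 3.1, 2.4, 2.9, 2.2, 3.3, 2.6],   # noise
    [7.3, 6.1, 6.8, 6.5, 7.0, 6.2, 7.4],   # noise
]
group1 = {0, 1, 2}
p_single = exact_min_p_test(genes, group1, subset=[0])
p_all = exact_min_p_test(genes, group1, subset=[0, 1, 2])
print(p_single, p_all)
```

The whole sample is relabeled at once, so the dependence among genes is carried into the permutation distribution automatically; no covariance matrix is ever estimated.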
A Slight Problem...
There are $2^{7000} - 1$ subsets $K$ to be tested.
This might take a while...
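To get a feel for the scale: with 7000 genes there are $2^{7000} - 1$ nonempty subsets. A quick illustrative check of how large that number is, using Python's arbitrary-precision integers:

```python
n_genes = 7000
n_subsets = 2 ** n_genes - 1          # every nonempty subset K of the gene labels
n_digits = len(str(n_subsets))        # number of decimal digits in that count

print(f"number of subsets has {n_digits} decimal digits")
```

Enumerating the closure directly is therefore out of the question for any realistic number of genes.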
A Fantastic Simplification
You need only test 7000 of the subsets!
Why? Because $P(\min_{i \in K} P_i^* \le c) \le P(\min_{i \in K'} P_i^* \le c)$ when $K \subseteq K'$.
Significance for most lower order subsets is determined by significance of higher order subsets.
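The shortcut can be sketched as a step-down pass: order the genes from most to least significant, then test only the nested subsets "all genes", "all but the most significant", and so on, reusing the same relabelings for every subset. This is an illustrative sketch on made-up data, not the MULTTEST implementation. It uses $\max |t|$ in place of $\min p$, which is equivalent only under the assumption that all genes share the same degrees of freedom, and it enumerates all relabelings rather than sampling them.

```python
from itertools import combinations
from math import sqrt

def abs_t(values, group1):
    """|pooled-variance two-sample t| for one gene; group1 is a set of sample indices."""
    x = [v for r, v in enumerate(values) if r in group1]
    y = [v for r, v in enumerate(values) if r not in group1]
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    return abs(m1 - m2) / sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))

def step_down_adjusted_p(genes, group1):
    """Step-down permutation adjusted p-values using only m nested subsets."""
    n, m = len(genes[0]), len(genes)
    observed = [abs_t(g, group1) for g in genes]
    order = sorted(range(m), key=lambda j: -observed[j])   # most significant first
    counts, total = [0] * m, 0
    for relabel in combinations(range(n), len(group1)):
        total += 1
        permuted = [abs_t(g, set(relabel)) for g in genes]
        running_max = 0.0
        # Successive maxima over the nested subsets {order[k], ..., order[m-1]}
        for k in range(m - 1, -1, -1):
            running_max = max(running_max, permuted[order[k]])
            if running_max >= observed[order[k]] - 1e-12:
                counts[k] += 1
    adjusted, running = [0.0] * m, 0.0
    for k in range(m):                                     # enforce monotonicity
        running = max(running, counts[k] / total)
        adjusted[order[k]] = running
    return adjusted

# Made-up toy data: 3 genes, 7 samples; samples 0-2 play the role of one group.
genes = [
    [9.1, 8.7, 9.5, 5.0, 5.4, 4.8, 5.2],
    [2.0, 3.1, 2.4, 2.9, 2.2, 3.3, 2.6],
    [7.3, 6.1, 6.8, 6.5, 7.0, 6.2, 7.4],
]
adjusted = step_down_adjusted_p(genes, group1={0, 1, 2})
print(adjusted)
```

The work is one pass over the relabelings with m comparisons each, rather than one test per subset of the closure.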
Illustration with Four Genes
H{1234}: min p = .0121, p{1234} = .0379
H{123}: min p = .0121, p{123} < .0379
H{124}: min p = .0121, p{124} < .0379
H{134}: min p = .0121, p{134} < .0379
H{234}: min p = .0142, p{234} = .0351
H{12}: min p = .0121, p{12} < .0379
H{13}: min p = .0121, p{13} < .0379
H{14}: min p = .0121, p{14} < .0379
H{23}: min p = .0142, p{23} < .0351
H{24}: min p = .0142, p{24} < .0351
H{34}: min p = .0191, p{34} = .0355
H1: p1 = p{1} < .0379
H2: p2 = p{2} < .0351
H3: p3 = p{3} = .1991
H4: p4 = p{4} < .0355
(Start at bottom.)
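Only four of the fifteen subset p-values in this illustration are computed exactly: those along the chain {1234}, {234}, {34}, {3}. The rest are bounded by them. A short sketch of how the adjusted p-value for each gene follows as a running maximum along that chain. The p-values are the ones on the slide; the gene ordering 1, 2, 4, 3 is read off the min p column, and the variable names are illustrative.

```python
# Exact subset p-values from the slide, along the step-down chain.
# Genes are ordered by raw p-value: gene 1 (.0121), gene 2 (.0142),
# gene 4 (.0191), gene 3 (.1991).
chain = [
    ("gene 1", 0.0379),   # p{1234}
    ("gene 2", 0.0351),   # p{234}
    ("gene 4", 0.0355),   # p{34}
    ("gene 3", 0.1991),   # p{3}
]

adjusted = {}
running_max = 0.0
for gene, subset_p in chain:
    # A gene is rejected only if every subset containing it is rejected,
    # so its adjusted p-value is the largest p-value met so far on the chain.
    running_max = max(running_max, subset_p)
    adjusted[gene] = running_max

rejected_at_05 = [gene for gene, p in adjusted.items() if p <= 0.05]
print(adjusted)
print(rejected_at_05)
```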
MULTTEST PROCEDURE
Tests only the needed subsets (7000, not $2^{7000} - 1$).
Samples from the permutation distribution.
Only one sample is needed, not 7000 distinct samples: the joint distribution of minP is identical under $H_{0K}$ and $H_{0S}$. (Called the “subset pivotality” condition by Westfall and Young, 1993.)
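PROC MULTTEST code

proc multtest noprint out=adjp holm hoc stepperm n=200000;
   class type;                     /* AML or ALL */
   test mean (gene1-gene7123);     /* two-sample t test for each gene */
   contrast 'AML vs ALL' -1 1;
run;

/* Keep only the genes with the smallest raw p-values, sorted by raw p-value */
proc sort data=adjp(where=(raw_p le .0005));
   by raw_p;
run;

/* Print raw and step-down permutation adjusted p-values */
proc print;
   var _var_ raw_p stppermp;
run;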
PROC MULTTEST Output (50 minutes for 200,000 samples)
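Because the adjusted p-values are estimated from a random sample of relabelings rather than from full enumeration, they carry Monte Carlo error. A rough sketch of its size for 200,000 samples, using the usual binomial standard error $\sqrt{p(1-p)/N}$. The p-values plugged in are illustrative, not taken from the output, and `mc_standard_error` is a hypothetical helper name.

```python
from math import sqrt

n_samples = 200_000   # number of permutation samples quoted on the slide

def mc_standard_error(p, n):
    """Binomial standard error of a p-value estimated from n random relabelings."""
    return sqrt(p * (1 - p) / n)

for p in (0.01, 0.05):   # illustrative adjusted p-values
    print(f"p = {p:.2f}: Monte Carlo s.e. ~ {mc_standard_error(p, n_samples):.5f}")
```

At this sample size the error is a small fraction of the conventional .05 threshold, so decisions near the threshold are rarely affected by resampling noise.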
Imbalance Issues
Use of Student t statistics does result in an exact, closed multiple testing procedure, but...
There is imbalance: less power for gene types that are highly kurtotic than for normally distributed types.
Solutions:
Use exact unadjusted p-values
– Already available for binary data
– Computational difficulties otherwise
Rank-transform the data prior to analysis
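The rank-transform remedy replaces each gene's expression values by their ranks across the 38 samples, which puts every gene on the same scale and removes the influence of heavy tails. A minimal sketch with average ranks for ties, on made-up values; `midranks` is a hypothetical helper name.

```python
def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

# Made-up expression values for one gene, including one extreme outlier
expression = [120.0, 95.0, 4000.0, 95.0, 130.0]
ranked = midranks(expression)
print(ranked)
```

After ranking, the outlier at 4000 is simply the largest rank, so it no longer dominates the variance estimate in the t statistic.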
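Rank Transform for Better Balance

/* Replace each gene's values by their ranks; "ranked" is an illustrative data set name */
proc rank out=ranked;
   var gene1-gene7123;
run;

proc multtest data=ranked noprint out=adjp holm hoc stepperm n=200000;
   class type;                     /* AML or ALL */
   test mean (gene1-gene7123);     /* t test on the ranks of each gene */
   contrast 'AML vs ALL' -1 1;
run;

proc sort data=adjp(where=(raw_p le .0005));
   by raw_p;
run;

proc print;
   var _var_ raw_p stppermp;
run;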
Rank Transformed Results
Comparing ALL and AML for Gene
[Figure: GENE expression level (vertical axis) by TYPE (ALL, AML)]
Is Better Balance Good?
Maybe not: imbalance induces a more powerful multiple testing procedure
– Bonferroni multiplier implicitly reduced through imbalance
– Serendipity!
Summary
Westfall-Young method is an exact, closed testing method, despite large p, small n
Detected genes are “honestly significant”
Robust (nonparametric)