Published by Nickolas Harris. Modified over 9 years ago.
1
Strong Control of the Familywise Type I Error Rate in DNA Microarray Analysis Using Exact Step-Down Permutation Tests
Peter H. Westfall, Texas Tech University
2
Microarrays
[Figure: glass slide (1 cm^2) carrying ~6,500 genes; each spot holds a different cDNA sequence]
3
Example
Group 1: Acute Myeloid Leukemia (AML), $n_1 = 11$
Group 2: Acute Lymphoblastic Leukemia (ALL), $n_2 = 27$
Data:
OBS   TYPE   G1   G2   G3   …   G7000
1     AML    (Gene expression levels)
2     AML
…     …
11    AML
12    ALL
…     …
38    ALL
4
Testing for 7000 Gene Expression Levels
Goal: Test $H_{0i}: F_{\mathrm{ALL},i} = F_{\mathrm{AML},i}$ for $i = 1, \dots, 7000$. Here, “F” denotes the cdf.
Many choices for test statistics.
Multiplicity problem: If tests are done at $\alpha = .05$, and there are 6600 equivalent genes, then about $.05 \times 6600 = 330$ of them are expected to be declared “non-equivalent.”
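The arithmetic on this slide can be checked directly. This is a minimal sketch using the slide's numbers; the second calculation assumes independent tests, which is our assumption and not a claim made on the slide.

```python
alpha = 0.05          # per-test significance level from the slide
n_true_nulls = 6600   # number of truly equivalent genes assumed on the slide

# Expected number of false rejections when every true null is tested at level alpha.
expected_false_positives = alpha * n_true_nulls

# Assumption (not from the slide): if the tests were independent, the chance of
# at least one false rejection would be 1 - (1 - alpha) ** n_true_nulls.
fwe_if_independent = 1 - (1 - alpha) ** n_true_nulls

print(round(expected_false_positives))
print(fwe_if_independent)
```

Without a multiplicity adjustment, at least one false discovery is essentially certain, which is what motivates familywise error control.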
5
Closed Testing to Control False Discoveries
Let $S = \{1, 2, \dots, 7000\}$ (gene labels). Let $K = \{i_1, \dots, i_k\} \subseteq S$ denote a particular subset.
The Closed Testing Procedure:
1. Test $H_{0K}: F_{\mathrm{ALL},K} = F_{\mathrm{AML},K}$ for each $K \subseteq S$, using a valid $\alpha$-level test for each.
2. Reject $H_{0i}: F_{\mathrm{ALL},i} = F_{\mathrm{AML},i}$ if $H_{0K}$ is rejected for all $K \supseteq \{i\}$.
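The two steps can be sketched in a few lines for a small number of hypotheses. This is an illustration only: the function names, the four raw p-values, and the Bonferroni local test are assumptions made here (the slides use a permutation min-p test for each subset), and brute-force enumeration is feasible only for small n.

```python
from itertools import combinations

def closed_testing(n, alpha, local_pvalue):
    """Return the indices i in 0..n-1 whose H_0i is rejected by closed testing.

    local_pvalue(K) must return a valid p-value for the intersection
    hypothesis H_0K, where K is a tuple of indices.
    """
    rejected = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # H_0i is rejected only if every subset K containing i is rejected.
        all_rejected = all(
            local_pvalue((i,) + extra) <= alpha
            for size in range(len(others) + 1)
            for extra in combinations(others, size)
        )
        if all_rejected:
            rejected.add(i)
    return rejected

# Hypothetical raw p-values for four genes (illustration only).
raw_p = [0.001, 0.012, 0.20, 0.03]

def bonferroni(K):
    """Assumed local test: Bonferroni p-value for the subset K."""
    return min(1.0, len(K) * min(raw_p[i] for i in K))

print(sorted(closed_testing(len(raw_p), 0.05, bonferroni)))  # prints [0, 1]
```

With a Bonferroni local test the closure reduces to Holm's procedure, so the result can be checked by hand: 4 × .001 and 3 × .012 are below .05, but 2 × .03 is not.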
6
Theorem: CTP Strongly Controls FWE
Proof: Suppose $H_{0j_1}, \dots, H_{0j_m}$ all are true (unknown to you which ones). You may reject at least one only when you reject the intersection $H_{0j_1} \cap \dots \cap H_{0j_m}$. Thus,
$\mathrm{FWE} = P(\text{reject at least one of } H_{0j_1}, \dots, H_{0j_m} \mid H_{0j_1}, \dots, H_{0j_m} \text{ all are true})$
$\le P(\text{reject } H_{0j_1} \cap \dots \cap H_{0j_m} \mid H_{0j_1}, \dots, H_{0j_m} \text{ all are true}) = \alpha$.
7
Exact Tests for Composite Hypotheses $H_{0K}$
Use the permutation distribution of $\min_{i \in K} p_i$, where $p_i = 2P(T_{38-2} > |t_i|)$ and $t_i$ is the two-sample Student t statistic for gene $i$.
p-value = proportion of the $38!/(27!\,11!)$ permutations for which $\min_{i \in K} P_i^* \le \min_{i \in K} p_i$.
Note: Exact despite “massively singular” covariance matrix!
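A toy version of this exact test can be written with full enumeration. The sketch below makes several assumptions: the data are hypothetical (7 samples, 3 genes, group sizes 3 and 4), the function names are ours, and the pooled-variance t statistic is assumed from the $T_{38-2}$ reference distribution. Because every gene uses the same t reference distribution, the smallest $p_i$ corresponds to the largest $|t_i|$, so the code compares max |t| instead of min p.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample pooled-variance Student t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

def max_abs_t(data, group1, K):
    """Largest |t_i| over genes i in K when the rows in group1 form group 1."""
    g1 = set(group1)
    stats = []
    for i in K:
        x = [row[i] for r, row in enumerate(data) if r in g1]
        y = [row[i] for r, row in enumerate(data) if r not in g1]
        stats.append(abs(pooled_t(x, y)))
    return max(stats)

def exact_minp_pvalue(data, n1, K):
    """Exact permutation p-value for H_0K; rows 0..n1-1 are the observed group 1."""
    observed = max_abs_t(data, range(n1), K)
    splits = list(combinations(range(len(data)), n1))
    # Small tolerance so the observed labelling always counts itself.
    hits = sum(max_abs_t(data, s, K) >= observed - 1e-12 for s in splits)
    return hits / len(splits)

# Hypothetical expression levels: rows are samples, columns are genes.
toy = [
    [9.1, 5.0, 2.2], [8.7, 4.1, 3.1], [9.9, 5.6, 2.9],                   # group 1
    [4.2, 4.8, 2.5], [5.1, 5.3, 3.4], [3.8, 4.4, 2.0], [4.9, 5.9, 3.0],  # group 2
]

print(exact_minp_pvalue(toy, 3, (0,)))
print(exact_minp_pvalue(toy, 3, (0, 1, 2)))
```

The toy data have only 35 relabellings, so enumeration is instant; with 38 samples the $38!/(27!\,11!)$ relabellings are far too many to enumerate, so they would be sampled at random.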
8
A Slight Problem...
There are $2^{7000} - 1$ subsets $K$ to be tested.
This might take a while...
9
A Fantastic Simplification
You need only test 7000 of the $2^{7000} - 1$ subsets!
Why? Because $P(\min_{i \in K} P_i^* \le c) \le P(\min_{i \in K'} P_i^* \le c)$ when $K \subseteq K'$.
Significance for most lower order subsets is determined by significance of higher order subsets.
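One way to see which 7000 subsets suffice is sketched below. The ordered p-values $p_{(j)}$, the nested sets $K_j$, and the adjusted p-value $\tilde p_{(j)}$ are notation introduced here for illustration; they do not appear on the slide.

```latex
% Order the raw p-values: p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}, with m = 7000,
% and define the nested subsets K_j = \{(j), (j+1), \dots, (m)\}.
\[
  p_{K_j} = P\Bigl(\min_{i \in K_j} P_i^* \le p_{(j)}\Bigr), \qquad j = 1, \dots, m .
\]
% Any K containing gene (j) has \min_{i \in K} p_i = p_{(l)} for some l \le j
% and satisfies K \subseteq K_l, so the monotonicity above gives p_K \le p_{K_l}.
\[
  \tilde p_{(j)} = \max_{K \ni (j)} p_K = \max_{l \le j} p_{K_l} .
\]
```

Only the $m$ nested subsets $K_1, \dots, K_m$ ever need a permutation p-value; every other subset is dominated by one of them.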
10
Illustration with Four Genes
H{1234}: min p = .0121, p{1234} = .0379
H{123}: min p = .0121, p{123} < .0379
H{124}: min p = .0121, p{124} < .0379
H{134}: min p = .0121, p{134} < .0379
H{234}: min p = .0142, p{234} = .0351
H{12}: min p = .0121, p{12} < .0379
H{13}: min p = .0121, p{13} < .0379
H{14}: min p = .0121, p{14} < .0379
H{23}: min p = .0142, p{23} < .0351
H{24}: min p = .0142, p{24} < .0351
H{34}: min p = .0191, p{34} = .0355
H1: p1 = 0.0121, p{1} < .0379
H2: p2 = 0.0142, p{2} < .0351
H3: p3 = 0.1986, p{3} = .1991
H4: p4 = 0.0191, p{4} < .0355
(Start at bottom.)
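The closed-testing decision for each gene follows from the numbers on this slide. As a worked sketch, with the adjusted p-value $\tilde p_i$ being our notation for the largest subset p-value among all subsets containing gene $i$:

```latex
\[
  \tilde p_i = \max_{K \ni i} p_K
\]
\begin{align*}
  \tilde p_1 &= p_{\{1234\}} = .0379, \\
  \tilde p_2 &= \max(.0379,\ .0351,\ \dots) = .0379, \\
  \tilde p_3 &= p_{\{3\}} = .1991, \\
  \tilde p_4 &= \max(.0379,\ .0351,\ .0355,\ \dots) = .0379 .
\end{align*}
```

At $\alpha = .05$ this rejects $H_1$, $H_2$ and $H_4$ and retains $H_3$. The subsets listed only with an upper bound ("<") cannot change these maxima, because each bound is no larger than an exact value already in the same maximum.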
11
MULTTEST PROCEDURE
Tests only the needed subsets (7000, not $2^{7000} - 1$).
Samples from the permutation distribution.
Only one sample is needed, not 7000 distinct samples: the joint distribution of minP is identical under $H_K$ and $H_S$. (Called the “subset pivotality” condition by Westfall and Young, 1993.)
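The idea of reusing one set of resamples for all nested subsets can be sketched as follows. This is not the PROC MULTTEST implementation: the function name and the small input arrays are hypothetical, and the permutation p-values $P_i^*$ are taken as given instead of being computed from relabelled data.

```python
def stepdown_minp(raw_p, perm_p):
    """Step-down minP adjusted p-values from a single set of resamples.

    raw_p:  observed p-values, one per gene.
    perm_p: one row per resample, each holding the p-values P* for all genes.
    """
    m = len(raw_p)
    order = sorted(range(m), key=lambda i: raw_p[i])  # most significant first
    counts = [0] * m
    for row in perm_p:
        running_min = 1.0
        # Walk from least to most significant so running_min is the minimum
        # of P* over the nested subset {order[j], ..., order[m-1]}.
        for j in range(m - 1, -1, -1):
            running_min = min(running_min, row[order[j]])
            if running_min <= raw_p[order[j]]:
                counts[j] += 1
    adjusted = [0.0] * m
    previous = 0.0
    for j in range(m):
        # Enforce monotonicity: a less significant gene cannot get a smaller
        # adjusted p-value than a more significant one.
        previous = max(previous, counts[j] / len(perm_p))
        adjusted[order[j]] = previous
    return adjusted

# Hypothetical inputs: 3 genes, 4 resamples.
raw = [0.01, 0.04, 0.5]
resamples = [
    [0.5, 0.6, 0.3],
    [0.005, 0.9, 0.8],
    [0.3, 0.02, 0.9],
    [0.9, 0.8, 0.4],
]
print(stepdown_minp(raw, resamples))  # prints [0.25, 0.25, 0.5]
```

Each resample is scanned once for all nested subsets, which is why one permutation sample serves every subset when the joint distribution of minP does not depend on which subset hypothesis is assumed true.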
12
PROC MULTTEST code
proc multtest noprint out=adjp holm hoc stepperm n=200000;
   class type;                    /* AML or ALL */
   test mean (gene1-gene7123);    /* mean comparison for every gene */
   contrast 'AML vs ALL' -1 1;
run;
/* keep only genes with very small raw p-values, most significant first */
proc sort data=adjp(where=(raw_p le .0005));
   by raw_p;
run;
proc print;
   var _var_ raw_p stppermp;
run;
13
PROC MULTTEST Output (50 minutes for 200,000 samples)
14
Imbalance Issues
Use of Student t statistics does result in an exact, closed multiple testing procedure, but...
There is imbalance: less power for gene types that are highly kurtotic than for normally distributed types.
Solutions:
Use exact unadjusted p-values
– Already available for binary data
– Computational difficulties otherwise
Rank-transform the data prior to analysis
15
Rank Transform for Better Balance
/* replace each gene's values by their ranks across all samples */
proc rank;
   var gene1-gene7123;
run;
proc multtest noprint out=adjp holm hoc stepperm n=200000;
   class type;                    /* AML or ALL */
   test mean (gene1-gene7123);
   contrast 'AML vs ALL' -1 1;
run;
proc sort data=adjp(where=(raw_p le .0005));
   by raw_p;
run;
proc print;
   var _var_ raw_p stppermp;
run;
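For readers without SAS, the rank step can be sketched in Python. The function name and the example values are hypothetical; tied values share the average of their ranks, which we understand to be PROC RANK's default tie handling.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

# Hypothetical expression levels for one gene across four samples.
print(average_ranks([310.0, 25.5, 25.5, 1800.0]))  # prints [3.0, 1.5, 1.5, 4.0]
```

Ranking is done per gene over both groups pooled, so a single extreme expression value can shift the rank-based statistic by only a bounded amount.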
16
Rank Transformed Results
17
Comparing ALL and AML for Gene 6128
[Figure: GENE6128 expression level (axis from 0 to 2000) plotted by TYPE (ALL, AML)]
18
Is Better Balance Good?
Maybe not: imbalance induces a more powerful multiple testing procedure
– Bonferroni multiplier implicitly reduced through imbalance
– Serendipity!
19
Summary
The Westfall-Young method is an exact, closed testing method, despite large p, small n
Detected genes are “honestly significant”
Robust (nonparametric)