Multiple Testing in Microarray Data Analysis Mi-Ok Kim.


1 Multiple Testing in Microarray Data Analysis Mi-Ok Kim

2 Outline
1. Hypothesis Testing
2. Issues in Multiple Testing in Microarray Analysis
   1) Type I Error
   2) Power
   3) P-values
3. Permutation

3 1. Hypothesis Testing
$H_0$: null hypothesis vs. $H_1$: alternative hypothesis
$T$: test statistic
$C$: critical value
If $|T| > C$, $H_0$ is rejected. Otherwise $H_0$ is retained.
Ex) $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$
$T = (\bar{x}_1 - \bar{x}_2) / \text{pooled SE}$
If $|T| > z_{1-\alpha/2}$, $H_0$ is rejected at the significance level $\alpha$; here $C = z_{1-\alpha/2}$.
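
The two-sample example on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `two_sample_test` and the sample values are hypothetical, the pooled standard error uses the usual equal-variance formula, and the critical value is the normal quantile $z_{1-\alpha/2}$ as on the slide.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_test(x1, x2, alpha=0.05):
    """Return (T, C, rejected) for H0: mu1 = mu2 against H1: mu1 != mu2.

    Hypothetical helper: T is the mean difference over the pooled SE,
    C is the two-sided normal critical value z_{1 - alpha/2}.
    """
    n1, n2 = len(x1), len(x2)
    # Pooled variance under the equal-variance assumption.
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    pooled_se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean(x1) - mean(x2)) / pooled_se
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return t_stat, critical, abs(t_stat) > critical


# Made-up measurements, for illustration only.
t_stat, critical, rejected = two_sample_test([5.1, 4.9, 5.3, 5.0], [3.0, 3.2, 2.9, 3.1])
```

With samples this small a t quantile would normally replace the normal quantile; the normal one is kept here only to match the slide.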

4 1. Hypothesis Testing
Truth \ Result   Retained        Rejected
H0               (correct)       Type I error
H1               Type II error   (correct)
Type I error rate = rate of false positives ($\alpha$: significance level)
Type II error rate = rate of false negatives
Power = 1 - Type II error rate
P-value: $p = \inf\{\alpha \mid H_0 \text{ is rejected at the significance level } \alpha\}$

5 2. Issues in Multiple Comparison
Q: Given n treatments, which two treatments are significantly different? (simultaneous testing)
cf) Is treatment A different from treatment B?
Ex) n treatment means: $\mu_1, \ldots, \mu_n$
$H_j: \mu_i = \mu_j$ where $i \neq j$
$T_j = (\bar{x}_i - \bar{x}_j) / \text{pooled SE}$
Type I error when testing each at the 0.05 significance level one by one, for n independent tests: $1 - (0.95)^n$
Inflated Type I error, ex) $\alpha = 1 - (0.95)^{10} = 0.401263$
Remedy: Bonferroni method, test each comparison at significance level $\alpha$ / (# of comparisons)
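
The inflation figure and the Bonferroni remedy can be checked numerically. The sketch below assumes independent tests, as the formula $1 - (0.95)^n$ does; the function names are hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false rejection among n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_level(alpha, n_tests):
    """Per-comparison significance level under the Bonferroni method."""
    return alpha / n_tests


inflated = familywise_error(0.05, 10)                          # about 0.401263
corrected = familywise_error(bonferroni_level(0.05, 10), 10)   # stays below 0.05
```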

6 3. Issues in Multiple Testing in Microarray Analysis
Goal: the identification of differentially expressed genes.
Ex) a study of differential gene expression in tumor biopsy specimens from leukemia patients (ALL / AML) that includes 6,817 genes and 30 samples
Data matrix: rows are genes (m), columns are samples (n)
$H_j$: the jth gene is not differentially expressed
Simultaneously test the m null hypotheses $H_j$, $j = 1, \ldots, m$, to determine which hypotheses to reject while controlling a suitably defined Type I error rate and maximizing power.

7 3-1) Type I Error Rates
Truth \ Result   # retained   # rejected   Total
H0               U            V            m0
H1               T            S            m1
Total            m - R        R            m
Per-comparison error rate (PCER) = $E(V) / m$
Per-family error rate (PFER) = $E(V)$
Family-wise error rate (FWER) = $\Pr(V \geq 1)$
False discovery rate (FDR) = $E(Q)$, where $Q = V/R$ if $R > 0$ and $Q = 0$ if $R = 0$
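
The four error rates are expectations over repeated experiments, so they can be estimated by averaging V, R and Q over many runs. The sketch below is an illustration only: `error_rates`, its arguments, and the two hand-made runs are hypothetical.

```python
def error_rates(is_true_null, rejections_per_run):
    """Estimate PCER, PFER, FWER and FDR from repeated experiments.

    is_true_null[j] is True when H_j is a true null hypothesis.
    rejections_per_run[b][j] is True when H_j was rejected in run b.
    """
    m = len(is_true_null)
    runs = len(rejections_per_run)
    sum_v = 0      # total false rejections over all runs
    any_v = 0      # number of runs with V >= 1
    sum_q = 0.0    # sum of Q = V/R (0 when R = 0)
    for rejected in rejections_per_run:
        v = sum(1 for null, rej in zip(is_true_null, rejected) if null and rej)
        r = sum(1 for rej in rejected if rej)
        sum_v += v
        any_v += 1 if v >= 1 else 0
        sum_q += v / r if r > 0 else 0.0
    return {
        "PCER": sum_v / (runs * m),
        "PFER": sum_v / runs,
        "FWER": any_v / runs,
        "FDR": sum_q / runs,
    }


rates = error_rates(
    [True, True, False, False],
    [[True, False, True, True], [False, False, True, False]],
)
```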

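8 3-1) Type I Error Rates
Under the complete null hypothesis, each $H_j$ has Type I error rate $\alpha_j$.
PCER = $E(V)/m = (\alpha_1 + \cdots + \alpha_m)/m$
PFER = $E(V) = \alpha_1 + \cdots + \alpha_m$
FWER = $\Pr(V \geq 1) = 1 - \Pr(\text{no } H_j,\ j = 1, \ldots, m, \text{ is rejected})$
FDR = $E(V/R)$ = FWER, since $V = R$ when every null hypothesis is true, so $Q = I(V \geq 1)$
PCER = $(\alpha_1 + \cdots + \alpha_m)/m \leq \max(\alpha_1, \ldots, \alpha_m) \leq$ FWER = FDR $\leq$ PFER = $\alpha_1 + \cdots + \alpha_m$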

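9 3-1) Type I Error Rate
Assume hypotheses $H_j$, $j = 1, \ldots, m$, whose test statistics $T_j$, $j = 1, \ldots, m$, follow a multivariate normal distribution with mean $\mu = (\mu_1, \ldots, \mu_m)$ and identity covariance matrix, so the tests are independent.
Let $R_j = I(H_j \text{ is rejected})$ and let $r_j$ be the observed value of $R_j$.
Let $\alpha_j = \Pr(R_j = 1)$, the probability that $H_j$ is rejected, and take the first $m_0$ hypotheses to be the true nulls (under the complete null, $m_0 = m$).
PFER = $\sum_{j=1}^{m_0} \alpha_j$
PCER = $\sum_{j=1}^{m_0} \alpha_j / m$
FWER = $1 - \prod_{j=1}^{m_0} (1 - \alpha_j)$
FDR = $\sum_{r_1=0}^{1} \cdots \sum_{r_m=0}^{1} \left( \sum_{j=1}^{m_0} r_j / \sum_{j=1}^{m} r_j \right) \prod_{j=1}^{m} \alpha_j^{r_j} (1 - \alpha_j)^{1 - r_j}$, with the ratio taken as 0 when $\sum_{j=1}^{m} r_j = 0$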
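
For small m the FDR sum over all rejection patterns can be evaluated by brute force. The sketch below assumes independent tests with the true nulls listed first; `fdr_independent`, `fwer_independent` and the probabilities used are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod


def fdr_independent(reject_prob, m0):
    """FDR for independent tests by enumerating all 2^m rejection patterns.

    reject_prob[j] is Pr(H_j rejected); the first m0 hypotheses are true nulls.
    """
    m = len(reject_prob)
    total = 0.0
    for pattern in product((0, 1), repeat=m):
        n_rejected = sum(pattern)
        if n_rejected == 0:
            continue  # Q = 0 when nothing is rejected
        q = sum(pattern[:m0]) / n_rejected
        weight = prod(a if r else 1 - a for a, r in zip(reject_prob, pattern))
        total += q * weight
    return total


def fwer_independent(reject_prob, m0):
    """FWER for independent tests: 1 - prod over true nulls of (1 - alpha_j)."""
    return 1 - prod(1 - a for a in reject_prob[:m0])


# Complete null (m0 = m): FDR equals FWER.
fdr_null = fdr_independent([0.05, 0.05], m0=2)
fwer_null = fwer_independent([0.05, 0.05], m0=2)
# One true null, one false null rejected with probability 0.8: FDR < FWER.
fdr_mixed = fdr_independent([0.05, 0.8], m0=1)
```

The enumeration grows as $2^m$, so it is useful only as a check on small cases, not for microarray-sized m.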

10 3-2) Strong vs. Weak Control
Expectations and probabilities are conditional on which hypotheses are true.
Strong control: control of the Type I error rate under any combination of true and false hypotheses, i.e. for any value of $m_0$.
Weak control: control of the Type I error rate only when all the null hypotheses are true, i.e. under the complete null hypothesis $\bigcap_{j=1}^{m} H_j$.
In the microarray setting, where it is very unlikely that no genes are differentially expressed, it seems particularly important to have strong control of the Type I error rate.

11 3-3) Power
Within the class of multiple testing procedures that control a given Type I error rate at an acceptable level, maximize power, that is, minimize a suitably defined Type II error rate.
Any-pair power: $\Pr(S \geq 1)$ = the probability of rejecting at least one false null hypothesis
Per-pair power (average power): $E(S) / m_1$
All-pair power: $\Pr(S = m_1)$ = the probability of rejecting all false null hypotheses
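
When the tests are independent, the three power definitions reduce to simple products and averages of the individual powers. That independence is an assumption added here for illustration, not something the slide states; `power_summaries` and the example powers are hypothetical.

```python
from math import prod


def power_summaries(power_per_false_null):
    """Any-pair, per-pair and all-pair power for independent tests.

    power_per_false_null[j] is Pr(reject H_j) for each of the m1 false nulls.
    """
    m1 = len(power_per_false_null)
    return {
        "any_pair": 1 - prod(1 - p for p in power_per_false_null),  # Pr(S >= 1)
        "per_pair": sum(power_per_false_null) / m1,                 # E(S) / m1
        "all_pair": prod(power_per_false_null),                     # Pr(S = m1)
    }


summary = power_summaries([0.5, 0.8])
```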

12 3-4) Multiple Testing Procedures based on P-values that control the family-wise error rate
For a single hypothesis $H_1$: $p_1 = \inf\{\alpha \mid H_1 \text{ is rejected at the significance level } \alpha\}$
If $p_1 < \alpha$, $H_1$ is rejected. Otherwise $H_1$ is retained.
Adjusted p-values for multiple testing ($p^*$): $p_j^* = \inf\{\alpha \mid H_j \text{ is rejected at FWER} = \alpha\}$
If $p_j^* < \alpha$, $H_j$ is rejected. Otherwise $H_j$ is retained.
Types of procedure: single-step, step-down, and step-up

13 3-4-1) Single-Step Procedure
For strong control of the FWER:
Single-step Bonferroni adjusted p-values: $p_j^* = \min(m p_j, 1)$
Single-step Sidak adjusted p-values: $p_j^* = 1 - (1 - p_j)^m$
For weak control of the FWER:
Single-step minP adjusted p-values: $p_j^* = \Pr(\min_{1 \leq k \leq m} P_k \leq p_j \mid \text{complete null})$
Single-step maxT adjusted p-values: $p_j^* = \Pr(\max_{1 \leq k \leq m} |T_k| \geq C_j \mid \text{complete null})$, where $C_j$ is the observed value of $|T_j|$
Under the subset pivotality property, weak control = strong control.
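
The two closed-form single-step adjustments translate directly into code. A minimal sketch, with hypothetical function names and made-up p-values:

```python
def bonferroni_adjusted(pvalues):
    """Single-step Bonferroni: p_j* = min(m * p_j, 1)."""
    m = len(pvalues)
    return [min(m * p, 1.0) for p in pvalues]


def sidak_adjusted(pvalues):
    """Single-step Sidak: p_j* = 1 - (1 - p_j)^m."""
    m = len(pvalues)
    return [1.0 - (1.0 - p) ** m for p in pvalues]


raw = [0.01, 0.04, 0.5]
bonf = bonferroni_adjusted(raw)
sidak = sidak_adjusted(raw)
```

The Sidak values are never larger than the Bonferroni ones, so Sidak is slightly less conservative.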

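14 3-4-2) Step-Down Procedure
Order the observed unadjusted p-values such that $p_{r_1} \leq p_{r_2} \leq \cdots \leq p_{r_m}$.
Accordingly, order the hypotheses $H_{r_1}, H_{r_2}, \ldots, H_{r_m}$.
Holm's procedure: $j^* = \min\{ j \mid p_{r_j} > \alpha / (m - j + 1) \}$; reject $H_{r_j}$ for $j = 1, \ldots, j^* - 1$.
Adjusted step-down p-values:
Holm: $p_{r_j}^* = \max_{k = 1, \ldots, j} \{ \min((m - k + 1)\, p_{r_k}, 1) \}$
Sidak: $p_{r_j}^* = \max_{k = 1, \ldots, j} \{ 1 - (1 - p_{r_k})^{m - k + 1} \}$
minP: $p_{r_j}^* = \max_{k = 1, \ldots, j} \{ \Pr(\min_{l \in \{r_k, \ldots, r_m\}} P_l \leq p_{r_k} \mid \text{complete null}) \}$
maxT: $p_{r_j}^* = \max_{k = 1, \ldots, j} \{ \Pr(\max_{l \in \{r_k, \ldots, r_m\}} |T_l| \geq C_{r_k} \mid \text{complete null}) \}$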
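
Holm's adjusted p-values can be computed with one pass over the sorted p-values, keeping a running maximum. A minimal sketch; `holm_adjusted` and the example p-values are hypothetical, and results are returned in the original gene order.

```python
def holm_adjusted(pvalues):
    """Step-down Holm adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # order[k-1] is the index of the k-th smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for k, idx in enumerate(order, start=1):
        candidate = min((m - k + 1) * pvalues[idx], 1.0)
        running_max = max(running_max, candidate)  # enforces monotonicity
        adjusted[idx] = running_max
    return adjusted


holm = holm_adjusted([0.01, 0.04, 0.03])  # close to [0.03, 0.06, 0.06]
```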

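15 3-4-3) Step-Up Procedure
Order the observed unadjusted p-values such that $p_{r_1} \leq p_{r_2} \leq \cdots \leq p_{r_m}$.
Accordingly, order the hypotheses $H_{r_1}, H_{r_2}, \ldots, H_{r_m}$.
Hochberg's procedure: $j^* = \max\{ j \mid p_{r_j} \leq \alpha / (m - j + 1) \}$; reject $H_{r_j}$ for $j = 1, \ldots, j^*$.
Adjusted step-up p-values: $p_{r_j}^* = \min_{k = j, \ldots, m} \{ \min((m - k + 1)\, p_{r_k}, 1) \}$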
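
The step-up adjustment uses the same multipliers as the step-down one but takes a running minimum starting from the largest p-value. A minimal sketch; `step_up_adjusted` and the example p-values are hypothetical.

```python
def step_up_adjusted(pvalues):
    """Step-up adjusted p-values with multipliers (m - k + 1), in the input order."""
    m = len(pvalues)
    # order[k-1] is the index of the k-th smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for k in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[k - 1]
        candidate = min((m - k + 1) * pvalues[idx], 1.0)
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


step_up = step_up_adjusted([0.01, 0.04, 0.03])  # close to [0.03, 0.04, 0.04]
```

On the same input the step-down adjustment gives values close to [0.03, 0.06, 0.06], so the step-up values are never larger.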

16 3-5) Resampling Method
Rows: genes, columns: samples
Bootstrap or permutation based method
Estimate the joint distribution of the test statistics under the complete null hypothesis by permuting the columns of the gene expression data matrix.
For the bth permutation, $b = 1, \ldots, B$, compute test statistics $t_{1,b}, \ldots, t_{m,b}$.
$p_j^* = \sum_{b=1}^{B} I(|t_{j,b}| \geq C_j) / B$
Ex) Golub et al. (1999)
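
The permutation estimate can be sketched as follows. Several choices here are assumptions made for illustration: the statistic is a plain difference in group means rather than a t-statistic, permuting the columns is implemented as shuffling the sample labels (which is equivalent), and the function names, the seed and the toy data are hypothetical.

```python
import random
from statistics import mean


def mean_difference(row, labels):
    """Difference in means between samples labelled 1 and samples labelled 0."""
    group1 = [x for x, lab in zip(row, labels) if lab == 1]
    group0 = [x for x, lab in zip(row, labels) if lab == 0]
    return mean(group1) - mean(group0)


def permutation_pvalues(data, labels, n_perm=1000, seed=0):
    """Per-gene permutation p-values: share of permutations with |t_jb| >= |t_j|.

    data[j] holds the expression values of gene j across samples.
    """
    rng = random.Random(seed)
    observed = [abs(mean_difference(row, labels)) for row in data]
    counts = [0] * len(data)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one permutation applied to every gene at once
        for j, row in enumerate(data):
            if abs(mean_difference(row, shuffled)) >= observed[j]:
                counts[j] += 1
    return [c / n_perm for c in counts]


# Toy matrix: gene 0 differs strongly between groups, gene 1 does not.
toy_data = [[10, 11, 12, 0, 1, 2], [1, 2, 3, 3, 2, 1]]
toy_labels = [1, 1, 1, 0, 0, 0]
pvals = permutation_pvalues(toy_data, toy_labels)
```

Applying the same shuffled labels to all genes in each permutation preserves the dependence between genes, which is what estimating the joint null distribution requires.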

17 3-5) Resampling Method
Efron et al. (2000) and Tusher et al. (2001)
Compute a test statistic $t_j$ for each gene j and define order statistics $t_{(j)}$ such that $t_{(1)} \geq t_{(2)} \geq \cdots \geq t_{(m)}$.
For each permutation b, $b = 1, \ldots, B$, compute the test statistics and define the order statistics $t_{(1),b} \geq t_{(2),b} \geq \cdots \geq t_{(m),b}$.
From the permutations, estimate the expected value (under the complete null) of the order statistics by $t^*_{(j)} = \sum_{b=1}^{B} t_{(j),b} / B$.
Form a Q-Q plot of the observed $t_{(j)}$ vs. the expected $t^*_{(j)}$.
Efron et al.: for a fixed threshold $\Delta$, genes with $|t_{(j)} - t^*_{(j)}| \geq \Delta$ are declared differentially expressed.
Tusher et al.: for a fixed threshold $\Delta$, let $j^* = \max\{ j : t_{(j)} - t^*_{(j)} \geq \Delta,\ t^*_{(j)} > 0 \}$
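
The expected order statistics and the fixed-threshold rule attributed to Efron et al. can be sketched as below. The function names and the tiny inputs are hypothetical, the permuted statistics are assumed to be already computed, and the threshold is passed in as `delta`.

```python
def expected_order_statistics(perm_stats):
    """Average the sorted (descending) statistics over permutations.

    perm_stats[b] is the list of m test statistics from permutation b.
    """
    n_perm = len(perm_stats)
    sorted_runs = [sorted(run, reverse=True) for run in perm_stats]
    m = len(sorted_runs[0])
    return [sum(run[j] for run in sorted_runs) / n_perm for j in range(m)]


def threshold_selection(observed, perm_stats, delta):
    """Genes whose observed order statistic is at least delta from its expected value."""
    # order[rank] is the gene holding the (rank + 1)-th largest observed statistic.
    order = sorted(range(len(observed)), key=lambda j: observed[j], reverse=True)
    expected = expected_order_statistics(perm_stats)
    return [
        gene
        for rank, gene in enumerate(order)
        if abs(observed[gene] - expected[rank]) >= delta
    ]


perm_stats = [[0.5, -1.0, 1.5], [1.0, -0.5, 0.0]]
expected = expected_order_statistics(perm_stats)   # [1.25, 0.25, -0.75]
selected = threshold_selection([0.3, 4.0, -0.8], perm_stats, delta=1.0)  # [1]
```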

