Differential Expression, Multiple Hypothesis Testing. STAT115 Spring 2012.

1 Differential Expression, Multiple Hypothesis Testing. STAT115 Spring 2012

2 Outline (Tongji 2009)
Differential gene expression
 – Parametric tests: t and Welch-t test
 – Non-parametric tests: permutation t and Mann-Whitney
Multiple hypothesis testing
 – Family-wise error rate and FDR
 – Affy detection (present/absent calls)

3 Normalized & Summarized Data: 5 Normal and 9 Myeloma (MM) Samples
[Figure: expression matrix; axes: samples, genes]

4 Identify Differentially Expressed Genes
Understand the difference between two conditions/samples
 – Disease pathways
Find disease markers for diagnosis
 – Diagnosis chips
Interested in genes with:
 – Statistical significance: the observed differential expression is unlikely to be due to chance
 – Biological significance: the observed differential expression is large enough to be biologically relevant

5 Classical study of cancer subtypes: Golub et al. (1999), Identification of Diagnostic Genes

6 Identify Differentially Expressed Genes
Fold change
Parametric tests (assume expression values follow a normal distribution)
 – t-test and Welch t-test
Non-parametric tests (no assumption about the expression distribution)
 – Permutation t-test and Mann-Whitney U (Wilcoxon rank sum) test
Non-parametric tests work well only when there are plenty of samples to permute
 – With 3 treatment and 3 control samples, the regular t or Welch-t statistic is the better choice: 3 + 3 samples allow only (6 choose 3) = 20 relabelings, so a permutation p-value can never be smaller than 1/20 = 0.05

7 Fold Change
Naïve method: Avg(X) / Avg(Y)
May not be a good measure of differential expression, especially for less abundant transcripts
Note on scale:
 – Natural scale: MAS4, MAS5, dChip
 – Log scale: RMA (log2 values); exponentiate (2^x) before calculating the fold change
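
A minimal sketch of the naïve fold change on both scales, assuming RMA-style values are log2 (hence 2**v rather than exp). The intensities are hypothetical. This follows the slide's "back-transform first" reading; averaging on the log scale and then exponentiating would instead give a ratio of geometric means.

```python
import math
from statistics import mean


def fold_change_natural(x, y):
    """Naive fold change Avg(X) / Avg(Y) for natural-scale values (e.g. MAS5)."""
    return mean(x) / mean(y)


def fold_change_log2(x_log2, y_log2):
    """Fold change for log2-scale values (e.g. RMA): back-transform with 2**v, then average."""
    return mean(2 ** v for v in x_log2) / mean(2 ** v for v in y_log2)


# Hypothetical intensities for one gene (not from the MM study)
mm = [800.0, 1200.0, 1000.0]
normal = [250.0, 200.0, 300.0]

fc = fold_change_natural(mm, normal)  # 1000 / 250 = 4.0
fc_from_log = fold_change_log2([math.log2(v) for v in mm],
                               [math.log2(v) for v in normal])
```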

8 Two Sample t-test
Statistical significance in the two-sample problem
Group 1: $X_1, X_2, \dots, X_{n_1}$; Group 2: $Y_1, Y_2, \dots, Y_{n_2}$
If $X_i \sim \mathrm{Normal}(\mu_1, \sigma^2)$ and $Y_i \sim \mathrm{Normal}(\mu_2, \sigma^2)$
Null hypothesis: $\mu_1 = \mu_2$
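
Under the equal-variance model above, the usual statistic is $t = (\bar X - \bar Y) / (s_p\sqrt{1/n_1 + 1/n_2})$ with pooled variance $s_p^2$ and $n_1 + n_2 - 2$ degrees of freedom. A small sketch using only the standard library (the p-value lookup against a t table is left out):

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances; returns (t, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    # statistics.variance uses the n - 1 denominator (sample variance)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2


# Hypothetical toy values
t, df = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

If SciPy is available, `scipy.stats.ttest_ind(x, y)` computes the same statistic together with a p-value.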

9 Two Sample t-test
Statistical significance in the two-sample problem
Group 1: $X_1, X_2, \dots, X_{n_1}$; Group 2: $Y_1, Y_2, \dots, Y_{n_2}$
If $X_i \sim \mathrm{Normal}(\mu_1, \sigma_1^2)$ and $Y_i \sim \mathrm{Normal}(\mu_2, \sigma_2^2)$
Null hypothesis: $\mu_1 = \mu_2$
Use the Welch t statistic; check a t table for the p-value
A gene with a small p-value (very large or very small t):
 – Reject the null
 – Significant difference between normal and MM
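
The Welch statistic drops the pooled variance and uses each group's own standard error; its degrees of freedom come from the Welch-Satterthwaite approximation. A standard-library sketch with hypothetical data:

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # squared standard error of mean(x)
    v2 = variance(y) / n2  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical values: the second group is far more variable than the first
t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0])
```

With very unequal variances the degrees of freedom shrink well below $n_1 + n_2 - 2$ (here about 2 instead of 6), which widens the t distribution used for the p-value. With SciPy, `scipy.stats.ttest_ind(x, y, equal_var=False)` performs Welch's test.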

10 Permutation Test
Non-parametric method for p-value calculation
 – Does not assume a normal expression distribution
 – Does not assume the two groups have equal variance
Randomly permute the sample labels and calculate t each time to form the empirical null distribution of t
 – For the MM study, (14 choose 5) = 2002 different t values from permutation
If the observed t is extremely high or low → differential expression with statistical significance
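
For 5 normal vs 9 MM samples the relabelings can be enumerated exactly. The sketch below uses the Welch t as the permuted statistic and a two-sided comparison on |t|; both choices, and the expression values, are assumptions for illustration.

```python
import math
from itertools import combinations
from statistics import mean, variance


def welch_t(x, y):
    """Welch t statistic (no equal-variance assumption)."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))


def exact_permutation_pvalue(x, y, stat=welch_t):
    """Two-sided p-value from every relabeling with len(x) samples in group 1."""
    pooled = list(x) + list(y)
    n = len(pooled)
    t_obs = abs(stat(x, y))
    extreme = total = 0
    for idx in combinations(range(n), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # small tolerance so the observed labeling counts despite float rounding
        if abs(stat(gx, gy)) >= t_obs - 1e-12:
            extreme += 1
    return extreme / total, total


# Hypothetical expression of one gene: 5 normal vs 9 MM samples (not real data)
normal = [10.1, 10.5, 9.8, 10.9, 10.3]
myeloma = [5.0, 5.5, 6.1, 4.9, 5.2, 5.8, 6.0, 5.4, 5.3]
p, n_perm = exact_permutation_pvalue(normal, myeloma)
```

`n_perm` is 2002, matching (14 choose 5); because the observed labeling is one of them, the smallest attainable p-value is 1/2002.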

11 Permutation Technique
[Figure: four panels, each assigning Patients 1–6 to Condition 0 or Condition 1 (three patients per condition), with one statistic computed per panel: $T_0$, $T_1$, $T_2$, $T_3$]
Compute $T_0$ on the original labels and $T_1, T_2, T_3, \dots$ on permuted labels; compare $T_0$ to the set $T^*$
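
When full enumeration is too large, the figure's procedure can be run with random relabelings instead. In this sketch a difference in means is used for brevity (the t statistic drops in unchanged via `stat`); the number of permutations, the seed, and the add-one correction (counting the observed labeling among the permutations) are assumptions.

```python
import random
from statistics import mean


def diff_in_means(x, y):
    return mean(x) - mean(y)


def monte_carlo_permutation_pvalue(x, y, stat=diff_in_means, n_perm=10_000, seed=0):
    """Two-sided p-value from random relabelings of the pooled samples."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n1 = len(x)
    t0 = stat(x, y)  # T0: statistic for the original labels
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # one random relabeling of the patients
        t_star = stat(pooled[:n1], pooled[n1:])  # one member of the T* set
        if abs(t_star) >= abs(t0):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Six hypothetical patients, three per condition as in the figure
p = monte_carlo_permutation_pvalue([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Only 2 of the 20 possible splits are as extreme as the observed one, so the estimate lands near 0.1; this also illustrates why 3 vs 3 samples cannot yield a small permutation p-value.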

12 Wilcoxon Rank Sum Test
Rank all data in a row (gene) and compute the sum of ranks of one group, $T_T$ (treatment) or $T_C$ (control)
Significance is calculated from permutation as well
E.g. 10 normal and 10 cancer:
 – Min(T) = 1 + 2 + … + 10 = 55
 – Max(T) = 11 + 12 + … + 20 = 155
 – Significance of T = 150?
Check a U table (U is a transformation of T: $U = T - n(n+1)/2$ for a group of size n) for statistical significance
Intuition similar to the permutation t-test
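
A sketch of the rank sum, its U transform, and the exact permutation distribution of T, assuming no ties for the exact tail (ties get average ranks in `ranks_with_ties`, but the exact count below is only valid without them). The distribution is obtained by counting size-$n_1$ subsets of the ranks 1..N by their sum.

```python
import math


def ranks_with_ties(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum(x, y):
    """Rank sum T of group x and the corresponding Mann-Whitney U."""
    ranks = ranks_with_ties(list(x) + list(y))
    t = sum(ranks[:len(x)])
    u = t - len(x) * (len(x) + 1) / 2
    return t, u


def exact_upper_tail(n1, n2, t_obs):
    """P(T >= t_obs) under H0 (no ties), with the count of extreme subsets and the total."""
    n = n1 + n2
    max_sum = n * (n + 1) // 2
    # counts[k][s] = number of k-element subsets of the ranks seen so far with sum s
    counts = [[0] * (max_sum + 1) for _ in range(n1 + 1)]
    counts[0][0] = 1
    for r in range(1, n + 1):
        for k in range(min(r, n1), 0, -1):  # descending k: each rank used at most once
            for s in range(max_sum, r - 1, -1):
                counts[k][s] += counts[k - 1][s - r]
    total = sum(counts[n1])
    tail = sum(counts[n1][s] for s in range(math.ceil(t_obs), max_sum + 1))
    return tail / total, tail, total


# The slide's example: 10 normal vs 10 cancer, observed T = 150
p, tail, total = exact_upper_tail(10, 10, 150)
```

Only 19 of the 184,756 possible rank sets reach T ≥ 150, so the one-sided p-value is about 1e-4.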

13 Multiple Hypotheses Testing
We test differential expression for every gene at a p-value cutoff, e.g. 0.01
If there are ~15K genes on the array, potentially 0.01 × 15K = 150 genes are wrongly called (the expected number if no gene is truly differentially expressed)
$H_0$: no differential expression; $H_1$: differential expression
 – Reject $H_0$: call the gene differentially expressed
Should control the family-wise error rate or the false discovery rate
Use Affy's present/absent calls
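
A quick simulation of the 150-false-calls arithmetic, assuming every gene is null and tests are continuous, so each p-value is Uniform(0, 1); independence across genes is also assumed here for simplicity.

```python
import random


def count_false_calls(m=15_000, cutoff=0.01, seed=1):
    """Number of null genes called significant when all m p-values are Uniform(0, 1)."""
    rng = random.Random(seed)
    pvals = [rng.random() for _ in range(m)]
    return sum(p < cutoff for p in pvals)


expected = 15_000 * 0.01  # 150 genes wrongly called on average
observed = count_false_calls()
```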

14 Family-Wise Error Rate
P(at least one false rejection) ≤ α, equivalently P(no false rejection) ≥ 1 − α
Bonferroni correction: to control the family-wise error rate for testing m hypotheses at level α, control the false rejection rate of each individual test at α/m
If α is 0.05, for 15K genes the p-value cutoff is 0.05/15K ≈ 3.33 × 10^-6
Too conservative for differentially expressed gene selection
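
Bonferroni works by the union bound: P(at least one false rejection) ≤ m · (α/m) = α, whatever the dependence between tests. A small sketch of the per-test cutoff and the equivalent adjusted p-values:

```python
def bonferroni_cutoff(alpha, m):
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / m


def bonferroni_adjust(pvals):
    """Bonferroni-adjusted p-values: multiply by m and cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


cutoff = bonferroni_cutoff(0.05, 15_000)  # about 3.33e-6, as on the slide
```

Comparing an adjusted p-value to α is the same decision as comparing the raw p-value to α/m.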

15 False Discovery Rate
                               Not called (# not rejected)   Called (# rejected)   Total
$H_0$: two groups similar      U                             V                     $m_0$
$H_1$: two groups different    T                             S                     $m_1$
Total                          m − R                         R                     m
V: type I errors (false positives); T: type II errors (false negatives)
FDR = V / R, i.e. false positives / all called; formally FDR = E[V/R], with V/R defined as 0 when R = 0
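
In a simulation, where the truth is known, the table's cells can be tallied directly. This sketch counts U, V, T, S from hypothetical truth labels and calls and returns the realized proportion V/R (whose expectation is the FDR):

```python
def tally(is_null, called):
    """Count U, V, T, S from the table, given truth labels and calls."""
    U = V = T = S = 0
    for null, call in zip(is_null, called):
        if null and not call:
            U += 1
        elif null and call:
            V += 1  # false positive (type I error)
        elif not null and not call:
            T += 1  # false negative (type II error)
        else:
            S += 1
    return U, V, T, S


def false_discovery_proportion(V, R):
    """V / R, taken as 0 when nothing is called."""
    return V / R if R > 0 else 0.0


# Hypothetical example: 3 null genes, 2 truly different genes
U, V, T, S = tally([True, True, True, False, False], [False, True, False, True, False])
fdp = false_discovery_proportion(V, V + S)
```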

16 False Discovery Rate
Less conservative than the family-wise error rate
Benjamini and Hochberg (1995) method for FDR control, e.g. $\mathrm{FDR} \le \alpha^*$:
 – Rank all m genes by p-value
 – Draw the line $y = x\,\alpha^*/m$, $x = 1, \dots, m$
 – Find the largest rank k whose p-value lies below the line, and call genes 1…k
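
A sketch of the Benjamini-Hochberg step-up rule (valid under independent tests, as in the 1995 paper). Note that every gene up to the last crossing is called, even one whose own p-value sits above the line; the p-values are hypothetical.

```python
def benjamini_hochberg(pvals, alpha):
    """Indices of genes called at FDR level alpha by the BH step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # rank genes by p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:  # point on or below the line y = x * alpha / m
            k_max = rank  # keep the LAST crossing, not the first
    return sorted(order[:k_max])


# Hypothetical p-values: the gene with p = 0.03 is above the line but still called
called = benjamini_hochberg([0.2, 0.035, 0.01, 0.03], alpha=0.05)
```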

17 FDR Threshold
[Figure: p-values of genes ranked by p-value, with the line $y = x\,\alpha^*/m$]

18 SAM for FDR Control
Significance Analysis of Microarrays (SAM), Tusher et al., PNAS 2001
 – With a small number of samples, some genes have a small σ, and hence a very large t, by chance
 – SAM: modified statistic t*, which increases σ using the σ of the other genes on the array (i.e. adds the lowest 5th percentile of σ)
 – Then proceeds with regular FDR estimation
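
A sketch of a SAM-style statistic $d = (\bar x - \bar y)/(s + s_0)$, taking the fudge factor $s_0$ as the 5th percentile of the per-gene standard errors as the slide describes. The nearest-rank percentile is an assumption, and SAM as published may choose $s_0$ differently; the two genes are hypothetical.

```python
import math
from statistics import mean, variance


def sam_scores(group1, group2, percentile=5):
    """group1, group2: per-gene lists of expression values (same gene order).

    Returns SAM-style scores d = (mean1 - mean2) / (s + s0) and the fudge factor s0.
    """
    s_vals, diffs = [], []
    for x, y in zip(group1, group2):
        n1, n2 = len(x), len(y)
        sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
        s_vals.append(math.sqrt(sp2 * (1 / n1 + 1 / n2)))  # standard error of the difference
        diffs.append(mean(x) - mean(y))
    ranked = sorted(s_vals)
    # nearest-rank percentile of s across all genes (an assumption, see lead-in)
    idx = max(0, math.ceil(percentile / 100 * len(ranked)) - 1)
    s0 = ranked[idx]
    return [d / (s + s0) for d, s in zip(diffs, s_vals)], s0


# Gene 0: tiny difference but tiny s (plain t about 4.2); gene 1: large difference (t about 3.7)
scores, s0 = sam_scores([[5.00, 5.01, 5.00], [8.0, 9.0, 10.0]],
                        [[4.98, 4.99, 4.98], [5.0, 6.0, 7.0]])
```

Adding $s_0$ demotes gene 0, whose large t came mostly from an accidentally small σ, below gene 1.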

19 Q-value
Storey & Tibshirani, PNAS, 2003
Empirically derived q-value
Every p-value has a corresponding q-value: the smallest FDR at which that gene would be called
FDR's academic vs practical values
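
A simplified q-value sketch. It estimates the fraction of true nulls $\pi_0$ from p-values above a single fixed λ = 0.5 (an assumption; Storey & Tibshirani choose it more carefully by smoothing over a range of λ), then takes running minima so that q-values are monotone in p. With $\pi_0 = 1$ the result equals BH-adjusted p-values.

```python
def qvalues(pvals, lam=0.5):
    """Simplified q-values with a fixed-lambda estimate of pi0."""
    m = len(pvals)
    # fraction of p-values above lam, rescaled; capped at 1
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest p first
    q = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # rank of pvals[i] in ascending order
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q, pi0


# Hypothetical p-values for six genes
q, pi0 = qvalues([0.01, 0.02, 0.03, 0.9, 0.8, 0.7])
```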

20 Affymetrix Detection
MAS 5.0 makes an absent/marginal/present call for each probe set
Define R = (PM − MM)/(PM + MM)
 – R near 1 means PM ≫ MM: abundant transcript
 – R near or below 0 means PM ≤ MM
R must exceed a cutoff (τ) for the probe set to be considered present
[Figure: PM and MM intensities for a Present (P) and an Absent (A) probe set]
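
The discrimination score for individual probe pairs, compared against τ = 0.015 (the MAS5 default). The PM/MM intensities are hypothetical; the second pair shows that PM slightly above MM can still fall below τ.

```python
def discrimination_score(pm, mm):
    """R = (PM - MM) / (PM + MM) for one probe pair."""
    return (pm - mm) / (pm + mm)


TAU = 0.015  # MAS5 default threshold

# Hypothetical (PM, MM) intensities for three probe pairs
pairs = [(1200.0, 300.0), (400.0, 390.0), (250.0, 400.0)]
scores = [discrimination_score(pm, mm) for pm, mm in pairs]  # about 0.6, 0.013, -0.23
above = [r > TAU for r in scores]
```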

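21 Affymetrix Detection
τ (default 0.015) is set empirically by Affymetrix
Detection p-value from the Wilcoxon signed rank test
 – Compute R − τ = (PM − MM)/(PM + MM) − τ for each probe pair and rank the pairs by |R − τ|
 – T+ (sum of ranks with R − τ > 0): 25; T− (sum of ranks with R − τ < 0): −20; n = 9
 – Check T+ against the Wilcoxon table for n = 9 to get the p-value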
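
A sketch of the signed rank step with an exact one-sided p-value P(T+ ≥ observed), assuming no ties in |R − τ| and dropping zero differences. MAS5's own implementation may differ in details; the R values are hypothetical and chosen to reproduce the slide's T+ = 25 and T− = 20 (reported here as a positive magnitude) for n = 9.

```python
def signed_rank_stats(scores, tau=0.015):
    """T+ and T- (as a positive magnitude) for the signed rank test on R - tau."""
    diffs = [r - tau for r in scores if r != tau]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    t_plus = t_minus = 0
    for rank, i in enumerate(order, start=1):
        if diffs[i] > 0:
            t_plus += rank
        else:
            t_minus += rank
    return t_plus, t_minus, len(diffs)


def signed_rank_upper_tail(n, t_plus):
    """Exact P(T+ >= t_plus) under H0: each rank is positive or negative with prob. 1/2."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # counts[s] = number of sign patterns with T+ == s
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[t_plus:]) / 2 ** n


# Hypothetical R values for 9 probe pairs, built so that T+ = 25 and T- = 20
signed_d = [-0.01, -0.02, 0.03, -0.04, 0.05, -0.06, -0.07, 0.08, 0.09]
scores = [d + 0.015 for d in signed_d]
t_plus, t_minus, n = signed_rank_stats(scores)
p = signed_rank_upper_tail(n, t_plus)
```

T+ = 25 is only slightly above the null center of 22.5, so this hypothetical probe set would get a large detection p-value.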

22 Affymetrix Detection
α1 and α2 are user-defined values but have optimized defaults in MAS5
Since the expression index for low-abundance transcripts is unreliable, it is better to look for differentially expressed genes only among present-call genes
Increasing τ can reduce the FDR, but true present calls could be lost
[Figure: detection p-value of a probe set, cut at α1 (default 0.04) and α2 (default 0.06): Present below α1, Marginal between α1 and α2, Absent above α2]
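
Turning a detection p-value into a P/M/A call with the default α1 = 0.04 and α2 = 0.06, then keeping only present genes before differential expression testing. The boundary handling (strict "<" at both cutoffs) and the gene names are assumptions for illustration.

```python
def detection_call(p, alpha1=0.04, alpha2=0.06):
    """Present if p < alpha1, Marginal if alpha1 <= p < alpha2, Absent otherwise."""
    if p < alpha1:
        return "P"
    if p < alpha2:
        return "M"
    return "A"


# Hypothetical detection p-values per gene
detection_p = {"gene_a": 0.001, "gene_b": 0.05, "gene_c": 0.3}
present = [g for g, p in detection_p.items() if detection_call(p) == "P"]
```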

23 Outline
Differential gene expression
 – Parametric tests: t and Welch-t test
 – Non-parametric tests: permutation t and Mann-Whitney
Multiple hypothesis testing
 – Family-wise error rate and FDR
 – Find differentially expressed genes only among Affy present calls

24 Acknowledgment: Kevin Coombes & Keith Baggerly, Mark Craven, Georg Gerber, Gabriel Eichler, Ying Xie, Terry Speed & Group, Larry Hunter, Wing Wong & Cheng Li, Mark Reimers, Jenia Semyonov

