Microarray Data Analysis Xuming He Department of Statistics University of Illinois at Urbana-Champaign.


1 Microarray Data Analysis Xuming He Department of Statistics University of Illinois at Urbana-Champaign

2 Websites  R Software: http://cran.r-project.org/  MAANOVA package: http://www.jax.org/staff/churchill/labsite/  SMA package: http://stat-www.berkeley.edu/users/terry/zarray

3 Data Normalization  Global lowess normalization (within slide): often useful.  Median Normalization (across the slides)

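4 Global lowess Normalization  Assume the changes are roughly symmetric for most genes.  $\log_2(R/G) \to \log_2(R/G) - c(A) = \log_2\big(R/(k(A)\,G)\big)$, where $c(A)$ is the lowess fit to the M vs A plot, with $M = \log_2(R/G)$, $A = \tfrac{1}{2}\log_2(RG)$ and $k(A) = 2^{c(A)}$.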
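The within-slide correction can be sketched in a few lines of Python. This is a simplified stand-in for lowess, not the routine used by the packages named in these slides: it uses a tricube-weighted local linear fit with no robustness iterations, and the function names and the `frac` span parameter are illustrative assumptions.

```python
import math

def ma_values(red, green):
    """Return (M, A) with M = log2(R/G) and A = 0.5 * log2(R*G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def local_linear_fit(a, m, frac=0.5):
    """Tricube-weighted local linear fit of m on a, evaluated at each a.

    Simplified lowess: no robustness iterations (an assumption of this sketch).
    """
    n = len(a)
    k = min(n, max(2, math.ceil(frac * n)))  # points per neighbourhood
    fitted = []
    for a0 in a:
        dist = [abs(x - a0) for x in a]
        h = sorted(dist)[k - 1] or 1e-12  # bandwidth: k-th smallest distance
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        sw = sum(w)  # > 0 because the point itself has weight 1
        xbar = sum(wi * x for wi, x in zip(w, a)) / sw
        ybar = sum(wi * y for wi, y in zip(w, m)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, a))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (a0 - xbar))
    return fitted

def normalize_within_slide(red, green, frac=0.5):
    """Return M - c(A), where c(A) is the smooth fit to the M vs A plot."""
    m, a = ma_values(red, green)
    c = local_linear_fit(a, m, frac)
    return [mi - ci for mi, ci in zip(m, c)]
```

If the red channel is uniformly twice the green channel, every M equals 1, the fitted curve is flat at 1, and the normalized log ratios come out at 0, which is the behaviour the symmetry assumption asks for.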

5 Median Normalization (Optional)  Normalize the median log ratio of each gene across all slides to 0.  Formula of normalized data: $M^{*}_{gi} = M_{gi} - \operatorname{median}_{i'}(M_{gi'})$, where $M_{gi}$ is the log ratio of gene g from slide i.
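As a minimal sketch of this step, assuming the log ratios are held as one list per gene with one entry per slide (the layout and the function name are illustrative, not taken from any of the packages discussed):

```python
from statistics import median

def median_normalize(log_ratios):
    """Centre each gene's log ratios so their median across slides is 0.

    log_ratios[g][i] is the log ratio of gene g on slide i.
    """
    normalized = []
    for row in log_ratios:
        med = median(row)
        normalized.append([x - med for x in row])
    return normalized
```

For example, a gene with log ratios 1.0, 2.0 and 3.0 on three slides becomes -1.0, 0.0 and 1.0.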

6 Transformations  Shift-log transformation (Kerr et al. 2002)  Curve Fitting Transformation (Yang et al. 2002 b)  Variance Stabilizing Transformation: Linlog Transformation

7 Shift-log transformation (Newton et al. 2001)  Move the origin along the diagonal line $R = G$ by adding the same positive constant to both channels: $\log_2\big((R_k + C)/(G_k + C)\big)$, where k indicates genes.  The main effect is to shrink the variance of the log ratios at the low intensity end.  By expanding the range of C to include negative values, we can also increase the variance at the low intensity end.
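The effect of the shift constant is easy to see numerically. The sketch below assumes the shifted log ratio has the form log2((R + C)/(G + C)); the function name and the example intensities are hypothetical.

```python
import math

def shift_log_ratio(red, green, c):
    """Log ratio after adding the same constant c to both channels."""
    if red + c <= 0 or green + c <= 0:
        raise ValueError("shifted intensities must be positive")
    return math.log2((red + c) / (green + c))

# A two-fold difference at low and at high intensity (made-up values).
low = shift_log_ratio(20.0, 10.0, 50.0)         # pulled strongly towards 0
high = shift_log_ratio(20000.0, 10000.0, 50.0)  # barely changed from 1
```

With C = 50 the low-intensity spot moves from a log ratio of 1 to well under 0.5, while the high-intensity spot stays close to 1. A negative C has the opposite effect and stretches the low-intensity log ratios.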

8 Curve Fitting Transformation (Yang et al. 2002 b)  Add a spot-specific constant to the signal values of one channel and subtract the same constant from the signals in the other channel prior to the log transformation.  $C_k$ is the spot-specific constant, determined by the local regression line.

9 Linlog Transformation  Assume that additive error is dominant at low intensity and multiplicative error is dominant at high intensity, so the transform is linear below a threshold $d_i$ and logarithmic above it, where i indicates channels (g/r).  In practice, d is usually estimated by the 25% quantile of the intensities.  The Linlog does not correct the curvature in MA plots, so it can be combined with either shift-log or lowess.
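One way to write such a transform is sketched below. The exact functional form is an assumption of this example: it is linear below d and equal to the natural log above d, joined so that the value and the slope match at d. The helper names are hypothetical, and d is taken as the first quartile, following the 25% quantile rule of thumb above.

```python
import math
from statistics import quantiles

def linlog(z, d):
    """Linear below the threshold d, natural log at or above it.

    Assumed form: log(d) - 1 + z/d for z < d, which meets log(z)
    at z = d with the same slope (1/d).
    """
    if z < d:
        return math.log(d) - 1.0 + z / d
    return math.log(z)

def estimate_d(intensities):
    """Estimate the threshold d as the 25% quantile of one channel."""
    return quantiles(intensities, n=4)[0]
```

Because the linear branch is defined for any z, intensities at or below zero still get a finite transformed value, which a plain log transform cannot provide.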

10 Normalization Comparison (columns: GeneSpring | MAANOVA | SMA)
Data Input: Raw data | Adjusted intensities | Raw data
Negative Measurements: Set to 0 or 0.01 | Cannot handle | Missing
Dye Swap: Yes | No
Intensity Dependent Normalization: Yes
Normalize to a Percentile (within): Yes | No | Median
Normalize to Positive Control Genes: Yes | No
Normalize to a Constant Value: Yes | No

11 Normalization Comparison (continued) (columns: GeneSpring | MAANOVA | SMA)
Divide by Specific Samples: Yes | No
Normalize to median: Yes | No
Median Polishing: Yes | No
Print-tip Group Lowess: No | Sort of | Yes
Scaled Print-tip Group Lowess: No | Yes
Shift Transformation: No | Yes | No
Linear-log Transformation: No | Yes | No
Linear-log Shift Transformation: No | Yes | No

12 Statistical Analysis  One-sample t-Test  Two-sample t-Test  Nonparametric test (Wilcoxon-Mann-Whitney test)  Global Error Model  Multiple Group comparisons (ANOVA)

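13 One sample t-test  $H_0$: log ratio = 0 versus $H_A$: log ratio is not 0.  Reject $H_0$ if $|t| > t_{\alpha/2,\,n-1}$, where $t = \bar{M}/(s/\sqrt{n})$ is computed from the n replicate log ratios (mean $\bar{M}$, standard deviation $s$) and $\alpha$ is the significance level.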
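A minimal sketch of the one-sample test for a single gene, using only the Python standard library. Since the standard library has no t distribution, the critical value is passed in by the caller (for example 2.776 for a two-sided test at the 5% level with 4 degrees of freedom); the function names and the data are hypothetical.

```python
import math
from statistics import mean, stdev

def one_sample_t(log_ratios):
    """Return (t, degrees of freedom) for H0: mean log ratio = 0."""
    n = len(log_ratios)
    t = mean(log_ratios) / (stdev(log_ratios) / math.sqrt(n))
    return t, n - 1

def reject_h0(log_ratios, critical_value):
    """Two-sided decision: reject when |t| exceeds the critical value."""
    t, _ = one_sample_t(log_ratios)
    return abs(t) > critical_value

# Five replicate log ratios for one gene (made-up values).
t, df = one_sample_t([0.5, 0.7, 0.6, 0.8, 0.4])
```

Here the mean is 0.6 and the standard error is about 0.071, so t is about 8.49 on 4 degrees of freedom and the null hypothesis is rejected at the 5% level.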

14 Two sample t-test (equal variance)  H 0 : log ratios of two groups are equal  H A : log ratios of two groups are not equal  Exact p-value for normal data, even for small samples.

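15 Two sample t-test (unequal variance)  T statistic: $T = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$  Approximate with a t distribution whose degrees of freedom are estimated from the data (Welch-Satterthwaite)  The p-value is not exact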
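The two versions of the two-sample test differ only in how the standard error and the degrees of freedom are formed. The sketch below computes both statistics; the unequal-variance version uses the Welch-Satterthwaite degrees of freedom, which is the usual choice but is an assumption about what the slide intended. Function names are illustrative.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(x, y):
    """Unequal-variance t statistic with Welch-Satterthwaite df."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ: for groups with variances 2.5 and 10 and five replicates each, the pooled test uses 8 degrees of freedom while the Welch approximation uses about 5.9.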

16 Nonparametric test (Wilcoxon-Mann-Whitney test)  Uses ranks of the data  Suitable when the number of replicates in each group is more than 5  Works for non-normal data  Alternative: Permutation test
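One likely reason for the replicate requirement is that a rank test has only finitely many possible p-values. The sketch below computes an exact two-sided rank-sum p-value by enumerating every way of assigning the pooled ranks to the first group, which is also a small instance of a permutation test. It is an illustration with hypothetical function names, practical only for small groups.

```python
from itertools import combinations

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_exact_p(x, y):
    """Exact two-sided p-value for the rank sum of group x."""
    ranks = average_ranks(list(x) + list(y))
    n1, n = len(x), len(x) + len(y)
    expected = n1 * (n + 1) / 2
    observed = abs(sum(ranks[:n1]) - expected)
    count = total = 0
    for combo in combinations(ranks, n1):
        total += 1
        if abs(sum(combo) - expected) >= observed - 1e-12:
            count += 1
    return count / total
```

With three replicates per group there are only 20 possible assignments, so even perfectly separated groups give a p-value of 2/20 = 0.1. With six per group the smallest attainable p-value drops to 2/924, about 0.002.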

17 Global Error Model  Used when there are few or no replicates  Some assumption about the variance is made

18 Multiple Group Tests (One way ANOVA)  Parametric test, assuming equal variance  Parametric test, not assuming equal variance  Nonparametric test (Kruskal-Wallis test)
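For the equal-variance parametric case, the F statistic for one gene can be sketched as follows. The function name and data layout (one list of log ratios per group) are assumptions of this example, and turning F into a p-value would need an F distribution, which the Python standard library does not provide.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way layout."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (gm - grand) ** 2
                     for g, gm in zip(groups, group_means))
    ss_within = sum((v - gm) ** 2
                    for g, gm in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For three groups 1, 2, 3 / 2, 3, 4 / 3, 4, 5 the between-group mean square is 3 and the within-group mean square is 1, giving F = 3 on (2, 6) degrees of freedom.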

19 Multiple Group Tests (Multi-way ANOVA)  Control for several factors  Example: a linear model for $y_{gijk}$, the log intensity of gene g on array i for dye j and condition k  No need to do certain normalizations  Equal variance is assumed
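A model of this kind might look like the following, written in the style of the microarray ANOVA models of Kerr and Churchill. The particular set of terms is an assumption of this sketch; only the indexing (gene g, array i, dye j, condition k) is taken from the slide.

```latex
y_{gijk} = \mu + A_i + D_j + V_k + G_g
         + (AG)_{gi} + (DG)_{gj} + (VG)_{gk} + \epsilon_{gijk}
```

Here $A_i$, $D_j$, $V_k$ and $G_g$ are overall array, dye, condition and gene effects, and the interaction $(VG)_{gk}$ is the quantity of interest: the condition-specific expression of gene g. Because array and dye effects are estimated inside the model, the global adjustments they represent do not need a separate normalization step, and the single error term $\epsilon_{gijk}$ is where the equal-variance assumption enters.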

20 Test Comparison (columns: GeneSpring | MAANOVA | SMA)
One sample t-test: Yes | No | Indirect
Two sample t-test (equal variance): Yes
Two sample t-test (unequal variance): Yes | No | Yes
Nonparametric test: Yes | No
Global Error Model: Yes | No
One way ANOVA: Yes | No
Multiple group test, multi-way ANOVA: No | Yes | No

21 Test Comparison (continued) (columns: GeneSpring | MAANOVA | SMA)
Residual Plot: No | Yes | No
Normalization: Need | Not need | Need
Random effect: No | Yes | No
Permutation: No | Yes | No
Multiple test adjustment: Bonferroni, FDR, Step-Down, Westfall and Young permutation | FDR | Use package: multtest
Average within-slide replicates: Automatic | Optional | No

22 Multiple Test Adjustment  Bonferroni: adjusted p-value = p-value × N, where N is the number of tests (capped at 1)  Step-down (Holm): controls the family-wise Type I error rate (FWER)  Westfall and Young permutation: controls the FWER by permutation; more time consuming  FDR: controls the false discovery rate (FDR), the proportion of genes expected to be “significant” by chance among the genes identified as significant.
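Three of these adjustments are short enough to sketch directly. The Bonferroni and Holm versions follow their standard definitions; for FDR the sketch uses the Benjamini-Hochberg procedure, which is a common choice but an assumption here, since the slide does not name a specific method. The Westfall and Young procedure needs the raw data for permutation and is not shown.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]

def holm(pvalues):
    """Holm step-down adjusted p-values (controls the FWER)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by n, the next by n - 1, ...
        running_max = max(running_max, (n - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the FDR)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = n - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

For the four p-values 0.01, 0.04, 0.03 and 0.005, Bonferroni gives 0.04, 0.16, 0.12, 0.02, Holm gives 0.03, 0.06, 0.06, 0.02, and Benjamini-Hochberg gives 0.02, 0.04, 0.04, 0.02, which shows the ordering from most to least conservative.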

23 References  Y. H. Yang, S. Dudoit et al. (2002). Normalization for cDNA Microarray Data. Nucleic Acids Research, Vol. 30, No. 4, e15.  Cui, Kerr and Churchill (2002). Data Transformation for cDNA Microarray Data. Submitted; manuscript available at www.jax.org/research/churchill.

24 R  How to get help? help(t.test) ?t.test help->R language (html)  Why R? Free, convenient, flexible

25 Data and Software  Cattle experiment Two tissues: liver and spleen Dye swap Two replicates within slide  Software R: MAANOVA R: SMA GeneSpring

26 MAANOVA--Data Format  Combine the intensities of all arrays  One example of an intensity file:
Columns: metarow | metacol | row | col | ID | R1 | G1 | flag1 | R2 | …
Column groups: metarow, metacol, row, col = grid info; R1, G1 = intensities of array 1; flag1 = flag for array 1; R2, … = intensities of array 2
Example rows: 1299AW28912224010320…  3399AW28912413030422…  …

27 MAANOVA--Data Format (continued)  Design (Parameter) File For example: SampleID Condition 1 1 0 2 0 1 1 2 2 1 0 2 0 1 2 2

28 MAANOVA--Output  Residual plot  F-values, p-values, permutation p-values, adjusted p-values  IDs of differentially expressed genes  Volcano plot

29 SMA--Data Format  Import the intensities of each array separately  One example of an intensity file:
ID | F532 | B532 | F635 | B635
H3001A01 | 87 | 52 | 152 | 97
H3001A01 | 85 | 51 | 147 | 97
H3001A03 | 285 | 52 | 263 | 91
… | … | … | … | …
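A file of this shape is typically turned into one log ratio per spot. The sketch below assumes the usual scanner convention that F and B are foreground and background readings and that 532 and 635 are the green and red channels; background subtraction and the function name are assumptions of the example, not something these slides specify. Spots whose corrected signal is not positive are returned as missing.

```python
import math

def log_ratio(f532, b532, f635, b635):
    """log2(red/green) after background subtraction, or None if undefined."""
    green = f532 - b532  # 532 nm channel
    red = f635 - b635    # 635 nm channel
    if green <= 0 or red <= 0:
        return None      # treat non-positive signal as missing
    return math.log2(red / green)

# Hypothetical spot: green signal 100, red signal 400.
m = log_ratio(150, 50, 450, 50)
```

The hypothetical spot has a four-fold red excess, so its log ratio is 2. A spot whose foreground falls below its background in either channel gets `None` instead of a number.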

30 SMA--Output  t-values, p-values, adjusted p-values  IDs of differentially expressed genes * SMA does not provide p-values directly

