Microarray Data Analysis Xuming He Department of Statistics University of Illinois at Urbana-Champaign.

Websites  R Software:  MAANOVA package:  SMA package:

Data Normalization  Global lowess normalization (within slide): often useful.  Median Normalization (across the slides)

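Global Lowess Normalization
- Assume that, for most genes, the changes are roughly symmetric (up- and down-regulation roughly balance). Under this assumption, a trend in the M-versus-A plot reflects dye bias rather than biology.
- $\log_2(R/G) \;\to\; \log_2(R/G) - c(A) = \log_2\big(R / (k(A)\,G)\big)$, where $c(A)$ is the lowess fit to the M-versus-A plot and $k(A) = 2^{c(A)}$.
- Here $M = \log_2(R/G)$ and $A = \tfrac{1}{2}\log_2(RG)$.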
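The minimal Python sketch below (standard library only) makes the M-versus-A correction concrete. It simplifies lowess on purpose and is not the smoother that SMA or R's `lowess` uses. At each A value it fits a tricube-weighted local line, and it skips lowess's robustness iterations. The function names and the span `frac=0.4` are illustrative choices.

```python
import math


def lowess_fit(x, y, frac=0.4):
    """Simplified lowess: tricube-weighted local linear fit evaluated at every x.

    Unlike full lowess, this sketch has no robustness (reweighting) iterations.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each local neighbourhood
    fitted = []
    for xi in x:
        # Bandwidth = distance from xi to its k-th nearest point (xi itself counts).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def lowess_normalize(R, G, frac=0.4):
    """Return lowess-normalized log ratios M - c(A) for one slide."""
    M = [math.log2(r / g) for r, g in zip(R, G)]
    A = [0.5 * math.log2(r * g) for r, g in zip(R, G)]
    c = lowess_fit(A, M, frac)
    return [m - ci for m, ci in zip(M, c)]


# Hypothetical slide with an intensity-dependent dye bias (R = G ** 1.1).
G = [2.0 ** k for k in range(1, 21)]
R = [g ** 1.1 for g in G]
print(max(abs(v) for v in lowess_normalize(R, G)))  # close to 0
```

For real slides you would call R's `lowess()` or `loess()`, or `lowess` from `statsmodels.nonparametric.smoothers_lowess`. SMA additionally fits separate curves for each print-tip group.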

Median Normalization (Optional)  Normalize the median log ratios of each gene across all slides to 0.  Formula of normalized data: where is the log ratio of gene g from slide i.

Transformations  Shift-log transformation (Kerr et al. 2002)  Curve Fitting Transformation (Yang et al b)  Variance Stabilizing Transformation: Linlog Transformation

Shift-log Transformation (Newton et al. 2001)
- Move the origin along the diagonal line R = G by adding the same positive constant C to both channels: $(R_k, G_k) \to (R_k + C,\, G_k + C)$, so that $M_k = \log_2\frac{R_k + C}{G_k + C}$, where k indexes genes.
- The main effect is to shrink the variance of the log ratios at the low-intensity end.
- By extending the range of C to include negative values, we can instead increase the variance at the low-intensity end.
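The variance-shrinking effect can be checked numerically. This hypothetical example adds C to both channels of a few made-up low-intensity spots. It then compares the variance of the log ratios for C = 0 and C = 50. The values of R, G and C are invented for illustration.

```python
import math
import statistics


def shift_log_ratios(R, G, C):
    """Shift-log ratios log2((R_k + C) / (G_k + C)) for every spot k."""
    if any(r + C <= 0 or g + C <= 0 for r, g in zip(R, G)):
        raise ValueError("C must keep both shifted channels positive")
    return [math.log2((r + C) / (g + C)) for r, g in zip(R, G)]


# Made-up low-intensity spots whose raw ratios are noisy.
R = [10.0, 20.0, 5.0, 40.0]
G = [20.0, 10.0, 10.0, 20.0]
for C in (0.0, 50.0):
    print(C, statistics.pvariance(shift_log_ratios(R, G, C)))
```

At high intensity the same C barely matters: log2(10050/20050) is about -0.996, versus -1 without the shift. This is why the effect concentrates at the low-intensity end.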

Curve-Fitting Transformation (Yang et al b)
- Before the log transformation, add one spot-specific constant to the signal of one channel and subtract the same constant from the signal of the other channel: $M_k = \log_2\frac{R_k + C_k}{G_k - C_k}$.
- $C_k$ is the spot-specific constant determined by the local regression line.
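The slide does not say exactly how $C_k$ is derived from the regression line. One reading, stated here as an assumption, is that $C_k$ is chosen so the transformed log ratio equals the lowess-corrected one, $M_k - c(A_k)$. Solving $(R_k + C_k)/(G_k - C_k) = 2^{M_k - c(A_k)}$ then gives a closed form. The function name below is hypothetical.

```python
import math


def curve_fit_constant(r, g, c_a):
    """Spot constant C with log2((r + C) / (g - C)) == log2(r / g) - c_a.

    ASSUMPTION: C is chosen to reproduce the lowess-corrected log ratio;
    c_a is the lowess fit c(A) evaluated at this spot.
    """
    target = (r / g) * 2.0 ** (-c_a)  # desired ratio (r + C) / (g - C)
    # Solve r + C = target * (g - C) for C.
    return (target * g - r) / (1.0 + target)


r, g, c_a = 200.0, 100.0, 1.0
C = curve_fit_constant(r, g, c_a)
print(C, math.log2((r + C) / (g - C)))  # -50.0 0.0
```

Under this reading, the sum R + G is unchanged. For positive intensities both shifted channels stay positive, since $R + C = 2^{m}(R+G)/(1+2^{m})$ and $G - C = (R+G)/(1+2^{m})$ with $m = M - c(A)$.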

Linlog Transformation  Assume that additive error should be dominant at low intensity and multiplicative error should be dominant at high intensity,, i indicates channels (g/r).  In practice, people usually estimate d by the 25% quantile of the intensities.  The Linlog does not correct the curvatures in MA plots. We can combine the Linlog with either shift-log or lowess.

Normalization Comparison (GeneSpring / MAANOVA / SMA)
- Data input: raw data / adjusted intensities / raw data
- Negative measurements: set to 0 or 0.01 / cannot handle / missing
- Dye swap: Yes / No
- Intensity-dependent normalization: Yes
- Normalize to a percentile (within): Yes / No / median
- Normalize to positive control genes: Yes / No
- Normalize to a constant value: Yes / No

Normalization Comparison, continued (GeneSpring / MAANOVA / SMA)
- Divide by specific samples: Yes / No
- Normalize to median: Yes / No
- Median polishing: Yes / No
- Print-tip group lowess: No / sort of / Yes
- Scaled print-tip group lowess: No / Yes
- Shift transformation: No / Yes / No
- Linear-log transformation: No / Yes / No
- Linear-log shift transformation: No / Yes / No

Statistical Analysis  One-sample t-Test  Two-sample t-Test  Nonparametric test (Wilcoxon-Mann-Whiteney test)  Global Error Model  Multiple Group comparisons (ANOVA)

One sample t-test  H 0: log ratio =0 versus H A: log ratio is not 0.  Reject H 0 if, where is the significance level.

Two sample t-test (equal variance)  H 0 : log ratios of two groups are equal  H A : log ratios of two groups are not equal  Exact p-value for normal data, even for small samples.

Two sample t-test (unequal variance)  T statistic:  Approximate with  Not exact p-value

Nonparametric test (Wilcoxon-Mann-Whiteney test)  Use ranks of data  When the number of replicates in each group is more than 5  Works for non-normal data  Alternative: Permutation test

Global Error Model  When there is no or few replicates  Some assumption on the variance is made

Multiple Group Tests (One-Way ANOVA)
- Parametric test, assuming equal variance
- Parametric test, not assuming equal variance
- Nonparametric test (Kruskal-Wallis test)
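Below is a minimal sketch of the equal-variance one-way ANOVA F statistic for one gene measured under several conditions. The data are made up. In R, `oneway.test` provides the version that does not assume equal variance, and `kruskal.test` provides the nonparametric Kruskal-Wallis test.

```python
import statistics


def one_way_anova(groups):
    """Equal-variance one-way ANOVA: returns (F, df_between, df_within)."""
    all_values = [v for grp in groups for v in grp]
    grand = statistics.fmean(all_values)
    means = [statistics.fmean(grp) for grp in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(grp) * (m - grand) ** 2 for grp, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for grp, m in zip(groups, means) for v in grp)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))  # (27.0, 2, 6)
```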

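Multiple Group Tests (Multi-Way ANOVA)
- Controls for several factors at once.
- Example: model $y_{gijk}$, the log intensity of gene g on array i for dye j and condition k, with array, dye and condition effects.
- Some normalizations become unnecessary because array and dye effects are estimated within the model.
- Equal variance is assumed.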

Test Comparison (GeneSpring / MAANOVA / SMA)
- One-sample t-test: Yes / No / indirect
- Two-sample t-test (equal variance): Yes
- Two-sample t-test (unequal variance): Yes / No / Yes
- Nonparametric test: Yes / No
- Global error model: Yes / No
- One-way ANOVA: Yes / No
- Multiple group test (multi-way ANOVA): No / Yes / No

Test Comparison, continued (GeneSpring / MAANOVA / SMA)
- Residual plot: No / Yes / No
- Normalization: needed / not needed / needed
- Random effects: No / Yes / No
- Permutation: No / Yes / No
- Multiple test adjustment: Bonferroni, FDR, step-down, Westfall and Young permutation / FDR / use the multtest package
- Average within-slide replicates: automatic / optional / No

Multiple Test Adjustment  Bonferroni: Adjusted pvalue=p-value*N  Step-down (Holm): controls the family-wise Type I error rate (FWER)  Westfall and Young permutation: controls the FWER with permutation. More time consuming.  FDR : controls the false discovery rate (FDR--- proportion of genes expected to be “significant” by chance relative to the proportion of identified genes.

References  Y. H. Yang, S. Dudoit et al.(2002) Normalization for cDNA Microarray Data. Nucleic Acids Research, Vol. 30, No. 4, e15.  Cui, Kerr and Churchill (2002), Data Transformation for cDNA Microarray Data. Submitted, find manuscript in Transformation for cDNA Microarray Data

R  How to get help? help(t.test) ?t.test help->R language (html)  Why R? Free, convenient, flexible

Data and Software  Cattle experiment Two tissues: liver and spleen Dye swap Two replicates within slide  Software R: MAANOVA R: SMA GeneSpring

MAANOVA--Data Format  Combine the intensities of all arrays  One example of intensity file: metarowmetacolrowcolIDR1G1flag1R2… 1299AW … 3399AW … ………………………… Grid infoIntensities of array1 Intensities of array2 flag for array1

MAANOVA: Data Format (continued)
- Design (parameter) file, for example with columns: SampleID, Condition

MAANOVA--Output  Residual plot  F-values, p-values, permutation p-values, adjusted p-values  IDs of differentially expressed genes  Volcano plot

SMA—Data Format  Import the intensities of each array separately  One example of intensity file IDF532B532F635B635 H3001A H3001A H3001A ……………

SMA--Output  t-values, p-values, adjusted p-values  IDs of differentially expressed genes * SMA does not provide p-values directly