Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002.

Presentation transcript:

Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002

Introduction to cDNA Microarray Experiments
Single-slide design
  – Two mRNA samples (red/green) on the same slide
Multiple-slide design
  – Two or more types of mRNA on different slides
  – Excluded: time-course experiments

Examples of Multiple-slide Design
Apo AI
  – Treatment group: 8 mice with the apo AI gene knocked out
  – Control group: 8 C57BL/6 mice
  – Cy5: cDNA from each of the 16 mice
  – Cy3: pooled cDNA from the 8 control mice
SR-BI
  – Treatment group: 8 SR-BI transgenic mice
  – Control group: 8 “normal” FVB mice
Microarray setup
  – 6384 spots: a 4×4 arrangement of grids with 19×21 spots in each (16 × 399 = 6384)

Single-slide Methods
Two types
  – Based solely on the intensity ratio R/G
  – Taking into account overall transcript abundance, measured by R*G
Historical review
  – Fold increase/decrease cut-offs ( )
  – Probabilistic modeling based on distributional assumptions ( )
  – Methods that consider R*G ( ), e.g. Gamma-Gamma-Bernoulli
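The simplest rule in the historical review, a fold-change cut-off on R/G, can be sketched as follows. This is only an illustration: the function name `fold_change_call`, the 2-fold default and the intensities are hypothetical and do not come from the slides.

```python
def fold_change_call(red, green, cutoff=2.0):
    """Classify one spot by a fixed fold-change cut-off on R/G.

    The 2-fold default is an assumption made for illustration only.
    """
    ratio = red / green
    if ratio >= cutoff:
        return "up"
    if ratio <= 1.0 / cutoff:
        return "down"
    return "unchanged"


# Hypothetical (R, G) intensities for three spots.
spots = [(800.0, 200.0), (150.0, 600.0), (500.0, 450.0)]
calls = [fold_change_call(r, g) for r, g in spots]
print(calls)
```

A rule of this kind uses only the ratio and ignores both the overall abundance R*G and the spot-to-spot variability, which is what the later model-based and replicated methods try to address.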

Summary of Single-slide Methods
Each method produces a model-dependent rule, which amounts to drawing two curves in the (R,G) plane
  – Power (1 - Type II error rate)
  – False positive rate (Type I error rate)
Multiple testing
Replication is needed because gene expression data are too noisy

Image Analysis
“Raw” data: 16-bit TIFF files
Addressing
  – Within a batch, important characteristics are similar
Segmentation
  – Seeded region growing algorithm
Background adjustment
  – Morphological opening (a nonlinear filter)
Software package: Spot, in the R environment

Single-slide Data Display
Plot log2 R vs. log2 G
  – Variation is less dependent on absolute magnitude
  – Normalization is additive for logged intensities
  – Evens out highly skewed distributions
  – Gives a more realistic sense of variation
Plot M = log2(R/G) vs. A = [log2(RG)]/2
  – More revealing for identifying spot artifacts and for normalization purposes
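The M and A coordinates defined on this slide can be computed per spot as in the sketch below. The function name `ma_transform` and the intensity values are hypothetical; the intensities are assumed to be positive, background-corrected values.

```python
import math


def ma_transform(red, green):
    """Return (M, A) for one spot from its red and green intensities."""
    m = math.log2(red / green)        # M = log2(R/G), the log-ratio
    a = 0.5 * math.log2(red * green)  # A = [log2(RG)]/2, the average log-intensity
    return m, a


# Hypothetical background-corrected intensities for three spots.
spots = [(1200.0, 1200.0), (800.0, 200.0), (150.0, 600.0)]
ma_values = [ma_transform(r, g) for r, g in spots]
for (r, g), (m, a) in zip(spots, ma_values):
    print(f"R={r:7.1f}  G={g:7.1f}  M={m:+.2f}  A={a:.2f}")
```

The M-A plot is a 45-degree rotation (with rescaling) of the log2 R vs. log2 G plot, so equal red and green intensities fall on the line M = 0.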

Normalization
Identify and remove sources of systematic variation other than differential expression
  – Different labeling efficiencies and scanning properties for Cy3 and Cy5
  – Different scanning parameters
  – Print-tip, spatial or plate effects
Red intensity is often lower than green intensity
The imbalance between R and G varies across spots and between arrays, depending on
  – Overall spot intensity A
  – Location on the array, plate origin, etc.

An Example: Self-Self Experiment

Normalization (Cont.)
Global normalization
  – Subtract the mean or median from all intensity log-ratios
More complex normalization
  – Robust locally weighted regression: M ~ spot intensity A + location + plate origin
  – Use the print-tip group to represent the spot location
  – log2(R/G) → log2(R/G) - l(A,j)
  – l(A,j): lowess fit for print-tip group j, computed with lowess in R (0.2 < f < 0.4)
Control sequences
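The two levels of normalization can be sketched as below. Note the simplification: instead of the intensity-dependent lowess fit l(A,j), the second function subtracts a constant (the median M) within each print-tip group, so it corrects for location but not for intensity A. Function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def global_median_normalize(m_values):
    """Global normalization: subtract the median from all log-ratios."""
    center = median(m_values)
    return [m - center for m in m_values]


def print_tip_median_normalize(m_values, tip_groups):
    """Subtract the median log-ratio within each print-tip group.

    Simplified stand-in for l(A, j): a constant per group j
    rather than a lowess curve in A.
    """
    by_group = defaultdict(list)
    for m, g in zip(m_values, tip_groups):
        by_group[g].append(m)
    centers = {g: median(values) for g, values in by_group.items()}
    return [m - centers[g] for m, g in zip(m_values, tip_groups)]


# Hypothetical log-ratios from two print-tip groups with different offsets.
m_values = [0.5, 0.7, 0.6, -0.2, -0.1, 0.0]
tip_groups = [1, 1, 1, 2, 2, 2]

global_norm = global_median_normalize(m_values)
tip_norm = print_tip_median_normalize(m_values, tip_groups)
print(global_norm)
print(tip_norm)
```

In this toy data the global correction leaves a clear offset between the two groups, while the per-group correction centers each group at zero.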

Apo AI: Normalization

Graphical Display for Test Statistics (I)
Test statistics
  – H_j: no association between treatment and the expression level of gene j, j = 1,…,m
  – Two-sided alternative
  – Two-sample Welch t-statistics
  – Replication is essential to assess the variability in the treatment and control groups
  – The joint distribution is estimated by a permutation procedure because the actual distribution is not a t-distribution
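For a single gene, the Welch t-statistic and a permutation-based two-sided p-value can be sketched as follows. This is a Monte Carlo sketch under stated assumptions: it handles one gene at a time and samples random relabelings, whereas the slides refer to estimating the joint distribution across all genes. Function names, the number of permutations, the seed and the data are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(x, y):
    """Two-sample Welch t-statistic (unequal variances)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided p-value from random permutations of the group labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of treatment/control labels
        t = welch_t(pooled[:len(x)], pooled[len(x):])
        if abs(t) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical normalized log-ratios for one gene, 8 mice per group.
treatment = [-3.1, -2.8, -3.4, -2.9, -3.3, -3.0, -2.7, -3.2]
control = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.15]

t_obs = welch_t(treatment, control)
p_perm = permutation_p_value(treatment, control)
print(t_obs, p_perm)
```

With 8 mice per group there are 12870 distinct relabelings, so the permutation distribution could also be enumerated exactly instead of sampled.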

Graphical Display for Test Statistics (II) Quantile-Quantile plots

Graphical Display for Test Statistics (III) Plots vs. absolute expression levels

Multiple Hypothesis Testing: Adjusted p-values (I)
P-value: p_j = Pr(|T_j| >= |t_j| | H_j), j = 1,…,m
Family-wise Type I error rate (FWER)
  – The probability of at least one Type I error in the family
Strong control of the FWER
  – Control the FWER for any combination of true and false hypotheses
Weak control of the FWER
  – Control the FWER only under the complete null hypothesis that all hypotheses in the family are true

Multiple Hypothesis Testing: Adjusted p-values (II)
Adjusted p-value for H_j
  – p̃_j = inf{a: H_j is rejected at FWER = a}
  – H_j is rejected at FWER a if p̃_j <= a
P-value adjustment approaches
  – Bonferroni
  – Sidak single-step
  – Holm step-down
  – Westfall and Young step-down minP
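The first three adjustments in this list have closed forms and can be sketched directly from the raw p-values. The Westfall and Young step-down minP procedure is not shown, since it needs the permutation distribution of the p-values. Function names and the example p-values are hypothetical.

```python
def bonferroni(p):
    """Single-step Bonferroni: min(m * p_j, 1)."""
    m = len(p)
    return [min(1.0, m * pj) for pj in p]


def sidak_single_step(p):
    """Single-step Sidak: 1 - (1 - p_j)^m."""
    m = len(p)
    return [1.0 - (1.0 - pj) ** m for pj in p]


def holm_step_down(p):
    """Holm step-down: running maximum of (m - rank) * p over ordered p-values."""
    m = len(p)
    order = sorted(range(m), key=lambda j: p[j])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, j in enumerate(order):
        candidate = min(1.0, (m - rank) * p[j])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[j] = running_max
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.001, 0.04, 0.03, 0.2]
print(bonferroni(raw))
print(sidak_single_step(raw))
print(holm_step_down(raw))
```

A hypothesis H_j is then rejected at FWER a whenever its adjusted p-value is at most a. Holm's adjusted p-values are never larger than Bonferroni's, so the step-down procedure rejects at least as many hypotheses.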

Multiple Hypothesis Testing: Estimation of adjusted p-values (I)

Multiple Hypothesis Testing: Estimation of adjusted p-values (II)

Apo AI: Adjusted p-values (I)

Apo AI: Adjusted p-values (II)

Apo AI: Comparison with Single-slide Methods

Discussion
M-A plots
Normalization
  – Robust local regression, e.g. lowess
Q-Q plots and plots vs. absolute expression level
False discovery rate (FDR)
Replication is necessary
Design issues
Factorial experiments
Joint behavior of genes
R package SMA