Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.

Slides:



Advertisements
Similar presentations
Statistical tests for differential expression in cDNA microarray experiments (2): ANOVA Xiangqin Cui and Gary A. Churchill Genome Biology 2003, 4:210 Presented.
Advertisements

ECS 289A Presentation Jimin Ding Problem & Motivation Two-component Model Estimation for Parameters in above model Define low and high level gene expression.
Pre-processing in DNA microarray experiments Sandrine Dudoit PH 296, Section 33 13/09/2001.
1 MicroArray -- Data Analysis Cecilia Hansen & Dirk Repsilber Bioinformatics - 10p, October 2001.
Microarray technology and analysis of gene expression data Hillevi Lindroos.
Microarray Data Analysis Stuart M. Brown NYU School of Medicine.
Detecting Differentially Expressed Genes Pengyu Hong 09/13/2005.
Getting the numbers comparable
DNA Microarray Bioinformatics - #27612 Normalization and Statistical Analysis.
Preprocessing Methods for Two-Color Microarray Data
Microarray Data Preprocessing and Clustering Analysis
Differentially expressed genes
Figure 1: (A) A microarray may contain thousands of ‘spots’. Each spot contains many copies of the same DNA sequence that uniquely represents a gene from.
‘Gene Shaving’ as a method for identifying distinct sets of genes with similar expression patterns Tim Randolph & Garth Tan Presentation for Stat 593E.
Statistical Analysis of Microarray Data
Gene Set Analysis 09/24/07. From individual gene to gene sets Finding a list of differentially expressed genes is only the starting point. Suppose we.
Gene Expression Data Analyses (2)
Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression.
Normalization of 2 color arrays Alex Sánchez. Dept. Estadística Universitat de Barcelona.
Making Sense of Complicated Microarray Data
Statistical Analysis of Microarray Data
Analysis of microarray data
Microarray Data Analysis Illumina Gene Expression Data Analysis Yun Lian.
1 Normalization Methods for Two-Color Microarray Data 1/13/2009 Copyright © 2009 Dan Nettleton.
(4) Within-Array Normalization PNAS, vol. 101, no. 5, Feb Jianqing Fan, Paul Tam, George Vande Woude, and Yi Ren.
Microarray Gene Expression Data Analysis A.Venkatesh CBBL Functional Genomics Chapter: 07.
The following slides have been adapted from to be presented at the Follow-up course on Microarray Data Analysis.
CDNA Microarrays Neil Lawrence. Schedule Today: Introduction and Background 18 th AprilIntroduction and Background 25 th AprilcDNA Mircoarrays 2 nd MayNo.
Practical Issues in Microarray Data Analysis Mark Reimers National Cancer Institute Bethesda Maryland.
Analysis and Management of Microarray Data Dr G. P. S. Raghava.
DNA microarray technology allows an individual to rapidly and quantitatively measure the expression levels of thousands of genes in a biological sample.
Introduction to DNA Microarray Technology Steen Knudsen Uma Chandran.
DATA TRANSFORMATION and NORMALIZATION Lecture Topic 4.
CDNA Microarrays MB206.
Panu Somervuo, March 19, cDNA microarrays.
Applying statistical tests to microarray data. Introduction to filtering Recall- Filtering is the process of deciding which genes in a microarray experiment.
Analysis of Microarray Data Analysis of images Preprocessing of gene expression data Normalization of data –Subtraction of Background Noise –Global/local.
Fishing expeditions in gloomy waters: Detecting differential expression in microarray data Matthias E. Futschik Institute for Theoretical Biology Humboldt-University,
Microarray - Leukemia vs. normal GeneChip System.
Statistical analysis of a small set of time-ordered gene expression data using linear splines 生物晶片期中報告 組員 : 鐘英愷 劉彧郁 梁詩屏 陳泓君.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
A A R H U S U N I V E R S I T E T Faculty of Agricultural Sciences Introduction to analysis of microarray data David Edwards.
Lawrence Hunter, Ph.D. Director, Computational Bioscience Program University of Colorado School of Medicine
Statistical Principles of Experimental Design Chris Holmes Thanks to Dov Stekel.
Introduction to Statistical Analysis of Gene Expression Data Feng Hong Beespace meeting April 20, 2005.
Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002.
1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:
Extracting quantitative information from proteomic 2-D gels Lecture in the bioinformatics course ”Gene expression and cell models” April 20, 2005 John.
Statistics for Differential Expression Naomi Altman Oct. 06.
A Quantitative Overview to Gene Expression Profiling in Animal Genetics Armidale Animal Breeding Summer Course, UNE, Feb Analysis of (cDNA) Microarray.
Single-Factor Studies KNNL – Chapter 16. Single-Factor Models Independent Variable can be qualitative or quantitative If Quantitative, we typically assume.
Nuria Lopez-Bigas Methods and tools in functional genomics (microarrays) BCO17.
1 ArrayTrack Demonstration National Center for Toxicological Research U.S. Food and Drug Administration 3900 NCTR Road, Jefferson, AR
Suppose we have T genes which we measured under two experimental conditions (Ctl and Nic) in n replicated experiments t i * and p i are the t-statistic.
Introduction to Microarrays Kellie J. Archer, Ph.D. Assistant Professor Department of Biostatistics
Microarray analysis Quantitation of Gene Expression Expression Data to Networks BIO520 BioinformaticsJim Lund Reading: Ch 16.
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
Analyzing Expression Data: Clustering and Stats Chapter 16.
Molecular Classification of Cancer Class Discovery and Class Prediction by Gene Expression Monitoring.
Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006.
Example of a Functional Genomics Study Molecular Ecology ,
Variability & Statistical Analysis of Microarray Data GCAT – Georgetown July 2004 Jo Hardin Pomona College
Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.
Statistical analysis of a small set of time-ordered gene expression data using linear splines 生物晶片報告 組員 : 鐘英愷 劉彧郁 梁詩屏 陳泓君.
Microarray Technology and Data Analysis Roy Williams PhD Sanford | Burnham Medical Research Institute.
Microarray Data Analysis Xuming He Department of Statistics University of Illinois at Urbana-Champaign.
Micro array Data Analysis. Differential Gene Expression Analysis The Experiment Micro-array experiment measures gene expression in Rats (>5000 genes).
Functional Genomics in Evolutionary Research
Functional Genomics in Evolutionary Research
Normalization for cDNA Microarray Data
Presentation transcript:

Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009

2/25/2016BeeSpace Doctoral Forum2 Analytical Workflow Analysis of gene expression experiments: 1.Experimental design 2.Data preprocessing 3.Inference (detection of D.E. genes) 4.Classification 5.Validation of results

2/25/2016BeeSpace Doctoral Forum3 Experimental Design What is the plan? –Assign samples –Technical and biological sources of variation and confounding effects are considered –Microarray design Two-dye spotted microarrays: Reference and Loop

2/25/2016BeeSpace Doctoral Forum4 Experimental Design Reference Design Loop Design

2/25/2016BeeSpace Doctoral Forum5 Review of microarray… Spotted 2-dye microarrays: –mRNA samples are reverse-transcribed into complementary DNA (cDNA) and labeled with red (Cy5) and green (Cy3) fluorescent dyes –Samples then hybridized with arrayed probes/DNA sequences –Relative abundance of the spotted sequences can be measured (high intensity = high expression, low intensity = low expression) (Brazma et al., 2008)

2/25/2016BeeSpace Doctoral Forum6 Review of microarray… Two measurements per spot (Cy5, Cy3) –Ratio of intensities for each spot represents the relative abundance of corresponding DNA sequence Comparison of stages with reverse dye labeling Example:

2/25/2016BeeSpace Doctoral Forum7 Data Preprocessing Remove unreliable observations and systematic sources of variation –Feature extraction –Data filtering and transformation –Normalization –Collapsing

2/25/2016BeeSpace Doctoral Forum8 Data Preprocessing Feature extraction –Identify pixels on the image that are part of the feature –Identify nearby pixels that are used to calculate the background –Calculate numerical information: Signal and background means, medians, standard deviations, pixels –Identify flagged spots

2/25/2016BeeSpace Doctoral Forum9 Data Preprocessing Data Filtering –Remove flagged spots –Flag – indication of the quality of the feature spot Bad feature: pixel standard deviation is very high relative to the pixel mean Negative feature: signal of the feature is less than the signal of the background Dark feature: signal feature is very low Manually flagged feature: user-flagged feature

2/25/2016BeeSpace Doctoral Forum10 Data Preprocessing –Interpretation of flag values If value of flag is zero, feature is good Non-zero values suggest problems with the feature Common flag threshold values –-100, -75, -50

2/25/2016BeeSpace Doctoral Forum11 Data Preprocessing –Background subtraction Subtract the background signal from the feature intensity What if background is higher than feature? –Use lowest available signal-intensity measurement, which is typically 1 –If background > foreground, gene has little or no expression, and will be filtered out regardless –Intensity threshold criterion Pixel intensity 1 to 60,000 (saturated) Remove cDNAs with average expression intensity < 200 (on average)

2/25/2016BeeSpace Doctoral Forum12 Data Preprocessing Transformation of raw intensities –Logarithmic transformation makes the data exhibit a more Normal distribution –Log base 2 (log2) 2-fold up-regulated genes  log ratio of +1 2-fold down-regulated genes  log ratio of -1 Non-differentially expressed genes  log ratio of 0

2/25/2016BeeSpace Doctoral Forum13 Data Preprocessing Normalization –Allows for joint analysis of gene expression measurements from different dyes and microarrays –Used to minimize technical variation in gene expression profiles Green dye typically has higher intensity than red

2/25/2016BeeSpace Doctoral Forum14 Data Preprocessing Normalization –Global On entire dataset Assume red (Cy5) and green (Cy3) intensities are related by a constant Center of the distribution of log ratios is shifted to equal zero (Yang et al., 2002) (Kerr et al., 2002) OR

2/25/2016BeeSpace Doctoral Forum15 Data Preprocessing Normalization –Local On a physical subset of the dataset (apply to single microarray or by block) Remove systematic variation within and across microarrays –Sources of variation: dye, pin, microarray area effects Remove systematic noise on a more specific level (compared to global)

2/25/2016BeeSpace Doctoral Forum16 Data Preprocessing Normalization –Local LOESS/LOWESS –curve-fitting local dye normalization method where a local regression line is fitted to the ratio R/G (M) versus the average of the R and G intensities (A), and the data is re-centered LOWESS – linear function is used to fit the regression LOESS – quadratic function is used to fit the regression

2/25/2016BeeSpace Doctoral Forum17 Data Preprocessing Normalization –Loess

2/25/2016BeeSpace Doctoral Forum18 Data Preprocessing Collapsing –Platforms containing multiple spots of the same probe, replicate spots next to each other, etc, then these observations tend to be highly correlated –Solution? COLLAPSE THE DATA Average or median of the normalized fluorescence intensities from replicate spots One observation per gene becomes available for analysis

2/25/2016BeeSpace Doctoral Forum19 Inference Analysis of preprocessed measurements to identify D.E. genes –Model fitting –Hypothesis testing –Characterizing functions

2/25/2016BeeSpace Doctoral Forum20 Inference Model fitting –Fixed effect The levels of a factor have been predetermined for the experiment (dye, treatment) –Random effect The levels of a factor have been randomly selected from a population of possible levels (microarray, sample) –Covariate An independent factor that is not manipulated by the experimenter, but still affects the response

2/25/2016BeeSpace Doctoral Forum21 Inference –Mixed effects model: y ijkm is the gene expression measurement for dye i (i = red or green), treatment j, sample k, and microarray l μ is the overall mean response D is the effect of dye I T is the effect of treatment j S is the effect of sample k A is the effect of microarray l ε is the error term

2/25/2016BeeSpace Doctoral Forum22 Inference Hypothesis testing –T-test Experiments including two levels of a factor Test statistic compared to t-distribution, and p- value is generated –F-test Experiments including more than two levels of a factor, or more than two groups (ANOVA) Test statistic compared to F-distribution, and p- value is generated

2/25/2016BeeSpace Doctoral Forum23 Inference Hypothesis testing –P-value If less than threshold (confidence level), then measurement/gene is significant –< 0.05, 0.01, 0.001, Different thresholds lead to varying numbers of genes found to be important to the question at hand

2/25/2016BeeSpace Doctoral Forum24 Inference Hypothesis testing –Multiple test adjustment Microarray experiments contain hundreds or thousands of genes/probes, which increases the probability of the false positive rate Issue is overcome by multiple test adjustment techniques such as: –False Discovery Rate (FDR) –Bonferonni adjustment –Permutation tests

2/25/2016BeeSpace Doctoral Forum25 Inference Characterizing functions –Gene Ontology (GO) analysis Characterize molecular functions, biological processes or cellular components of differentially expressed genes –Interpretation of data from genomic standpoint

2/25/2016BeeSpace Doctoral Forum26 Classification Classification approaches used to identify distinctions between groups of observations (genes, samples, treatments) –Hierarchical clustering, K-nearest neighbors (KNN), etc.

2/25/2016BeeSpace Doctoral Forum27 Validation Additional verification of differentially expressed genes to clear false positives –Real-time PCR –Informal application – literature review of similar studies

2/25/2016BeeSpace Doctoral Forum28 Thank you!