Introduction to Statistical Analysis of Gene Expression Data Feng Hong Beespace meeting April 20, 2005.



The Central Dogma
DNA -> (transcription) -> RNA -> (translation) -> Protein

- A gene is a sequence of nucleotides that codes for a protein.
- All cells contain the same genetic information in their DNA, but only a subset of genes is expressed in any given cell.
- The presence of mRNA in a cell indicates that a gene is active.

Microarray Technology

Microarray
- Examines how active thousands of genes are at once.
- Fluorescent-dye-labeled mRNA from different samples hybridizes to the DNA on the array.
- The fluorescence intensity indicates the expression level of the gene in the sample.

Steps in a Microarray Experiment
- Experimental design
- Signal extraction
  - Image analysis
  - Normalization: remove artifacts across arrays
- Data analysis
  - Selection of differentially expressed genes
  - Clustering and classification

Experimental Design
- In a two-color cDNA experiment, only two mRNA samples can be hybridized on one array.
- Factors influencing the choice of experimental design:
  - Number of different samples
  - Aim of the experiment: which comparisons are of primary interest
  - Resource constraints
  - Power of the experiment

Experimental Design
- Direct comparison:
  - Compares only two mRNA samples
  - Dye-swap is recommended to minimize the …
- Reference sample:
  - Compares several samples with a common reference
  - Gives indirect comparisons between the samples
- Saturated design:
  - More than two mRNA samples
  - All comparisons are of interest
- Loop design:
  - Used in time-course experiments
- More complicated designs

Design used in Whitfield et al. (2003)
Source: Whitfield, Cziko, Robinson, 2003, Gene expression profiles in the brain predict behavior in individual honey bees, Science, supplementary materials

Gene Expression Measurements
- Gene expression data are noisy.
- Sources of error:
  - Microarray manufacturing
  - Preparation of mRNA from biological samples
  - Hybridization
  - Scanning
  - Imaging

Image Analysis
- Preprocess the raw scanned image.
- Steps: gridding, edge detection, segmentation, summarization of pixel intensities.
- Output: foreground intensities (R, G), background intensities (Rb, Gb), and "flagged" spots.

Statistical Analysis of the Data
Objective: identify as many genes as possible that are differentially expressed across conditions, while keeping the probability of falsely declaring a gene differentially expressed acceptably low.

Software for Statistical Microarray Analysis
- Generic statistical platforms:
  - SAS
  - S-PLUS
  - R
  - Matlab
- Specific packages for microarray data analysis:
  - Maanova
  - Bioconductor (limma, …)
  - etc.
  - Our own programs

Visualize Data and Check Quality
- Look at the original image.
- Use an MA plot (log fold change vs. average log intensity):
  - y-axis: M = log2(R) - log2(G)
  - x-axis: A = (log2(R) + log2(G)) / 2
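The M and A values behind an MA plot can be sketched in a few lines. This is a minimal illustration, assuming background-corrected red and green intensities held in plain Python lists; the function name `ma_values` and the sample intensities are hypothetical, and A is taken here as the average of the two log intensities, which is the common convention.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities."""
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_vals.append(log_r - log_g)          # M: log fold change
        a_vals.append((log_r + log_g) / 2.0)  # A: average log intensity
    return m_vals, a_vals

# Illustrative intensities (not real data): three spots on one array.
red = [1024.0, 512.0, 256.0]
green = [256.0, 512.0, 1024.0]
m, a = ma_values(red, green)
```

Plotting `m` against `a` for every spot gives the MA plot; for a well-behaved array the cloud should be centered on M = 0 across the whole range of A.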

Raw image

MA plot

Normalization
- "to adjust microarray data for effects which arise from variation in the technology rather than from biological differences between RNA samples" (Smyth and Speed, 2003)
- "an iterative process of visualization, identification of likely artifacts and removal of artifacts when feasible" (Parmigiani et al. 2003)
- Two places:
  - Within-array normalization
  - Across-array normalization
- Method: check the MA plot, then transform the data (loess transformation, lin-log transformation, etc.)
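As a rough illustration of within-array normalization, the sketch below shows two simple corrections applied to M values. It is not loess: `median_center` removes a constant dye bias, and `binned_median_normalize` is a crude intensity-dependent stand-in that subtracts the median M within bins of A. Both function names, the bin count, and the sample values are assumptions made for this example.

```python
import statistics

def median_center(m_vals):
    """Global correction: subtract the array-wide median of M."""
    med = statistics.median(m_vals)
    return [m - med for m in m_vals]

def binned_median_normalize(m_vals, a_vals, n_bins=2):
    """Intensity-dependent correction: sort spots by A, split them into
    n_bins groups, and subtract each group's median M."""
    if not m_vals:
        return []
    order = sorted(range(len(a_vals)), key=lambda i: a_vals[i])
    size = -(-len(order) // n_bins)  # ceiling division: spots per bin
    normalized = [0.0] * len(m_vals)
    for start in range(0, len(order), size):
        group = order[start:start + size]
        med = statistics.median([m_vals[i] for i in group])
        for i in group:
            normalized[i] = m_vals[i] - med
    return normalized

# Illustrative values: the dye bias grows with intensity A.
m_vals = [1.0, 1.2, 0.8, 2.0, 2.2, 1.8]
a_vals = [6.0, 6.5, 7.0, 12.0, 12.5, 13.0]
centered = median_center(m_vals)
flattened = binned_median_normalize(m_vals, a_vals, n_bins=2)
```

A loess fit replaces the step-wise bin medians with a smooth curve of M against A, but the idea is the same: estimate the trend in the MA plot and subtract it.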

Examples of Normalization

ANOVA (Analysis of Variance) Model
Let $y_{ijkg}$ be the fluorescent intensity measured from Array $i$, Dye $j$, Variety $k$, and Gene $g$, on the appropriate scale (such as log). A typical analysis of variance (ANOVA) model is:
$$y_{ijkg} = \mu + A_i + D_j + V_k + G_g + (AG)_{ig} + (DG)_{jg} + (VG)_{kg} + \epsilon_{ijkg}$$
- $\mu$, $A$, $D$, $V$ are "normalization" terms
- $G$ are the overall gene effects
- $(AG)$ are "spot" effects
- $(DG)$ are gene-specific dye effects
- $(VG)$ are the effects of interest. They capture the expression of genes specifically attributable to varieties.
- $\epsilon$ is random error

Two-Stage ANOVA
- Global ANOVA model:
$$y_{ijkg} = \mu + A_i + D_j + V_k + G_g + (AG)_{ig} + (DG)_{jg} + (VG)_{kg} + \epsilon_{ijkg}$$
- However, fitting the global model is computationally prohibitive. Instead, break the model into two stages.
- Two-stage ANOVA:
  - Fit the "normalization model" to all genes, leaving residuals $r_{ijkg}$:
$$y_{ijkg} = \mu + A_i + D_j + V_k + r_{ijkg}$$
  - Fit the residuals on a per-gene basis (separately for each gene $g$):
$$r_{ijkg} = G_g + (AG)_{ig} + (DG)_{jg} + (VG)_{kg} + \epsilon_{ijkg}$$
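The two-stage fit can be sketched with ordinary least squares. This is a minimal illustration under several assumptions: all effects are treated as fixed, factors are dummy-coded with the first level as baseline, and the data are synthetic and noise-free (four arrays in two dye-swap pairs, two varieties, ten genes, with only gene 0 given a variety effect). The helper names `dummies`, `design`, and `two_stage_anova` are hypothetical, and packages such as Maanova do considerably more than this.

```python
import numpy as np

def dummies(levels):
    """Indicator columns for a factor (first level is the baseline)."""
    levels = np.asarray(levels)
    uniq = np.unique(levels)
    return np.column_stack([(levels == u).astype(float) for u in uniq[1:]])

def design(array, dye, variety):
    """Design matrix with intercept, array, dye and variety columns."""
    n = len(array)
    return np.column_stack(
        [np.ones(n), dummies(array), dummies(dye), dummies(variety)])

def two_stage_anova(y, array, dye, variety, gene):
    y = np.asarray(y, dtype=float)
    X = design(array, dye, variety)
    # Stage 1: normalization model fitted to all genes at once.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Stage 2: the same factors fitted to the residuals, gene by gene.
    vg = {}
    for g in np.unique(gene):
        rows = gene == g
        coef, *_ = np.linalg.lstsq(X[rows], resid[rows], rcond=None)
        vg[int(g)] = coef[-1]  # last column: variety (VG) effect
    return vg

# Synthetic design: variety alternates with dye across arrays (dye swap).
rows = [(i, j, (i + j) % 2, g)
        for g in range(10) for i in range(4) for j in range(2)]
array, dye, variety, gene = (np.array(col) for col in zip(*rows))
# Only gene 0 responds to variety (true effect 2.0 on the log scale).
y = (8.0 + 0.3 * array - 0.2 * dye + 0.1 * gene
     + 2.0 * ((gene == 0) & (variety == 1)))
vg = two_stage_anova(y, array, dye, variety, gene)
top_gene = max(vg, key=vg.get)
```

Because stage 1 removes the average variety effect across all genes, the per-gene estimates are relative to that average; the gap between gene 0 and any other gene still recovers the simulated effect.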

Report Significant Genes: Multiple Test Adjustment
- P-values
  - P-value = the probability, if the gene is not differentially expressed, of observing a result at least as extreme as the one actually observed. The smaller the p-value, the more significant the result.
  - If we set the cutoff at 0.05, test 8000 genes, and assume that none of the genes is differentially expressed, we expect to declare 8000 × 0.05 = 400 genes significant.
  - Adjusted p-values
- Posterior probability
- False Discovery Rate (FDR)
  - FDR = E(# genes falsely declared differentially expressed / # genes declared differentially expressed)
- Ranking the genes
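The arithmetic of the multiple-testing problem, and one common way of controlling the FDR, can be sketched as follows. The slides do not say which adjustment was used; the Benjamini-Hochberg procedure is shown here as an assumed example, and the function names and the p-values are illustrative only.

```python
def expected_false_positives(n_genes, alpha):
    """Expected number of false declarations when no gene is truly
    differentially expressed and each test uses cutoff alpha."""
    return n_genes * alpha

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

n_false = expected_false_positives(8000, 0.05)  # about 400 genes
p_values = [0.01, 0.04, 0.03, 0.005]            # illustrative only
adj = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(adj) if q <= 0.05]
```

Declaring genes significant when the adjusted value is at most 0.05 aims to keep the expected proportion of false declarations among the declared genes at or below 5%, rather than the per-gene error rate.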

Clustering
- After selecting the list of differentially expressed genes, we want to investigate the relationships between these genes.
- Look at the "profile" of each gene's expression across the samples.
- Group the selected genes into clusters so that genes with similar profiles fall in the same cluster:
  - K-means
  - Hierarchical clustering
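A bare-bones k-means over gene expression profiles might look like the sketch below. It assumes each gene is a list of expression values across the same samples, uses squared Euclidean distance, and picks starting centers by a simple farthest-first rule so the result is deterministic; those choices, the function names, and the four example profiles are assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, n_iter=50):
    # Farthest-first start: first profile, then the profile farthest
    # from the centers chosen so far.
    centers = [list(profiles[0])]
    while len(centers) < k:
        far = max(profiles,
                  key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each gene joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in profiles]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Illustrative profiles: two genes up then down, two genes down then up.
profiles = [[2.0, 2.0, -2.0, -2.0], [1.8, 2.1, -1.9, -2.2],
            [-2.0, -2.0, 2.0, 2.0], [-2.1, -1.8, 2.2, 1.9]]
labels, centers = kmeans(profiles, k=2)
```

K-means needs the number of clusters up front and can settle in a local optimum that depends on the starting centers; hierarchical clustering avoids choosing k in advance at the cost of computing all pairwise distances.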

Example of Clustering from Whitfield et al 2003.

Principal Component Analysis
- Reduces high-dimensional data to a small number of summary variables (principal components).
- Uses the correlation matrix.
- The 1st component is the direction along which there is the greatest variation in the data.
- The 2nd component is orthogonal to the 1st component and represents the greatest remaining variation in the data after accounting for the 1st component.
- Can be used to visually identify clusters or to assist classification (for example, Whitfield 2003).
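PCA on the correlation matrix can be sketched with NumPy's eigendecomposition. This is a minimal example, assuming a data matrix with samples in rows and variables (for instance, genes) in columns; the function name `pca_correlation` and the random test data are illustrative, not taken from the Whitfield analysis.

```python
import numpy as np

def pca_correlation(data, n_components=2):
    """PCA via eigendecomposition of the correlation matrix.

    data: 2-D array, rows = samples, columns = variables.
    Returns (scores, components, explained_fraction).
    """
    data = np.asarray(data, dtype=float)
    # Standardize each variable so that the covariance of z is the
    # correlation matrix of the original data.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]            # orthogonal directions
    scores = z @ components                   # sample coordinates
    explained = eigvals[order] / eigvals.sum()
    return scores, components, explained

# Illustrative data: two correlated variables plus one unrelated one.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 1))
data = np.hstack([base + 0.1 * rng.normal(size=(20, 1)),
                  base + 0.1 * rng.normal(size=(20, 1)),
                  rng.normal(size=(20, 1))])
scores, components, explained = pca_correlation(data)
```

Plotting the first column of `scores` against the second gives the usual two-dimensional PCA view in which groups of similar samples can show up as visible clusters.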

Example of PCA Source: Whitfield, Cziko, Robinson, 2003, Gene Expression Profiles in the brain predict behavior in individual honey bees, Science