\department of mathematics and computer science Supervised microarray data analysis Mark van de Wiel.



Quality control
- Protocols
- Perform a small-scale, well-controlled experiment to assess the influence of experimental factors (microarrays from different batches, printing tips, dyes, linearity of the scanner, etc.).
- Continuous factors (temperature, humidity, spot size over time, intensity of a control spot over time) can be monitored with standard control chart techniques.
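As an illustration of the control chart idea, here is a minimal sketch in Python. It assumes a simplified Shewhart-style chart whose limits are the baseline mean plus or minus three sample standard deviations; the function names and the intensity values are hypothetical and not taken from the slides.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, centre, upper) limits from in-control baseline readings."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return centre - k * spread, centre, centre + k * spread

def out_of_control(readings, limits):
    """Indices of readings that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(readings) if x < lower or x > upper]

# Hypothetical control-spot intensities (log2 scale) from a pilot run.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(baseline)

# Hypothetical readings from later arrays; the third one drifts low.
new_readings = [10.0, 10.3, 9.1, 10.1]
flagged = out_of_control(new_readings, limits)
print(flagged)
```

Charts for individual measurements often estimate the spread from moving ranges instead; the plain standard deviation is used here only to keep the sketch short.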

Design of the experiment
- Think very carefully about what the biological goals are.
- What software do you have at your disposal to analyse the data?
- Do we need a reference or not?
- ‘Biological design’: which tissues to combine on an array (cDNA)?
- More than one biological factor: factorial design.
- Dye bias: dye-swap.
- Design on the array: negative/positive controls, repeats, how many genes? Run a pilot study first, distributing the repeats over experimental factors (spatial position, printing tips, etc.).
- Save some space on the (cDNA) microarray for assessing variability due to experimental factors (e.g. print the same control gene with several printing tips).

Analysis: Multiple testing (after normalization)
- Objective: control the number of falsely selected genes.
- FWE: family-wise error rate (m denotes the number of genes).
  - Weak FWE control: P(falsely select at least one gene i, i = 1, ..., m | no gene truly expressed) ≤ α
  - Strong FWE control: P(falsely select at least one gene i, i = 1, ..., m | some genes expressed, some genes not expressed) ≤ α
- FDR: false discovery rate.
  - F: expected number of false rejections when no genes are expressed; T: total number of rejections.
  - FDR control: F/T ≤ α

Multiple testing: FWE vs FDR
- Control of the FDR implies weak control of the FWE.
- Advantage of strong FWE control: the significance level α is controlled in all situations.
- Disadvantage: less power than FDR control. FWE-based procedures tend to select fewer genes than FDR-based procedures.
- Software:
  - Bioconductor: step-down Westfall-Young (Dudoit et al.), control of FDR and FWE.
  - SAM (permutation-based ‘control’ of FDR).
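The difference in power can be made concrete with a small sketch. It does not implement the Westfall-Young or SAM procedures named on the slide; as stand-ins it uses two textbook procedures, Holm's step-down method (strong FWE control) and the Benjamini-Hochberg step-up method (FDR control, under independence or positive dependence of the tests). The p-values are made up for illustration.

```python
def holm(pvals, alpha=0.05):
    """Holm step-down procedure: strong control of the FWE at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    selected = [False] * m
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        if pvals[i] <= alpha / (m - rank):
            selected[i] = True
        else:
            break  # stop at the first p-value that fails its threshold
    return selected

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: control of the FDR at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0  # largest rank k with p_(k) <= k * alpha / m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            cutoff = rank
    selected = [False] * m
    for i in order[:cutoff]:
        selected[i] = True
    return selected

# Hypothetical per-gene p-values.
pvals = [0.001, 0.008, 0.012, 0.03, 0.04, 0.2, 0.5, 0.9]
n_fwe = sum(holm(pvals))
n_fdr = sum(benjamini_hochberg(pvals))
print(n_fwe, n_fdr)
```

On these eight p-values the FWE-based procedure selects one gene and the FDR-based procedure selects three, which is the pattern described above.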

SAM
- Developed at Stanford, Tibshirani et al. (Paper: Tusher et al., PNAS 98)
- Claim is FDR control.
- Plus:
  1. Ease of use, add-in to Excel
  2. Allows asymmetric cut-offs
- Minus:
  1. The distribution under the null hypotheses (‘no expression’) needs to be the same for all genes to guarantee FDR control
  2. Combination with a k-fold rule: no control of FDR anymore
- Solutions:
  - Use (normal) rank scores and a simple rank statistic
  - Explicitly test on k-fold expression; combine with the FDR criterion
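For readers unfamiliar with SAM, the per-gene statistic it ranks genes by can be sketched as follows. This is only the "relative difference" statistic from Tusher et al., a two-sample t-statistic with a small constant s0 added to the denominator. The permutation step that estimates the FDR is not shown, and s0 is passed in by hand here, which is an assumption of this sketch (SAM chooses it from the data).

```python
from math import sqrt
from statistics import mean

def sam_statistic(group_a, group_b, s0=0.0):
    """Relative difference d = (mean_b - mean_a) / (s + s0) for one gene.

    s is the pooled standard error of the mean difference; with s0 = 0
    this reduces to the ordinary two-sample t-statistic.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    # Pooled sum of squared deviations around each group mean.
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    s = sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    return (mean_b - mean_a) / (s + s0)

# Hypothetical log-expression values for one gene under two conditions.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
d_plain = sam_statistic(control, treated)          # ordinary t-statistic
d_fudged = sam_statistic(control, treated, s0=1.0)  # shrunk towards zero
print(d_plain, d_fudged)
```

The constant s0 keeps genes with very small variance from producing huge statistics, which is one reason the null distributions of different genes become more comparable.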

Modelling vs Normalisation + Testing
- Modelling forces you to state what the assumptions are (linearity, normality, independence, etc.).
- Normalisation steps may not be commutative.
- Non-linearities can be dealt with by normalisation methods.
- Advanced modelling requires the help of a statistician/bioinformatician.
- Standard approach to modelling: ANOVA. The model has two levels:
  1. Normalisation level, which includes linear corrections for dye and microarray effects
  2. Gene expression level, which includes effects at the gene level, including interactions (the interaction of interest is usually gene*variety)
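One common way to write such a two-level ANOVA model is sketched below. The notation is an assumption made for illustration and is not taken from the slides: y is the log intensity of gene g on array a with dye d, A, D, G and V stand for array, dye, gene and variety effects, v(a,d) is the variety hybridised on array a with dye d, and r is the residual passed from the first level to the second.

```latex
\begin{align}
  % Level 1 (normalisation): global corrections for array and dye
  y_{adg} &= \mu + A_a + D_d + (AD)_{ad} + r_{adg} \\
  % Level 2 (gene expression): gene-specific effects fitted to the residuals
  r_{adg} &= G_g + (AG)_{ag} + (DG)_{dg} + (VG)_{v(a,d)\,g} + \varepsilon_{adg}
\end{align}
```

In this form the gene*variety interaction $(VG)$ carries the differential expression of interest, while the other terms absorb technical variation.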

Software
- Freeware: SAM, Bioconductor
- Specialized commercial software: Spotfire, Genespring, Genesight, Rosetta
- Most contain: normalisation, variance-stabilizing transformations, ANOVA, testing (most do not yet include the advanced multiple testing criteria).
- Statistical software: SAS, S-Plus, SPSS. Much more debugged, long history, better documentation. (It is often very unclear what the specialized packages really do.)
- Advantages of specialized software: user-friendly, visualisation (nice pictures), link with databases, annotation.
- Try several!

Bayesian models
+ Natural translation to networks (pathways)
+ Complex models (linearity is not necessary, interactions)
+ Prior biological knowledge can be included
+ Nesting of the models (image analysis + normalisation + gene expression)
+ Inference for complex functions of gene expression data is relatively easy
- No ‘easy’ software
- Computational methods may take time to find reliable estimates
[Figure: example network]

Validation
- Cross-validation: leave some data out and see how well the left-out values are predicted by the model. (Note that for normalisation procedures it may be harder to predict the data from the normalized data.)
- Biological validation (spikes: known concentrations). Very useful for validating the normalisation procedure or the model:
  1. Pretend that spikes with equal concentrations that are used under different conditions (different dyes, microarray batch) are different quantities.
  2. Estimate the ratio of the two estimates after normalisation or modelling.
  3. The ratio should be approximately equal to 1.
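The spike check in the three steps above can be sketched in a few lines. The sketch assumes the normalised spike intensities are on a log2 scale and compares the two conditions through the ratio of their geometric means; the function names, the data and the tolerance of 0.1 are all hypothetical.

```python
from statistics import mean

def spike_ratio(log2_cond_a, log2_cond_b):
    """Ratio (condition B over A, linear scale) of the geometric mean spike intensities."""
    # A difference of mean log2 values corresponds to a ratio of geometric means.
    return 2 ** (mean(log2_cond_b) - mean(log2_cond_a))

def passes_validation(ratio, tolerance=0.1):
    """True if the ratio is within the chosen tolerance of 1."""
    return abs(ratio - 1.0) <= tolerance

# Hypothetical normalised log2 intensities of the same spike (equal
# concentration) measured with two different dyes.
dye_1 = [8.0, 8.1, 7.9]
dye_2 = [8.0, 8.2, 8.1]

ratio = spike_ratio(dye_1, dye_2)
print(ratio, passes_validation(ratio))
```

A ratio far from 1 for spikes of equal concentration would suggest that the normalisation or the model has not removed the dye or batch effect.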

Comparison and meta-analysis
- Objective comparisons between methods are very much needed!
- Simulations may help (because we then know the truth). Setting up realistic simulations may be hard!
- Competition between several methods (CAMDA ’03: lung cancer).
- Future goals:
  - Methods that allow for combining data from several experiments.
  - From relative quantities to absolute quantities. Absolute quantities allow for direct comparison between labs (otherwise this is possible only if the labs have used the same reference material, etc.).

Useful overview papers, books
- Design: Churchill, G.A. (2002) Fundamentals of experimental design for cDNA microarrays. Nature Genet. 32
- Analysis: Slonim, D.K. (2002) From patterns to pathways: gene expression data analysis comes of age. Nature Genet. 32
- Normalisation: Quackenbush, J. (2002) Microarray normalisation and transformation. Nature Genet. 32
- Pitfalls: Simon, R. et al. (2003) Pitfalls in the Use of DNA Microarray Data for Diagnostic and Prognostic Classification. J Natl Cancer Inst 95
- Books:
  - Baldi & Hatfield (2002) DNA Microarrays and Gene Expression. Cambridge University Press
  - Speed, T. (2003) Statistical Analysis of Gene Expression Microarray Data. Chapman & Hall

Acknowledgement: Nicola Armstrong (EURANDOM)