Some views on microarray experimental design. Rainer Breitling, Molecular Plant Science Group & Bioinformatics Research Centre, University of Glasgow, Scotland, UK



Personal Background
– University of Glasgow, Scotland, UK
– Molecular Plant Sciences Group
– Bioinformatics Research Centre
– Functional Genomics Facility

Some common questions in microarray experimental design
– How many arrays will I need?
– Should I pool my samples?
– Which arrays should I choose?
– Which samples should I put together on one array?

Why are microarrays special?
– produce large amounts of data instantaneously
– can look for unexpected effects
– are still quite expensive → almost never repeated → careful design necessary before you start

How many replicates? As many as possible.
Statistics says: the more replicates, the better your estimate of expression. The improvement is asymptotic, so the first few replicates you add have the strongest effect and each further one helps a little less.
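The diminishing return from extra replicates can be illustrated with the standard error of a mean, which shrinks as 1/sqrt(n) for independent replicates. The sketch below is only an illustration: the function name `standard_error` and the value 0.5 for the standard deviation of the log-ratios are assumptions, not numbers from the slides.

```python
import math


def standard_error(sigma, n_replicates):
    """Standard error of the mean of n independent replicates with spread sigma."""
    if n_replicates < 1:
        raise ValueError("need at least one replicate")
    return sigma / math.sqrt(n_replicates)


if __name__ == "__main__":
    sigma = 0.5  # assumed standard deviation of the log-ratios (illustrative only)
    for n in (1, 2, 3, 5, 10, 20):
        # going from 1 to 5 replicates helps far more than going from 10 to 20
        print(n, round(standard_error(sigma, n), 3))
```

Halving the standard error requires four times as many replicates, which is why the first few added arrays matter most.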

How many replicates?
– α: significance level (probability of a false positive, FP)
– 1-β: power to detect differences (probability of detecting a true positive, TP)
– σ: standard deviation of the log-ratios
– δ: detectable difference between class mean log-ratios
– z: percentile of the standard normal distribution
→ n: required number of arrays (reference design)
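The formula that connects these symbols is not preserved in the transcript. A commonly used sample-size formula for comparing two classes in a reference design, assumed here because it uses exactly these quantities, is n = 4 (z_{α/2} + z_β)² / (δ/σ)², where z_{α/2} and z_β are the upper α/2 and β percentiles of the standard normal distribution. The sketch below evaluates that assumed formula; the function name `required_arrays` and the example values are illustrative only.

```python
import math
from statistics import NormalDist


def required_arrays(alpha, power, sigma, delta):
    """Total arrays for a two-class comparison, assuming
    n = 4 * (z_{alpha/2} + z_beta)^2 / (delta / sigma)^2."""
    z = NormalDist().inv_cdf            # quantile function of the standard normal
    z_alpha = z(1 - alpha / 2)          # two-sided significance level
    z_beta = z(power)                   # power = 1 - beta
    n = 4 * (z_alpha + z_beta) ** 2 / (delta / sigma) ** 2
    return math.ceil(n)                 # round up to whole arrays


if __name__ == "__main__":
    # assumed example: 2-fold change on the log2 scale (delta = 1), sigma = 0.5
    print(required_arrays(alpha=0.001, power=0.95, sigma=0.5, delta=1.0))
```

Under this formula, noisier samples (larger σ) or smaller target effects (smaller δ) raise the required number of arrays quadratically.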

How many replicates? Five.
Experience shows: for most common experiments you get a reasonable list of differentially expressed genes with 5 replicates.

How many replicates? Three.
One to convince yourself, one to convince your boss, one just in case...

How many replicates? It depends on
– the quality of the sample
– the magnitude of the expected effect
– the experimental design
– the method of analysis

The quality of the sample
– smaller samples (single cells) are noisier than large samples (tissue homogenates)
– cell cultures are less noisy than patient biopsies
– sample pooling can decrease noise, if individual variation is not of interest

The magnitude of the effect
Microarrays are very sensitive. To keep effects small:
– use early time points, gentle stimuli
– never compare dogs and donuts
If you get a list of 2000 genes that are significantly changed, your experiment failed!

The magnitude of the effect: some problematic cases
– stably transfected cell lines (are they still the same cells?)
– knock-out organisms (even the same tissue can be different)
– local changes may be diluted → cell isolation will increase noise

The experimental design
Three major options:
– reference design (flexible)
– balanced block design (efficient)
– loop design (elegant)

The experimental design
– loop designs can save samples...
– ...but they can cause interpretation nightmares in less simple cases (use for large studies, if you have a full-time statistician in the team)
[Figure: samples A, B, C and D arranged in a loop, next to the same samples each hybridized against a reference R]
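The three design options can be written down as lists of two-channel hybridizations. The sketch below is a minimal illustration, not a design tool: the function names, the tuple layout (first channel, second channel) and the alternating dye orientation in the block design are assumptions made for this example.

```python
def reference_design(samples, reference="R"):
    """Each sample is hybridized against a common reference on its own array."""
    return [(s, reference) for s in samples]


def loop_design(samples):
    """Each sample is hybridized against the next one, closing the loop at the end."""
    n = len(samples)
    return [(samples[i], samples[(i + 1) % n]) for i in range(n)]


def balanced_block_design(class_a, class_b):
    """One sample of each class per array, alternating the dye orientation
    so that each class is labelled equally often with each dye (assumed layout)."""
    if len(class_a) != len(class_b):
        raise ValueError("balanced block design needs equal class sizes")
    arrays = []
    for i, (a, b) in enumerate(zip(class_a, class_b)):
        arrays.append((a, b) if i % 2 == 0 else (b, a))
    return arrays


if __name__ == "__main__":
    samples = ["A", "B", "C", "D"]
    print(reference_design(samples))  # 4 arrays, every sample measured once
    print(loop_design(samples))       # 4 arrays, every sample measured twice
    print(balanced_block_design(["a1", "a2"], ["b1", "b2"]))
```

With four samples both the reference and the loop layout use four arrays, but the loop spends no channel on a reference and measures every sample twice.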

The method of analysis
– Golub et al. (1999) data set
– 38 leukemia patient bone marrow samples, hybridized individually to Affymetrix microarrays
– Differential expression between two leukemia types was examined, using random subsets of the complete dataset
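The subsampling idea can be sketched as follows: draw a random subset with the same number of samples from each class, then compute a per-gene test statistic on that subset. This is only an assumed outline of such a procedure, not the analysis actually run on the Golub data; the class labels, the helper names and the choice of Welch's t-statistic are all hypothetical.

```python
import math
import random
from statistics import mean, variance


def draw_balanced_subset(labels, per_class, seed=None):
    """Return sorted indices of a random subset with per_class samples of each label."""
    rng = random.Random(seed)
    chosen = []
    for label in sorted(set(labels)):
        indices = [i for i, lab in enumerate(labels) if lab == label]
        chosen.extend(rng.sample(indices, per_class))
    return sorted(chosen)


def welch_t(x, y):
    """Welch's t-statistic for two groups of expression values (one gene)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


if __name__ == "__main__":
    labels = ["type1"] * 6 + ["type2"] * 6   # hypothetical class labels
    subset = draw_balanced_subset(labels, per_class=3, seed=1)
    print(subset)
    print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # toy values, not real data
```

Repeating the draw for different subset sizes shows how stable a gene list is when fewer replicates are available.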

The method of analysis: iterative GroupAnalysis (iGA)
[Figure: functional gene classes listed for each time point (0h, 9.5h, 11.5h, 13.5h, 15.5h, 18.5h, 20.5h). Classes shown include purine base metabolism, tricarboxylic acid cycle, heat shock protein activity, cell wall (sensu Fungi), respiratory chain complexes II, III and IV (sensu Eukarya), spermine transporter activity, response to stress, oxidative phosphorylation (succinate to ubiquinone), glycogen metabolism, polyamine transport, succinate dehydrogenase (ubiquinone) activity, glycogen (starch) synthase activity, cytochrome c oxidase activity, glutamate biosynthesis, fructose transporter activity, vacuolar protein catabolism, glyoxylate cycle, mannose transporter activity, aerobic respiration and hexose transport.]
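The slide names iGA without describing it. As a rough sketch of the general idea, assuming the usual description of the method: genes are ranked by change, and for each functional class a hypergeometric test asks how surprising it is to find its first k members within the top t genes, keeping the smallest p-value over all cut-offs. The code below illustrates that assumed principle only; it is not the published implementation, and the function names are made up for this example.

```python
from math import comb


def hypergeom_tail(total, members, drawn, observed):
    """P(X >= observed) when drawing `drawn` genes from `total`, of which `members` are in the class."""
    upper = min(members, drawn)
    hits = sum(comb(members, j) * comb(total - members, drawn - j)
               for j in range(observed, upper + 1))
    return hits / comb(total, drawn)


def iga_min_pvalue(ranked_genes, group):
    """Smallest hypergeometric p-value over all cut-offs that end at a class member.
    Returns (p_value, rank_of_cutoff)."""
    total = len(ranked_genes)
    members = set(group) & set(ranked_genes)
    best_p, best_rank = 1.0, None
    seen = 0
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene in members:
            seen += 1
            p = hypergeom_tail(total, len(members), rank, seen)
            if p < best_p:
                best_p, best_rank = p, rank
    return best_p, best_rank


if __name__ == "__main__":
    ranked = [f"g{i}" for i in range(1, 21)]   # hypothetical ranked gene list
    print(iga_min_pvalue(ranked, {"g1", "g2", "g3"}))
```

Because the cut-off is chosen per class, no fixed threshold for "differentially expressed" has to be set in advance.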

Graph-based iterative GroupAnalysis (GiGA)
[Figure: network containing the glyoxylate cycle, citrate (TCA) cycle, oxidative phosphorylation (complex V), respiratory chain complex III and respiratory chain complex II]

What is a good replicate?
The experiment your competitor at the other side of the globe would do to see if your results are reproducible
– Vary “all” parameters: challenge your results
– Prepare new samples, from new cultures, using new buffers and new graduate students
– Remember to produce matched controls

What is a “bad” replicate?
– technical replicates (i.e. hybridizing the same sample repeatedly)
– dye-swapping experiments (usually gene-specific dye bias is not a big issue, and dye balancing is more efficient anyway)
– pooled samples, hybridized repeatedly
– the same preparation, only labelled twice

Should samples be pooled?
– most samples are already pooled: they come from multiple cells
– pool to increase amount of mRNA, but only as much as necessary
– prepare independent pools to assess variation
– problems: bias, “contamination”, outliers, information loss...
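The effect of pooling on noise can be sketched with a simple variance model. Everything here is an assumption made for illustration: biological variation between individuals and technical variation per array are taken to be independent, and a pool of k individuals is taken to average the biological component perfectly. The function name and the example numbers are hypothetical.

```python
def variance_of_mean(sigma_bio, sigma_tech, pool_size, n_arrays):
    """Variance of the mean log-ratio over n_arrays arrays, each hybridized with
    an independent pool of pool_size individuals (idealized pooling model)."""
    if pool_size < 1 or n_arrays < 1:
        raise ValueError("pool_size and n_arrays must be at least 1")
    per_array = sigma_bio ** 2 / pool_size + sigma_tech ** 2
    return per_array / n_arrays


if __name__ == "__main__":
    # assumed spreads: biological 1.0, technical 0.5 (illustrative only)
    for pool_size in (1, 2, 4, 8):
        print(pool_size, variance_of_mean(1.0, 0.5, pool_size, n_arrays=3))
```

In this model pooling reduces only the biological term, so the technical variance sets a floor, and variation can only be assessed at all if several independent pools are hybridized.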

Which arrays are the best?
– Standard arrays: compare and exchange data easily
– Whole-genome arrays: detect unexpected effects, increase confidence
– Single-color arrays (Affymetrix GeneChip): for more complex comparisons
– Annotated arrays

Further reading
– Dobbin, Shih & Simon (2003) J. Natl. Cancer Inst. 95:
– Yang & Speed (2002) Nature Rev. Genet. 3: 579.
– Breitling (2004)

Contact Rainer Breitling Bioinformatics Research Centre Davidson Building A416