A Quantitative Overview to Gene Expression Profiling in Animal Genetics
Armidale Animal Breeding Summer Course, UNE, Feb. 2006


Analysis of (cDNA) Microarray Data: Part I. Sources of Bias and Normalisation

MICROARRAY ANALYSIS: My (Educated?) View

1. Data included in GEXEX
   a. Whole data stored and "securely" available
   b. GP3xCLI on each hybridisation
2. Relaxed data acquisition criteria
   a. Signal to Noise > 1.00 (more relaxed criteria exist)
   b. Mean to Median > 0.85 (Tran et al. 2002)
3. Data Normalisation
4. Mixed-Model Equations
   a. Check residuals (plot residuals vs predicted)
   b. Check REML estimates of variance components
   c. Proportion of total variance due to Gene x Variety
5. Process Gene x Treatment BLUPs -> Differentially Expressed Genes
   a. t-statistics -> Z-score -> P-value
   b. Mixtures of distributions -> posterior probabilities
6. Process Differentially Expressed genes
   a. Hierarchical clustering
   b. Gene ontology analysis

MICROARRAY ANALYSIS: BASIC PIECES FOR SIGNAL DETECTION

Foreground, RED and GREEN:    Rf, Gf
Background, RED and GREEN:    Rb, Gb
Background-corrected:         RED: R = Rf - Rb;  GREEN: G = Gf - Gb  (True Signals!)
Log-transformed:              log2(R), log2(G)
Difference ("Minus"):         M = log2(R) - log2(G) = log2(R/G)
Mean ("Average"):             A = 0.5 * ( log2(R) + log2(G) ) = 0.5 * log2(R*G)
MA-Plots:                     ...to come
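The definitions above can be turned into a few lines of code. This is only a minimal sketch: the function name `ma_values` and its argument names are made up for illustration, and returning `None` when a background-corrected channel is not positive is an assumption of the sketch, not something the slide prescribes.

```python
import math

def ma_values(r_fg, r_bg, g_fg, g_bg):
    """Return (M, A) for one spot, or None if a corrected channel is not positive."""
    r = r_fg - r_bg  # background-corrected RED
    g = g_fg - g_bg  # background-corrected GREEN
    if r <= 0 or g <= 0:
        return None  # log2 is undefined here (assumption: skip the spot)
    log_r, log_g = math.log2(r), math.log2(g)
    m = log_r - log_g          # "Minus":   log2(R/G)
    a = 0.5 * (log_r + log_g)  # "Average": 0.5 * log2(R*G)
    return m, a

# Hypothetical spot: R = 1024 and G = 256 after background correction.
m, a = ma_values(r_fg=1124.0, r_bg=100.0, g_fg=356.0, g_bg=100.0)
print(m, a)  # 2.0 9.0
```

A spot with M = 2 is four-fold brighter in RED than in GREEN; A = 9 places it at an average log2 intensity of 9.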

Data Acquisition Criteria: The Red/Green intensities can be spatially biased

Data Acquisition Criteria: The Red/Green intensities can be intensity-biased
[Figure: MA-plot. Values should scatter around zero.]

Data Acquisition Criteria: Background Correction: Why bother?

Data Acquisition Criteria: Log-transformation: Why bother?
[Figure: RED versus GREEN]

Data Acquisition Criteria: MA-Plots: All versus only valid signals

Data Acquisition Criteria
- Signal to Noise Ratio
- Mean to Median Correlation
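As a rough illustration of how the two criteria could be applied, the sketch below filters spots with the thresholds quoted in the overview (Signal to Noise > 1.00, Mean to Median > 0.85). How each quantity is computed from the image is not spelled out in the slides, so both are treated here as precomputed inputs; the function name, the dictionary keys and the example values are all hypothetical.

```python
def passes_acquisition_criteria(snr, mean_to_median, min_snr=1.00, min_mtm=0.85):
    """True if a spot meets both relaxed data acquisition criteria."""
    return snr > min_snr and mean_to_median > min_mtm

# Made-up spots with precomputed quality measures.
spots = [
    {"id": "spot1", "snr": 3.2, "mean_to_median": 0.97},
    {"id": "spot2", "snr": 0.8, "mean_to_median": 0.95},  # fails signal to noise
    {"id": "spot3", "snr": 2.1, "mean_to_median": 0.60},  # fails mean to median
]

valid = [s["id"] for s in spots
         if passes_acquisition_criteria(s["snr"], s["mean_to_median"])]
print(valid)  # ['spot1']
```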

Data Normalisation

Normalisation is an attempt to correct for systematic bias. It allows you to compare data from one array to another. Systematic bias can be introduced into microarray experiments at all stages. We need to:
- Avoid it (as much as possible)
- Recognise it
- Correct for it
- Discard unrecoverable data

In practice we do not always understand the data, so inevitably some biology will be removed too (or at least not revealed).

Data Normalisation (figure labels: Tumor; Pool of Cell Lines). Source: Catherine Ball (Stanford)
- Differential labeling efficiency of dyes
- Different amounts of starting material
- Different amounts of RNA in each channel
- Differential efficiency of hybridization over the slide surface
- Differential efficiency of scanning in each channel

Systematic Bias: Sources ...
- Different labeling efficiencies or dye effects
- Scanner malfunction
- Differences in concentration of DNA on arrays (plate effects)
- Printing or tip problems
- Uneven hybridization
- Batch bias
- Experimenter issues

... and Dealing with it
- Detect and recognize the effect -> Note something odd
- Determine magnitude and effect on data -> Try a few methods
- Identify source of bias -> Think big!
- Eliminate or reduce contributing factors
- Correct data
- Discard uncorrectable data

Systematic Bias: Labeling Efficiencies Cause Bias
- One channel of a two-channel array has higher intensity than the other (usually GREEN).
- Most common source of recognizable bias.
- Solution: the easiest bias to address (e.g. dye-swaps, balanced loops).
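To see why a dye-swap helps, the sketch below assumes the dye effect is an additive offset on the log2 scale and is the same on both slides of the pair. That is a simplifying assumption made for illustration, and the function and variable names are hypothetical.

```python
def dye_swap_estimate(m_forward, m_swapped):
    """Combine M = log2(R/G) from a dye-swap pair.

    m_forward: sample 1 labelled RED, sample 2 labelled GREEN.
    m_swapped: same samples with the dyes exchanged.
    """
    return 0.5 * (m_forward - m_swapped)

true_log_ratio = 1.0  # hypothetical true log2(sample1/sample2)
dye_bias = -0.5       # hypothetical offset (GREEN brighter than RED)

m_forward = true_log_ratio + dye_bias   # bias adds to the true signal
m_swapped = -true_log_ratio + dye_bias  # signal flips sign, bias does not

estimate = dye_swap_estimate(m_forward, m_swapped)
print(estimate)  # 1.0
```

Because the bias enters both slides with the same sign while the biological signal changes sign, the half-difference cancels the bias under the stated assumption.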

Systematic Bias: Scanning (operator?) Bias
- Misaligned lasers can cause big problems.
- In this case, the two channels are slightly out of register.
- Solution: fix the scanner and repeat.

Systematic Bias: Printing (operator?) Bias
- Irregularly shaped spots are often observed (printing error).
- Slides from the same printing batch cluster together.
- Solution: probably limited to better printing technique and image analysis, rather than normalization.

Systematic Bias: Probe Bias
- Different concentrations of probes might produce patterns in arrays.
- The biological role of probes can produce patterns in arrays.
- These patterns can create a spatial bias that is not artificial, but biological.

Systematic Bias: Probe Bias
- Probes arranged on the array based on biological function cause spatial bias.
- Solution: avoid arranging reporters based on function; know your experimental design.
[Figure labels: Coding regions; Intergenic regions]

Systematic Bias: Hybridisation (operator?) Bias
- Poor technique during hybridisation can cause a spatial bias.
- The operator is one of the largest sources of systematic bias.
- Experiments done by the same operator often cluster together more tightly than warranted by the biology.
- Solution: consistent methods, successful techniques.

Data Normalisation ...and other beautifying techniques

Technique                            Choices                      Aim (Real)                      Aim (Ideal)
Transformation "To Near Normality"   Log2; Lin-Log                Numerically tractable           Gaussian
Normalisation "Location"             Location parameter:          Account for systematic effects  Gaussian
                                     1. Mean
                                     2. Median
                                     3. Regression(s) (LOWESS)
Standardisation "Scale"              Scale parameter              Stabilise variance              Gaussian

Data Normalisation: Transformation ...to near normality
Solution: Explore the entire Box-Cox family of power transformations. Maximum at λ ≈ 0, hence use the log-transformation.
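The slide's formula did not survive in the transcript, so the sketch below uses the standard textbook form of the Box-Cox family, (x^λ - 1)/λ with log(x) at λ = 0, and assumes that "maximum" refers to the profile log-likelihood of λ. The simulated log-normal intensities, the grid of λ values and all names are illustrative assumptions, not the course data.

```python
import math
import random
import statistics

def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    if abs(lam) < 1e-12:
        return math.log(x)  # limiting case lambda -> 0
    return (x ** lam - 1.0) / lam

def profile_loglik(values, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(values)
    transformed = [box_cox(v, lam) for v in values]
    var = statistics.pvariance(transformed)
    jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(var) + jacobian

# Simulated raw intensities (assumption: log-normal, i.e. "exponential-like").
random.seed(1)
intensities = [random.lognormvariate(7.0, 1.0) for _ in range(2000)]

grid = [k / 20.0 for k in range(-40, 41)]  # lambda from -2 to 2, step 0.05
best_lambda = max(grid, key=lambda lam: profile_loglik(intensities, lam))
print(best_lambda)  # expected to land close to 0 for data like these
```

A value of λ near 0 corresponds to the log-transformation, which is the argument the slide makes.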

Data Normalisation: Transformation ...to near normality
[Figure: Raw Data ...exponential-like; Log2 Transformed ...normal-like]

Data Normalisation: Transformation ...to near normality
Lin-Log Transformation, where x = background-corrected intensity = Fg - Bg

Data Normalisation: Transformation ...to near normality

Edwards' transformation and the Lin-Log transformation are both attempts to use the entire data, not only the spots for which foreground is greater than background. The reasoning is that errors are linear for small signals and multiplicative for large signals. The search for, and choice of, the transformation parameter can be rather unconvincing (e.g. it differs between array slides).

Solution:
- Use Log2 if Foreground > Background.
- Otherwise, use a small arbitrary value (say 0), or simply disregard the spot.
- Alternatively, use only the Foreground and take its Log2.
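The pragmatic rule in the solution above can be written down directly. This is a small sketch with hypothetical function and parameter names; `fallback=None` stands for "disregard the spot" and `fallback=0.0` for the "small arbitrary value" option.

```python
import math

def log2_signal(fg, bg, fallback=None):
    """Log2 of the background-corrected signal when Foreground > Background."""
    if fg > bg:
        return math.log2(fg - bg)
    return fallback  # None = disregard; 0.0 = small arbitrary value

def log2_foreground(fg):
    """Alternative: ignore the background and log-transform the foreground."""
    return math.log2(fg)

print(log2_signal(1124.0, 100.0))              # 10.0
print(log2_signal(90.0, 100.0))                # None (spot disregarded)
print(log2_signal(90.0, 100.0, fallback=0.0))  # 0.0
print(log2_foreground(1024.0))                 # 10.0
```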

Location Normalisation

Normalised value: log2(R/G) - c = M - c, where c is the location parameter.

GLOBAL:
  Mean:    c = mean of M's
  Median:  c = median of M's
           -> Assumption: changes roughly symmetric around mean or median
  LOWESS:  c = weighted regression of M on A
           -> Assumption: changes roughly symmetric at all intensities
LOCAL:
  LOWESS:  c = c(i) = weighted regression of M on A within print-tip group i

LOWESS = Locally WEighted Regression and Smoothing Scatterplots
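For the two global constants, the calculation is a one-liner per method. The sketch below is illustrative only; the function name and the example M values are made up.

```python
import statistics

def normalise_global(m_values, method="median"):
    """Subtract a single location constant c from every M on the slide."""
    if method == "median":
        c = statistics.median(m_values)
    elif method == "mean":
        c = statistics.mean(m_values)
    else:
        raise ValueError("method must be 'median' or 'mean'")
    return [m - c for m in m_values]

# Hypothetical slide: most spots sit near M = 1 (a dye offset), one spot is far higher.
m_values = [0.9, 1.1, 1.0, 3.0, 0.8]
normalised = normalise_global(m_values, method="median")
print(statistics.median(normalised))  # 0.0
```

The median is less affected than the mean by the one large M, which is why it is often preferred when a few genes change strongly.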

Location Normalisation: LOWESS = Locally WEighted Regression and Smoothing Scatterplots. Source: G Rosa 2003.
[Figures: LOWESS fit; SAS code (from "Genetic analysis of complex traits using SAS"); normalised intensities]
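The SAS code from the slides is not reproduced in the transcript. As a stand-in, here is a deliberately simplified LOWESS-style smoother in plain Python: tricube weights, a local straight-line fit at each spot, and no robustness iterations (a full LOWESS adds those). The function name, the `frac` default and the toy data are assumptions of this sketch.

```python
def lowess_fit(a_values, m_values, frac=0.5):
    """Fitted M at each A from tricube-weighted local linear regression."""
    n = len(a_values)
    k = max(3, int(round(frac * n)))  # number of neighbours per local fit
    fitted = []
    for a0 in a_values:
        dists = sorted(abs(a - a0) for a in a_values)
        h = dists[min(k, n) - 1] or 1e-12  # bandwidth = distance to k-th neighbour
        sw = swx = swy = swxx = swxy = 0.0
        for a, m in zip(a_values, m_values):
            u = abs(a - a0) / h
            if u >= 1.0:
                continue  # outside the local window
            w = (1.0 - u ** 3) ** 3  # tricube weight
            x = a - a0               # centre on the target point
            sw += w
            swx += w * x
            swy += w * m
            swxx += w * x * x
            swxy += w * x * m
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # fall back to a weighted mean
        else:
            # Intercept of the local line = fitted value at a0 (x = 0).
            fitted.append((swxx * swy - swx * swxy) / denom)
    return fitted

# Toy slide with a purely intensity-dependent bias: M = 0.5 * A - 4.
a_values = [6.0 + 0.4 * i for i in range(21)]
m_values = [0.5 * a - 4.0 for a in a_values]

curve = lowess_fit(a_values, m_values)
normalised = [m - c for m, c in zip(m_values, curve)]
print(max(abs(v) for v in normalised))  # essentially zero
```

On real data the fitted curve c(A) follows the bend in the MA-plot, and M - c(A) scatters around zero at all intensities.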

Location Normalisation (figures; source: Yang et al 2002)
1. None
2. After Global Median
3. Global LOWESS
4. Print-tip-group LOWESS
5. After Print-tip-group LOWESS
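The only extra step in the print-tip-group version is that the location fit is done separately within each group. The sketch below shows just that grouping logic. To stay short it uses the per-group median as a stand-in for the per-group LOWESS curve; any smoother with the same call shape could be passed as `fit`. All names and numbers are hypothetical.

```python
import statistics
from collections import defaultdict

def median_fit(a_values, m_values):
    """Stand-in smoother: a constant c equal to the group's median M."""
    c = statistics.median(m_values)
    return [c] * len(m_values)

def normalise_by_print_tip(m_values, a_values, tip_ids, fit=median_fit):
    """Apply a location fit separately within each print-tip group."""
    groups = defaultdict(list)
    for idx, tip in enumerate(tip_ids):
        groups[tip].append(idx)
    normalised = [0.0] * len(m_values)
    for indices in groups.values():
        a = [a_values[i] for i in indices]
        m = [m_values[i] for i in indices]
        for i, c in zip(indices, fit(a, m)):
            normalised[i] = m_values[i] - c
    return normalised

# Two hypothetical print-tip groups with different offsets.
tip_ids = [1, 1, 1, 2, 2, 2]
a_values = [8.0, 9.0, 10.0, 8.0, 9.0, 10.0]
m_values = [0.5, 0.6, 0.4, -0.3, -0.2, -0.4]

normalised = normalise_by_print_tip(m_values, a_values, tip_ids)
print(normalised[0], normalised[3])  # 0.0 0.0
```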

Location Normalisation

Additional assumption (other than symmetry of changes): the proportion of genes that are Differentially Expressed (DE) is minimal.
Question: Which genes to use?
Answer: Only the ones (housekeeping) that we know are not DE.
Comment: "Boutique" arrays become a nuisance.

Scale Normalisation (Standardisation)

Normalised value: ( log2(R/G) - c(i) ) / a(i)

Notes:
1. The scaling a(i) is such that Var(M) = a(i)^2 σ^2.
2. The estimation requires a ("robust") approximation to the geometric mean, where MAD is the Median Absolute Deviation.
3. It doesn't get any more heuristic (funnier?) than this: "Some scale adjustments may be required so that the relative expression levels from one particular experiment (slide) do not dominate the average relative expression levels across replicate experiments." (Yang et al 2002)
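The estimator itself is missing from the transcript. The sketch below assumes the form usually attributed to Yang et al 2002, namely a(i) = MAD(i) divided by the geometric mean of the MADs over all groups; treat that formula, along with every name and number here, as an assumption rather than a quotation of the slide. It also assumes every group has MAD > 0.

```python
import math
import statistics

def mad(values):
    """Median Absolute Deviation about the median."""
    centre = statistics.median(values)
    return statistics.median([abs(v - centre) for v in values])

def scale_factors(groups):
    """groups: dict of group id -> location-normalised M values. Returns a(i)."""
    mads = {gid: mad(vals) for gid, vals in groups.items()}
    log_gm = sum(math.log(v) for v in mads.values()) / len(mads)
    geometric_mean = math.exp(log_gm)
    return {gid: v / geometric_mean for gid, v in mads.items()}

# Two hypothetical groups with the same centre but different spread.
groups = {
    "A": [-1.0, -0.5, 0.0, 0.5, 1.0],  # MAD = 0.5
    "B": [-4.0, -2.0, 0.0, 2.0, 4.0],  # MAD = 2.0
}

factors = scale_factors(groups)
scaled = {gid: [m / factors[gid] for m in vals] for gid, vals in groups.items()}
print(factors)  # roughly {'A': 0.5, 'B': 2.0}
```

After dividing by a(i), both groups have the same MAD, so neither dominates an average taken across them.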

Data Normalisation ...and other beautifying techniques

Notes:
1. Except Log2, everything else applies only to ratios: M = log2(R/G).
2. Except Log2, everything else applies only within slide.
3. Everything is beautified to identify DE genes straight from the MA-plot, either from a single slide or from a function of M's across slides.
4. The uncertainty in measurements increases as intensity decreases.
5. Measurements close to the detection limit are the most uncertain (cf. Sensitivity).
6. Fold-change measurements ignore these effects.
7. We can calculate an intensity-dependent z-score that measures the ratio relative to the standard deviation in the data.

Data Normalisation ...and other beautifying techniques
[Figure, three panels. (1) Corrected Log10(Ratio) vs Mean(Log10(Intensity)), showing the 2-fold lines and the locally estimated standard deviation of positive and negative ratios (Z = 1, Z = -1). (2) Local Log10(Ratio) Z-score vs Mean(Log10(Intensity)), with Z = 5 and Z = -5 marked. (3) Corrected Log10(Ratio) vs Mean(Log10(Intensity)), showing the 2-fold lines and the contours Z = ±1, ±2, ±5.]
Z > 2 is at the ~95% confidence level. Source: J Pevsner 2004
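One possible way to compute an intensity-dependent z-score is sketched below: spots are ordered by A, the standard deviation of M is estimated in a sliding window of neighbouring spots, and each M is divided by its local standard deviation. The window size, the function name and the toy data are assumptions of this sketch, and the slides do not say which local estimator was used.

```python
import statistics

def local_z_scores(a_values, m_values, window=5):
    """z = M / (standard deviation of M among spots of similar intensity A)."""
    order = sorted(range(len(a_values)), key=lambda i: a_values[i])
    z = [0.0] * len(m_values)
    for rank, i in enumerate(order):
        lo = max(0, rank - window)
        hi = min(len(order), rank + window + 1)
        neighbours = [m_values[j] for j in order[lo:hi]]
        sd = statistics.pstdev(neighbours)  # local spread around the local mean
        z[i] = m_values[i] / sd if sd > 0 else 0.0
    return z

# Toy slide: noisy low-intensity spots (M = +/-1), quiet high-intensity spots (M = +/-0.25).
a_values = [6.0 + 0.1 * i for i in range(20)] + [12.0 + 0.1 * i for i in range(20)]
m_values = [1.0 if i % 2 == 0 else -1.0 for i in range(20)]
m_values += [0.25 if i % 2 == 0 else -0.25 for i in range(20)]
m_values[30] = 1.0  # the same 2-fold change, but at high intensity

z = local_z_scores(a_values, m_values)
print(round(z[10], 2), round(z[30], 2))
```

Spots 10 and 30 show the same fold change (M = 1), yet only the high-intensity one stands out against its local noise, which is the point of notes 4 to 7.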

Normalisation: References

Bilban M, Buehler LK, Head S, Desoye G, Quaranta V. Normalizing DNA microarray data. Curr Issues Mol Biol Apr;4(2):57-64.
Durbin BP, Hardin JS, Hawkins DM, Rocke DM. A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics Jul;18 Suppl 1:S
Kepler TB, Crosby L, Morgan KT. Normalization and analysis of DNA microarray data by self-consistency and local regression. Genome Biol Jun 28;3(7):RESEARCH0037.
Schuchhardt J, Beule D, et al. Normalization Strategies for cDNA Microarrays. NAR (10): E47-e47.
Tran PH, Peiffer DA, Shin Y, Meek LM, Brody JP, Cho KW. Microarray optimizations: increasing spot accuracy and automated identification of true microarray signals. Nucleic Acids Res Jun 15;30(12):e54.
Tseng GC, Oh MK, Rohlin L, Liao JC, Wong WH. Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res Jun 15;29(12):
Tsodikov A, Szabo A, Jones D. Adjustments and measures of differential expression for microarray data. Bioinformatics Feb;18(2):
Yang MC, Ruan QG, Yang JJ, Eckenrode S, Wu S, McIndoe RA, She JX. A statistical method for flagging weak spots improves normalization and ratio estimates in microarrays. Physiol Genomics Oct 10;7(1):45-53.
Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res Feb 15;30(4):e15.