A Quantitative Overview to Gene Expression Profiling in Animal Genetics Armidale Animal Breeding Summer Course, UNE, Feb. 2006 Analysis of (cDNA) Microarray.


1 A Quantitative Overview to Gene Expression Profiling in Animal Genetics Armidale Animal Breeding Summer Course, UNE, Feb. 2006 Analysis of (cDNA) Microarray Data: Part I. Sources of Bias and Normalisation

2 MICROARRAY ANALYSIS: My (Educated?) View
1. Data included in GEXEX
   a. Whole data stored and “securely” available
   b. GP3xCLI on each hybridisation
2. Relaxed data acquisition criteria
   a. Signal to Noise > 1.00 (more relaxed criteria exist)
   b. Mean to Median > 0.85 (Tran et al. 2002)
3. Data Normalisation
4. Mixed-Model Equations
   a. Check Residuals (plot Residuals vs Predicted)
   b. Check REML estimates of Variance Components
   c. Proportion of Total Variance due to Gene x Variety
5. Process Gene x Treatment BLUPs → Differentially Expressed Genes
   a. t-statistics → Z-score → P-value
   b. Mixtures of Distributions → Posterior Probabilities
6. Process Differentially Expressed genes
   a. Hierarchical clustering
   b. Gene ontology analysis

3 MICROARRAY ANALYSIS: BASIC PIECES FOR SIGNAL DETECTION
Foreground RED and GREEN: R_f, G_f
Background RED and GREEN: R_b, G_b
Background-corrected: RED R = R_f – R_b; GREEN G = G_f – G_b (True Signals!)
Log-transformed: Log2(R), Log2(G)
Difference (“Minus”): M = Log2(R) – Log2(G) = Log2(R/G)
Mean (“Average”): A = 0.5 * (Log2(R) + Log2(G)) = 0.5 * Log2(R*G)
MA-Plots …to come
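
The quantities on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `ma_values` and the choice to return `None` when a background-corrected signal is not positive are assumptions, not part of the course material.

```python
import math

def ma_values(r_fg, r_bg, g_fg, g_bg):
    """Return (M, A) for one spot, or None if a corrected signal is not positive."""
    r = r_fg - r_bg  # background-corrected RED
    g = g_fg - g_bg  # background-corrected GREEN
    if r <= 0 or g <= 0:
        return None  # assumption: the log is undefined here, so flag the spot
    log_r = math.log2(r)
    log_g = math.log2(g)
    m = log_r - log_g          # "Minus": log2(R/G)
    a = 0.5 * (log_r + log_g)  # "Average": 0.5 * log2(R*G)
    return m, a

# Hypothetical spots: (R_f, R_b, G_f, G_b)
spots = [(1124, 100, 356, 100), (300, 50, 1050, 50), (80, 90, 400, 60)]
for spot in spots:
    print(spot, ma_values(*spot))
```

Plotting M against A for all valid spots gives the MA-plot used in the following slides.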

4 Data Acquisition Criteria
The Red/Green intensities can be spatially biased

5 Data Acquisition Criteria
The Red/Green intensities can be intensity-biased
MA-Plot: values should scatter around zero

6 Data Acquisition Criteria
Background Correction: Why bother?

7 Data Acquisition Criteria
Background Correction: Why bother?

8 Data Acquisition Criteria
Log-transformation: Why bother?
RED versus GREEN

9 Data Acquisition Criteria
MA-Plots: All versus only valid signals

10 Data Acquisition Criteria
Signal to Noise Ratio
Mean to Median Correlation

11 Data Normalisation
http://genome-www5.stanford.edu/mged/normalization.html
Normalisation is an attempt to correct for systematic bias.
Normalisation allows you to compare data from one array to another.
Systematic bias can be introduced into microarray experiments at all stages.
Need to:
– Avoid it (as much as possible)
– Recognize it
– Correct for it
– Discard unrecoverable data
In practice we do not always understand the data; inevitably some biology will be removed too (or at least not revealed).

12 Data Normalisation
Tumor / Pool of Cell Lines
– Differential labeling efficiency of dyes
– Different amounts of starting material
– Different amounts of RNA in each channel
– Differential efficiency of hybridization over slide surface
– Differential efficiency of scanning in each channel
Source: Catherine Ball (Stanford)

13 Systematic Bias
Sources …
– Different labeling efficiencies or dye effects
– Scanner malfunction
– Differences in concentration of DNA on arrays (plate effects)
– Printing or tip problems
– Uneven hybridization
– Batch bias
– Experimenter issues
…and Dealing with it
– Detect and recognize the effect → Note something odd
– Determine magnitude and effect on data → Try a few methods
– Identify source of bias → Think big!
– Eliminate or reduce contributing factors
– Correct data
– Discard uncorrectable data

14 Systematic Bias
Labeling Efficiencies Cause Bias
One channel of a two-channel array has higher intensity than the other (usually GREEN).
Most common source of recognizable bias.
Solution: The easiest to address (e.g. dye-swaps, balanced loops).

15 Systematic Bias
Scanning (operator?) Bias
Mis-aligned lasers can cause big problems
In this case, the two channels are slightly out of register
Solution: fix the scanner and repeat

16 Systematic Bias
Printing (operator?) Bias
Irregularly shaped spots are often observed (printing error)
Slides from the same printing batch cluster together
Solution: Probably limited to better printing technique and image analysis, rather than normalization

17 Systematic Bias
Probe Bias
Different concentrations of probes might produce patterns in arrays
Biological role of probes can produce patterns in arrays
These patterns can create a spatial bias that is not artificial, but biological

18 Systematic Bias
Probe Bias
Probes arranged on the array based on biological function cause spatial bias
Solution: avoid arranging reporters based on function, know your experimental design
Figure labels: Coding regions; Intergenic regions

19 Systematic Bias
Hybridisation (operator?) Bias
Poor technique during hybridisation can cause a spatial bias
Operator is one of the largest sources of systematic bias
Experiments done by the same operator often cluster together more tightly than warranted by the biology
Solution: Consistent methods, successful techniques

20 Data Normalisation …and other beautifying techniques
Technique | Choices | Aim (Real) | Aim (Ideal)
Transformation “To Near Normality” | Log2; Lin-Log | Numerically tractable | Gaussian
Normalisation “Location” | Location Parameter: 1. Mean 2. Median 3. Regression(s) (LOWESS) | Account for systematic effects | Gaussian
Standardisation “Scale” | Scale Parameter | Stabilise variance | Gaussian

21 Data Normalisation
Transformation …to near normality
Solution: Explore the entire Box-Cox family of power transformations.
Maximum at λ ≈ 0, hence use the log-transformation
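
The formula shown on this slide did not survive in the transcript. As a hedged sketch, the standard Box-Cox family is y = (x^λ – 1)/λ for λ ≠ 0 and y = log(x) for λ = 0, and λ is usually chosen by maximising a profile log-likelihood. The Python below evaluates that likelihood on a grid. The data are hypothetical (exact quantiles of a log-normal distribution, standing in for raw intensities), and the function names are illustrative.

```python
import math
from statistics import NormalDist, pvariance

def box_cox(x, lam):
    """Box-Cox power transformation; lam = 0 is the log limit."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def profile_loglik(data, lam):
    """Box-Cox profile log-likelihood, up to an additive constant."""
    n = len(data)
    transformed = [box_cox(x, lam) for x in data]
    jacobian = (lam - 1.0) * sum(math.log(x) for x in data)
    return -0.5 * n * math.log(pvariance(transformed)) + jacobian

# Hypothetical intensities: exact quantiles of a log-normal distribution
n = 200
std_normal = NormalDist()
intensities = [math.exp(8.0 + std_normal.inv_cdf((i + 0.5) / n)) for i in range(n)]

# Search lambda on a grid from -2.0 to 2.0 in steps of 0.1
grid = [k / 10 for k in range(-20, 21)]
best_lambda = max(grid, key=lambda lam: profile_loglik(intensities, lam))
print("lambda maximising the profile likelihood:", best_lambda)
```

For data of this shape the maximum falls at λ = 0, which is the argument the slide makes for the log-transformation; real intensities will only approximate this.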

22 Data Normalisation
Transformation …to near normality
Raw Data …exponential-like
Log2 Transformed …normal-like

23 Data Normalisation
Transformation …to near normality
Lin-Log Transformation
x = background corrected = Fg – Bg

24 Data Normalisation
Transformation …to near normality
The Edwards transformation and the Lin-Log transformation are both attempts to use the entire data, not only the spots for which foreground is greater than background.
The reasoning is that errors are linear for small signals and multiplicative for large signals.
The search for and choice of the transformation parameter could be rather unconvincing (e.g. different for different array slides).
Solution: Use Log2 if Foreground > Background; otherwise, use a small arbitrary value (say 0), or simply disregard.
Alternatively: Use only Foreground and Log2 it
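
The rule in the "Solution" can be written down directly. In this sketch the function names and the `fallback` argument are hypothetical, and the fallback value is assumed to apply on the log scale; pass `fallback=None` to disregard the spot instead.

```python
import math

def log2_signal(foreground, background, fallback=0.0):
    """Log2 of the background-corrected signal if Fg > Bg, else `fallback`.

    Assumption: `fallback` is the "small arbitrary value" on the log scale;
    use fallback=None to mark the spot as disregarded.
    """
    if foreground > background:
        return math.log2(foreground - background)
    return fallback

def log2_foreground_only(foreground):
    """The alternative on the slide: ignore background and log2 the foreground."""
    return math.log2(foreground)

print(log2_signal(1124, 100))               # foreground above background
print(log2_signal(90, 100))                 # falls back to 0.0
print(log2_signal(90, 100, fallback=None))  # spot disregarded
print(log2_foreground_only(1024))
```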

25 Location Normalisation
Location Parameter: Log2(R/G) – c = M – c
GLOBAL:
  Mean: c = Mean of M’s
  Median: c = Median of M’s
  → Assumption: Changes roughly symmetric around Mean or Median
  LOWESS: c = Weighted Regression of M on A
  → Assumption: Changes roughly symmetric at all intensities
LOCAL:
  LOWESS: c = c(i) = Weighted Regression of M on A within print-tip-group i
LOWESS = Locally WEighted Regression and Smoothing Scatterplots
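
The three choices of c can be sketched as follows. This is a simplified illustration, not the SAS code referenced later in the course: the smoother is a plain local linear regression with tricube weights and omits the robustness iterations of full LOWESS, and all function names and the `span` default are assumptions.

```python
from statistics import median

def global_median_normalise(m_values):
    """GLOBAL: subtract one constant c, the median of all M values."""
    c = median(m_values)
    return [m - c for m in m_values]

def lowess_fit(a_values, m_values, span=0.4):
    """Estimate c(A) at every spot by local linear regression of M on A."""
    n = len(a_values)
    k = min(n, max(3, round(span * n)))  # neighbours used in each local fit
    fitted = []
    for a0 in a_values:
        dists = sorted(abs(a - a0) for a in a_values)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest spot
        sw = swx = swy = swxx = swxy = 0.0
        for a, m in zip(a_values, m_values):
            d = abs(a - a0) / h
            if d >= 1.0:
                continue
            w = (1.0 - d ** 3) ** 3  # tricube weight
            dx = a - a0              # centred on a0, so the intercept is the fit
            sw += w
            swx += w * dx
            swy += w * m
            swxx += w * dx * dx
            swxy += w * dx * m
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw)
    return fitted

def lowess_normalise(a_values, m_values, span=0.4):
    """GLOBAL LOWESS: M - c(A), with c fitted on all spots."""
    c = lowess_fit(a_values, m_values, span)
    return [m - ci for m, ci in zip(m_values, c)]

def print_tip_lowess_normalise(a_values, m_values, groups, span=0.4):
    """LOCAL LOWESS: fit c(i) separately within each print-tip group i."""
    result = [0.0] * len(m_values)
    for g in set(groups):
        idx = [j for j, gj in enumerate(groups) if gj == g]
        sub_a = [a_values[j] for j in idx]
        sub_m = [m_values[j] for j in idx]
        for j, value in zip(idx, lowess_normalise(sub_a, sub_m, span)):
            result[j] = value
    return result

# Hypothetical spots with an intensity-dependent dye bias
a = [6 + 0.1 * i for i in range(50)]
m = [0.5 - 0.1 * (x - 8) for x in a]
print(max(abs(v) for v in lowess_normalise(a, m)))
```

A global constant cannot remove a bias that changes with A, which is why the regression-based versions are offered as separate choices.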

26 Location Normalisation
LOWESS = Locally WEighted Regression and Smoothing Scatterplots
Source: G Rosa 2003.

27 Location Normalisation
LOWESS = Locally WEighted Regression and Smoothing Scatterplots
SAS Code
Source: G Rosa 2003.
Genetic analysis of complex traits using SAS, ISBN 1-59047-507-0

28 Location Normalisation
LOWESS = Locally WEighted Regression and Smoothing Scatterplots
Normalised Intensities
Source: G Rosa 2003.

29 Location Normalisation
LOWESS = Locally WEighted Regression and Smoothing Scatterplots
Source: G Rosa 2003.

30 Location Normalisation
None
Source: Yang et al 2002

31 Location Normalisation
After Global Median
Source: Yang et al 2002

32 Location Normalisation
Global Lowess
Source: Yang et al 2002

33 Location Normalisation
Print-tip-group Lowess
Source: Yang et al 2002

34 Location Normalisation
After Print-tip-group Lowess
Source: Yang et al 2002

35 Location Normalisation
Additional Assumption (other than symmetry of changes): The proportion of genes that are Differentially Expressed (DE) is minimal
Question: Which genes to use?
Answer: Only the ones (housekeeping) that we know are not DE
Comment: “Boutique” arrays become a nuisance

36 Scale Normalisation (Standardisation)
(Log2(R/G) – c(i)) / a(i)
Notes:
1. The scaling a(i) is such that Var(M) = a(i)^2 σ^2
2. The estimation requires a (“robust”) approximation to the geometric mean, where MAD is the Median Absolute Deviation.
3. It doesn’t get any more heuristic (funnier?) than this: “Some scale adjustments may be required so that the relative expression levels from one particular experiment (slide) do not dominate the average relative expression levels across replicate experiments.” Yang et al 2002
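
The estimator itself is missing from the transcript. A common form, which we take here as an assumption about what the slide showed, is the one proposed by Yang et al. 2002: a(i) = MAD(i) divided by the geometric mean of the MADs over all groups (print-tip groups or slides). The sketch below assumes the M values have already been location-normalised and that every MAD is positive; the function names are illustrative.

```python
import math
from statistics import median

def mad(values):
    """Median Absolute Deviation from the median."""
    med = median(values)
    return median([abs(v - med) for v in values])

def scale_factors(m_by_group):
    """a(i) = MAD(i) / geometric mean of all MADs (assumes every MAD > 0)."""
    mads = [mad(group) for group in m_by_group]
    geometric_mean = math.exp(sum(math.log(d) for d in mads) / len(mads))
    return [d / geometric_mean for d in mads]

def scale_normalise(m_by_group):
    """Divide each group's (location-normalised) M values by its a(i)."""
    factors = scale_factors(m_by_group)
    return [[m / a for m in group] for group, a in zip(m_by_group, factors)]

# Hypothetical location-normalised M values for two groups with different spread
groups = [[-1.0, 0.0, 1.0], [-4.0, 0.0, 4.0]]
print(scale_factors(groups))
print(scale_normalise(groups))
```

Because the factors are scaled by their geometric mean, their product is 1, so the adjustment evens out the spread across groups without changing the overall scale.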

37 Data Normalisation …and other beautifying techniques
Notes:
1. Except Log2, everything else applies only to Ratios: M = Log2(R/G)
2. Except Log2, everything else applies only within slide
3. Everything is beautified to identify DE genes straight from the MA-plot, either from a single slide or from a function of M’s across slides.
4. The uncertainty in measurements increases as intensity decreases
5. Measurements close to the detection limit are the most uncertain (cf. Sensitivity)
6. Fold-change measurements ignore these effects
7. We can calculate an intensity-dependent z-score that measures the ratio relative to the standard deviation in the data
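
Note 7 can be illustrated with a sliding window over spots ordered by A. This is one possible reading, not necessarily the method behind the figures that follow: the window size, the use of the population standard deviation around the local mean, and the function name are all assumptions, and M is assumed to be location-normalised already.

```python
from statistics import pstdev

def local_z_scores(a_values, m_values, window=11):
    """Z-score of each M relative to the SD of M among spots of similar A."""
    n = len(m_values)
    order = sorted(range(n), key=lambda i: a_values[i])  # spots ranked by A
    half = window // 2
    z = [0.0] * n
    for rank, i in enumerate(order):
        lo = max(0, rank - half)
        hi = min(n, rank + half + 1)
        neighbours = [m_values[order[j]] for j in range(lo, hi)]
        sd = pstdev(neighbours)  # local standard deviation of M
        z[i] = m_values[i] / sd if sd > 0 else 0.0
    return z

# Hypothetical data: noisy ratios at low intensity, tight ratios at high intensity
a = list(range(40))
m = [(1 if i % 2 == 0 else -1) * (2.0 if i < 20 else 0.5) for i in range(40)]
z = local_z_scores(a, m)
print(m[5], z[5])    # large |M| at low intensity
print(m[30], z[30])  # small |M| at high intensity
```

In this toy example the low-intensity spot has a fourfold larger |M| than the high-intensity spot, yet both get the same |Z|, which is the effect a plain fold-change cut-off ignores.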

38 Data Normalisation …and other beautifying techniques
[Figure, three panels. Panel 1: Corrected Log10(Ratio) versus Mean(Log10(Intensity)), with 2-fold lines and the locally estimated standard deviations of positive and negative ratios (Z = 1, Z = –1). Panel 2: Local Log10(Ratio) Z-Score versus Mean(Log10(Intensity)), with Z = 5 and Z = –5. Panel 3: Corrected Log10(Ratio) versus Mean(Log10(Intensity)), with 2-fold lines and Z = ±1, ±2, ±5.]
Z > 2 is at the ~95% confidence level
Source: J Pevsner 2004

39 Normalisation: References
Bilban M, Buehler LK, Head S, Desoye G, Quaranta V. Normalizing DNA microarray data. Curr Issues Mol Biol. 2002 Apr;4(2):57-64.
Durbin BP, Hardin JS, Hawkins DM, Rocke DM. A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics. 2002 Jul;18 Suppl 1:S105-10.
Kepler TB, Crosby L, Morgan KT. Normalization and analysis of DNA microarray data by self-consistency and local regression. Genome Biol. 2002 Jun 28;3(7):RESEARCH0037.
Schuchhardt J, Beule D, et al. Normalization strategies for cDNA microarrays. Nucleic Acids Res. 2000;28(10):e47.
Tran PH, Peiffer DA, Shin Y, Meek LM, Brody JP, Cho KW. Microarray optimizations: increasing spot accuracy and automated identification of true microarray signals. Nucleic Acids Res. 2002 Jun 15;30(12):e54.
Tseng GC, Oh MK, Rohlin L, Liao JC, Wong WH. Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. 2001 Jun 15;29(12):2549-57.
Tsodikov A, Szabo A, Jones D. Adjustments and measures of differential expression for microarray data. Bioinformatics. 2002 Feb;18(2):251-60.
Yang MC, Ruan QG, Yang JJ, Eckenrode S, Wu S, McIndoe RA, She JX. A statistical method for flagging weak spots improves normalization and ratio estimates in microarrays. Physiol Genomics. 2001 Oct 10;7(1):45-53.
Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002 Feb 15;30(4):e15.

