Microarray Quality Assessment Issues in High-Throughput Data Analysis BIOS 691-803 Spring 2010 Dr Mark Reimers.

1 Microarray Quality Assessment Issues in High-Throughput Data Analysis BIOS 691-803 Spring 2010 Dr Mark Reimers

2 Quality Assessment
Are there any factors that would lead you to doubt or distrust a particular datum (array)?
– Quality of inputs, e.g. RNA quality
– Statistical QA: evidence of systematic variation that differs from the other arrays

3 BioAnalyzer
Ideal: two sharp peaks, for 18S and 28S RNA

4 Spot QA for cDNA Spotted Arrays
Spot measures
– Signal/noise: foreground / background, or foreground / SD
– Uniformity
– Spot area
Global measures
– Qualitative assessments
– Averages of spot measures
Inspect images for artifacts
– Streaks of dye, scratches, etc.
– Are there biases in regions?
With commercial arrays we assume these issues are under control.
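A minimal Python sketch of the spot and global measures listed above, assuming the foreground and local background pixel intensities of each spot have already been segmented from the image. The function names are hypothetical. The slide does not say which SD is meant, so "SD" is assumed here to be the SD of the background pixels, and uniformity is assumed to be the coefficient of variation of the spot pixels.

```python
from statistics import mean, median, stdev

def spot_qa(foreground, background):
    """Spot-level QA measures for one spot (hypothetical helper).

    foreground: pixel intensities inside the spot mask
    background: pixel intensities in the local off-spot region
    Both are assumed to come from an earlier segmentation step.
    """
    fg = median(foreground)
    bg = median(background)
    return {
        # signal/noise, version 1: foreground / background
        "fg_over_bg": fg / bg,
        # signal/noise, version 2: foreground / SD (assumed: SD of background pixels)
        "fg_over_sd": fg / stdev(background),
        # uniformity as the coefficient of variation of spot pixels (lower = more uniform)
        "uniformity_cv": stdev(foreground) / mean(foreground),
        # spot area in pixels
        "area": len(foreground),
    }

def global_qa(spot_measures):
    """Global measure: average of each spot measure over all spots on the array."""
    return {key: mean(s[key] for s in spot_measures) for key in spot_measures[0]}

spots = [spot_qa([100, 100, 100, 100], [10, 12, 8, 10]),
         spot_qa([50, 60, 70], [10, 10, 12])]
print(global_qa(spots))
```

Medians are used for the foreground and background levels so that a few saturated or dust-affected pixels do not dominate; a mean would work the same way in principle.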

5 Statistical Approaches
Question: are any samples different from the others on technical grounds?
Exploratory data analysis (EDA): boxplots, clustering, PCA
– Are there any outliers?
– Are there associations with technical factors, e.g. technician, date of sample prep, etc.?
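The outlier question can be made concrete with a simple score per array. The Python sketch below is a deliberately simpler stand-in for PCA or clustering, not the method used in these slides: it measures each array's Euclidean distance from a probe-wise median "pseudo-array" and flags arrays whose distance is far above the rest by a median/MAD rule. The function names and the cutoff `k=3.0` are illustrative assumptions.

```python
from statistics import median

def distance_to_median_array(arrays):
    """Euclidean distance of each array from the probe-wise median pseudo-array.

    arrays: list of equal-length lists of log intensities, one list per array.
    """
    n_probes = len(arrays[0])
    ref = [median(a[i] for a in arrays) for i in range(n_probes)]
    return [sum((a[i] - ref[i]) ** 2 for i in range(n_probes)) ** 0.5 for a in arrays]

def flag_outliers(scores, k=3.0):
    """Flag scores more than k robust SDs above the median (median/MAD rule)."""
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    # 1.4826 * MAD estimates the SD for normal data; guard against MAD == 0
    scale = 1.4826 * mad if mad > 0 else 1e-12
    return [(s - med) / scale > k for s in scores]

arrays = [[0, 0], [1, 0], [0, 2], [-3, 0], [20, 20]]
print(flag_outliers(distance_to_median_array(arrays)))
```

To look for associations with technical factors, the same distances can be tabulated or plotted by technician or by sample-prep date.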

6 EDA: Boxplots
Boxplot of 16 chips from Cheung et al., Nature (2005)

7 Another Portrait - Densities

8 Probe Intensities in 23 Replicates

9 Some Causes of Technical Variation
– Temperature of hybridization differs
– Amount of RNA differs
– RNA degraded in some samples
– Yield of conversion to cDNA or cRNA differs
– Strength of ionic buffers differs
– Stringency of wash differs
– Scratches on some chips
– Ozone at some times (affects Cy5)

10 Borrow an Idea from Model Testing
Question: is the model adequate, or do hidden factors cause systematic errors?
Examine residuals after fitting the model
– They should be IID Normal
– Is there structure in the residuals?
– Plot them against known technical covariates, such as sample order
How can residual examination be adapted for high-throughput assays?

11 Statistical QA for Arrays
Model for the signal of probe i on chip j: $y_{ij} \sim \mu_i + \varepsilon_{ij}$
– Each gene has the same mean $\mu_i$ in all arrays (mostly true)
– Look at the residuals $\varepsilon_{ij}$ after fitting the model
New twist for high-throughput assays:
– Examine residuals within each chip (fix j; vary i)
– Plot them against known technical factors of the probes
– Is there any factor that seems to predict systematic errors?
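A minimal Python sketch of this residual check, assuming a small in-memory matrix `y[i][j]` of log signals. Here $\mu_i$ is estimated as the mean of probe i across chips (a median would be a more robust choice). The residual vector of each chip is then correlated with a probe-level technical covariate. The function names are hypothetical.

```python
from statistics import mean

def residuals_by_chip(y):
    """Residuals y_ij - mu_i, returned as one residual vector per chip.

    y[i][j]: log signal of probe i on chip j; mu_i is the mean of probe i across chips.
    """
    mu = [mean(row) for row in y]
    n_chips = len(y[0])
    return [[y[i][j] - mu[i] for i in range(len(y))] for j in range(n_chips)]

def pearson(x, z):
    """Pearson correlation (raises ZeroDivisionError if either input is constant)."""
    mx, mz = mean(x), mean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / (sxx * szz) ** 0.5

y = [[5, 5], [4, 6], [3, 7]]   # 3 probes x 2 chips
covariate = [0, 1, 2]          # hypothetical probe-level technical factor
for j, res in enumerate(residuals_by_chip(y)):
    print(j, pearson(res, covariate))
```

A correlation far from zero on one chip, but not on the others, is the kind of systematic error this slide asks about.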

12 Statistical QA of Arrays
Significant artifacts may not be obvious from visual inspection or bulk statistics.
General approach: plot deviations from the average, or residuals from a fit, against any technical variable:
– Average intensity across chips
– CG content or Tm (melting temperature)
– Probe position relative to the 3’ end of the gene (for poly-T primed RNA)
– Physical location on the chip
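As an example of the general approach, the Python sketch below takes CG content as the technical variable. It bins probes by the fraction of G and C bases and averages the residuals within each bin. The probe sequences, the residuals, and the choice of four equal-width bins are all illustrative assumptions. A trend across bins would suggest that CG content predicts systematic error on that chip.

```python
from collections import defaultdict
from statistics import mean

def gc_content(seq):
    """Fraction of G or C bases in a probe sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def mean_residual_by_gc(seqs, residuals, n_bins=4):
    """Mean residual in equal-width GC-content bins (bin 0 = lowest GC)."""
    bins = defaultdict(list)
    for s, r in zip(seqs, residuals):
        # a GC fraction of exactly 1.0 is placed in the top bin
        b = min(int(gc_content(s) * n_bins), n_bins - 1)
        bins[b].append(r)
    return {b: mean(rs) for b, rs in sorted(bins.items())}

print(mean_residual_by_gc(["AAAA", "ATAT", "GCGC", "GGCC"], [-1, -3, 2, 4]))
```

The same binning pattern works for any of the other variables on the list, for example average intensity or distance from the 3’ end.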

13 Ratio vs Intensity Plots: Saturation & Quenching
Saturation
– The rate of RNA binding decreases as occupancy of the probe rises
Quenching
– Light emitted by one dye molecule may be re-absorbed by a nearby dye molecule
– and then lost as heat
– The effect is proportional to the square of the density
Figure: log ratio plotted against average log intensity across chips; array GSM25377 from the CEPH expression data (GSE2552)
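A minimal Python sketch of the ratio-intensity computation for one chip, assuming a matrix `log_y[i][j]` of log2 intensities. For each probe, A is the average log intensity across chips and M is the log ratio of chip j to that average. A binned median of M against A then stands in for the smoothed curve one would normally draw. Saturation or quenching would appear as M falling away at high A. The function names and the equal-count binning are assumptions.

```python
from statistics import mean, median

def ratio_intensity(log_y, j):
    """(A, M) pairs for chip j.

    log_y[i][j]: log2 intensity of probe i on chip j.
    A = average log intensity of probe i across chips
    M = log ratio of chip j to that average
    """
    pairs = []
    for row in log_y:
        a = mean(row)
        pairs.append((a, row[j] - a))
    return pairs

def binned_trend(pairs, n_bins=10):
    """Median A and median M within consecutive bins of roughly equal size."""
    pairs = sorted(pairs)
    size = -(-len(pairs) // n_bins)  # ceiling division
    trend = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        trend.append((median(a for a, _ in chunk), median(m for _, m in chunk)))
    return trend

# toy data: chip 0 falls behind chip 1 only at the bright end
log_y = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 6], [6, 8]]
print(binned_trend(ratio_intensity(log_y, 0), n_bins=3))
```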

14 How Much Variability in R-I Plots?
Ratio-intensity plots for six arrays chosen at random from Cheung et al., Nature (2005)

15 Covariation with Probe Tm
MAQC project, Agilent 44K
– Array 1C3
– Performed by Agilent
Figure: log ratios to the average plotted against Tm
The distribution is bimodal because the two samples are very different.

16 Covariation with Probe Position
RNA degrades from the 5’ end
Intensity should decrease with distance from the 3’ end, uniformly across chips
AffyRNAdeg plots in the affy package
Figure: average intensity at each probe position, across all genes, plotted against probe position
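The idea behind these RNA degradation plots can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the affy package's implementation. For one chip, average the log intensity at each probe position (ordered 5’ to 3’) over all probe sets, then fit a least-squares slope. Every probe set is assumed to have the same number of probes. A chip whose slope is much steeper than the others suggests more degraded RNA.

```python
from statistics import mean

def mean_by_position(probesets):
    """Average log intensity at each probe position across probe sets.

    probesets: list of lists of log intensities, each ordered 5' -> 3'
    and all of the same length (an assumption that keeps the sketch short).
    """
    n_pos = len(probesets[0])
    return [mean(ps[k] for ps in probesets) for k in range(n_pos)]

def slope(values):
    """Least-squares slope of values against position 0, 1, 2, ..."""
    xs = range(len(values))
    mx, my = mean(xs), mean(values)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

chip = [[1, 2, 3], [3, 4, 5]]   # two probe sets, three probe positions each
print(slope(mean_by_position(chip)))
```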

17 Effect of Runs of Guanines
A run of 4 G’s allows a quadruplex structure
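A minimal Python sketch for checking whether G runs predict systematic error, assuming probe sequences and per-probe residuals (as on the earlier slides) are available. The function names are hypothetical, and the run length of 4 follows this slide.

```python
import re
from statistics import mean

def has_g_run(seq, min_run=4):
    """True if the probe sequence contains at least min_run consecutive G's."""
    return re.search("G{%d,}" % min_run, seq.upper()) is not None

def mean_residual_by_g_run(seqs, residuals):
    """Mean residual for probes with and without a G run (both groups must be non-empty)."""
    with_run = [r for s, r in zip(seqs, residuals) if has_g_run(s)]
    without = [r for s, r in zip(seqs, residuals) if not has_g_run(s)]
    return mean(with_run), mean(without)

print(mean_residual_by_g_run(["GGGGA", "ACGT", "TTTT"], [2, 0, 1]))
```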

18 Spatial Variation Across Chips
Red/green ratios show variation, probably spatially concentrated
Ratios of the ratios on a slide to the ratios on a standard show consistent biases

19 In-House Spotted Arrays
The ratio of ratios shows a much clearer concentration of red spots on some slides.
Note the non-random but highly irregular concentration of red.
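A minimal Python sketch of the ratio-of-ratios idea. Each spot is assumed to have a log ratio log(R/G) on the slide and on a standard, both keyed by the same (row, column) position. On the log scale, the ratio of ratios is a difference. Averaging it over square spatial blocks makes regional biases easier to see. The block size and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def ratio_of_ratios(slide_lr, standard_lr):
    """Per-spot log ratio of ratios: log(R/G on slide) - log(R/G on standard).

    Both dicts map (row, col) -> log ratio; the standard must cover every slide spot.
    """
    return {pos: slide_lr[pos] - standard_lr[pos] for pos in slide_lr}

def block_means(rr, block=2):
    """Average ratio of ratios over block x block squares of spots."""
    blocks = defaultdict(list)
    for (row, col), v in rr.items():
        blocks[(row // block, col // block)].append(v)
    return {b: mean(vs) for b, vs in blocks.items()}

slide = {(0, 0): 1.0, (0, 1): 1.0, (2, 2): 0.0, (3, 3): 0.0}
standard = {(0, 0): 0.0, (0, 1): 0.0, (2, 2): 0.0, (3, 3): 0.0}
print(block_means(ratio_of_ratios(slide, standard)))
```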

20 Bioconductor arrayQuality Package

21 Background Subtraction (1)
We think that local background contributes to bias. Does subtracting background remove the bias?
Local off-spot background may not be the best estimate of the background within the spot (non-specific hybridization).
Figure panels: Spots; BG subtracted

22 Background Subtraction (2)
Raw spot ratios show a mild bias relative to the average.
After a high green background in the center is subtracted, a red bias results.
Figure panels: Raw ratios; Background; BG-subtracted
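The arithmetic behind the red bias can be shown with a single hypothetical spot whose raw red and green signals are equal. If the local green background is high and that whole background is subtracted, green loses proportionally more than red, so the ratio tips toward red. This is a minimal sketch with made-up intensities.

```python
from math import log2

def log_ratio(r_fg, g_fg, r_bg=0.0, g_bg=0.0):
    """log2(R/G) for one spot, optionally after subtracting local background."""
    r, g = r_fg - r_bg, g_fg - g_bg
    if r <= 0 or g <= 0:
        raise ValueError("background exceeds foreground; log ratio undefined")
    return log2(r / g)

# hypothetical spot: equal raw red and green foreground signals
print(log_ratio(1000, 1000))            # raw ratio: no bias
print(log_ratio(1000, 1000, 100, 300))  # high green background subtracted: ratio shifts toward red
```

Whenever the off-spot green background overestimates the background inside the spot, subtraction creates a bias instead of removing one.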

23 Other Bias Patterns
This spotted oligo array shows strong biases at the beginning and end of each print-tip group.
The background shows a milder version of this effect.
Subtracting background compensates for about half of this effect.
Figure panels: Processed; Raw spot; Background

24 Local Bias on Affymetrix Chips
An image of the raw data on a log2 scale shows striations but no obvious artifacts.
An image of the ratios of probes to a standard shows a smudge.
Figure panel: non-coding probes
Images show high values as red and low values as yellow.

25 Spatial Artifacts on Affy Chips
– Bubbles (yellow) in the hybridization chamber
– Touching the cover slip and wiping incompletely
– Scratches on the cover slip

26 QC in Bioconductor
Robust Multi-array Average (RMA)
– fits a linear model to each probe set
– High residuals show regional patterns
Figure: high residuals shown in green
Available in the affyQCReport package at www.bioconductor.org
See http://plmimagegallery.bmbolstad.com/
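The additive probe-plus-chip model behind these residual images can be fitted robustly by Tukey's median polish, which is the fit RMA's summarization step uses. The Python sketch below is a plain illustration of median polish for one probe set, not the Bioconductor code. It uses a fixed number of sweeps instead of a convergence test. Large residuals mark probe and chip combinations that the additive model does not explain. Mapping them back to physical probe locations gives the spatial images shown above.

```python
from statistics import median

def median_polish(x, n_iter=10):
    """Tukey median polish of x[i][j] (probe i, chip j) for one probe set.

    Fits x_ij = overall + probe_i + chip_j + residual_ij; returns all four parts.
    """
    res = [list(row) for row in x]
    n_r, n_c = len(res), len(res[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_r, [0.0] * n_c
    for _ in range(n_iter):
        # sweep row (probe) medians into the probe effects
        for i in range(n_r):
            m = median(res[i])
            row_eff[i] += m
            res[i] = [v - m for v in res[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # sweep column (chip) medians into the chip effects
        for j in range(n_c):
            m = median(res[i][j] for i in range(n_r))
            col_eff[j] += m
            for i in range(n_r):
                res[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, res

# additive toy data except one aberrant value at probe 2 on chip 2
overall, probes, chips, res = median_polish([[1, 2, 3], [2, 3, 4], [3, 4, 14]])
print(res)
```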

27 Affy QC Metrics in Bioconductor
The affyPLM package fits a probe-level model to Affymetrix raw data
NUSE: Normalized Unscaled Standard Errors
– normalized relative to each gene
How many big errors?
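The "normalized relative to each gene" step can be sketched in Python. This assumes a matrix `se[g][j]` of standard errors, one per probe set g and chip j, from some probe-level fit. It is not affyPLM's code. Each probe set's standard errors are divided by their median across chips, so a typical chip sits near 1. The per-chip median and a count of large values then summarize each chip. The cutoff of 1.5 for a "big error" is a hypothetical choice for illustration only.

```python
from statistics import median

def nuse(se):
    """Divide each probe set's standard errors by their median across chips."""
    out = []
    for row in se:
        m = median(row)
        out.append([s / m for s in row])
    return out

def median_nuse_per_chip(nuse_matrix):
    """Median NUSE of each chip over all probe sets."""
    n_chips = len(nuse_matrix[0])
    return [median(row[j] for row in nuse_matrix) for j in range(n_chips)]

def count_big(nuse_matrix, threshold=1.5):
    """Per chip, how many probe sets have NUSE above a (hypothetical) threshold."""
    n_chips = len(nuse_matrix[0])
    return [sum(row[j] > threshold for row in nuse_matrix) for j in range(n_chips)]

n = nuse([[1, 1, 2], [2, 2, 4]])
print(median_nuse_per_chip(n), count_big(n))
```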

28 Spatial Artifacts in Agilent Arrays
Usually not as strong as on other array types
More diffuse artifacts, probably reflecting washing irregularities

29 Spatial Artifacts in NimbleGen Arrays
More common than on Agilent arrays
Usually more diffuse, probably reflecting washing
Some sharp artifacts of unclear origin

30 Spatial Artifacts in Illumina Arrays
Artifacts are often bigger than on Affy arrays
They are less consequential, because there are many beads per probe and all of them have the same sequence

