Filtering and Normalization of Microarray Gene Expression Data
Waclaw Kusnierczyk
Norwegian University of Science and Technology, Trondheim, Norway
slide 2 Outline
Filtering: spots
–removal of spots based on quality measures
Normalization
–compensation for measurement errors
Examples of common problems
slide 3 Useful plots
Channel-channel plot (CC)
Intensity-ratio plot (AM or IR)
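The coordinates of an intensity-ratio plot can be sketched as follows, assuming the usual convention M = log2(R/G) and A = (1/2) log2(R·G) for the two channel intensities R and G. The function name `ma_values` and its arguments are hypothetical, chosen for illustration only.

```python
import math

def ma_values(red, green):
    """Return a list of (A, M) pairs for paired channel intensities.

    Assumes all intensities are strictly positive (e.g. already
    filtered), since the logarithm is undefined otherwise.
    """
    pairs = []
    for r, g in zip(red, green):
        m = math.log2(r / g)          # M: log2 ratio of the two channels
        a = 0.5 * math.log2(r * g)    # A: mean log2 intensity
        pairs.append((a, m))
    return pairs
```

A channel-channel plot uses (log2 G, log2 R) directly; the AM plot is the same cloud rotated by 45 degrees, which makes intensity-dependent bias easier to see.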
slide 4 Filtering: Spots
Criteria used to remove spots
–spot area [pixels]
–signal/noise ratio (spot intensity vs. background intensity)
–other quality measures (e.g. based on quality scores from image analysis software)
  morphological criteria
  pixel-level variability
slide 5 Filtering: Spots Spot area
slide 6 Filtering: Spots
Spot area based filtering
–keep spots with area > threshold in both channels
–problem: setting the appropriate threshold
–dependent on the definition of the spot (image analysis software), and the distribution of the spot area
–typical value: 10 pixels
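A minimal sketch of the area rule above. The record layout (a dict with keys `area_ch1` and `area_ch2`) is an assumption made for illustration; real image analysis software exports its own column names.

```python
def filter_by_area(spots, min_area=10):
    """Keep spots whose area exceeds min_area pixels in both channels.

    `spots` is a list of dicts with hypothetical keys 'area_ch1' and
    'area_ch2'. The default of 10 pixels is the typical value quoted
    above, not a universal constant.
    """
    return [
        spot for spot in spots
        if spot["area_ch1"] > min_area and spot["area_ch2"] > min_area
    ]
```

In practice the threshold would be chosen after looking at the distribution of spot areas on the array in question.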
slide 7 Filtering: Spots Signal and background
slide 8 Filtering: Spots
Signal/noise based filtering
–keep spots with signal / background > threshold in both channels
–problem: setting the appropriate threshold
–dependent on the spot and background definition (image analysis software)
–typical value: sgn/bkg > 2 (or, equivalently, sgn - bkg > bkg)
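The ratio rule above can be sketched as below. Representing a spot as a list of (signal, background) pairs, one per channel, is an assumption for illustration. The comparison is written as sgn > ratio × bkg, which is the same condition for positive background and avoids dividing by zero.

```python
def passes_signal_to_background(channels, min_ratio=2.0):
    """True if signal / background > min_ratio in every channel.

    `channels` is a list of (signal, background) pairs (hypothetical
    layout). With min_ratio = 2 the test sgn > 2 * bkg is the same
    as sgn - bkg > bkg.
    """
    return all(signal > min_ratio * background
               for signal, background in channels)
```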
slide 9 Filtering: Spots
Signal/noise based filtering (alternative)
–flag spots if $S_{ij} < B_{ij} + \theta\,\sigma_{B_{ij}}$, where:
  $S_{ij}$: $i$th spot intensity in $j$th channel (not corrected)
  $B_{ij}$: $i$th spot background in $j$th channel
  $\sigma_{B_{ij}}$: $i$th spot background deviation in $j$th channel
  $\theta$: user defined threshold
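A sketch of the flagging rule, with the per-channel test taken from the inequality above. Flagging the whole spot when any single channel fails is an assumption here; the function names and the tuple layout are hypothetical.

```python
def channel_is_flagged(signal, background, background_sd, theta):
    """True if S < B + theta * sigma_B for one spot in one channel."""
    return signal < background + theta * background_sd

def spot_is_flagged(channels, theta):
    """Flag a spot if any of its channels is flagged (an assumption).

    `channels` is a list of (signal, background, background_sd)
    tuples, one per channel; theta is the user defined threshold.
    """
    return any(channel_is_flagged(s, b, sd, theta) for s, b, sd in channels)
```

Unlike a fixed signal/background ratio, this rule adapts to how noisy the local background is: a spot on a quiet background needs less excess signal to pass.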
slide 10 Filtering: Spots (example)
slide 11 Filtering: Spots
Other criteria
–Intensity threshold on background corrected intensity (for each channel separately)
–Spot quality measures (pixelwise distributional properties of spot and background intensities, manual morphology-based spot flagging etc.)
–Replicate-based spot filtering (adaptive threshold selection based on a repeatability coefficient, coefficient of variation etc.)
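As one illustration of replicate-based filtering, the sketch below computes the coefficient of variation (sample standard deviation divided by the mean) of replicate intensities and keeps genes below a cutoff. It uses a fixed cutoff rather than the adaptive threshold selection mentioned above, and the data layout, names, and the value 0.3 are all assumptions.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of the values."""
    return statistics.stdev(values) / statistics.mean(values)

def repeatable_genes(replicates, max_cv=0.3):
    """Return gene ids whose replicate intensities have CV <= max_cv.

    `replicates` maps a gene id to a list of at least two positive,
    background corrected intensities (hypothetical layout). The
    cutoff 0.3 is an arbitrary example value.
    """
    return [gene for gene, values in replicates.items()
            if coefficient_of_variation(values) <= max_cv]
```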
slide 12 Filtering: Spots Total intensity (log2) threshold
slide 13 Filtering: Spots Morphology based filtering
slide 14 Normalization
Analysis of systematic errors
–adjustment for bias coming from variation in the technology rather than from biology
Different sources of non-linearity
–Print-tip differences
–Efficiency of dye incorporation (labelling)
–Non-uniformity in hybridisation
–Scanning
–Between slide variation (print quality, ambient conditions)
slide 15 Normalization
Selection of elements
–Housekeeping genes, spike controls, tip-dependence, raw data, between array normalization
Method
–Constant subtraction (shift) (mean/median log2 ratio, iterative c estimation, ANOVA)
–Locally weighted mean (intensity or location dependent)
–Other recently proposed methods
slide 16 Normalization (example 1) Intensity independent normalization with median ratio subtraction
slide 17 Normalization (example 1) Intensity independent normalization with median ratio subtraction
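Median ratio subtraction can be sketched as below: every log2 ratio is shifted by the same constant, the median log2 ratio of the array. The function name and the choice to return the shift alongside the normalized values are assumptions for illustration.

```python
import math
import statistics

def median_normalize(red, green):
    """Subtract the median log2 ratio from every spot.

    Returns (normalized_log_ratios, shift). Assumes strictly positive
    intensities and that most genes are not differentially expressed,
    so the median log ratio should be close to zero.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    shift = statistics.median(log_ratios)
    return [m - shift for m in log_ratios], shift
```

Because the shift does not depend on intensity, this corrects an overall dye imbalance but leaves any curvature in the intensity-ratio plot untouched.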
slide 18 Normalization (example 1) Intensity dependent normalization with locally weighted mean, global
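A rough sketch of global intensity dependent normalization. For each spot, the trend of M at that spot's A is estimated as a tricube-weighted mean of M over the nearest fraction `span` of all spots, and the trend is then subtracted. This is a plain locally weighted mean, simpler than a full loess fit (which fits local regressions); the names, the tricube kernel, and the default span of 0.4 are assumptions.

```python
def tricube(u):
    """Tricube kernel weight for a scaled distance u >= 0."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_weighted_mean(a_values, m_values, span=0.4):
    """Estimate the M trend at each spot's A from its nearest neighbours."""
    n = len(a_values)
    k = min(n, max(2, round(span * n)))      # neighbours per local mean
    trend = []
    for a0 in a_values:
        dists = [abs(a - a0) for a in a_values]
        h = sorted(dists)[k - 1]             # bandwidth: k-th smallest distance
        if h == 0.0:
            # Degenerate case: only spots with identical A contribute.
            weights = [1.0 if d == 0.0 else 0.0 for d in dists]
        else:
            weights = [tricube(d / h) for d in dists]
        total = sum(weights)                 # > 0: the spot itself has weight 1
        trend.append(sum(w * m for w, m in zip(weights, m_values)) / total)
    return trend

def normalize_intensity_dependent(a_values, m_values, span=0.4):
    """Subtract the locally weighted mean trend from every M value."""
    trend = local_weighted_mean(a_values, m_values, span)
    return [m - t for m, t in zip(m_values, trend)]
```

The loop is quadratic in the number of spots, which is acceptable for a sketch but slow for full arrays.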
slide 19 Normalization (example 1) Intensity dependent normalization with locally weighted mean, print-tip dependent
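Print-tip dependent normalization applies the same procedure separately to the spots printed by each tip. The sketch below shows only the grouping step: the normalizer is passed in as a callable taking (A values, M values), so a locally weighted mean fit can be plugged in. The default `median_shift` is a stand-in used only to keep the example short, and all names are hypothetical.

```python
import statistics
from collections import defaultdict

def median_shift(a_values, m_values):
    """Stand-in normalizer: subtract the median M (ignores A)."""
    shift = statistics.median(m_values)
    return [m - shift for m in m_values]

def normalize_per_print_tip(a_values, m_values, tips, normalizer=median_shift):
    """Apply `normalizer` independently within each print-tip group.

    `tips` gives the print-tip label of every spot; results are
    returned in the original spot order.
    """
    groups = defaultdict(list)
    for index, tip in enumerate(tips):
        groups[tip].append(index)

    normalized = [0.0] * len(m_values)
    for indices in groups.values():
        a_sub = [a_values[i] for i in indices]
        m_sub = [m_values[i] for i in indices]
        for i, value in zip(indices, normalizer(a_sub, m_sub)):
            normalized[i] = value
    return normalized
```

Fitting per tip removes tip-specific bias, at the cost of fewer spots per fit than in the global version.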
slide 20 Normalization (example 1) Intensity dependent normalization with locally weighted mean, global vs. print-tip dependent
slide 21 Normalization (example 2) Intensity dependent normalization with locally weighted mean, print-tip dependent
slide 22 Normalization Location dependent normalization with locally weighted mean (from SNOMAD web page)
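Location dependent normalization can be sketched in the same spirit: the local mean is taken over spots that are physically close on the array instead of close in intensity. The code below is not the SNOMAD implementation; it is a simplified tricube-weighted mean over a fixed radius, with hypothetical names and an arbitrary default radius of 3 spot positions.

```python
import math

def normalize_by_location(rows, cols, m_values, radius=3.0):
    """Subtract a spatially local weighted mean of M from each spot.

    `rows` and `cols` give each spot's grid position. Spots within
    `radius` (Euclidean distance) contribute with tricube weights.
    """
    normalized = []
    for r0, c0, m0 in zip(rows, cols, m_values):
        weighted_sum = 0.0
        weight_total = 0.0
        for r, c, m in zip(rows, cols, m_values):
            u = math.hypot(r - r0, c - c0) / radius
            if u < 1.0:
                w = (1.0 - u ** 3) ** 3
                weighted_sum += w * m
                weight_total += w
        # weight_total >= 1 because the spot itself has weight 1
        normalized.append(m0 - weighted_sum / weight_total)
    return normalized
```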
slide 23 Common problems: examples
slide 24 Common problems: examples
slide 25 Common problems: examples
slide 26 Common problems: examples
slide 27 Common problems: examples
slide 28 Common problems: examples
slide 29 Common problems: examples
slide 30 Acknowledgments
Mette Langaas, Department of Mathematical Sciences, Norwegian University of Science and Technology
Astrid Lægreid, Kristin Nørsett, Department of Physiology and Biomedical Engineering, Norwegian University of Science and Technology
Per Kristian Lehre, Department of Computer and Information Science, Norwegian University of Science and Technology