Filtering and Normalization of Microarray Gene Expression Data Waclaw Kusnierczyk Norwegian University of Science and Technology Trondheim, Norway
slide 2 Outline Filtering: spots –removal of spots based on quality measures Normalization –compensation for measurement errors Filtering: genes (significance analysis) –identification of significantly expressed genes
slide 3 Filtering: Spots Criteria used to remove spots –spot area [pixels] –signal/noise ratio (spot intensity vs. background intensity) –other quality measures (e.g. based on quality scores from image analysis software): morphological criteria, pixel-level variability
slide 4 Filtering: Spots Spot area based filtering –remove spots with area < threshold in both channels –problem: setting an appropriate threshold –dependent on the definition of a spot (image analysis software), and the distribution of the spot area –typical value: 10 pixels
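A minimal sketch of the area-based filter, assuming each spot carries a pixel area per channel. The `Spot` type, its field names, and the reading of "in both channels" (a spot is removed only when its area is below the threshold in both channels) are assumptions made for illustration; the threshold of 10 pixels is the typical value quoted on the slide.

```python
from typing import NamedTuple


class Spot(NamedTuple):
    """Hypothetical spot record; field names are illustrative only."""
    name: str
    area_ch1: int  # spot area in pixels, channel 1
    area_ch2: int  # spot area in pixels, channel 2


def filter_by_area(spots, threshold=10):
    """Keep spots unless their area is below `threshold` in both channels."""
    return [
        s for s in spots
        if not (s.area_ch1 < threshold and s.area_ch2 < threshold)
    ]


# Made-up example data, not taken from any real array.
spots = [Spot("a", 4, 6), Spot("b", 25, 30), Spot("c", 8, 40)]
kept = filter_by_area(spots)
```

Because the appropriate threshold depends on how the image analysis software defines a spot, the value is left as a parameter rather than fixed.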
slide 5 Filtering: Spots Signal/noise based filtering –keep spots with signal / background > threshold in both channels –problem: setting an appropriate threshold –dependent on the spot and background definition (image analysis software) –typical value: sgn/bkg > 2 or, equivalently, sgn - bkg > bkg
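The signal/noise criterion can be sketched as follows, assuming per-channel signal and background intensities are available as plain numbers. The function names are hypothetical. For a positive background, sgn/bkg > 2 and sgn - bkg > bkg select the same spots; the sketch uses the ratio form and rejects non-positive backgrounds, where the ratio is not meaningful.

```python
def passes_snr(sgn, bkg, threshold=2.0):
    """True if signal/background exceeds `threshold` in one channel."""
    # A non-positive background makes the ratio meaningless, so reject it.
    return bkg > 0 and sgn / bkg > threshold


def keep_spot(sgn1, bkg1, sgn2, bkg2, threshold=2.0):
    """Keep a spot only if the criterion holds in both channels."""
    return passes_snr(sgn1, bkg1, threshold) and passes_snr(sgn2, bkg2, threshold)


# Made-up intensities: the first spot passes in both channels,
# the second fails in channel 2 (200 / 120 is below 2).
strong = keep_spot(500, 100, 300, 120)
weak = keep_spot(500, 100, 200, 120)
```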
slide 6 Filtering: Spots (example)
slide 7 Filtering: Spots Other criteria –Intensity threshold on background corrected intensity –Spot quality measures (pixelwise distributional properties of spot and background intensities, manual morphology-based spot flagging etc.) –Replicate-based spot filtering (adaptive threshold selection based on a repeatability coefficient, coefficient of variation etc.)
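As an illustration of replicate-based filtering, the sketch below computes the coefficient of variation (standard deviation divided by mean) over replicate measurements of the same element and drops elements that vary too much. It uses a fixed cut-off `max_cv`, which is an assumed value; the adaptive threshold selection mentioned on the slide is not implemented here, and all names are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def filter_replicates(replicates, max_cv=0.3):
    """Keep elements whose replicate measurements have CV <= max_cv.

    `replicates` maps an element name to a list of background-corrected
    intensities. Elements with fewer than two replicates or a
    non-positive mean are dropped, since the CV is undefined for them.
    """
    return {
        name: vals
        for name, vals in replicates.items()
        if len(vals) >= 2
        and mean(vals) > 0
        and coefficient_of_variation(vals) <= max_cv
    }


# Made-up data: g1 is repeatable, g2 is not.
replicates = {"g1": [100, 110, 95], "g2": [100, 300, 20]}
repeatable = filter_replicates(replicates)
```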
slide 8 Normalization Analysis of systematic errors –adjustment for bias coming from variation in the technology rather than from biology Different sources of non-linearity –Efficiency of dye incorporation (labelling) –Print-tip differences –Non-uniformity in hybridisation –Scanning –Between slide variation (print quality, ambient conditions)
slide 9 Normalization Selection of elements –Housekeeping genes, spike controls, tip-dependence, raw data, between array normalization Method –Constant subtraction normalization (mean/median log2 ratio, iterative c estimation, ANOVA) –Locally weighted mean normalization (intensity or location dependent) –Other recently proposed methods
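The simplest of the constant subtraction methods, the median log2 ratio, can be sketched as below. It assumes two lists of positive, background-corrected intensities (called `red` and `green` here, names chosen for illustration) and estimates the constant c as the median of the log2 ratios; the iterative and ANOVA-based variants are not shown.

```python
from math import log2
from statistics import median


def median_normalize(red, green):
    """Subtract the median log2 ratio from every log2 ratio.

    Returns the normalized log2 ratios and the estimated constant c.
    Intensities are assumed to be positive.
    """
    m = [log2(r / g) for r, g in zip(red, green)]
    c = median(m)
    return [x - c for x in m], c


# Made-up intensities with log2 ratios 1, 2 and 3.
red = [200, 400, 1600]
green = [100, 100, 200]
normalized, c = median_normalize(red, green)
# c is 2.0 and the normalized ratios are [-1.0, 0.0, 1.0]
```

After subtraction the median log2 ratio is zero, which corresponds to the assumption that most selected elements are not differentially expressed.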
slide 10 Normalization (example 1) Intensity dependent normalization with locally weighted mean, global
slide 11 Normalization (example 1) Intensity dependent normalization with locally weighted mean, print-tip dependent
slide 12 Normalization (example 1) Intensity dependent normalization with locally weighted mean, global vs. print-tip dependent
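The difference between global and print-tip dependent normalization can be sketched in a few lines. The code assumes the usual M = log2(R/G) and A = average log2 intensity values are already computed, and uses a plain tricube-weighted local mean over the nearest fraction `frac` of spots. This is a simplified stand-in, not the exact smoother used to produce the figures; all function names and the default `frac` are assumptions.

```python
def local_weighted_mean(a, m, frac=0.5):
    """Tricube-weighted local mean of m as a function of a."""
    n = len(a)
    k = min(n, max(2, round(frac * n)))  # neighbours used per estimate
    fitted = []
    for a0 in a:
        dists = [abs(ai - a0) for ai in a]
        h = sorted(dists)[k - 1]  # bandwidth: distance to k-th nearest spot
        weights = []
        for d in dists:
            if h == 0:
                weights.append(1.0 if d == 0 else 0.0)
            else:
                u = d / h
                weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
        total = sum(weights)  # > 0: the spot itself always has weight 1
        fitted.append(sum(w * mi for w, mi in zip(weights, m)) / total)
    return fitted


def normalize_global(a, m, frac=0.5):
    """Subtract one intensity-dependent trend fitted to all spots."""
    fit = local_weighted_mean(a, m, frac)
    return [mi - fi for mi, fi in zip(m, fit)]


def normalize_by_print_tip(a, m, tips, frac=0.5):
    """Fit and subtract a separate trend within each print-tip group."""
    out = [0.0] * len(m)
    for tip in set(tips):
        idx = [i for i, t in enumerate(tips) if t == tip]
        norm = normalize_global([a[i] for i in idx], [m[i] for i in idx], frac)
        for i, v in zip(idx, norm):
            out[i] = v
    return out


# Made-up data: two print tips with different constant offsets in M.
a = [8, 9, 10, 11, 12, 13]
m = [0.5, 0.5, 0.5, -0.3, -0.3, -0.3]
tips = [0, 0, 0, 1, 1, 1]
m_global = normalize_global(a, m, frac=1.0)
m_tip = normalize_by_print_tip(a, m, tips, frac=1.0)
```

In this toy data the print-tip version removes each tip's offset, whereas the single global trend blends the two groups and leaves tip-specific residuals behind.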
slide 13 Normalization (example 2) Intensity dependent normalization with locally weighted mean, print-tip dependent
slide 14 Normalization Location dependent normalization with locally weighted mean (from SNOMAD web page)
slide 15 Normalization Local variance correction across element signal intensity (from SNOMAD web page)
slide 16 Acknowledgments Mette Langaas, Department of Mathematical Sciences, Norwegian University of Science and Technology; Astrid Lægreid, Department of Physiology and Biomedical Engineering, Norwegian University of Science and Technology; Per Kristian Lehre, Department of Computer and Information Science, Norwegian University of Science and Technology
slide 17 Slide show history (Date, Action, By) –Created: Wacław Kuśnierczyk –Corrected: Wacław Kuśnierczyk