MicroArray Image Analysis Robin Liechti

Slides:



Advertisements
Similar presentations
Microarray Technique, Analysis, and Applications in Dermatology Jennifer Villaseñor-Park 1 and Alex G Ortega-Loayza 2 1 Department of Dermatology, University.
Advertisements

AFGC Damares C Monte Carnegie Institution of Washington,
5th Intensive Course on Soil Micromorphology Naples th - 14th September Image Analysis Lecture 5 Thresholding/Segmentation.
Pre-processing in DNA microarray experiments Sandrine Dudoit PH 296, Section 33 13/09/2001.
MicroArray Image Analysis
MicroArray Image Analysis Robin Liechti
Measuring Gene Expression Part 3
Introduction to Microarray
Microarray Normalization
Filtering and Normalization of Microarray Gene Expression Data Waclaw Kusnierczyk Norwegian University of Science and Technology Trondheim, Norway.
Department of Statistics, University of California, Berkeley, and Division of Genetics and Bioinformatics, The Walter and Eliza Hall Institute of Medical.
Mathematical Statistics, Centre for Mathematical Sciences
Microarray technology and analysis of gene expression data Hillevi Lindroos.
Image Quantitation in Microarray Analysis More tomorrow...
TIGR Spotfinder: a tool for microarray image processing
Introduction to DNA Microarray Technology Steen Knudsen, April 2005.
Sandrine Dudoit1 Microarray Experimental Design and Analysis Sandrine Dudoit jointly with Yee Hwa Yang Division of Biostatistics, UC Berkeley
Getting the numbers comparable
DNA Microarray Bioinformatics - #27612 Normalization and Statistical Analysis.
Figure 1: (A) A microarray may contain thousands of ‘spots’. Each spot contains many copies of the same DNA sequence that uniquely represents a gene from.
Gene Expression Data Analyses (2)
Low Level Statistics and Quality Control Javier Cabrera.
Sample preparation 1. Design experiment Question? Replicates? Test? 2. Perform experiment 4. Label RNA Amplification? Direct or indirect? Label? wild.
Microarray Technology Types Normalization Microarray Technology Microarray: –New Technology (first paper: 1995) Allows study of thousands of genes at.
Image Analysis Class web site: Statistics for Microarrays.
Introduce to Microarray
Scanning and image analysis Scanning -Dyes -Confocal scanner -CCD scanner Image File Formats Image analysis -Locating the spots -Segmentation -Evaluating.
Analysis of microarray data
B IOINFORMATICS Dr. Aladdin HamwiehKhalid Al-shamaa Abdulqader Jighly Lecture 8 Analyzing Microarray Data Aleppo University Faculty of technical.
Filtering and Normalization of Microarray Gene Expression Data Waclaw Kusnierczyk Norwegian University of Science and Technology Trondheim, Norway.
Microarray Preprocessing
1 Normalization Methods for Two-Color Microarray Data 1/13/2009 Copyright © 2009 Dan Nettleton.
Image Quantitation in Microarray Analysis More tomorrow...
Tal Mor  Create an automatic system that given an image of a room and a color, will color the room walls  Maintaining the original texture.
CDNA Microarrays Neil Lawrence. Schedule Today: Introduction and Background 18 th AprilIntroduction and Background 25 th AprilcDNA Mircoarrays 2 nd MayNo.
(2) Ratio statistics of gene expression levels and applications to microarray data analysis Bioinformatics, Vol. 18, no. 9, 2002 Yidong Chen, Vishnu Kamat,
Introduction to microarray technology and analysis
Affymetrix vs. glass slide based arrays
IMAGE INFORMATICS SOLUTIONS Extracting Information From Images Array-Pro 4.5 Training, May 2003.
The following slides have been adapted from to be presented at the Follow-up course on Microarray Data Analysis.
Department of Statistics, University of California, Berkeley, and Division of Genetics and Bioinformatics, Walter and Eliza Hall Institute of Medical Research.
Introduction to DNA Microarray Technology Steen Knudsen Uma Chandran.
CDNA Microarrays MB206.
Panu Somervuo, March 19, cDNA microarrays.
1 Two Color Microarrays EPP 245/298 Statistical Analysis of Laboratory Data.
Analysis of Microarray Data Analysis of images Preprocessing of gene expression data Normalization of data –Subtraction of Background Noise –Global/local.
Microarray - Leukemia vs. normal GeneChip System.
What Is Microarray A new powerful technology for biological exploration Parallel High-throughput Large-scale Genomic scale.
Measuring Gene Expression Part 3 David Wishart Bioinformatics 301
ImArray - An Automated High-Performance Microarray Scanner Software for Microarray Image Analysis, Data Management and Knowledge Mining Wei-Bang Chen and.
Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002.
Computational Biology and Bioinformatics Lab. Songhwan Hwang Functional Genomics DNA Microarray Technology.
Henrik Bengtsson Mathematical Statistics Centre for Mathematical Sciences Lund University, Sweden Plate Effects in cDNA Microarray Data.
Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein might be more direct, but is currently.
Microarray Technology. Introduction Introduction –Microarrays are extremely powerful ways to analyze gene expression. –Using a microarray, it is possible.
Microarray hybridization Usually comparative – Ratio between two samples Examples – Tumor vs. normal tissue – Drug treatment vs. no treatment – Embryo.
Gene expression & Clustering. Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species –Dynamic.
(1) Normalization of cDNA microarray data Methods, Vol. 31, no. 4, December 2003 Gordon K. Smyth and Terry Speed.
ANALYSIS OF GENE EXPRESSION DATA. Gene expression data is a high-throughput data type (like DNA and protein sequences) that requires bioinformatic pattern.
Henrik Bengtsson Mathematical Statistics Centre for Mathematical Sciences Lund University Plate Effects in cDNA Microarray Data.
The microarray data analysis Ana Deckmann Carla Judice Jorge Lepikson Jorge Mondego Leandra Scarpari Marcelo Falsarella Carazzolle Michelle Servais Tais.
Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
Knowledge Discovery in Microarray Gene Expression Data Insilicogen Junhyung Park.
Microarray - Leukemia vs. normal GeneChip System.
Normalization Methods for Two-Color Microarray Data
Image Processing for cDNA Microarray Data
The Basics of Microarray Image Processing
Getting the numbers comparable
Normalization for cDNA Microarray Data
Presentation transcript:

MicroArray Image Analysis Robin Liechti

Microarray analysis Array construction, hybridisation, scanning Quantitation of fluorescence signals Data visualisation Meta-analysis (clustering) More visualisation

Technical probe (on chip) sample (labelled) pseudo-colour image [image from Jeremy Buhler]

Experimental design Track what’s on the chip which spot corresponds to which gene Duplicate experimental spots reproducibility Controls DNAs spotted on glass positive probe (induced or repressed) negative probe (bacterial genes on human chip) oligos on glass or synthesised on chip (Affymetrix) point mutants (hybridisation plus/minus)

Images from scanner Resolution standard 10  m [currently, max 5  m] 100  m spot on chip = 10 pixels in diameter Image format TIFF (tagged image file format) 16 bit (65’536 levels of grey) 1cm x 1cm image at 16 bit = 2Mb (uncompressed) other formats exist e.g.. SCN (used at Stanford University) Separate image for each fluorescent sample channel 1, channel 2, etc.

Images in analysis software The two 16-bit images (cy3, cy5) are compressed into 8-bit images Goal : display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image RGB image : Blue values (B) are set to 0 Red values (R) are used for cy5 intensities Green values (G) are used for cy3 intensities Qualitative representation of results

Images : examples cy3 cy5 Spot colorSignal strengthGene expression yellowControl = perturbedunchanged redControl < perturbedinduced greenControl > perturbedrepressed Pseudo-color overlay

Processing of images Addressing or gridding Assigning coordinates to each of the spots Segmentation Classification of pixels either as foreground or as background Intensity extraction (for each spot) Foreground fluorescence intensity pairs (R, G) Background intensities Quality measures

ScanAlyze Parameters to address the spots positions Separation between rows and columns of grids Individual translation of grids Separation between rows and columns of spots within each grid Small individual translation of spots Overall position of the array in the image Addressing (I) The basic structure of the images is known (determined by the arrayer)

Addressing (II) The measurement process depends on the addressing procedure Addressing efficiency can be enhanced by allowing user intervention (slow!) Most software systems now provide for both manual and automatic gridding procedures

Segmentation (I) Classification of pixels as foreground or background -> fluorescence intensities are calculated for each spot as measure of transcript abundance Production of a spot mask : set of foreground pixels for each spot

Segmentation (II) Segmentation methods : Fixed circle segmentation Adaptive circle segmentation Adaptive shape segmentation Histogram segmentation Fixed circleScanAlyze, GenePix, QuantArray Adaptive circleGenePix, Dapple Adaptive shapeSpot, region growing and watershed Histogram methodImaGene, QuantArraym DeArray and adaptive thresholding

Fixed circle segmentation Fits a circle with a constant diameter to all spots in the image Easy to implement The spots need to be of the same shape and size Bad example !

Adaptive circle segmentation The circle diameter is estimated separately for each spot Dapple finds spots by detecting edges of spots (second derivative) Problematic if spot exhibits oval shapes

Adaptive shape segmentation Specification of starting points or seeds Regions grow outwards from the seed points preferentially according to the difference between a pixel’s value and the running mean of values in an adjoining region.

Histogram segmentation Uses a target mask chosen to be larger than any other spot Foreground and background intensity are determined from the histogram of pixel values for pixels within the masked area Example : QuantArray Background : mean between 5th and 20th percentile Foreground : mean between 80th and 95th percentile Unstable when a large target mask is set to compensate for variation in spot size BkgdForeground

Information extraction

Spot intensity The total amount of hybridization for a spot is proportional to the total fluorescence at the spot Spot intensity = sum of pixel intensities within the spot mask Since later calculations are based on ratios between cy5 and cy3, we compute the average* pixel value over the spot mask *alternative : use ratios of medians instead of means

Background intensity Motivation : spot’s measured intensity includes a contribution of non-specific hybridization and other chemicals on the glass Fluorescence from regions not occupied by DNA should by different from regions occupied by DNA -> could be interesting to use local negative controls (spotted DNA that should not hybridize) Different background methods : Local background, morphological opening, constant background, no adjustment

Local background Focusing on small regions surrounding the spot mask. Median of pixel values in this region Most software package implement such an approach ScanAlyzeImaGeneSpot, GenePix By not considering the pixels immediately surrounding the spots, the background estimate is less sensitive to the performance of the segmentation procedure

Morphological opening (spot) Applied to the original images R and G Use a square structuring element with side length at least twice as large as the spot separation distance Remove all the spots and generate an image that is an estimate of the background for the entire slide For individual spots, the background is estimated by sampling this background image at the nominal center of the spot Lower background estimate and less variable

Constant background Global method which subtracts a constant background for all spots Some findings suggests that the binding of fluorescent dyes to ‘negative control spots’ is lower than the binding to the glass slide -> More meaningful to estimate background based on a set of negative control spots If no negative control spots : approximation of the average background = third percentile of all the spot foreground values

No adjustment Do not consider the background

Quality measures (-> Flag) How good are foreground and background measurements ? Variability measures in pixel values within each spot mask Spot size Circularity measure Relative signal to background intensity b-value : fraction of background intensities less than the median foreground intensity p-score : extend to which the position of a spot deviates from a rigid rectangular grid Based on these measurements, one can flag a spot

Summary The choice of background correction method has a larger impact on the log-intensity ratios than the segmentation method used The morphological opening method provides a better estimate of background than other methods Low within- and between-slide variability of the log 2 R/G Background adjustment has a larger impact on low intensity spots Spot, GenePix ScanAlyze M = log 2 R/G A = log 2 √(RG)

Farmer group in Lausanne Using Imagene 4.2 To avoid great variability of the ratios around the low-intensity spots, we use a cut-off value for the intensity minus background values (e.g ). Lost of information, but no bad information ! M A

References Yang, Y. H., Buckley, M. J., Dudoit, S. and Speed, T. P. (2001), ‘Comparisons of methods for image analysis on cDNA microarray data’. Technical report #584, Department of Statistics, University of California, Berkeley. Yang, Y. H., Buckley, M. J. and Speed, T. P. (2001), ‘Analysis of cDNA microarray images’. Briefings in bioinformatics, 2 (4),

Imagene demo