©2003/04 Alessandro Bogliolo Analysis of gene expression by means of Microarrays
Outline
1. Introduction
2. Data acquisition
3. Image processing
   1. Noise filtering
   2. Quality evaluation
   3. Normalization
4. Data analysis
   1. Scatterplot
   2. Classification – Principal component analysis
   3. Clustering – Hierarchical clustering
Microarray technology
Microarray experiment
Expression matrix (figure): … genes × 132 conditions
– Rows: Gene 1, Gene 2, Gene 3, Gene 4, …
– Columns: 30min, 1hr, 2hr, 4hr; 30min 2MA, 1hr 2MA, 2hr 2MA, 4hr 2MA; 30min AIG, 1hr AIG, 2hr AIG, 4hr AIG; …
Microarray scanner
Noise
Noise sources:
– Sample preparation, labeling, amplification
– Reaction variations
– Environment
– Target volume
– Hybridization parameters (temperature, time, ...)
– Nonspecific hybridization
– Dust
– Scanner settings
– Quantization
Noise filtering
Gridding: identify spot locations
Segmentation: distinguish foreground from background
– Fixed circle: put a circle around the foreground area
– Seeded region growing: identify initial spot "seeds" and grow high-intensity regions
– Edge detection algorithms
Background cancellation
– Intensity = FGintensity - BGintensity
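The fixed-circle segmentation and background cancellation steps can be sketched as follows. This is a minimal illustration, not the method of any particular scanner software. It assumes gridding has already produced one small pixel cell per spot. The function name `fixed_circle_intensity`, the choice of mean for the foreground and median for the background, and the toy cell are all assumptions made for the example.

```python
from statistics import mean, median


def fixed_circle_intensity(image, center_row, center_col, radius):
    """Segment one spot cell with a fixed circle and subtract the background.

    image: 2-D list of pixel intensities covering a single grid cell.
    Pixels inside the circle count as foreground (FG), all others as
    background (BG). Returns (fg, bg, fg - bg).
    """
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    fg = mean(foreground)
    bg = median(background)  # assumption: median as a robust local background
    return fg, bg, fg - bg


# Hypothetical 5x5 cell: background level 10, a small bright spot at the center.
cell = [[10] * 5 for _ in range(5)]
for r, c in [(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)]:
    cell[r][c] = 110

fg, bg, corrected = fixed_circle_intensity(cell, 2, 2, 1)
print(fg, bg, corrected)
```

A fixed circle is the simplest of the three segmentation options listed above; it misjudges spots whose size or shape is irregular, which is where seeded region growing or edge detection would be preferred.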
Quality evaluation
– Irregular size or shape
– Irregular placement
– Low intensity
– Saturation
– Spot variance
– Background variance
Example spots (figure): indistinguishable, saturated, bad print, artifact, misalignment
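Some of the criteria above (low intensity, saturation, spot variance) can be checked per spot with simple statistics. The sketch below is only an illustration: the function name `quality_flags` and every threshold (`saturation_level`, `min_signal`, `max_cv`) are hypothetical values chosen for the example, not values taken from the slides.

```python
from statistics import mean, pvariance


def quality_flags(fg_pixels, bg_pixels, saturation_level=65535,
                  min_signal=50.0, max_cv=0.5):
    """Return a list of quality problems detected for one spot.

    fg_pixels / bg_pixels: foreground and background pixel intensities.
    saturation_level: assumed 16-bit scanner maximum (hypothetical default).
    min_signal: minimum foreground-minus-background signal (hypothetical).
    max_cv: maximum coefficient of variation inside the spot (hypothetical).
    """
    flags = []
    fg_mean, bg_mean = mean(fg_pixels), mean(bg_pixels)
    if fg_mean - bg_mean < min_signal:
        flags.append("low intensity")
    if max(fg_pixels) >= saturation_level:
        flags.append("saturation")
    if fg_mean > 0 and pvariance(fg_pixels) ** 0.5 / fg_mean > max_cv:
        flags.append("high spot variance")
    return flags


good = quality_flags([1000, 1100, 900, 1000], [100, 110, 90, 100])
faint = quality_flags([120, 130, 110, 125], [100, 105, 95, 100])
saturated = quality_flags([65535, 60000, 65535, 64000], [100, 100, 100, 100])
print(good, faint, saturated)
```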
Normalization
Normalize data to correct for artificial variances
Red = FGred - BGred
Green = FGgreen - BGgreen
PixelValue = log2(Red/Green) - log2(Red_avg/Green_avg)
Pixel color:
– Green if pixel value < 0
– Yellow if pixel value = 0
– Red if pixel value > 0
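The normalization formula can be worked through on a few spots. The sketch below assumes that Red_avg and Green_avg are averages of the background-corrected intensities over all spots on the array, and that all corrected intensities are positive. The function names and the three example spots are hypothetical.

```python
from math import log2


def normalized_log_ratios(spots):
    """Compute log2(Red/Green) - log2(Red_avg/Green_avg) for each spot.

    spots: list of (fg_red, bg_red, fg_green, bg_green) tuples.
    Assumes every background-corrected intensity is positive.
    """
    red = [fg_r - bg_r for fg_r, bg_r, _, _ in spots]
    green = [fg_g - bg_g for _, _, fg_g, bg_g in spots]
    # Assumption: the averages are taken over all spots on the array.
    offset = log2((sum(red) / len(red)) / (sum(green) / len(green)))
    return [log2(r / g) - offset for r, g in zip(red, green)]


def pixel_color(value, tolerance=1e-9):
    """Map a normalized value to the display color used on the slide."""
    if value < -tolerance:
        return "green"
    if value > tolerance:
        return "red"
    return "yellow"


# Hypothetical array where the red channel is under-detected by a factor of 2.
spots = [(120, 20, 230, 30), (420, 20, 230, 30), (120, 20, 830, 30)]
values = normalized_log_ratios(spots)
colors = [pixel_color(v) for v in values]
print(values, colors)
```

In this toy data the first spot has a raw ratio of 0.5, but after subtracting the array-wide offset it comes out at 0 (yellow), which is the correction for an uncalibrated red channel.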
Normalization (figure)
– Calibrated: red and green equally detected
– Uncalibrated: red light under-detected
Scatterplot
Classification
Goal: identify a subset of genes that distinguishes between treatments, tissues, etc.
Method
– Collect several samples grouped by treatment (e.g. ALL vs. AML)
– Use genes as "features"
– Build a classifier to distinguish treatments
Classifiers
– Neural networks, decision trees, ...
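The method above can be illustrated with the smallest possible decision tree: a single split on one gene (a "decision stump"). This is a sketch under stated assumptions, not the classifier used in the lecture. The function names `train_stump` and `predict` and the six expression profiles are invented for the example.

```python
def train_stump(samples, labels):
    """Pick the single gene and threshold that best separate two classes.

    samples: list of expression profiles (one list of gene values per sample).
    labels: list of class names, one per sample (exactly two distinct classes).
    Returns (gene_index, threshold, class_below, class_above, training_errors).
    """
    classes = sorted(set(labels))
    best = None
    for gene in range(len(samples[0])):
        values = sorted(set(s[gene] for s in samples))
        # Candidate thresholds: midpoints between consecutive observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            # Try both ways of assigning the two classes to the two sides.
            for below, above in (classes, classes[::-1]):
                errors = sum(
                    (below if s[gene] <= threshold else above) != y
                    for s, y in zip(samples, labels)
                )
                if best is None or errors < best[4]:
                    best = (gene, threshold, below, above, errors)
    return best


def predict(stump, sample):
    """Classify one expression profile with a trained stump."""
    gene, threshold, below, above, _ = stump
    return below if sample[gene] <= threshold else above


# Hypothetical data: 3 genes measured on 3 ALL and 3 AML samples.
samples = [
    [1.0, 5.0, 0.2], [1.2, 4.8, 0.9], [0.9, 5.1, 0.5],   # ALL
    [1.1, 2.0, 0.4], [1.0, 1.8, 0.8], [1.3, 2.2, 0.1],   # AML
]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

stump = train_stump(samples, labels)
prediction = predict(stump, [1.0, 4.9, 0.3])
print(stump, prediction)
```

The gene index chosen by the stump is itself the kind of result the goal asks for: a gene that distinguishes the two groups. With thousands of genes and few samples, a split that is perfect on the training data can easily be a chance finding, so held-out samples are needed to trust it.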
Principal component analysis
Clustering
Hypothesis: genes with similar function have similar expression profiles
– Find groups of genes with similar expression profiles
– Find groups of individuals with similar expression profiles within a population
Clustering: k-means algorithm
(Figure: panels A, B, C, D)
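A minimal sketch of the standard k-means iteration follows: assign each profile to its nearest centroid, recompute the centroids, and repeat until the assignment stops changing. Using the first k profiles as initial centroids is an assumption made to keep the example deterministic (random seeding is more common). The function name and the data are hypothetical.

```python
def kmeans(points, k, iterations=100):
    """Cluster points into k groups; returns (assignment, centroids)."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no profile changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical expression profiles (two conditions per gene).
profiles = [[1.0, 1.1], [5.0, 5.2], [0.9, 1.0], [5.1, 4.9], [1.1, 0.9]]
assignment, centroids = kmeans(profiles, k=2)
print(assignment, centroids)
```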
Hierarchical clustering
– Top-down (division)
– Bottom-up (agglomeration)
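The bottom-up variant can be sketched as follows: start with one cluster per profile and repeatedly merge the two closest clusters. Single linkage with Euclidean distance is an assumption made for this example; the slides do not name a linkage rule. The function name, the `target_clusters` parameter, and the data are hypothetical.

```python
def agglomerate(profiles, target_clusters=1):
    """Bottom-up clustering; returns (clusters, merges) as lists of indices."""
    clusters = [[i] for i in range(len(profiles))]
    merges = []  # merge history, which defines the dendrogram

    def distance(i, j):
        return sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])) ** 0.5

    def linkage(c1, c2):
        # Assumption: single linkage (distance between the closest members).
        return min(distance(i, j) for i in c1 for j in c2)

    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical expression profiles.
profiles = [[1.0, 1.0], [1.1, 1.0], [5.0, 5.0], [5.0, 5.3], [9.0, 1.0]]
clusters, merges = agglomerate(profiles, target_clusters=3)
print(clusters, merges)
```

Running the loop down to a single cluster records the full merge history, which is the information a dendrogram displays. The top-down variant works in the opposite direction, starting from one cluster and splitting it.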