cDNA Microarrays MB206
What is a cDNA Microarray?
Also known as a DNA chip.
Allows simultaneous measurement of the level of transcription for every gene in a genome (gene expression).
Transcription?
- The process of copying DNA into messenger RNA (mRNA).
- Environment dependent!
The microarray detects mRNA, or rather the more stable cDNA.
The cDNA Microarray Technique
High-throughput measurement
- expression of 5000-20000 genes at the same time
Identify genes that behave differently in different cell populations
- tumor cells vs healthy cells
- brain cells vs liver cells
- same tissue, different organisms
Time series experiments
- gene expression over time after treatment
Overview
[Figure: workflow of a cDNA microarray experiment. Labels: cDNA clones (probes); PCR product amplification and purification; printing (0.1 nl per spot); RNA from tumor sample and reference sample, cDNA; hybridize; scanning with red laser and green laser (excitation, emission); overlay images and normalise; microarray; analysis.]
Creating the slides
RNA Extraction & Hybridization
[Figure labels: hybridize; RNA; tumor sample; reference sample; cDNA.]
Example of a cDNA Microarray
Scanning & Image Analysis
Data Output
[Figure: flowchart of a microarray study. Labels: biological question (differentially expressed genes, sample class prediction, etc.); experimental design; microarray experiment; 16-bit TIFF files; image analysis; (Rfg, Rbg), (Gfg, Gbg); normalization; R, G; estimation; testing; clustering; discrimination; biological verification and interpretation.]
Reading an array
A laser scans the array and produces images.
- One laser for each color, e.g. one for green, one for red
Image analysis, main tasks:
- Noise suppression
- Spot localization and detection, including the extraction of the background intensity, the spot position, and the spot boundary and size
- Data quantification and quality assessment
Image analysis is a book of its own: Kamberova, G. & Shah, S. "DNA Array Image Analysis: Nuts & Bolts". DNA Press LLC, 2002.
Data Transformation
"Observed" data $\{(R, G)\}_{n=1,\dots,5184}$:
- $R$ = red channel signal
- $G$ = green channel signal
(background corrected or not)
Transformed data $\{(M, A)\}_{n=1,\dots,5184}$:
- $M = \log_2(R/G)$ (ratio)
- $A = \log_2\sqrt{R \cdot G} = \tfrac{1}{2}\log_2(R \cdot G)$ (intensity signal)
Inverse: $R = (2^{2A+M})^{1/2}$, $G = (2^{2A-M})^{1/2}$
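The transformation and its inverse can be sketched in a few lines of Python. This is only an illustration of the formulas on this slide: the function names `to_ma` and `from_ma` are made up here, and the sketch assumes both signals are strictly positive (background-corrected values that are zero or negative would need separate handling).

```python
import math

def to_ma(r, g):
    """Map one spot's (R, G) signals to (M, A).

    M = log2(R/G) is the log-ratio, A = 0.5 * log2(R*G) the log-intensity.
    Assumes r > 0 and g > 0.
    """
    log_r = math.log2(r)
    log_g = math.log2(g)
    return log_r - log_g, 0.5 * (log_r + log_g)

def from_ma(m, a):
    """Inverse map: R = (2^(2A+M))^(1/2), G = (2^(2A-M))^(1/2)."""
    r = (2.0 ** (2.0 * a + m)) ** 0.5
    g = (2.0 ** (2.0 * a - m)) ** 0.5
    return r, g

# Illustrative values: the red signal is four times the green signal.
m, a = to_ma(1024.0, 256.0)
r, g = from_ma(m, a)
```

With these illustrative inputs, M is 2 (a four-fold ratio on the log2 scale) and A is 9, and the inverse recovers the original pair.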
Normalization
Raw data show:
- a bias towards the green channel
- intensity-dependent artifacts
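As a minimal sketch of the simplest possible correction, the log-ratios can be shifted so that their median is zero. This assumes that most genes are not differentially expressed, so the typical spot should have M = 0; a bias towards green shows up as a negative median because M = log2(R/G). The function name `median_center` and the numbers are illustrative only. A constant shift does not remove intensity-dependent artifacts; that would need a fit of M against A, which is not shown here.

```python
import statistics

def median_center(m_values):
    """Subtract the median log-ratio from every spot."""
    shift = statistics.median(m_values)
    return [m - shift for m in m_values]

# Illustrative log-ratios with a negative (green) bias.
raw_m = [-0.9, -0.4, -0.5, -0.6, 0.3]
centered = median_center(raw_m)
```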
Replicated measurements
- Scaled print-tip normalization
- Median Absolute Deviation (MAD) scaling
- Averaging
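One way to read "scaled print-tip normalization" with MAD scaling is sketched below. It is an assumption about the procedure, not a statement of what was actually done for these slides: each print-tip group is centered on its own median (a simple stand-in for an intensity-dependent fit), and then rescaled so that all groups share the same MAD, using the geometric mean of the group MADs as the common scale. The names `mad` and `scale_print_tips` and the data are hypothetical.

```python
import math
import statistics

def mad(values):
    """Median absolute deviation from the median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])

def scale_print_tips(groups):
    """groups maps a print-tip id to the list of M values printed by that tip.

    Returns the same structure, median-centered and rescaled so that every
    group has the same MAD. Assumes every group has a non-zero MAD.
    """
    centered = {
        tip: [m - statistics.median(ms) for m in ms]
        for tip, ms in groups.items()
    }
    mads = {tip: mad(ms) for tip, ms in centered.items()}
    # Common scale: geometric mean of the per-group MADs.
    common = math.exp(sum(math.log(s) for s in mads.values()) / len(mads))
    return {
        tip: [m * common / mads[tip] for m in ms]
        for tip, ms in centered.items()
    }

# Illustrative data: the second tip has four times the spread of the first.
groups = {"tip1": [-1.0, 0.0, 1.0], "tip2": [-4.0, 0.0, 4.0]}
scaled = scale_print_tips(groups)
```

After scaling, replicate spots for the same gene can be averaged on the M scale.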
Identification of differentially expressed genes
- Extreme in M values?
- Extreme in T values?
- ...or extreme in some other statistic?
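The slides do not define T. A common choice, assumed here, is the one-sample t-statistic of a gene's replicate M values: the mean divided by its standard error. The sketch below computes it and ranks genes by absolute T. The names `t_statistic` and `replicates` and all numbers are illustrative, and the statistic behind the slides' own gene list may be defined differently.

```python
import math
import statistics

def t_statistic(m_replicates):
    """Return (mean, standard error, T) for one gene's replicate M values.

    Assumes at least two replicates with non-zero sample variance.
    """
    n = len(m_replicates)
    mean = statistics.mean(m_replicates)
    se = statistics.stdev(m_replicates) / math.sqrt(n)
    return mean, se, mean / se

# Hypothetical replicate log-ratios for two genes.
replicates = {
    "geneA": [-0.8, -0.9, -0.7, -1.0],
    "geneB": [0.1, -0.2, 0.3, -0.1],
}
stats = {gene: t_statistic(ms) for gene, ms in replicates.items()}

# Rank genes by the absolute value of T, most extreme first.
ranked = sorted(stats.items(), key=lambda item: abs(item[1][2]), reverse=True)
```

Ranking by T rather than by M favours genes whose change is consistent across replicates over genes with a large but noisy average.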
List of genes that the biologist can understand and verify with other experiments

Gene   Mavg    Aavg   T       SE
2341   -0.86   10.9   -18.0   0.125
6412   -0.75   11.1   -14.7   0.102
6123   -0.70    9.8   -12.2   0.121
102     0.65   10.3   -14.5   0.136
2020    0.64    9.3   -11.9   0.118
3132    0.62    9.9   -14.4   0.090
4439   -0.62    9.7   -14.6   0.088
2031   -0.61   10.7   -13.7   0.087
657    -0.60    9.2   -13.6   0.094
502     0.58   10.0   -12.7   0.101
1239   -0.58    9.8   -11.4   0.103
5392   -0.57    9.9   -20.7   0.057
3921    0.52   11.3    13.5   0.083
...
Time Course Gene Expression Profiles
Statistical Problems
Image analysis
- what is foreground?
- what is background?
Quality
- which spots can we trust?
- which slides can we trust?
Artifacts from preparing the RNA, the printing, the scanning etc.
Data cleanup
Normalization within an experiment:
- when few genes change.
- when many genes change.
- dye-swap to minimize dye effects.
Normalization between experiments:
- location and scale effects.
What is noise and what is variability?
Which genes are actually up- and down-regulated?
P-values.
Planning of experiments:
- what is the best design?
- what is an optimal sample size?
Classification:
- of samples.
- of genes.
Clustering:
- of samples.
- of genes.
Time course experiments.
Gene networks.
- identification of pathways
...
Overview of Example
Brown & Botstein, 1999