Analysis and Interpretation of Microarray Data Michael F. Miles, M.D., Ph.D. Depts. of Pharmacology/Toxicology and Neurology and the Center for Study of Biological Complexity Virginia Commonwealth University Richmond, VA
Expression Profiling: A Non-biased, Genomic Approach to Resolving the Mechanisms of Addiction Candidate Gene Studies Cycles of Expression Profiling: “Molecular Triangulation” Merge with Biological Databases
High Density DNA Microarrays
Oligonucleotide Array Analysis
Workflow (from figure): total RNA (poly-A), oligo(dT)-T7 priming, RTase/Pol II, dsDNA carrying the T7 promoter, T7 pol with CTP-biotin, biotin-cRNA, hybridization, streptavidin-phycoerythrin staining, scanning (PM and MM probes)
Stepwise Analysis of Microarray Data
Low-level analysis -- image analysis, expression quantitation
Primary analysis -- is there a change in expression?
Secondary analysis -- what genes show correlated patterns of expression? (supervised vs. unsupervised)
Tertiary analysis -- is there a phenotypic “trace” for a given expression pattern?
Flow chart labels: Hybridization and Scanning; GE Database (SQL Server); Primary Analysis (MAS-5, S-score, dChip, PDNN); Clustering Techniques; Statistical Filtering (e.g. SAM); Overlay Biological Databases (PubGene, GenMAPP, EASE, WebQTL, etc.); Provisional Gene “Patterns”; Filtered Gene Lists; Candidate Genes; Molecular Validation (RT-PCR, in situ, Western); Behavioral Validation; Normalize, De-noise; Experimental Design
Quality Assessment
Gene specific: R/G correlation, %BG, %spot, biological variation
Array specific: normalization factor, % genes present, linearity, control/spike performance (e.g. 5’/3’ ratio, intensity)
Across arrays: linearity, correlation, background, normalization factors
Sources of Variance in Microarray Experiments
Chip Normalization Procedures
Whole chip intensity
 –Assumes relatively few changes, uniform error/noise across chip and abundance classes
 –Linear vs. “piecewise” linear (quantile, lowess)
Spiked standards
 –Requires exquisite technical control, assumes uniform behavior
Internal Standards
 –Assumes no significant regulation
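Note: the two whole-chip approaches named above can be sketched in a few lines. This is a minimal illustration, not the procedure used by any particular package; the function names, the target intensity of 100 and the 2% trim are assumptions chosen for the example.

```python
from statistics import mean

def scale_to_target(chip, target=100.0, trim=0.02):
    """Linear whole-chip scaling: one factor per chip so that the
    trimmed mean intensity equals `target` (target and trim are assumed values)."""
    ordered = sorted(chip)
    k = int(len(ordered) * trim)                 # values dropped from each tail
    core = ordered[k:len(ordered) - k] if k else ordered
    factor = target / mean(core)
    return [v * factor for v in chip], factor

def quantile_normalize(chips):
    """Quantile normalization: give every chip the same intensity
    distribution, namely the mean of the sorted values across chips.
    Ties are broken by position, which is a simplification."""
    n = len(chips[0])
    sorted_chips = [sorted(c) for c in chips]
    reference = [mean(s[i] for s in sorted_chips) for i in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])   # indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

Scaling applies a single multiplier per chip, so it can only correct a uniform intensity shift; quantile normalization also removes intensity-dependent differences, at the cost of assuming that all chips share one underlying distribution.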
Slide Normalization: Pieces and Pins
[Figure panels: “Lowess” normalization; pin-specific profiles; after print-tip normalization]
See also: Schuchhardt, J. et al., NAR 28: e47 (2000)
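Note: a rough sketch of lowess-style normalization for a two-color slide, applied either to the whole slide or separately per print tip. It uses a plain local linear fit with tricube weights and no robustness iterations, so it is a simplification of full lowess. The function names and the `frac` window of 0.4 are assumptions for illustration.

```python
import math

def lowess_fit(x, y, frac=0.4):
    """Local linear regression with tricube weights; returns the fitted
    value at every x. No robustness iterations (a simplification)."""
    n = len(x)
    k = max(3, math.ceil(frac * n))              # points in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[min(k, n) - 1] or 1e-12        # window half-width
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def ma_normalize(red, green, frac=0.4):
    """Remove the intensity-dependent trend from log ratios:
    M = log2(R/G), A = mean log2 intensity, return M minus the fit of M on A."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

def print_tip_normalize(red, green, pins, frac=0.4):
    """Apply ma_normalize separately to the spots printed by each pin."""
    out = [0.0] * len(red)
    for pin in set(pins):
        idx = [i for i, p in enumerate(pins) if p == pin]
        sub = ma_normalize([red[i] for i in idx], [green[i] for i in idx], frac)
        for i, v in zip(idx, sub):
            out[i] = v
    return out
```

Fitting per pin lets each print tip have its own intensity-dependent bias curve, which is the point of the pin-specific profiles on this slide.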
Affymetrix Arrays: PM-MM Difference Calculation Probe pairs control for non-specific hybridization of oligonucleotides
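Note: the PM-MM calculation can be illustrated as follows. This is a simplified sketch in the spirit of the average-difference summary, assuming outliers are probe pairs whose difference lies more than `n_sd` standard deviations from the mean; the function name and the exact exclusion rule are assumptions, not the vendor's algorithm.

```python
from statistics import mean, pstdev

def avg_diff(pm, mm, n_sd=3.0):
    """Average PM-MM difference over the probe pairs of one probe set,
    after dropping pairs whose difference is an outlier (assumed rule:
    more than n_sd standard deviations from the mean difference)."""
    diffs = [p - m for p, m in zip(pm, mm)]
    center, spread = mean(diffs), pstdev(diffs)
    kept = [d for d in diffs
            if spread == 0 or abs(d - center) <= n_sd * spread]
    return mean(kept)

# Example probe set with four probe pairs (made-up intensities)
signal = avg_diff([120, 150, 130, 140], [100, 110, 105, 100])
```

Subtracting MM from PM is meant to remove the non-specific hybridization component that both probes of a pair share.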
Probe Level Analysis: Challenges
Large variability in PM and MM intensities
Only probe pairs
MM is a complex mixture of true signal and background
Normalization required to compare across chips
Intensity dependent noise
Etc.
Probe Level Analysis Methods
AvgDiff -- Affymetrix 1996, trimmed mean with exclusion of outliers, PM-MM
MAS 5 -- Affymetrix 2001, modeled correction of MM, Tukey’s bi-weight, PM-MM or PM-m
MBEI -- Li and Wong 2001, modeled correction and outlier detection, PM-MM or PM only
RMA (Robust Multichip Analysis) -- Irizarry et al. 2002, PM only
PDNN (Position Dependent Nearest Neighbor) -- Zhang et al. 2003, thermodynamic model for probe interactions, PM only
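Note: Tukey's bi-weight, listed above for MAS 5, is a robust average that down-weights probe pairs far from the median. The sketch below is a one-step version; the tuning constant `c=5` and the small `eps` guard are assumed defaults for illustration, and the input is taken to be log-scale, background-adjusted probe values computed elsewhere.

```python
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey bi-weight estimate of location.
    Points further than c * MAD from the median get zero weight."""
    m = median(values)
    mad = median([abs(v - m) for v in values])   # median absolute deviation
    total_w = 0.0
    total_wx = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)            # scaled distance from median
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        total_w += w
        total_wx += w * v
    return total_wx / total_w

# One aberrant probe value barely moves the estimate (made-up log2 values)
estimate = tukey_biweight([7.9, 8.1, 8.0, 8.2, 12.5])
```

Compared with a plain mean, the estimate stays close to the bulk of the probes when one probe cross-hybridizes or saturates.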
MAS 5 Fold-Change vs. S-scores
Secondary Analysis: Expression Patterns
Supervised multivariate analyses
 –Support vector machines
Non-supervised clustering methods
 –Hierarchical
 –K-means
 –SOM
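Note: as one example of the non-supervised methods listed, here is a small sketch of hierarchical (agglomerative) clustering with average linkage. The distance measure, 1 minus the Pearson correlation, and the function names are assumptions for illustration; profiles with zero variance are not handled.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two expression profiles (non-constant inputs)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.
    `profiles` maps gene name -> list of expression values."""
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}

    def cluster_distance(c1, c2):
        # average linkage: mean of all pairwise gene distances
        return mean([dist[(a, b)] for a in c1 for b in c2])

    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

A correlation-based distance groups genes by the shape of their profile across samples rather than by absolute expression level.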
Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns
[Figure: relative change across PFC, HIP, NAC and VTA; panels: AvgDiff, S-score]
Tertiary Analysis: Connecting Function with Expression Patterns
Annotation
 –UniGene/Swiss-Prot, SOURCE, DAVID
Biased functional assessment
 –Manual, GenMAPP, GeneSpring
Non-biased functional queries
 –PubGene
 –MAPPFinder, DAVID/EASE, GEPAS, GOTree Machine, others
Overlaying genomics and genetics
 –WebQTL
Non-biased (semi) Functional Group Analysis: GenMAPP
Expression Analysis Systematic Explorer -- EASE Genome Biol. 2003;4(10):R70. Epub 2003 Sep 11.
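Note: the kind of over-representation statistic EASE reports can be sketched as a one-tailed Fisher exact (hypergeometric) probability, plus a more conservative jackknifed variant in which one gene in the category is removed from the list before testing. The treatment of the counts below (the list loses one hit, the population is held fixed) is an assumption about how to set up the jackknife; consult the EASE paper for the exact definition. Function names are hypothetical. Requires Python 3.8+ for `math.comb`.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from a population of N genes,
    K of which belong to the functional category."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total

def fisher_and_ease(list_hits, list_total, pop_hits, pop_total):
    """Return (Fisher one-tailed p, jackknifed p).
    Jackknife assumption: drop one category gene from the list."""
    fisher = hypergeom_upper_tail(list_hits, list_total, pop_hits, pop_total)
    if list_hits == 0:
        return fisher, 1.0
    ease = hypergeom_upper_tail(list_hits - 1, list_total - 1,
                                pop_hits, pop_total)
    return fisher, ease
```

The jackknifed value is never smaller than the plain Fisher probability, so categories supported by only one or two genes are penalized most.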
EASE -- Options in Analysis
Efforts to Integrate Diverse Biological Databases with Expression Information: PubGene
[Figure: expression panels for NAC, PFC and VTA; columns B6 Et, D2 Et and B6/D2 in each region]
Functional Annotation Association Mining (EASE)
High-throughput Literature Association Mining (PubGene)
Genetic Associations (WebQTL)
Additional Expression Associations (Molecular Triangulation)
Quaternary Analysis: Profiles to Physiology
[Diagram labels: Expression Networks, Expression Profiling, Pharmacology, Genetics, Complex Trait, Prot-Prot Interactions, Ontology, HomoloGene, BioMed Lit Relations]