A robust neural networks approach for spatial and intensity-dependent normalization of cDNA microarray data
A.L. Tarca, J.E.K. Cooke and J. MacKay
Presented by Dana Mohamed
Microarrays
Importance of Microarrays (and that the data is correct)
Assumption: microarray data linearly reflects the amount of mRNA present in the cell
–In turn, this reflects gene expression levels
If the data is incorrect,
–So is our interpretation of gene expression
–And therefore all the science built on that interpretation is also incorrect
Where the error is
Intensity of fluorescence
–Overall imbalance of dye intensity
–2 dyes: Cy5 (R) and Cy3 (G)
–If R and G are expressed at equal levels, R/G = 1
Space
–Intensities vary with spot coordinates
–Can be “dirty” on the sides of the microarray
Previous Methods
Many address intensity bias
Few address spatial bias
Most rely on M* = M − m
–M* is the normalized log-ratio
–M is the raw log-ratio (M = log2(R/G))
–m is the estimate of the bias
Important Variables
M = log2(R/G)
–Log ratio converts multiplicative error to additive error
A = (1/2) log2(RG)
–Average of the log-intensities
Minus-add (MA) plots
–M vs. A
–Useful for assessing systematic bias
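The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `ma_values` and the example intensities are made up, and the intensities are assumed to be positive (for example, already background-corrected).

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its R (Cy5) and G (Cy3) intensities."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    m = math.log2(red / green)        # M: log-ratio ("minus")
    a = 0.5 * math.log2(red * green)  # A: average of the log-intensities ("add")
    return m, a

# Hypothetical (R, G) pairs: equal, R four times G, G four times R.
spots = [(1000.0, 1000.0), (2000.0, 500.0), (256.0, 1024.0)]
ma = [ma_values(r, g) for r, g in spots]
```

A spot with R = G gives M = 0, and a four-fold difference gives M = ±2, which is why a systematic dye imbalance shows up as a shift or curve away from M = 0 on an M vs. A plot.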
Calculating m in other methods
gMed – global median normalization
–m = median(M_i)
–M_i are all the values of M
pLo – print tip loess
–m = c_i(A)
pLoGS
–Found in GeneSight (biodiscovery.com)
–Local group median (3x3 square regions) + print tip loess
cPLo2D – print tip loess + pure 2D normalization
–BioConductor (bioconductor.org)
–m = α c_i(A) + β c_i(SpotRow,SpotCol)
–c_i(SpotRow,SpotCol) is the loess estimate of M using spot row and column coordinates inside the ith print tip
gLoMedF – global loess normalization + spatial median filter
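As a minimal sketch of the simplest of these methods, gMed, the bias m is the median of all log-ratios and the normalized values follow M* = M − m. The function name and the sample log-ratios are illustrative only; the loess-based methods would replace the single median with a fitted curve per print tip.

```python
import statistics

def global_median_normalize(log_ratios):
    """gMed: subtract the median of all M values from every M value."""
    m = statistics.median(log_ratios)           # bias estimate m
    return [value - m for value in log_ratios], m

# Hypothetical raw log-ratios for five spots on one slide.
raw_m = [0.4, 0.1, 0.3, 2.5, 0.2]
normalized, bias = global_median_normalize(raw_m)
```

After this step the median of the normalized log-ratios is zero, but any dependence of the bias on intensity or position is left untouched, which is the gap the other methods try to close.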
Robust Neural Networks Technique
pNN2DA – print tip robust neural nets 2D and A
–Attempts to find the best fit of M using A and the 2-D space coordinates of the spots: m = c_i(A,X,Y)
–Instead of using individual print tips, use 3x3 “bins” of them for X and Y
–Accounts for spatial bias
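One way to picture the binned coordinates is integer division of grid indices. The sketch below assumes a bin groups 3x3 neighbouring grid positions and that indices are 1-based; the slide is terse on both points, so the function `bin_coordinates` and its conventions are hypothetical rather than the authors' procedure.

```python
def bin_coordinates(row, col, bin_size=3):
    """Map 1-based grid indices (row, col) to 1-based bin indices (X, Y).

    Assumption: a bin is a bin_size x bin_size block of neighbouring positions.
    """
    if row < 1 or col < 1:
        raise ValueError("row and col are expected to be 1-based")
    y = (row - 1) // bin_size + 1   # Y: bin index along the rows
    x = (col - 1) // bin_size + 1   # X: bin index along the columns
    return x, y

# Positions (1,1) through (3,3) share a bin; (4,7) falls in another one.
first_bin = bin_coordinates(3, 3)
other_bin = bin_coordinates(4, 7)
```

Coarser coordinates like these give the fit fewer distinct spatial values to chase, which is consistent with the goal of capturing spatial trends without following individual spots.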
Neural Nets Terminology
Uses multi-layer feedforward network
Sigmoid function
Neural Networks
Uses multi-layer feedforward network
–x is the input vector (X, Y, A, 1), so I = 3
–w are the weights
–σ1 represents the hidden neurons, which are sigmoid functions
–σ2 is the single neuron in the output layer, which is also sigmoid
–σ1 with index J+1 accounts for the second-layer bias
–J is the number of neurons in the hidden layer of the network
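The forward pass described on this slide can be sketched as follows. The weight layout (one row of I+1 = 4 weights per hidden neuron, and J+1 output weights with the last one acting as the second-layer bias) is an assumption about how to arrange the quantities named above, and the function names are illustrative. Since a sigmoid output lies in (0, 1), M would presumably have to be rescaled to that range before fitting; no such scaling is shown here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, y, a, hidden_weights, output_weights):
    """Evaluate the network for one spot.

    hidden_weights: J rows, each with 4 weights for the inputs (X, Y, A, 1).
    output_weights: J+1 weights; the last multiplies a constant 1 (bias).
    """
    inputs = [x, y, a, 1.0]                      # trailing 1 carries the first-layer bias
    hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs)))
              for row in hidden_weights]         # J sigmoid hidden neurons
    hidden.append(1.0)                           # the (J+1)th term: second-layer bias
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical weights with J = 3 hidden neurons, all zero.
zero_hidden = [[0.0] * 4 for _ in range(3)]
baseline = forward(0.2, 0.4, 0.6, zero_hidden, [0.0] * 4)
biased = forward(0.2, 0.4, 0.6, zero_hidden, [0.0, 0.0, 0.0, 2.0])
```

With all weights at zero every neuron outputs 0.5; raising only the last output weight moves the prediction through the bias term alone, which shows the role of the J+1 index.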
Multi-layered Feedforward
Usually J = 3, to take care of outliers while also avoiding over-fitting
Criteria & Datasets
Criteria:
a) Reduce the variability of log-ratios between replicated slides and within slides
b) Ability to distinguish truly regulated genes from the other genes
Datasets:
1) Apo AI: a, b
2) Swirl Zebra Fish: a
3) Poplar experiment: a
4) Perturbed Apo AI: b
Classic Neural Nets vs. Robust NNets
Criteria refresher
The ability to reduce the variability of log-ratios between replicated slides and within slides
The ability to distinguish truly regulated genes from the other genes
Impact on Variability
Cont. – 3 Data Sets
Downregulated Gene Sorting – Apo AI set
DRGS – Perturbed Apo AI set
Spatial Uniformity of M values distribution
Results Table
Strengths/Weaknesses
Seems promising
Uses multiple tests to determine efficacy
Doesn’t use enough datasets
Uses patterned perturbed dataset
–But no “real” perturbed dataset
Future Work
More datasets
When should this normalization technique be used over other techniques?
Should this technique be combined with elements of other techniques to further improve it?
References
Tarca, A.L., J.E.K. Cooke, and J. MacKay. "A robust neural networks approach for spatial and intensity-dependent normalization of cDNA microarray data." Bioinformatics Jun 2005; 21:
Haykin, Simon. Neural Networks: A Comprehensive Foundation. New Jersey: Prentice Hall,
Mount, David W. Bioinformatics: sequence and genome analysis. New York: Cold Spring Harbor Laboratory Press, 2001.