Making the Most of Small Sample High Dimensional Micro-Array Data
Allan Tucker, Veronica Vinciotti, Xiaohui Liu; Brunel University
Paul Kellam; Windeyer Institute
Micro-Array Data
- High dimensional, with a small number of samples
- Need to identify predictive genes, e.g. for classification
- Rate confidence in genes based upon their predictive (classification) ability
Identifying Predictive Genes
- We use a Naïve Bayes classifier
  - Well established
  - Minimises the number of parameters
- Feature selection using simulated annealing (SA)
  - Repeated 10 times
  - Cross-validation applied
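The procedure on this slide (a naive Bayes classifier whose gene subset is chosen by simulated annealing and scored by cross-validation) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes Gaussian class-conditional densities for expression values (the slides do not say how expression was modelled), uses 5-fold cross-validated accuracy as the SA objective, proposes add/drop/swap moves under a hypothetical cap `max_genes`, and cools the temperature geometrically. All function names and parameter values (`fit_gaussian_nb`, `anneal_select`, `t0`, `cooling`, etc.) are illustrative.

```python
import math
import random


def fit_gaussian_nb(X, y, features):
    """Class log-priors plus per-class mean/variance for each chosen gene.

    Assumption: Gaussian class-conditional densities (not stated in the slides).
    """
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for f in features:
            vals = [r[f] for r in rows]
            mu = sum(vals) / len(vals)
            # Small floor keeps the variance positive when a class has identical values.
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6
            stats.append((f, mu, var))
        model[c] = (math.log(len(rows) / len(y)), stats)
    return model


def predict(model, x):
    """Class with the highest log posterior, treating genes as independent given the class."""
    def log_posterior(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[f] - mu) ** 2 / (2 * var)
            for f, mu, var in stats
        )
    return max(model, key=log_posterior)


def cv_accuracy(X, y, features, k=5):
    """k-fold cross-validated accuracy of naive Bayes restricted to `features`."""
    if not features:
        return 0.0
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train], features)
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(y)


def anneal_select(X, y, max_genes=3, steps=200, t0=0.05, cooling=0.97, seed=0):
    """Simulated-annealing search over gene subsets of size <= max_genes (hypothetical cap)."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    current = {rng.randrange(n_genes)}
    cur_score = cv_accuracy(X, y, sorted(current))
    best, best_score = set(current), cur_score
    t = t0
    for _ in range(steps):
        g = rng.randrange(n_genes)
        candidate = set(current)
        if g in candidate:
            candidate.discard(g)  # move: drop a gene
        else:
            if len(candidate) >= max_genes:
                # move: swap out a random gene so the cap is never exceeded
                candidate.discard(rng.choice(sorted(candidate)))
            candidate.add(g)  # move: add a gene
        new_score = cv_accuracy(X, y, sorted(candidate))
        delta = new_score - cur_score
        # Always accept improvements; accept worse subsets with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = candidate, new_score
            if cur_score > best_score:
                best, best_score = set(current), cur_score
        t *= cooling  # geometric cooling schedule
    return sorted(best), best_score


# Toy data: gene 0 separates the two classes, genes 1-5 are uniform noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[10 * label + data_rng.random()] + [data_rng.random() for _ in range(5)] for label in y]
genes, score = anneal_select(X, y)
print(genes, score)
```

Calling `anneal_select` with several different seeds would mirror the repeated runs mentioned on the slide; the naive Bayes model keeps the number of estimated parameters to two per gene per class.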
Identifying Predictive Genes
- Identify genes robustly:
  - Data perturbed during cross-validation (CV)
  - Repeated runs of the stochastic SA search
- Assign confidence to each gene based on how frequently it is selected
- Limit the maximum number of links
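The confidence scheme on this slide, counting how often each gene is selected as the training data are perturbed by cross-validation and the stochastic search is rerun, might be sketched as follows. The number of repeats (10) and folds (5), the reshuffling of folds on every repeat, and defining confidence as the fraction of runs that select a gene are all assumptions; `select` is a placeholder for one SA run, and `toy_select` is a hypothetical stand-in used only so the example runs. A cap on the number of selected genes (the slide's limit on links, if those are the class-to-gene arcs of the naive Bayes network) would be enforced inside `select`.

```python
import random
from collections import Counter


def selection_confidence(n_samples, n_genes, select, repeats=10, k=5, seed=0):
    """Fraction of (repeat, fold) runs in which each gene is selected.

    `select(train_indices, rng)` stands in for one stochastic search run on a
    training fold and must return an iterable of gene indices.
    """
    rng = random.Random(seed)
    counts = Counter()
    runs = 0
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # new fold assignment each repeat perturbs the training data
        for fold in range(k):
            train = [idx for pos, idx in enumerate(order) if pos % k != fold]
            counts.update(set(select(train, rng)))  # each gene counted at most once per run
            runs += 1
    return {g: counts[g] / runs for g in range(n_genes)}


# Toy data: gene 0 separates the classes, genes 1-5 are uniform noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[10 * label + data_rng.random()] + [data_rng.random() for _ in range(5)] for label in y]


def toy_select(train, rng):
    """Hypothetical stand-in for one SA run: largest class-mean gap gene plus one random gene."""
    def gap(g):
        a = [X[i][g] for i in train if y[i] == 0]
        b = [X[i][g] for i in train if y[i] == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return {max(range(len(X[0])), key=gap), rng.randrange(len(X[0]))}


confidence = selection_confidence(len(y), len(X[0]), toy_select)
ranked = sorted(confidence, key=confidence.get, reverse=True)
print(ranked[:3], confidence)
```

Genes that survive both the data perturbation and the randomness of the search end up near a confidence of 1, while genes picked up by chance are spread thinly across runs.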
Effect of Model Complexity
Classification Accuracy
- Generally RSN performs best
- SA global search is better than local search
- Anomaly with the B-cell data?
- Synthetic data supports global over local search
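The slides do not spell out why a global SA search should beat a local one. A common explanation, assuming the SA here uses the standard Metropolis acceptance rule, is that a local hill-climber accepts only moves that do not lower the score change Δ (e.g. the change in cross-validated accuracy), whereas SA also accepts worse moves with a probability governed by a temperature T that is lowered over the run:

```latex
P_{\text{local}}(\text{accept}) =
\begin{cases} 1, & \Delta \ge 0 \\ 0, & \Delta < 0 \end{cases}
\qquad
P_{\text{SA}}(\text{accept}) = \min\left(1,\; e^{\Delta / T}\right)
```

For example, a move that lowers accuracy by Δ = −0.05 is accepted with probability e^{−1} ≈ 0.37 at T = 0.05 but only e^{−10} ≈ 4.5 × 10^{−5} at T = 0.005, so early in the run SA can climb out of local optima, and late in the run it behaves like a local search.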
Confidence Scores
- A relatively small number of genes identified with high confidence
- Consistency between runs
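The slides do not say how consistency between runs was measured. One simple option, sketched here as an assumption rather than the authors' method, is the Jaccard overlap of the high-confidence gene sets from two independent runs. The 0.8 threshold, the gene names `g1` to `g4` and the scores are made up purely for illustration.

```python
def high_confidence(conf, threshold=0.8):
    """Genes whose confidence score meets a (hypothetical) threshold."""
    return {g for g, c in conf.items() if c >= threshold}


def jaccard(a, b):
    """Overlap of two gene sets: |A ∩ B| / |A ∪ B| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up confidence scores from two independent runs, for illustration only.
run1 = {"g1": 0.95, "g2": 0.90, "g3": 0.40, "g4": 0.10}
run2 = {"g1": 0.92, "g2": 0.35, "g3": 0.88, "g4": 0.05}
overlap = jaccard(high_confidence(run1), high_confidence(run2))
print(overlap)  # → 0.3333333333333333
```

A value near 1 would indicate that repeated runs agree on the same small set of high-confidence genes; here only `g1` is shared, giving 1/3.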
Identified Genes
Conclusions
When micro-array data has only a small number of samples:
- Simple models with few parameters work best
- Global search for parameters is better than local search
- The proposed RSN successfully identifies genes of interest, paving the way for further biological analysis
- Different parameters still need to be explored