Dimensionality Reduction for fMRI Brain Imaging Data

Leman Akoglu
Carnegie Mellon University, Computer Science Department

Abstract

Functional Magnetic Resonance Imaging (fMRI) is a powerful instrument for collecting data about activity in the human brain. As in many empirical sciences, this new method has led to a flood of new data.

Motivation: If appropriate analysis tools can be developed for the large amount of data produced, fMRI technology offers revolutionary approaches to the study of human brain function. For example, if cognitive states of the brain could be decoded, medical diagnosis of Alzheimer’s, dementia, brain tumors, or schizophrenia would be possible given the fMRI brain activity of a human subject.

Limitations: (1) sparse data (tens of training examples per human subject), (2) noisy data, (3) extremely high-dimensional feature space (up to 10^5 features).

Objectives: (1) Identify powerful dimensionality reduction methods to make learning easier and faster. (2) Find the most informative features to increase classification accuracy.

FEATURE SELECTION METHODS

Discrim
Train a separate classifier for each voxel. Each voxel has 16 features (one per image in an 8-sec interval). The accuracy of each single-voxel classifier on the training data is taken as the measure of that voxel's discriminating power. Pick the top n most discriminating voxels.

Active
Score each voxel based on how active it is relative to the fixation (rest) condition. Pick the top n most active voxels.

ActiveThenDiscrim
Select the m most active voxels. Train a separate classifier for each of the m active voxels. Pick the top n most discriminating active voxels.

DiscrimAndActive
Train a separate classifier for each voxel. Select the top n most discriminating voxels. Select the top n voxels with the highest activity score. Pick the subset of voxels in the intersection (the voxels that are both most active AND most discriminating).

*Time-SeriesAvg
Group voxels whose time series are highly correlated, using covariance as the correlation measure. Average the time series of the voxels in each group to form new supervoxels.

*Time-SeriesMost
Determine the most effective voxel. Find the voxels whose time series are not correlated with that of the most effective voxel (informative voxels). Drop the voxels whose time series are highly correlated with that of the most effective voxel (to reduce redundancy).

EXPERIMENT RESULTS

Picture versus Sentence case study

- consecutive trials for 6 human subjects
- fMRI images every 500 msec
- rest (fixation) periods for zero-signal data
- find a mapping function f : fMRI-sequence(t0, t0+8) → { Picture, Sentence }

Trial timeline:
1st stimulus (picture): 4 secs
Rest (fixation) period: 4 secs
2nd stimulus (sentence): 4 secs

Feature selection                                        AvgErr  A  B  C  D  E  F
All (~5000)
Active (120)
Discrim (120)
ActiveThenDiscrim (nToKeep=120, nActive=2000)
DiscrimAndActive (nDiscrim=120, nActive=2000)
ActiveTSavg (240)
DiscrimTSavg (120)
ActiveThenDiscrimTSavg (nToKeep=120, nActive=2000)
ActiveTSmost (120)
DiscrimTSmost (120)
ActiveThenDiscrimTSmost (nToKeep=120, nActive=2000)

Average error by classifier (parameter settings in parentheses):

Feature selection                       1NN            3NN            9NN          SVM
Active (nToKeep)                        (120) 0.2854   (240) 0.3000   (480)        (240)
Discrim (nToKeep)                       (120) 0.2417   (120) 0.2042   (120)        (120)
ActiveThenDiscrim (nToKeep, nActive)    (240,1000)     (120,1000)     (120,2000)   (120,1000)
DiscrimAndActive (nDiscrim, nActive)    (120,2000)     (120,3000)     (120,3000)   (120,3000)
All (~5000)

CONCLUSIONS

- Brain cognitive state classification is possible (classification accuracies are better than random).
- Error decreases considerably for all types of classifiers when feature selection is used.
- The discrimination-based method outperforms the activity-based method. However, Discrim is computationally more expensive than Active. It is also prone to overfitting, because its performance is evaluated on the training data.
- ActiveThenDiscrim outperforms Active, and its accuracy is very close to that of Discrim while being computationally less demanding, which makes it a good alternative.
- DiscrimAndActive outperforms Active and closely approximates the error rates of Discrim, just like ActiveThenDiscrim. However, it is computationally as demanding as Discrim. Still, it could be a good alternative for feature selection, as it reduces the number of voxels significantly.
- For the time-series methods, the number of features is further reduced, almost halved. Still, the accuracy results are very close to those obtained without the time-series methods. These methods come with extra computational cost, but they can be employed when high dimensionality is a problem, since it makes learning difficult by increasing the number of parameters to be estimated.
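The four voxel-selection methods listed under FEATURE SELECTION METHODS (Active, Discrim, ActiveThenDiscrim, DiscrimAndActive) can be illustrated with a minimal NumPy sketch. It is not the poster's implementation, and several details are assumptions made only for illustration: the array layout (`trials` as trials x voxels x time points, `fixation` as rest images x voxels), the activity score (absolute difference between the mean task signal and the mean rest signal, scaled by the rest standard deviation), and the single-voxel classifier (nearest class mean), since the poster names neither the score nor the classifier. All function and parameter names are hypothetical.

```python
import numpy as np


def active_scores(trials, fixation):
    """Assumed activity score: |mean task signal - mean rest signal| / rest std.

    trials:   (n_trials, n_voxels, n_times) task-period signal
    fixation: (n_rest, n_voxels) rest-period (fixation) signal
    """
    rest_mean = fixation.mean(axis=0)
    rest_std = fixation.std(axis=0) + 1e-12  # avoid division by zero
    task_mean = trials.mean(axis=(0, 2))
    return np.abs(task_mean - rest_mean) / rest_std


def discrim_scores(trials, labels):
    """Training accuracy of one nearest-class-mean classifier per voxel.

    The classifier choice is an assumption; each voxel uses its own
    n_times values (16 in the poster) as features.
    """
    classes = np.unique(labels)
    # Per-class mean time series of every voxel: (n_classes, n_voxels, n_times)
    means = np.stack([trials[labels == c].mean(axis=0) for c in classes])
    # Squared distance of each trial to each class mean, per voxel:
    # (n_classes, n_trials, n_voxels)
    dists = ((trials[None] - means[:, None]) ** 2).sum(axis=-1)
    predicted = classes[dists.argmin(axis=0)]  # (n_trials, n_voxels)
    return (predicted == labels[:, None]).mean(axis=0)


def top_n(scores, n):
    """Indices of the n highest scores, best first."""
    return np.argsort(scores)[::-1][:n]


def active_then_discrim(trials, labels, fixation, n_active, n_keep):
    """Keep the n_active most active voxels, then the n_keep most
    discriminating among them (classifiers are trained only for those)."""
    active = top_n(active_scores(trials, fixation), n_active)
    scores = discrim_scores(trials[:, active, :], labels)
    return active[top_n(scores, n_keep)]


def discrim_and_active(trials, labels, fixation, n_discrim, n_active):
    """Intersection of the most discriminating and the most active voxels."""
    discrim = top_n(discrim_scores(trials, labels), n_discrim)
    active = top_n(active_scores(trials, fixation), n_active)
    return np.intersect1d(discrim, active)
```

With the settings reported in the tables, a call would look like `active_then_discrim(trials, labels, fixation, n_active=2000, n_keep=120)`. The sketch also makes the cost argument from the conclusions visible: `active_then_discrim` scores classifiers for only `n_active` voxels, whereas `discrim_and_active` still scores every voxel.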