Classifying Event-Related Desynchronization in EEG, ECoG, and MEG Signals
Kim Sang-Hyuk

Bioimaging
Contents
Introduction
Experimental setup and procedure
Preanalysis
Data processing
Generalization error estimation

Introduction
Several different technologies exist for measuring brain activity
  – Each has its own advantages and limitations
  – Spatial and temporal resolution
  – Cost, portability, and risk to the user
Comparative studies are required for guidance
Motor-imagery BCI experiments based on electroencephalography (EEG), electrocorticography (ECoG), and magnetoencephalography (MEG)
A simple binary synchronous (trial-based) paradigm
Quantitative results are presented, focusing on
  – The effect of the number of trials
  – The effect of spatial filtering

Introduction
EEG
  – Electrical signals are measured by passive electrodes
  – Very high temporal resolution
  – Low cost and low risk; portable
  – Limited spatial resolution
ECoG
  – Electrical signals are obtained from an array of electrodes beneath the skull
  – High SNR
  – Better response at higher frequencies
  – Invasive
MEG
  – Measures the tiny magnetic field fluctuations induced by the electrical activity of cerebral neurons
  – Expensive and nonportable

Experimental Setup and Procedure
EEG
  – 8 untrained, right-handed male subjects
  – 39 silver chloride electrodes
  – Sampling frequency: 256 Hz
  – The subjects were seated in an armchair 1 m in front of a computer screen
[Figure: Positions of electrodes. Legend: electrodes used for data acquisition; reference electrode]

Experimental Setup and Procedure
Each trial started with a blank screen
A small fixation cross was displayed in the center of the screen from second 2 to second 9
At 2 s, a short warning tone (beep) sounded
At 3 s, the fixation cross was overlaid with an arrow at the center of the monitor for 1.5 s
  – The arrow pointed either to the left or to the right
To keep event-related signals out of later processing stages, only data from seconds 4 to 9 of each trial were considered

Preanalysis
Goal: identify and exclude subjects who showed no significant μ-activity at all
Analysis was restricted to the 17 EEG channels located over or close to the motor cortex
  – A μ-band feature was calculated with the Welch method (short-time Fourier transform) for each subject
This feature extraction resulted in one parameter per trial and channel
The eight data sets of Welch features were classified with linear support vector machines, with individual model selection for each subject
Generalization errors were estimated by 10-fold cross-validation (CV)
For three subjects the preanalysis gave very poor error rates, close to chance level; their data sets were excluded from further analysis

Preanalysis
Short-Time Fourier Transform (STFT)
A Fourier-related transform used to examine the frequency and phase content of local sections of a signal as it changes over time
Discrete-time STFT
  $$X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n]\, w[n - m]\, e^{-j \omega n}$$
  – $x[n]$ is the signal and $w[n]$ is the window function
  – The window slides along the time axis
[Figure: Examples of window overlap]

Preanalysis
Short-Time Fourier Transform (STFT)
[Figure: Examples of STFT]

Preanalysis
Short-Time Fourier Transform (STFT)
Each trial is split into 5 segments that overlap by 50%
The spectra of the 5 segments are averaged
The result is a vector of log amplitudes at different frequencies for each sensor
[Figure: a trial is split into 5 segments, whose spectra are averaged into a vector]
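
The segment-and-average step above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `log_amplitude_spectrum`, the Hann window, the choice to average amplitude spectra (the Welch method proper averages squared magnitudes), and the synthetic 10 Hz test signal are all assumptions.

```python
import numpy as np

def log_amplitude_spectrum(trial, fs=256.0, n_segments=5):
    """Return frequencies and the log of the segment-averaged amplitude spectrum."""
    trial = np.asarray(trial, dtype=float)
    # With 50% overlap, n_segments windows of length 2*step
    # span (n_segments + 1)*step samples.
    step = len(trial) // (n_segments + 1)
    seg_len = 2 * step
    window = np.hanning(seg_len)  # assumption: the slides do not name the window
    spectra = [
        np.abs(np.fft.rfft(trial[k * step : k * step + seg_len] * window))
        for k in range(n_segments)
    ]
    mean_amplitude = np.mean(spectra, axis=0)
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    # The small offset avoids log(0); its value is an arbitrary choice.
    return freqs, np.log(mean_amplitude + 1e-12)

# Hypothetical test signal: a 10 Hz oscillation plus noise, 5 s at 256 Hz.
rng = np.random.default_rng(0)
t = np.arange(1280) / 256.0
trial = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.standard_normal(t.size)
freqs, log_amp = log_amplitude_spectrum(trial)
peak_hz = freqs[np.argmax(log_amp)]
```

For a 1280-sample trial this gives segments of 426 samples, so the frequency resolution is about 0.6 Hz and `peak_hz` lands on the bin nearest the 10 Hz oscillation.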

Data Preprocessing
Autoregressive (AR) Model
The AR(p) model is defined as
  $$x_t = \sum_{i=1}^{p} a_i\, x_{t-i} + \varepsilon_t$$
  – $a_1, \dots, a_p$ are the parameters of the model and $\varepsilon_t$ is a noise term
  – $p$ is the order
The output is modeled as a linear combination of the $p$ past values of the output
For the remaining five subjects, the recorded 5 s window of each trial gave a time series of 1280 sample points per channel (5 s × 256 Hz)
An AR model of order 3 was fitted to the time series of all 39 channels using forward-backward linear prediction
  – The three resulting coefficients per channel and trial formed the new representation of the data
  – The feature extraction did not explicitly incorporate prior knowledge
  – The features are not directly linked to the μ-rhythm
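
One way to picture the order-3 fit is a least-squares solution that stacks the forward prediction equations (each sample predicted from the 3 samples before it) on top of the backward ones (each sample predicted from the 3 samples after it). The sketch below is an assumption about how such a fit could look, not the study's code; `ar_forward_backward` and the simulated series are hypothetical.

```python
import numpy as np

def ar_forward_backward(x, order=3):
    """Least-squares AR coefficients from stacked forward and backward predictions."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Forward: predict x[t] from x[t-1], ..., x[t-order] for t = order..n-1.
    fwd = np.column_stack([x[order - i : n - i] for i in range(1, order + 1)])
    fwd_target = x[order:]
    # Backward: predict x[t] from x[t+1], ..., x[t+order] for t = 0..n-order-1.
    bwd = np.column_stack([x[i : n - order + i] for i in range(1, order + 1)])
    bwd_target = x[: n - order]
    design = np.vstack([fwd, bwd])
    target = np.concatenate([fwd_target, bwd_target])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs

# Hypothetical check: simulate one 1280-sample channel from a known AR(3) process.
rng = np.random.default_rng(0)
true_coeffs = np.array([0.5, -0.3, 0.1])
noise = rng.standard_normal(1280)
series = np.zeros(1280)
for t in range(3, 1280):
    # series[t-3:t][::-1] is [x[t-1], x[t-2], x[t-3]]
    series[t] = true_coeffs @ series[t - 3 : t][::-1] + noise[t]
estimated = ar_forward_backward(series, order=3)
```

Applied per channel and trial, the returned vector of three coefficients would play the role of the feature representation described above.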

Support Vector Machine
Linear Support Vector Machine
Choose the decision boundary between the classes such that the margin is maximized
  – Margin: the distance in feature space between the boundary and the nearest data points (support vectors)
[Figure: Linearly separable case]
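
Support Vector Machine
Linear Support Vector Machine
The hyperplane is given by
  $$g(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0 = 0$$
  – $\mathbf{w}$: weight vector normal to the hyperplane
  – $w_0$: threshold
The distance of a point $\mathbf{x}$ from the hyperplane
  $$z = \frac{|g(\mathbf{x})|}{\|\mathbf{w}\|}$$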
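
As a small numerical illustration of the distance formula, taking the hyperplane as w·x + w0 = 0: the function names and the example values below are made up for the sketch.

```python
import numpy as np

def decision_value(x, w, w0):
    """Evaluate g(x) = w . x + w0."""
    return float(np.dot(w, x) + w0)

def distance_to_hyperplane(x, w, w0):
    """Distance of x from the hyperplane g(x) = 0, i.e. |g(x)| / ||w||."""
    return abs(decision_value(x, w, w0)) / float(np.linalg.norm(w))

# Hypothetical values: ||w|| = 5 and g(x) = 9 + 16 - 5 = 20.
w = np.array([3.0, 4.0])
w0 = -5.0
x = np.array([3.0, 4.0])
g_x = decision_value(x, w, w0)
z = distance_to_hyperplane(x, w, w0)
```

Here `g_x` is 20 and `z` is 20 / 5 = 4. The sign of `g_x` indicates on which side of the hyperplane the point lies.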
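
Support Vector Machine
Linear Support Vector Machine
Scale $\mathbf{w}$ and $w_0$ so that the value of $g(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$ at the support vectors is equal to $+1$ for $S_1$ and $-1$ for $S_2$
  – Margin: $\frac{1}{\|\mathbf{w}\|} + \frac{1}{\|\mathbf{w}\|} = \frac{2}{\|\mathbf{w}\|}$
  – Compute the parameters $\mathbf{w}$, $w_0$ of the hyperplane so as to:
  – Minimize $J(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|^2$
  – Subject to $y_i(\mathbf{w}^T \mathbf{x}_i + w_0) \ge 1$, $i = 1, \dots, N$, where $y_i$ is the corresponding class indicator ($+1$ for $S_1$, $-1$ for $S_2$)
[Figure: the two classes $S_1$ and $S_2$]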
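
Support Vector Machine
Linear Support Vector Machine
The Karush-Kuhn-Tucker (KKT) conditions
  $$\frac{\partial}{\partial \mathbf{w}} L(\mathbf{w}, w_0, \boldsymbol{\lambda}) = \mathbf{0}, \qquad \frac{\partial}{\partial w_0} L(\mathbf{w}, w_0, \boldsymbol{\lambda}) = 0$$
  $$\lambda_i \ge 0, \qquad \lambda_i \left[ y_i(\mathbf{w}^T \mathbf{x}_i + w_0) - 1 \right] = 0, \qquad i = 1, \dots, N$$
  – $\boldsymbol{\lambda}$ is the vector of the Lagrange multipliers $\lambda_i$
  – $L(\mathbf{w}, w_0, \boldsymbol{\lambda})$ is the Lagrangian function, defined as
  $$L(\mathbf{w}, w_0, \boldsymbol{\lambda}) = \frac{1}{2}\mathbf{w}^T \mathbf{w} - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w}^T \mathbf{x}_i + w_0) - 1 \right]$$
The final results are
  $$\mathbf{w} = \sum_{i=1}^{N} \lambda_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{N} \lambda_i y_i = 0$$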

Support Vector Machine
Soft Margin Support Vector Machine
When the classes are not separable, a soft margin support vector machine can be used
The training feature vectors fall into three categories
  – Vectors that fall outside the band and are correctly classified
  – Vectors that fall inside the band and are correctly classified
  – Vectors that are misclassified
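
Support Vector Machine
Soft Margin Support Vector Machine
All three cases can be treated under a single type of constraint, $y_i(\mathbf{w}^T \mathbf{x}_i + w_0) \ge 1 - \xi_i$, with class indicator $y_i = \pm 1$ and slack variables $\xi_i$
  – The first category of data: $\xi_i = 0$
  – The second: $0 < \xi_i \le 1$
  – The third: $\xi_i > 1$
The goal is to make the margin as large as possible while keeping the number of points with $\xi_i > 0$ as small as possible
Cost function
  $$J(\mathbf{w}, w_0, \boldsymbol{\xi}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} I(\xi_i)$$
  – where $\boldsymbol{\xi}$ is the vector of the parameters $\xi_i$, and $I(\xi_i) = 1$ if $\xi_i > 0$ and $0$ if $\xi_i = 0$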
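
The three categories can be read off the slack values directly. The sketch below assumes a given hyperplane (w, w0) and computes the smallest slack that satisfies the constraint for each training vector; the function names and the toy data are hypothetical.

```python
import numpy as np

def slack_variables(X, y, w, w0):
    """Smallest xi_i >= 0 satisfying y_i * (w . x_i + w0) >= 1 - xi_i."""
    margins = y * (X @ w + w0)
    return np.maximum(0.0, 1.0 - margins)

def categorize(xi):
    """Map each slack value to one of the three categories."""
    return np.where(
        xi == 0.0,
        "outside band, correct",
        np.where(xi <= 1.0, "inside band, correct", "misclassified"),
    )

# Hypothetical 2-D example with the hyperplane x1 = 0.
w = np.array([1.0, 0.0])
w0 = 0.0
X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [-3.0, 1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0])
xi = slack_variables(X, y, w, w0)
categories = categorize(xi)
```

For these four points the slacks are 0, 0.5, 2, and 0, so the second point lies inside the band on the correct side and the third is misclassified.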
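
Support Vector Machine
Soft Margin Support Vector Machine
The parameter $C$ is a positive constant that controls the relative influence of the two competing terms
Optimizing a cost function that counts the points with nonzero slack $\xi_i$ is difficult because the counting function is discontinuous
  – A closely related cost function is used instead
  – Minimize $J(\mathbf{w}, w_0, \boldsymbol{\xi}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i$
  – Subject to $y_i(\mathbf{w}^T \mathbf{x}_i + w_0) \ge 1 - \xi_i$ and $\xi_i \ge 0$, $i = 1, \dots, N$
As $C$ decreases, the optimal margin widens and more points become support vectors
  – Finding a good value for $C$ is part of the model selection procedure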
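
To make the role of C concrete, the sketch below evaluates the relaxed cost (half the squared norm of w plus C times the summed slacks) for two hand-picked candidate hyperplanes on a toy 1-D data set. It does not solve the optimization problem; the data, the candidates, and the function name `soft_margin_cost` are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np

def soft_margin_cost(X, y, w, w0, C):
    """Return 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + w0))."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + w0))
    return 0.5 * float(w @ w) + C * float(slack.sum())

# Hypothetical 1-D data; the positive point at 0.25 sits close to the boundary.
X = np.array([[2.0], [3.0], [0.25], [-2.0], [-3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0])

wide = np.array([0.5])    # margin 2/||w|| = 4, but the point at 0.25 needs slack 0.875
narrow = np.array([4.0])  # margin 2/||w|| = 0.5, no slack needed

cost_wide_small_c = soft_margin_cost(X, y, wide, 0.0, C=1.0)        # 0.125 + 0.875
cost_narrow_small_c = soft_margin_cost(X, y, narrow, 0.0, C=1.0)    # 8.0
cost_wide_large_c = soft_margin_cost(X, y, wide, 0.0, C=100.0)      # 0.125 + 87.5
cost_narrow_large_c = soft_margin_cost(X, y, narrow, 0.0, C=100.0)  # 8.0
```

With C = 1 the wide-margin candidate is cheaper, while with C = 100 the slack penalty dominates and the narrow-margin candidate wins, which is the trade-off that model selection over C has to settle.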

Generalization Error Estimation
K-Fold Cross-Validation
A statistical method for validating a predictive model
The whole data set is separated into k subsets (folds) of equal size
In each round, the k subsets are divided into a training set and a test set
  – k-1 subsets are used for training the classifier
  – 1 subset is used for validation
Model training and evaluation are repeated k times, with each of the k subsets used once for validation
[Figure: An example of 5-fold cross-validation]
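
The procedure can be sketched in a few lines. This is an illustrative version only: the helper names, the random shuffling, and the nearest-class-mean classifier used as a stand-in for the SVM are all assumptions, as is the synthetic two-cluster data.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def cross_validation_error(X, y, k, fit, predict, seed=0):
    """Average the test error over the k train/test rounds."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        model = fit(X[train_idx], y[train_idx])
        errors.append(np.mean(predict(model, X[test_idx]) != y[test_idx]))
    return float(np.mean(errors))

# Stand-in classifier (not an SVM): assign each point to the nearest class mean.
def fit_nearest_mean(X, y):
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def predict_nearest_mean(model, X):
    labels = np.array(list(model.keys()))
    dists = np.stack([np.linalg.norm(X - model[l], axis=1) for l in labels], axis=1)
    return labels[np.argmin(dists, axis=1)]

# Hypothetical data: two well-separated Gaussian clusters, 50 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-5.0, 0.0], 1.0, (50, 2)), rng.normal([5.0, 0.0], 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
cv_error = cross_validation_error(X, y, 5, fit_nearest_mean, predict_nearest_mean)
splits = list(k_fold_indices(len(y), 5))
```

With 100 samples and k = 5, each round trains on 80 samples and tests on the remaining 20, and every sample is tested exactly once across the five rounds.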

Contents of Next Lecture
Feature selection methods
  – Fisher criterion
  – Zero-norm optimization
  – Recursive feature elimination (RFE)
Results in EEG
Procedure and results in ECoG
Procedure and results in MEG
Overview of results in EEG, ECoG, and MEG