Presentation transcript:

Descriptive Statistics

The means for all but the c3 feature exhibit a significant difference between the two classes. The variances for each feature, on the other hand, tend to be fairly close, the exception being the energy feature.

PCA

For the most part, the components are similar for both classes. A notable exception is the sixth component of the third eigenvector, which indicates a problem with polarity.

Abstract

High performance automatic labeling of EEG signals for clinical applications requires statistical modeling systems that are trained on large amounts of data. The Temple University Hospital EEG Corpus (TUH EEG) is the world's largest publicly available resource of its kind and contains over 28,000 EEG records. However, it is the nature of clinical data that there are many forms of inconsistency, including channel labels, sampling rates and electrode positioning. In this study, we examine one such issue: 51% of the recordings in TUH EEG are REF referenced, while the other 49% are LE referenced. It is unclear whether this difference will have an effect on statistical machine learning algorithms trained using the data. A rudimentary statistical analysis suggests that the means and variances of features generated from this data are significantly different and will require some form of post-processing if the data are to be used together to train a single statistical model.
Statistical Comparison of REF vs. LE Referenced EEG Signals
Aaron Gross, Silvia Lopez, Dr. Iyad Obeid and Dr. Joseph Picone
The Neural Engineering Data Consortium, Temple University

Introduction

Since EEG signals are fundamentally voltages, each channel must be measured with respect to some common reference voltage for a meaningful comparison of the channels to be possible. 51% of the EEG recordings in TUH EEG are REF referenced: an electrode on the scalp is used as a reference, and the mean of all the electrodes is then used to create an average reference. The other 49% are LE referenced: voltages are measured using the left ear lobe as a reference. The features calculated from these two types of recordings could contain a systematic bias that has a substantial negative impact on our machine learning algorithms.
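The two referencing schemes described above can be illustrated with a small sketch. This is a minimal, hypothetical example (the helper name and toy array are not from the poster), assuming the recording is stored as a channels-by-samples NumPy array; converting to an average reference simply subtracts the per-sample mean across channels:

```python
import numpy as np

def to_average_reference(signals):
    """Convert a channels-x-samples EEG array to average reference.

    Hypothetical helper: the mean across all channels (the average
    reference) is subtracted from every channel at each sample.
    """
    signals = np.asarray(signals, dtype=float)
    return signals - signals.mean(axis=0, keepdims=True)

# Toy example: 3 channels, 4 samples.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [3.0, 6.0, 9.0, 12.0]])
avg_ref = to_average_reference(x)
# After re-referencing, the channel mean at each sample is ~0.
```

An LE-referenced recording differs only in which signal was subtracted at acquisition time, which is why features derived from the two schemes can carry a systematic offset.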
To further probe this issue, we performed two types of statistical analyses:
- Simple descriptive statistical measures, such as the mean and variance, which form the basis for our feature extraction process, were analyzed.
- More advanced measures based on Principal Component Analysis (PCA) were employed to analyze the distribution of the variance in the data as its dimensionality varies.
Mean and Variance Normalization

In mean and variance normalization, features are transformed to have zero mean and unit variance. This process is important for a number of reasons. Whitened features have an identity covariance matrix, meaning the features are uncorrelated, which makes factoring a joint probability distribution simpler. Mean and variance normalization is also an important step to take before performing PCA: since PCA depends on the magnitudes of the variances in the data, feature scaling reduces the bias towards larger-valued features. Similarly, in processes like gradient descent, larger features cause certain weights to update faster than others. In summary, mean and variance normalization is an important step in machine learning that prevents many problems that could arise during training and generally leads to improved performance. The current baseline system uses a single global whitening transformation on the features. This portion of the experiment is currently ongoing.
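A minimal sketch of mean and variance normalization, assuming the features arrive as a frames-by-features NumPy array (the function name and synthetic data are illustrative, not from the study):

```python
import numpy as np

def mean_variance_normalize(features, eps=1e-12):
    """Scale each feature column to zero mean and unit variance.

    features: (num_frames, num_features) array.
    eps guards against division by zero for constant columns.
    """
    features = np.asarray(features, dtype=float)
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)

# Synthetic 9-dimensional feature stream with a non-zero mean.
rng = np.random.default_rng(0)
feats = rng.normal(loc=5.0, scale=3.0, size=(1000, 9))
normed = mean_variance_normalize(feats)
```

Note this scales each dimension independently (variance normalization); a full whitening transformation, as used in the baseline system, would additionally rotate the features to decorrelate them.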
Summary

Systematic biases, such as the differences in mean and variance between the REF and LE classes, can cause incompatibilities between the features generated from the two. Normalization can be used to overcome such incompatibilities and generally improves performance. This analysis was performed on only the first channel of data, i.e., one location on the scalp. This is a limitation because channel features could change depending on how near or far the channel electrodes are from the reference. To obtain more conclusive results, the analysis will be repeated for several other channels to measure the degree of channel mismatch for each class.

Acknowledgements

This research was funded by the Temple University Honors Program's Merit Scholar Stipend Program.

Feature Extraction

Feature extraction generates a vector of measurements 10 times per second using a 200 msec analysis window. For each frame, 9 base features are computed:
- frequency domain energy (Ef)
- 7 cepstral coefficients (c1-c7)
- differential energy (Ed)
From these base features, derivative and second-derivative terms are calculated, excluding the second derivative of the differential energy term. Any systematic biases in the statistics can be amplified by the differentiation process. The feature extraction process could therefore result in features that are incompatible across the two reference classes. If so, normalization techniques must be used.
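The frame-based scheme above can be sketched as follows. This is an illustrative approximation only: the sampling rate, the log-spectral-energy stand-in for Ef, and the simple first-difference delta are assumptions, not the study's exact definitions:

```python
import numpy as np

def frame_signal(x, fs, win_dur=0.2, step_dur=0.1):
    """Slice a 1-D signal into overlapping analysis frames.
    A 200 ms window advanced every 100 ms yields 10 frames/sec."""
    win = int(win_dur * fs)
    step = int(step_dur * fs)
    n = 1 + max(0, (len(x) - win) // step)
    return np.stack([x[i * step : i * step + win] for i in range(n)])

def frequency_domain_energy(frames):
    """Log energy of the magnitude spectrum of each frame (a simple
    stand-in for Ef; the exact definition is not given here)."""
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return np.log((spec ** 2).sum(axis=1) + 1e-12)

def deltas(feature):
    """First-difference approximation to the derivative terms."""
    return np.diff(feature, prepend=feature[:1])

fs = 250  # assumed sampling rate in Hz
x = np.sin(2 * np.pi * 10 * np.arange(5 * fs) / fs)  # 5 s test tone
frames = frame_signal(x, fs)          # (num_frames, samples_per_frame)
ef = frequency_domain_energy(frames)  # one Ef value per frame
d_ef = deltas(ef)                     # derivative term per frame
```

Because the delta terms difference consecutive frames, any constant offset between the REF and LE classes cancels in the derivatives but survives in the base features, which is why biases there matter most.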
The goal of this study is to investigate the need for such normalization techniques.

Methodology

Descriptive Statistics:
- The mean and variance of each element of the feature vector were calculated for both subsets of the data to determine whether there were systematic biases in the vectors.
- These were also compared to the global mean of the pooled data to determine the significance and direction of the bias.

PCA Analysis:
- The covariance of this data was calculated, resulting in a 9x9 covariance matrix, and the eigenvectors and eigenvalues of this matrix were computed.
- Our baseline technology, which uses hidden Markov models, typically assumes the features are uncorrelated and models only the diagonal elements of this covariance matrix. This is known as variance normalization.
- A major goal of this study was to assess the extent to which the features are correlated and cross-correlated, so that we can assess the need for something more sophisticated than a simple variance normalization approach.
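The two analyses could be sketched as follows, using synthetic stand-ins for the REF and LE feature sets (the class distributions here are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-ins for the two reference classes (9 base features).
ref_feats = rng.normal(loc=0.0, scale=1.0, size=(5000, 9))
le_feats = rng.normal(loc=0.3, scale=1.2, size=(5000, 9))

# Descriptive statistics: per-feature mean and variance for each class,
# plus the global mean of the pooled data for comparison.
ref_mean, ref_var = ref_feats.mean(axis=0), ref_feats.var(axis=0)
le_mean, le_var = le_feats.mean(axis=0), le_feats.var(axis=0)
pooled_mean = np.vstack([ref_feats, le_feats]).mean(axis=0)

# PCA analysis: eigendecomposition of the 9x9 covariance matrix.
def pca(features):
    cov = np.cov(features, rowvar=False)    # 9x9 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: ascending order
    order = np.argsort(eigvals)[::-1]       # resort descending
    return eigvals[order], eigvecs[:, order]

ref_vals, ref_vecs = pca(ref_feats)
le_vals, le_vecs = pca(le_feats)
```

Comparing the eigenvalue spectra (ref_vals vs. le_vals) shows how the variance is distributed across dimensions for each class, and the off-diagonal mass of the covariance matrix indicates whether diagonal-only variance normalization is sufficient.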
Table 1. The mean and variance of the base features (Ef, c1-c7, Ed) for the REF and LE classes. (The numeric entries of the table were not preserved in this transcript.)

Figure 1. The International electrode placement system.
Figure 2. Frame-based feature extraction.
Figure 3. Variance as a function of the magnitude of the eigenvalues for the REF and LE classes.
Figure 4. Covariance matrix eigenvector components.