EEG Recognition Using The Kaldi Speech Recognition Toolkit

EEG Recognition Using The Kaldi Speech Recognition Toolkit

Jonatas Macedo Soares, Fabricio Goncalves, Dr. Iyad Obeid and Dr. Joseph Picone
The Neural Engineering Data Consortium, College of Engineering, Temple University
www.isip.piconepress.com

Abstract

The goal of this research is to adapt the Kaldi toolkit to the recognition of EEG signal events. Kaldi is a popular open-source speech recognition system that supports a finite state transducer approach to acoustic modeling and includes a powerful deep learning module. Six types of EEG events are detected: signal events (GPED, PLED, SPSW) and background events (ARTF, EYEM, BCKG). As a first step, we reproduced several well-known speech recognition experiments and exceeded our previous best performance on these tasks. Performance on a new task, AlphaDigits, however, was problematic. Performance on a baseline EEG task based on the TUH EEG Corpus was significantly lower than our baseline based on hidden Markov models (a detection error rate of 63% vs. 33%).

Kaldi: Open Source Speech Recognition

Our baseline EEG and speech recognition technology uses a hidden Markov model (HMM) based approach to acoustic modeling and a dynamic programming (DP) beam search for language modeling. Kaldi is an open-source speech recognition toolkit that uses finite state transducers (FSTs) for both acoustic and language modeling; both tasks are integrated into a single large transducer. Kaldi has excelled at very large vocabulary recognition and has become a popular alternative to other open-source tools. It is based on the OpenFst toolkit and supports deep learning.

An FST is a machine that maps a sequence of inputs to a sequence of outputs. For example, a lexicon converts words to phonemes.
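The lexicon mapping described above can be sketched in a few lines of Python. This is only a toy illustration of the input-to-output mapping an FST performs (the words and pronunciations are hypothetical), not Kaldi's OpenFst-based implementation:

```python
# Toy lexicon "transducer": maps a sequence of words (inputs) to a
# sequence of phonemes (outputs), illustrating the FST idea.
# Pronunciations are hypothetical, for illustration only.
LEXICON = {
    "one": ["w", "ah", "n"],
    "two": ["t", "uw"],
    "three": ["th", "r", "iy"],
}

def transduce(words):
    """Expand each word into its phoneme sequence via the lexicon."""
    phonemes = []
    for word in words:
        phonemes.extend(LEXICON[word])
    return phonemes

print(transduce(["one", "two"]))  # ['w', 'ah', 'n', 't', 'uw']
```

In Kaldi the lexicon is one such transducer (L), which is composed with the grammar (G), context-dependency (C) and HMM (H) transducers into a single decoding graph.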
An acoustic model converts phonemes to states.

Baseline Experiments

Three speech databases were used in this research to obtain a better understanding of Kaldi:
- TIDigits: men, women, boys and girls reading digit strings of varying lengths.
- Resource Management (RM): a medium-sized vocabulary with 1,000 sentence patterns and a bigram perplexity of 60.
- AlphaDigits (AD): letters and numbers spoken over the telephone.

Results for these three tasks are shown in Tables 1, 2 and 3.

Baseline EEG Experiments

In this experiment, we used the same features generated by HTK (.htk files). Originally, 264 .htk files were used for training and 264 for evaluation. From those files, 2,902 training segments and 1,658 evaluation segments were generated, where each segment corresponds to a 1-second event. Kaldi-formatted feature files were then generated by extracting features from the HTK files for the corresponding segments. These consisted of two files: an archive (.ark) file, a binary file that contains the feature vectors, and a script (.scp) file that defines the location of the features. There are six models, with approximately 5 mixtures per state, for the classification of all six events (GPED, PLED, SPSW, ARTF, EYEM, BCKG). In this experiment, the lexicon FST maps one symbol per event, as if each event were a word with only one phoneme. We use Kaldi (and ISIP/HTK) to compute likelihoods; no language model is used.

Results

Table 4 presents a comparison of the detection error rates for all six classes of events for closed-loop and open-loop testing. In Table 4 we see that the error rate for Kaldi is about twice as high as that of the baseline system. In Table 7, we see that many of the errors are associated with the background classes and can be ignored. SPSW has the highest error rate because these events occur least frequently.
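The .ark/.scp pair described in the Baseline EEG Experiments section can be mimicked with a short script. The sketch below writes a text-format archive and a simplified script file; segment IDs and paths are invented, and in practice Kaldi tools such as copy-feats produce binary archives, with each .scp entry carrying a byte offset into the archive:

```python
# Minimal sketch of Kaldi-style feature files: a text-format archive
# (.ark) holding feature matrices and a script file (.scp) recording
# where each segment's features live.  Illustrative only.
def write_text_ark_scp(features, ark_path, scp_path):
    """features: dict mapping segment id -> list of feature vectors."""
    with open(ark_path, "w") as ark, open(scp_path, "w") as scp:
        for seg_id, matrix in features.items():
            ark.write(seg_id + "  [\n")
            for row in matrix:
                ark.write("  " + " ".join(str(x) for x in row) + "\n")
            ark.write("]\n")
            # Real .scp entries use ark_path:byte_offset; omitted here.
            scp.write(f"{seg_id} {ark_path}\n")

# One hypothetical 1-second segment with two feature frames.
feats = {"seg_0001": [[0.1, 0.2], [0.3, 0.4]]}
write_text_ark_scp(feats, "train.ark", "train.scp")
```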
Table 4: Detection error rates for the TUH EEG Corpus

  Evaluation Set    ISIP/HTK   Kaldi
  Training          19.0%      35.58%
  Test              33.2%      61.03%

Table 5: Confusion matrix for ISIP/HTK (rows are reference classes; entries are counts with row percentages)

  artf: 128 (61.84%)   79 (38.16%)   0 (0.00%)
  bckg: 160 (17.45%)  718 (78.30%)  23 (2.51%)   13 (1.42%)   3 (0.33%)
  eyem:   3 (5.66%)    45 (84.91%)   2 (3.77%)
  gped:   1 (0.80%)     3 (2.40%)   67 (53.60%)  51 (40.80%)  2 (1.60%)
  pled:   7 (5.22%)     1 (0.75%)   25 (18.66%)  88 (65.67%)  6 (4.48%)
  spsw:  15 (11.36%)   17 (12.88%)  54 (40.91%)  30 (22.73%)  1 (0.76%)

Table 6: Confusion matrix for Kaldi (rows are reference classes; entries are counts with row percentages; the final "ins" row counts insertions)

  artf:  93 (44.93%)   84 (40.58%)   0 (0.00%)   11 (5.31%)   2 (0.97%)  17 (8.21%)
  bckg: 312 (34.02%)  400 (43.62%)  80 (8.72%)   10 (1.09%)  15 (1.64%) 100 (10.91%)
  eyem:   4 (7.55%)     1 (1.89%)   39 (73.58%)   3 (5.66%)   2 (3.77%)
  gped:  12 (9.60%)     3 (2.40%)    5 (4.00%)   48 (38.40%) 44 (35.20%)  8 (6.40%)
  pled:  17 (12.69%)    2 (1.49%)   14 (10.45%)  24 (17.91%) 59 (44.03%) 11 (8.21%)  7 (5.22%)
  spsw:  23 (17.42%)   24 (18.18%)  14 (10.61%)  27 (20.45%) 17 (12.88%)  4 (3.03%)
  ins:   14   7   1   6   3

Table 7: 4-way confusion matrix for Kaldi

  Ref/Hyp   bckg           gped          pled          spsw          del
  bckg      933 (79.27%)    95 (8.07%)   15 (1.27%)   119 (10.11%)
  gped       20 (16.00%)    48 (38.40%)  44 (35.20%)    8 (6.40%)    5 (4.00%)
  pled       33 (24.63%)    24 (17.91%)  59 (44.03%)   11 (8.21%)    7 (5.22%)
  spsw       61 (46.21%)    27 (20.45%)  17 (12.88%)    4 (3.03%)   23 (17.42%)

Table 1: A comparison of results on TIDigits

  Acoustic Model    ISIP/HTK   Kaldi
  Monophones        2.95%      1.00%
  Triphones         0.57%      0.40%
  Triphones (LDA)   N/A        0.30%

Table 2: A comparison of results on RM

  Acoustic Model    ISIP/HTK   Kaldi
  Monophones        22.06%     8.73%
  Triphones         10.04%     3.57%

Table 3: A comparison of results on AD

  Acoustic Model    HTK        Kaldi
  Monophones        12.2%      20.86%
  Triphones         10.5%      17.22%

Introduction

Electroencephalography (EEG) measures the electrical activity in the brain and is used to diagnose patients suffering from epilepsy and strokes. We use a maximum likelihood approach for classifying 1-second epochs of an EEG signal into:
- Signal events: generalized periodic epileptiform discharges (GPED), periodic lateralized epileptiform discharges (PLED), and spikes and sharp waves (SPSW).
- Noise events: artifacts (ARTF), eye movement (EYEM), and background activity (BCKG).

Spikes and sharp waves are particularly important for accurate diagnosis of brain disorders.

Summary

Kaldi demonstrates significant potential on well-known tasks provided as demos (e.g., RM). Kaldi's generic algorithms and universal recipes make it straightforward to work on new tasks. Performance on a new task (AD), however, is significantly below the state of the art. Performance on EEG data is also below our previously published levels of performance.

Acknowledgement

This research was also supported by the Brazil Scientific Mobility Program (BSMP) and the Institute of International Education (IIE).

Figure 1: PLEDs occurring in the left hemisphere
Figure 2: An example of a spike
Figure 3: An overview of Kaldi
Figure 4: Lexicon FST for TIDigits
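The error rates and confusion matrices above come from per-epoch scoring: each 1-second epoch has one reference and one hypothesis label, the detection error rate is the fraction of epochs where the two disagree, and the confusion matrix tallies (reference, hypothesis) pairs. A minimal sketch, with fabricated labels:

```python
from collections import Counter

def score(ref, hyp):
    """Per-epoch detection error rate and confusion counts."""
    errors = sum(r != h for r, h in zip(ref, hyp))
    confusion = Counter(zip(ref, hyp))  # (ref, hyp) -> count
    return errors / len(ref), confusion

# Fabricated 5-epoch example using the six event labels.
ref = ["gped", "gped", "bckg", "spsw", "pled"]
hyp = ["gped", "pled", "bckg", "bckg", "pled"]
rate, conf = score(ref, hyp)
print(f"detection error rate: {rate:.0%}")  # 2 of 5 epochs wrong -> 40%
```

Note that the full scoring reported in Tables 6 and 7 also counts deletions and insertions ("del"/"ins"); this sketch assumes exactly one hypothesis per epoch.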