THE TUH EEG SEIZURE CORPUS

M. Golmohammadi1, V. Shah2, S. Lopez2, S. Ziyabari2, S. Yang2, J. Camaratta1, I. Obeid2 and J. Picone2
1. Biosignal Analytics, Inc. 2. The Neural Engineering Data Consortium, Temple University

Abstract

Introduction: Automatic seizure detection can reduce the time to diagnosis and enhance real-time applications such as ICU monitoring. A major goal of this study was to generate a large annotated corpus of seizure events to support the development of machine learning technology.

Methods: Using the TUH EEG Corpus, we implemented a semi-automated strategy: EEG reports were parsed using natural language processing techniques to locate sessions most likely to contain seizures. Two seizure detection tools (Persyst and AutoEEG) were used to identify sessions likely to contain seizures. Sessions for which both tools agreed with high confidence were manually annotated by a group of experts based on ACNS guidelines. The data was partitioned by patient into an evaluation set and a training set.

Results: The current dataset includes 50 patients for evaluation and 219 patients for training.

Conclusion: The TUH EEG Seizure Corpus provides a sufficient amount of data for machine learning research.

The TUH EEG Data Corpus

Existing corpora are not large enough to train complex deep learning models:
- The CHB-MIT dataset contains EEG recordings from only 22 pediatric subjects.
- IEEG used only intracranial EEGs and contains EEG recordings from animals and humans.

Our publicly available corpus consists of 30,000+ clinical EEG recordings from 16,000+ patients. Corpus development involved pairing, de-identification and annotation of EEG data.

[Figure: EEG Reports, EEG Signals, The TUH EEG Seizure Corpus]

Data Extraction and Annotation Process

- The lack of big data resources that can be used to train sophisticated statistical models compounds a major problem in automatic seizure detection.
- Manual annotation of a large amount of data by a team of certified neurologists is extremely expensive and time consuming.
- We have developed a team of students trained by an expert to expedite data selection and preliminary annotation.
- Two commercially available automatic seizure detection tools were used to find the sessions most likely to contain seizures.
- These sessions were annotated by a group of trained students using medical reports and advanced visualization tools.
- Inter-rater agreement was measured using the kappa statistic.
- Each seizure event was manually annotated by at least three neurologists as well as three members of our annotation team.
- Types of seizures included in the corpus: tonic, tonic-clonic, simple partial, complex partial, myoclonic, and absence.

NLP-Based Parsing of Reports

EEG reports were parsed using a natural language processing (NLP) method based on NegEx to locate the sessions most likely to contain seizures.

Algorithm:
(1) Pre-process reports to show one sentence per line
(2) Remove all punctuation
(3) Index medical conditions
(4) Index different types of negation

Two types of negation were selected: [POST] and [PREN]. Labels were also used for the word “seizure” ([PRES]) and for affirmative expressions ([AFFR]). Approximately 25% of the sessions identified contained seizures.

NegEx examples:
Seizures were not observed during the recording [PREN]
Two seizures were observed as the patient … [AFFR]

Seizure Annotation

Seizure event annotations include:
- start and stop times;
- localization of a seizure (e.g., focal, generalized);
- type of seizure (e.g., simple partial);
- nature of the seizure (e.g., convulsive).

Non-seizure event annotations include:
- artifacts which could be confused with seizure-like events, such as ventilatory artifacts;
- non-epileptiform activity that may resemble epileptiform discharges, such as psychomotor variant, mu, breach rhythms and POSTS;
- abnormal background which could be confused with seizure-like events (e.g., triphasics);
- interictal and postictal states.

There are multiple sessions for each patient record; each expert reviewed the entire record. Our goal is to reach a consensus amongst all annotators, so several iterations are being conducted to reconcile differences.

Visualization Tools For Annotation

- An annotation tool was developed to increase productivity, accuracy and consistency.
- Waveform, spectrogram and energy displays are supported in user-customizable displays.
- Alternate methods for visualizing signals allow more accurate identification of seizure start times.
- Standard filters commonly found in commercial EEG tools (e.g., notch filters) are supported.
- Users can scroll forward by time or by selected events.
- Per-channel and per-epoch labels are supported.
- Integrated annotation and cohort retrieval tools are being developed in related research projects.
- A modular object-oriented Python programming environment is used that makes it easy to add views and customize displays.

Corpus Statistics

                      Train    Eval
Patients                219      50
Sessions                590     262
Files                  2270    1190
Seizure (hrs.)                   12
Non-Seizure (hrs.)      300     140
Total (hrs.)            350     152

Lennox-Gastaut Syndrome

The corpus includes many EEGs that are difficult to interpret, which is crucial for training robust machine learning technology. For example, the screenshot to the left is from a patient with Lennox-Gastaut syndrome. It is very hard to identify a seizure event, particularly the onset of the seizure, when there are generalized spikes or sharp waves.

Outcomes

The TUH EEG Seizure Detection Corpus is an ongoing effort that includes:
- identifying and annotating the remaining sessions with seizures in the TUH EEG Corpus;
- manually reviewing each annotation by a panel of at least three expert neurologists;
- collecting marked data from other institutions (e.g., NYU, Duke and Emory).

The TUH EEG Seizure Corpus is the world’s largest open-source clinical EEG seizure corpus, including more than 500 hrs. of EEGs. A hybrid machine learning system was developed on this data using a combination of hidden Markov models (HMMs) for sequential decoding and deep learning for postprocessing. A deep learning system is also under development.

Acknowledgements

Research reported in this poster was supported by National Human Genome Research Institute of the NIH under award number 3U01HG S1 and by the NSF under Grant No. IIP The TUH EEG Corpus development was sponsored by DARPA, Temple University’s College of Engineering and Office of Research. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding organizations.
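The poster reports that inter-rater agreement was measured with the kappa statistic but does not say which variant was used or how labels were compared. As a rough illustration only, the sketch below computes Cohen's kappa for two annotators who each label the same sequence of epochs. The two-rater setup, the function name `cohens_kappa`, and the label strings `"seiz"` and `"bckg"` are assumptions made for this example and are not taken from the poster.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    Hypothetical helper for illustration; the poster does not state
    which kappa variant the authors computed.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both raters must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in all_labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout;
        # kappa is undefined here, so treat it as perfect agreement.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Illustrative per-epoch labels (assumed, not corpus data).
rater_a = ["seiz", "seiz", "bckg", "bckg", "bckg", "bckg", "seiz", "bckg"]
rater_b = ["seiz", "bckg", "bckg", "bckg", "bckg", "bckg", "seiz", "bckg"]
kappa = cohens_kappa(rater_a, rater_b)
```

Here the raters agree on 7 of 8 epochs, but kappa is lower than the raw agreement because much of that agreement would be expected by chance when background epochs dominate. With more than two annotators, as described on the poster, a multi-rater variant such as Fleiss' kappa would be the more natural choice.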
