Human Language Technology Research Institute


Active Deep Learning-Based Annotation of Electroencephalography Reports for Patient Cohort Retrieval

Sanda Harabagiu, PhD, Travis Goodwin, Ramon Maldonado, Stuart Taylor
The Human Language Technology Research Institute, University of Texas at Dallas
www.hlt.utdallas.edu

Abstract: The annotation of a large corpus of Electroencephalography (EEG) reports is a crucial step in the development of an EEG-specific patient cohort retrieval system. The annotation of multiple types of EEG-specific clinical concepts is challenging, especially when automatically performed on Big Data. To address this challenge, we developed a novel framework which combines the advantages of active and deep learning. Our Multi-task Active Deep Learning (MTADL) paradigm concurrently performs multiple identification tasks for: (1) EEG activities and their attributes, (2) EEG events, (3) medical problems, (4) medical treatments and (5) medical tests, along with their inferred forms of modality and polarity. An important step of the MTADL paradigm was the design of deep learning architectures capable of identifying multiple forms of EEG-specific clinical concepts. We experimented with two deep learning architectures. The first architecture aims to identify (1) the anchors of all EEG activities; as well as (2) the boundaries of all mentions of EEG events, medical problems, medical treatments and medical tests. The second architecture was designed to recognize multiple attributes considered for each EEG activity, as well as the type of the EEG-specific medical concepts. In addition, the second deep learning architecture identifies the modality and the polarity of EEG-specific concepts.
The ability to jointly learn multiple types of concepts and attributes was made possible by the sampling mechanism used in the MTADL paradigm, based on the rank combination protocol, which combines several single-task active learning selection decisions into one.

The Multi-task Active Deep Learning (MTADL) paradigm requires the following 5 steps:
STEP 1: The development of an annotation schema;
STEP 2: Annotation of initial training data;
STEP 3: Design of deep learning methods that are capable of being trained on the data;
STEP 4: Development of sampling methods for the Multi-task Active Deep Learning system;
STEP 5: Usage of the Active Learning system, which involves:
Step 5.a: Accepting/editing annotations of sampled examples;
Step 5.b: Re-training the deep learning methods and evaluating the new system.

Annotation Schema: The annotation schema that we developed treats EEG events, medical problems, treatments and tests in a similar way to the 2012 i2b2 challenge, namely by specifying (1) the boundary of each mention of a concept; (2) the concept type; (3) its modality and (4) its polarity. However, the EEG activities could not be annotated in the same way, because we noticed that EEG activities are not mentioned in a continuous expression. To solve this problem, we annotated the anchors of EEG activities and their attributes. Since one of the attributes of EEG activities, namely morphology, best defines these concepts, we decided to use it as the anchor. We considered three classes of attributes for EEG activities, namely (a) general attributes of the waves, e.g. the morphology and the frequency band; (b) temporal attributes and (c) spatial attributes. All attributes have multiple possible values associated with them. When annotating the morphology attribute we considered a hierarchy of values, first distinguishing two types: (1) Rhythm and (2) Transient. In addition, the Transient type contains three subtypes: Single Wave, Complex and Pattern.
Each of these sub-types can take multiple possible values.

Figure: Architecture of the Multi-Task Active Deep Learning paradigm for annotating EEG reports.

Design of Deep Learning Architectures: The first architecture aims to identify (1) the anchors of all EEG activities mentioned in an EEG report; as well as (2) the boundaries of all mentions of EEG events, medical problems, medical treatments and medical tests. The second deep neural architecture is designed to recognize (i) the sixteen attributes that we have considered for each EEG activity, as well as (ii) the type of the EEG-specific medical concepts, discriminated as either an EEG event (EV), a medical problem (MP), a medical test (Test) or a medical treatment (TR). In addition, the second deep learning architecture identifies the modality and the polarity of these concepts.

Example B: CLINICAL HISTORY: 58 year old woman found [unresponsive]<TYPE=MP, MOD=Factual, POL=Positive>, history of [multiple sclerosis]<TYPE=MP, MOD=Factual, POL=Positive>, evaluate for [anoxic encephalopathy]<TYPE=MP, MOD=Possible, POL=Positive>. MEDICATIONS: [Depakote]<TYPE=TR, MOD=Factual, POL=Positive>, [Pantoprazole]<TYPE=TR, MOD=Factual, POL=Positive>, [LOVENOX]<TYPE=TR, MOD=Factual, POL=Positive>. INTRODUCTION: [Digital video EEG]<TYPE=Test, MOD=Factual, POL=Positive> was performed at bedside using standard 10.20 system of electrode placement with 1 channel of [EKG]<TYPE=Test, MOD=Factual, POL=Positive>.
When the patient relaxes and the [eye blinks]<TYPE=EV, MOD=Factual, POL=Positive> stop, there are frontally predominant generalized [spike and wave discharges]<MORPHOLOGY=Transient>Complex>Spike and slow wave complex, FREQUENCYBAND=Delta, BACKGROUND=No, MAGNITUDE=Normal, RECURRENCE=Repeated, DISPERSAL=Generalized, HEMISPHERE=N/A, LOCATION={Frontal}, MOD=Factual, POL=Positive> as well as [polyspike and wave discharges]<MORPHOLOGY=Transient>Complex>Polyspike and slow wave complex, FREQUENCYBAND=Delta, BACKGROUND=No, MAGNITUDE=Normal, RECURRENCE=Repeated, DISPERSAL=Generalized, HEMISPHERE=N/A, LOCATION={Frontal}, MOD=Factual, POL=Positive> at 4 to 4.5 Hz.

Example A: CLINICAL HISTORY: 58 year old woman found [unresponsive]2, history of [multiple sclerosis]2, evaluate for [anoxic encephalopathy]2. MEDICATIONS: [Depakote]2, [Pantoprazole]2, [LOVENOX]1. INTRODUCTION: [Digital video EEG]2 was performed at bedside using standard 10.20 system of electrode placement with 1 channel of [EKG]2. When the patient relaxes and the [eye blinks]2 stop, there are frontally predominant generalized [spike and wave discharges]1 as well as [polyspike and wave discharges]1 at 4 to 4.5 Hz.

Development of Sampling Methods: The choice of sampling mechanism is crucial for validation, as it determines what makes one annotation a better candidate for validation than another. Multi-task Active Deep Learning (MTADL) is an active learning paradigm for multiple annotation tasks, in which new EEG reports are selected to be as informative as possible for a set of annotation tasks instead of a single annotation task. The sampling mechanism that we designed uses the rank combination protocol, which combines several single-task active learning selection decisions into one.
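As a rough illustration of rank combination, the sketch below turns per-task usefulness scores into ranks (rank 1 is the most useful, and identical scores share a rank), sums the ranks across tasks, and selects the items with the lowest combined rank. The function names, the task names, and the example scores are hypothetical, and the use of dense ranking for ties and of the item name as a tie-breaker are assumptions; the poster does not specify either.

```python
from typing import Dict, List


def dense_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each item to a rank: higher score -> lower rank, ties share a rank.

    Dense ranking (1, 2, 2, 3) is an assumption; the source only says that
    identical scores receive the same rank.
    """
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of = {score: i + 1 for i, score in enumerate(distinct)}
    return {item: rank_of[score] for item, score in scores.items()}


def combine_ranks(task_scores: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Sum each item's per-task ranks into one combined rank."""
    combined: Dict[str, int] = {}
    for scores in task_scores.values():
        for item, rank in dense_ranks(scores).items():
            combined[item] = combined.get(item, 0) + rank
    return combined


def select_for_validation(task_scores: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Return the k items with the lowest combined rank (name breaks ties)."""
    combined = combine_ranks(task_scores)
    return sorted(combined, key=lambda item: (combined[item], item))[:k]


# Hypothetical usefulness scores for two annotation tasks over three reports.
example_scores = {
    "boundaries": {"report_a": 0.9, "report_b": 0.4, "report_c": 0.4},
    "attributes": {"report_a": 0.2, "report_b": 0.8, "report_c": 0.5},
}
print(select_for_validation(example_scores, k=2))
```

Summing ranks rather than raw scores keeps a task whose scores happen to lie on a larger numeric scale from dominating the selection.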
The usefulness score $s_{X_j}(\alpha)$ of each un-validated annotation $\alpha$ from an EEG report is calculated with respect to each annotation task $X_j$ and then translated into a rank $r_{X_j}(\alpha)$, where higher usefulness means lower rank (examples with identical scores get the same rank). Then, for each EEG report, we sum the ranks over the annotation tasks to get the overall rank $r(\alpha) = \sum_{j=1}^{k} r_{X_j}(\alpha)$, where $k$ is the number of annotation tasks. All examples are sorted by this combined rank, and the annotations with the lowest ranks are selected for validation. For each annotation task, we score an EEG report $d$ as $s_{X_j}(d) = \frac{1}{|d|} \sum_{\alpha \in d} H(\alpha)$, where $\alpha$ is an annotation from $d$, $|d|$ is the number of annotations in document $d$, and $H(\alpha) = -\sum_{c} q_c(\alpha) \log q_c(\alpha)$ is the Shannon entropy of $\alpha$. This protocol favors selecting documents containing annotations the model is uncertain about from all annotation tasks.

Attributes & Attribute Values    A    P    R    F1
Morphology 0.99 0.75 0.70 0.72
DISORGANIZATION 0.97 0.88 0.78 0.83
POLYSPIKE_AND_WAVE 0.22 0.40 0.28
AMPLITUDE_GRADIENT 1.00 0.90
SPIKE_AND_SLOW_WAVE 0.94 0.95
SPIKE 0.85 0.77
PLEDS 0.50 0.60
LAMBDA_WAVE
K_COMPLEX
POLYSPIKE 0.52 0.62
SLOW_WAVE 0.92 0.93
RHYTHM 0.81 0.91 0.86
SLEEP_SPINDLE 0.99 1.00 0.91 0.95
SHARP_AND_SLOW_WAVE 0.60 0.50 0.54
SUPPRESSION 0.84 0.88
PHOTIC_DRIVING 0.94 0.97
TRIPHASIC_WAVE 0.90
SHARP_WAVE 0.98 0.96 0.92
WICKET
EPILEPTIFORM_DISCHARGE 0.89
SLOWING
BREACH_RHYTHM 0.58 0.73
VERTEX_WAVE
Magnitude 0.80 0.71 0.75

Figure: Deep Learning architecture for the identification of (1) the EEG activity anchors and (2) the boundaries of expressions of (a) EEG events; (b) medical problems; (c) tests and (d) treatments.
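The per-task report score described above, the mean Shannon entropy over a report's annotations, can be sketched as follows. This is a minimal sketch under stated assumptions: each annotation is represented by the model's predicted probability distribution over its classes (our reading of the q values in the entropy formula), the natural logarithm is used because the poster does not state the base, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """H = -sum_c q_c * log(q_c), skipping zero-probability classes.

    Natural log is an assumption; the source does not give the base.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def report_score(annotation_probs: List[Sequence[float]]) -> float:
    """Mean entropy over all annotations in one report (0.0 if it has none)."""
    if not annotation_probs:
        return 0.0
    total = sum(shannon_entropy(q) for q in annotation_probs)
    return total / len(annotation_probs)


# Hypothetical report with one maximally uncertain binary annotation
# and one fully confident annotation.
example_report = [[0.5, 0.5], [1.0, 0.0]]
print(report_score(example_report))
```

A report whose annotations the model predicts confidently scores near zero, so it ranks low in usefulness for that task, while a report full of near-uniform predictions scores high.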
Figure: Deep Learning Architectures for Automatic Recognition of (1) attributes of EEG activities; (2) type for all the other clinical concepts expressed in EEG reports; and (3) modality and polarity for all concepts.

Figure: Learning Curves for All Annotations.

Acknowledgements: Research reported in this poster was supported by the National Human Genome Research Institute of the National Institutes of Health under award number 1U01HG008468. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.