Ramon Maldonado, Travis Goodwin, Sanda M. Harabagiu

Presentation transcript:

Active Deep Learning-Based Annotation of Electroencephalography Reports for Cohort Identification Ramon Maldonado, Travis Goodwin, Sanda M. Harabagiu The University of Texas at Dallas Human Language Technology Research Institute http://www.hlt.utdallas.edu/~{ramon, travis, sanda}

There are no conflicts of interest. First, I’m going to introduce our method and why we’ve done it.

Outline
  Introduction
  The Data
  Multi-task Active Deep Learning
  Deep Learning Architectures
  Sampling Method
  Experimental Results
  Conclusion

Introduction
Clinical electroencephalography (EEG) is the most important investigation in the diagnosis and management of epilepsies.
As more clinical EEG data becomes available, the interpretation of EEG signals can be improved by providing neurologists with the results of searches for patients who exhibit similar EEG characteristics.
MERCuRY (Multi-modal ElectroencephalogRam patient Cohort discoveRY), Goodwin & Harabagiu (2016) [1], for cohort identification
- The EEG signal is complex, and thus its interpretation, as documented in EEG reports, is known to have only moderate inter-observer agreement.
- Recently, Goodwin & Harabagiu (2016) [1] described the MERCuRY system, which operates on a multi-modal EEG index created by automatically processing both the EEG signals and the EEG reports that document and interpret the signals.
- The MERCuRY system allows neurologists to search a vast data archive of clinical EEG signals and EEG reports, enabling them to discover patient populations relevant to specific queries.

Introduction
QUERY: Patients taking topiramate (Topomax) with a diagnosis of headache and EEGs demonstrating sharp waves, spikes or spike/polyspike and wave activity
EXAMPLE RECORD:
  CLINICAL HISTORY: Recently [seizure]PROB-free but with [episodes of light flashing in her peripheral vision]PROB followed by [blurry vision]PROB and [headaches]PROB
  MEDICATIONS: [Topomax]TR
  DESCRIPTION OF THE RECORD: There are also bursts of irregular, frontally predominant [sharply contoured delta activity]ACT, some of which seem to have an underlying [spike complex]ACT from the left mid-temporal region.
The discovery of relevant patient cohorts satisfying the characteristics expressed in queries like this one relies on the ability to automatically and accurately recognize medical concepts both in the queries and throughout the collection of EEG reports.
This example record is clearly relevant to the query because it shares key medical concepts with the query, including Topomax, headache, and spikes.
In general, to determine which records are relevant to a query, we can automatically detect and annotate medical concepts in both the query and the records.
As more EEG data becomes available, new deep learning techniques show promise for producing such annotations with high efficiency and accuracy.
The context surrounding EEG activity mentions contains relevant, characteristic information pertaining to the EEG activities. Instead of annotating the entire span of relevant text, we encode much of the information in the form of attributes; e.g., [sharply contoured delta activity] has location=frontal.

Introduction
Active learning has been shown to reduce the amount of human annotation and validation needed when an efficient sampling mechanism is used, because it selects, for validation, the instances whose annotation will have the most impact on learning quality.
In the work of Hahn et al. (2012) [2], active-learning-based annotation operating on MEDLINE abstracts was used to identify medical concepts.
In our work, in addition to annotating medical concepts in biomedical text, we annotate attributes of those concepts, and we annotate non-contiguous mentions of one type of medical concept (EEG Activities) using an annotation schema that captures the semantic richness of the attributes of EEG Activities.
- To automatically annotate these kinds of medical concepts, we first need to manually annotate a subset of the reports to use as training data.
- Because EEG Activities themselves are so complex, their representations in the text are equally complex.

The Data
EEG reports from Temple University Hospital (TUH): 25,000 reports from 15,000 patients collected over 12 years
Sections:
  Clinical History: lists past and current medical problems, symptoms, signs, and treatments, as well as significant medical events
  Medications
  Introduction: a depiction of the techniques used for the EEG
  Description: a complete and objective description of the EEG, noting all observed activity, patterns, and events
  Impression: states whether the EEG test is normal or abnormal and, if abnormal, lists the abnormalities in order of importance
  Clinical Correlation: explains what the EEG findings mean in terms of clinical interpretation
- American Clinical Neurophysiology Society Guidelines for writing EEG reports
- Idiopathic generalized epilepsy

In order both to reduce the amount of manual annotation and to train deep learning systems capable of automatic annotation, we developed the multi-task active deep learning paradigm.

Multi-task Active Deep Learning
The goal of the Multi-task Active Deep Learning (MTADL) paradigm is to concurrently perform multiple annotation tasks corresponding to the identification of:
  EEG Activities
  EEG Events
  Medical Problems
  Medical Treatments
  Medical Tests
  The relevant attributes for each medical concept type
  The Modality [3] of each of the above medical concepts
  The Polarity [3] of each of the above medical concepts
(Diagram labels: Medical Concept Type; EEG Activity attributes)
- EEG Event: any extracerebral force that activates the EEG
- i2b2 2012 challenge on evaluating temporal relations in clinical text [3]
- Modality: used to determine whether a medical concept mention has actually occurred, has possibly occurred, or is proposed to occur in the future

Multi-task Active Deep Learning
The MTADL paradigm consists of 5 steps:
  STEP 1: The development of an annotation schema
  STEP 2: Annotation of initial training data
  STEP 3: Design of deep learning methods capable of learning from the data
  STEP 4: Development of sampling methods for MTADL
  STEP 5: Usage of the Active Learning system, involving:
    STEP 5.a: Accepting/editing annotations of sampled examples
    STEP 5.b: Re-training the deep learning methods

MTADL – Annotation Schema
Medical Concept Annotation Schema
  Type: Medical Problem, Medical Treatment, Medical Test, EEG Event, EEG Activity
  Modality: Factual, Possible, Proposed
  Polarity: Positive, Negative

MTADL – Annotation Schema
EEG Activity Attributes
  Morphology: represents the type or “form” of EEG waves
    Rhythm
    Transient
      Single Wave: Spike, Sharp Wave, …
      Complex: K-complex, Polyspike complex
    Pattern: PLED, Suppression
  Frequency Band: alpha, beta, delta, theta, gamma
  Background: whether the EEG activity is in the background
  Magnitude: describes the amplitude of the EEG activity if it is emphasized
  Recurrence: describes how often the EEG activity occurs (continuous, repeated, none)
  Dispersal: describes the spread of the activity over regions of the brain
  Hemisphere: describes which hemisphere of the brain the activity occurs in
  Brain Location (9): the region of the brain in which the activity occurs, following the standard 10-20 system of electrode placement

MTADL – Annotation Schema
Example: When the patient relaxes and the eye blinks stop, there are frontally predominant generalized spike and wave discharges as well as polyspike and wave discharges at 4 to 4.5 Hz.
“spike and wave discharges”
  Morphology: Spike and Slow Wave Complex
  Freq. Band: Theta
  Background: No
  Magnitude: Normal
  Recurrence: Repeated
  Dispersal: Generalized
  Hemisphere: n/a
  Brain Location: Frontal
“polyspike and wave discharges”
  Morphology: Polyspike and Slow Wave Complex
  Freq. Band: Theta
  Background: No
  Magnitude: Normal
  Recurrence: Repeated
  Dispersal: Generalized
  Hemisphere: n/a
  Brain Location: Frontal
If we wanted to annotate the entire span of text that describes each activity, we would end up annotating the same span for both activities in this example. Instead, we annotate the span of text corresponding to each activity’s morphology. We refer to these spans as EEG Activity Anchors. The rest of the information is encoded in the form of attributes.
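One way to picture the anchor-plus-attributes idea is as a small record type. The sketch below encodes the two activities from the example sentence; the class name, field names, and the choice to store a single brain location per activity are illustrative assumptions, not the authors' data format.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EEGActivity:
    """Hypothetical record: an anchor span plus the attributes of one activity."""
    anchor: str                 # text span that names the morphology
    morphology: str
    frequency_band: str
    background: bool
    magnitude: str
    recurrence: str
    dispersal: str
    hemisphere: Optional[str]   # None stands for "n/a"
    brain_location: str         # simplification: one location per activity


spike_wave = EEGActivity(
    anchor="spike and wave discharges",
    morphology="Spike and Slow Wave Complex",
    frequency_band="Theta",
    background=False,
    magnitude="Normal",
    recurrence="Repeated",
    dispersal="Generalized",
    hemisphere=None,
    brain_location="Frontal",
)

# The second activity shares every attribute except its anchor and morphology.
polyspike_wave = replace(
    spike_wave,
    anchor="polyspike and wave discharges",
    morphology="Polyspike and Slow Wave Complex",
)

# Fields on which the two annotations differ.
differing = {
    name
    for name, value in asdict(spike_wave).items()
    if asdict(polyspike_wave)[name] != value
}
```

Because the shared context ("frontally predominant", "generalized", "4 to 4.5 Hz") lives in attributes rather than in the span, the two anchors stay distinct even though they draw on the same surrounding text.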

Multi-task Active Deep Learning
From the full corpus of EEG reports, we randomly select a small subset and manually annotate medical concepts and their attributes. We use this initial training data to train two deep learning systems:
  The first learns to detect EEG Activity anchors and the textual boundaries of the other medical concepts.
  The second learns to predict the attributes of each medical concept.
Once the two deep learners are trained, they are used to automatically annotate the entire corpus of EEG reports. We then use our sampling method to select a subset of the automatically annotated reports, and we manually validate and edit the automatic annotations in those reports. We introduce the newly validated documents into the training set and begin the active learning process anew by retraining the deep learning models.
In addition to the two deep learning architectures shown here, the sampling method used to select new documents for annotation is integral to achieving efficiency during active learning.
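The cycle described on this slide (train, auto-annotate, sample, validate, retrain) can be sketched as a generic loop. Everything here is a hypothetical skeleton: `run_mtadl` and the toy `train`, `annotate`, `select`, and `validate` callables are stand-ins for the deep learners, the sampling method, and the human annotator, none of which are specified at this level in the slides.

```python
def run_mtadl(seed, pool, train, annotate, select, validate,
              batch_size, iterations):
    """Retrain -> auto-annotate -> sample -> validate, repeated."""
    training = list(seed)        # manually annotated starting set
    pool = list(pool)            # reports not yet validated
    models = train(training)
    for _ in range(iterations):
        if not pool:
            break
        # Automatically annotate every remaining report.
        auto = [(doc, annotate(models, doc)) for doc in pool]
        # Pick the reports whose validation should help the most.
        for doc, annotation in select(auto, batch_size):
            training.append(validate(doc, annotation))
            pool.remove(doc)
        models = train(training)  # retrain on the enlarged set
    return models, training


# Toy stand-ins so the loop can be run end to end.
def toy_train(training):
    return {"n_examples": len(training)}


def toy_annotate(models, doc):
    return {"uncertainty": len(doc)}   # pretend longer text is less certain


def toy_select(auto, k):
    ranked = sorted(auto, key=lambda pair: pair[1]["uncertainty"], reverse=True)
    return ranked[:k]


def toy_validate(doc, annotation):
    return (doc, "validated")


models, training = run_mtadl(
    seed=[("report 0", "validated")],
    pool=["aa", "bbbb", "c", "ddddd", "eee"],
    train=toy_train, annotate=toy_annotate,
    select=toy_select, validate=toy_validate,
    batch_size=2, iterations=2,
)
```

With a batch size of 2 and two iterations, four of the five pooled reports are validated and added to the single seed example.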

Deep Learning Architectures
Stacked Long Short-Term Memory (LSTM) network [6]: boundary detection, including anchors
  EEG Activity Anchors
  Medical Concept Boundaries
Deep Rectified Linear Network (DRLN) [7]: attribute classification
  EEG Activity attributes, including modality and polarity
  Medical Concept type (EEG Event, medical problem, medical treatment, medical test), modality, and polarity

Deep Learning Architectures – Stacked LSTM
Operates at the sentence level
Assigns a label from {I, O, B} to each token in the sentence, corresponding to whether the token is inside, outside, or at the beginning of a medical concept mention
Example: occasional left anterior temporal sharp and slow wave complexes
Token Features:
  Lemma of the token and the previous/next tokens
  PoS of the token and the previous/next tokens
  Phrase chunk of the token and the previous/next tokens
  Brown cluster [5] of the token
  UMLS Concept Unique Identifier (CUI) of UMLS concepts containing the token
  Title of the section containing the token
Two models: one for EEG Activity anchors and one for the boundaries of the other medical concepts
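The {I, O, B} labeling can be illustrated with a short helper that turns token spans into per-token labels. The helper name and the span chosen for the example phrase are assumptions made for illustration: the slide shows the phrase but the transcript does not preserve which tokens were marked as the anchor.

```python
def iob_labels(tokens, spans):
    """Label tokens B/I/O given (start, end) token spans, end exclusive."""
    labels = ["O"] * len(tokens)
    for start, end in spans:
        labels[start] = "B"                  # first token of the mention
        for i in range(start + 1, end):
            labels[i] = "I"                  # remaining tokens of the mention
    return labels


tokens = "occasional left anterior temporal sharp and slow wave complexes".split()

# Assumed anchor: "sharp and slow wave complexes" (tokens 4 through 8).
labels = iob_labels(tokens, [(4, 9)])
tagged = list(zip(tokens, labels))
```

Under that assumption, the four leading modifiers are labeled O, "sharp" is labeled B, and the remaining four tokens are labeled I.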

Deep Learning Architectures – Stacked LSTM Updates a memory state that is shared throughout the network Each LSTM cell incorporates information about the current token and all previous tokens in the sentence. Softmax layer produces a probability distribution over the labels

Deep Learning Architectures – DRLN
Deep Rectified Linear Network for Attribute Classification
Traditionally, attribute classification is performed by training a classifier, such as an SVM, to determine the value for each attribute. This approach would require training 18 separate attribute classifiers for EEG Activities and 3 classifiers for all other medical concepts. However, by leveraging the power of deep learning, we can simplify this task by learning one multi-task embedding (a low-dimensional vector representation of a medical concept) and using this representation to determine each attribute simultaneously with the same deep learning network.

Deep Learning Architectures – DRLN There are two Deep Rectified Linear Networks, one for EEG Activity attribute detection and one for attribute detection for the other types of medical concepts. Both networks pass a feature vector representing a medical concept through five fully connected rectified linear units to produce the multi-task embedding. The multi-task embedding is then passed to one softmax layer per attribute type to produce a probability distribution over that attribute’s values.
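The shape of this network (a shared stack of rectified linear layers feeding one softmax per attribute) can be sketched in plain Python. This is a forward pass only, with random untrained weights; it reads "five fully connected rectified linear units" as five ReLU layers, and the layer width, feature size, and attribute names are all assumptions for illustration.

```python
import math
import random


def init_layer(n_out, n_in, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(x, weights, biases):
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    top = max(logits)                       # subtract max for stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_drln(n_features, hidden, attribute_sizes, rng):
    """Five shared ReLU layers plus one softmax head per attribute."""
    layers, n_in = [], n_features
    for _ in range(5):
        layers.append(init_layer(hidden, n_in, rng))
        n_in = hidden
    heads = {name: init_layer(size, hidden, rng)
             for name, size in attribute_sizes.items()}
    return layers, heads


def forward(network, features):
    layers, heads = network
    h = features
    for weights, biases in layers:
        h = [max(0.0, v) for v in affine(h, weights, biases)]   # ReLU
    embedding = h                           # shared multi-task embedding
    distributions = {name: softmax(affine(embedding, w, b))
                     for name, (w, b) in heads.items()}
    return embedding, distributions


rng = random.Random(0)
network = build_drln(n_features=8, hidden=6,
                     attribute_sizes={"morphology": 25, "recurrence": 3},
                     rng=rng)
embedding, distributions = forward(network, [0.5] * 8)
```

The design point is that every attribute head reads the same embedding, so adding an attribute adds one output layer rather than a whole new classifier.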

Deep Learning Architectures – DRLN
DRLN Features:
  The text of the medical concept mention itself
  The lemmas of each token in the medical concept mention
  The PoS of each token in the medical concept mention
  The lemmas of the 3 tokens before/after the medical concept mention
  The title of the containing section
Context Features, for each token t in the sentence:
  The syntactic dependency path to t
  The number of words between the medical concept mention and t
  The number of “hops” in the syntactic dependency path from the head of the medical concept mention to t
  The number of medical concepts between the medical concept mention and t
The features used by the DRLN are described in the paper, but it should be noted that several context features are used to encode information in the sentence that might pertain to the medical concept but may not be near it.

Sampling Method
Rank Combination Protocol [4]: combine several single-task active learning selection decisions into one
Usefulness rank:
  The usefulness score $s_{X_j}(d)$ of each un-validated EEG report $d$ is calculated with respect to each annotation task $X_j$.
  Each score is translated into a rank $r_{X_j}(d)$, where higher usefulness means lower rank.
  For each EEG report, we sum the ranks over the annotation tasks to get the overall rank, $r(d) = \sum_j r_{X_j}(d)$.
  All reports are sorted by this rank, and the reports with the lowest rank are selected for validation.
By combining the individual ranks for each annotation task, we are able to choose the documents that are most useful for all the tasks as a whole.
When choosing a new record to manually annotate, the sampling method we use must be able to incorporate information about each annotation task we are trying to perform.
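The rank combination steps above can be sketched in a few lines. The function names, the toy scores, and the tie-breaking rule (alphabetical by report id) are assumptions for illustration; the slide does not say how ties are handled.

```python
def combined_ranks(scores):
    """scores: {task: {report_id: usefulness}} -> {report_id: summed rank}."""
    total = {}
    for task_scores in scores.values():
        # Higher usefulness gets the lower (better) rank, starting at 1.
        ordered = sorted(task_scores, key=task_scores.get, reverse=True)
        for rank, report in enumerate(ordered, start=1):
            total[report] = total.get(report, 0) + rank
    return total


def select_for_validation(scores, k):
    """Return the k reports with the lowest summed rank."""
    total = combined_ranks(scores)
    return sorted(total, key=lambda report: (total[report], report))[:k]


# Toy usefulness scores for two annotation tasks and three reports.
scores = {
    "anchors":    {"d1": 0.9, "d2": 0.4, "d3": 0.6},
    "attributes": {"d1": 0.2, "d2": 0.3, "d3": 0.8},
}
ranks = combined_ranks(scores)
chosen = select_for_validation(scores, k=2)
```

Here d1 is the most useful report for anchors but the least useful for attributes, so d3, which is strong on both tasks, ends up with the lowest summed rank and is selected first.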

Sampling Method To calculate the usefulness score for a report with respect to an annotation task, we use the average Shannon entropy of each annotation for that task in the report. For example, to get the score for EEG Activity Anchor boundary detection, we average the Shannon entropy over each token in that document given by softmax layers of the stacked LSTM used for anchor detection.
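As a minimal sketch of this usefulness score, the code below averages the Shannon entropy of per-token label distributions. The logarithm base (2) and the handling of an empty report are assumptions; the toy distributions stand in for the softmax outputs of the stacked LSTM.

```python
import math


def shannon_entropy(distribution):
    """Entropy in bits; zero-probability labels contribute nothing."""
    return -sum(p * math.log2(p) for p in distribution if p > 0.0)


def usefulness(token_distributions):
    """Average entropy over all annotation decisions in one report."""
    if not token_distributions:
        return 0.0
    total = sum(shannon_entropy(d) for d in token_distributions)
    return total / len(token_distributions)


# Distributions over the labels (I, O, B) for each token of two toy reports.
uncertain_report = [[1 / 3, 1 / 3, 1 / 3], [0.5, 0.5, 0.0]]
confident_report = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

score_uncertain = usefulness(uncertain_report)
score_confident = usefulness(confident_report)
```

A report whose tokens all receive near-uniform label distributions scores high and is a good candidate for validation, while a report the model labels with certainty scores zero.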

Experimental Results
Boundary Detection
  EEG Activity Anchors
  Other Medical Concepts
  Precision, Recall, F1
Attribute Classification
  10 attribute classes for EEG Activities
  3 attribute classes for other medical concepts
  Precision, Recall, F1, Accuracy
Active Learning
  Learning curve as active learning progresses
  F1 by active learning iteration
- For both boundary detection and attribute classification, we use precision, recall, and the F1 measure, which combines the two.
- For attribute classification, we also report accuracy.
- The learning curve reported for active learning shows the F1 measure of each task as a function of the active learning iteration.

Experimental Results – Boundary Detection
The performance of the stacked LSTM models when automatically detecting anchors and boundaries:
              EEG Activity Anchors    Other Medical Concept Boundaries
Measure       Exact      Partial      Exact      Partial
Precision     .8949      .9591        .9169      .9469
Recall        .8125      .8228        .8797      .8831
F1            .8517      .8857        .8975      .9139
As we can see, we achieve an F1 score of .8517 when detecting the exact spans of EEG Activity anchors and an F1 score of .8975 on all other medical concept boundaries. Both numbers increase if we relax the evaluation to allow for partial matches, as was done in the i2b2 2012 shared task.
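The difference between exact and partial matching can be made concrete with a small scorer over token spans. This is a sketch under an assumption: a partial match is taken to mean any overlap between a predicted span and a gold span, which may not be the precise rule used in the evaluation. The spans are invented for illustration.

```python
def spans_match(a, b, partial):
    """Spans are (start, end) with end exclusive."""
    if partial:
        return a[0] < b[1] and b[0] < a[1]   # any overlap counts
    return a == b


def precision_recall_f1(predicted, gold, partial=False):
    hit_pred = sum(any(spans_match(p, g, partial) for g in gold) for p in predicted)
    hit_gold = sum(any(spans_match(p, g, partial) for p in predicted) for g in gold)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [(4, 9), (12, 14)]
predicted = [(5, 9), (12, 14), (20, 21)]    # one near miss, one hit, one spurious

exact = precision_recall_f1(predicted, gold)
relaxed = precision_recall_f1(predicted, gold, partial=True)

# The harmonic-mean formula reproduces the reported exact-match anchor F1.
anchor_f1 = 2 * 0.8949 * 0.8125 / (0.8949 + 0.8125)
```

The near miss `(5, 9)` counts against both precision and recall under exact matching but counts as a hit under partial matching, which is why the partial columns in the table are uniformly higher.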

Experimental Results – Attribute Classification
The performance of the DRLN models when automatically detecting attributes. The first ten rows correspond to EEG Activity attributes; the last three rows are attributes of the other four medical concept types.
Attribute       Accuracy   Precision   Recall   F1
Morphology      0.990      0.757       0.704    0.724
Hemisphere      0.924      0.775       0.754    0.762
Magnitude       0.909      0.806       0.710    0.750
Recurrence      0.831 0.739 0.731 (one value missing; column alignment unknown)
Dispersal       0.871 0.733 0.751 (one value missing; column alignment unknown)
Freq. Band      0.982      0.664       0.620    0.640
Background      0.960      0.890       0.820    0.854
Location        0.970      0.653       0.560    0.602
Modality        0.977      0.527       0.397    0.426
Polarity        0.741 0.816 (two values missing; column alignment unknown)
Type            0.943 0.936 0.939 (one value missing; column alignment unknown)
(label missing) 0.973      0.742       0.605    0.659
(label missing) 0.978      0.829       0.719    0.770
Here we see the experimental results for attribute classification. As the high accuracies show, our method classifies the attributes of medical concepts correctly in the majority of cases. However, the moderate F1 measures show that there is still work to be done. For instance, consider the morphology attribute, with an accuracy of .99 but an F1 score of .724. This is because there are 25 morphology classes, some of which are under-represented in the data, which skews the F1 score, since it is averaged over the classes.
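The gap between accuracy and class-averaged F1 is easy to reproduce on toy data. The sketch below assumes the F1 score is macro-averaged (an unweighted mean over classes), which matches the slide's description of a score "averaged among the classes"; the labels and counts are invented.

```python
def accuracy(gold, predicted):
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0.0:
            scores.append(0.0)
        else:
            scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)


# 98 instances of a common class, 2 of a rare one; the rare class is never predicted.
gold = ["spike"] * 98 + ["k_complex"] * 2
predicted = ["spike"] * 100

acc = accuracy(gold, predicted)
f1 = macro_f1(gold, predicted)
```

Missing both rare instances costs only two points of accuracy but zeroes out the rare class's F1, pulling the macro average below 0.5.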

Experimental Results – Active Learning
Learning curves for the first 100 EEG reports annotated, evaluated with the F1 measure.
Each curve shows a clear increase from beginning to end. Interestingly, the first two iterations of active learning produce decreases in the performance of EEG Activity Anchor detection. Anchors are spans of text corresponding to the morphology of the activity. Since the active learning system selects the documents it is most uncertain about, it is likely to home in on documents containing activities whose morphologies are as yet unseen in the training data. This causes performance to drop, since these new morphologies may be severely under-represented in the rest of the training data. However, as active learning progresses, this becomes less and less of a problem, and performance increases.

Experimental Results - Discussion
Rare attribute values
  F1 score for morphology: 0.724
  F1 score for morphology for classes with >=10 instances: 0.875
  Future work may benefit from incorporating domain knowledge (neurological ontologies, general knowledge representations)
Ungrammatical sentences
  “There are rare sharp transients noted in the record but without after going slow waves as would be expected in epileptiform sharp waves.”
The annotations produced by MTADL enable the generation of EEG-specific qualified medical knowledge
  Graphical representations
  Embedded knowledge graphs
The largest problem brought to light by the evaluations is the difficulty our methods have in predicting attribute values that are uncommon in the data.

Conclusion
In this paper, we described a novel active learning annotation framework that operates on a large corpus of EEG reports using two deep learning architectures.
We devised an annotation schema capable of capturing the complexity and semantic richness of EEG activity mentions in the reports.
We designed two deep learning architectures to:
  Discover the textual boundaries of medical concepts in the reports
  Perform multi-task attribute detection
We used a sampling method that allows the MTADL system to incorporate information about each task into one active learning sampling decision.
The experimental evaluations have yielded promising results.

Acknowledgements Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health under award number 1U01HG008468. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References
[1] Goodwin TR, Harabagiu SM. Multimodal Patient Cohort Identification from EEG Report and Signal Data. In: AMIA Annual Symposium Proceedings. American Medical Informatics Association; 2016.
[2] Hahn U, Beisswanger E, Buyko E, Faessler E. Active Learning-Based Corpus Annotation: The PathoJen Experience. In: AMIA Annual Symposium Proceedings [Internet]. American Medical Informatics Association; 2012 [cited 2016 Sep 23]. p. 301. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3540513/
[3] Sun W, Rumshisky A, Uzuner O. Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J Am Med Inform Assoc JAMIA. 2013 Sep;20(5):806–13.
[4] Reichart R, Tomanek K, Hahn U, Rappoport A. Multi-Task Active Learning for Linguistic Annotations. In: ACL [Internet]. 2008 [cited 2016 Sep 22]. p. 861–9. Available from: http://www.anthology.aclweb.org/P/P08/P08-1.pdf#page=905
[5] Brown PF, Desouza PV, Mercer RL, Pietra VJD, Lai JC. Class-based n-gram models of natural language. Comput Linguist. 1992;18(4):467–79.
[6] Pascanu R, Gulcehre C, Cho K, Bengio Y. How to construct deep recurrent neural networks. ArXiv Prepr ArXiv13126026 [Internet]. 2013 [cited 2016 Sep 22]; Available from: http://arxiv.org/abs/1312.6026
[7] Glorot X, Bordes A, Bengio Y. Deep Sparse Rectifier Neural Networks. In: Aistats [Internet]. 2011 [cited 2016 Sep 22]. p. 275. Available from: http://www.jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf

Questions ???