Creating and Evaluating a Consensus for Negated and Speculative Words in a Swedish Clinical Corpus
Hercules Dalianis and Maria Skeppstedt
Stockholm University, Department of Computer and Systems Sciences
NeSp-NLP workshop, July 10, 2010

Intro and Contents
An experiment with annotated clinical text
1. Background
2. Creation of a consensus
3. Automatic detection of cues and the class
4. Comparison with the BioScope Corpus
5. Conclusion and next step

What is special about clinical text?

Example of clinical text (Swedish)
Kvinna med hjärtsvikt, förmaksflimmer, angina pectoris. Ensamstående änka. Tidigare CVL med sequelae högersidig hemipares och afasi. Tidigare vårdad för krampanfall misstänkt apoplektisk. Inkommer nu efter att ha blivit hittad på en stol och sannolikt suttit så över natten. Inkommer nu för utredning. Sonen Johan är med.
(English translation on the next slide.)

Example of clinical text (English translation)
Woman with heart failure, atrial fibrillation, angina pectoris. Single widow. Previous CVL with sequelae: right-sided hemiparesis and aphasia. Prior hospital care for seizures, suspected apoplectic. Arrives now after having been found in a chair, having probably sat there overnight. Arrives now for further investigation and care. Accompanied by her son Johan.

Related research: Negation and speculation detection in clinical text
- Both rule-based systems and machine learning systems
- Precision and recall ranging from just above 80% to just below 100%
- Mostly on English text

The Stockholm EPR Corpus
- Clinics in Stockholm: >800 clinics, >1 million patients
- In Swedish

The annotation
- Three annotators
- Sentences from the assessment part of the health records
- Annotated:
  - Cues for negation and speculation
  - The sentence classified as either certain or uncertain, or broken up into sub-clauses
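As an illustration only (the slides do not specify a data format), one way a single annotator's judgement of a sentence could be represented; the field names, cue type, and class below are hypothetical:

# Hypothetical representation of one annotator's judgement of a sentence:
# the marked cue words and the certainty class assigned to the sentence.
annotation = {
    "sentence": "Tidigare vårdad för krampanfall misstänkt apoplektisk.",
    "cues": [
        {"text": "misstänkt", "type": "speculation"},  # "suspected" (assumed cue)
    ],
    "class": "uncertain",   # certain | uncertain
    "sub_clauses": [],      # used only if the sentence is split into sub-clauses
}

print(annotation["class"], [cue["text"] for cue in annotation["cues"]])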

The annotation
Example: "Not really much worse than before"

Construction of a consensus
General idea: choose the majority annotation.
Discarded:
- The first annotation rounds were discarded (16%)
- 2% were too different to be resolved and were also discarded
In the resulting consensus:
- 92% identically annotated by at least two persons
- 6% identically annotated by at least two persons for the class (for cues, identical only when disregarding the scope, e.g. "could perhaps")
- 2% identical only for the class, and only when the scope of the class is disregarded
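To make the majority-decision idea concrete, here is a minimal sketch (not code from the paper) of sentence-level majority voting over three annotators' class labels; the label names and data layout are assumptions:

from collections import Counter

def majority_class(labels):
    """Return the class chosen by at least two of the three annotators,
    or None when no label reaches two votes (the slides report that 2%
    of the annotations were too different to resolve and were discarded)."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else None

# Hypothetical sentence-level class labels from three annotators.
annotated_sentences = [
    ("certain", "certain", "uncertain"),      # -> certain
    ("uncertain", "uncertain", "uncertain"),  # -> uncertain
]

for labels in annotated_sentences:
    print(labels, "->", majority_class(labels))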

Differences between the individual annotations and the consensus
1. Fewer uncertain expressions
2. Fewer cues for speculation
3. Fewer sentences that were divided into sub-clauses

The BioScope Corpus, annotated with:
1. Cues for speculation and negation
2. The scope of speculation and negation
Example: "Correlation with the patient's height and weight may be of some value."

Comparison between the BioScope Corpus and our corpus

Type of word                          | Our consensus | BioScope
Unique negation cues                  | 13            | 19
Negation cues occurring only once     | 5             | 10
Unique speculation cues               |               |
Speculation cues occurring only once  |               |

Our corpus vs. the BioScope Corpus
1. Less detailed guidelines / more detailed guidelines
2. Consensus created by majority decision / differences resolved by a chief annotator (also higher inter-annotator agreement)
3. Assessment part of records from many clinics / radiology reports

Experiment with the Stanford Named Entity Recognizer
- Based on Conditional Random Fields
- Detection of cues and of the certain/uncertain class
- Comparison between our corpus and the BioScope Corpus
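As a sketch only (the slides do not show the experimental setup in detail), cue detection can be cast as sequence labelling: each token is given a label such as CUE or O and written out as tab-separated token/label lines, the kind of training file a CRF tagger such as the Stanford CRFClassifier accepts. The tokenization, label names, and example sentence below are chosen for illustration:

# Illustrative sketch: turn a sentence with marked cue words into
# tab-separated token/label rows for training a CRF-based tagger.
# The CUE/O label names are hypothetical.

def to_crf_rows(sentence, cue_words):
    """Label every token that matches a cue word as CUE, all others as O."""
    rows = []
    for token in sentence.split():          # naive whitespace tokenization
        label = "CUE" if token.lower() in cue_words else "O"
        rows.append(f"{token}\t{label}")
    return "\n".join(rows)

# Hypothetical training example with the speculation cue "sannolikt" ("probably").
sentence = "Sannolikt suttit så över natten"
cues = {"sannolikt"}

print(to_crf_rows(sentence, cues))
# Sannolikt	CUE
# suttit	O
# så	O
# över	O
# natten	O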

Result of automatic detection of cues for negation

                     | Precision | Recall
Our corpus           |           |
The BioScope corpus  |           |

Result of automatic detection of cues for speculation

                     | Precision | Recall
Our corpus           |           |
The BioScope corpus  |           |

Result of automatic detection of class and scope

                                                         | Precision | Recall
Our corpus (uncertain expression)                        |           |
The BioScope corpus (scope of negation or speculation)   |           |
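For reference, a minimal sketch (not from the slides) of how exact-match precision and recall over predicted versus gold annotations are typically computed; the span representation is a hypothetical choice for illustration:

def precision_recall(gold_spans, predicted_spans):
    """Exact-match precision and recall over sets of (sentence_id, start, end) spans."""
    gold, pred = set(gold_spans), set(predicted_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical gold and predicted cue spans.
gold = [(1, 0, 1), (2, 3, 4), (3, 2, 3)]
pred = [(1, 0, 1), (2, 3, 4), (2, 6, 7)]

p, r = precision_recall(gold, pred)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67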

Conclusion and next step
1. Low results for detecting cues for speculation and class in our constructed corpus
2. Simplifying the task can hopefully result in:
   - Higher inter-annotator agreement
   - Easier to automatically learn to detect speculation

Thank you! Questions?
Hercules Dalianis and Maria Skeppstedt