MCORES: a system for noun phrase coreference resolution for clinical records
Andreea Bodnari (1), Peter Szolovits (1), Ozlem Uzuner (2)
(1) MIT, CSAIL, Cambridge, MA, USA
(2) Department of Information Studies, University at Albany SUNY, Albany, NY, USA
2012 SHARPn Summit "Secondary Use", Rochester, MN

Medical coreference resolution system (MCORES)
Experimental results
Conclusion

Electronic Medical Records (EMRs): large information repositories
Clinical information requires processing
  - Lower level: sentence parsing, tokenization
  - Higher level: coreference resolution, semantic disambiguation
Coreference resolution: a fundamental step in text processing

English medical corpus provided by the i2b2 National Center for Biomedical Computing
  - De-identified medical discharge summaries
    - Source: PH & BIDMC
    - Content: 230 (PH) + 196 (BIDMC) discharge summaries
  - Annotated concepts and coreference chains
Concept types
  - Persons
  - Problems
  - Treatments
  - Tests
  - Pronouns

[Pipeline diagram: NP -> Instance Creation -> Feature Generation -> Classification -> Output Clustering]

Markables of the same semantic category are paired together
MCORES creates positive instances only from neighboring markable pairs in a chain [1]
[1] Instance creation akin to McCarthy and Lehnert
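The pairing rule above can be sketched in a few lines. This is an illustration only, not the MCORES implementation: the `Markable` class, its fields, and `create_instances` are hypothetical names. The slide does not say how non-neighboring members of the same chain or pairs from different chains are handled, so the sketch assumes the former are skipped and the latter become negative instances.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Markable:
    text: str
    category: str          # semantic category, e.g. "problem" or "treatment"
    position: int          # order of appearance in the document
    chain: Optional[int]   # gold coreference chain id, None for a singleton


def create_instances(markables: List[Markable]) -> List[Tuple[Markable, Markable, bool]]:
    """Return (antecedent, anaphor, is_coreferent) training instances."""
    ordered = sorted(markables, key=lambda m: m.position)
    instances = []
    for antecedent, anaphor in combinations(ordered, 2):
        # Only markables of the same semantic category are paired.
        if antecedent.category != anaphor.category:
            continue
        same_chain = antecedent.chain is not None and antecedent.chain == anaphor.chain
        if same_chain:
            # Positive only if no other member of the chain lies between the two.
            has_member_between = any(
                m.chain == antecedent.chain
                and antecedent.position < m.position < anaphor.position
                for m in ordered
            )
            if not has_member_between:
                instances.append((antecedent, anaphor, True))
            # Assumption: non-neighboring same-chain pairs are skipped.
        else:
            # Assumption: pairs that do not share a chain are negative instances.
            instances.append((antecedent, anaphor, False))
    return instances
```

With a chain of three problems, only the two neighboring pairs become positive instances; the first and third mention are not paired directly.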

Table 3: Distribution of coreferent and non-coreferent instances per semantic category over instances containing exact, partial, and no textual overlap.

Multi-perspective features
  - Antecedent perspective
  - Anaphor perspective
  - Greedy perspective
  - Stingy perspective
Phrase-level lexical
Sentence-level lexical
Syntactic
Semantic
Miscellaneous

Phrase-level lexical
  - Token overlap*
  - Normalized token overlap
  - Edit-distance
  - Normalized edit-distance
Sentence-level lexical
  - Sentence-level token overlap*
  - Filtered sentence-level token overlap*
  - Left and right mention overlap (stingy and greedy perspectives only)
* multi-perspective feature
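The slides name the four perspectives but do not define them. The sketch below shows one plausible reading for the phrase-level features, and that reading is an assumption: the shared-token count is scaled by the antecedent length, by the anaphor length, by the shorter markable ("greedy", giving the largest score), or by the longer markable ("stingy", giving the smallest score). The function names are hypothetical, and the edit distance is the standard Levenshtein distance normalized by the longer string.

```python
def tokens(phrase: str) -> list:
    """Lowercase whitespace tokenization (a simplification)."""
    return phrase.lower().split()


def token_overlap(antecedent: str, anaphor: str) -> dict:
    """Shared-token ratio under four assumed perspectives."""
    a, b = set(tokens(antecedent)), set(tokens(anaphor))
    if not a or not b:
        raise ValueError("markables must contain at least one token")
    shared = len(a & b)
    return {
        "antecedent": shared / len(a),
        "anaphor": shared / len(b),
        "greedy": shared / min(len(a), len(b)),   # assumed definition
        "stingy": shared / max(len(a), len(b)),   # assumed definition
    }


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]


def normalized_edit_distance(s: str, t: str) -> float:
    """Edit distance divided by the length of the longer string."""
    longest = max(len(s), len(t))
    return edit_distance(s, t) / longest if longest else 0.0
```

Under this reading, "chest pain" and "severe chest pain" fully overlap from the antecedent and greedy perspectives but only by two thirds from the anaphor and stingy perspectives.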

Syntactic
  - Number agreement
  - Noun overlap*
  - Surname match
Semantic
  - UMLS CUI overlap*
  - UMLS CUI token overlap*
  - UMLS semantic type overlap*
  - Anaphor UMLS semantic type
* multi-perspective feature

Miscellaneous
  - Token distance
  - Mention distance
  - All-mention distance
  - Sentence distance
  - Section match
  - Section distance

C4.5 decision tree algorithm
  - Flexible
  - Readable prediction model
Classify pairs of markables based on the values of their feature vectors
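C4.5 picks the attribute to split on by gain ratio: information gain divided by the entropy of the split itself. The sketch below computes that criterion for discrete features on a toy set of pair instances. It is not a full C4.5 implementation (no continuous thresholds, pruning, or recursion), and the feature names `token_overlap` and `same_section` are hypothetical stand-ins for entries of the feature vector.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, feature) -> float:
    """C4.5 split criterion for one discrete feature."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalizes features with many distinct values.
    split_info = entropy([row[feature] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Toy markable pairs; labels say whether each pair corefers.
rows = [
    {"token_overlap": True, "same_section": True},
    {"token_overlap": True, "same_section": False},
    {"token_overlap": False, "same_section": True},
    {"token_overlap": False, "same_section": False},
]
labels = [True, True, False, False]
best_feature = max(rows[0], key=lambda f: gain_ratio(rows, labels, f))
```

On this toy data `token_overlap` separates the labels perfectly, so it would be chosen as the root test, which is the kind of readable rule the slide credits to decision trees.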

The classifier makes pairwise predictions only
Pairwise predictions are clustered into coreference chains
  - Aggressive-merge clustering algorithm [1]: a prediction [M1] - [M2] is merged with all preceding pairwise predictions linked to [M1] or [M2]
[1] Aggressive-merge algorithm proposed by McCarthy and Lehnert
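Read this way, aggressive merging amounts to taking the transitive closure of the positive pairwise links. A minimal sketch using a union-find structure follows; the function name and the choice to drop singletons from the output are assumptions made for illustration, not details taken from the slides.

```python
def aggressive_merge(markables, positive_pairs):
    """Group markables into chains by merging every positive pairwise link."""
    parent = {m: m for m in markables}

    def find(m):
        # Follow parent pointers to the representative, halving the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in positive_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a   # merge the two partial chains

    chains = {}
    for m in markables:
        chains.setdefault(find(m), []).append(m)
    # Assumption: markables with no positive link are left out of the output.
    return [chain for chain in chains.values() if len(chain) > 1]
```

For example, the positive predictions M1-M2, M3-M4, and M2-M3 end up in a single chain of four markables, even though the classifier never linked M1 and M4 directly. This also means a single false-positive link can fuse two otherwise separate chains.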

Feature set evaluation
Perspectives evaluation
Performance evaluation against
  - In-house baseline
  - Third-party systems (RECONCILE ACL09 & BART)
Evaluation metric: unweighted averages of Recall, Precision, and F-measure over
  - MUC
  - B³
  - CEAF
  - BLANC
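As an illustration of one of the four metrics, here is a minimal sketch of B³ together with the unweighted averaging step. It assumes the key and response partitions cover exactly the same mentions, with singletons listed as one-element chains. The handling of mentions that appear in only one partition, and the other three metrics, are left out. Function names are hypothetical.

```python
def b_cubed(key_chains, response_chains):
    """Return (precision, recall, f_measure) of the B-cubed metric."""
    key_of = {m: frozenset(chain) for chain in key_chains for m in chain}
    response_of = {m: frozenset(chain) for chain in response_chains for m in chain}
    if set(key_of) != set(response_of):
        raise ValueError("sketch assumes identical mention sets")

    precision_sum = recall_sum = 0.0
    for mention, key_chain in key_of.items():
        response_chain = response_of[mention]
        overlap = len(key_chain & response_chain)
        precision_sum += overlap / len(response_chain)
        recall_sum += overlap / len(key_chain)

    n = len(key_of)
    precision, recall = precision_sum / n, recall_sum / n
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def unweighted_average(values):
    """Plain mean, e.g. of one score type across MUC, B-cubed, CEAF, and BLANC."""
    values = list(values)
    return sum(values) / len(values)
```

Averaging without weights gives each metric the same influence on the final score, regardless of how many mentions or links it counts internally.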


MCORES' advantage comes from linking markables with no token overlap
The phrase-level sub-MCORES performs similarly to MCORES
The greedy perspective system is the most favorable single-perspective system
The multi-perspective system performs as well as or better than the single-perspective systems
Error analysis
  - MCORES fails to classify misspelled person pairs
  - Medical problems: false positives due to the difference between new and recurring events
  - Treatments: false positives due to medications with different routes of administration
  - Tests: false positives due to the large number of full-overlap instances that did not corefer

Developed a coreference resolution system for the medical domain (MCORES)
MCORES innovates through a multi-perspective and knowledge-based feature set
MCORES outperforms third-party systems and an in-house baseline, improving coreference resolution on clinical records