A Light-weight Approach to Coreference Resolution for Named Entities in Text. Marin Dimitrov (Ontotext Lab, Sirma AI); Kalina Bontcheva, Hamish Cunningham, Diana Maynard, Horacio Saggion (Department of Computer Science, University of Sheffield)


A Light-weight Approach to Coreference Resolution for Named Entities in Text Marin Dimitrov Ontotext Lab, Sirma AI Kalina Bontcheva, Hamish Cunningham, Diana Maynard, Horacio Saggion Department of Computer Science, University of Sheffield 1(12)

Overview
- A “knowledge-poor” approach to pronominal anaphora resolution: inexpensive and fast, yet useful for practical tasks
- Goal: resolution of pronoun anaphora when the antecedent is a named entity (person, organisation, location, etc.)
- Our approach relies on part-of-speech information and named entity recognition
- No syntax parsing, focus identification, or deep semantic knowledge is used
2(12)

Corpus Analysis (1)
Corpus data:
- bnews: ASR-transcribed broadcast news – approx. words
- npaper: OCR-transcribed newspaper articles – approx. words
- nwire: newswire – approx. words
Pronouns included in the analysis:
- personal: I, me, you, he, she, it, we, they, etc.
- possessive adjectives: my, your, her, his, its, etc.
- possessive pronouns: mine, yours, hers, his, its, etc.
- reflexive pronouns: myself, yourself, herself, himself, itself, etc.
3(12)

Corpus Analysis (2)
Total pronouns:
- Avg. 4.2% (highest in broadcast news, 5.6%; avg. 3.5% otherwise)
- The average is three times higher than previously reported in (Barbu & Mitkov 2001), because they used technical manuals
Pronouns by type:
- Similar to previously reported results
- The most frequent pronouns change with the corpus type: bnews differs from npaper and nwire, with I and you much more important
Pleonastic it:
- Lower frequency compared to other studies, due to the different domains
- Avg. 3.2% of all pronouns are pleonastic it occurrences, or 17.5% of all it pronouns
4(12)

Coreference Module Design
Modular design, so new components can be added easily. Currently:
- Quoted text module: identifies the quoted text segments, used in the resolution of I, me, etc.
- Pleonastic It module: identifies the pleonastic occurrences of it in the text
- Pronoun Coreference Resolution module
Freely available as part of GATE
5(12)

6(12) Pleonastic It Identification
Pattern-based, using the patterns from (Lappin & Leass, 1994), extended with new ones derived from our corpus and with synonyms/antonyms from WordNet.
However, 41.3% of the pleonastic occurrences in the corpus are still not matched by any pattern, and not all patterns are detected correctly by the module, so on average only 38% of all pleonastic it occurrences are identified correctly.
Hence there is scope for further improvement here, which in turn will improve the performance on the resolution of it, its, etc.
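As an illustration of the pattern-based approach, here is a minimal sketch of pleonastic-it matching in the spirit of (Lappin & Leass, 1994). The word lists and patterns below are invented examples for illustration, not the module's actual pattern set:

```python
import re

# Illustrative word lists only -- the real module uses a larger pattern set,
# extended with synonyms/antonyms from WordNet.
MODAL_ADJ = r"(?:important|necessary|possible|certain|likely|good|useful|easy|difficult)"
COGN_VERB = r"(?:seems|appears|means|follows)"

PLEONASTIC_PATTERNS = [
    # "It is (not) <modal adjective> that/to ..."
    re.compile(rf"\bit\s+is\s+(?:not\s+)?{MODAL_ADJ}\s+(?:that|to)\b", re.I),
    # "It seems/appears/means/follows that ..."
    re.compile(rf"\bit\s+{COGN_VERB}\s+that\b", re.I),
]

def is_pleonastic(clause: str) -> bool:
    """Return True if the clause matches a known pleonastic-it pattern."""
    return any(p.search(clause) for p in PLEONASTIC_PATTERNS)
```

Clauses that match no pattern fall through to the normal resolution of it, which is one source of the 38% detection figure above.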

7(12) Resolution of he, she, etc.
1. Inspect the context of the anaphor for candidate antecedents; each Person entity is considered a candidate.
2. For each candidate, perform a gender compatibility check.
3. Evaluate each candidate against the best candidate so far:
- If both candidates are anaphoric for the pronoun, choose the one that appears closer.
- The same holds when both candidates are cataphoric relative to the pronoun.
- If one is anaphoric and the other cataphoric, choose the former, even if the latter appears closer to the pronoun.
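The three steps above can be sketched as follows. This is a minimal illustration, not the actual GATE module; the Candidate representation (character offsets, gender labels) is an assumption:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    name: str
    offset: int   # position of the Person entity in the text (assumed representation)
    gender: str   # "male", "female", or "unknown"

def resolve_gendered_pronoun(pronoun_offset: int, pronoun_gender: str,
                             candidates: List[Candidate]) -> Optional[Candidate]:
    best = None
    for cand in candidates:
        # Step 2: gender compatibility check (unknown gender is compatible).
        if cand.gender not in (pronoun_gender, "unknown"):
            continue
        if best is None:
            best = cand
            continue
        # Step 3: compare against the best candidate so far.
        cand_anaphoric = cand.offset < pronoun_offset
        best_anaphoric = best.offset < pronoun_offset
        if cand_anaphoric == best_anaphoric:
            # Both anaphoric, or both cataphoric: prefer the closer candidate.
            if abs(cand.offset - pronoun_offset) < abs(best.offset - pronoun_offset):
                best = cand
        elif cand_anaphoric:
            # An anaphoric candidate beats a cataphoric one, regardless of distance.
            best = cand
    return best
```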

Resolution of it, its, etc.
- Resolution is harder because there are fewer constraints, e.g. no gender.
- The proportion of nominal antecedents is higher (33%), so a nominal anaphora resolution module is needed to improve performance here.
- In 52% of the cases the most recent named entity of type Organization or Location was the correct antecedent.
- In 15% of the cases the most recent named entity was not the right antecedent, and in half of these cases this is due to apposition (which we will handle in the future).
- There is no need to consider cataphoric named entities as potential antecedents.
8(12)
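The most-recent-entity heuristic described above can be sketched as follows; the tuple-based entity representation is an assumption for illustration, not the module's API:

```python
from typing import List, Optional, Tuple

Entity = Tuple[str, str, int]   # (name, type, offset) -- assumed representation

def resolve_it(pronoun_offset: int, entities: List[Entity]) -> Optional[Entity]:
    """Pick the nearest preceding Organization or Location entity;
    cataphoric entities (after the pronoun) are never considered."""
    preceding = [e for e in entities
                 if e[1] in ("Organization", "Location") and e[2] < pronoun_offset]
    return max(preceding, key=lambda e: e[2]) if preceding else None
```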

9(12) Resolution of I, me, etc.
- Unlike the other pronouns, the antecedents here are mainly cataphoric.
- These pronouns are resolved only if they occur in a quoted speech segment.
- In 52% of all occurrences the antecedent is the closest named entity in the text following the quoted segment.
- In 29% of all cases the antecedent is a named entity in the previous sentence.
- In 3% of the cases the antecedent is in the same sentence, but before the quote.
- Of the remaining 16% (not covered currently), 13% are cases where the antecedent is a nominal.
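The lookup order implied by the statistics above can be sketched as a hypothetical helper; the entity and sentence-span representations are assumptions:

```python
from typing import List, Optional, Tuple

Person = Tuple[str, int]   # (name, offset) -- assumed representation

def resolve_first_person(quote_start: int, quote_end: int,
                         persons: List[Person],
                         prev_sentence: Tuple[int, int]) -> Optional[Person]:
    """Lookup order for I/me inside a quoted segment:
    1. closest Person entity after the quote (cataphoric, the 52% case),
    2. most recent Person entity in the previous sentence (29%),
    3. a Person entity in the same sentence, before the quote (3%)."""
    after = [p for p in persons if p[1] > quote_end]
    if after:
        return min(after, key=lambda p: p[1])
    in_prev = [p for p in persons if prev_sentence[0] <= p[1] < prev_sentence[1]]
    if in_prev:
        return max(in_prev, key=lambda p: p[1])
    before = [p for p in persons if p[1] < quote_start]
    return max(before, key=lambda p: p[1]) if before else None
```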

Evaluation
- The evaluation corpus was 5% of the entire corpus, containing 4.5% of the pronouns.
- No pronouns were excluded, so unhandled ones (like we and you) degrade the recall, while nominal antecedents degrade the precision.
- Overall: 66% precision and 46% recall – comparable to other knowledge-poor approaches.
- Precision/recall per pronoun type:
  - he, she, her, etc.: % precision / 77.2% recall
  - it, its, etc.: 43.5% precision / 51.7% recall
  - I, me, myself, etc.: 77.8% precision / 62.2% recall
- Precision/recall are degraded partly by errors in the named entity recogniser – we get approx. 10% improvement when using human-marked named entities.
10(12)
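The overall scores above can be combined into a balanced F-measure, which the slides do not report explicitly; a quick sketch using the standard formula:

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Overall scores reported on the slide: 66% precision, 46% recall.
overall_f1 = f1(0.66, 0.46)   # approx. 0.542
```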

Conclusion
- We demonstrated that a very lightweight approach is useful in practical tasks like entity detection and tracking.
- Further improvements can be achieved by resolving the nominals and detecting apposition.
- Since the module is freely available, it can be used as a baseline against which other approaches can be compared.
- Unfortunately the ACE corpus used here cannot be made available, as it is a closed evaluation; nor can we disclose how our approach ranked compared to other participating systems.
11(12)