Pastra et al., LREC 2002 How feasible is the reuse of grammars for Named Entity Recognition? Katerina Pastra, Diana Maynard, Oana Hamza, Hamish Cunningham.

How feasible is the reuse of grammars for Named Entity Recognition?
Katerina Pastra, Diana Maynard, Oana Hamza, Hamish Cunningham and Yorick Wilks
Department of Computer Science, Natural Language Processing Group, University of Sheffield, U.K.

The paradox
- NER results: close to human performance
- Reuse of NER resources: minimal
We will focus on:
- traditional rule-based NER systems
- NER in text
- reuse of grammars for NER
- manual adaptation of grammars

What is it that hinders grammar reuse?
1) Grammar formalism
2) Application domain
3) Natural language
The use of flexible system architectures guarantees the reusability of resources.
But is this a "sine qua non" solution? Does the lack of such architectures make reuse simply "not feasible"?

Grammar Formalism (1)
- Current practice: there is no standardised formalism.
- Traditional pattern-matching languages are inappropriate for NER.
- The norm: attribute-value (AV) notations, which allow rules to refer to token attributes from multiple levels of analysis.
Is translating between formalisms a time-effective solution? Time gained vs. information lost: is there a trade-off?
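
To make the idea of an attribute-value notation concrete, here is a minimal Python sketch (not NEA or JAPE syntax) in which each token carries attributes from several analysis levels and a rule element is simply a set of required attribute-value pairs. The attribute names loosely follow ANNIE conventions (`orth`, `category`, `majorType`), but storing the gazetteer result directly on the token is a simplifying assumption; in GATE it would be a separate Lookup annotation.

```python
from typing import Dict, List

# Hypothetical token representation: attributes come from different analysis
# levels (tokeniser: orth, POS tagger: category, gazetteer: majorType).
Token = Dict[str, str]
Constraint = Dict[str, str]  # attribute -> required value


def matches(token: Token, constraint: Constraint) -> bool:
    """True if the token carries every attribute-value pair in the constraint."""
    return all(token.get(attr) == value for attr, value in constraint.items())


def match_sequence(tokens: List[Token], pattern: List[Constraint], start: int) -> bool:
    """Check whether the pattern of constraints matches tokens from `start` on."""
    if start + len(pattern) > len(tokens):
        return False
    return all(matches(tokens[start + i], c) for i, c in enumerate(pattern))


tokens = [
    {"string": "Mr", "orth": "upperInitial", "category": "NNP", "majorType": "title"},
    {"string": "Smith", "orth": "upperInitial", "category": "NNP"},
    {"string": "said", "orth": "lowercase", "category": "VBD"},
]

# Hypothetical rule: a gazetteer title followed by a capitalised proper noun.
person_pattern = [{"majorType": "title"}, {"orth": "upperInitial", "category": "NNP"}]

print(match_sequence(tokens, person_pattern, 0))  # True
print(match_sequence(tokens, person_pattern, 1))  # False
```

A single rule element here mixes a gazetteer attribute with tokeniser and tagger attributes, which is what makes AV notations convenient for NER.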

Grammar Formalism (2)
The need: NER for SOCIS (Scene of Crime Information System); not the main task, so time was limited.
The problem: the existing grammar was written in another formalism.
NEA vs. JAPE, similarities: declarative, context-sensitive, non-deterministic pattern matching…
NEA vs. JAPE, differences (NEA / JAPE):
- rule invocation: bottom-up / FST cascades
- control mechanism: Appelt / Appelt, First, Brill
- rule augmentation: Prolog / Java
- wildcards and "don't care" sequences: not common to both
- iterations and inequality tests (!=): handled by different mechanisms
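
The control styles differ in which of several competing matches actually fire. The sketch below is a deliberately simplified model: it assumes the candidate matches have already been found and are non-empty, it loosely follows the way the JAPE documentation describes the Appelt and Brill styles, and it leaves out First. Rule names, priorities and offsets are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    rule: str      # hypothetical rule name
    start: int     # token offset where the match begins
    end: int       # token offset just past the match (assumed > start)
    priority: int  # higher wins ties under Appelt
    order: int     # position of the rule in the grammar


def fire(candidates: List[Match], style: str) -> List[Match]:
    """Select the matches that fire under a simplified Appelt or Brill style."""
    fired: List[Match] = []
    pos = 0
    while True:
        remaining = [m for m in candidates if m.start >= pos]
        if not remaining:
            return fired
        s = min(m.start for m in remaining)
        at_s = [m for m in remaining if m.start == s]
        if style == "appelt":
            # One rule per region: longest match, then priority, then rule order.
            best = min(at_s, key=lambda m: (-m.end, -m.priority, m.order))
            fired.append(best)
            pos = max(best.end, s + 1)
        elif style == "brill":
            # Every rule matching here fires; resume after the longest match.
            fired.extend(sorted(at_s, key=lambda m: m.order))
            pos = max(max(m.end for m in at_s), s + 1)
        else:
            raise ValueError(f"unsupported style: {style}")


# "New York Times yesterday": two overlapping candidates at offset 0.
candidates = [
    Match("Location", 0, 2, priority=20, order=0),
    Match("Organisation", 0, 3, priority=10, order=1),
    Match("Date", 3, 4, priority=5, order=2),
]
print([m.rule for m in fire(candidates, "appelt")])  # ['Organisation', 'Date']
print([m.rule for m in fire(candidates, "brill")])   # ['Location', 'Organisation', 'Date']
```

Under Appelt only the longest match at offset 0 fires, whereas Brill fires both overlapping rules before moving on; this is why the control style matters when a grammar is ported between systems.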

Grammar Formalism (3)
The experiment: from the NEA notation to JAPE
- NEA notation: A => B\C/D
- JAPE: (B)(C):label(D) --> :label.EntityType = {attr}
Observations:
- what is on one formalism's LHS is on the other's RHS (the entity type A sits on the NEA left-hand side but is assigned on the JAPE right-hand side);
- the same things are handled in different ways;
- differences in the modules run before NER affect the rules.
Still: the original rule set took 2 months to develop, the SOCIS set 1 week.
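
The correspondence between the two notations can be sketched as a small string translator. This assumes a hypothetical, simplified NEA syntax in which B, C and D are each a single annotation type, and it emits a JAPE rule body only (no Phase/Input/Options header). Real rules with sequences, wildcards or Prolog/Java actions would not translate this mechanically.

```python
import re

# Hypothetical simplified NEA rule "A => B \ C / D": annotate span C as
# entity type A when it has left context B and right context D.
NEA_RULE = re.compile(r"^\s*(\w+)\s*=>\s*(.*?)\s*\\\s*(.*?)\s*/\s*(.*?)\s*$")


def nea_to_jape(rule: str, name: str, label: str = "ne") -> str:
    """Translate one simplified NEA rule into a JAPE rule body."""
    m = NEA_RULE.match(rule)
    if m is None:
        raise ValueError(f"not a simplified NEA rule: {rule!r}")
    entity_type, left, core, right = m.groups()
    # The entity type moves from the NEA left-hand side to the JAPE
    # right-hand side, where it names the annotation created over :label.
    return (
        f"Rule: {name}\n"
        f"({{{left}}})\n"
        f"({{{core}}}):{label}\n"
        f"({{{right}}})\n"
        "-->\n"
        f':{label}.{entity_type} = {{rule = "{name}"}}'
    )


print(nea_to_jape(r"Person => Title \ Upper / Verb", "PersonTitle"))
```

For `Person => Title \ Upper / Verb` the output is a rule whose right-hand side creates a `Person` annotation over the span labelled `:ne`; `Title`, `Upper` and `Verb` are hypothetical annotation types.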

Application Domain (1)
Is there a core set of grammar rules that is always domain-independent?
General-purpose NER grammars were developed to serve grammar reuse, but themselves originated in specific applications. They separate specific from general information.
- MUSE: automatic resource switching according to text features
- HaSIE: company reports on health and safety issues

Application Domain (2)
From newswire text on biotechnology to … crime-scene police reports
The experiment:
- the gazetteers were enriched with police- and crime-related information;
- all original domain-specific rules were deleted.
Results with no modifications to the grammar: close to 90%.
Only 1 change to the core set was needed, plus the addition of rules.
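
The domain experiment can be illustrated with a toy annotator in which the rules stay fixed and only the gazetteer changes. Both rules, the list entries and the report sentence are hypothetical; the point is simply that enriching the lists lets unchanged, domain-independent rules find new entities.

```python
from typing import Dict, List, Tuple

# Hypothetical core gazetteer: lower-cased entry -> list type.
CORE_GAZETTEER: Dict[str, str] = {"mr": "title", "sheffield": "location"}

# Hypothetical police/crime additions (police ranks used as person titles).
CRIME_ENTRIES: Dict[str, str] = {"dc": "title", "sgt": "title"}


def annotate(tokens: List[str], gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Apply two fixed core rules; only the gazetteer varies between domains."""
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        kind = gazetteer.get(tokens[i].lower())
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if kind == "title" and nxt[:1].isupper():
            # Core rule 1: title + capitalised word -> Person.
            entities.append(("Person", f"{tokens[i]} {nxt}"))
            i += 2
        elif kind == "location":
            # Core rule 2: gazetteer location -> Location.
            entities.append(("Location", tokens[i]))
            i += 1
        else:
            i += 1
    return entities


report = "DC Smith attended the scene in Sheffield".split()
print(annotate(report, CORE_GAZETTEER))                       # [('Location', 'Sheffield')]
print(annotate(report, {**CORE_GAZETTEER, **CRIME_ENTRIES}))  # [('Person', 'DC Smith'), ('Location', 'Sheffield')]
```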

Natural Language (1)
Parameters to consider:
- the relation between languages A and B (closely related or not) determines the extent of reuse;
- the nature of NEs (formation, syntagmatic relations):
  - unpredictable behaviour and structure
  - finite set
NER grammar in language A + linguistic knowledge of NEs in language B = NER grammar for language B?

Natural Language (2)
Romanian NEs (compared to English):
- rich inflection
- flexible word order
- different word order (e.g. the modifier follows the noun)
The experiment: run the NER grammar for English on Romanian text.
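
Why an English grammar loses recall on Romanian can be sketched with two hypothetical regular-expression rules (not JAPE, and not the rules used in the paper). The English-style rule expects the organisation key word at the end ("Sheffield University"); in Romanian the key word comes first and is inflected ("Universitatea București", genitive "Universității București"), so an adapted rule must reverse the order and list the inflected forms.

```python
import re
from typing import Optional

# Hypothetical English-style rule: capitalised modifier(s) + key word.
ENGLISH_ORG = re.compile(r"(?:[A-Z]\w+ )+(?:University|Bank)\b")

# Hypothetical adapted rule: inflected key word first, then capitalised
# modifier(s); \w matches Romanian letters such as ș and ț in Python 3.
ROMANIAN_ORG = re.compile(
    r"\b(?:Universitatea|Universității|Banca|Băncii)(?: [A-ZĂÂÎȘȚ]\w+)+"
)


def first_org(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the first organisation-like span the rule finds, if any."""
    m = pattern.search(text)
    return m.group(0) if m else None


english = "He studied at Sheffield University"
romanian = "A studiat la Universitatea București"   # nominative key word
romanian_gen = "rectorul Universității București"   # genitive key word

print(first_org(ENGLISH_ORG, english))        # Sheffield University
print(first_org(ENGLISH_ORG, romanian))       # None
print(first_org(ROMANIAN_ORG, romanian))      # Universitatea București
print(first_org(ROMANIAN_ORG, romanian_gen))  # Universității București
```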

Natural Language (3)
1st experiment: Romanian gazetteer + English grammar
- Overall results: P = 0.82, R = 0.67
- Low recall even for entity types recognised with high precision (e.g. Organisation: P = 0.75, R = 0.39)
2nd experiment: Romanian gazetteer + adapted grammar
- Overall results: P = 0.95, R = 0.94
Corpus: 1 MB of Romanian newspaper texts
Manual marking of NEs – Romanian NER (3 weeks)
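
The slide reports precision and recall separately. As an added illustration (the F-measure does not appear on the slide), the standard weighted harmonic mean F = (1 + β²)PR / (β²P + R) combines them into one figure; with β = 1 the overall results give roughly 0.74 for the first experiment and 0.94 for the second.

```python
def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Overall precision/recall figures as reported on the slide.
experiments = {
    "Romanian gazetteer + English grammar": (0.82, 0.67),
    "Romanian gazetteer + adapted grammar": (0.95, 0.94),
}
for name, (p, r) in experiments.items():
    print(f"{name}: F1 = {f_measure(p, r):.2f}")
# Romanian gazetteer + English grammar: F1 = 0.74
# Romanian gazetteer + adapted grammar: F1 = 0.94
```

The harmonic mean penalises imbalance, so the first experiment's gap between precision and recall pulls its single score well below its precision.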

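Natural Language (3): results by entity type
First table (entity type: precision / recall)
- Address: 0.81 (only one value recoverable)
- Date, Location, Money, Organisation: values not recoverable
- Percent: 1 / 0.82
- Person, Identifier, Overall: values not recoverable
Second table (entity type: precision / recall)
- Address, Date, Location, Money, Organisation: values not recoverable
- Percent: 1 / 0.99
- Person, Identifier, Overall: values not recoverable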

Conclusions
Reuse of existing NER grammars is time-effective and should be attempted even when the formalisms, applications and languages involved are different.
Further issues to be addressed:
- reuse of NER grammars for spoken NEs
- reuse in statistical/ML NER approaches
- automating grammar reuse