Published by Clinton Allen. Modified over 9 years ago.
1
Towards a semantic extraction of named entities
Diana Maynard, Kalina Bontcheva, Hamish Cunningham
University of Sheffield, UK
2
Introduction
- Challenges posed by progression from traditional IE to a more semantic representation of NEs
- What techniques are best for the deeper level of analysis necessary?
- Can traditional rule-based methods cope with such a transition, or does the future lie solely with machine learning?
3
The ACE program
“A program to develop technology to extract and characterise meaning from human language”
Aims:
- produce structured information about entities, events and the relations that hold between them
- promote design of more generic systems rather than those tuned to a very specific domain and text type (as with MUC)
4
The ACE tasks
- Identification of entities and classification into semantic types (Person, Organisation, Location, GPE, Facility)
- Identification and coreference of all mentions of each entity in the text (name, pronominal, nominal)
- Identification of relations holding between such entities
5
<entity ID="ft-airlines-27-jul-2001-2" GENERIC="FALSE" entity_type = "ORGANIZATION">
  <entity_mention ID="M003" TYPE = "NAME" string = "National Air Traffic Services">
  <entity_mention ID="M004" TYPE = "NAME" string = "NATS">
  <entity_mention ID="M005" TYPE = "PRO" string = "its">
  <entity_mention ID="M006" TYPE = "NAME" string = "Nats">
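The annotation above groups several mentions under one entity. A minimal sketch of how such a record might be held in memory, assuming Python dataclasses; the names `Entity`, `Mention` and `mentions_of_type` are illustrative only and are not part of ACE or GATE:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mention:
    mention_id: str
    mention_type: str  # e.g. "NAME" or "PRO", as in the example annotation
    string: str


@dataclass
class Entity:
    entity_id: str
    entity_type: str
    generic: bool
    mentions: List[Mention] = field(default_factory=list)

    def mentions_of_type(self, mention_type: str) -> List[str]:
        """Return the strings of all mentions with the given type."""
        return [m.string for m in self.mentions if m.mention_type == mention_type]


# Values copied from the example annotation.
nats = Entity(
    entity_id="ft-airlines-27-jul-2001-2",
    entity_type="ORGANIZATION",
    generic=False,
    mentions=[
        Mention("M003", "NAME", "National Air Traffic Services"),
        Mention("M004", "NAME", "NATS"),
        Mention("M005", "PRO", "its"),
        Mention("M006", "NAME", "Nats"),
    ],
)

print(nats.mentions_of_type("NAME"))
print(nats.mentions_of_type("PRO"))
```

All four mentions share one entity ID, which is what coreference of name and pronominal mentions amounts to in this representation.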
6
The MACE System
- Rule-based NE system developed within GATE, adapted from ANNIE
- PRs: tokeniser, sentence splitter, POS tagger, gazetteer, semantic tagger, orthomatcher, pronominal and nominal coreferencer
- Also: genre ID, switching controller to select different PRs automatically
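The idea of a switching controller that selects PRs per document can be sketched as follows. This is a toy illustration in Python, not GATE's API: the `Document` class, the one-entry gazetteer, the two genre names and the `caseless_gazetteer` PR are all assumptions made for the example, and the genre is supplied by hand rather than identified automatically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    text: str
    genre: str  # assumed to be set by an earlier genre-identification step
    tokens: List[str] = field(default_factory=list)
    lookups: List[Tuple[str, str]] = field(default_factory=list)


PR = Callable[[Document], None]  # a processing resource mutates the document

GAZETTEER = {"sheffield": "Location"}  # hypothetical one-entry list


def tokeniser(doc: Document) -> None:
    doc.tokens = doc.text.split()


def gazetteer(doc: Document) -> None:
    # Case-sensitive variant: only capitalised tokens are looked up.
    doc.lookups = [
        (t, GAZETTEER[t.lower()])
        for t in doc.tokens
        if t[:1].isupper() and t.lower() in GAZETTEER
    ]


def caseless_gazetteer(doc: Document) -> None:
    # Variant for text without reliable capitalisation.
    doc.lookups = [(t, GAZETTEER[t.lower()]) for t in doc.tokens if t.lower() in GAZETTEER]


# The "switching controller": one PR sequence per genre.
PIPELINES: Dict[str, List[PR]] = {
    "newswire": [tokeniser, gazetteer],
    "broadcast": [tokeniser, caseless_gazetteer],
}


def run(doc: Document) -> Document:
    for pr in PIPELINES[doc.genre]:
        pr(doc)
    return doc


print(run(Document("talks in sheffield today", "broadcast")).lookups)
print(run(Document("talks in sheffield today", "newswire")).lookups)
```

The same lowercase text produces a lookup under the broadcast pipeline and none under the newswire one, which is the kind of behaviour change the controller is meant to automate.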
8
Differences between ANNIE and MACE
- Locations -> Location / GPE
- GPEs have roles (GPE, Per, Org, Loc)
- New type Facility (subsumes some Orgs)
- Metonymy means context is necessary for disambiguation (e.g. England cricket team vs England country)
- No Date, Time, Money, Percent, Address, Identifier
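To make the metonymy point concrete, here is a toy context rule in Python. MACE's actual rules are written in JAPE; this sketch swaps in a plain Python function, and the trigger-word list, window size and function name are hypothetical.

```python
from typing import List

ORG_CONTEXT = {"team", "squad", "side"}  # hypothetical trigger words


def gpe_role(tokens: List[str], index: int, window: int = 2) -> str:
    """Pick a role for the GPE at tokens[index] from the words that follow it."""
    following = tokens[index + 1 : index + 1 + window]
    if any(t.lower() in ORG_CONTEXT for t in following):
        return "Org"
    return "GPE"


print(gpe_role("the England cricket team won".split(), 1))
print(gpe_role("he lives in England".split(), 3))
```

The same string "England" receives a different role depending on its right-hand context, which a gazetteer lookup alone cannot decide.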
9
What does this mean in practical terms?
- Separation of specific from general information makes adaptation easier
- Reclassification of gazetteers unnecessary
- Changes mainly to semantic grammars to
  - use different gazetteer lookups
  - use more contextual information
  - group rules together differently
10
Semantic Grammars
- ANNIE uses 21 phases, 187 rules, 9 entity types (av. 20.8 rules per entity type)
- MACE uses 15 phases, 180 rules, 5 entity types (av. 36 rules per entity type)
- The important factor is the increased complexity of new rules, rather than the number
- Rules may be hand-crafted, but an experienced JAPE user can write several rules per minute
- 6 weeks for adaptation
11
Evaluation (1)
Text               Precision   Recall   F-measure
ACE                82.4        82       82.2
MUC ENAMEX only    89          90       89.5
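The F-measure figures can be checked from the precision and recall columns. The sketch below assumes the balanced F-measure (harmonic mean of precision and recall); the slide does not state the weighting, but the reported values are consistent with it.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values taken from the Evaluation (1) slide.
for name, p, r in [("ACE", 82.4, 82.0), ("MUC ENAMEX only", 89.0, 90.0)]:
    print(f"{name}: F = {f_measure(p, r):.1f}")
```

This prints 82.2 for ACE and 89.5 for MUC ENAMEX only, matching the F-measure column.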
12
Evaluation (2)
- NEWS – 92 articles (business news)
- ACE – 86 broadcast news from September 2002 evaluation
- Difference on ACE task
- MACE on MUC-style annotations
  - GPEs are left as GPE (so count as errors)
  - GPEs are mapped to Locations
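The two ways of scoring MACE output against MUC-style annotations can be sketched as below. The annotations are made-up toy data, not the evaluation corpus, and strict matching on exact offsets and type is an assumption of the sketch; the function names are illustrative.

```python
from typing import Dict, List, Tuple

Annotation = Tuple[int, int, str]  # (start offset, end offset, type)


def map_types(anns: List[Annotation], mapping: Dict[str, str]) -> List[Annotation]:
    """Rename annotation types, e.g. GPE -> Location; other types pass through."""
    return [(s, e, mapping.get(t, t)) for s, e, t in anns]


def precision_recall(gold: List[Annotation], predicted: List[Annotation]) -> Tuple[float, float]:
    """Strict scoring: an annotation is correct only if span and type both match."""
    correct = len(set(gold) & set(predicted))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data: the gold standard has no GPE type, the system output does.
gold = [(0, 7, "Location"), (20, 23, "Organization")]
predicted = [(0, 7, "GPE"), (20, 23, "Organization")]

print(precision_recall(gold, predicted))  # GPEs left as GPE
print(precision_recall(gold, map_types(predicted, {"GPE": "Location"})))  # GPEs mapped
```

With GPEs left as GPE, the first annotation counts as an error; after mapping it counts as correct, so both precision and recall rise.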
13
Comparison of ANNIE vs MACE
- 72% Precision, 84% Recall if GPEs mapped to Locations
14
Conclusions
- MACE is a rule-based NE system, in contrast with most systems, which use ML
- Advantages: it doesn’t require much training data and is fast to adapt because of its robust design
- If large amounts of training data are available, HMM-based systems tend to perform slightly better
- Rule-based systems tend to be good at recall but sometimes low on precision unless supported additionally by ML methods