Towards a semantic extraction of named entities
Diana Maynard, Kalina Bontcheva, Hamish Cunningham
University of Sheffield, UK

Introduction
- Challenges posed by the progression from traditional IE to a more semantic representation of NEs
- What techniques are best for the deeper level of analysis necessary?
- Can traditional rule-based methods cope with such a transition, or does the future lie solely with machine learning?

The ACE program
"A program to develop technology to extract and characterise meaning from human language"
Aims:
- produce structured information about entities, events and the relations that hold between them
- promote design of more generic systems rather than those tuned to a very specific domain and text type (as with MUC)

The ACE tasks
- Identification of entities and classification into semantic types (Person, Organisation, Location, GPE, Facility)
- Identification and coreference of all mentions of each entity in the text (name, pronominal, nominal)
- Identification of relations holding between such entities

The MACE System
- Rule-based NE system developed within GATE, adapted from ANNIE
- PRs: tokeniser, sentence splitter, POS tagger, gazetteer, semantic tagger, orthomatcher, pronominal and nominal coreferencer
- Also: genre ID, switching controller to select different PRs automatically

Differences between ANNIE and MACE
- Locations -> Location / GPE
- GPEs have roles (GPE, Per, Org, Loc)
- New type Facility (subsumes some Orgs)
- Metonymy means context is necessary for disambiguation (e.g. England cricket team vs England country)
- No Date, Time, Money, Percent, Address, Identifier
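The metonymy point can be illustrated with a toy sketch. This is not MACE's actual grammar (which is written in JAPE). The cue words, the window size and the function name `gpe_role` are all hypothetical, and the slides do not state which role ACE would assign in each case. The sketch only shows the general idea of using right-hand context to choose a role for a GPE mention.

```python
from typing import List

# Hypothetical cue words suggesting the GPE name stands for an organisation.
ORG_CUES = {"team", "squad"}


def gpe_role(tokens: List[str], index: int) -> str:
    """Pick a role for the GPE mention at tokens[index].

    Looks at the two tokens to the right of the mention; if one of them is
    an organisation cue, the mention is treated as having the Org role,
    otherwise it keeps the default GPE role.
    """
    window = tokens[index + 1 : index + 3]
    if any(tok.lower() in ORG_CUES for tok in window):
        return "Org"
    return "GPE"


# "England" is followed by "cricket team", so the cue fires.
print(gpe_role("The England cricket team won".split(), 1))
# "England" ends the sentence, so the default role is kept.
print(gpe_role("He lives in England".split(), 3))
```

A real grammar would combine many such contextual patterns with gazetteer lookups; the single-cue window here is only the smallest version of that idea.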

What does this mean in practical terms?
- Separation of specific from general information makes adaptation easier
- Reclassification of gazetteers unnecessary
- Changes mainly to semantic grammars to:
  - use different gazetteer lookups
  - use more contextual information
  - group rules together differently

Semantic Grammars
- ANNIE uses 21 phases, 187 rules, 9 entity types (av. 20.8 rules per entity type)
- MACE uses 15 phases, 180 rules, 5 entity types (av. 36 rules per entity type)
- The important factor is the increased complexity of new rules, rather than the number
- Rules may be hand-crafted, but an experienced JAPE user can write several rules per minute
- 6 weeks for adaptation

Evaluation (1)

Text              Precision  Recall  F-measure
ACE               82.4       82      82.2
MUC ENAMEX only   89         90      89.5
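The slides do not say how the F-measure was computed. The reported figures are consistent with the balanced F-measure (the harmonic mean of precision and recall), so the sketch below assumes that definition. The function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall figures from the evaluation table, in percent.
scores = {"ACE": (82.4, 82.0), "MUC ENAMEX only": (89.0, 90.0)}

for text, (p, r) in scores.items():
    print(f"{text}: F = {f_measure(p, r):.1f}")
```

Rounded to one decimal place, this reproduces the 82.2 and 89.5 shown in the table.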

Evaluation (2)
- NEWS: 92 articles (business news)
- ACE: 86 broadcast news texts from the September 2002 evaluation
- Difference on ACE task
- MACE on MUC-style annotations:
  - GPEs are left as GPE (so count as errors)
  - GPEs are mapped to Locations
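The two scoring conditions for MACE on MUC-style annotations can be sketched as follows. The annotations are invented toy data, not figures from the evaluation, and the scorer assumes strict matching (same span and same type), which the slides do not specify. The names `map_gpe_to_location` and `precision_recall` are illustrative.

```python
from typing import List, Tuple

# (start offset, end offset, entity type)
Annotation = Tuple[int, int, str]


def map_gpe_to_location(annotations: List[Annotation]) -> List[Annotation]:
    """Relabel every GPE annotation as Location, leaving other types alone."""
    return [(s, e, "Location" if t == "GPE" else t) for s, e, t in annotations]


def precision_recall(system: List[Annotation],
                     gold: List[Annotation]) -> Tuple[float, float]:
    """Strict scoring: an annotation is correct only if span and type match."""
    correct = len(set(system) & set(gold))
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy MUC-style gold standard (no GPE type) and toy ACE-style system output.
gold = [(0, 7, "Location"), (20, 25, "Person"), (40, 46, "Organization")]
system = [(0, 7, "GPE"), (20, 25, "Person"), (40, 46, "Organization")]

# Condition 1: GPEs left as GPE, so the first annotation counts as an error.
print(precision_recall(system, gold))
# Condition 2: GPEs mapped to Location before scoring.
print(precision_recall(map_gpe_to_location(system), gold))
```

In the toy data the only disagreement is the GPE label, so mapping removes every error; on real data the mapping would only recover those GPEs that the MUC-style annotation marks as Locations.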

Comparison of ANNIE vs MACE
- 72% Precision, 84% Recall if GPEs mapped to Locations

Conclusions
- MACE is a rule-based NE system, in contrast with most systems, which use ML
- Advantages: it doesn't require much training data, and is fast to adapt because of its robust design
- If large amounts of training data are available, HMM-based systems tend to perform slightly better
- Rule-based systems tend to be good at recall but sometimes low on precision unless supported additionally by ML methods
