Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition
Authors: Andrew Borthwick, John Sterling, Eugene Agichtein, Ralph Grishman
Speaker: Shasha Liao
Contents
– Named Entity Recognition (NER)
– Maximum Entropy (ME)
– System Architecture
– Results
– Conclusions
Named Entity Recognition (NER)
Given a tokenization of a test corpus and a set of n (n = 7) name categories, NER is the problem of assigning one of 4n+1 tags to each token:
– x_begin, x_continue, x_end, x_unique for each category x, plus the tag "other"
MUC-7:
– Proper names (people, organizations, locations)
– Expressions of time
– Quantities
– Monetary values
– Percentages
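The 4n+1 tag scheme above can be sketched in a few lines. The category names here are the seven MUC-7 name types, used as an illustrative assumption:

```python
# Sketch of the (4n+1)-tag scheme: each of the n name categories x
# contributes x_begin, x_continue, x_end, x_unique, plus one shared
# tag "other". Category names are the MUC-7 name types (assumed).

MUC7_CATEGORIES = ["person", "organization", "location",
                   "date", "time", "money", "percent"]

def make_tagset(categories):
    tags = [f"{c}_{part}" for c in categories
            for part in ("begin", "continue", "end", "unique")]
    return tags + ["other"]

tagset = make_tagset(MUC7_CATEGORIES)  # 4*7 + 1 = 29 tags
```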
Named Entity Recognition (NER)
Example: Jim bought 300 shares of Acme Corp. in 2006.
Jim/per_unique bought/other 300/qua_unique shares/other of/other Acme/org_begin Corp./org_end in/other 2006/time_unique ./other
Maximum Entropy (ME)
A statistical modeling technique that estimates a probability distribution from partial knowledge.
Principle: the correct probability distribution is the one that maximizes entropy (uncertainty) subject to the constraints imposed by what is known.
Maximum Entropy (ME) --- Build ME Model
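The model equations on this slide did not survive extraction. As a sketch, a conditional maximum-entropy model of the kind MENE builds has the standard log-linear form, where h is a history (the token and its context), f is a future (a tag), the g_i(h, f) are binary features, and the α_i are feature weights estimated in training:

```latex
P(f \mid h) = \frac{1}{Z_\alpha(h)} \prod_i \alpha_i^{\,g_i(h,f)},
\qquad
Z_\alpha(h) = \sum_{f'} \prod_i \alpha_i^{\,g_i(h,f')}
```

The normalizer Z_α(h) makes the scores over all candidate futures for a given history sum to one.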
Maximum Entropy (ME) --- Initialize Features
Maximum Entropy (ME) --- ME Estimation
Maximum Entropy (ME) --- Generalized Iterative Scaling
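The training algorithm named above can be sketched as follows. This is a toy illustration of Generalized Iterative Scaling for a conditional model with binary features, not the authors' implementation; it assumes every (history, future) pair activates the same number C of features, which GIS requires (real implementations add a slack feature to enforce this):

```python
# Minimal sketch of Generalized Iterative Scaling (GIS) for a
# conditional maximum-entropy model with binary features.
# Assumes a constant feature sum C over all (history, future) pairs.
import math

def gis(data, features, futures, iterations=20):
    """data: list of (history, future) training pairs.
    features: list of functions g(h, f) -> 0 or 1."""
    C = sum(g(data[0][0], data[0][1]) for g in features)  # constant sum
    # Observed feature expectations over the training data.
    observed = [sum(g(h, f) for h, f in data) / len(data) for g in features]
    alpha = [1.0] * len(features)

    def prob(h):
        scores = {f: math.prod(a ** g(h, f) for a, g in zip(alpha, features))
                  for f in futures}
        z = sum(scores.values())
        return {f: s / z for f, s in scores.items()}

    for _ in range(iterations):
        # Model feature expectations under the current weights.
        expected = [0.0] * len(features)
        for h, _ in data:
            p = prob(h)
            for f in futures:
                for i, g in enumerate(features):
                    expected[i] += p[f] * g(h, f) / len(data)
        # Multiplicative GIS update: alpha_i *= (observed/expected)^(1/C).
        alpha = [a * (o / e) ** (1.0 / C) if e > 0 else a
                 for a, o, e in zip(alpha, observed, expected)]
    return prob
```

Each iteration scales every weight by the ratio of the feature's observed count to its expected count under the current model, which provably increases the training likelihood.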
System Architecture --- Features (1)
Feature set:
– Binary: similar to BBN's Nymble/IdentiFinder system
– Lexical: all tokens with a count of 3 or more
– Section: date, preamble, text, …
– Dictionary: name lists
– External system: the futures proposed by other systems become part of MENE's history
– Compound: conjunctions of features, e.g. an external-system feature combined with a section feature
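A binary feature of the kind listed above fires only for a specific conjunction of a history view and a candidate future. The concrete feature below is a hypothetical illustration, not one taken from the paper:

```python
# Sketch of a MENE-style binary feature: 1 if the current token is
# capitalized AND the proposed future is person_begin, else 0.
# The history is represented here as a dict of views (an assumption).

def initial_cap_person_begin(history, future):
    token = history["current_token"]
    return 1 if token[:1].isupper() and future == "person_begin" else 0
```

For example, `initial_cap_person_begin({"current_token": "Jim"}, "person_begin")` returns 1, while the same history paired with any other future returns 0.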
System Architecture --- Features (2)
Feature selection (features discarded):
– Features which activate on the default value of a history view (in 99% of cases a token is not a name)
– Lexical features predicting the future "other", unless seen at least 6 times (instead of 3)
– Features which predict "other" at positions token−2 and token+2
System Architecture --- Decoding and Viterbi Search
Viterbi search: dynamic programming
– Find the highest-probability legal path through the lattice of conditional probabilities
– Example: "Mike England"
Mike: person_start (0.66), England: gpe_unique (0.6), but p(gpe_unique | person_start) = 0, so this path is illegal
Mike: person_start (0.66), England: person_end (0.3), p(person_end | person_start) = 0.7, a legal path
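The decoding step above can be sketched as a Viterbi search that multiplies per-token tag probabilities along a path while forbidding illegal transitions (e.g. gpe_unique cannot follow person_start). Tag names and probabilities here are illustrative, not from the paper:

```python
# Minimal Viterbi decoder: find the highest-probability tag sequence
# through the lattice, skipping transitions the tag scheme forbids.

def viterbi(token_probs, legal):
    """token_probs: one {tag: P(tag | token)} dict per token.
    legal(prev_tag, tag): True if the transition is allowed."""
    # best[tag] = (probability of best path ending in tag, that path)
    best = {t: (p, [t]) for t, p in token_probs[0].items()}
    for probs in token_probs[1:]:
        nxt = {}
        for tag, p in probs.items():
            candidates = [(bp * p, path + [tag])
                          for prev, (bp, path) in best.items()
                          if legal(prev, tag)]
            if candidates:
                nxt[tag] = max(candidates)
        best = nxt
    return max(best.values())
```

On the slide's example, gpe_unique for "England" has the higher local score, but the legality constraint forces the decoder onto the person_start → person_end path.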
Results (1)
Results (2)
Probable reasons:
– Dynamic updating of the vocabulary during decoding (reference resolution), e.g. "Andrew Borthwick"
– Binary model vs. multi-class model
Conclusion
Future work:
– Incorporate long-range reference resolution
– Use general compound features
– Use acronyms
Advantages of MENE:
– Can incorporate information from previous tokens
– Features can overlap
– Highly portable
– Easy to combine with other systems