Ling 570, Day 16: Sequence Modeling and Named Entity Recognition
Sequence Labeling
Goal: Find the most probable labeling of a sequence
Many sequence labeling tasks:
– POS tagging
– Word segmentation
– Named entity tagging
– Story/spoken sentence segmentation
– Pitch accent detection
– Dialog act tagging
HMM Search Space
[Figure: HMM trellis over the sentence "time flies like an arrow", with emission and transition probability tables for the tags N, V, P, DT; the numbers are not reproduced here.]
Viterbi Example
[Figure: Viterbi table for "time flies like an arrow", one column per word and one cell per tag, each cell holding the best path probability and a back-pointer; the numbers are not reproduced here.]
(1) Find the max in the last column, (2) follow the back-pointer chain to recover that best sequence.
Viterbi
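A minimal sketch of the Viterbi procedure just described (fill each column, take the max in the last column, follow back-pointers), assuming hypothetical init_p / trans_p / emit_p probability tables rather than the exact numbers from the trellis figure:

```python
def viterbi(words, tags, init_p, trans_p, emit_p):
    # delta[i][t]: probability of the best tag sequence for words[0..i] ending in tag t
    # backptr[i][t]: previous tag on that best sequence
    delta = [{t: init_p.get(t, 0.0) * emit_p.get(t, {}).get(words[0], 0.0) for t in tags}]
    backptr = [{}]
    for i in range(1, len(words)):
        delta.append({})
        backptr.append({})
        for t in tags:
            # Best previous tag for reaching tag t at position i
            best_prev, best_score = None, 0.0
            for prev in tags:
                score = delta[i - 1][prev] * trans_p.get(prev, {}).get(t, 0.0)
                if score > best_score:
                    best_prev, best_score = prev, score
            delta[i][t] = best_score * emit_p.get(t, {}).get(words[i], 0.0)
            backptr[i][t] = best_prev

    # (1) Find the max in the last column, (2) follow back-pointers to recover the sequence
    best_last = max(tags, key=lambda t: delta[-1][t])
    sequence = [best_last]
    for i in range(len(words) - 1, 0, -1):
        sequence.append(backptr[i][sequence[-1]])
    return list(reversed(sequence))
```

Called as viterbi("time flies like an arrow".split(), ["N", "V", "P", "DT"], init_p, trans_p, emit_p), it returns one tag per word; the two nested loops over tags give the O(N^2 T) running time noted later in these slides.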
Decoding
Goal: Identify the highest-probability tag sequence
Issues:
– Features include tags from previous words: not immediately available
– Uses tag history: just knowing the highest-probability preceding tag is insufficient
Decoding
Approach: Retain multiple candidate tag sequences
– Essentially search through tagging choices
Which sequences?
– We can't look at all of them: exponentially many!
– Instead, use the top K highest-probability sequences
Breadth-First Search
Example: time flies like an arrow
[Figure: breadth-first expansion of the tag lattice, one word at a time, exploring every tag sequence.]
Is breadth-first search efficient?
– No, it tries everything
Beam Search
Intuition:
– Breadth-first search explores all paths
– Lots of paths are (pretty obviously) bad
– Why explore bad paths?
– Restrict to the (apparently) best paths
Approach:
– Perform breadth-first search, but
– Retain only the k 'best' paths thus far
– k: beam width
Beam Search, k=3
Example: time flies like an arrow
[Figure: the same expansion, but keeping only the top 3 partial sequences after each word.]
Beam Search
W = {w1, w2, …, wn}: test sentence
s_ij: j-th highest-probability sequence up to and including word w_i
Generate tags for w1, keep the top k, set s_1j accordingly
for i = 2 to n:
– Extension: add tags for w_i to each s_(i-1)j
– Beam selection:
  Sort sequences by probability
  Keep only the top k sequences
Return the highest-probability sequence s_n1
(a code sketch follows below)
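A minimal sketch of this procedure, assuming a hypothetical score(prev_tags, word, tag) function (not part of the slides) that returns a log-probability contribution for tagging word given the tags chosen so far:

```python
def beam_search(words, tags, score, k=3):
    # beam: list of (log_prob, tag_sequence) pairs -- the top-k sequences so far
    beam = [(score((), words[0], t), [t]) for t in tags]
    beam = sorted(beam, key=lambda x: x[0], reverse=True)[:k]
    for i in range(1, len(words)):
        # Extension: add every candidate tag for w_i to each retained sequence
        extended = [(lp + score(tuple(seq), words[i], t), seq + [t])
                    for lp, seq in beam for t in tags]
        # Beam selection: sort by probability, keep only the top k sequences
        beam = sorted(extended, key=lambda x: x[0], reverse=True)[:k]
    # Return the highest-probability sequence (s_n1)
    return beam[0][1]
```

Each word costs only k·|tags| extensions plus a sort, which is the slide's point that beam search needs nothing beyond extension and sorting, no dynamic programming.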
POS Tagging
Overall accuracy: 96.3+%
Unseen-word accuracy: 86.2%
Comparable to HMM tagging accuracy or TBL
Provides:
– Probabilistic framework
– Better able to model different information sources
Topline accuracy 96-97% (limited by annotator consistency issues)
Beam Search
Beam search decoding:
– Variant of breadth-first search
– At each layer, keep only the top k sequences
Advantages:
– Efficient in practice: beam width 3-5 is near optimal
  Empirically, the beam covers 5-10% of the search space, pruning 90-95%
– Simple to implement: just extensions + sorting, no dynamic programming
– Running time: O(kT) [vs. O(N^T)]
Disadvantage: not guaranteed optimal (or complete)
Viterbi Decoding
Viterbi search:
– Exploits dynamic programming, memoization
– Requires only a small history window
– Efficient search: O(N^2 T)
Advantage:
– Exact: the optimal solution is returned
Disadvantage:
– Limited window of context
Beam vs. Viterbi
– Dynamic programming (Viterbi) vs. heuristic search (beam)
– Guaranteed optimal (Viterbi) vs. no guarantee (beam)
– Different context windows
MaxEnt POS Tagging
Part-of-speech tagging by classification:
– Feature design: word and tag context features; orthographic features for rare words
Sequence classification problems:
– Tag features depend on prior classifications
Beam search decoding:
– Efficient, but inexact
– Near optimal in practice
NAMED ENTITY RECOGNITION
Roadmap
Named Entity Recognition:
– Definition
– Motivation
– Challenges
– Common approach
Named Entity Recognition
Task: Identify named entities in (typically) unstructured text
Typical entities:
– Person names
– Locations
– Organizations
– Dates
– Times
Example
Lady Gaga is playing a concert for the Bushes in Texas next September
Example (annotated)
[Lady Gaga]_person is playing a concert for [the Bushes]_person in [Texas]_location [next September]_time
Example from financial news
[Ray Dalio]_person's [Bridgewater Associates]_organization is an extremely large and extremely successful hedge fund. Based in [Westport]_location and known for its strong -- some would say cultish -- culture, it has grown to well over [$100 billion]_value in assets under management with little negative impact on its returns.
Entity types may differ by application
News:
– People, countries, organizations, dates, etc.
Medical records:
– Diseases, medications, organisms, organs, etc.
Named Entity Types
Common categories [table from slide not reproduced]
Named Entity Examples
For common categories [examples from slide not reproduced]
Why NER?
Machine translation:
– Lady Gaga is playing a concert for the Bushes in Texas next September
– "La señora Gaga es toca un concierto para los arbustos …" (the name and "the Bushes" are mistranslated; "los arbustos" = the shrubs)
Numbers:
– 9/11: date vs. ratio
– 911: emergency phone number vs. simple number
Why NER?
Information extraction:
– MUC task: joint ventures/mergers
– Focus on company names, person names (CEO), valuations
Information retrieval:
– Named entities are the focus of retrieval
– In some data sets, 60+% of queries target NEs
Text-to-speech:
– 206-616-5728: phone numbers (vs. other digit strings) are read differently, and differ by language
Challenges
Ambiguity:
– Washington: D.C., the state, George, etc.
– Most digit strings
– cat (95 results):
  CAT(erpillar) stock ticker
  Computerized Axial Tomography
  Chloramphenicol Acetyl Transferase
  small furry mammal
Context & Ambiguity
[Examples from slide not reproduced]
Evaluation
– Precision
– Recall
– F-measure
Resources
Online:
– Name lists: baby names, who's who, newswire services, census.gov
– Gazetteers
– SEC listings of companies
Tools:
– LingPipe
– OpenNLP
– Stanford NLP toolkit
Approaches to NER
Rule/regex-based (see the sketch below):
– Match names/entities in lists
– Regex: e.g. \d\d/\d\d/\d\d matches 11/23/11
– Currency: \$\d+\.\d+ (with the $ escaped, since a bare $ is the end-of-line anchor)
Machine learning via sequence labeling:
– Better for names, organizations
Hybrid
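A small sketch of the rule/regex-based approach, using the date and currency patterns above; the function and example sentence are illustrative:

```python
import re

DATE_RE = re.compile(r"\b\d\d/\d\d/\d\d\b")   # e.g. 11/23/11
MONEY_RE = re.compile(r"\$\d+\.\d+")          # e.g. $19.99

def rule_based_entities(text):
    """Return (character_span, type) pairs for every regex match."""
    entities = [(m.span(), "DATE") for m in DATE_RE.finditer(text)]
    entities += [(m.span(), "MONEY") for m in MONEY_RE.finditer(text)]
    return entities

print(rule_based_entities("Paid $19.99 on 11/23/11"))
# [((15, 23), 'DATE'), ((5, 11), 'MONEY')]
```

List matching works the same way: compile an alternation over a name list (or use plain string search) and emit a typed span for each hit.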
NER AS SEQUENCE LABELING
NER as Classification Task
Instance: token
Labels:
– Position: B(eginning), I(nside), O(utside)
– NER types: PER, ORG, LOC, NUM
– Label: Type-Position, e.g. PER-B, PER-I, O, …
– How many tags? (|NER types| × 2) + 1
(see the sketch below)
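A sketch of building this tag set and of converting labeled entity spans into per-token tags; the NER types are the ones above, while the helper function and example spans are illustrative:

```python
NER_TYPES = ["PER", "ORG", "LOC", "NUM"]
# (|NER types| x 2) + 1 tags: a -B and a -I tag per type, plus O
TAGS = [f"{t}-{pos}" for t in NER_TYPES for pos in ("B", "I")] + ["O"]
assert len(TAGS) == len(NER_TYPES) * 2 + 1   # 9 tags

def spans_to_tags(tokens, spans):
    """spans: (start, end, type) over token indices, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"{etype}-B"
        for i in range(start + 1, end):
            tags[i] = f"{etype}-I"
    return tags

tokens = "Lady Gaga is playing a concert for the Bushes in Texas".split()
print(spans_to_tags(tokens, [(0, 2, "PER"), (7, 9, "PER"), (10, 11, "LOC")]))
# ['PER-B', 'PER-I', 'O', 'O', 'O', 'O', 'O', 'PER-B', 'PER-I', 'O', 'LOC-B']
```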
NER as Classification: Features
What information can we use for NER?
– Predictive tokens: e.g. MD, Rev, Inc., …
– How general are these features? Language? Genre? Domain?
NER as Classification: Shape Features
Shape types:
– lower: e.g. e. e. cummings (all lower case)
– capitalized: e.g. Washington (first letter uppercase)
– all caps: e.g. WHO (all letters capitalized)
– mixed case: e.g. eBay (mixed upper and lower case)
– capitalized with period: e.g. H.
– ends with digit: e.g. A9
– contains hyphen: e.g. H-P
(a feature-extraction sketch follows below)
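A sketch of computing these shape features for a single token; the category names and the check order are illustrative:

```python
def word_shape(token):
    """Map a token to one of the shape categories listed above."""
    if "-" in token:
        return "contains-hyphen"          # e.g. H-P
    if token and token[-1].isdigit():
        return "ends-with-digit"          # e.g. A9
    if token.endswith(".") and token[:1].isupper():
        return "capitalized-with-period"  # e.g. H.
    if token.isupper():
        return "all-caps"                 # e.g. WHO
    if token.islower():
        return "lower"                    # e.g. cummings
    if token[:1].isupper() and token[1:].islower():
        return "capitalized"              # e.g. Washington
    return "mixed-case"                   # e.g. eBay

for tok in ["cummings", "Washington", "WHO", "eBay", "H.", "A9", "H-P"]:
    print(tok, word_shape(tok))
```

In a MaxEnt tagger the returned category simply becomes one more feature alongside the word, context-word, and previous-tag features.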
Example Instance Representation
[Feature-vector example from slide not reproduced]
Sequence Labeling Example
[Example from slide not reproduced]
Evaluation
System: output of automatic tagging
Gold standard: true tags
Precision: # correct chunks / # system chunks
Recall: # correct chunks / # gold chunks
F-measure: F1 = 2PR / (P + R) balances precision & recall
Standard measures:
– Precision, recall, F-measure
– Computed on entity types (CoNLL evaluation)
Classifiers vs. evaluation measures:
– Classifiers optimize tag accuracy
– Most common tag? O: most tokens aren't NEs
– Evaluation measures focus on NEs
State of the art:
– Standard tasks: PER, LOC: 0.92; ORG: 0.84
(a scoring sketch follows below)
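A sketch of the chunk-level scoring described above: extract typed chunks from the Type-B / Type-I / O tag sequences and compare system chunks against gold chunks. The helper names are illustrative; the counting follows the precision/recall definitions on the previous slides:

```python
def extract_chunks(tags):
    """Return a set of (start, end, type) chunks from Type-B / Type-I / O tags."""
    chunks, start, ctype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):       # sentinel closes a chunk at the end
        if start is not None and not tag.endswith("-I"):
            chunks.add((start, i, ctype))
            start, ctype = None, None
        if tag.endswith("-B"):
            start, ctype = i, tag.split("-")[0]
    return chunks

def chunk_prf(system_tags, gold_tags):
    sys_chunks, gold_chunks = extract_chunks(system_tags), extract_chunks(gold_tags)
    correct = len(sys_chunks & gold_chunks)
    p = correct / len(sys_chunks) if sys_chunks else 0.0    # correct / system chunks
    r = correct / len(gold_chunks) if gold_chunks else 0.0  # correct / gold chunks
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Note that a labeler which outputs O everywhere gets high tag accuracy but zero recall here, which is exactly why chunk-level precision/recall/F1 is used instead of accuracy.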
Hybrid Approaches
Practical systems:
– Exploit lists, rules, learning, …
– Multi-pass:
  Early passes: high precision, low recall
  Later passes: noisier sequence learning
Hybrid system:
– High-precision rules tag unambiguous mentions
  Use string matching to capture substring matches
– Tag items from domain-specific name lists
– Apply a sequence labeler