Ling 570, Day 16: Sequence Modeling and Named Entity Recognition


1 Ling 570 Day 16: Sequence Modeling and Named Entity Recognition

2 Sequence Labeling
Goal: Find most probable labeling of a sequence
Many sequence labeling tasks:
– POS tagging
– Word segmentation
– Named entity tagging
– Story/spoken sentence segmentation
– Pitch accent detection
– Dialog act tagging

3 HMM search space
[Figure: HMM trellis for "time flies like an arrow" over the tags N, V, P, DT, with the emission probability table P(word | tag), the tag-to-tag transition probabilities, and the initial tag probabilities.]

4 [Figure: Viterbi chart for "time flies like an arrow" over the tags N, V, P, DT, with each cell holding the best path probability and a back-pointer to the previous tag.]
(1) Find the max in the last column; (2) follow the back-pointer chains to recover that best sequence.

5 Viterbi
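The formula on this slide did not survive extraction; a standard statement of the Viterbi recurrence, consistent with the chart on slide 4 (v holds path probabilities, bp the back-pointers), is:

```latex
% pi_j: initial tag probability, a_{ij}: tag transition probability,
% b_j(o_t): emission probability of word o_t given tag j
v_1(j) = \pi_j \, b_j(o_1)
v_t(j) = \max_{1 \le i \le N} \; v_{t-1}(i)\, a_{ij}\, b_j(o_t), \qquad t > 1
% back-pointer followed in step (2) on slide 4
bp_t(j) = \arg\max_{1 \le i \le N} \; v_{t-1}(i)\, a_{ij}\, b_j(o_t)
```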

6 Decoding
Goal: Identify the highest probability tag sequence
Issues:
– Features include tags from previous words, which are not immediately available
– Uses tag history: just knowing the highest probability preceding tag is insufficient

7 Decoding
Approach: Retain multiple candidate tag sequences
– Essentially, search through the tagging choices
Which sequences?
– We can't look at all of them – exponentially many!
– Instead, use the top K highest probability sequences

8-13 Breadth-First Search: time flies like an arrow

14 Breadth-first Search Is breadth-first search efficient?

15 Breadth-first Search Is it efficient? – No, it tries everything

16 Beam Search
Intuition:
– Breadth-first search explores all paths
– Lots of paths are (pretty obviously) bad
– Why explore bad paths?
– Restrict to (apparently best) paths
Approach:
– Perform breadth-first search, but
– Retain only k 'best' paths thus far
– k: beam width

17-21 Beam Search, k=3: time flies like an arrow

22 Beam Search
W = {w1, w2, …, wn}: test sentence
s_ij: j-th highest prob. sequence up to & including word w_i
Generate tags for w_1, keep the top k, set s_1j accordingly
for i = 2 to n:
– Extension: add tags for w_i to each s_(i-1)j
– Beam selection: sort sequences by probability; keep only the top k sequences
Return the highest probability sequence s_n1
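A minimal Python sketch of this beam-search decoder; `tagset` and the `score(prev_tags, i, tag)` callback are hypothetical placeholders standing in for the tagger's tag inventory and its log-probability scoring model (e.g. a MaxEnt classifier):

```python
def beam_search(words, tagset, score, k=3):
    """Beam-search decoding sketch for a sequence labeler.

    score(prev_tags, i, tag) is assumed to return the log-probability of
    assigning `tag` to words[i] given the tags chosen so far.
    """
    # Initialize the beam with the k best tags for the first word.
    beam = [([t], score((), 0, t)) for t in tagset]
    beam = sorted(beam, key=lambda h: h[1], reverse=True)[:k]

    for i in range(1, len(words)):
        # Extension: grow every kept sequence by every possible tag.
        extended = [(tags + [t], logp + score(tuple(tags), i, t))
                    for tags, logp in beam
                    for t in tagset]
        # Beam selection: sort by score and keep only the top k sequences.
        beam = sorted(extended, key=lambda h: h[1], reverse=True)[:k]

    return beam[0][0]   # highest probability sequence
```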

23 POS Tagging
Overall accuracy: 96.3+%
Unseen word accuracy: 86.2%
Comparable to HMM tagging accuracy or TBL
Provides:
– Probabilistic framework
– Better able to model different info sources
Topline accuracy 96-97%
– Consistency issues

24 Beam Search
Beam search decoding:
– Variant of breadth-first search
– At each layer, keep only the top k sequences
Advantages:
– Efficient in practice: a beam of 3-5 is near optimal
– Empirically, the beam covers 5-10% of the search space, i.e. prunes 90-95%
– Simple to implement: just extensions + sorting, no dynamic programming
– Running time: O(kT) [vs. O(N^T)]
Disadvantage: not guaranteed optimal (or complete)

25 Viterbi Decoding
Viterbi search:
– Exploits dynamic programming, memoization
– Requires only a small history window
– Efficient search: O(N^2 T)
Advantage:
– Exact: the optimal solution is returned
Disadvantage:
– Limited window of context
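For comparison with beam search, a minimal Python sketch of Viterbi decoding for an HMM; `start_p`, `trans_p`, and `emit_p` are assumed to be dictionaries of initial, transition, and emission probabilities like the tables sketched on slides 3-4:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi decoding sketch for an HMM over tokens `obs` and tag set `states`."""
    T = len(obs)
    v = [{} for _ in range(T)]    # v[t][s]: best probability of any path ending in tag s at t
    bp = [{} for _ in range(T)]   # back-pointers

    for s in states:              # initialization
        v[0][s] = start_p.get(s, 0.0) * emit_p[s].get(obs[0], 0.0)

    for t in range(1, T):         # recursion
        for s in states:
            best_prev, best_score = None, -1.0
            for r in states:
                score = v[t-1][r] * trans_p[r].get(s, 0.0) * emit_p[s].get(obs[t], 0.0)
                if score > best_score:
                    best_prev, best_score = r, score
            v[t][s], bp[t][s] = best_score, best_prev

    # (1) find the max in the last column, (2) follow the back-pointer chain
    last = max(states, key=lambda s: v[T-1][s])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(bp[t][path[-1]])
    return list(reversed(path))
```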

26 Beam vs. Viterbi
– Dynamic programming vs. heuristic search
– Guaranteed optimal vs. no guarantee
– Different context window

27 MaxEnt POS Tagging
Part-of-speech tagging by classification:
– Feature design: word and tag context features; orthographic features for rare words
Sequence classification problems:
– Tag features depend on prior classification
Beam search decoding:
– Efficient, but inexact
– Near optimal in practice

28 NAMED ENTITY RECOGNITION

29 Roadmap
Named Entity Recognition:
– Definition
– Motivation
– Challenges
– Common Approach

30 Named Entity Recognition
Task: Identify named entities in (typically) unstructured text
Typical entities:
– Person names
– Locations
– Organizations
– Dates
– Times

31 Example Lady Gaga is playing a concert for the Bushes in Texas next September

32 Example (annotated): [Lady Gaga]person is playing a concert for [the Bushes]person in [Texas]location [next September]time

33 Example from financial news: [Ray Dalio]person's [Bridgewater Associates]organization is an extremely large and extremely successful hedge fund. Based in [Westport]location and known for its strong -- some would say cultish -- culture, it has grown to well over [$100 billion]value in assets under management with little negative impact on its returns.

34 Entity types may differ by application
News:
– People, countries, organizations, dates, etc.
Medical records:
– Diseases, medications, organisms, organs, etc.

35 Named Entity Types Common categories

36 Named Entity Examples For common categories:

37 Why NER?
Machine translation:
– Lady Gaga is playing a concert for the Bushes in Texas next September
– La señora Gaga es toca un concierto para los arbustos … (the person name "the Bushes" comes out as "the shrubs")
Numbers:
– 9/11: date vs. ratio
– 911: emergency phone number vs. simple number

38 Why NER?
Information extraction:
– MUC task: joint ventures/mergers
– Focus on company names, person names (CEO), valuations
Information retrieval:
– Named entities are the focus of retrieval
– In some data sets, 60+% of queries target NEs
Text-to-speech:
– 206-616-5728: phone numbers (vs. other digit strings) are read out differently, and differ by language

39 Challenges
Ambiguity:
– Washington: D.C.? the state? George? etc.
– Most digit strings
– cat (95 results): CAT(erpillar) stock ticker, Computerized Axial Tomography, Chloramphenicol Acetyl Transferase, small furry mammal

40 Context & Ambiguity

41 Evaluation: Precision, Recall, F-measure

42 Resources
Online:
– Name lists: baby names, who's who, newswire services, census.gov
– Gazetteers
– SEC listings of companies
Tools:
– LingPipe
– OpenNLP
– Stanford NLP toolkit

43 Approaches to NER
Rule/Regex-based:
– Match names/entities in lists
– Regex: e.g. \d\d/\d\d/\d\d: 11/23/11
– Currency: \$\d+\.\d+
Machine learning via sequence labeling:
– Better for names, organizations
Hybrid
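A quick Python illustration of the regex-based approach; the patterns follow the slide (note that the dollar sign must be escaped to match literally), and the sample sentence is made up:

```python
import re

date_re = re.compile(r"\b\d\d/\d\d/\d\d\b")   # dates like 11/23/11
money_re = re.compile(r"\$\d+\.\d+")          # currency like $3.50

text = "The deal announced on 11/23/11 was worth $3.50 per share."
print(date_re.findall(text))    # ['11/23/11']
print(money_re.findall(text))   # ['$3.50']
```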

44 NER AS SEQUENCE LABELING

45-50 NER as Classification Task
Instance: token
Labels:
– Position: B(eginning), I(nside), O(utside)
– NER types: PER, ORG, LOC, NUM
– Label: Type-Position, e.g. PER-B, PER-I, O, …
– How many tags? (|NER Types| x 2) + 1
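For example, with the four NER types above the tag set works out to nine tags; a two-line Python check:

```python
ner_types = ["PER", "ORG", "LOC", "NUM"]
tags = [f"{t}-{p}" for t in ner_types for p in ("B", "I")] + ["O"]
print(tags)       # ['PER-B', 'PER-I', 'ORG-B', ..., 'NUM-I', 'O']
print(len(tags))  # (|NER Types| x 2) + 1 = 9
```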

51-54 NER as Classification: Features
What information can we use for NER?
– Predictive tokens: e.g. MD, Rev, Inc., …
How general are these features?
– Language? Genre? Domain?

55-62 NER as Classification: Shape Features
Shape types:
– lower: e.g. e. e. cummings (all lower case)
– capitalized: e.g. Washington (first letter uppercase)
– all caps: e.g. WHO (all letters capitalized)
– mixed case: e.g. eBay (mixed upper and lower case)
– capitalized with period: e.g. H.
– ends with digit: e.g. A9
– contains hyphen: e.g. H-P
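A minimal Python sketch of a shape feature along these lines; the category names and regexes are illustrative assumptions, not any particular toolkit's definitions:

```python
import re

def word_shape(token):
    """Map a token to one of the coarse shape types listed above (assumed names)."""
    if re.fullmatch(r"[a-z]+\.?", token):
        return "lower"              # e.g. cummings
    if re.fullmatch(r"[A-Z]\.", token):
        return "cap-period"         # e.g. H.
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "capitalized"        # e.g. Washington
    if re.fullmatch(r"[A-Z]+", token):
        return "all-caps"           # e.g. WHO
    if token and token[-1].isdigit():
        return "ends-digit"         # e.g. A9
    if "-" in token:
        return "contains-hyphen"    # e.g. H-P
    if any(c.isupper() for c in token) and any(c.islower() for c in token):
        return "mixed-case"         # e.g. eBay
    return "other"

print([word_shape(t) for t in ["Washington", "WHO", "eBay", "H.", "A9", "H-P", "cummings"]])
```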

63 Example Instance Representation

64 Sequence Labeling Example
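The figure for this slide is not preserved in the transcript; a plausible reconstruction using the running example from slides 31-32 and the Type-Position labels above (the exact span boundaries, e.g. whether "the" belongs to the Bushes mention, are assumptions):

```python
# Hypothetical token/label pairs for the running example sentence.
# Slide 32 marks "next September" as a time expression, but the tag set
# above has no time type, so those tokens are left as O here.
labeled = [
    ("Lady", "PER-B"), ("Gaga", "PER-I"),
    ("is", "O"), ("playing", "O"), ("a", "O"), ("concert", "O"), ("for", "O"),
    ("the", "O"), ("Bushes", "PER-B"),
    ("in", "O"), ("Texas", "LOC-B"),
    ("next", "O"), ("September", "O"),
]
```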

65-67 Evaluation
System: output of automatic tagging
Gold standard: true tags
Precision: # correct chunks / # system chunks
Recall: # correct chunks / # gold chunks
F-measure: F1 = 2PR / (P + R) balances precision & recall
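A small Python helper computing these chunk-level measures; the example counts are made up:

```python
def prf(correct_chunks, system_chunks, gold_chunks):
    """Chunk-level precision, recall, and F1 as defined on the slide above."""
    p = correct_chunks / system_chunks if system_chunks else 0.0
    r = correct_chunks / gold_chunks if gold_chunks else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

# e.g. 8 correctly recovered entities out of 10 system chunks and 12 gold chunks
print(prf(8, 10, 12))   # (0.8, 0.666..., 0.727...)
```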

68-72 Evaluation
Standard measures:
– Precision, Recall, F-measure
– Computed on entity types (CoNLL evaluation)
Classifiers vs. evaluation measures:
– Classifiers optimize tag accuracy
– The most common tag is O – most tokens aren't NEs
– The evaluation measures focus on NEs
State of the art:
– Standard tasks: PER, LOC: 0.92; ORG: 0.84

73-76 Hybrid Approaches
Practical systems exploit lists, rules, learning, …
Multi-pass:
– Early passes: high precision, low recall
– Later passes: noisier sequence learning
Hybrid system (see the sketch below):
– High precision rules tag unambiguous mentions; use string matching to capture substring matches
– Tag items from domain-specific name lists
– Apply a sequence labeler
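A rough Python sketch of such a multi-pass hybrid pipeline; `rules`, `name_lists`, and `sequence_labeler` are hypothetical placeholders rather than any particular system's API:

```python
def hybrid_ner(tokens, rules, name_lists, sequence_labeler):
    """Multi-pass hybrid NER sketch: rules, then name lists, then a learned labeler."""
    tags = ["O"] * len(tokens)

    # Pass 1: high-precision rules tag unambiguous mentions (in place).
    for rule in rules:
        rule(tokens, tags)

    # Pass 2: tag remaining tokens found in domain-specific name lists.
    for i, tok in enumerate(tokens):
        for ne_type, names in name_lists.items():
            if tags[i] == "O" and tok in names:
                tags[i] = f"{ne_type}-B"

    # Pass 3: a (noisier) sequence labeler decides everything still untagged.
    return sequence_labeler(tokens, tags)
```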

