Ling 570, Day 16: Sequence Modeling and Named Entity Recognition


1 Ling 570 Day 16: Sequence Modeling and Named Entity Recognition

2 Sequence Labeling
Goal: Find most probable labeling of a sequence
Many sequence labeling tasks:
– POS tagging
– Word segmentation
– Named entity tagging
– Story/spoken sentence segmentation
– Pitch accent detection
– Dialog act tagging

3 HMM search space
[Figure: HMM trellis for "time flies like an arrow" over the tags N, V, P, DT, with the emission probability table P(word | tag), the tag-to-tag transition probabilities, and the initial tag probabilities.]

4 [Figure: Viterbi chart for "time flies like an arrow" over the tags N, V, P, DT, with each cell holding the best path probability and a back-pointer to the previous tag.]
(1) Find the max in the last column; (2) follow the back-pointer chains to recover that best sequence.

5 Viterbi
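The formula on this slide did not survive extraction; a standard statement of the Viterbi recurrence, consistent with the chart on slide 4 (v holds path probabilities, bp the back-pointers), is:

```latex
% pi_j: initial tag probability, a_{ij}: tag transition probability,
% b_j(o_t): emission probability of word o_t given tag j
v_1(j) = \pi_j \, b_j(o_1)
v_t(j) = \max_{1 \le i \le N} \; v_{t-1}(i)\, a_{ij}\, b_j(o_t), \qquad t > 1
% back-pointer followed in step (2) on slide 4
bp_t(j) = \arg\max_{1 \le i \le N} \; v_{t-1}(i)\, a_{ij}\, b_j(o_t)
```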

6 Decoding
Goal: Identify the highest probability tag sequence
Issues:
– Features include tags from previous words, which are not immediately available
– Uses tag history: just knowing the highest probability preceding tag is insufficient

7 Decoding
Approach: Retain multiple candidate tag sequences
– Essentially, search through the tagging choices
Which sequences?
– We can't look at all of them – exponentially many!
– Instead, use the top K highest probability sequences

8-13 Breadth-First Search: time flies like an arrow

14 Breadth-first Search Is breadth-first search efficient?

15 Breadth-first Search Is it efficient? – No, it tries everything

16 Beam Search
Intuition:
– Breadth-first search explores all paths
– Lots of paths are (pretty obviously) bad
– Why explore bad paths?
– Restrict to (apparently best) paths
Approach:
– Perform breadth-first search, but
– Retain only k 'best' paths thus far
– k: beam width

17-21 Beam Search, k=3: time flies like an arrow

22 Beam Search
W = {w1, w2, …, wn}: test sentence
s_ij: j-th highest prob. sequence up to & including word w_i
Generate tags for w_1, keep the top k, set s_1j accordingly
for i = 2 to n:
– Extension: add tags for w_i to each s_(i-1)j
– Beam selection: sort sequences by probability; keep only the top k sequences
Return the highest probability sequence s_n1
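A minimal Python sketch of this beam-search decoder; `tagset` and the `score(prev_tags, i, tag)` callback are hypothetical placeholders standing in for the tagger's tag inventory and its log-probability scoring model (e.g. a MaxEnt classifier):

```python
def beam_search(words, tagset, score, k=3):
    """Beam-search decoding sketch for a sequence labeler.

    score(prev_tags, i, tag) is assumed to return the log-probability of
    assigning `tag` to words[i] given the tags chosen so far.
    """
    # Initialize the beam with the k best tags for the first word.
    beam = [([t], score((), 0, t)) for t in tagset]
    beam = sorted(beam, key=lambda h: h[1], reverse=True)[:k]

    for i in range(1, len(words)):
        # Extension: grow every kept sequence by every possible tag.
        extended = [(tags + [t], logp + score(tuple(tags), i, t))
                    for tags, logp in beam
                    for t in tagset]
        # Beam selection: sort by score and keep only the top k sequences.
        beam = sorted(extended, key=lambda h: h[1], reverse=True)[:k]

    return beam[0][0]   # highest probability sequence
```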

23 POS Tagging
Overall accuracy: 96.3+%
Unseen word accuracy: 86.2%
Comparable to HMM tagging accuracy or TBL
Provides:
– Probabilistic framework
– Better able to model different info sources
Topline accuracy 96-97%
– Consistency issues

24 Beam Search
Beam search decoding:
– Variant of breadth-first search
– At each layer, keep only the top k sequences
Advantages:
– Efficient in practice: a beam of 3-5 is near optimal
– Empirically, the beam covers 5-10% of the search space, i.e. prunes 90-95%
– Simple to implement: just extensions + sorting, no dynamic programming
– Running time: O(kT) [vs. O(N^T)]
Disadvantage: not guaranteed optimal (or complete)

25 Viterbi Decoding
Viterbi search:
– Exploits dynamic programming, memoization
– Requires only a small history window
– Efficient search: O(N^2 T)
Advantage:
– Exact: the optimal solution is returned
Disadvantage:
– Limited window of context
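For comparison with beam search, a minimal Python sketch of Viterbi decoding for an HMM; `start_p`, `trans_p`, and `emit_p` are assumed to be dictionaries of initial, transition, and emission probabilities like the tables sketched on slides 3-4:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi decoding sketch for an HMM over tokens `obs` and tag set `states`."""
    T = len(obs)
    v = [{} for _ in range(T)]    # v[t][s]: best probability of any path ending in tag s at t
    bp = [{} for _ in range(T)]   # back-pointers

    for s in states:              # initialization
        v[0][s] = start_p.get(s, 0.0) * emit_p[s].get(obs[0], 0.0)

    for t in range(1, T):         # recursion
        for s in states:
            best_prev, best_score = None, -1.0
            for r in states:
                score = v[t-1][r] * trans_p[r].get(s, 0.0) * emit_p[s].get(obs[t], 0.0)
                if score > best_score:
                    best_prev, best_score = r, score
            v[t][s], bp[t][s] = best_score, best_prev

    # (1) find the max in the last column, (2) follow the back-pointer chain
    last = max(states, key=lambda s: v[T-1][s])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(bp[t][path[-1]])
    return list(reversed(path))
```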

26 Beam vs. Viterbi
– Dynamic programming vs. heuristic search
– Guaranteed optimal vs. no guarantee
– Different context window

27 MaxEnt POS Tagging
Part-of-speech tagging by classification:
– Feature design: word and tag context features; orthographic features for rare words
Sequence classification problems:
– Tag features depend on prior classification
Beam search decoding:
– Efficient, but inexact
– Near optimal in practice

28 NAMED ENTITY RECOGNITION

29 Roadmap
Named Entity Recognition:
– Definition
– Motivation
– Challenges
– Common Approach

30 Named Entity Recognition
Task: Identify named entities in (typically) unstructured text
Typical entities:
– Person names
– Locations
– Organizations
– Dates
– Times

31 Example Lady Gaga is playing a concert for the Bushes in Texas next September

32 Example (annotated): [Lady Gaga]person is playing a concert for [the Bushes]person in [Texas]location [next September]time

33 Example from financial news: [Ray Dalio]person's [Bridgewater Associates]organization is an extremely large and extremely successful hedge fund. Based in [Westport]location and known for its strong -- some would say cultish -- culture, it has grown to well over [$100 billion]value in assets under management with little negative impact on its returns.

34 Entity types may differ by application
News:
– People, countries, organizations, dates, etc.
Medical records:
– Diseases, medications, organisms, organs, etc.

35 Named Entity Types Common categories

36 Named Entity Examples For common categories:

37 Why NER?
Machine translation:
– Lady Gaga is playing a concert for the Bushes in Texas next September
– La señora Gaga es toca un concierto para los arbustos … (the person name "the Bushes" comes out as "the shrubs")
Numbers:
– 9/11: date vs. ratio
– 911: emergency phone number vs. simple number

38 Why NER?
Information extraction:
– MUC task: joint ventures/mergers
– Focus on company names, person names (CEO), valuations
Information retrieval:
– Named entities are the focus of retrieval
– In some data sets, 60+% of queries target NEs
Text-to-speech:
– 206-616-5728: phone numbers (vs. other digit strings) are read out differently, and differ by language

39 Challenges
Ambiguity:
– Washington: D.C.? the state? George? etc.
– Most digit strings
– cat (95 results): CAT(erpillar) stock ticker, Computerized Axial Tomography, Chloramphenicol Acetyl Transferase, small furry mammal

40 Context & Ambiguity

41 Evaluation: Precision, Recall, F-measure

42 Resources
Online:
– Name lists: baby names, who's who, newswire services, census.gov
– Gazetteers
– SEC listings of companies
Tools:
– LingPipe
– OpenNLP
– Stanford NLP toolkit

43 Approaches to NER
Rule/Regex-based:
– Match names/entities in lists
– Regex: e.g. \d\d/\d\d/\d\d: 11/23/11
– Currency: \$\d+\.\d+
Machine learning via sequence labeling:
– Better for names, organizations
Hybrid
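A quick Python illustration of the regex-based approach; the patterns follow the slide (note that the dollar sign must be escaped to match literally), and the sample sentence is made up:

```python
import re

date_re = re.compile(r"\b\d\d/\d\d/\d\d\b")   # dates like 11/23/11
money_re = re.compile(r"\$\d+\.\d+")          # currency like $3.50

text = "The deal announced on 11/23/11 was worth $3.50 per share."
print(date_re.findall(text))    # ['11/23/11']
print(money_re.findall(text))   # ['$3.50']
```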

44 NER AS SEQUENCE LABELING

45-50 NER as Classification Task
Instance: token
Labels:
– Position: B(eginning), I(nside), O(utside)
– NER types: PER, ORG, LOC, NUM
– Label: Type-Position, e.g. PER-B, PER-I, O, …
– How many tags? (|NER Types| x 2) + 1
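For example, with the four NER types above the tag set works out to nine tags; a two-line Python check:

```python
ner_types = ["PER", "ORG", "LOC", "NUM"]
tags = [f"{t}-{p}" for t in ner_types for p in ("B", "I")] + ["O"]
print(tags)       # ['PER-B', 'PER-I', 'ORG-B', ..., 'NUM-I', 'O']
print(len(tags))  # (|NER Types| x 2) + 1 = 9
```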

51-54 NER as Classification: Features
What information can we use for NER?
– Predictive tokens: e.g. MD, Rev, Inc., …
How general are these features?
– Language? Genre? Domain?

55-62 NER as Classification: Shape Features
Shape types:
– lower: e.g. e. e. cummings (all lower case)
– capitalized: e.g. Washington (first letter uppercase)
– all caps: e.g. WHO (all letters capitalized)
– mixed case: e.g. eBay (mixed upper and lower case)
– capitalized with period: e.g. H.
– ends with digit: e.g. A9
– contains hyphen: e.g. H-P
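A minimal Python sketch of a shape feature along these lines; the category names and regexes are illustrative assumptions, not any particular toolkit's definitions:

```python
import re

def word_shape(token):
    """Map a token to one of the coarse shape types listed above (assumed names)."""
    if re.fullmatch(r"[a-z]+\.?", token):
        return "lower"              # e.g. cummings
    if re.fullmatch(r"[A-Z]\.", token):
        return "cap-period"         # e.g. H.
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "capitalized"        # e.g. Washington
    if re.fullmatch(r"[A-Z]+", token):
        return "all-caps"           # e.g. WHO
    if token and token[-1].isdigit():
        return "ends-digit"         # e.g. A9
    if "-" in token:
        return "contains-hyphen"    # e.g. H-P
    if any(c.isupper() for c in token) and any(c.islower() for c in token):
        return "mixed-case"         # e.g. eBay
    return "other"

print([word_shape(t) for t in ["Washington", "WHO", "eBay", "H.", "A9", "H-P", "cummings"]])
```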

63 Example Instance Representation

64 Sequence Labeling Example
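The figure for this slide is not preserved in the transcript; a plausible reconstruction using the running example from slides 31-32 and the Type-Position labels above (the exact span boundaries, e.g. whether "the" belongs to the Bushes mention, are assumptions):

```python
# Hypothetical token/label pairs for the running example sentence.
# Slide 32 marks "next September" as a time expression, but the tag set
# above has no time type, so those tokens are left as O here.
labeled = [
    ("Lady", "PER-B"), ("Gaga", "PER-I"),
    ("is", "O"), ("playing", "O"), ("a", "O"), ("concert", "O"), ("for", "O"),
    ("the", "O"), ("Bushes", "PER-B"),
    ("in", "O"), ("Texas", "LOC-B"),
    ("next", "O"), ("September", "O"),
]
```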

65-67 Evaluation
System: output of automatic tagging
Gold standard: true tags
Precision: # correct chunks / # system chunks
Recall: # correct chunks / # gold chunks
F-measure: F1 = 2PR / (P + R) balances precision & recall
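A small Python helper computing these chunk-level measures; the example counts are made up:

```python
def prf(correct_chunks, system_chunks, gold_chunks):
    """Chunk-level precision, recall, and F1 as defined on the slide above."""
    p = correct_chunks / system_chunks if system_chunks else 0.0
    r = correct_chunks / gold_chunks if gold_chunks else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

# e.g. 8 correctly recovered entities out of 10 system chunks and 12 gold chunks
print(prf(8, 10, 12))   # (0.8, 0.666..., 0.727...)
```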

68-72 Evaluation
Standard measures:
– Precision, Recall, F-measure
– Computed on entity types (CoNLL evaluation)
Classifiers vs. evaluation measures:
– Classifiers optimize tag accuracy
– The most common tag is O – most tokens aren't NEs
– The evaluation measures focus on NEs
State of the art:
– Standard tasks: PER, LOC: 0.92; ORG: 0.84

73-76 Hybrid Approaches
Practical systems exploit lists, rules, learning, …
Multi-pass:
– Early passes: high precision, low recall
– Later passes: noisier sequence learning
Hybrid system (see the sketch below):
– High precision rules tag unambiguous mentions; use string matching to capture substring matches
– Tag items from domain-specific name lists
– Apply a sequence labeler
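A rough Python sketch of such a multi-pass hybrid pipeline; `rules`, `name_lists`, and `sequence_labeler` are hypothetical placeholders rather than any particular system's API:

```python
def hybrid_ner(tokens, rules, name_lists, sequence_labeler):
    """Multi-pass hybrid NER sketch: rules, then name lists, then a learned labeler."""
    tags = ["O"] * len(tokens)

    # Pass 1: high-precision rules tag unambiguous mentions (in place).
    for rule in rules:
        rule(tokens, tags)

    # Pass 2: tag remaining tokens found in domain-specific name lists.
    for i, tok in enumerate(tokens):
        for ne_type, names in name_lists.items():
            if tags[i] == "O" and tok in names:
                tags[i] = f"{ne_type}-B"

    # Pass 3: a (noisier) sequence labeler decides everything still untagged.
    return sequence_labeler(tokens, tags)
```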

