Sequence Classification: Chunking & NER. Shallow Processing Techniques for NLP (Ling570), November 23, 2011.
Roadmap Named Entity Recognition Chunking HW #9
Named Entity Recognition
Roadmap Named Entity Recognition Definition Motivation Challenges Common Approach
Named Entity Recognition. Task: identify named entities in (typically) unstructured text. Typical entities: person names, locations, organizations, dates, times.
Example: "Microsoft released Windows Vista in 2007." Entities are often application/domain specific. Business intelligence: products, companies, features. Biomedical: genes, proteins, diseases, drugs, ... (Example due to F. Xia.)
Named Entity Types Common categories
Named Entity Examples For common categories:
Why NER? Machine translation: person names are typically not translated, though possibly transliterated (Waldheim). Numbers: 9/11 can be a date vs. a ratio; 911 can be the emergency phone number or a simple number.
Why NER? Information extraction: the MUC task on joint ventures/mergers focuses on company names, person names (CEO), and valuations. Information retrieval: named entities are the focus of retrieval; in some data sets, 60+% of queries target NEs. Text-to-speech: phone numbers are read differently from other digit strings, and differ by language.
Challenges: Ambiguity. "Washington chose ...": D.C., the state, George, etc. Most digit strings are ambiguous. "cat" (95 results): CAT(erpillar) stock ticker, Computerized Axial Tomography, Chloramphenicol Acetyl Transferase, the small furry mammal, ...
Context & Ambiguity
Evaluation: Precision, Recall, F-measure.
Resources. Online: name lists (baby names, who's who, newswire services, census.gov), gazetteers, SEC listings of companies. Tools: LingPipe, OpenNLP, Stanford NLP toolkit.
Approaches to NER. Rule/Regex-based: match names/entities in lists; regexes, e.g. \d\d/\d\d/\d\d matches 11/23/11, and currency: \$\d+\.\d+. Machine learning via sequence labeling: better for names, organizations. Hybrid.
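A minimal sketch of the regex-based approach in Python (illustrative only; the sample sentence and exact patterns are assumptions, not part of the lecture):

import re

# Illustrative patterns for the two regex examples above.
date_re = re.compile(r'\b\d\d/\d\d/\d\d\b')      # e.g. 11/23/11
currency_re = re.compile(r'\$\d+\.\d\d')         # currency; note the escaped $

text = "Microsoft released Windows Vista on 01/30/07 for $239.00."
print(date_re.findall(text))        # ['01/30/07']
print(currency_re.findall(text))    # ['$239.00']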
NER as Sequence Labeling
NER as Classification Task. Instance: token. Labels: position: B(eginning), I(nside), O(utside); NER types: PER, ORG, LOC, NUM; label = Type-Position, e.g. PER-B, PER-I, O, ... How many tags? (|NER types| x 2) + 1.
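A small sketch (illustrative, not from the lecture) that enumerates the label set and confirms the count (|NER types| x 2) + 1:

ner_types = ["PER", "ORG", "LOC", "NUM"]
labels = [f"{t}-{pos}" for t in ner_types for pos in ("B", "I")] + ["O"]
print(labels)        # ['PER-B', 'PER-I', 'ORG-B', 'ORG-I', 'LOC-B', 'LOC-I', 'NUM-B', 'NUM-I', 'O']
print(len(labels))   # 9 = (4 x 2) + 1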
NER as Classification: Features. What information can we use for NER? Predictive tokens: e.g. MD, Rev, Inc, ... How general are these features? Language? Genre? Domain?
NER as Classification: Shape Features. Shape types: lower (e.g. cumming): all lower case; capitalized (e.g. Washington): first letter uppercase; all caps (e.g. WHO): all letters capitalized; mixed case (e.g. eBay); capitalized with period (e.g. H.); ends with digit (e.g. A9); contains hyphen (e.g. H-P).
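A rough sketch of a shape-feature function (illustrative; the class names and first-match ordering are assumptions, not the lecture's code):

import re

def word_shape(token):
    if '-' in token:                          return 'contains-hyphen'   # H-P
    if re.fullmatch(r'.*\d', token):          return 'ends-with-digit'   # A9
    if re.fullmatch(r'[A-Z]\.', token):       return 'cap-period'        # H.
    if token.isupper():                       return 'all-caps'          # WHO
    if token.islower():                       return 'lower'             # cumming
    if token[0].isupper() and token[1:].islower():
        return 'capitalized'                                             # Washington
    return 'mixed-case'                                                  # eBay

for w in ['cumming', 'Washington', 'WHO', 'eBay', 'H.', 'A9', 'H-P']:
    print(w, word_shape(w))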
Example Instance Representation
Sequence Labeling Example
Evaluation. System: output of automatic tagging. Gold standard: true tags. Precision: # correct chunks / # system chunks. Recall: # correct chunks / # gold chunks. F-measure: F1 balances precision & recall.
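A small sketch (illustrative only) of the chunk-level measures just defined, with chunks represented as (type, start, end) spans:

def precision_recall_f1(system_chunks, gold_chunks):
    """Chunks are sets of (type, start, end) spans."""
    correct = len(system_chunks & gold_chunks)
    p = correct / len(system_chunks) if system_chunks else 0.0
    r = correct / len(gold_chunks) if gold_chunks else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0   # F1 balances precision & recall
    return p, r, f1

gold = {('PER', 0, 2), ('LOC', 5, 6)}
system = {('PER', 0, 2), ('ORG', 5, 6)}
print(precision_recall_f1(system, gold))   # (0.5, 0.5, 0.5)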
Evaluation. Standard measures: precision, recall, F-measure, computed on entity types (CoNLL evaluation). Classifiers vs. evaluation measures: classifiers optimize tag accuracy. Most common tag? O, since most tokens aren't NEs; the evaluation measures focus on the NEs. State of the art on standard tasks: PER, LOC: 0.92; ORG: 0.84.
Hybrid Approaches. Practical systems exploit lists, rules, learning, ... Multi-pass: early passes are high precision, low recall; later passes use noisier sequence learning. Hybrid system: high-precision rules tag unambiguous mentions; string matching captures substring matches; items from domain-specific name lists are tagged; finally, a sequence labeler is applied.
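A toy, runnable sketch of the multi-pass idea (illustrative only; the gazetteer, the date rule, and the crude capitalization heuristic standing in for the sequence labeler are all assumptions):

import re

GAZETTEER = {('New', 'York'): 'LOC', ('Microsoft',): 'ORG'}

def rule_pass(tokens, labels):
    # Pass 1: high precision, low recall -- exact gazetteer matches and an unambiguous date regex.
    for entry, etype in GAZETTEER.items():
        n = len(entry)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == entry and all(l == 'O' for l in labels[i:i + n]):
                labels[i:i + n] = [etype + '-B'] + [etype + '-I'] * (n - 1)
    for i, tok in enumerate(tokens):
        if labels[i] == 'O' and re.fullmatch(r'\d\d/\d\d/\d\d', tok):
            labels[i] = 'NUM-B'
    return labels

def fallback_pass(tokens, labels):
    # Later pass: stand-in for the noisier sequence labeler (here just a capitalization heuristic).
    for i, tok in enumerate(tokens):
        if labels[i] == 'O' and tok[0].isupper() and i > 0:
            labels[i] = 'PER-B'
    return labels

tokens = 'Bill flew to New York from Redmond on 11/23/11 .'.split()
labels = fallback_pass(tokens, rule_pass(tokens, ['O'] * len(tokens)))
print(list(zip(tokens, labels)))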
Chunking
Roadmap Chunking Definition Motivation Challenges Approach
What is Chunking? A form of partial (shallow) parsing: extracts major syntactic units, but not full parse trees. Task: identify and classify flat, non-overlapping segments of a sentence; basic non-recursive phrases corresponding to major POS categories. May ignore some categories, e.g. base NP chunking. Creates a simple bracketing: [NP The morning flight] [PP from] [NP Denver] [VP has arrived]; with base NP chunking only: [NP The morning flight] from [NP Denver] has arrived.
Why Chunking? Used when a full parse is unnecessary, or infeasible or impossible (when?). Extraction of subcategorization frames: identify verb arguments, e.g. VP -> NP, VP -> NP NP, VP -> NP to NP. Information extraction: who did what to whom. Summarization: keep base information, remove modifiers. Information retrieval: restrict indexing to base NPs.
Processing Example. Tokenization: The morning flight from Denver has arrived. POS tagging: DT JJ N PREP NNP AUX V. Chunking: NP PP NP VP. Extraction: NP NP VP, etc.
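A sketch of the intermediate representations at each stage, using the example sentence above (the span encoding is an assumption of the sketch):

tokens = "The morning flight from Denver has arrived".split()
pos    = ["DT", "JJ", "N", "PREP", "NNP", "AUX", "V"]
# Chunking output as (label, start, end) spans over the token list:
chunks = [("NP", 0, 3), ("PP", 3, 4), ("NP", 4, 5), ("VP", 5, 7)]
for label, start, end in chunks:
    print(label, tokens[start:end])
# NP ['The', 'morning', 'flight'] | PP ['from'] | NP ['Denver'] | VP ['has', 'arrived']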
Approaches. Finite-state approaches: grammatical rules in FSTs, cascaded to produce more complex structure. Machine learning: similar to POS tagging.
Finite-State Rule-Based Chunking. Hand-crafted rules model phrases; typically application-specific. Left-to-right longest match (Abney 1996): start at the beginning of the sentence and find the longest matching rule; a greedy approach, not guaranteed optimal.
Finite-State Rule-Based Chunking. Chunk rules cannot contain recursion: NP -> Det Nominal is okay; Nominal -> Nominal PP is not okay. Examples: NP -> (Det) Noun* Noun; NP -> Proper-Noun; VP -> Verb; VP -> Aux Verb. Consider: "Time flies like an arrow". Is this what we want?
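A sketch of the example rules written for NLTK's RegexpParser (the use of NLTK and of Penn Treebank tags is an assumption; the lecture does not prescribe a toolkit):

import nltk

grammar = r"""
  NP: {<DT>?<NN.*>*<NN.*>}   # NP -> (Det) Noun* Noun
      {<NNP>}                # NP -> Proper-Noun
  VP: {<MD|AUX>?<VB.*>}      # VP -> (Aux) Verb
"""
chunker = nltk.RegexpParser(grammar)

# "Time flies like an arrow": if the tagger calls "flies" a noun, the greedy
# NP rule swallows "Time flies" as one chunk -- probably not what we want.
tagged = [('Time', 'NN'), ('flies', 'NNS'), ('like', 'IN'),
          ('an', 'DT'), ('arrow', 'NN')]
print(chunker.parse(tagged))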
Cascading FSTs. Richer partial parsing: pass the output of one FST to the next. Approach: first stage: base phrase chunking; next stage: larger constituents (e.g. PPs, VPs); highest stage: sentences.
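A toy two-stage cascade over a space-separated POS tag string (illustrative only; the rules and tag names follow the earlier processing example, not any particular FST toolkit):

import re

def cascade(tags):
    # Stage 1: base phrase chunking over POS tags.
    s = re.sub(r'\bDT( JJ)* N\b', 'NP', tags)   # determiner + adjectives + noun
    s = re.sub(r'\bNNP\b', 'NP', s)             # proper noun
    s = re.sub(r'\b(AUX )?V\b', 'VP', s)        # (auxiliary +) verb
    # Stage 2: larger constituents built from Stage 1 chunks.
    s = re.sub(r'\bPREP NP\b', 'PP', s)         # preposition + NP -> PP
    return s

print(cascade('DT JJ N PREP NNP AUX V'))
# Stage 1 yields 'NP PREP NP VP'; Stage 2 folds the preposition and its NP into 'NP PP VP'.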
Example
Chunking by Classification. Model chunking as a task similar to POS tagging. Instance: tokens. Labels simultaneously encode segmentation & identification: IOB (or BIO) tagging (also BIOE or BIOSE). Segment: B(eginning), I(nternal), O(utside). Identity: phrase category: NP, VP, PP, etc. Example: "The morning flight from Denver has arrived" -> NP-B NP-I NP-I PP-B NP-B VP-B VP-I; with base NP chunking only: NP-B NP-I NP-I O NP-B O O.
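A sketch (illustrative, not the lecture's code) of turning a Type-Position label sequence back into (category, start, end) chunk spans, the form used for chunk-level evaluation:

def chunks_from_labels(labels):
    """Convert Type-Position labels (e.g. 'NP-B', 'NP-I', 'O') into (type, start, end) spans."""
    chunks, cur_type, cur_start = [], None, None
    for i, lab in enumerate(labels):
        if lab == 'O' or lab.endswith('-B'):
            if cur_type is not None:
                chunks.append((cur_type, cur_start, i))   # close the open chunk
            cur_type, cur_start = (None, None) if lab == 'O' else (lab[:-2], i)
        # a '-I' label simply continues the current chunk (assumes well-formed input)
    if cur_type is not None:
        chunks.append((cur_type, cur_start, len(labels)))
    return chunks

labels = ['NP-B', 'NP-I', 'NP-I', 'PP-B', 'NP-B', 'VP-B', 'VP-I']
print(chunks_from_labels(labels))
# [('NP', 0, 3), ('PP', 3, 4), ('NP', 4, 5), ('VP', 5, 7)]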
Features for Chunking. What are good features? Chunk tags of the 2 preceding words; words for the 2 preceding, current, and 2 following positions; parts of speech for the 2 preceding, current, and 2 following positions. The training vector includes those features plus the true label.
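A sketch of extracting that feature window for one token position (the feature-name scheme and padding token are assumptions of the sketch):

def chunk_features(words, pos_tags, prev_chunk_tags, i):
    feats = {}
    for offset in (-2, -1, 0, 1, 2):
        j = i + offset
        w = words[j] if 0 <= j < len(words) else '<PAD>'
        t = pos_tags[j] if 0 <= j < len(pos_tags) else '<PAD>'
        feats[f'w{offset:+d}={w}'] = 1
        feats[f't{offset:+d}={t}'] = 1
    for offset in (-2, -1):                        # chunk tags of the 2 preceding words
        j = i + offset
        c = prev_chunk_tags[j] if j >= 0 else '<PAD>'
        feats[f'c{offset:+d}={c}'] = 1
    return feats

words = 'The morning flight from Denver has arrived'.split()
pos   = ['DT', 'JJ', 'N', 'PREP', 'NNP', 'AUX', 'V']
print(sorted(chunk_features(words, pos, ['NP-B', 'NP-I'], 2)))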
Chunking as Classification Example
Evaluation. System: output of automatic tagging. Gold standard: true tags, typically extracted from a parsed treebank. Precision: # correct chunks / # system chunks. Recall: # correct chunks / # gold chunks. F-measure: F1 balances precision & recall.
State of the Art. Base NP chunking: 0.96. Complex phrases: learning (most learners achieve similar results) vs. rule-based. Limiting factors: POS tagging accuracy; inconsistent labeling (parse tree extraction); conjunctions, e.g. "Late departures and arrivals are common in winter" vs. "Late departures and cancellations are common in winter".
HW #9
Building a MaxEnt POS Tagger. Q1: Build feature vector representations for POS tagging in SVMlight format. Usage: maxent_features.* training_file testing_file rare_wd_threshold rare_feat_threshold outdir. training_file, testing_file: like HW#7, i.e. w1/t1 w2/t2 ... wn/tn. Filter rare words and infrequent features. Store vectors & intermediate representations in outdir.
Feature Representations. Features: Ratnaparkhi (1996), Table 1 (duplicated in the MaxEnt slides). Character issues: replace "," with "comma" and ":" with "colon", since the Mallet and SVMlight formats use these characters as delimiters.
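A small escaping sketch (illustrative; not the assignment's required code):

def escape_feature(name):
    # Mallet and SVMlight treat ',' and ':' as delimiters, so rewrite them.
    return name.replace(',', 'comma').replace(':', 'colon')

print(escape_feature('prevW=,'))   # prevW=comma
print(escape_feature('curW=:'))    # curW=colon

# An SVMlight instance line has the form "<label> <featIndex>:<value> ...",
# e.g.  3 101:1 205:1 9876:1   (feature indices in increasing order)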
Q2: Experiments. Run MaxEnt classification using your training and test files. Compare the effects of different thresholds on feature count, accuracy, and runtime. Note: big files. This assignment will produce even larger sets of results than HW#8; please gzip your tar files. If the DropBox won't accept the files, you can store them on patas; just let Sanghoun know where to find them.