Sequence Classification: Chunking
Shallow Processing Techniques for NLP
Ling570, November 28, 2011
Chunking
Roadmap
Chunking:
  Definition
  Motivation
  Challenges
  Approach
What is Chunking?
Form of partial (shallow) parsing
  Extracts major syntactic units, but not full parse trees
Task: identify and classify flat, non-overlapping segments of a sentence
  Basic non-recursive phrases
  Correspond to major POS categories
  May ignore some categories, e.g. base NP chunking
Creates a simple bracketing:
  [NP The morning flight] [PP from] [NP Denver] [VP has arrived]
  Base NP chunking only: [NP The morning flight] from [NP Denver] has arrived
Why Chunking?
Used when a full parse is unnecessary
  Or infeasible or impossible (when?)
Extraction of subcategorization frames
  Identify verb arguments, e.g. VP -> NP; VP -> NP NP; VP -> NP to NP
Information extraction: who did what to whom
Summarization: keep base information, remove modifiers
Information retrieval: restrict indexing to base NPs
Processing Example
Tokenization: The morning flight from Denver has arrived
POS tagging: DT JJ N PREP NNP AUX V
Chunking: [NP The morning flight] [PP from] [NP Denver] [VP has arrived]
Extraction: NP NP VP, etc.
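A minimal sketch of this pipeline in Python, assuming NLTK's RegexpParser as the chunker. The tokens and POS tags are supplied by hand to match the slide's tag set (a real pipeline would run a tokenizer and tagger first), and the grammar rules are illustrative guesses at suitable base-phrase patterns.

```python
# Hedged sketch: hand-supplied tokens and tags (matching the slide's tag set),
# chunked with an illustrative NLTK RegexpParser grammar.
import nltk

tagged = [("The", "DT"), ("morning", "JJ"), ("flight", "N"),
          ("from", "PREP"), ("Denver", "NNP"),
          ("has", "AUX"), ("arrived", "V")]

grammar = r"""
  NP: {<DT>?<JJ>*<N|NNP>+}   # base noun phrase
  PP: {<PREP>}               # bare preposition chunk
  VP: {<AUX>?<V>+}           # verb group
"""
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)
print(tree)
# roughly: (S (NP The/DT morning/JJ flight/N) (PP from/PREP)
#             (NP Denver/NNP) (VP has/AUX arrived/V))

# Extraction step: pull out just the NP chunks
nps = [" ".join(word for word, tag in subtree.leaves())
       for subtree in tree.subtrees() if subtree.label() == "NP"]
print(nps)   # ['The morning flight', 'Denver']
```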
Approaches
Finite-state approaches
  Grammatical rules in FSTs
  Cascade to produce more complex structure
Machine learning
  Similar to POS tagging
Finite-State Rule-Based Chunking
Hand-crafted rules model phrases
  Typically application-specific
Left-to-right longest match (Abney 1996)
  Start at the beginning of the sentence
  Find the longest matching rule
  Greedy approach, not guaranteed optimal
Chunk rules cannot contain recursion:
  NP -> Det Nominal: okay
  Nominal -> Nominal PP: not okay
Examples:
  NP -> (Det) Noun* Noun
  NP -> Proper-Noun
  VP -> Verb
  VP -> Aux Verb
Consider: Time flies like an arrow
  Is this what we want?
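As a concrete illustration of left-to-right longest match over these rules, here is a minimal pure-Python sketch. It encodes the example rules as regular expressions over a space-delimited tag sequence; the tag names and the driver loop are illustrative assumptions, not Abney's actual system.

```python
import re

# Example rules from the slide, written as regexes over a tag sequence.
RULES = [
    ("NP", r"(Det )?(Noun )*Noun "),   # NP -> (Det) Noun* Noun
    ("NP", r"Proper-Noun "),           # NP -> Proper-Noun
    ("VP", r"(Aux )?Verb "),           # VP -> Verb | Aux Verb
]

def chunk(tags):
    """Scan left to right; at each position take the longest rule match,
    otherwise leave the tag outside any chunk (label 'O')."""
    out, i = [], 0
    while i < len(tags):
        rest = " ".join(tags[i:]) + " "
        best_label, best_n = None, 0
        for label, pattern in RULES:
            m = re.match(pattern, rest)
            if m and m.group(0).count(" ") > best_n:
                best_label, best_n = label, m.group(0).count(" ")
        if best_label is None:          # no rule matches at this position
            best_label, best_n = "O", 1
        out.append((best_label, tags[i:i + best_n]))
        i += best_n
    return out

# "The morning flight from Denver has arrived" in the slide's categories
print(chunk(["Det", "Noun", "Noun", "Prep", "Proper-Noun", "Aux", "Verb"]))
# [('NP', ['Det', 'Noun', 'Noun']), ('O', ['Prep']),
#  ('NP', ['Proper-Noun']), ('VP', ['Aux', 'Verb'])]

# With a noun-noun tagging of "Time flies like an arrow", the same greedy
# NP rule groups [NP Time flies], which is exactly the question the slide raises.
```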
Cascading FSTs
Richer partial parsing
Pass the output of one FST to the next
Approach:
  First stage: base phrase chunking
  Next stage: larger constituents (e.g. PPs, VPs)
  Highest stage: sentences
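One way to approximate such a cascade, again assuming NLTK's RegexpParser and the hand-tagged example sentence from the earlier sketch: a multi-stage grammar whose later stages group the chunks built by earlier stages into larger constituents.

```python
import nltk

tagged = [("The", "DT"), ("morning", "JJ"), ("flight", "N"), ("from", "PREP"),
          ("Denver", "NNP"), ("has", "AUX"), ("arrived", "V")]

cascade = r"""
  NP:     {<DT>?<JJ>*<N|NNP>+}   # stage 1: base noun phrases
  PP:     {<PREP><NP>}           # stage 2: preposition + NP
  VP:     {<AUX>?<V>+}           # stage 3: verb groups
  CLAUSE: {<NP><PP>*<VP>}        # stage 4: a full clause
"""
chunker = nltk.RegexpParser(cascade)
print(chunker.parse(tagged))
# roughly: (S (CLAUSE (NP The/DT morning/JJ flight/N)
#                     (PP from/PREP (NP Denver/NNP))
#                     (VP has/AUX arrived/V)))
```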
Chunking by Classification
Model chunking as a task similar to POS tagging
  Instances: tokens
  Labels: simultaneously encode segmentation and identification
IOB (or BIO) tagging (also BIOE or BIOES)
  Segment: B(eginning), I(nternal), O(utside)
  Identity: phrase category (NP, VP, PP, etc.)
Example:
  The      morning  flight   from     Denver   has      arrived
  NP-B     NP-I     NP-I     PP-B     NP-B     VP-B     VP-I
Base NP chunking only:
  NP-B     NP-I     NP-I     O        NP-B     O        O
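To make the encoding concrete, here is a small sketch that decodes labels of the form used above (NP-B, NP-I, O) back into chunks. The function and the exact label convention are illustrative; real corpora often use the reversed form (B-NP, I-NP).

```python
def iob_to_chunks(tokens, labels):
    """Group tokens into (phrase_type, tokens) chunks from labels such as
    'NP-B' (begin NP), 'NP-I' (inside NP), and 'O' (outside any chunk)."""
    chunks, current = [], None
    for token, label in zip(tokens, labels):
        if label == "O":
            current = None
            continue
        phrase, position = label.split("-")
        if position == "B" or current is None or current[0] != phrase:
            current = (phrase, [token])   # start a new chunk
            chunks.append(current)
        else:
            current[1].append(token)      # I tag continues the open chunk
    return chunks

tokens = "The morning flight from Denver has arrived".split()
labels = ["NP-B", "NP-I", "NP-I", "PP-B", "NP-B", "VP-B", "VP-I"]
print(iob_to_chunks(tokens, labels))
# [('NP', ['The', 'morning', 'flight']), ('PP', ['from']),
#  ('NP', ['Denver']), ('VP', ['has', 'arrived'])]

# Base NP chunking uses O for everything outside an NP:
print(iob_to_chunks(tokens, ["NP-B", "NP-I", "NP-I", "O", "NP-B", "O", "O"]))
# [('NP', ['The', 'morning', 'flight']), ('NP', ['Denver'])]
```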
Features for Chunking
What are good features?
  Chunk tags assigned to the 2 preceding words
  Words: 2 preceding, current word, 2 following
  Parts of speech: 2 preceding, current word, 2 following
The training vector includes these features plus the true label
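A sketch of what such a feature extractor might look like, assuming the window just described (the chunk tags of the two preceding words, plus words and POS tags for a five-token window). The feature names and the BOS/EOS padding symbols are made up for illustration.

```python
def chunk_features(words, pos_tags, i, prev_chunk_tags):
    """Feature dict for token i; BOS/EOS pad the sentence boundaries."""
    def word(j):
        return words[j] if 0 <= j < len(words) else ("BOS" if j < 0 else "EOS")
    def pos(j):
        return pos_tags[j] if 0 <= j < len(pos_tags) else ("BOS" if j < 0 else "EOS")

    feats = {
        "chunk-2": prev_chunk_tags[-2] if len(prev_chunk_tags) >= 2 else "BOS",
        "chunk-1": prev_chunk_tags[-1] if prev_chunk_tags else "BOS",
    }
    for offset in (-2, -1, 0, 1, 2):
        feats[f"w{offset:+d}"] = word(i + offset)
        feats[f"pos{offset:+d}"] = pos(i + offset)
    return feats

words = "The morning flight from Denver has arrived".split()
tags  = ["DT", "JJ", "N", "PREP", "NNP", "AUX", "V"]
print(chunk_features(words, tags, 3, ["NP-B", "NP-I", "NP-I"]))
# {'chunk-2': 'NP-I', 'chunk-1': 'NP-I', 'w-2': 'morning', ..., 'pos+2': 'AUX'}
```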
Evaluation
System: output of automatic tagging
Gold standard: true tags, typically extracted from a parsed treebank
Precision: # correct chunks / # system chunks
Recall: # correct chunks / # gold chunks
F-measure: F1 balances precision and recall
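A minimal sketch of chunk-level scoring, assuming system and gold chunks are compared as (label, start, end) spans, so a chunk only counts as correct when both its boundaries and its label match exactly.

```python
def chunk_prf(system_chunks, gold_chunks):
    """Precision, recall, and F1 over exact-match (label, start, end) chunks."""
    system, gold = set(system_chunks), set(gold_chunks)
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold   = [("NP", 0, 3), ("PP", 3, 4), ("NP", 4, 5), ("VP", 5, 7)]
system = [("NP", 0, 3), ("PP", 3, 5), ("VP", 5, 7)]   # wrongly merged PP + NP
print(chunk_prf(system, gold))   # (0.666..., 0.5, 0.571...)
```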
State-of-the-Art
Base NP chunking: 0.96 (F1)
Complex phrases:
  Learning-based: 0.92-0.94 (most learners achieve similar results)
  Rule-based: 0.85-0.92
Limiting factors:
  POS tagging accuracy
  Inconsistent labeling (chunks extracted from parse trees)
  Conjunctions:
    Late departures and arrivals are common in winter
    Late departures and cancellations are common in winter