Chunking. Pierre Bourreau, Cristina España i Bonet (LSI-UPC, PLN-PTM)
Plan: Introduction; Methods (HMM, SVM, CRF); Global analysis; Conclusion.
Introduction

What is chunking? Identifying groups of contiguous words.
Example: He is the person you read about.
[NP He] [VP is] [NP the person] [NP you] [VP read] [PP about].
Chunking is a first step towards full parsing.
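As a sketch, assuming the standard CoNLL B/I/O convention (B- opens a chunk, I- continues it, O is outside any chunk), the bracketed groups above can be recovered from the tags like this:

```python
# Minimal B/I/O-to-chunk grouping: walk the (token, tag) pairs and close the
# open chunk whenever an O, a B- tag, or an I- tag of a different type appears.
def bio_to_chunks(tokens, tags):
    """Group contiguous tokens into (label, text) chunks from B/I/O tags."""
    chunks, current = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            if current:
                chunks.append(current)
                current = None
        elif tag.startswith("B-") or current is None or current[0] != tag[2:]:
            # B- always opens a chunk; a stray I- is treated leniently as B-.
            if current:
                chunks.append(current)
            current = (tag[2:], [token])
        else:  # I- tag continuing the open chunk of the same type
            current[1].append(token)
    if current:
        chunks.append(current)
    return [(label, " ".join(words)) for label, words in chunks]

print(bio_to_chunks(
    ["He", "is", "the", "person", "you", "read", "about"],
    ["B-NP", "B-VP", "B-NP", "I-NP", "B-NP", "B-VP", "B-PP"]))
```

Run on the example sentence, this reproduces the six bracketed groups shown above.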
Introduction

Chunking task in CoNLL-2000, based on a previous POS tagging. Chunks are tagged B/I/O-Chunk. Chunk types and their counts (106,978 chunks in total):

ADJP (adjective phrase)           2,060
ADVP (adverb phrase)              4,227
CONJP (conjunction phrase)           56
INTJ (interjection)                  31
LST (list marker)                    10
NP (noun phrase)                 55,081
PP (prepositional phrase)        21,281
PRT (particles)                     556
SBAR (subordinated clause)        2,207
UCP (unlike coordinated phrase)       2
VP (verb phrase)                 21,467
Introduction

Corpus: Wall Street Journal (WSJ).
Training set: four sections (15-18), 211,727 tokens.
Test set: one section (20), 47,377 tokens.
Evaluation

Output file format: Word POS Real-Chunk Processed-Chunk. Example:

Boeing    NNP  B-NP  I-NP
's        POS  B-NP  B-NP
747       CD   I-NP  I-NP
jetliners NNS  I-NP  I-NP
.         .    O     O

The evaluation script reports precision, recall and the F_β score:
F_β = (1 + β²) · precision · recall / (β² · precision + recall), with β = 1.
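The actual scoring is done by the official CoNLL evaluation script; as a quick illustrative sketch (not that script's code), the F_β formula computes as:

```python
# F_beta from precision and recall; beta = 1 weights both equally (F1).
def f_beta(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# When precision equals recall, F1 equals both of them.
print(f_beta(0.9, 0.9))
```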
Plan: Introduction; Methods (HMM, SVM, CRF); Global analysis; Conclusion.
Hidden Markov Models (HMM)

A bit of theory: find the most probable tag sequence T for a sentence W, given a vocabulary and the set of possible tags. By Bayes' theorem,
argmax_T P(T|W) = argmax_T P(W|T) · P(T).
Considering all possible state sequences is intractable, so the model is limited to first order (bigrams) or second order (trigrams).
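To make the "most probable tag sequence" idea concrete, here is a toy first-order (bigram) Viterbi decoder in plain Python. All probability tables below are made-up illustrations, not values from this work:

```python
# Toy bigram Viterbi: each cell keeps (best probability, best path so far).
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for an observation sequence."""
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 1e-9), [s]) for s in states}]
    for o in obs[1:]:
        row = {}
        for s in states:
            prob, path = max(
                (V[-1][p][0] * trans_p[p][s] * emit_p[s].get(o, 1e-9),
                 V[-1][p][1] + [s])
                for p in states)
            row[s] = (prob, path)
        V.append(row)
    return max(V[-1].values())[1]

# Tiny hypothetical model: chunk states emitting POS tags.
states = ["B-NP", "B-VP"]
start_p = {"B-NP": 0.6, "B-VP": 0.4}
trans_p = {"B-NP": {"B-NP": 0.3, "B-VP": 0.7},
           "B-VP": {"B-NP": 0.8, "B-VP": 0.2}}
emit_p = {"B-NP": {"PRP": 0.5, "NN": 0.5},
          "B-VP": {"VBZ": 0.9, "NN": 0.1}}
print(viterbi(["PRP", "VBZ", "NN"], states, start_p, trans_p, emit_p))
```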
HMM: Chunking

Common setting: input sentence = words; input/output tags = POS.
For chunking: input sentence = POS; input/output tags = chunks.
Problem for the tagger: the vocabulary (the POS tag set) is very small.
HMM: Chunking

Solution: specialization.
Input sentence: POS. Input/output tags: POS + chunks.
Improvement: input sentence: special words + POS; input/output tags: special words + POS + chunks.
HMM: Chunking

In practice: modification of the input data (WSJ train and test). Original CoNLL triples (word, POS, chunk) on the left, specialized input/output tokens on the right:

Chancellor NNP  O     ->  NNP         NNP·O
of         IN   B-PP  ->  of·IN       of·IN·B-PP
the        DT   B-NP  ->  the·DT      the·DT·B-NP
Exchequer  NNP  I-NP  ->  NNP         NNP·I-NP
Nigel      NNP  B-NP  ->  NNP         NNP·B-NP
Lawson     NNP  I-NP  ->  NNP         NNP·I-NP
's         POS  B-NP  ->  POS         POS·B-NP
restated   VBN  I-NP  ->  VBN         VBN·I-NP
commitment NN   I-NP  ->  NN          NN·I-NP
to         TO   B-PP  ->  to·TO       to·TO·B-PP
a          DT   B-NP  ->  a·DT        a·DT·B-NP
firm       NN   I-NP  ->  NN          NN·I-NP
monetary   JJ   I-NP  ->  JJ          JJ·I-NP
policy     NN   I-NP  ->  NN          NN·I-NP
has        VBZ  B-VP  ->  has·VBZ     has·VBZ·B-VP
helped     VBN  I-VP  ->  helped·VBN  helped·VBN·I-VP
to         TO   I-VP  ->  to·TO       to·TO·I-VP
prevent    VB   I-VP  ->  VB          VB·I-VP
a          DT   B-NP  ->  a·DT        a·DT·B-NP
freefall   NN   I-NP  ->  NN          NN·I-NP
in         IN   B-PP  ->  in·IN       in·IN·B-PP
sterling   NN   B-NP  ->  NN          NN·B-NP
over       IN   B-PP  ->  over·IN     over·IN·B-PP
the        DT   B-NP  ->  the·DT      the·DT·B-NP
past       JJ   I-NP  ->  past·JJ     past·JJ·I-NP
week       NN   I-NP  ->  NN          NN·I-NP
.          .    O     ->  .           .·O
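A minimal sketch of this specialization step; the special-word set below is a tiny hypothetical subset chosen to match the example, not F. Pla's actual 409-word list:

```python
# Each (word, POS, chunk) triple becomes POS -> POS·chunk, or, for words in
# the lexicalized list, word·POS -> word·POS·chunk.
SPECIAL_WORDS = {"of", "the", "to", "a", "has", "helped", "in", "over", "past"}

def specialize(word, pos, chunk):
    """Return the (input token, output tag) pair for one corpus line."""
    token = f"{word.lower()}·{pos}" if word.lower() in SPECIAL_WORDS else pos
    return token, f"{token}·{chunk}"

print(specialize("of", "IN", "B-PP"))
print(specialize("Chancellor", "NNP", "O"))
```

Applied line by line to the WSJ train and test files, this produces exactly the kind of transformed input shown above.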
HMM: Results

Tool: TnT tagger (Thorsten Brants).
Implements the Viterbi algorithm for second-order Markov models.
Allows evaluating unigram, bigram and trigram models.
HMM: Results

Configuration 1: no special words, no POS specialization; trigrams, default parameters.
Results: far from the best scores (F1 ~94%).

Chunks (non-lex)  Precision (%)  Recall (%)  F_β=1 (%)
ADJP              67.97          47.49       55.91
ADVP              71.41          67.78       69.55
CONJP              0.00
INTJ              50.00
LST                0.00
NP                85.50          86.70       86.09
PP                83.57          93.76       88.37
PRT               55.00          20.75       30.14
SBAR              80.49           6.17       11.46
UCP               85.74          85.57       85.66
Total             84.31          84.35       84.33
HMM: Results

Trying to improve:
Configuration 2: lexical specialization (409 words, F. Pla); trigrams.
Configuration 3: lexical specialization (409 words, F. Pla); bigrams. Does it make any difference?
HMM: Results

Chunks  Bigrams: P (%) / R (%) / F_β=1 (%)   Trigrams: P (%) / R (%) / F_β=1 (%)
ADJP    69.95 / 69.63 / 69.79                68.69 / 69.63 / 69.16
ADVP    79.72 / 78.52 / 79.12                79.44 / 78.52 / 78.98
CONJP   33.33 / 55.56 / 41.67                45.45 / 55.56 / 50.00
INTJ    33.33 / 50.00 / 40.00                50.00
NP      90.49 / 91.04 / 90.76                91.86 / 92.61 / 92.23
PP      96.39 / 96.63 / 96.51                96.66 / 97.55 / 97.10
PRT     71.54 / 83.02 / 76.86                71.43 / 75.47 / 73.39
SBAR    85.28 / 83.36 / 84.31                85.96 / 84.67 / 85.31
UCP      0.00
VP      90.26 / 91.95 / 91.10                90.16 / 91.91 / 91.03
Total   90.62 / 91.25 / 90.93                91.37 / 92.24 / 91.81
HMM: Results

Comments: adding the specialization information improves the total F1 by 7 points, much more than the improvement from using trigrams instead of bigrams (~1 point). As before, NP and PP are the best-determined chunks. Impressive improvement for PRT and SBAR (but on small counts).
HMM: Results

Importance of the training set size. Test: divide the training set into 7 parts (~17,000 tokens per part) and compute the results adding one part each time. Conclusion: performance improves with the set size (see plot). Is there a limit? Molina & Pla obtained F1 = 93.26% with 18 sections of the WSJ as training set.
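The experiment above can be sketched as a learning-curve loop; the helper `train_and_score` is hypothetical and stands in for an actual TnT train/evaluate run:

```python
# Retrain on a growing prefix of the training parts and collect the scores.
def learning_curve(training_parts, train_and_score):
    """Score a model trained on parts[:1], parts[:2], ..., parts[:n]."""
    scores = []
    for k in range(1, len(training_parts) + 1):
        subset = [tok for part in training_parts[:k] for tok in part]
        scores.append(train_and_score(subset))
    return scores

# Toy stand-in scorer: accuracy grows with data size, with diminishing returns.
curve = learning_curve([[0] * 17000 for _ in range(7)],
                       lambda data: 100 * len(data) / (len(data) + 17000))
print(curve)
```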
HMM: Results (plot of F1 against training set size, shown as an image in the original slide)
Support Vector Machines (SVM)

A bit of theory: the objective is to maximize the minimum margin, while allowing misclassifications, which are controlled by the C parameter.
SVM

Tool: SVMTool (Jesús Giménez & Lluís Màrquez), which uses SVMlight (Thorsten Joachims) for learning.
A sequential tagger applied to chunking: no need to change the input data.
Binarizes the multiclass problem in order to apply SVMs.
SVM features (model 0): (feature table shown as an image in the original slide)
SVM: Results

Default parameters, varying C and/or the tagging direction (LR/LRL). Variations are very small with this configuration.

Model        Precision (%)  Recall (%)  F_β=1 (%)
C=0.1, LR    89.83          90.18       90.01
C=0.1, LRL   89.32          90.10       89.71
C=1, LR      89.01          89.26       89.13
C=0.05, LR   89.68          90.06       89.87
SVM: Results

Chunks  Precision (%)  Recall (%)  F_β=1 (%)
ADJP     69.12          66.44       67.75
ADVP     74.86          75.64       75.24
CONJP    44.44
INTJ    100.00          50.00       66.67
LST       0.00
NP       91.72          91.79       91.76
PP       88.50          96.01       92.10
PRT      51.85          26.42       35.00
SBAR     82.02          34.95       49.02
VP       91.86          92.81       92.33
Total    89.83          90.18       90.01

Best results: F1 > 90% for the three main chunks. Modest values for the others. The main difference with HMM is in PP.
Conditional Random Fields (CRF)

A bit of theory: the idea extends HMMs and maximum-entropy models. We no longer consider a chain but a graph G = (V, E), conditioned on X, the observation sequence variable. Each node represents a value Y_v of Y (the output label).
Conditional Random Fields

Following Lafferty et al.:

p(y|x) = (1/Z(x)) · exp( Σ_i Σ_j λ_j · t_j(y_{i-1}, y_i, x, i) + Σ_i Σ_k μ_k · s_k(y_i, x, i) )

where y is a label sequence and x an observation sequence; t_j is a transition feature function (over the previous and current labels and the observation sequence); s_k is a state feature function (over the current label and the observation sequence); Z(x) is a normalization factor; and the λ_j, μ_k factors are set during training.
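A toy sketch of the score inside the exponential above; the feature functions and weights below are hypothetical illustrations, and a real CRF would also normalize by Z(x) over all label sequences:

```python
import math

# Sum weighted state features s(y_i, x, i) at every position and weighted
# transition features t(y_{i-1}, y_i, x, i) between adjacent positions.
def crf_score(x, y, transition_feats, state_feats):
    """Unnormalized exp(sum of weighted feature functions) for labels y."""
    total = 0.0
    for i in range(len(y)):
        for weight, s in state_feats:
            total += weight * s(y[i], x, i)
        if i > 0:
            for weight, t in transition_feats:
                total += weight * t(y[i - 1], y[i], x, i)
    return math.exp(total)

# Hypothetical features: "B-NP on a determiner" and "I-NP follows B-NP".
state_feats = [(1.5, lambda yi, x, i: 1.0 if x[i] == "DT" and yi == "B-NP" else 0.0)]
transition_feats = [(0.8, lambda yp, yi, x, i: 1.0 if (yp, yi) == ("B-NP", "I-NP") else 0.0)]
score = crf_score(["DT", "NN"], ["B-NP", "I-NP"], transition_feats, state_feats)
print(score)
```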
Conditional Random Fields (CRF)

Tool: CRF++ 0.45, developed by Taku Kudo (2nd at CoNLL-2000 with an SVM combination).
Parameters and features: words and POS tags can be used. Three alternatives were proposed:
1. binary combinations of word + POS over a window of size 2;
2. the above plus 3-ary combinations of POS;
3. POS only, over a window of size 3.
Unigram or bigram output templates: the score is computed from the probabilities for the current tag alone or for the pair of (previous, current) tags.
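CRF++ reads its features from a template file. As an illustration of what such alternatives look like (this is a hypothetical template, not the one actually used, assuming column 0 holds the word and column 1 the POS tag):

```
# Unigram features: word and POS in a window of +-2
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U10:%x[-2,1]
U11:%x[-1,1]
U12:%x[0,1]
U13:%x[1,1]
U14:%x[2,1]
# 3-ary POS combination
U20:%x[-1,1]/%x[0,1]/%x[1,1]
# Bigram template: score pairs of output tags instead of single tags
B
```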
Conditional Random Fields (CRF): results (table shown as an image in the original slide)
Conditional Random Fields (CRF)

Analysis: bigrams with the maximum number of features reach a global F1 of 93.81%. The global F1 score does not depend much on the feature window but on the bigram/unigram selection: tagging pairs of tokens gives more power than single tagging. Occurrences: LST: 0, INTJ: 1, CONJP: 9, hence identical results across settings. PRT is the only chunk type that depends more on the feature window, and works better with size-2 windows. Preposition tagging relies on bigger windows (e.g. out, around, in, ...). Much the same holds for SBAR (e.g. than, rather, ...).
Conditional Random Fields (CRF)

How to improve the results? Molina & Pla's method should improve performance on SBAR and CONJP. Mixing the different methods?
Plan: Introduction; Methods (HMM, SVM, CRF); Global analysis; Conclusion.
Global Analysis
Global Analysis

CRF outperforms HMM and SVM. HMM performs better than SVM thanks to its use of context, particularly evident for SBAR and PRT. HMM even outperforms CRF for CONJP: HMM uses trigrams, which are better for expressions like "as well as" or "rather than". HMM improves with Pla's method.
Global Analysis

CRF results are close to the CoNLL-2000 best results; a finer, per-POS analysis is needed.

System   Precision  Recall  F1
[ZDJ01]  94.29      94.01   94.13
[KM01]   93.89      93.92   93.91
CRF++    93.81      93.96   93.66
[CM03]   94.19      93.29   93.74
Global Analysis

Combining the three methods:
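The combination can be sketched as a simple per-token majority vote over the three taggers' outputs; breaking ties in favour of the first system is an assumption of this sketch, not necessarily the scheme actually used:

```python
from collections import Counter

# Majority vote per position; on a tie, the earliest-listed system wins.
def vote(*tag_sequences):
    """Combine aligned tag sequences from several taggers by majority vote."""
    combined = []
    for tags in zip(*tag_sequences):
        counts = Counter(tags)
        best = max(counts.values())
        combined.append(next(t for t in tags if counts[t] == best))
    return combined

print(vote(["B-NP", "I-NP"], ["B-NP", "B-VP"], ["B-PP", "B-VP"]))
```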
Global Analysis

Combining does not help for PRT, where the difference between HMM and CRF was big. It helps just a bit on SBAR. Global results are better for CRF alone: 93.81 > 93.57.
Plan: Introduction; Methods (HMM, SVM, CRF); Global analysis; Conclusion.
Conclusion

Without lexicalization, SVM performs much better than HMM. With lexical specialization, HMM performs better than SVM, and is much faster! Only 3 systems for voting: too few, and the taggers make mistakes on the same POS tags.
Conclusion

At a certain stage it becomes hard to improve the results. CRF proves efficient without any specific modification; how can we improve it? CRF with trigrams, but probably really slow. Some finer comparisons with the CoNLL results?