Features, Formalized
Stephen Mayhew and Hyung Sul Kim
Outline
What are features?
How are they defined in NLP tasks in general?
How are they defined specifically for relation extraction? (kernel methods)
What are features?
Feature Extraction Pipeline
1. Define Feature Generation Functions (FGFs)
2. Apply FGFs to data to make a lexicon
3. Translate examples into feature space
4. Learning with vectors
Feature Generation Functions
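The deck leaves FGFs abstract. As a minimal sketch, assuming examples arrive as token lists, an FGF can be any function from an example to a set of named, grounded features; has_word_fgf mirrors the hasWord features in the lexicon slides below, and the bigram variant is a purely illustrative second example.

```python
# A feature generation function (FGF) maps a raw example to a set of
# named, grounded features. "hasWord" matches the lexicon slides below;
# the bigram FGF is an illustrative second example.

def has_word_fgf(tokens):
    """One grounded feature per token in the example."""
    return {f"hasWord({t})" for t in tokens}

def has_bigram_fgf(tokens):
    """One grounded feature per adjacent token pair."""
    return {f"hasBigram({a},{b})" for a, b in zip(tokens, tokens[1:])}

# has_word_fgf("In the stark starlight".split())
# -> {'hasWord(In)', 'hasWord(the)', 'hasWord(stark)', 'hasWord(starlight)'}
```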
Feature Extraction Pipeline
1. Define Feature Generation Functions (FGFs)
2. Apply FGFs to data to make a lexicon
3. Translate examples into feature space
4. Learning with vectors
Lexicon
Apply our FGF to all input data. This creates grounded features and indexes them:
…
3534: hasWord(stark)
3535: hasWord(stamp)
3536: hasWord(stampede)
3537: hasWord(starlight)
…
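A sketch of this step under the same assumptions, reusing the illustrative FGFs above: run every FGF over every training example and give each new grounded feature the next free index.

```python
def build_lexicon(examples, fgfs):
    """Index every grounded feature seen anywhere in the training data."""
    lexicon = {}
    for tokens in examples:
        for fgf in fgfs:
            for feature in fgf(tokens):
                if feature not in lexicon:
                    lexicon[feature] = len(lexicon)
    return lexicon

# After indexing a large corpus, the lexicon might contain entries like
# {'hasWord(stark)': 3534, 'hasWord(stamp)': 3535, ...} as on the slide.
```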
Feature Extraction Pipeline
1. Define Feature Generation Functions (FGFs)
2. Apply FGFs to data to make a lexicon
3. Translate examples into feature space
4. Learning with vectors
Translate examples to feature space
From the lexicon:
…
98: hasWord(In)
…
241: hasWord(the)
…
3534: hasWord(stark)
3535: hasWord(stamp)
3536: hasWord(stampede)
3537: hasWord(starlight)
…
"In the stark starlight" becomes the sparse vector of active indices {98, 241, 3534, 3537}.
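Continuing the sketch, an example becomes the sorted indices of its active features, i.e. a sparse binary vector; features never seen during training have no index and are silently dropped.

```python
def to_feature_vector(tokens, lexicon, fgfs):
    """Translate one example into sorted active feature indices
    (a sparse binary vector over the lexicon)."""
    return sorted({lexicon[f]
                   for fgf in fgfs
                   for f in fgf(tokens)
                   if f in lexicon})

# With the lexicon shown above:
# to_feature_vector("In the stark starlight".split(), lexicon, [has_word_fgf])
# -> [98, 241, 3534, 3537]
```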
Feature Extraction Pipeline
1. Define Feature Generation Functions (FGFs)
2. Apply FGFs to data to make a lexicon
3. Translate examples into feature space
4. Learning with vectors
Easy.
Feature Extraction Pipeline (Testing)
1. FGFs are already defined
2. Lexicon is already defined
3. Translate examples into feature space
4. Learning with vectors
No surprises here.
Structured Pipeline - Training
Exactly the same as before!
Structured Pipeline - Testing
Automatic Feature Generation
Two ways to look at this:
1. Creating an FGF: this is a black art, not even intuitive for humans to do.
2. Choosing the best subset of a closed set: this is possible, and algorithms exist (one standard algorithm is sketched below).
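The slide names no particular algorithm; one standard instance of choosing the best subset of a closed set is greedy forward selection, sketched here. The evaluate argument is an assumed callback that trains a model on a candidate subset and returns a held-out score.

```python
def greedy_forward_selection(candidates, evaluate, max_size):
    """Grow a feature subset greedily: at each step add the candidate
    feature that most improves the held-out score, stopping when no
    candidate helps or the size budget is reached."""
    selected, best = [], float("-inf")
    while len(selected) < max_size:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda sf: sf[0])
        if score <= best:
            break
        selected.append(feature)
        best = score
    return selected
```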
Exploiting Syntactico-Semantic Structures for Relation Extraction (Chan and Roth, ACL 2011)
Before doing the hard task of relation classification, apply some easy heuristics to recognize:
Premodifiers: [the [Seattle] Zoo]
Possessives: [[California's] Governor]
Prepositions: [officials] in [California]
Formulaics: [Medford], [Massachusetts]
These four structures cover 80% of the mention pairs in ACE 2004.
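Chan and Roth define these structures over parsed, mention-annotated text. Purely as an illustration of the idea, a toy matcher over strings with bracketed mentions might look like the following; the patterns and the bracket notation are simplifications, not the paper's implementation.

```python
import re

# Toy surface patterns over text where mentions are wrapped in brackets.
# The real system operates on parse trees and mention annotations.
STRUCTURES = [
    ("possessive",  re.compile(r"\[\[[^\]]+'s\] [^\]]+\]")),   # [[California's] Governor]
    ("premodifier", re.compile(r"\[\w+ \[[^\]]+\] [^\]]+\]")), # [the [Seattle] Zoo]
    ("preposition", re.compile(r"\[[^\]]+\] (?:in|of|at) \[[^\]]+\]")),
    ("formulaic",   re.compile(r"\[[^\]]+\], \[[^\]]+\]")),    # [Medford], [Massachusetts]
]

def easy_structure(annotated_text):
    """Return the first easy structure matched, else None
    (meaning: fall back to the full relation classifier)."""
    for name, pattern in STRUCTURES:
        if pattern.search(annotated_text):
            return name
    return None

# easy_structure("[officials] in [California]") -> 'preposition'
```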
Kernels for Relation Extraction
Hyung Sul Kim
Kernel Tricks
(A few slides borrowed from the ACL 2012 tutorial on kernels in NLP by Moschitti.)
All we need is K(x1, x2) = φ(x1) · φ(x2)
K(x1, x2) can often be computed without ever mapping x to φ(x).
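A tiny numeric illustration of this identity: for the degree-2 polynomial kernel, K(x1, x2) = (x1 · x2)^2 equals the dot product of explicit degree-2 feature maps, yet evaluating K never constructs them.

```python
import numpy as np

def poly2_kernel(x1, x2):
    """K(x1, x2) = (x1 . x2)^2, computed without any feature map."""
    return float(np.dot(x1, x2)) ** 2

def phi(x):
    """Explicit degree-2 feature map (all monomials x_i * x_j),
    built here only to check the identity."""
    return np.array([xi * xj for xi in x for xj in x])

x1, x2 = np.array([1.0, 2.0]), np.array([3.0, 0.5])
assert np.isclose(poly2_kernel(x1, x2), float(np.dot(phi(x1), phi(x2))))
```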
Linear Kernels with Features (Zhou et al., 2005)
Pairwise binary SVM training.
Features: Words, Entity Types, Mention Level, Overlap, Base Phrase Chunking, Dependency Tree, Parse Tree, Semantic Resources.
Word Features
WM1: bag-of-words in M1 (example: {they})
HM1: head word of M1 (example: they)
WM2: bag-of-words in M2 (example: {their, children})
HM2: head word of M2 (example: children)
HM12: combination of HM1 and HM2
WBNULL: when no word in between (example: 0)
WBFL: the only word in between, when exactly one word in between (example: 0)
WBF: first word in between, when at least two words in between (example: do)
WBL: last word in between, when at least two words in between (example: put)
WBO: other words in between except the first and last, when at least three words in between (example: not)
BM1F: first word before M1 (example: 0)
BM1L: second word before M1 (example: 0)
AM2F: first word after M2 (example: in)
AM2L: second word after M2 (example: a)
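A sketch of how a few of these features could be extracted, assuming tokenized text with mentions given as (start, end) token spans and M1 preceding M2; taking the rightmost mention token as the head word is a simplification of the paper's head-finding.

```python
def word_features(tokens, m1, m2):
    """Extract a few of the word features above. m1/m2 are (start, end)
    spans; the head word is approximated by the rightmost mention token."""
    between = tokens[m1[1]:m2[0]]
    feats = {
        "WM1": set(tokens[m1[0]:m1[1]]),
        "HM1": tokens[m1[1] - 1],
        "WM2": set(tokens[m2[0]:m2[1]]),
        "HM2": tokens[m2[1] - 1],
        "WBNULL": not between,
    }
    if len(between) == 1:
        feats["WBFL"] = between[0]
    elif len(between) >= 2:
        feats["WBF"], feats["WBL"] = between[0], between[-1]
        feats["WBO"] = set(between[1:-1])
    return feats

# tokens = "they do not put their children in a".split()
# word_features(tokens, (0, 1), (4, 6))
# -> WBF='do', WBL='put', WBO={'not'}, HM1='they', HM2='children'
```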
Entity Types, Mention Level, Overlap
ET12: combination of mention entity types (PER, ORG, FAC, LOC, GPE)
ML12: combination of mention levels (NAME, NOMINAL, PRONOUN)
#MB: number of other mentions in between (examples: 0, 0)
#WB: number of words in between (examples: 3, 0)
M1>M2: 1 if M2 is included in M1 (examples: 0, 1)
M1<M2: 1 if M1 is included in M2 (examples: 0, 0)
Base Phrase Chunking
CPHBNULL: when no phrase in between (example: 0)
CPHBFL: the only phrase head, when exactly one phrase in between (example: 0)
CPHBF: first phrase head in between, when at least two phrases in between (example: JAPAN)
CPHBL: last phrase head in between, when at least two phrase heads in between (example: KILLED)
CPHBO: other phrase heads in between except the first and last, when at least three phrases in between (example: 0)
CPHBM1F: first phrase head before M1 (example: 0)
CPHBM1L: second phrase head before M1 (example: 0)
CPHAM2F: first phrase head after M2 (example: 0)
CPHAM2L: second phrase head after M2 (example: 0)
Dependency Trees
Example (M1 and M2 are the mentions marked in the figure): That's because Israel was expected to retaliate against Hezbollah forces in areas controlled by Syrian troops.
ET1DW1: combination of the entity type and the dependent word for M1
H1DW1: combination of the head word and the dependent word for M1
ET2DW2: combination of the entity type and the dependent word for M2
H2DW2: combination of the head word and the dependent word for M2
ET12SameNP: combination of ET12 and whether M1 and M2 are included in the same NP (example: 0)
ET12SamePP: combination of ET12 and whether M1 and M2 are included in the same PP (example: 0)
ET12SameVP: combination of ET12 and whether M1 and M2 are included in the same VP (example: 1)
Performance of Features (F1 Measure)
[Figure: F1 contribution of each feature group]
Performance Comparison
Year  Authors      Method                                    F-measure
2005  Zhou et al.  Linear Kernels with Handcrafted Features  55.5
Syntactic Kernels (Zhao and Grishman, 2005)
A composite of five kernels:
Argument Kernel
Bigram Kernel
Link Sequence Kernel
Dependency Path Kernel
Local Dependency Kernel
Bigram Kernel
All unigrams and bigrams in the text from M1 to M2.
Unigrams: they, do, not, put, their, children
Bigrams: they do, do not, not put, put their, their children
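A sketch of the feature space this kernel's dot product ranges over, assuming the token span from M1 to M2 is already extracted:

```python
def unigrams_and_bigrams(tokens):
    """All unigrams and bigrams in the text from M1 to M2, the feature
    space the bigram kernel's dot product ranges over."""
    return list(tokens) + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

# unigrams_and_bigrams("they do not put their children".split())
# -> ['they', 'do', 'not', 'put', 'their', 'children',
#     'they do', 'do not', 'not put', 'put their', 'their children']
```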
Dependency Path Kernel
Example: That's because Israel was expected to retaliate against Hezbollah forces in areas controlled by Syrian troops.
[Figure: dependency path between the two mentions]
Performance Comparison
Year  Authors            Method                                      F-measure
2005  Zhou et al.        Linear Kernels with Handcrafted Features    55.5
2005  Zhao and Grishman  Syntactic Kernels (Composite of 5 Kernels)  70.35
Composite Kernel (Zhang et al., 2006)
A composite of two kernels:
Entity Kernel (linear kernel with entity-related features given by the ACE datasets)
Convolution Tree Kernel (Collins and Duffy, 2001)
Two ways to combine the two kernels: linear combination and polynomial expansion (both sketched below).
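A sketch of the two combination schemes, treating the component kernels as black-box functions; alpha and the exact polynomial form are illustrative placeholders, not the paper's tuned parameterization.

```python
def linear_combination(k_entity, k_tree, alpha=0.4):
    """K = alpha * K_entity + (1 - alpha) * K_tree."""
    return lambda a, b: alpha * k_entity(a, b) + (1 - alpha) * k_tree(a, b)

def polynomial_expansion(k_entity, k_tree, alpha=0.4, degree=2):
    """Polynomially expand the entity kernel (implicitly capturing
    conjunctions of entity features) before combining it with the
    tree kernel."""
    return lambda a, b: (alpha * (k_entity(a, b) + 1) ** degree
                         + (1 - alpha) * k_tree(a, b))
```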
Convolution Tree Kernel (Collins and Duffy, 2001)
[Figure: an example tree]
K(x1, x2) can be computed efficiently, in O(|x1| · |x2|) time, without enumerating subtrees (a sketch follows below).
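A compact sketch of the Collins and Duffy computation: K sums, over all pairs of nodes with matching productions, a recursively defined count of common subtrees, with a decay factor lam downweighting larger subtrees. The Tree class is a stand-in for whatever parse representation is used.

```python
from itertools import product

class Tree:
    """Minimal parse-tree node; leaves are nodes with no children."""
    def __init__(self, label, children=()):
        self.label, self.children = label, list(children)

    def production(self):
        return (self.label, tuple(c.label for c in self.children))

    def internal_nodes(self):
        if self.children:
            yield self
        for c in self.children:
            yield from c.internal_nodes()

def tree_kernel(t1, t2, lam=0.4):
    """Collins-Duffy convolution tree kernel: the (decayed) number of
    common subtrees, computed in O(|t1| * |t2|) time via memoization
    instead of enumerating the exponentially many subtrees."""
    cache = {}

    def delta(n1, n2):
        key = (id(n1), id(n2))
        if key not in cache:
            if n1.production() != n2.production():
                cache[key] = 0.0
            else:
                result = lam
                for c1, c2 in zip(n1.children, n2.children):
                    if c1.children:  # leaf children contribute a factor of 1
                        result *= 1 + delta(c1, c2)
                cache[key] = result
        return cache[key]

    return sum(delta(n1, n2)
               for n1, n2 in product(t1.internal_nodes(), t2.internal_nodes()))
```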
Relation Instance Spaces
[Figure: F-measures of the compared relation instance spaces: 51.3, 61.9, 59.2, 60.4]
Performance Comparison
Year  Authors            Method                                      F-measure
2005  Zhou et al.        Linear Kernels with Handcrafted Features    55.5
2005  Zhao and Grishman  Syntactic Kernels (Composite of 5 Kernels)  70.35
2006  Zhang et al.       Entity Kernel + Convolution Tree Kernel     72.1
Context-Sensitive Tree Kernel (Zhou et al., 2007)
Motivating example: "John and Mary got married", an instance of the so-called predicate-linked category (about 10% of relation instances).
PT: 63.6; context-sensitive tree kernel: 73.2
Performance Comparison
Year  Authors            Method                                               F-measure
2005  Zhou et al.        Linear Kernels with Handcrafted Features             55.5
2005  Zhao and Grishman  Syntactic Kernels (Composite of 5 Kernels)           70.35
2006  Zhang et al.       Entity Kernel + Convolution Tree Kernel              72.1
2007  Zhou et al.        (Zhou et al., 2005) + Context-Sensitive Tree Kernel  75.8
Best Kernel (Nguyen et al., 2009)
Uses multiple kernels on constituent trees, dependency trees, and sequential structures.
Designs 5 different kernel composites from 4 tree kernels and 6 sequential kernels.
Convolution Tree Kernels on 4 Special Trees
PET: 68.9
DW: 56.3
GR: 60.2
GRW: 58.5
Combinations: PET + GR = 70.5, DW + GR = 61.8
Word Sequence Kernels on 6 Special Sequences
SK1 (61.0): sequence of terminals (lexical words) in the PET, e.g. T2-LOC washington, U.S. T1-PER officials
SK2 (60.8): sequence of part-of-speech (POS) tags in the PET, e.g. T2-LOC NN, NNP T1-PER NNS
SK3 (61.6): sequence of grammatical relations in the PET, e.g. T2-LOC pobj, nn T1-PER nsubj
SK4 (59.7): sequence of words in the DW, e.g. Washington T2-LOC In working T1-PER officials GPE U.S.
SK5 (59.8): sequence of grammatical relations in the GR, e.g. pobj T2-LOC prep ROOT T1-PER nsubj GPE nn
SK6 (59.7): sequence of POS tags in the DW, e.g. NN T2-LOC IN VBP T1-PER NNS GPE NNP
Combined: SK1 + SK2 + SK3 + SK4 + SK5 + SK6 = 69.8
Word Sequence Kernels (Cancedda et al., 2003)
Extended sequence kernels: map inputs to high-dimensional spaces using every subsequence, with penalties for common subsequences (using IDF), for longer subsequences, and for non-contiguous subsequences.
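For concreteness, a sketch of the classic gap-weighted subsequence kernel (Lodhi et al., 2002) that these word sequence kernels build on, with a single decay factor lam standing in for Cancedda et al.'s richer IDF-based and gap-specific penalties:

```python
from functools import lru_cache

def subsequence_kernel(s, t, n, lam=0.5):
    """Count common length-n subsequences of the word sequences s and t,
    weighting each match by lam ** (positions spanned), so longer and
    gappier subsequences are penalized."""
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def k_aux(i, ls, lt):
        # Matches of length i in s[:ls], t[:lt] that keep paying lam for
        # every position up to the end of both prefixes.
        if i == 0:
            return 1.0
        if ls < i or lt < i:
            return 0.0
        total = lam * k_aux(i, ls - 1, lt)
        for j in range(lt):
            if t[j] == s[ls - 1]:
                total += k_aux(i - 1, ls - 1, j) * lam ** (lt - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k(ls, lt):
        if ls < n or lt < n:
            return 0.0
        total = k(ls - 1, lt)
        for j in range(lt):
            if t[j] == s[ls - 1]:
                total += k_aux(n - 1, ls - 1, j) * lam ** 2
        return total

    return k(len(s), len(t))

# subsequence_kernel("they do not put".split(),
#                    "they did not put".split(), n=2)
# counts shared word pairs like ("they", "not"), ("not", "put"), ...
```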
Performance Comparison
Year  Authors            Method                                               F-measure
2005  Zhou et al.        Linear Kernels with Handcrafted Features             55.5
2005  Zhao and Grishman  Syntactic Kernels (Composite of 5 Kernels)           70.35
2006  Zhang et al.       Entity Kernel + Convolution Tree Kernel              72.1
2007  Zhou et al.        (Zhou et al., 2005) + Context-Sensitive Tree Kernel  75.8
2009  Nguyen et al.      Multiple Tree Kernels + Multiple Sequence Kernels    71.5
Caveats on the comparison (from Nguyen et al.): (Zhang et al., 2006) reaches F-measure 68.9 in our settings. On (Zhou et al., 2007): "Such heuristics expand the tree and remove unnecessary information allowing a higher improvement on RE. They are tuned on the target RE task, so although the result is impressive, we cannot use it to compare with pure automatic learning approaches, such as our models."
Topic Kernel (Wang et al., 2011)
Uses the Wikipedia infobox data to learn topics of relations (analogous to topics of words) based on co-occurrences.
Topics and their top relations:
Topic 1: active_years_end_date, career_end, final_year, retired
Topic 2: commands, part_of, battles, notable_commanders
Topic 3: influenced, school_tradition, notable_ideas, main_interests
Topic 4: destinations, end, through, post_town
Topic 5: prizes, award, academy_awards, highlights
Topic 6: inflow, outflow, length, maxdepth
Topic 7: after, successor, ending_terminus
Topic 8: college, almamater, education
…
Overview
Performance Comparison
Year  Authors            Method                                                              F-measure
2005  Zhou et al.        Linear Kernels with Handcrafted Features                            55.5
2005  Zhao and Grishman  Syntactic Kernels (Composite of 5 Kernels)                          70.35
2006  Zhang et al.       Entity Kernel + Convolution Tree Kernel                             72.1
2007  Zhou et al.        (Zhou et al., 2005) + Context-Sensitive Tree Kernel                 75.8
2009  Nguyen et al.      Multiple Tree Kernels + Multiple Sequence Kernels                   71.5
2011  Wang et al.        Entity Features + Word Features + Dependency Path + Topic Kernels   73.24