
1 Semi-supervised Relation Extraction with Large-scale Word Clustering
Ang Sun, Ralph Grishman, Satoshi Sekine
New York University, June 20, 2011

2 Outline
1. Task
2. Problems
3. Solutions and Experiments
4. Conclusion

3 1. Task
Relation Extraction
▫ "The last U.S. president to visit …" (M1 and M2 mark the two entity mentions, here "U.S." and "president"; M := entity mention)
▫ Is there a relation between M1 and M2?
▫ If so, what kind of relation?

4 1. Task
Relation Types (ACE 2004)

Type      | Definition                | Example
EMP-ORG   | Employment                | US president
PHYS      | Located, near, part-whole | a military base in Germany
GPE-AFF   | Affiliation               | U.S. businessman
PER-SOC   | Social                    | a spokesman for the senator
DISC      | Discourse                 | each of whom
ART       | User, owner, inventor …   | US helicopters
OTHER-AFF | Ethnic, ideology …        | Cuban-American people

5 2. Problems
Sparsity of lexical features; word cluster features to the rescue.

Training instances: US president, US senator, Arkansas governor, Israeli government spokesman, …
Training features: HeadOfM2 = president, HeadOfM2 = spokesman, …

Testing instances: US ambassador, U.N. spokeswoman, …
Testing features: HM2 = ambassador, HM2 = spokeswoman, …

With a word cluster C1 = {president, ambassador, spokesman, spokeswoman}, training and testing instances share the same cluster feature: WC_HM2 = C1.
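A minimal sketch (not the authors' code) of this rescue step: looking up a head word's Brown cluster lets an unseen test head fire the same feature as the training heads. The bit strings and feature names below are illustrative assumptions.

```python
# Toy word-to-cluster map; real Brown clusters are induced from a large corpus.
brown_clusters = {
    "president":   "110100",
    "spokesman":   "110101",
    "ambassador":  "110110",   # unseen at training time
    "spokeswoman": "110111",   # unseen at training time
}

def cluster_feature(head_word, prefix_len=4):
    """Map a head word to a cluster feature string, e.g. WC_HM2=1101."""
    bits = brown_clusters.get(head_word)
    if bits is None:
        return None  # word not covered by the clustering
    return "WC_HM2=" + bits[:prefix_len]

# "president" (training) and "ambassador" (testing) now share a feature:
assert cluster_feature("president") == cluster_feature("ambassador") == "WC_HM2=1101"
```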

6 2. Problems
Problem 1: How to choose effective clusters?
▫ The Brown word hierarchy is a binary tree in which each word is a leaf addressed by a bit string. Where to cut?
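Because each leaf is addressed by a bit string, "cutting" the hierarchy at depth i is just truncating every word's bit string to its first i bits. A small sketch with made-up bit strings:

```python
# Shorter prefixes merge words into coarser clusters.
word_bits = {"president": "110100", "senator": "110010"}

for i in (2, 4, 6):
    clusters = {w: bits[:i] for w, bits in word_bits.items()}
    print(i, clusters)
# At i=2 both words fall into cluster "11"; longer prefixes separate them.
```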

7 2. Problems
Problem 2: Which lexical features should be augmented to improve generalization accuracy?
▫ Named entity recognition augments every token with its cluster
▫ Should relation extraction do the same? A relation instance spans LeftContext M1 MidContext M2 RightContext. Where to generalize?

8 3. Solutions and Experiments
3.1 Cluster Selection: Main idea
▫ Rank each prefix length (from 1 to the length of the longest bit string) using an importance measure
▫ Select a subset of lengths at which to cut the word hierarchy
 Typically select 3 or 4 prefix lengths, to avoid committing to a single cluster granularity (see the sketch below)
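A sketch of that selection loop, assuming some scoring function over prefix lengths (the IG and PC measures are defined on the next two slides):

```python
def select_prefix_lengths(score_fn, max_len, k=4):
    """Score every prefix length 1..max_len and keep the k best,
    rather than committing to a single cut of the hierarchy."""
    scores = {i: score_fn(i) for i in range(1, max_len + 1)}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```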

9 3.1 Cluster Selection
Importance measure 1: Information Gain (IG)

For the cluster feature f_i with prefix length i, ranked against the relation classes C:

IG(f_i) = H(C) - \sum_{v \in V(f_i)} P(v) \, H(C \mid v)

where V(f_i) is the set of values of the cluster feature, H(C) is the prior entropy of the classes, and H(C | v) is the posterior entropy given value v of the feature.
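A sketch of the computation, assuming instances are represented as (cluster-feature value, relation class) pairs; this is standard information gain, not necessarily the authors' exact code:

```python
import math
from collections import Counter, defaultdict

def information_gain(instances):
    """IG of a cluster feature over relation-labeled instances,
    given as (feature_value, relation_class) pairs."""
    def entropy(labels):
        total = len(labels)
        return -sum(c / total * math.log2(c / total)
                    for c in Counter(labels).values())

    prior = entropy([c for _, c in instances])        # H(C)

    by_value = defaultdict(list)                      # group classes by feature value
    for v, c in instances:
        by_value[v].append(c)
    posterior = sum(len(cs) / len(instances) * entropy(cs)  # sum_v P(v) H(C|v)
                    for cs in by_value.values())
    return prior - posterior
```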

10 3.1 Cluster Selection
Importance measure 2: Prefix Coverage (PC)

PC(i) = Count(f_i^c) / Count(f^l)

where i is the prefix length, f^l is a lexical feature, f_i^c is the non-null cluster feature of length i for that lexical feature, and Count(·) is the number of occurrences.
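A sketch, under the assumption that the length-i cluster feature is non-null exactly when the word is clustered and its bit string has at least i bits:

```python
def prefix_coverage(head_words, brown_clusters, i):
    """Fraction of lexical-feature occurrences whose length-i
    cluster feature is non-null."""
    total = len(head_words)
    covered = sum(1 for w in head_words
                  if w in brown_clusters and len(brown_clusters[w]) >= i)
    return covered / total if total else 0.0
```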

11 3.1 Cluster Selection
Other measures to compare with
▫ Use All Prefixes (UA): consider every length, hoping that the underlying learning algorithm assigns proper weights
▫ Exhaustive Search (ES): try every possible subset of lengths and pick the one that works best (sketched below)
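ES retrains the model once per candidate subset, which is why its training time balloons on the results slide. A sketch, where train_and_eval is a hypothetical callback that retrains the relation classifier with the given prefix lengths and returns a dev-set F-score:

```python
from itertools import combinations

def exhaustive_search(lengths, train_and_eval, subset_size=4):
    """Try every subset of prefix lengths; keep the best by dev-set F."""
    best_subset, best_f = None, float("-inf")
    for subset in combinations(lengths, subset_size):
        f = train_and_eval(subset)   # one full retrain per subset
        if f > best_f:
            best_subset, best_f = subset, f
    return best_subset, best_f
```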

12 3.1 Cluster Selection
Experiment
▫ Setup
 348 ACE 2004 bnews and nwire documents
 70 documents for testing; the remaining 278 are split into training and development sets in a 7:3 ratio
 The development set is used to learn the best lengths
 Only 3 or 4 lengths are chosen (to match prior work)
 For simplicity, only the head of each mention is augmented with clusters
 1,000 word clusters induced on the TDT5 corpus with the Brown algorithm
▫ Baseline
 Feature-based MaxEnt classification model (a rough sketch follows this slide)
 A large feature set: the full set from Zhou et al. (2005), plus effective features selected from Zhao and Grishman (2005), Jiang and Zhai (2007), and others
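The baseline could be approximated as below. This is a stand-in using scikit-learn's multinomial logistic regression (the same model family as MaxEnt), not the authors' implementation, and the feature dicts mapping feature names to string values are an assumed representation:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each relation instance is a dict of string features,
# e.g. {"HM2": "president", "WC_HM2": "1101"}.
model = make_pipeline(
    DictVectorizer(),                  # one-hot encodes name=value features
    LogisticRegression(max_iter=1000)  # multinomial logistic regression ~ MaxEnt
)
# model.fit(train_feature_dicts, train_labels)
# predictions = model.predict(test_feature_dicts)
```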

13 3.1 Cluster Selection
Experiment
▫ Effectiveness of Cluster Selection Methods

System   | F     | Δ     | Training time (minutes)
Baseline | 70.70 |       | 1
UA       | 71.19 | +0.49 | 1.5
PC3      | 71.65 | +0.95 | 30
PC4      | 71.72 | +1.02 | 46
IG3      | 71.65 | +0.95 | 45
IG4      | 71.68 | +0.98 | 78
ES3      | 71.66 | +0.96 | 465
ES4      | 71.60 | +0.90 | 1678

14 3.2 Effectiveness of Cluster Features
Explore cluster features in a systematic way
▫ Rank each lexical feature according to its importance
 Importance is based on linguistic intuition and on performance contributions reported in previous research
▫ Test the effectiveness of each lexical feature when augmented with word clusters
 individually and incrementally

15 3.2 Effectiveness of Cluster Features
Importance of lexical features
▫ Simplify an instance into a 3-tuple (M1, Context, M2); each element contributes a head word and the remaining words as a bag

Lexical Feature | Cluster Feature         | Importance
HM              | HM1_WC, HM2_WC, HM12_WC | 1
BagWM           | BagWM_WC                | 2
HC              | HC_WC                   | 3
BagWC           | BagWC_WC                | 4
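A sketch of how the top-ranked lexical features and their cluster-augmented counterparts might be built from an instance; the feature names follow the table above, but the exact construction is an assumption:

```python
def relation_features(m1_head, m2_head, clusters, prefix_len=4):
    """Head-word lexical features (HM1, HM2, HM12) and their
    cluster-augmented _WC versions for one relation instance."""
    feats = {
        "HM1": m1_head,
        "HM2": m2_head,
        "HM12": m1_head + "_" + m2_head,  # conjunction of the two heads
    }
    for name, word in (("HM1", m1_head), ("HM2", m2_head)):
        bits = clusters.get(word)
        if bits:
            feats[name + "_WC"] = bits[:prefix_len]
    return feats
```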

16 3.2 Effectiveness of Cluster Features
Experiment
▫ Setup
 5-fold cross-validation
 PC4 was used to select effective clusters
▫ Performance: [results figure not reproduced in the transcript]

17 3.2 Effectiveness of Cluster Features
The impact of training size (augmenting mention heads only)
▫ Sometimes word cluster features allow a reduction in annotation

18 3.2 Effectiveness of Cluster Features
Performance of each individual relation class
▫ The highlighted 5 types share the same entity type GPE; PER-SOC holds only between PERSON and PERSON. So word clusters may also help distinguish between ambiguous relation types.
▫ No improvement for the PHYS relation? It is just too hard!

19 4. Conclusion
Main contributions
▫ Proposed a principled way of choosing clusters at an appropriate level of granularity
▫ Systematically explored the effectiveness of word cluster features for relation extraction
Future work
▫ Extend to
 phrase clustering (Lin and Wu, 2009)
 pattern clustering (Sun and Grishman, 2010)

20 Thanks!

