Semi-supervised Relation Extraction with Large-scale Word Clustering
Ang Sun, Ralph Grishman, Satoshi Sekine
New York University
June 20, 2011

Outline
1. Task
2. Problems
3. Solutions and Experiments
4. Conclusion

1. Task
Relation Extraction
▫ Example: "The last U.S. president to visit …" (M1, M2 := entity mentions, e.g. "U.S." and "president")
▫ Is there a relation between M1 and M2?
▫ If so, what kind of relation?

1. Task
Relation Types (ACE 2004)

Type        Definition                   Example
EMP-ORG     Employment                   US president
PHYS        Located, near, part-whole    a military base in Germany
GPE-AFF     Affiliation                  U.S. businessman
PER-SOC     Social                       a spokesman for the senator
DISC        Discourse                    each of whom
ART         User, owner, inventor …      US helicopters
OTHER-AFF   Ethnic, ideology …           Cuban-American people

2. Problems
Sparsity of lexical features; word cluster features to the rescue
Training instances: ♪ US president ♪ US senator ♪ Arkansas governor ♪ Israeli government spokesman ♪ …
Training features: ♪ HeadOfM2 = president ♪ HeadOfM2 = spokesman ♪ …
Testing instances: ♪ US ambassador ♪ U.N. spokeswoman ♪ …
Testing features: ♪ HM2 = ambassador ♪ HM2 = spokeswoman ♪ …
Word cluster C1 = {president, ambassador, spokesman, spokeswoman}
Cluster feature WC_HM2 = C1 is shared by the training and testing instances
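To make the cluster-feature idea concrete, here is a minimal Python sketch of mapping a mention head to Brown-cluster prefix features. The bit strings and prefix lengths are hypothetical placeholders, not the clusters actually induced in this work; a real system would load word-to-bit-string assignments from the output of Brown clustering.

```python
# Hypothetical Brown-cluster assignments: word -> bit-string path in the hierarchy.
BROWN_CLUSTERS = {
    "president":   "110100111",
    "ambassador":  "110100110",
    "spokesman":   "110101001",
    "spokeswoman": "110101000",
}

def cluster_features(head_word, prefix_lengths=(4, 6, 10)):
    """Return word-cluster features for a mention head at several prefix lengths."""
    bits = BROWN_CLUSTERS.get(head_word)
    if bits is None:
        return {}                          # unseen or unclustered word: no cluster feature
    feats = {}
    for i in prefix_lengths:
        # Cutting the bit string at length i merges nearby leaves into one cluster,
        # so "president" (training) and "ambassador" (testing) share a feature value.
        feats["WC_HM2_p%d" % i] = bits[:i]
    return feats

print(cluster_features("president"))       # e.g. {'WC_HM2_p4': '1101', ...}
print(cluster_features("ambassador"))      # unseen head, but shares the short prefixes
```

Shorter prefixes give coarser clusters and longer prefixes give finer ones, which is exactly the granularity question raised next.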

2. Problems
Problem 1: How to choose effective clusters?
▫ The Brown word hierarchy: where to cut?

2. Problems
Problem 2: Which lexical features should be augmented with clusters to improve generalization accuracy?
▫ Named entity recognition augments every token with its cluster; should relation extraction do the same?
▫ Relation instance: LeftContext M1 MidContext M2 RightContext; where to generalize?

3. Solutions and Experiments
3.1 Cluster Selection
Main idea
▫ Rank each prefix length (from 1 to the length of the longest bit string) with an importance measure
▫ Select a subset of lengths at which to cut the word hierarchy
  Typically select 3 or 4 prefix lengths to avoid commitment to a single cluster granularity

3.1 Cluster Selection
Importance measure 1: Information Gain (IG)
For a cluster feature f_i (the cluster prefix of length i) and relation classes C:
  IG(f_i) = H(C) - H(C | f_i)
H(C) is the prior entropy of the relation classes; H(C | f_i) is the posterior entropy given the values V of the feature, i.e. the average of H(C | v) weighted by P(v) over v in V.
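A small sketch of how information gain could be computed for one cluster feature, assuming the training data has already been reduced to (cluster-feature value, relation class) pairs; this follows the standard IG formula and is not the authors' code.

```python
import math
from collections import Counter, defaultdict

def information_gain(instances):
    """IG(f_i) = H(C) - H(C | f_i) for one cluster feature.
    instances: (cluster_feature_value, relation_class) pairs."""
    n = len(instances)
    class_counts = Counter(c for _, c in instances)
    prior = -sum((k / n) * math.log2(k / n) for k in class_counts.values())

    by_value = defaultdict(Counter)
    for v, c in instances:
        by_value[v][c] += 1
    posterior = 0.0
    for counts in by_value.values():
        nv = sum(counts.values())
        h = -sum((k / nv) * math.log2(k / nv) for k in counts.values())
        posterior += (nv / n) * h       # weight each feature value by P(v)
    return prior - posterior

pairs = [("1101", "EMP-ORG"), ("1101", "EMP-ORG"),
         ("1110", "PHYS"), ("1110", "GPE-AFF")]
print(information_gain(pairs))          # 1.0 for this toy example
```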

3.1 Cluster Selection
Importance measure 2: Prefix Coverage (PC)
  PC_i(f) = Count(f_i) / Count(f)
i is the prefix length; f is a lexical feature; f_i is the non-null cluster feature (length-i prefix) for that lexical feature; Count(·) is the number of occurrences.
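A corresponding sketch for prefix coverage, under the assumption that PC is computed per lexical feature as the fraction of its occurrences that receive a non-null length-i cluster prefix; the variable names are hypothetical.

```python
def prefix_coverage(head_words, brown_clusters, i):
    """PC_i for one lexical feature, e.g. the head of M2:
    Count(non-null length-i cluster feature) / Count(lexical feature).
    head_words lists one head per occurrence of the feature in training."""
    total = len(head_words)
    covered = sum(1 for w in head_words
                  if w in brown_clusters and len(brown_clusters[w]) >= i)
    return covered / total if total else 0.0

# Rank candidate prefix lengths by coverage and keep, say, the top 4:
# best = sorted(range(1, 21),
#               key=lambda i: prefix_coverage(heads, clusters, i),
#               reverse=True)[:4]
```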

3.1 Cluster Selection
Other measures for comparison
▫ Use All Prefixes (UA): include every prefix length, hoping that the underlying learning algorithm can assign proper weights
▫ Exhaustive Search (ES): try every possible subset of lengths and pick the one that works best
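For ES, a brief sketch of the exhaustive search over fixed-size subsets of prefix lengths; train_and_eval is a hypothetical callback that retrains the relation classifier with the chosen lengths and returns the development-set F score.

```python
from itertools import combinations

def exhaustive_search(candidate_lengths, train_and_eval, subset_size=3):
    """Try every subset of prefix lengths of a given size and keep the one
    with the best development-set F score."""
    best_f, best_subset = float("-inf"), None
    for subset in combinations(candidate_lengths, subset_size):
        f = train_and_eval(subset)
        if f > best_f:
            best_f, best_subset = f, subset
    return best_subset, best_f
```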

3.1 Cluster Selection
Experiment
▫ Setup
  348 ACE 2004 bnews and nwire documents
  70 documents for testing; the remaining 278 are split into training and development sets in a 7:3 ratio
  The development set is used to learn the best prefix lengths
  Only 3 or 4 lengths are chosen (to match prior work)
  For simplicity, only the head of each mention is augmented with clusters
  1,000 word clusters induced on the TDT5 corpus with the Brown algorithm
▫ Baseline
  Feature-based MaxEnt classification model
  A large feature set: the full set from Zhou et al. (2005), plus effective features from Zhao and Grishman (2005), Jiang and Zhai (2007), and others
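As a rough picture of the baseline, a feature-based MaxEnt model can be approximated by logistic regression over sparse feature dictionaries; the scikit-learn pipeline and toy features below are stand-ins, not the actual system or feature set used in the paper.

```python
# Minimal stand-in for a feature-based MaxEnt relation classifier.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_feats = [
    {"HM1": "US",   "HM2": "president", "WC_HM2_p4": "1101"},
    {"HM1": "base", "HM2": "Germany",   "WC_HM2_p4": "1110"},
]
train_labels = ["EMP-ORG", "PHYS"]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_feats, train_labels)

# An unseen head word can still match via its shared cluster feature.
print(model.predict([{"HM1": "US", "HM2": "ambassador", "WC_HM2_p4": "1101"}]))
```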

3.1 Cluster Selection
Experiment
▫ Effectiveness of cluster selection methods
[Table: F score, ΔF over the baseline, and training time in minutes for the Baseline, UA, PC, IG, and ES systems]

3.2 Effectiveness of cluster features
Explore cluster features in a systematic way
▫ Rank each lexical feature according to its importance
  Importance is based on linguistic intuition and on performance contributions reported in previous research
▫ Test the effectiveness of augmenting each lexical feature with word clusters, individually and incrementally

3.2 Effectiveness of cluster features
Importance of lexical features
▫ Simplify an instance into a 3-tuple (M1, M2, Context); within each part, distinguish the head word from the other words

Lexical Feature   Cluster Feature                  Importance
HM                HM1_WC, HM2_WC, HM12_WC          1
BagWM             BagWM_WC                         2
HC                HC_WC                            3
BagWC             BagWC_WC                         4
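The augmentation experiments can be pictured as adding cluster features only for selected lexical features. The sketch below uses the slide's feature shorthand (HM1, HM2, BagWM, …), but the augmentation scheme and names are illustrative assumptions, not the exact implementation.

```python
def augment(features, brown_clusters, targets=("HM1", "HM2"), prefix_lengths=(4, 6, 10)):
    """Add cluster features only for the selected lexical features."""
    out = dict(features)
    for name in targets:
        word = features.get(name)
        bits = brown_clusters.get(word) if word else None
        if bits:
            for i in prefix_lengths:
                out["%s_WC_p%d" % (name, i)] = bits[:i]
    return out

instance = {"HM1": "US", "HM2": "president", "BagWM": "the|last|US|president"}
print(augment(instance, {"president": "110100111"}))
```

Testing a lexical feature individually corresponds to calling augment with a single target; testing incrementally corresponds to growing the targets in order of the importance ranking above.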

3.2 Effectiveness of cluster features
Experiment
▫ Setup
  5-fold cross-validation
  PC4 was used to select effective clusters
▫ Performance

3.2 Effectiveness of cluster features
The impact of training size (augmenting mention heads only)
▫ Word cluster features sometimes allow a reduction in annotation

3.2 Effectiveness of cluster features
Performance of each individual relation class
▫ The five highlighted types share the same entity type GPE; PER-SOC holds only between PERSON and PERSON. This suggests that word clusters can also help distinguish between ambiguous relation types.
▫ No improvement for the PHYS relation? It is just too hard!

4. Conclusion
Main contributions
▫ Proposed a principled way of choosing clusters at an appropriate level of granularity
▫ Systematically explored the effectiveness of word cluster features for relation extraction
Future work
▫ Extend to phrase clustering (Lin and Wu, 2009) and pattern clustering (Sun and Grishman, 2010)

Thanks!