Cross-Domain Bootstrapping for Named Entity Recognition
Ang Sun, Ralph Grishman
New York University
July 28, 2011 — EOS Workshop, SIGIR 2011, Beijing

Outline
1. Named Entity Recognition (NER)
2. Domain Adaptation Problem for NER
3. Cross-Domain Bootstrapping
   3.1 Feature Generalization with Word Clusters
   3.2 Instance Selection Based on Multiple Criteria
4. Conclusion

1. Named Entity Recognition (NER)
- Two missions: identification (finding name boundaries) and classification (assigning each name a type such as PERSON, ORG, or GPE)
- Example: "U.S. Defense Secretary Donald H. Rumsfeld discussed the resolution …"

2. Domain Adaptation Problem for NER
- The NYU NER system performs well on in-domain data (F-measure 83.08)
- But it performs poorly on out-of-domain data (F-measure 65.09)
- Source domain (news articles): George Bush; Donald H. Rumsfeld; Department of Defense; …
- Target domain (reports on terrorism): Abdul Sattar al-Rishawi; Fahad bin Abdul Aziz bin Abdul Rahman Al-Saud; Al-Qaeda in Iraq; …

2. Domain Adaptation Problem for NER
1. No annotated data from the target domain
2. Many words are out-of-vocabulary
3. Naming conventions are different:
   - Length: short vs. long (source: George Bush, Donald H. Rumsfeld; target: Abdul Sattar al-Rishawi, Fahad bin Abdul Aziz bin Abdul Rahman Al-Saud)
   - Capitalization: weaker in the target domain
4. Name variation occurs often in the target domain: Shaikh, Shaykh, Sheikh, Sheik, …

We want to automatically adapt the source-domain tagger to the target domain without annotating target-domain data.

3. Cross-Domain Bootstrapping
1. Train a tagger from labeled source data
2. Tag all unlabeled target data with the current tagger
3. Select good tagged words and add them to the labeled data
4. Re-train the tagger
(Diagram: labeled source data → trained tagger → unlabeled target data → instance selection by multiple criteria, feeding back into the labeled data; feature generalization applies to the tagger; example selected instance: "President Assad".)
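The four steps above can be sketched as a generic self-training loop. This is a sketch only: `train`, `tag`, and `select_instances` are hypothetical stand-ins for the MEMM trainer and the multi-criteria selector described on the following slides.

```python
def bootstrap(labeled_source, unlabeled_target, train, tag, select_instances, rounds=5):
    """Generic cross-domain bootstrapping loop.

    train:            labeled data -> tagger
    tag:              (tagger, unlabeled data) -> tagged instances
    select_instances: tagged instances -> subset judged reliable
    """
    labeled = list(labeled_source)
    for _ in range(rounds):
        tagger = train(labeled)                  # 1. train on the current labeled set
        tagged = tag(tagger, unlabeled_target)   # 2. tag all unlabeled target data
        selected = select_instances(tagged)      # 3. keep good tagged instances
        if not selected:                         #    stop if nothing new is selected
            break
        labeled.extend(selected)                 # 4. add them and re-train next round
    return train(labeled)
```

The loop is agnostic to the underlying model; the cross-domain character comes entirely from how `select_instances` is implemented.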

3.1 Feature Generalization with Word Clusters
- The source model:
  - a sequential model, assigning name classes to a sequence of tokens
  - each name type is split into two classes, e.g. B_PER (beginning of PERSON) and I_PER (continuation of PERSON)
  - a Maximum Entropy Markov Model (McCallum et al., 2000)
  - customary features
- Example:

  Token: U.S.  | Defense | Secretary | Donald | H.    | Rumsfeld
  Class: B_GPE | B_ORG   | O         | B_PER  | I_PER | I_PER
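The B_/I_/O class split can be illustrated with a small helper. This is a sketch: `spans_to_classes` and the exclusive-end span format are assumptions for illustration, not the paper's code.

```python
def spans_to_classes(tokens, spans):
    """Convert (start, end, type) name spans into per-token B_/I_/O classes.

    `end` is exclusive; tokens outside any span get the not-a-name class "O".
    """
    classes = ["O"] * len(tokens)
    for start, end, ntype in spans:
        classes[start] = "B_" + ntype            # beginning of the name
        for i in range(start + 1, end):
            classes[i] = "I_" + ntype            # continuation of the name
    return classes

tokens = ["U.S.", "Defense", "Secretary", "Donald", "H.", "Rumsfeld"]
spans = [(0, 1, "GPE"), (1, 2, "ORG"), (3, 6, "PER")]
print(spans_to_classes(tokens, spans))
# ['B_GPE', 'B_ORG', 'O', 'B_PER', 'I_PER', 'I_PER']
```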

3.1 Feature Generalization with Word Clusters
- The source/seed model
- Customary features, extracted from the context window (t_i-2, t_i-1, t_i, t_i+1, t_i+2)
- Example features for "Donald" in "U.S. Defense Secretary Donald H. Rumsfeld":

  currentToken            = Donald
  wordType_currentToken   = initial_capitalized
  previousToken_-1        = Secretary
  previousToken_-1_class  = O
  previousToken_-2        = Defense
  nextToken_+1            = H.
  …
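A minimal sketch of extracting such window features. The feature names follow the slide; `token_features` itself and the `<PAD>` boundary convention are invented for illustration.

```python
def token_features(tokens, i, prev_classes):
    """Features from the window (t[i-2], t[i-1], t[i], t[i+1], t[i+2])."""
    def tok(j):
        # pad the window at sentence boundaries
        return tokens[j] if 0 <= j < len(tokens) else "<PAD>"

    feats = {
        "currentToken": tok(i),
        "wordType_currentToken": "initial_capitalized" if tok(i)[:1].isupper() else "other",
        "previousToken_-1": tok(i - 1),
        "previousToken_-2": tok(i - 2),
        "nextToken_+1": tok(i + 1),
        "nextToken_+2": tok(i + 2),
    }
    if i > 0:
        # in an MEMM, the previously assigned class is itself a feature
        feats["previousToken_-1_class"] = prev_classes[i - 1]
    return feats

tokens = ["U.S.", "Defense", "Secretary", "Donald", "H.", "Rumsfeld"]
f = token_features(tokens, 3, ["B_GPE", "B_ORG", "O"])
# f["currentToken"] == "Donald", f["previousToken_-1"] == "Secretary"
```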

3.1 Feature Generalization with Word Clusters
- Build a word hierarchy from a 10M-word corpus (source + target), using the Brown word clustering algorithm
- Represent each word as a bit string
- Example clusters (each cluster shares a bit-string prefix in the hierarchy):
  - John, James, Mike, Steven
  - Abdul, Mustafa, Abi, Abdel
  - Shaikh, Shaykh, Sheikh, Sheik
  - Qaeda, Qaida, qaeda, QAEDA
  - FBI, FDA, NYPD
  - Taliban

3.1 Feature Generalization with Word Clusters
- Add an additional layer of features that includes word clusters:
  - currentToken = John
  - currentPrefix3 = 100 (also fires for target words in the same cluster)
- To avoid commitment to a single cluster, cut the word hierarchy at different levels
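Cutting the hierarchy at different levels amounts to adding bit-string prefix features of several lengths. A sketch, with invented bit strings (real ones come from the Brown clustering step):

```python
def cluster_prefix_features(word, clusters, lengths=(3, 6, 10)):
    """Add word-cluster prefix features at multiple cut levels of the hierarchy."""
    feats = {"currentToken": word}
    bits = clusters.get(word)
    if bits is not None:
        for n in lengths:
            # a length-n prefix identifies the cluster at depth n
            feats["currentPrefix%d" % n] = bits[:n]
    return feats

# hypothetical bit strings for illustration only
clusters = {"John": "1000110", "Abdul": "1000111"}
print(cluster_prefix_features("John", clusters)["currentPrefix3"])   # 100
print(cluster_prefix_features("Abdul", clusters)["currentPrefix3"])  # 100
```

Because "John" (frequent in the source) and "Abdul" (frequent in the target) share a prefix, a feature learned from source data can fire on target words the tagger has never seen labeled.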

3.1 Feature Generalization with Word Clusters  Performance on the target domain  Test set contains 23K tokens  PERSON/ORGANIZATION/GPE 771/585/559 instances  All other tokens belong to not-a-name class  4 points improvement of F-measure NYU

3.2 Instance Selection Based on Multiple Criteria  Single-domain bootstrapping uses a confidence measure as the single selection criterion  In a cross-domain setting, the most confidently labeled instances  are highly correlated with the source domain  contain little information about the target domain.  We propose multiple criteria  Criterion 1: Novelty– prefer target-specific instances  Promote Abdul instead of John NYU

3.2 Instance Selection Based on Multiple Criteria  Criterion 2: Confidence - prefer confidently labeled instances  Local confidence: based on local features NYU

3.2 Instance Selection Based on Multiple Criteria  Criterion 2: Confidence  Global confidence: based on corpus statistics NYU 1PrimeMinisterAbdulKarimKabaritiPER 2warlordGeneralAbdulRashidDostumPER 3PresidentA.P.J.AbdulKalamwillPER 4PresidentA.P.J.AbdulKalamhasPER 5AbdullahbinAbdulAziz,PER 6atKingAbdulAzizUniversityORG 7NawabMohammedAbdulAli,PER 8DrAliAbdulAzizAlPER 9NayefbinAbdulAzizsaidPER 10leaderGeneralAbdulRashidDostumPER

3.2 Instance Selection Based on Multiple Criteria  Criterion 2: Confidence  Global confidence  Combined confidence: product of local and global confidence NYU

3.2 Instance Selection Based on Multiple Criteria  Criterion 3: Density - prefer representative instances which can be seen as centroid instances NYU

3.2 Instance Selection Based on Multiple Criteria  Criterion 4: Diversity - prefer a set of diverse instances instead of similar instances  “, said * in his”  Highly confident instance  High density, representative instance  BUT, continuing to promote such instance would not gain additional benefit NYU

3.2 Instance Selection Based on Multiple Criteria  Putting all criteria together 1.Novelty: filter out source-dependent instances 2.Confidence: rank instances based on confidence and the top ranked instances will be used to generate a candidate set 3.Density: rank instances in the candidate set in descending order of density 4.Diversity: 1.accepts the first instance (with the highest density) in the candidate set 2.and selects other candidates based on the diff measure. NYU

3.2 Instance Selection Based on Multiple Criteria  Results NYU

4. Conclusion  Proposed a general cross-domain bootstrapping algorithm for adapting a model trained only on a source domain to a target domain  Improved the source model’s F score by around 7 points  This is achieved 1.without using any annotated data from the target domain 2.without explicitly encoding any target-domain-specific knowledge into our system  The improvement is largely due to 1.the feature generalization of the source model with word clusters 2.the multi-criteria-based instance selection method NYU