Relational Entity Linking with Cross Document Coreference. Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth, University of Illinois at Urbana-Champaign.

Presentation transcript:

Relational Entity Linking with Cross Document Coreference
Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth
University of Illinois at Urbana-Champaign (UI_CCG)

Talk Outline
- Introduction
  - Architecture
- Entity Linking Approach
  - Preprocessing
  - Wikification
    - Formulation
    - Relational Analysis
  - Cross Document Coreference
  - Reconciliation
- Evaluation

Entity Linking Specification
Query: bolt-eng-DF, Lightning Bolts
Output:
query_id        link_id
EL13_ENG_0015   NIL0006
EL13_ENG_0016   E
…
EL13_ENG_0821   NIL0006

Entity Linking using Wikification and Cross-Doc Coref
query_id        link_id
EL13_ENG_0015   NIL0006
EL13_ENG_0016   E
…
EL13_ENG_0821   NIL0006
…
EL13_ENG_0937   NIL0288
…
EL13_ENG_1914   NIL0288
Cross Document Coreference

Wikification
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

Wikification Challenges
- Ambiguity
- Concepts outside of KB (NIL)
  - Blumenthal?
- Variability
- Scale
  - Millions of labels
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Callouts: Connecticut, CT, The Nutmeg State; Times, The New York Times, The Times

Key Innovation
- Improved Wikification for Structured EL
  - Relational Inference for Linking (Cheng and Roth, EMNLP’13)
  - No retraining
- Non-trivial cross-document clustering
  - Best Latent Left-Linking approach (Samdani et al. ’12)


Entity Linking Architecture
[Figure: architecture diagram. Recoverable labels: TAC Query; Preprocessing (Query Normalization, Document Transformation, Purposeful Coreference); Linking Problem; Linking (Wikification, Cross-Doc Coreference, Supervise); Reconcile Linking Clusters.]


Preprocessing: Query normalization
- Handling spelling mistakes and slang: one of the reasons we did not achieve expected performance
- In-document coreference: some coreferent mentions are easier to link than the query mention
Examples: Obomber, Obamadinejad, Osama Obama, Nobama, Obambi, Obamination, ObaMao, Owe Bama, 0bama, O-balm-a, O-bomb-a

Preprocessing: Document transformation
- A document can be as long as 100k characters for a single query
- Need to truncate documents while minimizing the loss of critical context
[Figure labels: Original; Opening; Query Context; Coreferent Context]
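
The truncation idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system's implementation: the function name `truncate_document`, the character budgets, and the way mention offsets are passed in are all hypothetical. The sketch keeps the document opening plus a window around the query mention and around each coreferent mention, merging windows that overlap.

```python
def truncate_document(text, query_span, coref_spans=(), opening=200, window=100):
    """Keep the opening plus windows around the query and coreferent mentions.

    All parameters are hypothetical: spans are (start, end) character offsets,
    `opening` and `window` are character budgets chosen for illustration.
    """
    # Regions to keep: the document opening, then a window around each mention.
    regions = [(0, min(opening, len(text)))]
    for start, end in [query_span, *coref_spans]:
        regions.append((max(0, start - window), min(len(text), end + window)))

    # Merge overlapping or touching regions so no text is duplicated.
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Join the kept regions, marking each cut with an ellipsis.
    return " … ".join(text[s:e] for s, e in merged)


# Toy document: a query mention near the top, a coreferent mention far below.
doc = "A" * 50 + "QUERY" + "B" * 1000 + "COREF" + "C" * 50
short = truncate_document(doc, (50, 55), [(1055, 1060)], opening=10, window=5)
```

In a real setting the windows would more plausibly be sentence- or token-based, and the budgets tuned to the downstream Wikifier.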


Wikification Bottleneck
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
- State-of-the-art Wikification systems (Ratinov et al. 2011) can achieve the above with local and global statistical features
  - Reaches bottleneck around 70%~85% F1 on non-wiki datasets
  - What is missing?

Motivating Example
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
- What are we missing with Bag of Words (BOW) models?
  - Who is Mubarak?
- Constraining interaction between concepts
  - (Mubarak, wife, Hosni Mubarak)

Relational Inference for Wikification
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
(Mubarak, wife, Hosni Mubarak)
- Our contribution
  - Identify key textual relations for Wikification
  - A global inference framework to incorporate relational knowledge
- Significant improvement over state-of-the-art Wikification systems

Traditional Wikification Pipeline
1. Mention Segmentation
2. Candidate Generation
3. Candidate Ranking
4. Determine NILs (NIL Linking)

Traditional Wikification 1 - Mention Segmentation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
- Sub noun phrase chunks
- NER
- Capitalized phrases

Traditional Wikification 1 - Mention Segmentation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
- Obtains nested mentions

Traditional Wikification 2 - Candidate Generation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
- Approach
  - Collect known mappings from Wikipedia page titles, hyperlinks…
  - Limit to top-K candidates based on frequency of links (Ratinov et al. 2011)
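
A minimal sketch of frequency-based candidate generation follows. It assumes the known mappings are available as (anchor text, target title) pairs. The function names, the input format, and the link counts are hypothetical and chosen only to illustrate the top-K cut-off.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Count how often each surface string links to each Wikipedia title.

    `links` is an iterable of (anchor_text, target_title) pairs, e.g. harvested
    from hyperlinks and page titles; this input format is an assumption.
    """
    index = defaultdict(Counter)
    for anchor, title in links:
        index[anchor.lower()][title] += 1
    return index


def top_k_candidates(index, mention, k=3):
    """Return up to k (title, prior) pairs, most frequently linked first."""
    counts = index.get(mention.lower(), Counter())
    total = sum(counts.values())
    return [(title, n / total) for title, n in counts.most_common(k)]


# Invented toy counts, not real Wikipedia statistics.
toy_links = (
    [("Socialist Party", "Socialist Party (France)")] * 5
    + [("Socialist Party", "Socialist Party of America")] * 3
    + [("Socialist Party", "Socialist Party of Serbia")] * 2
)
toy_index = build_anchor_index(toy_links)
candidates = top_k_candidates(toy_index, "Socialist Party", k=2)
```

With `k=2` the Serbian party falls outside the candidate list in this toy data, so a later ranking step has no way to select it.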

Traditional Wikification 3 - Candidate Ranking
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
- Local and global statistical features

Traditional Wikification 4 - Determine NILs
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
- Is the top candidate really what the text referred to?
  - Binary classifier
- This answer is wrong
  - We did not generate the correct candidate based on top-K prior


Formulation (0)
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
(Mubarak, wife, Hosni Mubarak)
- Intuition
  - Promote pairs of candidate concepts coherent with textual relations

Formulation (1)
- Formulate as an Integer Linear Program (ILP)
- If no relation exists, collapse to the unstructured decision
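
The objective on this slide was an image and is not preserved in the transcript. One form consistent with the symbols listed on the Formulation (2) slide (e, s, r, w) is given here as an assumption, not as the recovered formula:

```latex
\begin{aligned}
\max_{e,\,r}\quad & \sum_{i}\sum_{k} s_i^k\, e_i^k \;+\; \sum_{i,j}\sum_{k,l} w_{ij}^{(k,l)}\, r_{ij}^{(k,l)} \\
\text{s.t.}\quad & e_i^k \in \{0,1\}, \qquad r_{ij}^{(k,l)} \in \{0,1\}, \\
& \sum_{k} e_i^k = 1 \quad \forall i, \\
& 2\, r_{ij}^{(k,l)} \le e_i^k + e_j^l \quad \forall i,j,k,l .
\end{aligned}
```

Under this reading, exactly one candidate k is chosen per mention i, and a relation variable can be switched on only if both of its endpoint candidates are chosen. When no relation exists the second sum is empty, so each mention simply takes its highest-scoring candidate, which is the unstructured decision.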

Formulation (2)
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
- e_i^k: whether a concept is chosen
- s_i^k: score of a concept
- r_ij^(k,l): whether a relation is present
- w_ij^(k,l): score of a relation
[Figure labels: r_34^(1,2), r_34^(4,3)]
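
To make these variables concrete, here is a toy solver that enumerates all assignments instead of calling an ILP solver. It assumes one candidate is chosen per mention and that a relation bonus counts only when both of its endpoint candidates are chosen. The function name, data layout, and scores are all invented for illustration.

```python
from itertools import product


def relational_inference(scores, relation_weights):
    """Exhaustively pick one candidate per mention, maximizing the total score.

    scores[i][k]                    -> s_i^k, local score of candidate k for mention i
    relation_weights[(i, k, j, l)]  -> w_ij^(k,l), bonus when candidates k and l are both chosen
    Exhaustive search stands in for a real ILP solver and only suits toy inputs.
    """
    best_value, best_choice = float("-inf"), None
    for choice in product(*(range(len(s)) for s in scores)):
        value = sum(scores[i][k] for i, k in enumerate(choice))
        # A relation variable can be 1 only if both endpoint candidates are selected.
        value += sum(
            w for (i, k, j, l), w in relation_weights.items()
            if w > 0 and choice[i] == k and choice[j] == l
        )
        if value > best_value:
            best_value, best_choice = value, choice
    return best_choice


# Mention 0: "Slobodan Milošević" with a single candidate.
# Mention 1: "Socialist Party" with candidates [France, Serbia]; scores are invented.
toy_scores = [[0.9], [0.6, 0.3]]
toy_relations = {(0, 0, 1, 1): 0.5}  # Milošević is related to the Serbian party

without_relations = relational_inference(toy_scores, {})
with_relations = relational_inference(toy_scores, toy_relations)
```

Without relations the locally best candidate (index 0, France) wins. With the relation bonus the coherent pair scores 0.9 + 0.3 + 0.5 = 1.7 against 1.5, so the Serbian party (index 1) is selected.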


Overall Approach
- Relational Wikification
  - Candidate Generation
  - Candidate Ranking
  - Determine NILs
- Relation Analysis
  - Relation Identification
  - Relation Retrieval
  - Relational Inference

Relation Identification
- ACE style in-document coreference (Chang et al. ‘13)
  - Extract named entity-only coreference relations with high precision
- Syntactico-Semantic relations (Chan & Roth ‘10)
  - Easy to extract with high precision
  - Aim for high recall, as false-positives will be filtered
  - Sparse, but covers ~80% relation instances in ACE
Type          Example
Premodifier   Iranian Ministry of Defense
Possessive    NYC’s stock exchange
Formulaic     Chicago, Illinois
Preposition   President of the US
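
As a rough illustration of how cheaply some of these relation types can be spotted, the sketch below matches three of the four types with regular expressions. It is not the extractor of Chan & Roth: the patterns, the function name, and the assumption that the input is a short noun phrase (not a full sentence) are all mine.

```python
import re

# Toy patterns for three of the four types in the table; premodifier relations
# need a chunker or parser and are left out. Inputs are assumed to be short
# noun phrases, not full sentences.
PATTERNS = [
    ("possessive", re.compile(r"^(?P<a>.+?)['’]s (?P<b>.+)$")),
    ("formulaic", re.compile(r"^(?P<a>[^,]+), (?P<b>[^,]+)$")),
    ("preposition", re.compile(r"^(?P<a>.+?) of (?:the )?(?P<b>.+)$")),
]


def identify_relation(phrase):
    """Return (arg1, relation_type, arg2) for the first matching pattern, else None."""
    for rel_type, pattern in PATTERNS:
        match = pattern.match(phrase)
        if match:
            return (match.group("a"), rel_type, match.group("b"))
    return None
```

These patterns would also tag "Iranian Ministry of Defense" as a preposition relation, although the table lists it as a premodifier example. That is one reason surface patterns this crude are only illustrative.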

Relation Identification
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Argument 1           Relation Type   Argument 2
Yugoslav President   apposition      Slobodan Milošević
Slobodan Milošević   coreference     Milošević
Milošević            possessive      Socialist Party


Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
- What concepts can “Socialist Party” refer to?
- More robust candidate generation
  - Identified relations are verified against a knowledge base (DBPedia)

Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
- Query Pruning
  - Only 2 queries per pair necessary due to strong baseline.
q1 = (Socialist Party of France, ?, *Milošević*)
q2 = (Slobodan Milošević, ?, *Socialist Party*)
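
The two pruned queries can be sketched as below. Each query fixes the top-ranked title of one mention and matches the other mention's surface string as a wildcard. The in-memory triple list, the predicate name `party`, and both function names are hypothetical stand-ins for a real DBPedia lookup.

```python
def pruned_queries(mention_a, top_a, mention_b, top_b):
    """Build the two (fixed title, surface pattern) queries for a mention pair.

    Each query fixes the top-ranked title of one mention and matches the other
    mention's surface form as a substring; names and shapes are illustrative.
    """
    return [(top_a, mention_b), (top_b, mention_a)]


def retrieve(triples, queries):
    """Return KB triples (subj, pred, obj) whose subject equals the fixed title
    and whose object contains the surface pattern."""
    hits = []
    for fixed, pattern in queries:
        for subj, pred, obj in triples:
            if subj == fixed and pattern in obj:
                hits.append((subj, pred, obj))
    return hits


# Hypothetical toy triples standing in for a knowledge base.
toy_kb = [
    ("Slobodan Milošević", "party", "Socialist Party of Serbia"),
    ("Lionel Jospin", "party", "Socialist Party of France"),
]
toy_queries = pruned_queries(
    "Socialist Party", "Socialist Party of France",
    "Milošević", "Slobodan Milošević",
)
toy_hits = retrieve(toy_kb, toy_queries)
```

In this toy data only the second query finds a match, which surfaces "Socialist Party of Serbia" as a candidate that a frequency-based top-K list might have missed.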

Relation Retrieval

Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…


Relational Inference - coreference
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Determine unknown concepts (NILs)
Dorothy Byrne, a state coordinator for the Florida Green Party,… [callout: nominal mention]
- How to capture the fact:
  - “Dorothy Byrne” does not refer to any concept in Wikipedia
- Identify coreferent nominal mention relations
  - Generate better features for NIL classifier

Determine unknown concepts (NILs)
Dorothy Byrne, a state coordinator for the Florida Green Party,… [callout: nominal mention]
- Create NIL candidate for structured inference
  - e.g. corrects other coreferent “Dorothy” later in the document


Cross Document Coreference
- NILs can be viewed as KB entries with partial information
  - A uniform model for entity representation
  - Shared features with Entity Linking system
  - Can be supervised using existing EL systems
- Cross document coreference cluster example:
  - Naomi Campbell to give evidence at Charles Taylor trial: spokeswoman.
  - Supermodel Campbell says 'nothing to gain' from Taylor trial testimony.

Cross Document Coreference Approach
- Run document-level coreference
- Aggregate all features in a document-level coreferent cluster
- Use both mention-level features and document-level features
  - String similarity features (NESim, Do et al. ‘09)
  - Context TF-IDF similarity features
  - Document-level cluster features
- Training: using both TAC data and Wikifier generated data
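
A greedy left-linking clustering pass over document-level entities might look like the sketch below. It illustrates the general "link to the best earlier item or start a new cluster" idea only. The pairwise scorer is a hand-written stand-in (character similarity plus context word overlap), not NESim, TF-IDF, or a trained model, and the threshold is arbitrary.

```python
from difflib import SequenceMatcher


def pair_score(a, b):
    """Average of name string similarity and context word overlap (Jaccard).

    A stand-in for a learned pairwise model; the features named on the slide
    are not reproduced here.
    """
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    words_a = set(a["context"].lower().split())
    words_b = set(b["context"].lower().split())
    union = words_a | words_b
    context_sim = len(words_a & words_b) / len(union) if union else 0.0
    return 0.5 * name_sim + 0.5 * context_sim


def best_left_link(items, threshold=0.4):
    """Assign cluster ids: each item joins the cluster of its best-scoring
    predecessor, or starts a new cluster if none scores above the threshold."""
    cluster_ids = []
    for i, item in enumerate(items):
        best_j, best = None, threshold
        for j in range(i):
            score = pair_score(items[j], item)
            if score > best:
                best_j, best = j, score
        if best_j is not None:
            cluster_ids.append(cluster_ids[best_j])
        else:
            cluster_ids.append(len(set(cluster_ids)))  # next unused cluster id
    return cluster_ids


toy_entities = [
    {"name": "Naomi Campbell", "context": "to give evidence at Charles Taylor trial"},
    {"name": "Campbell", "context": "supermodel says nothing to gain from Taylor trial testimony"},
    {"name": "Dorothy Byrne", "context": "state coordinator for the Florida Green Party"},
]
toy_clusters = best_left_link(toy_entities)
```

On this toy input the two Campbell entries end up in one cluster and Dorothy Byrne in her own. In a trained system the scorer and threshold would come from data, not from hand-set constants.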



Query mapping reconciliation
Candidate mappings (score): Seattle (0.7); Seattle Seahawks (0.8); Seattle (0.2)
Mentions:
- [Seattle] has won…
- [Seattle] Seahawks ended the game…
- … cheered for [Seattle]…
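
The slide does not state how conflicting mappings are reconciled, so the sketch below is purely an assumption. It shows two plausible rules (summing scores per title, or trusting the single most confident mapping) and that they disagree on the slide's own numbers. The function name and the `strategy` parameter are hypothetical.

```python
from collections import defaultdict


def reconcile(mappings, strategy="sum"):
    """Pick one title for a set of coreferent mentions given (title, score) pairs.

    Both strategies are assumptions for illustration: "sum" adds the scores per
    title, "max" keeps the single most confident mapping.
    """
    if strategy == "max":
        return max(mappings, key=lambda pair: pair[1])[0]
    totals = defaultdict(float)
    for title, score in mappings:
        totals[title] += score
    return max(totals, key=totals.get)


slide_mappings = [("Seattle", 0.7), ("Seattle Seahawks", 0.8), ("Seattle", 0.2)]
by_sum = reconcile(slide_mappings, strategy="sum")
by_max = reconcile(slide_mappings, strategy="max")
```

Summing favors "Seattle" (about 0.9 in total against 0.8), while the max rule favors "Seattle Seahawks". Which of these, if either, matches the system's behavior is not recoverable from the transcript.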


Evaluation – TAC KBP 2011 Entity Linking
- Run Relational Inference (RI) Wikifier “as-is”:
  - No retraining using TAC data
*Median of top 14 systems

Evaluation – TAC 2012 Entity Linking Error Analysis

Official 2013 Performance

Official 2013 Performance Break-down: Link Type

Official 2013 Performance Break-down: Doc domain

Official 2013 Performance Break-down: NER type

Conclusion
- Importance of linguistic and world knowledge
- Identification of relational information benefits Wikification and Entity Linking
- Future work
  - Robust preprocessing on noisy input/adapt to EL task requirement
  - “Self-supervision” on NIL clustering
  - Unified NIL and KB entity representation
  - Joint entity typing, coreference and disambiguation
  - Incorporate more relations
Demo: Download:
Thank you!

Back up slides

Applications
- Knowledge Acquisition via Grounding
- Coreference Resolution
  - Learning-based multi-sieve co-reference resolution with knowledge (Ratinov et al. 2012)
- Information Extraction
  - Unsupervised relation discovery with sense disambiguation (Yao et al. 2012)
  - Automatic Event Extraction with Structured Preference Modeling (Lu and Roth, 2012)
- Text Classification
  - Gabrilovich and Markovitch, 2007; Chang et al.,

Wikification Performance Result