Relational Entity Linking with Cross Document Coreference Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth University of Illinois at Urbana-Champaign (UI_CCG) 1
Talk Outline Introduction Architecture Entity Linking Approach Preprocessing Wikification Formulation Relational Analysis Cross Document Coreference Reconciliation Evaluation 2
Entity Linking Specification 3
Query: bolt-eng-DF Lightning Bolts
Output:
query_id        link_id
EL13_ENG_0015   NIL0006
EL13_ENG_0016   E …
…
EL13_ENG_0821   NIL0006
Entity Linking using Wikification and Cross-Doc Coref 4
query_id        link_id
EL13_ENG_0015   NIL0006
EL13_ENG_0016   E …
…
EL13_ENG_0821   NIL0006
…
EL13_ENG_0937   NIL0288
…
EL13_ENG_1914   NIL0288
Cross Document Coreference
Wikification 5 Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Wikification Challenges 6
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Ambiguity: Times (The New York Times, The Times)
Variability: Connecticut, CT, The Nutmeg State
Concepts outside of KB (NIL): Blumenthal ?
Scale: Millions of labels
Key Innovation Improved Wikification for Structured EL Relational Inference for Linking (Cheng and Roth, EMNLP’13) No retraining Non-trivial cross-document clustering Best Latent Left-Linking approach (Samdani et al. ’12) 7
Talk Outline Introduction Architecture Entity Linking Approach Preprocessing Wikification Formulation Relational Analysis Cross Document Coreference Evaluation 8
Entity Linking Architecture 9
[Architecture diagram. Box labels: TAC Query, Preprocessing, Query Normalization, Document Transformation, Purposeful Coreference, Linking, Wikification, Cross-Doc Coreference, Supervise, Linking Problem, Reconcile Linking Clusters]
Talk Outline Introduction Architecture Entity Linking Approach Preprocessing Wikification Formulation Relational Analysis Cross Document Coreference Evaluation 10
Preprocessing 11
Query normalization
  Handling spelling mistakes and slang, one of the reasons we did not achieve the expected performance
  Examples: Obomber, Obamadinejad, Osama Obama, Nobama, Obambi, Obamination, ObaMao, Owe Bama, 0bama, O-balm-a, O-bomb-a
In-document coreference
  Some coreferent mentions are easier to link than the query mention
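A minimal sketch of one way to normalize a misspelled query, assuming a list of known names is available. The function name normalize_query, the toy name list, and the 0.75 cutoff are illustrative assumptions; the talk does not say how normalization was implemented. It uses fuzzy string matching from Python's standard difflib module.

```python
import difflib

def normalize_query(query, known_names, cutoff=0.75):
    """Return the closest known name for a noisy query, or the query unchanged."""
    by_lower = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else query

# Toy name list (hypothetical, not from the talk).
known = ["Obama", "Romney", "Clinton"]
print(normalize_query("0bama", known))    # near-miss spelling maps to a known name
print(normalize_query("Seattle", known))  # no close match, query is kept as-is
```

Character-level matching of this kind only catches near misses such as 0bama or Nobama. Longer blends such as Obamadinejad fall below the cutoff, which is consistent with the slide's remark that slang remained a source of error.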
Preprocessing 12
Document transformation
  A document can be as long as 100k characters for a single query
  Need to truncate documents while minimizing the loss of critical context
[Figure labels: Original, Opening, Query Context, Coreferent Context]
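The truncation idea can be sketched as follows, assuming the figure labels mean that the document's opening, the text around the query mention, and the text around coreferent mentions are kept. The function name truncate_document and the parameters opening, window and budget are hypothetical; the talk gives no sizes or priorities.

```python
def truncate_document(text, query, coref_mentions=(),
                      opening=200, window=100, budget=2000):
    """Keep the opening, then windows around the query and coreferent mentions."""
    # Candidate spans in priority order: opening, query context, coreferent context.
    spans = [(0, min(opening, len(text)))]
    for mention in (query, *coref_mentions):
        start = text.find(mention) if mention else -1
        while start != -1:
            spans.append((max(0, start - window),
                          min(len(text), start + len(mention) + window)))
            start = text.find(mention, start + 1)
    # Accept spans by priority while they fit in the character budget.
    kept, used = [], 0
    for s, e in spans:
        if used + (e - s) <= budget:
            kept.append((s, e))
            used += e - s
    # Merge overlapping spans and emit the pieces in document order.
    merged = []
    for s, e in sorted(kept):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return " ... ".join(text[s:e] for s, e in merged)
```

Ordering the spans by priority before applying the budget means the query context is kept ahead of coreferent contexts when the document is too long to keep both.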
Talk Outline Introduction Architecture Entity Linking Approach Preprocessing Wikification Formulation Relational Analysis Cross Document Coreference Reconciliation Evaluation 13
Wikification Bottleneck 14
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
State-of-the-art Wikification systems (Ratinov et al. 2011) can achieve the above with local and global statistical features
They reach a bottleneck at around 70% to 85% F1 on non-wiki datasets
What is missing?
Motivating Example 15
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
What are we missing with Bag of Words (BOW) models? Who is Mubarak?
BOW view: Mubarak, wife, Egyptian President, Hosni Mubarak (plus: the, of, deposed, …)
Constraining interaction between concepts: (Mubarak, wife, Hosni Mubarak)
Relational Inference for Wikification 16 Our contribution Identify key textual relations for Wikification A global inference framework to incorporate relational knowledge Significant improvement over state-of-the-art Wikification systems Mubarak, the wife of deposed Egyptian President Hosni Mubarak, … (Mubarak, wife, Hosni Mubarak)
Traditional Wikification Pipeline 17
Mention Segmentation → Candidate Generation → Candidate Ranking → Determine NILs (NIL Linking)
Traditional Wikification 1 - Mention Segmentation 18
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Sources of mentions: sub noun phrase chunks, NER, capitalized phrases
Traditional Wikification 1 - Mention Segmentation 19
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Obtains nested mentions
Traditional Wikification 2 - Candidate Generation 20
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Approach
  Collect known mappings from Wikipedia page titles, hyperlinks…
  Limit to top-K candidates based on frequency of links (Ratinov et al. 2011)
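A small sketch of frequency-based candidate generation, assuming the known mappings are available as (surface form, page title) pairs. The function names and the link counts are made up for illustration and do not come from Wikipedia or from the system described in the talk.

```python
from collections import Counter, defaultdict

def build_anchor_index(links):
    """Count how often each surface form links to each page title."""
    index = defaultdict(Counter)
    for surface, title in links:
        index[surface.lower()][title] += 1
    return index

def generate_candidates(index, mention, k=3):
    """Return the top-k titles for a mention with their link-frequency prior."""
    counts = index.get(mention.lower(), Counter())
    total = sum(counts.values())
    return [(title, n / total) for title, n in counts.most_common(k)]

# Invented link counts, for illustration only.
links = (
    [("Socialist Party", "Socialist Party (France)")] * 5
    + [("Socialist Party", "Socialist Party of Serbia")] * 2
    + [("Socialist Party", "Socialist Party of America")] * 1
)
index = build_anchor_index(links)
print(generate_candidates(index, "Socialist Party", k=2))
```

With a small K, a correct but infrequent candidate can fall outside the list, so the ranker never sees it.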
Traditional Wikification 3 - Candidate Ranking 21
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Local and global statistical features
Traditional Wikification 4 – Determine NILs 22
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Is the top candidate really what the text referred to?
A binary classifier decides, covering two failure cases:
  The top-ranked answer is wrong
  The correct candidate was never generated under the top-K prior
Talk Outline Introduction Architecture Entity Linking Approach Preprocessing Wikification Formulation Relational Analysis Cross Document Coreference Reconciliation Evaluation 23
Formulation (0) Intuition Promote pairs of candidate concepts coherent with textual relations 24 Mubarak, the wife of deposed Egyptian President Hosni Mubarak, … (Mubarak, wife, Hosni Mubarak)
Formulation (1) 25
Formulate as an Integer Linear Program (ILP)
If no relation exists, the problem collapses to the unstructured decision
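The equation on this slide did not survive extraction. One ILP consistent with the quantities the talk names (a binary indicator e_i^k that candidate k is chosen for mention i, its score s_i^k, a binary indicator r_{ij}^{(k,l)} that a relation holds between candidate k of mention i and candidate l of mention j, and its score w_{ij}^{(k,l)}) is sketched below. The exact form of the constraints is an assumption.

```latex
\begin{aligned}
\max_{e,\,r}\quad & \sum_{i}\sum_{k} s_i^k\, e_i^k \;+\; \sum_{i,j}\sum_{k,l} w_{ij}^{(k,l)}\, r_{ij}^{(k,l)} \\
\text{s.t.}\quad & e_i^k \in \{0,1\}, \qquad r_{ij}^{(k,l)} \in \{0,1\}, \\
& \sum_{k} e_i^k = 1 \quad \forall i, \\
& 2\, r_{ij}^{(k,l)} \le e_i^k + e_j^l \quad \forall i,j,k,l.
\end{aligned}
```

When no relation variables exist, the second sum vanishes and each mention independently takes its highest-scoring candidate, which is the unstructured decision.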
Formulation (2) 26
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
e_i^k: whether a concept is chosen
s_i^k: score of a concept
r_{ij}^{(k,l)}: whether a relation is present
w_{ij}^{(k,l)}: score of a relation
Relation variables shown in the figure: r_{34}^{(1,2)}, r_{34}^{(4,3)}
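To make the roles of these variables concrete, here is a brute-force sketch that enumerates every joint assignment instead of calling an ILP solver. It assumes one concept per mention and non-negative relation scores, so a relation is counted whenever both of its concepts are chosen. All scores below are invented for illustration, and the function name relational_inference is hypothetical.

```python
from itertools import product

def relational_inference(scores, relations):
    """Pick one concept per mention, maximizing concept plus relation scores.

    scores[mention][concept]  plays the role of s_i^k.
    relations[(m1, c1, m2, c2)] plays the role of w_ij^(k,l).
    """
    mentions = sorted(scores)
    best, best_value = None, float("-inf")
    for choice in product(*(scores[m] for m in mentions)):
        assign = dict(zip(mentions, choice))
        value = sum(scores[m][c] for m, c in assign.items())
        # A relation contributes only if both of its concepts are selected.
        value += sum(w for (m1, c1, m2, c2), w in relations.items()
                     if assign[m1] == c1 and assign[m2] == c2)
        if value > best_value:
            best, best_value = assign, value
    return best, best_value

# Invented scores: the ranker alone prefers the French party.
scores = {
    "Milošević": {"Slobodan Milošević": 0.9},
    "Socialist Party": {"Socialist Party (France)": 0.6,
                        "Socialist Party of Serbia": 0.3},
}
relations = {("Milošević", "Slobodan Milošević",
              "Socialist Party", "Socialist Party of Serbia"): 0.5}
print(relational_inference(scores, relations)[0])
```

With the relation removed, the same call falls back to the highest-scoring candidate per mention. Enumeration is exponential in the number of mentions, so it only serves to illustrate the objective.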
Talk Outline Introduction Architecture Entity Linking Approach Preprocessing Wikification Formulation Relational Analysis Cross Document Coreference Reconciliation Evaluation 27
Overall Approach 28 Relational Wikification Candidate Generation Candidate Ranking Determine NILs Relation Analysis Relation Identification Relation Retrieval Relational Inference
Relation Identification
ACE style in-document coreference (Chang et al. ’13)
  Extract named entity-only coreference relations with high precision
Syntactico-Semantic relations (Chan & Roth ’10)
  Easy to extract with high precision
  Aim for high recall, as false positives will be filtered
  Sparse, but covers ~80% of relation instances in ACE
Type          Example
Premodifier   Iranian Ministry of Defense
Possessive    NYC’s stock exchange
Formulaic     Chicago, Illinois
Preposition   President of the US
Relation Identification 30
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Argument 1           Relation Type   Argument 2
Yugoslav President   apposition      Slobodan Milošević
Slobodan Milošević   coreference     Milošević
Milošević            possessive      Socialist Party
Overall Approach 31 Relational Wikification Candidate Generation Candidate Ranking Determine NILs Relation Analysis Relation Identification Relation Retrieval Relational Inference
Relation Retrieval What concepts can “Socialist Party” refer to? More robust candidate generation Identified relations are verified against a knowledge base (DBPedia) 32...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Relation Retrieval 33
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
q_1 = (Socialist Party of France, ?, *Milošević*)
q_2 = (Slobodan Milošević, ?, *Socialist Party*)
Query pruning: only 2 queries per pair are necessary, due to the strong baseline.
Relation Retrieval 34
Relation Retrieval 35...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Overall Approach 36 Relational Wikification Candidate Generation Candidate Ranking Determine NILs Relation Analysis Relation Identification Relation Retrieval Relational Inference
Relational Inference - coreference 37...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Determine unknown concepts (NILs) How to capture the fact: “Dorothy Byrne” does not refer to any concept in Wikipedia Identify coreferent nominal mention relations Generate better features for NIL classifier 38 Dorothy Byrne, a state coordinator for the Florida Green Party,… nominal mention
Determine unknown concepts (NILs) Create NIL candidate for structured inference e.g. corrects other coreferent “Dorothy” later in the document 39 Dorothy Byrne, a state coordinator for the Florida Green Party,… nominal mention
Talk Outline Introduction Architecture Entity Linking Approach Preprocessing Wikification Formulation Relational Analysis Cross Document Coreference Reconciliation Evaluation 40
Cross Document Coreference NILs can be viewed as KB entries with partial information A uniform model for entity representation Shared features with Entity Linking system Can be supervised using existing EL systems Cross document coreference cluster example: 41 Naomi Campbell to give evidence at Charles Taylor trial: spokeswoman. Supermodel Campbell says 'nothing to gain' from Taylor trial testimony.
Cross Document Coreference Approach Run document-level coreference Aggregate all features in a document-level coreferent cluster Use both mention-level features and document-level features String similarity features (NESim, Do et al. ‘09) Context TF-IDF similarity features Document-level cluster features Training: using both TAC data and Wikifier generated data 42
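A sketch of greedy left-linking inference for clustering, in the spirit of the Latent Left-Linking approach cited earlier in the talk: each item links to its best-scoring predecessor, or starts a new cluster if no predecessor scores above a threshold. The Jaccard token overlap is a toy stand-in for the learned pairwise score built from the features listed above, and the threshold value is arbitrary.

```python
def jaccard(a, b):
    """Toy pairwise score: token overlap between two contexts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def best_left_link(items, score=jaccard, threshold=0.2):
    """Assign each item to the cluster of its best-scoring earlier item."""
    cluster_of = []
    next_id = 0
    for i, item in enumerate(items):
        best_j, best_s = None, threshold
        for j in range(i):
            s = score(items[j], item)
            if s > best_s:
                best_j, best_s = j, s
        if best_j is None:
            # No earlier item is similar enough: open a new cluster.
            cluster_of.append(next_id)
            next_id += 1
        else:
            cluster_of.append(cluster_of[best_j])
    return cluster_of

contexts = [
    "Naomi Campbell to give evidence at Charles Taylor trial",
    "Supermodel Campbell says nothing to gain from Taylor trial testimony",
    "Seattle Seahawks ended the game",
]
print(best_left_link(contexts))
```

Each item needs only one strong link to join a cluster, so clusters grow transitively without requiring every pair inside a cluster to be similar.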
Talk Outline Introduction Architecture Entity Linking Approach Preprocessing Wikification Formulation Relational Analysis Cross Document Coreference Reconciliation Evaluation 45
Query mapping reconciliation 46
[Seattle] has won…                   Seattle (0.7)
[Seattle] Seahawks ended the game…   Seattle Seahawks (0.8)
… cheered for [Seattle]…             Seattle (0.2)
Talk Outline Introduction Architecture Entity Linking Approach Preprocessing Wikification Formulation Relational Analysis Cross Document Coreference Reconciliation Evaluation 47
Evaluation – TAC KBP 2011 Entity Linking Run Relational Inference (RI) Wikifier “as-is”: No retraining using TAC data 48 *Median of top 14 systems
Evaluation – TAC 2012 Entity Linking Error Analysis 49
Official 2013 Performance 50
Official 2013 Performance Break-down: Link Type 51
Official 2013 Performance Break-down: Doc domain 52
Official 2013 Performance Break-down: NER type 53
Conclusion 54
Importance of linguistic and world knowledge
Identification of relational information benefits Wikification and Entity Linking
Future work
  Robust preprocessing on noisy input; adapting to EL task requirements
  “Self-supervision” on NIL clustering
  Unified NIL and KB entity representation
  Joint entity typing, coreference and disambiguation
  Incorporate more relations
Demo: Download:
Thank you!
Back up slides 55
Applications Knowledge Acquisition via Grounding Coreference Resolution Learning-based multi-sieve co-reference resolution with knowledge (Ratinov et al. 2012) Information Extraction Unsupervised relation discovery with sense disambiguation (Yao et al. 2012) Automatic Event Extraction with Structured Preference Modeling (Lu and Roth, 2012 ) Text Classification Gabrilovich and Markovitch, 2007; Chang et al.,
Wikification Performance Result 57