Published by Sharon Darby. Modified over 9 years ago.
1
Relational Entity Linking with Cross Document Coreference
Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth
University of Illinois at Urbana-Champaign (UI_CCG)
2
Talk Outline
Introduction
Architecture
Entity Linking Approach
Preprocessing
Wikification
Formulation
Relational Analysis
Cross Document Coreference
Reconciliation
Evaluation
3
Entity Linking Specification
Query
  Document: bolt-eng-DF-170-181137-9030298
  Name: Lightning Bolts
  Offsets: 15959 15973
Output
  query_id        link_id
  EL13_ENG_0015   NIL0006
  EL13_ENG_0016   E0273299
  …
  EL13_ENG_0821   NIL0006
4
Entity Linking using Wikification and Cross-Doc Coref
  query_id        link_id
  EL13_ENG_0015   NIL0006
  EL13_ENG_0016   E0273299
  …
  EL13_ENG_0821   NIL0006
  …
  EL13_ENG_0937   NIL0288
  …
  EL13_ENG_1914   NIL0288
Queries that share a NIL id are grouped by Cross Document Coreference.
5
Wikification
Example: Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
6
Wikification Challenges
Ambiguity
Concepts outside of KB (NIL): Blumenthal ?
Variability
Scale: millions of labels
Example: Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Surface forms shown on the slide:
  Connecticut, CT, The Nutmeg State
  Times, The New York Times, The Times
7
Key Innovation
Improved Wikification for Structured EL
  Relational Inference for Linking (Cheng and Roth, EMNLP’13)
  No retraining
Non-trivial cross-document clustering
  Best Latent Left-Linking approach (Samdani et al. ’12)
9
Entity Linking Architecture
[Diagram. Components: TAC Query; Preprocessing (Query Normalization, Document Transformation, Purposeful Coreference); Linking (Wikification, Cross-Doc Coreference; "Supervise" label between them); Linking Problem; Reconcile Linking Clusters]
11
Preprocessing
Query normalization
  Handling spelling mistakes and slang – one of the reasons we did not achieve the expected performance
    Examples: Obomber, Obamadinejad, Osama Obama, Nobama, Obambi, Obamination, ObaMao, Owe Bama, 0bama, O-balm-a, O-bomb-a
  In-document coreference – some coreferent mentions are easier to link than the query mention
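As a rough illustration of query normalization, the sketch below maps a misspelled query onto the closest entry in a list of known names, using Python's standard `difflib`. The function name `normalize_query`, the name list, and the cutoff are assumptions made for this example; the talk does not describe the system's actual normalizer.

```python
import difflib

def normalize_query(name, known_names, cutoff=0.75):
    """Return the closest known name, or the query unchanged if none is close.

    Hypothetical helper: the cutoff and the name list are illustrative only.
    """
    matches = difflib.get_close_matches(name, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else name

known = ["Osama", "Obama"]           # toy name list
print(normalize_query("0bama", known))
print(normalize_query("Lightning Bolts", known))
```

Plain string similarity of this kind will not resolve creative variants such as "Obamination", which is consistent with the slide's remark that slang was a weak point.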
12
Preprocessing
Document transformation
  A document can be as long as 100k characters for a single query
  Need to truncate documents while minimizing the loss of critical context
[Diagram labels: Original; Opening; Query Context; Coreferent Context]
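The truncation idea can be sketched as follows: keep the document opening, a window around the query mention, and a window around each coreferent mention, then merge overlapping windows. The function name and the window sizes are assumptions for this example, not values from the talk.

```python
def truncate_document(text, query_span, coref_spans, opening_chars=500, window=200):
    """Keep the opening plus context windows around the query and its coreferent mentions.

    Spans are (start, end) character offsets. All sizes are assumed defaults.
    """
    spans = [(0, min(opening_chars, len(text)))]
    for start, end in [query_span] + list(coref_spans):
        spans.append((max(0, start - window), min(len(text), end + window)))
    spans.sort()
    merged = [spans[0]]
    for start, end in spans[1:]:
        if start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of duplicating text.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return "\n".join(text[s:e] for s, e in merged)
```

For a document of several thousand characters with one query mention and one coreferent mention, the result contains both mentions and the opening while dropping the text in between.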
14
Wikification Bottleneck
Example: Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
State-of-the-art Wikification systems (Ratinov et al. 2011) can achieve the above with local and global statistical features
Reaches a bottleneck around 70%–85% F1 on non-wiki datasets
What is missing?
15
Motivating Example
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
What are we missing with Bag of Words (BOW) models?
  Who is Mubarak?
Constraining interaction between concepts: (Mubarak, wife, Hosni Mubarak)
16
Relational Inference for Wikification
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
(Mubarak, wife, Hosni Mubarak)
Our contribution
  Identify key textual relations for Wikification
  A global inference framework to incorporate relational knowledge
  Significant improvement over state-of-the-art Wikification systems
17
Traditional Wikification Pipeline
1. Mention Segmentation
2. Candidate Generation
3. Candidate Ranking
4. Determine NILs (NIL linking)
18
Traditional Wikification 1 - Mention Segmentation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Sub noun phrase chunks
NER
Capitalized phrases
19
Traditional Wikification 1 - Mention Segmentation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Obtains nested mentions
20
Traditional Wikification 2 - Candidate Generation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Approach
  Collect known mappings from Wikipedia page titles, hyperlinks…
  Limit to top-K candidates based on frequency of links (Ratinov et al. 2011)
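A minimal sketch of top-K candidate generation from link frequencies is shown below. The function names and the toy (surface form, title) pairs are assumptions for illustration; a real system would harvest these pairs from Wikipedia titles and hyperlink anchor text.

```python
from collections import Counter, defaultdict

def build_anchor_index(links):
    """Count how often each surface form links to each title.

    links: iterable of (surface, title) pairs (toy data in this sketch).
    """
    index = defaultdict(Counter)
    for surface, title in links:
        index[surface.lower()][title] += 1
    return index

def generate_candidates(index, mention, k=3):
    """Return up to k (title, prior) pairs ranked by link frequency."""
    counts = index.get(mention.lower(), Counter())
    total = sum(counts.values())
    return [(title, count / total) for title, count in counts.most_common(k)]

# Hypothetical counts, chosen only to show the ranking.
links = [("Socialist Party", "Socialist Party (France)")] * 3
links += [("Socialist Party", "Socialist Party of Serbia")]
index = build_anchor_index(links)
print(generate_candidates(index, "Socialist Party", k=1))
```

With `k=1` the less frequent title is cut off, which is the failure mode the NIL slide later describes: the correct candidate may never be generated under the top-K prior.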
21
Traditional Wikification 3 - Candidate Ranking
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Local and global statistical features
22
Traditional Wikification 4 – Determine NILs
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Is the top candidate really what the text referred to?
A binary classifier decides. Two cases it has to catch:
  This answer is wrong
  We did not generate the correct candidate based on the top-K prior
24
Formulation (0)
Intuition: promote pairs of candidate concepts coherent with textual relations
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
(Mubarak, wife, Hosni Mubarak)
25
Formulation (1)
Formulate as an Integer Linear Program (ILP)
If no relation exists, the problem collapses to the unstructured decision
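The slide's equation did not survive extraction. One form consistent with the talk's variable names (e for concept selection, s for concept scores, r for relation indicators, w for relation scores) is given below. This is a reconstruction under that assumption, not a copy of the slide; Cheng and Roth (EMNLP'13) is the authoritative source.

```latex
\begin{aligned}
\max_{e,\,r}\quad & \sum_{i}\sum_{k} s_i^k\, e_i^k \;+\; \sum_{i,j}\sum_{k,l} w_{ij}^{(k,l)}\, r_{ij}^{(k,l)} \\
\text{s.t.}\quad & e_i^k \in \{0,1\}, \qquad r_{ij}^{(k,l)} \in \{0,1\}, \\
& \sum_{k} e_i^k = 1 \quad \forall i, \\
& 2\, r_{ij}^{(k,l)} \le e_i^k + e_j^l \quad \forall i, j, k, l.
\end{aligned}
```

Here i and j index mentions and k and l index their candidate concepts. The last constraint lets a relation variable be active only when both of its candidate concepts are selected. With no relation terms, the objective reduces to picking the highest-scoring candidate for each mention independently, which is the "unstructured decision" the slide refers to.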
26
Formulation (2)
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
$e_i^k$: whether a concept is chosen
$s_i^k$: score of a concept
$r_{ij}^{(k,l)}$: whether a relation is present
$w_{ij}^{(k,l)}$: score of a relation
Relation variables labelled in the diagram: $r_{34}^{(1,2)}$, $r_{34}^{(4,3)}$
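To make the roles of these variables concrete, the sketch below finds the best joint assignment by exhaustive search rather than with an ILP solver. It assumes non-negative relation weights, so a relation is counted whenever both of its candidates are selected. The function name, data layout, and numbers are assumptions for illustration only.

```python
from itertools import product

def relational_inference(scores, relation_weights):
    """Pick one candidate per mention, maximizing concept plus relation scores.

    scores[i][k] is the score of candidate k for mention i.
    relation_weights[(i, k, j, l)] is earned only when candidate k of mention i
    and candidate l of mention j are both selected (weights assumed >= 0).
    """
    best, best_value = None, float("-inf")
    for assignment in product(*(range(len(s)) for s in scores)):
        value = sum(scores[i][k] for i, k in enumerate(assignment))
        value += sum(w for (i, k, j, l), w in relation_weights.items()
                     if assignment[i] == k and assignment[j] == l)
        if value > best_value:
            best, best_value = assignment, value
    return best, best_value

scores = [[0.6, 0.4], [0.7, 0.3]]   # toy ranker scores for two mentions
weights = {(0, 1, 1, 1): 1.0}       # toy relation between the second candidates
print(relational_inference(scores, {}))
print(relational_inference(scores, weights))
```

Without relations each mention keeps its top-ranked candidate; with the relation, the jointly coherent pair wins even though each member ranks second on its own. Exhaustive search is exponential in the number of mentions, which is why an ILP solver is the practical choice.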
28
Overall Approach
Relational Wikification
  Candidate Generation
  Candidate Ranking
  Determine NILs
Relation Analysis
  Relation Identification
  Relation Retrieval
  Relational Inference
29
Relation Identification
ACE style in-document coreference (Chang et al. ‘13)
  Extract named entity-only coreference relations with high precision
Syntactico-Semantic relations (Chan & Roth ‘10)
  Easy to extract with high precision
  Aim for high recall, as false-positives will be filtered
  Sparse, but covers ~80% relation instances in ACE2004

  Type          Example
  Premodifier   Iranian Ministry of Defense
  Possessive    NYC’s stock exchange
  Formulaic     Chicago, Illinois
  Preposition   President of the US
30
Relation Identification
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

  Argument 1           Relation Type   Argument 2
  Yugoslav President   apposition      Slobodan Milošević
  Slobodan Milošević   coreference     Milošević
  Milošević            possessive      Socialist Party
32
Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
What concepts can “Socialist Party” refer to?
More robust candidate generation
Identified relations are verified against a knowledge base (DBPedia)
33
Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Query pruning: only 2 queries per pair are necessary due to the strong baseline
  q1 = (Socialist Party of France, ?, *Milošević*)
  q2 = (Slobodan Milošević, ?, *Socialist Party*)
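The two-query scheme can be sketched as follows: each query fixes the top-ranked title of one mention and leaves the other side as a surface-form wildcard. The helper names, the in-memory triple list standing in for DBPedia, and the choice to match relations in either direction are assumptions for this example.

```python
def build_queries(mention_a, top_a, mention_b, top_b):
    """Two queries per mention pair; '?' stands for any predicate."""
    return [(top_a, "?", f"*{mention_b}*"), (top_b, "?", f"*{mention_a}*")]

def run_query(triples, query):
    """Return triples linking the fixed title to any entity matching the wildcard."""
    subject, _, pattern = query
    needle = pattern.strip("*").lower()
    hits = []
    for s, p, o in triples:
        # Assumption: a relation counts regardless of its direction in the KB.
        if s == subject and needle in o.lower():
            hits.append((s, p, o))
        elif o == subject and needle in s.lower():
            hits.append((s, p, o))
    return hits

# Toy knowledge base with a single illustrative triple.
kb = [("Slobodan Milošević", "party", "Socialist Party of Serbia")]
q1, q2 = build_queries("Socialist Party", "Socialist Party of France",
                       "Milošević", "Slobodan Milošević")
print(run_query(kb, q1))
print(run_query(kb, q2))
```

In this toy run q1 finds nothing, while q2 retrieves a relation whose other argument is a title that the frequency-based top-K list may have missed, which matches the slide's point about more robust candidate generation.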
34
Relation Retrieval
35
Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
37
Relational Inference - coreference
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
38
Determine unknown concepts (NILs)
Dorothy Byrne, a state coordinator for the Florida Green Party,…
  ("a state coordinator" is a nominal mention)
How to capture the fact that “Dorothy Byrne” does not refer to any concept in Wikipedia?
  Identify coreferent nominal mention relations
  Generate better features for the NIL classifier
39
Determine unknown concepts (NILs)
Dorothy Byrne, a state coordinator for the Florida Green Party,…
Create a NIL candidate for structured inference
  e.g. corrects another coreferent “Dorothy” later in the document
41
Cross Document Coreference
NILs can be viewed as KB entries with partial information
  A uniform model for entity representation
  Shared features with Entity Linking system
  Can be supervised using existing EL systems
Cross document coreference cluster example:
  Naomi Campbell to give evidence at Charles Taylor trial: spokeswoman.
  Supermodel Campbell says 'nothing to gain' from Taylor trial testimony.
42
Cross Document Coreference
Approach
  Run document-level coreference
  Aggregate all features in a document-level coreferent cluster
  Use both mention-level features and document-level features
    String similarity features (NESim, Do et al. ‘09)
    Context TF-IDF similarity features
    Document-level cluster features
Training: using both TAC data and Wikifier generated data
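The Key Innovation slide names a left-linking approach (Samdani et al. ’12) for clustering. A minimal sketch of the left-linking inference step is shown below: each mention links to its best-scoring earlier mention, or starts a new cluster if nothing scores above a threshold. The token-overlap similarity, the threshold, and the function names are stand-ins chosen for this example; the actual system uses learned pairwise scores over the features listed above.

```python
def jaccard(a, b):
    """Token-overlap similarity; a toy stand-in for a learned pairwise score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def left_link_cluster(mentions, similarity=jaccard, threshold=0.3):
    """Assign a cluster id to each mention by linking it to its best left neighbour."""
    cluster_of = []
    next_id = 0
    for i, mention in enumerate(mentions):
        best_j, best_score = None, threshold
        for j in range(i):
            score = similarity(mention, mentions[j])
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            # No earlier mention is similar enough: open a new cluster.
            cluster_of.append(next_id)
            next_id += 1
        else:
            cluster_of.append(cluster_of[best_j])
    return cluster_of

mentions = ["Naomi Campbell", "Charles Taylor", "Supermodel Campbell", "Taylor"]
print(left_link_cluster(mentions))
```

On the slide's Campbell and Taylor example this groups the two Campbell mentions together and the two Taylor mentions together.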
46
Query mapping reconciliation
  [Seattle] has won…                   Seattle (0.7)
  [Seattle] Seahawks ended the game…   Seattle Seahawks (0.8)
  … cheered for [Seattle]…             Seattle (0.2)
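The slide does not say how the per-mention links and scores are combined. One simple possibility, assumed here purely for illustration, is to adopt the link of the most confident mention in the coreferent group. The function name `reconcile` is hypothetical.

```python
def reconcile(linked_mentions):
    """Return the title of the highest-confidence link in a coreferent group.

    linked_mentions: list of (title, confidence) pairs. The max-confidence rule
    is an assumption of this sketch, not the method described in the talk.
    """
    return max(linked_mentions, key=lambda pair: pair[1])[0]

group = [("Seattle", 0.7), ("Seattle Seahawks", 0.8), ("Seattle", 0.2)]
print(reconcile(group))
```

Other rules, such as summing confidence per title, would pick a different answer on this example, so the choice of rule matters.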
48
Evaluation – TAC KBP 2011 Entity Linking
Run the Relational Inference (RI) Wikifier “as-is”: no retraining using TAC data
*Median of top 14 systems
49
Evaluation – TAC 2012 Entity Linking Error Analysis
50
Official 2013 Performance
51
Official 2013 Performance Break-down: Link Type
52
Official 2013 Performance Break-down: Doc domain
53
Official 2013 Performance Break-down: NER type
54
Conclusion
Importance of linguistic and world knowledge
  Identification of relational information benefits Wikification and Entity Linking
Future work
  Robust preprocessing on noisy input/adapt to EL task requirement
  “Self-supervision” on NIL clustering
  Unified NIL and KB entity representation
  Joint entity typing, coreference and disambiguation
  Incorporate more relations
Demo: http://cogcomp.cs.illinois.edu/demo/wikify
Download: http://cogcomp.cs.illinois.edu/page/download_view/Wikifier
Thank you!
55
Back-up Slides
56
Applications
Knowledge Acquisition via Grounding
Coreference Resolution
  Learning-based multi-sieve co-reference resolution with knowledge (Ratinov et al. 2012)
Information Extraction
  Unsupervised relation discovery with sense disambiguation (Yao et al. 2012)
  Automatic Event Extraction with Structured Preference Modeling (Lu and Roth, 2012)
Text Classification
  Gabrilovich and Markovitch, 2007; Chang et al., 2008
57
Wikification Performance Result