Relational Entity Linking with Cross Document Coreference Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth University.

Presentation transcript:

1 Relational Entity Linking with Cross Document Coreference
Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth
University of Illinois at Urbana-Champaign (UI_CCG)

2 Talk Outline
- Introduction
  - Architecture
- Entity Linking Approach
  - Preprocessing
  - Wikification
    - Formulation
    - Relational Analysis
  - Cross Document Coreference
  - Reconciliation
- Evaluation

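3 Entity Linking Specification
Query: document bolt-eng-DF-170-181137-9030298, name string "Lightning Bolts", offsets 15959 to 15973
Output:
query_id        link_id
EL13_ENG_0015   NIL0006
EL13_ENG_0016   E0273299
…
EL13_ENG_0821   NIL0006
Queries that refer to the same entity outside the KB share a NIL ID (here EL13_ENG_0015 and EL13_ENG_0821 both map to NIL0006).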

4 Entity Linking using Wikification and Cross-Document Coreference
query_id        link_id
EL13_ENG_0015   NIL0006
EL13_ENG_0016   E0273299
…
EL13_ENG_0821   NIL0006
…
EL13_ENG_0937   NIL0288
…
EL13_ENG_1914   NIL0288
[Figure label: Cross Document Coreference]
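In this output format, queries that cross-document coreference places in the same out-of-KB cluster share one NIL ID. A minimal sketch of emitting the rows, assuming the KB links and NIL clusters are already computed; the function name `assign_link_ids` and the sequential NIL%04d numbering are hypothetical (the IDs are arbitrary labels, so NIL0001 here plays the role of NIL0006 above):

```python
def assign_link_ids(kb_links, nil_clusters, start=1):
    """Produce sorted (query_id, link_id) rows.

    kb_links: dict query_id -> KB entry id for queries linked to the KB.
    nil_clusters: list of lists of query_ids judged coreferent but absent
    from the KB; each cluster receives one shared NIL id.
    """
    rows = dict(kb_links)
    for offset, cluster in enumerate(nil_clusters):
        nil_id = f"NIL{start + offset:04d}"  # hypothetical numbering scheme
        for query_id in cluster:
            rows[query_id] = nil_id
    return sorted(rows.items())


rows = assign_link_ids(
    {"EL13_ENG_0016": "E0273299"},
    [["EL13_ENG_0015", "EL13_ENG_0821"], ["EL13_ENG_0937", "EL13_ENG_1914"]],
)
for query_id, link_id in rows:
    print(f"{query_id}\t{link_id}")
```

The choice of labels is immaterial as long as each cluster gets its own ID.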

5 Wikification
"Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State."

6 Wikification Challenges
- Ambiguity
- Concepts outside of KB (NIL)
  - Blumenthal?
- Variability
- Scale
  - Millions of labels
Example: "Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State."
Surface variants: Connecticut, CT, The Nutmeg State; The New York Times, Times, The Times

7 Key Innovation
- Improved Wikification for structured EL
  - Relational inference for linking (Cheng and Roth, EMNLP'13)
  - No retraining
- Non-trivial cross-document clustering
  - Best Latent Left-Linking approach (Samdani et al. '12)

8 Talk Outline

9 Entity Linking Architecture
[Architecture diagram. Components: TAC Query; Preprocessing (Query Normalization, Document Transformation, Purposeful Coreference); Linking Problem; Linking (Wikification and Cross-Doc Coreference, joined by a "Supervise" edge); Reconcile Linking Clusters]

10 Talk Outline

11 Preprocessing: Query Normalization
- Handling spelling mistakes and slang: one of the reasons we did not achieve the expected performance
- In-document coreference: some coreferent mentions are easier to link than the query mention
Variants of "Obama": Obomber, Obamadinejad, Osama Obama, Nobama, Obambi, Obamination, ObaMao, Owe Bama, 0bama, O-balm-a, O-bomb-a
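A minimal sketch of query normalization, assuming two hypothetical resources: a hand-built slang table for variants that string similarity cannot recover (e.g. "Obomber") and a small list of canonical names matched with Python's `difflib.get_close_matches`. The tables and the 0.75 cutoff are illustrative, not the system's actual resources:

```python
from difflib import get_close_matches

# HYPOTHETICAL slang table for variants that fuzzy matching cannot recover.
ALIASES = {"obomber": "Obama", "nobama": "Obama", "obamadinejad": "Obama"}
# HYPOTHETICAL canonical names, keyed by their normalized form.
CANONICAL = {"obama": "Obama", "osama": "Osama"}


def normalize_query(name, cutoff=0.75):
    """Map a noisy query string to a canonical name, if one is close enough."""
    # Lowercase and drop hyphens/spaces so "O-balm-a" and "Owe Bama" compare cleanly.
    key = name.lower().replace("-", "").replace(" ", "")
    if key in ALIASES:
        return ALIASES[key]
    match = get_close_matches(key, list(CANONICAL), n=1, cutoff=cutoff)
    # Fall back to the original string when nothing is similar enough.
    return CANONICAL[match[0]] if match else name


for query in ["0bama", "O-balm-a", "Owe Bama", "Obomber", "Lightning Bolts"]:
    print(query, "->", normalize_query(query))
```

A fuzzy matcher like this is risky on names that differ by a single character (Obama vs. Osama), which is one reason a slang table or in-document coreference is still needed.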

12 Preprocessing: Document Transformation
- A document can be as long as 100k characters for a single query
- Need to truncate documents while minimizing the loss of critical context
[Figure: the original document is reduced to its opening, the query context, and coreferent contexts]
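One way to realize this transformation, sketched under assumptions: keep a fixed-size opening, then a window of characters around the query and around each coreferent mention, merging windows that overlap. The window size and the function name `transform_document` are hypothetical, and offsets are treated as exclusive-end; the TAC query on slide 3 appears to use inclusive end offsets (15959 to 15973 covers the 15 characters of "Lightning Bolts"), so add 1 to its end offset first:

```python
def transform_document(text, query_span, coref_spans, context_window=300):
    """Truncate a long document to the spans most useful for linking.

    query_span and coref_spans are (begin, end) character offsets with an
    exclusive end. Keeps the opening of the document plus a window of
    context around the query and each coreferent mention.
    """
    spans = [(0, min(context_window, len(text)))]  # document opening
    for begin, end in [query_span, *coref_spans]:
        spans.append((max(0, begin - context_window),
                      min(len(text), end + context_window)))
    spans.sort()
    merged = []
    for begin, end in spans:
        if merged and begin <= merged[-1][1]:
            # Overlapping or touching windows are merged into one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return " ... ".join(text[b:e] for b, e in merged)


text = "A" * 1000 + "Lightning Bolts" + "B" * 1000 + "bolts" + "C" * 1000
short = transform_document(text, (1000, 1015), [(2015, 2020)], context_window=50)
print(len(text), "->", len(short))
```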

13 Talk Outline

14 Wikification Bottleneck
- State-of-the-art Wikification systems (Ratinov et al. 2011) can achieve the above with local and global statistical features
  - Performance plateaus at around 70% to 85% F1 on non-Wikipedia datasets
- What is missing?
Example: "Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State."

15 Motivating Example
"Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …"
- What are we missing with bag-of-words (BOW) models?
  - Who is Mubarak?
- Constraining interactions between concepts
  - (Mubarak, wife, Hosni Mubarak)

16 Relational Inference for Wikification
Our contribution:
- Identify key textual relations for Wikification
- A global inference framework to incorporate relational knowledge
- Significant improvement over state-of-the-art Wikification systems
Example: "Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …", relation (Mubarak, wife, Hosni Mubarak)

17 Traditional Wikification Pipeline
Mention Segmentation → Candidate Generation → Candidate Ranking → Determine NILs

18 Traditional Wikification 1: Mention Segmentation
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"
Mention sources: sub-noun-phrase chunks, NER, capitalized phrases

19 Traditional Wikification 1: Mention Segmentation
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"
- Obtains nested mentions
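Of the three mention sources on slide 18, capitalized phrases are the easiest to sketch. The code below is a simplification that ignores NER and noun-phrase chunking: it finds maximal runs of capitalized tokens, then enumerates every contiguous sub-span to obtain nested mentions such as "Slobodan Milošević" inside "Yugoslav President Slobodan Milošević". The function names are hypothetical:

```python
import re


def tokenize(text):
    """Split into word tokens and single punctuation tokens (Unicode-aware)."""
    return re.findall(r"\w+|[^\w\s]", text)


def capitalized_runs(tokens):
    """Group consecutive tokens that start with an uppercase letter."""
    runs, current = [], []
    for tok in tokens:
        if tok[0].isupper():
            current.append(tok)
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def nested_mentions(run):
    """All contiguous sub-spans of a run: n * (n + 1) / 2 candidate mentions."""
    n = len(run)
    return [" ".join(run[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


sentence = ("...ousted long time Yugoslav President Slobodan Milošević in October. "
            "Mr. Milošević's Socialist Party…")
runs = capitalized_runs(tokenize(sentence))
for run in runs:
    print(nested_mentions(run))
```

Noise such as "Mr" also surfaces as a mention here; in the pipeline, later stages (candidate generation and NIL detection) would have to discard it.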

20 Traditional Wikification 2: Candidate Generation
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"
Approach:
- Collect known mappings from Wikipedia page titles, hyperlinks…
- Limit to top-K candidates based on frequency of links (Ratinov et al. 2011)
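A sketch of frequency-based candidate generation, assuming (surface string, Wikipedia title) pairs have already been harvested from titles and hyperlinks. The anchor counts below are invented for illustration; they show how a top-K cutoff can drop the correct concept, the failure noted on slide 22. The function names are hypothetical:

```python
from collections import Counter, defaultdict


def build_prior(anchor_pairs):
    """Map each lowercased surface string to a Counter of linked titles."""
    index = defaultdict(Counter)
    for surface, title in anchor_pairs:
        index[surface.lower()][title] += 1
    return index


def top_k_candidates(index, mention, k=2):
    """Return up to k (title, prior probability) pairs, most frequent first."""
    counts = index.get(mention.lower())  # .get avoids inserting unseen keys
    if not counts:
        return []
    total = sum(counts.values())
    return [(title, n / total) for title, n in counts.most_common(k)]


# HYPOTHETICAL anchor statistics, not real Wikipedia counts.
pairs = ([("Socialist Party", "Socialist Party of France")] * 3
         + [("Socialist Party", "Socialist Party of America")] * 2
         + [("Socialist Party", "Socialist Party of Serbia")])
index = build_prior(pairs)
candidates = top_k_candidates(index, "Socialist Party", k=2)
print(candidates)  # Serbia is cut off by the top-2 limit
```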

21 Traditional Wikification 3: Candidate Ranking
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"
- Local and global statistical features
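As a hedged illustration of one common global feature (not necessarily the exact feature set of Ratinov et al. 2011), the sketch below scores each candidate by its local score plus its average Milne-Witten relatedness, computed from sets of incoming links, to concepts chosen for other mentions. The link sets, local scores, and the 0.5 weight are hypothetical:

```python
import math


def milne_witten(inlinks_a, inlinks_b, total_pages):
    """Milne-Witten relatedness from the sets of pages linking to a and b."""
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 0.0
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    denom = math.log(total_pages) - math.log(small)
    if denom <= 0:
        return 0.0
    distance = (math.log(big) - math.log(common)) / denom
    return max(0.0, 1.0 - distance)


def rank_candidates(candidates, local_scores, context_titles, inlinks,
                    total_pages, weight=0.5):
    """Sort by local score plus weighted mean relatedness to context concepts."""
    def score(c):
        rel = [milne_witten(inlinks[c], inlinks[t], total_pages)
               for t in context_titles]
        coherence = sum(rel) / len(rel) if rel else 0.0
        return local_scores[c] + weight * coherence
    return sorted(candidates, key=score, reverse=True)


# HYPOTHETICAL incoming-link sets (page ids) and local scores.
inlinks = {
    "Socialist Party of France": {5, 6, 7, 8},
    "Socialist Party of Serbia": {1, 2, 3, 4},
    "Slobodan Milošević": {1, 2, 3, 9},
}
local = {"Socialist Party of France": 0.6, "Socialist Party of Serbia": 0.4}
ranked = rank_candidates(list(local), local, ["Slobodan Milošević"], inlinks, 1000)
print(ranked)
```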

22 Traditional Wikification 4: Determine NILs
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"
- Is the top candidate really what the text refers to?
  - Binary classifier
- This answer is wrong
  - We did not generate the correct candidate based on the top-K prior
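The NIL decision can be sketched as a logistic binary classifier over features of the top-ranked candidate. The feature names, weights, and bias below are hypothetical placeholders for a learned model. Such a classifier can only accept or reject the top candidate; it cannot recover a correct concept that candidate generation never produced:

```python
import math

NIL = "NIL"
# HYPOTHETICAL hand-set weights and bias; a real system learns these.
WEIGHTS = {"ranker_score": 3.0, "prior": 1.5}
BIAS = -1.0


def keep_probability(features):
    """Logistic estimate that the top candidate is what the text refers to."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(top_candidate, features, threshold=0.5):
    """Keep the top candidate or answer NIL."""
    return top_candidate if keep_probability(features) >= threshold else NIL


print(decide("Socialist Party of France", {"ranker_score": 0.9, "prior": 0.6}))
print(decide("Socialist Party of France", {"ranker_score": 0.1, "prior": 0.05}))
```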

23 Talk Outline

24 Formulation (0)
Intuition:
- Promote pairs of candidate concepts that are coherent with textual relations
Example: "Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …", relation (Mubarak, wife, Hosni Mubarak)

25 Formulation (1)
Formulate as an Integer Linear Program (ILP):
- If no relation exists, the ILP collapses to the unstructured decision
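The objective itself was an image and did not survive extraction. Below is a hedged reconstruction in the notation of slide 26 ($e_i^k$, $s_i^k$, $r_{ij}^{(k,l)}$, $w_{ij}^{(k,l)}$), following our reading of Cheng and Roth (EMNLP'13); treat the exact constraint set as an assumption:

```latex
\begin{aligned}
\Gamma^{*} = \arg\max_{e,\,r}\;& \sum_{i}\sum_{k} s_i^k\, e_i^k \;+\; \sum_{i,j}\sum_{k,l} w_{ij}^{(k,l)}\, r_{ij}^{(k,l)} \\
\text{s.t.}\;& e_i^k \in \{0,1\}, \quad r_{ij}^{(k,l)} \in \{0,1\} \\
& \textstyle\sum_{k} e_i^k = 1 \quad \forall i \\
& 2\, r_{ij}^{(k,l)} \le e_i^k + e_j^l \quad \forall i,j,k,l
\end{aligned}
```

When no relation is identified, every $w_{ij}^{(k,l)}$ term drops out, the objective separates over mentions, and each mention simply takes $\arg\max_k s_i^k$: this is the collapse to the unstructured decision.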

26 Formulation (2)
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"
- $e_i^k$: whether a concept is chosen
- $s_i^k$: score of a concept
- $r_{ij}^{(k,l)}$: whether a relation is present
- $w_{ij}^{(k,l)}$: score of a relation
[Figure: relation variables such as $r_{34}^{(1,2)}$ and $r_{34}^{(4,3)}$ connect candidates of mentions 3 and 4]
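Because the variables are binary, a tiny instance of this relational ILP can be solved by enumeration. The sketch below stands in for a real ILP solver and is exponential in the number of mentions. It uses hypothetical scores for the Milošević example: the relation between "Slobodan Milošević" and "Socialist Party of Serbia" outweighs the higher local score of the French party. Since $r_{ij}^{(k,l)}$ can be 1 only when both endpoints are chosen, the maximizer switches it on exactly when the weight is positive:

```python
from itertools import product


def solve_relational_ilp(scores, relations):
    """Exhaustively solve a small relational ILP.

    scores[i][k]: s_i^k, score of candidate k for mention i.
    relations: dict mapping (i, j, k, l) to w_ij^(k,l).
    Returns (best assignment as a tuple of candidate indices, objective value).
    """
    best_assignment, best_value = None, float("-inf")
    # Each mention picks exactly one candidate (sum_k e_i^k = 1).
    for assignment in product(*(range(len(s)) for s in scores)):
        value = sum(scores[i][k] for i, k in enumerate(assignment))
        for (i, j, k, l), w in relations.items():
            # r_ij^(k,l) may be 1 only if both endpoints are chosen.
            if assignment[i] == k and assignment[j] == l and w > 0:
                value += w
        if value > best_value:
            best_assignment, best_value = assignment, value
    return best_assignment, best_value


# HYPOTHETICAL scores. Mention 0: "Socialist Party"; mention 1: "Slobodan Milošević".
candidates = [["Socialist Party of France", "Socialist Party of Serbia"],
              ["Slobodan Milošević"]]
scores = [[0.6, 0.4], [0.9]]
relations = {(0, 1, 1, 0): 0.5}  # Serbia <-> Milošević, found via a relation
best, value = solve_relational_ilp(scores, relations)
print([candidates[i][k] for i, k in enumerate(best)], value)
```

With an empty `relations` dict the solver returns each mention's top local candidate, the unstructured decision.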

27 Talk Outline

28 Overall Approach
[Diagram. Relational Wikification: Candidate Generation, Candidate Ranking, Determine NILs. Relation Analysis: Relation Identification, Relation Retrieval, Relational Inference]

29 Relation Identification
- ACE-style in-document coreference (Chang et al. '13)
  - Extract named-entity-only coreference relations with high precision
- Syntactico-semantic relations (Chan & Roth '10)
  - Easy to extract with high precision
  - Aim for high recall, as false positives will be filtered
  - Sparse, but cover ~80% of relation instances in ACE2004

Type          Example
Premodifier   Iranian Ministry of Defense
Possessive    NYC's stock exchange
Formulaic     Chicago, Illinois
Preposition   President of the US
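The four syntactico-semantic types in the table can be illustrated with surface regular expressions over short phrases. This is a toy stand-in: the actual extractor is not shown in the slides, and the patterns and the function name `identify_relation` are hypothetical. Pattern order matters; Premodifier is tried before Preposition so that "Iranian Ministry of Defense" is labeled as in the table:

```python
import re

# HYPOTHETICAL surface patterns, tried in order.
PATTERNS = [
    ("Possessive", re.compile(r"^(?P<arg1>.+?)'s (?P<arg2>.+)$")),
    ("Formulaic", re.compile(r"^(?P<arg1>[A-Z][\w ]*?), (?P<arg2>[A-Z][\w ]*)$")),
    ("Premodifier", re.compile(r"^(?P<arg1>[A-Z]\w*) (?P<arg2>[A-Z][\w ]*)$")),
    ("Preposition", re.compile(
        r"^(?P<arg1>[A-Z][\w ]*?) (?:of|in|at|for) (?:the )?(?P<arg2>[A-Z][\w ]*)$")),
]


def identify_relation(phrase):
    """Return (argument 1, relation type, argument 2), or None if no pattern fits."""
    for rel_type, pattern in PATTERNS:
        m = pattern.match(phrase)
        if m:
            return (m.group("arg1"), rel_type, m.group("arg2"))
    return None


for phrase in ["Iranian Ministry of Defense", "NYC's stock exchange",
               "Chicago, Illinois", "President of the US", "Milošević's Socialist Party"]:
    print(identify_relation(phrase))
```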

30 Relation Identification
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"

Argument 1           Relation Type   Argument 2
Yugoslav President   apposition      Slobodan Milošević
Slobodan Milošević   coreference     Milošević
Milošević            possessive      Socialist Party

31 Overall Approach

32 Relation Retrieval
- What concepts can "Socialist Party" refer to?
- More robust candidate generation
  - Identified relations are verified against a knowledge base (DBpedia)
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"

33 Relation Retrieval
- Query pruning
  - Only 2 queries per pair are necessary, thanks to the strong baseline
$q_1$ = (Socialist Party of France, ?, *Milošević*)
$q_2$ = (Slobodan Milošević, ?, *Socialist Party*)
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"
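Relation retrieval can be sketched as answering wildcard queries of the form (subject, ?, *pattern*) against a triple store. The toy in-memory triples below are a hypothetical stand-in for DBpedia, not real DBpedia data; the function `retrieve` and its shell-style wildcard matching via `fnmatch.fnmatchcase` are illustrative choices:

```python
from fnmatch import fnmatchcase

# HYPOTHETICAL toy triples (subject, predicate, object); not real DBpedia data.
TOY_KB = [
    ("Slobodan Milošević", "party", "Socialist Party of Serbia"),
    ("Socialist Party of France", "country", "France"),
]


def retrieve(subject, object_pattern, kb=TOY_KB):
    """Answer (subject, ?, object_pattern) with shell-style wildcards.

    Returns the (predicate, object) pairs whose subject matches exactly
    and whose object matches the pattern.
    """
    return [(p, o) for s, p, o in kb
            if s == subject and fnmatchcase(o, object_pattern)]


q1 = retrieve("Socialist Party of France", "*Milošević*")
q2 = retrieve("Slobodan Milošević", "*Socialist Party*")
print(q1, q2)
```

Here $q_2$ surfaces "Socialist Party of Serbia" as a candidate for the mention "Socialist Party" even if the frequency-based generator missed it, which is the "more robust candidate generation" on slide 32.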

34 Relation Retrieval

35 Relation Retrieval
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"

36 Overall Approach

37 Relational Inference: Coreference
"...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…"

38 Determine Unknown Concepts (NILs)
- How to capture the fact that "Dorothy Byrne" does not refer to any concept in Wikipedia?
- Identify coreferent nominal mention relations
  - Generate better features for the NIL classifier
Example: "Dorothy Byrne, a state coordinator for the Florida Green Party,…" ("a state coordinator for the Florida Green Party" is the nominal mention)

39 Determine Unknown Concepts (NILs)
- Create a NIL candidate for structured inference
  - e.g., this corrects other coreferent mentions of "Dorothy" later in the document
Example: "Dorothy Byrne, a state coordinator for the Florida Green Party,…" (nominal mention)
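If coreferent mentions are forced to take the same label, the objective for a shared label is the sum of each mention's score for it. A minimal sketch under that assumption, with NIL included as an ordinary candidate; the scores, the candidate "Dorothy Gale", and the function name are hypothetical:

```python
from collections import defaultdict

NIL = "NIL"


def link_coreference_chain(chain_scores):
    """Pick one shared label for all mentions in a coreference chain.

    chain_scores: list of dicts, one per mention, mapping a candidate
    concept (or NIL) to its score. Forcing agreement makes the objective
    for a label the sum of its scores across the chain.
    """
    totals = defaultdict(float)
    for scores in chain_scores:
        for label, score in scores.items():
            totals[label] += score
    return max(totals, key=totals.get)


chain = [
    {NIL: 0.9, "Dorothy Gale": 0.1},  # "Dorothy Byrne", with a nominal mention
    {"Dorothy Gale": 0.5, NIL: 0.2},  # a later, isolated "Dorothy"
]
print(link_coreference_chain(chain))
```

The confident NIL decision for "Dorothy Byrne" overrides the isolated mention "Dorothy", which on its own would have been linked to a wrong concept.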

40 Talk Outline

41 Cross Document Coreference
- NILs can be viewed as KB entries with partial information
  - A uniform model for entity representation
  - Features shared with the Entity Linking system
  - Can be supervised using existing EL systems
Cross-document coreference cluster example:
- "Naomi Campbell to give evidence at Charles Taylor trial: spokeswoman."
- "Supermodel Campbell says 'nothing to gain' from Taylor trial testimony."

42 Cross Document Coreference
Approach:
- Run document-level coreference
- Aggregate all features in a document-level coreferent cluster
- Use both mention-level and document-level features
  - String similarity features (NESim, Do et al. '09)
  - Context TF-IDF similarity features
  - Document-level cluster features
- Training: using both TAC data and Wikifier-generated data
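A sketch of best-left-link decoding for clustering: each mention, in order, either links to its highest-scoring earlier mention or starts a new cluster. This shows only the decoding step. The pairwise score (difflib string similarity standing in for NESim, plus smoothed TF-IDF context cosine), the unit weights, and the 0.9 threshold are hypothetical, whereas the model of Samdani et al. ('12) learns its weights:

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tfidf_vectors(contexts):
    """Smoothed TF-IDF vectors (dicts) for whitespace-tokenized contexts."""
    docs = [Counter(c.lower().split()) for c in contexts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency
    return [{t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in d.items()}
            for d in docs]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def best_left_link(names, contexts, threshold=0.9):
    """Return a cluster id per mention using best-left-link decoding."""
    vectors = tfidf_vectors(contexts)

    def score(i, j):
        # HYPOTHETICAL pairwise score: name similarity + context similarity.
        name_sim = SequenceMatcher(None, names[i].lower(), names[j].lower()).ratio()
        return name_sim + cosine(vectors[i], vectors[j])

    cluster_of, next_id = [], 0
    for i in range(len(names)):
        best_j = max(range(i), key=lambda j: score(i, j), default=None)
        if best_j is not None and score(i, best_j) > threshold:
            cluster_of.append(cluster_of[best_j])  # link to best antecedent
        else:
            cluster_of.append(next_id)  # start a new cluster
            next_id += 1
    return cluster_of


names = ["Naomi Campbell", "Campbell", "Campbell Soup"]
contexts = [
    "supermodel naomi campbell to give evidence at charles taylor trial",
    "supermodel campbell says nothing to gain from taylor trial testimony",
    "campbell soup company reports quarterly earnings",
]
clusters = best_left_link(names, contexts)
print(clusters)
```

The two Naomi Campbell mentions from slide 41 end up in cluster 0, while "Campbell Soup" starts its own cluster despite sharing the token "campbell".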

45 Talk Outline

46 Query Mapping Reconciliation
Example mentions with their individual links and confidences:
- "[Seattle] has won…" → Seattle (0.7)
- "[Seattle] Seahawks ended the game…" → Seattle Seahawks (0.8)
- "… cheered for [Seattle]…" → Seattle (0.2)
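The slides do not spell out the reconciliation rule. One plausible rule, assumed here, gives every query in a cluster the single most confident link; the query IDs and the function name are hypothetical:

```python
def reconcile(cluster):
    """Give every query in a cluster its single most confident link.

    cluster: list of (query_id, link, confidence) tuples.
    Returns a dict query_id -> reconciled link.
    """
    _, best_link, _ = max(cluster, key=lambda m: m[2])
    return {query_id: best_link for query_id, _, _ in cluster}


cluster = [
    ("q1", "Seattle", 0.7),           # "[Seattle] has won…"
    ("q2", "Seattle Seahawks", 0.8),  # "[Seattle] Seahawks ended the game…"
    ("q3", "Seattle", 0.2),           # "… cheered for [Seattle]…"
]
print(reconcile(cluster))
```

Under this rule all three mentions map to Seattle Seahawks. A rule that sums confidences per link would instead choose Seattle (0.7 + 0.2 = 0.9 > 0.8), so the choice of aggregation matters.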

47 Talk Outline

48 Evaluation: TAC KBP 2011 Entity Linking
- Run the Relational Inference (RI) Wikifier "as-is"
  - No retraining using TAC data
[Results chart; footnote: *Median of top 14 systems]

49 Evaluation: TAC 2012 Entity Linking Error Analysis

50 Official 2013 Performance

51 Official 2013 Performance Breakdown: Link Type

52 Official 2013 Performance Breakdown: Document Domain

53 Official 2013 Performance Breakdown: NER Type

54 Conclusion
- Importance of linguistic and world knowledge
- Identification of relational information benefits Wikification and Entity Linking
Future work:
- Robust preprocessing on noisy input / adapting to EL task requirements
- "Self-supervision" on NIL clustering
- Unified NIL and KB entity representation
- Joint entity typing, coreference and disambiguation
- Incorporate more relations
Demo: http://cogcomp.cs.illinois.edu/demo/wikify
Download: http://cogcomp.cs.illinois.edu/page/download_view/Wikifier
Thank you!

55 Backup Slides

56 Applications: Knowledge Acquisition via Grounding
- Coreference resolution
  - Learning-based multi-sieve co-reference resolution with knowledge (Ratinov et al. 2012)
- Information extraction
  - Unsupervised relation discovery with sense disambiguation (Yao et al. 2012)
  - Automatic event extraction with structured preference modeling (Lu and Roth, 2012)
- Text classification
  - Gabrilovich and Markovitch, 2007; Chang et al., 2008

57 Wikification Performance Results

