
Relational Inference for Wikification


1 Relational Inference for Wikification
Xiao Cheng and Dan Roth University of Illinois at Urbana-Champaign

2 Wikification
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Wikification maps mentions in text ("Blumenthal", "the Times", "the Nutmeg State") to the Wikipedia titles of the concepts they refer to.

3 Applications
Knowledge Acquisition via Grounding
Coreference Resolution: learning-based multi-sieve coreference resolution with knowledge (Ratinov et al. 2012)
Information Extraction: unsupervised relation discovery with sense disambiguation (Yao et al. 2012); automatic event extraction with structured preference modeling (Lu and Roth, 2012)
Text Classification: Gabrilovich and Markovitch, 2007; Chang et al., 2008
Entity Linking
Wikification generally allows us to inject knowledge into text models, with access to rich structured concept data. Entity Linking is very similar, and we will in fact evaluate on an EL task; it differs only in restricting linking to entities and in clustering NIL entities.

4 Challenges
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Ambiguity: "Times" could mean The New York Times or The Times.
Variability: "Connecticut", "CT", and "the Nutmeg State" all name the same concept.
Concepts outside of Wikipedia (NIL): what if there is no correct answer for "Blumenthal" in Wikipedia? How do we link such mentions to NIL even when "Blumenthal" could plausibly refer to something we know?
Scale: millions of labels.

5 Challenges
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
State-of-the-art systems (Ratinov et al. 2011) can resolve the above with local and global statistical features, but they reach a bottleneck around 70%–85% F1 on non-Wikipedia datasets; adding more sophisticated contextual features does not help much. What is missing?

6 Motivating Example
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
What are we missing with Bag of Words (BOW) models? Who is Mubarak? A BOW model sees only unordered tokens, so it misses the constraining interaction between concepts expressed by the relation (Mubarak, wife, Hosni Mubarak).

7 Relational Inference for Wikification
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
(Mubarak, wife, Hosni Mubarak)
Our contribution:
Identify key textual relations for Wikification
A global inference framework to incorporate relational knowledge
Significant improvement over state-of-the-art systems

8 Talk Outline
Motivation
Approach: Wikification Pipeline, Formulation, Relational Analysis
Evaluation: Wikification Result, Entity Linking
We will first go over the motivating example for our work, then describe our approach to Wikification by giving its mathematical formulation and showing how to instantiate the model with relational analysis over the text. Finally, we will present our results and show the approach's robustness on an Entity Linking task.

9 Wikification
Mention Segmentation → Candidate Generation → Candidate Ranking → Determine NILs
To see where the improvements fit, we first give a quick overview of the terminology of the Wikification pipeline.

10 Wikification Pipeline 1 - Mention Segmentation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Sub-NP (noun phrase) chunks
NER
Regular expressions
Generate robust nested mentions
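As an illustration of this stage, here is a minimal sketch of generating nested candidate mentions from capitalized chunks and discarding spans not in a dictionary of known surface forms. The `KNOWN_MENTIONS` dictionary is made up for the example; the real pipeline uses a shallow parser and NER rather than this toy regex.

```python
import re

# Hypothetical dictionary of known surface forms (in the real system this
# comes from Wikipedia titles, hyperlink anchor texts, etc.).
KNOWN_MENTIONS = {"Slobodan Milošević", "Milošević", "Socialist Party",
                  "Yugoslav President"}

def segment_mentions(text, known=KNOWN_MENTIONS):
    """Generate nested mentions: capitalized token runs plus all their
    contiguous sub-spans that appear in the dictionary."""
    mentions = set()
    # Coarse NP-like chunks: runs of capitalized tokens.
    for m in re.finditer(r"(?:[A-ZŠŽĆČĐ][\w']*\s?)+", text):
        tokens = m.group().split()
        # Enumerate every contiguous sub-span (nested mentions).
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens) + 1):
                span = " ".join(tokens[i:j])
                if span in known:  # discard unknown spans
                    mentions.add(span)
    return mentions

text = ("...ousted long time Yugoslav President Slobodan Milošević in "
        "October. Mr. Milošević's Socialist Party…")
print(segment_mentions(text))
```

Note how "Slobodan Milošević" and the nested "Milošević" are both kept, which is what lets later stages reason about them separately.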

11 Wikification Pipeline 1 - Mention Segmentation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

12 Wikification Pipeline 2 - Candidate Generation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k | ek3
1 | Slobodan_Milošević
2 | Milošević_(surname)
3 | Boki_Milošević
4 | Alexander_Milošević

k | ek4
1 | Socialist_Party_(France)
2 | Socialist_Party_(Portugal)
3 | Socialist_Party_of_America
4 | Socialist_Party_(Argentina)

13 Wikification Pipeline 3 - Candidate Ranking
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k | ek3 | sk3
1 | Slobodan_Milošević | 0.7
2 | Milošević_(surname) | 0.1
3 | Boki_Milošević |
4 | Alexander_Milošević | 0.05

k | ek4 | sk4
1 | Socialist_Party_(France) | 0.23
2 | Socialist_Party_(Portugal) | 0.16
3 | Socialist_Party_of_America | 0.07
4 | Socialist_Party_(Argentina) | 0.06

Assign scores to candidate entities to express the model's preference based on contextual coherence, using local and global statistical features.

14 Wikification Pipeline 4 – Determine NILs
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k | ek3 | sk3
1 | Slobodan_Milošević | 0.7

k | ek4 | sk4
1 | Socialist_Party_(France) | 0.23

Is the top candidate really what the text referred to? We classify whether the top candidate belongs to Wikipedia, usually with a binary classifier that optimizes for the evaluation metric.

15 Talk Outline
Motivation
Approach: Wikification Pipeline, Formulation, Relational Analysis
Evaluation: Wikification Result, Entity Linking
Next: the formulation of our approach.

16 Formulation (0) Intuition
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, … (Mubarak, wife, Hosni Mubarak) Intuition Promote pairs of concepts coherent with textual relations

17 Formulation (1)
Formulate as an Integer Linear Program (ILP):

maximize  Σ_i Σ_k s_i^k e_i^k  +  Σ_{i,j} Σ_{k,l} w_ij^(k,l) r_ij^(k,l)

e_i^k ∈ {0,1}: whether to output the k-th candidate of the i-th mention
s_i^k: weight for outputting e_i^k
r_ij^(k,l) ∈ {0,1}: whether a relation exists between e_i^k and e_j^l
w_ij^(k,l): weight of relation r_ij^(k,l)

This formalizes the intuition; the weights are induced from data, with a small set of constraints. If no relation exists, the objective collapses to the non-structured decision, so the model is robust to negatives: true negatives and false negatives will not hurt the baseline's performance.
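To make the objective concrete, here is a toy sketch that solves the same maximization by exhaustive search instead of an ILP solver, which is feasible only because the example is tiny. The scores and the relation weight below are illustrative, not the learned values from the paper.

```python
from itertools import product

# s[i][k]: local ranker score for candidate k of mention i (illustrative).
s = [
    {"Slobodan_Milošević": 0.7, "Milošević_(surname)": 0.1},
    {"Socialist_Party_(France)": 0.23, "Socialist_Party_of_Serbia": 0.0},
]
# w[(i, j)][(k, l)]: weight of a verified relation between two candidates.
w = {(0, 1): {("Slobodan_Milošević", "Socialist_Party_of_Serbia"): 1.0}}

def infer(s, w):
    """Pick one candidate per mention maximizing local + relational score."""
    best, best_score = None, float("-inf")
    # Enumerate all joint assignments (one candidate per mention).
    for assignment in product(*(list(m) for m in s)):
        score = sum(s[i][k] for i, k in enumerate(assignment))
        for (i, j), rels in w.items():
            score += rels.get((assignment[i], assignment[j]), 0.0)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

print(infer(s, w))
```

With the relation weight present, the relational term outweighs the local preference for Socialist_Party_(France), flipping the decision to Socialist_Party_of_Serbia; with no relations, the objective reduces to picking each mention's top-scored candidate, matching the collapse property stated above.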

18 Formulation (2)
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k | ek3 | sk3
1 | Slobodan_Milošević | 0.7
2 | Milošević_(surname) | 0.1
3 | Boki_Milošević |
4 | Alexander_Milošević | 0.05

k | ek4 | sk4
1 | Socialist_Party_(France) | 0.23
2 | Socialist_Party_(Portugal) | 0.16
3 | Socialist_Party_of_America | 0.07
4 | Socialist_Party_(Argentina) | 0.06

Here we give a concrete example of how the variables in the objective function are instantiated. Relation variables such as r_34^(1,2) and r_34^(4,3) connect candidate pairs across mentions 3 and 4.
e_i^k: whether a concept is chosen
s_i^k: score of a concept
r_ij^(k,l): whether a relation is present
w_ij^(k,l): score of a relation

19 Relation Identification
Overall approach:
Wikification: Candidate Generation → Candidate Ranking → Determine NILs
Relation Analysis: Relation Identification → Relation Retrieval → Relational Inference
Next we describe our approach in detail by walking through an example. Our contribution, relation analysis, helps all stages of Wikification.

20 Relation Identification
ACE-style in-document coreference: extract named-entity-only coreference relations with high precision.
Syntactico-semantic relations (Chan & Roth '10): easy to extract with high precision.
Aim for high recall, as false positives will be verified and discarded later.
Sparse, but covers ~80% of relation instances in ACE2004.

Type | Example
Premodifier | Iranian Ministry of Defense
Possessive | NYC's stock exchange
Formulaic | Chicago, Illinois
Preposition | President of the US

21 Relation Identification
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Argument 1 | Relation Type | Argument 2
Yugoslav President | apposition | Slobodan Milošević
Slobodan Milošević | coreference | Milošević
Milošević | possessive | Socialist Party

There are quite a few relations even in this short passage. Relations are sparse overall but important for text understanding, since errors propagate in structured inference. We do not care much about the relation types themselves, as the relations will be constrained by our knowledge.

22 Relation Identification
Overall approach:
Wikification: Candidate Generation → Candidate Ranking → Determine NILs
Relation Analysis: Relation Identification → Relation Retrieval → Relational Inference
Having identified relations in the text, we next retrieve them from the knowledge base.

23 Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
What concepts can "Socialist Party" refer to?
Current approach: collect known mappings from Wikipedia page titles, hyperlinks, etc., and limit to the top-K candidates by frequency of links (Ratinov et al. 2011). Top-K gives a very strong baseline; it is the single most important feature in Wikification.
We include both explicit relations and Wikipedia hyperlinks as weak implicit relations, in the form (concept1, relationType, concept2).
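A minimal sketch of the top-K candidate generation described above; the anchor-link counts are made-up stand-ins for real Wikipedia hyperlink statistics.

```python
from collections import Counter

# Hypothetical anchor-text statistics harvested from Wikipedia hyperlinks:
# how often each surface form linked to each title (numbers invented).
anchor_counts = {
    "Socialist Party": Counter({
        "Socialist_Party_(France)": 5200,
        "Socialist_Party_(Portugal)": 3600,
        "Socialist_Party_of_America": 1500,
        "Socialist_Party_of_Serbia": 90,
    }),
}

def top_k_candidates(mention, k=3):
    """Rank candidate titles by how often the mention linked to them,
    returning (title, prior probability) pairs."""
    counts = anchor_counts.get(mention, Counter())
    total = sum(counts.values())
    return [(title, n / total) for title, n in counts.most_common(k)]

print(top_k_candidates("Socialist Party"))
```

This link-frequency prior is what makes the baseline strong, and also why rare-but-correct titles such as Socialist_Party_of_Serbia fall outside the top-K cutoff, motivating relation retrieval.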

24 Uninformative Mentions
Using all possible candidates would compromise our system's tractability.

25 Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
What concepts can "Socialist Party" refer to?
More robust candidate generation: identified relations are verified against a knowledge base (DBpedia). We retrieve relation arguments matching (Milošević, ?, Socialist Party) as our new candidates. Instead of searching the mention-to-concept mapping space, we move to the concept-to-concept relational space.

26 Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k | ek3 | sk3
1 | Slobodan_Milošević | 0.7

k | ek4 | sk4
1 | Socialist_Party_(France) | 0.23

Query pruning: the relation space is too large and query results are noisy, so we constrain one of the arguments to be a known top candidate. Because the baseline is strong, only two queries per mention pair are necessary; the chance that both top answers are wrong is very small:
q1 = (Socialist Party of France, ?, *Milošević*)
q2 = (Slobodan Milošević, ?, *Socialist Party*)
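The pruned queries can be sketched as pattern matching over KB triples. The triples below are illustrative stand-ins for DBpedia content, and `query` is a hypothetical helper, not the system's actual retrieval API.

```python
import fnmatch

# A few illustrative DBpedia-style triples (subject, predicate, object).
KB = [
    ("Slobodan_Milošević", "party", "Socialist_Party_of_Serbia"),
    ("Slobodan_Milošević", "spouse", "Mirjana_Marković"),
    ("François_Hollande", "party", "Socialist_Party_(France)"),
]

def query(subject, predicate, obj):
    """Return triples matching a pattern: '?' matches any predicate and
    '*' acts as a wildcard inside an argument pattern."""
    hits = []
    for s, p, o in KB:
        if (fnmatch.fnmatchcase(s, subject)
                and (predicate == "?" or p == predicate)
                and fnmatch.fnmatchcase(o, obj)):
            hits.append((s, p, o))
    return hits

# The pruned query q2 from the slide: anchor the subject on the top-ranked
# candidate and wildcard-match the other mention's surface form.
print(query("Slobodan_Milošević", "?", "*Socialist_Party*"))
```

The matched object, Socialist_Party_of_Serbia, then enters the candidate list for "Socialist Party" even though it was outside the original top-K.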

27 Relation Retrieval

Argument 1 | Relation Type | Argument 2
Milošević | possessive | Socialist Party

28 Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k | ek3 | sk3
1 | Slobodan_Milošević | 0.7
2 | Milošević_(surname) | 0.1
3 | Boki_Milošević |
4 | Alexander_Milošević | 0.05

k | ek4 | sk4
1 | Socialist_Party_(France) | 0.23
2 | Socialist_Party_(Portugal) | 0.16
3 | Socialist_Party_of_America | 0.07
4 | Socialist_Party_(Argentina) | 0.06
21 | Socialist_Party_of_Serbia | 0.0

The retrieved relation adds Socialist_Party_of_Serbia (ranked 21st with score 0.0 in the baseline) as a candidate and sets r_34^(1,21) = 1.

29 Relation Identification
Overall approach:
Wikification: Candidate Generation → Candidate Ranking → Determine NILs
Relation Analysis: Relation Identification → Relation Retrieval → Relational Inference
Having retrieved the relations, we next perform relational inference.

30 Relational Inference
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k | ek3 | sk3
1 | Slobodan_Milošević | 0.7
2 | Milošević_(surname) | 0.1
3 | Boki_Milošević |
4 | Alexander_Milošević | 0.05

k | ek4 | sk4
1 | Socialist_Party_(France) | 0.23
2 | Socialist_Party_(Portugal) | 0.16
3 | Socialist_Party_of_America | 0.07
4 | Socialist_Party_(Argentina) | 0.06
21 | Socialist_Party_of_Serbia | 0.0

We have r_34^(1,21) = 1; what should the weight w_34^(1,21) be?

31 Relation Scoring
Query scoring acts as a tie-breaker between multiple relations.
Explicit relations are stronger than hyperlink relations.
Normalize the score for each pair of mentions to [0, 1]:
w = (1/Z) f(q, σ)
where q is the relation query and σ is the retrieved relation tuple.
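One plausible instantiation of this scoring scheme. The exact f used in the system may differ; the Jaccard tie-breaker and the explicit-relation bonus below are assumptions made for illustration.

```python
def lexical_similarity(query_arg, retrieved_arg):
    """Token-overlap (Jaccard) between a query argument and a retrieved
    relation argument, used as a tie-breaker between multiple relations."""
    a = set(query_arg.lower().replace("_", " ").split())
    b = set(retrieved_arg.lower().replace("_", " ").split())
    return len(a & b) / len(a | b) if a | b else 0.0

def relation_weights(query_arg, candidates, explicit_bonus=1.0):
    """Score each retrieved relation, boost explicit (non-hyperlink)
    relations, then normalize per mention pair into [0, 1]."""
    raw = {}
    for arg, is_explicit in candidates:
        raw[arg] = (lexical_similarity(query_arg, arg)
                    + (explicit_bonus if is_explicit else 0.0))
    z = max(raw.values()) or 1.0  # normalization factor Z
    return {arg: v / z for arg, v in raw.items()}

weights = relation_weights(
    "Socialist Party",
    [("Socialist_Party_of_Serbia", True),   # explicit KB relation
     ("Socialist_Party_(France)", False)],  # weak hyperlink relation
)
print(weights)
```

The explicit relation dominates after normalization, which is the behavior the slide asks for: w_34^(1,21) ends up at 1.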

32 Relational Inference
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k | ek3 | sk3
1 | Slobodan_Milošević | 0.7
2 | Milošević_(surname) | 0.1
3 | Boki_Milošević |
4 | Alexander_Milošević | 0.05

k | ek4 | sk4
1 | Socialist_Party_(France) | 0.23
2 | Socialist_Party_(Portugal) | 0.16
3 | Socialist_Party_of_America | 0.07
4 | Socialist_Party_(Argentina) | 0.06
21 | Socialist_Party_of_Serbia | 0.0

r_34^(1,21) = 1, w_34^(1,21) = 1

33 Relational Inference - Coreference
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k | ek3 | sk3
1 | Slobodan_Milošević | 0.7
2 | Milošević_(surname) | 0.1
3 | Boki_Milošević |
4 | Alexander_Milošević | 0.05

k | ek2 | sk2
1 | Slobodan_Milošević | 1.0

r_23^(1,1) = 1. The semantics of the coreference relation demand that we assign the same output to all mentions in the coreferent cluster.
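A simplified sketch of this constraint: in the system it is an equality constraint inside the ILP, but its effect can be approximated by propagating the candidate list of the cluster's most confident member, as below. The candidate scores are illustrative.

```python
def propagate_coreference(candidates, clusters):
    """Force every mention in a coreference cluster to share the candidate
    list of its most confident member (highest top local score)."""
    for cluster in clusters:
        # Pick the member whose best candidate has the highest local score.
        anchor = max(cluster, key=lambda i: max(candidates[i].values()))
        for i in cluster:
            candidates[i] = dict(candidates[anchor])
    return candidates

# Mention 0: "Slobodan Milošević"; mention 1: the nested "Milošević"
# in "Mr. Milošević's", which locally prefers the surname page.
candidates = [
    {"Slobodan_Milošević": 0.7, "Milošević_(surname)": 0.1},
    {"Milošević_(surname)": 0.5, "Slobodan_Milošević": 0.3},
]
print(propagate_coreference(candidates, clusters=[[0, 1]]))
```

After propagation both mentions resolve to Slobodan_Milošević, mirroring the r_23^(1,1) = 1 example on the slide.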

34 Relation Identification
Overall approach:
Wikification: Candidate Generation → Candidate Ranking → Determine NILs
Relation Analysis: Relation Identification → Relation Retrieval → Relational Inference
Next we describe how we handle the NIL candidates.

35 Determine unknown concepts (NILs)
Dorothy Byrne, a state coordinator for the Florida Green Party,… (nominal mention)

k | ek1 | sk1
1 | Dorothy_Byrne_(British_Journalist) | 0.6
2 | Dorothy_Byrne_(mezzo-soprano) | 0.4

k | ek2 | sk2
1 | Green_Party_of_Florida | 1.0

NIL entities tend to be introduced with nominal mentions. How do we capture the fact that "Dorothy Byrne" does not refer to any concept in Wikipedia? We identify coreferent nominal-mention relations and generate better features for the NIL classifier.

36 Determine unknown concepts (NILs)
Dorothy Byrne, a state coordinator for the Florida Green Party,… (nominal mention)

k | ek1 | sk1
NIL | | 1.0
1 | Dorothy_Byrne_(British_Journalist) | 0.6
2 | Dorothy_Byrne_(mezzo-soprano) | 0.4

k | ek2 | sk2
1 | Green_Party_of_Florida | 1.0

To incorporate NIL classification into our inference framework, we create an artificial NIL candidate for propagation.
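A minimal sketch of creating the artificial NIL candidate so that "not in Wikipedia" can compete with real titles during inference; the NIL score of 1.0 is illustrative, not the value the system's classifier would produce.

```python
def add_nil_candidate(candidates, nil_score=1.0):
    """Prepend an artificial NIL candidate to a mention's candidate list
    so NIL participates in inference like any other candidate."""
    return {"NIL": nil_score, **candidates}

# Candidates for the mention "Dorothy Byrne" (scores from the slide).
cands = add_nil_candidate({
    "Dorothy_Byrne_(British_Journalist)": 0.6,
    "Dorothy_Byrne_(mezzo-soprano)": 0.4,
})
# With this NIL score, inference would prefer NIL over both real titles.
print(max(cands, key=cands.get))
```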

37 Talk Outline
Motivation
Approach: Wikification Pipeline, Formulation, Relational Analysis
Evaluation: Wikification Result, Entity Linking
Next: evaluation.

38 Wikification Performance Result
By incorporating relational inference into the disambiguation framework, our system achieves significant improvements over previous methods.

39 Evaluation – TAC KBP Entity Linking
Task definition:
Link a named entity in a document to either a TAC Knowledge Base (TAC KB) node or NIL
Cluster NIL entities
Relevant tasks: Wikification, cross-document coreference

40 Evaluation – TAC KBP Entity Linking
We run the Relational Inference (RI) Wikifier "as-is", with no retraining on TAC data. Even without retraining, it improves our baseline system to a degree that is statistically indistinguishable from the top systems (*median of the top 14 systems).

41 Conclusion
Importance of linguistic and world knowledge:
Identification of relational information benefits Wikification
Introduced an inference framework that leverages better language understanding
Future work:
Accumulate what we know about NIL concepts
Joint entity typing, coreference and disambiguation
Incorporate more relations
Thank you!

42 Backup slides

43 Massive Textual Information
How can we know more from large volumes of possibly unfamiliar raw texts?

44 Massive Textual Information
We naturally “look up” concepts and accumulate knowledge.

45 Challenges
Tuesday marks the 142nd anniversary of an event that forever altered the course of Chicago's development as a then-young American city.
It's a version of Chicago – the standard classic Macintosh menu font…
By the time the New Orleans Saints kicked off at Soldier Field on Sunday afternoon, their woeful history in the Windy City was fully understood.

46 Challenges
Tuesday marks the 142nd anniversary of an event that forever altered the course of Chicago's development as a then-young American city.
It's a version of Chicago – the standard classic Macintosh menu font…
By the time the New Orleans Saints kicked off at Soldier Field on Sunday afternoon, their woeful history in the Windy City was fully understood.
Ambiguity: "Chicago" the city vs. "Chicago" the font.
Variety: "Chicago" and "the Windy City" name the same concept.
Concepts outside of Wikipedia (NIL): similar to WSD, but we also have to deal with NILs.

47 Organizing knowledge
Wikipedia as a source of "common sense" knowledge:
Naturally bridges rich structured knowledge and text data
Comprehensive for most purposes: ~4.3 million English articles as of today
Cross-lingual
(Estimated size of a printed version of Wikipedia as of August 2010; picture courtesy of Wikipedia.)

48 Motivating Example
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
Relations such as (Mubarak, wife, Hosni Mubarak) are relatively sparse, but act as hard constraints. Intuitively, we need to "de-coreference" this pair of mentions: the relation tells us that "Mubarak" and "Hosni Mubarak" are different people. This opens a new dimension in text understanding and helps all stages of Wikification.

49 Wikification Approach
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Mention segmentation: use shallow parsing, NER and regular expressions to generate likely mentions of concepts; match nested mentions using a dictionary; discard unknown mentions. These are the traditional steps.

50 Ranking Features
Local features from a Bag of Words (BOW) representation, such as various TF-IDF windows.
Global features from a Bag of Concepts (BOC) representation.
What are we losing? The relational structure among "Mubarak", "wife", "Egyptian President" and "Hosni Mubarak".

51 Relation Extraction
In-document coreference: extract entity-only coreference relations with high precision.
Syntactico-semantic relations (Chan & Roth '10): easy to extract with high precision (deep parsing is too slow).

Type | Example
Premodifier | Iranian Ministry of Defense
Possessive | NYC's stock exchange
Formulaic | Chicago, Illinois
Preposition | President of the US

52 Relational Inference
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
The coreference and possessive relations here call for the need to look up knowledge.

53 Relational Inference ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party How to look up knowledge and use these relations?

54 Wikification Approach
Recall concepts → Rank concepts → Determine unknown concepts (NILs)

55 Determine NILs
Dorothy Byrne, a state coordinator for the Florida Green Party…
Leverage nominal mention relations to construct limited context.

56 Relational Inference Example
...like Chicago O'Hare and [La Guardia] in [New York],...

Wikipedia Page | s_k^i
LaGuardia_Airport | 0.36
La_Guardia | 0.22
Fiorello_La_Guardia | 0.18
La_Guardia,_Spain | 0.11

Wikipedia Page | s_k^i
New_York | 0.53
New_York_City | 0.44
New_York_(magazine) | 0.01
New_York_Knicks |

57 Relational Inference Scenario 1
...like Chicago O'Hare and [La Guardia] in [New York],...

Wikipedia Page | Score
LaGuardia_Airport | 0.36
La_Guardia | 0.22
Fiorello_La_Guardia | 0.18
La_Guardia,_Spain | 0.11

Wikipedia Page | Score
New_York | 0.53
New_York_City | 0.44
New_York_(magazine) | 0.01
New_York_Knicks |

Γ = … = 0.71

58 Relational Inference Scenario 2
...like Chicago O'Hare and [La Guardia] in [New York],...

Wikipedia Page | Score
LaGuardia_Airport | 0.36
La_Guardia | 0.22
Fiorello_La_Guardia | 0.18
La_Guardia,_Spain | 0.11

Wikipedia Page | Score
New_York | 0.53
New_York_City | 0.44
New_York_(magazine) | 0.01
New_York_Knicks |

Γ = … = 0.8

59 Relational Inference Conclusion
...like Chicago O'Hare and [La Guardia] in [New York],...
Pr(Γ = …) > Pr(Γ = …)

60 Constraint scaling
q: a query; σ: a retrieved relation triple
Components: relation scaling factor, query lexical similarity, normalization factor.
Lexical similarity acts as a tie-breaker between multiple relations
Explicit relations are stronger than hyperlink relations
Normalize the score for each pair of mentions to [0, 1]

61 Wikification Pipeline 1 - Mention Segmentation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Sub-noun-phrase chunks and NER segments (Ratinov et al. 2011)
Regular expressions capturing longer composite mentions, e.g. "President of the United States of America"
Generate robust nested mentions

