Nam Khanh Tran L3S Research Center, Leibniz Universität Hannover


1 Contextualization
Nam Khanh Tran
L3S Research Center, Leibniz Universität Hannover

2 [Figure: overview diagram with the labels: Knowledge base, Entity Linking, Entity Exploration, Contextualization, Related entities]

3 Knowledge Bases: a Pragmatic Definition
A knowledge base (KB) is a comprehensive, semantically organized, machine-readable collection of universally relevant or domain-specific entities, classes, and facts (attributes, relations)
plus spatial and temporal dimensions
plus commonsense properties and rules
plus contexts of entities and facts (textual & visual witnesses, descriptors, statistics)
plus …

4 Entity Linking
When Page played Kashmir at Knebworth, his Les Paul was uniquely tuned.

5 Entity Linking
When Page played Kashmir at Knebworth, his Les Paul was uniquely tuned.
[Figure: candidate entities shown for the mentions in the sentence, with overflow counts +18, +1, +4, +351; 127080 possible combinations]

6 Common Features for Disambiguation
When Page played Kashmir at Knebworth, his Les Paul was uniquely tuned.
Candidate with keyphrases India, Pakistan, Pashmina: prior 91%, context 0.0
Candidate with keyphrases Led Zeppelin, Jimmy Page, Knebworth Festival: prior 5%, context 2.4
Prior: How often did “Kashmir” link to this entity in Wikipedia?
Context: How well do entity keyphrases and context tokens overlap?
Coherence: Are the disambiguated entities related?

7 Mention-Entity Popularity Weights
Need dictionary with entities' names:
- full names: Arnold Alois Schwarzenegger, Los Angeles, Microsoft Corp.
- short names: Arnold, Arnie, Mr. Schwarzenegger, New York, Microsoft, …
- nicknames & aliases: Terminator, City of Angels, Evil Empire, …
- acronyms: LA, UCLA, MS, MSFT
- role names: the Austrian action hero, Californian governor, CEO of MS, …
Collect hyperlink anchor-text / link-target pairs from:
- Wikipedia redirects
- Wikipedia links between articles and Interwiki links
- Web links pointing to Wikipedia articles
- query-and-click logs
Build statistics to estimate P[entity | name]
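The statistics step on this slide could be sketched as simple relative-frequency counting over (anchor text, link target) pairs. This is a minimal sketch, not the method of any particular tool: the function name `build_name_entity_prior`, the entity identifiers, and the toy pairs are all hypothetical, and no name normalization or smoothing is applied.

```python
from collections import Counter, defaultdict


def build_name_entity_prior(anchor_pairs):
    """Estimate P[entity | name] by relative frequency.

    anchor_pairs: iterable of (name, entity) tuples, e.g. anchor text and
    link target harvested from hyperlinks (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for name, entity in anchor_pairs:
        counts[name][entity] += 1

    prior = {}
    for name, entity_counts in counts.items():
        total = sum(entity_counts.values())
        prior[name] = {e: c / total for e, c in entity_counts.items()}
    return prior


# Toy data, invented for illustration only.
pairs = [
    ("Kashmir", "Kashmir_(region)"),
    ("Kashmir", "Kashmir_(region)"),
    ("Kashmir", "Kashmir_(region)"),
    ("Kashmir", "Kashmir_(song)"),
    ("LA", "Los_Angeles"),
]
prior = build_name_entity_prior(pairs)
print(prior["Kashmir"])
```

With real link data, rare names would likely need smoothing or a minimum-count threshold before the estimates are trusted.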

8 Entity-Entity Coherence
- Precompute overlap of incoming links for entities e1 and e2
- Alternatively compute overlap of anchor texts for e1 and e2, or overlap of keyphrases, or similarity of bag-of-words, or …
- Optionally combine with type distance of e1 and e2 (e.g., Jaccard index for type instances)
- For special types of e1 and e2 (locations, people, etc.) use spatial or temporal distance
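One simple way to turn "overlap of incoming links" into a number is the Jaccard index of the two in-link sets. The slide does not say which overlap measure is used, so Jaccard here is an assumption, and the function name `inlink_overlap`, the entity identifiers, and the toy link data are hypothetical.

```python
def inlink_overlap(inlinks, e1, e2):
    """Jaccard overlap of the incoming-link sets of two entities.

    inlinks: dict mapping an entity to the set of pages that link to it
    (hypothetical precomputed structure).
    """
    a = inlinks.get(e1, set())
    b = inlinks.get(e2, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy in-link sets, invented for illustration only.
inlinks = {
    "Jimmy_Page": {"Led_Zeppelin", "Kashmir_(song)", "Gibson_Les_Paul"},
    "Kashmir_(song)": {"Led_Zeppelin", "Jimmy_Page", "Physical_Graffiti"},
    "Kashmir_(region)": {"India", "Pakistan"},
}
print(inlink_overlap(inlinks, "Jimmy_Page", "Kashmir_(song)"))
print(inlink_overlap(inlinks, "Jimmy_Page", "Kashmir_(region)"))
```

The same function applies unchanged to sets of anchor texts or keyphrases, which are the alternatives the slide lists.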

9 Entity-linking Online Tools
J. Hoffart et al.: EMNLP 2011, VLDB 2011
P. Ferragina, U. Scaiella: CIKM 2010
R. Isele, C. Bizer: VLDB 2012
Reuters Open Calais:
Alchemy API:
S. Kulkarni, A. Singh, G. Ramakrishnan, S. Chakrabarti: KDD 2009
D. Milne, I. Witten: CIKM 2008
L. Ratinov, D. Roth, D. Downey, M. Anderson: ACL 2011
D. Ceccarelli, C. Lucchese, S. Orlando, R. Perego, S. Trani: CIKM 2013
A. Moro, A. Raganato, R. Navigli: TACL 2014

10 Entity Exploration

11 Entity Exploration
[1] Lee, Joonseok and Fuxman, Ariel and Zhao, Bo and Lv, Yuanhua. Leveraging Knowledge Bases for Contextual Entity Exploration. KDD ’15

12 Entity Exploration

13 Context-Selection Betweenness
Captures to what extent a given candidate node serves as a bridge between the user selection node and the context nodes
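The bridging idea can be illustrated with a simplified betweenness-style score: for each context node, take the fraction of shortest paths between the user selection node and that context node that pass through the candidate, then average over context nodes. This is a sketch under assumptions, not the exact formula of [1]: the graph is assumed undirected and unweighted, and the names `bfs_counts` and `context_selection_betweenness` are hypothetical.

```python
from collections import deque


def bfs_counts(graph, source):
    """Return (dist, sigma): shortest-path lengths and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:          # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def context_selection_betweenness(graph, selection, context, candidate):
    """Average fraction of selection-to-context shortest paths through candidate."""
    if not context:
        return 0.0
    dist_s, sigma_s = bfs_counts(graph, selection)
    total = 0.0
    for c in context:
        dist_c, sigma_c = bfs_counts(graph, c)
        if c not in dist_s or candidate not in dist_s or candidate not in dist_c:
            continue  # unreachable pair contributes nothing
        # candidate lies on a shortest path iff the two legs add up exactly
        if dist_s[candidate] + dist_c[candidate] == dist_s[c]:
            total += sigma_s[candidate] * sigma_c[candidate] / sigma_s[c]
    return total / len(context)


# Toy undirected graph (adjacency lists), invented for illustration.
graph = {
    "s": ["a", "b"],
    "a": ["s", "c1", "c2"],
    "b": ["s", "c1"],
    "c1": ["a", "b"],
    "c2": ["a"],
}
score_a = context_selection_betweenness(graph, "s", ["c1", "c2"], "a")
score_b = context_selection_betweenness(graph, "s", ["c1", "c2"], "b")
print(score_a, score_b)
```

In the toy graph, node a lies on paths to both context nodes while b reaches only one of them, so a receives the higher score.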

14 Entity Exploration

15 Personalized Random Walk
- The random walk is simulating the behavior of a user reading articles
- The random walk scores of a node are probability scores and thus sum up to 1
- Personalized random walk retrieves semantically relevant pages from the query and context terms by assigning higher probability (score) to closely and densely connected nodes from the user selection and context nodes
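A personalized random walk of this kind is commonly computed by power iteration with restarts to the seed nodes (here the user selection and context nodes). The sketch below assumes a uniform restart distribution over the seeds and a restart probability of 0.15; both values, and the function name `personalized_random_walk`, are assumptions rather than details taken from the slides.

```python
def personalized_random_walk(graph, seeds, restart=0.15, iterations=50):
    """Random walk with restart to the seed nodes, computed by power iteration.

    graph: dict mapping each node to a list of neighbor nodes.
    seeds: user selection and context nodes (restart targets).
    Returns a dict of scores that sum to 1.
    """
    seeds = set(seeds)
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(teleport)
    for _ in range(iterations):
        nxt = {n: restart * teleport[n] for n in nodes}
        for u in nodes:
            walk_mass = (1.0 - restart) * scores[u]
            neighbors = graph[u]
            if neighbors:
                share = walk_mass / len(neighbors)
                for w in neighbors:
                    nxt[w] += share
            else:
                # dangling node: send its mass back to the seeds
                for n in seeds:
                    nxt[n] += walk_mass / len(seeds)
        scores = nxt
    return scores


# Toy graph, invented for illustration.
graph = {
    "s": ["near"],
    "ctx": ["near"],
    "near": ["s", "ctx", "mid"],
    "mid": ["near", "far"],
    "far": ["mid"],
}
scores = personalized_random_walk(graph, seeds=["s", "ctx"])
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Each iteration redistributes exactly the mass it receives, so the scores stay a probability distribution, and nodes close to the seeds end up with higher scores than distant ones.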

16 Score Aggregation

17 Time-aware Contextualization
Prior to 1964, many of the cigarette companies advertised their brand by falsely claiming that their product did not have serious health risks. A couple of examples would be "Play safe with Philip Morris" and "More doctors smoke Camels". Such claims were made both to increase the sales of their product and to combat the increasing public knowledge of smoking's negative health effects.
[Figure: Advertisement poster from the 1950s]
[2] Tran, Nam Khanh and Ceroni, Andrea and Kanhabua, Nattiya and Niederee, Claudia. Back to the Past: Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. WSDM ’15

18 Time-aware Contextualization
Time-aware contextualization aims to associate an information item d with time-aware, concise and coherent context information c for easing its understanding.
Several sub-goals of the information search process have to be combined with each other:
- c has to be relevant for d
- c has to complement the information already available in d
- c has to consider the time of creation of d
- the context information should be concise to avoid overloading the user

19 Overview of our approach
[Figure: pipeline diagram with the components: contextualization units extraction, contextualization units index, article, hook identification, query formulation, context retrieval, context ranking, context]

20 Query formulation
The goal is to generate a set of queries for a given document to retrieve candidates as input for the re-ranking step.
We explore two families of query formulation methods:
- Document-based methods: title, lead, title+lead
- Hook-based methods: each_hook, all_hooks, and query performance prediction with the following features:
  - Linguistic features
  - Document frequency
  - Scope
  - Temporal document frequency
  - Temporal scope
  - Temporal similarity
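The document-based and simple hook-based methods named on this slide could be sketched as follows. The method names come from the slide; how the strings are combined (plain concatenation with a space) and the function name `formulate_queries` are assumptions, and the query-performance-prediction variant is left out because the slide gives only its feature names.

```python
def formulate_queries(title, lead, hooks, method):
    """Return the list of query strings for one document.

    title, lead: strings from the document.
    hooks: list of hook strings identified in the document.
    method: one of "title", "lead", "title+lead", "each_hook", "all_hooks".
    """
    if method == "title":
        return [title]
    if method == "lead":
        return [lead]
    if method == "title+lead":
        return [f"{title} {lead}"]
    if method == "each_hook":
        return list(hooks)           # one query per hook
    if method == "all_hooks":
        return [" ".join(hooks)]     # a single query from all hooks
    raise ValueError(f"unknown method: {method}")


# Toy document, invented for illustration.
title = "Cigarette advertising"
lead = "Companies claimed their products were safe."
hooks = ["Philip Morris", "Camels"]
print(formulate_queries(title, lead, hooks, "each_hook"))
print(formulate_queries(title, lead, hooks, "all_hooks"))
```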

21 Context ranking
Context retrieval
Learning to rank context
The ranking algorithm needs to balance two goals, i.e., high topical and temporal relevance as well as complementarity for providing additional information.
Use supervised machine learning that takes as input a set of labeled examples and various complementarity features:
- Topic diversity
- Text difference
- Entity difference
- Anchor text difference
- Distributional similarity
- Cosine distance
- Relevance
- Temporal similarity
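Two of the complementarity features can be illustrated with simple set and bag-of-words computations. The slide only names the features, so the definitions below (cosine distance over whitespace tokens, entity difference as the share of context entities absent from the document) are assumptions, and the function names are hypothetical. The resulting vector would be one input row for a learning-to-rank model, which is not shown.

```python
import math
from collections import Counter


def cosine_distance(text_a, text_b):
    """1 - cosine similarity of lowercase bag-of-words term counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def entity_difference(doc_entities, ctx_entities):
    """Share of context entities that do not already occur in the document."""
    ctx = set(ctx_entities)
    return len(ctx - set(doc_entities)) / len(ctx) if ctx else 0.0


def complementarity_features(doc_text, ctx_text, doc_entities, ctx_entities):
    """Feature vector for one (document, context candidate) pair."""
    return [
        cosine_distance(doc_text, ctx_text),
        entity_difference(doc_entities, ctx_entities),
    ]


# Toy pair, invented for illustration.
features = complementarity_features(
    "more doctors smoke camels",
    "camels advertising and health risks",
    {"Camel_(cigarette)"},
    {"Camel_(cigarette)", "Surgeon_General"},
)
print(features)
```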

22 [Figure: overview diagram with the labels: Knowledge base, Entity Linking, Entity Exploration, Contextualization, Related entities]


