1
DivQ: Diversification for Keyword Search over Structured Databases
Elena Demidova, Peter Fankhauser, Xuan Zhou and Wolfgang Nejdl
L3S Research Center, Hannover, Germany
Fraunhofer IPSI, Darmstadt, Germany
CSIRO ICT Centre, Australia
SIGIR 2010
2010. 12. 17. Jaehui Park
2
Copyright 2010 by CEBT
INTRODUCTION
Keyword search over structured data
– No single interpretation of a keyword query can satisfy all users.
– Multiple interpretations may yield overlapping results.
Diversification
– Minimizing the risk of user dissatisfaction by balancing the relevance and novelty of search results
An example
– Query: "London"
  – location: the capital of the UK
  – name: a book written by Jack London
– Each occurrence can be viewed as a keyword interpretation with different semantics, offering complementary results.
3
INTRODUCTION
Motivation
– Taking advantage of the structure of the database
  – Interpreting the query in terms of the underlying database
  – Delivering more diverse and orthogonal representations of query results (ex: attribute)
Contributions
– DivQ
  – A probabilistic query disambiguation model
  – A diversification scheme for generating top-k query interpretations
– Evaluation metrics for structured data
  – α-nDCG-W
  – WS-recall
4
The Diversification Scheme
Query interpretations
– A keyword query -> a set of structured queries
Ranking the query interpretations
– Providing a quick overview of the available classes of results
– Faceted search: navigate and choose

Q: CONSIDERATION CHRISTOPHER GUEST

Top-3 interpretations: ranking
  Relevance  Interpretation
  0.9        A director CHRISTOPHER GUEST of a movie CONSIDERATION
  0.5        A director CHRISTOPHER GUEST
  0.8        An actor CHRISTOPHER GUEST in a movie CONSIDERATION

Top-3 interpretations: diversification (increasing novelty)
  Relevance  Interpretation
  0.9        A director CHRISTOPHER GUEST of a movie CONSIDERATION
  0.4        An actor CHRISTOPHER GUEST
  0.2        A plot containing CHRISTOPHER GUEST of a movie
5
The Diversification Scheme
Bringing Keywords into Structure
Keyword interpretations A_i:k_i
– Each keyword k_i is mapped to an element A_i of an algebraic expression
– A (predefined) query template T joins the keyword interpretations; a template is a structural pattern that is frequently used to query the database
– An example
  – Keyword query (K): CONSIDERATION CHRISTOPHER GUEST
  – Keyword interpretations: director:CHRISTOPHER, director:GUEST, movie:CONSIDERATION
  – T: A director X of a movie Y
6
The Diversification Scheme
Estimating Query Relevance
Relevance of a query interpretation Q to the informational need K
– P(Q|K) = P(I,T|K), where T is the query template and I is the set of keyword interpretations
– Assumptions
  – Each keyword has one particular interpretation.
  – The probability of a keyword interpretation is independent of the parts of the query interpretation that the keyword is not interpreted to.
– Attribute-specific term frequency (ex: the average number of co-occurrences)
  – ex) rank higher: a first name and a last name of a person both mapped to the attribute "name"
Formula annotations
– P(A_j:k_j | A_j): the probability that, given that A_j is a part of a query interpretation, the keyword interpretation A_j:k_j is also a part of the query interpretation
– α: smoothing factor
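The formulas on this slide did not survive in the transcript, so the following is only a sketch of one factorization that is consistent with the two stated assumptions: an unnormalized score equal to a template prior times the product of per-keyword probabilities P(A_j:k_j | A_j). The add-alpha smoothing form, the function names, and all counts are assumptions made for illustration, not the authors' exact estimator.

```python
from math import prod


def keyword_interpretation_prob(term_count, attribute_total, vocab_size, alpha=1.0):
    """Smoothed estimate of P(A:k | A) from an attribute-specific term frequency.

    term_count      -- occurrences of keyword k in attribute A (hypothetical count)
    attribute_total -- total term occurrences in attribute A
    vocab_size      -- number of distinct terms in attribute A
    alpha           -- smoothing factor (add-alpha smoothing is an assumption)
    """
    return (term_count + alpha) / (attribute_total + alpha * vocab_size)


def interpretation_score(template_prior, keyword_probs):
    """Unnormalized relevance: template prior times the per-keyword probabilities.

    The product form relies on the independence assumption stated on the slide.
    """
    return template_prior * prod(keyword_probs)


# Hypothetical counts for director:CHRISTOPHER and director:GUEST.
p_christopher = keyword_interpretation_prob(30, 1000, 200)
p_guest = keyword_interpretation_prob(5, 1000, 200)
score = interpretation_score(0.4, [p_christopher, p_guest])
```

Scores computed this way are only comparable across interpretations of the same keyword query; normalizing them over all candidate interpretations would give values that sum to one.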
7
The Diversification Scheme
Estimating Query Similarity
– The Jaccard coefficient between the sets of keyword interpretations I contained in Q_1 and Q_2
Combining Relevance and Similarity
1. Select the most relevant interpretation as the first interpretation presented to the user
2. Select each subsequent interpretation based on both its relevance and its novelty
Formula annotation: the set of already selected query interpretations
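The similarity measure can be illustrated with a short sketch. It computes the Jaccard coefficient between two sets of keyword interpretations; encoding each interpretation as an "attribute:keyword" string and returning 1.0 for two empty sets are conventions chosen here, not something the slide specifies.

```python
def jaccard(interps_1, interps_2):
    """Jaccard coefficient between two sets of keyword interpretations."""
    a, b = set(interps_1), set(interps_2)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


q1 = {"director:CHRISTOPHER", "director:GUEST", "movie:CONSIDERATION"}
q2 = {"director:CHRISTOPHER", "director:GUEST"}
q3 = {"actor:CHRISTOPHER", "actor:GUEST"}

sim_12 = jaccard(q1, q2)  # two shared interpretations out of three in the union
sim_13 = jaccard(q1, q3)  # no shared interpretations
```

A high value such as `sim_12` signals that the second interpretation adds little novelty over the first, while `sim_13` is zero because the two interpretations map the keywords to different attributes.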
8
The Diversification Scheme
The Diversification Algorithm
– Materializing the top-k relevant query interpretations
– Worst case: O(l*r)
  – l: the number of query interpretations in L
  – r: the number of query interpretations in the result list R
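The algorithm itself is shown only as a figure in the original slide, so the following is a naive sketch of a greedy selection in the same spirit rather than the authors' procedure. The scoring rule (a trade-off parameter `lam` between relevance and the average Jaccard similarity to the already selected interpretations) is an assumption in the style of maximal marginal relevance, and the "attribute:keyword" encodings of the example interpretations are hypothetical. This version recomputes similarities on every round and makes no attempt to reach the O(l*r) bound.

```python
def jaccard(a, b):
    """Jaccard coefficient between two sets of keyword interpretations."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def diversify(candidates, r, lam=0.5):
    """Greedily pick r interpretations from (relevance, interpretation_set) pairs.

    lam = 1.0 reduces to plain relevance ranking; smaller values favor novelty.
    The scoring rule is an assumed MMR-style trade-off, not the slide's formula.
    """
    remaining = sorted(candidates, key=lambda c: c[0], reverse=True)
    if not remaining or r <= 0:
        return []
    selected = [remaining.pop(0)]  # step 1: the most relevant interpretation
    while remaining and len(selected) < r:
        def score(c):
            avg_sim = sum(jaccard(c[1], s[1]) for s in selected) / len(selected)
            return lam * c[0] - (1 - lam) * avg_sim
        best = max(remaining, key=score)  # step 2: relevance versus novelty
        remaining.remove(best)
        selected.append(best)
    return selected


candidates = [
    (0.9, frozenset({"director:CHRISTOPHER", "director:GUEST", "movie:CONSIDERATION"})),
    (0.8, frozenset({"actor:CHRISTOPHER", "actor:GUEST", "movie:CONSIDERATION"})),
    (0.5, frozenset({"director:CHRISTOPHER", "director:GUEST"})),
    (0.4, frozenset({"actor:CHRISTOPHER", "actor:GUEST"})),
    (0.2, frozenset({"plot:CHRISTOPHER", "plot:GUEST"})),
]
by_relevance = [rel for rel, _ in diversify(candidates, 3, lam=1.0)]
diversified = [rel for rel, _ in diversify(candidates, 3, lam=0.2)]
```

With these hypothetical encodings, `lam=1.0` selects the interpretations with relevance 0.9, 0.8 and 0.5, while `lam=0.2` selects 0.9, 0.4 and 0.2, trading relevance for interpretations that share no keyword mappings with those already chosen.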
9
EVALUATION METRICS
α-nDCG-W
Example relevance grades
  D1  D2  D3  D4  D5  D6
  3   2   3   0   1   2
CG_n (Cumulative Gain)
– ex) 3+2+3+0+1+2 = 11
DCG_i (Discounted Cumulative Gain)
– ex) DCG_1 = 3, DCG_2 = 3 + 2/log_2(2) = 5, DCG_3 = 3 + (2/log_2(2) + 3/log_2(3)) = 6.893
nDCG_i = DCG_i / ideal DCG_i
α-nDCG
– Views a document as a set of information nuggets n
– Counts how many documents containing n were seen before and discounts the gain of this document accordingly
– If α = 0, it is the standard nDCG
– With increasing α, novelty is rewarded with more credit
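The worked DCG example can be checked with a few lines of code. This sketch follows the convention used in the example, where the first result is not discounted and the result at rank i >= 2 is divided by log_2(i); other DCG variants discount differently, and the function names are chosen here for illustration.

```python
from math import log2


def dcg(gains, k=None):
    """Discounted cumulative gain over the first k gains (all gains if k is None)."""
    return sum(
        g if rank == 1 else g / log2(rank)
        for rank, g in enumerate(gains[:k], start=1)
    )


def ndcg(gains, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


gains = [3, 2, 3, 0, 1, 2]  # relevance grades of D1..D6
cg = sum(gains)             # cumulative gain
dcg_3 = dcg(gains, 3)       # 3 + 2/log2(2) + 3/log2(3)
```

Because the ideal ordering places the highest grades first, `ndcg` is 1.0 only when the ranking is already sorted by descending gain.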
10
EVALUATION METRICS
α-nDCG-W
In databases
– An information nugget n corresponds to a primary key pk_i
The gain
The overlap (overlap factor)
– For each primary key pk_i in the result of Q_k, count how many query interpretations with pk_i were seen before, and aggregate the counts
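The gain and overlap formulas of α-nDCG-W are not preserved in the transcript. As a stand-in, the sketch below computes the novelty-discounted gain of the standard, unweighted α-nDCG with primary keys acting as nuggets: a key that has already appeared in c earlier results contributes (1 - α)^c. The weighting that gives the -W variant its name is deliberately left out, and the function name and example keys are hypothetical.

```python
from collections import Counter


def alpha_gains(results, alpha):
    """Per-rank gain of standard (unweighted) alpha-nDCG with primary keys as nuggets.

    results -- list of primary-key collections, one per ranked query interpretation
    alpha   -- novelty parameter; 0 means repeated keys are not discounted
    """
    seen = Counter()  # how many earlier results contained each primary key
    gains = []
    for pks in results:
        pks = set(pks)
        gains.append(sum((1 - alpha) ** seen[pk] for pk in pks))
        seen.update(pks)
    return gains


results = [{1, 2}, {2, 3}, {1, 2}]
no_discount = alpha_gains(results, alpha=0.0)
half_discount = alpha_gains(results, alpha=0.5)
```

These per-rank gains would then be discounted by rank and normalized against an ideal ordering, in the same way as in ordinary nDCG.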
11
EVALUATION METRICS
Weighted S-Recall
S-recall
– Instance recall at rank k when search results are related to several subtopics
– The number of unique subtopics covered by the first k results, divided by the total number of subtopics
– A primary key corresponds to a subtopic in S-recall
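A minimal sketch of plain S-recall as defined above, with primary keys standing in for subtopics. The weighted variant (WS-recall) is not spelled out in the transcript, so this covers only the unweighted measure; the function name and the example keys are hypothetical.

```python
def s_recall(results, k, all_subtopics):
    """Fraction of all subtopics covered by the first k results.

    results       -- list of primary-key collections, one per ranked result
    all_subtopics -- every primary key that any relevant result could return
    """
    universe = set(all_subtopics)
    if not universe:
        return 0.0
    covered = set().union(*results[:k]) & universe
    return len(covered) / len(universe)


results = [{1, 2}, {2}, {3}]
all_keys = {1, 2, 3, 4}
recall_at_1 = s_recall(results, 1, all_keys)
recall_at_3 = s_recall(results, 3, all_keys)
```

The second result adds no new key here, so S-recall stays flat between rank 1 and rank 2, which is exactly the redundancy that diversification aims to avoid.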
12
EXPERIMENTS
– IMDB: 10,000,000 records
– Lyrics: 400,000 records
– Query logs: MSN, AOL
  – 200 most frequent queries (single query)
  – 100 queries (complex queries)
13
EXPERIMENTS
User Study
– 16 participants were asked to assess the relevance of the top-25 interpretations on a two-point Likert scale
14
EXPERIMENTS
α-nDCG-W
– α = 0, 0.5, and 0.99
15
EXPERIMENTS
WS-recall
Balancing Relevance and Novelty
16
CONCLUSION
We present an approach to search result diversification over structured data.
– A probabilistic query disambiguation model
– A query similarity measure
– A greedy algorithm
Adaptations of established evaluation metrics are proposed.
– α-nDCG-W and WS-recall
Evaluation results demonstrate the quality of the proposed model and show that our algorithms can substantially improve the novelty of keyword search results over structured data.