1
EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS
Presenting: Karina Koifman
Course: DB Seminar
2
Example
3
Yahoo! Autos
4
Maybe a better retrieval
5
Introduction
The article addresses the problem of efficiently computing diverse query results in online shopping applications.
6
The Goal
The goal of diverse query answering is to return a representative set of top-k answers from all the tuples that satisfy the user's selection condition.
7
The Problem
Users issue a query for a product.
Only the most relevant answers are shown.
Many of these answers are duplicates.
8
Agenda
Existing solutions
Definition of diversity
Impossibility results for diversity
Query processing techniques
9
Existing Solutions
Existing solutions are either inefficient or do not work in all situations.
Example: obtaining all the query results and then picking a diverse subset from them does not scale to large data sets.
10
Existing Solutions
Web search engines: first retrieve c × k results and then pick a diverse subset of k from these.
This is more efficient than the previous method.
However, product listings contain many near-duplicates, so the c × k results may all be similar: the approach is still inefficient and does not guarantee diversity.
11
Existing Solutions
Issuing multiple queries to obtain diverse results:
12
Pros/Cons
The good: diversity.
The bad: it hurts performance, and some of the queries return empty results.*
*There are no Honda Accord convertibles.
13
Agenda
Existing solutions
Definition of diversity
Impossibility results for diversity
Query processing techniques
14
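Diversity Ordering
A diversity ordering of a relation R with attributes A, denoted ≺_R, is a total ordering of the attributes in A.
Attributes earlier in the ordering matter more for diversity: results should first vary by Make, then by Model, and so on.
Example: Make ≺ Model ≺ Color ≺ Year ≺ Description ≺ Id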
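To make the ordering concrete, the Python sketch below (an illustration, not code from the article) projects each car onto its attributes in the order Make ≺ Model ≺ Color ≺ Year ≺ Description ≺ Id and sorts by that projection, so cars that share a prefix end up next to each other. The `Car` type and the `ordering_key` helper are hypothetical names, and sibling values come out in alphabetical order, which is an arbitrary choice.

```python
from typing import NamedTuple


class Car(NamedTuple):
    id: int
    make: str
    model: str
    color: str
    year: int
    description: str


# Diversity ordering from the slide: Make ≺ Model ≺ Color ≺ Year ≺ Description ≺ Id
DIVERSITY_ORDER = ("make", "model", "color", "year", "description", "id")


def ordering_key(car: Car) -> tuple:
    """Project a car onto its attributes, highest diversity priority first."""
    return tuple(getattr(car, attr) for attr in DIVERSITY_ORDER)


cars = [
    Car(12, "Toyota", "Prius", "Tan", 2007, "Low miles"),
    Car(2, "Honda", "Civic", "Blue", 2007, "Low miles"),
    Car(6, "Honda", "Accord", "Blue", 2007, "Best Price"),
    Car(1, "Honda", "Civic", "Green", 2007, "Low miles"),
]

# Sorting by the projected key groups cars that share a prefix
# (same make, then same model, ...) into contiguous runs.
for car in sorted(cars, key=ordering_key):
    print(car.id, ordering_key(car)[:2])
```

Both Civics (ids 2 and 1) end up adjacent, between the Accord and the Toyota, because they agree on the two highest-priority attributes.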
15
The DB example
Id  Make    Model    Color   Year  Description
1   Honda   Civic    Green   2007  Low miles
2   Honda   Civic    Blue    2007  Low miles
3   Honda   Civic    Red     2007  Low miles
4   Honda   Civic    Black   2007  Low miles
5   Honda   Civic    Black   2006  Low miles
6   Honda   Accord   Blue    2007  Best Price
7   Honda   Accord   Red     2006  Good miles
8   Honda   Odyssey  Green   2007  Rare
9   Honda   Odyssey  Green   2006  Good miles
10  Honda   CRV      Red     2007  Fun Car
11  Honda   CRV      Orange  2006  Good miles
12  Toyota  Prius    Tan     2007  Low miles
13  Toyota  Corolla  Black   2007  Low miles
14  Toyota  Tercel   Blue    2007  Low miles
15  Toyota  Camry    Blue    2007  Low miles
16
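Similarity: SIM(x, y)
Id 1 (Honda Civic Green 2007, Low miles) vs. Id 2 (Honda Civic Blue 2007, Low miles): same make and model
Id 12 (Toyota Prius Tan 2007, Low miles) vs. Id 1 (Honda Civic Green 2007, Low miles): different makes
Find a result set S of size k that minimizes the sum of SIM(x, y) over all pairs x, y in S.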
17
Example: Similarity
Result set 1:
Id  Make   Model    Color  Year  Description
1   Honda  Civic    Green  2007  Low miles
6   Honda  Accord   Blue   2007  Best Price
8   Honda  Odyssey  Green  2007  Rare

Result set 2:
Id  Make    Model  Color  Year  Description
1   Honda   Civic  Green  2007  Low miles
2   Honda   Civic  Blue   2007  Low miles
12  Toyota  Prius  Tan    2007  Low miles
18
Prefix
Id  Make   Model  Color  Year  Description
1   Honda  Civic  Green  2007  Low miles

Id  Make   Model  Color  Year  Description
2   Honda  Civic  Blue   2007  Low miles

Id  Make   Model    Color  Year  Description
8   Honda  Odyssey  Green  2007  Rare
9   Honda  Odyssey  Green  2006  Good miles
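The comparisons on the similarity and prefix slides come down to how long a prefix two cars share under the diversity ordering. The small Python helper below (hypothetical name `common_prefix`, an illustration rather than the article's exact definition of SIM) computes that shared prefix for tuples already written in Make, Model, Color, Year, Description order.

```python
def common_prefix(x: tuple, y: tuple) -> tuple:
    """Return the longest prefix on which x and y agree, attribute by attribute."""
    prefix = []
    for a, b in zip(x, y):
        if a != b:
            break  # the first disagreement ends the shared prefix
        prefix.append(a)
    return tuple(prefix)


# Tuples in diversity order: (Make, Model, Color, Year, Description)
car1 = ("Honda", "Civic", "Green", 2007, "Low miles")
car2 = ("Honda", "Civic", "Blue", 2007, "Low miles")
car8 = ("Honda", "Odyssey", "Green", 2007, "Rare")
car9 = ("Honda", "Odyssey", "Green", 2006, "Good miles")
car12 = ("Toyota", "Prius", "Tan", 2007, "Low miles")

print(common_prefix(car1, car2))   # ('Honda', 'Civic')
print(common_prefix(car12, car1))  # ()
print(common_prefix(car8, car9))   # ('Honda', 'Odyssey', 'Green')
```

Cars 1 and 2 only start to differ at Color, whereas car 12 already differs from car 1 at Make, the highest-priority attribute.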
19
A few more definitions
RES(R, Q): the results of query Q over relation R, from which a subset of size k is returned.
Given relation R and query Q, let maxval =
20
Agenda
Existing solutions
Definition of diversity
Impossibility results for diversity
Query processing techniques
21
Impossibility Results
Intuition: the IR score of an item depends only on the item itself and possibly on statistics over the entire corpus, whereas diversity depends on the other items in the query result set.
22
Inverted Lists
Query: Honda cars
Honda: d1 d4 d8 d10 d17
Car: d4 d10 d11 d17 d20
Merged inverted list: d4 d10 d17
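Merging the two lists into the result on the slide is a standard two-pointer intersection of ID-sorted lists. The Python sketch below assumes, for brevity, that document IDs are plain integers (d1 becomes 1, and so on); `intersect` is a hypothetical helper name.

```python
def intersect(postings_a: list[int], postings_b: list[int]) -> list[int]:
    """Intersect two inverted lists sorted by document ID in O(len(a) + len(b))."""
    i = j = 0
    merged = []
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            merged.append(postings_a[i])  # document appears in both lists
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1  # advance the list whose current ID is smaller
        else:
            j += 1
    return merged


honda = [1, 4, 8, 10, 17]
car = [4, 10, 11, 17, 20]
print(intersect(honda, car))  # [4, 10, 17]
```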
23
Impossibility Results
Each item in an inverted list has a score, which can be either a global score (e.g., PageRank) or a value/keyword-dependent score (e.g., TF-IDF). The items in each list are usually ordered by score so that top-k queries can be handled efficiently. If we assume a monotonic scoring function f(), which is a standard assumption in traditional IR systems, the article proves that the result is either not diverse or inefficient/infeasible to compute.
24
Agenda
Existing solutions
Definition of diversity
Impossibility results for diversity
Query processing techniques
25
The DB example (the same car table as on slide 15)
26
The car indexing example
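One way to read the car indexing example (an assumption, since the figure is not reproduced here) is as a tree in which each car is identified by its path of child positions from the root, in the spirit of Dewey-style labels: Honda.Civic.Green.2007.'Low miles' becomes 0.0.0.0.0. The Python sketch below assigns such labels in order of first appearance; `assign_path_ids` is a hypothetical helper.

```python
def assign_path_ids(rows):
    """Label each row with its path of child positions in a trie over its attributes.

    rows: iterable of (row_id, attr1, attr2, ...) with attributes in diversity order.
    A value seen for the first time under a node gets the next free position (0, 1, ...).
    """
    root: dict = {}
    labels = {}
    for row_id, *path in rows:
        node = root
        positions = []
        for value in path:
            if value not in node:
                node[value] = (len(node), {})  # (position among siblings, child node)
            position, node = node[value]
            positions.append(position)
        labels[row_id] = ".".join(str(p) for p in positions)
    return labels


rows = [
    (1, "Honda", "Civic", "Green", 2007, "Low miles"),
    (2, "Honda", "Civic", "Blue", 2007, "Low miles"),
    (6, "Honda", "Accord", "Blue", 2007, "Best Price"),
    (12, "Toyota", "Prius", "Tan", 2007, "Low miles"),
]
for row_id, label in assign_path_ids(rows).items():
    print(row_id, label)
```

Comparing the labels component by component (as integer tuples, not as strings) walks the tree depth-first, moving from one Civic to the next before reaching other models and makes.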
27
One-pass Algorithm
Let's say Q looks for descriptions containing 'Low', with k = 3.
Each item is identified by its path in the tree, e.g., Honda.Civic.Green.2007.'Low miles'.
Initialization: pick the first k items.
While we can improve diversity: go to the next option and check whether it is better; if so, prune.
28
One-pass Algorithm
We start from two Civics; since we need only one more item, we pick the next Civic.
29
One-pass Algorithm
Then we look for another item at the next level (Accord): there is none, because no Accord has 'Low' in its description (and neither does any other model at that level).
30
One-pass Algorithm
Then we look at the next level (make), find a Toyota, and prune one of the Civics. This set is maximally diverse, so we stop here.
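The outcome of the walkthrough can be checked against a brute-force reference in Python that builds a diverse set directly from the prefix tree: at each node it hands out the remaining slots round-robin across the children, never giving a child more slots than it has matching cars. This is not the OnePass algorithm (it looks at every matching tuple), the helper names are hypothetical, and it assumes that "balanced across siblings up to one, unless a sibling runs out of matches" is what maximally diverse means in the unscored case.

```python
CARS = [  # (Make, Model, Color, Year, Description, Id), in diversity order
    ("Honda", "Civic", "Green", 2007, "Low miles", 1),
    ("Honda", "Civic", "Blue", 2007, "Low miles", 2),
    ("Honda", "Civic", "Red", 2007, "Low miles", 3),
    ("Honda", "Civic", "Black", 2007, "Low miles", 4),
    ("Honda", "Civic", "Black", 2006, "Low miles", 5),
    ("Honda", "Accord", "Blue", 2007, "Best Price", 6),
    ("Honda", "Accord", "Red", 2006, "Good miles", 7),
    ("Honda", "Odyssey", "Green", 2007, "Rare", 8),
    ("Honda", "Odyssey", "Green", 2006, "Good miles", 9),
    ("Honda", "CRV", "Red", 2007, "Fun Car", 10),
    ("Honda", "CRV", "Orange", 2006, "Good miles", 11),
    ("Toyota", "Prius", "Tan", 2007, "Low miles", 12),
    ("Toyota", "Corolla", "Black", 2007, "Low miles", 13),
    ("Toyota", "Tercel", "Blue", 2007, "Low miles", 14),
    ("Toyota", "Camry", "Blue", 2007, "Low miles", 15),
]


def round_robin(k, capacities):
    """Distribute k slots over children one at a time, respecting each capacity."""
    alloc = [0] * len(capacities)
    while k > 0:
        progressed = False
        for i, cap in enumerate(capacities):
            if k > 0 and alloc[i] < cap:
                alloc[i] += 1
                k -= 1
                progressed = True
        if not progressed:  # every child is exhausted
            break
    return alloc


def diverse_select(rows, k, depth=0):
    """Pick up to k rows, spreading them as evenly as possible at every level."""
    if k <= 0 or not rows:
        return []
    if depth == len(rows[0]):  # fully specified tuple (Id is unique)
        return rows[:k]
    children = {}  # dicts keep first-appearance order of attribute values
    for row in rows:
        children.setdefault(row[depth], []).append(row)
    groups = list(children.values())
    picked = []
    for group, n in zip(groups, round_robin(k, [len(g) for g in groups])):
        picked.extend(diverse_select(group, n, depth + 1))
    return picked


matches = [row for row in CARS if "Low" in row[4]]  # Q: description contains 'Low'
print([row[-1] for row in diverse_select(matches, k=3)])  # [1, 2, 12]
```

The result, two Civics and one Toyota, has the same shape as the set the one-pass walkthrough ends with.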
31
One-pass Algorithm
If we also had a Ford, we would continue exploring it:
Ford → Focus (0) → Black (0) → 07 (0) → Low miles (0)
32
Scored One-pass Algorithm
Give each car a score; the query then takes this score into account through minScore, the smallest score in the current result set.
Choose the next ID as the smallest ID such that score(id) >= root.minScore.
The algorithm then proceeds as before.
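The "next ID" rule can be illustrated with a small Python helper over an inverted list sorted by ID. The scores below are made up, reading the rule as "the smallest ID after the current position" is an assumption, and `next_id` is a hypothetical name.

```python
from bisect import bisect_right


def next_id(ids, scores, after, min_score):
    """Smallest id > after whose score is at least min_score, or None.

    ids must be sorted ascending; scores[i] is the score of ids[i].
    """
    # Jump past every id <= after, then scan for the first sufficiently high score.
    for pos in range(bisect_right(ids, after), len(ids)):
        if scores[pos] >= min_score:
            return ids[pos]
    return None


ids = [1, 2, 3, 12, 13]             # matching cars, in ID order
scores = [0.9, 0.4, 0.7, 0.5, 0.8]  # hypothetical scores
print(next_id(ids, scores, after=1, min_score=0.6))   # 3
print(next_id(ids, scores, after=3, min_score=0.6))   # 13
print(next_id(ids, scores, after=13, min_score=0.6))  # None
```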
33
Probing Algorithm
Main idea: go over all the cars as if they were laid out on an axis (figure panels: K=1, K=2, K=3).
34
Advantage of bidirectional exploration
"Honda" has only one child that matches the query (Civic), so we find it quickly without exploring every option.
A node added to the diverse solution never has to be pruned later, unlike in the OnePass algorithm.
35
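WAND algorithm
WAND is an efficient method for obtaining top-k lists of scored results without explicitly merging the full inverted lists.
WAND(X1, w1, ..., Xk, wk, θ) is true for a document iff the weights wi of the terms Xi it contains sum to at least θ, so:
AND(X1, X2, ..., Xk) ≡ WAND(X1, 1, X2, 1, ..., Xk, 1, k)
OR(X1, X2, ..., Xk) ≡ WAND(X1, 1, X2, 1, ..., Xk, 1, 1)
To obtain the k best results, the operator uses upper bounds UBi on the maximum score contribution of each term and a temporary threshold θ, which rises as better results are found:
WAND(X1, UB1, X2, UB2, ..., Xk, UBk, θ)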
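The operator semantics and the two equivalences above can be checked exhaustively in a few lines of Python. This is a sketch of the boolean WAND predicate only, not the full top-k iterator, and the upper bounds and threshold in the pruning example are hypothetical values.

```python
from itertools import product


def wand(present, weights, theta):
    """WAND(X1, w1, ..., Xk, wk, θ): true iff the weights of the present terms sum to >= θ."""
    return sum(w for x, w in zip(present, weights) if x) >= theta


k = 3
for present in product([False, True], repeat=k):
    # AND: all weights 1, threshold k.  OR: all weights 1, threshold 1.
    assert wand(present, [1] * k, k) == all(present)
    assert wand(present, [1] * k, 1) == any(present)

# Pruning with upper bounds: a document containing X1 and X3 can score at most
# UB1 + UB3, so it can be skipped when that bound is below the current threshold θ.
upper_bounds = [2.0, 1.5, 0.5]  # hypothetical per-list maximum contributions
theta = 3.0                     # hypothetical score of the current k-th best result
print(wand([True, False, True], upper_bounds, theta))  # False: 2.5 < 3.0
print(wand([True, True, False], upper_bounds, theta))  # True: 3.5 >= 3.0
```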
36
Scored Probing Algorithm
We use the WAND algorithm to obtain the top-k list.
The next step is to mark all the nodes that could be added as MIDDLE.
We also maintain a heap to find the node with the minimum child.
At each step we move nodes from tentative to useful.
37
Experiments
MultQ: rewriting the query as multiple queries and merging their results.
Naïve: retrieving all the results of a query and picking a diverse subset from them.
Basic: just the first k answers, without diversity.
OnePass, Probe: the article's algorithms (U = unscored, S = scored).
38
Experiments
40
Conclusions
The article formalizes diversity in structured search and proposes inverted-list algorithms to compute it.
The experiments show that the algorithms are scalable and efficient.
In particular, diversity can be implemented with little additional overhead compared to traditional approaches.
41
Extension of the algorithm
Assign higher weights to Hondas and Toyotas than to Teslas, so that the diverse results contain more Hondas and Toyotas.
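A hypothetical way to realize this extension is to replace the equal split of slots among sibling branches with a weighted one. The Python sketch below uses a highest-averages (D'Hondt-style) rule, a technique swapped in here for illustration and not taken from the article: each of the k slots goes to the branch with the largest weight / (slots already given + 1), capped by how many matching cars the branch has. The weights and capacities are made up.

```python
def weighted_allocate(k, weights, capacities):
    """Give out k slots one at a time to the branch with the highest weight / (slots + 1).

    capacities[i] caps branch i at the number of matching tuples it actually has.
    """
    alloc = [0] * len(weights)
    for _ in range(k):
        best, best_quotient = None, 0.0
        for i, (weight, cap) in enumerate(zip(weights, capacities)):
            if alloc[i] >= cap:
                continue  # branch exhausted
            quotient = weight / (alloc[i] + 1)
            if best is None or quotient > best_quotient:
                best, best_quotient = i, quotient
        if best is None:  # every branch is exhausted
            break
        alloc[best] += 1
    return alloc


makes = ["Honda", "Toyota", "Tesla"]
weights = [2, 2, 1]        # hypothetical: Honda and Toyota weighted above Tesla
capacities = [10, 10, 10]  # hypothetical numbers of matching cars per make
print(dict(zip(makes, weighted_allocate(5, weights, capacities))))
# {'Honda': 2, 'Toyota': 2, 'Tesla': 1}
```

With equal weights the rule hands out slots in turn, like the unweighted round-robin split; unequal weights tilt the split toward the favored makes while still respecting how many matches each make has.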
42
Questions?
Thank you!