Published by Crystal Tankard. Modified over 9 years ago.
1
Best-Effort Top-k Query Processing Under Budgetary Constraints
Michal Shmueli-Scheuer (IBM Haifa Research Lab and UCI), Yosi Mass, Haggai Roitman, Chen Li, Ralf Schenkel, Gerhard Weikum
2
Motivating Example
A top-k query engine returns top-k results to several kinds of applications:
Mediation Systems: achieve high query throughput.
Mobile Applications: highly impatient users, need fast results.
Online Analytics (e.g. logs): achieve high query throughput.
Michal Shmueli-Scheuer
3
Traditional top-k query
Lists (n objects, m lists, each sorted by decreasing score):
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.5, …, c 0.2
Rm: c 0.9, b 0.6, g 0.5, …, a 0.4
Pre-computed lists over multiple attributes.
Combine scores by some monotonic aggregation function.
Two access modes: sorted access (cost Cs) and random access (cost Cr).
Objective: compute the k objects with the highest scores.
4
NRA algorithm (Fagin et al.)
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.5, …, c 0.2
f = SUM
Each object carries an interval [worst score, best score]; highi is the score at the current scan position of list Ri.
Top-2: a [0.9, 1.77], d [0.87, 1.77]
Candidates: (none yet)
Stop when mink > best score of the candidates.
5
NRA algorithm (Fagin et al.)
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.25, …, c 0.2
Intervals are [worst score, best score].
Top-2: a [1.75, 1.75], d [0.87, 1.47]
Candidates: b [0.6, 1.45]
Stop when mink > best score of the candidates.
6
NRA algorithm (Fagin et al.)
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.25, …, c 0.2
Intervals are [worst score, best score].
Top-2: a [1.75, 1.75], d [0.87, 1.37]
Candidates: b [0.6, 0.85], c [0.5, 0.75], f [0.25, 0.75]
Stop when mink > best score of the candidates.
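The three NRA steps shown on these slides can be replayed with a short sketch. This is a minimal illustration rather than the authors' implementation: the function name `nra`, the list-of-tuples input, and the treatment of exhausted lists as contributing 0 are assumptions, and the elided middle entries of R1 and R2 are simply left out.

```python
def nra(lists, k):
    """Sorted-access-only top-k with SUM aggregation (illustrative sketch).

    lists: one [(object, score), ...] per attribute, sorted by decreasing score.
    Returns (top-k objects, scan depth reached in every list).
    """
    m = len(lists)
    seen = {}    # object -> {list index: score seen in that list}
    ranked = []
    depth = 0
    max_depth = max((len(lst) for lst in lists), default=0)
    while depth < max_depth:
        high = []  # score at the current scan position of each list
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                high.append(score)
            else:
                high.append(0.0)  # assumption: an exhausted list adds nothing
        depth += 1
        # worst score: sum of the scores seen so far
        worst = {o: sum(s.values()) for o, s in seen.items()}
        # best score: worst score plus high_i for every list not yet seen
        best = {o: worst[o] + sum(high[i] for i in range(m) if i not in seen[o])
                for o in seen}
        ranked = sorted(seen, key=lambda o: worst[o], reverse=True)
        top, candidates = ranked[:k], ranked[k:]
        if len(top) == k:
            mink = worst[top[-1]]
            # stop when mink beats every candidate and any still-unseen object
            if all(best[o] < mink for o in candidates) and sum(high) < mink:
                return top, depth
    return ranked[:k], depth


R1 = [("a", 0.9), ("b", 0.6), ("c", 0.5), ("d", 0.4)]
R2 = [("d", 0.87), ("a", 0.85), ("f", 0.25), ("c", 0.2)]
top, depth = nra([R1, R2], k=2)
```

With these lists the scan stops at depth 3, where mink = 0.87 exceeds the best scores of b, c and f, matching the last NRA slide.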
7
Top-k with Budget Constraints
Access costs: sorted access cost Cs, random access cost Cr.
R1: s 0.95, u 0.93, t 0.92, d 0.9, x 0.5, y 0.4, z 0.2, …
R2: a 1.0, b 0.9, c 0.85, d 0.8, e 0.7, t 0.6, f 0.4, …
f = SUM; exact top-2: d 1.7, t 1.52
Cs = 1, Cr = 3
NRA: 12 Cs = 12, precision = 0.5
TA: 7 Cs + 7 Cr = 28, precision = 0
Budget = 10?
Given budget B, maximize result quality.
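The cost arithmetic on this slide can be checked with a small sketch. The helper names `access_cost` and `precision` are hypothetical, and precision is assumed here to be the fraction of the exact top-k that appears in the returned result; the slide text does not spell out its definition.

```python
def access_cost(num_sorted, num_random, cs, cr):
    """Total cost of a run: sorted accesses at Cs each, random accesses at Cr each."""
    return num_sorted * cs + num_random * cr


def precision(result, exact):
    """Assumed definition: share of the exact top-k found in the returned result."""
    exact = set(exact)
    return len(set(result) & exact) / len(exact)


nra_cost = access_cost(12, 0, cs=1, cr=3)  # NRA: 12 sorted accesses
ta_cost = access_cost(7, 7, cs=1, cr=3)    # TA: 7 sorted + 7 random accesses
# Hypothetical result that contains only one of the two exact objects (d, t).
half = precision(["d", "s"], ["d", "t"])
```

Both costs exceed a budget of 10, which is why the slide asks what can be returned within that budget.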
8
Contributions
Sorted Accesses: efficient plan; solution with adaptive α
Sorted and Random Accesses
Experiments
9
Results Under Limited Budget
Results for limited budget
k results for unlimited budget
10
Efficient Plan- Sorted Accesses
Assume that we know the k results for unlimited budget (REXACT).
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
Top-2: o5, o1
Interesting positions: where the k objects appear in the lists (P1, P2 in L1; Q1, Q2 in L2).
Plan: an allocation of sorted accesses to the lists, e.g. {L1,4} {L2,2}.
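The notion of interesting positions can be made concrete with a short sketch. The list orders below are an assumption read off the slide's flattened figure, and `interesting_positions` and `plan_cost` are hypothetical helper names.

```python
def interesting_positions(lists, r_exact):
    """For each list, the 1-based positions at which objects of R_EXACT appear."""
    return {
        name: [pos for pos, obj in enumerate(objs, start=1) if obj in r_exact]
        for name, objs in lists.items()
    }


def plan_cost(plan, cs=1):
    """A plan maps each list to a number of sorted accesses; its cost is their sum times Cs."""
    return sum(plan.values()) * cs


lists = {
    "L1": ["o1", "o5", "o8", "o6"],        # assumed order
    "L2": ["o1", "o2", "o5", "o4", "o3"],  # assumed order
}
positions = interesting_positions(lists, r_exact={"o1", "o5"})
cost = plan_cost({"L1": 4, "L2": 2})
```

Under this reading, o5 sits at position 2 of L1 and position 3 of L2, so a plan that scans two entries of L1 and three of L2 covers every interesting position at a cost of 5 sorted accesses.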
11
Efficient Plan- Sorted Accesses
Goal: find plan t, such that: …
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
Interesting positions: P1, P2 in L1; Q1, Q2 in L2.
Plans for B=5, e.g. plan: {L1,2} {L2,3}
The result of this plan is denoted as ROPT.
12
Sorted Accesses
Observations: prefer high scores.
Lists L1, L2, L3; object O1 with score SL1.
13
Observations – contd.
Prefer large score reductions.
Example query: title=“war” description=“weapon”
14
Score Utilities
Score gain: …
Score reduction: …
Example entries: (o2, 1), (o4, 0.9); y = 3
15
Optimization Problem
Bi-objective optimization problem:
util(Li, x) = α · gain + (1 − α) · reduction
Heuristics: Fair Heuristic, Rank Heuristic
Where m is the number of lists.
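The weighted utility can be illustrated with a minimal sketch. How gain and reduction are computed is not recoverable from the slide text, so they are plain inputs here; `util`, `pick_list` and the sample numbers are hypothetical.

```python
def util(gain, reduction, alpha):
    """Weighted combination of score gain and score reduction, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * gain + (1 - alpha) * reduction


def pick_list(stats, alpha):
    """Choose the list with the highest utility.

    stats: {list name: (gain, reduction)}, with made-up values in this example.
    """
    return max(stats, key=lambda name: util(*stats[name], alpha))


stats = {"L1": (0.9, 0.1), "L2": (0.4, 0.5)}
favor_gain = pick_list(stats, alpha=1.0)       # only gain counts
favor_reduction = pick_list(stats, alpha=0.0)  # only reduction counts
```

Moving alpha between 1 and 0 shifts the choice from the list with the larger gain to the list with the larger reduction, which is the knob an adaptive scheme would turn over time.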
16
Adaptive
Figure: weight on gain (α) and weight on reduction (1 − α) over time.
17
Adaptive
Lists: L1, L2, L3
Top-k: o1 [ws, bs], o3 [0.8, bs]
Candidates: o4 [0.6, bs], o6 [ws, bs]
d(o4) = 0.8 − 0.6 = 0.2
(Theobald et al. VLDB04)
18
Adaptive
TREC query, k=100
19
Efficient Plan- Random Accesses
Observation: random accesses can always be scheduled after all sorted accesses have finished.
schedule 1: {SA……RA……SA….}
schedule 2: {SA……SA……RA….}
precision(schedule 1) = precision(schedule 2)
20
Observations – contd.
Random accesses are only useful for objects in REXACT.
Figure: top-k and candidates queues with [ws, bs] intervals for o1 to o5 over list L2. A random access on o5, which is not in REXACT, either reduces precision or leaves it the same.
21
Random Accesses
When to switch from SA to RA?
Timeline: gathering with sorted accesses (α), then probing with random accesses (1 − α).
Not enough good candidates: RA is wasted.
Not enough RAs: the candidates cannot be pruned.
RA is much more expensive than SA.
22
Random Accesses
Switch from Sorted to Random: R = (1 − α) · S
S: total cost of sorted accesses.
R: total cost of random accesses.
S + R > B
Which items to access? Do one RA on each candidate; maximize expected score.
23
Experimental Data
TREC Terabyte: 25M webpages; 50 queries with an average length of 3 words.
IMDB: 375,000 movies; 20 queries, each with 4 attributes: {Title, Genre, Actors, Description}.
Synthetic data: Zipf, #lists = [2,6], #objects = [10000, 1000000].
Aggregate function: Sum
24
Evaluation Methods
Measures: percentage of optimal precision; SME.
Result sets compared: Ralg, Ropt, Rexact.
25
Results- Sorted Accesses
TREC, k=100
Less budget, more improvement.
26
Varied k
IMDB, B=400
Lower k, more improvement.
27
Number of Lists
Zipf, k=100, B=4000
More lists, more improvement.
28
Results- Random Accesses
Panels: TREC, k=100, Cr=10; TREC, k=100, Cr=100
29
Related Works
Minimize budget for optimal results: the algorithm computes the exact results with minimum cost. (Bast et al. VLDB06, Bruno et al. ICDE02, Chang et al. SIGMOD02) Dual problem.
Anytime top-k: the algorithm collects statistics during processing, which can be used to provide probabilistic guarantees at any time during processing. (Arai et al. VLDB07) Does not do any optimizations.
Approximate top-k: approximate results with probabilistic guarantees. (Theobald et al. VLDB04, Fagin et al. 2001)
30
Conclusions
First attempt to deal with budget constraints.
For SA only, average precision is around 70%.
Tradeoff between RAs and SAs: when the cost of an RA is relatively low, schedules that use RAs improve.
31
Thank You !
33
Top-k query
Given a set of n objects and m scoring lists sorted in decreasing order, find the top-k objects according to a scoring function f.
top-k: a set T of k objects such that f(rj1,…,rjm) ≤ f(ri1,…,rim) for every object Xi in T and every object Xj not in T.
Assumption: the scoring function f is monotone: f(r1,…,rm) ≤ f(r1’,…,rm’) if ri ≤ ri’ for all i.
Two access modes: sorted access (cost Cs), random access (cost Cr).
Objective: compute the top-k with the minimum cost.
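As a reference point for this definition, the top-k set can be computed by brute force when every list is fully available. This is only a sketch of the definition, not of any budget-aware algorithm: `top_k` is a hypothetical name, lists are given as dictionaries, a missing score is assumed to be 0, and ties are broken by object name.

```python
def top_k(lists, k, f=sum):
    """Aggregate every object's scores with f and keep the k best.

    lists: one {object: score} mapping per scoring list.
    f: monotone aggregation function over an iterable of scores (SUM by default).
    """
    objects = set().union(*(lst.keys() for lst in lists))
    # Assumption: an object absent from a list scores 0.0 there.
    scores = {o: f(lst.get(o, 0.0) for lst in lists) for o in objects}
    # Highest aggregate first; the object name breaks ties deterministically.
    return sorted(scores, key=lambda o: (-scores[o], o))[:k]


R1 = {"a": 0.9, "b": 0.6, "c": 0.5, "d": 0.4}
R2 = {"d": 0.87, "a": 0.85, "f": 0.25, "c": 0.2}
exact = top_k([R1, R2], k=2)
```

This reads every entry of every list, which is exactly the cost that sorted-access and random-access strategies try to avoid.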
34
Sorted Accesses
Observations:
An object with high scores has a higher potential to be part of the top-k.
An object with “mediocre” scores does not help.
Lists L1, L2, L3 with entries (O1, SL1), (O1, SL2), (O1, SL3).
Prefer high scores.
35
Example Wireless zone Q useless
36
Applications
Mobile Applications: highly impatient users, need fast results.
Mediation Systems: achieve high query throughput.
Online analytics (e.g. logs)
37
Motivating Example
Components: user query, mediator, engine, servers.
Query throughput: given #queries per time unit, allocate time for each query.
38
Terminology
Sorted Access, Random Access
highi
Top-k queue, Candidates queue
mink
worstScore(d), bestScore(d)
39
Efficient Offline Solution- Sorted
Goal: find trace t, such that: …
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
Interesting positions: P1, P2
Traces for B=5 (accesses on L1, L2): t1: 5; t2: 1, 4; t3: 2, 3; t4, t5, t6
The result is denoted as ROPT.
40
Efficient Offline Solution- Sorted
Goal: find trace t, such that: …
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
Interesting positions: P1, P2
Traces for B=5 (accesses on L1, L2): t1: 5; t2: 1, 4; t3: 2, 3; t4, t5, t6
Feasible for k up to 100, and m up to 10.
41
Efficient Offline Solution- Sorted
Proof (by contradiction): Assume that t does not exist, and choose a trace s that is within the budget and has optimal precision. Construct s` whose entries s`i are the largest positions of Pi less than or equal to si. By construction, the score of any object in S is the same as in S`.
42
Fair Heuristic
Assume budget = b.
Runs in batches.
43
Efficient Offline Solution- Random
Budget for RAs = (B − |t| · Cs)
Figure labels: Top-k, d, Rexact
Entries: (o9, S), (o5, S), (o7, S), (o8, S), …; (o1, S), (o2, S), (o3, S), (o4, S), (o10, S), (o14, S), …
best(o) − mink, where best(o) = worst(o) + RA
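The budget split on this slide can be sketched as follows. The formula for the RA budget comes from the slide; ranking candidates by best(o) − mink is one possible reading of the figure and is an assumption, as are the function names and the sample values.

```python
def ra_budget(total_budget, num_sorted, cs):
    """Budget left for random accesses after |t| sorted accesses: B - |t| * Cs."""
    return total_budget - num_sorted * cs


def pick_ra_targets(best_scores, mink, total_budget, num_sorted, cs, cr):
    """Choose which candidates to probe, one random access each.

    best_scores: {candidate: best score}, hypothetical values in this example.
    Candidates are ranked by best(o) - mink (assumed ordering) and the number
    of probes is capped by what the remaining budget can pay for.
    """
    remaining = ra_budget(total_budget, num_sorted, cs)
    affordable = max(0, int(remaining // cr))
    ranked = sorted(best_scores, key=lambda o: best_scores[o] - mink, reverse=True)
    return ranked[:affordable]


targets = pick_ra_targets(
    {"o1": 1.5, "o2": 1.2, "o3": 1.7},
    mink=1.0, total_budget=10, num_sorted=4, cs=1, cr=3,
)
```

With B = 10, four sorted accesses at Cs = 1 and Cr = 3, six budget units remain, which pays for two random accesses.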
44
Motivation
Many applications work in budget-constrained environments. Still, they wish to perform top-k queries.
Components: user query, mediator, engine, servers.
Budget-aware query processing.
45
Future work
Different access costs for different lists
Time-aware top-k
Top-k with budget constraints for P2P