Best-Effort Top-k Query Processing Under Budgetary Constraints


Best-Effort Top-k Query Processing Under Budgetary Constraints
Michal Shmueli-Scheuer (IBM Haifa Research Lab and UCI), Yosi Mass, Haggai Roitman, Chen Li, Ralf Schenkel, Gerhard Weikum

Motivating Example
- Mediation systems: achieve high query throughput.
- Mobile applications: highly impatient users need fast results.
- Online analytics (e.g., logs): achieve high query throughput.
(Figure: top-k queries, top-k engine, results)
Michal Shmueli-Scheuer

Traditional Top-k Query
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.5, …, c 0.2
…
Rm: c 0.9, b 0.6, g 0.5, …, a 0.4
(m lists over n objects, each sorted by descending score)
- Pre-computed lists over multiple attributes.
- Scores are combined by some monotonic aggregation function.
- Two access modes: sorted access (cost Cs) and random access (cost Cr).
- Objective: compute the k objects with the highest scores.
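To make the setting concrete, here is a minimal Python sketch of the naive way to get the exact answer: read every score (a dictionary lookup stands in for one random access per object and list) and aggregate with a monotone f such as SUM. It uses only the visible entries of R1 and R2, leaving out the elided "…" entries, and treats a missing score as 0; both are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A ranked list: (object id, score) pairs sorted by descending score.
RankedList = List[Tuple[str, float]]


def exact_top_k(lists: Sequence[RankedList], k: int,
                f: Callable[[List[float]], float] = sum) -> List[str]:
    """Full-information baseline: aggregate every object's scores, keep the best k.

    Looking up an object's score in each list stands in for one random access;
    an object missing from a list is assumed to score 0 there.
    """
    lookups: List[Dict[str, float]] = [dict(lst) for lst in lists]
    objects = {obj for lst in lists for obj, _ in lst}
    totals = {obj: f([lk.get(obj, 0.0) for lk in lookups]) for obj in objects}
    # Descending aggregate score; ties broken by object id for determinism.
    return sorted(objects, key=lambda obj: (-totals[obj], obj))[:k]


R1 = [("a", 0.9), ("b", 0.6), ("c", 0.5), ("d", 0.4)]
R2 = [("d", 0.87), ("a", 0.85), ("f", 0.5), ("c", 0.2)]
print(exact_top_k([R1, R2], k=2))  # ['a', 'd']
```

This baseline pays for every access, which is exactly what the budget-aware methods below try to avoid.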

NRA Algorithm (Fagin et al.), f = SUM
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.25, …, c 0.2
After one round of sorted accesses (intervals are [worst score, best score]; high_i is the last score read from list i):
- Top-2: a [0.9, 1.77], d [0.87, 1.77]
Stop when min_k > best score of every candidate.

NRA Algorithm (Fagin et al.), contd.
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.25, …, c 0.2
After two rounds:
- Top-2: a [1.75, 1.75], d [0.87, 1.47]
- Candidates: b [0.6, 1.45]
min_k = 0.87 is not greater than 1.45, so NRA continues.

NRA Algorithm (Fagin et al.), contd.
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.25, …, c 0.2
After three rounds:
- Top-2: a [1.75, 1.75], d [0.87, 1.37]
- Candidates: b [0.6, 0.85], c [0.5, 0.75], f [0.25, 0.75]
min_k = 0.87 > best score of every candidate, so NRA stops with top-2 {a, d}.
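The three NRA rounds above can be replayed with a minimal Python sketch for f = SUM. It assumes round-robin sorted access (one entry per list per round) and nonnegative scores, and it uses only the visible list entries; the [worst, best] bounds and the stopping test follow the slides, with unseen objects bounded by the sum of the high_i values.

```python
from typing import Dict, List, Tuple

RankedList = List[Tuple[str, float]]


def nra(lists: List[RankedList], k: int) -> Tuple[List[str], int]:
    """NRA-style top-k with f = SUM using sorted accesses only.

    Assumes nonnegative scores and round-robin access. Returns the top-k
    objects and the number of sorted-access rounds performed.
    """
    m = len(lists)
    seen: Dict[str, Dict[int, float]] = {}  # object -> {list index: score}
    high = [0.0] * m                        # high_i, set during the first round
    depth = 0
    while depth < max(len(lst) for lst in lists):
        # One round: a sorted access on every list.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                high[i] = score
            else:
                high[i] = 0.0  # an exhausted list cannot add anything
        depth += 1
        # worstScore: sum of known scores; bestScore adds high_i for unseen lists.
        worst = {o: sum(s.values()) for o, s in seen.items()}
        best = {o: worst[o] + sum(high[i] for i in range(m) if i not in s)
                for o, s in seen.items()}
        ranked = sorted(seen, key=lambda o: worst[o], reverse=True)
        top, candidates = ranked[:k], ranked[k:]
        if len(top) < k:
            continue
        min_k = min(worst[o] for o in top)
        # Stop once no candidate and no unseen object can exceed min_k.
        if all(best[o] < min_k for o in candidates) and sum(high) < min_k:
            return top, depth
    # All lists exhausted: worst scores are now exact.
    final = sorted(seen, key=lambda o: sum(seen[o].values()), reverse=True)
    return final[:k], depth


R1 = [("a", 0.9), ("b", 0.6), ("c", 0.5), ("d", 0.4)]
R2 = [("d", 0.87), ("a", 0.85), ("f", 0.25), ("c", 0.2)]
print(nra([R1, R2], k=2))  # (['a', 'd'], 3)
```

The call stops after the third round, matching the slide; with k = 1 it would already stop after two rounds.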

Top-k with Budget Constraints
Access costs: sorted access Cs, random access Cr. Given budget B, maximize result quality.
Example: f = SUM, Cs = 1, Cr = 3, budget B = 10.
R1: s 0.95, u 0.93, t 0.92, d 0.9, x 0.5, y 0.4, z 0.2, …
R2: a 1.0, b 0.9, c 0.85, d 0.8, e 0.7, t 0.6, f 0.4, …
Exact top-2: d (1.7), t (1.52)
- NRA: 12·Cs = 12, precision = 0.5
- TA: 7·Cs + 7·Cr = 28, precision = 0
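A minimal Python sketch of one best-effort baseline for this example: spend the budget on round-robin sorted accesses (Cs = 1) and, when the budget runs out, return the k objects with the highest worstScore. This stopping-and-ranking rule and the helper names are assumptions for illustration, not the method from the talk; on the slide's lists with B = 10 it happens to give precision 0.5.

```python
from typing import Dict, List, Set, Tuple

RankedList = List[Tuple[str, float]]


def budgeted_sorted_scan(lists: List[RankedList], k: int, budget: float,
                         cs: float = 1.0) -> List[str]:
    """Round-robin sorted accesses until the budget is spent; return the current top-k.

    Objects are ranked by worstScore (sum of scores seen so far), which is an
    assumed best-effort rule for this sketch.
    """
    worst: Dict[str, float] = {}
    pos = [0] * len(lists)
    spent, i = 0.0, 0
    while spent + cs <= budget and any(p < len(lst) for p, lst in zip(pos, lists)):
        if pos[i] < len(lists[i]):
            obj, score = lists[i][pos[i]]
            worst[obj] = worst.get(obj, 0.0) + score
            pos[i] += 1
            spent += cs
        i = (i + 1) % len(lists)  # move on to the next list
    return sorted(worst, key=lambda o: worst[o], reverse=True)[:k]


def precision(result: List[str], exact: Set[str]) -> float:
    """Fraction of the exact top-k that the returned objects recover."""
    return len(set(result) & exact) / len(exact)


R1 = [("s", .95), ("u", .93), ("t", .92), ("d", .9), ("x", .5), ("y", .4), ("z", .2)]
R2 = [("a", 1.0), ("b", .9), ("c", .85), ("d", .8), ("e", .7), ("t", .6), ("f", .4)]
answer = budgeted_sorted_scan([R1, R2], k=2, budget=10)
print(answer, precision(answer, {"d", "t"}))  # ['d', 'a'] 0.5
```

With 10 accesses, d is seen in both lists but t only in R1, so a (1.0) outranks t (0.92); with 14 accesses both exact answers are found.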

Contributions
- Efficient plan: for sorted accesses, and for sorted and random accesses.
- Solution with adaptive α: for sorted accesses, and for sorted and random accesses.
- Experiments.

Results Under Limited Budget
(Figure: results for a limited budget compared with the k results for an unlimited budget)

Efficient Plan: Sorted Accesses
Assume that we know the k results for an unlimited budget (R_EXACT).
L1: (o1, S_L1), (o5, S_L1), (o8, S_L1), (o6, S_L1)
L2: (o1, S_L2), (o2, S_L2), (o5, S_L2), (o4, S_L2), (o3, S_L2)
Top-2: o5, o1
A plan is an allocation of the budget (sorted accesses) to the lists, e.g., {L1, 4}, {L2, 2}.
Interesting positions (P1, P2 in L1; Q1, Q2 in L2): where the k objects appear in the lists.

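Efficient Plan: Sorted Accesses
Goal: find a plan t that stays within budget B and has optimal precision; its result is denoted R_OPT.
Plans for B = 5 over L1 (interesting positions P1, P2) and L2 (interesting positions Q1, Q2):
L1: (o1, S_L1), (o5, S_L1), (o8, S_L1), (o6, S_L1)
L2: (o1, S_L2), (o2, S_L2), (o5, S_L2), (o4, S_L2), (o3, S_L2)
Plan: {L1, 2}, {L2, 3}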
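The search for R_OPT can be sketched in Python by brute force. Following the proof idea on the backup slides, only depths at interesting positions (plus 0) are tried, with Cs = 1. Because the slide shows only symbolic scores S_Li, the sketch scores each plan with a hypothetical proxy, the share of R_EXACT objects read in every list, instead of real precision; the (k+1)^m enumeration is illustrative and not necessarily how the authors compute R_OPT.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Set, Tuple

Plan = Tuple[int, ...]  # number of sorted accesses per list


def interesting_positions(lst: Sequence[str], exact: Set[str]) -> List[int]:
    """0 (no access) plus every 1-based position that holds an R_EXACT object."""
    return [0] + [pos for pos, obj in enumerate(lst, start=1) if obj in exact]


def optimal_plan(lists: Sequence[Sequence[str]], exact: Set[str], budget: int,
                 quality: Callable[[Plan], float]) -> Tuple[Optional[Plan], float]:
    """Brute force over plans whose depths are interesting positions (Cs = 1)."""
    best_plan: Optional[Plan] = None
    best_value = float("-inf")
    choices = [interesting_positions(lst, exact) for lst in lists]
    for plan in product(*choices):
        if sum(plan) > budget:
            continue  # plan exceeds the budget
        value = quality(plan)
        if value > best_value:
            best_plan, best_value = plan, value
    return best_plan, best_value


L1 = ["o1", "o5", "o8", "o6"]
L2 = ["o1", "o2", "o5", "o4", "o3"]
R_EXACT = {"o1", "o5"}


def fully_seen_fraction(plan: Plan) -> float:
    """Hypothetical quality proxy: share of R_EXACT objects read in every list."""
    seen = [set(lst[:depth]) for lst, depth in zip((L1, L2), plan)]
    return sum(all(o in s for s in seen) for o in R_EXACT) / len(R_EXACT)


print(optimal_plan([L1, L2], R_EXACT, budget=5, quality=fully_seen_fraction))
# ((2, 3), 1.0)
```

Under this proxy the sketch picks the slide's plan {L1, 2}, {L2, 3}; a real implementation would plug in actual precision against R_EXACT.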

Sorted Accesses: Observations
Prefer high scores.
(Figure: lists L1, L2, L3 with entry (O1, S_L1))

Observations (contd.): Prefer Large Score Reductions
(Example query: title = “war”, description = “weapon”)
Prefer large score reductions.

Score Utilities
- Score gain: [formula not preserved]
- Score reduction: [formula not preserved]
(Example entries: (o2, 1), (o4, 0.9); y = 3)

Optimization Problem
Bi-objective optimization problem:
$\mathrm{util}(L_i, x) = \alpha \cdot \mathrm{gain} + (1 - \alpha) \cdot \mathrm{reduction}$
Heuristics: Fair heuristic, Rank heuristic (where m is the number of lists).
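How such a weighted utility could drive the allocation of sorted accesses is sketched below in Python. The slide's concrete gain and reduction formulas are not preserved, so the definitions inside `util` (and the batch size `x`) are hypothetical; the sketch only illustrates the α trade-off: α = 1 favors lists with high scores, α = 0 favors lists whose high_i drops the most. It is not the Fair or Rank heuristic.

```python
from typing import List


def util(scores: List[float], pos: int, x: int, alpha: float) -> float:
    """Hypothetical util(L_i, x) = alpha * gain + (1 - alpha) * reduction.

    Assumed definitions (not from the slides): gain is the sum of the next x
    scores; reduction is how far high_i drops after reading them, with
    high_i = 1.0 before the first access (scores assumed to lie in [0, 1]).
    """
    window = scores[pos:pos + x]
    if not window:
        return float("-inf")  # list exhausted
    gain = sum(window)
    high_before = scores[pos - 1] if pos > 0 else 1.0
    reduction = high_before - window[-1]
    return alpha * gain + (1 - alpha) * reduction


def greedy_allocation(lists: List[List[float]], budget: int, x: int,
                      alpha: float) -> List[int]:
    """Spend the budget in batches of x sorted accesses on the highest-util list."""
    pos = [0] * len(lists)
    spent = 0
    while spent + x <= budget:
        utils = [util(lst, p, x, alpha) for lst, p in zip(lists, pos)]
        best = max(range(len(lists)), key=lambda i: utils[i])
        if utils[best] == float("-inf"):
            break  # every list is exhausted
        step = min(x, len(lists[best]) - pos[best])
        pos[best] += step
        spent += step
    return pos


r1 = [0.95, 0.93, 0.92, 0.9, 0.5, 0.4, 0.2]
r2 = [1.0, 0.9, 0.85, 0.8, 0.7, 0.6, 0.4]
print(greedy_allocation([r1, r2], budget=10, x=2, alpha=0.5))
```

The returned list is a plan in the slides' sense: how many sorted accesses each list receives.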

Adaptive α
(Figure: weight α on gain and (1 - α) on reduction, changing over time)

Adaptive α
Example over lists L1, L2, L3 (entry (O1, S_L1)); intervals are [ws, bs]:
- Top-k: o1 [ws, bs]
- Candidates: o3 [0.8, bs], o4 [0.6, bs], o6 [ws, bs]
- d(o4) = 0.8 - 0.6 = 0.2 (figure labels: high_t1, high_t2)
(Theobald et al., VLDB04)

Adaptive α
(Figure: TREC query, k = 100)

Efficient Plan: Random Accesses
Observation: random accesses can always be placed after the sorted accesses have finished.
Schedule 1: {SA … RA … SA …}
Schedule 2: {SA … SA … RA …}
precision(schedule 1) = precision(schedule 2)

Observations (contd.)
Random accesses are only useful for objects in R_EXACT.
(Figure: top-k queue o1, o2, o3 and candidates o4, o5, each with [ws, bs], and random accesses to list L2. A random access to o5, which is not in R_EXACT, reduces precision; a random access to o1 leaves precision the same.)

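Random Accesses
When to switch from SA to RA? Gather candidates with sorted accesses, then probe them with random accesses.
- Switching too early: not enough good candidates, so RAs are wasted.
- Switching too late: not enough RAs are left to prune the candidates.
Random accesses are much more expensive than sorted accesses, which is why they are done last.
(Figure: budget over time, split between α and (1 - α))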

Random Accesses
Switch from sorted to random accesses when S + R > B, where $R = (1 - \alpha) \cdot S$:
- S: total cost of sorted accesses.
- R: total cost of random accesses.
Which items to access? Do one RA on each candidate, choosing candidates so as to maximize the expected score.
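A Python sketch of this switching rule, under the assumption that it means: keep doing sorted accesses while S + R stays within B, with R = (1 - α) · S, then spend the leftover budget on one random access per candidate. Ranking candidates by the midpoint of [worstScore, bestScore] is a hypothetical stand-in for the slide's "expected score", and the function names are illustrative.

```python
from typing import Dict, List, Tuple


def sorted_accesses_before_switch(budget: float, cs: float, alpha: float) -> int:
    """Largest s such that S + R <= B, with S = s * cs and R = (1 - alpha) * S."""
    if cs <= 0 or not 0.0 <= alpha <= 1.0:
        raise ValueError("need cs > 0 and 0 <= alpha <= 1")
    s = 0
    while (s + 1) * cs * (2 - alpha) <= budget:
        s += 1
    return s


def choose_random_accesses(candidates: Dict[str, Tuple[float, float]],
                           budget: float, cs: float, cr: float,
                           s: int) -> List[str]:
    """One RA per candidate, highest hypothetical 'expected score' first.

    candidates maps object -> (worstScore, bestScore); the expected score is
    assumed to be the midpoint of that interval.
    """
    n_ra = int((budget - s * cs) // cr)  # RAs affordable with the leftover budget
    ranked = sorted(candidates, key=lambda o: sum(candidates[o]) / 2, reverse=True)
    return ranked[:max(n_ra, 0)]


s = sorted_accesses_before_switch(budget=10, cs=1, alpha=0.5)
print(s, choose_random_accesses({"b": (0.6, 1.45), "c": (0.5, 0.75)},
                                budget=10, cs=1, cr=3, s=s))  # 6 ['b']
```

With α = 1 the rule reserves nothing for random accesses, so the whole budget goes to sorted accesses.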

Experimental Data
- TREC Terabyte: 25M web pages; 50 queries with an average length of 3 words.
- IMDB: 375,000 movies; 20 queries, each with 4 attributes: {Title, Genre, Actors, Description}.
- Synthetic data: Zipf, #lists = [2, 6], #objects = [10,000, 1,000,000].
- Aggregate function: Sum.

Evaluation Methods
- Percentage of optimal precision (R_ALG compared with R_OPT, both measured against R_EXACT).
- SME (definition not preserved).
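Assuming precision means |R ∩ R_EXACT| / k, which is consistent with the precision values quoted for the budget example, the percentage of optimal precision of an algorithm's result R_ALG relative to R_OPT could be computed as in this Python sketch. The exact definition used in the experiments is an assumption here, and the SME measure is not covered.

```python
from typing import Set


def precision(result: Set[str], exact: Set[str]) -> float:
    """Assumed definition: |R ∩ R_EXACT| / k, where k = |R_EXACT|."""
    return len(result & exact) / len(exact)


def percent_of_optimal(r_alg: Set[str], r_opt: Set[str], exact: Set[str]) -> float:
    """Precision of R_ALG as a percentage of the precision of R_OPT."""
    opt = precision(r_opt, exact)
    if opt == 0:
        raise ValueError("R_OPT has zero precision; the ratio is undefined")
    return 100.0 * precision(r_alg, exact) / opt


exact = {"a", "b", "c", "d"}
print(percent_of_optimal({"a", "b", "x", "y"}, {"a", "b", "c", "z"}, exact))  # about 66.7
```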

Results: Sorted Accesses
(Figure: TREC, k = 100)
The smaller the budget, the larger the improvement.

Varied k
(Figure: IMDB, B = 400)
The lower k, the larger the improvement.

Number of Lists
(Figure: Zipf, k = 100, B = 4000)
The more lists, the larger the improvement.

Results: Random Accesses
(Figures: TREC, k = 100, Cr = 10; TREC, k = 100, Cr = 100)

Related Work
- Minimize budget for optimal results: the algorithm computes the exact results with minimum cost (Bast et al. VLDB06, Bruno et al. ICDE02, Chang et al. SIGMOD02). This is the dual problem.
- Anytime top-k: the algorithm collects statistics during processing, which can be used to provide probabilistic guarantees at any time during processing (Arai et al. VLDB07). No optimizations are performed.
- Approximate top-k: approximate results with probabilistic guarantees (Theobald et al. VLDB04, Fagin et al. 2001).

Conclusions
- First attempt to deal with budget constraints in top-k query processing.
- With sorted accesses only, average precision is around 70%.
- Tradeoff between RAs and SAs: when random accesses are relatively cheap, schedules that include RAs give better results.

Thank You!

Top-k Query
Given a set of n objects and m scoring lists sorted in decreasing order, find the top-k objects according to a scoring function f.
- Top-k: a set T of k objects such that $f(r_{j1}, \dots, r_{jm}) \le f(r_{i1}, \dots, r_{im})$ for every object $X_i \in T$ and every object $X_j \notin T$.
- Assumption: the scoring function f is monotone, i.e., $f(r_1, \dots, r_m) \le f(r'_1, \dots, r'_m)$ if $r_i \le r'_i$ for all i.
- Two access modes: sorted access (cost Cs) and random access (cost Cr).
- Objective: compute the top-k with minimum cost.

Sorted Accesses: Observations
- An object with high scores has a higher potential to be part of the top-k.
- An object with “mediocre” scores does not help.
(Figure: lists L1, L2, L3 with entries (O1, S_L1), (O1, S_L2), (O1, S_L3))
Prefer high scores.

Example
(Figure: wireless zone; labels “Q” and “useless”)

Applications
- Mobile applications: highly impatient users need fast results.
- Mediation systems: achieve high query throughput.
- Online analytics (e.g., logs).

Motivating Example: Query Throughput
Given a number of queries per time unit, allocate time to each query.
(Figure: user query, mediator, engine, servers)

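Terminology
- Sorted access and random access.
- high_i: the last score read by sorted access from list i.
- Top-k queue and candidates queue.
- min_k: the lowest worstScore in the top-k queue.
- worstScore(d) and bestScore(d): lower and upper bounds on the aggregated score of object d.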

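Efficient Offline Solution: Sorted Accesses
Goal: find a trace t that stays within the budget and has optimal precision; its result is denoted R_OPT.
Example with B = 5 (interesting positions P1 in L1, P2 in L2):
L1: (o1, S_L1), (o5, S_L1), (o8, S_L1), (o6, S_L1)
L2: (o1, S_L2), (o2, S_L2), (o5, S_L2), (o4, S_L2), (o3, S_L2)
Traces: t1: 5; t2: 1, 4; t3: 2, 3; t4; t5; t6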

Efficient Offline Solution: Sorted Accesses (contd.)
Traces t1 to t6 for B = 5 over lists L1 and L2, as on the previous slide.
Finding the optimal trace is feasible for k up to 100 and m up to 10.

Efficient Offline Solution: Sorted Accesses (Proof)
Proof (by contradiction): Assume that t does not exist, and choose a trace s that is within the budget and has optimal precision. Construct s', where each s'_i is the largest interesting position P_i that is less than or equal to s_i. By construction, the score of any object in S is the same in S'.

Fair Heuristic
Assume budget = b. The heuristic runs in batches.

Efficient Offline Solution: Random Accesses
Budget for RAs: $B - |t| \cdot C_s$
(Figure: top-k queue and R_EXACT; sorted entries o9, o5, o7, o8, … and o1, o2, o3, o4, o10, o14, …, each with its score S)
Order by best(o) - min_k, where best(o) = worst(o) + RA.
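Under the same hedges, the offline random-access step could look like this Python sketch: compute the RA budget B - |t| · Cs, probe only objects in R_EXACT (the earlier observation that RAs help only those), and order them by best(o) - min_k. The descending order, the function name, and the input values are hypothetical.

```python
from typing import Dict, List, Set


def offline_random_accesses(best: Dict[str, float], exact: Set[str],
                            min_k: float, budget: float, trace_len: int,
                            cs: float, cr: float) -> List[str]:
    """Choose objects to probe with the RA budget B - |t| * Cs.

    best[o] is taken to be worst(o) plus the score a random access would reveal
    (known offline). Only objects in R_EXACT are probed, and a larger
    best(o) - min_k goes first; this ordering is an assumption.
    """
    n_ra = int((budget - trace_len * cs) // cr)  # affordable random accesses
    useful = [o for o in best if o in exact]
    useful.sort(key=lambda o: best[o] - min_k, reverse=True)
    return useful[:max(n_ra, 0)]


best = {"o5": 1.2, "o9": 1.5, "o7": 0.9, "o10": 1.4}  # hypothetical values
print(offline_random_accesses(best, exact={"o5", "o7", "o9"}, min_k=1.0,
                              budget=20, trace_len=11, cs=1, cr=4))  # ['o9', 'o5']
```

Here o10 is skipped despite its high bound because it is not in R_EXACT, and only two random accesses fit in the leftover budget of 9.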

Motivation
Many applications work in budget-constrained environments, yet they still need to run top-k queries.
(Figure: user query, mediator, budget-aware query processing engine, servers)

Future Work
- Different access costs for different lists.
- Time-aware top-k.
- Top-k with budget constraints for P2P.