Preference Query Evaluation Over Expensive Attributes

Preference Query Evaluation Over Expensive Attributes
Justin Levandoski, Mohamed F. Mokbel, Mohamed E. Khalefa

Talk Outline
Expensive Attributes
The problem: preference queries over expensive attributes
Previous solutions
Our solution
Performance Analysis
Conclusion

Expensive Attributes
Rich data stored at and retrieved from a third party
Third party = “cloud” or “web”
Examples: restaurant reviews, driving directions

Expensive Attributes
You cannot “download” this data
From the Yelp API terms: “You may not cache, store, analyze or otherwise use Yelp content except for real-time use.”
http://www.yelp.com/developers/documentation/faq

Expensive Attributes
Business listings, restaurant reviews, driving directions, online news, weather

Preference Queries Over Expensive Attributes
How expensive is “expensive”?
Experiment implemented in a PostgreSQL prototype
  Single drive time attribute from a 3rd-party web service: 502 msec
  “Cold” 8K page from disk: 27 msec
  “Hot” 8K page from buffer: 4.7 μsec
The web service request is more than an order of magnitude slower than even a cold disk read
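
To put the three measurements side by side, the sketch below converts them to seconds and computes the slowdown ratios. The latencies are the ones quoted on the slide; the variable names and the 1,000-object scenario are illustrative assumptions, not part of the talk.

```python
# Latencies quoted on the slide, converted to seconds.
WEB_SERVICE_S = 502e-3   # one drive-time attribute from a third-party web service
COLD_PAGE_S = 27e-3      # "cold" 8K page read from disk
HOT_PAGE_S = 4.7e-6      # "hot" 8K page served from the buffer


def slowdown(expensive: float, local: float) -> float:
    """How many times slower the expensive access is than the local one."""
    return expensive / local


web_vs_cold = slowdown(WEB_SERVICE_S, COLD_PAGE_S)
web_vs_hot = slowdown(WEB_SERVICE_S, HOT_PAGE_S)

# Hypothetical scenario: fetching the attribute for every object in a
# 1,000-object table, one request at a time.
n_objects = 1_000
total_web_s = n_objects * WEB_SERVICE_S

print(f"web service vs cold page: {web_vs_cold:.1f}x")
print(f"web service vs hot page:  {web_vs_hot:.0f}x")
print(f"{n_objects} sequential requests: {total_web_s:.0f} s")
```

Under these numbers, retrieving the expensive attribute for every object quickly dominates query time, which is why the rest of the talk counts expensive requests rather than page reads.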

Preference Queries Over Expensive Attributes
Preference queries with a mix of “local” and “expensive” attributes
Goal: retrieve as few expensive attributes as possible
We consider skyline, top-k, and multi-objective preference queries
SELECT * FROM Restaurants R PREFERRING MIN R.Price, MIN R.Distance, MAX R.Rating
Restaurants table (ID, Price, Distance, Rating): R1 1, R2 5, R3 2, R4 3, R5 7; stored locally in DBMS
The cost model changes: the query processor needs to optimize for the least number of expensive attribute requests

Previous Solutions
Top-k ranking: Bruno et al., ICDE 2002; Chang et al., SIGMOD 2002
Distributed skyline queries: Balke et al., EDBT 2004; Bartolini et al., CIKM 2006

Preference Queries Over Expensive Attributes
Every previous solution assumes sorted access for expensive attributes
All third-party data sources we have surveyed are stateless: they do not allow sorted access
Figure: Query Processor and Restaurants table (ID, Price, Rating), with “get next sorted” and “return next sorted” calls

Highlights of Our Solution
Does not assume sorted access to expensive attributes
Assumes two fundamental access operations for expensive attributes
  Random: given object ID, return attribute value for that object
  Range: return objects with attribute values in given range [b,e]
Framework that works with any existing preference algorithm (top-k, skyline, multi-objective)
  We have not invented a “new” preference algorithm
  Our framework is complementary to existing algorithms
Implemented and tested in PostgreSQL
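
The two access operations can be pictured as a very small interface. The sketch below is a hypothetical in-memory stand-in for a stateless third-party source; the class name, method names, and request counters are assumptions made for illustration and do not come from the talk or from any real web API.

```python
from dataclasses import dataclass


@dataclass
class ExpensiveSource:
    """In-memory stand-in for a stateless third-party service (hypothetical)."""
    values: dict[str, float]      # object ID -> expensive attribute value
    random_requests: int = 0      # number of random accesses issued so far
    range_requests: int = 0       # number of range accesses issued so far

    def random_access(self, object_id: str) -> float:
        """Random access: given an object ID, return its attribute value."""
        self.random_requests += 1
        return self.values[object_id]

    def range_access(self, b: float, e: float) -> dict[str, float]:
        """Range access: return the objects whose value lies in [b, e]."""
        self.range_requests += 1
        return {oid: v for oid, v in self.values.items() if b <= v <= e}


# Made-up drive times in minutes.
source = ExpensiveSource({"r1": 12.0, "r2": 30.0, "r3": 7.5})
drive_time = source.random_access("r2")
nearby = source.range_access(0, 15)
```

Neither call keeps a cursor between requests, which is the point: there is no "get next sorted" operation to build on. The counters are one way to measure how many expensive requests a query plan issues.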

Outline of Solution
Figure: three-phase pipeline over Dataset D
  Phase I: Initial Answers (random access), producing guaranteed preference answers
  Phase II: Pruning (range access), producing pruned objects L
  Phase III: Cleaning (random access for objects in S, over D - L), producing the final preference answer

Skyline Query Example
Skyline: find all “non-dominated” objects
An object dominates another if it is at least as good on every preference attribute and strictly better on at least one
SELECT * FROM Restaurants R PREFERRING MIN R.Price, MIN R.Distance
Restaurants:
ID  Price  Distance
R1  1      10
R2  5      9
R3  2      5
R4  3      7
R5
R6
R7  8      4
R8  11     4
Figure: scatter plot of R1 to R8 with Price on the x-axis and Distance on the y-axis, showing a dominance region
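
A minimal sketch of the dominance test and a naive skyline computation follows. It assumes every attribute is minimized (a MAX attribute can be negated first), uses a simple quadratic scan rather than any particular published algorithm, and runs on made-up (price, distance) pairs rather than the slide's table.

```python
from typing import Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    All attributes are minimized in this sketch.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: dict[str, tuple[float, ...]]) -> list[str]:
    """Naive O(n^2) skyline: keep every object that no other object dominates."""
    return [
        oid
        for oid, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != oid)
    ]


# Hypothetical (price, distance) pairs, both minimized.
restaurants = {
    "x1": (1, 10),
    "x2": (5, 9),
    "x3": (2, 5),
    "x4": (3, 7),
    "x5": (8, 4),
}
result = skyline(restaurants)
print(result)
```

Here x2 and x4 drop out because x3 is both cheaper and closer than either of them, while x1, x3, and x5 each win on some trade-off between the two attributes.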

Running Example Data
SELECT * FROM Restaurants R PREFERRING MIN R.Price, MAX R.Rating, MIN R.WaitTime, MIN R.DriveTime
Local DBMS, Restaurants (Drive Time is the expensive attribute and is not stored locally):
ID  Price  Rating  Wait Time  Drive Time
a   84     78      39
b   91     29      19
c   27     28      55
d   36     12      51
e   63     51      39
f   1      13      24
g   15     95      40
h   99     14      30
i   35     47      29
j   49     33      97

Running example: Phase I
Create set A: known preference answers, found by running any skyline algorithm over the local attributes
Track the dominating object in A for each object not in A
The dominating object depends on the skyline algorithm used
Local DBMS, set A:
ID  Price  Rating  Wait Time
b   91     29      19
f   1      13      24
g   15     95      40
Other objects (each with a tracked dominating object in A):
ID  Price  Rating  Wait Time
a   84     78      39
c   27     28      55
d   36     12      51
e   63     51      39
h   99     14      30
i   35     47      29
j   49     33      97

Running example: Phase I
Make random request for expensive attributes of A only
Set boundary value for each object not in A based on its dominating object’s expensive attribute
Local DBMS, set A (Drive Time returned by RandomRequest):
ID  Price  Rating  Wait Time  Drive Time
b   91     29      19         76
f   1      13      24         80
g   15     95      40         10
Other objects:
ID  Price  Rating  Wait Time  Boundary Value
a   84     78      39         80
c   27     28      55         10
d   36     12      51         80
e   63     51      39         80
h   99     14      30         76
i   35     47      29         80
j   49     33      97         10
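
Phase I can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: all local attributes are minimized, the skyline is computed with a naive scan, the tracked dominator is simply the first one found, and the data and the `fetch` callback are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and better somewhere (all minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def phase_one(local, random_access):
    """Return (a_set, boundary).

    a_set maps each local-skyline ID to its expensive value; boundary maps
    every other ID to the expensive value of one dominating object in A.
    """
    # Local skyline: objects not dominated on the local attributes alone.
    a_ids = [o for o, p in local.items()
             if not any(dominates(q, p) for x, q in local.items() if x != o)]
    # One random request per object in A, and none for any other object.
    a_set = {o: random_access(o) for o in a_ids}
    boundary = {}
    for o, p in local.items():
        if o in a_set:
            continue
        # First dominator found; another skyline algorithm may track a different one.
        dom = next(d for d in a_ids if dominates(local[d], p))
        boundary[o] = a_set[dom]
    return a_set, boundary


# Hypothetical local attributes (price, wait time) and remote drive times.
local = {"x": (1, 5), "y": (4, 2), "z": (3, 6), "w": (5, 4)}
drive_time = {"x": 40, "y": 10, "z": 25, "w": 5}
requested = []


def fetch(object_id):
    """Stand-in for a random access to the third-party source."""
    requested.append(object_id)
    return drive_time[object_id]


a_set, boundary = phase_one(local, fetch)
print(a_set, boundary, requested)
```

Only the two objects in A trigger a request in this run; z and w just inherit the drive time of their dominators as boundary values.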

Running example: Phase II
Derive range boundary values U and L
Make range request
Details for finding range boundaries are in the paper
Example with U=10, L=0: Range Request [0,10]
Local DBMS, set A:
ID  Price  Rating  Wait Time  Drive Time
b   91     29      19         76
f   1      13      24         80
g   15     95      40         10
Other objects (BV = boundary value; Drive Time shown where the range request returned the object):
ID  Price  Rating  Wait Time  BV  Drive Time
a   84     78      39         80
c   27     28      55         10
d   36     12      51         80  9
e   63     51      39         80
h   99     14      30         76
i   35     47      29         80  8
j   49     33      97         10

Running example: Phase II
Create two sets
  P: objects returned from the range request, plus objects in A whose expensive attribute is less than or equal to U
  Q: all other objects not in A or P
Local DBMS: A = {b, f, g} with Drive Times 76, 80, 10; the range request returned d (Drive Time 9) and i (Drive Time 8); a, c, e, h, j keep boundary values 80, 10, 80, 76, 10

Running example: Phase II
Create two sets
  P: objects returned from the range request, plus objects in A whose expensive attribute is less than or equal to U
  Q: all other objects not in A or P
Clean P: run skyline over P, seeding the initial skyline with the objects in P taken from A (ensures P contains final answers)
Prune Q: for each object q in Q
  Case 1: discard q if its boundary value (BV) is less than or equal to U
  Case 2: compare q to each object in P and discard q if it is found to be dominated
Local DBMS, set A:
ID  Price  Rating  Wait Time  Drive Time
b   91     29      19         76
f   1      13      24         80
Set P:
ID  Price  Rating  Wait Time  Drive Time
i   35     47      29         8
d   36     12      51         9
g   15     95      40         10
Set Q:
ID  Price  Rating  Wait Time  BV
a   84     78      39         80
c   27     28      55         10
e   63     51      39         80
h   99     14      30         76
j   49     33      97         10

Running example: Phase II
Cleaned objects in P represent unnecessary expensive attribute retrieval
Pruned objects in Q are discarded without retrieving their expensive attribute
Local DBMS, set A: b (Drive Time 76), f (Drive Time 80)
Set P: i (Drive Time 8), d (Drive Time 9), g (Drive Time 10); d is cleaned
Set Q: a, c, e, h, j; only e survives pruning
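
The pruning logic of Phase II can be sketched as below. This is a hedged illustration, not the paper's code: it assumes all attributes are minimized, that L is the smallest possible attribute value (so any object the range request does not return has a value above U), and that P is cleaned with a naive scan. The data, `a_set`, and `boundary` are hypothetical Phase I outputs.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and better somewhere (all minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def phase_two(local, a_set, boundary, range_access, lower, upper):
    """Return (p_clean, q_ids): cleaned P with expensive values, surviving Q IDs."""
    p_set = dict(range_access(lower, upper))          # the single range request
    p_set.update({o: v for o, v in a_set.items() if lower <= v <= upper})

    def full(o):
        """Local attributes of o followed by its known expensive value."""
        return local[o] + (p_set[o],)

    # Clean P: drop members dominated by another member on all attributes.
    p_clean = {o: v for o, v in p_set.items()
               if not any(dominates(full(x), full(o)) for x in p_set if x != o)}
    q_ids = []
    for o in local:
        if o in a_set or o in p_set:
            continue
        # Case 1: o was not returned, so its value exceeds upper >= boundary[o];
        # its dominator in A is therefore better on the expensive attribute too.
        if boundary[o] <= upper:
            continue
        # Case 2: every member of P has a value <= upper < o's value, so
        # dominance on the local attributes alone is enough to discard o.
        if any(dominates(local[x], local[o]) for x in p_clean):
            continue
        q_ids.append(o)
    return p_clean, q_ids


# Hypothetical data: local (price, wait time) and remote drive times.
local = {"x": (1, 5), "y": (4, 2), "z": (3, 6), "w": (5, 4),
         "v": (2, 7), "t": (6, 3), "s": (6, 8)}
drive_time = {"x": 40, "y": 10, "z": 25, "w": 5, "v": 8, "t": 30, "s": 50}
a_set = {"x": 40, "y": 10}                            # assumed Phase I output
boundary = {"z": 40, "w": 10, "v": 40, "t": 10, "s": 40}


def range_request(lo, hi):
    """Stand-in for a range access to the third-party source."""
    return {o: v for o, v in drive_time.items() if lo <= v <= hi}


p_clean, q_ids = phase_two(local, a_set, boundary, range_request, 0, 10)
print(p_clean, q_ids)
```

In this run t is discarded by Case 1 and s by Case 2, both without a request for their drive time; only z remains in Q and needs a random access in the next phase.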

Running example: Phase III
Make random request for expensive attributes of all objects left in Q
Find final preference answer by running skyline over Q, seeding the initial answer with objects from A and P
Local DBMS, set A:
ID  Price  Rating  Wait Time  Drive Time
b   91     29      19         76
f   1      13      24         80
Set P:
ID  Price  Rating  Wait Time  Drive Time
i   35     47      29         8
g   15     95      40         10
Set Q (Drive Time returned by RandomRequest):
ID  Price  Rating  Wait Time  Drive Time
e   63     51      39         60

Running example: Phase III
Final preference answer is the concatenation of A, P, and Q
Final Answer:
ID  Price  Rating  Wait Time  Drive Time
b   91     29      19         76
f   1      13      24         80
i   35     47      29         8
g   15     95      40         10
e   63     51      39         60
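
A sketch of Phase III under the same assumptions as before (all attributes minimized, naive skyline scan, hypothetical data). Instead of seeding an incremental skyline with A and P, it recomputes the skyline over the union of the known objects and the survivors of Q, which should give the same answer on this small input.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and better somewhere (all minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def phase_three(local, known, q_ids, random_access):
    """known maps the IDs kept from A and P to their expensive values."""
    values = dict(known)
    for o in q_ids:
        values[o] = random_access(o)        # one request per surviving object
    # Full tuples: local attributes followed by the expensive attribute.
    full = {o: local[o] + (v,) for o, v in values.items()}
    return [o for o in full
            if not any(dominates(full[x], full[o]) for x in full if x != o)]


# Hypothetical data: local (price, wait time) and remote drive times.
local = {"x": (1, 5), "y": (4, 2), "w": (5, 4), "v": (2, 7),
         "z": (3, 6), "r": (3.5, 6.5)}
known = {"x": 40, "y": 10, "w": 5, "v": 8}    # assumed output of Phases I and II
remote = {"z": 25, "r": 30}
requested = []


def fetch(object_id):
    """Stand-in for a random access to the third-party source."""
    requested.append(object_id)
    return remote[object_id]


final_answer = phase_three(local, known, ["z", "r"], fetch)
print(final_answer)
```

Object r is only eliminated once its drive time is known, because z then beats it on all three attributes; that is the kind of object Phase III exists to resolve.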

Solutions for Other Scenarios
Paper contains details for
  Skyline over multiple expensive attributes
  Multi-objective queries for a single expensive attribute
  Multi-objective queries for multiple expensive attributes
Framework can be adapted to top-k queries

Multi-Objective Queries
Multi-objective: mixture of skyline and ranking
SELECT * FROM Restaurants R PREFERRING MIN (R.Price + R.Distance), MAX R.Rating
Restaurants:
ID  Price  Distance  Rating
R1  1      10        4.5
R2  5      9         3
R3  2      5         3.5
R4  3      7         5
R5                   3.4
R6                   4.8
R7  8      4         2
R8  11     4         2.5
Restaurants, with the combined objective:
ID  (Price + Distance)  Rating
R1  11                  4.5
R2  14                  3
R3  7                   3.5
R4  10                  5
R5  9                   3.4
R6                      4.8
R7  12                  2
R8  15                  2.5
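
One way to read a multi-objective query is as a skyline over derived objectives. The sketch below builds the (Price + Distance, Rating) pairs and runs a naive skyline over them; the rating is negated so that every objective is minimized. The six rows are illustrative values loosely based on the slide's table and should be treated as assumptions.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and better somewhere (all minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


# ID -> (price, distance, rating); illustrative values only.
rows = {
    "R1": (1, 10, 4.5),
    "R2": (5, 9, 3.0),
    "R3": (2, 5, 3.5),
    "R4": (3, 7, 5.0),
    "R7": (8, 4, 2.0),
    "R8": (11, 4, 2.5),
}

# Derived objectives: minimize price + distance, maximize rating
# (negated so that smaller is better for both).
derived = {rid: (price + dist, -rating)
           for rid, (price, dist, rating) in rows.items()}

answer = [rid for rid, p in derived.items()
          if not any(dominates(q, p) for other, q in derived.items() if other != rid)]
print(answer)
```

R3 has the smallest combined cost and R4 the best rating, so neither dominates the other; every other row here is beaten by one of the two on both objectives.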

Performance Analysis
Framework implemented inside Postgres query processing engine
Preference algorithms
  Skyline [ICDE 2001]
  Multi-Objective [VLDB 2004]
Data
  Synthetically generated [ICDE 2001]
  Real Minneapolis point-of-interest data, using Bing Maps API for driving time

Performance Analysis: Skyline, Single Expensive Attribute

Multi-Objective with Multiple Expensive Attributes

Conclusion
Covered the problem of preference query processing over expensive attributes
Surveyed existing techniques for addressing this problem and their drawbacks
Proposed a new expensive attribute query processing framework
  Works with any existing preference algorithm
  Three-phase framework
  Assumes only random and range access to expensive attributes
Covered performance analysis of the framework implemented inside PostgreSQL

Thank You

Choosing a value for U
Five options for choosing U
  MAX: maximum expensive attribute value from A
  MIN: minimum expensive attribute value from A
  MOSTDOM: expensive attribute from A of the object found to dominate the most other objects
  BOUNDMAX: maximum boundary value
  BOUNDMIN: minimum boundary value
Details in paper
Local DBMS, set A:
ID  Price  Rating  Wait Time  Drive Time
b   91     29      19         76
f   1      13      24         80
g   15     95      40         10
Other objects (BV = boundary value):
ID  Price  Rating  Wait Time  BV
a   84     78      39         80
c   27     28      55         10
d   36     12      51         80
e   63     51      39         80
h   99     14      30         76
i   35     47      29         80
j   49     33      97         10
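
The five options can be written as one small selection function. This sketch is an interpretation rather than the paper's definition: in particular, MOSTDOM is read here as "the object in A that is tracked as dominator for the most other objects", and the function name and argument layout are assumptions. The input values are the drive times and dominator assignments as they appear to read in the slide's table.

```python
from collections import Counter


def choose_upper(method, a_values, dominator_of):
    """Pick U from the expensive values in A.

    a_values: ID in A -> expensive attribute value.
    dominator_of: ID not in A -> ID of its tracked dominating object in A.
    """
    boundary_values = [a_values[d] for d in dominator_of.values()]
    if method == "MAX":
        return max(a_values.values())
    if method == "MIN":
        return min(a_values.values())
    if method == "MOSTDOM":
        # Object in A that is the tracked dominator of the most other objects.
        top_id, _count = Counter(dominator_of.values()).most_common(1)[0]
        return a_values[top_id]
    if method == "BOUNDMAX":
        return max(boundary_values)
    if method == "BOUNDMIN":
        return min(boundary_values)
    raise ValueError(f"unknown method: {method}")


a_values = {"b": 76, "f": 80, "g": 10}
dominator_of = {"a": "f", "c": "g", "d": "f", "e": "f",
                "h": "b", "i": "f", "j": "g"}

methods = ("MAX", "MIN", "MOSTDOM", "BOUNDMAX", "BOUNDMIN")
choices = {m: choose_upper(m, a_values, dominator_of) for m in methods}
print(choices)
```

With these inputs MIN and BOUNDMIN both give U=10, matching the range request [0,10] used in the running example; the larger choices would return more objects from the range request but leave fewer for Phase III.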

Performance Analysis: Skyline, Multiple Expensive Attributes

Performance Analysis: U Derivation Methods
Synthetic data with 50K objects (default workload)
6-attribute skyline with a single expensive attribute