PrefJoin: An Efficient Preference-Aware Join Operator
Mohamed E. Khalefa, Mohamed F. Mokbel, Justin Levandoski


2 Outline
– Preference Queries
– Implementing a Preference Join
– The PrefJoin Operator
– Performance Analysis
– Conclusion

3 Preference Queries
SELECT * FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
           MIN R.Price, MAX R.Rating, MIN R.WaitTime
USING Skyline/K-Dominance/K-Frequency/...
Preference methods: Top-K [VLDB99], Skyline [ICDE01], Multi-Objective [VLDB04], K-Dominance [SIGMOD06], K-Frequency [EDBT06]

5 The “On-Top” Implementation
SELECT * FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
           MIN R.Price, MAX R.Rating, MIN R.WaitTime
Plan: a standard Join operator, with the preference method (Top-K, Skyline, Multi-Objective, K-Frequency, or K-Dominance) evaluated on top of the join output.
Easy to implement, but inefficient.
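The on-top plan can be sketched in a few lines of Python. This is only an illustration, assuming the skyline method, hypothetical sample rows, and preference attributes encoded so that smaller is always better (ratings are negated). None of the names below come from the slides.

```python
def dominates(a, b):
    """a dominates b: no worse in every attribute, strictly better in at least one.
    Assumes every attribute is encoded so that smaller is better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def on_top_skyline_join(hotels, restaurants):
    # Step 1: ordinary equijoin on city, with no preference awareness.
    joined = [(h, r) for h in hotels for r in restaurants if h["city"] == r["city"]]

    def prefs(pair):
        # Concatenate the preference attributes of both sides.
        return pair[0]["prefs"] + pair[1]["prefs"]

    # Step 2: skyline over the complete join output (pairwise comparison).
    return [p for p in joined if not any(dominates(prefs(o), prefs(p)) for o in joined)]


# Hypothetical data; prefs = (price, -rating).
hotels = [
    {"name": "H1", "city": "A", "prefs": (100, -4)},
    {"name": "H2", "city": "A", "prefs": (120, -3)},
    {"name": "H3", "city": "B", "prefs": (80, -5)},
]
restaurants = [
    {"name": "R1", "city": "A", "prefs": (20, -4)},
    {"name": "R2", "city": "B", "prefs": (30, -3)},
    {"name": "R3", "city": "B", "prefs": (25, -3)},
]

answer = [(h["name"], r["name"]) for h, r in on_top_skyline_join(hotels, restaurants)]
print(answer)  # [('H1', 'R1'), ('H3', 'R3')]
```

The inefficiency is visible in the sketch: every joined pair is materialized before any preference reasoning happens, including pairs such as (H2, R1) that could never be an answer.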

6 The “Custom” Implementation
One custom join operator per preference method: Skyline Join, K-Dom Join, K-Freq Join, Top-K Join, Multi-Obj Join, …
Good performance, but infeasible.
Existing custom operators: Multi-relational skyline [ICDE07], Equijoin skyline [ICDE10], Progressive multi-criteria [ICDE10], TA & NRA [PODS01], Klee [VLDB05], Rank-Join [VLDB03]

7 Outline
– Preference Methods
– Implementing a Preference Join
– The PrefJoin Operator
   – Architecture
   – Functionality
– Performance Analysis
– Conclusion

8 The PrefJoin Architecture
A single PrefJoin operator into which preference methods (Skyline, K-Dominance, K-Frequency) are plugged.
Good performance. Extensible, sustainable architecture.

9 The PrefJoin Architecture: Comparisons
The On-Top Approach (Join, then Top-K / Skyline / Multi-Objective / K-Frequency / K-Dominance). Work: easy to implement. Performance: poor.
The Custom Approach (Skyline Join, K-Dom Join, Top-K Join, Multi-Obj Join, K-Freq Join, …). Work: difficult/unsustainable. Performance: good.
PrefJoin (K-Frequency, Skyline, Top-K, Multi-Obj, K-Dom plugged into one operator). Work: easy to implement/sustainable. Performance: good.

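11 PrefJoin Functionality
Phase 1, Local Pruning: P_local produces the LocalPref set of each input.
Phase 2, Data Preparation: P_pairwise produces the metadata DB(t) for each tuple t.
Phase 3, Joining: produces the Candidate Preference Set.
Phase 4, Refinement: P_refine produces the Final Preference Set.
P_local, P_pairwise, and P_refine are the “plugin” functions.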

12 PrefJoin Functionality: Plugin Functions
The semantics of the three plugin functions (P_local, P_pairwise, P_refine) determine the preference join type.
Plugin settings shown: Skyline, Null = Skyline Join
Plugin settings shown: Skyline, K-Dominance = K-Dominance Join
Plugin settings shown: Multi-Objective = Multi-Objective Join

14 Phase 1: Local Pruning
Filter out tuples from each input relation that are guaranteed not to be preference answers.
Filtered tuples are never considered again.
SELECT * FROM Hotels H, Restaurants R
WHERE H.city = R.city
PREFERRING MIN H.Price, MAX H.Rating, MIN BeachDistance(H.Location, Beach),
           MIN R.Price, MAX R.Rating, MIN R.WaitTime
USING SKYLINE
(Diagram labels: h(city); P_local; LocalPref(Hotels); LocalPref(Restaurants).)
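A minimal sketch of local pruning for the skyline case follows. It assumes that a tuple can be dropped when another tuple with the same join key (city) dominates it, since the dominating tuple joins with exactly the same partners. That reading of P_local, the row layout, and the function names are assumptions made for illustration, not details given in the slides.

```python
from collections import defaultdict


def dominates(a, b):
    """a dominates b: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_pref(rows):
    """Keep only rows that no other row with the same city dominates."""
    by_city = defaultdict(list)
    for row in rows:
        by_city[row["city"]].append(row)  # group by the join attribute

    kept = []
    for group in by_city.values():
        kept.extend(
            r for r in group
            if not any(dominates(o["prefs"], r["prefs"]) for o in group)
        )
    return kept


# Hypothetical data; prefs = (price, -rating).
hotels = [
    {"name": "H1", "city": "A", "prefs": (100, -4)},
    {"name": "H2", "city": "A", "prefs": (120, -3)},  # dominated by H1 in city A
    {"name": "H3", "city": "B", "prefs": (80, -5)},
]

local_hotels = [h["name"] for h in local_pref(hotels)]
print(local_hotels)  # ['H1', 'H3']
```

H2 is removed before the join, so it never has to be paired with any restaurant in city A.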

16 Phase 2: Data Preparation
Associate dominance metadata with each tuple.
This helps reduce the output of the join phase.
(Diagram labels: Hotel LocalPref Set; Hotel Buckets; Restaurants LocalPref Set; P_pairwise.)
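The slides do not define the metadata DB(t). One plausible reading, used in the sketch below, is that DB(t) is the set of join-key buckets (cities) containing at least one tuple that dominates t. This interpretation, along with all names and sample rows, is an assumption.

```python
def dominates(a, b):
    """a dominates b: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dominating_buckets(rows):
    """Map each row name to the set of cities that hold a row dominating it.
    Assumed meaning of DB(t); the slides only name the symbol."""
    return {
        r["name"]: {o["city"] for o in rows if dominates(o["prefs"], r["prefs"])}
        for r in rows
    }


# Hypothetical LocalPref set for hotels; prefs = (price, -rating).
local_hotels = [
    {"name": "H1", "city": "A", "prefs": (100, -4)},
    {"name": "H3", "city": "B", "prefs": (80, -5)},
    {"name": "H4", "city": "C", "prefs": (90, -4)},
]

db = dominating_buckets(local_hotels)
print(db["H1"] == {"B", "C"}, db["H3"] == set(), db["H4"] == {"B"})  # True True True
```

Under this reading, an empty DB(t) means no hotel anywhere is better than t, while a non-empty DB(t) records where better hotels can be found.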

18 Phase 3: Joining
Join the inputs to produce the candidate preference set.
– Use the metadata from the previous phase as an extra join predicate
– Greatly reduces false-positive preference answers
(Diagram labels: DB set intersection is not null; DB set intersection is null; Candidate Preference Set.)
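The extra join predicate can be sketched as a disjointness test on the DB sets. The sketch assumes that DB(t) holds the cities containing a tuple that dominates t, and that a pair is a candidate only when the two DB sets share no city. If some city held both a better hotel and a better restaurant, that pair would join and dominate the current one. The `db` field and all sample values are hypothetical.

```python
def pref_join(hotels, restaurants):
    """Equijoin on city that keeps a pair only if the DB sets of both sides are disjoint."""
    return [
        (h["name"], r["name"])
        for h in hotels
        for r in restaurants
        if h["city"] == r["city"] and not (h["db"] & r["db"])
    ]


# Hypothetical inputs with precomputed DB sets (assumed output of the previous phase).
hotels = [
    {"name": "H1", "city": "A", "db": {"B"}},    # a better hotel exists in city B
    {"name": "H3", "city": "B", "db": set()},
]
restaurants = [
    {"name": "R1", "city": "A", "db": {"B"}},    # a better restaurant exists in city B
    {"name": "R3", "city": "B", "db": set()},
]

candidates = pref_join(hotels, restaurants)
print(candidates)  # [('H3', 'R3')]
```

The pair (H1, R1) satisfies the city predicate but is dropped, because city B offers both a better hotel and a better restaurant.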

20 Phase 4: Refinement
Apply the final preference evaluation (P_refine) to the join output to produce the final preference answer.
Guarantees a correct final preference answer.
Optional phase:
– Skyline does not require the refinement phase
– K-dominance does require the refinement phase
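The optional nature of this phase can be illustrated with a small hook that either passes candidates through unchanged or filters them with a k-dominance test. The k-dominance check uses the common formulation: a point k-dominates another if it is no worse in at least k attributes and strictly better in at least one. The function names and sample points are assumptions, and a real operator may need to compare against more than the surviving candidates.

```python
def k_dominates(a, b, k):
    """a k-dominates b: no worse in at least k attributes, strictly better in at least one.
    Assumes smaller values are better in every attribute."""
    no_worse = sum(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse >= k and strictly_better


def refine(candidates, k=None):
    """Hypothetical refinement hook: k=None means no refinement (e.g. skyline)."""
    if k is None:
        return list(candidates)
    return [c for c in candidates if not any(k_dominates(o, c, k) for o in candidates)]


# Hypothetical candidate set; no point fully dominates another.
candidates = [(1, 1, 9), (2, 2, 2), (3, 3, 1)]

skyline_answer = refine(candidates)        # refinement skipped
k_dom_answer = refine(candidates, k=2)     # 2-dominance refinement
print(skyline_answer)  # [(1, 1, 9), (2, 2, 2), (3, 3, 1)]
print(k_dom_answer)    # [(1, 1, 9)]
```

All three points survive as skyline candidates, but (1, 1, 9) 2-dominates the other two, so only it remains after the k-dominance refinement.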

22 Performance Analysis
PrefJoin implemented in PostgreSQL
Comparison of performance against:
– FlexPref [ICDE10]: generic, extensible join
– SkylineJoin [ICDE07]: skyline-specific join

23 Scalability Experiment
Performance for increasing input sizes:
– Skyline
– K-Dominance
– Multi-objective

24 Varying Number of Preference Attributes
Increasing the number of preference attributes for the Skyline preference method.
A larger number of attributes increases the cardinality of the preference answer.

26 Conclusion and Summary
Many (possibly infinite) preference methods
Three approaches to supporting preference join queries:
– “On-top” approach: easy but inefficient
– “Custom implementation” approach: efficient yet infeasible
– PrefJoin’s “extensible” approach: efficient and feasible
PrefJoin architecture:
– Four-phase approach
– Uses three “plug-in” preference functions to determine preference join semantics
Performance analysis:
– Experiments with PostgreSQL implementation
– Superior performance compared to existing custom and generic preference join algorithms

27 Thank You. Questions?

28 Preference Method Examples
SELECT * FROM Restaurants R PREFERRING MIN R.Price, MIN R.Distance
The Skyline Method: skyline answer {R1, R3, R5}
The Top-K Domination Method: Top-K domination answer {R3, R4, R2}
(Figure: two scatter plots of restaurants R1 to R8, with axes Price and Distance.)
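The difference between the two methods can be sketched on a small data set. The coordinates below are hypothetical and do not reproduce the points in the figure, whose positions are not recoverable from the transcript. Top-K domination is taken here to rank each point by how many other points it dominates, which is an assumption about the slide's definition.

```python
def dominates(a, b):
    """a dominates b: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Hypothetical restaurants as (price, distance); both are minimized.
points = {"A": (1, 9), "B": (3, 3), "C": (9, 1), "D": (4, 4), "E": (6, 6), "F": (7, 8)}


def skyline(pts):
    """Names of points that no other point dominates."""
    return [n for n, p in pts.items() if not any(dominates(q, p) for q in pts.values())]


def top_k_domination(pts, k):
    """Names of the k points that dominate the most other points."""
    score = {n: sum(dominates(p, q) for q in pts.values()) for n, p in pts.items()}
    return sorted(score, key=lambda n: -score[n])[:k]


sky = skyline(points)
top3 = top_k_domination(points, 3)
print(sky)   # ['A', 'B', 'C']
print(top3)  # ['B', 'D', 'E']
```

As in the slide's example, the two answers overlap only partially: D and E are not skyline points, yet they rank highly because each dominates several others.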