1 RankSQL: Query Algebra and Optimization for Relational Top-k Queries Chengkai Li (UIUC) joint work with Kevin Chen-Chuan Chang (UIUC) Ihab F. Ilyas (U.

Slides:



Advertisements
Similar presentations
Topic 3 Top-K and Skyline Algorithms. 2 What is top-k processing? Find k items that best answer a users query –As a set, as a sorted list, or as a sorted.
Advertisements

Web Information Retrieval
Efficient Processing of Top- k Queries in Uncertain Databases Ke Yi, AT&T Labs Feifei Li, Boston University Divesh Srivastava, AT&T Labs George Kollios,
13/04/20151 SPARK: Top- k Keyword Query in Relational Database Wei Wang University of New South Wales Australia.
Twig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents Songting Chen, Hua-Gang Li *, Junichi Tatemura Wang-Pin Hsiung,
Optimizing Join Enumeration in Transformation-based Query Optimizers ANIL SHANBHAG, S. SUDARSHAN IIT BOMBAY VLDB 2014
Efficient IR-Style Keyword Search over Relational Databases Vagelis Hristidis University of California, San Diego Luis Gravano Columbia University Yannis.
 Introduction  Views  Related Work  Preliminaries  Problems Discussed  Algorithm LPTA  View Selection Problem  Experimental Results.
Supporting top-k join queries in relational databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by Rebecca M. Atchley Thursday, April.
RankSQL: Supporting Ranking Queries in RDBMS Chengkai Li (UIUC) Mohamed A. Soliman (Univ. of Waterloo) Kevin Chen-Chuan Chang (UIUC) Ihab F. Ilyas (Univ.
Efficient Processing of Top- k Queries in Uncertain Databases Ke Yi, AT&T Labs Feifei Li, Boston University Divesh Srivastava, AT&T Labs George Kollios,
The Volcano/Cascades Query Optimization Framework
CS 540 Database Management Systems
PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries Vagelis Hristidis University of California, San Diego Nick Koudas AT&T.
Top-k Set Similarity Joins Chuan Xiao, Wei Wang, Xuemin Lin and Haichuan Shang University of New South Wales and NICTA.
University of Washington Database Group Tiresias The Database Oracle for How-To Queries Alexandra Meliou § ✜ Dan Suciu ✜ § University of Massachusetts.
Supporting Ad-Hoc Ranking Aggregates Chengkai Li (UIUC) joint work with Kevin Chang (UIUC) Ihab Ilyas (Waterloo)
1 EntityRank: Searching Entities Directly and Holistically Tao Cheng Joint work with : Xifeng Yan, Kevin Chang VLDB 2007, Vienna, Austria.
SUPPORTING TOP-K QUERIES IN RELATIONAL DATABASES. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, MARCH 2004 Sowmya Muniraju.
Efficient Processing of Top-k Spatial Keyword Queries João B. Rocha-Junior, Orestis Gkorgkas, Simon Jonassen, and Kjetil Nørvåg 1 SSTD 2011.
6/15/20151 Top-k algorithms Finding k objects that have the highest overall grades.
FlexPref: A Framework for Extensible Preference Evaluation in Database Systems Justin J. Levandoski Mohamed F. Mokbel Mohamed E. Khalefa.
Depth Estimation for Ranking Query Optimization Karl Schnaitter, UC Santa Cruz Joshua Spiegel, BEA Systems, Inc. Neoklis Polyzotis, UC Santa Cruz.
Evaluating Top-k Queries over Web-Accessible Databases Nicolas Bruno Luis Gravano Amélie Marian Columbia University.
Top-k Query Processing and Optimization 198:541 (slides courtesy of Ihab F. Ilyas and Walid G. Aref)
Minimal Probing: Supporting Expensive Predicates for Top-k Queries Kevin C. Chang Seung-won Hwang Univ. of Illinois at Urbana-Champaign.
Keyword Search in Relational Databases Jaehui Park Intelligent Database Systems Lab. Seoul National University
The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign Light-weight Domain-based Form Assistant: Querying Web Databases On the.
NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
1 Evaluating top-k Queries over Web-Accessible Databases Paper By: Amelie Marian, Nicolas Bruno, Luis Gravano Presented By Bhushan Chaudhari University.
Querying Structured Text in an XML Database By Xuemei Luo.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented By Vinay Hoskere.
EntityRank :Searching Entities Directly and Holistically Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang Computer Science Department, University of Illinois.
C-Store: How Different are Column-Stores and Row-Stores? Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May. 8, 2009.
Reverse Top-k Queries Akrivi Vlachou *, Christos Doulkeridis *, Yannis Kotidis #, Kjetil Nørvåg * *Norwegian University of Science and Technology (NTNU),
Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Richa Varshney.
“Artificial Intelligence” in my research Seung-won Hwang Department of CSE POSTECH.
Distributed Spatio-Temporal Similarity Search Demetrios Zeinalipour-Yazti University of Cyprus Song Lin
PrefJoin: An Efficient Preference- Aware Join Operator Mohamed E. Khalefa Mohamed F. Mokbel Justin Levandoski.
Efficient Processing of Top-k Spatial Preference Queries
1University of Texas at Arlington.  Introduction  Motivation  Requirements  Paper’s Contribution.  Related Work  Overview of Ripple Join  Rank.
Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al.
All right reserved by Xuehua Shen 1 Optimal Aggregation Algorithms for Middleware Ronald Fagin, Amnon Lotem, Moni Naor (PODS01)
IO-Top-k: Index-access Optimized Top-k Query Processing Debapriyo Majumdar Max-Planck-Institut für Informatik Saarbrücken, Germany Joint work with Holger.
Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington.
Effective Keyword-Based Selection of Relational Databases By Bei Yu, Guoliang Li, Karen Sollins & Anthony K. H. Tung Presented by Deborah Kallina.
Supporting Ranking and Clustering as Generalized Order-By and Group-By Chengkai Li (UIUC) joint work with Min Wang Lipyeow Lim Haixun Wang (IBM) Kevin.
On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications 陳良弼 Arbee L.P. Chen National Chengchi University 9/21/2012 at NCHU.
Optimal Aggregation Algorithms for Middleware By Ronald Fagin, Amnon Lotem, and Moni Naor.
Answering Top-k Queries with Multi-Dimensional Selections: The Ranking Cube Approach Dong Xin, Jiawei Han, Hong Cheng, Xiaolei Li Department of Computer.
Database Searching and Information Retrieval Presented by: Tushar Kumar.J Ritesh Bagga.
03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.
Query by Image and Video Content: The QBIC System M. Flickner et al. IEEE Computer Special Issue on Content-Based Retrieval Vol. 28, No. 9, September 1995.
The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign Light-weight Domain-based Form Assistant: Querying Web Databases On the.
The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign RankFP : A Framework for Rank Formulation and Processing Hwanjo Yu, Seung-won.
1 Chengkai Li Kevin-Chen-Chuan Chang Ihab Ilyas Sumin Song Presented by: Mariam John CSE /20/2006 RankSQL: Query Algebra and Optimization for Relational.
Structured-Value Ranking in Update- Intensive Relational Databases Jayavel Shanmugasundaram Cornell University (Joint work with: Lin Guo, Kevin Beyer,
Supporting Ranking and Clustering as Generalized Order-By and Group-By
RankSQL: Query Algebra and Optimization for Relational Top-k Queries
Supporting Ranking in Queries Score-based Paradigm
Seung-won Hwang, Kevin Chen-Chuan Chang
Supporting Ad-Hoc Ranking Aggregates
RankSQL: Query Algebra and Optimization for Relational Top-k Queries
Prepared by : Ankit Patel (226)
Preference Query Evaluation Over Expensive Attributes
Efficient Processing of Top-k Spatial Preference Queries
Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research)
Presentation transcript:

1 RankSQL: Query Algebra and Optimization for Relational Top-k Queries Chengkai Li (UIUC) joint work with Kevin Chen-Chuan Chang (UIUC) Ihab F. Ilyas (U. of Waterloo) Sumin Song (UIUC)

2 Ranking (Top-k) Queries Ranking is an important functionality in many real-world database applications: E-Commerce, Web Sources Find the best hotel deals by price, distance, etc. Multimedia Databases Find the most similar images by color, shape, texture, etc. Text Retrieval, Search Engine Find the most relevant records/documents/pages. OLAP, Decision Support Find the top profitable customers to send ads.

3 Example: Trip Planning Select * From Hotel h, Museum m Where h.star=3 AND h.area=m.area Order By cheap(h.price) + close(h.addr, “BWI airport”) + related(m.collection,“dinosaur”) Limit 5 membership dimension: Boolean predicates, Boolean function order dimension: ranking predicates, monotonic scoring function hotelmuseumcheapcloserelatedscore h1m h2m h1m Suggest a hotel to stay and a museum to visit: B R

4 Processing Ranking Queries in Traditional RDBMS Select * From Hotel h, Museum m Where B Order By R Limit 5 B h.area=m.area σ h.star=3 scan(h) scan(m) 5 results sort cheap+close+related R

5 Problems of Traditional Approach Naïve Materialize-then-Sort scheme Overkill: total order of all results; only 5 top results are requested. Very inefficient: –Scan large base tables; –Join large intermediate results; –Evaluate every ranking on every tuple; –Full sorting. R 5 results B

6 Therefore the problem is: Unlike Boolean constructs, ranking is second class. –Ranking is processed as a Monolithic component ( R ), always after the Boolean component ( B ).

7 How did we make Boolean “first class”? Select * From Hotel h, Museum m Where h.star=3 AND h.area=m.area Hotel X Museum σ h.star=3 AND h.area=m.area (1) materialize-then-filter

8 First Class: Splitting and Interleaving Select * From Hotel h, Museum m Where h.star=3 AND h.area=m.area (2) B is split into joins and selections, which interleave with each other. h.area=m.area σ h.star=3 scan(h) scan(m) Hotel X Museum σ h.star=3 AND h.area=m.area (1) materialize-then-filter

9 Ranking Query Plan materialize-then-sort: naïve, overkill split and interleave: reduction of intermediate results, thus processing cost 5 results rank-join h.area=m.area σ h.star=3 rank-scan( h,cheap ) scan(m) μ related μ close B h.area=m.area σ h.star=3 scan(h) scan(m) 5 results sort cheap+close+related R

10 Possibly orders of magnitude improvement Implementation in PostgreSQL plan1: traditional materialize-then-sort plan plan2-4: new ranking query plans Observations: an extended plan space with plans of various costs.

11 RankSQL Goals: –Support ranking as a first-class query type in RDBMS; splitting ranking. –Integrate ranking with traditional Boolean query constructs. interleaving ranking with other operations. Foundation: Rank-Relational Algebra –data model: rank-relation –operators: new and augmented –algebraic laws Query engine: –executor: physical operator implementation –optimizer: plan enumeration, cost estimation

12 Two Logical Properties of Rank-Relation Membership( B ) : h.star=3 Order( R ) : cheap(h.price) Membership( B ) : h.area=m.area, h.star=3 Order( R ) : close(h.addr, “BWI airport”) related(m.collection,“dinosaur”) cheap(h.price) rank-relation A B Membership of the tuples: evaluated Boolean predicates Order among the tuples: evaluated ranking predciates

13 Ranking Principle: what should be the order? Upper-bound determines the order: Without further processing h1, we cannot output any result; hotelcheap h10.9 h20.6 …… tuple h 1 tuple h 2 upper bound museumcloserelated 2.9*11 2.6*11 ………… A B F=cheap + close + related

14 Ranking Principle: upper-bound determines the order Processing in the “promising” order, avoiding unnecessary processing. tuple h 1 tuple h 2 A B Upper-bound determines the order: Without further processing h1, we cannot output any result; F=cheap + close + related hotelcheap h10.9 h20.6 …… upper bound museumcloserelated 2.9*11 2.6*11 …………

15 Rank-Relation Rank-relation R: relation F: monotonic scoring function over predicates (p 1,..., p n ) P  {p 1,..., p n }: evaluated predicates Logical Properties: –Membership: R (as usual) –Order: <  t1, t2  : t1 < t2 iff F P [t1] < F P [t2]. (by upper-bound)

16 Operators To achieve splitting and interleaving: New operator: –μ: evaluate ranking predicates piece by piece. implementation: MPro (Chang et al. SIGMOD02). Extended operators: –rank-selection –rank-join implementation: HRJN (Ilyas et al. VLDB03). –rank-scan –rank-union, rank-intersection.

17 Example hotel …p1p2p3score h h h h …………… upper-bound hotel upper-boundhotel upper-bound hotel Select * From Hotel H Order By p1+p2+p3 Limit 1 Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

18 Example hotel upper-bound h2 hotelupper-bound hotel upper-bound hotel …p1p2p3score h1 h20.9 h3 h4 … Select * From Hotel H Order By p1+p2+p3 Limit 1 Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

19 Example hotel upper-bound h22.9 hotelupper-bound hotel upper-bound hotel …p1p2p3score h h h h … Select * From Hotel H Order By p1+p2+p3 Limit 1 Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

20 Example hotel upper-bound h22.9 hotelupper-bound hotel upper-bound hotel …p1p2p3score h h h h … Select * From Hotel H Order By p1+p2+p3 Limit 1 h2 2.9 Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

21 Example hotel upper-bound h22.9 hotelupper-bound hotel upper-bound h22.75 Select * From Hotel H Order By p1+p2+p3 Limit 1 hotel …p1p2p3score h h h h … Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

22 Example hotel upper-bound h22.9 h12.7 hotelupper-bound hotel upper-bound h22.75 Select * From Hotel H Order By p1+p2+p3 Limit 1 hotel …p1p2p3score h h h h … Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

23 Example hotel upper-bound h22.9 h12.7 hotelupper-bound hotel upper-bound h22.75 h12.5 Select * From Hotel H Order By p1+p2+p3 Limit 1 hotel …p1p2p3score h h h h … Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

24 Example hotel upper-bound h22.9 h12.7 hotelupper-bound h22.55 hotel upper-bound h22.75 h12.5 Select * From Hotel H Order By p1+p2+p3 Limit 1 hotel …p1p2p3score h h h h … Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

25 Example hotel upper-bound h22.9 h12.7 h32.5 hotelupper-bound h22.55 hotel upper-bound h22.75 h12.5 Select * From Hotel H Order By p1+p2+p3 Limit 1 hotel …p1p2p3score h h h h … Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

26 Example hotel upper-bound h22.9 h12.7 h32.5 hotelupper-bound h22.55 hotel upper-bound h22.75 h12.5 h31.95 Select * From Hotel H Order By p1+p2+p3 Limit 1 hotel …p1p2p3score h h h h … Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

27 Example hotel upper-bound h22.9 h12.7 h32.5 hotelupper-bound h22.55 h12.4 hotel upper-bound h22.75 h12.5 h31.95 Select * From Hotel H Order By p1+p2+p3 Limit 1 hotel …p1p2p3score h h h h … Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

28 Example hotel upper-bound h22.9 h12.7 h32.5 hotelupper-bound h22.55 h12.4 hotel upper-bound h22.75 h12.5 h31.95 Select * From Hotel H Order By p1+p2+p3 Limit 1 hotel …p1p2p3score h h h h … Scan p1 (H) μ p2 μ p3 R p1 R p1+p2 R p1+p2+p3

29 In contrast: materialize-then-sort Scan(H) Sort p1+p2+p3 Select * From Hotel H Order By p1+p2+p3 Limit 1 hotel …p1p2p3score h h h h …………….

30 Impact of Rank-Relational Algebra Ranking Query parser, rewriter Logical Query Plan Physical Query Plan executor Results operators and algebraic laws implementation of operators cost model Plan enumerator optimizer

31 Optimization Two-dimensional enumeration: ranking (ranking predicate scheduling) and filtering (join order selection) Sampling-based cardinality estimation

32 Two-Dimensional Enumeration (1 table, 0 predicate) seqScan(H), idxScan(H), seqScan(M), … (1 table, 1 predicate) rankScan cheap (H),  cheap (seqScan(H)), … (1 table, 2 predicates)  close (rankScan cheap (H)), … (2 table, 0 predicate) NestLoop(seqScan(H), seqScan(M)), … (2 table, 1 predicate) NRJN(rankScan cheap (H), seqScan(M)),… and so on…

33 Related Work Middleware Fagin et al. (PODS 96,01),Nepal et al. (ICDE 99),Günter et al. (VLDB 00),Bruno et al. (ICDE 02),Chang et al. (SIGMOD 02) RDBMS, outside the core Chaudhuri et al. (VLDB 99),Chang et al. (SIGMOD 00), Hristidis et al. (SIGMOD 01), Tsaparas et al. (ICDE 03), Yi et al. (ICDE 03) RDBMS, in the query engine –Physical operators and physical properties Carey et al. (SIGMOD 97), Ilyas et al. (VLDB 02, 03, SIGMOD 04), Natsev et al. (VLDB 01) –Algebra framework Chaudhuri et al. (CIDR 05)

34 Conclusion: RankSQL System Goal: Support ranking as a first-class query type; Integrate ranking with Boolean query constructs. Our approach: –Algebra: rank-relation, new and augmented rank-aware operators, algebraic laws –Optimizer: two-dimensional enumeration, sampling-based cost estimation Implementation: in PostgreSQL Welcome to our demo in VLDB05!