AQuery A Database System for Order

Slides:



Advertisements
Similar presentations
Advanced SQL (part 1) CS263 Lecture 7.
Advertisements

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
1 Advanced SQL Queries. 2 Example Tables Used Reserves sidbidday /10/04 11/12/04 Sailors sidsnameratingage Dustin Lubber Rusty.
Relational Algebra, Join and QBE Yong Choi School of Business CSUB, Bakersfield.
BUILDING A DATABASE SYSTEM FOR ORDER New England Database Seminars April 2002 Alberto Lerner – ENST Paris Dennis Shasha – NYU
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Programming, Triggers Chapter 5 Modified by Donghui Zhang.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Composite Subset Measures Lei Chen, Paul Barford, Bee-Chung Chen, Vinod Yegneswaran University of Wisconsin - Madison Raghu Ramakrishnan Yahoo! Research.
Algebraic and Logical Query Languages Spring 2011 Instructor: Hassan Khosravi.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Programming, Triggers Chapter 5.
CREATE VIEW SYNTAX CREATE VIEW name [(view_col [, view_col …])] AS [WITH CHECK OPTION];
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver.
Midterm Review Lecture 14b. 14 Lectures So Far 1.Introduction 2.The Relational Model 3.Disks and Files 4.Relational Algebra 5.File Org, Indexes 6.Relational.
FALL 2004CENG 351 File Structures and Data Management1 SQL: Structured Query Language Chapter 5.
AQuery A Database System for Order Dennis Shasha joint work with Alberto Lerner
Unary Query Processing Operators Not in the Textbook!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Constraints, Triggers Chapter 5.
Database Systems More SQL Database Design -- More SQL1.
Introduction to Structured Query Language (SQL)
Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved.
CSE314 Database Systems More SQL: Complex Queries, Triggers, Views, and Schema Modification Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
Aquery: A DATABASE SYSTEM FOR ORDER Dennis Shasha, joint work with Alberto Lerner
1 ICS 184: Introduction to Data Management Lecture Note 10 SQL as a Query Language (Cont.)
Relational Algebra - Chapter (7th ed )
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
1 Single Table Queries. 2 Objectives  SELECT, WHERE  AND / OR / NOT conditions  Computed columns  LIKE, IN, BETWEEN operators  ORDER BY, GROUP BY,
6 1 Lecture 8: Introduction to Structured Query Language (SQL) J. S. Chou, P.E., Ph.D.
From Relational Algebra to SQL CS 157B Enrique Tang.
Copyright © Curt Hill Joins Revisited What is there beyond Natural Joins?
Database Fundamental & Design by A.Surasit Samaisut Copyrights : All Rights Reserved.
1 CSCE Database Systems Anxiao (Andrew) Jiang The Database Language SQL.
1 CS 430 Database Theory Winter 2005 Lecture 5: Relational Algebra.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 6 The Relational Algebra and Relational Calculus.
A Guide to SQL, Eighth Edition Chapter Four Single-Table Queries.
Query Processing – Implementing Set Operations and Joins Chap. 19.
1 SQL: The Query Language. 2 Example Instances R1 S1 S2 v We will use these instances of the Sailors and Reserves relations in our examples. v If the.
COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!
1 Chapter 3 Single Table Queries. 2 Simple Queries Query - a question represented in a way that the DBMS can understand Basic format SELECT-FROM Optional.
1 SQL Chapter 9 – 8 th edition With help from Chapter 2 – 10 th edition.
SQL LANGUAGE TUTORIAL Prof: Dr. Shu-Ching Chen TA: Hsin-Yu Ha.
SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.
1 CS122A: Introduction to Data Management Lecture 9 SQL II: Nested Queries, Aggregation, Grouping Instructor: Chen Li.
IFS180 Intro. to Data Management Chapter 10 - Unions.
More SQL: Complex Queries, Triggers, Views, and Schema Modification
Arthur Whitney, Chief Technical Officer, KX Systems.
More SQL: Complex Queries,
Unary Query Processing Operators
Slides are reused by the approval of Jeffrey Ullman’s
MySQL Subquery Source: Dev.MySql.com
COP Introduction to Database Structures
Relational Algebra - Part 1
SQL: Advanced Options, Updates and Views Lecturer: Dr Pavle Mogin
SQL Structured Query Language 11/9/2018 Introduction to Databases.
Operators Expression Trees Bag Model of Data
CS 3630 Database Design and Implementation
Instructor: Mohamed Eltabakh
SQL – Entire Select.
Chapter 4 Summary Query.
More SQL: Complex Queries, Triggers, Views, and Schema Modification
The Relational Model Textbook /7/2018.
SQL: Structured Query Language
The Relational Algebra
Contents Preface I Introduction Lesson Objectives I-2
Query Functions.
Section 4 - Sorting/Functions
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Introduction to SQL Server and the Structure Query Language
Presentation transcript:

AQuery A Database System for Order Alberto Lerner and Dennis Shasha lerner@cs.nyu.edu

Motivation The need for ordered data Queries in Finance, Biology, and Network Management depend on order. SQL 99 has extensions – the OLAP amendment – that incorporate order to the language but they are clumsy to use.

3-month moving average: the wrong way sales 100 120 140 130 3-avg 110 133 136 month 1 2 3 4 5 SELECT t1.month,t1.sales, (t1.sales+t2.sales+t3.sales)/3 FROM Sales t1, Sales t2, Sales t3 WHERE t1.month – 1 = t2.month AND t1.month – 2 = t3.month Join eliminates first two months! Do we really need a three-way join? Can the optimizer make it linear-time? Problems?

3-month moving average: the hard way sales 100 120 140 130 3-avg 110 133 136 month 1 2 3 4 5 SELECT t1.month,t1.sales, (t1.sales+ CASE WHEN t2.sales is null AND t3.sales is null THEN 2*t1.sales WHEN t2.sales is not null AND THEN t2.sales +(t1.sales+t2.sales)/2 ELSE t2.sales + t3.sales END) / 3 FROM Sales t1 LEFT OUTER JOIN Sales t2 ON t1.month – 1 = t2.month LEFT OUTER JOIN Sales t3 ON t1.month – 2 = t3.month “Write-once” query Problems? Three way join

3-month moving average: the OLAP way sales 100 120 140 130 3-avg 110 133 136 month 1 2 3 4 5 SELECT month,sales, avg(sales) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) FROM Sales OVER construct is confined to the SELECT clause Awkward syntax Problems?

Network Management Query Find duration and average length of packets of src-dst flows. A flow from src to dest ends after a 2-minute silence WITH Prec AS (src,dst,len,time,ptime) (SELECT src,dst,len,time,min(time) OVER w FROM Packets WINDOW w AS (PARTITION BY src,dst ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 PRECEEDING)), Flow AS (src,dst,len,time,flag) (SELECT src,dst,len,time, CASE WHEN time-ptime > 120 THEN 1 ELSE 0 FROM Prec), FlowID AS (src,dst,len,time,fID) (SELECT src,dst,len,time,sum(flag) OVER w FROM Flow WINDOW w AS (ORDER BY src,dst, time ROWS UNBOUDED PRECEDING)) SELECT src,dst,count(*),avg(len) FROM FlowID GROUP BY src,dst,fID Packets src s1 s2 dst s2 s1 len 250 270 330 235 280 305 time 1 20 47 141 150 155

Order in SQL:1999 Inter-tuple operations require joins or additional query constructs - or both! Ordering can only be obtained in specific clauses (e.g., SELECT) Bottom line: Queries become difficult to read Cost of execution is larger than necessary (optimization of nested queries is still an open problem)

Idea Replace ordered tables (arrables) for tables in the data model (inspiration from KSQL by KX systems) Whatever can be done on a table can be done on an arrable. Not vice-versa. Define order on a per-query basis All query clauses can count on data ordering Maintain SQL flavor (upward compatibility to SQL 92) while allowing expressions based on order with no additional language constructs Exploit optimization techniques involving order That’s AQuery!

Moving average over Arrables month 1 2 3 4 5 sales 100 120 140 130 3-avg 100 110 120 133 136 SELECT month,avgs(3,sales) FROM Sales ASSUMING ORDER month Arrable: a collection of named arrays, ordered by a column list

Moving average over Arrables month 1 2 3 4 5 sales 100 120 140 130 3-avg 100 110 120 133 136 SELECT month,avgs(3,sales) FROM Sales ASSUMING ORDER month Arrable: a collection of named arrays, ordered by a column list Each query defines data ordering

Moving average over Arrables month 1 2 3 4 5 sales 100 120 140 130 3-avg 100 110 120 133 136 SELECT month,avgs(3,sales) FROM Sales ASSUMING ORDER month Arrable: a collection of named arrays, ordered by a column list Each query defines data ordering Variables (e.g., month) are bound to an array, as opposed to a value

Moving average over Arrables month 1 2 3 4 5 sales 100 120 140 130 3-avg 100 110 120 133 136 SELECT month,avgs(3,sales) FROM Sales ASSUMING ORDER month Arrable: a collection of named arrays, ordered by a column list Each query defines data ordering Variables (e.g., month) are bound to an array, as opposed to a value Expression are mappings from arrays to array

avgs, prds, sums, mins, deltas, ratios, reverse, Built-in Functions size- preserving non size- preserving prev, next avgs, prds, sums, mins, deltas, ratios, reverse, … drop, first, last order- dependent rank, n-tile min, max, avg, count non order- dependent

Emotive Query Find the best profit one could make by buying a stock and selling it later in the same day price 15 19 16 17 15 13 5 8 7 13 11 14 10 5 2 5

Emotive Query Find the best profit one could make by buying a stock and selling it later in the same day price 15 19 16 17 15 13 5 8 7 13 11 14 10 5 2 5 mins(price)15 15 15 15 15 13 5 5 5 5 5 5 5 5 2 2 0 4 1 2 0 0 0 3 2 8 6 9 0 0 0 3

Best-profit Query Comparison [AQuery] SELECT max(price–mins(price)) FROM ticks ASSUMING timestamp WHERE ID=“S” AND tradeDate=‘1/10/03' [SQL:1999] SELECT max(rdif) FROM (SELECT ID,tradeDate, price - min(price) OVER (PARTITION BY ID, tradeDate ORDER BY timestamp ROWS UNBOUNDED PRECEDING) AS rdif FROM Ticks ) AS t1 WHERE ID=“S” AND tradeDate=‘1/10/03' Optimizer doesn’t push this selection. To get good performance, the query author has to rewrite it.

Best-profit Query Performance

Complex queries: Network Management Query Revisited Create a log of flow information. A flow from src to dest ends after a 2-minutes silence Packets pID ... src ... dest ... len ... time ... SELECT src, dst, count(*), avg(len) FROM Packets ASSUMING ORDER src, dst, time GROUP BY src, dst, sums (deltas(time) > 120)

Network Management Query in Pictures Packets src s1 s2 dst s2 s1 len 250 270 330 235 280 305 time 1 20 47 141 150 155 SELECT src, dst, count(*), avg(len) FROM Packets ASSUMING ORDER src, dst, time GROUP BY src, dst, sums (deltas(time) > 120)

Network Management Query in Pictures Packets src s1 s2 dst s2 s1 len 250 270 330 235 280 305 time 1 20 47 141 150 155 Packets src s1 s2 dst s2 s1 len 250 270 235 330 280 305 time 1 20 141 47 150 155 SELECT src, dst, count(*), avg(len) FROM Packets ASSUMING ORDER src, dst, time GROUP BY src, dst, sums (deltas(time) > 120)

Network Management Query in Pictures Packets src s1 s2 dst s2 s1 len 250 270 235 330 280 305 time 1 20 141 47 150 155 c1 F T SELECT src, dst, count(*), avg(len) FROM Packets ASSUMING ORDER src, dst, time GROUP BY src, dst, sums (deltas(time) > 120)

Network Management Query in Pictures Packets src s1 s2 dst s2 s1 len 250 270 235 330 280 305 time 1 20 141 47 150 155 c1 F T c2 1 SELECT src, dst, count(*), avg(len) FROM Packets ASSUMING ORDER src, dst, time GROUP BY src, dst, sums (deltas(time) > 120)

Network Management Query in Pictures Packets src s1 s2 dst s2 s1 len 250 270 235 330 280 305 time 1 20 141 47 150 155 c1 F T c2 1 SELECT src, dst, count(*), avg(len) FROM Packets ASSUMING ORDER src, dst, time GROUP BY src, dst, sums (deltas(time) > 120)

Network Management Query in Pictures Packets src s1 s2 dst s2 s1 len 250 270 235 330 280 305 time 1 20 141 47 150 155 src s1 s2 dst s2 s1 len 250,270 235 330,280,305 time 1,20 141 47,150,155 SELECT src, dst, count(*), avg(len) FROM Packets ASSUMING ORDER src, dst, time GROUP BY src, dst, sums (deltas(time) > 120)

Network Management Query in Pictures Packets src s1 s2 dst s2 s1 len 250 270 235 330 280 305 time 1 20 141 47 150 155 src s1 s2 dst s2 s1 len 250,270 235 330,280,305 time 1,20 141 47,150,155 src s1 s2 dst s2 s1 avg(len) 260 235 305 count(*) 2 1 3 SELECT src, dst, count(*), avg(len) FROM Packets ASSUMING ORDER src, dst, time GROUP BY src, dst, sums (deltas(time) > 120)

Network Management Query Performance

Order-aware Query Languages Relations, Sequences, and ordered-relations SQL:1999 Sequin (Seshadri et al., 96) SRQL (Ramakrishnan et al., 98) Grouping in SQL (Chatziantoniou and Ross, 96) Array query languages AQL (Libkin et al., 96) AML (Marathe and Salem, 97) RaSQL (Widmann and Baumann, 98) KSQL (KX Systems) – out direct ancestor

Order-related Optimization Techniques Starburst’s “glue” (Lohman 88) and Exodus/Volcano “Enforcers” (Graefe and McKeena, 93) DB2 Order optimization (Simmens et al., 96) Top-k query optimization (Carey and Kossman, 97; Bruno,Chaudhuri, and Gravano 02) Hash-based order-preserving join (Claussen et al., 01) Temporal query optimization addressing order and duplicates (Slivinskas et al., 01)

AQuery Optimization Optimization is cost based The main strategies are: Define the extent of the order-preserving region of the plan, considering (correctness, obviously, and) the performance of variation of operators Exploit algebraic equivalences Apply efficient implementations of patterns of operators (e.g. “edge-by”)

Interchange sorting + order preserving operators SELECT ts.ID, avgs(10, hq.ClosePrice) FROM TradedStocks AS ts NATURAL JOIN HistoricQuotes AS hq ASSUMING ORDER hq.TradeDate GROUP BY Id    avgs() avgs() avgs() sort gby  gby avgs() gby sort gby sort (1) Sort then join preserving order (2) Preserve existing order (3) Join then sort before grouping (4) Join then sort after grouping

Performance depends on size

Last price for a name query SELECT last(price) FROM ticks t,base b ASSUMING ORDER name,timestamp WHERE t.ID=b.ID AND name=“x”  last(price)  Name=“x” sort name,timesamp Ticks ID ... date ... price ... time ... ID base ID ... name ... ticks base

Last price for a name query The sort on name can be eliminated because there will be only one name Then, push sort sortA(r1 r2) A sortA(r1) lop r2 sortA(r) A r  last(price) sort name,timesamp ID ticks  Name=“x” base

Last price for a name query  price The projection is carrying an implicit selection: last(price) = price[n], where n is the last index of the price array f(r.col[i])(r) order(r) f(r.col)(pos()=i(r))  last(price)  pos()=last lop ID ticks  Name=“x” base

Last price for a name query But why join the entire relation if we are only using the last tuple? Can we somehow push the last selection down the join?  price  pos()=last lop ID ticks  Name=“x” base

Last price for a name query We can take the last position of each ID on ticks to reduce cardinality, but we need to group by ticks.ID first But trading a join for a group by is usually a good deal?! One more step: make this an “edge by”  price  safety lop ID  each  pos()=last Name=“x” Gby base ID ticks

Performance

http://www.cs.nyu.edu/~lerner (for additional references) Conclusion AQuery declaratively incorporates order in a per-query basis Any clause can rely on order; expressions can be order-dependent Optimization possibilities are vast; performance improvements of an order of magnitude Applications to Finance, Biology, Linguistics, ... http://www.cs.nyu.edu/~lerner (for additional references)