BUILDING A DATABASE SYSTEM FOR ORDER New England Database Seminars April 2002 Alberto Lerner – ENST Paris Dennis Shasha – NYU

NEDS April 2002 – Lerner and Shasha

Agenda: Motivation, SQL + Order, Transformations, Conclusion

Motivation: The need for ordered data
Some queries rely on order. Examples: moving averages, top N, rank.
“SQL can handle it.” Can it really?

Motivation: Moving averages are algorithmically linear
Sales(month, total)
    SELECT t1.month + 1 AS forecastMonth,
           (t1.total + t2.total + t3.total) / 3 AS threeMonthMovingAverage
    FROM Sales AS t1, Sales AS t2, Sales AS t3
    WHERE t1.month = t2.month + 1 AND t1.month = t3.month + 2
Can the optimizer turn a 3-way (in general, n-way) self-join into a linear-time computation?
Ref: Trueblood and Lovett, Data Mining and Statistical Analysis Using SQL, Apress, 2001.
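A minimal Python sketch of why the moving average is linear even though the SQL idiom is a 3-way join. It assumes Sales arrives as (month, total) pairs sorted by month, with consecutive integer months and no gaps; the function names are hypothetical. The nested-loop version is a literal evaluation of the self-join (taking t1 as the latest month of the window), while the sliding-window version keeps a running sum and touches each row once.

```python
from collections import deque


def forecast_selfjoin_naive(sales):
    """Literal nested-loop evaluation of the 3-way self-join: O(n^3)."""
    out = []
    for m1, t1 in sales:
        for m2, t2 in sales:
            for m3, t3 in sales:
                if m1 == m2 + 1 and m1 == m3 + 2:
                    out.append((m1 + 1, (t1 + t2 + t3) / 3))
    return out


def forecast_moving_average(sales, w=3):
    """One O(n) pass over (month, total) pairs sorted by month (assumed
    consecutive). Emits (forecastMonth, average) for each full window."""
    window = deque()   # totals of the last w months
    running = 0.0      # sum of the totals currently in the window
    out = []
    for month, total in sales:
        window.append(total)
        running += total
        if len(window) > w:
            running -= window.popleft()
        if len(window) == w:
            # the window ends at `month`, so it forecasts the next month
            out.append((month + 1, running / w))
    return out
```

The window size w is a parameter here, so an n-way self-join collapses to the same single pass.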

Motivation: Top N
Employee(Id, salary)
    SELECT DISTINCT count(*), t1.salary
    FROM Employee AS t1, Employee AS t2
    WHERE t1.salary <= t2.salary
    GROUP BY t1.salary
    HAVING count(*) <= N
For each t1.salary, the query counts how many elements of the cross product have a salary at least as large as t1.salary. Will the optimizer see the essential sort-and-count trick?
Ref: Joe Celko, SQL for Smarties, Morgan Kaufmann, 1995.
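A small Python comparison of the self-join counting idiom and the sort-count trick it hides. It assumes distinct salaries (with duplicates the idiom's counts are multiplied by the number of tied rows); both function names are hypothetical.

```python
def top_n_celko(salaries, n):
    """Literal evaluation of the idiom: for each salary s1, count pairs
    (s1, s2) with s1 <= s2, keep groups whose count is at most n. O(n^2)."""
    counts = {}
    for s1 in salaries:
        for s2 in salaries:
            if s1 <= s2:
                counts[s1] = counts.get(s1, 0) + 1
    return sorted((s for s, c in counts.items() if c <= n), reverse=True)


def top_n_sorted(salaries, n):
    """The sort-count trick: sort once in descending order, take n. O(n log n)."""
    return sorted(salaries, reverse=True)[:n]
```

For distinct salaries the count for a salary is exactly its position in descending order, which is why a single sort answers the same question.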

Motivation: Problems extending SQL with order
Queries are hard to read.
Execution cost is often non-linear (it would not pass a basic algorithms course).
Few operators preserve order, so optimization is hard.

Agenda: Motivation, SQL + Order, Transformations, Conclusion

SQL + Order: Desirable features
Express order-dependent predicates and clauses in a readable, clear way.
Make optimization opportunities explicit (by getting rid of complex idioms such as those above).
Execute in linear (or n log n) time when possible.

SQL + Order: Three steps in the solution
1. Give SQL a vector-oriented semantics: the database is a set of array-tables (“arrables”), and variables in queries no longer refer to a single tuple at a time but to a whole column vector.
2. Provide new vector-to-vector functions that support order-based manipulation of column vectors.
3. Streaming: new data may need special treatment.
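A minimal Python sketch of the arrable idea. The class name Arrable and its assuming_order method are hypothetical illustrations, not AQuery's implementation: a table is a dict of equally long column vectors, positions are meaningful, and imposing an order permutes every column by the same sort permutation.

```python
from dataclasses import dataclass


@dataclass
class Arrable:
    """Hypothetical array-table: column name -> list of values (same length)."""
    columns: dict

    def __post_init__(self):
        lengths = {len(v) for v in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")

    def assuming_order(self, key):
        """New arrable with every column permuted by the stable sort order of `key`."""
        col = self.columns[key]
        perm = sorted(range(len(col)), key=col.__getitem__)
        return Arrable({name: [v[i] for i in perm] for name, v in self.columns.items()})
```

Because whole columns move together, a vector-to-vector function can then be applied to one column and its result lines up position by position with the others.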

SQL + Order: Moving averages
Sales(month, total)
    SELECT month, avgs(8, total)
    FROM Sales
    ASSUMING ORDER month
avgs is a vector-to-vector function that is order-dependent and size-preserving; the ASSUMING ORDER clause gives the order used by vector-to-vector functions.
Execution (Sales is an arrable):
1. FROM clause: enforces the order given in the ASSUMING clause.
2. SELECT clause: for each month, yields the moving average (window size 8) ending at that month. No 8-way join.
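A Python sketch of an avgs-like function over a column that has already been put in month order. The slides do not say what avgs returns for the first positions, where fewer than w values exist; this sketch assumes one common convention (average over the values available), so the output has the same length as the input.

```python
def avgs(w, v):
    """Moving average over the window of up to w positions ending at each
    position. Size-preserving and order-dependent; early positions average
    over the values available (assumed convention). O(len(v))."""
    out = []
    running = 0.0
    for i, x in enumerate(v):
        running += x
        if i >= w:
            running -= v[i - w]   # drop the value that left the window
        out.append(running / min(i + 1, w))
    return out


# ASSUMING ORDER month: permute totals by the sort order of months first.
months = [3, 1, 2, 4]
totals = [30.0, 10.0, 20.0, 40.0]
perm = sorted(range(len(months)), key=months.__getitem__)
ordered_totals = [totals[i] for i in perm]
print(avgs(2, ordered_totals))  # [10.0, 15.0, 25.0, 35.0]
```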

SQL + Order: Top N
Employee(ID, salary)
    SELECT first(N, salary)
    FROM Employee
    ASSUMING ORDER salary
first is a vector-to-vector function that is order-dependent and not size-preserving.
Execution:
1. FROM clause: orders the arrable by salary.
2. SELECT clause: applies first() to the salary vector, yielding its first N values in that order. For the top earners the order must be descending on salary; an ascending order yields the N lowest salaries.
The top-earning IDs could be obtained with first(N, ID).
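A Python sketch of first() combined with an ordering step. The helper assuming_order is hypothetical: it permutes several columns by the stable sort order of a key column, which is what lets first(N, ID) return the IDs that belong to the top salaries.

```python
def first(n, v):
    """Order-dependent, not size-preserving: the first n elements of v."""
    return list(v[:n])


def assuming_order(key_col, cols, descending=False):
    """Permute every column in `cols` by the stable sort order of key_col."""
    perm = sorted(range(len(key_col)), key=key_col.__getitem__, reverse=descending)
    return [[col[i] for i in perm] for col in cols]


ids = ["a", "b", "c", "d"]
salary = [50, 90, 70, 60]
ids_desc, salary_desc = assuming_order(salary, [ids, salary], descending=True)
print(first(2, salary_desc), first(2, ids_desc))  # [90, 70] ['b', 'c']
```

With an ascending order the same call returns the two lowest salaries instead.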

SQL + Order: Ranking
SalesReport(salesPerson, territory, total)
    SELECT territory, salesPerson, total, rank(total)
    FROM SalesReport
    WHERE rank(total) < N
rank is a vector-to-vector function that is not order-dependent but is size-preserving.
Execution:
1. FROM clause: ASSUMING is NOT needed.
2. rank is applied to the total vector and maps each position to an integer.
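A Python sketch of a rank() that is genuinely independent of row order. The convention is a hypothetical choice, since the slides do not fix one: a position's rank is the number of values strictly greater than its value, so the largest total has rank 0, ties share a rank, and rank(total) < N keeps the top N (plus ties).

```python
from bisect import bisect_right


def rank(v):
    """Size-preserving, not order-dependent: for each position, the number of
    values strictly greater than v[i]. O(n log n) via one sort."""
    ascending = sorted(v)
    n = len(v)
    return [n - bisect_right(ascending, x) for x in v]


totals = [30, 10, 50, 30]
N = 2
selected = [t for t, r in zip(totals, rank(totals)) if r < N]
print(rank(totals), selected)  # [1, 3, 0, 1] [30, 50, 30]
```

Permuting the input only permutes the output, which is why no ASSUMING clause is required.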

SQL + Order: Vector-to-vector functions
Order-dependent, size-preserving: prev, next, $, [], avgs(*), prds(*), sums(*), deltas(*), ratios(*), reverse, …
Order-dependent, not size-preserving: drop, first, last
Not order-dependent, size-preserving: rank, tile
Not order-dependent, not size-preserving: min, max, avg, count
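Python sketches of several order-dependent, size-preserving functions from the first group. The edge behaviour is an assumption loosely modelled on the q language (None at the boundary for prev and next, first element kept unchanged for deltas and ratios); the slides do not specify it. next_ carries a trailing underscore only to avoid shadowing Python's built-in next.

```python
from itertools import accumulate
import operator


def prev(v):
    """Value at the previous position; None at position 0 (assumed)."""
    return [None] + list(v[:-1]) if v else []


def next_(v):
    """Value at the next position; None at the last position (assumed)."""
    return list(v[1:]) + [None] if v else []


def sums(v):
    """Running sum."""
    return list(accumulate(v))


def prds(v):
    """Running product."""
    return list(accumulate(v, operator.mul))


def deltas(v):
    """Differences between neighbours; first element kept as-is (assumed)."""
    return [v[0]] + [b - a for a, b in zip(v, v[1:])] if v else []


def ratios(v):
    """Ratios between neighbours; first element kept as-is (assumed)."""
    return [v[0]] + [b / a for a, b in zip(v, v[1:])] if v else []


def reverse(v):
    """Same values in the opposite order."""
    return list(reversed(v))
```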

SQL + Order: Complex queries: Best spread
On a given day, what is the maximum difference between a buying point and a later selling point for each security?
Ticks(ID, price, tradeDate, timestamp, …)
    SELECT ID, max(price - mins(price))
    FROM Ticks
    ASSUMING ORDER timestamp
    WHERE tradeDate = '99/99/99'
    GROUP BY ID
Execution:
1. For each security, compute the running-minimum vector of price and subtract it from the price vector itself; the result is a vector of spreads, whose maximum is the best spread.
2. Note that max - min would overstate the spread, because the minimum may occur after the maximum.
[Figure: price over time, marking max, min, running min, and best spread]
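A Python sketch of the best-spread computation, assuming ticks for a single trading day arrive as (ID, timestamp, price) tuples; the date filter is omitted and the function names are hypothetical. The running minimum is built with itertools.accumulate, so the whole query is a sort followed by linear passes.

```python
from itertools import accumulate


def mins(v):
    """Running minimum: order-dependent and size-preserving."""
    return list(accumulate(v, min))


def best_spread(prices):
    """max(price - mins(price)) for one security's prices in timestamp order:
    the best gain from buying at some point and selling then or later."""
    return max(p - m for p, m in zip(prices, mins(prices)))


def best_spread_by_id(ticks):
    """ticks: (ID, timestamp, price) rows. ASSUMING ORDER timestamp, GROUP BY ID."""
    groups = {}
    for sec, _ts, price in sorted(ticks, key=lambda t: t[1]):
        groups.setdefault(sec, []).append(price)
    return {sec: best_spread(prices) for sec, prices in groups.items()}
```

For prices [5, 9, 3, 6], max - min is 6, but the minimum comes after the maximum; the best achievable spread is 4 (buy at 5, sell at 9).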

SQL + Order: Complex queries: Crossing averages, part I
When does the 21-day average cross the 5-month average?
Market(ID, closePrice, tradeDate, …)
TradedStocks(ID, Exchange, …)
    INSERT INTO temp
    SELECT ID, tradeDate,
           avgs(21 days, closePrice) AS a21,
           avgs(5 months, closePrice) AS a5,
           prev(avgs(21 days, closePrice)) AS pa21,
           prev(avgs(5 months, closePrice)) AS pa5
    FROM TradedStocks NATURAL JOIN Market
    ASSUMING ORDER tradeDate
    GROUP BY ID

SQL + Order: Complex queries: Crossing averages, part I (execution)
1. FROM clause: order-preserving join.
2. GROUP BY clause: groups are defined by the value of the ID column.
3. SELECT clause: the functions are applied; non-grouped columns become vector fields so that the target cardinality is met. This violates first normal form.
[Figure: ID and a non-grouped column before and after grouping; each non-grouped column becomes a vector field, and the vector fields in a row have the same cardinality]
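A Python sketch of grouping that produces vector fields, assuming rows are dicts that all share the same columns; group_into_vector_fields is a hypothetical name. Rows are first ordered (the ASSUMING ORDER step), then each group keeps the key once and turns every other column into a list, so all vector fields in a group have equal length.

```python
def group_into_vector_fields(rows, key, order_col):
    """Order rows by order_col, then group by key. Each output row holds the
    key once; every other column becomes a vector field (a Python list)."""
    groups = {}  # key value -> {column name: list of values}, insertion-ordered
    for row in sorted(rows, key=lambda r: r[order_col]):
        fields = groups.setdefault(row[key], {c: [] for c in row if c != key})
        for col, val in row.items():
            if col != key:
                fields[col].append(val)
    return [{key: k, **fields} for k, fields in groups.items()]
```

Order-dependent functions such as avgs or prev can then be applied to each group's vectors, which stay in tradeDate order.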

SQL + Order: Complex queries: Crossing averages, part II
Get the answer from the resulting non-first-normal-form relation temp:
    SELECT ID, tradeDate
    FROM flatten(temp)
    WHERE a21 > a5 AND pa21 <= pa5
Execution:
1. FROM clause: flatten transforms temp into a first normal form relation (for each row r, every vector field in r MUST have the same cardinality). The flatten could have been placed at the end of the previous query.
2. Standard query processing follows.
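A Python sketch of flatten followed by the crossing predicate, assuming temp is a list of dicts whose vector fields are Python lists; flatten and crossings are hypothetical names. Scalar fields are repeated on every output row, and rows where prev() produced None (the first date of each group) are skipped.

```python
def flatten(rows):
    """Turn rows with vector fields (lists) back into first normal form.
    Every vector field in a row must have the same cardinality."""
    flat = []
    for row in rows:
        vector_cols = {c for c, v in row.items() if isinstance(v, list)}
        lengths = {len(row[c]) for c in vector_cols}
        if len(lengths) > 1:
            raise ValueError("vector fields in a row must have equal cardinality")
        n = lengths.pop() if lengths else 1
        for i in range(n):
            flat.append({c: (v[i] if c in vector_cols else v) for c, v in row.items()})
    return flat


def crossings(temp):
    """WHERE a21 > a5 AND pa21 <= pa5 over flatten(temp)."""
    return [
        (r["ID"], r["tradeDate"])
        for r in flatten(temp)
        if r["pa21"] is not None and r["pa5"] is not None
        and r["a21"] > r["a5"] and r["pa21"] <= r["pa5"]
    ]
```

The predicate keeps exactly the dates where the short average moves from at or below the long average to above it.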

SQL + Order: Related work: Research
SEQUIN (Seshadri et al.): sequences are first-class objects. But: it is difficult to mix tables and sequences.
SRQL (Ramakrishnan et al.): an elegant algebra and language. But: no work on transformations.
SQL-TS (Sadri et al.): a language for finding patterns in sequences. But: not everything is a pattern!

SQL + Order: Related work: Products
RISQL (Red Brick): some vector-to-vector, order-dependent, size-preserving functions; a low-hanging-fruit approach to language design.
Analytic functions (Oracle 9i): a quite complete set of vector-to-vector functions. But: they can only be used in the SELECT clause, and optimization is poor (our preliminary study).
KSQL (Kx Systems): an arrable extension to SQL, but syntactically incompatible; no cost-based optimization.

Agenda: Motivation, SQL + Order, Transformations, Conclusion

Transformations: Early sorting + order-preserving operators
    SELECT ts.ID, ts.Exchange, avgs(10, hq.ClosePrice)
    FROM TradedStocks AS ts NATURAL JOIN HistoricQuotes AS hq
    ASSUMING ORDER hq.TradeDate
    GROUP BY ts.ID
Candidate plans:
(1) Sort, then join preserving order.
(2) Preserve the existing order.
(3) Join, then sort before grouping.
(4) Join, then sort after grouping.
[Figure: plan trees for (1) to (4), built from the join (op), sort, group-by (g-by), and avgs operators]
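A Python sketch contrasting plans (1) and (3) with a simple order-preserving hash join: the build side (TradedStocks) is hashed, and the probe side (HistoricQuotes) is scanned in its existing order, so sorting the quotes before the join makes a later sort unnecessary. op_hash_join and the sample rows are hypothetical illustrations, not the authors' operator.

```python
from operator import itemgetter


def op_hash_join(build, probe, key):
    """Order-preserving hash join on column `key`: output follows probe order."""
    table = {}
    for b in build:
        table.setdefault(b[key], []).append(b)
    out = []
    for p in probe:
        for b in table.get(p[key], []):
            out.append({**b, **p})  # merged row; NATURAL JOIN shares `key`
    return out


traded_stocks = [{"ID": "X", "Exchange": "NY"}, {"ID": "Y", "Exchange": "LN"}]
historic_quotes = [
    {"ID": "Y", "TradeDate": 2, "ClosePrice": 5.0},
    {"ID": "X", "TradeDate": 3, "ClosePrice": 7.0},
    {"ID": "X", "TradeDate": 1, "ClosePrice": 6.0},
]
by_date = itemgetter("TradeDate")

# Plan (1): sort the quotes first, then join preserving order (no later sort).
plan1 = op_hash_join(traded_stocks, sorted(historic_quotes, key=by_date), "ID")
# Plan (3): join in arbitrary order, then sort the join result before grouping.
plan3 = sorted(op_hash_join(traded_stocks, historic_quotes, "ID"), key=by_date)
```

Which plan is cheaper depends on the relative sizes of the inputs and the join result, which is the kind of choice a cost-based optimizer would make.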

Transformations: Early sorting + order-preserving operators

Transformations: UDF evaluation order
Gene(geneId, seq)
    SELECT t1.geneId, t2.geneId, dist(t1.seq, t2.seq)
    FROM Gene AS t1, Gene AS t2
    WHERE dist(t1.seq, t2.seq) < 5 AND posA(t1.seq, t2.seq)
posA asks whether the two sequences have nucleotide A in the same position; dist gives the edit distance between two sequences.
Plans (1) and (2) evaluate the posA and dist predicates in opposite orders; plan (3) switches dynamically between (1) and (2) depending on the execution history.
[Figure: plan trees (1), (2), and (3)]
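A hypothetical Python sketch of plan (3): a conjunction of expensive predicates (for example, wrappers around dist(...) < 5 and posA(...)) whose evaluation order is re-chosen from execution history. It ranks each predicate by (selectivity - 1) / cost per tuple, the metric credited to Hellerstein and Stonebraker on the Building Blocks slide, using the observed pass rate as selectivity. The fixed per-tuple costs and the 0.5 prior are assumptions, and this is not presented as the authors' algorithm.

```python
class AdaptiveConjunction:
    """Evaluate predicates in ascending rank order, where
    rank = (observed selectivity - 1) / cost per tuple."""

    def __init__(self, predicates, costs, prior=0.5):
        self.predicates = predicates          # name -> callable(row) -> bool
        self.costs = costs                    # name -> cost per tuple (> 0)
        self.prior = prior                    # selectivity guess before any data
        self.seen = {name: 0 for name in predicates}
        self.passed = {name: 0 for name in predicates}

    def selectivity(self, name):
        seen = self.seen[name]
        return self.passed[name] / seen if seen else self.prior

    def order(self):
        # sorted() is stable, so ties keep the caller's predicate order
        return sorted(self.predicates,
                      key=lambda n: (self.selectivity(n) - 1) / self.costs[n])

    def __call__(self, row):
        for name in self.order():
            self.seen[name] += 1
            if not self.predicates[name](row):
                return False              # short-circuit: skip the rest
            self.passed[name] += 1
        return True
```

With equal costs, a predicate that keeps rejecting rows quickly moves to the front, so the other predicate stops being evaluated at all.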

Transformations: UDF evaluation order

Transformations: Order-preserving joins
    SELECT lineitem.orderid, avgs(10, lineitem.qty), lineitem.lineid
    FROM order, lineitem
    ASSUMING ORDER lineid
    WHERE order.date > 45 AND order.date < 55
      AND lineitem.orderid = order.orderid
Basic strategy 1: restrict order based on date and build a hash table on it. Then run through lineitem, performing the join and pulling out qty.
Basic strategy 2: arrange for lineitem.orderid to be an index into order. Then restrict order based on date, giving a bit vector. The bit vector, indexed by lineitem.orderid, gives the relevant lineitem rows. The relevant order rows are then fetched using the surviving lineitem.orderid values.
Strategy 2 is often 3-10 times faster.
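A Python sketch of the two strategies, checking that they return the same lineitem rows in lineid order. Strategy 2 assumes, as the slide requires, that orderid doubles as the row position in the order columns. The function names and sample data are hypothetical, and Python timings would say nothing about the 3-10 times figure.

```python
def strategy1_hash(orders, lineitems, lo, hi):
    """Rows as dicts; lineitems already in lineid order. Restrict order by
    date, hash the surviving orderids, then scan lineitem once, probing."""
    qualifying = {o["orderid"] for o in orders if lo < o["date"] < hi}
    return [(li["orderid"], li["qty"], li["lineid"])
            for li in lineitems if li["orderid"] in qualifying]


def strategy2_bitvector(order_date, li_orderid, li_qty, li_lineid, lo, hi):
    """Column vectors; li_orderid[i] is a position in order_date (assumed)."""
    bits = [lo < d < hi for d in order_date]                      # bit vector over order
    keep = [i for i, pos in enumerate(li_orderid) if bits[pos]]   # direct indexing, no hash
    return [(li_orderid[i], li_qty[i], li_lineid[i]) for i in keep]
```

Both strategies scan lineitem in its existing order, so the result can feed avgs(10, qty) without a sort.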

Transformations: Building blocks
Order optimization: Simmen et al. '96: push down sorts over joins, and combine and avoid sorts.
Order-preserving operators: KSQL: joins on vectors. Claussen et al. '00: order-preserving hash-based join.
Pushing down aggregating functions: Chaudhuri and Shim '94, Yan and Larson '94: evaluate aggregation before joins.
UDF evaluation: Hellerstein and Stonebraker '93: evaluate UDFs according to their rank, ((output/input) - 1) / cost per tuple. Porto et al. '00: take correlation into account.

Agenda: Motivation, SQL + Order, Transformations, Conclusion

Conclusion
An arrable-based approach to ordered databases may be scary (dependency on order, vector-to-vector functions), but it is expressive and fast.
An SQL extension that includes order is possible and reasonably simple.
Optimization possibilities are vast.