1 Continuous Query Languages for DSMS CS240B Notes by Carlo Zaniolo.


1 Continuous Query Languages for DSMS CS240B Notes by Carlo Zaniolo

2 DSMS Languages for Continuous Queries
- Relational Algebra Operators
- SQL and User-Defined Aggregates
  - The blocking problem
  - The expressive power problem
- XML streams and their query languages

3 CQLs for DSMS
- Most DSMS projects use SQL for continuous queries, for good reasons:
  - Many applications span data streams and DB tables
  - A CQL based on SQL will be easier to learn and use
  - Moreover: the fewer the differences the better!
- But DBMS were designed for persistent data and transient queries, not for persistent queries on transient data
- Adaptation of SQL and its enabling technology presents many research challenges
- Lack of expressive power: even worse now, since only nonblocking operators are allowed

4 Continuous Query Graph: many components, arbitrary DAGs
[Figure: example query graphs in which Source nodes feed operators (σ, union U, ∑1, ∑2, O1, O2, O3) that lead to Sink nodes]

5 Relational Algebra Operators
Stored data:
- Selection, Projection
- Union
- Join (including X) on tables
- Set Difference
- Aggregates:
  - Traditional blocking aggregates
  - OLAP functions on windows or unlimited preceding
Data Streams:
- Selection, Projection: same
- Union by sort-merging on timestamps
- Join of stream with table
- Window joins on streams (timestamps merged into 1 column)
- No stream difference (blocking; difference of stream with table is OK)
- Aggregates:
  - No blocking aggregates
  - OLAP functions on windows or unlimited preceding
  - Slides and tumbles

6 Bolts and Nuts
    create stream bids(bid#, item, offer, Time)
    create stream mybids as
      (select bid#, offer, Time from bids where item='bolt'
       union
       select bid#, offer, Time from bids where item='nut')
Result same as:
    select bid#, offer, Time from bids where item='bolt' or item='nut'

7 Joins
We could create a stream called interesting bids by, say, joining bids with the 'interesting_items' table.
We next find the bolt bids for which there was a nut bid offered in the last 5 minutes for the same price:
    create stream selfjoinbids as
      (select S1.bid#, S1.offer, S2.bid#, S2.Time
       from bids as S1, bids as S2 [window of 5 minutes]
       where S1.item='bolt' and S2.item='nut' and S1.offer=S2.offer)
The window condition implies that S1.Time >= S2.Time and S2.Time >= S1.Time - 5 minutes.
Windows on both streams are used very often.

8 Processing Unions
Union: when tuples are present at all inputs, select one with minimal timestamp, then:
- Production: add this tuple to the output
- Consumption: remove it from the input
[Figure: Source1 and Source2 feed a union (U) operator, followed by σ and Sink]
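The union rule on this slide can be sketched in Python. This is only an illustration, not code from Stream Mill: the names `union_step` and `run_union` and the `(timestamp, payload)` tuple layout are assumptions made for the example.

```python
from collections import deque

def union_step(inputs):
    """One step of the union operator over timestamp-ordered input buffers.

    Returns the emitted tuple, or None when some input is empty and the
    operator has to wait (it cannot know which timestamp is minimal).
    """
    if any(len(buf) == 0 for buf in inputs):
        return None
    # Pick the input whose head tuple has the minimal timestamp.
    src = min(inputs, key=lambda buf: buf[0][0])
    # Consumption: remove it from the input; the caller performs production.
    return src.popleft()

def run_union(inputs):
    out = []
    while True:
        t = union_step(inputs)
        if t is None:
            break
        out.append(t)  # Production: add the tuple to the output.
    return out

a = deque([(1, "a1"), (4, "a2"), (6, "a3")])
b = deque([(2, "b1"), (3, "b2")])
merged = run_union([a, b])
```

In this run the operator stops after emitting the tuples with timestamps 1, 2 and 3: input `b` is then empty, so the tuples with timestamps 4 and 6 stay in `a` until more data arrives on `b`.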

9 Window Joins
Window join of Stream A and Stream B: when tuples are present at both inputs, and the timestamp of the A tuple is less than or equal to that of the B tuple, perform the following operations (symmetric operations are performed if the timestamp of B is less than or equal to that of A):
- Production: compute the join of the tuple in A with the tuples in W(B) and add the resulting tuples to the output buffer (these tuples have the same timestamp as the tuple in A)
- Consumption: the current tuple in A is removed from the input and added to the window buffer W(A) (from which the expired tuples are also removed)
[Figure: SourceA (through σ) and SourceB (through σ) feed a join operator with inputs A and B, followed by Sink]
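A minimal Python sketch of the symmetric window join described on this slide. Several things here are assumptions for illustration: tuples are `(timestamp, key)` pairs, the join condition is equality on `key`, the window is a time range of `width`, and expired tuples are purged from both window buffers at each step.

```python
from collections import deque

def window_join(a, b, width):
    """Symmetric window join of two timestamp-ordered deques of (ts, key)."""
    win = {"A": deque(), "B": deque()}
    out = []
    while a and b:  # proceed only when tuples are present at both inputs
        if a[0][0] <= b[0][0]:
            side, other, src = "A", "B", a
        else:
            side, other, src = "B", "A", b
        ts, key = src.popleft()  # consumption from the input buffer
        # Drop expired tuples (older than ts - width) from both windows.
        for w in win.values():
            while w and w[0][0] < ts - width:
                w.popleft()
        # Production: join the current tuple with the opposite window.
        for ots, okey in win[other]:
            if okey == key:
                a_t, b_t = ((ts, key), (ots, okey)) if side == "A" else ((ots, okey), (ts, key))
                out.append((ts, a_t, b_t))  # output carries the current timestamp
        win[side].append((ts, key))  # add the tuple to its own window buffer
    return out

stream_a = deque([(1, "x"), (5, "y"), (9, "x")])
stream_b = deque([(2, "x"), (6, "y"), (20, "z")])
joined = window_join(stream_a, stream_b, width=5)
```

With a window of 5 time units, the A tuple at time 9 does not join the B tuple at time 2, because that tuple has already expired. The B tuple at time 20 stays unprocessed because input A is empty.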

10 Relational Algebra Operators
Stored data:
- Selection, Projection
- Union
- Join (including X) on tables
- Set Difference
- Aggregates:
  - Traditional blocking aggregates
  - OLAP functions on windows or unlimited preceding
Data Streams:
- Selection, Projection: same
- Union by sort-merging on timestamps
- Join of stream with table
- Window joins on streams (timestamps merged into 1 column)
- No stream difference (blocking; difference of stream with table is OK)
- Aggregates:
  - No blocking aggregates
  - OLAP functions on windows or unlimited preceding
  - Slides and tumbles
  - Including UDAs

11 User-Defined Aggregates: Max Power via Min SQL Extensions
- Windows (logical, physical, slides, tumbles, ...): flexible synopses that solve the blocking problem for aggregates
  - Other DSMS support these constructs only on built-in aggregates
  - ESL is the first to support the complete integration of these two
- User-Defined Aggregates (UDAs): the key to power and extensibility; they can thus support applications not supported by other DSMS:
  - data mining
  - XML
  - sequences
- One framework for aggregates and windows, whether they are built-in or user-defined, and independent of the language used to define them

12 Defining Traditional Aggregates
- Specification consists of 3 blocks of code, written in an external PL (as DBMS and other DSMS do), or
- in SQL itself (SQL becomes Turing complete!)
  - INITIALIZE: executed upon the arrival of the first tuple
  - ITERATE: executed upon the arrival of each subsequent tuple (an incremental computation suitable for streams)
  - TERMINATE: executed after the end of the relation/stream has been reached
- Invocation:
    SELECT myavg(start_price) FROM OpenAuction

13 The UDA AVG in SQL
    AGGREGATE avg(Next Int) : Real {
      TABLE state(tsum Int, cnt Int);
      INITIALIZE : { INSERT INTO state VALUES (Next, 1); }
      ITERATE : { UPDATE state SET tsum=tsum+Next, cnt=cnt+1; }
      TERMINATE : { INSERT INTO RETURN SELECT tsum/cnt FROM state; }
    }
"INSERT INTO RETURN" in TERMINATE: a blocking UDA
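To make the INITIALIZE / ITERATE / TERMINATE protocol concrete, here is a rough Python emulation of the blocking average. It is not ESL: the class `BlockingAvg` and the driver `run_blocking` are hypothetical names, and the driver merely mimics how a system might call the three blocks.

```python
class BlockingAvg:
    """Emulates the three code blocks of the avg UDA."""

    def initialize(self, nxt):
        # INITIALIZE: runs on the first tuple only.
        self.tsum, self.cnt = nxt, 1

    def iterate(self, nxt):
        # ITERATE: runs on every subsequent tuple.
        self.tsum += nxt
        self.cnt += 1

    def terminate(self):
        # TERMINATE: runs once, after the end of the input.
        return self.tsum / self.cnt

def run_blocking(uda, stream):
    """Feed a finite input to the UDA; results appear only at the end."""
    seen_any = False
    for nxt in stream:
        if not seen_any:
            uda.initialize(nxt)
            seen_any = True
        else:
            uda.iterate(nxt)
    return [uda.terminate()] if seen_any else []

result = run_blocking(BlockingAvg(), [10, 20, 60])
```

Because the only result is produced in `terminate`, nothing is returned until the input ends, which is why such an aggregate cannot be applied to an unbounded stream.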

14 NonBlocking UDA: AVG of last 200 Values
    AGGREGATE myavg(Next Int) : Real {
      TABLE state(tsum Int, cnt Int);
      INITIALIZE : { INSERT INTO state VALUES (Next, 1); }
      ITERATE : {
        UPDATE state SET tsum=tsum+Next, cnt=cnt+1;
        INSERT INTO RETURN SELECT tsum/cnt FROM state WHERE cnt % 200 = 0;
        UPDATE state SET tsum=Next, cnt=1 WHERE cnt % 200 = 1;
      }
      TERMINATE : { }
    }
An empty TERMINATE denotes a non-blocking UDA
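The control flow of this non-blocking UDA can be traced with a small Python generator. This is a sketch under assumptions: `tumbling_avg` is a hypothetical name, and the block size `n` is a parameter (200 on the slide, and it must be at least 2 for the modulo tests to behave as on the slide).

```python
def tumbling_avg(stream, n=200):
    """Yield the average of each consecutive block of n values."""
    tsum = cnt = 0
    first = True
    for nxt in stream:
        if first:
            tsum, cnt = nxt, 1          # INITIALIZE
            first = False
            continue
        tsum, cnt = tsum + nxt, cnt + 1  # ITERATE: update the state
        if cnt % n == 0:
            yield tsum / cnt             # INSERT INTO RETURN ... WHERE cnt % n = 0
        if cnt % n == 1:
            tsum, cnt = nxt, 1           # restart the state with the current tuple

averages = list(tumbling_avg([1, 2, 3, 4, 5, 6, 7], n=3))
```

Results are emitted from inside the per-tuple step, so output is produced while the stream is still running and nothing is left for a terminate phase.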

15 UDAs in ESL
- In ESL, user-defined aggregates (UDAs) can be defined directly in SQL, rather than in a PL
  - Native extensibility in SQL via UDAs (which can also be defined in a PL for better performance)
  - No impedance mismatch
  - Access to DB tables from UDAs
  - Data independence and optimization
  - Good ease of use and performance
  - Turing completeness & nb-completeness

16 Data Intensive Applications & UDAs
- Complex applications can be expressed concisely, with good performance
- ATLaS: a single-user DBMS developed at UCLA
  - Support for SQL with UDAs
  - On top of the Berkeley DB record manager
- Data mining algorithms in ATLaS
  - Decision tree classifiers: 18 lines of code
  - Apriori: 40 lines of code
  - Modest overhead: <50% w.r.t. procedural UDA
- Data stream applications in ESL
  - Data stream mining, approximate aggregates, sketches, histograms, ...

17 SQL:2003 OLAP Functions: Aggregates on Windows
    CREATE STREAM ClosedAuction (   /* auction closings */
      itemID,                       /* id of the item in this auction */
      buyerID,                      /* buyer of this item */
      Final price real,             /* final price of the item */
      Current_time)
    order by … source … Auctions
For each seller, show the average selling price over the last 10 items sold (physical window):
    CREATE STREAM LastTenAvg
      SELECT sellerID,
             AVG(price) OVER(PARTITION BY sellerID ROWS 9 PRECEDING),
             Current_time
      FROM ClosedPrice;
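The meaning of `ROWS 9 PRECEDING` with `PARTITION BY sellerID` (the current row plus the nine before it, per seller) can be illustrated in Python. The function `last_n_avg` and the `(seller, price)` row layout are assumptions for this sketch.

```python
from collections import defaultdict, deque

def last_n_avg(rows, n=10):
    """For each incoming (seller, price) row, yield the seller's average
    price over that seller's last n rows (a physical window)."""
    # One bounded buffer per partition; deque(maxlen=n) drops the oldest row.
    windows = defaultdict(lambda: deque(maxlen=n))
    for seller, price in rows:
        w = windows[seller]
        w.append(price)
        yield seller, sum(w) / len(w)

rows = [("s1", 10), ("s2", 100), ("s1", 20), ("s1", 60)]
averages = list(last_n_avg(rows, n=2))
```

One result is produced per input row, so the aggregate is non-blocking. Recomputing `sum(w)` each time is the simple but unoptimized choice.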

18 Optimizing Window AVG in ESL
    WINDOW AGGREGATE avg(Next Real) : Real {
      TABLE state(tsum Real, cnt Int);
      TABLE inwindow(wnext Real);
      INITIALIZE : { INSERT INTO state VALUES (Next, 1); }
      ITERATE : {
        UPDATE state SET tsum=tsum+Next, cnt=cnt+1;
        INSERT INTO RETURN SELECT tsum/cnt FROM state;
      }
      EXPIRE : { /* if there are expired tuples, take the oldest */
        UPDATE state SET cnt=cnt-1,
          tsum = tsum - (SELECT wnext FROM inwindow WHERE oldest(inwindow));
      }
    }
For each expired tuple, decrease the count by one and the sum by the expired value. This works for both logical and physical windows.
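The incremental idea (add the new value, subtract each expired value) can be sketched in Python for a physical window. The class name `WindowAvg` is hypothetical. In ESL the system maintains `inwindow` and triggers EXPIRE, while in this sketch the class does both itself.

```python
from collections import deque

class WindowAvg:
    """Average over the last `size` values, maintained incrementally."""

    def __init__(self, size):
        self.size = size
        self.inwindow = deque()  # stands in for the system-maintained buffer
        self.tsum = 0.0
        self.cnt = 0

    def add(self, nxt):
        # EXPIRE: for each expired tuple, take the oldest and undo its effect.
        while len(self.inwindow) >= self.size:
            old = self.inwindow.popleft()
            self.tsum -= old
            self.cnt -= 1
        # ITERATE: fold in the new value and return the current average.
        self.inwindow.append(nxt)
        self.tsum += nxt
        self.cnt += 1
        return self.tsum / self.cnt

agg = WindowAvg(size=3)
results = [agg.add(v) for v in [3, 6, 9, 12]]
```

Each step costs constant time regardless of the window size, because the sum is never recomputed from the buffer.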

19 MAX
- System maintains inwindow
- Remove dominated (smaller and older) values
- The oldest is always the max
    WINDOW AGGREGATE max(Next Real) : Real {
      TABLE inwindow(wnext Real);
      INITIALIZE : { etc. }  /* system adds new tuples to inwindow */
      ITERATE : {
        DELETE FROM inwindow WHERE wnext < Next;
        INSERT INTO RETURN SELECT wnext FROM inwindow WHERE oldest(inwindow);
      }
      EXPIRE : { }  /* expired tuples removed automatically */
    }
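The dominated-value trick for MAX can be sketched in Python as follows. The function `sliding_max` and the use of list positions as a stand-in for tuple age are assumptions for this example, with a physical window of `size` rows.

```python
from collections import deque

def sliding_max(values, size):
    """Yield the max of the last `size` values after each arrival."""
    inwindow = deque()  # (position, value); values never increase, oldest first
    for i, v in enumerate(values):
        # Expiration: drop the oldest entry once it falls out of the window.
        while inwindow and inwindow[0][0] <= i - size:
            inwindow.popleft()
        # Remove dominated entries: older and smaller than the new value.
        while inwindow and inwindow[-1][1] < v:
            inwindow.pop()
        inwindow.append((i, v))
        # The oldest surviving entry is always the maximum.
        yield inwindow[0][1]

maxima = list(sliding_max([5, 3, 4, 1, 2], size=3))
```

A dominated value can never become the maximum while the newer, larger value is still in the window, so discarding it loses nothing and keeps the buffer short.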

20 For Each Aggregate, Two Versions
- The traditional base aggregate, with TERMINATE
- The window aggregate, with inwindow and EXPIRE
- These definitions will take care of both logical and physical windows
- But there are more complications: slides and tumbles

21 Slides and Tumbles
    CREATE STREAM LastTenAvg
      SELECT sellerID,
             AVG(price) OVER(RANGE 10 MINUTE PRECEDING SLIDE 2 MINUTE),
             Current_time
      FROM ClosedPrice;
Every two minutes, show the average selling price over the last 10 minutes (logical window).
Here the window is W=10 and the slide is S=2.
Tumble: when S ≥ W

22 SLIDEs
- The slide construct divides a window into panes; results are returned only at the end of each pane
- Slide is conducive to optimization
  - Combine summaries into the desired aggregation
  - E.g.: MAX(1, 2, 3, 4) = MAX(MAX(1,2), MAX(3,4)) = 4. That is, for MAX we can compute MAX on subsets of the numbers as local summaries, then combine them to get the true MAX
  - Proposed before: but what constructs should be used to integrate these concepts into the language?
[Figure: a window divided into slides/panes, each pane reduced to a summary tuple]

23 Slides & Tumbles: Examples
- Tumble: the slide size is equal to or larger than the window size
  - E.g. once every 50 tuples, compute and return the average over the last 10 tuples
  - Easy to optimize
    - Skip the first 40 tuples of every 50 tuples, and compute the blocking base version of the aggregate on the last 10
- Slide: the slide size is smaller than the window size
  - E.g. once every 10 tuples, compute and return the average over the last 50 tuples
  - Naïve implementation, not optimized
    - Perform incremental maintenance on every incoming tuple
    - Ignore RETURN statements for most incoming tuples
    - Only invoke RETURN once every 10 tuples
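Both cases can be sketched in Python for count-based (physical) windows. The function names `tumble_avg` and `naive_slide_avg` are hypothetical, and the sizes in the usage lines are scaled down from the 50/10 figures on the slide.

```python
from collections import deque

def tumble_avg(stream, slide, window):
    """Tumble (slide >= window): skip the first slide - window tuples of each
    group of `slide` tuples, then average the last `window` tuples."""
    assert slide >= window
    tsum = 0
    for i, v in enumerate(stream):
        pos = i % slide
        if pos >= slide - window:
            tsum += v                # base aggregate over the last tuples only
        if pos == slide - 1:
            yield tsum / window      # one result per group, then start over
            tsum = 0

def naive_slide_avg(stream, slide, window):
    """Slide (slide < window), naive version: maintain the window state on
    every tuple, but return a result only once every `slide` tuples."""
    buf, tsum = deque(), 0
    for i, v in enumerate(stream, 1):
        buf.append(v)
        tsum += v
        if len(buf) > window:
            tsum -= buf.popleft()    # incremental maintenance on expiration
        if i % slide == 0:
            yield tsum / len(buf)    # RETURN suppressed for the other tuples

tumbles = list(tumble_avg(range(1, 11), slide=5, window=2))
slides = list(naive_slide_avg(range(1, 7), slide=2, window=4))
```

The tumble version never looks at the skipped tuples, while the naive slide version pays the maintenance cost on every tuple even though most of that work produces no output.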

24 Pane-based SLIDE Optimization
- Two-level cascading aggregates using two existing aggregates
  - Perform sub-aggregation inside each pane using the base aggregate
    - No need for incremental maintenance here
    - Computed with a blocking aggregate once for each pane
  - Combine the summary tuples using the window aggregate that returns on every incoming tuple (non-blocking)
    - With incremental maintenance here
    - At any time, only the last unfinished pane needs to store data tuples
    - All finished panes are reduced to one reusable summary tuple
[Figure: the window's panes feed Agg1 (base); the resulting summary tuples feed Agg2 (window)]

25 Pane-based SLIDE Optimization
Example: MAX with a window size of 50 tuples and a slide size of 10 tuples
First create a stream of summary tuples using the base aggregate:
    CREATE STREAM temp AS (
      SELECT itemID,
             base_max(sale_price)
               OVER(PARTITION BY itemID ROWS 9 PRECEDING SLIDE 10) AS s
      FROM Auction);
Then apply the window version of the aggregate:
    SELECT itemID,
           window_max(s) OVER(PARTITION BY itemID ROWS 4 PRECEDING)
    FROM temp;
- This simple approach can be used to implement very complex aggregations (e.g. ensemble classifiers)
- Applies uniformly to logical/physical windows defined in SQL or in an external language
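The two-level scheme can be sketched in Python for MAX. The function `pane_max` is a hypothetical name, the sketch handles a single partition, and it assumes the window size is a multiple of the slide size (50 and 10 on the slide, smaller numbers in the usage line).

```python
from collections import deque

def pane_max(stream, window, slide):
    """Return the max over the last `window` tuples once every `slide` tuples,
    keeping one summary per finished pane instead of all tuples."""
    assert window % slide == 0
    panes = deque(maxlen=window // slide)  # summaries of finished panes
    current = []                           # only the unfinished pane keeps tuples
    for v in stream:
        current.append(v)
        if len(current) == slide:
            panes.append(max(current))     # level 1: base aggregate per pane
            current = []
            yield max(panes)               # level 2: window aggregate on summaries

results = list(pane_max([1, 9, 2, 3, 4, 5, 0, 1], window=4, slide=2))
```

With a window of 50 and a slide of 10, the second level combines at most 5 summaries per result rather than 50 tuples, and the same pane summaries are reused by consecutive windows.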

26 Summary
- {logical, physical} x {tumble, slide, unlimited_preceding}: six different types of calls, supported by two definitions
- Either SQL or procedural languages can be used in the definition

27 Window UDAs vs. Base UDAs
- Base UDAs:
  - called as traditional SQL-2 aggregates, with
  - optional GROUP BY
- Window UDAs:
  - called with the SQL:2003 OVER clause
  - logical or physical windows
  - optional PARTITION BY and SLIDE clauses in ESL
- Clear semantics and optimization rules unify:
  - UDAs: SQL or PL-defined, algebraic or not, ...
  - windows (logical & physical), slides, tumbles, etc.
  - System and user roles in optimization

28 Window UDAs: Physical Optimization
- The Stream Mill System provides efficient support for:
  - Management of new & expiring tuples in the buffer
  - Main memory & intelligent paging to disk
  - Events caused by tuple expiration
- Users can access the buffer as the table called inwindow

29 Conclusion
- Language technology:
  - ESL: a very powerful language for data stream and DB applications
  - Simple semantics and unified syntax conforming to SQL:2003 standards
  - Strong case for the DB-oriented approach to data streams
- System technology:
  - Some performance-oriented techniques are well developed, e.g., buffer management for windows
  - For others, work is still in progress: stay tuned for the latest news
- Stream Mill is up and running:

30 The End. Thank you!
