STREAM The Stanford Data Stream Management System.


Presentation Structure Introduction CQL: Continuous Query Language –Abstract Semantics –Data Types –Operators Query Plan & Execution

Introduction The system is designed for limited resource environments where streams may be rapid, and query loads may vary over time.

CQL: Continuous Query Language For simple continuous queries over streams, a relational query language such as SQL can be sufficient. For more complex queries, however, the intended semantics can quickly become unclear.

Abstract Semantics 2 Data Types: – Streams – Relations defined on a discrete, ordered time domain T 3 Types of Operators

Streams A stream S is an unbounded bag (multiset) of pairs ⟨s, t⟩, where s is a tuple and t ∈ T is the timestamp that denotes the logical arrival time of tuple s on stream S.

Relations A relation R is a time-varying bag of tuples. The bag of tuples at time t is denoted R(t), and we call R(t) an instantaneous relation. Note that tuples in R(t) have no time-stamp.

Operator Diagram

Operator Classes A relation-to-relation operator takes one or more relations as input and produces a relation as output. A stream-to-relation operator takes a stream as input and produces a relation as output. A relation-to-stream operator takes a relation as input and produces a stream as output. Stream-to-stream operators are absent; they are composed from operators of the above classes.

Query Structure A continuous query Q is a tree of operators belonging to the aforementioned classes. The inputs of Q are the streams and relations that are input to the leaf operators. The output of Q is the output of the root operator. The output is either a stream or a relation, depending on the class of the root operator.

Output Timestamp At time t, an operator of Q logically depends on its inputs up to t. The operator produces new outputs corresponding to t: tuples of S with timestamp t if the output is a stream S, or the instantaneous relation R(t) if the output is a relation R.

Relation-to-Relation Operators CQL uses SQL constructs to express its relation-to-relation operators, e.g. SELECT ... FROM ...

Class Operator Diagram
Name                 Operator Type          Description
select               relation-to-relation   Filters elements based on predicate(s)
project              relation-to-relation   Duplicate-preserving projection
binary-join          relation-to-relation   Joins two input relations
mjoin                relation-to-relation   Multiway join from [22]
union                relation-to-relation   Bag union
except               relation-to-relation   Bag difference
intersect            relation-to-relation   Bag intersection
antisemijoin         relation-to-relation   Antisemijoin of two input relations
aggregate            relation-to-relation   Performs grouping and aggregation
duplicate-eliminate  relation-to-relation   Performs duplicate elimination

Stream-to-Relation Operators Based on a sliding window principle. 3 Types of Windows: –Tuple-based window –Time-based window –Partitioned window

Tuple-based Window A tuple-based sliding window on a stream S takes an integer N > 0 as a parameter and produces a relation R. At time t, R(t) contains the N tuples of S with the largest timestamps ≤ t (or all such tuples if fewer than N have arrived). Example: R(14) [Rows 5]
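
A minimal Python sketch of the tuple-based window may make the definition concrete. It assumes a stream is held as a list of (tuple, timestamp) pairs, that elements stamped exactly t count as arrived, and that timestamp ties are broken by list order; the name rows_window is hypothetical and not part of STREAM.

```python
from typing import Any, List, Tuple

# Assumed representation: a stream is a list of (tuple, timestamp) pairs.
Stream = List[Tuple[Any, int]]


def rows_window(stream: Stream, n: int, t: int) -> List[Any]:
    """Return R(t) for S [Rows n]: the n tuples with the largest timestamps <= t."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Keep only the elements that have arrived by time t.
    arrived = [(s, ts) for (s, ts) in stream if ts <= t]
    # Stable sort by timestamp, so ties keep their list order (an assumption).
    arrived.sort(key=lambda pair: pair[1])
    # Tuples in R(t) carry no timestamp, so only the tuples are returned.
    return [s for (s, _) in arrived[-n:]]


if __name__ == "__main__":
    S = [("a", 1), ("b", 2), ("c", 3), ("d", 5)]
    print(rows_window(S, 2, 3))
```

When fewer than n elements have arrived by time t, the sketch simply returns all of them.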

Time-based window A time-based sliding window on a stream S takes a time interval w as a parameter and produces a relation R. At time t, R(t) contains all tuples of S with timestamps between t-w and t. Example: R(9) [Range 4]
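
The time-based window can be sketched in the same style. The sketch below assumes integer timestamps, a stream held as a list of (tuple, timestamp) pairs, and that both ends of the interval from t − w to t are inclusive; range_window is a hypothetical name.

```python
from typing import Any, List, Tuple

# Assumed representation: a stream is a list of (tuple, timestamp) pairs.
Stream = List[Tuple[Any, int]]


def range_window(stream: Stream, w: int, t: int) -> List[Any]:
    """Return R(t) for S [Range w]: tuples with timestamps in [t - w, t]."""
    if w < 0:
        raise ValueError("w must be non-negative")
    # Both bounds are treated as inclusive (an assumption of this sketch).
    return [s for (s, ts) in stream if t - w <= ts <= t]


if __name__ == "__main__":
    S = [("a", 1), ("b", 4), ("c", 5), ("d", 9), ("e", 10)]
    print(range_window(S, 4, 9))
```

Under these assumptions, w = 0 keeps only the tuples stamped exactly t, which is one way to read a window such as S [Now].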

Partitioned Window A partitioned sliding window on a stream S takes an integer N and a set of attributes {A1,..., Ak} of S as parameters, and is specified by following S with [Partition By A1,...,Ak Rows N]. It logically partitions S into different substreams based on the values of A1,...,Ak. HINT: Rows N is then applied as a tuple-based window on each substream.
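
A short sketch of the partitioned window, under stated assumptions: tuples are Python dicts keyed by attribute name, the stream is a list of (dict, timestamp) pairs, and R(t) is taken to be the union of the per-substream tuple-based windows. The function name partitioned_window is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

# Assumed representation: tuples are dicts, streams are (dict, timestamp) lists.
Row = Dict[str, Any]
Stream = List[Tuple[Row, int]]


def partitioned_window(stream: Stream, keys: Sequence[str], n: int, t: int) -> List[Row]:
    """Return R(t) for S [Partition By keys Rows n]."""
    substreams = defaultdict(list)
    # Walk the stream in timestamp order and route each arrived tuple to the
    # substream identified by its values on the partitioning attributes.
    for row, ts in sorted(stream, key=lambda pair: pair[1]):
        if ts <= t:
            substreams[tuple(row[k] for k in keys)].append(row)
    result: List[Row] = []
    # Apply a tuple-based window of size n to every substream, then union.
    for rows in substreams.values():
        result.extend(rows[-n:])
    return result


if __name__ == "__main__":
    S = [({"A": 1, "B": "x"}, 1), ({"A": 2, "B": "y"}, 2),
         ({"A": 1, "B": "z"}, 3), ({"A": 1, "B": "w"}, 4)]
    print(partitioned_window(S, ["A"], 2, 4))
```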

Relation-to-Stream Operators 3 Relation-to-stream operators: Istream (for Insert Stream) Dstream (for Delete Stream) Rstream (for Relation Stream)

R-to-S Operators IS: Istream applied to a relation R contains ⟨s, t⟩ whenever tuple s is in R(t) − R(t − 1) –i.e., whenever s is inserted into R at time t. DS: Dstream applied to a relation R contains ⟨s, t⟩ whenever tuple s is in R(t − 1) − R(t) –i.e., whenever s is deleted from R at time t. RS: Rstream applied to a relation R contains ⟨s, t⟩ whenever tuple s is in R(t) –i.e., every current tuple in R is streamed at every time instant.
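
The three relation-to-stream operators reduce to bag differences, which Python's collections.Counter supports directly. The sketch below assumes a relation is given as a dict from time to the list of tuples in R(t), and that R(t) is empty for any time missing from the dict (including the time before the first instant); the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

# Assumed representation: relation[t] is the bag R(t), stored as a list.
Relation = Dict[int, List[Hashable]]


def _bag(relation: Relation, t: int) -> Counter:
    # Missing instants are treated as the empty bag (an assumption).
    return Counter(relation.get(t, []))


def istream(relation: Relation, t: int) -> List[Tuple[Hashable, int]]:
    """Elements <s, t> for every s in R(t) - R(t - 1), as a bag difference."""
    inserted = _bag(relation, t) - _bag(relation, t - 1)
    return [(s, t) for s in inserted.elements()]


def dstream(relation: Relation, t: int) -> List[Tuple[Hashable, int]]:
    """Elements <s, t> for every s in R(t - 1) - R(t), as a bag difference."""
    deleted = _bag(relation, t - 1) - _bag(relation, t)
    return [(s, t) for s in deleted.elements()]


def rstream(relation: Relation, t: int) -> List[Tuple[Hashable, int]]:
    """Elements <s, t> for every s currently in R(t)."""
    return [(s, t) for s in relation.get(t, [])]


if __name__ == "__main__":
    R = {0: ["a"], 1: ["a", "b", "b"], 2: ["b"]}
    print(istream(R, 1), dstream(R, 2), rstream(R, 2))
```

Counter subtraction drops non-positive counts, which matches bag difference: a tuple present twice at t − 1 and once at t is reported as deleted once.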

Example 1 CQL Query Select Istream(*) From S [Rows Unbounded] Where S.A > 10 – S[Rows Unbounded] (stream-to-relation) – S.A > 10 (relation-to-relation) – IStream(*) (relation-to-stream)
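
The three steps of Example 1 can be chained in a small sketch. It assumes tuples have a single attribute A, that the stream is a list of (tuple, timestamp) pairs with integer timestamps, and that the relation is empty before the first instant; Row, unbounded_window, select_a_gt_10, istream_at and example_1 are all hypothetical names, not STREAM functions.

```python
from collections import Counter, namedtuple

# Hypothetical tuple type with the single attribute A used in the query.
Row = namedtuple("Row", ["A"])


def unbounded_window(stream, t):
    """S [Rows Unbounded]: every tuple that has arrived by time t."""
    return [s for (s, ts) in stream if ts <= t]


def select_a_gt_10(tuples):
    """Where S.A > 10, applied to an instantaneous relation."""
    return [s for s in tuples if s.A > 10]


def istream_at(relation_at, t):
    """Istream: bag difference R(t) - R(t - 1), stamped with t."""
    added = Counter(relation_at(t)) - Counter(relation_at(t - 1))
    return [(s, t) for s in added.elements()]


def example_1(stream, t):
    """Output of the whole query at time t."""
    return istream_at(lambda u: select_a_gt_10(unbounded_window(stream, u)), t)


if __name__ == "__main__":
    S = [(Row(5), 1), (Row(12), 2), (Row(20), 2), (Row(11), 4)]
    for t in range(1, 5):
        print(t, example_1(S, t))
```

Under these assumptions the query acts as a plain filter: each tuple with A > 10 appears in the output exactly once, at its own arrival timestamp.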

Example 2 CQL Query Select * From S1 [Rows 1000], S2 [Range 2 Minutes] Where S1.A = S2.A And S1.A > 10 – S1 [Rows 1000] (Stream-to-Relation) – S2 [Range 2 Minutes] (Stream-to-Relation) – S1.A = S2.A (Relation-to-Relation) – S1.A > 10 (Relation-to-Relation) – * (Relation-to-Relation)

Example 3 CQL Query Select Rstream(S.A, R.B) From S [Now], R Where S.A = R.A –S[Now] (Stream-To-Relation) –R (Stream-To-Relation) assumes [Rows Unbounded] –S.A = R.A (Relation-To-Relation) –RStream(S.A, R.B) (Relation-To-Stream)

Query Plans & Execution When a continuous query is to be executed within STREAM, a query plan is compiled from it. Query plans are composed of: –Operators (to perform the actual processing) –Queues (buffer tuples as they move between operators) –Synopses (which I will not discuss)

Operators In order to allow processing, each timestamped tuple is additionally flagged for 'insertion' or 'deletion' (+ or −). Streams only include + elements, while relations may include both + and − elements. Each query plan operator reads from one or more input queues, processes the input based on its semantics, and writes any output to an output queue.

Queues A queue in a query plan connects its "producing" plan operator OP to its "consuming" operator OC. At any time a queue contains a (possibly empty) collection of elements representing a portion of a stream or relation. Furthermore, the system requires all queues to enforce non-decreasing timestamps, to allow for all possible operations. (Very Important)
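
The non-decreasing timestamp requirement can be illustrated with a small queue wrapper. This is a sketch only: the class PlanQueue, its method names, and the choice to raise ValueError on a violation are assumptions, not STREAM's actual interface. Elements are (tuple, timestamp, sign) triples, with sign "+" for insertion and "-" for deletion.

```python
from collections import deque


class PlanQueue:
    """FIFO buffer between a producing and a consuming plan operator."""

    def __init__(self):
        self._items = deque()
        self._last_ts = None  # timestamp of the most recently enqueued element

    def enqueue(self, tup, ts, sign):
        if sign not in ("+", "-"):
            raise ValueError("sign must be '+' or '-'")
        # Reject any element that would make timestamps decrease.
        if self._last_ts is not None and ts < self._last_ts:
            raise ValueError("timestamps must be non-decreasing")
        self._items.append((tup, ts, sign))
        self._last_ts = ts

    def dequeue(self):
        # Returns None when the queue is empty (a choice made for this sketch).
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    q = PlanQueue()
    q.enqueue("a", 1, "+")
    q.enqueue("b", 1, "+")  # equal timestamps are allowed
    print(q.dequeue(), len(q))
```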

Queue Diagram

Query Plan (Example 2) Select * From S1 [Rows 1000], S2 [Range 2 Minutes] Where S1.A = S2.A And S1.A > 10

Query Plan (Example 1) Select Istream(*) From S [Rows Unbounded] Where S.A > 10 Plan, from input to output: Q1 (stream queue) → SW (seq-window; all of Q1 copied) → Q2 (relation) → Sel (select on S.A > 10) → Q3 (relation) → I-S (relation-to-stream) → Q4 (stream)

Query Plan (Example 3) Select Rstream(S.A, R.B) From S [Now], R Where S.A = R.A

Query Plan Scheduling When a query plan is executed, a scheduler selects operators in the plan to execute in turn. The semantics of each operator depends only on the timestamps of the elements it processes, not on system or "wall-clock" time. Thus, the order of execution has no effect on the data in the query result, although it can affect other properties such as latency and resource utilization.

Execution Example The first seq-window (now just called SW1) reads (s,r,+). SW1 stores the tuple in its own buffer. If the buffer then holds more than 1000 elements, it removes the oldest element, called s'. SW1 writes (s,r,+) and (s',r,-) to q3. SW2 works similarly. Binary-Join (now called BJ3) reads (s,r,+) from q3, stores it in buffer 1, joins the tuple with all elements of buffer 2, and outputs (st,r,+) for each t in buffer 2.

Execution (Part 2) BJ3 processes the elements of all its input queues in non-decreasing timestamp order. The Select Operator simply checks its input elements against its predicate and outputs those that pass.
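
The steps on the two execution slides can be sketched as two small operator classes. Everything here is illustrative: SeqWindow, BinaryJoin and their process methods are hypothetical names, tuples are modelled as Python tuples so that joined results are concatenations, and the handling of "-" elements in the join (remove from the buffer, emit matching "-" results) is an assumption the slides do not spell out.

```python
from collections import deque


class SeqWindow:
    """[Rows N] window that turns stream elements into +/- relation elements."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()

    def process(self, tup, ts):
        out = [(tup, ts, "+")]
        self.buffer.append(tup)
        # Once the buffer overflows, expire the oldest tuple with a "-" element.
        if len(self.buffer) > self.capacity:
            oldest = self.buffer.popleft()
            out.append((oldest, ts, "-"))
        return out


class BinaryJoin:
    """Join that keeps one buffer per input and probes the opposite buffer."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.buffers = ([], [])

    def process(self, side, element):
        tup, ts, sign = element
        mine, other = self.buffers[side], self.buffers[1 - side]
        if sign == "+":
            mine.append(tup)
        else:
            mine.remove(tup)  # assumed handling of deletions
        # Concatenate in (left, right) order regardless of the arriving side.
        if side == 0:
            return [(tup + o, ts, sign) for o in other if self.predicate(tup, o)]
        return [(o + tup, ts, sign) for o in other if self.predicate(o, tup)]


if __name__ == "__main__":
    sw = SeqWindow(capacity=2)
    for ts, value in enumerate([(1,), (2,), (3,)], start=1):
        print(sw.process(value, ts))
```

The caller is responsible for feeding BinaryJoin elements from both inputs in non-decreasing timestamp order; the sketch does not enforce it.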