STREAM The Stanford Data Stream Management System.


1 STREAM The Stanford Data Stream Management System

2 Presentation Structure Introduction CQL: Continuous Query Language –Abstract Semantics –Data Types –Operators Query Plan & Execution

3 Introduction The system is designed for limited resource environments where streams may be rapid, and query loads may vary over time.

4 CQL: Continuous Query Language For simple continuous queries over streams, it can be sufficient to use a relational query language such as SQL. However, for more complex queries the intended meaning can quickly become unclear.

5 Abstract Semantics 2 Data Types: – Streams – Relations, both defined on a discrete, ordered time domain T. 3 Types of Operators

6 Streams A stream S is an unbounded bag (multiset) of pairs ⟨s, t⟩, where s is a tuple and t ∈ T is the timestamp that denotes the logical arrival time of tuple s on stream S.

7 Relations A relation R is a time-varying bag of tuples. The bag of tuples at time t is denoted R(t), and we call R(t) an instantaneous relation. Note that tuples in R(t) have no time-stamp.

8 Operator Diagram

9 Operator Classes A relation-to-relation operator takes one or more relations as input and produces a relation as output. A stream-to-relation operator takes a stream as input and produces a relation as output. A relation-to-stream operator takes a relation as input and produces a stream as output. Stream-to-stream operators are absent; they are composed from operators of the above classes.

10 Query Structure A continuous query Q is a tree of operators belonging to the aforementioned classes. The inputs of Q are the streams and relations that are input to the leaf operators. The output of Q is the output of the root operator. The output is either a stream or a relation, depending on the class of the root operator.

11 Output Timestamp At time t, an operator of Q logically depends on its inputs up to t. The operator therefore produces new outputs corresponding to t: the tuples of S with timestamp t if the output is a stream S, or the instantaneous relation R(t) if the output is a relation R.

12 Relation-to-Relation Operators CQL uses SQL constructs to express its relation-to-relation operators i.e. SELECT... FROM …

13 Class Operator Diagram
Name                 Operator Type          Description
select               relation-to-relation   Filters elements based on predicate(s)
project              relation-to-relation   Duplicate-Preserving Projection
binary-join          relation-to-relation   Joins two input relations
mjoin                relation-to-relation   Multiway join from [22]
union                relation-to-relation   Bag Union
except               relation-to-relation   Bag Difference
intersect            relation-to-relation   Bag Intersection
antisemijoin         relation-to-relation   Antisemijoin of two input relations
aggregate            relation-to-relation   Performs grouping and aggregation
duplicate-eliminate  relation-to-relation   Performs duplicate elimination

14 Stream-to-Relation Operators Based on a sliding window principle. 3 Types of Windows: –Tuple-based window –Time-based window –Partitioned Window

15 Tuple-based Window A tuple-based sliding window on a stream S takes an integer N > 0 as a parameter and produces a relation R. At time t, R(t) contains the N tuples of S with the largest timestamps ≤ t. Example: R(14) [Rows 5]
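The tuple-based window above can be sketched in a few lines of Python. This is only an illustration, not STREAM's implementation: the function name rows_window and the representation of a stream as a list of (tuple, timestamp) pairs are assumptions made for the example, and ties between equal timestamps are broken by arrival order.

```python
from typing import Any, List, Tuple

# Assumption: a stream is a list of (tuple, timestamp) pairs.
StreamElement = Tuple[Any, int]


def rows_window(stream: List[StreamElement], n: int, t: int) -> List[Any]:
    """Return R(t) for S [Rows n]: the n tuples with the largest timestamps <= t."""
    if n <= 0:
        raise ValueError("window size N must be positive")
    # Only elements that have logically arrived by time t are visible.
    eligible = [e for e in stream if e[1] <= t]
    # sorted() is stable, so elements with equal timestamps keep arrival order.
    eligible = sorted(eligible, key=lambda e: e[1])
    # The instantaneous relation carries no timestamps, only the tuples.
    return [s for s, _ in eligible[-n:]]


example_stream = [("a", 1), ("b", 3), ("c", 5), ("d", 8),
                  ("e", 9), ("f", 12), ("g", 14), ("h", 20)]
r_14 = rows_window(example_stream, 5, 14)  # R(14) for [Rows 5]
```

With this hypothetical input, r_14 holds the five most recent tuples that arrived no later than time 14.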

16 Time-based window A time-based sliding window on a stream S takes a time interval w as a parameter and produces a relation R. At time t, R(t) contains all tuples of S with timestamps between t-w and t. Example: R(9) [Range 4]
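A time-based window can be sketched the same way. The function name range_window and the list-of-pairs stream representation are assumptions for illustration, and the sketch assumes both ends of the interval [t − w, t] are inclusive, which the slide does not state explicitly.

```python
from typing import Any, List, Tuple

# Assumption: a stream is a list of (tuple, timestamp) pairs.
StreamElement = Tuple[Any, int]


def range_window(stream: List[StreamElement], w: int, t: int) -> List[Any]:
    """Return R(t) for S [Range w]: tuples with timestamps in [t - w, t]."""
    if w < 0:
        raise ValueError("window length w must be non-negative")
    # Assumption: both interval bounds are inclusive.
    return [s for s, ts in stream if t - w <= ts <= t]


example_stream = [("a", 1), ("b", 3), ("c", 5), ("d", 8), ("e", 9), ("f", 12)]
r_9 = range_window(example_stream, 4, 9)  # R(9) for [Range 4]
```

Note that w = 0 yields only the tuples stamped exactly t, which matches the behaviour usually associated with the [Now] window.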

17 Partitioned Window A partitioned sliding window on a stream S takes an integer N and a set of attributes {A1,..., Ak} of S as parameters, and is specified by following S with [Partition By A1,...,Ak Rows N]. It logically partitions S into different substreams based on equality of the attributes A1,...,Ak. HINT: Rows N is then applied as a tuple-based window on each substream.
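The partitioned window can be sketched as one tuple-based window per substream. In this illustration, tuples are modelled as Python dicts keyed by attribute name; that representation and the function name partitioned_window are assumptions, not part of CQL.

```python
from collections import deque
from typing import Any, Dict, List, Sequence, Tuple

# Assumption: a stream is a list of (tuple, timestamp) pairs and each
# tuple is a dict mapping attribute names to values.
Row = Dict[str, Any]


def partitioned_window(stream: List[Tuple[Row, int]], attrs: Sequence[str],
                       n: int, t: int) -> List[Row]:
    """Return R(t) for S [Partition By attrs Rows n]."""
    if n <= 0:
        raise ValueError("window size N must be positive")
    partitions: Dict[tuple, deque] = {}
    for row, ts in sorted(stream, key=lambda e: e[1]):
        if ts > t:
            break  # later elements have not logically arrived yet
        key = tuple(row[a] for a in attrs)
        # A deque with maxlen=n silently drops the oldest row of the
        # substream, which is the [Rows n] behaviour per partition.
        partitions.setdefault(key, deque(maxlen=n)).append(row)
    # The result is the union of the per-substream windows.
    return [row for part in partitions.values() for row in part]


example_stream = [({"A": 1, "B": "x"}, 1), ({"A": 2, "B": "y"}, 2),
                  ({"A": 1, "B": "z"}, 3), ({"A": 1, "B": "w"}, 4)]
r_4 = partitioned_window(example_stream, ["A"], 2, 4)
```

Here the substream with A = 1 keeps only its two most recent rows, while the substream with A = 2 keeps its single row.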

18 Relation-to-Stream Operators 3 Relation-to-stream operators: Istream (for Insert Stream) Dstream (for Delete Stream) Rstream (for Relation Stream)

19 R-to-S Operators IS: Applied to a relation R, contains ⟨s, t⟩ whenever tuple s is in R(t) − R(t − 1) – i.e., whenever s is inserted into R at time t. DS: Applied to a relation R, contains ⟨s, t⟩ whenever tuple s is in R(t − 1) − R(t) – i.e., whenever s is deleted from R at time t. RS: Applied to a relation R, contains ⟨s, t⟩ whenever tuple s is in R(t) – i.e., every current tuple in R is streamed at every time instant.
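The three relation-to-stream operators reduce to bag differences between consecutive instantaneous relations. The sketch below uses collections.Counter as the bag type; the helper names and the representation of a relation as a dict from time instant to a list of tuples are assumptions for illustration, and an instant with no entry is treated as an empty relation.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

# Assumption: a relation maps each time instant t to the bag R(t),
# given as a list of hashable tuples. Missing instants mean R(t) is empty.
Relation = Dict[int, List[Hashable]]


def _bag(relation: Relation, t: int) -> Counter:
    return Counter(relation.get(t, []))


def istream(relation: Relation, t: int) -> List[Tuple[Hashable, int]]:
    """Elements (s, t) for every s in R(t) - R(t-1), i.e. insertions at t."""
    diff = _bag(relation, t) - _bag(relation, t - 1)  # bag difference
    return [(s, t) for s in sorted(diff.elements())]


def dstream(relation: Relation, t: int) -> List[Tuple[Hashable, int]]:
    """Elements (s, t) for every s in R(t-1) - R(t), i.e. deletions at t."""
    diff = _bag(relation, t - 1) - _bag(relation, t)
    return [(s, t) for s in sorted(diff.elements())]


def rstream(relation: Relation, t: int) -> List[Tuple[Hashable, int]]:
    """Elements (s, t) for every s currently in R(t)."""
    return [(s, t) for s in sorted(_bag(relation, t).elements())]


example_relation = {0: ["a"], 1: ["a", "b"], 2: ["b", "b"]}
inserted_at_2 = istream(example_relation, 2)
deleted_at_2 = dstream(example_relation, 2)
current_at_2 = rstream(example_relation, 2)
```

Counter subtraction discards non-positive counts, so it behaves as multiset difference: going from {a, b} to {b, b} inserts one b and deletes one a.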

20 Example 1 CQL Query Select Istream(*) From S [Rows Unbounded] Where S.A > 10 – S[Rows Unbounded] (stream-to-relation) – S.A > 10 (relation-to-relation) – IStream(*) (relation-to-stream)
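The three steps of Example 1 can be chained in a small sketch that evaluates the query over consecutive time instants. All function names here are hypothetical, tuples are modelled as one-element Python tuples holding attribute A, and the relation before the first evaluated instant is assumed to be empty.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[int]                 # assumption: a tuple with the single attribute A
StreamElement = Tuple[Row, int]  # (tuple, timestamp)


def unbounded_window(stream: List[StreamElement], t: int) -> List[Row]:
    """Stream-to-relation: S [Rows Unbounded] at time t."""
    return [s for s, ts in stream if ts <= t]


def select_a_gt_10(relation: List[Row]) -> List[Row]:
    """Relation-to-relation: Where S.A > 10."""
    return [s for s in relation if s[0] > 10]


def istream(curr: List[Row], prev: List[Row], t: int) -> List[StreamElement]:
    """Relation-to-stream: tuples in R(t) - R(t-1), stamped with t."""
    diff = Counter(curr) - Counter(prev)
    return [(s, t) for s in sorted(diff.elements())]


def run_example_1(stream: List[StreamElement], times) -> List[StreamElement]:
    """Evaluate the query at each instant in `times` (consecutive instants)."""
    output: List[StreamElement] = []
    prev: List[Row] = []
    for t in times:
        curr = select_a_gt_10(unbounded_window(stream, t))
        output.extend(istream(curr, prev, t))
        prev = curr
    return output


example_stream = [((5,), 1), ((12,), 2), ((30,), 2), ((7,), 3), ((12,), 4)]
result = run_example_1(example_stream, range(1, 5))
```

Under these assumptions the query simply forwards each arriving tuple with A > 10 at its own timestamp, which is the stream-filter behaviour Example 1 is meant to express.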

21 Example 2 CQL Query Select * From S1 [Rows 1000], S2 [Range 2 Minutes] Where S1.A = S2.A And S1.A > 10 – S1 [Rows 1000] (Stream-to-Relation) – S2 [Range 2 Minutes] (Stream-to-Relation) – S1.A = S2.A (Relation-to-Relation) – S1.A > 10 (Relation-to-Relation) – * (Relation-to-Relation)

22 Example 3 CQL Query Select Rstream(S.A, R.B) From S [Now], R Where S.A = R.A –S[Now] (Stream-To-Relation) –R (Stream-To-Relation) assumes [Rows Unbounded] –S.A = R.A (Relation-To-Relation) –RStream(S.A, R.B) (Relation-To-Stream)

23 Query Plans & Execution When a continuous query is to be executed within STREAM, a query plan is compiled from it. Query plans are composed of: –Operators (to perform the actual processing) –Queues (buffer tuples as they move between operators) –Synopses (which I will not discuss)

24 Operators In order to allow processing, each timestamped tuple is additionally flagged for 'insertion' or 'deletion' (+ or −). Streams only include + elements, while relations may include both + and − elements. Each query plan operator reads from one or more input queues, processes the input based on its semantics, and writes any output to an output queue.

25 Queues A queue in a query plan connects its “producing” plan operator OP to its “consuming” operator OC. At any time a queue contains a (possibly empty) collection of elements representing a portion of a stream or relation. Furthermore, the system requires all queues to enforce non-decreasing timestamps, to allow for all possible operations. (Very Important)
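The non-decreasing timestamp requirement can be illustrated with a small queue that rejects out-of-order elements. The class name PlanQueue and its methods are hypothetical and are not taken from the STREAM code base; elements are modelled as (tuple, timestamp, sign) triples.

```python
from collections import deque
from typing import Any, Optional, Tuple

Element = Tuple[Any, int, str]  # (tuple, timestamp, "+" or "-")


class PlanQueue:
    """FIFO buffer between a producing and a consuming operator."""

    def __init__(self) -> None:
        self._elements: deque = deque()
        self._last_ts: Optional[int] = None

    def enqueue(self, tup: Any, ts: int, sign: str = "+") -> None:
        if sign not in ("+", "-"):
            raise ValueError("sign must be '+' or '-'")
        # Enforce non-decreasing timestamps: equal timestamps are allowed,
        # but an element may never be older than the previous one.
        if self._last_ts is not None and ts < self._last_ts:
            raise ValueError(f"timestamp {ts} is older than {self._last_ts}")
        self._elements.append((tup, ts, sign))
        self._last_ts = ts

    def dequeue(self) -> Optional[Element]:
        """Remove and return the oldest element, or None if the queue is empty."""
        return self._elements.popleft() if self._elements else None

    def __len__(self) -> int:
        return len(self._elements)


q = PlanQueue()
q.enqueue("a", 1)
q.enqueue("b", 1)       # same timestamp is fine
q.enqueue("c", 3, "-")  # a deletion element
try:
    q.enqueue("d", 2)   # out of order
    rejected = False
except ValueError:
    rejected = True
```

Rejecting the element is only one possible policy; the point of the sketch is that a consumer reading such a queue can rely on seeing timestamps in order.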

26 Queue Diagram

27 Query Plan (Example 2) Select * From S1 [Rows 1000], S2 [Range 2 Minutes] Where S1.A = S2.A And S1.A > 10

28 Query Plan (Example 1) Select Istream(*) From S [Rows Unbounded] Where S.A > 10 Q1: Stream Queue SW: all of Q1 copied Q2: Relation Sel: on S.A > 10 Q3: Relation I-S: R-to-S Q4: Stream

29 Query Plan (Example 3) Select Rstream(S.A, R.B) From S [Now], R Where S.A = R.A

30 Query Plan Scheduling When a query plan is executed, a scheduler selects operators in the plan to execute in turn. The semantics of each operator depends only on the timestamps of the elements it processes, not on system or “wall-clock” time. Thus, the order of execution has no effect on the data in the query result, although it can affect other properties such as latency and resource utilization.

31 Execution Example The first seq-window (now just called SW1) reads (s,r,+). SW1 stores the tuple in its own buffer. If the buffer then holds more than 1000 elements, it removes the oldest element, called s'. SW1 writes (s,r,+) and (s',r,-) to q3. SW2 works similarly. Binary-Join (now called BJ3) reads (s,r,+) from q3, stores it in buffer 1, joins the tuple with all elements of buffer 2, and outputs (st,r,+) for each t in buffer 2 that joins with s.
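The behaviour of SW1 and BJ3 described above can be sketched with two small operator classes. The class names, the process methods, and the handling of − elements in the join are assumptions added for illustration; the slide only describes the + case. Queues are left out, so each call returns the elements the operator would write to its output queue.

```python
from collections import deque
from typing import Any, Callable, List, Tuple

Element = Tuple[Any, int, str]  # (tuple, timestamp, "+" or "-")


class RowsWindowOp:
    """Sketch of a seq-window such as SW1 for S [Rows capacity]."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer: deque = deque()

    def process(self, s: Any, ts: int) -> List[Element]:
        out: List[Element] = [(s, ts, "+")]
        self.buffer.append(s)
        if len(self.buffer) > self.capacity:
            oldest = self.buffer.popleft()   # s' in the slide
            out.append((oldest, ts, "-"))    # it leaves the window at time ts
        return out


class BinaryJoinOp:
    """Sketch of a binary join such as BJ3 with one buffer per input."""

    def __init__(self, key: Callable[[Any], Any]) -> None:
        self.key = key            # assumption: equi-join on key(s) == key(t)
        self.buffers = ([], [])   # buffer 1 and buffer 2

    def process(self, side: int, s: Any, ts: int, sign: str) -> List[Element]:
        own, other = self.buffers[side], self.buffers[1 - side]
        if sign == "+":
            own.append(s)
        else:
            own.remove(s)  # assumption: a "-" element retracts its joins
        out: List[Element] = []
        for t in other:
            if self.key(s) == self.key(t):
                pair = (s, t) if side == 0 else (t, s)  # keep (left, right) order
                out.append((pair, ts, sign))
        return out


sw1 = RowsWindowOp(capacity=2)  # 2 instead of 1000 to keep the example short
window_out = [sw1.process(s, ts) for s, ts in [("a", 1), ("b", 2), ("c", 3)]]

bj3 = BinaryJoinOp(key=lambda row: row["A"])
first = bj3.process(0, {"A": 1}, 1, "+")
second = bj3.process(1, {"A": 1, "B": 9}, 2, "+")
```

With a capacity of 2, the third arrival pushes the oldest tuple out of the window, so SW1 emits both a + and a − element at that instant.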

32 Execution (Part 2) BJ3 processes the elements of all its input queues in non-decreasing timestamp order. The Select Operator simply checks its input elements against its predicate and outputs those that pass.

