Continuous Query Languages for DSMS

Continuous Query Languages for DSMS. Notes by Carlo Zaniolo, UCLA CSD, Spring 2009.

Relational Algebra Operators

DB Relations:
- Selection, Projection
- Union
- Join (including cross product ×) on tables
- Set Difference
- Aggregates: traditional blocking aggregates; OLAP functions on windows or unlimited preceding

Data Streams (DS):
- Selection, Projection: same (no duplicates eliminated)
- DS Union by sort-merging on timestamps
- Join of DS with table: OK
- Window joins on streams (timestamps merged into one column)
- DS difference not supported (blocking!); difference of stream with table OK
- Aggregates: no blocking aggregates; OLAP functions on windows or unlimited preceding, with slides and tumbles

Cascading of Streams

CREATE STREAM OpenAuction (itemID int, sellerID char(10), price real, start_time timestamp)
    ORDER BY start_time /* external timestamps */
    SOURCE 'port4445';

CREATE STREAM expensiveItems AS
    SELECT itemID, sellerID, price, start_time
    FROM OpenAuction
    WHERE price > 1000;

SELECT itemID, price, start_time
FROM expensiveItems
WHERE sellerID = 'ABCwarehouse';

[Figure: query graph from Source (port4445) to Sink through a selection σ and a selection-projection σπ, with the streams OpenAuction and ExpensiveItems.]

Continuous Query Graph: many components, arbitrary DAGs

[Figure: example query graphs built from sources, selections (σ), aggregates (∑1, ∑2), a join (⋈), a union (U), operators O1, O2, O3, and sinks.]

Data Stream of Bids on Bolts and Nuts

create stream bids(bid#, item, offer, Time)

% example of selection followed by union
create stream mybids as
    (select bid#, offer, Time from bids where item='bolt'
     union
     select bid#, offer, Time from bids where item='nut')

% Result same as:
select bid#, offer, Time from bids where item='bolt' or item='nut'

Window Joins

We could create a stream called interesting bids by, say, joining bids with the 'interesting_items' table. A window join relates stream tuples to each other. Example: for each bolt bid, find all the nut bids in the last 5 minutes.

create stream selfjoinbids as
    select S1.bid#, S1.offer, S2.bid#, S1.Time
    from bids as S1, bids as S2 [window of 5 minutes]
    where S1.item='bolt' and S2.item='nut' and S1.offer=S2.offer

- S1 only sees older S2 tuples in a window of 5 minutes: S1.Time >= S2.Time and S2.Time >= S1.Time - 5 minutes.
- These are logical windows; physical windows are defined by a tuple count.
- Symmetrically, a window can also be defined on S1.

Window Join of Stream A and Stream B

[Figure: query graph with SourceA, SourceB, operators op1 and op2, the join A ⋈ B, and Sink.]

The join is between stream A, with window W(A), and stream B, with window W(B). When tuples are present at both inputs and the timestamp of the A tuple is less than or equal to that of the B tuple, perform the following operations (symmetric operations are performed when the timestamp of the B tuple is less than or equal to that of the A tuple):
- Production: compute the join of the tuple in A with the tuples in W(B) and add the resulting tuples to the output buffer. These tuples take their timestamps from the tuple in A.
- Consumption: remove the current tuple in A from the input and add it to the window buffer W(A), from which the expired tuples are also removed.
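The production and consumption steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by any particular DSMS: the function name `window_join`, the `(timestamp, payload)` tuple layout, and the `predicate` parameter are all hypothetical, and the window is a time-based (logical) window of fixed width.

```python
from collections import deque

def window_join(a_tuples, b_tuples, window, predicate):
    """Symmetric window join over two timestamp-ordered inputs (a sketch).

    Each input tuple is (timestamp, payload). Returns the output tuples and
    whatever is left unprocessed on each input.
    """
    a_in, b_in = deque(a_tuples), deque(b_tuples)
    w_a, w_b = deque(), deque()  # window buffers W(A) and W(B)
    out = []
    # The operator only runs while tuples are present at both inputs.
    while a_in and b_in:
        from_a = a_in[0][0] <= b_in[0][0]
        src, own_w, other_w = (a_in, w_a, w_b) if from_a else (b_in, w_b, w_a)
        ts, payload = src.popleft()
        # Drop tuples of the opposite window that fell out of the time window.
        while other_w and other_w[0][0] < ts - window:
            other_w.popleft()
        # Production: join the current tuple with the opposite window;
        # results take their timestamp from the current tuple.
        for _, other in other_w:
            a_val, b_val = (payload, other) if from_a else (other, payload)
            if predicate(a_val, b_val):
                out.append((ts, a_val, b_val))
        # Consumption: move the current tuple into its own window buffer
        # and remove expired tuples from that buffer.
        own_w.append((ts, payload))
        while own_w and own_w[0][0] < ts - window:
            own_w.popleft()
    return out, list(a_in), list(b_in)

# Hypothetical data: equal payloads join if they are at most 5 time units apart.
results, rest_a, rest_b = window_join(
    [(1, "x"), (4, "y"), (10, "x")],
    [(2, "x"), (3, "y"), (12, "x")],
    window=5,
    predicate=lambda a, b: a == b,
)
```

In this run the tuple left on input B is not processed, because input A is empty: the operator cannot tell whether an A tuple with a smaller timestamp is still to come.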

Union: Merging tuples by timestamps

- The Union operator performs a merge operation on tuples sorted by their timestamps.
- The tuple with the smallest timestamp goes through first, so the output tuples are also sorted by timestamp.
- Tuples at a Union are subject to idle waiting: tuples on some input might be slow to arrive because of network traffic, operator scheduling, etc.
- When some input is empty, tuples on the other inputs have to wait.
- Input tuples idle-wait for future arrivals, which greatly increases query response time.

We use a union operator to explain the idle-waiting problem.

[Figure: query graph with Source1, Source2, σ, union U, and Sink.]

The Idle-Waiting Problem

[Figure, two snapshots: query graph with Source1, Source2, σ, ∑1, ∑2, union U, and Sink, and buffers A, B, C. First snapshot: buffered timestamps 1, 3 and 6. Second snapshot: buffer A is empty ("?") and timestamp 6 is still waiting.]

- Only timestamps are shown for the tuples in the buffers.
- The tuple with TS=1 goes through the union first, followed by the one with TS=3.
- The union produces tuples by increasing timestamp values.
- Nothing more is produced until there is a tuple in A: this is idle waiting.
- Idle waiting means poor response time, and also extra memory used.
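The behavior described on this slide can be reproduced with a minimal sketch of a timestamp-merging union. The function name `union_step` and the `(timestamp, payload)` layout are assumptions made for illustration; the only rule taken from the slides is that the smallest timestamp goes first and that nothing can be emitted while some input is empty.

```python
from collections import deque

def union_step(inputs):
    """Emit tuples from timestamp-ordered input buffers for as long as it is safe.

    `inputs` is a non-empty list of deques of (timestamp, payload) tuples.
    The loop stops, i.e. the operator idle-waits, as soon as any input is empty.
    """
    out = []
    while all(inputs):  # an empty deque is falsy
        # The buffer whose head has the smallest timestamp goes first.
        src = min(inputs, key=lambda buf: buf[0][0])
        out.append(src.popleft())
    return out

# Timestamps as in the slide: 1 and 3 on one input, 6 on the other.
buf_a = deque([(1, "a1"), (3, "a2")])
buf_other = deque([(6, "b1")])
emitted = union_step([buf_a, buf_other])
```

After the call, the tuples with timestamps 1 and 3 have been emitted, while the tuple with timestamp 6 stays in its buffer until something arrives in `buf_a`.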

Idle Waiting: Simultaneous Tuples

[Figure, two snapshots of the same query graph: first with buffered timestamps 1 and 4, then with buffer A empty ("?") and timestamp 4 waiting.]

- Only timestamps are shown for the tuples in the buffers.
- The tuple with TS=1 goes through the union first, followed by one with TS=4.
- There is no need to wait here: the remaining tuple carries the same timestamp (TS=4) as the last one that went through.
- Timestamp memory registers, which remember the last timestamp seen on each input, can solve that problem.

A General Solution for Idle Waiting

[Figure: the same query graph, with buffer A empty ("?") and timestamp 6 waiting at the union.]

To avoid idle waiting, we need to get values into A fast. How? By going back to ∑1, which checks B for tuples to be processed and sent to A. If B is empty, we go back one more operator, which processes the tuples in C. This process is called backtracking.

Other execution models, such as those used by other DSMSs, will not do. For example, under round robin a fixed execution order can take us to different components or different branches, whereas backtracking takes us back to the only buffers and operators that can unblock the idle waiting.

Yes, but what if the source buffer C is also empty? Then we need on-demand timestamp generation, with punctuation tuples to deliver the information: basically, these are tuples that carry only timing information. Punctuation tuples were also used to deliver end-of-input information for blocking operators.

Time-stamped Punctuation Marks

Heartbeats [Gigascope]: timestamps are generated periodically and sent out from the source.
- Effective but far from optimal: too few when needed, too many when not needed.
- The response-time improvement is limited by the heartbeat frequency.
- To obtain large improvements, the heartbeat tuples themselves bring a high overhead in both memory and CPU time.
- They are not on-demand, and therefore incur overhead even when no idle waiting is occurring.

On-demand generation [Stream Mill]:
- Avoids useless operations when there is no idle waiting.
- Sends the request to the right source nodes, those that can fix the idle waiting.
- Gives much lower response time and uses less memory, but an execution model capable of supporting backtracking is needed.
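As a rough illustration of how a punctuation tuple unblocks a union, the sketch below keeps one lower bound per input, playing the role of a timestamp memory register. Everything named here (`PUNCT`, `union_with_punctuation`, the `bounds` list) is hypothetical; the sketch assumes each input is timestamp-ordered and that a punctuation tuple with timestamp t promises that no later tuple on that input has a timestamp below t.

```python
from collections import deque

PUNCT = object()  # hypothetical payload marking a timing-only (punctuation) tuple

def union_with_punctuation(inputs, bounds):
    """Merge by timestamp, using per-input lower bounds to avoid idle waiting.

    inputs: list of deques of (timestamp, payload); bounds: list with one
    entry per input, the smallest timestamp that input can still deliver.
    """
    out = []
    while True:
        for i, buf in enumerate(inputs):
            # Absorb punctuation tuples: they only raise the bound.
            while buf and buf[0][1] is PUNCT:
                bounds[i] = max(bounds[i], buf.popleft()[0])
            if buf:
                bounds[i] = max(bounds[i], buf[0][0])
        heads = [(buf[0][0], i) for i, buf in enumerate(inputs) if buf]
        if not heads:
            return out
        ts, i = min(heads)
        # Emitting is safe only if no input can still deliver an earlier tuple.
        if any(bound < ts for bound in bounds):
            return out  # idle-wait
        out.append(inputs[i].popleft())

in_a = deque([(1, "a1"), (4, "a2")])
in_b = deque([(4, "b1"), (6, "b2")])
regs = [float("-inf"), float("-inf")]
first = union_with_punctuation([in_a, in_b], regs)   # b2 (TS=6) must wait
in_a.append((7, PUNCT))                              # timing information only
second = union_with_punctuation([in_a, in_b], regs)  # b2 is released
```

The first call lets the simultaneous tuple with TS=4 through without waiting, because the register for the first input already holds 4. The tuple with TS=6 is released only after the punctuation tuple raises that register to 7. Whether the punctuation comes from a periodic heartbeat or is requested on demand does not change this operator-side logic.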

Backtracking without Tears

A simple rule for Next Operator Selection (NOS), based on the input and output buffers:
- YIELD is true if the output buffer of the current operator contains some tuples.
- MORE is true if there are still tuples in the input buffer of the current operator.

NOS for Depth-First:
    [Forward:]   if YIELD then next := succ
    [Encore:]    else if MORE then next := self
    [Backtrack:] else next := pred

[Figure: query graph with Source, σ, ∑1, ∑2, and Sink; "?" marks an empty buffer.]

A General Model: Breadth/Depth First

A simple rule for Next Operator Selection (NOS), based on the input and output buffers:
- YIELD is true if the output buffer of the current operator contains some tuples.
- MORE is true if there are still tuples in the input buffer of the current operator.

NOS for Depth-First:
    [Forward:]   if YIELD then next := succ
    [Encore:]    else if MORE then next := self
    [Backtrack:] else next := pred

NOS for Breadth-First:
    [Encore:]    if MORE then next := self
    [Forward:]   else if YIELD then next := succ
    [Backtrack:] else next := pred

The DFS and BFS rules differ only in the order of the two condition checks for YIELD and MORE. Slight modifications can also support other strategies, such as round robin (RR).

[Figure: query graph with Source, σ, ∑1, ∑2, and Sink; "?" marks an empty buffer.]
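The two rule sets translate almost directly into code. The sketch below is illustrative only: the function name `next_operator`, the strategy strings, and the returned labels are hypothetical, chosen to mirror the `succ`, `self`, and `pred` names on the slide.

```python
def next_operator(has_yield, has_more, strategy="dfs"):
    """Pick the next operator to run from the state of the current one.

    has_yield: the current operator's output buffer contains tuples (YIELD).
    has_more:  the current operator's input buffer still has tuples (MORE).
    Returns "succ" (Forward), "self" (Encore), or "pred" (Backtrack).
    """
    if strategy == "dfs":
        # Depth-first: push produced tuples downstream before consuming more.
        checks = [(has_yield, "succ"), (has_more, "self")]
    elif strategy == "bfs":
        # Breadth-first: drain the input buffer before moving downstream.
        checks = [(has_more, "self"), (has_yield, "succ")]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    for condition, target in checks:
        if condition:
            return target
    return "pred"  # nothing to forward or consume: backtrack

# With both output and input available, the two strategies disagree.
dfs_choice = next_operator(True, True, "dfs")
bfs_choice = next_operator(True, True, "bfs")
```

Keeping the checks in an ordered list makes the point of the slide visible in the code: the strategies share the same conditions and differ only in the order in which they are tested.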

Timestamp Propagation by Special Arcs

Timestamps can be propagated back to the idle-waiting operators:
- by punctuation marks;
- by special arcs that connect the source to the idle-waiting operators, shown as dashed arcs in the Enhanced Query Graph (EQG).

It is important that timestamp propagation only occurs after unsuccessful backtracking, since we need to make sure that there are no tuples left in the intermediate non-IWP operators.

[Figure: Enhanced Query Graph with Source1, Source2, Source3, σ, ∑1, ∑2, union U, and Sink.]

Execution Model Benefits

- Simple and regular: the same basic cycle is shared by all strategies, with only the NOS rules being different.
- Amenable to an efficient implementation based on Deterministic Finite Automata (DFA).
- Optimization and scheduling flexibility: at run time, we can easily switch between policies, and different strategies can be used at the same time in different components.
- Highly reconfigurable at run time.

Next we will take a look at how it is implemented and how the reconfiguration happens.

Experiments: Timestamp Propagation

Setup: a union query with unmatched arrival rates at its inputs. These experiments are done with internal timestamps, where the ETS (Enabling TimeStamp) value is easy to generate. They use one union operator whose two inputs have very different arrival rates (50 tuples/sec and 0.05 tuples/sec).

- Periodic timestamp propagation reduces latency in proportion to the rate of the heartbeat. However, memory overhead increases when the heartbeat tuple rate is high.
- On-demand timestamp propagation reduces latency to very small values with no memory overhead.
- Not shown here: on-demand timestamp propagation performs very close to the optimal case of latent timestamps, where the input tuples go through the union operator as they arrive, without timestamp consideration (the timestamp is assigned upon arrival).

DFS vs. BFS

How DFS and BFS behave under different input burstiness:
- To show that our execution model supports different strategies, we compare DFS and BFS while introducing bursts of nearly simultaneous tuples.
- Both DFS and BFS show increased latency as burstiness increases, but BFS has a steeper increase, which is dictated by the nature of the strategies.

References

A. Arasu, S. Babu, and J. Widom. An abstract semantics and concrete language for continuous queries over streams and relations. Technical report, Stanford University, 2002.
C. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk, and O. Spatscheck. Gigascope: A stream database for network applications. In SIGMOD Conference, pages 647-651. ACM Press, 2003.
Jaewoo Kang, Jeffrey F. Naughton, and Stratis Viglas. Evaluating window joins over unbounded streams. In ICDE, pages 341-352, 2003.
Utkarsh Srivastava and J. Widom. Flexible time management in data stream systems. In PODS, pages 263-274, 2004.
Peter A. Tucker, David Maier, Tim Sheard, and Leonidas Fegaras. Exploiting punctuation semantics in continuous data streams. IEEE Trans. Knowl. Data Eng., 15(3):555-568, 2003.
Theodore Johnson et al. A heartbeat mechanism and its application in Gigascope. In VLDB, pages 1079-1088, 2005.
Yijian Bai, Hetal Thakkar, Haixun Wang, and Carlo Zaniolo. Optimizing timestamp management in data stream management systems. In ICDE, 2007.

Relational Algebra Operators

Stored data:
- Selection, Projection
- Union
- Join (including cross product ×) on tables
- Set Difference
- Aggregates: traditional blocking aggregates; OLAP functions on windows or unlimited preceding

Data Streams:
- Selection, Projection: same
- Union by sort-merging on timestamps
- Join of stream with table
- Window joins on streams (timestamps merged into one column)
- No stream difference (blocking); difference of stream with table OK
- Aggregates: no blocking aggregates; OLAP functions on windows or unlimited preceding, with slides and tumbles, including UDAs