Avoiding Idle Waiting in the execution of Continuous Queries Carlo Zaniolo CSD CS240B Notes April 2008.


1 Avoiding Idle Waiting in the execution of Continuous Queries Carlo Zaniolo CSD CS240B Notes April 2008

2 Continuous Query Optimization
- Global optimization objectives can change during execution, e.g., from
  - Response time minimization (typical), to
  - Memory minimization, when memory becomes the critical resource
- A change in optimization objective might cause re-partitioning of the query graph (chain)
- Addition and deletion of continuous queries also change the topology of the execution graph: agile restructuring using a DFA model
- Local (operator-specific) optimization: union and join operators have a potential idle-waiting problem
- Special execution models (backtracking) and timestamp management techniques are needed to minimize response time and memory usage

3 The Query Graph
- The query graph is a DAG
- Nodes represent operators
  - Selection, projection, union, window-join, aggregates, etc.
- Thick edges represent inter-connecting physical buffers
- The DAG consists of a set of strong components
  - Strong components are the units for scheduling
- A complex DAG may need to be partitioned, based on the optimization goal (global optimization)
- For unions and joins (and also slides on logical windows), local optimization is needed to avoid idle waiting.
[Figure: two example query graphs. One has the nodes Source, σ, ∑1, ∑2, Sink; the other has Source1, Source2, a union (U), σ, σ, ∑1, ∑2, Sink.]

4 The Idle-Waiting Problem for Union (Joins have the same problem)
- The union operator performs a sort-merge operation
  - The tuple with the smallest timestamp goes through first
  - Output tuples stay sorted by timestamp
- Tuples at a union are subject to idle waiting, a short-term blocking behavior
  - Due to network traffic and operator scheduling, the timestamps of tuples across multiple inputs may be skewed: earlier tuples on one input may arrive after later tuples on another input.
  - When one input is empty, tuples on the other inputs have to wait
  - Input tuples idle-wait for future arrivals, greatly increasing query response time
[Figure: query graph with Source1, Source2, a union (U), σ, Sink]
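The blocking behavior described on this slide can be illustrated with a minimal sketch. It assumes each input buffer is a queue of timestamps already in arrival order; the function name `union_step` and the sample buffers are hypothetical and are not taken from Stream Mill.

```python
from collections import deque

def union_step(inputs):
    """Try to emit one tuple from a timestamp-ordered union.

    `inputs` is a list of deques holding timestamps in arrival order.
    Returns the emitted timestamp, or None when the union must idle-wait.
    """
    # The union may only emit when every input has a tuple at its head;
    # otherwise a smaller timestamp could still arrive on the empty input.
    if any(len(buf) == 0 for buf in inputs):
        return None
    # Sort-merge step: release the head tuple with the smallest timestamp.
    smallest = min(inputs, key=lambda buf: buf[0])
    return smallest.popleft()

# Hypothetical buffers: one input holds timestamps 1 and 3, the other holds 6.
a = deque([1, 3])
b = deque([6])
emitted = []
while (ts := union_step([a, b])) is not None:
    emitted.append(ts)

# Once the first input drains, the tuple with timestamp 6 is stuck.
print(emitted, list(a), list(b))
```

In this sketch the tuple with timestamp 6 stays buffered until something arrives on the empty input, which is the idle-waiting situation.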

5 The Idle-Waiting Problem
- Only the timestamps of tuples are shown in the buffers
- The tuple with TS=1 goes through the union first, followed by the one with TS=3
- The union produces tuples by increasing timestamp values
- Nothing is produced until there is a tuple in A: idle waiting
- Idle waiting: poor response time, and also extra memory used
[Figure: query graph with Source1, Source2, a union (U), σ, ∑1, ∑2, Sink, and buffers labeled A, B, C. Two frames: one with timestamps 1, 3, and 6 in the buffers, one in which only the tuple with timestamp 6 remains.]

6 Solving the Idle-Waiting Problem
- To avoid idle waiting, we need to get values into A fast. How?
- By going back to ∑1, which checks B for tuples to be processed and sent to A. If B is empty, we go back to the preceding operator, which processes the tuples in C. This process is called backtracking.
- Other execution models, such as those used by other DSMSs, will not do. E.g., with round robin, a fixed execution order can take us to different components or different branches.
- Backtracking takes us back to the only buffers and operators that can unblock the idle waiting.
- Yes, but what if the source buffer C is empty? On-demand timestamp generation!
[Figure: query graph with Source1, Source2, a union (U), σ, ∑1, ∑2, Sink, and buffers labeled A, B, C; only the tuple with timestamp 6 is waiting at the union.]

7 Time-stamped Punctuation Marks
- Heartbeats: timestamps are generated and sent out from the source.
  - Periodically, as in Gigascope
    - Effective but far from optimal: too few when needed, too many when not needed
  - On demand, as in Stream Mill
    - Avoids useless operations when there is no idle waiting
    - Sends requests to the right source nodes that can fix the idle waiting
    - Much lower response time and less memory, but an execution model capable of supporting backtracking is needed for that
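How a timestamp-only punctuation mark can unblock a waiting union is sketched below. The sketch assumes that a punctuation with timestamp t on an input promises that no later tuple on that input carries a smaller timestamp; the class `PunctuatedUnion` and its methods are hypothetical names for illustration, not the Stream Mill or Gigascope API.

```python
from collections import deque

class PunctuatedUnion:
    """Timestamp-ordered union that accepts punctuation marks (a sketch)."""

    def __init__(self, n_inputs):
        self.buffers = [deque() for _ in range(n_inputs)]
        # Lower bound on the timestamp of any future tuple on each input.
        self.bound = [float("-inf")] * n_inputs

    def push(self, i, ts):
        """A data tuple with timestamp `ts` arrives on input `i`."""
        self.buffers[i].append(ts)
        self.bound[i] = ts  # inputs are assumed ordered by timestamp

    def punctuate(self, i, ts):
        """A punctuation mark: input `i` will not deliver anything below `ts`."""
        self.bound[i] = max(self.bound[i], ts)

    def drain(self):
        """Emit every tuple that is provably safe to release."""
        out = []
        while True:
            ready = [i for i, buf in enumerate(self.buffers) if buf]
            if not ready:
                return out
            # Smallest timestamp each input could still contribute.
            limits = [buf[0] if buf else self.bound[i]
                      for i, buf in enumerate(self.buffers)]
            i = min(ready, key=lambda k: self.buffers[k][0])
            if self.buffers[i][0] > min(limits):
                return out  # an empty input might still deliver something earlier
            out.append(self.buffers[i].popleft())

u = PunctuatedUnion(2)
u.push(1, 6)
blocked = u.drain()    # input 0 is empty and unbounded: idle waiting
u.punctuate(0, 7)      # heartbeat: input 0 has nothing earlier than 7
released = u.drain()   # the tuple with timestamp 6 can now go through
print(blocked, released)
```

A periodic scheme would call `punctuate` on a timer regardless of need; an on-demand scheme would call it only when `drain` comes back empty while tuples are waiting.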

8 Time Stamps
- External: generated by the system or sensor producing the tuple, normally at a remote location, with delays in transmission.
  - Heartbeat mechanism to ensure synchronization.
- Internal: generated by the DSMS when the tuple arrives.
- Missing (actually latent): generated by the system when (and if) one is needed.

9 Backtracking without Tears
A simple rule for Next Operator Selection (NOS), based on the input and output buffers:
- YIELD is true if the output buffer of the current operator contains some tuples
- MORE is true if there are still tuples in the input buffer of the current operator
NOS for Depth-First:
- [Forward:] if YIELD then next := succ
- [Encore:] else if MORE then next := self
- [Backtrack:] else next := pred
[Figure: query graph with nodes Source, σ, ∑1, ∑2, Sink]
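The depth-first rule can be traced on a small example. The sketch below assumes a linear chain of operators, each a function that maps one tuple to one output tuple or to None (filtered out); the function `run_depth_first` and the two sample operators are hypothetical. The last buffer stands in for the sink, so the final operator never moves forward.

```python
from collections import deque

def run_depth_first(source, operators):
    """Run a linear operator chain under the depth-first NOS rule.

    buffers[i] is the input of operators[i]; buffers[i + 1] is its output.
    Returns the sink contents and the trace of visited operator indices.
    """
    buffers = [deque(source)] + [deque() for _ in operators]
    last = len(operators) - 1
    current, trace = 0, []
    while True:
        trace.append(current)
        inp, out = buffers[current], buffers[current + 1]
        if inp:
            result = operators[current](inp.popleft())
            if result is not None:
                out.append(result)
        has_yield = bool(out) and current != last  # sink output counts as delivered
        has_more = bool(inp)
        if has_yield:        # [Forward]
            current += 1
        elif has_more:       # [Encore]
            pass
        elif current > 0:    # [Backtrack]
            current -= 1
        else:                # backtracked to an empty source: nothing left to do
            return list(buffers[-1]), trace

# Hypothetical chain: keep even values, then scale them by 10.
keep_even = lambda x: x if x % 2 == 0 else None
times_ten = lambda x: x * 10
sink, trace = run_depth_first([1, 2, 3, 4], [keep_even, times_ten])
print(sink)   # [20, 40]
print(trace)  # [0, 0, 1, 0, 0, 1, 0]
```

Each tuple that survives the selection is pushed to the sink before the next source tuple is touched, and the executor returns to the first operator only when nothing further can be done downstream.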

10 A General Model: Breadth/Depth First
A simple rule for Next Operator Selection (NOS), based on the input and output buffers:
- YIELD is true if the output buffer of the current operator contains some tuples
- MORE is true if there are still tuples in the input buffer of the current operator
NOS for Depth-First:
- [Forward:] if YIELD then next := succ
- [Encore:] else if MORE then next := self
- [Backtrack:] else next := pred
NOS for Breadth-First:
- [Encore:] if MORE then next := self
- [Forward:] else if YIELD then next := succ
- [Backtrack:] else next := pred
[Figure: query graph with nodes Source, σ, ∑1, ∑2, Sink]
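The two rule sets differ only in the order in which YIELD and MORE are tested, which a small sketch makes explicit. The function name `next_operator` and the strategy strings are assumptions made for illustration.

```python
def next_operator(strategy, has_yield, has_more):
    """Return 'succ', 'self', or 'pred' according to the NOS rules.

    has_yield: the current operator's output buffer contains tuples (YIELD).
    has_more:  the current operator's input buffer contains tuples (MORE).
    """
    if strategy == "depth_first":
        # Forward first: push fresh output downstream as soon as it exists.
        if has_yield:
            return "succ"
        if has_more:
            return "self"
        return "pred"
    if strategy == "breadth_first":
        # Encore first: drain the whole input before moving downstream.
        if has_more:
            return "self"
        if has_yield:
            return "succ"
        return "pred"
    raise ValueError(f"unknown strategy: {strategy}")

# The strategies disagree only when both YIELD and MORE are true.
for flags in [(True, True), (True, False), (False, True), (False, False)]:
    print(flags, next_operator("depth_first", *flags),
          next_operator("breadth_first", *flags))
```

Because the rest of the execution cycle is unchanged, switching strategies amounts to passing a different `strategy` value.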

11 Timestamp Propagation by Special Arcs
- Timestamps can be propagated back to the idle-waiting operators
  - By punctuation marks
  - By special arcs that connect the source to idle-waiting operators
    - Shown as dashed arcs in the Enhanced Query Graph (EQG)
[Figure: Enhanced Query Graph with Source1, Source2, Source3, a union (U), σ, ∑1, ∑2, Sink]

12 Execution Model Benefits
- Simple and regular:
  - The same basic cycle is shared by all strategies, with only the NOS rules being different
  - Amenable to an efficient Deterministic Finite Automata (DFA) based implementation
- Optimization/scheduling flexibility:
  - At run time, we can easily switch between policies
  - Different strategies can be used at the same time in different components
- Highly reconfigurable
  - At run time.

13 Experiments – Timestamp Propagation
- Periodic timestamp propagation reduces latency in proportion to the rate of the heartbeat
  - However, memory overhead increases when the heartbeat tuple rate is high
- On-demand timestamp propagation reduces latency to very small values with no memory overhead

14 DFS vs. BFS
- How DFS and BFS behave under different input burstiness
  - We introduce bursts of nearly simultaneous tuples
- Both DFS and BFS show increased latency when burstiness increases, but BFS has a steeper increase

15 Conclusion
- A tuple-level execution model for DSMS that:
  - Supports different execution strategies with ease
  - Enables optimization of response time by
    - efficiently backtracking, and
    - propagating on-demand timestamp information.
  - Is amenable to a simple Deterministic Finite Automata (DFA) based implementation
  - Supports dynamic reconfiguration: we can re-partition the query graph for optimization, and add/delete operators, at run time with little overhead
  - Is the linchpin of the Stream Mill System
- Stream Mill System: http://wis.cs.ucla.edu

16 A Flexible Query Graph Based Model for the Efficient Execution of Continuous Queries
Yijian Bai, Hetal Thakkar, Haixun Wang* and Carlo Zaniolo
Department of Computer Science, University of California, Los Angeles
* IBM T.J. Watson

17 Time for Questions Thank You!

18 The Stream Mill System
- One server, multiple clients
- Server (on Linux) hosts the query language, manages storage, and schedules continuous queries
- Clients (Java-based GUI) allow the user to specify streams and queries, and to interact with the server

