State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries Song Wang Elke Rundensteiner Database Systems Research Group Worcester.

Presentation transcript:

State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries
Song Wang, Elke Rundensteiner: Database Systems Research Group, Worcester Polytechnic Institute, Worcester, MA, USA
Samrat Ganguly, Sudeept Bhatnagar: NEC Laboratories America Inc., Princeton, NJ, USA

32nd VLDB Conference, Seoul, Korea

Computation Sharing for Stream Processing
[Figure: continuous queries are registered into a shared SPJA query network (σ, П and Agg operators; windows w1, w2, w3) that turns streaming data into streaming results.]
New Challenges:
- In-memory processing of stateful operators
- Stateful operators with various window constraints

Window Constraints for Stateful Operators
- Time-based sliding window constraints
  - Each tuple has a timestamp
  - Only tuples whose timestamps lie within a time frame of W can form an output
[Figure: window join A ⋈ B with states A[w] and B[w], fed from Buffer A and Buffer B.]
Observations:
- States in the operator dominate memory usage
- State size is proportional to the input rate and the window length
- Join CPU cost is proportional to the state size
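The window constraint above can be sketched as a small symmetric join. This is only an illustration, not the operator used in the talk: the function name `window_join`, the `(stream, ts, key)` tuple layout, and the strict `< w` comparison are assumptions made for the example.

```python
from collections import deque


def window_join(events, w):
    """Symmetric sliding-window equi-join (illustrative sketch).

    events: timestamp-ordered iterable of (stream, ts, key), stream in {"A", "B"}.
    w: window length; a pair is output only if |Ta - Tb| < w (assumed strict).
    """
    state = {"A": deque(), "B": deque()}  # per-stream state, oldest tuple first
    out = []
    for stream, ts, key in events:
        other = "B" if stream == "A" else "A"
        # Purge: drop tuples of the opposite state that left the window.
        while state[other] and ts - state[other][0][0] >= w:
            state[other].popleft()
        # Probe: match the new tuple against the surviving opposite state.
        for other_ts, other_key in state[other]:
            if other_key == key:
                a_ts, b_ts = (ts, other_ts) if stream == "A" else (other_ts, ts)
                out.append((a_ts, b_ts, key))
        # Insert: remember the new tuple for later arrivals on the other stream.
        state[stream].append((ts, key))
    return out


events = [("A", 0, "x"), ("B", 1, "x"), ("A", 5, "x"), ("B", 6, "x")]
pairs = window_join(events, w=3)
```

Both the state held in the deques and the number of probe comparisons grow with the input rate times `w`, which is the point of the observations on this slide.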

A Motivation Example
Q1: SELECT A.* FROM Temperature A, Humidity B WHERE A.LocationId = B.LocationId WINDOW w1 min
Q2: SELECT A.* FROM Temperature A, Humidity B WHERE A.LocationId = B.LocationId AND A.Value > Threshold WINDOW w2 min
Let: w1 < w2
[Figure: plan for Q1 (A[w1] ⋈ B[w1]) and plan for Q2 (σA, then A[w2] ⋈ B[w2]).]
Observations:
- State A[w1] overlaps with state A[w2]
- State B[w1] overlaps with state B[w2]
- Joined results of Q1 and Q2 overlap

Sharing with Selection Pull-up [CDF02, HFA+03]
- Selection pull-up
- Using the larger window (w2)
[Figure: the separate plans for Q1 (A[w1] ⋈ B[w1]) and Q2 (σA, A[w2] ⋈ B[w2]) are combined into one shared join A[w2] ⋈ B[w2] followed by a router: results with |Ta - Tb| < w1 go to Q1, all results go through σA to Q2.]
[CDF02]: J. Chen, D. J. DeWitt, and J. F. Naughton. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In ICDE'02.
[HFA+03]: M. A. Hammad, M. J. Franklin, W. G. Aref, and A. K. Elmagarmid. Scheduling for shared window joins over data streams. In VLDB'03.
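The routing step of selection pull-up can be sketched as follows. The function `route_shared_results` and the `(a_ts, b_ts, a_value)` result layout are hypothetical names chosen for this example; the sketch assumes the shared join has already been evaluated with the larger window w2.

```python
def route_shared_results(joined, w1, threshold):
    """Route the output of one shared join (window w2) to Q1 and Q2.

    joined: iterable of (a_ts, b_ts, a_value) produced with the larger window w2.
    Returns (q1_results, q2_results).
    """
    q1, q2 = [], []
    for a_ts, b_ts, a_value in joined:
        # Router: Q1 only accepts pairs inside its smaller window w1.
        if abs(a_ts - b_ts) < w1:
            q1.append((a_ts, b_ts, a_value))
        # Pulled-up selection: Q2 filters on A.Value only after the join.
        if a_value > threshold:
            q2.append((a_ts, b_ts, a_value))
    return q1, q2


joined = [(0, 1, 5), (0, 4, 50), (3, 4, 50)]
q1, q2 = route_shared_results(joined, w1=2, threshold=10)
```

Every joined tuple is inspected by the router, and tuples that fail the Q2 selection were still joined and stored, which is the overhead this sharing strategy pays.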

Sharing with Selection Pull-up [CDF02, HFA+03]
Pros:
- Single join operator
Cons:
- Wasted computation without early filtering
- Wasted state memory without early filtering
- Per-output-tuple routing cost

Stream Partition with Selection Pushdown [KFH04]
- Split stream A by A.Value
- Route shared join results
[Figure: a split operator partitions stream A on A.Value (<= Threshold into A1, > Threshold into A2); join 1 uses states A[w1], B[w1] and join 2 uses states A[w2], B[w2]; a router sends results with |Ta - Tb| < w1 to a union feeding Q1 and all results to Q2.]
[KFH04]: S. Krishnamurthy, M. J. Franklin, J. M. Hellerstein, and G. Jacobson. The case for precision sharing. In VLDB'04.

Stream Partition with Selection Pushdown [KFH04]
Pros:
- Selection pushdown: no wasted join computation
Cons:
- Multiple join operators
- Duplicated state memory in multiple join operators
- Per-output-tuple routing cost

State-Slice: New Sharing Paradigm
Key Ideas:
- State-slice concept for sliding window join
- Pipelined chain of join slices
Prospective Benefits:
- Fine-grained selection push-down
- Pipelined join operators
- Avoiding per-tuple routing cost

One-way State-Sliced Window Join
[Figure: a one-way sliced join holding the state of stream A for [w1, w2]; inputs: A tuple and B tuple (probe); outputs: joined result, purged A tuple, propagated B tuple.]
- The sliding window has a lower bound: [w1, w2]
- A B tuple only probes A tuples that are older than itself by at least w1 but at most w2

The Chain of One-way State-Sliced Joins
- Split the state memory into a chain of joins
- No overlap of state memory in the chain of joins
[Figure: A and B tuples enter join J1 (state of stream A: [0, w1]); queue(s) connect J1 to join J2 (state of stream A: [w1, w2]); a union (U) of the outputs of J1 and J2 forms the joined result.]
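A one-way sliced join and its chain can be sketched as below. This is a simplified, synchronous illustration under stated assumptions: the names `OneWaySlice` and `run_chain` are hypothetical, the inter-operator queues are replaced by direct hand-over inside one loop, and the slice interval is treated as half-open, `w_start <= Tb - Ta < w_end`.

```python
from collections import deque


class OneWaySlice:
    """One-way sliced join A[w_start, w_end]: B tuples probe the A state."""

    def __init__(self, w_start, w_end):
        self.w_start, self.w_end = w_start, w_end
        self.state = deque()  # A tuples as (ts, key), oldest first

    def insert_a(self, a):
        self.state.append(a)

    def process_b(self, b):
        b_ts, b_key = b
        purged = []
        # Cross-purge: A tuples too old for this slice move on to the next one.
        while self.state and b_ts - self.state[0][0] >= self.w_end:
            purged.append(self.state.popleft())
        # Probe: keep matches with w_start <= Tb - Ta (the upper bound holds after purging).
        joined = [(a, b) for a in self.state
                  if a[1] == b_key and b_ts - a[0] >= self.w_start]
        return joined, purged


def run_chain(events, boundaries):
    """boundaries: [0, w1, ..., wN]; events: timestamp-ordered (stream, ts, key)."""
    slices = [OneWaySlice(lo, hi) for lo, hi in zip(boundaries, boundaries[1:])]
    results = [[] for _ in slices]  # results[i] holds the output of slice i
    for stream, ts, key in events:
        if stream == "A":
            slices[0].insert_a((ts, key))
            continue
        handed_over = []  # purged A tuples passed on from the previous slice
        for i, sl in enumerate(slices):
            for a in handed_over:
                sl.insert_a(a)
            joined, handed_over = sl.process_b((ts, key))
            results[i].extend(joined)
        # A tuples purged from the last slice are dropped.
    return results


events = [("A", 0, "x"), ("A", 1, "y"), ("B", 1, "x"),
          ("A", 3, "x"), ("B", 4, "x"), ("B", 6, "x")]
per_slice = run_chain(events, boundaries=[0, 2, 5])
```

In this sketch the answer for a query with window `w_k` is the union of `per_slice[0]` through `per_slice[k-1]`, and each A tuple lives in exactly one slice at a time, so no state is duplicated.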

From One-way to Two-way Binary Join
- Intuitively a combination of two one-way joins
- Two references for each A or B tuple
  - Male tuples are used to probe states
  - Female tuples are inserted and cross-purged to the respective states
[Figure: A and B tuples (male and female copies) enter join J1 (state of stream A: [0, w1]; state of stream B: [0, w1]); queue(s) connect J1 to join J2 (state of stream A: [w1, w2]; state of stream B: [w1, w2]); a union (U) of both outputs forms the joined result.]
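The male/female idea can be sketched for a two-way chain as below. This is a simplified illustration, not the CAPE implementation: `TwoWaySlice` and `run_two_way_chain` are hypothetical names, tuples are processed synchronously instead of through queues, and the half-open interval `w_start <= |Ta - Tb| < w_end` is an assumption.

```python
from collections import deque


class TwoWaySlice:
    """Two-way sliced join: female tuples are stored, male tuples probe and purge."""

    def __init__(self, w_start, w_end):
        self.w_start, self.w_end = w_start, w_end
        self.state = {"A": deque(), "B": deque()}  # female tuples, oldest first

    def insert_female(self, stream, tup):
        self.state[stream].append(tup)

    def process_male(self, stream, tup):
        ts, key = tup
        other = "B" if stream == "A" else "A"
        opposite = self.state[other]
        purged = []
        # Cross-purge: opposite female tuples too old for this slice move on.
        while opposite and ts - opposite[0][0] >= self.w_end:
            purged.append(opposite.popleft())
        # Probe the opposite state; results are always reported as (a, b).
        matches = [o for o in opposite if o[1] == key and ts - o[0] >= self.w_start]
        joined = [(tup, o) if stream == "A" else (o, tup) for o in matches]
        return joined, purged


def run_two_way_chain(events, boundaries):
    """boundaries: [0, w1, ..., wN]; events: timestamp-ordered (stream, ts, key)."""
    slices = [TwoWaySlice(lo, hi) for lo, hi in zip(boundaries, boundaries[1:])]
    results = [[] for _ in slices]
    for stream, ts, key in events:
        tup = (ts, key)
        other = "B" if stream == "A" else "A"
        handed_over = []  # opposite female tuples purged by the previous slice
        for i, sl in enumerate(slices):
            for female in handed_over:
                sl.insert_female(other, female)
            # The male copy probes this slice and is then propagated to the next.
            joined, handed_over = sl.process_male(stream, tup)
            results[i].extend(joined)
        # The female copy enters the first slice of the chain.
        slices[0].insert_female(stream, tup)
    return results


events = [("A", 0, "x"), ("B", 1, "x"), ("A", 3, "x"), ("B", 4, "x"), ("A", 5, "x")]
per_slice = run_two_way_chain(events, boundaries=[0, 2, 5])
```

Because a female tuple only moves when a male tuple of the opposite stream purges it, each stored tuple sits in exactly one slice, and each matching pair is produced once, by whichever tuple arrives later.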

State-Sliced Join Chain: The Example
- States of sliced joins in a chain are disjoint from each other
  - Minimizes state memory usage
- Selection can be pushed down into the middle of the join chain
  - Avoids unnecessary resource waste
- No routing step is needed
  - Avoids per-output-tuple routing cost completely
[Figure: the plans for Q1 (A[w1] ⋈ B[w1]) and Q2 (σA, A[w2] ⋈ B[w2]) are combined into a chain of sliced join 1 ([0, w1]) and sliced join 2 ([w1, w2]) with σA between the slices; Q1 reads the output of join 1, and a union (U) of both join outputs feeds Q2.]

Summary: State-Sliced Join Chain
Pros:
- Minimized memory usage
- Reduced routing cost
- No need for operator synchronization in the chain
Cons:
- Stream traffic between pipelined joins
- Purge cost

Sharing via Chains: Memory-Optimal Chain
[Figure, two panels. No Selection: streams A and B feed a chain of sliced joins 1, 2, 3, ..., N with windows [0, w1], [w1, w2], [w2, w3], ..., [wN-1, wN]; unions (U) along the chain deliver Q1, Q2, Q3, ..., QN. With Selection: the same chain with selections σ1, σ2, σ3, ..., σN and σ'1, σ'2, σ'3 placed along the chain.]

Is the Mem-Optimal Chain also the CPU-Optimal Chain?
[Figure: chain of five sliced joins with windows [0, w1], [w1, w2], [w2, w3], [w3, w4], [w4, w5] and unions (U) delivering Q1 to Q5.]
Overheads:
- Too many operators may increase the system context-switch cost
- Too many sliced states increase the purging cost

Merging Sliced Joins
Tradeoff:
- Gain from merging
  - Reduce the number of join operators
  - Reduce extra purging cost
- Loss from merging
  - Introduce routing cost
  - Increase memory usage due to selection pull-up
Cost Model for CPU Usage
[Figure: the adjacent slices from [wi-1, wi] to [wj-1, wj], serving Qi to Qj, are merged into one slice [wi-1, wj] followed by a router on |Ta - Tb| (labels: < wi, ≥ wj-1) and unions (U).]

CPU-Opt. Chain: Search Space & Solution
[Figure: directed graph over nodes v0 to v5 for the window boundaries w0 to w5; example solution chain with slices [0, w2], [w2, w3], [w3, w5], routers on |Ta - Tb| (< w1, < w4) and unions (U) delivering Q1 to Q5.]
Legend:
- vi: window start/end time
- vi to vj: one slice window
Finding the CPU-optimal chain is a shortest path problem.
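The search can be sketched as a shortest-path computation over the window boundaries, where an edge from vi to vj stands for one (possibly merged) slice [wi, wj]. The slides do not give the CPU cost model, so `edge_cost` below is a caller-supplied placeholder and the toy cost in the usage lines is purely hypothetical; `cpu_optimal_chain` is likewise an illustrative name.

```python
def cpu_optimal_chain(windows, edge_cost):
    """Pick slice boundaries minimizing total cost (shortest path in a DAG).

    windows: sorted, distinct window sizes [w1, ..., wN].
    edge_cost(lo, hi, k): assumed cost of one slice [lo, hi] that serves k
        consecutive query windows (k > 1 means slices were merged).
    Returns (total_cost, list of (lo, hi) slices).
    """
    bounds = [0] + list(windows)  # node i corresponds to boundary bounds[i]
    n = len(bounds)
    best = [float("inf")] * n     # best[j]: cheapest cost to reach node j
    prev = [None] * n
    best[0] = 0.0
    # Nodes are already in topological order, so one forward pass suffices.
    for j in range(1, n):
        for i in range(j):
            cost = best[i] + edge_cost(bounds[i], bounds[j], j - i)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk back from the last node to recover the chosen slices.
    slices, j = [], n - 1
    while j > 0:
        i = prev[j]
        slices.append((bounds[i], bounds[j]))
        j = i
    slices.reverse()
    return best[-1], slices


def toy_cost(lo, hi, k):
    # Hypothetical: fixed per-operator overhead plus a routing penalty for merging.
    return 10 + (k - 1) * (hi - lo)


total, chain = cpu_optimal_chain([1, 2, 3, 10, 20], toy_cost)
```

With a real cost model, the same pass trades the per-operator and purging overheads of many small slices against the routing cost that merging introduces.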

Summary: Mem-Opt. vs. CPU-Opt. Join Chain
Mem-Optimal: minimized memory usage
- Higher system overhead
- Higher purging cost
CPU-Optimal: minimized CPU usage
- More memory usage if selection is pulled up to merge slices
[Figure: Selection PullUp Sharing, Mem-Opt. Chain and CPU-Opt. Chain, connected by State Slice and State Merge.]

Experimental WPI Stream Engine: CAPE (software demonstration, VLDB'04)

Experiment Study 1: Memory Consumption

Experiment Study 2: Total Service Rate

Experiment Study 3: Mem-Opt. vs. CPU-Opt.
[Figure panels: Window Distributions Used for 12 Queries; Small-Large: 12 Queries; Small-Large: 24 Queries.]

Conclusion
- Pipelined state-sliced join chain
- Mem-Optimal chain construction
- CPU-Optimal chain construction
- Implemented in CAPE
- Performance evaluation

Thank You!
Visit the CAPE homepage.
Supported by: CRI grant CNS