Continuous Stream Monitoring Technology Elke A. Rundensteiner Database Systems Research Laboratory Department of Computer Science Worcester Polytechnic.

Presentation transcript:

Continuous Stream Monitoring Technology Elke A. Rundensteiner Database Systems Research Laboratory Department of Computer Science Worcester Polytechnic Institute, USA cs.wpi.edu November 2006

2 A Database... Vast amounts of electronic information in organisations, companies, and scientific institutes need to be organized, stored securely, and accessed efficiently and easily. Three common steps: (1) design the schema, (2) load the database, (3) query the static database. Example query against the stored database (DBMS): Select name from employee;

3 So what next ? Stored Database DBMS Select name from employee;

4 A Look at Modern Data : Streams !  Digital radio telescopes  Network traffic flow  Stock tickers/feeds  Sensor networks  Web usage transactions  Outpatient care  Environmental instruments DSMS Filter & Transform select fft(s) from radiosignal s where source(s)= “Antenna1”;

5 Databases: Everything is Upside Down! Traditional DBMS: static data, one-time queries. DSMS: streams of data, standing queries.

6 Continuous Queries on Data Streams: Online Stream Monitoring

7 Motivating Applications Everywhere  Traffic Management : Streams of Cars and Mobile Requests  Market Analysis : Streams of Stock Exchange Data  Critical Care : Streams of Vital Sign Measurements  Physical Plant Monitoring: Streams of RFID/Environmental Readings  Emergency Response: Streams of Sensors and People tracking

8 Mobile Traffic-Related Streams - moving objects - dynamic range query - dynamic kNN query

9 Spatio-Temporal Continuous Tracking. Example queries: monitor the traffic in the red areas; continuously return the area covered by the herd during the migration.

10 FireEngine Project : Sensors in Rooms

11 Fire Monitoring Queries. Track smoke and heat clouds (moving clusters) in terms of their sizes and speeds. Is there an outlier (prank), or an actual fire? Match sensor readings of a fire against a fire stream simulation to determine similarity. Are any sensors faulty, and should they thus be ignored?

12 Dynamicity in Stream Query Processing. Continuous queries are registered with a scalable stream query engine, which consumes streaming data (push-based paradigm) and produces streaming results. Streaming data may have time-varying rates and high volumes. Real-time and accurate responses are required. The engine faces a high workload of queries under memory and CPU resource limitations (continuous evaluation), and the resources available for executing each operator may vary over time. New query processing technology is required.

13 Execution of Queries. Queries = Graph = Query Plan. Boxes = Query Operators such as Filter or Join. Arcs = Streams with time-stamped tuples. [Figure: query plan graph with operators such as Slide and Tumble feeding applications, each with a QoS specification]

14 Execution of Queries: Execution via Operator Scheduling. [Figure: query plan graph with Slide and Tumble operators feeding applications, each with a QoS specification]

15 Adaptation Techniques in CAPE. On-Line Query Plan Reshaping (with Yali Zhu and G. Heineman). Published in ACM SIGMOD 2004; in submission to the TODS journal, 2006.

16 Query Optimization. [Figure: alternative join plans over streams A, B, and C] How do we optimize if the query is continuously running?

17 Run-time Plan Re-Optimization. Step 1: Decide when to optimize (statistics monitoring). Step 2: Generate new query plan (query optimization). Step 3: Replace current plan by new plan (plan migration).
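The three steps can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the CAPE implementation: the toy cost model (sum of intermediate result sizes of a left-deep join order), the function names `plan_cost`, `best_plan`, and `reoptimize`, and the `min_gain` threshold that decides when a new plan is worth migrating to.

```python
from itertools import permutations

def plan_cost(order, rates, sel):
    """Toy cost (assumption): sum of intermediate result sizes of a left-deep join order."""
    joined = [order[0]]
    size = rates[order[0]]
    cost = 0.0
    for stream in order[1:]:
        size *= rates[stream]
        # Apply the selectivity of every join predicate between the new
        # stream and the streams already joined (1.0 if there is none).
        for other in joined:
            size *= sel.get(frozenset((other, stream)), 1.0)
        joined.append(stream)
        cost += size
    return cost

def best_plan(rates, sel):
    """Step 2: exhaustive search over join orders (fine for a handful of streams)."""
    return min(permutations(sorted(rates)), key=lambda o: plan_cost(o, rates, sel))

def reoptimize(current, rates, sel, min_gain=0.8):
    """Step 1 decides whether to act on fresh statistics; step 3 (migration) would follow."""
    candidate = best_plan(rates, sel)
    if plan_cost(candidate, rates, sel) < min_gain * plan_cost(current, rates, sel):
        return candidate  # a plan migration strategy would be invoked here
    return current

# Monitored statistics (hypothetical values): input rates and join selectivities.
rates = {"A": 10, "B": 10, "C": 10}
sel = {frozenset("AB"): 0.5, frozenset("BC"): 0.01}
current = ("A", "B", "C")
new_plan = reoptimize(current, rates, sel)
print(new_plan, plan_cost(new_plan, rates, sel), plan_cost(current, rates, sel))
```

With these statistics the highly selective B-C join is moved to the bottom of the plan, which is exactly the situation in which the running plan has to be replaced.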

18 Naïve Plan Migration Strategy. Migration steps: (1) Pause execution of old plan. (2) Drain out all tuples inside old plan. (3) Replace old plan by new plan. (4) Resume execution of new plan. [Figure: old and new join plans over streams A, B, and C] Problem: works for stateless operators only.

19 Stateful Operator in Streaming. Why stateful? Non-blocking operators are needed, and an operator needs to output partial results. Example: a symmetric hash join of A and B keeps State A and State B. For each new tuple from A: purge state B, join with state B, insert into state A. Key observation: the purge of tuples in states relies on the processing of new tuples.
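The purge-join-insert cycle of the symmetric hash join can be sketched as follows. This is a minimal illustration under stated assumptions: a time-based sliding window, an equi-join on a single key, globally non-decreasing timestamps, and a hypothetical class name `SymmetricHashJoin`. It is not the operator from CAPE.

```python
from collections import defaultdict, deque

class SymmetricHashJoin:
    """Sliding-window equi-join of streams A and B (illustrative sketch)."""

    def __init__(self, window):
        self.window = window
        # Arrival order per side, used to find expired tuples cheaply.
        self.arrivals = {"A": deque(), "B": deque()}
        # Hash index per side: join key -> payloads in arrival order.
        self.index = {"A": defaultdict(deque), "B": defaultdict(deque)}

    def _purge(self, side, now):
        arrivals, index = self.arrivals[side], self.index[side]
        while arrivals and arrivals[0][0] <= now - self.window:
            _, key = arrivals.popleft()
            index[key].popleft()  # the oldest tuple for this key is the expired one
            if not index[key]:
                del index[key]

    def process(self, side, ts, key, payload):
        other = "B" if side == "A" else "A"
        self._purge(other, ts)                                     # 1. purge opposite state
        matches = [p for _, p in self.index[other].get(key, ())]   # 2. join (probe)
        self.arrivals[side].append((ts, key))                      # 3. insert into own state
        self.index[side][key].append((ts, payload))
        if side == "A":
            return [(payload, m) for m in matches]
        return [(m, payload) for m in matches]

    def state_size(self, side):
        return len(self.arrivals[side])

join = SymmetricHashJoin(window=10)
join.process("A", 0, "k", "a0")
join.process("B", 5, "k", "b5")
join.process("A", 12, "k", "a12")
print(join.state_size("A"))  # a0 is out of the window but still stored
join.process("B", 13, "k", "b13")
print(join.state_size("A"))  # the new B tuple triggered the purge of a0
```

The example makes the key observation concrete: the expired tuple a0 stays in state A until a new tuple arrives on stream B, because only the arrival of a tuple on the opposite stream triggers the purge.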

20 Naïve Migration Strategy Revisited. Steps: (1) Pause execution of old plan. (2) Drain out all tuples inside old plan. (3) Replace old plan by new plan. (4) Resume execution of new plan. [Figure: (2) all tuples drained, (3) old plan replaced by new, (4) processing resumed] Problem: deadlock waiting. States are purged only when new tuples are processed, but processing is paused.

21 Proposed Dynamic Migration Strategies  Moving State Strategy  Parallel Track Strategy

22 Moving State Strategy. Basic idea: share common states between two boxes. Key steps: identify common states (state matching); share common states (state moving); recompute unmatched states (state recomputing).

23 Moving State Strategy. State Matching: each state in the old box has a unique ID; during rewriting, a new ID is given to each new state in the new box; when rewriting is done, states are matched based on IDs. State Moving: performed between matched states; on the same machine, it creates new pointers for the matched states in the new box. What's left? Unmatched states in the new box. [Figure: old box with states S_A, S_B, S_AB, S_C, S_ABC, S_D and new box with states S_A, S_B, S_C, S_BC, S_D, S_BCD; both read input queues Q_A, Q_B, Q_C, Q_D and write to Q_ABCD]
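State matching and state moving can be sketched in a few lines. The sketch assumes, purely for illustration, that a state's ID is the name of the set of streams it covers (for example "AB"), and that "moving" on one machine means sharing the same object by reference. The function name `match_states` and the sample contents are hypothetical.

```python
def match_states(old_states, new_state_ids):
    """Split the new box's states into matched (shared by reference) and unmatched."""
    matched = {}
    unmatched = []
    for state_id in new_state_ids:
        if state_id in old_states:
            # State moving: a new pointer to the existing state, no copying.
            matched[state_id] = old_states[state_id]
        else:
            unmatched.append(state_id)
    return matched, unmatched

# Old box ((A join B) join C) join D; new box A join ((B join C) join D).
old_box = {
    "A": ["a1"], "B": ["b1"], "AB": ["a1b1"],
    "C": ["c1"], "ABC": ["a1b1c1"], "D": ["d1"],
}
new_box_ids = ["A", "B", "C", "D", "BC", "BCD"]

matched, unmatched = match_states(old_box, new_box_ids)
print(sorted(matched), unmatched)
```

For the two boxes on this slide, S_A, S_B, S_C, and S_D are matched, and S_BC and S_BCD remain unmatched.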

24 Unmatched States. State Recomputing: recursively recompute the unmatched states S_BC and S_BCD by joining matched states. [Figure: new box with states S_A, S_B, S_C, S_BC, S_D, S_BCD over input queues Q_A, Q_B, Q_C, Q_D, writing to Q_ABCD]
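The recursive recomputation can be sketched as below. The assumptions are loud ones: every join is an equi-join on one shared attribute `k`, tuples are plain dictionaries, and the `children` map (which states feed which unmatched state) is given. The names `join` and `recompute` are hypothetical and are not taken from the CAPE code base.

```python
def join(left, right):
    """Equi-join on the assumed shared attribute 'k'; joined tuples merge their fields."""
    return [{**l, **r} for l in left for r in right if l["k"] == r["k"]]

def recompute(state_id, states, children):
    """Return the contents of a state, recomputing it from its inputs if it is unmatched."""
    if state_id not in states:
        left_id, right_id = children[state_id]
        states[state_id] = join(
            recompute(left_id, states, children),
            recompute(right_id, states, children),
        )
    return states[state_id]

# Matched states moved over from the old box (hypothetical contents).
states = {
    "B": [{"k": 1, "B": "b1"}, {"k": 2, "B": "b2"}],
    "C": [{"k": 1, "C": "c1"}],
    "D": [{"k": 1, "D": "d1"}, {"k": 1, "D": "d2"}],
}
# Unmatched states and the states they are built from.
children = {"BC": ("B", "C"), "BCD": ("BC", "D")}

s_bcd = recompute("BCD", states, children)
print(states["BC"])
print(s_bcd)
```

Asking for S_BCD first recomputes S_BC from the matched S_B and S_C, then joins the result with S_D. The cost of this step grows with the number of tuples held in the states.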

25 MS Migration Pros and Cons. Pros: fast when the number of tuples in states is small (low input rates or small window size). Cons: output silence during the entire migration stage. Can we output results even during migration? This is the motivation for the Parallel Track Strategy.

26 Parallel Track Strategy. Basic idea: execute both old and new plans in parallel; gradually "push" old tuples out of the old box by purging. Key steps: connect the new box; execute both boxes in parallel; remove the old box once it has "expired", that is, once it contains only new tuples and no old tuples or sub-tuples.

27 Parallel Track Strategy. Connect boxes. Execute in parallel until all old tuples are purged. Disconnect old box. [Figure: old box and new box running side by side over input queues Q_A, Q_B, Q_C, Q_D and writing to Q_ABCD; example: a tuple ABC in state S_ABC of the old box]
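The condition for disconnecting the old box can be sketched as a simple check. The representation is an assumption for illustration only: each state entry is a tuple of the arrival timestamps of its component sub-tuples, and an entry counts as old if any component arrived before the migration started. Purging itself is left to the operators; the names `is_old` and `old_box_expired` are hypothetical.

```python
def is_old(entry, migration_start):
    """An entry is old if any of its sub-tuples arrived before migration started."""
    return any(ts < migration_start for ts in entry)

def old_box_expired(states, migration_start):
    """True once no state in the old box holds an old tuple or sub-tuple."""
    return not any(
        is_old(entry, migration_start)
        for entries in states.values()
        for entry in entries
    )

migration_start = 100
old_box = {
    "A": [(101,)],        # a new tuple from stream A
    "AB": [(99, 103)],    # combined tuple with an old A component
}
print(old_box_expired(old_box, migration_start))  # old sub-tuple still present

old_box["AB"] = [(101, 102)]  # the old combined tuple has been purged
print(old_box_expired(old_box, migration_start))  # old box can be disconnected
```

Note that the check looks at sub-tuples as well: a combined tuple in an intermediate state keeps the old box alive as long as one of its components predates the migration.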

28 PT Migration Pros and Cons. Pros: keeps producing results even during migration (MS migration produces no results). Cons: migration duration is at least 2W; MS may be faster, depending on the number of tuples in states.

29 Summary : Stream Plan Migration  Our central theme : Optimization via Adaptation  First run-time solution for stateful operators  Two migration methods: Moving State Strategy Parallel Track Strategy  Cost Models for Comparative Analysis  System Implementation in CAPE  Experimental Evaluations

30 Overall Summary: So Much Left to Do! Large variety of challenging stream applications. Generic core technology for stream processing engines. Startups are starting to pop up: StreamBase for the stock market. Major DBMS players like IBM, Oracle, etc. are joining in. Cool open research, great potential for real impact!

31 Questions ? The End

32 Subset of CAPE Publications
[RDZ04] E. A. Rundensteiner, L. Ding, Y. Zhu, T. Sutherland and B. Pielech, "CAPE: A Constraint-Aware Adaptive Stream Processing Engine". Invited Book Chapter. July
[ZRH04] Y. Zhu, E. A. Rundensteiner and G. T. Heineman, "Dynamic Plan Migration for Continuous Queries Over Data Streams". SIGMOD 2004, pages
[DMR+04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman, "Joining Punctuated Streams". EDBT 2004, pages
[DR04] L. Ding and E. A. Rundensteiner, "Evaluating Window Joins over Punctuated Streams". CIKM 2004, to appear.
[DRH03] L. Ding, E. A. Rundensteiner and G. T. Heineman, "MJoin: A Metadata-Aware Stream Join Operator". DEBS
[RDSZBM04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta, "CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity". Demonstration Paper. VLDB 2004
[SR04] T. Sutherland and E. A. Rundensteiner, "D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture". Tech Report, WPI-CS-TR-04-18,
[SPR04] T. Sutherland, B. Pielech, Yali Zhu, Luping Ding, and E. A. Rundensteiner, "Adaptive Multi-Objective Scheduling Selection Framework for Continuous Query Processing". IDEAS
[SLJR05] T. Sutherland, B. Liu, M. Jbantova, and E. A. Rundensteiner, "D-CAPE: Distributed and Self-Tuned Continuous Query Processing". CIKM, Bremen, Germany, Nov
[LR05] Bin Liu and E. A. Rundensteiner, "Revisiting Pipelined Parallelism in Multi-Join Query Processing". VLDB
[B05] Bin Liu, Yali Zhu and E. A. Rundensteiner, "Spill Policies for Long-Running Queries". ACM SIGMOD 2006, to appear.
CAPE Project: