Monitoring Streams -- A New Class of Data Management Applications (based on the paper and talk by the original authors, slightly adapted for CS561)

Presentation transcript:

Monitoring Streams -- A New Class of Data Management Applications
Based on the paper and talk by the authors below, slightly adapted for CS561:
Don Carney (Brown University), Uğur Çetintemel (Brown University), Mitch Cherniack (Brandeis University), Christian Convey (Brown University), Sangdon Lee (Brown University), Greg Seidman (Brown University), Michael Stonebraker (MIT), Nesime Tatbul (Brown University), Stan Zdonik (Brown University)

Background
Authors: MIT / Brown / Brandeis
First Aurora (the idea), then Borealis (distributed, etc.), then a startup (StreamBase)
– Practical system
– Designed for scalability: 10^6 stream inputs and queries
– QoS-driven resource management
– Stream storage management
– Reliability / fault tolerance
– Distribution and adaptivity

Example Stream Applications
Market Analysis
– Streams of stock exchange data
Critical Care
– Streams of vital sign measurements
Physical Plant Monitoring
– Streams of environmental readings
Biological Population Tracking
– Streams of positions from individuals of a species

Not Your Average DBMS
1. External, autonomous data sources
2. Querying time series
3. Triggers-in-the-large
4. Real-time response requirements
5. Noisy data, approximate query results

Outline
2. Aurora Overview / Query Model
3. Runtime Operation
4. Adaptivity

Aurora from 100,000 Feet
Each application provides:
– A query over input data streams
– A Quality-of-Service (QoS) specification (specifies the utility of partial or late results)
[Figure: several applications, each paired with a query and a QoS specification]

Aurora from 100 Feet
Queries = Workflow (Boxes and Arcs)
Workflow Diagram = “Aurora Network”
Boxes = Query Operators
Arcs = Streams
Streams (Arcs)
– stream: tuple sequence from a common source (e.g., a sensor)
– tuples are timestamped on arrival (internal use: QoS)
Query Operators (Boxes)
– Simple: FILTER, MAP, RESTREAM
– Binary: UNION, JOIN, RESAMPLE
– Windowed: TUMBLE, SLIDE, XSECTION, WSORT
[Figure: an Aurora network of boxes, including Slide and Tumble, feeding three applications, each with a QoS specification]
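
The boxes-and-arcs model above can be sketched in a few lines of Python. This is only an illustration, not Aurora's implementation: the function names (filter_box, map_box, tumble_box) are hypothetical, arcs are modeled as Python iterators, and the windowed operator is simplified to fixed-size, count-based windows, which is an assumption rather than the operator's real definition.

```python
from typing import Callable, Iterable, Iterator, List


def filter_box(pred: Callable, stream: Iterable) -> Iterator:
    """FILTER box: pass along only the tuples that satisfy the predicate."""
    return (t for t in stream if pred(t))


def map_box(fn: Callable, stream: Iterable) -> Iterator:
    """MAP box: apply a function to every tuple."""
    return (fn(t) for t in stream)


def tumble_box(agg: Callable[[List], object], size: int, stream: Iterable) -> Iterator:
    """Simplified TUMBLE box: aggregate consecutive, non-overlapping
    windows of `size` tuples (count-based windows are an assumption)."""
    window = []
    for t in stream:
        window.append(t)
        if len(window) == size:
            yield agg(window)
            window = []


# A small "network": sensor readings -> FILTER -> MAP -> TUMBLE -> application.
readings = [{"sensor": 1, "temp": t} for t in (20, 35, 40, 10, 50, 45)]
hot = filter_box(lambda t: t["temp"] > 30, readings)
temps = map_box(lambda t: t["temp"], hot)
averages = list(tumble_box(lambda w: sum(w) / len(w), 2, temps))
print(averages)
```

Chaining generators this way mirrors the idea that each arc carries a stream of tuples from one box to the next.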

Aurora in Action
– “Box-at-a-time” scheduling
– Arcs → tuple queues
– Outputs monitored for QoS
[Figure: the Aurora network from the previous slide with tuples flowing through its boxes, including Slide and Tumble, to the applications]

Continuous and Historical Queries
[Figure: an Aurora network of operators O1 to O9 connected by queues, showing a continuous query, a view, and an ad-hoc query, each with a QoS specification; connection points are labeled 1 Hour and 3 Days]

Quality-of-Service (QoS)
Specifies the “utility” of imperfect query results
– Delay-based (specifies the utility of late results)
– Delivery-based, value-based (specify the utility of partial results)
QoS influences scheduling, storage management, and load shedding
[Figure: QoS graphs A, B, and C, with axes including Delay, % Tuples Delivered, and Output Value]
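
A delay-based QoS specification can be read as a function from output delay to utility. The sketch below assumes a piecewise-linear shape (full utility up to a threshold, then a linear drop to zero); the shape and the parameter names good_until and zero_at are assumptions made for illustration, not values taken from the slides.

```python
def delay_qos(delay: float, good_until: float, zero_at: float) -> float:
    """Hypothetical delay-based QoS graph.

    Utility is 1.0 while delay <= good_until, falls linearly after that,
    and reaches 0.0 once delay >= zero_at.
    """
    if delay <= good_until:
        return 1.0
    if delay >= zero_at:
        return 0.0
    return (zero_at - delay) / (zero_at - good_until)


# Results are fully useful up to 2 s and worthless after 6 s.
for d in (1.0, 4.0, 7.0):
    print(d, delay_qos(d, good_until=2.0, zero_at=6.0))
```

Delivery-based and value-based graphs could be modeled the same way, with the fraction of tuples delivered or the output value on the x-axis.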

Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

Runtime Operation: Basic Architecture
Components: Router, Scheduler, QoS Monitor, Box Processors, Buffer, Storage Manager, Persistent Store, Catalog
[Figure: architecture diagram connecting inputs, the components above, queues q1 … qn, and outputs]

Runtime Operation: Scheduling
Goal: maximize overall QoS
Two boxes are candidates for scheduling:
– A: cost 1 sec (queued tuple …, age: 1 sec)
– B: cost 2 sec (queued tuple …, age: 3 sec)
Outcomes shown for Choice 1 and Choice 2:
– Delay = 2 sec, Utility = 0.5
– Delay = 5 sec, Utility = 0.8
Schedule Box A now rather than later
Ideal: maximize overall utility (feedback-driven)
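
One way to make the scheduling trade-off concrete is to score every execution order by the total utility of the outputs it produces. The sketch below reuses the costs and ages from the slide but plugs in a hypothetical piecewise-linear QoS curve, so it does not reproduce the slide's utility values; the exhaustive search over orders is an illustrative assumption, not Aurora's actual scheduling algorithm.

```python
from itertools import permutations


def utility(delay: float, good_until: float = 2.0, zero_at: float = 6.0) -> float:
    """Hypothetical delay-based QoS curve shared by both applications."""
    if delay <= good_until:
        return 1.0
    if delay >= zero_at:
        return 0.0
    return (zero_at - delay) / (zero_at - good_until)


# Box costs and the current age of the tuple waiting at each box (from the slide).
boxes = {"A": {"cost": 1.0, "age": 1.0}, "B": {"cost": 2.0, "age": 3.0}}


def total_utility(order) -> float:
    """Sum of output utilities when boxes run one after another in `order`."""
    clock = 0.0
    total = 0.0
    for name in order:
        clock += boxes[name]["cost"]          # box finishes at this time
        total += utility(boxes[name]["age"] + clock)
    return total


best = max(permutations(boxes), key=total_utility)
print(best, total_utility(best))
```

Under this assumed curve, running A first gives a total utility of 1.0 versus 0.75 for running B first, which agrees with the slide's advice to schedule Box A now.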

Runtime Operation: Scheduling
Minimizing per-tuple processing overhead with train scheduling
Setting: box A feeds box B; tuples x, y, z are queued at A (the figure marks each context switch with a separator symbol)
– Default Operation: A(x), A(y), A(z), B(A(x)), B(A(y)), B(A(z))
– Box Trains (A and B run back to back): B(A(x)), B(A(y)), B(A(z))
– Tuple Trains: A(z, y, x), B(A(z), A(y), A(x))
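
The effect of train scheduling can be illustrated by counting how many separate scheduling steps each strategy needs for the same work. In this sketch a "step" stands in for a context switch, which is a simplifying assumption; the function names are hypothetical and boxes are plain Python callables.

```python
def default_operation(tuples, boxes):
    """One scheduling step per (box, tuple) pair."""
    steps = 0
    values = list(tuples)
    for box in boxes:
        out = []
        for v in values:
            out.append(box(v))
            steps += 1
        values = out
    return values, steps


def box_train(tuples, boxes):
    """One step per tuple: the tuple is pushed through all boxes at once."""
    steps = 0
    out = []
    for v in tuples:
        for box in boxes:
            v = box(v)
        out.append(v)
        steps += 1
    return out, steps


def tuple_train(tuples, boxes):
    """One step per box: the box processes the whole batch of tuples."""
    steps = 0
    values = list(tuples)
    for box in boxes:
        values = [box(v) for v in values]
        steps += 1
    return values, steps


A = lambda v: v + 1   # stand-in for box A
B = lambda v: v * 2   # stand-in for box B
for strategy in (default_operation, box_train, tuple_train):
    print(strategy.__name__, strategy([1, 2, 3], [A, B]))
```

All three strategies compute the same outputs; only the number of scheduling steps differs (6, 3, and 2 for three tuples and two boxes).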

Runtime Operation: Storage Management
1. Run-time Queue Management
– Prefetch queues prior to being scheduled
– Drop tuples from queues to improve QoS
2. Connection Point Management
– Support efficient (pull-based) access to historical data
– E.g., indexing, sorting, clustering, …

Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Query Optimization and Adaptivity
5. Conclusions

Stream Query Optimization Differences with Traditional Query Optimization?

Stream Query Optimization
– New classes of operators (windows) may mean new rewrites
– New execution modes (continuous / pipelining)
– More dynamic fluctuations in statistics → compile-time optimization is not possible
– Global optimization is not practical because query networks are huge → adaptive optimization
– Other cost models: taking memory into account, optimizing output rate rather than throughput, etc.
– Query optimization and load shedding

Query Optimization
Compile-time, global optimization is infeasible
– Too many boxes
– Too much volatility in the network and the data
Dynamic, local optimization instead
– A threshold determines when to optimize

Motivation of ‘Query Migration’
Continuous query over streams
– Statistics unknown before start
– Statistics changing during execution (stream rates, arrival pattern, distribution, etc.)
Need for dynamic adaptation
– Plan re-optimization: change the shape of the query plan tree

Run-time Plan Re-Optimization
Step 1 – Decide when to optimize
– Statistics monitoring
Step 2 – Generate new query plan
– Query optimization
Step 3 – Replace current plan by new plan
– Plan migration
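
Steps 1 and 2 above can be sketched with a very small example. It assumes a plan made only of commuting filters with equal cost, where the preferred order is simply most selective first; the drift threshold, the statistics format, and the function names are hypothetical and chosen only for illustration.

```python
def should_reoptimize(old_stats: dict, new_stats: dict, threshold: float = 0.2) -> bool:
    """Step 1: trigger re-optimization when any monitored selectivity
    has drifted by more than `threshold` since the plan was built."""
    return any(abs(new_stats[k] - old_stats[k]) > threshold for k in old_stats)


def reorder_filters(selectivity: dict) -> list:
    """Step 2: for commuting, equal-cost filters, run the most selective
    filter (lowest pass fraction) first."""
    return sorted(selectivity, key=selectivity.get)


planned_with = {"f1": 0.9, "f2": 0.5}   # selectivities assumed at plan time
observed_now = {"f1": 0.3, "f2": 0.5}   # selectivities measured at run time

current_plan = reorder_filters(planned_with)
if should_reoptimize(planned_with, observed_now):
    new_plan = reorder_filters(observed_now)
else:
    new_plan = current_plan
print(current_plan, "->", new_plan)
```

Step 3, replacing the running plan by the new one, is the plan migration problem discussed on the following slides.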

Adaptivity in Query Optimization
Dynamic Optimization: Migration
1. Identify Subnetwork
2. Buffer Inputs
3. Drain Subnetwork
4. Optimize Subnetwork
5. Turn on Taps

Naïve Plan Migration Strategy
Migration steps
– Pause execution of old plan
– Drain out all tuples inside old plan
– Replace old plan by new plan
– Resume execution of new plan
Problem: works for stateless operators only
[Figure: old and new join plans over streams A, B, and C, with joins labeled AB and BC]

Stateful Operator in CQ
Why stateful
– Need non-blocking operators in CQ
– Operator needs to output partial results
– State data structure keeps received tuples
Example: symmetric nested-loop join with window constraints
Key observation: the purge of tuples in states relies on the processing of new tuples.
[Figure: join of streams A and B, with State A holding tuple ax and State B holding tuples b1 … b5; outputs include (ax, b2) and (ax, b3)]
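
The key observation can be seen in a minimal symmetric window join. This is a sketch under assumptions: tuples are (timestamp, key) pairs, the window is a time bound on the timestamp difference, and the class and method names are hypothetical. Note that expired tuples are removed only when a new tuple arrives on the opposite input.

```python
from collections import deque


class SymmetricWindowJoin:
    """Symmetric nested-loop join with a time-based window (illustrative)."""

    def __init__(self, window: float):
        self.window = window
        self.state = {"A": deque(), "B": deque()}  # (timestamp, key) per side

    def insert(self, side: str, ts: float, key: str) -> list:
        other = "B" if side == "A" else "A"
        opposite = self.state[other]
        # Purge: expired tuples in the opposite state are discarded only now,
        # triggered by the arrival of this new tuple.
        while opposite and ts - opposite[0][0] > self.window:
            opposite.popleft()
        # Probe: join the new tuple with matching tuples still in the window.
        results = [(key, other_ts, ts) for other_ts, k in opposite if k == key]
        # Insert: remember the new tuple for future arrivals on the other side.
        self.state[side].append((ts, key))
        return results


join = SymmetricWindowJoin(window=5)
print(join.insert("A", 0, "x"))    # nothing to join with yet
print(join.insert("B", 3, "x"))    # joins with the A tuple from time 0
print(join.insert("B", 10, "x"))   # the A tuple has expired and is purged
print(len(join.state["A"]), len(join.state["B"]))
```

After the last insert, State A is empty, but State B still holds both B tuples: they stay until some future A tuple arrives, which is why pausing the inputs also stops the purging.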

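Naïve Migration Strategy Revisited
Steps
(1) Pause execution of old plan
(2) Drain out all tuples inside old plan
(3) Replace old plan by new plan
(4) Resume execution of new plan
Problem: deadlock waiting. Step (3) waits until all tuples are drained in step (2), but stateful operators purge their states only when new tuples are processed, and no new tuples arrive while execution is paused.
[Figure: old and new join plans over streams A, B, and C, annotated with “(2) All tuples drained”, “(3) Old replaced by new”, and “(4) Processing resumed”]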

Adaptivity: Query Optimization
– State Movement Protocol
– Parallel Track Protocol

Moving State Strategy
Basic idea
– Share common states between the two migration boxes
Key steps
– State matching: match states based on IDs
– State moving: create new pointers for matched states in the new box
– What’s left? Unmatched states in the new box
[Figure: old box with join states S_A, S_B, S_AB, S_C, S_ABC, S_D and new box with join states S_B, S_C, S_BC, S_D, S_A, S_BCD; both read from queues Q_A, Q_B, Q_C, Q_D and write to Q_ABCD]
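
State matching and state moving can be sketched with dictionaries keyed by state ID. The state IDs below follow the slide's figure (old plan joining A, B, C, D left to right; new plan joining B and C first), while the function name, the list-based state contents, and the sample tuples are assumptions for illustration.

```python
def move_state(old_states: dict, new_state_ids: list):
    """Match states by ID and share matched ones by reference.

    Returns the new box's state table and the IDs that had no match
    in the old box (these still have to be filled some other way).
    """
    new_states = {}
    unmatched = []
    for state_id in new_state_ids:
        if state_id in old_states:
            new_states[state_id] = old_states[state_id]  # pointer, not a copy
        else:
            new_states[state_id] = []
            unmatched.append(state_id)
    return new_states, unmatched


old_box = {
    "A": ["a1"], "B": ["b1", "b2"], "AB": ["a1b1"],
    "C": ["c1"], "ABC": ["a1b1c1"], "D": ["d1"],
}
new_box_ids = ["A", "B", "C", "D", "BC", "BCD"]

new_box, unmatched = move_state(old_box, new_box_ids)
print(unmatched)
```

Here the states for A, B, C, and D are shared with the old box, while BC and BCD exist only in the new plan and remain as the unmatched states mentioned on the slide.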

Parallel Track Strategy
Basic idea
– Execute both plans in parallel and gradually “push” old tuples out of the old box by purging
Key steps
– Connect boxes
– Execute in parallel until the old box has “expired” (no old tuple or sub-tuple remains)
– Disconnect old box
– Continue executing the new box only
[Figure: old box with join states S_A, S_B, S_AB, S_C, S_ABC, S_D and new box with join states S_B, S_C, S_BC, S_D, S_A, S_BCD; both read from queues Q_A, Q_B, Q_C, Q_D and write to Q_ABCD]
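
The expiry condition for the old box can be illustrated with a single window state. This sketch is a strong simplification (one state instead of a tree of join states, timestamps instead of tuples) and the function name is hypothetical: it reports the arrival time at which the old box no longer holds any tuple that arrived before migration started.

```python
def old_box_expiry_time(arrivals, window: float, migration_start: float):
    """Return the arrival time at which the old box's state contains only
    post-migration tuples, or None if that never happens."""
    state = []
    for ts in arrivals:
        # Window purge, triggered by the newly arriving tuple.
        state = [t for t in state if ts - t <= window]
        state.append(ts)
        if ts >= migration_start and all(t >= migration_start for t in state):
            return ts
    return None


# Tuples arrive every 2 time units; migration starts at time 5; window is 5.
print(old_box_expiry_time([0, 2, 4, 6, 8, 10, 12], window=5, migration_start=5))
print(old_box_expiry_time([0, 2, 4], window=5, migration_start=5))
```

In the first call the last pre-migration tuple (time 4) is purged when the tuple at time 10 arrives, so the old box can be disconnected then; in the second call no further tuples arrive, so the old box never expires.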

Adaptivity: Load Shedding
1. Two load shedding techniques:
– Random tuple drops: add a DROP box to the network (DROP is a special case of FILTER); position it to affect queries with tolerant delivery-based QoS requirements
– Semantic load shedding: FILTER out values with low utility (according to value-based QoS)
2. Triggered by the QoS Monitor
– E.g., after latency analysis reveals that certain applications are continuously receiving poor QoS
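
The two shedding techniques differ only in how the filter decides what to discard. The sketch below is illustrative: the function names, the heart-rate example, and its utility function are assumptions, and a real system would also have to choose where in the network to place the box.

```python
import random


def random_drop(stream, drop_prob: float, rng: random.Random) -> list:
    """DROP box: discard each tuple independently with probability drop_prob."""
    return [t for t in stream if rng.random() >= drop_prob]


def semantic_drop(stream, utility, min_utility: float) -> list:
    """Semantic shedding: keep only tuples whose value-based utility
    is at least min_utility."""
    return [t for t in stream if utility(t) >= min_utility]


# Hypothetical value-based QoS for heart-rate readings:
# abnormal values matter more to the application than normal ones.
def heart_rate_utility(bpm: int) -> float:
    return 1.0 if bpm < 50 or bpm > 120 else 0.2


readings = [72, 130, 45, 80]
print(random_drop(readings, drop_prob=0.5, rng=random.Random(0)))
print(semantic_drop(readings, heart_rate_utility, min_utility=0.5))
```

Random drops reduce load without looking at tuple contents, whereas semantic shedding spends a little work per tuple to keep the values the application cares about most.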

Adaptivity: Detecting Overload
Throughput Analysis
– Cost = c, Selectivity = s, Input rate = r
– Output rate = min(1/c, r) * s
– r > 1/c → Problem (tuples arrive faster than the box can process them)
Latency Analysis
– Monitor each application’s delay-based QoS
– Problem: too many apps in the “bad zone”
[Figure: a network of boxes, each annotated with cost and selectivity (C, S) and with input and output rates]
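
The throughput formula can be applied box by box along a path, feeding each box's output rate into the next. The sketch below assumes cost c is in seconds per tuple, so 1/c is the box's capacity in tuples per second; the function names and the sample numbers are hypothetical.

```python
def output_rate(cost: float, selectivity: float, input_rate: float) -> float:
    """Output rate = min(1/c, r) * s, with rates in tuples per second."""
    return min(1.0 / cost, input_rate) * selectivity


def is_overloaded(cost: float, input_rate: float) -> bool:
    """A box is overloaded when tuples arrive faster than it can process them."""
    return input_rate > 1.0 / cost


def path_rates(boxes, input_rate: float) -> list:
    """Propagate rates through a chain of (cost, selectivity) boxes."""
    rates = []
    rate = input_rate
    for cost, selectivity in boxes:
        rate = output_rate(cost, selectivity, rate)
        rates.append(rate)
    return rates


# Box 1: 0.25 s per tuple, selectivity 0.5. Box 2: 1 s per tuple, selectivity 1.0.
print(is_overloaded(0.25, 10.0))                      # 10 tuples/s vs. capacity 4
print(path_rates([(0.25, 0.5), (1.0, 1.0)], 10.0))
```

With an input rate of 10 tuples per second the first box is overloaded (capacity 4), so its output is capped at 2 tuples per second, and the second box caps the path output at 1.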

Implementation GUI

Implementation Runtime

Conclusions
Aurora Stream Query Processing System
1. Designed for scalability
2. QoS-driven resource management
3. Continuous and historical queries
4. Stream storage management
5. Implemented prototype
See the Stream Web site at Brown Univ.