Monitoring Streams -- A New Class of Data Management Applications. Don Carney, Brown University; Uğur Çetintemel, Brown University; Mitch Cherniack, Brandeis.

Presentation transcript:

Monitoring Streams -- A New Class of Data Management Applications
Don Carney (Brown University), Uğur Çetintemel (Brown University), Mitch Cherniack (Brandeis University), Christian Convey (Brown University), Sangdon Lee (Brown University), Greg Seidman (Brown University), Michael Stonebraker (MIT), Nesime Tatbul (Brown University), Stan Zdonik (Brown University)

Example Stream Applications
Critical Care: streams of vital-sign measurements
Physical Plant Monitoring: streams of environmental readings
Market Analysis: streams of stock exchange data
Biological Population Tracking: streams of positions from individuals of a species

Not Your Average DBMS
1. External, autonomous data sources
2. Querying time series
3. Triggers-in-the-large
4. Real-time response requirements
5. Approximate query results

Aurora At-A-Glance
Stream query processing system: 3 schools, 5 faculty, 11 grad students, several undergrads
Features:
1. Designed for scalability: 10^6 stream inputs and queries
2. QoS-driven resource management
3. Continuous and historical queries
4. Stream storage management
5. Implemented prototype: demo submission, Fall '02
This paper: system overview (architecture and high-level strategies)

Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

Aurora from 100,000 Feet
[Figure: several applications, each submitting a query and a QoS specification to Aurora.]
Each application provides:
A query over input data streams
A Quality-of-Service (QoS) specification (specifies the utility of partial or late results)

Aurora from 100 Feet
[Figure: an Aurora network of boxes (e.g., Slide, Tumble) feeding several applications, each with its own QoS specification.]
Queries = workflow (boxes and arcs); the workflow diagram is the "Aurora network"
Boxes = query operators; arcs = streams
Streams (arcs): a stream is a tuple sequence from a common source (e.g., a sensor); tuples are timestamped on arrival (internal use: QoS)
Query operators (boxes):
Simple: FILTER, MAP, RESTREAM
Binary: UNION, JOIN, RESAMPLE
Windowed: TUMBLE, SLIDE, XSECTION, WSORT
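The boxes-and-arcs model can be illustrated with a small sketch. Here a few operators are written as Python generator functions and arcs are modeled as chained iterators. The names `filter_box`, `map_box`, `tumble_box`, and `slide_box` are hypothetical, and the windowed boxes use simplified count-based windows (disjoint for TUMBLE, advancing one tuple at a time for SLIDE) rather than Aurora's full windowing semantics.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def filter_box(pred: Callable[[T], bool], stream: Iterable[T]) -> Iterator[T]:
    """FILTER: pass through only the tuples that satisfy the predicate."""
    return (t for t in stream if pred(t))


def map_box(fn: Callable[[T], U], stream: Iterable[T]) -> Iterator[U]:
    """MAP: apply a function to every tuple."""
    return (fn(t) for t in stream)


def tumble_box(size: int, agg: Callable[[List[T]], U], stream: Iterable[T]) -> Iterator[U]:
    """Simplified TUMBLE: one aggregate per disjoint window of `size` tuples.
    A trailing partial window is not emitted."""
    window: List[T] = []
    for t in stream:
        window.append(t)
        if len(window) == size:
            yield agg(window)
            window = []


def slide_box(size: int, agg: Callable[[List[T]], U], stream: Iterable[T]) -> Iterator[U]:
    """Simplified SLIDE: one aggregate per overlapping window that advances by one tuple."""
    window: Deque[T] = deque(maxlen=size)
    for t in stream:
        window.append(t)  # the deque evicts the oldest tuple once it is full
        if len(window) == size:
            yield agg(list(window))


# Arcs modeled as chained iterators: FILTER -> MAP -> TUMBLE.
readings = [3, 8, 1, 9, 4, 7]
network = tumble_box(2, sum, map_box(lambda x: x * 10, filter_box(lambda x: x > 2, readings)))
print(list(network))                      # [110, 130]
print(list(slide_box(3, max, readings)))  # [8, 9, 9, 9]
```

Chaining lazy iterators mirrors how tuples flow along arcs, but it hides the tuple queues that a real scheduler manages explicitly.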

Aurora in Action
[Figure: the Aurora network in operation, with Slide and Tumble boxes feeding applications that each have a QoS specification.]
"Box-at-a-time" scheduling
Arcs: tuple queues
Outputs monitored for QoS
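A rough sketch of box-at-a-time execution, assuming each arc is a FIFO tuple queue and each tuple keeps the timestamp it received on arrival. The `Box` class, the simulated per-tuple `cost`, and the longest-queue-first policy are assumptions made for illustration; the slide does not say how Aurora picks the next box.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional, Tuple

Tup = Tuple[float, int]  # (arrival timestamp, value)


@dataclass
class Box:
    name: str
    fn: Callable[[int], Optional[int]]  # returns None to discard the tuple
    cost: float                          # simulated seconds of work per tuple
    inq: Deque[Tup] = field(default_factory=deque)  # input arc, modeled as a FIFO queue
    downstream: Optional["Box"] = None   # None means the box feeds an application output


def run(boxes: List[Box]) -> List[Tuple[int, float]]:
    """Box-at-a-time: pick the box with the longest queue, drain it, repeat."""
    clock = 0.0
    delivered: List[Tuple[int, float]] = []  # (value, latency) as a QoS monitor would see it
    while any(b.inq for b in boxes):
        box = max(boxes, key=lambda b: len(b.inq))
        while box.inq:
            arrival, value = box.inq.popleft()
            clock += box.cost
            out = box.fn(value)
            if out is None:
                continue
            if box.downstream is not None:
                box.downstream.inq.append((arrival, out))  # keep the original timestamp
            else:
                delivered.append((out, clock - arrival))
    return delivered


b = Box("B", lambda v: v * 2, cost=1.0)
a = Box("A", lambda v: v if v > 0 else None, cost=0.5, downstream=b)
a.inq.extend([(0.0, 1), (0.0, -2), (0.0, 3)])
print(run([a, b]))  # [(2, 2.5), (6, 3.5)]
```

Because timestamps travel with the tuples, the latency recorded at the output covers both queueing and processing time, which is what a delay-based QoS graph needs.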

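Continuous and Historical Queries
[Figure: a continuous query (boxes O1, O2, O3) delivering to an application with QoS; an ad-hoc query (boxes O4, O5) delivering to another application; a view (boxes O7, O8, O9) labeled "3 Days"; a connection point labeled "1 Hour"; tuple queues between boxes.]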
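The connection point idea can be sketched as a buffer that retains recent tuples, so an ad-hoc query attached later first sees stored history and then the live stream. The `ConnectionPoint` class and its time-based `retention` parameter are hypothetical; the retention of 3600 seconds in the usage below simply echoes the slide's "1 Hour" label.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Tup = Tuple[float, int]  # (timestamp, value)
Sink = Callable[[Tup], None]


class ConnectionPoint:
    """Hypothetical connection point that retains tuples for `retention` seconds."""

    def __init__(self, retention: float) -> None:
        self.retention = retention
        self.history: Deque[Tup] = deque()  # oldest tuple first
        self.sinks: List[Sink] = []

    def push(self, ts: float, value: int) -> None:
        self.history.append((ts, value))
        # Expire tuples that have fallen out of the retention window.
        while self.history and self.history[0][0] < ts - self.retention:
            self.history.popleft()
        for sink in self.sinks:
            sink((ts, value))

    def attach(self, sink: Sink) -> None:
        """Replay retained history to a newly attached query, then stream new tuples to it."""
        for tup in self.history:
            sink(tup)
        self.sinks.append(sink)


cp = ConnectionPoint(retention=3600.0)
continuous: List[Tup] = []
cp.attach(continuous.append)      # continuous query, attached from the start
cp.push(0.0, 10)
cp.push(1800.0, 20)
cp.push(4000.0, 30)               # the tuple at t=0 expires here
adhoc: List[Tup] = []
cp.attach(adhoc.append)           # ad-hoc query, attached later
cp.push(4100.0, 40)
print(adhoc)  # [(1800.0, 20), (4000.0, 30), (4100.0, 40)]
```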

Quality-of-Service (QoS)
Specifies the "utility" of imperfect query results:
Delay-based (specifies the utility of late results)
Delivery-based and value-based (specify the utility of partial results)
QoS influences scheduling, storage management, and load shedding
[Figure: QoS graphs A, B, C plotting utility against delay, % tuples delivered, and output value.]
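A QoS graph can be represented as a list of piecewise-linear breakpoints. This minimal sketch, assuming `(x, utility)` pairs sorted by `x` and clamping outside the first and last breakpoints, evaluates a delay-based and a delivery-based graph; the helper `qos_utility` and both example graphs are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def qos_utility(points: List[Tuple[float, float]], x: float) -> float:
    """Evaluate a piecewise-linear QoS graph given as (x, utility) breakpoints sorted by x.
    Inputs outside the breakpoints are clamped to the first or last utility."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    (x0, u0), (x1, u1) = points[i - 1], points[i]
    return u0 + (u1 - u0) * (x - x0) / (x1 - x0)


# Hypothetical graphs: full utility up to 2 sec of delay, falling to 0 at 6 sec;
# and utility that rises steeply once more than 80% of tuples are delivered.
delay_graph = [(0.0, 1.0), (2.0, 1.0), (6.0, 0.0)]
delivery_graph = [(0.0, 0.0), (80.0, 0.5), (100.0, 1.0)]
print(qos_utility(delay_graph, 4.0))      # 0.5
print(qos_utility(delivery_graph, 90.0))  # 0.75
```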

Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

Runtime Operation: Basic Architecture
[Figure: components Router, Scheduler, Box Processors, QoS Monitor, Buffer, Storage Manager, Persistent Store, and Catalog, with tuple queues q1 … qn, inputs, and outputs.]

Runtime Operation Scheduling: Maximize Overall QoS
Box A: cost 1 sec (…, age 1 sec)
Box B: cost 2 sec (…, age 3 sec)
[Figure labels: Choice 1, Choice 2; QoS graph points: delay = 2 sec, utility = 0.5; delay = 5 sec, utility = 0.8]
Schedule box A now rather than later.
Ideal: maximize overall utility.
Presently exploring scalable heuristics (e.g., feedback-based).
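To make "maximize overall utility" concrete, this sketch tries every execution order and keeps the one with the highest summed utility. It assumes each box holds one queued tuple whose output goes straight to an application, uses the costs and ages from the slide, and invents delay-based QoS graphs for A and B (A's utility collapses soon after 2 sec, B tolerates up to 5 sec). Exhaustive search is exponential in the number of boxes, which is why scalable heuristics are needed.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Graph = List[Tuple[float, float]]  # (delay in sec, utility) breakpoints, sorted by delay


def utility(graph: Graph, delay: float) -> float:
    """Piecewise-linear delay-based QoS graph, clamped at both ends."""
    if delay <= graph[0][0]:
        return graph[0][1]
    for (x0, u0), (x1, u1) in zip(graph, graph[1:]):
        if delay <= x1:
            return u0 + (u1 - u0) * (delay - x0) / (x1 - x0)
    return graph[-1][1]


def best_order(boxes: Dict[str, Tuple[float, float, Graph]]) -> Tuple[Tuple[str, ...], float]:
    """Try every order; boxes maps name -> (cost, age, delay graph).
    Exhaustive search is exponential, so this only suits tiny examples."""
    best: Tuple[Tuple[str, ...], float] = ((), float("-inf"))
    for order in permutations(boxes):
        clock, total = 0.0, 0.0
        for name in order:
            cost, age, graph = boxes[name]
            clock += cost                         # box finishes after its predecessors and itself
            total += utility(graph, age + clock)  # delay = age at decision time + time to finish
        if total > best[1]:
            best = (order, total)
    return best


boxes = {
    "A": (1.0, 1.0, [(0.0, 1.0), (2.0, 1.0), (4.0, 0.0)]),   # hypothetical: collapses after 2 sec
    "B": (2.0, 3.0, [(0.0, 1.0), (5.0, 1.0), (10.0, 0.0)]),  # hypothetical: tolerates up to 5 sec
}
order, total = best_order(boxes)
print(order, total)  # order ('A', 'B') with total utility about 1.8
```

With these hypothetical graphs, running A first keeps both outputs useful, while running B first pushes A's delay to 4 sec, where its utility is 0.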

Runtime Operation Scheduling: Minimizing Per-Tuple Processing Overhead
Train scheduling: boxes A and B in sequence, with tuples x, y, z queued at A
Default operation: A(x) A(y) A(z) B(A(x)) B(A(y)) B(A(z))
Box trains (A and B run together): B(A(x)) B(A(y)) B(A(z))
Tuple trains: A(z, y, x), then B(A(z), A(y), A(x))
[Figure legend: a marker denotes a context switch.]
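The effect of trains on scheduling overhead can be shown by counting dispatches. In this sketch a "dispatch" stands in for one scheduler call into a box (and so one potential context switch); the functions and the counting convention are assumptions, not Aurora's implementation. All three strategies compute the same outputs.

```python
from typing import Callable, List, Tuple

Op = Callable[[int], int]


def default_schedule(ops: List[Op], tuples: List[int]) -> Tuple[List[int], int]:
    """Each box handles its queue, but the scheduler dispatches one tuple per call."""
    queue, switches = list(tuples), 0
    for op in ops:
        out: List[int] = []
        for t in queue:
            switches += 1  # one dispatch per (box, tuple) pair
            out.append(op(t))
        queue = out
    return queue, switches


def box_train(ops: List[Op], tuples: List[int]) -> Tuple[List[int], int]:
    """Boxes run as one unit: a single dispatch pushes a tuple through every box."""
    out: List[int] = []
    switches = 0
    for t in tuples:
        switches += 1  # one dispatch per tuple
        for op in ops:
            t = op(t)
        out.append(t)
    return out, switches


def tuple_train(ops: List[Op], tuples: List[int]) -> Tuple[List[int], int]:
    """Each box processes the whole train of queued tuples in one dispatch."""
    queue, switches = list(tuples), 0
    for op in ops:
        switches += 1  # one dispatch per box
        queue = [op(t) for t in queue]
    return queue, switches


ops = [lambda v: v + 1, lambda v: v * 2]  # box A, then box B
for strategy in (default_schedule, box_train, tuple_train):
    print(strategy.__name__, strategy(ops, [1, 2, 3]))
```

For two boxes and three tuples, the default strategy needs 6 dispatches, box trains need 3, and tuple trains need 2.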

Runtime Operation: Storage Management
1. Run-time queue management: prefetch queues prior to their being scheduled; drop tuples from queues to improve QoS
2. Connection point management: support efficient (pull-based) access to historical data, e.g., indexing, sorting, clustering, …

Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

Adaptivity: Query Optimization
Compile-time, global optimization is infeasible:
too many boxes
too much volatility in the network and data
Dynamic, local optimization:
1. Identify subnetwork
2. Buffer inputs
3. Drain subnetwork
4. Optimize subnetwork
5. Turn on taps
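The five steps can be sketched on a chain of commuting FILTER boxes. The `Filter` and `Subnetwork` classes are hypothetical, and the optimization in step 4 is only one example: if the predicates are independent, ordering filters by ascending cost / (1 - selectivity) minimizes expected per-tuple cost. Cost and selectivity are assumed to be known from runtime statistics, and step 1 corresponds to choosing which `Subnetwork` to reoptimize.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Filter:
    name: str
    pred: Callable[[int], bool]
    cost: float         # per-tuple cost, assumed known from runtime statistics
    selectivity: float  # fraction of tuples that pass, assumed known as well


class Subnetwork:
    """Hypothetical subnetwork: a chain of commuting FILTER boxes."""

    def __init__(self, filters: List[Filter]) -> None:
        self.filters = filters
        self.inq: Deque[int] = deque()   # tuples already inside the subnetwork
        self.held: Deque[int] = deque()  # inputs buffered while the tap is closed
        self.tap_open = True
        self.output: List[int] = []

    def receive(self, t: int) -> None:
        (self.inq if self.tap_open else self.held).append(t)

    def drain(self) -> None:
        while self.inq:
            t = self.inq.popleft()
            # all() stops at the first filter that rejects the tuple
            if all(f.pred(t) for f in self.filters):
                self.output.append(t)

    def reoptimize(self) -> None:
        self.tap_open = False  # step 2: buffer new inputs
        self.drain()           # step 3: flush tuples already in the subnetwork
        # step 4 (example): order filters by ascending cost / (1 - selectivity)
        self.filters.sort(key=lambda f: f.cost / (1 - f.selectivity)
                          if f.selectivity < 1 else float("inf"))
        self.tap_open = True   # step 5: turn the taps back on
        self.inq.extend(self.held)
        self.held.clear()


even = Filter("even", lambda v: v % 2 == 0, cost=5.0, selectivity=0.5)
small = Filter("small", lambda v: v < 10, cost=1.0, selectivity=0.5)
sub = Subnetwork([even, small])
sub.receive(4)
sub.receive(12)
sub.reoptimize()
print([f.name for f in sub.filters])  # ['small', 'even']
sub.receive(3)
sub.receive(8)
sub.drain()
print(sub.output)  # [4, 8]
```

The sketch is single-threaded, so the `held` buffer only fills if `receive` is called while the tap is closed; in a live system that is exactly when inputs keep arriving during steps 2 to 4.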

Adaptivity: Load Shedding
1. Two load shedding techniques:
Random tuple drops: add a DROP box to the network (DROP is a special case of FILTER), positioned to affect queries with tolerant delivery-based QoS requirements
Semantic load shedding: FILTER out values with low utility (according to value-based QoS)
2. Triggered by the QoS monitor, e.g., after latency analysis reveals that certain applications are continuously receiving poor QoS
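Both techniques can be expressed as FILTER-like stages. This sketch assumes tuples are plain values; `drop_box`, `semantic_shed`, and the example `alarm_utility` function (which rates readings of 100 or more as highly useful) are hypothetical. Passing a seeded `random.Random` keeps the random drops reproducible.

```python
import random
from typing import Callable, Iterable, Iterator


def drop_box(p: float, stream: Iterable[int], rng: random.Random) -> Iterator[int]:
    """Random tuple drops: a FILTER that discards each tuple with probability p."""
    return (t for t in stream if rng.random() >= p)


def semantic_shed(utility: Callable[[int], float], threshold: float,
                  stream: Iterable[int]) -> Iterator[int]:
    """Semantic load shedding: discard tuples whose value-based utility is below threshold."""
    return (t for t in stream if utility(t) >= threshold)


def alarm_utility(temp: int) -> float:
    # Hypothetical value-based QoS: high readings matter most.
    return 1.0 if temp >= 100 else 0.2


kept = list(drop_box(0.3, range(10_000), random.Random(42)))
print(len(kept))  # roughly 7000 of 10000 tuples survive
print(list(semantic_shed(alarm_utility, 0.5, [80, 120, 95, 101])))  # [120, 101]
```

Random drops reduce load uniformly and suit delivery-based QoS, while semantic shedding spends the remaining capacity on the values the application cares about most.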

Adaptivity: Detecting Overload
Throughput analysis: a box with cost c (processing time per tuple), selectivity s, and input rate r has output rate min(1/c, r) × s. A problem arises when r > 1/c, because tuples then arrive faster than the box can process them.
Latency analysis: monitor each application's delay-based QoS. Problem: too many applications in the "bad zone".
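Throughput analysis can be applied along a chain of boxes by feeding each box's output rate into the next one. This sketch assumes every box can use the full CPU in isolation (it ignores boxes competing for the same processor); the `BoxStats` class, `analyze_chain`, and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BoxStats:
    name: str
    cost: float         # c: seconds of processing per input tuple
    selectivity: float  # s: output tuples per input tuple


def analyze_chain(boxes: List[BoxStats], input_rate: float) -> Tuple[float, List[str]]:
    """Propagate rates along a chain using out = min(1/c, r) * s.
    Returns the final output rate and the boxes whose input rate exceeds 1/c."""
    r = input_rate
    overloaded: List[str] = []
    for box in boxes:
        capacity = 1.0 / box.cost  # tuples per second the box can consume
        if r > capacity:
            overloaded.append(box.name)
        r = min(capacity, r) * box.selectivity
    return r, overloaded


chain = [BoxStats("filter", cost=0.005, selectivity=0.5),
         BoxStats("join", cost=0.05, selectivity=1.0)]
print(analyze_chain(chain, input_rate=100.0))  # (20.0, ['join'])
```

Here the input rate of 100 tuples/sec is halved by the filter, but the join can process only 20 tuples/sec, so it is flagged as the bottleneck.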

Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

Related Work
Stream processing systems: Niagara [CDTY00], STREAM [BW01], Tribeca [SH98], Telegraph [MF02, MSHR02]
Adaptive query processing: Eddies [AH00], Tukwila [IFFLW99], Query Scrambling [AFTU96]
Multiple query optimization: [SG90], [RC88]
Approximate query answering: Online Aggregation [HHW97], AQUA [AGP99]
Active databases: [PD99], [SPAM91], [HC+99]
Continuous queries: Tapestry [TGNO92], OpenCQ [LPT99], Chronicle [JMS95]

Conclusions
Aurora stream query processing system:
1. Designed for scalability
2. QoS-driven resource management
3. Continuous and historical queries
4. Stream storage management
5. Implemented prototype
Web site:

Implementation GUI

Implementation Runtime