Monitoring Streams -- A New Class of Data Management Applications
  Don Carney, Brown University
  Uğur Çetintemel, Brown University
  Mitch Cherniack, Brandeis University
  Christian Convey, Brown University
  Sangdon Lee, Brown University
  Greg Seidman, Brown University
  Michael Stonebraker, MIT
  Nesime Tatbul, Brown University
  Stan Zdonik, Brown University
Example Stream Applications
  - Critical Care: Streams of Vital Sign Measurements
  - Physical Plant Monitoring: Streams of Environmental Readings
  - Market Analysis: Streams of Stock Exchange Data
  - Biological Population Tracking: Streams of Positions from Individuals of a Species
Not Your Average DBMS
  1. External, Autonomous Data Sources
  2. Querying Time-Series
  3. Triggers-in-the-large
  4. Real-time response requirements
  5. Approximate Query Results
Aurora At-A-Glance
  Stream Query Processing System
  3 Schools, 5 Faculty, 11 Grad Students, Several Undergrads
  Features:
  1. Designed for Scalability: 10^6 stream inputs, queries
  2. QoS-Driven Resource Management
  3. Continuous and Historical Queries
  4. Stream Storage Management
  5. Implemented Prototype: Demo Submission, Fall ‘02
  This paper: System Overview (Architecture and High-Level Strategies)
Talk Outline
  1. Introduction
  2. Aurora Overview
  3. Runtime Operation
  4. Adaptivity
  5. Related Work and Conclusions
Aurora from 100,000 Feet
  [Figure: three applications (App), each attached to Aurora with a Query and a QoS specification]
  Each Application Provides:
  - A Query over input data streams
  - A Quality-of-Service (QoS) Specification (specifies utility of partial or late results)
Aurora from 100 Feet
  [Figure: an Aurora network containing Slide and Tumble boxes, feeding three applications (App), each with a QoS specification]
  Queries = Workflow (Boxes and Arcs)
  Workflow Diagram = “Aurora Network”
  Boxes = Query Operators
  Arcs = Streams
  Streams (Arcs):
  - stream: tuple sequence from common source (e.g., sensor)
  - tuples timestamped on arrival (Internal use: QoS)
  Query Operators (Boxes):
  - Simple: FILTER, MAP, RESTREAM
  - Binary: UNION, JOIN, RESAMPLE
  - Windowed: TUMBLE, SLIDE, XSECTION, WSORT
Aurora in Action
  [Figure: an Aurora network with Slide and Tumble boxes delivering output to applications (App), each with a QoS specification]
  - “Box-at-a-time” Scheduling
  - Arcs: Tuple Queues
  - Outputs Monitored for QoS
Continuous and Historical Queries
  [Figure: an Aurora network of boxes O1 to O9 with Queues on the arcs. Labels in the figure: continuous query, view, ad-hoc query, Connection Point, 1 Hour, 3 Days; outputs go to applications (App), each with a QoS specification.]
Quality-of-Service (QoS)
  Specifies “Utility” of Imperfect Query Results
  - Delay-Based (specify utility of late results)
  - Delivery-Based, Value-Based (specify utility of partial results)
  QoS Influences: Scheduling, Storage Management, Load Shedding
  [Figure: three QoS graphs labeled A, B, C, with x-axes Delay, % Tuples Delivered, and Output Value]
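A QoS graph can be thought of as a function from a measured quantity (delay, percentage of tuples delivered, or output value) to a utility. The sketch below is only an illustration, assuming piecewise-linear graphs with utilities in [0, 1]; the helper `piecewise_linear` and the breakpoints are hypothetical and not taken from the slides.

```python
from bisect import bisect_right

def piecewise_linear(points):
    """Build a utility function that interpolates linearly between
    sorted (x, utility) breakpoints and stays flat outside their range."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def utility(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x)  # xs[i-1] <= x < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return utility

# Hypothetical delay-based graph: full utility up to 1 s, falling to 0 at 5 s.
delay_qos = piecewise_linear([(0.0, 1.0), (1.0, 1.0), (5.0, 0.0)])

# Hypothetical delivery-based graph: utility rises with % of tuples delivered.
delivery_qos = piecewise_linear([(0.0, 0.0), (50.0, 0.6), (100.0, 1.0)])

print(delay_qos(3.0))      # utility of a result that is 3 s late
print(delivery_qos(75.0))  # utility when 75% of tuples are delivered
```

Representing each graph as a plain function keeps the later components (scheduler, load shedder) independent of how the graph was specified.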
Runtime Operation: Basic Architecture
  [Figure: runtime architecture. Components shown: Router (inputs, outputs), Scheduler, Box Processors, QoS Monitor, Catalog, Storage Manager, Buffer holding queues q1, q2, …, qi, …, qn, and Persistent Store]
Runtime Operation: Scheduling to Maximize Overall QoS
  [Figure: two scheduling choices (Choice 1, Choice 2) between boxes A and B]
  - A: Cost: 1 sec (…, age: 1 sec)
  - B: Cost: 2 sec (…, age: 3 sec)
  - Delay = 2 sec, Utility = 0.5
  - Delay = 5 sec, Utility = 0.8
  Schedule Box A now rather than later
  Ideal: Maximize Overall Utility
  Presently exploring scalable heuristics (e.g., feedback-based)
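To make the "maximize overall utility" idea concrete, here is a minimal sketch that tries every execution order and keeps the one with the highest summed utility. It assumes a tuple's delay is its age plus the time until its box finishes, and it assumes two linear delay-based QoS functions chosen only so that they reproduce the slide's utility values (0.5 at a 2 sec delay for A, 0.8 at a 5 sec delay for B). The function names and the QoS shapes are hypothetical, and exhaustive search is not the scalable heuristic the slide refers to.

```python
from itertools import permutations

def total_utility(order, boxes):
    """Sum the utility of each box's output when boxes run in `order`.
    boxes maps name -> (cost_sec, tuple_age_sec, qos_function)."""
    clock = 0.0
    total = 0.0
    for name in order:
        cost, age, qos = boxes[name]
        clock += cost                 # box finishes at this time
        total += qos(age + clock)     # delay = age so far + waiting + processing
    return total

def best_order(boxes):
    """Exhaustively pick the order with the highest total utility."""
    return max(permutations(boxes), key=lambda order: total_utility(order, boxes))

# Hypothetical QoS graphs: A's utility decays quickly, B's slowly.
boxes = {
    "A": (1.0, 1.0, lambda d: max(0.0, 1.0 - 0.25 * d)),
    "B": (2.0, 3.0, lambda d: max(0.0, 1.0 - 0.04 * d)),
}

for order in permutations(boxes):
    print(order, round(total_utility(order, boxes), 2))
print("best:", best_order(boxes))
```

Under these assumed graphs, running A first wins because A's utility collapses if it waits behind B, while B loses little by waiting.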
Runtime Operation: Scheduling to Minimize Per-Tuple Processing Overhead
  Train Scheduling (boxes A, B in sequence; queued tuples x, y, z; in the figure, a marker between steps denotes a Context Switch):
  - Default Operation: A(x), A(y), A(z), B(A(x)), B(A(y)), B(A(z))
  - Box Trains: B(A(x)), B(A(y)), B(A(z))
  - Tuple Trains: A(z, y, x), B(A(z), A(y), A(x))
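The three modes can be compared by counting scheduling steps, each of which would incur a context switch. This is a rough sketch, assuming boxes are plain one-tuple-in, one-tuple-out functions applied in sequence; the function names and the step accounting are illustrative assumptions rather than Aurora's actual scheduler interface.

```python
def run_default(boxes, tuples):
    """One scheduling step per (box, tuple) pair."""
    steps = 0
    data = list(tuples)
    for box in boxes:
        out = []
        for t in data:
            out.append(box(t))
            steps += 1
        data = out
    return data, steps

def run_tuple_train(boxes, tuples):
    """One scheduling step per box; each step consumes the whole queue."""
    steps = 0
    data = list(tuples)
    for box in boxes:
        data = [box(t) for t in data]
        steps += 1
    return data, steps

def run_box_train(boxes, tuples):
    """One scheduling step per tuple; each step pushes it through every box."""
    steps = 0
    out = []
    for t in tuples:
        for box in boxes:
            t = box(t)
        out.append(t)
        steps += 1
    return out, steps

# Hypothetical boxes A and B, and three queued tuples.
A = lambda v: v + 1
B = lambda v: v * 2
for run in (run_default, run_tuple_train, run_box_train):
    print(run.__name__, run([A, B], [1, 2, 3]))
```

All three produce the same output; only the number of scheduling steps differs (6, 2, and 3 for two boxes and three tuples).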
Runtime Operation: Storage Management
  1. Run-time Queue Management
     - Prefetch Queues Prior to Being Scheduled
     - Drop Tuples from Queues to Improve QoS
  2. Connection Point Management
     - Support Efficient (Pull-Based) Access to Historical Data
     - E.g., indexing, sorting, clustering, …
Adaptivity: Query Optimization
  Compile-time, Global Optimization Infeasible
  - Too Many Boxes
  - Too Much Volatility in Network, Data
  Dynamic, Local Optimization
  1. Identify Subnetwork
  2. Buffer Inputs
  3. Drain Subnetwork
  4. Optimize Subnetwork
  5. Turn on Taps
Adaptivity: Load Shedding
  1. Two Load Shedding Techniques:
     - Random Tuple Drops
       Add DROP box to network (DROP is a special case of FILTER)
       Position to affect queries with tolerant delivery-based QoS requirements
     - Semantic Load Shedding
       FILTER values with low utility (according to value-based QoS)
  2. Triggered by QoS Monitor
     e.g., after Latency Analysis reveals certain applications are continuously receiving poor QoS
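The two shedding techniques can be sketched as two kinds of filter. This is a minimal illustration, assuming tuples are plain values and that a value-based QoS graph is available as a function from value to utility; `random_drop`, `semantic_drop`, the heart-rate readings, and the utility thresholds are all hypothetical.

```python
import random

def random_drop(tuples, drop_rate, rng=None):
    """DROP box: discard each tuple independently with probability drop_rate."""
    rng = rng or random.Random()
    return [t for t in tuples if rng.random() >= drop_rate]

def semantic_drop(tuples, value_of, utility, min_utility):
    """Semantic shedding: keep only tuples whose value has enough utility
    according to a value-based QoS function."""
    return [t for t in tuples if utility(value_of(t)) >= min_utility]

# Hypothetical value-based QoS: abnormal heart rates matter more than normal ones.
def heart_rate_utility(bpm):
    return 1.0 if bpm < 50 or bpm > 120 else 0.2

readings = [60, 72, 140, 35, 88]
print(random_drop(readings, 0.5, random.Random(0)))
print(semantic_drop(readings, lambda r: r, heart_rate_utility, 0.5))
```

Random drops need no knowledge of tuple contents, while semantic shedding spends a utility evaluation per tuple in exchange for discarding the least useful values first.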
Adaptivity: Detecting Overload
  Throughput Analysis
  - Per box: Cost = c, Selectivity = s, Input rate = r
  - Output rate = min(1/c, r) * s
  - Problem when r > 1/c (tuples arrive faster than the box can process them)
  [Figure: network of boxes annotated with C,S; each box labeled I, O, P]
  Latency Analysis
  - Monitor each application’s Delay-based QoS
  - Problem: Too many apps in “bad zone”
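Throughput analysis can be sketched by propagating rates through a chain of boxes using the slide's formula, output rate = min(1/c, r) * s. The sketch assumes a box is a problem when its input rate exceeds its capacity 1/c, and it assumes a simple linear chain; `analyze_chain`, the box names, and the numbers are hypothetical.

```python
def analyze_chain(input_rate, boxes):
    """Propagate tuple rates through a linear chain of boxes.
    boxes: list of (name, cost_sec_per_tuple, selectivity).
    Returns the final output rate and the names of overloaded boxes."""
    overloaded = []
    rate = input_rate
    for name, cost, selectivity in boxes:
        capacity = 1.0 / cost             # max tuples/sec the box can process
        if rate > capacity:               # assumed overload condition: r > 1/c
            overloaded.append(name)
        rate = min(capacity, rate) * selectivity
    return rate, overloaded

# Hypothetical chain: a cheap filter followed by an expensive map.
chain = [("filter", 0.005, 0.5), ("map", 0.05, 1.0)]
out_rate, problems = analyze_chain(100.0, chain)
print(out_rate, problems)
```

Here the filter (capacity about 200 tuples/sec) keeps up with 100 tuples/sec and emits 50, but the map (capacity about 20 tuples/sec) cannot, so its queue would grow without bound unless load is shed upstream.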
Related Work
  - Stream Processing Systems: Niagara [CDTY00], STREAM [BW01], Tribeca [SH98], Telegraph [MF02, MSHR02]
  - Adaptive Query Processing: Eddies [AH00], Tukwila [IFFLW99], Query Scrambling [AFTU96]
  - Multiple Query Optimization: [SG90], [RC88]
  - Approximate Query Answering: Online Aggregation [HHW97], AQUA [AGP99]
  - Active Databases: [PD99], [SPAM91], [HC+99]
  - Continuous Queries: Tapestry [TGNO92], OpenCQ [LPT99], Chronicle [JMS95]
Conclusions
  Aurora Stream Query Processing System
  1. Designed for Scalability
  2. QoS-Driven Resource Management
  3. Continuous and Historical Queries
  4. Stream Storage Management
  5. Implemented Prototype
  Web site:
Implementation GUI
Implementation Runtime