Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown.


1 Monitoring Streams -- A New Class of Data Management Applications
Based on the paper and talk by the authors below, slightly adapted for CS561:
Don Carney, Brown University
Uğur Çetintemel, Brown University
Mitch Cherniack, Brandeis University
Christian Convey, Brown University
Sangdon Lee, Brown University
Greg Seidman, Brown University
Michael Stonebraker, MIT
Nesime Tatbul, Brown University
Stan Zdonik, Brown University

2 Background
Authors: MIT/Brown/Brandeis
First Aurora (idea), then Borealis (distributed, etc.), then a startup (StreamBase)
–Practical system
–Designed for Scalability: 10^6 stream inputs, queries
–QoS-Driven Resource Management
–Stream Storage Management
–Reliability / Fault Tolerance
–Distribution and Adaptivity

3 Example Stream Applications
Market Analysis
–Streams of Stock Exchange Data
Critical Care
–Streams of Vital Sign Measurements
Physical Plant Monitoring
–Streams of Environmental Readings
Biological Population Tracking
–Streams of Positions from Individuals of a Species

4 Not Your Average DBMS
1. External, Autonomous Data Sources
2. Querying Time-Series
3. Triggers-in-the-large
4. Real-time response requirements
5. Noisy Data, Approximate Query Results

5 Outline
2. Aurora Overview / Query Model
3. Runtime Operation
4. Adaptivity

6 Aurora from 100,000 Feet
[Figure: several applications, each connected to Aurora with its own Query and QoS specification]
Each Application Provides:
–A Query over input data streams
–A Quality-of-Service (QoS) Specification (specifies utility of partial or late results)

7 Aurora from 100 Feet
[Figure: network of boxes (including Slide and Tumble) and arcs feeding applications, each with a QoS specification]
Queries = Workflow (Boxes and Arcs)
Workflow Diagram = “Aurora Network”
Boxes = Query Operators
Arcs = Streams
Streams (Arcs)
–stream: tuple sequence from a common source (e.g., sensor)
–tuples timestamped on arrival (internal use: QoS)
Query Operators (Boxes)
–Simple: FILTER, MAP, RESTREAM
–Binary: UNION, JOIN, RESAMPLE
–Windowed: TUMBLE, SLIDE, XSECTION, WSORT

8 Aurora in Action
[Figure: the same network of boxes (including Slide and Tumble) with tuples queued on the arcs, feeding applications with QoS specifications]
“Box-at-a-time” Scheduling
Arcs → Tuple Queues
Outputs Monitored for QoS

9 Continuous and Historical Queries
[Figure: Aurora network with boxes O1 to O5 and O7 to O9, queues, and applications with QoS specifications. Labels: continuous query, view, ad-hoc query, Connection Point. Retention labels: 1 Hour, 3 Days]

10 Quality-of-Service (QoS)
Specifies “Utility” of Imperfect Query Results
–Delay-Based (specify utility of late results)
–Delivery-Based, Value-Based (specify utility of partial results)
QoS Influences…
–Scheduling, Storage Management, Load Shedding
[Figure: QoS graphs A, B, C. Axis labels: Delay, % Tuples Delivered, Output Value]
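A delay-based QoS specification can be pictured as a curve that maps the delay of an output tuple to a utility value. The following is a minimal sketch under stated assumptions: the function name `delay_utility`, the piecewise-linear shape, and the breakpoints in `CURVE` are all hypothetical and are not taken from the talk.

```python
from bisect import bisect_right

def delay_utility(delay, points):
    """Linearly interpolate utility at `delay` from sorted (delay, utility) points."""
    xs = [d for d, _ in points]
    if delay <= xs[0]:
        return points[0][1]
    if delay >= xs[-1]:
        return points[-1][1]
    # Index of the first breakpoint strictly to the right of `delay`.
    i = bisect_right(xs, delay)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (delay - x0) / (x1 - x0)

# Hypothetical curve: full utility up to 1 s of delay, falling to zero at 5 s.
CURVE = [(0.0, 1.0), (1.0, 1.0), (5.0, 0.0)]
```

A scheduler or load shedder could consult such a curve to estimate how much utility is lost by delaying an output further.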

11 Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

12 Runtime Operation: Basic Architecture
[Figure: basic architecture. Components: inputs, Router, Storage Manager (Buffer holding queues q1, q2, …, qn; Persistent Store), Scheduler, Box Processors, QoS Monitor, Catalog, outputs]

13 Runtime Operation: Scheduling, Maximize Overall QoS
Choice 1: Box A, Cost: 1 sec (…, age: 1 sec): Delay = 2 sec, Utility = 0.5
Choice 2: Box B, Cost: 2 sec (…, age: 3 sec): Delay = 5 sec, Utility = 0.8
Schedule Box A now rather than later
Ideal: Maximize Overall Utility (feedback driven)

14 Runtime Operation: Scheduling, Minimizing Per-Tuple Processing Overhead
Train Scheduling (input tuples …, x, y, z flow through box A, then box B; the diagram marks each context switch)
–Default Operation: A(x), A(y), A(z), B(A(x)), B(A(y)), B(A(z))
–Box Trains: B(A(x)), B(A(y)), B(A(z))
–Tuple Trains: A(z, y, x), B(A(z), A(y), A(x))
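The three execution modes on this slide can be compared by counting how often the scheduler has to dispatch work. The sketch below is illustrative only: the function names are hypothetical, boxes are modeled as plain one-argument functions, and the `calls` counter is an assumed stand-in for per-dispatch overhead such as context switches.

```python
def run_default(boxes, tuples):
    """One scheduler call per (box, tuple) pair."""
    calls = 0
    batch = list(tuples)
    for box in boxes:
        out = []
        for t in batch:
            out.append(box(t))
            calls += 1
        batch = out
    return batch, calls

def run_tuple_trains(boxes, tuples):
    """One scheduler call per box: each box consumes a whole train of tuples."""
    calls = 0
    batch = list(tuples)
    for box in boxes:
        batch = [box(t) for t in batch]
        calls += 1
    return batch, calls

def run_box_trains(boxes, tuples):
    """One scheduler call per tuple: each tuple is pushed through a train of boxes."""
    calls = 0
    out = []
    for t in tuples:
        for box in boxes:
            t = box(t)
        out.append(t)
        calls += 1
    return out, calls
```

With two boxes and three tuples, the default mode makes six dispatches, box trains make three, and tuple trains make two, while all three produce the same output.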

15 Runtime Operation: Storage Management
1. Run-time Queue Management
–Prefetch Queues Prior to Being Scheduled
–Drop Tuples from Queues to Improve QoS
2. Connection Point Management
–Support Efficient (Pull-Based) Access to Historical Data
–E.g., indexing, sorting, clustering, …

16 Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Query Optimization and Adaptivity
5. Conclusions

17 Stream Query Optimization Differences with Traditional Query Optimization?

18 Stream Query Optimization
–New classes of operators (windows) may mean new rewrites
–New execution modes (continuous/pipelining)
–Statistics fluctuate more dynamically, so compile-time optimization is not possible
–Global optimization is not practical because query networks are huge, so optimization must be adaptive
–Other cost models: taking memory into account, optimizing output rate rather than throughput, etc.
–Query optimization and load shedding

19 Query Optimization
Compile-time, Global Optimization Infeasible
–Too Many Boxes
–Too Much Volatility in Network, Data
Dynamic, Local Optimization
–Threshold for when to optimize

20 Motivation of ‘Query Migration’
Continuous query over streams
–Statistics unknown before start
–Statistics changing during execution (stream rates, arrival pattern, distribution, etc.)
Need for dynamic adaptation
–Plan re-optimization: change the shape of the query plan tree

21 Run-time Plan Re-Optimization
Step 1 – Decide when to optimize
–Statistics Monitoring
Step 2 – Generate new query plan
–Query Optimization
Step 3 – Replace current plan by new plan
–Plan Migration

22 Adaptivity in Query Optimization
Dynamic Optimization: Migration
1. Identify Subnetwork
2. Buffer Inputs
3. Drain Subnetwork
4. Optimize Subnetwork
5. Turn on Taps

23 Naïve Plan Migration Strategy
Migration Steps
–Pause execution of old plan
–Drain out all tuples inside old plan
–Replace old plan by new plan
–Resume execution of new plan
[Figure: old and new join plans over inputs A, B, C, with intermediate results AB and BC]
Problem: Works for stateless operators only

24 Stateful Operator in CQ
Why stateful
–Need non-blocking operators in CQ
–Operator needs to output partial results
–State data structure keeps received tuples
Example: Symmetric NL join w/ window constraints
[Figure: join AB with State A holding ax and State B holding b1 to b5; outputs ax b2 and ax b3]
Key Observation: The purge of tuples in states relies on processing of new tuples.
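The key observation can be seen in a small sketch of a symmetric join with a time window. This is a minimal illustration rather than Aurora's operator: the class name `SymmetricWindowJoin`, the purge rule (drop tuples older than `window` relative to the arriving tuple), and the assumption that timestamps arrive in nondecreasing order are all hypothetical.

```python
class SymmetricWindowJoin:
    """Symmetric nested-loop join keeping one state (tuple list) per input."""

    def __init__(self, window, predicate):
        self.window = window          # maximum timestamp distance for a match
        self.predicate = predicate    # predicate(a_value, b_value) -> bool
        self.state = {"A": [], "B": []}

    def insert(self, side, ts, value):
        other = "B" if side == "A" else "A"
        # Purge: only an arriving tuple triggers removal of expired tuples
        # from the opposite state.
        self.state[other] = [(t, v) for (t, v) in self.state[other]
                             if ts - t <= self.window]
        # Probe the opposite state and emit partial results immediately.
        results = []
        for _, v in self.state[other]:
            a, b = (value, v) if side == "A" else (v, value)
            if self.predicate(a, b):
                results.append((a, b))
        # Insert the new tuple into its own state for future probes.
        self.state[side].append((ts, value))
        return results
```

If input stops arriving on one side, the opposite state is never purged, which is why simply pausing and draining a plan does not empty a stateful operator.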

25 Naïve Migration Strategy Revisited
Steps
(1) Pause execution of old plan
(2) Drain out all tuples inside old plan
(3) Replace old plan by new plan
(4) Resume execution of new plan
[Figure: old and new join plans over A, B, C, annotated "(2) All tuples drained", "(3) Old replaced by new", "(4) Processing resumed"]
Problem: Deadlock Waiting

26 Adaptivity: Query Optimization
–State Movement Protocol
–Parallel Track Protocol

27 Moving State Strategy
Basic idea
–Share common states between two migration boxes
Key steps
–State Matching: match states based on IDs
–State Moving: create new pointers for matched states in new box
–What’s left? Unmatched states in new box
[Figure: Old Box and New Box, each a tree of joins over input queues QA, QB, QC, QD producing QABCD. Old box states: SA, SB, SAB, SC, SABC, SD. New box states: SA, SB, SC, SBC, SD, SBCD]
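State matching and state moving can be sketched as a dictionary lookup that shares references instead of copying tuples. This is a rough illustration, not the protocol from the talk: the function `migrate_states` is hypothetical, and the state IDs in the usage below are chosen to loosely follow the labels in the slide's figure.

```python
def migrate_states(old_states, new_state_ids):
    """Share states whose IDs match; report new states that have no match.

    old_states: dict mapping state ID -> list of tuples held by the old box.
    new_state_ids: state IDs required by the new box.
    """
    new_states = {}
    unmatched = []
    for sid in new_state_ids:
        if sid in old_states:
            # State moving: point at the same list object, no tuples copied.
            new_states[sid] = old_states[sid]
        else:
            # Left over: this state must be filled in some other way.
            new_states[sid] = []
            unmatched.append(sid)
    return new_states, unmatched

old = {"A": [1], "B": [2], "AB": [(1, 2)], "C": [3], "ABC": [], "D": [4]}
new, unmatched = migrate_states(old, ["A", "B", "C", "BC", "D", "BCD"])
```

Here the single-input states are shared directly, while the intermediate states `BC` and `BCD` remain unmatched and correspond to the "What's left?" item on the slide.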

28 Parallel Track Strategy
Basic idea
–Execute both plans in parallel and gradually “push” old tuples out of the old box by purging
Key steps
–Connect boxes
–Execute in parallel until the old box has “expired” (no old tuple or sub-tuple)
–Disconnect old box
–Start executing the new box only
[Figure: old and new boxes, each a tree of joins over input queues QA, QB, QC, QD, connected to the shared output queue QABCD]

29 Adaptivity: Load Shedding
1. Two Load Shedding Techniques:
–Random Tuple Drops: add a DROP box to the network (DROP is a special case of FILTER); position it to affect queries with tolerant delivery-based QoS requirements
–Semantic Load Shedding: FILTER values with low utility (according to value-based QoS)
2. Triggered by QoS Monitor
–e.g., after Latency Analysis reveals certain applications are continuously receiving poor QoS
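The two shedding techniques differ only in how they decide which tuples to discard. The sketch below models both as generator-based filters; the names `random_drop` and `semantic_drop`, the `drop_rate` and `threshold` parameters, and the utility function are assumptions made for illustration.

```python
import random

def random_drop(stream, drop_rate, rng=None):
    """DROP box: discard each tuple independently with probability drop_rate."""
    rng = rng if rng is not None else random.Random()
    for t in stream:
        if rng.random() >= drop_rate:
            yield t

def semantic_drop(stream, utility, threshold):
    """Semantic shedding: keep only tuples whose utility reaches the threshold."""
    for t in stream:
        if utility(t) >= threshold:
            yield t

# Hypothetical value-based utility: larger readings matter more.
kept = list(semantic_drop(range(10), lambda v: v / 10, 0.5))
```

Random drops reduce load without inspecting tuple contents, whereas semantic shedding needs a value-based QoS function to rank tuples.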

30 Adaptivity: Detecting Overload
Throughput Analysis
–Per box: Cost = c, Selectivity = s, Input rate = r
–Output rate = min(1/c, r) * s
–r > 1/c ⇒ Problem (tuples arrive faster than the box can process them)
Latency Analysis
–Monitor each application’s Delay-based QoS
–Problem: Too many apps in “bad zone”
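The throughput formula can be applied box by box along a path to find where a network saturates. This is a minimal sketch assuming a linear chain of boxes and reading a box as overloaded when its input rate r exceeds its capacity 1/c; the function names and the example costs are hypothetical.

```python
def output_rate(cost, selectivity, input_rate):
    """Output rate = min(1/c, r) * s for a single box."""
    return min(1.0 / cost, input_rate) * selectivity

def analyze_chain(boxes, input_rate):
    """Propagate rates through a chain of (cost, selectivity) boxes.

    Returns the final output rate and the indices of overloaded boxes.
    """
    overloaded = []
    rate = input_rate
    for i, (cost, selectivity) in enumerate(boxes):
        if rate > 1.0 / cost:
            overloaded.append(i)
        rate = output_rate(cost, selectivity, rate)
    return rate, overloaded

# Hypothetical chain: box 0 handles 8 tuples/sec, box 1 only 2 tuples/sec.
final_rate, hot_boxes = analyze_chain([(0.125, 0.5), (0.5, 1.0)], input_rate=6.0)
```

In this example the first box keeps up with the input, but its output of 3 tuples/sec exceeds the second box's capacity of 2 tuples/sec, so the second box is flagged.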

31 Implementation GUI

32 Implementation Runtime

33 Conclusions
Aurora Stream Query Processing System
1. Designed for Scalability
2. QoS-Driven Resource Management
3. Continuous and Historical Queries
4. Stream Storage Management
5. Implemented Prototype
See Stream Web site at Brown Univ.

