Analysis of: Operator Scheduling in a Data Stream Manager. CS561 – Advanced Database Systems. By Eric Bloom.



Agenda: Overview of Stream Processing, Aurora Project Goals, Aurora Processing Example, Aurora Architecture, Multi-Thread vs. Single-Thread Processing, Important Definitions, Superbox Scheduling and Processing, Tuple Batching, Experimental Evaluation, Quality of Service (QoS) Scheduling, QoS Scheduling Scalability, Related Work

Overview of Stream Processing Stream processing is the processing of potentially unbounded, continuous streams of data. Data streams are created by micro-sensors, GPS devices, and monitoring devices. Examples include soldier location tracking, traffic sensors, stock market exchanges, and heart monitors. Data may arrive evenly or in bursts.

Aurora Project Goals To build a data stream manager that addresses the performance and processing requirements of stream-based applications. To support multiple concurrent continuous queries on one or more application data streams. To use Quality-of-Service (QoS) based criteria to make resource allocation decisions.

Aurora Processing Example [diagram: input data streams flow through operator boxes to application outputs; continuous and ad hoc queries run over the streams, backed by historical storage]

Aurora Architecture [diagram: inputs are routed through box processors (B1–B4) to outputs; supporting components include the persistent store, buffer manager, scheduler, catalogs, load shedder, QoS monitor, and router]

Multi-Thread vs. Single-Thread Processing Multi-Thread Processing –Each query is processed in its own thread –The operating system manages resource allocation –Advantages Processing can take advantage of efficient operating system algorithms Easier to program –Disadvantages Software has limited control over resource management Additional overhead due to cache misses, lock contention, and thread switching

Multi-Thread vs. Single-Thread Processing Single-Thread Processing –All operations are processed within a single thread –All resource allocation decisions are made by the scheduler –Advantages Allows processing to be scheduled based on latency and other Quality of Service factors tied to query needs Avoids the limitations of multi-thread processing –Disadvantages More complex to program Aurora has chosen to implement a single-threaded scheduling model

Important Definitions Quality of Service (QoS) – Specific requirements that represent the needs of a specific query. In Aurora, the primary QoS factor is latency. Query Tree – The set of operators (boxes) and data streams that represent a query. Superbox – A sequence of operators that are scheduled and executed as an atomic group. Aurora treats each query as a separate superbox. Two-Level Scheduling – Scheduling is done at two levels: first deciding which superbox to process, and second deciding the order in which to execute the operators within the selected superbox.

Important Definitions (Cont.) Scheduling Plan – The combination of dynamic superbox scheduling and algorithm-based operator execution order within the superbox. Application-at-a-time (AAAT) – A term used in Aurora that statically defines each query (application) as a superbox. Box-at-a-time (BAAT) – Scheduling at the box level rather than the superbox level. Static and dynamic scheduling approaches – Static approaches to scheduling are defined prior to runtime; dynamic approaches use runtime information and statistics to adjust and prioritize scheduling order during execution. Traversing a superbox – How the operators within a superbox should be scheduled and executed.
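The two-level scheme can be sketched in a few lines of Python. All names and data structures here are hypothetical illustrations, not Aurora's actual code: the first level dynamically picks a superbox, and the second level applies a traversal policy to order the boxes inside it.

```python
# Minimal sketch of two-level scheduling (hypothetical structures).
# Level 1 picks which superbox (query) to run; level 2 orders the
# operator boxes inside the chosen superbox via a traversal policy.

def pick_superbox(superboxes, priority):
    # Level 1: choose the superbox with the highest priority value.
    return max(superboxes, key=priority)

def schedule(superboxes, priority, traversal):
    # A scheduling plan: dynamic superbox selection plus an
    # algorithmic operator order within the selected superbox.
    chosen = pick_superbox(superboxes, priority)
    return list(traversal(chosen))

# Toy example: two superboxes, priority = queued tuple count,
# traversal = the superbox's statically defined box order.
q1 = {"name": "Q1", "boxes": ["B1", "B2"], "queued": 10}
q2 = {"name": "Q2", "boxes": ["B3", "B4", "B5"], "queued": 25}

plan = schedule([q1, q2], priority=lambda s: s["queued"],
                traversal=lambda s: s["boxes"])
print(plan)  # Q2 has more queued tuples, so its boxes are scheduled
```

The point of the split is that the expensive dynamic decision is made once per superbox, while the per-box order comes from a cheap precomputed policy.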

Non-Superbox Processing [diagram]

Superbox Processing [diagram: operator boxes B1–B6 with queue contents A1…A5 and C1…C5]

Superbox Traversal Superbox traversal refers to how the operators within a superbox should be executed. Min-Cost (MC) – Attempts to optimize per-output-tuple processing cost by minimizing the number of operator calls per output tuple. Min-Latency (ML) – Attempts to produce initial output tuples as soon as possible. Min-Memory (MM) – Attempts to minimize memory usage.

Superbox Traversal Processing Min-Cost (MC) –B4 > B5 > B3 > B2 > B6 > B1 Min-Latency (ML) –B1 > B2 > B1 > B6 > B1 > B4 > B2 > B1 > B3 > B2 > B1 > B5 > B3 > B2 > B1 Min-Memory (MM) –B3 > B6 > B2 > B5 > B3 > B2 > B1 > B4 > B2 > B1 [diagram: example operator tree with boxes B1–B6]
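A min-cost traversal can be sketched as a bottom-up walk of the operator graph: each box is called exactly once, only after every box feeding it has run. The topology below is an assumption made for illustration (the real graph is in the slide's figure), so the order produced here need not match the MC sequence listed above.

```python
# Sketch of a min-cost (MC) superbox traversal over an assumed topology.
# Each operator box runs exactly once, after all of its upstream boxes,
# which minimizes operator calls per output tuple.

def min_cost_traversal(feeds_into, output_box):
    # feeds_into maps each box to the downstream box it sends tuples to.
    # Build the reverse map: which boxes feed each box.
    fed_by = {}
    for src, dst in feeds_into.items():
        fed_by.setdefault(dst, []).append(src)

    order = []
    def visit(box):
        for upstream in sorted(fed_by.get(box, [])):
            visit(upstream)
        order.append(box)  # run a box only after its upstream boxes
    visit(output_box)
    return order

# Assumed toy tree: B4 and B5 feed B3; B3 and B6 feed B2; B2 feeds
# the output box B1.
feeds = {"B4": "B3", "B5": "B3", "B3": "B2", "B6": "B2", "B2": "B1"}
print(min_cost_traversal(feeds, "B1"))
```

A min-latency traversal would instead repeatedly run the partial path to the output box, which is why the ML sequence above revisits B1 and B2 many times.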

Tuple Batching (Train Processing) A tuple train is a batch of tuples executed within a single operator call. The goal of tuple train processing is to reduce the overall processing cost per tuple. Advantages of tuple train processing: –Decreases the total number of operator executions –Cuts down on low-level overhead such as context switching, scheduling, memory management, and execution queue maintenance –Some windowing and merge-join operators work more efficiently when batching tuples
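The overhead argument can be illustrated with a back-of-the-envelope cost model. The overhead and per-tuple costs below are made-up numbers, chosen only to show how batching amortizes the fixed per-call cost:

```python
# Toy cost model for tuple trains: batching amortizes the fixed
# per-call overhead (scheduling, context switching, queue maintenance)
# across many tuples. All cost constants are assumptions.

CALL_OVERHEAD = 5    # assumed fixed cost of one operator invocation
PER_TUPLE_COST = 1   # assumed cost of processing a single tuple

def cost_one_at_a_time(n_tuples):
    # Every tuple pays the full invocation overhead.
    return n_tuples * (CALL_OVERHEAD + PER_TUPLE_COST)

def cost_train(n_tuples, train_size):
    # Only one invocation overhead per train of tuples.
    calls = -(-n_tuples // train_size)  # ceiling division
    return calls * CALL_OVERHEAD + n_tuples * PER_TUPLE_COST

print(cost_one_at_a_time(100))   # 600
print(cost_train(100, 20))       # 5 calls: 25 + 100 = 125
```

With these numbers, trains of 20 tuples cut total cost by almost 5x, and the saving grows with the ratio of call overhead to per-tuple work.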

Experimental Evaluation Definitions Stream-based applications do not currently have a standardized benchmark Aurora modeled queries as a rooted tree structure from a stream input box to an application output box Trees are categorized based on depth and fan-out –Depth is the number of box levels from input to output –Fan-out is the average number of children of each box

Experimental Evaluation Results At low volumes, round-robin box-at-a-time (RR-BAAT) scheduling was almost as efficient as minimum-cost application-at-a-time (MC-AAAT) scheduling, but it was much less efficient at higher volumes –At low volumes, the efficiencies of MC-AAAT were offset by its more complex scheduling overhead –As volumes increased, the efficiencies of MC-AAAT became more apparent as scheduling overhead became a smaller percentage of total processing Experimentation was also done to compare the ML, MC, and MM scheduling techniques –As expected, each technique minimized its specified attribute (latency, cost, and memory, respectively) –However, at very low processing levels the simplest algorithms tended to do best (but who cares :)

Quality of Service (QoS) Scheduling Definitions –Utility – how useful the tuple will be when it exits the query –Urgency – the angle of the downward slope of the utility QoS graph; in other words, how fast the utility deteriorates Approach –Keep track of the latency of tuples that reside in the queues, and pick for processing the tuples whose execution will provide the highest aggregate QoS delivered to the applications

Latency-Utility Relationship [chart: quality of service (from 1 down to 0) as a function of latency, with critical points where utility begins to decay: the older the data gets, the less it is worth, and the lower the quality of service] Aurora combines the QoS charts of each query being executed with the average latency of the tuples in each box to decide which superbox to execute next. The idea is to maintain, on average, the highest quality of service.
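A piecewise-linear latency-utility graph and one plausible selection policy built on it can be sketched as follows. The critical points, slopes, and the policy itself are illustrative assumptions, not Aurora's published algorithm:

```python
# Sketch of a latency-utility graph and a QoS-driven selection policy.
# All numbers and the policy choice are illustrative assumptions.

def utility(latency, critical_point, slope):
    # Full utility up to the critical point, then linear decay to zero;
    # the slope is the query's urgency.
    if latency <= critical_point:
        return 1.0
    return max(0.0, 1.0 - slope * (latency - critical_point))

def pick_next(queries):
    # One plausible policy: run the query whose queued tuples have lost
    # the most utility, since delaying it hurts aggregate QoS the most.
    return min(queries, key=lambda q: utility(q["avg_latency"],
                                              q["critical"], q["slope"]))

queries = [
    {"name": "Q1", "avg_latency": 2.0, "critical": 5.0, "slope": 0.2},
    {"name": "Q2", "avg_latency": 9.0, "critical": 5.0, "slope": 0.2},
]
print(pick_next(queries)["name"])  # Q2's utility has decayed, so it runs first
```

Note that the latencies fed into `utility` are per-box averages, matching the scalability fix described on the next slide.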

QoS Scheduling Scalability Problem –A per-tuple approach to QoS-based scheduling will not scale because of the amount of processing needed to maintain it Solution –Latency is not calculated at the tuple level; rather, it is calculated as the average latency of the tuples in the box input queue –Priority is given based on the combination of utility and urgency –Once a box's priority (priority tuple, or "p-tuple") is calculated, the boxes are placed in logical buckets based on their priority value –Scheduling is then done based on the priority of the bucket –All boxes in a given bucket are considered equal
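The bucketed scheme can be sketched as follows. The bucket width and priority values are made up, and a real p-tuple would combine utility and urgency rather than being a single number:

```python
# Sketch of bucketed priority scheduling (hypothetical bucket width).
# Each box gets a scalar priority (standing in for its "p-tuple"),
# boxes fall into coarse buckets, and scheduling walks the buckets
# from highest to lowest; boxes within a bucket are treated as equal.

from collections import defaultdict

BUCKET_WIDTH = 0.25  # assumed granularity of the priority buckets

def bucketize(boxes):
    buckets = defaultdict(list)
    for name, priority in boxes:
        buckets[int(priority / BUCKET_WIDTH)].append(name)
    return buckets

def run_order(boxes):
    order = []
    buckets = bucketize(boxes)
    for key in sorted(buckets, reverse=True):  # highest bucket first
        order.extend(buckets[key])             # intra-bucket order arbitrary
    return order

boxes = [("B1", 0.9), ("B2", 0.3), ("B3", 0.8), ("B4", 0.1)]
print(run_order(boxes))  # B1 and B3 share the top bucket
```

Coarse buckets are what make the scheme scale: priorities need only be accurate to within one bucket, so they can be recomputed lazily from per-queue averages instead of per tuple.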

Related Work Eddies – has a tuple-at-a-time scheduler that provides adaptability, but does not scale well Urhan – works on rate-based pipeline scheduling of data between operators NiagaraCQ – query optimization for streaming data from wide-area information sources STREAM – provides comprehensive data stream management using chain scheduling algorithms Note that none of the above projects has a notion of QoS