Runtime Optimization of Continuous Queries Balakumar K. Kendai and Sharma Chakravarthy Information Technology Laboratory Department of Computer Science.


Runtime Optimization of Continuous Queries Balakumar K. Kendai and Sharma Chakravarthy Information Technology Laboratory Department of Computer Science and Engineering THE UNIVERSITY OF TEXAS AT ARLINGTON

Introduction  Data Stream Management System (DSMS)  Processes queries over continuous streams of data  Supports monitoring applications  Needs to provide real-time or near real-time response  Continuous queries (CQ)  Evaluated continuously producing an output stream  Quality of Service (QoS) is important  Examples of streaming applications  GPS, Network/Traffic monitoring, Stock feeds  Sensor, RFID

Current MavStream Architecture
[Architecture diagram of the MavStream Server: an Input Processor receives user input and feeds the Instantiator; data streams (S1, S2, S3) enter through a Feeder into the ready queues of operators (P1, P2, J1, J2), which are executed by Schedulers under a Master Scheduler; the Run-Time Optimizer (System Monitor and Decision Maker) observes monitored QoS on the output streams. Arrows distinguish control flow from data flow.]

Modules  Input processor  Processes queries to create query plan object  Instantiator  Creates actual instances of operators  CQ execution  Each operator is modeled as a separate thread  Select, Project  Window based - Join, Aggregates  Scheduling Strategies [Kendai:BNCOD’08]  Round robin (RR) / Weighted round robin Each operator allocated equal amount of time (based on priority)  Path capacity scheduling (PCS) Lowest tuple latency.  Segment scheduling Lowest memory utilization.  Simplified segment scheduling. (SS) Intermediate performance  Each scheduling strategy executes as a separate thread    Query Tree

Motivation  Why Runtime Optimization?  Multiple QoS requirements Tuple Latency : –Delay in results –Difference between the entry and exit time of a tuple Memory Utilization –Size of tuples in the input queues of operators Throughput : –Tuples produced each second

Motivation  Why Runtime Optimization?  Various scheduling strategies Scheduling strategies optimize different QoS measures Chain: –Low memory usage and high latency Path Capacity Strategy –Low latency and high memory usage  Long running queries and bursty input  Which scheduling strategy to use for a query? Based on QoS measures of interest Adapt scheduling strategy of queries to meet requirements of QoS

Alternatives for Optimization
- Need to choose the best scheduling strategy
- Input based
  - When the input rate changes drastically
  - Monitor selectivity
  - QoS requirements are not considered
- QoS based
  - Driven by the difference between expected and monitored values
  - Finer control

Runtime Optimizer Architecture  Feedback control system  Optimizes queries individually  System Monitor  Monitors QoS parameters  Provides monitored values to decision maker  Decision Maker  Selects scheduling strategy  Activate and Deactivates Load shedders Master Scheduler Schedulers O/P Streams Monitored QoS Ready Queues P1 S1S2 J1 Run-Time Optimizer Decision Maker System Monitor Feedback

Runtime Optimizer Architecture  Inputs  Prioritized QoS measures of interest  Expected values for QoS measures  Monitored values for QoS measures Provided by System Monitor  Logic / Decision Making  Decision table driven approach to choose a scheduling strategy  Output /Actions  Choose a strategy to improve QoS  Update monitoring interval

Piecewise Approximation of QoS
- Fixed value
  - Inflexible: expected values may not be the same for all time periods
- Multiple values
  - Modeled as piecewise linear functions, ordered by time
  - X values: time intervals
  - Y values: QoS (tuple latency, memory utilization, or throughput)
[Plots of latency vs. time: a fixed expected value and a piecewise linear specification]

QoS from Piecewise Functions
- Flexibility to specify single or multiple intervals
- QoS specification
  - List of (x, y) values; a pair of (x, y) points forms an interval
- Expected values
  - Within an interval: calculated using the slope and boundary values
  - Between intervals: extrapolated using the values on either side
  - At time points outside all intervals: endpoint values are extrapolated
- Provides a value for comparison throughout the lifetime of the query
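The expected-value computation described above can be sketched as follows. This is a minimal Python illustration, assuming the specification is a time-ordered list of (time, value) pairs; the function name and data layout are hypothetical, not MavStream's actual API:

```python
def expected_qos(spec, t):
    """Expected QoS value at time t from a piecewise linear specification.

    spec: time-ordered list of (x, y) points; consecutive points form
    intervals. Within an interval the value is interpolated from the slope
    and boundary values; outside all intervals the endpoints are extended,
    so a comparison value exists throughout the query's lifetime.
    """
    if t <= spec[0][0]:      # before the first point: extend the endpoint
        return spec[0][1]
    if t >= spec[-1][0]:     # after the last point: extend the endpoint
        return spec[-1][1]
    for (x1, y1), (x2, y2) in zip(spec, spec[1:]):
        if x1 <= t <= x2:    # interpolate using the interval's slope
            slope = (y2 - y1) / (x2 - x1)
            return y1 + slope * (t - x1)

# Latency expected to hold at 0.5 s from t=1 to t=10, then rise to 2 s by t=20
latency_spec = [(1, 0.5), (10, 0.5), (20, 2.0)]
```

For example, at t=15 the value is interpolated halfway between 0.5 and 2.0, giving 1.25, while any t before 1 or after 20 simply reuses the nearest endpoint.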

Priority Classes and Usage
- QoS measures: tuple latency, memory utilization, throughput
- The algorithm is independent of the number of priority classes
- Initial weights can be assigned

  Priority class | Actions                                          | Initial weight
  Must Satisfy   | Select best scheduling strategy; load shedding   | 1
  Best Effort    | Select best scheduling strategy                  | 0.5
  Don't Care     | Select scheduling strategy                       | 0.01

Need for a Decision Table
- The system supports multiple scheduling strategies with different characteristics
- Example
  - Tuple latency in the Must Satisfy class: PCS is the best strategy, Segment the worst
  - Memory utilization in the Must Satisfy class: Segment is the best, PCS the worst
  - Both tuple latency and memory utilization in the Must Satisfy class: Simplified Segment (SS) is better for both
  - Possible combinations increase with more strategies and QoS measures
- Hardwired logic
  - Inflexible; decision logic mixed with code
- Decision table
  - Easy to extend or modify
  - What-if scenarios can be easily explored

Ranking of Scheduling Strategies
- Each strategy is relatively ranked for each QoS measure
- Example with four strategies (4 indicates the best strategy, 1 the worst):

                       RR   PCS   Segment   SS
  Tuple latency         2    4       1       3
  Memory utilization    2    1       4       3
  Throughput            2    4       1       3

- Adding a strategy only revises the decision table; the runtime optimizer logic does not change. With Chain added as a fifth strategy:

                       RR   PCS   Segment   SS   Chain
  Tuple latency         3    5       2       4     1
  Memory utilization    2    1       4       3     5
  Throughput            3    5       2       4     1

Example: Decision Making
Decision table:
                       RR   PCS   Segment   SS
  Tuple latency         2    4       1       3
  Memory utilization    2    1       4       3
  Throughput            2    4       1       3

Input (case 1): tuple latency is Must Satisfy, initial weight 1
Tuple latency violated:
  RR:      1*(2/4) = 0.5
  PCS:     1*(4/4) = 1
  Segment: 1*(1/4) = 0.25
  SS:      1*(3/4) = 0.75

Input (case 2): tuple latency and memory utilization are both Must Satisfy, initial weight 1 each
Tuple latency and memory utilization violated:
  RR:      1*(2/4) + 1*(2/4) = 1
  PCS:     1*(4/4) + 1*(1/4) = 1.25
  Segment: 1*(1/4) + 1*(4/4) = 1.25
  SS:      1*(3/4) + 1*(3/4) = 1.5
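The scoring above can be sketched in a few lines of Python. This is an illustration of the decision-table approach, not MavStream's code; the rank values are the four-strategy table from the deck, and the function names are hypothetical:

```python
# Relative ranks from the decision table (higher = better for that measure)
RANKS = {
    "latency":    {"RR": 2, "PCS": 4, "Segment": 1, "SS": 3},
    "memory":     {"RR": 2, "PCS": 1, "Segment": 4, "SS": 3},
    "throughput": {"RR": 2, "PCS": 4, "Segment": 1, "SS": 3},
}
MAX_RANK = 4  # four strategies, so ranks run 1..4

def score_strategies(violated, weights):
    """Score each strategy as the weighted sum of its normalized ranks
    over the violated QoS measures."""
    return {s: sum(weights[m] * RANKS[m][s] / MAX_RANK for m in violated)
            for s in ("RR", "PCS", "Segment", "SS")}

def choose_strategy(violated, weights):
    """Pick the highest-scoring strategy for the violated measures."""
    scores = score_strategies(violated, weights)
    return max(scores, key=scores.get)
```

With tuple latency alone violated at weight 1, PCS scores highest (1.0); with both latency and memory violated, SS wins at 1.5, matching the computation on the slide.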

QoS Measures Belong to Different Priority Classes
- Multilevel approach for selecting strategies: QoS measures are considered level by level
- Best Effort QoS measures are considered only when Must Satisfy measures are met
- Don't Care QoS measures are considered only when Must Satisfy and Best Effort measures are met

Reduction of Weights
- Must Satisfy measures would otherwise dominate when Best Effort measures are considered
- Keep track of the margin by which each measure is satisfied, and use it to reduce weights
  - The higher the margin by which a measure is satisfied, the more its weight can be lowered without affecting the measure
- Margin = (Expected - Observed) / Expected
- Lowest weight allowed > initial weight of the next lower priority class

Example: Decision Making
Decision table:
                       RR   PCS   Segment   SS
  Tuple latency         2    4       1       3
  Memory utilization    2    1       4       3
  Throughput            2    4       1       3

Input:
  Tuple latency       Must Satisfy   initial weight 1
  Memory utilization  Best Effort    initial weight 0.5
  Throughput          Don't Care     initial weight 0.01

Tuple latency satisfied, memory utilization violated:
  Expected tuple latency 2, observed 1; reduction percentage = (2 - 1)/2 = 0.5
  Reduced weight = initial weight - (reduction percentage * weight range)
                 = 1 - (0.5 * (1 - 0.5)) = 0.75

  RR:      0.75*(2/4) + 0.5*(2/4) = 0.625
  PCS:     0.75*(4/4) + 0.5*(1/4) = 0.875
  Segment: 0.75*(1/4) + 0.5*(4/4) = 0.6875
  SS:      0.75*(3/4) + 0.5*(3/4) = 0.9375
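The margin-based weight reduction can be sketched as follows. This is a simplified Python reading of the slide's formula, assuming the weight range is the gap between the class's initial weight and the next lower class's initial weight, as in the 1 - 0.5 example:

```python
def reduced_weight(initial_weight, next_lower_weight, expected, observed):
    """Lower a satisfied measure's weight in proportion to its margin.

    The result never drops below next_lower_weight, so a satisfied
    higher-priority measure cannot fall beneath the next priority class.
    """
    margin = (expected - observed) / expected   # how comfortably satisfied
    weight_range = initial_weight - next_lower_weight
    return initial_weight - margin * weight_range

# Tuple latency: expected 2 s, observed 1 s -> margin 0.5.
# Must Satisfy starts at weight 1; the next class (Best Effort) starts at 0.5.
```

The reduced weight of 0.75 for tuple latency is what the slide then combines with memory utilization's weight of 0.5, letting the violated Best Effort measure influence the choice (SS) without being drowned out.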

Design Issues
- Delays due to strategy switching
- Synchronization
  - Moving schedulable objects between ready queues
- Overhead
  - Number of switches
  - Monitoring frequency

Look-Ahead Time
- A query has to execute under a new strategy for some period of time before changes appear
- The expected QoS may not be the same at that later time, so look ahead for expected QoS values
- Compare monitored QoS values with expected QoS values at time + ∆t
- ∆t > time to switch + time to schedule all operators once with the new strategy
- Can reduce unwanted switches

Overhead
- Decision-making overhead
  - Time taken to evaluate all measures and choose a strategy
  - Fixed number of strategies and fixed number of measures, so it can be considered constant
- Overhead of actions to change the scheduling strategy
  - Proportional to the number of switches (worst case: the number of times monitored)
- Overhead can be reduced by reducing the number of switches

Monitoring Frequency
- Monitoring frequency determines the maximum number of switches
  - With a low monitoring frequency the runtime optimizer may not react in time
- Determining the monitoring interval
  - Number of times to monitor in each interval: at least at the begin, center, and end points of an interval
  - Minimum time between monitoring cycles
    - All operators in a query must be scheduled for some period of time under a strategy for its effects to become visible
    - t1 = m * (time required to schedule all operators), m >= 1
  - Maximum time between monitoring cycles
    - t2 = n * (time required to schedule all operators)
    - A query should be monitored at least every t2 seconds
  - m and n can be configured, m << n; defaults: m = 2, n = 10
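The t1/t2 bounds above can be sketched as a small helper. This is an illustrative Python fragment, not MavStream's implementation; the function names and the idea of clamping a desired gap into [t1, t2] are assumptions consistent with the slide:

```python
def monitoring_bounds(schedule_time, m=2, n=10):
    """Minimum and maximum time between monitoring cycles.

    schedule_time: time required to schedule all operators of the query once.
    Defaults m=2, n=10 as on the slide; m >= 1 and m << n.
    """
    t1 = m * schedule_time   # wait at least this long for effects to show
    t2 = n * schedule_time   # monitor at least this often
    return t1, t2

def next_monitor_time(now, desired_gap, schedule_time, m=2, n=10):
    """Clamp the desired gap to the next cycle into [t1, t2]."""
    t1, t2 = monitoring_bounds(schedule_time, m, n)
    return now + min(max(desired_gap, t1), t2)
```

With schedule_time = 0.5 s and the defaults, cycles are spaced between 1 s and 5 s apart regardless of how soon or late the QoS specification would otherwise suggest checking.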

Runtime Optimizer Flowchart
[Flowchart: compare monitored QoS with expected QoS. If all measures are satisfied, check lower-priority measures. If a measure is violated, get a strategy from the decision table using the violated and satisfied measures; if that strategy differs from the current one, switch to the new strategy.]
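The flowchart can be read as one cycle of a loop. The following Python sketch is hypothetical: `decision_table` stands in for the Decision Maker's lookup, and the comparison assumes measures where a lower observed value is better (as with latency and memory; throughput would flip the test):

```python
def optimizer_cycle(current, monitored, expected, decision_table):
    """One monitoring cycle of the runtime optimizer.

    Compares monitored QoS with expected QoS; on violation, consults the
    decision table and returns the strategy to run next. Returning the
    current strategy means no switch is performed.
    """
    violated  = [m for m in expected if monitored[m] > expected[m]]
    satisfied = [m for m in expected if monitored[m] <= expected[m]]
    if not violated:
        # All measures met: lower-priority measures would be
        # examined next (not shown here); no switch needed.
        return current
    new = decision_table(violated, satisfied)
    return new if new != current else current
```

The switch itself is left to the Master Scheduler, matching the deck's later note that strategy switching is delegated there to avoid blocking the monitor.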

Implementation Details
- Input Processor
  - Processes the list of expected QoS values and creates the QoS Parameters data structure
- Decision Maker
  - Holds the decision table and the algorithm to choose a strategy
  - A hash table contains QoS data for each query
    - QoS Parameters
    - Methods provided: expected QoS values at a given time; next time to monitor a query
- System Monitor
  - Maintains a hash table with the information required to monitor each query
  - Wakes up and monitors output in a cycle
  - Determines how long the monitor should wait between cycles
  - Invokes the Decision Maker

Implementation Issues
- An individual monitoring thread per query has high overhead
- The runtime optimizer changing scheduling strategies raises synchronization issues
  - It can potentially block, delaying the monitoring of other queries
- Switching the strategy for a query is therefore delegated to the Master Scheduler

Experimental Setup
- AMD Opteron 2 GHz, dual core, quad processor
- Red Hat Enterprise Linux AS 4
- JDK 1.5, 16 GB RAM, 8 GB max heap size
- Monitoring interval fixed for comparison purposes
- Query
  - Two joins, 8 operators
  - Three input streams, 2 million tuples each

Multiple QoS Measures – Different Priority
- Mean rates for Poisson distribution: 800, 550, 900 tuples/sec; mean doubled at different points
- Tuple-based window (tuples/window)
- Choice of strategy is left to the runtime optimizer
- Tuple latency: Must Satisfy; (1,0.5)-(10,0.5) seconds
- Memory utilization: Don't Care; (1,10K)-(500,10K), (550,1K)-(3000,1K) kilobytes
- Initial strategy provided: Segment

Multiple QoS Measures – Different Priority
[Results graph]

Multiple QoS Measures – Same Priority
- Mean rates for Poisson distribution: 2000, 1800, 2200 tuples/sec
- Tuple latency: Must Satisfy; (1,0.5)-(10,0.5) seconds
- Memory utilization: Must Satisfy; (1,10K)-(500,10K), (550,1K)-(3000,1K) kilobytes
- Initial strategy provided: Segment

Multiple QoS Measures – Same Priority
[Results graph]

Related Work  Aurora: A New Model and Architecture for Data Stream Management. D. Abadi, et al. In VLDB Journal (12)2: , August 2003  Two level scheduling  Load shedding QoS driven  Borealis, Streambase  STREAM: The Stanford Stream Data Manager. The STREAM Group. IEEE Data Engineering Bulletin, March 2003  CQL  Chain scheduling  Load shedding for aggregation queries

Related Work  TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. Sirish Chandrasekaran, et al CIDR Ingress and caching Query processing Adaptive routing  NiagaraCQ: A Scalable Continuous Query System for Internet Databases. Jianjun Chen, et al SIGMOD 2000, p  Cougar: Towards Sensor Database Systems. Philippe Bonnet, et al International Conference on Mobile Data Management. January 2001.

Related Work  A framework for supporting quality of service requirements in a data stream management system, Qingchun Jiang. PhD Thesis, UTA 2005  Continuous Query Modeling System capacity planning Choose QoS delivery mechanisms QoS verification  Scheduling Strategies Path Capacity Strategy (minimize tuple latency) Segment Strategies (Greedy, MOS, Simplified) Threshold Strategy – hybrid of PC and MOS  Load Shedding System Load Estimation Optimal location of shedders Shedding-load distribution among shedders

To Conclude
- Presented the issues involved in the design, implementation, and evaluation of a runtime optimizer
- Introduced a decision table that stores information about the performance of various scheduling strategies
- The runtime optimizer uses this decision table to select the appropriate strategy
- Extensive experimental validation indicates the correctness of the runtime optimizer under disparate input characteristics

Thank you…