
Presentation transcript:

1 Load Shedding CS240B notes

2 Load Shedding in a DSMS
- DSMS: online response on boundless and bursty data streams. How?
- By using approximations and synopses, and even
- Shedding load when arrival rates become impossible
- Approximations and synopses are often used with normal load
- Shedding is used for bursty streams and overload situations

3 QoS and Load Shedding
- When the input stream rate exceeds system capacity, a stream manager can shed load (tuples)
- Load shedding affects queries and their answers: drop the tasks and the tuples that will cause the least loss
- Introducing load shedding in a data stream manager is a challenging problem
- Random load shedding or semantic load shedding
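
The difference between random and semantic shedding can be sketched as follows. This is a minimal batch illustration, not how any particular DSMS implements it: the function names and the `utility` callback are hypothetical, and a real stream manager would typically apply a filter predicate tuple by tuple rather than sort a buffered batch.

```python
import random

def random_shed(tuples, drop_fraction, rng=None):
    """Random shedding: drop each tuple independently with probability drop_fraction."""
    rng = rng or random.Random()
    return [t for t in tuples if rng.random() >= drop_fraction]

def semantic_shed(tuples, drop_fraction, utility):
    """Semantic shedding: drop the lowest-utility tuples first.

    `utility` is a hypothetical callback mapping a tuple to its importance.
    Survivors keep their arrival order.
    """
    n_drop = int(len(tuples) * drop_fraction)
    # Indices sorted from least to most useful.
    by_utility = sorted(range(len(tuples)), key=lambda i: utility(tuples[i]))
    dropped = set(by_utility[:n_drop])
    return [t for i, t in enumerate(tuples) if i not in dropped]

# Heart-rate readings: values far from a (hypothetical) normal rate of 70 matter most.
readings = [60, 180, 72, 40, 75]
kept = semantic_shed(readings, 0.4, lambda bpm: abs(bpm - 70))
```

Here `kept` is `[60, 180, 40]`: the two readings closest to the normal rate are shed, while the extreme ones survive.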

4 Problems to Address
- When to shed load
  - Overload should be detected quickly
- Where to shed load
  - Avoid wasted work
  - Upstream drop vs. downstream drop
- How much to shed
  - The magnitude of the drop
- Which tuples to shed

5 Loss-tolerance QoS function
- The loss function is not linear
[Figure: loss-tolerance QoS graph]
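
One common way to represent a nonlinear loss-tolerance function is as a piecewise-linear curve from the percentage of tuples delivered to a utility value. The sketch below assumes that representation; the helper name `make_qos` and the breakpoints are made up for illustration and are not taken from the slides.

```python
from bisect import bisect_right

def make_qos(points):
    """Build a piecewise-linear utility function.

    points: list of (percent_delivered, utility) pairs, sorted by percent.
    """
    xs = [p for p, _ in points]
    ys = [u for _, u in points]

    def utility(pct):
        # Clamp outside the defined range.
        if pct <= xs[0]:
            return ys[0]
        if pct >= xs[-1]:
            return ys[-1]
        # Find the segment containing pct and interpolate linearly.
        i = bisect_right(xs, pct)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (pct - x0) / (x1 - x0)

    return utility

# Hypothetical curve: little utility below 50% delivery, steep gain up to 80%.
qos = make_qos([(0, 0.0), (50, 0.2), (80, 0.9), (100, 1.0)])
```

With these breakpoints, dropping from 100% to 80% delivery costs only 0.1 utility, while dropping from 80% to 50% costs 0.7, which is why a shedder cannot treat every dropped tuple as equally cheap.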

6 Value-based QoS
- Value-based QoS
  - Shows which values of the output tuple space are most important
  - Example: a medical application that monitors patient heartbeats
  - Extreme values are certainly more interesting than normal ones
  - They correspond to higher utility

7 Load Shedding in Aurora
- QoS for each application as a function relating output to its utility
  - Delay based, drop based, value based
- Techniques for introducing load shedding operators in a plan such that QoS is disrupted the least
  - Determining when, where, and how much load to shed

8 Which Query to Drop First?
- Models and algorithms proposed include
  - Greedy algorithms, or
  - Fractional Knapsack Problem
  - Other OR techniques
  - Must deal with nonlinearities
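
A greedy, fractional-knapsack style choice of what to drop can be sketched as below. It assumes each query has a known load and a utility loss that grows linearly with the fraction shed, which is exactly the simplification the last bullet warns about; the function name and the numbers are hypothetical.

```python
def plan_drops(queries, excess_load):
    """Greedy fractional-knapsack plan for shedding `excess_load` units.

    queries: dict name -> (load, utility_loss_if_fully_dropped), load > 0.
    Returns dict name -> fraction of that query's load to shed.
    Assumes utility loss is linear in the fraction shed (a simplification).
    """
    plan = {}
    remaining = excess_load
    # Shed first where each unit of load costs the least utility.
    order = sorted(queries, key=lambda q: queries[q][1] / queries[q][0])
    for name in order:
        if remaining <= 0:
            break
        load, _ = queries[name]
        shed = min(load, remaining)
        plan[name] = shed / load
        remaining -= shed
    return plan

# Hypothetical workload: q2 is heavy but cheap to lose, so it goes first.
plan = plan_drops({"q1": (10, 5), "q2": (20, 2), "q3": (5, 5)}, excess_load=25)
```

Here `plan` is `{"q2": 1.0, "q1": 0.5}`: q2 is shed entirely, half of q1 is shed, and q3 is untouched. With a nonlinear QoS curve the per-unit cost changes as more is shed, so the ordering would have to be recomputed segment by segment.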

9 Load Shedding in STREAM
- Formulate load shedding as an optimization problem for multiple sliding window aggregate queries
  - Minimize inaccuracy in answers subject to output rate matching or exceeding arrival rate
- Consider placement of load shedding operators in query plan
  - Each operator sheds load uniformly with probability p_i
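
The effect of a uniform load shedder on SUM and COUNT can be sketched as follows. The sketch assumes `p` is the probability that a tuple is kept (the sampling rate) and that the aggregate is rescaled by 1/p to stay unbiased; the function name is hypothetical and the code covers a single shedder, not the optimization over a whole query plan.

```python
import random

def shed_and_estimate(values, p, rng=None):
    """Keep each value with probability p, then scale SUM and COUNT by 1/p.

    p is the sampling (keep) rate, 0 < p <= 1. Scaling by 1/p makes both
    estimates unbiased for the full window.
    """
    rng = rng or random.Random()
    kept = [v for v in values if rng.random() < p]
    return {"count": len(kept) / p, "sum": sum(kept) / p}

window = [1.0] * 100_000
estimate = shed_and_estimate(window, p=0.1, rng=random.Random(42))
```

With p = 1 nothing is dropped and both estimates are exact; as p shrinks the estimates stay unbiased but their variance grows, which is the inaccuracy the optimization tries to minimize.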

10 Window-Oriented Load Shedding
- Input stream divided into windows of size w
- Use fewer slides per window to compute aggregates; tumbling windows are the extreme case
- Window-based sampling
  - Reservoir sampling for incoming tuples
  - Expiring tuples pose a more difficult problem
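
Reservoir sampling for incoming tuples can be sketched with the classic single-pass algorithm below. It handles arrivals only; it does not address the harder problem of tuples expiring from a sliding window.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    Single pass, O(k) memory; the stream length need not be known in advance.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(10_000), k=100, rng=random.Random(7))
```

Every tuple seen so far has the same chance of being in the reservoir, so aggregates computed on the sample can be scaled up as with uniform shedding.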

11 Load Shedding by Sampling for Continuous Aggregate Queries on Data Streams
- Only random samples are available for computing aggregate queries because of
  - Limitations of remote sensors or transmission lines
  - Load shedding policies implemented when overloads occur
  - When overloads occur (e.g., due to a burst of arrivals) we can
    1. drop queries altogether, or
    2. sample the input, which is much preferable
- Key objective: achieve answer accuracy with sparse samples for complex aggregates on windows
- Can we improve answer accuracy with minimal overhead?

12 Load Shedding
- To cope with bursty arrivals of high-volume data, a DSMS has to shed load while minimizing the degradation of the Quality of Service (QoS)
- The goal then becomes determining:
  - when, where, and how much load to shed
- An intelligent scheme can improve the quality of our mining results under bursty arrivals

13 A First Architecture
- Basic idea: [BDM04]
  - Optimize sampling rates of load shedders for accurate answers
  - Find an error bound for each aggregate query
  - Determine sampling rates that minimize query inaccuracy within the limits imposed by resource constraints
  - This approach works for SUM and COUNT
  - Generalization to other functions?
[Figure: data streams S_1 ... S_n pass through load shedders into a query network of query operators and aggregates (∑)]

14 Query Network: arbitrary placement of aggregates and shedder after any aggregate
[Figure: data streams S1 ... Sn, load shedders L1, L2, L4, L5, and queries Q1 to Q5; legend: data stream, load shedder, aggregate operator]

15 Generalized Load Shedding in Stream Mill
1. A general framework that achieves optimal load shedding policies, while accommodating:
   - Different requirements for different users, different query sensitivities, and different penalties.
2. Applicability to a wide spectrum of aggregate functions:
   - These are formally characterized using a new notion, called reciprocal-error queries.
3. An extensible architecture that allows UDAs to benefit from the system-provided load shedding functions.
4. Significant improvements (in absolute error, false positives, and false negatives) compared to the common uniform approach.
5. An efficient (linear-time) algorithm to handle severe overloads without losing optimality.

16 Goals to Achieve
- Light-weight overhead handling
  - React to overload immediately
- Minimizing QoS degradation
- Delivering subset results
  - Only omitting tuples from the correct answer
  - Never produce incorrect answers

17 Prediction & Improvements
- A larger class of queries was considered in [LZ08]
  - SUM, COUNT, AVG, quantiles
- Temporal correlation between answers can be used to improve the answer
  - Example: sensor data
  - The current answer can be adjusted by the past answers so that:
    - Low sampling rate → current answer less accurate → more dependent on history
    - High sampling rate → current answer more accurate → less dependent on history
- A Bayesian quality enhancement module can achieve this objective automatically and reduce the uncertainty of the approximate answers

18 Improved Model Using History
- The observed answer Ã is computed from random samples of the complete stream with sampling rate P
- A Bayesian method to obtain the improved answer by combining
  - the observed answer
  - the error model
  - the history of the answer
[Figure: data streams S_1 ... S_n pass through load shedders into the query network; the aggregate's observed answer Ã, the sampling rate P, and the history feed a quality enhancement module that outputs the improved answer]
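
One simple way to combine an observed answer with its history is a Gaussian (precision-weighted) update, sketched below. This is an illustrative assumption, not the error model of [LZ08]: the function name, the Gaussian form, and the variances passed in are all hypothetical. The observation variance stands in for the error model and would be larger at lower sampling rates.

```python
def improve_answer(observed, obs_var, hist_mean, hist_var):
    """Combine an observed answer with a history-based prior.

    observed, obs_var: sampled answer and its error variance
                       (larger when the sampling rate is low).
    hist_mean, hist_var: prediction from past answers and its variance (> 0).
    Returns (posterior_mean, posterior_var) under a Gaussian assumption.
    """
    if obs_var == 0:
        # No sampling error: trust the observation completely.
        return observed, 0.0
    w_obs = 1.0 / obs_var
    w_hist = 1.0 / hist_var
    mean = (w_obs * observed + w_hist * hist_mean) / (w_obs + w_hist)
    var = 1.0 / (w_obs + w_hist)
    return mean, var

# Same observation and history, two different sampling error levels.
high_rate = improve_answer(10.0, 1.0, 20.0, 1.0)
low_rate = improve_answer(10.0, 4.0, 20.0, 1.0)
```

With equal variances the improved answer sits midway at 15.0; when the observation variance rises to 4.0 the improved answer moves to 18.0, closer to the history, matching the intended behavior at low sampling rates. The posterior variance is never larger than either input variance, which is the sense in which uncertainty is reduced.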

19 Summary
- An error model
  - Works for order statistics and data mining functions as well as with traditional aggregates
  - Computationally very efficient
- Bayesian quality enhancement method for approximate aggregates in the presence of sampling
  - No correction when concept changes are suspected; a two-sample test is used to detect suspected changes
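
The slides do not say which two-sample test is used. As one possible illustration, the sketch below compares the means of past and recent answers with a Welch-style statistic and flags a suspected concept change when the statistic exceeds a threshold; the function name and the threshold of 3.0 are assumptions.

```python
from math import sqrt
from statistics import mean, variance

def change_suspected(past, recent, threshold=3.0):
    """Flag a suspected concept change between two samples of answers.

    Uses a Welch-style statistic |mean difference| / standard error.
    Each sample needs at least two values. The threshold is a hypothetical
    choice, not a value from the slides.
    """
    se = sqrt(variance(past) / len(past) + variance(recent) / len(recent))
    if se == 0:
        # No spread in either sample: any difference in means is a change.
        return mean(past) != mean(recent)
    return abs(mean(past) - mean(recent)) / se > threshold

past = [10, 11, 9, 10, 10, 11, 9, 10]
stable = change_suspected(past, [10, 9, 11, 10, 10, 9, 11, 10])
shifted = change_suspected(past, [20, 21, 19, 20, 20, 21, 19, 20])
```

When `shifted` is true, the history no longer describes the stream, so the Bayesian correction would be skipped and the observed answer used as is.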

20 References: Sampling and load shedding
[Tatbul03] Nesime Tatbul, Ugur Cetintemel, Stanley B. Zdonik, Mitch Cherniack, Michael Stonebraker: Load Shedding in a Data Stream Manager. VLDB 2003.
[BDM04] Brian Babcock, Mayur Datar, Rajeev Motwani: Load Shedding for Aggregation Queries over Data Streams. ICDE 2004.
[Tatbul07] Nesime Tatbul, Ugur Cetintemel, Stanley B. Zdonik: Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing. VLDB 2007.
[LZ08] Yan-Nei Law and Carlo Zaniolo: Improving the Accuracy of Continuous Aggregates and Mining Queries on Data Streams under Load Shedding. International Journal of Business Intelligence and Data Mining.
[ICDE 2010] Barzan Mozafari and Carlo Zaniolo: Optimal Load Shedding with Aggregates and Mining Queries. In Proceedings of the 26th International Conference on Data Engineering (ICDE 2010), Long Beach, California, USA, March 1-6, 2010.