1 Load Shedding: CS240B notes

2 Load Shedding in a DSMS
- DSMS: online response on boundless and bursty data streams. How?
- By using approximations and synopses, and even
- Shedding load when arrival rates become impossible
- Approximations and synopses are often used with normal load
- Shedding is used for bursty streams and overload situations.

3 QoS and Load Shedding
- When the input stream rate exceeds system capacity, a stream manager can shed load (tuples)
- Load shedding affects queries and their answers: drop the tasks and the tuples that will cause the least loss
- Introducing load shedding in a data stream manager is a challenging problem
- Random load shedding or semantic load shedding

4 Problems to Address
- When to shed load
  - Overload should be detected quickly
- Where to shed load
  - Avoid wasted work
  - Upstream drop vs. downstream drop
- How much to shed
  - The magnitude of the drop
- Which tuples to shed
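The "how much to shed" question can be illustrated with a minimal sketch. It assumes a very simple load model that the slides do not spell out: load is arrival rate times per-tuple processing cost, and capacity is the processing time available per second. The function name and parameters are hypothetical.

```python
def required_drop_fraction(arrival_rate, cost_per_tuple, capacity):
    """Fraction of tuples to drop so that the remaining load fits in capacity.

    Assumed model (not from the slides):
      arrival_rate    tuples per second
      cost_per_tuple  processing seconds needed per tuple
      capacity        processing seconds available per second
    """
    load = arrival_rate * cost_per_tuple
    if load <= capacity:
        return 0.0  # no overload, nothing to shed
    return 1.0 - capacity / load


# Example: 1000 tuples/s at 2 ms each is twice what one core can absorb.
print(required_drop_fraction(1000, 0.002, 1.0))
```

Under this model, dropping upstream saves the cost of every operator below the drop point, which is why the placement question ("where") and the magnitude question ("how much") interact.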

5 Loss-tolerance QoS function
The loss function is not linear (figure not reproduced in this transcript).
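Since the figure is missing, a nonlinear loss-tolerance function can be sketched as a piecewise-linear curve mapping the percentage of tuples delivered to a utility. The breakpoints below are invented for illustration and are not taken from the slides.

```python
def loss_tolerance_utility(pct_delivered, points):
    """Linearly interpolate utility between (percent delivered, utility) breakpoints."""
    pts = sorted(points)
    if pct_delivered <= pts[0][0]:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if pct_delivered <= x1:
            return y0 + (y1 - y0) * (pct_delivered - x0) / (x1 - x0)
    return pts[-1][1]


# Hypothetical curve: utility falls slowly at first, then steeply below 80%.
CURVE = [(0, 0.0), (50, 0.2), (80, 0.9), (100, 1.0)]
print(loss_tolerance_utility(65, CURVE))
```

With a curve like this, shedding the first 20% of tuples costs little utility, while further shedding is much more expensive, which is the kind of nonlinearity a shedding policy has to account for.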

6 Value-based QoS
- Value-based QoS
  - Shows which values of the output tuple space are most important.
  - Example: a medical application that monitors patient heartbeats
  - Extreme values are certainly more interesting than normal ones
  - They are assigned a correspondingly higher utility
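A minimal sketch of value-based (semantic) shedding for the heartbeat example follows. It assumes utility grows with distance from a normal resting rate; the 70 bpm baseline and the function names are hypothetical.

```python
def semantic_shed(tuples, utility, keep_count):
    """Keep the keep_count tuples with the highest utility, in arrival order."""
    ranked = sorted(range(len(tuples)),
                    key=lambda i: utility(tuples[i]),
                    reverse=True)
    keep = set(ranked[:keep_count])
    return [t for i, t in enumerate(tuples) if i in keep]


def heartbeat_utility(bpm, normal=70):
    # Assumption: the further from the normal rate, the more interesting the reading.
    return abs(bpm - normal)


readings = [60, 180, 72, 40, 75]
print(semantic_shed(readings, heartbeat_utility, keep_count=2))  # [180, 40]
```

A random shedder would be as likely to drop the 180 bpm reading as the 72 bpm one; the value-based policy keeps the extremes.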

7 Load Shedding in Aurora
- QoS for each application as a function relating output to its utility
  - Delay based, drop based, value based
- Techniques for introducing load shedding operators in a plan such that QoS is disrupted the least
  - Determining when, where and how much load to shed

8 Which Query to Drop First?
- Models and algorithms proposed include
  - Greedy algorithms, or
  - Fractional Knapsack Problem
  - Other OR techniques
  - Must deal with nonlinearities
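The fractional-knapsack view can be sketched as follows, under a simplifying assumption the slide warns about: utility is taken to be linear in the fraction of a query's load that is kept. The query tuples and the function name are hypothetical.

```python
def choose_load_to_keep(queries, capacity):
    """Greedy fractional knapsack over queries.

    queries:  list of (name, load, utility) with load > 0
    capacity: total load the system can process
    Returns {name: fraction of that query's load that is kept}.
    Assumes utility scales linearly with the kept fraction.
    """
    kept = {}
    remaining = capacity
    # Serve the queries with the highest utility per unit of load first.
    for name, load, utility in sorted(queries,
                                      key=lambda q: q[2] / q[1],
                                      reverse=True):
        fraction = min(1.0, remaining / load) if remaining > 0 else 0.0
        kept[name] = fraction
        remaining -= fraction * load
    return kept


queries = [("q1", 4.0, 8.0), ("q2", 2.0, 6.0), ("q3", 4.0, 4.0)]
print(choose_load_to_keep(queries, capacity=5.0))
# {'q2': 1.0, 'q1': 0.75, 'q3': 0.0}
```

With nonlinear QoS curves the greedy choice is no longer guaranteed to be optimal, which is why the slide lists nonlinearities as a separate concern.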

9 Load Shedding in STREAM
- Formulate load shedding as an optimization problem for multiple sliding window aggregate queries
  - Minimize inaccuracy in answers subject to output rate matching or exceeding arrival rate
- Consider placement of load shedding operators in query plan
  - Each operator sheds load uniformly with probability p_i
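A uniform load shedder and the matching correction for SUM and COUNT can be sketched as below. The sketch assumes the probability is the chance that a tuple is kept (the sampling rate), and it rescales the sampled aggregate by 1/p to keep the estimate unbiased; names are hypothetical.

```python
import random


def shed_and_estimate(values, keep_prob, rng):
    """Keep each value independently with probability keep_prob (0 < keep_prob <= 1),
    then scale the sampled SUM and COUNT by 1/keep_prob."""
    sample = [v for v in values if rng.random() < keep_prob]
    est_sum = sum(sample) / keep_prob
    est_count = len(sample) / keep_prob
    return est_sum, est_count


rng = random.Random(42)  # fixed seed so the run is repeatable
window = list(range(1, 1001))
print(shed_and_estimate(window, keep_prob=0.25, rng=rng))
```

Lower sampling rates save more work but widen the spread of the estimate, which is the trade-off the optimization problem on this slide balances across queries.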

10 Window-Oriented Load Shedding
Input stream divided into windows of size w
- Use fewer slides per window to compute aggregates; tumbles are the extreme case.
- Window-based sampling
  - Reservoir sampling for incoming tuples
  - Expiring tuples pose a more difficult problem.
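Reservoir sampling for incoming tuples can be sketched with the classic one-pass scheme (often called Algorithm R), which keeps a uniform sample of k items from a stream of unknown length. The sketch handles arrivals only; it does not address the harder problem of tuples expiring from a sliding window.

```python
import random


def reservoir_sample(stream, k, rng):
    """Return a uniform random sample of up to k items from an iterable."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item     # replace with probability k / (i + 1)
    return reservoir


rng = random.Random(7)
print(reservoir_sample(range(10_000), k=5, rng=rng))
```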

11 Load Shedding by Sampling for Continuous Aggregate Queries on Data Streams
- Only random samples are available for computing aggregate queries because of
  - Limitations of remote sensors, or transmission lines
  - Load shedding policies implemented when overloads occur
  - When overloads occur (e.g., due to a burst of arrivals) we can
    1. drop queries altogether, or
    2. sample the input, which is much preferable
- Key objective: achieve answer accuracy with sparse samples for complex aggregates on windows
- Can we improve answer accuracy with minimal overhead?

12 Load Shedding
- To cope with bursty arrivals of high-volume data
- DSMS has to shed load while minimizing the degradation of the Quality of Service (QoS)
- The goal then becomes determining:
  - when, where and how much load to shed
- An intelligent scheme can improve the quality of our mining results under bursty arrivals

13 A First Architecture
- Basic idea: [BDM04]
  - Optimize sampling rates of load shedders for accurate answers.
  - Find an error bound for each aggregate query.
  - Determine sampling rates that minimize query inaccuracy within the limits imposed by resource constraints.
  - This approach works for SUM and COUNT
  - Generalization to other functions?
[Figure: Query Network with data streams S_1 ... S_n feeding aggregates (∑). Legend: Aggregate, Query Operator, Load Shedder, Data Stream S_i.]

14 Query Network: arbitrary placement of aggregates, and a shedder after any aggregate
[Figure: data streams S1 ... Sn, load shedders L1, L2, L4, L5, and aggregates Q1 to Q5. Legend: Data Stream, Load Shedder, Aggregate Operator.]

15 Generalized Load Shedding in Stream Mill
1. A general framework that achieves optimal load shedding policies, while accommodating:
   - Different requirements for different users, different query sensitivities, and different penalties.
2. Applicability to a wide spectrum of aggregate functions:
   - We have formally characterized these using a new notion, called reciprocal-error queries.
3. An extensible architecture that allows UDAs to benefit from the system-provided load shedding functions.
4. Significant improvements (in absolute error, false positives, and false negatives) compared to the common uniform approach.
5. An efficient (linear-time) algorithm to handle severe overloads without losing optimality.

16 Goals to Achieve
- Light-weight overhead handling
  - React to overload immediately
- Minimizing QoS degradation
- Delivering subset results
  - Only omitting tuples from the correct answer
  - Never produce incorrect answers

17 Prediction & Improvements
- A larger class of queries was considered in [LZ08]
  - SUM, COUNT, AVG, quantiles.
- Temporal correlation between answers can be used to improve the answer
  - Example: sensor data
  - The current answer can be adjusted by the past answers so that:
    - Low sampling rate → current answer less accurate → more dependent on history.
    - High sampling rate → current answer more accurate → less dependent on history.
- A Bayesian quality enhancement module can achieve this objective automatically and reduce the uncertainty of the approximate answers.

18 Improved Model Using History
- The observed answer Ã is computed from random samples of the complete stream with sampling rate P.
- A Bayesian method to obtain the improved answer by combining
  - the observed answer
  - the error model
  - history of the answer
[Figure: Query Network (data streams S_1 ... S_n, load shedders, query operators, aggregates ∑) together with an Aggregate Quality Enhancement Module. Labels: P, Ã, History, Improved answer.]
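One way to picture how an observed answer and its history might be combined is the standard Gaussian conjugate update, sketched below. This is an illustrative assumption, not the error model of [LZ08]: history is summarized as a normal prior, and the sampling error of the observed answer is treated as normal with a variance that grows as the sampling rate falls. All names are hypothetical.

```python
def combine_with_history(observed, observed_var, prior_mean, prior_var):
    """Precision-weighted combination of an observed answer with a prior from history.

    Returns (posterior mean, posterior variance) under the Gaussian assumption.
    Both variances must be positive.
    """
    weight_on_observed = prior_var / (prior_var + observed_var)
    mean = weight_on_observed * observed + (1 - weight_on_observed) * prior_mean
    var = prior_var * observed_var / (prior_var + observed_var)
    return mean, var


# High sampling rate: small observation variance, answer stays near the observation.
print(combine_with_history(110.0, 4.0, 100.0, 4.0))   # (105.0, 2.0)
# Low sampling rate: large observation variance, answer leans on history.
print(combine_with_history(110.0, 12.0, 100.0, 4.0))  # (102.5, 3.0)
```

In both cases the posterior variance is smaller than either input variance, matching the stated aim of reducing the uncertainty of the approximate answer.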

19 Summary
- An error model
  - Works for order statistics and data mining functions as well as with traditional aggregates,
  - computationally very efficient
  - Bayesian quality enhancement method for approximate aggregates in the presence of sampling.
  - No correction when concept changes are suspected; a two-sample test is used to detect suspected changes.
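The slides do not say which two-sample test is used. As a stand-in, the sketch below compares the means of past and recent answers with a Welch-style statistic and flags a suspected change when it exceeds a threshold. The test choice, the threshold of 3.0, and the names are all assumptions.

```python
import math
from statistics import mean, variance


def concept_change_suspected(history, recent, threshold=3.0):
    """Flag a suspected change when the two sample means differ by more than
    `threshold` standard errors. Each sample needs at least two values."""
    se = math.sqrt(variance(history) / len(history)
                   + variance(recent) / len(recent))
    if se == 0:
        return mean(history) != mean(recent)
    return abs(mean(history) - mean(recent)) / se > threshold


history = [10, 11, 9, 10, 10, 11, 9, 10]
print(concept_change_suspected(history, [10, 9, 11, 10]))   # False
print(concept_change_suspected(history, [20, 21, 19, 20]))  # True
```

When the check returns True, the history-based correction would be skipped and the observed answer used as is.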

20 References: Sampling and load shedding
[Tatbul03] Nesime Tatbul, Ugur Cetintemel, Stanley B. Zdonik, Mitch Cherniack, Michael Stonebraker: Load Shedding in a Data Stream Manager. VLDB 2003: 309-320.
[BDM04] Brian Babcock, Mayur Datar, Rajeev Motwani: Load Shedding for Aggregation Queries over Data Streams. ICDE 2004: 350-361.
[Tatbul07] Nesime Tatbul, Ugur Cetintemel, Stanley B. Zdonik: Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing. VLDB 2007: 159-170.
[LZ08] Yan-Nei Law, Carlo Zaniolo: Improving the Accuracy of Continuous Aggregates and Mining Queries on Data Streams under Load Shedding. International Journal of Business Intelligence and Data Mining, 2008.
[ICDE 2010] Barzan Mozafari, Carlo Zaniolo: Optimal Load Shedding with Aggregates and Mining Queries. In Proceedings of the 26th International Conference on Data Engineering (ICDE 2010), Long Beach, California, USA, March 1-6, 2010.

