Load Shedding Techniques for Data Stream Systems Brian Babcock Mayur Datar Rajeev Motwani Stanford University.


1 Load Shedding Techniques for Data Stream Systems Brian Babcock Mayur Datar Rajeev Motwani Stanford University

2 Differences from Previous Talk
Our focus: aggregation queries
No quality-of-service specifications
– Instead, focus on accuracy of query answers
Compensate for dropped data by scaling answers
Random drops only (no semantic drops)

3 Problem Setting
[Diagram: shared query plan over streams S1 and S2 and relation R, ending in aggregate (Σ) operators for queries Q1, Q2, and Q3.]
Sliding window aggregate queries (SUM and COUNT)
Operator sharing
Filters, UDFs, and joins with relations

4 Inputs to the Problem
[Diagram: query plan over streams S1 and S2 and relation R for queries Q1, Q2, and Q3, annotated with the inputs listed here.]
Stream rate r
Processing time t
Selectivity s
Mean μ
Standard deviation σ

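5 Load Shedding via Random Drops
[Diagram: stream S with rate r flows through operator 1, operator 2, and aggregate Σ3; each operator is labeled (time, selectivity): $(t_1, s_1)$, $(t_2, s_2)$, $(t_3, s_3)$.]
Without load shedding: $\text{Load} = r t_1 + r s_1 t_2 + r s_1 s_2 t_3$
With a random drop at sampling rate p: $\text{Load} = r t_1 + p\,(r s_1 t_2 + r s_1 s_2 t_3)$
Scale answer by $1/p$
Need Load ≤ 1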
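
The two load formulas on this slide can be checked with a small sketch. It assumes a linear chain of operators, each described by a (time, selectivity) pair, and a single random-drop operator whose own cost is ignored. The function name `chain_load`, the parameter `shed_before`, and the numbers are illustrative and not taken from the talk.

```python
from typing import List, Optional, Tuple


def chain_load(rate: float,
               operators: List[Tuple[float, float]],
               shed_before: Optional[int] = None,
               p: float = 1.0) -> float:
    """Load of a linear operator chain.

    rate        -- stream rate r (tuples per unit time)
    operators   -- list of (processing time t, selectivity s) per operator
    shed_before -- 0-based index of the operator in front of which a
                   random drop with sampling rate p is placed (hypothetical
                   parameter; None means no load shedding)
    """
    load = 0.0
    flow = rate  # tuples per unit time reaching the current operator
    for i, (t, s) in enumerate(operators):
        if shed_before == i:
            flow *= p          # random drop keeps a fraction p of tuples
        load += flow * t       # time spent by this operator per unit time
        flow *= s              # only a fraction s of tuples passes on
    return load


# Illustrative numbers only: r = 100, three operators.
ops = [(0.004, 0.5), (0.01, 0.5), (0.02, 1.0)]
unshed = chain_load(100, ops)                         # r*t1 + r*s1*t2 + r*s1*s2*t3
shed = chain_load(100, ops, shed_before=1, p=0.6)     # r*t1 + p*(...)
```

With the shedder in front of the second operator, the sketch reproduces the slide's second formula: only the cost of operators 2 and 3 is multiplied by p.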

6 Problem Statement
Relative error is metric of choice: $|\text{Estimate} - \text{Actual}| \,/\, \text{Actual}$
Goal: Minimize the maximum relative error across queries, subject to Load ≤ 1
– Want low error with high probability
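
As a minimal sketch of the metric and the objective, the snippet below computes the relative error of one answer and the maximum over several queries. The function names and the sample numbers are illustrative. The divisor uses the absolute value of the actual answer, which matches the slide's formula whenever the actual answer is positive.

```python
from typing import Iterable, Tuple


def relative_error(estimate: float, actual: float) -> float:
    """|estimate - actual| / |actual|; undefined when actual is zero."""
    if actual == 0:
        raise ValueError("relative error is undefined for actual == 0")
    return abs(estimate - actual) / abs(actual)


def max_relative_error(answers: Iterable[Tuple[float, float]]) -> float:
    """Objective from the slide: the worst relative error across queries.

    answers -- (estimate, actual) pairs, one per query
    """
    return max(relative_error(est, act) for est, act in answers)


# Two hypothetical queries: estimates 90 and 260 against true answers 100 and 200.
worst = max_relative_error([(90, 100), (260, 200)])
```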

7 Relating Load Shedding and Error
[Equation not preserved in the transcript. Its labeled terms: relative error for query i, sampling rate for query i, and a query-dependent constant $C_i$.]
Equation derived from Hoeffding bounds
Constant $C_i$ depends on:
– Variance of aggregated attribute
– Sliding window size

8 Calculate Ratio of Sampling Rates
Minimize maximum relative error → Equal relative error across queries
Express all sampling rates in terms of common variable λ
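
The transcript does not preserve the equation that links error and sampling rate, so the following sketch rests on an assumption: that the relative error of query i has the form C_i / P_i, where P_i is the sampling rate and C_i the query-dependent constant. Under that assumed form, equal error across queries means every P_i is C_i times a common variable λ. The function names and the constants are illustrative.

```python
from typing import List


def target_rates(constants: List[float], lam: float) -> List[float]:
    """Target sampling rate per query, P_i = C_i * lam, capped at 1.

    ASSUMPTION: error_i = C_i / P_i (the slide's equation is not in the
    transcript). A rate capped at 1 means that query needs no shedding.
    """
    return [min(1.0, c * lam) for c in constants]


def assumed_errors(constants: List[float], rates: List[float]) -> List[float]:
    """Relative error per query under the same assumed form C_i / P_i."""
    return [c / p for c, p in zip(constants, rates)]


# Hypothetical constants for two queries and a hypothetical lam.
consts = [0.08, 0.06]
rates = target_rates(consts, lam=10.0)
errors = assumed_errors(consts, rates)   # equal for all uncapped queries
```

Lowering λ lowers every sampling rate in the same proportion, which is what lets a single variable be tuned against the load constraint.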

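9 Placing Load Shedders
[Diagram: a shared plan feeding two aggregate (Σ) operators, with two load shedders.]
Target sampling rates: 0.8λ and 0.6λ
Load shedder sampling rates: 0.8λ and 0.75 = 0.6λ / 0.8λ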
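
The numbers on this slide are consistent with the following sketch, which assumes one shedder on the shared part of the plan and one optional shedder on each query's own branch. The shared shedder takes the largest target rate, and each branch shedder makes up the difference, so that the product of rates along a query's path equals its target. The function name `place_shedders` and the query labels are illustrative.

```python
from typing import Dict, Tuple


def place_shedders(targets: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Split target sampling rates into a shared rate and per-branch rates.

    targets -- target sampling rate per query for queries sharing a prefix
    Returns (shared_rate, branch_rates) with
    shared_rate * branch_rates[q] == targets[q] for every query q.
    """
    shared = max(targets.values())      # drop as early as every query allows
    branches = {q: rate / shared for q, rate in targets.items()}
    return shared, branches


# Slide values with lam = 1: targets 0.8 and 0.6.
shared_rate, branch_rates = place_shedders({"Q1": 0.8, "Q2": 0.6})
```

For these targets the shared shedder samples at 0.8, the branch of the first query needs no further drops, and the branch of the second query samples at 0.6 / 0.8.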

10 Conclusion
Load shedding helps cope with bursts
Minimizing relative error is a natural objective for aggregate queries
Algorithm for load shedding:
– Relate target sampling rates for all queries
– Place random drop operators based on target sampling rates
– Adjust sampling rates to achieve desired load

11 Thanks for listening! Questions?

12 Choosing Target Sampling Rates
[Equation not preserved in the transcript. Its labeled terms: sampling rate for query, variance of aggregated attribute, sliding window size, and relative error.]

13 Measuring Inaccuracy
[Diagram: operator 1, operator 2, and aggregate Σ3 in sequence, with load shedders at sampling rates $p_1$ and $p_2$.]
Scale answer by $1/(p_1 p_2)$
Tuple with value x contributes:
– $x / (p_1 p_2)$ with probability $p_1 p_2$
– 0 with probability $1 - p_1 p_2$
Key point: Product of sampling rates determines quality of approximate answer
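
The key point can be made concrete by computing the mean and variance of one tuple's scaled contribution. The sketch below assumes the two random drops are independent, so a tuple survives both with probability p1·p2. The function name and the numbers are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from typing import Tuple


def scaled_tuple_moments(x: Fraction, p1: Fraction, p2: Fraction) -> Tuple[Fraction, Fraction]:
    """Mean and variance of a tuple's contribution after two random drops.

    The tuple contributes x / (p1 * p2) with probability p1 * p2 and 0
    otherwise (independent drops assumed).
    """
    keep = p1 * p2                       # probability the tuple survives
    contribution = x / keep              # scaled value if it survives
    mean = keep * contribution           # equals x: the estimate is unbiased
    second_moment = keep * contribution ** 2
    variance = second_moment - mean ** 2
    return mean, variance


half, quarter, one = Fraction(1, 2), Fraction(1, 4), Fraction(1)
a = scaled_tuple_moments(Fraction(10), half, half)      # p1 * p2 = 1/4
b = scaled_tuple_moments(Fraction(10), quarter, one)    # p1 * p2 = 1/4
```

Both configurations give the same mean and variance, because only the product of the two sampling rates enters the calculation.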

