Published by Hannah Bates. Modified over 9 years ago.
1
Load Shedding Techniques for Data Stream Systems. Brian Babcock, Mayur Datar, Rajeev Motwani. Stanford University.
2
Differences from Previous Talk. Our focus: aggregation queries. No quality-of-service specifications; instead, the focus is on accuracy of query answers. Compensate for dropped data by scaling answers. Random drops only (no semantic drops).
3
Problem Setting. [Figure: query plan with streams S1 and S2, relation R, three aggregate operators (Σ), and queries Q1, Q2, Q3.] Sliding-window aggregate queries (SUM and COUNT). Operator sharing. Filters, UDFs, and joins with relations.
4
Inputs to the Problem. [Figure: query plan with streams S1 and S2, relation R, and queries Q1, Q2, Q3, annotated with the inputs listed here.] Std dev σ and mean μ. Stream rate r. Processing time t. Selectivity s.
5
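Load Shedding via Random Drops. [Figure: stream S with rate $r$ flows through operators 1 and 2 into aggregate Σ3; operator $i$ is labeled $(t_i, s_i)$ = (time, selectivity).] Without shedding: $\text{Load} = r t_1 + r s_1 t_2 + r s_1 s_2 t_3$. With a random-drop operator of sampling rate $p$ after operator 1: $\text{Load} = r t_1 + p\,(r s_1 t_2 + r s_1 s_2 t_3)$. Scale the answer by $1/p$. Need $\text{Load} \le 1$.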
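The load formulas on this slide can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the example rates and costs, and the choice to ignore the cost of the drop operator itself are all assumptions made for illustration.

```python
def pipeline_load(rate, ops, sample_after=None, p=1.0):
    """Load of a linear operator chain.

    rate: stream rate r (tuples per unit time).
    ops: list of (time_per_tuple, selectivity) pairs in pipeline order.
    sample_after: index of the operator after which a random-drop
        operator with sampling rate p sits (None means no shedding).
    The cost of the drop operator itself is ignored (assumption).
    """
    load = 0.0
    flow = rate  # tuples per unit time reaching the current operator
    for i, (t, s) in enumerate(ops):
        load += flow * t
        flow *= s
        if sample_after is not None and i == sample_after:
            flow *= p
    return load


def max_sampling_rate(rate, ops, sample_after):
    """Largest p in [0, 1] that keeps Load <= 1 for one drop operator."""
    fixed = pipeline_load(rate, ops[: sample_after + 1])  # unaffected by p
    downstream = pipeline_load(rate, ops) - fixed         # scaled by p
    if downstream <= 0:
        return 1.0
    return max(0.0, min(1.0, (1.0 - fixed) / downstream))


# Hypothetical numbers: r = 100 tuples/s and three operators (t_i, s_i).
ops = [(0.002, 0.5), (0.01, 0.4), (0.02, 1.0)]
overloaded = pipeline_load(100, ops)   # r*t1 + r*s1*t2 + r*s1*s2*t3
p = max_sampling_rate(100, ops, 0)     # drop operator after operator 1
shed = pipeline_load(100, ops, sample_after=0, p=p)
```

With these made-up inputs the unshedded load exceeds 1, and the computed p brings it back to the capacity limit; the aggregate's answer would then be scaled by 1/p.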
6
Problem Statement. Relative error is the metric of choice: $|\text{Estimate} - \text{Actual}| / \text{Actual}$. Goal: minimize the maximum relative error across queries, subject to $\text{Load} \le 1$. Want low error with high probability.
7
Relating Load Shedding and Error. [Equation not preserved in this transcript; its labeled terms are the relative error for query i, the sampling rate for query i, and a query-dependent constant C_i.] The equation is derived from Hoeffding bounds. The constant C_i depends on: the variance of the aggregated attribute; the sliding window size.
8
Calculate Ratio of Sampling Rates. Minimizing the maximum relative error leads to equal relative error across queries. Express all sampling rates in terms of a common variable λ.
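To illustrate how equalizing relative error ties the sampling rates to one variable, here is a small sketch. It assumes, purely for illustration, that the relative error of query i is inversely proportional to its sampling rate, eps_i = C_i / P_i; the deck's exact equation is not preserved in this transcript, and the function names and constants are hypothetical.

```python
def sampling_rates(constants, lam):
    """Sampling rate P_i = C_i * lam for each query.

    Under the ASSUMED model eps_i = C_i / P_i, this choice gives
    every query the same relative error, 1 / lam.
    """
    return [c * lam for c in constants]


def relative_errors(constants, rates):
    """Relative error per query under the assumed model eps_i = C_i / P_i."""
    return [c / p for c, p in zip(constants, rates)]


# Hypothetical query-dependent constants for two queries.
constants = [0.08, 0.06]
rates = sampling_rates(constants, lam=10.0)
errors = relative_errors(constants, rates)
```

Under this assumption the ratio between any two sampling rates is fixed by the constants alone, so a single value of λ determines every rate and the one remaining degree of freedom can be spent on meeting the load constraint.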
9
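Placing Load Shedders. [Figure: two aggregate operators (Σ) downstream of a shared segment, with target sampling rates 0.8λ and 0.6λ.] The shared segment gets a load shedder with sampling rate 0.8λ; the branch with the lower target gets an additional shedder with sampling rate 0.75 = 0.6λ / 0.8λ.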
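The arithmetic on this slide can be sketched as follows. This is an illustrative reading of the slide's numbers, not the authors' algorithm: it assumes a single shared segment that fans out to one branch per query, and the function name is hypothetical.

```python
def place_shedders(targets):
    """Split target sampling rates into a shared rate and per-branch rates.

    targets: dict mapping query name -> target effective sampling rate.
    ASSUMES all queries share one upstream segment that fans out to one
    branch per query. The shared shedder uses the largest target, and
    each branch shedder makes up the difference, so that
    shared * branch[q] equals targets[q].
    """
    shared = max(targets.values())
    branch = {q: rate / shared for q, rate in targets.items()}
    return shared, branch


# Targets 0.8*lam and 0.6*lam from the slide, evaluated at lam = 1.
shared, branch = place_shedders({"Q1": 0.8, "Q2": 0.6})
```

Putting the largest target on the shared segment drops as many tuples as possible before shared work is done, while no branch needs a rate above 1. The branch rate for the second query is the ratio 0.6λ / 0.8λ, which does not depend on λ.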
10
Conclusion. Load shedding helps cope with bursts. Minimizing relative error is a natural objective for aggregate queries. Algorithm for load shedding: (1) relate target sampling rates for all queries; (2) place random drop operators based on target sampling rates; (3) adjust sampling rates to achieve the desired load.
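The last step of the algorithm, adjusting sampling rates to reach the desired load, could be sketched as a one-dimensional search over the common variable λ. This is a hypothetical illustration rather than the authors' procedure: it assumes the caller supplies a load function that is non-decreasing in λ, and the function name and the search bounds are made up.

```python
def largest_feasible_lambda(load_fn, lo=0.0, hi=1.0, iters=60):
    """Largest lam in [lo, hi] with load_fn(lam) <= 1, found by bisection.

    load_fn: maps lam to total system load; ASSUMED non-decreasing in lam.
    Raises ValueError if even lam = lo overloads the system.
    """
    if load_fn(lo) > 1.0:
        raise ValueError("system is overloaded even at the lowest lam")
    if load_fn(hi) <= 1.0:
        return hi  # no shedding beyond the upper bound is needed
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if load_fn(mid) <= 1.0:
            lo = mid  # mid is feasible, search higher
        else:
            hi = mid  # mid overloads, search lower
    return lo


# Hypothetical load model: a fixed cost plus a cost that scales with lam.
lam = largest_feasible_lambda(lambda x: 0.2 + 0.9 * x)
```

Because every sampling rate is a fixed multiple of λ, raising λ lowers error for all queries at once, so the largest feasible λ is the natural target.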
11
Thanks for listening! Questions?
12
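Choosing Target Sampling Rates. [Equation not preserved in this transcript; its labeled terms are the sampling rate for the query, the variance of the aggregated attribute, the sliding window size, and the relative error.]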
13
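Measuring Inaccuracy. [Figure: operators 1 and 2 feeding aggregate Σ3, with random-drop operators of sampling rates $p_1$ and $p_2$.] Scale the answer by $1/(p_1 p_2)$. A tuple with value $x$ contributes $x/(p_1 p_2)$ with probability $p_1 p_2$ and $0$ with probability $1 - p_1 p_2$. Key point: the product of sampling rates determines the quality of the approximate answer.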
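A short simulation can make the scaling argument concrete. This is a sketch under stated assumptions: the two drop operators are modeled as independent coin flips per tuple, and the function name and example values are hypothetical.

```python
import random


def sampled_sum(values, p1, p2, rng):
    """Estimate sum(values) after two independent random-drop operators.

    A tuple survives both operators with probability p1 * p2; the sum of
    the survivors is scaled by 1 / (p1 * p2) to compensate for the drops.
    """
    total = 0.0
    for x in values:
        if rng.random() < p1 and rng.random() < p2:
            total += x
    return total / (p1 * p2)


# Hypothetical window of 10,000 tuples, each with value 1.0.
rng = random.Random(0)
values = [1.0] * 10_000
estimate = sampled_sum(values, 0.8, 0.75, rng)
```

Each tuple contributes x / (p1 p2) with probability p1 p2, so its expected contribution is x and the scaled sum is an unbiased estimate. Only the product p1 p2 enters the estimate, which is why two different placements with the same product give answers of the same quality.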