Published by θάλασσα Δαμασκηνός. Modified over 6 years ago.
1
Load Shedding Techniques for Data Stream Systems
Brian Babcock, Mayur Datar, Rajeev Motwani (Stanford University)
2
Differences from Previous Talk
Our focus: aggregation queries.
No quality-of-service specifications; instead, focus on the accuracy of query answers.
Compensate for dropped data by scaling answers.
Random drops only (no semantic drops).
3
Problem Setting
[Diagram: three queries Q1, Q2, Q3, each ending in an aggregate operator Σ, over inputs R, S1, and S2.]
Sliding-window aggregate queries (SUM and COUNT).
Filters, UDFs, and joins with relations.
Operator sharing across queries.
4
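Inputs to the Problem
[Diagram: the query plan for Q1, Q2, Q3 over R, S1, and S2, annotated with the inputs listed here.]
Mean μ and standard deviation σ of the aggregated values.
Processing time t and selectivity s of each operator.
Stream rate r of each input stream.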
5
Load Shedding via Random Drops
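[Diagram: stream S with rate r feeds operators 1 and 2 and then the aggregate Σ3. Each operator is labeled with (time, selectivity): (t1, s1), (t2, s2), (t3, s3). A random-drop operator with sampling rate p sits after operator 1.]
Load without shedding: Load = r t1 + r s1 t2 + r s1 s2 t3
Load with sampling rate p: Load = r t1 + p (r s1 t2 + r s1 s2 t3)
Need Load ≤ 1.
Scale the answer by 1/p.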
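The two load formulas on this slide can be checked numerically. The sketch below is illustrative only: the function names `chain_load` and `max_sampling_rate`, the `shed_before` parameter, and the numbers in the usage note are assumptions, not part of the talk. It assumes a single chain of operators with one random-drop operator.

```python
def chain_load(rate, ops, p=1.0, shed_before=None):
    """Load of a chain of operators fed by a stream with the given rate.

    ops is a list of (time, selectivity) pairs, one per operator.
    If shed_before is an index, a random-drop operator with sampling
    rate p is placed just before ops[shed_before] (hypothetical parameter).
    """
    load = 0.0
    flow = rate  # tuples per unit time reaching the current operator
    for i, (t, s) in enumerate(ops):
        if shed_before is not None and i == shed_before:
            flow *= p  # the shedder keeps each tuple with probability p
        load += flow * t
        flow *= s  # only a fraction s of tuples passes the operator
    return load


def max_sampling_rate(rate, ops, shed_before):
    """Largest p in [0, 1] that keeps Load <= 1 for one shedder position."""
    upstream = chain_load(rate, ops[:shed_before])
    downstream = chain_load(rate, ops) - upstream
    if upstream > 1.0:
        raise ValueError("load upstream of the shedder already exceeds 1")
    if downstream == 0.0:
        return 1.0
    return min(1.0, (1.0 - upstream) / downstream)
```

With `rate = 10` and `ops = [(0.05, 0.5), (0.1, 0.4), (0.2, 1.0)]`, the unshed load is 0.5 + 0.5 + 0.4 = 1.4. Placing the shedder after operator 1 (`shed_before=1`) gives a maximum sampling rate of 0.5 / 0.9, which brings the load back to 1.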
6
Problem Statement
Relative error is the metric of choice: |Estimate - Actual| / Actual.
Goal: minimize the maximum relative error across queries, subject to Load ≤ 1.
Want low error with high probability.
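As a small illustration of the objective, the sketch below computes the relative error of each query and the maximum across queries. The function names are hypothetical, and dividing by the absolute value of the actual answer is an assumption made here so that negative answers are handled; the slide writes the denominator simply as Actual.

```python
def relative_error(estimate, actual):
    """|estimate - actual| / |actual|; undefined when actual is zero."""
    if actual == 0:
        raise ValueError("relative error is undefined for a zero answer")
    return abs(estimate - actual) / abs(actual)


def max_relative_error(pairs):
    """Worst relative error over (estimate, actual) pairs, one per query."""
    return max(relative_error(est, act) for est, act in pairs)
```

For example, `max_relative_error([(90, 100), (52, 50)])` compares errors of 10% and 4% and reports the larger one, which is the quantity the problem statement asks to minimize.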
7
Relating Load Shedding and Error
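[Equation not preserved in this transcript. Its terms were labeled: relative error for query i, sampling rate for query i, and a query-dependent constant Ci.]
The equation is derived from Hoeffding bounds.
The constant Ci depends on the variance of the aggregated attribute and the sliding window size.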
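One form consistent with the three labels on this slide, and with the proportional targets (.8λ, .6λ) that appear on a later slide, is an inverse relationship between relative error and sampling rate. This is an assumption: the symbols ε_i and P_i are chosen here for illustration, and the exact expression for C_i is left unspecified.

```latex
% Assumed form, not taken verbatim from the slide:
%   \epsilon_i : relative error for query i
%   P_i        : sampling rate for query i
%   C_i        : query-dependent constant
\epsilon_i = \frac{C_i}{P_i}
\qquad\Longleftrightarrow\qquad
P_i = \frac{C_i}{\epsilon_i}
```

Under this form, halving the sampling rate for a query doubles its relative error bound.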
8
Calculate Ratio of Sampling Rates
Minimizing the maximum relative error → equal relative error across queries.
Express all sampling rates in terms of a common variable λ.
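The step from "equal relative error" to "a common variable λ" can be sketched as follows, assuming the relative error for query i is C_i divided by its sampling rate (an assumed form; the equation itself is not preserved in this transcript). Under that assumption, equal error ε across queries gives each query the rate C_i · λ with λ = 1/ε. The function names and the constants in the usage note are hypothetical.

```python
def target_sampling_rates(constants, lam):
    """Target rate C_i * lam for each query, capped at 1 (no shedding)."""
    return [min(1.0, c * lam) for c in constants]


def relative_errors(constants, rates):
    """Relative error C_i / P_i per query, under the assumed inverse form."""
    return [c / p for c, p in zip(constants, rates)]
```

With `constants = [0.04, 0.03]` and `lam = 10`, the target rates are 0.4 and 0.3, and both queries end up with the same relative error of 0.1. Lowering λ sheds more load from every query while keeping the errors equal, as long as no rate is capped at 1.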
9
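Placing Load Shedders
[Diagram: two aggregate queries (Σ) share part of their plan; their target sampling rates are .8λ and .6λ.]
Shared segment: sampling rate .8λ.
Branch to the query with target .6λ: sampling rate .75 = .6λ / .8λ.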
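The arithmetic in this example generalizes in a simple way when several queries share one segment that then branches: sample the shared segment at the largest target rate, then sample each branch at the ratio of its own target to the shared rate. The sketch below assumes exactly that layout (one shared segment, one branch per query) and targets in (0, 1]; the function name `place_shedders` is hypothetical, and the slide itself shows only the two-query case.

```python
def place_shedders(targets):
    """Return (shared_rate, branch_rates) for queries sharing one segment.

    The product shared_rate * branch_rates[i] equals targets[i], so each
    query sees exactly its target effective sampling rate.
    """
    shared = max(targets)
    branches = [t / shared for t in targets]
    return shared, branches
```

For targets .8λ and .6λ with λ = 1, `place_shedders([0.8, 0.6])` gives a shared rate of 0.8 and branch rates of 1 (no extra shedder) and 0.75, matching the slide.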
10
Conclusion
Load shedding helps cope with bursts.
Minimizing relative error is a natural objective for aggregate queries.
Algorithm for load shedding:
1. Relate the target sampling rates for all queries.
2. Place random drop operators based on the target sampling rates.
3. Adjust the sampling rates to achieve the desired load.
11
Thanks for listening! Questions?
12
Choosing Target Sampling Rates
[Equation not preserved in this transcript. Its terms were labeled: relative error, sampling rate for the query, variance of the aggregated attribute, and sliding window size.]
13
Measuring Inaccuracy
[Diagram: operators 1 and 2 feed the aggregate Σ3; random-drop operators with sampling rates p1 and p2 sit along the path.]
A tuple with value x contributes x / (p1p2) with probability p1p2, and is dropped with probability 1 - p1p2.
Scale the answer by 1/(p1p2).
Key point: the product of sampling rates determines the quality of the approximate answer.
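A short simulation makes the key point concrete: with two random-drop operators in sequence, scaling each surviving value by 1/(p1p2) yields a SUM estimate whose expected value is the true sum. This is a minimal sketch with hypothetical names (`scaled_sum_estimate`, `rng`); it ignores operator selectivities and windowing and models only the two drops.

```python
import random


def scaled_sum_estimate(values, p1, p2, rng):
    """Estimate sum(values) after two independent random drops.

    Each tuple survives the first shedder with probability p1 and the
    second with probability p2; survivors are scaled by 1 / (p1 * p2).
    """
    total = 0.0
    for x in values:
        if rng.random() < p1 and rng.random() < p2:
            total += x / (p1 * p2)
    return total


def mean_estimate(values, p1, p2, trials, seed=0):
    """Average of the estimate over repeated trials (seeded for repeatability)."""
    rng = random.Random(seed)
    runs = [scaled_sum_estimate(values, p1, p2, rng) for _ in range(trials)]
    return sum(runs) / trials
```

Averaged over many trials, the estimate settles near the true sum; a single trial can be noticeably off, and the spread grows as the product p1p2 shrinks.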