Load Shedding Techniques for Data Stream Systems

1 Load Shedding Techniques for Data Stream Systems
Brian Babcock, Mayur Datar, Rajeev Motwani
Stanford University

2 Differences from Previous Talk
Our focus: Aggregation queries
No quality of service specifications
Instead, focus on accuracy of query answers
Compensate for dropped data by scaling answers
Random drops only (no semantic drops)

3 Problem Setting
[Diagram: queries Q1, Q2, Q3, each ending in an aggregate Σ, over inputs R, S1, S2]
Sliding Window Aggregate Queries (SUM and COUNT)
Filters, UDFs, and Joins w/ Relations
Operator Sharing

4 Inputs to the Problem
[Diagram: the query plan for Q1, Q2, Q3 over R, S1, S2, annotated with the inputs below]
Mean μ and Std Dev σ
Processing Time t and Selectivity s
Stream Rate r

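5 Load Shedding via Random Drops
[Diagram: stream S with stream rate r feeds operators 1, 2, and aggregate Σ3 in sequence, each labeled (time, selectivity): (t1, s1), (t2, s2), (t3, s3); a load shedder with sampling rate p sits between operators 1 and 2]
Without shedding: Load = r·t1 + r·s1·t2 + r·s1·s2·t3
With sampling rate p: Load = r·t1 + p·(r·s1·t2 + r·s1·s2·t3)
Need Load ≤ 1
Scale answer by 1/p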

6 Problem Statement
Relative error is metric of choice: |Estimate - Actual| / Actual
Goal: Minimize the maximum relative error across queries, subject to Load ≤ 1
Want low error with high probability
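As a small illustration of the objective, the following sketch computes the relative error of each query and the maximum across queries. The function names are hypothetical, and taking the absolute value of the actual answer in the denominator is an added assumption so that negative sums are handled.

```python
def relative_error(estimate, actual):
    """|Estimate - Actual| / |Actual| for one query."""
    return abs(estimate - actual) / abs(actual)


def max_relative_error(estimates, actuals):
    """The quantity the problem statement asks to minimize."""
    return max(relative_error(e, a) for e, a in zip(estimates, actuals))
```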

7 Relating Load Shedding and Error
[Equation not preserved in this transcript; its labeled terms: relative error for query i, sampling rate for query i, query-dependent constant Ci]
Equation derived from Hoeffding bounds
Constant Ci depends on:
Variance of aggregated attribute
Sliding window size

8 Calculate Ratio of Sampling Rates
Minimize maximum relative error → Equal relative error across queries
Express all sampling rates in terms of common variable λ
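A minimal sketch of this step follows. The error equation itself is not preserved in the transcript, so the code assumes it has the form error_i = C_i / P_i, where P_i is the sampling rate for query i and C_i its query-dependent constant. Under that assumption, equal errors mean every P_i is proportional to C_i, and λ can be taken as the sampling rate of the query with the largest constant. The function names and constants are hypothetical.

```python
def sampling_rate_ratios(constants):
    """Ratio of each query's sampling rate to the common variable lambda.

    Assumes error_i = C_i / P_i, so equal errors imply P_i proportional to C_i.
    The query with the largest constant gets ratio 1.
    """
    c_max = max(constants)
    return [c / c_max for c in constants]


def target_rates(constants, lam):
    """Target sampling rate for each query at a given lambda in (0, 1]."""
    return [ratio * lam for ratio in sampling_rate_ratios(constants)]


# Hypothetical constants for three queries.
constants = [0.05, 0.04, 0.03]
ratios = sampling_rate_ratios(constants)   # approximately 1, 0.8, 0.6
rates = target_rates(constants, lam=0.5)
errors = [c / p for c, p in zip(constants, rates)]  # equal across queries
```

Normalizing by the largest constant keeps every target rate at or below λ, so any λ ≤ 1 yields valid sampling rates.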

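9 Placing Load Shedders
[Diagram: two queries, each ending in an aggregate Σ, share part of their operator path; one has target .8λ, the other target .6λ]
Shared segment: Sampling Rate .8λ
Branch to the query with target .6λ: Sampling Rate .75 = .6λ / .8λ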
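The arithmetic on this slide can be sketched for the simple case of one shared segment feeding several query-specific branches. This is an assumption-laden illustration (hypothetical function and query names, a two-level plan only), not the general placement procedure from the talk: the shared shedder takes the largest target rate, and each branch shedder makes up the difference.

```python
def place_shedders(targets):
    """Split target rates into one shared rate and per-branch rates.

    targets -- dict mapping query name to its target sampling rate
    Returns (shared_rate, branch_rates) such that
    shared_rate * branch_rates[q] equals targets[q] for every query q.
    """
    shared = max(targets.values())
    branches = {q: t / shared for q, t in targets.items()}
    return shared, branches


# The slide's targets with lambda = 1; "Qa" and "Qb" are made-up names.
shared, branches = place_shedders({"Qa": 0.8, "Qb": 0.6})
```

Putting the larger rate on the shared segment drops tuples as early as possible without starving the query that needs more of them.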

10 Conclusion
Load shedding helps cope with bursts
Minimizing relative error is natural objective for aggregate queries
Algorithm for load shedding:
Relate target sampling rates for all queries
Place random drop operators based on target sampling rates
Adjust sampling rates to achieve desired load
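The last step of the algorithm, adjusting sampling rates to achieve the desired load, can be illustrated with a generic bisection on the common variable λ. This is a swapped-in numerical technique, not necessarily how the authors do it; it assumes only that load does not decrease as λ grows. The `load_at` callback and the example load function are hypothetical.

```python
def largest_feasible_lambda(load_at, capacity=1.0, iterations=50):
    """Largest lambda in [0, 1] with load_at(lambda) <= capacity.

    load_at -- function mapping lambda to total system load; assumed
               non-decreasing in lambda
    Returns 0.0 if even lambda = 0 exceeds capacity.
    """
    if load_at(1.0) <= capacity:
        return 1.0              # no shedding needed
    if load_at(0.0) > capacity:
        return 0.0              # shedding alone cannot meet the constraint
    lo, hi = 0.0, 1.0           # invariant: load_at(lo) <= capacity < load_at(hi)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if load_at(mid) <= capacity:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical load model: a fixed cost plus a part that scales with lambda.
lam = largest_feasible_lambda(lambda l: 0.4 + 0.9 * l)
```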

11 Thanks for listening! Questions?

12 Choosing Target Sampling Rates
[Equation not preserved in this transcript; its labeled terms: relative error, sampling rate for query, variance of aggregated attribute, sliding window size]

13 Measuring Inaccuracy
[Diagram: operators 1, 2, and aggregate Σ3 in sequence, with a load shedder of sampling rate p1 at operator 1 and a load shedder of sampling rate p2 at operator 2]
Scale answer by 1/(p1p2)
Tuple w/ value x: contributes x / (p1p2) with pr. p1p2, nothing with pr. 1-p1p2
Key point: Product of sampling rates determines quality of approximate answer
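A small simulation makes the key point concrete. The sketch below is illustrative: the function names are made up, and it ignores the operators' own filtering so that only the two load shedders drop tuples. Each surviving tuple is scaled by 1/(p1p2), so the expected contribution of a tuple with value x is still x, and the per-tuple variance depends on p1 and p2 only through their product.

```python
import random


def shed_and_sum(values, p1, p2, seed=0):
    """Approximate SUM after two random-drop stages with rates p1 and p2."""
    rng = random.Random(seed)
    total = 0.0
    for x in values:
        # A tuple reaches the aggregate only if it survives both shedders.
        if rng.random() < p1 and rng.random() < p2:
            total += x / (p1 * p2)   # compensate for dropped tuples
    return total


def per_tuple_variance(x, p1, p2):
    """Variance of one tuple's scaled contribution.

    The contribution is x / p with probability p and 0 otherwise,
    where p = p1 * p2, giving variance x^2 * (1/p - 1).
    """
    p = p1 * p2
    return x * x * (1.0 / p - 1.0)


# A COUNT-style example: 20000 tuples of value 1, rates as on slide 9.
values = [1.0] * 20000
estimate = shed_and_sum(values, 0.8, 0.75)
```

Splitting the same product differently, for example 0.6 at the first shedder and 1.0 at the second, leaves `per_tuple_variance` unchanged, which is why only the product of sampling rates matters for answer quality.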

