Published byとしなり おおばま Modified over 6 years ago
1
ICICLES: Self-tuning Samples for Approximate Query Answering
Haidong Wang 02/13/2007
2
Outline
Introduction and background
Intuition and basic idea of Icicles
Icicle maintenance
Estimators for aggregate queries
Performance evaluation
Conclusion and my comments
3
Introduction
Analysis of data in data warehouses is useful for decision support.
OLAP systems provide interactive response times for aggregate queries.
Approximate query answering (AQUA) systems are being developed.
They tolerate approximate answers in exchange for faster response times.
4
Introduction
Various approaches to answering approximate queries:
Sampling-based
Histogram-based
Clustering
Probabilistic
Wavelet-based
5
Uniform Random Sample
Base relation Sales and its 50% sample S_sales (scale factor 2).
Sales:
Branch  State  Sales
1       CA     80K
2       TX     42K
3              40K
4
5              75K
6              48K
7              55K
8              38K
9
10             41K
S_sales (50% sample):
Branch  State  Sales
2       TX     42K
4       CA
6              48K
8              38K
10             41K
Query on the sample, scaled up by the scale factor:
SELECT SUM(sales) * 2 AS cnt FROM s_sales WHERE state = 'TX'
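The scale-up step on this slide can be sketched in Python. This is only an illustration: the rows below are hypothetical stand-ins (not the slide's table), and `scaled_sum` is a name introduced here, not something from the presentation.

```python
import random


def scaled_sum(sample_rows, scale_factor, predicate):
    """Estimate SUM(sales) over the full relation from a uniform sample."""
    return scale_factor * sum(r["sales"] for r in sample_rows if predicate(r))


# Hypothetical relation (sales in thousands); values are made up.
relation = [
    {"branch": b, "state": s, "sales": v}
    for b, s, v in [
        (1, "CA", 80), (2, "TX", 42), (3, "TX", 40), (4, "CA", 60),
        (5, "CA", 75), (6, "TX", 48), (7, "CA", 55), (8, "TX", 38),
        (9, "CA", 50), (10, "TX", 41),
    ]
]

rng = random.Random(7)                              # fixed seed for repeatability
sample = rng.sample(relation, len(relation) // 2)   # 50% uniform sample
scale_factor = len(relation) / len(sample)          # 10 / 5 = 2.0

estimate = scaled_sum(sample, scale_factor, lambda r: r["state"] == "TX")
print(f"estimated SUM(sales) for TX: {estimate}K")
```

The estimate varies with the rows that happen to be drawn, which is the weakness of a static uniform sample that the following slides address.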
6
Why Icicles?
In practice, queries may follow a predictable pattern.
A static sampling strategy treats all tuples uniformly, thus wasting memory on tuples that are rarely needed.
The sample relation's space is better utilized if it holds more tuples from the actual result sets.
Example: a manager of Walmart in Asia mostly asks queries with location = "Asia" and year >= 2000.
7
Intuition of Icicles
The probability that a tuple is selected into the sample is proportional to the frequency with which it is required to answer queries, e.g. tuples with location = "Asia" and year >= 2000.
We need a dynamic algorithm that tunes the sample to the most recent knowledge of the workload.
8
What is an Icicle?
An icicle for a relation R is a uniform random sample of a multiset of tuples L, where L is the union of R and all sets of tuples that were required to answer queries in the workload.
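A minimal sketch of this definition, assuming tuples are identified by unique keys and that L is small enough to materialize in memory (a real system would not do that). The helper names `build_multiset` and `draw_icicle` are made up for illustration.

```python
import random
from collections import Counter


def build_multiset(relation_keys, query_results):
    """Return the multiset L as key -> number of copies.

    Each tuple of R contributes one copy; every query result set that
    required the tuple adds one more copy.
    """
    copies = Counter(relation_keys)
    for result in query_results:
        copies.update(result)
    return copies


def draw_icicle(copies, size, rng):
    """Draw a uniform random sample of `size` elements from L."""
    population = list(copies.elements())  # expands duplicates explicitly
    return rng.sample(population, size)


# R has four tuples; Q1 needed tuples 1 and 2, Q2 needed tuple 1.
L = build_multiset([1, 2, 3, 4], [[1, 2], [1]])
icicle = draw_icicle(L, 3, random.Random(0))
```

Because tuple 1 has three copies in L while tuples 3 and 4 have one each, tuple 1 is three times as likely to occupy any given slot of the icicle.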
9
What is an Icicle?
(Figure labels: uniform random sample, R, Icicle, R(Q1), R(Q2), R(Q3))
10
Icicle Maintenance
Reservoir sampling algorithm
Each time, we only need to access the new block of data.
(Figure labels: uniform random sample, R, Icicle, R(Q1), R(Q2), R(Q3))
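As a rough illustration of why only the new block has to be read, here is a sketch of classic reservoir sampling (Algorithm R). It is a generic version rather than the exact maintenance procedure of the Icicles paper, and the class name `Reservoir` is an assumption of this sketch.

```python
import random


class Reservoir:
    """Fixed-size uniform sample over a stream that grows block by block."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0                      # elements offered so far, i.e. |L|
        self.rng = rng or random.Random()

    def offer(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)        # still filling the reservoir
            return
        # Keep the new item with probability capacity / seen by
        # overwriting a uniformly chosen slot.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item

    def extend(self, block):
        """Feed a new block (e.g. one query's result set) into the sample."""
        for item in block:
            self.offer(item)


icicle = Reservoir(capacity=3, rng=random.Random(1))
icicle.extend(range(10))    # initial pass over the relation R
icicle.extend([2, 5, 7])    # later: only the tuples of R(Q1) are read
```

The second call never touches the first ten elements again; the sample stays uniform over everything offered so far.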
11
Icicle Maintenance
12
Icicle Maintenance Example
SELECT average(*) FROM widget-tuners WHERE date.month = 'April'
13
Icicle Maintenance
Maintaining the frequency relation:
Keep the frequency relation in main memory.
Delay writing frequency updates to disk until the magnitude of the change crosses a threshold.
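The deferred-update idea might look like the sketch below. Several things here are assumptions: the "magnitude of the change" is taken to be the total number of pending increments, a plain dict stands in for the disk-resident copy, and a tuple's frequency is taken to be 1 (its copy from R) plus the number of query results that required it.

```python
class FrequencyRelation:
    """Buffers frequency increments in memory and flushes them lazily."""

    def __init__(self, flush_threshold):
        self.on_disk = {}     # stand-in for the persistent frequency relation
        self.pending = {}     # in-memory increments not yet written out
        self.flush_threshold = flush_threshold
        self.flush_count = 0

    def record_query_result(self, keys):
        """Note that each tuple in `keys` was required by one more query."""
        for key in keys:
            self.pending[key] = self.pending.get(key, 0) + 1
        if sum(self.pending.values()) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Apply all pending increments to the 'disk' copy in one batch."""
        for key, delta in self.pending.items():
            self.on_disk[key] = self.on_disk.get(key, 0) + delta
        self.pending.clear()
        self.flush_count += 1

    def frequency(self, key):
        """Copies of the tuple in L: one from R plus one per query use."""
        return 1 + self.on_disk.get(key, 0) + self.pending.get(key, 0)


freqs = FrequencyRelation(flush_threshold=4)
freqs.record_query_result([1, 2])   # 2 pending increments, no flush yet
freqs.record_query_result([1, 3])   # 4 pending increments, triggers a flush
```

Reads always combine the persistent and pending counts, so delaying the write does not change the frequencies seen by the estimators.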
14
Estimators for Aggregate Queries
Traditional estimators can’t be used due to selection bias and duplicates in icicle Example: count Maintain a set of frequencies one per tuple in the relation.
15
Estimators for Aggregate Queries
Avg: the average over the distinct tuples in the sample that satisfy the query. Doesn't require the frequency attribute.
Count: the sum of the expected contributions of all tuples in the icicle that satisfy the query.
Sum: Avg * Count
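One way to read these three estimators is sketched below. The slide does not give the formula for a tuple's "expected contribution"; this sketch assumes a standard inverse-probability weight, where each icicle occurrence of a tuple with frequency f counts for |L| / (n * f) tuples of R (n is the icicle size, |L| the multiset size). The type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampledTuple:
    key: int       # identifies the underlying tuple of R
    value: float   # attribute being aggregated
    freq: int      # copies of this tuple in the multiset L


def estimate_avg(icicle, predicate):
    """Average over distinct qualifying tuples; frequencies are not used."""
    distinct = {t.key: t.value for t in icicle if predicate(t)}
    if not distinct:
        return None
    return sum(distinct.values()) / len(distinct)


def estimate_count(icicle, predicate, multiset_size):
    """Sum of per-occurrence contributions |L| / (n * freq) (assumed form)."""
    n = len(icicle)
    if n == 0:
        return 0.0
    return sum(multiset_size / (n * t.freq) for t in icicle if predicate(t))


def estimate_sum(icicle, predicate, multiset_size):
    """Sum = Avg * Count, as on the slide."""
    avg = estimate_avg(icicle, predicate)
    if avg is None:
        return 0.0
    return avg * estimate_count(icicle, predicate, multiset_size)


icicle = [
    SampledTuple(1, 10.0, 2), SampledTuple(1, 10.0, 2),  # duplicate of tuple 1
    SampledTuple(2, 30.0, 1), SampledTuple(3, 50.0, 1),
]
everything = lambda t: True
print(estimate_avg(icicle, everything))       # 30.0
print(estimate_count(icicle, everything, 8))  # 6.0
print(estimate_sum(icicle, everything, 8))    # 180.0
```

Dividing by the frequency is what cancels the selection bias: a tuple that is over-represented in the icicle contributes proportionally less per occurrence.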
16
Performance Evaluation
Plot definitions:
Static sample: a uniform random sample of the relation
Icicle: the icicle as it evolves with the workload
Icicle-complete: the tuned icicle run again on the same workload
17
Performance Evaluation
18
Performance Evaluation
19
Conclusion
Icicles are better than static sampling when the workload focuses on relatively small subsets of the relation.
When the workload is randomized, an icicle is at least as good as static sampling.
Icicles adapt quickly to a changing workload.
20
My Comment
Icicle is a trade-off between accuracy and cost.
Icicle works well under certain restrictions:
How to define a workload?
The workload has to consist of exact queries (not approximate ones).
The typical scenario of an analyst using Icicle: first run a batch of approximate queries, then run one or more exact queries.
21
My Conclusion
Icicle is useful when the following are true:
The workload focuses on relatively small subsets of the relation.
The application calls for high accuracy in approximate answers.
The workload contains exact queries (the more the better).
22
Thanks Questions?