ICICLES: Self-tuning Samples for Approximate Query Answering

Haidong Wang 02/13/2007

Outline
  Introduction and background
  Intuition and basic idea of Icicles
  Icicle maintenance
  Estimators for aggregate queries
  Performance evaluation
  Conclusion and my comments

Introduction
  Analysis of data in data warehouses is useful for decision support
  OLAP: provide interactive response times to aggregate queries
  Approximate query answering (AQUA) systems are being developed
  Tolerate approximate answers to achieve interactive response times

Introduction
  Various approaches to answering approximate queries:
    Sampling-based
    Histogram-based
    Clustering
    Probabilistic
    Wavelet-based

Uniform Random Sample
  Cells lost from the transcript are shown as "?".

  Sales (base relation)
    Branch  State  Sales
    1       CA     80K
    2       TX     42K
    3       ?      40K
    4       ?      ?
    5       ?      75K
    6       ?      48K
    7       ?      55K
    8       ?      38K
    9       ?      ?
    10      ?      41K

  S_sales (50% sample, scale factor 2)
    Branch  State  Sales
    2       TX     42K
    4       CA     ?
    6       ?      48K
    8       ?      38K
    10      ?      41K

  Query on the sample, scaled by the scale factor:
    SELECT SUM(sales) * 2 AS cnt FROM s_sales WHERE state = 'TX'
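The scaled query above can be sketched in a few lines of Python. This is only an illustration: the rows are hypothetical (the slide's table has missing cells), and the helper name `estimate_sum` is an assumption, not something from the talk.

```python
import random

def estimate_sum(sample, scale, predicate, column):
    """Sum the column over matching sample rows, then multiply by the scale factor."""
    return scale * sum(row[column] for row in sample if predicate(row))

# Hypothetical data, NOT the slide's table.
sales = [{"branch": b, "state": "TX" if b % 2 == 0 else "CA", "sales": 10 * b}
         for b in range(1, 11)]

rng = random.Random(0)
s_sales = rng.sample(sales, k=len(sales) // 2)   # 50% uniform sample
scale = len(sales) / len(s_sales)                # inverse sampling fraction

def is_tx(row):
    return row["state"] == "TX"

estimate = estimate_sum(s_sales, scale, is_tx, "sales")
exact = sum(row["sales"] for row in sales if is_tx(row))
```

The estimate varies with the sample drawn; with a 100% sample and a scale factor of 1 it equals the exact answer.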

Why Icicles?
  In practice, queries may follow a predictable pattern
  A static sampling strategy treats all tuples uniformly, wasting memory on tuples that are rarely needed
  The sample relation's space is better utilized if it holds more tuples from actual query result sets
  Example: the manager of Walmart in Asia queries with location="Asia" and year>=2000

Intuition of Icicles
  The probability that a tuple is selected into the sample is proportional to the frequency with which it is required to answer queries
  Example: location="Asia" and year>=2000
  We need a dynamic algorithm that tunes the sample to the most recent knowledge of the workload

What is an Icicle?
  An icicle for a relation R is a uniform random sample of a multiset of tuples L, where L is the union of R and all sets of tuples that were required to answer queries in the workload
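As a rough sketch of this definition, the multiset L can be built by appending each query's result tuples to R and then sampling uniformly. The tuple ids, the three query results, and the function names below are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter

def build_multiset(relation, query_results):
    """L = R plus one extra copy of a tuple for every query that needed it."""
    multiset = list(relation)
    for result in query_results:
        multiset.extend(result)
    return multiset

def draw_icicle(multiset, size, seed=0):
    """Uniform random sample (without replacement) of the multiset L."""
    return random.Random(seed).sample(multiset, size)

relation = list(range(10))            # hypothetical tuple ids 0..9
workload = [[2, 3], [2, 3, 4], [3]]   # hypothetical R(Q1), R(Q2), R(Q3)

L = build_multiset(relation, workload)
frequency = Counter(L)                # number of copies of each tuple in L
icicle = draw_icicle(L, size=5)
```

Because tuple 3 has four copies in L and tuple 0 has one, tuple 3 is four times as likely to be picked on any single draw, which is the sense in which selection is proportional to query frequency.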

What is an Icicle?
  [Figure: a uniform random sample of R compared with an icicle drawn from R, R(Q1), R(Q2), R(Q3)]

Icicle Maintenance
  Uses the reservoir sampling algorithm
  Each time, we only need to access the new block of data
  [Figure: uniform random sample, R, icicle, R(Q1), R(Q2), R(Q3)]
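A minimal sketch of reservoir sampling (the classic "Algorithm R") shows why only the new block needs to be read. It assumes each exact query result is treated as a new block appended to the stream that forms L. The class name `Reservoir` and the sample data are hypothetical.

```python
import random

class Reservoir:
    """Fixed-size uniform sample over a stream whose length is not known in advance."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0                      # stream elements offered so far
        self.rng = random.Random(seed)

    def offer(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)        # reservoir not yet full
            return
        j = self.rng.randrange(self.seen)  # uniform in [0, seen)
        if j < self.capacity:              # happens with probability capacity/seen
            self.items[j] = item

    def extend(self, block):
        """Process one new block; earlier blocks are never re-read."""
        for item in block:
            self.offer(item)

reservoir = Reservoir(capacity=4)
reservoir.extend(range(10))    # initial pass over a hypothetical relation R
reservoir.extend([2, 3])       # hypothetical R(Q1)
reservoir.extend([2, 3, 4])    # hypothetical R(Q2)
```

The only state carried between blocks is the reservoir itself and the running count `seen`, so maintenance cost depends on the size of the new block rather than on the size of L.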

Icicle Maintenance

Icicle Maintenance Example
  SELECT average(*) FROM widget-tuners WHERE date.month = 'April'

Icicle Maintenance
  Maintaining the frequency relation:
    Keep the frequency relation in main memory
    Delay writing a frequency update to disk until the magnitude of the change crosses a threshold
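The delayed-write idea might look like the sketch below. The slide does not say whether the threshold applies per tuple or to the total change, so this assumes a per-tuple threshold. The class `FrequencyRelation` is hypothetical, and a plain dictionary stands in for the on-disk relation.

```python
class FrequencyRelation:
    """Buffers frequency changes in memory and writes them out in batches."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.on_disk = {}    # stand-in for the persistent frequency relation
        self.pending = {}    # in-memory changes not yet written
        self.writes = 0      # number of simulated disk writes

    def increment(self, tuple_id, delta=1):
        change = self.pending.get(tuple_id, 0) + delta
        if abs(change) >= self.threshold:
            # Assumption: flush once the buffered change reaches the threshold.
            self.on_disk[tuple_id] = self.on_disk.get(tuple_id, 0) + change
            self.pending.pop(tuple_id, None)
            self.writes += 1
        else:
            self.pending[tuple_id] = change

    def frequency(self, tuple_id):
        """Current frequency: persisted value plus any buffered change."""
        return self.on_disk.get(tuple_id, 0) + self.pending.get(tuple_id, 0)

freq = FrequencyRelation(threshold=3)
for _ in range(3):
    freq.increment("t1")
freq.increment("t2")
```

Here three increments to `t1` cost one write instead of three, while the single increment to `t2` stays in memory.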

Estimators for Aggregate Queries
  Traditional estimators can’t be used because of selection bias and duplicates in the icicle
  Example: count
  Maintain a set of frequencies, one per tuple in the relation

Estimators for Aggregate Queries
  Avg: the average over the distinct tuples in the sample that satisfy the query; doesn’t require the frequency attribute
  Count: the sum of the expected contributions of all tuples in the icicle that satisfy the query
  Sum: Avg * Count
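The three estimators can be sketched as follows. The slides do not give the formula for a tuple's "expected contribution", so this sketch assumes an inverse-frequency weighting: a tuple with f copies in L (of size |L|) is expected to appear n·f/|L| times in an icicle of size n, so each copy contributes |L|/(n·f). The function names and the `id`, `value`, and `freq` fields are hypothetical.

```python
def estimate_avg(icicle, predicate, value):
    """Average over the distinct matching tuples; frequencies are not needed."""
    distinct = {t["id"]: value(t) for t in icicle if predicate(t)}
    return sum(distinct.values()) / len(distinct) if distinct else 0.0

def estimate_count(icicle, predicate, multiset_size):
    """Sum of per-copy contributions |L| / (n * freq) over matching tuples (assumed weighting)."""
    n = len(icicle)
    return sum(multiset_size / (n * t["freq"]) for t in icicle if predicate(t))

def estimate_sum(icicle, predicate, value, multiset_size):
    """Sum = Avg * Count, as stated on the slide."""
    return (estimate_avg(icicle, predicate, value)
            * estimate_count(icicle, predicate, multiset_size))

# Hypothetical relation: (id, value, freq), where freq is the copy count in L.
rows = [(1, 10, 1), (2, 20, 3), (3, 30, 2), (4, 40, 1)]
# Degenerate icicle holding all of L, so the estimates can be checked by hand.
icicle = [{"id": i, "value": v, "freq": f} for i, v, f in rows for _ in range(f)]
multiset_size = len(icicle)

def at_least_20(t):
    return t["value"] >= 20

def get_value(t):
    return t["value"]

avg = estimate_avg(icicle, at_least_20, get_value)
count = estimate_count(icicle, at_least_20, multiset_size)
total = estimate_sum(icicle, at_least_20, get_value, multiset_size)
```

In this degenerate case the duplicates cancel out: the three matching tuples give a count of about 3, an average of 30, and a sum of about 90, matching the exact answers over the relation.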

Performance Evaluation
  Plot definitions:
    Static sample: uniform random sample of the relation
    Icicle: the icicle as it evolves with the workload
    Icicle-complete: the tuned icicle run again on the same workload

Performance Evaluation

Conclusion
  Icicles are better than static sampling when the workload focuses on relatively small subsets of the relation
  When the workload is randomized, an icicle is at least as good as static sampling
  Icicles adapt quickly to a changing workload

My Comment
  An icicle is a trade-off between accuracy and cost
  Icicles work well under certain restrictions
  How to define a workload
    Has to consist of exact queries (not approximate ones)
  The typical scenario of an analyst using icicles:
    First run a bunch of approximate queries
    Then run one or more exact queries

My Conclusion
  Icicles are useful when all of the following are true:
    The workload focuses on relatively small subsets of the relation
    The application calls for high accuracy in approximate answers
    The workload includes exact queries (the more the better)

Thanks
  Questions?
