Published by Faith Winham. Modified over 10 years ago.
1
Raghavendra Madala
2
ICICLES: Self-tuning Samples for Approximate Query Answering
Introduction
Icicles
Icicle Maintenance
Icicle-Based Estimators
Quality Guarantee
Performance Evaluation
Conclusion
3
Analysis of data in data warehouses is useful in decision support.
OLAP systems provide interactive response times to aggregate queries.
AQUA: approximate query answering systems provide very fast alternatives to OLAP systems.
4
Approaches to approximate query answering:
Sampling-based
Histogram-based
Probabilistic-based
Wavelet-based
Clustering-based
5
The sampling-based approach uses a uniform random sample: all tuples are assumed to be equally important.
OLAP queries follow a predictable, repetitive pattern, so uniform sampling wastes precious main memory on tuples that are rarely needed.
The join of random samples of base relations may not be a random sample of the join of the base relations. This is the basis for the join synopses of Gibbons et al.
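The point about joins can be checked on a toy example. The sketch below uses two hypothetical relations (the customers and orders data are invented for illustration), enumerates every equally likely way of sampling one customer and two orders, and tallies the size of the join of the two samples. A fixed-size uniform sample of the join would always have the same size; the join of the samples does not.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy relations, invented for this illustration.
customers = [1, 2]                                      # customer keys
orders = [("o1", 1), ("o2", 1), ("o3", 2), ("o4", 2)]   # (order key, customer key)


def join(customer_sample, order_sample):
    """Foreign-key join of a customer sample with an order sample."""
    keys = set(customer_sample)
    return [(okey, ckey) for okey, ckey in order_sample if ckey in keys]


def join_size_distribution(n_customers=1, n_orders=2):
    """Count how often each join size occurs over all equally likely sample pairs."""
    sizes = Counter()
    for cust_sample in combinations(customers, n_customers):
        for order_sample in combinations(orders, n_orders):
            sizes[len(join(cust_sample, order_sample))] += 1
    return sizes


if __name__ == "__main__":
    # Several different join sizes occur, so the join of the samples is not a
    # fixed-size uniform sample of the four-tuple full join.
    print(sorted(join_size_distribution().items()))
```

Some sample pairs produce an empty join even though every order has a matching customer in the full relations, which is the effect that join synopses are designed to avoid.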
6
Icicles are designed to capture the data locality of aggregate queries on foreign-key joins.
An icicle is expected to contain more tuples from regions that are accessed more frequently.
The space of the sample relation is better utilized if more samples from the actual result set are present.
A dynamic algorithm changes the sample to suit the queries being executed in the workload.
7
An icicle is a uniform random sample of a multiset of tuples L (an extension of R), which is the union of a relation R and all sets of tuples that were required to answer the queries in the workload.
8
The intuition is to incrementally maintain a sample, called an icicle.
We maintain the icicle such that the probability of a tuple being selected is proportional to the frequency with which it is required to answer queries (exactly).
9
Efficient incremental maintenance is possible for the following reasons:
A uniform random sample of L (the extension of relation R) ensures that the probability of a tuple being selected into the icicle is proportional to its frequency.
Incremental maintenance of the icicle requires only the segment of R that satisfies each new query.
The reservoir sampling algorithm is applied to the stream of tuples being appended to L.
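The maintenance idea above can be sketched with classic reservoir sampling. This is a minimal illustration, not the authors' exact algorithm: the class name `Icicle`, its `append` method, and the integer tuples in the usage example are all assumptions made for the sketch. L starts as a copy of R, and the tuples needed by each query are streamed through the reservoir as they are appended to L.

```python
import random


class Icicle:
    """Fixed-size uniform sample of the growing multiset L, kept by reservoir sampling."""

    def __init__(self, relation, capacity, seed=None):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.sample = []   # current icicle
        self.seen = 0      # number of tuples appended to L so far
        self.append(relation)  # L starts out as a copy of R

    def append(self, tuples):
        """Stream the tuples being appended to L through the reservoir."""
        for t in tuples:
            self.seen += 1
            if len(self.sample) < self.capacity:
                self.sample.append(t)
            else:
                # Keep t with probability capacity / seen, evicting a random slot.
                slot = self.rng.randrange(self.seen)
                if slot < self.capacity:
                    self.sample[slot] = t


if __name__ == "__main__":
    relation = list(range(1000))              # hypothetical relation R
    icicle = Icicle(relation, capacity=50, seed=0)
    hot = [t for t in relation if t < 100]    # tuples one hypothetical query needs
    for _ in range(20):                       # the same region is queried repeatedly
        icicle.append(hot)
    hot_in_sample = sum(t < 100 for t in icicle.sample)
    print(hot_in_sample, "of", len(icicle.sample), "sampled tuples are hot")
```

Only the tuples that satisfy the new query are touched on each update, and the sample size stays fixed, which matches the two efficiency points listed above.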
12
An icicle is a non-uniform sample of the original data.
Frequencies must be maintained over all tuples.
Different estimation mechanisms are used for Average, Count and Sum.
13
Average is the average over the distinct tuples in the sample that satisfy the query.
Count is the sum of the expected contributions of all tuples in the icicle that satisfy the query.
Sum is the product of the average and the count.
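The three estimators can be sketched as follows. The slide does not give the formula for a tuple's expected contribution, so the sketch assumes one plausible reading: each sampled copy stands for |L| / n entries of L, and a tuple with frequency f owns f of those entries, so a copy contributes |L| / (n * f) to the count. The function name, argument layout, and the toy data in the usage example are hypothetical.

```python
def icicle_estimates(icicle, freq, predicate):
    """Estimate COUNT, AVG and SUM over R from an icicle.

    icicle    -- list of (tuple_id, value) pairs sampled uniformly from L
                 (the same tuple may appear more than once)
    freq      -- dict mapping every tuple_id of R to its frequency in L
    predicate -- function of value, True if the tuple satisfies the query
    Returns (count, average, total); average is None if nothing qualifies.
    """
    l_size = sum(freq.values())          # |L|
    n = len(icicle)
    hits = [(tid, v) for tid, v in icicle if predicate(v)]
    if not hits:
        return 0.0, None, 0.0

    # Assumed expected contribution of one sampled copy: |L| / (n * frequency).
    count = sum(l_size / (n * freq[tid]) for tid, _ in hits)

    # Average over distinct qualifying tuples only.
    distinct = dict(hits)
    average = sum(distinct.values()) / len(distinct)

    # Sum is the product of average and count.
    return count, average, average * count


if __name__ == "__main__":
    # Toy case where the icicle happens to contain all of L.
    freq = {1: 1, 2: 3, 3: 1}
    icicle = [(1, 10.0), (2, 20.0), (2, 20.0), (2, 20.0), (3, 30.0)]
    print(icicle_estimates(icicle, freq, lambda v: v >= 20.0))
```

Averaging over distinct tuples keeps frequently accessed tuples from being counted several times in the average, while the frequency-weighted count corrects for their over-representation in the sample.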
14
Add a frequency attribute to the relation R.
The frequency of each tuple is initially set to 1.
The frequency is incremented each time the tuple is used to answer a query.
The frequencies of the relevant tuples are updated only when the icicle is updated with a new query.
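A minimal sketch of this bookkeeping, assuming tuples are identified by a key and the frequency attribute is held in a separate counter rather than in the relation itself (both are simplifications; the function names are hypothetical):

```python
from collections import Counter


def init_frequencies(relation_ids):
    """Every tuple of R starts with frequency 1."""
    return Counter({tid: 1 for tid in relation_ids})


def record_query(freq, result_ids):
    """Increment the frequency of each tuple used to answer a query.

    Intended to be called at the point where the icicle is updated with
    that query, so frequencies and icicle stay in step.
    """
    for tid in result_ids:
        freq[tid] += 1
    return freq


if __name__ == "__main__":
    freq = init_frequencies(range(5))
    record_query(freq, [1, 2])   # first hypothetical query touches tuples 1 and 2
    record_query(freq, [2])      # second query touches tuple 2 only
    print(dict(freq))
```

With this convention the sum of all frequencies equals the size of L, which is the quantity a frequency-weighted estimator needs.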
15
When the queries in a workload exhibit data locality, the icicle consists of more tuples from the frequently accessed subsets of the relation.
The accuracy of an estimate improves as the number of tuples used to compute it increases.
16
Plot definitions:
Static sample: a uniform random sample of the relation.
Icicle: the icicle as it evolves with the workload.
Icicle-complete: the tuned icicle used again on the same workload.
17
Q workload: template for generating workloads

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey
  AND O_Orderkey = LI_Orderkey
  AND LI_Suppkey = S_Suppkey
  AND C_Nationkey = N_Nationkey
  AND N_Regionkey = R_Regionkey
  AND R_Name = [region]
  AND O_Orderdate >= Date[startdate]
  AND O_Orderdate <= 12-31-1998

Template for obtaining approximate answers

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LICOS-icicle, N, R
WHERE C_Nationkey = N_Nationkey
  AND N_Regionkey = R_Regionkey
  AND R_Name = [region]
  AND O_Orderdate >= Date[startdate]
  AND O_Orderdate <= 12-31-1998
20
Icicles are a class of samples that are sensitive to workload characteristics.
Icicles adapt quickly to a changing workload.
Icicles are useful when the workload focuses on relatively small subsets of the relation.
An icicle is a trade-off between accuracy and cost.
21
V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference 2000.