Published by Gordon Jones. Modified over 6 years ago.
1
ICICLES: Self-tuning Samples for Approximate Query Answering
By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan
Anthony Okorodudu, CSE 6392, 2006-2-7
2
Outline
- Introduction
- Uniform random sampling
- Icicles
- Icicle maintenance
- Maintaining the frequency relation
- Estimators for aggregate queries
- Quality guarantees
- Performance evaluation
3
Introduction
- Analysis of data in data warehouses is useful for decision support
- Users of decision support systems want interactive response times
- Most decision support systems can tolerate approximate results
- Approximate query answering systems (e.g., AQUA)
4
Approximate Querying
Various approaches to answering queries approximately:
- Sampling-based
- Histogram-based
- Clustering
- Probabilistic
- Wavelet-based
5
Uniform Random Sampling
Sales (base relation)
Branch  State  Sales
1       CA     80K
2       TX     42K
3       …      40K
4       CA     …
5       …      75K
6       …      48K
7       …      55K
8       …      38K
9       …      …
10      …      41K

S_sales (50% sample)
Branch  State  Sales
2       TX     42K
4       CA     …
6       …      48K
8       …      38K
10      …      41K

SELECT SUM(sales) * 2 AS cnt FROM s_sales WHERE state = 'TX'
Scale factor: 2 (the inverse of the 50% sampling rate)
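The scale-up step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `uniform_sample` and `estimate_sum` are hypothetical, and because the slide's table is incomplete, the rows below are made-up values rather than the slide's data.

```python
import random

def uniform_sample(rows, fraction, rng):
    """Draw a uniform random sample (without replacement) of the given fraction."""
    k = round(len(rows) * fraction)
    return rng.sample(rows, k)

def estimate_sum(sample, fraction, predicate, column):
    """Scale the sample's SUM by 1/fraction to estimate the SUM over the full relation."""
    scale_factor = 1.0 / fraction
    return scale_factor * sum(row[column] for row in sample if predicate(row))

# Hypothetical data (sales in thousands); not the values from the slide.
sales = [
    {"branch": b, "state": s, "sales": v}
    for b, s, v in [(1, "CA", 80), (2, "TX", 42), (3, "TX", 40), (4, "CA", 60),
                    (5, "TX", 75), (6, "CA", 48), (7, "TX", 55), (8, "TX", 38),
                    (9, "CA", 50), (10, "CA", 41)]
]

def is_texas(row):
    return row["state"] == "TX"

rng = random.Random(0)
s_sales = uniform_sample(sales, 0.5, rng)
approx = estimate_sum(s_sales, 0.5, is_texas, "sales")
exact = sum(r["sales"] for r in sales if is_texas(r))
print(f"approximate: {approx}K, exact: {exact}K")
```

The estimate depends on which rows land in the sample: a sample that happens to contain few Texas rows gives a poor answer for a Texas query, which is the motivation for the biased sample on the next slide.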
6
Biased Sampling

Sales (base relation)
Branch  State  Sales
1       CA     80K
2       TX     42K
3       …      40K
4       CA     …
5       …      75K
6       …      48K
7       …      55K
8       …      38K
9       …      …
10      …      41K

S_sales (biased sample)
Branch  State  Sales
2       TX     42K
4       CA     …
5       …      75K
7       …      55K
8       …      38K

Sample relation for an aggregate query workload that focuses on Texas branches
7
Icicles
- A class of samples that captures the data locality of aggregate queries over foreign-key joins
- A join synopsis is the join of a uniform random sample of the fact table with a set of dimension tables
- The space of the sample relation is better utilized when more of its tuples come from the actual result sets of queries
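The join-synopsis definition above can be illustrated with a short Python sketch. It assumes a single dimension table and dictionary-shaped rows; the function name `join_synopsis` and the `orders`/`customers` tables are hypothetical and not taken from the slides.

```python
import random

def join_synopsis(fact_rows, dimension, fk, pk, sample_size, rng):
    """Join a uniform random sample of the fact table with one dimension table."""
    by_key = {d[pk]: d for d in dimension}        # index the dimension on its primary key
    sample = rng.sample(fact_rows, sample_size)   # uniform sample of the fact table
    # With a foreign-key join, each sampled fact tuple matches exactly one dimension tuple.
    return [{**f, **by_key[f[fk]]} for f in sample]

# Hypothetical fact and dimension tables.
orders = [{"order_id": i, "cust_id": i % 3, "price": 10 * i} for i in range(1, 13)]
customers = [
    {"cust_id": 0, "region": "ASIA"},
    {"cust_id": 1, "region": "EUROPE"},
    {"cust_id": 2, "region": "AMERICA"},
]

synopsis = join_synopsis(orders, customers, "cust_id", "cust_id", 4, random.Random(1))
for row in synopsis:
    print(row)
```

Because every fact tuple joins with exactly one dimension tuple, the synopsis has the same number of rows as the fact-table sample, so queries over the join can be answered from the synopsis alone.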
8
Icicle-Based Estimators
- An icicle is a non-uniform sample of the original relation
- Traditional scaling up is not appropriate for icicles
- A frequency must be maintained for each tuple
9
Reasoning
- The accuracy of an approximate answer grows with the number of sample tuples used to compute it
- If many queries in the workload use the frequent set of tuples, the average quality of answers improves
- After drastic changes to the workload, or for queries that do not conform to the workload, answers are less accurate than with a static sample
10
Icicle Maintenance
- The probability that a tuple is present in the icicle is proportional to its importance in answering the queries in the workload
- A tuple is selected for the icicle based on its frequency
- A workload in which all tuples are retrieved equally frequently is a uniform workload
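One way to obtain a sample in which a tuple's presence is proportional to how often the workload touches it is reservoir sampling over the relation extended with every tuple returned by past queries. The sketch below is a minimal illustration under that assumption, and it also assumes the exact result tuples of each query are available to the maintenance step; the class name `Icicle` and its methods are hypothetical, and this is not presented as the authors' exact algorithm.

```python
import random

class Icicle:
    """Fixed-size sample of R extended with every tuple returned by past queries."""

    def __init__(self, relation, size, rng):
        self.rng = rng
        self.size = size
        self.sample = rng.sample(relation, size)  # start from a uniform sample of R
        self.seen = len(relation)                 # tuples in the extended relation so far

    def observe_query(self, result_tuples):
        """Offer each tuple of a query's result to the reservoir."""
        for t in result_tuples:
            self.seen += 1
            # Standard reservoir step: the new tuple replaces a random slot
            # with probability size / seen.
            slot = self.rng.randrange(self.seen)
            if slot < self.size:
                self.sample[slot] = t

# Hypothetical workload: 200 queries that all return the same 10 "hot" tuples.
relation = list(range(1000))
icicle = Icicle(relation, 100, random.Random(42))
hot = list(range(10))
for _ in range(200):
    icicle.observe_query(hot)

hot_share = sum(1 for t in icicle.sample if t < 10) / len(icicle.sample)
print(f"share of hot tuples in the icicle: {hot_share:.2f}")
```

With a uniform workload every tuple is offered equally often, so the sample stays close to a uniform random sample; with a focused workload the hot tuples gradually take over more of the fixed sample space.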
11
Icicle Maintenance
12
Icicle Maintenance Example
SELECT average(*) FROM widget-tuners WHERE date.month = 'April'
13
Estimators for Aggregate Queries
- Traditional estimators cannot be used because of selection bias and duplicates in the icicle
- Average: the average over the distinct tuples in the sample that satisfy the query; it does not require the frequency attribute
- Count: the sum of the expected contributions of all tuples in the icicle that satisfy the query
- Sum: the product of the average and the count
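The three estimators can be sketched as follows. The slide does not spell out what a tuple's "expected contribution" is, so the weighting used here is an assumption: each icicle entry with frequency `freq` contributes `extended_size / (n * freq)`, where `n` is the icicle size and `extended_size` is the size of the relation extended with all past query results. All function names, field names, and data values are hypothetical.

```python
def icicle_average(icicle, predicate, column):
    """Average over the distinct icicle tuples that satisfy the predicate."""
    distinct = {row["id"]: row for row, _freq in icicle if predicate(row)}
    if not distinct:
        return None
    return sum(r[column] for r in distinct.values()) / len(distinct)

def icicle_count(icicle, predicate, extended_size):
    """Sum of per-entry contributions; a frequent tuple counts for less per copy."""
    n = len(icicle)
    return sum(extended_size / (n * freq) for row, freq in icicle if predicate(row))

def icicle_sum(icicle, predicate, column, extended_size):
    """Sum estimated as average times count."""
    avg = icicle_average(icicle, predicate, column)
    if avg is None:
        return 0.0
    return avg * icicle_count(icicle, predicate, extended_size)

# Hypothetical icicle: (tuple, frequency) entries; tuple A appears twice.
a = {"id": "A", "month": "April", "sales": 10}
b = {"id": "B", "month": "April", "sales": 30}
c = {"id": "C", "month": "May", "sales": 50}
icicle = [(a, 4), (a, 4), (b, 1), (c, 2)]

def in_april(row):
    return row["month"] == "April"

print(icicle_average(icicle, in_april, "sales"))     # 20.0
print(icicle_count(icicle, in_april, 20))            # 7.5
print(icicle_sum(icicle, in_april, "sales", 20))     # 150.0
```

Dividing by the frequency is what cancels the selection bias: a tuple that is four times as likely to be sampled counts for a quarter as much each time it appears.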
14
Quality Guarantees
- If the queries in the workload exhibit data locality, the icicle contains tuples from the frequently accessed subset of the relation
- The answer to a new query is more accurate than with a uniform random sample if the query accesses tuples that are frequent in the icicle
15
Performance Evaluation
Qworkload: template for generating workloads
SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey
  AND O_Orderkey = LI_Orderkey
  AND LI_Suppkey = S_Suppkey
  AND C_Nationkey = N_Nationkey
  AND N_Regionkey = R_Regionkey
  AND R_Name = [region]
  AND O_Orderdate >= Date[startdate]
  AND O_Orderdate <=

Template for obtaining approximate answers
SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LICOS-icicle, N, R
WHERE C_Nationkey = N_Nationkey
  AND N_Regionkey = R_Regionkey
  AND R_Name = [region]
  AND O_Orderdate >= Date[startdate]
  AND O_Orderdate <=
16
Performance Evaluation
17
Performance Evaluation: Mixed Workload
18
Conclusion
- Icicles are a new class of samples that are sensitive to workload characteristics
- Icicles adapt quickly to a changing workload
- Experiments show that icicles perform well when the workload focuses on relatively small subsets of the relation
19
References
V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference, 2000.
20
Thanks