ICICLES: Self-tuning Samples for Approximate Query Answering
By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan
Anthony Okorodudu, CSE 6392, 2006-2-7

Outline
- Introduction
- Uniform random sampling
- Icicles
- Icicle maintenance
- Maintaining frequency relation
- Estimators for aggregate queries
- Quality guarantee
- Performance evaluation

Introduction
- Analysis of data in data warehouses is useful in decision support
- Users of decision support systems want interactive systems
- Most decision support systems can tolerate approximate results
- Approximate query answering systems (AQUA)

Approximate Querying
Various approaches to answering approximate queries:
- Sampling-based
- Histogram-based
- Clustering
- Probabilistic
- Wavelet-based

Uniform Random Sampling

Sales (base relation)
Branch  State  Sales
1       CA     80K
2       TX     42K
3              40K
4
5              75K
6              48K
7              55K
8              38K
9
10             41K

S_sales (50% sample)
Branch  State  Sales
2       TX     42K
4       CA
6              48K
8              38K
10             41K

Query on the sample, with the result multiplied by the scale factor 2:
    SELECT SUM(sales) * 2 AS cnt FROM s_sales WHERE state = 'TX'
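
The scaling step on this slide can be sketched in Python. This is a minimal illustration, not AQUA's implementation: the sales figures are hypothetical (the slide's table is incomplete), and `estimate_sum` is an illustrative helper name.

```python
import random

def estimate_sum(sample, fraction, predicate, value):
    """Estimate SUM over the full relation from a uniform sample.

    Each sampled row stands for 1/fraction rows of the base relation,
    so the sample sum is multiplied by the scale factor 1/fraction.
    """
    scale = 1.0 / fraction
    return scale * sum(value(row) for row in sample if predicate(row))

# Hypothetical rows: (branch, state, sales in thousands).
sales = [(b, "TX" if b % 2 == 0 else "CA", 40 + b) for b in range(1, 11)]

rng = random.Random(0)         # fixed seed so the run is repeatable
sample = rng.sample(sales, 5)  # 50% uniform sample without replacement

is_tx = lambda row: row[1] == "TX"
amount = lambda row: row[2]

approx = estimate_sum(sample, 0.5, is_tx, amount)
exact = sum(amount(row) for row in sales if is_tx(row))
print(approx, exact)
```

The estimate is only as good as the number of matching rows that happen to land in the sample, which is the motivation for biasing the sample toward the workload.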

Biased Sampling

Sales (base relation)
Branch  State  Sales
1       CA     80K
2       TX     42K
3              40K
4
5              75K
6              48K
7              55K
8              38K
9
10             41K

S_sales (sample relation)
Branch  State  Sales
2       TX     42K
4       CA
5              75K
7              55K
8              38K

Sample relation for an aggregation query workload regarding Texas branches

Icicles
- A class of samples that captures the data locality of aggregate queries over foreign-key joins
- A join synopsis is the join of a uniform random sample of the fact table with a set of dimension tables
- The sample relation's space is better utilized if it holds more tuples from the actual result set

Icicle-Based Estimators
- An icicle is a non-uniform sample of the original relation
- Traditional scaling up is not appropriate for icicles
- A frequency must be maintained for each tuple

Reasoning
- The accuracy of an approximate answer is proportional to the number of tuples used to compute it
- If many queries in the workload use the frequent set of tuples, the average quality of the answers improves
- After drastic workload changes, or for queries that do not conform to the workload, answers are less accurate than with a static sample

Icicle Maintenance
- The probability of a tuple's presence is proportional to its importance in answering the queries in the workload
- A tuple is selected for the icicle based on its frequency
- A workload in which all tuples are retrieved equally frequently is a uniform workload
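
One way to obtain this behavior is sketched below. It assumes the icicle is kept as a fixed-size reservoir sample over the base relation followed by the result tuples of every workload query, so a tuple that is retrieved often has more chances to enter the sample. That assumption, the class name `Icicle`, and the method `observe_query` are illustrative and are not taken from the slides.

```python
import random
from collections import Counter

class Icicle:
    """Fixed-size sample that drifts toward frequently queried tuples."""

    def __init__(self, relation, size, seed=0):
        self.rng = random.Random(seed)
        self.size = size
        # Start from a uniform sample of the base relation.
        self.sample = self.rng.sample(relation, size)
        # Frequency relation: every tuple counts once for being in R.
        self.freq = Counter({t: 1 for t in relation})
        # Number of tuples streamed so far: |R| plus all query results.
        self.seen = len(relation)

    def observe_query(self, result_tuples):
        """Feed in the tuples that one workload query selected from R."""
        for t in result_tuples:
            self.freq[t] += 1
            self.seen += 1
            # Reservoir step: accept t with probability size / seen and
            # evict a uniformly chosen resident tuple to make room.
            if self.rng.random() < self.size / self.seen:
                self.sample[self.rng.randrange(self.size)] = t

# Hypothetical relation of (branch, state) tuples; three Texas branches.
relation = [(b, "TX" if b <= 3 else "CA") for b in range(1, 11)]
texas = [t for t in relation if t[1] == "TX"]

icicle = Icicle(relation, size=5)
for _ in range(4):              # a workload that keeps asking about Texas
    icicle.observe_query(texas)
print(icicle.sample)
print(icicle.freq[(1, "TX")], icicle.freq[(10, "CA")])
```

Under a uniform workload every tuple gains frequency at the same rate, so the reservoir stays a uniform sample of the relation.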

Icicle Maintenance [figure]

Icicle Maintenance Example
    SELECT average(*) FROM widget-tuners WHERE date.month = 'April'

Estimators for Aggregate Queries
- Traditional estimators can't be used because of selection bias and duplicates in the icicle
- Average is the average of the distinct tuples in the sample that satisfy the query
  - Doesn't require the frequency attribute
- Count is the sum of the expected contributions of all tuples in the icicle that satisfy the query
- Sum is the product of the average and the count
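
The three estimators can be sketched as follows. The slide does not spell out the "expected contribution" of a tuple, so this sketch assumes a Hansen-Hurwitz-style weight: a tuple with frequency f in an extended relation of total frequency F fills each of the n sample slots with probability f / F, so each sampled copy stands for F / (n * f) tuples of the base relation. The function name and the weight formula are assumptions, not the paper's definitions.

```python
def icicle_estimates(icicle, freq, predicate, value):
    """Return (average, count, sum) estimates from a biased sample.

    icicle -- list of sampled tuples, duplicates allowed
    freq   -- dict mapping each tuple to its frequency
    """
    n = len(icicle)
    total_freq = sum(freq.values())
    matches = [t for t in icicle if predicate(t)]
    distinct = set(matches)
    if not distinct:
        return None, 0.0, 0.0
    # Average: plain mean over distinct matching tuples, no frequencies.
    avg = sum(value(t) for t in distinct) / len(distinct)
    # Count: add up the assumed contribution of every sampled copy.
    count = sum(total_freq / (n * freq[t]) for t in matches)
    # Sum: product of the two estimates above.
    return avg, count, avg * count

# Uniform workload check: all frequencies are 1, so each of the 2 sampled
# tuples stands for 4 / 2 = 2 tuples of the 4-tuple relation.
freq = {("a", 10): 1, ("b", 20): 1, ("c", 30): 1, ("d", 40): 1}
icicle = [("a", 10), ("c", 30)]
print(icicle_estimates(icicle, freq, lambda t: True, lambda t: t[1]))
# (20.0, 4.0, 80.0)
```

With all frequencies equal to 1 the weight reduces to the usual scale factor N / n, so the sketch agrees with uniform random sampling in that special case.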

Quality Guarantees
- If the queries in the workload exhibit data locality, the icicle contains tuples from the frequently accessed subset of the relation
- The answer to a new query is more accurate than with a uniform random sample if the query accesses tuples that are frequent in the icicle

Performance Evaluation

Qworkload: template for generating workloads
    SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
    FROM LI, C, O, S, N, R
    WHERE C_Custkey = O_Custkey
      AND O_Orderkey = LI_Orderkey
      AND LI_Suppkey = S_Suppkey
      AND C_Nationkey = N_Nationkey
      AND N_Regionkey = R_Regionkey
      AND R_Name = [region]
      AND O_Orderdate >= Date[startdate]
      AND O_Orderdate <= 12-31-1998

Template for obtaining approximate answers
    SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
    FROM LICOS-icicle, N, R
    WHERE C_Nationkey = N_Nationkey
      AND N_Regionkey = R_Regionkey
      AND R_Name = [region]
      AND O_Orderdate >= Date[startdate]
      AND O_Orderdate <= 12-31-1998

Performance Evaluation [figure]

Performance Evaluation: Mixed Workload [figure]

Conclusion
- Icicles are a new class of samples that are sensitive to workload characteristics
- Icicles adapt quickly to a changing workload
- Experiments show that icicles perform well when the workload focuses on relatively small subsets of the relation

References
V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference, 2000.

Thanks