Anthony Okorodudu CSE 6392 2006-2-7 ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan.



Outline
- Introduction
- Uniform random sampling
- Icicles
- Icicle maintenance
- Maintaining frequency relation
- Estimators for aggregate queries
- Quality guarantee
- Performance evaluation
2006/2/7 ICICLES: Self-tuning Samples for Approximate Query Answering

Introduction
- Analysis of data in data warehouses is useful in decision support
- Users of decision support systems want interactive response times
- Most decision support systems can tolerate approximate results
- Approximate query answering systems (AQUA)

Approximate Querying
Various approaches to approximate query answering:
- Sampling-based
- Histogram-based
- Clustering
- Probabilistic
- Wavelet-based

Uniform Random Sampling

Sales (original relation)
Branch  State  Sales
1       CA     80K
2       TX     42K
3              40K
4
5              75K
6              48K
7              55K
8              38K
9
10             41K

S_sales (50% sample)
Branch  State  Sales
2       TX     42K
4       CA
6              48K
8              38K
10             41K

Query on the sample (the factor 2 is the scale factor for a 50% sample):
    SELECT SUM(sales) * 2 AS cnt FROM s_sales WHERE state = 'TX'
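The scale-up step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the rows, the `uniform_sample` and `estimate_sum` helpers, and the sales figures are all hypothetical. It shows that a SUM computed on a uniform sample taken with fraction f is scaled by 1/f, which is the factor 2 used for the 50% sample.

```python
import random

# Hypothetical rows (branch, state, sales in thousands); not the slide's data.
ROWS = [
    {"branch": b, "state": s, "sales": v}
    for b, s, v in [
        (1, "CA", 80), (2, "TX", 42), (3, "TX", 40), (4, "CA", 61),
        (5, "TX", 75), (6, "CA", 48), (7, "TX", 55), (8, "TX", 38),
        (9, "CA", 52), (10, "CA", 41),
    ]
]

def uniform_sample(rows, fraction, seed=0):
    """Draw a uniform random sample without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)

def estimate_sum(sample, fraction, predicate, column):
    """Scale the sample SUM by 1/fraction (x2 for a 50% sample)."""
    return sum(r[column] for r in sample if predicate(r)) / fraction

def is_texas(row):
    return row["state"] == "TX"

half = uniform_sample(ROWS, 0.5)
approx = estimate_sum(half, 0.5, is_texas, "sales")
exact = sum(r["sales"] for r in ROWS if is_texas(r))
print(f"approximate TX sales: {approx}K, exact: {exact}K")
```

How close the estimate is depends on how many Texas rows happen to land in the sample, which is the weakness the following slides address.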

Biased Sampling

Sales (original relation)
Branch  State  Sales
1       CA     80K
2       TX     42K
3              40K
4
5              75K
6              48K
7              55K
8              38K
9
10             41K

S_sales (biased sample)
Branch  State  Sales
2       TX     42K
4       CA
5              75K
7              55K
8              38K

Sample relation for an aggregation query workload regarding Texas branches

Icicles
- A class of samples that captures the data locality of aggregate queries over foreign key joins
- A join synopsis is the join of a uniform random sample of the fact table with a set of dimension tables
- The space of the sample relation is better utilized if it holds more tuples from the actual result sets of queries

Icicle-Based Estimators
- An icicle is a non-uniform sample of the original relation
- Traditional scaling up is not appropriate for icicles
- A frequency must be maintained for each tuple

Reasoning
- The accuracy of an approximate answer improves with the number of sample tuples used to compute it
- If many queries in the workload use the frequent set of tuples, then the average quality of answers improves
- After drastic changes to the workload, or for queries that do not conform to the workload, answers are less accurate than with a static sample

Icicle Maintenance
- The probability of a tuple's presence in the icicle is proportional to its importance in answering queries in the workload
- A tuple is selected for the icicle based on its frequency
- A workload in which all tuples are retrieved equally frequently is a uniform workload
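One way to obtain a sample whose membership probability tracks access frequency is sketched below. It assumes, and this is an assumption rather than something stated on the slide, that the icicle is kept as a fixed-size reservoir over the relation followed by every tuple that workload queries touch. The `Icicle` class and its method names are hypothetical, and tuples are plain integers for brevity.

```python
import random

class Icicle:
    """Fixed-size reservoir over the relation plus all tuples touched by queries.

    Hypothetical sketch: a tuple offered f times is roughly f times as likely
    to occupy a slot as a tuple offered once.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.sample = []   # current icicle contents (may hold duplicates)
        self.seen = 0      # number of tuples offered so far
        self.rng = random.Random(seed)

    def offer(self, tup):
        """Standard reservoir step: keep the new tuple with prob. capacity/seen."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(tup)
            return
        slot = self.rng.randrange(self.seen)
        if slot < self.capacity:
            self.sample[slot] = tup

    def load_relation(self, relation):
        """Start from a uniform sample of the base relation."""
        for tup in relation:
            self.offer(tup)

    def observe_query(self, touched_tuples):
        """Feed the tuples a workload query needed back into the reservoir."""
        for tup in touched_tuples:
            self.offer(tup)

# Usage: 100 tuples, and a workload that keeps hitting tuples 0..4.
icicle = Icicle(capacity=20)
icicle.load_relation(range(100))
for _ in range(200):
    icicle.observe_query(range(5))
hot = sum(1 for t in icicle.sample if t < 5)
print(f"{hot} of {len(icicle.sample)} slots hold frequently accessed tuples")
```

Under a uniform workload every tuple is offered equally often, so the same procedure degenerates to an ordinary uniform random sample.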

Icicle Maintenance (figure)

Icicle Maintenance Example
    SELECT average(*) FROM widget-tuners WHERE date.month = 'April'

Estimators for Aggregate Queries
- Traditional estimators cannot be used because of selection bias and duplicates in the icicle
- Average is the average over the distinct tuples in the sample that satisfy the query
  - It does not require the frequency attribute
- Count is the sum of the expected contributions of all tuples in the icicle that satisfy the query
- Sum is the product of the average and the count
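The three estimators can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact formulas: each icicle row is assumed to carry a frequency `freq` (how many times the tuple occurs in the multiset the icicle was drawn from), `multiset_size` is the size of that multiset, and a copy of a tuple is assumed to contribute `multiset_size / (icicle_size * freq)` to the count. All names are hypothetical.

```python
from collections import namedtuple

# key: tuple identity, value: aggregated attribute, freq: assumed frequency
Row = namedtuple("Row", "key value freq")

def icicle_avg(icicle, pred):
    """Average over DISTINCT qualifying tuples; frequency is not needed."""
    distinct = {r.key: r.value for r in icicle if pred(r)}
    if not distinct:
        return None
    return sum(distinct.values()) / len(distinct)

def icicle_count(icicle, pred, multiset_size):
    """Sum of per-copy contributions, each the inverse of the expected
    number of copies of that tuple in the icicle (assumed weighting)."""
    n = len(icicle)
    return sum(multiset_size / (n * r.freq) for r in icicle if pred(r))

def icicle_sum(icicle, pred, multiset_size):
    """Sum is the product of the average and the count."""
    avg = icicle_avg(icicle, pred)
    if avg is None:
        return 0.0
    return avg * icicle_count(icicle, pred, multiset_size)

def everything(row):
    return True

# Tiny check: the icicle holds the whole multiset (tuple "b" occurs twice),
# so the estimates should match the 3-tuple base relation exactly.
icicle_rows = [Row("a", 10, 1), Row("b", 20, 2), Row("b", 20, 2), Row("c", 30, 1)]
print(icicle_avg(icicle_rows, everything))
print(icicle_count(icicle_rows, everything, multiset_size=4))
print(icicle_sum(icicle_rows, everything, multiset_size=4))
```

Dividing by the frequency is what removes the selection bias: a tuple that is over-represented in the icicle counts for proportionally less per copy.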

Quality Guarantees
- If the queries in the workload exhibit data locality, then the icicle contains tuples from the frequently accessed subset of the relation
- The answer to a new query is more accurate than with a uniform random sample if the query accesses tuples that are frequent in the icicle

Performance Evaluation

Qworkload: template for generating workloads
    SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
    FROM LI, C, O, S, N, R
    WHERE C_Custkey = O_Custkey
      AND O_Orderkey = LI_Orderkey
      AND LI_Suppkey = S_Suppkey
      AND C_Nationkey = N_Nationkey
      AND N_Regionkey = R_Regionkey
      AND R_Name = [region]
      AND O_Orderdate >= Date[startdate]
      AND O_Orderdate <= 12-31-1998

Template for obtaining approximate answers
    SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
    FROM LICOS-icicle, N, R
    WHERE C_Nationkey = N_Nationkey
      AND N_Regionkey = R_Regionkey
      AND R_Name = [region]
      AND O_Orderdate >= Date[startdate]
      AND O_Orderdate <= 12-31-1998

Performance Evaluation (figure)

Performance Evaluation: Mixed Workload (figure)

Conclusion
- Icicles are a new class of samples that are sensitive to workload characteristics
- Icicles adapt quickly to a changing workload
- Experiments show that icicles perform well when the workload focuses on relatively small subsets of the relation

References
V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference 2000.

Thanks