Presented by: Mariam John, CSE 6392, 02/14/2006

Presentation transcript:

Dynamic Sample Selection for Approximate Query Processing
Brian Babcock, Surajit Chaudhuri & Gautam Das
Presented by: Mariam John, CSE 6392, 02/14/2006

Contents
- Introduction
- Dynamic Sample Selection
- Policies for Sample Selection
- Small Group Sampling
- Pre-Processing Phase
- Summary

Why do we do Approximate Query Processing?
- Multi-gigabyte data repositories
- Data analysis applications
- Data mining
- Decision support analysis
- Fast query response time
- Acceptability of inexact query responses

Problem
Constructing an optimal sample that represents the underlying data well. Two approaches:
- Uniform sampling
- Non-uniform sampling

Non-uniform sampling
- The purpose is to produce more accurate results across a particular set of queries.
- For the queries it targets, it produces more accurate approximate answers than uniform sampling.
- The optimal bias differs from query to query.
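To make the contrast with uniform sampling concrete, here is a minimal Python sketch of answering a group-by COUNT from a uniform sample by scaling each sampled count by 1/rate. The function names, the `state` column, and the data are hypothetical and are not taken from the slides. With a low sampling rate, a rare group may be estimated poorly or be missing from the answer altogether, which is the weakness that biased samples are meant to address.

```python
import random
from collections import Counter

def uniform_sample(rows, rate, seed=0):
    """Keep each row independently with probability `rate`."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < rate]

def estimate_group_counts(sample, rate, column):
    """Estimate COUNT(*) per group by scaling sampled counts by 1/rate."""
    counts = Counter(r[column] for r in sample)
    return {group: c / rate for group, c in counts.items()}

# Hypothetical skewed data: one large group and one small group.
rows = [{"state": "CA"}] * 9900 + [{"state": "WY"}] * 100

sample = uniform_sample(rows, rate=0.01, seed=1)
estimates = estimate_group_counts(sample, rate=0.01, column="state")
# A group absent from the sample is simply absent from `estimates`.
print(estimates)
```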

Dynamic Sample Selection
[Figure: Standard Sampling versus Dynamic Sample Selection, each drawn from DATA to SAMPLE]

Dynamic Sample Selection: Pre-Processing Phase
[Figure: pre-processing phase. Labels: Query Workload, Select Strata, Build Samples, Sample Data, Metadata]

Dynamic Sample Selection: Runtime Phase
[Figure: runtime phase. Labels: Query, Choose Samples, Rewrite Query, Sample Data, Metadata]

Dynamic Sample Selection
- How to identify the set of biased samples to be created? This occurs during the pre-processing phase.
- How to determine which of the various samples to use to answer a query? This occurs during the runtime phase.
- The simplest and most efficient strategy is to let the syntax of the incoming query guide the choice of samples.
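A syntax-guided choice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the metadata is modeled as a dictionary from column name to the name of a biased sample table, and the table names (`s_overall`, `sg_state`) are hypothetical.

```python
def choose_samples(group_by_columns, metadata, overall="s_overall"):
    """Pick sample tables by looking only at the query's GROUP BY columns.

    `metadata` maps a column name to the (hypothetical) name of the biased
    sample table built for that column during pre-processing.
    """
    chosen = [overall]  # in this sketch the overall sample is always used
    for col in group_by_columns:
        if col in metadata:
            chosen.append(metadata[col])
    return chosen

# A query grouping on state and product, where only state has a biased sample.
print(choose_samples(["state", "product"], {"state": "sg_state"}))
```

No data is scanned to make the decision, which is why this kind of policy is cheap at runtime.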

Small Group Sampling
A specific dynamic sample selection technique that targets aggregate queries with group-bys. The small group sampling approach uses:
- Overall sample: uniform sampling, which represents the large groups.
- Small group tables: one or more sample tables for the smaller groups.

Small Group Sampling
The set of small groups depends on:
- grouping columns
- selection predicates

Small Group Sampling
The idea behind small group sampling:
- Determine for which values in each column to create small group tables.
- Create small group tables for each column of a table, along with the overall sample.
- During runtime, choose the subset of sample tables that answers a query most accurately.
- The query is rewritten to run against the sample tables instead of the base tables.
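The runtime step can be sketched as follows for a single grouping column and a COUNT aggregate. This is a hedged illustration and not the authors' implementation: it assumes that the small group table holds every row with a rare value (so those rows count with weight 1), that the overall sample was drawn uniformly at `rate` (so its rows are scaled by 1/rate), and that rows with rare values are skipped in the overall sample so they are not counted twice. All names are hypothetical.

```python
from collections import defaultdict

def approx_group_count(overall, rate, small_rows, rare_values, column):
    """Approximate COUNT(*) GROUP BY `column` from the sample tables."""
    estimate = defaultdict(float)
    # Rows from the small group table are counted as they are.
    for row in small_rows:
        estimate[row[column]] += 1.0
    # Rows from the overall sample are scaled up; rare values are skipped
    # because the small group table already covers them.
    for row in overall:
        if row[column] not in rare_values:
            estimate[row[column]] += 1.0 / rate
    return dict(estimate)

overall = [{"state": "CA"}] * 10 + [{"state": "WY"}]
small_rows = [{"state": "WY"}] * 7
print(approx_group_count(overall, 0.01, small_rows, {"WY"}, "state"))
```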

Pre-processing Phase
For every column, identify the rare values within it and create small group tables. The pre-processing phase produces three outputs:
- Overall sample table
- Small group tables
- Metadata table
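A minimal Python sketch of this phase is given below. The rule used to call a value rare (it occurs in fewer than `rare_fraction` of the rows) is an assumption made for illustration, as are the function and parameter names; the slides do not state the exact criterion.

```python
import random
from collections import Counter

def preprocess(rows, columns, sample_rate, rare_fraction, seed=0):
    """Build the three outputs: overall sample, small group tables, metadata."""
    rng = random.Random(seed)
    n = len(rows)
    # Overall sample: a uniform sample of the base table.
    overall = [r for r in rows if rng.random() < sample_rate]
    small_group_tables = {}
    metadata = {}
    for col in columns:
        freq = Counter(r[col] for r in rows)
        # Assumed rule: a value is rare if it covers less than
        # `rare_fraction` of all rows.
        rare = {v for v, c in freq.items() if c < rare_fraction * n}
        if rare:
            small_group_tables[col] = [r for r in rows if r[col] in rare]
            metadata[col] = rare  # records which values each table covers
    return overall, small_group_tables, metadata
```

Columns with no rare values get no small group table, so the metadata also tells the runtime phase which columns have one.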

Pre-processing Phase
- Rows can appear in multiple sample tables.
- A bitmask field identifies the set of sample tables to which a row was added.
- The bitmask avoids double counting of rows assigned to multiple sample tables.
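One way the bitmask could be used at query time is sketched below. The layout is an assumption for illustration: each sample table is given one bit, each stored row carries a `mask` field with the bits of all tables that contain it, and the chosen tables are read in a fixed order, with a row skipped if it already appeared in a table read earlier.

```python
def combine_without_double_counting(tables):
    """Union the chosen sample tables, keeping each row once.

    `tables` is a list of (bit, rows) pairs; every row is a dict whose
    "mask" field has one bit set per sample table the row belongs to.
    """
    seen_bits = 0
    result = []
    for bit, rows in tables:
        # Keep a row only if it is in none of the tables already read.
        result.extend(r for r in rows if (r["mask"] & seen_bits) == 0)
        seen_bits |= bit
    return result

a = {"id": 1, "mask": 0b01}  # only in the table with bit 1
b = {"id": 2, "mask": 0b11}  # in both tables
c = {"id": 3, "mask": 0b10}  # only in the table with bit 2
combined = combine_without_double_counting([(0b01, [a, b]), (0b10, [b, c])])
print([r["id"] for r in combined])
```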

Summary
Dynamic Sample Selection
- Takes advantage of available disk space.
- Creates multiple biased sample tables during the pre-processing phase.
- Picks the best samples at runtime for query processing.
Small Group Sampling
- The notion is to treat large and small groups differently.
- Creates an overall sample table for the large groups and a number of small group tables for the rare values in each column.