Dynamic Sample Selection for Approximate Query Processing
Brian Babcock, Surajit Chaudhuri & Gautam Das
Presented by: Mariam John, CSE 6392, 02/14/2006

Contents
Introduction
Dynamic Sample Selection
Policies for Sample Selection
Small Group Sampling
Pre-Processing Phase
Summary

Why do we do Approximate Query Processing?
Multi-gigabyte data repositories
Data analysis applications: data mining, decision support analysis
Fast query response time
Acceptability of inexact query response

Problem
Constructing an optimal sample that represents the underlying data well.
Uniform sampling
Non-uniform sampling

Non-uniform sampling
Purpose is to produce more accurate results across a particular set of queries.
For those queries, it gives more accurate approximate answers than uniform sampling.
Optimal bias differs from query to query.
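
To make the uniform baseline concrete, here is a minimal sketch of estimating a per-group COUNT from a uniform sample by scaling with the inverse sampling rate. The function names, the `state` column, and the data are hypothetical and only serve the illustration. With a low sampling rate, a rare group can easily be missing from the sample, which is the weakness that biased samples try to address.

```python
import random
from collections import Counter

def uniform_sample(rows, rate, rng):
    """Keep each row independently with probability `rate`."""
    return [row for row in rows if rng.random() < rate]

def estimate_group_counts(sample, rate, group_col):
    """Estimate COUNT(*) per group by scaling sample counts by 1/rate."""
    counts = Counter(row[group_col] for row in sample)
    return {group: n / rate for group, n in counts.items()}

# Hypothetical data: one large group ("CA") and one rare group ("WY").
rows = [{"state": "CA"}] * 990 + [{"state": "WY"}] * 10

sample = uniform_sample(rows, rate=0.01, rng=random.Random(0))
# Groups that did not make it into the sample get no estimate at all.
print(estimate_group_counts(sample, 0.01, "state"))
```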

Dynamic Sample Selection
[Figure: Standard Sampling compared with Dynamic Sample Selection; boxes labeled DATA and SAMPLE]

Pre-Processing Phase
[Figure: pre-processing phase diagram; labels: Query Workload, Sample Data, Select Strata, Build Sample, Data, Metadata]

Runtime Phase
[Figure: runtime phase diagram; labels: Query, Sample Data, Choose Samples, Rewrite Query, Metadata]

How to identify the set of biased samples to be created?
Occurs during the pre-processing phase.
How to determine which of the various samples to use to answer a query?
Occurs during the runtime phase.
The simplest and most efficient strategy is to let the syntax of the incoming query guide the choice of samples.

Small Group Sampling
A specific dynamic sample selection technique that targets aggregate queries with group-bys.
Small group sampling approach:
Overall sample: uniform sampling, used for the large groups.
Small group tables: one or more sample tables for the smaller groups.

Small Group Sampling
The set of small groups depends on:
grouping columns
selection predicates
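
A small sketch of why the set of small groups is query-dependent: the same rows produce different small groups when the grouping column or the selection predicate changes. The data, the column names, and the `max_size` cutoff are assumptions made up for this example.

```python
from collections import Counter

def group_sizes(rows, group_by, predicate=lambda row: True):
    """Row count per group for the given grouping columns and filter."""
    return Counter(
        tuple(row[col] for col in group_by) for row in rows if predicate(row)
    )

def small_groups(rows, group_by, predicate=lambda row: True, max_size=2):
    """Groups with at most `max_size` rows (hypothetical cutoff)."""
    sizes = group_sizes(rows, group_by, predicate)
    return {group for group, n in sizes.items() if n <= max_size}

rows = (
    [{"region": "West", "product": "A", "year": 2005}] * 5
    + [{"region": "West", "product": "B", "year": 2005}] * 1
    + [{"region": "East", "product": "A", "year": 2005}] * 2
    + [{"region": "East", "product": "A", "year": 2004}] * 4
)

print(small_groups(rows, ["region"]))    # both regions have 6 rows
print(small_groups(rows, ["product"]))   # product "B" has 1 row
print(small_groups(rows, ["region"], lambda r: r["year"] == 2005))
```

Grouping by `region` alone yields no small group, grouping by `product` makes "B" small, and adding the predicate on `year` turns "East" into a small group.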

Small Group Sampling
Idea behind Small Group Sampling:
Determine for which values in each column to create small group tables.
Create small group tables for each column of a table, along with the overall sample.
During runtime, choose the subset of sample tables that answers a query most accurately.
The query is rewritten to run against the sample tables instead of the base tables.
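
The runtime step can be sketched roughly as follows, assuming the choice of sample tables is driven by the query's group-by columns. The metadata layout, the table names (`sg_state`, `overall_sample`), and the `{table}` placeholder are hypothetical. Scaling the overall-sample result and combining the partial answers are left out here.

```python
def choose_sample_tables(group_by_cols, metadata):
    """Pick the small group tables built on the query's grouping columns."""
    return [metadata[col] for col in group_by_cols if col in metadata]

def rewrite_query(query_template, group_by_cols, metadata, overall_table):
    """Return one query per chosen sample table instead of the base table.

    `query_template` contains a `{table}` placeholder where the base
    table name would appear (an assumption of this sketch).
    """
    tables = choose_sample_tables(group_by_cols, metadata) + [overall_table]
    return [query_template.format(table=name) for name in tables]

# Hypothetical metadata: grouping column -> name of its small group table.
metadata = {"state": "sg_state", "product": "sg_product"}
template = "SELECT state, COUNT(*) FROM {table} GROUP BY state"

for sql in rewrite_query(template, ["state"], metadata, "overall_sample"):
    print(sql)
```

Keeping the selection rule purely syntactic means the runtime phase needs only a metadata lookup, with no access to the data itself.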

Pre-Processing Phase
For every column, identify the rare values within it and create a small group table.
The pre-processing phase produces three outputs:
Overall sample table
Small group tables
Metadata table
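
A minimal sketch of the pre-processing step, producing the three outputs listed above. The rule for what counts as a rare value (a value covering less than `rare_fraction` of the rows) is an assumption for illustration, as are the parameter names and the metadata layout; the slides do not give the exact criterion.

```python
import random
from collections import Counter

def preprocess(rows, columns, rare_fraction, sample_rate, rng):
    """Build an overall sample, per-column small group tables, and metadata."""
    # Overall sample: uniform sample of the whole table.
    overall = [row for row in rows if rng.random() < sample_rate]
    small_group_tables = {}
    metadata = {}
    for col in columns:
        freq = Counter(row[col] for row in rows)
        # Assumed rule: a value is rare if it covers < rare_fraction of rows.
        rare = {v for v, n in freq.items() if n / len(rows) < rare_fraction}
        if rare:
            # Small group table: every row holding a rare value in `col`.
            small_group_tables[col] = [r for r in rows if r[col] in rare]
            metadata[col] = {"table": f"sg_{col}", "rare_values": sorted(rare)}
    return overall, small_group_tables, metadata

# Hypothetical base table.
rows = (
    [{"state": "CA", "product": "A"}] * 95
    + [{"state": "WY", "product": "A"}] * 3
    + [{"state": "CA", "product": "B"}] * 2
)
overall, tables, meta = preprocess(
    rows, ["state", "product"], rare_fraction=0.05, sample_rate=0.1,
    rng=random.Random(0),
)
print(meta)
```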

Pre-Processing Phase
Rows can appear in multiple sample tables.
A bitmask field identifies the set of sample tables to which a row was added.
This avoids double counting of rows assigned to multiple sample tables.
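
One way the bitmask could be used at query time is sketched below. Each row carries a `mask` whose bit i is set if the row was added to small group table i. A row is counted only in the first chosen table that contains it, and overall-sample rows are scaled by the inverse sampling rate. The field name `mask`, the bit assignment, and the data are assumptions for this example.

```python
from collections import defaultdict

def estimate_counts(small_tables, overall_sample, overall_rate, group_cols):
    """Estimate COUNT(*) per group without double counting rows.

    small_tables: list of (bit, rows) pairs for the chosen small group tables.
    """
    totals = defaultdict(float)
    seen_bits = 0
    for bit, rows in small_tables:
        for row in rows:
            # Skip rows already counted via an earlier small group table.
            if (row["mask"] & seen_bits) == 0:
                totals[tuple(row[c] for c in group_cols)] += 1.0
        seen_bits |= 1 << bit
    for row in overall_sample:
        # Skip rows that any chosen small group table already covers.
        if (row["mask"] & seen_bits) == 0:
            totals[tuple(row[c] for c in group_cols)] += 1.0 / overall_rate
    return dict(totals)

# Bit 0: small group table on "state"; bit 1: small group table on "product".
r1 = {"state": "WY", "product": "A", "mask": 0b01}
r2 = {"state": "WY", "product": "B", "mask": 0b11}
r3 = {"state": "CA", "product": "B", "mask": 0b10}
common = {"state": "CA", "product": "A", "mask": 0b00}

result = estimate_counts(
    small_tables=[(0, [r1, r2]), (1, [r2, r3])],
    overall_sample=[r2, common, common],
    overall_rate=0.5,
    group_cols=["state", "product"],
)
print(result)
```

Row `r2` sits in both small group tables and in the overall sample, yet it contributes to its group only once.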

Summary
Dynamic Sample Selection
Takes advantage of available disk space.
Creates multiple biased sample tables during the pre-processing phase.
Picks the best samples at runtime for query processing.
Small Group Sampling
The notion is to treat large and small groups differently.
Creates an overall sample table for large groups and a number of small group tables for the rare values in each column.
