Symbolic Query Processing
Eric Lo, ETH Zurich
Joint work with Carsten Binnig (U of Heidelberg), Donald Kossmann (ETH Zurich), Tamer Ozsu (U of Waterloo) and Peter Haas (IBM Almaden Research Center)
© ETH Zürich

Symbolic Query Processing
- Treat all data as symbols (think of them as variables)
  - E.g., a1 represents any value in the domain of attribute a
- Tables R and S are called symbolic relations

Background – Symbolic Execution 1/3
- Borrows the concept from symbolic execution, a well-known program verification technique
- Represents the values of program variables with symbolic values instead of concrete data
- Manipulates expressions based on those symbolic values

Background – Symbolic Execution 2/3
Program:
1. minsalary = read_input();
2. bensalary = minsalary;
3. if (bensalary < 80000)
4.     output "no kidding!";
5. else
6.     output "that's right";
Goal: find a test case for the path 1 → 2 → 3 → 6.
Symbolic execution (start):
1. minsalary = ben
2. bensalary = ben
3. bensalary = ben and !(bensalary < 80000)   (the path constraint)
Symbolic execution (end)
Instantiate the path constraint: ben = any value ≥ 80000 gives the expected input, and "that's right" is the expected output.
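To make the walkthrough above concrete, here is a toy symbolic executor for the six-line program. It is only a sketch under stated assumptions: the dictionary encoding of the program, the helper names `explore` and `solve`, and the bound-only "solver" are all hypothetical. Real symbolic executors hand the path constraint to a general constraint solver.

```python
import math

# Hypothetical encoding of the six-line program: line number -> statement.
PROGRAM = {
    1: ("read", "minsalary", 2),                 # minsalary = read_input()
    2: ("assign", "bensalary", "minsalary", 3),  # bensalary = minsalary
    3: ("if_lt", "bensalary", 80000, 4, 6),      # then -> line 4, else -> line 6
    4: ("output", "no kidding!"),
    6: ("output", "that's right"),
}

def explore(line=1, env=None, path=(), cond=()):
    """Yield (path, path_constraint, output) for every path through PROGRAM."""
    env = dict(env or {})
    path = path + (line,)
    stmt = PROGRAM[line]
    if stmt[0] == "read":
        _, var, nxt = stmt
        env[var] = "ben"                  # fresh symbol instead of a concrete input
        yield from explore(nxt, env, path, cond)
    elif stmt[0] == "assign":
        _, dst, src, nxt = stmt
        env[dst] = env[src]               # copy the symbol, not a value
        yield from explore(nxt, env, path, cond)
    elif stmt[0] == "if_lt":
        _, var, const, then_line, else_line = stmt
        sym = env[var]
        # Fork: each branch extends the path constraint with its condition.
        yield from explore(then_line, env, path, cond + ((sym, "<", const),))
        yield from explore(else_line, env, path, cond + ((sym, ">=", const),))
    else:                                 # an output statement ends the path
        yield path, cond, stmt[1]

def solve(cond):
    """Tiny solver for bound constraints on one integer symbol; None if unsatisfiable."""
    lo, hi = -math.inf, math.inf
    for _sym, op, const in cond:
        if op == "<":
            hi = min(hi, const - 1)
        else:                             # ">="
            lo = max(lo, const)
    if lo > hi:
        return None
    if lo != -math.inf:
        return lo
    return hi if hi != math.inf else 0

for path, cond, out in explore():
    print(path, cond, "input:", solve(cond), "expected output:", out)
```

For the path 1 → 2 → 3 → 6 the constraint is `ben >= 80000`, so the smallest valid input is 80000, paired with the expected output "that's right".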

Background – Symbolic Execution 3/3
- Has been researched for more than 20 years
- Still has many limitations, e.g., it cannot handle highly complex software
- However, many large software vendors (e.g., Microsoft Research) still put their hopes in this technique for program verification
- No progress on database applications, which involve an external database and SQL

SQP Applications
- Extend program verification and symbolic execution techniques to support database applications
- For DBMS testing (the focus of today's talk)

ETH Zurich 7 Symbolic Query Processing  Query manipulates data according to different needs  R b=c S  Want the join results to have one tuple? set c1=b1  Want the join results to have:  four tuples  Zipf distribution (t1 joins more, t2 joins less)? b1

ETH Zurich 8 DBMS Testing  To test a DBMS, we generate a lot of test databases and execute a lot of test queries  DBMS vendors are looking for a way to control the intermediate results of a test query such that we can test an individual component of a DBMS under a particular test case

DBMS Testing Example
- Test the accuracy of the cardinality estimation component of a query optimizer under
  - a multi-way hash join query
  - a two-way join query with aggregation
- This works only if we can make sure that executing the test query on the test database gives the expected answer

ETH Zurich 10 DBMS Testing  The test query is given  Physical join ordering can be fixed (by testers)  Evaluation algorithm (e.g., using hash-join) can be fixed too  However, the size of the intermediate results cannot be fixed easily

DBMS Testing Problem
Guarantee that executing a test query on a test database yields the desired intermediate query results (e.g., output cardinality, data distribution).

ETH Zurich 12 DBMS Testing Problem  A test case T is:  a parametric query Q p  with a set of constraints C on each intermediate result  A good test database D means  Q p ( D ) satisfies C -if the set of parameters p is properly instantiated  D covers test case T Test case T

Trial-and-error
- Generate Databases 3, 2, and 1 using traditional database generators such as the IBM test DB generator, the MSR DB generator, etc.
- Search for parameters
- T2 is never covered
  - The database generation process does not take the test queries into account

Latest approach – Finding query parameters
- MSR recognized this problem [TKDE06]
- Given the test database D and the test query Q_p, search for parameter values p such that Q_p(D) (almost) fits the cardinality requirements defined in the test case
- This is an NP-hard problem
- As with the previous approach, T2 is never covered
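For a single range predicate, the parameter search is easy to brute-force, and it also shows the core limitation: when the database is fixed, some cardinalities are simply unreachable, so the best p only "almost" fits. The sketch below is not the [TKDE06] algorithm; the function `find_parameter` and its candidate scheme are assumptions, and the NP-hardness only arises with many parameters across joins.

```python
def find_parameter(values: list[int], target: int) -> tuple[int, int]:
    """Choose p for the predicate `a > p` so that |σ(a > p)(R)| is closest to target.

    Returns (p, achieved size). Assumes a non-empty column of integers.
    """
    candidates = sorted(set(values))
    candidates.insert(0, candidates[0] - 1)   # a p below every value selects all rows
    best = None
    for p in candidates:
        # The result size only changes at data values, so these candidates suffice.
        size = sum(1 for v in values if v > p)
        if best is None or abs(size - target) < abs(best[1] - target):
            best = (p, size)
    return best

print(find_parameter([1, 5, 9], 1))   # (5, 1): exact fit
print(find_parameter([7, 7, 7], 2))   # (6, 3): no p yields exactly 2 rows
```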

QAGen – Query-Aware Test Database Generator
- Based on symbolic query processing
- We can control the output size of each intermediate query result (and even more)

QAGen – Generate a query-aware test database for each test case

QAGen overview

QAGen overview – Query Analyzer
- Analyzes the query and assigns knobs to its operators
- A knob is a parameter of an operator that controls its output (e.g., output cardinality, distribution)
- A knob for an operator is not always available for tuning

QAGen overview – Query Analyzer
A knob for an operator is not always available for tuning. (Figure: join distribution knob available? Yes / join distribution knob available? No)

QAGen overview – Query Analyzer
The knob(s) available for an operator depend on its input characteristics:
- Definition: pre-grouped data
- Definition: non-pre-grouped data
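The definitions of pre-grouped data were shown graphically, but the rule they feed (stated later for the FK join) can be sketched: the analyzer walks the plan and attaches knobs, withholding the join-distribution knob when the join input is pre-grouped. Everything here is hypothetical: the `Operator` class, the knob names, and the assumption that `pre_grouped_input` was already computed by an earlier pass.

```python
from dataclasses import dataclass, field

@dataclass
class Operator:
    kind: str                                   # e.g. "table", "select", "join", "agg"
    children: list = field(default_factory=list)
    pre_grouped_input: bool = False             # assumed to come from an earlier analysis
    knobs: set = field(default_factory=set)

def annotate(op: Operator) -> Operator:
    """Hypothetical analyzer pass: attach tunable knobs to every operator, bottom-up."""
    for child in op.children:
        annotate(child)
    op.knobs.add("output_cardinality")
    if op.kind == "join" and not op.pre_grouped_input:
        op.knobs.add("join_distribution")       # disabled when the join input is pre-grouped
    return op

plan = annotate(Operator("join", [Operator("select", [Operator("table")]), Operator("table")]))
print(sorted(plan.knobs))   # ['join_distribution', 'output_cardinality']
```

The result is a knob-annotated plan: each operator carries only the knobs the analyzer considers tunable for its input.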

QAGen overview – Query Analyzer

Symbolic Query Engine and Symbolic Database

ETH Zurich 23 Symbolic Query Engine and Symbolic Database (SDB)  An SQL operator:  Add predicates to a symbol  Replace a symbol with another other symbol (e.g., joining)  E.g., SELECT a FROM R WHERE a > p;  1 output σ a>p <=p >p

ETH Zurich 24 Symbolic Query Engine and Symbolic Database (SDB)  How to physically store the symbolic data?  Options:  Implement a native symbolic database  Use relational database -How to represent “ a1 > p ”? -Stores all predicates that are associated with a symbol s in a separate relation called PTable <=p >p a1 a1>p a2 a2<=p s Pred. PTable

Data Instantiator

Data Instantiation
The data instantiator uses a constraint solver:
- Input: a (propositional) constraint (e.g., A + B > 50)
- Output: any concrete values satisfying the constraint (e.g., A = 99, B = 12)
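The solver's contract (constraint in, any satisfying values out) can be illustrated with an exhaustive search over a small domain. This is a stand-in, not a real constraint solver: the function name `solve`, the lambda encoding of constraints, and the finite domain are assumptions, and the search is exponential in the number of variables.

```python
from itertools import product

def solve(constraint, variables, domain):
    """Return the first assignment in domain^n that satisfies `constraint`, or None."""
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if constraint(assignment):
            return assignment
    return None

sol = solve(lambda v: v["A"] + v["B"] > 50, ["A", "B"], range(100))
print(sol)   # {'A': 0, 'B': 51}
```

Any satisfying assignment is acceptable, so this returns A = 0, B = 51 rather than the slide's A = 99, B = 12.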

Symbolic Query Engine

ETH Zurich 28 Symbolic Query Engine  Iterator-based  open(), getNext(), close()  No naughty user  Contradicting knob values

SQP – Table operator
- Fill up the table with symbols
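Combining the iterator interface with the Table operator gives a leaf operator that fills a symbolic relation in `open()` and streams it via `get_next()`. This is a sketch: `SymbolicTable`, the `<attribute><i>` naming scheme, and the Python-style `get_next` (for the slide's `getNext()`) are assumptions.

```python
class SymbolicTable:
    """Leaf operator of an iterator-based (open / get_next / close) symbolic engine."""

    def __init__(self, attributes, size):
        self.attributes = attributes
        self.size = size              # knob: number of tuples to generate
        self.rows = []
        self.pos = 0

    def open(self):
        # Fill the table with fresh symbols: tuple i gets <attribute><i>, e.g. a1, b1.
        self.rows = [{a: f"{a}{i}" for a in self.attributes}
                     for i in range(1, self.size + 1)]
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.rows):
            return None               # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        self.rows = []
        self.pos = 0

R = SymbolicTable(["a", "b"], 2)
R.open()
rows = []
while (t := R.get_next()) is not None:
    rows.append(t)
R.close()
print(rows)   # [{'a': 'a1', 'b': 'b1'}, {'a': 'a2', 'b': 'b2'}]
```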

SQP – σ operator

SQP – ⋈ operator (with FK constraint)
Action: join key replacement
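Join key replacement can be sketched as overwriting a foreign-key symbol in S with a key symbol from the join input R, so that equal symbols later instantiate to equal values. The round-robin (uniform) assignment, the name `fk_join`, and the decision to leave the non-joining S tuples untouched are assumptions; under an FK constraint those tuples would still need to reference R tuples outside the join input, which this sketch omits.

```python
def fk_join(r_keys, s_rows, fk, c):
    """Symbolic R ⋈ S on R.key = S.fk with output-cardinality knob c (uniform).

    The fk symbol of each of the first c S tuples is replaced by a key symbol
    of the join input R, round-robin, so every R tuple joins about equally
    often. Returns the joining S tuples; the other S tuples are left as-is.
    """
    if c > len(s_rows) or (c > 0 and not r_keys):
        raise ValueError("knob value cannot be met by this input")
    joined = []
    for j, row in enumerate(s_rows[:c]):
        row[fk] = r_keys[j % len(r_keys)]   # join key replacement: c_j := b_i
        joined.append(row)
    return joined

S = [{"c": f"c{i}"} for i in range(1, 5)]
out = fk_join(["b1", "b2"], S, "c", 3)
print(out, S[3])   # [{'c': 'b1'}, {'c': 'b2'}, {'c': 'b1'}] {'c': 'c4'}
```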

ETH Zurich 33 SQP – operator (with FK constraint)  When the input of the join is pre-grouped, the world has changed  It sometimes happen, e.g.,  2-way join  Base tables A, B and C with foreign key relationships  A  B, B  C

ETH Zurich 34 SQP – operator (with FK constraint)  Do not support join distribution (the knob is disabled by the analyzer)  Controlling the output cardinality is a subset-sum problem (weakly NP -hard)  Subset-sum has a pseudo-polynomial time exact solution using dynamic programming

ETH Zurich 35 SQP – operator (with FK constraint)  Blocking  During open()  Materialize Table S in a temporary relation  SELECT COUNT(k) From S GROUP BY k  Solve the subset-sum

SQP – χ operator
Action 1: aggregation attribute replacement
- o_date3 → o_date1
- o_date4 → o_date2
1st output group (o_date1); 2nd output group (o_date2)
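The replacement on this slide (4 input tuples, 2 output groups) can be generalized as a round-robin rewrite of the grouping symbols: the first g symbols become group representatives and every later symbol is replaced by one of them. The round-robin choice and the name `group_replace` are assumptions; they simply reproduce the slide's example.

```python
def group_replace(symbols: list[str], groups: int) -> list[str]:
    """Aggregation attribute replacement for χ with an output-group knob.

    Keeps the first `groups` symbols as group representatives and replaces
    every later symbol round-robin, e.g. with 4 tuples and 2 groups:
    o_date3 -> o_date1, o_date4 -> o_date2.
    """
    if not 1 <= groups <= len(symbols):
        raise ValueError("number of groups must be between 1 and the input size")
    return [symbols[j % groups] for j in range(len(symbols))]

print(group_replace(["o_date1", "o_date2", "o_date3", "o_date4"], 2))
# ['o_date1', 'o_date2', 'o_date1', 'o_date2']
```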

SQP – χ operator
Action 2 (base case version): add the aggregation constraints to the PTable

SQP – χ operator
Action 2 (optimized version):
- A constraint solver call is exponential in the size of the predicates
- Add only 2 aggregation constraints to the PTable and perform l_price replacement

Data Instantiation

ETH Zurich 40 Data Instantiation  Use a constraint solver to instantiate the symbolic database  for each symbolic relation r for each tuple t for each symbol s load the related predicates P instantiate P cache P

Experiment 1 – Operator Performance
- Study the performance (and scalability) of
  - individual operators during SQP
  - the data instantiation phase
- Use TPC-H dbgen to generate three TPC-H databases: 10M, 100M, 1G
- Run Q8 on each TPC-H DB to collect the intermediate results R of each operator
- QAGen(Q8, R) → Q8 query-aware database

Experiments – TPC-H Query 8

Experiment 1 – TPC-H Query 8

ETH Zurich 44 Experiment 2 – Effects of knob values  Use TPCH Q8  6 sets of knob values  TPCH-Uniform, TPCH-Zipf  Min-Uniform, Min-Zipf  Max-Uniform, Max-Zipf

Experiment 2 – Effects of knob values

Experiment 3 – System Scalability

Related Work, Future Work, Conclusions
- Reverse Query Processing (ICDE07)
  - Given the result R and the query Q, reversely process Q to generate D
  - For functional testing of database applications, view maintenance, and debugging SQL
- Multiple SQL statements (to ACM TSE journal)

Current approach 2 – Stochastically generate many test queries
- Based on a given test database, RAGS/QGen generates many valid SQL queries to test the system
- No guarantee that T1 can be covered
- As with the previous approach, T2 is never covered

ETH Zurich 50 QAGen overview – Query Analyzer  Each knob combination (e.g., output cardinality + join distribution) for an operator may have different ways to implement it  The output is an knob- annotated execution plan