© ETH Zürich Eric Lo ETH Zurich a joint work with Carsten Binnig (U of Heidelberg), Donald Kossmann (ETH Zurich), Tamer Ozsu (U of Waterloo) and Peter.


1 © ETH Zürich Symbolic Query Processing. Eric Lo, ETH Zurich. Joint work with Carsten Binnig (U of Heidelberg), Donald Kossmann (ETH Zurich), Tamer Ozsu (U of Waterloo) and Peter Haas (IBM Almaden Research Center)

2 ETH Zurich 2 Symbolic Query Processing
- Treat all data as symbols (think of variables)
- E.g., a1 represents any value in the domain of attribute a
- Tables R and S are called symbolic relations

3 ETH Zurich 3 Background – Symbolic Execution 1/3
- Borrow the concept from symbolic execution
- A well-known program verification technique
- Represent values of program variables with symbolic values instead of concrete data
- Manipulate expressions based on those symbolic values

4 ETH Zurich 4 Background – Symbolic Execution 2/3
Program:
1. minsalary = read_input();
2. bensalary = minsalary + 2000;
3. if (bensalary < 80000)
4.   output "no kidding!";
5. else
6.   output "that's right";
Find a test case for path 1 → 2 → 3 → 6
Symbolic execution – start:
1. minsalary = ben
2. bensalary = ben + 2000
3. bensalary = ben + 2000 and !(bensalary < 80000)   (*)
Symbolic execution – end
Instantiate (*): ben = 90000 → expected input; "that's right" → expected output
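The walk-through on this slide can be sketched in Python. This is only an illustration under assumptions: the path condition for path 1 → 2 → 3 → 6 is written by hand as a predicate over the symbolic input `ben`, and a naive search over candidate values stands in for a real constraint solver. The names `program`, `path_condition` and `instantiate` are hypothetical and do not come from the slides.

```python
def program(minsalary):
    """The program from the slide; comments give the slide's line numbers."""
    bensalary = minsalary + 2000          # line 2
    if bensalary < 80000:                 # line 3
        return "no kidding!"              # line 4
    else:
        return "that's right"             # line 6


def path_condition(ben):
    """Constraint collected along path 1 -> 2 -> 3 -> 6 for symbolic input ben."""
    bensalary = ben + 2000                # symbolic state after line 2
    return not (bensalary < 80000)        # else-branch taken at line 3


def instantiate(condition, candidates):
    """Naive stand-in for a constraint solver: first satisfying value, or None."""
    for value in candidates:
        if condition(value):
            return value
    return None


# Search a coarse grid of candidate inputs (an arbitrary choice for this sketch).
test_input = instantiate(path_condition, range(0, 200001, 10000))
expected_output = program(test_input)
```

Any value that satisfies the path condition is a valid test input; the slide's choice of 90000 is one of them, and the search above simply returns the first candidate it finds.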

5 ETH Zurich 5 Background – Symbolic Execution 3/3
- Has been researched for more than 20 years
- Still has many limitations
  - E.g., cannot handle highly complex software
- However, many large software vendors still place hope in this technique for program verification
  - E.g., Microsoft Research
- No progress on database applications, which involve an external database and SQL

6 ETH Zurich 6 SQP Applications
- Extend program verification and symbolic execution techniques to support database applications
- For DBMS testing (focus of today)

7 ETH Zurich 7 Symbolic Query Processing
- A query manipulates data according to different needs
- R ⋈ b=c S
- Want the join result to have one tuple? Set c1 = b1
- Want the join result to have:
  - four tuples
  - a Zipf distribution (t1 joins more, t2 joins less)?
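A small Python sketch may make the idea on this slide concrete. It assumes that R.b and S.c hold symbols and that the shape of the join result is controlled purely by equating symbols (c_i = b_j); S symbols left unbound are assumed to stay distinct from every R symbol and therefore do not join. The function name `symbolic_join` and its parameters are hypothetical.

```python
def symbolic_join(r_keys, s_keys, matches_per_r):
    """Bind S's join-key symbols to R's so that R join_{b=c} S has a chosen shape.

    r_keys: symbols of R.b, e.g. ["b1", "b2"]
    s_keys: symbols of S.c, e.g. ["c1", "c2", "c3", "c4"]
    matches_per_r: number of S tuples each R tuple should join with
    Returns a dict mapping each bound S symbol to the R symbol it must equal.
    """
    if len(matches_per_r) != len(r_keys):
        raise ValueError("need one count per R tuple")
    if sum(matches_per_r) > len(s_keys):
        raise ValueError("not enough S tuples for the requested join size")
    binding = {}
    free = iter(s_keys)
    for b, count in zip(r_keys, matches_per_r):
        for _ in range(count):
            binding[next(free)] = b       # records the constraint c_i = b
    return binding


# One result tuple: only c1 = b1.
one_tuple = symbolic_join(["b1", "b2"], ["c1", "c2", "c3", "c4"], [1, 0])

# Four result tuples with a skewed distribution: t1 joins more, t2 joins less.
skewed = symbolic_join(["b1", "b2"], ["c1", "c2", "c3", "c4"], [3, 1])
```

The size of the join result equals the number of bindings, so the per-tuple counts act as both the cardinality and the distribution setting in this sketch.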

8 ETH Zurich 8 DBMS Testing
- To test a DBMS, we generate a lot of test databases and execute a lot of test queries
- DBMS vendors are looking for a way to control the intermediate results of a test query, so that an individual component of a DBMS can be tested under a particular test case

9 ETH Zurich 9 DBMS Testing Example
- Test the accuracy of the cardinality estimation component of a query optimizer under
  - a multi-way hash join query
  - a two-way join query with aggregation
- If we can make sure that executing the test query on the test database gives the expected answer

10 ETH Zurich 10 DBMS Testing
- The test query is given
- Physical join ordering can be fixed (by testers)
- Evaluation algorithm (e.g., using hash-join) can be fixed too
- However, the size of the intermediate results cannot be fixed easily

11 ETH Zurich 11 DBMS Testing Problem: Guarantee that executing a test query on a test database yields the desired intermediate query results (e.g., output cardinality, data distribution)

12 ETH Zurich 12 DBMS Testing Problem
- A test case T is:
  - a parametric query Q_p
  - with a set of constraints C on each intermediate result
- A good test database D means
  - Q_p(D) satisfies C
    - if the set of parameters p is properly instantiated
  - → D covers test case T
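The definition of "D covers T" can be illustrated with a short Python sketch. Everything concrete here is an assumption made for illustration: the toy query (a selection followed by duplicate elimination), the representation of C as exact cardinalities per operator, and the names `run_query`, `satisfies` and `covers`.

```python
def run_query(database, p):
    """Hypothetical parametric query Q_p: sigma_{a > p}(R), then distinct values of a.

    Returns the intermediate result of every operator.
    """
    selected = [row for row in database if row["a"] > p]
    distinct = sorted({row["a"] for row in selected})
    return {"select": selected, "distinct": distinct}


def satisfies(intermediate, constraints):
    """Check the cardinality constraint attached to each intermediate result."""
    return all(len(intermediate[op]) == size for op, size in constraints.items())


def covers(database, constraints, candidate_params):
    """Return a parameter p for which Q_p(D) satisfies C, or None if none is found."""
    for p in candidate_params:
        if satisfies(run_query(database, p), constraints):
            return p
    return None


D = [{"a": 1}, {"a": 5}, {"a": 5}, {"a": 9}]
case_ok = {"select": 3, "distinct": 2}    # some p makes Q_p(D) satisfy C
case_bad = {"select": 3, "distinct": 3}   # no p works on this D

p_ok = covers(D, case_ok, range(10))
p_bad = covers(D, case_bad, range(10))
```

The second test case shows why a fixed database may fail to cover a test case no matter how the parameter is chosen: D contains only two distinct values above any threshold that keeps three tuples.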

13 ETH Zurich 13 Trial-and-error
- Generate Database 3, 2, and 1
  - Using traditional database generators such as IBM Test DB generator, MSR DB generator, etc.
- Search for parameters
- T2 is never covered
- The database generation process does not care about the test queries

14 ETH Zurich 14 Latest approach – Finding query parameters
- MSR realized this problem [TKDE06]
- Given the test database and the test query Q_p, search for parameter values for p such that Q_p(D) (almost) fits the cardinality requirements defined in the test case
- It is an NP-hard problem
- As with the previous approach, T2 is never covered

15 ETH Zurich 15 QAGen – Query Aware test database Generator
- Based on symbolic query processing
- We can control the output size of each intermediate query result (and even more)

16 ETH Zurich 16 QAGen – Generate a query-aware test database for each test case

17 ETH Zurich 17 QAGen overview

18 ETH Zurich 18 QAGen overview – Query Analyzer
- Analyzes the query and assigns knobs to each operator
- A knob is a parameter of an operator that controls its output (e.g., output cardinality, distribution)
- A knob for an operator is not always available for tuning

19 ETH Zurich 19 QAGen overview – Query Analyzer
- A knob for an operator is not always available for tuning
- [Figure: two example cases, one where the join distribution knob is available (Yes) and one where it is not (No)]

20 ETH Zurich 20 QAGen overview – Query Analyzer
- The available knob(s) for an operator depend on its input characteristics
- Definition: pre-grouping data
- Definition: non pre-grouping data

21 ETH Zurich 21 QAGen overview – Query Analyzer

22 ETH Zurich 22 Symbolic Query Engine and Symbolic Database

23 ETH Zurich 23 Symbolic Query Engine and Symbolic Database (SDB)
- An SQL operator can:
  - Add predicates to a symbol
  - Replace a symbol with another symbol (e.g., joining)
- E.g., SELECT a FROM R WHERE a > p;
- [Figure: σ a>p with 1 output tuple; symbols are annotated with <=p or >p]

24 ETH Zurich 24 Symbolic Query Engine and Symbolic Database (SDB)
- How to physically store the symbolic data?
- Options:
  - Implement a native symbolic database
  - Use a relational database
    - How to represent "a1 > p"?
    - Store all predicates that are associated with a symbol s in a separate relation called PTable
PTable:
s  | Pred.
a1 | a1 > p
a2 | a2 <= p
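The relational option can be sketched with Python's built-in `sqlite3` module. This is a rough illustration under assumptions, not the QAGen schema: the column names, the choice to store predicates as text, and the rule that the first `cardinality` symbols pass the selection are all hypothetical.

```python
import sqlite3


def symbolic_select(conn, cardinality):
    """Symbolically evaluate sigma_{a > p}(R) with a requested output cardinality.

    The first `cardinality` symbols receive the predicate "s > p" and are output;
    the remaining symbols receive "s <= p". Predicates go into PTable as text.
    """
    symbols = [row[0] for row in conn.execute("SELECT a FROM R ORDER BY rowid")]
    if cardinality > len(symbols):
        raise ValueError("requested more output tuples than R contains")
    output = []
    for i, s in enumerate(symbols):
        if i < cardinality:
            pred = f"{s} > p"
            output.append(s)
        else:
            pred = f"{s} <= p"
        conn.execute("INSERT INTO PTable(symbol, pred) VALUES (?, ?)", (s, pred))
    return output


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a TEXT)")                     # cells hold symbols
conn.execute("CREATE TABLE PTable (symbol TEXT, pred TEXT)")
conn.executemany("INSERT INTO R VALUES (?)", [("a1",), ("a2",)])

result = symbolic_select(conn, cardinality=1)
ptable = conn.execute("SELECT symbol, pred FROM PTable ORDER BY symbol").fetchall()
```

With a requested cardinality of 1, the sketch reproduces the PTable contents shown on the slide: a1 is tagged with a1 > p and a2 with a2 <= p.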

25 ETH Zurich 25 Data Instantiator

26 ETH Zurich 26 Data Instantiation
Data instantiator uses a constraint solver:
- Input: a (propositional) constraint (e.g., A + B > 50)
- Output: any concrete values that satisfy the constraint (e.g., A=99, B=12)
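The input/output contract of the solver can be mimicked with an exhaustive search in Python. This is only a stand-in for a real constraint solver, which the slides do not name; the function `solve`, its parameters and the finite integer domain are assumptions.

```python
from itertools import product


def solve(constraint, variables, domain):
    """Return any assignment over `domain` that satisfies `constraint`, or None.

    Exhaustive search over all combinations; only suitable for tiny domains.
    """
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if constraint(**assignment):
            return assignment
    return None


# The slide's example constraint: A + B > 50.
solution = solve(lambda A, B: A + B > 50, ["A", "B"], range(0, 100))

# An unsatisfiable constraint over the same domain yields None.
no_solution = solve(lambda A: A > 100, ["A"], range(0, 100))
```

Any satisfying assignment is acceptable, so the values returned here need not match the slide's A=99, B=12.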

27 ETH Zurich 27 Symbolic Query Engine

28 ETH Zurich 28 Symbolic Query Engine
- Iterator-based
  - open(), getNext(), close()
- Assumes no naughty user (i.e., no contradicting knob values)

29 ETH Zurich 29 SQP – Table operator
- Fill up the table with symbols
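A Table operator in the iterator style could look roughly like the following Python sketch. The class name, the symbol naming scheme (attribute name plus row number, as in a1, b1) and the use of `get_next` instead of the slide's getNext() are assumptions made for illustration.

```python
class TableOperator:
    """Iterator-style operator that fills a symbolic relation with fresh symbols."""

    def __init__(self, attributes, cardinality):
        self.attributes = attributes      # e.g. ["a", "b"]
        self.cardinality = cardinality    # knob: number of tuples to generate
        self._next_row = None             # None means "not open"

    def open(self):
        self._next_row = 1

    def get_next(self):
        """Return the next symbolic tuple, or None when exhausted."""
        if self._next_row is None:
            raise RuntimeError("operator is not open")
        if self._next_row > self.cardinality:
            return None
        i = self._next_row
        self._next_row += 1
        return {attr: f"{attr}{i}" for attr in self.attributes}

    def close(self):
        self._next_row = None


def drain(op):
    """Run the open / get_next / close protocol and collect all tuples."""
    op.open()
    rows = []
    while (row := op.get_next()) is not None:
        rows.append(row)
    op.close()
    return rows


rows = drain(TableOperator(["a", "b"], cardinality=2))
```

Other operators would wrap a child operator and pull tuples from it through the same three calls.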

30 ETH Zurich 30 SQP – σ operator

31 ETH Zurich 31 SQP – ⋈ operator (with FK constraint) Action: join key replacement

32 ETH Zurich 32 SQP – ⋈ operator (with FK constraint) Action: join key replacement

33 ETH Zurich 33 SQP – ⋈ operator (with FK constraint)
- When the input of the join is pre-grouped, the world has changed
- It sometimes happens, e.g.,
  - 2-way join
  - Base tables A, B and C with foreign key relationships between A and B and between B and C

34 ETH Zurich 34 SQP – ⋈ operator (with FK constraint)
- Does not support join distribution (the knob is disabled by the analyzer)
- Controlling the output cardinality is a subset-sum problem (weakly NP-hard)
- Subset-sum has a pseudo-polynomial time exact solution using dynamic programming

35 ETH Zurich 35 SQP – ⋈ operator (with FK constraint)
- Blocking
- During open()
  - Materialize Table S in a temporary relation
  - SELECT COUNT(k) FROM S GROUP BY k
  - Solve the subset-sum
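The subset-sum step can be sketched as a textbook dynamic program in Python. It assumes the GROUP BY result is available as a mapping from join key to a positive group size, and that the goal is to choose whole groups whose sizes add up exactly to the requested output cardinality; the function name `pick_groups` and this framing are assumptions, not the authors' implementation.

```python
def pick_groups(group_sizes, target):
    """Choose join-key groups whose sizes sum exactly to `target`.

    group_sizes: dict key -> COUNT(k) from the GROUP BY query (positive integers)
    Returns a list of chosen keys, or None if no subset sums to `target`.
    At most target + 1 distinct sums are tracked, so the table stays
    pseudo-polynomial in size.
    """
    # reachable[s] = one list of keys whose sizes add up to s
    reachable = {0: []}
    for key, size in group_sizes.items():
        # Iterate over a snapshot so that each group is used at most once.
        for total, keys in list(reachable.items()):
            new_total = total + size
            if new_total <= target and new_total not in reachable:
                reachable[new_total] = keys + [key]
    return reachable.get(target)


counts = {"k1": 3, "k2": 5, "k3": 2}
chosen = pick_groups(counts, target=5)
impossible = pick_groups(counts, target=4)
```

When no exact subset exists, as for a target of 4 here, the sketch returns None; how such a case should be handled is outside what the slides state.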

36 ETH Zurich 36 SQP – χ operator
Action 1: Aggregation attribute replacement
- o_date3 → o_date1 (1st output group: o_date1)
- o_date4 → o_date2 (2nd output group: o_date2)
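The replacement on this slide can be reproduced with a few lines of Python. The round-robin assignment of symbols to groups is an assumption chosen because it matches the slide's example; the slides do not state how groups are sized in general, and the name `replace_group_symbols` is hypothetical.

```python
def replace_group_symbols(symbols, num_groups):
    """Map each grouping symbol to one of the first `num_groups` symbols (round-robin)."""
    if not 1 <= num_groups <= len(symbols):
        raise ValueError("num_groups must be between 1 and the number of input tuples")
    representatives = symbols[:num_groups]
    return {s: representatives[i % num_groups] for i, s in enumerate(symbols)}


# Four input tuples, two output groups.
mapping = replace_group_symbols(["o_date1", "o_date2", "o_date3", "o_date4"], 2)
```

After the replacement only two distinct grouping symbols remain, so the aggregation produces exactly two output groups.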

37 ETH Zurich 37 SQP – χ operator Action 2 (base case version): - Adding aggregation constraints to PTable, base case:

38 ETH Zurich 38 SQP – χ operator Action 2 (optimized version): - The cost of a constraint solver call is exponential in the number of predicates - Adding 2 aggregation constraints to PTable: and do l_price replacement

39 ETH Zurich 39 Data Instantiation

40 ETH Zurich 40 Data Instantiation
- Use a constraint solver to instantiate the symbolic database
for each symbolic relation r
  for each tuple t
    for each symbol s
      load the related predicates P
      instantiate P
      cache P
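The nested loop can be written out as a Python sketch. Several details are assumptions: the symbolic database is a dictionary of relations, the PTable maps each symbol to a list of Python predicates, "cache P" is interpreted as remembering the value chosen for each symbol, and `first_match` is a naive stand-in for the constraint solver.

```python
def instantiate_database(symbolic_db, ptable, solve_symbol):
    """Replace every symbol by a concrete value that satisfies its predicates.

    symbolic_db: dict relation name -> list of tuples (dict attribute -> symbol)
    ptable: dict symbol -> list of predicates (functions value -> bool)
    solve_symbol: callable(predicates) -> concrete value (stand-in for a solver)
    """
    cache = {}                                    # symbol -> concrete value
    concrete_db = {}
    for name, relation in symbolic_db.items():    # for each symbolic relation r
        rows = []
        for tup in relation:                      # for each tuple t
            row = {}
            for attr, symbol in tup.items():      # for each symbol s
                if symbol not in cache:
                    predicates = ptable.get(symbol, [])       # load predicates P
                    cache[symbol] = solve_symbol(predicates)  # instantiate P
                row[attr] = cache[symbol]         # reuse the cached value
            rows.append(row)
        concrete_db[name] = rows
    return concrete_db


def first_match(predicates, domain=range(0, 100)):
    """Naive solver: first domain value that satisfies all predicates."""
    for v in domain:
        if all(pred(v) for pred in predicates):
            return v
    raise ValueError("predicates are unsatisfiable over the domain")


p = 10  # instantiated query parameter (arbitrary for this sketch)
symbolic_db = {"R": [{"a": "a1"}, {"a": "a2"}], "S": [{"c": "a1"}]}
ptable = {"a1": [lambda v: v > p], "a2": [lambda v: v <= p]}
concrete = instantiate_database(symbolic_db, ptable, first_match)
```

Because the cache is keyed by symbol, a symbol that appears in several relations (here a1 in both R and S) receives the same concrete value everywhere, which keeps joins on replaced keys consistent.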

41 ETH Zurich 41 Experiment 1 – Operator Performance
- Study the performance (and scalability) of
  - individual operators during SQP
  - the data instantiation phase
- Use TPC-dbgen to generate 3 TPCH-DBs (10M, 100M, 1G)
- Run Q8(TPCH-DB) to collect the intermediate results R for each operator
- QAGen(Q8, R) → Q8 query-aware database

42 ETH Zurich 42 Experiments – TPC-H Query 8

43 ETH Zurich 43 Experiment 1 – TPC-H Query 8

44 ETH Zurich 44 Experiment 2 – Effects of knob values
- Use TPCH Q8
- 6 sets of knob values
  - TPCH-Uniform, TPCH-Zipf
  - Min-Uniform, Min-Zipf
  - Max-Uniform, Max-Zipf

45 ETH Zurich 45 Experiment 2 – Effects of knob values

46 ETH Zurich 46 Experiment 3 – System Scalability

47 ETH Zurich 47 Related Work, Future Work, Conclusions
- Reverse Query Processing (ICDE07)
  - Given the result R and the query Q, process Q in reverse to generate D
  - For functional testing of database applications, view maintenance, debugging SQL
- Multiple SQL statements (to ACM TSE journal)

48 ETH Zurich 48

49 ETH Zurich 49 Current approach 2 – Stochastically generate many test queries
- Based on a given test database, RAGS/QGen generates many valid SQL queries to test the system
- No guarantee that T1 can be covered
- As with the previous approach, T2 is never covered

50 ETH Zurich 50 QAGen overview – Query Analyzer
- Each knob combination (e.g., output cardinality + join distribution) for an operator may be implemented in different ways
- The output is a knob-annotated execution plan

