Download presentation
Presentation is loading. Please wait.
Published byRudolph Hutchinson Modified over 9 years ago
1
© ETH Zürich Eric Lo ETH Zurich a joint work with Carsten Binnig (U of Heidelberg), Donald Kossmann (ETH Zurich), Tamer Ozsu (U of Waterloo) and Peter Hass (IBM Almaden Research Center) Symbolic Query Processing
2
ETH Zurich 2 Symbolic Query Processing Treat all data as symbols (think of variables) E.g., a1 represents any value under the domain of attribute a Table R and S are called symbolic relations
3
ETH Zurich 3 Background – Symbolic Execution 1/3 Borrow the concept from symbolic execution A well known program verification technique Represent values of program variables with symbolic values instead of concrete data Manipulate expressions based on those symbolic values
4
ETH Zurich 4 Background – Symbolic Execution 2/3 1.minsalary = read_input(); 2.bensalary = minsalary + 2000; 3.if (bensalary < 80000) 4.output “no kidding!”; 5.else 6.output “that’s right”; Find a test case for path 1 2 3 6 Symbolic execution – start: 1. minsalary = ben 2. bensalary = ben + 2000; 3. bensalary = ben + 2000 and !(bensalary < 80000);- ( ) Symbolic execution – end Instantiate ( ): ben = 90000 expected input “that’s right” expected output
5
ETH Zurich 5 Background – Symbolic Execution 3/3 Has been research for > 20 years Still have many limitations E.g., cannot handle highly complex software However, many large software vendors still put hope on this technique for program verifications E.g., Microsoft Research No progress on database applications involve an external database and SQL
6
ETH Zurich 6 SQP Applications Extend program verification and symbolic execution techniques to support database applications For DBMS testing focus of today
7
ETH Zurich 7 Symbolic Query Processing Query manipulates data according to different needs R b=c S Want the join results to have one tuple? set c1=b1 Want the join results to have: four tuples Zipf distribution (t1 joins more, t2 joins less)? b1
8
ETH Zurich 8 DBMS Testing To test a DBMS, we generate a lot of test databases and execute a lot of test queries DBMS vendors are looking for a way to control the intermediate results of a test query such that we can test an individual component of a DBMS under a particular test case
9
ETH Zurich 9 DBMS Testing Example Test the accuracy of a cardinality estimation component of a query optimizer under a multi-way hash join query a two-way join query with aggregation If we can make sure executing the test query on the test database gives expected answer
10
ETH Zurich 10 DBMS Testing The test query is given Physical join ordering can be fixed (by testers) Evaluation algorithm (e.g., using hash-join) can be fixed too However, the size of the intermediate results cannot be fixed easily
11
ETH Zurich 11 DBMS Testing Problem Guarantee that executing a test query on a test database can obtain the desired intermediate query results (e.g.,. output cardinality, data distribution)
12
ETH Zurich 12 DBMS Testing Problem A test case T is: a parametric query Q p with a set of constraints C on each intermediate result A good test database D means Q p ( D ) satisfies C -if the set of parameters p is properly instantiated D covers test case T Test case T
13
ETH Zurich 13 Trial-and-error Generate Database 3, 2, and 1 Using traditional database generators such as IBM Test DB generator, MSR DB generator, etc Search for parameters T 2 is never covered The database generation process does not care about the test queries
14
ETH Zurich 14 Latest approach – Finding query parameters MSR realized this problem [TKDE06] Given the test database + the test query Q p, search parameter values for p such that Q p (D) (almost) fit the cardinality requirements defined on the test case It is a NP -hard problem Same as the previous approach, T 2 is never covered
15
ETH Zurich 15 QAGen – Query Aware test database Generator Based on symbolic query processing We can control the output size of each intermediate query result (and even more)
16
ETH Zurich 16 QAGen – Generate a query-aware test database for each test case
17
ETH Zurich 17 QAGen overview
18
ETH Zurich 18 QAGen overview – Query Analyzer Analyzer the query and assign the knob to an operator A knob is a parameter of an operator to control the output (e.g., output cardinality, distribution) A knob for an operator is not always available for tuning
19
ETH Zurich 19 QAGen overview – Query Analyzer A knob for an operator is not always available for tuning join distribution? Yes join distribution? No
20
ETH Zurich 20 QAGen overview – Query Analyzer The available knob(s) for an operator depends on its input characteristics Definition: pre-grouping data Definition: non pre-grouping data
21
ETH Zurich 21 QAGen overview – Query Analyzer
22
ETH Zurich 22 Symbolic Query Engine and Symbolic Database
23
ETH Zurich 23 Symbolic Query Engine and Symbolic Database (SDB) An SQL operator: Add predicates to a symbol Replace a symbol with another other symbol (e.g., joining) E.g., SELECT a FROM R WHERE a > p; 1 output σ a>p <=p >p
24
ETH Zurich 24 Symbolic Query Engine and Symbolic Database (SDB) How to physically store the symbolic data? Options: Implement a native symbolic database Use relational database -How to represent “ a1 > p ”? -Stores all predicates that are associated with a symbol s in a separate relation called PTable <=p >p a1 a1>p a2 a2<=p s Pred. PTable
25
ETH Zurich 25 Data Instantiator
26
ETH Zurich 26 Data Instantiation Data instantiator uses a constraint solver: Input: a (propositional) constraint (e.g., A + B > 50) Output: any concrete values for the constraint (e.g., A=99, B=12)
27
ETH Zurich 27 Symbolic Query Engine
28
ETH Zurich 28 Symbolic Query Engine Iterator-based open(), getNext(), close() No naughty user Contradicting knob values
29
ETH Zurich 29 SQP – Table operator Fill up the table with symbols
30
ETH Zurich 30 SQP – σ operator
31
ETH Zurich 31 SQP – operator (with FK constraint) Action: join key replacement
32
ETH Zurich 32 SQP – operator (with FK constraint) Action: join key replacement
33
ETH Zurich 33 SQP – operator (with FK constraint) When the input of the join is pre-grouped, the world has changed It sometimes happen, e.g., 2-way join Base tables A, B and C with foreign key relationships A B, B C
34
ETH Zurich 34 SQP – operator (with FK constraint) Do not support join distribution (the knob is disabled by the analyzer) Controlling the output cardinality is a subset-sum problem (weakly NP -hard) Subset-sum has a pseudo-polynomial time exact solution using dynamic programming
35
ETH Zurich 35 SQP – operator (with FK constraint) Blocking During open() Materialize Table S in a temporary relation SELECT COUNT(k) From S GROUP BY k Solve the subset-sum
36
ETH Zurich 36 SQP – χ operator Action 1: Aggregation attribute replacement o_date3 o_date1 o_date4 o_date2 2 nd output group (o_date2) 1st output group (o_date1)
37
ETH Zurich 37 SQP – χ operator Action 2 (base case version): - Adding aggregation constraints to PTable, base case:
38
ETH Zurich 38 SQP – χ operator Action 2 (optimized version): - A constraint solver call is exponential to the size of predicates - Adding 2 aggregation constraints to PTable: and do l_price replacement
39
ETH Zurich 39 Data Instantiation
40
ETH Zurich 40 Data Instantiation Use a constraint solver to instantiate the symbolic database for each symbolic relation r for each tuple t for each symbol s load the related predicates P instantiate P cache P
41
ETH Zurich 41 Experiment 1 – Operator Performance Study the performance (and scalability) of Individual operator during SQP The data instantiation phase Use TPC-dbgen to generate 3 TPCH-DB 10M, 100M, 1G Q8(TPCH-DB) to collect the intermediate results R for each operator QAGen(Q8, R) Q8 query aware database
42
ETH Zurich 42 Experiments – TPC-H Query 8
43
ETH Zurich 43 Experiment 1 – TPC-H Query 8
44
ETH Zurich 44 Experiment 2 – Effects of knob values Use TPCH Q8 6 sets of knob values TPCH-Uniform, TPCH-Zipf Min-Uniform, Min-Zipf Max-Uniform, Max-Zipf
45
ETH Zurich 45 Experiment 2 – Effects of knob values
46
ETH Zurich 46 Experiment 3 – System Scalability
47
ETH Zurich 47 Related Work, Future Work, Conclusions Reverse Query Processing (ICDE07) Given the result R, the query Q, reversely process Q to generate D for function testing database applications, view maintenance, debugging SQL Multiple SQL statements (to ACM TSE journal)
48
ETH Zurich 48
49
ETH Zurich 49 Current approach 2 – Stochastically generate many test queries Based on a given test database, RAGS/QGen generates many valid SQL queries to test the system No guarantee that T 1 can be covered Same as the previous approach, T 2 is never covered
50
ETH Zurich 50 QAGen overview – Query Analyzer Each knob combination (e.g., output cardinality + join distribution) for an operator may have different ways to implement it The output is an knob- annotated execution plan
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.