To Tune or not to Tune? A Lightweight Physical Design Alerter Nico Bruno, Surajit Chaudhuri DMX Group, Microsoft Research VLDB’06
2 A DBA’s Dilemma
- Physical design tuning is important
  - Workloads and data change over time
  - Installations often become suboptimal
- Current tools: good but expensive
- DBAs: avoid suboptimal installations
  - Periodically run expensive tools
  - If no improvement, wasted resources
[Figure: Tuner and DBMS; workload SELECT … INSERT … SELECT …; Recommendation: {Index1, Index2, View1, View2}]
3 A Lightweight Alerter
- Low-overhead diagnostics
- Reliable lower-bound improvement
  - No false positives
  - “Proof” with valid configuration
- Upper-bound improvement
  - Reduce false negatives
4 Outline
- Instrumenting the optimizer
  - Access path selection
  - Index requests
- Lower bounds
  - Local transformations
  - Alerting algorithm
- Upper bounds
- Experimental results
5 Access Path Selection
- Single entry point for access path selection (System-R, Cascades)
- Intercept requests during optimization and save their logical properties for later
6 Access Path Requests
SELECT T.b
FROM T1, T2, T3
WHERE T1.x=T2.y AND T1.w=T3.z AND T1.a=5 AND T3.b=8
7 Monitoring Access Path Requests
- “AND/OR trees”
  - Encode relationships between requests
  - Aggregated across queries
- 2-level normalized AND/OR tree
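A minimal sketch of how such a tree might be represented and evaluated. The slide does not spell out the node semantics, so the sketch assumes that an AND node groups requests that can all be served together (their improvements add up) and that an OR node groups mutually exclusive alternatives (only the best one counts). All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Request:
    """Leaf: one access path request with an estimated improvement (hypothetical field)."""
    name: str
    improvement: float


@dataclass
class And:
    children: List["Node"] = field(default_factory=list)


@dataclass
class Or:
    children: List["Node"] = field(default_factory=list)


Node = Union[Request, And, Or]


def improvement(node: Node) -> float:
    """Assumed semantics: AND sums its children, OR keeps the best alternative."""
    if isinstance(node, Request):
        return node.improvement
    values = [improvement(child) for child in node.children]
    if not values:
        return 0.0
    return sum(values) if isinstance(node, And) else max(values)


# Two queries aggregated under one AND root; the second query has two
# mutually exclusive ways of being served.
tree = And([
    Request("q1: T1.a = 5", 3.0),
    Or([Request("q2: index seek", 2.0), Request("q2: index scan", 0.5)]),
])
total = improvement(tree)
```

The tree above is already in the two-level shape (an AND of requests and ORs of requests), which keeps the evaluation a single recursive pass.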
8 Local Transformations
- Requests encode properties of any physical plan rooted at the corresponding operator
- Allow cost inferences for varying physical designs without calling the optimizer
- Result is an upper bound on the query cost after true optimization
- Example: if the inferred cost is 0.02, the query is 0.06 faster
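The arithmetic behind the slide's example, as a sketch. The original cost of 0.08 does not appear on the slide; it is inferred here from the two numbers that do (an inferred cost of 0.02 and a saving of 0.06). Because the inferred cost is an upper bound on the truly re-optimized cost, the saving computed this way can only underestimate the real one. Function names are hypothetical.

```python
def guaranteed_improvement(original_cost: float, inferred_cost: float) -> float:
    """Lower bound on the saving: original cost minus the inferred (upper-bound) cost."""
    return max(0.0, original_cost - inferred_cost)


def relative_improvement(original_cost: float, inferred_cost: float) -> float:
    """The same saving expressed as a fraction of the original cost."""
    if original_cost <= 0:
        return 0.0
    return guaranteed_improvement(original_cost, inferred_cost) / original_cost


# 0.08 is an assumed original cost, chosen to match the slide's 0.02 and 0.06.
saved = guaranteed_improvement(0.08, 0.02)
relative = relative_improvement(0.08, 0.02)
```

Under that assumption the guaranteed saving is about 0.06 cost units, or roughly three quarters of the original cost.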
9 Impact of Hypothetical Indexes
- Single index, single request
  - Exploits logical information about the request
  - Safe inferences on a subset of valid plans
  - Only need costs, do not “build” plans
- Multiple indexes, multiple requests
  - Analyze all available indexes for each request
  - Exploit the AND/OR tree for multiple requests
- Measures a lower bound on the difference between the current and original configurations
10 Alerting Algorithm
- Input: AND/OR tree T gathered during original optimization
  - No additional optimizer calls!
- For each request in T, obtain the index that results in the best strategy
- Repeat while the space constraint is not satisfied and the improvement is still large enough:
  - Apply a transformation: Index Merge or Index Deletion
  - If the size is between the storage bounds and the improvement is big enough, save the configuration for the alert
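A simplified sketch of the relaxation loop, using index deletion only (index merging is omitted to keep the example short). It assumes each index carries a precomputed, additive benefit, whereas the slide derives improvements from the AND/OR tree. The victim-selection rule (lowest benefit per unit of storage) is also an assumption, and every name below is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Index:
    name: str
    size: float     # storage, arbitrary units
    benefit: float  # improvement this index contributes (assumed additive)


def relax(initial: List[Index], b_min: float, b_max: float,
          min_improvement: float) -> List[Tuple[List[str], float, float]]:
    """Greedy relaxation by deletion.

    Starts from the best index per request and repeatedly drops the index
    with the lowest benefit per unit of storage. Every configuration whose
    size lies in [b_min, b_max] and whose improvement is still at least
    min_improvement is saved as an alert candidate.
    """
    config = list(initial)
    alerts = []
    while config:
        size = sum(i.size for i in config)
        gain = sum(i.benefit for i in config)
        if gain < min_improvement:
            break  # improvement is no longer large enough
        if b_min <= size <= b_max:
            alerts.append(([i.name for i in config], size, gain))
        if size < b_min:
            break  # already below the lower storage bound
        victim = min(config, key=lambda i: i.benefit / i.size)
        config.remove(victim)
    return alerts


best_per_request = [
    Index("I1", 40.0, 30.0),
    Index("I2", 30.0, 5.0),
    Index("I3", 50.0, 20.0),
]
alerts = relax(best_per_request, b_min=0.0, b_max=100.0, min_improvement=25.0)
```

In this example the starting configuration (size 120) exceeds the upper bound, so I2 is dropped first; the two remaining configurations both fit the storage bounds and clear the improvement threshold, so both are kept as candidates.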
11 Upper Bounds
- Reduce false negatives
  - Alert if: improvement is at least 25% OR maximum improvement is 75%
- Fast upper bounds
  - Track all requests (not only the AND/OR tree)
  - Group requests by table
  - Calculate “required work”
- Tighter upper bounds
  - Add a new optimization phase that only considers viable plans
  - More expensive, but tightest upper bound
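One way to read the alert rule on this slide, as a sketch. It assumes that "improvement" refers to the guaranteed lower bound and that "maximum improvement" refers to the upper bound reaching the stated percentage; the slide does not say this explicitly. The function name and the default thresholds (taken from the slide's 25% and 75%) are illustrative.

```python
def should_alert(lower_bound: float, upper_bound: float,
                 min_guaranteed: float = 0.25,
                 min_possible: float = 0.75) -> bool:
    """Alert when the guaranteed gain is large enough, or when the
    potential gain is so large that a full tuning session may pay off.

    Bounds are fractions of the current workload cost (0.25 == 25%).
    """
    if not 0.0 <= lower_bound <= upper_bound <= 1.0:
        raise ValueError("expected 0 <= lower_bound <= upper_bound <= 1")
    return lower_bound >= min_guaranteed or upper_bound >= min_possible


# Small guaranteed gain, but a lot of headroom: the upper bound triggers the alert.
headroom_case = should_alert(0.10, 0.80)
# Solid guaranteed gain: the lower bound triggers the alert.
guaranteed_case = should_alert(0.30, 0.40)
# Neither bound is convincing: stay quiet.
quiet_case = should_alert(0.10, 0.40)
```

The second clause is what reduces false negatives: without it, the first case would pass silently even though a comprehensive tuning run might recover most of the workload cost.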
12 Handling Updates
- Update queries are handled as (select core) + (update shell)
- Optimizer instrumentation: also gathers update information
- Lower bounds: small changes to the main algorithm (skyline of alternatives, non-monotonic improvement)
- Upper bounds: add necessary work for update shells
13 Experimental Evaluation
- Real and synthetic databases
- Metrics: execution time and improvement
- Experiments:
  - Monitoring overhead (server optimization)
  - Diagnostics overhead (alerting client)
  - Quality of bounds/recommendation
14 Performance
- TPC-H database and workloads
- [Plot: server overhead for upper bounds (lower-bound overhead << 1%)]
- [Plot: client overhead for lower + upper bounds]
15 Varying Workloads
- TPC-H workloads: W1 (first 11 queries), W2 (last 11 queries), W3 (mix)
- Initial design tuned for W1
16 Varying Initial Physical Design
- TPC-H database and workloads
- C_i is the recommendation of the alerter after executing the workload under C_{i-1}
17 Conclusions
- Alerter fills a gap in automatic physical design tools
- Low server/client overhead: can monitor and diagnose very efficiently
- Lower bounds are supported by valid (applicable) configurations
- Upper bounds provide additional flexibility for defining policies
18 Lower and Upper Bounds for Improvement
- Single-query workloads
- TPC-H database and workloads
19 Complex Workloads
- TPC-H
- MIRMS