
1 Self-Tuning and Self-Configuring Systems
Zachary G. Ives
University of Pennsylvania
CIS 650 – Database & Information Systems
March 16, 2005

2 Administrivia
- No class 3/21 – out of town
- Read and summarize the Natix paper for Wednesday 3/23
- Tomorrow, 3PM, Levine 101: Shuchi Chawla, CMU, Path-Planning Algorithms
- Tomorrow, 4:30PM, Levine 307: Sihem Amer-Yahia, AT&T Labs: “Full-Text Querying in XML: A Little Bit of Standards and Lot's o' Research”
- Tuesday, 3/21, Levine 101: Nick Feamster, MIT: Robust Internet Routing

3 Today’s Trivia Question

4 Midterm Mini-Retrospective
- We’ve now seen many of the major issues in databases
  - … Which are?
- Mike Stonebraker thinks we’ve run out of good things to work on
  - Is he right?
  - What problems should people be working on now?

5 A Few of My Thoughts (Please chime in with your own!)
- More automation
- Different data types
  - “Schema mostly”, text, …
- Semantic reconciliation and mapping
  - Perhaps we’ll never solve this, but we can clearly do better
- Uncertainty and inconsistency
  - Probabilities, inconsistencies, different perspectives, …
- Truly scalable data sharing
  - Can’t we share at the level of the Web?
  - Two-way data exchange
- Streams and sensors

6 Self-Tuning Systems
- Databases are complicated!
  - Schema design is hard
  - Lots of “knobs” to tweak
  - Need appropriate information
- Does the DB approach give us more ability to “self-tune” than some other approach (e.g., Java)?

7 What Would We Like to Auto-Tune?
- Query optimization – statistics, bad decisions, …
- The schema itself?
- Indices
- Auxiliary materialized views
- Data partitioning
- Perhaps logging?

8 What Are The Challenges in Building Adaptive Systems?
- Really, a generalization of those in adaptive query processing
- Information gathering – how do we get it?
- Extrapolating – how do we do this accurately and efficiently?
  - Sampling or piloting
- Minimizing the impact of mistakes if they happen
- Using app-specific knowledge

9 Who’s Interested in these Problems?
- Oracle:
  - Materialized view “wizard”
- Microsoft “AutoAdmin”:
  - Index selection, materialized view selection
  - Stats on materialized views
  - Database layout
- IBM SMART (Self-Managing And Resource Tuning):
  - Histogram tuning (“LEO” learning optimizer)
  - Partitioning in clusters
  - Index selection
  - Adaptive query processing

10 A Particular Instance: Microsoft’s Index Tuning Wizard
- Why not let the system choose the best index combination(s) for a workload?
- The basic idea:
  - Log a whole bunch of queries that are frequently run
  - See what set of indices is best for that log
  - Create these indices with little or no human input
- Why is this hard? Why not index everything? (see the sketch below)
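To make the “why not index everything?” point concrete, here is a minimal sketch of the workload-level trade-off, under stated assumptions: `workload_cost`, `estimate_cost`, and the workload representation below are illustrative stand-ins, not the wizard’s actual interfaces. Every added index can make some logged queries cheaper, but it also adds update-maintenance and storage cost, so the tool must search for a good subset rather than simply indexing everything.

```python
# Hypothetical sketch: scoring index configurations against a logged workload.
# estimate_cost(stmt, config) stands in for the optimizer's cost estimate of one
# statement under a given set of indices; updates get costlier as indices are added.

def workload_cost(workload, config, estimate_cost):
    """workload: list of (statement, frequency) pairs taken from the query log."""
    return sum(freq * estimate_cost(stmt, config) for stmt, freq in workload)

def best_configuration(workload, candidate_configs, estimate_cost):
    """Pick the candidate index set with the lowest estimated workload cost."""
    return min(candidate_configs,
               key=lambda cfg: workload_cost(workload, cfg, estimate_cost))
```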

11 Possible Approaches
- Obviously: only consider indices that would be useful (see the sketch below)
  - The optimizer can “tell” which indices it might use in executing a query
  - But that is still a lot of indices!
- Could exhaustively compare all possible indices
  - Note that indices can interact (esp. for updates)
- How do we compare costs and benefits of indices?
  - Execute for real
  - Use the optimizer cost model with whatever stats we have
  - Gather some stats (e.g., build histograms, sample) and use the cost model
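A minimal sketch of the pruning idea on this slide, with hypothetical names: `optimizer_candidates(stmt)` stands in for whatever mechanism lets the tool ask the optimizer which indices it might use for a statement; only those indices ever enter the search.

```python
# Hypothetical sketch: restrict the search to indices the optimizer might use.
# optimizer_candidates(stmt) stands in for asking the optimizer (e.g., via some
# "what-if" interface) which indices could appear in a plan for this statement.

def candidate_indices(workload, optimizer_candidates):
    candidates = set()
    for stmt, _freq in workload:
        candidates |= set(optimizer_candidates(stmt))
    return candidates
```

The cost/benefit comparison then plugs in one of the three strategies listed above: executing for real is the most accurate but most expensive, pure cost-model estimation is cheap but only as trustworthy as the available statistics, and sampling to build statistics first sits in between.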

12 SQL Server Architecture

13 Their Approach in More Detail
- For a workload of n queries:
  - Generate a separate workload with each query
  - Evaluate the candidate indices for this query to find the best “configuration” – limited to 2 indices, 2 tables, single joins
  - Candidate index set for the workload is the union of all configurations
- Too expensive to enumerate all configurations; use a greedy algorithm (sketched below):
  - Exhaustively enumerate (using the optimizer) the best m-index configuration
  - Pick a new index I to add that appears to save cost relative to adding some other index I’ or to the current configuration’s cost
  - Repeat until we’ve added “enough” (k) indices
- “Despite interaction among indices, the largest cost reductions often result from indices that are good candidates by themselves”
- They iteratively expand to 2-column indices – an index on the leading column must itself be desirable for the 2-column index to be desirable
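A minimal sketch of the greedy search described above, under stated assumptions: `candidates` is the union of the per-query best configurations, `workload_cost(config)` is a callback returning the optimizer-estimated cost of the whole workload under an index set, and the exact stopping rule is illustrative rather than the paper’s implementation.

```python
from itertools import combinations

def greedy_index_selection(candidates, m, k, workload_cost):
    """Seed with the exhaustively best m-index configuration, then greedily add
    one index at a time (up to k total) while the cost estimate keeps improving."""
    candidates = set(candidates)
    m = min(m, len(candidates))

    # 1. Exhaustive phase: best m-index configuration according to the cost model.
    config = min((set(c) for c in combinations(candidates, m)),
                 key=workload_cost, default=set())

    # 2. Greedy phase: add the single index that most reduces estimated cost.
    while len(config) < k:
        remaining = candidates - config
        if not remaining:
            break
        best = min(remaining, key=lambda idx: workload_cost(config | {idx}))
        if workload_cost(config | {best}) >= workload_cost(config):
            break  # no remaining index improves the estimate
        config.add(best)
    return config
```

The quoted observation is what justifies being greedy: indices that look good on their own tend to dominate the savings, so skipping a full re-enumeration after the seed phase usually costs little accuracy.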

14 How Many Candidates?

15 Savings Due to Considering Single Joins

16 Compared to Baseline
The baseline considers all indices during enumeration, using the greedy algorithm mentioned previously.

17 Further Enhancements
- Use the tool for “what-if” analysis (sketched below)
  - What if a table grows by a substantial amount?
- Supplement with extra info gathered from real query execution
  - Maybe we can “tweak” estimates for certain selectivities
  - An attempt to compensate for the “exponential error” problem
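One way to picture the “what-if” idea, as a hypothetical sketch only (the statistics layout, `what_if_table_grows`, and `tune` are made-up names, not the wizard’s API): scale the stored row count of a table as if it had grown, rerun the same cost-based recommendation, and see whether the suggested index set changes.

```python
# Hypothetical sketch of "what-if" analysis on top of a cost-based tuner.
# stats maps table name -> {"row_count": ..., ...}; tune() is assumed to be the
# recommendation routine that consumes the workload plus these statistics.

def what_if_table_grows(stats, table, growth_factor):
    """Return a copy of the statistics with one table's row count scaled up."""
    new_stats = {t: dict(s) for t, s in stats.items()}
    new_stats[table]["row_count"] = int(new_stats[table]["row_count"] * growth_factor)
    return new_stats

# Usage idea (all names illustrative):
#   before = tune(workload, stats)
#   after  = tune(workload, what_if_table_grows(stats, "orders", 10))
#   # Did the recommended index set change?
```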

18 Where Next?
- Perhaps we can go further, automating database design itself?
- How would we start to tackle this problem?

19 Next Time: XML
- The bridge between databases and the Web in general

