Self-Tuning and Self-Configuring Systems Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems March 16, 2005.


2 Administrivia
• No class 3/21 – out of town
• Read and summarize the Natix paper for Wednesday 3/23
• Tomorrow, 3PM, Levine 101: Shuchi Chawla, CMU, Path-Planning Algorithms
• Tomorrow, 4:30PM, Levine 307: Sihem Amer-Yahia, AT&T Labs: “Full-Text Querying in XML: A Little Bit of Standards and Lot's o' Research”
• Tuesday, 3/21, Levine 101: Nick Feamster, MIT: Robust Internet Routing

3 Today’s Trivia Question

4 Midterm Mini-Retrospective
• We’ve now seen many of the major issues in databases
• … Which are?
• Mike Stonebraker thinks we’ve run out of good things to work on
• Is he right?
• What problems should people be working on now?

5 A Few of My Thoughts (Please chime in with your own!)
• More automation
• Different data types
  • “Schema mostly”, text, …
• Semantic reconciliation and mapping
  • Perhaps we’ll never solve this, but we can clearly do better
• Uncertainty and inconsistency
  • Probabilities, inconsistencies, different perspectives, …
• Truly scalable data sharing
  • Can’t we share at the level of the Web?
  • Two-way data exchange
• Streams and sensors

6 Self-Tuning Systems
• Databases are complicated!
• Schema design is hard
• Lots of “knobs” to tweak
• Need appropriate information
• Does the DB approach give us more ability to “self-tune” than some other approach (e.g., Java)?

7 What Would We Like to Auto-Tune?
• Query optimization – statistics, bad decisions, …
• The schema itself?
• Indices
• Auxiliary materialized views
• Data partitioning
• Perhaps logging?

8 What Are The Challenges in Building Adaptive Systems?
• Really, a generalization of those in adaptive query processing
• Information gathering – how do we get it?
• Extrapolating – how do we do this accurately and efficiently?
• Sampling or piloting
• Minimizing the impact of mistakes if they happen
• Using app-specific knowledge

9 Who’s Interested in these Problems?
• Oracle:
  • Materialized view “wizard”
• Microsoft “AutoAdmin”:
  • Index selection, materialized view selection
  • Stats on materialized views
  • Database layout
• IBM SMART (Self-Managing And Resource Tuning):
  • Histogram tuning (“LEO” learning optimizer)
  • Partitioning in clusters
  • Index selection
  • Adaptive query processing

10 A Particular Instance: Microsoft’s Index Tuning Wizard
• Why not let the system choose the best index combination(s) for a workload?
• The basic idea:
  • Log a whole bunch of queries that are frequently run
  • See what set of indices is best
  • Why is this hard? Why not index everything?
• Create these indices with little or no human input

11 Possible Approaches
• Obviously: only consider indices that would be useful
  • The optimizer can “tell” which indices it might use in executing a query
  • But that continues to be a lot of indices!
• Can exhaustively compare all possible indices
  • Note that indices can interact (esp. for updates)
• How do we compare costs and benefits of indices?
  • Execute for real
  • Use optimizer cost model with whatever stats we have
  • Gather some stats (e.g., build histograms, sample) and use cost model
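The cost/benefit comparison above can be pictured with a toy cost model. This is only a sketch under loud assumptions: the three cost constants, the `net_benefit` helper, and the column names are all invented for illustration and stand in for whatever a real optimizer's cost model would report. The point is that an index saves work on reads but adds work on updates, which is one reason not to index everything.

```python
# Toy cost model: every constant here is an illustrative assumption, not a measurement.
SCAN_COST = 100.0      # assumed cost of answering a query by scanning the table
SEEK_COST = 10.0       # assumed cost of answering the same query through an index
MAINTAIN_COST = 5.0    # assumed extra cost per update to keep one index current


def net_benefit(column, reads, writes):
    """Estimated read savings from indexing `column`, minus its update overhead."""
    saved = reads.get(column, 0) * (SCAN_COST - SEEK_COST)
    paid = writes.get(column, 0) * MAINTAIN_COST
    return saved - paid


# Hypothetical workload log: how often each column is filtered on / updated.
reads = {"status": 50, "note": 2}
writes = {"status": 10, "note": 400}

benefits = {col: net_benefit(col, reads, writes) for col in reads}
worth_building = [col for col, b in benefits.items() if b > 0]
```

Here the frequently read column pays for its index, while the frequently updated one does not.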

12 SQL Server Architecture

13 Their Approach in More Detail
• For a workload of n queries:
  • Generate a separate workload with each query
  • Evaluate the candidate indices for this query to find the best “configuration” – limited to 2 indices, 2 tables, single joins
  • Candidate index set for workload is the union of all configurations
• Too expensive to enumerate all; use a greedy algorithm:
  • Exhaustively enumerate (using optimizer) best m-index configuration
  • Pick a new index I to add, which seems to save cost relative to adding some other I’ or to the current cost
  • Repeat until we’ve added “enough” k indices
  • “Despite interaction among indices, the largest cost reductions often result from indices that are good candidates by themselves”
• They iteratively expand to 2-column indices – index on leading column must be desirable for this to be desirable
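The greedy enumeration step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `greedy_m_k`, `workload_cost`, and `toy_cost` are hypothetical names, and `toy_cost` (10 if any indexed column appears in the query, 100 otherwise) is an assumed stand-in for asking the real optimizer to cost a query under a configuration.

```python
from itertools import combinations


def workload_cost(config, workload, cost_fn):
    """Total estimated cost of the workload under one index configuration."""
    return sum(cost_fn(query, config) for query in workload)


def greedy_m_k(candidates, workload, cost_fn, m, k):
    """Exhaustively seed with the best configuration of up to m indices,
    then greedily add one index at a time until k indices or no further gain."""
    candidates = list(candidates)
    best = frozenset()
    best_cost = workload_cost(best, workload, cost_fn)

    # Phase 1: exhaustive search over all configurations of size <= m.
    for size in range(1, min(m, k) + 1):
        for combo in combinations(candidates, size):
            cost = workload_cost(frozenset(combo), workload, cost_fn)
            if cost < best_cost:
                best, best_cost = frozenset(combo), cost

    # Phase 2: greedily add the single index that lowers cost the most.
    while len(best) < k:
        pick, pick_cost = None, best_cost
        for index in candidates:
            if index in best:
                continue
            cost = workload_cost(best | {index}, workload, cost_fn)
            if cost < pick_cost:
                pick, pick_cost = index, cost
        if pick is None:  # no remaining index reduces the estimated cost
            break
        best, best_cost = best | {pick}, pick_cost
    return best, best_cost


def toy_cost(query, config):
    """Assumed cost model: 10 if some indexed column is used by the query, else 100."""
    return 10 if query & config else 100


# Hypothetical workload: each query is the set of columns it filters on.
workload = [{"a"}, {"a", "b"}, {"c"}, {"c"}, {"d"}]
config, cost = greedy_m_k(["a", "b", "c", "d"], workload, toy_cost, m=1, k=2)
```

With m=1 the seed is the single best index, which matches the quoted observation that the largest savings often come from indices that are good candidates on their own; a larger m trades more optimizer calls for a chance to catch interacting indices.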

14 How Many Candidates?

15 Savings Due to Considering Single Joins

16 Compared to Baseline
• Baseline considers all indices during enumeration, with greedy algorithm mentioned previously

17 Further Enhancements
• Use the tool for “what-if” analysis
  • What if a table grows by a substantial amount?
• Supplement with extra info gathered from real query execution
  • Maybe we can “tweak” estimates for certain selectivities
  • An attempt to compensate for the “exponential error” problem
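A “what-if” question about table growth might be sketched like this. Everything here is assumed for illustration: `estimated_cost` is a made-up cost formula (linear for a scan, logarithmic descent plus matching rows for an index lookup), and `what_if` simply re-evaluates it with scaled row counts instead of touching real data.

```python
import math


def estimated_cost(rows, selectivity, has_index):
    """Assumed cost formula: a scan reads every row; an index lookup pays a
    logarithmic descent plus the rows that match the predicate."""
    if has_index:
        return math.log2(max(rows, 2)) + selectivity * rows
    return float(rows)


def what_if(rows, selectivity, growth):
    """Estimate costs for a hypothetical table `growth` times its current size."""
    scaled = int(rows * growth)
    return {
        "scan": estimated_cost(scaled, selectivity, has_index=False),
        "index": estimated_cost(scaled, selectivity, has_index=True),
    }


# Hypothetical question: what if a 1M-row table grows tenfold?
projection = what_if(rows=1_000_000, selectivity=0.001, growth=10)
```

The analysis only changes the statistics fed to the cost model, so it can be run without building any index or loading any data.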

18 Where Next?
• Perhaps we can go further, automating database design itself?
• How would we start to tackle this problem?

19 Next Time: XML
• The bridge between databases and the Web in general