Autonomic DBMSs: System Tune Thyself!
Pat Martin
Database Systems Laboratory, School of Computing
Supported by IBM, CITO and NSERC.

Outline of Talk
The problem – system complexity
The solution – autonomic computing systems
Autonomic DBMSs
Some current research – tuning multiple buffer pools
Summary

The Problem
Computer systems have continually expanded to achieve greater functionality and efficiency
Expansion has led to a complexity crisis
  – Systems are too complex to be managed effectively!

Can you manage this?

How about this?

A Solution – Autonomic Computing Systems
Autonomic computing systems, like our nervous system, manage themselves

Autonomic Computing System
Aware of itself and its environment and acts accordingly
Able to reconfigure itself under varying and unpredictable conditions
Able to recover from events that cause it to malfunction
Able to anticipate the optimized resources needed to perform a task
Able to protect itself

Autonomic DBMS Project
Goal is to develop a DBMS that can automatically
  – Recognize properties of its workload
  – Monitor itself with minimal impact on applications' performance
  – Reallocate resources to improve performance
  – Detect and diagnose performance problems
  – Recognize and react to changes in its environment and available resources

Example – Buffer Pool Tuning
Automatically assign tablespaces to buffer pools based on an analysis of the database and the workload (BP Configuration Problem)
Dynamically adjust the sizes of the buffer pools to minimize I/O costs for the database and workload (BP Sizing Problem)

Multiple Buffer Pools
[Figure: multiple buffer pools serving the warehouse, customer, item and index objects; arrows labelled logical access, physical write and physical read]

BP Configuration Problem
Given a set of database objects and a workload, determine a mapping of database objects to buffer pools that maximizes performance for the given workload.

Configuration Rules of Thumb
Separate data and indexes
Isolate a large data table
Separate objects that are updated frequently from objects that are primarily read
Put temporary tables in their own BP
Separate small, frequently accessed tables from larger tables that are scanned
Isolate tables that are accessed frequently by short updates

BPConfig Approach
Analyze logical page reference trace
  – obtain trace of workload on default configuration
  – derive access patterns for DB objects: random, re-reference and sequential accesses
Create characterization vectors
  – type, access patterns, read/write info, size info
Partition DB objects into buffer pools
  – cluster based on characterization vectors
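The trace-analysis step can be illustrated with a minimal sketch. It assumes the trace is a sequence of (object, page) logical references and uses hypothetical classification rules: a page already touched for that object counts as a re-reference, the page immediately following the previous one counts as sequential, and anything else counts as random. The talk does not state the actual rules, so the function name and thresholds here are illustrative only.

```python
from collections import Counter, defaultdict

KINDS = ("sequential", "re-reference", "random")

def access_pattern_fractions(trace):
    """Return {object: {kind: fraction}} for a logical page reference trace.

    trace is an iterable of (object_name, page_number) pairs.
    The classification rules are assumptions made for illustration.
    """
    last_page = {}                 # last page touched, per object
    seen = defaultdict(set)        # pages already referenced, per object
    counts = defaultdict(Counter)  # tally of access kinds, per object

    for obj, page in trace:
        if page in seen[obj]:
            kind = "re-reference"
        elif obj in last_page and page == last_page[obj] + 1:
            kind = "sequential"
        else:
            kind = "random"
        counts[obj][kind] += 1
        seen[obj].add(page)
        last_page[obj] = page

    result = {}
    for obj, tally in counts.items():
        total = sum(tally.values())
        result[obj] = {k: tally[k] / total for k in KINDS}
    return result
```

The resulting fractions are one plausible source for the "access patterns" entries of a characterization vector.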

Partitioning DB Objects
Partition using k-means clustering algorithm
Similarity measured by weighted Euclidean distance
Considered different weighting schemes
  – equal
  – favour read/write
  – favour access pattern
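A minimal sketch of k-means with a weighted Euclidean distance follows. The feature layout, the seeding with the first k vectors, and the fixed iteration count are assumptions made for illustration, not details taken from BPConfig.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two characterization vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def kmeans(vectors, k, weights, iterations=20):
    """Return one cluster index (buffer pool number) per vector."""
    centroids = [list(v) for v in vectors[:k]]  # naive seeding (assumption)
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each object to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: weighted_distance(v, centroids[c], weights))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical vectors: [is_index, write_fraction, sequential_fraction]
objects = {
    "customer":     [0, 0.4, 0.1],
    "customer_idx": [1, 0.1, 0.0],
    "item":         [0, 0.0, 0.1],
    "stock_idx":    [1, 0.2, 0.0],
}
equal = kmeans(list(objects.values()), 2, weights=[1, 1, 1])
favour_rw = kmeans(list(objects.values()), 2, weights=[0.1, 10, 0.1])
```

With equal weights the toy data splits into data objects and index objects; raising the weight on the read/write feature instead groups the objects by how heavily they are written, which is the effect a weighting scheme is meant to have.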

Experiments
Experimental environment
  – IBM Netfinity 8500R: 4 × 900 MHz PIII Xeon CPUs, 16 GB RAM, 70 disks, Windows NT
  – TPC-C benchmark: OLTP workload, 400-warehouse (40 GB) database
  – DB2 Version 7.1
100,000 4K pages for the buffer pools

Experiments (cont.)
Configuration schemes
  – BPConfig, expert, default (1 BP), random, distributed (1 BP per DB object)
Evaluation criteria
  – Weighted Response Time (WRT)
  – Transactions per minute (TPM)
  – % Physical Reads (%PR)

Experiments (cont.)
Properties of BPConfig configurations (3 buffer pools)
  – separates index and data objects
  – separates heavy access and light access objects
  – WID tables isolated (equal and read/write weightings)

Experiments (cont.)
Configurations: Equal Weight, Read/Write, Access Pattern, Expert, Random, Default, Dist
WRT   11.11  11.20  10.86  10.95  14.05  12.50
TPM   8129   8047   8331   8287   6371   7159
%PR   10.4   8.1

BP Sizing Problem
Given a workload, a set of buffer pools and a fixed number of buffer pages, determine the appropriate size of each buffer pool to maximize performance for the given workload.
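One way to write the sizing problem down, using symbols that are not in the talk and are introduced here only for illustration: n buffer pools, s_i pages assigned to pool i, B total buffer pages, and P a performance measure for the given workload.

```latex
\max_{s_1,\dots,s_n} \; P(s_1,\dots,s_n)
\quad \text{subject to} \quad
\sum_{i=1}^{n} s_i = B, \qquad s_i \in \mathbb{N}
```

Minimizing a cost such as total data access time is the same problem with the objective negated.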

Approaches to Sizing BPs – Class-based Optimization
Specify performance goals for each transaction class
Algorithm tries to satisfy goals
Logical access cost proportional to physical access cost
Physical access cost determined by buffer pool miss rates

Class-based Optimization (cont.)
Collect performance data
Choose target class (Ti with worst performance)
Loop until goal met
  – Choose target buffer pool (BP with greatest benefit)
  – Choose source buffer pool (BP with least cost)
  – Reallocate pages
End
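The loop above can be sketched as follows. This is an illustration under stated assumptions, not the algorithm as implemented in the project: the performance estimate, the benefit and cost estimators, the fixed page step and the round limit are all hypothetical and are passed in as plain functions and parameters.

```python
def choose_target_class(measured, goals):
    """Pick the class furthest above its goal (assumed: lower is better)."""
    return max(measured, key=lambda c: measured[c] / goals[c])

def tune_for_class(sizes, goal, response_time, benefit, cost,
                   step=100, max_rounds=50):
    """Move pages between buffer pools until the target class meets its goal.

    sizes:         {buffer_pool: pages}
    response_time: sizes -> estimated response time of the target class
    benefit:       (bp, sizes) -> estimated gain from adding `step` pages
    cost:          (bp, sizes) -> estimated loss from removing `step` pages
    """
    sizes = dict(sizes)
    for _ in range(max_rounds):          # cap: the goal may be unreachable
        if response_time(sizes) <= goal:
            break
        # Target: the pool that gains most from extra pages.
        target = max(sizes, key=lambda bp: benefit(bp, sizes))
        # Source: the cheapest donor that still keeps some pages.
        donors = [bp for bp in sizes if bp != target and sizes[bp] > step]
        if not donors:
            break
        source = min(donors, key=lambda bp: cost(bp, sizes))
        sizes[source] -= step
        sizes[target] += step
    return sizes
```

The round limit matters: with an unreachable goal this loop can keep shuffling the same pages back and forth between two pools.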

Class-based Optimization (cont.)
Problems:
  – How do we select appropriate performance goals for a class?
  – Some classes may be favoured over others
  – Thrashing between buffer pool states is a possibility

Approaches to Sizing BPs – System-based Optimization
BP sizes chosen to optimize a system performance metric, e.g. throughput
Use a simple greedy algorithm
Considered 2 objective functions:
  – Maximize hit rate
  – Minimize data access time (physical reads don't all cost the same!)
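A greedy search of this kind can be sketched as below. The talk does not spell out the algorithm, so the details are assumptions: the search starts from an even split, moves a fixed block of pages per step, and stops when no single move lowers the cost. The linear miss-rate model in `data_access_time` is a deliberately crude stand-in for a real estimator.

```python
def data_access_time(sizes, accesses, footprint, read_cost):
    """Hypothetical cost: accesses x miss rate x per-pool read cost."""
    total = 0.0
    for bp, pages in sizes.items():
        # Crude assumption: miss rate falls linearly until the pool
        # holds every page of the objects assigned to it.
        miss_rate = max(0.0, 1.0 - pages / footprint[bp])
        total += accesses[bp] * miss_rate * read_cost[bp]
    return total

def greedy_sizes(total_pages, pools, cost, step=1000):
    """Repeatedly apply the single page move that lowers cost the most."""
    base, extra = divmod(total_pages, len(pools))
    sizes = {bp: base for bp in pools}
    sizes[pools[0]] += extra
    while True:
        best, best_cost = None, cost(sizes)
        for src in pools:
            if sizes[src] <= step:
                continue  # never empty a pool completely
            for dst in pools:
                if dst == src:
                    continue
                trial = dict(sizes)
                trial[src] -= step
                trial[dst] += step
                trial_cost = cost(trial)
                if trial_cost < best_cost:
                    best, best_cost = trial, trial_cost
        if best is None:
            return sizes  # local optimum: no move improves the cost
        sizes = best
```

Setting every entry of `read_cost` to the same value makes the cost proportional to the number of misses, which corresponds to the hit-rate objective; unequal values model the point that physical reads do not all cost the same.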

System-based Optimization – Experiments
Experimental environment
  – IBM xSeries 240 PC Server: 2 × 1 GHz PIII CPUs, 2 GB RAM, 22 disks, Windows NT
  – TPC-C benchmark
  – DB2 Version 7.1
50,000 4K buffer pool pages
3 buffer pools configured with BPConfig

Experiments (cont.)
BP Sizing
          DAT-Based   HR-Based
WHR       0.9308      0.9342
WcostLR   1.5375      1.5639
TPM       4493        4318

Other AutoDBA Projects
Automatic diagnosis
Automatic recognition of workload type
Integration of BPConfig and sizing algorithm
Automatic BP management in PostgreSQL
Tools for DBMS capacity planning

AutoDBA Project Members
Queen's:
  – Wendy Powley, Darcy Benoit, Said Elnaffar, Wenhu Tian, Xiaoyi Xu, Xilin Cui, Ted Wasserman, Nailah Ogeer
IBM:
  – Berni Schiefer, Sam Lightstone, Randy Horman, Robin Van Boeschoten, Keri Romanufa, Calisto Zuzarte

