CS Operating System & Database Performance Tuning Xiaofang Zhou School of Computing, NUS Office: S URL:
2 Outline Part 1: Operating systems and DBMS Part 2: OS-related tuning
3 Operating System Operating system is an interface between hardware and other software, supporting: Processes and threads; Paging, buffering and IO scheduling Multi-tasking File system Other utilities such as timing, networking and performing monitoring hardware
4 Scheduling Process vs thread Scheduling based on time-slicing, IO, priority etc Different from transaction scheduling The cost of content switching When switch is desirable? And when is not? The administrator can set priorities to processes/threads Case 1: the DBMS runs at a lower priority Case 2: different transactions run at different priority Case 3: online transactions with higher priority than offline transactions
5 Priority Inversion Let priorities T1 > T2s > T3 T1 T2s T3 Lock x Request X … a solution: priority inheritance
6 Database Buffers Application buffers DBMS buffers OS buffers An application can have its own in- memory buffers (e.g., variables in the program; cursors); A logical read/write will be issued to the DBMS if the data needs to be read/written to the DBMS; A physical read/write is issued by the DBMS using its systematic page replacement algorithm. And such a request is passed to the OS. OS may initiate IO operations to support the virtual memory the DBMS buffer is built on.
7 Database Buffer Size Buffer too small, then hit ratio too small hit ratio = (logical acc. - physical acc.) / (logical acc.) Buffer too large, paging Recommended strategy: monitor hit ratio and increase buffer size until hit ratio flattens out. If there is still paging, then buy memory. LOGDATA RAM Paging Disk DATABASE PROCESSES DATABASE BUFFER
8 Buffer Size - Data Settings: employees(ssnum, name, lat, long, hundreds1, hundreds2); clustered index c on employees(lat); (unused) 10 distinct values of lat and long, 100 distinct values of hundreds1 and hundreds rows (630 Mb); Warm Buffer Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000 RPM), Windows 2000.
9 Buffer Size - Queries Queries: Scan Query select sum(long) from employees; Multipoint query select * from employees where lat = ?;
10 Database Buffer Size SQL Server 7 on Windows 2000 Scan query: LRU (least recently used) does badly when table spills to disk as Stonebraker observed 20 years ago. Multipoint query: Throughput increases with buffer size until all data is accessed from RAM.
11 It’s All About $$$ Buffering is about a trade-off between speed and cost A (18 GB) disk offers 170 random access for $300 the access cost A=$1.76 per access per second RAM C=$0.5/MB Page size B = 8 KB Page p is accessed every I=200 s Keep page p in memory? Yes: cost C/1024*B = $ for 8KB RAM No: cost A/I = $ So, p is in memory until its access interval reaches ??? s
12 Multiprogramming Levels More concurrent users Better utilization of CPU cycles (and other system resources) Risk of excessive page swapping More lock conflicts So how many exactly Depends on transaction profiles Experiments to find the best value And this parameter may change when application patterns change
13 Disk Layout and Access Larger disk allocation chunks improves write performance At the cost of disk utilisation Setting disk usage factor Low when expecting updates/inserts Higher for scan-type of queries Using prefetching For non-random accesses
14 Scan Performance - Data Settings: lineitem ( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT ); rows Lineitem tuples are ~ 160 bytes long Cold Buffer Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.
15 Scan Performance - Queries Queries: select avg(l_discount) from lineitem;
16 Usage Factor DB2 UDB v7.1 on Windows 2000 Usage factor is the percentage of the page used by tuples and auxiliary data structures (the rest is reserved for future) Scan throughput increases with usage factor.
17 Prefetching DB2 UDB v7.1 on Windows 2000 Throughput increases up to a certain point when prefetching size increases.
18 Summary In this module, we have covered: A review of OS from the DBMS perspective How to optimise OS-related parameters and options Thread Buffer, and File system Next: tuning the hardware