DBMSs On A Modern Processor: Where Does Time Go?
by A. Ailamaki, D.J. DeWitt, M.D. Hill, and D. Wood
University of Wisconsin-Madison, Computer Science Dept., Madison, WI
Presented by Derwin Halim
Agenda
- Database and DBMS
- Motivation for DBMS performance study
- Proposed DBMS performance study
- Processor model
- Query execution time breakdown
- Database workload
- Experimental setup and results
- Conclusion
Database and DBMS
- A database is a collection of data, typically describing the activities of one or more related organizations in terms of entities and relationships
- A DBMS (Database Management System) is software designed to assist in maintaining and utilizing large collections of data
Motivation for DBMS Performance Study
- DBMSs are becoming compute- and memory-bound
- Modern processors do not improve database system performance to the same extent as they improve scientific workloads
- Contrasting commercial DBMSs and identifying their common characteristics is difficult
- There is an urgent need to evaluate and understand the processor and memory behavior of commercial DBMSs on existing hardware platforms
Proposed DBMS Performance Study
- Analyze the execution time breakdown of several different commercial DBMSs on the same hardware platform
- Use a workload consisting of simple queries on a memory-resident database
- Isolate basic operations and identify common trends across the DBMSs
- Identify and analyze bottlenecks, and propose solutions
Processor Model: Basic Pipeline Operation
Processor Model: Handling Pipeline Stalls
- Non-blocking caches
- Out-of-order execution
- Speculative execution with branch prediction
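What these mechanisms can and cannot hide is easy to see outside a DBMS. A minimal C sketch (my own illustration, not from the paper): with independent loads, a non-blocking cache and out-of-order execution overlap many cache misses; in a pointer chase, every load's address depends on the previous load, so the misses serialize.

    /* Sketch: independent loads (array sum) let a non-blocking cache and
       out-of-order execution overlap cache misses; dependent loads (a
       randomly linked list) serialize them. Sizes are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000000

    struct node { struct node *next; long val; };

    int main(void) {
        long *arr = malloc(N * sizeof *arr);
        struct node *nodes = malloc(N * sizeof *nodes);
        long *perm = malloc(N * sizeof *perm);
        for (long i = 0; i < N; i++) { arr[i] = i; nodes[i].val = i; perm[i] = i; }

        /* Link the nodes in random order (Fisher-Yates shuffle) so the
           hardware prefetcher cannot follow the chain. */
        for (long i = N - 1; i > 0; i--) {
            long j = rand() % (i + 1);
            long t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (long i = 0; i + 1 < N; i++) nodes[perm[i]].next = &nodes[perm[i + 1]];
        nodes[perm[N - 1]].next = NULL;

        long sum1 = 0;                  /* independent loads: misses overlap */
        for (long i = 0; i < N; i++) sum1 += arr[i];

        long sum2 = 0;                  /* dependent loads: misses serialize */
        for (struct node *p = &nodes[perm[0]]; p; p = p->next) sum2 += p->val;

        printf("%ld %ld\n", sum1, sum2);
        free(arr); free(nodes); free(perm);
        return 0;
    }

Timing the two loops should show the pointer chase running many times slower, even though both perform the same number of loads and additions.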
Query Execution Time Breakdown
T_Q = T_C + T_M + T_B + T_R - T_OVL
where T_Q is the total query execution time, T_C the useful computation time, T_M the memory stall time, T_B the branch-misprediction stall time, T_R the resource stall time, and T_OVL the portion of stall time that overlaps with computation and is therefore hidden.
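As a back-of-the-envelope illustration, the breakdown can be computed directly from its components. The values below are invented for the example; the paper derives the real ones from the Pentium II's hardware event counters.

    /* Hypothetical illustration of the execution-time breakdown formula.
       All values are made up for the example, not measurements. */
    #include <stdio.h>

    int main(void) {
        double t_c   = 0.9;   /* computation time (assumed units) */
        double t_m   = 1.6;   /* memory stall time */
        double t_b   = 0.3;   /* branch-misprediction stall time */
        double t_r   = 0.4;   /* resource stall time */
        double t_ovl = 0.5;   /* stall time that overlaps computation */

        double t_q = t_c + t_m + t_b + t_r - t_ovl;
        printf("T_Q = %.2f, stalls = %.0f%% of execution\n",
               t_q, 100.0 * (t_m + t_b + t_r - t_ovl) / t_q);
        return 0;
    }

With these made-up numbers T_Q = 2.70, with stalls accounting for about two thirds of it; that dominance of stall time over computation time is the shape of result the paper reports.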
Database Workload
- Single-table range selections and two-table equijoins over a memory-resident database, running a single command stream
- Eliminates dynamic and random parameters
- Isolates basic operations: sequential access and index selection
- Allows examination of processor and memory behavior without I/O interference
Database Workload
Table (the remaining fields of the record are elided in the original):
create table R (a1 integer not null, a2 integer not null, a3 integer not null, ...)
Sequential range selection:
select avg(a3) from R where a2 < Hi and a2 > Lo
Database Workload
- Indexed range selection: construct a non-clustered index on R.a2, then resubmit the range selection
- Sequential join:
  select avg(R.a3) from R, S where R.a2 = S.a1
- S contains 40,000 100-byte records, each of which joins with 30 records in R (40,000 × 30 = 1.2 million joined pairs)
Experimental Setup: Hardware and Software Platform
- 400 MHz Pentium II Xeon/MT workstation
- 512 MB main memory, 100 MHz system bus
- Out-of-order engine and speculative instruction execution
- Non-blocking caches
- Separate first-level data and instruction caches, unified second-level cache
- Four commercial DBMSs on Windows NT 4.0 Service Pack 4
- Hardware event counters, read with the emon tool
Experimental Setup: PII Xeon Cache Characteristics
Experimental Setup: Measuring Stall Time Components
Results: Execution Time Breakdown
- The processor spends most of its time stalled
- The problem will be exacerbated by the ever-increasing processor-memory speed gap
- The bottleneck shifts: reducing one stall component exposes the others
Results: Memory Stall Breakdown
- L1 D-cache, L2 I-cache, and ITLB stall times are insignificant
- The analysis therefore focuses on the L1 I-cache and L2 D-cache stall components
Results: L2 D-cache Stall Time
- Determined by the position of the accessed data within the record and by the record size (see the sketch below)
- An L2 D-cache miss is much more expensive than an L1 D-cache miss, and only gets worse as the processor-memory performance gap increases
- Larger caches come at the cost of longer latencies
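A minimal C sketch of the record-size effect (sizes and counts are my own, not the paper's): scanning a single integer field out of every record touches more cache lines, and therefore misses more often, as the record grows.

    /* Sketch: the same sequential scan touches more cache lines as the
       record (struct) grows, increasing D-cache stall time. On a machine
       with 64-byte lines, four small records share a line, while each
       large record spans two lines. Record sizes are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NRECORDS 1000000

    struct small_rec { int a2; char pad[12];  };   /* 16-byte record  */
    struct large_rec { int a2; char pad[124]; };   /* 128-byte record */

    int main(void) {
        struct small_rec *s = calloc(NRECORDS, sizeof *s);
        struct large_rec *l = calloc(NRECORDS, sizeof *l);
        long sum_s = 0, sum_l = 0;

        for (long i = 0; i < NRECORDS; i++) sum_s += s[i].a2;  /* few misses  */
        for (long i = 0; i < NRECORDS; i++) sum_l += l[i].a2;  /* many misses */

        printf("%ld %ld\n", sum_s, sum_l);
        free(s); free(l);
        return 0;
    }

Running the two scans under a profiler (e.g., perf stat -e cache-misses on Linux) should show the large-record scan incurring several times more data-cache misses for the same number of records.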
Results: L1 I-cache Stall Time
- An L1 I-cache miss is difficult to overlap and causes a serial bottleneck in the pipeline
- L1 cache size trades off against latency
- L1 I-cache misses increase as the record size increases, due to:
  - Inclusion: an L2 cache replacement forces an L1 cache replacement
  - OS interrupts: periodic context switching
  - Page-boundary crossings
Results: Branch Misprediction
- Mispredictions create a serial bottleneck and also cause instruction cache misses
- 40% BTB misses on average mean that many branches fall back on static prediction
- L1 I-cache miss behavior follows branch misprediction behavior (see the sketch below)
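The effect is easy to reproduce with a small C sketch (my own, not the paper's): the same data-dependent branch is nearly free on sorted input, where the predictor locks on, and expensive on random input, where it is wrong about half the time.

    /* Sketch: one data-dependent branch, run over random and then sorted
       input. The random order defeats the branch predictor; the sorted
       order makes it almost perfect. Sizes are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 10000000

    static long count_big(const int *v) {
        long hits = 0;
        for (long i = 0; i < N; i++)
            if (v[i] > RAND_MAX / 2)    /* hard to predict on random data */
                hits++;
        return hits;
    }

    static int cmp(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        int *v = malloc(N * sizeof *v);
        for (long i = 0; i < N; i++) v[i] = rand();

        long r1 = count_big(v);         /* random: ~50% mispredictions */
        qsort(v, N, sizeof *v, cmp);
        long r2 = count_big(v);         /* sorted: near-perfect prediction */

        printf("%ld %ld\n", r1, r2);
        free(v);
        return 0;
    }

Comparing the two calls under perf stat -e branch-misses should show the random-order run mispredicting far more often, and taking correspondingly longer.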
Results: Resource Stall Time
- Dominated by dependency and/or functional-unit stalls
- Dependency stalls, a consequence of low instruction-level parallelism, are the most important resource stalls for all systems except System A (see the sketch below)
- Functional-unit stalls are caused by contention in the execution unit
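A minimal C sketch of a dependency stall (again my own, not the paper's): in the first loop every operation waits on the result of the previous one, so the out-of-order engine has nothing independent to issue; the second loop does the same amount of work in four independent chains that can proceed in parallel.

    /* Sketch: low ILP. The first loop is one serial dependence chain; the
       second performs the same number of operations in four independent
       chains, keeping the functional units busy. Counts are arbitrary. */
    #include <stdio.h>
    #include <time.h>

    #define N 100000000L

    int main(void) {
        volatile double seed = 1.000000001;

        clock_t t0 = clock();
        double a = seed;                         /* one serial chain */
        for (long i = 0; i < N; i++) a = a * 1.000000001 + 1e-9;
        clock_t t1 = clock();

        double b0 = seed, b1 = seed, b2 = seed, b3 = seed;  /* four chains */
        for (long i = 0; i < N; i += 4) {
            b0 = b0 * 1.000000001 + 1e-9;
            b1 = b1 * 1.000000001 + 1e-9;
            b2 = b2 * 1.000000001 + 1e-9;
            b3 = b3 * 1.000000001 + 1e-9;
        }
        clock_t t2 = clock();

        printf("serial: %.2fs  four chains: %.2fs  (%g %g)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC,
               a, b0 + b1 + b2 + b3);
        return 0;
    }

The four-chain loop should run substantially faster despite executing the same number of floating-point operations, because the dependency stalls are gone.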
Results: Simple Queries vs. TPC Benchmarks
Simple queries vs. TPC-D (DSS):
- Similar CPI breakdown
- Still dominated by L1 I-cache and L2 D-cache misses
Simple queries vs. TPC-C (OLTP):
- The CPI of TPC-C is much higher
- Resource stalls are higher
- Dominated by L2 D-cache and L2 I-cache misses
Conclusion
- Memory stalls are a serious performance bottleneck; L1 I-cache and L2 D-cache misses deserve the most attention
- Improvements should address all of the stall components, since the bottleneck can shift once one component is reduced
- The simple-query workload offers a methodological advantage over full benchmarks
- TPC-D shows a similar execution time breakdown, while TPC-C incurs more second-level cache and resource stalls