DBMSs On A Modern Processor: Where Does Time Go?
by A. Ailamaki, D.J. DeWitt, M.D. Hill, and D. Wood
University of Wisconsin-Madison, Computer Science Dept., Madison, WI
Presented by Derwin Halim
Agenda
- Database and DBMS
- Motivation for DBMS performance study
- Proposed DBMS performance study
- Processor model
- Query execution time breakdown
- Database workload
- Experimental setup and results
- Conclusion
Database and DBMS
- A database is a collection of data, typically describing the activities of one or more related organizations in terms of entities and relationships
- A DBMS (Database Management System) is software designed to assist in maintaining and utilizing large collections of data
Motivation for DBMS Performance Study
- DBMSs are becoming compute- and memory-bound
- Modern processors do not improve database system performance to the same extent as they improve scientific workloads
- Contrasting commercial DBMSs and identifying their common characteristics is difficult
- There is an urgent need to evaluate and understand the processor and memory behavior of commercial DBMSs on existing hardware platforms
Proposed DBMS Performance Study
- Analyze the execution time breakdown of several different commercial DBMSs on the same hardware platform
- Use a workload consisting of simple queries on a memory-resident database
- Isolate basic operations and identify common trends across the DBMSs
- Identify and analyze bottlenecks, and propose solutions
Processor Model: Basic Pipeline Operation
Processor Model: Handling Pipeline Stalls
- Non-blocking caches
- Out-of-order execution
- Speculative execution with branch prediction
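What these mechanisms can and cannot hide is easy to see outside a DBMS. A minimal C sketch (my own illustration, not from the paper): with independent loads, a non-blocking cache and out-of-order execution overlap many cache misses; in a pointer chase, every load's address depends on the previous load, so the misses serialize.

    /* Sketch: independent loads (array sum) let a non-blocking cache and
       out-of-order execution overlap cache misses; dependent loads (a
       randomly linked list) serialize them. Sizes are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000000

    struct node { struct node *next; long val; };

    int main(void) {
        long *arr = malloc(N * sizeof *arr);
        struct node *nodes = malloc(N * sizeof *nodes);
        long *perm = malloc(N * sizeof *perm);
        for (long i = 0; i < N; i++) { arr[i] = i; nodes[i].val = i; perm[i] = i; }

        /* Link the nodes in random order (Fisher-Yates shuffle) so the
           hardware prefetcher cannot follow the chain. */
        for (long i = N - 1; i > 0; i--) {
            long j = rand() % (i + 1);
            long t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (long i = 0; i + 1 < N; i++) nodes[perm[i]].next = &nodes[perm[i + 1]];
        nodes[perm[N - 1]].next = NULL;

        long sum1 = 0;                  /* independent loads: misses overlap */
        for (long i = 0; i < N; i++) sum1 += arr[i];

        long sum2 = 0;                  /* dependent loads: misses serialize */
        for (struct node *p = &nodes[perm[0]]; p; p = p->next) sum2 += p->val;

        printf("%ld %ld\n", sum1, sum2);
        free(arr); free(nodes); free(perm);
        return 0;
    }

Timing the two loops should show the pointer chase running many times slower, even though both perform the same number of loads and additions.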
Query Execution Time Breakdown
T_Q = T_C + T_M + T_B + T_R - T_OVL
where T_Q is the total query execution time, T_C the useful computation time, T_M the memory stall time, T_B the branch-misprediction stall time, T_R the resource stall time, and T_OVL the portion of stall time that overlaps with computation and is therefore hidden.
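As a back-of-the-envelope illustration, the breakdown can be computed directly from its components. The values below are invented for the example; the paper derives the real ones from the Pentium II's hardware event counters.

    /* Hypothetical illustration of the execution-time breakdown formula.
       All values are made up for the example, not measurements. */
    #include <stdio.h>

    int main(void) {
        double t_c   = 0.9;   /* computation time (assumed units) */
        double t_m   = 1.6;   /* memory stall time */
        double t_b   = 0.3;   /* branch-misprediction stall time */
        double t_r   = 0.4;   /* resource stall time */
        double t_ovl = 0.5;   /* stall time that overlaps computation */

        double t_q = t_c + t_m + t_b + t_r - t_ovl;
        printf("T_Q = %.2f, stalls = %.0f%% of execution\n",
               t_q, 100.0 * (t_m + t_b + t_r - t_ovl) / t_q);
        return 0;
    }

With these made-up numbers T_Q = 2.70, with stalls accounting for about two thirds of it; that dominance of stall time over computation time is the shape of result the paper reports.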
Database Workload
- Single-table range selections and two-table equijoins over a memory-resident database, running a single command stream
- Eliminates dynamic and random parameters
- Isolates basic operations: sequential access and index selection
- Allows examination of processor and memory behavior without I/O interference
Database Workload
Table (the remaining fields of the record are elided in the original):
create table R (a1 integer not null, a2 integer not null, a3 integer not null, ...)
Sequential range selection:
select avg(a3) from R where a2 < Hi and a2 > Lo
Database Workload
- Indexed range selection: construct a non-clustered index on R.a2, then resubmit the range selection
- Sequential join:
  select avg(R.a3) from R, S where R.a2 = S.a1
- S contains 40,000 100-byte records, each of which joins with 30 records in R (40,000 × 30 = 1.2 million joined pairs)
Experimental Setup: Hardware and Software Platform
- 400 MHz Pentium II Xeon/MT workstation
- 512 MB main memory, 100 MHz system bus
- Out-of-order engine and speculative instruction execution
- Non-blocking caches
- Separate first-level data and instruction caches, unified second-level cache
- Four commercial DBMSs on Windows NT 4.0 Service Pack 4
- Hardware event counters, read with the emon tool
Experimental Setup: PII Xeon Cache Characteristics
Experimental Setup: Measuring Stall Time Components
Results: Execution Time Breakdown
- The processor spends most of its time stalled
- The problem will be exacerbated by the ever-increasing processor-memory speed gap
- The bottleneck shifts: reducing one stall component exposes the others
Results: Memory Stall Breakdown
- L1 D-cache, L2 I-cache, and ITLB stall times are insignificant
- The analysis therefore focuses on the L1 I-cache and L2 D-cache stall components
Results: L2 D-cache Stall Time
- Determined by the position of the accessed data within the record and by the record size (see the sketch below)
- An L2 D-cache miss is much more expensive than an L1 D-cache miss, and only gets worse as the processor-memory performance gap increases
- Larger caches come at the cost of longer latencies
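A minimal C sketch of the record-size effect (sizes and counts are my own, not the paper's): scanning a single integer field out of every record touches more cache lines, and therefore misses more often, as the record grows.

    /* Sketch: the same sequential scan touches more cache lines as the
       record (struct) grows, increasing D-cache stall time. On a machine
       with 64-byte lines, four small records share a line, while each
       large record spans two lines. Record sizes are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NRECORDS 1000000

    struct small_rec { int a2; char pad[12];  };   /* 16-byte record  */
    struct large_rec { int a2; char pad[124]; };   /* 128-byte record */

    int main(void) {
        struct small_rec *s = calloc(NRECORDS, sizeof *s);
        struct large_rec *l = calloc(NRECORDS, sizeof *l);
        long sum_s = 0, sum_l = 0;

        for (long i = 0; i < NRECORDS; i++) sum_s += s[i].a2;  /* few misses  */
        for (long i = 0; i < NRECORDS; i++) sum_l += l[i].a2;  /* many misses */

        printf("%ld %ld\n", sum_s, sum_l);
        free(s); free(l);
        return 0;
    }

Running the two scans under a profiler (e.g., perf stat -e cache-misses on Linux) should show the large-record scan incurring several times more data-cache misses for the same number of records.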
Results: L1 I-cache Stall Time
- An L1 I-cache miss is difficult to overlap and causes a serial bottleneck in the pipeline
- L1 cache size trades off against latency
- L1 I-cache misses increase as the record size increases, due to:
  - Inclusion: an L2 cache replacement forces an L1 cache replacement
  - OS interrupts: periodic context switching
  - Page-boundary crossings
Results: Branch Misprediction
- Mispredictions create a serial bottleneck and also cause instruction cache misses
- 40% BTB misses on average mean that many branches fall back on static prediction
- L1 I-cache miss behavior follows branch misprediction behavior (see the sketch below)
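The effect is easy to reproduce with a small C sketch (my own, not the paper's): the same data-dependent branch is nearly free on sorted input, where the predictor locks on, and expensive on random input, where it is wrong about half the time.

    /* Sketch: one data-dependent branch, run over random and then sorted
       input. The random order defeats the branch predictor; the sorted
       order makes it almost perfect. Sizes are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 10000000

    static long count_big(const int *v) {
        long hits = 0;
        for (long i = 0; i < N; i++)
            if (v[i] > RAND_MAX / 2)    /* hard to predict on random data */
                hits++;
        return hits;
    }

    static int cmp(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        int *v = malloc(N * sizeof *v);
        for (long i = 0; i < N; i++) v[i] = rand();

        long r1 = count_big(v);         /* random: ~50% mispredictions */
        qsort(v, N, sizeof *v, cmp);
        long r2 = count_big(v);         /* sorted: near-perfect prediction */

        printf("%ld %ld\n", r1, r2);
        free(v);
        return 0;
    }

Comparing the two calls under perf stat -e branch-misses should show the random-order run mispredicting far more often, and taking correspondingly longer.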
Results: Resource Stall Time
- Dominated by dependency and/or functional-unit stalls
- Dependency stalls, a consequence of low instruction-level parallelism, are the most important resource stalls for all systems except System A (see the sketch below)
- Functional-unit stalls are caused by contention in the execution unit
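A minimal C sketch of a dependency stall (again my own, not the paper's): in the first loop every operation waits on the result of the previous one, so the out-of-order engine has nothing independent to issue; the second loop does the same amount of work in four independent chains that can proceed in parallel.

    /* Sketch: low ILP. The first loop is one serial dependence chain; the
       second performs the same number of operations in four independent
       chains, keeping the functional units busy. Counts are arbitrary. */
    #include <stdio.h>
    #include <time.h>

    #define N 100000000L

    int main(void) {
        volatile double seed = 1.000000001;

        clock_t t0 = clock();
        double a = seed;                         /* one serial chain */
        for (long i = 0; i < N; i++) a = a * 1.000000001 + 1e-9;
        clock_t t1 = clock();

        double b0 = seed, b1 = seed, b2 = seed, b3 = seed;  /* four chains */
        for (long i = 0; i < N; i += 4) {
            b0 = b0 * 1.000000001 + 1e-9;
            b1 = b1 * 1.000000001 + 1e-9;
            b2 = b2 * 1.000000001 + 1e-9;
            b3 = b3 * 1.000000001 + 1e-9;
        }
        clock_t t2 = clock();

        printf("serial: %.2fs  four chains: %.2fs  (%g %g)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC,
               a, b0 + b1 + b2 + b3);
        return 0;
    }

The four-chain loop should run substantially faster despite executing the same number of floating-point operations, because the dependency stalls are gone.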
Results: Simple Queries vs. TPC Benchmarks
Simple queries vs. TPC-D (DSS):
- Similar CPI breakdown
- Still dominated by L1 I-cache and L2 D-cache misses
Simple queries vs. TPC-C (OLTP):
- The CPI of TPC-C is much higher
- Resource stalls are higher
- Dominated by L2 D-cache and L2 I-cache misses
Conclusion
- Memory stalls are a serious performance bottleneck; L1 I-cache and L2 D-cache misses deserve the most attention
- Improvements should address all of the stall components, since the bottleneck can shift once one component is reduced
- The simple-query workload offers a methodological advantage over full benchmarks
- TPC-D shows a similar execution time breakdown, while TPC-C incurs more second-level cache and resource stalls