MEMORY SYSTEM CHARACTERIZATION OF COMMERCIAL WORKLOADS
Authors:
Luiz André Barroso (Google, DEC; worked on Piranha)
Kourosh Gharachorloo (Compaq, DEC; worked on Dash and Flash)
Edouard Bugnion (one of the original founders of VMware; also worked on SimOS)
Presented by: David Eitel, March 31, 2010
Types of Commercial Applications
Online Transaction Processing (OLTP)
Decision Support Systems (DSS)
Web Index Search (WIS)
Source: S. Brin and L. Page. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.”
Benchmarks
Oracle Database Engine
  TPC-B Banking Benchmark for OLTP
  TPC-D Benchmark for DSS (read-only queries)
AltaVista
Monitoring Results
Source: Fig. 4
OLTP has more complex queries than DSS/AV.
Low-latency access to the non-primary caches is important because the OLTP working set is very large.
Cache misses for DSS are low; the misses that do occur are on large database tables.
Figure annotations:
  Breakdown of the execution time and misses
  Big CPI! Lots of Bcache misses
  >75% mem stalls
  Sum of single- and dual-issue cycles
  Pipeline and address translation related stalls
Abbreviations:
  Icache = instruction cache
  Dcache = data cache
  Scache = secondary cache
  Bcache = board-level cache
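The ">75% mem stalls" annotation is a share of total CPI. The sketch below shows that arithmetic. The component names and cycle counts are hypothetical placeholders chosen for illustration; they are not the measurements from Fig. 4.

```python
def stall_fractions(components):
    """Return total CPI and each component's share of it."""
    total = sum(components.values())
    shares = {name: cycles / total for name, cycles in components.items()}
    return total, shares


# Hypothetical cycles per instruction (illustrative only, not data from the paper).
example = {
    "issue": 1.0,   # sum of single- and dual-issue cycles
    "other": 0.5,   # pipeline and address translation related stalls
    "memory": 5.0,  # stalls in the cache hierarchy and main memory
}

cpi, shares = stall_fractions(example)
print(f"CPI = {cpi:.1f}")
for name, share in shares.items():
    print(f"{name:>6}: {share:.0%}")
```

With these placeholder numbers the memory component dominates, which is the shape of result the slide describes for OLTP.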
Simulation Results for OLTP
Source: Fig. 5
Figure labels: Associativity; Cache Size; Data capacity/conflict misses
INST = instruction execution
CACHE = stalls within cache hierarchy
MEM = memory system stalls
Idle time increases with bigger caches. The I/O latency cannot be hidden by faster processing rates.
A more efficient memory system gives faster processing rates, so more commits are ready for the log writer (I/O).
OLTP benefits from larger Bcaches.
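To make the cache size and associativity axes concrete, here is a minimal sketch of a set-associative LRU cache model. It is not the simulator used in the paper; the function name, the 64-byte line size, and the synthetic uniform-random trace over a 4 MB footprint are all assumptions made for illustration.

```python
from collections import OrderedDict
import random


def miss_rate(addresses, cache_bytes, ways, line_bytes=64):
    """Simulate a set-associative LRU cache and return its miss rate."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        cache_set = sets[line % num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict the least recently used line
            cache_set[line] = True
    return misses / len(addresses)


# Synthetic trace (assumed): uniform random byte addresses over a 4 MB footprint.
rng = random.Random(0)
trace = [rng.randrange(4 * 1024 * 1024) for _ in range(100_000)]

for size_mb in (1, 2, 8):
    for ways in (1, 4):
        rate = miss_rate(trace, size_mb * 1024 * 1024, ways)
        print(f"{size_mb} MB, {ways}-way: miss rate {rate:.2f}")
```

A uniform random trace mainly exposes capacity misses. Conflict misses show up with a trace that alternates between two addresses mapping to the same set: a direct-mapped cache misses every time, while a 2-way cache misses only on the first touch of each line.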
More Simulation Results (OLTP and DSS)
Sources: Fig. 7 and Fig. 8
DSS works well with current-sized caches because its working sets are small (few misses in on-chip caches).
Replacement and instruction miss rates are not affected by line size, which is good for larger cache sizes.
False sharing increases with cache line size.
What would be different if the increased latency and bandwidth were accounted for when line size increases?
Are the results NOT valid because size(database) = size(main memory)?
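The link between line size and false sharing can be illustrated with a small sketch. The memory layout here is hypothetical: four CPUs each write only to their own 32-byte records, which are interleaved round-robin in memory. No byte is written by two CPUs, so any line touched by more than one CPU is falsely shared.

```python
def falsely_shared_fraction(writes_by_cpu, line_bytes):
    """Fraction of written cache lines that are written by more than one CPU.

    Assumes the CPUs write disjoint byte addresses, so any line with
    several writers is shared only because of the line granularity.
    """
    writers = {}
    for cpu, addrs in writes_by_cpu.items():
        for addr in addrs:
            writers.setdefault(addr // line_bytes, set()).add(cpu)
    shared = sum(1 for cpus in writers.values() if len(cpus) > 1)
    return shared / len(writers)


# Hypothetical layout: record r is 32 bytes, starts at byte 32 * r,
# and belongs to CPU r % 4. Each CPU writes four 8-byte fields per record.
NUM_CPUS, RECORD_BYTES, NUM_RECORDS = 4, 32, 64
writes = {cpu: [] for cpu in range(NUM_CPUS)}
for record in range(NUM_RECORDS):
    base = record * RECORD_BYTES
    writes[record % NUM_CPUS].extend(base + offset for offset in (0, 8, 16, 24))

for line_bytes in (16, 32, 64, 128):
    frac = falsely_shared_fraction(writes, line_bytes)
    print(f"{line_bytes:>3}-byte lines: {frac:.0%} of written lines falsely shared")
```

In this layout false sharing appears as soon as the line is larger than one record, since a single line then spans data owned by different CPUs. Real workloads have less regular layouts, so the increase is gradual rather than a step.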
Important Things to Remember
As the number of processors increases, communication stalls increase (see Fig. 6).
O/S activity and I/O latencies do not greatly affect the behavior of database engines.
OLTP has instruction and data locality that is helped by off-chip caches.
DSS and WIS have working sets that fit in memory and are sensitive to on-chip caches.
Discussion Questions
What are some new commercial applications that have developed since this paper was written?
How much have the issues in this paper been addressed in recent architecture designs?
What should we focus on in the “parallel” future to increase performance for commercial applications?
Could we change commercial workloads to function more like scientific workloads to obtain performance gains?