Analytical Evaluation of Shared-Memory Systems with Commercial Workloads. Jichuan Chang, CS747, 05-10-2002


2 Outline
– A Case for Analytical Models
– Existing Models and Their Limitations
– What Kind of Tools Do We Need

3 Background
Shared-memory multiprocessor servers
– Important: the computing infrastructure of our society
– Complex systems (ILP processors + caches + interconnection)
Commercial workloads
– Important: 80% of the server market, supporting our daily business
– Different behavior from scientific workloads: large code size and data set, different cache behavior, lots of OS interaction (context switches), and a higher I/O rate
– Hard to study (complex, hard to set up, no source code, a moving target)

4 A Motivating Example
Bob is designing a next-generation multiprocessor server for commercial workloads. Assume that the largest benchmark he can set up now uses a 10 GB database. How can Bob predict the performance (IPC or tpm) of a 100 GB TPC-D benchmark on the future machine? What is the ideal cache hierarchy for this workload, given his predictions of future technology constants?
– We need tools to characterize the workloads!
– We need tools to prune the vast design space!

5 Performance Evaluation Tools
Hardware monitors, binary instrumentation tools
+ Realistic, dynamic information
– Only work for existing systems; aggregated information
Program analysis tools (e.g., compilers)
+ Can do global analysis; work well for arrays/loops
– Little dynamic information; not good for (pointer-based) irregular programs; need source code
(Full-system) architecture simulators
+ Detailed simulation, realistic results, can simulate future hardware
– Slow (can't extrapolate), complex, can't simulate future software
Analytical models
+ Fast, give insight, can predict for future SW/HW combinations
– Need to create models of multiprocessors with new workloads

6 Sorin et al.: MVA for ILP Multiprocessors
[Figure: an ILP processor with L1$, L2$, and MSHRs, issuing misses (when the MSHRs are not full) to the rest of the system: bus, NI, switches, DRAM, directories]
Application input parameters
– A miss-rate parameter and its coefficient of variation (CV), f_M, f_sync-write, P_read, P_write, ...
Iterate between 2 submodels
– SB (fraction of time the CPU stalls due to synchronization operations)
– MB (fraction of time the CPU stalls due to limited MSHR size)
– Surrogate service time inflation
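The core of any MVA model is the mean-value recursion over the customer population. The sketch below is the textbook exact MVA for a closed, single-class product-form network, not Sorin et al.'s customized model (which adds the SB/MB submodels and surrogate service time inflation). Treating processors as customers, compute time between misses as think time, and the demand values in the usage line are all assumptions made for illustration.

```python
from typing import List, Tuple


def exact_mva(demands: List[float], think_time: float,
              n_customers: int) -> Tuple[float, List[float], List[float]]:
    """Textbook exact Mean Value Analysis (closed, single-class network).

    demands[k] is the total service demand at queueing center k;
    think_time is a pure delay (here: hypothetical compute time between misses).
    Returns (throughput, residence time per center, mean queue length per center).
    """
    queue = [0.0] * len(demands)  # Q_k(0) = 0: empty network
    residence = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving customer sees the mean queue lengths
        # of the same network with n - 1 customers.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(residence))
        # Little's law applied to each center.
        queue = [throughput * r for r in residence]
    return throughput, residence, queue


# Hypothetical example: 4 processors, bus demand 2 cycles, memory demand 5 cycles,
# 20 cycles of computation between misses.
x, r, q = exact_mva([2.0, 5.0], think_time=20.0, n_customers=4)
```

Exact MVA is cheap for small populations; customized models like Sorin et al.'s instead iterate approximate versions of this recursion together with processor-side submodels.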

7 Sorin et al. MVA Model
+ Targets system design; answers questions such as
 + MSHR size, directory organization, NI latencies, etc.
+ Gives insight into application behavior
 + Miss rate, burstiness (CV), degree of parallelism (f_M)
– Some application parameters (the miss-rate parameter, f_M, f_sync-write) depend on architectural parameters
 – Most parameters are insensitive to changes outside the CPU/cache
 – But input parameters are needed for each CPU/cache configuration
 – Caches also interact with the system design (e.g., an update protocol)
– Fixed problem size; does not characterize the workload
Can we break the processor/cache black box into two submodels, one for the processor and one for the cache? What would the application input parameters be?

8 Cache Models (1)
Stack distance model
– Estimates capacity misses from a single access trace (one pass covers every cache size)
– Works for fully-associative caches whose replacement policy satisfies the inclusion property (e.g., LRU)
– Has extensions for direct-mapped and set-associative caches
Example: a typical access trace, A B B A C A
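A minimal sketch of the stack distance idea, assuming a fully-associative LRU cache and block-granularity references: one pass over the trace records each reference's depth in the LRU stack, and a reference then misses in a cache of C blocks exactly when its depth exceeds C (first touches count as infinitely deep). The function names are illustrative, not from any particular tool.

```python
import math
from typing import List, Sequence


def stack_distances(trace: Sequence[str]) -> List[float]:
    """1-indexed LRU stack distance of each reference; math.inf for a first touch."""
    stack: List[str] = []  # most recently used block at index 0
    dists: List[float] = []
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            dists.append(pos + 1)
            stack.pop(pos)
        else:
            dists.append(math.inf)  # cold (compulsory) reference
        stack.insert(0, block)  # block becomes most recently used
    return dists


def lru_misses(dists: Sequence[float], capacity: int) -> int:
    """Misses in a fully-associative LRU cache holding `capacity` blocks."""
    return sum(1 for d in dists if d > capacity)


d = stack_distances(list("ABBACA"))
print(d)  # [inf, inf, 1, 2, inf, 2]
for c in (1, 2, 3):
    print(c, lru_misses(d, c))  # 1 5, then 2 3, then 3 3
```

For this trace, a one-block cache misses 5 times, while two or more blocks leave only the 3 cold misses; the same distance list answers every cache size.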

9 Cache Models (2)
Agarwal et al. 1989
– Models cache block size, working-set transitions, conflict misses, and multiprogramming interference
Data Reference Model (Tsai/Agarwal 1993)
– Configuration-independent model for multiprocessors, with problem size, number of processors, and block size as parameters
– Models the sharing pattern of each shared block
– Assumes a certain data distribution for data-dependent applications (e.g., parallel quicksort)
– Limitation: only simple, iterative programs with well-known algorithms and no significant synchronization

10 Cache Models (3)
Mathematical Cache Miss Equations
– Compiler-generated equations for loop-based array accesses
– Model reuse along array dimensions with "reuse vectors"
– Extended to model pointer data structures: singly-linked lists and binary trees on a uniprocessor; requires understanding the malloc() implementation
– Ultimate aim is to model B-trees for databases
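The intuition behind reuse vectors can be checked on a toy case. This sketch does not build or solve Cache Miss Equations; it compares a hand-derived count for spatial reuse along j (reuse vector (0, 1)) with a brute-force direct-mapped cache simulation, and shows that the column-order loop suffers conflict misses that such equations must capture separately. The array size, block size, and set count are hypothetical, and the array is assumed to start on a block boundary.

```python
import math
from typing import Iterable


def simulate_direct_mapped(addresses: Iterable[int], num_sets: int, block_elems: int) -> int:
    """Count misses of a direct-mapped cache; addresses are array element indices."""
    tags = [None] * num_sets
    misses = 0
    for addr in addresses:
        block = addr // block_elems
        index = block % num_sets
        if tags[index] != block:
            tags[index] = block  # fill on miss, evicting whatever was there
            misses += 1
    return misses


def predicted_row_major_misses(n: int, block_elems: int) -> int:
    # Spatial reuse along j: each block is touched on consecutive iterations,
    # so only its first access misses (block-aligned, row-major n x n array).
    return math.ceil(n * n / block_elems)


N, B, SETS = 16, 4, 8  # hypothetical sizes
row_order = [i * N + j for i in range(N) for j in range(N)]  # for i: for j: A[i][j]
col_order = [i * N + j for j in range(N) for i in range(N)]  # for j: for i: A[i][j]

print(predicted_row_major_misses(N, B), simulate_direct_mapped(row_order, SETS, B))  # 64 64
print(simulate_direct_mapped(col_order, SETS, B))  # 256: every access conflicts
```

In column order, all 16 rows of a column map to just two sets, so every reuse is destroyed by a conflict before it happens.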

11 Architects' Workload Characterization
Observe across different configurations
– Busy/stall time breakdown
– Kernel/user time breakdown
– Miss breakdown (4C: compulsory, capacity, conflict, coherence)
– Last-touch prediction
Observe across different problem sizes
– Working set and working-set transitions
– Sharing degree and patterns (producer-consumer, migratory)
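A miss breakdown like the one above is commonly obtained by running a fully-associative LRU "shadow" cache of equal capacity beside the real cache. The sketch below classifies the misses of a set-associative LRU cache into compulsory, capacity, and conflict misses for a single processor's block trace; it omits the fourth C (coherence), which needs a multiprocessor trace and an invalidation protocol. Block addresses and cache geometry are hypothetical inputs.

```python
from collections import OrderedDict
from typing import Dict, Iterable


def classify_misses(blocks: Iterable[int], num_sets: int, ways: int) -> Dict[str, int]:
    """3C miss classification for a set-associative LRU cache (coherence omitted)."""
    capacity = num_sets * ways
    seen = set()
    shadow: "OrderedDict[int, None]" = OrderedDict()  # fully-associative LRU, same size
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set LRU order, oldest first
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for b in blocks:
        # Update the shadow fully-associative cache.
        shadow_hit = b in shadow
        if shadow_hit:
            shadow.move_to_end(b)
        else:
            shadow[b] = None
            if len(shadow) > capacity:
                shadow.popitem(last=False)  # evict LRU block
        # Access the real cache.
        s = sets[b % num_sets]
        if b in s:
            s.move_to_end(b)
            counts["hit"] += 1
        else:
            s[b] = None
            if len(s) > ways:
                s.popitem(last=False)
            if b not in seen:
                counts["compulsory"] += 1  # first touch ever
            elif not shadow_hit:
                counts["capacity"] += 1  # even a fully-associative cache misses
            else:
                counts["conflict"] += 1  # only the mapping restriction caused it
        seen.add(b)
    return counts


# Two blocks that collide in a 2-set direct-mapped cache:
print(classify_misses([0, 2, 0, 2], num_sets=2, ways=1))
```

Re-running the same trace with different `num_sets`/`ways` values gives the per-configuration breakdown the slide describes.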

12 What Tools Do We Need
Application models for commercial workloads
– What to model? (working set, sharing, communication, etc.)
– Include problem size as an input parameter
– Configuration-independent (or less dependent)
– Algorithm-based (needs source code)
– Or observation-based (from simulations)
Architectural models
– Separate the processor core and the caches
– Separate the CPU and the rest of the system [Sorin et al.]
Model vs. simulation
– Analytical models to simplify simulator design [CAECW 01]
– Simulators to ease the acquisition of model parameters

13 Configuration-Independent Analysis
What to characterize? [Abandah/Davidson]
– General characteristics
– Working set (access age, footprint)
– Concurrency (serial / imbalance / contention / busy)
– Communication pattern (sharing degree / invalidation degree)
– Communication phases and locality, sharing behavior
– Possible parameters for workload characterization
An example: working-set sizes of DSS systems
– Application parameters (for each node i in the query plan): N_i = number of tuples in a scan; H_i = probability that a tuple matches; QD = depth of the query tree; DB_RE_i = fraction of a relation accessed
– Model the reuse after working-set transitions (instructions, private data, metadata, index, tuple locks, tuples)

14 A (Simplistic?) Model for TPC-C
– Use the stack distance curve to derive miss rates
– Assume L1 cache accesses are totally overlapped with execution
– Use an M/G/1 queue to model bus/memory contention
Things not modeled
– Query algorithms
– Communication misses
– Overlap between computation and memory accesses
The paper reports errors below 10% [Zhang et al. 99].
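The M/G/1 contention term can be computed with the Pollaczek-Khinchine formula, W_q = λE[S²] / (2(1 - ρ)) with ρ = λE[S] and E[S²] = E[S]²(1 + CV²). This is a minimal sketch; how Zhang et al. parameterized their queue is not stated here, so the arrival rate, service time, and CV in the usage line are hypothetical.

```python
def mg1_mean_wait(arrival_rate: float, mean_service: float, cv_service: float) -> float:
    """Mean queueing delay (excluding service) at an M/G/1 server, Pollaczek-Khinchine."""
    rho = arrival_rate * mean_service  # server utilization
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    second_moment = mean_service ** 2 * (1.0 + cv_service ** 2)  # E[S^2]
    return arrival_rate * second_moment / (2.0 * (1.0 - rho))


def memory_response_time(arrival_rate: float, mean_service: float, cv_service: float) -> float:
    """Mean latency seen by a miss: queueing delay plus its own service time."""
    return mg1_mean_wait(arrival_rate, mean_service, cv_service) + mean_service


# Hypothetical: 4 CPUs, each missing once per 100 cycles (lambda = 0.04 per cycle),
# with a constant 10-cycle bus/memory service time (CV = 0, i.e. M/D/1).
print(round(mg1_mean_wait(0.04, 10.0, 0.0), 2))  # 3.33
```

With CV = 1 the formula reduces to the M/M/1 result ρE[S]/(1 - ρ), so the CV parameter directly controls how much service-time variability inflates contention.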

15 Conclusion
Analytical models are needed to
– Characterize commercial workloads
– Predict their performance on multiprocessors
We need models that
– Perform configuration-independent analysis
– Can use the output of workload models

16 Thank You! Questions?

17 Backup Slides: References, Acknowledgement

18 References
Cache Models
– An Analytical Cache Model, Agarwal et al., ACM Transactions on Computer Systems, 1989
– Analyzing Multiprocessor Cache Behavior Through Data Reference Modeling, Tsai and Agarwal, SIGMETRICS 1993
– An Analytical Model for Designing Memory Hierarchies, Jacob et al., IEEE Transactions on Computers, 1996
– Cache Miss Equations: A Compiler Framework for Analyzing and Tuning Memory Behavior, Ghosh et al., ACM Transactions on Programming Languages and Systems, 1999
– A Mathematical Cache Miss Analysis for Pointer Data Structures, Zhang and Martonosi, SIAM
Commercial Workloads Overview
– Trends in Shared Memory Multiprocessing, Stenstrom et al., IEEE Computer, 1997
– Memory System Characterization of Commercial Workloads, Barroso et al., ISCA 1998

19 References (cont.)
Configuration-Independent Analysis
– Configuration Independent Analysis for Characterizing Shared-Memory Applications, Abandah and Davidson, UMich TR, 1997
Shared-Memory Multiprocessor Models
– Analytical Evaluation of Shared-Memory Systems with ILP Processors, Sorin et al., ISCA 1998
– A Customized MVA Model for Shared-Memory Systems with Heterogeneous Applications, Sorin et al., UWisc TR, 2000
Commercial-Workload-Specific Models
– An Analytical Model of the Working-Set Sizes in Decision-Support Systems, Karlsson et al., SIGMETRICS 2000
– Analysis of Commercial Workload on SMP Multiprocessors, Zhang et al., Proceedings of Performance 99
Evaluation of Commercial Workloads
– A Processor Queueing Simulation Model for Multiprocessor System Performance Analysis, Tsuei and Yamamoto, CAECW 2001
– Evaluating the Non-determinism in Commercial Workloads, Multifacet group, CAECW 2001
