Published by Claribel Fowler. Modified over 9 years ago.
1. Pre-Silicon Simulation of Multi-Core Benchmarks
Shubu Mukherjee
Principal Engineer
Director, SPEARS Group
Intel Corporation
Panel in Symposium on Workload Characterization, Sep 27, 2007
2. Detailed Model Good for Core Analysis
- Single-core simulation model executes ~12 milliseconds of a real machine’s execution
  - Assumes core speed = 1 KIPS (kilo simulated instructions per second)
  - Assumes each simulation run is about 10 hours
[Figure: Core, Uncore, Socket]
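
The 12 ms figure follows from the two stated assumptions once a native instruction rate is fixed. The sketch below works through that arithmetic. The native rate of 3 billion instructions per second is an assumption, not a number from the slides; it was chosen because it reproduces the quoted ~12 ms. The function names are hypothetical.

```python
def simulated_instructions(kips: float, hours: float) -> float:
    """Instructions simulated in one run at `kips` kilo-instructions per second."""
    return kips * 1_000 * hours * 3_600


def real_time_covered_ms(kips: float, hours: float, native_ips: float) -> float:
    """Milliseconds of real-machine execution covered by one simulation run.

    native_ips is the real machine's instruction rate. It is an ASSUMPTION
    supplied by the caller; the slides do not state it.
    """
    return simulated_instructions(kips, hours) * 1_000 / native_ips


# Slide assumptions: 1 KIPS simulator, 10-hour run.
insts = simulated_instructions(kips=1, hours=10)

# Assumed native rate: 3e9 instructions per second (not from the slides).
covered_ms = real_time_covered_ms(kips=1, hours=10, native_ips=3e9)

print(f"{insts:,.0f} simulated instructions, about {covered_ms:.0f} ms of real time")
```

Under these assumptions a 10-hour run simulates 36 million instructions. Any change in the assumed native rate scales the covered time inversely.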
3. Four-Socket Platform Model Too Slow
- 1-socket simulation model executes ~1-3 milliseconds of a real machine’s execution
- 4-socket simulation model executes only 100s of microseconds of a real machine’s execution (recall disk latency is in milliseconds)
- Need at least a 10x Boost in Platform Performance Model Speed
4. What Does a 10x Speed Improvement Give Us?
- Improved Accuracy
  - Via greater coverage of benchmark slices
  - Better glassjaw analysis
- Faster Turnaround
  - Improved Latency
  - Faster debugging
- Improved Benchmarking
  - Greater coverage of benchmarks
  - Enables multithreaded (cooperative) benchmarks
5. Approaches to Boost Simulation Speed (one key charter for SPEARS)
- Improve Basic Infrastructure
- Create Faster Core Models That are Less Accurate
- Go Parallel in a Modular Fashion
- Use Accelerators, such as FPGAs
6. What’s Novel Here?
- Parallel Simulation is an Old Technology
  - Distributed, discrete-event simulation, Fujimoto, 1990
  - Wisconsin Wind Tunnel I + II, Reinhardt, et al. 1992 & Mukherjee, et al. 1997
  - Customized for specific applications (e.g., shared memory)
- So, What Are the Challenges?
  - Starting point is several million lines of non-parallel C++ code (!)
  - This is production software that must be stable (unlike “research” software)
  - Parallel infrastructure must be modular, built once, and used repeatedly without changing any architecture model code
  - Deal with new problems: load imbalance at multiple levels
- Current Status: Created infrastructure, Work-In-Progress
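
To make the idea of a modular parallel layer concrete, here is a minimal sketch of one generic scheme: one thread per socket, with all threads synchronizing at a barrier after each fixed quantum of simulated cycles. This is not the SPEARS infrastructure, whose design the slides do not describe. `SocketModel`, `run_parallel`, and the quantum parameters are hypothetical names introduced for illustration. The sketch shows structure only; CPython threads will not give a real speedup for CPU-bound work.

```python
import threading


class SocketModel:
    """Hypothetical stand-in for an existing, unmodified single-socket model."""

    def __init__(self, socket_id: int) -> None:
        self.socket_id = socket_id
        self.cycle = 0

    def step(self) -> None:
        # A real model would advance core and uncore state here.
        self.cycle += 1


def run_parallel(num_sockets: int, quanta: int, quantum_cycles: int) -> list[int]:
    """Run one thread per socket, synchronizing after every quantum."""
    models = [SocketModel(i) for i in range(num_sockets)]
    barrier = threading.Barrier(num_sockets)

    def worker(model: SocketModel) -> None:
        for _ in range(quanta):
            for _ in range(quantum_cycles):
                model.step()
            # No socket starts the next quantum until all have finished this one.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(m,)) for m in models]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [m.cycle for m in models]


final_cycles = run_parallel(num_sockets=4, quanta=3, quantum_cycles=100)
print(final_cycles)
```

The synchronization lives entirely in `worker`, so the model class needs no changes, which is the modularity property the slide asks for. The barrier also shows where load imbalance bites: every quantum takes as long as the slowest socket.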
7. Speedup of the Pthread-per-socket Model (on Clovertowns)
- Speedup scales linearly with problem size
- A LOT more room for improvement exists