System Simulation of 1000-Core Heterogeneous SoCs. Shivani Raghav, Embedded System Laboratory (ESL), Ecole Polytechnique Federale de Lausanne (EPFL)
ESL Work on Energy-Aware Datacenter Design: system simulation for many-core
Emerging Data-Intensive Workloads: cloud servers; molecular dynamics; Monte Carlo simulations; gene sequencing; online gaming services; financial simulations; medical imaging.
Demand for Hardware Acceleration: tile-based manycores (Intel SCC, Tile 64; integrated); GPU clusters (off-chip accelerators); hybrid cores (AMD Fusion; on-chip).
Urgent Need for Simulation of Heterogeneous SoCs. Simulation is needed for: thermal and power evaluations; benchmarking; profiling; debugging; design space exploration; early software development.
How to Design a Fast and Scalable Many-Core Simulator? Parallel target; parallel simulator; parallel host.
Simulating a Parallel Target on a Parallel Host Is an Old Technology… Figure labels: FPGA; GPGPU; Flexus; RAMP; Opportunity; WWT II; Graphite; Cotson, OVPSim; Large Parallel Systems.
Target Architecture: data-parallel coprocessors; simple in-order cores; 1000s of cores in a tile network; fine-grained parallelism. Figure labels: core; caches; memory; switch.
Solution – Accelerating Simulation Using GPGPUs: target architecture and host platform are a perfect match.
Outline. Problem Overview: simulation of heterogeneous SoCs. Solution: SIMinG-1k (GPU-accelerated simulator). Evaluation. Summary.
Overall Simulation Framework. Figure labels: host platform; sequential code; data-parallel code; simulator; target architecture; general-purpose CPU (twice); many-core accelerator; application.
SIMinG-1k Features: instruction accurate; inexpensive and easily available; fast development cycle; equation performance model; portability (target independent); interpretation-based core simulation.
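To make "interpretation-based core simulation" concrete, here is a minimal sketch of an instruction-accurate interpreter stepping several simple in-order cores in lockstep. The toy ISA (`addi`, `add`, `bnez`, `halt`), the `Core` class, and the `make_cores` and `run` helpers are all hypothetical names invented for illustration; they are not SIMinG-1k's ISA or implementation, and the sketch runs sequentially on a CPU rather than on a GPU.

```python
from dataclasses import dataclass, field

# Hypothetical toy program (assumption, not the ARM ISA used in the talk).
PROGRAM = [
    ("addi", 1, 1, -1),  # r1 = r1 - 1        (loop counter)
    ("add", 2, 2, 0),    # r2 = r2 + r0       (accumulate the core id)
    ("bnez", 1, -2),     # if r1 != 0: pc += -2 (back to instruction 0)
    ("halt",),
]


@dataclass
class Core:
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])
    pc: int = 0
    halted: bool = False
    retired: int = 0  # instructions executed so far


def step(core, program):
    """Fetch, decode, and execute one instruction on one core."""
    if core.halted:
        return
    inst = program[core.pc]
    op = inst[0]
    next_pc = core.pc + 1
    if op == "addi":
        _, rd, rs, imm = inst
        core.regs[rd] = core.regs[rs] + imm
    elif op == "add":
        _, rd, rs, rt = inst
        core.regs[rd] = core.regs[rs] + core.regs[rt]
    elif op == "bnez":
        _, rs, offset = inst
        if core.regs[rs] != 0:
            next_pc = core.pc + offset
    elif op == "halt":
        core.halted = True
    else:
        raise ValueError(f"unknown opcode: {op}")
    core.pc = next_pc
    core.retired += 1


def make_cores(n, iterations=3):
    """Create n cores; r0 holds the core id, r1 the loop count."""
    return [Core(regs=[i, iterations, 0, 0]) for i in range(n)]


def run(cores, program, max_steps=10_000):
    """Step every core once per round until all cores have halted."""
    for _ in range(max_steps):
        if all(c.halted for c in cores):
            break
        for c in cores:
            step(c, program)
    return sum(c.retired for c in cores)
```

In a GPU-hosted simulator the inner `for c in cores` loop is the part that would map onto parallel threads, one simulated core per thread; that mapping is an assumption about the general approach rather than a description of the actual tool.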
Challenges of Using a GPU as a Host: SIMT (single instruction, multiple threads) execution, so divergent code is a problem; synchronization outside a thread block; slow CPU-GPU communication; global memory is slow and limited.
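The divergence problem can be illustrated with a deliberately simplified cost model: threads in a warp that execute different code paths are serialized, so a warp needs roughly one pass per distinct path. The function name `warp_passes`, the use of opcode strings as a stand-in for code paths, and the default warp size of 32 (the NVIDIA warp size) are assumptions of this sketch, not measurements from the talk.

```python
def warp_passes(opcodes, warp_size=32):
    """Count serialized passes under a simplified SIMT model.

    opcodes[i] is the operation that simulated core i (mapped to
    GPU thread i) wants to execute in this step. Each warp needs
    one pass per distinct operation among its threads.
    """
    total = 0
    for start in range(0, len(opcodes), warp_size):
        warp = opcodes[start:start + warp_size]
        total += len(set(warp))
    return total


# 64 simulated cores, all executing the same operation: 2 warps, 2 passes.
uniform = ["add"] * 64

# Neighbouring cores alternate between two operations: every warp diverges.
interleaved = ["add", "load"] * 32

# Same mix of operations, but grouped so that each warp is uniform.
grouped = ["add"] * 32 + ["load"] * 32
```

Under this model `warp_passes(uniform)` and `warp_passes(grouped)` both give 2, while `warp_passes(interleaved)` gives 4, which is why the assignment of simulated cores to threads matters when the cores run different instructions.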
Results – Architecture 1: single tile of the target accelerator (ARM ISA, instruction scratchpad, data scratchpad). Metric: MIPS, the number of simulated instructions (in millions) per second of host wall-clock time.
Speedup – Architecture 1: speedup compared to simulation on OVPSim (thousands of ARM cores).
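The two metrics used in the results, simulation MIPS and speedup over a baseline simulator, reduce to simple ratios. The sketch below spells them out; the function names are hypothetical and the numbers in the usage note are made-up placeholders, not results from the talk.

```python
def simulated_mips(simulated_instructions, host_wall_seconds):
    """Millions of simulated instructions per second of host wall-clock time."""
    if host_wall_seconds <= 0:
        raise ValueError("host_wall_seconds must be positive")
    return simulated_instructions / host_wall_seconds / 1e6


def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline run."""
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / accelerated_seconds
```

For example, with placeholder values, `simulated_mips(50_000_000, 2.0)` evaluates to 25.0 and `speedup(120.0, 4.0)` evaluates to 30.0. Both runs in a speedup comparison should simulate the same workload on the same target configuration.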
Results – Architecture 2: single tile of the data-parallel accelerator (cores, caches, on-chip interconnect). Figure labels: core; caches; memory; switch.
Speedup – Architecture 2: speedup compared to serial simulation on QEMU.
Conclusion. Challenge: a fast and parallel simulator for heterogeneous SoCs. Solution: parallelize 1000-core simulation using GPUs. Design: full-system simulation using QEMU and SIMinG-1k. Results: high scalability and speedup up to 4096 cores. Future work: extend the simulator for thermal and power evaluations; complete simulation of cloud data centers.
Thanks! Questions?