GPU Power Model
Nandhini Sudarsanan, Nathan Vanderby, Neeraj Mishra, Usha Vinodh, Chi Xu
Outline Introduction and Motivation Analytical Model Description Experiment Setup Results Conclusion and Further Work
Introduction
Motivation
Outline Introduction and Motivation Analytical Model Description o Parser o Power Model Experiment Setup Results Conclusion and Further Work
Parser
Power Model PTX Level
Power Model Assembly Level
Experiment Setup - Hardware
Measure power consumption and temperature
  o Current clamps on the PCIe and GPU power cable, data acquisition at 100 Hz
  o GPU performance counters
  o GPU sensor, sampled at 10 Hz
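The current-clamp readings have to be turned into power before they can be compared with a model. A minimal sketch of that conversion follows. It assumes the clamped supply is a 12 V rail and uses the 100 Hz acquisition rate from the slide. The function names and the voltage default are illustrative and not taken from the original setup.

```python
def to_power_watts(currents_a, rail_voltage_v=12.0):
    """Convert clamp current samples (amperes) to power samples (watts).

    rail_voltage_v is an assumed constant rail voltage, not a measured value.
    """
    return [rail_voltage_v * i for i in currents_a]


def mean_power_watts(powers_w):
    """Average power over the run."""
    if not powers_w:
        raise ValueError("no samples")
    return sum(powers_w) / len(powers_w)


def energy_joules(powers_w, sample_rate_hz=100.0):
    """Integrate power over time with the trapezoidal rule."""
    dt = 1.0 / sample_rate_hz
    return sum(0.5 * (a + b) * dt for a, b in zip(powers_w, powers_w[1:]))
```

If the PCIe and power-cable channels are logged separately, the total board power would be the sample-by-sample sum of the two power traces.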
Experiment Setup - Software
Driver API
Generate and modify PTX code
  o Minimize control loops
CUDA 4.0
  o Built-in binary -> assembly converter (cuobjdump)
MATLAB to build the model
Remote login
CUDA - Fermi Architecture
Third-generation Streaming Multiprocessor (SM)
  o 32 CUDA cores per SM, 4x over GT200
  o 1024-thread block size, 2x over GT200
  o Unified address space enables full C++ support
  o Improved memory subsystem
Benchmarks
Small number of overhead operations (loop counters, initialization, etc.).
Computationally intensive work, so that each experiment runs long enough for an accurate current measurement.
High utilization of the CUDA cores, with as few data hazards as possible.
Grid and block sizes chosen so that all SMs are used, since idle SMs still leak power.
Accordingly, 7 benchmarks were selected from the CUDA SDK.
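One way to meet the "all SMs are used" criterion is to round the number of blocks up to a multiple of the SM count. The sketch below is an illustration under assumptions. The helper name and the SM count in the usage note are hypothetical, and the slides do not state which sizing rule was actually used. The 1024-thread limit is the Fermi block-size limit quoted earlier.

```python
def grid_size(num_elements, threads_per_block, num_sms):
    """Return a block count that covers num_elements and is a multiple of num_sms.

    Rounding up launches surplus threads, so the kernel must bounds-check
    its global index against num_elements.
    """
    if not 1 <= threads_per_block <= 1024:  # Fermi per-block thread limit
        raise ValueError("threads_per_block must be in 1..1024")
    if num_elements <= 0 or num_sms <= 0:
        raise ValueError("num_elements and num_sms must be positive")
    # Blocks needed to cover every element (ceiling division).
    blocks = -(-num_elements // threads_per_block)
    # Round up to the next multiple of the SM count.
    return -(-blocks // num_sms) * num_sms
```

For example, `grid_size(1_000_000, 256, 14)` gives 3920 blocks, which is 280 blocks for each of 14 SMs (an assumed SM count).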
Benchmarks
The seven benchmarks tested in this project:
  o 2D Convolution
  o Matrix Multiplication
  o Vector Addition
  o Vector Reduction
  o Scalar Product
  o DCT 8x8
  o 3DFD
Limitations of PTX
Higher level than assembly
  o Divide and sqrt: one PTX line, but a library routine in assembly
Compiler optimizations are applied between PTX and assembly
Does not reflect RAW dependencies
Performance counters operate at the assembly level
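The first limitation can be seen in a simple PTX-level instruction count. The following is a hypothetical minimal counter, not the parser used in this project. It tallies base opcodes only and ignores type and state-space suffixes. The embedded PTX fragment is made up for illustration.

```python
import re
from collections import Counter

# Optional guard predicate (e.g. "@%p1" or "@!%p1"), then the base opcode.
_OPCODE = re.compile(r"^(?:@!?%\w+\s+)?([a-z][a-z0-9]*)")


def count_ptx_opcodes(ptx_text):
    """Count base opcodes in PTX text, skipping directives, labels, and braces."""
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(".") or line.endswith(":"):
            continue  # blank line, directive (.reg, .entry, ...), or label
        match = _OPCODE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE_PTX = """
.visible .entry k(.param .u64 p)
{
    .reg .f32 %f<5>;
    ld.global.f32 %f1, [%rd1];
    div.rn.f32 %f3, %f1, %f2;   // one PTX line
    sqrt.rn.f32 %f4, %f3;
    @%p1 bra BB0_2;
BB0_2:
    st.global.f32 [%rd2], %f4;
    ret;
}
"""
```

Here `count_ptx_opcodes(SAMPLE_PTX)` reports a single `div` and a single `sqrt`, although each may expand into many machine instructions in the assembly that `cuobjdump` shows.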
Results
Conclusion
Further Work
  o Take context switches into account
  o Consider multiple kernels running simultaneously
The End Thanks Q&A