Published by Dora Stanley. Modified over 8 years ago.
GPU Power Model
Nandhini Sudarsanan sudar003@umn.edu
Nathan Vanderby vande501@umn.edu
Neeraj Mishra mish0088@umn.edu
Usha Vinodh kuma0253@un.edu
Chi Xu xuchi@umn.edu
Outline
Introduction and Motivation
Analytical Model Description
Experiment Setup
Results
Conclusion and Further Work
Introduction
Motivation
Outline
Introduction and Motivation
Analytical Model Description
  o Parser
  o Power Model
Experiment Setup
Results
Conclusion and Further Work
Parser
Power Model PTX Level
Power Model Assembly Level
Experiment Setup - Hardware
Measure Power Consumption and Temperature
  o Current Clamp for PCIE & GPU Power Cable
Data Acquisition Card @ 100Hz
  o GPU Performance Counter
  o Sample Temperature @ 10Hz, GPU sensor
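The slides do not show how the clamp readings become power figures. A minimal sketch of one way to do it, assuming the clamp reports current in amperes on a nominal 12 V rail (the rail voltage and all function names here are assumptions, not taken from the slides):

```python
SAMPLE_RATE_HZ = 100.0  # acquisition rate stated on the slide
RAIL_VOLTAGE_V = 12.0   # assumption: nominal 12 V supply rail


def power_trace(currents_a, voltage_v=RAIL_VOLTAGE_V):
    """Convert current samples (A) into instantaneous power samples (W)."""
    return [voltage_v * i for i in currents_a]


def average_power(powers_w):
    """Mean power (W) over the sampled window."""
    return sum(powers_w) / len(powers_w)


def energy_joules(powers_w, sample_rate_hz=SAMPLE_RATE_HZ):
    """Integrate power over time with the trapezoidal rule."""
    dt = 1.0 / sample_rate_hz  # seconds between consecutive samples
    return sum(0.5 * (a + b) * dt for a, b in zip(powers_w, powers_w[1:]))
```

At 100 Hz each sample covers 10 ms, so a kernel has to run for many sample periods before the average is meaningful. This is consistent with the benchmark requirement of long-running, computationally intensive work.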
Experiment Setup - Software
Driver API
Generate and Modify PTX code
  o Minimize control loops
CUDA 4.0
  o Built-in Binary -> Assembly Converter (cuobjdump)
MATLAB to build model
Remote login
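The parser itself is not shown in the slides. As a rough illustration only, the sketch below assumes the parser's job is to tally how often each base opcode appears in a PTX listing. The function name and the simplified line handling are hypothetical, and instructions that span several lines are not handled:

```python
from collections import Counter


def count_ptx_opcodes(ptx_text):
    """Count base opcodes (e.g. 'add' for 'add.f32') in PTX source text."""
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("@"):
            # Strip a predicate guard such as '@%p1' or '@!%p1'.
            parts = line.split(None, 1)
            if len(parts) < 2:
                continue
            line = parts[1]
        # Skip labels ('BB0_1:'), directives ('.reg'), braces and parentheses.
        if line.endswith(":") or not line[0].isalpha():
            continue
        token = line.split()[0].rstrip(";")
        counts[token.split(".", 1)[0]] += 1
    return counts
```

Counts like these could serve as inputs to a per-instruction power model, whether taken from the PTX or from the assembly that cuobjdump produces.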
CUDA - Fermi Architecture
Third Generation Streaming Multiprocessor (SM)
  o 32 CUDA cores per SM, 4x over GT200
  o 1024 thread block size, 2x over GT200
  o Unified address space enables full C++ support
  o Improved Memory Subsystem
Benchmarks
Selection criteria:
  o Small number of overhead operations (loop counters, initialization, etc.)
  o Computationally intensive work, so that each experiment runs long enough for accurate current measurement
  o High utilization of the CUDA cores, with as few data hazards as possible
  o Grid and block sizes chosen so that all SMs are used, since idle SMs still leak power
Accordingly, 7 benchmarks were selected from the CUDA SDK.
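The slides do not say how grid sizes were picked. One simple way to meet the "all SMs are used" criterion is to round the block count up to a whole multiple of the SM count. The helper below is a hypothetical sketch of that idea, with `num_sms` supplied by the caller (on a real device it can be queried, for example through the `multiProcessorCount` device property):

```python
def grid_size_covering_all_sms(total_threads, threads_per_block, num_sms):
    """Return a block count that is a whole multiple of the SM count."""
    # Ceiling division: blocks needed to cover every thread.
    blocks_needed = -(-total_threads // threads_per_block)
    # Number of full rounds of blocks across all SMs, at least one.
    waves = max(1, -(-blocks_needed // num_sms))
    return waves * num_sms
```

Because the result can exceed the number of blocks the data strictly needs, the kernel must bounds-check its thread index before touching memory.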
Benchmarks
For this project we tested the following benchmarks:
  o 2D Convolution
  o Matrix Multiplication
  o Vector Addition
  o Vector Reduction
  o Scalar Product
  o DCT 8x8
  o 3DFD
Limitations of PTX
Higher level than assembly
  o Divide & Sqrt: 1 PTX line, library in assembly
Compiler optimizations from PTX -> assembly
Doesn't reflect RAW dependencies
Performance counters use assembly
Results
Conclusion
Further Work
  o Take into account context switches
  o Consider multiple kernels running simultaneously
The End Thanks Q&A