Published by Ophelia Jocelin Phelps. Modified over 9 years ago.
1
GPU Power Model
Nandhini Sudarsanan, sudar003@umn.edu
Nathan Vanderby, vande501@umn.edu
Neeraj Mishra, mish0088@umn.edu
Usha Vinodh, kuma0253@un.edu
Chi Xu, xuchi@umn.edu
2
Outline
- Introduction and Motivation
- Analytical Model Description
- Experiment Setup
- Results
- Conclusion and Further Work
CSCI 8205: GPU Power Model, 5/4/11
3
Introduction
- Develop a methodology for building an accurate power model for a GPU.
- Validate it with an NVIDIA GTX 480 GPU.
- Measure the power efficiency of various NVIDIA SDK benchmarks.
- An accurate power model can help:
  - Explore various architectural and algorithmic trade-offs.
  - Figure out the balance of workload between GPU and CPU.
4
Motivation
- Power consumption: a key criterion for future hardware devices and embedded software.
- The effect of increased power density has not been felt until now:
  - Supply voltage was scaled back too.
  - Current and power density remained constant.
- Further reduction in supply voltage will be difficult in the future:
  - Supply voltage is approaching the threshold voltage.
  - Gate oxide thickness is almost equal to 1 nm.
5
Motivation
6
GPU Processing Power
7
Price of Power
- Maximum load = a lot of power
- NVIDIA 8800 GTX: 137 W
- Intel Xeon LS5400: 50 W
8
Power Wall
- Power density in GPUs is larger than in even high-end CPUs.
- Power gating and clock gating have been successfully employed in CPUs [Brooks, HPCA 2001].
- Power gating, clock gating, and other hardware-based schemes are not used in most GPUs [Kim, ISCA 2010].
- An accurate power model can help:
  - Explore various architectural and algorithmic trade-offs.
  - Figure out the balance of workload between GPU and CPU.
9
Background
- Power consumption can be divided into:
  Power = Dynamic_power + Static_power + Short_Ckt_Power
- Dynamic power is determined by run-time events:
  - Fixed-function units: texture filtering and rasterization
  - Programmable units: memory and floating point
- Static power is determined by circuit technology, chip layout, and operating temperature:
  P = V_CC * N * K_design * I_leak
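As a rough numerical illustration of the two expressions on this slide, the Python sketch below evaluates the static power term and sums the three components. It assumes the common reading of the symbols (N as the transistor count, K_design as a design-dependent factor, I_leak as a per-transistor leakage current); the function names and every numeric value are placeholders for illustration, not measurements of the GTX 480.

```python
def static_power(v_cc, n_transistors, k_design, i_leak):
    """Static power in watts: P = V_CC * N * K_design * I_leak."""
    return v_cc * n_transistors * k_design * i_leak


def total_power(dynamic, static, short_circuit):
    """Power = Dynamic_power + Static_power + Short_Ckt_Power (all in watts)."""
    return dynamic + static + short_circuit


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
p_static = static_power(v_cc=1.0, n_transistors=3.0e9, k_design=0.5, i_leak=2.0e-8)
p_total = total_power(dynamic=100.0, static=p_static, short_circuit=5.0)
print(f"static: {p_static:.1f} W, total: {p_total:.1f} W")
```

Because I_leak depends on temperature, a sketch like this would have to be re-evaluated at each operating temperature of interest.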
10
Previous Power Models
- Statistical power modeling approach for GPUs [Matsuoka 2010]
  - Uses 13 CUDA performance counters (ld, st, branch, TLB miss) to obtain a profile.
  - Finds the correlation between profiles and power by statistical model learning.
  - A lot of information that the counters do not capture is lost.
- Cycle-level simulation-based power model [Skadron HWWS'04]
  - Assumes a hypothetical architecture to explore new GPU microarchitectures and model power and leakage properties.
  - Cycle-level processor simulations are time consuming [Martonosi & Isci 2003].
  - They do not allow a complete view of operating system effects and I/O [Isci 2003].
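The counter-based statistical approach can be pictured as a regression from counter profiles to measured power. The sketch below is one possible reading, assuming an ordinary least-squares linear model solved through the normal equations; the cited work may use a different learner. The helper names and the two-counter synthetic data set are invented for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_power_model(profiles, power):
    """Least-squares fit of power ~ w0 + sum(w_i * counter_i)."""
    rows = [[1.0] + list(p) for p in profiles]  # prepend the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, power)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(weights, profile):
    """Predicted power in watts for one counter profile."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], profile))


# Synthetic data: two made-up counter rates per kernel and the "measured" power.
profiles = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 3.0)]
power = [100.0 + 20.0 * a + 5.0 * b for a, b in profiles]
weights = fit_power_model(profiles, power)
print([round(w, 3) for w in weights])
```

The intercept plays the role of the power that no counter explains, which is also where the information lost by the counters ends up.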
11
Outline
- Introduction and Motivation
- Analytical Model Description
- Parser
- Power Model
- Experiment Setup
- Results
- Conclusion and Further Work
12
Need for a Parser
- GPGPU-Sim is time consuming.
- GPGPU-Sim output is not tailored to our needs.
- The parser is very fast.
- GPGPU-Sim works only with CUDA 2.3 or earlier.
13
Limitations of the Parser
- Trip counts of dynamic loops are not determined automatically.
- Branches are assumed to be taken.
- Highly tailored to our specific needs.
- A change in the PTX layout might require a change to the parser.
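The slides do not show the parser itself. As a hedged illustration of what a static PTX instruction counter might look like, the sketch below tallies opcodes in a small hand-written PTX fragment. The regular expression, the function name `count_opcodes`, and the sample kernel are all assumptions made for this example, not the authors' implementation.

```python
import re
from collections import Counter

# Hand-written PTX fragment used only to exercise the sketch.
PTX_SAMPLE = """
.visible .entry vec_add(.param .u64 a)
{
    .reg .f32 %f<4>;
    ld.global.f32 %f1, [%rd1];   // load first operand
    ld.global.f32 %f2, [%rd2];
    add.f32 %f3, %f1, %f2;
    st.global.f32 [%rd3], %f3;
    @%p1 bra BB0_2;
BB0_2:
    ret;
}
"""

# Optional predicate guard, then the opcode, then dot-separated modifiers.
INSTR = re.compile(r"^(?:@!?%\w+\s+)?([a-z][a-z0-9]*)((?:\.\w+)*)\s*[^;]*;")


def count_opcodes(ptx_text):
    """Count each opcode once per static occurrence in the PTX text."""
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//")[0].strip()  # drop trailing comments
        # Skip blanks, directives (.reg, .entry), labels, and lone braces.
        if not line or line.startswith(".") or line.endswith(":") or line in "{}":
            continue
        match = INSTR.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = count_opcodes(PTX_SAMPLE)
print(dict(counts))
```

A static count like this weights every instruction once, so loop bodies would still need their trip counts supplied by hand, which matches the first limitation listed above.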
15
Power Model: PTX Level
16
Power Model: Assembly Level
18
Experiment Setup - Hardware
- Measure power consumption and temperature
  - Sample temperature at 10 Hz from the GPU sensor
  - Current clamp on the PCIe and GPU power cables
  - Data acquisition card at 100 Hz
- GPU performance counter profile
  - 57 counters per kernel
  - 9 executions
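To show how current-clamp samples might be turned into power and energy figures, the sketch below multiplies each current sample by a rail voltage and integrates over time at the 100 Hz acquisition rate. The 12 V rail value, the function names, and the constant 10 A trace are assumptions for illustration; the slides do not state how the samples were post-processed.

```python
SAMPLE_RATE_HZ = 100.0  # data acquisition rate given on the slide
RAIL_VOLTAGE = 12.0     # assumed supply rail voltage in volts


def power_trace(currents_a, voltage=RAIL_VOLTAGE):
    """Instantaneous power in watts for each current sample (P = V * I)."""
    return [voltage * i for i in currents_a]


def average_power(powers_w):
    """Mean power in watts over the whole trace."""
    return sum(powers_w) / len(powers_w)


def energy_joules(powers_w, rate_hz=SAMPLE_RATE_HZ):
    """Energy in joules, integrating the trace with the trapezoidal rule."""
    dt = 1.0 / rate_hz
    return sum(0.5 * (a + b) * dt for a, b in zip(powers_w, powers_w[1:]))


# Hypothetical trace: 101 samples span exactly 1 second at 100 Hz.
currents = [10.0] * 101
powers = power_trace(currents)
print(f"average: {average_power(powers):.1f} W, energy: {energy_joules(powers):.1f} J")
```

With a 100 Hz sampling rate a kernel has to run for many sample periods before its average is meaningful, which is one reason to prefer long-running benchmarks.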
19
Experiment Setup - Software
- Driver API
  - Generate and modify PTX code
  - Minimize control loops
- CUDA 4.0
  - Built-in binary -> assembly converter (cuobjdump)
- MATLAB to build the model
- Remote login
20
CUDA – Fermi Architecture
- Third-generation Streaming Multiprocessor (SM)
  - 32 CUDA cores per SM, 4x over GT200
  - 1024-thread block size, 2x over GT200
- Unified address space enables full C++ support
- Improved memory subsystem
21
CUDA – Fermi Architecture
Fermi Memory Hierarchy (diagram):
- SM 0 ... SM N: each with registers, L1 cache, and shared memory
- L2 cache
- Global memory
22
Benchmarks
- Small number of overhead operations (loop counters, initialization, etc.).
- Computationally intensive work, so that each experiment runs long enough for accurate current measurement.
- High utilization of the CUDA cores, with as few data hazards as possible.
- Grid and block sizes chosen so that all SMs are used, since idle SMs leak.
- Accordingly, 7 benchmarks were selected from the CUDA SDK.
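The grid and block size criterion can be checked with simple arithmetic before a benchmark is launched. The sketch below assumes a one-dimensional launch and the 15 SMs of a GTX 480 (480 CUDA cores at 32 per SM); the helper names are hypothetical, and having at least one block per SM is only a necessary condition, since block placement is left to the hardware scheduler.

```python
NUM_SMS = 15                  # GTX 480: 480 CUDA cores / 32 cores per SM
MAX_THREADS_PER_BLOCK = 1024  # Fermi thread block limit from the slides


def grid_size(n_elements, block_size):
    """Number of blocks needed so that every element gets one thread."""
    if not 0 < block_size <= MAX_THREADS_PER_BLOCK:
        raise ValueError("block size must be between 1 and 1024 threads")
    return (n_elements + block_size - 1) // block_size  # ceiling division


def can_occupy_all_sms(num_blocks, num_sms=NUM_SMS):
    """True if there are at least as many blocks as SMs."""
    return num_blocks >= num_sms


blocks = grid_size(n_elements=1 << 20, block_size=256)
print(blocks, can_occupy_all_sms(blocks))
```

A launch that fails this check leaves some SMs idle, and by the reasoning on the slide those SMs still contribute leakage power without doing useful work.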
23
Benchmarks
Our benchmarks:
- 2D convolution
- Matrix multiplication
- Vector addition
- Vector reduction
- Scalar product
- DCT 8x8
- 3DFD
24
Limitations of PTX
- Higher level than assembly
- Divide and sqrt: one PTX line each, but a library routine in assembly
- Compiler optimizations are applied from PTX -> assembly
- Doesn't reflect RAW dependencies
- Performance counters use assembly
26
Results
28
Conclusion and Further Work
Conclusion
Further Work
- Take context switches into account
- Consider multiple kernels running simultaneously
29
The End
Thanks
Q&A