Published by Sheldon Gowen. Modified over 9 years ago.
1
Cooperative Boosting: Needy versus Greedy Power Management
INDRANI PAUL (1,2), SRILATHA MANNE (1), MANISH ARORA (1,3), W. LLOYD BIRCHER (1), SUDHAKAR YALAMANCHILI (2)
JUNE 2013
(1) Advanced Micro Devices, Inc.
(2) Georgia Institute of Technology
(3) University of California, San Diego
2
COOPERATIVE BOOSTING: NEEDY VERSUS GREEDY POWER MANAGEMENT | JUNE, 2013
GOAL & OUTLINE
Goal:
– Optimize performance under power and thermal constraints in a heterogeneous architecture
Outline:
– State-of-the-Art Power and Thermal Management
– Thermal Coupling
– Performance Coupling
– Cooperative Boosting
– Results
3
STATE-OF-THE-ART POWER AND THERMAL MANAGEMENT
4
STATE-OF-THE-ART PROCESSOR
Accelerated processing unit (APU):
– Graphics processing unit (GPU): 384 AMD Radeon™ cores
– Multi-threaded CPU cores
– Shared Northbridge access to overlapping CPU-GPU physical address spaces
Many resources are shared between the CPU and GPU
– For example, memory hierarchy, power, and thermal capacity
5
PROGRAMMING MODEL
Coupled programming model
Offload compute-intensive tasks to the GPU
Each OpenCL kernel: a grid of threads (N-Dimensional Range), each operating over a data partition
[Figure: User Application, OpenCL™ Software Stack, Operating System, and APU Hardware (CPU, GPU), with Host Tasks and GPU Tasks]
6
WHAT IS THERMAL DESIGN POWER?
Thermal design power (TDP):
– Upper bound for the sustainable power draw
– Determines the cooling solution and package limits
– Usually set by determining the worst-case execution profile
Performance depends on effective utilization of thermal headroom
[Figure: instructions/cycle versus time; image source: www.legitreviews.com]
7
STATE-OF-THE-ART: BI-DIRECTIONAL APPLICATION POWER MANAGEMENT (BAPM)
Chip is divided into BAPM-controlled thermal entities (TEs): CU0 TE, CU1 TE, GPU TE
Power management algorithm:
1. Calculate digital estimate of power consumption
2. Convert power to temperature using an RC network model for heat transfer
3. Assign new power budgets to TEs based on temperature headroom
4. TEs locally control (boost) their own DVFS states
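Steps 2 and 3 of the algorithm above can be illustrated with a very small model. The sketch below is not AMD's implementation: it assumes one independent RC node per thermal entity (the slide describes an RC network, which would also couple the entities), and the names ThermalEntity, step_temperature, power_budget, and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThermalEntity:
    name: str
    r_thermal: float  # K/W, assumed thermal resistance to ambient
    c_thermal: float  # J/K, assumed thermal capacitance
    temp_c: float     # current temperature estimate in deg C


def step_temperature(te, power_w, ambient_c, dt_s):
    """Step 2: advance a one-node RC model by dt_s seconds (forward Euler).

    Models C * dT/dt = P - (T - T_ambient) / R.
    """
    heat_out_w = (te.temp_c - ambient_c) / te.r_thermal
    te.temp_c += (power_w - heat_out_w) / te.c_thermal * dt_s
    return te.temp_c


def power_budget(te, limit_c, ambient_c, horizon_s):
    """Step 3: power that would bring the TE to limit_c after one Euler
    step of length horizon_s. More headroom gives a larger budget; a TE
    already at the limit gets only the power it can dissipate.
    """
    headroom_c = limit_c - te.temp_c
    heat_out_w = (te.temp_c - ambient_c) / te.r_thermal
    return heat_out_w + te.c_thermal * headroom_c / horizon_s
```

In this simplified picture, step 4 would map each budget to the highest DVFS state whose estimated power fits under it. The horizon should be short relative to the R*C time constant, since the budget is a linear extrapolation.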
8
CURRENT BOOST ALGORITHMS: POWER VS. THERMAL MANAGEMENT
Convert thermal headroom to higher performance through boost
[Figure (labelled "3.0"): APU die temperature and APU performance versus time, showing thermal headroom, max die temp, HW boost states, and SW-visible states]
CPU DVFS states: HW only (boost): Pb0, Pb1; SW-visible: P0, P1, P2, ..., Pmin
GPU DVFS states: HW only: High, Medium, Low
9
KEY TAKEAWAYS
Power and thermals are shared resources in a heterogeneous processor: thermal coupling
Overall application performance is a function of both the CPU and the GPU: performance coupling
State of the practice: manage to thermal limits by locally boosting when thermal headroom is available (utilize all of the headroom!)
10
THERMAL COUPLING
11
THERMAL SIGNATURES: CPU & GPU
Steady-state thermal fields produced by BAPM on a 19W AMD Trinity APU
High-power GPU benchmark: sustained power 19.7 W
High-power CPU benchmark, idle GPU: sustained power 18.8 W
Higher thermal density of CPUs leads to steeper thermal gradients
Faster consumption of thermal headroom on the CPU
12
THERMAL TIME CONSTANT
Significant rise in temperature of the idle component due to thermal coupling and pollution from the active components within a die
CPU consumes thermal headroom more rapidly (4X faster)
GPU can sustain higher power boosts longer
Idle GPU temperature rose by ~20 °C
13
THERMAL COUPLING: THERMAL HEADROOM AVAILABILITY
14
THERMAL COUPLING: BOOST FOR CONSUMPTION OF THERMAL HEADROOM
6 °C rise in GPU temperature once the CPU power limit was removed and both CUs were allowed to boost
15
THERMAL COUPLING: THERMAL THROTTLING
Minimize detrimental effects of thermal coupling by capping the maximum CPU P-state (P-state limiting)
16
RESIDENCY IN DIFFERENT POWER STATES
[Figure panels: BAPM; capping the max CPU DVFS state at P2; capping the max CPU DVFS state at P4]
17
KEY TAKEAWAYS
Thermal signatures differ between the CPU and GPU
Heterogeneity in physical properties
High thermal density leads to faster consumption of thermal headroom in the CPU cores
Significant thermal coupling from active to idle components
Near the thermal limit, boosting based on available thermal headroom introduces inefficiencies
– Reduce the CPU P-state limit
18
PERFORMANCE COUPLING
19
CPU-GPU PERFORMANCE COUPLING
CPU should be just fast enough to keep the GPU fully utilized
P-state should be high enough
[Figure: User Application, OpenCL™ Software Stack, Operating System, and APU Hardware (CPU, GPU), with Host Tasks and GPU Tasks; each OpenCL kernel is a grid of threads (N-Dimensional Range), each operating over a data partition]
20
MANAGING THERMALS FOR PERFORMANCE-COUPLED APPLICATIONS
21
MANAGING THERMALS FOR PERFORMANCE-COUPLED APPLICATIONS
22
MANAGING THERMALS FOR PERFORMANCE-COUPLED APPLICATIONS
23
P-STATE SENSITIVITY
24
DETERMINING CRITICAL CPU P-STATE
Find the inflection point in performance as a function of CPU P-state: this is the critical P-state
Critical P-state is determined by interference (CPU vs. GPU) in the memory system
[Figure annotation: Critical CPU P-state Limit]
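One way to picture the search for the critical P-state is a simple knee finder over profiled data. This is only a sketch under assumptions: the slides do not give the detection rule, so the "within a tolerance of the best observed performance" criterion, the function name critical_p_state, and the sample numbers are all hypothetical.

```python
def critical_p_state(perf_by_state, tolerance=0.02):
    """Return the slowest P-state whose performance is within `tolerance`
    (relative) of the best performance observed at any P-state cap.

    perf_by_state: list of (p_state_name, performance) pairs ordered from
    the fastest P-state cap to the slowest. Higher performance is better.
    """
    best = max(perf for _, perf in perf_by_state)
    critical = None
    for name, perf in perf_by_state:
        # Later entries are slower states, so the last match wins.
        if perf >= best * (1.0 - tolerance):
            critical = name
    return critical


# Hypothetical profile: performance peaks below the fastest state because
# a hot CPU takes thermal headroom away from the GPU.
sample = [("P0", 0.95), ("P1", 0.97), ("P2", 1.00), ("P3", 0.99), ("P4", 0.80)]
print(critical_p_state(sample))  # prints P3
```

The profile is ordered fastest to slowest so that the loop naturally prefers the lowest-power state among those that perform about equally well.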
25
KEY TAKEAWAYS
Performance coupling – CPU-GPU performance dependency
Balance between detrimental effects of thermal coupling and needs of performance coupling
CPU critical P-state limit is determined by performance coupling and thermal coupling
GPU memory bandwidth gradients as a function of CPU frequency, along with CPU IPC, serve as a measure of performance coupling
26
COOPERATIVE BOOSTING ALGORITHM
27
COOPERATIVE BOOSTING (CB)
Overlaid on top of BAPM; invoked periodically when thermal coupling is detrimental, i.e., when the thermal limit is approached
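The slide text gives only the trigger condition, not the decision logic, so the following is a loose sketch of what one periodic CB decision could look like, not the authors' algorithm. It assumes the policy adjusts a cap on the SW-visible CPU P-state and uses the observed change in GPU memory bandwidth as the coupling signal; next_cpu_limit, margin_c, bw_threshold, and the numeric defaults are hypothetical.

```python
# SW-visible CPU P-states, fastest first
P_STATES = ["P0", "P1", "P2", "P3", "P4"]


def next_cpu_limit(current_limit, die_temp_c, temp_limit_c, gpu_bw_change,
                   margin_c=3.0, bw_threshold=0.05):
    """One periodic decision of a CB-like policy layered over BAPM.

    gpu_bw_change: relative change in GPU memory bandwidth measured since
    the CPU P-state cap was last lowered (negative means the GPU slowed).
    Returns the new fastest P-state the CPU is allowed to use.
    """
    idx = P_STATES.index(current_limit)
    if die_temp_c < temp_limit_c - margin_c:
        # Far from the thermal limit: leave the existing cap untouched.
        return current_limit
    if gpu_bw_change < -bw_threshold:
        # GPU throughput dropped noticeably: the CPU is "needy", so relax
        # the cap by one state.
        idx = max(idx - 1, 0)
    else:
        # GPU did not suffer: the CPU was "greedy", so tighten the cap and
        # leave more thermal headroom for the GPU.
        idx = min(idx + 1, len(P_STATES) - 1)
    return P_STATES[idx]
```

Called once per control interval, this walks the cap toward the slowest P-state that does not starve the GPU.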
28
EXPERIMENTAL SET-UP
Trinity A8-4555M APU: 19W TDP
CPU: Managed by HW or SW

                 P-state  Voltage (V)  Freq (MHz)
HW Only (Boost)  Pb0      1            2400
HW Only (Boost)  Pb1      0.875        1800
SW-Visible       P0       0.825        1600
SW-Visible       P1       0.812        1400
SW-Visible       P2       0.787        1300
SW-Visible       P3       0.762        1100
SW-Visible       P4       0.75         900

GPU: Managed by HW only
– GPU-high: 423 MHz
– GPU-med: 320 MHz
Cooperative Boosting implemented as a system software policy overlaid on top of BAPM in real hardware
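To get a feel for how much power each CPU P-state cap can free up, the voltage and frequency values on this slide can be plugged into the common CMOS approximation that dynamic power scales with V^2 * f. This is a rough sketch only: it ignores leakage and activity factor, and the helper name relative_dynamic_power is hypothetical.

```python
# (voltage in V, frequency in MHz) for each CPU P-state, from the slide
P_STATES = {
    "Pb0": (1.0, 2400),
    "Pb1": (0.875, 1800),
    "P0": (0.825, 1600),
    "P1": (0.812, 1400),
    "P2": (0.787, 1300),
    "P3": (0.762, 1100),
    "P4": (0.75, 900),
}


def relative_dynamic_power(state, reference="Pb0"):
    """Dynamic power of `state` relative to `reference`, assuming
    P_dyn is proportional to V^2 * f (leakage is not modelled)."""
    v, f = P_STATES[state]
    v_ref, f_ref = P_STATES[reference]
    return (v * v * f) / (v_ref * v_ref * f_ref)


for name in P_STATES:
    print(f"{name}: {relative_dynamic_power(name):.3f}")
```

Under this approximation P4 draws roughly 21% of the dynamic power of Pb0, which suggests why capping the CPU P-state leaves noticeably more of the shared budget for the GPU.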
29
BENCHMARKS

BM (Description)                   Problem Size                                   Type
NDL (Needleman-Wunsch)             4096x4096 data points, 1K iterations           Performance-coupled
HS (HotSpot)                       1024x1024 data points, 100K iterations         Performance-coupled
BF (BoxFilter SAT)                 1Kx1K input image, 6x6 filter, 10K iterations  Performance-coupled
FAH (Folding at Home)              Synthesis of large protein: spectrin$          Performance-coupled
BS (Binary Search)                 4096 inputs, 256 segments, 1M iterations       Performance-coupled
Viewdle (Haar facial recognition)  Image 1920x1080, 2K iterations                 Performance-coupled
Lbm (CPU2006)                      4 threads, Ref input                           CPU-centric
Gcc (CPU2006)                      4 threads, Ref input                           CPU-centric
30
RESULTS
31
PERFORMANCE IMPROVEMENT WITH COOPERATIVE BOOSTING
Static P-state limiting requires profiling and a priori information of workload
An average of 15% performance gain for performance-coupled applications with CB
32
POWER SAVINGS
Average 10% power savings across performance-coupled applications
5 °C reduction in peak temperature for BS -> large percentage of leakage power savings
33
ENERGY*DELAY^2
Average 33% energy-delay^2 savings across performance-coupled applications
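For reference, energy-delay^2 is usually defined as energy multiplied by the square of execution time. The short derivation below uses notation assumed here, not taken from the slides: E is energy, D is delay, P is average power, and primed symbols are the values with CB.

```latex
E D^2 = (P \cdot D)\, D^2 = P D^3,
\qquad
\text{savings} = 1 - \frac{P'}{P} \left( \frac{D'}{D} \right)^{3}
```

Because delay enters cubed, the metric weights performance much more heavily than power. The relation holds per application, so the average power and performance figures reported on the other slides cannot simply be multiplied together to reproduce the average reported here.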
34
CONCLUSIONS
Demonstrated effects of thermal and performance coupling on performance
– Applications with high GPU compute-to-load ratio are more susceptible to detrimental effects of thermal coupling
– Emergent balanced workloads with split CPU-GPU computation are tightly performance-coupled
Proposed Cooperative Boosting (CB) technique to determine critical CPU P-state at which effects of thermal coupling are balanced with needs of performance coupling
– Shifts power to CPU only when needed
Demonstrated effectiveness of CB on real hardware as a well-rounded power and thermal management scheme
35
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
36
BACKUP
37
VIEWDLE PERFORMANCE ANALYSIS
38
BINARY SEARCH TEMPERATURE