Multi-Story Power Distribution Networks for GPUs


1 Multi-Story Power Distribution Networks for GPUs
Qixiang Zhang, Liangzhen Lai, Mark Gottscho, and Puneet Gupta. Electrical Engineering Department. nanocad.ee.ucla.edu

2 Problem: GPU Power Delivery is Expensive
GPUs draw large currents due to high power consumption at low voltage. This causes power loss and voltage noise in the power distribution network (PDN) and requires many supply and ground pins. Design consequences: high package cost, reduced I/O pin availability, an inefficient PDN, and aging and wearout. Goal: reduce the overhead of power delivery in GPUs.
16-Mar-2016 Mark Gottscho / UCLA

3 Previous Work: Multi-Story PDNs to the Rescue?
Idea: stack multiple voltage planes for logic [Gu ISLPED’05]. Challenge: how to partition logic so that the current demand of each story is matched? At the level of gates, functional units, cores, …? Figure: multi-story PDN concept [Gu ISLPED’05]. We adapt this idea to GPUs at the core level and improve it further.

4 Proposal: Multi-Story Approach is Ideal for GPUs
We propose multi-story PDNs for GPUs! These low-cost techniques can help stabilize the voltage rails. Hardware: auxiliary regulator, on-chip supercapacitors, Dynamic Current Compensation (DCC). Software: Static SIMT Thread Scheduling (SSTS). GPUs are good for current matching at the core level: architectural homogeneity, regular layout, the SIMT (single-instruction, multiple-thread) model, and minimal communication between threads. Figure: NVIDIA Fermi block diagram [NVIDIA]. Figure: motivational results from GPGPU-Sim (NVIDIA GTX 480 with 14 cores) + HSPICE for the NQU, LPS, STO, and RAY benchmarks.

5 Conventional 1-Story PDN for GPUs
All cores in a voltage domain share common off-chip supply and ground rails. The 1-story PDN has high off-chip current demand.

6 Proposed 2-Story PDNs for GPUs
GPU cores are divided across two stacked voltage domains. New node V_inter: virtual ground for the upper story and virtual supply for the bottom story. 2-story: off-chip current demand 1/2X, resistive power losses 1/4X, power pins 1/2X!
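
The 1/2X and 1/4X figures follow from basic circuit arithmetic, which the sketch below works through. It assumes perfectly balanced stories, a fixed total power, and an unchanged off-chip PDN resistance; the function name and the numeric values (200 W, 1.0 V per story, 0.5 mOhm) are illustrative assumptions, not numbers from the slides.

```python
def pdn_scaling(power_w, story_voltage_v, pdn_resistance_ohm, stories):
    """Return (off-chip current in A, resistive loss in W) for a stacked PDN.

    Assumes all stories draw equal current, so the off-chip supply
    voltage is stories * story_voltage_v and the same current flows
    through every story in series.
    """
    supply_v = stories * story_voltage_v          # stacked supply voltage
    current_a = power_w / supply_v                # I = P / V
    loss_w = current_a ** 2 * pdn_resistance_ohm  # I^2 * R loss in the PDN
    return current_a, loss_w


# Hypothetical operating point, chosen only for illustration.
i1, loss1 = pdn_scaling(200.0, 1.0, 0.5e-3, stories=1)
i2, loss2 = pdn_scaling(200.0, 1.0, 0.5e-3, stories=2)
print(f"current ratio: {i2 / i1:.2f}, loss ratio: {loss2 / loss1:.2f}")
```

With half the off-chip current, and assuming a fixed current limit per pin, the number of power pins also halves.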

7 Proposed 2-Story, 1-Regulator GPU
Problem: V_inter nodes are floating! They are sensitive to minor current imbalances between cores.
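
To see why a floating node is so sensitive, a first-order sketch can treat the two stories as current sources and lump everything attached to V_inter into one capacitance C, so that dV/dt = (I_top - I_bot) / C. This model, the function name, and the example values (100 nF of node capacitance, 1 A of imbalance, 0.1 V of allowed swing) are assumptions for illustration only.

```python
def time_to_exceed_swing(node_capacitance_f, imbalance_a, allowed_swing_v):
    """Time in seconds for V_inter to drift by allowed_swing_v.

    First-order model: a constant current imbalance charges or
    discharges the lumped node capacitance linearly, so
    t = C * dV / |dI|.
    """
    if imbalance_a == 0:
        return float("inf")  # perfectly matched stories never drift
    return node_capacitance_f * allowed_swing_v / abs(imbalance_a)


# Hypothetical values: 100 nF on the node, 1 A mismatch, 0.1 V budget.
t = time_to_exceed_swing(100e-9, 1.0, 0.1)
print(f"swing budget exhausted after {t * 1e9:.1f} ns")
```

Under these assumed values the budget is used up within nanoseconds, which is why the node needs either a regulator or substantially more charge storage.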

8 Proposed 2-Story, 2-Regulator GPU
V_inter nodes are stabilized by the auxiliary regulator, but this costs extra pins and power.

9 Results: Conventional 1-Story vs. 2-Stories
The 2-story, 1-regulator design is the most efficient and cheapest, BUT it is unreliable without fixes. Target: 10% MVS.

10 On-Chip Supercapacitors Stabilize V_inter for 1-Reg.
Supercaps near GPU cores can filter transient voltage noise on V_inter instead of the aux. regulator.

11 Results: 2-Story, 1-Regulator with On-Chip Supercaps
RAY benchmark: 38 µF per core required for 10% MVS on the 2-story, 1-regulator design. Assume on-chip supercapacitor density is 23 pF/µm² [Leung 2015, El-Kady 2013] and that supercaps are not stackable on logic/metal. Supercap area overhead est. … mm² per core, 4.6% for the chip. Supercaps make the 2-story, 1-regulator design more reliable and more efficient with low overhead.
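
The per-core area can be estimated from the two numbers stated on the slide by dividing the required capacitance by the assumed density. The sketch below does only that arithmetic; the function name is hypothetical, and the result is a derived estimate rather than a figure quoted from the authors.

```python
def supercap_area_mm2(capacitance_f, density_f_per_um2=23e-12):
    """Area in mm^2 needed for a given capacitance.

    The default density is the 23 pF/um^2 assumed on the slide;
    1 mm^2 equals 1e6 um^2.
    """
    area_um2 = capacitance_f / density_f_per_um2
    return area_um2 / 1e6


# 38 uF per core is the RAY requirement quoted on the slide.
print(f"{supercap_area_mm2(38e-6):.2f} mm^2 per core")
```

For 38 µF this works out to roughly 1.65 mm² per core.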

12 Dynamic Current Compensation (DCC)
DCC can actively balance current demand among cores when supercaps cannot fix steady-state mismatches. Components: voltage-controlled current source (VCCS) and ring oscillator (RO). Control latency is critical to stability. DCC can assist supercaps in stabilizing V_inter.
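
Why control latency matters can be illustrated with a toy delayed-feedback loop. The sketch below is not the authors' DCC circuit: it assumes a purely proportional VCCS with a hypothetical gain of 10 A/V, a lumped 8 µF capacitance on V_inter, a constant 0.5 A imbalance, and forward-Euler integration. All names and values are assumptions.

```python
def simulate_vinter(latency_s, c_farads=8e-6, gain_a_per_v=10.0,
                    imbalance_a=0.5, dt=10e-9, t_end=50e-6):
    """Return the V_inter deviation (in volts) over time.

    Model: C * dV/dt = imbalance - gain * V(t - latency), i.e. the
    compensating current reacts to a delayed sample of the deviation.
    """
    delay_steps = round(latency_s / dt)
    n_steps = round(t_end / dt)
    dev = [0.0]
    for k in range(n_steps):
        # The controller only sees the deviation from delay_steps ago.
        sensed = dev[k - delay_steps] if k >= delay_steps else 0.0
        i_comp = gain_a_per_v * sensed
        dev.append(dev[k] + dt / c_farads * (imbalance_a - i_comp))
    return dev


fast = simulate_vinter(latency_s=0.1e-6)  # short loop delay
slow = simulate_vinter(latency_s=2.0e-6)  # long loop delay
print(f"final deviation with short latency: {fast[-1]:.3f} V")
print(f"peak deviation with long latency: {max(abs(v) for v in slow):.1f} V")
```

In this toy model the short-latency loop settles near imbalance / gain, while the long-latency loop oscillates with growing amplitude, because the product gain × latency / C exceeds the stability limit of a first-order loop with pure delay.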

13 Results: 2-Story, 1-Regulator with Supercaps & DCC
LPS benchmark (C_supercap = 8 µF per core): 10% power loss and 10% MVS with approx. 1% supercap die area overhead and up to 1 µs VCCS latency.

14 Static SIMT Thread Scheduling (SSTS)
Current profiles may be imperfectly matched for different cores. We propose a software-based solution: given prior knowledge of workload characteristics, minimize the average difference in top/bottom story current demand via thread placement. We use a greedy thread partitioning algorithm akin to Fiduccia-Mattheyses (FM). SSTS is well suited for compensating static power offsets.
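
The idea of balancing top and bottom story demand through placement can be sketched with a simple greedy heuristic. The code below is not the FM-like algorithm from the talk: it sorts work items by estimated current and always assigns the next one to the lighter story. The function name, the per-item current estimates, and the absence of any per-core capacity constraint are all simplifying assumptions.

```python
def greedy_story_partition(currents):
    """Split work items between two stories to balance total current.

    currents: estimated average current per item (for example, per
    thread block), assumed known from prior profiling.
    Returns (top_indices, bottom_indices, top_sum, bottom_sum).
    """
    # Place the largest items first, each onto the lighter story.
    order = sorted(range(len(currents)), key=lambda i: currents[i],
                   reverse=True)
    top, bottom = [], []
    top_sum = bottom_sum = 0.0
    for i in order:
        if top_sum <= bottom_sum:
            top.append(i)
            top_sum += currents[i]
        else:
            bottom.append(i)
            bottom_sum += currents[i]
    return top, bottom, top_sum, bottom_sum


# Hypothetical per-item current estimates in amperes.
top, bottom, top_sum, bottom_sum = greedy_story_partition([5, 4, 3, 3, 2, 1])
print(top, bottom, abs(top_sum - bottom_sum))
```

A pass-based refinement in the FM style would start from such a partition and move items between stories while the mismatch keeps improving.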

15 Results: 2-Story, 1-Regulator with Supercaps & SSTS
SSTS can achieve a similar result to DCC without extra hardware, but it cannot manage dynamic variation.

16 Practical Considerations
Multiple virtual ground planes are required in silicon: triple-well or moat isolation processes between stories [Pei et al. IEDM’14]. Boot time: need to control V_inter due to gate oxide breakdown; slowly ramp the off-chip voltage. Process variations and aging cause power mismatches; the proposed techniques can compensate. Memory/NoC/IO power distribution: use separate domains plus level shifters.

17 Conclusion: Multi-Story PDNs Promising for GPUs
Benefits: fewer required power pins and more efficient power delivery. Our innovations: application of multi-story PDNs to GPUs, auxiliary regulator, on-chip supercaps, DCC, and SSTS. Future work: DVFS for multi-story GPUs.

18 Thank you!