Department of Electrical and Computer Engineering University of Wisconsin - Madison Optimizing Total Power of Many-core Processors Considering Voltage.


1 Department of Electrical and Computer Engineering University of Wisconsin - Madison Optimizing Total Power of Many-core Processors Considering Voltage Scaling Limit and Process Variations Jungseob Lee and Nam Sung Kim October 9, 2009

2 Outline
- Introduction
- Supply Voltage and Power Scaling
  - Supply Voltage Scaling of Many-Core Processors
  - Power Scaling of Many-Core Processors
- Impacts of Within-Die (WID) Spatial Process Variations
  - Global Clocking
  - Frequency-Island Clocking
- Conclusions

3 Parallel Processing
- Adding cores improves the throughput of computing systems.
- With all cores running, throughput is limited by power and thermal constraints.
Challenges: How do we
- determine the number of cores for the best performance-power efficiency?
- exploit process variations for multicore processors?
[Figures: multicore processors, serial processing vs. parallel processing [1]; a GPU with many cores [2]]
[1] Source: http://www.interactivesupercomputing.com/starpexpress/042007/3_Task_Parallel.html
[2] Source: NVIDIA

4 Types of Process Variations
- Process variations divide into within-die (WID) variations and die-to-die (D2D) variations.
- Core-to-core (C2C) frequency and leakage power variations due to spatially correlated WID variations become considerable.
[Figure: wafer-scale variation map. Courtesy: K. Bowman, Intel]
[Figure: a systematic $V_{th}$ variation map for a 16-core processor, and the corresponding normalized $F_{max}$ and $P_{leak}$ map]

5 Supply Voltage Scaling 1
Supply voltage scaling of many-core processors
- Throughput with a certain number of cores at maximum $V_{DD}$ (thus $F_{max}$) = throughput with more cores at a lower $V_{DD}$ (thus a lower $F_{max}$).
- The potential throughput increase from more cores can be traded for a lower $V_{DD}$, which reduces power.
[Figure: 4 cores operating at the frequency set by $V_{DD}$, versus 8 cores operating at the frequency set by a voltage $V$ lower than $V_{DD}$]

6 Supply Voltage Scaling 2
Supply voltage scaling of many-core processors
$M \cdot T_{cycle}(V_{DD}) = M \cdot \left((1-F) + F/N\right) \cdot T_{cycle}(V)$
- $M$: number of operations
- $T_{cycle}$: cycle time of a processor at a given supply voltage
- $V_{DD}$: nominal supply voltage of the base processor
- $F$: fraction of operations parallelizable without overhead
- $N$: relative number of cores
- $V$: scaled supply voltage of $N\times$ more cores
[Plots: PTM 32nm HP and PTM 32nm LP. Annotations: "Requires higher $V_{DD}$ due to high $V_{th}$"; "> 40% ↓"]
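The equal-throughput relation on this slide fixes how far the clock may slow down as cores are added. The following is a minimal sketch, assuming that cycle time is the reciprocal of $F_{max}$, so that the required frequency ratio $F_{max}(V)/F_{max}(V_{DD})$ equals $(1-F) + F/N$. The function name is illustrative and does not come from the slides.

```python
def required_frequency_ratio(F: float, N: float) -> float:
    """Return F_max(V) / F_max(V_DD) needed to keep throughput unchanged.

    Derived from M*T_cycle(V_DD) = M*((1 - F) + F/N)*T_cycle(V),
    under the assumption T_cycle = 1 / F_max.
    """
    if not 0.0 <= F <= 1.0:
        raise ValueError("F must lie in [0, 1]")
    if N < 1:
        raise ValueError("N must be at least 1")
    return (1.0 - F) + F / N


if __name__ == "__main__":
    # Example sweep with an assumed parallel fraction of 0.9.
    for n in (1, 2, 4, 8):
        print(n, required_frequency_ratio(F=0.9, N=n))
```

The lowest supply voltage that still delivers this frequency ratio would then be read off a measured or simulated frequency-versus-voltage curve, which the sketch does not model.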

7 Dynamic Power Analysis 1
Dynamic power scaling
- Dynamic power of a base many-core processor:
$P_{dyn,base} = C_{eff} \cdot V_{DD}^{2} \cdot F_{max}(V_{DD})$
- Dynamic power of $N\times$ more cores than the base processor:
$P_{dyn,N} = \left((1-F)\cdot(1+(N-1)\cdot K) + F\cdot N\right) \cdot C_{eff} \cdot V^{2} \cdot F_{max}(V) = k(F,K,N) \cdot f(V) \cdot (V/V_{DD})^{2} \cdot P_{dyn,base}$
- $P_{dyn,base}$: dynamic power of the base processor
- $C_{eff}$: effective total switching capacitance
- $V_{DD}$: nominal voltage of the base processor
- $F_{max}$: maximum operating frequency of the base processor
- $P_{dyn,N}$: dynamic power of $N\times$ more cores
- $K$: fraction of dynamic power consumed by idle cores
- $k(F,K,N)$: $(1-F)\cdot(1+(N-1)\cdot K) + F\cdot N$
- $f(V)$: frequency scaling factor at $V$; $F_{max}(V)/F_{max}(V_{DD})$
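The dynamic power ratio $P_{dyn,N}/P_{dyn,base}$ on this slide can be evaluated directly once $f(V)$ and $V/V_{DD}$ are known. A short sketch follows; the function names and the sample values are illustrative assumptions, not data from the slides.

```python
def k_factor(F: float, K: float, N: float) -> float:
    """k(F, K, N) = (1 - F)*(1 + (N - 1)*K) + F*N."""
    return (1.0 - F) * (1.0 + (N - 1) * K) + F * N


def relative_dynamic_power(F: float, K: float, N: float,
                           f_v: float, v_ratio: float) -> float:
    """Return P_dyn,N / P_dyn,base.

    f_v     : frequency scaling factor f(V) = F_max(V) / F_max(V_DD)
    v_ratio : V / V_DD
    """
    return k_factor(F, K, N) * f_v * v_ratio ** 2


if __name__ == "__main__":
    # Assumed example: fully parallel work, twice the cores, half the
    # frequency, and a supply scaled to 80% of nominal.
    print(relative_dynamic_power(F=1.0, K=0.0, N=2, f_v=0.5, v_ratio=0.8))
```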

8 Dynamic Power Analysis 2
Dynamic power scaling
[Plots: normalized $P_{dyn}$ vs. relative number of cores, for PTM 32nm HP and PTM 32nm LP, with the optimal points marked]
- Dotted lines show the projected power consumption when there is no supply voltage limit.
- $V_{DD,min} = 0.7$ V
- Less $V_{DD}$ scaling → less $P_{dyn}$ reduction
- HP: 25~55%, LP: 25~54%

9 Leakage Power Analysis 1
Leakage power scaling
- In nanoscale technology, leakage power is a significant fraction of total power consumption.
- Leakage power of a base many-core processor:
$P_{leak,base} = I_{leak}(V_{DD}) \cdot V_{DD}$
- Leakage power of $N\times$ more cores than the base processor:
$P_{leak,N} = N \cdot I_{leak}(V) \cdot V = N \cdot l(V) \cdot (V/V_{DD}) \cdot P_{leak,base}$
- $P_{leak,base}$: leakage power of the base processor
- $I_{leak}$: total leakage current of the base processor
- $V_{DD}$: nominal voltage of the base processor
- $P_{leak,N}$: leakage power of $N\times$ more cores
- $l(V)$: leakage scaling factor at $V$

10 Leakage Power Analysis 2
Leakage power scaling
[Plots: normalized $P_{leak}$ vs. relative number of cores, for PTM 32nm HP and PTM 32nm LP, with the optimal points marked]
- HP: 54~80%, LP: 33~50%
- However, the absolute $P_{leak}$ of LP is much less than that of HP.

11 Total Power Analysis 1
Total power scaling
- The total power of a base many-core processor is the sum of its dynamic and leakage power:
$P_{tot,base} = P_{dyn,base} + P_{leak,base} = P_{dyn,base} \cdot (1 + LF)$
- The total power of $N\times$ more cores than the base processor is likewise the sum of dynamic and leakage power:
$P_{tot,N} = P_{dyn,N} + P_{leak,N} = P_{dyn,base} \cdot \left\{ k(F,K,N) \cdot f(V) \cdot (V/V_{DD})^{2} + N \cdot l(V) \cdot (V/V_{DD}) \cdot LF \right\}$
- $P_{tot,base}$: total power of the base processor
- $LF$: ratio between $P_{leak}$ and $P_{dyn}$; $(P_{leak}/P_{dyn})$
- $P_{tot,N}$: total power of $N\times$ more cores
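Dividing $P_{tot,N}$ by $P_{tot,base}$ gives a normalized total power that can be swept over $N$ to look for a minimum. The sketch below assumes the caller supplies $f(V)$, $l(V)$, and $V/V_{DD}$ for each candidate $N$ (for example from circuit simulation); the function name and sample numbers are illustrative only.

```python
def relative_total_power(F: float, K: float, N: float,
                         f_v: float, l_v: float,
                         v_ratio: float, LF: float) -> float:
    """Return P_tot,N / P_tot,base.

    P_tot,N    = P_dyn,base * (k*f(V)*(V/V_DD)^2 + N*l(V)*(V/V_DD)*LF)
    P_tot,base = P_dyn,base * (1 + LF)
    """
    k = (1.0 - F) * (1.0 + (N - 1) * K) + F * N
    dynamic = k * f_v * v_ratio ** 2      # dynamic term, relative to P_dyn,base
    leakage = N * l_v * v_ratio * LF      # leakage term, relative to P_dyn,base
    return (dynamic + leakage) / (1.0 + LF)


if __name__ == "__main__":
    # Assumed example values, not taken from the slides.
    print(relative_total_power(F=1.0, K=0.0, N=2,
                               f_v=0.5, l_v=0.5, v_ratio=0.8, LF=0.25))
```

With $N = 1$ and no scaling ($f = l = V/V_{DD} = 1$) the function returns 1, which is a quick sanity check on the normalization.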

12 Total Power Analysis 2
Total power scaling
[Plots: normalized $P_{tot}$ vs. relative number of cores, for PTM 32nm HP (LF = 0.4/0.6) and PTM 32nm LP (LF = 0.2/0.8), with the optimal points marked]
- HP: 36~65%, LP: 26~52%
- More $V_{DD}$ scaling → only 17% more $P_{tot}$ reduction, but it requires more on-die memory area.

13 Impacts of WID Variations - GC
Global Clocking
- Global clocking limits $F_{max}$ of a many-core processor to that of its slowest core.
- The previous $P_{dyn,N}$ equation can still be used to estimate $P_{dyn,N}$.
- The estimate of $P_{leak,N}$ has to account for each core's leakage variation:
$P_{leak,N} = \sum_{i=1}^{N} l_i(V) \cdot (V/V_{DD}) \cdot P_{leak,base}$
- $l_i(V)$: leakage scaling factor of the $i$-th core, normalized to $I_{leak}(V_{DD})$
[Figure: a systematic $V_{th}$ variation map for a 16-core processor, and the corresponding map of normalized $F_{max}$ and $P_{leak}$ by core ID]
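Under global clocking the two per-core effects reduce to a minimum and a sum. The sketch below assumes that $P_{leak,N}$ adds up the per-core terms $l_i(V)\cdot(V/V_{DD})\cdot P_{leak,base}$ over the active cores; that summation is an interpretation of the slide's per-core equation, and the function names and sample values are illustrative.

```python
from typing import Sequence


def global_clock_frequency(core_fmax: Sequence[float]) -> float:
    """Global clocking: every core runs at the slowest core's F_max."""
    return min(core_fmax)


def relative_leakage_gc(core_leak_factors: Sequence[float],
                        v_ratio: float) -> float:
    """Return P_leak,N / P_leak,base for the given active cores.

    core_leak_factors : l_i(V) for each active core (assumed inputs)
    v_ratio           : V / V_DD
    Assumes the per-core leakage terms are summed over active cores.
    """
    return sum(core_leak_factors) * v_ratio


if __name__ == "__main__":
    # Assumed per-core values for a 4-core example.
    fmax = [1.00, 0.93, 1.05, 0.97]
    leak = [0.60, 0.45, 0.70, 0.52]
    print(global_clock_frequency(fmax), relative_leakage_gc(leak, 0.8))
```

When every $l_i(V)$ equals the same $l(V)$, the sum collapses to $N \cdot l(V)$, which matches the leakage equation without variations.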

14 Impacts of WID Variations - GC
Global Clocking
[Plots: average total power of 100 die samples vs. relative number of cores (N), for the HP slowest base core and the HP fastest base core]
- Slow: 23~54%, Fast: 77~90%
- The relative total power reduction is much larger with the fastest base core because the fastest base core is not power efficient.

15 Impacts of WID Variations - FI
Frequency-Island Clocking
- FI clocking is more performance and power efficient than GC because each core can run at its own fastest frequency.
- The previous GC $P_{leak,N}$ equation can be used to estimate $P_{leak,N}$.
- The equation for supply voltage scaling has to be modified as follows:
$M \cdot T_{cycle,base}(V_{DD}) = M \cdot \left((1-F)/f_j + F/\sum_{i} f_i\right) \cdot T_{cycle}(V)$
- The estimate of $P_{dyn,N}$ also has to account for an independent clock frequency per core:
$P_{dyn,N} = \left((1-F)\cdot\left(f_j + \sum_{i \neq j} f_i \cdot K\right) + F \cdot \sum_{i} f_i\right) \cdot (V/V_{DD})^{2} \cdot P_{dyn,base}$
- Here $f_i$ is the frequency scaling factor of the $i$-th active core, and core $j$ runs the sequential portion.
- The fastest of the chosen active cores always offers the optimal total power for processing the fully sequential portion of the workload.
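The frequency-island equations can be sketched in code under a stated reading of the slide: the terms indexed by $i$ are assumed to be summed over the active cores, and core $j$ is assumed to be the one running the sequential portion. Both assumptions, along with the function names and sample values, are illustrative and not confirmed by the slides.

```python
from typing import Sequence


def fi_cycle_factor(F: float, freqs: Sequence[float], j: int) -> float:
    """Return (1 - F)/f_j + F/sum(f_i), the multiplier on T_cycle(V)."""
    return (1.0 - F) / freqs[j] + F / sum(freqs)


def fi_relative_dynamic_power(F: float, K: float, freqs: Sequence[float],
                              j: int, v_ratio: float) -> float:
    """Return P_dyn,N / P_dyn,base with one clock frequency per core.

    freqs   : per-core frequency scaling factors f_i (assumed inputs)
    j       : index of the core running the sequential portion
    v_ratio : V / V_DD
    """
    idle = sum(f for i, f in enumerate(freqs) if i != j)  # idle cores in sequential phase
    scale = (1.0 - F) * (freqs[j] + K * idle) + F * sum(freqs)
    return scale * v_ratio ** 2


if __name__ == "__main__":
    freqs = [0.50, 0.55, 0.48, 0.52]                       # assumed example values
    j = max(range(len(freqs)), key=lambda i: freqs[i])     # fastest active core
    print(fi_cycle_factor(0.9, freqs, j),
          fi_relative_dynamic_power(0.9, 0.2, freqs, j, 0.8))
```

With identical per-core frequencies, both functions reduce to the global-clocking forms $(1-F) + F/N$ (scaled by $1/f$) and $k(F,K,N)\cdot f(V)\cdot(V/V_{DD})^2$, which serves as a consistency check on this reading.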

16 Impacts of WID Variations - FI
Frequency-Island Clocking
[Plots: average total power of 100 die samples vs. relative number of cores (N), for the HP slowest base core and the HP fastest base core]
- Slow: 30~58%, Fast: 81~90%
- FI clocking is more power-efficient than global clocking (GC), which often wastes the $F_{max}$ of faster cores.
- On average, FI clocking offers 7% lower total power consumption than GC.

17 Experimental Methodology
- HSPICE simulations
  - 32nm PTM HP and LP models
- Frequency / leakage scaling factors
  - $V_{DD}$ range: 0.55 V to 1.05 V
  - Complex gates for measuring $l(V_{DD})$
  - 24 FO4 inverter chain for measuring $f(V_{DD})$
- $V_{th}$ and $L_{eff}$ WID spatial and D2D variation map [3]
  - WID variation: 6.4%
  - Correlation distance coefficient (Φ): 0.5
  - D2D variation: 5.0%
[Figure: variation map, with one grid point labeled]
[3] Smruti R. Sarangi et al., "VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects", IEEE Transactions on Semiconductor Manufacturing (IEEE TSM), February 2008.

18 Conclusions
- We determined the optimal number of active cores to minimize the total power consumption of many-core processors.
  - 2x more active cores at a lower voltage offer more than 50% total power reduction at the same throughput as the base processor.
- We extended the power analysis to consider WID C2C frequency and leakage variations.
  - 2x more active cores at a lower voltage is the optimal choice.
  - FI clocking provides lower power consumption than GC because it can exploit C2C variations.
  - Using the fastest of the active cores for the sequential portion of an application led to the lowest power consumption.

19 Backup

20 Introduction: Process Variations
- Manufactured dies exhibit a large spread of transistor delay and leakage power, both across dies and within each die.
- Die-to-die (D2D) variations affect all transistors on a die equally. Within-die (WID) variations induce different characteristics across each die.
- As individual cores become smaller, core-to-core (C2C) frequency and leakage power variations due to spatially correlated WID variations will become considerable.
[Figures: die-to-die variations; spatial within-die variations. Source: Synopsys]

21 Supply Voltage and Power Scaling 2
Supply voltage scaling of many-core processors
- Throughput with a certain number of cores at maximum $V_{DD}$ (thus $F_{max}$) = throughput with more cores at a lower $V_{DD}$ (thus a lower $F_{max}$).
- The potential throughput increase from more cores can be traded for a lower $V_{DD}$, which reduces power.
[Figure: a many-core processor [1] with active and idle cores marked; 1 active core operating at the frequency set by $V_{DD}$, versus 8 active cores operating at the frequency set by a voltage $V$ lower than $V_{DD}$]

