Amirali Shayan Advisor: Chung-Kuan Cheng


1 System Level Design of Power Distribution Network for Mobile Computing Platforms
Amirali Shayan Advisor: Chung-Kuan Cheng University of California, San Diego 2011

2 Agenda
Introduction and motivation
Dissertation organization
Regulator based pre-silicon analysis vs. post-silicon correlation
LDO based PDN design under worst loading
Worst case current synthesis
Resonance aware vs. rogue wave
Poles/zeros based methodology
Experimental results and trade-offs
Concluding remarks / future research directions

3 What is a Power Distribution Network (PDN)?
Power supply noise:
  Resistive IR drop
  Inductive L·di/dt noise
[Popovich et al. 2008]
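The two noise components named on this slide can be illustrated with a minimal sketch, assuming a single lumped series R-L supply path. The function name and all numeric values below are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
def supply_noise(current_a, di_dt_a_per_s, resistance_ohm, inductance_h):
    """Return (IR drop, L*di/dt noise, total) in volts for a lumped R-L supply path."""
    ir_drop = current_a * resistance_ohm          # resistive component
    inductive = inductance_h * di_dt_a_per_s      # inductive component
    return ir_drop, inductive, ir_drop + inductive

# Hypothetical operating point: 1 A load, 1 A/ns current ramp,
# 10 mOhm path resistance, 50 pH loop inductance.
ir, ldidt, total = supply_noise(1.0, 1.0e9, 10e-3, 50e-12)
print(f"IR drop = {ir * 1e3:.1f} mV, L*di/dt = {ldidt * 1e3:.1f} mV, total = {total * 1e3:.1f} mV")
```

With these assumed numbers the inductive term dominates, which is the usual reason fast current transients matter more than the DC load in such a path.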

4 Motivation (1): PDN Roadmap
[Figures: consumer portable processing performance trend; Vdd supply voltage scaling; consumer portable power consumption trend. Source: ITRS 2010]

5 Research Motivation (2)
Dominant impact on performance (processor frequency)
Scaling push for higher performance
Increasing power demand
Reduction of supply noise
On-chip variation (OCV) below 28 nm
Voltage variation affects the margins, e.g. timing margins
Form factor limitations
Increasing application complexity
Cost
Reliability
Functional mode
Test mode

6 Dissertation Organization
System Level Design and Analysis of Mobile Platforms
  System level design aspects of the power distribution (DAC 10, user track)
  Time domain and frequency domain analysis (ISQED 09)
Emerging Technologies: Reliability and Design Aspects of 3D Integration
  Time and frequency domain analysis of the three dimensional networks (EPEP 08)
  Reliability aware design and optimization of the three dimensional stacking (DATE 09, ICCD 09)
What would be a Worst Scenario for Portable Computing?
  Resonance aware analysis of the PDN at system level (EPEP 10)
  Rogue wave based synthesis of the realistic current activity for worst voltage drop
Towards on-die Power Distribution Regulations
  LDO based design and optimization of power distribution (ASICON)
  Parallel flow to analyze the impact of the voltage regulators (ISQED 09)
Power Distribution Impact on Performance
  Impact of the power delivery on the processor performance (ICEAC 10)
  Performance under worst case voltage drop and temperature variation (SLIP 10)
  Peak power reduction at architecture level (MICRO 09; HPCA 11, submitted)
Future Directions

7 Publications
Worst Case Noise:
A. Shayan, K. Bowles, S. Dobre, M. Popovich, X. Chen, C. Pan, "Resonance-aware modulation methodology for system level power distribution co-design," IEEE Conference on Electrical Performance of Electronic Packaging (EPEP), 2009.
P. Du, X. Hu, S.H. Weng, A. Shayan, X. Chen, A.E. Engin, C.K. Cheng, "Worst-case noise prediction with non-zero current transition times for early power distribution system verification," ISQED, 2010.
On-die and Off-chip Regulations:
A. Shayan, X. Hu, C.K. Cheng, W. Yu, C. Pan, "Linear Dropout Regulator based Power Distribution Design under Worst Loading," IEEE International Conference on ASIC, 2011.
A. Shayan, X. Hu, H. Peng, W. Yu, W. Zhang, C.K. Cheng, M. Popovich, X. Chen, L. Chua-Eoan, X. Kong, "Parallel flow to analyze the impact of the voltage regulator model in nanoscale power distribution network," The International Symposium on Quality Electronic Design (ISQED), 2009.
A. Shayan, X. Hu, S.H. Weng, C.K. Cheng, W. Yu, C. Pan, "Optimization of On-die Linear Dropout based Power Distribution," IEEE Transactions on VLSI, under submission.
Performance Impact:
A. Shayan, C. Pan, M. Popovich, K. Bowles, "Estimation of Power Integrity Impact to Low Power Processor Performance through Pre-Silicon Simulation and Post-Silicon Measurements," ASME InterPACK, 2011.
C.K. Cheng, A.B. Kahng, K. Samadi, A. Shayan, "Worst-case Performance Prediction Under Supply Voltage and Temperature Variation," System-Level Interconnect Prediction (SLIP), 2010.

8 Publications
3D Integration PDN:
A. Shayan, X. Hu, H. Peng, W. Yu, T. Toms, M. Popovich, X. Chen, C.K. Cheng, "Reliability aware through silicon via planning for nanoscale stacked silicon ICs," DATE, 2009.
A. Shayan, X. Hu, M. Popovich, A.E. Engin, C.K. Cheng, "Reliable 3D Stacked Power Distribution Considering Substrate Coupling," ICCD, 2009.
A. Shayan, X. Hu, "Power Distribution Design for 3D Integration," Jacobs School of Eng. Research Expo, 2009 [Best Poster Award].
A. Shayan, X. Hu, H. Peng, W. Zhang, M. Popovich, L. Chua-Eoan, C.K. Cheng, "Power distribution co-design for nanoscale stacked silicon ICs," EPEP, 2008.
W. Zhang, W. Yu, X. Hu, A. Shayan, A.E. Engin, C.K. Cheng, "Predicting the worst-case voltage violation in a 3D power network," SLIP, 2009.
Analysis and Simulation:
W. Zhang, Y. Zhu, W. Yu, A. Shayan, R. Wang, Z. Zhu, C.K. Cheng, "Noise minimization during power-up stage for a multi-domain power network," ASPDAC, 2009.
X. Hu, W. Zhao, P. Du, Y. Zhang, A. Shayan, C. Pan, A.E. Engin, C.K. Cheng, "On the bound of time-domain power supply noise based on frequency-domain target impedance," SLIP, 2009.
Measurement Sensors:
L. Chua-Eoan, B. Andreev, C. Pan, A. Shayan, X. Kong, M. Popovich, M. Calle, I.-J. Chang, "On-Chip Sensor For Measuring Dynamic Power Supply Noise Of The Semiconductor Chip," US Patent Granted, US , Aug 11, 2011.

9 Publications
Architectures:
H. Homayoun, V. Kontorinis, A. Shayan, T. Lin, D. Tullsen, "Dynamically Heterogeneous Cores Through 3D Resource Pooling," submitted to IEEE International Symposium on High Performance Computer Architecture (HPCA), 2011.
V. Kontorinis, A. Shayan, R. Kumar, D. Tullsen, "Reducing Peak Power with a Table-Driven Adaptive Processor Core," IEEE/ACM International Symposium on Microarchitecture (MICRO), 2009.
A. Shayan, "Online thermal aware scheduling for multiple clock domain CMPs," IEEE International SOC Conference (SOCC), 2007.
Computational Biomedical Engineering:
A. Shayan, Y. Zhu, Y.N. Cheng, C.K. Cheng, S.F. Lin, P.S. Chen, "Exploring Cardioneural Signals from Noninvasive ECG Measurement," BIBE, 2007.
A. Shayan, Y. Zhu, W. Zhang, T.P. Jung, J.R. Duann, C.K. Cheng, "Spatial Density Reduction in the Study of the ECG Signal using Independent Component Analysis," EMB, 2007.
Y. Zhu, A. Shayan, W. Zhang, T.L. Chen, T.P. Jung, J.R. Duann, S. Makeig, C.K. Cheng, "Analyzing High-Density ECG Signals Using ICA," IEEE Transactions on Biomedical Engineering, vol. 55, no. 11, pp , Nov

10 Contributions in Second Part of Thesis
Silicon correlation: power integrity impact on performance
LDO based optimization of the PDN
Exact analytical step response calculation
Poles/zeros
Maximum voltage drop
Vector based rogue wave synthesis
LDO based PDN optimization

11 Silicon Correlation: Power Integrity Impact on Performance
Power integrity has been a key design concern for high performance processors drawing tens of amps of current at GHz frequencies.
Many studies look into high performance core sensitivity:
  Arabi et al. [VLSI Symposium 02]
  Kantorovich [EPEP 06/08]
  Waizman [EPEP 03]
Do we need to be concerned about low power processor performance? Is the performance of low power processors sensitive to power integrity?
  The power consumption of low power processors is low.
  Strong drivers, from smart phones to notebooks, push application performance into the high frequency range.
  Many cycles of high and low power, burst applications, etc. make the processor prone to exciting resonances.
  Constrained physical design resources, packaging, etc.

12 Impedance Measurement in Silicon
For example, Waizman [EPEP 03] introduced a methodology to measure the impedance: the clock is modulated and the three main harmonics of the grid are measured for impedance reconstruction.

13 Impedance Sensitivity across the Low Frequency
[Figure: impedance vs. frequency. Low frequency resonance shown with 1x and with 5x bulk capacitance; curves for 0x, 2x, and 7x DSC.]
Mid frequency resonance vectors trigger different droop.

14 Performance (Fmax) Sensitivity to Low Freq Impedance
Impact on Fmax, across different processes, of systematically increasing the bulk capacitance to improve the low frequency impedance.

15 Silicon Measurement: Fmax cliff due to di/dt drop
Loop = 1x, 4x, 7x, 18x, 27x
Vnom: ΔFmax = 0
Vnom+100mV: ΔFmax = 0
Vnom±13% ΔVnom: ΔFmax = -6.7%
Vnom+100mV±13% ΔVnom: ΔFmax = -9.3%
Remaining plot annotations, in order: ΔFmax = -6.7%, -9.3%, -6.7%, -9.4%, -6.9%, -9.4%
Performance cliff

16 Mid frequency Impedance Enhancement with Package
Curves: 4-layer; 6-layer, no cap; 6-layer, 1×DSC; 6-layer, 2×DSC; 6-layer, 3×DSC
Analysis of the impact of including 1, 2, and 3 on-package die-side caps (DSCs).
The major improvement in the high frequency resonance peak was observed when the first DSC was inserted.
The improvement in the mid frequency range was dramatic when all three DSCs were inserted.

17 Measured Fmax Sensitivity by Systematically Removing Pkgdcp
Marginal impact on Fmax is noted until all on-package decaps are removed. This finding correlates with the simulated PDN impedance, in which adding the first on-package decap led to the largest high frequency impedance drop.

18 Correlation of Physical vs. Partitioned Model
Granularity of the electrical model can change the understanding of the behavior of the power distribution system.
[Figure panels: physical vs. partitioned model, shown for a low resolution partitioned model and a high resolution partitioned model]
The graph on the left side shows miscorrelation between a commercially available equivalent circuit model and the physical model. The equivalent circuit model used 18 tiles to partition the die, while the physical model used 6500.

19 Modeling Challenges: High Performance vs. Low Power (1)
Power grid models in low power processors are non-uniform structures. We need a suitable MxN resolution to capture the behavior of the non-uniform regions.

20 Modeling Challenges: High Performance vs. Low Power (2)
Decap allocations are not homogeneous. The model granularity should represent the density of decaps in a heterogeneous fashion. Non-uniform decap cells should not be clumped together!

21 Impedance Profile of a PDN in Heterogeneous decap Regions
[Figure: impedance profile plotted against Ztarget]
Combining a non-uniform decap region will lead to optimism.

22 Contributions in second part of Thesis
Silicon correlation: power integrity impact on performance
LDO based optimization of the PDN
Exact analytical step response calculation
Poles/zeros
Maximum voltage drop
Vector based rogue wave synthesis
LDO based PDN optimization

23 LDO based PDN Optimization under Worst Loading
Integrated on-die LDO shortens the PDN loop
Virtually eliminates 1st and 2nd droops
Power saving opportunity
[Figure: |Z| vs. frequency, with labels RPCB, RDIE, RPKGL, RPKGU, TR, VR, LDO]
[Figure: PDN path with VRM, motherboard, MB caps, bulk caps, package, die, LDO]
Adv #1: Better dynamic power management through faster response time
Adv #2: Maintain low package cost while providing adequate power delivery

24 LDO-PDN Model of Design (1)

25 LDO-PDN (2) – Model Approximation

26 Proposed Flow for Worst Case Loading LDO Optimization

27 Problem Formulation
P = LDO power
C = decoupling capacitor
P0 = power limit
Ipeak = peak loading current of functional block
Vmax = worst voltage drop based on rogue wave
ZLDO-PDN = impedance profile of the LDO-PDN

28 LDO-PDN Output Impedance
impedance zero = Z1= x 1e9 Z4,5= ± i × 1e9
impedance pole = p1= p4,5= ± i × 1e9
Impedance k=
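A zero/pole/gain description like the one on this slide corresponds to the standard rational form Z(s) = k·Π(s − z_i)/Π(s − p_i). The sketch below shows how such a description can be evaluated, assuming a single hypothetical PDN stage (series R-L feeding a shunt C) in place of the actual LDO-PDN, whose numeric zeros and poles did not survive in the transcript. The function names and the R, L, C values are illustrative assumptions.

```python
import cmath
import math

def zpk_impedance(s, zeros, poles, gain):
    """Evaluate Z(s) = gain * prod(s - z) / prod(s - p)."""
    num = complex(gain)
    for z in zeros:
        num *= (s - z)
    den = complex(1.0)
    for p in poles:
        den *= (s - p)
    return num / den

def rlc_stage_zpk(r, l, c):
    """Zeros, poles, gain of Z(s) = (R + sL) / (s^2 LC + s RC + 1)."""
    zero = -r / l
    disc = cmath.sqrt((r / l) ** 2 - 4.0 / (l * c))
    poles = [(-r / l + disc) / 2.0, (-r / l - disc) / 2.0]
    return [zero], poles, 1.0 / c

# Hypothetical stage: 10 mOhm, 100 pH, 100 nF.
R, L, C = 10e-3, 100e-12, 100e-9
zeros, poles, gain = rlc_stage_zpk(R, L, C)

f_res = 1.0 / (2.0 * math.pi * math.sqrt(L * C))   # anti-resonance frequency
s = 2j * math.pi * f_res
z_zpk = zpk_impedance(s, zeros, poles, gain)
z_direct = (R + s * L) / (s * s * L * C + s * R * C + 1.0)
print(f"f_res = {f_res / 1e6:.1f} MHz, |Z| = {abs(z_zpk) * 1e3:.1f} mOhm")
```

The complex-conjugate pole pair is what produces the impedance peak; the zpk and direct evaluations agree, which is a quick sanity check when fitting poles and zeros to a circuit.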

29 Step Response of the LDO-PDN

30 Analytical Worst Step Response

31 “Rogue Wave” Phenomenon
Worst-case noise response: the maximum noise is formed when a long, slow oscillation is followed by a short, fast oscillation.
Rogue wave: in oceanography, a large wave is formed when a long, slow wave hits a sudden, quick wave.
High-frequency oscillation corresponds to the resonance of the 1st stage.
Low-frequency oscillation corresponds to the resonance of the 2nd stage.

32 Ideal Worst-Case PDN Noise
Problem formulation I
PDN noise:
Worst-case current [Xiang ’09]:
Zero current transition time. Unrealistic!

33 Rogue Wave in LDO-PDN

34 Rogue wave based Current Vector Synthesis

35 Algorithm for Vector-based Rogue Wave Generation
for i = 0 to N - window_size
begin
    sum each current peak of current_pattern(i, i + window_size - 1)
end
sorted_list_des = sort the interval sums of current peaks, descending
sorted_list_asc = sort the interval sums of current peaks, ascending
// worst-case calculation
for i = 0 to N - window_size, i increased by window_size   // N is the size of impulse_response
    if impulse_response(i) > 0
        current_list = sorted_list_des
    else
        current_list = sorted_list_asc
    for j = 0 to M - window_size + 1   // M is the size of current pattern
        idx_current = current_list(j)
        tmp_val = convolution of impulse_response(i, i + window_size - 1) and current_pattern(idx_current, idx_current + window_size - 1)
        if tmp_val > max_val
            max_val = tmp_val
            max_current(i, i + window_size - 1) = current_pattern(idx_current, idx_current + window_size - 1)
            break
        end
    end // end of for j
end // end of for i
Complexity of algorithm = O(N^2 log m), where N = impulse response size and m = current window size
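The idea behind the pseudocode can be sketched in runnable form. This is a simplified illustration rather than the slide's exact procedure: it replaces the sorted-list search with early exit by an exhaustive search over all candidate windows, and it scores each window by its direct contribution to the noise at one observation instant. The function names and the toy data are assumptions.

```python
def window_contribution(h_win, i_win):
    """Contribution sum_k h[k] * i[-k] of one current window to the observed noise."""
    return sum(h * c for h, c in zip(h_win, reversed(i_win)))

def synthesize_worst_current(impulse_response, current_pattern, window_size):
    """Pick, per impulse-response window, the recorded current window that adds most noise."""
    candidates = [current_pattern[s:s + window_size]
                  for s in range(len(current_pattern) - window_size + 1)]
    chosen, total = [], 0.0
    for a in range(0, len(impulse_response) - window_size + 1, window_size):
        h_win = impulse_response[a:a + window_size]
        best = max(candidates, key=lambda c: window_contribution(h_win, c))
        total += window_contribution(h_win, best)
        chosen.append(best)
    # The first impulse-response window weights the most recent current,
    # so the chosen windows are laid out in reverse order along time.
    worst_current = [x for win in reversed(chosen) for x in win]
    return worst_current, total

# Toy data (hypothetical): a 4-tap impulse response and a 6-sample recorded pattern.
h = [1.0, 0.5, -0.5, -1.0]
pattern = [0, 2, 1, 0, 3, 0]
worst_current, worst_noise = synthesize_worst_current(h, pattern, 2)
print(worst_current, worst_noise)
```

Because every synthesized window is cut from the recorded activity, the resulting stimulus keeps realistic transition times, unlike the ideal worst case with zero transition time.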

36 Vector-based Synthetic Rogue Wave

37 Theoretical Background
Single current demand pulse: triangular, tr = tf = 2.5 ns
Pulse train: periodic triangular pulses, T = 5 ns
Current stimulus is band limited: BW ~ 1/tr = 400 MHz
Spectrum of the pulse train: FFT of the single triangular pulse sampled by III(freq), with freq = 1/T
[Figure: spectra of the single current demand pulse and of the pulse train, marking the 1st and 3rd harmonics and 2/T = 400 MHz]
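The relation between the single pulse and the pulse train can be written out as a short worked derivation. It assumes an isosceles triangular pulse of peak value I_p (a symbol introduced here for illustration) and the normalized sinc, sinc(x) = sin(πx)/(πx).

```latex
% Single triangular pulse, rise = fall = t_r, peak I_p (assumed symbol)
\[
  |I(f)| = I_p\, t_r\, \mathrm{sinc}^2(f\, t_r),
  \qquad \text{nulls at } f = \frac{n}{t_r} = n \cdot 400\ \text{MHz}.
\]
% Periodic repetition with period T samples the spectrum at k/T
\[
  c_k = \frac{1}{T}\, I\!\left(\frac{k}{T}\right)
      = I_p\, \frac{t_r}{T}\, \mathrm{sinc}^2\!\left(\frac{k\, t_r}{T}\right),
  \qquad f_k = \frac{k}{T} = k \cdot 200\ \text{MHz}.
\]
% With t_r = T/2 the even harmonics fall on the nulls
\[
  c_0 = \frac{I_p}{2}, \quad
  c_1 = \frac{2 I_p}{\pi^2} \approx 0.203\, I_p, \quad
  c_2 = 0, \quad
  c_3 = \frac{2 I_p}{9\pi^2} \approx 0.023\, I_p .
\]
```

With tr = T/2 the spectral lines of the pulse train therefore sit at DC, 200 MHz, and 600 MHz, while 400 MHz is suppressed.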

38 Theoretical Background
Modulated pulse train: T = 5 ns, Fmod = 15 MHz
Current spectrum: DC, 15 MHz, 185 MHz, 200 MHz, 215 MHz, 585 MHz, 600 MHz, 615 MHz
(sidebands at 200 MHz ± 15 MHz and 600 MHz ± 15 MHz)
Voltage peak for an ideal 1 Ω resistor: Vpeak_cal ≈ 0.75 V, Vpeak_sim = 0.80 V
Voltage peak for an RLC with resonance at 200 MHz: Vpeak_cal ≈ 141 mV, Vpeak_sim = 144.5 mV
[Figure: spectra of the single current demand pulse and of the modulated pulse train]
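The spectral lines listed on this slide can be reproduced numerically. The sketch below assumes plain amplitude modulation of a unit triangular pulse train; the sample rate, record length, and modulation depth are hypothetical choices made so that every tone falls on an exact DFT bin.

```python
import cmath
import math

FS_MHZ = 2000             # assumed sample rate: 2 GHz
N = 400                   # 200 ns record, so bins are 5 MHz apart
SAMPLES_PER_PERIOD = 10   # T = 5 ns at 2 GHz
F_MOD_BIN = 3             # 15 MHz modulation = 3 bins of 5 MHz
DEPTH = 0.5               # hypothetical modulation depth

def triangle(n):
    """Unit triangular pulse train with tr = tf = T/2."""
    p = n % SAMPLES_PER_PERIOD
    return 1.0 - abs(2.0 * p / SAMPLES_PER_PERIOD - 1.0)

signal = [triangle(n) * (1.0 + DEPTH * math.cos(2.0 * math.pi * F_MOD_BIN * n / N))
          for n in range(N)]

def dft_bin(x, k):
    """Single normalized DFT coefficient of x at bin k."""
    return sum(v * cmath.exp(-2j * math.pi * k * n / len(x))
               for n, v in enumerate(x)) / len(x)

bin_mhz = FS_MHZ // N
lines_mhz = [k * bin_mhz for k in range(700 // bin_mhz + 1)
             if abs(dft_bin(signal, k)) > 1e-6]
print(lines_mhz)
```

Under these assumptions the lines below 700 MHz are DC, the 15 MHz modulation tone, and the ±15 MHz sidebands around the 1st and 3rd harmonics, matching the list on the slide.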

39 Resonance-aware Modulations
[Figure: regular VCD run vs. resonance aware run; for each, FFT of current (I) and voltage (V) at the bumps and at the device]

40 What if “Multiple Resonance” ..?

41 What if “Multiple Resonance” ..?
[Figure: piece-wise linear current; modulating waveform; AM-modulated waveform]
The VCD run is modulated with multiple frequencies to perturb the PDN at multiple anti-resonance peaks.
On average, rogue wave leads to more voltage variation than resonance aware.

42 Vmax LDO-PDN Voltage Drop (Overshoot)
Overshoot is a main concern for:
  Reliability of devices
  Hold margins

43 Vmin LDO-PDN Voltage Drop (Undershoot)
Undershoot is a main concern for:
  Functional failures
Optimum configuration:
  Optimal decap = 350 pF
  Optimal power = 20 uW
  Noise = ~10 mV

44 Future Research Directions
Variation-aware voltage-timing analysis
  Temporal and spatial voltage variation impact
  Correlation with AOCV margins
Architecture/software/PDN co-design
  PDN-friendly architectures [V. Kontorinis, A. Shayan, MICRO09]
Distributed LDO based power management
  Heterogeneous on-die regulations
Emerging technologies
  Feasibility of on-die SMPS
  Continue towards heterogeneous 3D integration

45 Conclusion Remarks
Proposed an enhanced time-frequency domain flow: parallel, efficient flow with SPICE accuracy for co-simulation in both domains.
Reliability-aware 3D PDN analysis: up to 70% voltage drop.
Worst-case current loading generation based on rogue wave: realistic scenario.
On-die LDO based PDN optimization: trading power for noise.
Predictive performance model: within 4.3% - 6% of SPICE accuracy.
Silicon correlation of power integrity on performance: up to 15% impact on Fmax.
More interesting research avenues:
  Architecture/software/PDN co-design.
  Distributed regulations for heterogeneous systems.

46 Back Up

47 Complexity Proof for the Rogue Wave Synthesis
We assume there are K intervals in the time span, with a window size of m. The complexity of sorting is O(K log K). In the loop that synthesizes the worst-case current signature, each convolution costs O(m log m), and the loops repeat the convolution K*(N-m) times. Therefore, the complexity of synthesizing the worst case is O(K*(N-m)*m log m), and the complexity of the whole algorithm is O(K*(N-m)*m log m + K log K). Since K = ceiling(N/m), the complexity can be expressed as O(N^2 log m).
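The last step of the proof can be spelled out as follows; this is a short worked bound, assuming m ≥ 2 and base-2 logarithms so that log m ≥ 1.

```latex
\[
  K = \left\lceil \frac{N}{m} \right\rceil \le \frac{N}{m} + 1
  \quad\Longrightarrow\quad
  K\,(N-m)\,m \log m \;\le\; (N+m)(N-m)\log m \;\le\; N^2 \log m .
\]
\[
  K \log K \;\le\; N \log N \;\le\; N^2 \;\le\; N^2 \log m
  \qquad (m \ge 2).
\]
\[
  \therefore\quad
  O\!\left(K\,(N-m)\,m\log m + K\log K\right) = O\!\left(N^2 \log m\right).
\]
```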

48 Vector current template

49 Worst case noise vs. partition resolution

50 Performance Prediction
Power distribution network (PDN) is a major consumer (30+%) of interconnect resources, so we seek efficient early-stage PDN optimization
Correct optimization of the PDN requires understanding the implications on delay
Our proposed models attempt to provide such implications accurately and efficiently
[Flow diagram labels: circuit model; stimuli; layout; decaps, ESR, ...; PDN optimization; noise waveform characteristics; worst-case Vdd; our proposed models; critical path; performance constraints; Perf OK? (no / yes: done)]

51 Multivariate Adaptive Regression Splines (MARS)
MARS is a nonparametric regression technique
MARS builds models of the form: \hat{f}(x) = \sum_{i=1}^{k} c_i B_i(x)
Each basis function B_i(x) can be:
  a constant
  a "hinge" function max(0, c - x) or max(0, x - c)
  a product of two or more hinge functions
Two modeling steps:
  (1) forward pass: obtains a model with a defined maximum number of terms
  (2) backward pass: improves generality by avoiding an overfit model
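How a model built from hinge functions is evaluated can be shown with a minimal sketch. The variable names loadout and offsetnoise follow the slides, but the knots, coefficients, and intercept below are hypothetical, and the code only evaluates a given model; it does not perform the forward or backward pass.

```python
from math import prod

def hinge(x, knot, direction):
    """max(0, x - knot) when direction > 0, otherwise max(0, knot - x)."""
    return max(0.0, x - knot) if direction > 0 else max(0.0, knot - x)

def mars_predict(intercept, terms, sample):
    """Evaluate intercept + sum_i c_i * B_i(x), each B_i a product of hinge factors."""
    total = intercept
    for coeff, factors in terms:
        total += coeff * prod(hinge(sample[name], knot, d) for name, knot, d in factors)
    return total

# Hypothetical two-term model:
#   B1 = max(0, loadout - 0.02)
#   B2 = max(0, 0.1 - offsetnoise) * B1
terms = [
    (2.0, [("loadout", 0.02, +1)]),
    (3.0, [("offsetnoise", 0.1, -1), ("loadout", 0.02, +1)]),
]
delay = mars_predict(1.0, terms, {"loadout": 0.05, "offsetnoise": 0.0})
print(delay)
```

Because each term is a closed-form product of piecewise-linear factors, evaluation is cheap, which is what makes such models usable inside an optimization loop.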

52 Example MARS Output Models
Delay Model
B1 = max(0, loadout - 0.021);
B2 = max(0, - loadout);
…
B98 = max(0, offsetnoise + 2.4e-12)×B92;
B100 = max(0, offsetnoise + 2.4e-12)×B37;
dcell = 1.02e e-10×B e-10×B e-11×B3+… - 1.71e-7×B e-7×B e-8×B100
Output Slew Model
B1 = max(0, loadout - );
B2 = max(0, cellsize - 4)×B1;
…
B99 = max(0, slewnoise)×B55;
B100 = max(0, offsetnoise )×B94;
slewout = 1.23e e-10×B1 - 2.05e-10×B e-9×B3 + … e-8×B98 - 4.33e-9×B99 - 7.42e-9×B100
Closed-form expressions with respect to cell and supply voltage noise parameters
Suitable to drive early-stage PDN design exploration

53 Worst-Case Performance Model
GOAL: find the set of seven parameters (7-tuple) for which the path delay is maximum
→ Mapping from the set of all 7-tuples to cell delay and output slew values
In a single stage: pick the 7-tuple with maximum delay
In a multi-stage path:
  Output slew of the previous stage becomes the input slew to the current stage
  Noise offset must be adjusted according to delay and output slew values of the previous stages
Worst-case configuration is always an element of |slewin|×|loadout|×|cellsize|×|ampnoise|×|slewnoise|×|offsetnoise|×|temp|
In our studies, we use configurations
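For a single stage, the search over 7-tuples amounts to evaluating a delay model on the Cartesian product of the parameter grids. The sketch below assumes a made-up monotonic delay function (toy_delay) standing in for the fitted model; the cellsize, ampnoise, slewnoise, offsetnoise, and temp grids follow the experimental setup slide, while most slewin and loadout values are hypothetical fillers.

```python
from itertools import product

GRID = {
    "slewin": [0.01, 0.1, 0.56],        # ns; only 0.56 is from the slides, rest hypothetical
    "loadout": [0.0009, 0.01, 0.05],    # pF; only 0.0009 is from the slides, rest hypothetical
    "cellsize": [1, 4, 8, 20],
    "ampnoise": [0, 0.054, 0.144, 0.27],             # V
    "slewnoise": [0.01, 0.04, 0.07, 0.09],           # ns
    "offsetnoise": [-0.15, -0.05, 0, 0.05, 0.15],    # ns
    "temp": [-40, 25, 80, 125],                      # deg C
}

def toy_delay(slewin, loadout, cellsize, ampnoise, slewnoise, offsetnoise, temp):
    """Hypothetical stand-in for a fitted cell delay model (not the authors' model)."""
    intrinsic = 0.1 + 0.5 * slewin + 20.0 * loadout / cellsize
    noise = 1.0 + 2.0 * ampnoise * (1.0 - abs(offsetnoise) / 0.2) * (1.0 - slewnoise)
    thermal = 1.0 + 0.001 * (temp - 25)
    return intrinsic * noise * thermal

def worst_case(grid, delay_fn):
    """Exhaustively search the grid for the configuration with maximum delay."""
    names = list(grid)
    best = max(product(*(grid[n] for n in names)), key=lambda cfg: delay_fn(*cfg))
    return dict(zip(names, best)), delay_fn(*best)

worst_cfg, worst_delay = worst_case(GRID, toy_delay)
print(worst_cfg, round(worst_delay, 4))
```

For a multi-stage path the same search would be chained, feeding each stage's output slew into the next stage and shifting the noise offset by the accumulated delay, as the slide describes.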

54 Experimental Setup and Results
Scripting to generate SPICE decks for configurations
Three different paths with different numbers of stages: (1) only INV, (2) only ND2D, and (3) a mix of INV and ND2D
Models are insensitive to random selection of the training data set
Cell delay model within 6% of SPICE (on average)
Our multi-stage path delay within 4.3% of SPICE simulation
Worst-case predictions are in top 3 (out of configurations) w.r.t. list

Parameter     Values
slewin        { , , , , 0.56, } ns
loadout       {0.0009, , , } pF
cellsize      INV: {1, 4, 8, 20}; ND2D: {1, 2, 4, 8}
ampnoise      {0, 0.054, 0.144, 0.27} V
slewnoise     {0.01, 0.04, 0.07, 0.09} ns
offsetnoise   {-0.15, -0.05, 0, 0.05, 0.15} ns
temp          {-40, 25, 80, 125} °C

