Yiyu Shi, Jinjun Xiong+, Chunchen Liu and Lei He*

Presentation transcript:

Efficient Decoupling Capacitance Budgeting Considering Operation and Process Variations
Yiyu Shi*, Jinjun Xiong+, Chunchen Liu* and Lei He*
*Electrical Engineering Department, UCLA
+IBM T. J. Watson Research Center, Yorktown Heights, NY
This work is partially supported by NSF CAREER award and a UC MICRO grant sponsored by Altera, RIO and Intel.

Motivation
Continued semiconductor technology scaling leads to growing process variations, and statistical optimization has been actively researched to cope with them:
  Stochastic gate sizing for power reduction [Bhardwaj:DAC’05, Mani:DAC’05]
  Stochastic gate sizing for yield optimization [Davoodi:DAC’06, Sinha:ICCAD’05]
  Stochastic buffer insertion to minimize delay [He:TCAD’07]
  Adaptive body biasing with post-silicon tuning [Main:ICCAD’06]
However, all of this work ignores operation variation, such as:
  crosstalk difference over input vectors
  power supply noise fluctuation over time
  processor temperature variation over workload
A better design can be achieved by considering both operation and process variations.
As a vehicle to demonstrate this point, we study the on-chip decoupling capacitance insertion and sizing (decap budgeting) problem, taking both operation and process variations into account.

Decap Budgeting Overview
Function
  Load current causes voltage droop/bounce.
  Decap suppresses dynamic noise by supplying sudden current demands from local charge storage.
Side effects of adding too much decap
  Increased leakage
  Increased die area
  Risk of yield loss
Location matters
  The closer the decap is to the turbulent point, the more noise reduction can be achieved.
We need to add the minimum amount of decap, at the proper locations, that is still sufficient to reduce the noise.
We define the noise as the integral over time, from t0 to t1, of the area below Vn.
[Figure: power supply, load current, intrinsic cap and decap]

Decap Budgeting Problem Formulation
Objective
  Find the distribution and location of the white space so that the noise on the power network is minimized.
Constraints
  Circuit system constraints: KCL, KVL and circuit element equations
  Decap constraints: the amount of decap allowed at a location is limited
Limitations of existing work
  Most existing work in essence uses the worst-case load current to guarantee that there is no noise violation, which is too pessimistic.
  It is not clear how to provide a decap budgeting solution that is robust to the current loads under all kinds of operations of a circuit.

Major Contribution of our work
We develop a novel stochastic model for current loads, taking into account operation variation (temporal and logic-induced correlations) and process variation (systematic and random Leff variation).
We propose a formal method to extract operation variation and formulate a new decap budgeting problem using the stochastic current model.
We develop an effective yet efficient iterative alternative programming algorithm and conduct experiments using industrial designs.
Experiments show that considering both operation and process variations can reduce over-design significantly, which demonstrates the importance of considering operation variation.
This opens a new research direction for optimizing signal, power and thermal integrity with consideration of operation variation.

Outline
  Stochastic Modeling and Problem Formulation
  Algorithm
  Experimental Results
  Conclusions

Correlated Load Currents
Strong correlation between load currents due to operation variation
  Currents at different ports have logic-induced correlation.
    There is a large number of ports but a limited number of control bits.
    Currents at certain ports cannot reach their maximum at the same time because of the inherent logic dependency of a given design.
  Currents at the same port have temporal correlation.
    The system takes several clock cycles to execute one instruction.
    The currents cannot reach their maximum in every clock cycle.
Process variation
  The P/G network is robust to process variation, but the load currents have intra-die variation because the circuit suffers from process variation.
  Leff variation is one of the primary variation sources, and it is spatially correlated [Cao:DAC’05].

Current Sampling
Model the current in each clock cycle as a triangular waveform and assume constant rise/fall time.
  Other current waveforms can be used without affecting the algorithm.
  In our verification, we use the detailed, non-simplified current waveform.
Partition the circuit into blocks and assume no correlation between different blocks [Najm:ICCAD’05].
Run extensive simulation on each block to get the peak current value in each clock cycle and at each port.
Assume that temporal correlation exists only within a certain number of clock cycles, L.
  L can be the number of clock cycles needed to execute a certain function.

Stochastic Current Modeling
Divide the peak current values into sets according to clock cycle and port number.
  Each set contains the peak current values at port k in clock cycles j, j+L, j+2L, ...
  Example: take L = 2 and consider two ports over 8 consecutive clock cycles.
For each set, define a stochastic variable whose samples are the values in that set.
  For example, a variable with samples 0.1, 0.3, 0.5 and 0.7 has mean value 0.4.
The correlation between the variables of clock cycles j1 and j2 at the same port reflects the temporal correlation between those cycles.
The correlation between the variables of ports k1 and k2 in the same clock cycle reflects the logic-induced correlation between those ports.
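The grouping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name group_peak_currents, the 0-based cycle indexing and the sample values are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from statistics import mean


def group_peak_currents(peaks, L):
    """Group peak currents into sets keyed by (cycle offset j, port k).

    peaks[k][c] is the peak current at port k in clock cycle c (0-based).
    The set for (j, k) collects cycles j, j+L, j+2L, ... of port k.
    """
    sets = defaultdict(list)
    for k, per_cycle in enumerate(peaks):
        for c, value in enumerate(per_cycle):
            sets[(c % L, k)].append(value)
    return dict(sets)


# Hypothetical data: two ports, 8 consecutive clock cycles, L = 2.
peaks = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4],
]
sets = group_peak_currents(peaks, L=2)
first_set_mean = mean(sets[(0, 0)])  # samples 0.1, 0.3, 0.5, 0.7
```

With L = 2 and two ports this yields four sets, one stochastic variable per (cycle offset, port) pair.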

Extraction of Correlations
The logic-induced correlation coefficient between ports k1 and k2 at clock cycle j, and the temporal correlation coefficient between clock cycles j1 and j2 at port k, are both computed from the general definition of the correlation coefficient, applied to the corresponding pair of stochastic variables.
To take process variation into consideration, sample each variable multiple times over different regions; the same two formulas still apply.
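As a hedged illustration of the general definition, the sketch below computes the Pearson correlation coefficient (covariance divided by the product of the standard deviations) of two equally long sample lists. The function name correlation and the sample values are assumptions for illustration; the same function would serve for both the logic-induced and the temporal coefficient, depending on which two sample sets are passed in.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sample lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sample lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant samples")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples of two peak-current variables.
rho_same_trend = correlation([0.1, 0.3, 0.5, 0.7], [0.2, 0.4, 0.6, 0.8])
rho_opposite = correlation([1, 2, 3, 4], [4, 3, 2, 1])
```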

Extraction of Correlations
Because the peak-current variables are not Gaussian, apply Independent Component Analysis [Hyvarinen’01] to remove the correlation between them and obtain a new set of independent variables r1, r2, ...
Each peak-current variable can be represented as a linear combination of r1, r2, ...
Accordingly, the waveform in each clock cycle can be reconstructed from r1, r2, ...
The new variables ri capture both the operation and process variations.
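One way to write the linear combination, as a sketch only: the symbol I_{j,k} for the peak-current variable at clock cycle j and port k, the coefficients a, and the constant term are assumed notation, since the formula on the slide is not preserved in this transcript.

$$
I_{j,k} \approx a_{j,k}^{(0)} + \sum_{i=1}^{m} a_{j,k}^{(i)} \, r_i
$$

Here m is the number of independent components retained, and the coefficients would come from the mixing matrix estimated by ICA.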

Example of Extracted Temporal Correlation
The correlation map shows the peak currents between different clock cycles of one port from an industry application.
  The P/G network is modeled as an RC mesh.
  The load currents are obtained by detailed simulation of the circuit.
The correlation matrix can be clearly divided into four chunks, so L can be set to 10.

Parameterized MNA Formulation
Start from the original MNA formulation.
With the decap areas wi as design variables, the G and C matrices can be expressed in terms of wi.
Together with the stochastic current model, the MNA formulation becomes one with parameters wi and ri.
The objective now is to find the optimal solution for those parameters.
  More specifically, find the wi values that minimize the noise when the ri take the values corresponding to the load currents that introduce the maximum noise.
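The slide's equations are not preserved in this transcript. A plausible form, stated purely as an assumption, starts from the standard MNA state equation and makes the matrices affine in the decap areas and the source term affine in the independent current variables:

$$
G\,x(t) + C\,\dot{x}(t) = B\,u(t)
$$

$$
G(w) = G_0 + \sum_i w_i\,G_i, \qquad C(w) = C_0 + \sum_i w_i\,C_i, \qquad u(t; r) = u_0(t) + \sum_i r_i\,u_i(t)
$$

$$
G(w)\,x(t) + C(w)\,\dot{x}(t) = B\,u(t; r)
$$

The symbols G_0, G_i, C_0, C_i, u_0 and u_i are hypothetical names for the nominal terms and the per-parameter contributions.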

Stochastic Decap Formulation
Minimize the maximum noise, summed over all ports, subject to:
  upper/lower bounds on the stochastic current variables
  individual decap area constraints due to placement constraints
  a total decap area constraint
This is a non-convex min/max optimization problem, so the exact optimal solution is difficult to find.
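In symbols, the problem described above could be sketched as follows. The notation is assumed for illustration: noise_k is the noise at port k, the bounds on r_i and w_i and the total area W are hypothetical names for the limits listed on the slide.

$$
\min_{w} \; \max_{r} \; \sum_{k} \mathrm{noise}_k(w, r)
\quad \text{s.t.} \quad
r_i^{\mathrm{lo}} \le r_i \le r_i^{\mathrm{hi}}, \qquad
0 \le w_i \le w_i^{\max}, \qquad
\sum_i w_i \le W
$$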

Outline
  Stochastic Modeling and Problem Formulation
  Algorithm
  Experimental Results
  Conclusions

Iterative Programming Algorithm
In each iteration, we increase the white space allowed, until all the white space has been used up or the solution converges.
Each iteration alternates between two steps:
  Find the optimal decap budgeting for the given max droop/bounce, then update the decap budgeting.
  Find the input corresponding to the max droop/bounce for the given decap budgeting, then update the max droop/bounce.
The algorithm cannot guarantee optimality, but it can guarantee convergence and efficiency.
Experimental results show that our algorithm achieves good optimization results.
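The alternating structure can be sketched on a toy problem. Everything below is an assumption made for illustration, not the authors' implementation: the noise model, the finite list of candidate inputs, and the grid search that stands in for the two optimization sub-problems.

```python
from itertools import product


def noise(w, r):
    """Toy noise model (an assumption): the noise at each port grows with its
    load current r[k] and shrinks with the decap w[k] placed next to it."""
    return sum(rk / (1.0 + wk) for wk, rk in zip(w, r))


def worst_input(w, candidates):
    """Input pattern that causes the largest noise for a fixed budget w."""
    return max(candidates, key=lambda r: noise(w, r))


def best_budget(r, total, step):
    """Grid search for the budget minimizing noise for a fixed input r."""
    levels = [i * step for i in range(int(round(total / step)) + 1)]
    feasible = (w for w in product(levels, repeat=len(r))
                if sum(w) <= total + 1e-9)
    return min(feasible, key=lambda w: noise(w, r))


def iterative_budgeting(candidates, white_space, step, tol=1e-6):
    """Alternate between the two steps while releasing more white space."""
    w = (0.0,) * len(candidates[0])
    allowed = 0.0
    history = []
    while allowed < white_space - 1e-9:
        allowed = min(allowed + step, white_space)
        r = worst_input(w, candidates)     # input for max droop/bounce
        w = best_budget(r, allowed, step)  # budget for that input
        history.append(noise(w, worst_input(w, candidates)))
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break                          # treated as converged
    return w, history


# Hypothetical inputs: two ports that cannot both peak at the same time.
candidates = [(1.0, 0.2), (0.2, 1.0)]
w, history = iterative_budgeting(candidates, white_space=2.0, step=0.5)
```

On this toy problem the budget swings between the two ports from one iteration to the next, which is one way to see why such an alternating scheme does not guarantee an optimal min/max solution.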

Algorithm Outline

Illustration of Iterative Programming
A0: Initial noise curve at one randomly selected port
A1 (P3): The noise curve under the optimal decap budgeting for a given droop/bounce
A2 (P2): The noise curve with the input corresponding to the max droop/bounce for the decap budgeting in A1
A3 (P3): The noise curve under the optimal decap budgeting for the given max droop/bounce in A2

Impact of Incremental Step Size of White Space

Sequential Linear Programming
We apply sequential linear programming (sLP) to solve each of the two sub-problems.
For each sub-problem, we iterate the following two steps until the solution converges:
  Compute the first-order sensitivities of all the variables by moment matching.
  Linearize the objective function with the sensitivities, so that the optimization problem becomes an LP.

Sequential Quadratic Programming
To improve accuracy, sequential quadratic programming (sQP) can be applied instead of sLP.
For each sub-problem, we iterate the following two steps until the solution converges:
  Compute the first- and second-order sensitivities of all the variables by moment matching.
  Approximate the objective function to second order with the sensitivities, so that the optimization problem becomes a QP.
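The difference between the two schemes is the order of the local model built around the current iterate. As a sketch in assumed notation (f is the noise objective, v the vector of variables of the sub-problem, v^(0) the current iterate, s_i and h_ij the first- and second-order sensitivities):

$$
\text{sLP:} \quad f(v) \approx f(v^{(0)}) + \sum_i s_i \,\bigl(v_i - v_i^{(0)}\bigr)
$$

$$
\text{sQP:} \quad f(v) \approx f(v^{(0)}) + \sum_i s_i \,\bigl(v_i - v_i^{(0)}\bigr) + \frac{1}{2} \sum_{i,j} h_{ij}\,\bigl(v_i - v_i^{(0)}\bigr)\bigl(v_j - v_j^{(0)}\bigr)
$$

Each iteration solves the resulting LP or QP over the feasible set and re-expands around the new solution.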

Forward Euler Integration
Computing all the sensitivities requires solving differential equations in the time domain.
The methods used to solve such differential equations are called integration methods.
We first discretize time t as t1, t2, ... with step size h, and then expand x(t) in a Taylor series.

Forward Euler Integration (cont’d)
Using the first-order approximation together with the original discretized differential equation, the solution at the next time step follows directly from the solution at the current time step.
However, explicit integration methods of this kind share the common problem of numerical instability.
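The slide's equations are not preserved in this transcript. Assuming the equations have the usual MNA form with an invertible C matrix, the forward Euler step can be sketched as follows, with x_n standing for x(t_n) and u_n for u(t_n):

$$
C\,\dot{x}(t) + G\,x(t) = B\,u(t)
$$

$$
x(t_{n+1}) = x(t_n) + h\,\dot{x}(t_n) + O(h^2)
$$

$$
x_{n+1} \approx x_n + h\,C^{-1}\bigl(B\,u_n - G\,x_n\bigr)
$$

For the scalar test equation with derivative equal to -a x and a > 0, each step multiplies the solution by (1 - ah), so the iteration is stable only when h < 2/a.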

Backward Euler Integration
Instead of the explicit method, we use one more Taylor expansion.
Substituting it into the earlier expansion and combining the result with the original discretized equation gives an implicit update, in which the new solution appears on both sides.
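The stability difference between the explicit and implicit schemes shows up on a scalar decay equation whose derivative is -a x. This is a minimal sketch with hypothetical function names and parameter values, not the circuit equations from the slides.

```python
def forward_euler(a, x0, h, steps):
    """Explicit update: x[n+1] = x[n] + h * (-a * x[n])."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + h * (-a * xs[-1]))
    return xs


def backward_euler(a, x0, h, steps):
    """Implicit update: x[n+1] = x[n] + h * (-a * x[n+1]),
    which solves to x[n+1] = x[n] / (1 + a * h)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] / (1.0 + a * h))
    return xs


# Step size above the forward Euler stability limit 2 / a = 0.02.
fe_large = forward_euler(a=100.0, x0=1.0, h=0.05, steps=10)
be_large = backward_euler(a=100.0, x0=1.0, h=0.05, steps=10)

# Step size well inside the stability limit: both schemes decay.
fe_small = forward_euler(a=100.0, x0=1.0, h=0.001, steps=10)
be_small = backward_euler(a=100.0, x0=1.0, h=0.001, steps=10)
```

With the large step the explicit iterate grows in magnitude at every step, while the implicit iterate still decays toward zero like the true solution.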

Outline
  Stochastic Modeling and Problem Formulation
  Algorithm
  Experimental Results
  Conclusions

Impact of Current Correlations
Model 1: maximum current at all ports
Model 2: stochastic model with logic-induced correlation
Model 3: Model 2 + temporal correlation
Node #    Noise (V*s)                      Runtime (s)
          Model 1   Model 2   Model 3      Model 1   Model 2   Model 3
1284      6.33e-7   1.28e-7   4.10e-8      104.2     161.2     282.3
10490     5.21e-5   1.09e-5   4.80e-6      973.2     1430      2199
42280     7.92e-4   5.38e-4   9.13e-5      2732      3823      5238
166380    1.34e-2   5.37e-3   2.28e-3      3625      5798      7821
avg       1         1/2.68X   1/9.10X                1.50X     2.26X
Compared with the model assuming maximum currents at all ports, under the same decap area:
  The stochastic model with only spatial correlation reduces the noise by up to 3X.
  The stochastic model with both spatial and temporal correlation reduces the noise by up to 9X.
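The avg row appears to be the mean of the per-design ratios relative to Model 1. The sketch below recomputes those means from the table values; the helper name mean_ratio and the data layout are assumptions for illustration. The results land close to the reported figures.

```python
# Table values keyed by node count: (Model 1, Model 2, Model 3).
noise_vs = {
    1284: (6.33e-7, 1.28e-7, 4.10e-8),
    10490: (5.21e-5, 1.09e-5, 4.80e-6),
    42280: (7.92e-4, 5.38e-4, 9.13e-5),
    166380: (1.34e-2, 5.37e-3, 2.28e-3),
}
runtime_s = {
    1284: (104.2, 161.2, 282.3),
    10490: (973.2, 1430.0, 2199.0),
    42280: (2732.0, 3823.0, 5238.0),
    166380: (3625.0, 5798.0, 7821.0),
}


def mean_ratio(table, model):
    """Average over designs of (value for `model`) / (value for Model 1)."""
    ratios = [row[model] / row[0] for row in table.values()]
    return sum(ratios) / len(ratios)


noise_m2 = mean_ratio(noise_vs, 1)      # noise of Model 2 relative to Model 1
noise_m3 = mean_ratio(noise_vs, 2)      # noise of Model 3 relative to Model 1
runtime_m2 = mean_ratio(runtime_s, 1)   # runtime of Model 2 relative to Model 1
runtime_m3 = mean_ratio(runtime_s, 2)   # runtime of Model 3 relative to Model 1

print(f"noise:   1/{1 / noise_m2:.2f}X  1/{1 / noise_m3:.2f}X")
print(f"runtime: {runtime_m2:.2f}X  {runtime_m3:.2f}X")
```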

Impact of Leff Variation
Node #    V.R.   sLP                                    sLP + Leff
                 mean (V*s)   std (V*s)   runtime (s)   mean (V*s)   std (V*s)   runtime (s)
1284      10%    9.28e-7      3.97e-7     184.2         6.14e-7      1.38e-7     332.8 (1.81X)
          20%    9.43e-7      4.55e-7                   6.38e-7      1.86e-7
10490     10%    1.03e-4      4.79e-5     1121          7.22e-5      1.23e-5     3429 (3.06X)
          20%    1.22e-4      4.38e-5                   7.94e-5      2.06e-5
42280     10%    2.29e-3      9.72e-4     2236          8.23e-4      1.01e-4     6924 (3.10X)
          20%    4.43e-3      1.01e-3                   8.28e-4      1.92e-4
166380    10%    2.06e-2      9.91e-3     3824          5.31e-3      8.92e-4     11224 (2.93X)
          20%    2.31e-2      1.03e-2                   5.92e-3      9.33e-4
avg       10%    1                                      1/2.02X      1/5.05X     2.73X
          20%                                           1/1.95X      1/4.05X
Compared with the stochastic model that does not consider Leff variation, the stochastic model that does reduces the average noise by up to 4X and the 3-sigma noise by up to 13X.

Comparison between sLP and sQP
In terms of noise, sQP is much better than sLP for the large test cases and slightly worse for the small test case.
In terms of runtime, sLP is on average 3.25X faster than sQP.

Conclusions
We develop a novel stochastic model for current loads, taking into account operation variation (temporal and logic-induced correlations) and process variation (systematic and random Leff variation).
We propose a formal method to extract operation variation and formulate a new decap budgeting problem using the stochastic current model.
We develop an effective yet efficient iterative alternative programming algorithm and conduct experiments using industrial designs.
Experimental results show that the noise can be reduced by up to 9X.
We also apply a similar idea to temperature-aware clock routing [Hao:ispd’07] and microprocessor floorplanning (Section 8C.2).

Thank you!