Fast Buffer Insertion Considering Process Variation Jinjun Xiong, Lei He EE Department University of California, Los Angeles Sponsors: NSF, UC MICRO, Actel,

Presentation transcript:

Fast Buffer Insertion Considering Process Variation Jinjun Xiong, Lei He EE Department University of California, Los Angeles Sponsors: NSF, UC MICRO, Actel, Mindspeed

2 Agenda
• Introduction and motivation
• Modeling
• Problem formulation
• Detailed algorithms with complexity analysis
• Experimental results
• Conclusion

3 Buffer Insertion Flashback
• Buffer insertion and sizing is a commonly used technique for minimizing delay in high-performance chip designs
• Classic results on buffer insertion
  – Two-pin nets: closed form for optimal solution [Bakoglu 90]
  – Multi-pin nets: dynamic-programming based algorithm to find the optimal solution [Van Ginneken 90]
• Extensions
  – Multiple buffer libraries considering power minimization [Lillis 96]
  – Wire segmentation [Alpert DAC97]
  – Simultaneous buffer insertion and wire sizing [Chu ISPD97]
  – Simultaneous tree construction and buffer insertion [Okamoto DAC96]
  – Simultaneous dual Vdd assignment and buffered tree construction [Tam DAC05]
  – …

4 Design Optimization in Nanometer Manufacturing
• Probabilistic design approaches have shown great promise for achieving better design quality
  – Compared to deterministic approaches, statistical circuit tuning achieved
    • 20% area reduction [Choi DAC04]
    • 17% power reduction [Mani DAC05]
• Buffer insertion considering process variation has also been gaining attention recently, but existing work has limitations
  – Limited consideration of process variation
    • Wire-length variation [Khandelwal ICCAD03]
  – Independence assumption on process variation
    • Ignores global and spatial correlation [Xiong DATE05], [He ISPD05]
  – High complexity
    • Numerical integration to obtain the JPDF [Xiong DATE05]
  – Applicable only to special routing structures
    • Two-pin nets only [Deng ICCAD05]
• Our major contribution: theoretical foundations that lift these limitations

5 Agenda
• Introduction and motivation
• Modeling
• Problem formulation
• Detailed algorithms with complexity analysis
• Experimental results
• Conclusion

6 Modeling
• Linear delay model for buffer
  – Input capacitance (C_b), output resistance (R_b), and intrinsic delay (T_b)
• π-model for interconnect
  – Wire capacitance (C_w) and wire resistance (R_w)
• How to model these quantities with correlated process variation?

7 First-order Canonical Form for Variation Modeling
• Mean value E(A) = a_0
• Random variables X_1, X_2, …, X_n model
  – Die-to-die global variation: instances are affected in the same way
  – Within-die spatial correlation: instances physically nearby are more likely to be similar [Agarwal ASPDAC03, Chang ICCAD03, Khandelwal DAC05]
• Random variable X_Ra models
  – Independent variation: instances next to each other are different
• All X_i follow independent normal distributions
  – Well accepted practice in SSTA [Chang ICCAD03, Visweswariah DAC04]
• In vector form, write device and interconnect quantities with process variation, each with its own coefficient vector
  – Device: T_b = T_b0 + α_b^T X, C_b = C_b0 + β_b^T X, R_b = R_b0 + γ_b^T X
  – Interconnect: C_w = C_w0 + α_w^T X, R_w = R_w0 + β_w^T X
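
As a rough illustration of the canonical form on this slide, the sketch below assumes the usual first-order shape A = a_0 + Σ a_i X_i + a_R X_Ra with all X independent N(0, 1). The class name `Canonical` and its field names are hypothetical and are not taken from the slides.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Canonical:
    """a0 + sum_i a[i]*X_i + r*X_R, with X_i and X_R independent N(0, 1).

    Assumed shape of the first-order canonical form; names are illustrative.
    """
    a0: float        # mean value E(A)
    a: tuple         # sensitivities to the shared (global/spatial) sources X_i
    r: float = 0.0   # sensitivity to this quantity's own independent source

    def variance(self):
        return sum(x * x for x in self.a) + self.r ** 2

    def covariance(self, other):
        # Only the shared sources contribute; independent parts do not correlate.
        return sum(x * y for x, y in zip(self.a, other.a))

    def __add__(self, other):
        # Shared sensitivities add; independent parts combine as root-sum-square.
        return Canonical(self.a0 + other.a0,
                         tuple(x + y for x, y in zip(self.a, other.a)),
                         math.hypot(self.r, other.r))


A = Canonical(1.0, (3.0, 0.0), 4.0)
B = Canonical(2.0, (1.0, 2.0), 3.0)
S = A + B
print(S.a0, S.variance(), A.covariance(B))
```

The sum stays in canonical form, and its variance equals Var(A) + Var(B) + 2·Cov(A, B), which is what keeping the shared sources explicit buys.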

8 Buffer Insertion Considering Process Variation
• Given: a routing tree with required arrival time (RAT) and loading capacitance specified at the sinks, and N possible buffer locations
• Considering: both FEOL device and BEOL interconnect process variations
• Find: locations to insert buffers
• So that: the timing slack at the root is maximized
  – Timing slack: min_i (RAT_i - delay_i)
[Figure: routing tree with root, sinks s0 to s4, and possible buffer locations]

9 Agenda
• Introduction and motivation
• Modeling
• Problem formulation
• Detailed algorithms
  – Key operations for buffering solutions
  – Transitive-closure pruning rule
  – Complexity analysis
• Experimental results
• Conclusion

10 Key Operations in Van Ginneken Algorithm
• Associate each node with two metrics (C_t, T_t)
  – Downstream loading capacitance (C_t) and RAT (T_t)
  – The DP-based algorithm propagates potential solutions bottom-up [Van Ginneken 90]
• Add a wire
• Add a buffer
• Merge two solutions
• How to define these operations in a statistical sense?
[Figure: add a wire (C_w, R_w) from (C_n, T_n) to (C_t, T_t); add a buffer from (C_n, T_n) to (C_t, T_t); merge (C_n, T_n) and (C_m, T_m) into (C_t, T_t)]
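
The three operations can be sketched for the deterministic case as follows. This is a minimal sketch assuming the textbook Elmore-delay update rules with a π-model wire and a linear buffer model; the slide's own equations are not preserved in the transcript, and the function names are illustrative.

```python
def add_wire(c_n, t_n, r_w, c_w):
    """Propagate (C_n, T_n) upstream through a wire with resistance r_w, capacitance c_w."""
    # pi-model: half of the wire capacitance is seen downstream of the wire resistance
    return c_n + c_w, t_n - r_w * (c_w / 2.0 + c_n)


def add_buffer(c_n, t_n, c_b, r_b, t_b):
    """Insert a buffer with input capacitance c_b, output resistance r_b, intrinsic delay t_b."""
    # The buffer shields the downstream load, so the upstream load becomes c_b
    return c_b, t_n - t_b - r_b * c_n


def merge(c_n, t_n, c_m, t_m):
    """Join two subtrees at a branch point: loads add, the tighter RAT wins."""
    return c_n + c_m, min(t_n, t_m)


wired = add_wire(1.0, 100.0, r_w=2.0, c_w=4.0)
buffered = add_buffer(*wired, c_b=0.5, r_b=3.0, t_b=10.0)
joined = merge(1.0, 50.0, 2.0, 40.0)
print(wired, buffered, joined)
```

Add-wire and add-buffer involve products of resistance and capacitance, and merge involves a minimum; these are the operations that need a statistical counterpart.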

11 Atomic Operations
• Keep all quantities in canonical form after operations
  – Maintain correlation w.r.t. sources of variation
  – Updated solutions can still be handled by the same set of operations
• Add a wire
• Add a buffer
• Merge two solutions
• Atomic operations
  – Addition/subtraction of two canonical forms is another canonical form
  – Multiplication and minimum: the result is no longer a canonical form

12 Approximate Multiplication as Canonical Form
• Multiplication of two canonical forms results in a quadratic term
  – Matrix Ω = α β^T
• Approximate it as a canonical form by matching the mean and variance with those of the exact solution
  – E(C) is the mean value (first moment) of C
  – E(C^2) is the second moment of C, so E(C^2) - E(C)^2 is the variance
  – C' is a new canonical form with the same mean and variance as C
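
One way to realize the moment matching is sketched below. It assumes both factors are written as a_0 + α^T X and b_0 + β^T X with X ~ N(0, I), with any independent terms folded into X as extra entries. Placing the variance of the quadratic term on a fresh independent source is an assumption made for this sketch; the slides do not say how the matched variance is distributed over the coefficients.

```python
import math


def multiply(a0, alpha, b0, beta):
    """Approximate (a0 + alpha.X) * (b0 + beta.X) by c0 + gamma.X + r*X_new.

    The result has the exact mean and variance of the product.
    Hypothetical helper, not the authors' implementation.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    ab = dot(alpha, beta)
    # Exact mean: E[(alpha.X)(beta.X)] = alpha.beta
    c0 = a0 * b0 + ab
    # Linear part of the product keeps the correlation with the sources X
    gamma = [a0 * y + b0 * x for x, y in zip(alpha, beta)]
    # Exact variance of the quadratic term (alpha.X)(beta.X) for Gaussian X
    var_quadratic = dot(alpha, alpha) * dot(beta, beta) + ab * ab
    # Assumption: carry that variance on one new independent N(0, 1) source
    r = math.sqrt(var_quadratic)
    return c0, gamma, r


c0, gamma, r = multiply(2.0, (1.0, 0.0), 3.0, (0.0, 1.0))
print(c0, gamma, r)
```

For (2 + X_1)(3 + X_2) the exact variance is 9 + 4 + 1 = 14, and the sketch reproduces it as |gamma|² + r² = 13 + 1.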

13 Closed Form for Moment Computation
• Theorem: If X is an independent multivariate normal distribution ~ N(0, I), then for any vector γ and matrix Ω
  [Equations: closed-form 1st moment and 2nd moment]
  – The trace of a matrix (tr) equals the sum of all diagonal elements
• In general, tr(Ω) and tr(Ω^2) are expensive, but if Ω = α β^T + β α^T (a low-rank matrix), we can show
  [Equations: closed-form tr(Ω) and tr(Ω^2)]
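
The slide's equations are not preserved, so the block below only records the standard identities for Gaussian quadratic forms that a theorem of this kind relies on. The symbols c, γ, Ω, α, β are chosen here for illustration, and Ω is assumed symmetric.

```latex
% Assumptions: X ~ N(0, I), \Omega symmetric, Q = c + \gamma^T X + X^T \Omega X.
\begin{align*}
E[Q]   &= c + \operatorname{tr}(\Omega), \\
E[Q^2] &= \bigl(c + \operatorname{tr}(\Omega)\bigr)^2 + \gamma^T\gamma
          + 2\operatorname{tr}(\Omega^2), \\
\operatorname{Var}(Q) &= \gamma^T\gamma + 2\operatorname{tr}(\Omega^2).
\end{align*}
% For the low-rank case \Omega = \alpha\beta^T + \beta\alpha^T the traces
% reduce to inner products, which cost O(n) instead of a matrix product:
\begin{align*}
\operatorname{tr}(\Omega)   &= 2\,\alpha^T\beta, \\
\operatorname{tr}(\Omega^2) &= 2\,(\alpha^T\beta)^2
                               + 2\,(\alpha^T\alpha)(\beta^T\beta).
\end{align*}
```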

14 Approximate Minimum as Canonical Form
• The minimum of two canonical forms is also not a canonical form
• Approximate it as a canonical form by matching the exact mean and variance
  – Tightness probability of A (for A with mean a_0 and B with mean b_0): T_A = P(A < B) = Φ((b_0 - a_0)/θ)
    • Φ is the CDF of a standard normal distribution
    • θ is given by θ = sqrt(σ_A^2 + σ_B^2 - 2 cov(A, B))
  – Exact mean and variance can be computed in closed form [Clark 61]
    • Well known in statistical timing analysis
• Design for mean value ≠ design for nominal value because of the mean shift
[Figure: design for mean value vs. design for nominal value]
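
A minimal sketch of the minimum operation, assuming Clark's closed-form moments for two jointly normal variables. Weighting the shared coefficients by the tightness probability and putting the remaining variance on the independent term follows common SSTA practice and is an assumption here, as are all function and argument names.

```python
import math


def std_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stat_min(a0, a, ra, b0, b, rb):
    """min of A = a0 + a.X + ra*X_A and B = b0 + b.X + rb*X_B as a canonical form.

    Returns (mean, shared coefficients, independent coefficient, tightness of A).
    """
    var_a = sum(x * x for x in a) + ra * ra
    var_b = sum(x * x for x in b) + rb * rb
    cov = sum(x * y for x, y in zip(a, b))
    theta = math.sqrt(max(var_a + var_b - 2.0 * cov, 1e-30))
    z = (b0 - a0) / theta
    t_a = std_cdf(z)  # tightness probability P(A < B)
    # Clark's exact first and second moments of min(A, B)
    mean = a0 * t_a + b0 * (1.0 - t_a) - theta * std_pdf(z)
    second = ((a0 * a0 + var_a) * t_a + (b0 * b0 + var_b) * (1.0 - t_a)
              - (a0 + b0) * theta * std_pdf(z))
    var = max(second - mean * mean, 0.0)
    # Shared sensitivities: tightness-weighted blend (assumed practice)
    c = [t_a * x + (1.0 - t_a) * y for x, y in zip(a, b)]
    # Independent term absorbs whatever variance is left, clamped at zero
    rc = math.sqrt(max(var - sum(x * x for x in c), 0.0))
    return mean, c, rc, t_a


mean, c, rc, t_a = stat_min(0.0, [0.0], 1.0, 0.0, [0.0], 1.0)
print(mean, c, rc, t_a)
```

With two independent N(0, 1) inputs whose nominal values are both 0, the mean of the minimum is -1/sqrt(π), about -0.564, which is a small instance of the mean shift mentioned on the slide.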

15 Agenda
• Introduction and motivation
• Modeling
• Problem formulation
• Detailed algorithms
  – Key operations for buffering solutions
  – Transitive-closure pruning rule
  – Complexity analysis
• Experimental results
• Conclusion

16 Deterministic Pruning Rule
• If T_1 > T_2 and C_1 < C_2, then (C_1, T_1) dominates (C_2, T_2)
  – The dominated solution (C_2, T_2) is redundant
• Deterministic pruning has linear time complexity because of the following two desired properties
  – Ordering property
    • Either A > B or A < B holds
  – Transitive ordering (transitive-closure) property
    • A > B, B > C ⇒ A > C
  – These make it possible to sort solutions in order
    • Assuming solutions are sorted by load, redundant solutions are pruned in linear time
• Can we achieve the same time complexity for statistical pruning?
[Figure: solutions plotted as RAT vs. load, with redundant solutions marked]
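
The linear-time scan can be sketched as below. Solutions are (load, RAT) pairs. The sort is included only to make the example self-contained; in the DP the lists are kept sorted, so only the single pass counts. This sketch also drops ties, which is slightly more aggressive than the strict inequalities on the slide.

```python
def prune(solutions):
    """Return the non-dominated (load, RAT) pairs, sorted by increasing load."""
    kept = []
    # Sort by load; among equal loads put the best RAT first
    for load, rat in sorted(solutions, key=lambda s: (s[0], -s[1])):
        # A solution survives only if it buys a strictly better RAT than
        # every solution with smaller or equal load (the last one kept)
        if not kept or rat > kept[-1][1]:
            kept.append((load, rat))
    return kept


candidates = [(3, 10), (1, 5), (2, 4), (4, 12), (2, 8)]
survivors = prune(candidates)
print(survivors)
```

Because the ordering is total and transitive, comparing each solution with the last survivor is enough; no pairwise comparison is needed.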

17 Statistical Pruning Rule
• (C_1, T_1) dominates (C_2, T_2) if P(C_1 < C_2) ≥ 0.5 and P(T_1 > T_2) ≥ 0.5
• Properties of this statistical pruning rule
  – Ordering property
    • Given: T_1 and T_2 as two dependent random variables
    • Then: either P(T_1 > T_2) ≥ 0.5 or P(T_1 < T_2) ≥ 0.5 holds
  – Transitive-closure ordering property
    • Given: T_1, T_2, and T_3 as three dependent random variables with a joint normal distribution
    • If: P(T_1 > T_2) ≥ 0.5 and P(T_2 > T_3) ≥ 0.5
    • Then: P(T_1 > T_3) ≥ 0.5
  – The transitive-closure property can be extended to the more general case
    • P(T_1 > T_2) ≥ p, P(T_2 > T_3) ≥ p ⇒ P(T_1 > T_3) ≥ p for any p ∈ [0.5, 1]
• Statistical pruning has the same linear time complexity as deterministic pruning
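
For canonical forms the probabilities in this rule have a closed form, since the difference of two jointly normal variables is normal. The sketch below assumes each quantity is a tuple (mean, shared coefficients, independent coefficient); the representation and function names are illustrative.

```python
import math


def prob_greater(u, v):
    """P(U > V) for two canonical forms sharing the same variation sources."""
    (mu, au, ru), (mv, av, rv) = u, v
    # Shared sources partly cancel in U - V; independent parts add in variance
    sigma = math.sqrt(sum((x - y) ** 2 for x, y in zip(au, av)) + ru * ru + rv * rv)
    if sigma == 0.0:
        return 1.0 if mu > mv else (0.5 if mu == mv else 0.0)
    return 0.5 * (1.0 + math.erf((mu - mv) / (sigma * math.sqrt(2.0))))


def dominates(sol1, sol2, p=0.5):
    """(C1, T1) dominates (C2, T2) if P(C1 < C2) >= p and P(T1 > T2) >= p."""
    (c1, t1), (c2, t2) = sol1, sol2
    return prob_greater(c2, c1) >= p and prob_greater(t1, t2) >= p


t1 = (10.0, (1.0, 0.0), 0.5)
t2 = (9.0, (1.0, 0.0), 0.5)
c1 = (2.0, (0.1, 0.0), 0.0)
c2 = (3.0, (0.1, 0.0), 0.0)
print(prob_greater(t1, t2), dominates((c1, t1), (c2, t2)))
```

At p = 0.5 the test reduces to comparing means, because P(U > V) ≥ 0.5 exactly when E(U) ≥ E(V) under joint normality. That is why solutions can be sorted and pruned in one pass, as in the deterministic case.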

18 Deterministic vs Statistical Buffering
• Same O(N^2) complexity as the classic deterministic buffering algorithm
• Deterministic merge and pruning operations can be combined into one linear-time operation
  – New complexity result: O(N*log 2 (N)) [Wei, DAC 03]
• Statistical merge and pruning cannot be combined
  – The statistical version's complexity is higher
• Deterministic algorithm (all quantities are deterministic values)
    For solution (C_n, T_n) in node t
        Z_1 = ADD-WIRE(C_n, T_n);
        Z_2 = ADD-BUFFER(Z_1);
    ……
    For solution (C_m, T_m) from subtree m
        For solution (C_n, T_n) from subtree n
            (C_t, T_t) = MERGE((C_m, T_m), (C_n, T_n));
    ……
    Z = PRUNE(Z);
• Statistical algorithm (all quantities are canonical forms)
    For solution (C_n, T_n) in node t
        Z_1 = STAT-ADD-WIRE(C_n, T_n);
        Z_2 = STAT-ADD-BUFFER(Z_1);
    ……
    For solution (C_m, T_m) from subtree m
        For solution (C_n, T_n) from subtree n
            (C_t, T_t) = STAT-MERGE((C_m, T_m), (C_n, T_n));
    ……
    Z = STAT-PRUNE(Z);

19 When Can Merge and Prune Be Combined?
• Made possible via a merge-sort-like operation in the deterministic case
  – Because of the following property: Min(A_1, B_1) ≤ Min(A_2, B_1) if A_1 ≤ A_2
    • Min(A, B) ≤ Min(A + ΔA, B)
    • Min(A, B) is a nondecreasing function of its inputs
• In the statistical case, such a property does not hold (even for the mean)
[Figure: RAT vs. load plots]
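
The merge-sort-like operation for the deterministic case can be sketched as follows, assuming each input list is already pruned and sorted so that load and RAT both increase. The function name is illustrative.

```python
def merge_sorted(left, right):
    """Merge two pruned solution lists in O(len(left) + len(right)).

    Each list holds (load, RAT) pairs with load and RAT both increasing.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        (c1, t1), (c2, t2) = left[i], right[j]
        out.append((c1 + c2, min(t1, t2)))
        # Only advancing the side that limits the RAT can improve min(t1, t2);
        # advancing the other side would add load for no RAT gain.
        if t1 <= t2:
            i += 1
        if t2 <= t1:
            j += 1
    return out


merged = merge_sorted([(1, 5), (3, 9)], [(2, 6), (4, 10)])
print(merged)
```

The pointer walk is valid because min is nondecreasing in each argument. Per the slide, this monotonicity is what fails in the statistical case, so all pairs must be merged and then pruned.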

20 Agenda
• Introduction and motivation
• Modeling
• Problem formulation
• Detailed algorithms
  – Key operations for buffering solutions
  – Transitive-closure pruning rule
  – Complexity analysis
• Experimental results
• Conclusion

21 Experimental Setting
• Variation setting
  – Global, spatial, and independent variations are each set to 5% of the nominal value
  – Spatial variation uses a grid model similar to [Chang ICCAD03]
    • Grid size 500 um
    • Correlation distance about 2 mm (beyond that, no spatial correlation)
• Benchmarks
  – Two sets of benchmarks from the public domain [Shi DAC03]
• Deterministic design for worst case (WORST)
  – All parameters projected to their respective 3-sigma values

22 Runtime Comparison
• Compared with T2P proposed in [Xiong DATE05]
  – Only known work that considered both device and interconnect variations
  – JPDF computed via expensive numerical integration
  – No global and spatial correlation considered
  – Heuristic pruning rules (T2P)
• Re-implemented T2P under the same first-order variation model, but still using its heuristic pruning rule
[Table: runtime comparison; columns Bench, Sink, Buf Loc, WORST, T2P, This work; rows for the p and r benchmarks; the only surviving entry is (25.4x)]

23 Monte Carlo Simulation Results
• For a given buffered routing tree with 10K MC runs, delay PDF at the root
  – The PDF from Monte Carlo roughly follows a normal distribution
  – Our approximation technique captures the PDF well
• Figures of merit: 3-sigma delay vs yield loss
[Figure: delay PDFs at the root, with the 3-sigma delay marked for the red PDF]

24 Timing Optimization Comparison based on MC
• Buffer insertion considering process variation improves timing yield by 15% on average
  – More effective for large benchmarks
  – Relative mean (or 3-sigma) delay improvement is small because the mean values are large
  – More buffers are inserted in order to achieve this gain
[Table: columns WORST (Buffer, Mean, 3-sigma Delay, Yield Loss) and This Work (Buffer, Mean, 3-sigma Delay, Yield); surviving entries by row]
p: % 60 (3.3%) %
p: % 156 (4.5%) %
r: % 65 (9.2%) %
r: (1.7%) 1128 (1.5%) 35.3% 135 (17%) %
r: (0.7%) 1147 (0.5%) 1.6% 188 (8%) %
r: (1.5%) 1723 (1.4%) 54.9% 374 (14.4%) %
r: (1%) 1986 (1.0%) 17.9% 608 (10.5%) %
Avg: 0.7% 0.6% 15.7% 9.6%

25 Conclusion and Future Work
• Developed a novel algorithm for buffer insertion considering process variation
• Two major theoretical contributions
  – An effective approximation technique to handle the nonlinear multiplication operation, all through closed-form computation
  – A provably transitive-closure pruning rule
  – May be useful for other applications
• Timing optimization results show that considering process variation can improve timing yield by more than 15%
• Future work
  – Theoretically examine the impact of process variation on buffering
  – Apply the theories in this work to other design applications

26 Questions?