An O(nm) Time Algorithm for Optimal Buffer Insertion of m Sink Nets Zhuo Li and Weiping Shi {zhuoli, Texas A&M University College Station,

Slides:



Advertisements
Similar presentations
Porosity Aware Buffered Steiner Tree Construction C. Alpert G. Gandham S. Quay IBM Corp M. Hrkic Univ Illinois Chicago J. Hu Texas A&M Univ.
Advertisements

Gregory Shklover, Ben Emanuel Intel Corporation MATAM, Haifa 31015, Israel Simultaneous Clock and Data Gate Sizing Algorithm with Common Global Objective.
OCV-Aware Top-Level Clock Tree Optimization
Introduction to Computer Science 2 Lecture 7: Extended binary trees
Advanced Interconnect Optimizations. Buffers Improve Slack RAT = 300 Delay = 350 Slack = -50 RAT = 700 Delay = 600 Slack = 100 RAT = 300 Delay = 250 Slack.
Ispd-2007 Repeater Insertion for Concurrent Setup and Hold Time Violations with Power-Delay Trade-Off Salim Chowdhury John Lillis Sun Microsystems University.
4/22/ Clock Network Synthesis Prof. Shiyan Hu Office: EREC 731.
Topology-Aware Buffer Insertion and GPU-Based Massively Parallel Rerouting for ECO Timing Optimization Yen-Hung Lin, Yun-Jian Lo, Hian-Syun Tong, Wen-Hao.
© KLMH Lienig Paper: A Unified Theory of Timing Budget Management Presented by: Hangcheng Lou Original Authors: Soheil Ghiasi, Elaheh Bozorgzadeh, Siddharth.
Buffer and FF Insertion Slides from Charles J. Alpert IBM Corp.
ELEN 468 Lecture 261 ELEN 468 Advanced Logic Design Lecture 26 Interconnect Timing Optimization.
Confidentiality/date line: 13pt Arial Regular, white Maximum length: 1 line Information separated by vertical strokes, with two spaces on either side Disclaimer.
1 Interconnect Layout Optimization by Simultaneous Steiner Tree Construction and Buffer Insertion Presented By Cesare Ferri Takumi Okamoto, Jason Kong.
CS16: Introduction to Data Structures & Algorithms
An Optimal Algorithm of Adjustable Delay Buffer Insertion for Solving Clock Skew Variation Problem Juyeon Kim, Deokjin Joo, Taehan Kim DAC’13.
Rajat K. Pal. Chapter 3 Emran Chowdhury # P Presented by.
Minimum-Buffered Routing of Non- Critical Nets for Slew Rate and Reliability Control Supported by Cadence Design Systems, Inc. and the MARCO Gigascale.
Branch and Bound Similar to backtracking in generating a search tree and looking for one or more solutions Different in that the “objective” is constrained.
Weiping Shi Department of Computer Science University of North Texas HiCap: A Fast Hierarchical Algorithm for 3D Capacitance Extraction.
Interconnect Optimizations. A scaling primer Ideal process scaling: –Device geometries shrink by  = 0.7x) Device delay shrinks by  –Wire geometries.
EE4271 VLSI Design Interconnect Optimizations Buffer Insertion.
Power Optimal Dual-V dd Buffered Tree Considering Buffer Stations and Blockages King Ho Tam and Lei He Electrical Engineering Department University of.
Jean-Charles REGIN Michel RUEHER ILOG Sophia Antipolis Université de Nice – Sophia Antipolis A global constraint combining.
Interconnect Optimizations
An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction Yan Lin 1, Yu Hu 1, Lei He 1 and Vijay Raghunathan 2 1 EE Department,
Retiming with Interconnect and Gate Delay CUHK CSE CAD Group Dennis Tong 29 th Sept., 2003.
Fast Buffer Insertion Considering Process Variation Jinjun Xiong, Lei He EE Department University of California, Los Angeles Sponsors: NSF, UC MICRO, Actel,
EE4271 VLSI Design Advanced Interconnect Optimizations Buffer Insertion.
ELEN 468 Lecture 271 ELEN 468 Advanced Logic Design Lecture 27 Interconnect Timing Optimization II.
Gate Sizing by Mathematical Programming Prof. Shiyan Hu
Interconnect Synthesis. Buffering Related Interconnect Synthesis Consider –Layer assignment –Wire sizing –Buffer polarity –Driver sizing –Generalized.
Advanced Interconnect Optimizations. Timing Driven Buffering Problem Formulation Given –A Steiner tree –RAT at each sink –A buffer type –RC parameters.
VLSI Physical Design: From Graph Partitioning to Timing Closure Paper Presentation © KLMH Lienig 1 EECS 527 Paper Presentation Accurate Estimation of Global.
A Methodology for Interconnect Dimension Determination By: Jeff Cobb Rajesh Garg Sunil P Khatri Department of Electrical and Computer Engineering, Texas.
1 Coupling Aware Timing Optimization and Antenna Avoidance in Layer Assignment Di Wu, Jiang Hu and Rabi Mahapatra Texas A&M University.
EE 5900 Advanced Algorithms for Robust VLSI CAD, Spring 2009 Static Timing Analysis and Gate Sizing.
Cut-based & divisive clustering Clustering algorithms: Part 2b Pasi Fränti Speech & Image Processing Unit School of Computing University of Eastern.
A Polynomial Time Approximation Scheme For Timing Constrained Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, Charles J. Alpert** *Dept of Electrical.
1 Design Space Exploration for Power-Efficient Mixed-Radix Ling Adders Chung-Kuan Cheng Computer Science and Engineering Depart. University of California,
Low-Power Gated Bus Synthesis for 3D IC via Rectilinear Shortest-Path Steiner Graph Chung-Kuan Cheng, Peng Du, Andrew B. Kahng, and Shih-Hung Weng UC San.
An Efficient Clustering Algorithm For Low Power Clock Tree Synthesis Rupesh S. Shelar Enterprise Microprocessor Group Intel Corporation, Hillsboro, OR.
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 5: Global Routing © KLMH Lienig 1 EECS 527 Paper Presentation Techniques for Fast.
Thermal-aware Steiner Routing for 3D Stacked ICs M. Pathak and S.K. Lim Georgia Institute of Technology ICCAD 07.
The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering.
Modern VLSI Design 3e: Chapter 4 Copyright  1998, 2002 Prentice Hall PTR Topics n Interconnect design. n Crosstalk. n Power optimization.
A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, and Charles J. Alpert** *Dept of ECE, Michigan Technological.
1 Prune-and-Search Method 2012/10/30. A simple example: Binary search sorted sequence : (search 9) step 1  step 2  step 3  Binary search.
1 Efficient Obstacle-Avoiding Rectilinear Steiner Tree Construction Chung-Wei Lin, Szu-Yu Chen, Chi-Feng Li, Yao-Wen Chang, Chia-Lin Yang National Taiwan.
GLOBAL ROUTING Anita Antony PR11EC1011. Approaches for Global Routing Sequential Approach: – Route the nets one at a time. Concurrent Approach: – Consider.
1 Shape Segmentation and Applications in Sensor Networks Xianjin Xhu, Rik Sarkar, Jie Gao Department of CS, Stony Brook University INFOCOM 2007.
1 ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time Jeng-Liang Tsai Tsung-Hao Chen Charlie Chung-Ping Chen (National.
Fast Algorithms for Slew Constrained Minimum Cost Buffering S. Hu*, C. Alpert**, J. Hu*, S. Karandikar**, Z. Li*, W. Shi* and C. Sze** *Dept of ECE, Texas.
Po-Wei Lee, Chung-Wei Lin, Yao-Wen Chang, Chin-Fang Shen, Wei-Chih Tseng NTU &Synopsys An Efficient Pre-assignment Routing Algorithm for Flip-Chip Designs.
A Stable Fixed-outline Floorplanning Method Song Chen and Takeshi Yoshimura Graduate School of IPS, Waseda University March, 2007.
Radhamanjari Samanta *, Soumyendu Raha * and Adil I. Erzin # * Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, India.
Physical Synthesis Buffer Insertion, Gate Sizing, Wire Sizing,
Foundations of Data Structures Practical Session #8 Heaps.
Space-Efficient Online Computation of Quantile Summaries SIGMOD 01 Michael Greenwald & Sanjeev Khanna Presented by ellery.
Routing Tree Construction with Buffer Insertion under Obstacle Constraints Ying Rao, Tianxiang Yang Fall 2002.
An Efficient Surface-Based Low-Power Buffer Insertion Algorithm
1ISPD'03 Process Variation Aware Clock Tree Routing Bing Lu Cadence Jiang Hu Texas A&M Univ Gary Ellis IBM Corp Haihua Su IBM Corp.
A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion Shiyan Hu*, Zhuo Li**, Charles Alpert** *Dept of Electrical.
A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion Shiyan Hu*, Zhuo Li**, Charles Alpert** *Dept of Electrical.
A Novel Timing-Driven Global Routing Algorithm Considering Coupling Effects for High Performance Circuit Design Jingyu Xu, Xianlong Hong, Tong Jing, Yici.
An O(bn 2 ) Time Algorithm for Optimal Buffer Insertion with b Buffer Types Authors: Zhuo Li and Weiping Shi Presenter: Sunil Khatri Department of Electrical.
Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University.
On-Chip Power Network Optimization with Decoupling Capacitors and Controlled-ESRs Wanping Zhang1,2, Ling Zhang2, Amirali Shayan2, Wenjian Yu3, Xiang Hu2,
Buffer Insertion with Adaptive Blockage Avoidance
Convex Hull obstacle start end 11/21/2018 4:05 AM Convex Hull
Objectives What have we learned? What are we going to learn?
Presentation transcript:

An O(nm) Time Algorithm for Optimal Buffer Insertion of m Sink Nets Zhuo Li and Weiping Shi {zhuoli, Texas A&M University College Station, Texas USA

2 ASPDAC06 Outline Introduction O(n) Time Algorithm for 2-pin Nets O(mn) Time Algorithm for m-pin Nets Experimental Results Conclusion

3 ASPDAC06 Buffer insertion and sizing is an effective method to reduce interconnect delay. With large number of nets and buffer positions, fast algorithms are crucial. Introduction Saxena, et al. [TCAD 2004]

4 ASPDAC06 Problem Formulation Given: A routing tree, possible buffer positions, sink capacitances and required arrival times (RAT), wire resistance and capacitance. Delay model: Elmore delay for interconnect and linear delay model for buffers. s0s0 s1s1 s2s2 s3s3 s4s4 b buffer types source m sinks n buffer positions

5 ASPDAC06 Maximum Slack Problem Find: Where to insert buffers so that the slack at the source Q(s 0 ) is maximized. )},()({min)( ii i ssdelaysRATsQ   s0s0 s1s1 s3s3 s4s4 with 2 buffers, Q(S 0 )= 100 ps s2s2 without buffer, Q(S 0 )= – 50 ps

6 ASPDAC06 Previous Research Maximum slack van Ginneken [ISCAS 90]: O(n 2 ) time for one buffer type, where n is the number of buffer positions. Lillis, Cheng and Lin [TCAS 96]: O(b 2 n 2 ) time, where b is the number of buffer types. Shi and Li [TCAD 05]: O(b 2 nlogn) time. Li and Shi [TCAD 06]: O(bn 2 ) time. Minimum cost (area, power, etc.) Lillis, Cheng and Lin [TCAS 96]: pseudo- polynomial time algorithm. Shi, Li and Alpert [ASPDAC 04]: buffer cost minimization is NP-hard if b is a variable.

7 ASPDAC06 Sinks m and Buffer Positions n All previous research assumes m and n are same order. In practice, m << n In one IBM chip, 95% of the most time- consuming nets have less than 5 sinks, while n is hundreds to thousands. In another IBM chip, 80% of the most time- consuming nets have less than 50 sinks. In this paper, we propose Simple algorithms and data structures. O(b 2 n) time for 2-pin nets. First linear time algorithm in terms of n O(bmn+b 2 n) time for m-pin nets.

8 ASPDAC06 Outline Introduction O(n) Time Algorithm for 2-pin Nets O(mn) Time Algorithm for m-pin Nets Experimental Results Conclusion

9 ASPDAC06 O(n) Algorithm for 2-pin Nets Given a 2-pin net with n buffer positions, 1 buffer type, R(w) and C(w) for each wire, and sink non-redundant candidates (Q 1, C 1 ), …, (Q n, C n ) in sorted order. Compute non-redundant candidates at source s0s0 Sink candidates (Q 1, C 1 ), (Q 2, C 2 ), … (Q n, C n ) 1 buffer type source n buffer positions

10 ASPDAC06 New View of Candidates Non-redundant candidates form a monotonically increasing sequence in (Q, C) plane Cap C Slack Q A 1 (6, 1) A 2 (7, 2) A 3 (15, 3) A 4 (19, 4) A 5 (21, 5) worse slack, worse cap, Redundant!

11 ASPDAC06 Slack Q A 3 (15, 3) Add C to All Candidates When we add capacitance to all candidates, they shift to right in the (Q, C) plane A 1 (6, 1) A 2 (7, 2) A 4 (19, 4) A 5 (21, 5) Add C=3 Cap C

12 ASPDAC06 Slack Q A 3 (15, 3) Subtract Q from All Candidates When we subtract Q from all candidates, they shift down in the (Q, C) plane A 1 (6, 1) A 2 (7, 2) A 4 (19, 4) A 5 (21, 5) Subtract Q=2 Cap C

13 ASPDAC06 Slack Q A 3 (15, 3) Add R to All Candidates When we add resistance to all candidates, they move right and down in the (Q, C) plane A 1 (6, 1) A 2 (7, 2) A 4 (19, 4) A 5 (21, 5) A 1 (5, 1) A 2 (5, 2) A 3 (12, 3) A 4 (15, 4) A 5 (16, 5) Add R=1 Cap C

14 ASPDAC06 Slack Q Combined Effect of Add Wires Add wire with R = 1, C = 1 Add wire with R= 2, C= 2 Initial Observations: 1.Max Q moves to left 3. Convex candidate useless    2. Right of Max Q redundant    Cap C

15 ASPDAC06 Convex Convex Pruning A1A1 A2A2 A3A3 A4A4 A5A5 A convex candidate can never give max Q Slack Q Cap C

16 ASPDAC06 Pruned Convex Pruning A1A1 A3A3 A4A4 A5A5 Convex pruning can be done in O(n) time. Convex pruning guarantees optimality for 2- pin nets Slack Q Cap C

17 ASPDAC06 Linked List Data Structure Non-redundant and convex pruned In increasing Q and increasing C order We store slack and capacitance of each candidate implicitly Three global variables Qa, Ca and Ra that are updated as wires and buffers are added Each candidate also has (q, c) pair that are never updated (q 2,c 2 ) Greater slackLess capacitance (q 1,c 1 )(q 3,c 3 )(q 4,c 4 )(q 5,c 5 )(q 6,c 6 )

18 ASPDAC06 (Q, C) Implicitly Stored Global variables that are updated Qa is accumulated wire delay Ca is accumulated wire capacitance, and Ra is accumulated wire resistance Each candidate also has (q i, c i ) that are not updated (Q, C) calculation Slack Q i = q i – Qa – Ra*c i Capacitance C i = c i + Ca For example (q, c) = (15, 1), (19, 2), (21, 3) If Qa=2, Ca=1, Ra=0, then (Q, C) are (15–2, 1+1), (19–2, 2+1), (21–2, 3+1) If Qa=0, Ca=0, Ra=1, then (Q, C) are (15–1*1, 1), (19–1*2, 2), (21–1*3, 3)

19 ASPDAC06 Add Wire For a wire of R(w) and C(w), we update (Q, C) values of all candidates in O(1) time, by updating only Qa, Ca and Ra: Qa = Qa + R(w)*C(w)/2 + R(w)*Ca Ca = Ca + C(w) Ra = Ra + R(w) For example (q, c) = (15, 1), (19, 2), (21, 3) and Qa=1, Ca=1, Ra=2 Add a wire R(w)=2, C(w)=1, then Qa = 1 + 2*1/2 + 2*1 = 4 Ca = = 2 Ra = = 4

20 ASPDAC06 Buffer delay Assume only one buffer type B for now. Let R(B) be driver resistance, C(B) be input capacitance, and t(B) be intrinsic delay. Define the best candidate  as the candidate that maximizes slack among all candidates after B is inserted. Define the new candidate  as the candidate formed by  with the buffer B Q(  ) = Q(  ) – R(B)C(  ) – t(B), C(  ) = C(B). Add Buffer

21 ASPDAC06 (Q, C) Plane View: Add Buffers Add wire with R = 1, C = 1 Add wire with R= 2, C= 2 Convex pruned Q C Observations: 1.Best candidate moves to left C(B) 2. New buffer position moves to left

22 ASPDAC06 Form and Insert New Candidates All n best candidates can be found in O(n) time Best candidate index always moves to left Best candidates can be found by local search All n new candidate can be inserted into the data structure in O(n) time Position of new candidate always moves to left To insert new candidate (Q, C) into the data structure, we set (q, c) values as q = Q + Qa + Ra*C, c = C – Ca It is now consistent with the implicit data structure

23 ASPDAC06 O(1) per deletion O(1) Algorithm for 2-Pin Nets total O(n) O(1) O(1) per del Initial Store all sink candidates in increasing Q and C order Perform convex pruning for sink candidates Let Qa = Ca = Ra = 0 Repeat the following, from sink to source: For each wire w Update Qa, Ca and Ra Prune redundant candidates at right, if any For each buffer position Search in decreasing Q order for the best candidate Form a new candidate Search in decreasing C order for the position of new candidate Insert new candidate Perform local redundancy pruning and convex pruning Total time O(n) Total memory O(n)

24 ASPDAC06 2-Pin Nets with b Buffer Types For each buffer type Bi Its best candidates always move to the left Positions of new candidates always move to the left Therefore, for each buffer type Bi, we use One pointer for best candidate, moves to left One pointer for new buffer position, moves to left Time complexity At most O(bn) new candidates are inserted: O(bn) At most O(bn) redundant candidates deleted: O(bn) At most O(n) wires are added: O(n) Each of the 2b pointers goes through the entire candidate list: 2b*O(bn) = O(b 2 n) Total time O(b 2 n) Total memory O(bn)

25 ASPDAC06 Outline Introduction O(n) Time Algorithm for 2-pin Nets O(mn) Time Algorithm for m-pin Nets Experimental Results Conclusion

26 ASPDAC06 Multi-Pin Nets Convex pruning is not optimal under merging Non-convex candidates could generate optimal solution with candidates from other merging branch Solution: Two lists NR list for non-redundant candidates, for storing candidates CP list for convex pruned candidates, for generating new candidates

27 ASPDAC06 O(mbn)O(b^2n) Initial For each sink s, create NR list and CP list, both contain only one candidate (Q(s), C(s)) Repeat the following, from sink to source: For each 2-pin segment Apply 2-pin algorithm on CP list When add wire, add to both CP and NR When insert new candidate, insert to both CP and NR For each merging point Perform redundancy pruning for NR lists of two branches Perform van Ginneken style merging of two NR lists, and create a new CP list Algorithm for m-Pin Nets Total time O(b 2 n+bmn) Total memory O(bn)

28 ASPDAC06 Outline Introduction O(n) Time Algorithm for 2-pin Nets O(mn) Time Algorithm for m-pin Nets Experimental Results Conclusion

29 ASPDAC06 Two-Pin Nets Number of buffer types = 8

30 ASPDAC06 Multi-Pin Nets (25 sinks) Number of buffer types = 8

31 ASPDAC06 Multi-Pin Nets (25 sinks) Number of buffer positions = 2567

32 ASPDAC06 Conclusion New algorithm for optimal buffer insertion O(b 2 n) for 2-pin nets O(bmn + b 2 n) for m-pin nets Theoretical innovations Simple data structure Fast new candidates generation Fast candidates insertion Practical applications Simple and robust Efficient on industrial test cases Extension to min cost buffer insertion and more general buffer delay models

33 ASPDAC06 m sinks and n buffer positions All previous algorithms assume m and n are of the same order. For most of nets in industrial applications, m is much less than n. Among 1000 most synthesis-time-consuming nets in one IBM ASIC chip, 95% nets with sinks less than 5, and n is generally tens or hundreds times larger than m. In another chip, among 5000 most time-consuming nets, 80% nets with sinks less than 50. O(mn) algorithm: Two-pin nets: O(b 2 n) time. Multi-pin nets: O(b 2 n + bmn) time. Simple data structures. The algorithm is linear respect to n. Add more buffer positions to improve timing.