On the Skew-Bounded Minimum Buffer Routing Tree Problem
C. Albrecht (Synopsys), A.B. Kahng, B. Liu, I. Mandoiu (UCSD), A. Zelikovsky (GSU)

ABSTRACT
We consider the problem of buffering a given tree with the minimum number of buffers under load cap and buffer skew constraints. Our contributions include:
- A proof that the greedy algorithm proposed by Tellez and Sarrafzadeh (TCAD'97) is suboptimal for all non-zero skew bounds
- An optimal dynamic programming algorithm for the problem
- Experimental results on test cases extracted from recent industrial designs showing that the dynamic programming algorithm has practical run time and saves up to 20% of the buffers inserted by the algorithm of Tellez and Sarrafzadeh

Motivation

In order to initiate meaningful placement and timing optimizations, every design flow requires early elimination of all electrical violations (e.g., load cap and slew violations), even for non-critical nets.

Bounds on load caps:
- Serve as proxies for signal slew rate bounds
- Improve coupling noise immunity
- Reduce delay uncertainty due to coupling noise
- Improve reliability with respect to hot-carrier and AC self-heating effects
- Facilitate technology migration since designs are more balanced
- Guarantee bounded input rise/fall times at buffers and sinks

For clock and test distribution, an additional design requirement is bounding the buffer skew, i.e., the difference between the maximum and the minimum number of buffers over all source-to-sink paths in a routing tree, since buffer skew is one of the main factors affecting the actual delay skew.

To make progress with any methodology, it is crucial to have a fast and resource-efficient method for fixing load cap and buffer skew violations. Of particular interest are practical methods for buffering non-critical nets that have up to tens of thousands of sinks (e.g., scan enable).

Minimum-Buffered Routing Problem

Given:
– Net N with source r and set of sinks S
– Binary routing tree T = (r, V, E) for N
– Input capacitance c_s for each sink s ∈ S
– Buffer input capacitance C_b
– Unit-length wire capacitance C_w
– Capacitive load upper bound C_U
– Buffer-skew bound Δ

Find: a buffering of the routing tree T such that
– The load cap of each buffer and of the source r is at most C_U
– The buffer skew is at most Δ
– The number of inserted buffers is minimized

[Figure: tree with bounded buffer load cap; labels C_U and 0.75 C_U, C_w = C_b = 0]
[Figure: tree with bounded buffer load cap and zero buffer skew; labels C_U and 0.75 C_U, C_w = C_b = 0, Δ = 0]
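The two constraints in the problem statement can be checked mechanically for any candidate buffering. The sketch below is only an illustration under simplifying assumptions: wire cap is taken as zero, and a buffer is modeled as a flag on a node meaning "a buffer sits at the top of the edge to the parent". The `Node` class and all function names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Node of a buffered routing tree (hypothetical representation).

    sink_cap is set for sinks only; buffered=True means a buffer sits at the
    top of the edge from this node to its parent, isolating the subtree.
    """
    children: List["Node"] = field(default_factory=list)
    sink_cap: Optional[float] = None
    buffered: bool = False


def load_seen_from_above(node: Node, c_b: float, loads: List[float]) -> float:
    """Return the cap this subtree presents to its parent and record the
    load driven by every buffer in `loads` (wire cap assumed zero)."""
    if node.sink_cap is not None:
        driven = node.sink_cap
    else:
        driven = sum(load_seen_from_above(ch, c_b, loads) for ch in node.children)
    if node.buffered:
        loads.append(driven)  # load driven by this buffer
        return c_b            # the parent only sees the buffer input cap
    return driven


def buffer_counts(node: Node, above: int = 0) -> List[int]:
    """Number of buffers on each source-to-sink path, left to right."""
    count = above + (1 if node.buffered else 0)
    if node.sink_cap is not None:
        return [count]
    return [c for ch in node.children for c in buffer_counts(ch, count)]


def is_feasible(root: Node, c_u: float, c_b: float, delta: int) -> bool:
    """Check the load cap bound c_u and the buffer skew bound delta."""
    loads: List[float] = []
    source_load = load_seen_from_above(root, c_b, loads)
    counts = buffer_counts(root)
    return (max(loads + [source_load]) <= c_u
            and max(counts) - min(counts) <= delta)
```

For a tree with one sink of cap 1.0 and two sinks of cap 0.75, buffering the first two sinks satisfies a load cap bound of 1.0 with skew bound 1 but not with skew bound 0, which is the kind of trade-off the problem statement captures.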

The Greedy Algorithm Proposed by Tellez and Sarrafzadeh (IEEE Trans. on CAD, vol. 16, 1997)

Bounded load cap w/o buffer skew bound
For each u ∈ V, in bottom-up order, do
– A. packNode(u): Let v and w be the two children of u. If cap(T_v) + cap(T_w) > C_U, add a buffer at the topmost position of the child branch with the largest cap (the greedy choice), then remove the subtree driven by the buffer
– B. packEdge(u): While cap(T_u) > C_U, add a buffer on edge (u, parent(u)) at the highest possible position still meeting the load cap bound C_U

packNode(u) w/ buffer skew bound Δ
– A.0 If l(T_v) < l(T_w) (the longest path of v is shorter than the longest path of w), then swap v and w
– A.1 If l(T_v) - l(T_w) > Δ, then insert l(T_v) - l(T_w) - Δ buffers at the topmost position of (u,w); exit if cap(T_u) < C_U
– A.2 Perform packNode(u) excluding child branches with maximum longest path; exit if cap(T_u) < C_U
– A.3 Insert buffers at the topmost position of child branches with shortest path equal to l(u) - Δ
– A.4 Perform packNode(u) considering only child branches with maximum longest path
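The load-cap-only part of the greedy procedure (step A, without the skew bound) can be sketched in a few lines. This is a simplified illustration, not the published algorithm: it assumes a binary tree, zero wire cap, a buffer input cap of at most half the load cap bound, and sink caps within the bound, so that the edge step (packEdge) never has to fire. The `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)
    sink_cap: Optional[float] = None
    buffered: bool = False  # buffer at the top of the edge to the parent


def greedy_pack(node: Node, c_u: float, c_b: float) -> float:
    """Bottom-up greedy buffering under a load cap bound only.

    Returns the cap the subtree presents to its parent. Assumes a binary
    tree, zero wire cap, c_b <= c_u / 2, and every sink cap <= c_u.
    """
    if node.sink_cap is not None:
        return node.sink_cap
    caps = [greedy_pack(ch, c_u, c_b) for ch in node.children]
    # packNode: while the children overload this node, buffer the child
    # branch with the largest cap (the greedy choice).
    while sum(caps) > c_u:
        i = max(range(len(caps)), key=caps.__getitem__)
        node.children[i].buffered = True
        caps[i] = c_b  # the buffered branch now presents only c_b
    return sum(caps)


def count_buffers(node: Node) -> int:
    """Total number of buffers inserted in the subtree."""
    return int(node.buffered) + sum(count_buffers(ch) for ch in node.children)
```

Because each branch is buffered at its topmost position, the buffered subtree is replaced by a single buffer input cap when the parent is processed, which is what "remove the subtree driven by the buffer" amounts to in this simplified setting.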

The Greedy Algorithm is Suboptimal

The greedy algorithm of Tellez and Sarrafzadeh finds the optimum buffering when the buffer skew bound Δ = 0. However, the algorithm is suboptimal for any buffer skew bound Δ > 0.

Counterexample 1: buffer skew bound Δ = 1; sink input caps c_u = C_U and c_v = c_x = 0.75 C_U; interconnect and buffers have zero cap.

[Figure: greedy buffering vs. optimum buffering of the counterexample tree; labels C_U and 0.75 C_U]

Why No Greedy Algorithm Will Work

To guarantee optimality, arbitrarily many solutions may be needed for a subtree in any bottom-up algorithm.
Counterexample 2: Δ = 1, C_w = C_b = 0, c_u = C_U, and c_v satisfies c_v 2^(d-2) … C_U, where d is the depth of T_a
- Greedy buffers one of the two branches into node a; this triggers the insertion of arbitrarily many buffers upstream due to the skew constraint
- Optimum: buffers as many of the 'v' nodes as needed in one of the two subtrees of node a

To guarantee optimality, solutions with … different longest path lengths may be required for a subtree in any bottom-up algorithm.
Counterexample 3: C_w = C_b = 0, … 'u' leaves, each with c_u = C_U – …, one 'v' leaf with c_v = …
- Optimum: depending on upstream tree topology, each of the following … bufferings may be the only way to complete the optimum solution

[Figures: counterexample trees with leaves labeled u and v and internal node a]

Dynamic Programming Algorithm

Initialize solution set L(u) = ∅ for all u ∈ V
For each u ∈ V, in bottom-up order, do
(1) Let v and w be the children of u
(2) For each buffering X ∈ L(v) and Y ∈ L(w), with l(X) ≥ l(Y), do
    (a) Let Z be X ∪ Y with max{0, l(X) - s(Y)} buffers added at the top
    (b) For each i = 0, …, min{max{0, s(X) - s(Y)}, l(X) - l(Y)} do
        – Let Z_i be Z with i buffers added at the top of edge (w,u)
        – EdgeBuffering(Z_i, u)
(3) Remove from L(u) all bufferings with more than NB buffers
(4) For each buffering with (nb, l, s) buffers in total, on the longest path, and on the shortest path, respectively, remove from L(u) all bufferings with parameters (nb+k, l+k, s+k) where k ≥ 2
Return the buffering X ∈ L(v) with minimum number of buffers

Procedure EdgeBuffering(X, u):
While cap(X) > C_U, add a buffer on edge (u, parent(u)) at the highest position meeting the load cap bound C_U
L(u) ← L(u) + X
If cap(X) > C_b then L(u) ← L(u) + X', where X' is X with an additional buffer just below parent(u)
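Steps (3) and (4) are the pruning rules that keep each solution list small. A minimal sketch follows, assuming each buffering is represented only by its parameter triple (nb, l, s); a full implementation would carry the actual buffer positions and the cap alongside. The function name `prune` and the parameter `nb_max` (standing in for NB) are hypothetical.

```python
from typing import List, Set, Tuple

# (nb, l, s): buffers in total, on the longest path, and on the shortest path
Sol = Tuple[int, int, int]


def prune(solutions: Set[Sol], nb_max: int) -> List[Sol]:
    """Apply steps (3) and (4) to the parameter triples of a solution list."""
    # Step (3): discard bufferings with more than nb_max buffers.
    kept = {sol for sol in solutions if sol[0] <= nb_max}

    # Step (4): discard (nb+k, l+k, s+k) with k >= 2 whenever (nb, l, s)
    # is present in the list.
    def dominated(sol: Sol) -> bool:
        nb, l, s = sol
        return any(
            (nb - k, l - k, s - k) in kept
            for k in range(2, min(nb, l, s) + 1)
        )

    return sorted(sol for sol in kept if not dominated(sol))
```

For example, `prune({(3, 2, 1), (5, 4, 3), (4, 3, 2), (9, 1, 0)}, 6)` drops `(9, 1, 0)` under step (3) and `(5, 4, 3)` under step (4) with k = 2, while `(4, 3, 2)` survives because it differs from `(3, 2, 1)` by only k = 1.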

Analysis

Correctness: By induction: for each buffering X of the branch driven by (u, parent(u)) there exist k > 0 and a buffering Y ∈ L(u) such that X is dominated by Y with k buffers added at the top
⇒ The dynamic programming algorithm returns an optimum feasible buffering

Runtime: For each node u ∈ T, the solution set L(u) computed by the dynamic programming algorithm contains at most 2(Δ+1)NB bufferings
⇒ The running time of the algorithm is O(n (Δ+1)^3 NB^2), where n, Δ, and NB are the number of sinks, the given skew bound, and a given upper bound on the optimum number of buffers, respectively

The bound is not known to be tight; in practice the runtime is much better
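The two bounds above can be evaluated directly to get a feel for their size. One plausible accounting, offered here as an assumption rather than as the authors' derivation, is that combining two child lists costs the square of the list size, with the inner loop over i contributing one more factor of the skew bound plus one. The function names and the value NB = 100 below are hypothetical; only the sink count 2676 appears in the slides.

```python
def list_size_bound(delta: int, nb: int) -> int:
    """Upper bound on the number of bufferings kept per node: 2(delta+1)NB."""
    return 2 * (delta + 1) * nb


def runtime_bound(n: int, delta: int, nb: int) -> int:
    """Value of the expression inside the big-O: n (delta+1)^3 NB^2."""
    return n * (delta + 1) ** 3 * nb ** 2


# Illustration with a 2676-sink net, skew bound 2, and an assumed NB of 100.
print(list_size_bound(2, 100))      # 600
print(runtime_bound(2676, 2, 100))  # 722520000
```

Since the slides report runtimes under one second on this net, the worst-case expression is clearly pessimistic for practical inputs.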

Experimental Results

- DP has practical runtime (less than 1 second for the 2676-sink test case)
- DP saves up to 20% of the buffers inserted by the Tellez-Sarrafzadeh algorithm
- Compared to zero-skew buffering, DP achieves a significant reduction in the number of inserted buffers even with a very small buffer skew bound (Δ = 1 or 2)

[Table: buffers inserted by TS97 and DP, with DP's percentage gain, per load cap bound C_U for Δ = 0, 1, 2, 3, 4, plus a lower bound (LB) column; numeric entries not recoverable]

On the Skew-Bounded Minimum Buffer Routing Tree Problem C. Albrecht (Synopsys Inc.) A.B. Kahng, B. Liu, I. Mandoiu (UC San Diego) A. Zelikovsky (Georgia State U.)

Minimum-Buffered Routing

Early elimination of load cap and slew violations is needed for all nets, even for non-critical ones.

Bounds on load caps:
- Serve as proxies for signal slew rate bounds
- Improve coupling noise immunity
- Reduce delay uncertainty due to coupling noise
- Improve reliability with respect to hot-carrier and AC self-heating effects
- Facilitate technology migration since designs are more balanced
- Guarantee bounded input rise/fall times at buffers and sinks

For clock and test distribution, an additional design requirement is bounding the buffer skew, i.e., the difference between the maximum and the minimum number of buffers over all source-to-sink paths in a routing tree.

Minimum-Buffered Routing Problem: Given a routed net, sink/buffer input caps, and unit-wire cap, insert the minimum number of buffers to satisfy given load cap and buffer skew constraints.
- Introduced by Tellez and Sarrafzadeh (IEEE TCAD'97), who gave a greedy algorithm

Our Contributions

A proof that the greedy algorithm of Tellez and Sarrafzadeh is suboptimal for all non-zero skew bounds
- We give examples showing that no greedy algorithm can achieve optimality

An optimal dynamic programming algorithm for the problem
- The algorithm computes lists of undominated feasible solutions for all subtrees, in bottom-up order
- Worst-case runtime is O(n (Δ+1)^3 NB^2), where n, Δ, and NB are the number of sinks, the skew bound, and a given upper bound on the optimum number of buffers, respectively
- Runtime is much better in practice

Experimental study of buffering algorithms on test cases extracted from recent industrial designs
- The dynamic programming algorithm uses significantly fewer buffers than the algorithm of Tellez and Sarrafzadeh

Results on a 2676-sink testcase

- DP has practical runtime (less than 1 second per run)
- DP saves up to 20% of the buffers inserted by the Tellez-Sarrafzadeh algorithm
- Compared to zero-skew buffering, DP achieves a significant reduction in the number of inserted buffers even with a very small buffer skew bound (Δ = 1 or 2)

[Table: buffers inserted by TS97 and DP, with DP's percentage gain, per load cap bound C_U for Δ = 0, 1, 2, 3, 4, plus a lower bound (LB) column; numeric entries not recoverable]