Improved Algorithms for Link-Based Non-tree Clock Network for Skew Variability Reduction Anand Rajaram †‡ David Z. Pan † Jiang Hu * † Dept. of ECE, UT-Austin.

Presentation transcript:

Improved Algorithms for Link-Based Non-tree Clock Network for Skew Variability Reduction
Anand Rajaram †‡, David Z. Pan †, Jiang Hu *
† Dept. of ECE, UT-Austin
‡ Texas Instruments, Dallas
* Dept. of EE, TAMU

Outline
- Introduction
- Review of link-based non-tree clock network
- Improved algorithms (over [Rajaram et al., DAC ’04])
  - Rule-based algorithm (δ rule)
  - Graph-theoretical approach (MST-based)
- Experimental results
- Conclusions

Clock Distribution Network
(Figure: register 1 launches signals and register 2 catches them; labels: Clock Network, d1, d2, D max, T.)
- Signal transfer is coordinated by the clock signal
- All registers are supplied with the clock signal by the clock distribution network
- Skew = d1 - d2
- Zero skew: d1 = d2
- Useful skew: d1 - d2 = δ12
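The skew definition above can be sketched in a few lines. This is a minimal illustration, not code from the presentation: the function names and the idea of a "global skew" over a set of sinks (largest minus smallest delay) are assumptions added for the example.

```python
def skew(d1, d2):
    """Skew between two registers, as defined on the slide: d1 - d2."""
    return d1 - d2


def global_skew(delays):
    """Assumed helper: spread between the slowest and fastest sink.

    `delays` maps a sink name to its clock arrival delay.
    """
    values = list(delays.values())
    return max(values) - min(values)


# Hypothetical delays for three sinks (arbitrary time units).
example_delays = {"a": 1.0, "b": 1.5, "c": 1.25}
example_skew = global_skew(example_delays)
```

A zero-skew tree in this sketch is one where `global_skew` returns 0; a useful-skew design would instead compare `skew(d1, d2)` against a target δ12.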

Clocks: Important Considerations & Objectives
- One of the biggest and most frequently switching nets
- Very sensitive to unwanted skew introduced by PVT
  - Manufacturing process variations (P)
  - Power supply voltage noise (V)
  - Temperature variations (T)
- Less clock skew variation is a “MUST” for nanometer VLSI designs
- Minimizing clock routing wire-length can
  - Reduce power consumption

Approaches for Reducing Skew Variability
- Buffer & wire sizing [Pullela et al., DAC ’93; Chung et al., ICCAD ’94; Wang et al., ISPD ’04]
- Variation-aware routing [Lin et al., ICCAD ’94; Lu et al., ISPD ’03]
- Non-tree clock networks
  - McCoy et al., ETC ’94; Vandenberghe et al., ICCAD ’97; Xue et al., ICCAD ’95
  - Link-based non-tree clock networks [Rajaram et al., DAC ’04]

Non-tree: 1-D Spine [Kurd et al., JSSC ’01]
- 1-D spine
- Applied in Intel Pentium processor design
- Variations between spines still exist
(Figure: spines driving clock sinks or local sub-networks.)

Non-tree: 2-D Mesh
- Top-level mesh [Su et al., ICCAD ’01]: less wire, less effective
- Leaf-level mesh [Restle et al., JSSC ’01]: very effective, huge wire
- Applied in IBM microprocessors
(Figure: mesh driving clock sinks or local sub-networks.)

Linked Non-tree = Tree + Links [Rajaram et al., DAC ’04]
- Non-tree = tree + links
- How to select link pairs is the key!
- Link = link_capacitors + link_resistor
(Figure: link between nodes u and w; labels: i, u, w, R_l, C/2.)

Skew Between Link Endpoints
- New skew with link (u, w): [equation not recovered; figure labels: R_link, u, w, R_loop]
- Value of [symbol not recovered] becomes smaller when the link is closer to the leaf nodes, for a given R_link
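The equation on this slide did not survive extraction. The sketch below assumes the first-order resistive-divider relation commonly associated with the cited DAC ’04 cross-link work, where the skew between the link endpoints is scaled by R_link / (R_link + R_loop). Treat this relation, the function name, and the sample numbers as assumptions for illustration, not as the slide's own text.

```python
def skew_with_link(skew_without_link, r_link, r_loop):
    """Assumed model: new_skew = R_link / (R_link + R_loop) * old_skew.

    r_link: resistance of the inserted link between u and w.
    r_loop: resistance of the tree path between u and w (assumed > 0).
    """
    if r_link < 0 or r_loop <= 0:
        raise ValueError("expected r_link >= 0 and r_loop > 0")
    return skew_without_link * r_link / (r_link + r_loop)


# Hypothetical values: 10 ps of skew, 50 ohm link, 150 ohm tree loop.
reduced = skew_with_link(10.0, 50.0, 150.0)
```

Under this assumed model the scaling factor is always below 1 for a positive R_loop, so a link resistor can only shrink the skew between its own endpoints; a shorter (lower-resistance) link shrinks it more.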

Skew Between any Two Nodes (i, j) with Link (u, w)
Skew variation between any node pair (i, j):
- Scenario 1: i ∈ T_g, j ∈ T_h => always smaller
- Scenario 2: i and j ∈ T_g (or T_h) => could be worse
- Scenario 3: i ∈ T_P, j ∉ T_P => could be much worse
- Key idea: try to avoid Scenarios 3 and 2 for link insertion
(Figure: link (u, w) with ancestor P and its children g and h.)
P: nearest common ancestor of u and w
T_x: sub-tree rooted at x
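The three scenarios can be checked mechanically once the tree is available. The sketch below is an illustration under assumptions: the tree is stored as a child-to-parent dictionary, the names `path_to_root` and `classify_pair` are hypothetical, Scenario 3 is read as "exactly one of i, j lies inside T_P", and 0 is returned for pairs that match none of the three scenarios.

```python
def path_to_root(parent, node):
    """Nodes from `node` up to the root, `node` first."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def classify_pair(parent, link, i, j):
    """Return 1, 2 or 3 for the slide's scenarios, or 0 if none applies."""
    u, w = link
    path_u, path_w = path_to_root(parent, u), path_to_root(parent, w)
    on_path_w = set(path_w)
    p = next(n for n in path_u if n in on_path_w)  # nearest common ancestor
    if p in (u, w):
        raise ValueError("link endpoints must lie in different sub-trees of P")
    g = path_u[path_u.index(p) - 1]  # child of P on the way to u
    h = path_w[path_w.index(p) - 1]  # child of P on the way to w

    def inside(node, root):
        return root in path_to_root(parent, node)

    i_g, i_h = inside(i, g), inside(i, h)
    j_g, j_h = inside(j, g), inside(j, h)
    if (i_g and j_h) or (i_h and j_g):
        return 1
    if (i_g and j_g) or (i_h and j_h):
        return 2
    if inside(i, p) != inside(j, p):
        return 3
    return 0


# Hypothetical tree: r -> (p, q); p -> (g, h); g -> (u, a); h -> (w, b); q -> (c, d).
example_parent = {
    "r": None, "p": "r", "q": "r", "g": "p", "h": "p",
    "u": "g", "a": "g", "w": "h", "b": "h", "c": "q", "d": "q",
}
```

With the link (u, w), the pair (a, b) falls in Scenario 1, (a, u) in Scenario 2, and (a, c) in Scenario 3.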

Rule Based Algorithms [Rajaram et al., DAC ’04]
- α-rule: the lower the α, the better the link
- β-rule: the lower the β, the less tuning is required
- γ-rule: the depth of the nearest common ancestor from the root is < γ max
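Of the three rules, only the γ-rule is stated precisely enough on the slide to sketch directly; α and β are not defined in this transcript, so they are left out. The example below assumes a child-to-parent dictionary for the tree and a root depth of 0; the function names are hypothetical.

```python
def depth(parent, node):
    """Number of edges between `node` and the root (root has depth 0)."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def nearest_common_ancestor(parent, a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    seen = set()
    while a is not None:
        seen.add(a)
        a = parent[a]
    while b not in seen:
        b = parent[b]
    return b


def satisfies_gamma_rule(parent, link, gamma_max):
    """gamma-rule from the slide: depth of the nearest common ancestor < gamma_max."""
    u, w = link
    return depth(parent, nearest_common_ancestor(parent, u, w)) < gamma_max


# Hypothetical tree: r -> (p, q); p -> (g, h); g -> (u, a); h -> (w, b); q -> (c, d).
gamma_tree = {
    "r": None, "p": "r", "q": "r", "g": "p", "h": "p",
    "u": "g", "a": "g", "w": "h", "b": "h", "c": "q", "d": "q",
}
```

In this tree, a link (u, c) meets at the root (depth 0) and passes with `gamma_max = 1`, while a link (u, w) meets at p (depth 1) and is rejected.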

Guidelines for Node Pair Selection for Link Insertion
- Select nodes which are hierarchically far apart
- Select nodes physically close to each other
- Select nodes with equal nominal delay
- Select nodes closer to leaf nodes
- For zero skew routing, only select leaf nodes

Rule Based Algorithms [Rajaram et al., DAC ’04]
- Merits
  - Physical characteristics of the links are considered, so bad links are avoided
  - Independent of the balanced nature of the clock structure
  - Efficient run time
- Demerits
  - No control over the distribution of links
  - Links may get added in the same region
- Solution
  - δ-rule: no two links should have the same pair of ancestors at depth = δ from the clock source
  - Retains the merits of the previous rules and addresses the demerit
(Figure: two clock trees with sub-trees A, B, C, D, using δ = 2.)

δ Rule: An Example
- Crowding of links. Sub-trees A and D not linked!
- Using δ = 2
- δ is the node level from the clock source
(Figure: clock tree with sub-trees A, B, C, D.)
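The δ-rule can be read as a filter over candidate links: map each endpoint to its ancestor at depth δ and keep at most one link per unordered ancestor pair. The sketch below is one possible reading, not the authors' implementation. It assumes a child-to-parent dictionary, root depth 0, candidates already ordered by preference, and that a node shallower than δ stands in for itself; all names are hypothetical.

```python
def ancestor_at_depth(parent, node, delta):
    """Ancestor of `node` at depth `delta` (the node itself if it is shallower)."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()  # root first, so path[d] is the ancestor at depth d
    return path[min(delta, len(path) - 1)]


def apply_delta_rule(parent, candidate_links, delta):
    """Keep the first link seen for each unordered pair of depth-delta ancestors."""
    used, kept = set(), []
    for u, w in candidate_links:
        key = frozenset((ancestor_at_depth(parent, u, delta),
                         ancestor_at_depth(parent, w, delta)))
        if key not in used:
            used.add(key)
            kept.append((u, w))
    return kept


# Hypothetical tree with sub-trees A, B, C, D at depth 2, as in the slide's figure.
delta_tree = {
    "root": None, "L": "root", "R": "root",
    "A": "L", "B": "L", "C": "R", "D": "R",
    "a1": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C", "d1": "D",
}
candidates = [("b1", "c1"), ("b2", "c2"), ("a1", "d1")]
kept_links = apply_delta_rule(delta_tree, candidates, delta=2)
```

Here the second B-to-C link is dropped as crowding, while the A-to-D link survives, which is the behavior the example slide asks for.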

Graph Theoretical Approach

Select_Node_Pairs(T_v) {
    l = v.left_child
    r = v.right_child
    P = Select_node_pair_between(T_l, T_r, k)
    if Depth(v) ≥ depth_limit, Return P
    P = P ∪ Select_Node_Pairs(T_l)
    P = P ∪ Select_Node_Pairs(T_r)
    Return P
}

- The entire clock tree is recursively divided into two parts and links are added between them
- This ensures distribution of links throughout the clock tree
- Edge weight = minimum distance between sinks of T_li and T_rj
(Figure: node v with children l and r; sub-trees T_l1, T_l2 on the left and T_r1, T_r2 on the right.)
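The recursion above can be made runnable as follows. This is a sketch under assumptions: the `Node` class, Manhattan distance, and especially `select_between` are hypothetical stand-ins. `select_between` simply takes the k closest sink pairs across the two sub-trees; it is not the min-matching or MST selection discussed in the presentation.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass
class Node:
    name: str
    pos: tuple = (0, 0)            # sink location; unused for internal nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sinks(v):
    """All leaf nodes under v."""
    if v.left is None and v.right is None:
        return [v]
    return [s for c in (v.left, v.right) if c is not None for s in sinks(c)]


def manhattan(a, b):
    return abs(a.pos[0] - b.pos[0]) + abs(a.pos[1] - b.pos[1])


def select_between(tl, tr, k):
    """Stand-in selector (assumption): the k closest sink pairs across tl and tr."""
    pairs = sorted(product(sinks(tl), sinks(tr)), key=lambda p: manhattan(*p))
    return [(a.name, b.name) for a, b in pairs[:k]]


def select_node_pairs(v, k=1, depth_limit=1, depth=0):
    """Recursive division of the tree, following the slide's pseudocode."""
    if v.left is None or v.right is None:
        return []
    pairs = select_between(v.left, v.right, k)
    if depth >= depth_limit:
        return pairs
    pairs += select_node_pairs(v.left, k, depth_limit, depth + 1)
    pairs += select_node_pairs(v.right, k, depth_limit, depth + 1)
    return pairs


# Hypothetical four-sink tree laid out on a line.
s1, s2 = Node("s1", (0, 0)), Node("s2", (2, 0))
s3, s4 = Node("s3", (5, 0)), Node("s4", (9, 0))
root = Node("v", left=Node("l", left=s1, right=s2), right=Node("r", left=s3, right=s4))
```

Calling `select_node_pairs(root)` first links the closest pair across the two halves, then recurses into each half until `depth_limit` is reached.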

Graph theoretical approach: Min-matching [Rajaram et al., DAC ’04]
- Bipartite min-matching algorithm to select the node pairs
- Merits
  - Distributes links evenly through all regions of the clock network
- Demerits
  - Due to the nature of the min-matching algorithm, only one link per sub-tree is allowed
  - May result in some very lengthy links and increased wire-length
  - Lengthy links might be difficult to route
  - Complexity of min-matching is O(n^3). Not scalable!
(Figure: node v with children l and r; lengthy links between the two sides.)

New graph theoretical approach: Minimum Spanning Tree Based
- MST algorithm allows more than one link per sub-tree
  - More short links than the bipartite approach
- Retains the merits of the min-matching based approach
  - Evenly distributes the links
- Complexity is O(n log n)
  - Much faster than the O(n^3) bipartite matching algorithm
(Figure: node v with children l and r.)

MST Based Algorithm

MST_node_pair_select(T_l, T_r, k) {
    Divide T_l into k sub-trees, S_l = { T_l1, T_l2, T_l3, … T_lk }
    Divide T_r into k sub-trees, S_r = { T_r1, T_r2, T_r3, … T_rk }
    Find MST of the completely connected bipartite graph between S_l & S_r
}

- After MST pair selection, iteratively delete edges violating the four rules (α, β, γ, and δ)
(Figure: node v with children l and r; sets S_l = {T_l1, T_l2} and S_r = {T_r1, T_r2}.)
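The MST step can be sketched with Kruskal's algorithm on the complete bipartite graph between the two sets of sub-trees. The example assumes each sub-tree is given as a named list of sink coordinates, that names are unique across both sides, and that distance is Manhattan; the division into k sub-trees and the later rule-based pruning are not shown. Function names are hypothetical.

```python
from itertools import product


def min_sink_distance(sinks_a, sinks_b):
    """Edge weight from the slides: minimum distance between sinks of two sub-trees."""
    return min(abs(xa - xb) + abs(ya - yb)
               for (xa, ya), (xb, yb) in product(sinks_a, sinks_b))


def mst_node_pair_select(left, right):
    """Kruskal MST over the complete bipartite graph between `left` and `right`.

    left, right: dicts mapping a sub-tree name to its list of (x, y) sinks.
    Returns the chosen edges as (left_name, right_name, weight).
    """
    edges = sorted((min_sink_distance(ls, rs), ln, rn)
                   for (ln, ls), (rn, rs) in product(left.items(), right.items()))
    root = {name: name for name in list(left) + list(right)}

    def find(n):
        while root[n] != n:
            root[n] = root[root[n]]  # path halving
            n = root[n]
        return n

    chosen = []
    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:            # edge joins two components: keep it
            root[ra] = rb
            chosen.append((a, b, weight))
    return chosen


# Hypothetical sub-trees: two on each side, k = 2.
S_l = {"Tl1": [(0, 0), (1, 0)], "Tl2": [(0, 5), (1, 5)]}
S_r = {"Tr1": [(3, 0), (4, 0)], "Tr2": [(3, 5), (4, 5)]}
mst_edges = mst_node_pair_select(S_l, S_r)
```

The spanning tree has 2k - 1 edges, so a sub-tree can receive more than one link. In this example the third, longer edge is the kind of candidate that the rule-based pruning described above might later remove.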

Experimental Setup
- Benchmarks: r1 – r5 from bounded skew tree work [Cong et al., ICCAD ’95]
- Interconnect width variation
  - Smaller than thickness
  - More sensitive to variations
- Load capacitance variation
- All variables assumed to be Gaussian
- Skew variability measure: standard deviation
- Standard deviation = [formula not recovered; labeled terms: delay of sink i, delay of reference sink]
(Figure: Gaussian distribution marked at -3σ, -2σ, -1σ, +1σ, +2σ, +3σ, with Min, Nom, Max and 99.74%.)
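The variability measure can be illustrated as follows. Because the slide's formula was lost, this sketch assumes that skew is the delay of sink i minus the delay of a reference sink in each sample, and that the measure is the population standard deviation of that skew over the samples. It also assumes that Min and Max sit at -3σ and +3σ around the nominal value, as the figure suggests. All names are hypothetical.

```python
import statistics


def sigma_from_range(min_value, max_value):
    """Assumed mapping: Min and Max are the -3 sigma and +3 sigma points."""
    return (max_value - min_value) / 6.0


def skew_std(delays_sink, delays_ref):
    """Standard deviation of (sink delay - reference delay) across samples."""
    skews = [d - r for d, r in zip(delays_sink, delays_ref)]
    return statistics.pstdev(skews)


# Hypothetical per-sample delays (arbitrary units) for one sink and the reference sink.
sink_delays = [10.0, 12.0, 11.0, 13.0]
ref_delays = [10.0, 10.0, 10.0, 10.0]
variability = skew_std(sink_delays, ref_delays)
```

In a Monte Carlo setup, each sample would come from one random draw of wire widths and load capacitances, with `sigma_from_range` giving the spread of each Gaussian variable.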

Experimental Result on Skew Variability
(Table: columns for benchmarks r1, r2, r3, r4, r5; first row "No. of sinks"; values not present in the transcript.)

HSPICE Validation
(Table: columns for benchmarks r1, r2, r3, r4, r5; first row "No. of sinks"; values not present in the transcript.)

Experimental Result on Wire-length

Wire-length comparison between link insertion methods

Conclusions
- Two new efficient algorithms for link insertion have been proposed
  - Significant skew variability reduction with very small wire-length increase
  - Scale very well with size of clock network for both runtime and QOR
- Proposed methodology is independent of the nature of variability effects
- Friendly to incremental changes

Sources of the Unwanted Skew Variations
- Process variations (P)
  - Gate variations
    - Gate length variation
    - T_ox variation
  - Interconnect variations
    - Significantly affect delay and skew [Liu et al., DAC ’00]
  - Load capacitance variations
- Supply voltage noise (V)
- Temperature variations (T)
(Figure: gate variations, showing t_ox and gate length; interconnect width variations.)

Skew Between Link Endpoints
- Original skew […] skew variation, then link resistor always reduces skew variation
- If nominal […] is zero, […] can be treated as […]
- Link capacitance may affect nominal skew
- Effect of the link resistor & capacitor on skew: […]
  - r_u - r_w > 0 always
  - Elmore delays evaluated with C_u = +1 and C_w = -1
(Figure: 1 A current source at node i; R_i,w = voltage at w; nodes u and w.)

General Flow of Non-tree Clock
1. Obtain initial clock tree
2. Find node pairs for link insertion
3. Add link capacitances to selected nodes
4. Tune merging node location to restore original skew
5. Insert link resistance to selected node pairs

Run Time Comparison
(Figure: runtime comparison between the different methods as a function of the number of links at γ = 1.)