Published by Alaina White. Modified over 9 years ago.
1
Improved Algorithms for Link-Based Non-tree Clock Network for Skew Variability Reduction
Anand Rajaram†‡, David Z. Pan†, Jiang Hu*
† Dept. of ECE, UT-Austin
‡ Texas Instruments, Dallas
* Dept. of EE, TAMU
2
Outline
Introduction
Review of link-based non-tree clock network
Improved algorithms (over [Rajaram et al., DAC ’04])
›Rule-based algorithm (δ rule)
›Graph-theoretical approach (MST-based)
Experimental results
Conclusions
3
Clock Distribution Network
Signal transfer is coordinated by the clock signal
All registers are supplied with the clock signal by the clock distribution network
Skew = d_1 - d_2
Zero skew: d_1 = d_2
Useful skew: d_1 - d_2 = δ_12
[Figure labels: Register, D_max, Clock Network, 1, 2, d_1, d_2, T, Launch signals, Catch signals]
4
Clocks: Important Considerations & Objectives
One of the biggest and most frequently switching nets
Very sensitive to unwanted skew introduced by PVT
›Manufacturing process variations (P)
›Power supply voltage noise (V)
›Temperature variations (T)
Less clock skew variation is a “MUST” for nanometer VLSI designs
Minimizing clock routing wire-length can
›Reduce power consumption
5
Approaches for Reducing Skew Variability
Buffer & wire sizing [Pullela et al., DAC ’93; Chung et al., ICCAD ’94; Wang et al., ISPD ’04]
Variation-aware routing [Lin et al., ICCAD ’94; Lu et al., ISPD ’03]
Non-tree clock networks
›McCoy et al., ETC ’94; Vandenberghe et al., ICCAD ’97; Xue et al., ICCAD ’95
›Link-based non-tree clock networks [Rajaram et al., DAC ’04]
6
Non-tree: 1-D Spine [Kurd et al., JSSC ’01]
Applied in Intel Pentium processor design
Variations between spines still exist
[Figure labels: 1-D spine; Spines; Clock sinks or local sub-networks]
7
Non-tree: 2-D Mesh
Top-level mesh [Su et al., ICCAD ’01]: less wire, less effective
Leaf-level mesh [Restle et al., JSSC ’01]: very effective, huge wire
Applied in IBM microprocessors
[Figure label: Clock sinks or local sub-networks]
8
Linked Non-tree = Tree + Links [Rajaram et al., DAC ’04]
Non-tree = tree + links
How to select link pairs is the key!
Link = link_capacitors + link_resistor
[Figure labels: u, w, i, R_l, C/2]
9
Skew Between Link Endpoints
New skew with link (u, w): [equation lost in extraction]
Value of [symbol lost in extraction] becomes smaller when the link is closer to leaf nodes, for a given R_link
[Figure labels: R_link, u, w, R_loop]
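A minimal sketch of the trend stated on this slide, assuming the skew between the link endpoints scales by a resistive-divider factor R_link / (R_link + R_loop). That form, the function name, and the parameter names are assumptions of this sketch, not something shown in the transcript. It also assumes that a link placed closer to the leaves sees a larger R_loop, since the tree path between u and w is longer.

```python
def skew_with_link(original_skew: float, r_link: float, r_loop: float) -> float:
    """Skew between link endpoints u and w after inserting the link.

    Assumed form (not taken from the slide):
        original_skew * r_link / (r_link + r_loop)
    """
    if r_link <= 0 or r_loop < 0:
        raise ValueError("r_link must be positive and r_loop non-negative")
    return original_skew * r_link / (r_link + r_loop)


# Same link resistance, growing loop resistance (link moved toward the leaves).
for r_loop in (50.0, 200.0, 800.0):
    print(r_loop, skew_with_link(10.0, 100.0, r_loop))
```

Under this assumed form, the remaining skew shrinks monotonically as R_loop grows for a fixed R_link, which matches the qualitative statement on the slide.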
10
Skew Between any Two Nodes (i, j) with Link (u, w)
Skew variation between any node pair (i, j)
Scenario 1: i ∈ T_g, j ∈ T_h => always smaller
Scenario 2: i & j ∈ T_g (or T_h) => could be worse
Scenario 3: i ∈ T_P, j ∉ T_P => could be much worse
Key idea: try to avoid Scenarios 3 and 2 for link insertion
P: nearest common ancestor of u and w
T_x: sub-tree rooted at x
[Figure labels: u, w, P, g, h]
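The three scenarios can be sketched as a small classifier over a parent map. This is an illustrative sketch only: it assumes g and h are the children of P whose sub-trees contain u and w respectively (the figure does not survive, so that reading is an assumption), and the names `ancestors` and `classify_pair` are hypothetical.

```python
def ancestors(parent, x):
    """Return the path from x up to the root, starting with x itself."""
    path = [x]
    while parent[x] is not None:
        x = parent[x]
        path.append(x)
    return path


def classify_pair(parent, u, w, i, j):
    """Return 1, 2 or 3 for the slide's scenarios, or 0 if none applies."""
    path_u, path_w = ancestors(parent, u), ancestors(parent, w)
    on_w = set(path_w)
    p = next(n for n in path_u if n in on_w)   # nearest common ancestor P
    if p in (u, w):
        raise ValueError("u and w must lie in different sub-trees of P")
    # Assumption: g and h are the children of P on the way to u and to w.
    g = path_u[path_u.index(p) - 1]
    h = path_w[path_w.index(p) - 1]

    def inside(x, root):
        return root in ancestors(parent, x)

    i_g, i_h = inside(i, g), inside(i, h)
    j_g, j_h = inside(j, g), inside(j, h)
    if (i_g and j_h) or (i_h and j_g):
        return 1                                # one node in T_g, one in T_h
    if (i_g and j_g) or (i_h and j_h):
        return 2                                # both in the same sub-tree
    if inside(i, p) != inside(j, p):
        return 3                                # exactly one node inside T_P
    return 0


# Small example tree: root r -> {p, x}; p -> {g, h}; g -> {u, a}; h -> {w, b}.
parent = {"r": None, "p": "r", "x": "r", "g": "p", "h": "p",
          "u": "g", "a": "g", "w": "h", "b": "h"}
print(classify_pair(parent, "u", "w", "a", "b"))
print(classify_pair(parent, "u", "w", "a", "u"))
print(classify_pair(parent, "u", "w", "a", "x"))
```

A candidate link could then be scored by how many sink pairs fall into Scenarios 2 and 3, which is the quantity the key idea asks to keep small.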
11
Rule Based Algorithms [Rajaram et al., DAC ’04]
α-rule: the lower the α, the better the link
β-rule: the lower the β, the less tuning required
γ-rule: the nearest common ancestor's depth from the root is < γ_max
12
Guidelines for Node Pair Selection for Link Insertion
Select nodes which are hierarchically far apart
Select nodes physically close to each other
Select nodes with equal nominal delay
Select nodes closer to leaf nodes
For zero-skew routing, only select leaf nodes
13
Rule Based Algorithms [Rajaram et al., DAC ’04]
Merits
›Physical characteristics of the links are considered, so bad links are avoided
›Independent of the balanced nature of the clock structure
›Efficient run time
Demerits
›No control over the distribution of links
›Links may get added in the same region
Solution
›δ-rule: no two links should have the same pair of ancestors at depth = δ from the clock source
›Retains the merits of the previous rules and addresses the demerit
[Figure labels: A, B, C, D (two panels); Using δ = 2]
14
δ Rule: An Example
δ is the node level from the clock source
Crowding of links. Subtrees A and D not linked!
Using δ = 2
[Figure labels: A, B, C, D]
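The δ-rule can be sketched as a filter over an ordered list of candidate links. This is a minimal illustration, assuming candidates arrive already sorted by preference and that a node shallower than δ is represented by itself; the names `ancestor_at_depth` and `apply_delta_rule` are hypothetical.

```python
def ancestor_at_depth(parent, node, delta):
    """Return the ancestor of node at depth delta (the root has depth 0)."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    path.reverse()                      # root first, so index equals depth
    # Assumption: a node shallower than delta stands in for itself.
    return path[min(delta, len(path) - 1)]


def apply_delta_rule(parent, candidates, delta):
    """Keep a link only if its depth-delta ancestor pair is not yet used."""
    used, kept = set(), []
    for u, w in candidates:
        key = frozenset((ancestor_at_depth(parent, u, delta),
                         ancestor_at_depth(parent, w, delta)))
        if key in used:
            continue                    # another link already joins this pair
        used.add(key)
        kept.append((u, w))
    return kept


# Root -> {L, R}; L -> {A, B}; R -> {C, D}; each of A..D has two sinks.
parent = {"root": None, "L": "root", "R": "root",
          "A": "L", "B": "L", "C": "R", "D": "R",
          "a1": "A", "a2": "A", "b1": "B", "b2": "B",
          "c1": "C", "c2": "C", "d1": "D", "d2": "D"}
candidates = [("a1", "b1"), ("a2", "b2"), ("b1", "c1"), ("c2", "d1")]
print(apply_delta_rule(parent, candidates, delta=2))
```

With δ = 2 the second A-B link is rejected because the pair (A, B) is already linked, which is how the rule prevents links from crowding into one region.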
15
Graph Theoretical Approach
Select_Node_Pairs(T_v) {
    l = v.left_child
    r = v.right_child
    P = Select_node_pair_between(T_l, T_r, k)
    if Depth(v) ≥ depth_limit, return P
    P = P ∪ Select_Node_Pairs(T_l)
    P = P ∪ Select_Node_Pairs(T_r)
    return P
}
The entire clock tree is recursively divided into two parts and links are added between them
This ensures distribution of links throughout the clock tree
Edge weight = min-distance between sinks of T_li and T_rj
[Figure labels: v, l, r, T_l1, T_l2, T_r1, T_r2]
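The recursion above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Node` class, the explicit `depth` argument, and the stand-in `first_sink_pair` selector (which simply pairs the leftmost sinks of the two sub-trees and ignores k) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leftmost_sink(v: Node) -> str:
    """Follow left children down to a leaf and return its name."""
    while v.left is not None:
        v = v.left
    return v.name


def first_sink_pair(l: Node, r: Node, k: int) -> List[Pair]:
    # Hypothetical stand-in for Select_node_pair_between; k is unused here.
    return [(leftmost_sink(l), leftmost_sink(r))]


def select_node_pairs(v: Node,
                      select_between: Callable[[Node, Node, int], List[Pair]],
                      k: int, depth_limit: int, depth: int = 0) -> List[Pair]:
    """Link the two halves under v, then recurse until depth_limit."""
    if v.left is None or v.right is None:
        return []                       # a sink has no halves to link
    pairs = list(select_between(v.left, v.right, k))
    if depth >= depth_limit:
        return pairs
    pairs += select_node_pairs(v.left, select_between, k, depth_limit, depth + 1)
    pairs += select_node_pairs(v.right, select_between, k, depth_limit, depth + 1)
    return pairs


tree = Node("v",
            Node("l", Node("a"), Node("b")),
            Node("r", Node("c"), Node("d")))
print(select_node_pairs(tree, first_sink_pair, k=1, depth_limit=1))
```

Passing the pair selector as a callable keeps the recursion independent of whether min-matching or an MST is used to pick pairs between the two halves.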
16
Graph theoretical approach: Min-matching [Rajaram et al., DAC ’04]
Bipartite min-matching algorithm to select the node pairs
Merits
›Distributes links evenly through all regions of the clock network
Demerits
›Due to the nature of the min-matching algorithm, only one link per sub-tree is allowed
›May result in some very lengthy links and increased wire length
›Lengthy links might be difficult to route
›Complexity of min-matching is O(n^3). Not scalable!
[Figure labels: v, l, r; Lengthy links]
17
New graph theoretical approach: Minimum Spanning Tree Based
MST algorithm allows more than one link per sub-tree
›More short links (cf. bipartite approach)
Retains the merits of the min-matching based approach
›Evenly distributes the links
Complexity is O(n log n)
›Much faster than the O(n^3) bipartite matching algorithm
[Figure labels: v, l, r]
18
MST Based Algorithm
MST_node_pair_select(T_l, T_r, k) {
    Divide T_l into k sub-trees, S_l = { T_l1, T_l2, T_l3, … T_lk }
    Divide T_r into k sub-trees, S_r = { T_r1, T_r2, T_r3, … T_rk }
    Find MST of the completely connected bipartite graph between S_l & S_r
}
After MST pair selection, iteratively delete edges violating the four rules (α, β, γ, and δ)
[Figure labels: v, l, r, S_l, S_r, T_l1, T_l2, T_r1, T_r2]
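The MST step can be sketched with Kruskal's algorithm over the complete bipartite graph. This is a minimal illustration under assumptions: each sub-tree is represented only by the (x, y) coordinates of its sinks, the edge weight is the minimum Manhattan distance between sinks (the distance metric is an assumption), and the division into k sub-trees is taken as already done. The function names are hypothetical, and the sketch sorts all k × k edges without trying to match the O(n log n) bound quoted earlier.

```python
from itertools import product


def min_distance(sinks_a, sinks_b):
    """Smallest Manhattan distance between any sink of a and any sink of b."""
    return min(abs(xa - xb) + abs(ya - yb)
               for (xa, ya), (xb, yb) in product(sinks_a, sinks_b))


def mst_node_pair_select(left_groups, right_groups):
    """Return MST edges (i, j, weight) linking left group i to right group j."""
    k_l, k_r = len(left_groups), len(right_groups)
    edges = sorted((min_distance(left_groups[i], right_groups[j]), i, j)
                   for i in range(k_l) for j in range(k_r))
    root = list(range(k_l + k_r))       # union-find over both sides

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]     # path halving
            x = root[x]
        return x

    chosen = []
    for weight, i, j in edges:
        a, b = find(i), find(k_l + j)
        if a != b:                      # edge joins two components: keep it
            root[a] = b
            chosen.append((i, j, weight))
    return chosen


left_groups = [[(0, 0)], [(0, 10)]]
right_groups = [[(1, 0)], [(1, 10)]]
print(mst_node_pair_select(left_groups, right_groups))
```

In this small example one left sub-tree ends up with two links, which a one-to-one matching would not allow; the rule-based pruning (α, β, γ, δ) would then be applied to the returned edges.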
19
Experimental Setup
Benchmarks: r1-r5 from bounded skew tree work [Cong et al., ICCAD ’95]
Interconnect width variation
›Smaller than thickness
›More sensitive to variations
Load capacitance variation
All variables assumed to be Gaussian
Skew variability measure: standard deviation
[Figure labels: -3σ, -2σ, -1σ, +1σ, +2σ, +3σ; Min, Nom, Max; 99.74%]
[Equation partly lost in extraction; surviving terms: Standard Deviation =, Delay of sink i, Delay of reference sink]
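One way to read the skew variability measure is sketched below, assuming it is the standard deviation, across variation samples, of (delay of sink i minus delay of the reference sink). That reading is an assumption, since the equation is only partly preserved; the function name `skew_std` and the sample data are hypothetical.

```python
from statistics import pstdev


def skew_std(delays_i, delays_ref):
    """Population standard deviation of the skew between sink i and the
    reference sink, taken over matching variation samples."""
    if not delays_i or len(delays_i) != len(delays_ref):
        raise ValueError("need equally sized, non-empty sample lists")
    skews = [d_i - d_ref for d_i, d_ref in zip(delays_i, delays_ref)]
    return pstdev(skews)


# Made-up delay samples (arbitrary units) for illustration only.
delays_ref = [10.0, 10.0, 10.0, 10.0]
delays_i = [11.0, 9.0, 11.0, 9.0]
print(skew_std(delays_i, delays_ref))
```

In practice the samples would come from repeated delay evaluations with the Gaussian width and load variations drawn afresh each time.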
20
Experimental Result on Skew Variability
Benchmark      r1    r2    r3    r4     r5
No. of sinks   267   598   862   1903   3100
21
HSPICE Validation
Benchmark      r1    r2    r3    r4     r5
No. of sinks   267   598   862   1903   3100
22
Experimental Result on Wire-length
23
Wire-length comparison between link insertion methods
24
Conclusions
Two new efficient algorithms for link insertion have been proposed
›Significant skew variability reduction with very small wire-length increase
›Scale very well with the size of the clock network for both runtime and QoR
Proposed methodology is independent of the nature of variability effects
Friendly to incremental changes
25
Sources of the Unwanted Skew Variations
Process variations (P)
›Gate variations
»Gate length variation
»T_ox variation
›Interconnect variations
»Significantly affect delay and skew [Liu et al., DAC ’00]
›Load capacitance variations
Supply voltage noise (V)
Temperature variations (T)
[Figure labels: t_ox, Gate length, Gate variations, Interconnect width, Variations, width]
26
Skew Between Link Endpoints
Effect of the link resistor & capacitor on skew
Original skew: [symbol lost in extraction]
r_u - r_w > 0 always
Elmore delays evaluated with C_u = +1 and C_w = -1
If nominal [symbol lost in extraction] is zero, [symbol lost in extraction] can be treated as skew variation; the link resistor then always reduces skew variation
Link capacitance may affect nominal skew
[Figure labels: i, u, w, 1A, R_i,w = voltage at w]
27
General Flow of Non-tree Clock
1. Obtain initial clock tree
2. Find node pairs for link insertion
3. Add link capacitances to selected nodes
4. Tune merging node location to restore original skew
5. Insert link resistance to selected node pairs
28
Run Time Comparison
Runtime comparison between the different methods as a function of the number of links at γ = 1