Published by Wesley Robbins. Modified over 9 years ago.
1 ε-Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time
Jeng-Liang Tsai, Tsung-Hao Chen, Charlie Chung-Ping Chen (National Taiwan University)
University of Wisconsin-Madison
http://vlsi.ece.wisc.edu
2 Outline
Background
  Motivation and contribution
  Literature overview
ClockTune algorithm
  Problem formulation
  ClockTune algorithm overview
  Optimality and complexity analysis
Experimental results
  Runtime, memory usage, and optimality
  Power/Delay trade-off
  Incremental refinement
3 Motivation
Clock skew imposes a cycle time penalty
  Start with a zero-skew clock tree
  Minimizing clock delay reduces system-level skew (Kuh, et al. [DAC '90])
Clock tree is power-hungry (30% in Intel McKinley, 0.18 µm / 1 GHz / 130 W)
  $P = f C V^2$
  Minimize switching capacitance (wiring area)
Stability affects design convergence
  Allow incremental refinement to accommodate local changes
Interconnect delay dominates total delay
  Wire-sizing is effective in reducing interconnect delay
4 Motivation
Non-convex zero-skew constraints
  No known algorithm solves the zero-skew wire-sizing problem optimally in polynomial runtime
Hence, a good clock tree wire-sizing algorithm should
  Minimize delay and power
  Guarantee optimality and runtime
  Have good stability
5 Contribution
First ε-optimal algorithm for the clock min-delay/power zero-skew wire-sizing optimization problem
Provides a complete (sampled) solution set of delay/power/area trade-off information for design planning
Efficient pseudo-polynomial runtime (6170-branch clock tree in 6 minutes, within 1% of optimal)
Runtime vs. optimality trade-off
Incremental clock re-balancing to speed up design convergence
6 Literature Overview
"Reliable non-zero skew clock tree using wire width optimization", Pillage, et al. [DAC '93]
  Iteratively optimizes skew and delay using adjoint sensitivity analysis
  Aimed at reliable clock trees under process variation
Deferred Merging Embedding (DME) algorithm, Kahng, et al. [TCAD '92]
  Bottom-up merging segment construction, top-down embedding
Integrated Deferred Merging Embedding (IDME) algorithm, Wong, et al. [ISPD '00]
  Handles simultaneous routing, buffer insertion, and wire-sizing
  Merging segment set (MSS): a set of line samples of a merging region
  No optimality guarantee
  The size of the MSS grows exponentially
"Process variation aware clock tree routing", Lu, et al. [ISPD '03]
  Based on DME/BST
7 Outline
Background
  Motivation and contribution
  Literature overview
ClockTune algorithm
  Problem formulation
  ClockTune algorithm overview
  Optimality and complexity analysis
Experimental results
  Runtime, memory usage, and optimality
  Power/Delay trade-off
  Incremental refinement
8 Problem formulation
min-ZSWS (Zero Skew Wire Sizing) problem
  Given a clock routing, minimize … s.t. …,
  where $P_i$, $P_j$ are paths from $v$ to leaf nodes $i$ and $j$
Zero-skew constraints are non-convex constraints
No known algorithm solves the problem optimally in polynomial runtime
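As a concrete reading of the zero-skew constraint, the sketch below evaluates root-to-leaf delays of a wire-sized tree and reports the skew. It assumes the Elmore delay model with each wire treated as a pi-section, which the slide does not state. The constants reuse the r 0 and c 0 values from the experimental setup with assumed units, and `Node`, `subtree_cap`, `leaf_delays`, and `skew` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

R0 = 0.03    # sheet resistance (value from the setup slide; unit assumed)
C0 = 2e-16   # capacitance per unit wire area (value from the setup slide; unit assumed)

@dataclass
class Node:
    length: float = 0.0   # length of the wire from the parent to this node
    width: float = 1.0    # width of that wire
    load: float = 0.0     # sink capacitance (leaves only)
    children: list = field(default_factory=list)

def subtree_cap(node):
    """Capacitance of the wire feeding `node` plus everything below it."""
    wire_cap = C0 * node.length * node.width
    return wire_cap + node.load + sum(subtree_cap(c) for c in node.children)

def leaf_delays(node, upstream=0.0):
    """Elmore delay from the root to every leaf (assumed delay model)."""
    wire_res = R0 * node.length / node.width
    wire_cap = C0 * node.length * node.width
    downstream = subtree_cap(node) - wire_cap   # capacitance past the wire's far end
    delay = upstream + wire_res * (wire_cap / 2 + downstream)
    if not node.children:
        return [delay]
    out = []
    for child in node.children:
        out.extend(leaf_delays(child, delay))
    return out

def skew(root):
    """Largest difference between any two root-to-leaf delays."""
    delays = leaf_delays(root)
    return max(delays) - min(delays)
```

Widening one branch of an otherwise symmetric tree changes both its resistance and its capacitance, so the skew becomes non-zero unless the sibling branch is resized as well. This coupling is what makes the constraint set awkward to optimize over.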
9 DC region approach
Clock Delay and wiring Capacitance are the top concerns
Define $f : \mathbb{R}^N \to \mathbb{R}^2$ such that $f_Y(w) = \mathrm{Delay}(T_v(w))$, $f_X(w) = \mathrm{Capacitance}(T_v(w))$
DC region of $v$: the projection of the feasible region onto $\mathbb{R}^2$
Choose a d-c pair from the DC region
[Figure: the feasible region and its DC region]
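The projection idea can be illustrated on the smallest possible case, a node with two leaf branches. The sketch below samples one branch width, solves for the sibling width that equalizes the two delays, and records the resulting (capacitance, delay) pair. It assumes an Elmore delay model and the width bounds from the experimental setup; the function names and the closed-form width solve are illustrative choices, not taken from the slides.

```python
R0, C0 = 0.03, 2e-16       # values from the setup slide; units assumed
W_MIN, W_MAX = 1.0, 4.0    # wire width bounds w_m, w_M

def branch_delay(length, width, load):
    """Elmore delay of one wire driving a sink (assumed delay model)."""
    return R0 * length / width * (C0 * length * width / 2 + load)

def matching_width(length, load, target_delay):
    """Width that gives `target_delay` on a branch, or None if out of bounds."""
    fixed = R0 * C0 * length ** 2 / 2        # width-independent part of the delay
    if target_delay <= fixed:
        return None
    w = R0 * length * load / (target_delay - fixed)
    return w if W_MIN <= w <= W_MAX else None

def dc_region(la, ca, lb, cb, samples=64):
    """Sampled zero-skew (capacitance, delay) pairs for a two-branch node."""
    points = []
    for k in range(samples):
        wa = W_MIN + (W_MAX - W_MIN) * k / (samples - 1)
        d = branch_delay(la, wa, ca)
        wb = matching_width(lb, cb, d)       # enforce zero skew
        if wb is None:
            continue
        cap = ca + cb + C0 * (la * wa + lb * wb)
        points.append((cap, d))
    return points
```

For this two-branch case the zero-skew solutions have a single degree of freedom, so the sampled DC region is a curve along which delay falls as capacitance rises. Deeper trees have more free widths and the projection fills out a region.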
10 ClockTune algorithm overview
Phase 1: bottom-up construction of DC regions for every node
Phase 2: top-down embedding after the delay/power trade-off
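The two phases can be sketched on a binary tree as follows. This is a simplified illustration and not the published ClockTune procedure: it assumes an Elmore delay model, samples only one branch width per node and solves for the other, and bins delays with a fixed step. All names (`Node`, `build`, `embed`, `step`) are hypothetical, and the loop structure does not reproduce the complexity bound given in the slides.

```python
R0, C0 = 0.03, 2e-16       # values from the setup slide; units assumed
W_MIN, W_MAX = 1.0, 4.0    # wire width bounds w_m, w_M

class Node:
    def __init__(self, load=0.0, branches=()):
        self.load = load                 # sink capacitance (leaves only)
        self.branches = list(branches)   # two (wire_length, child) pairs, or empty
        self.table = {}                  # delay bin -> (capacitance, delay, choice)
        self.widths = []                 # branch widths, set by embed()

def wire_delay(length, width, cap):
    """Elmore delay of one wire driving `cap` (assumed delay model)."""
    return R0 * length / width * (C0 * length * width / 2 + cap)

def build(node, step=5e-14, q=16):
    """Phase 1, bottom-up: keep the min-capacitance entry per delay bin."""
    if not node.branches:
        node.table = {0: (node.load, 0.0, None)}
        return
    (la, a), (lb, b) = node.branches
    build(a, step, q)
    build(b, step, q)
    for ka, (ca, da, _) in a.table.items():
        for i in range(q):
            wa = W_MIN + (W_MAX - W_MIN) * i / (q - 1)   # sampled width
            d = da + wire_delay(la, wa, ca)
            for kb, (cb, db, _) in b.table.items():
                # Solve for the sibling width that yields the same delay.
                rest = d - db - R0 * C0 * lb ** 2 / 2
                if rest <= 0:
                    continue
                wb = R0 * lb * cb / rest
                if not W_MIN <= wb <= W_MAX:
                    continue
                cap = ca + cb + C0 * (la * wa + lb * wb)
                key = int(d / step)
                if key not in node.table or cap < node.table[key][0]:
                    node.table[key] = (cap, d, (ka, wa, kb, wb))

def embed(node, key):
    """Phase 2, top-down: commit the widths behind the chosen table entry."""
    choice = node.table[key][2]
    if choice is None:
        return
    ka, wa, kb, wb = choice
    node.widths = [wa, wb]
    embed(node.branches[0][1], ka)
    embed(node.branches[1][1], kb)

def subtree_cap(node):
    """Total capacitance below `node` using the embedded widths."""
    return node.load + sum(C0 * l * w + subtree_cap(c)
                           for (l, c), w in zip(node.branches, node.widths))

def leaf_delays(node, upstream=0.0):
    """Re-evaluate root-to-leaf delays from the embedded widths."""
    if not node.branches:
        return [upstream]
    out = []
    for (l, c), w in zip(node.branches, node.widths):
        out += leaf_delays(c, upstream + wire_delay(l, w, subtree_cap(c)))
    return out
```

After `build(root)`, each key of `root.table` is one sampled delay with its cheapest known capacitance. Calling `embed(root, min(root.table))` commits the entry in the lowest delay bin, and `leaf_delays(root)` can then be used to confirm that the embedded tree has negligible skew.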
11 Optimality analysis
Embeddings that do not fall on the delay samples are omitted
Sources of error (detailed in the paper)
  Propagated error
  Delay sampling error
  Wire width sampling error
12 Optimality analysis
Error is bounded
  d: delay sampling resolution
  w: wire width sampling resolution
  k, …: constants related to $l$, $r_0$, $c_0$, $w_m$, $w_M$, …
Generally speaking, the error is reduced by about half when the resolution is doubled
[Plot: error vs. resolution]
13 Optimality/runtime trade-off
Controlling the sampling resolution trades off optimality against runtime and memory
14 Complexity analysis
Runtime
  Bottom-up phase takes $O(n \cdot p \cdot \max(p, q))$
  Top-down phase takes $O(n p)$
  Overall: $O(n \cdot p \cdot \max(p, q))$
Memory
  $O(n p)$
where
  n: number of nodes of the clock tree
  p: number of delay samples taken at each node
  q: number of wire width samples taken at each level-2 node
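To get a feel for how these bounds scale, the sketch below plugs sample counts into the stated expressions. It ignores constant factors, so only ratios between settings are meaningful; `clocktune_cost` is a hypothetical helper name.

```python
def clocktune_cost(n, p, q):
    """Return (runtime_units, memory_units) from the stated asymptotic bounds.

    Constant factors are ignored, so the numbers are only useful for
    comparing one (n, p, q) setting against another.
    """
    bottom_up = n * p * max(p, q)   # bottom-up phase
    top_down = n * p                # top-down phase
    return bottom_up + top_down, n * p
```

For a fixed tree, doubling both p and q roughly quadruples the runtime estimate and doubles the memory estimate, while doubling n at fixed p and q doubles both.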
15 Outline
Background
  Motivation and contribution
  Related works
  Problem formulation
ClockTune algorithm
  Design space projection
  Algorithm overview
  Optimality and complexity analysis
Experimental results
  Runtime, memory usage, and optimality
  Power/Delay trade-off
  Incremental refinement
16 Experimental setup
ClockTune is implemented in C++ and executed on a 533 MHz Pentium III PC with 128 MB of memory
Benchmarks r1 – r5 from Tsay et al. [ICCAD '91]
Initial routing generated by the BB+DME algorithm with minimum wire width $w = 1\,\mu m$
ClockTune uses $w_m = 1\,\mu m$, $w_M = 4\,\mu m$
p: number of delay samples taken at every node
q: number of wire width samples taken at every level-2 node
$r_0 = 0.03$, $c_0 = 2 \times 10^{-16} / \mu m^2$
17 Runtime and memory usage
Runtime and memory usage are linear in problem size when p and q are fixed
Within 1% optimality when p = q = 256 (runtime < 6 minutes, memory ~ 64 MB)
p = q = 256   # sink nodes   # branches   Runtime (s)   Memory (MB)   Optimality
r1            267            527          24.1          6.0           0.38%
r2            598            1185         61.0          12.5          0.71%
r3            862            1710         100.0         14.4          0.46%
r4            1903           3787         202.4         38.0          0.57%
r5            3101           6170         339.2         64.0          0.93%
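One quick way to check the linear-scaling claim is to divide runtime and memory by the number of branches for each benchmark. The sketch below does this with the values from the slide's table; `rows` and `per_branch` are names introduced here for illustration.

```python
# benchmark: (branches, runtime in s, memory in MB), copied from the slide's table
rows = {
    "r1": (527, 24.1, 6.0),
    "r2": (1185, 61.0, 12.5),
    "r3": (1710, 100.0, 14.4),
    "r4": (3787, 202.4, 38.0),
    "r5": (6170, 339.2, 64.0),
}

def per_branch(rows):
    """Return {benchmark: (seconds per branch, MB per branch)}."""
    return {name: (t / b, m / b) for name, (b, t, m) in rows.items()}

for name, (t, m) in per_branch(rows).items():
    print(f"{name}: {t:.4f} s/branch, {m:.4f} MB/branch")
```

Across an almost 12x range in tree size, the per-branch cost stays near 0.05 s and 0.01 MB, which is consistent with linear growth at fixed p and q.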
18 Optimality results
Optimality error is below 1% with p = q = 256
Error is reduced by about half when the resolution is doubled
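The halving behaviour corresponds to a first-order model in which the error is inversely proportional to the sampling resolution. The sketch below applies that model to extrapolate from a measured point. The function name is hypothetical, and its outputs are estimates under that model, not measurements from the presentation.

```python
def extrapolated_error(err_ref, res_ref, res):
    """First-order model: error inversely proportional to sampling resolution.

    err_ref is the error observed at resolution res_ref; the return value is
    an estimate for resolution res under that model, not a measured result.
    """
    return err_ref * res_ref / res
```

Starting from the 0.93% reported for r5 at p = q = 256, this model estimates about 0.47% at 512 samples and about 1.86% at 128 samples.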
19 Power/Delay trade-off
[Plot for r5: delay (5~150 ns) vs. capacitance (0.2~1.1 nF), spanning the minimum-power and minimum-delay solutions]
15:1 delay:power trade-off
20 Incremental refinement
The DC region captures the design space
This enables incremental refinement
21 Conclusion & Future Work
Conclusion: a zero-skew clock tree wire-sizing algorithm which
  Minimizes delay and area ε-optimally
  Guarantees pseudo-polynomial runtime and memory usage
  Provides delay/power trade-off information to designers
  Speeds up design convergence by allowing clock tree re-balancing with minimum changes
Future work
  Better delay model
  Buffer insertion/sizing capability
22 Thank you!