A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment Shiyan Hu, Zhuo Li, and Charles J. Alpert Dept of ECE, Michigan Technological.

Slides:

Advertisements

Similar presentations

Porosity Aware Buffered Steiner Tree Construction C. Alpert G. Gandham S. Quay IBM Corp M. Hrkic Univ Illinois Chicago J. Hu Texas A&M Univ.

Advertisements

Gate Sizing for Cell Library Based Designs Shiyan Hu*, Mahesh Ketkar**, Jiang Hu* *Dept of ECE, Texas A&M University **Intel Corporation.

ECE 667 Synthesis and Verification of Digital Circuits

Advanced Interconnect Optimizations. Buffers Improve Slack RAT = 300 Delay = 350 Slack = -50 RAT = 700 Delay = 600 Slack = 100 RAT = 300 Delay = 250 Slack.

Ispd-2007 Repeater Insertion for Concurrent Setup and Hold Time Violations with Power-Delay Trade-Off Salim Chowdhury John Lillis Sun Microsystems University.

Yasuhiro Fujiwara (NTT Cyber Space Labs)

Buffer and FF Insertion Slides from Charles J. Alpert IBM Corp.

ELEN 468 Lecture 261 ELEN 468 Advanced Logic Design Lecture 26 Interconnect Timing Optimization.

Confidentiality/date line: 13pt Arial Regular, white Maximum length: 1 line Information separated by vertical strokes, with two spaces on either side Disclaimer.

EE 553 Integer Programming

Coupling-Aware Length-Ratio- Matching Routing for Capacitor Arrays in Analog Integrated Circuits Kuan-Hsien Ho, Hung-Chih Ou, Yao-Wen Chang and Hui-Fang.

© Yamacraw, 2001 Minimum-Buffered Routing of Non-Critical Nets for Slew Rate and Reliability A. Zelikovsky GSU Joint work with C. Alpert.

Complexity 16-1 Complexity Andrei Bulatov Non-Approximability.

Minimum-Buffered Routing of Non- Critical Nets for Slew Rate and Reliability Control Supported by Cadence Design Systems, Inc. and the MARCO Gigascale.

Deterministic Wavelet Thresholding for Maximum-Error Metrics Minos Garofalakis Bell Laboratories Lucent Technologies 600 Mountain Avenue Murray Hill, NJ.

Layer Assignment Algorithm for RLC Crosstalk Minimization Bin Liu, Yici Cai, Qiang Zhou, Xianlong Hong Tsinghua University.

38 th Design Automation Conference, Las Vegas, June 19, 2001 Creating and Exploiting Flexibility in Steiner Trees Elaheh Bozorgzadeh, Ryan Kastner, Majid.

ER UCLA UCLA ICCAD: November 5, 2000 Predictable Routing Ryan Kastner, Elaheh Borzorgzadeh, and Majid Sarrafzadeh ER Group Dept. of Computer Science UCLA.

EE4271 VLSI Design Interconnect Optimizations Buffer Insertion.

Approximation Algorithms

Dean H. Lorenz, Danny Raz Operations Research Letter, Vol. 28, No

Interconnect Optimizations

EE 685 presentation Optimization Flow Control, I: Basic Algorithm and Convergence By Steven Low and David Lapsley Asynchronous Distributed Algorithm Proof.

UNC Chapel Hill Lin/Manocha/Foskey Optimization Problems In which a set of choices must be made in order to arrive at an optimal (min/max) solution, subject.

System-Wide Energy Minimization for Real-Time Tasks: Lower Bound and Approximation Xiliang Zhong and Cheng-Zhong Xu Dept. of Electrical & Computer Engg.

A Global Minimum Clock Distribution Network Augmentation Algorithm for Guaranteed Clock Skew Yield A. B. Kahng, B. Liu, X. Xu, J. Hu* and G. Venkataraman*

EE4271 VLSI Design Advanced Interconnect Optimizations Buffer Insertion.

Processing Rate Optimization by Sequential System Floorplanning Jia Wang 1, Ping-Chih Wu 2, and Hai Zhou 1 1 Electrical Engineering & Computer Science.

1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray.

Two Discrete Optimization Problems Problem #2: The Minimum Cost Spanning Tree Problem.

CDCTree: Novel Obstacle-Avoiding Routing Tree Construction based on Current Driven Circuit Model Speaker: Lei He.

Advanced Interconnect Optimizations. Timing Driven Buffering Problem Formulation Given –A Steiner tree –RAT at each sink –A buffer type –RC parameters.

Computational aspects of stability in weighted voting games Edith Elkind (NTU, Singapore) Based on joint work with Leslie Ann Goldberg, Paul W. Goldberg,

1 Coupling Aware Timing Optimization and Antenna Avoidance in Layer Assignment Di Wu, Jiang Hu and Rabi Mahapatra Texas A&M University.

Mehdi Kargar Aijun An York University, Toronto, Canada Discovering Top-k Teams of Experts with/without a Leader in Social Networks.

Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

A Polynomial Time Approximation Scheme For Timing Constrained Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, Charles J. Alpert** *Dept of Electrical.

An Efficient Clustering Algorithm For Low Power Clock Tree Synthesis Rupesh S. Shelar Enterprise Microprocessor Group Intel Corporation, Hillsboro, OR.

New Modeling Techniques for the Global Routing Problem Anthony Vannelli Department of Electrical and Computer Engineering University of Waterloo Waterloo,

Approximation Algorithms for Knapsack Problems 1 Tsvi Kopelowitz Modified by Ariel Rosenfeld.

Thermal-aware Steiner Routing for 3D Stacked ICs M. Pathak and S.K. Lim Georgia Institute of Technology ICCAD 07.

1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives Lin, Hai Fei, Yunsi ACM/IEEE.

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering.

Kwangsoo Han, Andrew B. Kahng, Hyein Lee and Lutong Wang

Combinatorial Optimization Problems in Computational Biology Ion Mandoiu CSE Department.

Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

Pattern Sensitive Placement For Manufacturability Shiyan Hu, Jiang Hu Department of Electrical and Computer Engineering Texas A&M University College Station,

Pattern Sensitive Placement For Manufacturability Shiyan Hu, Jiang Hu Department of Electrical and Computer Engineering Texas A&M University College Station,

1 Efficient Obstacle-Avoiding Rectilinear Steiner Tree Construction Chung-Wei Lin, Szu-Yu Chen, Chi-Feng Li, Yao-Wen Chang, Chia-Lin Yang National Taiwan.

Fast Algorithms for Slew Constrained Minimum Cost Buffering S. Hu*, C. Alpert**, J. Hu*, S. Karandikar**, Z. Li*, W. Shi* and C. Sze** *Dept of ECE, Texas.

EE 685 presentation Optimization Flow Control, I: Basic Algorithm and Convergence By Steven Low and David Lapsley.

Column Generation By Soumitra Pal Under the guidance of Prof. A. G. Ranade.

Space-Efficient Online Computation of Quantile Summaries SIGMOD 01 Michael Greenwald & Sanjeev Khanna Presented by ellery.

Maze Routing Algorithms with Exact Matching Constraints for Analog and Mixed Signal Designs M. M. Ozdal and R. F. Hentschke Intel Corporation ICCAD 2012.

Reliable Multicast Routing for Software-Defined Networks.

Optimization Problems In which a set of choices must be made in order to arrive at an optimal (min/max) solution, subject to some constraints. (There may.

Efficient Resource Allocation for Wireless Multicast De-Nian Yang, Member, IEEE Ming-Syan Chen, Fellow, IEEE IEEE Transactions on Mobile Computing, April.

1 EE5900 Advanced Embedded System For Smart Infrastructure Static Scheduling.

A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion Shiyan Hu*, Zhuo Li**, Charles Alpert** *Dept of Electrical.

A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion Shiyan Hu*, Zhuo Li**, Charles Alpert** *Dept of Electrical.

TU/e Algorithms (2IL15) – Lecture 12 1 Linear Programming.

An O(bn 2 ) Time Algorithm for Optimal Buffer Insertion with b Buffer Types Authors: Zhuo Li and Weiping Shi Presenter: Sunil Khatri Department of Electrical.

An O(nm) Time Algorithm for Optimal Buffer Insertion of m Sink Nets Zhuo Li and Weiping Shi {zhuoli, Texas A&M University College Station,

Honors Track: Competitive Programming & Problem Solving Seminar Topics Kevin Verbeek.

Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University.

Data Driven Resource Allocation for Distributed Learning

Buffer Insertion with Adaptive Blockage Avoidance

EE5900 Advanced Embedded System For Smart Infrastructure

Objectives What have we learned? What are we going to learn?

Under a Concurrent and Hierarchical Scheme

Presentation transcript:

A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, and Charles J. Alpert** *Dept of ECE, Michigan Technological University **IBM Austin Research Lab

2 Outline Introduction Problem Formulation Linear time dynamic programmingLinear time dynamic programming Bound independent oracle searchBound independent oracle search The Algorithm Experimental Results Conclusion

3 Layer Assignment 1X1X1X1X 2X2X2X2X 4X4X4X4X In 45nm technology, layer assignment is critical for timing and buffer area optimization In 45nm technology, layer assignment is critical for timing and buffer area optimization

4 Wire RC and Delay Wire in higher layer has much smaller delay

5 Impact to Buffering A buffer can drive longer distance in higher layer A buffer can drive longer distance in higher layer Timing is improved Timing is improved Fewer buffers are needed Fewer buffers are needed

6 Impact to Routing/Buffering IP IP

7 Problem Formulation Find a minimal cost layer assignment such that the timing constraint is satisfied. Find a minimal cost layer assignment such that the timing constraint is satisfied. Same Layer Can be different layers Given Given –A buffered Steiner tree with n wire segments –Timing constraint –m wire layers with RC parameters and cost A layer refers to a pair of horizontal and vertical layers with similar RC characteristics A layer refers to a pair of horizontal and vertical layers with similar RC characteristics Between any buffers, one layer is used Between any buffers, one layer is used In early design stage, when buffering effect is considered, wire shaping is not important [Alpert TCAD’01] In early design stage, when buffering effect is considered, wire shaping is not important [Alpert TCAD’01] In post-routing stage, wire shaping could improve timing, reduce vias and reduce coupling and so forth In post-routing stage, wire shaping could improve timing, reduce vias and reduce coupling and so forth

Fully Polynomial Time Approximation Scheme (FPTAS) 8 A Fully Polynomial Time Approximation Scheme A Fully Polynomial Time Approximation Scheme Provably good Provably good Within (1+ ɛ ) optimal cost for any ɛ >0 Within (1+ ɛ ) optimal cost for any ɛ >0 Runs in time polynomial in n (segments), m (layers) and 1/ ɛ Runs in time polynomial in n (segments), m (layers) and 1/ ɛ Ultimate solution for an NP-hard problem in theory Ultimate solution for an NP-hard problem in theory Highly practical Highly practical 1X1X1X1X 2X2X2X2X 4X4X4X4X

Previous Work in ICCAD’08 It depends on M and uses a DP of O(mn 3 / ɛ 2 ) time It depends on M and uses a DP of O(mn 3 / ɛ 2 ) time 9 Bound independent oracle query Our DP needs one run for all W New FPTAS runs in O(mn 2 / ɛ ) time New FPTAS runs in O(mn 2 / ɛ ) time Ratio between upper and lower bounds of the cost of optimal layer assignment An iterative DP with incremental W

10 The Rough Picture W*: the cost of optimal solution Check it Make guess on W* Return the solution Good (close to W*) Not Good Key 2: Smart guess Key 1: Efficient checking

11 Key 1: Efficient Checking Benefit of guess Only maintain the solutions with cost no greater than the guessed cost Only maintain the solutions with cost no greater than the guessed cost Accelerate DP Accelerate DP

Oracle (x): the checker, able to decide whether x>W* or not Oracle (x): the checker, able to decide whether x>W* or not – Without knowing W* – Answer efficiently 12 The Oracle Oracle (x) Guess x within the bounds Setup upper and lower bounds of cost W* Update the bounds

13 Construction of Oracle(x) Scale and round each wire cost Only interested in whether there is a solution with cost up to x satisfying timing constraint Dynamic Programming Perform DP to scaled problem with cost bound n/ ɛ. Time polynomial in n/ ɛ

14 Scaling and Rounding ɛ x ɛ /n ɛ 2x ɛ /n ɛ 3x ɛ /n ɛ 4x ɛ /n Wire cost 0 Wire cost is integer after scaling and rounding with upper bound n/ ɛ. Total # solutions is bounded in DP Rounding error at each wire ɛ, total rounding error ɛ. Rounding error at each wire  x ɛ /n, total rounding error  x ɛ. Larger x: larger error, fewer distinct costs and faster Larger x: larger error, fewer distinct costs and faster Smaller x: smaller error, more distinct costs and slower Smaller x: smaller error, more distinct costs and slower Rounding is the reason of acceleration Rounding is the reason of acceleration

Dynamic Programming Results 15 Yes, there is a solution satisfying timing constraint No, no such solution With cost rounding back, the solution has cost at most n/ ɛ x ɛ /n + x ɛ = (1+ ɛ )x > W* With cost rounding back, the solution has cost at least n/ ɛ x ɛ /n = x  W* DP result w/ all w are integers  n/ ɛ

16 Solution Characterization To model effect to upstream, a candidate solution is associated with To model effect to upstream, a candidate solution is associated with v: a node v: a node Q: required arrival time Q: required arrival time W: cumulative wire cost W: cumulative wire cost

17 Cost (W)-Bounded Dynamic Programming (DP) Candidate solutions are propagated toward the source Start from sinks Candidate solutions are generated Two operations – –Subtree processing – –Solution update at buffer Solution Pruning

18 Subtree Processing Three paths Three paths –p a : a -> u –P b : b -> u –P c : c -> u Q u (l)=min{Q a -d(p a,l),Q b - d(p b,l),Q c -d(p c,l)} Q u (l)=min{Q a -d(p a,l),Q b - d(p b,l),Q c -d(p c,l)} W u (l)=W a +W b +W c +w(T,l) W u (l)=W a +W b +W c +w(T,l) Wires are in the same layer l Wires are in the same layer l (Q u,W u ) (Q a,W a ) (Q b,W b ) (Q c,W c )

19 Exponential # of Solutions W (=n/ ɛ ) solutions at each downstream buffer W (=n/ ɛ ) solutions at each downstream buffer Naïve merging takes O(W k ) time with k branches Naïve merging takes O(W k ) time with k branches (Q a,1,W a,1 ) (Q a,2,W a,2 ) (Q a,3,W a,3 ) (Q a,4,W a,4 ) (Q u,W a ) (Q b,1,W b,1 ) (Q b,2,W b,2 ) (Q b,3,W b,3 ) (Q b,4,W b,4 ) (Q c,1,W c,1 ) (Q c,2,W c,2 ) (Q c,3,W c,3 ) (Q c,4,W c,4 ) k For two solutions at a node with the same W, the one with smaller Q is dominated For two solutions at a node with the same W, the one with smaller Q is dominated Try to only generate non-dominated solutions since most of O(W k ) solutions are dominated solutions Try to only generate non-dominated solutions since most of O(W k ) solutions are dominated solutions

20 Multi-Way Merging If best Q for cost w is obtained by merging Q(a 1 i1 ), Q(a 2 i2 ),..., Q(a k ik ), where i 1 +i 2 +…i k =w, best Q for cost w+1 is obtained by If best Q for cost w is obtained by merging Q(a 1 i1 ), Q(a 2 i2 ),..., Q(a k ik ), where i 1 +i 2 +…i k =w, best Q for cost w+1 is obtained by max 1 r k min {Q(a 1 i1 ),Q(a 2 i2 ),..., Q(a r ir+1 ),...,Q(a k ik )} max 1  r  k min {Q(a 1 i1 ),Q(a 2 i2 ),..., Q(a r ir+1 ),...,Q(a k ik )}

Four-Branch Example 21 Solution(w=8, Q=9) is shown. To compute Solution (w=9, Q)

Four-Branch Example – Case 1 22 Candidate Solution (w=9, Q=8)

Four-Branch Example – Case 2 23 Candidate Solution (w=9, Q=4)

Four-Branch Example – Case 3 24 Candidate Solution (w=9, Q=5)

Four-Branch Example – Case 4 25 Candidate Solution (w=9, Q=7)

Linear Time Multi-Way Merging 26 Lemma: given a subtree with m layers, k branches and W non-dominated solutions at each downstream buffer, one can merge them in O(mkW) time. Lemma: given a subtree with m layers, k branches and W non-dominated solutions at each downstream buffer, one can merge them in O(mkW) time.

Solution Update at Buffer 27 After merging, one non- dominated solution per layer per cost, totally O(mW) solutions After merging, one non- dominated solution per layer per cost, totally O(mW) solutions For each cost, find largest Q for all layers after buffer and propagate it For each cost, find largest Q for all layers after buffer and propagate it (Q u,W u ) (Q a,W a ) (Q b,W b ) (Q c,W c )

28 Cost-Bounded DP Lemma: given a tree with n wire segments and m layers, the optimal layer assignment subject to cost budget W=n/ ɛ can be computed in O(mnW)=O(mn 2 / ɛ ) time. Lemma: given a tree with n wire segments and m layers, the optimal layer assignment subject to cost budget W=n/ ɛ can be computed in O(mnW)=O(mn 2 / ɛ ) time.

29 Key 2: Bound Independent Guess U (L): upper (lower) bound on W* U (L): upper (lower) bound on W* Naive binary search style approach Naive binary search style approach Runtime depends on the initial bounds U and L Runtime depends on the initial bounds U and L Oracle (x) ɛ x=(U+L)/2 and W=n/ ɛ Set U and L on W* (1+ ɛ )x U= (1+ ɛ )x L= x W*<(1+ ɛ )x W*  x

30 Adapt ɛ ɛ Rounding factor x ɛ /n for cost Larger ɛ : faster with rough estimation Larger ɛ : faster with rough estimation Smaller ɛ : slower with accurate estimation Smaller ɛ : slower with accurate estimation Adapt ɛ and relate it with U and L Adapt ɛ and relate it with U and L

31 U/L Related Scale & Round Wire cost 0 U/L x ɛ /n

32 Conceptually Begin with large ɛ ’ and progressively reduce it according to U/L as x approaches W* Begin with large ɛ ’ and progressively reduce it according to U/L as x approaches W* Set ɛ ’ ɛ Set ɛ ’ as a geometric sequence of …, 8, 4, 2, 1, 1/2, …, ɛ ɛ Total runtime is O(… + n/8 + n/4 + n/2 + … + n/ ɛ ) = O(n/ ɛ ). Independent of # of iterations One run of DP takes about O(n/ ɛ ) time. Total runtime is O(… + n/8 + n/4 + n/2 + … + n/ ɛ ) = O(n/ ɛ ). Independent of # of iterations

Oracle Query Till U/L<2 33

When U/L<2 34 At least one feasible solution, otherwise no solution w/ cost 2n/ ɛ L ɛ /n = 2L  U At least one feasible solution, otherwise no solution w/ cost 2n/ ɛ L ɛ /n = 2L  U Runs in O(mn 2 / ɛ ) time Runs in O(mn 2 / ɛ ) time Pick min cost solution satisfying timing at driver W=2n/ ɛ Scale and round each cost by L ɛ /n Scale and round each cost by L ɛ /n Run DP

35 FPTAS for Layer Assignment  Theorem: a (1+ ɛ ) approximation to the timing constrained minimum cost layer assignment problem can be computed in O(mn 2 / ɛ ) time for any ɛ >0.

36 The Algorithmic Flow Oracle (x) Adapting ɛ =[U/L-1] 1/2 Set U and L of W* Set x=[UL/(1+ ɛ )] 1/2 Update U or L U/L<2 Compute final solution

37 Experiments Experimental Setup Experimental Setup – 1000 industrial nets Compared to Dynamic Programming and the previous FPTAS [ICCAD’08] Compared to Dynamic Programming and the previous FPTAS [ICCAD’08]

38 Cost Ratio Compared to DP Approximation Ratio ɛ Wire Cost Ratio

39 Speedup Compared to DP Approximation Ratio ɛ Speedup

40 Observations FPTAS always achieves the theoretical guarantee FPTAS always achieves the theoretical guarantee Larger ɛ leads to more speedup Larger ɛ leads to more speedup 3.9x faster with 2.2% additional wire area compared to DP 3.9x faster with 2.2% additional wire area compared to DP Up to 6.5x faster than DP Up to 6.5x faster than DP On average about 2x faster than previous FPTAS On average about 2x faster than previous FPTAS

41 Conclusion Propose a (1+ ɛ ) approximation for timing constrained layer assignment for any ɛ > 0 running in O(mn 2 / ɛ ) time Propose a (1+ ɛ ) approximation for timing constrained layer assignment for any ɛ > 0 running in O(mn 2 / ɛ ) time –Linear time DP running in O(mnW) time –Bound independent oracle query –Up to 6.5x faster than DP and 2x faster than previous FPTAS –Few percent additional wire area compared to DP as guaranteed theoretically

42 Thanks