Performance-Driven Interconnect Optimization Charlie Chung-Ping Chen



Publications:
A Fast Algorithm for Optimal Wire-Sizing Under Elmore Delay Model, ISCAS, 1995.
Optimal Wire-Sizing Formula Under the Elmore Delay Model, DAC, 1996.
Optimal Wire-Sizing Formula Under the Elmore Delay Model, ACM Physical Design Workshop, 1996.
Performance-Driven Buffered Clock Tree Optimization Based on Lagrangian Relaxation, DAC, 1996.
Optimal Non-Uniform Wire-Sizing for Routing Trees, ICCAD, 1996.
Optimal Wire-Sizing Function with Fringing Capacitance Consideration, DAC, 1997.
Spec-Based Buffer Insertion and Wire-Sizing for RC Nets, DTTC, 1997.
Fast and Exact Simultaneous Transistor and Wire-Sizing by Lagrangian Relaxation, ICCAD, 1998.

Outline: Interconnect Optimization Thesis; Wire Sizing; Buffer Sizing; Buffer Insertion; Interconnect Simulation

Interconnect Delay Trend

Interconnect Delay Trend

While gate delay diminishes, wire delay grows, because the wires and the isolation layers between them become thinner. Today, signals propagate over wires at about c/40 (in Al: 43 µm per 7 ps). In 15 years, even with Cu, we expect only c/80. The gate delay in this graph is not the logic gate delay. Rather, it is t = RC, where R is the on-resistance of a minimal transistor and C is the load of a minimal transistor gate.

Wire-Sizing (figure: wire segments with sizes x1, x2, x3, x4)

Buffer Sizing (figure: buffers with sizes x1, x2)

Buffer Insertion

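Interconnect Model (area capacitance): a wire segment of length L and width x is modeled as a π-section with series resistance r0 L / x and a capacitance ca L x / 2 at each end.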

Driver Model Rd

Elmore Delay Computation (figure: driver resistance Rd, wire resistances R1 to R4, node capacitances C1 to C4, load capacitance CL)
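The Elmore delay of the ladder in this figure can be sketched as follows. This is a minimal illustration, assuming that Rd is in series with R1, that capacitance Ci sits at the node after Ri, and that CL is attached to the last node; the function name and argument layout are illustrative, not taken from the slides.

```python
def elmore_delay(rd, resistances, capacitances, cl):
    """Elmore delay from the source to the far end of an RC ladder.

    Assumed topology: driver resistance rd, then resistances[i] leading
    to a node that carries capacitances[i]; the load cl is added at the
    last node.
    """
    caps = list(capacitances)
    caps[-1] += cl
    delay = 0.0
    downstream = 0.0
    # Walk from the sink back to the driver, accumulating the
    # capacitance that each resistor has to charge.
    for r, c in zip(reversed(resistances), reversed(caps)):
        downstream += c
        delay += r * downstream
    # The driver resistance sees the total capacitance.
    return delay + rd * downstream


# Four equal segments, in ohms and pF (so the result is in ps).
print(elmore_delay(100.0, [50.0] * 4, [0.1] * 4, 0.5))
```

Each resistor contributes its resistance times the total capacitance downstream of it, which is why a single backward pass is enough.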

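Uniform Wire-Sizing (figure: wire segments with sizes x1, x2, x3, x4)

Non-uniform Wire-Sizing (figure: wire width y = f(x) along position x, driver resistance Rd, load CL)

Optimal Wire-Sizing Function: Exponential Tapering (figure: width y along position x, driver resistance Rd, load CL)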

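Interconnect Model (Fringing Capacitance): a wire segment of length L and width x is modeled as a π-section with series resistance r0 L / x and a capacitance (ca L x + cf L) / 2 at each end.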
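The two interconnect models can be combined with the Elmore delay to evaluate a sized wire. The sketch below is illustrative only: the function names, the midpoint sampling of the taper, and the numeric values are assumptions, while r0, ca, cf, Rd and CL follow the symbols on the slides.

```python
import math


def wire_elmore_delay(widths, seg_len, rd, cl, r0, ca, cf=0.0):
    """Elmore delay of a wire cut into equal-length segments.

    widths[0] is the segment next to the driver. Each segment is a
    pi-section: R = r0*L/x in series, (ca*L*x + cf*L)/2 at each end.
    """
    delay = 0.0
    downstream = cl
    for x in reversed(widths):
        r = r0 * seg_len / x
        c = ca * seg_len * x + cf * seg_len
        # The far-end half of this segment plus everything after it.
        delay += r * (c / 2.0 + downstream)
        downstream += c
    return delay + rd * downstream


def exponential_taper(a, b, n, length):
    """Sample f(x) = a*exp(-b*x) at the midpoints of n equal segments."""
    dx = length / n
    return [a * math.exp(-b * (i + 0.5) * dx) for i in range(n)]


# Hypothetical values: 1000-unit wire, 10 segments, ohms and pF.
uniform = [2.0] * 10
tapered = exponential_taper(3.0, 0.001, 10, 1000.0)
print(wire_elmore_delay(uniform, 100.0, 100.0, 0.5, 0.1, 0.0002))
print(wire_elmore_delay(tapered, 100.0, 100.0, 0.5, 0.1, 0.0002))
```

For a uniform wire the result equals the closed form Rd (Cw + CL) + Rw (Cw / 2 + CL) regardless of the number of segments, which makes a convenient sanity check.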

Optimal Wire-Sizing Function: W-function tapering (figure: width y along the wire, f(x) = a e^{-bx}, load CL)

Lambert’s W function: w is defined by w e^w = x (plot of w against x for x between -0.4 and 0.4)
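For readers who want to evaluate w from w e^w = x numerically, here is a minimal sketch using Newton's iteration on the principal branch. The function name, starting guesses, and tolerance are assumptions made for illustration; the slides do not say how the function is evaluated.

```python
import math


def lambert_w0(x, tol=1e-14, max_iter=100):
    """Principal branch of Lambert's W: solve w * exp(w) = x for w > -1.

    Sketch only: valid for x > -1/e (the branch point itself is excluded).
    """
    if x <= -1.0 / math.e:
        raise ValueError("this sketch requires x > -1/e")
    # Both starting points lie at or above the root, where w*exp(w) is
    # increasing and convex, so Newton's steps approach the root
    # monotonically from above.
    w = math.log1p(x) if x > 0 else 0.0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


w = lambert_w0(0.4)
print(w, w * math.exp(w))  # the second value should reproduce 0.4
```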

Optimal Wire-Sizing Function Cf

Constrained Wire-Sizing (figure: wire shapes composed of regions labeled A, B, and C)

Relations among the six types of functions: A, B, C, AB, BC, ABC (figure labels: L, Rd, CL)

Wire-Sizing for Routing Trees (figure: routing tree with elements labeled D, w, and f)

Weighted Sink Delay Optimization

Optimally Resizing One Segment

Minimizing Area with Delay Constraints

Minimizing Area with Delay Constraints: Lagrangian Relaxation Subproblem

Minimizing Area with Delay Constraints

Algorithm Framework: alternate between two steps, (1) minimize the weighted sink delays and (2) adjust the Lagrange multipliers (sink weights), until the delay constraints are met.
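The framework can be illustrated on the smallest possible case: one wire segment, one sink, and one delay constraint. Everything below is an assumption made for the sketch (single segment, area measured by the width x, no fringing capacitance, and a bisection update of the multiplier swapped in for whatever update rule the author uses). The inner step minimizes x + lam * delay(x) in closed form; the outer step adjusts lam until the delay meets the target.

```python
import math


def delay(x, rd, cl, r0, ca, length):
    """Elmore delay of one pi-modelled segment of width x."""
    r = r0 * length / x
    c = ca * length * x
    return rd * (c + cl) + r * (c / 2.0 + cl)


def best_width(lam, rd, cl, r0, ca, length):
    """Minimize x + lam*delay(x); delay(x) = const + B*x + K/x."""
    b = rd * ca * length
    k = r0 * length * cl
    return math.sqrt(lam * k / (1.0 + lam * b))


def min_area_width(target, rd, cl, r0, ca, length, lam_hi=1e6, iters=200):
    """Smallest width meeting delay <= target (hypothetical helper)."""
    params = (rd, cl, r0, ca, length)
    x_fastest = math.sqrt((r0 * length * cl) / (rd * ca * length))
    if delay(x_fastest, *params) > target:
        raise ValueError("target delay is not achievable")
    lam_lo = 1e-12
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        x = best_width(lam, *params)
        # Too slow: weight the delay more. Fast enough: weight it less.
        if delay(x, *params) > target:
            lam_lo = lam
        else:
            lam_hi = lam
    return best_width(lam_hi, *params), lam_hi


x, lam = min_area_width(150.0, 100.0, 0.5, 0.1, 0.0002, 1000.0)
print(x, lam, delay(x, 100.0, 0.5, 0.1, 0.0002, 1000.0))
```

With several sinks there is one multiplier per sink, and a subgradient-style update is the usual replacement for the bisection used here.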

Minimizing Maximum Delay: Lagrangian Relaxation Subproblem

Delay/Power/Area Minimization

Simultaneous Buffer and Wire-Sizing (figure: routing tree with elements labeled D, w, and f)

Uniform Wire-Sizing (figure: routing tree with elements labeled D, w, and f)

Upperbound, Lowerbound, Skew

Buffer Insertion 1

Spec-based Buffer Insertion. Given: a routing tree, the possible buffer insertion locations, required arrival times at the receivers, a maximum slope constraint, and a polarity requirement, together with a library of buffers B1, B2, ..., Bn. Goal: insert buffers to satisfy the spec (maximum delay, delay bounds at each receiver, and maximum slope). (Figure: routing tree with potential buffer locations marked.)

Problem formulation. Combinations of the following: buffer insertion, buffer sizing, wire sizing. Goals and constraints: minimize the maximum delay; satisfy delay constraints at each receiver; repeater insertion location constraints; maximum slope constraint; polarity constraints.

Current Issues. Previous solutions: exhaustive enumeration grows exponentially. First van Ginneken, and then Lillis, suggested a dynamic programming approach that finds the delay-optimal solution under the Elmore delay model. It also provides very useful information such as power-delay curves. Problems: the delay model is not accurate, reliability issues are not considered, and runtime and storage are already high.

Brief Algorithm Description. Traverse the circuit bottom-up. Enumerate all possible solutions and prune suboptimal ones dynamically. How do we know which solutions to kill? Kill those that violate the constraints, and those for which another solution costs less and does at least as well in every aspect: number of buffers, maximum delay, polarity, area, power, ...
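The bottom-up enumeration with pruning can be sketched for the simplest case: a two-pin net, one buffer type, and the Elmore delay model. This is a van Ginneken-style illustration under those stated assumptions, not the spec-based algorithm of the talk (no slope, polarity, or buffer library). All function names and numeric values are hypothetical. Each candidate is a tuple (load capacitance, required arrival time, buffer positions).

```python
from itertools import combinations


def add_wire(cands, r, c):
    """Propagate candidates across a pi-modelled wire segment."""
    return [(load + c, q - r * (c / 2.0 + load), bufs) for load, q, bufs in cands]


def add_buffer(cands, rb, cb, db, pos):
    """Add the best buffered candidate at position pos."""
    load, q, bufs = max(cands, key=lambda t: t[1] - rb * t[0])
    return cands + [(cb, q - db - rb * load, bufs + (pos,))]


def prune(cands):
    """Drop candidates beaten on both load and required time."""
    kept, best_q = [], float("-inf")
    for cand in sorted(cands, key=lambda t: (t[0], -t[1])):
        if cand[1] > best_q:
            kept.append(cand)
            best_q = cand[1]
    return kept


def insert_buffers(segments, rd, cl, rat, rb, cb, db):
    """Return (best slack at the driver, buffer positions)."""
    cands = [(cl, rat, ())]
    for i in range(len(segments) - 1, -1, -1):
        r, c = segments[i]
        cands = add_wire(cands, r, c)
        if i > 0:  # a buffer may sit between segment i-1 and segment i
            cands = prune(add_buffer(cands, rb, cb, db, i))
    return max(((q - rd * load, bufs) for load, q, bufs in cands), key=lambda t: t[0])


def slack_for(segments, rd, cl, rat, rb, cb, db, positions):
    """Slack of one fixed placement, used to cross-check the pruning."""
    load, q = cl, rat
    for i in range(len(segments) - 1, -1, -1):
        r, c = segments[i]
        q -= r * (c / 2.0 + load)
        load += c
        if i in positions:
            q -= db + rb * load
            load = cb
    return q - rd * load


def brute_force_best(segments, *args):
    """Exhaustive enumeration: exponential in the number of positions."""
    spots = range(1, len(segments))
    return max(slack_for(segments, *args, set(p))
               for k in range(len(segments)) for p in combinations(spots, k))


segs = [(200.0, 0.2)] * 4                          # (ohms, pF) per segment
args = (300.0, 0.1, 1000.0, 100.0, 0.02, 30.0)     # rd, cl, rat, rb, cb, db
print(insert_buffers(segs, *args), brute_force_best(segs, *args))
```

The pruning rule is the two-dimensional version of "costs less and achieves more in all aspects"; adding buffer count, area, or power as cost dimensions enlarges the candidate tuples but keeps the same structure.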

Example (figure: candidate solutions labeled 1 and 2)

Example (figure, continued: candidate solutions labeled 1 and 2)

Problems. How can the gate and interconnect delay be calculated accurately and efficiently? A naive approach repeatedly calculates the delay of the subtree by calling AWE or SPICE, causing an O(N) penalty for each solution. However, the runtime is already high (proportional to N^2). An efficient hierarchical delay computation method is needed. How can slope be taken into consideration?

Accurate Load Model (Effective Capacitance). The total net capacitance is no longer a valid load model; the second-order π-load driving-point admittance approximation is more accurate. Example (figure): a 10000 µm line with a 175 µm/100 µm driver, modeled as an RC ladder with three 232.8 Ω resistors and capacitances of 0.178 pF, 0.356 pF, 0.356 pF, and 0.178 pF. Three load models are compared: the total capacitance load model (1.07 pF; too pessimistic, up to 30% error), the second-order π-load model, and the effective capacitance load model (chosen to give equal average currents). Other values shown in the figure: 364.2 Ω, 0.76 pF, 0.226 pF, 0.884 pF.
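A π-load can be obtained by matching the first three moments of the driving-point admittance, Y(s) = y1 s + y2 s^2 + y3 s^3 + ... The sketch below uses the standard three-moment match; it is shown as an illustration and is not necessarily the exact procedure used in the talk. The function names and the sample values are assumptions.

```python
def pi_model(y1, y2, y3):
    """Return (c_near, r, c_far) of a pi-load matching three moments.

    For a pi-load, Y(s) = s*c_near + s*c_far / (1 + s*r*c_far), so
    y1 = c_near + c_far, y2 = -r*c_far**2, y3 = r**2 * c_far**3.
    """
    c_far = y2 * y2 / y3
    c_near = y1 - c_far
    r = -(y3 * y3) / (y2 ** 3)
    return c_near, r, c_far


def pi_moments(c_near, r, c_far):
    """First three admittance moments of a given pi-load."""
    return c_near + c_far, -r * c_far ** 2, r ** 2 * c_far ** 3


# Round trip with hypothetical values (pF and ohms).
print(pi_model(*pi_moments(0.2, 300.0, 0.8)))
```

The total-capacitance model keeps only y1; the π-load adds the resistive shielding captured by y2 and y3.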

Accurate Repeater Model (Voltage Ramp). Several timing analyzers model the gate by a single resistor; errors of up to 30% have been reported. The proposed gate delay model is a fixed resistor Reff driven by a ramp voltage source. The voltage ramp parameters, t0 and tx, are determined from the gate characteristic equations. (Figure: a CMOS gate with input tin driving a load ZL(s), and its model: a ramp source with parameters t0 and tx behind the fixed resistor Reff, driving ZL(s).)

Accuracy of Voltage Ramp Model (plot: voltage from -0.1 to 2.9 against t (ns) from 0.0 to 1.0; curves: model voltage source, model output, actual output)

How to calculate the π-load hierarchically. There is a simple way to calculate the π-load hierarchically (it can actually handle approximations of arbitrarily higher order). Through a series element R(s): Ynew(s) = 1 / (R(s) + 1/Y1(s)). For parallel branches: Yeq(s) = Y1(s) + Y2(s). Taylor expansion: Yeq(s) = y1 s + y2 s^2 + y3 s^3 + y4 s^4 + y5 s^5 + y6 s^6.

What about wires? Transfer function computation: V2(s) = H(s) V1(s), with H(s) = Y1(s) / (Y1(s) + Y2(s)). Taylor expansion: H(s) = m0 + m1 s + m2 s^2 + m3 s^3 + m4 s^4 + m5 s^5 + m6 s^6.

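Many Stages. Transfer function computation: cascade the per-stage transfer functions, V3(s) = H1(s) H2(s) V1(s), with V2(s) = H1(s) V1(s). The first stage sees the equivalent downstream admittance Yeq(s) (denominator Y1(s) + Yeq(s)); the second stage has denominator Y3(s) + Y4(s).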

What about Trees? At a branch point with transfer functions H1(s) and H2(s) to the two subtrees, the downstream admittances Y1(s) and Y2(s) combine into Yeq(s). Keep track of the worst sink's transfer function.

Hierarchical moment computation (REX). Assume in general:
H(s) = 1/[b0 + b1 s + b2 s^2 + b3 s^3 + b4 s^4 + b5 s^5]
Y(s) = y1 s + y2 s^2 + y3 s^3 + y4 s^4 + y5 s^5 + y6 s^6
Across a capacitor C: H'(s) = H(s), Y'(s) = Y(s) + C s
Across a resistor R: H'(s) = H(s)/[1 + R Y(s)], Y'(s) = Y(s)/[1 + R Y(s)]
Base case at the receiver: H(s) = 1, Y(s) = CL s
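These update rules can be applied with truncated power-series arithmetic. The sketch below stores H(s) and Y(s) as lists of Taylor coefficients (index k holds the s^k term) rather than the denominator coefficients b0, b1, ... written on the slide; that representation, the truncation order, and all function names are assumptions made for illustration.

```python
ORDER = 6  # coefficients kept: s^0 .. s^5 (an arbitrary choice)


def mul(a, b):
    """Product of two truncated power series."""
    out = [0.0] * ORDER
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < ORDER:
                out[i + j] += ai * bj
    return out


def inv(a):
    """Reciprocal of a truncated power series with a[0] != 0."""
    out = [0.0] * ORDER
    out[0] = 1.0 / a[0]
    for k in range(1, ORDER):
        acc = sum(a[j] * out[k - j] for j in range(1, k + 1))
        out[k] = -acc / a[0]
    return out


def receiver(cl):
    """Base case: H(s) = 1, Y(s) = CL*s."""
    return [1.0] + [0.0] * (ORDER - 1), [0.0, cl] + [0.0] * (ORDER - 2)


def across_capacitor(h, y, c):
    """H' = H, Y' = Y + C*s."""
    y_new = list(y)
    y_new[1] += c
    return h, y_new


def across_resistor(h, y, r):
    """H' = H/(1 + R*Y), Y' = Y/(1 + R*Y)."""
    d = inv([1.0 + r * y[0]] + [r * yk for yk in y[1:]])
    return mul(h, d), mul(y, d)


# Two-stage ladder seen from the driver side: R = 1, C = 1, R = 1, CL = 1.
h, y = receiver(1.0)
h, y = across_resistor(h, y, 1.0)
h, y = across_capacitor(h, y, 1.0)
h, y = across_resistor(h, y, 1.0)
print(h[:3], y[:4])
```

The first-order coefficient of H(s) is the negative of the Elmore delay, so the Elmore model falls out as the lowest-order case of this computation.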

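With buffer inserted and slope consideration? The figure shows lookup tables of delay against slope, given as (slope, delay) pairs: (100, 400), (200, 450); (100, 425), (200, ......); (100, 380), (200, 430). The slopes at the two branches, V1(s) with Y1(s) and V2(s) with Y2(s), are 150 and 180. Interpolate and extrapolate the delay at the receivers.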
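The interpolation and extrapolation step can be sketched with a piecewise-linear lookup. Linear interpolation is an assumption here (the slide does not say which scheme is used), and the function name is hypothetical; the table values are the (slope, delay) pairs from the slide.

```python
def delay_at_slope(table, slope):
    """Piecewise-linear delay lookup; needs at least two (slope, delay) pairs.

    Slopes outside the table are extrapolated from the nearest end pair.
    """
    pts = sorted(table)
    # Pick the bracketing pair; if the loop runs to the end, the last
    # pair is kept and used for extrapolation.
    for (s0, d0), (s1, d1) in zip(pts, pts[1:]):
        if slope <= s1:
            break
    return d0 + (d1 - d0) * (slope - s0) / (s1 - s0)


print(delay_at_slope([(100, 400), (200, 450)], 150))  # interpolation
print(delay_at_slope([(100, 380), (200, 430)], 180))  # interpolation
print(delay_at_slope([(100, 400), (200, 450)], 250))  # extrapolation
```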

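Results
Sample net: 10000 µm line on m1pm2, broken into 40 segments
Delay before optimization: 2405 ps
Time for optimization: 27 seconds on RS6K

Delay (ps) by number of stages; the first column is unlabeled on the slide and its SPICE value equals the delay before optimization:

Stages     before   1      2      3      4
OUR        2467     1736   1388   1267   1218
SPICE      2405     1761   1404   1301   1274
% error    2.5      -1.4   -1.1   -2.6   -4.3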

Cost-Performance Curves (plot: max delay in ps, from 500 to 2500, against number of repeaters, 1 to 4; curves: OUR max delay and SPICE)

Case Study (figure: routing tree before buffering). Slope: 100; mcf = 1.5. Node labels: 22.4 and 16.5. Wire segment labels: 2.4/1000 1.2 m4; 2.4/3000 1.8 m5; 2.0/4000 1.2 m4 (twice); 2.4/500 1.2 m4; 1.2/1000 0.8 m3; 2.0/2500 0.8 m3; 2.0/1200 1.2 m4; 1.6/1300 1.2 m4; 2.0/1500 0.8 m3. Roses report: Delay = 1529 ps. Cse report: Delay = 1548 ps.

Case Study: Manual Result (8 buffers). Same routing tree and wire segment labels as in the unbuffered case study. Annotations in the figure: 102, 90, 93, 102, 27, 1.1, 32, 76. Slope: 100; max slope: 294. Roses report: Delay = 1047 ps. Cse report: Delay = 1053 ps.

Case Study: Our Result (5 buffers). Same routing tree and wire segment labels as in the unbuffered case study. Annotations in the figure: 70, 70, 60, 10, 40, and 2000, 2000. Slope: 100; max slope: 260. Roses report: Delay = 953 ps. SPICE report: Delay = 1017 ps.

Runtime Report

Number of wire segments vs. maximum delay (plot; additional axis label: number of repeaters)

Conclusion
The buffer model is accurate to within about 5 to 7% relative to SPICE.
The total net capacitance is no longer a valid load approximation.
Accurate models are used in the hierarchical moment computation of RC delays and slopes.
New moment-matching methods provide efficient and accurate delay calculation for RC nets, especially through hierarchical moment computation.
Dynamic programming approaches are applied to buffer insertion.
Hierarchical moment methods give efficient RC delay computation.