UCLA TRIO Package Jason Cong, Lei He Cheng-Kok Koh, and David Z. Pan Cheng-Kok Koh, and David Z. Pan UCLA Computer Science Dept Los Angeles, CA 90095.

Slides:



Advertisements
Similar presentations
Porosity Aware Buffered Steiner Tree Construction C. Alpert G. Gandham S. Quay IBM Corp M. Hrkic Univ Illinois Chicago J. Hu Texas A&M Univ.
Advertisements

Gregory Shklover, Ben Emanuel Intel Corporation MATAM, Haifa 31015, Israel Simultaneous Clock and Data Gate Sizing Algorithm with Common Global Objective.
EE 201A Modeling and Optimization for VLSI LayoutJeff Wong and Dan Vasquez EE 201A Noise Modeling Jeff Wong and Dan Vasquez Electrical Engineering Department.
Advanced Interconnect Optimizations. Buffers Improve Slack RAT = 300 Delay = 350 Slack = -50 RAT = 700 Delay = 600 Slack = 100 RAT = 300 Delay = 250 Slack.
4/22/ Clock Network Synthesis Prof. Shiyan Hu Office: EREC 731.
Buffer and FF Insertion Slides from Charles J. Alpert IBM Corp.
ELEN 468 Lecture 261 ELEN 468 Advanced Logic Design Lecture 26 Interconnect Timing Optimization.
Confidentiality/date line: 13pt Arial Regular, white Maximum length: 1 line Information separated by vertical strokes, with two spaces on either side Disclaimer.
1 Interconnect Layout Optimization by Simultaneous Steiner Tree Construction and Buffer Insertion Presented By Cesare Ferri Takumi Okamoto, Jason Kong.
Leakage and Dynamic Glitch Power Minimization Using MIP for V th Assignment and Path Balancing Yuanlin Lu and Vishwani D. Agrawal Auburn University ECE.
© Yamacraw, 2001 Minimum-Buffered Routing of Non-Critical Nets for Slew Rate and Reliability A. Zelikovsky GSU Joint work with C. Alpert.
Low-power Clock Trees for CPUs Dong-Jin Lee, Myung-Chul Kim and Igor L. Markov Dept. of EECS, University of Michigan 1 ICCAD 2010, Dong-Jin Lee, University.
Minimum-Buffered Routing of Non- Critical Nets for Slew Rate and Reliability Control Supported by Cadence Design Systems, Inc. and the MARCO Gigascale.
Circuit Retiming with Interconnect Delay CUHK CSE CAD Group Meeting One Evangeline Young Aug 19, 2003.
38 th Design Automation Conference, Las Vegas, June 19, 2001 Creating and Exploiting Flexibility in Steiner Trees Elaheh Bozorgzadeh, Ryan Kastner, Majid.
Interconnect Optimization for Deep-Submicron and Giga-Hertz ICs Lei He UCLA Computer Science Department Los Angeles, CA.
Interconnect Optimizations. A scaling primer Ideal process scaling: –Device geometries shrink by  = 0.7x) Device delay shrinks by  –Wire geometries.
ER UCLA UCLA ICCAD: November 5, 2000 Predictable Routing Ryan Kastner, Elaheh Borzorgzadeh, and Majid Sarrafzadeh ER Group Dept. of Computer Science UCLA.
EE4271 VLSI Design Interconnect Optimizations Buffer Insertion.
Chapter 7 Reading on Moment Calculation. Time Moments of Impulse Response h(t) Definition of moments i-th moment Note that m 1 = Elmore delay when h(t)
Power Optimal Dual-V dd Buffered Tree Considering Buffer Stations and Blockages King Ho Tam and Lei He Electrical Engineering Department University of.
Interconnect Optimizations
An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction Yan Lin 1, Yu Hu 1, Lei He 1 and Vijay Raghunathan 2 1 EE Department,
Effects of Global Interconnect Optimizations on Performance Estimation of Deep Sub-Micron Design Yu (Kevin) Cao 1, Chenming Hu 1, Xuejue Huang 1, Andrew.
Fast Buffer Insertion Considering Process Variation Jinjun Xiong, Lei He EE Department University of California, Los Angeles Sponsors: NSF, UC MICRO, Actel,
A Global Minimum Clock Distribution Network Augmentation Algorithm for Guaranteed Clock Skew Yield A. B. Kahng, B. Liu, X. Xu, J. Hu* and G. Venkataraman*
EE4271 VLSI Design Advanced Interconnect Optimizations Buffer Insertion.
High-Speed Circuit-Tuning Techniques Based on Lagrangian Relaxation Charlie Chung-Ping Chen (608)
Processing Rate Optimization by Sequential System Floorplanning Jia Wang 1, Ping-Chih Wu 2, and Hai Zhou 1 1 Electrical Engineering & Computer Science.
ELEN 468 Lecture 271 ELEN 468 Advanced Logic Design Lecture 27 Interconnect Timing Optimization II.
Gate Sizing by Mathematical Programming Prof. Shiyan Hu
Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Yan Lin and Lei He EE Department, UCLA Partially supported.
RLC Interconnect Modeling and Design Students: Jinjun Xiong, Jun Chen Advisor: Lei He Electrical Engineering Department Design Automation Group (
Effects of Global Interconnect Optimizations on Performance Estimation of Deep Sub-Micron Design Yu Cao, Chenming Hu, Xuejue Huang, Andrew B. Kahng, Sudhakar.
Interconnect Synthesis. Buffering Related Interconnect Synthesis Consider –Layer assignment –Wire sizing –Buffer polarity –Driver sizing –Generalized.
Advanced Interconnect Optimizations. Timing Driven Buffering Problem Formulation Given –A Steiner tree –RAT at each sink –A buffer type –RC parameters.
Modern VLSI Design 4e: Chapter 4 Copyright  2008 Wayne Wolf Topics n Interconnect design. n Crosstalk. n Power optimization.
Lecture 12 Review and Sample Exam Questions Professor Lei He EE 201A, Spring 2004
EE 5900 Advanced Algorithms for Robust VLSI CAD, Spring 2009 Static Timing Analysis and Gate Sizing.
An Efficient Clustering Algorithm For Low Power Clock Tree Synthesis Rupesh S. Shelar Enterprise Microprocessor Group Intel Corporation, Hillsboro, OR.
Thermal-aware Steiner Routing for 3D Stacked ICs M. Pathak and S.K. Lim Georgia Institute of Technology ICCAD 07.
Optimal digital circuit design Mohammad Sharifkhani.
ECO Timing Optimization Using Spare Cells Yen-Pin Chen, Jia-Wei Fang, and Yao-Wen Chang ICCAD2007, Pages ICCAD2007, Pages
Modern VLSI Design 3e: Chapter 4 Copyright  1998, 2002 Prentice Hall PTR Topics n Interconnect design. n Crosstalk. n Power optimization.
A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, and Charles J. Alpert** *Dept of ECE, Michigan Technological.
ELEN 468 Lecture 271 ELEN 468 Advanced Logic Design Lecture 27 Gate and Interconnect Optimization.
1 ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time Jeng-Liang Tsai Tsung-Hao Chen Charlie Chung-Ping Chen (National.
Fast Algorithms for Slew Constrained Minimum Cost Buffering S. Hu*, C. Alpert**, J. Hu*, S. Karandikar**, Z. Li*, W. Shi* and C. Sze** *Dept of ECE, Texas.
Physical Synthesis Buffer Insertion, Gate Sizing, Wire Sizing,
Modern VLSI Design 4e: Chapter 3 Copyright  2008 Wayne Wolf Topics n Wire delay. n Buffer insertion. n Crosstalk. n Inductive interconnect. n Switch logic.
Routing Tree Construction with Buffer Insertion under Obstacle Constraints Ying Rao, Tianxiang Yang Fall 2002.
Prof. Shiyan Hu Office: EERC 518
Modern VLSI Design 3e: Chapter 3 Copyright  1998, 2002 Prentice Hall PTR Topics n Wire delay. n Buffer insertion. n Crosstalk. n Inductive interconnect.
1ISPD'03 Process Variation Aware Clock Tree Routing Bing Lu Cadence Jiang Hu Texas A&M Univ Gary Ellis IBM Corp Haihua Su IBM Corp.
High-Speed Circuit-Tuning Techniques Based on Lagrangian Relaxation Charlie Chung-Ping Chen ICCAD 99’ Embedded Tutorial Session 12A
A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion Shiyan Hu*, Zhuo Li**, Charles Alpert** *Dept of Electrical.
-1- Delay Uncertainty and Signal Criticality Driven Routing Channel Optimization for Advanced DRAM Products Samyoung Bang #, Kwangsoo Han ‡, Andrew B.
A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion Shiyan Hu*, Zhuo Li**, Charles Alpert** *Dept of Electrical.
A Novel Timing-Driven Global Routing Algorithm Considering Coupling Effects for High Performance Circuit Design Jingyu Xu, Xianlong Hong, Tong Jing, Yici.
1 Timing Closure and the constant delay paradigm Problem: (timing closure problem) It has been difficult to get a circuit that meets delay requirements.
An O(bn 2 ) Time Algorithm for Optimal Buffer Insertion with b Buffer Types Authors: Zhuo Li and Weiping Shi Presenter: Sunil Khatri Department of Electrical.
An O(nm) Time Algorithm for Optimal Buffer Insertion of m Sink Nets Zhuo Li and Weiping Shi {zhuoli, Texas A&M University College Station,
Topics Driving long wires..
Jason Cong, David Zhigang Pan & Prasanna V. Srinivas
Buffer Insertion with Adaptive Blockage Avoidance
Chapter 3b Static Noise Analysis
Objectives What have we learned? What are we going to learn?
Performance-Driven Interconnect Optimization Charlie Chung-Ping Chen
Chapter 3b Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Prof. Lei He Electrical Engineering Department.
Jason Cong, David Zhigang Pan & Prasanna V. Srinivas
Presentation transcript:

UCLA TRIO Package Jason Cong, Lei He Cheng-Kok Koh, and David Z. Pan Cheng-Kok Koh, and David Z. Pan UCLA Computer Science Dept Los Angeles, CA 90095

Optimal Interconnect Synthesis Constraints: Delay Skew Signal Integrity... Spacing Sizing Topology n n Automatic solutions guided by accurate interconnect models Optimized Interconnect designs:

UCLA TRIO Package n Technology advances lead to the need for interconnect- driven design n Interconnect optimization techniques for performance and signal integrity u Topology optimization u Buffer(repeater) insertion u Device sizing, wire sizing and spacing n TRIO: Tree, Repeater, and Interconnect Optimization n Goal: to develop a unified framework to apply various interconnect layout optimization techniques independently or simultaneously

Components of TRIO n Optimization engine u Tree construction u Buffer (repeater) insertion u Device sizing, wire sizing and spacing n Delay computation u Elmore delay model u Higher-order delay model n Device delay and interconnect capacitance model u Simple formula-based model u Table look-up based model

Optimization Engines of TRIO n Tree construction u A-tree, buffered A-tree, and RATS-tree n Buffer insertion n Wire sizing and spacing u Single-source wire sizing u Multi-source wire sizing u Global wire sizing and spacing with coupling cap n Simultaneous device and interconnect optimization: u Simultaneous buffer insertion and wire sizing u Simultaneous device and wire sizing F simple models for device delay and interconnect cap u Simultaneous device sizing, and wire sizing and spacing F table-based models for device delay and coupling cap

Classification of TRIO Algorithms n Bottom-up approach u A-tree [Cong-Leung-Zhou, DAC’93] u Buffered and wiresized A-tree [Okamoto-Cong, ICCAD’96] u RATS-tree [Cong-Koh, ICCAD’97] u Simultaneous buffer insertion and wire sizing [Lillis-Cheng-Lin, ICCAD’95] u Global interconnect sizing and spacing with coupling cap [Cong- He-Koh-Pan, ICCAD’97] n Local-refinement (LR) based approach u Single-source wire sizing [Cong-Leung, ICCAD’93] u Multi-source and variable-segmentation wire sizing [Cong-He, ICCAD’95] u Simultaneous driver/buffer and wire sizing [Cong-Koh, ICCAD’94, Cong-Koh-Leung, ISLPED’96] u Simultaneous device sizing, and wire sizing and spacing using table-based models for device delay and coupling cap [Cong-He, ICCAD’96, TCAD’99]

A- tree Algorithm [Cong-Leung-Zhou, DAC’93] n A-tree: Rectilinear Steiner arborescence (shortest path tree) n Resistance ratio: Driver resistance vs. unit wire resistance n As resistance ratio decreases, min-cost A-tree has better performance than Steiner minimal tree n A-tree algorithm u Start with a forest of n single-node A-trees, repeatedly F Grow an existing A-tree, or F Combine two A-trees into a new one

Buffer Insertion Algorithm [van Ginneken, ISCAS’90] n Given topology, buffer types, and candidate buffer locations, insert buffers to minimize maximum sink delay DC Connected subtree for i i

Optimal Buffer Insertion by Dynamic Programming n Bottom-up computation of irredundant set of options (c,q)’s at each buffer candidate location u Option (c,q), c: Cap. of DC-connected subtree c: Cap. of DC-connected subtree q: Req. arrival time corresponding to c q: Req. arrival time corresponding to c u Pruning Rule: For (c,q) and (c’, q’), (c’, q’) is redundant if c’  c and q’ < q u Total number of options in the source is polynomial-bounded n Top-down selection of optimal buffer types and buffer locations

Further Works on Bottom-up Approach n Simultaneous buffer insertion and wire sizing [Lillis- Chen-Lin, ICCAD’95] n Wiresized Buffered A-tree (WBA-tree) [Okamoto-Cong, ICCAD’96] u Combination of A-tree, simultaneous buffer insertion and wire sizing n Global interconnect sizing and spacing considering coupling cap [Cong-He-Koh-Pan, ICCAD’97] n RATS-tree [Cong-Koh, ICCAD’97] u Extension to higher-order delay model via bottom-up moment computation

Classification of TRIO Algorithms n Bottom-up approach u A-tree [Cong-Leung-Zhou, DAC’93] u Buffered and wiresized A-tree [Okamoto-Cong, ICCAD’96] u RATS-tree [Cong-Koh, ICCAD’97] u Simultaneous buffer insertion and wire sizing [Lillis-Cheng-Lin, ICCAD’95] u Global interconnect sizing and spacing with coupling cap [Cong- He-Koh-Pan, ICCAD’97] n Local-refinement (LR) based approach u Single-source wire sizing [Cong-Leung, ICCAD’93] u Multi-source and variable-segmentation wire sizing [Cong-He, ICCAD’95] u Simultaneous driver/buffer and wire sizing [Cong-Koh, ICCAD’94, Cong-Koh-Leung, ISLPED’96] u Simultaneous device sizing, and wire sizing and spacing using table-based models for device delay and coupling cap [Cong-He, ICCAD’96, TCAD’99]

Discrete Wiresizing Optimization [Cong-Leung, ICCAD’93] n Given: A set of possible wire widths { W 1, W 2, …, W r } n Find: An optimal wire width assignment to minimize weighted sum of sink delays Wiresizing Optimization

Dominance Relation and Local Refinement [Cong-Leung, ICCAD 93] n Dominance Relation For all E j, W(E j )  W'(E j ) n n Local Refinement of E  W dominates W' Given wire width assignment W compute optimal wire width of E assuming other wire width fixed in W W W’ W’

Dominance Property for Optimal Wiresizing n Theorem (Dominance Property): u Assignment W dominates optimal assignment W* W’ = local refinement of W Then, W’ dominates W* u If W is dominated by W* W’ = local refinement of W Then, W’ is dominated by W* W o = Min Width Assignment (dominated by opt. sol.)  W 1 = Local-Refinement(W 0 )  W 2 = Local-Refinement(W 1 ) Dominates n Application of Dominance Property u W i dominated by opt. sol.  lower bound computation

Further Works on LR-based Approach n Multi-source wire sizing optimization with variable segmentation [Cong-He, ICCAD’95] u Bundled local refinement (BLR) that is 100x faster than local refinement (LR) n Simultaneous driver/buffer and wire sizing [Cong-Koh, ICCAD’94, Cong-Koh-Leung, ISLPED’96] n Simultaneous device and wire sizing [Cong-He, PDW’96, ICCAD’96] n General case: extended local refinement (ELR) for three classes of CH-programs [Cong-He, ISPD’98, TCAD’99] u e.g., simultaneous device sizing, wire sizing and spacing under table models rather than simple models in most works n LR to minimize maximum delay via Lagrangian Relaxation [Chen-Chang-Wong, DAC’96]

Table-based Model for Device size = 100x c l \ t t 0.05ns 0.10ns0.20ns 0.225pf pf pf size = 400x c l \ t t 0.05ns 0.10ns0.20ns 0.501pf pf pf effective-resistance R 0 for unit-width n-transistor n R 0 depends on size, input transition time and output loading u Neither a constant nor a function of a single variable u Device sizing problem no longer has a unique local optimum n Lower and upper bounds of exact solution can be computed by ELR operation [Cong-He, ISPD’98, TCAD’99]

DCLK simple-model table-model sgws 1.16 (0.0%) 1.08 (-6.8%) stis 1.13 (0.0%) 0.96 (-15.1%) 2cm line simple-model table-model sgws 0.82 (0.0%) 0.81 (-0.4%) stis 0.75 (0.0%) 0.69 (-7.6%) n SPICE-delay comparison u sgws: LR-based simultaneous gate and wire sizing u stis: LR-based simultaneous transistor and wire sizing n Runtime u Total LR-based optimization~10 seconds u Total HSPICE simulation ~3000 seconds n Manual optimization of DCLK u delay is 1.2x larger, and power is 1.3x higher Experiment Results:

Global Interconnect Sizing and Spacing (GISS) SISS: Single-net interconnect sizing and spacing SISS: Single-net interconnect sizing and spacing GISS: Global interconnect sizing and spacing GISS: Global interconnect sizing and spacing GISS/DP: Bottom-up based approach [Cong-He -Koh -Pan, ICCAD’97] GISS/DP: Bottom-up based approach [Cong-He -Koh -Pan, ICCAD’97] GISS/ELR: ELR based approach [Cong-He, ISPD’98, TCAD’99] GISS/ELR: ELR based approach [Cong-He, ISPD’98, TCAD’99] All use table-based capacitance model with coupling capacitance [Cong-He-Kahng-et al, DAC’97] All use table-based capacitance model with coupling capacitance [Cong-He-Kahng-et al, DAC’97] Spacing Sizing

16-bit buseach a 10mm-long line, 500um per segment n GISS is up to 39% better than SISS n ELR-based approach achieves best results and is 100x faster than bottom-up based approach Experiment Results

Flexibility of TRIO n Different combinations of optimization techniques, e.g., u T+B+W: Topology (T), followed by optimal buffer insertion and sizing (B), then followed by optimal wire sizing (W) u TB+BW: Simultaneous T and B, followed by simultaneous buffer and wire sizing (BW) u TBW: Simultaneous topology, buffer, and wire optimization n Different models u Simple or table-based model for device delay and interconnect cap u Elmore or higher-order delay models n Different objective functions: u Minimize delay under size constraints u Minimize power under required arrival time constraints n Integrated under an interactive user front-end u Unified input format, data structure and GUI

Example: Trade-off of Run-Times and Solution Quality n T+B+W:Topology (T), followed by optimal buffer insertion and sizing B (B=10) then followed by optimal wire sizing (W=18) n TB+BW: Simultaneous T and B (B=3), followed by simultaneous driver/buffer and wire sizing (BW) with B=40, W=18 n Tbw+BW: Simultaneous TBW with small number of B=3 and W=3, then followed by BW as above n TBW: Simultaneous TBW with larger number of B=10 and W=8

Trade-off of Run-Times and Solution Quality n Tbw+BW achieves “identical” delays as TBW with 10X smaller run-time