ECO Timing Optimization Using Spare Cells and Technology Remapping
Yen-Pin Chen (陳彥賓), Graduate Institute of Electrical Engineering, National Taiwan University
Advisor: Prof. Yao-Wen Chang (張耀文)
July 6, 2006
GIEE, NTU

Outline
Introduction and problem formulation; previous work and preliminaries; algorithm; experimental results; conclusions.

Introduction
An ECO (engineering change order) is usually performed during the chip implementation cycle to change the design incrementally. When an ECO is applied to a placed design, only a small portion of the netlist is changed. A timing ECO optimizes the chip timing while leaving the functionality unchanged; a functional ECO changes the chip functions, e.g., to fix logic bugs or to produce new versions.

Netlist Change Using Spare Cells
Spare cells are intended for design changes after placement and are distributed evenly over the chip layout. Using spare cells is an efficient way to change the netlist: it saves the time and effort of re-placing the netlist, and it saves the production cost of masks. However, this is becoming increasingly difficult in nanometer technology: circuit size is increasing substantially, and timing issues are hard to account for when the netlist is changed locally.

Problem Formulation
Given a placed chip layout, rewire the circuit using spare cells to shorten the delays and minimize the total negative slack of all ECO timing paths. Several techniques can be applied: gate sizing, buffer insertion, and technology mapping.
[Figure: example circuit before and after optimization; labeled slacks -0.7, 0.0, -0.5, 0.0.]

Dynamic Programming
Buffer insertion on a single net: van Ginneken proposed a dynamic programming framework for slack-optimal buffer insertion on a net.
[Figure: routing tree from source gS to sinks gT1, gT2, and gT3, each with a load and a required arrival time (RAT); buffers b1 to b4.]
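The van Ginneken recursion can be sketched in a few lines of Python. This is a minimal sketch under assumptions the slide does not state: wires use the Elmore delay, the driver and the single buffer type follow a linear model (intrinsic delay plus output resistance times load), and every non-sink node is a legal buffer position. Each candidate is a pair (downstream capacitance, required arrival time); dominated pairs are pruned at every node.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Cand = Tuple[float, float]  # (downstream capacitance, required arrival time q)


@dataclass
class Node:
    """Routing-tree node. A sink has `rat` (and `load`); any other node is a
    legal buffer position. `wire` is the (R, C) of the edge to the parent."""
    wire: Tuple[float, float] = (0.0, 0.0)
    children: List["Node"] = field(default_factory=list)
    load: float = 0.0
    rat: Optional[float] = None


def prune(cands: List[Cand]) -> List[Cand]:
    """Drop dominated candidates: keep one only if its q beats every
    candidate with smaller or equal capacitance."""
    kept: List[Cand] = []
    for cap, q in sorted(cands, key=lambda t: (t[0], -t[1])):
        if not kept or q > kept[-1][1]:
            kept.append((cap, q))
    return kept


def solve(node: Node, buf: Tuple[float, float, float],
          can_buffer: bool = True) -> List[Cand]:
    """Bottom-up pass; buf = (input cap, output resistance, intrinsic delay)."""
    if node.rat is not None:                          # sink
        cands = [(node.load, node.rat)]
    else:
        cands = [(0.0, float("inf"))]
        for child in node.children:                   # merge child subtrees
            sub = solve(child, buf)
            cands = prune([(c1 + c2, min(q1, q2))
                           for c1, q1 in cands for c2, q2 in sub])
        if can_buffer:                                # option: buffer here
            c_in, r_out, d_int = buf
            best_q = max(q - d_int - r_out * cap for cap, q in cands)
            cands = prune(cands + [(c_in, best_q)])
    r, c = node.wire                                  # Elmore delay of the edge
    return prune([(cap + c, q - r * (c / 2 + cap)) for cap, q in cands])


def best_slack(root: Node, r_drv: float, buf: Tuple[float, float, float]) -> float:
    """Slack at the source (arrival time 0, driver output resistance r_drv)."""
    return max(q - r_drv * cap for cap, q in solve(root, buf, can_buffer=False))
```

For example, a two-segment line `Node(children=[Node(wire=(2.0, 2.0), children=[Node(wire=(2.0, 2.0), load=1.0, rat=20.0)])])` with `r_drv=2.0` has slack -2.0 unbuffered; with buffer `(0.5, 1.0, 1.0)` the recursion places it at the midpoint and the slack becomes 4.0.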

Path-Based Buffer Insertion
Shi et al. proposed a dynamic programming method that performs buffer insertion and gate sizing on a path: (1) cut the timing-violating paths into distinct paths; (2) view the gates on each path as special types of "buffers" and merge the whole path into a "big routing tree"; (3) perform gate sizing and buffer insertion simultaneously on the routing tree.
[Figure: a path from start point to end point whose OR, NAND, and AND gates are treated as OR-type, NAND-type, and AND-type buffers.]

Logic-Physical Co-synthesis
Layout-driven technology mapping, proposed by Stok et al.: place the base gates as an initial placement, then map the base gates using their coordinates as a cost.
Local netlist transformation, proposed by Lou et al.: identify the parts of the placed netlist that violate some target cost, extract those critical parts from the chip placement, then re-synthesize and re-place the extracted netlist according to the target cost.

Timing Model
Synopsys' Liberty library format uses lookup tables to calculate gate delays: the gate delay and the output transition time are functions of the output loading and the input transition time.
[Figure: lookup table indexed by input transition time and output capacitive loading.]
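A table lookup of this kind can be sketched as bilinear interpolation. The sketch assumes rows are indexed by input transition time and columns by output load, and that values outside the table are extrapolated linearly from the nearest segment; the index order and interpolation scheme are assumptions, not taken from the slides, and the table values in the test are made up.

```python
from bisect import bisect_right
from typing import Sequence


def _segment(axis: Sequence[float], x: float) -> int:
    """Index i of the axis segment [axis[i], axis[i+1]] used for x; values
    outside the axis use the nearest end segment (linear extrapolation)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def table_lookup(slews: Sequence[float], loads: Sequence[float],
                 values: Sequence[Sequence[float]],
                 slew: float, load: float) -> float:
    """Bilinear interpolation: values[i][j] is the delay (or output
    transition) at input transition slews[i] and output load loads[j]."""
    i, j = _segment(slews, slew), _segment(loads, load)
    tx = (slew - slews[i]) / (slews[i + 1] - slews[i])
    ty = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - tx) * (1 - ty) * values[i][j]
            + (1 - tx) * ty * values[i][j + 1]
            + tx * (1 - ty) * values[i + 1][j]
            + tx * ty * values[i + 1][j + 1])
```

The same function serves both the delay table and the output-transition table of a timing arc; only `values` differs.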

Timing Model (cont'd)
The output loading consists of the input pin capacitances, the output pin capacitance, and the wire loading. The wire loading is Φ times the wirelength, where Φ is the capacitance per unit wirelength.
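The loading formula can be written as a short function. How the wirelength is estimated is an assumption here: a star model that sums the Manhattan distances from the driver to each sink, chosen because it matches the dis(·,·) sums on the bounding-box slides. The function and parameter names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def output_load(driver: Point, out_pin_cap: float,
                sinks: List[Tuple[Point, float]], phi: float) -> float:
    """Output loading = input pin caps of the driven gates + the driver's
    output pin cap + phi * wirelength. `sinks` holds (position, input cap);
    wirelength is a star-model estimate (assumption)."""
    wirelength = sum(manhattan(driver, pos) for pos, _ in sinks)
    return sum(cap for _, cap in sinks) + out_pin_cap + phi * wirelength
```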

Properties of the Timing Model
Loading dominance: the output loading has a larger effect on the gate delay and the output transition time than the input transition time does (6.74x vs. 1.48x).
Shielding: a change to the netlist affects the delays of neighboring gates only.
[Figure: shielding example with gates gi, gj, and gk.]

Properties of the Timing Model (cont'd)
A chain of buffers of the same type (BUFX1).
[Figure: input slope, output slope, and delay along the BUFX1 chain.]

Outline
Introduction and problem formulation; previous work and preliminaries; algorithm (overview, tracing ECO paths, dynamic cost programming, example, time complexity analysis, technology remapping); experimental results; conclusions.

Optimization Flow
Iterate the optimization loop until the total negative slack reaches zero or no path can be improved.
[Figure: optimization flowchart, including the technology remapping extension.]
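The stopping rule can be sketched as follows. `improve_once` is a hypothetical callback standing in for one DCP pass followed by STA, and "no path can be improved" is modeled here as "the total negative slack did not improve"; both are assumptions for illustration.

```python
from typing import Callable, List


def total_negative_slack(slacks: List[float]) -> float:
    """Sum of the negative path slacks (0.0 when every path meets timing)."""
    return sum(min(s, 0.0) for s in slacks)


def optimize(slacks: List[float],
             improve_once: Callable[[List[float]], List[float]],
             max_rounds: int = 100) -> List[float]:
    """Repeat optimization rounds until TNS reaches zero or a round brings
    no improvement (hypothetical `improve_once` = one DCP pass + STA)."""
    tns = total_negative_slack(slacks)
    for _ in range(max_rounds):
        if tns >= 0.0:                      # all ECO paths meet timing
            break
        new_slacks = improve_once(slacks)
        new_tns = total_negative_slack(new_slacks)
        if new_tns <= tns:                  # nothing could be improved
            break
        slacks, tns = new_slacks, new_tns
    return slacks
```

With a toy step that raises every negative slack by 0.25 (capped at zero), `optimize([-0.7, -0.5, 0.1], step)` converges to `[0.0, 0.0, 0.1]` in three rounds.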

Tracing ECO Paths
During STA (static timing analysis), store at each gate a pointer to the fanin with the largest arrival time. To obtain an ECO path, trace these pointers from the end point of the path back to the corresponding start point.
[Figure: an ECO path from start point to end point.]
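The pointer bookkeeping can be sketched on a small gate-level DAG. The sketch assumes a simplified arrival model (a gate's arrival time is its worst fanin arrival plus its own delay, wire delays ignored, start points arrive at their own delay) and uses Python's `graphlib` (3.9+) for the topological order.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


def sta_with_pointers(fanins: Dict[str, List[str]],
                      delay: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Propagate arrival times in topological order and remember, for each
    gate, the fanin with the largest arrival time."""
    graph = {g: fanins.get(g, []) for g in delay}     # node -> predecessors
    arrival: Dict[str, float] = {}
    worst_fanin: Dict[str, str] = {}
    for g in TopologicalSorter(graph).static_order():
        ins = fanins.get(g, [])
        if ins:
            w = max(ins, key=lambda f: arrival[f])
            worst_fanin[g] = w
            arrival[g] = arrival[w] + delay[g]
        else:                                          # start point
            arrival[g] = delay[g]
    return arrival, worst_fanin


def trace_eco_path(end: str, worst_fanin: Dict[str, str]) -> List[str]:
    """Follow the stored pointers from the end point back to the start point."""
    path = [end]
    while path[-1] in worst_fanin:
        path.append(worst_fanin[path[-1]])
    return path[::-1]
```

Because each gate keeps only one pointer, tracing costs time linear in the path length once STA has run.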

Dynamic Cost Programming (DCP)
A dynamic programming framework with dynamic cost, in three steps: (1) view each gate as a special type of "buffer" and merge the whole ECO path into a "big routing tree"; (2) perform gate sizing and buffer insertion simultaneously from the end point to the start point; (3) perform one buffer insertion operation for each net and one gate sizing operation for each gate.
[Figure: an ECO path from start point to end point whose OR, NAND, and AND gates are treated as OR-type, NAND-type, and AND-type buffers.]

Dynamic Cost
Unlike in the traditional buffer insertion problem, the buffering/sizing cost is dynamic: all spare cells are candidates for buffering/sizing, and the number of available spare cells changes during the optimization process. As a result, optimal solutions of the subproblems do not necessarily lead to an optimal solution of the overall problem, so a set of solutions must be stored for each gate/net.
[Figure: ECO paths 1 and 2 with buffers b1 and b2; solutions S1 (no buffer insertion), S2 (insert buffer b1), and S3 (insert buffer b2) plotted as number of inserted buffers versus path delay.]

Solution Propagation during DCP
Store each solution as a point on a plane if it shortens the ECO timing path delay. The two coordinates are the number of inserted buffers (sized gates are not counted) and the approximated sub-path delay from the current gate to the end point of the path. The effect of each operation is estimated without actually applying it, and new solutions are generated from the solutions of the driven gate/net.
[Figure: solution sets at gates g1 and g2 (buffers b1 and b2), shown as points S1 to S6 plotted as number of inserted buffers versus path delay.]

Judgment of Operations
The timing effect of a sizing/buffering operation can be estimated by its effect on its fanins (delay' denotes the delay after the operation).
Buffer insertion on net ni: if delay'(source of ni) + delay(buffer) < delay(source of ni), store the solutions corresponding to the operation.
Gate sizing of gate gi: if delay(spare cell) < delay(gi) and delay'(fanin of gi) < delay(fanin of gi), store the solutions corresponding to the operation.
As a result, the timing of non-ECO paths is preserved after optimization.
[Figure: buffer insertion on net ni; gate sizing of gate gi.]
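The two acceptance tests can be sketched with a hypothetical linear delay model (delay = intrinsic + r_out × load) in place of the Liberty tables. The linear model, the `Cell` fields, and the way the load is split when a buffer is inserted (`moved` is the part of the load placed behind the buffer) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical linear delay model: delay = intrinsic + r_out * load."""
    intrinsic: float
    r_out: float
    c_in: float

    def delay(self, load: float) -> float:
        return self.intrinsic + self.r_out * load


def accept_buffering(src: Cell, buf: Cell, load: float, moved: float) -> bool:
    """The buffer on net n_i takes over `moved` of the source's `load`.
    Keep it if delay'(source) + delay(buffer) < delay(source)."""
    new_src = src.delay(load - moved + buf.c_in)
    return new_src + buf.delay(moved) < src.delay(load)


def accept_sizing(gate: Cell, spare: Cell, gate_load: float,
                  fanin: Cell, fanin_load: float) -> bool:
    """Replace gate g_i by a spare cell. Keep it if the spare cell is faster
    and the fanin also gets faster (its load changes by the c_in delta)."""
    new_fanin_load = fanin_load - gate.c_in + spare.c_in
    return (spare.delay(gate_load) < gate.delay(gate_load)
            and fanin.delay(new_fanin_load) < fanin.delay(fanin_load))
```

Under this linear model the fanin condition reduces to `spare.c_in < gate.c_in`, which is why a faster but larger spare cell can still be rejected.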

Bounding Box Theorem
We derive a theorem that greatly reduces the number of buffering/sizing candidates. Assumptions: gate delays are independent of the input transition time, and the sized gate and the spare cell used for sizing have the same driving capability.

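[Figure: bounding box for net $n_{E1}$ (gates $g_{E1}$ and $g_{E3}$ shown), centered at $g_{E1}$, with width $w = \mathrm{dis}(g_{E1}, g_{E2}) + \mathrm{dis}(g_{E1}, g_{E3}) + (C_{Ei1} + C_{Ei2})/\Phi$.]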

Bounding Box Theorem (cont'd)
[Figure: bounding polygon over gates $g_{E1}$ to $g_{E4}$, formed by three boxes: width $\mathrm{dis}(g_{E1}, g_{E2}) + \mathrm{dis}(g_{E1}, g_{E3}) + C_{Eo1}/\Phi$ centered at $g_{E2}$; the same width centered at $g_{E3}$; and width $\mathrm{dis}(g_{E1}, g_{E4}) + C_{Ei1}/\Phi$ centered at $g_{E4}$.]
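A sketch of how such boxes could filter spare-cell candidates. It assumes that each "width" is the Manhattan radius of a diamond around its center and that a candidate must lie inside every diamond, i.e., the bounding polygon is their intersection; both readings are assumptions, since the slides give only the width formulas and the centers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def box_width(dists: Sequence[float], caps: Sequence[float], phi: float) -> float:
    """width = sum of the dis(...) terms + (sum of the capacitances) / phi."""
    return sum(dists) + sum(caps) / phi


def candidates_in_polygon(spares: List[Point],
                          boxes: List[Tuple[Point, float]]) -> List[Point]:
    """Keep spare cells within Manhattan distance `width` of every box
    center (assumption: the polygon is the intersection of the diamonds)."""
    return [s for s in spares
            if all(manhattan(s, center) <= width for center, width in boxes)]
```

Filtering this way is cheap (one distance per box per spare cell), so it can run before any delay is evaluated.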

Solution Pruning during DCP
For each set of solutions, keep at most k solutions (k is a user-defined parameter): discard dominated solutions, classify the remaining solutions by the number of used buffers, and keep the best solutions for each class.
[Figure: solutions plotted as number of inserted buffers versus path delay.]
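A minimal sketch of the pruning step on (buffer count, delay) pairs. It keeps the smallest delay per buffer-count class, drops solutions dominated by one with fewer buffers and no larger delay, and, when more than k remain, keeps the k fastest; which k survive is an assumption, since the slide does not specify it.

```python
from typing import Dict, List, Tuple

Solution = Tuple[int, float]  # (# inserted buffers, approximated sub-path delay)


def prune_solutions(sols: List[Solution], k: int) -> List[Solution]:
    """Best solution per buffer-count class, non-dominated only, at most k."""
    best: Dict[int, float] = {}
    for nbuf, delay in sols:                       # best delay per class
        if nbuf not in best or delay < best[nbuf]:
            best[nbuf] = delay
    frontier: List[Solution] = []
    for nbuf in sorted(best):                      # sweep by # buffers
        if not frontier or best[nbuf] < frontier[-1][1]:
            frontier.append((nbuf, best[nbuf]))    # strictly faster: keep
    frontier.sort(key=lambda s: s[1])              # assumption: k fastest
    return sorted(frontier[:k])
```

For example, from `[(0, 10.0), (1, 8.0), (1, 9.0), (2, 8.5), (3, 6.0), (4, 5.5)]` with k = 3, the 2-buffer solution is dominated by the 1-buffer one, and the result is `[(1, 8.0), (3, 6.0), (4, 5.5)]`.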

End of DCP
At the start point of the ECO path, choose the solution that meets the timing constraint and uses the fewest buffers. Change the netlist according to that solution, then run STA to update the timing information.
[Figure: solutions plotted as number of inserted buffers versus path delay, with the clock cycle as the timing constraint; ECO path from start point to end point.]
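The final selection can be sketched as a filter plus a minimum. The tie-break (smaller delay) and the fallback when no solution meets the clock period (take the fastest one) are assumptions; the slide covers only the feasible case.

```python
from typing import List, Optional, Tuple

Solution = Tuple[int, float]  # (# inserted buffers, start-to-end path delay)


def choose_final(sols: List[Solution], clock_period: float) -> Optional[Solution]:
    """Fewest buffers among solutions meeting the clock period (ties: smaller
    delay). If none meets it, fall back to the fastest one (assumption)."""
    if not sols:
        return None
    feasible = [s for s in sols if s[1] <= clock_period]
    if feasible:
        return min(feasible, key=lambda s: (s[0], s[1]))
    return min(sols, key=lambda s: s[1])
```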

An Example for Complex ECO Paths
Paths (source to target, negative slack): P1, S1 to T1, large; P2, S1 to T2, medium; P3, S2 to T3, small.
[Figure: paths P1 to P3 with buffer-type and gate-type spare cells; animation of the slack list and the FINISH LIST (slack ≥ 0).]

Time Complexity Analysis of Phase 1
Parameters: gate count V; number of spare cells N; number of DCP iterations L; maximum number of gates on an ECO path M; at most k solutions kept per operation.
Complexity of DCP = O(kMN); complexity of STA = O(V); complexity of phase 1 = O((kMN + V)L).

Extension: Technology Remapping
After DCP, the circuit timing can be further improved by the following steps: (1) identify the timing-critical parts of the netlist; (2) extract those parts from the netlist; (3) re-synthesize and map the extracted netlist (decomposition by MVSIS, ideal mapping locations, technology mapping); (4) run STA to update the timing information.

Optimal Buffering of a Line
The optimal buffering of a line inserts buffers at equal distances, so that no gate drives too large a load.
[Figure: optimal versus non-optimal buffering.]
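The equal-spacing claim can be checked numerically with an Elmore model. The sketch assumes a uniform wire (r, c per unit length), identical buffers (output resistance rb, input load cb, intrinsic delay db), and a driver modeled as one more identical buffer. For a fixed number of buffers, the stage lengths sum to the line length, so the linear terms are constant and only the convex r·c·seg²/2 term depends on the split, which is minimized by equal spacing.

```python
from typing import List


def line_delay(length: float, cuts: List[float], r: float, c: float,
               rb: float, cb: float, db: float) -> float:
    """Elmore delay of a uniform wire split at `cuts` into stages; each
    stage is driven by an identical buffer and ends in a load cb."""
    points = [0.0] + sorted(cuts) + [length]
    total = 0.0
    for a, b in zip(points, points[1:]):
        seg = b - a
        total += db + rb * (c * seg + cb) + r * seg * (c * seg / 2 + cb)
    return total


def equal_cuts(length: float, k: int) -> List[float]:
    """Positions of k buffers spaced equally along the line."""
    return [length * i / (k + 1) for i in range(1, k + 1)]
```

With all parameters set to 1 and db = 0, a 10-unit line has delay 71.0 unbuffered, 56.0 with one buffer at position 2, and 47.0 with one buffer at the midpoint.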

Ideal Mapping Locations
Given the locations of the input and output pins, place the base gates evenly between the input and output pins. Then no gate drives too large a load and the path delay is smaller, because the delay of an unbuffered wire is proportional to the square of its length. This also makes buffer insertion easier.
[Figure: base gates between inputs A and B and the output; plot of delay versus number of inserted buffers.]

Calculating Ideal Mapping Locations
For each path from an input pin to an output pin, calculate the ideal location of every base gate on the path by spacing the gates at equal distances. If a base gate has more than one ideal location, average these locations to obtain its final ideal location.
[Figure: inputs A and B and the output, with base gates at their ideal locations.]
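A minimal sketch of the averaging rule. It assumes each path is given explicitly as (input pin position, ordered base gates, output pin position) and that gates are spaced evenly along the straight segment between the pins, which also spaces them evenly in Manhattan distance; the input format is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def ideal_locations(paths: Sequence[Tuple[Point, List[str], Point]]) -> Dict[str, Point]:
    """Equal spacing per path, then average the locations of shared gates."""
    samples: Dict[str, List[Point]] = defaultdict(list)
    for (x0, y0), gates, (x1, y1) in paths:
        n = len(gates)
        for i, g in enumerate(gates, start=1):    # i-th of n gates on the path
            samples[g].append((x0 + i * (x1 - x0) / (n + 1),
                               y0 + i * (y1 - y0) / (n + 1)))
    return {g: (sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts))
            for g, pts in samples.items()}
```

For input A at (0, 0), input B at (0, 6), and the output at (6, 0), a gate shared by both paths in second position gets (4, 0) from A's path and (4, 2) from B's path, so its final ideal location is (4, 1).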

Technology Mapping
Use the actual locations of spare cells as costs. Cut the network into trees and apply a dynamic programming method to map each tree. The locations of mapped base gates are the locations of the corresponding spare cells; the locations of unmapped base gates are their ideal locations. Then insert buffers into the mapped circuit to further improve timing.
[Figure: mapped circuit between inputs A and B and the output.]

Maximum Independent Set
To choose the globally optimal technology remapping, we store a set of match solutions for each tree and use a maximum independent set (MIS) to find the best assignment.
[Figure: trees T1, T2, and T3 over gates g1 to g6, with candidate matches M1_1, M1_2, M2_1, M2_2, M2_3, M3_1, and M3_2.]
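A brute-force sketch of the selection on a tiny conflict graph. The conflict rules (matches of the same tree exclude each other; matches of different trees conflict if they use the same spare cell) and the per-match "gain" weights are assumptions for illustration. Maximum (weight) independent set is NP-hard in general, so exhaustive search is only viable for very small inputs; the slide does not say how the MIS is solved.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Match = Tuple[str, FrozenSet[str], float]  # (tree, spare cells used, timing gain)


def conflicts(a: Match, b: Match) -> bool:
    """Matches conflict if they belong to the same tree or share a spare cell."""
    return a[0] == b[0] or bool(a[1] & b[1])


def best_assignment(matches: Dict[str, Match]) -> Tuple[Set[str], float]:
    """Exhaustive maximum-weight independent set of the conflict graph."""
    names = list(matches)
    best: Tuple[str, ...] = ()
    best_gain = 0.0
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if any(conflicts(matches[a], matches[b])
                   for a, b in combinations(combo, 2)):
                continue
            gain = sum(matches[m][2] for m in combo)
            if gain > best_gain:
                best, best_gain = combo, gain
    return set(best), best_gain
```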

Experimental Results
The five benchmarks are industrial designs. Our tool was run on a Linux workstation with a 3.2 GHz CPU and 3 GB of memory.

Experimental Results (cont'd)
Our tool beat all competitors on the same problem in the CAD Contest '05. We compare the results of our algorithm with (1) the case without the aid of the bounding box theorem and (2) a greedy wire-cost heuristic.

Experimental Results (cont'd)
[Figure: layout of Case 2 before and after optimization.]

Conclusions
We proposed a dynamic programming method that considers dynamic cost to solve the ECO timing optimization problem. Functional change considering timing is a tougher problem, and we will extend our work in this direction.