Circuit Retiming with Interconnect Delay CUHK CSE CAD Group Meeting One Evangeline Young Aug 19, 2003.


Circuit Retiming
Given a circuit, we want to relocate the registers to achieve a shorter clock period.
[Figure: registers before and after retiming; clock period = 3 units before, clock period = 2 units after.]

Circuit Retiming
To maintain the functionality of the circuit, registers can be moved only in certain ways.
[Figure: Retiming]

Circuit Retiming
Given a circuit, how should we place the registers to minimize the clock period?

Traditional Approach
The retiming problem was first introduced in the following classical paper: "Retiming Synchronous Circuitry", Charles E. Leiserson and James B. Saxe, Algorithmica, 6:5-35, 1991.
Only gate delay was considered.
Three methods were proposed. One of them solves the problem by mixed integer linear programming (MILP).

Traditional Approach
Notations:
  d(v) is the delay of node v.
  w(e) is the original number of registers on edge e.
  c is the clock period whose feasibility we want to check.
  r(v) is the retiming value of node v, i.e., the number of registers moved from the output to the input of node v. (r(v) is what we want to find.)
  s(v) is the longest delay from a register connected directly to node v to the output of v.
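The notation above can be made concrete with a small sketch. It assumes the standard consequence of the definition of r(v), namely that edge e(u,v) carries w(e) + r(v) - r(u) registers after retiming, and it computes s(v) as the longest register-free path delay ending at v. The function names and the three-gate ring are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

def retimed_weights(edges, r):
    """edges: list of (u, v, w). Register count after retiming: w + r[v] - r[u]."""
    return [(u, v, w + r[v] - r[u]) for u, v, w in edges]

def clock_period(nodes, edges, d):
    """Return (period, s), where s[v] is the longest delay into v's output
    along edges that carry no register, and period = max s[v]."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for u, v, w in edges:
        if w == 0:                      # only register-free edges propagate delay
            succ[u].append(v)
            indeg[v] += 1
    s = {v: d[v] for v in nodes}
    ready = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while ready:                        # topological order (Kahn's algorithm)
        u = ready.pop()
        seen += 1
        for v in succ[u]:
            s[v] = max(s[v], s[u] + d[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if seen != len(nodes):
        raise ValueError("cycle with no registers")
    return max(s.values()), s

# Hypothetical ring a -> b -> c -> a, unit gate delays, two registers on c -> a.
nodes = ["a", "b", "c"]
d = {"a": 1, "b": 1, "c": 1}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
r = {"a": 0, "b": 0, "c": 1}            # move one register across gate c
before, _ = clock_period(nodes, edges, d)
after, _ = clock_period(nodes, retimed_weights(edges, r), d)
```

In this toy ring the retiming moves one register from the output of c to its input, which shortens the longest register-free path from three gates to two.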

Traditional Approach
More about s(v): s(v) is the delay from point A to point B, including the delay of v.
[Figure: node v with points A and B marked.]

Traditional Approach
Integer Linear Program:
  (1) d(v) ≤ s(v)              for all nodes v
  (2) s(v) ≤ c                 for all nodes v
  (3) r(u) − r(v) ≤ w(e)       for all edges e(u,v)
  (4) s(u) − s(v) ≤ −d(v)      wherever e(u,v) s.t. r(u) − r(v) = w(e)

Traditional Approach
Write R(v) as r(v) + s(v)/c. The ILP can then be written as an MILP:
  (1) r(v) − R(v) ≤ −d(v)/c         for all nodes v
  (2) R(v) − r(v) ≤ 1               for all nodes v
  (3) r(u) − r(v) ≤ w(e)            for all edges e(u,v)
  (4) R(u) − R(v) ≤ w(e) − d(v)/c   for all edges e(u,v)
The above set of difference constraints can be solved in polynomial time, though it consists of both integer and real variables.
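One step the slide leaves implicit is why constraint (4) can drop its condition r(u) − r(v) = w(e) once R is introduced. A short derivation, using only the definitions above and assuming c > 0 and integer r and w:

```latex
\begin{align*}
R(u) - R(v) &= r(u) - r(v) + \frac{s(u) - s(v)}{c}. \\
\intertext{If $r(u) - r(v) = w(e)$, ILP constraint (4) gives $s(u) - s(v) \le -d(v)$, so}
R(u) - R(v) &\le w(e) - \frac{d(v)}{c}. \\
\intertext{Otherwise $r(u) - r(v) \le w(e) - 1$ by integrality, and $s(u) \le c$, $s(v) \ge d(v)$ give}
R(u) - R(v) &\le w(e) - 1 + \frac{c - d(v)}{c} = w(e) - \frac{d(v)}{c}.
\end{align*}
```

So the unconditional MILP constraint is implied in both cases, and it reduces to the original conditional one whenever the edge ends up with no registers.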

Traditional Approach
Use binary search to find the optimal clock period:
  T_0 = 0
  T_1 = e^10   // a large number
  Repeat
    c = (T_0 + T_1)/2
    Check whether c is a feasible clock period by solving the MILP.
    If success, T_1 = c; otherwise, T_0 = c.
  Until success and (T_1 − T_0)/T_1 < ε
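The binary search can be sketched as below. The feasibility test is passed in as a caller-supplied function `is_feasible` (a hypothetical stand-in for solving the MILP), and the sketch assumes feasibility is monotone in c. Unlike the slide's loop, it stops as soon as the relative gap is below ε and returns the upper bound, which is always a period that has tested feasible.

```python
def min_clock_period(is_feasible, t_hi, eps=1e-6):
    """Binary search for the smallest feasible clock period.

    is_feasible(c) -> bool is a hypothetical oracle (e.g. an MILP solve),
    assumed monotone: if c is feasible, every larger c is feasible too.
    t_hi must be a feasible (large) starting value.
    """
    if not is_feasible(t_hi):
        raise ValueError("starting upper bound is not feasible")
    t_lo = 0.0
    while (t_hi - t_lo) / t_hi >= eps:
        c = (t_lo + t_hi) / 2
        if is_feasible(c):
            t_hi = c          # c works: tighten the upper bound
        else:
            t_lo = c          # c fails: raise the lower bound
    return t_hi

# Toy oracle for illustration only: every period of at least 2.0 is feasible.
best = min_clock_period(lambda c: c >= 2.0, t_hi=100.0)
```

Each iteration halves the interval, so the number of feasibility checks grows with the logarithm of the initial range divided by the target precision.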

Retiming with Interconnect Delay
We consider clock period minimization.
Retiming has been studied and applied extensively in logic synthesis.
However, most previous retiming algorithms ignore interconnect delay.
Interconnect delay should be considered for high-performance circuits in deep-submicron (DSM) design.
This solution will be presented at the upcoming ICCAD 2003.

Retiming with Interconnect Delay
We assume that the delay of a wire is directly proportional to its length.
This assumption is reasonable:
  For short wires, the quadratic component of a wire delay is significantly smaller than its linear component.
  For long wires, buffer insertion can be done.
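To see where the linear and quadratic components come from, one common form of the Elmore delay for a uniform wire is shown below. It assumes a driver with resistance R_d, a load capacitance C_L, and wire resistance r_w and capacitance c_w per unit length; these symbols are illustrative and do not appear in the slides.

```latex
\begin{align*}
t(\ell) &= R_d\,(c_w \ell + C_L) + r_w \ell \left( \tfrac{1}{2} c_w \ell + C_L \right) \\
        &= R_d C_L + (R_d c_w + r_w C_L)\,\ell + \tfrac{1}{2}\, r_w c_w\, \ell^{2}.
\end{align*}
```

The quadratic term is small relative to the linear term when ℓ is much less than 2(R_d c_w + r_w C_L)/(r_w c_w), which is the short-wire regime the slide refers to.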

Retiming with Interconnect Delay

Now, a retiming solution needs to specify:
  the retiming label r(v) for each node v;
  the positions of the registers on each edge.
The positions of the registers on the edges matter because of interconnect delay.
[Figure: retiming example with node labels r( ) = 0 and r( ) = -1.]

Our Contributions
Optimal algorithm:
  O(|V||E| log |V| + |V|^2 log^2 |V|) time per iteration.
Near-optimal algorithm:
  Only 0.13% larger than the optimal on average.
  O(|V_b||E| + |V_b||E_h|) time per iteration, e.g., a circuit with 16.1K gates and 28.6K wires can be retimed in 44.32 s on a 1.8 GHz PIII PC.
  Based on an optimal algorithm handling interconnect delay only, i.e., no gate delay.

Optimal Approach
Rewrite the ILP on p.8 as follows:
  (1) d(v) ≤ s(v)              for all nodes v
  (2) s(v) ≤ c                 for all nodes v
  (3) r(u) − r(v) ≤ w(e)       for all edges e(u,v)
  (4) s(v) ≥ s(u) + d(e) + d(v) − c(r(v) − r(u) + w(e))   for all edges e(u,v)

Optimal Approach
Similarly, write R(v) as r(v) + s(v)/c:
  (1) r(v) − R(v) ≤ −d(v)/c                   for all nodes v
  (2) R(v) − r(v) ≤ 1                         for all nodes v
  (3) r(u) − r(v) ≤ w(e)                      for all edges e(u,v)
  (4) R(u) − R(v) ≤ w(e) − d(v)/c − d(e)/c    for all edges e(u,v)
Again, the above set of constraints can be solved in polynomial time, though the runtime is quite long.
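For reference, constraint (4) in R follows from the s-form of constraint (4) by rearranging and dividing by c, assuming c > 0:

```latex
\begin{align*}
s(v) &\ge s(u) + d(e) + d(v) - c\,\bigl(r(v) - r(u) + w(e)\bigr) \\
\iff\quad \frac{s(u) - s(v)}{c} + r(u) - r(v) &\le w(e) - \frac{d(v)}{c} - \frac{d(e)}{c} \\
\iff\quad R(u) - R(v) &\le w(e) - \frac{d(v)}{c} - \frac{d(e)}{c},
\end{align*}
```

where the last line uses R(v) = r(v) + s(v)/c for both endpoints.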

Optimal Approach
[Table: Circuit, |V|, |E|, c_opt, Runtime (s). The entries were not recovered; the last runtime is listed as >15000.]

Near Optimal Approach
Transform the original graph G by splitting each node v (which represents a gate) into a pair of nodes v_1 and v_2 connected by an edge with delay d(v).
[Figure: node v split into v_1 and v_2; the connecting edge has delay = d(v), and the new nodes have delay = 0.]
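A minimal sketch of this transformation follows. The tuple-based node names and the edge layout (u, v, registers, wire delay) are assumptions made for the example, as is the choice to start each new gate edge with zero registers.

```python
def split_gates(nodes, edges, d):
    """Split every gate v into (v, 1) -> (v, 2) joined by an edge of delay d[v].

    edges: list of (u, v, w, wire_delay) on the original graph G.
    Returns (new_nodes, new_edges) in the same edge format. The new nodes
    carry no gate delay; each gate edge starts with 0 registers (assumption).
    """
    new_nodes, new_edges = [], []
    for v in nodes:
        new_nodes += [(v, 1), (v, 2)]
        new_edges.append(((v, 1), (v, 2), 0, d[v]))      # gate modeled as a wire
    for u, v, w, wire_delay in edges:
        new_edges.append(((u, 2), (v, 1), w, wire_delay))  # output of u to input of v
    return new_nodes, new_edges

# Hypothetical two-gate example: a -> b with one register and wire delay 2.0.
g_nodes, g_edges = split_gates(["a", "b"], [("a", "b", 1, 2.0)], {"a": 1.0, "b": 3.0})
```

After the split, all delay lives on edges, so an algorithm that handles interconnect delay only can be applied to the transformed graph.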

Near Optimal Approach
After representing each gate by a wire, we can find an optimal retiming solution S for the transformed circuit G_1. (We will show how to find the optimal solution with no gate delay.)
The clock period of S will be a lower bound L for the optimal solution T_opt of G.
From S, we can obtain a feasible retiming solution for the original circuit G.

Near Optimal Approach
Registers retimed into a wire representing a gate v will be moved backward to the input edges or forward to the output edges, depending on their distances from v_1 and v_2.
Linear programming is used to determine the positions of the registers on each edge after this relocation step, to minimize the clock period considering both gate and wire delay.
[Figure: registers on the edge between v_1 and v_2.]

Near Optimal Approach
The problem is now to solve the retiming problem optimally assuming that gate delay is zero.
When there is no gate delay, the set of constraints on p.17 becomes:
  (1) r(v) − R(v) ≤ 0               for all nodes v
  (2) R(v) − r(v) ≤ 1               for all nodes v
  (3) r(u) − r(v) ≤ w(e)            for all edges e(u,v)
  (4) R(u) − R(v) ≤ w(e) − d(e)/c   for all edges e(u,v)

Near Optimal Approach
Lemma 1: Given R(v) for all nodes v that satisfy constraint (4), we can obtain a solution to constraints (1)-(4) by setting r(v) = trunc(R(v)).
Given Lemma 1, we only need to solve constraint (4):
  R(u) − R(v) ≤ w(e) − d(e)/c   for all edges e(u,v).
Consider the input graph G(V,E) such that the weight of each edge e(u,v) is −w(e) + d(e)/c.

Near Optimal Approach
There is a solution to constraint (4) iff G has no positive cycles.
Positive cycle detection in G can be achieved by positive cycle detection in a smaller graph H(V_b, E_h) constructed from G.
This technique can be applied to other positive cycle detection problems, not necessarily in circuit retiming.
After solving for R(v), we can find r(v) and s(v) for all nodes v.
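As a rough illustration of the feasibility test, the sketch below runs a Bellman-Ford style longest-path relaxation directly on G with edge weights −w(e) + d(e)/c. It does not build the reduced graph H, whose construction the slides do not detail, so it is a simpler and slower stand-in. The edge format (u, v, registers, wire delay), the function name, and the numeric tolerance are assumptions.

```python
import math

def solve_constraint4(nodes, edges, c, tol=1e-12):
    """Find R with R[u] - R[v] <= w - delay/c on every edge (u, v, w, delay).

    Returns a dict R, or None if G has a positive cycle under edge weights
    -w + delay/c. Starting every R[v] at 0 acts as a virtual source.
    """
    R = {v: 0.0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, w, delay in edges:
            cand = R[u] + delay / c - w        # longest-path relaxation
            if cand > R[v] + tol:
                R[v] = cand
                changed = True
        if not changed:
            return R
    return None                                 # still relaxing: positive cycle

# Hypothetical ring with wire delay 2 per edge and two registers on c -> a.
ring = [("a", "b", 0, 2.0), ("b", "c", 0, 2.0), ("c", "a", 2, 2.0)]
R = solve_constraint4(["a", "b", "c"], ring, c=4.0)
# Lemma 1: every R[v] is >= 0 here, so floor and truncation coincide.
r = {v: math.floor(R[v]) for v in R}
infeasible = solve_constraint4(["a", "b", "c"], ring, c=2.0)
```

For this ring the total wire delay is 6 with two registers, so any period below 3 creates a positive cycle, while c = 4 yields labels that also satisfy constraint (3).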

Near Optimal Approach
After the binary search, we have the optimal clock period and the corresponding r(v) and s(v) for all nodes v.
Then, we can place the registers accordingly.
[Figure: register placement on an edge from u to v, with segments labeled c − s(u) and c. Assume that r(v) − r(u) + w(e) = 4. Other registers are placed right in front of v.]

Near Optimal Approach
First, assume that gate delay is zero.
Binary search to find the minimum feasible clock period c.
To test the feasibility of a fixed c:
  Transform the test into a positive cycle detection problem on a reduced graph.
  This can be solved by a single-source longest-path algorithm.

Results
[Table: Circuit, c_opt, T_opt (s), c_near-opt, T_near-opt (s). The entries were not recovered; T_opt for s38584 is listed as >15000.]

Future Directions
Consider a more accurate model for the interconnect delay, e.g., Elmore delay.
How can the retiming solution be mapped to the floorplanning or placement solution? Registers are large and take up silicon resources.
How can fan-out capacitance be considered together with interconnect delay?