1
Continuous Retiming
EECS 290A Sequential Logic Synthesis and Verification
2
Outline
- Motivation
- Classical retiming
- Continuous retiming
- Experimental comparison
3
Motivation
- Retiming can reduce the clock period of a circuit.
- Figure example: before retiming, the critical path has delay 4; after retiming, the critical paths have delay 2.
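To make the clock-period claim concrete, here is a minimal Python sketch on a hypothetical four-node ring (not the circuit in the slide's figure): every node has delay 1, and both registers initially sit on one edge. The helper names `retime` and `clock_period` and the lag values are illustrative assumptions. The clock period is taken as the longest delay over register-free paths, and edge weights are updated with the classical rule w_r(e_{u→v}) = w(e) + r(v) - r(u).

```python
# Hypothetical ring a -> b -> c -> d -> a; every node has delay 1.
# Edge weights are register counts.

def retime(weights, r):
    """Apply w_r(e_{u->v}) = w(e) + r(v) - r(u) to every edge."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in weights.items()}

def clock_period(delay, weights):
    """Largest delay of any path that uses only zero-weight (register-free) edges.

    Assumes the zero-weight subgraph is acyclic (every cycle holds a register).
    """
    succ = {v: [] for v in delay}
    for (u, v), w in weights.items():
        if w == 0:
            succ[u].append(v)
    memo = {}

    def longest_from(v):
        # Delay of the longest zero-weight path starting at v (v included).
        if v not in memo:
            memo[v] = delay[v] + max((longest_from(x) for x in succ[v]), default=0)
        return memo[v]

    return max(longest_from(v) for v in delay)

delay = {"a": 1, "b": 1, "c": 1, "d": 1}
weights = {("a", "b"): 0, ("b", "c"): 0, ("c", "d"): 0, ("d", "a"): 2}
lags = {"a": 0, "b": 0, "c": 1, "d": 1}   # hypothetical retiming

print(clock_period(delay, weights))                 # 4
print(clock_period(delay, retime(weights, lags)))   # 2
```

With these lags, one register moves from edge d → a backwards across d and c onto edge b → c, splitting the delay-4 path into two delay-2 segments.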
4
Motivation (cont.)
- Previous retiming algorithms require
  - computing latch-to-latch delays
  - solving an ILP problem
- The goal is to develop a more efficient algorithm that works directly on the circuit, without solving an ILP
5
Classical Formulation
- During retiming, registers are moved across combinational nodes:
  $w_r(e_{u \to v}) = w(e_{u \to v}) + r(v) - r(u)$,
  where the retiming lag r(v) is the number of registers moved from the outputs of v to its inputs.
- For each path $p: u \rightsquigarrow v$, its weight w(p) is the total number of registers on its edges, and its delay d(p) is the total delay of its nodes.
- The clock period of the circuit is the largest delay of any zero-weight path:
  $P = \max_{p:\, w(p) = 0} d(p)$
- The matrices W(u,v) and D(u,v) are defined for every pair of vertices connected by a path that does not pass through the host node:
  $W(u,v) = \min_{p:\, u \rightsquigarrow v} w(p)$ and $D(u,v) = \max_{p:\, u \rightsquigarrow v,\ w(p) = W(u,v)} d(p)$
- C. E. Leiserson and J. B. Saxe, “Retiming synchronous circuitry,” Algorithmica, vol. 6, pp. 5-35, 1991.
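The W and D matrices can be computed with one all-pairs shortest-path pass over lexicographically compared pairs: each edge e: u → v is labeled (w(e), -d(u)), so the shortest label from u to v has first component W(u,v), and D(u,v) is d(v) minus its second component. The Python sketch below uses Floyd-Warshall for this. It assumes every cycle carries at least one register and that the host node has already been removed; the function name `wd_matrices` and the ring circuit are illustrative.

```python
import math

def wd_matrices(delay, edges):
    """Return dicts W[(u, v)] and D[(u, v)] for all connected pairs.

    delay: node -> d(v); edges: list of (u, v, w) with w registers on u -> v.
    Each edge is labeled (w(e), -d(u)); tuples compare lexicographically, so the
    shortest label minimizes registers first and then maximizes delay.
    """
    nodes = list(delay)
    inf = (math.inf, math.inf)
    dist = {(u, v): inf for u in nodes for v in nodes}
    for u in nodes:
        dist[(u, u)] = (0, 0)
    for u, v, w in edges:
        dist[(u, v)] = min(dist[(u, v)], (w, -delay[u]))   # keep the best parallel edge
    for k in nodes:                                        # Floyd-Warshall
        for i in nodes:
            for j in nodes:
                (x1, y1), (x2, y2) = dist[(i, k)], dist[(k, j)]
                dist[(i, j)] = min(dist[(i, j)], (x1 + x2, y1 + y2))
    W = {p: x for p, (x, y) in dist.items() if x != math.inf}
    D = {p: delay[p[1]] - y for p, (x, y) in dist.items() if x != math.inf}
    return W, D

# Hypothetical ring a -> b -> c -> d -> a, unit delays, two registers on d -> a.
delay = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("d", "a", 2)]
W, D = wd_matrices(delay, edges)
print(W[("a", "d")], D[("a", "d")])   # 0 4
```

The O(|V|^3) time and O(|V|^2) space of this all-pairs pass are what make the classical approach expensive on large circuits.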
6
Classical Formulation (cont.)
- W(u,v) is the minimum latency, in clock cycles, of data flowing from u to v.
- D(u,v) is the maximum delay from u to v over all paths with that minimum latency.
- The retiming labels for clock period P are computed by solving an integer linear program whose constraints are all difference constraints:
  $r(u) - r(v) \le w(e_{u \to v})$ for every edge $e_{u \to v} \in E$
  $r(u) - r(v) \le W(u,v) - 1$ for every pair u, v with $D(u,v) > P$
- These constraints ensure that, after retiming,
  - the latency (register count) of each edge is non-negative, and
  - each path whose delay exceeds the clock period has at least one register on it.
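Because every constraint has the form r(u) - r(v) ≤ c, feasibility for a given P can be checked with Bellman-Ford on a constraint graph that has one arc v → u of weight c per constraint: a negative cycle means P is infeasible; otherwise the shortest-path distances form an integer retiming. This is a minimal, self-contained Python sketch, not the Leiserson/Saxe implementation itself. It assumes every cycle carries a register and omits the host node; the ring circuit and the helper names `wd_matrices` and `feasible_retiming` are illustrative.

```python
import math

def wd_matrices(delay, edges):
    """W/D matrices via Floyd-Warshall on lexicographic labels (w(e), -d(u))."""
    nodes = list(delay)
    dist = {(u, v): (math.inf, math.inf) for u in nodes for v in nodes}
    for u in nodes:
        dist[(u, u)] = (0, 0)
    for u, v, w in edges:
        dist[(u, v)] = min(dist[(u, v)], (w, -delay[u]))
    for k in nodes:
        for i in nodes:
            for j in nodes:
                (x1, y1), (x2, y2) = dist[(i, k)], dist[(k, j)]
                dist[(i, j)] = min(dist[(i, j)], (x1 + x2, y1 + y2))
    W = {p: x for p, (x, y) in dist.items() if x != math.inf}
    D = {p: delay[p[1]] - y for p, (x, y) in dist.items() if x != math.inf}
    return W, D

def feasible_retiming(delay, edges, W, D, period):
    """Integer retiming r meeting `period`, or None if the constraints are infeasible."""
    # Each constraint r(u) - r(v) <= c becomes a constraint-graph arc v -> u of weight c.
    cons = [(v, u, w) for u, v, w in edges]                                # w_r(e) >= 0
    cons += [(v, u, W[(u, v)] - 1) for (u, v) in W if D[(u, v)] > period]  # long paths get a register
    nodes = list(delay)
    dist = {v: 0 for v in nodes}   # implicit source with 0-weight arcs to every node
    for _ in range(len(nodes)):
        changed = False
        for a, b, c in cons:
            if dist[a] + c < dist[b]:
                dist[b] = dist[a] + c
                changed = True
        if not changed:
            return dist            # shortest-path distances satisfy every constraint
    return None                    # still relaxing after |V| passes: negative cycle

delay = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("d", "a", 2)]
W, D = wd_matrices(delay, edges)
r = feasible_retiming(delay, edges, W, D, period=2)
print(r is not None, feasible_retiming(delay, edges, W, D, period=1))   # True None
```

Wrapping `feasible_retiming` in a binary search over candidate values of P gives the minimum feasible clock period.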
7
Implementations of Retiming
- Leiserson/Saxe compute the W and D matrices, generate the constraints, and then solve the LP problem.
- Shenoy/Rudell compute the matrices one column at a time.
  - This reduces the space requirements, but the runtime is still prohibitive.
- Sapatnekar proposed using the retiming/skew equivalence to reduce the number of constraints generated.
- S. S. Sapatnekar and R. B. Deokar, “Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits,” IEEE Trans. CAD, vol. 15, no. 10, pp. 1237-1248, Oct. 1996.
8
Sapatnekar’s Retiming Algorithm
- Find ASAP and ALAP skews for a feasible clock period
  - Use binary search to find a feasible clock period
- Perform min-delay retiming by moving latches to fit the timing window
- Perform min-area retiming under delay constraints by solving a reduced LP problem
  - The reduced set of constraints is generated using the skews
  - The LP problem is solved efficiently using a variant of the network simplex method
- Improvement: start by finding the maximum cycle ratio (a lower bound on the achievable clock period) using Howard’s algorithm
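The maximum cycle ratio (total node delay over total registers on a cycle) can also be found without Howard's algorithm, by bisection: a trial value λ is below the maximum ratio exactly when the graph with arc weights d(v) - λ·w(e) contains a positive cycle. The Python below implements this simpler but slower bisection search in place of Howard's policy iteration. The tolerance, the function names, and the ring circuit are assumptions.

```python
def has_positive_cycle(nodes, arcs, eps=1e-12):
    """Longest-path Bellman-Ford: True if relaxation still improves after |V| passes."""
    dist = {v: 0.0 for v in nodes}       # implicit source with 0-weight arcs to all nodes
    for _ in range(len(nodes)):
        changed = False
        for u, v, c in arcs:
            if dist[u] + c > dist[v] + eps:
                dist[v] = dist[u] + c
                changed = True
        if not changed:
            return False
    return True

def max_cycle_ratio(delay, edges, tol=1e-6):
    """Bisection for the max over cycles of (sum of d(v)) / (sum of w(e)).

    Assumes every cycle carries at least one register, so the total delay of
    all nodes is a valid upper bound on the ratio.
    """
    nodes = list(delay)
    lo, hi = 0.0, float(sum(delay.values()))
    while hi - lo > tol:
        lam = (lo + hi) / 2
        # A cycle is positive under d(v) - lam*w(e) iff its ratio exceeds lam.
        arcs = [(u, v, delay[v] - lam * w) for u, v, w in edges]
        if has_positive_cycle(nodes, arcs):
            lo = lam
        else:
            hi = lam
    return hi

delay = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("d", "a", 2)]
print(round(max_cycle_ratio(delay, edges), 4))   # 2.0
```

Seeding the clock-period binary search with this value avoids testing periods that no retiming can reach.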
9
Pan’s Algorithm
- Definitions
- Pseudo-code
- Convergence
- Improvements
- Experiments
10
Definitions
- A circuit is an edge-weighted, node-weighted directed graph
  - The weight of a node, d(v), is its combinational delay
  - The weight of an edge, w(e), is its number of FFs
- Continuous retiming is a retiming in which the number of latches moved is a continuous (real) value rather than an integer
- The retimed edge weights are computed as before:
  $w_s(e_{u \to v}) = w(e_{u \to v}) + s(v) - s(u)$,
  where s(v) are the continuous retiming lags.
11
Definitions
- Definition. A circuit is retimed to a clock period $\phi$ by a retiming r if the following two conditions hold: (1) $w_r(e) \ge 0$ for every edge e, and (2) $w_r(p) \ge 1$ for every path p such that $d(p) > \phi$.
- Definition. A circuit is c-retimed to a clock period $\phi$ by a c-retiming s if $w_s(e) \ge d(v)/\phi$ for each edge $e_{u \to v}$.
- The definition of c-retiming enforces
  - non-negative edge weights, since $d(v)/\phi \ge 0$
  - for a path $p = u_1 \to u_2 \to \dots \to u_k$, summing the edge constraints gives $w_s(p) \ge (d(u_2) + \dots + d(u_k))/\phi$; hence if $d(u_2) + \dots + d(u_k) \ge \phi$, then $w_s(p) \ge 1$.
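The c-retiming definition can be checked with one comparison per edge. In this hypothetical Python sketch, the lags are chosen so that each edge of a four-node ring (unit node delays, two registers in total) receives exactly half a register. That meets the bound for φ = 2 but not for φ = 1.5. The function name and the small `eps` tolerance for floating-point round-off are assumptions.

```python
def is_c_retiming(delay, edges, s, phi, eps=1e-9):
    """True if w_s(e) = w(e) + s(v) - s(u) >= d(v)/phi for every edge u -> v."""
    return all(w + s[v] - s[u] >= delay[v] / phi - eps for u, v, w in edges)

delay = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("d", "a", 2)]
s = {"a": 0.0, "b": 0.5, "c": 1.0, "d": 1.5}   # hypothetical continuous lags

print(is_c_retiming(delay, edges, s, phi=2.0))   # True
print(is_c_retiming(delay, edges, s, phi=1.5))   # False
```

Under these lags, the path a → b → c has w_s = 1 while d(b) + d(c) = 2 = φ, which matches the path property of the definition.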
12
Pseudo-code
for each node v in N do
    if (v is a PI) s(v) = 0;
    else s(v) = −∞;
for i = 0 to |U| + 2 do
    done = true;
    for each non-PI node v_j in N do
        tmp = max over edges e: u → v_j of { s(u) − w(e) + d(v_j)/φ };
        if (v_j is a PO and tmp > 1) return failure;
        if (s(v_j) < tmp) { s(v_j) = tmp; done = false; }
    if (done == true)
        return success;   // c-retiming reached a fixed point
return failure;
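The pseudo-code translates almost line for line into Python. The following choices are assumptions, not part of the original:
- fanins are passed as (source, registers) pairs;
- the node list is assumed to be in topological order already, with loops broken at the cut U;
- `cut_size` stands for |U|;
- failure is reported as `None`;
- the function name `c_retime` and the small example circuit are hypothetical.

```python
import math

def c_retime(nodes, fanin, delay, pis, pos, phi, cut_size):
    """Arrival-time c-retiming by repeated relaxation (sketch of the slide's pseudo-code).

    nodes:    all nodes, assumed in topological order with loops broken at the cut U
    fanin:    dict node -> list of (u, w) pairs, one per edge u -> node with w registers
    delay:    dict node -> d(v)
    pis, pos: sets of primary inputs / primary outputs
    cut_size: |U|, the size of a cut that breaks every loop
    Returns the lags s (dict) on success, or None on failure.
    """
    s = {v: 0.0 if v in pis else -math.inf for v in nodes}
    for _ in range(cut_size + 3):            # i = 0 .. |U| + 2
        done = True
        for v in nodes:
            if v in pis:
                continue
            tmp = max((s[u] - w + delay[v] / phi for u, w in fanin.get(v, [])),
                      default=-math.inf)
            if v in pos and tmp > 1:
                return None                   # output lag exceeds the bound of 1
            if s[v] < tmp:
                s[v] = tmp
                done = False
        if done:
            return s                          # fixed point reached
    return None

nodes = ["i", "a", "b", "o"]                  # topological order, loop cut at b -> a
fanin = {"a": [("i", 0), ("b", 1)], "b": [("a", 0)], "o": [("b", 0)]}
delay = {"i": 0, "a": 1, "b": 1, "o": 0}
s = c_retime(nodes, fanin, delay, pis={"i"}, pos={"o"}, phi=2.0, cut_size=1)
print(s)   # {'i': 0.0, 'a': 0.5, 'b': 1.0, 'o': 1.0}
```

With φ = 1.5, the same call returns None, because the output lag would reach 4/3 > 1.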
13
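Convergence
Theorem. If the nodes are relaxed in topological order, the algorithm stops within |U| + 1 relaxation iterations provided there is no positive cycle, where U is a cut that breaks all loops. A positive cycle is one whose total $\sum (d(v)/\phi - w(e))$ is positive; in that case no c-retiming for $\phi$ exists, because summing $w_s(e) \ge d(v)/\phi$ around the cycle cancels the lags.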
14
Reduction to Classical Retiming
Let s be a c-retiming that achieves clock period $\phi$, and let r be the (integer) retiming defined as follows:
Then r achieves a clock period less than $\phi + D$, where D is the largest combinational delay of any node.
15
Area Minimization
- The problem of minimizing the number of (fractional) FFs subject to a given clock period is an LP:
  minimize $\sum_{e \in E} c \, w_s(e)$
  subject to $w_s(e_{u \to v}) \ge d(v)/\phi$ for each edge $u \to v$
- The dual of this problem is an uncapacitated min-cost flow problem
  - The flow graph is a network with one arc per circuit edge
  - The net flow out of each node is the difference between its fanout count and its fanin count
  - The cost of an edge is $w_1(e_{u \to v}) = -w(e_{u \to v}) + d(v)/\phi$
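The node imbalances in the dual come from rewriting the objective in terms of the lags. Taking c = 1 for simplicity (an assumption), the sum of retimed weights telescopes as shown below. The coefficients fanin(v) - fanout(v) on the lags then become the right-hand sides of the dual flow-conservation constraints, and the dual arc profit is exactly the slide's edge cost $w_1(e)$:

```latex
\sum_{e:\,u \to v} w_s(e)
  = \sum_{e:\,u \to v} \bigl( w(e) + s(v) - s(u) \bigr)
  = \sum_{e \in E} w(e) + \sum_{v} s(v)\,\bigl(\mathrm{fanin}(v) - \mathrm{fanout}(v)\bigr)

% Dual, with one flow variable f(e) >= 0 per circuit edge e: u -> v
\max \sum_{e:\,u \to v} f(e)\,\Bigl(-w(e) + \frac{d(v)}{\phi}\Bigr)
\quad \text{s.t.} \quad
\sum_{e \text{ out of } v} f(e) - \sum_{e \text{ into } v} f(e)
  = \mathrm{fanout}(v) - \mathrm{fanin}(v) \quad \text{for every node } v
```

The constant $\sum_e w(e)$ does not affect the optimum, so it drops out of the dual.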
16
Improvements
- Perform a “required time” c-retiming
  - in addition to the “arrival time” c-retiming
- Retime circuits with choice nodes
  - combines logic synthesis and c-retiming
- Heuristically minimize area
  - leads to faster computation than solving an ILP
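One plausible reading of the “required time” pass is a mirror image of the arrival-time relaxation. Every PO starts at its upper bound of 1, and lags propagate backwards. Each lag is lowered to the minimum over fanouts of s(v) + w(e) - d(v)/φ, which is the latest value that still satisfies w_s(e) ≥ d(v)/φ. The Python below is a hypothetical sketch of that reading, not Pan's algorithm. The function name, the `max_passes` bound, and the check that PI lags stay non-negative are all assumptions.

```python
import math

def required_c_retime(nodes, fanout, delay, pis, pos, phi, max_passes):
    """Backward ("required time") relaxation: latest lags with s(PO) <= 1.

    Hypothetical mirror of the arrival-time pass (an assumption, not Pan's code).
    fanout: dict node -> list of (v, w) pairs, one per edge node -> v with w registers
    Returns the lags, or None if a PI would need a negative lag or no fixed point is reached.
    """
    s = {v: 1.0 if v in pos else math.inf for v in nodes}
    for _ in range(max_passes):
        done = True
        for u in reversed(nodes):             # reverse topological order
            if u in pos:
                continue
            tmp = min((s[v] + w - delay[v] / phi for v, w in fanout.get(u, [])),
                      default=math.inf)
            if u in pis and tmp < 0:
                return None                   # PI lags are fixed at 0 in the forward pass
            if tmp < s[u]:
                s[u] = tmp
                done = False
        if done:
            return s
    return None

nodes = ["i", "a", "b", "o"]
fanout = {"i": [("a", 0)], "a": [("b", 0)], "b": [("o", 0), ("a", 1)]}
delay = {"i": 0, "a": 1, "b": 1, "o": 0}
req = required_c_retime(nodes, fanout, delay, pis={"i"}, pos={"o"}, phi=4.0, max_passes=4)
print(req)   # {'i': 0.5, 'a': 0.75, 'b': 1.0, 'o': 1.0}
```

For comparison, the arrival-time relaxation from the pseudo-code gives s(a) = 0.25 on the same circuit at φ = 4, so the two passes bracket a's lag between 0.25 and 0.75.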
17
Experimental Results
Comparing the following three algorithms:
- P. Pan (ICCD ’96)
- Sapatnekar/Deokar (TCAD ’96)
- Maheshwari/Sapatnekar (TVLSI ’98)
18
P. Pan (ICCD ’96)
CPU time is measured on a Sparc 5
19
Sapatnekar/Deokar (TCAD ’96)
CPU time is measured on an HP 735 workstation
20
Maheshwari/Sapatnekar (TVLSI ’98)
CPU time is measured on a DEC AXP system 3000/900 workstation
21
Conclusions
- Presented an alternative approach to retiming
- Compared it with other methods
- Proposed several improvements