ELEC 7770 Advanced VLSI Design, Spring 2016
Retiming
Vishwani D. Agrawal, James J. Danaher Professor
ECE Department, Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr16/course.html
Spring 2016, Feb 12

Retiming
Retiming is a function-preserving transformation of a synchronous sequential circuit.
Flip-flops are moved according to specific rules.
Original references:
  C. E. Leiserson, F. Rose and J. B. Saxe, “Optimizing Synchronous Circuits by Retiming,” Proc. 3rd Caltech Conf. on VLSI, 1983, pp. 87-116.
  C. E. Leiserson and J. B. Saxe, “Retiming Synchronous Circuitry,” Algorithmica, vol. 6, pp. 5-35, 1991.

A Trivial Example: Reduced Hardware
[Figure: circuit before and after retiming, showing the flip-flops (FF)]

Example 2: Faster Clock
[Figure: circuit before and after retiming, showing the flip-flops (FF)]

Example 3: Reduced Flip-Flops
[Figure: circuit before and after retiming, showing the flip-flops (FF)]

Applications of Retiming
Performance optimization
Area optimization
Power optimization
Testability enhancement
FPGA optimization

Fundamental Operation of Retiming
A retiming move in a circuit is caused by moving all of the memory elements at the input of a combinational block to all of its outputs, or vice versa.
[Figure: a combinational logic block with flip-flops (FF) at its inputs, shown equivalent (≡) to the same block with flip-flops at its outputs]

A Correlator Circuit
Adder delay = 7
Comparator delay = 3
[Figure: correlator circuit between PI and PO, connected through the host; four comparators (=) with constants a1, a2, a3, a4, three adders (+), and flip-flops]

Graph Model
[Figure: graph of the correlator with vertices a, b, c, d (delay 3 each), e, f, g (delay 7 each) and host h; four edges carry weight 1]
Vertex vi: combinational, delay = d(vi), assumed unchanged by retiming
d(host) = 0
Edge e(vi,vj), or eij: weight wij = number of flip-flops between vi and vj

Path Delay and Path Weight
A set of connected nodes specifies a path. A path does not traverse the host node.
Path delay = ∑ d(vi) = combinational delay of the path
Path weight = ∑ wij = number of flip-flops (clock cycles) along the path
Retiming of a node i is denoted by an integer ri, the number of registers moved across the node; initially ri = 0.
  Register moved from output to input: ri → ri + 1
  Register moved from input to output: ri → ri – 1
After retiming, edge weight wij’ = wij + rj – ri

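As a minimal sketch of the edge-weight rule wij’ = wij + rj – ri, the snippet below retimes a tiny chain. The dictionary encoding, the function name retime and the three-node chain u, v, w are illustrative assumptions, not part of the lecture.

```python
def retime(edges, r):
    """Apply w'_ij = w_ij + r_j - r_i to every edge.

    edges: dict mapping (tail, head) -> flip-flop count w_ij
    r:     dict mapping node -> retiming value (missing nodes count as 0)
    """
    return {(i, j): w + r.get(j, 0) - r.get(i, 0) for (i, j), w in edges.items()}


# Hypothetical chain u -> v -> w with one register at the input of v.
edges = {("u", "v"): 1, ("v", "w"): 0}

# Moving that register from the input of v to its output means r_v = -1.
retimed = retime(edges, {"v": -1})
print(retimed)  # {('u', 'v'): 0, ('v', 'w'): 1}
```

The register leaves edge (u, v) and appears on edge (v, w), which is the "input to output" move with ri → ri – 1.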
Example of Node Retiming
[Figure: a chain of six nodes, each with delay 3]
Before: r1 = 0, r2 = 0, r3 = 0, r4 = 0, r5 = 0, r6 = 0; ∑ d(vi) = 12, ∑ wij = 0
After: r1 = 0, r2 = -1, r3 = 0, r4 = 0, r5 = 1, r6 = 0; ∑ d(vi) = 12, ∑ wij = 2

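The change in path weight in this example follows from summing the edge rule wij’ = wij + rj – ri along a path, since the intermediate retiming values cancel. The derivation below is a sketch that assumes the marked path runs from node 2 to node 5 (four nodes of delay 3, consistent with ∑ d(vi) = 12); the symbols p, v_0, ..., v_n are introduced only for this illustration.

```latex
% Path p = v_0 -> v_1 -> ... -> v_n with original weight W(p) = \sum_k w_{k,k+1}
\begin{align*}
W'(p) &= \sum_{k=0}^{n-1} \left( w_{k,k+1} + r_{k+1} - r_k \right) \\
      &= \sum_{k=0}^{n-1} w_{k,k+1} + \left( r_n - r_0 \right)
       = W(p) + r_n - r_0 . \\
\intertext{For the path from node 2 to node 5 with $r_2 = -1$ and $r_5 = 1$:}
W'(p) &= 0 + 1 - (-1) = 2 .
\end{align*}
```

The path delay is unchanged because retiming moves registers only and leaves every d(vi) as it was.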
Legal Retiming
Retiming is legal if the retimed circuit has no negative edge weights.
A legally retimed circuit is functionally equivalent to the original circuit (proof by Leiserson and Saxe, 1991).
Retiming is the most general method for changing the register count and position without knowing the functions of the vertices.

Example
[Figure: a circuit with signals a, b and d, blocks c and x, and one flip-flop (FF), shown with its graph model containing vertices c, x and host and one edge of weight 1]

Example: Illegal Retiming
Retiming vector = {0, 0, 0} changed to retiming vector = {0, 0, –1}
Edge weight changes: 1 → 0, 0 → –1, 0 → –1, 0 → 1
[Figure: graph model with vertices c, x and host before and after retiming, and the circuit with signals a, b, d, blocks c, x and the flip-flop (FF)]

Example: Legal Retiming
Retiming vector = {0, 0, 0} changed to retiming vector = {0, 1, 0}
Edge weight changes: 0 → 1, 1 → 0, 0 → 1
[Figure: graph model with vertices c, x and host before and after retiming, and the retimed circuit with signals a, b, d, blocks c, x and two flip-flops (FF)]

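The legality rule (no negative edge weight after retiming) can be checked mechanically. The sketch below uses a small hypothetical graph with vertices host, c and x, loosely modeled on the examples above; the exact edges of the slide figures are not recoverable from the text, so the edge list and the function names are assumptions.

```python
def retimed_weights(edges, r):
    """Edge weights after retiming: w'_ij = w_ij + r_j - r_i."""
    return {(i, j): w + r[j] - r[i] for (i, j), w in edges.items()}


def is_legal(edges, r):
    """A retiming is legal when no retimed edge weight is negative."""
    return all(w >= 0 for w in retimed_weights(edges, r).values())


# Hypothetical graph (assumed, not taken from the slide figure).
edges = {("host", "c"): 0, ("c", "x"): 1, ("host", "x"): 0, ("x", "host"): 0}
zero = {"host": 0, "c": 0, "x": 0}

print(is_legal(edges, {**zero, "x": -1}))  # False: edge host -> x drops to -1
print(is_legal(edges, {**zero, "c": 1}))   # True: every weight stays >= 0
```

Setting r of x to –1 tries to pull a register off an edge that has none, which is what makes that move illegal; setting r of c to 1 moves the existing register from the output of c to its input.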
Correlator Circuit
Critical path delay = 24
[Figure: correlator graph with vertices a, b, c, d (delay 3 each), e, f, g (delay 7 each) and host h; four edges carry weight 1]
ra = 0, rb = 0, rc = 0, rd = 0, re = 0, rf = 0, rg = 0, rh = 0
Initial retiming vector = {0,0,0,0,0,0,0,0}

Retimed Correlator Circuit
Critical path delay = 13
[Figure: retimed correlator graph; three edge weights change 0 → 1 and two change 1 → 0]
ra = –1, rb = –1, rc = –2, rd = –2, re = –2, rf = –1, rg = 0, rh = 0
Retiming vector = {-1,-1,-2,-2,-2,-1,0,0}

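The retiming vector above can be applied edge by edge with wij’ = wij + rj – ri. In the sketch below the edge list is an assumption: the figure did not survive extraction, so the edges are reconstructed from the correlator example of Leiserson and Saxe, with a to d as comparators, e to g as adders and h as the host.

```python
# Assumed correlator edges: (tail, head) -> number of flip-flops.
EDGES = {
    ("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
    ("d", "e"): 0, ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0,
    ("a", "g"): 0, ("b", "f"): 0, ("c", "e"): 0,
}

# Retiming vector from the slide, in the order a, b, c, d, e, f, g, h.
R = dict(zip("abcdefgh", [-1, -1, -2, -2, -2, -1, 0, 0]))

retimed = {(i, j): w + R[j] - R[i] for (i, j), w in EDGES.items()}

# Edges whose register count changed, as (before, after).
changed = {e: (EDGES[e], retimed[e]) for e in EDGES if EDGES[e] != retimed[e]}
print(changed)
print("legal:", min(retimed.values()) >= 0)
print("flip-flops before/after:", sum(EDGES.values()), sum(retimed.values()))
```

Under this assumed edge list, two edges lose a register and three gain one, matching the 1 → 0 and 0 → 1 labels on the slide, and no weight becomes negative.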
Retiming Theorem
Given a network G(V, E, W) and a cycle time T, (r1, . . . ) is a feasible retiming if and only if:
  1. ri – rj ≤ wij for all edges (vi,vj) ∈ E
  2. ri – rj ≤ W(vi,vj) – 1 for all node pairs vi, vj such that D(vi,vj) > T
where
  W(vi,vj) is the minimum weight over all paths between vi and vj
  D(vi,vj) is the maximum delay among all minimum-weight paths between vi and vj

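The two conditions can be checked directly once W and D are known. The sketch below computes them with a Floyd-Warshall pass over (weight, negative delay) pairs and then tests a retiming vector. It rests on several assumptions not stated on the slide: the correlator edge list is reconstructed from the Leiserson and Saxe example, the host is treated as an ordinary zero-delay vertex, and the names wd_matrices and is_feasible are made up for illustration.

```python
from itertools import product

INF = float("inf")

DELAY = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
EDGES = {  # assumed correlator edges: (tail, head) -> flip-flops
    ("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
    ("d", "e"): 0, ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0,
    ("a", "g"): 0, ("b", "f"): 0, ("c", "e"): 0,
}


def wd_matrices(delay, edges):
    """Return W (min path weight) and D (max delay among min-weight paths)."""
    nodes = list(delay)
    # A path costs (registers, -delay of every vertex except the last).
    # Tuples compare lexicographically: fewest registers first, then longest delay.
    cost = {(u, v): (INF, 0) for u, v in product(nodes, nodes)}
    for u in nodes:
        cost[(u, u)] = (0, 0)
    for (u, v), w in edges.items():
        cost[(u, v)] = min(cost[(u, v)], (w, -delay[u]))
    for k, u, v in product(nodes, nodes, nodes):  # k varies slowest
        via = (cost[(u, k)][0] + cost[(k, v)][0],
               cost[(u, k)][1] + cost[(k, v)][1])
        if via[0] < INF and via < cost[(u, v)]:
            cost[(u, v)] = via
    W = {p: c[0] for p, c in cost.items() if c[0] < INF}
    D = {p: delay[p[1]] - cost[p][1] for p in W}
    return W, D


def is_feasible(delay, edges, r, T):
    """Check conditions 1 and 2 of the retiming theorem for cycle time T."""
    W, D = wd_matrices(delay, edges)
    cond1 = all(r[i] - r[j] <= w for (i, j), w in edges.items())
    cond2 = all(r[i] - r[j] <= W[p] - 1 for p in W for i, j in [p] if D[p] > T)
    return cond1 and cond2


slide_r = dict(zip("abcdefgh", [-1, -1, -2, -2, -2, -1, 0, 0]))
zero_r = {v: 0 for v in DELAY}
print(is_feasible(DELAY, EDGES, slide_r, 13))  # True
print(is_feasible(DELAY, EDGES, zero_r, 13))   # False
```

The all-zero vector fails at T = 13 because, for example, the pair (d, g) has W = 0 and D = 24, so condition 2 would require rd – rg ≤ –1.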
Proof of Condition 1
We assume that the original network is legal, i.e., all edge weights are non-negative.
For an arbitrary edge (vi,vj) ∈ E, the condition ri – rj ≤ wij, or equivalently wij + rj – ri ≥ 0, means that after retiming the new weight wij’ = wij + rj – ri is non-negative.
Thus, condition 1 ensures the legality of retiming.
[Figure: edge (i,j) with wij original flip-flops, ri flip-flops moved at node i and rj flip-flops moved at node j; retimed flip-flops wij’ = wij + rj – ri ≥ 0]

Proof of Condition 2
Given: d(vi) < T, for all i.
Any retimed path whose combinational delay exceeds the clock period must have at least one flip-flop.
This is the requirement for correct operation.
For a path (i,j) with D(i,j) > T and original weight Wij, the retimed weight must satisfy Wij’ = Wij + rj – ri ≥ 1, which is the same as ri – rj ≤ Wij – 1.
[Figure: path (i,j) with Wij original flip-flops, ri flip-flops moved at node i and rj flip-flops moved at node j]

Retiming Optimization Problem
Given the initial retiming graph G(V, E, d, w) of a synchronous system and a required clock period P, find a feasible retiming transformation such that, for the retimed graph G’, CP(G’) ≤ P.
Solution:
  Algorithm 1 finds CP(G), the critical path delay of G.
  Algorithm 2 finds a feasible retiming G → G’.

Algorithm 1: Critical Path Delay
Delete all edges (vi, vj) for which wij ≥ 1.
Create a level order for vertices such that an edge (vi, vj) requires the order of vj to be higher than that of vi.
Traversing all nodes v in level order, compute ∆(v):
  ∆(v) = d(v), if v has no incoming edge
  ∆(v) = d(v) + max{∆(vi)}, taken over all incoming edges (vi, v)
CP(G) = max{∆(vj)}, taken over all vertices vj

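The steps of Algorithm 1 can be sketched as follows. The level order is produced here with Kahn's topological sort, which is one possible choice. The correlator edge list is an assumption reconstructed from the Leiserson and Saxe example, and the host is treated as an ordinary zero-delay vertex, a simplification that does not change CP(G) for this graph.

```python
DELAY = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
EDGES = {  # assumed correlator edges: (tail, head) -> flip-flops
    ("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
    ("d", "e"): 0, ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0,
    ("a", "g"): 0, ("b", "f"): 0, ("c", "e"): 0,
}


def critical_path(delay, edges):
    """Return (CP, delta) where delta[v] is the longest register-free delay ending at v."""
    # Step 1: keep only the zero-weight (purely combinational) edges.
    preds = {v: [] for v in delay}
    succs = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            preds[v].append(u)
            succs[u].append(v)
    # Step 2: visit vertices in level order (Kahn's algorithm).
    indegree = {v: len(preds[v]) for v in delay}
    ready = [v for v in delay if indegree[v] == 0]
    delta = {}
    while ready:
        v = ready.pop()
        # Step 3: delta(v) = d(v) + max over zero-weight predecessors (0 if none).
        delta[v] = delay[v] + max((delta[u] for u in preds[v]), default=0)
        for x in succs[v]:
            indegree[x] -= 1
            if indegree[x] == 0:
                ready.append(x)
    if len(delta) != len(delay):
        raise ValueError("zero-weight cycle: not a valid synchronous circuit")
    return max(delta.values()), delta


cp, delta = critical_path(DELAY, EDGES)
print(cp)  # 24
print(delta["e"], delta["f"], delta["g"])  # 10 17 24
```

The value 24 comes from the register-free path d, e, f, g with delay 3 + 7 + 7 + 7.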
Algorithm 1 Application
[Figure: top, the correlator graph with delays 3 (a, b, c, d) and 7 (e, f, g) and four edges of weight 1; bottom, the same graph annotated with ∆ values]
∆(a) = ∆(b) = ∆(c) = ∆(d) = 3, ∆(e) = 10, ∆(f) = 17, ∆(g) = 24
CP(G) = 24

Algorithm 2: Retiming for Period = P
Initialize the retiming variable, r(v) = 0, for all v.
Repeat |V| – 1 times:
  Derive the retiming graph for the current r(v).
  Run Algorithm 1 to determine ∆(v) for all v.
  For each v such that ∆(v) > P, set r(v) to r(v) + 1.
Derive the retiming graph and run Algorithm 1:
  If CP(G) > P, then no feasible retiming exists.
  Otherwise, CP(G) ≤ P and the retimed graph is the required result.

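A minimal sketch of Algorithm 2 follows, with a compact version of Algorithm 1 included so the block runs on its own. As assumptions, the correlator edge list is reconstructed from the Leiserson and Saxe example, the host is treated as an ordinary zero-delay vertex, and the function names are illustrative.

```python
DELAY = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
EDGES = {  # assumed correlator edges: (tail, head) -> flip-flops
    ("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
    ("d", "e"): 0, ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0,
    ("a", "g"): 0, ("b", "f"): 0, ("c", "e"): 0,
}


def critical_path(delay, edges):
    """Algorithm 1: longest register-free delay ending at each vertex, and its maximum."""
    preds = {v: [u for (u, x), w in edges.items() if x == v and w == 0] for v in delay}
    delta, pending = {}, set(delay)
    while pending:
        # A vertex is ready once all of its zero-weight predecessors are done.
        ready = [v for v in pending if all(u in delta for u in preds[v])]
        if not ready:
            raise ValueError("zero-weight cycle: not a valid synchronous circuit")
        for v in ready:
            delta[v] = delay[v] + max((delta[u] for u in preds[v]), default=0)
            pending.remove(v)
    return max(delta.values()), delta


def retime_for_period(delay, edges, P):
    """Algorithm 2: return (r, retimed_edges) with CP <= P, or None if infeasible."""
    r = {v: 0 for v in delay}

    def retimed():
        return {(i, j): w + r[j] - r[i] for (i, j), w in edges.items()}

    for _ in range(len(delay) - 1):        # repeat |V| - 1 times
        _, delta = critical_path(delay, retimed())
        for v in delay:
            if delta[v] > P:               # vertex is reached too late
                r[v] += 1
    final = retimed()
    cp, _ = critical_path(delay, final)
    return (r, final) if cp <= P else None


result = retime_for_period(DELAY, EDGES, 13)
r, final = result
print(critical_path(DELAY, final)[0])          # 13
print(retime_for_period(DELAY, EDGES, 6))      # None: no period below the adder delay of 7
```

The vector returned by this procedure need not equal the one printed on the slides; any legal retiming whose critical path delay is at most P satisfies the problem statement.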
Algorithm 2 Application, P = 13
[Figure: top, the original correlator graph with ∆ = 3 at a, b, c, d, ∆ = 10 at e, ∆ = 17 at f, ∆ = 24 at g, and CP(G) = 24; bottom, the graph after a retiming iteration with registers moved and the ∆ values 14, 10, 7, 14, 3, 14, 3, 3 marked on its vertices]

Retimed Circuit for P = 13
Critical path delay = 13
[Figure: retimed correlator graph with delays 3 (a, b, c, d) and 7 (e, f, g) and the relocated registers]
ra = –1, rb = –1, rc = –2, rd = –2, re = –2, rf = –1, rg = 0, rh = 0
Retiming vector = {-1,-1,-2,-2,-2,-1,0,0}

References
Two papers by Leiserson et al. (see slide 2).
G. De Micheli, Synthesis and Optimization of Digital Circuits, New York: McGraw-Hill, 1994.
N. Maheshwari and S. S. Sapatnekar, Timing Analysis and Optimization of Sequential Circuits, Boston: Springer, 1999.