Integrating Logic Synthesis, Technology Mapping, and Retiming
Alan Mishchenko, Robert Brayton, Satrajit Chatterjee
UC Berkeley
Outline
  Mapping: from combinational to sequential
  Experimental results
  Handling sequential circuits during mapping
  Generalizing arrival times for sequential circuits
  Computing sequential arrival times
  (optional) Computing sequential cuts (with choices)
  Selecting the minimum-delay sequential mapping
  Performing final retiming
  Experimental results
Sequential Mapping
  Consider the sequential circuit as a cyclic combinational circuit
    Find a cutset breaking all loops
    From now on, disregard latches (registers), but keep them as labels on edges between the nodes
  Compute sequential arrival times
    Iteration over the circuit (in topological order from the cutset)
    (optional) Use sequential cuts for the computation of sequential arrival times
  Find the final mapping
  Perform the final retiming
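Sequential Arrival Times
  Combinational arrival times, a(v):
    $a(v) = \min_{M \text{, a match of } v} \left( \max \left[\, a(u_i) + \Delta^{M}_{u_i v} \mid u_i \in C_M \,\right] \right)$
  Sequential arrival times, l(v) (given clock period $\phi$):
    $l(v) = \min_{M \text{, a match of } v} \left( \max \left[\, l(u_i) - d_i \phi + \Delta^{M}_{u_i v} \mid u_i^{d_i} \in C_M \,\right] \right)$
  Here $C_M$ is the set of inputs of match M, $\Delta^{M}_{u_i v}$ is the delay from input $u_i$ to v under M, and $d_i$ is the number of latches on the path from $u_i$ to v.
  [Figure: a match of node v with inputs u1, u2, u3, shown for the combinational case and for the sequential case with latch counts d1 = 1, d2 = 0, d3 = 2.]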
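A small worked instance may help to see how the latch counts enter the sequential formula. The latch counts d1 = 1, d2 = 0, d3 = 2 are taken from the slide's figure; everything else is hypothetical and chosen only for illustration: a clock period of 2, a single match with unit delay from each input, and input l-values l(u1) = 3, l(u2) = 1.5, l(u3) = 4.

```latex
\begin{align*}
u_1:&\quad l(u_1) - d_1\phi + \Delta = 3 - 1\cdot 2 + 1 = 2 \\
u_2:&\quad l(u_2) - d_2\phi + \Delta = 1.5 - 0\cdot 2 + 1 = 2.5 \\
u_3:&\quad l(u_3) - d_3\phi + \Delta = 4 - 2\cdot 2 + 1 = 1 \\
l(v) &= \max(2,\ 2.5,\ 1) = 2.5
\end{align*}
```

With only one match there is no outer minimization; with several matches the same maximum would be computed per match and the smallest one kept. Note that u3 has the largest l-value but, because of its two latches, contributes the smallest term.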
Computing Sequential Arrival Times
  for each node v in N do
      if ( v is a PI ) l(v) = 0; else l(v) = -∞;
  done = false;
  while ( done == false ) do
      done = true;
      for each non-PI node v in N do
          tmp = min over M, a match of v ( max[ l(u^d) - d·φ + Δ^M_uv | u^d ∈ C_M ] )
          if ( l(v) < tmp )
              l(v) = tmp; done = false;
              if ( v is a PO and l(v) > φ ) return failure;
  return success; // bound has settled
  Note:
    clock cycle time φ is given
    all k-cuts and matches have been pre-computed for the FRAIG
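The iteration above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name, the encoding of a match as a list of `(input, latch_count, delay)` tuples, and the `max_sweeps` safety cap are all assumptions introduced here. The cap is needed only because a loop that does not reach a PO could otherwise keep increasing l-values forever.

```python
import math

def sequential_arrival_times(nodes, pis, pos, matches, phi, max_sweeps=1000):
    """Return {node: l-value} if the l-values settle with every PO <= phi,
    or None if some PO exceeds phi (the 'failure' case of the pseudocode).

    nodes   : list of node names, in the order they are relaxed
    pis,pos : sets of primary-input / primary-output node names
    matches : {node: [match, ...]}, each match a list of
              (input_node, latch_count, delay) tuples (hypothetical encoding)
    phi     : target clock period
    """
    # PIs start at 0, all other nodes at minus infinity.
    l = {v: (0.0 if v in pis else -math.inf) for v in nodes}
    for _ in range(max_sweeps):
        done = True
        for v in nodes:
            if v in pis:
                continue
            # Best match: minimize the latest (latch-adjusted) input arrival.
            tmp = min(
                max(l[u] - d * phi + delay for (u, d, delay) in m)
                for m in matches[v]
            )
            if l[v] < tmp:
                l[v] = tmp
                done = False
                if v in pos and l[v] > phi:
                    return None  # clock period phi is not achievable
        if done:
            return l  # bound has settled
    # Assumption of this sketch: give up loudly instead of looping forever.
    raise RuntimeError("l-values did not settle within max_sweeps sweeps")

# Toy circuit: a -> b -> c, with c feeding back to b through one latch.
toy_matches = {
    "b": [[("a", 0, 1), ("c", 1, 1)]],
    "c": [[("b", 0, 1)]],
}
feasible = sequential_arrival_times(["a", "b", "c"], {"a"}, {"c"}, toy_matches, phi=2)
infeasible = sequential_arrival_times(["a", "b", "c"], {"a"}, {"c"}, toy_matches, phi=1.5)
```

In the toy circuit the loop b, c has total delay 2 and one latch, so a period of 2 is accepted and a period of 1.5 is rejected. A caller would typically wrap this check in a binary search over the clock period.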
Theorem
  Theorem: Circuit S can be retimed to a clock period φ iff the l-value of each PO is less than or equal to φ.
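Illustration of Iterative Sequential Arrival Time Computation
  [Figure: example circuit with nodes i1, i2, b, c, annotated with d = 1 and φ = 2; the l-values are updated over successive iterations until they converge.]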
Convergence Theorem. If the nodes are relaxed in a topological order, the algorithm stops in at most |U| + 1 iterations, where U is a cut which breaks all loops.
Final Retiming
  Define: [equation defining the retiming r; not preserved in this transcript]
  Theorem: r is a legal retiming and can achieve a clock period less than φ + D, where D is the largest combinational delay of a node.
  c(v) = l(v) / φ is called the continuous retiming lag of node v by Pan.
Example of c-Retiming
  $w_r(e_{uv}) = r(v) + w(e_{uv}) - r(u)$
  [Figure: example circuit with nodes i1, i2, b, c and φ = 2, annotated with sequential arrival times, c-lags, retime lags, and new latch positions.]
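The edge-weight update above, together with the continuous lag (the l-value divided by the clock period), can be sketched in a few lines of Python. The function names and the toy circuit are hypothetical. The integer lags r are passed in as given, since the rule that rounds continuous lags to integer lags is not reproduced here.

```python
def continuous_lags(l, phi):
    """Continuous retiming lag c(v) = l(v) / phi for every node."""
    return {v: l[v] / phi for v in l}

def retimed_weights(edges, r):
    """Apply w_r(e_uv) = r(v) + w(e_uv) - r(u) to every edge.

    edges : {(u, v): latch count w on edge u -> v}
    r     : {node: integer retiming lag}
    """
    return {(u, v): r[v] + w - r[u] for (u, v), w in edges.items()}

def is_legal(edges, r):
    """A retiming is legal when no edge ends up with a negative latch count."""
    return all(w >= 0 for w in retimed_weights(edges, r).values())

# Toy circuit: a -> b -> c, with one latch on the feedback edge c -> b.
toy_edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "b"): 1}
toy_lags = {"a": 0, "b": 0, "c": 1}       # hypothetical integer lags
new_weights = retimed_weights(toy_edges, toy_lags)
c_lags = continuous_lags({"a": 0, "b": 1, "c": 2}, phi=2)
```

Here the latch moves from the edge c -> b to the edge b -> c, and the number of latches on the loop stays at one, as any legal retiming requires.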
Overall View of Integration Flow
Experimental Results
  No retiming
    mapping (M)
    integrated choices and mapping (MC)
  Retiming, no choices
    mapping followed by retiming (M+R)
    integrated mapping and retiming (MR)
  Retiming and choices
    mapping with choices followed by retiming (MC+R)
    integrated choices, mapping, and retiming (MCR)

          M     MC    M+R   MR    MC+R  MCR
  FPGA    1.0   .96   .97   .82   .95   .74
  SC      1.0   .95   .96   .84   .91   .76
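The FPGA and SC rows above can be turned into percentage reductions with a short calculation. This sketch assumes the values are ratios normalized to plain mapping (M = 1.0), which is how the 25% figure in the conclusions appears to be derived; the variable and function names are illustrative.

```python
# Ratios copied from the results rows, in the order M, MC, M+R, MR, MC+R, MCR.
FLOWS = ["M", "MC", "M+R", "MR", "MC+R", "MCR"]
RATIOS = {
    "FPGA": [1.0, 0.96, 0.97, 0.82, 0.95, 0.74],
    "SC":   [1.0, 0.95, 0.96, 0.84, 0.91, 0.76],
}

def reduction_percent(ratio):
    """Percentage reduction relative to the baseline ratio of 1.0."""
    return round((1.0 - ratio) * 100, 1)

# Reduction of every flow relative to M, per target technology.
reductions = {
    tech: {flow: reduction_percent(x) for flow, x in zip(FLOWS, row)}
    for tech, row in RATIOS.items()
}
mcr_average = (reductions["FPGA"]["MCR"] + reductions["SC"]["MCR"]) / 2
```

Under that assumption, MCR gives a 26% reduction for FPGAs and 24% for SCs, which averages to the 25% quoted later, while the consecutive flow MC+R gives only 5% and 9%.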
IWLS 2005: Benchmark statistics
Integration for FPGAs (k = 5) Note: integrated results (MR and MCR) are significantly better than consecutive results (M+R and MC+R).
Integration for SCs (mcnc.genlib)
Conclusions
  Introduced a combination of synthesis/mapping/retiming based on
    detecting and using multiple circuit structures
    generalizing combinational tech-mapping to work with sequential circuits
    implementing retiming with initial state computation
    using AIGs as a unifying circuit representation
  The clock period is provably the smallest one (constant-delay model)
    reduction of 25% for both FPGAs and SCs, compared to only tech-mapping
  The approach is highly scalable because global minimization is achieved by a sequence of simple local transformations
  The results of integration can be efficiently verified
Future work
  making the integration flow work incrementally
  minimizing registers after retiming
  recovering area for sequential circuits
  improving convergence speed of iterative procedures
  generating structural choices for sequential networks
  integrating with place and route
  adding verification capabilities