FPGA Area Reduction by Multi-Output Sequential Resynthesis
Yu Hu 1, Victor Shih 2, Rupak Majumdar 2 and Lei He 1
1 Electrical Engineering Dept., UCLA
2 Computer Science Dept., UCLA
Presented by Yu Hu
Outline Background and Motivation Combinational Resynthesis with MIMO Blocks Sequential Resynthesis Experimental Results Conclusion and Future Work
Background
Area-optimal technology mapping for LUT-based FPGAs is NP-hard [Farrahi, TCAD’94]
Post-mapping resynthesis is effective at reducing area (LUT#) [Ling, DAC’05]
Resynthesis is used for
  Area reduction
  Fault tolerance, power optimization, physical-aware optimization, and many others
Boolean Matching Based Resynthesis Attempt to re-map a logic block to reduce LUT# BM can be used to handle both homogeneous and heterogeneous PLBs (Source: Andrew Ling, University of Toronto, DAC'05)
Overall Flow of BM-based Resynthesis Multi-iterations of block-based Boolean Matching (Source: Andrew Ling, University of Toronto, DAC'05)
Limitations of Existing Work
Only single-output logic blocks are considered
Only the combinational portion of the circuit is considered
A larger solution space can be explored, and area could be reduced further, if
  Multiple-output logic blocks are considered
  FF boundaries are eliminated
Motivation Example – Retiming Resynthesis is restricted by FF boundaries … Retiming creates chances for resynthesis 2-LUT network
Motivation Example – MISO Resynthesis 2-LUT network Function of O 2 has to be preserved … Only 1-LUT reduction
Motivation Example – MIMO Resynthesis 2-LUT network 60% area reduction is obtained by sequential MIMO resynthesis!
Major Contributions
Present a Boolean matching based resynthesis algorithm that considers multi-output logic blocks
Propose a sequential resynthesis technique
Reduce area by up to 10% compared to combinational resynthesis, when both use MIMO blocks
Outline Background and Motivation Combinational Resynthesis with MIMO Blocks SAT-based Boolean Matching for Multiple Output Functions Resynthesis Algorithm Experimental Results Sequential Resynthesis Experimental Results Conclusion and Future Work
Existing Boolean Matching for MISO
Formulate the sub-problem of resynthesis as Boolean matching (BM)
BM: Can function f be implemented in circuit g?
Resynthesis: Is there a configuration of g such that, for all inputs to g, f is equivalent to g?
[Figure: function f matched against a 2-LUT circuit g]
(Source: Andrew Ling, University of Toronto, DAC'05)
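The matching question on this slide can be written as a quantified formula. This is one common way to state it, not necessarily the notation of the original talk: X stands for the inputs of g and L for its configuration bits, and both symbols are assumed here for illustration.

```latex
\[
  \exists L \;\; \forall X : \quad f(X) = g(X, L)
\]
```

A satisfying choice of L is a configuration of g that implements f.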
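SAT-BM for Multi-Output Functions
Configuration bits are encoded as SAT literals
Characteristic function of a 2-input LUT with inputs i_1, i_2, output F and configuration bits L_0, ..., L_3:
\[
\begin{aligned}
G_{LUT}[i_1, i_2, F] ={}& (i_1 + i_2 + \neg L_0 + F)\,(i_1 + i_2 + L_0 + \neg F) \\
& (i_1 + \neg i_2 + \neg L_1 + F)\,(i_1 + \neg i_2 + L_1 + \neg F) \\
& (\neg i_1 + i_2 + \neg L_2 + F)\,(\neg i_1 + i_2 + L_2 + \neg F) \\
& (\neg i_1 + \neg i_2 + \neg L_3 + F)\,(\neg i_1 + \neg i_2 + L_3 + \neg F)
\end{aligned}
\]
Characteristic function of the two-LUT circuit:
\[
G = G_{LUT1}[x_1, x_2, F_2] \cdot G_{LUT2}[F_2, x_3, F_1]
\]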
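The clause pattern above generalizes to a k-input LUT: two clauses per input combination, tying the output to the matching configuration bit. A minimal sketch follows, assuming DIMACS-style signed integers for literals. The names `lut_cnf` and `satisfied` are hypothetical helpers introduced here, not part of the authors' tool.

```python
from itertools import product


def lut_cnf(inputs, config, out):
    """Return CNF clauses for the characteristic function of a LUT.

    inputs: list of k positive variable ids (first input is the MSB).
    config: list of 2**k positive variable ids, one per configuration bit.
    out:    positive variable id of the LUT output.
    Literals are signed integers: v means the variable, -v its negation.
    """
    clauses = []
    for idx, bits in enumerate(product([0, 1], repeat=len(inputs))):
        # The guard literals are all false exactly when inputs == bits,
        # so the two clauses below then force out == config[idx].
        guard = [v if b == 0 else -v for v, b in zip(inputs, bits)]
        clauses.append(guard + [-config[idx], out])
        clauses.append(guard + [config[idx], -out])
    return clauses


def satisfied(clauses, assignment):
    """Check a full assignment (dict: variable id -> bool) against the CNF."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


if __name__ == "__main__":
    # Variables 1, 2 are i1, i2; 3..6 are L0..L3; 7 is F (arbitrary numbering).
    for clause in lut_cnf([1, 2], [3, 4, 5, 6], 7):
        print(clause)
```

With this numbering the first clause is `[1, 2, -3, 7]`, which corresponds to (i1 + i2 + ¬L0 + F) on the slide.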
SAT-BM for Multi-Output Functions
\[
G = G_{LUT1}[x_1, x_2, F_2] \cdot G_{LUT2}[F_2, x_3, F_1]
\]
Replicated SAT problem:
\[
\begin{aligned}
G_{expand} ={}& G[X/000, F_1/0, F_2/0] \cdot G[X/001, F_1/0, F_2/0] \\
\cdot{}& G[X/010, F_1/1, F_2/0] \cdot G[X/011, F_1/0, F_2/0] \\
\cdot{}& G[X/100, F_1/1, F_2/0] \cdot G[X/101, F_1/0, F_2/0] \\
\cdot{}& G[X/110, F_1/1, F_2/1] \cdot G[X/111, F_1/1, F_2/1]
\end{aligned}
\]
The solution of this SAT problem corresponds to the Boolean matching result
SAT!
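The replicated problem asks for one set of configuration bits that is consistent with every row of the target truth table at once. The sketch below illustrates that idea for the two-LUT circuit by exhaustive enumeration of the 8 configuration bits. This deliberately swaps the SAT solver for brute force, which is only practical for tiny templates, and it is not the authors' MiniSAT flow. The function names and the example target functions are assumptions made for illustration.

```python
from itertools import product


def lut2(config, a, b):
    """Evaluate a 2-input LUT; config[2*a + b] is the selected bit."""
    return config[2 * a + b]


def match_two_lut(target):
    """Search for a configuration of LUT1[x1, x2] -> F2, LUT2[F2, x3] -> F1.

    target maps each input vector (x1, x2, x3) to the required (F1, F2).
    Returns (config_lut1, config_lut2) or None when no configuration fits.
    """
    for bits in product([0, 1], repeat=8):
        c1, c2 = bits[:4], bits[4:]
        ok = True
        for (x1, x2, x3), (f1, f2) in target.items():
            g2 = lut2(c1, x1, x2)   # intermediate wire, also an output
            g1 = lut2(c2, g2, x3)
            if (g1, g2) != (f1, f2):
                ok = False
                break
        if ok:
            return c1, c2
    return None


# Hypothetical target: F2 = x1 AND x2, F1 = F2 XOR x3.
target_ok = {
    (x1, x2, x3): ((x1 & x2) ^ x3, x1 & x2)
    for x1, x2, x3 in product([0, 1], repeat=3)
}

# Hypothetical target that cannot fit: F1 depends on x1 and x2 separately.
target_bad = {
    (x1, x2, x3): (x1 ^ x2 ^ x3, x1 & x2)
    for x1, x2, x3 in product([0, 1], repeat=3)
}

if __name__ == "__main__":
    print(match_two_lut(target_ok))
    print(match_two_lut(target_bad))
```

A SAT formulation scales better because the solver prunes the configuration space instead of enumerating it.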
Unique Problem of MIMO Synthesis
MIMO resynthesis can generate new paths in the block
The new paths might cause combinational cycles
Conservative solution: detect combinational cycles and discard resynthesis solutions with cycles
[Figure labels: Combinational cycle! PI, PO. False path?]
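The conservative check described above amounts to cycle detection on the combinational part of the netlist. A minimal sketch follows, assuming the netlist is given as a fanout map in which edges that pass through a flip-flop have already been removed. The representation and the name `has_combinational_cycle` are hypothetical. Like the slide's conservative solution, this rejects every structural cycle, including ones that may be false paths.

```python
def has_combinational_cycle(fanouts):
    """Detect a directed cycle with an iterative depth-first search.

    fanouts maps each node to a list of its combinational successors.
    Nodes that appear only as successors are treated as having no fanout.
    """
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on the DFS path, finished
    color = {}
    for root in fanouts:
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(fanouts.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                state = color.get(nxt, WHITE)
                if state == GRAY:
                    return True       # back edge: nxt is on the current path
                if state == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(fanouts.get(nxt, ()))))
                    break
            else:
                color[node] = BLACK   # all successors explored
                stack.pop()
    return False


if __name__ == "__main__":
    before = {"PI": ["a"], "a": ["b"], "b": ["PO"]}
    after = {"PI": ["a"], "a": ["b"], "b": ["PO", "a"]}  # new path b -> a
    print(has_combinational_cycle(before))
    print(has_combinational_cycle(after))
```

A resynthesis candidate would be discarded whenever the check returns True for the netlist with the candidate block substituted in.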
Experimental Settings
Implementation in OAGear
SAT-BM uses MiniSAT
The biggest MCNC benchmarks are tested
  10 combinational
  10 sequential
  Mapped with 4-LUTs by Berkeley ABC
Resynthesis settings
  One traversal is performed
  Blocks with up to 10 inputs are considered
Results are verified by the ABC equivalence checkers
Experimental Settings – PLB templates
All three possible structures for PLBs with up to 10 inputs and fewer than four 4-LUTs [Ling, DAC’05]
All intermediate wires are treated as outputs in MIMO resynthesis
Combinational Resynthesis: MISO vs. MIMO
MIMO does not outperform MISO significantly, probably due to
  Rejecting “false paths” introduced by MIMO resynthesis
  Narrow PLB templates
  Small block size and LUT size
  No iterations of resynthesis
Outline Background and Motivation Combinational Resynthesis with MIMO Blocks Sequential Resynthesis Experimental Results Conclusion and Future Work
Structure Impact on Sequential Resynthesis
The structure of a logic block determines the sequential resynthesis strategy
Retiming
  Classic retiming: all edges have non-negative weights after retiming
  Peripheral retiming: may result in a negative number of FFs at peripheral edges
Logic duplication
  Duplication allowed
  Duplication not allowed
Case I: Classic Retiming w/o Duplication Step1: backward retiming Step2: combinational resynthesis Step3: forward retiming
Case II: Peripheral Retiming w/o Duplication Step1: peripheral retiming Step2: combinational resynthesis Step3: check feasibility of forward retiming Borrow FFs from outside. A resynthesis solution w/ feasible retiming
Case II: Peripheral Retiming w/o Duplication Step4: forward retiming
Case III: Retiming w/ Duplication FF not movable! FF# = 1 FF# = 0 Duplication is required to enable retiming!
Case III: Peripheral Retiming w/ Duplication FF not movable! Identical configuration for LUT-c and LUT-d.
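Duplication or Not? – A Sufficient and Necessary Condition
An acyclic block is feasible for retiming w/o duplication iff [Brayton, TCAD’91]
  a. All paths from a given input i to a given output j have the same FF#
  b. There exist numbers α_i and β_j for input i and output j, s.t. the FF# on every (i,j) path is equal to (α_i + β_j)
Example with 4 inputs and 2 outputs (one row per output, one column per input, * = no path):
\[
\begin{pmatrix}
\alpha_1+\beta_1 & \alpha_2+\beta_1 & \alpha_3+\beta_1 & \alpha_4+\beta_1 \\
* & * & \alpha_3+\beta_2 & \alpha_4+\beta_2
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 1 & 1 \\
* & * & 0 & 0
\end{pmatrix}
\]
Solution: α_1 = 1, α_2 = 0, α_3 = 1, α_4 = 1, β_1 = 0, β_2 = -1
[Figure: block with inputs labeled α_1 ... α_4 and outputs labeled β_1, β_2]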
Duplication or Not? – A Sufficient and Necessary Condition (cont.)
Checking conditions a and b [Brayton, TCAD’91]
  Time complexity O(e min(m,n))
  Negligible for small blocks
Classic or peripheral retiming?
  Classic retiming iff there exist non-negative α_i and β_j
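Condition b and the classic-versus-peripheral test can be sketched as a propagation over the input/output pairs that have a path. The code below is an illustrative sketch, not the algorithm from [Brayton, TCAD’91]: it assumes condition a has already been checked, so each connected (input, output) pair carries a single FF count, and it uses a hypothetical dictionary representation. In the example, the second row of FF counts (0, 0) is from the slide, while the first row (1, 0, 1, 1) is derived from the slide's α and β values and should be read as an assumption.

```python
from collections import deque


def solve_alpha_beta(path_ffs, num_inputs, num_outputs):
    """Find alpha, beta with path_ffs[(i, j)] == alpha[i] + beta[j].

    path_ffs maps (input index, output index) to the FF count on i -> j paths;
    pairs without a path are simply absent.
    Returns (alpha, beta, classic) or None if no such numbers exist.
    classic is True iff a solution with all values non-negative exists.
    """
    by_in = {i: [] for i in range(num_inputs)}
    by_out = {j: [] for j in range(num_outputs)}
    for (i, j), w in path_ffs.items():
        by_in[i].append((j, w))
        by_out[j].append((i, w))

    alpha = [None] * num_inputs
    beta = [None] * num_outputs
    classic = True
    for start in range(num_inputs):
        if alpha[start] is not None:
            continue
        alpha[start] = 0
        comp_in, comp_out = [start], []
        queue = deque([("in", start)])
        while queue:
            side, k = queue.popleft()
            if side == "in":
                for j, w in by_in[k]:
                    want = w - alpha[k]
                    if beta[j] is None:
                        beta[j] = want
                        comp_out.append(j)
                        queue.append(("out", j))
                    elif beta[j] != want:
                        return None          # inconsistent FF counts
            else:
                for i, w in by_out[k]:
                    want = w - beta[k]
                    if alpha[i] is None:
                        alpha[i] = want
                        comp_in.append(i)
                        queue.append(("in", i))
                    elif alpha[i] != want:
                        return None
        # Each component has one degree of freedom (alpha + t, beta - t).
        # Shift so the smallest alpha is 0; beta is then as large as possible.
        shift = -min(alpha[i] for i in comp_in)
        for i in comp_in:
            alpha[i] += shift
        for j in comp_out:
            beta[j] -= shift
        if comp_out and min(beta[j] for j in comp_out) < 0:
            classic = False
    beta = [0 if b is None else b for b in beta]  # outputs with no input path
    return alpha, beta, classic


if __name__ == "__main__":
    # 4 inputs, 2 outputs, zero-based indices.
    example = {(0, 0): 1, (1, 0): 0, (2, 0): 1, (3, 0): 1, (2, 1): 0, (3, 1): 0}
    print(solve_alpha_beta(example, 4, 2))
```

For this example the sketch recovers the α and β values shown on the slide and reports that no non-negative solution exists, so the block would call for peripheral rather than classic retiming.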
Can We Accept Every Single Resynthesis? – Feasibility Checking for Sequential Resynthesis
Initial State Computation
  Filter out some of the rewriting steps so that an equivalent initial state for the synthesized machine can be computed from a given initial state of the original machine
  Rewriting invariant [Brayton, IWLS’07]
  Can be reduced to a SAT problem
Clock Period Preservation
  A New Retiming-based Technology Mapping Algorithm for LUT-based FPGAs [Pan, FPGA’98]
  Sequential arrival time: l-values
Experimental Results – Sequential vs. Combinational Resynthesis
Seq-resynthesis obtains up to 9% area reduction
Factors affecting seq-resynthesis
  Sequential structure
  All factors in combinational resynthesis
Outline Background and Motivation Combinational Resynthesis with MIMO Blocks SAT-based Boolean Matching for Multiple Output Functions Resynthesis Algorithm Sequential Resynthesis Conclusion and Future Work
Conclusions and Future Work
Proposed a new resynthesis approach considering both MIMO blocks and retiming
Results indicate that sequential resynthesis obtains more gain than MIMO resynthesis
Future work
  PLBs from [Ling, DAC’05] are optimal only for MISO, and we will develop new PLB structures for MIMO resynthesis
  Study resynthesis for heterogeneous FPGAs
Thanks FPGA Area Reduction by Multi-Output Sequential Resynthesis Yu Hu, Victor Shih, Rupak Majumdar and Lei He