Temporal Logic Replication for Dynamically Reconfigurable FPGA Partitioning Wai-Kei Mak Dept. of Computer Science and Engineering University of South Florida Evangeline F.Y. Young Dept. of Computer Science and Engineering The Chinese University of Hong Kong
Outline I.Dynamically reconfigurable FPGA II.Temporal partitioning = Conventional partitioning? III.Temporal logic replication What? Why? How? IV.Experimental results V.Conclusions
Dynamically Reconfigurable FPGA zStore multiple contexts on chip. zReuse logic blocks and wire segments dynamically. zThe contexts stored can correspond to the multiple stages of a large circuit.
Temporal Circuit Partitioning zTemporal partitioning multiple stages execute sequentially z Spatial partitioning multiple components execute concurrently
Temporal Logic Replication zCan reduce buffering requirement. zEffectively utilize available slack logic capacity.
Temporal Constraints zFor a net n = (v 1, {v 2, …, v p }), require s(v 1 ) s(v j ), j=2,…,p, if v 1 is a combinational node
Temporal Constraints (Cont’d) require s(v j ) s(v 1 ), j=2,…,p, if v 1 is a flip-flop node
Temporal Partitioning with Replication Problem: Partition given circuit into pre-defined # stages satisfying all temporal constraints. Objective: Minimize buffers required between stages. Proposal: Utilize available slack logic capacity to reduce signal buffering. Solution: An effective 2-step approach.
2-Step Approach Step 1: Compute a temporal partition w/o replication. Step 2: Repeatedly identify the bottleneck stage and apply replication for that stage.
Advantages of 2-Step Approach zWill not replicate unnecessarily. zAll temporal constraints are already satisfied when replicating.
Min-Area Min-Cut Replication Let stage i be the bottleneck stage. Min-Cut Replication Compute a subset of nodes R i in stage i for replication into stage i+1 to maximally reduce the communication cost at stage i. Min-Area Min-Cut Replication Compute a minimum subset of nodes R i in stage i for replication into stage i+1 to maximally reduce the communication cost at stage i.
Optimal Solution for Min-Area Min-Cut Replication Observation 1 : The min-cut replication problem can be solved by computing a minimum cut (V i -R i,R i ) in stage i. Observation 2 : The min-area min-cut replication problem can be solved by computing a minimum cut (V i -R i,R i ) in stage i s.t. |R i | is minimized. Let V i = set of nodes in stage i.
Example A pre-partition: Computing a minimum cut in stage 2:
Example (Cont’d) Computed R 2 = {j}
Network Modeling zNeed to ensure that cut size = buffer requirement zFor a net (v 1, {v 2, …, v p }),
The Case of Limited Slack Logic Capacity zThe solution of min-area min-cut replication suffices if slack logic capacity is sufficiently large. Otherwise, |R i | exceeds the slack, then use a heuristic to reduce R i. Use a repeated max-flow min-cut heuristic to gradually reduce R i (so cut size is only increased gradually). zH. Yang, D.F. Wong, “Efficient Network Flow based Min-Cut Balanced Partitioning”, ICCAD’94.
Algorithm Input: Stage area bound A. 1. Network modeling for bottleneck stage i. 2. Compute min-cut (V i -R i,R i ) s.t. |R i | is minimized. 3. If |V i+1 |+|R i | A, stop and return R i. 4. Collapse a node in R i with all nodes in V i -R i, goto 2.
Experimental Results Circuit#buf w/o rep.#buf w/ rep.Imprv. %Rep. % C C C C S S S S S
Conclusions zProposed temporal logic replication to reduce buffering requirement in DRFPGA partitioning. zPresented an effective 2-step approach. zFormulated and optimally solved the min-area min-cut replication problem. zExtended to case of limited slack logic capacity. zIn the paper, a new timing-driven temporal partitioning algorithm was introduced to compute pre-partition.