Reconfigurable Computing

Dr. Christophe Bobda, CSCE Department, University of Arkansas

Chapter 3 FPGA Synthesis

Agenda
1. Brief tour of logic synthesis
2. LUT-based technology mapping
2.1 The Chortle algorithm
2.2 The FlowMap approach

1. Goal
A structured digital system is made up of a set of combinational parts separated by memory elements. The goal of logic synthesis is to provide an implementation of a structured system for a given platform or a given target library. For FPGAs, the goal is the generation of configuration data. The implementation must be optimized according to factors such as area, delay, power consumption, and testability.
[Figure: A structured digital system]

1. Two-level logic
There are two approaches to logic synthesis. Two-level logic synthesis targets designs represented in two-level logic, i.e. as a sum of product terms: the product terms form one level (AND) and their sum forms the other level (OR).
Advantages: natural representation of Boolean functions; well understood and easy to manipulate.
Drawbacks: not representative of the logic complexity; a bad estimator of complexity during logic optimization.
Two-level synthesis was initially developed for PALs and PLAs.
[Figure: Two-level logic, F as a sum of product terms over X1..X4]

1. Multi-level logic
Multi-level logic synthesis targets multi-level designs, in which many Boolean functions lie on the path from the inputs to the outputs.
Advantages: implementations are smaller, faster, and in most cases consume less power; the representation is representative of the logic complexity.
Drawbacks: difficult to manipulate; few manipulation algorithms exist.
Multi-level logic is appropriate for mask-programmable or field-programmable devices and will therefore be considered in this course.
[Figure: Multi-level logic, functions F1, F2, F3 over inputs X1..X6]

1. Boolean networks
Multi-level logic is usually represented using Boolean networks. A Boolean network is a directed acyclic graph (DAG) in which a node represents an arbitrary Boolean function and an edge represents the (data) dependency between nodes. A viable node representation is required for manipulation. Important factors are memory efficiency and correlation with the final representation.

1. Node representation
The choices usually made for node representation are Sum-Of-Products (SOP), Factored Form (FF), and Binary Decision Diagram (BDD).
Sum-Of-Products: a sum of product terms.
Factored form (FF): defined recursively as follows: (FF = product) or (FF = sum); (product = literal) or (product = FF1 * FF2); (sum = literal) or (sum = FF1 + FF2).
Example: [...] is a product of the factored forms [...] and [...], which in turn is a sum of the factored forms [...] and [...].

1. Node representation
Binary Decision Diagram (BDD): a BDD is a rooted DAG used to represent a Boolean function. Two kinds of nodes exist.
Variable nodes: a variable node $v$ is a non-terminal node with the following attributes: an index $index(v) = i$ ($i$ defines a variable $x_i$) and two children $low(v)$ and $high(v)$.
Constant nodes: a constant node $v$ is a terminal node with a value $value(v) \in \{0, 1\}$.
OBDD: a BDD is ordered if an ordering relation exists between its nodes. Example: ordering the nodes from the root to the terminals. For each non-terminal $v$, if $low(v)$ is non-terminal, then $index(v) < index(low(v))$. Similarly, if $high(v)$ is non-terminal, then $index(v) < index(high(v))$.

1. Node representation
Correspondence between a BDD with root $v$ and a Boolean function $f_v$: the root represents the Boolean function $f_v$. If $v$ is terminal, then $f_v = value(v)$. If $v$ is a non-terminal node with index $i$, the Shannon expansion theorem is used: $f_v = \overline{x_i} \cdot f_{low(v)} + x_i \cdot f_{high(v)}$.
The value of $f_v$ for a given assignment is obtained by traversing the graph from the root to a terminal according to the assignment values.
[Figure: optimal BDD representation of an example function]
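The traversal rule above can be sketched in a few lines of Python. This is a minimal illustration, not a BDD package: the `Node` class, the `evaluate` function, and the example function f = x1*x2 + x3 with ordering x1 < x2 < x3 are all assumptions introduced here, since the function shown in the original figure is not preserved.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Node:
    """BDD node: a variable node has index/low/high, a constant node has value."""
    index: Optional[int] = None
    low: Optional["Node"] = None
    high: Optional["Node"] = None
    value: Optional[int] = None


def evaluate(root: Node, assignment: Dict[int, int]) -> int:
    """Walk from the root to a terminal, following high(v) when x_i = 1, else low(v)."""
    v = root
    while v.value is None:
        v = v.high if assignment[v.index] else v.low
    return v.value


ZERO = Node(value=0)
ONE = Node(value=1)

# Hypothetical example: f = x1*x2 + x3, ordering x1 < x2 < x3.
n3 = Node(index=3, low=ZERO, high=ONE)   # f restricted to "x3"
n2 = Node(index=2, low=n3, high=ONE)     # x1 = 1: f = x2 + x3
n1 = Node(index=1, low=n3, high=n2)      # root: x1 = 0 leaves f = x3

print(evaluate(n1, {1: 1, 2: 1, 3: 0}))
```

Each loop iteration applies one step of the Shannon expansion: fixing x_i selects either the low or the high cofactor.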

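1. Node representation
ROBDD (Reduced Ordered BDD): an ROBDD is an OBDD in which, for any two distinct nodes $v$ and $v'$, the subgraph rooted at $v$ and the one rooted at $v'$ are not isomorphic.
Two BDDs $G_1$ and $G_2$ are isomorphic iff there exists a bijective function $\sigma$ from the nodes of $G_1$ to the nodes of $G_2$ such that: for a terminal node $v$ in $G_1$, $\sigma(v)$ is a terminal node in $G_2$ with $value(\sigma(v)) = value(v)$; for a non-terminal node $v$ in $G_1$, $\sigma(v)$ is a non-terminal node in $G_2$ with $index(\sigma(v)) = index(v)$, $\sigma(low(v)) = low(\sigma(v))$ and $\sigma(high(v)) = high(\sigma(v))$.
[Figure: optimal BDD representation of an example function]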

1. Node manipulation
Given a suitable node representation, operations are performed on the Boolean network. The goal is to generate an equivalent, simplified, and cost-effective function. The operations usually applied for the reduction of Boolean networks are the following.
Decomposition: replace a Boolean expression with a collection of new expressions. A Boolean function [...] is decomposable if we can find a function [...] such that [...]. Example: [...] ([...] literals). Decomposition: [...], [...]. Representation with 8 literals.
Extraction: used to identify common intermediate sub-functions from a set of given functions. Example: [...] and [...] can be rewritten as [...] and [...] with [...].
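Decomposition and extraction can be checked by exhaustive truth-table comparison. The expressions below are common textbook-style examples chosen for illustration; they are assumptions and not necessarily the ones on the original slide. The decomposition example goes from 12 literals to 8.

```python
from itertools import product


def equivalent(f, g, n):
    """True if f and g agree on all 2**n input assignments."""
    return all(bool(f(*bits)) == bool(g(*bits))
               for bits in product((False, True), repeat=n))


# Decomposition (assumed example):
# F = abc + abd + a'c'd' + b'c'd'   (12 literals)
def f_flat(a, b, c, d):
    return ((a and b and c) or (a and b and d)
            or (not a and not c and not d) or (not b and not c and not d))


# F = XY + X'Y' with X = ab, Y = c + d   (4 + 2 + 2 = 8 literals)
def f_decomposed(a, b, c, d):
    x = a and b
    y = c or d
    return (x and y) or (not x and not y)


# Extraction (assumed example): F = (a+b)cd + e and G = (a+b)e'
# share the sub-function X = a + b.
def fg_flat(a, b, c, d, e):
    return (((a or b) and c and d) or e, (a or b) and not e)


def fg_extracted(a, b, c, d, e):
    x = a or b
    return ((x and c and d) or e, x and not e)


print(equivalent(f_flat, f_decomposed, 4))
print(all(tuple(map(bool, fg_flat(*v))) == tuple(map(bool, fg_extracted(*v)))
          for v in product((False, True), repeat=5)))
```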

1. Node manipulation
Factoring: transformation of SOP expressions into factored form. Example: [...] can be rewritten as [...].
Substitution: replace an expression within a function with the value of an equivalent expression. Example: [...] can be rewritten as [...] with [...].
Collapsing or elimination: the reverse operation of substitution. It is used to eliminate levels in order to meet timing constraints. Example: [...] with [...] will be replaced by [...].
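The three operations can be illustrated with small expressions and verified by truth table. The expressions are assumed examples picked for this sketch, not recovered from the slide.

```python
from itertools import product


def equivalent(f, g, n):
    """True if f and g agree on all 2**n input assignments."""
    return all(bool(f(*bits)) == bool(g(*bits))
               for bits in product((False, True), repeat=n))


# Factoring (assumed example): ac + ad + bc + bd + e  ->  (a + b)(c + d) + e
sop = lambda a, b, c, d, e: (a and c) or (a and d) or (b and c) or (b and d) or e
factored = lambda a, b, c, d, e: ((a or b) and (c or d)) or e

# Substitution (assumed example): F = a + bc, with G = a + b, becomes F = G(a + c)
f_orig = lambda a, b, c: a or (b and c)
f_subst = lambda a, b, c: (a or b) and (a or c)

# Collapsing (assumed example): F = Ga + G'b with G = c + d
# is replaced by F = ac + ad + bc'd'
f_two_level = lambda a, b, c, d: ((c or d) and a) or (not (c or d) and b)
f_collapsed = lambda a, b, c, d: (a and c) or (a and d) or (b and not c and not d)

print(equivalent(sop, factored, 5),
      equivalent(f_orig, f_subst, 3),
      equivalent(f_two_level, f_collapsed, 4))
```

Collapsing removes the intermediate node G at the cost of a larger expression, which is the usual trade of area for depth.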

2. LUT-Technology mapping
Technology mapping implements the optimized nodes of the Boolean network with the elements of the target device library. In the FPGA case, the library elements are LUTs; this process is therefore called LUT-based technology mapping. LUT-based technology mapping is an optimization process whose goal is usually one of: minimizing the number of LUTs used (device area); minimizing the signal delay (speed); optimizing routability or minimizing power (very little work exists on these).
In this chapter we will study two LUT-technology mapping algorithms: Chortle-crf for area minimization and FlowMap for delay minimization.

2. LUT-Technology mapping – definitions
Given a Boolean network: The fan-in of a node $v$ is the set of nodes whose outputs are inputs of $v$. The fan-out of a node $v$ is the set of nodes that use the output of $v$ as an input. A primary input (PI) is a node with no predecessor. A primary output (PO) is a node with no successor. The level of a node is the length of the longest path from a primary input to that node. The depth of a graph is the largest level of a node in the graph. For a node $v$, $input(v)$ is defined as the set of nodes which are fan-ins of $v$. A Boolean network is K-bounded if $|input(v)| \le K$ for all nodes $v$ in the graph.
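A short Python sketch of these definitions, assuming the network is stored as a dictionary that maps each node name to the list of its fan-in nodes (primary inputs map to an empty list). The representation and the example network `net` are assumptions made for illustration.

```python
def levels(fanin):
    """Level of each node: length of the longest path from a primary input."""
    memo = {}

    def level(v):
        if v not in memo:
            ins = fanin[v]
            memo[v] = 0 if not ins else 1 + max(level(u) for u in ins)
        return memo[v]

    return {v: level(v) for v in fanin}


def depth(fanin):
    """Depth of the graph: the largest level of any node."""
    return max(levels(fanin).values())


def is_k_bounded(fanin, k):
    """K-bounded: |input(v)| <= K for every node v."""
    return all(len(ins) <= k for ins in fanin.values())


# Hypothetical network: PIs a, b, c, d and gates g1, g2, g3.
net = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g2", "d"],
}

print(levels(net)["g3"], depth(net), is_k_bounded(net, 2))
```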

A tree, or fan-out-free circuit, is one in which each node has a maximal fan-out of one. A forest is a set of independent trees. A leaf-DAG is a combinational circuit in which the only nodes with a fan-out greater than one are the primary inputs.

A cone $C_v$ at a node $v$ is the tree with root $v$ which spans from $v$ towards the primary inputs. A cone $C_v$ at a node $v$ is K-feasible if $|input(C_v)| \le K$, where $input(C_v)$ is the set of nodes outside $C_v$ that feed nodes in $C_v$, and any path connecting a node in $C_v$ and $v$ lies entirely in $C_v$. LUT-technology mapping can be defined as the problem of covering a Boolean network with a set of K-feasible cones.
[Figures: a K-feasible cone at v; graph covering with cones; LUT mapping]
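The two conditions for a K-feasible cone can be tested directly. The sketch below assumes the same dictionary representation as a fan-in map (node name to list of fan-in nodes); the function names and the example network are hypothetical.

```python
def transitive_fanin(fanin, root):
    """All predecessors of root (nodes from which root is reachable)."""
    seen, stack = set(), [root]
    while stack:
        v = stack.pop()
        for u in fanin[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def cone_inputs(fanin, cone):
    """input(C_v): nodes outside the cone that feed a node inside it."""
    return {u for v in cone for u in fanin[v] if u not in cone}


def is_k_feasible_cone(fanin, cone, root, k):
    cone = set(cone)
    preds = transitive_fanin(fanin, root)
    # The cone must contain the root and otherwise only predecessors of the root.
    if root not in cone or not cone <= preds | {root}:
        return False
    # A path from a cone node to the root leaves the cone exactly when some
    # predecessor of the root outside the cone is fed by a cone node.
    for w in preds - cone:
        if cone & set(fanin[w]):
            return False
    return len(cone_inputs(fanin, cone)) <= k


# Hypothetical network: PIs a, b, c, d and gates g1, g2, g3.
net = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g2", "d"],
}

print(is_k_feasible_cone(net, {"g3", "g2"}, "g3", 3))
```

Here the cone {g3, g2} has inputs {g1, c, d}, so it fits a 3-input LUT but not a 2-input one; {g3, g1} is rejected because the path g1, g2, g3 leaves the set.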

2.1 The Chortle-crf algorithm
Developed by Francis et al. at the University of Toronto in 1991. It is a two-step approach. 1st step: partition the circuit into a set of trees and map each tree separately into a circuit of K-input LUTs. 2nd step: assemble the circuits implementing the trees to produce the final circuit. The two main goals are minimizing the number of LUTs, and therefore the device area, and minimizing the number of used pins at the output LUTs.
Transformation of the original graph into trees: the graph is partitioned through duplication of nodes with a fan-out greater than 1. Leaf-DAGs are converted to trees by duplicating common inputs.

Mapping the trees into a LUT netlist: a bin-packing approach traverses the nodes from the PIs to the POs. At each node $v$, the best circuit implementing the K-feasible cone at $v$ is searched for. Best circuit: the tree rooted at $v$ should contain the minimum number of LUTs, and the output LUT, i.e. the LUT implementing the cone at $v$, should contain the maximum number of unused inputs. The second objective thus minimizes the number of used inputs of the output LUT. Approach: at each node, construct a tree of LUTs that implements the functions of the fan-in LUTs and the decomposition of the node. The construction of the tree is done in two steps.

2.1 Chortle-crf algorithm
First step: two-level decomposition. The two levels consist of a single first-level node and several second-level nodes (the fan-in). Each second-level node implements the operation of the node being decomposed over a set of fan-in LUTs. The first-level node will be implemented in the second phase. The construction is done using a bin-packing approach. Bin packing: find a minimum number of bins of a given capacity that hold a set of boxes. In this case, the boxes are the fan-in LUTs and the bins are the resulting second-level LUTs. The capacity of a bin is the number K of LUT inputs.

Packing consists of combining two two-input LUTs, LUT1 (implementing the function f1) and LUT2 (implementing the function f2), into a new LUT, LUTr, that implements the function f = f1 Ø f2, where Ø is the operation implemented in the fan-out node.

Two-level decomposition: first-fit decreasing algorithm

Two-Level-Decomposition {
    start with an empty list of LUTs
    while there are unpacked fan-in LUTs do {
        if the largest unpacked fan-in LUT will not fit within any LUT in the list
            create an empty LUT and add it to the end of the LUT list
        pack the largest unpacked fan-in LUT into the first LUT it will fit within
    }
}
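The first-fit decreasing loop above can be written as a runnable Python sketch. It assumes each fan-in LUT is described only by its number of used inputs (its "size" as a box) and that K is the bin capacity; the function name and the example sizes are illustrative assumptions.

```python
def first_fit_decreasing(fanin_lut_sizes, k):
    """Pack fan-in LUTs (given by their number of used inputs) into K-input LUTs.

    Returns a list of bins; each bin lists the sizes of the fan-in LUTs packed in it.
    """
    bins = []
    # Process the largest unpacked fan-in LUT first.
    for size in sorted(fanin_lut_sizes, reverse=True):
        if size > k:
            raise ValueError("a fan-in LUT cannot use more than K inputs")
        for b in bins:
            if sum(b) + size <= k:      # first LUT in the list it fits within
                b.append(size)
                break
        else:
            bins.append([size])         # no fit: open a new LUT at the end of the list
    return bins


print(first_fit_decreasing([2, 3, 1, 2, 2], 5))
```

With K = 5 the sizes 3, 2, 2, 2, 1 end up in two second-level LUTs, [3, 2] and [2, 2, 1], both completely filled.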

Second step: multi-level decomposition. The first-level node is implemented with a tree of LUTs. The number of LUTs is reduced by using unused pins of the second-level LUTs to implement a portion of the first-level node.

MultiLevel {
    while there is more than one unconnected LUT do {
        if there are no free inputs among the remaining unconnected LUTs
            create an empty LUT and add it to the end of the LUT list
        connect the most filled unconnected LUT to the next unconnected LUT with a free input
    }
}
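One possible reading of the MultiLevel loop is sketched below. It assumes the second-level LUTs are given by their number of used inputs, that "remaining" means every unconnected LUT other than the one being connected, and that K is at least 2. The function name and return format are hypothetical.

```python
def multi_level(used_inputs, k):
    """Connect second-level LUTs into a tree of K-input LUTs.

    used_inputs: number of used inputs of each second-level LUT.
    Returns (final used-input counts, list of (source, target) connections, root index).
    """
    if k < 2:
        raise ValueError("K must be at least 2")
    used = list(used_inputs)
    unconnected = list(range(len(used)))
    edges = []
    while len(unconnected) > 1:
        # Most filled unconnected LUT: its output is connected next.
        src = max(unconnected, key=lambda i: used[i])
        rest = [i for i in unconnected if i != src]
        # Next unconnected LUT with a free input, if any.
        target = next((i for i in rest if used[i] < k), None)
        if target is None:
            used.append(0)              # create an empty LUT at the end of the list
            target = len(used) - 1
            unconnected.append(target)
        used[target] += 1               # the output of src occupies one input of target
        edges.append((src, target))
        unconnected.remove(src)
    return used, edges, unconnected[0]


print(multi_level([5, 4, 2], 5))
print(multi_level([5, 5], 5))
```

In the first call the free inputs of the existing LUTs absorb the whole first-level node, so no extra LUT is needed; in the second call both LUTs are full and a third LUT is created as the root.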

Improvements
Preprocessing step before the creation of the forest: ensures that inverted edges occur only at the leaves, and that there are no consecutive OR nodes and no consecutive AND nodes.
Exploiting reconvergent paths: a reconvergent path is caused by a node with fan-out > 1, which creates two paths in the graph that terminate at the same node. Reconvergent paths caused by an input are packed into just one LUT.

Improvement: logic replication at fan-out nodes reduces the number of LUTs.

2.2 The FlowMap algorithm
The FlowMap algorithm is a network-flow-based method aimed at minimizing the signal delay of mapped designs. We first recall some basics of network flow.
Given is a network $N = (V, E)$ (a graph with the set of nodes $V$ and the set of edges $E$) with a source $s$ and a sink $t$. A cut $(X, \overline{X})$ is a partition of $V$ with $s \in X$ and $t \in \overline{X}$. The cut-size $n(X, \overline{X})$ of a cut is the number of nodes in $X$ adjacent to some node in $\overline{X}$. A cut is K-feasible iff $n(X, \overline{X}) \le K$. The edge cut-size $e(X, \overline{X})$ is the weighted sum of the crossing edges.
For each node $t$, we define the label $l(t)$ of $t$ as the depth of the optimal LUT which implements $t$ in an optimal mapping of the subgraph $N_t$ (where $N_t$ is the cone at $t$). The height $h(X, \overline{X})$ of a cut is the maximum label in $X$. The volume of a cut is the number of nodes in $\overline{X}$.

2.2 The FlowMap algorithm
The objective of the FlowMap algorithm is the minimization of the signal delay, which is determined by the delay in the LUTs and the interconnection delay. Because LUT placement is not known during the technology mapping step, only the LUT delay is considered; the interconnection delay is assumed to be the same for all signals. The delay of a signal is therefore the number of LUTs that the signal traverses on a path from input to output, so minimizing the delay amounts to minimizing the depth of the resulting DAG. The FlowMap algorithm is a two-step method: a node labelling phase followed by a node mapping phase.

2.2 The FlowMap algorithm
The first phase of the algorithm computes the labels of the nodes in topological order, so that each node is processed after all its predecessors. The labelling is done as follows. Each primary input is assigned the label 0. For a given node $t$ to be processed, the cone $N_t$ is transformed into a network by inserting a source node $s$ whose output is connected to all inputs of $N_t$. With the assumption that $LUT(t)$ implements $t$ in an optimal mapping of $N_t$, the cut $(X, \overline{X})$, where $\overline{X}$ is the set of nodes in $LUT(t)$ and $X = N_t - \overline{X}$, is K-feasible. The label of $t$ is then given by $l(t) = \min_{(X, \overline{X})\ K\text{-feasible}} \left( h(X, \overline{X}) + 1 \right)$, where $h(X, \overline{X})$ is the maximum label in $X$.
Lemma 1: If $p$ is the maximum label in $input(t)$, then $l(t) = p$ or $l(t) = p + 1$.
[Figure: Network transformation]

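2.2 The FlowMap algorithm
Lemma 1: If $p$ is the maximum label in $input(t)$, then $l(t) = p$ or $l(t) = p + 1$.
Proof: Let $t' \in input(t)$. Then for any K-feasible cut $(X, \overline{X})$ in $N_t$, either $t' \in X$, or $(X, \overline{X})$ also determines a K-feasible cut $(X', \overline{X'})$ in $N_{t'}$ with $h(X', \overline{X'}) \le h(X, \overline{X})$, where $X' = X \cap N_{t'}$ and $\overline{X'} = \overline{X} \cap N_{t'}$. In the first case, we have $h(X, \overline{X}) \ge l(t')$ and therefore $h(X, \overline{X}) + 1 > l(t')$. In the second case we have $l(t') \le h(X', \overline{X'}) + 1 \le h(X, \overline{X}) + 1$. Since this holds for every K-feasible cut, $l(t) \ge l(t')$: the label of a node cannot be smaller than that of its predecessor. In particular, $l(t) \ge p$.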

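2.2 The FlowMap algorithm
$(N_t - \{t\}, \{t\})$ is a K-feasible cut, because $N_t$ is K-bounded ($|input(t)| \le K$). Because each node in $N_t - \{t\}$ is either in $input(t)$ or is a predecessor of some node in $input(t)$, the maximum label of the nodes in $N_t - \{t\}$ is $p$. Hence $h(N_t - \{t\}, \{t\}) = p$, i.e. $l(t) \le p + 1$.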

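2.2 The FlowMap algorithm
Lemma 2: Let $N_t'$ be the network obtained from $N_t$ by collapsing all the nodes with maximum label $p$ in $N_t$, together with $t$, into a single node $t'$. $N_t$ has a K-feasible cut of height $p - 1$ iff $N_t'$ has a K-feasible cut.
Proof: Let $H_t$ be the set of collapsed nodes. If $N_t'$ has a K-feasible cut $(X', \overline{X'})$, set $X = X'$ and $\overline{X} = (\overline{X'} - \{t'\}) \cup H_t$; then $(X, \overline{X})$ is a K-feasible cut in $N_t$. No node in $X$ has a label $\ge p$, so $h(X, \overline{X}) \le p - 1$; according to Lemma 1 we have $l(t) \ge p$, hence $h(X, \overline{X}) \ge p - 1$, and therefore $h(X, \overline{X}) = p - 1$.
Conversely, if $N_t$ has a K-feasible cut $(X, \overline{X})$ of height $p - 1$, then $X$ cannot contain a node with label $\ge p$, i.e. $H_t \subseteq \overline{X}$. Then $(X, (\overline{X} - H_t) \cup \{t'\})$ forms a K-feasible cut in $N_t'$.
[Figure: Network collapsing; $H_t$ is the set of collapsed nodes]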

2.2 The FlowMap algorithm
Testing whether $N_t$ has a K-feasible cut of height $p - 1$ is done by first transforming $N_t$ into $N_t'$. A second transformation then turns $N_t'$ into a new network $N_t''$. For each node $v$ in $N_t'$ other than $s$ and $t'$, two new nodes $v_1$ and $v_2$ are introduced and connected by a bridging edge $(v_1, v_2)$. The source $s$ and the sink $t'$ are also inserted in $N_t''$. For each edge $(s, v)$ in $N_t'$, an edge $(s, v_1)$ is inserted in $N_t''$, and for each edge $(v, t')$, an edge $(v_2, t')$. For each edge $(u, v)$ in $N_t'$ between two other nodes, a new edge $(u_2, v_1)$ is introduced in $N_t''$. The capacity of each bridging edge is set to 1 and that of each non-bridging edge is set to $\infty$.
The goal of this step is to reduce the node cut-size problem in $N_t'$ to an edge cut-size problem in $N_t''$, to apply well-known methods to solve the edge cut-size problem in $N_t''$, and finally to derive the equivalent solution in $N_t'$. This is done using the following lemma.
[Figure: Second transformation]
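The node-splitting transformation can be sketched as follows. The sketch assumes nodes are named by strings, uses the suffixes `_in` and `_out` for the two copies of a node (these names are arbitrary choices made here), and assigns unit capacity to bridging edges and infinite capacity to all other edges.

```python
import math


def split_nodes(edges, source, sink):
    """Turn a node-cut problem into an edge-cut problem.

    edges: iterable of (u, v) pairs of the collapsed network.
    Returns a dict {(a, b): capacity} describing the transformed network.
    """
    cap = {}
    inner = {n for e in edges for n in e} - {source, sink}
    for v in inner:
        # Bridging edge between the two copies of v: cutting it "costs" one node.
        cap[(f"{v}_in", f"{v}_out")] = 1

    def tail(u):
        return u if u in (source, sink) else f"{u}_out"

    def head(v):
        return v if v in (source, sink) else f"{v}_in"

    for u, v in edges:
        # Original edges become uncuttable (infinite capacity).
        cap[(tail(u), head(v))] = math.inf
    return cap


cap = split_nodes([("s", "a"), ("a", "b"), ("b", "t")], "s", "t")
for edge, c in sorted(cap.items()):
    print(edge, c)
```

Because only bridging edges have finite capacity, any finite edge cut in the transformed network consists of bridging edges and therefore corresponds to a set of nodes in the original network.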

2.2 The FlowMap algorithm
Lemma 3: $N_t'$ has a K-feasible cut iff $N_t''$ has a cut whose edge cut-size is no more than K.
Testing whether such a cut exists in $N_t''$ is done using the max-flow min-cut theorem (the value of the maximum flow between source and sink equals the capacity of the minimum cut). The augmenting path method is then used to incrementally detect whether the value of the maximum flow in $N_t''$ is more than K.
[Figures: Second transformation; Derived solution]

2.2 The FlowMap algorithm
Flow in a network: stream of data from the source to the sink. Residual value = capacity - flow; the residual value can be added to the flow on an edge in order to saturate that edge. Capacity of a cut = sum of the capacities of all positive (forward) crossing edges; it is not influenced by negative (backward) crossing edges. Residual network = residual edges + associated nodes. Augmenting path = path from source to sink in the residual network.
Max-flow min-cut theorem (Ford and Fulkerson): the value of any flow is bounded by the capacity of any cut in the network, so the maximum flow is bounded from above by the minimum cut capacity; the two are in fact equal. A K-feasible cut exists iff the maximum flow value is no more than K.
Approach: since we are only interested in testing whether the value of a minimum cut is no more than K, we use the augmenting path method: repeatedly find an augmenting path in the residual network, increase the flow along it, and test the value of the resulting flow. If it is no more than K, continue; if it is more than K, stop.
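A minimal sketch of the bounded augmenting-path test follows. It assumes the network is given as a dictionary of edge capacities and uses breadth-first search to find augmenting paths (the Edmonds-Karp variant of Ford-Fulkerson, chosen here for simplicity; the slides do not prescribe a search order). The function name and the example network are hypothetical.

```python
import math
from collections import defaultdict, deque


def has_cut_of_size_at_most(cap, source, sink, k):
    """True iff the minimum edge cut (= maximum flow) is no more than k."""
    residual = defaultdict(dict)
    for (u, v), c in cap.items():
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)        # reverse edge with zero residual
    flow = 0
    while flow <= k:
        # Breadth-first search for an augmenting path in the residual network.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return True                     # flow is maximal and no more than k
        # Bottleneck residual value along the path.
        bottleneck, v = math.inf, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Augment the flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    return False                            # flow exceeded k: stop early


INF = math.inf
# Hypothetical split network: s feeds a and b, both feed c, c feeds t.
example = {
    ("s", "a_in"): INF, ("s", "b_in"): INF,
    ("a_in", "a_out"): 1, ("b_in", "b_out"): 1,
    ("a_out", "c_in"): INF, ("b_out", "c_in"): INF,
    ("c_in", "c_out"): 1, ("c_out", "t"): INF,
}
print(has_cut_of_size_at_most(example, "s", "t", 1))
```

With unit-capacity bridging edges each augmentation adds one unit of flow, so at most K + 1 path searches are needed before the test terminates.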

2.2 The FlowMap algorithm
In the second phase of the FlowMap algorithm, nodes are mapped to K-LUTs. The algorithm works on the set $L$ of outputs of the Boolean network. Initially, $L$ contains all primary outputs. For each node $v \in L$, it is assumed that a minimum-height K-feasible cut $(X_v, \overline{X_v})$ has been computed in the first phase. A K-LUT is created to implement the function of $v$ as well as that of all nodes in $\overline{X_v}$. $L$ is then updated to $(L - \{v\}) \cup input(\overline{X_v})$. Nodes belonging to two different cut-sets $\overline{X_u}$ and $\overline{X_v}$ will be automatically duplicated.
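The mapping phase can be sketched as a worklist loop. The sketch assumes the network is a fan-in dictionary and that the labelling phase has already produced, for every non-input node v, the set of nodes covered by its LUT (called `cut_of[v]` here, a hypothetical name); the example cuts correspond to K = 3.

```python
def flowmap_mapping(fanin, primary_outputs, cut_of):
    """Generate K-LUTs from the cuts computed in the labelling phase.

    cut_of[v]: set of nodes implemented by the LUT rooted at v (includes v).
    Returns {root: (covered nodes, LUT inputs)}.
    """
    luts = {}
    worklist = list(primary_outputs)
    while worklist:
        v = worklist.pop()
        if v in luts or not fanin[v]:       # already mapped, or a primary input
            continue
        covered = cut_of[v]
        # Inputs of the LUT: nodes outside the covered set feeding a node inside it.
        inputs = {u for w in covered for u in fanin[w] if u not in covered}
        luts[v] = (covered, inputs)
        worklist.extend(inputs)             # the LUT inputs must be implemented next
    return luts


# Hypothetical network and cuts (K = 3).
net = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g2", "d"],
}
cuts = {"g1": {"g1"}, "g2": {"g1", "g2"}, "g3": {"g2", "g3"}}

for root, (covered, inputs) in flowmap_mapping(net, ["g3"], cuts).items():
    print(root, sorted(covered), sorted(inputs))
```

A node that lies in the covered sets of two different roots is simply implemented inside both LUTs, which is how duplication arises without any special handling.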

2.2 The FlowMap algorithm Improvement: Predecessor packing
