1
Combining Technology Mapping and Retiming
EECS 290A: Sequential Logic Synthesis and Verification
2
Outline
- Motivation
- Technology mapping for combinational circuits
- Generalizing the concept of combinational delay to sequential circuits using the concept of l-value
- Technology mapping for sequential circuits
  - Computation of cuts
  - Search for the optimum-delay solution
  - Computation of optimum l-values
  - Constructing the solution
  - Retiming for optimum delay
3
Traditional Tech Mapping Approach
- Cut the sequential circuit at the latch boundary
- Optimize and map the combinational part
  - Pros: Preserves latch encoding
  - Cons: Potentially suboptimal
- (Optional) Retime the mapped circuit
[Figure: Logic block and Latches, with ports PI, PO, LI, LO]
4
Motivating Example: LUT Size = 3
[Figure: circuit with inputs i1, i2, nodes a, b, c, and output f, shown through mapping and retiming; annotated "2 LUTs" and "1 LUT"]
5
Basic Mapping: Overview
- Pre-compute truth tables of gates (supergates)
- Represent netlist as an AND-INV graph (AIG)
- For each node, compute cuts
- Map network for delay
- Recover area using heuristics
- Select final mapping
6
What is Mapping?
Mapping expresses functions using gates.
[Figure: network with inputs x1, x2, x3, x4, x5 and outputs z1, z2, z3]
7
Basic Mapping: AND-INV Graphs
Karnaugh map of F (rows cd, columns ab):
  cd \ ab   00  01  11  10
  00         0   0   1   0
  01         0   0   1   1
  11         0   1   1   0
  10         0   0   1   0
- F(a,b,c,d) = ab + d(ac' + bc): 6 nodes, 4 levels
- F(a,b,c,d) = ac'(b'd')' + bc(a'd')' = ac'(b+d) + bc(a+d): 7 nodes, 3 levels
[Figure: the two AIGs built from these factored forms]
8
Basic Mapping: Computing AIG
- Technology-independent synthesis
  - Any synthesis flow can be used
- Constructing AIG from factored forms
  - SOPs are factored using algebraic factoring
- Balancing AIG
  - Reduces delay
[Figure: network with inputs x1 to x5, outputs z1 to z3, and an internal node n with F_n = x2 x3' x4]
9
Basic Mapping: Cuts
Definition. A cut C for a node n is a set of nodes such that every path from the primary inputs to n passes through a node in C.
- The node itself is an elementary cut
- k-feasible cuts are cuts containing at most k nodes
- The average number of 5-feasible cuts in benchmarks is ~20 per node
[Figure: node n above primary inputs x1, x2, x3]
10
Basic Mapping: Computing Cuts
Compute all 2-feasible cuts of node n.
  Cuts for node p = {{p}, {s,x2}, {x1,x2}}
  Cuts for node q = {{q}, {x2,t}, {x2,x3}}
  Cuts for node n = ({{p}, {s,x2}, {x1,x2}} × {{q}, {x2,t}, {x2,x3}}) ∪ {{n}}
                  = {{n}, {p,q}, {p,x2,t}, {p,x2,x3}, …}
  2-feasible cuts for node n = {{n}, {p,q}}
[Figure: AIG with primary inputs x1, x2, x3, internal nodes s, t, p, q, and root n]
All k-feasible cuts are computed in one pass over the AIG:
- Assign elementary cuts for primary inputs
- For each internal node
  - merge the cut sets of children while removing duplicated cuts
  - add the elementary cut composed of the node itself
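The one-pass cut computation can be sketched in Python as follows. This is a minimal illustration rather than the code of any particular mapper: the name `enumerate_cuts`, the dictionary encoding of the AIG, and the small example graph (p = x1·x2, q = x2·x3, n = p·q) are assumptions made for brevity, and inverters are ignored because they do not change the cuts.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """Return {node: set of cuts}, each cut a frozenset of at most k nodes.

    fanins maps every internal AND node to its two fanins; nodes missing
    from fanins are treated as primary inputs. order must be topological.
    """
    cuts = {}
    for node in order:
        if node not in fanins:
            # Primary input: only the elementary cut.
            cuts[node] = {frozenset([node])}
            continue
        left, right = fanins[node]
        merged = set()  # a set removes duplicated cuts automatically
        for c1, c2 in product(cuts[left], cuts[right]):
            union = c1 | c2
            if len(union) <= k:
                merged.add(union)
        merged.add(frozenset([node]))  # elementary cut of the node itself
        cuts[node] = merged
    return cuts


# Hypothetical example graph, not the one drawn on the slide.
fanins = {"p": ("x1", "x2"), "q": ("x2", "x3"), "n": ("p", "q")}
order = ["x1", "x2", "x3", "p", "q", "n"]
cuts2 = enumerate_cuts(fanins, order, k=2)
cuts3 = enumerate_cuts(fanins, order, k=3)
```

With k = 2 the cut set of n is {{n}, {p, q}}; raising k to 3 adds cuts such as {x1, x2, x3}.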
11
Basic Mapping: Truth Tables
- A truth table is a bit-string representing the Boolean function of a cut
- Truth tables are computed for all cuts of all nodes
  - For each cut, assign elementary variables to cut leaves
  - Compute the truth tables for the internal nodes in topological order
Example:
  x1 = 10101010
  x2 = 11001100
  x3 = 11110000
  t = x2 & x3 = 11000000
  q = x1 & t = 10000000
[Figure: cone of q over leaves x1, x2, x3 with internal node t; bit strings labeled LSB and MSB]
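The truth-table computation can be reproduced with plain integers used as bit-strings. The sketch below is illustrative only: the helper `elementary`, the convention that bit m (counted from the least significant bit) holds the value on minterm m, and the two-entry `LIBRARY` are assumptions, not part of any real gate library.

```python
def elementary(i, nvars):
    """Truth table of variable i over nvars inputs, packed into an int.

    Bit m, counted from the least significant bit, is the function value
    on minterm m.
    """
    table = 0
    for m in range(1 << nvars):
        if (m >> i) & 1:
            table |= 1 << m
    return table


NVARS = 3
MASK = (1 << (1 << NVARS)) - 1  # 8 bits for a 3-input cut

# Elementary variables assigned to the cut leaves.
x1, x2, x3 = (elementary(i, NVARS) for i in range(NVARS))

# Internal nodes in topological order; AND is bitwise AND.
t = x2 & x3
q = x1 & t
not_q = ~q & MASK  # complemented phase, masked to the table width

# Hypothetical library: a match is found when truth tables are equal.
LIBRARY = {"AND3": 0b10000000, "NAND3": 0b01111111}
matches = [name for name, table in LIBRARY.items() if table == q]
```

Printing `format(q, "08b")` gives the same bit-string as the slide, 10000000.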
12
Basic Mapping: Delay Optimality
- Assign the arrival times of the primary inputs
- For each node, in topological order
  - Compare the truth table of the cut with the truth tables of the gates (when they are equal, we have a match)
  - Compute the arrival times of each cut, in both phases
  - Select the best cut for each phase
  - When arrival times are equal, use area as a tie-breaker
[Figure: node with cuts c1, c2, c3, c4]
T_c2 < T_c3 < T_c1 < T_c4, so c2 is the best cut.
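A simplified version of this delay-driven pass is sketched below. It makes several assumptions that the slide does not: every cut is implementable (as in LUT mapping, so no truth-table matching is needed), each match has unit delay, phases are ignored, and the number of leaves stands in for area when breaking ties. The function name `map_for_delay` and the example cut sets are hypothetical.

```python
def map_for_delay(cuts, order, pi_arrival, delay=1):
    """Pick, for each internal node, the cut with the earliest arrival time."""
    arrival = dict(pi_arrival)
    best_cut = {}
    for node in order:
        if node in arrival:
            continue  # primary input, arrival time already assigned
        candidates = []
        for cut in cuts[node]:
            if cut == frozenset([node]):
                continue  # the elementary cut cannot implement the node
            time = max(arrival[leaf] for leaf in cut) + delay
            # Tuple order: arrival time first, then leaf count as the
            # (assumed) area tie-breaker, then leaf names for determinism.
            candidates.append((time, len(cut), sorted(cut)))
        time, _, leaves = min(candidates)
        arrival[node] = time
        best_cut[node] = frozenset(leaves)
    return arrival, best_cut


F = frozenset
# 3-feasible cuts of a hypothetical AIG: p = x1*x2, q = x2*x3, n = p*q.
cuts = {
    "p": {F(["p"]), F(["x1", "x2"])},
    "q": {F(["q"]), F(["x2", "x3"])},
    "n": {F(["n"]), F(["p", "q"]), F(["p", "x2", "x3"]),
          F(["x1", "x2", "q"]), F(["x1", "x2", "x3"])},
}
order = ["x1", "x2", "x3", "p", "q", "n"]
arrival, best_cut = map_for_delay(cuts, order, {"x1": 0, "x2": 0, "x3": 0})
```

Here node n is best implemented by the cut {x1, x2, x3}, which gives it arrival time 1 instead of 2.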
13
Basic Mapping: Area Recovery
- Performs three passes
  - Minimize area flow
  - Minimize exact area for best matches
  - Minimize area by phase assignment
- In each pass, for all nodes, in topological order
  - Consider matches with ArrivalTime <= RequiredTime
  - Among these matches, pick the one minimizing area (flow)
  - When areas (flows) are equal, use delay as a tie-breaker
[Figure: node with cuts c1, c2, c3, c4]
A_c2 < A_c3 < A_c1 < A_c4, so c2 is the best cut.
14
Basic Mapping: Area Flow
Definition:
- Area flow of a primary input is 0
- Area flow of a node in the network is
  AF(n) = [ Area(n) + Σ_i AF(fanin_i(n)) ] / NumFanouts(n)
[Figure: example annotated with area flows 0 at the primary inputs, 1/3, and (1 + 1/3) / 2 = 2/3]
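The area-flow recurrence can be evaluated in one topological pass. The sketch below uses exact fractions so the result can be compared with the slide; the two-node network (g with three fanouts, h with two) is an assumption chosen to reproduce the values 1/3 and 2/3, not a transcription of the figure.

```python
from fractions import Fraction


def area_flow(order, fanins, area, num_fanouts):
    """Compute AF(n) = (Area(n) + sum of fanin AFs) / NumFanouts(n)."""
    flow = {}
    for node in order:
        if node not in fanins:
            flow[node] = Fraction(0)  # primary inputs have zero area flow
            continue
        total = area[node] + sum(flow[f] for f in fanins[node])
        flow[node] = Fraction(total) / num_fanouts[node]
    return flow


# Hypothetical network: g = f(a, b) with 3 fanouts, h = f(g, c) with 2.
fanins = {"g": ["a", "b"], "h": ["g", "c"]}
area = {"g": 1, "h": 1}
num_fanouts = {"g": 3, "h": 2}
flow = area_flow(["a", "b", "c", "g", "h"], fanins, area, num_fanouts)
```

Node g gets area flow 1/3 and node h gets (1 + 1/3) / 2 = 2/3.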
15
Basic Mapping: Area of a Match
Definition. The area of a match is the sum of the areas of all gates in the maximum fanout-free cone (MFFC) of the root gate (the root gate and some of its fanins).
[Figure: match M1 in a network of gates g1 to g13]
A(M1) = A(g1) + A(g3) + A(g4) + A(g5) + A(g9)
16
Basic Mapping: Select Final Mapping
Extract the final mapping from the AIG after the best matches are assigned to each node:
- Select the best match for each primary output node
- Recursively, for each fanin of a selected match, select its best match
[Figure: network with inputs x1 to x5 and outputs z1 to z3]
17
Mapping for Sequential Circuits
- Represent netlist as an AND-INV graph (AIG)
- For each node, compute cuts (iteration over the circuit)
- For each node, compute l-values (iteration over the circuit)
- Map network for delay (iteration over the clock periods)
- Recover area using heuristics
- Select final mapping
Reference: P. Pan and C.-C. Lin, “A new retiming-based technology mapping algorithm for LUT-based FPGAs”, Proc. FPGA ’98.
18
l-Value: A Generalization of Combinational Delay
Definition. For each edge e: u → v in S, we assign an l-weight equal to −φ·d + δ(u→v), where
- φ is the clock period,
- d is the number of latches on the edge, and
- δ(u→v) is the combinational delay of pin u of node v.
Definition. The l-value of a node in S is the maximum weight of the paths from the PIs to the node using the l-weights.
Theorem. S can be retimed to a clock period φ iff the l-value of each PO is less than or equal to φ.
19
Example
[Figure: circuit with inputs i1, i2, nodes a, b, c, and output f]
- D = 1, φ = 1: infeasible. l(a) = 1, l(c) = 2, etc.
- D = 1, φ = 2: feasible. l(a) = 1, l(c) = 2; l(a) = 1, l(c) = 2, etc.
- D = 1, φ = 3: feasible. l(a) = 1, l(c) = 2; l(a) = 0, l(c) = 1, etc.
20
Computing Cuts
  for each non-PO node v in N
      L_v = {{v^0}};
  done = false;
  while ( done == false ) do
      done = true;
      for each node v (not PI or PO) in N do
          tmp = merge( L_u1, L_u2, …, L_ui );
          if ( tmp ⊈ L_v ) then
              L_v = tmp ∪ {{v^0}};
              done = false;
  return success;  // L_v settled to C_v for each v

  merge( C_u1, C_u2, …, C_ut ) = { c = c_1^d1 ∪ c_2^d2 ∪ … ∪ c_t^dt | c_i ∈ C_ui and |c| ≤ k }
  where c_i^di = { x^(d+di) | x^d ∈ c_i } and d_i is the number of latches on the edge from u_i to v.
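The iterative cut computation with latch offsets can be sketched in Python. Several things here are assumptions: a cut leaf is encoded as a pair (node, d) meaning the node seen through d latches, the three-node circuit is inferred from the cut lists of the example that follows, the PO-driving node c is treated like any other internal node, and `max_rounds` is a safety bound that the pseudocode does not have.

```python
from itertools import product


def shift(cut, d):
    """Move every leaf of a cut d latches further back."""
    return frozenset((node, depth + d) for node, depth in cut)


def merge(cut_sets, latches, k):
    """Pairwise-union one cut per fanin, keeping results with at most k leaves."""
    merged = set()
    for combo in product(*cut_sets):
        shifted = (shift(c, d) for c, d in zip(combo, latches))
        union = frozenset().union(*shifted)
        if len(union) <= k:
            merged.add(union)
    return merged


def sequential_cuts(fanins, pis, k, max_rounds=100):
    """Iterate cut merging until no node gains a new cut."""
    cuts = {v: {frozenset([(v, 0)])} for v in list(pis) + list(fanins)}
    for _ in range(max_rounds):
        done = True
        for v, edges in fanins.items():
            tmp = merge([cuts[u] for u, _ in edges], [d for _, d in edges], k)
            if not tmp <= cuts[v]:  # tmp contains a cut not seen before
                cuts[v] = tmp | {frozenset([(v, 0)])}
                done = False
        if done:
            return cuts
    raise RuntimeError("cut sets did not settle within max_rounds")


# Assumed circuit: each fanin is (driver, number of latches on the edge).
fanins = {
    "a": [("i1", 0), ("c", 1)],
    "b": [("i2", 0), ("c", 0)],
    "c": [("a", 0), ("b", 1)],
}
cuts = sequential_cuts(fanins, ["i1", "i2"], k=3)
```

For this circuit the loop settles after three rounds, and node c ends up with the cut {i1^0, c^1, i2^1}, which lets a single 3-input LUT implement it.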
21
Example
[Figure: circuit with inputs i1, i2 and nodes a, b, c]
New cuts found in each iteration (x^d denotes node x seen through d latches):
  0: i1: {i1^0};  i2: {i2^0};  a: {a^0};  b: {b^0};  c: {c^0}
  1: a: {i1^0, c^1};  b: {i2^0, c^0};  c: {a^0, b^1}, {a^0, i2^1, c^1}, {i1^0, c^1, b^1}, {i1^0, c^1, i2^1}
  2: a: {i1^0, a^1, b^2};  b: {i2^0, a^0, b^1}
22
Finding Minimum l-Values
  for each node v in N do
      if ( v is a PI ) l(v) = 0;
      else l(v) = −∞;
  done = false;
  while ( done == false ) do
      done = true;
      for each non-PI node v in N do
          tmp = min over cuts c of v ( max[ l(u) − φ·d + δ(u→v) | u^d ∈ c ] );
          if ( l(v) < tmp )
              l(v) = tmp; done = false;
          if ( v is a PO and l(v) > φ ) return failure;
  return success;  // bounds have settled
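The l-value iteration can be sketched as follows for a given clock period, written `phi` in the code. The assumptions are: unit delay for every cut, only non-trivial cuts are passed in, node c drives the PO, the circuit is the same three-node example inferred from the cut lists earlier, and `max_rounds` is an added safety bound for loops that never reach a PO.

```python
import math


def compute_l_values(cuts, pis, pos, phi, delay=1, max_rounds=1000):
    """Return the settled l-values, or None if phi is infeasible.

    cuts maps each non-PI node to its non-trivial cuts; a cut is a list
    of (leaf, d) pairs, d being the number of latches to that leaf.
    """
    lval = {v: 0 for v in pis}
    lval.update({v: -math.inf for v in cuts})
    for _ in range(max_rounds):
        done = True
        for v, v_cuts in cuts.items():
            tmp = min(
                max(lval[u] - phi * d + delay for u, d in cut)
                for cut in v_cuts
            )
            if lval[v] < tmp:
                lval[v] = tmp
                done = False
            if v in pos and lval[v] > phi:
                return None  # a PO misses the target clock period
        if done:
            return lval
    return None  # assumed: treat non-convergence as failure


PIS, POS = ["i1", "i2"], {"c"}
# Retiming only: each node is implemented by its direct fanins.
FANIN_CUTS = {
    "a": [[("i1", 0), ("c", 1)]],
    "b": [[("i2", 0), ("c", 0)]],
    "c": [[("a", 0), ("b", 1)]],
}
# Mapping with 3-input LUTs: all non-trivial 3-feasible cuts.
MAPPED_CUTS = {
    "a": FANIN_CUTS["a"] + [[("i1", 0), ("a", 1), ("b", 2)]],
    "b": FANIN_CUTS["b"] + [[("i2", 0), ("a", 0), ("b", 1)]],
    "c": FANIN_CUTS["c"] + [
        [("a", 0), ("i2", 1), ("c", 1)],
        [("i1", 0), ("c", 1), ("b", 1)],
        [("i1", 0), ("c", 1), ("i2", 1)],
    ],
}

retime_only_1 = compute_l_values(FANIN_CUTS, PIS, POS, phi=1)
retime_only_2 = compute_l_values(FANIN_CUTS, PIS, POS, phi=2)
mapped_1 = compute_l_values(MAPPED_CUTS, PIS, POS, phi=1)
```

Under these assumptions retiming alone fails at a clock period of 1 and succeeds at 2, while allowing the larger cuts makes a clock period of 1 feasible.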
23
Constructing Mapping Solution
  U = the set of POs;
  S = { v | v is a PI or PO };
  while ( U ≠ ∅ ) do
      v = any node in U; U = U − {v};
      for each non-trivial cut c ∈ C_v do
          if ( l_opt(v) == max[ l_opt(u) − φ·d + δ(u→v) | u^d ∈ c ] )
              c_best = c;
      for each u^d ∈ c_best do
          if ( u is not in S )
              S = S ∪ {u}; U = U ∪ {u};
          create an edge in S from u to v with d FFs;
  return S;
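The solution-construction step can be sketched in Python as a worklist that starts from the POs. The l-values and cuts below are hard-coded assumptions for a three-node example circuit with clock period 1 (`phi` in the code) and unit cut delay; the function name `construct_mapping`, the choice of the first cut that meets the l-value, and the error raised when none does are illustrative choices as well.

```python
def construct_mapping(cuts, l_opt, pis, pos, phi, delay=1):
    """Select one cut per needed node; return (selected nodes, edges).

    Each edge is (u, v, d): u drives the LUT of v through d flip-flops.
    """
    selected = set(pis) | set(pos)
    work = [v for v in pos if v not in pis]
    edges = []
    while work:
        v = work.pop()
        best = None
        for cut in cuts[v]:
            value = max(l_opt[u] - phi * d + delay for u, d in cut)
            if value == l_opt[v]:
                best = cut  # this cut achieves the optimal l-value
                break
        if best is None:
            raise ValueError(f"no cut of {v} achieves its l-value")
        for u, d in best:
            if u not in selected:
                selected.add(u)
                work.append(u)
            edges.append((u, v, d))
    return selected, edges


# Assumed inputs: non-trivial cuts of the PO node c and settled l-values.
cuts = {
    "c": [
        [("a", 0), ("b", 1)],
        [("a", 0), ("i2", 1), ("c", 1)],
        [("i1", 0), ("c", 1), ("b", 1)],
        [("i1", 0), ("c", 1), ("i2", 1)],
    ],
}
l_opt = {"i1": 0, "i2": 0, "a": 1, "b": 2, "c": 1}
selected, edges = construct_mapping(cuts, l_opt, ["i1", "i2"], ["c"], phi=1)
```

Only the cut {i1^0, c^1, i2^1} meets l_opt(c) = 1, so the result is a single LUT for c fed by i1 directly and by c and i2 through one flip-flop each.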
24
Performing Final Retiming
Retime each node v with the following retiming lag:
[Equation image not recovered]
where
- l_opt(v) is the optimal retiming value and
- φ is the selected clock period