1
Delay-Optimal Technology Mapping by DAG Covering
Yuji Kukimoto, Robert K. Brayton, Prashant Sawkar
Presented by Bret Victor, 4/5/00
2
Abstract
An algorithm for minimal-delay library-based technology mapping
Subject graphs can be mapped directly as DAGs, without decomposition into trees
Algorithm is polynomial time
Experiments show that it works
3
Outline
Review of standard library-based tech mapping
FlowMap algorithm for FPGA tech mapping
Application of FlowMap to library-based designs
Extensions to the algorithm
Experimental results
4
Algorithmic tech mapping
Decompose everything into NAND2s and inverters
Decomposed circuit is the “subject graph”
Decomposed library gates are the “pattern graphs”
Cover subject graph using pattern graphs
Try to optimize something
5
Tech mapping for area
If subject graph is a DAG, optimizing for minimum area is NP-hard
But if subject and pattern graphs are trees, can cover optimally in linear time
So, decompose DAG into trees
Cover each tree
Glue the results together
6
Tree decomposition
Trees can’t have multiple-fanout nodes!
Snip DAG at multiple-fanout nodes to form trees
[Figure: two alternative ways of snipping a DAG into trees]
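The snipping step can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the subject graph is a dict `fanins` mapping each gate to its fanin names, primary inputs are names that never appear as keys, and `tree_roots` is a hypothetical helper name, not something from the paper or SIS.

```python
def tree_roots(fanins, primary_outputs):
    """Return the gates that become tree roots when the DAG is snipped.

    fanins: dict gate -> list of fanin names (assumed representation).
    A gate is a root if it is a primary output or feeds more than one gate.
    """
    fanout_count = {}
    for ins in fanins.values():
        for src in ins:
            fanout_count[src] = fanout_count.get(src, 0) + 1
    # Only gates are snipped; primary inputs are not keys of `fanins`.
    multi_fanout = {n for n, c in fanout_count.items() if c > 1 and n in fanins}
    return sorted(multi_fanout | set(primary_outputs))


# Hypothetical example: g1 feeds both g2 and g3, so it is a snip point.
example = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g1", "d"],
    "out": ["g2", "g3"],
}
roots = tree_roots(example, ["out"])
```

Each returned root heads one tree; the trees are then covered separately and glued back together at the snip points.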
7
Tech mapping for delay
There is also an algorithm to optimize delay while tree mapping, in linear time
Further work included loading effects and buffer trees
But what about directly mapping DAGs for minimum delay?
8
FPGA mapping
Fundamental node in FPGAs is the lookup table (LUT)
LUT implements any function of up to k inputs (k depends on the FPGA technology)
FPGA mapping to minimize area is NP-hard for k > 3
FPGA mapping to minimize delay can be solved in polynomial time using the FlowMap algorithm
9
FlowMap algorithm
Maps circuit directly as DAG (no tree decomposition)
Two steps:
– Labeling
  Visit nodes in topological order (input to output)
  Clump nodes into LUT such that delay is minimized
  Label node with best clump and best delay
– Clumping
  Visit nodes in reverse topological order (output to input)
  Create LUT for each clump, and the clump’s fanins
10
FlowMap: Labeling
Primary inputs are labeled “0” (because they are available at time t = 0)
At each intermediate node, investigate all cuts of size at most k and find the one that gives the best delay (smallest depth)
Label node with best cut and its depth
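The labeling rule can be sketched as follows. Note the swap: FlowMap itself finds the best cut with a network-flow computation, whereas this sketch simply enumerates every cut of size at most k by merging fanin cuts, which is easier to read but not the same procedure. The graph representation and the name `label_nodes` are assumptions made for illustration, and each gate is assumed to have at most k fanins.

```python
from itertools import product


def label_nodes(fanins, order, k):
    """Depth labeling by exhaustive cut enumeration (illustrative only).

    fanins: dict gate -> list of fanin names; primary inputs have no entry.
    order:  all nodes in topological order, inputs first.
    Returns (label, best) where best[n] is the chosen cut of gate n.
    """
    label, cuts, best = {}, {}, {}
    for n in order:
        ins = fanins.get(n, [])
        if not ins:                      # primary input: available at t = 0
            label[n] = 0
            cuts[n] = {frozenset([n])}
            continue
        # Merge one cut per fanin; keep the merges with at most k nodes.
        merged = set()
        for combo in product(*(cuts[i] for i in ins)):
            cut = frozenset().union(*combo)
            if len(cut) <= k:
                merged.add(cut)
        # Best cut: smallest maximum label (ties broken by node names).
        best[n] = min(merged, key=lambda c: (max(label[x] for x in c), sorted(c)))
        label[n] = 1 + max(label[x] for x in best[n])
        cuts[n] = merged | {frozenset([n])}
    return label, best


# Hypothetical chain of 2-input gates, mapped with k = 3.
g = {"n1": ["a", "b"], "n2": ["n1", "c"], "n3": ["n2", "d"]}
label, best = label_nodes(g, ["a", "b", "c", "d", "n1", "n2", "n3"], k=3)
```

In this small example `n2` gets depth 1 because the cut {a, b, c} fits in one 3-input LUT, while `n3` needs a second level.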
11
FlowMap
[Figure: example circuit with primary inputs a through h]
12
FlowMap
[Figure: example circuit with primary inputs a through h and nodes numbered 1 through 10]
Topological ordering of nodes
13
FlowMap
[Figure: example circuit with each node labeled with its optimum depth]
Labeling optimum depths, using k = 3
14
FlowMap
[Figure: example circuit with a cut drawn at node 9]
The shown cut has a maximum fanin depth of 1. So node 9 gets labeled with “2”.
15
FlowMap: Clumping
Start at a primary output node and form a LUT to implement the cut that the node was labeled with
Move to the cut’s fanin nodes, and use their labels to form LUTs
Continue until you are down to primary inputs
Repeat for each primary output
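A minimal sketch of the clumping pass, assuming the labeling step has already stored a best cut for every gate in a dict `best_cut` (a hypothetical representation; primary inputs have no entry). The helper name `build_luts` is likewise an assumption.

```python
def build_luts(best_cut, primary_outputs):
    """Walk back from the outputs, creating one LUT per visited gate.

    best_cut: dict gate -> set of nodes forming its chosen cut.
    Returns dict LUT root -> sorted list of LUT inputs.
    """
    luts = {}
    pending = list(primary_outputs)
    while pending:
        n = pending.pop()
        if n in luts or n not in best_cut:   # already built, or primary input
            continue
        luts[n] = sorted(best_cut[n])
        pending.extend(best_cut[n])          # the cut nodes need LUTs too
    return luts


# Hypothetical labels: n2 is never a cut node, so it gets no LUT of its own.
cuts = {
    "n1": {"a", "b"},
    "n2": {"a", "b", "c"},
    "n3": {"n1", "c", "d"},
}
luts = build_luts(cuts, ["n3"])
```

Gates that sit strictly inside a chosen cut are absorbed into that LUT. When two outputs absorb the same gate, it ends up implemented in both LUTs, which is the duplication the example slides point out.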
16
FlowMap
[Figure: labeled example circuit with a LUT drawn at node 10]
Start at last node (node 10) and turn 3-input cut into 3-input LUT
17
FlowMap
[Figure: labeled example circuit with a second LUT added]
Make another LUT for its fanin gate
18
FlowMap
[Figure: labeled example circuit with a LUT drawn at node 9]
Next primary output is node 9
19
FlowMap
[Figure: labeled example circuit with all LUTs drawn]
And form the LUTs for its fanins. Note that nodes 6, 7, and 8 have been duplicated!
20
Back to library-based design
Ideas behind FlowMap’s labeling procedure can be used for library-based minimum-delay DAG mapping too
Two main changes to FlowMap:
– instead of k-input cuts, look at the library patterns that match at the node
– pin-to-pin delays of library patterns must be used instead of the unit delay assumed for an FPGA LUT
21
Pattern matching
Three types of pattern matches:
– Standard match: one-to-one mapping of pattern graph nodes into subject graph nodes
– Exact match: standard match, except a subject node covered by an intermediate pattern node cannot fanout to other nodes not covered by the pattern
– Extended match: standard match, except not necessarily one-to-one
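The extra condition that separates an exact match from a standard match can be checked directly. The sketch below assumes a match is described by the set of subject nodes it covers plus its root, and that `fanouts` maps each subject node to the nodes it drives; these representations and the name `is_exact_match` are illustrative assumptions.

```python
def is_exact_match(covered, root, fanouts):
    """True if no covered node other than the root drives an uncovered node.

    covered: set of subject nodes covered by the pattern (root included).
    fanouts: dict subject node -> list of nodes it feeds.
    """
    return all(
        set(fanouts.get(n, ())) <= covered
        for n in covered
        if n != root
    )


# Hypothetical subject graph: x feeds the match root r and also node y.
fanouts = {"x": ["r", "y"], "r": []}
standard_only = is_exact_match({"x", "r"}, "r", fanouts)   # x escapes to y

# Same match in a graph where x feeds only r.
exact = is_exact_match({"x", "r"}, "r", {"x": ["r"], "r": []})
```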
22
Pattern matching
[Figure: subject graph]
23
Pattern matching
[Figure: subject graph with node x marked, and pattern graph]
This is a standard match, but not an exact match, because node x fans out to a node not covered by the pattern.
24
Pattern matching
[Figure: subject graph with node x marked, and pattern graph with nodes a and b marked]
This is an extended match, because pattern nodes a and b can both be mapped to node x in the subject graph.
25
Pattern matching
Conventional tree mapping requires exact matches
FlowMap-based DAG mapping can use either standard or extended matches
26
DAG library mapping
Like FlowMap, step one is labeling
Visit nodes in topological order
For each node, find best library gate to implement that node (gate that minimizes total delay to the node)
Label node with best gate and best delay information
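The labeling pass for library gates can be sketched like this, assuming the pattern matcher has already produced, for every gate node, a list of candidate matches. Each candidate is written as `(gate_name, [(input_node, pin_delay), ...])`; that format, the delays, and the name `label_arrivals` are assumptions for illustration and not the SIS data structures.

```python
def label_arrivals(order, matches):
    """Label each node with its best gate and earliest arrival time.

    order:   all nodes in topological order, primary inputs first.
    matches: dict node -> list of (gate, [(input_node, pin_delay), ...]).
             Primary inputs have no entry and arrive at time 0.
    """
    arrival, best = {}, {}
    for n in order:
        if n not in matches:
            arrival[n] = 0
            continue
        options = []
        for gate, pins in matches[n]:
            # Arrival through this gate: slowest input plus its pin delay.
            t = max(arrival[src] + d for src, d in pins)
            options.append((t, gate, pins))
        t, gate, pins = min(options, key=lambda o: (o[0], o[1]))
        arrival[n] = t
        best[n] = (gate, [src for src, _ in pins])
    return arrival, best


# Hypothetical matches: n2 can be built from X on top of n1 (3 + 3 = 6),
# or from a single 3-input W reaching straight back to the inputs (4).
m = {
    "n1": [("X", [("a", 3), ("b", 3)])],
    "n2": [("X", [("n1", 3), ("c", 3)]),
           ("W", [("a", 4), ("b", 4), ("c", 4)])],
}
arrival, best = label_arrivals(["a", "b", "c", "n1", "n2"], m)
```

Here the larger gate wins at `n2` even though it is slower than X in isolation, because it removes a level of logic.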
27
DAG library mapping
[Figure: example circuit with primary inputs a through h and nodes numbered 1 through 10]
Library (name and delay): W 4, X 3, Y 6, Z 7
28
DAG library mapping
[Figure: example circuit with nodes annotated with candidate gates and arrival times]
Library (name and delay): W 4, X 3, Y 6, Z 7
29
DAG library mapping
[Figure: example circuit with nodes annotated with gates and arrival times]
Library (name and delay): W 4, X 3, Y 6, Z 7
30
DAG library mapping
Step two is committing nodes to library gates
Choose a primary output, and implement the gate that it was labeled with
Go to the nodes that are the fanins of that gate, and implement their labeled gates
Continue until down to primary inputs
Repeat for each primary output
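A sketch of the commit pass that also reports which subject nodes end up covered by more than one gate. It assumes each label records the chosen gate, the gate's input nodes, and the set of subject nodes the match covers; this label format and the name `commit` are hypothetical.

```python
from collections import Counter


def commit(best, primary_outputs):
    """Instantiate labeled gates from the outputs back to the inputs.

    best: dict node -> (gate, input_nodes, covered_nodes).
          Primary inputs have no entry.
    Returns (chosen, duplicated): the gate placed at each visited node,
    and the subject nodes covered by more than one placed gate.
    """
    chosen, cover_count = {}, Counter()
    stack = list(primary_outputs)
    while stack:
        n = stack.pop()
        if n in chosen or n not in best:     # already placed, or primary input
            continue
        gate, inputs, covered = best[n]
        chosen[n] = gate
        cover_count.update(covered)
        stack.extend(inputs)                 # gate inputs must be implemented
    duplicated = sorted(x for x, c in cover_count.items() if c > 1)
    return chosen, duplicated


# Hypothetical labels: both outputs use a gate that swallows shared node s.
labels = {
    "s":  ("X", ["b", "c"], {"s"}),
    "o1": ("Y", ["a", "b", "c"], {"o1", "s"}),
    "o2": ("Y", ["b", "c", "d"], {"o2", "s"}),
}
chosen, duplicated = commit(labels, ["o1", "o2"])
```

In this example `s` is never placed as a gate of its own; its logic is absorbed into both Y gates.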
31
DAG library mapping
[Figure: labeled example circuit, commit step in progress]
Library (name and delay): W 4, X 3, Y 6, Z 7
32
DAG library mapping
[Figure: labeled example circuit, commit step in progress]
Library (name and delay): W 4, X 3, Y 6, Z 7
33
DAG library mapping
[Figure: labeled example circuit, commit step in progress]
Library (name and delay): W 4, X 3, Y 6, Z 7
34
DAG library mapping
[Figure: labeled example circuit with the final mapping drawn using library gates W, X, Y, and Z]
Library (name and delay): W 4, X 3, Y 6, Z 7
Note that node 8 is duplicated!
35
Complexity
Finding all matches at a given node is O(p), where p is the number of nodes in the pattern graphs
In step one, this is done once for every subject node, so step one is O(sp), where s is the number of subject nodes
Step two only visits subject nodes once: O(s)
Algorithm is O(sp)
p is a constant defined by the library, so algorithm is linear with respect to number of subject nodes
36
Comparison: DAG and tree mapping
Subject graphs with multiple-fanout nodes must be snipped before tree mapping and reglued afterward. Thus, multiple-fanout points in the subject graph are completely preserved in the final mapping.
DAG mapping does not mind multiple-fanout nodes, and can map across them in order to optimize the delay.
37
Comparison: DAG and tree mapping
In tree mapping, pattern nodes and subject nodes match one-to-one. No duplication is allowed.
In DAG mapping, subject nodes may be duplicated in order to take advantage of fancy library patterns and minimize delay.
38
Extensions
Lehman-Watanabe mapping (mapping graphs, choice nodes, etc.) is compatible with this DAG mapping technique and can be used in conjunction with it
A FlowMap-like method developed for finding the minimum cycle time for an FPGA design can be easily adapted for use on library-based designs
39
Experiment
SIS’s technology mapper modified to do delay-optimal mapping using DAG covering
Standard matches (not extended matches) were used
40
Experiment
Results with a standard-sized library (lib2.genlib)
41
Experiment
Results with a tiny library (44-1.genlib) with 7 gates
42
Experiment
Results with a big “rich” library (44-3.genlib) with 625 gates
43
Conclusion
Delay-optimal library-based technology mapping of DAGs can be solved in linear time
Experiments confirm that using DAGs instead of trees gives a significant performance improvement