Published by Filip Christoffersen. Modified over 5 years ago.
1
Reinventing The Wheel: Developing a New Standard-Cell Synthesis Flow
Alan Mishchenko, Niklas Een, Hamid Savoj, Robert Brayton
University of California, Berkeley
2
Outline
- Motivation
- The flow
  - Technology-independent synthesis
  - Technology mapping
  - Buffering
  - Sizing
- Experimental results
- Conclusion
3
Motivation
Synthesis tools are out there, but they are
- slow
- suboptimal
- complicated
- expensive
4
ABC
- It is a public-domain tool developed by our research group since 2005
- It addresses both synthesis and verification of synchronous hardware
- It is based on years of experience in developing efficient data structures and algorithms
- It is used in industry and academia
- For more information, visit
5
The Flow
- Technology-independent synthesis
- Technology mapping
- Buffering
- Sizing
These steps are not disconnected; they overlap
- Synthesis talks to mapping through structural choices
- Mapping talks to buffering through fanout estimations
- Buffering and sizing can be interleaved
6
Synthesis: Old and New

Old: "AIG rewriting"
- Delay/area costs: AND2 levels/nodes
- Restructuring: for all 4-input cuts, try all AIG subgraphs and choose the one with the fewest nodes under the delay constraint
- Results: acceptable quality, acceptable runtime
- Problems: "over-restructuring"; slow for large, deep logic

New: "AIG reshaping"
- Delay/area cost: user-specified cost for n-input AND/XOR/MUX/MAJ
- Restructuring: iterate "mapping" and "unmapping" several times
- Results: comparable quality, 3-10x faster
- Problems: none so far
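The old restructuring step visits every 4-input cut of each node. As a rough illustration of what "all 4-input cuts" means, here is a minimal sketch of k-feasible cut enumeration on a toy AIG. The dict-based graph format and the function name are assumptions made for this example, not ABC's data structures; complemented edges and dominated-cut filtering are left out.

```python
from itertools import product

def enumerate_cuts(aig, order, k=4):
    """Return {node: set of frozenset(leaves)} for every node.

    aig   -- hypothetical toy format: AND node id -> (fanin0, fanin1);
             ids missing from the dict are primary inputs
    order -- node ids in topological order (fanins before fanouts)
    """
    cuts = {}
    for n in order:
        trivial = frozenset([n])
        if n not in aig:                 # primary input: only the trivial cut
            cuts[n] = {trivial}
            continue
        a, b = aig[n]
        merged = {trivial}
        # Merge every cut of one fanin with every cut of the other fanin.
        for ca, cb in product(cuts[a], cuts[b]):
            c = ca | cb
            if len(c) <= k:              # keep only k-feasible cuts
                merged.add(c)
        cuts[n] = merged
    return cuts

# Toy AIG: inputs 1..4, n5 = AND(1, 2), n6 = AND(3, 4), n7 = AND(5, 6)
aig = {5: (1, 2), 6: (3, 4), 7: (5, 6)}
cuts = enumerate_cuts(aig, order=[1, 2, 3, 4, 5, 6, 7])
```

Each cut of node 7 is a candidate region whose logic a rewriting step could replace with a smaller subgraph.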
7
Mapping: Old and New

Old: "Traditional" cut-based mapping
- Iterate over the subject graph
  - re-compute priority cuts
  - use structural or functional matching (ICCAD'97)
- For standard-cell mapping
  - use a gain-based library
  - map both phases (positive and negative) of each node into gates
  - select best cuts (gates)
- Results: acceptable quality, tolerable runtime

New: "Improved" cut-based mapping
- Pre-compute priority cuts
- Iterate over the subject graph
  - evaluate cuts using different costs
  - use structural or functional matching
- For standard-cell mapping
  - use a gain-based library
  - map into NPN classes of functions from the library
  - select best cuts (NPN classes)
  - perform phase assignment and determine gates during buffering
- Results: quality not known yet; runtime is expected to be 3-10x faster
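Mapping into NPN classes means treating functions as equivalent when they differ only by input negation, input permutation, and output negation. The sketch below computes a canonical representative by exhaustive search; it is an illustration for small input counts, with the truth-table encoding and function name chosen for this example rather than taken from ABC.

```python
from itertools import permutations

def npn_canon(tt, n):
    """Smallest truth table (as an int) in the NPN class of `tt`.

    tt -- truth table of an n-input function; bit m is the output for
          input minterm m (input i is bit i of m)
    Exhaustive search, practical only for small n.
    """
    size = 1 << n
    full = (1 << size) - 1
    best = None
    for perm in permutations(range(n)):
        for neg in range(size):          # mask of complemented inputs
            t = 0
            for m in range(size):
                x = m ^ neg              # complement the selected inputs
                src = 0
                for i in range(n):       # route bit i of x to position perm[i]
                    if (x >> i) & 1:
                        src |= 1 << perm[i]
                if (tt >> src) & 1:
                    t |= 1 << m
            for cand in (t, t ^ full):   # both output phases
                if best is None or cand < best:
                    best = cand
    return best

AND2, OR2, NAND2, XOR2 = 0b1000, 0b1110, 0b0111, 0b0110
same_class = npn_canon(AND2, 2) == npn_canon(OR2, 2) == npn_canon(NAND2, 2)
```

AND2, OR2, and NAND2 fall into one class while XOR2 lands in another, which is why a mapper working on classes can postpone the choice of phase and concrete gate.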
8
Buffering: Old and New

Old: several ideas tried, none is a clear winner
- Enumerating buffer tree topologies
- Buffering for near-continuous libraries
- Other incremental local fanout optimization methods

New: several ideas tried, none is a clear winner
- "Technology-independent" buffering after the gain-based library
- Buffer-tree construction given required times and loads of the fanouts
- Incremental buffering interleaved with incremental sizing
- Results are mixed
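To make "buffer-tree construction given required times" concrete, here is one simple heuristic, sketched under stated assumptions: the driver feeds the most critical fanouts directly and pushes the rest behind a buffer, recursively. This is not the algorithm used in the flow; the fanout limit, the sink format, and the function name are hypothetical, and fanout loads are ignored.

```python
def critical_first_tree(sinks, max_fanout=4):
    """Build a nested-list buffer tree from (name, required_time) sinks.

    The outer list is the driver; each nested list is a buffer.
    Every driver or buffer feeds at most max_fanout children (max_fanout >= 2).
    """
    # Smallest required time = most critical = closest to the root.
    ordered = sorted(sinks, key=lambda s: s[1])

    def build(rest):
        if len(rest) <= max_fanout:
            return [name for name, _ in rest]
        head, tail = rest[:max_fanout - 1], rest[max_fanout - 1:]
        # Critical sinks are driven directly; one buffer takes the rest.
        return [name for name, _ in head] + [build(tail)]

    return build(ordered)

sinks = [("a", 1), ("b", 5), ("c", 2), ("d", 9), ("e", 7), ("f", 3)]
tree = critical_first_tree(sinks, max_fanout=3)
```

Sinks with late required times end up deeper in the tree, where the extra buffer delay is easier to tolerate.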
9
Incremental Buffering Illustrated
[Figure: two panels, "Growing" and "Bypassing"]
10
Sizing: Old and New

Old
- Non-linear programming
- Linear programming
- Lagrangian multipliers

New: incremental sizing
- Find critical region
- Find best gates to resize
- Perform the resizing
- Incrementally update timing
- Iterate until no improvement
- Can be combined with incremental buffering
- Results: reasonable; surprisingly fast; if an optimum solution is known, seems to converge to it
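The incremental sizing loop can be sketched on a toy example. The code below assumes a single gate chain (so the critical region is the whole chain), a made-up linear delay model, and a full timing recomputation in place of an incremental update. All names and constants are illustrative, not taken from ABC.

```python
def chain_delay(sizes, out_load=8.0, in_cap=1.0, tau=1.0):
    """Delay of a gate chain under a toy linear model (an assumption).

    Gate i drives the input capacitance of gate i+1 (in_cap * size);
    the last gate drives out_load. Gate delay = tau * (1 + load / size).
    """
    total = 0.0
    for i, s in enumerate(sizes):
        load = in_cap * sizes[i + 1] if i + 1 < len(sizes) else out_load
        total += tau * (1.0 + load / s)
    return total

def incremental_sizing(sizes, choices=(1, 2, 4, 8), **model):
    """Greedily resize one gate at a time until delay stops improving."""
    sizes = list(sizes)
    best = chain_delay(sizes, **model)
    while True:
        move = None
        # Gate 0 stays fixed (assumption: its input load is constrained).
        for i in range(1, len(sizes)):
            for s in choices:
                if s == sizes[i]:
                    continue
                trial = sizes[:i] + [s] + sizes[i + 1:]
                d = chain_delay(trial, **model)   # full recompute, not incremental
                if d < best - 1e-12:
                    best, move = d, (i, s)
        if move is None:                          # no improving resize left
            return sizes, best
        sizes[move[0]] = move[1]                  # commit the best single resize

sizes, best = incremental_sizing([1, 1, 1])
```

A loop that commits one resize at a time stops at the first point where no single move helps, so on some inputs it can settle in a local minimum.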
11
Commands of The Flow
read_lib, write_lib, print_lib, read_scl, write_scl, dump_genlib, print_gs, stime, buffer, unbuffer, minsize, maxsize, upsize, dnsize, print_buf, read_constr, print_constr, reset_constr
12
Experimental Setting
- 19 OpenCore designs were synthesized and mapped by an industrial tool using public library vsclib013.lib from
- Delay, area, and runtime were collected and used as a reference
- Sizing was tested by applying min-sizing, followed by re-sizing
- Buffering was tested by un-buffering and min-sizing, followed by re-buffering and re-sizing
- The flow was tested by restructuring the design, followed by mapping, buffering, and sizing
13
Experimental Results
14
Comments on The Table
- Column "Gate" shows the number of gates produced by the industrial tool
- Other columns "Gate" show the percentage change in the number of gates after each transform, compared to the result produced by the industrial tool. Positive is improvement; negative is degradation.
- Similarly, columns "Area" and "Delay" show the percentage change in area and delay, respectively
- The flows are tuned differently. This is why the area increase after buffering/sizing is more than after synthesis/buffering/sizing.
- Runtimes are in seconds on an old desktop computer. On a new computer, the runtimes are expected to be 2x smaller.
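The sign convention for the percentage columns can be written out as a small helper. This is a sketch that assumes smaller gate count, area, and delay are better; the function name and the numbers are hypothetical, not values from the table.

```python
def percent_change(reference, result):
    """Percentage change relative to the reference.

    Positive means improvement (the result is smaller than the reference);
    negative means degradation.
    """
    return 100.0 * (reference - result) / reference

improved = percent_change(200, 190)   # hypothetical: 200 gates reduced to 190
degraded = percent_change(100, 110)   # hypothetical: area grew from 100 to 110
```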
15
Potential Issues
- Not specifying input driving cells and output loads
  - This was addressed and experiments show it is fine
- Over-tuning for one particular library
  - Not sure heuristics will hold for submicron libraries
- Not looking at power
  - Not taking high and low Vt cells into account
- Not mapping into multi-output cells
- Not mapping sequential elements
- Not considering multiple clock domains
16
Conclusion
- A new synthesis flow is being developed and implemented in ABC
- An opportunity to
  - rethink some of the classical problems
  - improve on some of the known solutions
  - come up with a new public implementation
- Results are encouraging
  - delay (in delay-oriented synthesis) is within 5-15% of the industrial tool
  - area (in area-oriented synthesis) is within 1-3% of the industrial tool
  - runtime is about 20-50x better
17
Abstract
This presentation focuses on adding new capabilities for synthesizing standard-cell designs to the public-domain synthesis/verification tool ABC. An optimization flow has been developed, which includes gain-based technology mapping, fanout optimization by buffering and gate duplication, and gate sizing. Novel heuristic algorithms have been proposed for several well-known optimization steps. For example, buffer-tree construction can be performed not as a separate step, but concurrently with gate sizing by reshaping initial well-balanced buffer trees. Each tree-reshaping and each gate-resizing transform is evaluated for delay/area improvement using a common cost function, and the most promising one is selected. The delay is measured by a lookup-table-based delay model, which computes the delay of a gate from its input slew and output capacitance. Experiments show that the flow produces results that are within 10% of those of industrial tools while running 20x faster.
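A lookup-table delay model of the kind the abstract describes is commonly evaluated by bilinear interpolation over input slew and output load. The sketch below assumes a two-axis table with made-up values; the function name and the table are illustrative, not the model code from ABC or data from vsclib013.lib.

```python
from bisect import bisect_right

def lut_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a 2-D delay table.

    slews, loads -- increasing axis values (hypothetical numbers below)
    table[i][j]  -- delay at slews[i], loads[j]
    Queries outside the axes are extrapolated from the nearest segment.
    """
    def segment(axis, x):
        # Index i such that axis[i], axis[i + 1] bracket x (clamped to the ends).
        i = bisect_right(axis, x) - 1
        return max(0, min(i, len(axis) - 2))

    i, j = segment(slews, slew), segment(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    # Interpolate along the load axis first, then along the slew axis.
    d0 = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    d1 = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return d0 * (1 - ts) + d1 * ts

# Made-up 2x2 table: rows follow slew, columns follow load.
slews = [0.0, 1.0]
loads = [0.0, 2.0]
table = [[1.0, 3.0],
         [2.0, 4.0]]
mid_delay = lut_delay(slews, loads, table, slew=0.5, load=1.0)
```

Because both tree reshaping and gate resizing change the load seen by a driver, a shared delay model like this is what allows the two transforms to be compared with one cost function.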