Standard-Cell Mapping Revisited Alan Mishchenko Robert Brayton (with special thanks to Satrajit Chatterjee and Niklas Een) Department of EECS UC Berkeley
Overview. Introduction: why mapping is a fundamental problem; history of technology mapping in ABC; motivation to continue working on mapping. Technical part: how one new idea led to three new mappers in one year; several other ideas; our most recent work. Wrap-up: preliminary experimental results; conclusions and future work.
Pros and Cons of Load-Independent Delay Model. Approximations are inevitable. An approximation is “proper” when it allows us to simplify a problem without missing essential points. In our experience, the load-independent delay model is a “proper” approximation: it simplifies mappers, allows them to scale, and leads into later stages where more accurate models are used. The gain-based approach, itself an approximation, enables this delay model. From now on, we use the load-independent model.
Delay Optimality. Computing the best arrival time at each node in topological order, from inputs to outputs, ensures that the earliest possible arrival time at the outputs is found.
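The forward delay pass can be sketched as follows. This is a minimal illustration, not the ABC code: the `matches` dictionary, the `(leaves, delay)` tuple layout, and the integer delays are assumptions made for the example, and primary inputs are assumed to arrive at time 0.

```python
from graphlib import TopologicalSorter

def best_arrival_times(matches):
    """matches: node -> list of (leaves, delay) candidates (hypothetical layout).
    Primary inputs map to an empty list and are assumed to arrive at time 0."""
    # A node depends on every leaf of every candidate match.
    deps = {n: {leaf for leaves, _ in cands for leaf in leaves}
            for n, cands in matches.items()}
    arrival, best = {}, {}
    for node in TopologicalSorter(deps).static_order():  # leaves come first
        cands = matches.get(node, [])
        if not cands:
            arrival[node] = 0
            continue
        # Load-independent model: arrival = latest leaf arrival + gate delay.
        def arr(match):
            return max(arrival[leaf] for leaf in match[0]) + match[1]
        best[node] = min(cands, key=arr)
        arrival[node] = arr(best[node])
    return arrival, best

example = {
    "a": [], "b": [], "c": [],
    "x": [(("a", "b"), 2)],
    "y": [(("x", "c"), 2), (("a", "b", "c"), 3)],
}
arrival, best = best_arrival_times(example)
```

Because every leaf is finalized before the node that uses it, picking the minimum at each node is enough to reach the minimum at the outputs under this delay model.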
Area Recovery. Two complementary heuristics are traditionally used for area recovery. Global-view heuristic (area flow): combines the area of a cone and the fanout count by computing an “average” area per fanout. Local-view heuristic (exact area): provides a detailed view of each gate and allows the mapping to be locally optimized. (These are somewhat similar to global and detailed placement.)
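The area-flow heuristic can be illustrated with a commonly used formulation, in which the flow of a node is the gate area plus the flows of its leaves, divided by the node's fanout count. The sketch below assumes that formulation; the data layout (`chosen`, `fanout`) and the convention that primary inputs contribute zero are assumptions of the example, not details taken from the slides.

```python
def area_flows(chosen, fanout):
    """chosen: node -> (leaves, gate_area) for the match selected at each node.
    fanout: node -> number of fanouts. Nodes missing from `chosen` are treated
    as primary inputs with zero area flow (an assumption of this sketch)."""
    flow = {}

    def af(node):
        if node not in chosen:
            return 0.0
        if node not in flow:
            leaves, gate_area = chosen[node]
            total = gate_area + sum(af(leaf) for leaf in leaves)
            # Share the cone's area among the fanouts of this node.
            flow[node] = total / max(1, fanout.get(node, 1))
        return flow[node]

    for node in chosen:
        af(node)
    return flow

chosen = {
    "x": (("a", "b"), 2.0),
    "y": (("x", "c"), 3.0),
    "z": (("x", "b"), 1.0),
}
flows = area_flows(chosen, {"x": 2, "y": 1, "z": 1})
```

In this small example the shared gate `x` charges half of its area to each of `y` and `z`, so the flows at the two outputs add up to the total gate area of 6.0.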
Mapper Pseudo-Code. Pre-compute functions implementable using the library; currently, we only look at single gates (no “super-gates”). Enumerate cuts for the subject graph; in practice, we enumerate all K-feasible cuts but store only those that have matches with the library. Iterate over the subject graph. Forward passes: the first pass computes best delay; the next few passes minimize area flow under delay constraints; the next few passes minimize exact area under delay constraints. Backward passes: the first backward pass produces a legal mapping to be incrementally improved; all backward passes compute required times. Write out the mapped network.
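The cut-enumeration step can be sketched with the usual bottom-up merge of fanin cuts. This is an illustrative sketch only: it assumes a subject graph of two-input nodes given in topological order, and it omits both library matching and the filtering of dominated cuts. The function and variable names are hypothetical.

```python
from itertools import product

def enumerate_cuts(fanins, k):
    """fanins: node -> (left, right) for two-input nodes, listed in topological
    order (dict insertion order). Nodes not present are primary inputs.
    Returns node -> list of cuts, each cut a frozenset of at most k leaves."""
    cuts = {}

    def cuts_of(node):
        # A primary input has only its trivial cut.
        return cuts.get(node, [frozenset([node])])

    for node, (left, right) in fanins.items():
        merged = {cl | cr
                  for cl, cr in product(cuts_of(left), cuts_of(right))
                  if len(cl | cr) <= k}          # keep K-feasible cuts only
        merged.add(frozenset([node]))            # trivial cut of the node itself
        cuts[node] = sorted(merged, key=lambda c: (len(c), sorted(c)))
    return cuts

subject = {"x": ("a", "b"), "y": ("x", "c")}
cuts3 = enumerate_cuts(subject, 3)
cuts2 = enumerate_cuts(subject, 2)
```

A mapper along the lines described above would additionally compute the function of each cut and keep the cut only if the library has a matching gate.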
New Idea Used. Keep not one, but two cuts at each node. A delay-oriented cut guarantees that the node can meet required times. An area-oriented cut allows area optimization to kick in, if possible. How it impacts the implementation: a different procedure to assign matches; a different way of computing required times; a different procedure to produce a legal mapping. Consequences: the QoR improves; the implementation is more complex, but not prohibitively so.
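One plausible reading of the two-cut idea is sketched below. The selection rule here (use the area-oriented cut whenever it meets the required time, otherwise fall back to the delay-oriented cut) is an assumption made for illustration, and so are the `Match` record and the function names. The slides do not spell out the actual procedure used in ABC.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Match:           # hypothetical record for a cut with its matched gate
    leaves: tuple
    delay: int
    area: int

def select_cover(order, delay_cut, area_cut, arrival, outputs, required_time):
    """order: internal nodes in topological order.
    arrival: best (delay-oriented) arrival time per node, 0 for inputs.
    Walks from outputs to inputs, choosing one of the two stored cuts per node."""
    def arr(match):
        return max(arrival[leaf] for leaf in match.leaves) + match.delay

    required = {out: required_time for out in outputs}
    cover = {}
    for node in reversed(order):
        if node not in required:
            continue                              # node is not used by the cover
        area_match = area_cut.get(node)
        if area_match is not None and arr(area_match) <= required[node]:
            match = area_match                    # slack allows the cheaper cut
        else:
            match = delay_cut[node]               # always meets the required time
        cover[node] = match
        for leaf in match.leaves:                 # propagate required times
            req = required[node] - match.delay
            required[leaf] = min(required.get(leaf, req), req)
    return cover, required

arrival = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 2}
delay_cut = {"x": Match(("a", "b"), 1, 3), "y": Match(("a", "b", "c"), 2, 5)}
area_cut = {"x": Match(("a", "b"), 2, 1), "y": Match(("x", "c"), 1, 1)}
tight, _ = select_cover(["x", "y"], delay_cut, area_cut, arrival, ["y"], 2)
loose, _ = select_cover(["x", "y"], delay_cut, area_cut, arrival, ["y"], 3)
```

With the tight constraint, `y` can still take its area-oriented cut, but `x` must fall back to its delay-oriented cut. With one extra unit of slack, both nodes use their area-oriented cuts.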
Impact on Area Recovery. Area-flow computation remains roughly the same. Exact-area computation took some time to implement correctly and efficiently. The main difficulties were that using exact area in the forward pass is prohibitive in terms of runtime, while using exact area in the backward pass requires a clever way of propagating required times. After several failures, an efficient implementation was found.
Several Other Ideas Used. Integers instead of floating-point numbers can be used to represent timing information; this makes the implementation platform-independent. Cuts with matches do not have to be recomputed by the mapper in each round; we precompute and store them to reduce runtime. Area recovery using the exact-area heuristic can be efficiently performed in reverse topological order; this reduces runtime while keeping the same quality.
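The point about integer timing can be illustrated with a small sketch. It assumes, for illustration only, that library delays are given in nanoseconds and are converted to integer ticks of one picosecond; the unit and the helper name `to_ticks` are not taken from the slides.

```python
def to_ticks(delay_ns, ticks_per_ns=1000):
    """Convert a delay in nanoseconds to an integer number of ticks (assumed ps)."""
    return round(delay_ns * ticks_per_ns)

# Floating-point sums can depend on the order of additions...
float_a = (0.1 + 0.2) + 0.3
float_b = 0.1 + (0.2 + 0.3)

# ...whereas integer sums are exact regardless of the order.
int_a = (to_ticks(0.1) + to_ticks(0.2)) + to_ticks(0.3)
int_b = to_ticks(0.1) + (to_ticks(0.2) + to_ticks(0.3))
```

Here the two floating-point sums differ in the last bit while the two integer sums agree, which is one reason comparisons against required times behave the same way everywhere when integers are used.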
Comparison of ABC Mappers (delay, before synthesis)
Comparison of ABC Mappers (delay, after synthesis)
Comparison of ABC Mappers (area, before and after synthesis)
Conclusions. Introduced technology mapping; elaborated on one interesting idea; reviewed current results and future work.
Future Work. Small ideas: skip dominated matches (2x less runtime); select better alternative cuts during area recovery (area 1-2% better). Big ideas: use a load-dependent delay model; combine mapping with buffering and sizing.
Abstract. Technology mapping is one of the fundamental problems, along with such problems as circuit restructuring and SAT solving. One flavor of technology mapping looks into minimizing both area and delay, or rather area under delay constraints. A new approach to building delay/area-aware mappers for standard cells and FPGAs was recently proposed. The main idea of this approach is to store two cuts at each node (delay-oriented and area-oriented) rather than one cut, as in much of the previous work. This presentation surveys our experience developing a standard-cell mapper based on these ideas.