Improving Runtime and Memory Requirements in EDA Applications

Improving Runtime and Memory Requirements in EDA Applications
Alan Mishchenko, UC Berkeley

Overview
- Introduction
- Topics
  - Network traversal
  - AIG package
  - SAT solving
  - Memory management
  - Locality of computation
- Conclusion

Network Traversal
- Optimizing node memory for DFS traversal
- Storing fanins/fanouts in the node
- Using traversal IDs
- Using wave-front traversals
- Minimizing memory footprint

Memory Allocation in Topological Order
- Optimize node memory for DFS traversal
- Allocate node memory from one contiguous array in DFS order (a sketch follows below)
[Figure: example network between primary inputs and primary outputs, with internal nodes numbered 1-8 to show the allocation order]
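To make the idea concrete, here is a minimal sketch of assigning each node a slot in one contiguous array during a DFS from the primary outputs, so that later traversals touch memory in (almost) increasing address order. The type and helper names (Node, dfs_assign, etc.) are illustrative, not ABC's API.

    #include <stdlib.h>

    /* hypothetical node type; only what the sketch needs */
    typedef struct Node Node;
    struct Node {
        int     nFanins;
        Node ** pFanins;
        int     Id;       /* slot in the allocation array; caller initializes to -1 */
    };

    /* assign each node a slot in DFS order; a node lands right after its fanins */
    static void dfs_assign( Node * pObj, int * pCounter )
    {
        int i;
        if ( pObj->Id != -1 )
            return;
        for ( i = 0; i < pObj->nFanins; i++ )
            dfs_assign( pObj->pFanins[i], pCounter );
        pObj->Id = (*pCounter)++;
    }

    int assign_dfs_slots( Node ** pOutputs, int nOutputs )
    {
        int i, Counter = 0;
        for ( i = 0; i < nOutputs; i++ )
            dfs_assign( pOutputs[i], &Counter );
        return Counter;  /* node data is then allocated at slot Id of one array
                            obtained with a single malloc/calloc call */
    }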

Store Fanins/Fanouts in the Node
- Embed the dynamic fanin/fanout array into the node itself
- The array can hold either direct pointers or integer IDs of the fanins/fanouts
- In the rare cases when reallocation is needed (<0.1% of nodes), a new piece of memory is used to store the extended array of fanins/fanouts

    struct Nwk_Obj_t_
    {
        ...
        int           nFanins;      // the number of fanins
        int           nFanouts;     // the number of fanouts
        int           nFanioAlloc;  // the number of allocated fanins/fanouts
        Nwk_Obj_t **  pFanio;       // fanins/fanouts
    };

    // allocate the node together with space for its fanins/fanouts
    pObj = (Nwk_Obj_t *)Aig_MmFlexEntryFetch( sizeof(Nwk_Obj_t) +
               sizeof(Nwk_Obj_t *) * (nFanins + nFanouts + p->nFanioPlus) );
    // the fanin/fanout array starts right after the node in memory
    pObj->pFanio = (Nwk_Obj_t **)((char *)pObj + sizeof(Nwk_Obj_t));

Traversal ID
- Use a dedicated integer field of the node to record the ID of the last traversal that visited the node; incrementing the manager's traversal ID then clears all visited marks in constant time

    void Nwk_ManDfs_rec( Nwk_Man_t * p, Nwk_Obj_t * pObj, Vec_Ptr_t * vNodes )
    {
        if ( Nwk_ObjIsTravIdCurrent(p, pObj) )
            return;
        Nwk_ObjSetTravIdCurrent(p, pObj);
        // simplified: assumes every object has two fanins
        Nwk_ManDfs_rec( p, Nwk_ObjFanin0(pObj), vNodes );
        Nwk_ManDfs_rec( p, Nwk_ObjFanin1(pObj), vNodes );
        Vec_PtrPush( vNodes, pObj );
    }

    Vec_Ptr_t * Nwk_ManDfs( Nwk_Man_t * p )
    {
        Vec_Ptr_t * vNodes;
        Nwk_Obj_t * pObj;
        int i;
        Nwk_ManIncrementTravId( p );
        vNodes = Vec_PtrAlloc( 100 );
        Nwk_ManForEachPo( p, pObj, i )
            Nwk_ManDfs_rec( p, pObj, vNodes );
        return vNodes;
    }

Wave-Front Traversals
- Some applications use additional memory at each node
  - Examples: simulation, cut enumeration, support computation
  - 1 KB per node for 1M nodes = 1 GB of additional memory!
- At any time during a traversal, the wave-front is the set of nodes whose fanins have all been visited while at least one fanout has not yet been visited
  - Additional memory is needed only for the nodes on the wave-front
  - For most industrial designs, the wave-front is about 1% of all nodes (1 GB -> 10 MB)
- Case study: computing the input support of each output of the network
  - Used, for example, to compute (a) output partitioning, (b) the register dependency matrix (A. Dasdan et al., "An experimental study of minimum mean cycle algorithms", 1998)
  - Code: procedure Aig_ManSupports() in file "abc\src\aig\aig\aigPart.c"
  - A sketch of the idea follows below
[Figure: a network traversed from inputs to outputs, with the wave-front shown at three successive positions]
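The following is a hedged sketch of wave-front support computation (not ABC's Aig_ManSupports() itself): each node's support bit-vector is allocated when the node is visited and freed as soon as its last fanout has consumed it, so only nodes on the wave-front hold live vectors. The field names (nFanoutsLeft, fIsPi, etc.) are hypothetical.

    #include <stdlib.h>

    /* hypothetical node: fanins refer to earlier entries of a topologically sorted array */
    typedef struct {
        int nFanins;
        int Fanin[2];      /* indices of fanins (unused entries ignored)         */
        int nFanoutsLeft;  /* how many fanouts have not consumed our support yet */
        int fIsPi;         /* primary-input flag                                 */
        int PiIndex;       /* input number, if this is a PI                      */
    } Node;

    void compute_supports( Node * pNodes, int nNodes, int nWords,
                           void (*onNodeSupport)(int iNode, unsigned * pSupp) )
    {
        unsigned ** pSupps = (unsigned **)calloc( nNodes, sizeof(unsigned *) );
        int i, k, w;
        for ( i = 0; i < nNodes; i++ )
        {
            unsigned * pSupp = (unsigned *)calloc( nWords, sizeof(unsigned) );
            if ( pNodes[i].fIsPi )
                pSupp[pNodes[i].PiIndex / 32] |= 1u << (pNodes[i].PiIndex % 32);
            for ( k = 0; k < pNodes[i].nFanins; k++ )
            {
                int iFanin = pNodes[i].Fanin[k];
                for ( w = 0; w < nWords; w++ )
                    pSupp[w] |= pSupps[iFanin][w];
                /* last fanout consumed the fanin's support: drop it from the wave-front */
                if ( --pNodes[iFanin].nFanoutsLeft == 0 )
                {
                    free( pSupps[iFanin] );
                    pSupps[iFanin] = NULL;
                }
            }
            onNodeSupport( i, pSupp );     /* e.g. record supports of the outputs;
                                              the callback must copy what it keeps */
            if ( pNodes[i].nFanoutsLeft == 0 )
                free( pSupp );             /* sink node: nothing downstream needs it */
            else
                pSupps[i] = pSupp;         /* node joins the wave-front */
        }
        free( pSupps );
    }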

Minimizing Memory Footprint
- When repeatedly traversing a large network, runtime is determined by the amount of memory pumped through the CPU (pointer chasing)
- Examples where repeated traversal cannot be avoided:
  - Sequential simulation of a network for many clock cycles
  - Computing the maximum network flow during retiming, etc.
- In such applications, it is better to develop a specialized, static, low-memory representation of the network
  - Reducing memory 2x may improve runtime 3-5x
  - Example: most-forward retiming (code in "abc\src\aig\aig\aigRet.c"); a sketch of such a static representation follows below
- If both topological and reverse-topological traversals are performed repeatedly, it may be better to keep two copies of the network, each with memory laid out to favor one traversal order
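As an illustration (a minimal sketch, not the code in aigRet.c), a static, low-memory network can be laid out as flat arrays: nodes stored in topological order, with each node's fanins occupying a slice of one shared integer array. No per-node allocation and no pointer chasing.

    #include <stdlib.h>

    /* CSR-style read-only network representation */
    typedef struct {
        int   nObjs;        /* number of objects                               */
        int * pFaninBeg;    /* pFaninBeg[i] .. pFaninBeg[i+1]-1 index pFanins  */
        int * pFanins;      /* concatenated fanin IDs of all objects           */
    } StaticNet;

    /* one forward sweep in topological order: memory is touched sequentially,
       which is what makes repeated traversals fast */
    void forward_sweep( const StaticNet * p, int * pValue )
    {
        int i, k;
        for ( i = 0; i < p->nObjs; i++ )
        {
            int Value = 0;
            for ( k = p->pFaninBeg[i]; k < p->pFaninBeg[i+1]; k++ )
                Value += pValue[ p->pFanins[k] ];   /* placeholder for the real node update */
            pValue[i] = Value;
        }
    }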

Implementation of the AIG Package
- Fixed amount of memory for each AIG node
  - Arbitrary fanout also uses a fixed amount of memory per node!
  - Different memory configurations
- Structural hashing
  - The only potentially non-cache-friendly operation
  - Tricks to speed up structural hashing
- AIGER: compact binary AIG representation format
  - Work of Armin Biere (Johannes Kepler University, Linz, Austria)
  - Available at http://fmv.jku.at/aiger

AIG Node
- ABC has several AIG packages
- A low-memory package is used to represent local functions after mapping: 24 bytes (32-bit) / 40 bytes (64-bit)

    struct Hop_Obj_t_
    {
        Hop_Obj_t *  pNext;          // strashing table
        Hop_Obj_t *  pFanin0;        // fanin
        Hop_Obj_t *  pFanin1;        // fanin
        void *       pData;          // misc
        unsigned int Type   :  3;    // object type
        unsigned int fPhase :  1;    // value under the 00...0 pattern
        unsigned int fMarkA :  1;    // multipurpose mask
        unsigned int fMarkB :  1;    // multipurpose mask
        unsigned int nRefs  : 26;    // reference counter
        int          Id;             // unique ID
    };

- A more elaborate package is used for general AIG manipulation: 36 bytes (32-bit) / 56 bytes (64-bit)

    struct Aig_Obj_t_
    {
        Aig_Obj_t *  pNext;          // strashing table
        Aig_Obj_t *  pFanin0;        // fanin
        Aig_Obj_t *  pFanin1;        // fanin
        Aig_Obj_t *  pHaig;          // pointer to the HAIG node
        unsigned int Type   :  3;    // object type
        unsigned int fPhase :  1;    // value under the 00...0 pattern
        unsigned int fMarkA :  1;    // multipurpose mask
        unsigned int fMarkB :  1;    // multipurpose mask
        unsigned int nRefs  : 26;    // reference counter
        unsigned     Level  : 24;    // the topological level
        unsigned     nCuts  :  8;    // the number of cuts
        int          Id;             // unique ID
        int          TravId;         // ID of the last traversal
        union {                      // temporary storage
            void *   pData;
            int      iData;
            float    fData;
        };
    };

- Open question: how to store the fanins of a node, as pointers or as integer IDs?

Fixed-Memory Fanout for AIGs
- Solution (due to Satrajit Chatterjee): use 5 pointers (or integers) per node
  - One pointer (integer) holds the first fanout of the node
  - The other pointers (integers) form two doubly-linked lists
  - Each list stores the fanout representation of the corresponding fanin
- Doubly-linked lists allow constant-time addition/removal of node fanouts
- Code in file "abc\src\aig\aig\aigFanout.c"; an illustrative sketch of the per-node fields follows below
[Figure: a node with its fanins, its first-fanout pointer, and the two fanout lists (one per fanin), each terminated by NULL]
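The sketch below shows one plausible way to lay out those five integers per node; the field names are hypothetical and the actual encoding in aigFanout.c differs in detail.

    /* Constant-memory fanout bookkeeping: iFanout is the head of this node's
       fanout list; the (prev, next) pairs thread this node into the fanout
       lists of its two fanins.  Adding a fanout is a push-front on the head
       plus fixing two links, so it is constant time; removal is symmetric. */
    typedef struct {
        int iFanout;        /* first fanout of this node (0 = none)          */
        int iPrevFanout0;   /* previous node in the fanout list of fanin 0   */
        int iNextFanout0;   /* next node in the fanout list of fanin 0       */
        int iPrevFanout1;   /* previous node in the fanout list of fanin 1   */
        int iNextFanout1;   /* next node in the fanout list of fanin 1       */
    } FanoutLinks;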

Structural Hashing
- The only potentially non-cache-friendly AIG operation
- Structural hashing is too valuable to give up, so the hashing itself cannot be avoided
- A standard hash table is used; nodes with the same hash key are linked into singly-linked lists
  - The pointer to the next node is embedded in the AIG node itself
  - A linear-probing hash table was tried without improvement
- Trick to sometimes avoid the hash-table lookup (sketched below):
  - When building a new node, do not look it up in the table if at least one of its fanins has reference count 0 (such a fanin is not used by any existing node, so the new node cannot already be in the table)
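A hedged sketch of that trick (function and field names are illustrative, not ABC's): hash the two fanin literals, but skip the table probe entirely when either fanin is still unreferenced.

    /* a node is identified by its two fanin literals; nRefs counts its fanouts */
    typedef struct StrashNode StrashNode;
    struct StrashNode {
        StrashNode * pNext;     /* next node in the same hash bucket          */
        int          Lit0;      /* first fanin literal (2*id + complement)    */
        int          Lit1;      /* second fanin literal                       */
        int          nRefs;     /* reference count                            */
    };

    static unsigned hash_key( int Lit0, int Lit1, unsigned nBins )
    {
        return ((unsigned)Lit0 * 7937 + (unsigned)Lit1 * 2971) % nBins;
    }

    /* returns an existing structurally equivalent node, or NULL if the caller
       should create a fresh node (and then insert it into pBins) */
    StrashNode * strash_lookup( StrashNode ** pBins, unsigned nBins,
                                int Lit0, int Lit1, int nRefs0, int nRefs1 )
    {
        StrashNode * pEntry;
        /* trick: a fanin with zero references is not used by any existing node,
           so the node being built cannot already be in the table */
        if ( nRefs0 == 0 || nRefs1 == 0 )
            return NULL;
        for ( pEntry = pBins[ hash_key(Lit0, Lit1, nBins) ]; pEntry; pEntry = pEntry->pNext )
            if ( pEntry->Lit0 == Lit0 && pEntry->Lit1 == Lit1 )
                return pEntry;
        return NULL;
    }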

AIGER
- Uses 3 bytes per AIG node on average
  - A 1M-node AIG can be written into a 3 MB file
- ~12x more compact than Verilog, BLIF, or BENCH
- ~5x faster reading/writing for large files
- Key observations used by AIGER (a sketch of this style of encoding follows below):
  - To represent a node, two integers (fanin literals) need to be stored
  - The fanin literals are often numerically close, so only their differences need to be stored, which typically takes just one byte each
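The sketch below shows the general idea of a variable-length difference encoding of the kind AIGER's binary format uses (7 payload bits per byte, high bit set while more bytes follow); the exact format details should be checked against the AIGER documentation.

    #include <stdio.h>

    /* write a non-negative difference; small deltas take a single byte */
    static void write_delta( FILE * pFile, unsigned Delta )
    {
        while ( Delta >= 0x80 )
        {
            fputc( (int)(0x80 | (Delta & 0x7F)), pFile );
            Delta >>= 7;
        }
        fputc( (int)Delta, pFile );
    }

    /* matching decoder (assumes well-formed input) */
    static unsigned read_delta( FILE * pFile )
    {
        unsigned Delta = 0, Shift = 0;
        int Byte;
        while ( (Byte = fgetc(pFile)) >= 0x80 )
        {
            Delta |= (unsigned)(Byte & 0x7F) << Shift;
            Shift += 7;
        }
        return Delta | ((unsigned)Byte << Shift);
    }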

SAT Solving
- A modern SAT solver (in particular, MiniSat) is a treasure trove of tricks for efficient implementation
- To mention just a few:
  - Representing clauses as arrays of integers
  - Using signatures to check clause containment
  - Using the two-literal watching scheme
- An idea for ~30% faster BCP (sketched below):
  - For watcher lists, use singly-linked lists instead of dynamic arrays
  - Embed the list pointers into the clauses themselves
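A hedged sketch of that idea (not MiniSat's actual data structures): each clause stores, for each of its two watched literals, the index of the next clause watching the same literal, so a per-literal watcher list is just one head index plus links threaded through the clauses.

    /* illustrative clause layout: the watched literals are pLits[0] and pLits[1];
       iNextWatch[k] is the index of the next clause watching pLits[k]; -1 ends a list */
    typedef struct {
        int nLits;            /* number of literals                        */
        int iNextWatch[2];    /* intrusive links for the two watcher lists */
        int pLits[1];         /* literal array (allocated to hold nLits)   */
    } Clause;

    /* push clause iClause onto the watcher list of literal Lit;
       pWatchHead[Lit] is the index of the first clause watching Lit (-1 if none) */
    static void watch_push( int * pWatchHead, Clause ** pClauses, int iClause, int Lit )
    {
        Clause * pC = pClauses[iClause];
        int iSlot = (pC->pLits[0] == Lit) ? 0 : 1;   /* which watched literal is Lit */
        pC->iNextWatch[iSlot] = pWatchHead[Lit];
        pWatchHead[Lit] = iClause;
    }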

Custom Memory Management
- Three types of memory managers in ABC:
  - Fixed-size
    - Allocates/recycles entries of a fixed size
    - Used for AIG nodes
  - Flexible-size
    - Allocates (but does not recycle) entries of variable size
    - Used for signal names
  - Step-size
    - Steps are powers of 2 (4-8-16-32-etc.) in bytes
    - Used for CNF clauses in the customized version of MiniSat
- Code in package "abc\src\aig\mem"; a minimal fixed-size manager is sketched below
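To illustrate the fixed-size manager (a minimal sketch, not ABC's actual mem package code): entries are carved out of large chunks and recycled entries are threaded into a free list, so allocation and recycling are O(1) with no per-entry malloc/free.

    #include <stdlib.h>

    typedef struct {
        size_t  EntrySize;   /* size of each entry (must be >= sizeof(void *)) */
        size_t  ChunkSize;   /* how many entries to grab per malloc call       */
        void *  pFreeList;   /* singly-linked list of available entries        */
        /* for brevity, chunks are not tracked for final cleanup here          */
    } FixedMem;

    void * fixed_alloc( FixedMem * p )
    {
        void * pEntry;
        if ( p->pFreeList == NULL )
        {
            /* carve a new chunk into entries and thread them into the free list */
            char * pChunk = (char *)malloc( p->EntrySize * p->ChunkSize );
            size_t i;
            for ( i = 0; i < p->ChunkSize; i++ )
            {
                *(void **)(pChunk + i * p->EntrySize) = p->pFreeList;
                p->pFreeList = pChunk + i * p->EntrySize;
            }
        }
        pEntry = p->pFreeList;
        p->pFreeList = *(void **)pEntry;
        return pEntry;
    }

    void fixed_recycle( FixedMem * p, void * pEntry )
    {
        *(void **)pEntry = p->pFreeList;
        p->pFreeList = pEntry;
    }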

Locality of Computation
- To improve speed:
  - Use less memory
  - Make transformations local
  - Use contiguous data structures
- Case study: BDDs vs. truth tables (TTs)
  - In the past: "BDDs are present-day truth tables"
  - These days: "Truth tables are present-day BDDs"
- Advantages of TTs
  - Computation is more local (see the sketch after this list)
  - Memory usage is predictable
  - For functions of up to 16 variables, TTs lead to faster computation (ISOP, DSD, matching, decomposition, etc.)
- Limitations of TTs
  - Do not work for more than 16 variables
  - Some operations are faster with BDDs, even for functions of 10 variables (e.g., cofactoring and satisfy counting)
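To make the TT representation concrete, here is a minimal sketch (not ABC's truth-table utilities) of one basic operation, the negative cofactor, assuming the common word-parallel layout and, for simplicity, functions with at least 6 variables.

    #include <stdint.h>

    /* truth tables of nVars >= 6 variables: (1 << (nVars - 6)) 64-bit words each */
    static const uint64_t s_Mask0[6] = {
        0x5555555555555555ull, 0x3333333333333333ull, 0x0F0F0F0F0F0F0F0Full,
        0x00FF00FF00FF00FFull, 0x0000FFFF0000FFFFull, 0x00000000FFFFFFFFull
    };

    /* replace pTruth by its negative cofactor w.r.t. variable iVar: minterms with
       iVar = 0 are duplicated into the iVar = 1 positions; all work happens on a
       small contiguous array, which is what makes it cache-friendly */
    void tt_cofactor0( uint64_t * pTruth, int nVars, int iVar )
    {
        int nWords = 1 << (nVars - 6);
        int i, k, Step;
        if ( iVar < 6 )
        {
            for ( i = 0; i < nWords; i++ )
            {
                uint64_t Low = pTruth[i] & s_Mask0[iVar];
                pTruth[i] = Low | (Low << (1 << iVar));
            }
            return;
        }
        Step = 1 << (iVar - 6);             /* words per half-block for this variable */
        for ( i = 0; i < nWords; i += 2 * Step )
            for ( k = 0; k < Step; k++ )
                pTruth[i + Step + k] = pTruth[i + k];
    }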

Conclusion
- Lessons learned while developing ABC
- Topics considered:
  - Network traversal
  - AIG representation
  - SAT solving
  - Memory management
  - Locality of computation
- Locality of computation is especially important
  - Allows for efficient control of resources
  - Leads to scalability and parallelism