A Robust Algorithm for Approximate Compatible Observability Don’t Care (CODC) Computation
Nikhil S. Saluja, University of Colorado, Boulder, CO
Sunil P. Khatri, Texas A&M University, College Station, TX
Outline: Motivation; Computation of Don’t Cares; ACODC Algorithm; Proof of Correctness; Experimental Results; Possible Extensions; Conclusions
Motivation. Technology independent logic optimization. Don’t Cares are typically computed after a higher level description of a design is encoded and translated into a gate level description. Don’t Cares (DCs): eXternal Don’t Cares (XDCs), Satisfiability Don’t Cares (SDCs), Observability Don’t Cares (ODCs). [Figure: Boolean network with primary inputs x_1 ... x_n, primary outputs z_1 ... z_p, and internal nodes y_1 ... y_w, where y_j = F_j]
Motivation - 2. The computed DCs are a function of the PIs and internal variables of the Boolean network. Image computation, a ROBDD based operation, is used to express the DCs in terms of node fanins. Finally, the node function is minimized (using ESPRESSO) with respect to the computed (local) DCs. Literal count reduction is the figure of merit.
Don’t Cares. ODC based: very powerful, represent maximum flexibility; minimizing a node j with respect to its ODC requires recomputation of other nodes’ ODCs. Compatible ODC (CODC) based: subset of ODC, requires ordering of fanins; recomputation is not required, useful in many cases. In either case, image computation is required to obtain DCs in the fanin support of the node. This involves ROBDD computation, which is not robust.
CODC Computation. Traverse the circuit in reverse topological order. The CODC of a primary output z is initialized to its XDC. The computation is performed in 2 phases for each node. Phase 1: the fanins of node f_k are ordered y_1 < y_2 < … < y_i. [equation missing] Note that [symbol missing] is the consensus operator. The first fanin has [expression missing], which is the maximum flexibility. A new edge e_ik should have its CODC as the conjunction of [expression missing] with the condition that the other inputs j < i are not insensitive to input y_j ([expression missing]) or are independent of y_j ([expression missing]). [Figure: node f_k with output y_k and ordered fanins y_1, y_2, ..., y_{i-1}, y_i]
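The slide's own Phase 1 equation is not recoverable, so the following is only a minimal sketch of the edge CODC rule in the form commonly given in the literature: the CODC of fanin y_i is the complement of the Boolean difference df/dy_i, to which the operator (df/dy_j + C_{y_j}) is applied for every earlier fanin j < i, with C the consensus operator. Functions are represented as sets of minterms over the local fanins. The helper names are hypothetical, and the node's own CODC term is omitted for brevity.

```python
from itertools import product

def minterms(n):
    """All 0/1 assignments to n local fanins."""
    return list(product((0, 1), repeat=n))

def cofactor(g, i, val, n):
    """Minterms m where g holds once fanin i is forced to val."""
    return {m for m in minterms(n) if m[:i] + (val,) + m[i + 1:] in g}

def boolean_difference(f, i, n):
    """df/dy_i: assignments where toggling fanin i changes f."""
    return cofactor(f, i, 1, n) ^ cofactor(f, i, 0, n)

def consensus(g, i, n):
    """C_{y_i} g: assignments where g holds for both values of fanin i."""
    return cofactor(g, i, 1, n) & cofactor(g, i, 0, n)

def edge_codcs(f, n):
    """CODC of each fanin edge, fanins ordered index 0 < 1 < ... < n-1.

    Assumption: textbook form of the rule; the node's own CODC is left out.
    """
    universe = set(minterms(n))
    result = []
    for i in range(n):
        g = universe - boolean_difference(f, i, n)  # full ODC of fanin i
        for j in range(i - 1, -1, -1):              # restrict by earlier fanins
            g = (boolean_difference(f, j, n) & g) | consensus(g, j, n)
        result.append(g)
    return result

# Two-input AND: onset is the single minterm y1=1, y2=1.
codc_y1, codc_y2 = edge_codcs({(1, 1)}, 2)
```

For the AND gate, the first fanin keeps its full ODC (y2 = 0), while the second fanin only keeps the part of its ODC that does not rely on the first fanin's flexibility (y1 = 0 and y2 = 1).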
CODC Computation, Phase 2: image computation using ROBDDs. Build global BDDs of each node in the network, including POs. For large circuits this step fails; this is the main weakness of the CODC computation. Next, compute the CODC of node k in terms of PIs by substituting each internal node literal by its global BDD. Compute the image of this function in the space of local fanins of node k, which yields the CODC in terms of local fanins of node k. Finally, call ESPRESSO on the cover of node k, with the newly computed CODC as don’t care.
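The image step can be illustrated without BDDs by explicit enumeration, which is only feasible for tiny examples. The sketch below assumes one common formulation: a local fanin combination is a don't care when no care point of the PI space maps to it, so the local don't-care set is the complement of the image of the care set. The function name and argument layout are hypothetical.

```python
from itertools import product

def local_dont_cares(num_pis, fanin_funcs, codc_global):
    """Map a don't-care set over the PIs to the local fanin space.

    fanin_funcs: one callable per fanin, PI tuple -> 0/1 (its global function).
    codc_global: callable, PI tuple -> True if that PI point is a don't care.
    Assumption: local DC = complement of the image of the care set.
    """
    care_image = set()
    for x in product((0, 1), repeat=num_pis):
        if not codc_global(x):
            care_image.add(tuple(g(x) for g in fanin_funcs))
    all_local = set(product((0, 1), repeat=len(fanin_funcs)))
    return all_local - care_image

# Fanins y1 = a AND b, y2 = a OR b over PIs (a, b).
fanins = [lambda x: x[0] & x[1], lambda x: x[0] | x[1]]
no_dc = local_dont_cares(2, fanins, lambda x: False)
with_dc = local_dont_cares(2, fanins, lambda x: x == (0, 0))
```

With no global don't cares, only the unreachable combination y1 = 1, y2 = 0 is a local don't care. Marking the PI point (0, 0) as a don't care additionally frees the local combination (0, 0).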
Contributions of this Work. Perform CODC based Don’t Care computation approximately. This yields a 25X speedup and a 33X reduction in memory utilization, and obtains 80% of the literal reduction of the full CODC computation. It handles large circuits (on which the full CODC based computation times out) extremely fast. A formal proof of correctness of the approximate CODC technique is given.
Approximate CODCs. Consider a sub-network rooted at the node j of interest. The sub-network can have a user defined topological depth k. Compute the CODC of j in the sub-network (called the ACODC). This ACODC is a subset of the CODC of j.
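A sub-network of depth k around node j can be sketched as a bounded breadth-first search. The slides do not say exactly which nodes belong to the sub-network, so this sketch assumes it contains every node within k levels of j in either the fanin or the fanout direction. The network representation (a dict from node to its fanin list) is also an assumption.

```python
from collections import deque

def extract_subnetwork(fanins, root, k):
    """Return the nodes within k levels of root, in fanin or fanout direction.

    fanins: dict mapping each node to the list of its fanin nodes.
    Assumption: the depth-k sub-network is the union of both bounded cones.
    """
    fanouts = {}
    for node, ins in fanins.items():
        for i in ins:
            fanouts.setdefault(i, []).append(node)

    def reach(adjacency):
        depth = {root: 0}
        queue = deque([root])
        while queue:
            node = queue.popleft()
            if depth[node] == k:      # do not expand past depth k
                continue
            for nxt in adjacency.get(node, []):
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        return set(depth)

    return reach(fanins) | reach(fanouts)

# A simple chain a -> n1 -> n2 -> n3 -> n4.
chain = {"n1": ["a"], "n2": ["n1"], "n3": ["n2"], "n4": ["n3"]}
depth1 = extract_subnetwork(chain, "n2", 1)
depth2 = extract_subnetwork(chain, "n2", 2)
```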
Algorithm
Traverse η in reverse topological order
for (each node j in network η) do
    η_j = extract_subnetwork(j, k)
    ACODC(j) = compute_acodc(η_j, j)
    optimize(j, ACODC(j))
end for
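The traversal loop above can be sketched in Python as a driver that visits nodes from the outputs toward the inputs. The three steps are passed in as callables because their internals are not given on the slide; the stubs in the usage example are placeholders, and the argument order of each callable is an assumption.

```python
from graphlib import TopologicalSorter

def run_acodc(fanins, k, extract_subnetwork, compute_acodc, optimize):
    """Visit internal nodes in reverse topological order and optimize each.

    fanins: dict mapping each internal node to the list of its fanin nodes.
    The three callables are hypothetical stand-ins for the slide's steps.
    Returns the visiting order for inspection.
    """
    # static_order() lists fanins before fanouts, so reverse it.
    order = list(TopologicalSorter(fanins).static_order())
    visited = []
    for j in reversed(order):
        if not fanins.get(j):         # primary inputs have no fanins; skip
            continue
        sub = extract_subnetwork(fanins, j, k)
        optimize(j, compute_acodc(sub, j))
        visited.append(j)
    return visited

# Usage with trivial stubs that only record what they are asked to do.
log = []
net = {"n1": ["a", "b"], "n2": ["n1", "b"], "z": ["n2"]}
order = run_acodc(
    net, 4,
    extract_subnetwork=lambda f, j, k: {j},
    compute_acodc=lambda sub, j: set(),
    optimize=lambda j, dc: log.append(j),
)
```

Each node is optimized immediately after its ACODC is computed, which matches the slide's loop body; no don't-care set is recomputed later.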
Proof of Correctness: Terminology. Boolean network η_xz, where X are the primary inputs and Z the primary outputs. W and V are two cuts. η_xw, η_vz and η_vw define sub-networks. [symbol missing] is the CODC of y_k, where P is either X or V and Q is either W or Z. [symbol missing] is the CODC of y_k mapped back to its fanin support after image computation. [Figure: network from x to z with cuts v and w around node f_k, which has output y_k and fanins y_1, y_2, ..., y_{i-1}, y_i]
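Cutset as Primary Output. To show: [expression missing] ≥ [expression missing]. For any PO z, [expression missing] = ø. For [expression missing], [expression missing] ≠ ø. For W nodes as POs, [expression missing] = ø. The CODC computation of y_k is identical for both cases except for the last term in the equation. In general, the last term for a node in the first case contains the last term for the same node in the latter case, since [expression missing] ≥ [expression missing]. Hence [expression missing] ≥ [expression missing]. [Figure: network from x to z with cut w above node f_k, which has output y_k and fanins y_1, y_2, ..., y_{i-1}, y_i]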
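Cutset as Primary Input. Define [expression missing]. To compute the ACODC at y_k, compute [expression missing], then compute the image I_1 of this on the V space, and then project the result back to the local fanins of y_k. The full CODC is [expression missing]. We then compute the image I_2 of this on the X space, and next project the result back to the local fanins of y_k. I_3 is the projection of I_2 on V. Hence [expression missing]. Therefore I_3 ≥ I_1. Finally, [expression missing] ≥ [expression missing]. [Figure: network from x to z with cut v below node f_k, which has output y_k and fanins y_1, y_2, ..., y_{i-1}, y_i; the images I_1, I_2 and I_3 are marked]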
Cutsets as Primary Input and Primary Output. This result follows directly from the previous two proofs, as they are orthogonal. Hence [expression missing] ≤ [expression missing]. Therefore, an ACODC computation which utilizes a sub-network of depth k rooted at any node yields a subset of the full CODC of the node. This proves the correctness of our method. [Figure: network from x to z with cuts v and w around node f_k, which has output y_k and fanins y_1, y_2, ..., y_{i-1}, y_i]
Experimental Results. Implemented in SIS. Used mcnc91 and itc99 benchmark circuits. Run on an IBM IntelliStation (1.7 GHz Pentium-4 with 1 GB RAM) running Linux. Our algorithm is built as a replacement for full_simplify. For our method, we read the design and run the ACODC algorithm followed by sweep. For comparison, we run full_simplify followed by sweep.
Metrics for Comparison. We use 3 measures of effectiveness for comparison with full_simplify. Effectiveness #1 is the ratio of the number of minterms computed by our technique to that computed by full_simplify. Effectiveness #2 compares the number of nodes for which ACODCs and CODCs are identical. We also compare the literal count reduction obtained by both techniques.
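The first two measures can be written down as a short sketch, assuming each technique yields a set of don't-care minterms per node. The slides do not state whether Effectiveness #1 is aggregated over all nodes or averaged per node, nor whether Effectiveness #2 is reported as a count or a fraction; this sketch aggregates minterm counts and reports a fraction. The function name and data layout are hypothetical.

```python
def effectiveness(acodc, codc):
    """Compare approximate and full don't-care sets node by node.

    acodc, codc: dicts mapping node name -> set of don't-care minterms.
    Assumptions: Eff1 is the ratio of total minterm counts; Eff2 is the
    fraction of nodes whose two sets are identical.
    """
    total_approx = sum(len(acodc[n]) for n in codc)
    total_full = sum(len(codc[n]) for n in codc)
    eff1 = total_approx / total_full if total_full else 1.0
    eff2 = sum(1 for n in codc if acodc[n] == codc[n]) / len(codc)
    return eff1, eff2

# Made-up minterm sets, for illustration only.
approx = {"n1": {0, 1}, "n2": {2}}
full = {"n1": {0, 1}, "n2": {2, 3}}
eff1, eff2 = effectiveness(approx, full)
```

Because an ACODC is a subset of the corresponding CODC, Eff1 computed this way cannot exceed 1.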
Effectiveness Results. Literal reduction is about 80% of that of full_simplify. There is very little improvement from k=4 to k=6. [Table, values missing. Columns: Circuit | Eff1 (k=4) | Eff1 (k=6) | Eff2 (k=4) | Eff2 (k=6) | Lits-original | Lits % (fs) | Lits % (k=4) | Lits % (k=6). Rows: C C C C C C C dalu i b01_C b03_C b04_C b05_C b06_C b07_C b08_C b09_C b10_C b11_C b12_C b13_C AVG]
Runtime and Memory Results. Runtime is about 25X better than full_simplify. Memory utilization is about 33X better than full_simplify. [Table, most values missing. Columns: Circuit | Time (fs) | Time % (k=4) | Time % (k=6) | Mem (fs) | Mem (k=4) | Mem (k=6). Rows: C C C C C C C dalu i b01_C b03_C b04_C b05_C b06_C 0.04 b07_C b08_C 0.30 b09_C b10_C b11_C b12_C b13_C AVG]
Results for Large Circuits. full_simplify did not complete for all the examples below. k = 4 for these experiments. Maximum runtime < 2 minutes. Peak memory utilization < 106K BDD nodes. [Table, values missing. Columns: Circuit | #Nodes | #Literals | Node% | Lit% | Time(s) | Mem. Rows: C C b b14_ b b20_ b b21_ AVG]
Possible Extensions. AODCs can be computed in a similar fashion, which yields more flexibility at a node. However, each node must be minimized after its AODC computation. Compatibility is not maintained. This is useful if only node minimization is desired; compatibility is useful if the nodes are to be optimized simultaneously at a later stage. The proof of correctness is similar.
Conclusions. Presented a robust technique for ACODC computation. Sub-networks are extracted dynamically to compute CODCs. ACODCs are computed exactly once for a node. Large circuits: 19% reduction in node count and 9.5% reduction in literal count. Medium circuits: 23% reduction in literal count, as compared to 28.5% for full_simplify. 25X better run-time than full_simplify. 33X better memory utilization than full_simplify.