1
A Robust Algorithm for Approximate Compatible Observability Don’t Care (CODC) Computation
Nikhil S. Saluja, University of Colorado, Boulder, CO
Sunil P. Khatri, Texas A&M University, College Station, TX
2
Outline
- Motivation
- Computation of Don’t Cares
- ACODC Algorithm
- Proof of correctness
- Experimental Results
- Possible extensions
- Conclusions
3
Motivation
- Technology independent logic optimization
- Typically compute Don’t Cares after a higher level description of a design is encoded and translated into a gate level description
- Don’t Cares (DCs)
  - eXternal Don’t Cares (XDCs)
  - Satisfiability Don’t Cares (SDCs)
  - Observability Don’t Cares (ODCs)
[Figure: Boolean network with primary inputs x_1 … x_n, internal nodes y_1 … y_w (y_j = F_j), and primary outputs z_1 … z_p]
4
Motivation - 2
- The DCs computed are a function of the PIs and internal variables of the Boolean network
- Image computation is used to express the DCs in terms of node fanins
  - ROBDD based operation
- Finally, the node function is minimized (using ESPRESSO) with respect to the computed (local) DCs
  - Literal count reduction is the figure of merit
5
Don’t Cares
- ODC based
  - Very powerful, represent maximum flexibility
  - Minimizing a node j with respect to its ODC requires recomputation of other nodes’ ODCs
- Compatible ODC (CODC) based
  - Subset of ODC, requires ordering of fanins
  - Recomputation not required, useful in many cases
- In either case, image computation is required
  - To obtain DCs in the fanin support of the node
  - Involves ROBDD computation
  - Not robust
6
CODC Computation
- Traverse circuit in reverse topological order
- CODC of primary output z initialized to its XDC
- Computation performed in 2 phases for each node
Phase 1
- Fanins of node y_k (function f_k) are ordered y_1 < y_2 < … < y_i
- Note that [...] is the consensus operator
- The first fanin has [...], which is the maximum flexibility
- A new edge e_ik should have its CODC as the conjunction of [...] with the condition that other inputs j < i are not insensitive to input y_j ([...]) or are independent of y_j ([...])
[Figure: node y_k with function f_k and ordered fanins y_1, y_2, …, y_{i-1}, y_i]
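The Phase 1 formula itself did not survive on this slide. As a hedged illustration only, the sketch below uses the recurrence commonly given for CODCs in the literature: start from the ODC of fanin y_i (the complement of the Boolean difference of f_k with respect to y_i) and, for each earlier fanin y_j, apply "Boolean difference of f_k with respect to y_j, OR consensus over y_j". That recurrence, the truth-table representation, and all function names are assumptions, not taken from the slides.

```python
from itertools import product

def cofactor(f, n, i, val):
    """Points p over n variables where f, with variable i forced to val, is 1."""
    return {p for p in product((0, 1), repeat=n)
            if p[:i] + (val,) + p[i + 1:] in f}

def boolean_difference(f, n, i):
    # f|y_i=1 XOR f|y_i=0: the points where f is sensitive to y_i
    return cofactor(f, n, i, 1) ^ cofactor(f, n, i, 0)

def consensus(g, n, i):
    # g|y_i=1 AND g|y_i=0: the part of g that is independent of y_i
    return cofactor(g, n, i, 1) & cofactor(g, n, i, 0)

def codcs(f, n, codc_out=frozenset()):
    """CODC of each fanin of f (a set of on-set minterms), in fanin order."""
    universe = set(product((0, 1), repeat=n))
    result = []
    for i in range(n):
        g = universe - boolean_difference(f, n, i)  # ODC of y_i
        for j in reversed(range(i)):                # earlier fanins, innermost first
            g = (boolean_difference(f, n, j) & g) | consensus(g, n, j)
        result.append(g | set(codc_out))            # add the node's own CODC
    return result

# Two-input AND, f = y1 AND y2, with an empty CODC at the node output.
and_codcs = codcs({(1, 1)}, 2)
```

For the AND gate the first fanin keeps its full ODC (all points with y2 = 0), while the second fanin gets only the point y1 = 0, y2 = 1, so the two don't-care sets can be used at the same time.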
7
CODC Computation
Phase 2 - image computation using ROBDDs
- Build global BDDs of each node in the network, including POs
  - For large circuits this step fails
  - This is the main weakness of the CODC computation
- Next compute CODCs of node k in terms of PIs
  - Substitute each internal node literal by its global BDD
- Compute image of this function in the space of local fanins of node k
  - Yields CODC in terms of local fanins of node k
- Finally, call ESPRESSO on the cover of node k, with the newly computed CODC as don’t care
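The mapping from a don't-care set over the PIs to one over a node's local fanins can be sketched on a toy example. The code below replaces ROBDD image computation with explicit enumeration of PI assignments, and it assumes one common formulation: a local fanin vector is a don't care when no care PI assignment produces it. The function name and the example network are hypothetical.

```python
from itertools import product

def local_dont_cares(num_pis, fanin_funcs, global_dc):
    """Local fanin vectors that are not the image of any care PI assignment.

    fanin_funcs: one callable per local fanin, mapping a PI tuple to 0/1
                 (stands in for the global BDD of that fanin).
    global_dc:   callable returning True when a PI tuple is a don't care.
    """
    care_image = set()
    for x in product((0, 1), repeat=num_pis):
        if not global_dc(x):
            care_image.add(tuple(g(x) for g in fanin_funcs))
    all_local = set(product((0, 1), repeat=len(fanin_funcs)))
    return all_local - care_image

# Toy node with fanins y1 = a AND b, y2 = a OR b over PIs (a, b).
toy_fanins = [lambda x: x[0] & x[1], lambda x: x[0] | x[1]]
no_global_dc = local_dont_cares(2, toy_fanins, lambda x: False)
with_global_dc = local_dont_cares(2, toy_fanins, lambda x: x == (1, 1))
```

With no global don't cares, only the unreachable vector (y1, y2) = (1, 0) is a local don't care. Marking the PI point (1, 1) as a don't care adds the local vector (1, 1) as well. Enumeration is exponential in the number of PIs, which is why the slides rely on ROBDDs for this step.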
8
Contributions of this Work
- Perform CODC based Don’t Care computation approximately
  - Yields 25X speedup
  - Yields 33X reduction in memory utilization
  - Obtains 80% of the literal reduction of the full CODC computation
  - Handles large circuits extremely fast (circuits on which the CODC based computation times out)
- Formal proof of correctness of the approximate CODC technique
9
Approximate CODCs
- Consider a sub-network rooted at the node j of interest
- Sub-network can have user defined topological depth k
- Compute the CODC of j in the sub-network (called ACODC)
- This ACODC is a subset of the CODC of j
[Figure: sub-network rooted at node j]
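A depth-bounded sub-network can be collected with a breadth-first search that stops after k levels. The slide does not say which directions the sub-network extends in, so this sketch assumes it contains every node within k levels of fanout or fanin of j. The adjacency-dictionary representation and both function names are hypothetical.

```python
from collections import deque

def within_depth(adj, root, k):
    """Nodes reachable from root in at most k steps along adj."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if depth[u] == k:
            continue  # do not expand past the depth limit
        for v in adj.get(u, ()):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return set(depth)

def extract_subnetwork(fanins, fanouts, j, k):
    # Assumption: depth k is applied toward both the outputs and the inputs.
    return within_depth(fanouts, j, k) | within_depth(fanins, j, k)

# Chain a -> b -> c -> d -> e
chain_fanouts = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
chain_fanins = {"b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}
sub = extract_subnetwork(chain_fanins, chain_fanouts, "c", 1)
```

Here `sub` contains only b, c and d, so the cost of computing don't cares for c is bounded by k rather than by the size of the whole network.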
10
Algorithm
Traverse η in reverse topological order
for (each node j in network η) do
    η_j = extract_subnetwork(j, k)
    ACODC(j) = compute_acodc(η_j, j)
    optimize(j, ACODC(j))
end for
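The loop above can be sketched as a small driver. Only the traversal order and the three calls come from the slide. The Python signatures, the `fanins` dictionary, and the choice to pass the three steps in as callables are assumptions, and the sketch visits every node in the dictionary without treating primary inputs specially.

```python
from graphlib import TopologicalSorter

def acodc_pass(fanins, k, extract_subnetwork, compute_acodc, optimize):
    """Visit nodes outputs-first; extract, compute the ACODC, then optimize."""
    # static_order() yields each node after all of its fanins; reversing it
    # gives a reverse topological order.
    order = list(TopologicalSorter(fanins).static_order())
    visited = []
    for j in reversed(order):
        sub = extract_subnetwork(j, k)   # depth-k sub-network around j
        dc = compute_acodc(sub, j)       # approximate CODC of j
        optimize(j, dc)                  # minimize j against its ACODC
        visited.append(j)
    return visited

# Placeholder steps on the chain x -> y -> z, just to show the visiting order.
log = []
visit_order = acodc_pass(
    {"z": ["y"], "y": ["x"], "x": []},
    4,
    extract_subnetwork=lambda j, k: {j},
    compute_acodc=lambda sub, j: set(),
    optimize=lambda j, dc: log.append(j),
)
```

Each node is handled exactly once, which matches the claim later in the slides that ACODCs are computed exactly once per node.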
11
Proof of Correctness
Terminology
- Boolean network η_xz
  - X primary inputs
  - Z primary outputs
- W and V are two cuts
- η_xw, η_vz and η_vw define sub-networks
- [...] is the CODC of y_k, where P is either X or V and Q is either W or Z
- [...] is the CODC of y_k mapped back to its fanin support after image computation
[Figure: network from x to z with cuts v and w; node y_k with function f_k and fanins y_1, y_2, …, y_{i-1}, y_i]
12
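Cutset as Primary Output
- To show: [...] ≥ [...]
- For any PO z, [...] = ∅
- For [...], [...] ≠ ∅
- For W nodes as POs, [...] = ∅
- CODC computation of y_k is identical for both cases except for the last term in the equation
- In general, the last term for a node in the first case contains the last term for the same node in the latter case, since [...] ≥ [...]
- Hence [...] ≥ [...]
[Figure: network from x to z with cut w; node y_k with function f_k and fanins y_1, y_2, …, y_{i-1}, y_i]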
13
Cutset as Primary Input
- Define [...]
- To compute the ACODC at y_k, compute [...], then compute the image I_1 of this on the V space, and then project the result back to the local fanins of y_k
- The full CODC is [...]. We then compute the image I_2 of this on the X space, and next project the result back to the local fanins of y_k
- I_3 is the projection of I_2 on V
- Hence [...]
- Therefore I_3 ≥ I_1
- Finally, [...] ≥ [...]
[Figure: network from x to z with cut v; node y_k with function f_k and fanins y_1, y_2, …, y_{i-1}, y_i; images I_1, I_2, I_3]
14
Cutsets as Primary Input and Primary Output
- This result follows directly from the previous two proofs, as they are orthogonal
- Hence [...] ≤ [...]
- Therefore, an ACODC computation which utilizes a sub-network of depth k rooted at any node yields a subset of the full CODC of the node. This proves the correctness of our method.
[Figure: network from x to z with cuts v and w; node y_k with function f_k and fanins y_1, y_2, …, y_{i-1}, y_i]
15
Experimental Results
- Implemented in SIS
- Used mcnc91 and itc99 benchmark circuits
- Run on IBM IntelliStation (1.7 GHz Pentium-4 with 1 GB RAM) running Linux
- Our algorithm is built as a replacement for full_simplify
  - Read design and run ACODC algorithm followed by sweep
  - Compare against running full_simplify followed by sweep
16
Metrics for Comparison
- 3 measures of effectiveness for comparison with full_simplify
  - Effectiveness #1 is the ratio of the number of minterms computed by our technique to the number computed by full_simplify
  - Effectiveness #2 is the number of nodes for which ACODCs and CODCs are identical
  - We also compare the literal count reduction obtained by both techniques
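The first two metrics can be sketched from per-node don't-care sets. The slide does not say how minterm counts are aggregated across nodes, so this sketch assumes Effectiveness #1 is the total ACODC minterm count over the total CODC minterm count, and that Effectiveness #2 is reported as a percentage of nodes. The function name and data layout are hypothetical.

```python
def effectiveness(acodc, codc):
    """Return (eff1, eff2) as percentages.

    acodc, codc: dicts mapping node name -> set of don't-care minterms.
    eff1: ACODC minterms as a share of CODC minterms (summed over nodes).
    eff2: share of nodes whose ACODC equals the full CODC.
    """
    total_a = sum(len(acodc[n]) for n in codc)
    total_c = sum(len(codc[n]) for n in codc)
    eff1 = 100.0 * total_a / total_c if total_c else 100.0
    same = sum(1 for n in codc if acodc[n] == codc[n])
    eff2 = 100.0 * same / len(codc)
    return eff1, eff2

# Made-up sets for two nodes: node "a" loses half of its don't cares.
eff1, eff2 = effectiveness(
    acodc={"a": {0, 1}, "b": {0, 1}},
    codc={"a": {0, 1, 2, 3}, "b": {0, 1}},
)
```

In this made-up example 4 of 6 minterms are retained and one of the two nodes is identical.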
17
Effectiveness Results
Circuit  Eff1 (k=4)  Eff1 (k=6)  Eff2 (k=4)  Eff2 (k=6)  Lits-original  Lits % (fs)  Lits % (k=4)  Lits % (k=6)
C1355    98.04                   98.34                   1032           4.65         3.88
C1908    81.56       84.69       87.13       88.89       1497           37.14        30.66         31.46
C2670    94.13                   86.79                   2043           39.30        32.94
C432     71.43                   92.81                   372            19.89        9.95
C499     98.56                   97.34                   616            7.79         6.50
C880     80.00       84.44       94.56       95.77       703            11.10        9.67          10.38
C3540    85.43       97.81       84.15       97.51       2934           33.78        26.89         28.42
dalu     78.00       79.86       75.78       79.55       3588           39.68        9.70
i10      99.34                   85.45                   5376           29.55        27.47
b01_C    92.68                   83.33                   80             45.00        43.75
b03_C    68.89       75.56       87.23       89.43       254            60.00        39.37         41.34
b04_C    63.42                   85.35                   1267           31.96        28.65
b05_C    74.70       84.85       76.38       88.02       1858           45.80        14.96
b06_C    92.11                   87.10                   83             51.80        45.78
b07_C    69.52       81.90       91.02       95.21       749            11.88        11.08
b08_C    98.33                   96.09                   306            9.80         9.48
b09_C    79.00       95.00       83.04       90.18       277            61.00        44.04         45.85
b10_C    80.65       83.87       92.90       94.19       353            12.39        11.05
b11_C    83.44       85.94       89.50       91.65       1378           22.71        14.36
b12_C    67.10       79.04       87.17       90.92       1967           24.05        5.80
b13_C    65.08                   91.53                   558            18.81        10.57
AVG      81.97       85.52       87.85       89.52       -              28.36        22.34         22.82
- Literal reduction about 80% of full_simplify
- Very little improvement from k=4 to k=6
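The "about 80%" figure can be checked against the AVG row, assuming it is meant as the ratio of the average ACODC literal reduction to the average full_simplify literal reduction:

```latex
\[
\frac{22.34}{28.36} \approx 0.788 \quad (k=4), \qquad
\frac{22.82}{28.36} \approx 0.805 \quad (k=6)
\]
```

Both ratios are close to 0.8, and they differ by under two percentage points, which agrees with the remark that k=6 adds little over k=4.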
18
Runtime and Memory Results
Circuit  Time (fs)  Time % (k=4)  Time % (k=6)
C1355    39.28      1.66          1.80
C1908    54.68      2.40          2.50
C2670    11.77      4.20          4.66
C432     4.91       1.25          1.45
C499     2.41       1.20          1.31
C880     2.05       0.70          0.72
C3540    835.64     25.25         27.45
dalu     210.09     6.23          7.12
i10      332.22     8.56          9.21
b01_C    0.03       0.05
b03_C    0.19       0.20
b04_C    12.15      1.47          1.66
b05_C    24.50      2.43          2.50
b06_C    0.04
b07_C    4.04       0.84          0.86
b08_C    0.30
b09_C    0.37       0.26          0.27
b10_C    0.40       0.30          0.32
b11_C    9.97       0.26          0.34
b12_C    81.72      3.23          3.55
b13_C    0.28       0.20          0.21
AVG      -          0.037         0.041

[Memory columns, in source order; row-to-circuit alignment not preserved]
Mem (fs)  Mem (k=4)  Mem (k=6)
312732    3066
106288    3066
172718    4088
289226    6132
99134     8176
73584     2044       3066
321746    8176
499758    11242      12264
507934    4088
1022
117530    2044       3066
25550     5110
1022
26572     2044
1022
2044      1022
22484     2044
15330     4088
2044      1022
-         0.028      0.032
- Runtime is about 25X better than full_simplify
- Memory utilization is about 33X better than full_simplify
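The quoted speedups can be related to the AVG rows under one assumption that the slide does not state: that the AVG entries are average ratios of the ACODC cost to the full_simplify cost. Inverting them gives:

```latex
\[
\text{runtime: } \frac{1}{0.037} \approx 27.0 \ (k=4), \quad \frac{1}{0.041} \approx 24.4 \ (k=6)
\]
\[
\text{memory: } \frac{1}{0.028} \approx 35.7 \ (k=4), \quad \frac{1}{0.032} \approx 31.25 \ (k=6)
\]
```

Under that reading, the "about 25X" and "about 33X" figures fall between the k=4 and k=6 values.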
19
Results for Large Circuits
- full_simplify did not complete for all the examples below
- k = 4 for these experiments
Circuit  #Nodes  #Literals  Node%  Lit%   Time(s)  Mem
C6288    2416    4800       4.39   3.69   3.65     1022
C7552    234666098          40.68  26.15  6.11     9198
b14      9768    18917      17.15  10.12  117.60   105582
b14_1    6570    12886      20.03  9.56   19.04    50078
b20      19683   38213      18.64  8.68   65.39    69496
b20_1    13900   27074      18.96  8.95   39.34    34748
b21      20028   38993      17.91  8.48   66.37    65408
b21_1    13899   27164      17.56  9.45   38.32    34748
AVG      -       -          18.77  9.49   -        -
- Maximum runtime < 2 minutes
- Peak memory utilization < 106K BDD nodes
20
Possible Extensions
- Can compute AODCs in a similar fashion
  - Yields more flexibility at a node
  - However, each node must be minimized after its AODC computation
  - Compatibility not maintained
  - Useful if only node minimization is desired
- Compatibility is useful if the nodes are to be optimized simultaneously at a later stage
- Proof of correctness is similar
21
Conclusions
- Presented a robust technique for ACODC computation
  - Dynamic extraction of sub-networks to compute CODCs
  - ACODCs computed exactly once for a node
- 19% reduction in node count and 9.5% reduction in literal count (large circuits)
- 23% reduction in literal count as compared to 28.5% for full_simplify (medium circuits)
- 25X better run-time than full_simplify
- 33X better memory utilization than full_simplify