Synthesis of the Optimal 4-bit Reversible Circuits
Dmitri Maslov (speaker), University of Waterloo, Waterloo, ON, Canada
Oleg Golubitsky, Google Inc., Waterloo, ON, Canada
Sean Falconer, Stanford University, Stanford, CA
Basic Definitions
Gates: NOT, CNOT, Toffoli, Toffoli-4.
A reversible circuit is a string of gates.
A reversible n-bit function is a permutation of $2^n$ elements.
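The definitions above can be made concrete with a small sketch. It assumes a reversible function is stored as a tuple `p` where `p[x]` is the output for input `x`, and that a gate is named by a hypothetical `(target, controls)` pair; these names are illustrative and do not come from the talk.

```python
from itertools import combinations

N_BITS = 4  # the talk's setting: 4-bit reversible functions

def gate(target, controls, n=N_BITS):
    """Truth table of a multiple-control Toffoli gate as a permutation of range(2**n)."""
    mask = sum(1 << c for c in controls)
    # Flip the target bit exactly when every control bit is 1.
    return tuple(x ^ (1 << target) if (x & mask) == mask else x
                 for x in range(2 ** n))

def all_gates(n=N_BITS):
    """0 controls: NOT, 1: CNOT, 2: Toffoli, 3: Toffoli-4."""
    gates = {}
    for target in range(n):
        others = [w for w in range(n) if w != target]
        for k in range(n):
            for controls in combinations(others, k):
                gates[(target, controls)] = gate(target, controls, n)
    return gates

def apply_circuit(circuit, n=N_BITS):
    """A circuit is a list (string) of gates, applied left to right."""
    f = tuple(range(2 ** n))
    for target, controls in circuit:
        g = gate(target, controls, n)
        f = tuple(g[y] for y in f)
    return f

GATES = all_gates()
example = apply_circuit([(0, ()), (1, (0,)), (3, (0, 1, 2))])
```

Enumerating every target with every subset of the remaining wires as controls yields the gate library; `example` is the permutation of 16 elements computed by a three-gate circuit.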
Problem
Synthesize optimal 4-bit reversible circuits, i.e., circuits containing the minimal number of gates.
Complexity
-- There are 16! = 20,922,789,888,000 reversible functions.
-- There are 32 gates.
-- An average optimal circuit requires [...] gates.
Therefore: 20,922,789,888,000 * $\log_2 32$ * [...] bits > 100 TB.
Murphy, David. "Western Digital Launches World-First 2TB Hard Drive". PC World. Retrieved
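The storage estimate can be checked with a few lines of arithmetic. This is a sketch only: the average gate count is kept as a parameter `avg_gates` (a hypothetical name), and a terabyte is assumed to be 10^12 bytes.

```python
import math

functions = math.factorial(16)   # number of 4-bit reversible functions
bits_per_gate = math.log2(32)    # 5 bits name one of the 32 gates

def storage_terabytes(avg_gates):
    """Naive storage for one circuit per function, in TB (assuming 1 TB = 1e12 bytes)."""
    total_bits = functions * bits_per_gate * avg_gates
    return total_bits / 8 / 1e12

per_gate_position = storage_terabytes(1)  # cost of one gate position across all functions
```

Under these assumptions each gate position costs roughly 13 TB across all functions, so any average of 8 or more gates per circuit already exceeds 100 TB.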
Importance
-- Library for physicists interested in performing a small experiment but having very limited control over their system.
-- Indispensable for peephole optimization methods. Peephole optimizations are an important part of any modern compiler.
-- Mathematical curiosity: computing the value of Shannon's complexity function. L(3) = 8, 14 <= L(4) <= 17, L(5) = ?
Solution
Rough complexity analysis
-- space:
-- time:
Denote N = 16! (formally, N := $(2^n)!$).
Next, reduce these complexity figures to something manageable.
Solution
Rough complexity analysis
-- space:
-- time:
Optimization 1
Synthesize and save only halves of all optimal circuits. An optimal circuit for any function may be found by searching for both of its halves.
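The idea of searching for both halves can be sketched on 3 bits, where everything fits in memory. This is an illustrative toy, not the authors' implementation: the function names are hypothetical, and the table is a plain Python dict built by breadth-first search.

```python
from itertools import combinations

def make_gates(n):
    """All NOT, CNOT, and Toffoli-type gates on n wires, as permutations."""
    gates = []
    for t in range(n):
        others = [w for w in range(n) if w != t]
        for k in range(n):
            for controls in combinations(others, k):
                mask = sum(1 << c for c in controls)
                gates.append(tuple(x ^ (1 << t) if (x & mask) == mask else x
                                   for x in range(2 ** n)))
    return gates

def compose(f, g):
    """Apply f first, then g."""
    return tuple(g[y] for y in f)

def inverse(f):
    inv = [0] * len(f)
    for x, y in enumerate(f):
        inv[y] = x
    return tuple(inv)

def bfs_sizes(n, max_gates):
    """Map every function needing at most max_gates gates to its optimal gate count."""
    gates = make_gates(n)
    identity = tuple(range(2 ** n))
    size = {identity: 0}
    frontier = [identity]
    for depth in range(1, max_gates + 1):
        next_frontier = []
        for f in frontier:
            for g in gates:
                h = compose(f, g)
                if h not in size:
                    size[h] = depth
                    next_frontier.append(h)
        frontier = next_frontier
    return size

def optimal_size(f, halves):
    """Minimum of size(a) + size(b) over all splits f = (a, then b) with both halves stored."""
    best = None
    for a, size_a in halves.items():
        b = compose(inverse(a), f)  # the remainder of f once a is peeled off the front
        size_b = halves.get(b)
        if size_b is not None and (best is None or size_a + size_b < best):
            best = size_a + size_b
    return best

halves = bfs_sizes(3, 4)  # store only circuits with up to 4 gates
```

Because every sub-circuit of an optimal circuit is itself optimal, a table of halves with up to k gates is enough to find optimal sizes up to 2k, at the price of one pass over the table per query.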
Solution
Rough complexity analysis
-- space:
-- time: soft
Optimization 2
Store optimal halves in a hash table.
Actual complexity is closer to
-- space:
-- time: soft
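A minimal sketch of such a table follows. It assumes each 4-bit function is packed into one 64-bit word (16 outputs of 4 bits each) and uses open addressing with linear probing; the class name, the packing layout, and the use of Python's built-in `hash` are illustrative assumptions, not details taken from the talk.

```python
def pack(f):
    """Pack a permutation of 16 elements into a 64-bit integer, 4 bits per output."""
    word = 0
    for x, y in enumerate(f):
        word |= y << (4 * x)
    return word

def unpack(word):
    return tuple((word >> (4 * x)) & 0xF for x in range(16))

class LinearProbingTable:
    """Open-addressing hash table: on a collision, step to the next slot."""

    def __init__(self, capacity):
        self.keys = [None] * capacity
        self.values = [None] * capacity
        self.count = 0

    def _slot(self, key):
        i = hash(key) % len(self.keys)
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % len(self.keys)
        return i

    def insert(self, key, value):
        i = self._slot(key)
        if self.keys[i] is None:
            if self.count + 1 >= len(self.keys):
                raise MemoryError("table full; keep at least one empty slot")
            self.keys[i] = key
            self.count += 1
        self.values[i] = value

    def get(self, key):
        i = self._slot(key)
        return self.values[i] if self.keys[i] == key else None

    def load_factor(self):
        return self.count / len(self.keys)

table = LinearProbingTable(capacity=64)
identity = tuple(range(16))
table.insert(pack(identity), 0)  # the identity needs 0 gates
```

Keeping one slot empty guarantees that every probe sequence terminates, and a lower load factor shortens the average probe length at the cost of memory.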
Solution
Optimization 3
Simultaneous input/output relabeling does not change the optimality of a circuit. Thus, we store a single canonical representative: the binary string that is least in lexicographic order.
In practice, a function has almost 4! = 24 different relabelings, which reduces the storage complexity by a factor of almost 24 and helps reduce the runtime.
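One way to compute such a representative is sketched below. It assumes a function is a tuple of 16 outputs and takes the lexicographically least tuple over all 24 wire relabelings; comparing tuples rather than packed binary strings is a simplification made for this example.

```python
from itertools import permutations

def relabel_value(x, sigma):
    """Move bit i of x to position sigma[i]."""
    y = 0
    for i, j in enumerate(sigma):
        if (x >> i) & 1:
            y |= 1 << j
    return y

def conjugate(f, sigma):
    """Relabel inputs and outputs of f simultaneously with the wire permutation sigma."""
    g = [0] * 16
    for x in range(16):
        g[relabel_value(x, sigma)] = relabel_value(f[x], sigma)
    return tuple(g)

def canonical(f):
    """Least relabeling of f in lexicographic order."""
    return min(conjugate(f, sigma) for sigma in permutations(range(4)))

# CNOT with target wire 0 and control wire 1.
cnot = tuple(x ^ 1 if x & 2 else x for x in range(16))
relabelings = {conjugate(cnot, sigma) for sigma in permutations(range(4))}
```

A highly symmetric function such as a single CNOT has fewer than 24 distinct relabelings, which is why the saving is "almost" rather than exactly a factor of 24.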
Solution
Optimization 4
If an optimal circuit is found for a function f, an optimal circuit for the inverse function, $f^{-1}$, can be obtained by reversing the optimal circuit for f.
In practice, a random f frequently differs from $f^{-1}$, which reduces the storage requirement by an additional factor of almost 2 and helps to further reduce the runtime.
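The reversal property holds because each NOT, CNOT, and Toffoli-type gate is its own inverse. A short sketch, using a hypothetical `(target, controls)` encoding of gates and a four-gate example circuit chosen for illustration:

```python
def gate(target, controls):
    """Multiple-control Toffoli gate on 4 wires as a permutation of range(16)."""
    mask = sum(1 << c for c in controls)
    return tuple(x ^ (1 << target) if (x & mask) == mask else x for x in range(16))

def run(circuit):
    """Apply the gates of a circuit left to right, starting from the identity."""
    f = tuple(range(16))
    for target, controls in circuit:
        g = gate(target, controls)
        f = tuple(g[y] for y in f)
    return f

def inverse(f):
    inv = [0] * len(f)
    for x, y in enumerate(f):
        inv[y] = x
    return tuple(inv)

circuit = [(0, ()), (1, (0,)), (2, (0, 1)), (3, (0, 1, 2))]
f = run(circuit)            # this particular circuit computes x - 1 mod 16
f_inv = run(circuit[::-1])  # the reversed circuit computes x + 1 mod 16
```

Since the reversed circuit has the same number of gates, only one of f and its inverse needs to be stored.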
Performance
Parameters of the linear hash table storing canonical representatives:

k              7        8       9
Size
Memory usage   256 MB   2 GB    32 GB
Load factor

Using a high-performance server with 16 AMD Opteron 2300 MHz processors, 64 GB RAM, and a Seagate Barracuda ES2 SCSI 7200 RPM HDD running Linux, it took 10,549 seconds (under 3 hours) to synthesize all optimal circuits with up to 9 gates.
Performance
Table (columns: Size, Functions).
Synthesis of 10,000,000 random functions (Fisher-Yates shuffle over the Mersenne Twister random number generator) took 104,[...] seconds (about 29 hours) of user time, with the maximal memory usage of [...] GB. Loading optimal circuits with up to 9 gates into RAM took 1111 seconds. On average, it took only [...] seconds to synthesize an optimal circuit. A 5400 RPM HDD access time may be expected to be on the order of 0.01 to 0.02 seconds.
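Drawing a uniformly random reversible function this way can be sketched as follows. CPython's `random.Random` is a Mersenne Twister (MT19937) generator; the seed and the function name are arbitrary choices for this example.

```python
import random

def random_reversible_function(rng, n_bits=4):
    """Fisher-Yates shuffle: a uniformly random permutation of range(2**n_bits)."""
    perm = list(range(2 ** n_bits))
    for i in range(len(perm) - 1, 0, -1):
        j = rng.randrange(i + 1)  # uniform index in 0..i
        perm[i], perm[j] = perm[j], perm[i]
    return tuple(perm)

rng = random.Random(2010)  # arbitrary seed, for reproducibility only
sample = [random_reversible_function(rng) for _ in range(5)]
```

Each of the 16! permutations is equally likely, provided the underlying generator is uniform.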
Performance
Figure: distribution of the number of functions requiring a circuit of a specified size (gate count).
Performance
Distribution of the number of linear functions requiring a circuit of a specified size (gate count). It took under 2 seconds to synthesize all these circuits.
Table (columns: Size, Functions). WA: [...]; Total: 322,560
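The total of 322,560 is consistent with counting the functions computable by NOT and CNOT gates alone, assuming these are exactly the maps $x \mapsto Ax \oplus b$ with $A$ an invertible $4 \times 4$ matrix over GF(2) and $b$ any 4-bit vector:

```latex
\[
\underbrace{2^4}_{\text{choices of } b} \cdot
\underbrace{(2^4-1)(2^4-2)(2^4-4)(2^4-8)}_{|GL(4,2)|}
= 16 \cdot 15 \cdot 14 \cdot 12 \cdot 8
= 16 \cdot 20160
= 322{,}560 .
\]
```

The four factors count the choices for successive columns of $A$, each of which must lie outside the span of the columns already chosen.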
Future directions
Larger circuits
-- There are 80 transformations resulting from the application of all possible Toffoli-type gates on 5 bits.
-- [...] * ($\log_2 80$)/5!/2 ~ 7.1 billion bits, which fits into RAM.
-- [...] = 12, meaning it is reasonable to expect that extending the search for optimal 4-bit reversible circuits will make it possible to find optimal 5-bit reversible circuits with up to 12 gates.
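The gate count and the shape of the storage expression can be reproduced with a short sketch. The function names are hypothetical, and the number of functions to store is kept as a parameter rather than assumed.

```python
import math

def count_toffoli_type_gates(n):
    """One target wire, and any subset of the other n - 1 wires as controls."""
    return sum(math.comb(n - 1, k) for _target in range(n) for k in range(n))

def storage_bits(num_functions, n):
    """Mirror of the slide's expression: functions * log2(gates) / n! / 2."""
    gates = count_toffoli_type_gates(n)
    return num_functions * math.log2(gates) / math.factorial(n) / 2

gates_4 = count_toffoli_type_gates(4)
gates_5 = count_toffoli_type_gates(5)
reduction = math.factorial(5) * 2  # wire relabelings times inversion
```

The count equals n * 2^(n-1), and the divisor 5! * 2 = 240 plays the role that 4! * 2 plays for the 4-bit relabeling and inversion optimizations.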
Future directions
Optimal circuits using other cost metrics
This search can be easily extended to account for other cost metrics:
-- Weighted gate count optimal circuits: organize the breadth-first search such that a gate with cost G is assigned to a circuit of cost C at iteration number G+C.
-- Depth optimal circuits: choose a different set of elementary transformations, e.g., the circuit NOT(a)CNOT(b,c) is now an elementary transformation.
-- Depth optimal weighted gate circuits: combine the previous two modifications.
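The weighted variant can be sketched as a bucketed breadth-first search: bucket number C holds the functions whose best known cost is C, and appending a gate of cost G files the result under bucket G+C. The example below runs on 3 bits with made-up integer costs (NOT = 1, CNOT = 1, Toffoli = 5); the costs, names, and the 3-bit setting are assumptions for illustration.

```python
from itertools import combinations

def make_weighted_gates(n, cost_by_controls):
    """Pairs (permutation, cost); the cost depends on the number of controls."""
    gates = []
    for target in range(n):
        others = [w for w in range(n) if w != target]
        for k in range(n):
            for controls in combinations(others, k):
                mask = sum(1 << c for c in controls)
                perm = tuple(x ^ (1 << target) if (x & mask) == mask else x
                             for x in range(2 ** n))
                gates.append((perm, cost_by_controls[k]))
    return gates

def weighted_search(n, cost_by_controls, max_cost):
    """Minimal weighted cost of every function reachable within max_cost.

    Requires positive integer gate costs.
    """
    gates = make_weighted_gates(n, cost_by_controls)
    identity = tuple(range(2 ** n))
    best = {identity: 0}
    buckets = [[] for _ in range(max_cost + 1)]
    buckets[0].append(identity)
    for cost in range(max_cost + 1):       # iteration number = circuit cost
        for f in buckets[cost]:
            if best[f] != cost:
                continue                   # stale entry: a cheaper circuit was found earlier
            for perm, gate_cost in gates:
                new_cost = cost + gate_cost
                if new_cost > max_cost:
                    continue
                h = tuple(perm[y] for y in f)
                if new_cost < best.get(h, max_cost + 1):
                    best[h] = new_cost
                    buckets[new_cost].append(h)
    return best

costs = weighted_search(3, {0: 1, 1: 1, 2: 5}, max_cost=6)
```

With all costs equal to 1 this reduces to the ordinary breadth-first search by gate count.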
END Questions?
$(2^{10})!$ = [...]
Classically: $(2^{10})!$/(Lifespan_of_universe_in_Planck_time_units * Estimated_number_of_atoms) ~ [...] universes!
Work in progress
Synthesize all optimal 4-bit circuits
-- Store circuits with up to 9 gates as is done now.
-- Store a bit vector (~250 GB) for canonical representatives of circuits with 10, 11, 12, 13, and 14 gates, one size at a time.
-- Use a minimal number of uploads/downloads of parts of each such vector into RAM.