Parallel Deposit (bit scatter): deposits in the result register, at positions flagged by 1’s in r3, the right-justified bits from r2. Yedidya Hilewitz.
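For experimenting with the semantics described above, here is a minimal bit-serial Python model of parallel deposit. It is only a reference sketch, not the proposed hardware. The function name `pdep`, the `width` parameter and its 64-bit default are illustrative assumptions. The loop also visits one bit position per iteration, whereas the proposed instruction performs the whole deposit at once.

```python
def pdep(src: int, mask: int, width: int = 64) -> int:
    """Bit-serial model of parallel deposit (bit scatter).

    The right-justified bits of `src` are deposited, lowest first, at the
    bit positions where `mask` has a 1; every other result bit is 0.
    """
    result = 0
    k = 0  # index of the next right-justified source bit to deposit
    for pos in range(width):
        if (mask >> pos) & 1:  # position flagged by a 1 in the mask
            if (src >> k) & 1:
                result |= 1 << pos
            k += 1
    return result


# Scatter the low four bits of 0b1011 to bit positions 0, 2, 4 and 6.
print(bin(pdep(0b1011, 0b0101_0101)))  # 0b1000101
```

Bits of `src` beyond the number of 1's in `mask` are simply ignored, which matches the "right-justified bits" wording.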


Advanced Bit Manipulation Instructions for Commodity Processors
Yedidya Hilewitz and Ruby B. Lee
Princeton Architecture Laboratory for Multimedia and Security
Department of Electrical Engineering, Princeton University

Background and Motivation
• Advanced bit manipulations are not well supported by commodity microprocessors.
• These operations are performed using "programming tricks" (see Hacker’s Delight).
• Bit manipulations play a role in applications of increasing importance.
• We propose adding direct support for a few key bit manipulation operations to accelerate these applications.

Outline
• Example Applications
• New Instructions
  - Butterfly and Inverse Butterfly
  - Parallel Extract and Parallel Deposit
  - Bit Matrix Multiply
• Summary and Conclusions
• Ongoing and Future Work

Applications (and Speedup)
• Cryptography
• Random number generation
  - Von Neumann Extractor
  - Toeplitz Matrix Multiply
• Steganography
• Cryptanalysis (Gaussian elimination)
• Other applications:
  - Binary compression
  - Binary image morphology
  - Bioinformatics
  - Communications coding
  - FFT
  - Finite field arithmetic
  - Integer compression
  - Pattern matching
  - Other applications suggested by you!
Speedups shown for these applications: up to 2.24×, 9.9×, 14.9× and 2.92×.

Example Applications
• Cryptography: permutations in ciphers and hash functions, e.g., TDES.
• Random number generators: extract bits from a source of entropy.
  - Von Neumann Extractor (Intel RNG): given a bit-pair sequence $\{x_{2i}, x_{2i+1}\}$ from the entropy pool, extract $x_{2i}$ if the two bits differ, i.e., if $x_{2i} \neq x_{2i+1}$.
  - Toeplitz Matrix Multiply Extractor: multiply a bit string from the entropy pool by a binary Toeplitz matrix.
• LSB Steganography: embed a secret message in the least significant bits of an image or audio file.

New Instructions
• Permutation: Butterfly and Inverse Butterfly
• Bit Gather and Bit Scatter: Parallel Extract and Parallel Deposit
• Bit Matrix Multiply
• Other bit manipulation instructions (not covered here):
  - Bit matrix transpose
  - Population count

Butterfly and Inverse Butterfly
• Butterfly: lg(n) stages of n 2:1 MUXes, split into n/2 pairs that either pass through or swap their inputs.
[Figure: butterfly network and inverse butterfly network]
• bfly + ibfly = general permutation network: any of the n! permutations of n bits can be done with one pass through both instructions.

Parallel Extract (bit gather)
• Extracts the bits of r2 flagged by 1’s in r3, then compresses and right-justifies them in the result register r1.
[Figure: pex example with source r2, mask r3 and result r1]

Parallel Deposit (bit scatter)
• Deposits in the result register r1, at the positions flagged by 1’s in r3, the right-justified bits from r2.
[Figure: pdep example with source r2, mask r3 and result r1]

References:
Yedidya Hilewitz and Ruby B. Lee, "Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors," to appear in Journal of VLSI Signal Processing Systems.
Yedidya Hilewitz and Ruby B. Lee, "Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions," Proceedings of the IEEE 17th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 65-72, September 11-13, 2006 (Best Paper Award).

Bit Matrix Multiply
• bmm.n C = B, A, where A, B and C are n × n bit matrices and C = A × B mod 2:
  for i from 1 to n
    for j from 1 to n
      $c_{i,j} = a_{i,1} b_{1,j} \oplus a_{i,2} b_{2,j} \oplus \cdots \oplus a_{i,n} b_{n,j}$
[Figure: bmm.8 unit]
• The bmm.8 unit can be directly incorporated into the ALU (<¼ size).
Reference: Yedidya Hilewitz and Ruby B. Lee, "Achieving Very Fast Bit Matrix Multiplication in Commodity Microprocessors," Princeton University Department of Electrical Engineering Technical Report CE-L2007-006, August 2007.

New Shifter Architecture
• A new shifter architecture replaces the conventional shifter with a unit that directly supports bit manipulation operations.
• The new shifter performs:
  - basic shifter operations: shift, rotate, extract and deposit
  - multimedia shift-permute operations: mix
  - advanced bit manipulation operations: bfly, ibfly, pex, pdep
References:
Yedidya Hilewitz and Ruby B. Lee, "A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations," to appear in IEEE Transactions on Computers.
Yedidya Hilewitz and Ruby B. Lee, "Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors," Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18), June 2007.

Summary and Conclusions
• Advanced bit manipulations play an important role in many applications.
• We have introduced a few select bit manipulation instructions that speed up these applications.
• We have evolved the shifter to a new design using butterfly and inverse butterfly datapaths to support basic and advanced bit manipulation instructions.
• Advanced bit manipulations are no longer esoteric "programming tricks": they can be supported directly by microprocessors at only a marginal cost.

Ongoing and Future Work
• Applications
  - Identify new applications where bit manipulation instructions are useful (e.g., LFSR and FCSR RNGs, software radio).
• Implementation
  - Refine the current circuit implementation.
  - Integrate the new shifter into a scalable crypto co-processor (PAX).
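The Von Neumann extractor described above maps naturally onto parallel extract. The approach is to build a mask with a 1 at each even position 2i whose pair differs, then gather those $x_{2i}$ bits with pex. The sketch below assumes that bits 2i and 2i+1 of one 64-bit word form the pair $\{x_{2i}, x_{2i+1}\}$. The names `pex`, `von_neumann_extract` and `EVEN_BITS` are hypothetical, and `pex` is a slow software stand-in for the proposed instruction. This is not a description of how the Intel RNG implements its extractor.

```python
from __future__ import annotations

EVEN_BITS = 0x5555_5555_5555_5555  # a 1 at every even bit position 0, 2, ..., 62


def pex(src: int, mask: int, width: int = 64) -> int:
    """Bit-serial model of parallel extract (bit gather): the bits of `src`
    flagged by 1s in `mask` are compressed and right-justified."""
    result = 0
    k = 0  # next output bit position
    for pos in range(width):
        if (mask >> pos) & 1:
            if (src >> pos) & 1:
                result |= 1 << k
            k += 1
    return result


def von_neumann_extract(pool_word: int) -> tuple[int, int]:
    """Treat bits 2i and 2i+1 of a 64-bit word as the pair {x_2i, x_2i+1}
    and keep x_2i whenever the two bits differ.

    Returns (extracted_bits, number_of_extracted_bits).
    """
    # Bit 2i of `diff` is x_2i XOR x_2i+1; the odd positions are masked off.
    diff = (pool_word ^ (pool_word >> 1)) & EVEN_BITS
    count = bin(diff).count("1")
    return pex(pool_word, diff), count


print(von_neumann_extract(0b10011100))  # (1, 2)
```

In the example, only the pairs at bits 4-5 and 6-7 differ, so $x_4 = 1$ and $x_6 = 0$ are kept, giving the two-bit result `0b01`. The count is needed because the number of output bits varies from word to word.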
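A Toeplitz matrix is constant along each diagonal, so an m × n binary Toeplitz matrix is fully described by m + n - 1 seed bits. The sketch below computes the Toeplitz extractor output y = T x mod 2 by taking the parity of each row ANDed with the input. It assumes the indexing convention T[i][j] = bit (i - j + n - 1) of a seed integer; other conventions are equally valid. The names `toeplitz_rows` and `toeplitz_extract` are hypothetical, and the loops are a plain software reference rather than an optimized implementation.

```python
from __future__ import annotations


def toeplitz_rows(seed: int, m: int, n: int) -> list[int]:
    """Rows (as n-bit masks) of the m x n binary Toeplitz matrix with
    T[i][j] = bit (i - j + n - 1) of `seed`; needs m + n - 1 seed bits.
    Entries with equal i - j share one seed bit, so each diagonal is constant."""
    rows = []
    for i in range(m):
        row = 0
        for j in range(n):
            if (seed >> (i - j + n - 1)) & 1:
                row |= 1 << j
        rows.append(row)
    return rows


def toeplitz_extract(x: int, rows: list[int]) -> int:
    """y = T x mod 2: output bit i is the parity of (row i AND x)."""
    y = 0
    for i, row in enumerate(rows):
        if bin(row & x).count("1") & 1:
            y |= 1 << i
    return y


rows = toeplitz_rows(seed=0b101101101, m=4, n=6)  # 4 + 6 - 1 = 9 seed bits
print(toeplitz_extract(0b110010, rows))
```

Because each diagonal is constant, bit j + 1 of row i + 1 equals bit j of row i. In other words, each row is the previous row shifted by one position, plus one new seed bit.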
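LSB steganography fits the deposit/extract pair well. Suppose eight 8-bit cover samples are packed into one 64-bit word. The mask 0x0101010101010101 then flags the least significant bit of each sample, so pdep can scatter eight message bits into those positions and pex can gather them back. The sketch below assumes little-endian packing and hides one message byte per eight cover bytes. The names `embed_byte`, `extract_byte` and `LSB_MASK` are hypothetical, and `pdep`/`pex` are slow software models rather than the proposed instructions.

```python
LSB_MASK = 0x0101_0101_0101_0101  # least significant bit of each packed byte


def pdep(src: int, mask: int, width: int = 64) -> int:
    """Bit-serial model of parallel deposit (bit scatter)."""
    result, k = 0, 0
    for pos in range(width):
        if (mask >> pos) & 1:
            if (src >> k) & 1:
                result |= 1 << pos
            k += 1
    return result


def pex(src: int, mask: int, width: int = 64) -> int:
    """Bit-serial model of parallel extract (bit gather)."""
    result, k = 0, 0
    for pos in range(width):
        if (mask >> pos) & 1:
            if (src >> pos) & 1:
                result |= 1 << k
            k += 1
    return result


def embed_byte(cover: bytes, secret: int) -> bytes:
    """Hide the 8 bits of `secret` in the LSBs of 8 cover bytes;
    bit i of `secret` replaces the LSB of cover byte i."""
    if len(cover) != 8 or not 0 <= secret < 256:
        raise ValueError("need exactly 8 cover bytes and a secret in 0..255")
    word = int.from_bytes(cover, "little")
    word = (word & ~LSB_MASK) | pdep(secret, LSB_MASK)  # clear LSBs, then scatter
    return word.to_bytes(8, "little")


def extract_byte(stego: bytes) -> int:
    """Gather the 8 hidden bits back from the LSBs of 8 stego bytes."""
    return pex(int.from_bytes(stego, "little"), LSB_MASK)


cover = bytes([200, 13, 54, 255, 0, 1, 128, 77])  # e.g. eight 8-bit pixel values
stego = embed_byte(cover, 0xA5)
print(list(stego), hex(extract_byte(stego)))
```

Each cover byte changes by at most its least significant bit, which is what keeps the embedding hard to notice in images or audio.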
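The following Python sketch illustrates the butterfly description by modeling an n-bit butterfly as lg(n) stages of n/2 swap-or-pass pairs. Several details are assumptions made for illustration: the pair distances (n/2, n/4, ..., 1 for bfly, and the mirror order 1, 2, ..., n/2 for ibfly), the one-control-word-per-stage encoding, and the function names. The sketch does not show how to compute the control bits for an arbitrary permutation.

```python
from __future__ import annotations


def _stage(x: int, n: int, d: int, ctrl: int) -> int:
    """One stage: bits i and i + d (for each i whose d-bit is 0) form a pair.
    The k-th pair swaps its two bits if bit k of `ctrl` is 1, else passes."""
    k = 0
    for i in range(n):
        if i & d:
            continue  # i is the upper member of a pair handled earlier
        if (ctrl >> k) & 1 and ((x >> i) & 1) != ((x >> (i + d)) & 1):
            x ^= (1 << i) | (1 << (i + d))  # swapping two unequal bits flips both
        k += 1
    return x


def bfly(x: int, n: int, ctrls: list[int]) -> int:
    """Butterfly: lg(n) stages, pair distance n/2, n/4, ..., 1 (assumed order)."""
    d = n // 2
    for ctrl in ctrls:
        x = _stage(x, n, d, ctrl)
        d //= 2
    return x


def ibfly(x: int, n: int, ctrls: list[int]) -> int:
    """Inverse butterfly: the mirror image, pair distance 1, 2, ..., n/2."""
    d = 1
    for ctrl in ctrls:
        x = _stage(x, n, d, ctrl)
        d *= 2
    return x


# With n = 8, swapping all four distance-4 pairs exchanges the two nibbles.
print(bin(bfly(0b0000_1111, 8, [0b1111, 0, 0])))  # 0b11110000
```

Every stage is its own inverse. As a result, running `ibfly` with the stage controls in reverse order undoes `bfly`, which provides a convenient sanity check for the model.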
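The bit matrix multiply definition can be checked against a small software model. The sketch below assumes each n × n bit matrix is stored as a list of n row bitmasks, with bit j of row i holding entry (i, j) and indices starting at 0 rather than 1. The name `bmm` is hypothetical, and the model ignores the `bmm.n C = B, A` operand ordering and register encoding. It relies on the fact that row i of C is the XOR of the rows of B selected by the 1 bits in row i of A.

```python
from __future__ import annotations


def bmm(a_rows: list[int], b_rows: list[int]) -> list[int]:
    """C = A x B mod 2 for n x n bit matrices stored as lists of row bitmasks.

    Bit j of row i holds entry (i, j), with indices starting at 0. Since
    c[i][j] = XOR over k of (a[i][k] AND b[k][j]), row i of C is the XOR of
    the rows k of B selected by the 1 bits of row i of A.
    """
    n = len(a_rows)
    c_rows = []
    for i in range(n):
        acc = 0
        for k in range(n):
            if (a_rows[i] >> k) & 1:
                acc ^= b_rows[k]
        c_rows.append(acc)
    return c_rows


print(bmm([0b11, 0b11], [0b11, 0b01]))  # [2, 2]
```

With n = 8, each row fits in one byte, so a whole 8 × 8 bit matrix fits in a single 64-bit register.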

