Seok-Won Seong and Prabhat Mishra, University of Florida
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 27, No. 4, April 2008
Rahul Sridharan
1 of 25
Outline
Motivation
Background
  ◦ Code compression using bitmasks
Challenges in the bitmask-based approach
Application-aware code compression
  ◦ Mask selection
  ◦ Bitmask-aware dictionary selection
  ◦ Code compression algorithm
Results
Conclusion
2 of 25
Motivation
Bitmask-based code compression
  ◦ Addresses memory constraints in embedded systems, which limit code size
  ◦ Reducing code size can also improve power and performance
Application-aware code compression algorithm
  ◦ Improves compression efficiency without introducing a decompression penalty
3 of 25
Background: Code Compression
[Figure: Application Program (Binary) → Compression Algorithm → Compressed Code (Memory) → Decompression Engine → Processor (Fetch and Execute). Compression is static encoding (offline); decompression is dynamic decoding (online).]
4 of 25
Dictionary-based
  ◦ Frequency-based dictionary selection
  ◦ Format for uncompressed code (32-bit code): Decision (1 bit), Uncompressed Data (32 bits)
  ◦ Format for compressed code: Decision (1 bit), Dictionary Index
Hamming-distance-based
  ◦ Remembers mismatches against a dictionary entry
  ◦ Format for uncompressed code: Decision (1 bit), Uncompressed Data (32 bits)
  ◦ Format for compressed code: Decision (1 bit), # of Bit Changes, Location (5 bits), Location (5 bits), …, Dictionary Index (5 location bits address any of the 32 bit positions)
Bitmask-based
5 of 25
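A minimal Python sketch of the Hamming-distance format, assuming 32-bit words, a "# of Bit Changes" field of 2 bits (the slide does not give its width, so `count_bits` is a hypothetical parameter), and a dictionary index of ceil(log2(dict_size)) bits. It shows why remembering mismatches gets expensive: every changed bit adds a 5-bit location.

```python
import math

WORD_BITS = 32
LOCATION_BITS = 5  # log2(32): enough to name any bit position in a 32-bit word

def mismatch_positions(word: int, entry: int) -> list[int]:
    """Bit positions (0 = LSB) where `word` differs from a dictionary entry."""
    diff = (word ^ entry) & 0xFFFFFFFF
    return [i for i in range(WORD_BITS) if (diff >> i) & 1]

def encoded_bits(word: int, entry: int, dict_size: int, count_bits: int = 2) -> int:
    """Size in bits of `word` under the Hamming-distance format (sketch).

    Compressed: decision bit + '# of bit changes' field + one 5-bit location
    per changed bit + dictionary index. Falls back to uncompressed
    (decision bit + 32 data bits) when the change count does not fit in the
    assumed count field or when compression would not save bits.
    """
    index_bits = max(1, math.ceil(math.log2(dict_size)))
    changes = mismatch_positions(word, entry)
    uncompressed = 1 + WORD_BITS
    if len(changes) > (1 << count_bits) - 1:
        return uncompressed
    compressed = 1 + count_bits + LOCATION_BITS * len(changes) + index_bits
    return min(compressed, uncompressed)
```

With a 16-entry dictionary, an exact match costs 7 bits and one changed bit costs 12 bits under these assumptions; four or more changes fall back to 33 bits.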
Bitmask Encoding (32-bit instructions)
Format for uncompressed code: Decision (1 bit), Uncompressed Data (32 bits)
Format for compressed code: Decision (1 bit), Decision (1 bit), Number of Masks, then for each mask: Mask Type, Location, Mask Pattern (…), then Dictionary Index
  ◦ Mask Type: type of the mask, e.g., 2-bit, 4-bit, etc.
  ◦ Location: where to apply the bitmask
  ◦ Mask Pattern: the actual mask pattern
6 of 25
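The compressed format can be sketched as follows, assuming fixed-position 4-bit masks only (so no Mask Type field is needed, `type_bits=0`), a 1-bit Number of Masks field for one or two masks, and the hypothetical helper names `fixed_masks`, `apply_masks`, and `compressed_size`. The encoder XORs the instruction with a dictionary entry and records each nonzero 4-bit window; the decoder XORs the patterns back into the entry.

```python
import math

WORD_BITS = 32

def fixed_masks(word: int, entry: int, size: int = 4, max_masks: int = 2):
    """Cover the bits where `word` differs from `entry` with fixed-position
    `size`-bit masks (windows aligned to multiples of `size`).
    Returns a list of (location, pattern) pairs, or None if more than
    `max_masks` windows would be needed."""
    diff = (word ^ entry) & 0xFFFFFFFF
    window = (1 << size) - 1
    masks = []
    for loc in range(WORD_BITS // size):
        pattern = (diff >> (loc * size)) & window
        if pattern:
            masks.append((loc, pattern))
    return masks if len(masks) <= max_masks else None

def apply_masks(entry: int, masks, size: int = 4) -> int:
    """Decompression side: XOR each mask pattern back into the dictionary entry."""
    for loc, pattern in masks:
        entry ^= pattern << (loc * size)
    return entry

def compressed_size(num_masks: int, size: int, dict_size: int,
                    count_bits: int = 1, type_bits: int = 0) -> int:
    """2 decision bits + Number of Masks field + per-mask (type, location,
    pattern) + dictionary index. count_bits and type_bits are assumptions."""
    location_bits = int(math.log2(WORD_BITS // size))  # 8 fixed windows -> 3 bits
    index_bits = max(1, math.ceil(math.log2(dict_size)))
    return 2 + count_bits + num_masks * (type_bits + location_bits + size) + index_bits
```

For example, a word that differs from its dictionary entry only in bits 9 and 11 needs one mask at window 2 with pattern 0xA: 14 bits with a 16-entry dictionary under these assumptions, versus 33 bits uncompressed.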
Code Compression with Bitmasks
[Figure: original program, compressed program, and dictionary (Index, Entry)]
First decision bit: 0 – compressed, 1 – not compressed
Second decision bit: 0 – bitmask used, 1 – no bitmask used
Each bitmask records its position and its value
7 of 25
Challenges in Bitmask-based Compression
Selection of an appropriate mask pattern
  ◦ A larger bitmask generates more matches: a 4-bit mask can encode 2^4 = 16 mismatch patterns, an 8-bit mask 2^8 = 256
  ◦ A larger bitmask also incurs a higher cost: a 4-bit mask costs 7 bits (4 pattern + 3 location), an 8-bit mask costs 10 bits (8 pattern + 2 location)
Efficient dictionary selection
  ◦ Frequency-based selection is not always optimal
Efficient masking and dictionary selection schemes are needed to improve compression efficiency
8 of 25
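The two costs on this slide are consistent with a fixed mask storing its pattern plus log2(32 / size) location bits: 4 + 3 = 7 and 8 + 2 = 10. The sketch below assumes that reading and ignores any Mask Type field; a sliding mask, which may start at any bit, is assumed to need a full 5-bit location. The function names are hypothetical.

```python
import math

def mask_patterns(size: int) -> int:
    """Number of distinct mismatch patterns an s-bit mask can express."""
    return 2 ** size

def fixed_mask_cost(size: int, word_bits: int = 32) -> int:
    """Pattern bits plus enough bits to pick one of word_bits // size aligned windows."""
    return size + int(math.log2(word_bits // size))

def sliding_mask_cost(size: int, word_bits: int = 32) -> int:
    """Pattern bits plus a full location field (5 bits for a 32-bit word)."""
    return size + math.ceil(math.log2(word_bits))
```

The trade-off is visible directly: doubling the mask from 4 to 8 bits multiplies the patterns it can express by 16 but adds 3 bits to every use.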
Frequency vs. Spanning-based Dictionary Selection
Frequency-based dictionary selection: CR = 97.5%
Spanning-based dictionary selection: CR = 87.5%
(CR = compressed size / original size; lower is better)
9 of 25
Application-Aware Code Compression
Bitmask selection
Bitmask-aware dictionary selection
  ◦ An NP-hard problem
Code compression algorithm
  ◦ Combines the two approaches above
10 of 25
Mask Selection
How many bitmask patterns are needed? Which of them are profitable?
Fixed and sliding bitmask patterns:
Mask size   Fixed   Sliding
1 bit               X
2 bits      X       X
3 bits              X
4 bits      X       X
5 bits              X
6 bits              X
7 bits              X
8 bits      X       X
[Figure: bit changes and mask pattern sizes from 1 to 32 bits]
11 of 25
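The difference between fixed and sliding placement can be sketched as follows (hypothetical helpers `mask_positions` and `covering_mask`, 32-bit words assumed). A fixed mask may start only at multiples of its size, so it needs fewer location bits but misses changes that straddle a window boundary; a sliding mask may start anywhere.

```python
WORD_BITS = 32

def mask_positions(size: int, fixed: bool, word_bits: int = WORD_BITS) -> list[int]:
    """Start bit positions at which a `size`-bit mask may be placed."""
    if fixed:
        if word_bits % size:
            return []  # fixed placement only makes sense when size divides 32
        return list(range(0, word_bits, size))
    return list(range(word_bits - size + 1))  # sliding: any start that fits

def covering_mask(diff: int, size: int, fixed: bool):
    """Return (start, pattern) for one mask covering every set bit of `diff`
    (the XOR of an instruction and its dictionary entry), or None."""
    for start in mask_positions(size, fixed):
        window = ((1 << size) - 1) << start
        if diff & ~window == 0:  # no mismatched bit outside this window
            return start, (diff >> start) & ((1 << size) - 1)
    return None
```

Two changed bits at positions 3 and 4 straddle the fixed 4-bit windows 0-3 and 4-7, so only the sliding 4-bit mask (or a sliding 2-bit mask starting at bit 3) covers them.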
Mask Selection
Bits needed to indicate a particular mask location depend on
  ◦ Size of the mask
  ◦ Type of the mask (fixed or sliding)
Number of bitmask patterns needed
  ◦ Up to two mask patterns
  ◦ Storing three bitmasks for a 32-bit vector costs too many bits to be profitable
Which combinations are profitable?
  ◦ Eleven possibilities: 1s, 2s, 2f, 3s, 4s, 4f, 5s, 6s, 7s, 8s, 8f (s = sliding, f = fixed)
  ◦ Select one or two of the eleven possibilities
  ◦ The number of combinations can be further reduced
12 of 25
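The eleven options can be reproduced with a simple rule, assuming every size from 1 to 8 bits can slide and a fixed variant exists only when the size divides 32 (1f is omitted because a 1-bit fixed mask is identical to 1s). Counting one or two distinct mask types then gives 11 + C(11, 2) = 66 candidate combinations; this count is an illustration of the search space, not a figure from the slides.

```python
from itertools import combinations

WORD_BITS = 32

def mask_options(max_size: int = 8, word_bits: int = WORD_BITS) -> list[str]:
    """Candidate mask types: '<size>s' for sliding, '<size>f' for fixed."""
    options = []
    for size in range(1, max_size + 1):
        options.append(f"{size}s")          # every size can slide
        if size > 1 and word_bits % size == 0:
            options.append(f"{size}f")      # fixed only when size divides 32
    return options

opts = mask_options()
# One or two distinct mask types per encoding (assumption).
choices = [combo for k in (1, 2) for combo in combinations(opts, k)]
```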
Comparison of Bitmask Combinations
Benchmarks are compiled for TI TMS320C6x
(1s, 4f) and (2s, 2f) provide the best compression
13 of 25
Mask Selection: Observations
Factors of 32 (1, 2, 4, and 8) produce better results
  ◦ They can be applied cost-effectively at fixed locations
8-bit fixed/sliding masks are not helpful
  ◦ The probability of more than 4 consecutive changes is low
  ◦ Two smaller masks perform better than one larger mask
4-bit sliding does not perform better than 4-bit fixed
Two bitmasks provide better results than a single one
Choose two from four bitmasks: (1s, 2f, 2s, 4f)
Mask size   Fixed   Sliding
1 bit               X
2 bits      X       X
4 bits      X
14 of 25
Dictionary Selection
Static and dynamic approaches:
  ◦ Frequency: select the most frequently occurring binary patterns
  ◦ Spanning: select patterns that ensure uniform coverage of all patterns, based on Hamming distance
  ◦ Bit savings: select patterns based on the bit savings from self repetitions and mask-matched repetitions
15 of 25
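Frequency-based selection is the simplest of the three; a minimal sketch (hypothetical function name `frequency_dictionary`) keeps the `dict_size` most common 32-bit words. Ties are broken by first occurrence, which is how `collections.Counter.most_common` orders equal counts.

```python
from collections import Counter

def frequency_dictionary(program: list[int], dict_size: int) -> list[int]:
    """Frequency-based selection: the dict_size most frequent 32-bit words."""
    return [word for word, _ in Counter(program).most_common(dict_size)]
```

Because it counts only exact repetitions, this approach ignores words that differ from a dictionary entry by a few bits and could still be matched with a bitmask, which is what the bit-savings approach accounts for.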
BitSavings-based Dictionary Selection
Graph nodes (node weight in parentheses): A(0), B(7), C(7), D(0), E(0), F(7), G(14)
Total weights: A = 0 + 10 = 10, B = 7 + 15 = 22, C = 7 + 15 = 22, D = 0 + 5 = 5, E = 0 + 15 = 15, F = 7 + 20 = 27, G = 14 + 10 = 24
Node weight: number of bits saved due to the frequency of the pattern
Edge weight: number of bits saved by using a bitmask-based match
Total weight: node weight + the weights of all edges connected to the node
F has the largest total weight (27) and is selected first
17 of 25
BitSavings-based Dictionary Selection (continued)
Nodes C, E, and F have been removed from the graph
Remaining nodes: A(0), B(7), D(0), G(14)
Total weights: A = 0 + 10 = 10, B = 7 + 15 = 22, D = 0 + 5 = 5, G = 14 + 10 = 24
G now has the largest total weight (24)
The process continues until the dictionary is full or the graph is empty
18 of 25
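The greedy loop can be sketched in Python. Two things here are assumptions, not taken from the slides: the individual edge weights (the slides give only per-node totals, so the hypothetical edges below were chosen to reproduce every total shown), and the rule that a selected node is removed together with its neighbors, which matches C and E disappearing along with F in the example.

```python
def total_weight(node, node_weight, edges, remaining):
    """Node weight plus the weights of all edges to nodes still in the graph."""
    return node_weight[node] + sum(
        w for pair, w in edges.items() if node in pair and pair <= remaining)

def bitsavings_select(node_weight, edges, dict_size):
    """Greedy bit-savings selection (sketch): pick the highest total weight,
    then remove it and its neighbors; stop when full or the graph is empty."""
    remaining = set(node_weight)
    dictionary = []
    while remaining and len(dictionary) < dict_size:
        best = max(sorted(remaining),
                   key=lambda n: total_weight(n, node_weight, edges, remaining))
        dictionary.append(best)
        # Assumed rule: neighbors are covered by bitmask matches against `best`.
        removed = {best} | {m for pair in edges
                            if best in pair and pair <= remaining for m in pair}
        remaining -= removed
    return dictionary

NODE_WEIGHT = {"A": 0, "B": 7, "C": 7, "D": 0, "E": 0, "F": 7, "G": 14}
# Hypothetical edge weights chosen so the totals match the slide.
EDGES = {frozenset(pair): w for pair, w in [
    (("F", "C"), 10), (("F", "E"), 10), (("C", "E"), 5),
    (("A", "B"), 5), (("A", "G"), 5), (("B", "G"), 5), (("B", "D"), 5)]}
```

With a two-entry dictionary this picks F and then G; with room for more, D is added last once it has no remaining neighbors.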
Application-Aware Code Compression
19 of 25
Experiments
Experimental setup
  ◦ Benchmarks: TI and MediaBench
  ◦ Architectures: SPARC, TI TMS320C6x, MIPS
Results
  ◦ BCC (bitmask-based code compression): customized encodings for different architectures, effects of dictionary size selection, comparison with existing techniques
  ◦ ACC (application-aware code compression): bitmask selection, dictionary selection
20 of 25
Compression Ratio for adpcm_en
Encoding 1: one 8-bit mask
Encoding 2: two 4-bit masks
Encoding 3: one 4-bit and one 8-bit mask
Encoding 2 outperforms the others
21 of 25
Comparison with Other Techniques
The bitmask approach outperforms other dictionary-based techniques by 15%
Higher decompression bandwidth than existing compression techniques
Smaller compression ratio is better
22 of 25
Comparison of Dictionary Selection Methods
The BitSavings approach outperforms both frequency- and spanning-based techniques
23 of 25
Compression Ratio Comparison
BCC yields a 15-20% improvement over other techniques
ACC outperforms BCC by another 5-10%
BCC: Bitmask-based Code Compression
ACC: Application-aware Code Compression
24 of 25
??? 25 of 25