Coding Methods in Embedded Computing Wayne Wolf Dept. of Electrical Engineering Princeton University.

1 Coding Methods in Embedded Computing Wayne Wolf Dept. of Electrical Engineering Princeton University

2 © 2004 Embedded Systems Group Outline
- Lv/Henkel/Lekatsas/Wolf: Adaptive dictionary method for bus encoding
- Lin/Xie/Wolf: Dictionary coding for code compression

3 © 2004 Embedded Systems Group Adaptive bus encoding
- Goal: reduce bus energy
  - A significant part of energy is related to I/O
  - Significant impact of inter-wire capacitances
- Approach: explore data properties
  - Past success in address buses
  - Few approaches for data buses
- Results:
  - 28% average power reduction
  - One additional line for a 32-line bus
  - No additional cycles
  - Applies to both address and data buses

4 © 2004 Embedded Systems Group Related work
- Stan/Burleson [TVLSI95]: Bus-invert encoding
- Panda/Dutt [TVLSI99]: Reduce address bus switching by memory access exploitation
- Benini et al. [GLS-VLSI97]: T0 encoding
- Musoll et al. [TVLSI98]: Working zone encoding
- Sotiriadis/Chandrakasan [ICCAD00]: Transition pattern coding
- Kim et al. [DAC00]: Coupling-sensitive scheme

5 © 2004 Embedded Systems Group Bus model (I) [Figure: general two-line bus]

6 © 2004 Embedded Systems Group Bus model (II) Simplify bus model by quantizing energy values: 0, 1, 2. [Figure: bus circuit model with R, C_L, and C_I elements]

7 © 2004 Embedded Systems Group Bus model (III) Bus Energy for multiple line buses
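The bus-model figures are not in the transcript, so the following is only a rough sketch of what a quantized multi-line energy estimate could look like. It assumes a commonly used form: each line contributes one unit when it toggles, and each adjacent pair contributes a coupling cost of 0, 1, or 2 depending on how the two lines move relative to each other. The function name `bus_energy`, the `coupling_weight` parameter, and the cost rule itself are assumptions, not taken from the slides.

```python
def bus_energy(prev, curr, width=32, coupling_weight=1.0):
    """Estimate the relative energy of driving `curr` after `prev` on a bus.

    Assumed model (not from the slides):
      - self cost: 1 per line that toggles
      - coupling cost per adjacent pair: |delta_i - delta_j|, i.e.
        0 if both lines move together or neither moves,
        1 if only one of them toggles,
        2 if they toggle in opposite directions
    """
    # delta is +1 for a rising line, -1 for a falling line, 0 if unchanged
    deltas = [((curr >> bit) & 1) - ((prev >> bit) & 1) for bit in range(width)]
    self_cost = sum(abs(d) for d in deltas)
    coupling_cost = sum(abs(a - b) for a, b in zip(deltas, deltas[1:]))
    return self_cost + coupling_weight * coupling_cost


# Two lines switching in opposite directions cost more than two lines
# switching together, even though the Hamming distance is 2 in both cases.
print(bus_energy(0b01, 0b10, width=2))  # 4.0
print(bus_energy(0b00, 0b11, width=2))  # 2.0
```

The two calls have the same number of toggling lines but different totals, which is the effect a pure transition count does not capture.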

8 © 2004 Embedded Systems Group Source properties on data buses
- Correlation of transition signaling code on adjacent lines
- D(x) = n_x / N = transitions / total transactions
[Figure: D(x) versus bit number]
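As an illustration of the D(x) = n_x / N measure, here is a minimal sketch that computes the per-bit transition density of a trace of bus words. It assumes that the transition signaling code of a word is the XOR with the previous word, and that N counts consecutive word pairs; the function name `transition_density` is hypothetical.

```python
def transition_density(words, width=32):
    """Return D(x) for each bit position x: toggles on line x / transactions."""
    if len(words) < 2:
        raise ValueError("need at least two words to observe a transition")
    counts = [0] * width
    for prev, curr in zip(words, words[1:]):
        toggled = prev ^ curr  # transition signaling: 1 where the line changed
        for bit in range(width):
            counts[bit] += (toggled >> bit) & 1
    transactions = len(words) - 1
    return [count / transactions for count in counts]


trace = [0b00, 0b01, 0b11, 0b10]
print(transition_density(trace, width=2))  # bit 0 toggles 2 of 3 times, bit 1 once
```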

9 © 2004 Embedded Systems Group Source properties (II) Adjacent bit lines in a word are correlated.

10 © 2004 Embedded Systems Group Source properties (III) 10 most frequently-occurring patterns:

11 © 2004 Embedded Systems Group Energy savings from different compression schemes. Compare transition, interwire energy savings.

12 © 2004 Embedded Systems Group Dictionary techniques
- Look up symbol strings in dictionary; replace with shorter code.
- Types of dictionaries:
  - Static dictionary
  - Adaptive dictionary
[Figure: example dictionary entries 'is', 'the', 'are', 'do']

13 © 2004 Embedded Systems Group Approach
- Use a dictionary scheme to take advantage of frequent patterns.
- Word divided into key, index, and bypassed part.

14 © 2004 Embedded Systems Group Adaptive Dictionary Encode Scheme (ADES)

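15 © 2004 Embedded Systems Group Encoder miss [Figure: dictionary entries 0xFFFF000, 0x0000000, 0x1234FFE, 0xFEFA830; the input word 0xF1234FF0 is split into upper part, index part, and non-compressed part; the comparison (=?) of the upper part 0xF1234FF against the dictionary yields 0 for every entry]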

16 © 2004 Embedded Systems Group Decoder hit [Figure: dictionary entries 0xFFFF000, 0x0000000, 0x1234FFE, 0xFEFA830; entry 0x1234FFE is read and the output word is 0x1234FFEB]

17 © 2004 Embedded Systems Group Decoder miss [Figure: dictionary entries 0xFFFF000, 0x0000000, 0x1234FFE, 0xFEFA830; the received upper part 0xF1234FF is written into the dictionary and the output word is 0xF1234FF0]
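The encoder-miss, decoder-hit, and decoder-miss slides can be mimicked with a small software model. This is a simplified sketch, not the ADES hardware: it assumes a 4-entry dictionary searched associatively, a 28-bit upper part with a 4-bit bypassed part (as the example values suggest), and FIFO replacement on a miss. The slides split the word into key, index, and bypassed part; here key and index are merged into a single `upper` value for brevity. All names (`Dictionary`, `encode`, `decode`, `BYPASS_BITS`) are hypothetical.

```python
BYPASS_BITS = 4  # assumed width of the bypassed (non-compressed) part


class Dictionary:
    """Table of upper parts, kept identical on both ends of the bus."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.next_victim = 0  # assumed FIFO replacement pointer

    def lookup(self, upper):
        return self.entries.index(upper) if upper in self.entries else None

    def insert(self, upper):
        self.entries[self.next_victim] = upper
        self.next_victim = (self.next_victim + 1) % len(self.entries)


def encode(dictionary, word):
    """Return (hit flag, index or raw upper part, bypassed bits)."""
    upper, bypass = word >> BYPASS_BITS, word & ((1 << BYPASS_BITS) - 1)
    index = dictionary.lookup(upper)
    if index is not None:
        return (1, index, bypass)   # hit: send only the short index
    dictionary.insert(upper)        # miss: update the table, send upper part raw
    return (0, upper, bypass)


def decode(dictionary, message):
    hit, payload, bypass = message
    if hit:
        upper = dictionary.entries[payload]  # read the stored upper part
    else:
        upper = payload
        dictionary.insert(upper)             # mirror the encoder's update
    return (upper << BYPASS_BITS) | bypass


initial = [0xFFFF000, 0x0000000, 0x1234FFE, 0xFEFA830]
enc, dec = Dictionary(initial), Dictionary(initial)
for word in (0x1234FFEB, 0xF1234FF0, 0xF1234FF7):
    message = encode(enc, word)
    assert decode(dec, message) == word
    print(hex(word), "hit" if message[0] else "miss")
```

Because both sides apply the same update on a miss, the two dictionaries stay in step without any extra traffic, and the third word hits on the entry installed by the second.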

18 © 2004 Embedded Systems Group Architecture ADES

19 © 2004 Embedded Systems Group Area, delay, energy
- Area: 750 gates
- Energy: primarily consumed by a relatively small memory
- Latency: encoding/decoding can be finished in one cycle

20 © 2004 Embedded Systems Group Results: experimental setup
- SimpleScalar simulator
- 32-bit data bus
- Various real-world applications in SPEC95 and Mediabench

| Application | Description |
|---|---|
| Adpcm-enc | ADPCM encoder for voice |
| Adpcm-dec | ADPCM decoder for voice |
| Compress | File compression program in UNIX system |
| Gcc | GNU C compiler |
| Go | Go is a game program in SPEC95 |
| Ijpeg | JPEG encoder/decoder program |
| Li | Lisp interpreter |
| M88ksim | A small operating system |
| Perl | Perl language interpreter |

21 © 2004 Embedded Systems Group Results: detail

22 © 2004 Embedded Systems Group Results: comparison

| Scheme | Avg. energy per mem. access | Avg. energy reduced | Num. of additional lines | Number of gates (approximately) | Delay |
|---|---|---|---|---|---|
| Raw | 1.94e-11 J | 0% | N/A | | |
| BI4 | 1.90e-11 J | 2.5% | 4 | 100 | Low |
| WZE | 1.60e-11 J | 17.8% | 4 | 1800 | High |
| TPC | 3.28e-11 J | -68.9% | 12 | N/A | Low |
| ADES with BI | 1.38e-11 J | 28.9% | 2 | 750 | Low |

23 © 2004 Embedded Systems Group Results: graphical comparison of energy savings

24 © 2004 Embedded Systems Group Summary: adaptive bus encoding
- Upcoming technologies induce inter-wire capacitances of the same order of magnitude as intrinsic capacitances
- Ordinary methods (e.g. Hamming distance minimization) can't capture those effects
- ADES exploits information redundancy on data buses
  - Average 28% energy savings on data bus
  - Extendable to address buses
  - Low cost

25 © 2004 Embedded Systems Group Code compression
- Memory size is critical for embedded systems
- Program size grows with application complexity
- Code compression is a solution to reduce code size
- Code size grows when a RISC or VLIW architecture is used
- Improved VLIW code compression is needed (Xie, 2002)
[Figure: code size of MPEG-2 encoder]

26 © 2004 Embedded Systems Group Requirements on code compression
- Random access
  - Start decompression at block boundaries
  - Synchronize model and arithmetic coder
- Byte alignment
  - Faster decoding
  - Easier and more compact indexing
- Indexing
  - LAT
  - Patching branch offsets (only for code compression)
[Figure: base address plus offsets l1, l2, l3, ..., lk locating compressed blocks block1 to block4]
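To make the random-access and indexing requirements concrete, here is a minimal sketch of a lookup table that maps a block number to the start of its compressed bytes. It assumes every compressed block is byte aligned and stored back to back after a base address; the table layout and the names `build_lat` and `locate` are illustrative assumptions, not the format used in the talk.

```python
def build_lat(compressed_blocks):
    """Return the byte offset of each compressed block relative to the base."""
    offsets = []
    position = 0
    for block in compressed_blocks:
        offsets.append(position)
        position += len(block)  # byte alignment: offsets are whole bytes
    return offsets


def locate(base, lat, block_index):
    """Address at which decompression of the given block can start."""
    return base + lat[block_index]


blocks = [b"\x01\x02\x03", b"\x04", b"\x05\x06"]
lat = build_lat(blocks)
print(lat)                          # [0, 3, 4]
print(hex(locate(0x1000, lat, 2)))  # 0x1004
```

A branch to block 2 can then start decoding at its own boundary without touching blocks 0 and 1, which is the point of requiring random access.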

27 © 2004 Embedded Systems Group Previous work
- Wolfe and Chanin (1992)
- IBM CodePack (1998)
- Larin and Conte (1999)
  - Huffman coding
- Xie et al. (2001-02)
  - F2VCC and V2FCC
[Figure: PowerPC 40x embedded processor with cache, external memory, decompression core, processor local bus, and decoder table]

28 © 2004 Embedded Systems Group Our approach
- Problem definition
  - Propose code compression schemes to reduce code size on VLIW embedded systems
  - Texas Instruments' TMS320C6x VLIW DSP
- Our contribution
  - Branch blocks
    - Branch targets are fixed once the code is compiled
    - Average: 80.1 blocks, 454 bytes
  - LZW-based code compression schemes
  - Selective code compression schemes

29 © 2004 Embedded Systems Group Compression/decompression

30 © 2004 Embedded Systems Group Decompression architecture Works for pre-/post-cache:

31 © 2004 Embedded Systems Group LZW data compression
- Welch (1984) modified Ziv-Lempel (1978)
- Generate coding table on-the-fly
  - Search for the longest phrase already in the table
  - Output the index of the phrase
  - Add the phrase with the next element as a new table entry
- Decompression lags compression by one codeword
- Input: a a b ab aba aa
- Output: 0 0 1 3 5 2
[Figure: compression engine (longest phrase to codeword) and decompression engine (codeword to original phrase), each with its own table]

32 © 2004 Embedded Systems Group Example

| Index | Phrase | Derivation |
|---|---|---|
| 0 | a | Initial |
| 1 | b | Initial |
| 2 | aa | 0 + a |
| 3 | ab | 0 + b |
| 4 | ba | 1 + a |
| 5 | aba | 3 + a |
| 6 | abaa | 5 + a |

- Input: a a b ab aba aa
- Output: 0 0 1 3 5 2
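The worked example can be reproduced with a textbook LZW implementation. The sketch below uses a two-symbol initial table (a, b) to match the slide; the function names are hypothetical, and the input is assumed to contain only symbols from the initial alphabet. The decoder branch for a code that is not yet in its table is the standard LZW special case, and it is where the one-codeword lag becomes visible: code 5 arrives before the decoder has built entry 5.

```python
def lzw_compress(symbols, alphabet):
    """Return (codes, table) for a string of symbols drawn from `alphabet`."""
    table = {symbol: index for index, symbol in enumerate(alphabet)}
    phrase = ""
    codes = []
    for symbol in symbols:
        if phrase + symbol in table:
            phrase += symbol                    # keep extending the longest match
            continue
        codes.append(table[phrase])             # output index of longest phrase
        table[phrase + symbol] = len(table)     # add phrase + next element
        phrase = symbol
    if phrase:
        codes.append(table[phrase])
    return codes, table


def lzw_decompress(codes, alphabet):
    """Return the list of phrases, one per codeword."""
    table = list(alphabet)
    if not codes:
        return []
    previous = table[codes[0]]
    phrases = [previous]
    for code in codes[1:]:
        if code < len(table):
            current = table[code]
        else:
            # The code refers to the entry the encoder added one step ago,
            # which the decoder has not built yet.
            current = previous + previous[0]
        table.append(previous + current[0])
        phrases.append(current)
        previous = current
    return phrases


codes, table = lzw_compress("aabababaaa", "ab")
print(codes)                        # [0, 0, 1, 3, 5, 2]
print(lzw_decompress(codes, "ab"))  # ['a', 'a', 'b', 'ab', 'aba', 'aa']
```

The table returned by the compressor contains the same seven entries as the example, indices 0 through 6.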

33 © 2004 Embedded Systems Group LZW-based code compression
- Use a byte (0x00 to 0xFF) as the basic element.
- Variable-to-fixed code compression
- Longer codeword means:
  - Larger table (exponentially)
  - More decompression overhead
  - Useless when the block is too small
  - Use more bits to encode the same phrase
  - CR: 83%, 83%, 84%, 87% for 9-, 10-, 11-, 12-bit LZW
- Wider decoding table means:
  - Larger table (linearly)
  - Wider decoding bandwidth
  - Less than 1% CR difference for 8-20 bytes
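The exponential and linear growth of the table can be put into numbers with a short calculation. This is a sketch under two assumptions that the slides do not state: each table entry stores a fixed-width phrase of `entry_width_bytes`, and the 256 single-byte phrases are implicit and need no storage. The function name is hypothetical.

```python
def decoding_table_bytes(codeword_bits, entry_width_bytes, implicit_entries=256):
    """Storage for an LZW decoding table with fixed-width entries.

    The number of entries doubles with every extra codeword bit (exponential),
    while the storage grows in proportion to the entry width (linear).
    """
    entries = (1 << codeword_bits) - implicit_entries
    return entries * entry_width_bytes


for bits in (9, 10, 11, 12):
    print(bits, decoding_table_bytes(bits, entry_width_bytes=8))
```

With 8-byte entries this gives 2048 bytes for 9-bit codewords and 30720 bytes for 12-bit codewords, which is in the same range as the 2-30 kByte decoding table quoted on the hardware slide, although the talk does not say how that figure was derived.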

34 © 2004 Embedded Systems Group Compression ratio vs. codeword size for two examples [Figure: two panels, small and large]

35 © 2004 Embedded Systems Group Compression ratio vs. codeword size on benchmark set

36 © 2004 Embedded Systems Group Selective code compression
- Motivation
  - Branch blocks vary in size
  - No benefit from a longer codeword if the block cannot fill up the coding table
  - Only 12.8% of the branch blocks can fill up a 9-bit LZW table
  - Fewer than 1% of the branch blocks can fill up a 12-bit LZW table
- Selective code compression
  - Apply different compression methods to different branch blocks
  - Block size, instruction frequency, … are collected during profiling
  - The profile is used to determine the compression method
[Figure: flow from source program to branch blocks, profiling, method selection, compression, and compressed code]

37 © 2004 Embedded Systems Group Selective compression (cont'd.)
- Minimum table-usage selective compression (MTUSC)
  - Calculate the number of phrases generated during compression
  - Select the smallest table in which all the phrases fit
  - Average compression ratio is 79.2%
- Minimum code-size selective compression (MCSSC)
  - Some compressed blocks use more bytes than the original data
  - Compress the blocks using different codeword lengths
  - The smallest version of the block, compressed or uncompressed, is selected
  - Average compression ratio is 76.8%
- Dynamic LZW
  - Codeword length grows as compression goes on
  - Average compression ratio is 75.8% for MTUSC and 75.2% for MCSSC
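The two selection rules can be sketched in a few lines on top of a byte-based LZW phrase counter. This is an illustrative model only: it assumes codeword widths of 9 to 12 bits, stops adding phrases once the table is full, rounds each compressed block up to whole bytes, and ignores whatever per-block tag would be needed to record the chosen method. The function names are hypothetical.

```python
from math import ceil

WIDTHS = (9, 10, 11, 12)  # assumed candidate codeword widths


def lzw_code_count(block, width):
    """Return (codewords emitted, table entries in use) for byte-based LZW."""
    limit = 1 << width
    table = {bytes([value]) for value in range(256)}
    phrase = b""
    codes = 0
    for value in block:
        candidate = phrase + bytes([value])
        if candidate in table:
            phrase = candidate
            continue
        codes += 1
        if len(table) < limit:      # stop growing once the table is full
            table.add(candidate)
        phrase = bytes([value])
    if phrase:
        codes += 1
    return codes, len(table)


def select_mtusc(block):
    """Smallest codeword width whose table holds every generated phrase."""
    _, entries = lzw_code_count(block, max(WIDTHS))
    return min(width for width in WIDTHS if entries <= (1 << width))


def select_mcssc(block):
    """Smallest of the uncompressed block and each fixed-width LZW version."""
    best = ("raw", len(block))
    for width in WIDTHS:
        codes, _ = lzw_code_count(block, width)
        size = ceil(codes * width / 8)
        if size < best[1]:
            best = (f"lzw{width}", size)
    return best


print(select_mcssc(bytes(range(16))))  # ('raw', 16): no repetition to exploit
print(select_mcssc(b"\x00" * 64))      # ('lzw9', 13)
print(select_mtusc(b"\x00" * 64))      # 9
```

The first block shows the case from the MCSSC bullets where compression would expand the data (16 codewords of 9 bits need 18 bytes), so the uncompressed block is kept.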

38 © 2004 Embedded Systems Group Experiments
- Benchmarks
  - Collected from Texas Instruments and Mediabench
- Compression ratio
  - Longer codewords work better in large benchmarks
  - Dynamic MCSSC is always the best

39 © 2004 Embedded Systems Group Compression ratio vs. algorithm

40 © 2004 Embedded Systems Group Average throughput 1.72 bytes for 12-bit LZW and 1.82 bytes for dynamic MCSSC

41 © 2004 Embedded Systems Group Parallel decompression
- Parallel decompression
  - Execution time: 0.51x, 0.27x, 0.14x
  - Throughput: 3.31, 6.37, 12.29 bytes
- Hardware features
  - 2-30 kBytes decoding table
  - < 4500 µm² using TSMC 0.25 µm model
  - 5508 cycles to decompress the 9344-byte ADPCM decoder
  - 90k cycles to decompress the 182k-byte MPEG-2 encoder
[Figure: parallel decoders DC1 and DC2; current code = 300; codes 295, 277, 301]

42 © 2004 Embedded Systems Group Comparison with previous work
- Wolfe/Chanin: MIPS; Huffman; 73%; < 1 mm²; 1 byte; serial
- CodePack: PowerPC; CodePack; 60%; < 1 mm²; 1 byte; serial
- Lekatsas: MIPS; SAMC; 57%; 4K table; NA; serial
- Xie: TMS320; F2V / V2F; 65% / 70%-82%; 6-48K table / 2-30K table; 4.9 bits avg, 13 bits max / 89 bits max; IID is parallel
- Us: C6x; LZW / MCSSC; 83%-87% / 75%; < 0.05 mm², 30K table; 1.3-1.7 avg / 1.8 bytes avg, 13 bytes max; parallel

43 © 2004 Embedded Systems Group Code compression summary
- We proposed code compression schemes using branch blocks as the compression unit.
- Compression ratio is around 83% for LZW and 75% for MCSSC.
- Low power is achieved through the smaller memory required.
- Compared to previous work, our schemes have less decompression overhead and larger decompression bandwidth, with a comparable compression ratio.
- Parallel decompression could be applied to achieve faster decompression, which is suitable for VLIW architectures.
- Compiler techniques could be used to generate source programs more suitable for code compression.
- Future work: find other schemes that can take advantage of branch blocks.

