Seok-Won Seong and Prabhat Mishra, University of Florida. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, April 2008, Vol. 27, No. 4. Presented by Rahul Sridharan.

Presentation transcript:

Seok-Won Seong and Prabhat Mishra, University of Florida. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, April 2008, Vol. 27, No. 4. Presented by Rahul Sridharan. 1 of 25

• Motivation
• Background
  ◦ Code compression using bitmasks
• Challenges in the bitmask-based approach
• Application-Aware Code Compression
  ◦ Mask selection
  ◦ Bitmask-aware dictionary selection
  ◦ Code compression algorithm
• Results
• Conclusion
2 of 25

• Bitmask-based code compression
  ◦ Addresses memory constraints in embedded systems, improving power and performance
  ◦ Reduces code size
• Application-aware code compression algorithm
  ◦ Improves compression efficiency without introducing a decompression penalty
3 of 25

Background: Code Compression
Application Program (Binary) → Compression Algorithm (static encoding, offline) → Compressed Code (Memory) → Decompression Engine (dynamic decoding, online) → Processor (Fetch and Execute)
4 of 25

• Dictionary-based compression
  ◦ Frequency-based dictionary selection
  ◦ Format for uncompressed code: Decision (1 bit) + Uncompressed Data (32 bits)
  ◦ Format for compressed code: Decision (1 bit) + Dictionary Index
• Hamming-distance-based compression (remembering mismatches)
  ◦ Format for compressed code: Decision (1 bit) + Number of Bit Changes + Mismatch Locations (5 bits each) + Dictionary Index
• Bitmask-based compression
5 of 25
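As an illustration of the frequency-based baseline, here is a minimal Python sketch (the function and data are hypothetical, not taken from the paper): the dictionary simply keeps the most frequently occurring 32-bit patterns.

    from collections import Counter

    def frequency_dictionary(instructions, dict_size):
        """Frequency-based selection: keep the dict_size most common 32-bit patterns."""
        counts = Counter(instructions)          # instructions: iterable of 32-bit ints
        return [word for word, _ in counts.most_common(dict_size)]

    # Hypothetical example: the two most frequent patterns form the dictionary.
    program = [0x12345678, 0x12345678, 0xDEADBEEF, 0x12345678, 0xDEADBEEF, 0x0000000F]
    print([hex(w) for w in frequency_dictionary(program, dict_size=2)])
    # ['0x12345678', '0xdeadbeef']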

Bitmask Encoding
• 32-bit instructions
• Format for uncompressed code: Decision (1 bit) + Uncompressed Data (32 bits)
• Format for compressed code: Decision (1 bit) + Number of Masks + {Mask Type, Location, Mask Pattern} … + Dictionary Index
  ◦ Mask Type: type of the mask, e.g., 2-bit, 4-bit
  ◦ Location: where to apply the bitmask
  ◦ Mask Pattern: the actual mask pattern
6 of 25
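To make the format concrete, here is a small sketch (an illustration under simplified assumptions, not the paper's implementation) of testing a 32-bit word against a dictionary entry with a single fixed 4-bit mask: the word is compressible against the entry if all differing bits fall inside one aligned nibble, and the compressed record then stores that nibble's location plus the 4-bit mask pattern.

    def match_with_4bit_fixed_mask(word, entry):
        """Return (location, pattern) if word differs from entry in at most one
        aligned 4-bit nibble, else None. location is the nibble index (3 bits);
        pattern is the 4-bit XOR value re-applied during decompression."""
        diff = word ^ entry
        if diff == 0:
            return (0, 0)                    # exact dictionary match, no mask needed
        for loc in range(8):                 # 8 aligned nibbles in a 32-bit word
            mask = 0xF << (loc * 4)
            if diff & ~mask == 0:            # all mismatched bits lie in this nibble
                return (loc, (diff & mask) >> (loc * 4))
        return None                          # too many mismatches for one 4-bit mask

    print(match_with_4bit_fixed_mask(0x1234ABCD, 0x1234AB0D))   # (1, 12)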

Code Compression with Bitmasks
[Figure: original program, compressed program, and dictionary (index/entry). Decision bit: 0 – compressed, 1 – not compressed; mask bit: 0 – bitmask used, 1 – no bitmask used; each mask is stored as a bitmask position and a bitmask value.]
7 of 25
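Decompression reverses the encoding: the decision bits select either verbatim data or a dictionary entry, and any stored mask pattern is XORed back in at its recorded location. A minimal sketch continuing the single fixed 4-bit mask assumption above (the record fields are illustrative):

    def decompress_record(record, dictionary):
        """record is a small dict standing in for the decoded bit fields."""
        if not record["compressed"]:
            return record["raw"]                     # 32-bit word stored verbatim
        word = dictionary[record["dict_index"]]
        if record["mask_used"]:
            word ^= record["mask_pattern"] << (record["mask_location"] * 4)
        return word

    dictionary = [0x1234AB0D]
    rec = {"compressed": True, "mask_used": True,
           "dict_index": 0, "mask_location": 1, "mask_pattern": 0xC}
    print(hex(decompress_record(rec, dictionary)))   # 0x1234abcd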

Challenges in Bitmask-based Compression
• Selection of appropriate mask patterns
  ◦ A larger bitmask generates more matches: a 4-bit mask can handle up to 16 mismatches, an 8-bit mask up to 256
  ◦ A larger bitmask incurs higher cost: a 4-bit mask costs 7 bits, an 8-bit mask costs 10 bits
• Efficient dictionary selection
  ◦ Frequency-based selection is not always optimal
• Need efficient masking and dictionary selection schemes to improve compression efficiency
8 of 25
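The mask costs quoted above follow from simple arithmetic: a fixed mask of size s over a 32-bit word has 32/s aligned positions, so it needs log2(32/s) location bits plus s bits for the pattern itself. A quick check (a sketch, not the authors' code):

    import math

    def fixed_mask_cost(size, word_bits=32):
        """Bits to encode one fixed mask: location bits + pattern bits."""
        location_bits = int(math.log2(word_bits // size))
        return location_bits + size

    print(fixed_mask_cost(4))   # 3 + 4 = 7 bits
    print(fixed_mask_cost(8))   # 2 + 8 = 10 bits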

Frequency- vs. Spanning-based Dictionary Selection
[Figure: dictionary selection example; frequency-based selection gives CR = 97.5%, spanning-based selection gives CR = 87.5%.]
9 of 25
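Compression ratio (CR) throughout is the compressed size divided by the original size, so smaller is better. A trivial helper with made-up numbers, just to fix the convention:

    def compression_ratio(compressed_bits, original_bits):
        """Compression ratio: compressed size / original size (lower is better)."""
        return compressed_bits / original_bits

    print(compression_ratio(70, 80))   # 0.875, i.e. a CR of 87.5%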

• Bitmask selection
• Bitmask-aware dictionary selection
  ◦ An NP-hard problem
• Code compression algorithm
  ◦ Based on the combination of the two approaches
10 of 25

Mask Selection
• How many bitmask patterns are needed?
• Which of them are profitable?
• Fixed and sliding bitmask patterns:

  Mask     Fixed   Sliding
  1 bit              X
  2 bits     X       X
  3 bits             X
  4 bits     X       X
  5 bits             X
  6 bits             X
  7 bits             X
  8 bits     X       X

[Table: number of bit changes vs. size of the mask pattern]
11 of 25

Mask Selection
• Bits needed to indicate a particular location depend on:
  ◦ the size of the mask
  ◦ the type of the mask (fixed or sliding)
• Number of bitmask patterns needed
  ◦ Up to two mask patterns
  ◦ Even the minimum cost of storing three bitmasks is too high for a 32-bit vector, so three or more masks are not very profitable
• Which combinations are profitable?
  ◦ Eleven possibilities: 1s, 2s, 2f, 3s, 4s, 4f, 5s, 6s, 7s, 8s, 8f (s = sliding, f = fixed)
  ◦ Select one or two from the eleven possibilities
  ◦ The number of combinations can be further reduced
12 of 25

Comparison of Bitmask Combinations
[Figure: compression ratios for different bitmask combinations; benchmarks compiled for the TI TMS320C6x. The (1s, 4f) and (2s, 2f) combinations provide the best compression.]
13 of 25

Mask Selection: Observations
• Mask sizes that are factors of 32 (1, 2, 4, and 8) produce better results
  ◦ They can be applied cost-effectively at fixed locations
• An 8-bit mask (fixed or sliding) is not helpful
  ◦ The probability of more than 4 consecutive bit changes is low
  ◦ Two smaller masks perform better than one larger mask
  ◦ A 4-bit sliding mask does not perform better than a 4-bit fixed mask
• Two bitmasks provide better results than a single one
• Choose two from four candidate bitmasks: (1s, 2f, 2s, 4f)

  Mask     Fixed   Sliding
  1 bit              X
  2 bits     X       X
  4 bits     X

14 of 25

Dictionary Selection
• Dynamic vs. static selection
• Static approaches:
  ◦ Frequency: select the most frequently occurring binary patterns
  ◦ Spanning: select patterns to ensure uniform coverage of all patterns, based on Hamming distance
  ◦ Bit savings: select patterns based on the bits saved by self- and mask-matched repetitions
15 of 25

16 of 25

BitSavings-based Dictionary Selection
• Node weight: number of bits saved due to the frequency of the pattern
• Edge weight: number of bits saved due to the use of bitmask-based matches
• Total weight: node weight + weights of all edges connected to the node

Example graph with nodes A(0), B(7), C(7), D(0), E(0), F(7), G(14):
  A = 0 + 10 = 10
  B = 7 + 15 = 22
  C = 7 + 15 = 22
  D = 0 + 5 = 5
  E = 0 + 15 = 15
  F = 7 + 20 = 27
  G = 14 + 10 = 24
17 of 25

BitSavings-based Dictionary Selection
F, the node with the highest total weight, is selected for the dictionary; once F, C, and E are removed from the graph, the remaining nodes A(0), B(7), D(0), G(14) have totals:
  A = 0 + 10 = 10
  B = 7 + 15 = 22
  D = 0 + 5 = 5
  G = 14 + 10 = 24
This continues until the dictionary is full or the graph is empty.
18 of 25
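A sketch of the greedy selection the two slides illustrate (Python, with assumed data structures; it paraphrases the idea rather than reproducing the paper's exact algorithm): candidate patterns are graph nodes weighted by the bits their direct matches save, edges are weighted by the bits bitmask-based matches would save, and the node with the highest total is repeatedly moved into the dictionary while it and its neighbours are dropped from the graph.

    def bitsavings_dictionary(node_weight, edge_weight, dict_size):
        """node_weight: {pattern: bits saved by direct matches}
        edge_weight: {(a, b): bits saved by bitmask matches between a and b}"""
        remaining = set(node_weight)
        dictionary = []
        while remaining and len(dictionary) < dict_size:
            def total(n):                    # node weight + weights of live edges at n
                return node_weight[n] + sum(
                    w for (a, b), w in edge_weight.items()
                    if n in (a, b) and a in remaining and b in remaining)
            best = max(remaining, key=total)
            dictionary.append(best)
            neighbours = {b if a == best else a
                          for (a, b) in edge_weight if best in (a, b)}
            remaining -= {best} | neighbours   # the new entry covers its neighbours
        return dictionary

    # Toy graph mirroring the slides: F (total 27) is picked first; C and E drop with
    # it, leaving A, B, D, G with the weights shown above; G (24) is picked second.
    nodes = {"A": 0, "B": 7, "C": 7, "D": 0, "E": 0, "F": 7, "G": 14}
    edges = {("C", "F"): 10, ("E", "F"): 10, ("C", "E"): 5,
             ("A", "B"): 5, ("A", "G"): 5, ("B", "G"): 5, ("B", "D"): 5}
    print(bitsavings_dictionary(nodes, edges, dict_size=2))   # ['F', 'G']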

Application Aware Code Compression 19 of 25

Experiments
• Experimental setup
  ◦ Benchmarks: TI and MediaBench
  ◦ Architectures: SPARC, TI TMS320C6x, MIPS
• Results
  ◦ BCC: bitmask-based code compression
    - Customized encodings for different architectures
    - Effects of dictionary size selection
    - Comparison with existing techniques
  ◦ ACC: application-aware code compression
    - Bitmask selection
    - Dictionary selection
20 of 25

Compression Ratio for adpcm_en
[Figure: compression ratios for Encoding 1 (one 8-bit mask), Encoding 2 (two 4-bit masks), and Encoding 3 (4-bit and 8-bit masks). Encoding 2 outperforms the others.]
21 of 25

Comparison with Other Techniques
• Outperforms other dictionary-based techniques by 15%
• Provides higher decompression bandwidth than existing compression techniques
[Figure: compression ratio comparison; a smaller compression ratio is better, and the bitmask approach achieves the smallest.]
22 of 25

Comparison of Dictionary Selection Methods
The BitSavings approach outperforms both frequency- and spanning-based techniques.
23 of 25

Compression Ratio Comparison
• BCC (bitmask-based code compression) generates a 15-20% improvement over other techniques
• ACC (application-aware code compression) outperforms BCC by another 5-10%
24 of 25

Questions?
25 of 25