Framework and Design Methodology of a Compiler that Compresses Code using Echo Instructions. Philip Brisk, Majid Sarrafzadeh. Embedded and Reconfigurable Systems Lab, Computer Science Department, University of California, Los Angeles. philip@cs.ucla.edu, majid@cs.ucla.edu
Outline: Introduction; Echo Instructions; Compiler Framework; Experimental Results; Conclusion.
Introductory Example: The HP DeskJet 820C Digital Controller. Total chip area is 81 mm². ROM consumes 14% of total die area. Reducing code size reduces ROM size, and with it chip area, heat dissipation, and power consumption. “… the foremost consideration … was the final cost to the buyer.” [McWilliams, 1997]
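LZ77 Compression and Echo Instructions. LZ77 Compression [Ziv and Lempel, 1977] replaces repeated substrings with pointers. Example: ABCDCABCDBABCAA becomes ABCDC(5, 4)B(7, 3)AA. Each pointer (Offset, Length) stands for Length symbols starting Offset positions back; in this example the offsets count positions in the compressed string, with each pointer occupying one position. Echo Instructions [Fraser, 2002] offer ISA support for executing LZ77-compressed programs.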
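The pointer notation in the example can be expanded with a few lines of Python. This is a minimal sketch, assuming (as the example's numbers suggest) that each offset counts slots backward in the compressed stream and that every pointer occupies one slot. The function name `expand` and the list-of-slots encoding are illustrative choices, not taken from the slides.

```python
def expand(stream, start=0, length=None):
    """Expand `length` slots of a compressed stream beginning at `start`.

    Each slot is either a literal character or an (offset, length) pointer.
    Assumption: offsets count slots backward in the *compressed* stream,
    and a pointer may cover other pointers (handled recursively).
    """
    if length is None:
        length = len(stream) - start
    out = []
    for i in range(start, start + length):
        slot = stream[i]
        if isinstance(slot, tuple):
            offset, count = slot
            # Re-read `count` slots that begin `offset` slots before this one.
            out.append(expand(stream, i - offset, count))
        else:
            out.append(slot)
    return "".join(out)


# The slide's example: ABCDC(5, 4)B(7, 3)AA
compressed = ["A", "B", "C", "D", "C", (5, 4), "B", (7, 3), "A", "A"]
print(expand(compressed))  # prints ABCDCABCDBABCAA
```

Both pointers resolve to slot 0 of the compressed stream (5 - 5 and 7 - 7), which is why the second pointer carries offset 7 rather than an offset measured in the decompressed text.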
Echo Instructions. Echo(Offset, Length): 1. Branch to PC - Offset; save PC+1 in register R. 2. Execute the next Length instructions. 3. Branch to the address in register R. Replaces repeated code segments in a program instruction stream. Augments a MIPS Jump-and-Link (JAL) instruction with a parameterized procedure return mechanism. Does not incur the overhead associated with procedure calls.
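The three steps above can be sketched as a toy interpreter. This is an illustration under stated assumptions, not the hardware design from [Fraser, 2002]: addresses are instruction indices rather than byte addresses, ordinary instructions are opaque strings, and echoed regions contain no further echo instructions, so a single return register suffices. The function `run` and the `("echo", offset, length)` tuple encoding are hypothetical.

```python
def run(program):
    """Return the ordinary instructions executed by `program`, in order.

    Entries are either a string (an ordinary instruction) or a tuple
    ("echo", offset, length). Assumptions: index-based addresses and
    no echo inside an echoed region.
    """
    trace = []
    pc = 0
    remaining = 0  # instructions left in the current echoed region
    r = None       # return-address register R
    while pc < len(program):
        instr = program[pc]
        if isinstance(instr, tuple) and instr[0] == "echo":
            _, offset, length = instr
            if remaining:
                raise ValueError("nested echo is not modeled in this sketch")
            if length <= 0 or not 0 < offset <= pc:
                raise ValueError("invalid echo operands")
            r = pc + 1          # step 1: save PC+1 in R ...
            pc = pc - offset    # ... and branch to PC - Offset
            remaining = length  # step 2: execute the next Length instructions
            continue
        trace.append(instr)
        pc += 1
        if remaining:
            remaining -= 1
            if remaining == 0:
                pc = r          # step 3: branch to the address in R
    return trace


program = ["a", "b", "c", "x", ("echo", 4, 3), "y"]
print(run(program))  # ['a', 'b', 'c', 'x', 'a', 'b', 'c', 'y']
```

The echo at index 4 re-executes the three instructions at indices 0 to 2 and then resumes at index 5, without any call, parameter passing, or stack frame.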
An Example. [Figure: program listing before and after compression.] Before: the same five-instruction sequence appears at addresses 100-116, 340-356, and 404-420: 100: $1 <- $2 + $3; 104: $11 <- $7 * $8; 108: $8 <- $7 * $1; 112: $1 <- $11 / $8; 116: $1 <- $8 + 1. After: the sequence at 100-116 is kept; the copy at 340 is replaced by Echo(240, 5) and the copy at 404 by Echo(304, 5), since 340 - 240 = 404 - 304 = 100. Repeating code sequences are replaced with echo instructions. Echo instructions are more space efficient than procedure calls: no parameters, no stack frame.
Procedural Abstraction. Techniques predate echo instructions by 20+ years. Replace repeated instruction sequences with procedure calls. Substring Matching [Fraser, 1984]. Reschedule/Rename [Cooper, 1999] [Lau, 2003]. Our Approach: Subgraph Isomorphism.
Substring Matching and Reschedule/Rename. [Figure: program listing with code at addresses 100-116, 340-356, and 404-420, shown before and after renaming and rescheduling.] Sequence: $1 <- $2 + $3; $11 <- $7 * $8; $8 <- $7 * $1; $1 <- $11 / $8; $1 <- $8 + 1. Variant with different register names: $10 <- $5 + $4; $11 <- $9 * $6; $6 <- $9 * $10; $10 <- $11 / $6; $10 <- $6 + 10. Rename: $4 : $3, $5 : $2, $6 : $8, $9 : $7, $10 : $1, $11 : $11. Reschedule.
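The renaming step can be illustrated by applying the slide's register mapping to the second sequence. This is a minimal sketch, not the algorithm of [Cooper, 1999] or [Lau, 2003]: it only performs the substitution and does not search for a mapping. The helper `rename` and the `<-` assignment notation are choices made for this example.

```python
import re


def rename(instr, mapping):
    """Rewrite every register token ($N) in `instr` according to `mapping`.

    All tokens are substituted in one pass, so a renamed register is never
    renamed a second time.
    """
    return re.sub(r"\$\d+", lambda m: mapping.get(m.group(), m.group()), instr)


# Mapping and instruction text as they appear on the slide.
mapping = {"$4": "$3", "$5": "$2", "$6": "$8",
           "$9": "$7", "$10": "$1", "$11": "$11"}
second = [
    "$10 <- $5 + $4",
    "$11 <- $9 * $6",
    "$6 <- $9 * $10",
    "$10 <- $11 / $6",
    "$10 <- $6 + 10",
]
renamed = [rename(instr, mapping) for instr in second]
for instr in renamed:
    print(instr)
```

After renaming, the register operands line up with the first sequence, so a plain substring matcher could then detect the repeat.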
Subgraph Isomorphism. [Figure: the program listing from the previous slide next to the data flow graph of its repeated sequence, with nodes +, *, *, /, +.] All 3 code sequences have the same data flow graph representation. Subgraph isomorphism techniques identify repeated pattern instances [Kastner, 2001]. Register allocation and scheduling must be reformulated to optimize pattern re-use.
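To make the data flow graph claim concrete, the sketch below builds a small DFG from a straight-line sequence and compares two graphs by brute-force search over node permutations. It is an illustration only and not the pattern-matching technique of [Kastner, 2001]. Assumptions: only true (read-after-write) dependences become edges, operand order is ignored, and live-in registers and immediates are treated as external inputs that are not compared. The names `build_dfg` and `isomorphic` are hypothetical.

```python
from itertools import permutations


def build_dfg(seq):
    """seq: list of (dest, op, src1, src2). Returns (ops, edges).

    ops[i] is the operator of instruction i; an edge (u, v) means that
    instruction v reads the value produced by instruction u.
    """
    last_def = {}
    ops, edges = [], set()
    for i, (dest, op, a, b) in enumerate(seq):
        ops.append(op)
        for src in (a, b):
            if src in last_def:
                edges.add((last_def[src], i))
        last_def[dest] = i
    return ops, edges


def isomorphic(g, h):
    """Brute-force, operator-preserving isomorphism test for small graphs."""
    (ops_g, edges_g), (ops_h, edges_h) = g, h
    if sorted(ops_g) != sorted(ops_h) or len(edges_g) != len(edges_h):
        return False
    n = len(ops_g)
    for perm in permutations(range(n)):
        if any(ops_g[i] != ops_h[perm[i]] for i in range(n)):
            continue
        if {(perm[u], perm[v]) for u, v in edges_g} == edges_h:
            return True
    return False


first = [("$1", "+", "$2", "$3"), ("$11", "*", "$7", "$8"),
         ("$8", "*", "$7", "$1"), ("$1", "/", "$11", "$8"),
         ("$1", "+", "$8", 1)]
renamed = [("$10", "+", "$5", "$4"), ("$11", "*", "$9", "$6"),
           ("$6", "*", "$9", "$10"), ("$10", "/", "$11", "$6"),
           ("$10", "+", "$6", 10)]
# Same as `first` with its two independent leading instructions swapped.
rescheduled = [first[1], first[0]] + first[2:]

print(isomorphic(build_dfg(first), build_dfg(renamed)))      # True
print(isomorphic(build_dfg(first), build_dfg(rescheduled)))  # True
```

Because the comparison works on the graph rather than the instruction text, neither the register names nor the order of independent instructions affects the result. The factorial cost of the permutation search is acceptable only for patterns of a handful of nodes.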
Example: 3 DFGs. [Figure: data flow graphs G1, G2, and G3 with operations +, *, -, >>, <<; numeric labels 1-8.]
Compression Example: 3 DFGs. [Figure: data flow graphs G1, G2, and G3 with operations +, *, -, >>, <<; numeric labels 1-8.]
Compression Example: 3 DFGs. [Figure: G1, G2, and G3 with nodes labeled E alongside the operations +, *, -, >>, <<; numeric labels 1-8.]
Register Allocation by Example. [Figure: pattern instances in G3 with operations + and <<; values A, B, C, D, E, F, X, Y, Z; temporary registers T1-T8 (infinite supply).] Both patterns reference the same instruction sequence. Schedule of operations and register usage must be identical. Data dependencies are maintained between patterns. Shuffle or spill code reduces the effectiveness of compression. Spilling values to memory is inevitable where register pressure is high.
Compiler Framework. Challenge: Design a compiler that minimizes code size for architectures augmented with echo instructions. Optimization Strategy: Minimize code size. Select the lowest-cost memory from a library that can hold the code. Apply performance-enhancing transformations as long as Code Size < Memory Capacity.
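The optimization strategy can be sketched as two small functions. Everything concrete here is hypothetical: the slides do not describe the memory library's format, the cost model, or the order in which transformations are tried, so the `(capacity, cost)` tuples, the greedy acceptance rule, and the transformation names are assumptions made for illustration.

```python
def select_memory(code_size, library):
    """Return the cheapest (capacity, cost) entry with code_size < capacity.

    Returns None when no memory in the library can hold the code.
    """
    fitting = [mem for mem in library if code_size < mem[0]]
    return min(fitting, key=lambda mem: mem[1]) if fitting else None


def apply_transformations(code_size, capacity, candidates):
    """Greedily accept transformations while Code Size < Memory Capacity.

    `candidates` is a list of (name, growth_in_bytes) pairs, tried in order.
    """
    applied = []
    for name, growth in candidates:
        if code_size + growth < capacity:
            code_size += growth
            applied.append(name)
    return code_size, applied


# Hypothetical library entries: (capacity in bytes, relative cost).
library = [(1024, 1.0), (2048, 1.8), (4096, 3.5)]
compressed_size = 1500

memory = select_memory(compressed_size, library)
final_size, applied = apply_transformations(
    compressed_size, memory[0],
    [("unroll_loop", 400), ("inline_call", 300), ("align_branch", 100)],
)
print(memory, final_size, applied)
# (2048, 1.8) 2000 ['unroll_loop', 'align_branch']
```

The point of the ordering is that code size is minimized first, which fixes the cheapest usable memory, and only the slack left in that memory is spent on performance.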
Design Overview. [Figure: compiler flow.] Inputs: IR, Memory Library. Steps: 1. Target Independent Optimization; 2. Instruction Selection; 3. Compression Step; 4. Register Allocation; 5. Instruction Scheduling; 6. Memory Selection; 7. Performance Optimization. Output: Assembly Code (emit).
Implementation Status. Algorithms integrated into the Machine SUIF compiler. Retargetable: current implementation targets x86 and Alpha; Alpha selected as our target. Instruction selection via the do_gen pass (Machine SUIF). Compression engine implemented successfully. Register allocation and scheduling are under construction. Optimization and memory selection will be implemented later.
Compilation Procedure. Compile a source program to SUIFvm. Perform instruction selection for Alpha using the do_gen pass. Convert the SUIF IR (a linear list of instructions) to a CDFG. Compress the CDFG. Compression Ratio = Compressed Code Size / Original Code Size.
Compression Results. [Chart; axis label: Code Size; benchmark names not recoverable.] Compression ratios: 56.23%, 61.03%, 64.60%, 71.58%, 72.35%.
Compilation Time. [Chart; axis label: Code Size; benchmark names not recoverable.] Times: 62.77 s, 11.18 s, 5.68 s, 6.21 s, 0.47 s.
Compression Results. [Chart; axis label: Code Size; benchmark names not recoverable.] Compression ratios: 50.93%, 59.71%, 60.94%, 60.29%, 59.21%.
Compilation Time. [Chart; axis label: Code Size; benchmark names not recoverable.] Times: 402.35 s, 87.21 s, 62.92 s, 57.05 s, 49.33 s.
Conclusion. Echo Instructions: hardware support for runtime execution of compressed programs. Compiler Framework: compress IR instead of assembly code. Compression ratios ranging from 72.35% to 50.93% for 10 MediaBench applications. Results do not account for register allocation.
References. Cooper, K. and McIntosh, N. Enhanced Code Compression for Embedded RISC Processors. PLDI, 1999. Fraser, C. W., Myers, E. W., and Wendt, A. Analyzing and Compressing Assembly Code. SCC, 1984. Fraser, C. W. An Instruction for Direct Interpretation of LZ77-compressed Programs. Microsoft Tech. Report, 2002. Kastner, R. et al. Instruction Generation for Hybrid-Reconfigurable Systems. ICCAD, 2001.
Lau, J. et al. Reducing Code Size with Echo Instructions. CASES, 2003. Lee, C., Potkonjak, M., and Mangione-Smith, W. H. MediaBench: A Tool for Evaluating Multimedia and Communication Systems. MICRO, 1997. Runeson, J. Code Compression through Procedural Abstraction before Register Allocation. Master’s Thesis. University of Uppsala, March 2000. Ziv, J. and Lempel, A. A Universal Algorithm for Sequential Data Compression. IEEE Trans. Information Theory, May 1977.