Methodology of a Compiler that Compresses Code using Echo Instructions


Framework and Design Methodology of a Compiler that Compresses Code using Echo Instructions
Philip Brisk, Majid Sarrafzadeh
Embedded and Reconfigurable Systems Lab
Computer Science Department
University of California, Los Angeles
philip@cs.ucla.edu, majid@cs.ucla.edu

Outline
Introduction
Echo Instructions
Compiler Framework
Experimental Results
Conclusion

Introductory Example: The HP DeskJet 820C Digital Controller
Total chip area is 81 mm².
ROM consumes 14% of total die area.
Reduce Code Size → Reduce ROM Size → Reduce Chip Area → Reduce Heat Dissipation and Power Consumption
“… the foremost consideration … was the final cost to the buyer.” [McWilliams, 1997]

LZ77 Compression and Echo Instructions
LZ77 Compression [Ziv and Lempel, 1977]
  Replaces Repeated Substrings with Pointers
  Example: ABCDCABCDBABCAA becomes ABCDC(5, 4)B(7, 3)AA
Echo Instructions [Fraser, 2002]
  Offer ISA Support for the Execution of LZ77-compressed Programs
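The pointer notation in the example can be checked with a short sketch. It assumes each pointer is an (offset, length) pair whose offset is counted backwards in the compressed stream from the pointer's own position, which is the reading under which the slide's example expands correctly. The function name `expand` and the list-based stream encoding are illustrative choices, not part of the original work.

```python
def expand(stream):
    """Expand a compressed stream of literals and (offset, length) pointers.

    Assumption: a pointer at compressed index i refers to the `length`
    items that start at compressed index i - offset.
    """
    out = []

    def run(start, count):
        for i in range(start, start + count):
            item = stream[i]
            if isinstance(item, tuple):
                offset, length = item
                # Follow the pointer back into the compressed stream.
                run(i - offset, length)
            else:
                out.append(item)

    run(0, len(stream))
    return "".join(out)


# The slide's example: ABCDC(5, 4)B(7, 3)AA
compressed = ["A", "B", "C", "D", "C", (5, 4), "B", (7, 3), "A", "A"]
print(expand(compressed))  # prints ABCDCABCDBABCAA
```

Counting offsets in the compressed stream mirrors how an echo instruction branches relative to its own PC in the compressed program.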

Echo Instructions
Echo(Offset, Length)
  1. Branch to PC - Offset; Save PC+1 in register R.
  2. Execute the next Length Instructions.
  3. Branch to the address in register R.
Replaces Repeated Code Segments in a Program Instruction Stream.
Augments a MIPS Jump-and-Link (JAL) Instruction with a Parameterized Procedure Return Mechanism.
Does not Incur the Overhead Associated with Procedure Calls.
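The three steps above can be sketched as a toy interpreter. This is a minimal illustration under stated assumptions, not the hardware design: instructions are indexed by position rather than byte address, ordinary instructions are opaque strings, an echo is the hypothetical tuple ("echo", offset, length), and the echoed region is assumed to contain no further echo instructions because only one return register R is described.

```python
def run(program):
    """Return the sequence of ordinary instructions a toy machine executes.

    program: list whose items are either a string (ordinary instruction)
    or a tuple ("echo", offset, length). Assumes echoed regions contain
    only ordinary instructions.
    """
    trace = []
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if isinstance(instr, tuple):
            _, offset, length = instr
            r = pc + 1              # Step 1: save PC+1 in register R
            pc = pc - offset        # Step 1: branch to PC - Offset
            for _ in range(length):  # Step 2: execute the next Length instructions
                trace.append(program[pc])
                pc += 1
            pc = r                  # Step 3: branch to the address in R
        else:
            trace.append(instr)
            pc += 1
    return trace


program = ["a", "b", "c", ("echo", 3, 2), "d"]
print(run(program))  # prints ['a', 'b', 'c', 'a', 'b', 'd']
```

No arguments are passed and no stack frame is built: the only state saved is the return address in R.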

An Example
[Figure: code before and after compression. The five-instruction sequence below occupies addresses 100-116 and recurs at addresses 340-356 and 404-420.]
  100: $1 ← $2 + $3
  104: $11 ← $7 * $8
  108: $8 ← $7 * $1
  112: $1 ← $11 / $8
  116: $1 ← $8 + 1
  …
[After compression, the repeats starting at 340 and 404 are replaced by Echo(240, 5) and Echo(304, 5), respectively.]
Repeating code sequences are replaced with echo instructions.
Echo instructions are more space efficient than procedure calls:
  No parameters
  No stack frame

Procedural Abstraction
Techniques Predate Echo Instructions by 20+ Years
Replace Repeated Instruction Sequences with Procedure Calls
  Substring Matching [Fraser, 1984]
  Reschedule/Rename [Cooper, 1999] [Lau, 2003]
Our Approach: Subgraph Isomorphism

Substring Matching and Reschedule/Rename
[Figure: code at addresses 100-116, 340-356, and 404-420. The original sequence:]
  $1 ← $2 + $3
  $11 ← $7 * $8
  $8 ← $7 * $1
  $1 ← $11 / $8
  $1 ← $8 + 1
  …
[Another occurrence uses different registers:]
  $10 ← $5 + $4
  $11 ← $9 * $6
  $6 ← $9 * $10
  $10 ← $11 / $6
  $10 ← $6 + 10
Rename: $4 : $3, $5 : $2, $6 : $8, $9 : $7, $10 : $1, $11 : $11
Reschedule

Subgraph Isomorphism
[Figure: the code sequences at addresses 100-116, 340-356, and 404-420, including $1 ← $2 + $3, $11 ← $7 * $8, $8 ← $7 * $1, $1 ← $11 / $8, $1 ← $8 + 1 and its renamed counterpart $10 ← $5 + $4, $11 ← $9 * $6, $6 ← $9 * $10, $10 ← $11 / $6, $10 ← $6 + 10, shown beside a data flow graph with operation nodes +, *, *, /, +.]
All 3 Code Sequences have the same Data Flow Graph Representation.
Subgraph Isomorphism Techniques Identify Repeated Pattern Instances [Kastner, 2001].
Register Allocation and Scheduling must be reformulated to Optimize Pattern Re-Use.
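A small sketch shows why register names drop out once code is viewed as a data flow graph. It builds a DFG from a straight-line sequence by linking each register use to the instruction that last defined it, then compares the graphs of the two sequences listed on the slide. The tuple encoding, the helper name `build_dfg`, and the decision to ignore immediate operands are assumptions made for illustration. Because both sequences are in the same order, plain equality suffices here; matching rescheduled code would need a real (sub)graph isomorphism test, which this sketch does not attempt.

```python
def build_dfg(seq):
    """Build a data flow graph for a straight-line instruction sequence.

    seq items are (dest, op, src1, src2). Sources given as ints are
    immediates and are ignored (an assumption of this sketch).
    Returns (ops, edges): ops[i] is the operation of instruction i, and
    each edge (producer, consumer, operand_position) is a true data
    dependence between two instructions of the sequence.
    """
    last_def = {}  # register name -> index of the instruction that last wrote it
    ops, edges = [], set()
    for i, (dest, op, a, b) in enumerate(seq):
        ops.append(op)
        for pos, src in enumerate((a, b)):
            if isinstance(src, str) and src in last_def:
                edges.add((last_def[src], i, pos))
        last_def[dest] = i
    return ops, edges


first = [
    ("$1", "+", "$2", "$3"),
    ("$11", "*", "$7", "$8"),
    ("$8", "*", "$7", "$1"),
    ("$1", "/", "$11", "$8"),
    ("$1", "+", "$8", 1),
]
renamed = [
    ("$10", "+", "$5", "$4"),
    ("$11", "*", "$9", "$6"),
    ("$6", "*", "$9", "$10"),
    ("$10", "/", "$11", "$6"),
    ("$10", "+", "$6", 10),
]

print(build_dfg(first)[0])                     # prints ['+', '*', '*', '/', '+']
print(build_dfg(first) == build_dfg(renamed))  # prints True
```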

Example: 3 DFGs
[Figure: three data flow graphs, G1, G2, and G3, built from the operations +, *, -, >>, and <<, with labels 1 through 8.]

Compression Example: 3 DFGs
[Figure, shown over several animation steps: recurring patterns in G1, G2, and G3 are identified, and in the final step repeated pattern instances are replaced by nodes labeled E.]

Register Allocation by Example
[Figure: data flow graph G3 with + and << operations, values A, B, C, D, E, F, X, Y, and Z, and Temporary Registers T1 through T8 (Infinite Supply).]
Both patterns reference the same instruction sequence.
Schedule of operations and register usage must be identical.
Data dependencies are maintained between patterns.
Shuffle or spill code reduces the effectiveness of compression.
Spilling values to memory is inevitable where register pressure is high.

Compiler Framework
Challenge
  Design a Compiler that Minimizes Code Size for Architectures Augmented with Echo Instructions.
Optimization Strategy
  Minimize code size.
  Select the lowest cost memory from a library.
  Apply performance enhancing transformations as long as: Code Size < Memory Capacity.
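The optimization strategy above can be sketched as two small steps: pick the cheapest memory that the minimized code fits in, then accept performance transformations only while the code still fits. Everything in this sketch is hypothetical: the `Memory` record, the example library and costs, and the greedy acceptance order are illustrative assumptions, since the slides do not specify how the library or the transformations are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Memory:
    """Hypothetical memory library entry."""
    name: str
    capacity: int  # bytes
    cost: float    # arbitrary cost units


def select_memory(code_size: int, library: List[Memory]) -> Optional[Memory]:
    """Return the lowest-cost memory with Code Size < Memory Capacity, or None."""
    fitting = [m for m in library if code_size < m.capacity]
    return min(fitting, key=lambda m: m.cost) if fitting else None


def apply_transformations(code_size: int, capacity: int,
                          growths: List[int]) -> Tuple[int, List[int]]:
    """Greedily accept transformations, each described only by the code
    growth it causes, while Code Size < Memory Capacity still holds."""
    applied = []
    for index, growth in enumerate(growths):
        if code_size + growth < capacity:
            code_size += growth
            applied.append(index)
    return code_size, applied


library = [
    Memory("rom8k", 8192, 1.0),
    Memory("rom16k", 16384, 1.8),
    Memory("rom32k", 32768, 3.0),
]
chosen = select_memory(9000, library)
print(chosen.name)  # prints rom16k
print(apply_transformations(9000, chosen.capacity, [4000, 5000, 3000]))
# prints (16000, [0, 2])
```

The slack between the minimized code size and the chosen capacity is what the performance transformations are allowed to spend.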

Design Overview
[Figure: compiler flow from IR to emitted assembly code.]
IR (input)
  1. Target Independent Optimization
  2. Instruction Selection
  3. Compression Step
  4. Register Allocation
  5. Instruction Scheduling
  6. Memory Selection (using the Memory Library)
  7. Performance Optimization
Assembly Code (emitted)

Implementation Status
Algorithms Integrated into the Machine SUIF Compiler
  Retargetable: Current Implementation Targets x86 and Alpha
  Alpha selected as our Target
Instruction Selection via do_gen pass (Machine SUIF)
Compression Engine implemented successfully.
Register Allocation and Scheduling are under construction.
Optimization and Memory Selection will be implemented later.

Compilation Procedure
Compile a source program to SUIFvm.
Perform instruction selection for Alpha using the do_gen pass.
Convert the SUIF IR (a linear list of instructions) to CDFG.
Compress the CDFG.
Compression Ratio = Compressed Code Size / Original Code Size
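As a worked instance of the ratio, consider the earlier echo example: three copies of a five-instruction sequence shrink to one copy plus two echo instructions. The sketch below assumes each echo occupies a single instruction slot and counts only the instructions shown on that slide; the function name is illustrative.

```python
def compression_ratio(compressed_size: int, original_size: int) -> float:
    """Compression Ratio = Compressed Code Size / Original Code Size."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return compressed_size / original_size


original = 3 * 5    # three occurrences of the five-instruction sequence
compressed = 5 + 2  # one occurrence kept, two replaced by echo instructions
print(f"{compression_ratio(compressed, original):.2%}")  # prints 46.67%
```

With this definition a lower percentage means better compression.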

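Compression Results
[Chart: compressed code size for five applications: 56.23%, 61.03%, 64.60%, 71.58%, 72.35%.]

Compilation Time
[Chart: compilation time for the same five applications: 62.77 s, 11.18 s, 5.68 s, 6.21 s, 0.47 s.]

Compression Results
[Chart: compressed code size for five further applications: 50.93%, 59.71%, 60.94%, 60.29%, 59.21%.]

Compilation Time
[Chart: compilation time for those five applications: 402.35 s, 87.21 s, 62.92 s, 57.05 s, 49.33 s.]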

Conclusion
Echo Instructions
  Hardware support for runtime execution of compressed programs.
Compiler Framework
  Compress IR instead of assembly code.
  Compression ratios ranging from 72.35% to 50.93% for 10 MediaBench applications.
  Results do not account for register allocation.

References
Cooper, K. and McIntosh, N. Enhanced Code Compression for Embedded RISC Processors. PLDI, 1999.
Fraser, C. W., Myers, E. W., and Wendt, A. Analyzing and Compressing Assembly Code. SCC, 1984.
Fraser, C. W. An Instruction for Direct Interpretation of LZ77-compressed Programs. Microsoft Tech. Report, 2002.
Kastner, R. et al. Instruction Generation for Hybrid-Reconfigurable Systems. ICCAD, 2001.
Lau, J. et al. Reducing Code Size with Echo Instructions. CASES, 2003.
Lee, C., Potkonjak, M., and Mangione-Smith, W. H. MediaBench: A Tool for Evaluating Multimedia and Communication Systems. MICRO, 1997.
Runeson, J. Code Compression through Procedural Abstraction before Register Allocation. Master’s Thesis. University of Uppsala, March 2000.
Ziv, J. and Lempel, A. A Universal Algorithm for Sequential Data Compression. IEEE Trans. Information Theory, May 1977.