
A Design Space Exploration Framework for rISA Design
Ashok Halambi, Aviral Shrivastava, Partha Biswas, Nikil Dutt, Alex Nicolau
Centre for Embedded Computer Systems, University of California, Irvine, USA
Copyright © 2002 UCI ACES Laboratory

Outline
- Motivation
- rISA Model
- rISA Design Space Exploration
- Experiments
- Results and Conclusions

Code Size Reduction
Reducing code size results in:
- Less memory area -> lower cost
- Fewer cache misses -> higher performance
- Fewer accesses to memory -> lower power/energy consumption
Code size reduction is important.

rISA: reduced bit-width Instruction Set Architecture
rISA has "dual instruction set" capability.
- Normal 32-bit instruction set (normal IS).
- Compressed 16-bit instruction set (reduced bit-width IS).
- Instructions from both ISs reside in memory.
- rISA instructions are dynamically expanded to normal 32-bit instructions before or during decode.
- Only normal instructions are executed.

Typical rISA Implementation
- The most frequently occurring instructions are compressed to form the reduced bit-width instruction set.
- Each rISA instruction maps to a unique normal instruction.
  - Simple and fast lookup-table-based "translator" logic.
  - Can be implemented without increasing cycle length or adding a cycle penalty.
- Achieves good code size reduction without much architectural modification.
  - Best case: 50% code size reduction.
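The 50% best case follows directly from the instruction widths. The sketch below works through that arithmetic; the function name and the assumption of a pure one-to-one replacement (no mode change instructions, NOPs, or extra moves) are illustrative and not taken from the slides.

```python
NORMAL_BYTES = 4  # 32-bit normal instruction
RISA_BYTES = 2    # 16-bit rISA instruction


def code_size_reduction(total_instrs, risa_instrs):
    """Fractional code size reduction when `risa_instrs` of `total_instrs`
    instructions are encoded as 16-bit rISA instructions.

    Assumption (not from the slides): a pure one-to-one replacement with no
    mode change instructions, NOPs, or extra move/spill code.
    """
    if total_instrs <= 0 or not 0 <= risa_instrs <= total_instrs:
        raise ValueError("need total_instrs > 0 and 0 <= risa_instrs <= total_instrs")
    original = total_instrs * NORMAL_BYTES
    mixed = (total_instrs - risa_instrs) * NORMAL_BYTES + risa_instrs * RISA_BYTES
    return (original - mixed) / original


if __name__ == "__main__":
    # Every instruction converted: the best case.
    print(code_size_reduction(100, 100))
    # Only part of the program converted.
    print(code_size_reduction(100, 60))
```

Real reductions fall below this bound because conversion adds overhead instructions, which is what the profitability analysis later in the deck accounts for.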

Sample Architectures Supporting rISA
- ARM7TDMI
  - 32-bit normal IS and 16-bit rIS.
  - Switching between normal and rISA instructions is done by the BX (Branch Exchange) instruction (basic blocks).
- MIPS
  - 32-bit normal IS and 16-bit rIS.
  - Switching between normal and rISA instructions is done implicitly by code alignment (function level).
- ARM-Thumb and MIPS16 report 30% code size reduction on small functions.
- ST100 and Tangent ARC core also support rISA.

Bit-width Restrictions
- Only a few instructions in the rIS.
- Operands of rISA instructions can access only a part of the register file.
- 32-bit normal instruction: 20-bit opcode, 4-bit operands (accessibility to 16 registers).
- 16-bit rISA instruction: 7-bit opcode (fewer opcodes), 3-bit operands (accessibility to only 8 registers).
This paper: explore rISA designs for code size reduction.
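To make the field widths concrete, here is a minimal sketch of a lookup-table translator that expands a 16-bit rISA word into a 32-bit normal word. It assumes three operand fields in both formats (7-3-3-3 and 20-4-4-4), and the opcode values in the table are invented for illustration; none of the names come from the slides.

```python
# Hypothetical one-to-one opcode table: 7-bit rISA opcode -> 20-bit normal opcode.
# The numeric values are made up for illustration only.
OPCODE_TABLE = {
    0x01: 0x00020,  # rISA add -> normal add
    0x02: 0x00022,  # rISA sub -> normal sub
}


def decode_risa(word):
    """Split a 16-bit rISA word into (opcode, dest, op1, op2), 7-3-3-3 layout."""
    if not 0 <= word < (1 << 16):
        raise ValueError("rISA word must fit in 16 bits")
    opcode = (word >> 9) & 0x7F
    dest = (word >> 6) & 0x7
    op1 = (word >> 3) & 0x7
    op2 = word & 0x7
    return opcode, dest, op1, op2


def expand_risa(word):
    """Expand a 16-bit rISA word to a 32-bit normal word (assumed 20-4-4-4 layout).

    Each 3-bit register number is zero-extended to 4 bits, so a rISA
    instruction can name only registers 0..7.
    """
    opcode, dest, op1, op2 = decode_risa(word)
    normal_opcode = OPCODE_TABLE[opcode]  # KeyError if the opcode has no mapping
    return (normal_opcode << 12) | (dest << 8) | (op1 << 4) | op2


if __name__ == "__main__":
    risa_word = (0x01 << 9) | (1 << 6) | (2 << 3) | 3  # rISA add r1, r2, r3
    print(hex(expand_risa(risa_word)))
```

Because the mapping is a plain table lookup plus zero-extension of the register fields, it illustrates why the translator logic can be simple and fast.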

rISA Model
- A rISA instruction maps to a unique normal instruction.
- Mode change at instruction-level granularity
  - mx and rISA_mx
- Other special rISA instructions
  - rISA_nop: aligns instructions to the word boundary.
  - rISA_move: accesses all registers even in rISA mode.
  - rISA_extend: increases the length of the immediate field in rISA instructions.

rISA Design Space
- No. of bits to specify the opcode -> no. of rISA instructions
- No. of operands
- No. of bits to specify a rISA operand -> register accessibility of the rISA instruction
rISA_wxyz format: opcode (w bits), dest (x bits), op1 (y bits), op2 (z bits), with w + x + y + z = 16
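A short sketch of how large the rISA_wxyz space is, enumerating every split of 16 bits into four fields. It assumes three operand fields with at least one bit each; the slides also vary the number of operands, which this sketch does not cover. Function names are illustrative.

```python
from itertools import product

WORD_BITS = 16


def risa_formats(min_opcode_bits=1, min_operand_bits=1):
    """Yield every (w, x, y, z) with w + x + y + z == 16.

    Assumption: exactly three operand fields, each at least
    `min_operand_bits` wide, and an opcode at least `min_opcode_bits` wide.
    """
    for w, x, y in product(
        range(min_opcode_bits, WORD_BITS),
        range(min_operand_bits, WORD_BITS),
        range(min_operand_bits, WORD_BITS),
    ):
        z = WORD_BITS - w - x - y
        if z >= min_operand_bits:
            yield (w, x, y, z)


def describe(fmt):
    """Opcode count and per-operand register reach implied by a format."""
    w, x, y, z = fmt
    return {
        "opcodes": 2 ** w,
        "dest_regs": 2 ** x,
        "op1_regs": 2 ** y,
        "op2_regs": 2 ** z,
    }


if __name__ == "__main__":
    formats = list(risa_formats())
    print(len(formats), "candidate formats")
    print(describe((7, 3, 3, 3)))
    print(describe((4, 4, 4, 4)))
```

Even this restricted view yields hundreds of candidates, before implicit operands and custom immediate sizes are considered.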

Interesting rISA Designs
- Implied operand format for rISA instructions
  - add R1 R2 R2 -> rISA_add_1 R1 R2
  - add R1 R1 4 -> rISA_add_2 R1
- Customized immediate field size
- Operands can access different sets of registers.
rISA_wxyz format: opcode (w bits), dest (x bits), op1 (y bits), op2 (z bits)
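The two implied-operand examples can be read as pattern matches on a three-operand add. The sketch below encodes that reading; the tuple representation and the function name are assumptions, and only the two patterns shown on the slide are handled.

```python
def to_implied_operand_form(instr):
    """Rewrite a 3-operand add into an implied-operand rISA form, or return
    None if neither pattern from the slide matches.

    `instr` is a hypothetical (mnemonic, dest, src1, src2) tuple; registers
    are strings such as "R1" and immediates are ints.
    """
    mnemonic, dest, src1, src2 = instr
    if mnemonic != "add":
        return None
    # add R1 R2 R2 -> rISA_add_1 R1 R2: the second source repeats the first.
    if isinstance(src2, str) and src1 == src2:
        return ("rISA_add_1", dest, src1)
    # add R1 R1 4 -> rISA_add_2 R1: the destination doubles as the source
    # and the immediate 4 is implied by the opcode.
    if isinstance(src2, int) and dest == src1 and src2 == 4:
        return ("rISA_add_2", dest)
    return None


if __name__ == "__main__":
    print(to_implied_operand_form(("add", "R1", "R2", "R2")))
    print(to_implied_operand_form(("add", "R1", "R1", 4)))
    print(to_implied_operand_form(("add", "R1", "R2", "R3")))
```

Each implied operand frees bits in the 16-bit word, at the price of spending an opcode on a narrower special case.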

rISA Design Space Exploration (DSE)
- rISA design space is large -> exploration framework
- Mechanism to specify the rISA architectural model -> ADL-driven
- Compiler-in-the-loop DSE

rISA Design Space Exploration Framework
[Figure: exploration flow. Blocks: Application, Compiler, Simulator, Analysis. Inputs: Architecture Model and rISA Model (EXPRESSION description + rISA description).]
Parameters:
- No. of opcodes
- No. of operands
- Bits per operand
- Implicit operand
- Custom immediate value

ADL-based DSE
- Specify the rISA design in the EXPRESSION ADL
  - rISA to normal instruction mapping
  - rISA register restrictions on operands
  - Immediate field size
  - Special instructions: mx, rISA_mx, rISA_nop, rISA_extend, rISA_move, etc.
- Create a rISA model
- Evaluate the rISA model
  - Code size
  - Performance

Compiler-in-the-loop DSE
Generate the compiler from the rISA model.
- Instruction selection: profitability analysis
- Register allocation: honor register restrictions
- Scheduling: reduce register lifetimes

Compilation for rISA
1. Mark instructions that can be converted to rISA instructions.
   - Contiguous marked instructions form a "rISA Block".
2. Decide whether it is profitable to convert a rISA Block.
3. Replace marked instructions with rISA instructions.
4. Perform register allocation.
[Figure: compilation flow from source file (C/C++) to assembly. Stages: GCC front end, mark rISA Blocks, instruction selection (profitability analysis), register allocation, insert mode change instructions, insert nops. Intermediate forms: generic instruction set (3-address code), generic instruction set (with rISA Blocks), target instruction set (normal + rISA).]
Reference: "An Efficient Compiler Technique …", Halambi et al., DATE 2002.
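Step 1 amounts to finding maximal runs of convertible instructions. A minimal sketch follows, assuming instructions are held in a list and that convertibility is decided by a caller-supplied predicate; both are assumptions for illustration, not the EXPRESS compiler's actual data structures.

```python
def find_risa_blocks(instrs, convertible):
    """Return (start, end) index pairs, end exclusive, for each maximal run
    of contiguous instructions where `convertible(instr)` is true.

    `convertible` is a hypothetical predicate standing in for the check
    "this instruction has a rISA equivalent".
    """
    blocks = []
    start = None
    for i, instr in enumerate(instrs):
        if convertible(instr):
            if start is None:
                start = i  # a new run of marked instructions begins here
        elif start is not None:
            blocks.append((start, i))  # an unmarked instruction closes the run
            start = None
    if start is not None:
        blocks.append((start, len(instrs)))  # run extends to the end
    return blocks


if __name__ == "__main__":
    program = ["add", "sub", "mul", "add", "load", "add"]
    risa_ops = {"add", "sub", "load"}  # made-up set of convertible mnemonics
    print(find_risa_blocks(program, lambda op: op in risa_ops))
```

Each returned block is then a candidate for the profitability decision in step 2.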

Profitability Heuristic
Decides whether or not to convert a rISA Block to rISA instructions.
- Ideal decrease in code size: rISA_block_size(normalMode) - rISA_block_size(rISAMode)
- Increase in code size:
  - CS1: due to mode change instructions.
  - CS2: due to NOPs.
  - CS3: due to extra rISA load/store/move instructions.
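One way to read the heuristic is as a net saving: the ideal decrease minus CS1 + CS2 + CS3, with conversion chosen when the result is positive. The sketch below assumes all quantities are in bytes and that every instruction in the block maps to exactly one rISA instruction; the function names and the sample overhead values are hypothetical.

```python
NORMAL_BYTES = 4  # 32-bit normal instruction
RISA_BYTES = 2    # 16-bit rISA instruction


def net_saving(num_instrs, cs1, cs2, cs3):
    """Estimated bytes saved by converting a rISA Block of `num_instrs`
    instructions.

    cs1, cs2, cs3 are the overheads in bytes from mode change instructions,
    NOPs, and extra rISA load/store/move instructions respectively.
    """
    ideal_decrease = num_instrs * NORMAL_BYTES - num_instrs * RISA_BYTES
    return ideal_decrease - (cs1 + cs2 + cs3)


def is_profitable(num_instrs, cs1, cs2, cs3):
    """Convert only if the block gets strictly smaller (an assumed threshold)."""
    return net_saving(num_instrs, cs1, cs2, cs3) > 0


if __name__ == "__main__":
    # Made-up overheads: a long block absorbs them, a short one does not.
    print(net_saving(10, cs1=6, cs2=2, cs3=4), is_profitable(10, 6, 2, 4))
    print(net_saving(2, cs1=6, cs2=2, cs3=0), is_profitable(2, 6, 2, 0))
```

The second call shows why short blocks are often left in normal mode: the fixed mode change cost outweighs the two bytes saved per instruction.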

Register Pressure Heuristic
Estimates the extra spill/load/move instructions.
CS3 = (spill/reload code needed if the block is converted to rISA instructions) - (spill/reload code needed if the block is converted to normal instructions)
Spill code for a block is a function of:
- average register pressure
- number of instructions
- average live length
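The slides do not give the spill function itself, so the sketch below plugs in a deliberately simple placeholder to show how CS3 is formed as a difference of two estimates. The estimator, its formula, and all names are assumptions and should not be read as the authors' model; the register counts of 8 and 16 follow the 3-bit and 4-bit operand fields described earlier. The result is in instructions, not bytes.

```python
def estimated_spill_instrs(avg_pressure, num_instrs, avg_live_length, regs_available):
    """Toy placeholder estimate of spill/reload instructions for one block.

    NOT the authors' function. It assumes each register of excess pressure
    forces one live range to spill per `avg_live_length` instructions, and
    that every spilled range costs one store plus one reload.
    """
    if avg_live_length <= 0:
        raise ValueError("avg_live_length must be positive")
    excess = max(0.0, avg_pressure - regs_available)
    spilled_ranges = excess * num_instrs / avg_live_length
    return 2 * spilled_ranges


def cs3_extra_instrs(avg_pressure, num_instrs, avg_live_length,
                     risa_regs=8, normal_regs=16):
    """CS3 as a difference: spill estimate in rISA mode minus normal mode."""
    in_risa = estimated_spill_instrs(avg_pressure, num_instrs, avg_live_length, risa_regs)
    in_normal = estimated_spill_instrs(avg_pressure, num_instrs, avg_live_length, normal_regs)
    return in_risa - in_normal


if __name__ == "__main__":
    # Pressure of 10 fits in 16 registers but not in 8.
    print(cs3_extra_instrs(avg_pressure=10, num_instrs=20, avg_live_length=5))
    # Pressure of 6 fits in both modes, so no extra spill code is predicted.
    print(cs3_extra_instrs(avg_pressure=6, num_instrs=20, avg_live_length=5))
```

Any estimator with the same three inputs could be substituted; the difference structure is the part taken from the slide.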

Experimental Setup
- Platform: MIPS 32/16 architecture
- Benchmarks: Livermore loops
- Compare 5 rISA designs for code size reduction
- Our compiler: retargetable EXPRESS compiler for MIPS 32/16, with register-pressure-based rISA code generation

rISA Designs
5 rISA designs:
- rISA_7333: opcode 7 bits, each operand 3 bits.
- rISA_7333_imm: opcode 7 bits, each operand 3 bits; the immediate field is extended by using unused bits from the opcode field.
- rISA_imp_opnd: similar to rISA_7333_imm, but allows implicit operands.
- rISA_4444: opcode 4 bits, each operand 4 bits.
- rISA_hybrid: variable bits for opcode and operand; allows immediate extensions and implicit operands.
rISA_wxyz format: opcode (w bits), dest (x bits), op1 (y bits), op2 (z bits)

Results: Code Size Variation for rISA
[Figure: code size variation across the rISA designs. Annotations: less register accessibility; custom immediate field size; implicit operand; greater register accessibility; more opcodes and greater register accessibility.]

Conclusions
- rISA is an effective technique for code size reduction.
- The rISA design space is huge, hence the need for a design space exploration tool.
- We presented a design space exploration framework for rISA designs.
- Code size reduction varies significantly across rISA designs.
Future Work
- Automated design space exploration
- ISA exploration