Adding custom instructions to SimpleScalar/GCC architecture


Adding custom instructions to SimpleScalar/GCC architecture (Somasundaram)

Agenda: Motivation; GCC overall architecture; SimpleScalar architecture; Adding a custom instruction; Conclusion.

Motivation: Extensible processors. What regular ISA instructions can be combined? Which regular ISA instructions are to be combined into a CFU instruction? Retarget the compiler to produce optimised code with CFU instructions. Simulate the extended processor with CFU instructions.

GNU Compiler Collection: Many front-ends: C, Fortran, C++/Java/Ada. Backend targeted at many processors: x86, Alpha, Sparc, ARC, ARM, MIPS, ...

GCC Compiler Flow: Are we interested in everything? RTL? Combine small RISC-ISA-like patterns into bigger CISC-ISA-like patterns.

GCC - Low Level Optimisation: Uses Lisp-like RTL as IR. Tip: use the -da compiler option to get the IR output. Example:

    (insn 48 47 50 (set (reg/v:SI 36)
            (mult:SI (reg:SI 42)
                (reg:SI 41))) 41 {mulsi3} (nil)
        (nil))

    (call_insn 94 93 97 (parallel[
                (set (reg:SI 0 r0)
                    (call (mem:SI (symbol_ref:SI ("printf")) 0)
                        (const_int 0 [0x0])))
                (clobber (reg:SI 14 lr))
            ] ) -1 (nil)
        (nil)
        (expr_list (use (reg:SI 1 r1))
            (expr_list (use (reg:SI 0 r0))
                (nil))))

GCC - Target Machine Description: Uses a similar language in the md [machine description] file.

    (define_insn "mulsi3"
      [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
            (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
                     (match_operand:SI 1 "s_register_operand" "%?r,0")))]
      ""
      "mul%?\\t%0, %2, %1"
      [(set_attr "type" "mult")])

GCC Combine Phase: Combines some standard IR patterns into a single user-defined IR pattern. User-defined IR patterns are defined in the target.md file. Operand constraints should be satisfied. Example: MAC (Multiply-Accumulate): merge mulsi3 and addsi3 -> mulsi3addsi.

GCC Combine Phase: How is it done? Let us assume that the following patterns are defined in the machine description:

    addsi3       -> matches C = A + B      (all 32-bit regs)
    mulsi3       -> matches C = A * B      (all 32-bit regs)
    mulsi3addsi  -> matches D = A*B + C    (all 32-bit regs)
    mulsi4addsi  -> matches E = A*B + C*D  (all 32-bit regs)
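To make the four patterns above concrete, here is a minimal C sketch of the arithmetic each one is meant to match. The function names simply reuse the pattern names from the slide; the choice of unsigned 32-bit types (so that overflow wraps as it would in a register) is an assumption of this sketch, not something the presentation specifies.

```c
#include <stdint.h>

/* Unsigned 32-bit arithmetic is used so that overflow wraps around,
   as it would in a 32-bit register (an assumption of this sketch). */

/* addsi3: C = A + B */
uint32_t addsi3(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}

/* mulsi3: C = A * B */
uint32_t mulsi3(uint32_t a, uint32_t b)
{
    return (uint32_t)(a * b);
}

/* mulsi3addsi: D = A*B + C, i.e. a multiply-accumulate (MAC) */
uint32_t mulsi3addsi(uint32_t a, uint32_t b, uint32_t c)
{
    return addsi3(mulsi3(a, b), c);
}

/* mulsi4addsi: E = A*B + C*D, i.e. two multiplies feeding one add */
uint32_t mulsi4addsi(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return addsi3(mulsi3(a, b), mulsi3(c, d));
}
```

Note that mulsi4addsi is built from exactly three primitive operations (two mulsi3 and one addsi3), which is why it is a natural candidate for a combined instruction.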

GCC Combine Phase: Assume this DDG sub-graph. [Figure: DDG sub-graph]

GCC Combine Phase: Try 55,45: no matching pattern. Try 55,47: we have a match.

GCC Combine Phase: Try 55,52: no matching pattern. Try 55,50: cannot try to combine more than 3 patterns! Hence, stop!

GCC Combine Phase, Summary: Can combine up to 3 instructions together. Can recursively combine more instructions. Deletes a smaller instruction once combined. Always works on one function at a time.
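The behaviour summarised above can be illustrated with a toy matcher in C. This is not GCC's combiner and does not use GCC's data structures: the expression tree, the `match_pattern` function and the `MAX_COMBINE` constant are all hypothetical names invented for this sketch. It only shows the idea of merging the expressions of several instructions into one tree, refusing to merge more than three, and checking the result against the known patterns.

```c
#include <stddef.h>

/* At most three instructions may be merged, as stated in the summary. */
#define MAX_COMBINE 3

enum op { OP_REG, OP_PLUS, OP_MULT };

enum pattern {
    PAT_NONE, PAT_ADDSI3, PAT_MULSI3, PAT_MULSI3ADDSI, PAT_MULSI4ADDSI
};

/* Toy expression tree: lhs and rhs are NULL for a register leaf. */
struct expr {
    enum op op;
    const struct expr *lhs;
    const struct expr *rhs;
};

/* Number of arithmetic instructions that were merged to build this tree. */
int insn_count(const struct expr *e)
{
    if (e->op == OP_REG)
        return 0;
    return 1 + insn_count(e->lhs) + insn_count(e->rhs);
}

int is_reg(const struct expr *e)
{
    return e->op == OP_REG;
}

int is_mult_of_regs(const struct expr *e)
{
    return e->op == OP_MULT && is_reg(e->lhs) && is_reg(e->rhs);
}

/* Return the pattern that matches the merged tree, or PAT_NONE. */
enum pattern match_pattern(const struct expr *e)
{
    if (insn_count(e) > MAX_COMBINE)
        return PAT_NONE;              /* too many instructions: stop */
    if (e->op == OP_MULT)
        return is_mult_of_regs(e) ? PAT_MULSI3 : PAT_NONE;
    if (e->op == OP_PLUS) {
        int l_mul = is_mult_of_regs(e->lhs);
        int r_mul = is_mult_of_regs(e->rhs);
        int l_reg = is_reg(e->lhs);
        int r_reg = is_reg(e->rhs);
        if (l_mul && r_mul)
            return PAT_MULSI4ADDSI;   /* A*B + C*D */
        if ((l_mul && r_reg) || (l_reg && r_mul))
            return PAT_MULSI3ADDSI;   /* A*B + C   */
        if (l_reg && r_reg)
            return PAT_ADDSI3;        /* A + B     */
    }
    return PAT_NONE;
}
```

In this toy model, a tree for A*B + C*D counts as three merged instructions and matches mulsi4addsi, while adding one more multiply on top pushes the count past the limit and the match is refused.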

Retargeting GCC for CFU: Build a better combiner phase: write a new combiner with a better pattern merger that works on inputs from RTL, and replace the existing combiner with this combiner. Add new patterns for the CFU instruction in the target.md file. Make changes in GAS (included in the binutils package) to generate the instruction word.
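The last step, generating the instruction word in the assembler, amounts to packing the opcode and register numbers into a 32-bit value. The sketch below shows that packing in C for a five-register instruction. The bit layout and the opcode value `CFU_OPCODE_ML2A` are invented for illustration; the presentation does not give the actual encoding, and this is not how GAS itself is structured. Only the ARM "always" condition code (0xE) is a real value.

```c
#include <stdint.h>

/* ARM condition code "always" (AL). */
#define COND_AL 0xEu

/* Hypothetical 8-bit opcode for the custom instruction. */
#define CFU_OPCODE_ML2A 0x7Fu

/*
 * Hypothetical layout, chosen only so that five 4-bit register
 * fields fit in one word:
 *   [31:28] cond  [27:20] opcode  [19:16] Re
 *   [15:12] Ra    [11:8]  Rb      [7:4]   Rc   [3:0] Rd
 * The instruction computes Re = Ra*Rb + Rc*Rd.
 */
uint32_t encode_ml2a(unsigned cond, unsigned re, unsigned ra,
                     unsigned rb, unsigned rc, unsigned rd)
{
    return ((uint32_t)(cond & 0xFu) << 28)
         | ((uint32_t)CFU_OPCODE_ML2A << 20)
         | ((uint32_t)(re & 0xFu) << 16)
         | ((uint32_t)(ra & 0xFu) << 12)
         | ((uint32_t)(rb & 0xFu) << 8)
         | ((uint32_t)(rc & 0xFu) << 4)
         |  (uint32_t)(rd & 0xFu);
}
```

With this assumed layout, `encode_ml2a(COND_AL, 0, 1, 2, 3, 4)` yields 0xE7F01234, which makes the individual fields easy to read off in hexadecimal.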

SimpleScalar: An instruction set simulator. Profiles programs. Simulates micro-architectural features. Offers different levels of trade-off between simulation speed and accuracy. Written in C. Easily retargetable.

SimpleScalar, CFU issues: More arguments than used by RISC instructions. Out-of-order execution needs to take care of the increase in dependencies. New instructions in the decode tree. Easy to add new instructions to the decode tree (machine.def).

Let us add a new instruction: Achieve the operation E=A*B+C*D using one instruction. 4 input operands and 1 output operand. Extension to the ARM ISA. Provide: compiler, assembler, simulator.

Pattern for the instruction (gcc/config/arm/arm.md):

    (define_insn "*mulsi4addsi"
      [(set (match_operand:SI 0 "s_register_operand" "=r")
            (plus:SI (mult:SI (match_operand:SI 2 "s_register_operand" "r")
                              (match_operand:SI 1 "s_register_operand" "r"))
                     (mult:SI (match_operand:SI 4 "s_register_operand" "r")
                              (match_operand:SI 3 "s_register_operand" "r"))))]
      ""
      "ml2a%?\\t%0, %2, %1, %4, %3"
      [(set_attr "type" "mult")])
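A small C source file whose expressions have the shape A*B + C*D gives the combiner something to match against the pattern above. The sketch below is one possible test input; the function names `mul2_add` and `dot` are hypothetical. Whether the combiner actually selects the new pattern depends on the GCC version, the optimisation level and register allocation, so the RTL dumps (the -da option mentioned earlier) are the place to check.

```c
#include <stddef.h>

/* E = A*B + C*D: the expression the custom instruction is meant to cover. */
int mul2_add(int a, int b, int c, int d)
{
    return a * b + c * d;
}

/* Dot product that consumes two element pairs per iteration, so each
   loop body contains one A*B + C*D expression. */
int dot(const int *x, const int *y, size_t n)
{
    int sum = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2)
        sum += mul2_add(x[i], y[i], x[i + 1], y[i + 1]);

    if (i < n)              /* leftover element when n is odd */
        sum += x[i] * y[i];

    return sum;
}
```

The same file can be compiled with the unmodified compiler first, so that the two results can be compared when the program is run on the simulator.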

SimpleScalar changes: Instruction decode tree (target-arm/arm.def): a chain of decoders, each looking at a set of bits. Changes in target-arm/arm.def: add a new chain of decoder macros for the CFU class of instructions, and increase the number of input dependencies in all the instruction macros from 5 to 6 (predication in ARM).
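The "chain of decoders" idea can be sketched in plain C: a first decoder inspects one group of bits to pick an instruction class, and a second decoder inspects another group to pick the operation within that class. The real arm.def expresses this with macros; the code below does not reproduce them. The bit positions, the class value `CFU_CLASS` and the operation value `CFU_OP_ML2A` are assumptions made only for this illustration.

```c
#include <stdint.h>

/* Hypothetical values: bits [27:24] select the CFU class,
   bits [23:20] select the operation inside that class. */
#define CFU_CLASS   0x7u
#define CFU_OP_ML2A 0xFu

enum insn_kind { INSN_UNKNOWN, INSN_ML2A };

struct decoded {
    enum insn_kind kind;
    unsigned cond;              /* ARM condition field, bits [31:28] */
    unsigned re, ra, rb, rc, rd; /* Re = Ra*Rb + Rc*Rd */
};

/* Extract bits [hi:lo] of a word (field width must be below 32). */
uint32_t bits(uint32_t w, unsigned hi, unsigned lo)
{
    return (w >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

struct decoded decode(uint32_t w)
{
    struct decoded d = { INSN_UNKNOWN, 0, 0, 0, 0, 0, 0 };

    d.cond = bits(w, 31, 28);

    /* First link in the chain: is this the CFU class at all? */
    if (bits(w, 27, 24) != CFU_CLASS)
        return d;

    /* Second link: which CFU operation is it? */
    switch (bits(w, 23, 20)) {
    case CFU_OP_ML2A:
        d.kind = INSN_ML2A;
        d.re = bits(w, 19, 16);
        d.ra = bits(w, 15, 12);
        d.rb = bits(w, 11, 8);
        d.rc = bits(w, 7, 4);
        d.rd = bits(w, 3, 0);
        break;
    default:
        break;                  /* unknown CFU operation */
    }
    return d;
}
```

Adding another CFU instruction under this scheme would mean adding one more case to the second decoder, which mirrors the observation that new instructions are easy to add to the decode tree.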

SimpleScalar changes (sim-outorder.c): Increase the number of input dependencies to be monitored in the reservation unit. Both macros and code have to be changed. Other files need to be changed for the same purpose. Compile a ‘test program’ and verify!
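Why the dependency count matters can be seen in a simplified model of a reservation unit entry: the entry tracks a fixed number of input operands and may issue only when all of them are ready. The sketch below is not taken from sim-outorder.c; the structure, the function names and the constant `NUM_INPUT_DEPS` are hypothetical, and the value 6 simply follows the "from 5 to 6" change mentioned in the slides.

```c
#include <stdbool.h>

/* Number of input dependencies tracked per entry (assumed name). */
#define NUM_INPUT_DEPS 6
#define NO_REG (-1)

struct rs_entry {
    int  src_reg[NUM_INPUT_DEPS];   /* source register, or NO_REG if unused */
    bool src_ready[NUM_INPUT_DEPS]; /* has the operand been produced yet?   */
};

/* Fill an entry with nsrcs pending sources; fails if they do not fit. */
bool rs_init(struct rs_entry *e, const int *srcs, int nsrcs)
{
    if (nsrcs < 0 || nsrcs > NUM_INPUT_DEPS)
        return false;
    for (int i = 0; i < NUM_INPUT_DEPS; i++) {
        bool used = i < nsrcs;
        e->src_reg[i] = used ? srcs[i] : NO_REG;
        e->src_ready[i] = !used;    /* unused slots never block issue */
    }
    return true;
}

/* A producer has written register reg: wake up matching operands. */
void rs_wakeup(struct rs_entry *e, int reg)
{
    for (int i = 0; i < NUM_INPUT_DEPS; i++)
        if (e->src_reg[i] == reg)
            e->src_ready[i] = true;
}

/* The instruction may issue only when every input is ready. */
bool rs_all_ready(const struct rs_entry *e)
{
    for (int i = 0; i < NUM_INPUT_DEPS; i++)
        if (!e->src_ready[i])
            return false;
    return true;
}
```

In this model, an instruction with four register inputs plus a predication input already needs five slots, so an entry sized for fewer dependencies would be rejected by `rs_init`; that is the kind of limit the macro and code changes have to lift.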

Summary: Identify the ways to add new instructions to SimpleScalar and GCC. Determine the capabilities of the current combiner in GCC. Demonstrate the addition of a new custom instruction. Understand GCC to some extent!