Published by Julissa Drews. Modified over 9 years ago.
1
Adding custom instructions to the SimpleScalar/GCC architecture
Somasundaram
2
Agenda
- Motivation
- GCC overall architecture
- SimpleScalar architecture
- Adding a custom instruction
- Conclusion
3
Motivation
- Extensible processors
  - Which regular ISA instructions can be combined?
  - Which regular ISA instructions are to be combined into a CFU instruction?
  - Retarget the compiler to produce optimised code with CFU instructions
  - Simulate the extended processor with CFU instructions
4
GNU Compiler Collection
- Many front-ends
  - C
  - Fortran
  - C++/Java/Ada
- Backend targeted at many processors
  - x86, Alpha, Sparc
  - ARC, ARM, MIPS
  - . . .
5
GCC Compiler Flow
- Are we interested in everything?
- RTL?
- Combine small RISC-ISA-like patterns into bigger CISC-ISA-like patterns
6
GCC - Low Level Optimisation
- Uses Lisp-like RTL as IR
- Tip: use the -da compiler option to get the IR output
- Example:

    (insn (set (reg/v:SI 36)
               (mult:SI (reg:SI 42) (reg:SI 41)))
          41 {mulsi3} (nil) (nil))

    (call_insn (parallel [
                 (set (reg:SI 0 r0)
                      (call (mem:SI (symbol_ref:SI ("printf")) 0)
                            (const_int 0 [0x0])))
                 (clobber (reg:SI 14 lr))
               ])
               -1 (nil) (nil)
               (expr_list (use (reg:SI 1 r1))
                          (expr_list (use (reg:SI 0 r0)) (nil))))
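The Lisp-like shape of the RTL dump becomes easier to see when one insn is read into a nested list. The following is a minimal Python sketch and not part of GCC: `parse_rtl` is a hypothetical helper that handles only the parenthesised subset used by the mulsi3 insn from the example.

```python
def parse_rtl(text):
    """Parse a parenthesised RTL expression into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token          # an atom such as reg:SI or 42
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1                  # skip the closing parenthesis
        return items

    return read()


# The multiply insn from the slide, written on one line.
insn = parse_rtl(
    "(insn (set (reg/v:SI 36) (mult:SI (reg:SI 42) (reg:SI 41)))"
    " 41 {mulsi3} (nil) (nil))"
)
dest, source = insn[1][1], insn[1][2]   # the two operands of (set ...)
print("destination:", dest)
print("source:", source)
```

The `set` node carries the destination register first and the `mult:SI` expression second, which is the tree that the machine description patterns are matched against.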
7
GCC - Target Machine Description
- Uses a similar language in the md [machine description] file

    (define_insn "mulsi3"
      [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
            (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
                     (match_operand:SI 1 "s_register_operand" "%?r,0")))]
      ""
      "mul%?\\t%0, %2, %1"
      [(set_attr "type" "mult")])
8
GCC Combine Phase
- Combines several standard IR patterns into a single user-defined IR pattern
- User-defined IR patterns are defined in the target.md file
- Operand constraints must be satisfied
- Example: MAC (Multiply-Accumulate)
  - Merge mulsi3 and addsi3 into mulsi3addsi
9
GCC Combine Phase: How is it done?
Let us assume that the following patterns are defined in the machine description:
- addsi3: matches C=A+B (all 32-bit regs)
- mulsi3: matches C=A*B (all 32-bit regs)
- mulsi3addsi: matches D=A*B+C (all 32-bit regs)
- mulsi4addsi: matches E=A*B+C*D (all 32-bit regs)
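To make the four assumed patterns concrete, the sketch below writes each one as a nested tuple and checks an expression tree against them. This is an illustration in Python, not how GCC stores or matches patterns: the tuple encoding, the `REG` wildcard, and the functions `matches` and `recognise` are all hypothetical.

```python
REG = "reg"  # wildcard: stands for any 32-bit register operand

# Hypothetical tuple encodings of the four patterns named on the slide.
PATTERNS = [
    ("mulsi4addsi", ("plus", ("mult", REG, REG), ("mult", REG, REG))),
    ("mulsi3addsi", ("plus", ("mult", REG, REG), REG)),
    ("mulsi3", ("mult", REG, REG)),
    ("addsi3", ("plus", REG, REG)),
]


def matches(pattern, expr):
    """Return True if expr has exactly the shape described by pattern."""
    if pattern == REG:
        return isinstance(expr, str)          # a leaf register name
    if isinstance(expr, str) or len(pattern) != len(expr):
        return False
    return pattern[0] == expr[0] and all(
        matches(p, e) for p, e in zip(pattern[1:], expr[1:])
    )


def recognise(expr):
    """Return the name of the first pattern that matches expr, or None."""
    for name, pattern in PATTERNS:
        if matches(pattern, expr):
            return name
    return None


print(recognise(("plus", "A", "B")))
print(recognise(("plus", ("mult", "A", "B"), "C")))
print(recognise(("plus", ("mult", "A", "B"), ("mult", "C", "D"))))
```

An expression with no corresponding entry, such as a subtraction, yields `None`, which corresponds to the "No matching pattern" outcome in the combine walkthrough.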
10
GCC Combine Phase
Assume this DDG sub-graph
11
GCC Combine Phase
- Try 55,45: No matching pattern
- Try 55,47: We have a match
12
GCC Combine Phase
- Try 55,52: No matching pattern
- Try 55,50: Cannot try to combine more than 3 patterns! Hence, stop!
13
GCC Combine Phase: Summary
- Can combine up to 3 instructions together
- Can recursively combine more instructions
- Deletes a smaller instruction once it has been combined
- Always works on one function at a time
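The behaviour summarised above (substitute producers into a consumer, look for a matching pattern, and give up beyond 3 instructions) can be sketched as follows. This is a simplified Python model under stated assumptions, not GCC's combiner: the instruction numbers, register names, the `PATTERNS` table, and the functions `shape`, `substitute` and `try_combine` are hypothetical, and checks a real combiner needs (operand constraints, whether the intermediate result is still used elsewhere, commutative operand order) are left out.

```python
MAX_COMBINE = 3  # the slides state that at most 3 instructions are combined

# Hypothetical table: expression shape -> machine description pattern name.
PATTERNS = {
    "plus(reg,reg)": "addsi3",
    "mult(reg,reg)": "mulsi3",
    "plus(mult(reg,reg),reg)": "mulsi3addsi",
    "plus(mult(reg,reg),mult(reg,reg))": "mulsi4addsi",
}


def shape(expr):
    """Reduce an expression tree to a string with register names erased."""
    if isinstance(expr, str):
        return "reg"
    op, *args = expr
    return f"{op}({','.join(shape(a) for a in args)})"


def substitute(expr, dest, value):
    """Replace every use of register dest in expr by the tree value."""
    if isinstance(expr, str):
        return value if expr == dest else expr
    op, *args = expr
    return (op, *(substitute(a, dest, value) for a in args))


def try_combine(insns, consumer, producers):
    """Merge producers into consumer; return the matching pattern or None."""
    if 1 + len(producers) > MAX_COMBINE:
        return None                      # more than 3 instructions: stop
    expr = insns[consumer][1]
    for p in producers:
        dest, value = insns[p]
        expr = substitute(expr, dest, value)
    return PATTERNS.get(shape(expr))


# Hypothetical instruction stream: insn number -> (destination, expression).
insns = {
    0: ("a", ("plus", "x", "y")),
    1: ("t1", ("mult", "a", "b")),
    2: ("t2", ("mult", "c", "d")),
    3: ("e", ("plus", "t1", "t2")),
}
print(try_combine(insns, 3, [1]))
print(try_combine(insns, 3, [1, 2]))
print(try_combine(insns, 3, [0, 1, 2]))
```

Combining insn 3 with both multiplies stays within the 3-instruction limit, which is why E=A*B+C*D is reachable by the existing combiner as long as a matching pattern exists.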
14
Retargeting GCC for CFU
- Build a better combiner phase
  - Write a new combiner with a better pattern merger that works on RTL input
  - Replace the existing combiner with this combiner
- New patterns for the CFU instruction in the target.md file
- Changes in GAS (included in the binutils package) to generate the instruction word
15
SimpleScalar is an instruction set simulator
- Profiles programs
- Simulates micro-architectural features
- Offers different levels of trade-off between simulation speed and accuracy
- Written in C
- Easily retargetable
16
Simplescalar: CFU issues
More arguments than used by RISC instructions Out-of-order execution needs to take care of the increase in dependencies New instructions in decode tree Easy to add new instructions to the decode tree (machine.def)
17
Let us add a new instruction
Achieve the operation E=A*B+C*D using one instruction 4 input operands and 1 output operand Extension to ARM ISA Provide Compiler Assembler Simulator
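The intended semantics can be written down as a small reference model, which is also handy when checking the simulator against a test program. The Python sketch below assumes 32-bit registers with results truncated to 32 bits (wrap-around arithmetic); the function names are hypothetical, with `ml2a` borrowed from the mnemonic used in the machine description pattern.

```python
MASK32 = 0xFFFFFFFF  # assumption: results are truncated to 32 bits


def mul32(a, b):
    """Model of a regular 32-bit multiply."""
    return (a * b) & MASK32


def add32(a, b):
    """Model of a regular 32-bit add."""
    return (a + b) & MASK32


def ml2a(a, b, c, d):
    """E = A*B + C*D computed by the single custom instruction."""
    return (a * b + c * d) & MASK32


def reference(a, b, c, d):
    """The same value computed with the three regular instructions."""
    return add32(mul32(a, b), mul32(c, d))


# The custom instruction must agree with the mul, mul, add sequence.
for operands in [(2, 3, 4, 5), (0xFFFFFFFF, 2, 0, 0), (0x10000, 0x10000, 7, 9)]:
    assert ml2a(*operands) == reference(*operands)
print(ml2a(2, 3, 4, 5))
```

Because truncation to 32 bits commutes with addition and multiplication, the fused form and the three-instruction form always agree under this assumption.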
18
Pattern for the instruction
gcc/config/arm/arm.md (define_insn "*mulsi4addsi" [(set (match_operand:SI 0 "s_register_operand" "=r") (plus:SI (mult:SI (match_operand:SI 2 "s_register_operand" "r") (match_operand:SI 1 "s_register_operand" "r")) (mult:SI (match_operand:SI 4 "s_register_operand" "r") (match_operand:SI 3 "s_register_operand" "r"))))] "" "ml2a%?\\t%0, %2, %1, %4, %3" [(set_attr "type" "mult")])
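The output template in the pattern decides the order in which the operands appear in the assembly line. The Python sketch below mimics that substitution for illustration only: `expand_template` is a hypothetical helper, it supports just `%0` to `%9` and `%?`, and it assumes `%?` stands for the optional ARM condition suffix.

```python
import re


def expand_template(template, operands, cond=""):
    """Substitute %N with operands[N] and %? with the condition suffix."""
    def replace(match):
        code = match.group(1)
        return cond if code == "?" else operands[int(code)]
    return re.sub(r"%(\?|\d)", replace, template)


# Operand 0 is the destination; operands 1 to 4 are the four inputs.
regs = ["r0", "r1", "r2", "r3", "r4"]
template = "ml2a%?\t%0, %2, %1, %4, %3"
print(expand_template(template, regs))
print(expand_template(template, regs, cond="eq"))
```

Note that the template lists operand 2 before operand 1 and operand 4 before operand 3, so the assembler sees the registers in that order.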
19
SimpleScalar changes: Instruction Decode Tree
- File: target-arm/arm.def
- Chain of decoders, each looking at a set of bits
- New chain of decoder macros for the CFU class of instructions
- Increase the number of input dependencies in all the instruction macros from 5 to 6 (predication in ARM)
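A chain of decoders, each looking at a set of bits, can be modelled as a small tree walk. The sketch below is written in Python rather than with the macros of arm.def, and every field position and opcode value in it is a made-up assumption for illustration; it is not the real ARM encoding or the encoding used in the presentation.

```python
def bits(word, hi, lo):
    """Extract the bit field word[hi:lo], both ends inclusive."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


# Each decoder node is (hi, lo, table): look at bits hi..lo, then follow
# the table to another decoder node or to an instruction name.
# All field positions and values below are hypothetical.
CFU_DECODER = (24, 21, {0b0001: "ML2A"})
DECODE_ROOT = (27, 25, {
    0b000: "data-processing",
    0b111: CFU_DECODER,          # hypothetical CFU instruction class
})


def decode(word, node=DECODE_ROOT):
    """Walk the decoder chain until an instruction name is reached."""
    while not isinstance(node, str):
        hi, lo, table = node
        node = table.get(bits(word, hi, lo), "undefined")
    return node


# Hypothetical encoding: bits 31..28 = 0xE, class 0b111, opcode 0b0001.
cfu_word = (0xE << 28) | (0b111 << 25) | (0b0001 << 21)
print(decode(cfu_word))
print(decode(0xE << 28))
```

Adding an instruction in this model means adding one table entry or one new node, which mirrors why the slides describe extending the decode tree as easy.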
20
Simplescalar changes sim-outorder.c
Increase the number of input dependencies to be monitored in the reservation unit Both macros and code has to be changed Other files need to be changed for the same purpose Compile ‘test program’ and verify!
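Why the reservation unit needs a larger dependency limit can be seen in a toy model: an instruction may issue only once every input it waits on has been produced, so a 4-input instruction needs more slots than a regular RISC instruction. The Python sketch below is not code from sim-outorder.c; the class, the constant name, and the choice of the status register as an extra input are illustrative assumptions, and only the limit of 6 comes from the slides.

```python
MAX_INPUT_DEPS = 6  # the slides raise this limit from 5 to 6


class ReservationEntry:
    """One instruction waiting in the reservation unit (toy model)."""

    def __init__(self, name, inputs):
        if len(inputs) > MAX_INPUT_DEPS:
            raise ValueError(f"{name}: too many input dependencies")
        self.name = name
        self.waiting = set(inputs)   # inputs not yet produced

    def wakeup(self, register):
        """Record that a producer has written the given register."""
        self.waiting.discard(register)

    @property
    def ready(self):
        """True once no input dependency is outstanding."""
        return not self.waiting


# Four source registers plus the flags read by a predicated instruction.
entry = ReservationEntry("ml2a", ["r1", "r2", "r3", "r4", "cpsr"])
for reg in ["r1", "r2", "r3", "r4"]:
    entry.wakeup(reg)
print(entry.ready)      # still waiting on the flags
entry.wakeup("cpsr")
print(entry.ready)
```

An entry with more inputs than the limit is rejected outright in this model, which is the situation the change from 5 to 6 is meant to avoid.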
21
Summary
- Identify the ways to add new instructions to SimpleScalar and GCC
- Determine the capabilities of the current combiner in GCC
- Demonstrate the addition of a new custom instruction
- Understand GCC to some extent!