Adding custom instructions to Simplescalar/GCC architecture Somasundaram
Agenda Motivation GCC overall architecture Simplescalar architecture Adding a custom instruction Conclusion Motivation GCC overall architecture Simplescalar architecture Adding a custom instruction Conclusion
Motivation Extensible processors What regular ISA instructions can be combined? Which regular ISA instructions are to be combined into a CFU instruction? Retarget the compiler to produce optimised code with CFU instructions Simulate the extended processor with CFU instructions
GNU Compiler Collection Many front-ends C Fortran C++/Java/Ada Backend targeted at many processors x86, Alpha, Sparc ARC, ARM, MIPS . . .
Are we interested in everything? GCC Compiler Flow RTL? Combine small RISC ISA like patterns into bigger CISC ISA like patterns Are we interested in everything?
GCC – Low Level Optimisation Uses Lisp like RTL as IR Example: Tip: use –da compiler option to get the IR output (insn 48 47 50 (set (reg/v:SI 36) (mult:SI (reg:SI 42) (reg:SI 41))) 41 {mulsi3} (nil) (nil)) (call_insn 94 93 97 (parallel[ (set (reg:SI 0 r0) (call (mem:SI (symbol_ref:SI ("printf")) 0) (const_int 0 [0x0]))) (clobber (reg:SI 14 lr)) ] ) -1 (nil) (nil) (expr_list (use (reg:SI 1 r1)) (expr_list (use (reg:SI 0 r0)) (nil))))
GCC - Target Machine Description Use a similar language in md [machine description] file (define_insn "mulsi3" [(set (match_operand:SI 0 "s_register_operand" "=&r,&r") (mult:SI (match_operand:SI 2 "s_register_operand" "r,r") (match_operand:SI 1 "s_register_operand" "%?r,0")))] "" "mul%?\\t%0, %2, %1" [(set_attr "type" "mult")])
GCC Combine Phase Combines some standard IR pattern into a single user-defined IR pattern User-defined IR patterns are defined in the target.md file Operand constraints should be satisfied Example: MAC (Multiply-Accumulate) Merge mulsi3 and addsi3 mulsi3addsi
GCC Combine Phase How is it done? Let us assume that the following patterns are defined in the machine description addsi3 Matches C=A+B (all 32-bit regs) mulsi3 Matches C=A*B (all 32-bit regs) mulsi3addsi Matches D=A*B+C (all 32-bit regs) mulsi4addsi Matches E=A*B+C*D (all 32-bit regs)
GCC Combine Phase Assume this DDG sub-graph
GCC Combine Phase Try 55,45: No matching pattern Try 55,47: We have a match
GCC Combine phase Try 55,52: No matching pattern Try 55,50: Cannot try to combine more than 3 patterns! Hence, stop!
GCC Combine phase: Summary Can combine upto 3 instructions together Can recursively combine more instructions Deletes a smaller instruction once combined Always works on a function
Retargetting GCC for CFU Build a better Combiner phase Write a new combiner with better pattern merger which works on inputs from RTL Replace existing combiner with this combiner New patterns for the CFU instruction in the target.md file Changes in GAS (included in binutils package) to generate insn. word
SimpleScalar is Instruction Set simulator Profiles programs Simulates micro-architectural features Different levels of speed of simulation Vs accuracy trade-off Written in C Easily retargettable
Simplescalar: CFU issues More arguments than used by RISC instructions Out-of-order execution needs to take care of the increase in dependencies New instructions in decode tree Easy to add new instructions to the decode tree (machine.def)
Let us add a new instruction Achieve the operation E=A*B+C*D using one instruction 4 input operands and 1 output operand Extension to ARM ISA Provide Compiler Assembler Simulator
Pattern for the instruction gcc/config/arm/arm.md (define_insn "*mulsi4addsi" [(set (match_operand:SI 0 "s_register_operand" "=r") (plus:SI (mult:SI (match_operand:SI 2 "s_register_operand" "r") (match_operand:SI 1 "s_register_operand" "r")) (mult:SI (match_operand:SI 4 "s_register_operand" "r") (match_operand:SI 3 "s_register_operand" "r"))))] "" "ml2a%?\\t%0, %2, %1, %4, %3" [(set_attr "type" "mult")])
Simplescalar changes Instruction Decode Tree target-arm/arm.def Chain of decoders: Each looking at a set of bits target-arm/arm.def New chain of decoder macros for CFU class of instructions Increase the number of input dependencies in all the instructio macros from 5 to 6 (predication in ARM)
Simplescalar changes sim-outorder.c Increase the number of input dependencies to be monitored in the reservation unit Both macros and code has to be changed Other files need to be changed for the same purpose Compile ‘test program’ and verify!
Summary Identify the ways to add new instructions to Simplescalar and GCC Determine the capabilities of the current combiner in GCC Demonstrate the addition of a new custom instruction Understand GCC to some extent!