Generating a software loop with memory accesses: TigerSHARC assembly syntax

Concepts
- Learning just enough TigerSHARC assembly code to make a software loop “work”
- Comparing the timings for rectification of integer and floating-point arrays, using
  - Debug C++ code
  - Release C++ code
  - Our FIRST_ASM code
- Looking in “MIXED mode” at the code generated by the compiler
12/5/2018 TigerSHARC assemble code 1, M. Smith, ECE, University of Calgary, Canada

Passing integer rectify

Add the “ASM” tests
- Want the link to fail to find the mangled name
- Name-mangled function name

More detailed look at the code
- As with 68K, needs a .section, but the name and format are different
- As with 68K, needs a .align statement. Is the “4” in bytes (8 bits) or words (32 bits)?
- As with 68K, needs .global to tell other code that this function exists
- Single semi-colons
- Double semi-colons
- Start function label
- End function label
- Label format similar to 68K: needs a leading underscore and a final colon

Using J8 for returned int * value
- Now passing this test “by accident”
- Should be conditionally passing back NULL

Parameter passing
- Spaces for the first four parameters are present on the stack (as with 68K)
- But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS)
- The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) when assembly code functions call assembly code functions
- J4, J5, J6 and J7 are volatile registers

Coding convention
    // int *HalfWaveRectifyRelease(int initial_array[ ],
    //                             int final_array[ ], int N)
    #define initial_pt_inpar1 J4
    #define final_pt_inpar2 J5
    #define M_J6_inpar3 J6
    #define return_pt_J8 J8

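For comparison with the assembly version, here is a minimal C++ sketch of the routine described by the prototype above. The function name `HalfWaveRectifyReference` is hypothetical, and the condition for returning NULL (N not positive) is an assumption: the slides only say that NULL should be passed back conditionally.

```cpp
#include <cstddef>

// Hypothetical C++ reference for the routine being hand-coded in assembly.
// Assumption: NULL is returned when N is not positive; the slides do not
// state the exact condition.
int *HalfWaveRectifyReference(int initial_array[], int final_array[], int N) {
    if (N <= 0) {
        return NULL;                    // nothing to process
    }
    for (int count = 0; count < N; count++) {
        int value = initial_array[count];
        // Half-wave rectification: negative values become zero,
        // everything else is copied unchanged
        final_array[count] = (value < 0) ? 0 : value;
    }
    return final_array;                 // the ASM version returns this pointer in J8
}
```

The three parameters correspond to the registers named in the #define lines (J4, J5, J6), and the returned pointer corresponds to J8.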
Note END_IF not defined and not yet recognized as an error
- ELSE is a KEYWORD
- Missing ;; means all these instructions are joined into “1-line” of more than 4 instructions

Two issues
- Jumps can be predicted to happen (default)
- Quad stuff issue
Personally, because of name mangling issues, I cut-and-paste the function name into labels

NEXT STEP: the code was not exactly what we designed (C++ equivalent), so refactor and retest after the refactoring

For-loop structure: use 68K style of looping, jumps

For-loop structure: use 68K style of looping, tests and jumps

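The “tests and jumps” idea can be sketched in C++ by writing the rectify loop with explicit labels and goto statements, which is roughly the shape the assembly code takes. This is only an illustration: the function name and the labels LOOP_TEST, STORE_VALUE and LOOP_END are hypothetical, not taken from the slides.

```cpp
// A for-loop rewritten as explicit tests and jumps.
// All names and labels here are hypothetical.
void RectifyWithTestsAndJumps(const int initial_array[], int final_array[], int N) {
    int count = 0;                          // loop initialization
    int value;
LOOP_TEST:
    if (count >= N) goto LOOP_END;          // test at the top; jump out when done
    value = initial_array[count];           // read from memory
    if (value >= 0) goto STORE_VALUE;       // IF test; skip the "set to zero" step
    value = 0;                              // negative values become zero
STORE_VALUE:
    final_array[count] = value;             // write to memory
    count++;                                // loop increment
    goto LOOP_TEST;                         // jump back to the test
LOOP_END:
    return;
}
```

Each goto corresponds to one jump instruction in the assembly version, and each label to an assembly label.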
Accessing memory
Basic mode
- Special register J31 acts as zero when used in additions
- Pt_J5 is a pointer register into an array
- Read_J1 is being used as a data register
- J registers are like MIPS registers (used as pointer and data), NOT like 68K registers (either data or address, but not both)
- Read_J1 = [Pt_J5];;
    Read value from memory location pointed to by J5
    Compare to 68K MOVE.L (A5), D1
- Read_J1 = [Pt_J5 + 8];;
    Read value from memory location pointed to by the value (J5 + 8)
    Compare to 68K MOVE.L 8(A5), D1
    PRE-MODIFY: address used is J5 + 8, no change in J5
- Read_J1 = [Pt_J5 + J31];;
    Read value from memory location pointed to by J5
    I read somewhere that this CAN be faster than just Read_J1 = [Pt_J5];; NEED TO CONFIRM

Accessing memory, step 2
Basic mode
- Pt_J5 is a pointer register into an array
- Offset_J4 is used as an offset
- Read_J1 is being used as a data register
- Read_J1 = [Pt_J5 + Offset_J4];;
    Read value from memory location pointed to by (J5 + J4)
    PRE-MODIFY: address used is J5 + J4, no change in J5
    Compare to 68K MOVE.L (A5, D4), D1
- Read_J1 = [Pt_J5 += Offset_J4];;
    Read value from memory location pointed to by J5, and then perform the add
    POST-MODIFY: address used is J5, then perform J5 = J5 + J4
    Compare to 68K MOVE.L (A5), D1 followed by ADD.L A4, A5, but as a single instruction

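The difference between the two modes can be sketched with C++ pointers. The helper names `ReadPreModify` and `ReadPostModify` are hypothetical. Note one assumption: C++ pointer arithmetic counts in array elements, which may not match how the TigerSHARC scales an offset.

```cpp
// PRE-MODIFY, as in Read_J1 = [Pt_J5 + Offset_J4];;
// The address used is (pointer + offset); the pointer itself is not changed.
int ReadPreModify(const int *pt, int offset) {
    return *(pt + offset);
}

// POST-MODIFY, as in Read_J1 = [Pt_J5 += Offset_J4];;
// The address used is the pointer; afterwards the pointer is advanced by the offset.
// The pointer is passed by reference so the caller sees the update.
int ReadPostModify(const int *&pt, int offset) {
    int value = *pt;
    pt += offset;
    return value;
}
```

With post-modify, repeated calls walk through the array without a separate add instruction, which is why it is the natural choice inside a loop.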
Many other addressing modes
- Normal memory accesses
- Merged memory accesses
- Broadcast memory accesses
- Single register accesses
- Dual register accesses
- Quad register accesses
- Cross-over accesses
- Access of COMPLEX numbers

For-loop structure: use 68K style of looping
- QUAD ERROR ISSUE AGAIN

Write the “float-asm”
- Integer 0 has bit pattern 0x0000 0000
- Float 0.0 has bit pattern 0x0000 0000
- Integer has format b S??? ???? ???? ???? ???? ???? ???? ????
- Float has format b S??? ???? ???? ???? ???? ???? ???? ???? (S is followed by the EXPONENT field)
- Float algorithm: if S == 1 (negative), set to zero; otherwise leave unchanged. This is the same as the integer algorithm
- Just re-use the integer algorithm with a change of name

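The claim that the integer algorithm also rectifies floats can be checked with a small C++ sketch that applies the integer rule to a float's bit pattern. The helper names are hypothetical, and the sketch assumes a 32-bit IEEE-754 float, so that the float sign bit sits in the same position as the integer sign bit.

```cpp
#include <cstdint>
#include <cstring>

// Integer half-wave rectify of one 32-bit value: negative (S == 1) becomes 0.
std::int32_t RectifyInt(std::int32_t value) {
    return (value < 0) ? 0 : value;
}

// Hypothetical helper: apply the integer rule to the bit pattern of a float.
// Assumes a 32-bit IEEE-754 float.
float RectifyFloatViaIntBits(float value) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "expects a 32-bit float");
    std::int32_t bits;
    std::memcpy(&bits, &value, sizeof bits);    // reinterpret float bits as an integer
    bits = RectifyInt(bits);                    // S == 1 gives all-zero bits, i.e. 0.0
    std::memcpy(&value, &bits, sizeof value);   // reinterpret back as a float
    return value;
}
```

Because the all-zero bit pattern is both integer 0 and float 0.0, the result is a valid float either way.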
Float ASM test

Do the timing tests

Weird results
Timings in cycles (two measurements per case)

             DEBUG       RELEASE     FIRST_ASM
    INTEGER  426, 416    124, 118    316, 320
    FLOAT    462, 458    210, 216    224, 222

- Variation of about 6 cycles in testing
- Our first ASM is faster than debug and slower than release: that was expected
- Our integer code was slower than our float code: that was unexpected, since it is the same code
- Can we optimize and improve the timing?

Integer release code: identify new instructions

Float release code: identify new instructions

Exercise 1, needed for Lab. 1
- FIR filter operation in C++: data and filter coefficients are both integer arrays

Exercise 1, needed for Lab. 1
- FIR filter operation in ASM: data and filter coefficients are both integer arrays

Insert C++ code

Insert assembler code version
