Generating a software loop with memory accesses


1 Generating a software loop with memory accesses
TigerSHARC assembly syntax

2 Concepts
Learning just enough TigerSHARC assembly code to make a software loop “work”
Comparing the timings for rectification of integer and floating point arrays, using
  debug C++ code
  Release C++ code
  Our FIRST_ASM code
Looking in “MIXED mode” at the code generated by the compiler
12/5/2018 TigerSHARC assemble code 1, M. Smith, ECE, University of Calgary, Canada

3 Passing integer rectify

4 Add the “ASM” tests
Want the link to fail, in order to find the mangled name
Name-mangled function name

5 More detailed look at the code
As with 68K, needs a .section, but name and format are different
As with 68K, needs a .align statement. Is the “4” in bytes (8 bits) or words (32 bits)?
As with 68K, needs .global to tell other code that this function exists
Single semi-colons
Double semi-colons
Start function label
End function label
Label format similar to 68K: needs leading underscore and final colon

6 Using J8 for returned int * value
Now passing this test “by accident”
Should be conditionally passing back NULL

7 Parameter passing
Spaces for the first four parameters are present on the stack (as with 68K)
But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS)
The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) when assembly code functions call assembly code functions
J4, J5, J6 and J7 are volatile registers

8 Coding convention
// int *HalfWaveRectifyRelease(int initial_array[ ],
//                             int final_array[ ], int N)
#define initial_pt_inpar1 J4
#define final_pt_inpar2 J5
#define M_J6_inpar J6
#define return_pt_J J8
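For reference, a C++ sketch of what the assembly routine is meant to do under this convention. The prototype comes from the slide. The body, the name HalfWaveRectifySketch, and the choice to return NULL when N is not positive are assumptions, based on the earlier note that the routine should conditionally pass back NULL.

```cpp
#include <cstddef>

// Sketch only: mirrors the prototype
//   int *HalfWaveRectifyRelease(int initial_array[], int final_array[], int N)
// Assumption: NULL is returned when N <= 0, otherwise final_array is returned.
int *HalfWaveRectifySketch(int initial_array[], int final_array[], int N) {
    if (N <= 0) {
        return NULL;  // conditional NULL return (the condition is assumed)
    }
    for (int count = 0; count < N; count++) {
        // Half-wave rectification: negative values become zero
        final_array[count] = (initial_array[count] < 0) ? 0 : initial_array[count];
    }
    return final_array;  // returned pointer (J8 in the assembly version)
}
```

In the assembly version the three parameters arrive in J4, J5 and J6, so the #define names above play the role of the C++ parameter names.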

9 Note END_IF not defined and not yet recognized as an error
ELSE is a KEYWORD
Missing ;; means all these instructions are joined into “1-line” of more than 4 instructions

10 Jumps can be predicted to happen (default) Quad stuff issue
Personally, because of name mangling issues, I cut-and-paste the function name into labels
Two issues:
  Jumps can be predicted to happen (default)
  Quad stuff issue

11 The code was not exactly what we designed (C++ equivalent) – refactor and retest after the refactoring
NEXT STEP

12 For – loop structure – Use 68K style of looping jumps

13 For – loop structure – Use 68K style of looping – tests and jumps
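One way to picture the 68K style of "tests and jumps" is to write a for-loop in C++ with explicit labels and goto, so that each line corresponds to one test or one jump. This is only a sketch: the function name SumWithTestsAndJumps and the label names are hypothetical and do not come from the slides.

```cpp
// Sketch: for (count = 0; count < N; count++) written as a test plus jumps.
int SumWithTestsAndJumps(const int array[], int N) {
    int sum = 0;
    int count = 0;                  // loop initialization
LOOP_TEST:
    if (count >= N) goto LOOP_END;  // test at the top, jump out when done
    sum += array[count];            // loop body
    count++;                        // loop increment
    goto LOOP_TEST;                 // jump back to the test
LOOP_END:
    return sum;
}
```

Testing at the top means the body is skipped entirely when N is zero or negative.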

14 Accessing memory
Basic mode
Special register J31 acts as zero when used in additions
Pt_J5 is a pointer register into an array
Read_J1 is being used as a data register
J registers are like MIPS registers (used as pointer and data), NOT like 68K registers, which are either data or address but not both

Read_J1 = [Pt_J5];;
  Read value from memory location pointed to by J5
  Compare to 68K MOVE.L (A5), D1
Read_J1 = [Pt_J5 + 8];;
  Read value from memory location pointed to by the value (J5 + 8)
  Compare to 68K MOVE.L 8(A5), D1
  PRE-MODIFY: address used is J5 + 8, no change in J5
Read_J1 = [Pt_J5 + J31];;
  Read value from memory location pointed to by J5
  I have read somewhere that this CAN be faster than just Read_J1 = [Pt_J5];; (NEED TO CONFIRM)

15 Accessing memory – step 2
Basic mode
Pt_J5 is a pointer register into an array
Offset_J4 is used as an offset
Read_J1 is being used as a data register

Read_J1 = [Pt_J5 + Offset_J4];;
  Read value from memory location pointed to by (J5 + J4)
  PRE-MODIFY: address used is J5 + J4, no change in J5
  Compare to 68K MOVE.L (A5, D4), D1
Read_J1 = [Pt_J5 += Offset_J4];;
  Read value from memory location pointed to by J5, and then perform the add
  POST-MODIFY: address used is J5, then perform J5 = J5 + J4
  Compare to 68K MOVE.L (A5), D1 followed by ADD.L A4, A5, but as a single instruction
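The difference between the two forms can be sketched in C++ with ordinary pointer arithmetic. The function names ReadPreModify and ReadPostModify are hypothetical, and the offsets here count array elements, as C++ pointer arithmetic does. Whether that matches the units of the TigerSHARC offset is an assumption.

```cpp
// Sketch: C++ analogues of the two access forms.

// Read_J1 = [Pt_J5 + Offset_J4];;   PRE-MODIFY, pointer is left unchanged
int ReadPreModify(const int *pt, int offset) {
    return *(pt + offset);
}

// Read_J1 = [Pt_J5 += Offset_J4];;  POST-MODIFY, pointer is updated after the read
int ReadPostModify(const int *&pt, int offset) {
    int read = *pt;  // address used is the old pointer value
    pt += offset;    // then the pointer is advanced
    return read;
}
```

Post-modify is the natural fit for walking an array inside a loop, because the read and the pointer update happen in one step.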

16 Many other addressing modes
Normal memory accesses
Merged memory accesses
Broadcast memory accesses
Single register accesses
Dual register accesses
Quad register accesses
Cross-over accesses
Access of COMPLEX numbers

17 For – loop structure – Use 68K style of looping
QUAD ERROR ISSUE AGAIN

18 Write the “float-asm”
Integer 0 has bit pattern 0x0000 0000
Float 0.0 has bit pattern 0x0000 0000
Integer has format b S??? ???? ???? ???? ???? ???? ???? ????
Float has format b S??? ???? ???? ???? ???? ???? ???? ???? (the 8 bits after S are the EXPONENT)
Float algorithm: if S == 1 (negative), set to zero
Otherwise leave unchanged – same as integer algorithm
Just re-use integer algorithm with a change of name
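The sign-bit argument can be checked in C++ by applying the integer test directly to the bits of a float. This is a sketch that assumes 32-bit IEEE 754 floats and two's complement integers. The function name RectifyFloatViaIntegerTest is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Sketch: rectify a float using only the integer "is it negative?" test.
float RectifyFloatViaIntegerTest(float value) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "assumes 32-bit float");
    std::int32_t bits;
    std::memcpy(&bits, &value, sizeof bits);     // reinterpret the float's bit pattern
    if (bits < 0) {                              // S == 1, same test as the integer code
        bits = 0;                                // 0x0000 0000 is also float 0.0
    }
    float result;
    std::memcpy(&result, &bits, sizeof result);  // back to float
    return result;
}
```

Positive values pass through with their bit pattern untouched, which is why the integer routine can be re-used with only a change of name.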

19 Float ASM test

20 Do the timing tests

21 Weird results
Columns: DEBUG, RELEASE, FIRST_ASM
INTEGER 426 416 124 118 316 320
FLOAT 462 458 224 222
Variation of about 6 cycles in testing
Our first ASM is faster than debug and slower than release – that was expected
Our integer code was slower than our float code – that was unexpected, since it is the same code
Can we optimize and improve the timing?

22 Integer release code – identify new instructions

23 Float release – identify new instructions

24 Exercise 1 – needed for Lab. 1
FIR filter operation -- data and filter-coefficients are both integer arrays – in C++
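As a starting point for the exercise, here is a minimal C++ sketch of one FIR output sample with integer data and coefficients. The function name FIRFilterSketch, the parameter names, and the assumption that data[tap] lines up with coefficients[tap] are all hypothetical. The lab may define the operation differently.

```cpp
// Sketch: one output sample of an integer FIR filter (sum of products).
int FIRFilterSketch(const int data[], const int coefficients[], int numTaps) {
    int sum = 0;
    for (int tap = 0; tap < numTaps; tap++) {
        sum += data[tap] * coefficients[tap];  // multiply and accumulate
    }
    return sum;
}
```

The loop has the same shape as the rectification loop: two array reads per pass, which is where the post-modify memory accesses become useful in the ASM version.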

25 Exercise 1 – needed for Lab. 1
FIR filter operation -- data and filter-coefficients are both integer arrays – in ASM

26 Insert C++ code

27 Insert assembler code version

28 Concepts
Learning just enough TigerSHARC assembly code to make a software loop “work”
Comparing the timings for rectification of integer and floating point arrays, using
  debug C++ code
  Release C++ code
  Our FIRST_ASM code
Looking in “MIXED mode” at the code generated by the compiler

