A first attempt at learning about optimizing the TigerSHARC code


1 A first attempt at learning about optimizing the TigerSHARC code
TigerSHARC assembly syntax

2 Concepts: learning some optimizing techniques
Which part of the code will likely give us a lot of optimization?
Which part of the code will likely give us a lot of little optimizations?
Most TigerSHARC instructions are conditional. Why?
Hardware loop – max of 2
Code optimization after
8/21/2019 TigerSHARC assemble code 1, M. Smith, ECE, University of Calgary, Canada

3 Passing integer rectify

4 The code was not exactly what we designed (the C++ equivalent) – refactor, then retest after the refactoring
NEXT STEP
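For reference while refactoring, here is a minimal C++ sketch of what the integer rectify routine is meant to do. Assumptions: the mangled name _HalfWaveRectifyASM__FPiT1i appears to describe a function taking (int *, int *, int), so the parameters here are input array, output array and N. The name HalfWaveRectifyCPP and the return convention (nullptr when N < 1, otherwise the output pointer, following the return_value = NULL / value discussion on later slides) are hypothetical.

```cpp
#include <cassert>

// Hypothetical C++ reference version of the integer half-wave rectifier.
// Copies initial[] to final_out[], replacing negative values with 0.
// Returns nullptr when N < 1, otherwise final_out (assumed convention).
int *HalfWaveRectifyCPP(const int *initial, int *final_out, int N) {
    if (N < 1) return nullptr;                      // nothing to process
    assert(initial != nullptr && final_out != nullptr);
    for (int count = 0; count < N; count++) {
        int value = initial[count];
        if (value < 0) value = 0;                   // clamp negatives to zero
        final_out[count] = value;
    }
    return final_out;
}
```

Each assembly refactoring can then be checked element by element against this version.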

5 Accessing memory – basic mode
Special register J31 – acts as zero when used in additions
Pt_J5 is a pointer register into an array
Read_J1 is being used as a data register
J registers are like MIPS registers (used as both pointer and data). NOT like 68K registers, which are either data or address but not both
Read_J1 = [Pt_J5];;    read value from the memory location pointed to by J5
    Compare to 68K MOVE.L (A5), D1
Read_J1 = [Pt_J5 + 8];;    read value from the memory location pointed to by the value (J5 + 8)
    Compare to 68K MOVE.L 8(A5), D1
    PRE-MODIFY – address used is J5 + 8, no change in J5
Read_J1 = [Pt_J5 + J31];;    read value from the memory location pointed to by J5 – but I have read somewhere that this CAN be faster than just Read_J1 = [Pt_J5];; – NEED TO CONFIRM
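The three basic-mode reads can be mirrored with C++ pointer expressions. This is a sketch only: pt stands in for Pt_J5, each return value stands in for Read_J1, and the function names are hypothetical. C++ pointer arithmetic on an int pointer counts whole elements, so pt + 8 means the eighth int after pt. That matches the slide only if the TigerSHARC offset also counts 32-bit words, which is an assumption here.

```cpp
#include <cassert>

// Read_J1 = [Pt_J5];;  read through the pointer, pointer unchanged
int readIndirect(const int *pt) {
    assert(pt != nullptr);
    return *pt;
}

// Read_J1 = [Pt_J5 + 8];;  PRE-MODIFY: address pt + 8 is used, pt itself unchanged
int readPremodifyConst(const int *pt) {
    assert(pt != nullptr);
    return *(pt + 8);
}

// Read_J1 = [Pt_J5 + J31];;  J31 acts as zero, so this is the same address as *pt
int readPlusZero(const int *pt) {
    assert(pt != nullptr);
    const int j31 = 0;          // models J31 reading as zero in an addition
    return *(pt + j31);
}
```

None of these functions modify pt, which corresponds to "no change in J5".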

6 Accessing memory – step 2
Basic mode
Pt_J5 is a pointer register into an array
Offset_J4 is used as an offset
Read_J1 is being used as a data register
Read_J1 = [Pt_J5 + Offset_J4];;    read value from the memory location pointed to by (J5 + J4)
    PRE-MODIFY – address used is J5 + J4, no change in J5
    Compare to 68K MOVE.L (A5, D4), D1
Read_J1 = [Pt_J5 += Offset_J4];;    read value from the memory location pointed to by J5, and then perform the add
    POST-MODIFY – address used is J5, then perform J5 = J5 + J4
    Compare to 68K MOVE.L (A5), D1 followed by ADD.L A4, A5, but as a single instruction
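The difference between pre-modify and post-modify shows up in C++ as "index without touching the pointer" versus "use the pointer, then advance it". This is a sketch under that analogy, with hypothetical function names. Post-modify takes the pointer by reference because the real instruction writes the updated address back into J5.

```cpp
#include <cassert>

// Read_J1 = [Pt_J5 + Offset_J4];;  PRE-MODIFY: address pt + offset, pt unchanged
int readPremodify(const int *pt, int offset) {
    assert(pt != nullptr);
    return *(pt + offset);
}

// Read_J1 = [Pt_J5 += Offset_J4];;  POST-MODIFY: read at pt, then pt = pt + offset
int readPostmodify(const int *&pt, int offset) {
    assert(pt != nullptr);
    int value = *pt;    // the old address is used for the read
    pt += offset;       // then the pointer itself is updated
    return value;
}
```

Calling readPostmodify repeatedly walks through an array with a stride of offset and needs no separate index variable. The later hardware-loop slides rely on exactly this property.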

7 For-loop structure – use 68K style of looping
QUAD ERROR ISSUE AGAIN
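"68K style of looping" here means an explicit compare at the top of the loop, a conditional jump out, the body, and an unconditional jump back. The C++ sketch below writes the rectify loop in that shape with goto so that each line lines up with one assembly step. The names are hypothetical, and the comments paraphrase the TigerSHARC loop that appears on the hardware-loop slides.

```cpp
#include <cassert>

// Rectify loop written in explicit compare / jump form (68K style).
void rectifyLoop68KStyle(const int *initial, int *final_out, int N) {
    assert(N <= 0 || (initial != nullptr && final_out != nullptr));
    int count = 0;                     // count_J0 = 0;;
loop:
    if (!(count < N)) goto end;        // COMP(count_J0, N);; IF NJLT, JUMP _END;;
    {
        int value = initial[count];    // array_value_J1 = [initial_inpar1 + J0];;
        if (value < 0) value = 0;      // COMP(value, 0);; IF JLT; DO, value = 0;;
        final_out[count] = value;      // [final_inpar2 + J0] = value;
        count = count + 1;             //   count_J0 = count_J0 + 1;;
    }
    goto loop;                         // JUMP _LOOP;;
end:
    return;
}
```

Each pass through this loop costs two jumps (the test and the jump back). This is the overhead that the hardware loop slides later try to remove.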

8 Weird results
DEBUG RELEASE FIRST_ASM
INTEGER 426 416 124 118 316 320
FLOAT 462 458 224 222
Variation of about 6 cycles in testing.
Our first ASM is faster than debug and slower than release – that was expected.
Our integer code was slower than our float code – that was unexpected, since it is the same code.
Can we optimize and improve the timing?

9 Most TigerSHARC instructions are conditional

10 Why mostly conditional instructions?
TigerSHARC has a very deep pipeline, so conditional jumps can cause a large disruption of the pipeline.
It is better to use non-jump instructions, which don't disrupt the pipeline, even when the instruction is not executed (it then acts as a NOP).
if (N < 1) return_value = NULL; else return_value = value;

11 Why mostly conditional instructions?
if (N < 1) return_value = NULL; else return_value = value;
With jumps:
    COMP(N, 1);;
    IF NJLT, JUMP _ELSE;;
    J5 = NULL;;
    JUMP _END_IF;;
_ELSE: J5 = value;;
_END_IF:
With conditional instructions (no jumps):
    COMP(N, 1);;
    IF JLT; DO, J5 = NULL;;
    IF NJLT; DO, J5 = value;;
The concept is there – we need to check whether the syntax is correct
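In C++ terms, the two versions are a branch and a branch-free select. The sketch below models NULL as 0 (an assumption) and uses an all-ones/all-zeros mask to stand in for "both conditional assignments are issued, only one takes effect". The function names are hypothetical, and the mask trick is one common way to write branch-free code, not a description of what the TigerSHARC hardware does internally.

```cpp
#include <cassert>

// Jump version: COMP(N, 1);; IF NJLT, JUMP _ELSE;; J5 = NULL;; ...
int selectWithJumps(int N, int value) {
    if (N < 1) {
        return 0;        // J5 = NULL;;  (NULL modelled as 0)
    }
    return value;        // _ELSE: J5 = value;;
}

// Branch-free version: no data-dependent jump, so nothing to disrupt a pipeline.
int selectPredicated(int N, int value) {
    int keep = -static_cast<int>(N >= 1);   // all ones when N >= 1, else 0
    assert(keep == 0 || keep == -1);
    return value & keep;                    // value when N >= 1, else 0 (NULL)
}
```

Both functions return the same result for every input. The only difference is whether control flow depends on N.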

12 Does not work quite as expected
Code of the form J8 = Number and JUMPs don't seem to fit into the same instruction

13 Better code

14 Optimizing the wrong part of the code: the N test is only executed once
DEBUG RELEASE FIRST_ASM
INTEGER 124 118 316 320 428 424 122 110 Faster N 322 328
FLOAT 462 458 224 222 476 478 190 194 228 220

15 Improve the test with zero
Remember the comma after the DO. Also check that the tests still work.

16 Optimizing test for > 0
DEBUG RELEASE FIRST_ASM
INTEGER 426 124 316 428 122 Faster N test 322 Faster > 0 test 234
FLOAT 462 210 224 476 190 228 182

17 Hardware zero-overhead loop
LC0 = N;;    Load loop counter 0 with the value N
Start_of_loop:
    Loop code here ;;
    IF NLC0E, JUMP Start_of_loop;;
NLC0E – Not LC0 Expired – essentially:
    Compare LC0 with 2
    If less than 2, continue (don't jump)
    If 2 or more, decrement LC0 and jump
All sorts of stall issues if not properly aligned – see TigerSHARC manual 8-23
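A small C++ model helps check that "compare LC0 with 2, decrement and jump" really runs the body N times. The model follows the description on this slide, not the manual's exact semantics, and the function name is hypothetical. It assumes N >= 1 because the body executes once before the first NLC0E test.

```cpp
#include <cassert>

// Model of: LC0 = N;;  Start_of_loop: body;;  IF NLC0E, JUMP Start_of_loop;;
// Returns how many times the loop body executes.
int hardwareLoopIterations(int N) {
    assert(N >= 1);              // body always runs at least once in this structure
    int lc0 = N;                 // LC0 = N;;
    int bodyCount = 0;
    for (;;) {
        ++bodyCount;             // "Loop code here"
        if (lc0 >= 2) {          // NLC0E true: LC0 not expired
            --lc0;               // decrement LC0 and jump back
            continue;
        }
        break;                   // LC0 < 2: fall through, loop finished
    }
    return bodyCount;
}
```

With N = 3 the counter goes 3 → 2 → 1 and the body runs three times. The body needs no explicit compare or count increment, which is where the "zero overhead" comes from.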

18 Hardware loop – 1st attempt
Software loop:
    count_J0 = 0;;
_HalfWaveRectifyASM__FPiT1i_LOOP:
    COMP(count_J0, N_inpar3);;
    IF NJLT, JUMP _END;;
    array_value_J1 = [initial_inpar1 + J0];;
    COMP(array_value_J1, 0);;
    IF JLT; DO, array_value_J1 = 0;;
    [final_inpar2 + J0] = array_value_J1; count_J0 = count_J0 + 1;;
    JUMP _HalfWaveRectifyASM__FPiT1i_LOOP;;
Hardware loop:
    count_J0 = 0;;
    LC0 = N;;
_HalfWaveRectifyASM__FPiT1i_LOOP:
    array_value_J1 = [initial_inpar1 + J0];;
    IF NLC0E, JUMP _HalfWaveRectifyASM__FPiT1i_LOOP;;
Problem: we still need J0 as an index register

19 Hardware loop – 2nd attempt
Using J0 as an index:
    count_J0 = 0;;
    LC0 = N;;
_HalfWaveRectifyASM__FPiT1i_LOOP:
    array_value_J1 = [initial_inpar1 + J0];;
    COMP(array_value_J1, 0);;
    IF JLT; DO, array_value_J1 = 0;;
    [final_inpar2 + J0] = array_value_J1; count_J0 = count_J0 + 1;;
    IF NLC0E, JUMP _HalfWaveRectifyASM__FPiT1i_LOOP;;
Using post-modify addressing:
    LC0 = N;;
_HalfWaveRectifyASM__FPiT1i_LOOP:
    array_value_J1 = [initial_inpar1 += 1];;
    [final_inpar2 += 1] = array_value_J1;
Problem: need J0 as an index register
Would it be faster to set J0 = 1 and use initial_inpar1 += J0?
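The post-modify version removes the index register completely. The pointers themselves advance, as in [initial_inpar1 += 1]. In C++ this is the familiar *p++ idiom. The sketch below uses a down-counting loop variable as a stand-in for LC0. The function name is hypothetical, and N >= 1 is assumed to match the hardware-loop structure.

```cpp
#include <cassert>

// Rectify using post-incremented pointers instead of an index (J0).
void rectifyPostIncrement(const int *initial, int *final_out, int N) {
    assert(N >= 1);                       // hardware-loop body runs at least once
    for (int lc0 = N; lc0 > 0; --lc0) {   // stands in for LC0 = N / IF NLC0E
        int value = *initial++;           // array_value_J1 = [initial_inpar1 += 1];;
        if (value < 0) value = 0;         // COMP(value, 0);; IF JLT; DO, value = 0;;
        *final_out++ = value;             // [final_inpar2 += 1] = array_value_J1;
    }
}
```

Compared with the indexed version, each iteration saves the count_J0 = count_J0 + 1 update. The address arithmetic happens as part of the memory access itself.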

20 Integer hardware loop using += 1
DEBUG RELEASE FIRST_ASM
INTEGER 426 124 316 428 122 Faster N test 322 Faster > 0 test 234 Hardware loop, J+=1 134

21 Integer hardware loop using += J2
DEBUG RELEASE FIRST_ASM
INTEGER 426 124 316 428 122 Faster N test 322 Faster > 0 test 234 Hardware loop, J+=1 134 Hardware loop, J+=J2

22 Integer hardware loop using the MAX instruction
DEBUG RELEASE FIRST_ASM
INTEGER 426 124 316 428 122 Faster N test 322 Faster > 0 test 234 Hardware loop, J+=1 134 Hardware loop, J+=J2 Max instruction
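Rectifying with MAX replaces the COMP plus conditional assignment pair with a single operation: max(value, 0) is value when value ≥ 0 and 0 otherwise. In C++ the same idea is std::max. This is a sketch with a hypothetical function name.

```cpp
#include <algorithm>
#include <cassert>

// Half-wave rectify with a max operation instead of compare + conditional move.
void rectifyWithMax(const int *initial, int *final_out, int N) {
    assert(N <= 0 || (initial != nullptr && final_out != nullptr));
    for (int i = 0; i < N; ++i) {
        final_out[i] = std::max(initial[i], 0);   // array_value_J1 = MAX(array_value_J1, 0);;
    }
}
```

Because no compare result is involved, this form also has no condition-flag dependency between the test and the assignment.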

23 Memory writes with jumps
DEBUG RELEASE FIRST_ASM
INTEGER 426 124 316 428 122 Faster N test 322 Faster > 0 test 234 Hardware loop, J+=1 134 Hardware loop, J+=J2 Max instruction write + jump 132

24 Memory writes with jumps
DEBUG RELEASE FIRST_ASM
INTEGER 426 124 316 428 122 Faster N test 322 Faster > 0 test 234 Hardware loop, J+=1 134 Hardware loop, J+=J2 Max instruction Memory + jump 132

25 Jumps, reads and writes

26 Avoid the problem – read 2 values, write 2 values – use X and Y registers
The syntax concept is correct, but it gives the wrong answer and is slower

27 Explain why this code might work faster
// for (int count = 0; count < N; count++) {
//     count_J0 = 0;;
    XR1 = N_inpar3;;
    XR1 = ASHIFT R1 BY -1;;    // Valid if N even
    LC0 = XR1;;
_HalfWaveRectifyASM__FPiT1i_LOOP:
    array_value_J1 = [initial_inpar1 += 1];;
    J2 = [initial_inpar1 += 1];;
    array_value_J1 = MAX(array_value_J1, 0);;
    J2 = MAX(J2, 0);;
    [final_inpar2 += 1] = array_value_J1;;
    IF NLC0E, JUMP _HalfWaveRectifyASM__FPiT1i_LOOP; [final_inpar2 += 1] = J2;;
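One plausible reason this version is faster: it handles two elements per pass. The loop counter (N shifted right by 1) and the jump are paid once per pair instead of once per element, and the two elements form independent read/MAX/write chains. The C++ sketch below shows the unroll-by-two structure. The slide's version is valid only for even N. The final odd-element step is an addition not on the slide, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Unrolled-by-two half-wave rectify, mirroring the slide's loop structure.
void rectifyUnrolled2(const int *initial, int *final_out, int N) {
    assert(N >= 0);
    int pairs = N >> 1;                    // XR1 = ASHIFT R1 BY -1;;  (N / 2 for N >= 0)
    for (int lc0 = pairs; lc0 > 0; --lc0) {
        int a = *initial++;                // array_value_J1 = [initial_inpar1 += 1];;
        int b = *initial++;                // J2 = [initial_inpar1 += 1];;
        a = std::max(a, 0);                // array_value_J1 = MAX(array_value_J1, 0);;
        b = std::max(b, 0);                // J2 = MAX(J2, 0);;
        *final_out++ = a;                  // [final_inpar2 += 1] = array_value_J1;;
        *final_out++ = b;                  // [final_inpar2 += 1] = J2;;
    }
    if (N & 1) {                           // not on the slide: leftover element for odd N
        *final_out = std::max(*initial, 0);
    }
}
```

On the slide, the second store shares an instruction line with the loop jump, so it costs no extra cycle. That placement is part of the speed-up argument, and this C++ model cannot show it.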


