Published by Φυλλίς Ελευθεριάδης. Modified over 5 years ago.
1
A first attempt at learning about optimizing the TigerSHARC code
TigerSHARC assembly syntax
2
Concepts Learning some optimizing techniques
What part of the code will likely give us a lot of optimization
What part of the code will likely give us a lot of little optimization
Most TigerSHARC instructions are conditional
Why?
Hardware loop – max of 2
Code optimization after
8/21/2019 TigerSHARC assemble code 1, M. Smith, ECE, University of Calgary, Canada
3
Passing integer rectify
4
The code was not exactly what we designed (C++ equivalent)
NEXT STEP: refactor, then retest after the refactoring
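For reference, the design being implemented can be sketched in C roughly as follows. This is a reconstruction, not the original course code: the label _HalfWaveRectifyASM__FPiT1i on later slides suggests two int pointers and an int, and the N < 1 test returning NULL appears on the conditional-instruction slides. The name half_wave_rectify_ref and the choice to return the output pointer on success are assumptions.

```c
#include <stddef.h>

/* Hypothetical reference version of the half-wave rectifier.
   Copies initial_array[] to final_array[], replacing negatives with 0.
   Returning final_array on success is an assumption. */
int *half_wave_rectify_ref(const int *initial_array, int *final_array, int N)
{
    if (N < 1) {
        return NULL;                /* nothing to process */
    }
    for (int count = 0; count < N; count++) {
        int value = initial_array[count];
        if (value < 0) {
            value = 0;              /* rectify: clamp negatives to zero */
        }
        final_array[count] = value;
    }
    return final_array;
}
```

A reference like this gives the refactored assembly something to be retested against.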
5
Accessing memory Basic mode
Special register J31 acts as zero when used in additions
Pt_J5 is a pointer register into an array
Read_J1 is being used as a data register
J registers are like MIPS registers (used as both pointer and data), NOT like 68K registers, which are either data or address but not both

    Read_J1 = [Pt_J5];;
Read value from memory location pointed to by J5
Compare to 68K MOVE.L (A5), D1

    Read_J1 = [Pt_J5 + 8];;
Read value from memory location pointed to by the value (J5 + 8)
Compare to 68K MOVE.L 8(A5), D1
PRE-MODIFY: address used is J5 + 8, no change in J5

    Read_J1 = [Pt_J5 + J31];;
Read value from memory location pointed to by J5
Read somewhere that this CAN be faster than just Read_J1 = [Pt_J5];; (NEED TO CONFIRM)
6
Accessing memory – step 2
Basic mode
Pt_J5 is a pointer register into an array
Offset_J4 is used as an offset
Read_J1 is being used as a data register

    Read_J1 = [Pt_J5 + Offset_J4];;
Read value from memory location pointed to by (J5 + J4)
PRE-MODIFY: address used is J5 + J4, no change in J5
Compare to 68K MOVE.L (A5, D4), D1

    Read_J1 = [Pt_J5 += Offset_J4];;
Read value from memory location pointed to by J5, and then perform the add
POST-MODIFY: address used is J5, then perform J5 = J5 + J4
Compare to 68K MOVE.L (A5), D1 followed by ADD.L A4, A5, but as a single instruction
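The difference between the two addressing forms can be modelled in C. This is only a sketch of the semantics described above: the function names are hypothetical, and it assumes offsets count whole 32-bit words, so that C pointer arithmetic on int matches the register arithmetic.

```c
#include <stddef.h>

/* Pre-modify: the address used is pointer + offset.
   The pointer itself is left unchanged. */
int read_premodify(const int *pt, ptrdiff_t offset)
{
    return *(pt + offset);
}

/* Post-modify: the address used is the current pointer.
   The pointer is advanced by offset afterwards. */
int read_postmodify(const int **pt, ptrdiff_t offset)
{
    int value = **pt;   /* read through the old pointer value */
    *pt += offset;      /* then pointer = pointer + offset */
    return value;
}
```

Post-modify is the form that lets a loop walk an array without a separate index register.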
7
For – loop structure – Use 68K style of looping
QUAD ERROR ISSUE AGAIN
8
Weird results
              DEBUG      RELEASE    FIRST_ASM
INTEGER       426 416    124 118    316 320
FLOAT         462 458               224 222
Variation of about 6 cycles in testing
Our first ASM is faster than debug and slower than release; that was expected
Our integer code was slower than our float code; that was unexpected since it is the same code
Can we optimize and improve the timing?
9
Most TigerSHARC instructions are conditional
10
Why mostly conditional instructions?
TigerSHARC has a very deep pipeline, so conditional jumps can cause a large disruption of the pipeline
Better to use non-jump instructions, which don't disrupt the pipeline even when the instruction is not executed (it then acts as a nop)

    if (N < 1) return_value = NULL; else return_value = value;
11
Why mostly conditional instructions?
With jumps:

    if (N < 1) return_value = NULL; else return_value = value;

    COMP(N, 1);;
    IF NJLT, JUMP _ELSE;;
    J5 = NULL;;
    JUMP _END_IF;;
_ELSE:
    J5 = value;;
_END_IF:

With conditional instructions:

    if (N < 1) return_value = NULL; else return_value = value;

    COMP(N, 1);;
    IF JLT; DO, J5 = NULL;;
    IF NJLT; DO, J5 = value;;

Concept is there; we still need to check whether the syntax is correct
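The contrast between the two styles can be sketched in C. This shows the logic only: the function names are hypothetical, and a C compiler is free to turn either form into jumps or conditional instructions. Both follow the C statement on the slide, where N < 1 selects NULL.

```c
#include <stddef.h>

/* Jump style: exactly one of the two assignments is reached. */
const int *select_with_jump(int N, const int *value)
{
    const int *result;
    if (N < 1) {
        result = NULL;
    } else {
        result = value;
    }
    return result;
}

/* Conditional-instruction style: one compare sets a flag, then both
   assignments are issued in sequence, each guarded by that flag.
   The one whose guard is false has no effect (acts like a nop). */
const int *select_predicated(int N, const int *value)
{
    int less_than = (N < 1);        /* the compare sets the flag */
    const int *result = NULL;

    if (less_than)  { result = NULL;  }   /* guarded by "less than" */
    if (!less_than) { result = value; }   /* guarded by "not less than" */
    return result;
}
```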
12
Does not work quite as expected
Code of the form J8 = Number and JUMPs don't seem to fit into the same instruction
13
Better code
14
Optimizing wrong part of code
N test is only used once
              DEBUG      RELEASE    FIRST_ASM
INTEGER                  124 118    316 320
  Faster N    428 424    122 110    322 328
FLOAT         462 458               224 222
  Faster N    476 478    190 194    228 220
15
Improve the test with zero
Remember the comma after the DO
Also check that the tests still work
16
Optimizing test for > 0
                        DEBUG  RELEASE  FIRST_ASM
INTEGER                   426      124        316
Faster N test             428      122        322
Faster > 0 test                               234
FLOAT                     462      210        224
Faster N test             476      190        228
Faster > 0 test                               182
17
Hardware – zero overhead loop
    LC0 = N;;                           // Load counter 0 with value N
Start_of_loop:
    Loop code here ;;
    IF NLC0E, JUMP Start_of_loop;;

NLC0E means "Not LC0 expired". Essentially:
Compare LC0 with 2
If less than 2, continue (don't jump)
If 2 or more, then decrement LC0 and jump
All sorts of stall issues if not properly aligned: see TigerSHARC manual 8-23
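The counter rule can be modelled in C to see how many times the loop body runs. This is a sketch of the rule exactly as stated on this slide, not of the hardware itself; the function name body_executions is hypothetical.

```c
/* Model of the loop-end test as described on the slide:
   if the counter is 2 or more, decrement it and jump back;
   otherwise fall through. Returns how often the body runs. */
unsigned body_executions(unsigned N)
{
    unsigned lc0 = N;           /* load the counter with N */
    unsigned executions = 0;

    for (;;) {
        executions++;           /* loop code here */
        if (lc0 >= 2) {         /* counter not yet expired */
            lc0--;
            continue;           /* jump back to the start of the loop */
        }
        break;                  /* counter expired: fall through */
    }
    return executions;
}
```

Under this model the body runs N times for N of 1 or more. Because the test sits at the bottom of the loop, the body also runs once when N is 0, which is one reason the N < 1 check has to come before the loop.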
18
Hardware loop – 1st attempt
Software loop:

    count_J0 = 0;;
_HalfWaveRectifyASM__FPiT1i_LOOP:
    COMP(count_J0, N_inpar3);;
    IF NJLT, JUMP _END;;
    array_value_J1 = [initial_inpar1 + J0];;
    COMP(array_value_J1, 0);;
    IF JLT; DO, array_value_J1 = 0;;
    [final_inpar2 + J0] = array_value_J1; count_J0 = count_J0 + 1;;
    JUMP _HalfWaveRectifyASM__FPiT1i_LOOP;;

Hardware loop:

    count_J0 = 0;;
    LC0 = N;;
_HalfWaveRectifyASM__FPiT1i_LOOP:
    array_value_J1 = [initial_inpar1 + J0];;
    IF NLC0E, JUMP _HalfWaveRectifyASM__FPiT1i_LOOP;;

Problem: J0 is still needed as the index register
19
Hardware loop – 2nd attempt
Hardware loop with J0 as index:

    count_J0 = 0;;
    LC0 = N;;
_HalfWaveRectifyASM__FPiT1i_LOOP:
    array_value_J1 = [initial_inpar1 + J0];;
    COMP(array_value_J1, 0);;
    IF JLT; DO, array_value_J1 = 0;;
    [final_inpar2 + J0] = array_value_J1; count_J0 = count_J0 + 1;;
    IF NLC0E, JUMP _HalfWaveRectifyASM__FPiT1i_LOOP;;

Problem: J0 is still needed as the index register

Hardware loop with post-modify:

    LC0 = N;;
_HalfWaveRectifyASM__FPiT1i_LOOP:
    array_value_J1 = [initial_inpar1 += 1];;
    [final_inpar2 += 1] = array_value_J1;;

Would it be faster to set J0 = 1 and use initial_inpar1 += J0?
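A rough C counterpart of the += 1 version shows why the index register disappears: both pointers advance themselves and the only remaining counter is the loop count. The function name is hypothetical, and the sketch assumes the caller has already checked that N is at least 1.

```c
/* Hypothetical C counterpart of the post-modify hardware loop.
   Assumes N >= 1 has already been checked by the caller. */
void rectify_postincrement(const int *initial_ptr, int *final_ptr, int N)
{
    int remaining = N;                  /* plays the role of LC0 */
    do {
        int value = *initial_ptr++;     /* read, then advance the pointer */
        if (value < 0) {
            value = 0;                  /* clamp negatives to zero */
        }
        *final_ptr++ = value;           /* write, then advance the pointer */
    } while (--remaining > 0);          /* loop-end test at the bottom */
}
```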
20
Integer hardware loop using +=1
                        DEBUG  RELEASE  FIRST_ASM
INTEGER                   426      124        316
Faster N test             428      122        322
Faster > 0 test                               234
Hardware loop, J+=1                           134
21
Integer hardware loop using += J2
                        DEBUG  RELEASE  FIRST_ASM
INTEGER                   426      124        316
Faster N test             428      122        322
Faster > 0 test                               234
Hardware loop, J+=1                           134
Hardware loop, J+=J2
22
Integer hardware loop Max instr
                        DEBUG  RELEASE  FIRST_ASM
INTEGER                   426      124        316
Faster N test             428      122        322
Faster > 0 test                               234
Hardware loop, J+=1                           134
Hardware loop, J+=J2
Max instruction
23
Memory writes with jumps
                        DEBUG  RELEASE  FIRST_ASM
INTEGER                   426      124        316
Faster N test             428      122        322
Faster > 0 test                               234
Hardware loop, J+=1                           134
Hardware loop, J+=J2
Max instruction
write + jump                                  132
24
Memory writes with jumps
                        DEBUG  RELEASE  FIRST_ASM
INTEGER                   426      124        316
Faster N test             428      122        322
Faster > 0 test                               234
Hardware loop, J+=1                           134
Hardware loop, J+=J2
Max instruction
Memory + jump                                 132
25
Jumps, reads and writes
26
Avoid problem – read 2 values, write 2 values – use X and Y registers
Syntax concept is correct
Gives wrong answer
And is slower
27
Explain why this code might work faster
// for (int count = 0; count < N; count++) {
//    count_J0 = 0;;
    XR1 = N_inpar3;;
    XR1 = ASHIFT R1 BY -1;;             // Valid if N even
    LC0 = XR1;;
_HalfWaveRectifyASM__FPiT1i_LOOP:
    array_value_J1 = [initial_inpar1 += 1];;
    J2 = [initial_inpar1 += 1];;
    array_value_J1 = MAX(array_value_J1, 0);;
    J2 = MAX(J2, 0);;
    [final_inpar2 += 1] = array_value_J1;;
    IF NLC0E, JUMP _HalfWaveRectifyASM__FPiT1i_LOOP; [final_inpar2 += 1] = J2;;
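The structure of this version can be sketched in C: the loop handles two elements per pass and uses a maximum in place of a compare followed by a conditional assignment. The names are hypothetical, and the sketch assumes N is even and at least 2, matching the "Valid if N even" comment.

```c
static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Hypothetical C counterpart of the loop unrolled by two.
   Assumes N is even and N >= 2. */
void rectify_unrolled_by_two(const int *initial_ptr, int *final_ptr, int N)
{
    int pairs = N >> 1;                 /* half as many passes as elements */
    do {
        int first  = *initial_ptr++;    /* read two values per pass */
        int second = *initial_ptr++;
        first  = max_int(first, 0);     /* max replaces compare + conditional */
        second = max_int(second, 0);
        *final_ptr++ = first;           /* write two values per pass */
        *final_ptr++ = second;
    } while (--pairs > 0);
}
```

Two possible reasons it might run faster: the loop-end test is executed half as often, and each element needs one MAX instead of a compare plus a conditional assignment. Whether that shows up in the cycle counts still has to be measured.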
28
Concepts Learning some optimizing techniques
What part of the code will likely give us a lot of optimization
What part of the code will likely give us a lot of little optimization
Most TigerSHARC instructions are conditional
Why?
Hardware loop – max of 2
Code optimization after