General Optimization Issues


1 General Optimization Issues
M. Smith

2 To be tackled today
Most optimized TigerSHARC instruction
    Integer and float
Systematic optimization procedure
SISD and SIMD modes
Exercises
11/14/2018 Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada

3 Most optimized SIMD floating-point (32-bit) TigerSHARC instruction
xR3:0 = CB Q[j0 += 4]; yR3:0 = CB Q[k0 += 4]; xyFR4 = R5 * R6; xyFR7 = R8 + R9, FR10 = R8 - R9;;

xR3:0 = CB Q[j0 += 4];
/* Fetches 4 values on the J bus into X compute registers XR3, XR2, XR1, XR0.
   Increments the J register and adjusts for circular buffer operation */
yR3:0 = CB Q[k0 += 4];
/* Fetches 4 values on the K bus into Y compute registers YR3, YR2, YR1, YR0.
   Increments the K register and adjusts for circular buffer operation */
xyFR4 = R5 * R6;
/* Two multiplications: XFR5 * XFR6 and YFR5 * YFR6 */
xyFR7 = R8 + R9, FR10 = R8 - R9;;
/* Two additions: XFR8 + XFR9 and YFR8 + YFR9
   AND two subtractions: XFR8 - XFR9 and YFR8 - YFR9 */
/* The same registers must be used on either side of the + and - operators */

4 Most optimized SIMD integer (short, 16-bit) TigerSHARC instruction
xR3:0 = CB Q[j0 += 4]; yR3:0 = CB Q[k0 += 4]; xyR7:6 = R5:4 * R3:2; xySR9:8 = R7:6 + R1:0, SR11:10 = R7:6 - R1:0;;

xR3:0 = CB Q[j0 += 4];
/* Fetches 4 values on the J bus into X compute registers XR3, XR2, XR1, XR0.
   Increments the J register and adjusts for circular buffer operation */
yR3:0 = CB Q[k0 += 4];
/* Fetches 4 values on the K bus into Y compute registers YR3, YR2, YR1, YR0.
   Increments the K register and adjusts for circular buffer operation */
xyR7:6 = R5:4 * R3:2;
/* Eight multiplications: XR5.H * XR3.H, XR5.L * XR3.L,
   XR4.H * XR2.H, XR4.L * XR2.L; ditto for YR */
xySR9:8 = R7:6 + R1:0, SR11:10 = R7:6 - R1:0;;
/* Eight additions ???????
   AND eight subtractions ????????????????? */

5 Exercise
Write out the 16 operations performed by
xySR9:8 = R7:6 + R1:0, SR11:10 = R7:6 - R1:0;;
/* Eight additions ??????? AND eight subtractions ????????????????? */
Now do a sideways add on xySR9:8 and get a value.
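One way to check an answer to this exercise is to emulate a single compute block in C++. The sketch below is only a model, not TigerSHARC code: it assumes a register pair holds four 16-bit lanes (ordered low.L, low.H, high.L, high.H, for example R6.L, R6.H, R7.L, R7.H), it does not model overflow or saturation, and the names ShortQuad, dualAddSub and sidewaysAdd are made up for illustration. Running it once for the X block and once for the Y block gives the 8 additions and 8 subtractions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of one register pair seen as four 16-bit lanes.
// Assumed lane order: {low.L, low.H, high.L, high.H}.
using ShortQuad = std::array<std::int16_t, 4>;

struct AddSubResult {
    ShortQuad sum;   // models SR9:8   = R7:6 + R1:0 in one compute block
    ShortQuad diff;  // models SR11:10 = R7:6 - R1:0 in one compute block
};

// Lane-by-lane add and subtract of the same two operands.
// Overflow handling is NOT modelled; keep test values small.
AddSubResult dualAddSub(const ShortQuad& a, const ShortQuad& b) {
    AddSubResult r{};
    for (std::size_t lane = 0; lane < a.size(); ++lane) {
        r.sum[lane]  = static_cast<std::int16_t>(a[lane] + b[lane]);
        r.diff[lane] = static_cast<std::int16_t>(a[lane] - b[lane]);
    }
    return r;
}

// "Sideways add": collapse the four lanes of one register pair into one value.
int sidewaysAdd(const ShortQuad& q) {
    int total = 0;
    for (std::int16_t v : q) {
        total += v;
    }
    return total;
}
```

With R7:6 modelled as {10, 20, 30, 40} and R1:0 as {1, 2, 3, 4}, the sum lanes are {11, 22, 33, 44} and the sideways add of those lanes is 110.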

6 Steps to optimize
Get the algorithm to work in “C”
Determine how much time is available
If the timing is already okay, quit
Determine the maximum number of each type of operation (add, subtract, multiply, memory fetch)
Divide each calculated maximum by the number of available resources for that type of operation
The largest division result is, in theory, the minimum number of cycles needed for the algorithm
If that minimum time is more than 100% of the time available, find a new algorithm
If that minimum time is less than 40% of the time available, perhaps you can optimize the code to meet the speed requirements
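The "divide and take the largest" step can be sketched in a few lines of C++. This is an illustrative helper, not part of the course code: OpBudget and minCycles are hypothetical names, and the per-cycle resource figures have to be supplied by the reader from the processor documentation.

```cpp
#include <algorithm>
#include <vector>

// One type of operation: how many the algorithm needs in total,
// and how many the processor can issue per cycle (reader-supplied).
struct OpBudget {
    long count;
    long perCycle;
};

// Theoretical minimum cycle count: the largest of count / perCycle
// (rounded up) over all operation types.
long minCycles(const std::vector<OpBudget>& ops) {
    long worst = 0;
    for (const OpBudget& op : ops) {
        long cycles = (op.count + op.perCycle - 1) / op.perCycle;  // ceiling division
        worst = std::max(worst, cycles);
    }
    return worst;
}
```

For example, with SIZE = 128, an algorithm needing 2 * SIZE additions and 2 * SIZE fetches on hardware that can do 2 of each per cycle gives minCycles({{256, 2}, {256, 2}}) == 128, that is, SIZE cycles.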

7 Code optimization – 32-bit integers or 32-bit floats
2 * SIZE additions
2 * SIZE memory fetches
If done correctly, can do 2 additions AND 2 memory fetches each cycle
Therefore the optimum is SIZE cycles, IFF all the optimizations can be found

8 Code optimization – 32-bit integers or 32-bit floats
2 * SIZE additions
2 * SIZE memory fetches
Left fetched on the J-bus and processed in X-compute
Right fetched on the K-bus and processed in Y-compute
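The transcript does not include the code this slide refers to, so the following is only a guess at its shape: a C++ loop that accumulates a left and a right buffer in the same pass. Each iteration does one fetch and one addition per channel, which is the 2 fetches and 2 additions per cycle that the J-bus/X-compute and K-bus/Y-compute pairing is meant to provide. ChannelSums and sumBothChannels are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical result type: one running total per channel.
struct ChannelSums {
    long left;
    long right;
};

// One pass over both buffers. Per iteration:
//   left  fetch + left  add  (would map to J-bus and X-compute)
//   right fetch + right add  (would map to K-bus and Y-compute)
// so SIZE iterations cover 2 * SIZE fetches and 2 * SIZE additions.
ChannelSums sumBothChannels(const int* left, const int* right, std::size_t size) {
    ChannelSums sums{0, 0};
    for (std::size_t i = 0; i < size; ++i) {
        int leftValue  = left[i];
        int rightValue = right[i];
        sums.left  += leftValue;
        sums.right += rightValue;
    }
    return sums;
}
```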

9 16-bit integers (short int) might be okay in some circumstances
2 * SIZE additions
2 * SIZE memory fetches
If done correctly, can do 8 short additions AND 32 short memory fetches each cycle
Therefore the optimum is SIZE / 4 cycles, IFF all the optimizations can be found

10 FIR optimization
SIZE additions
SIZE multiplications
SIZE * 2 memory fetches
2 additions, 2 multiplications and 8 fetches per cycle
Should be able to do it in SIZE / 2 cycles

11 FIR optimization
SIZE additions
SIZE multiplications
SIZE * 2 memory fetches
Fetch 2 values along the J-bus into XA and YA compute
Fetch 2 coefficients along the K-bus into XB and YB compute
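As a reference point for these counts, here is a minimal C++ sketch of one FIR output sample, first as a plain loop and then split across two accumulators in the way the X and Y compute blocks would share the work. It is an assumption-laden illustration: the function names are made up, circular buffering is left out, and the two-accumulator version adds the products in a different order, so floating-point rounding can differ slightly from the plain loop.

```cpp
#include <cstddef>

// Plain version: SIZE multiplications, SIZE additions, 2 * SIZE fetches.
float firSample(const float* samples, const float* coeffs, std::size_t size) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < size; ++i) {
        float x = samples[i];  // value fetch
        float c = coeffs[i];   // coefficient fetch
        acc += x * c;
    }
    return acc;
}

// Two-accumulator version: even taps in accX, odd taps in accY.
// Each pass handles 2 values and 2 coefficients, so it needs about SIZE / 2 passes.
float firSampleTwoWay(const float* samples, const float* coeffs, std::size_t size) {
    float accX = 0.0f;  // stands in for the X compute block
    float accY = 0.0f;  // stands in for the Y compute block
    std::size_t i = 0;
    for (; i + 1 < size; i += 2) {
        accX += samples[i] * coeffs[i];
        accY += samples[i + 1] * coeffs[i + 1];
    }
    if (i < size) {  // odd SIZE: one tap left over
        accX += samples[i] * coeffs[i];
    }
    return accX + accY;
}
```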

12 Need a systematic approach to handling the optimization of code
Get the C++ code to work
Rewrite code in simplest format – one operation per line
Recommend – rewrite code using register names
Unwrap the loop – start with “twice”
Rewrite the second part of the loop using different register names – avoids setting up unexpected dependencies
Overlap the first and second parts of loops
Rearrange “start-up” and ending code
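The slides' own example code is not in this transcript, so here is a stand-in: the procedure applied to a simple buffer sum in C++. The first function is the "get it working" version; the second is written one operation per line, uses register-style variable names, is unwrapped twice with separate names for the second half, and has the two halves overlapped (both fetches first, then both adds). The algorithm, the function names and the register names (xR0 to xR3, j0) are assumptions chosen for illustration.

```cpp
#include <cstddef>

// Working reference version.
int sumSimple(const int* buffer, std::size_t size) {
    int sum = 0;
    for (std::size_t i = 0; i < size; ++i) {
        sum += buffer[i];
    }
    return sum;
}

// One operation per line, register-style names, loop unwrapped twice.
int sumUnrolledTwice(const int* buffer, std::size_t size) {
    int xR0 = 0;             // accumulator for the first half of each pass
    int xR1 = 0;             // accumulator for the second half (different name)
    int xR2 = 0;             // fetched value, first half
    int xR3 = 0;             // fetched value, second half
    const int* j0 = buffer;  // stands in for the J index register

    std::size_t pairs = size / 2;
    for (std::size_t n = 0; n < pairs; ++n) {
        xR2 = j0[0];         // fetch, first half
        xR3 = j0[1];         // fetch, second half
        j0 += 2;
        xR0 = xR0 + xR2;     // add, first half
        xR1 = xR1 + xR3;     // add, second half (no dependency on xR0)
    }
    if (size % 2 != 0) {     // odd size: one element left over
        xR2 = j0[0];
        xR0 = xR0 + xR2;
    }
    return xR0 + xR1;
}
```

Keeping separate accumulators for the two halves is what removes the dependency between them, which is the point of the "different register names" step.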

13 Stage 1 – Get the C++ code to work

15 Stage 2 – Rewrite in simplest format
Note naming convention
Single operation per line
Note other changes

17 Step 3 -- Unwrap the loop
Again, note the naming convention

19 Step 4 – Overlap the first and second parts of loops
Note: the “C++” code goes no faster, but using this format for translating into parallel assembly code will.
Timing
Step * N
Step 3 – 8 * (N / 2) + 2
Step 4 – 6 * (N / 2) + 2
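To put numbers on the Step 3 and Step 4 timing expressions, they can be written as small C++ functions. This is just the slide's arithmetic, assuming N is the number of buffer elements and is even; the function names are made up.

```cpp
// Cycle estimate after Step 3 (loop unwrapped twice): 8 * (N / 2) + 2
constexpr long cyclesUnwrapped(long n) {
    return 8 * (n / 2) + 2;
}

// Cycle estimate after Step 4 (halves overlapped): 6 * (N / 2) + 2
constexpr long cyclesOverlapped(long n) {
    return 6 * (n / 2) + 2;
}
```

For N = 128 these give 514 and 386 cycles, so overlapping saves 2 cycles per unwrapped pass, 128 cycles in total.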

21 Step 5A - Rearrange “start-up” and ending code
“Software” pipeline
Move first read outside the loop
Need to add an “extra read” at the end of the loop
Timing: 2 + (N/2 – 1) * 6
Need to adjust loop start (Is it done correctly? Are we “one-out”?)
CAUTION – NEED TO FIX
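A C++ sketch of the software-pipelined shape, again using a buffer sum as a stand-in for the slide's missing code. The first reads are moved in front of the loop, each pass adds the values fetched by the previous pass and then fetches for the next one, and the ending code drains the last pair. The loop bound is chosen so that no read goes past the end of the buffer and so that odd and very small sizes still work; this is one possible fix for the "one-out" concern, not necessarily the one intended in the course. sumPipelined and the variable names are hypothetical.

```cpp
#include <cstddef>

int sumPipelined(const int* buffer, std::size_t size) {
    int sumA = 0;  // accumulator, first half of each pass
    int sumB = 0;  // accumulator, second half of each pass
    std::size_t i = 0;

    if (size >= 2) {
        // Start-up code: first reads moved outside the loop.
        int r0 = buffer[0];
        int r1 = buffer[1];

        // Each pass consumes the previous fetch and issues the next one.
        // The condition i + 1 < size guarantees both reads are in range.
        for (i = 2; i + 1 < size; i += 2) {
            sumA += r0;
            sumB += r1;
            r0 = buffer[i];
            r1 = buffer[i + 1];
        }

        // Ending code: drain the pair that was fetched last.
        sumA += r0;
        sumB += r1;
    }

    // Odd size (including size == 1): one element is still unread.
    if (i < size) {
        sumA += buffer[i];
    }
    return sumA + sumB;
}
```

Because the adds in a pass use only values fetched in the previous pass, the adds and the fetches inside the loop body are independent of each other, which is what allows them to be issued in parallel in assembly.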

22 Step 5B - Rearrange “start-up” and ending code
Can now perform additional adds and memory fetches in parallel
Note: the loop is still in error

23 Exercise -- Get the loop control correct

24 Exercise 1 -- Get the loop control correct
BUFFER_SIZE = 1
BUFFER_SIZE = 2
BUFFER_SIZE = 4
BUFFER_SIZE = 5
BUFFER_SIZE = 8
BUFFER_SIZE = 128

25 Exercise 2 -- Rewrite the code when it is known that BUFFER_SIZE = 127

26 Code to this point is SISD parallel optimization
SISD – single instruction single data
    Using X_compute block and J memory bus
Next stage – SIMD – single instruction multiple data
    Using X_compute block and J memory bus for left
    Using Y_compute block and K memory bus for right
Will need similar but different code when you are doing FIR in Lab. 3

27 Exercise 3 -- BUFFER_SIZE = 128
Rewrite so that the X and Y operations are done together

28 Exercise 4 -- BUFFER_SIZE = 128
Rewrite so that no data dependency stalls are expected


