General Optimization Issues
Solving the exercise issues
To be tackled today
Exercise 1: Solving the loop problem, SIZE = 128
Exercise 2: Solving the loop problem, SIZE = 127
Exercise 3: Moving from SISD to SIMD mode, SIZE = 128
Exercise 4: Removing any expected stalls
2/23/2019 Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada
Most optimized SIMD floating-point (32-bit) TigerSHARC instruction
xR3:0 = CB Q[j0 += 4]; yR3:0 = CB Q[k0 += 4]; xyFR4 = R5 * R6; xyFR7 = R8 + R9, FR10 = R8 - R9;;
xR3:0 = CB Q[j0 += 4];
/* Fetches 4 values on J BUS into x compute registers XR3, XR2, XR1, XR0.
   Increments J register and adjusts for circular buffer operation */
yR3:0 = CB Q[k0 += 4];
/* Fetches 4 values on K BUS into y compute registers YR3, YR2, YR1, YR0.
   Increments K register and adjusts for circular buffer operation */
xyFR4 = R5 * R6;
/* Two multiplications: XFR5 * XFR6 and YFR5 * YFR6 */
xyFR7 = R8 + R9, FR10 = R8 - R9;;
/* Two additions: XFR8 + XFR9 and YFR8 + YFR9
   AND two subtractions: XFR8 - XFR9 and YFR8 - YFR9 */
/* The same registers must be used on either side of the + and - operators */
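The circular-buffer quad fetch in the instruction above can be pictured in C++ as "read four consecutive values, then post-increment the index by 4 with wrap-around". The following is only a rough software model: `CircularReader` and `fetch_quad` are hypothetical names, the buffer length is assumed to be non-zero, and the hardware's own base and length registers are not modelled.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical software model of a circular-buffer quad fetch.
struct CircularReader {
    const std::vector<float>& buffer;  // backing storage (assumed non-empty)
    std::size_t index;                 // plays the role of the J or K index register

    explicit CircularReader(const std::vector<float>& b) : buffer(b), index(0) {}

    // Return four consecutive values, then post-increment the index by 4,
    // wrapping back to the start when the end of the buffer is passed.
    std::array<float, 4> fetch_quad() {
        std::array<float, 4> quad{};
        for (std::size_t lane = 0; lane < 4; ++lane) {
            quad[lane] = buffer[(index + lane) % buffer.size()];
        }
        index = (index + 4) % buffer.size();
        return quad;
    }
};
```

With an 8-element buffer, two calls to `fetch_quad()` return elements 0 to 3 and then 4 to 7, after which the index has wrapped back to 0.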
Steps to optimize
Get the algorithm to work in “C”
Determine how much time is available
If timing is already okay, quit
Determine the maximum number of each type of operation (add, subtract, multiply, memory fetches)
Divide the calculated maximum by the number of available resources for that type of operation
The largest division result is, in theory, the number of cycles needed for the algorithm
If that minimum time is more than 100% of the time available, find a new algorithm
If that minimum time is less than 40% of the time available, perhaps you can optimize the code to meet the speed requirements
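The "divide and take the largest" rule above can be sketched in C++ as follows. `OperationBudget` and `theoretical_min_cycles` are hypothetical names, and any per-cycle resource counts passed in are values the caller must supply; none are taken from the processor documentation here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One entry per operation type (adds, multiplies, memory fetches, ...).
struct OperationBudget {
    std::size_t operations_needed;    // how many of this operation the algorithm performs
    std::size_t available_per_cycle;  // how many can be issued per cycle (must be > 0)
};

// Theoretical minimum cycle count: the largest value of
// ceil(operations_needed / available_per_cycle) over all operation types.
std::size_t theoretical_min_cycles(const std::vector<OperationBudget>& budgets) {
    std::size_t worst = 0;
    for (const OperationBudget& b : budgets) {
        const std::size_t cycles =
            (b.operations_needed + b.available_per_cycle - 1) / b.available_per_cycle;
        worst = std::max(worst, cycles);
    }
    return worst;
}
```

As an illustration with placeholder numbers, 256 additions at an assumed 4 per cycle and 256 fetches at an assumed 8 per cycle give 64 and 32 cycles respectively, so the estimate is 64 cycles and the additions are the limiting resource.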
Code optimization – 32-bit integers or 32-bit floats
2 * SIZE additions
2 * SIZE memory fetches
Left fetched on J-bus and done in X-compute
Right fetched on K-bus and done in Y-compute
SIZE / 2 cycles in theory
STAGE 1 Get the C++ code to work
Stage 2 – Rewrite in simplest format
Note naming convention
Single operation per line
Note other changes
Step 3 -- Unwrap the loop
Again, note naming convention
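A minimal sketch of unwrapping (unrolling) a loop by two is given below. It assumes, purely for illustration, that the loop sums a buffer of floats; the function and variable names are hypothetical and are not the naming convention referred to on the slide.

```cpp
#include <cassert>
#include <cstddef>

// Simplest format: one element per iteration, one operation per line.
float sum_simple(const float* buffer, std::size_t size) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < size; ++i) {
        float value = buffer[i];  // memory fetch
        sum = sum + value;        // add
    }
    return sum;
}

// Unwrapped by two: each iteration handles elements i and i + 1.
// Precondition (an assumption of this sketch): size is even.
float sum_unrolled_by_2(const float* buffer, std::size_t size) {
    assert(size % 2 == 0);
    float sum = 0.0f;
    for (std::size_t i = 0; i < size; i += 2) {
        float value_0 = buffer[i];      // first fetch
        sum = sum + value_0;            // first add
        float value_1 = buffer[i + 1];  // second fetch
        sum = sum + value_1;            // second add
    }
    return sum;
}
```

Both functions add the elements in the same order, so they return identical results; the unwrapped form simply exposes two fetch/add pairs per iteration that can later be rearranged.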
Step 4 Overlap the first and second parts of loops
Note: the “C++” code goes no faster, but the parallel assembly code translated from this format will
Step * N
Step 3: 8 * (N / 2) + 2
Step 4: 6 * (N / 2) + 2
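As a worked example of the Step 3 and Step 4 expressions, the sketch below evaluates them for N = 128. The function names are hypothetical, and N is assumed to be even.

```cpp
// Cycle-count expressions as quoted for Step 3 and Step 4; n is the buffer size.
constexpr unsigned step3_cycles(unsigned n) { return 8 * (n / 2) + 2; }
constexpr unsigned step4_cycles(unsigned n) { return 6 * (n / 2) + 2; }

// Evaluated at compile time for N = 128.
static_assert(step3_cycles(128) == 514, "Step 3: 8 * 64 + 2");
static_assert(step4_cycles(128) == 386, "Step 4: 6 * 64 + 2");
```

For N = 128, overlapping the two halves of the loop body therefore saves 128 cycles, which is 2 cycles for each of the 64 iterations.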
Step 5A - Rearrange “start-up” and ending code
“Software” pipeline
Move first read outside
Need to add “extra read” at the end of the loop
Timing: 2 + (N/2 - 1) * 6
Need to adjust loop start (Is it done correctly? Are we “one-out”?)
CAUTION: NEED TO FIX
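One way to picture the software pipeline in C++ is sketched below, again assuming for illustration that the loop sums a buffer two elements at a time. The first read is hoisted above the loop, each iteration reads the value the next iteration will add, and the last pair is finished in ending code so that no read goes past the end of the buffer. `sum_pipelined` is a hypothetical name, and this is one possible arrangement rather than the code on the slide.

```cpp
#include <cassert>
#include <cstddef>

// Software-pipelined sum.
// Preconditions (assumptions of this sketch): size is even and at least 2.
float sum_pipelined(const float* buffer, std::size_t size) {
    assert(size >= 2 && size % 2 == 0);
    float sum = 0.0f;
    float value_0 = buffer[0];           // start-up code: first read moved outside
    for (std::size_t i = 0; i + 2 < size; i += 2) {
        float value_1 = buffer[i + 1];   // second read of this pair
        sum = sum + value_0;             // add the value read earlier
        value_0 = buffer[i + 2];         // "extra read" for the next iteration
        sum = sum + value_1;
    }
    // Ending code: finish the final pair without reading beyond the buffer.
    float value_1 = buffer[size - 1];
    sum = sum + value_0;
    sum = sum + value_1;
    return sum;
}
```

The loop body here runs size / 2 - 1 times, the same iteration count that appears in the timing expression 2 + (N/2 - 1) * 6. Starting or stopping the loop one iteration out is exactly the kind of error the caution above warns about.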
Step 5B - Rearrange “start-up” and ending code
Can now run additional adds and memory fetches in parallel
Note: the loop is still in error
Exercise 1 -- Get the loop control correct
BUFFER_SIZE = 1
BUFFER_SIZE = 2
BUFFER_SIZE = 4
BUFFER_SIZE = 5
BUFFER_SIZE = 8
BUFFER_SIZE = 128
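The sizes listed above make a convenient test set for loop control. The sketch below, which assumes a simple summing loop and uses hypothetical function names, compares a two-at-a-time loop against a one-at-a-time reference for any given size.

```cpp
#include <cstddef>
#include <vector>

// Reference: one element per iteration, trivially correct for any size.
float sum_one_at_a_time(const std::vector<float>& buffer) {
    float sum = 0.0f;
    for (float value : buffer) {
        sum = sum + value;
    }
    return sum;
}

// Two elements per iteration, with the loop count derived from size / 2.
float sum_two_at_a_time(const std::vector<float>& buffer) {
    const std::size_t size = buffer.size();
    const std::size_t pairs = size / 2;  // 1 -> 0, 2 -> 1, 5 -> 2, 128 -> 64
    float sum = 0.0f;
    for (std::size_t p = 0; p < pairs; ++p) {
        sum = sum + buffer[2 * p];
        sum = sum + buffer[2 * p + 1];
    }
    if (size % 2 != 0) {
        sum = sum + buffer[size - 1];    // odd sizes: one element is left over
    }
    return sum;
}

// Fill a buffer with 1, 2, ..., size and check that both loops agree.
bool loop_control_is_correct(std::size_t size) {
    std::vector<float> buffer(size);
    for (std::size_t i = 0; i < size; ++i) {
        buffer[i] = static_cast<float>(i + 1);
    }
    return sum_one_at_a_time(buffer) == sum_two_at_a_time(buffer);
}
```

Calling `loop_control_is_correct` with 1, 2, 4, 5, 8 and 128 exercises the zero-iteration case, the odd-size cases and the full-size case in turn.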
Unrecognized second key error
What is it? How do you fix it?
Exercise 2 -- Rewrite the code when it is known that BUFFER_SIZE = 129
But the loop only handles 128 elements, since 129 / 2 = 128 / 2 in integer division
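When the size is known at compile time to be 129, one possible fix is to let the two-at-a-time loop cover the first 128 elements and handle the final element once after the loop. The sketch below assumes a summing loop; `sum_129` is a hypothetical name.

```cpp
#include <cstddef>

constexpr std::size_t BUFFER_SIZE = 129;

// Integer division drops the odd element, so a loop counted as
// BUFFER_SIZE / 2 iterations covers only 128 of the 129 values.
static_assert(BUFFER_SIZE / 2 == 128 / 2, "129 / 2 and 128 / 2 are both 64");

float sum_129(const float (&buffer)[BUFFER_SIZE]) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < BUFFER_SIZE - 1; i += 2) {  // elements 0..127
        sum = sum + buffer[i];
        sum = sum + buffer[i + 1];
    }
    sum = sum + buffer[BUFFER_SIZE - 1];  // element 128, handled once after the loop
    return sum;
}
```

Because the size is a compile-time constant, no run-time test for an odd length is needed; the extra add is always executed.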
Code to this point is SISD parallel optimization
SISD: single instruction, single data
Using X_compute block and J memory bus
Next stage is SIMD: single instruction, multiple data
Using X_compute block and J memory bus for left
Using Y_compute block and K memory bus for right
Will need similar but different code when you are doing FIR in Lab. 3
Exercise 3 -- BUFFER_SIZE = 128
Rewrite so that X and Y ops are done together
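The idea of doing the X and Y operations together can be modelled in C++ by processing the left and right channels in the same loop iteration. This is only a conceptual sketch: C++ does not guarantee the two halves run in parallel, the summing loop is an assumption, and `ChannelSums` and `sum_both_channels` are hypothetical names.

```cpp
#include <cstddef>

struct ChannelSums {
    float left;   // result that would accumulate in the X compute block
    float right;  // result that would accumulate in the Y compute block
};

// Assumption: left and right each hold `size` samples.
ChannelSums sum_both_channels(const float* left, const float* right, std::size_t size) {
    ChannelSums sums{0.0f, 0.0f};
    for (std::size_t i = 0; i < size; ++i) {
        float left_value = left[i];    // fetch that would travel on the J bus
        float right_value = right[i];  // fetch that would travel on the K bus
        sums.left = sums.left + left_value;     // X-side add
        sums.right = sums.right + right_value;  // Y-side add
    }
    return sums;
}
```

Each iteration performs the same operation on two independent data streams, which is the pattern a single SIMD instruction line expresses for both compute blocks at once.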
Exercise 4 -- BUFFER_SIZE = 128
Rewrite so that no data dependency stalls are expected
Leave this one for a while until we have handled multiple memory accesses, as the answer may change
Tackled today
Exercise 1: Solving the loop problem, SIZE = 128
Exercise 2: Solving the loop problem, SIZE = 127
Exercise 3: Moving from SISD to SIMD mode, SIZE = 128
Incomplete
Exercise 4: Removing any expected stalls (left for later)