Understanding the TigerSHARC ALU pipeline


1 Understanding the TigerSHARC ALU pipeline
Determining the speed of one stage of an IIR filter, Part 5: what syntax makes the code more parallel?

2 Understanding the TigerSHARC Parallel Operations
TigerSHARC has many pipelines
Review of how the COMPUTE pipeline works
Interaction of memory (data) operations with COMPUTE operations
Specialized C++ compiler options and #pragmas (will be covered by individual student presentation)
Optimized assembly code and optimized C++
4/30/2019 Speed IIR -- stage M. Smith, ECE, University of Calgary, Canada

3 Processor Architecture
Three 128-bit data busses
Two integer ALUs
Two computational blocks, each with: ALU (float and integer), SHIFTER, MULTIPLIER, COMMUNICATIONS CLU

4 Use C++ IIR code as comments
Things to think about prior to code writing:
Register name reorganization.
Keep XR4 for xInput -- saves a cycle.
Put S1 and S2 into XR0 and XR -- chance to fetch 2 memory values in one cycle using L[ ].
Put H0 to H5 in XR12 to XR -- chance to fetch 4 memory values in one cycle using Q[ ], followed by one normal fetch.
Problem: if there is more than one IIR stage, then the second stage's fetches are not quad aligned.
There are two sets of multiplications using S1 and S2. Can these be done in the X and Y compute blocks in one cycle?
C++ code for bringing the state variables in and sending them out:
float *copyStateStartAddress = state;
S1 = *state++; S2 = *state++;
*copyStateStartAddress++ = S1; *copyStateStartAddress++ = S2;
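For reference, here is a minimal C++ sketch of one second-order IIR stage built around the state handling quoted on this slide. The transcript does not show the filter arithmetic or how H0 to H5 map onto coefficients, so the direct-form II structure, the StageCoeffs struct and the iirStage name are all assumptions made for illustration; only the copyStateStartAddress / S1 / S2 pattern comes from the slide.

```cpp
#include <cassert>

// Hypothetical coefficient layout for one second-order stage.
// ASSUMPTION: the slides name H0..H5 but do not give their meaning.
struct StageCoeffs {
    float b0, b1, b2;  // feed-forward coefficients
    float a1, a2;      // feedback coefficients
};

// One direct-form II stage (assumed structure).
// state[0] holds S1 and state[1] holds S2, as in the slide's C++ fragment.
float iirStage(float xInput, const StageCoeffs& h, float* state) {
    assert(state != nullptr);

    float* copyStateStartAddress = state;  // remember where the states live
    float S1 = *state++;                   // bring state variables in
    float S2 = *state++;

    // Two sets of multiplications use S1 and S2: feedback and feed-forward.
    float w    = xInput - h.a1 * S1 - h.a2 * S2;
    float yOut = h.b0 * w + h.b1 * S1 + h.b2 * S2;

    // Shift the delay line and send the state variables back out.
    S2 = S1;
    S1 = w;
    *copyStateStartAddress++ = S1;
    *copyStateStartAddress++ = S2;
    return yOut;
}
```

The two groups of products (a1*S1, a2*S2 and b1*S1, b2*S2) are independent of each other, which is what makes the question about using the X and Y compute blocks in the same cycle worth asking.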

5 Register name conversion done in steps
Setting Xin = XR4 and Yout = XR8 saves one cycle.
Bulk conversion with no error?
So many errors were made during bulk conversion that I went to find / replace / test for each register individually.

6 Fix bringing state variables in
QUESTION: We have XR18 = [J6 += 1] (load S1) and R19 = [J6 += 1] (load S2). Both are valid. What is the difference?

7 That difference – could it be used to our advantage?
XR18 = [J6 += 1];;
Reads the value at memory location [J6] and updates J6 to J6 + 1 after the fetch. Stores the fetched value in XR18.
XYR19 = [J6 += 1];;
Reads the value at memory location [J6] and updates J6 to J6 + 1 after the fetch. Stores the fetched value in XR19 AND YR19.
XYR19 = L[J6 += 2];; (the concept below is correct, but it executes faster)
Reads the value at [J6], updates J6 to J6 + 1 and stores the value in XR19, AND reads the value at [(new) J6], updates J6 to J6 + 1 and stores the value in YR19, PROVIDED J6 was originally aligned on a 64-bit boundary.
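The post-modify behaviour and the alignment condition can be modelled on a host machine. The sketch below is a toy model only: ToyMemory, loadNormal and loadLong are hypothetical names, memory is treated as an array of 32-bit words, and "64-bit aligned" is taken to mean an even word address. It says nothing about cycle counts.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model of word-addressed data memory plus one index register (J6).
// ASSUMPTION: addresses count 32-bit words, so 64-bit aligned == even address.
struct ToyMemory {
    std::vector<std::uint32_t> words;
    std::uint32_t j6;

    explicit ToyMemory(std::vector<std::uint32_t> w)
        : words(std::move(w)), j6(0) {}

    // Models "Rx = [J6 += 1]": read one word, then post-increment J6 by 1.
    std::uint32_t loadNormal() {
        assert(j6 < words.size());
        return words[j6++];
    }

    // Models "XYRs = L[J6 += 2]": read two adjacent words in one access,
    // then post-increment J6 by 2. Only meaningful when J6 is even.
    std::pair<std::uint32_t, std::uint32_t> loadLong() {
        assert(j6 % 2 == 0 && "L[ ] needs a 64-bit aligned address");
        assert(j6 + 1 < words.size());
        std::pair<std::uint32_t, std::uint32_t> result(words[j6], words[j6 + 1]);
        j6 += 2;
        return result;
    }
};
```

In this model, two loadNormal() calls and one loadLong() call return the same two values and leave J6 at the same address, which is the sense in which the slide calls the two-step description "concept correct".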

8 Send state variables out
Go for the gusto: use L[ ] (64-bit).
Need to recalculate the test result: state[1] is NOT Yout.

9 Working solution -- Part 1

10 Working Solution -- Part 2

11 Working solution – Part 3
I could not spot where any extra stalls would occur because of memory pipeline reads and writes. All values were in place when needed. Need to check with the pipeline viewer.

12 Let's look at DATA MEMORY and COMPUTE pipeline issues -- 1
No problems here.

13 Weird stuff happening with INSTRUCTION pipeline
Only 9 instructions are being fetched, but we are executing 21! Why all these instruction stalls?

14 Analysis
We are seeing the impact of the processor doing quad-fetches of instructions (128 bits) into the IAB (instruction alignment buffer). Once in the IAB, the instructions (32 bits each) are issued to the various execution units as needed.
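A back-of-the-envelope sketch of why fetch counts and executed-instruction counts differ: each 128-bit fetch carries four 32-bit instruction words. The function name quadFetchesNeeded is hypothetical, and the model assumes straight-line code with no branches or pipeline refills, so it is not meant to reproduce the numbers seen in the pipeline viewer.

```cpp
#include <cassert>

// Number of 128-bit quad-word fetches needed to cover `count` consecutive
// 32-bit instruction words starting at word address `firstWord`.
// ASSUMPTION: straight-line code; branches and refetches are ignored.
unsigned quadFetchesNeeded(unsigned firstWord, unsigned count) {
    assert(count > 0);
    const unsigned wordsPerFetch = 4;  // 128 bits / 32 bits per instruction
    unsigned firstQuad = firstWord / wordsPerFetch;
    unsigned lastQuad  = (firstWord + count - 1) / wordsPerFetch;
    return lastQuad - firstQuad + 1;
}
```

The start address matters: two instruction words that straddle a quad-word boundary need two fetches, while four words that start on a boundary need only one.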

15 Before we do any further optimization, we need to understand processor parallelism
We already know about parallel multiplications and additions and their associated stalls.
What about parallel memory fetches?

16 Parallel memory fetches
What is permissible? Can we do:
Parallel fetches into XY at the same time?
Parallel fetches into an X and a Y register?
Parallel fetches into two X registers?

17 Parallel memory syntax – not too difficult
Only this syntax is illegal.
Will need to do more research to discover whether "legal" means that the operation is performed without stalling the memory pipeline.
NOTE: Need to transfer INPAR3 (J6) into a K-register (K6) in order to be able to use both the J and K data busses during the IIR operation.

18 Question
How do you (in C++) place IIR coefficients in one memory block and state values in another?

19 Question
How do you (in assembly code) place IIR coefficients in one memory block and state values in another?

20 C++ manual talks about 2 data spaces (dm and pm) for STATIC or GLOBAL variables

21 The VDSP C++ extension pm
BAD
You can use the VDSP C++ extension pm to specify a different memory space. HOWEVER, there is no such thing as a pm stack, so all pm variables must be declared "static" or "global". dm arrays can be placed on the stack, but there may be alignment issues.
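A sketch of what the file-scope declarations might look like. The pm and dm keywords are VDSP extensions that a host compiler does not know, so they are defined away here; TARGET_IS_TIGERSHARC is a hypothetical macro you would define yourself for the target build, and the exact keyword placement should be checked against the compiler manual. The coefficient values are placeholders.

```cpp
#include <cassert>

// On a host compiler, make the VDSP memory-space keywords vanish.
// TARGET_IS_TIGERSHARC is a hypothetical, user-defined macro.
#ifndef TARGET_IS_TIGERSHARC
#define pm
#define dm
#endif

// File-scope ("static" or "global") arrays: there is no pm stack,
// so pm data cannot be a local variable.
static float pm coeffs[6] = {0.5f, 0.25f, 0.0f, 0.0f, 0.0f, 0.0f};  // placeholders
static float dm state[2]  = {0.0f, 0.0f};

// The pointer types carry the memory space, so the compiler knows that
// the two operands come from different blocks.
float dot2(const float pm* h, const float dm* s) {
    assert(h != nullptr && s != nullptr);
    return h[0] * s[0] + h[1] * s[1];
}

// Small helper so both arrays are referenced inside this file.
float firstTwoTaps() {
    return dot2(coeffs, state);
}
```

Keeping the coefficients and the state values in separate file-scope arrays is what gives the linker the freedom to put them in different memory blocks.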

22 The assembler manual says something similar but different

23 VDSP C++ extensions dm and pm
With the VDSP C++ extensions dm and pm, parameters are still being passed into functions via J5 and J6 as before.
Notice the very big difference in the "absolute addresses", indicating that the data blocks are in very different memory spaces.
Also, the data memory addresses are widely different from the instruction memory space.
So the processor can do an instruction fetch and 2 data fetches at the same time.

24 IIR function using TigerSHARC C++ DSP extensions dm and pm

25 Using dm and pm gives slightly more parallel code than using only dm

26 From TigerSHARC TS201 programming reference manual

28 Memory block operation will need to be explored in more detail later

29 Understanding the TigerSHARC Parallel Operations
TigerSHARC has many pipelines
Review of how the COMPUTE pipeline works
Interaction of memory (data) operations with COMPUTE operations
Specialized C++ compiler options and #pragmas (will be covered by individual student presentation)
Optimized assembly code and optimized C++
