1
Working with the Compute Block
M. R. Smith, ECE University of Calgary Canada
2
Tackled today
- Problems with using the I-ALU as an "integer" processor
- TigerSHARC processor architecture
- What features are available for DSP optimization, and what do we "have to worry about" when using these features?
- Moving DCremoval( ) over to the X compute block
- Using test macros: useful to know, but a real time-waster for the labs in this class
5/26/2019 Working with the COMPUTE Block, M. Smith, ECE, University of Calgary, Canada
3
DCremoval( ): not as complex as FIR, but with many of the same requirements
- Memory intensive
- Addition intensive
- Loops for the main code
- FIFO implemented as a circular buffer
Easier to handle than FIR; you will use the same ideas when optimizing FIR over Labs 2 and 3.
Two issues: speed and accuracy. Develop suitable tests for the C++ code, and check that the various assembly language versions satisfy the same tests.
4
Set-up time: in principle, 1 cycle per instruction
2 + 4 instructions
5
First key element: SUM LOOP, Order(N): 4 + 5N instructions (4 set-up instructions, then 5 instructions per iteration)
Second key element: SHIFT LOOP, Order(log2N): 1 + 2 log2N instructions
6
Third key element: FIFO circular buffer, Order(N)
Update outgoing parameters: 6 instructions
Update FIFO: 3 + 6N instructions
Function return: 2 instructions
7
Time in theory (1 cycle per instruction)
Set up pointers to buffers: 2
Insert values into buffers: 4
SUM LOOP: 4 + 5N
SHIFT LOOP: 1 + 2 log2N
Update outgoing parameters: 6
Update FIFO: 3 + 6N
Function return: 2
Total: 22 + 11N + 2 log2N
N = 128: 1444 instructions, so 1444 cycles plus any delay cycles
C++ debug mode: 9,500 cycles?!
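The per-stage counts above can be checked with a small C++ model. This is a sketch under the slide's own assumption of 1 cycle per instruction with no delay cycles; the function name version1Instructions is hypothetical.

```cpp
#include <cassert>

// Theoretical instruction count for the Version 1 (I-ALU) DCremoval( ),
// assuming 1 cycle per instruction and no stalls ("delay cycles").
// Per-stage counts are the ones listed on the slides; N is the FIFO length.
int version1Instructions(int N) {
    assert(N > 0 && (N & (N - 1)) == 0);  // N must be a power of two
    int log2N = 0;
    while ((1 << log2N) < N) ++log2N;     // integer log2 of N

    const int setUpPointers  = 2;
    const int insertValues   = 4;
    const int sumLoop        = 4 + 5 * N;      // Order(N)
    const int shiftLoop      = 1 + 2 * log2N;  // Order(log2 N)
    const int updateOutgoing = 6;
    const int updateFIFO     = 3 + 6 * N;      // Order(N)
    const int functionReturn = 2;

    // Total = 22 + 11N + 2 log2N
    return setUpPointers + insertValues + sumLoop + shiftLoop
         + updateOutgoing + updateFIFO + functionReturn;
}
```

Calling version1Instructions(128) reproduces the 1444 figure; measured cycle counts come out higher because the model ignores memory and pipeline delays.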
8
Is the code too slow?
Code is slow IFF (if and only if) you don't have 2,500 cycles available to perform this part of the software-defined radio (SDR) algorithm.
The other components of the SDR, plus the other components of the complete system, must complete within the time between 2 samples at 48 kHz:
- 48,000 interrupts per second
- 500,000,000 cycles available every second
- roughly 10,400 cycles available per interrupt (500,000,000 / 48,000 ≈ 10,417)
My ball-park: never design code that, at the design stage, takes more than 50% of the available cycles.
From take-home quiz 1: DCremoval( ) is 17% of the code time, so we need about 6 × 2,500 = 15,000 cycles for the SDR component alone, which is more than the cycles available per interrupt.
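The budget arithmetic can be written down the same way. A minimal sketch, with hypothetical names, using the slide's 500 MHz clock, 48 kHz sample rate and 50% design rule:

```cpp
#include <cassert>

// Cycle budget per audio sample, using the figures on this slide.
constexpr long long kClockHz  = 500000000;  // cycles available every second
constexpr long long kSampleHz = 48000;      // one interrupt per sample

// Cycles between two samples: 500,000,000 / 48,000 = 10,416 (integer division)
constexpr long long cyclesPerSample() { return kClockHz / kSampleHz; }

// Ball-park rule: a design-stage estimate should use at most 50% of the budget.
constexpr bool fitsDesignBudget(long long estimatedCycles) {
    return 2 * estimatedCycles <= cyclesPerSample();
}
```

With these numbers, DCremoval( ) on its own (2,500 cycles) satisfies the 50% rule, but the estimated 6 × 2,500 = 15,000 cycles for the whole SDR component does not.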
9
The code is too slow because we are not taking advantage of the available resources
- Bring in up to 128 bits (4 instructions) per cycle
- Bring in 4 32-bit values along the J data bus (data1) and 4 along the K data bus (data2)
- Perform address calculations in the J and K IALUs, with single-cycle hardware circular buffers
- Perform math operations on both the X and Y compute blocks
- Background DMA activity
- Off-load some of the processing to the second processor
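As a rough illustration of why wider data paths help, the following C++ sketch sums a buffer four values per iteration into four independent partial sums. This is an analogy only, not TigerSHARC code: on the processor, a quad-word access could fetch the four values and the adds could be split across the X and Y compute blocks. The name sumByFours is hypothetical, and the length is assumed to be a multiple of 4.

```cpp
#include <cassert>

// Sum 'length' values four at a time using four independent accumulators.
// The four adds in each iteration do not depend on each other, so hardware
// with several adders can, in principle, perform them in parallel.
int sumByFours(const int *buffer, int length) {
    assert(length % 4 == 0);  // assumption: length is a multiple of 4
    int sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;
    for (int i = 0; i < length; i += 4) {
        sum0 += buffer[i];
        sum1 += buffer[i + 1];
        sum2 += buffer[i + 2];
        sum3 += buffer[i + 3];
    }
    // Combine the partial sums once, after the loop
    return (sum0 + sum1) + (sum2 + sum3);
}
```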
10
Version 2 – Move the algorithm component from I-ALU over to Compute Block
11
Steps for faster code development
- Cut and paste the old code; change the name only: _DCremovalASM_JALU__FPiT1 becomes _DCremovalASM_Compute__FPiT1
- Run the test to confirm
12
Add timing and execution tests
13
Elements we want to change
void DCremovalASM(int *, int *)
- Setting up the static arrays
- Defining and then setting pointers
- Moving the incoming parameters into the FIFO
- Summing the FIFO values
- Performing (FAST) division
- Returning the correct values
- Updating the FIFO in preparation for the next time this function is called: discarding the oldest value, and "rippling" the FIFO to make the "newest" FIFO slot empty
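The steps above can be sketched as a C++ reference version plus one shared test. This is a hypothetical sketch, not the course's reference code: it assumes N = 128, the newest sample in slot 0, and that the "correct value" returned is the input minus the average of the FIFO. The names DCremovalCPP and passesConstantInputTest are hypothetical. The fast division relies on >> being an arithmetic shift for negative values, which is guaranteed from C++20 and is what mainstream compilers already do.

```cpp
#include <cassert>

constexpr int N = 128;    // FIFO length (assumed)
constexpr int LOG2N = 7;  // N == 1 << LOG2N, so dividing by N is a shift

// Hypothetical C++ reference version of DCremoval( )
void DCremovalCPP(int *left, int *right) {
    static int left_buffer[N] = {0};   // static arrays keep history between calls
    static int right_buffer[N] = {0};

    // Move the incoming parameters into the newest FIFO slot
    left_buffer[0] = *left;
    right_buffer[0] = *right;

    // Sum the FIFO values
    int left_sum = 0, right_sum = 0;
    for (int i = 0; i < N; ++i) {
        left_sum += left_buffer[i];
        right_sum += right_buffer[i];
    }

    // Fast division by N: arithmetic right shift
    left_sum >>= LOG2N;
    right_sum >>= LOG2N;

    // Return the DC-removed values through the pointers
    *left -= left_sum;
    *right -= right_sum;

    // Ripple the FIFO: the oldest value is overwritten and slot 0 is freed
    for (int i = N - 1; i > 0; --i) {
        left_buffer[i] = left_buffer[i - 1];
        right_buffer[i] = right_buffer[i - 1];
    }
}

// Shared test: after N calls with a constant input, every FIFO slot holds
// that constant, so the average equals the input and the output must be 0.
bool passesConstantInputTest(void (*dcRemoval)(int *, int *)) {
    int left = 0, right = 0;
    for (int k = 0; k < N; ++k) {
        left = 1000;
        right = -256;
        dcRemoval(&left, &right);
    }
    return left == 0 && right == 0;
}
```

An assembly version with the signature above, void DCremovalASM(int *, int *), could be passed to passesConstantInputTest( ) in the same way, so every version is held to the same test.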
14
Perform sum – using I-ALU
15
Perform sum – using Compute Block
#define left_sum_XR6 XR6
left_sum_XR6 = 0;;
#define left_XR2 XR2
left_XR2 = [left_buffpt_J0 + i_J8];;
left_sum_XR6 = R6 + R2;;            // NOTE SYNTAX: source operands are written R6 and R2, without the X prefix
left_sum_XR6 = ASHIFT R6 BY -7;;    // arithmetic shift right by 7 bits: divide by 128
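One accuracy detail worth testing: ASHIFT R6 BY -7 is an arithmetic right shift, which rounds toward minus infinity, while C++ integer division by 128 rounds toward zero. The two agree for non-negative sums but can differ by 1 for negative sums, so a C++ reference that uses / may not match an assembly version that shifts. A small sketch with hypothetical function names (signed >> is guaranteed to be arithmetic only from C++20, although mainstream compilers already behave this way):

```cpp
#include <cassert>

// Equivalent of ASHIFT R6 BY -7: floor(sum / 128)
int divideByShift(int sum) { return sum >> 7; }

// Plain C++ division: truncates toward zero
int divideByOperator(int sum) { return sum / 128; }
```

For example, a sum of -300 gives -3 with the shift but -2 with the / operator.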
16
Final sum code: don't use XR6 = J31;;
J31 is NOT zero when used with the COMPUTE block: there it is a condition code register. Initialize the sum with XR6 = 0;; instead.
17
Other necessary changes
18
Time in theory (Version 2)
Set up pointers to buffers: 2
Insert values into buffers: 4
SUM LOOP: 4 + 5N
SHIFT LOOP: 1 (was 1 + 2 log2N); a single ASHIFT replaces the loop
Update outgoing parameters: 6
Update FIFO: 3 + 6N
Function return: 2
Total: 22 + 11N (was 22 + 11N + 2 log2N)
N = 128: 1430 instructions (was 1444), plus any delay cycles
19
Time in practice (Version 2, N = 128)
Theory: 1430 instructions (was 1444)
Measured: 1730 cycles including delay cycles (was 2,500 cycles)
Improved more than expected, because we were accidentally making better use of the available resources.
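The Version 2 count and the gap between theory and measurement can be captured in the same style. This sketch treats "delay cycles" simply as measured minus theoretical count, using the 2,500 and 1,730 cycle measurements from the slides; the names VersionTiming and version2Instructions are hypothetical.

```cpp
#include <cassert>

// Theory vs. measurement for one version of DCremoval( )
struct VersionTiming {
    int theoreticalInstructions;  // from the per-stage model
    int measuredCycles;           // from the timing test
    // Delay (stall) cycles: whatever the 1-cycle-per-instruction model misses
    int stallCycles() const { return measuredCycles - theoreticalInstructions; }
};

// Version 2 model: as Version 1, except the log2(N) shift loop becomes
// a single ASHIFT instruction in the compute block.
int version2Instructions(int N) {
    return 2 + 4 + (4 + 5 * N) + 1 + 6 + (3 + 6 * N) + 2;  // = 22 + 11N
}
```

With the slide figures, Version 1 loses about 1056 cycles to delays and Version 2 about 300, so most of the 770-cycle improvement comes from fewer stalls rather than from fewer instructions.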
20
Possible explanation of speed improvement
- Must wait for the value to arrive from memory
- Must wait for the I-ALU to become available before it can calculate an address or do an add
- Remember: we are working in a loop, so every wait for the I-ALU is repeated on each iteration
Expected savings: 2 × N = 256 cycles
Actual savings: about 770 cycles (2,500 − 1,730), roughly 6 × N = 768
21
Next stage in improving code speed: software and hardware circular buffers
Starting point (Version 2, N = 128): 1430 instructions in theory, 1730 cycles in practice. The Order(N) stages are the SUM LOOP (4 + 5N) and the FIFO update (3 + 6N).
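A software circular buffer removes the O(N) ripple: instead of moving every stored value, only an index moves and the oldest value is overwritten in place. A minimal C++ sketch, assuming N is a power of two so the wrap-around can be done with a mask; the class name CircularFIFO is hypothetical. On the TigerSHARC, the J and K IALUs can perform this wrap-around in hardware, which is the hardware circular buffer listed among the available resources.

```cpp
#include <cassert>

template <int N>
class CircularFIFO {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
public:
    // Insert the newest value, overwriting the oldest one: O(1), no ripple.
    void insert(int value) {
        newest_ = (newest_ + 1) & (N - 1);  // advance index, wrapping modulo N
        data_[newest_] = value;
    }
    // Value inserted 'age' calls ago (age 0 = newest, age N-1 = oldest).
    int at(int age) const { return data_[(newest_ - age) & (N - 1)]; }
    // Sum of all N slots; the order does not matter, so a plain loop works.
    int sum() const {
        int total = 0;
        for (int i = 0; i < N; ++i) total += data_[i];
        return total;
    }
private:
    int data_[N] = {};
    int newest_ = 0;
};
```

Only the FIFO update changes here; the sum is still Order(N).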
22
Making the tests quicker to develop
Is there an alternative to cut-and-paste? Do you want to bother to learn it and then use it?
23
Develop Call-RETURN test macro
24
Develop – Validate operation test macro
In practice: not as trivial an exercise as it looks.
The macro acts as "1 long C++ line", so any error message is unspecific.
My favourite error: tabs and/or spaces after the final \ on a line.
Solution: use the Home / End keys to check that the \ is at the very end of each line.
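A validate-operation macro might look like the sketch below. CHECK_EQUAL and g_failures are hypothetical names. Every line except the last ends in a \ with nothing after it, and the do { ... } while (0) wrapper lets the macro be followed by a ; like an ordinary statement, which sidesteps the usual trouble with a stray trailing ; or }.

```cpp
#include <cassert>
#include <cstdio>

// Compare ACTUAL with EXPECTED; on failure, print both expressions as text
// (using the # operator) plus the line number, and count the failure.
#define CHECK_EQUAL(ACTUAL, EXPECTED)                      \
    do {                                                   \
        if ((ACTUAL) != (EXPECTED)) {                      \
            std::printf("FAIL: %s != %s (line %d)\n",      \
                        #ACTUAL, #EXPECTED, __LINE__);     \
            ++g_failures;                                  \
        }                                                  \
    } while (0)

int g_failures = 0;  // hypothetical global failure counter
```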
25
Timing test macro – not trivial
A new, special loop-control function must be generated for each test: its name must change, and the contents of its print statement must change.
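A sketch of how ## and # can generate one timing function per test, so that both the function name and the printed label change automatically. DEFINE_TIMING_TEST, TimeTest_..., and the stand-in AddOne( ) are hypothetical; std::chrono is used only so the sketch runs on a desktop, whereas lab code on the TigerSHARC would more likely read the processor's cycle counter.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// ## pastes NAME into a new function name (e.g. TimeTest_AddOne);
// # turns NAME into a string literal for the print statement.
#define DEFINE_TIMING_TEST(NAME, FUNC, REPEATS)                               \
    long long TimeTest_##NAME() {                                            \
        int left = 1000, right = -256;                                       \
        auto start = std::chrono::steady_clock::now();                       \
        for (int k = 0; k < (REPEATS); ++k) FUNC(&left, &right);             \
        auto stop = std::chrono::steady_clock::now();                        \
        long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>( \
                           stop - start).count();                            \
        std::printf("%s: %d calls took %lld ns\n", #NAME, (REPEATS), ns);    \
        return ns;                                                           \
    }

// Stand-in function under test (hypothetical); any void(int *, int *) works.
void AddOne(int *left, int *right) { ++*left; ++*right; }

// Generates long long TimeTest_AddOne(); no trailing ; is needed here.
DEFINE_TIMING_TEST(AddOne, AddOne, 1000)
```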
26
Some standard “C++” macro issues
A #define must be one line "by definition". So cheat: end each line with a final \, which says that the newline following the \ is not a "new-line character".
#define FOO_MACRO(FEE, FUM) \
    /* Must use C-style comments: a // comment would swallow the rest of the macro */ \
    /* # operator: turns the parameter into a string literal */ \
    puts(#FEE); \
    /* ## operator: concatenates the parameter onto the neighbouring token */ \
    DoLoop##FUM( ); \
    /* Watch out for trailing ; and }: may be required, or definitely not wanted */
A macro line broken over 2 lines without a final \ is ILLEGAL: the #define ends at the first newline that is not preceded by a \.
27
28
Using macros
Learning how to do the concatenation and print-formatting macros took me about 10 times as long as just cutting and pasting.
In the labs, you use test macros at your own risk: the T.A.s and I will not help you debug them.
In the exams, you can't use macros.
Please note, I have defined macros and am now using them: the exam macro PLEASE_ANSWER_EXAMQUESTION_FOR_ME( ) causes the marker macro ZERO_OUT_OF_100( ) to be activated.
Personal opinion: learn the concept for use at a later time; don't worry about macros in the labs.
29
Tackled today
- Problems with using the I-ALU as an "integer" processor
- TigerSHARC processor architecture
- What features are available for DSP optimization, and what do we "have to worry about" when using these features?
- Moving DCremoval( ) over to the X compute block
- Using test macros: useful to know, but a real time-waster for the labs in this class