Explaining issues with DCremoval( ): common problems to avoid
DC removal Lecture 1, M. Smith, ECE, University of Calgary, Canada (2/23/2019)
Tackled today: testing the performance of the CPP version; the first assembly version, using I-ALU operations, with testing and timing; details of the code.
Characteristics of DCremoval( ): memory intensive; addition intensive; loops for the main code; a FIFO implemented as a circular buffer. It is not as complex as an FIR filter, but it has many of the same requirements and is easier to handle. You will use the same ideas when optimizing the FIR filter over Labs 2 and 3. There are two issues: speed and accuracy. Develop suitable tests for the CPP code, then check that the various assembly language versions satisfy the same tests.
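A circular buffer lets the FIFO accept a new value without moving the old ones: a write index simply wraps around. The sketch below is a hypothetical illustration (the class name `CircularFifo` is not from the lecture) and assumes the FIFO length is a power of two, such as 128, so the wrap can be done with a mask.

```cpp
#include <cassert>

// Hypothetical circular-buffer FIFO, not the lecture's code.
// N must be a power of two so the write index can wrap with a bit mask.
template <int N>
class CircularFifo {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  // Overwrite the oldest slot with the newest value: no data is moved.
  void Push(int value) {
    data_[head_] = value;
    head_ = (head_ + 1) & (N - 1);  // wrap from N-1 back to 0
  }

  // Sum of the N most recent values (older values have been overwritten).
  int Sum() const {
    int sum = 0;
    for (int i = 0; i < N; ++i) sum += data_[i];
    return sum;
  }

 private:
  int data_[N] = {};
  int head_ = 0;  // next slot to write, which holds the current oldest value
};
```

For example, pushing the values 1 to 200 into a `CircularFifo<128>` leaves 73 to 200 in the buffer, so `Sum()` returns 17472. The trade-off against a "rippling" FIFO is extra index arithmetic in exchange for avoiding 127 loads and stores per call.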
Call and return test: basically, if the code gets here, we probably did not crash the system. I use a cut-and-paste approach to develop code variants, so this test is (embarrassingly) useful.
Initially we expect the code to fail to work correctly; if it works initially, it is doing so by accident. Use XF_CHECK_EQUAL( ) for a test that is expected to fail. NOTE: this test is just a "cut-and-paste" version of the C++ test with three changes of function name.
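The two kinds of test can be sketched with small stand-in macros. These are hypothetical: `CHECK_EQUAL`, `XF_CHECK_EQUAL` and the counters below only mimic the idea and are not the lab framework's definitions, and `DCremovalASM_stub` is an assumed do-nothing placeholder. The expected-failure test uses 128 calls with a constant input because, with a 128-sample average, a single call with input 100 gives a DC estimate of 100 / 128 = 0, so a stub that does nothing would "pass" by accident.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical counters and macros; the real framework reports differently.
static int g_pass = 0, g_fail = 0, g_expected_fail = 0, g_unexpected_pass = 0;

#define CHECK_EQUAL(expected, actual)                            \
  do {                                                           \
    if ((expected) == (actual)) {                                \
      ++g_pass;                                                  \
    } else {                                                     \
      ++g_fail;                                                  \
      std::printf("FAIL line %d\n", __LINE__);                   \
    }                                                            \
  } while (0)

// Expected-failure check: a mismatch is counted, not reported as an error;
// a match is flagged, because the code may be working "by accident".
#define XF_CHECK_EQUAL(expected, actual)                         \
  do {                                                           \
    if ((expected) == (actual)) {                                \
      ++g_unexpected_pass;                                       \
      std::printf("UNEXPECTED PASS line %d\n", __LINE__);        \
    } else {                                                     \
      ++g_expected_fail;                                         \
    }                                                            \
  } while (0)

// Hypothetical stub for the assembly routine: returns without doing any work.
void DCremovalASM_stub(int *left, int *right) {
  (void)left;
  (void)right;
}

// Call and return test: reaching the check means the call did not crash.
void TestCallAndReturn() {
  int left = 0, right = 0;
  DCremovalASM_stub(&left, &right);
  CHECK_EQUAL(true, true);
}

// After 128 calls with a constant input, a correct routine returns 0.
// The stub leaves the input unchanged, so this check is expected to fail.
void TestConstantInputExpectedFail() {
  int left = 0, right = 0;
  for (int call = 0; call < 128; ++call) {
    left = 100;
    right = 100;
    DCremovalASM_stub(&left, &right);
  }
  XF_CHECK_EQUAL(0, left);
}
```

Once the routine works, the same test switches from `XF_CHECK_EQUAL` to `CHECK_EQUAL`.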
Timing test: run the code once, 10 times and 100 times, and normalize each timing to "process the function once". Other routines are needed to make these tests work: a DoNothing loop, and loops that run the C++ and assembly code routines. This may not be performing the timing correctly, but it gives the initial concepts.
Other functions needed to run the test: DoNothing (careful, it may be optimized to "nothing"); the C++ function loop; the J-ALU function loop.
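A desktop sketch of the timing idea, assuming `std::chrono` as the clock (the lab would use the processor's own timing mechanism instead). `WorkLoop` is a hypothetical stand-in for the C++ or J-ALU function loop. The DoNothing loop writes to a `volatile` variable so the compiler must keep every store and cannot reduce the loop to "nothing".

```cpp
#include <cassert>
#include <chrono>

// Every store to a volatile object must be performed, so this loop survives
// optimization.
volatile int g_sink = 0;

void DoNothingLoop(int n) {
  for (int i = 0; i < n; ++i) g_sink = i;
}

// Hypothetical stand-in for the routine under test, called n times.
void WorkLoop(int n) {
  for (int i = 0; i < n; ++i) g_sink = i * 3 + 1;
}

// Wall-clock time for one call of loop(n), in nanoseconds.
double TimeLoopNs(void (*loop)(int), int n) {
  auto start = std::chrono::steady_clock::now();
  loop(n);
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count();
}

// Normalize to "process the function once": subtract the DoNothing loop
// overhead, then divide by the number of calls.
double NormalizedCostPerCall(double loop_total, double do_nothing_total, int n) {
  return (loop_total - do_nothing_total) / n;
}
```

Usage: for each n in 1, 10 and 100, compute `NormalizedCostPerCall(TimeLoopNs(WorkLoop, n), TimeLoopNs(DoNothingLoop, n), n)`. If the three normalized values disagree noticeably, overhead or measurement noise is not being removed.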
Use the build failure information to determine the assembly code function name. Required name for void DCremovalASM_JALU(int *, int *): _DCremoval_JALU__FPiT1
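The required name appears to follow the older cfront/ARM-style C++ name mangling: a leading underscore, the base name, `__F` for "function", `Pi` for "pointer to int", and `T1` for "same type as parameter 1". That is how the build can insist on a name that encodes both `int *` parameters. An alternative, sketched below, is to declare the routine with C linkage so no mangling occurs. The name `DCremoval_JALU_C` is hypothetical, and the C++ body is only a desktop placeholder so the sketch links. Assuming the toolchain's usual leading-underscore convention for C names, the assembly label would then be `_DCremoval_JALU_C`.

```cpp
#include <cassert>

// C linkage: the symbol name no longer encodes the parameter types.
extern "C" void DCremoval_JALU_C(int *left, int *right);

// Desktop placeholder so the sketch links; in the lab this body would come
// from the assembly file. Hypothetical behaviour: clear both outputs.
extern "C" void DCremoval_JALU_C(int *left, int *right) {
  *left = 0;
  *right = 0;
}
```

The design cost of C linkage is that the linker no longer checks parameter types: a wrong prototype still links. The mangled name catches that mistake at build time.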
Proper test run and exit (lib_prog_term). Yellow indicates that there are NO failures but some expected failures. All successes and failures are shown in the console window.
Quick look at the code for void DCremovalASM(int *, int *): set up the static arrays; define and then set the pointers; move the incoming parameters into the FIFO; sum the FIFO values; perform the (FAST) division; return the correct values; update the FIFO in preparation for the next time this function is called, discarding the oldest value and "rippling" the FIFO to make the "newest" FIFO slot empty.
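The steps above can be sketched as a C++ reference version. This is a minimal sketch, not the lecture's code. It assumes a 128-entry FIFO (matching the division by 128), two channels passed as `left` and `right`, and that "return the correct values" means subtracting the averaged DC level from the newest sample.

```cpp
#include <cassert>

constexpr int BUFFERSIZE = 128;  // assumed FIFO length

// Hypothetical reference version following the listed steps.
void DCremoval(int *left, int *right) {
  // Static arrays keep the FIFO contents between calls.
  static int left_fifo[BUFFERSIZE] = {0};
  static int right_fifo[BUFFERSIZE] = {0};

  // Move the incoming values into the newest (last) FIFO slot.
  left_fifo[BUFFERSIZE - 1] = *left;
  right_fifo[BUFFERSIZE - 1] = *right;

  // Sum the FIFO values.
  int left_sum = 0, right_sum = 0;
  for (int i = 0; i < BUFFERSIZE; ++i) {
    left_sum += left_fifo[i];
    right_sum += right_fifo[i];
  }

  // Divide by the FIFO length to estimate the DC level.
  int left_dc = left_sum / BUFFERSIZE;
  int right_dc = right_sum / BUFFERSIZE;

  // Return the DC-corrected values through the pointers.
  *left -= left_dc;
  *right -= right_dc;

  // Ripple the FIFO: discard the oldest value; the newest slot is refilled
  // on the next call.
  for (int i = 0; i < BUFFERSIZE - 1; ++i) {
    left_fifo[i] = left_fifo[i + 1];
    right_fifo[i] = right_fifo[i + 1];
  }
}
```

With a constant input of 100 on the left channel, the first call returns 100 (the estimate 100 / 128 truncates to 0), and the 128th call returns 0 because the FIFO is then full of 100s.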
Developing the assembly code static arrays, "section data1". In later algorithms we will show that using multiple data sections in different parts of TigerSHARC memory allows us to bring in 256 bits of data per cycle.
2) .align 4; Later we will use the ability to bring in 4 words (4 × 32 bits = 128 bits) of data at the same time. This works best when the array starts on a 4-word boundary.
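A rough C++ analogue of `.align 4;` followed by `.var array[128];` uses `alignas`. This is a sketch, not the assembly itself. It assumes a 4-byte `int` on the host, so a 4-word boundary becomes a 16-byte boundary, and it uses an anonymous namespace so the arrays, like non-globalized assembly names, are visible in this file only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed: 32-bit int, so 4 words occupy 16 bytes.
constexpr std::size_t kQuadWordBytes = 4 * sizeof(int);

namespace {  // internal linkage: known in this file only
// The alignment is repeated before each array, as ".align 4" is.
alignas(kQuadWordBytes) int left_fifo[128];
alignas(kQuadWordBytes) int right_fifo[128];
}  // namespace

// True if p starts on a 4-word boundary.
bool StartsOnQuadWordBoundary(const int *p) {
  return reinterpret_cast<std::uintptr_t>(p) % kQuadWordBytes == 0;
}
```

Both arrays start on a 4-word boundary, while `left_fifo + 1` does not, which is the case a 4-word load could not handle in one access.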
3) .var array[128]; The .var syntax declares "word" arrays; there is other syntax for short int and byte arrays. NOTE: .align 4 is reused before the next array.
4) .var array[128]; The array is "static", known in this file only, because we don't globalize the name. TRUE or FALSE? KEY: the switch between data and program memory is "really key".
Define the (250) register names for ease of code maintainability (and marking): the actual static array declaration; DEFINE pointers into arrays; DEFINE temps; DEFINE INPARS; SET pointers into arrays.
Value into FIFO buffer. This is a RISC processor with a LOAD and STORE architecture, MIPS-like rather than CISC. Use the pointer value (which came in J4) to read the "left value" passed in by reference into a register. Now place this value into the last element of the FIFO array (make sure that you are not one element out). NOTE: BUFFERSIZE - 1 is converted BY THE ASSEMBLER and does not happen at run time. Using an index with a pre-modify offset means J2 is not changed.
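The load/store step and the two addressing styles can be modelled in C++. This is a hypothetical sketch: the pointer parameters only play the roles of J4 (incoming parameter) and J2 (FIFO base), and the function names are illustrative. `BUFFERSIZE - 1` is a constant expression, so, much as with the assembler, no subtraction is left for run time.

```cpp
#include <cassert>

constexpr int BUFFERSIZE = 128;

// Folded at compile time, as the assembler folds BUFFERSIZE - 1.
static_assert(BUFFERSIZE - 1 == 127, "index of the last FIFO slot");

// Pre-modify style: the address is base + offset; the base pointer itself
// (the role of J2) is not changed.
void StorePreModify(int *const base, int value) {
  base[BUFFERSIZE - 1] = value;
}

// Post-modify style, for contrast: store at base, then return base advanced.
int *StorePostModify(int *base, int value, int step) {
  *base = value;
  return base + step;
}

// LOAD the caller's value through its pointer (the role of J4) into a
// "register", then STORE it into the last FIFO element.
void MoveInparIntoFifo(const int *inpar, int *fifo) {
  int reg = *inpar;
  StorePreModify(fifo, reg);
}
```

Writing to index `BUFFERSIZE` instead of `BUFFERSIZE - 1` is the "one element out" mistake: it stores just past the end of the array.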
Perform sum, hardware loop 1: set up an index i_J8 to be used as an offset into the array (note how this syntax follows C++); set up LOOP COUNTER 0; perform the test and jump.
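The structure of the summing loop (an index used as an offset, a separate loop counter, a test and jump at the bottom) can be modelled in C++. This is a hypothetical sketch: `index` plays the role of i_J8 and `loop_counter` the role of loop counter 0, but it does not reproduce the TigerSHARC instructions.

```cpp
#include <cassert>

// Hypothetical model of the summing hardware loop.
int SumWithLoopCounter(const int *array, int count) {
  if (count <= 0) return 0;       // this model requires a positive count
  int sum = 0;
  int index = 0;                  // offset into the array, like array[i]
  int loop_counter = count;       // loop counter, counts down
  do {
    sum += array[index];          // add using the index as the offset
    ++index;
  } while (--loop_counter != 0);  // test and jump back while passes remain
  return sum;
}
```

For an array holding 0 to 127, the loop makes 128 passes and returns 8128.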
Perform sum, hardware loop 2: set up LOOP COUNTER 0. Division by 128 is performed by shifting (what did C++ do?). Note that with the I-ALU you can only shift by 1 bit (it is not a barrel shifter). Perform the test and jump.
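One answer to "what did C++ do?" is that signed integer division truncates toward zero, while an arithmetic right shift rounds toward minus infinity. The two agree for non-negative sums but can differ by 1 for negative ones, which is why compilers typically emit a shift plus a correction for negative values when dividing a signed int by 128. The sketch below compares them and models the I-ALU approach of seven 1-bit shifts. It assumes `>>` on a negative int is an arithmetic shift, which C++20 guarantees and common compilers already did before that.

```cpp
#include <cassert>

// C++ signed division: truncates toward zero.
int DivideBy128(int sum) { return sum / 128; }

// Arithmetic shift by 7: rounds toward minus infinity.
int ShiftBy7(int sum) { return sum >> 7; }

// I-ALU style: one 1-bit arithmetic shift per step, seven steps.
int ShiftOneBitSevenTimes(int sum) {
  for (int step = 0; step < 7; ++step) sum >>= 1;
  return sum;
}
```

For a sum of 12800 all three give 100. For -129, `DivideBy128` gives -1 while both shift versions give -2, and for -1 the results are 0 and -1.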
Some obvious multiple instructions. Can they go wrong? Note that the add occurs whether or not the jump occurs. Should this be a predicted or non-predicted jump? One shift too many?
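When an operation shares an instruction line with a conditional jump, it executes on every pass, including the final pass where the jump falls through. The number of shifts therefore depends entirely on the loop's exit condition. The C++ model below is hypothetical and is not the lecture's code. It shows how an exit test that is one count too generous yields 8 shifts instead of 7, which divides by 256 instead of 128.

```cpp
#include <cassert>

// Exit when the decremented counter reaches zero: exactly `count` shifts.
int ShiftLoopExitAtZero(int value, int count, int &shifts) {
  shifts = 0;
  do {
    value >>= 1;  // executes on every pass, like a bundled instruction
    ++shifts;
  } while (--count > 0);
  return value;
}

// Exit only once the counter goes negative: one extra pass, one extra shift.
int ShiftLoopExitBelowZero(int value, int count, int &shifts) {
  shifts = 0;
  do {
    value >>= 1;
    ++shifts;
  } while (--count >= 0);
  return value;
}
```

With a sum of 12800 and a count of 7, the first version returns 100 after 7 shifts, and the second returns 50 after 8 shifts.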
Correcting INPARS and then updating the FIFO buffer: adjust the INPARS (remember, they are int *); update the FIFO memory using a load/store approach (SLOW).
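Because the INPARS are `int *`, the corrected value must be stored back through the pointer. Correcting only a register copy leaves the caller's variable unchanged. The ripple update is slow because every call moves all but one FIFO element. This is a hypothetical C++ sketch; the function names are illustrative.

```cpp
#include <cassert>

// Load and modify, but no store: only the register copy is corrected.
void AdjustWithoutStore(int *left, int dc) {
  int reg = *left;  // load
  reg -= dc;        // modify in a register
  (void)reg;        // deliberate bug: nothing is stored back
}

// Load, modify, then store back through the pointer.
void AdjustWithStore(int *left, int dc) {
  int reg = *left;  // load
  reg -= dc;        // modify
  *left = reg;      // store to the caller's variable
}

// Ripple update: moves n - 1 values (one load and one store each) per call.
int RippleFifo(int *fifo, int n) {
  int moves = 0;
  for (int i = 0; i < n - 1; ++i) {
    fifo[i] = fifo[i + 1];
    ++moves;
  }
  return moves;
}
```

For a 128-entry FIFO, `RippleFifo` performs 127 moves per channel on every call, which is the cost a circular buffer avoids.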
Adjust tests for expected success
Run the tests
Examine the timing. In "debug" mode, we are already "beating" the compiler. Questions: Why is the C++ slower? Is it doing something that we (in ignorance) don't know we need to do? What happens in "release" mode?
Can you explain this 10% change in the results, depending on how many tests are run? (Screenshots: timing with all the tests; timing test only.)
Tackled today: What are the basic characteristics of a DSP algorithm? A near-perfect "starting" example: DCRemoval( ) has many of the features of the FIR filters used in all the Labs. Testing the performance of the CPP version. The first assembly version, using I-ALU operations, with testing and timing. The code will be examined in more detail in the next lecture.