Generating a software loop with memory accesses: TigerSHARC assembly syntax

10/1/2016 TigerSHARC assemble code 2, M. Smith, ECE, University of Calgary, Canada 2 / 38
Concepts:
Learning just enough TigerSHARC assembly code to make a software loop “work”.
Comparing the timings for rectification of integer and floating point arrays using debug C++ code, release C++ code, and our FIRST_ASM code.
Looking in “MIXED mode” at the code generated by the compiler.

Test Driven Development (CUSTOMER and DEVELOPER):
Work with the customer to check that the tests properly express what the customer wants done.
This is an iterative process with the customer “heavily involved”: an “Agile” methodology.

Note the special marker.
Compiler optimization:
FLOATS: 927 → THREE FOLD
INTS: 960 → 150, SIX FOLD
Why the difference, can we do better, and do we want to?
Note the failures. What are they?

Write tests about passing values back from an assembly code routine.

More detailed look at the code:
Single semicolons and double semicolons.
Start function label and end function label: used for “profiling code”.
Label format is similar to 68K: needs a leading underscore and a final colon.
As with 68K and Blackfin, needs a .section, but the name and format are different.
As with 68K, needs a .align statement. Is the “4” in bytes (8 bits) or words (32 bits)?
As with 68K, needs .global to tell other code that this function exists.

Return registers:
There are many, depending on what you need to return.
Here we need to use J8 as the return register to pass back an “integer” pointer.
Many registers are available, so we need the ability to control usage:
J0 to J31: registers (integers and pointers) (SISD mode)
XR0 to XR31: registers (integers) (SISD mode)
XFR0 to XFR31: registers (floats) (SISD mode)
Did I also mention:
I0 to I31: registers (integers and pointers) (SISD mode)
YR0 to YR31, YFR0 to YFR31 (SIMD mode)
XYR, YXR and R registers (SIMD mode)
And also the MIMD modes.
And the double registers and the quad registers ...
    #define return_pt_J8 J8 // J8 is a VOLATILE, NON-PRESERVED register

Parameter passing:
SPACES for the first four parameters ARE ALWAYS present on the stack (as with 68K).
But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time), as with MIPS and Blackfin.
The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) as the first step when assembly code functions call assembly code functions.
J4, J5, J6 and J7 are volatile, non-preserved registers.

Can we pass back the start of the final array?
We are still passing tests by accident, and the return value needs to be conditional.

What we need to know, based on experiences from other processors:
Can we return from an assembly language routine without crashing the processor?
Return a parameter from an assembly language routine. (Is it the same for ints and floats?)
Pass parameters into assembly language. (Is it the same for ints and floats?)
Do IF THEN ELSE statements.
Read and write values to memory.
Read and write values in a loop.
Do some mathematics on the values fetched from memory.
All this stuff is demonstrated by coding HalfWaveRectifyASM( ).

Why is ELSE a keyword?
A FOUR PART ELSE INSTRUCTION IS LEGAL:
    IF JLT; ELSE, J1 = J2 + J3;    // Conditional execution -- if true
            ELSE, XR1 = XR2 + XR3; // Conditional -- if true
            YFR1 = YFR2 + YFR3;;   // Unconditional -- always
    IF JLT; DO, J1 = J2 + J3;      // Conditional execution -- if true
            DO, XR1 = XR2 + XR3;   // Conditional -- if true
            YFR1 = YFR2 + YFR3;;   // Unconditional -- always
Having this sort of format means that the instruction pipeline is not disrupted when we do IF statements.
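The DO form above can be modelled in C++ to show why no jump is needed. This is only a sketch: the Regs structure and the function name ConditionalLine are hypothetical, and it assumes that each DO instruction is simply skipped when the condition is false while the last instruction always executes.

```cpp
// Hypothetical register set, just large enough for the slide's example.
struct Regs {
    int J1, J2, J3;          // J-IALU registers
    int XR1, XR2, XR3;       // X compute block integer registers
    float YFR1, YFR2, YFR3;  // Y compute block float registers
};

// Model of:  IF JLT; DO, J1 = J2 + J3; DO, XR1 = XR2 + XR3; YFR1 = YFR2 + YFR3;;
// 'jlt' stands for the state of the JLT condition when the line is issued.
void ConditionalLine(Regs &r, bool jlt) {
    if (jlt) r.J1 = r.J2 + r.J3;     // DO: executes only if the condition is true
    if (jlt) r.XR1 = r.XR2 + r.XR3;  // DO: executes only if the condition is true
    r.YFR1 = r.YFR2 + r.YFR3;        // no DO: always executes
}
```

Every instruction in the line is fetched either way; the condition only decides whether a result is written, so there is no branch to disturb the flow.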

Label name is not the problem.
NOTE: This is “C-like” syntax, but it is not “C”.
A statement must end in ;; not ;
ONE semicolon = end of instruction.
TWO semicolons = end of parallel instruction line.

Add dual semicolons everywhere. Worry about “multiple issues” later.
This dual semicolon is so important that you MUST code review for it all the time, or else you waste so much time in the Lab.
Key in exams / quizzes.
At last, an error I know how to fix.

Well, I thought I understood it !!!
Speed issue: JUMP instructions can’t be too close together when stored in memory.
Not normally a problem when the “if” code is larger.

Add a single instruction line of 4 NOPs (TEMPORARY):
    nop; nop; nop; nop;;
Fix the last error as part of Assignment 1.
Fix the remaining error in handling the IF THEN ELSE as part of Assignment 1.
Worry about code efficiency later (refactor) when all the code is working.

What we need to know, based on experiences from other processors:
Can we return from an assembly language routine without crashing the processor?
Return a parameter from an assembly language routine. (Is it the same for ints and floats?)
Pass parameters into assembly language. (Is it the same for ints and floats?)
Do IF THEN ELSE statements.
Read and write values to memory.
Read and write values in a loop.
Do some mathematics on the values fetched from memory.
All this stuff is demonstrated by coding HalfWaveRectifyASM( ).

Target: changing this C++ code into assembly (to get “more” speed).
The code we generated yesterday was similar to parts of this, but not equivalent.
Re-factor the code to make the assembly code and C++ functionality equivalent.

The code was not exactly what we designed (C++ equivalent).
NEXT STEP: re-factor, and retest after the re-factoring.

Refactored C++ code.
I THINK I UNDERSTAND ENOUGH TO CHANGE THE FORMAT OF THE IF-THEN-ELSE TO OPTIMIZE THIS PARTICULAR BIT OF CODE.
USE: IF TRUE, EXECUTE THIS STATEMENT (SINGLE LINE).
Avoiding JUMPS in the main flow of the code will speed the flow of the code.
Almost right: SYNTAX ERROR. Look in the manual to find the correct syntax:
    IF NJLE; DO, J8 = 0

No syntax errors (no CODE ERRORS), but the code does not work (CODE DEFECTS).
We don’t have enough code to pass all the tests, but we are failing tests we did not expect to fail.

Run “forensic tests” to find out where the DEFECT is being introduced.
Identify the mistake by removing “code sections”.
Without the IF.

Add another line to the code. Can now spot the error.
The new format of the IF-THEN-ELSE is doing exactly the opposite of what we want: IF NOT TRUE, return NULL (0).
Need JLE, not NJLE.
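The intended behaviour can be written down as a C++ reference to test the assembly against. This is a sketch under assumptions: the deck's own C++ target code is not reproduced in this transcript, so the function name HalfWaveRectifyRef, its parameter names, and the choice of N <= 0 as the condition that returns NULL are all hypothetical.

```cpp
// Hypothetical C++ reference for the rectify routine (not the deck's own code).
// Assumption: the routine returns NULL when N <= 0, otherwise the start of
// the output array.
int *HalfWaveRectifyRef(const int initial_array[], int final_array[], int N) {
    // Conditional return value: NULL when the "less than or equal" test is TRUE.
    // Using the negated condition here would return NULL for every valid N.
    if (N <= 0) {
        return nullptr;
    }
    for (int i = 0; i < N; i++) {
        // Half-wave rectify: negative values become 0, others are copied.
        final_array[i] = (initial_array[i] < 0) ? 0 : initial_array[i];
    }
    return final_array;  // pass back the start of the final array
}
```

Tests that call the routine with both a valid N and with N equal to 0 are what expose an inverted condition.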

Assignment 1: code the following as a software loop, following the MIPS / Blackfin approach. DONE DURING TUTORIAL.
    int CalculateSum(void) {
        int sum = 0;
        for (int count = 0; count < 6; count++) {
            sum = sum + count;
        }
        return sum;
    }

Reminder: a software for-loop becomes a “while loop” with an initial test.
    int CalculateSum(void) {
        int sum = 0;
        int count = 0;
        while (count < 6) {
            sum = sum + count;
            count++;
        }
        return sum;
    }
Do a line by line translation into assembly code.
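One way to prepare for the line by line translation is to first rewrite the while-loop in C++ using only tests and gotos, so that each statement corresponds to roughly one assembly instruction. The sketch below is illustrative only: the function name CalculateSumGotoStyle and the register-style variable names sum_J8 and count_J0 are assumptions, not the deck's code.

```cpp
// "Assembly-shaped" C++ version of CalculateSum(): one simple operation per line.
int CalculateSumGotoStyle(void) {
    int sum_J8 = 0;    // sum = 0
    int count_J0 = 0;  // count = 0 (must be initialised before the loop test)

LOOP_TEST:
    // Loop control first: leave the loop when NOT (count < 6).
    if (!(count_J0 < 6)) goto LOOP_END;
    sum_J8 = sum_J8 + count_J0;  // sum = sum + count
    count_J0 = count_J0 + 1;     // count++
    goto LOOP_TEST;              // jump back to the initial test

LOOP_END:
    return sum_J8;               // 0 + 1 + 2 + 3 + 4 + 5
}
```

The exit test is written as the negation of "less than" rather than as "greater than or equal", since only the negated form has to be available as a condition.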

USE SOFTWARE LOOP HERE.
Do loop control first.
Have some jumps too close together.
NOTE: JGE is ILLEGAL, USE NJLT.
Customize?
    #define JGE NJLT

Run the tests with 4-nop padding to check that we get out of the loop as expected.
Adding 4 nops: lose 1 cycle, gain an hour by not trying to solve the problem now.
If we need the 1 cycle, refactor the code later.

Accessing memory: basic mode.
Special register J31 acts as zero when used in additions.
Pt_J5 is a pointer register into an array.
Value_J1 is being used as a data register.
J registers are like MIPS registers (used as both pointer and data), NOT like 68K or Blackfin registers, which can be used as either data or address registers but not both.
NOTE: Later we will find that using TigerSHARC J registers for data operations is a BAD idea.
1. Value_J1 = [Pt_J5];; reads the value from the memory location pointed to by J5. Compare to Blackfin: Value_R0 = [Pt_P0];;
2. Value_J1 = [Pt_J5 + J31];; reads the value from the memory location pointed to by J5. I read somewhere that this CAN be faster than just Value_J1 = [Pt_J5];; NEED TO CONFIRM.

Accessing memory, step 2: basic mode.
Pt_J5 is a pointer register into an array.
Offset_J4 is used as an offset.
Value_J1 is being used as a data register to receive the memory value (load / store architecture).
1. Read_J1 = [Pt_J5 + Offset_J4];; reads the value from the memory location pointed to by (J5 + J4). PRE-MODIFY: the address used is J5 + J4, with no change in J5.
2. Read_J1 = [Pt_J5 += Offset_J4];; reads the value from the memory location pointed to by J5, and then performs the add operation on the J5 register (which then points to the NEXT location). POST-MODIFY: the address used is J5, then J5 = J5 + J4 is performed.
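The difference between the two addressing modes can be mimicked with C++ pointers. This is a minimal sketch; the function names ReadPreModify and ReadPostModify are hypothetical, and the offset is counted in array elements rather than in whatever units the hardware uses.

```cpp
#include <cstddef>

// PRE-MODIFY, like Read_J1 = [Pt_J5 + Offset_J4];;
// The address used is pt + offset; the pointer itself is not changed.
int ReadPreModify(const int *pt, std::ptrdiff_t offset) {
    return *(pt + offset);
}

// POST-MODIFY, like Read_J1 = [Pt_J5 += Offset_J4];;
// The address used is pt; afterwards the pointer becomes pt + offset.
int ReadPostModify(const int *&pt, std::ptrdiff_t offset) {
    int value = *pt;  // read from the current location first
    pt += offset;     // then advance the pointer to the NEXT location
    return value;
}
```

Post-modify with an offset of 1 is the natural fit for walking through an array inside a loop, because the read and the pointer update happen together.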

Add in the memory accesses.
Easily forgotten: TigerSHARC is a RISC PROCESSOR, LOAD/STORE ONLY, like MIPS and Blackfin.
Must place the value into a register, and then copy the register to memory.
NO: [J5 + J0] = 0;
NO: J3 = 0; [J5 + J0] = J3;; This uses the wrong J3. Remember that TigerSHARC can handle parallel instructions.
YES: J3 = 0;; [J5 + J0] = J3;;
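Why the single-semicolon version "uses the wrong J3" can be illustrated with a small C++ model. This is a sketch under an assumption: that all instructions on one instruction line read the register values that existed before the line started. The Machine structure and both function names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical toy machine: 32 J registers and a tiny data memory.
struct Machine {
    std::array<int, 32> J{};
    std::array<int, 16> memory{};
};

// Model of ONE instruction line:  J3 = 0; [J5 + J0] = J3;;
// Both instructions see the registers as they were before the line.
void StoreZeroSameLine(Machine &m) {
    const Machine before = m;  // snapshot of the state before the line
    m.J[3] = 0;                // J3 = 0;
    std::size_t address = static_cast<std::size_t>(before.J[5] + before.J[0]);
    m.memory[address] = before.J[3];  // [J5 + J0] = J3;; stores the OLD J3
}

// Model of TWO instruction lines:  J3 = 0;;  [J5 + J0] = J3;;
// The store is issued after J3 has been updated.
void StoreZeroTwoLines(Machine &m) {
    m.J[3] = 0;                // first line
    std::size_t address = static_cast<std::size_t>(m.J[5] + m.J[0]);
    m.memory[address] = m.J[3];  // second line stores the NEW J3
}
```

With garbage left in J3 from earlier code, the one-line version writes that garbage to memory, which is exactly the kind of DEFECT that produces no assembler error.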

Understand the error message.
“Too many J resource usage” = missing ;;
We are unintentionally issuing a parallel instruction line:
    [J5 + J0] = J2; J0 = J0 + 1;;

Note: A missing label is not an assembler error, it’s a linker error.
Fix warnings: it may be days before you try to link, and by then the DEFECT is hard to find.

NOW that the assembler knows where “CONTINUE” is, it can tell you that you have two JUMP instructions too close together.
Fix with the magic 4 nops, and lose one cycle per loop.

Not getting the expected test results.
Something is logically wrong (DEFECT).

Obvious question: are we even getting into the loop?
Add a BREAKPOINT to TEST the code flow. (We don’t add BREAKPOINTS to follow the code in detail.)
CODE NEVER GOT TO THE BREAKPOINT, which means the code never entered the loop.
Forgot to do count = 0, so we are not even getting into the loop, as there is a garbage value already in Count_J0 from code we executed earlier: DEFECT.

Not bad for a first effort.
Faster than the compiler in debug mode.

Where did the float ASM code suddenly appear from?
Integer 0 has bit pattern 0x0000 0000
Float 0.0 has bit pattern 0x0000 0000
Integer +6 has format b 0??? ???? ???? ???? ???? ???? ???? ????
Float +6.0 has format b 0??? ???? ???? ???? ???? ???? ???? ????
Integer -6 has format b 1??? ???? ???? ???? ???? ???? ???? ????
Float -6.0 has format b 1??? ???? ???? ???? ???? ???? ???? ????
The formats are very different, but the sign bit is in the same place.
Float algorithm: if S == 1 (negative), set to zero; otherwise leave unchanged. This is the same as the integer algorithm.
Just re-use the integer algorithm with a change of name.
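The sign-bit argument can be checked on a desktop compiler. The following is a sketch, assuming 32-bit IEEE 754 floats; the function names RectifyBits and RectifyFloatViaInt are hypothetical and the code is not the deck's assembly.

```cpp
#include <cstdint>
#include <cstring>

// Integer rectify on one 32-bit pattern: sign bit set (negative) gives 0,
// anything else is left unchanged.
std::int32_t RectifyBits(std::int32_t bits) {
    return (bits < 0) ? 0 : bits;
}

// Float rectify that re-uses the integer algorithm on the float's bit pattern.
float RectifyFloatViaInt(float value) {
    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "sketch assumes a 32-bit float");
    std::int32_t bits;
    std::memcpy(&bits, &value, sizeof bits);     // view the float as 32 raw bits
    bits = RectifyBits(bits);                    // same test as for integers
    float result;
    std::memcpy(&result, &bits, sizeof result);  // all-zero bits read back as 0.0
    return result;
}
```

This works because a negative float and a negative integer both have the top bit set, and the all-zero bit pattern means zero in both formats.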

Final code: the float rectify code just has a different name.

What we NOW KNOW:
Can we return from an assembly language routine without crashing the processor?
Return a parameter from an assembly language routine. (Is it the same for ints and floats?)
Pass parameters into assembly language. (Is it the same for ints and floats?)
Do IF THEN ELSE statements.
Read and write values to memory.
Read and write values in a loop.
Do some mathematics on the values fetched from memory.
All this stuff is demonstrated by coding HalfWaveRectifyASM( ).