180909 ajay patil: TMS320C6000 Assembly Language and its Rules

Assignment

One of the simplest operations in C is to assign a constant to a variable:

    int x;
    x = 10;

The variable x will contain the value 10 decimal.

Load a Constant (1 of 4)

We will use register A1 to hold the variable x. In assembly language we write:

    MVK 10, A1;

The instruction MVK moves (copies) the constant 10 into register A1. Register A1 now contains 0000000Ah.

Load a Constant (2 of 4)

The correct syntax is the number first, then the register. The number can also be written in hexadecimal. These three forms are all valid:

    MVK 10, A1;
    MVK 0xA, A1;
    MVK 0Ah, A1;

Do not use #:

    MVK #10, A1     ; X invalid

Load a Constant (3 of 4)

Loading all 32 bits of a register needs 2 instructions:

    MVK   0x5678, B2    ; B2 = 00005678h
    MVKLH 0x1234, B2    ; upper half of B2 = 1234h, lower half unchanged

Register B2 now contains 12345678h. MVK writes its 16-bit constant, sign-extended, to the whole register. MVKLH then writes its 16-bit constant into the upper half only.
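The effect of the two instructions on B2 can be modelled in C. This is a minimal sketch. It assumes that MVK sign-extends its 16-bit constant over the whole register and that MVKLH replaces only the upper 16 bits. The names model_mvk and model_mvklh are hypothetical helpers, not TI functions.

```c
#include <stdint.h>

/* Hypothetical model of MVK: the 16-bit constant is sign-extended
   to 32 bits and written to the whole register. */
uint32_t model_mvk(int16_t cst)
{
    return (uint32_t)(int32_t)cst;
}

/* Hypothetical model of MVKLH: the 16-bit constant becomes the upper
   half of the register; the lower half keeps its previous contents. */
uint32_t model_mvklh(uint16_t cst, uint32_t reg)
{
    return ((uint32_t)cst << 16) | (reg & 0x0000FFFFu);
}
```

model_mvklh(0x1234, model_mvk(0x5678)) gives 0x12345678, as on the slide. MVKLH overwrites the upper half, so the sign extension done by MVK does not matter, even when the low half is 8000h or above.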

Load a Constant (4 of 4)

Loading all 32 bits of a register with 0 (zero) requires only a single instruction:

    ZERO B2;

Register B2 now contains 00000000h.

Incrementing a Register (1 of 2)

To increment a variable in C we can write:

    int x;
    x++;

This adds 1 to the value of the variable x.

Incrementing a Register (2 of 2)

In assembly language we use the instruction ADDK (add constant):

    ADDK 1, A2;

This adds the constant 1 to the contents of register A2.

Decrementing a Register (1 of 2)

To decrement a variable in C we can write:

    int x;
    x--;

This subtracts 1 from the variable x.

Decrementing a Register (2 of 2)

In assembly language we again use the instruction ADDK (add constant):

    ADDK -1, A2;

This adds the constant -1 to the contents of register A2. There is no such instruction as SUBK.

No Operation

The TMS320C6000 provides an instruction that does nothing except take time. It is called NOP (no operation):

    NOP;

To execute 4 NOP instructions one after another we can write:

    NOP 4;

This instruction can be used to generate time delays.

Topic Two: Controlling Program Flow

Testing Conditions (1 of 5)

The if-else construct is widely used in C. Consider the following simple piece of code:

    int x, y;
    if (x != 0)
    {
        y++;
    }

This means: if x is not equal to zero, increment the variable y.

Testing Conditions (2 of 5)

The assembler provides a neat way to do this. Assuming x is stored in register A1 and y is stored in register A2:

    [A1] ADDK 1, A2;

The term in [ ] is the condition to be tested. If A1 is not equal to zero, 1 is added to the value in A2. Otherwise nothing happens.

Testing Conditions (3 of 5)

Consider another piece of C code:

    int x, y;
    if (x == 0)
    {
        y--;
    }

This means: if x is equal to zero, decrement y.

Testing Conditions (4 of 5)

Again the assembler provides a neat way to do this. Assuming x is stored in register A1 and y is stored in register A2:

    [!A1] ADDK -1, A2;

The term in [ ] is the condition to be tested. If A1 is equal to zero, -1 is added to the value in A2. Otherwise nothing happens.

Testing Conditions (5 of 5)

The test can use register A1, A2, B0, B1 or B2:

    [A1]  MVK 10, A2    ; valid
    [A0]  MVK 10, A2    ; X invalid
    [A3]  MVK 10, A2    ; X invalid
    [!B0] MVK 10, A2    ; valid
    [B1]  MVK 10, A2    ; valid
    [B3]  MVK 10, A2    ; X invalid

These five are the condition registers of the C62x and C67x. The C64x family also allows A0.

Branch Instructions (1 of 3)

Program execution can be forced to a different place using the B (branch) instruction:

    label:  B label;

When the B (branch) is reached, the next instruction to be executed will be at the address label. It is similar to the goto statement in C.

Branch Instructions (2 of 3)

Rather than using a label with the instruction B, a register can be used:

    MVKL label, B3    ; low half of the address
    MVKH label, B3    ; high half of the address
    B    B3

MVKL must come before MVKH. MVKL sign-extends into the upper half of the register, so running it second would overwrite what MVKH wrote there. This method is used by the C compiler, usually with B3, to return from a function.

Branch Instructions (3 of 3)

The instruction B can also be combined with a test for a condition:

    label:  ADDK 1, A3;
      [A1]  B    label;

When the B (branch) is reached, the next instruction to be executed will be at the address label, but only if A1 is non-zero.

Implementing a Delay Loop (1 of 2)

In C, a delay loop can be implemented using the do-while construct:

    int i = 10;
    do {
        i--;
    } while (i != 0);

We start with i = 10. Every time through the loop, i is decremented. When i == 0, the loop terminates.

Implementing a Delay Loop (2 of 2)

In assembly language, we can use A1 to hold i. This can be decremented and tested:

            MVK  10, A1     ; A1 = 10
    loop:   ADDK -1, A1     ; Decrement A1
      [A1]  B    loop       ; Branch to loop

We start with A1 = 10. Every time through the loop, A1 is decremented. When A1 == 0, the loop terminates.

Topic Three: Allocating Storage for Variables and Constants

To Declare a Variable (1 of 2)

In C code, a 32-bit variable can be declared as follows:

    int x;

In assembly language use:

    x: .usect ".far", 4, 4

To Declare a Variable (2 of 2)

This means:

    x:        A label: where to find the variable
    .usect    In uninitialised data
    ".far",   The far data section, reached by full 32-bit address (large memory model)
    4,        How many bytes
    4         Align on a 4-byte boundary

To Declare a Buffer (1 of 2)

In C code, a 32-element buffer can be declared as an array:

    int buffer[32];

In assembly language use:

    buffer: .usect ".far", 128, 4

To Declare a Buffer (2 of 2)

This means:

    buffer:   A label: where to find the data
    .usect    In uninitialised data
    ".far",   Large memory model
    128,      How many bytes (32 elements x 4 bytes)
    4         Align on a 4-byte boundary

To Declare Constants (1 of 3)

In C code, an array of constants can be declared as:

    const int coefficients[5] = {1, 2, 3, 4, 5};

This is an array of 5 read-only constants with the values 1, 2, 3, 4 and 5.

To Declare Constants (2 of 3)

In assembly language use:

                    .sect  ".const"
                    .align 4
    coefficients:   .field 1, 32
                    .field 2, 32
                    .field 3, 32
                    .field 4, 32
                    .field 5, 32

To Declare Constants (3 of 3)

Here .sect ".const" places the values in the section named .const. The linker then decides where in memory that section is stored. .align 4 means align on a 4-byte boundary. The constants are found at the address coefficients. Each constant is declared as a field with a given value and a size of 32 bits, for example:

    .field 3, 32

Topic Four: Using Pointers

Pointing to a Buffer (1 of 3)

To set up a pointer to a buffer in C we write:

    int buffer[32];
    int *ptr;
    ptr = &buffer[0];

The pointer ptr is given the address of the start of the buffer.

Pointing to a Buffer (2 of 3)

When using TMS320C6000 assembly language, it is usual practice to use the following registers as pointers:

    A4, A5, A6, A7
    B4, B5, B6, B7

These registers also support circular addressing.

Pointing to a Buffer (3 of 3)

To use register A4 as the pointer to the buffer:

    buffer: .usect ".far", 128, 4

            MVKL buffer, A4
            MVKH buffer, A4

The first instruction, MVKL, writes the low 16 bits of the address into A4 and sign-extends them into the upper half. The second instruction, MVKH, then writes the high 16 bits of the address into the upper half of A4 and leaves the lower half unchanged. This is why MVKL must come first.
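The ordering rule can be checked with a small C model. This sketch assumes that MVKL behaves like MVK, sign-extending the low 16 bits of its constant over the whole register, and that MVKH replaces only the upper 16 bits. model_mvkl and model_mvkh are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of MVKL: the low 16 bits of the constant are
   sign-extended and written to the whole register. */
uint32_t model_mvkl(uint32_t cst)
{
    int16_t low = (int16_t)(cst & 0xFFFFu);   /* assumes two's complement */
    return (uint32_t)(int32_t)low;
}

/* Hypothetical model of MVKH: the high 16 bits of the constant replace
   the upper half of the register; the lower half is left unchanged. */
uint32_t model_mvkh(uint32_t cst, uint32_t reg)
{
    return (cst & 0xFFFF0000u) | (reg & 0x0000FFFFu);
}
```

Take an address such as 80008000h. MVKL followed by MVKH rebuilds it correctly. In the reverse order the register ends up holding FFFF8000h, because the sign extension done by MVKL wipes out the upper half written by MVKH.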

Moving Data to a Register (1 of 2)

We can load a register with the 32-bit contents of a data memory address. Assume that register A4 points to buffer[0]:

    LDW *A4, A5;

The instruction LDW (load word) copies a word of data from buffer[0] to register A5. Here W = word = 32 bits.

Moving Data to a Register (2 of 2)

The instruction LDW has 4 delay slots. The loaded data do not reach the destination register until 4 cycles after the LDW, which makes it slow. Care is needed to wait the required time, for example by using 4 NOPs:

    LDW  *A4, A5
    NOP  4          ; A5 not ready
    ADDK 2, A5      ; A5 now ready

Moving Data from a Register

We can store the 32-bit contents of a register at an address in data memory. Assume that register A5 points to buffer[0]:

    STW A4, *A5;

The instruction STW (store word) copies a word of data from register A4 to buffer[0]. The data are available immediately. Here W = word = 32 bits.

Operations on Pointers (1 of 4)

Several pointer operations are possible in C:

    *ptr++;    Post-increment
    *ptr--;    Post-decrement
    *++ptr;    Pre-increment
    *--ptr;    Pre-decrement

Note that ++*ptr and --*ptr are different. They change the value pointed to, not the pointer.

Operations on Pointers (2 of 4)

The same pointer operations are available in assembly language:

    *A4++     Post-increment
    *A5--     Post-decrement
    *++B6     Pre-increment
    *--B7     Pre-decrement

These forms are used as the address operand of load and store instructions. As with C pointer arithmetic, the default step is one element, so with LDW or STW the register moves by 4 bytes.
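*++ptr and ++*ptr are easy to confuse. The C sketch below shows the difference: *++p moves the pointer and then reads, which matches an operand such as *++B6. ++*p changes the data and leaves the address alone. The helper names are hypothetical.

```c
/* *++p: advance the pointer by one element, then read the new element. */
int read_pre_increment(int **pp)
{
    return *++*pp;
}

/* ++*p: leave the pointer where it is and add 1 to the value it points to. */
int increment_pointee(int *p)
{
    return ++*p;
}
```

The assembly pointer forms have no ++*p counterpart. Incrementing data in memory needs a load, an add and a store.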

Operations on Pointers (3 of 4)

The pointer increment and decrement operators can be used with load and store instructions. Suppose we want to copy data from one place to another. In C we might write:

    for (i = 0; i < 10; i++)
    {
        *ptr2++ = *ptr1++;
    }

Operations on Pointers (4 of 4)

In assembly language, assuming ptr1 is held in A4 and ptr2 in A5, the part

    *ptr2++ = *ptr1++;

could be written as:

    LDW *A4++, A0
    NOP 4             ; Wait for A0
    STW A0, *A5++

Topic Five: Multiplication and Division

Multiplications (1 of 5)

Multiplication is widely used in DSP, for example in Finite Impulse Response (FIR) filters and correlation.

Multiplications (2 of 5)

Multiply instructions use registers:

    MPY A1, A2, A3;

This multiplies the signed 16-bit value in the lower half of register A1 by the signed 16-bit value in the lower half of register A2 and puts the 32-bit product in register A3. In other words, for values that fit in 16 bits, A3 = A1 x A2.
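A C model makes the 16-bit restriction concrete. This is a sketch that assumes the signed 16 x 16 to 32-bit behaviour described above. model_mpy is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model of MPY src1, src2, dst: only the lower 16 bits of
   each source register are used, read as signed values, and the full
   32-bit product is returned. */
int32_t model_mpy(uint32_t src1, uint32_t src2)
{
    int16_t a = (int16_t)(src1 & 0xFFFFu);   /* assumes two's complement */
    int16_t b = (int16_t)(src2 & 0xFFFFu);
    return (int32_t)a * (int32_t)b;          /* magnitude at most 2^30 */
}
```

Bits above bit 15 of a source register are ignored. For example, model_mpy(0x00010003, 4) returns 12, not the product of the full 32-bit values.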

Multiplications (3 of 5)

Multiplication instructions can only use registers. They cannot use pointer operations:

    MPY A3, A4, A5        ; valid
    MPY *A3, A4, A5       ; X invalid
    MPY A3, *A4++, A5     ; X invalid

Multiplications (4 of 5)

The MPY instruction has one delay slot. The product is not available to the instruction immediately after the MPY, only to the one after that:

    MPY A3, A4, A5
    NOP                ; Wait 1 cycle (delay slot)
    STW A5, *A4        ; Store product

Unless a useful instruction can be placed in the delay slot, follow the MPY instruction with a NOP.

Multiplications (5 of 5)

For multiplications by powers of 2 (2, 4, 8, 16, 32 etc.), use the instruction SHL (shift left):

    SHL A3, 1, A3    ; Multiply by 2
    SHL A4, 2, A4    ; Multiply by 4
    SHL B5, 3, B5    ; Multiply by 8
    SHL B7, 8, B7    ; Multiply by 256

This is a single-cycle instruction.

Division

To divide by powers of 2 (2, 4, 8, 16, 32 etc.), use the instruction SHR (shift right):

    SHR B3, 1, B3    ; Divide by 2
    SHR A4, 2, A5    ; Divide by 4
    SHR B5, 3, A3    ; Divide by 8
    SHR B7, 8, B7    ; Divide by 256

This is a single-cycle instruction. SHR is an arithmetic shift, so for negative numbers it rounds towards minus infinity rather than towards zero as C division does. For unsigned values use SHRU.
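The rounding difference for negative numbers can be shown in C. This sketch models SHR on a signed 32-bit value as floor division by 2^n. It avoids >> on negative numbers because the meaning of that in C is implementation-defined. model_shr is a hypothetical name, and the sketch assumes n < 31.

```c
#include <stdint.h>

/* Hypothetical model of SHR (arithmetic shift right): x / 2^n rounded
   towards minus infinity. Assumes n < 31. */
int32_t model_shr(int32_t x, unsigned n)
{
    if (x >= 0)
        return x >> n;
    /* For negative x, floor(x / 2^n) = -ceil(-x / 2^n). */
    int64_t m = -(int64_t)x;
    return (int32_t)-((m + ((int64_t)1 << n) - 1) >> n);
}
```

model_shr(-7, 1) gives -4, whereas -7 / 2 in C gives -3. The two results agree for non-negative values and for exact multiples of 2^n.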

Topic Six: Introducing Delay Slots

Delay Slots (1 of 3)

So far we have ignored the time it takes the processor to execute an instruction. In fact, the instruction B takes 6 cycles before the branch actually occurs. Rather than just waiting, the TMS320C6000 executes the 5 instructions that follow the B before the branch takes effect. These are called delay slots.
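Delay slots can be reasoned about by counting cycles. Suppose an instruction is issued in cycle c and has d delay slots. Its result (or, for B, its branch target) is then first available in cycle c + d + 1. The sketch below encodes the delay-slot counts given in these notes: B 5, LDW 4, MPY 1, ADDK 0. It is an idealised model that ignores memory stalls and interrupts, and first_usable_cycle is a hypothetical name.

```c
/* Delay-slot counts quoted in these notes. */
enum {
    DS_ADDK = 0,   /* single-cycle: result ready for the next instruction */
    DS_MPY  = 1,
    DS_LDW  = 4,
    DS_B    = 5    /* for B, the branch target runs after these slots */
};

/* First cycle in which the result (or branch target) is available for an
   instruction issued in cycle `issue` with `delay_slots` delay slots. */
unsigned first_usable_cycle(unsigned issue, unsigned delay_slots)
{
    return issue + delay_slots + 1u;
}
```

For example, a B issued in cycle 1 occupies cycles 2 to 6 with its delay slots, and the branch target executes in cycle 7. Likewise, LDW in cycle 1 followed by NOP 4 lets the next instruction use the loaded value in cycle 6.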

Delay Slots (2 of 3)

For correct operation of the processor we need to put 5 NOPs (or other useful instructions) after the B instruction:

    loop:   B    loop    ; 1 cycle
            NOP          ; 1st delay slot
            NOP          ; 2nd delay slot
            NOP          ; 3rd delay slot
            NOP          ; 4th delay slot
            NOP          ; 5th delay slot
            NOP          ; Not reached: the branch is taken here

Delay Slots (3 of 3)

For correct operation, the delay loop we saw earlier should be written as:

            MVK  10, A1     ; A1 = 10
    loop:   ADDK -1, A1     ; Decrement A1
      [A1]  B    loop       ; Branch to loop
            NOP  5          ; 5 delay slots
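The delay this loop produces can be estimated by counting cycles. The sketch below assumes each pass costs ADDK (1 cycle) + B (1 cycle) + NOP 5 (5 cycles), with MVK taking 1 cycle, and ignores stalls and interrupts. delay_loop_cycles is a hypothetical helper for illustration.

```c
/* Idealised cycle count for the delay loop:
 *
 *            MVK  n, A1      ; 1 cycle
 *    loop:   ADDK -1, A1     ; 1 cycle
 *      [A1]  B    loop       ; 1 cycle
 *            NOP  5          ; 5 cycles (branch delay slots)
 *
 * Assumes n >= 1. The last B is not taken, but NOP 5 still runs,
 * so every pass costs the same. */
unsigned delay_loop_cycles(unsigned n)
{
    const unsigned per_pass = 1u + 1u + 5u;   /* ADDK + B + NOP 5 */
    return 1u + n * per_pass;                 /* MVK + n passes */
}
```

With A1 = 10 this gives 71 cycles. Each extra pass adds 7 cycles, so the length of the delay is set by the initial value loaded with MVK.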

Topic Seven: Writing an Assembly Language Function Callable from C Code

C Callable Function (1 of 6)

Suppose we want to write, in assembly language, the following function that adds together two numbers:

    int sum(int x, int y) { return (x + y); }

To use the function in C we might write:

    int result;
    result = sum(100, 200);

C Callable Function (2 of 6)

The C compiler implements the function as follows: parameter x is passed in A4, parameter y is passed in B4, and the return value is in A4. The C function can be thought of as:

    A4 = sum(A4, B4);

C Callable Function (3 of 6)

In assembly language we write:

            .global _sum
            .sect   ".text"
            .align  4
    _sum:   ADD     A4, B4, A4
            B       B3
            NOP     5

C Callable Function (4 of 6)

The .global assembler directive makes the label _sum available outside this module:

    .global _sum

Notice the underscore at the beginning of _sum. This is a C compiler convention: under the COFF ABI used by older TI compilers, C names get a leading underscore. The newer ELF-based EABI does not add it.

C Callable Function (5 of 6)

The line .sect ".text" puts the code in the .text (code) section. The assembler directive .align 4 aligns the code on a 32-bit (4-byte) boundary. The instruction ADD A4, B4, A4 adds the value in A4 to the value in B4 and puts the result in A4.

C Callable Function (6 of 6)

Finally, return from the function:

    B   B3
    NOP 5

Just before the function sum() is called, the compiler puts the return address in register B3.

Important: do not change register B3 inside the function. If you do, the return address will be lost and the program may well crash. If the function must use B3, for example because it calls another function, save B3 first and restore it before returning.

Allocating Local Variables (1 of 5)

In C, variables are sometimes used only in a particular function. These are local variables:

    int my_function(void)
    {
        int x;    // A local variable
        /* ... */
    }

The variable x is only available within my_function().

Allocating Local Variables (2 of 5)

To allocate temporary storage for a 32-bit variable (4 bytes), subtract 4 + 4 = 8 bytes from the stack pointer (SP) at the beginning of the function. The stack pointer must stay aligned on an 8-byte boundary, which is why 8 bytes are reserved rather than 4:

            .asg  B15, SP     ; Make SP another name for B15
    function:
            ADDK  -8, SP      ; Reserve 8 bytes on the stack
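The amount to subtract from SP is the space needed for local variables, rounded up to the next multiple of 8. This C sketch assumes the 8-byte stack alignment described above. frame_size is a hypothetical helper.

```c
/* Round the bytes needed for local variables up to a multiple of 8,
   keeping the stack pointer 8-byte aligned. The result is the amount
   to subtract from SP on entry and add back before returning. */
unsigned frame_size(unsigned local_bytes)
{
    return (local_bytes + 7u) & ~7u;
}
```

One 4-byte variable gives 8, as on the slide. Three 4-byte variables (12 bytes) would need 16.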

Allocating Local Variables (3 of 5)

To write to a local variable use:

    STW A4, *+SP(4)

This stores the contents of register A4 at the data memory location with a positive offset of 4 bytes from SP. Together with the SP adjustment, this can be used like a "push" instruction.

Allocating Local Variables (4 of 5)

To read a local variable use:

    LDW *+SP(4), A2

Here *+SP(4) means the contents of the memory location at a positive offset of 4 bytes from SP. This can be used like a "pop" instruction. As with any LDW, A2 is not ready until after 4 delay slots.

Allocating Local Variables (5 of 5)

At the end of the function, add the same number of bytes back to SP to restore it to its original value, then return:

    ADDK 8, SP
    B    B3      ; Return
    NOP  5       ; Branch delay slots

Topic Eight: Parallel Operations

Parallel Operations (1 of 4)

We have already seen how to write constants to registers:

    MVK 1234h, A4    ; 1 cycle
    MVK 5678h, B4    ; 1 cycle

In this case we write the value 1234h to register A4, then write 5678h to register B4. This takes 2 cycles.

Parallel Operations (2 of 4)

We can write the same instructions using the || operator:

    MVK 1234h, A4
 || MVK 5678h, B4

In this case we write the value 1234h to register A4 at the same time as we write 5678h to register B4. This takes 1 cycle.

Parallel Operations (3 of 4)

Here we need one destination register from A0 to A15 and the other from B0 to B15:

    MVK 1234h, A4
 || MVK 5678h, A5    ; X invalid

Instructions executed in parallel must each use a different functional unit. MVK with a 16-bit constant can only run on an .S unit, and there is one per side: .S1 writes the A registers and .S2 writes the B registers. Two such MVKs therefore cannot both write registers in the same bank.

Parallel Operations (4 of 4)

Parallel operations are very useful for stereo audio processing. We can process the left channel and the right channel at exactly the same time, so it can be as fast to process two channels as it is to process one. This cannot be specified directly in C, where the compiler decides which instructions execute in parallel.

Topic Nine: Using Circular Addressing for Circular Buffers

Circular Buffers (1 of 7)

We can implement a circular buffer in C code as follows:

    int buffer[16];
    static int *ptr;              // Pointer
    ptr = &buffer[0];             // Initialise

    if (ptr < &buffer[15])
        ptr++;                    // Increment
    else
        ptr = &buffer[0];         // Back to start

Circular Buffers (2 of 7)

The TMS320C6000 can support circular buffers whose size is a power of 2, for example 8, 16, 32, 64 or 128 bytes. To set up a particular register for use with a circular buffer, we must configure the AMR (Address Mode Register). At power up, the AMR contains 00000000h, which selects ordinary linear addressing for every register.

Circular Buffers (3 of 7)

To read the AMR register we must use the special instruction MVC (move control register):

    MVC AMR, B1    ; valid: copy AMR to B1
    MVC AMR, A1    ; X invalid: must be B-side

To write to the AMR register we again use MVC:

    MVC A1, AMR    ; Copy A1 to AMR

Circular Buffers (4 of 7)

For example, to set up A7 for use in circular addressing, we write the value 00050040h to the AMR (Address Mode Register):

    MVKL 00050040h, A2
    MVKH 00050040h, A2
    MVC  A2, AMR        ; Update AMR
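The value 00050040h can be built from its fields. This sketch assumes the AMR layout documented for the C6000:

- Two mode bits per pointer register, in the order A4, A5, A6, A7, B4, B5, B6, B7 (bits 1..0 for A4 up to bits 15..14 for B7).
- Mode 01 means circular addressing using block size BK0.
- BK0 sits in bits 20..16 and encodes a block of 2^(BK0+1) bytes.

The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Pointer registers that support circular addressing, in AMR field order. */
enum amr_reg { AMR_A4, AMR_A5, AMR_A6, AMR_A7, AMR_B4, AMR_B5, AMR_B6, AMR_B7 };

/* Two-bit mode field values. */
enum amr_mode { AMR_LINEAR = 0, AMR_CIRC_BK0 = 1, AMR_CIRC_BK1 = 2 };

/* AMR value that puts one register in circular mode using BK0, with the
   BK0 field (bits 20..16) set to bk0; all other registers stay linear. */
uint32_t amr_value(enum amr_reg reg, unsigned bk0)
{
    uint32_t mode_bits = (uint32_t)AMR_CIRC_BK0 << (2u * (unsigned)reg);
    uint32_t bk0_bits = (uint32_t)(bk0 & 0x1Fu) << 16;
    return bk0_bits | mode_bits;
}

/* Block size in bytes encoded by a 5-bit BK field: 2^(N + 1). */
uint64_t block_size_bytes(unsigned bk)
{
    return (uint64_t)1 << ((bk & 0x1Fu) + 1u);
}
```

amr_value(AMR_A7, 5) gives 0x00050040: bits 7..6 are 01 and BK0 is 5. block_size_bytes(5) gives 64, which matches the 64-byte buffer used for A7.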

Circular Buffers 5 of 7

The buffer has a size of 16 * 4 = 64 bytes:

    int buffer[16];

For circular addressing, the buffer must be aligned on a boundary equal to the size of the buffer:

    buffer: .usect ".far", 64, 64
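Why the alignment matters can be seen from a simple model of circular post-increment: only the low log2(size) address bits change, and the upper bits stay fixed, so the pointer wraps inside the size-aligned block that contains it. The sketch below is a simplified model, not a cycle-accurate description of the hardware; the function names are hypothetical.

```c
#include <stdint.h>

/* Model of a circular post-increment: the low log2(size) bits wrap,
   the upper bits are left unchanged. size must be a power of two. */
static uint32_t circ_add(uint32_t addr, uint32_t step_bytes, uint32_t size)
{
    uint32_t mask = size - 1u;
    return (addr & ~mask) | ((addr + step_bytes) & mask);
}

/* A buffer suits circular addressing only if its start address
   is a multiple of its size. */
static int is_size_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0;
}
```

For a 64-byte buffer at 1000h, stepping from its last word at 103Ch wraps back to 1000h. If the buffer were misaligned at 1010h, the same step from 103Ch would still land on 1000h, which is outside the buffer.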

Circular Buffers 6 of 7

To store the value of ptr we can write:

    ptr:    .usect ".far", 4, 4

At the beginning of the program, set ptr to contain the starting address of the buffer.

Circular Buffers 7 of 7

To read a value from the buffer and post-increment A7 (with A7 in circular mode, the increment wraps within the 64-byte block):

    LDW  *A7++, A3      ; Read from buffer

To write a value from a register back to the buffer:

    STW  A3, *A7        ; Write to buffer

Topic Ten

40-bit Operations.

40-bit Operations 1 of 8

So far we have used registers A0 to A15 and B0 to B15 for 32-bit operations. The TMS320C6000 also supports 40-bit maths. Let us look at a simple addition:

    ADD  A2, A1:A0, A3:A2

40-bit Operations 2 of 8

Here A1:A0 is a 40-bit register pair. The upper 8 bits are provided by the lower 8 bits of A1, and the lower 32 bits are provided by A0.

    ADD  A2, A1:A0, A3:A2

This means: add the 32-bit register A2 to the 40-bit register pair A1:A0, then put the 40-bit result in register pair A3:A2.
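To make the register-pair layout concrete, here is a hedged C model that holds a 40-bit value in an `int64_t`. It takes bits 39 to 32 from the low 8 bits of the odd register and bits 31 to 0 from the even register, sign-extends from bit 39, and models `ADD src1, pair, pair` as a 32-bit plus 40-bit add that wraps at 40 bits. The function names are hypothetical, and the model ignores the upper 24 bits of the odd register.

```c
#include <stdint.h>

/* Sign-extend a 40-bit pattern (bits 39..0) to int64_t. */
static int64_t sext40(uint64_t raw)
{
    raw &= (UINT64_C(1) << 40) - 1;
    return (int64_t)(raw ^ (UINT64_C(1) << 39)) - (INT64_C(1) << 39);
}

/* odd = e.g. A1 (only its low 8 bits are used), even = e.g. A0. */
static int64_t pair_to_40(uint32_t odd, uint32_t even)
{
    return sext40(((uint64_t)(odd & 0xFFu) << 32) | even);
}

/* Model of ADD A2, A1:A0, A3:A2: signed 32-bit + 40-bit, wrapped to 40 bits. */
static int64_t add_32_40(int32_t src1, int64_t src2)
{
    return sext40((uint64_t)src2 + (uint64_t)(int64_t)src1);
}
```

For example, A1:A0 = 00 FFFF FFFFh plus A2 = 1 gives 01 0000 0000h, a carry that a single 32-bit register could not hold.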

40-bit Operations 3 of 8

40-bit operations are particularly useful for FIR filters, which use a large number of multiplications and additions. Suppose we are implementing a 64-element FIR filter using 32-bit maths. To prevent the running sum from overflowing, we have to divide each product by 32 before adding it. This can mean a loss of accuracy, because the low bits of every product are thrown away.

40-bit Operations 4 of 8

In C code, the divide by 32 is implemented as a shift right 5 places:

    int temp = 0;           // 32-bit variable
    int result = 0;         // 32-bit accumulator
    int i;

    for (i = 0; i < 64; i++)
    {
        temp = input[i] * coeff[i];
        result += (temp >> 5);
    }
    result >>= 10;

40-bit Operations 5 of 8

Using 40-bit maths, there is no need to perform a division before each addition: the 8 extra bits can absorb the growth from summing 64 = 2^6 products, so we shift only once at the end:

    long temp = 0;          // 40 bits (TI C6000 compiler, COFF ABI)
    int i;

    for (i = 0; i < 64; i++)
    {
        temp += input[i] * coeff[i];    // Accumulate
    }
    temp >>= 15;
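The two C versions can be compared on a host PC. This sketch assumes 16-bit inputs and Q15 coefficients (the slides do not state the types) and uses `int64_t` as a stand-in for the 40-bit `long`; the function names are hypothetical. The test data is chosen so that the per-product truncation in the 32-bit version loses a whole output step.

```c
#include <stdint.h>

#define TAPS 64

/* 32-bit version: shift each product right 5, then the sum right 10. */
static int32_t fir_scaled32(const int16_t *input, const int16_t *coeff)
{
    int32_t result = 0;
    for (int i = 0; i < TAPS; i++) {
        int32_t temp = (int32_t)input[i] * coeff[i];
        result += temp >> 5;
    }
    return result >> 10;
}

/* Wide-accumulator version: int64_t stands in for the 40-bit long,
   so the full shift of 15 is done once, after the loop. */
static int32_t fir_wide(const int16_t *input, const int16_t *coeff)
{
    int64_t acc = 0;
    for (int i = 0; i < TAPS; i++)
        acc += (int32_t)input[i] * coeff[i];
    return (int32_t)(acc >> 15);
}
```

With 32 products of 1023 and 32 products of 31, the exact sum is 33728, which gives 1 after the shift by 15. The 32-bit version discards the low 5 bits of every product first and returns 0.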

40-bit Operations 6 of 8

The FIR implementation becomes:

            MVK   64, B1              ; Loop counter
            ZERO  A6                  ; Clear 40-bit accumulator A7:A6
            ZERO  A7
    loop:   LDW   *B4++, A0           ; B4 -> coeffs
            LDW   *B5++, A1           ; B5 -> inputs
            NOP   4
            MPY   A0, A1, A2          ; Multiply
            ADDK  -1, B1
            ADD   A2, A7:A6, A7:A6    ; Accumulate
      [B1]  B     loop
            NOP   5
            SHR   A7:A6, 15, A7:A6    ; Divide once

Note that the accumulator pair must not include A2, which MPY overwrites on every pass through the loop.

40-bit Operations 7 of 8

When performing 40-bit operations, the instruction ADDK cannot be used, because it only operates on a single 32-bit register. The instruction ADD must be used instead:

    ADDK  1, A1:A0              ; X   Not allowed
    ADD   1, A1:A0, A1:A0       ; OK

Similarly, ADDK cannot be used to subtract from a 40-bit register. Use ADD with a negative constant:

    ADDK  -1, A5:A4             ; X   Not allowed
    ADD   -1, A5:A4, A5:A4      ; OK

In the 40-bit form of SUB the constant is the first operand, so SUB 1, A5:A4, A5:A4 would compute 1 - A5:A4 rather than decrementing A5:A4.

40-bit Operations 8 of 8

When converting a 40-bit value to 32 bits, it is wise to use the SAT (saturate) instruction to prevent the sign changing:

    SAT   A1:A0, A4

Suppose A1:A0 contains 00 FFFF FFFFh. This is a positive number. However, the 32-bit value in A0 on its own is FFFF FFFFh, which is a negative number (-1). SAT instead clamps the result to the largest positive 32-bit value, so A4 receives 7FFF FFFFh.
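The difference between keeping the low 32 bits and saturating can be shown in a few lines of C. This is a hedged model of what SAT does to the value (it does not model any status flags), with the 40-bit value held in an `int64_t`; the function names are hypothetical.

```c
#include <stdint.h>

/* Model of SAT: clamp a signed 40-bit value into the 32-bit range. */
static int32_t sat40_to_32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Plain truncation: what you get by just using A0. */
static int32_t trunc40_to_32(int64_t v)
{
    return (int32_t)(uint32_t)(uint64_t)v;
}
```

For 00 FFFF FFFFh, truncation gives -1 while saturation gives 7FFF FFFFh, which keeps the sign correct.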

Topic Eleven

Optimising assembly code for speed.

Optimising Code 1 of 5

Let us start with a very simple C function that copies one block of data to another:

    void copy(int* p1, int* p2, int size)
    {
        while (size--)
        {
            *p2++ = *p1++;
        }
    }

Optimising Code 2 of 5

In assembly language we could write (p1 arrives in A4, p2 in B4 and size in A6):

            MV    A6, A1          ; A1 = size (loop counter)
    loop:   LDW   *A4++, A0
            NOP   4
            STW   A0, *B4++
            ADDK  -1, A1
      [A1]  B     loop
            NOP   5
            B     B3              ; Return
            NOP   5

This takes 144 cycles to execute.
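It helps to write the assembly loop's control flow back in C. The counter is decremented and then tested at the bottom of the loop, so the body always runs at least once: the structure is a do-while, not the original `while (size--)`. The assembly therefore assumes size is at least 1. This is a sketch; `copy_like_asm` is a hypothetical name.

```c
/* C rendering of the assembly loop: decrement, then test at the bottom.
   Assumes size >= 1, as the assembly does. */
static void copy_like_asm(const int *p1, int *p2, int size)
{
    int count = size;           /* loop counter (A1 in the assembly) */
    do {
        *p2++ = *p1++;          /* LDW *A4++, A0 then STW A0, *B4++ */
        count--;                /* ADDK -1, A1 */
    } while (count != 0);       /* [A1] B loop */
}
```

With size = 3, exactly three words are copied and the fourth destination word is untouched.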

Optimising Code 3 of 5

We can move the ADDK and B loop instructions upwards, so that they execute in the delay slots of LDW and fewer NOPs are needed:

    loop:   LDW   *A4++, A0
            ADDK  -1, A1
      [A1]  B     loop            ; Branch delay slots start here
            NOP   2               ; Wait for LDW's 4 delay slots
            STW   A0, *B4++
            NOP   2               ; Remaining branch delay slots
            B     B3              ; Return
            NOP   5

Optimising Code 4 of 5

We can also move the B B3 instruction upwards, executing it in parallel with the loop branch under the opposite condition:

    loop:   LDW   *A4++, A0
            ADDK  -1, A1
      [A1]  B     loop            ; Loop again if A1 != 0
    || [!A1] B    B3              ; Otherwise return
            NOP   2               ; Wait for LDW's 4 delay slots
            STW   A0, *B4++
            NOP   2

This takes 93 cycles to execute.

Optimising Code 5 of 5

The optimised version takes about 35% fewer cycles than the unoptimised version (93 instead of 144). The main downside is that the code is harder to read and debug. It is therefore recommended that the code is first written and tested, then optimised and re-tested.
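The 35% figure is the reduction in cycle count. Expressed as a speed-up ratio, the same two cycle counts give a larger number:

```latex
\text{cycle reduction} = \frac{144 - 93}{144} = \frac{51}{144} \approx 0.354 \approx 35\%
\qquad
\text{speed-up} = \frac{144}{93} \approx 1.55
```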

Topic Twelve

A typical application of assembly language.

Implementing a Stereo FIR Filter 1 of 6

We will design a stereo Finite Impulse Response (FIR) filter that processes both the left and right audio channels at exactly the same time. It will use 64 coefficients. The design brings together several techniques explained earlier, and we will compare its performance with a C code version.

Implementing a Stereo FIR Filter 2 of 6

We will start with a C-callable function:

    int stereo_FIR(const int *, int x, int y);

The const int * parameter points to the filter coefficients. Here x and y are the inputs.

Implementing a Stereo FIR Filter 3 of 6

The C function uses registers as follows:

    A4 = stereo_FIR(A4, B4, A6)

A4 points to the coefficients, B4 contains x and A6 contains y. The return value is also placed in A4. Note that there can only be one return value, so A4 must hold both outputs, for example packed as two 16-bit halves.
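One way to fit two outputs into a single 32-bit return value is to pack them as 16-bit halves. The slides only say that A4 holds both outputs, so the layout below (first output in the upper half, second in the lower half) and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout: first output in bits 31..16, second in bits 15..0. */
static int32_t pack_stereo(int16_t first, int16_t second)
{
    return (int32_t)(((uint32_t)(uint16_t)first << 16) | (uint16_t)second);
}

static int16_t unpack_first(int32_t packed)
{
    return (int16_t)((uint32_t)packed >> 16);
}

static int16_t unpack_second(int32_t packed)
{
    return (int16_t)((uint32_t)packed & 0xFFFFu);
}
```

The caller in C would then split the returned int with the two unpack helpers.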

Implementing a Stereo FIR Filter 4 of 6

We can start with the same code we used for 40-bit operations, but modified for buffer1:

            MVK   64, B1
    loop:   LDW   *A4++, A0           ; A4 -> coeffs
            LDW   *A5++, A1           ; A5 -> buffer1
            NOP   4
            MPY   A0, A1, A2          ; Multiply
            ADDK  -1, B1
            ADD   A2, A7:A6, A7:A6    ; Accumulate
      [B1]  B     loop
            NOP   5

Implementing a Stereo FIR Filter 5 of 6

Now we add parallel operations for the second channel on the B side. Both channels use the same coefficient, so the B-side multiply reads A0 through the cross path. The buffer2 sample and its product are kept in B9 and B8 so that the loop counter B1 is not overwritten:

    loop:   LDW   *A4++, A0           ; A4 -> coeffs
            LDW   *A5++, A1           ; A5 -> buffer1
    ||      LDW   *B5++, B9           ; B5 -> buffer2
            NOP   4
            MPY   A0, A1, A2          ; Multiply, channel 1
    ||      MPY   B9, A0, B8          ; Multiply, channel 2 (A0 via cross path)
            ADDK  -1, B1
            ADD   A2, A7:A6, A7:A6    ; Accumulate, channel 1
    ||      ADD   B8, B7:B6, B7:B6    ; Accumulate, channel 2
      [B1]  B     loop
            NOP   5
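A plain C reference for the same computation is useful for checking the assembly. In this sketch each loop pass feeds two independent accumulators with one shared coefficient, mirroring the A-side and B-side instructions that run in parallel. It assumes 16-bit samples and Q15 coefficients and uses `int64_t` in place of the 40-bit A7:A6 and B7:B6 accumulators. The function name and parameters are hypothetical, not the interface of FIR_filters_asm.asm.

```c
#include <stdint.h>

#define TAPS 64

/* Reference stereo FIR: one shared coefficient per tap, two accumulators. */
static void stereo_fir_ref(const int16_t *coeff,
                           const int16_t *buffer1, const int16_t *buffer2,
                           int32_t *out1, int32_t *out2)
{
    int64_t acc1 = 0, acc2 = 0;               /* stand-ins for A7:A6, B7:B6 */
    for (int i = 0; i < TAPS; i++) {
        acc1 += (int32_t)coeff[i] * buffer1[i];   /* A-side work */
        acc2 += (int32_t)coeff[i] * buffer2[i];   /* B-side work */
    }
    *out1 = (int32_t)(acc1 >> 15);            /* divide once at the end */
    *out2 = (int32_t)(acc2 >> 15);
}
```

With all coefficients at 0.5 in Q15 (16384), constant inputs of 100 and 200 give outputs of 3200 and 6400.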

Implementing a Stereo FIR Filter 6 of 6

The operations on the B registers are performed at exactly the same time as those on the A registers. The full assembly code for the stereo FIR filter is given in the files FIR_filters_asm.asm and FIR_filters_asm.h.

Topic Thirteen

Some other information.

Other Instructions

Besides the instructions given here, the C67xx and C64xx have additional assembly language instructions. The C67xx has floating-point instructions. The C64xx has more registers (A0 to A31 and B0 to B31) and additional instructions, including packed-data operations on 8-bit and 16-bit values. See the References section for details.

References

TMS320C6000 CPU and Instruction Set Reference Guide (SPRU189).
TMS320C6000 Assembly Language Tools User's Guide (SPRU186).
