TMS320 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Overview A.A.2009-10.

Slides:



Advertisements
Similar presentations
Chapter 1 Introduction.
Advertisements

CH10 Instruction Sets: Characteristics and Functions
Details.L and.S units TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004.
Goal: Write Programs in Assembly
Review of the MIPS Instruction Set Architecture. RISC Instruction Set Basics All operations on data apply to data in registers and typically change the.
Instruction Set-Intro
TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.
Branches Two branch instructions:
TMS320C6713 Assembly Language (cont’d). Module 1 Exam (solution) 1. Functional Units a. How many can perform an ADD? Name them. a. How many can perform.
Lecture 6 Programming the TMS320C6x Family of DSPs.
Superscalar and VLIW Architectures Miodrag Bolic CEG3151.
Assembly and Linear Assembly Evgeny Kirshin, 05/10/2011
TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.
Intel Pentium 4 ENCM Jonathan Bienert Tyson Marchuk.
1 Lecture 3: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation.
INSTRUCTION SET ARCHITECTURES
TMS320C6000 Architectural and Programming Overview.
TMS320C6000 Architectural Overview.  Describe C6000 CPU architecture.  Introduce some basic instructions.  Describe the C6000 memory map.  Provide.
ECSE DSP architecture Review of basic computer architecture concepts C6000 architecture: VLIW Principle and Scheduling Addressing Assembly and linear.
Execution of an instruction
Introduction.
Introduction to ARM Architecture, Programmer’s Model and Assembler Embedded Systems Programming.
Lecture 5 Sept 14 Goals: Chapter 2 continued MIPS assembly language instruction formats translating c into MIPS - examples.
What is an instruction set?
Embedded Systems Programming
Digital Signal Processor
IntroductionIntroduction Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004.
Part II: Addressing Modes
Machine Instruction Characteristics
Dr Mohamed Menacer College of Computer Science and Engineering Taibah University CS-334: Computer.
Classifying GPR Machines TypeNumber of Operands Memory Operands Examples Register- Register 30 SPARC, MIPS, etc. Register- Memory 21 Intel 80x86, Motorola.
Registers and MAL Lecture 12. The MAL Architecture MAL is a load/store architecture. MAL supports only those addressing modes supported by the MIPS RISC.
DSP Processors We have seen that the Multiply and Accumulate (MAC) operation is very prevalent in DSP computation computation of energy MA filters AR filters.
CSCI 136 Lab 1: 135 Review.
Chapter 5 The LC Instruction Set Architecture ISA = All of the programmer-visible components and operations of the computer memory organization.
1 Instruction Set Architecture (ISA) Alexander Titov 10/20/2012.
Instruction Set Architecture The portion of the machine visible to the programmer Issues: Internal storage model Addressing modes Operations Operands Encoding.
Chapter 5 The LC Instruction Set Architecture ISA = All of the programmer-visible components and operations of the computer memory organization.
Gary MarsdenSlide 1University of Cape Town Chapter 5 - The Processor  Machine Performance factors –Instruction Count, Clock cycle time, Clock cycles per.
ajay patil 1 TMS320C6000 Assembly Language and its Rules Assignment One of the simplest operations in C is to assign a constant to a variable: One.
Computer Architecture and Organization
Chapter 1 Introduction. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 1, Slide 2 Learning Objectives  Why process signals.
Computer Architecture EKT 422
Lecture 04: Instruction Set Principles Kai Bu
ALU (Continued) Computer Architecture (Fall 2006).
Chapter 10 Instruction Sets: Characteristics and Functions Felipe Navarro Luis Gomez Collin Brown.
Group # 3 Jorge Chavez Henry Diaz Janty Ghazi German Montenegro.
Arrays in MIPS Assembly Computer Organization and Assembly Language: Module 6.
Computer Organization Instructions Language of The Computer (MIPS) 2.
Instruction Sets. Instruction set It is a list of all instructions that a processor can execute. It is a list of all instructions that a processor can.
Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Control Unit.
Assembly Variables: Registers Unlike HLL like C or Java, assembly cannot use variables – Why not? Keep Hardware Simple Assembly Operands are registers.
A Closer Look at Instruction Set Architectures
TMS320C6713 Assembly Language
/ Computer Architecture and Design
Morgan Kaufmann Publishers The Processor
Introduction.
Details .L and .S units TMS320C6000.
Chapter 5 The LC-3.
EECE-276 Fall 2003 Microprocessors & Microcontrollers II
Chapter 1 Introduction.
Instruction encoding We’ve already seen some important aspects of processor design. A datapath contains an ALU, registers and memory. Programmers and compilers.
Introduction to Micro Controllers & Embedded System Design
Chapter 9 Instruction Sets: Characteristics and Functions
Computer Instructions
Computer Architecture
COMS 361 Computer Organization
Instruction encoding We’ve already seen some important aspects of processor design. A datapath contains an ALU, registers and memory. Programmers and compilers.
Superscalar and VLIW Architectures
Chapter 10 Instruction Sets: Characteristics and Functions
Presentation transcript:

TMS320 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Overview A.A

A.A VoIP Texas Instruments’ TMS320 Family C2000 C5000 C6000 Lowest Cost Control Systems - Motor Control - Storage - Digital Ctrl Systems Efficiency Best MIPS per Watt / Dollar / Size Best MIPS per Watt / Dollar / Size - Wireless phones - Internet audio players - Digital still cameras - Modems - Telephony - VoIP Performance & Best Ease-of-Use - Multi Channel and Multi Function App's Function App's - Comm Infrastructure - Wireless Base-stations - DSL - Imaging - Multi-media Servers - Video Digital Signal Processor

3 C 64x: C 64x: The C64x fixed-point DSPs offer the industry's highest level of performance to address the demands of the digital age. At clock rates of up to 1 GHz, C64x DSPs can process information at rates up to 8000 MIPS with costs as low as $ In addition to a high clock rate, C64x DSPs can do more work each cycle with built-in extensions. These extensions include new instructions to accelerate performance in key application areas such as digital communications infrastructure and video and image processing. c62 x : c62 x : These first-generation fixed-point DSPs represent breakthrough technology that enables new equipments and energizes existing implementations for multi- channel, multi-function applications, such as wireless base stations, remote access servers (RAS), digital subscriber loop (xDSL) systems, personalized home security systems, advanced imaging/biometrics, industrial scanners, precision instrumentation and multi-channel telephony systems. C 67 x: C 67 x: For designers of high-precision applications, C67x floating-point DSPs offer the speed, precision, power savings and dynamic range to meet a wide variety of design needs. These dynamic DSPs are the ideal solution for demanding applications like audio, medical imaging, instrumentation and automotive. Digital Signal Processor A.A C6000

TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview A.A

5 TMS320 C6xx Y = N  a n x n n = 1 * Prodotto Scalare tra due vettori Instruction Set DSP System Block Diagram A.A

6 TMS320 C6xx It has been shown in Introduction that SOP is the key element for most DSP algorithms. Two basic Operations are required for this algorithm. - Multiplication - Addition Therefore two basic Instructions are required Scalar Product of Vectors The implementation in this module will be done in assembly for C6xx = a 1 * x 1 + a 2 * x a N * x N Y = N  a n x n n = 1 * SOP Algorithm A.A

7 Multiply (MPY) The multiplication of a 1 by x 1 is done in assembly by the following instruction: MPY a1, x1, Y MPY a1, x1, Y This instruction is performed by a multiplier unit that is called “.M” Y = N  a n x n n = 1 * = a 1 * x 1 + a 2 * x a N * x N A.A

8 Multiply (.M unit).M.M Y = 40  a n x n n = 1 * The.M unit performs multiplications in hardware MPY.Ma1, x1, Y MPY.Ma1, x1, Y Note: 16-bit by 16-bit multiplier provides a 32-bit result. 32-bit by 32-bit multiplier provides a 64-bit result. 32-bit by 32-bit multiplier provides a 64-bit result. A.A

9 Addition (.L unit).M.M.L.L Y = 40  a n x n n = 1 * MPY.Ma1, x1, prod ADD.LY, prod, Y RISC processors such as the C6000 use registers to hold the operands, so lets change this code. A.A

10 Register File A Y = 40  a n x n n = 1 * MPY.Ma1, x1, prod ADD.LY, prod, Y MPY.MA0, A1, A3 ADD.LA4, A3, A4 MPY.MA0, A1, A3 ADD.LA4, A3, A4.M.M.L.L A0A1A2A3A4A15 Register File A a1 x1 prod 32-bits Y Let us correct this by replacing a, x, prod and Y by the registers as shown above. A.A

11 Specifying Register Names Register File A contains 16 registers (A0 -A15) which are 32-bits wide..M.M.L.L A0A1A2A3A4A15 Register File A a1 x1 prod 32-bits Y A.A

12 Data loading (load unit.D) Q: How do we load the operands into the registers?.M.M.L.L A0A1A2A3A4A15 Register File A a1 x1 prod 32-bits Y A: The operands are loaded into the registers by loading them from the memory using the.D unit..D.D Data Memory It is worth noting at this stage that the only way to access memory is through the.D unit. A.A

13 Load Instruction Q: Which instruction(s) can be used for loading operands from the memory to the registers?.M.M.L.L A0A1A2A3A4A15 Register File A a1 x1 prod 32-bits Y D.D Data Memory A.A

14 Load Instructions A: The load instructions. Q: Which instruction(s) can be used for loading operands from the memory to the registers?.M.M.L.L A0A1A2A3A4A15 Register File A a1 x1 prod 32-bits Y LDB, LDH LDW,LDDW.D.D Data Memory A.A

15 Using the Load Instructions Before using the load unit you have to be aware that this processor is, which means that each byte is represented by a unique address. Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address. Also the addresses are 32-bit wide Data16-bits 32 bits address FFFFFFFF Byte A.A

16 The syntax for the load instruction is: Where: Rn is a register that contains the address of the operand to be loaded and Rm is the destination register. LD *Rn,Rm Data16-bitsaddressFFFFFFFF a1 x1 prod Y Using the Load Instructions A.A

17 The syntax for the load instruction is: The question now is how many bytes are going to be loaded into the destination register? LD *Rn,Rm Data16-bitsaddressFFFFFFFF a1 x1 prod Y Using the Load Instructions A.A

18 The syntax for the load instruction is: LD *Rn,Rm The answer, is that it depends on the instruction you choose: LD B : loads one byte (8-bit) LD B : loads one byte (8-bit) LD H : loads half word (16-bit) LD H : loads half word (16-bit) LD W : loads a word (32-bit) LD W : loads a word (32-bit) LD DW : loads a double word (64-bit) LD DW : loads a double word (64-bit) Note: LD on its own does not exist Data16-bitsaddressFFFFFFFF a1 x1 prod Y Using the Load Instructions A.A

19 Example: If we assume that A5 = 0x4 then: (1)LDB *A5, A7 ; gives A7 = 0x gives A7 = 0x (2)LDH *A5,A7; gives A7 = 0x gives A7 = 0x (3)LDW *A5,A7; gives A7 = 0x gives A7 = 0x (3)LDDW *A5,A7:A6; gives A7:A6 = 0x gives A7:A6 = 0x Data16-bits address FFFFFFFF 0xB0xA 0xD0xC 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 01 The syntax for the load instruction is: LD *Rn,Rm Little Endian la parte meno significativa va memorizzata per prima, all'indirizzo più basso di memoria Using the Load Instructions A.A

20 Q: If data can only be accessed by the load instruction and the.D unit, how can we load the register pointer Rn in the first place? Data16-bits address FFFFFFFF 0xB0xA 0xD0xC 0x10x2 0x30x4 0x50x6 0x70x8 01 The syntax for the load instruction is: LD *Rn,Rm Using the Load Instructions A.A

21 The instruction MVKL will allow a move of a 16-bit constant into a register as shown below: The instruction MVKL will allow a move of a 16-bit constant into a register as shown below: MVKL.? a, A5 (‘a’ is a constant or label) How many bits represent a full address? How many bits represent a full address? 32 bits So why does the instruction not allow a 32-bit move? So why does the instruction not allow a 32-bit move? All instructions are 32-bit wide (see instruction opcode). Loading the Pointer *Rn A.A

22 To solve this problem another instruction is available: MVKH To solve this problem another instruction is available: MVKH eg. MVKH.? a, A5 (‘a’ is a constant or label) ahahahah ahahahahx alalalal a A5 Finally, to move the 32-bit address to a register we can use: Finally, to move the 32-bit address to a register we can use: MVKL a, A5 MVKH a, A5 Loading the Pointer *Rn A.A

23 Always use MVKL then MVKH, look at the following examples: Always use MVKL then MVKH, look at the following examples: MVKL0x1234FABC, A5 A5 = 0xFFFFFABC Wrong Example 1 (A5 = 0x ) MVKL0x1234FABC, A5 A5 = 0xFFFFFABC (sign extension) MVKH0x1234FABC, A5 A5 = 0x1234FABC OK Example 2 (A5 = 0x ) MVKH0x1234FABC, A5 A5 = 0x Loading the Pointer *Rn A.A

24 MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 LDH.D*A5, A0 LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 pt1 and pt2 point to some locations in the data memory..M.M.L.L A0A1A2A3A4A15 Register File A a1 x1 prod 32-bits Y Le componenti aixi ai e xi sono numeri 16 bits interi da 16 bits.D.D Data Memory LDH, MVKL and MVKH IPOTESI A.A

25

26 Creating a loop MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 LDH.D*A5, A0 LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 So far we have only implemented the SOP for one tap only, i.e. Y= a 1 * x 1 So let’s create a loop so that we can implement the SOP for N Taps. A.A

27 With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction. So far we have only implemented the SOP for one tap only, i.e. Y= a 1 * x 1 So let’s create a loop so that we can implement the SOP for N Taps. Creating a loop – B instruction A.A

28  Create a label to branch to.  Add a branch instruction, B.  Create a loop counter (LC).  Add an instruction to decrement the LC.  Make the branch conditional based on LC. What are the steps for creating a loop ? A.A

29 1. Create a label to branch to MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 A.A

30 MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.?loop 2. Add a branch instruction, B. A.A

31 Which unit is used by the B instruction? MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.Sloop.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.S.S.D.D Data Memory A.A

32 3. Create a loop counter. MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 MVKL.Scount, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.Sloop B registers will be introduced later.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.S.S.D.D Data Memory A.A

33 4. Decrement the loop counter MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 MVKL.Scount, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 B.Sloop.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.S.S.D.D Data Memory A.A

34 What is the syntax for making instruction conditional? What is the syntax for making instruction conditional? [condition] InstructionLabel e.g.[B1]Bloop (1) The condition can be one of the following registers: A1, A2, B0, B1, B2. (2) Any instruction can be conditional. 5. Make the branch conditional based on the value in the loop counter A.A

35 The condition can be inverted by adding the exclamation symbol “!” as follows: The condition can be inverted by adding the exclamation symbol “!” as follows: [!condition] InstructionLabel e.g. [!B0]Bloop; branch if B0 = 0 [B0]Bloop; branch if B0 != 0 5. Make the branch conditional based on the value in the loop counter A.A

36 MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop 5. Make the branch conditional.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.S.S.D.D Data Memory A.A

37  With this processor all the instructions are encoded in a 32-bit.  Therefore the label must have a dynamic range of less than 32-bit as the instruction B has to be coded.  Case 1: B.S1 label  Relative branch.  Label limited to +/ offset. More on the Branch Instruction (1) 21-bit relative address B32-bitA.A

38 More on the Branch Instruction (2)  By specifying a register as an operand instead of a label, it is possible to have an absolute branch.  This will allow a dynamic range of  Case 2: B.S2register  Absolute branch.  Operates on.S2 ONLY! 5-bit register code B32-bitA.A

39 Testing the code This code performs the following operations: a 0 *x 0 + a 0 *x 0 + a 0 *x 0 + … + a 0 *x 0 However, we would like to perform: a 0 *x 0 + a 1 *x 1 + a 2 *x 2 + a 0 *x 0 + a 1 *x 1 + a 2 *x 2 + … + a N *x N … + a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop A.A

40 Modifying the pointers The solution is to modify the pointers A5 and A6 MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop A.A

41 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Pre-incrementPre-decrementPost-incrementPost-decrement Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] *++R[ disp ] *--R[ disp ] *R++[ disp ] *R--[ disp ] NoNoNoYesYesYesYes  [disp] specifies # elements - size in DW, W, H, or B.  disp = R or 5-bit constant.  R can be any register. A.A

42 Modify and testing the code This code now performs the following operations: a 0 *x 0 + a 1 *x 1 + a 2 *x a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5 ++, A0 LDH.D*A6 ++, A1 LDH.D*A6 ++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop A.A

43 Store the final result This code now performs the following operations: a 0 *x 0 + a 1 *x 1 + a 2 *x a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7 A.A

44 Store the final result The Pointer A7 is now initialised. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2pt3, A7 MVKH.S2pt3, A7 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7 A.A

45 What is the initial value of A4? A4 is used as an accumulator, so it needs to be so it needs to be reset to zero. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2pt3, A7 MVKH.S2pt3, A7 MVKL.S2count, B0 ZERO.LA4 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7 A.A

46 TMS320 C6xx Codex Complete MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2pt3, A7 MVKH.S2pt3, A7 MVKL.S2N, B0 ZERO.L1A4 loop LDH.D1*A5++, A0 LDH.D1*A6++, A1 MPY.M1A0, A1, A3 ADD.L1A4, A3, A4 SUB.S2B0, 1, B0 [B0]B.S1loop STH.D1A4, *A7 32 bits 16 bits *A5 = a n (16 bits) 16 bits (pt1) 16 bits (pt2) 16 bits (pt3) *A6 = x n (16 bits) *A7 = Y (16 bits) B0 = N (16 bits) A0 A1 A4 A.A

47 Y = 40  a n x n n = 1 * Code Review (using side A only) MVK.S140, A2 MVK.S140, A2; A2 = 40, loop count loop:LDH.D1*A5++, A0 loop:LDH.D1*A5++, A0; A0 = a(n) LDH.D1*A6++, A1 LDH.D1*A6++, A1; A1 = x(n) MPY.M1A0, A1, A3 MPY.M1A0, A1, A3; A3 = a(n) * x(n) ADD.L1A4, A3, A4 ADD.L1A4, A3, A4; Y = Y + A3 SUB.L1A2, 1, A2 SUB.L1A2, 1, A2; decrement loop count [A2]B.S1loop [A2]B.S1loop; if A2  0, branch STH.D1A4, *A7 STH.D1A4, *A7; *A7 = Y Note: Assume that A4 was previously cleared and the pointers are initialised. Assume that A2 is B0 A.A