Chapter 2 TMS320C6000 Architectural Overview. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 2  Describe C6000 CPU.

Slides:



Advertisements
Similar presentations
TMS320C6xx Instruction Set TMS320 C6xx
Advertisements

TMS320C6xx Architecture C6xx
Architecture and Instruction Set of the C6x Processor Module 1.
Details.L and.S units TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004.
TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.
TMS320C6713 Assembly Language (cont’d). Module 1 Exam (solution) 1. Functional Units a. How many can perform an ADD? Name them. a. How many can perform.
Lecture 6 Programming the TMS320C6x Family of DSPs.
TMS320 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Overview A.A
TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.
Dr. Rabie A. Ramadan Al-Azhar University Lecture 3
DSP Module 1 Open Exam. Module 1 Exam You have 20 minutes to complete the exam. Exam is open mind, open book, open eyes. Sharing answers, cheating, asking.
TMS320C6000 Architectural and Programming Overview.
TMS320C6000 Architectural Overview.  Describe C6000 CPU architecture.  Introduce some basic instructions.  Describe the C6000 memory map.  Provide.
ECSE DSP architecture Review of basic computer architecture concepts C6000 architecture: VLIW Principle and Scheduling Addressing Assembly and linear.
MICROPROCESSORS TWO TYPES OF MODELS ARE USED :  PROGRAMMER’S MODEL :- THIS MODEL SHOWS FEATURES, SUCH AS INTERNAL REGISTERS, ADDRESS,DATA & CONTROL BUSES.
Execution of an instruction
Chapter 9 Bootloader. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 9, Slide 2 Learning Objectives  Need for a bootloader.
Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02.
RICE UNIVERSITY Implementing the Viterbi algorithm on programmable processors Sridhar Rajagopal Elec 696
Machine Instruction Characteristics
Texas Instruments Course DSP. Module 1 Outline Introduction Introduction  TI DSP Families  C6000 Architecture  C6000 Roadmap.
C66x CorePac: Achieving High Performance. Agenda 1.CorePac Architecture 2.Single Instruction Multiple Data (SIMD) 3.Memory Access 4.Pipeline Concept.
Execution of an instruction
ajay patil 1 TMS320C6000 Assembly Language and its Rules Assignment One of the simplest operations in C is to assign a constant to a variable: One.
80386DX functional Block Diagram PIN Description Register set Flags Physical address space Data types.
EKT 221 : Chapter 4 Computer Design Basics
Computer Organization Instructions Language of The Computer (MIPS) 2.
MODULE 5 INTEL TODAY WE ARE GOING TO DISCUSS ABOUT, FEATURES OF 8086 LOGICAL PIN DIAGRAM INTERNAL ARCHITECTURE REGISTERS AND FLAGS OPERATING MODES.
8085 INTERNAL ARCHITECTURE.  Upon completing this topic, you should be able to: State all the register available in the 8085 microprocessor and explain.
Logic Gates Dr.Ahmed Bayoumi Dr.Shady Elmashad. Objectives  Identify the basic gates and describe the behavior of each  Combine basic gates into circuits.
I NTEL 8086 M icroprocessor بسم الله الرحمن الرحيم 1.
Chapter 2 TMS320C6000 Architectural Overview. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 2  Describe C6000 CPU.
Dr.Ahmed Bayoumi Dr.Shady Elmashad
Basic Computer Organization and Design
ELEC 418 Advanced Digital Systems Dr. Ron Hayne
The 8085 Microprocessor Architecture
Single-Cycle Datapath and Control
Control Unit Lecture 6.
TMS320C6713 Assembly Language
History – 2 Intel 8086.
UNIT – Microcontroller.
Chap 7. Register Transfers and Datapaths
/ Computer Architecture and Design
8086 Microprocessor.
Lecture 1 Introduction.
Introduction of microprocessor
Sega Dreamcast Visual Memory Unit FPGA Implementation
Dr. Michael Nasief Lecture 2
Microcomputer Programming
Details .L and .S units TMS320C6000.
An Introduction to Microprocessor Architecture using intel 8085 as a classic processor
Introduction to Microprocessors and Microcontrollers
Number Representations and Basic Processor Architecture
The TMS320C6x Family of DSPs
Single-Cycle CPU DataPath.
Chapter 8 Central Processing Unit
Instruction encoding We’ve already seen some important aspects of processor design. A datapath contains an ALU, registers and memory. Programmers and compilers.
Guest Lecturer TA: Shreyas Chand
Computer Architecture
INTRODUCTION TO DIGITAL SIGNAL PROCESSORS (DSPs)
Instruction encoding We’ve already seen some important aspects of processor design. A datapath contains an ALU, registers and memory. Programmers and compilers.
Digital Signal Processors-1
Control units In the last lecture, we introduced the basic structure of a control unit, and translated our assembly instructions into a binary representation.
Chapter 9 Bootloader.
Review: The whole processor
Basic components Instruction processing
INTRODUCTION TO THE TMS320C6000 VLIW DSP
ADSP 21065L.
Computer Architecture Assembly Language
Computer Operation 6/22/2019.
Presentation transcript:

Chapter 2 TMS320C6000 Architectural Overview

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 2  Describe C6000 CPU architecture.  Introduce some basic instructions.  Describe the C6000 memory map.  Provide an overview of the peripherals. Learning Objectives

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 3 General DSP System Block Diagram PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS CentralProcessing Unit Unit Internal Memory Internal Buses ExternalMemory

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 4 Implementation of Sum of Products (SOP) Implementation of Sum of Products (SOP) It has been shown in Chapter 1 that SOP is the key element for most DSP algorithms. So let’s write the code for this algorithm and at the same time discover the C6000 architecture. Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required Y = N  a n x n n = 1 * = a 1 * x 1 + a 2 * x a N * x N

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 5 Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required Implementation of Sum of Products (SOP) Implementation of Sum of Products (SOP) Y = N  a n x n n = 1 * So let’s implement the SOP algorithm! The implementation in this module will be done in assembly. = a 1 * x 1 + a 2 * x a N * x N

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 6 Multiply (MPY) The multiplication of a 1 by x 1 is done in assembly by the following instruction: MPYa1, x1, Y This instruction is performed by a multiplier unit that is called “.M” Y = N  a n x n n = 1 * = a 1 * x 1 + a 2 * x a N * x N

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 7 Multiply (.M unit).M.M Y = 40  a n x n n = 1 * The. M unit performs multiplications in hardware MPY.Ma1, x1, Y Note: 16-bit by 16-bit multiplier provides a 32-bit result. 32-bit by 32-bit multiplier provides a 64-bit result.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 8 Addition (.?).M.M.?.? Y = 40  a n x n n = 1 * MPY.Ma1, x1, prod ADD.?Y, prod, Y

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 9 Add (.L unit).M.M.L.L Y = 40  a n x n n = 1 * MPY.Ma1, x1, prod ADD.LY, prod, Y RISC processors such as the C6000 use registers to hold the operands, so lets change this code.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 10 Register File - A Y = 40  a n x n n = 1 * MPY.Ma1, x1, prod ADD.LY, prod, Y Let us correct this by replacing a, x, prod and Y by the registers as shown above..M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 11 Specifying Register Names Y = 40  a n x n n = 1 * MPY.MA0, A1, A3 ADD.LA4, A3, A4 The registers A0, A1, A3 and A4 contain the values to be used by the instructions..M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 12 Specifying Register Names Y = 40  a n x n n = 1 * MPY.MA0, A1, A3 ADD.LA4, A3, A4 Register File A contains 16 registers (A0 -A15) which are 32-bits wide..M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 13 Data loading Q: How do we load the operands into the registers?.M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 14 Load Unit “.D” A: The operands are loaded into the registers by loading them from the memory using the.D unit..M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y.D.D Data Memory Q: How do we load the operands into the registers?

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 15 Load Unit “.D” It is worth noting at this stage that the only way to access memory is through the.D unit..M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y.D.D Data Memory

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 16 Load Instruction Q: Which instruction(s) can be used for loading operands from the memory to the registers?.M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y.D.D Data Memory

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 17 Load Instructions (LDB, LDH,LDW,LDDW) A: The load instructions..M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y.D.D Data Memory Q: Which instruction(s) can be used for loading operands from the memory to the registers?

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 18 Using the Load Instructions Data 16-bits Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address. Also the addresses are 32-bit wide. address FFFFFFFF

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 19 The syntax for the load instruction is: Where: Rn is a register that contains the address of the operand to be loaded and Rm is the destination register. Using the Load Instructions Data a1 x1 prod 16-bits Y address FFFFFFFF LD *Rn,Rm

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 20 The syntax for the load instruction is: The question now is how many bytes are going to be loaded into the destination register? Using the Load Instructions Data a1 x1 prod 16-bits Y address FFFFFFFF LD *Rn,Rm

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 21 The syntax for the load instruction is: LD *Rn,Rm Using the Load Instructions Data a1 x1 prod 16-bits Y address FFFFFFFF The answer, is that it depends on the instruction you choose: LDB: loads one byte (8-bit) LDB: loads one byte (8-bit) LDH: loads half word (16-bit) LDH: loads half word (16-bit) LDW: loads a word (32-bit) LDW: loads a word (32-bit) LDDW: loads a double word (64-bit) LDDW: loads a double word (64-bit) Note: LD on its own does not exist.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 22 Using the Load Instructions Data 16-bits address FFFFFFFF 0xB0xA 0xD0xC Example: If we assume that A5 = 0x4 then: (1) LDB *A5, A7 ; gives A7 = 0x (2) LDH *A5,A7; gives A7 = 0x (3) LDW *A5,A7; gives A7 = 0x (4) LDDW *A5,A7:A6; gives A7:A6 = 0x x10x2 0x30x4 0x50x6 0x70x8 The syntax for the load instruction is: LD *Rn,Rm 01

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 23 Using the Load Instructions Data 16-bits address FFFFFFFF 0xB0xA 0xD0xC Question: If data can only be accessed by the load instruction and the.D unit, how can we load the register pointer Rn in the first place? 0x10x2 0x30x4 0x50x6 0x70x8 The syntax for the load instruction is: LD *Rn,Rm

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 24  The instruction MVKL will allow a move of a 16-bit constant into a register as shown below: MVKL.?a, A5 (‘a’ is a constant or label)  How many bits represent a full address? 32 bits  So why does the instruction not allow a 32-bit move? All instructions are 32-bit wide (see instruction opcode). Loading the Pointer Rn

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 25  To solve this problem another instruction is available: MVKH Loading the Pointer Rn eg. MVKH.?a, A5 (‘a’ is a constant or label) ah ahx al a A5 MVKL a, A5 MVKH a, A5  Finally, to move the 32-bit address to a register we can use:

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 26 Loading the Pointer Rn MVKL0x1234FABC, A5 A5 = 0xFFFFFABC ; Wrong Example 1 A5 = 0x MVKL0x1234FABC, A5 A5 = 0xFFFFFABC (sign extension) MVKH0x1234FABC, A5 A5 = 0x1234FABC ; OK Example 2 MVKH0x1234FABC, A5 A5 = 0x  Always use MVKL then MVKH, look at the following examples:

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 27 LDH, MVKL and MVKH.M.M.L.L A0A1A2A3A4A15 Register File A a x prod 32-bits Y.D.D Data Memory MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 LDH.D*A5, A0 LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 pt1 and pt2 point to some locations in the data memory.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 28 Creating a loop MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 LDH.D*A5, A0 LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 So far we have only implemented the SOP for one tap only, i.e. Y= a 1 * x 1 So let’s create a loop so that we can implement the SOP for N Taps.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 29 Creating a loop With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction. So far we have only implemented the SOP for one tap only, i.e. Y= a 1 * x 1 So let’s create a loop so that we can implement the SOP for N Taps.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 30 What are the steps for creating a loop 1. Create a label to branch to. 2. Add a branch instruction, B. 3.Create a loop counter. 4. Add an instruction to decrement the loop counter. 5. Make the branch conditional based on the value in the loop counter.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Create a label to branch to MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 32 MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.?loop 2. Add a branch instruction, B.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 33 Which unit is used by the B instruction?.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.D.D Data Memory.S.S MVKLpt1, A5 MVKLpt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.?loop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 34 Data Memory Which unit is used by the B instruction?.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.D.D.S.S MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.Sloop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 35 Data Memory 3. Create a loop counter..M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.D.D.S.S MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 MVKL.Scount, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.Sloop B registers will be introduced later

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Decrement the loop counter.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D Data Memory.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.S.S MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 MVKL.Scount, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 B.Sloop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 37  What is the syntax for making instruction conditional? [condition] InstructionLabel e.g. [B1]Bloop (1) The condition can be one of the following registers: A1, A2, B0, B1, B2. (2) Any instruction can be conditional. 5. Make the branch conditional based on the value in the loop counter

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 38  The condition can be inverted by adding the exclamation symbol “!” as follows: [!condition] InstructionLabel e.g. [!B0]Bloop ;branch if B0 = 0 [B0]Bloop;branch if B0 != 0 5. Make the branch conditional based on the value in the loop counter

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 39 Data Memory.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.D.D.S.S MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop 5. Make the branch conditional

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 40  Case 1: B.S1 label  Relative branch.  Label limited to +/ offset. More on the Branch Instruction (1)  With this processor all the instructions are encoded in a 32-bit.  Therefore the label must have a dynamic range of less than 32-bit as the instruction B has to be coded. 21-bit relative address B32-bit

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 41 More on the Branch Instruction (2)  By specifying a register as an operand instead of a label, it is possible to have an absolute branch.  This will allow a dynamic range of  Case 2: B.S2register  Absolute branch.  Operates on.S2 ONLY! 5-bit register code B32-bit

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 42 Testing the code This code performs the following operations: operations: a 0 *x 0 + a 0 *x 0 + a 0 *x 0 + … + a 0 *x 0 However, we would like to perform: a 0 *x 0 + a 1 *x 1 + a 2 *x 2 + … + a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 43 Modifying the pointers The solution is to modify the pointers A5 and A6. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 44 Indexing Pointers Description Pointer Syntax Pointer Modified *R*R*R*RNo R can be any register In this case the pointers are used but not modified.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 45 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] NoNoNo  [disp] specifies the number of elements size in DW (64-bit), W (32-bit), H (16-bit), or B (8-bit).  disp = R or 5-bit constant.  R can be any register. In this case the pointers are modified BEFORE being used and RESTORED to their previous values.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 46 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Pre-incrementPre-decrement Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] *++R[ disp ] *--R[ disp ] NoNoNoYesYes In this case the pointers are modified BEFORE being used and NOT RESTORED to their Previous Values.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 47 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Pre-incrementPre-decrementPost-incrementPost-decrement Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] *++R[ disp ] *--R[ disp ] *R++[ disp ] *R--[ disp ] NoNoNoYesYesYesYes In this case the pointers are modified AFTER being used and NOT RESTORED to their Previous Values.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 48 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Pre-incrementPre-decrementPost-incrementPost-decrement Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] *++R[ disp ] *--R[ disp ] *R++[ disp ] *R--[ disp ] NoNoNoYesYesYesYes  [disp] specifies # elements - size in DW, W, H, or B.  disp = R or 5-bit constant.  R can be any register.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 49 Modify and testing the code This code now performs the following operations: operations: a 0 *x 0 + a 1 *x 1 + a 2 *x a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 50 Store the final result This code now performs the following operations: operations: a 0 *x 0 + a 1 *x 1 + a 2 *x a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 51 Store the final result The Pointer A7 has not been initialised. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 52 Store the final result The Pointer A7 is now initialised. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2pt3, A7 MVKH.S2pt3, A7 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 53 What is the initial value of A4? A4 is used as an accumulator, so it needs to be reset to zero. so it needs to be reset to zero. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2pt3, A7 MVKH.S2pt3, A7 MVKL.S2count, B0 ZERO.LA4 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 54 How can we add more processing power to this processor?.S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A Data Memory A15 32-bits Increasing the processing power!

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 55 (1)Increase the clock frequency..S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A Data Memory A15 32-bits Increasing the processing power! (2)Increase the number of Processing units.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 56 To increase the Processing Power, this processor has two sides (A and B or 1 and 2) Data Memory.S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A A15 32-bits.S2.S2.M2.M2.L2.L2.D2.D2 B0 B1 B2 B3 B4 Register File B B15 32-bits

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 57 Can the two sides exchange operands in order to increase performance? Data Memory.S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A A15 32-bits B15.S2.S2.M2.M2.L2.L2.D2.D2 B0 B1 B2 B3 B4 Register File B bits

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 58 The answer is YES but there are limitations.  To exchange operands between the two sides, some cross paths or links are required. What is a cross path?  A cross path links one side of the CPU to the other.  There are two types of cross paths:  Data cross paths.  Address cross paths.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 59 Data Cross Paths  Data cross paths can also be referred to as register file cross paths.  These cross paths allow operands from one side to be used by the other side.  There are only two cross paths:  one path which conveys data from side B to side A, 1X.  one path which conveys data from side A to side B, 2X.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 60 TMS320C67x Data-Path

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 61 Data Cross Paths  Data cross paths only apply to the.L,.S and.M units.  The data cross paths are very useful, however there are some limitations in their use.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 62 Data Cross Path Limitations A 2x.L1.M1.S1 B 1x <src> <src> <dst> (1) The destination register must be on same side as unit. (2) Source registers - up to one cross path per execute packet per side. Execute packet: group of instructions that execute simultaneously.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 63 Cross Path Limitations Data Cross Path LimitationsA 2x.L1.M1.S1 B 1x <src> <src> <dst> eg: ADD.L1x A0,A1,B2 MPY.M1x A0,B6,A9 SUB.S1x A8,B2,A8 || ADD.L1x A0,B0,A2 || Means that the SUB and ADD belong to the same fetch packet, therefore execute simultaneously.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 64 Data Cross Path Limitations A 2x.L1.M1.S1 B 1x <src> <src> <dst> eg: ADD.L1x A0,A1,B2 MPY.M1x A0,B6,A9 SUB.S1x A8,B2,A8 || ADD.L1x A0,B0,A2 NOT VALID!

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 65 Data Cross Paths for both sides A 2x.L1.M1.S1 B 1x <src> <src> <dst>.L2.M2.S2 <dst> <src> <src>

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 66 Address cross paths.D1 A Addr Data LDW.D1T1 *A0,A5 STW.D1T1 A5,*A0 LDW.D1T1 *A0,A5 STW.D1T1 A5,*A0 (1) The pointer must be on the same side of the unit.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 67 Load or store to either side.D1 A *A0 B Data1A5 Data2B5 DA1 = T1 DA2 = T2 LDW.D1T1 *A0,A5 LDW.D1T2 *A0,B5 LDW.D1T1 *A0,A5 LDW.D1T2 *A0,B5

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 68 Standard Parallel Loads.D1 A A5 *A0 B B5.D2 Data1 *B0 LDW.D1T1 *A0,A5 LDW.D1T1 *A0,A5 || LDW.D2T2 *B0,B5 LDW.D1T1 *A0,A5 LDW.D1T1 *A0,A5 || LDW.D2T2 *B0,B5 DA1 = T1 DA2 = T2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 69 Parallel Load/Store using address cross paths.D1 A A5 *A0 B B5.D2 Data1 *B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T1 A5,*B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T1 A5,*B0 DA1 = T1 DA2 = T2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 70 Fill the blanks... Does this work?.D1 A *A0 B.D2 Data1 *B0 LDW.D1__ *A0,B5 LDW.D1__ *A0,B5 || STW.D2__ B6,*B0 LDW.D1__ *A0,B5 LDW.D1__ *A0,B5 || STW.D2__ B6,*B0 DA1 = T1 DA2 = T2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 71 Not Allowed! Parallel accesses: both cross or neither cross.D1 A *A0 B B5B6.D2 Data1 *B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T2 B6,*B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T2 B6,*B0 DA2 = T2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 72 Conditions Don’t Use Cross Paths  If a conditional register comes from the opposite side, it does NOT use a data or address cross-path.  Examples: [B2] ADD.L1 A2,A0,A4 [A1] LDW.D2 *B0,B5

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 73 CPU Ref Guide Full CPU Datapath (Pg 2-2) ‘C62x Data-Path Summary

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 74 ‘C67x Data-Path Summary ‘C67x

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 75 Cross Paths - Summary  Data  Destination register on same side as unit.  Source registers - up to one cross path per execute packet per side.  Use “x” to indicate cross-path.  Address  Pointer must be on same side as unit.  Data can be transferred to/from either side.  Parallel accesses: both cross or neither cross. Conditionals Don’t Use Cross Paths. Conditionals Don’t Use Cross Paths.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 76 Code Review (using side A only) MVK.S140, A2; A2 = 40, loop count loop:LDH.D1*A5++, A0; A0 = a(n) LDH.D1*A6++, A1; A1 = x(n) MPY.M1A0, A1, A3; A3 = a(n) * x(n) ADD.L1A3, A4, A4; Y = Y + A3 SUB.L1A2, 1, A2; decrement loop count [A2]B.S1loop; if A2  0, branch [A2]B.S1loop; if A2  0, branch STH.D1A4, *A7; *A7 = Y Y = 40  a n x n n = 1 * Note: Assume that A4 was previously cleared and the pointers are initialised.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 77 Let us have a look at the final details concerning the functional units. Consider first the case of the.L and.S units.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 78 So where do the 40-bit registers come from? Operands - 32/40-bit Register, 5-bit Constant  Operands can be:  5-bit constants (or 16-bit for MVKL and MVKH).  32-bit registers.  40-bit Registers.  However, we have seen that registers are only 32-bit.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 79  A 40-bit register can be obtained by concatenating two registers.  However, there are 3 conditions that need to be respected:  The registers must be from the same side.  The first register must be even and the second odd.  The registers must be consecutive. Operands - 32/40-bit Register, 5-bit Constant

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 80 A1:A0A3:A2A5:A4A7:A6A9:A8A11:A10A13:A12A15:A14 odd even : bit Reg 40-bit RegB1:B0B3:B2B5:B4B7:B6B9:B8B11:B10B13:B12B15:B14 odd even : 32 8 Operands - 32/40-bit Register, 5-bit Constant  All combinations of 40-bit registers are shown below:

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S Operands - 32/40-bit Register, 5-bit Constant instr.unit,, instr.unit,,

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 82 Operands - 32/40-bit Register, 5-bit Constant instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 83 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 84 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 85 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 ADD.L1 A2, A3, A5:A4 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 86 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 ADD.L1 A2, A3, A5:A4 SUB.L1 A2, A5:A4, A5:A4 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 87 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 ADD.L1 A2, A3, A5:A4 SUB.L1 A2, A5:A4, A5:A4 ADD.L2 3, B9:B8, B9:B8 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 88  To move the content of a register (A or B) to another register (B or A) use the move “MV” Instruction, e.g.: MV A0, B0 MV B6, B7  To move the content of a control register to another register (A or B) or vice-versa use the MVC instruction, e.g.: MVC IFR, A0 MVC A0, IRP Register to register data transfer

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 89 TMS320C6000 Instruction Set

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 90 'C62x Instruction Set (by category) Arithmetic ABS ADD ADDA ADDK ADD2 MPY MPYH NEG SMPY SMPYH SADD SAT SSUB SUB SUBA SUBC SUB2 ZERO Program Ctrl B IDLE NOP Logical AND CMPEQ CMPGT CMPLT NOT OR SHL SHR SSHL XOR Data Mgmt LDB/H/W MV MVC MVK MVKL MVKH MVKLH STB/H/W Bit Mgmt CLR EXT LMBD NORM SET Note: Refer to the 'C6000 CPU Reference Guide for more details.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 91 'C62x Instruction Set (by unit).S Unit MVKLH NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKL MVKH.M Unit SMPY SMPYH MPY MPYH.L Unit NOT OR SADD SAT SSUB SUB SUBC XOR ZERO ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM.D Unit STB/H/W SUB SUBA ZERO ADD ADDA LDB/H/W MV NEG Other IDLENOP Note: Refer to the 'C6000 CPU Reference Guide for more details.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 92 ' C6700: Superset of Fixed-Point ( by unit ).L.D.S.M No Unit Used IDLENOP.S Unit NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKL MVKH ABSSP ABSDP CMPGTSP CMPEQSP CMPLTSP CMPGTDP CMPEQDP CMPLTDP RCPSP RCPDP RSQRSP RSQRDP SPDP.L Unit NOT OR SADD SAT SSUB SUB SUBC XOR ZERO ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM ADDSP ADDDP SUBSP SUBDP INTSP INTDP SPINT DPINT SPRTUNC DPTRUNC DPSP.M Unit SMPY SMPYH MPY MPYH MPYLH MPYHL MPYSP MPYDP MPYI MPYID.D Unit NEG STB(B/H/W) SUB SUBAB (B/H/W) ZERO ADD ADDAB(B/H/W) ADDAD LDB(B/H/W) LDDW MV Note: Refer to the 'C6000 CPU Reference Guide for more details. ‘C67x

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 93 ‘C62x: Dual 32-Bit Load/Store ‘C64x: Dual 64-Bit Load/Store ‘C67x: Dual 64-Bit Load/32-Bit Store Instruction Decode Instruction Dispatch Instruction Fetch Control Registers Interrupt Control Emulation Registers (B0 - B15) D2 + M2 X S2 + + L2 + Registers (A0 - A15) D1 + M1 X S1 + + L1 + Advanced Instruction Packing Advanced Emulation Registers (B16 - B31)Registers (A16 - A31) + x x x x X x x x x X Superset of Fixed-Point

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 94 'C64x: Superset of ‘C62x Data Pack/Un PACK2 PACKH2 PACKLH2 PACKHL2 PACKH4 PACKL4 UNPKHU4 UNPKLU4 SWAP2/4 Dual/Quad Arith ABS2 ADD2 ADD4 MAX MIN SUB2 SUB4 SUBABS4 Bitwise Logical ANDN Shift & Merge SHLMB SHRMB Load Constant MVK (5-bit).L.D.S.M Bit Operations BITC4 BITR DEAL SHFL Move MVD Average AVG2 AVG4 Shifts ROTL SSHVL SSHVR Multiplies MPYHI MPYLI MPYHIR MPYLIR MPY2 SMPY2 DOTP2 DOTPN2 DOTPRSU2 DOTPNRSU2 DOTPU4 DOTPSU4 GMPY4 XPND2/4 Mem Access LDDW LDNW LDNDW STDW STNW STNDW Load Constant MVK (5-bit) Dual Arithmetic ADD2 SUB2 Bitwise Logical AND ANDN OR XOR Address Calc. ADDAD Data Pack/Un PACK2 PACKH2 PACKLH2 PACKHL2 UNPKHU4 UNPKLU4 SWAP2 SPACK2 SPACKU4 Dual/Quad Arith SADD2 SADDUS2 SADD4 Bitwise Logical ANDN Shifts & Merge SHR2 SHRU2 SHLMB SHRMB Compares CMPEQ2 CMPEQ4 CMPGT2 CMPGT4 Branches/PC BDEC BPOS BNOP ADDKPC

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 95 TMS320C6000 Memory

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 96 Memory size per device DevicesInternalEMIFAEMIFB C6201,C6701 C6204,C6205 P = 64 kB D = 64 kB 52M Bytes 52M Bytes (32-bits wide) (32-bits wide) N/A N/A C6202 P = 256 kB D = 128 kB C6203 P = 384 kB D = 512 kB C6211C6711 L1P= 4 kB L1D= 4 kB L2= 64 kB 128M Bytes (32-bits wide) (32-bits wide) N/A N/A C M Bytes (16-bits wide) (16-bits wide) C6713 L1P= 4 kB L1D= 4 kB L2= 256 kB 128M Bytes (32-bits wide) (32-bits wide)N/A C6411DM642 L1P= 16 kB L1D= 16 kB L2= 256 kB 128M Bytes (32-bits wide) (32-bits wide)N/A C6414C6415C6416 L1P=16 kB L1D=16 kB L2=1 MB 256M Bytes (64-bits wide) (64-bits wide) 64M Bytes 64M Bytes (16-bits wide) (16-bits wide)

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 97 Internal Memory Summary DevicesInternal(L2)External C6211C6711C kB 512M (32-bit wide) C kB 512M (16-bit wide) LINK: TMS320C6000 DSP Generation DevicesInternal(L2)ExternalC6414C6415C MB A:1GB (64-bit) B:256kB (16-bit) DM kB 1GB (64-bit) C kB 256MB (32-bit)

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 98 TMS320C6000 Peripherals

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 99 'C6x System Block Diagram PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS Memory Internal Buses ExternalMemory CPU.D1.M1.L1.S1.D2.M2.L2.S2 Regs (B0-B15) Regs (A0-A15) Control Regs

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 100 ‘C6x Internal Buses A D Internal Memory x32 A D External Interface A D x32 Peripherals can perform 64-bit data loads. ‘C67x Data Addr- T1 x32 Data Data- T1 x32/64 Data Addr- T2x32 Data Data- T2 x32/64 Aregs Bregs Program Addrx32 Program Datax256 PC DMA Addr- Readx32 DMA Data- Readx32 DMA Addr- Writex32 DMA Data- Writex32 DMA

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 101 PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS Memory 'C6x System Block Diagram Internal Buses CPU.D1.M1.L1.S1.D2.M2.L2.S2 Regs (B0-B15) Regs (A0-A15) Control Regs EMIF Ext’lMemory

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 102 PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS 'C6x System Block Diagram Internal Buses CPU.D1.M1.L1.S1.D2.M2.L2.S2 Regs (B0-B15) Regs (A0-A15) Control Regs EMIF Ext’lMemory - Sync - Async Program RAM Data Ram Addr D (32)

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 103 'C6000 Peripherals External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory Serial GPIO VCP / TCP DMA, EDMA (Boot) Timers PLL Parallel Comm EMIF Ethernet Video Ports

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 104EMIF.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory SDRAM External Memory Interface ( EMIF )   Glueless access to async/sync memory   Works with PC100/133 SDRAM (cheap, fast, and easy!)   Byte-wide data access   16, 32, or 64-bit bus widths External Memory Interface ( EMIF )   Glueless access to async/sync memory   Works with PC100/133 SDRAM (cheap, fast, and easy!)   Byte-wide data access   16, 32, or 64-bit bus widths Async SBSRAM EMIF

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 105 HPI / XBUS / PCI External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory Parallel Communication Interfaces HPI:Dedicated, slave-only, async 16/32-bit bus allows host-  P access to C6000 memory XBUS:Similar to HPI but provides …  Master/slave and sync modes  Glueless i/f to FIFOs (up to single-cycle xfer rate) PCI:Standard 32-bit, 33MHz/66MHz PCI interface These interfaces provide means to bootstrap the C6000 Parallel Communication Interfaces HPI:Dedicated, slave-only, async 16/32-bit bus allows host-  P access to C6000 memory XBUS:Similar to HPI but provides …  Master/slave and sync modes  Glueless i/f to FIFOs (up to single-cycle xfer rate) PCI:Standard 32-bit, 33MHz/66MHz PCI interface These interfaces provide means to bootstrap the C6000 EMIF Parallel Comm

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 106GPIO External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory GPIO EMIF General Purpose Input/Output (GPIO)   C64x and C6713 provide 8-16 bits of general purpose bit I/O   Use to observe or control the signal of a single-pin General Purpose Input/Output (GPIO)   C64x and C6713 provide 8-16 bits of general purpose bit I/O   Use to observe or control the signal of a single-pin Parallel Comm

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 107 McBSP and Utopia External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory GPIO EMIF Multi-Channel Audio Serial Port ( McASP )   McBSP features plus more …   Up to 8 stereo lines (16 channels)   IIC support   On DM642, C6713 Utopia ( C64x )   ATM connection   50 MHz wide area network connectivity Multi-Channel Audio Serial Port ( McASP )   McBSP features plus more …   Up to 8 stereo lines (16 channels)   IIC support   On DM642, C6713 Utopia ( C64x )   ATM connection   50 MHz wide area network connectivity Parallel Comm Multi-Channel Buffered Serial Port ( McBSP )   2 (or 3) full-duplex, synchronous serial-ports   Up to 100 Mb/sec performance   Supports multi-channel operation (T1, E1, MVIP, …) Multi-Channel Buffered Serial Port ( McBSP )   2 (or 3) full-duplex, synchronous serial-ports   Up to 100 Mb/sec performance   Supports multi-channel operation (T1, E1, MVIP, …) Serial

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 108 DMA / EDMA External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory GPIO EMIF Direct Memory Access (DMA / EDMA)   Transfers any set of memory locations to another   4 / 16 / 64 channels (transfer parameter sets)   Transfers can be triggered by any interrupt (sync)   Operates independent of CPU   On reset, provides bootstrap from memory Direct Memory Access (DMA / EDMA)   Transfers any set of memory locations to another   4 / 16 / 64 channels (transfer parameter sets)   Transfers can be triggered by any interrupt (sync)   Operates independent of CPU   On reset, provides bootstrap from memory Parallel Comm Serial DMA, EDMA (Boot)

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 109Timer/Counter External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory GPIO EMIF Timer / Counter   Two (or three) 32-bit timer/counters   Can generate interrupts   Both input and output pins Timer / Counter   Two (or three) 32-bit timer/counters   Can generate interrupts   Both input and output pins Parallel Comm Serial DMA, EDMA (Boot) Timers

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 110 Ethernet MAC External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory Serial GPIO VCP / TCP DMA, EDMA (Boot) Timers PLL Parallel Comm EMIF Ethernet Video Ports Ethernet (DM642 only)   10/100 Ethernet MAC   Pins are muxed with PCI   TCP/IP stack available from TI Ethernet (DM642 only)   10/100 Ethernet MAC   Pins are muxed with PCI   TCP/IP stack available from TI

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 111 Video Ports External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory Serial GPIO VCP / TCP DMA, EDMA (Boot) Timers PLL Parallel Comm EMIF Ethernet Video Ports Video Ports (DM642 only)   Each configurable for Capture or Display   Dual 8/10-bit BT656 or raw modes   16/20-bit raw modes and 20-bit Y/C for high definition   Horz Scaling and Chroma Resampling Support for 8-bit modes   Supports transport interface mode Video Ports (DM642 only)   Each configurable for Capture or Display   Dual 8/10-bit BT656 or raw modes   16/20-bit raw modes and 20-bit Y/C for high definition   Horz Scaling and Chroma Resampling Support for 8-bit modes   Supports transport interface mode

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 112 VCP / TCP -- 3G Wireless External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory McBSP’s Utopia GPIO DMA, EDMA (Boot) Timers EMIF Parallel Comm VCP / TCP PLL Video Ports Turbo Coprocessor (TCP) (C6416 only)   Supports 35 data channels at 384 kbps   3GPP / IS2000 Turbo coder   Programmable parameters include mode, rate and frame length Viterbi Coprocessor (VCP) (C6416 only)   Supports > 500 voice channels at 8 kbps   Programmable decoder parameters include constraint length, code rate, and frame length Turbo Coprocessor (TCP) (C6416 only)   Supports 35 data channels at 384 kbps   3GPP / IS2000 Turbo coder   Programmable parameters include mode, rate and frame length Viterbi Coprocessor (VCP) (C6416 only)   Supports > 500 voice channels at 8 kbps   Programmable decoder parameters include constraint length, code rate, and frame length

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 113 External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory GPIO DMA, EDMA (Boot) Timers EMIF Phase Locked Loop (PLL) Input  CLKIN Output   CLKOUT1   CLKOUT2 (reduced rate clkout) Input  CLKIN Output   CLKOUT1   CLKOUT2 (reduced rate clkout) Parallel Comm Serial VCP / TCP Timers PLL Ethernet Video Ports DMA, EDMA (Boot) PLL  Clock multiplier  Reduces EMI and cost  Rate is Pin selectable PLL  Clock multiplier  Reduces EMI and cost  Rate is Pin selectable

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 114 Clock Cycle What is a clock cycle? C6000 CLKIN CLKOUT2 (½, ¼, or 1/6 CLKOUT1) CLKOUT1 (C6000 clock cycle) PLL When we talk about cycles... The time between successive instructions CLKIN (MHz) PLL Rate CPU Clock Frequency CPU Clock Cycle Time MIPs (max) 60x MHz 1.39 ns x MHz 3.33 ns x4 200 MHz 5 ns x4 100 MHz 10 ns 800

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 115 'C6000 Peripherals Summary.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory External Memory GPIO EMIF Parallel Comm Serial Timers VCP / TCP Timers PLL Ethernet Video Ports DMA, EDMA (Boot)

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 116 ‘C6x Family Part Numbering  Example = TMS320LC6201PKGA200  TMS320= TI DSP  L = Place holder for voltage levels  C6 = C6x family  2 = Fixed-point core  01 = Memory/peripheral configuration  PKG= Pkg designator (actual letters TBD)  A = -40 to 85C (blank for 0 to 70C)  200 = Core CPU speed in Mhz

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 117 Module 1 Exam 1. Functional Units a. How many can perform an ADD? Name them. a. How many can perform an ADD? Name them. b. Which support memory loads/stores? b. Which support memory loads/stores?.M.S.D.L 2. Memory Map a. How many external ranges exist on ‘C6201?

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Conditional Code a.Which registers can be used as cond’l registers? b.Which instructions can be conditional? 4. Performance a.What is the 'C6711 instruction cycle time? b.How can the 'C6711 execute 1200 MIPs?

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Coding Problems a. Move contents of A0-->A1 a. Move contents of A0-->A1

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Coding Problems a. Move contents of A0-->A1 a. Move contents of A0-->A1 b. Move contents of CSR-->A1 b. Move contents of CSR-->A1 c. Clear register A5

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Coding Problems (cont’d) d. A2 = A0 2 + A1 d. A2 = A0 2 + A1 e. If (B1  0) then B2 = B5 * B6 e. If (B1  0) then B2 = B5 * B6 f. A2 = A0 * A g. Load an unsigned constant (19ABCh) into register A6.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Coding Problems (cont’d) h.Load A7 with contents of mem1 and post- increment the selected pointer. h.Load A7 with contents of mem1 and post- increment the selected pointer.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 123 Module 1 Exam (solution) 1. Functional Units a. How many can perform an ADD? Name them. a. How many can perform an ADD? Name them. b. Which support memory loads/stores? b. Which support memory loads/stores?.M.S.D.L six;.L1,.L2,.D1,.D2,.S1,.S2 2. Memory Map a. How many external ranges exist on ‘C6201? Four

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Conditional Code a.Which registers can be used as cond’l registers? b.Which instructions can be conditional? 4. Performance a.What is the 'C6711 instruction cycle time? b.How can the 'C6711 execute 1200 MIPs? A1, A2, B0, B1, B2 All of them CLKOUT MIPs = 8 instructions (units) x 150 MHz

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Coding Problems a. Move contents of A0-->A1 a. Move contents of A0-->A1 MV.L1A0, A1 orADD.S1A0, 0, A1 orMPY.M1A0, 1, A1 (what’s the problem with this?)

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Coding Problems a. Move contents of A0-->A1 a. Move contents of A0-->A1 b. Move contents of CSR-->A1 b. Move contents of CSR-->A1 c. Clear register A5 MV.L1A0, A1 orADD.S1A0, 0, A1 orMPY.M1A0, 1, A1 ZERO.S1A5 orSUB.L1A5, A5, A5 orMPY.M1A5, 0, A5 orCLR.S1A5, 0, 31, A5 orMVK.S10, A5 orXOR.L1A5,A5,A5 (A0 can only be a (A0 can only be a 16-bit value) MVCCSR, A1

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Coding Problems (cont’d) d. A2 = A0 2 + A1 d. A2 = A0 2 + A1 e. If (B1  0) then B2 = B5 * B6 e. If (B1  0) then B2 = B5 * B6 f. A2 = A0 * A g. Load an unsigned constant (19ABCh) into register A6. MPY.M1A0, A0, A2 ADD.L1A2, A1, A2 [B1] MPY.M2B5, B6, B2 mvkl.s10x00019abc,a6 mvkh.s10x00019abc,a6 value.equ0x00019abc mvkl.s1value,a6 mvkh.s1value,a6 MPYA0, A1, A2 ADD10, A2, A2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide Coding Problems (cont’d) h.Load A7 with contents of mem1 and post- increment the selected pointer. h.Load A7 with contents of mem1 and post- increment the selected pointer. x16 mem mem1 10h A7 load_mem1:MVKL.S1mem1,A6 load_mem1:MVKL.S1mem1,A6 MVKH.S1mem1,A6 LDH.D1*A6++,A7

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 129Architecture  Links:  C6711 data sheet: tms320c6711.pdf tms320c6711.pdf  C6713 data sheet: tms320c6713.pdf tms320c6713.pdf  tms320c6416.pdf  C6416 data sheet: tms320c6416.pdftms320c6416.pdf  User guide: spru189f.pdf spru189f.pdf  Errata: sprz173c.pdf sprz173c.pdf

Chapter 2 TMS320C6000 Architectural Overview - End -