Chapter 2 TMS320C6000 Architectural Overview. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 2  Describe C6000 CPU.

Slides:



Advertisements
Similar presentations
TMS320C6xx Instruction Set TMS320 C6xx
Advertisements

Chapter 9 Bootloader.
Architecture and Instruction Set of the C6x Processor Module 1.
Details.L and.S units TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004.
TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.
TMS320C6713 Assembly Language (cont’d). Module 1 Exam (solution) 1. Functional Units a. How many can perform an ADD? Name them. a. How many can perform.
Lecture 6 Programming the TMS320C6x Family of DSPs.
Assembly and Linear Assembly Evgeny Kirshin, 05/10/2011
TMS320 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Overview A.A
TMS320C6000 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Architectural Overview.
DSP Module 1 Open Exam. Module 1 Exam You have 20 minutes to complete the exam. Exam is open mind, open book, open eyes. Sharing answers, cheating, asking.
TMS320C6000 Architectural and Programming Overview.
Slider param_definition( minVal, maxVal, increment, pageIncrement, paramName ) { statements } param_definitionParameter description that is printed on.
TMS320C6000 Architectural Overview.  Describe C6000 CPU architecture.  Introduce some basic instructions.  Describe the C6000 memory map.  Provide.
ECSE DSP architecture Review of basic computer architecture concepts C6000 architecture: VLIW Principle and Scheduling Addressing Assembly and linear.
Chapter 9 Bootloader. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 9, Slide 2 Learning Objectives  Need for a bootloader.
Chapter 10 Interrupts. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 10, Slide 2 Learning Objectives  Introduction to interrupts.
TECHNICAL SEMINAR BINAY KUMAR MISHRA 1 TMS320C6211 AND MICRO-50 EB DIGITAL SIGNAL PROCESSORS PRESENTED BY BINAY KUMAR MISHRA ROLL # EI
Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02.
RICE UNIVERSITY Implementing the Viterbi algorithm on programmable processors Sridhar Rajagopal Elec 696
DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337.
ajay patil 1 TMS320C6000 Assembly Language and its Rules Assignment One of the simplest operations in C is to assign a constant to a variable: One.
MICROPROCESSORS Dr. Hugh Blanton ENTC TMS320C6x INSTRUCTION SET.
INTRODUCTION TO THE TMS320C6x VLIW DSP Prof. Brian L. Evans in collaboration with Niranjan Damera-Venkata and Magesh Valliappan Embedded Signal Processing.
Intel 8051 Another family of microcontroller is the Intel 8051 family. The basic 8051 microcontroller has four parallel input/output ports, port 0, 1,
DSP C5000 Chapter 10 Understanding and Programming the Host Port Interface (EHPI) Copyright © 2003 Texas Instruments. All rights reserved.
Computer Organization Instructions Language of The Computer (MIPS) 2.
Logic Gates Dr.Ahmed Bayoumi Dr.Shady Elmashad. Objectives  Identify the basic gates and describe the behavior of each  Combine basic gates into circuits.
I NTEL 8086 M icroprocessor بسم الله الرحمن الرحيم 1.
Chapter 2 TMS320C6000 Architectural Overview. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 2  Describe C6000 CPU.
Dr.Ahmed Bayoumi Dr.Shady Elmashad
ELEC 418 Advanced Digital Systems Dr. Ron Hayne
Addressing Modes in Microprocessors
Single-Cycle Datapath and Control
Control Unit Lecture 6.
TMS320C6713 Assembly Language
History – 2 Intel 8086.
Chap 7. Register Transfers and Datapaths
/ Computer Architecture and Design
Lecture 1 Introduction.
Morgan Kaufmann Publishers The Processor
Microcomputer Programming
Details .L and .S units TMS320C6000.
Assembly Lang. – Intel 8086 Addressing modes – 1
CS/COE0447 Computer Organization & Assembly Language
An Introduction to Microprocessor Architecture using intel 8085 as a classic processor
Chapter 5 The LC-3.
Subject Name: Digital Signal Processing Algorithms & Architecture
Interfacing Memory Interfacing.
SCHOOL OF ELECTRONICS ENGINEERING Electronics and Communication
The TMS320C6x Family of DSPs
Single-Cycle CPU DataPath.
Chapter 10 Interrupts.
Chapter 8 Central Processing Unit
Instruction encoding We’ve already seen some important aspects of processor design. A datapath contains an ALU, registers and memory. Programmers and compilers.
Unit – Microcontroller Tutorial Class - 2 ANITS College
Unit 12 CPU Design & Programming
Recall: ROM example Here are three functions, V2V1V0, implemented with an 8 x 3 ROM. Blue crosses (X) indicate connections between decoder outputs and.
INTRODUCTION TO DIGITAL SIGNAL PROCESSORS (DSPs)
Data manipulation instructions
Branch instructions We’ll implement branch instructions for the eight different conditions shown here. Bits 11-9 of the opcode field will indicate the.
Chapter 12 Software Optimisation
COMS 361 Computer Organization
Instruction encoding We’ve already seen some important aspects of processor design. A datapath contains an ALU, registers and memory. Programmers and compilers.
Digital Signal Processors-1
Control units In the last lecture, we introduced the basic structure of a control unit, and translated our assembly instructions into a binary representation.
Chapter 9 Bootloader.
INTRODUCTION TO THE TMS320C6000 VLIW DSP
ADSP 21065L.
Computer Operation 6/22/2019.
Presentation transcript:

Chapter 2 TMS320C6000 Architectural Overview

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 2  Describe C6000 CPU architecture.  Introduce some basic instructions.  Describe the C6000 memory map.  Provide an overview of the peripherals. Learning Objectives

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 3 General DSP System Block Diagram PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS CentralProcessing Unit Unit Internal Memory Internal Buses ExternalMemory

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 4 Implementation of Sum of Products (SOP) Implementation of Sum of Products (SOP) It has been shown in Chapter 1 that SOP is the key element for most DSP algorithms. So let’s write the code for this algorithm and at the same time discover the C6000 architecture. Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required Y = N  a n x n n = 1 * = a 1 * x 1 + a 2 * x a N * x N

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 5 Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required Implementation of Sum of Products (SOP) Implementation of Sum of Products (SOP) Y = N  a n x n n = 1 * So let’s implement the SOP algorithm! The implementation in this module will be done in assembly. = a 1 * x 1 + a 2 * x a N * x N

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 6 Multiply (MPY) The multiplication of a 1 by x 1 is done in assembly by the following instruction: MPYa1, x1, Y This instruction is performed by a multiplier unit that is called “.M” Y = N  a n x n n = 1 * = a 1 * x 1 + a 2 * x a N * x N

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 7 Multiply (.M unit).M.M Y = 40  a n x n n = 1 * The. M unit performs multiplications in hardware MPY.Ma1, x1, Y Note: 16-bit by 16-bit multiplier provides a 32-bit result. 32-bit by 32-bit multiplier provides a 64-bit result.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 8 Addition (.?).M.M.?.? Y = 40  a n x n n = 1 * MPY.Ma1, x1, prod ADD.?Y, prod, Y

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 9 Add (.L unit).M.M.L.L Y = 40  a n x n n = 1 * MPY.Ma1, x1, prod ADD.LY, prod, Y RISC processors such as the C6000 use registers to hold the operands, so lets change this code.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 10 Register File - A Y = 40  a n x n n = 1 * MPY.Ma1, x1, prod ADD.LY, prod, Y.M.M.L.L A0A1A2A3A4A15 Register File A a1 x1 prod 32-bits Y Let us correct this by replacing a, x, prod and Y by the registers as shown above.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 11 Specifying Register Names Y = 40  a n x n n = 1 * MPY.MA0, A1, A3 ADD.LA4, A3, A4 The registers A0, A1, A3 and A4 contain the values to be used by the instructions..M.M.L.L A0A1A2A3A4A15 Register File A a1 x1 prod 32-bits Y

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 12 Specifying Register Names Y = 40  a n x n n = 1 * MPY.MA0, A1, A3 ADD.LA4, A3, A4 Register File A contains 16 registers (A0 -A15) which are 32-bits wide..M.M.L.L A0A1A2A3A4A15 Register File A a1 x1 prod 32-bits Y

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 13 Data loading Q: How do we load the operands into the registers?.M.M.L.L A0A1A2A3A4A15 Register File A a1 x1 prod 32-bits Y

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 14 Load Unit “.D” A: The operands are loaded into the registers by loading them from the memory using the.D unit..M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y.D.D Data Memory Q: How do we load the operands into the registers?

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 15 Load Unit “.D” It is worth noting at this stage that the only way to access memory is through the.D unit..M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y.D.D Data Memory

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 16 Load Instruction Q: Which instruction(s) can be used for loading operands from the memory to the registers?.M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y.D.D Data Memory

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 17 Load Instructions (LDB, LDH,LDW,LDDW) A: The load instructions..M.M.L.L A0A1A2A3A15 Register File A a1 x1 prod 32-bits Y.D.D Data Memory Q: Which instruction(s) can be used for loading operands from the memory to the registers?

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 18 Using the Load Instructions Data 16-bits Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address. Also the addresses are 32-bit wide. address FFFFFFFF

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 19 The syntax for the load instruction is: Where: Rn is a register that contains the address of the operand to be loaded and Rm is the destination register. Using the Load Instructions Data a1 x1 prod 16-bits Y address FFFFFFFF LD *Rn,Rm

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 20 The syntax for the load instruction is: The question now is how many bytes are going to be loaded into the destination register? Using the Load Instructions Data a1 x1 prod 16-bits Y address FFFFFFFF LD *Rn,Rm

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 21 The syntax for the load instruction is: LD *Rn,Rm Using the Load Instructions Data a1 x1 prod 16-bits Y address FFFFFFFF The answer, is that it depends on the instruction you choose: LDB: loads one byte (8-bit) LDB: loads one byte (8-bit) LDH: loads half word (16-bit) LDH: loads half word (16-bit) LDW: loads a word (32-bit) LDW: loads a word (32-bit) LDDW: loads a double word (64-bit) LDDW: loads a double word (64-bit) Note: LD on its own does not exist.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 22 Using the Load Instructions Data 16-bits address FFFFFFFF 0xB0xA 0xD0xC Example: If we assume that A5 = 0x4 then: (1) LDB *A5, A7 ; gives A7 = 0x (2) LDH *A5,A7; gives A7 = 0x (3) LDW *A5,A7; gives A7 = 0x (4) LDDW *A5,A7:A6; gives A7:A6 = 0x x10x2 0x30x4 0x50x6 0x70x8 The syntax for the load instruction is: LD *Rn,Rm 01

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 23 Using the Load Instructions Data 16-bits address FFFFFFFF 0xB0xA 0xD0xC Question: If data can only be accessed by the load instruction and the.D unit, how can we load the register pointer Rn in the first place? 0x10x2 0x30x4 0x50x6 0x70x8 The syntax for the load instruction is: LD *Rn,Rm

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 24  The instruction MVKL will allow a move of a 16-bit constant into a register as shown below: MVKL.?a, A5 (‘a’ is a constant or label)  How many bits represent a full address? 32 bits  So why does the instruction not allow a 32-bit move? All instructions are 32-bit wide (see instruction opcode). Loading the Pointer Rn

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 25  To solve this problem another instruction is available: MVKH Loading the Pointer Rn eg. MVKH.?a, A5 (‘a’ is a constant or label) ah ahx al a A5 MVKL a, A5 MVKH a, A5  Finally, to move the 32-bit address to a register we can use:

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 26 Loading the Pointer Rn MVKL0x1234FABC, A5 A5 = 0xFFFFFABC ; Wrong Example 1 A5 = 0x MVKL0x1234FABC, A5 A5 = 0xFFFFFABC (sign extension) MVKH0x1234FABC, A5 A5 = 0x1234FABC ; OK Example 2 MVKH0x1234FABC, A5 A5 = 0x  Always use MVKL then MVKH, look at the following examples:

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 27 LDH, MVKL and MVKH.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D Data Memory MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 LDH.D*A5, A0 LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 28 Creating a loop MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 LDH.D*A5, A0 LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 So far we have only implemented the SOP for one tap only, i.e. Y= a 1 * x 1 So let’s create a loop so that we can implement the SOP for N Taps.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 29 Creating a loop With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction. So far we have only implemented the SOP for one tap only, i.e. Y= a 1 * x 1 So let’s create a loop so that we can implement the SOP for N Taps.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 30 What are the steps for creating a loop 1. Create a label to branch to. 2. Add a branch instruction, B. 3.Create a loop counter. 4. Add an instruction to decrement the loop counter. 5. Make the branch conditional based on the value in the loop counter.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide Create a label to branch to MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 32 MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.?loop 2. Add a branch instruction, B.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 33 Which unit is used by the B instruction?.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.D.D Data Memory.S.S MVKLpt1, A5 MVKLpt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.?loop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 34 Data Memory Which unit is used by the B instruction?.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.D.D.S.S MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.Sloop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 35 Data Memory 3. Create a loop counter..M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.D.D.S.S MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 MVKL.Scount, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.Sloop B registers will be introduced later

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide Decrement the loop counter.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D Data Memory.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.S.S MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 MVKL.Scount, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 B.Sloop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 37  What is the syntax for making instruction conditional? [condition] InstructionLabel e.g. [B1]Bloop (1) The condition can be one of the following registers: A1, A2, B0, B1, B2. (2) Any instruction can be conditional. 5. Make the branch conditional based on the value in the loop counter

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 38  The condition can be inverted by adding the exclamation symbol “!” as follows: [!condition] InstructionLabel e.g. [!B0]Bloop ;branch if B0 = 0 [B0]Bloop;branch if B0 != 0 5. Make the branch conditional based on the value in the loop counter

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 39 Data Memory.M.M.L.L A0A1A2A3A15 Register File A a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A a x prod 32-bits Y.D.D.S.S MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop 5. Make the branch conditional

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 40  Case 1: B.S1 label  Relative branch.  Label limited to +/ offset. More on the Branch Instruction (1)  With this processor all the instructions are encoded in a 32-bit.  Therefore the label must have a dynamic range of less than 32-bit as the instruction B has to be coded. 21-bit relative address B32-bit

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 41 More on the Branch Instruction (2)  By specifying a register as an operand instead of a label, it is possible to have an absolute branch.  This will allow a dynamic range of  Case 2: B.S2register  Absolute branch.  Operates on.S2 ONLY! 5-bit register code B32-bit

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 42 Testing the code This code performs the following operations: operations: a 0 *x 0 + a 0 *x 0 + a 0 *x 0 + … + a 0 *x 0 However, we would like to perform: a 0 *x 0 + a 1 *x 1 + a 2 *x 2 + … + a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 43 Modifying the pointers The solution is to modify the pointers A5 and A6. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 44 Indexing Pointers Description Pointer Syntax Pointer Modified *R*R*R*RNo R can be any register In this case the pointers are used but not modified.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 45 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] NoNoNo  [disp] specifies the number of elements size in DW (64-bit), W (32-bit), H (16-bit), or B (8-bit).  disp = R or 5-bit constant.  R can be any register. In this case the pointers are modified BEFORE being used and RESTORED to their previous values.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 46 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Pre-incrementPre-decrement Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] *++R[ disp ] *--R[ disp ] NoNoNoYesYes In this case the pointers are modified BEFORE being used and NOT RESTORED to their Previous Values.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 47 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Pre-incrementPre-decrementPost-incrementPost-decrement Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] *++R[ disp ] *--R[ disp ] *R++[ disp ] *R--[ disp ] NoNoNoYesYesYesYes In this case the pointers are modified AFTER being used and NOT RESTORED to their Previous Values.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 48 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Pre-incrementPre-decrementPost-incrementPost-decrement Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] *++R[ disp ] *--R[ disp ] *R++[ disp ] *R--[ disp ] NoNoNoYesYesYesYes  [disp] specifies # elements - size in DW, W, H, or B.  disp = R or 5-bit constant.  R can be any register.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 49 Modify and testing the code This code now performs the following operations: operations: a 0 *x 0 + a 1 *x 1 + a 2 *x a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 50 Store the final result This code now performs the following operations: operations: a 0 *x 0 + a 1 *x 1 + a 2 *x a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 51 Store the final result The Pointer A7 has not been initialised. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 52 Store the final result The Pointer A7 is now initialised. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2pt3, A7 MVKH.S2pt3, A7 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 53 What is the initial value of A4? A4 is used as an accumulator, so it needs to be reset to zero. so it needs to be reset to zero. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2pt3, A7 MVKH.S2pt3, A7 MVKL.S2count, B0 ZERO.LA4 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 54 How can we add more processing power to this processor?.S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A Data Memory A15 32-bits Increasing the processing power!

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 55 (1)Increase the clock frequency..S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A Data Memory A15 32-bits Increasing the processing power! (2)Increase the number of Processing units.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 56 To increase the Processing Power, this processor has two sides (A and B or 1 and 2) Data Memory.S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A A15 32-bits.S2.S2.M2.M2.L2.L2.D2.D2 B0 B1 B2 B3 B4 Register File B B15 32-bits

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 57 Can the two sides exchange operands in order to increase performance? Data Memory.S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A A15 32-bits B15.S2.S2.M2.M2.L2.L2.D2.D2 B0 B1 B2 B3 B4 Register File B bits

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 58 The answer is YES but there are limitations.  To exchange operands between the two sides, some cross paths or links are required. What is a cross path?  A cross path links one side of the CPU to the other.  There are two types of cross paths:  Data cross paths.  Address cross paths.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 59 Data Cross Paths  Data cross paths can also be referred to as register file cross paths.  These cross paths allow operands from one side to be used by the other side.  There are only two cross paths:  one path which conveys data from side B to side A, 1X.  one path which conveys data from side A to side B, 2X.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 60 TMS320C67x Data-Path

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 61 Data Cross Paths  Data cross paths only apply to the.L,.S and.M units.  The data cross paths are very useful, however there are some limitations in their use.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 62 Data Cross Path Limitations A 2x.L1.M1.S1 B 1x <src> <src> <dst> (1) The destination register must be on same side as unit. (2) Source registers - up to one cross path per execute packet per side. Execute packet: group of instructions that execute simultaneously.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 63 Cross Path Limitations Data Cross Path LimitationsA 2x.L1.M1.S1 B 1x <src> <src> <dst> eg: ADD.L1x A0,A1,B2 MPY.M1x A0,B6,A9 SUB.S1x A8,B2,A8 || ADD.L1x A0,B0,A2 || Means that the SUB and ADD belong to the same fetch packet, therefore execute simultaneously.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 64 Data Cross Path Limitations A 2x.L1.M1.S1 B 1x <src> <src> <dst> eg: ADD.L1x A0,A1,B2 MPY.M1x A0,B6,A9 SUB.S1x A8,B2,A8 || ADD.L1x A0,B0,A2 NOT VALID!

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 65 Data Cross Paths for both sides A 2x.L1.M1.S1 B 1x <src> <src> <dst>.L2.M2.S2 <dst> <src> <src>

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 66 Address cross paths.D1 A Addr Data LDW.D1T1 *A0,A5 STW.D1T1 A5,*A0 LDW.D1T1 *A0,A5 STW.D1T1 A5,*A0 (1) The pointer must be on the same side of the unit.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 67 Load or store to either side.D1 A *A0 B Data1A5 Data2B5 DA1 = T1 DA2 = T2 LDW.D1T1 *A0,A5 LDW.D1T2 *A0,B5 LDW.D1T1 *A0,A5 LDW.D1T2 *A0,B5

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 68 Standard Parallel Loads.D1 A A5 *A0 B B5.D2 Data1 *B0 LDW.D1T1 *A0,A5 LDW.D1T1 *A0,A5 || LDW.D2T2 *B0,B5 LDW.D1T1 *A0,A5 LDW.D1T1 *A0,A5 || LDW.D2T2 *B0,B5 DA1 = T1 DA2 = T2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 69 Parallel Load/Store using address cross paths.D1 A A5 *A0 B B5.D2 Data1 *B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T1 A5,*B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T1 A5,*B0 DA1 = T1 DA2 = T2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 70 Fill the blanks... Does this work?.D1 A *A0 B.D2 Data1 *B0 LDW.D1__ *A0,B5 LDW.D1__ *A0,B5 || STW.D2__ B6,*B0 LDW.D1__ *A0,B5 LDW.D1__ *A0,B5 || STW.D2__ B6,*B0 DA1 = T1 DA2 = T2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 71 Not Allowed!.D1 A *A0 B B5B6.D2 Data1 *B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T2 B6,*B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T2 B6,*B0 DA2 = T2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 72 Not Allowed! Parallel accesses: both cross or neither cross.D1 A *A0 B B5B6.D2 Data1 *B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T2 B6,*B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T2 B6,*B0 DA2 = T2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 73 Conditions Don’t Use Cross Paths  If a conditional register comes from the opposite side, it does NOT use a data or address cross-path.  Examples: [B2] ADD.L1 A2,A0,A4 [A1] LDW.D2 *B0,B5

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 74 CPU Ref Guide Full CPU Datapath (Pg 2-2) ‘C62x Data-Path Summary

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 75 ‘C67x Data-Path Summary ‘C67x

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 76 Cross Paths - Summary  Data  Destination register on same side as unit.  Source registers - up to one cross path per execute packet per side.  Use “x” to indicate cross-path.  Address  Pointer must be on same side as unit.  Data can be transferred to/from either side.  Parallel accesses: both cross or neither cross. Conditionals Don’t Use Cross Paths. Conditionals Don’t Use Cross Paths.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 77 Code Review (using side A only) MVK.S140, A2; A2 = 40, loop count loop:LDH.D1*A5++, A0; A0 = a(n) LDH.D1*A6++, A1; A1 = x(n) MPY.M1A0, A1, A3; A3 = a(n) * x(n) ADD.L1A3, A4, A4; Y = Y + A3 SUB.L1A2, 1, A2; decrement loop count [A2]B.S1loop; if A2  0, branch [A2]B.S1loop; if A2  0, branch STH.D1A4, *A7; *A7 = Y Y = 40  a n x n n = 1 * Note: Assume that A4 was previously cleared and the pointers are initialised.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 78 Let us have a look at the final details concerning the functional units. Consider first the case of the.L and.S units.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 79 So where do the 40-bit registers come from? Operands - 32/40-bit Register, 5-bit Constant  Operands can be:  5-bit constants (or 16-bit for MVKL and MVKH).  32-bit registers.  40-bit Registers.  However, we have seen that registers are only 32-bit.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 80  A 40-bit register can be obtained by concatenating two registers.  However, there are 3 conditions that need to be respected:  The registers must be from the same side.  The first register must be even and the second odd.  The registers must be consecutive. Operands - 32/40-bit Register, 5-bit Constant

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 81 A1:A0A3:A2A5:A4A7:A6A9:A8A11:A10A13:A12A15:A14 odd even : bit Reg 40-bit RegB1:B0B3:B2B5:B4B7:B6B9:B8B11:B10B13:B12B15:B14 odd even : 32 8 Operands - 32/40-bit Register, 5-bit Constant  All combinations of 40-bit registers are shown below:

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S Operands - 32/40-bit Register, 5-bit Constant instr.unit,, instr.unit,,

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 83 Operands - 32/40-bit Register, 5-bit Constant instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 84 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 85 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 86 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 ADD.L1 A2, A3, A5:A4 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 87 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 ADD.L1 A2, A3, A5:A4 SUB.L1 A2, A5:A4, A5:A4 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 88 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 ADD.L1 A2, A3, A5:A4 SUB.L1 A2, A5:A4, A5:A4 ADD.L2 3, B9:B8, B9:B8 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 89  To move the content of a register (A or B) to another register (B or A) use the move “MV” Instruction, e.g.: MV A0, B0 MV B6, B7  To move the content of a control register to another register (A or B) or vice-versa use the MVC instruction, e.g.: MVC IFR, A0 MVC A0, IRP Register to register data transfer

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 90 TMS320C6211/6711 Instruction Set

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 91 'C6211 Instruction Set (by category) Arithmetic ABS ADD ADDA ADDK ADD2 MPY MPYH NEG SMPY SMPYH SADD SAT SSUB SUB SUBA SUBC SUB2 ZERO Program Ctrl B IDLE NOP Logical AND CMPEQ CMPGT CMPLT NOT OR SHL SHR SSHL XOR Data Mgmt LDB/H/W MV MVC MVK MVKL MVKH MVKLH STB/H/W Bit Mgmt CLR EXT LMBD NORM SET Note: Refer to the 'C6000 CPU Reference Guide for more details.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 92 'C6211 Instruction Set (by unit).S Unit MVKLH NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKL MVKH.M Unit SMPY SMPYH MPY MPYH.L Unit NOT OR SADD SAT SSUB SUB SUBC XOR ZERO ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM.D Unit STB/H/W SUB SUBA ZERO ADD ADDA LDB/H/W MV NEG Other IDLENOP Note: Refer to the 'C6000 CPU Reference Guide for more details.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 93 ‘C6711 Additional Instructions (by unit).S Unit CMPLTDP RCPSP RCPDP RSQRSP RSQRDP SPDP ABSSP ABSDP CMPGTSP CMPEQSP CMPLTSP CMPGTDP CMPEQDP.M Unit MPYI MPYID MPYSP MPYDP.L Unit INTSP INTSPU SPINT SPTRUNC SUBSP SUBDP ADDDP ADDSP DPINT DPSP INTDP INTDPU.D Unit ADDADLDDW Note: Refer to the 'C6000 CPU Reference Guide for more details. ‘C67x

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 94 TMS320C6211/6711 Memory Map

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 95 ‘C6211 Memory Map Byte Address FFFF_FFFF 0000_ K x 8 Internal (L2 cache) Internal Memory  Unified (data or prog)  4 blocks - each can be RAM or cache On-chip Peripherals 0180_0000 External Memory   Async (SRAM, ROM, etc.)   Sync (SBSRAM, SDRAM) 256M x 8 External _ _0000 A000_0000 B000_ Level 1 Cache   4KB Program   4KB Data   Not in mapCPU L264K 4KP 4KD

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 96 TMS320C6211/6711 Peripherals

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 97 'C6x System Block Diagram PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS Memory Internal Buses ExternalMemory CPU.D1.M1.L1.S1.D2.M2.L2.S2 Regs (B0-B15) Regs (A0-A15) Control Regs

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 98 ‘C6x Internal Buses A D Internal Memory x32 A D External Interface A D x32 Peripherals can perform 64-bit data loads. ‘C67x Data Addr- T1 x32 Data Data- T1 x32/64 Data Addr- T2x32 Data Data- T2 x32/64 Aregs Bregs Program Addrx32 Program Datax256 PC DMA Addr- Readx32 DMA Data- Readx32 DMA Addr- Writex32 DMA Data- Writex32 DMA

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 99 PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS Memory 'C6x System Block Diagram Internal Buses CPU.D1.M1.L1.S1.D2.M2.L2.S2 Regs (B0-B15) Regs (A0-A15) Control Regs EMIF Ext’lMemory

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 100 ‘C M x 8 External 0 Int’l Prog (64K instr) Int’l Data ( 128K bytes ) On-chip Peripherals 4M x 8 External 1 16M x 8 External K x 8 Internal (L2 cache) On-chip Peripherals 256M x 8 External ‘C6211 ‘C6201/11 Memory Maps

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 101 PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS 'C6x System Block Diagram Internal Buses CPU.D1.M1.L1.S1.D2.M2.L2.S2 Regs (B0-B15) Regs (A0-A15) Control Regs EMIF Ext’lMemory - Sync - Async Program RAM Data Ram Addr D (32)

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 102 'C6x Peripherals ‘C6xCPU‘C6xCPU EMIFEMIF DMADMA BootBootExternalMemory EMIF (External Memory Interface) - Glueless access to async/sync memory EPROM, SRAM, SDRAM, SBSRAM DMA/EDMA (Enhance Direct Memory Acces) - 4/16 Channels BOOT - Boot from 4M external block - Boot from HPI/XB McBSPMcBSPHPI/XBHPI/XB TimerTimer PLLPLL McBSP (Multi-Channel Buffered Serial Port) - High speed sync serial comm - T1/E1/MVIP interface HPI (Host Port Interface) /Expansion Bus (XB) - 16/32-bit host  P access Timer/Counters - Two 32-bit Timer/Counters

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 103 Clocking - Basic Definitions What is a “clock cycle”? ‘C6x ‘C6x CLKIN CLKOUT2 (1/2 CLKOUT1) CLKOUT1 (‘C6x clock cycle) PLL x1 x4 CLKIN - MHz PLL CLKOUT2 - MHz MIPs (max) 250x x x x CLKOUT1 - MHz 250 (4ns) 200 (5ns) (10ns) When we talk about cycles...

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 104 'C6x System Block Diagram (Final) Internal Buses CPU.D1.M1.L1.S1.D2.M2.L2.S2 Regs (B0-B15) Regs (A0-A15) Control Regs EMIF Ext’lMemory - Sync - Async Program RAM Data Ram D (32) Serial Port Host Port Boot Load Timers Pwr Down DMA Addr

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 105 Program Memory Data Memory L2 Memory ’C6201B 64KB 1blkPgm/Cache 64KB 2blks 4 banksea External ’C KB 1blkPgm/Cache 64KB 2blks 8 banksea External ’C KB 1blkPgm/Cache 1blk MappedPgm 128 KB 2blks 4 banksea External ’C6211/C KB 1blk Cache 4 KB 1blk Cache 64 KB 4blk Mapped Cache L1 Memory Internal Memory Summary TMS320C67x DSP Generation Parametric Table TMS320C64x DSP Generation Parametric Table TMS320C62x DSP Generation Parametric Table

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 106 ‘C6000 Device Summary DeviceMIPSMHz Kbytes pinsmmW$Periphs 6201B D2H D3X E2H TMS320MFLOPS MHz Kbytes pinsmmW$Periphs D2H E2H Peripherals Legend: D,E:DMA,EDMA 2,3:# of McBSPs H,X:HPI, XBUS TMS320C62x DSP Generation Parametric Table TMS320C67x DSP Generation Parametric Table TMS320C64x DSP Generation Parametric Table

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 107 ’C6000 History 6201 r11Q97 coincident ‘C6x architectural announcement. Sample CPU core, minimal peripherals r2 4Q97. Full production, with peripherals. 6201B 4Q98. Power reduced,.18 micron silicon, double ports into internal data memory Q98. Pin-for-pin compatible floating-point version of ‘C GFLOP 167MHz) performance Q MHz. 2-3x 6201 on-chip memory. Replaced HPI with Expansion Bus (32-bit HPI + more) Q99. 2 cents per MIPS! 150MHz as low as $25. Double-level cache, enhanced DMA. 6711Announced 3/1/ floating-point CPU with 6211-like memory/peripherals. Volume pricing under $20.

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 108 ‘C6x Family Part Numbering  Example = TMS320LC6201PKGA200  TMS320= TI DSP  L = Place holder for voltage levels  C6 = C6x family  2 = Fixed-point core  01 = Memory/peripheral configuration  PKG= Pkg designator (actual letters TBD)  A = -40 to 85C (blank for 0 to 70C)  200 = Core CPU speed in Mhz

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 109 Device Summary Table Device Int Mem Ext MemPeripherals 6201/670164K Data3 x 16MDMA 16K Instr1 x 4M2 McBSP HPI (16-bit) 2 Timer/Counters (32-bit) K Data3 x 16MDMA 48K Instr1 x 4M2 McBSP 4 x 256M ----XBus (32-bit) 2 Timer/Counters (32-bit) 62114K Data Cache4 x 256MEDMA 4K Prog Cache2 McBSP 64K RAM/CacheHPI (16-bit) 2 Timer/Counters (32-bit)

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 110 Module 1 Exam 1. Functional Units a. How many can perform an ADD? Name them. a. How many can perform an ADD? Name them. b. Which support memory loads/stores? b. Which support memory loads/stores?.M.S.D.L six;.L1,.L2,.D1,.D2,.S1,.S2 2. Memory Map a. How many external ranges exist on ‘C6201? Four

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide Conditional Code a.Which registers can be used as cond’l registers? b.Which instructions can be conditional? 4. Performance a.What is the 'C6711 instruction cycle time? b.How can the 'C6711 execute 1200 MIPs? A1, A2, B0, B1, B2 All of them CLKOUT MIPs = 8 instructions (units) x 150 MHz

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide Coding Problems a. Move contents of A0-->A1 a. Move contents of A0-->A1 MV.L1A0, A1 orADD.S1A0, 0, A1 orMPY.M1A0, 1, A1 (what’s the problem with this?)

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide Coding Problems a. Move contents of A0-->A1 a. Move contents of A0-->A1 b. Move contents of CSR-->A1 b. Move contents of CSR-->A1 c. Clear register A5 MV.L1A0, A1 orADD.S1A0, 0, A1 orMPY.M1A0, 1, A1 ZERO.S1A5 orSUB.L1A5, A5, A5 orMPY.M1A5, 0, A5 orCLR.S1A5, 0, 31, A5 orMVK.S10, A5 orXOR.L1A5,A5,A5 (A0 can only be a (A0 can only be a 16-bit value) MVCCSR, A1

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide Coding Problems (cont’d) d. A2 = A0 2 + A1 d. A2 = A0 2 + A1 e. If (B1  0) then B2 = B5 * B6 e. If (B1  0) then B2 = B5 * B6 f. A2 = A0 * A g. Load an unsigned constant (19ABCh) into register A6. MPY.M1A0, A0, A2 ADD.L1A2, A1, A2 [B1] MPY.M2B5, B6, B2 mvkl.s10x00019abc,a6 mvkh.s10x00019abc,a6 value.equ0x00019abc mvkl.s1value,a6 mvkh.s1value,a6 MPYA0, A1, A2 ADD10, A2, A2

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide Coding Problems (cont’d) h.Load A7 with contents of mem1 and post- increment the selected pointer. h.Load A7 with contents of mem1 and post- increment the selected pointer. x16 mem mem1 10h A7 load_mem1:MVKL.S1mem1,A6 load_mem1:MVKL.S1mem1,A6 MVKH.S1mem1,A6 LDH.D1*A6++,A7

Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2, Slide 116Architecture  Links:  C6711 data sheet: tms320c6711.pdf tms320c6711.pdf  User guide: spru189f.pdf spru189f.pdf  Errata: sprz173c.pdf sprz173c.pdf

Chapter 2 TMS320C6000 Architectural Overview - End -