

UNIT - VIII

DSP Introduction
Digital Signal Processing:
◦ Application of mathematical operations to digitally represented signals
◦ Signals are represented digitally as sequences of samples
Digital Signal Processor (DSP):
◦ Electronic system that processes digital signals

DSP tasks
Most DSP tasks require:
◦ Repetitive numeric computations
◦ Real-time processing
◦ High memory
◦ System flexibility
DSPs must perform these tasks efficiently while minimizing:
◦ Cost
◦ Power
◦ Memory use
◦ Development time

Programmable Digital Signal Processors Low power requirement Low cost Real time I/O capability Availability of High speed on-chip memories

Advantages of DSPs over Analog Circuits
◦ Can implement complex linear or nonlinear algorithms
◦ Can be modified easily by changing software
◦ Reduced parts count makes fabrication easier
◦ High reliability

Difference between DSPs and Other Microprocessors
General-purpose computers perform two major kinds of task:
(1) Data manipulation
(2) Mathematical calculation

Difference between DSPs and Other Microprocessors
Data manipulation involves:
◦ Storing
◦ Organizing
◦ Retrieving
◦ Sorting of information
In these tasks, mathematics:
◦ is used only occasionally
◦ does not significantly affect the overall execution speed
In comparison, the execution speed of most DSP algorithms is limited almost completely by the number of multiplications and additions required.

DSP Applications Digital cellular phones Digital cameras Satellite communication Voice mail Music synthesis Modems RADAR

TMS DSP IC: device naming, e.g. TI TMS 320 C5X
◦ TI: made by Texas Instruments
◦ Prefix: TMX = experimental device, TMP = prototype, TMS = qualified device
◦ C: CMOS technology with on-chip non-volatile memory as ROM
◦ E: CMOS technology with on-chip non-volatile memory as EPROM
◦ No letter: NMOS technology with on-chip non-volatile memory as ROM
◦ 5: generation
◦ X: version number (0, 1, 2, 3, 4x, 5, 6, 7)

TMS DSP Types
◦ Fixed-point DSPs: TMS320C5x and C54x (16-bit DSPs)
◦ Floating-point DSPs: TMS320C3x, C4x and C6x (16- and 32-bit DSPs)
◦ Multiprocessor DSPs: TMS320C8x

EVOLUTION OF TMS320 FAMILY

MULTIPLIER and MULTIPLIER ACCUMULATOR (MAC)
◦ The most common operation in DSP applications is array multiplication, e.g. convolution and correlation
◦ Motorola DSP processors: a single dedicated MAC unit
◦ Texas Instruments TMS320C5x: separate multiplier and accumulator

IMPLEMENTATION OF CONVOLVER WITH SINGLE MULTIPLIER/ADDER
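As a rough software illustration of a convolver built around one multiplier/adder, the sketch below computes each output sample with a chain of multiply-accumulate steps and counts them. The function name `convolve_mac` and the sample data are hypothetical, chosen only for this example; it is not taken from the original slides.

```python
def convolve_mac(x, h):
    """Direct convolution of x with h using one multiply-accumulate per step.

    Returns the output sequence and the number of MAC operations performed.
    """
    n_out = len(x) + len(h) - 1
    y = []
    mac_count = 0
    for n in range(n_out):
        acc = 0  # accumulator is cleared for each output sample
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                acc += h[k] * x[n - k]  # one multiply followed by one add
                mac_count += 1
        y.append(acc)
    return y, mac_count


y, macs = convolve_mac([1, 2, 3], [1, 1])
print(y, macs)  # [1, 3, 5, 3] 6
```

The MAC count equals len(x) * len(h), which is why the speed of the multiply-accumulate path dominates the execution time of such algorithms.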

MAC in P-DSPs
◦ The TMS320C5x has a special instruction, MACD: multiply-accumulate with data move
◦ Syntax: MACD pgm, dma
◦ pgm: program memory address of one operand
◦ dma: data memory address of the other operand

MACD
A MAC operation with data move requires four memory accesses per instruction cycle:
◦ Fetch the MACD instruction from program memory
◦ Fetch one of the operands from program memory
◦ Fetch the second operand from data memory
◦ Write the content of the data memory location at address dma into the location at address dma + 1
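The multiply-accumulate with data move can be sketched in software as follows. This is a simplified model under stated assumptions: it ignores the product register and pipelining of the real device, and the names `macd` and `fir_step` are hypothetical. Coefficients stand in for program memory and the delay line stands in for data memory.

```python
def macd(acc, prog_mem, pgm, data_mem, dma):
    """One simplified MACD step: accumulate a product, then move the data word."""
    acc += prog_mem[pgm] * data_mem[dma]  # operands from program and data memory
    data_mem[dma + 1] = data_mem[dma]     # data move: copy location dma to dma + 1
    return acc


def fir_step(coeffs, delay, new_sample):
    """Compute one FIR output; delay must hold len(coeffs) + 1 words."""
    delay[0] = new_sample
    acc = 0
    # Walk from the oldest sample to the newest so that each data move
    # overwrites a location whose value has already been used.
    for k in range(len(coeffs) - 1, -1, -1):
        acc = macd(acc, coeffs, k, delay, k)
    return acc


delay_line = [0, 0, 0, 0]
outputs = [fir_step([1, 2, 3], delay_line, s) for s in [1, 0, 0, 0]]
print(outputs)  # [1, 2, 3, 0]
```

Feeding a unit impulse returns the coefficients in order, which shows that the data move shifts the delay line by one position per output sample without a separate copy loop.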

Von Neumann architecture
◦ Single address bus and single data bus
◦ The four memory accesses of the MACD instruction must be made one after another, so executing it requires four clock cycles

Von Neumann architecture

Harvard Architecture
◦ Reduces the number of clock cycles required for memory access
◦ Uses more than one bus for both addresses and data
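The effect of extra buses on the four MACD memory accesses can be estimated with a small counting model. This is only a sketch: it assumes one access per bus per clock cycle unless stated otherwise, and that the instruction fetch and the program-memory operand share the program bus while the data read and data write share the data bus. The function name `cycles_for_accesses` is hypothetical.

```python
import math


def cycles_for_accesses(accesses_per_bus, accesses_per_cycle=1):
    """Cycles needed when each bus serves its own accesses in parallel."""
    return max(math.ceil(n / accesses_per_cycle) for n in accesses_per_bus)


# Von Neumann: all four accesses share a single bus.
von_neumann = cycles_for_accesses([4])
# Harvard: two accesses on the program bus, two on the data bus.
harvard = cycles_for_accesses([2, 2])
print(von_neumann, harvard)  # 4 2
```

Passing `accesses_per_cycle=2` models a memory that allows two accesses per clock period on each bus, which brings the Harvard case down to a single cycle in this model.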

Harvard Architecture

Von Neumann Vs Harvard

First development
◦ Von Neumann architecture: developed from a research paper written by John von Neumann and others in 1946
◦ Harvard architecture: built by IBM in 1944 at Harvard University

Multiple access memory
◦ The number of memory accesses per clock period can also be increased by using a high-speed memory
◦ DARAM (Dual Access RAM): two memory accesses per clock period
◦ Multiple-access RAM may be connected to the processing unit of the P-DSP using the Harvard architecture

Multiported Memory
◦ Another technique to increase the number of memory accesses per clock period
◦ Dual-port memory: two memory accesses per clock period
◦ Removes the need to store the program and data in two different memory chips to permit simultaneous access to both program and data memory

Multiported Memory
[Figure: dual-port memory with Address Bus 1, Address Bus 2, Data Bus 1 and Data Bus 2]
Limitation: higher cost than two single-port memories of the same capacity, because of the increased number of pins and the larger chip area

VLIW architecture
◦ VLIW: Very Long Instruction Word, e.g. TMS 320 C6x
◦ A larger number of ALUs, MAC units, shifters, etc.
◦ VLIW architectures execute multiple instructions per cycle and use simple, regular instruction sets
◦ More parallelism, higher performance

VLIW architecture
[Figure: block diagram with Program Control Unit, Instruction Cache, Multiported Register File, Read/Write Crossbar, and Functional Units 1 to n]

PIPELINING
An instruction cycle is divided into micro-instruction phases:
◦ Fetch phase
◦ Decode phase
◦ Memory read phase
◦ Execution phase
Each phase may be carried out separately by a different functional unit

PIPELINING
INSTRUCTION CYCLES OF PROCESSOR WITH NO PIPELINING
Value of T   Fetch   Decode   Read   Execute
1            I1      -        -      -
2            -       I1       -      -
3            -       -        I1     -
4            -       -        -      I1
5            I2      -        -      -
6            -       I2       -      -
7            -       -        I2     -
8            -       -        -      I2
9            I3      -        -      -
10           -       I3       -      -
11           -       -        I3     -
12           -       -        -      I3

PIPELINING
INSTRUCTION CYCLES OF PROCESSOR WITH PIPELINING
Value of T   Fetch   Decode   Read   Execute
1            I1      -        -      -
2            I2      I1       -      -
3            I3      I2       I1     -
4            I4      I3       I2     I1
5            I5      I4       I3     I2
6            I6      I5       I4     I3
7            I7      I6       I5     I4
8            I8      I7       I6     I5
9            I9      I8       I7     I6
10           -       I9       I8     I7
11           -       -        I9     I8
12           -       -        -      I9
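The two timing tables can be regenerated with a short script, which also makes the cycle counts explicit: without pipelining, n instructions take 4n cycles; with a four-stage pipeline they take 4 + n - 1 cycles. The helper names `schedule` and `total_cycles` are hypothetical, and the model assumes every phase takes exactly one cycle with no stalls.

```python
STAGES = ["Fetch", "Decode", "Read", "Execute"]


def schedule(n_instructions, pipelined):
    """Map each cycle T (starting at 1) to the instruction held in each stage."""
    depth = len(STAGES)
    table = {}
    for i in range(n_instructions):  # i = 0 corresponds to instruction I1
        start = i if pipelined else i * depth
        for s in range(depth):
            row = table.setdefault(start + s + 1, [None] * depth)
            row[s] = f"I{i + 1}"
    return table


def total_cycles(n_instructions, pipelined, depth=len(STAGES)):
    """Cycles needed to finish all instructions under the no-stall assumption."""
    if pipelined:
        return depth + n_instructions - 1
    return depth * n_instructions


print(total_cycles(3, pipelined=False))  # 12
print(total_cycles(9, pipelined=True))   # 12
```

In the same 12 cycles the pipelined processor completes nine instructions while the non-pipelined one completes three.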

PIPELINING
The number of instructions that are processed simultaneously in the CPU is referred to as the depth of the instruction pipeline.
Instruction pipeline depth of some P-DSPs:
P-DSP Name / Family   Pipeline Depth
Analog Devices        2
Motorola DSP560x      3
TI TMS320C5x          4
TI TMS320C54x         5

SPECIAL ADDRESSING MODES
Short Immediate Addressing
◦ In short immediate instructions, the operand is contained within the instruction machine code.
◦ In this example, the lower 8 bits are the operand and will be added to the ACC by the Central ALU.
◦ The length of the short constant depends on the instruction type and the P-DSP

SPECIAL ADDRESSING MODES
Short Direct Addressing
◦ In TI TMS 320 DSPs, the higher 9 bits of the data memory address are stored in the data page pointer
◦ Only the lower 7 bits are specified as part of the instruction
◦ Bits 15 through 8 contain the opcode
◦ Bit 7, with a value of 0, defines the addressing mode as direct
◦ Bits 6 through 0 contain the data memory address (DMA) within the page
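The bit fields described for short direct addressing can be decoded as in the sketch below. The function names are hypothetical and the instruction word 0x2005 is an arbitrary value used for illustration, not a real opcode.

```python
def direct_address(instruction, dp):
    """Form a 16-bit data address from a 16-bit instruction word and the 9-bit DP."""
    assert instruction & 0x0080 == 0, "bit 7 must be 0 for direct addressing"
    offset = instruction & 0x007F        # bits 6..0: 7-bit address within the page
    return ((dp & 0x1FF) << 7) | offset  # data page pointer supplies the upper 9 bits


def opcode(instruction):
    """Return bits 15..8 of the instruction word."""
    return (instruction >> 8) & 0xFF


word = 0x2005  # arbitrary illustrative value
print(hex(opcode(word)), hex(direct_address(word, dp=4)))  # 0x20 0x205
```

Nine page bits and seven offset bits give 512 pages of 128 words each, which together cover the full 64K-word data address range.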

SPECIAL ADDRESSING MODES
INDIRECT Addressing
◦ Permits an array of data to be processed in a P-DSP to be fetched and stored efficiently
◦ The addresses of the operands can be stored in registers called Indirect Address Registers
◦ In TI DSPs, the Indirect Address Registers are called Auxiliary Registers (ARs)

SPECIAL ADDRESSING MODES
INDIRECT Addressing….
The ’C5x provides four indirect addressing options:
◦ No increment or decrement
◦ Increment by one
◦ Decrement by one
◦ Increment or decrement by a value present in the Offset Register
In TI DSPs, the Offset Register is called the INDX Register
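The four indirect addressing options can be modelled with a small register file. This is a toy sketch, not the device's actual behaviour in every detail: the class name `AuxRegisterFile`, the mode strings and the post-modify ordering (use the address first, then update the register) are assumptions made for illustration.

```python
class AuxRegisterFile:
    """Toy model of eight 16-bit auxiliary registers, a pointer (ARP) and INDX."""

    def __init__(self):
        self.ar = [0] * 8  # AR0..AR7
        self.arp = 0       # selects the current auxiliary register
        self.indx = 0      # offset register

    def access(self, mode="none"):
        """Return the current address, then update the selected AR."""
        addr = self.ar[self.arp]
        step = {
            "none": 0,
            "inc": 1,
            "dec": -1,
            "add_indx": self.indx,
            "sub_indx": -self.indx,
        }[mode]
        self.ar[self.arp] = (addr + step) & 0xFFFF  # wrap to 16 bits
        return addr


regs = AuxRegisterFile()
regs.ar[0] = 0x0100
regs.indx = 4
print([hex(regs.access(m)) for m in ("inc", "add_indx", "dec", "none")])
# ['0x100', '0x101', '0x105', '0x104']
```

Because the address update happens alongside the access, stepping through an array costs no extra instructions.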

SPECIAL ADDRESSING MODES
Memory-mapped Addressing
◦ The CPU registers and the I/O registers of P-DSPs are also accessible as memory locations
◦ They are mapped into either the first or the last page of the memory space
◦ In TI DSPs, page 0 corresponds to the CPU registers and I/O registers
◦ When these registers are accessed using memory-mapped addressing, the 9 MSBs of the address are forced to 0
◦ This allows the memory-mapped registers of data page 0 to be addressed directly without affecting the current data page pointer value

SPECIAL ADDRESSING MODES
Bit Reversed Addressing
◦ An auxiliary register points to the physical location of a data value.
◦ When INDX is added to the current AR using bit-reversed addressing, addresses are generated in a bit-reversed fashion.
◦ E.g., FFT
Circular Addressing
◦ Used for real-time processing of signals
◦ Memory is organized as a circular buffer
◦ The beginning and ending addresses are defined by the programmer
◦ In this addressing mode, when the address pointer is incremented, it is checked against the ending memory address of the circular buffer.
◦ If the pointer exceeds the ending address, it is set equal to the beginning address of the circular buffer
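Both address-generation schemes can be sketched in a few lines. The bit-reversed add below is modelled as an addition whose carry propagates from the most significant bit toward the least significant bit, applied to buffer offsets only (a suitably aligned base address is assumed). The function names are hypothetical.

```python
def reverse_bits(value, width):
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_add(ar, indx, width):
    """Add indx to ar with the carry propagating toward the LSB."""
    mask = (1 << width) - 1
    total = (reverse_bits(ar, width) + reverse_bits(indx, width)) & mask
    return reverse_bits(total, width)


def circular_increment(addr, start, end):
    """Increment addr; wrap to start once the ending address is exceeded."""
    addr += 1
    return start if addr > end else addr


# 8-point FFT ordering: INDX holds half the FFT size (4), offsets are 3 bits wide.
offset, order = 0, []
for _ in range(8):
    order.append(offset)
    offset = bit_reversed_add(offset, 4, 3)
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

For the circular buffer, calling `circular_increment(0x103, 0x100, 0x103)` returns 0x100, so the newest sample overwrites the oldest without any data being moved.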

ON-CHIP Peripherals P-DSPs have a number of ON-CHIP peripherals that relieve the CPU from routine functions

ON-CHIP TIMER ◦ Generation of periodic interrupts to P-DSPs ◦ Generation of sampling clocks for A/D converters ◦ Can be programmed by P-DSPs ◦ Can generate single pulse or pulse train ◦ Can generate square wave or periodic square wave ◦ The timer period is programmable

SERIAL PORT
◦ Data communication between the P-DSP and an external peripheral such as an A/D converter, a D/A converter, or RS-232C devices
◦ Input and output buffers (write / read)
◦ Sends and receives data to and from peripherals
◦ Synchronous mode (bit clock and frame sync)
◦ Asynchronous mode (Tx/Rx data lines)
◦ Generates interrupts: output buffer empty, input buffer full

TDM Serial Port Permits P-DSP to communicate with other devices or P-DSPs by using Time Division Multiplexing TDM Frame with 8 time slots

TDM Serial Port
The TDM serial port normally uses 4 lines for serial communication:
◦ TFRM: the frame sync signal
◦ TClock: the bit clock
◦ TADD: the address of the serial device that outputs the data in a particular TDM slot
◦ TDAT: the data transmitted into the TDM channel by the authorized device

TDM Serial Port
[Figure: interconnecting 8 devices through their TDM serial ports using a 4-line bus]

Parallel Port
◦ Makes communication between the P-DSP and other devices faster than serial communication
◦ One approach: use the data bus
◦ Alternative: separate lines for the parallel ports, including the handshaking signals

BIT I/O PORTS Single bit wide Individually set, reset or read Normally used for control purposes Can also be used for data transfer No handshaking signals Used for conditional branching

HOST PORT Special Parallel port 8-bit or 16-bit wide Communicate with a microprocessor or PC called HOST Generate Interrupts

COMM PORTS
◦ Parallel ports used for interprocessor communication between a number of identical P-DSPs in a multiprocessor system
◦ 8 bits wide
◦ Data to be processed may be 32 bits wide or more
◦ The sender splits each word into a stream of 8-bit units
◦ The receiver assembles the 8-bit units back into 32-bit words
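Splitting a 32-bit word into 8-bit units for an 8-bit port and reassembling it at the receiver can be sketched as follows. The byte order (least significant byte first) and the function names are assumptions for illustration; the slides do not state the order used by the hardware.

```python
def split_word(word):
    """Split a 32-bit word into four bytes, least significant byte first (assumed)."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def assemble_word(byte_stream):
    """Rebuild a 32-bit word from four bytes in the same assumed order."""
    word = 0
    for i, b in enumerate(byte_stream):
        word |= (b & 0xFF) << (8 * i)
    return word


sent = split_word(0x12345678)
print([hex(b) for b in sent])    # ['0x78', '0x56', '0x34', '0x12']
print(hex(assemble_word(sent)))  # 0x12345678
```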

ON CHIP A/D & D/A CONVERTERS
Used in voice applications such as
◦ Cellular telephones
◦ Tapeless answering machines

P-DSPs with RISC and CISC Arguments Advanced for RISC ◦ Small, heavily optimized instruction set executable in single short cycle ◦ All instructions same size ◦ No microcode = faster execution ◦ High Level Language support ◦ Better compiler target ◦ Simple enough for academic designs, class projects

P-DSPs with RISC and CISC Arguments Advanced for CISC ◦ Fewer instructions per task ◦ Shorter programs ◦ Hardware implementation of complex instructions faster than software ◦ HLL support ◦ Extra addressing modes help compiler

Characteristics of some of the TMS320 family DSP chips

INTERNAL ARCHITECTURE OF TMS320C5X

Architecture of TMS320 C5x DSP Advanced Harvard Architecture Separate memory bus structures for program and data Have instructions that enable data transfer between the program and data memory area

BUS STRUCTURE Separate program and data buses Simultaneous access to program instructions and data High degree of parallelism Control mechanisms ◦ Manage interrupts ◦ Repeated operations ◦ Function calling

BUS STRUCTURE Program Bus (PB) ◦ Carries instruction code and immediate operands from program memory to CPU Program Address Bus(PAB) ◦ Provides addresses to program memory for both reads and writes Data Read Bus(DB) ◦ Interconnects various elements of CPU to data memory Data Read Address Bus(DAB) ◦ Provides the address to access the data memory space

Central Arithmetic Logic Unit (CALU) The CALU components: 16-bit X 16-bit Parallel Multiplier ◦ The C5x Processor performs 16x16 multiplication of numbers in 2’s complement form 32-bit Arithmetic Logic Unit (ALU) 32-bit Accumulator (ACC) ◦ One of the operands for ALU operation comes from ACC ◦ The results of operations performed in CALU are stored in ACC ◦ Higher order or lower order words can be loaded from ACC

CALU contd….
◦ 32-bit Accumulator Buffer (ACCB): used for temporary storage of ACC
◦ 32-bit Product Register (PREG): holds the result of multiplication
◦ 16-bit Temporary Register 0 (TREG0): holds the multiplicand
◦ 0- to 16-bit left barrel shifter and right barrel shifter: permit the contents of memory to be left or right shifted by 0 to 16 bits before they are fed to the ALU or stored from the ALU to memory
◦ For example, take a 4-bit barrel shifter with inputs A, B, C and D. The shifter can cycle the order of the bits ABCD as DABC or CDAB
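The 4-bit barrel shifter example (ABCD becoming DABC or CDAB) corresponds to rotating the bits right by one or two positions in a single step. The sketch below reproduces that example on a string of bit labels and also shows a plain left shift of a 16-bit value into a 32-bit result; both function names are hypothetical.

```python
def rotate_right(bits, count):
    """Cycle a sequence of bit labels to the right by count positions."""
    count %= len(bits)
    return bits[-count:] + bits[:-count] if count else bits


def shift_left_32(value16, count):
    """Shift a 16-bit value left by 0 to 16 bits, keeping a 32-bit result."""
    assert 0 <= count <= 16
    return ((value16 & 0xFFFF) << count) & 0xFFFFFFFF


print(rotate_right("ABCD", 1))         # DABC
print(rotate_right("ABCD", 2))         # CDAB
print(hex(shift_left_32(0x0001, 16)))  # 0x10000
```

A barrel shifter produces any of these shift amounts in one operation rather than one bit position per cycle.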

CALU contd….
◦ The CPU registers ACC and PREG can also be shifted using these shifters
◦ A 5-bit register, TREG1, specifies the number of bits by which the scaling shifter should shift

AUXILIARY REGISTER ALU (ARAU) Consists of ◦ Eight 16-bit auxiliary registers (ARs) AR0-AR7 ◦ A 3-bit Auxiliary Register Pointer (ARP) ◦ An unsigned 16-bit ALU Basically used for indirect addressing mode operations

ARAU contd….
◦ The auxiliary registers AR0-AR7 may also be used as general-purpose registers for holding operands for arithmetic and logic operations in the CALU
Some of the registers of the ARAU are:
16-bit Index Register (INDX)
◦ Used by the ARAU as a step value, added to or subtracted from the address in the ARs during indirect addressing (for steps other than ±1)
◦ It can also map the dimension of the address block used for bit-reversal addressing

ARAU contd….
Auxiliary Register Compare Register (ARCR)
◦ The 16-bit ARCR is used for address boundary comparison
Block Move Address Register (BMAR)
◦ The 16-bit BMAR holds an address value to be used with block moves and MAC operations
◦ This register provides the 16-bit address for an indirectly addressed second operand

ARAU contd….
Block Repeat Registers
Repeat Counter Register (RPTC)
◦ Holds the repeat count in a repeat single-instruction operation and is loaded by the RPT and RPTZ instructions
Block Repeat Counter Register (BRCR)
◦ Holds the count value for the block repeat feature
◦ This value is loaded before the block repeat operation is initiated
Block Repeat Program Address Start Register (PASR)
◦ Indicates the 16-bit address where the repeated block of code starts
Block Repeat Program Address End Register (PAER)
◦ Indicates the 16-bit address where the repeated block of code ends

Parallel Logic Unit (PLU) Performs Boolean operations Allows logical operations to be performed on data memory values directly without affecting the contents of ACC and PREG Results of PLU function are written back to the original data memory location

Memory Mapped Registers The ‘C5X has 96 registers mapped into page 0 of the data memory space. ‘C5X DSPs have 28 CPU registers and 16 I/O port registers and also different numbers of peripheral and reserved registers Memory mapped registers can be written to and read from in the same way as any other data memory location

Program Controller This contains logic circuitry that ◦ Decodes the instructions ◦ Manages the CPU pipeline ◦ Stores the status of CPU operations & ◦ Decodes the conditional operations

Elements of Program Controller 16-bit Program Counter (PC) 16-bit Status Registers (ST0,ST1) Instruction Register Interrupt Flag Register Interrupt Mask Register

Flags in ST0

FLAGS IN ST0 ARP – Auxiliary Register Pointer (ARB) OV – Overflow flag bit (ALU) ◦ Arithmetic Operation OVM – Overflow Mode bit ◦ ACC overflow saturation mode INTM – Interrupt Mode bit ◦ Globally masks or enables all interrupts DP – Data Memory Page Pointer
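The interaction between the overflow flag (OV) and the overflow mode bit (OVM) can be illustrated with a 32-bit accumulator model. This sketch assumes that OVM = 1 selects saturation and OVM = 0 lets the result wrap around in two's complement; the function name `acc_add` is hypothetical, and the returned flag reports overflow for this one addition only rather than modelling how the status bit is later cleared.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def acc_add(acc, operand, ovm):
    """Add into a 32-bit two's complement accumulator; return (acc, ov)."""
    total = acc + operand
    ov = total > INT32_MAX or total < INT32_MIN
    if ov and ovm:
        # Saturation mode: clamp to the largest or smallest representable value.
        total = INT32_MAX if total > 0 else INT32_MIN
    elif ov:
        # Normal mode: the result wraps around modulo 2**32.
        total = (total + 2**31) % 2**32 - 2**31
    return total, ov


print(acc_add(INT32_MAX, 1, ovm=True))   # (2147483647, True)
print(acc_add(INT32_MAX, 1, ovm=False))  # (-2147483648, True)
```

Saturation keeps an overflowing signal at full scale instead of flipping its sign, which is usually the less harmful error in filtering applications.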

Flags in ST1

FLAGS in ST 1 ARB – Auxiliary Register Buffer CNF – On-chip RAM Configuration control bit (DARAM B0) ◦ CNF = 0 (DM) ◦ CNF = 1 (PM) TC – Test / Control Flag Bit ◦ Conditional Branch, call instructions

FLAGS in ST 1
◦ SXM – Sign Extension Mode bit: enables / disables sign extension of an arithmetic operation
◦ HM – Hold Mode bit: CPU stops or continues execution
◦ XF – External Flag output pin: reset value = 1
◦ PM – Product shift mode bit
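The effect of the sign extension mode bit can be shown by loading a 16-bit value into a 32-bit register. The sketch assumes SXM = 1 enables sign extension and SXM = 0 disables it; the function name `load_16_to_32` is hypothetical.

```python
def load_16_to_32(value16, sxm):
    """Widen a 16-bit value to 32 bits, sign-extending only when sxm is set."""
    value16 &= 0xFFFF
    if sxm and value16 & 0x8000:
        return value16 | 0xFFFF0000  # copy the sign bit into the upper 16 bits
    return value16                   # upper 16 bits are zero


print(hex(load_16_to_32(0x8001, sxm=True)))   # 0xffff8001
print(hex(load_16_to_32(0x8001, sxm=False)))  # 0x8001
```

With sign extension the 16-bit pattern 0x8001 is treated as the negative number -32767; without it the same pattern is treated as the unsigned value 32769.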

ON-CHIP Memory ‘C5X architecture contains a considerable amount of ON-CHIP memory to aid in system performance and integration Program Read-Only Memory (ROM) Data/Program Dual-Access RAM (DARAM) Data/Program Single-Access RAM (SARAM)

ON-CHIP Memory contd….
Program ROM
◦ 16-bit on-chip programmable ROM
◦ This memory is used for booting program code from slower external ROM or EPROM to fast on-chip or external RAM
◦ Once the custom program has been booted into RAM, the boot ROM space can be removed from the program memory space

ON-CHIP Memory contd….
Data / Program Dual Access RAM (DARAM)
◦ 1056 words x 16-bit on-chip DARAM
◦ The DARAM is divided into 3 individually selectable memory blocks: 512-word data or program DARAM block B0, 512-word data DARAM block B1, and 32-word data DARAM block B2
◦ Primarily intended to store data values; when needed, it can be used to store programs as well
◦ The dual data buses (DB & DAB) allow the CPU to read from and write to DARAM in the same instruction cycle

ON-CHIP Memory contd….
Data / Program Single Access RAM (SARAM)
◦ All ’C5x DSPs except the ’C52 carry a 16-bit on-chip SARAM of various sizes
◦ Code can be booted from an off-chip ROM and then executed at full speed once it is loaded into the on-chip SARAM
◦ The SARAM can be configured by software in one of three ways: all SARAM configured as data memory, all SARAM configured as program memory, or SARAM configured as both data memory and program memory

ON-CHIP Memory contd….
Data / Program Single Access RAM (SARAM)
◦ All ’C5x CPUs support parallel accesses to these SARAM blocks.
◦ However, one SARAM block can be accessed only once per machine cycle.
◦ In other words, the CPU can read from or write to one SARAM block while accessing another SARAM block.