Lecture 4 Introduction to Digital Signal Processors (DSPs) Dr. Konstantinos Tatas.



ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Outline/objectives
– Identify the most important DSP processor architecture features and how they relate to DSP applications
– Understand the types of code appropriate for DSP implementation

What is a DSP?
– A specialized microprocessor for real-time DSP applications:
  – Digital filtering (FIR and IIR)
  – FFT
  – Convolution, matrix multiplication, etc.

Hardware used in DSP

                    ASIC        FPGA     GPP      DSP
Performance         Very high   High     Medium   Medium-high
Flexibility         Very low    High     High     High
Power consumption   Very low    Low      Medium   Low-medium
Development time    Long        Medium   Short    Short

Common DSP features
– Harvard architecture
– Dedicated single-cycle Multiply-Accumulate (MAC) instruction (hardware MAC units)
– Single-Instruction Multiple-Data (SIMD)
– Very Long Instruction Word (VLIW) architecture
– Pipelining
– Saturation arithmetic
– Zero-overhead looping
– Hardware circular addressing
– Cache
– DMA

Harvard Architecture
– Physically separate memories and paths for instructions and data

Single-Cycle MAC unit
– Can compute a sum of n products in n cycles
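The sum of n products that a MAC unit accelerates can be sketched in plain C as follows. This is only an illustration: the function name dot_product and the 16-bit sample type are assumptions, not something taken from the slides. Each loop iteration corresponds to one multiply-accumulate, which is the operation a hardware MAC unit completes in a single cycle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of n products, one multiply-accumulate
 * (acc = acc + x[i] * h[i]) per loop iteration. */
int64_t dot_product(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;                   /* wide accumulator so the sum cannot overflow */
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * h[i];   /* one MAC operation */
    }
    return acc;
}
```

On a general-purpose processor the multiply and the add are usually separate instructions, so the same loop tends to need more than n cycles.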

Single Instruction - Multiple Data (SIMD)
– A technique for data-level parallelism that employs a number of processing elements working in parallel, each applying the same instruction to a different data element
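One common form of SIMD packs several small values into one register and operates on all of them at once. The sketch below emulates a two-lane 16-bit add in portable C. The helper name add2_packed is hypothetical, and real DSPs perform this in hardware with a single instruction rather than with shifts and masks.

```c
#include <stdint.h>

/* Hypothetical helper: treats each 32-bit word as two independent
 * 16-bit lanes and adds them lane by lane. Each lane wraps on its
 * own; no carry crosses from the low lane into the high lane. */
uint32_t add2_packed(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;           /* low lane, carry-out discarded */
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;   /* high lane, carry-out discarded */
    return hi | lo;
}
```

A single call produces two sums, which is the data-level parallelism described above.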

Very Long Instruction Word (VLIW)
– A technique for instruction-level parallelism: instructions with no dependencies between them (known at compile time) are executed in parallel
– Example of a single VLIW instruction: f=a+b; c=e/g; d=x&y; w=z*h;

CISC vs. RISC vs. VLIW

Pipelining
– DSPs commonly feature deep pipelines
– TMS320C6x processors have 3 pipeline stages, each consisting of a number of phases (cycles):
  – Fetch
    – Program Address Generate (PG)
    – Program Address Send (PS)
    – Program ready wait (PW)
    – Program receive (PR)
  – Decode
    – Dispatch (DP)
    – Decode (DC)
  – Execute
    – 6 to 10 phases
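As a rough sketch of what these phase counts mean for timing, the function below applies the standard ideal-pipeline formula: the first instruction needs one cycle per phase, and each later instruction completes one cycle after the previous one. The name pipeline_cycles is hypothetical, and the model assumes one instruction issued per cycle with no stalls.

```c
/* Phase counts taken from the Fetch and Decode lists above. */
enum { FETCH_PHASES = 4, DECODE_PHASES = 2 };

/* Hypothetical helper: cycles to complete n_instr instructions on an
 * ideal pipeline (one issue per cycle, no stalls), given the number of
 * execute phases (6 to 10 according to the list above). */
unsigned pipeline_cycles(unsigned n_instr, unsigned exec_phases)
{
    if (n_instr == 0) {
        return 0;
    }
    unsigned depth = FETCH_PHASES + DECODE_PHASES + exec_phases;
    return depth + (n_instr - 1);   /* fill latency, then one result per cycle */
}
```

With 6 execute phases the pipeline depth is 12, so a single instruction takes 12 cycles and eight back-to-back instructions take 19.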

Saturation Arithmetic
– Operations such as addition and multiplication are limited to a fixed range
– Instead of wrapping around, overflow and underflow produce the maximum and minimum allowed value, respectively
– Associativity and distributivity no longer apply
– Examples using 1-signed-byte saturation arithmetic (range -128 to 127):
  – … - 5 = -128
  – (64 + 70) - 25 = 102 ≠ 64 + (70 - 25) = 109
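A minimal software sketch of one-signed-byte saturating addition is given here, assuming a hypothetical helper named sat_add8. A DSP does the clamping in hardware; the C version computes the exact sum in a wider type and then clamps it to the range -128 to 127.

```c
#include <stdint.h>

/* Hypothetical helper: 8-bit signed addition that clamps to
 * [INT8_MIN, INT8_MAX] instead of wrapping around. */
int8_t sat_add8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;              /* exact sum in a wider type */
    if (sum > INT8_MAX) return INT8_MAX;    /* overflow saturates to 127 */
    if (sum < INT8_MIN) return INT8_MIN;    /* underflow saturates to -128 */
    return (int8_t)sum;
}
```

Grouping matters: sat_add8(sat_add8(64, 70), -25) saturates at 127 first and gives 102, whereas sat_add8(64, sat_add8(70, -25)) never saturates and gives 109.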

Examples
– Perform the following operations using one-byte saturation arithmetic:
  – 0x77 + 0x99 =
  – 0x4 * 0x42 =
  – 0x3 * 0x51 =

Zero Overhead Looping
– Hardware support for loops with a constant number of iterations, using hardware loop counters and loop buffers
– No branching
– No loop overhead
– No pipeline stalls or branch prediction
– No need for loop unrolling

Hardware Circular Addressing
– A circular buffer is a data structure implementing a fixed-length queue of fixed-size objects, where objects are added at the head of the queue and removed from the tail
– Requires at least 2 pointers (head and tail)
– Extensively used in digital filtering: y[n] = a_0 x[n] + a_1 x[n-1] + … + a_k x[n-k]
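The filter equation above can be sketched in C with a software circular buffer. The names fir_state and fir_step and the size TAPS are illustrative assumptions. Because a filter delay line is always full, this sketch keeps a single head index and lets the newest sample overwrite the oldest one, so the tail is implicit. The index wrap-around written out here is what hardware circular addressing performs for free.

```c
#include <stddef.h>
#include <stdint.h>

#define TAPS 4   /* k + 1 coefficients; illustrative size */

typedef struct {
    int16_t x[TAPS];   /* the last TAPS input samples */
    size_t  head;      /* index of the most recent sample */
} fir_state;

/* Hypothetical helper: stores one new sample and returns
 * y[n] = a[0]*x[n] + a[1]*x[n-1] + ... + a[TAPS-1]*x[n-(TAPS-1)]. */
int32_t fir_step(fir_state *s, const int16_t a[TAPS], int16_t sample)
{
    s->head = (s->head + 1) % TAPS;   /* advance with wrap-around */
    s->x[s->head] = sample;           /* overwrite the oldest sample */

    int32_t acc = 0;
    size_t idx = s->head;
    for (size_t j = 0; j < TAPS; j++) {
        acc += (int32_t)a[j] * s->x[idx];        /* a[j] * x[n-j] */
        idx = (idx == 0) ? TAPS - 1 : idx - 1;   /* step back with wrap-around */
    }
    return acc;
}
```

No samples are ever shifted in memory; only the index moves, which is why the structure suits per-sample real-time filtering.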

Direct Memory Access (DMA)
– The feature that allows peripherals to access main memory without the intervention of the CPU
– Typically, the CPU initiates a DMA transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is complete
– Can create cache coherency problems (after a DMA transfer, the data in the cache may differ from the data in external memory)
– Requires a DMA controller

Cache memory
– Separate instruction and data L1 caches (Harvard architecture)
– Cache coherence protocols are required, since most systems use DMA

DSP vs. Microcontroller
– DSP
  – Harvard architecture
  – VLIW/SIMD (parallel execution units)
  – No bit-level operations
  – Hardware MACs
  – DSP applications
– Microcontroller
  – Mostly von Neumann architecture
  – Single execution unit
  – Flexible bit-level operations
  – No hardware MACs
  – Control applications

Examples
– Estimate how long the following code fragment will take to execute on:
  – A general-purpose processor with a 1 GHz operating frequency, five-stage pipelining, 5 cycles required for multiplication and 1 cycle for addition
  – A DSP running at 500 MHz, with zero-overhead looping, 6 independent ALUs and 2 independent single-cycle MAC units

    for (i=0; i<8; i++) {
        a[i] = 2*i + 3;
        b[i] = 3*i + 5;
    }

Review Questions
– Which of the following code fragments is appropriate for SIMD implementation?

    a[0]=b[0]+c[0];        a[0]=b[0]&c[0];
    a[2]=b[2]+c[2];        a[0]=b[0]%c[0];
    a[4]=b[4]+c[4];        a[0]=b[0]+c[0];
    a[6]=b[6]+c[6];        a[0]=b[0]/c[0];

– Can the following instructions be merged into one VLIW instruction? If not, how many VLIW instructions are needed?
  – a=b+c;
  – d=c/e;
  – f=d&a;
  – g=b%c;

Review Questions
– Which of the following is not a typical DSP feature?
  – Dedicated multiplier/MAC
  – Von Neumann memory architecture
  – Pipelining
  – Saturation arithmetic
– Which implementation would you choose for lowest power consumption?
  – ASIC
  – FPGA
  – General-Purpose Processor
  – DSP

Examples
– How many VLIW instructions does the following program fragment require if there are two independent data paths (a, b), with 3 ALUs and 1 MAC available in each, and 8 instructions/word?
– How many cycles will it take to execute if these are the first instructions in the program and all instructions require 1 cycle, assuming the pipeline architecture of the Pipelining slide with 6 phases of execution?

    ADD a1,a2,a3   ; a3 = a1+a2
    SUB b1,b3,b4   ; b4 = b1-b3
    MUL a2,a3,a5   ; a5 = a2*a3
    MUL b3,b4,b2   ; b2 = b3*b4
    AND a7,a0,a1   ; a1 = a7 AND a0
    MUL a3,a4,a5   ; a5 = a3*a4
    OR  a6,a3,a2   ; a2 = a6 OR a3

References
– R. Chassaing, “DSP Applications Using C and the TMS320C6x DSK”, Wiley, 2002
– Texas Instruments, TMS320C64x datasheets
– Analog Devices, ADSP-21xx Processors