An introduction to Digital Signal Processors (DSP) Using the C55xx family.


There are different kinds of embedded processors
There are a fair number of different kinds of microprocessors used in embedded systems:
– Microcontrollers: small, fairly simple devices. Non-volatile storage. Generally a fair bit of basic I/O (GPIO, SPI, etc.).
– “Processor”: more or less a desktop processor with favorable power numbers. Atom, ARM A8, etc.
– System on a Chip: generally more CPU power than a microcontroller, but has lots of “add-ons”, perhaps including analog I/O and specialized devices (Ethernet controller, LCD controller, FPGA, etc.).

Digital Signal Processor (DSP)
DSP chips are optimized for high performance/low power on very specific types of computation.
– Price: C5515 hits 100MHz – or 120) – or 75)
– Tasks: Filtering, FFT are the big ones.

Fixed point vs. floating point
It’s not unfair to break DSPs into two camps:
– Floating point
– No floating point (fixed point)
Floating point makes things a lot easier for the programmer.
Fixed point: a good DSP programmer can often get better power numbers with fixed point, but it can be a ton of work.

Basic fixed point
“Qn” is a naming scheme used to describe fixed point numbers.
– n is the number of bits to the right of the radix point; equivalently, bit n (counting from bit 0 at the least significant end) is the last bit before the radix point. So a normal integer is Q0.
Examples:
– 0110 is 6 in binary
– 0110 as a Q2 is 1.5
Numbers are generally 2’s complement:
– 1100 is -4
– 1100 as a Q3 is -0.5
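
The Qn examples above can be checked with a few lines of C. This is only a sketch: the helper names `q_to_double` and `sext4` are made up for illustration and are not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a two's-complement integer as a Qn value:
   n bits sit to the right of the radix point, so value = raw / 2^n. */
double q_to_double(int16_t raw, unsigned n)
{
    assert(n < 16);
    return (double)raw / (double)(1u << n);
}

/* Sign-extend a 4-bit pattern (like the 4-bit slide examples) to int16_t. */
int16_t sext4(unsigned bits4)
{
    bits4 &= 0xFu;
    return (int16_t)((bits4 & 0x8u) ? (int)bits4 - 16 : (int)bits4);
}
```

For instance, `q_to_double(sext4(0x6), 2)` gives 1.5 and `q_to_double(sext4(0xC), 3)` gives -0.5, matching the examples.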

Factoids
Signed x-bit Q(x-1) numbers represent values from -1 to (almost) 1.
– This is the form typically used because two numbers in that range multiplied by each other are still in that range (the one exception is -1 times -1, which gives +1, just outside it).
Multiplying two 16-bit Q15 numbers yields?
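
One way to think about the Q15 multiply, sketched in C: the raw 16x16 product has 30 fractional bits (Q30) and needs 32 bits, and getting back to Q15 takes a round, a shift, and a saturation check for the -1 times -1 case. The function names are hypothetical, and the right shift of a negative value assumes an arithmetic shift (true on the usual compilers, but implementation-defined in C).

```c
#include <stdint.h>

/* Q15 x Q15: the full product is a 32-bit value with 30 fractional bits. */
int32_t q15_mul_q30(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Q30 -> Q15: add half an LSB (1 << 14) to round, shift out 15 bits,
   then saturate so that (-1) * (-1) does not wrap around to -1. */
int16_t q30_to_q15(int32_t p)
{
    int32_t r = (p + (1 << 14)) >> 15;   /* assumes arithmetic right shift */
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
}
```

With 0.5 stored as 16384, `q30_to_q15(q15_mul_q30(16384, 16384))` gives 8192, which is 0.25 in Q15.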

And this is important…

Lowpass filter template

FIR filter
Basic idea is to take an input, x, and put it into a big (and wide) shift register.
– Multiply each of the x values (old and new) by some constant.
– Sum up those product terms.
Example:
– Say b0 = 0.5, b1 = 0.75, and b2 = 0.25
– x is 1, -1, 0, 1, -1, 0, etc. forever.
What is the output?
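
The shift-register picture can be sketched directly in C. This is a floating-point toy with hypothetical names (`fir3_t`, `fir3_step`), assuming the delay line starts out as all zeros; it is not how you would write it on the DSP.

```c
#include <stddef.h>

#define NTAPS 3

typedef struct {
    double taps[NTAPS];   /* b0, b1, b2 */
    double delay[NTAPS];  /* x[n], x[n-1], x[n-2] */
} fir3_t;

/* Push one new sample into the shift register and return one output. */
double fir3_step(fir3_t *f, double x_new)
{
    /* Shift the old samples along by one place (oldest falls off). */
    for (size_t k = NTAPS - 1; k > 0; k--)
        f->delay[k] = f->delay[k - 1];
    f->delay[0] = x_new;

    /* Multiply each stored x by its constant and sum the products. */
    double y = 0.0;
    for (size_t k = 0; k < NTAPS; k++)
        y += f->taps[k] * f->delay[k];
    return y;
}
```

Feeding in 1, -1, 0, 1, -1, 0 with the taps from the example and a zeroed delay line gives 0.5, 0.25, -0.5, 0.25, 0.25, -0.5; after the first sample the output settles into the repeating pattern 0.25, -0.5, 0.25.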

Consider a traditional RISC CPU
For a reasonably large filter, b doesn’t fit in the register file.
    top: LD   x++
         LD   b++
         MULT a, x, b
         ADD  accum, accum, a
         goto top
    (++ indicates auto increment)
– That’s a lot of instructions.
Plus we need to shift the x values around.
– Also a loop…
Depending on how you count it, this could be 8-10 instructions per z^-1 block…

Some FIR “tricks”
Most obvious is to use a circular buffer for the x values. The problem with this is that you need more instructions to see if you’ve fallen off the end of the buffer and need to wrap around…
– And it’s a branch, which is mildly annoying due to branch predictors etc.

A slightly different version
    Int16 FIR(Uint16 i)
    {
        Int32 sum;
        Uint16 j, index;
        sum = 0;
        // The actual filter work
        for (j = 0; j < LPL; j++) {
            if (i >= j)
                index = i - j;
            else
                index = ASIZE + i - j;
            sum += (Int32)in[index] * (Int32)LP[j];
        }
        sum = sum + 0x ; // So we round rather than truncate.
        return (Int16)(sum >> 15); // Conversion from 32 Q30 to 16 Q15.
    }
This part (the index wrap-around test) is icky.
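
For experimenting off-target, here is a self-contained sketch of the same circular-buffer idea in standard C. Everything concrete in it is an assumption made for illustration: the buffer length, the tap count and values, the helper `fir_put`, and the rounding constant `1 << 14` (half of one Q15 LSB before the 15-bit shift). The array names `in` and `LP` just follow the slide.

```c
#include <stdint.h>

#define ASIZE 8u   /* hypothetical circular buffer length */
#define LPL   3u   /* hypothetical number of taps (LPL <= ASIZE) */

static int16_t in[ASIZE];                              /* Q15 input samples */
static const int16_t LP[LPL] = { 16384, 8192, 8192 };  /* 0.5, 0.25, 0.25 */

/* Store the newest sample at slot i (i must be < ASIZE). */
void fir_put(uint16_t i, int16_t sample)
{
    in[i] = sample;
}

/* One output, with the newest sample at slot i (i must be < ASIZE). */
int16_t fir_q15(uint16_t i)
{
    int32_t sum = 0;
    for (uint16_t j = 0; j < LPL; j++) {
        /* Walk backwards through the buffer, wrapping past slot 0. */
        uint16_t index = (i >= j) ? (uint16_t)(i - j)
                                  : (uint16_t)(ASIZE + i - j);
        sum += (int32_t)in[index] * (int32_t)LP[j];    /* Q15 * Q15 = Q30 */
    }
    sum += 1 << 14;               /* assumed rounding constant */
    return (int16_t)(sum >> 15);  /* Q30 -> Q15 */
}
```

The wrap-around test inside the loop is exactly the per-tap overhead that the hardware circular addressing discussed later is meant to remove.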

How fast could one do it?
Well, I suppose we could try one instruction.
– MAC y, x++, z++
That’s got lots of problems.
– No register use for the arrays, so very heavy memory use:
  2 data elements from memory/cache
  3 register file changes (pointers, accumulator)
– Plus we need to do a MAC, and mults are already slow, which hurts clock period.
– Plus we need to worry about wrapping around in the circular buffer.
– Oh yeah, we need to know when to stop.

Data
I need a lot of ports to memory
– Instruction fetch
– 2 data elements
I need a lot of ports to the register file
– Or at least banked registers

C55xx Data buses

C55xx Data buses (cont.)
Twelve independent buses:
– Three data read buses
– Two data write buses
– Five data address buses
– One program read bus
– One program address bus
So yeah, we can move data.
– Registers appear to go on the same buses. Registers are memory mapped…

OK, so data seems doable
Well, sort of; still worried about updating pointers.
– 2 data reads, 1 data write, need to update 2 pointers, running out of buses.

MAC?
Most CPUs don’t have a multiply-and-accumulate (MAC) instruction
– Too slow.
– Hurts clock period
So unless we use the MAC a LOT it hurts.
But for a DSP this is our bread and butter.
– So we’ll take the 10% clock period hit or whatever so we don’t have to use two separate instructions.

Wrapping around?
Seems possible.
– Imagine a fairly smart memory. You can tell it the start address, end-of-buffer address and start-of-buffer address. It knows enough to be able to generate the next address, even with wrap around.
– This also takes care of our pointer problem.
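
A software model of that “fairly smart memory” might look like the sketch below. The struct and function names are hypothetical, and this is only a behavioural picture of the idea, not a description of how the C55xx address generation hardware is actually built.

```c
#include <stdint.h>

typedef struct {
    uint32_t start;  /* start-of-buffer address */
    uint32_t end;    /* end-of-buffer address (last valid element) */
    uint32_t cur;    /* address to hand out next */
} circ_addr_t;

/* Return the current address, then step to the next one,
   wrapping from the end of the buffer back to its start. */
uint32_t circ_next(circ_addr_t *c)
{
    uint32_t addr = c->cur;
    c->cur = (c->cur == c->end) ? c->start : c->cur + 1;
    return addr;
}
```

Starting at address 101 in a buffer spanning 100 to 102, successive calls hand out 101, 102, 100, 101, and so on, with no compare-and-branch in the filter loop itself.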

Circular Buffer Start Address Registers (BSA01, BSA23, BSA45, BSA67, BSAC)
– The CPU includes five 16-bit circular buffer start address registers.
– Each buffer start address register is associated with a particular pointer.
– A buffer start address is added to the pointer only when the pointer is configured for circular addressing in status register ST2_55.

Circular Buffer Size Registers (BK03, BK47, BKC)
– Three 16-bit circular buffer size registers specify the number of words (up to 65535) in a circular buffer.
– Each buffer size register is associated with particular pointers.
– In the TMS320C54x-compatible mode (C54CM = 1), BK03 is used for all the auxiliary registers, and BK47 is not used.

By the way…
If we know the start and end of the buffer
– We know the length of the loop.
Pretty much down to one instruction once we get going.
– The TI optimized FIR filter takes 25 cycles to set things up and then takes 1 cycle per MAC.

IIR filters—more of the same

FFTs
Another common thing we want to do is an “FFT”
– Tells you about the frequency parts of a signal
– Breaks down the signal into “sin bins”
Useful in a lot of applications

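Discrete Fourier Transform (DFT)
The DFT is commonly written as:
    X[k] = sum_{n=0}^{N-1} x[n] e^(-j2πkn/N),  k = 0, 1, …, N-1
One might also use:
    X[k] = sum_{n=0}^{N-1} x[n] W_N^(kn),  where W_N = e^(-j2π/N)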

“The” Fast Fourier Transform (FFT) Algorithm
There are many fast algorithms (FFTs) that can be used to compute the Discrete Fourier Transform (DFT).
– Since the DFT is defined as:
    X[k] = sum_{n=0}^{N-1} x[n] W_N^(kn),  k = 0, 1, …, N-1
– How many MACs do we need? Real or complex?
Any algorithm which reduces this can be said to be “fast”.
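
To get a feel for the MAC count, here is a direct (slow) DFT sketch in C that counts complex MACs as it goes. The names `cplx`, `cmul` and `dft_direct` are made up for this example, and the caller is assumed to supply W_N; the twiddle factors are built by repeated multiplication to keep the code free of math-library calls, which lets rounding error build up for large N.

```c
#include <stddef.h>

typedef struct { double re, im; } cplx;

/* One complex multiply costs 4 real multiplies and 2 real adds. */
static cplx cmul(cplx a, cplx b)
{
    cplx r = { a.re * b.re - a.im * b.im,
               a.re * b.im + a.im * b.re };
    return r;
}

/* Direct DFT: X[k] = sum over n of x[n] * W_N^(k*n).
   w_n is W_N, passed in by the caller. Returns the number
   of complex MACs performed. */
size_t dft_direct(const cplx *x, cplx *X, size_t N, cplx w_n)
{
    size_t macs = 0;
    cplx wk = { 1.0, 0.0 };                /* W_N^k */
    for (size_t k = 0; k < N; k++) {
        cplx acc = { 0.0, 0.0 };
        cplx wkn = { 1.0, 0.0 };           /* W_N^(k*n) */
        for (size_t n = 0; n < N; n++) {
            cplx p = cmul(x[n], wkn);
            acc.re += p.re;                /* the "accumulate" half */
            acc.im += p.im;
            macs++;
            wkn = cmul(wkn, wk);
        }
        X[k] = acc;
        wk = cmul(wk, w_n);
    }
    return macs;
}
```

The two nested loops make the count N squared complex MACs, and each of those is four real multiplies, so N = 4 already costs 16 complex MACs.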

Recall W_N = e^(-j2π/N)

FFT support
FFTs typically take an array in “normal” order and return the output in “bit reversed” order.
– Or the other way around (as on prev. page)
Hardware is often able to swap the order of the address bits
– Makes it (much) faster to deal with the bit-reversed data.
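
Without hardware help, the bit-reversed index has to be computed in software, roughly like this sketch (the function name `bit_reverse` is hypothetical; `bits` is log2 of the FFT length):

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `bits` bits of `index`, e.g. 3 bits for an 8-point FFT. */
uint16_t bit_reverse(uint16_t index, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint16_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (uint16_t)((r << 1) | (index & 1u));  /* move the LSB across */
        index >>= 1;
    }
    return r;
}
```

For an 8-point FFT, index 1 (001) maps to 4 (100) and index 3 (011) maps to 6 (110). A loop like this per element is what the address-bit swapping in hardware saves.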

And a bit more
Other support?
– Viterbi is an algorithm commonly used for error correction in communications. Provide special instructions for it
  – Mainly data movement, pointer, and compare instructions.
Overflow is a constant worry in filters
– TI’s accumulators provide 8 guard bits for detection (bits 39-32 of each 40-bit accumulator). That’s unheard of in a mainstream processor.