Processor Architecture Needed to Handle the FFT Algorithm, M. Smith

Tackled already this term
- Three types of DSP algorithms
  - Long loops, multiplication and addition intensive, regular (simple) memory accesses, e.g. 300 taps in FIR algorithms
  - Short loops involving multiplications and additions, e.g. 3 stages in IIR algorithms
DSP Introduction, M. Smith, ECE, University of Calgary, Canada

Comparing IIR and FIR filters
- Infinite Impulse Response (IIR) filters: few operations to produce output from input for each IIR stage; 3 to 7 stages
- Finite Impulse Response (FIR) filters: many operations to produce output from input. The long FIFO buffer may require as many operations as the FIR calculation itself. Easy to optimize

Discrete Fourier Transform
- FIR and IIR algorithms directly manipulate the data in “the time domain”.
- FIR: processing M data points with an N-point FIR filter involves
  - M * (N - 1) additions
  - M * N multiplications
  - M * N * 2 + M memory accesses
  - The algorithm takes a time of Order (M * N)
- Very slow when manipulating a large amount of data
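The operation counts above can be checked with a small sketch. This is an illustration only, not the lecture's code: the function name `fir_direct`, the zero-padding of samples before the start of the data, and the way memory accesses are tallied (one data fetch and one coefficient fetch per tap, one store per output) are all assumptions.

```python
def fir_direct(x, h):
    """Direct time-domain FIR filter that also tallies its own cost.

    x: list of M input samples, h: list of N filter coefficients.
    Returns (y, counts), where counts holds multiplies, adds and memory accesses.
    """
    counts = {"mul": 0, "add": 0, "mem": 0}
    y = []
    for m in range(len(x)):
        acc = 0.0
        for k in range(len(h)):
            # Assumption: samples before the start of x are treated as zero.
            sample = x[m - k] if m - k >= 0 else 0.0
            product = sample * h[k]
            counts["mul"] += 1
            counts["mem"] += 2      # one data fetch, one coefficient fetch
            if k == 0:
                acc = product       # first tap needs no addition
            else:
                acc += product
                counts["add"] += 1
        y.append(acc)
        counts["mem"] += 1          # store the output sample
    return y, counts
```

With M = 4 samples and N = 3 taps this gives 12 multiplications, 8 additions and 28 memory accesses, matching M * N, M * (N - 1) and M * N * 2 + M.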

Frequency domain analysis
- Apply the discrete Fourier transform (implemented via the FFT)
- Transform to the frequency domain takes time Order (M log M)
- Performing the FIR in the frequency domain takes time Order (M)
- Transform back to the time domain takes time Order (M log M)
- For long filters, the FFT approach (Order (M log M)) is much faster than the time-domain FIR (Order (M * N))
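The three steps (transform, multiply, transform back) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the recursive `fft` helper and the `fft_filter` name are hypothetical, and the zero-padding to a power of two of at least M + N - 1 points is added here so that the circular convolution implied by the FFT matches ordinary FIR filtering.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two (unscaled inverse)."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_filter(x, h):
    """FIR filtering done as a point-by-point multiply in the frequency domain."""
    size = 1
    while size < len(x) + len(h) - 1:
        size *= 2                                   # pad to avoid wrap-around
    X = fft(list(x) + [0] * (size - len(x)))        # Order (M log M)
    H = fft(list(h) + [0] * (size - len(h)))
    Y = [a * b for a, b in zip(X, H)]               # Order (M)
    y = fft(Y, inverse=True)                        # Order (M log M)
    return [v.real / size for v in y[:len(x)]]
```

For example, `fft_filter([1, 2, 3, 4], [1, 1, 1])` returns values within rounding error of 1, 3, 6, 9, the same as a direct 3-tap FIR.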

4 point DFT to show concepts
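The slide's own equations did not survive in this transcript, so the following is only a sketch of a direct DFT, assuming the usual forward convention X[k] = sum over n of x[n] * exp(-j 2 pi n k / N). The function name `dft` is hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT: every output needs N complex multiplies, N * N in total."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
        for k in range(n)
    ]
```

For the 4-point input 1, 2, 3, 4 this gives, to within rounding error, 10, -2 + 2j, -2 and -2 - 2j, at a cost of 16 complex multiplications.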

Simplify using special complex exponential properties
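As a sketch of what such a simplification looks like for 4 points (assuming the standard forward convention, where the complex exponentials take only the values 1, -j, -1 and j), every multiplication disappears: the transform reduces to additions, subtractions and one swap of real and imaginary parts. The name `dft4` is hypothetical.

```python
def dft4(x0, x1, x2, x3):
    """4-point DFT using only add/subtract pairs and a multiply by -j."""
    # First stage: sums and differences of samples N/2 apart.
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = x1 - x3
    # Multiplying p + jq by -j gives q - jp: a swap and a sign change.
    d_rot = complex(d.imag, -d.real)
    # Second stage: sums and differences again.
    return [a + c, b + d_rot, a - c, b - d_rot]
```

`dft4(1, 2, 3, 4)` returns 10, -2 + 2j, -2 and -2 - 2j. Note that each stage computes both the sum and the difference of the same two operands.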

Running FFT on data stored in array

8 point FFT with log2 8 (= 3) stages
- 3 stages, with N / 2 butterflies per stage
- Order (N log N) in time
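A minimal sketch of an in-place radix-2 FFT that counts its stages and butterflies follows. It assumes the decimation-in-frequency form (natural-order input, bit-reversed output that is reordered at the end); the slide's diagram is not available here, so that choice and the name `fft_dif_in_place` are assumptions.

```python
import cmath


def fft_dif_in_place(x):
    """In-place radix-2 decimation-in-frequency FFT on a list of numbers.

    Overwrites x with its DFT and returns (stages, butterflies).
    """
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    stages = n.bit_length() - 1                 # log2(n)
    butterflies = 0
    span = n // 2                               # distance between butterfly inputs
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                top, bottom = x[start + k], x[start + k + span]
                x[start + k] = top + bottom              # butterfly sum
                x[start + k + span] = (top - bottom) * w # butterfly difference
                butterflies += 1
        span //= 2
    # The results are in bit-reversed order; swap them into natural order.
    for i in range(n):
        j = int(format(i, "0{}b".format(stages))[::-1], 2)
        if i < j:
            x[i], x[j] = x[j], x[i]
    return stages, butterflies
```

For N = 8 the function reports 3 stages and 12 butterflies, that is N / 2 = 4 per stage.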

Architectural characteristics needed to handle FFT efficiently

Add / subtract in one instruction
- The following is illegal as a single instruction
    XFR4 = R2 + R3, XFR5 = R6 + R7;;
  Note: the comma, and NOT a semi-colon, means “one instruction”, here using 6 registers
  Not enough data paths to get the data into the ALU (4 in -- 2 out)
    XFR4 = R2 + R3; XFR5 = R6 + R7;;    ILLEGAL
- The FFT butterfly add is a special instruction
    XFR4 = R2 + R3, XFR5 = R2 - R3;;
  Uses only “4 registers”: 2 in, 2 out

Memory accesses
- Stage 1
  - Fetch X data at locations k and k + N/2
  - Store X data at locations k and k + N/2
- Stage 2
  - Fetch X data at locations k and k + N/4
  - Store X data at locations k and k + N/4
- Stage 3 (final stage)
  - Fetch X data at locations k and k + N/8
  - Store X data at the bit-reversed locations of k and k + N/8
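The access pattern can be listed explicitly. This sketch assumes an in-place decimation-in-frequency FFT, where each stage pairs locations a fixed distance apart (N/2, then N/4, and so on) within successively smaller blocks; `butterfly_pairs` is a hypothetical name.

```python
def butterfly_pairs(n):
    """Return, for each stage, the (k, k + span) index pairs that are fetched."""
    stages = []
    span = n // 2
    while span >= 1:
        pairs = [
            (start + k, start + k + span)
            for start in range(0, n, 2 * span)   # each block of 2 * span points
            for k in range(span)
        ]
        stages.append(pairs)
        span //= 2
    return stages
```

For N = 8, stage 1 pairs (0, 4), (1, 5), (2, 6), (3, 7); stage 2 pairs (0, 2), (1, 3), (4, 6), (5, 7); stage 3 pairs (0, 1), (2, 3), (4, 5), (6, 7).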

First issue: how do you store complex numbers?
- One option: use 16-bit values
  - Store the real part in the top 16 bits
  - Store the imaginary part in the bottom 16 bits
  - Access data on J-bus
  - Access complex sinusoids on J-bus
  - Access both components (R and I) in one cycle
- TigerSHARC has the ability to do 16-bit complex additions and multiplications as specific instructions (INTEGER only)
- Can use both X and Y compute blocks
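The packed layout can be modelled as follows. This is a sketch only: the helper names are hypothetical, and two's complement 16-bit components are assumed.

```python
def pack_complex16(re, im):
    """Pack a complex integer into 32 bits: real in the top 16, imaginary in the bottom 16."""
    if not (-32768 <= re <= 32767 and -32768 <= im <= 32767):
        raise OverflowError("component does not fit in 16 bits")
    return ((re & 0xFFFF) << 16) | (im & 0xFFFF)


def unpack_complex16(word):
    """Recover the signed (real, imaginary) pair from a packed 32-bit word."""
    def signed16(value):
        return value - 0x10000 if value & 0x8000 else value
    return signed16((word >> 16) & 0xFFFF), signed16(word & 0xFFFF)
```

`pack_complex16(1, -1)` gives 0x0001FFFF, so a single 32-bit fetch brings in both components. The range check also shows the weakness of this option: adding 30000 and 10000 already produces a value that no longer fits in 16 bits.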

Integer operations a pain: tend to overflow
- Option 2: floating point
  - Store the real component in location X and the imaginary component in location Y
  - Use R1:0 = Q[J4 += 4];;
    - Stores the first complex number in X0 and Y0
    - Stores the second complex number in X1 and Y1
  - FR3 = R1 + R0;; performs a complex floating point addition in a single cycle
  - L[J5] = R3;; stores the complex answer back

Integer operations a pain: tend to overflow
- Option 3: floating point
  - Access the real component along the J-bus from “data1” and the imaginary component along the K-bus from “data2”
  - Use XR3:0 = Q[J4 += 4]; YR3:0 = Q[K4 += 4];;
    - Stores the first complex number in XR0 and YR0
    - Stores the second, third and fourth complex numbers in XR1, YR1; XR2, YR2; XR3, YR3
- Which option is best? It depends.
- How do we handle bringing in the complex sinusoids?
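One way to think about the split layout is as two parallel arrays, one for real parts and one for imaginary parts. The sketch below is a plain software model with hypothetical function names, not TigerSHARC code. It shows that an addition stays within each array, while a multiplication (as needed for the complex sinusoids) mixes the two.

```python
def complex_add_split(re_a, im_a, re_b, im_b):
    """Add complex arrays held as separate real and imaginary lists."""
    re = [a + b for a, b in zip(re_a, re_b)]    # real parts never touch imaginary parts
    im = [a + b for a, b in zip(im_a, im_b)]
    return re, im


def complex_mul_split(re_a, im_a, re_b, im_b):
    """Multiply complex arrays held as separate real and imaginary lists."""
    re = [ra * rb - ia * ib for ra, ia, rb, ib in zip(re_a, im_a, re_b, im_b)]
    im = [ra * ib + ia * rb for ra, ia, rb, ib in zip(re_a, im_a, re_b, im_b)]
    return re, im
```

For example, multiplying 1 + 2j by 3 + 4j gives -5 + 10j, and both output arrays need values from both input arrays. That cross-over is one reason the choice of storage format depends on how the sinusoids are brought in.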

Bit reverse addressing

Bit reverse addressing: check the manual for “accurate details” before MII
- Only possible with J0, J1, J2 and J3 registers (also K0, K1, K2, K3)
- You must start the array on an N-aligned boundary, otherwise it does not work
  - J0 = address pointer
  - JB0 = base register: points to the start of the array
  - JL0 = length of array register
  - JM0 = special circular buffer modify register ????
- XR4 = BR [J0 += 1];
- Bit-reverse addressing only works on POST-MODIFY (permits the next address to be calculated in parallel)
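The address sequence that bit-reversed stepping produces can be modelled in software. This is a sketch and not a description of the hardware: the function names are hypothetical, and the model simply ORs the reversed offset into the base address, which is one way to see why an N-aligned base is needed (the low bits of the base must be zero).

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_addresses(base, n):
    """Addresses visited when stepping through an n-point array in bit-reversed order."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    if base % n:
        raise ValueError("array must start on an n-aligned boundary")
    bits = n.bit_length() - 1
    return [base | bit_reverse(i, bits) for i in range(n)]
```

For an 8-point array at address 16 the visiting order is 16, 20, 18, 22, 17, 21, 19, 23.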

Issues handling the “FFT Butterfly”

FFT Introduction, M. Smith, ECE, University of Calgary, Canada 21

Wrong again
- This is using the “Radix 2” form of the algorithm, which breaks down into 2-point DFTs
- There is also a Radix 4 form of the algorithm, which is faster again

Many special TigerSHARC features to handle FFT