Implementation of real time partitioned convolution on a DSP board. Enrico Armelloni, Christian Giottoli, Angelo Farina. WASPAA 2003, New Paltz, NY, 20 October 2003.

Presentation transcript:

Slide 1. Implementation of real time partitioned convolution on a DSP board. Enrico Armelloni, Christian Giottoli, Angelo Farina. Industrial Engineering Department, University of Parma, Parco Area delle Scienze 181/A, Parma, Italy. WASPAA 2003, New Paltz, NY, 20 October 2003.

Slide 2. Outline: linear convolution; Overlap & Save method; uniformly-partitioned Overlap & Save method; software implementation on a DSP board.

Slide 3. Convolution (1): Convolution of a continuous input signal x(t) with a linear filter characterized by an Impulse Response h(t) yields an output signal y(t):

$$y(t) = x(t) \otimes h(t) = \int_{-\infty}^{\infty} x(t - \tau)\, h(\tau)\, d\tau$$

If the input signal and the Impulse Response are digitally sampled ($t = i \cdot \Delta t$) and the Impulse Response has finite length N, we can write:

$$y(i) = \sum_{j=0}^{N-1} x(i - j)\, h(j)$$

Slide 4. Convolution (2):
y:=0; FOR n:=0 TO N-1 DO y:= y + a[n]·x[n];
Each iteration is one Multiply and ACcumulate (MAC) operation. On a DSP board this instruction is performed in one cycle.
Core clock = 100 MHz, sample frequency f_S = 48 kHz → the upper limit is about 2000 MACs per sample.
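The MAC loop and the cycle budget on this slide can be sketched in Python as follows. This is only an illustration: the function names `mac_budget` and `fir_sample` are hypothetical, and the sketch assumes exactly one MAC per core cycle with no other overhead.

```python
def mac_budget(core_clock_hz, sample_rate_hz):
    """Single-cycle MACs that fit into one sample period (assumes no overhead)."""
    return int(core_clock_hz // sample_rate_hz)


def fir_sample(coeffs, history):
    """One output sample of a direct-form FIR filter.

    coeffs[n] plays the role of a[n]; history[n] is the input delayed by n samples.
    """
    y = 0.0
    for n in range(len(coeffs)):
        y += coeffs[n] * history[n]  # one multiply-accumulate per tap
    return y


# 100 MHz core clock and 48 kHz sampling, as on the slide
budget = mac_budget(100e6, 48e3)
```

The exact quotient is slightly above 2000, which is consistent with the rounded limit quoted on the slide.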

Slide 5. Convolution (3): If the Impulse Response is very long, e.g. 2 seconds or more, like an IR measured inside a theatre, h(t) is at least 96,000 points long at 48 kHz, far more than the roughly 2000 MACs available per sample.

Slide 6. Filtering in the frequency domain: It could be better to operate in the frequency domain: x(n) → FFT → X(k); X(k) · H(k) = Y(k); Y(k) → IFFT → y(n), which is equivalent to x(n) ⊗ h(n) = y(n).
Problems: filtering can be performed only when all the data are available; the order of the FFT is too high.
Solution: the Overlap & Save algorithm.

Slide 7. Overlap & Save algorithm (1): It can be shown that the multiplication of two DFTs corresponds to a circular convolution of their time-domain sequences. To implement an FIR filter, a linear convolution is required. A procedure for converting a circular convolution into a linear convolution is the Overlap & Save algorithm. Sections x_m(n) of the long-duration signal overlap by (Q-1) samples, where Q is the length of the Impulse Response h(n).

Slide 8. Overlap & Save algorithm (2):
1. Perform an N-point FFT of the IR h(n) and store it: $H(k) = \mathrm{FFT}[h(n)]$.
2. Select N points from x(n) based on the following expression:

$$x_m(n) = x[n + (m - 1)(N - Q + 1)]$$

where n = 0, 1, 2, …, N-1; m = 1, 2, 3, …; N = FFT length; Q = IR length.
[Figure: successive input blocks x_1(n), x_2(n), x_3(n) covering samples 0 to N-1, N-Q+1 to 2N-Q, and 2N-2Q+2 to 3N-2Q+1.]

Slide 9. Overlap & Save algorithm (3):
3. Multiply the stored frequency response of h(n) by the FFT of input signal batch m.
4. Perform an N-point IFFT of the product.
5. Discard the first (Q-1) points from each successive output of step 4, and append the remaining outputs to y(n):

$$y(n) = \begin{cases} y_1(n), & n = Q-1, \dots, N-1 \\ y_2[n - (N-Q+1)], & n = N, \dots, 2N-Q \\ \quad\vdots \\ y_m[n - (m-1)(N-Q+1)], & n = (m-1)(N-Q+1) + (Q-1), \dots, (m-1)(N-Q+1) + (N-1) \\ \quad\vdots \end{cases}$$
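Steps 1 to 5 can be sketched with NumPy as follows. This is a minimal illustration, not the DSP code from the presentation. The function name `overlap_save` is hypothetical, and the sketch makes one extra assumption: it prepends Q-1 zeros to the input so that the first Q-1 output samples are also produced, and it zero-pads the tail so that the result matches the full linear convolution.

```python
import numpy as np


def overlap_save(x, h, N):
    """Linear convolution of x with h using N-point FFTs (Overlap & Save)."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    Q = len(h)
    hop = N - Q + 1                      # new input samples consumed per block
    if hop <= 0:
        raise ValueError("FFT length N must be at least the IR length Q")

    H = np.fft.rfft(h, N)                # step 1: N-point FFT of the IR, stored

    total = len(x) + Q - 1               # length of the full linear convolution
    n_blocks = -(-total // hop)          # ceiling division
    # Assumption: Q-1 leading zeros act as the initial overlap, trailing zeros flush the tail
    padded = np.zeros(n_blocks * hop + Q - 1)
    padded[Q - 1:Q - 1 + len(x)] = x

    y = np.empty(n_blocks * hop)
    for m in range(n_blocks):
        block = padded[m * hop:m * hop + N]      # step 2: N points, overlapping by Q-1
        Y = np.fft.rfft(block) * H               # step 3: multiply the spectra
        y_m = np.fft.irfft(Y, N)                 # step 4: N-point IFFT
        y[m * hop:(m + 1) * hop] = y_m[Q - 1:]   # step 5: discard the first Q-1 points
    return y[:total]
```

For any N at least as large as the IR length, the result should agree with `np.convolve(x, h)` up to floating-point rounding.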

Slide 10. Overlap & Save algorithm (4): Overlap & Save convolution process: x_m(n) → N-point FFT → X_m(k); h(n) → N-point FFT → H(k); X_m(k) × H(k) → IFFT → select the last N - Q + 1 samples → append to y(n).
Problems: the latency between input and output data is too high; managing the internal memory of the DSP is a problem.
Solution: the uniformly-partitioned Overlap & Save algorithm.

Slide 11. Uniformly-partitioned O&S algorithm (1): The impulse response h(n) is partitioned into a reasonable number P of equally-sized blocks (e.g. P = 4), where each block is K points long. [Figure: impulse response divided into 1st, 2nd, 3rd and 4th blocks.]

Slide 12. Uniformly-partitioned O&S algorithm (2): Each block is treated as a separate IR, zero-padded to L points and transformed with an FFT in order to obtain a collection of frequency-domain filters S_i (e.g. P = 3). Every filter S_i is convolved, using the Overlap & Save method, with L-point blocks of input data (each block begins L - K points after the previous one). The products of the P filters S_i with the FFTs of the latest P input blocks are summed in P frequency-domain accumulators, and at the end an IFFT is performed on the content of the first accumulator to produce a block of output data. Only the last L - K points of that block have to be kept.
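The partitioned scheme can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' implementation: the function name `partitioned_convolve` is hypothetical, L is fixed to 2K as an assumption, and instead of P accumulators the sketch keeps the spectra of the latest P input blocks and sums the P products for each output block, which is an equivalent formulation of the same sum.

```python
import numpy as np


def partitioned_convolve(x, h, K):
    """Uniformly-partitioned Overlap & Save with K-point IR blocks and L = 2K."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    L = 2 * K                                   # assumption: FFT length is twice K
    P = -(-len(h) // K)                         # number of IR blocks (ceiling)

    # Split the IR into P blocks of K points; rfft with n=L zero-pads each block to L
    h_pad = np.zeros(P * K)
    h_pad[:len(h)] = h
    S = np.fft.rfft(h_pad.reshape(P, K), n=L, axis=1)

    total = len(x) + len(h) - 1
    n_blocks = -(-total // K)
    x_pad = np.zeros((n_blocks + 1) * K)
    x_pad[K:K + len(x)] = x                     # K leading zeros: initial overlap

    X_hist = np.zeros((P, L // 2 + 1), dtype=complex)   # spectra of the latest P blocks
    y = np.empty(n_blocks * K)
    for m in range(n_blocks):
        X_hist = np.roll(X_hist, 1, axis=0)     # age the stored input spectra
        X_hist[0] = np.fft.rfft(x_pad[m * K:m * K + L])  # blocks start L-K apart
        Y = np.sum(S * X_hist, axis=0)          # sum of S_i times the i-th older block
        y[m * K:(m + 1) * K] = np.fft.irfft(Y, L)[K:]    # keep the last L-K points
    return y[:total]
```

Only one forward FFT and one inverse FFT are computed per input block; the per-block cost of the P spectral products grows linearly with the IR length.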

Slide 13. Uniformly-partitioned O&S algorithm (3): The total number of FFTs is minimized: each block of input data needs to be FFT transformed just once, and a single IFFT is performed after the frequency-domain summation. The latency of the whole filtering process is just L points instead of N. This means that the I/O delay is kept low, provided that the impulse response is partitioned into a sensible number of chunks (8 to 32).

Slide 14. Analog Devices DSP platform's features: ADDS 21161N Ez-Kit Lite board.
100 MHz (10 ns) SIMD SHARC DSP core.
600 MFLOPS (32-bit floating-point data).
600 MOPS (32-bit fixed-point data).
Single-cycle instruction execution, including SIMD operation in two parallel computational units (ALUs).
4 input channels, 8 output channels.
AD1836 and AD1852 24-bit audio converters, 48 or 96 kHz sampling frequency.

Slide 15. DSP performance: FIR filter implementation: using a sampling frequency of 48 kHz, a 2000-tap direct-form FIR can be implemented. The maximum length of the Impulse Response is therefore around 40 ms! The SIMD architecture of this processor (dual ALU) makes it possible to run a 2000-tap FIR filter simultaneously on two independent data flows (stereo processing).

Slide 16. Impulse Response processing: The Impulse Response is:
downloaded to the DSP;
partitioned into P blocks, where each block is K points long (K = 4096);
zero-padded, block by block, to a length of L points (L = 8192);
transformed by the standard FFT procedure supplied by Analog Devices and stored in the external memory.
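The preparation of the impulse response can be illustrated offline with NumPy, using the K and L values from the slide. The function name `prepare_filters` is hypothetical, `np.fft.rfft` stands in for the Analog Devices FFT routine, and the 10,000-point test IR is made up for the example.

```python
import numpy as np


def prepare_filters(ir, K=4096, L=8192):
    """Partition an IR into K-point blocks, zero-pad each to L and transform it."""
    ir = np.asarray(ir, dtype=float)
    P = -(-len(ir) // K)                 # number of blocks (ceiling division)
    padded = np.zeros(P * K)             # pad the IR to a whole number of blocks
    padded[:len(ir)] = ir
    blocks = padded.reshape(P, K)
    return np.fft.rfft(blocks, n=L, axis=1)   # n=L zero-pads every block to L points


# A made-up 10,000-point IR needs three 4096-point blocks
filters = prepare_filters(np.ones(10000))
```

Each row of `filters` holds L/2 + 1 complex bins, because the spectrum of a real block is conjugate-symmetric and only half of it needs to be stored.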

Slide 17. I/O data stream processing: A ping-pong I/O buffer was used in the implementation of the algorithm.

Slide 18. Filtering procedure:
Input block A (from input_buffer): FFT[A] is multiplied by Filter[0], Filter[1], Filter[2] and Filter[3], and the products A_0, A_1, A_2, A_3 are stored in the computation circular buffer. IFFT[A] is the IFFT of the leftmost block A_0; the last L-K points of IFFT[A] are sent to output_buffer.
Next input block B (from input_buffer): FFT[B] is multiplied by the same filters, and the products B_0, B_1, B_2, B_3 are added to the circular buffer, which then holds B_0 + A_1, B_1 + A_2, B_2 + A_3, B_3. IFFT[B] is the IFFT of the block B_0 + A_1; the last L-K points of IFFT[B] are sent to output_buffer.
Legend: FFT[A], FFT[B] = FFT of the processing stream; Filter[i] = P blocks containing the FFT of the IR (e.g. P = 4).
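The circular buffer of frequency-domain accumulators can be sketched in NumPy as follows. This is an offline illustration under stated assumptions, not the SHARC code: the function name `accumulator_convolve` is hypothetical, L is fixed to 2K, and a `head` index marks the "leftmost" slot instead of physically shifting the buffer.

```python
import numpy as np


def accumulator_convolve(x, h, K):
    """Partitioned convolution using a circular buffer of P spectral accumulators."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    L = 2 * K                                   # assumption: FFT length is twice K
    P = -(-len(h) // K)
    h_pad = np.zeros(P * K)
    h_pad[:len(h)] = h
    filters = np.fft.rfft(h_pad.reshape(P, K), n=L, axis=1)   # Filter[0..P-1]

    total = len(x) + len(h) - 1
    n_blocks = -(-total // K)
    x_pad = np.zeros((n_blocks + 1) * K)
    x_pad[K:K + len(x)] = x                     # K leading zeros: initial overlap

    acc = np.zeros((P, L // 2 + 1), dtype=complex)   # computation circular buffer
    head = 0                                         # index of the leftmost slot
    y = np.empty(n_blocks * K)
    for m in range(n_blocks):
        X = np.fft.rfft(x_pad[m * K:m * K + L])      # FFT of the current input block
        for i in range(P):
            acc[(head + i) % P] += X * filters[i]    # e.g. B_0 + A_1, B_1 + A_2, ...
        y[m * K:(m + 1) * K] = np.fft.irfft(acc[head], L)[K:]  # last L-K points out
        acc[head] = 0                                # the freed slot becomes the rightmost one
        head = (head + 1) % P
    return y[:total]
```

Advancing `head` avoids copying the accumulators after each block, which matters when the buffer lives in slow external memory.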

Slide 19. Results (1): […] kHz → IR length 2.3 seconds. […] kHz → IR length 0.94 seconds. In 2x2 mode (4 filters), this is far in excess of the requirements for good cross-talk cancelling filters (typically 4096 taps). [Table columns: Ch IN, Ch OUT, Number of blocks, IR length.]

Slide 20. Results (2): The tests performed demonstrated that maximum efficiency is reached when the overlap between two input streams is around half of the FFT length L. Using L = 8192 points and a sampling frequency F_s = 48 kHz, the latency between input and output is 170 ms.
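The 170 ms figure is consistent with a latency of one FFT block, L / F_s. A small check of that arithmetic, with the hypothetical helper name `block_latency_ms`:

```python
def block_latency_ms(fft_length, sample_rate_hz):
    """Duration of one FFT block in milliseconds (latency of L points)."""
    return 1000.0 * fft_length / sample_rate_hz


# L = 8192 points at F_s = 48 kHz, as on the slide
latency = block_latency_ms(8192, 48_000)
```

The value is about 170.7 ms, which the slide rounds to 170 ms.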

Slide 21. Conclusion: Successful implementation of real-time partitioned convolution on the ADDS 21161N Ez-Kit Lite board, operating with 1 to 4 inputs at 48 kHz. Impulse Responses of […] points were managed, with the latency between input and output data limited to 170 ms. When a light, compact system with a small number of channels is required, a DSP is a sensible solution; otherwise a PC provides a significantly better price/performance ratio.