Reconfigurable acceleration of robust frequency-domain echo cancellation C. H. Ho 1, K.F.C.Yiu 2, J. Huo 3, S. Nordholm 3 and W. Luk 1 1.Department of.

Presentation transcript:

Reconfigurable acceleration of robust frequency-domain echo cancellation
C. H. Ho (1), K. F. C. Yiu (2), J. Huo (3), S. Nordholm (3) and W. Luk (1)
1. Department of Computing, Imperial College London
2. Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong
3. Western Australian Telecommunications Research Institute, The University of Western Australia

2 Introduction
- Echo affects many communication systems
  - hands-free telephony
  - VoIP
- Adaptive filters are employed to cancel echo
  - computationally intensive
  - involve a large amount of data

3 Achievements
1. Novel reconfigurable architecture for two-path frequency-domain echo cancellation
2. Bit-width optimisations with fixed-point saturation arithmetic
3. Single core: 12.5 times faster than 3.2GHz Pentium-4 machine

4 Background: transmitting signals
Given input signal x(n), speech v(n) and impulse response h(n), the return signal y(n) is
$y(n) = x(n) * h(n) + v(n)$
where * denotes convolution.
[Figure: block diagram relating x(n), h(n), v(n) and y(n)]
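The signal model on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual echo model in which the return signal is the input convolved with the impulse response plus the near-end speech; the helper names `convolve` and `return_signal` and the sample values are hypothetical and do not come from the presentation.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def return_signal(x, h, v):
    """Assumed model: y(n) = (x * h)(n) + v(n), truncated to len(x) samples."""
    echo = convolve(x, h)[:len(x)]
    return [e + vn for e, vn in zip(echo, v)]


# Illustrative values only: a unit impulse as input, a two-tap echo path,
# and a single near-end speech sample at n = 2.
x = [1.0, 0.0, 0.0, 0.0]
h = [0.5, 0.25]
v = [0.0, 0.0, 1.0, 0.0]
y = return_signal(x, h, v)
```

With these values the echo occupies the first two samples of `y` and the near-end speech appears unchanged at n = 2.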

5 Echo filtering
Search for filter coefficients that eliminate the input signal (the echo) from the return signal.
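To make the coefficient search concrete, here is a minimal time-domain sketch using normalised LMS (NLMS). Note that this is a swapped-in textbook technique: the presentation describes a frequency-domain, sub-band, two-path design, not this loop. The function name `nlms_cancel`, the step size `mu`, the tap count and the test signal are all assumptions made for illustration.

```python
import random


def nlms_cancel(x, y, taps=4, mu=0.5, eps=1e-8):
    """Adapt FIR weights w so that the estimate of the echo in y is removed.

    x: far-end (input) samples, y: return samples.
    Returns the final weights and the error (echo-cancelled) signal.
    """
    w = [0.0] * taps
    buf = [0.0] * taps          # most recent input samples, newest first
    errors = []
    for xn, yn in zip(x, y):
        buf = [xn] + buf[:-1]
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))   # echo estimate
        e = yn - y_hat                                   # residual after cancellation
        norm = sum(b * b for b in buf) + eps             # input power normalisation
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


# Hypothetical test case: white input, a short echo path, no near-end speech.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
h = [0.5, -0.25, 0.1]
y = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]
w, errors = nlms_cancel(x, y, taps=4)
```

In this noise-free setting the adapted weights `w` approach the echo path `h` (padded with a zero for the fourth tap), and the residual in `errors` decays towards zero.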

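6 Delayless sub-band filtering
- downsample the input signals and error signals
- compute the coefficients of each sub-band
- apply them to the full-band filter through a weighted transform
[Figure: delayless sub-band structure with x(n), analysis filters A(z), downsamplers D, sub-band filters h_0(n), h_1(n), ..., h_{M-1}(n), weighted transform, y(n) and e(n)]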

7 Robust two-path adaptive filtering
- Two sets of filter coefficients
  - foreground coefficients
  - background coefficients
- Power level selects the coefficients
  - high power -> foreground coefficients
  - low power -> background coefficients
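The selection rule on this slide can be sketched as below. The slide does not say which signal's power is measured or what threshold is used, so `mean_power`, `select_coefficients` and the `threshold` value are hypothetical placeholders, not the authors' method.

```python
def mean_power(block):
    """Average power of a block of samples."""
    return sum(s * s for s in block) / len(block)


def select_coefficients(power, threshold, foreground, background):
    """High power selects the foreground set, low power the background set.

    The comparison against a fixed threshold is an assumption for illustration.
    """
    return foreground if power >= threshold else background


# Illustrative values only.
h_f = [0.5, 0.25]      # foreground coefficients (hypothetical)
h_b = [0.4, 0.30]      # background coefficients (hypothetical)
loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.01, -0.01, 0.01, -0.01]

chosen_loud = select_coefficients(mean_power(loud), 0.1, h_f, h_b)
chosen_quiet = select_coefficients(mean_power(quiet), 0.1, h_f, h_b)
```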

8 Hardware architecture
- Supports core operations: signal transform and filtering
- Fast Fourier transform
  - transforms the input and error signals to the frequency domain with multiple sub-bands
- Complex number multiplication
  - performs the adaptive filtering
- Inverse Fourier transform
  - transforms the filtered signal to the time domain
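The three core operations (forward transform, complex multiplication, inverse transform) can be sketched in software as follows. This is a simplified illustration using a recursive radix-2 FFT on a single block; it performs circular convolution and leaves out the sub-band structure, block overlap handling and coefficient adaptation that a real frequency-domain canceller needs. All function names are hypothetical.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if inverse else -1
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    """Inverse transform, scaled by 1/n."""
    n = len(a)
    return [v / n for v in fft(a, inverse=True)]


def filter_block(x_block, h_freq):
    """FFT -> per-bin complex multiply -> inverse FFT (circular convolution)."""
    x_freq = fft(x_block)
    y_freq = [xk * hk for xk, hk in zip(x_freq, h_freq)]
    return [v.real for v in ifft(y_freq)]


# Illustrative check: a one-sample circular delay as the filter.
x_block = [1.0, 2.0, 3.0, 4.0]
h_freq = fft([0.0, 1.0, 0.0, 0.0])
y_block = filter_block(x_block, h_freq)
```

Here `y_block` is `x_block` rotated by one sample, up to floating-point rounding, which confirms that the per-bin multiplication implements filtering in the frequency domain.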

9 Datapath
- Hf, Hb: coefficients
- buf: interface between core and hosts
- X: signals in frequency domain

10 Design optimisation
- Optimise number representation
  - explore quantisation error for different bitwidths
- Avoid overflow
  - fixed-point number format + saturation arithmetic
- Compare results
  - against double-precision floating-point arithmetic
- Use of pre-placed FFT core
  - increases the throughput of the FFT
- Multiple instances of adaptive filter in an FPGA
  - increase the throughput of the overall system
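Fixed-point saturation arithmetic can be illustrated with the short sketch below. It assumes a signed two's-complement format in which `int_bits` includes the sign bit; the presentation does not state its exact convention, and the helper names are hypothetical.

```python
def limits(int_bits, frac_bits):
    """Smallest and largest integer codes of a signed fixed-point format."""
    total = int_bits + frac_bits
    return -(1 << (total - 1)), (1 << (total - 1)) - 1


def to_fixed(value, int_bits, frac_bits):
    """Quantise a real value to an integer code, saturating instead of wrapping."""
    lo, hi = limits(int_bits, frac_bits)
    code = round(value * (1 << frac_bits))
    return max(lo, min(hi, code))


def from_fixed(code, frac_bits):
    """Convert an integer code back to a real value."""
    return code / (1 << frac_bits)


def sat_add(a, b, int_bits, frac_bits):
    """Add two codes; clamp to the format limits on overflow."""
    lo, hi = limits(int_bits, frac_bits)
    return max(lo, min(hi, a + b))


# Illustrative 8-bit format: 4 integer bits (with sign) and 4 fractional bits.
in_range = to_fixed(1.3, 4, 4)       # nearest representable code
too_large = to_fixed(100.0, 4, 4)    # clamps to the largest code
overflow = sat_add(100, 100, 4, 4)   # sum clamps rather than wrapping negative
```

Saturation keeps an overflowing result at the largest representable magnitude, whereas wrap-around arithmetic would flip its sign.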

11 Bitwidth optimisation
- Perform filtering without introducing any near-end signal
  - expected result: all of the echo signal is filtered out
- Choose the smallest bitwidth that can perform the filtering effectively
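One way to picture the bitwidth search is a sweep over candidate fraction widths, keeping the smallest one whose output stays within a tolerance of a double-precision reference. The sketch below is an assumption-laden illustration: it quantises a small FIR filter rather than the authors' echo canceller, ignores integer bits and saturation, and the names `quantise`, `fir`, `smallest_frac_bits` and the tolerance are hypothetical. The candidate widths 10, 14 and 18 echo the fraction widths shown on the results slide.

```python
def quantise(value, frac_bits):
    """Round a value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def fir(x, h, frac_bits=None):
    """FIR filter; if frac_bits is given, quantise operands and results."""
    if frac_bits is None:
        q = lambda v: v
    else:
        q = lambda v: quantise(v, frac_bits)
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc = q(acc + q(q(hk) * q(x[n - k])))
        out.append(acc)
    return out


def smallest_frac_bits(x, h, tolerance, candidates):
    """Return the smallest candidate width whose worst-case error is acceptable."""
    reference = fir(x, h)                     # double-precision reference
    for f in sorted(candidates):
        error = max(abs(a - b) for a, b in zip(reference, fir(x, h, f)))
        if error <= tolerance:
            return f
    return None                               # no candidate was good enough


# Illustrative signal and filter values only.
x = [0.7, -0.4, 0.9, 0.2]
h = [0.3, -0.2, 0.1]
chosen = smallest_frac_bits(x, h, tolerance=1e-3, candidates=(10, 14, 18))
```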

12 Results: fixed point optimisation
[Figure: four panels labelled i = 10, f = 10; i = 10, f = 14; i = 10, f = 18; i = 10, f = 54]

13 Filter performance
- near-end signals
- mixed signals
- filtered signals using double-precision arithmetic
- filtered signals using optimised fixed-point arithmetic

14 FPGA Filter implementations

FPGA chip                           XC4VSX55      XC3S5000
Slices                              4372 (17%)    5255 (15%)
DSP48/MULT                          52 (10%)      48 (46%)
Block RAM                           24 (7%)       24 (23%)
Frequency                           180.0 MHz     98.8 MHz
Throughput (samples per second)     12.5M         6.87M
Faster than Pentium 4 at 3.2 GHz    13.2 times    7.2 times
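The figures in this table can be cross-checked with a little arithmetic. The sketch below derives two quantities that the slide does not state directly: clock cycles spent per sample on each device, and the Pentium 4 throughput implied by the quoted speedups. These are back-of-envelope derivations from the table values, and the function names are hypothetical.

```python
def cycles_per_sample(clock_hz, samples_per_s):
    """Clock cycles available per processed sample."""
    return clock_hz / samples_per_s


def implied_cpu_rate(samples_per_s, speedup):
    """Software throughput implied by an FPGA throughput and a speedup factor."""
    return samples_per_s / speedup


# Values copied from the table above.
table = {
    "XC4VSX55": {"clock_hz": 180.0e6, "samples_per_s": 12.5e6, "speedup": 13.2},
    "XC3S5000": {"clock_hz": 98.8e6, "samples_per_s": 6.87e6, "speedup": 7.2},
}

summary = {
    name: (
        cycles_per_sample(row["clock_hz"], row["samples_per_s"]),
        implied_cpu_rate(row["samples_per_s"], row["speedup"]),
    )
    for name, row in table.items()
}
```

Both devices come out at roughly 14.4 clock cycles per sample, and both speedups imply a software throughput a little under one million samples per second, so the two columns are mutually consistent.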

15 Multiple instances on XC4VSX55

16 Current and future work
- Embedded system
- Bit-width optimisation
- Power and energy consumption
- Run-time reconfiguration
- Adaptive filtering

17 Summary
1. Novel reconfigurable architecture for two-path frequency-domain echo cancellation
2. Bit-width optimisations with fixed-point saturation arithmetic
3. Single core: 12.5 times faster than 3.2GHz Pentium-4 machine