Introduction SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic.

Objective
• FFT introduction
• Some FFT algorithms
• FFT on PDSP
• FFT floating-point to fixed-point conversion
• Hardware implementation of FFT

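FFT for TMS320C67x with 2 buffers (ping-pong buffering)
• The EDMA moves samples from the serial port (source address) into one of two buffers: ping (destination address 1) or pong (destination address 2).
• The FFT processes one buffer while the other is being filled.
• Event: internal timer 1 is selected.
• The destination address is switched at the completion of a count transfer.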
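The two-buffer scheme can be sketched in software as follows. This is a minimal model, not TI's EDMA API: the class `PingPongBuffer`, the helper `run`, and the `process` callback are hypothetical names, and the per-sample `write` call stands in for the DMA transfer.

```python
class PingPongBuffer:
    """Two equally sized buffers: one fills while the other is processed."""

    def __init__(self, count):
        self.count = count                          # samples per transfer
        self.buffers = [[0] * count, [0] * count]   # [ping, pong]
        self.fill = 0                               # buffer currently being filled
        self.pos = 0                                # write position in that buffer

    def write(self, sample):
        """Store one sample; return the full buffer when a transfer completes."""
        self.buffers[self.fill][self.pos] = sample
        self.pos += 1
        if self.pos < self.count:
            return None
        ready = self.buffers[self.fill]
        self.fill ^= 1   # switch the destination address (ping <-> pong)
        self.pos = 0
        return ready


def run(samples, count, process):
    """Feed samples one by one and process each buffer as it completes."""
    pp = PingPongBuffer(count)
    results = []
    for s in samples:
        ready = pp.write(s)
        if ready is not None:
            # In a real system the FFT would run here while the other buffer fills.
            results.append(process(list(ready)))
    return results
```

For example, `run(range(8), 4, sum)` processes two blocks of four samples and returns `[6, 22]`.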

FFT fixed point - Xilinx
Three options are available for handling bit growth:
• Performing the calculations with no scaling and carrying all bit growth through the computation. The fractional bits created by each multiplication are truncated after the multiplication. The width of the output is (input width + number of stages + 1). For example, a 1024-point transform (10 stages) with a 16-bit input consisting of 1 integer bit and 15 fractional bits has a 27-bit output with 12 integer bits and 15 fractional bits.
• Scaling at each stage using a fixed scaling schedule.
• Scaling automatically using block floating point.
[Xilinx05]
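The unscaled output width rule can be checked with a few lines of arithmetic. This is a sketch of the formula quoted above, taking the number of stages as log2 of the transform length; the function name `unscaled_output_format` is a hypothetical helper, not part of any Xilinx tool.

```python
def unscaled_output_format(n_points, input_width, input_frac_bits):
    """Return (output_width, integer_bits, fractional_bits) with no scaling."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two")
    stages = n_points.bit_length() - 1        # log2(n_points)
    output_width = input_width + stages + 1   # rule quoted in the slide
    frac_bits = input_frac_bits               # new fractional bits are truncated
    return output_width, output_width - frac_bits, frac_bits


print(unscaled_output_format(1024, 16, 15))   # (27, 12, 15)
```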

Block floating point
• The computation is fixed-point.
• After every addition there is an overflow test.
• If an overflow is detected, the whole array is divided by 2 (scaled by ½).
• The number of divisions is counted to determine the scale factor.
• The SNR depends on how many overflows occur.
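A minimal sketch of this idea, assuming signed integers of `word_bits` bits and a stage built only from sums and differences. The helper names `fits`, `rescale_block` and `add_stage` are hypothetical. Python integers do not overflow, so the overflow test here is applied after the additions rather than in hardware.

```python
def fits(values, word_bits):
    """True if every value fits in a signed word of word_bits bits."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return all(lo <= v <= hi for v in values)


def rescale_block(values, word_bits, exponent):
    """Halve the whole block until every value fits; count the halvings."""
    while not fits(values, word_bits):
        values = [v >> 1 for v in values]   # divide by 2 (arithmetic shift)
        exponent += 1
    return values, exponent


def add_stage(values, word_bits, exponent):
    """One stage of sums and differences on adjacent pairs, then the overflow test."""
    out = []
    for a, b in zip(values[0::2], values[1::2]):
        out += [a + b, a - b]
    return rescale_block(out, word_bits, exponent)


block, exp = add_stage([100, 100, -50, 20], word_bits=8, exponent=0)
print(block, exp)   # [100, 0, -15, -35] 1
```

The true values are approximately `block[i] * 2**exp`; each halving discards one low-order bit, which is why the SNR depends on the number of overflows.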

Butterfly computation for decimation in time; linear noise model [Oppenheim98]

[Oppenheim98]

Butterfly with Scaling multipliers [Oppenheim98]
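One common way to place scaling multipliers is a factor of ½ on both butterfly inputs at every stage, so that an N-point transform returns the DFT divided by N and intermediate magnitudes never exceed the largest input magnitude. The sketch below assumes that scheme and uses floating-point complex numbers for clarity. `fft_dit_scaled` and `bit_reverse_permute` are hypothetical names, and the input length is assumed to be a power of two.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder x so that element i moves to the bit-reversed index of i."""
    n = len(x)
    bits = n.bit_length() - 1
    return [x[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]


def fft_dit_scaled(x):
    """Radix-2 decimation-in-time FFT with a 1/2 scaling in every butterfly."""
    n = len(x)
    a = bit_reverse_permute([complex(v) for v in x])
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)   # twiddle factor
                p = 0.5 * a[start + k]                     # scaled upper input
                q = 0.5 * w * a[start + k + half]          # scaled, rotated lower input
                a[start + k] = p + q
                a[start + k + half] = p - q
        size *= 2
    return a   # equals DFT(x) / n
```

For the constant input `[1, 1, 1, 1]` the result is 1 in bin 0 and 0 elsewhere, that is, the unscaled DFT `[4, 0, 0, 0]` divided by 4.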

Sequential FFT-Xilinx core

Pipelined FFT-Xilinx core

Pipelined FFT architecture
• Radix-2 multipath delay commutator (R2MDC)
• Radix-2 single-path delay feedback (R2SDF)
• Radix-4 multipath delay commutator (R4MDC)
• Radix-4 single-path delay commutator (R4SDC)
• Radix-4 single-path delay feedback (R4SDF)
• Radix-2² single-path delay commutator (R2²SDC)
[Li03]

Radix-2 multipath delay commutator The total number of delay elements is 4 + 2 + 2 + 1 + 1 = 10 for the 8-point FFT. The utilization of the butterfly and the multiplier is 50% [Li03]

Radix-2 single-path delay feedback
The total number of delay elements is N/2 + N/4 + ... + 1 = N - 1. [Li03]
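The delay-element totals for the two radix-2 pipelines can be tallied directly. The R2SDF sum follows the formula above. The R2MDC function assumes that the pattern of the 8-point example (4 + 2 + 2 + 1 + 1) continues for larger N, which is an extrapolation. Both function names are hypothetical.

```python
def r2mdc_delay_elements(n):
    """Delay elements in an N-point R2MDC, extrapolating the 8-point pattern."""
    total = n // 2        # input delay line
    d = n // 4
    while d >= 1:
        total += 2 * d    # each later commutator contributes two delay lines
        d //= 2
    return total


def r2sdf_delay_elements(n):
    """Delay elements in an N-point R2SDF: N/2 + N/4 + ... + 1."""
    total = 0
    d = n // 2
    while d >= 1:
        total += d
        d //= 2
    return total


print(r2mdc_delay_elements(8), r2sdf_delay_elements(8))   # 10 7
```

Under that assumption the R2MDC total simplifies to 3N/2 - 2, compared with N - 1 for R2SDF.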

FFT processor
• Datapath: memories, butterflies and complex multipliers
• Control unit
[Li03]

Requirements
• Transform length is 1024
• Transform time is less than 40 µs (continuously)
• Continuous I/O
• 25.6 Msamples/s throughput
• Complex 24-bit I/O data
Steps in designing
• Architecture selection
• Partitioning
• Scheduling
• Word length selection
• RTL model generation
• Validation of models
[Li03]

Resource analysis
• Computation time for the 1024-point FFT: 1024 samples at 25.6 Msamples/s take 40 µs.
• The number of butterfly operations for radix-2 is (N/2) log2 N = 5120.
• Assume 1 clock cycle per butterfly.
• With 1024 clock cycles per transform (one per input sample), the minimum number of butterflies is 5120 / 1024 = 5.
• This is optimal only under the assumption that all data are available to all stages, which is impossible for continuous data streams.
• Each butterfly has to be idle 50% of the time in order to reorder the incoming data.
[Li03]

Resource analysis
The solution:
• The number of butterflies is 10
• The number of complex multipliers is 9
• Memory length for radix-2 single-path delay feedback is N - 1
[Li03]
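The resource figures can be reproduced with a short calculation. This is a sketch under stated assumptions: the clock runs at the sample rate (one input sample per cycle), each butterfly takes one cycle, and the multiplier count is taken as one per stage except the last, which matches the figure of 9 quoted above. `resource_estimate` is a hypothetical helper.

```python
import math


def resource_estimate(n, sample_rate_hz, utilization=0.5):
    """Estimate resources for a continuous-flow radix-2 pipelined FFT."""
    stages = n.bit_length() - 1                  # log2(n)
    butterfly_ops = (n // 2) * stages            # (N/2) log2 N
    cycles_per_transform = n                     # assumption: one sample per clock
    min_butterflies = math.ceil(butterfly_ops / cycles_per_transform)
    return {
        "transform_time_s": n / sample_rate_hz,
        "butterfly_ops": butterfly_ops,
        "min_butterflies": min_butterflies,
        "butterflies": math.ceil(min_butterflies / utilization),
        "complex_multipliers": stages - 1,       # assumption: none after the last stage
        "r2sdf_memory_words": n - 1,
    }


r = resource_estimate(1024, 25.6e6)
print(r["butterfly_ops"], r["min_butterflies"], r["butterflies"])   # 5120 5 10
```

At 25.6 Msamples/s the 1024-point transform time comes out at about 40 µs, and the 50% utilization doubles the minimum of 5 butterflies to the 10 used in the solution.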

RAM Based Commutator A dual-port memory is required since the read and write operation must be performed in one clock cycle. [Li03]
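The need for two ports can be seen in a behavioural model of a RAM-based delay line, where every clock cycle reads the oldest sample and writes the newest one. This sketch models only the delay memory, not the full commutator, and `RamDelayLine` is a hypothetical name.

```python
class RamDelayLine:
    """Fixed-length delay built from a RAM and one circulating address."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.addr = 0

    def clock(self, data_in):
        """One clock cycle: read port and write port are both used."""
        data_out = self.mem[self.addr]    # read the sample stored depth cycles ago
        self.mem[self.addr] = data_in     # write the new sample in the same cycle
        self.addr = (self.addr + 1) % len(self.mem)
        return data_out


line = RamDelayLine(3)
print([line.clock(v) for v in [1, 2, 3, 4, 5, 6]])   # [0, 0, 0, 1, 2, 3]
```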

Complex multiplier [Li03]

Radix-4

Radix-4

Altera radix-4 butterfly [Altera05]
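For orientation, a generic radix-4 butterfly without twiddle factors is simply a 4-point DFT, which needs only additions, subtractions and a multiplication by -j. The sketch below shows that textbook form. It is not Altera's implementation, and `radix4_butterfly` is a hypothetical name.

```python
def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) using additions and one rotation by -j."""
    t0 = a + c
    t1 = a - c
    t2 = b + d
    t3 = -1j * (b - d)   # rotation by -j: swaps real and imaginary parts with a sign change
    return t0 + t2, t1 + t3, t0 - t2, t1 - t3


print(radix4_butterfly(1, 1, 1, 1))
```

A constant input puts all the energy into the first output (value 4) and leaves the other three at zero.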

References
[Altera05] Altera, FFT MegaCore Function User Guide, DSP Literature, 2005.
[Li03] W. Li, Studies on Implementation of Low Power FFT Processors, Thesis, Linköping University, 2003.
[Oppenheim98] A. V. Oppenheim, R. W. Schafer, Discrete-Time Signal Processing, 2nd edition, Prentice Hall, 1998.
[Xilinx05] Xilinx, Fast Fourier Transform v3.2, DS260, August 31, 2005.