Fast Memory Addressing Scheme for Radix-4 FFT Implementation
Presented by Cheng-Chien Wu, Master Student of CSIE, CCU
Authors: Xin Xiao, Erdal Oruklu and Jafar Saniie (Illinois Institute of Technology)
Source: IEEE International Conference on Electro/Information Technology, eit ’09
Outline – Introduction – Radix-4 FFT – Related Work – Proposed Method – Experimental Results – Conclusion
Introduction The Fast Fourier Transform (FFT) is widely applied in speech processing, image processing, and communication systems. It is one of the key components of various signal processing and communications applications such as software-defined radio and OFDM.
Introduction (cont’d)
The main objective – This study is primarily concerned with improving the performance of the address generation unit of the FFT processor by eliminating the complex critical-path components.
Introduction (cont’d) Important FFT issues – High throughput – FFT size – Power consumption – Low cost – Area
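Radix-4 The N-point discrete Fourier transform is defined by $X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}$ for $k = 0, 1, \ldots, N-1$, where $W_N = e^{-j 2\pi / N}$.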
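To show where the radix-4 structure comes from, the standard DFT sum can be split into four interleaved subsequences. This is a textbook decimation-in-time split given as a sketch; the slide text does not say which decomposition (decimation in time or in frequency) the paper uses, and the index names m and l are assumptions introduced here.

```latex
% Standard N-point DFT
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad W_N = e^{-j 2\pi / N}

% Substitute n = 4m + l with l = 0, 1, 2, 3 and use W_N^{4mk} = W_{N/4}^{mk}
X(k) = \sum_{l=0}^{3} W_N^{lk} \sum_{m=0}^{N/4-1} x(4m + l)\, W_{N/4}^{mk}
```

Each inner sum is an N/4-point DFT, so applying the same split repeatedly yields log4(N) passes of 4-input butterflies.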
Data Path of Radix-4
Butterfly Units The N-point FFT can be decomposed into repeated micro-operations called butterfly operations. When the size of the butterfly is r, the FFT operation is called a radix-r FFT.
Butterfly Units in Radix-4
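A radix-4 butterfly can be sketched as a 4-point DFT that needs only additions and multiplications by ±1 and ±j. The function and variable names below are hypothetical, and twiddle-factor multiplication is left out; this is an illustration of the operation, not the hardware unit from the paper.

```python
import cmath

def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) using only adds and multiplies by +/-1, +/-j."""
    y0 = a + b + c + d
    y1 = a - 1j * b - c + 1j * d
    y2 = a - b + c - d
    y3 = a + 1j * b - c - 1j * d
    return y0, y1, y2, y3

def dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

inputs = [1 + 2j, -0.5j, 3.0, 2 - 1j]
outputs = radix4_butterfly(*inputs)
reference = dft(inputs)
# Largest deviation between the butterfly and the direct 4-point DFT
max_error = max(abs(u - v) for u, v in zip(outputs, reference))
```

Because multiplying by ±j only swaps real and imaginary parts (with a sign change), the butterfly itself needs no general complex multiplier.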
Memory-based FFT In a memory-based FFT architecture, only one butterfly unit is implemented on the chip; this unit executes all the calculations recursively.
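The idea of reusing one butterfly over a shared memory can be sketched in software as an in-place radix-4 FFT. This is a minimal decimation-in-frequency sketch under stated assumptions: the list `mem` stands in for the data memory, passes are numbered from 0, and all names are hypothetical. It does not model the paper's memory banks or address generator.

```python
import cmath

def butterfly(a, b, c, d):
    """The single radix-4 butterfly (4-point DFT) that every pass reuses."""
    return (a + b + c + d,
            a - 1j * b - c + 1j * d,
            a - b + c - d,
            a + 1j * b - c - 1j * d)

def digit_reverse_base4(k, digits):
    """Reverse the base-4 digits of k (in-place DIF output ordering)."""
    r = 0
    for _ in range(digits):
        r = (r << 2) | (k & 3)
        k >>= 2
    return r

def memory_based_fft(samples):
    """In-place radix-4 DIF FFT; returns (spectrum, number of butterfly calls)."""
    n = len(samples)
    passes = 0
    while 4 ** passes < n:
        passes += 1
    if 4 ** passes != n:
        raise ValueError("length must be a power of 4")
    mem = list(samples)              # stands in for the data memory
    butterfly_calls = 0
    for p in range(passes):
        block = n // (4 ** p)        # sub-transform size in this pass
        quarter = block // 4
        for start in range(0, n, block):
            for j in range(quarter):
                idx = [start + j + q * quarter for q in range(4)]
                out = butterfly(*(mem[i] for i in idx))
                butterfly_calls += 1
                for q in range(4):
                    # Twiddle factor W_block^(q*j) applied after the butterfly
                    twiddle = cmath.exp(-2j * cmath.pi * q * j / block)
                    mem[idx[q]] = out[q] * twiddle
    spectrum = [mem[digit_reverse_base4(k, passes)] for k in range(n)]
    return spectrum, butterfly_calls

data = [complex(i % 5, -(i % 3)) for i in range(16)]
spectrum, calls = memory_based_fft(data)
```

For a 16-point input the single butterfly is invoked (N/4)·log4(N) = 8 times, which is why the speed of operand addressing matters in this architecture.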
Execution Time
Related Work
Year | Title | Source
1969 | Organization of Large Scale Fourier Processors | J. Assoc. Comput. Mach.
1976 | Simplified control of FFT hardware | IEEE Trans. Acoust., Speech, Signal Processing
1992 | Conflict free memory addressing for dedicated FFT hardware | IEEE Trans. Circuits Syst.
1999 | An effective memory addressing scheme for FFT processors | IEEE Trans. on Signal Process
2008 | An Efficient FFT Engine With Reduced Addressing Logic | IEEE Transactions on Circuits and Systems II
Data Path of Radix-2
Data Path of Radix-4
Memory Banks Four memory banks are used to store the data.
Read Ports and Write Ports However, for pass 1 and pass 2, the four inputs and four outputs of any butterfly stage belong to the same memory bank. Since each memory bank is a two-port memory, each bank can be read once and written once per clock cycle. Four clock cycles are therefore necessary to perform the four read and four write accesses in pass 1 and pass 2.
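The cycle count above follows from a simple port-contention argument, sketched below. The model is an assumption made for illustration: each bank serves one read (and, on its separate port, one write) per clock cycle, so only one direction needs counting; the bank indices are hypothetical.

```python
from collections import Counter

def cycles_for_butterfly(bank_of_operand):
    """Cycles needed to fetch the operands when each bank allows one read per cycle.

    bank_of_operand lists the bank index holding each of the four operands.
    Writes use the second port, so they follow the same count.
    """
    return max(Counter(bank_of_operand).values())

same_bank = cycles_for_butterfly([2, 2, 2, 2])  # all four operands in one bank
spread = cycles_for_butterfly([0, 1, 2, 3])     # one operand per bank
```

With all four operands in one bank the accesses serialize into four cycles, whereas operands spread over four banks could be fetched in a single cycle.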
Counter D
Barrel Shifter The barrel shifter generates all the addresses for the four memory banks based on the pass number of the FFT, which can be expressed as RR(counter B, 2p), where RR(counter B, 2p) means rotating butterfly counter B right by 2p bits and p is the pass number of the FFT.
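The rotate-right operation RR(counter B, 2p) can be sketched as follows. The counter width and the convention that p starts at 0 are assumptions made for illustration; the slide text does not state them, and the function names are hypothetical.

```python
def rotate_right(value, shift, width):
    """Rotate the low `width` bits of value right by `shift` positions."""
    shift %= width
    mask = (1 << width) - 1
    value &= mask
    return ((value >> shift) | (value << (width - shift))) & mask

def bank_address(counter_b, pass_number, width):
    """RR(counter B, 2p): rotate butterfly counter B right by 2p bits."""
    return rotate_right(counter_b, 2 * pass_number, width)

# Example with an assumed 6-bit butterfly counter
addr_p0 = bank_address(0b000111, 0, 6)  # no rotation
addr_p1 = bank_address(0b000111, 1, 6)  # rotate right by 2 bits
```

Rotating by two bits per pass moves the counter by one base-4 digit, which matches the radix-4 structure where each pass resolves one base-4 digit of the index.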
Twiddle Factor
For Larger FFT Size For FFTs of different lengths, the control logic of the multiplexers depends only on the last three bits of the counter, so the register and multiplexer structures are fixed across FFT lengths, resulting in a common architecture for any N-point FFT.
Logic Minimization After logic minimization, the control logic reduces to primitive logic gates, such as AND/OR gates, driven by the least significant bits of butterfly counter B.
Address Sequences (R0~R15)
Address Sequences (R16~R31)
Experimental Results
Conclusions