FFT VLSI Implementation
An-Yeu Wu (吳安宇), Department of Electrical Engineering, National Taiwan University
VLSI Signal Processing
S. He and M. Torkelson, "A new approach to pipeline FFT processor," Proc. IEEE IPPS, pp. 766-770, 1996.
E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, "A fast single-chip implementation of 8192 complex point FFT," IEEE J. Solid-State Circuits, pp. 300-305, March 1995.
Updated on 4/2/2001
FFT Review
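As a quick software reference for this review, here is a minimal radix-2 decimation-in-time (DIT) FFT that can be checked against the direct DFT. It is a textbook formulation chosen for illustration, not the hardware dataflow discussed later. The function names dft and fft_radix2_dit are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * W_N^(n*k), with W_N = exp(-2j*pi/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def fft_radix2_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n_pts = len(x)
    if n_pts == 1:
        return [complex(x[0])]
    even = fft_radix2_dit(x[0::2])  # N/2-point FFT of the even-indexed samples
    odd = fft_radix2_dit(x[1::2])   # N/2-point FFT of the odd-indexed samples
    out = [0j] * n_pts
    for k in range(n_pts // 2):
        t = cmath.exp(-2j * cmath.pi * k / n_pts) * odd[k]  # twiddle factor W_N^k
        out[k] = even[k] + t               # butterfly: upper output
        out[k + n_pts // 2] = even[k] - t  # butterfly: lower output
    return out
```

For N = 16 the recursion has log2(16) = 4 levels of 8 butterflies each (32 butterflies), compared with N^2 = 256 complex multiply-accumulates for the direct DFT.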
Implementation: Two Extreme Methods
Speed: slow <-> fast
Area: small <-> large
Control: complicated <-> simple
One extreme is slow and small but needs complicated control; the other is fast and large but has simple control.
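A first-order way to quantify the two extremes is to count radix-2 butterflies. The sketch assumes that the slow extreme time-shares one butterfly PE over the whole flow graph and that the fast extreme hard-wires one butterfly per flow-graph node. This is an interpretation of the slide, and the function names are hypothetical.

```python
def butterfly_count(n_pts: int) -> int:
    """Radix-2 butterflies in one N-point FFT flow graph: (N/2) * log2(N)."""
    if n_pts < 2 or n_pts & (n_pts - 1):
        raise ValueError("N must be a power of two >= 2")
    return (n_pts // 2) * (n_pts.bit_length() - 1)


def extreme_designs(n_pts: int) -> dict:
    """Hypothetical first-order model of the two extreme mappings of a radix-2 FFT."""
    b = butterfly_count(n_pts)
    return {
        # One butterfly PE time-shared over every node of the flow graph:
        # minimal area, but it needs address generation and sequencing control.
        "single_pe": {"butterfly_units": 1, "cycles_per_frame": b},
        # One hard-wired butterfly per flow-graph node: maximal area, fixed
        # interconnect, and (with pipeline registers) a new frame every cycle.
        "fully_parallel": {"butterfly_units": b, "cycles_per_frame": 1},
    }
```

For N = 256 the two extremes differ by a factor of 1024 in both butterfly units and cycles per frame. Pipelined architectures sit between these two points.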
Design Considerations. The system requirements (e.g., speed, area, power …) decide where a design should sit between these two extremes. To reach a better trade-off, we need more processing elements (PEs), a better PE utilization rate, and a better control scheme.
FFT Processor: Block Diagram
Some Current Architectures: Radix-2 Single-path Delay Feedback (R2SDF), N = 16; Radix-2 Multi-path Delay Commutator (R2MDC), N = 16.
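The single-path delay feedback idea can be illustrated with a behavioral model. The sketch makes these assumptions: a radix-2 decimation-in-frequency (DIF) pipeline; one butterfly per stage with a feedback shift register of N/2, N/4, ..., 1 words (N - 1 words in total); the inter-stage twiddle multiplier folded into each stage's output; each stage's fill latency discarded, so the model is not cycle-accurate in absolute time; and twiddles computed with cmath.exp instead of read from a ROM. The function names are hypothetical.

```python
import cmath
from collections import deque


def r2sdf_stage(stream, half):
    """One R2SDF stage: radix-2 DIF butterfly plus a `half`-word feedback shift register.

    The inter-stage twiddle multiplier is folded into the stage output, and the
    `half`-cycle fill latency is dropped so the next stage sees an aligned frame.
    """
    reg = deque([0j] * half)
    out = []
    for t, x in enumerate(list(stream) + [0j] * half):  # zeros flush the register
        head = reg.popleft()
        pos = t % (2 * half)
        if pos < half:
            # Fill phase: butterfly idle; the new sample enters the delay line and the
            # difference stored during the previous butterfly phase leaves, twiddled.
            out.append(head * cmath.exp(-2j * cmath.pi * pos / (2 * half)))
            reg.append(x)
        else:
            # Butterfly phase: pair x[n] (from the delay line) with x[n + half].
            out.append(head + x)
            reg.append(head - x)
    return out[half:]


def r2sdf_fft(frame):
    """R2SDF behavioral model: log2(N) stages with N/2, N/4, ..., 1 feedback words."""
    n_pts = len(frame)
    bits = n_pts.bit_length() - 1
    if n_pts < 2 or n_pts != 1 << bits:
        raise ValueError("N must be a power of two >= 2")
    stream, half = list(frame), n_pts // 2
    while half >= 1:
        stream = r2sdf_stage(stream, half)
        half //= 2
    # Outputs leave the pipeline in bit-reversed order; reorder to natural order.
    result = [0j] * n_pts
    for i, v in enumerate(stream):
        result[int(format(i, f"0{bits}b")[::-1], 2)] = v
    return result
```

With N = 16 the four stages hold 8 + 4 + 2 + 1 = 15 words. Each butterfly is idle during its fill phase, so in this model its utilization is 50%.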
Some Current Architectures (cont.): Radix-4 Single-path Delay Feedback (R4SDF), N = 256; Radix-4 Single-path Delay Commutator (R4SDC), N = 256; Radix-4 Multi-path Delay Commutator (R4MDC), N = 256.
Comparison
Radix / speed: low <-> high
Control scheme: simple <-> complex
Processing ability per unit: low <-> high
A higher radix gives more speed and more processing per PE, at the cost of more complex control. To combine the advantages of both, further decompose the high-radix PE.
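The radix-2 versus radix-4 trade-off can be made concrete by counting resources. This sketch assumes DIF single-path delay feedback pipelines, where a radix-r stage working on an M-point sub-DFT holds r - 1 feedback registers of M/r words. It also assumes that a complex multiplier sits between consecutive stages unless every twiddle it needs is 1, -j, -1 or +j. The counts are derived from these assumptions, not quoted from the references, and the function names are hypothetical.

```python
def is_trivial(exponents, size):
    """True if every twiddle W_size**e is 1, -j, -1 or +j (a swap/negate, no multiplier)."""
    return all((4 * e) % size == 0 for e in exponents)


def sdf_counts(n_pts, radix):
    """(feedback delay words, non-trivial complex multipliers) for a radix-2 or radix-4
    DIF single-path delay feedback pipeline (hypothetical counting model)."""
    size, delays, mults = n_pts, 0, 0
    while size > 1:
        if size % radix:
            raise ValueError("N must be a power of the radix")
        nxt = size // radix
        delays += (radix - 1) * nxt            # r - 1 feedback registers of size/r words
        if nxt > 1:                            # a multiplier sits before the next stage
            exps = [n * k for n in range(nxt) for k in range(radix)]
            if not is_trivial(exps, size):
                mults += 1
        size = nxt
    return delays, mults


def r22sdf_counts(n_pts):
    """Assumed radix-2^2 SDF: radix-2 delay lines, multipliers only at radix-4 positions."""
    return sdf_counts(n_pts, 2)[0], sdf_counts(n_pts, 4)[1]
```

Under these assumptions, both radix-2 and radix-4 SDF need N - 1 delay words, while radix-4 halves the number of general multipliers (3 instead of 6 for N = 256). The radix-2² entry encodes the idea of keeping radix-2 butterflies while placing multipliers only where a radix-4 pipeline would need them.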
Decomposition Method (1): simply "reuse" the repeated micro unit (the radix-2 butterfly) to build a radix-4 PE.
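To illustrate reusing the micro unit, a radix-4 butterfly (a 4-point DFT) can be assembled from four radix-2 butterflies plus one multiplication by -j. That multiplication needs no real multiplier, because it only swaps the real and imaginary parts and flips one sign. The function names are hypothetical.

```python
def bf2(a, b):
    """Radix-2 butterfly: the repeated micro unit."""
    return a + b, a - b


def mul_minus_j(z: complex) -> complex:
    """Multiply by -j: (re + j*im) * (-j) = im - j*re, i.e. swap parts and negate one."""
    return complex(z.imag, -z.real)


def radix4_butterfly(x0, x1, x2, x3):
    """4-point DFT from four bf2 units and one trivial -j rotation (natural-order output)."""
    a0, a1 = bf2(x0, x2)                # first layer: pairs (x0, x2) and (x1, x3)
    b0, b1 = bf2(x1, x3)
    X0, X2 = bf2(a0, b0)                # second layer, even outputs
    X1, X3 = bf2(a1, mul_minus_j(b1))   # second layer, odd outputs after -j rotation
    return X0, X1, X2, X3
```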
Decomposition Method (2): decompose at the algorithm level, so that an N-point DFT is computed from N/4-point FFTs.
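For reference, the algorithm-level decomposition can be written with a three-dimensional index map. The following is a re-derivation from the DFT definition X(k) = Σ x(n) W_N^{nk}, with W_N = e^{-j2π/N}, in the spirit of the He and Torkelson reference. The identities W_N^{N/2} = -1 and W_N^{N/4} = -j make the inner combination multiplier-free. Only W_N^{n_3(k_1+2k_2)} needs a general multiplier before the N/4-point DFT.

```latex
\begin{aligned}
n &= \left\langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3 \right\rangle_N, \qquad
k = \left\langle k_1 + 2k_2 + 4k_3 \right\rangle_N, \\
&\qquad n_1, n_2, k_1, k_2 \in \{0, 1\}, \quad n_3, k_3 \in \{0, \dots, \tfrac{N}{4} - 1\} \\[4pt]
X(k_1 + 2k_2 + 4k_3)
  &= \sum_{n_3=0}^{N/4-1} \Big[ H(n_3, k_1, k_2)\, W_N^{\,n_3(k_1 + 2k_2)} \Big]\, W_{N/4}^{\,n_3 k_3} \\[4pt]
H(n_3, k_1, k_2)
  &= \underbrace{\Big[ x(n_3) + (-1)^{k_1} x\big(n_3 + \tfrac{N}{2}\big) \Big]}_{\text{BF2I}}
   + (-j)^{(k_1 + 2k_2)}
     \underbrace{\Big[ x\big(n_3 + \tfrac{N}{4}\big) + (-1)^{k_1} x\big(n_3 + \tfrac{3N}{4}\big) \Big]}_{\text{BF2I}}
\end{aligned}
```

The outer sum in H, including the (-j)^{(k_1+2k_2)} factor, is the BF2II butterfly.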
Graphical Explanation (N=16)
Graphical Explanation (cont.): the equations are equivalent to the operations below.
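One way to check that the radix-2² decomposition maps onto butterfly operations is to compute one decomposition step literally and compare it with a direct DFT. The sketch assumes the index map n = N/2·n1 + N/4·n2 + n3 and k = k1 + 2k2 + 4k3. For each (k1, k2), it applies two BF2I butterflies, the trivial (-j)^(k1+2k2) factor and the combining BF2II butterfly. It then multiplies by W_N^(n3(k1+2k2)) and finishes with an N/4-point DFT. That last step uses a direct DFT for brevity; a real design would recurse. The helper names are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT used as the reference and as the N/4-point inner transform."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def radix22_step(x):
    """One radix-2^2 step: N-point DFT from four N/4-point DFTs (N divisible by 4)."""
    n_pts = len(x)
    q = n_pts // 4
    out = [0j] * n_pts
    for k1 in (0, 1):
        for k2 in (0, 1):
            kk = k1 + 2 * k2
            rot = (1, -1j, -1, 1j)[kk]       # (-j)^(k1 + 2*k2): trivial rotation
            inner = []
            for n3 in range(q):
                s = (-1) ** k1
                b_top = x[n3] + s * x[n3 + 2 * q]        # BF2I on (n3, n3 + N/2)
                b_bot = x[n3 + q] + s * x[n3 + 3 * q]    # BF2I on (n3 + N/4, n3 + 3N/4)
                h = b_top + rot * b_bot                  # BF2II combination
                inner.append(h * cmath.exp(-2j * cmath.pi * n3 * kk / n_pts))  # W_N^{n3*kk}
            sub = dft(inner)                             # N/4-point DFT over n3
            for k3 in range(q):
                out[kk + 4 * k3] = sub[k3]               # k = k1 + 2*k2 + 4*k3
    return out
```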
Circuit of BF2I and BF2II
Radix-2² Single-path Delay Feedback (R2²SDF) FFT architecture built with the above technique, for N = 256, compared with the original architecture.
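The R2²SDF pipeline can be sketched as a behavioral model with alternating BF2I and BF2II stages. Both are single-butterfly stages with feedback registers of M/2 and M/4 words. BF2II additionally rotates its lower input by -j (a swap and sign change) when k1 = 1, and one general twiddle multiplier follows each BF2I/BF2II pair. The model makes these assumptions: N = 4^m; the fill latency of each stage is discarded, so timing is not cycle-accurate; twiddles come from cmath.exp rather than a ROM; and the bit-reversed output is reordered in software. The function names are hypothetical.

```python
import cmath
from collections import deque


def mul_minus_j(z):
    """Trivial multiplication by -j: swap real/imaginary parts and negate one."""
    return complex(z.imag, -z.real)


def sdf_butterfly(stream, half, bf2ii=False):
    """Behavioral BF2I (bf2ii=False) or BF2II (bf2ii=True) with a `half`-word
    feedback register; the `half`-cycle fill latency is removed from the output."""
    reg = deque([0j] * half)
    out = []
    for t, x in enumerate(list(stream) + [0j] * half):  # zeros flush the register
        head = reg.popleft()
        if t % (2 * half) < half:
            out.append(head)   # bypass: emit stored difference, load the new sample
            reg.append(x)
        else:
            if bf2ii and t % (4 * half) >= 2 * half:
                x = mul_minus_j(x)  # BF2II: rotate the lower input when k1 = 1
            out.append(head + x)    # butterfly sum leaves immediately
            reg.append(head - x)    # butterfly difference is fed back
    return out[half:]


def twiddle(stream, m):
    """Multiplier after a BF2I/BF2II pair: W_m^(n3*(k1 + 2*k2)) per m-sample block."""
    q = m // 4
    out = []
    for t, v in enumerate(stream):
        i = t % m
        k1, k2, n3 = i // (2 * q), (i // q) % 2, i % q
        out.append(v * cmath.exp(-2j * cmath.pi * n3 * (k1 + 2 * k2) / m))
    return out


def r22sdf_fft(frame):
    """R2^2SDF behavioral model for N = 4**m; returns X[k] in natural order."""
    n_pts = len(frame)
    bits = n_pts.bit_length() - 1
    if n_pts < 4 or n_pts != 1 << bits or bits % 2:
        raise ValueError("this sketch assumes N = 4**m with m >= 1")
    stream, m = list(frame), n_pts
    while m >= 4:
        stream = sdf_butterfly(stream, m // 2)              # BF2I, m/2-word feedback
        stream = sdf_butterfly(stream, m // 4, bf2ii=True)  # BF2II, m/4-word feedback
        if m > 4:
            stream = twiddle(stream, m)                     # one multiplier per pair
        m //= 4
    # The pipeline emits X[k] in bit-reversed order; reorder to natural order.
    result = [0j] * n_pts
    for i, v in enumerate(stream):
        result[int(format(i, f"0{bits}b")[::-1], 2)] = v
    return result
```

For N = 256 this model uses four BF2I/BF2II pairs, 128 + 64 + ... + 1 = 255 feedback words, and three general multipliers (after the pairs with m = 256, 64 and 16).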
Conclusions. FFT applications include radar signal processing, fast convolution, spectrum estimation, and OFDM-based modulation/demodulation. Efficient VLSI architectures (parallel processing) are required for real-time processing. However, most systems still use DSP processors (e.g., TI C3x/C5x) to run fast algorithms such as DIT and DIF FFTs. VLIW (Very Long Instruction Word) processors such as the TI C6x require new programming skills to exploit their two parallel MAC units.