Cost-Effective Pipeline FFT/IFFT VLSI Architecture for DVB-H System
Presented by: Yuan-Chu Yu
Chin-Teng Lin and Yuan-Chu Yu
Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan
NST
Outline
- Introduction
- Radix-4² and Radix-4³ Based FFT/IFFT Algorithms
- Pipeline 4096-Point R4²SDF and R4³SDF Based FFT/IFFT VLSI Architecture
  - Radix-4 Butterfly
  - Memory Structure
  - Constant Multiplier
  - Eight-Folded Complex Multiplier
- Comparison Results
- Conclusion
Introduction
- OFDM modulation: low receiver complexity and high performance on highly dispersive channels.
- Handheld consumer products need a high-throughput, low-power, hardware-efficient FFT/IFFT processor.
- The 4K mode of the Digital Video Broadcasting - Handheld (DVB-H) system requires a 4096-point FFT/IFFT processor.
- Pipeline architecture: regularity, lower operating frequency, and high throughput.
  - Multipath delay commutator (MDC) architecture [4].
  - Single-path delay feedback (SDF) architecture [4, 5, 6]: low hardware cost and high cost-efficiency through tightly scheduled arithmetic operations.
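Radix-4² Based FFT/IFFT Algorithms (1)
- The FFT of the N-point input x[n] is defined as $X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{nk}$ for $0 \le k \le N-1$, where $W_N = e^{-j2\pi/N}$.
- Applying a 3-D linear index map to n and k, where 0 ≤ n1, n2, k1, k2 ≤ 3, gives the common factor algorithm (CFA) form.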
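The index map and the CFA expression did not survive on this slide. A minimal reconstruction, assuming a standard linear index map in the style of [7] (the symbols n3 and k3 and the exact form of the map are assumptions, not taken from the slide), might read:

```latex
\begin{align}
n &= \tfrac{N}{4}\,n_1 + \tfrac{N}{16}\,n_2 + n_3, \qquad
k = k_1 + 4k_2 + 16k_3, \\
  &\quad 0 \le n_1, n_2, k_1, k_2 \le 3, \qquad 0 \le n_3, k_3 \le \tfrac{N}{16} - 1, \\
X[k_1 + 4k_2 + 16k_3]
  &= \sum_{n_3=0}^{N/16-1} \Bigg\{ \sum_{n_2=0}^{3} \bigg[
     \underbrace{\sum_{n_1=0}^{3} x\!\left[\tfrac{N}{4}n_1 + \tfrac{N}{16}n_2 + n_3\right] W_4^{n_1 k_1}}_{\text{first butterfly}}
     \; \underbrace{W_{16}^{\,n_2 k_1}}_{\text{constant multiplier}} \bigg]
     \underbrace{W_4^{\,n_2 k_2}}_{\text{second butterfly}} \Bigg\}
     \; W_N^{\,n_3 (k_1 + 4k_2)} \; W_{N/16}^{\,n_3 k_3}
\end{align}
```

Under this assumption, every cross term of nk that is a multiple of N drops out, leaving two radix-4 butterflies, one small constant rotation by a power of W_16, one general twiddle W_N, and an N/16-point DFT over n3.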
Radix-4² Based FFT/IFFT Algorithms (2)
Labeled parts: first butterfly structure, second butterfly structure, constant multiplier.
- The CFA procedure is applied recursively to the remaining FFTs of length N/16.
- Multiplicative complexity as low as that of the radix-16 algorithm.
- Hardware cost as low as that of the radix-4 algorithm.
- The IFFT uses the same radix-4 butterfly structure with only some sign inversions.
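The recursion described above can be checked numerically. The following is a minimal software sketch, not the authors' hardware: it assumes the index map n = (N/4)n1 + (N/16)n2 + n3, k = k1 + 4k2 + 16k3 (an assumption, since the slide's equations are not legible), and the function names `w`, `dft`, `bf4`, and `fft_r4x2` are hypothetical.

```python
import cmath

def w(n, e):
    """Twiddle factor W_n^e = exp(-j*2*pi*e/n)."""
    return cmath.exp(-2j * cmath.pi * e / n)

def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]

def bf4(a, b, c, d):
    """Radix-4 butterfly: only additions and multiplications by +/-1, +/-j."""
    return (a + b + c + d,
            a - 1j * b - c + 1j * d,
            a - b + c - d,
            a + 1j * b - c - 1j * d)

def fft_r4x2(x):
    """Radix-4^2 decimation-in-frequency FFT; len(x) must be a power of 16."""
    n = len(x)
    if n == 1:
        return list(x)
    assert n % 16 == 0, "length must be a power of 16"
    m = n // 16
    sub = [[0j] * m for _ in range(16)]          # one sub-sequence per k1 + 4*k2
    for n3 in range(m):
        b1 = []
        for n2 in range(4):
            # First butterfly over n1, then the constant multiplier W_16^(n2*k1).
            col = bf4(*(x[(n // 4) * n1 + m * n2 + n3] for n1 in range(4)))
            b1.append([col[k1] * w(16, n2 * k1) for k1 in range(4)])
        for k1 in range(4):
            # Second butterfly over n2, then the general twiddle W_N^(n3*(k1+4*k2)).
            b2 = bf4(*(b1[n2][k1] for n2 in range(4)))
            for k2 in range(4):
                sub[k1 + 4 * k2][n3] = b2[k2] * w(n, n3 * (k1 + 4 * k2))
    out = [0j] * n
    for q in range(16):
        y = fft_r4x2(sub[q])                      # remaining FFTs of length N/16
        for k3 in range(m):
            out[q + 16 * k3] = y[k3]
    return out
```

Only the multiplications by `w(n, ...)` are general complex multiplications; the `w(16, ...)` factors take a handful of fixed values, which is what allows a constant multiplier in hardware.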
Radix-4³ Based FFT/IFFT Algorithms (1)
- Applying a 4-D linear index map to n and k, where 0 ≤ n1, n2, n3, k1, k2, k3 ≤ 3, gives the common factor algorithm (CFA) form.
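The 4-D map and its CFA expression are likewise not legible here. A hedged reconstruction, assuming a linear index map in the style of [7] (the symbols n4 and k4 and the form of the map are assumptions), could be written as:

```latex
\begin{align}
n &= \tfrac{N}{4}\,n_1 + \tfrac{N}{16}\,n_2 + \tfrac{N}{64}\,n_3 + n_4, \qquad
k = k_1 + 4k_2 + 16k_3 + 64k_4, \\
  &\quad 0 \le n_1, n_2, n_3, k_1, k_2, k_3 \le 3, \qquad 0 \le n_4, k_4 \le \tfrac{N}{64} - 1, \\
W_N^{\,nk} &= W_4^{\,n_1 k_1} \; W_{16}^{\,n_2 k_1} \; W_4^{\,n_2 k_2} \;
              W_{64}^{\,n_3 (k_1 + 4k_2)} \; W_4^{\,n_3 k_3} \;
              W_N^{\,n_4 (k_1 + 4k_2 + 16k_3)} \; W_{N/64}^{\,n_4 k_4}
\end{align}
```

Read this way, the three W_4 factors correspond to the three radix-4 butterflies, the W_16 and W_64 factors to two constant multipliers, W_N to the single general complex multiplier per radix-64 stage, and W_{N/64} to the remaining FFTs of length N/64.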
Radix-4³ Based FFT/IFFT Algorithms (2)
Labeled parts: first butterfly structure, second butterfly structure, third butterfly structure, constant multiplier.
- The CFA procedure is applied recursively to the remaining FFTs of length N/64.
- Multiplicative complexity as low as that of the radix-64 algorithm.
- Hardware cost as low as that of the radix-4 algorithm.
- The IFFT uses the same radix-4 butterfly structure with only some sign inversions.
The Proposed SDF-Based Architecture
- R4²SDF: 6 radix-4 butterfly stages, a 4095-word shift register, 3 constant multipliers, and 2 complex multipliers.
- R4³SDF: 6 radix-4 butterfly stages, a 4095-word shift register, 4 constant multipliers, and 1 complex multiplier.
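The resource counts above can be reproduced with a short calculation. This is a sketch under two assumptions that the slide does not state explicitly: stage s of an SDF pipeline holds 3·N/4^s words of feedback memory, and a general complex multiplier sits after every `group`-th butterfly stage (2 for R4²SDF, 3 for R4³SDF) with constant multipliers after the others. The function names are hypothetical.

```python
def ilog4(n):
    """Exact integer log base 4; n must be a power of 4."""
    stages = 0
    while n > 1:
        assert n % 4 == 0, "n must be a power of 4"
        n //= 4
        stages += 1
    return stages

def sdf_resources(n_points, group):
    """Count stages, feedback words and multipliers for an R4^group SDF pipeline."""
    stages = ilog4(n_points)
    # Assumed: stage s (1-based) stores three delay lines of N / 4^s words each.
    feedback_words = [3 * n_points // 4 ** s for s in range(1, stages + 1)]
    constant = general = 0
    for s in range(1, stages):          # no multiplier after the last stage
        if s % group == 0:
            general += 1                # full complex multiplier between groups
        else:
            constant += 1               # constant multiplier inside a group
    return {
        "stages": stages,
        "shift_register_words": sum(feedback_words),
        "constant_multipliers": constant,
        "complex_multipliers": general,
    }

if __name__ == "__main__":
    print("R4^2 SDF:", sdf_resources(4096, 2))
    print("R4^3 SDF:", sdf_resources(4096, 3))
```

For N = 4096 the feedback memories sum to 3072 + 768 + 192 + 48 + 12 + 3 = 4095 words, matching the shift-register size quoted on the slide.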
Architecture Analysis
Multiplication complexity in 4096-point FFT/IFFT computation:
                     R2²SDF [5]   R4²SDF   R4³SDF
Multiplication #
Normalized ratio
Hardware requirement in 4096-point FFT/IFFT computation:
                     R2²SDF [5]   R4²SDF   R4³SDF
Butterfly stages     12           6        6
Shift register       4095         4095     4095
Constant mul.        0            3        4
Complex mul.         5            2        1
Radix-4 Butterfly
- Butterfly hardware cost: four four-input complex adders, no multiplier.
- SDF architecture:
  - Fully pipelined with high utilization
  - Highly regular
  - Highly efficient memory structure
  - Simpler routing
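A minimal sketch of such a multiplier-free butterfly follows. Each output is one four-input sum, and multiplication by ±j is a swap of real and imaginary parts with one sign change. The function name and the `inverse` flag are hypothetical; the flag illustrates the sign inversions used for the IFFT.

```python
def radix4_butterfly(a, b, c, d, inverse=False):
    """Four four-input complex sums; no general multiplier is needed."""
    a, b, c, d = complex(a), complex(b), complex(c), complex(d)

    def rot(z):
        # Forward FFT: multiply by -j (swap parts, negate the new imaginary part).
        # Inverse FFT: multiply by +j (swap parts, negate the new real part).
        if inverse:
            return complex(-z.imag, z.real)
        return complex(z.imag, -z.real)

    y0 = a + b + c + d
    y1 = a + rot(b) - c - rot(d)
    y2 = a - b + c - d
    y3 = a - rot(b) - c + rot(d)
    return y0, y1, y2, y3

if __name__ == "__main__":
    print(radix4_butterfly(1, 2, 3, 4))
    print(radix4_butterfly(1, 2, 3, 4, inverse=True))
```

For the inputs 1, 2, 3, 4 the forward call yields the 4-point DFT (10, -2+2j, -2, -2-2j), and the inverse call swaps the second and fourth outputs.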
Memory Structure and Timing Sequence
Four modes in the butterfly:
1. Mode 0~2: data reordering
2. Mode 3: radix-4 FFT/IFFT computation
Delay feedback memory:
1. Mode 0~2: store the serial input data and push out the FFT/IFFT results
2. Mode 3: store the FFT/IFFT results and push out the data
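The four-mode schedule can be illustrated with a behavioural model of one SDF stage. This is a sketch, not the authors' RTL: the three feedback memories are modelled as indexed arrays rather than shift registers, `L` (words per memory) and the function names are hypothetical, and one sample enters and one leaves per clock.

```python
def bf4(a, b, c, d):
    """Radix-4 butterfly (forward FFT)."""
    return (a + b + c + d,
            a - 1j * b - c + 1j * d,
            a - b + c - d,
            a + 1j * b - c - 1j * d)

def sdf_stage(samples, L):
    """Behavioural radix-4 SDF stage with three feedback memories of L words."""
    mem = [[0j] * L for _ in range(3)]
    out = []
    for t, x in enumerate(samples):
        mode, pos = (t // L) % 4, t % L
        if mode < 3:
            # Modes 0~2: push out a stored result, store the incoming sample.
            out.append(mem[mode][pos])
            mem[mode][pos] = x
        else:
            # Mode 3: compute the butterfly, output the first result directly
            # and store the other three for the following modes 0~2.
            y = bf4(mem[0][pos], mem[1][pos], mem[2][pos], x)
            out.append(y[0])
            mem[0][pos], mem[1][pos], mem[2][pos] = y[1], y[2], y[3]
    return out

if __name__ == "__main__":
    # One 4-sample block followed by three flush samples (L = 1).
    print(sdf_stage([1, 2, 3, 4, 0, 0, 0], 1))
```

With L = 1 the stage has a latency of three samples: the first three outputs are the initial memory contents, and the next four are the butterfly results of the block 1, 2, 3, 4.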
Constant Multiplier
Reduced-cost constant multiplier:
1. Shifters and adders
2. Complex-conjugate symmetry rule: 83%
3. Sub-expression elimination algorithm [8]: 20% for shifters, 50% for adders
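The shift-and-add idea and sub-expression elimination can be sketched on a single coefficient. The example assumes an 8-bit fractional approximation 181/256 ≈ cos(π/4), which is a hypothetical word length chosen for illustration; the savings it shows are not the percentages quoted on the slide.

```python
def times_181_naive(x):
    """181 = 128 + 32 + 16 + 4 + 1: five shifted terms, four adders."""
    return (x << 7) + (x << 5) + (x << 4) + (x << 2) + x

def times_181_cse(x):
    """181 = 0b10110101 contains the pattern 0b101 twice; share it."""
    t = (x << 2) + x                    # t = 5x, the common sub-expression
    return (t << 5) + (t << 2) + x      # 160x + 20x + x, three adders in total

def mul_inv_sqrt2(x):
    """Approximate x / sqrt(2) in fixed point (8 fractional bits, floor rounding)."""
    return times_181_cse(x) >> 8

if __name__ == "__main__":
    print(times_181_naive(100), times_181_cse(100), mul_inv_sqrt2(1000))
```

Both functions compute the same product; the second reuses the sub-expression 5x and therefore needs one adder fewer.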
Eight-Folded Complex Multiplier
Reduced coefficient ROM size:
1. Complex-conjugate symmetry rule
2. Sub-expression elimination [8]
H Addr.      Mode   ROM Addr.   Data Mode   ROM data
0~511        0                  0           a+jb
512~1023     1      H[9:0]      1           b+ja
1024~                                       b-ja
1536~2047    1      H[9:0]      3           -a+jb
2048~                                       a-jb
2560~3071    1      H[9:0]      5           -b-ja
3072~3583    0                  6           b-ja
3584~4095    1      H[9:0]      7           a-jb
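The general technique behind an eight-fold coefficient ROM can be sketched as follows. This is a generic octant-symmetry lookup, not a reproduction of the address and data modes in the table above (several of its cells are not legible): the ROM layout with N/8 + 1 entries of (cos, sin) and the function names are assumptions.

```python
import cmath
import math

def make_rom(n):
    """Store (cos, sin) for the first octant only: n/8 + 1 entries."""
    return [(math.cos(2 * math.pi * m / n), math.sin(2 * math.pi * m / n))
            for m in range(n // 8 + 1)]

def twiddle(k, n, rom):
    """Rebuild W_n^k = cos(t) - j*sin(t), t = 2*pi*k/n, from the folded ROM."""
    octant, r = divmod(k % n, n // 8)
    # Even octants read the ROM forwards, odd octants read it mirrored.
    c, s = rom[r if octant % 2 == 0 else n // 8 - r]
    # (cos t, sin t) for each octant, using swaps and sign changes only.
    cos_t, sin_t = [(c, s), (s, c), (-s, c), (-c, s),
                    (-c, -s), (-s, -c), (s, -c), (c, -s)][octant]
    return complex(cos_t, -sin_t)

if __name__ == "__main__":
    rom = make_rom(4096)
    err = max(abs(twiddle(k, 4096, rom) - cmath.exp(-2j * cmath.pi * k / 4096))
              for k in range(4096))
    print(len(rom), err)
```

In this sketch the 4096 twiddle factors are served from 513 stored pairs; the three address bits that select the octant play a role comparable to the mode columns of the table.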
Hardware Cost Comparisons
Area conversion [5][9]: a complex multiplier and a complex memory are counted as 50 and 1.3 complex adders, respectively.
Pipeline       Mult.        Complex        Complex            Complex   Area Index
Architecture   Complexity   Mult. #        Adders #           Mem. #    (4096 points)
R2SDF          Radix-2      log₂N − 2      2·log₂N            N
R4SDF          Radix-4      log₄N − 1      8·log₄N            N
R8SDF          Radix-8      log₈N − 1      (24+2T)·log₈N      N
R2²SDF         Radix-2²     log₄N − 1      4·log₄N            N
R2³SDF         Radix-2³     2(log₈N − 1)   6·log₈N            N
R2MDC          Radix-2      log₂N − 2      2·log₂N            1.5N
R2²MDC         Radix-2²     log₂N − 2      2·log₂N            1.5N
R4MDC          Radix-4      3·log₄N − 3    4·log₂N            2.5N
R8MDC          Radix-8      7·log₈N − 7    (24+2T)·log₈N      4.5N
R4²SDF         Radix-4²     log₁₆N − 1     (16+T)·log₁₆N      N
R4³SDF         Radix-4³     log₆₄N − 1     (24+2T)·log₆₄N     N
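The complex-multiplier formulas in the table can be evaluated for N = 4096 with a few lines of code. The sketch below transcribes only the "Complex Mult. #" column; the helper `area_equivalent` assumes that the area index is the plain sum 50·multipliers + adders + 1.3·memory words, which the slide does not spell out, and it is left unevaluated because the slide does not define T. All names are hypothetical.

```python
def ilog(n, base):
    """Exact integer logarithm; n must be a power of base."""
    count = 0
    while n > 1:
        assert n % base == 0, "n must be a power of base"
        n //= base
        count += 1
    return count

# "Complex Mult. #" column, transcribed from the table above.
COMPLEX_MULTS = {
    "R2SDF":   lambda n: ilog(n, 2) - 2,
    "R4SDF":   lambda n: ilog(n, 4) - 1,
    "R8SDF":   lambda n: ilog(n, 8) - 1,
    "R2^2SDF": lambda n: ilog(n, 4) - 1,
    "R2^3SDF": lambda n: 2 * (ilog(n, 8) - 1),
    "R2MDC":   lambda n: ilog(n, 2) - 2,
    "R2^2MDC": lambda n: ilog(n, 2) - 2,
    "R4MDC":   lambda n: 3 * ilog(n, 4) - 3,
    "R8MDC":   lambda n: 7 * ilog(n, 8) - 7,
    "R4^2SDF": lambda n: ilog(n, 16) - 1,
    "R4^3SDF": lambda n: ilog(n, 64) - 1,
}

def area_equivalent(mults, adders, mem_words):
    """Assumed area index in complex-adder units (weights 50 and 1.3 from the slide)."""
    return 50 * mults + adders + 1.3 * mem_words

if __name__ == "__main__":
    for name, formula in COMPLEX_MULTS.items():
        print(f"{name:8s} {formula(4096)}")
```

For N = 4096 the formulas give 5, 2, and 1 complex multipliers for R2²SDF, R4²SDF, and R4³SDF, which agrees with the hardware-requirement counts given earlier in the deck.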
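Hardware Utilization Rate Comparisons
Utilization rate of complex multipliers, complex adders, and complex memory per pipeline architecture:
- R2SDF: 50 %, 100 %
- R4SDF: multipliers 75 %, adders 25 %, memory 100 %
- R8SDF: multipliers 87.5 %, adders 12.5 %, memory 100 %
- R2²SDF: multipliers 75 %, adders 50 %, memory 100 %
- R2³SDF: multipliers 87.5 %, adders 50 %, memory 100 %
- R2MDC: 50 %
- R2²MDC: 37.5 %, 50 %
- R4MDC: 25 %
- R8MDC: 12.5 %
- R4²SDF: multipliers 87.5 %, adders 56.25 %, memory 100 %
- R4³SDF: multipliers 96.9 %, adders 60.42 %, memory 100 %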
Conclusion
- The proposed R4²SDF and R4³SDF designs are highly cost-effective:
  - Multiplicative complexity as low as that of the radix-16 and radix-64 algorithms
  - Lower hardware cost (smaller chip area)
  - Higher hardware utilization rate
- The R4³SDF design achieves better performance than R4²SDF and the other pipeline architectures in 4096-point FFT/IFFT processor design.
References
[1] ETSI, "Digital Video Broadcasting (DVB): Transmission System for Handheld Terminals (DVB-H)," ETSI EN
[2] R. K. Kolagotla, J. Fridman, M. M. Hoffiman, W. C. Anderson, B. C. Aldrich, D. B. Witt, M. S. Allen, R. R. Dunton and L. A. Booth, "A 333-MHz dual-MAC DSP architecture for next-generation wireless application," IEEE Inter. Conf. on Acou., Speech, and Signal Proc., vol. 2, pp , May
[3] W. Li and L. Wanhammar, "A pipeline FFT processor," in Proc. IEEE Workshop on Signal Processing Systems, 1999, pp
[4] S. He and M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," in Proc. URSI Int. Symp. Signals, Syst., Electron., pp ,
[5] Wei-Hsin Chang and Truong Nguyen, "An OFDM-specified lossless FFT architecture," IEEE Trans. on Circuits and Systems I, vol. 53, issue 6, pp , June
[6] W. C. Yeh and C. W. Jen, "High-speed and low-power split-radix FFT," IEEE Trans. on Signal Processing, vol. 51, no. 3, pp , Mar
[7] C. S. Burrus, "Index mapping for multidimensional formulation of the DFT and convolution," IEEE Trans. Acoust., Speech, Signal Processing, ASSP-25(3): , June
[8] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley,
[9] T. Sansaloni, A. Perez-Pascual, V. Torres and J. Valls, "Efficient pipeline FFT processors for wireless LAN MIMO-OFDM systems," Electronics Letters, vol. 41, no. 19, Sep
Contact Information
Address: No. 12, Innovation 1st Rd., Science-Based Industrial Park, Elan Microelectronics Corporation, Hsinchu City, Taiwan 308 R.O.C.
Thanks for your attention!