Cost-Effective Pipeline FFT/IFFT VLSI Architecture for DVB-H System

Presentation transcript:

Cost-Effective Pipeline FFT/IFFT VLSI Architecture for DVB-H System
Presented by: Yuan-Chu Yu
Authors: Chin-Teng Lin and Yuan-Chu Yu, Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan

NST Outline
- Introduction
- Radix-4^2 and Radix-4^3 based FFT/IFFT Algorithms
- Pipeline 4096-Point R4^2SDF and R4^3SDF based FFT/IFFT VLSI Architecture
- Radix-4 Butterfly
- Memory Structure
- Constant Multiplier
- Eight-Folded Complex Multiplier
- Comparison Results
- Conclusion

Introduction
- OFDM modulation: low receiver complexity and high performance on highly dispersive channels
- Handheld consumer products need a high-throughput, low-power, hardware-efficient FFT/IFFT processor
- The 4K mode of the Digital Video Broadcasting - Handheld (DVB-H) system requires a 4096-point FFT/IFFT processor
- Pipeline architecture: regularity, lower operating frequency, and high throughput
- Multipath delay commutator (MDC) architecture [4]
- Single-path delay feedback (SDF) architecture [4, 5, 6]: low hardware cost and high cost-efficiency through tightly scheduled arithmetic operations

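Radix-4^2 based FFT/IFFT Algorithms (1)
The FFT of the N-point input x[n] is defined as
$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \qquad k = 0, 1, \dots, N-1,$$
where $W_N = e^{-j 2\pi / N}$.
Applying a 3-D linear index map to n and k, where 0 ≤ n1, n2, k1, k2 ≤ 3, gives the common factor algorithm (CFA) form.
[The index-map and CFA equations were images on the slide and are missing from this transcript.]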
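Because the slide's own equations are missing, the following is a reconstruction under an assumption: it uses the usual index map for cascaded radix-4 decompositions, with n3 and k3 running from 0 to N/16 - 1. The slide's notation may differ, but the structure (two radix-4 butterflies with a W_16 constant multiplication between them, followed by one general twiddle multiplication) matches the labels on the next slide.

```latex
\begin{aligned}
n &= \tfrac{N}{4} n_1 + \tfrac{N}{16} n_2 + n_3, \qquad
k = k_1 + 4 k_2 + 16 k_3, \\[4pt]
W_N^{nk} &= W_4^{n_1 k_1}\, W_{16}^{n_2 k_1}\, W_4^{n_2 k_2}\,
            W_N^{n_3 (k_1 + 4 k_2)}\, W_{N/16}^{n_3 k_3}, \\[4pt]
X[k_1 + 4 k_2 + 16 k_3] &=
\sum_{n_3=0}^{N/16-1} \Biggl\{ \Biggl[ \sum_{n_2=0}^{3}
\Bigl( \underbrace{\sum_{n_1=0}^{3}
x\bigl[\tfrac{N}{4} n_1 + \tfrac{N}{16} n_2 + n_3\bigr] W_4^{n_1 k_1}}_{\text{first butterfly}} \Bigr)
\underbrace{W_{16}^{n_2 k_1}}_{\text{constant multiplier}}
W_4^{n_2 k_2} \Biggr]
W_N^{n_3 (k_1 + 4 k_2)} \Biggr\} W_{N/16}^{n_3 k_3}
\end{aligned}
```

The second line follows by expanding the product nk and dropping every term that is a multiple of N, since $W_N^{N} = 1$. The sum over n2 is the second radix-4 butterfly, and the outer sum over n3 is an ordinary DFT of length N/16.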

Radix-4^2 based FFT/IFFT Algorithms (2)
- Labeled parts of the CFA equation: first butterfly structure, constant multiplier, second butterfly structure
- The CFA procedure is applied recursively to the remaining FFTs of length N/16
- Multiplicative complexity as low as that of the radix-16 algorithm
- Hardware cost as low as that of the radix-4 algorithm
- The IFFT uses a similar radix-4 butterfly structure, with only some sign inversions
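A quick numerical check can make the decomposition concrete. The sketch below is illustrative only and is not the authors' implementation: it assumes the index map n = (N/4)n1 + (N/16)n2 + n3, k = k1 + 4k2 + 16k3, and all function names are hypothetical. It performs one radix-4^2 step and computes the remaining N/16-point DFTs directly instead of recursing.

```python
import cmath


def w(n_points, exponent):
    """Twiddle factor W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def dft(x):
    """Direct O(N^2) DFT, used as the reference and for the sub-DFTs."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]


def radix4_2_step(x):
    """One radix-4^2 decomposition step (assumed index map, see lead-in)."""
    n = len(x)
    if n % 16:
        raise ValueError("length must be a multiple of 16")
    m = n // 16  # length of the remaining DFTs
    out = [0j] * n
    for k1 in range(4):
        for k2 in range(4):
            z = []  # input of the remaining length-N/16 DFT
            for n3 in range(m):
                acc = 0j
                for n2 in range(4):
                    # First radix-4 butterfly: sum over n1.
                    bf1 = sum(x[(n // 4) * n1 + m * n2 + n3] * w(4, n1 * k1)
                              for n1 in range(4))
                    # Constant multiplier W_16^(n2*k1), then the term of
                    # the second radix-4 butterfly (sum over n2).
                    acc += bf1 * w(16, n2 * k1) * w(4, n2 * k2)
                # General complex (twiddle) multiplier.
                z.append(acc * w(n, n3 * (k1 + 4 * k2)))
            sub = dft(z)
            for k3 in range(m):
                out[k1 + 4 * k2 + 16 * k3] = sub[k3]
    return out
```

Only the factor W_N^(n3(k1+4k2)) needs a general complex multiplier; the W_4 factors are sign changes and real/imaginary swaps, and the W_16 factors come from a small fixed set.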

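Radix-4^3 based FFT/IFFT Algorithms (1)
Applying a 4-D linear index map to n and k, where 0 ≤ n1, n2, n3, k1, k2, k3 ≤ 3, gives the common factor algorithm (CFA) form.
[The index-map and CFA equations were images on the slide and are missing from this transcript.]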
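As with the radix-4^2 case, the equations themselves did not survive, so the following is a reconstruction under the assumption of the usual cascaded radix-4 index map, with n4 and k4 running from 0 to N/64 - 1. The slide's exact notation may differ.

```latex
\begin{aligned}
n &= \tfrac{N}{4} n_1 + \tfrac{N}{16} n_2 + \tfrac{N}{64} n_3 + n_4, \qquad
k = k_1 + 4 k_2 + 16 k_3 + 64 k_4, \\[4pt]
W_N^{nk} &= W_4^{n_1 k_1}\;
            W_{16}^{n_2 k_1}\, W_4^{n_2 k_2}\;
            W_{64}^{n_3 (k_1 + 4 k_2)}\, W_4^{n_3 k_3}\;
            W_N^{n_4 (k_1 + 4 k_2 + 16 k_3)}\, W_{N/64}^{n_4 k_4}
\end{aligned}
```

Every term of the product nk that is a multiple of N has been dropped. Under this map the three $W_4$ factors are the three radix-4 butterflies, $W_{16}^{n_2 k_1}$ and $W_{64}^{n_3 (k_1 + 4 k_2)}$ are the two constant multiplications per radix-64 stage, and $W_N^{n_4 (k_1 + 4 k_2 + 16 k_3)}$ is the only general complex multiplication before the remaining DFT of length N/64.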

Radix-4^3 based FFT/IFFT Algorithms (2)
- Labeled parts of the CFA equation: first butterfly structure, second butterfly structure, third butterfly structure, constant multiplier
- The CFA procedure is applied recursively to the remaining FFTs of length N/64
- Multiplicative complexity as low as that of the radix-64 algorithm
- Hardware cost as low as that of the radix-4 algorithm
- The IFFT uses a similar radix-4 butterfly structure, with only some sign inversions

The Proposed SDF based Architecture
- R4^2SDF: 6 radix-4 butterfly stages, a 4095-word shift register, 3 constant multipliers, and 2 complex multipliers
- R4^3SDF: 6 radix-4 butterfly stages, a 4095-word shift register, 4 constant multipliers, and 1 complex multiplier

Architecture Analysis
Multiplication complexity in 4096-point FFT/IFFT computation (the numeric values are missing from this transcript):

                    R2^2SDF [5]   R4^2SDF   R4^3SDF
Multiplication #
Normalized ratio

Hardware requirement in 4096-point FFT/IFFT computation:

                    R2^2SDF [5]   R4^2SDF   R4^3SDF
Butterfly stages    12            6         6
Shift register      4095          4095      4095
Constant mul.       0             3         4
Complex mul.        5             2         1
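The hardware counts for the two proposed designs follow from simple counting, which the sketch below reproduces. It is an illustration under assumptions rather than the authors' derivation: it assumes each radix-4^k group contains k radix-4 butterflies and k - 1 constant multipliers, that one complex multiplier sits between consecutive groups, and that each radix-4 SDF stage needs three delay lines. The function names are hypothetical.

```python
def exact_log(n, base):
    """Integer logarithm; raises if n is not an exact power of base."""
    count = 0
    while n > 1:
        if n % base:
            raise ValueError(f"{n} is not a power of {base}")
        n //= base
        count += 1
    return count


def sdf_resources(n_points, k):
    """Resource counts for a radix-4^k SDF pipeline (k = 2 or 3)."""
    groups = exact_log(n_points, 4 ** k)   # radix-4^k groups in the pipeline
    stages = groups * k                    # radix-4 butterfly stages
    # Stage s (0-based) feeds back three delay lines of N / 4^(s+1) words.
    shift_words = sum(3 * (n_points // 4 ** (s + 1)) for s in range(stages))
    return {
        "butterfly_stages": stages,
        "shift_register_words": shift_words,
        "constant_multipliers": groups * (k - 1),
        "complex_multipliers": groups - 1,
    }


if __name__ == "__main__":
    for k in (2, 3):
        print(f"R4^{k}SDF:", sdf_resources(4096, k))
```

For N = 4096 the delay lines sum to 3 x (1024 + 256 + 64 + 16 + 4 + 1) = 4095 words, which is the shift-register size quoted on the slides.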

Radix-4 Butterfly
Butterfly hardware cost:
- Four four-input complex adders, with no multiplier
SDF architecture:
- Fully pipelined with high utilization
- Highly regular
- Highly efficient memory structure
- Lower routing complexity
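To illustrate why the butterfly needs no multiplier, here is a minimal sketch of a 4-point butterfly written as four four-input sums. Multiplication by +/-j is a swap of real and imaginary parts with one sign change, and the `inverse` flag (a hypothetical parameter name) shows the sign inversion that turns the FFT butterfly into the IFFT butterfly. The IFFT output here is unscaled.

```python
def radix4_butterfly(a, b, c, d, inverse=False):
    """4-point DFT (or unscaled inverse DFT) of (a, b, c, d).

    Uses only additions, subtractions, and real/imaginary swaps.
    """
    a, b, c, d = complex(a), complex(b), complex(c), complex(d)

    def rot(z):
        # Forward: z * (-j). Inverse: z * (+j). No multiplier needed.
        if inverse:
            return complex(-z.imag, z.real)
        return complex(z.imag, -z.real)

    rb, rd = rot(b), rot(d)
    y0 = a + b + c + d
    y1 = a + rb - c - rd
    y2 = a - b + c - d
    y3 = a - rb - c + rd
    return y0, y1, y2, y3
```

Applying the forward butterfly, then the inverse butterfly, and dividing by 4 returns the original four samples.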

Memory Structure and Timing Sequence
Four modes in the butterfly:
1. Mode 0~2: data reordering
2. Mode 3: radix-4 FFT/IFFT computation
Delay feedback memory:
1. Mode 0~2: store the serial input data and push out the FFT/IFFT results
2. Mode 3: store the FFT/IFFT results and push out the data
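The four-mode schedule can be sketched as a small behavioral model of one radix-4 SDF stage. This is an assumption-laden illustration, not the authors' RTL: the feedback memory is modeled as three indexed buffers of `delay` words rather than true shift registers, the output order (y0 in mode 3, then y1, y2, y3 in the following modes 0 to 2) is one plausible schedule, and all names are hypothetical.

```python
def butterfly4(a, b, c, d):
    """Forward 4-point DFT of (a, b, c, d)."""
    return (a + b + c + d,
            a - 1j * b - c + 1j * d,
            a - b + c - d,
            a + 1j * b - c - 1j * d)


def sdf_stage(samples, delay):
    """Stream samples through one radix-4 SDF stage, one sample per cycle."""
    mem = [[0j] * delay for _ in range(3)]  # delay feedback memory
    out = []
    for t, x in enumerate(samples):
        mode = (t // delay) % 4
        i = t % delay
        if mode < 3:
            # Modes 0~2: push out a stored result, store the incoming sample.
            out.append(mem[mode][i])
            mem[mode][i] = x
        else:
            # Mode 3: butterfly on three stored samples plus the input;
            # one result leaves immediately, three are stored back.
            y0, y1, y2, y3 = butterfly4(mem[0][i], mem[1][i], mem[2][i], x)
            out.append(y0)
            mem[0][i], mem[1][i], mem[2][i] = y1, y2, y3
    return out
```

With this schedule the stage has a latency of 3 x `delay` cycles, and the memory is in use on every cycle, which is consistent with the 100 % memory utilization the slides report for SDF architectures.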

Constant Multiplier
Reduced constant multiplier:
1. Shifters and adders
2. Complex conjugate symmetry rule: 83%
3. Sub-expression elimination algorithm [8]: 20% for shifters, 50% for adders
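A small sketch of the shift-and-add idea follows. It is not the slide's circuit: the 8-bit coefficient 0.70703125 (an approximation of cos(pi/4)), the function names, and the chosen shared sub-expression are all assumptions for illustration. The first helper also assumes the constant multiplier applies W_16^(n2*k1) with 0 <= n2, k1 <= 3, as in the standard radix-4^2 decomposition.

```python
def w16_exponents():
    """Distinct exponents e for which W_16^e is needed (e = n2*k1 mod 16)."""
    return sorted({(n2 * k1) % 16 for n2 in range(4) for k1 in range(4)})


def mul_cos_pi_4(x):
    """x * 0.70703125 for a non-negative integer x, shifts and adds only.

    0.70703125 = 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 (four adders).
    """
    return (x >> 1) + (x >> 3) + (x >> 4) + (x >> 6) + (x >> 8)


def mul_cos_pi_4_shared(x):
    """Same coefficient with the repeated bit pattern '101' computed once.

    t = x + x/4 is reused twice, so only three adders are needed.
    Truncation may differ from mul_cos_pi_4 in the last bit when x is
    not a multiple of 256.
    """
    t = x + (x >> 2)
    return (t >> 1) + (t >> 4) + (x >> 8)
```

Under this assumption only seven exponents occur, and their real and imaginary parts are, up to sign and swapping, just 1, 0, cos(pi/8), sin(pi/8), and cos(pi/4), which is why a handful of hard-wired shift-and-add networks can replace a general multiplier.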

Eight-Folded Complex Multiplier
Reduced coefficient ROM size:
1. Complex conjugate symmetry rule
2. Sub-expression elimination [8]

H            Addr. mode   ROM addr.   Data mode   ROM data
0~511        0                        0           a+jb
512~1023     1            H[9:0]      1           b+ja
1024~1535    0                        2           b-ja
1536~2047    1            H[9:0]      3           -a+jb
2048~2559    0                        4           a-jb
2560~3071    1            H[9:0]      5           -b-ja
3072~3583    0                        6           b-ja
3584~4095    1            H[9:0]      7           a-jb
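The eight-fold idea can be sketched in software as follows. This is an independent illustration derived from trigonometric symmetry, not a transcription of the slide's mode table (whose sign conventions are partly lost in this transcript). It assumes W_N^k = cos(2*pi*k/N) - j*sin(2*pi*k/N), stores the first octant including its end point, and uses hypothetical names throughout.

```python
import math

N = 4096
OCTANT = N // 8

# Coefficient ROM: (cos, sin) for the first octant only, end point included.
ROM = [(math.cos(2 * math.pi * i / N), math.sin(2 * math.pi * i / N))
       for i in range(OCTANT + 1)]


def twiddle(k):
    """Return W_N^k reconstructed from the first-octant ROM."""
    k %= N
    quadrant, r = divmod(k, N // 4)
    if r <= OCTANT:
        c, s = ROM[r]
    else:
        # Mirror about pi/4: cos and sin swap roles, address counts down.
        s, c = ROM[N // 4 - r]
    re, im = c, -s
    for _ in range(quadrant):
        re, im = im, -re  # multiply by -j for each full quadrant
    return complex(re, im)
```

The ROM holds 513 entries instead of 4096. The "address mode" corresponds to whether the address counts up or is mirrored, and the "data mode" to the swap and sign pattern applied to the stored pair.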

Hardware Cost Comparisons
Area conversion [5][9]: one complex multiplier and one complex memory word are counted as 50 and 1.3 complex adders, respectively.
The values of the Area Index (4096 points) column are missing from this transcript.

Pipeline       Mult.         Complex           Complex              Complex
architecture   complexity    mult. #           adders #             mem. #
R2SDF          Radix-2       log2 N - 2        2 log2 N             N
R4SDF          Radix-4       log4 N - 1        8 log4 N             N
R8SDF          Radix-8       log8 N - 1        (24+2T) log8 N       N
R2^2SDF        Radix-2^2     log4 N - 1        4 log4 N             N
R2^3SDF        Radix-2^3     2(log8 N - 1)     6 log8 N             N
R2MDC          Radix-2       log2 N - 2        2 log2 N             1.5N
R2^2MDC        Radix-2^2     log2 N - 2        2 log2 N             1.5N
R4MDC          Radix-4       3 log4 N - 3      4 log2 N             2.5N
R8MDC          Radix-8       7 log8 N - 7      (24+2T) log8 N       4.5N
R4^2SDF        Radix-4^2     log16 N - 1       (16+T) log16 N       N
R4^3SDF        Radix-4^3     log64 N - 1       (24+2T) log64 N      N
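The area conversion can be applied mechanically to the formulas in the table. The sketch below is a reading of the slide, not a reproduction of its numbers: it assumes the area index is 50 x (complex multipliers) + (complex adders) + 1.3 x (memory words), and it treats T as a free parameter because its value is not given in the transcript. The default `t=0.0` is a placeholder only, and the function names are hypothetical.

```python
def exact_log(n, base):
    """Integer logarithm; raises if n is not an exact power of base."""
    count = 0
    while n > 1:
        if n % base:
            raise ValueError(f"{n} is not a power of {base}")
        n //= base
        count += 1
    return count


def area_index(mults, adders, mem_words):
    """Area in complex-adder units (assumed conversion: 50 / 1 / 1.3)."""
    return 50 * mults + adders + 1.3 * mem_words


def sdf_area_indices(n, t=0.0):
    """Area index of selected SDF rows of the table for an n-point FFT."""
    l2, l4 = exact_log(n, 2), exact_log(n, 4)
    l16, l64 = exact_log(n, 16), exact_log(n, 64)
    rows = {
        "R2SDF": (l2 - 2, 2 * l2, n),
        "R4SDF": (l4 - 1, 8 * l4, n),
        "R2^2SDF": (l4 - 1, 4 * l4, n),
        "R4^2SDF": (l16 - 1, (16 + t) * l16, n),
        "R4^3SDF": (l64 - 1, (24 + 2 * t) * l64, n),
    }
    return {name: area_index(*counts) for name, counts in rows.items()}


if __name__ == "__main__":
    for name, area in sdf_area_indices(4096).items():
        print(f"{name:8s} {area:8.1f}")
```

Because the memory term is identical for all SDF rows, differences in the index come from the multiplier and adder counts, with each complex multiplier weighing as much as 50 adders.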

Hardware Utilization Rate Comparisons

Pipeline       Utilization rate    Utilization rate     Utilization rate
architecture   of complex mult.    of complex adders    of complex mem.
R2SDF          50 %                50 %                 100 %
R4SDF          75 %                25 %                 100 %
R8SDF          87.5 %              12.5 %               100 %
R2^2SDF        75 %                50 %                 100 %
R2^3SDF        87.5 %              50 %                 100 %
R2MDC          50 %                50 %                 50 %
R2^2MDC        37.5 %              50 %                 50 %
R4MDC          25 %                25 %                 25 %
R8MDC          12.5 %              12.5 %               12.5 %
R4^2SDF        87.5 %              56.25 %              100 %
R4^3SDF        96.9 %              60.42 %              100 %

Conclusion
The proposed R4^2SDF and R4^3SDF designs are highly cost-effective:
- Multiplicative complexity as low as that of the radix-16 and radix-64 algorithms
- Lower hardware cost (smaller chip cost)
- Higher hardware utilization rate
The R4^3SDF design achieves better performance than R4^2SDF and the other pipeline architectures in 4096-point FFT/IFFT processor design.

References
[1] ETSI, "Digital Video Broadcasting (DVB): Transmission System for Handheld Terminals (DVB-H)," ETSI EN
[2] R. K. Kolagotla, J. Fridman, M. M. Hoffiman, W. C. Anderson, B. C. Aldrich, D. B. Witt, M. S. Allen, R. R. Dunton and L. A. Booth, "A 333-MHz dual-MAC DSP architecture for next-generation wireless application," IEEE Inter. Conf. on Acou., Speech, and Signal Proc., vol. 2, pp , May
[3] W. Li and L. Wanhammar, "A pipeline FFT processor," in Proc. IEEE Workshop on Signal Processing Systems, 1999, pp
[4] S. He and M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," in Proc. URSI Int. Symp. Signals, Syst., Electron., pp ,
[5] Wei-Hsin Chang and Truong Nguyen, "An OFDM-specified lossless FFT architecture," IEEE Trans. on Circuits and Systems I, vol. 53, issue 6, pp , June
[6] W. C. Yeh and C. W. Jen, "High-speed and low-power split-radix FFT," IEEE Trans. on Signal Processing, vol. 51, no. 3, pp , Mar
[7] C. S. Burrus, "Index mapping for multidimensional formulation of the DFT and convolution," IEEE Trans. Acoust., Speech, Signal Processing, ASSP-25(3): , June
[8] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley,
[9] T. Sansaloni, A. Perez-Pascual, V. Torres and J. Valls, "Efficient pipeline FFT processors for wireless LAN MIMO-OFDM systems," Electronics Letters, vol. 41, no. 19, Sep

Contact Information
Address: No. 12, Innovation 1st Rd., Science-Based Industrial Park, Elan Microelectronics Corporation, Hsinchu City, Taiwan 308, R.O.C.
Thanks for your attention!