FFT VLSI Implementation

Presentation transcript:

FFT VLSI Implementation. VLSI Signal Processing, Department of Electrical Engineering, National Taiwan University, An-Yeu Wu. References: Shousheng He and Mats Torkelson, "A new approach to pipeline FFT processor," IEEE Proc. of IPPS, pp. 766-770, 1996. E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, "A fast single-chip implementation of 8192 complex point FFT," IEEE J. Solid-State Circuits, pp. 300-305, March 1995.

FFT Review

Implementation: Two Extreme Methods. Speed: from slow to fast. Area: from small to large. Control: from complicated to simple.

Design Considerations. System requirements: e.g., speed, area, power. To trade off between these two extremes, we need: more processing elements (PEs); a better PE utilization rate; a better control scheme.

FFT Processor --- Block Diagram

Some Current Schemes: Radix-2 Multi-path Delay Commutator (N = 16); Radix-2 Single-path Delay Feedback (N = 16).

Some Current Schemes (cont.): Radix-4 Single-path Delay Feedback (N = 256); Radix-4 Multi-path Delay Commutator (N = 256); Radix-4 Single-path Delay Commutator (N = 256).

Distinctive merits of the above: Delay-feedback architectures use memory more efficiently than delay-commutator architectures. Radix-4 has higher multiplier utilization; however, radix-2 has simpler butterflies (BF), which are better utilized.

Comparison. Radix / speed: from low to high. Control scheme: from simple to complex. Processing ability per unit: from low to high. To combine the advantages: further decompose the high-radix PE.

Decompose Method (1): Simply "reuse" the repeated micro unit. Figure: a radix-4 PE.

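Decompose Method (2): From the algorithm level. Apply a 3-dimensional index map: $n = \langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3 \rangle_N$, $k = \langle k_1 + 2k_2 + 4k_3 \rangle_N$, where $n_1, n_2 \in \{0, 1\}$ and $n_3 \in \{0, \dots, N/4 - 1\}$. Summation over n1.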

Decompose Method (2) cont. Summation over n2: only real-imaginary swapping and sign inversion are needed.
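The equations on these two slides did not survive in the transcript. The following is a reconstruction of the standard radix-2^2 derivation, as a sketch rather than the slides' exact content. The symbols B and H, and the ranges assumed for k1, k2, k3, are not in the transcript and follow the usual notation of the He and Torkelson paper cited on the title slide.

```latex
% Assumed ranges for the output index: k_1, k_2 \in \{0,1\}, k_3 \in \{0,\dots,N/4-1\}
% Twiddle factor: W_N = e^{-j 2\pi / N}

% Summation over n_1 (a radix-2 butterfly):
X(k_1 + 2k_2 + 4k_3)
  = \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1}
    B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right)
    W_N^{\left(\frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)},
\qquad
B_{N/2}^{k_1}(m) = x(m) + (-1)^{k_1}\, x\!\left(m + \tfrac{N}{2}\right)

% Factor the twiddle; the N/4 term collapses to powers of -j:
W_N^{\left(\frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}
  = (-j)^{n_2 (k_1 + 2k_2)}\; W_N^{n_3 (k_1 + 2k_2)}\; W_{N/4}^{n_3 k_3}

% Summation over n_2 (a second butterfly with a trivial -j factor):
X(k_1 + 2k_2 + 4k_3)
  = \sum_{n_3=0}^{N/4-1}
    \left[ H(k_1, k_2, n_3)\, W_N^{n_3 (k_1 + 2k_2)} \right] W_{N/4}^{n_3 k_3},
\qquad
H(k_1, k_2, n_3) = B_{N/2}^{k_1}(n_3) + (-j)^{(k_1 + 2k_2)}\, B_{N/2}^{k_1}\!\left(n_3 + \tfrac{N}{4}\right)
```

The factor (-j)^(k1+2k2) is the "real-imaginary swapping and sign inversion" step. The only general complex multiplication left is by W_N^(n3(k1+2k2)), applied once per pair of butterfly stages.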

Graphical Explanation (N=16) Trivial multiplication

Graphical Explanation (cont.): The equations are equivalent to the operations below.
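The signal-flow operations can be checked numerically. The following is a minimal software sketch of the radix-2^2 decimation-in-frequency recursion, assuming the index map n = (N/2)n1 + (N/4)n2 + n3, k = k1 + 2k2 + 4k3 and an input length that is a power of 4. The function names `fft_radix22` and `dft` are illustrative, not from the slides, and the code models the arithmetic only, not the pipelined hardware.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


def fft_radix22(x):
    """Radix-2^2 DIF FFT sketch; len(x) must be a power of 4.

    Returns the spectrum in natural order.
    """
    N = len(x)
    if N == 1:
        return list(x)
    q = N // 4
    X = [0j] * N
    for k1 in (0, 1):
        for k2 in (0, 1):
            sub = []
            for n3 in range(q):
                # First butterfly: sum over n1.
                b0 = x[n3] + (-1) ** k1 * x[n3 + 2 * q]
                b1 = x[n3 + q] + (-1) ** k1 * x[n3 + 3 * q]
                # Second butterfly: sum over n2 with the trivial factor (-j)^(k1+2k2).
                h = b0 + (-1j) ** (k1 + 2 * k2) * b1
                # The only non-trivial twiddle: W_N^(n3*(k1+2k2)).
                w = cmath.exp(-2j * cmath.pi * n3 * (k1 + 2 * k2) / N)
                sub.append(h * w)
            # Remaining length-N/4 DFT over n3.
            Y = fft_radix22(sub)
            for k3 in range(q):
                X[k1 + 2 * k2 + 4 * k3] = Y[k3]
    return X
```

For example, `fft_radix22([complex(n, 0) for n in range(16)])` should agree with `dft` on the same input up to floating-point rounding.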

Circuit of BF2I. Inputs: Xr(n), Xi(n), Xr(n+N/2), Xi(n+N/2). Outputs: Zr(n), Zi(n), Zr(n+N/2), Zi(n+N/2). The circuit operates in two phases: the first N/2 cycles and the second N/2 cycles.
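The two-phase behaviour of a delay-feedback butterfly can be sketched in software. This assumes the usual single-path delay-feedback timing: during the first phase the stage stores incoming samples in the feedback FIFO and emits whatever the FIFO held; during the second phase it outputs the sum and writes the difference back into the FIFO. The class name `BF2I`, the `step` method and the `depth` parameter are hypothetical names for illustration.

```python
from collections import deque


class BF2I:
    """Behavioural sketch of one delay-feedback butterfly stage.

    depth is the feedback FIFO length (N/2 for the first stage).
    """

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque([0j] * depth)
        self.count = 0

    def step(self, x):
        # Phase alternates every `depth` cycles: fill, then compute.
        fill_phase = (self.count // self.depth) % 2 == 0
        self.count += 1
        old = self.fifo.popleft()
        if fill_phase:
            # First phase: store the input, pass the FIFO content through.
            self.fifo.append(x)
            return old
        # Second phase: the sum goes downstream, the difference is fed back.
        self.fifo.append(old - x)
        return old + x


stage = BF2I(depth=2)
# Four data samples followed by two zeros to flush the stored differences.
outputs = [stage.step(v) for v in [1, 2, 3, 4, 0, 0]]
```

With this input, the first two outputs are the initial FIFO contents, the next two are the sums x(n) + x(n+2), and the last two are the differences x(n) - x(n+2) emerging from the feedback path.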

Circuit of BF2II. Inputs: Xr(n), Xi(n), Xr(n+N/2), Xi(n+N/2). Outputs: Zr(n), Zi(n), Zr(n+N/2), Zi(n+N/2). Adds real-imaginary swapping and sign inversion.
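Swapping the real and imaginary parts with one sign inversion is exactly a multiplication by -j, which is why BF2II needs no multiplier. A minimal sketch follows; the function name `times_minus_j` is illustrative.

```python
def times_minus_j(re, im):
    """Multiply (re + j*im) by -j using only a swap and one negation.

    (re + j*im) * (-j) = im - j*re
    """
    return im, -re


# Cross-check against ordinary complex arithmetic.
z = complex(3, 4)
expected = z * (-1j)
got = times_minus_j(3, 4)
```

Here `got` is `(4, -3)`, matching the real and imaginary parts of `expected`.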

Radix-2^2 Single-path Delay Feedback. FFT architecture using the above technique, for N=256. Compare with the original architecture, for N=256.
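The comparison can be made concrete by tabulating hardware cost per architecture. The cost formulas below are not in the transcript; they are recalled from the comparison table in the He and Torkelson paper cited on the title slide and should be checked against the paper before being relied on. The function name `pipeline_costs` and the dictionary keys are illustrative.

```python
def pipeline_costs(N):
    """Return {architecture: (complex multipliers, complex adders, memory words)}.

    N must be a power of 4. Formulas are assumptions recalled from the
    He and Torkelson comparison table, not taken from the slides.
    """
    s = (N.bit_length() - 1) // 2  # log4(N)
    if 4 ** s != N:
        raise ValueError("N must be a power of 4")
    return {
        "R2MDC": (2 * (s - 1), 4 * s, 3 * N // 2 - 2),
        "R2SDF": (2 * (s - 1), 4 * s, N - 1),
        "R4SDF": (s - 1, 8 * s, N - 1),
        "R4MDC": (3 * (s - 1), 8 * s, 5 * N // 2 - 4),
        "R4SDC": (s - 1, 3 * s, 2 * N - 2),
        "R2^2SDF": (s - 1, 4 * s, N - 1),
    }


costs = pipeline_costs(256)
for name, (mults, adds, mem) in costs.items():
    print(f"{name:8s} multipliers={mults:2d} adders={adds:2d} memory={mem}")
```

Under these assumed formulas, radix-2^2 SDF for N=256 matches radix-4 SDF in multipliers and memory while using the adder count of radix-2 SDF.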

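Structural advantages: Radix-2^2 has the same multiplicative complexity as radix-4 but still retains the radix-2 butterfly structure. Only every second stage has non-trivial multiplications. Control is simple: a single counter serves as both the synchronization controller and the address counter for the twiddle factors W^n.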

Conclusions. FFT applications: radar signal processing, fast convolution, spectrum estimation, OFDM-based modulation/demodulation. Efficient VLSI architectures (parallel processing) are required for real-time processing. However, most systems still employ DSP processors (e.g., TI C3x/C5x) for the computation, using fast algorithms such as DIT and DIF FFT. VLIW (Very Long Instruction Word) processors (TI C6x) need new programming skills to utilize the two parallel MAC units.