An Implementation of the Discrete Fourier Transform on a Reconfigurable Processor, by Michael J. White and Clay Gloster, Jr.


White and Gloster, P74
An Implementation of the Discrete Fourier Transform on a Reconfigurable Processor
By Michael J. White (1, 2, *) and Clay Gloster, Jr., Ph.D., P.E. (1, *)
(1) Department of Electrical & Computer Engineering, Howard University, 2300 Sixth Street, NW, Washington, DC
(2) NASA/Goddard Space Flight Center, Code 564, Greenbelt, MD
(*) Member, AIAA
MAPLD Conference, Washington, DC, September 9-11, 2003

Outline of the Presentation
- Introduction
- The Discrete Fourier Transform (DFT)
- A Sample Reconfigurable Processor
- A Floating Point DFT Core
- Experimental Results
- Conclusions and Future Work

Introduction
- A reconfigurable computing (RC) system is a hardware/software data processing system that combines the flexibility of general-purpose processors with the speed of application-specific processors.
- Several applications have been mapped onto RC systems, demonstrating an order-of-magnitude speedup over existing solutions running on a general-purpose processor.
- In the past, RC systems contained very limited hardware resources. As a result, few complex applications, e.g. those requiring floating point arithmetic, could benefit from the potential speedup offered by RC systems.
- To the knowledge of the authors, few papers have been published on implementing the DFT on a Field Programmable Gate Array (FPGA) using floating point arithmetic.

Motivation
- At Goddard, there is interest in control algorithms that rely in part on the DFT.
- These algorithms should not require the input data size to be a power of two (2^n).
- The goal is to process a 512x512 floating point array in 0.01 seconds.

Problem Statement
Given: a software implementation of the DFT.
Find: an RC system implementation of the DFT that uses floating point arithmetic such that it:
1) fits on a single FPGA
2) can handle on the order of 1000 points
3) executes the DFT significantly faster than the software implementation
4) can compute a 2D DFT more efficiently, specifically the 2D DFT of a 512x512 array in 0.01 seconds
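To give a sense of scale for requirement 4, the following is a rough operation count, not a figure from the slides. It assumes the 2D transform is computed by the row-column method (a 1D DFT over every row, then over every column) and that each 1D DFT is evaluated directly from its definition, costing N^2 complex multiply-accumulates for N points. Both assumptions and the function name `complex_macs_2d_direct` are illustrative only.

```python
def complex_macs_2d_direct(rows, cols):
    """Complex multiply-accumulates for a row-column 2D DFT built from direct 1D DFTs.

    Assumption (not from the slides): each length-N 1D DFT costs N*N complex MACs.
    """
    row_pass = rows * cols * cols   # `rows` transforms of length `cols`
    col_pass = cols * rows * rows   # `cols` transforms of length `rows`
    return row_pass + col_pass


macs = complex_macs_2d_direct(512, 512)
required_rate = macs / 0.01   # complex MACs per second needed to finish in 0.01 s

print(macs)            # 268435456
print(required_rate)
```

Under these assumptions the 0.01 second target corresponds to roughly 2.7e10 complex multiply-accumulates per second.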

The Discrete Fourier Transform (DFT)
The Discrete Fourier Transform (DFT) is defined as:
X(k) = Σ_{n=0}^{N-1} c(n)·exp(-j·2π·n·k/N), for k = 0, 1, ..., N-1
where:
» c is the sequence of complex input samples
» N is the total number of input samples
» c(n) is the nth input sample
» X(k) is the kth output sample
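The definition can be evaluated directly in software for any N, not only powers of two. The sketch below is a plain reference implementation for checking results, not the authors' software baseline; the function name `dft` and the `sign` parameter are illustrative (the sign convention mirrors the core's DFT/IDFT flag, -1 for the forward transform and 1 for the inverse, with no 1/N scaling applied).

```python
import cmath


def dft(c, sign=-1):
    """Direct O(N^2) evaluation of X(k) = sum_n c(n) * exp(sign * j*2*pi*n*k/N).

    sign=-1 gives the forward DFT; sign=+1 gives the unnormalized inverse.
    """
    N = len(c)
    X = []
    for k in range(N):
        acc = 0j
        for n in range(N):
            acc += c[n] * cmath.exp(sign * 2j * cmath.pi * n * k / N)
        X.append(acc)
    return X


# A unit impulse of length 5 (not a power of two) has a flat spectrum of ones.
spectrum = dft([1, 0, 0, 0, 0])
```

Applying `dft` with `sign=1` to a forward result and dividing each value by N recovers the original samples up to rounding error.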

A Sample Reconfigurable Processor
[Block diagram. Labels: PECORE (FPGA), Control Unit, Data Unit, DFT Function Core, To Input Memory, To Output Memory.]

Function Core
- Has one or more 32-bit inputs.
- Has simple control.
- Performs floating point vector operations.
- Can be built using other FunCores.

Data and Control Unit
DATA UNIT
- Contains a register file (eight 32-bit registers) and counters for determining when vector instructions are complete.
- Contains several memory address registers/counters for indexing through input/output vectors.
- Contains up to 7 Function Cores.
CONTROL UNIT
- Manages memory read/write transactions.
- Initiates instruction fetch/decode/execution.
- Determines when instruction processing is complete and returns control to the Host/Memory Interface.
- One controller handles processing for all hardware modules/instructions.

DFT Floating Point Core
[Interface diagram of the DFT block. Inputs: XREALIN, XIMAGIN, K, DFT/IDFT, ENABLE, EMPTY. Outputs: XREALOUT, XIMAGOUT, READYTOEMPTY, DONE.]
- Xrealin/Ximagin are the real and imaginary inputs.
- K is the output index.
- The DFT/IDFT flag is -1 for the DFT or 1 for the inverse DFT.
- Enable tells the FPGA to begin processing.
- Empty tells the FPGA that the input buffer is empty.
- Xrealout/Ximagout are the real and imaginary outputs.
- Readytoempty indicates that FPGA processing has completed.
- Done indicates that the pipeline has been "flushed" and all outputs are in the buffer.

The DFT Core Block Diagram
[Block diagram. Blocks: Theta Unit, Sin/Cos Table, Complex Multiply, Complex Accumulator. Labeled signals: XREALIN, XIMAGIN, N, K, ENABLE, EMPTY, SELECT DFT, ADDRESS (10 bits), SINθ, COSθ, Xr, Xi, Yr, Yi (32 bits each), REALOUT, IMAGOUT, DONE.]

Complex Multiply
[Block diagram. Inputs Xr, Xi, COSθ, and SINθ feed units marked "*" that form the products XrCOSθ, XiSINθ, XiCOSθ, and XrSINθ. Other labels: Select DFT, Delay, SIGOUT0, SIGOUT1.]
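The four products named on the Complex Multiply slide are exactly the terms of multiplying the input sample Xr + j·Xi by the twiddle factor cos θ + j·flag·sin θ. The sketch below shows one way they can be combined; how the hardware actually wires SIGOUT0 and SIGOUT1 is not stated on the slide, so the function name `complex_multiply`, its argument order, and the use of the DFT/IDFT flag as a sign are assumptions.

```python
import math


def complex_multiply(xr, xi, cos_t, sin_t, flag=-1):
    """Multiply (xr + j*xi) by (cos_t + j*flag*sin_t) using four real products.

    flag=-1 selects the forward DFT twiddle exp(-j*theta); flag=+1 the inverse.
    """
    xr_cos = xr * cos_t
    xi_sin = xi * sin_t
    xi_cos = xi * cos_t
    xr_sin = xr * sin_t
    yr = xr_cos - flag * xi_sin   # real part
    yi = xi_cos + flag * xr_sin   # imaginary part
    return yr, yi


theta = 2 * math.pi * 3 / 10
yr, yi = complex_multiply(1.5, -2.0, math.cos(theta), math.sin(theta))
```

For the forward DFT this reduces to Yr = Xr·cos θ + Xi·sin θ and Yi = Xi·cos θ - Xr·sin θ.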

Theta and Sin/Cos Units
[Block diagram. The Theta Unit takes K and n (from a counter) and drives a 10-bit ADDRESS into the Sin/Cos Table, which outputs 32-bit SINθ and COSθ.]
When executing the DFT, the output index K is given; that is, we know which frequency component we want to examine. A counter is used to generate n.
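Because exp(-j·2π·n·k/N) depends only on (n·k) mod N, a table of N sine and cosine values indexed by that remainder is enough to supply every twiddle factor. The sketch below shows this addressing idea with a running address that adds K on each count of n and wraps at N. The slides do not describe the Theta Unit's internals or the table layout, so `build_table`, `theta_addresses`, and the one-entry-per-angle layout are assumptions for illustration.

```python
import math


def build_table(N):
    """Return (sin_table, cos_table) holding sin and cos of 2*pi*m/N for m = 0..N-1."""
    sin_table = [math.sin(2 * math.pi * m / N) for m in range(N)]
    cos_table = [math.cos(2 * math.pi * m / N) for m in range(N)]
    return sin_table, cos_table


def theta_addresses(k, N):
    """Yield the table address (n*k) mod N for n = 0..N-1, with 0 <= k < N.

    Uses only an add and a conditional subtract per step, as a counter-driven
    address generator might.
    """
    addr = 0
    for _ in range(N):
        yield addr
        addr += k
        if addr >= N:
            addr -= N


addresses = list(theta_addresses(3, 10))
print(addresses)  # [0, 3, 6, 9, 2, 5, 8, 1, 4, 7]
```

With a 10-bit address such a table could hold up to 1024 entries, which is consistent with the stated target of roughly 1000 points, though the slides do not confirm this layout.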

Complex Accumulator
[Block diagram. The Complex Accumulator consists of a Real Accumulator and an Imaginary Accumulator. Inputs Yr and Yi (32 bits each); outputs REALOUT and IMAGOUT (32 bits each).]
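A complex accumulator of this shape can be modeled in software as two independent running sums. The sketch below also rounds every sum to 32 bits so that the model's rounding behavior resembles a 32-bit datapath; the slides say only that the signals are 32 bits wide, so the IEEE 754 single-precision format, the class name `ComplexAccumulator`, and its methods are assumptions.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


class ComplexAccumulator:
    """Two independent 32-bit running sums, one real and one imaginary."""

    def __init__(self):
        self.real = 0.0
        self.imag = 0.0

    def add(self, yr, yi):
        # Round the operands and each new sum to single precision.
        self.real = to_f32(self.real + to_f32(yr))
        self.imag = to_f32(self.imag + to_f32(yi))

    def read_and_clear(self):
        """Return (REALOUT, IMAGOUT) and reset for the next output index."""
        out = (self.real, self.imag)
        self.real = 0.0
        self.imag = 0.0
        return out


acc = ComplexAccumulator()
acc.add(0.5, 0.25)
acc.add(1.5, -0.25)
result = acc.read_and_clear()
print(result)  # (2.0, 0.0)
```

Comparing such a model against a double-precision reference is one way to estimate how much rounding error to expect when checking hardware outputs.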

Experimental Setup
1) VHDL modeling and simulation
2) Logic synthesis
3) Place and route
4) Execute on FPGA

FPGA Runtime Environment
[Block diagram. Labels: RC System, General Purpose Processor, Interpreter, Session File, Definition File, FPGA Board.]

Output of DFT: FPGA and Simulation
The graph shows the output of a 10-point floating point DFT run on the FPGA alongside the output of a 10-point DFT run in a commercial simulation tool.

Conclusion
- VHDL modeling and synthesis are completed.
- The place-and-route tool reports a maximum clock frequency of 13.4 MHz, with 53% of the FPGA utilized.

Future Work
- The results of the FPGA implementation demonstrated excellent correlation with a standard simulation tool.
- The next step is to perform more checks of the DFT with larger sample blocks and to measure execution speed.
- Start work on a floating point Fast Fourier Transform.

Acknowledgement
The authors would like to thank NASA/Goddard Space Flight Center for its support of this project. In particular, we give thanks to:
- Mr. Thomas Flatley and Mr. Semion Kizhner for initiating the project.
- Mr. Robert Kasa and Mr. Wesley Powell for their management support.
- Dr. John Day for providing the spark that put everything together.