VSIPL++ / FPGA Design Methodology


Capt. Jules Bergmann, AFRL/IFTC
Susan Emeny, ITT
Peter Bronowicz, ITT

Overview
- Introduction: Designing for Hybrid Architectures
- Design Methodology
- VSIPL++ / FPGA Integration
- Integration and Test
- Applications
- Status
- Future Work
- Conclusions

Introduction
- Hybrid computer architectures
  - FPGAs and programmable processors have the potential to deliver high performance
  - Commercially available
- Challenge of hybrid architectures: to develop a methodology that will
  - Exploit their capabilities effectively, while
  - Making FPGAs accessible to a larger community of developers

Requirements of the Methodology
- Hardware and software development needs to begin early
- Portable from test to final system, with minimal change
- Scalable to future technologies, with minimal change
- Productive
  - Concise function description for both HW and SW
  - Streamlined interface between HW and SW
  - Use existing hardware and software methodologies

Design Methodology
- Algorithm: high-level exploration (Matlab)
- Software: scalar, C++, VSIPL++; performance imp. (parallel)
- Hardware: VHDL, FPGA; performance libraries
- Integration: HW/SW; debug on commodity cluster; migrate to target system

Benefits of the Methodology
- Support for system design from algorithm to embedded system
- Simplifies integration of hardware and software
- Application acceleration via pre-defined FPGA libraries
  - Standard functions (FFT)
  - Custom

Hardware Design Methodology
- Portability and scalability
  - Use standard high-level synthesis (RTL)
- Performance
  - Encapsulate device-specific optimizations in macro cells
- Productivity
  - Standardize the interface between units for data exchange
  - Use a stream interface with flow control. This model matches the way data is usually produced by sensors and requires minimal assumptions about the environment.
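The stream interface with flow control can be illustrated with a small cycle-by-cycle simulation. This is only a sketch under assumptions: the `Stream` class, the `run` function, the FIFO depth, and the consumer period are hypothetical names and parameters chosen for illustration, and Python is used purely for brevity. It does not model the actual signal-level protocol used on the FPGA.

```python
from collections import deque


class Stream:
    """Hypothetical flow-controlled link: a bounded FIFO between two units."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def ready(self):
        # The receiving side can accept a word only while the FIFO has room.
        return len(self.fifo) < self.depth

    def valid(self):
        # A word is available for the consumer.
        return bool(self.fifo)

    def push(self, word):
        # The sender must hold the word and retry if the link is not ready.
        if not self.ready():
            return False
        self.fifo.append(word)
        return True

    def pop(self):
        return self.fifo.popleft()


def run(words, depth, consumer_period):
    """Producer offers one word per cycle; consumer takes one word every
    `consumer_period` cycles. Returns (received words, producer stall count)."""
    stream = Stream(depth)
    pending = deque(words)
    received, stalls, cycle = [], 0, 0
    while pending or stream.valid():
        if cycle % consumer_period == 0 and stream.valid():
            received.append(stream.pop())
        if pending:
            if stream.push(pending[0]):
                pending.popleft()
            else:
                stalls += 1  # back-pressure: producer waits this cycle
        cycle += 1
    return received, stalls
```

With a slow consumer the producer stalls, but no data is lost or reordered, which is the property that lets units be composed without knowing each other's timing.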

Commodity Hardware
- Annapolis Firebird PCI Board
- We used this board for our 1st prototype!

Software Design Methodology
VSIPL++ was chosen to achieve:
- Portability: the reference version compiles anywhere!
- Scalability: builds on existing standards, e.g. MPI
- Performance
  - Allows for optimized libraries that take advantage of specific machine features (transparent to the application)
  - Allows for user-defined functions (e.g. specialized or optimized FPGA functions)
- Productivity: express computation in a natural fashion

Co-Design Issue
- VSIPL++ for software and synthesizable, composable modules for hardware are great domain-specific methodologies.
- Software and hardware treat data differently:
  - VSIPL++ represents data in discrete blocks
  - Hardware sees the data as a continuous stream
- This poses a problem of data exchange.
- We developed an infrastructure to exchange data between VSIPL++ block data and the FPGA streaming data.

Co-Design Solution
- Devise a memory adapter, a hardware unit on the FPGA, that directly accesses FPGA memory
  - Uses DMA to place data into, or retrieve data from, a flow-controlled data stream
- Create a user-defined VSIPL++ block (fpgaDense), based on the existing Dense block, to facilitate the transfer of data to and from FPGA memory

Memory Adapters
[Diagram: an input adapter (Memory In -> Stream Out) and an output adapter (Stream In -> Memory Out) attached to FPGA memory. Each adapter has a software-accessible control register holding Address, Length, and Amt Done. Control and data flow are shown.]
The VSIPL++ FPGA Dense block is LAZY!
- Data is transferred only when necessary.
- Vectors created on the host are copied to FPGA memory only when an FPGA function is initiated.
- Conversely, data is written to host memory only when the host requests the data.
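The lazy-transfer behaviour can be sketched as a block that tracks which side holds the newest data. This is a minimal illustration, not the fpgaDense implementation: the class `LazyFpgaBlock`, its flags, the `transfers` log, and the toy function `fpga_scale` are all hypothetical, FPGA memory is simulated with a Python list, and Python stands in for the C++ of the real system only for brevity.

```python
class LazyFpgaBlock:
    """Toy model of a lazily synchronized block (names are hypothetical)."""

    def __init__(self, values):
        self.host = list(values)   # host-side copy
        self.device = None         # simulated FPGA memory copy
        self.host_dirty = True     # host holds data the FPGA has not seen
        self.device_dirty = False  # FPGA holds results the host has not seen
        self.transfers = []        # log of simulated DMA transfers

    def to_device(self):
        """Called when an FPGA function is initiated; copies only if needed."""
        if self.host_dirty:
            self.device = list(self.host)
            self.host_dirty = False
            self.transfers.append("host->fpga")
        return self.device

    def mark_written_by_fpga(self, values):
        """Record that an FPGA function wrote results into this block."""
        self.device = list(values)
        self.device_dirty = True
        self.host_dirty = False

    def read(self):
        """Host access; pulls data back only if the FPGA has newer results."""
        if self.device_dirty:
            self.host = list(self.device)
            self.device_dirty = False
            self.transfers.append("fpga->host")
        return self.host


def fpga_scale(src, dst, factor):
    """Hypothetical FPGA function: multiply each element of src by factor."""
    data = src.to_device()
    dst.mark_written_by_fpga([x * factor for x in data])
```

Calling `fpga_scale` twice on the same source triggers only one host-to-FPGA copy, and the result crosses back to the host only on the first `read()`.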

Software Representation
- The hardware is "object oriented": each component object is described by its connections.
- We therefore created software objects that correspond directly to the hardware objects and are also described by connections, which facilitates the co-design of hardware and software.

Hardware Model
[Diagram: Memory Bank A -> Input A -> Function (FFT) -> Output B -> Memory Bank B. Input A reads from Bank A; the function reads from Input A and writes to Output B; Output B writes to Bank B. Arrows represent data flow.]

Hardware Block Diagram
[Diagram: Mem Banks 0-3 on the board; on the FPGA (P.E.), the VM, FFT, and IFFT units with read-memory (RMem) and write-memory (WMem) access controllers. Signals shown: REQ, AKK, ADDR, D_IN, D_OUT, D_VAL.]
Control registers:
- Read Mem Bank 0: 0x5000 (1. Length Read, 2. Start Addr, 3. Start Length, 4. Start Bit)
- Write Mem Bank 1: 0x5200 (1. Length Wrote)
- Read Mem Bank 2: 0x5A00
- Write Mem Bank 2: 0x5800
- Write Mem Bank 3: 0x5C00
- 0x5400 and 0x5600 (labels not recovered from the diagram)

Software Model
[Diagram: Function Object, Input Object, Output Object, Memory Bank, and FPGA AddrType (the link between host and FPGA memory).]

The Models
[Diagram, hardware: Memory Bank A -> Input A -> Function (FFT) -> Output B -> Memory Bank B.]
[Diagram, software: Function Object, Input Object, Output Object, Memory Bank, FPGA AddrType.]

About the Models
- In both models, each object is described by its connections.
- Functions are described by their I/O connections (data flow).
- I/O is described by its memory connections, or by a connection to another I/O (an output can feed an input).
- Memory is described by the board and the processing element it is connected to.
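The connection-based description can be sketched with a few small data classes. Everything here is an illustrative assumption: the class names, field names, and the `describe` helper are hypothetical and are not taken from the actual system software, and Python is used only for brevity.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class MemoryBank:
    name: str
    board: int  # board the bank sits on
    pe: int     # processing element it is connected to


@dataclass
class Output:
    name: str
    # An output writes to a memory bank or feeds another unit's input.
    target: Union[MemoryBank, "Input", None] = None


@dataclass
class Input:
    name: str
    # An input reads from a memory bank or directly from an output.
    source: Union[MemoryBank, Output]


@dataclass
class Function:
    name: str
    inputs: List[Input]
    outputs: List[Output]


def describe(fn):
    """List the data-flow connections of a function as readable strings."""
    lines = [f"{i.source.name} -> {i.name} -> {fn.name}" for i in fn.inputs]
    lines += [f"{fn.name} -> {o.name} -> {o.target.name}" for o in fn.outputs]
    return lines


# The FFT example from the hardware model slide.
bank_a = MemoryBank("Memory Bank A", board=0, pe=0)
bank_b = MemoryBank("Memory Bank B", board=0, pe=0)
fft = Function("FFT", [Input("Input A", bank_a)], [Output("Output B", bank_b)])
```

Here `describe(fft)` yields the two connection chains of the FFT example, from Memory Bank A through Input A into the function and from the function through Output B into Memory Bank B.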

Integrating VSIPL++ & FPGA
- A description of each hardware object is contained in a configuration file.
- The system software reads the configuration file and creates a software object for each hardware object.
- The application simply calls the "function"; the system "applies" it by moving the data (if necessary) and initiating the operation.

VSIPL++ / FPGA Interface
[Diagram: host, FPGA board, and FPGA memory banks. The VSIPL++/FPGA application is built from the VSIPL++ application, the VSIPL++ reference library, and the VSIPL++ FPGA library, and uses the configuration file and the X86 image file.]
Configuration file contents:
- FPGA board: board #, # of FPGAs, clock speed, image file
- Memory: # of banks, size of banks, addressing
- Function(s): one or more function definitions
- Input adapter(s)
- Output adapter(s)

Configuration File
SAMPLE
#Configuration File
PE = 0, 20 # Processing Element, processing speed MHz
CoreFileName=pe0.x86 # Filename of binary code to program FPGA with
#Memory Bank Definition
#Name, PE#, Size in bytes, DMA Write Port, DMA Read Port (radix assumed to be hex)
MemBank=Bank0, 0, 400000, 1000, 1200 # 4194304 bytes
MemBank=Bank1, 0, 400000, 2000, 2200
MemBank=Bank2, 0, 400000, 3000, 3200
MemBank=Bank3, 0, 400000, 4000, 4200
#Function Definition
FN=PC, input=R0, input=R1, input=R2, input=R3, Output=W0, Output=W1, Output=W2, size=256
FN=FFT0,input=R0, Output=W0,Size=256
FN=VMUL, input=R2, input=R1, output=W1, size=256
FN=IFFT,input=R3, Output=W2,Size=256

Configuration File (cont.)
#Input Adapter Definition
#Name, Control Register Address, Name of assoc. memory bank (radix assumed to be hex)
INPUT=R0, 5000, Bank0
INPUT=R1, 5400, Bank1
INPUT=R2, 5600, Bank1, persistant
INPUT=R3, 5A00, Bank2
#Output Adapter Definition
#Name, Control Register Address, Name of assoc. memory bank (radix assumed to be hex)
Output=W0, 5200, Bank1
Output=W1, 5800, Bank2
Output=W2, 5C00, Bank3
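A reader for this configuration format might look like the sketch below. It is an illustration only: `parse_config` and the dictionary layout are hypothetical and are not the system software described in the slides, and Python is used only for brevity. Memory bank fields and control register addresses are parsed as hex, as the sample's comments state. Treating the PE speed and the function `size` as decimal is an assumption, as is reading the `persistant` token as a boolean flag.

```python
SAMPLE = """\
PE = 0, 20                             # Processing Element, speed in MHz
CoreFileName=pe0.x86
MemBank=Bank0, 0, 400000, 1000, 1200   # 4194304 bytes
MemBank=Bank1, 0, 400000, 2000, 2200
FN=FFT0,input=R0, Output=W0,Size=256
FN=VMUL, input=R2, input=R1, output=W1, size=256
INPUT=R0, 5000, Bank0
INPUT=R2, 5600, Bank1, persistant
Output=W0, 5200, Bank1
"""


def parse_config(text):
    """Parse the sample format into plain dictionaries (hypothetical layout)."""
    cfg = {"pe": None, "core": None, "banks": {},
           "functions": {}, "inputs": {}, "outputs": {}}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, _, rest = line.partition("=")
        key = key.strip().lower()            # keys appear in mixed case
        fields = [f.strip() for f in rest.split(",")]
        if key == "pe":
            # Assumption: PE number and speed are decimal.
            cfg["pe"] = {"id": int(fields[0]), "mhz": int(fields[1])}
        elif key == "corefilename":
            cfg["core"] = fields[0]
        elif key == "membank":
            name, pe, size, dma_write, dma_read = fields
            cfg["banks"][name] = {"pe": int(pe, 16), "size": int(size, 16),
                                  "dma_write": int(dma_write, 16),
                                  "dma_read": int(dma_read, 16)}
        elif key == "fn":
            fn = {"inputs": [], "outputs": [], "size": None}
            for item in fields[1:]:
                k, _, v = item.partition("=")
                k, v = k.strip().lower(), v.strip()
                if k == "input":
                    fn["inputs"].append(v)
                elif k == "output":
                    fn["outputs"].append(v)
                elif k == "size":
                    fn["size"] = int(v)      # assumption: decimal
            cfg["functions"][fields[0]] = fn
        elif key in ("input", "output"):
            name, ctrl_reg, bank = fields[:3]
            flags = [f.lower() for f in fields[3:]]
            cfg[key + "s"][name] = {"ctrl_reg": int(ctrl_reg, 16), "bank": bank,
                                    "persistent": "persistant" in flags}
        else:
            raise ValueError(f"unknown key: {key!r}")
    return cfg
```

Parsing the bank size `400000` as hex gives 4194304, which matches the byte count in the sample's own comment.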

Integration & Test
- Unit and system level
  - As individual hardware units are developed, they can be integrated into the software immediately for testing, even before all components are complete.
  - Components can be integrated individually or grouped as sub-systems.
- Virtual breadboarding
  - For larger systems, functions can be spread across multiple FPGAs.
- Hardware in the loop
  - Tightly coupled hardware/software can be easily constructed.

Applications
Currently employing this technology in two applications:
- Space-Based Radar embedded system design
  - With this method, a virtual breadboard of the MTI/SAR system can be developed and tested on AFRL's hybrid cluster long before the actual hardware becomes available (not scheduled to fly until 2008).
- Rstream signal processing library for application acceleration
  - Develop a set of common signal processing functions to be used in Space-Based Radar.
  - The goal is to provide a library that can be used with VSIPL++.

Status
- Prototype implementation completed on the Annapolis Firebird card, using Xilinx Virtex-E FPGAs
- Preliminary experiments on the Xilinx Virtex-II Pro, which integrates a PowerPC processor with the FPGA to form a hybrid system

Future Work
- We anticipate that opportunities will arise as we gain experience with the streaming protocol
  - Possibly additional streaming objects (FIFOs that logically bypass the FPGA memory?)
- Computer-aided design
  - Auto-generate the configuration file
  - Auto-merge units that will be directly connected, optimizing away redundant interfaces
- Investigate areas outside of signal processing

Conclusions
- Integrating VSIPL++ for software design with composable hardware design provides a powerful design methodology for building hybrid hardware/software systems.
- VSIPL++'s performance, portability, and productivity provide a growth path for parallel performance and hardware acceleration.
- The stream hardware interface provides a simple mechanism by which hardware units can be composed into larger, more complex units.

Addendum
Implementation slides follow.

Input
[Diagram, hardware: input adapter (Memory In -> Stream Out) with a software-accessible control register holding Address, Length, and Amt Done. Control and data flow are shown.]
[Diagram, software: Input Object with a logical interface and an FPGA AddrType; the control register is identified by P.E. #, Board #, and Ctl Reg Adr.]
Input responsibilities:
- Control data flow
  - Initiate host memory to FPGA memory data transfers
  - Initiate the data flow from FPGA memory to the function block

Output
[Diagram, hardware: output adapter (Stream In -> memory) with a software-accessible control register holding Address, Length, and Amt Done.]
[Diagram, software: Output Object with a logical interface and an FPGA AddrType; the control register is identified by P.E. #, Board #, and Ctl Reg Adr.]
Output responsibilities:
- Control data flow
  - Initiate FPGA memory to host memory data transfers
  - Initiate the data flow to FPGA memory from the function block

Memory Bank
[Diagram: Memory Bank object (logical interface, pointer; Size, Bytes Free, Next Address, DMA read, DMA write) and FPGA AddrType object (logical interface, pointer; Size, State, Dirty Flag, Host Address, Mem Address), the I/O window to memory.]
Responsibilities:
- Manage one FPGA memory bank
  - Maintain total size
  - Maintain available free memory
  - Honor requests for memory: allocate, free
- The FPGA AddrType object is the I/O adapters' "window" to the memory bank
  - Host/FPGA memory transfers
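The bookkeeping listed for a memory bank (size, bytes free, next address, allocate, free) can be sketched as a simple bump allocator. This is an assumed policy chosen for brevity, not the allocator used in the actual system; the class and method names are hypothetical, and Python is used only for illustration.

```python
class MemoryBank:
    """Toy manager for one FPGA memory bank (bump allocation, an assumption)."""

    def __init__(self, name, size):
        self.name = name
        self.size = size          # total size in bytes
        self.bytes_free = size    # bytes not currently allocated
        self.next_address = 0     # next address handed out
        self.live = {}            # address -> size of each live allocation

    def allocate(self, nbytes):
        """Reserve nbytes and return the start address within the bank."""
        if nbytes <= 0:
            raise ValueError("allocation size must be positive")
        if self.next_address + nbytes > self.size:
            raise MemoryError(f"{self.name}: not enough room for {nbytes} bytes")
        address = self.next_address
        self.next_address += nbytes
        self.bytes_free -= nbytes
        self.live[address] = nbytes
        return address

    def free(self, address):
        """Release an allocation. Freed holes are not reused by this sketch;
        the bank rewinds to address 0 only once every allocation is freed."""
        nbytes = self.live.pop(address)
        self.bytes_free += nbytes
        if not self.live:
            self.next_address = 0
```

For a 0x400000-byte bank, allocating 1024 and then 2048 bytes returns addresses 0 and 1024; freeing both restores the full free count and rewinds the next address to 0.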

Function Object
[Diagram: Function Object with a logical interface; fields Function, Function Size, P.E. #, Board #, Mem. Size, Input Object(s), Output Object(s).]
Responsibilities:
- Instantiate input objects
- Instantiate output objects
- Activate/halt input data stream
- Activate/halt output data stream

Object Relationships
[Diagram: Core Mgr., Function, Input, Output, two Memory objects, and two FPGA AddrType objects.]

Software Design
[Diagram: PULSE COMPRESS composed of FFT, VM, and IFFT, connected through Inputs A-D, Outputs A-C, and Mem Banks A-D. Arrows represent data flow.]

Hardware Design
[Diagram: PULSE COMPRESS containing FFT and IFFT; memory DMA ports 1000/1200, 2000/2200, and 4000/4200; READ and WRITE adapters with Data In and Data Out and a control register (Address, Length, Start, Amt Done); size 256.]