FPGA-based Supercomputers


FPGA Boards and FPGA-based Supercomputers

Resources
- PCI: http://en.wikipedia.org/wiki/Peripheral_Component_Interconnect
- PCI-X: http://en.wikipedia.org/wiki/PCI-X
- Reconfigurable Supercomputing: T. El-Ghazawi, K. Gaj, D. Buell, D. Pointer, tutorial at the Supercomputing 2005 conference, http://hpcl.seas.gwu.edu/openfpga/tutorial_html/index.html

FPGA Device Capacity Trends (Xilinx Device Complexity)
[Chart of device clock speed and gate count against year; year axis: 1985, 1987, 1991, 1995, 1998, 1999, 2000, 2002, 2003, 2004, 2006]
- Virtex-5: 550 MHz, 24M gates*
- Virtex-4: 500 MHz, 16M gates*
- Virtex-II Pro: 450 MHz, 8M gates*
- Virtex-II: 450 MHz, 8M gates
- Spartan-3: 326 MHz, 5M gates
- Virtex-E: 240 MHz, 4M gates
- Virtex: 200 MHz, 1M gates
- XC4000: 100 MHz, 250K gates
- Spartan-II: 200 MHz, 200K gates
- Spartan: 80 MHz, 40K gates
- XC5200: 50 MHz, 23K gates
- XC3000: 85 MHz, 7.5K gates
- XC2000: 50 MHz, 1K gates
Source: http://class.ece.iastate.edu/cpre583/lectures/Lect-01.ppt
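To put the trend in numbers, the sketch below estimates the implied doubling time of gate count. It assumes that the smallest device (XC2000, 1K gates) belongs to the first year on the axis (1985) and the largest (Virtex-5, 24M gates) to the last (2006); the chart's exact device-to-year pairing is not recoverable from this transcript, and the function name is illustrative only.

```python
import math


def doubling_time_years(gates_start, gates_end, year_start, year_end):
    """Years per doubling, assuming smooth exponential growth between the two points."""
    doublings = math.log2(gates_end / gates_start)
    return (year_end - year_start) / doublings


# Assumed endpoints: XC2000 (1K gates, 1985) and Virtex-5 (24M gates, 2006).
t = doubling_time_years(1_000, 24_000_000, 1985, 2006)
print(f"gate count doubles roughly every {t:.2f} years")
```

Under these assumptions the doubling time comes out at a little under a year and a half.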

FPGA Boards

General Architecture of an FPGA-Based Board
[Block diagram labels: BUS; bus interface controller; processing elements (PE#0, PE#1, ..., PE#N-1); local memory; common memory / interconnect network; CLK; I/O card]

Reconfigurable Computing Boards (Accelerators)
- Boards may have one or several interconnected FPGA chips
- Support different bus standards, e.g. PCI, PCI-X, VME
- May have direct real-time data I/O through a daughter board
- Boards may have local onboard memory (OBM) to handle large data while avoiding the system bus (e.g. PCI) bottleneck

Reconfigurable Computing Boards (Accelerators), cont.
- Many boards per node can be supported
- A host program (e.g. in C) interfaces the user (and the μP) with the board via a board API
- Driver API functions may include Reset, Open, Close, Set Clocks, DMA, Read, Write, Download Configurations, Interrupt, and Readback
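The typical call sequence of such a host program can be sketched with a simulated board. This is not any vendor's API: the class, its method names, and the bitstream file name are all hypothetical, and the board is modeled as a plain list standing in for onboard memory. Python is used here for brevity, although the slide names C as the usual host language.

```python
class SimulatedBoard:
    """Stand-in for a vendor board API; every method name here is hypothetical."""

    def __init__(self, memory_words=1024):
        self.is_open = False
        self.configuration = None
        self.clock_mhz = None
        self.memory = [0] * memory_words  # models local onboard memory (OBM)

    def _require_open(self):
        if not self.is_open:
            raise RuntimeError("board is not open")

    def open(self):
        self.is_open = True

    def reset(self):
        self._require_open()
        self.configuration = None
        self.memory = [0] * len(self.memory)

    def set_clock(self, mhz):
        self._require_open()
        self.clock_mhz = mhz

    def download_configuration(self, name):
        self._require_open()
        self.configuration = name

    def write(self, address, words):
        self._require_open()
        self.memory[address:address + len(words)] = words

    def read(self, address, count):
        self._require_open()
        return self.memory[address:address + count]

    def close(self):
        self.is_open = False


# Host-side sequence: open, reset, clock, configure, move data, close.
board = SimulatedBoard()
board.open()
board.reset()
board.set_clock(100)
board.download_configuration("design.bin")  # hypothetical bitstream name
board.write(0, [1, 2, 3, 4])
result = board.read(0, 4)
board.close()
print(result)
```

A real driver would add DMA, interrupts, and readback; the point here is only the order of operations that the host program has to follow.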

Common Interface - PCI
PCI = Peripheral Component Interconnect
[Figure labels: 64-bit bus, 32-bit bus]

PCI - Conventional hardware specifications
- 32-bit or 64-bit bus width
- 33.33 MHz clock with synchronous transfers
- Peak transfer rate of 133 MB/s for 32-bit bus width (33.33 MHz × 32 bits × (1 byte ÷ 8 bits) = 133 MB/s)
- Peak transfer rate of 266 MB/s for 64-bit bus width
- 32-bit address space (4 gigabytes)
- 32-bit port space (now deprecated)
- 5-volt signaling

PCI-X (PCI eXtended)
- PCI-X doubles the bus width to 64 bits, revises the protocol, and increases the maximum signaling frequency to 133 MHz (peak transfer rate of 1014 MB/s)
- PCI-X 2.0 permits a 266 MHz rate (peak transfer rate of 2035 MB/s) and also a 533 MHz rate, expands the configuration space to 4096 bytes, adds a 16-bit bus variant, and allows for 1.5-volt signaling
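The peak rates quoted for PCI and PCI-X follow from clock rate times bus width. The sketch below repeats that arithmetic, treating each quoted MHz figure as the number of transfers per second; the function name and the table of buses are illustrative. It prints each result both in units of 10^6 bytes and in units of 2^20 bytes, because the 133 MB/s PCI figure matches the former while the 1014 MB/s PCI-X figure appears to match the latter.

```python
def peak_bytes_per_second(clock_mhz, width_bits):
    """Peak rate assuming one transfer of width_bits at the quoted rate in MHz."""
    return clock_mhz * 1e6 * width_bits / 8


buses = [
    ("PCI, 32-bit", 33.33, 32),
    ("PCI, 64-bit", 33.33, 64),
    ("PCI-X, 64-bit", 133.0, 64),
    ("PCI-X 2.0, 64-bit", 266.0, 64),
]

for name, clock_mhz, width_bits in buses:
    rate = peak_bytes_per_second(clock_mhz, width_bits)
    print(f"{name}: {rate / 1e6:.0f} x 10^6 bytes/s, {rate / 2**20:.0f} x 2^20 bytes/s")
```

These are upper bounds; sustained rates on a shared bus are lower.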

Some Reconfigurable Board Vendors
- ANNAPOLIS MICRO SYSTEMS, INC. (www.annapmicro.com)
- University of Southern California - USC/ISI (http://www.east.isi.edu)
- AMONTEC (www.amontec.com/chameleon.shtml)
- XESS Corporation (www.xess.com)
- CELOXICA (www.celoxica.com)
- CESYS (www.cesys.com)
- TRAQUAIR (www.traquair.com)
- SILICON SOFTWARE (www.silicon-software.com)
- COMPAQ (www.research.compaq.com/SRC/pamette/)
- ALPHA DATA (www.alpha-data.com)
- Associated Professional Systems (www.associatedpro.com)
- NALLATECH (www.nallatech.com)

Representative Example Boards From Annapolis Micro Systems (AMI) http://www.annapmicro.com & Nallatech http://www.nallatech.com

ZBT (zero bus turnaround) memory: no idle cycles between read-to-write and write-to-read transitions. Source: [AMS02]

Source: [AMS02]

WILDSTAR™ II Pro Reproduced and displayed with permission

WILDSTAR™ II Pro
- QDR: up to 400 MHz (typically 133 MHz)
- Each chip has six banks of up to 8 MB per bank, 48 MB per chip
- Rocket I/O: 3.2 Gbps
- Differential pairs run in parallel, so throughput depends on how many there are and at what clock they run
- Differential pairs have higher noise immunity
Reproduced and displayed with permission

Nallatech's BenNUEY-PCI-4E
- Up to 7 Virtex-II Pro FPGAs, 6 of which are for the DIME-II modular architecture
- Inter-card communication through Rapid I/O
- All connected to PCI

Reconfigurable Supercomputers

Scalable Reconfigurable Systems
- Large numbers of reconfigurable processors and microprocessors
- Everything can be configured: functional units, interconnects, interfaces
- High level of scalability
- Suitable for a wide range of applications
- Everything can be reconfigured over and over at run time (Run-Time Reconfiguration) to suit the underlying applications
- Can be easily programmed by application scientists, at least in the same way as conventional parallel computers are programmed

Early Reconfigurable Architecture
[Diagram labels: microprocessor system (μP, memory, ..., I/O); interface; reconfigurable system (FPGA)]

Current Reconfigurable Architecture
[Diagram labels: μP with μP memory and FPGA with FPGA memory, repeated (...); shared memory and/or NIC]

Possible Classes of Reconfigurable Supercomputers
- Independent Board Design: μP board (μP 1 ... μP N) and RP board (RP 1 ... RP N)
- Joint Board Design: joint μP/RP board (μP 1 ... μP N, RP 1 ... RP N)
[Arrow label: tighter integration]

Possible Classes of Reconfigurable Supercomputers (cont.)
- μP inside of RP Design: joint μP/RP board (RP 1 ... RP N, μP 1 ... μP N)
- RP inside of μP Design: joint μP/RP board (μP 1 ... μP N, RP 1 ... RP N)
[Arrow label: tighter integration]

FPGA-based supercomputers
- Machines: SRC 6 from SRC Computers; Cray XD1 from Cray; SGI Altix from SGI; SRC 7 from SRC Computers, Inc.
- Released (years listed on the slide): 2002, 2005, 2006

How to choose the system that best suits your needs?
Typical users’ criteria:
1. Clock speed
2. Amount of memory
3. Cost of ownership

How to choose the system that best suits your needs?
Recommended users’ criteria:
1. Tools
   - right level of abstraction
   - ease of development & verification
   - progress & backward compatibility
2. Libraries
   - basic operations
   - examples of full applications
3. Technical support

How to choose the system that best suits your needs?
Recommended users’ criteria (cont.):
4. Data bandwidth
[Diagram labels: Reconfigurable Processor System; μP system; external I/O devices]

How to choose the system that best suits your needs?
Recommended users’ criteria (cont.):
5. Scalability
   - variable power and price
   - efficient communication among the modules

Recommended users’ criteria (cont.):
6. Transfer of control overhead
[Timing diagrams: theoretical behavior and actual behavior, each with μP and FPGA timelines; the actual behavior adds control transfer overhead time]
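The gap between theoretical and actual behavior can be illustrated with a simple speedup calculation. This is a sketch with made-up timings, not measurements from any of the machines discussed; the function name and all numbers are assumptions chosen for illustration.

```python
def speedup(t_software, t_fpga, t_transfer_overhead=0.0):
    """Speedup of one FPGA call over software, including control transfer overhead."""
    return t_software / (t_fpga + t_transfer_overhead)


# Hypothetical timings in milliseconds, chosen only for illustration.
t_software, t_fpga, overhead = 100.0, 10.0, 15.0

theoretical = speedup(t_software, t_fpga)
actual = speedup(t_software, t_fpga, overhead)
print(f"theoretical speedup: {theoretical:.1f}x, actual speedup: {actual:.1f}x")
```

The shorter the FPGA task, the larger the share of time lost to handing control back and forth, which is why this overhead is worth checking per system.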

7. Reconfiguration overhead
[Timing diagrams: three μP/FPGA timelines showing reconfigurations (Reconf A, Reconf B, Reconf C) interleaved with tasks (Task A, Task B, Task C)]

7. Reconfiguration overhead (cont.)
[Timing diagram: μP, FPGA 1, and FPGA 2 timelines with Reconf A, Reconf B, Task A, Reconf C, Task B, Task C]
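One way to see why a second FPGA helps is to compare total run times with and without overlapping reconfiguration and execution. The model below is a simplified sketch, not the scheduling policy of any particular machine: tasks run strictly in order, they alternate between the two FPGAs, each FPGA may start reconfiguring as soon as it is idle, and all timings are hypothetical.

```python
def single_fpga_time(tasks):
    """Each task waits for its own reconfiguration on the one FPGA."""
    return sum(reconf + run for reconf, run in tasks)


def two_fpga_time(tasks):
    """Alternate tasks between two FPGAs so one reconfigures while the other runs."""
    fpga_free = [0.0, 0.0]  # time at which each FPGA finished its last task
    previous_end = 0.0      # tasks run one after another, in order
    for i, (reconf, run) in enumerate(tasks):
        k = i % 2
        ready = fpga_free[k] + reconf      # reconfiguration starts once the FPGA is idle
        start = max(ready, previous_end)   # and the task waits for its predecessor
        previous_end = start + run
        fpga_free[k] = previous_end
    return previous_end


# Hypothetical (reconfiguration, execution) times for tasks A, B, C.
tasks = [(3.0, 5.0), (3.0, 5.0), (3.0, 5.0)]
print(single_fpga_time(tasks), two_fpga_time(tasks))
```

In this model only the first reconfiguration stays on the critical path as long as each reconfiguration is shorter than the task running on the other FPGA.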

Recommended users’ criteria (cont.):
8. Number of FPGAs & number of microprocessors
9. Clock speed
   - maximum
   - variable vs. fixed
10. Amount of memory

Programming Reconfigurable Computers

SRC Programming Model
- Microprocessor: main.c, written in ANSI C, calls function_1() and function_2()
- FPGA: function_1 and function_2, written in MAP C (a subset of ANSI C), call macros: macro_1(a, b, c), macro_2(b, d), macro_2(c, e) in function_1; macro_3(s, t), macro_1(n, b), macro_4(t, k) in function_2
- Libraries of macros (macro_1, macro_2, macro_3, macro_4, ...), written in VHDL
[Data-flow diagram labels: I/O; a, b, c into Macro_1; two Macro_2 blocks with d and e; I/O]

SRC Program Partitioning
- μP system: C function for μP (HLL)
- FPGA system: C function for MAP (HLL) and VHDL macro (HDL)

SRC Compilation Process
[Flow diagram labels]
- Inputs: application sources (.c or .f files; .mc or .mf files) and macro sources (HDL sources: .vhd or .v files)
- μP Compiler: object files (.o files)
- MAP Compiler: .v files
- Logic synthesis: netlists (.ngo files)
- Place & Route: configuration bitstreams (.bin files)
- Linker: application executable

Star Bridge Programming Environment - Viva
[Screenshot labels: Sheets, Library, Object]

Star Bridge Compilation Process
[Flow diagram labels: user input; graphical user interface (VIVA); netlists (.ngo files); Xilinx Place & Route; configuration bitstreams (.bin files); application executable]

Cray XD1 Programming Flows
- High-level flow: MATLAB/Simulink (The MathWorks) with Xilinx System Generator, or Mitrion-C with Mitrion synthesis, producing VHDL or Verilog. Example shown: int mask (a, m) { return (a & m); }
- Standard flow: VHDL or Verilog, then VHDL/Verilog synthesis (Mentor Graphics, Synopsys, Synplicity, Xilinx) to gate-level EDIF, then Xilinx Place & Route to a bitstream. Example shown: process (a, m) is begin z <= a and m; end process;
[Diagram labels: gate with inputs a, m and output z; bitstream 01001011010101 01010110101001 01000101011010 10100101010101]
Source: [Cray, MAPLD05]

Xtreme DSP Design Flow

HDL-based SGI Altix Programming Flow
On an IA-32 Linux machine (with design iterations):
- Design Entry (Verilog, VHDL)
- Design Verification: Behavioral Simulation (VCS, Modelsim)
- Design Synthesis (Synplify Pro, Amplify)
- Metadata Processing (Python)
- Design Implementation (ISE)
- Static Timing Analysis (ISE Timing Analyzer)
On the Altix:
- Device Programming (RASC Abstraction Layer, Device Manager, Device Driver)
- Real-time Verification (gdb)
File types shown on the diagram: .v, .vhd, .edf, .ncd, .pcf, .cfg, .bin, .c

HLL-based SGI Altix Programming Flow
On an IA-32 Linux machine:
- HLL Design Entry (Handel-C, Mitrion C, Viva)
- RTL Generation and Integration with Core Services
- Design Verification: Behavioral Simulation (VCS, Modelsim)
- Design Synthesis (Synplify Pro, Amplify)
- Metadata Processing (Python)
- Design Implementation (ISE)
- Static Timing Analysis (ISE Timing Analyzer)
On the Altix:
- Device Programming (RASC Abstraction Layer, Device Manager, Device Driver)
- Real-time Verification (gdb)
File types shown on the diagram: .v, .vhd, .edf, .ncd, .pcf, .cfg, .bin, .c

Mitrion-C Programming Model for Cray & SGI
- Microprocessor: main.c, written in ANSI C based on the Mitrion API, calls function_1(in1), start_fpga(), function_1(in2), start_fpga()
- FPGA: Mitrion Distributed Processor Architecture (platform dependent), in VHDL
- Application code (platform independent), written in Mitrion-C
- Mitrion Compiler & Configurator: maps the application onto the distributed processor
[Diagram labels: FPGA RAM; input & output; I/O]

Program Entry for FPGA Accelerator Boards
[Diagram labels]
- Entry methods: HLL, HDL, Graphical Data Flow Diagram
- Traditional: software, hardware
- Extended (e.g. Corefire): software, hardware
- Axes: increased productivity; increased capability to describe parallel execution

Program Entry for Reconfigurable Computers
[Diagram labels]
- Entry methods: HLL, HDL, Graphical Data Flow Diagram
- Star Bridge: software (COM objects), hardware (porting EDIF)
- SRC: software, hardware (HDL macros)
- Axes: increased productivity; increased capability to describe parallel execution

Program Entry for Reconfigurable Computers (cont.)
[Diagram labels]
- Entry methods: HLL, HDL, Graphical Data Flow Diagram
- Cray XD1 with Simulink: software (Simulink), hardware (Xilinx System Generator)
- SGI or Cray with Mitrion: software (Mitrion Processor), hardware (Mitrion-C)
- Axes: increased productivity; increased capability to describe parallel execution