FPGA-based Supercomputers


1 FPGA-based Supercomputers
FPGA Boards and FPGA-based Supercomputers

2 Resources
PCI
PCI-X
Reconfigurable Supercomputing, T. El-Ghazawi, K. Gaj, D. Buell, D. Pointer, tutorial at the Supercomputing 2005 conference

3 FPGA Device Capacity Trends
Xilinx device complexity, 1985 to 2006 (maximum clock rate, gate count):
- XC2000: 50 MHz, 1K gates
- XC3000: 85 MHz, 7.5K gates
- XC4000: 100 MHz, 250K gates
- XC5200: 50 MHz, 23K gates
- Spartan: 80 MHz, 40K gates
- Virtex: 200 MHz, 1M gates
- Virtex-E: 240 MHz, 4M gates
- Spartan-II: 200 MHz, 200K gates
- Virtex-II: 450 MHz, 8M gates
- Virtex-II Pro: 450 MHz, 8M gates*
- Spartan-3: 326 MHz, 5M gates
- Virtex-4: 500 MHz, 16M gates*
- Virtex-5: 550 MHz, 24M gates*
Years on the chart's axis: 1985, 1987, 1991, 1995, 1998, 1999, 2000, 2002, 2003, 2004, 2006
Source:
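The chart implies roughly exponential growth in capacity. A minimal sketch, assuming smooth exponential growth between the chart's first and last data points (XC2000 in 1985, Virtex-5 in 2006) and treating the vendor-quoted gate counts as comparable (the meaning of the asterisk is not given in this transcript), estimates the average doubling period:

```python
import math


def doubling_time_years(gates_start, gates_end, year_start, year_end):
    """Average years per doubling, assuming smooth exponential growth."""
    doublings = math.log2(gates_end / gates_start)
    return (year_end - year_start) / doublings


# Assumed endpoints from the chart: XC2000 (1985, 1K gates), Virtex-5 (2006, 24M gates).
years_per_doubling = doubling_time_years(1_000, 24_000_000, 1985, 2006)
print(f"{years_per_doubling:.2f} years per doubling")  # 1.44 years per doubling
```

Gate counts are marketing equivalents rather than measured logic, so the result is indicative only.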

4 FPGA Boards

5 General Architecture of an FPGA-Based Board
[Figure: general board architecture with processing elements PE#0, PE#1, …, PE#N-1, local memory, a common memory / interconnect network, a clock (CLK), a bus interface controller, an I/O card, and the host bus]

6 Reconfigurable Computing Boards (Accelerators)
Boards may have one or several interconnected FPGA chips
Boards support different bus standards, e.g. PCI, PCI-X, VME
Boards may have direct real-time data I/O through a daughter board
Boards may have local onboard memory (OBM) to handle large data sets while avoiding the system bus (e.g. PCI) bottleneck
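Why OBM helps can be shown with a small data-movement model. This sketch assumes the peak 32-bit PCI rate quoted on the PCI slide (133 MB/s), a hypothetical OBM bandwidth of 1.6 GB/s, a hypothetical 64 MB data set processed in 10 passes, and ignores compute time and result transfers:

```python
def transfer_time_s(bytes_moved, bandwidth_bytes_per_s):
    return bytes_moved / bandwidth_bytes_per_s


def total_time_s(data_bytes, passes, bus_bw, obm_bw=None):
    """Time spent moving data for `passes` passes over one data set.

    Without OBM, every pass streams the data across the system bus.
    With OBM, the data crosses the bus once; later passes read local memory.
    """
    if obm_bw is None:
        return passes * transfer_time_s(data_bytes, bus_bw)
    return transfer_time_s(data_bytes, bus_bw) + passes * transfer_time_s(data_bytes, obm_bw)


PCI_32BIT = 133e6   # peak 32-bit PCI rate, bytes/s
OBM_BW = 1.6e9      # hypothetical onboard-memory bandwidth, bytes/s
DATA = 64e6         # hypothetical 64 MB data set

bus_only = total_time_s(DATA, passes=10, bus_bw=PCI_32BIT)
with_obm = total_time_s(DATA, passes=10, bus_bw=PCI_32BIT, obm_bw=OBM_BW)
print(f"bus only: {bus_only:.2f} s, with OBM: {with_obm:.2f} s")
```

The more often the same data is reused, the more the one-time bus transfer is amortized.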

7 Reconfigurable Computing Boards (Accelerators)
Many boards per node can be supported
A host program (e.g. in C) interfaces the user (and the μP) with the board through a board API
Driver API functions may include Reset, Open, Close, Set Clocks, DMA, Read, Write, Download Configurations, Interrupt, and Readback
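The slide lists typical driver functions but not their signatures. The Python mock below is a hypothetical stand-in (the class, its method names, and the `run` step are assumptions, not any vendor's API) that shows one plausible order in which a host program uses such functions:

```python
class MockBoard:
    """Software stand-in for an accelerator board; all names are hypothetical."""

    def __init__(self, memory_words=1024):
        self.is_open = False
        self.clock_mhz = None
        self.config = None
        self.memory = [0] * memory_words  # stands in for onboard memory (OBM)

    def _check_open(self):
        if not self.is_open:
            raise RuntimeError("board not open")

    def open(self):
        self.is_open = True

    def reset(self):
        self._check_open()
        self.memory = [0] * len(self.memory)

    def set_clock(self, mhz):
        self._check_open()
        self.clock_mhz = mhz

    def download_configuration(self, kernel):
        # A real board would load a bitstream; here a Python callable plays that role.
        self._check_open()
        self.config = kernel

    def write(self, addr, words):
        self._check_open()
        self.memory[addr:addr + len(words)] = words

    def run(self, src, dst, n):
        # Apply the configured kernel to n words at src and store the results at dst.
        self._check_open()
        if self.config is None:
            raise RuntimeError("no configuration downloaded")
        self.memory[dst:dst + n] = [self.config(w) for w in self.memory[src:src + n]]

    def read(self, addr, n):
        self._check_open()
        return self.memory[addr:addr + n]

    def close(self):
        self.is_open = False


board = MockBoard()
board.open()
board.reset()
board.set_clock(100)
board.download_configuration(lambda w: w * w)  # "bitstream" that squares each word
board.write(0, [1, 2, 3, 4])
board.run(src=0, dst=512, n=4)
result = board.read(512, 4)
board.close()
print(result)  # [1, 4, 9, 16]
```

On real hardware, completion of `run` would more likely be signaled through an interrupt or a status register than by a blocking call.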

8 Common Interface: PCI
PCI = Peripheral Component Interconnect
[Figure labels: 64-bit bus, 32-bit bus]

9 PCI - Conventional hardware specifications
32-bit or 64-bit bus width
33.33 MHz clock with synchronous transfers
Peak transfer rate of 133 MB/s for a 32-bit bus (33.33 MHz × 32 bits × (1 byte ÷ 8 bits) = 133 MB/s)
Peak transfer rate of 266 MB/s for a 64-bit bus
32-bit address space (4 gigabytes)
32-bit port space (now deprecated)
5-volt signaling
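The peak-rate arithmetic above is clock × bus width ÷ 8. A small sketch reproducing both figures, using decimal megabytes (1 MB = 10^6 bytes) as the slide does:

```python
def peak_mb_per_s(clock_mhz, bus_bits):
    """Peak rate in MB/s (1 MB = 10**6 bytes): one transfer of bus_bits per clock."""
    return clock_mhz * bus_bits / 8


pci_32 = peak_mb_per_s(33.33, 32)
pci_64 = peak_mb_per_s(33.33, 64)
print(f"32-bit PCI: {pci_32:.1f} MB/s, 64-bit PCI: {pci_64:.1f} MB/s")
# 32-bit PCI: 133.3 MB/s, 64-bit PCI: 266.6 MB/s
```

These are peak numbers; arbitration, protocol overhead, and shared use of the bus make sustained rates lower.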

10 PCI-X (PCI eXtended)
PCI-X uses a 64-bit bus, revises the protocol, and raises the maximum signaling frequency to 133 MHz (peak transfer rate of about 1066 MB/s)
PCI-X 2.0 permits 266 MHz and 533 MHz effective rates (peak transfer rates of about 2133 MB/s and 4266 MB/s), expands the configuration space to 4096 bytes, adds a 16-bit bus variant, and allows 1.5-volt signaling
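PCI-X 266 and PCI-X 533 keep the 133 MHz base clock and move two or four data words per clock (double and quad data rate). A sketch of the resulting peak rates for a 64-bit bus, again in decimal MB/s:

```python
BASE_CLOCK_MHZ = 133.33  # PCI-X base clock
BUS_BITS = 64


def peak_mb_per_s(clock_mhz, bus_bits, transfers_per_clock=1):
    """Peak rate in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * transfers_per_clock * bus_bits / 8


modes = {"PCI-X 133": 1, "PCI-X 266 (DDR)": 2, "PCI-X 533 (QDR)": 4}
rates = {name: peak_mb_per_s(BASE_CLOCK_MHZ, BUS_BITS, n) for name, n in modes.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} MB/s")
# PCI-X 133: 1066.6 MB/s
# PCI-X 266 (DDR): 2133.3 MB/s
# PCI-X 533 (QDR): 4266.6 MB/s
```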

11 Some Reconfigurable Boards Vendors
ANNAPOLIS MICRO SYSTEMS, INC.; University of Southern California (USC/ISI); AMONTEC; XESS Corporation; CELOXICA; CESYS; TRAQUAIR; SILICON SOFTWARE; COMPAQ; ALPHA DATA; Associated Professional Systems; NALLATECH

12 Representative Example Boards From Annapolis Micro Systems (AMI) & Nallatech

13 ZBT (zero bus turnaround) memory: no idle cycles are needed when switching from read to write or from write to read
Source: [AMS02]
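The benefit of ZBT shows up in access patterns that change direction often. A minimal cycle-count sketch, assuming one access per cycle and a hypothetical one-cycle turnaround penalty for conventional memory at each read/write switch (the real penalty depends on the device):

```python
def bus_cycles(ops, turnaround_cycles):
    """Cycles for a sequence of 'R'/'W' accesses, one access per cycle, plus
    `turnaround_cycles` idle cycles at every change of direction."""
    switches = sum(1 for prev, cur in zip(ops, ops[1:]) if prev != cur)
    return len(ops) + switches * turnaround_cycles


pattern = "RWRWRWRW"                                     # direction changes on every access
conventional = bus_cycles(pattern, turnaround_cycles=1)  # hypothetical 1-cycle penalty
zbt = bus_cycles(pattern, turnaround_cycles=0)           # ZBT: no idle cycles
print(conventional, zbt)  # 15 8
```

For long runs of reads or writes the difference shrinks, since only direction changes are penalized.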

14 Source: [AMS02]

15 WILDSTAR™ II Pro Reproduced and displayed with permission

16 WILDSTAR™ II Pro
QDR SRAM: up to 400 MHz (typically 133 MHz)
Each chip has six memory banks of up to 8 MB per bank, i.e. up to 48 MB per chip
RocketIO: 3.2 Gbps per differential link; links operate in parallel, so aggregate speed depends on how many links are used and at what clock they run
Differential pairs have higher noise immunity
Reproduced and displayed with permission
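Both quantities on this slide are simple products: memory per chip is banks × bank size, and link bandwidth scales with the number of parallel links and their rate. A sketch, assuming 8b/10b line coding (commonly used with RocketIO transceivers, but not stated on the slide) and a hypothetical count of four links:

```python
BANKS_PER_CHIP = 6
MB_PER_BANK = 8
memory_per_chip_mb = BANKS_PER_CHIP * MB_PER_BANK  # 48 MB, as on the slide

LANE_GBPS = 3.2            # RocketIO line rate per differential link
ENCODING_EFFICIENCY = 0.8  # assumed 8b/10b coding: 8 payload bits per 10 line bits


def aggregate_payload_gbps(lanes, line_rate_gbps=LANE_GBPS, efficiency=ENCODING_EFFICIENCY):
    """Links run in parallel, so payload bandwidth scales with the number of links."""
    return lanes * line_rate_gbps * efficiency


four_links = aggregate_payload_gbps(4)  # hypothetical link count
print(memory_per_chip_mb, f"{four_links:.2f} Gbps")  # 48 10.24 Gbps
```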

17 Nallatech's BenNUEY-PCI-4E
Up to seven Virtex-II Pro FPGAs, six of them for the DIME-II modular architecture; inter-card communication through RapidIO; everything connects to the host over PCI

18 Reconfigurable Supercomputers

19 Scalable Reconfigurable Systems
Large numbers of reconfigurable processors and microprocessors
Everything can be configured: functional units, interconnects, interfaces
High level of scalability
Suitable for a wide range of applications
Everything can be reconfigured over and over at run time (run-time reconfiguration) to suit the underlying applications
Can be easily programmed by application scientists, at least as easily as conventional parallel computers

20 Early Reconfigurable Architecture
Interface P memory . . . I/O FPGA Microprocessor system Reconfigurable system

21 Current Reconfigurable Architecture
P FPGA FPGA P . . . P memory P memory FPGA memory FPGA memory Shared Memory and or NIC

22 Possible Classes of Reconfigurable Supercomputers
[Figure: classes ordered by tighter integration (RP = reconfigurable processor). Independent board design: μP 1 … μP N on μP boards and RP 1 … RP N on separate RP boards. Joint board design: μPs and RPs on a joint μP/RP board.]

23 Possible Classes of Reconfigurable Supercomputers – cont.
[Figure, continuing toward tighter integration: μP inside of RP design and RP inside of μP design, each shown as μP 1 … μP N and RP 1 … RP N on a joint μP/RP board]

24 FPGA based supercomputers
Machines: SRC-6 from SRC Computers; Cray XD1 from Cray; SGI Altix from SGI; SRC-7 from SRC Computers, Inc.
Release years shown on the slide: 2002, 2005, 2006

25 How to choose the system that best suits your needs?
Typical users’ criteria:
1. Clock speed
2. Amount of memory
3. Cost of ownership

26 How to choose the system that best suits your needs?
Recommended users’ criteria:
1. Tools
 - right level of abstraction
 - ease of development and verification
 - progress and backward compatibility
2. Libraries
 - basic operations
 - examples of full applications
3. Technical support

27 How to choose the system that best suits your needs?
Recommended users’ criteria (cont.):
4. Data bandwidth
[Figure: data paths among the reconfigurable processor system, the μP system, and external I/O devices]

28 How to choose the system that best suits your needs?
Recommended users’ criteria (cont.):
5. Scalability
 - variable power and price
 - efficient communication among the modules

29 Recommended users’ criteria (cont.):
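6. Transfer-of-control overhead
[Figure: μP/FPGA timelines along a time axis, contrasting theoretical behavior with actual behavior, in which a control transfer overhead appears whenever control passes between the μP and the FPGA]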
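The gap between theoretical and actual behavior can be expressed as a speedup that includes the overhead. A minimal sketch with hypothetical timings in microseconds:

```python
def effective_speedup(t_software, t_fpga, t_overhead):
    """Speedup of one offloaded call when each call also pays a control-transfer overhead."""
    return t_software / (t_fpga + t_overhead)


ideal = effective_speedup(t_software=100.0, t_fpga=10.0, t_overhead=0.0)
actual = effective_speedup(t_software=100.0, t_fpga=10.0, t_overhead=5.0)
tiny_task = effective_speedup(t_software=2.0, t_fpga=0.2, t_overhead=5.0)
print(f"{ideal:.1f} {actual:.2f} {tiny_task:.2f}")  # 10.0 6.67 0.38
```

For small tasks the overhead dominates and offloading can be slower than staying on the μP, which is why it pays to give the FPGA large units of work per control transfer.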

30 7. Reconfiguration overhead
P FPGA P FPGA P FPGA Reconf A Reconf A Reconf A Task A Task A Task A Reconf B Task A Reconf B Task B Task B Reconf C Task A Task C Reconf C Task C

31 7. Reconfiguration overhead (cont.)
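[Figure: a μP with two FPGAs; after Reconf A, the reconfigurations (Reconf B, Reconf C) overlap with the execution of the preceding tasks (Task A, Task B), followed by Task C]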
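Overlapping reconfiguration with computation can be modeled by alternating tasks between the two FPGAs. A sketch under hypothetical assumptions: every task needs a new configuration, tasks run strictly in order, and only one reconfiguration can be in progress at a time:

```python
def two_fpga_time(tasks, reconf_time, task_time):
    """Makespan when tasks alternate between two FPGAs, so one FPGA can be
    reconfigured while the other computes (assumptions as stated above)."""
    config_port_free = 0.0  # when the shared configuration path is next free
    fpga_free = [0.0, 0.0]  # when each FPGA finishes its current task
    prev_task_end = 0.0
    for i, _task in enumerate(tasks):
        fpga = i % 2
        reconf_start = max(config_port_free, fpga_free[fpga])
        reconf_end = reconf_start + reconf_time
        config_port_free = reconf_end
        task_start = max(reconf_end, prev_task_end)
        prev_task_end = task_start + task_time
        fpga_free[fpga] = prev_task_end
    return prev_task_end


R, T = 10.0, 100.0                # hypothetical times (ms): tasks longer than reconfiguration
overlapped = two_fpga_time("ABC", R, T)
sequential = 3 * (R + T)          # one FPGA, reconfigured before each task
print(overlapped, sequential)     # 310.0 330.0
```

With tasks longer than reconfigurations, only the first reconfiguration remains exposed (R + 3T); in the opposite case the task times are the ones hidden.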

32 Recommended users’ criteria (cont.):
8. Number of FPGAs and number of microprocessors
9. Clock speed
 - maximum
 - variable vs. fixed
10. Amount of memory

33 Programming Reconfigurable Computers

34 SRC Programming Model
[Figure: main.c, written in ANSI C, runs on the microprocessor and calls function_1() and function_2(), which are written in MAP C (a subset of ANSI C) and run on the FPGA. function_1 calls macro_1(a, b, c), macro_2(b, d), and macro_2(c, e); function_2 calls macro_3(s, t), macro_1(n, b), and macro_4(t, k). The macros come from libraries of macros written in VHDL. Data enters and leaves through I/O.]

35 SRC Program Partitioning
C function for P P system HLL C function for MAP FPGA system VHDL macro HDL

36 SRC Compilation Process
Application sources (.c or .f files) → μP compiler → object files (.o files)
Application sources (.mc or .mf files) → MAP compiler → .v files → logic synthesis → netlists (.ngo files)
Macro sources, i.e. HDL sources (.vhd or .v files) → logic synthesis → netlists (.ngo files)
Netlists → place & route → configuration bitstreams (.bin files)
Object files + configuration bitstreams → linker → application executable
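The compilation process is a dependency graph, so a valid build order is any topological order of its stages. A sketch using Python's standard `graphlib` (Python 3.9+); the stage names are illustrative labels, not SRC tool commands:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# stage -> (what it does, stages whose outputs it needs); labels are illustrative
STAGES = {
    "up_compile": (".c/.f -> .o object files", []),
    "map_compile": (".mc/.mf -> .v files", []),
    "hdl_macros": ("user .vhd/.v macro sources", []),
    "synthesis": (".v/.vhd -> .ngo netlists", ["map_compile", "hdl_macros"]),
    "place_route": (".ngo -> .bin configuration bitstreams", ["synthesis"]),
    "link": (".o + .bin -> application executable", ["up_compile", "place_route"]),
}

graph = {stage: set(deps) for stage, (_, deps) in STAGES.items()}
build_order = list(TopologicalSorter(graph).static_order())
for stage in build_order:
    print(f"{stage:12} {STAGES[stage][0]}")
```

The μP branch and the FPGA branch are independent until the linker, so a build system could run them in parallel.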

37 Star Bridge Programming Environment - Viva
[Figure labels: Sheets, Library, Object]

38 Star Bridge Compilation Process
User input → graphical user interface (VIVA) → netlists (.ngo files) → Xilinx place & route → configuration bitstreams (.bin files) → application executable

39 Cray XD1 Programming Flows
High-level flow:
- MATLAB/Simulink (The MathWorks) → Xilinx System Generator
- Mitrion-C → Mitrion; C-style example: int mask (a, m) { return (a & m); }
VHDL or Verilog (written directly or generated by the high-level flow); VHDL example: process (a, m) is begin z <= a and m; end process;
VHDL/Verilog synthesis (Mentor Graphics, Synopsys, Synplicity) → gate-level EDIF (an AND gate with inputs a, m and output z)
Xilinx standard flow: Xilinx place & route
Source: [Cray, MAPLD05]
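The C-style and VHDL fragments describe the same function at different levels: a word-level AND and one AND gate per bit. Design verification checks that such descriptions agree. A minimal sketch of that idea, comparing a word-level golden model against a per-bit model on random vectors (the 32-bit width and the function names are assumptions):

```python
import random

WIDTH = 32  # hypothetical word width


def mask_reference(a, m):
    """Golden model mirroring the C-style example: a & m."""
    return a & m


def mask_gate_level(a, m, width=WIDTH):
    """Per-bit model of the VHDL 'z <= a and m': one AND gate per bit."""
    z = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        m_bit = (m >> i) & 1
        z |= (a_bit & m_bit) << i
    return z


rng = random.Random(0)  # fixed seed for repeatable test vectors
mismatches = 0
for _ in range(1000):
    a, m = rng.getrandbits(WIDTH), rng.getrandbits(WIDTH)
    if mask_gate_level(a, m) != mask_reference(a, m):
        mismatches += 1
print("mismatches:", mismatches)  # mismatches: 0
```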

40 Xtreme DSP Design Flow

41 HDL-based SGI Altix Programming Flow
Design entry (Verilog, VHDL): .v, .vhd files
Behavioral simulation (VCS, ModelSim)
Design synthesis (Synplify Pro, Amplify): .v, .vhd → .edf
Metadata processing (Python)
Design implementation (ISE): .ncd, .pcf
Static timing analysis (ISE Timing Analyzer)
Altix device programming (RASC Abstraction Layer, Device Manager, Device Driver): .cfg, .bin
Real-time verification (gdb): .c
Other figure labels: Design verification, Design iterations, IA-32 Linux Machine

42 HLL-based SGI Altix Programming Flow
HLL design entry (Handel-C, Mitrion-C, Viva)
RTL generation and integration with core services: .v, .vhd
Behavioral simulation (VCS, ModelSim)
Design synthesis (Synplify Pro, Amplify): .v, .vhd → .edf
Metadata processing (Python)
Design implementation (ISE): .ncd, .pcf
Static timing analysis (ISE Timing Analyzer)
Altix device programming (RASC Abstraction Layer, Device Manager, Device Driver): .cfg, .bin
Real-time verification (gdb): .c
Other figure labels: Design verification, IA-32 Linux Machine

43 Mitrion-C Programming Model for Cray & SGI
[Figure: on the microprocessor, main.c (ANSI C based on the Mitrion API) calls function_1(in1), start_fpga(), function_1(in2), start_fpga(). On the FPGA, the application code (Mitrion-C, platform independent) is processed by the Mitrion Compiler & Configurator into an application on the Mitrion Distributed Processor Architecture (VHDL, platform dependent), with FPGA RAM and input & output I/O.]

44 Program Entry for FPGA Accelerator Boards
[Figure: entry methods (HDL, HLL, graphical data flow diagram) placed along two axes, increased productivity and increased capability to describe parallel execution; labels: Software Traditional, Software Extended (e.g. CoreFire), Hardware]

45 Program Entry for Reconfigurable Computers
[Figure: entry methods (HLL, HDL, graphical data flow diagram) placed along two axes, increased productivity and increased capability to describe parallel execution; labels: Star Bridge, with COM objects (software) and porting EDIF (hardware); SRC, with HDL macros (hardware)]

46 Program Entry for Reconfigurable Computers
[Figure: entry methods (HLL, HDL, graphical data flow diagram) placed along two axes, increased productivity and increased capability to describe parallel execution; labels: Cray XD1 with Simulink (software: Simulink; hardware: Xilinx System Generator); SGI or Cray with Mitrion (Mitrion-C, Mitrion processor)]

