1
Reconfigurable Computing: HPC Network Aspects
Mitch Sukalski (8961), David Thompson (8963), Craig Ulmer (8963) cdulmer@sandia.gov
Pete Dean R&D Seminar
December 11, 2003
2
FPGAs are promising… but what’s the catch?
There are three main challenges that must be addressed before FPGAs can be applied to practical scientific computing.
3
RC Challenge #1: Floating Point
Most FPGAs are fine grained; floating-point units are large
–32b FP occupies ~1,000 CLBs
–Commercial capacity improving: 2000: 6,000 CLBs; 2003: 40,000 CLBs (max: 220,000)
Keith Underwood at Sandia/NM
–LDRD: working on high-speed 64b floating-point cores
(Figure: 32b FP in a Xilinx V2P7)
4
RC Challenge #2: Design Tools
Hardware design is non-trivial
–Computations must be micromanaged clock-by-clock
–Not appropriate for most scientists
–Need languages and APIs that are easy to use
Maya Gokhale at LANL
–Streams-C: C-like language for HW design
–Pipelines/unrolls loops (see the sketch below)
–Schedules access to external memory
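To give a feel for the style, the following is a minimal sketch in plain C of the kind of stream kernel such a tool targets; the filter, array sizes, and pragma comments are illustrative assumptions, not actual Streams-C syntax.

    /* Hypothetical stream kernel, written in ordinary C to suggest what a
     * Streams-C-like tool would map to hardware.  The pragma comments are
     * placeholders, not real Streams-C directives. */
    #include <stdint.h>
    #include <stdio.h>

    #define N 16

    static void fir4(const int16_t in[N], int16_t out[N], const int16_t coeff[4])
    {
        /* pipeline: each outer iteration becomes one pipeline stage */
        for (int i = 3; i < N; i++) {
            int32_t acc = 0;
            /* unroll: the four taps become parallel multiply-accumulates */
            for (int k = 0; k < 4; k++)
                acc += (int32_t)coeff[k] * in[i - k];
            out[i] = (int16_t)(acc >> 15);   /* scheduled write to external SRAM */
        }
    }

    int main(void)
    {
        int16_t in[N], out[N] = { 0 };
        const int16_t coeff[4] = { 8192, 8192, 8192, 8192 };  /* 4-tap average */
        for (int i = 0; i < N; i++)
            in[i] = (int16_t)(i * 100);
        fir4(in, out, coeff);
        for (int i = 3; i < N; i++)
            printf("%d ", out[i]);
        printf("\n");
        return 0;
    }

The inner loop has no loop-carried dependence across taps, which is exactly what lets a compiler unroll it into parallel hardware.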
5
RC Challenge #3: High-speed I/O
FPGAs have large internal computational power
–How do we get data into and out of the FPGA?
–How do we connect to our existing HPC machines?
Mitch Sukalski, David Thompson, Craig Ulmer
–LDRD: connect FPGAs to high-performance SANs
6
Outline
–Where we have been: networking FPGAs using external NI cards
–Where we are going: networking FPGAs using internal transceivers
–Project status: early details
7
Previous Work
Where we’ve been…
8
Networking Earlier FPGAs
The previous generation of FPGAs was essentially a blank ASIC
–Configurable logic and pins
Attach a network card to an FPGA card
–Communication over PCI
Examples:
–Virginia Tech: Myrinet
–Washington U. in St. Louis: ATM (inline)
–Clemson University: Gigabit Ethernet
–Georgia Tech: Myrinet
(Figure: CPU, FPGA, and NIC cards connected over the PCI bus)
9
GRIM Project at Georgia Tech
Add multimedia devices to a cluster
–Message layer connects CPUs, memory, and peripherals (see the descriptor sketch below)
–Myrinet between hosts, PCI within hosts
Celoxica RC-1000 FPGA card
–Virtex FPGA (1M logic gates)
–Four SRAM banks
–PCI w/ PMC
(Figures: RC-1000 card with four SRAM banks, PCI, and FPGA control & switching; GRIM cluster linking CPU, FPGA, RAID, and Ethernet nodes)
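As a rough illustration of what such a message layer carries, here is a hedged sketch of a descriptor for a GRIM-style message; the field names and layout are assumptions for illustration, not the actual GRIM wire format.

    /* Hypothetical descriptor for a GRIM-style message that can target a CPU,
     * a region of FPGA card SRAM, or another peripheral.  Fields are
     * illustrative assumptions, not the real GRIM format. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t dest_node;      /* Myrinet host that owns the target device   */
        uint32_t dest_endpoint;  /* CPU, FPGA card, RAID, etc. on that host    */
        uint32_t op;             /* e.g. write card SRAM, start a user circuit */
        uint32_t length;         /* payload size in bytes                      */
        uint64_t card_addr;      /* offset into one of the card's SRAM banks   */
    } grim_msg_t;

    int main(void)
    {
        grim_msg_t m = { 2, 1, 0, 4096, 0x00100000ULL };
        printf("msg to node %u endpoint %u: %u bytes at 0x%08llx\n",
               (unsigned)m.dest_node, (unsigned)m.dest_endpoint,
               (unsigned)m.length, (unsigned long long)m.card_addr);
        return 0;
    }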
10
FPGA Organization
(Figure: the FPGA circuit canvas hosts a frame containing incoming and outgoing message queues and a communication library API to the host application; user circuits 1..n sit behind a user circuit API, with a data memory API to the FPGA card memory)
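A minimal sketch of how one of the frame's message queues might be laid out in card memory and drained by a consumer; the slot size, counts, and field names are assumptions, not the project's actual layout.

    /* Simple ring buffer standing in for one of the frame's incoming message
     * queues in FPGA card memory.  Sizes and names are illustrative. */
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define QUEUE_SLOTS 64
    #define SLOT_BYTES  256

    typedef struct {
        volatile uint32_t head;                 /* next slot the consumer reads  */
        volatile uint32_t tail;                 /* next slot the producer writes */
        uint8_t slot[QUEUE_SLOTS][SLOT_BYTES];
    } msg_queue_t;

    /* Copy one message out if the queue is non-empty; returns 1 on success. */
    static int queue_pop(msg_queue_t *q, uint8_t out[SLOT_BYTES])
    {
        if (q->head == q->tail)
            return 0;                           /* queue is empty */
        memcpy(out, q->slot[q->head], SLOT_BYTES);
        q->head = (q->head + 1) % QUEUE_SLOTS;
        return 1;
    }

    int main(void)
    {
        static msg_queue_t q;                   /* zero-initialized: empty queue */
        uint8_t msg[SLOT_BYTES];
        strcpy((char *)q.slot[0], "hello circuit");
        q.tail = 1;                             /* producer enqueued one message */
        if (queue_pop(&q, msg))
            printf("popped: %s\n", msg);
        return 0;
    }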
11
Lessons Learned
Frame provides a simple OS
–Isolates users from the board
–Portability
Dynamically manage resources
–Card memory
–Computational circuits
PCI bottleneck
–Distance between NI and FPGA
–PCI difficult to work with
(Figure: the host services page faults by moving pages between host memory and card SRAM, and function faults such as "Use Circuit F on $C0000000" by loading the requested circuit onto the FPGA)
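To make the fault mechanism concrete, here is a hedged sketch of how the host side might service the two fault types; the message fields and the actions taken are illustrative assumptions, not the project's actual protocol.

    /* Host-side handling of "function fault" and "page fault" messages from
     * the FPGA frame.  Field names and actions are illustrative assumptions. */
    #include <stdint.h>
    #include <stdio.h>

    typedef enum { FUNCTION_FAULT, PAGE_FAULT } fault_kind_t;

    typedef struct {
        fault_kind_t kind;
        int          circuit_id;   /* e.g. "Circuit F"                         */
        uint32_t     card_addr;    /* e.g. 0xC0000000, where the operands live */
    } fault_msg_t;

    static void handle_fault(const fault_msg_t *f)
    {
        if (f->kind == FUNCTION_FAULT) {
            /* Requested circuit is not loaded: configure it, then resubmit. */
            printf("load circuit %d, then retry at 0x%08x\n",
                   f->circuit_id, (unsigned)f->card_addr);
        } else {
            /* Requested data page is not in card memory: stage it over PCI. */
            printf("copy page at 0x%08x into card SRAM\n",
                   (unsigned)f->card_addr);
        }
    }

    int main(void)
    {
        fault_msg_t m = { FUNCTION_FAULT, 6, 0xC0000000u };
        handle_fault(&m);
        return 0;
    }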
12
Network Features of Recent FPGAs
Where we’re going…
13
FPGA Network Improvements
Recent FPGAs have special built-in cores
–High-speed transceivers, dedicated processors
Idea: build our NI inside the FPGA
–FPGA becomes a networked compute resource
–Removes the PCI bottleneck
(Figure: an FPGA with on-chip NIs (Tx/Rx) and user-defined computational circuits attached directly to a system area network alongside CPU/NIC hosts)
14
Xilinx Virtex-II Pro FPGA
Up to 4 PowerPC 405 cores
–Embedded version of the PPC
–300–400 MHz
Multiple gigabit transceivers
–Run at 600 Mbps to 3.125 Gbps
–Up to twenty-four transceivers
Additional cores
–Distributed internal memory
–Arrays of 18b multipliers
–Digital clock multipliers, PLLs
(Figure: Xilinx V2P20)
15
Multi-Gigabit Transceivers: Rocket I/O
Flexible, high-speed transceivers
–Can be configured to connect with different physical layers
–InfiniBand, GigE, FC, 10GigE, Aurora
–Note: low-level interface (commas, disparity, clock mismatches); see the alignment sketch below
(Figure: Rocket I/O datapath between the FPGA fabric and the differential pins: Tx FIFO, CRC, and 8B/10B encoder feeding the serializer; deserializer, clock recovery, 8B/10B decoder, elastic buffer, and CRC check on receive)
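As an example of those low-level details, below is a small C model of comma-based word alignment: the receiver hunts for the K28.5 comma character to find 10-bit word boundaries. The real transceiver does this in hardware; the bit-ordering convention used here is a simplifying assumption.

    /* Toy model of 8B/10B comma detection for word alignment.  K28.5 codes
     * are written as 10-bit values abcdeifghj with 'a' as the MSB; serial
     * bit ordering on a real link is glossed over. */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    #define K28_5_RDNEG 0x0FA   /* 0011111010, running disparity negative */
    #define K28_5_RDPOS 0x305   /* 1100000101, running disparity positive */

    /* Slide a 10-bit window across a recovered bit stream and return the
     * bit offset of the first comma, or -1 if none is found. */
    static long find_comma(const uint8_t *bits, size_t nbits)
    {
        uint16_t window = 0;
        for (size_t i = 0; i < nbits; i++) {
            window = (uint16_t)(((window << 1) | (bits[i] & 1)) & 0x3FF);
            if (i >= 9 && (window == K28_5_RDNEG || window == K28_5_RDPOS))
                return (long)(i - 9);           /* comma begins here */
        }
        return -1;
    }

    int main(void)
    {
        /* A short test stream: five junk bits, then a K28.5 (RD-) comma. */
        uint8_t bits[] = { 1,0,1,1,0,  0,0,1,1,1,1,1,0,1,0 };
        printf("comma found at bit offset %ld\n",
               find_comma(bits, sizeof bits / sizeof bits[0]));
        return 0;
    }

Once the comma offset is known, the receiver can chop the stream into aligned 10-bit codes and hand them to the 8B/10B decoder.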
16
Why MGTs Are Important
Direct connection to networks
–Same chip, different network
–Removes PCI from the equation
Fast connections between FPGAs
–Reduces analog design issues
–Chain FPGAs together
–Reduces pin count
Update: Virtex-II Pro X
–Now 2.488 Gbps to 10.3125 Gbps
–Chips have either 8 or 20 transceivers
(Figure: 3.125 Gbps over 44” FR4; from Xilinx, http://www.xilinx.com/products/virtex2pro/mgtcharacter.htm)
17
Hard PowerPC Core
PowerPC 405
–16 KB instruction / 16 KB data caches
–Real and virtual memory modes
–GCC is available
Multiple memory ports for the core
–On-chip memory (OCM)
–Processor Local Bus (PLB)
User-defined memory map
–Connect memory blocks or cores (see the access sketch below)
–External memory cores available
(Figure: PowerPC core with I-cache and D-cache connected to the Processor Local Bus (PLB) and the on-chip memory (OCM) interface)
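A hedged sketch of what software on the embedded PowerPC might look like when a user core is mapped into its address space over the PLB; the base address, register layout, and bit definitions are made-up placeholders, and the fragment is meant for the embedded target rather than a workstation.

    /* Hypothetical memory-mapped interface to a user-defined core reachable
     * from the PowerPC 405 over the PLB.  Addresses and registers are
     * placeholders, not a real Xilinx core's map. */
    #include <stdint.h>

    #define USER_CORE_BASE  0xC0000000u       /* hypothetical PLB address */
    #define REG_CONTROL     (*(volatile uint32_t *)(USER_CORE_BASE + 0x0))
    #define REG_STATUS      (*(volatile uint32_t *)(USER_CORE_BASE + 0x4))
    #define REG_DATA_IN     (*(volatile uint32_t *)(USER_CORE_BASE + 0x8))
    #define REG_DATA_OUT    (*(volatile uint32_t *)(USER_CORE_BASE + 0xC))

    #define CTRL_START      0x1u
    #define STAT_DONE       0x1u

    uint32_t run_user_core(uint32_t operand)
    {
        REG_DATA_IN = operand;                /* write operand into the core */
        REG_CONTROL = CTRL_START;             /* kick off the computation    */
        while ((REG_STATUS & STAT_DONE) == 0) /* spin until the core is done */
            ;
        return REG_DATA_OUT;                  /* read back the result        */
    }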
18
System on a Chip (SoC)
Commercial SoC
–Designing with cores
–Customize the system
New tools
–Rapidly connect cores
–Library of cores & buses
–Save on wiring legwork
(Figure: Xilinx Platform Studio)
19
Current Status
Exploring V2P
–New architecture, new tools
Two reference boards
–ML300 (V2P7-6)
–Avnet (V2P20-6)
Transceiver work
–Raw transmission over fiber
–Working towards IB
http://cdulmer.ran.sandia.gov