1
Reconfigurable Computing: HPC Network Aspects
Mitch Sukalski (8961), David Thompson (8963), Craig Ulmer (8963) cdulmer@sandia.gov
Pete Dean R&D Seminar
December 11, 2003
2
FPGAs are promising… but what’s the catch?
There are three main challenges that must be addressed before FPGAs can be applied to practical scientific computing.
3
RC Challenge #1: Floating Point
Most FPGAs are fine grained; floating-point units are large
–32b FP occupies ~1,000 CLBs
–Commercial capacity improving: 2000: 6,000 CLBs; 2003: 40,000 CLBs (max: 220,000)
Keith Underwood at Sandia/NM
–LDRD: working on high-speed 64b floating-point cores
(Figure: 32b FP in a Xilinx V2P7)
4
RC Challenge #2: Design Tools
Hardware design is non-trivial
–Computations must be micromanaged clock-by-clock
–Not appropriate for most scientists
–Need languages and APIs that are easy to use
Maya Gokhale at LANL
–Streams-C: C-like language for HW design
–Pipelines/unrolls loops (see the sketch below)
–Schedules access to external memory
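To give a feel for the style, the following is a minimal sketch in plain C of the kind of stream kernel such a tool targets; the filter, array sizes, and pragma comments are illustrative assumptions, not actual Streams-C syntax.

    /* Hypothetical stream kernel, written in ordinary C to suggest what a
     * Streams-C-like tool would map to hardware.  The pragma comments are
     * placeholders, not real Streams-C directives. */
    #include <stdint.h>
    #include <stdio.h>

    #define N 16

    static void fir4(const int16_t in[N], int16_t out[N], const int16_t coeff[4])
    {
        /* pipeline: each outer iteration becomes one pipeline stage */
        for (int i = 3; i < N; i++) {
            int32_t acc = 0;
            /* unroll: the four taps become parallel multiply-accumulates */
            for (int k = 0; k < 4; k++)
                acc += (int32_t)coeff[k] * in[i - k];
            out[i] = (int16_t)(acc >> 15);   /* scheduled write to external SRAM */
        }
    }

    int main(void)
    {
        int16_t in[N], out[N] = { 0 };
        const int16_t coeff[4] = { 8192, 8192, 8192, 8192 };  /* 4-tap average */
        for (int i = 0; i < N; i++)
            in[i] = (int16_t)(i * 100);
        fir4(in, out, coeff);
        for (int i = 3; i < N; i++)
            printf("%d ", out[i]);
        printf("\n");
        return 0;
    }

The inner loop has no loop-carried dependence across taps, which is exactly what lets a compiler unroll it into parallel hardware.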
5
RC Challenge #3: High-speed I/O
FPGAs have large internal computational power
–How do we get data into and out of the FPGA?
–How do we connect to our existing HPC machines?
Mitch Sukalski, David Thompson, Craig Ulmer
–LDRD: connect FPGAs to high-performance SANs
6
Outline
–Where we have been: networking FPGAs using external NI cards
–Where we are going: networking FPGAs using internal transceivers
–Project status: early details
7
Previous Work
Where we’ve been…
8
Networking Earlier FPGAs
The previous generation of FPGAs was essentially a blank ASIC
–Configurable logic and pins
Attach a network card to an FPGA card
–Communication over PCI
Examples:
–Virginia Tech: Myrinet
–Washington U. in St. Louis: ATM (inline)
–Clemson University: Gigabit Ethernet
–Georgia Tech: Myrinet
(Figure: CPU, FPGA, and NIC cards connected over the PCI bus)
9
GRIM Project at Georgia Tech
Add multimedia devices to a cluster
–Message layer connects CPUs, memory, and peripherals (see the descriptor sketch below)
–Myrinet between hosts, PCI within hosts
Celoxica RC-1000 FPGA card
–Virtex FPGA (1M logic gates)
–Four SRAM banks
–PCI w/ PMC
(Figures: RC-1000 card with four SRAM banks, PCI, and FPGA control & switching; GRIM cluster linking CPU, FPGA, RAID, and Ethernet nodes)
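As a rough illustration of what such a message layer carries, here is a hedged sketch of a descriptor for a GRIM-style message; the field names and layout are assumptions for illustration, not the actual GRIM wire format.

    /* Hypothetical descriptor for a GRIM-style message that can target a CPU,
     * a region of FPGA card SRAM, or another peripheral.  Fields are
     * illustrative assumptions, not the real GRIM format. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t dest_node;      /* Myrinet host that owns the target device   */
        uint32_t dest_endpoint;  /* CPU, FPGA card, RAID, etc. on that host    */
        uint32_t op;             /* e.g. write card SRAM, start a user circuit */
        uint32_t length;         /* payload size in bytes                      */
        uint64_t card_addr;      /* offset into one of the card's SRAM banks   */
    } grim_msg_t;

    int main(void)
    {
        grim_msg_t m = { 2, 1, 0, 4096, 0x00100000ULL };
        printf("msg to node %u endpoint %u: %u bytes at 0x%08llx\n",
               (unsigned)m.dest_node, (unsigned)m.dest_endpoint,
               (unsigned)m.length, (unsigned long long)m.card_addr);
        return 0;
    }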
10
FPGA Organization
(Figure: the FPGA circuit canvas hosts a frame containing incoming and outgoing message queues and a communication library API to the host application; user circuits 1..n sit behind a user circuit API, with a data memory API to the FPGA card memory)
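A minimal sketch of how one of the frame's message queues might be laid out in card memory and drained by a consumer; the slot size, counts, and field names are assumptions, not the project's actual layout.

    /* Simple ring buffer standing in for one of the frame's incoming message
     * queues in FPGA card memory.  Sizes and names are illustrative. */
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define QUEUE_SLOTS 64
    #define SLOT_BYTES  256

    typedef struct {
        volatile uint32_t head;                 /* next slot the consumer reads  */
        volatile uint32_t tail;                 /* next slot the producer writes */
        uint8_t slot[QUEUE_SLOTS][SLOT_BYTES];
    } msg_queue_t;

    /* Copy one message out if the queue is non-empty; returns 1 on success. */
    static int queue_pop(msg_queue_t *q, uint8_t out[SLOT_BYTES])
    {
        if (q->head == q->tail)
            return 0;                           /* queue is empty */
        memcpy(out, q->slot[q->head], SLOT_BYTES);
        q->head = (q->head + 1) % QUEUE_SLOTS;
        return 1;
    }

    int main(void)
    {
        static msg_queue_t q;                   /* zero-initialized: empty queue */
        uint8_t msg[SLOT_BYTES];
        strcpy((char *)q.slot[0], "hello circuit");
        q.tail = 1;                             /* producer enqueued one message */
        if (queue_pop(&q, msg))
            printf("popped: %s\n", msg);
        return 0;
    }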
11
Lessons Learned
Frame provides a simple OS
–Isolates users from the board
–Portability
Dynamically manage resources
–Card memory
–Computational circuits
PCI bottleneck
–Distance between NI and FPGA
–PCI difficult to work with
(Figure: the host services page faults by moving pages between host memory and card SRAM, and function faults such as "Use Circuit F on $C0000000" by loading the requested circuit onto the FPGA)
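To make the fault mechanism concrete, here is a hedged sketch of how the host side might service the two fault types; the message fields and the actions taken are illustrative assumptions, not the project's actual protocol.

    /* Host-side handling of "function fault" and "page fault" messages from
     * the FPGA frame.  Field names and actions are illustrative assumptions. */
    #include <stdint.h>
    #include <stdio.h>

    typedef enum { FUNCTION_FAULT, PAGE_FAULT } fault_kind_t;

    typedef struct {
        fault_kind_t kind;
        int          circuit_id;   /* e.g. "Circuit F"                         */
        uint32_t     card_addr;    /* e.g. 0xC0000000, where the operands live */
    } fault_msg_t;

    static void handle_fault(const fault_msg_t *f)
    {
        if (f->kind == FUNCTION_FAULT) {
            /* Requested circuit is not loaded: configure it, then resubmit. */
            printf("load circuit %d, then retry at 0x%08x\n",
                   f->circuit_id, (unsigned)f->card_addr);
        } else {
            /* Requested data page is not in card memory: stage it over PCI. */
            printf("copy page at 0x%08x into card SRAM\n",
                   (unsigned)f->card_addr);
        }
    }

    int main(void)
    {
        fault_msg_t m = { FUNCTION_FAULT, 6, 0xC0000000u };
        handle_fault(&m);
        return 0;
    }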
12
Network Features of Recent FPGAs
Where we’re going…
13
FPGA Network Improvements
Recent FPGAs have special built-in cores
–High-speed transceivers, dedicated processors
Idea: build our NI inside the FPGA
–FPGA becomes a networked compute resource
–Removes the PCI bottleneck
(Figure: an FPGA with on-chip NIs (Tx/Rx) and user-defined computational circuits attached directly to a system area network alongside CPU/NIC hosts)
14
Xilinx Virtex-II Pro FPGA
Up to 4 PowerPC 405 cores
–Embedded version of the PPC
–300–400 MHz
Multiple gigabit transceivers
–Run at 600 Mbps to 3.125 Gbps
–Up to twenty-four transceivers
Additional cores
–Distributed internal memory
–Arrays of 18b multipliers
–Digital clock multipliers, PLLs
(Figure: Xilinx V2P20)
15
Multi-Gigabit Transceivers: Rocket I/O
Flexible, high-speed transceivers
–Can be configured to connect with different physical layers
–InfiniBand, GigE, FC, 10GigE, Aurora
–Note: low-level interface (commas, disparity, clock mismatches); see the alignment sketch below
(Figure: Rocket I/O datapath between the FPGA fabric and the differential pins: Tx FIFO, CRC, and 8B/10B encoder feeding the serializer; deserializer, clock recovery, 8B/10B decoder, elastic buffer, and CRC check on receive)
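As an example of those low-level details, below is a small C model of comma-based word alignment: the receiver hunts for the K28.5 comma character to find 10-bit word boundaries. The real transceiver does this in hardware; the bit-ordering convention used here is a simplifying assumption.

    /* Toy model of 8B/10B comma detection for word alignment.  K28.5 codes
     * are written as 10-bit values abcdeifghj with 'a' as the MSB; serial
     * bit ordering on a real link is glossed over. */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    #define K28_5_RDNEG 0x0FA   /* 0011111010, running disparity negative */
    #define K28_5_RDPOS 0x305   /* 1100000101, running disparity positive */

    /* Slide a 10-bit window across a recovered bit stream and return the
     * bit offset of the first comma, or -1 if none is found. */
    static long find_comma(const uint8_t *bits, size_t nbits)
    {
        uint16_t window = 0;
        for (size_t i = 0; i < nbits; i++) {
            window = (uint16_t)(((window << 1) | (bits[i] & 1)) & 0x3FF);
            if (i >= 9 && (window == K28_5_RDNEG || window == K28_5_RDPOS))
                return (long)(i - 9);           /* comma begins here */
        }
        return -1;
    }

    int main(void)
    {
        /* A short test stream: five junk bits, then a K28.5 (RD-) comma. */
        uint8_t bits[] = { 1,0,1,1,0,  0,0,1,1,1,1,1,0,1,0 };
        printf("comma found at bit offset %ld\n",
               find_comma(bits, sizeof bits / sizeof bits[0]));
        return 0;
    }

Once the comma offset is known, the receiver can chop the stream into aligned 10-bit codes and hand them to the 8B/10B decoder.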
16
Why MGTs Are Important
Direct connection to networks
–Same chip, different network
–Removes PCI from the equation
Fast connections between FPGAs
–Reduces analog design issues
–Chain FPGAs together
–Reduces pin count
Update: Virtex-II Pro X
–Now 2.488 Gbps to 10.3125 Gbps
–Chips have either 8 or 20 transceivers
(Figure: 3.125 Gbps over 44” FR4; from Xilinx, http://www.xilinx.com/products/virtex2pro/mgtcharacter.htm)
17
Hard PowerPC Core
PowerPC 405
–16 KB instruction / 16 KB data caches
–Real and virtual memory modes
–GCC is available
Multiple memory ports for the core
–On-chip memory (OCM)
–Processor Local Bus (PLB)
User-defined memory map
–Connect memory blocks or cores (see the access sketch below)
–External memory cores available
(Figure: PowerPC core with I-cache and D-cache connected to the Processor Local Bus (PLB) and the on-chip memory (OCM) interface)
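A hedged sketch of what software on the embedded PowerPC might look like when a user core is mapped into its address space over the PLB; the base address, register layout, and bit definitions are made-up placeholders, and the fragment is meant for the embedded target rather than a workstation.

    /* Hypothetical memory-mapped interface to a user-defined core reachable
     * from the PowerPC 405 over the PLB.  Addresses and registers are
     * placeholders, not a real Xilinx core's map. */
    #include <stdint.h>

    #define USER_CORE_BASE  0xC0000000u       /* hypothetical PLB address */
    #define REG_CONTROL     (*(volatile uint32_t *)(USER_CORE_BASE + 0x0))
    #define REG_STATUS      (*(volatile uint32_t *)(USER_CORE_BASE + 0x4))
    #define REG_DATA_IN     (*(volatile uint32_t *)(USER_CORE_BASE + 0x8))
    #define REG_DATA_OUT    (*(volatile uint32_t *)(USER_CORE_BASE + 0xC))

    #define CTRL_START      0x1u
    #define STAT_DONE       0x1u

    uint32_t run_user_core(uint32_t operand)
    {
        REG_DATA_IN = operand;                /* write operand into the core */
        REG_CONTROL = CTRL_START;             /* kick off the computation    */
        while ((REG_STATUS & STAT_DONE) == 0) /* spin until the core is done */
            ;
        return REG_DATA_OUT;                  /* read back the result        */
    }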
18
System on a Chip (SoC)
Commercial SoC
–Designing with cores
–Customize the system
New tools
–Rapidly connect cores
–Library of cores & buses
–Save on wiring legwork
(Figure: Xilinx Platform Studio)
19
Current Status
Exploring V2P
–New architecture, new tools
Two reference boards
–ML300 (V2P7-6)
–Avnet (V2P20-6)
Transceiver work
–Raw transmission over fiber
–Working towards IB
http://cdulmer.ran.sandia.gov