Presentation is loading. Please wait.

Presentation is loading. Please wait.

Reconfigurable Computing Leveraging FPGA Accelerators in High-Performance Computing Applications Craig Ulmer June 2, 2005 Sandia is.

Similar presentations


Presentation on theme: "Reconfigurable Computing Leveraging FPGA Accelerators in High-Performance Computing Applications Craig Ulmer June 2, 2005 Sandia is."— Presentation transcript:

1 Reconfigurable Computing Leveraging FPGA Accelerators in High-Performance Computing Applications Craig Ulmer cdulmer@sandia.gov June 2, 2005 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. Craig Ulmer, Ryan Hilles, and David ThompsonSNL/CA Keith Underwood and K. Scott HemmertSNL/NM

2 What is Center 8900 doing with FPGAs? Computer Sciences & Information Technologies 8900 –High-Performance Computing –Viz & Scientific computing (8963) FPGA LDRD for accelerating HPC –NM:Computational aspects –CA:Communication aspects FPGA LDRD for network security –NM/CA: Smart routers Catalyst

3 Outline Reconfigurable Computing –Communication Aspects –Computational Aspects New Platforms: Cray XD1 Summary

4 In a nutshell.. Reconfigurable Computing –Use FPGAs as a means of accelerating simulations –Implement key computations in hardware: soft-hardware double doX( double *a, int n) { int i; double x; x=0; for(i=0;i<n;i+=3){ x+= a[i] * a[i+1] + a[i+2]; … } … return x; } * + + a[i]a[i+1] Z -1 a[i+2] SoftwareHardwareSynthesized Hardware FPGA CPU RC Platform

5 Trivial Example: Sorting Sort a matrix of 64b values Hardware implementation –Sorting element –Array of elements –Data transfer engine Exploits parallelism –Do n comparisons each clock –Requires 3n clock cycles Limited to n values –Use as building block Sorting Array Address Wr Req Rd Req DMA Engine Sorting Control Double-Buffered Outgoing Memory Reg > Sorting Element 11.532.994.188.7 79.434.539.254.2 24.354.14.235.1 14.83.019.791.4 17.581.39.431.3 94.188.732.911.5 79.454.239.234.5 54.135.124.34.2 91.419.714.83.0 81.331.317.59.4 unsortedsorted

6 Sorting Performance Xilinx Virtex II/Pro50-7 FPGA –128 elements –110 MHz Data transfer by FPGA –Read/write concurrency –Pinned memory 2-4x Rate of Quicksort(128) –Host is 2.2GHz AMD Opteron

7 RC: Communication Aspects

8 New FPGAs are “Network Ready” Multi-gigabit transceivers (MGTs) –V2Pro:twenty-four 3Gbps channels –V2ProX:twenty 10Gbps channels –InfiniBand, FC, GigE, 10GigE Multiple embedded processors –PowerPC 405 400MHz Direct connection to network –Build NI in hardware –Use PPC core for complex protocols –User-defined computational circuits Xilinx Virtex II/Pro-7 Network Interface PowerPC CPU User- Defined Circuits

9 TCP/GigE Network Interface for FPGAs Network interface (NI) in hardware –TCP Offload Engine(TOE) –Gigabit Ethernet(GigE) Refined TCP functionality –Slow start, NACK detection, Throttling… –User data rate to host: 600 Mb/s Outgoing TCP Message Control Rocket I/O TxRx Align Decode ARP Cache Framer MAC Header Control ARP Reply Ping ReplyIP Header Incoming Byte Stream Timeout Monitor CRC Outgoing Byte Stream Incoming TCP Message Control TOE GigE UnitSlicesV2P20 TOE2,06822% GigE1,10212%

10 Transport Framework for Visualization Need a framework for supporting Visualization in FPGAs –Want to perform viz tasks, not rendering –Built Chromium interface to transport OpenGL graphics Results –Functional system where visualization in FPGA, Rendering in Host –Saturate GigE link with OpenGL data Viz App GigE Network GigE TCP Offload Engine Chromium Packetizer FPGA

11 Network Intrusion Detection System (NIDS) Collaboration with Georgia Tech –GT:Pattern matching cores –SNL: GigE NI cores Network Intrusion Detection System –Examine packets & remove malicious –Based on Snort rule set (1,500 rules) Filter multiple GigE links transparently –Demonstrated for two links NI Malicious Packet Pattern Matching NI OKDrop Scheduler

12 RC: Computational Aspects

13 Floating Point Computations Floating point has been weakness for FPGAs –Complex to implement, consume many gates Keith Underwood and K. Scott Hemmert –Developed FP library at SNL/NM FP Library –Highly-optimized and compact –Single/Double Precision –Add, Multiply, Divide, Square Root –V2P50: 10-20 DP cores, 130-210 MHz

14 Example: FP Distance Computation Compute c=sqrt(a 2 + b 2 ) –Simple pipeline (88 stages) Implemented on Xilinx Virtex II/Pro 50 –159 MHz clock rate –39% of V2P50 –75 MiOperations/s Incoming SRAM Outgoing SRAM DMA Engine Incoming SRAM From Host To Host

15 New Platforms: Cray XD1

16 Cray XD1 Dense multiprocessor system –12 AMD Opterons on 6 blades –6 Xilinx Virtex-II/Pro FPGAs –InfiniBand-like interconnect –6 SATA hard drives –4 PCI-X slots –3U Rack

17 Cray XD1: Placing FPGAs Near Memory Blades use HyperTransport –High bandwidth I/O FPGA accelerator –Xilinx Virtex II/Pro 50 (-7) –HT connection: 1.6+1.6GB/s –10x performance of PCI First of a new breed –Commodity HT accelerators CPU Main Memory Main Memory NI SRAM Expansion Module SRAM FPGA NI 1.6 GB/s HT Processor Blade

18 Summary FPGAs are getting faster and less expensive –Finally have capacity for interesting problems –Million gate parts for $5 Large amount of reconfigurable computing work at SNL –Experience with Virtex II/Pro Platform FPGAs –Networks & embedded CPUs –Willing to share cores, work with others

19 Deliverables NetworkingVisualizationComputationsXD1General OpenGigE OpenTOE Raw GigE GigE Bridge NIDS Filter Isosurfacer Chromium Packetizer Sorting Array MD5 2D Distance DMA Engine Computation Framework Asynch. Message FIFOs Clock domain data pass PowerPC instantiation SNL/CA Cores 32b/64b Floating-PointComputations Add Multiply Divide Square root Vector/Matrix Multiply SNL/NM Cores cdulmer@sandia.gov http://cdulmer.ran.sandia.gov

20 FPGA Slices Relative V2P Price & Density V2P100 V2P70 V2P7 V2P40 Relative Price & Density


Download ppt "Reconfigurable Computing Leveraging FPGA Accelerators in High-Performance Computing Applications Craig Ulmer June 2, 2005 Sandia is."

Similar presentations


Ads by Google