Reconfigurable Computing: FPGAs for Ultrascale Science Sandia National Laboratories Keith Underwood SNL/NM Craig Ulmer SNL/CA SOS-8 Workshop.


1 Reconfigurable Computing: FPGAs for Ultrascale Science Sandia National Laboratories Keith Underwood SNL/NM Craig Ulmer SNL/CA cdulmer@sandia.gov SOS-8 Workshop April 14, 2004

2 Motivation: CPU Efficiency Trend While CPU performance has been increasing, processing efficiency has been decreasing. [Chart: efficiency (MFLOPS/MHz/Mtransistors) by processor]

3 Looking Ahead For commodity clusters, should we be nervous? –Significant increases in technology effort –Diminishing returns –Should we depend on CPU manufacturers for HPC? Sandia has many HPC interests –Investigate computing alternatives and accelerators –FPGAs: Modern Reconfigurable Computing

4 Outline Reconfigurable computing –Use FPGAs to accelerate computations Strategy and examples –Approaches to scientific computing Challenges for ultrascale science –Double-precision floating-point performance –System integration and network aspects

5 Reconfigurable Computing Background “Soft Hardware”

6 Computing Spectrum Software: General-Purpose CPU –Easily reprogrammed –Low cost –Fundamental bottlenecks Hardware: Application-Specific Integrated Circuit (ASIC) –Not modifiable –High cost –Extremely fast Soft-Hardware: Field Programmable Gate Arrays (FPGAs) –Reconfigurable hardware –Medium cost –Speedup potential [Figure: a CPU pipeline (fetch, decode, registers, execute, memory, writeback) shown alongside a hardwired dataflow circuit of adders, multipliers, and XORs]

7 Reconfigurable Hardware Devices Tile architecture –Logic blocks (LBs) –Routing elements Field-Programmable Gate Arrays –Fine granularity –LBs are bit-level operators Commercial trend –Coarse granularity –LBs are ALUs, FPUs –QuickSilver, Pact XPP, ClearSpeed Devices that can be programmed to emulate hardware circuitry

8 Common Acceleration Techniques Processing concurrency; hardware pipelines; custom memory interactions; partial evaluation. Key: Designing in Hardware [Figure labels: SRAM, Internal SRAM, A, B (0-15)]
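As a rough illustration of partial evaluation, the sketch below specializes a multiplier for a coefficient that is known ahead of time, so only shifts and adds remain. This is a software analogy, not a description of any design in the talk; the name `specialize_multiplier` and the example constant are hypothetical.

```python
def specialize_multiplier(constant):
    """Return a function computing x * constant with shifts and adds only.

    Software analogy for partial evaluation: the constant is known at
    "design time", so the general multiplier collapses into a fixed set
    of shifted copies of x that an adder tree would sum in hardware.
    """
    if constant < 0:
        raise ValueError("this sketch handles non-negative constants only")

    # Bit positions where the constant has a 1; fixed once, up front.
    shifts = [bit for bit in range(constant.bit_length())
              if (constant >> bit) & 1]

    def multiply(x):
        # One shifted copy of x per set bit of the constant.
        return sum(x << s for s in shifts)

    multiply.shifts = shifts  # exposed for inspection
    return multiply


times_10 = specialize_multiplier(10)
print(times_10.shifts)  # [1, 3], because 10 = 2 + 8
print(times_10(7))      # 70
```

In an FPGA the shifts cost only wiring, so a constant multiplier built this way needs one adder per extra set bit rather than a full multiplier array.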

9 Reconfigurable Computing for Ultrascale Science: HPC Strategy and Examples Enhancing HPC Performance

10 HPC Strategy at Sandia for RC RC resources work best as accelerators in HPC –Clusters are inexpensive & work well for many applications –Add RC devices to enhance performance Port key portions of algorithms to RC hardware –Focus on hotspots and inner loops –Move data to/from FPGAs in pipelined fashion
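Why hotspots and inner loops matter can be sketched with Amdahl's law: the overall gain from an accelerator is capped by the fraction of run time that is actually offloaded. The fractions and speedups below are hypothetical illustration values, not measurements from the talk, and the model ignores data-transfer overhead.

```python
def overall_speedup(hotspot_fraction, hotspot_speedup):
    """Amdahl's law: whole-application speedup when only a hotspot is accelerated.

    hotspot_fraction -- share of original run time spent in the offloaded code (0..1)
    hotspot_speedup  -- how much faster that code runs on the accelerator (> 0)
    """
    if not 0.0 <= hotspot_fraction <= 1.0:
        raise ValueError("hotspot_fraction must be between 0 and 1")
    if hotspot_speedup <= 0:
        raise ValueError("hotspot_speedup must be positive")
    remaining = 1.0 - hotspot_fraction            # code left on the CPU
    accelerated = hotspot_fraction / hotspot_speedup
    return 1.0 / (remaining + accelerated)


# Hypothetical cases: the same 50x accelerator applied to different hotspots.
for fraction in (0.50, 0.90, 0.99):
    print(f"{fraction:.0%} offloaded -> {overall_speedup(fraction, 50.0):.2f}x overall")
```

Under this model a 50x accelerator applied to half of the run time gives less than 2x overall, which is the reason to port the inner loops that dominate execution.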

11 Scientific Computing Examples Pattern recognition –ATLAS project at CERN –Reduced 2500 CPUs to 120 nodes with FPGAs Visualization –Vizard II project at University of Tübingen –Direct volume rendering for 512³ datasets Molecular dynamics (MD) –Preliminary work at Los Alamos National Laboratory –20 Cells in an FPGA yields 5.69 GFLOPS Computational fluid dynamics (CFD) analysis for jet engines –Smith and Schnore at GE Global Research

| Inner Loop Function | FLOPS | P4 1.8GHz Host | Multi-FPGA System |
|---|---|---|---|
| Euler | 165 | 154 MFLOPS | 10.2 GFLOPS |
| Viscous | 619 | 77 MFLOPS | 23.2 GFLOPS |
| Smoothing | 249 | 86 MFLOPS | 7.0 GFLOPS |

12 Challenges Hard to program –Hardware design –Must be significant parallelism Limited chip capacity Lack of HPC building blocks –Our users need DP-FP System integration –How do we add to our clusters? [Slide labels: Craig Ulmer SNL/CA; Keith Underwood SNL/NM; LANL, Academia; Industry]

13 Reconfigurable Computing for Ultrascale Science: Double-Precision Floating-Point Cores Addressing the need for HPC building blocks

14 Double-Precision Floating-Point Cores Floating point has been a historical weakness for FPGAs –FP cores consume significant amounts of hardware –Previous FPGAs lacked capacity Significant improvements in recent commercial FPGAs –Increased capacity, faster clocks, and better building blocks Keith Underwood at SNL/NM –Re-evaluating FP performance in FPGAs –Constructing high-speed DP-FP cores

15 Peak Performance Results

| Core | Single Precision: Speed | Single Precision: Cores per V2P100-6 | Single Precision: Peak Performance | Double Precision: Speed | Double Precision: Cores per V2P100-6 | Double Precision: Peak Performance |
|---|---|---|---|---|---|---|
| Addition | 195 MHz | 89 | 17 GFLOPS | 143 MHz | 40 | 5.7 GFLOPS |
| Multiplication | 176 MHz | 74 | 13 GFLOPS | 142 MHz | 27 | 3.8 GFLOPS |
| Division | 120 MHz | 22 | 2.6 GFLOPS | 98 MHz | 6 | 0.58 GFLOPS |

From Underwood, “FPGAs vs. CPUs: Trends in Peak Floating-Point Performance,” in FPGA’04
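The peak figures on this slide are consistent with a simple product of clock rate and core count. The sketch below reproduces them under the assumption that every pipelined core completes one result per clock cycle; that assumption and the name `peak_gflops` are ours, not stated on the slide.

```python
# (clock in MHz, cores per V2P100-6), copied from the slide.
CORES = {
    ("addition", "single"): (195, 89),
    ("addition", "double"): (143, 40),
    ("multiplication", "single"): (176, 74),
    ("multiplication", "double"): (142, 27),
    ("division", "single"): (120, 22),
    ("division", "double"): (98, 6),
}


def peak_gflops(clock_mhz, cores):
    """Peak rate in GFLOPS, assuming one result per core per clock cycle."""
    return clock_mhz * 1e6 * cores / 1e9


for (operation, precision), (clock_mhz, cores) in CORES.items():
    rate = peak_gflops(clock_mhz, cores)
    print(f"{operation:<15}{precision:<8}{rate:7.3f} GFLOPS")
```

The products agree with the slide to within rounding; for example, 98 MHz across 6 double-precision dividers gives 0.588 GFLOPS, which the slide reports as 0.58.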

16 Double-Precision Multiply Performance Trends

17 Reconfigurable Computing for Ultrascale Science: Networking Aspects Addressing capacity and system integration issues

18 Data Exchange: Multi-Gigabit Transceivers (MGTs) How do we rapidly move data into/out of FPGA? Xilinx Virtex-II/Pro FPGA has MGTs –Channel data rates: 3.125 Gbps –Up to 24 channels –V2/ProX: twenty 10Gbps channels Configured for different physical layers –InfiniBand, FC, GigE, 10GigE –S-ATA, PCI-Express, HT [Figure: FPGA fabric connected through Rocket I/O MGTs to device pins]
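A back-of-the-envelope view of what these channel counts mean in aggregate is sketched below. The raw figures come from the slide; the 8b/10b line-coding efficiency of 0.8 is an assumption added here to estimate payload bandwidth and does not apply to every physical layer listed.

```python
def aggregate_gbps(channels, rate_gbps, encoding_efficiency=1.0):
    """Aggregate one-direction bandwidth across all MGT channels, in Gbps.

    encoding_efficiency scales the raw line rate down to payload rate,
    e.g. 0.8 for 8b/10b line coding (an assumption, not from the slide).
    """
    if channels < 0 or rate_gbps < 0:
        raise ValueError("channels and rate_gbps must be non-negative")
    if not 0.0 < encoding_efficiency <= 1.0:
        raise ValueError("encoding_efficiency must be in (0, 1]")
    return channels * rate_gbps * encoding_efficiency


raw_v2pro = aggregate_gbps(24, 3.125)            # 24 channels at 3.125 Gbps
payload_v2pro = aggregate_gbps(24, 3.125, 0.8)   # same, assuming 8b/10b coding
raw_v2prox = aggregate_gbps(20, 10.0)            # twenty 10 Gbps channels

print(f"Virtex-II Pro raw:       {raw_v2pro:.1f} Gbps")
print(f"Virtex-II Pro payload:   {payload_v2pro:.1f} Gbps (8b/10b assumed)")
print(f"Virtex-II Pro X raw:     {raw_v2prox:.1f} Gbps")
```

These are per-direction totals for a fully populated part; a real design would usually dedicate only some channels to any one link.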

19 Importance of MGTs Increase Raw Capacity: Connect FPGAs together –MGTs provide fat pipes –Cables, not PCB traces System Integration: Connect FPGA to SAN –Implement NI in FPGA –FPGA is global resource [Figures: several FPGAs with computational circuits linked by MGT channels; an FPGA whose on-chip NI (Tx/Rx) attaches to a system area network alongside CPU/NIC nodes]

20 Recent Sandia Work: SNL OpenTOE At Sandia we are interested in connecting FPGAs to SANs –Main target: InfiniBand –Must implement network protocols for reliable transfer Initial work: GigE and TCP –Implemented GigE core and basic TCP offload engine [Figure: SNL OpenTOE NI inside the FPGA, with MGT Tx/Rx, GigE IP core, and TCP core feeding the computational circuits]
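One task a TCP offload engine typically takes over from the host is the per-segment checksum. The sketch below shows the standard Internet checksum (RFC 1071) that TCP uses; it is a generic software illustration, not the SNL OpenTOE implementation, and a full TCP checksum would also cover a pseudo-header that is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum (RFC 1071) of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte

    total = 0
    for i in range(0, len(data), 2):
        # Treat each byte pair as one big-endian 16-bit word.
        total += (data[i] << 8) | data[i + 1]

    # Fold carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    return ~total & 0xFFFF


# Worked example from RFC 1071: the words sum to 0xDDF2, checksum is 0x220D.
payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(hex(checksum))  # 0x220d

# A receiver recomputes over payload plus checksum and expects zero.
print(internet_checksum(payload + checksum.to_bytes(2, "big")))  # 0
```

The loop is a chain of 16-bit additions with end-around carry, which maps naturally onto a pipelined adder that processes data as it streams through the network interface.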

21 Concluding Remarks Improvements in commercial FPGAs make RC attractive –FPGAs provide better sustained performance than CPUs –FPGA performance growing faster than Moore’s Law Near-term strategy: accelerator-based approach –Offload key operations into hardware Sandia National Labs investigating RC for HPC acceleration –Enabling scientific computing through fast DP FP cores –Addressing system integration/capacity issues via network

