Engineering Applications on NASA’s FPGA*-based Hypercomputers
By Olaf.O.Storaasli@nasa.gov, Analytical & Computational Methods Branch, NASA Langley Research Center, Hampton, Virginia
7th Military Aerospace Programmable Logic Device (MAPLD) International Conference, Reagan Center, Washington DC, September 10, 2004
NOTES: find out what Rutishauser’s research is for summer ’03 slides; confirm GFLOP numbers
*Field-Programmable Gate Array
Contents
Background: Hardware, “Gateware”
Current: Algorithms
Applications: CPU-FPGA, FPGA
Future: “New” Spacecraft Hypercomputer 2
NASA Reconfigurable Hypercomputers [Figure labels: 6M gates/FPGA, 62K gates/FPGA; ‘02, ‘04]
Speaker notes: Good afternoon. Happy to be here and share some exciting results about a new computing paradigm. Our goal is to harness FPGAs (image processing, networking) for scientific… Our team has grown from OOS & RCS to include 6 NASA + 8 students. We’ve partnered with SBS (SAA), and collaborate with many (NSA..). 3
Computing Faster Without CPUs
GOAL: Explore Engineering Applications on NASA’s FPGA-based Hypercomputers
TEAM: Drs. Olaf Storaasli, Jarek Sobieski & Robert Singleterry, Dave Rutishauser, Joe Rehder, Garry Qualls, Robert Lewis
Students: MIT, Harvard, VT, Brown, UVA, JPMorgan, Case, Pitt, Governor’s School
PARTNERS: Starbridge Systems (FPGA H/W + VIVA S/W), NSA, USAF, MSFC, AlphaStar 4
VIVA: Custom Chip Design
What: Graphically code FPGAs (drag & drop vs. text) [Figure: VIVA Menu]
Traditional code: 1D, parallelism esoteric: do i = 1, 1000 / C = A + B / end do
VIVA gateware: 3D, parallelism natural: + +…+
How: Converts icons and transports to FPGA circuit
Why: Near-ASIC speed (w/o chip design $$$)
Corelib: Pre-built objects & examples
Data: Any type, size, precision (not fixed)
More: System Description ports to any H/W, “write once, run anywhere” 5
FPGA Use
CPU + FPGA Accelerator: Exploit local parallelism. Max {kernel ops/cycle}. C/FORTRAN calls VIVA kernel. Limit: FPGA gates + Amdahl’s Law. [Diagram: CPU <=> Call FPGA kernel Ax=b]
Example, NASA GPS: 28k lines FORTRAN; 50-line kernel takes 95% of CPU time; move kernel to FPGA.
Replace CPUs: Exploit parallelism fully. Max {ops/cycle} => fill FPGA. VIVA/VHDL/Verilog code. Limit: FPGA(s) gates.
Cray XD1: Opterons + Xilinx FPGAs 6
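The Amdahl's Law limit on the CPU + FPGA accelerator path can be put in numbers. The sketch below is illustrative only: the function name and the sample kernel speedups are assumptions, and only the 95% kernel fraction comes from the slide.

```python
def amdahl_speedup(kernel_fraction, kernel_speedup):
    """Overall speedup when only `kernel_fraction` of the run time is accelerated.

    kernel_fraction: share of CPU time spent in the kernel (0.95 on the slide).
    kernel_speedup:  how much faster the FPGA runs that kernel (hypothetical values).
    """
    if not 0.0 <= kernel_fraction < 1.0:
        raise ValueError("kernel_fraction must be in [0, 1)")
    remaining_cpu_time = 1.0 - kernel_fraction          # part that stays on the CPU
    accelerated_time = kernel_fraction / kernel_speedup  # part moved to the FPGA
    return 1.0 / (remaining_cpu_time + accelerated_time)


# Hypothetical FPGA kernel speedups; the last one models an "infinitely fast" kernel.
for s in (10.0, 100.0, float("inf")):
    print(f"kernel speedup {s}: overall {amdahl_speedup(0.95, s):.2f}x")
```

With 95% of the time in the kernel, the overall gain can never exceed 20x however fast the FPGA kernel is, which is why the slide lists Amdahl's Law as a limit for the accelerator approach but not for full CPU replacement.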
GENOA-GPS* “Port”
GENOA Analysis/Design (AlphaStar): Progressive Failure, Reliability, Durability, Manufacturing, Virtual Test, Life prediction. Calls GPS.
GPS Matrix Equation Solver (NASA): Structural, EM, acoustic analysis + design. Most computations in 50-line kernel. Kernel coded: VIVA-GPS.
VIVA 2.4 => large applications ongoing (NASA-AlphaStar-Starbridge)
Shuttle re-entry wing damage analysis time: 660 hours => minutes (goal)
[Figure: Finite Element Model]
*‘99 NASA Software-of-the-Year 7
Columbia Burn-thru Analysis [Figure: Leading Edge FEM; Leading Edge Panels 6, 7, 8; 38 in. Timeline: Insulation Fracture 230 sec, Spar Fracture 500 sec, RCC-Tseal Fracture 503 sec] 8
Maximize Performance via Parallelism
Adds/FPGA: 16, 32, 128, 256, 512, 640
% FPGA used: 1, 2, 8, 41, 51
10⁹ Ops: 4, 34, 77, 154, 192
1000+ adds/clock cycle => 10¹¹ Ops/sec (1 add/cycle on CPUs) 9
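The step from adds per clock cycle to ops per second is a single multiplication by the clock rate. The slides do not state the FPGA clock, so the sketch below backs it out from the Summary slide's measured figure (640 ops/cycle, 2x10¹¹ ops/sec). The function names and the resulting clock value are illustrative assumptions, not reported numbers.

```python
def ops_per_second(adds_per_cycle, clock_hz):
    """Throughput when every adder produces one result per clock cycle."""
    return adds_per_cycle * clock_hz


def implied_clock_hz(ops_per_sec, adds_per_cycle):
    """Clock rate implied by a throughput figure (assumes one add per adder per cycle)."""
    return ops_per_sec / adds_per_cycle


# Figures quoted on the Summary slide: 640 ops/cycle, 2x10^11 ops/sec.
clock = implied_clock_hz(2e11, 640)
print(f"implied clock: {clock / 1e6:.1f} MHz")

# At that same assumed clock, a processor doing 1 add/cycle is 640 times slower.
print(f"1 add/cycle at that clock: {ops_per_second(1, clock):.3g} ops/sec")
```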
Memory: FPGA & SDRAM. Keep “action” on/near FPGA. SDRAM: 2-8 GB (large applications). RAM: 144 x 2 KB blocks. 10
File I/O. FileIn/FileOut in Corelib. Transfers 2 KB blocks (Disk <=> FPGA RAM). User can access FPGA RAM 4 bytes at a time. 11
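The two granularities on this slide (2 KB transfers, 4-byte accesses) can be mimicked on a host in a few lines. This is a software analogy only, not the Corelib FileIn/FileOut interface. The function name, the in-memory stream and the little-endian unsigned word layout are assumptions.

```python
import io
import struct

BLOCK_BYTES = 2048  # transfer unit named on the slide
WORD_BYTES = 4      # access granularity named on the slide


def read_words(stream):
    """Yield 32-bit words from `stream`, fetching one 2 KB block at a time.

    Assumes unsigned little-endian words; trailing bytes that do not fill
    a whole word are ignored.
    """
    while True:
        block = stream.read(BLOCK_BYTES)
        if not block:
            break
        usable = len(block) - len(block) % WORD_BYTES
        for (word,) in struct.iter_unpack("<I", block[:usable]):
            yield word


# Build a 4 KB in-memory "file" (two blocks) and read it back word by word.
payload = struct.pack("<1024I", *range(1024))
words = list(read_words(io.BytesIO(payload)))
print(len(payload) // BLOCK_BYTES, "blocks,", len(words), "words")
```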
Add Files in Parallel. Read 2 files => Store in FPGA RAM => + files => Write result. [Diagram: two Read (R) => Store (S) paths feeding an adder (+), then Write (W)] 12
Parallel Adds Faster (same file size; CPUs: 1 add). [Chart: time in cycles (0-100) vs. number of FPGA adders used (2-28), for 4KB, 8KB and 16KB files, each with a logarithmic trend line; labeled values: 92, 46, 23] 13
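The read, store, add, write pipeline and the effect of adding more adders can be modeled in software. The sketch below is an idealized cycle count that ignores file I/O and memory latency, so it will not reproduce the chart's measured values. The function name and the one-slice-per-cycle model are assumptions.

```python
def parallel_add(a, b, adders):
    """Add two equal-length word lists with `adders` lanes working side by side.

    Returns (sums, cycles), where each cycle handles one slice of up to
    `adders` word pairs, as a bank of hardware adders would.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if adders < 1:
        raise ValueError("need at least one adder")
    sums = []
    cycles = 0
    for start in range(0, len(a), adders):
        stop = start + adders
        # On hardware, every pair in this slice is added in the same clock cycle.
        sums.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
        cycles += 1
    return sums, cycles


file_a = list(range(1024))         # stand-ins for the two files held in FPGA RAM
file_b = [2 * w for w in file_a]
for n in (1, 2, 4, 8, 16):
    _, cycles = parallel_add(file_a, file_b, n)
    print(f"{n:2d} adders: {cycles} cycles")
```

In this idealized model, doubling the number of adders halves the cycle count for a fixed file size.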
Algorithms Developed
Matrix Algebra: {V}, [M], {V}ᵀ{V}, [M]x[M], GCD, …
n! => Probability: Combinations/Permutations
Cordic => Transcendentals: sin, log, exp, cosh, …
∂y/∂x & ∫f(x)dx => Runge-Kutta: CFD; Newmark Beta: CSM
Matrix Equation Solvers: [A]{x} = {b}, Gauss & Jacobi
Dynamic Analysis: [M]{ü} + [C]{u̇} + [K]{u} + NL = {P(t)}
Analog Computing: digital accuracy
Nonlinear Analysis: reduces NL time
Structural Design & Optimization
NLT - non-linear terms 14
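CORDIC suits FPGAs because each iteration needs only shifts, adds and a small table lookup. The sketch below shows the rotation-mode idea for sine and cosine in floating point. It is a generic textbook formulation, not the VIVA implementation, and the function name and iteration count are assumptions. A hardware version would use fixed-point values and bit shifts in place of the multiplications by 2⁻ⁱ.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Return (sin(theta), cos(theta)) by CORDIC rotation.

    Valid for |theta| up to about 1.74 rad, the sum of the rotation angles.
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]  # lookup table
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))  # cancels the growth of each micro-rotation

    x, y, z = gain, 0.0, theta  # start on the x-axis, pre-scaled by the gain
    for i in range(iterations):
        direction = 1.0 if z >= 0.0 else -1.0      # rotate toward the remaining angle
        shift = 2.0 ** -i                          # a right shift by i bits in hardware
        x, y = x - direction * y * shift, y + direction * x * shift
        z -= direction * angles[i]
    return y, x


s, c = cordic_sin_cos(0.5)
print(s, c)
```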
Applications: VIVA Code. Jacobi Matrix Solver, Gauss Matrix Solver, Runge-Kutta, Cellular Automata. 15
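Of the solvers listed, Jacobi iteration maps especially well onto parallel hardware because every unknown is updated from the previous iterate only, so all updates in a sweep are independent. The sketch below is a generic Jacobi iteration, not the VIVA code. The function name, tolerance and the diagonally dominant test system are assumptions made for illustration.

```python
def jacobi(a, b, tol=1e-12, max_iter=500):
    """Solve a x = b by Jacobi iteration (expects a diagonally dominant matrix)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        # Each component uses only the previous iterate, so on an FPGA
        # all n updates could be computed at the same time.
        new_x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        if max(abs(p - q) for p, q in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("Jacobi iteration did not converge")


# Hypothetical test system whose exact solution is x = [1, 2, 3].
A = [[10.0, 1.0, 1.0],
     [2.0, 10.0, 1.0],
     [1.0, 2.0, 10.0]]
rhs = [15.0, 25.0, 35.0]
print(jacobi(A, rhs))
```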
Gauss-Jordan A x = B Solver
• VIVA code solves n equations. Ex:
x0 + x1 + x2 = 0
x0 - 2x1 + 2x2 = 4
x0 + 2x1 - x2 = 2
=> x0 = 4, x1 = -2, x2 = -2
• Run on hypercomputer emulator, then FPGA 16
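The slide's 3-equation example can be checked with a few lines of ordinary code. The sketch below is a plain Gauss-Jordan elimination with partial pivoting. It is a host-side illustration, not the VIVA gateware, and the function name and singularity threshold are assumptions.

```python
def gauss_jordan(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]  # normalize the pivot row
        for r in range(n):
            if r != col:
                factor = m[r][col]
                # Clear this column in every other row, above and below the pivot.
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [row[n] for row in m]


# The system from the slide.
A = [[1, 1, 1],
     [1, -2, 2],
     [1, 2, -1]]
B = [0, 4, 2]
print(gauss_jordan(A, B))
```

Substituting x0 = 4, x1 = -2, x2 = -2 back into the three equations gives 0, 4 and 2, matching the slide.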
Spring-Mass Solver. Method: 4-stage Runge-Kutta. [Diagram label: f] 17
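A 4-stage Runge-Kutta step for a spring-mass system is short enough to show in full. The slide does not give the model details, so the sketch below assumes an undamped, unforced system m·x'' + k·x = 0 with hypothetical unit mass and stiffness. All function names are illustrative.

```python
import math


def rk4_step(f, t, y, dt):
    """Advance the state list y by one classical 4-stage Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * k for yi, k in zip(y, k3)])
    return [
        yi + dt / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def spring_mass_rhs(mass, stiffness):
    """Right-hand side for the state [x, v] of m x'' + k x = 0 (assumed model)."""
    def f(t, y):
        x, v = y
        return [v, -stiffness / mass * x]
    return f


def simulate(x0, v0, mass, stiffness, dt, steps):
    """Integrate from t = 0 and return (t, [x, v]) after `steps` steps."""
    f = spring_mass_rhs(mass, stiffness)
    t, y = 0.0, [x0, v0]
    for _ in range(steps):
        y = rk4_step(f, t, y, dt)
        t += dt
    return t, y


# Released from x = 1 at rest, the exact solution is x(t) = cos(t) when m = k = 1.
t_end, (x_end, v_end) = simulate(1.0, 0.0, 1.0, 1.0, 0.01, 628)
print(t_end, x_end, math.cos(t_end))
```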
Cellular Automata
• Parallel: Stephen Wolfram, A New Kind of Science
• Complexity via simple interactions w/o PDEs
• CFD => Structures
• Cell-neighbor interactions; simple compute/cell
[Figure labels: d, P; panels: FEA solution, Cellular Automata solution] 18
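The "simple compute per cell" idea is easiest to see in a one-dimensional elementary automaton of the kind Wolfram catalogs. The sketch below is that generic form, not the structural cellular-automata model behind the slide's figure. The function name, the periodic boundary and the choice of rule 90 are assumptions.

```python
def step(cells, rule):
    """Apply one update of an elementary cellular automaton.

    cells: list of 0/1 values, with periodic (wrap-around) boundaries.
    rule:  Wolfram rule number, 0-255.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right  # neighborhood as a 3-bit number
        nxt.append((rule >> index) & 1)              # look up the new state in the rule
    return nxt


# Rule 90 from a single live cell. Every cell depends only on its neighbors,
# so a whole row could be updated in a single clock cycle on an FPGA.
row = [0, 0, 0, 1, 0, 0, 0]
for _ in range(3):
    print("".join("#" if c else "." for c in row))
    row = step(row, 90)
```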
Cantilever Beam Optimization
Constants: L = 24”, W = 3”, P = 20 lbs, density = 0.097 lbs/in³
Constraint: allowed stress = 40K lbs/in²
Find thickness, d, to minimize beam weight [equations not preserved] 19
VIVA FPGA Code Minimizes Beam Weight. d chosen 1023 times. VIVA results: d = 0.156” (0.155 exact); minimum weight = 1.09 lbs (1.082 exact). 20
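The beam result can be cross-checked by a brute-force search over candidate thicknesses. The slide's own equations are not preserved, so the sketch below assumes the standard bending stress for a tip-loaded rectangular cantilever, 6PL/(W·d²), and weight = density·L·W·d. Those formulas reproduce the quoted exact values (0.155 in, 1.082 lbs). The candidate grid of 1023 values in 0.001-inch steps is a hypothetical choice and not necessarily the grid VIVA used.

```python
import math

LENGTH_IN = 24.0           # L
WIDTH_IN = 3.0             # W
LOAD_LB = 20.0             # P
DENSITY_LB_IN3 = 0.097
ALLOWED_STRESS = 40000.0   # lbs/in^2


def bending_stress(d):
    """Maximum stress at the root of a tip-loaded rectangular cantilever (assumed formula)."""
    return 6.0 * LOAD_LB * LENGTH_IN / (WIDTH_IN * d ** 2)


def weight(d):
    """Beam weight for thickness d (assumed formula)."""
    return DENSITY_LB_IN3 * LENGTH_IN * WIDTH_IN * d


def exact_thickness():
    """Thickness at which the stress constraint is exactly active."""
    return math.sqrt(6.0 * LOAD_LB * LENGTH_IN / (WIDTH_IN * ALLOWED_STRESS))


def lightest_feasible(candidates):
    """Return the candidate of least weight that satisfies the stress limit."""
    feasible = [d for d in candidates if bending_stress(d) <= ALLOWED_STRESS]
    return min(feasible, key=weight)


# Hypothetical grid: 1023 thicknesses from 0.001 in to 1.023 in.
grid = [k / 1000.0 for k in range(1, 1024)]
best = lightest_feasible(grid)
print(f"grid search: d = {best:.3f} in, weight = {weight(best):.4f} lbs")
print(f"closed form: d = {exact_thickness():.4f} in, weight = {weight(exact_thickness()):.4f} lbs")
```

Each candidate is evaluated independently of the others, which is what makes this kind of search a natural fit for filling an FPGA with parallel evaluators.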
“a bold new course into the cosmos” Reconfigurable Scalable Computing (RSC) for Space Applications - $14.8M 21
Spirit & Opportunity Rovers. 6 Radiation-tolerant FPGAs: 1M gates @ 100kRads. Next: 6M gates @ 200kRads. 22
What: Reconfigurable Scalable Computing (RSC) for Space Applications
Who: Langley, Goddard, NSA, Starbridge, Jefferson Lab, ASRC, Queensland
When: 4 years (FY ‘05-’08)
How: $14.8M
Goal: Effective, affordable processing for moon & Mars missions
Plan: Design, implement, demonstrate RSC for space applications
Hardware: Stacked scalable FPGAs
Gateware: Conventional (MPI/Linux) + Special (VIVA)
More: 23
Summary
Hardware: Exploiting advanced FPGA-based systems
FPGAs: Rapid growth, inherently parallel, flexible, efficient
VIVA: Powerful & growing (tailored to NASA needs)
Applications: Many engineering algorithms (VIVA => FPGAs); GPS-VIVA => CPU+FPGA accelerator
Speed: 640 ops/cycle (2x10¹¹ ops/sec) measured
Future: Reconfigurable Scalable Computing for Space 24
The End 25