Developing HPC Scientific and Engineering Applications: From the Laptop to the Grid

Gabrielle Allen, Tom Goodale, Thomas Radke, Ed Seidel
Max Planck Institute for Gravitational Physics, Germany

John Shalf
Lawrence Berkeley Laboratory, USA

These slides:
Outline for the Day
- Introduction (Ed Seidel, 30 min)
- Issues for HPC (John Shalf, 60 min)
- Cactus Code (Gabrielle Allen, 90 min)
- Demo: Cactus, IO and Viz (John, 15 min)
- Lunch
- Introduction to Grid Computing (Ed, 15 min)
- Grid Scenarios for Applications (Ed, 60 min)
- Demo: Grid Tools (John, 15 min)
- Developing Grid Applications Today (Tom Goodale, 60 min)
- Conclusions (Ed, 5 min)
Introduction
Outline
- Review of application domains requiring HPC
- Access and availability of computing resources
- Requirements from end users
- Requirements from application developers
- The future of HPC
What Do We Want to Achieve?
- Overview of HPC applications and techniques
- Strategies for developing HPC applications to be:
  - Portable: from laptop to Grid
  - Future proof
  - Grid ready
- Introduce frameworks for HPC application development
- Introduce the Grid: What is/isn’t it? What will it be?
- Grid toolkits: how to prepare/develop apps for the Grid, today and tomorrow
- What are we NOT doing?
  - Application-specific algorithms
  - Parallel programming
  - Optimizing Fortran, etc.
Who Uses HPC?
- Scientists and engineers
  - Simulating nature: black hole collisions, hurricanes, groundwater flow
  - Modeling processes: space shuttle entering the atmosphere
  - Analyzing data: lots of it!
- Financial markets
  - Modeling currencies
- Industry
  - Airlines, insurance companies
  - Transactions, data, etc.
- All face similar problems
  - Computational needs not met
  - Remote facilities
  - Heterogeneous and changing systems
- We now look at three types: high-capacity, high-throughput, and large-data computing
High Capacity Computing: Want to Compute What Happens in Nature!
[Figure labels: Perturbative; Numerical Relativity; Teraflop Computation, AMR, Elliptic-Hyperbolic, ???]
Computation Needs: 3D Numerical Relativity
- Get physicists + CS people together
- Find resource (TByte, TFlop crucial)
- Initial data: 4 coupled nonlinear elliptic equations
- Choose gauge (elliptic/hyperbolic…)
- Evolution: “hyperbolic” evolution coupled with elliptic equations
- Find resource ….
- Analysis: interpret, find AH, etc.
[Figure timeline labels: t=0, t=100]
Any Such Computation Requires Incredible Mix of Varied Technologies and Expertise!
- Many scientific/engineering components
  - Physics, astrophysics, CFD, engineering, ...
- Many numerical algorithm components
  - Finite difference methods? Finite volume? Finite elements?
  - Elliptic equations: multigrid, Krylov subspace, preconditioners, ...
  - Mesh refinement?
- Many different computational components
  - Parallelism (HPF, MPI, PVM, ???)
  - Architecture efficiency (MPP, DSM, vector, PC clusters, ???)
  - I/O bottlenecks (gigabytes generated per simulation, checkpointing…)
  - Visualization of all that comes out!
- Scientists and engineers want to focus on the top of this list, but all of it is required for results...
- Such work cuts across many disciplines and areas of CS…
- And now do it on a Grid??!!
How to Achieve This?
High Throughput Computing: Task Farming
- Running hundreds to millions (or more) of jobs as quickly as possible
- Collecting statistics, doing ensemble calculations, surveying a large parameter space, etc.
- Typical characteristics
  - Many small, independent jobs: must be managed!
  - Usually not much data transfer
  - Sometimes jobs can be moved from site to site
- Example problems: climatemodeling.com, NUG30
- Example solutions: Condor, SC02 demos, etc.
- Later: examples that combine “capacity” and “throughput”
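The task-farming pattern described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: `run_job` and `task_farm` are hypothetical names, the "job" is a trivial stand-in for a real simulation, and a thread pool on one machine stands in for a real job manager such as Condor.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(param: float) -> float:
    """Hypothetical stand-in for one small, independent job in a parameter sweep."""
    return param * param


def task_farm(params, max_workers=4):
    """Hand each parameter to a pool of workers and collect results in input order.

    The jobs share no data, so they can run in any order on any worker.
    ProcessPoolExecutor offers the same map() interface for CPU-bound jobs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, params))


if __name__ == "__main__":
    sweep = [0.1 * i for i in range(100)]  # assumed parameter space
    results = task_farm(sweep)
    print(f"completed {len(results)} jobs")
```

Because the jobs are independent and exchange almost no data, the farm only has to manage submission and collection, which is why such workloads can often be moved between sites.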
Large Data Computing
- Data: more and more the “killer app” for the Grid
  - Data mining:
    - Looking for patterns in huge databases distributed over the world
    - E.g. genome analysis
  - Data analysis:
    - Large astronomical observatories
    - Particle physics experiments
    - Huge amounts of data from different locations to be correlated and studied
  - Data generation
    - As resources grow, huge simulations will each generate TB-PB to be studied
- Visualization
  - How to visualize such large data: here, at a distance, distributed
- Soon: dynamic combinations of all types of computing and data, on Grids
- Our goal is to give strategies for dealing with all types of computing
Grand Challenge Collaborations
Going Large Scale: Needs Dwarf Capabilities

Examples of the Future of Science & Engineering
- Require large-scale simulations, beyond the reach of any machine
- Require large geo-distributed cross-disciplinary collaborations
- Require Grid technologies, but not yet using them!
- Both apps and Grids dynamic…

NSF Black Hole Grand Challenge
- 8 US institutions, 5 years
- Solve problem of colliding BH (try…)

EU Network Astrophysics
- 10 EU institutions, 3 years, €1.5M
- Continue these problems
- Entire community becoming Grid enabled

NASA Neutron Star Grand Challenge
- 5 US institutions
- Solve problem of colliding neutron stars (try…)
Growth of Computing Resources (from Dongarra)
Not Just Growth, Proliferation
- Systems getting larger by 2-3-4x per year!
  - Moore’s law (processor performance doubles roughly every 18 months)
  - Increasing parallelism: add more and more processors
- More systems
  - Many more organizations recognizing need for HPC
    - Universities
    - Labs
    - Industry
    - Business
- New kind of parallelism: Grid
  - Harness these machines, which themselves are growing
  - Machines all different! Be prepared for the next thing…
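A rough check of the figures on this slide, assuming clean exponential growth: doubling every 18 months corresponds to about 1.6x per year, so systems growing 2-4x per year must get the rest from added processors. The function names are hypothetical, and treating system growth as the product of per-processor growth and processor-count growth is a simplifying assumption, not a claim from the slides.

```python
def annual_factor(doubling_months: float) -> float:
    """Yearly growth factor implied by doubling every `doubling_months` months."""
    return 2.0 ** (12.0 / doubling_months)


def parallelism_factor(system_growth_per_year: float,
                       doubling_months: float = 18.0) -> float:
    """Growth that must come from adding processors, assuming
    system growth = per-processor growth * processor-count growth."""
    return system_growth_per_year / annual_factor(doubling_months)


if __name__ == "__main__":
    print(f"per-processor growth per year: {annual_factor(18.0):.2f}x")
    for growth in (2.0, 3.0, 4.0):
        extra = parallelism_factor(growth)
        print(f"{growth:.0f}x system growth needs {extra:.2f}x more parallelism")
```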
Today’s Computational Resources
- PDAs
- Laptops
- PCs
- SMPs
  - Shared memory up to now
- Clusters
  - Distributed memory, must use message passing or task farming
- “Traditional” supercomputers
  - SMPs of up to ~64+ processors
  - Clustering above this
  - Vectors
- Clusters of large systems: metacomputing
- The Grid

Who uses what:
- Everyone: uses PDAs - PCs
- Industry: prefers traditional machines
- Academia: clusters for price/performance

We show how to minimize the effort to go between systems and prepare for the Grid
The Same Application …
[Diagram: the same Application-on-Middleware stack shown three times, labeled Laptop, Super Computer, and The Grid; callouts: “No network!”, “Biggest machines!”]
What is Difficult About HPC?
- Many different architectures and operating systems
- Things change very rapidly
- Must worry about many things at the same time
  - Single processor performance, caches, etc.
  - Different languages
  - Different operating systems (but now, at least, (nearly) everything is Unix!)
  - Parallelism
  - I/O
  - Visualization
  - Batch systems
- Portability: compilers, datatypes and associated tools
Requirements of End Users
- We have problems that need to be solved
  - Want to work at conceptual level
  - Build on top of other things that have been solved for us
    - Use libraries, modules, etc.
- We don’t want to waste time with…
  - Learning a new parallel layer
  - Writing high performance I/O
  - Learning a new batch system, etc…
- We have collaborators distributed all over the world
- We want answers fast, on whatever machines are available
- Basically, we want to write simple Fortran or C code and have it work…
Requirements of Application Developers
- We must have access to the latest technologies
  - These should be available through simple interfaces and APIs
  - They should be interchangeable with each other when the same functionality is available from different packages
- Code we develop must be as portable and as future proof as possible
  - Run on all the architectures we have today
  - Easily adapted to those of tomorrow
  - If possible, top-level user application code should not change, only the layers underneath
- We’ll give strategies for doing this, on today’s machines, and on the Grid of tomorrow
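One way to picture the "interchangeable layers" requirement is a small interface that the top-level code depends on, with different packages plugged in underneath. The sketch below is only an illustration: `OutputLayer`, `TextOutput`, `JsonOutput`, and `evolve` are hypothetical names invented here, not the API of Cactus or of any Grid toolkit.

```python
import json
from typing import Protocol, Sequence


class OutputLayer(Protocol):
    """Hypothetical interface that every interchangeable output package provides."""

    def write(self, name: str, values: Sequence[float]) -> str: ...


class TextOutput:
    """One implementation: a plain-text record."""

    def write(self, name: str, values: Sequence[float]) -> str:
        return f"{name}: " + " ".join(str(v) for v in values)


class JsonOutput:
    """Another implementation with the same functionality and a different format."""

    def write(self, name: str, values: Sequence[float]) -> str:
        return json.dumps({name: list(values)})


def evolve(steps: int, out: OutputLayer) -> str:
    """Top-level user code: depends only on the interface, never on a concrete layer."""
    values = [0.5 * i for i in range(steps)]  # stand-in for a real computation
    return out.write("phi", values)


if __name__ == "__main__":
    # The application code is unchanged; only the layer underneath is swapped.
    print(evolve(3, TextOutput()))
    print(evolve(3, JsonOutput()))
```

Swapping `TextOutput` for `JsonOutput` changes nothing in `evolve`, which is the property asked for above: the user-level code stays fixed while the layer beneath it is replaced.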
Where is This All Going?
- Dangerous to predict, but:
  - Resources will continue to grow for some time
    - Machines will get larger at this rate: TeraFlop now, PetaFlop tomorrow
    - Collecting resources into Grids is happening now and will be routine tomorrow
    - Very heterogeneous environments
  - Data explosion will be exponential
    - Mixture of real-time simulation and data analysis will become routine
  - Bandwidth from point to point will be allocatable on demand!
  - Applications will become very sophisticated, able to adapt to their changing needs and to a changing environment (on time scales of minutes to years)
- We are trying today to help you prepare for this!