1
Exploring Distributed Computing Techniques with Cactus and Globus
Thomas Dramlitsch
Albert-Einstein-Institut, MPI für Gravitationsphysik (and the AEI-ANL-NCSA-LBL team)
Solving Einstein's Equations, Black Holes, and Gravitational Wave Astronomy

- Cactus, a new community simulation code framework: Grid-enabling capabilities
- Previous metacomputing experiments
- What we learned from them
- Current work, improvements
- The present state
- Future development, goals
2
What is Cactus? A new concept in community-developed simulation code infrastructure
- Numerical/computational infrastructure to solve PDEs
- Freely available, open community source code, in the spirit of GNU/Linux
- Developed in response to the needs of these projects; it is production software
- Cactus is divided into the "Flesh" (core) and "Thorns" (modules, i.e. collections of subroutines); see the sketch below
- User choice between Fortran, C, and C++, with an automated interface between them
- Parallelism largely automatic and hidden (if desired) from the user
- Checkpointing/restart capabilities
- Many parallel utilities/features enabled by Cactus: parallel I/O (FlexIO, HDF5), data streaming, remote visualization/steering, elliptic solvers (PETSc), and of course metacomputing
- A vision: any application can plug into Cactus to be Grid-enabled
- Demo tomorrow night at HPDC
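As a sketch of the Flesh/Thorn split, here is a minimal, hypothetical thorn routine in C. The thorn name WaveToy, the grid functions phi/phi_new, and the trivial stencil are our illustrative assumptions; the CCTK_* macros follow Cactus's published thorn conventions.

    /* Minimal sketch of a thorn source file (hypothetical thorn "WaveToy").
     * The grid functions phi/phi_new would be declared in the thorn's
     * interface.ccl and the routine registered in schedule.ccl; the Flesh
     * hands the routine its processor-local grid patch. */
    #include "cctk.h"
    #include "cctk_Arguments.h"
    #include "cctk_Parameters.h"

    void WaveToy_Evolve(CCTK_ARGUMENTS)
    {
      DECLARE_CCTK_ARGUMENTS;   /* local grid sizes and grid functions */
      DECLARE_CCTK_PARAMETERS;  /* user parameters from param.ccl */
      int i, j, k;

      /* Update interior points only; the driver thorn exchanges the
       * ghost zones with neighbouring processors between steps, so the
       * physics code never sees MPI directly. */
      for (k = 1; k < cctk_lsh[2] - 1; k++)
        for (j = 1; j < cctk_lsh[1] - 1; j++)
          for (i = 1; i < cctk_lsh[0] - 1; i++)
          {
            int idx = CCTK_GFINDEX3D(cctkGH, i, j, k);
            phi_new[idx] = phi[idx];  /* placeholder for the real stencil */
          }
    }

This is how "parallelism hidden from the user" works in practice: the physics routine sees only a local patch, and the same source runs on one workstation or across a metacomputer.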
3
Globus Metacomputing Services
(Diagram) Modularity of Cactus: applications and sub-applications (Application 1a, 1b, Application 2, ...) plug into the Cactus Flesh; the user selects the desired functionality as layers: (1) an MPI layer, (2) an I/O layer, (3) remote steering, plus AMR (GrACE, etc.), all sitting on top of the Globus metacomputing services.
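Concretely, the "user selects desired functionality" step is a Cactus parameter file that activates thorns by name. A minimal sketch follows; PUGH is Cactus's standard parallel driver, while the WaveToy thorn and the particular parameter values are our illustrative assumptions.

    # Sketch of a Cactus parameter file: pick a driver, I/O, and physics
    ActiveThorns = "PUGH IOHDF5 WaveToy"   # WaveToy is a placeholder thorn

    driver::global_nx = 128     # global grid; the driver decomposes it
    driver::global_ny = 128
    driver::global_nz = 128
    IOHDF5::out_every = 10      # parallel HDF5 output every 10 iterations

Swapping the MPI layer, the I/O layer, or the AMR driver is then a matter of activating a different thorn, not of touching the application code.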
4
Metacomputing: harnessing power when and where it is needed
- The Einstein equations are typical of applications that require extreme memory and speed: many flops per grid zone, finite differences on regular grids
- Communication of variables through derivatives: ghost zones (see the sketch below)
- The largest supercomputers are too small!
- Networks are very fast! OC-12 and higher is very common in the US; G-WiN: 622 Mbit/s Potsdam-Berlin-Garching, connecting multiple supercomputers; Gigabit networking to the US is possible
- "Seamless computing and visualization from anywhere"
- Many metacomputing experiments in progress
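The ghost-zone exchange mentioned above looks roughly like the following minimal C/MPI sketch for a 1-D decomposition; the array layout and names are illustrative, not Cactus's internals.

    #include <mpi.h>

    /* Exchange one-cell ghost zones with left/right neighbours in a 1-D
     * decomposition.  u holds n owned cells plus one ghost per side:
     * u[0] and u[n+1] are ghosts, u[1..n] are owned. */
    void exchange_ghosts(double *u, int n, MPI_Comm comm)
    {
      int rank, size;
      MPI_Comm_rank(comm, &rank);
      MPI_Comm_size(comm, &size);

      int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
      int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

      /* Send my first owned cell left while receiving the right
       * neighbour's boundary into my right ghost, then the mirror
       * image; MPI_PROC_NULL makes the domain-boundary ranks no-ops. */
      MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  0,
                   &u[n + 1], 1, MPI_DOUBLE, right, 0,
                   comm, MPI_STATUS_IGNORE);
      MPI_Sendrecv(&u[n],     1, MPI_DOUBLE, right, 1,
                   &u[0],     1, MPI_DOUBLE, left,  1,
                   comm, MPI_STATUS_IGNORE);
    }

Per step, each processor ships only the ghost surface, not the volume, which is what makes distributing such grids over wide-area links thinkable at all.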
5
Excellent scaling on many architectures
- High performance: the full 3D Einstein equations solved on the NCSA NT Supercluster, Origin 2000, and T3E
- Excellent scaling on many architectures: Origin up to 256 processors, T3E up to 1024, NCSA NT cluster up to 128 processors
- Achieved 142 Gflops on a 1024-node T3E-1200 (benchmarked for the NASA Neutron Star Grand Challenge)
- But, of course, we want much more... metacomputing, meaning connected computers...
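For scale (our arithmetic, not a figure from the original slides): 142 Gflops over 1024 PEs is about 139 Mflops per PE, roughly 12% of the T3E-1200's nominal 1.2 Gflops per-PE peak, a solid fraction for a memory- and communication-bound stencil code.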
6
Want to migrate this technology to the generic user...
Metacomputing the Einstein equations: connecting T3Es in Berlin, Garching, and San Diego
7
Scaling of Cactus on two T3Es on different continents
(Scaling plots: San Diego & Berlin; Berlin & Munich)
8
Scaling of Cactus on Multiple SGIs at Remote Sites
(Scaling plot: Argonne & NCSA)
9
Analysis of previous metacomputing experiments
- It worked! (That's the main thing we wanted at SC98...)
- Cactus was not optimized for metacomputing: messages too small, latency, etc.
- MPICH-G could have performed better, e.g. intra-machine communication was an order of magnitude slower than the native MPI; MPICH-G2 improves this...
- Communication is non-trivial (not "embarrassingly parallel") and very intensive
- The experiments showed that for some problems this is feasible, but that performance had to improve significantly through optimization of Cactus and MPICH-G
- That's what we did!
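To see why "messages too small" hurts, consider the standard latency/bandwidth cost model (a hedged sketch; $\alpha$ and $\beta$ denote a link's per-message latency and bandwidth, not measured values from these runs). Sending $k$ fragments of $n$ bytes instead of one aggregated message of $kn$ bytes costs

    $T_{\text{split}} = k\left(\alpha + \frac{n}{\beta}\right)$ versus $T_{\text{agg}} = \alpha + \frac{kn}{\beta}$, i.e. a penalty of $(k-1)\,\alpha$.

On a wide-area link, where $\alpha$ is milliseconds rather than microseconds, the $(k-1)\alpha$ term dominates; hence the emphasis on message aggregation in the next slide.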
10
Optimizing Cactus Communication Layers for Metacomputing
- Made the communication layer(s) much more flexible: the size and number of messages can now be specified, to get the best performance out of the underlying network (bandwidth, latency)
- Reduced communication to a bare minimum
- Overlapping of communication among CPUs, and of communication with computation (see the sketch below)
- Made the load balancing of Cactus more flexible (Matei Ripeanu): Cactus now allows decomposing the total problem into pieces of different sizes, according to CPU power, the number of CPUs used on each machine, etc.
- Cactus compiles (out of the box) with Globus and MPICH on the most common architectures (T3E, Irix, SP-2, ...?)
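As an illustration of the overlap item above, here is a minimal C sketch of hiding ghost-zone traffic behind interior computation with nonblocking MPI; the function, array names, and three-point stencil are illustrative, not Cactus's actual communication layer.

    #include <mpi.h>

    /* Sketch: overlap ghost-zone exchange with interior computation.
     * u has n owned cells plus one ghost per side, as in the earlier
     * sketch; left/right are neighbour ranks or MPI_PROC_NULL. */
    void step_overlapped(double *u, double *u_new, int n, int left,
                         int right, MPI_Comm comm)
    {
      MPI_Request req[4];
      int i;

      /* Post receives into the ghosts and sends of the boundary cells,
       * then continue immediately instead of waiting. */
      MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, comm, &req[0]);
      MPI_Irecv(&u[n + 1], 1, MPI_DOUBLE, right, 1, comm, &req[1]);
      MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, comm, &req[2]);
      MPI_Isend(&u[n],     1, MPI_DOUBLE, right, 0, comm, &req[3]);

      /* Interior points need no ghost data: compute them while the
       * messages cross the (possibly transatlantic) wire. */
      for (i = 2; i <= n - 1; i++)
        u_new[i] = 0.5 * (u[i - 1] + u[i + 1]);

      /* Only the two boundary points have to wait for the network. */
      MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
      u_new[1] = 0.5 * (u[0] + u[2]);
      u_new[n] = 0.5 * (u[n - 1] + u[n + 1]);
    }

The larger the interior relative to the surface, the more of the wide-area latency disappears behind useful work.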
11
Optimizing MPICH-G: using MPICH-G2
- MPICH-G2 is a completely rewritten communication layer
- It can distinguish between inter- and intra-machine communication: it uses the vendor-supplied MPI within a machine and TCP/IP between machines
- This means optimal performance in a metacomputing environment
- Works with Cactus and Globus on all major Unix systems

(Diagram: a single MPI_COMM_WORLD spanning all machines, with TCP/IP links between them)
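The inter/intra distinction can be pictured with plain MPI. This is only an illustration of the idea, not MPICH-G2's internal API: split the global communicator into one subcommunicator per machine.

    #include <mpi.h>

    /* Illustration: build an intra-machine subcommunicator by using a
     * hash of the processor name as the split colour, so ranks on the
     * same machine land together.  A crude stand-in for the inter- vs.
     * intra-machine knowledge MPICH-G2 keeps internally. */
    MPI_Comm intra_machine_comm(void)
    {
      char name[MPI_MAX_PROCESSOR_NAME];
      int len, rank, i;
      unsigned color = 0;
      MPI_Comm intra;

      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Get_processor_name(name, &len);

      for (i = 0; i < len; i++)
        color = color * 31u + (unsigned char)name[i];

      MPI_Comm_split(MPI_COMM_WORLD, (int)(color & 0x7fffffff),
                     rank, &intra);
      return intra;
    }

With such a split, bulk traffic can be routed over the fast vendor MPI inside each machine, and only the surface data crosses the TCP/IP links between machines.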
12
Current experiments and future plans
- Complete testing and production of tightly coupled simulations between different sites in the USA (NCSA, NERSC, ANL, SDSC, and others)
- Want to use advanced software (Portal, co-scheduling systems, etc.)
- Want to run across as many sites and nodes as possible
- More general Grid computing problems: distribution of multiple grids, dynamic resource acquisition, acquiring more memory when needed (AMR), spawning off connected jobs on remote machines (see the sketch below), a Cactus thorn with access to MDS (the Globus Metacomputing Directory Service), ...
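"Spawning off connected jobs on remote machines" maps naturally onto the MPI-2 dynamic-process interface. A minimal sketch follows; the executable name and process count are placeholders, and a real Grid deployment would additionally go through Globus resource management.

    #include <mpi.h>

    /* Sketch: spawn a connected helper job and keep an
     * intercommunicator to it.  "analysis_task" and the process count
     * are placeholders. */
    void spawn_helper(void)
    {
      MPI_Comm children;
      int errcodes[4];

      MPI_Comm_spawn("analysis_task", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                     0, MPI_COMM_WORLD, &children, errcodes);

      /* Parent and children can now exchange data over the
       * intercommunicator, e.g. streaming grid data for remote
       * analysis or visualization. */
    }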
13
A Portal to Computational Science: The Cactus Collaboratory
Cactus Computational Toolkit: science, Autopilot, AMR, PETSc, HDF, MPI, GrACE, Globus, remote steering...

1. User has a science idea...
2. Composes/builds code components with interfaces...
3. Selects appropriate resources...
4. Steers the simulation, monitors performance...
5. Collaborators log in to monitor...

Want to integrate and migrate this technology to the generic user...
14
German Gigabit Project supported by DFN-Verein
- Developing techniques to exploit high-speed networks
- Focus on remote steering and visualization
- OC-12 testbed between AEI, ZIB, and RZG, with built-in application groups ready to use it!
- Already closely connected to the ANL, NCSA, and KDI projects
15
Metacomputing Experiments, Production
- SC93: remote CM-5 simulation with live visualization in a CAVE
- SC95: heroic I-WAY experiments led to the development of Globus; Cornell SP-2 and Power Challenge, with live visualization in the San Diego CAVE
- SC97: Garching 512-node T3E launched, controlled, and visualized in San Jose
- SC98: HPC Challenge; the SDSC, ZIB, and Garching T3Es compute the collision of two neutron stars, controlled from Orlando
- SC99: colliding black holes using the Garching and ZIB T3Es, with remote collaborative interaction and visualization at the ANL and NCSA booths
- April 2000: attempting to use LANL, NCSA, NERSC, SDSC, ZIB, Garching, NASA-Ames, Maui?, +...? for a single simulation!
- All this technology is available in the main production code for different applications!