Cactus Grid Computing
Gabrielle Allen, Max Planck Institute for Gravitational Physics (Albert Einstein Institute)


Cactus
THE GRID: dependable, consistent, pervasive access to high-end resources.
CACTUS: a freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance, multi-dimensional simulations. www.CactusCode.org

What is Cactus?
- Flesh (ANSI C) provides the code infrastructure: parameter, variable and scheduling databases, error handling, APIs, the make system, parameter parsing.
- Thorns (F77/F90/C/C++/[Java/Perl/Python]) are plug-in, swappable modules or collections of subroutines providing both the computational infrastructure and the physical application. Well-defined interface through 3 configuration files.
- Just about anything can be implemented as a thorn: driver layer (MPI, PVM, SHMEM, ...), black hole evolvers, elliptic solvers, reduction operators, interpolators, web servers, grid tools, IO, ...
- User driven: easy parallelism, no new paradigms, flexible.
- Collaborative: thorns borrow concepts from OOP, thorns can be shared, lots of collaborative tools.
- Computational Toolkit: existing thorns for (parallel) IO, elliptic solvers, an MPI unigrid driver, ...
- Integrates other common packages and tools: HDF5, Globus, PETSc, PAPI, Panda, FlexIO, GrACE, Autopilot, LCAVision, OpenDX, Amira, ...
- Trivially Grid enabled!
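To make the flesh/thorn split concrete, here is a minimal sketch of what a thorn's C source can look like. The thorn, routine and grid-function names (WaveDemo, WaveDemo_Evolve, phi, phi_dot) are hypothetical; the cctk headers, DECLARE macros and index helpers are the standard flesh-provided interface, and the routine would be registered for the evolution timebin in the thorn's schedule.ccl file.

    /* Hypothetical thorn routine: phi and phi_dot would be grid functions
     * declared in this thorn's interface.ccl; the routine is scheduled in
     * the EVOL timebin via schedule.ccl. */
    #include "cctk.h"
    #include "cctk_Arguments.h"
    #include "cctk_Parameters.h"

    void WaveDemo_Evolve(CCTK_ARGUMENTS)
    {
      DECLARE_CCTK_ARGUMENTS;   /* grid functions, local grid sizes, dt, ... */
      DECLARE_CCTK_PARAMETERS;  /* parameters declared in param.ccl */
      int i, j, k;

      for (k = 0; k < cctk_lsh[2]; k++)
        for (j = 0; j < cctk_lsh[1]; j++)
          for (i = 0; i < cctk_lsh[0]; i++)
          {
            const int idx = CCTK_GFINDEX3D(cctkGH, i, j, k);
            phi[idx] += CCTK_DELTA_TIME * phi_dot[idx];  /* toy update rule */
          }
    }

The driver thorn (e.g. the MPI unigrid driver) takes care of distributing the grid and synchronizing ghost zones, so the same loop runs unchanged on a workstation or across many processors.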

Modularity of Cactus
[Diagram: applications (Application 1, Application 2, legacy and symbolic manipulation apps, sub-apps) plug into the Cactus Flesh through swappable abstraction layers (MPI layer, I/O layer, AMR via GrACE etc., remote steering, MDS/remote spawn, unstructured meshes), which in turn use Globus metacomputing services. The user selects the desired functionality and the code is created.]

Motivation: Grand Challenge Simulations
- NSF Black Hole Grand Challenge: 8 US institutions, 5 years; towards colliding black holes.
- NASA Neutron Star Grand Challenge: 5 US institutions; towards colliding neutron stars.
- EU Network Astrophysics: 10 EU institutions, 3 years; trying to finish these problems; the entire community becoming Grid enabled.
- Examples of the future of science and engineering:
  - Require large scale simulations, beyond the reach of any single machine.
  - Require large geo-distributed, cross-disciplinary collaborations.
  - Require Grid technologies, but not yet using them!
  - Both apps and Grids are dynamic...

Why Grid Computing?
- The AEI numerical relativity group has access to high-end resources at over ten centers in Europe and the USA.
- They want:
  - Bigger simulations, more simulations, and faster throughput.
  - Intuitive IO at the local workstation.
  - No new systems/techniques to master!!
- How to make best use of these resources?
  - Provide easier access... no one can remember ten usernames, passwords, batch systems, file systems, ... a great start!!!
  - Combine resources for larger production runs (more resolution badly needed!).
  - Dynamic scenarios... automatically use what is available.
  - Remote/collaborative visualization, steering, monitoring.
- Many other motivations for Grid computing...

Grand Picture
[Diagram: Grid-enabled Cactus runs on distributed machines (Origin at NCSA, T3E at Garching); simulations are launched from the Cactus Portal; remote viz and steering from Berlin; remote viz in St Louis; remote steering and monitoring from an airport; viz of data from previous simulations in an SF café. Connecting technologies: DataGrid/DPSS, downsampling, Globus, HTTP, HDF5, isosurfaces.]

Cactus Grid Projects:
- User Portal (KDI Astrophysics Simulation Collaboratory): efficient, easy access to resources... interfaces to everything else.
- Collaborative working methods (KDI ASC).
- Large scale distributed computing (Globus): the only way to get the kind of resolution we really need.
- Remote monitoring (TiKSL/GriKSL): direct access to the simulation from anywhere.
- Remote visualization, live/offline (TiKSL/GriKSL): collaborative analysis during simulations / viz of large datasets.
- Remote steering (TiKSL/GriKSL): live collaborative interaction with the simulation (e.g. IO/analysis).
- Dynamic, adaptive scenarios (GridLab/GrADS): the simulation adapts to a changing Grid environment.
- Make Grid computing usable/accessible for application users!!
  - GridLab: Grid Application Toolkit.

Remote Monitoring/Steering: Thorn HTTPD
- A thorn which allows any simulation to act as its own web server.
- Connect to the simulation from any browser, anywhere... collaborate.
- Monitor the run: parameters, basic visualization, ...
- Change steerable parameters (see the sketch below).
- See running example online.
- Wireless remote viz, monitoring and steering.
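As a rough, illustrative sketch (the thorn and parameter names below are indicative rather than authoritative), switching this on amounts to activating the web-server thorn in the parameter file; any parameter declared as steerable in a thorn's param.ccl can then be changed from the browser while the run proceeds.

    # Illustrative parameter-file fragment: activate the web-server thorn
    ActiveThorns = "... HTTPD Socket ..."
    httpd::port  = 5555        # then point a browser at http://<host>:5555

    # Illustrative param.ccl entry: STEERABLE=ALWAYS marks a parameter
    # that may be changed from the web interface during the run
    INT output_every "How often (in iterations) to do output" STEERABLE=ALWAYS
    {
      1:* :: "any positive interval"
    } 10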

Developments
- VizLauncher: output data (remote files/streamed data) automatically launched into the appropriate local viz client (extending to include application-specific networks).
- Debugging information (individual thorns can easily provide their own information).
- Timing information (thorns, communications, IO), allowing users to steer their simulation for better performance (e.g. switch off analysis/IO).

Remote Visualization
[Diagram: streamed HDF5 (downsampling to match bandwidth) and remote files deliver grid functions, isosurfaces and geodesics to local viz clients such as Amira, LCA Vision and OpenDX; VizLauncher handles download.]
Use a variety of local clients to view remote simulation data. Collaborative: colleagues can access it from anywhere. Now adding matching of data to network characteristics.

Remote Steering
[Diagram: the running simulation exchanges remote viz data and steering commands with any viz client (e.g. Amira) via streamed HDF5, XML and HTTP.]

Remote File Access
[Diagram: a visualization client (e.g. Amira) reads data through the HDF5 virtual file driver (VFD) layer, which talks via DataGrid (Globus), DPSS, FTP or HTTP to the corresponding DPSS, FTP, web or remote data servers. Downsampling and hyperslabs mean only what is needed is transferred: viz in Berlin of 4 TB stored at NCSA.]

Remote Data Storage
[Diagram: terabyte data files stored at NCSA (USA) are accessed remotely through the HDF5 VFD over GridFTP; clients simply use a file URL, with downsampling and hyperslabbing applied on the way (e.g. "grr, psi on timestep 2", "lapse for r<1, every other point"). A network monitoring service reports when more bandwidth is available. Analysis of simulation data at AEI (Germany), visualization at Chandos Hall (UK).]
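The "every other point" downsampling is just a strided hyperslab selection at the HDF5 level. The sketch below shows the idea with the standard HDF5 C API on an ordinary file (the remote GridFTP/DPSS access goes through a virtual file driver underneath, which is not shown); the file name, dataset name and extents are made up for illustration.

    /* Read every other point of a 3-D dataset via a strided hyperslab.
     * buf must hold 64*64*64 doubles; names and sizes are illustrative. */
    #include <hdf5.h>

    void read_downsampled(double *buf)
    {
      hsize_t start[3]  = {0, 0, 0};
      hsize_t stride[3] = {2, 2, 2};      /* every other point in each direction */
      hsize_t count[3]  = {64, 64, 64};   /* number of points actually read */

      hid_t file   = H5Fopen("psi.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
      hid_t dset   = H5Dopen2(file, "/psi", H5P_DEFAULT);
      hid_t fspace = H5Dget_space(dset);
      hid_t mspace = H5Screate_simple(3, count, NULL);

      H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, stride, count, NULL);
      H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);

      H5Sclose(mspace); H5Sclose(fspace); H5Dclose(dset); H5Fclose(file);
    }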

Computational Physics: Complex Workflow
[Flowchart: acquire code modules -> configure and build (bugs? report/fix bugs, regression tests) -> set parameters and initial data -> run many test jobs (steer, kill or restart; remote vis; correct?) -> select the largest resource and run for a week (remote vis and steering) -> novel results? (data mining, comparison with observation) -> archive TBs of data, select and stage data to a storage array -> papers, Nobel prizes.]

Cactus/ASC Portal
- KDI ASC project (Argonne, NCSA, AEI, LBL, WashU).
- Technology: web based (an end-user requirement): Globus, GSI, DHTML, Java CoG, MyProxy, GPDK, Tomcat, Stronghold/Apache, SQL/RDBMS.
- The portal should hide/simplify the Grid for users: single point of access; locates resources; builds/finds executables; central management of parameter files and job output; submits jobs to local batch queues; tracks active jobs; submission/management of distributed runs.
- Accesses the ASC Grid Testbed.

Grid Applications: Some Examples
- Dynamic staging: move to a faster/cheaper/bigger machine ("Cactus Worm").
- Multiple universe: create a clone to investigate a steered parameter.
- Automatic convergence testing: from initial data or initiated during the simulation.
- Look ahead: spawn off and run a coarser resolution to predict the likely future.
- Spawn independent/asynchronous tasks: send them to a cheaper machine while the main simulation carries on.
- Thorn profiling: best machine/queue, choose resolution parameters based on the queue.
- Dynamic load balancing: inhomogeneous loads, multiple grids.
- Intelligent parameter surveys: farm out to different machines.
- ... Must get the application community to rethink algorithms...

Dynamic Grid Computing
["Physicist has new idea!" Diagram: a Brill wave simulation is launched; the code finds the best resources (free CPUs!! at NCSA, SDSC, RZG, LRZ) and adds more resources; it clones the job with a steered parameter; when queue time is over it finds a new machine; it looks for a horizon and, having found one, tries out excision; it calculates and outputs gravitational waves and invariants; data are archived, including to the LIGO public database.]

New Paradigms
- Dynamic distributed apps with Grid threads (gthreads).
- The code should be aware of its environment:
  - What resources are out there NOW, and what is their current state?
  - What is my allocation?
  - What is the bandwidth/latency between sites?
- The code should be able to make decisions on its own (see the sketch below):
  - A slow part of my simulation can run asynchronously... spawn it off!
  - New, more powerful resources just became available... migrate there!
  - Machine went down... reconfigure and recover!
  - Need more memory... get it by adding more machines!
- The code should be able to publish this information to a portal for tracking, monitoring, steering...
  - Unexpected event... notify users!
  - Collaborators from around the world all connect and examine the simulation.
- Two prototypical examples:
  - Dynamic, adaptive distributed computing.
  - Cactus Worm: intelligent simulation migration.
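A minimal sketch of what such a decision point inside the code might look like. Every helper here (grid_free_processors, grid_spawn_task, grid_migrate, notify_portal) is hypothetical, standing in for the information, spawning, migration and notification services that thorns or a Grid toolkit would actually provide.

    /* Hypothetical sketch only: these helpers do not exist as such; they
     * stand in for Grid information, spawning and migration services. */
    typedef struct { const char *site; int free_cpus; double bandwidth; } resource_info;

    int  grid_free_processors(resource_info *best);          /* query information server */
    void grid_spawn_task(const char *task, const char *site);
    void grid_migrate(const char *site);
    void notify_portal(const char *what, const char *detail);

    void check_environment_and_act(int my_cpus, double analysis_cost)
    {
      resource_info best;

      /* "What resources are out there NOW, and what is their current state?" */
      if (grid_free_processors(&best) != 0)
        return;                                  /* no information: carry on as-is */

      if (analysis_cost > 0.2)                   /* a slow part of the simulation... */
        grid_spawn_task("analysis", best.site);  /* ...spawn it off elsewhere */

      if (best.free_cpus > 2 * my_cpus)          /* a much bigger resource is free */
        grid_migrate(best.site);                 /* checkpoint and move there */

      notify_portal("status", best.site);        /* publish for tracking/steering */
    }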

Dynamic Adaptive Distributed Computation (with Argonne/U. Chicago)
Large scale physics calculation: for accuracy we need more resolution than the memory of any one machine can provide.
[Diagram: SDSC IBM SP (1024 procs, decomposition 5x12x17 = 1020) coupled to the NCSA Origin array (decomposition 5x12x(4+2+2) = 480) over an OC-12 line between the sites (but only 2.5 MB/sec achieved); GigE: 100 MB/sec.]
This experiment:
- Einstein equations (but could be any Cactus application).
Achieved:
- First runs: 15% scaling.
- With new techniques: 70-85% scaling, ~250 GF.

Distributed Computing
- Why do this?
  - Capability: need more memory than a single machine has.
  - Throughput: for smaller jobs, can still be quicker than waiting in queues.
- Technology:
  - Globus GRAM for job submission/authentication.
  - MPICH-G2 for communications (native MPI/TCP).
  - Cactus is simply compiled with the MPICH-G2 implementation of MPI: gmake cactus MPI=globus
- New Cactus communication technologies (see the sketch below):
  - Overlap communication with computation.
  - The simulation dynamically adapts to the WAN network: compression and buffer size for communication; extra ghostzones, communicating across the WAN only every N timesteps.
  - Available generically: all applications/grid topologies.
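The "extra ghostzones" trick can be sketched as follows: by keeping W ghost zones (instead of 1) on the process faces that sit on either side of the wide-area link, the expensive WAN exchange only needs to happen every W timesteps, while exchanges inside each machine still happen every step. This is a simplified illustration (1-D split, packing omitted), not the actual Cactus driver code; the MPI calls themselves are standard.

    #include <mpi.h>

    /* Exchange the WAN-facing boundary only every `wan_ghosts` timesteps.
     * sendbuf/recvbuf hold the packed W boundary layers (`count` doubles);
     * `neighbour` is the rank on the other side of the wide-area link. */
    void wan_ghost_exchange(double *sendbuf, double *recvbuf, int count,
                            int neighbour, int timestep, int wan_ghosts,
                            MPI_Comm comm)
    {
      if (timestep % wan_ghosts != 0)
        return;   /* extra ghost zones still valid: skip the slow WAN exchange */

      MPI_Sendrecv(sendbuf, count, MPI_DOUBLE, neighbour, 0,
                   recvbuf, count, MPI_DOUBLE, neighbour, 0,
                   comm, MPI_STATUS_IGNORE);
    }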

Dynamic Adaptation
[Plot: the code adapts from 2 ghost zones to 3 ghost zones and switches compression on as network conditions change.]
- Automatically adapts to bandwidth and latency issues.
- The application has NO KNOWLEDGE of the machine(s) it is on, the networks, etc.
- The adaptive techniques make NO assumptions about the network.
- Issues:
  - More intelligent adaptation algorithms.
  - E.g. what if network conditions change faster than the adaptation...
- Next: a real black hole run across Linux clusters for high-quality data for viz.

Cactus Worm: Basic Scenario
- The Cactus simulation starts.
- It queries a Grid Information Server and finds resources.
- It makes an intelligent decision to move.
- It locates a new resource and migrates.
- It registers its new location with the GIS.
- It continues around Europe...
- A basic prototypical example of many things we want to do!
- Live demo available.

Migration due to Contract Violation (Foster, Angulo, Cactus Team, ...)
[Plot: the job is running at UIUC; a load is applied, causing 3 successive contract violations; after resource discovery and migration (migration time not to scale), the job continues running at UC.]

Grid Application Toolkit
- Application developers should be able to build simulations such as these with tools that easily enable dynamic Grid capabilities.
- We want to build a programming API that easily incorporates:
  - Querying an information server (e.g. a GIIS): what's available for me? what software? how many processors?
  - Network monitoring.
  - Resource brokering.
  - Decision routines (thorns): how to decide? cost? reliability? size?
  - Spawning routines (thorns): now start this up over here, and that up over there.
  - An authentication server: issues commands and moves files on your behalf (can't pass on a Globus proxy).
  - Data transfer: use whatever method is desired (GSI-ssh, GSI-ftp, streamed HDF5, scp, ...).
  - Etc...
- Need to be able to test Grid tools as swappable, plug-and-play components (see the API sketch below).
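To make the wish list concrete, an application-level toolkit API along these lines might look roughly like the C declarations below. These names and signatures are purely illustrative, not the actual GridLab GAT interface.

    /* Purely illustrative API sketch; not the real Grid Application Toolkit. */
    typedef struct gat_resource gat_resource;   /* opaque handle: a machine/queue */

    /* Query an information server (e.g. a GIIS): what's available for me,
     * which software, how many processors? */
    int gat_find_resources(const char *requirements,
                           gat_resource **found, int *nfound);

    /* Decision routine: rank candidates by cost, reliability, size, ... */
    gat_resource *gat_select_best(gat_resource *candidates, int ncandidates);

    /* Spawning routine: start a task (or migrate the simulation) elsewhere. */
    int gat_spawn(const gat_resource *where, const char *executable,
                  const char *parameter_file);

    /* Data transfer: use whatever transport is available (GSI-ftp, scp, ...). */
    int gat_move_file(const char *source_url, const char *destination_url);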

GridLab: Enabling Dynamic Grid Applications
- Large EU project under negotiation with the EC.
- Members: AEI, ZIB, PSNC, Lecce, Athens, Cardiff, Amsterdam, SZTAKI, Brno, ISI, Argonne, Wisconsin, Sun, Compaq.
- Grid Application Toolkit for application developers, plus infrastructure (APIs/tools).
- There will be around 20 new Grid positions in Europe!! See the GridLab web pages for details.

Credits