1
Cactus Grid Computing
Gabrielle Allen
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
2
Cactus
THE GRID: Dependable, consistent, pervasive access to high-end resources
CACTUS is a freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance multi-dimensional simulations
www.CactusCode.org
3
What is Cactus?
• Flesh (ANSI C) provides the code infrastructure (parameter, variable and scheduling databases, error handling, APIs, make system, parameter parsing)
• Thorns (F77/F90/C/C++/[Java/Perl/Python]) are plug-in, swappable modules or collections of subroutines providing both the computational infrastructure and the physical application. Well-defined interface through 3 config files
• Just about anything can be implemented as a thorn: driver layer (MPI, PVM, SHMEM, …), black hole evolvers, elliptic solvers, reduction operators, interpolators, web servers, grid tools, IO, …
• User driven: easy parallelism, no new paradigms, flexible
• Collaborative: thorns borrow concepts from OOP, thorns can be shared, lots of collaborative tools
• Computational Toolkit: existing thorns for (parallel) IO, elliptic solvers, MPI unigrid driver, …
• Integrates other common packages and tools: HDF5, Globus, PETSc, PAPI, Panda, FlexIO, GrACE, Autopilot, LCAVision, OpenDX, Amira, …
• Trivially Grid enabled!
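To make the flesh/thorn split concrete, here is a minimal sketch of a thorn routine in C, assuming the standard Cactus flesh macros for argument and parameter access; the thorn and variable names (WaveToy, phi, phi_old) are invented for illustration, and the three CCL configuration files that would declare them are omitted.

```c
/* Illustrative sketch of a Cactus thorn routine in C.
 * Thorn/variable names (WaveToy, phi, phi_old) are hypothetical;
 * the CCTK_* macros follow the standard Cactus flesh API. */
#include "cctk.h"
#include "cctk_Arguments.h"
#include "cctk_Parameters.h"

void WaveToy_Evolve(CCTK_ARGUMENTS)
{
  DECLARE_CCTK_ARGUMENTS;   /* grid variables, local grid sizes, time, ... */
  DECLARE_CCTK_PARAMETERS;  /* parameters declared in param.ccl */

  int i, j, k;

  /* Loop over the processor-local part of the grid; the driver thorn
     handles domain decomposition and ghost-zone exchange. */
  for (k = 1; k < cctk_lsh[2] - 1; k++)
    for (j = 1; j < cctk_lsh[1] - 1; j++)
      for (i = 1; i < cctk_lsh[0] - 1; i++)
      {
        const int idx = CCTK_GFINDEX3D(cctkGH, i, j, k);
        phi[idx] = phi_old[idx];  /* the physics update would go here */
      }
}
```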
4
Modularity of Cactus
[Diagram: applications (Application 1, Application 2, legacy and symbolic-manipulation apps, structured and unstructured sub-apps) plug into the Cactus Flesh together with swappable service modules such as AMR (GrACE, etc), I/O layers, MPI layers, remote steering, MDS/remote spawn, and Globus metacomputing services. The user selects the desired functionality and the code is created from these abstractions.]
5
Motivation: Grand Challenge Simulations
NSF Black Hole Grand Challenge
• 8 US institutions, 5 years
• Towards colliding black holes
NASA Neutron Star Grand Challenge
• 5 US institutions
• Towards colliding neutron stars
EU Network Astrophysics
• 10 EU institutions, 3 years
• Try to finish these problems …
• Entire community becoming Grid enabled
Examples of the future of science & engineering
• Require large scale simulations, beyond the reach of any single machine
• Require large geo-distributed cross-disciplinary collaborations
• Require Grid technologies, but not yet using them!
• Both apps and Grids are dynamic …
6
Why Grid Computing?
• The AEI Numerical Relativity Group has access to high-end resources at over ten centers in Europe/USA
• They want:
  – Bigger simulations, more simulations and faster throughput
  – Intuitive IO at the local workstation
  – No new systems/techniques to master!!
• How to make best use of these resources?
  – Provide easier access … no one can remember ten usernames, passwords, batch systems, file systems, … a great start!!!
  – Combine resources for larger production runs (more resolution badly needed!)
  – Dynamic scenarios … automatically use what is available
  – Remote/collaborative visualization, steering, monitoring
• Many other motivations for Grid computing...
7
Grand Picture
[Diagram: Grid-enabled Cactus runs on distributed machines (Origin at NCSA, T3E at Garching); simulations are launched from the Cactus Portal; remote steering and monitoring from an airport, remote viz and steering from Berlin, remote viz in St Louis, and viz of data from previous simulations in an SF café; connected via DataGrid/DPSS, Globus, HTTP and HDF5, with downsampling and isosurfaces.]
8
Cactus Grid Projects
• User Portal (KDI Astrophysics Simulation Collaboratory)
  – Efficient, easy access to resources … interfaces to everything else
• Collaborative Working Methods (KDI ASC)
• Large Scale Distributed Computing (Globus)
  – Only way to get the kind of resolution we really need
• Remote Monitoring (TiKSL/GriKSL)
  – Direct access to the simulation from anywhere
• Remote Visualization, live/offline (TiKSL/GriKSL)
  – Collaborative analysis during simulations / viz of large datasets
• Remote Steering (TiKSL/GriKSL)
  – Live collaborative interaction with the simulation (e.g. IO/analysis)
• Dynamic, Adaptive Scenarios (GridLab/GrADS)
  – Simulation adapts to a changing Grid environment
• Make Grid computing useable/accessible for application users!!
  – GridLab: Grid Application Toolkit
9
Remote Monitoring/Steering: Thorn HTTPD
• Thorn which allows any simulation to act as its own web server
• Connect to the simulation from any browser anywhere … collaborate
• Monitor the run: parameters, basic visualization, ...
• Change steerable parameters
• See running example at www.CactusCode.org
• Wireless remote viz, monitoring and steering
10
Developments
• VizLauncher: output data (remote files/streamed data) automatically launched into the appropriate local viz client (extending to include application-specific networks)
• Debugging information (individual thorns can easily provide their own information)
• Timing information (thorns, communications, IO), allowing users to steer their simulation for better performance (e.g. switch off analysis/IO)
11
Remote Visualization
[Diagram: remote simulation data streamed to a variety of local viz clients (Amira, LCA Vision, OpenDX): isosurfaces and geodesics, grid functions via streaming HDF5 (downsampling to match bandwidth), and all remote files via VizLauncher (download).]
Use a variety of local clients to view remote simulation data. Collaborative: colleagues can access from anywhere. Now adding matching of data to network characteristics.
12
Remote Steering
[Diagram: remote viz data streams from the simulation over HDF5 to Amira or any viz client; steering commands flow back over XML/HTTP.]
13
Remote File Access
[Diagram: a visualization client (Amira), with viz in Berlin, reads remote data through the HDF5 VFD layer, which maps file access onto DataGrid (Globus), DPSS, FTP or HTTP and talks to the corresponding remote data servers (DPSS server, FTP server, web server). Downsampling and hyperslabs mean only what is needed is transferred from the 4 TB stored at NCSA.]
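The "only what is needed" behaviour rests on HDF5's ability to read hyperslabs (strided subsets) of a dataset. The sketch below shows a plain hyperslab read in C with made-up file and dataset names; with a Grid-enabled VFD (or GridFTP underneath) the client-side call looks the same, but only the selected points cross the network.

```c
/* Sketch: read every other point of a 3D dataset via an HDF5 hyperslab.
 * File/dataset names are hypothetical and the dataset is assumed to be
 * at least 128^3; with a remote VFD only the selection is transferred. */
#include <hdf5.h>

int main(void)
{
  hid_t file   = H5Fopen("psi_timestep2.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
  hid_t dset   = H5Dopen(file, "/psi", H5P_DEFAULT);
  hid_t fspace = H5Dget_space(dset);

  hsize_t start[3]  = {0, 0, 0};
  hsize_t stride[3] = {2, 2, 2};      /* downsample: every other point */
  hsize_t count[3]  = {64, 64, 64};   /* points selected per dimension */

  H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, stride, count, NULL);

  hid_t mspace = H5Screate_simple(3, count, NULL);
  static double buf[64][64][64];
  H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);

  H5Sclose(mspace); H5Sclose(fspace); H5Dclose(dset); H5Fclose(file);
  return 0;
}
```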
14
Remote Data Storage
[Diagram: a TB data file at NCSA (USA) is accessed remotely via HDF5 VFD/GridFTP; clients use a file URL with downsampling and hyperslabbing (e.g. "grr, psi on timestep 2" or "lapse for r<1, every other point"). Analysis of simulation data at AEI (Germany), visualization at Chandos Hall (UK); a network monitoring service reports when more bandwidth is available.]
15
Computational Physics: Complex Workflow
[Flowchart: acquire code modules → configure and build (report/fix bugs if any) → set parameters / initial data → run many test jobs, with steering, killing or restarting and regression/remote vis, until correct → select the largest resource and run for a week with remote vis and steering → if novel results, archive TBs of data, select and stage data to a storage array, data mine, compare with observation → papers, Nobel Prizes.]
16
Cactus/ASC Portal
• KDI ASC Project (Argonne, NCSA, AEI, LBL, WashU)
• Technology: web based (end user requirement)
  – Globus, GSI, DHTML, Java CoG, MyProxy, GPDK, TomCat, Stronghold/Apache, SQL/RDBMS
• Portal should hide/simplify the Grid for users
  – Single access, locates resources, builds/finds executables, central management of parameter files/job output, submits jobs to local batch queues, tracks active jobs. Submission/management of distributed runs
• Accesses the ASC Grid Testbed
17
Grid Applications: Some Examples
• Dynamic Staging: move to a faster/cheaper/bigger machine
  – "Cactus Worm"
• Multiple Universe
  – create a clone to investigate a steered parameter
• Automatic Convergence Testing
  – from initial data or initiated during the simulation
• Look Ahead
  – spawn off and run a coarser resolution to predict the likely future
• Spawn Independent/Asynchronous Tasks
  – send to a cheaper machine, main simulation carries on
• Thorn Profiling
  – best machine/queue, choose resolution parameters based on queue
• Dynamic Load Balancing
  – inhomogeneous loads, multiple grids
• Intelligent Parameter Surveys
  – farm out to different machines
• … Must get the application community to rethink algorithms …
18
Dynamic Grid Computing
[Diagram: a physicist has a new idea and launches a Brill wave simulation. Free CPUs are found (NCSA, SDSC, RZG, LRZ); the run calculates and outputs gravitational waves and invariants and looks for a horizon; when a horizon is found, a clone job with a steered parameter is spawned to try out excision; when queue time is over, new machines are found and more resources added; data is archived at SDSC and to the LIGO public database.]
19
New Paradigms
• Dynamic distributed apps with Grid-threads (gthreads)
• Code should be aware of its environment
  – What resources are out there NOW, and what is their current state?
  – What is my allocation?
  – What is the bandwidth/latency between sites?
• Code should be able to make decisions on its own
  – A slow part of my simulation can run asynchronously … spawn it off!
  – New, more powerful resources just became available … migrate there!
  – Machine went down … reconfigure and recover!
  – Need more memory … get it by adding more machines!
• Code should be able to publish this information to the Portal for tracking, monitoring, steering …
  – Unexpected event … notify users!
  – Collaborators from around the world all connect and examine the simulation.
• Two prototypical examples:
  – Dynamic, Adaptive Distributed Computing
  – Cactus Worm: Intelligent Simulation Migration
20
Dynamic Adaptive Distributed Computation (with Argonne/U. Chicago)
Large scale physics calculation: for accuracy, need more resolution than the memory of one machine can provide.
[Diagram: SDSC IBM SP, 1024 procs, decomposition 5x12x17 = 1020; NCSA Origin array, 256+128+128 procs, decomposition 5x12x(4+2+2) = 480; GigE (100 MB/sec) within each site, OC-12 line between them (but only 2.5 MB/sec).]
This experiment:
• Einstein equations (but could be any Cactus application)
Achieved:
• First runs: 15% scaling
• With new techniques: 70-85% scaling, ~250 GF
21
Distributed Computing
• Why do this?
  – Capability: need more memory than a single machine has
  – Throughput: for smaller jobs, can still be quicker than queues
• Technology
  – Globus GRAM for job submission/authentication
  – MPICH-G2 for communications (native MPI/TCP)
  – Cactus simply compiled with the MPICH-G2 implementation of MPI
    · gmake cactus MPI=globus
• New Cactus communication technologies
  – Overlap communication/computation
  – Simulation dynamically adapts to the WAN network
    · Compression/buffer size for communication
    · Extra ghostzones, communicate across the WAN every N timesteps
  – Available generically: all applications/grid topologies
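Because Cactus is simply compiled against the MPICH-G2 implementation of MPI, the application side needs no Grid-specific code at all. A minimal sketch, assuming nothing beyond standard MPI: the same program runs unchanged whether its ranks live on one machine or are spread across sites by Globus; only the job submission step (e.g. via globusrun rather than a local mpirun) differs.

```c
/* Sketch: the application side of a distributed run.
 * Only standard MPI is used; whether the ranks are on one machine or
 * across sites (via MPICH-G2/Globus) is invisible to this code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int rank, size, len;
  char name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Get_processor_name(name, &len);

  /* Each rank could be at SDSC, NCSA, ...; the program cannot tell. */
  printf("rank %d of %d running on %s\n", rank, size, name);

  MPI_Finalize();
  return 0;
}
```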
22
Dynamic Adaptation
[Plot: transfer time adapting from 2 ghosts to 3 ghosts, then compression switched on.]
• Automatically adapt to bandwidth/latency issues
• Application has NO KNOWLEDGE of the machine(s) it is on, networks, etc.
• Adaptive techniques make NO assumptions about the network
• Issues:
  – More intelligent adaption algorithm
  – E.g. if network conditions change faster than the adaption …
• Next: real black hole run across Linux clusters for high quality viz data.
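A rough sketch of the extra-ghostzone idea on a 1D decomposition: with G ghost zones per side, neighbours only need to synchronise every G timesteps, so fewer, larger messages cross the slow WAN link. This is an illustration of the technique, not the actual Cactus driver code.

```c
/* Illustration of trading ghost-zone width against synchronisation
 * frequency on a slow WAN link (1D decomposition; not real Cactus code). */
#include <mpi.h>

#define NGHOST 3     /* ghost zones per side; exchange every NGHOST steps */
#define NLOCAL 1000  /* interior points owned by this rank */

static double u[NLOCAL + 2 * NGHOST];

static void exchange_ghosts(int left, int right)
{
  /* Send our edge interior points; receive the neighbour's into our ghosts. */
  MPI_Sendrecv(&u[NLOCAL],          NGHOST, MPI_DOUBLE, right, 0,
               &u[0],               NGHOST, MPI_DOUBLE, left,  0,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  MPI_Sendrecv(&u[NGHOST],          NGHOST, MPI_DOUBLE, left,  1,
               &u[NLOCAL + NGHOST], NGHOST, MPI_DOUBLE, right, 1,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

void evolve(int nsteps, int left, int right)
{
  for (int step = 0; step < nsteps; step++)
  {
    if (step % NGHOST == 0)   /* wider ghosts => less frequent WAN traffic */
      exchange_ghosts(left, right);

    /* Local update of u would go here; each skipped exchange shrinks the
       valid region by one point per side until the next synchronisation. */
  }
}
```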
23
Cactus Worm: Basic Scenario
• Cactus simulation starts
• Queries a Grid Information Server, finds resources
• Makes an intelligent decision to move
• Locates the new resource & migrates
• Registers its new location with the GIS
• Continues around Europe …
• Basic prototypical example of many things we want to do!
Live demo at http://www.cactuscode.org
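The worm's control flow can be sketched as a simple loop; every helper below (query_gis, better_resource, checkpoint, submit_and_exit) is a hypothetical placeholder for the real GIS query, checkpointing and Globus job-submission machinery.

```c
/* Sketch of the worm's migration logic. All helper functions are
 * hypothetical placeholders, not real Cactus or Globus calls. */
typedef struct { char host[256]; double score; } resource_t;

resource_t query_gis(void);                 /* ask the Grid Information Server */
int  better_resource(resource_t r);         /* worth moving? cost, queue, size */
void checkpoint(const char *file);          /* write a restartable checkpoint  */
void submit_and_exit(resource_t r, const char *file);  /* restart job there,
                                               registering the new location */

void worm_step(int iteration)
{
  if (iteration % 100 == 0)                 /* check occasionally, not every step */
  {
    resource_t r = query_gis();
    if (better_resource(r))
    {
      checkpoint("worm.chkpt");
      submit_and_exit(r, "worm.chkpt");
    }
  }
  /* ... otherwise continue the normal evolution step ... */
}
```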
24
Migration due to Contract Violation (Foster, Angulo, Cactus Team, …)
[Plot: performance while running at UIUC; an external load is applied, causing 3 successive contract violations; resource discovery & migration follows (migration time not to scale); the run continues at UC.]
25
Grid Application Toolkit
• The application developer should be able to build simulations such as these with tools that easily enable dynamic grid capabilities
• Want to build a programming API to easily incorporate:
  – Query information server (e.g. GIIS)
    · What's available for me? What software? How many processors?
  – Network monitoring
  – Resource brokering
  – Decision routines (thorns)
    · How to decide? Cost? Reliability? Size?
  – Spawning routines (thorns)
    · Now start this up over here, and that up over there
  – Authentication server
    · Issues commands, moves files on your behalf (can't pass on a Globus proxy)
  – Data transfer
    · Use whatever method is desired (GSI-ssh, GSI-ftp, streamed HDF5, scp, …)
  – Etc …
• Need to be able to test Grid tools as swappable plug-and-play
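A hypothetical sketch of what such an API could look like from the application side; the names below are invented to show the shape of a small, swappable toolkit interface and are not the actual GridLab GAT calls.

```c
/* Hypothetical application-level grid API; every name here is invented
 * to illustrate the idea of a thin, swappable toolkit interface. */
typedef struct gat_context  gat_context;
typedef struct gat_resource gat_resource;

gat_context  *gat_init(const char *config);
/* Query an information server: what resources match my requirements? */
gat_resource *gat_find_resource(gat_context *c, int min_procs, double max_cost);
/* Decision routine: is it worth migrating or spawning there? */
int           gat_should_migrate(gat_context *c, gat_resource *r);
/* Spawning routine: start a task on the chosen resource. */
int           gat_spawn(gat_context *c, gat_resource *r, const char *executable);
/* Data transfer, by whatever mechanism the toolkit selects underneath. */
int           gat_copy_file(gat_context *c, const char *src_url, const char *dst_url);
void          gat_finalize(gat_context *c);
```

The point of keeping the interface this small is that the underlying implementations (GIIS queries, brokers, GSI-ftp vs. scp, …) can be swapped in and out for testing without touching the application.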
26
GridLab: Enabling Dynamic Grid Applications
• Large EU project under negotiation with the EC
• Members: AEI, ZIB, PSNC, Lecce, Athens, Cardiff, Amsterdam, SZTAKI, Brno, ISI, Argonne, Wisconsin, Sun, Compaq
• Grid Application Toolkit for application developers and infrastructure (APIs/tools)
• Will be around 20 new Grid positions in Europe!! Look at www.gridlab.org for details
27
Credits