The Globus Toolkit: Description and Applications Review
Steve Tuecke & Ian Foster
Argonne National Laboratory
The University of Chicago
Globus Co-PI: Carl Kesselman, USC/ISI

Overview
• The need for Grid services
• The Globus toolkit
• Globus application case studies
  – Microtomography: on-line instrumentation
  – SF-Express and Overflow: distributed supercomputing
  – CAVERNsoft: collaborative engineering
  – Nimrod-G: high-throughput computing
  – ECCE’: problem solving environment
• Summary

Creating a Usable Grid: Grid Services (“Middleware”)
• Standard grid services that
  – Provide uniform, high-level access to a wide range of resources (including networks)
  – Address interdomain issues of security, policy, etc.
  – Permit application-level management and monitoring of end-to-end performance
• Middleware-level and higher-level APIs and tools targeted at application programmers
  – Map between application and Grid

Grid Services Architecture
• Applications: High-energy physics data analysis; Regional climate studies; Collaborative engineering; Parameter studies; On-line instrumentation
• Application Toolkit Layer: Distributed computing; Data-intensive; Collab. design; Remote viz; Remote control
• Grid Services Layer: Information; Resource mgmt; Security; Data access; Fault detection; ...
• Grid Fabric Layer: Transport; Multicast; Instrumentation; Control interfaces; QoS mechanisms

The Globus Project: Argonne, USC/ISI, NCSA, Aerospace, NASA Ames, LBNL, others
• Basic research in grid-related technologies
  – Resource management, security, adaptation, etc.
• Development of Globus toolkit
  – Core services for grid-enabled tools & applications
• Construction of large grid testbed: GUSTO
  – Largest grid testbed in terms of sites & apps
• Application experiments
  – Tele-immersion, distributed computing, etc.

GUSTO Testbed Map

Globus Grid Services
• The Globus toolkit provides a range of basic Grid services
  – Security, information, fault detection, communication, resource management, ...
• These services are simple and orthogonal
  – Can be used independently, mix and match
  – Programming model independent
• For each there are well-defined APIs
• Standards are used extensively
  – E.g., LDAP, GSS-API, X.509, ...

Grid Services Layer (1)
• Grid Security Infrastructure
  – Single sign-on, run anywhere [if authorized]
  – PKI, X.509 certificates
  – Identity/credential mapping at each resource
  – Allows programs to act as user for limited period: delegation of rights

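The two ideas on this slide that are easiest to misread are credential mapping and time-limited delegation. The sketch below is a toy model only: it performs no cryptography and is not the Globus API. The names `GRID_MAP`, `ProxyCredential`, `delegate`, and `authorize`, and the example subject string, are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-resource table: certificate subject -> local account.
GRID_MAP = {
    "/O=Grid/O=Example/CN=Alice Researcher": "alice",
}


@dataclass(frozen=True)
class ProxyCredential:
    subject: str      # identity of the user the program acts for
    not_after: float  # expiry time, in seconds since some epoch


def delegate(subject: str, now: float, lifetime_s: float) -> ProxyCredential:
    """Issue a short-lived credential that lets a program act as `subject`."""
    return ProxyCredential(subject=subject, not_after=now + lifetime_s)


def authorize(cred: ProxyCredential, now: float,
              grid_map=GRID_MAP) -> Optional[str]:
    """Return the local account for a valid credential, or None."""
    if now >= cred.not_after:
        return None  # delegation has expired
    # Unknown subjects map to None: not authorized on this resource.
    return grid_map.get(cred.subject)
```

Each resource keeps its own map, so the same credential can resolve to different local accounts at different sites, which is what makes single sign-on compatible with local administration.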
Grid Services Layer (2)
• Grid Information Service
  – Currently an LDAP-based directory service
  – Publish structure and state info, dynamic performance info, software info, etc.
  – Resource discovery: “find me an X with property Y available at time T”
  – Auto-configuration: “tell me what I need to know to use A efficiently/securely/...”
  – Gateways to other data sources required
  – Example of integrating “middleware” service

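The discovery query “find me an X with property Y available at time T” can be sketched as a filter over directory entries. This is an in-memory stand-in, not an LDAP client and not the Grid Information Service schema; `ResourceEntry`, `discover`, the attribute names, and the sample entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceEntry:
    name: str
    kind: str                                   # the "X": e.g. "computer"
    attributes: dict = field(default_factory=dict)
    free_from: float = 0.0                      # free in [free_from, free_until)
    free_until: float = float("inf")


def discover(directory, kind, min_attrs, at_time):
    """Names of entries of type `kind` that meet every minimum in
    `min_attrs` (the "Y") and are free at `at_time` (the "T")."""
    matches = []
    for entry in directory:
        if entry.kind != kind:
            continue
        # A missing attribute counts as 0, so it fails any positive minimum.
        if any(entry.attributes.get(k, 0) < v for k, v in min_attrs.items()):
            continue
        if not (entry.free_from <= at_time < entry.free_until):
            continue
        matches.append(entry.name)
    return matches


# Made-up sample directory for illustration.
DIRECTORY = [
    ResourceEntry("origin-a", "computer", {"cpus": 128, "memory_gb": 64}, 0, 100),
    ResourceEntry("sp-b", "computer", {"cpus": 512}, 50, 200),
    ResourceEntry("store-c", "storage", {"capacity_tb": 10}),
]
```

For example, `discover(DIRECTORY, "computer", {"cpus": 100}, at_time=60)` selects both computers, while the same query at `at_time=150` selects only `sp-b`.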
Grid Services Layer (3)
• Access to remote data (GASS)
  – Uniform access to diverse storage management systems
  – Cache management
  – Integration with SRB, DPSS, HPSS
• Communication (Nexus)
  – Application-level interfaces to communication services
  – Multiple methods: reliable/unreliable, IP/other, unicast/multicast
  – QoS interfaces

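Uniform access plus cache management can be pictured as one front end that hides the storage back end behind a fetch function and keeps local copies keyed by URL. The sketch below is an assumption-laden illustration, not the GASS interface: `RemoteFileCache`, its methods, and the `x-demo://` URL scheme in the usage note are invented here.

```python
class RemoteFileCache:
    """Toy URL-keyed cache in front of an arbitrary storage back end."""

    def __init__(self, fetch):
        # `fetch` is any callable url -> bytes; it stands in for whichever
        # storage system actually holds the data.
        self._fetch = fetch
        self._store = {}
        self.misses = 0

    def get(self, url):
        """Return the file contents, contacting the back end only on a miss."""
        if url not in self._store:
            self.misses += 1
            self._store[url] = self._fetch(url)
        return self._store[url]

    def evict(self, url):
        """Drop the local copy; the next get() fetches it again."""
        self._store.pop(url, None)
```

A caller would write `cache = RemoteFileCache(my_fetch)` and then `cache.get("x-demo://host/path")`; the application code stays the same whichever back end `my_fetch` talks to.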
Grid Services Layer (4)
• Globus Resource Allocation Manager (GRAM)
  – Uniform interface to resource management
• Globus Architecture for Reservation and Allocation
  – Co-allocation of compute resources
  – Immediate and advance reservation of network and computers in prototype form
• Fault detection service
• Network measurement tools
• Code management and distribution infrastructure

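Co-allocation with advance reservation means that a time window is booked on every required resource or on none of them. The following is a minimal single-process sketch of that all-or-nothing rule; it is not the Globus reservation interface, and `Resource` and `co_reserve` are hypothetical names. A real multi-site version would also need a commit protocol between sites, which is omitted here.

```python
class Resource:
    """A resource with a list of booked half-open intervals [start, end)."""

    def __init__(self, name):
        self.name = name
        self.bookings = []

    def is_free(self, start, end):
        # Free if the window overlaps no existing booking.
        return all(end <= s or start >= e for s, e in self.bookings)


def co_reserve(resources, start, end):
    """Reserve [start, end) on every resource, or on none of them."""
    if start >= end:
        raise ValueError("start must precede end")
    # Check everything first so a failure leaves no partial bookings.
    if not all(r.is_free(start, end) for r in resources):
        return False
    for r in resources:
        r.bookings.append((start, end))
    return True
```

Calling `co_reserve([computer, network], start, end)` with a future `start` models an advance reservation; an immediate reservation is the same call with `start` set to the current time.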
Application Toolkit Layer: e.g.
• Message Passing Interface
  – Multi-method communication, specialized
• CAVERNsoft
  – Shared state for collaborative environments
• Condor, Nimrod-G
  – High-throughput computing
• Parallel Application Workspace (PAWS)
  – High-speed parallel transfers for coupled apps

Globus Progress
• Selected “Grid Services” are being migrated into the infrastructure
  – Grid information service
  – Grid security infrastructure
  – Grid resource management services
• Simultaneously these and other Globus services are being applied to develop
  – Grid-enabled tools
  – Grid-enabled applications
• An ongoing iterative refinement process

Case Study 1: Online Instrumentation
Advanced Photon Source
DOE X-ray source grand challenge: ANL, USC/ISI, NIST, U.Chicago
Diagram labels: real-time collection; tomographic reconstruction; archival storage; wide-area dissemination; desktop & VR clients with shared controls

CMT Processing Now

Additional Opportunities
• End-to-end advance reservation of network, storage, computers
• Dynamic discovery and allocation of supercomputers, networks, etc.
• Adaptive determination of display resolution, reconstruction fidelity, etc.
• Reliable multicast for data, control, video
• Access control and discovery for collaborative sessions
• Integration with mass storage systems

Case Study 2: Distributed Supercomputing
SF-Express Distributed Interactive Simulation: Caltech, USC/ISI
• Starting point: SF-Express parallel simulation code
• Globus mechanisms for
  – Resource allocation
  – Distributed startup
  – I/O and configuration
  – Fault detection
• 100K vehicles (2002 goal) using 13 computers, 1386 nodes, 9 sites
Diagram labels: NCSA Origin; Caltech Exemplar; CEWES SP; Maui SP

OVERFLOW simulation: NASA Ames
• OVERFLOW with latency-tolerant algorithms
• MPICH-G “Grid-enabled” message passing
• Globus services: Security; Directory; Scheduling; Process mgmt; Communication
• Machines: ARC SGI O2000 (California); Argonne SGI O2000 (Illinois)

Case Study 3: Collaborative Engineering
CAVERNsoft: UIC Electronic Visualization Laboratory
• Manipulate shared virtual space, with
  – Simulation components
  – Multiple flows: Control, Text, Video, Audio, Database, Simulation, Tracking, Haptics, Rendering
• Uses Globus comms: (un)reliable uni/multicast
• Future: Security, QoS, allocation, reservation

Case Study 4: High-Throughput Computing
Nimrod-G: Monash University
• Schedule many independent tasks (e.g., parameter study)
• Uses Globus security, discovery, data access, scheduling
• Future: Reservation, accounting, code management, etc.
Diagram labels: Cost; Deadline; Available Machines

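The slide names three scheduling inputs: cost, deadline, and available machines. One simple way to combine them for independent tasks is a cheapest-first plan, sketched below. This is an illustrative greedy scheme, not the scheduler Nimrod-G actually uses; `Machine`, `plan`, and all the numbers in the usage note are assumptions. It further assumes each machine runs its tasks one after another while machines run in parallel.

```python
import math
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    seconds_per_task: float
    cost_per_task: float


def plan(machines, n_tasks, deadline_s):
    """Fill the cheapest machines first, each up to the number of tasks it
    can finish before the deadline.

    Returns ({machine name: task count}, total cost), or None if the
    machines together cannot finish n_tasks in time.
    """
    assignment = {}
    total_cost = 0.0
    remaining = n_tasks
    for m in sorted(machines, key=lambda m: m.cost_per_task):
        if remaining == 0:
            break
        # Tasks this machine can complete sequentially before the deadline.
        capacity = math.floor(deadline_s / m.seconds_per_task)
        take = min(capacity, remaining)
        if take > 0:
            assignment[m.name] = take
            total_cost += take * m.cost_per_task
            remaining -= take
    return (assignment, total_cost) if remaining == 0 else None
```

With a slow machine at 10 s and cost 1 per task and a fast one at 2 s and cost 5 per task, `plan(machines, 12, 100)` puts 10 tasks on the slow machine and 2 on the fast one. Tightening the deadline shifts work toward the faster, more expensive machine.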
Case Study 5: Problem Solving Environment
ECCE’: Pacific Northwest National Laboratory
• Problem solving environment for comp. chemistry
• Globus services used for authentication, remote job submission, monitoring, and control
• Future: distributed data archive, resource discovery, charging

Summary
• Grids require Grid services that make resources accessible and usable and Grid toolkits for application development
• The Globus project is building essential services and partnering with tool developers
• Significant success stories in a range of problem classes
• We’re looking forward to working with applications throughout the community!