The Grid - Multi-Domain Distributed Computing Kai Rasmussen Paul Ruggieri
Topic Overview The Grid Types Virtual Organizations Security Real Examples Grid Tools Condor Cactus Cactus-G Globus OGSA
The Grid What is a Grid system? A highly heterogeneous set of resources that may span multiple administrative domains Early idea: computational resources would be as universally available as electric power
“A hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” - Ian Foster Resources are distributed across sites and organizations with no centralized point of control What constitutes a Grid? Resources coordinated without being subjected to centralized control Uses standard, open protocols and interfaces Delivers non-trivial qualities of service
Grid Types Computational Grids Resource: pure CPU Strength: computationally intensive applications Data Grids Shared storage and data Terabytes of storage space Sharing of data among collaborators Fault tolerance Equipment Grids Set of resources that surround shared equipment, such as a telescope
Virtual Organizations Grids are multi-domain Resources are administered by separate departments or institutions All wish to maintain individual control There is a cross-site grouping of collaborators sharing resources: a “Virtual Organization”
Virtual Organizations Users of VOs share a common goal and trust Collection of resources, users and rules governing sharing Highly controlled - What is shared? Who is sharing? How can resources be used? One global domain acting over individual collaborating domains
Grid Security Highly distributed nature VOs spread over many security domains Authentication Proving identity Authorization Obtaining privileges Confidentiality & Integrity Identity and privileges can be trusted
Authentication Certificate Authority (CA) Entity that signs a certificate proving a user's identity The certificate is then used as credentials to use the system Typically several CAs to prevent a single point of failure/attack Globus Grid Security Infrastructure (GSI) Globus's authentication component Global security credential later mapped to local Kerberos tickets or a local username and password Typically a short-term proxy certificate is generated from the long-term certificate
Authentication Certification Authority Coordination Group Maintains a global infrastructure of trusted CA agents CAs must meet standards Physically secure Must validate identity with Registration Authorities using official documents or photographic identification Private keys must be a minimum of 1024 bits with a maximum 1-year lifetime 28 approved CAs in the European Union
Security Issues Delegation User entrusts a separate entity to perform a task The entity must be given certification and trusted to behave Limit the proxy's strength Endow the proxy with a specific purpose
Grid Projects EGEE - Enabling Grids for E-sciencE 70 sites in over 27 countries Mostly European 40 Virtual Organizations The GENIUS Grid Portal is used for submission Individual collaborators use their own middleware tools to group resources
LCG Large Hadron Collider Computing Grid Developed the distributed systems needed to support the computation and data needs of LHC physics experiments EGEE collaborator 100 sites World's largest Grid
Grid 2003 US effort 27 national sites Infrastructure for the Particle Physics Grid and Virtual Data Grid Laboratory Developed the Application Grid Laboratory - Grid3 Platform for experimental CS research Built on the Virtual Data Toolkit A collection of Globus, Condor and other middleware tools
TeraGrid 40 Teraflops of Computational Power 8 National Sites with strong backbone Used for NSF sponsored High Performance Computing Mapping the human arterial tree model TeraShake - Earthquake simulation
Applications Climate Monitoring + Simulation Network Weather Service Climate Data-Analysis Tool Both run on the Earth System Grid running on Globus MEANDER nowcast meteorology Run on Hungarian Supergrid ATLAS Challenge Simulate high energy proton-proton collisions Computational Science Simulations Biology, Fluid Dynamics
Grid Tools Many middleware implementations Globus Condor Condor-G Cactus-G OGSA Solves common Grid problems Resource discovery/management/allocation Security/Authentication
Condor Initially developed in 1983 at the University of Wisconsin Pre-Grid tool A local resource management system Allows creation of communities with distributed resources Communities should grow naturally Sharing as much or as little as they care to Sounds like Virtual Organizations
Condor Responsibilities Job management and scheduling Resource monitoring and management Checkpointing and migration Utilizes idle CPU cycles (“cycle scavenging”)
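A job enters Condor through a submit description file. A minimal, hypothetical example (the executable and file names are illustrative, not from the slides):

```
# Hypothetical Condor submit description file (names are illustrative)
universe   = vanilla        # run an ordinary, unmodified executable
executable = simulate
arguments  = --trials 1000
output     = simulate.out   # job's stdout
error      = simulate.err   # job's stderr
log        = simulate.log   # Condor's own event log for the job
queue                       # submit one instance
```

Condor matches this job against an idle machine in the pool, runs it there, and (in the checkpointing universes) can migrate it if the machine's owner returns.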
Condor Pool The full set of users and resources in a community Composed of three entities Agent Finds resources and executes jobs Resource Advertises itself and how it can be used in the pool Matchmaker Knows of all agents and resources Puts together compatible pairs A pool is defined by a single matchmaker
Matchmaking Problem of centralized scheduling Resources have multiple owners Unique use requirements Matchmaking finds a balance between user and resource needs ClassAds Agents advertise their requirements Resources advertise how they can be used
Matchmaking Matchmaker scans all known ClassAds Creates matching pairs of agents and resources Informs both parties Each party is then responsible for negotiating and initiating execution of the job Separation of matching and claiming Matchmaker unaware of complicated allocation Stale information may exist; a resource can deny a match
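The scan-and-pair step can be sketched as follows. This is an illustrative toy, not Condor's actual ClassAd engine: real ClassAds are full expressions, and here "requirements" are reduced to key/value equality checks. All names are hypothetical.

```python
# Toy matchmaker sketch (illustrative; real Condor evaluates ClassAd
# expressions, not simple key/value equality).
def satisfies(requirements, ad):
    """True if every required key/value pair appears in the ad."""
    return all(ad.get(key) == value for key, value in requirements.items())

def matchmake(agents, resources):
    """Pair each agent with the first unclaimed, mutually satisfied resource."""
    matches, claimed = [], set()
    for agent in agents:
        for i, resource in enumerate(resources):
            if i in claimed:
                continue
            # Both sides must be satisfied: the agent's requirements against
            # the resource's ad, and the resource's against the agent's.
            if (satisfies(agent["requirements"], resource["ad"]) and
                    satisfies(resource["requirements"], agent["ad"])):
                matches.append((agent["name"], resource["name"]))
                claimed.add(i)
                break
    return matches

agents = [{"name": "job1",
           "ad": {"owner": "alice"},
           "requirements": {"arch": "x86_64"}}]
resources = [{"name": "node7",
              "ad": {"arch": "x86_64"},
              "requirements": {"owner": "alice"}}]
print(matchmake(agents, resources))  # -> [('job1', 'node7')]
```

Note that the matchmaker only proposes pairs; claiming happens afterwards between the two parties, which is why a resource can still deny a match made on stale information.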
Condor Flocking Linking Condor pools is necessary for collaboration Sharing of resources beyond the organizational level Individuals belonging to multiple communities Gateway Flocking Entire communities are linked Direct Flocking Individual collaborators belong to many pools
Gateway Flocking Gateway entity serves as a singular point of access for cross pool communication Matchmakers talk to Gateways Gateways talk to Gateways Transparent to user Organizational level sharing Powerful, but difficult to setup and maintain
Gateway Flocking
Direct Flocking Agents report to multiple matchmakers Individual collaboration A natural idea for users Less powerful but simpler to build and deploy Eventually used in favor of Gateway Flocking
Direct Flocking
Cactus General-purpose, open-source parallel computation framework Developed for the numerical solution of Einstein's equations Two main components: flesh and thorns Flesh – central core Thorns – application modules Provides a simple abstract API Hides MPI parallel driver, I/O (thorns)
Cactus-G “Grid-enabled” Cactus Combines Cactus and MPICH-G2 (more later) Layered approach Application thorns Grid-aware infrastructure thorns Grid-enabled communication library (MPICH-G2 in this case)
Globus Condor Pre-Grid tool applied to Grid systems Multi-domain possible but limited No security; focus primarily on resource management Globus Set of Grid-specific tools Extensible and hierarchical
The Toolkit Globus Toolkit Components for basic security, resource management, etc Well defined interfaces - “Hour-glass” architecture Local services sit behind API Global services built on top of these local services Interfaces useful to manage heterogeneity Information Service integral component Information-rich environment needed
Globus Services
Resource Management Globus Resource Allocation Manager (GRAM) Responsible for a set of local resources Single domain Implemented with a set of local RM tools Condor, NQE, Fork, Easy-LL, etc. Resource requests expressed in the Resource Specification Language (RSL)
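A request handed to a GRAM looks roughly like the following RSL sketch. The attributes shown (executable, count, maxMemory, arguments) are standard GRAM RSL attributes, but the values and paths are illustrative:

```
& (executable = /home/user/sim)
  (count = 4)
  (maxMemory = 256)
  (arguments = "-input" "data.txt")
```

The GRAM translates such a request into whatever its local resource manager (Condor, Fork, a batch queue, etc.) understands.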
Resource Broker Manages RSL requests Uses Information Services to discover GRAMs Transforms abstract RSLs into more specific requirements Sends allocation requests to the appropriate GRAM
Information Service The Grid is always in flux An information-rich system produces information users find useful Enhances flexibility and performance A necessity for administration Globus Metacomputing Directory Service (MDS) Stores and makes accessible Grid information Lightweight Directory Access Protocol (LDAP) Extensible representation for information Stores component information in a directory information tree
Security Local heterogeneity Resources operated in multiple security domains All use different authentication techniques N-way authentication A job may be any number of processes on any number of resources One logical entity; the user should only authenticate once
Security Grid Security Infrastructure (GSI) Modular design constructed on top of local services Solves local heterogeneity Globus identity Mapped into local user identities by the local GSI Allows for n-way authentication
OGSA Open Grid Services Architecture Defines a Grid Service Provides standard interface for naming, creating, discovering a Grid Service Location Transparent Globus Toolkit GRAM – resource allocation/management MDS-2 – information discovery GSI – authentication (single sign-on) Web services Widely used Language/system independent
OGSA – Grid Service Interface
OGSA – VO Structure
Condor-G Hybrid Condor-Globus system Local Condor agent (Condor-G) Communicates with Globus GRAM, MDS, GSI, etc. Optimized Globus's GRAM to work better with Condor
Specific Testbed Grid2003 Organized into 6 VOs (one for each application) At each VO site, middleware installed with grid certificate databases GSI, GRAM, GridFTP and MDS used from Globus MonALISA agent-based monitoring used in conjunction with MDS
MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface Nicholas Karonis, Brian Toonen, Ian Foster
Abstract Grid Enabled MPI implementation Extends MPICH Utilizes Globus Toolkit Authentication, Authorization, Resource Allocation, Executable Staging, I/O, Process management creation and control Hide/Expose critical aspects of heterogeneous environment
The Problem Grids are difficult to program for… heterogeneous, highly distributed Build on the existing MPI API MPICH specifically Can we implement MPI constructs in a highly heterogeneous environment efficiently and transparently? Yes, use Globus! Can we also allow users to manage heterogeneity? Yes, the existing MPI communicator construct!
MPICH-G2 Grid Security Infrastructure (GSI) Single sign-on authentication Monitoring and Discovery Service (MDS) Selects nodes to execute on Resource Specification Language (RSL) Generated by mpirun Specifies job resource requirements Dynamically-Updated Request Online Coallocator (DUROC)
MPICH-G2 Flow Diagram
MPICH-G2 Improvements Replaces MPICH-G Replaced the use of Nexus (Globus) for all communication with optimized code Increased bandwidth Cut out the extra layer (Nexus) Reduced intra-machine vendor-MPI messaging latency Eliminated unnecessary polling based on source rank info (for Recv) Specified, specified-pending, multimethod (more later) Only poll TCP (expensive) when necessary (i.e. when using TCP, not vendor MPI)
MPICH-G2 Improvements 2 More efficient use of sockets Uses one socket for both directions Multilevel topology-aware collective operations Collective operations originally implemented assuming equidistant processes Not likely in a Grid scenario
App Heterogeneity Management Topology discovery Need a method of discovering topology to minimize expensive transfers intra-site communication vs. intra-machine communication Uses the existing MPI communicator construct Associates attributes with communicators Topology depths and colors Allows MPI developers to create communicators which group processes topologically
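The color scheme can be illustrated with a small sketch. In real MPICH-G2 the depths and colors are communicator attributes and the grouping is done with MPI_Comm_split; this toy just shows the grouping rule, namely that two processes belong to the same group at a level only if their colors agree at that level and at every level above it. The ranks and color values are invented for illustration.

```python
# Topology-based grouping sketch (illustrative; real code would call
# MPI_Comm_split with a color derived from the communicator attributes).
from collections import defaultdict

def group_by_level(colors, level):
    """Group ranks sharing the same colors down to the given level.

    colors[rank] is that process's color tuple, e.g. (site, machine).
    A deeper level's color is only meaningful within its parent group,
    so the grouping key is the color prefix up to and including `level`.
    """
    groups = defaultdict(list)
    for rank, levels in enumerate(colors):
        groups[tuple(levels[:level + 1])].append(rank)
    return dict(groups)

# Four processes: ranks 0-1 at site 0, ranks 2-3 at site 1;
# within site 1, ranks 2 and 3 share a machine.
colors = [(0, 0), (0, 1), (1, 0), (1, 0)]
print(group_by_level(colors, 0))  # site level: {(0,): [0, 1], (1,): [2, 3]}
print(group_by_level(colors, 1))  # machine level: rank 2 and 3 grouped
```

A collective operation can then move data once per wide-area group, then fan out within each site and machine, instead of treating all pairs as equidistant.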
Example MPICH-G2 App
Performance Groupings Specified MPI_Recv explicitly specifies a process on the same machine No outstanding asynchronous operations Explicitly calls vendor MPI Specified-pending MPI_Recv explicitly specifies a process on the same machine Outstanding recv requests on the same machine Forced to continuously poll vendor MPI Multimethod MPI_Recv source rank is MPI_ANY_SOURCE OR there are outstanding recv requests which may require TCP Forced to continuously poll vendor MPI and TCP
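The three groupings amount to a classification rule for each receive, which can be sketched as follows (an illustrative reconstruction of the decision, not MPICH-G2's actual code; all names are hypothetical):

```python
# Receive-path classification sketch (illustrative).  TCP polling is the
# expensive case, so it is taken only when a message could arrive via TCP.
ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE

def receive_path(source, same_machine, pending_local, pending_tcp):
    """Classify a receive by which channels must be watched.

    source        -- rank named by MPI_Recv, or ANY_SOURCE
    same_machine  -- predicate: is this rank reachable via vendor MPI?
    pending_local -- outstanding recv requests on the same machine?
    pending_tcp   -- outstanding recv requests that may require TCP?
    """
    if source == ANY_SOURCE or pending_tcp or not same_machine(source):
        return "multimethod"         # poll vendor MPI and TCP
    if pending_local:
        return "specified-pending"   # continuously poll vendor MPI only
    return "specified"               # single blocking vendor-MPI receive

local = lambda rank: True  # toy predicate: every rank is on this machine
print(receive_path(3, local, False, False))  # -> specified
```

The payoff is that the common intra-machine case never touches the TCP path at all.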
Vendor MPI Results Increased performance compared to MPICH-G Relatively close performance to straight vendor MPI
Vendor MPI Results
TCP/IP Results Similar results to the vendor MPI case (less interesting) The authors explicitly say they did not attempt to modify the TCP code
TCP/IP Results
Conclusions Good performance Improved performance compared to the previous version “Good enough” performance to justify use Eases the transition of MPI applications to the Grid context Just works Provides the developer with a relatively simple means of writing “smart” apps which are aware of their topology
P-GRADE Portal
MTA SZTAKI Computer and Automation Research Institute of the Hungarian Academy of Sciences Laboratory of Parallel and Distributed Computing Peter Kacsuk Joszef Patvarczki HunGrid Member of both SEE-Grid and EGEE
Two Grid Problems Middleware tools are built together into a Grid Too many complex parts Confusing for users with little experience Mostly research scientists PVM and MPI allow for parallel execution Execution within a single Globus or Condor site shows good performance Performance decreases when executed across multiple sites
P-GRADE Portal A web-based portal for accessing the Grid High-level tools hide the complexity of the middleware Can be accessed from anywhere Workflow solution Complex problems are broken into several parts treated as a single framework Executed as an acyclic graph Parallelism at two levels Independent branches run on several grid sites Individual nodes can be parallel programs (MPI or PVM)
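The acyclic-graph execution model can be sketched in a few lines: jobs whose dependencies are all complete form a wave, and everything in a wave (the independent branches) can be dispatched to different grid sites in parallel. This is an illustrative toy, not P-GRADE's scheduler; the job names are invented.

```python
# Workflow-as-DAG sketch (illustrative).  deps maps each job to the set
# of jobs it depends on; each returned wave can run fully in parallel.
def execution_waves(deps):
    done, waves = set(), []
    while len(done) < len(deps):
        # A job is ready once all of its prerequisites have completed.
        ready = {job for job, reqs in deps.items()
                 if job not in done and reqs <= done}
        if not ready:
            raise ValueError("cycle detected - workflow must be acyclic")
        waves.append(sorted(ready))
        done |= ready
    return waves

# Diamond-shaped workflow: B and C are independent branches.
deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(execution_waves(deps))  # -> [['A'], ['B', 'C'], ['D']]
```

Each node in such a graph may itself be an MPI or PVM program, which is where the second level of parallelism comes from.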
Portal Fully functional; built upon middleware tools Grid Certificate management Setting up Grid environment Creation and modification of workflow apps Management and parallel execution of workflow apps on grid resources Visualization of workflow progress
Grid Certificate Security done through Globus GSI Connect to Proxy server; download Certificate Monitor status
Resource Management Use Globus tools to attach jobs to resources Two Strategies Static Allocation Connect Directly to GRAM Servers Dynamic Allocation Connect to MDS service Allocate through Grid resource broker
Workflow Creation & Monitoring P-GRADE Java app for creating parallel workflows Directed input and output files
Parameter Study A single job run under varying input parameters Outputs later compared against each other A logical Grid application Each job is independent and can be run in parallel
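A parameter study is just one job template expanded over the cross product of its parameter ranges; since no instance depends on another, every instance can go to a separate grid resource. A minimal sketch (the command template and parameter names are illustrative):

```python
# Parameter-study expansion sketch (illustrative).
from itertools import product

def expand_study(template, parameters):
    """Yield one concrete command line per parameter combination."""
    names = sorted(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield template.format(**dict(zip(names, values)))

jobs = list(expand_study("simulate --temp {temp} --pressure {pressure}",
                         {"temp": [280, 300], "pressure": [1, 2]}))
print(len(jobs))  # -> 4 independent jobs
```

The portal's job is then bookkeeping: submitting the expanded set, tracking each instance's status, and collecting the outputs for comparison.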
P-GRADE Portal w/ PStudy Adapted the Portal to create and manage parametric studies New workflow editor Creation of parameterized input files Management of parameter values Workflow management Submit workflows by parameter ranges Compare outputs Monitor individual job status
Pstudy Manager
Visualization
P-GRADE Demo