Introduction to the gLite GRID Environment
Riccardo Rotondo (riccardo.rotondo@garr.it), Consortium GARR
Tutorial for Grid Application Porting on Grid Science Gateway
Bogotá, 04.06.2012

The Grid
- A GRID is a distributed computing and storage infrastructure, spanning several administrative domains, that allows a set of homogeneous users organized within Virtual Organizations to share resources in a coordinated manner
- A GRID provides access to a large variety of resources and adds value beyond the bare sum of its components
- GRIDs are the key enabler of e-Science

A GRID Metaphor
Using a PC or a workstation:
- Log in using username & password (“Authentication”)
- Own some rights (“Authorisation”)
- Run programs or jobs
- Manage files: create, read, list
- Components are interconnected by a bus
- You are using the operating system
- There is only one administrative domain
Using the GRID:
- Log in using digital credentials (“Authentication”)
- Own some rights (“Authorisation”)
- Run programs or jobs
- Manage files: create, read, list
- Components are interconnected by the Internet
- You are using the GRID middleware
- There are many administrative domains

GRID Requirements
- Heterogeneity (OSes, Devs, Apps.)
- VO Resource Sharing (Management, Security and Accounting)
- Resource Utilisation (Reservation, Metering, Monitoring and Logging)
- Job Execution (VO access, QoS, LCM, WFM, SLA)
- Data Services (Integration, Provisioning, Cataloguing, Metadata)
- Security (Authentication, Authorisation and Auditing)
- Administrative Costs (Provisioning, Deployment and Configuration)
- Scalability
- Availability (Disaster Recovery, Fault Management)
- Specific requirements (EGI: HEP, BioMed, …)
Abbreviations:
- QoS: Quality of Service
- LCM: Local Credential Mapping
- WFM: Workflow Management system
- SLA: Service Level Agreement

gLite Services
[Figure: gLite services overview, including the worker node]

Grid components
- Security: Authorization and Authentication
  - User/Host/Robot certificates
  - CA/RA concepts
  - Proxy certificates
  - IGTF
  - Authorization providers (VOMS)
- Access
- Information System
- Job Management
- Data/Metadata

AuthN/AuthZ
- Resources are generally owned by VOs, which grant access based on the user's “role” and/or membership of a specific “group”
- Every user, server or service is identified by a digital certificate (X.509) certifying its identity (Authentication)
- Access to resources takes place in a secure way (integrity, confidentiality)
- Each VO associates resource access rights according to the user's “group” and “role” (Authorization)
- Authorization granularity can go down to the single-user level

Certificate issuing
User certificates
- The user is identified by a Registration Authority (RA)
- The RA releases a PIN
- The user asks the CA for a personal certificate using the PIN
- The request is acknowledged by mail exchanges
- The user receives the certificate
Host certificates
- They are linked to the ‘hostname’ of the server
Robot certificates
- Certificates securely stored in hardware devices protected by a PIN
- Mostly used by GRID service providers
- Not all CAs support Robot certificates yet

Certificate Proxies
- Personal certificates are not directly exposed
- Most Grids use temporary certificates (proxies)
  - Normal lifetime: 12 h
  - The original certificate is not exposed
- Proxies are certificates digitally signed by the original certificate or by another proxy (delegation)
- GRID services may operate on the user's behalf (SSO)
- Proxies may be securely stored (e.g. Globus and gLite: MyProxy)
- Stored proxies may be used to renew other proxies
[Figure: chain of trust. CA (self-signed) → user certificate (digitally signed by the CA) → proxy (digitally signed by the user certificate) → … (each further proxy digitally signed by the previous certificate)]

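The delegation chain on this slide can be illustrated with a small model. This is only a sketch: it performs no real cryptography, the `Credential` class and the subject names are hypothetical, and the rule that a proxy never outlives its signer is a modelling assumption rather than something stated on the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Credential:
    subject: str
    issuer: Optional["Credential"]  # None marks the self-signed CA root
    not_after: datetime

    def delegate(self, now: datetime,
                 lifetime: timedelta = timedelta(hours=12)) -> "Credential":
        # Assumption: a proxy cannot outlive the credential that signed it.
        expiry = min(now + lifetime, self.not_after)
        return Credential(f"{self.subject}/CN=proxy", self, expiry)


def chain_is_valid(cred: Credential, now: datetime) -> bool:
    # Walk from the proxy back to the root; every link must be unexpired.
    link = cred
    while link is not None:
        if now >= link.not_after:
            return False
        link = link.issuer
    return True


now = datetime(2012, 6, 4, 9, 0)
ca = Credential("/CN=Example CA", None, now + timedelta(days=3650))
user = Credential("/CN=Example User", ca, now + timedelta(days=365))
proxy = user.delegate(now)        # signed by the user certificate
delegated = proxy.delegate(now)   # signed by the previous proxy

print(chain_is_valid(delegated, now + timedelta(hours=1)))
print(chain_is_valid(delegated, now + timedelta(hours=13)))
```

The second check fails because the 12 h proxies have expired, even though the user certificate and the CA are still valid. This is the situation in which a stored proxy would be used for renewal.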
IGTF
- Most GRID infrastructures accept only certificates released by accredited Certification Authorities (CAs)
- The International Grid Trust Federation (IGTF) collects all accredited CAs
- The creation of CAs is encouraged while NGIs are being developed

Authorization providers
- VOs own physical resources
- GRID authorization services guarantee the correct user access rights by mapping users to configured ‘pool accounts’
- Most Grid infrastructures use VOMS (Virtual Organization Membership Service)
  - Allows the creation of groups of users
  - Allows the creation of different roles within existing groups
- Before accessing VO resources, users must request membership and agree to the AUP
- A GRID site may support one or more VOs
- VOMS extends the proxy certificate with further information:
  - User group
  - User role
  - Expiration of VO resource access

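The mapping from VOMS group and role to a pool account could look roughly like the sketch below. The VO name `examplevo`, the policy table, the account prefixes and the `PoolMapper` class are all hypothetical. Only the attribute layout `/vo/group/Role=role` follows the usual VOMS convention, and real sites delegate this mapping to dedicated middleware rather than code like this.

```python
from itertools import count

# Hypothetical site policy: (VO, group, role) -> pool account prefix.
POLICY = {
    ("examplevo", "/examplevo", None): "examplevo",
    ("examplevo", "/examplevo/analysis", "production"): "exampleprd",
}


def parse_fqan(fqan: str):
    """Split an attribute such as '/examplevo/analysis/Role=production'."""
    role = None
    groups = []
    for part in (p for p in fqan.split("/") if p):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif not part.startswith("Capability="):
            groups.append(part)
    return groups[0], "/" + "/".join(groups), role


class PoolMapper:
    def __init__(self, policy):
        self.policy = policy
        self.leases = {}    # (user DN, prefix) -> assigned pool account
        self.counters = {}  # prefix -> counter for the next free account

    def map_user(self, dn: str, fqan: str) -> str:
        vo, group, role = parse_fqan(fqan)
        prefix = self.policy.get((vo, group, role))
        if prefix is None:
            raise PermissionError(f"no mapping for {fqan}")
        key = (dn, prefix)
        if key not in self.leases:
            number = next(self.counters.setdefault(prefix, count(1)))
            self.leases[key] = f"{prefix}{number:03d}"
        return self.leases[key]


mapper = PoolMapper(POLICY)
print(mapper.map_user("/CN=Alice", "/examplevo/Role=NULL"))
print(mapper.map_user("/CN=Bob", "/examplevo/analysis/Role=production"))
print(mapper.map_user("/CN=Alice", "/examplevo/Role=NULL"))
```

The same user with the same group and role always gets the same account back, while a user presenting attributes the site does not recognise is rejected.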
Grid components
- Security
- Access
  - The User Interface
  - Grid Portals
  - APIs
  - Science Gateways
- Information System
- Job Management
- Data/Metadata

User Interface
- Most GRID infrastructures provide a CLI ($> cmd)
  - Unix/Windows/Mac machine with client applications installed
  - User account created after subscribing to a VO
- User interfaces can be
  - Centralized servers (many users)
  - Virtualized machines (a single user or a few users)
  - Software packages (single user)
- High-level user interfaces (GUI)
  - Applications offering a graphical front end to existing UI client applications

Grid portals
- Web front end to GRID capabilities
- Offer a generic interface to GRID resources
- Require a user certificate configured in the web browser
- Examples: Genius Web Portal, P-GRADE

Science Gateways
- A community-developed set of tools, applications, and data integrated via a portal or a suite of applications
- No general-purpose GRID interaction
- Users are no longer required to deal with digital certificates
- Users only need to belong to a community through an Identity Federation

Grid components
- Security
- Access
- Information System
  - GRID Information System (GLUE)
  - Berkeley Database Information Index (BDII)
- Job Management
- Data/Metadata

GLUE schema
- Most GRID infrastructures use the GLUE (Grid Laboratory Uniform Environment) schema to represent resource information
- The GLUE schema is an abstract model of Grid resources developed by the Open Grid Forum (OGF)
- There are many implementations of the GLUE schema: LDAP, RDBMS, XML, …
- The best-known implementation of the GLUE schema is the BDII
[Figure: UML representation]

Information System and Monitoring
- Berkeley Database Information Index (BDII): the LDAP implementation of GLUE
- Information is stored hierarchically in a tree model
  - GRIS: stores information at resource level
  - Site BDII / GIIS (deprecated): stores information at site level
  - BDII: stores information at VO level
[Figure: tree with VO level, site level and resource level]

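The three levels of the tree can be pictured as successive aggregations of the same records. The sketch below uses plain Python dictionaries instead of LDAP; the host names, site names, VO names and field names are hypothetical and do not follow the actual GLUE attribute names.

```python
from collections import defaultdict

# Resource level: one record per resource, as a GRIS-like provider might publish.
RESOURCE_LEVEL = [
    {"host": "ce01.site-a.example", "site": "SITE-A", "free_slots": 12,
     "vos": {"examplevo"}},
    {"host": "ce02.site-a.example", "site": "SITE-A", "free_slots": 0,
     "vos": {"examplevo", "othervo"}},
    {"host": "ce01.site-b.example", "site": "SITE-B", "free_slots": 5,
     "vos": {"othervo"}},
]


def site_level(records):
    """Group resource records by site (the role of a site-level index)."""
    sites = defaultdict(list)
    for record in records:
        sites[record["site"]].append(record)
    return dict(sites)


def vo_level(sites, vo):
    """Collect, per site, the hosts that support the given VO."""
    view = {}
    for site, records in sites.items():
        hosts = sorted(r["host"] for r in records if vo in r["vos"])
        if hosts:
            view[site] = hosts
    return view


sites = site_level(RESOURCE_LEVEL)
print(vo_level(sites, "examplevo"))
```

Each level only reads what the level below publishes, which is why a client querying the top of the tree sees every resource available to its VO.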
Grid components
- Security
- Access
- Information System
- Job Management
  - GRID Job Workflow
  - Resource Manager
  - Computing Element
- Data/Metadata

Overview of a GRID job
[Figure: job flow between the user interfaces ($> cmd, http://www), the Resource Manager, the CEs, the computer farms with their WNs, and the job output]
Job states: SUBMITTED, WAIT, READY, SCHEDULED, RUNNING, DONE (OK), DONE (Failed), CLEARED, CANCELLED, ABORTED

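The job states listed above can be read as a small state machine. In the sketch below the state names come from the slide, but the transition table is an assumption made for illustration: a linear path from SUBMITTED to CLEARED, with cancellation or abortion possible from any state before completion. The real middleware allows more transitions, for example resubmission.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "SUBMITTED"
    WAIT = "WAIT"
    READY = "READY"
    SCHEDULED = "SCHEDULED"
    RUNNING = "RUNNING"
    DONE_OK = "DONE (OK)"
    DONE_FAILED = "DONE (Failed)"
    CLEARED = "CLEARED"
    CANCELLED = "CANCELLED"
    ABORTED = "ABORTED"


S = JobState
PIPELINE = [S.SUBMITTED, S.WAIT, S.READY, S.SCHEDULED, S.RUNNING]

# Assumed transition table (not taken from the slide).
ALLOWED = {state: {S.CANCELLED, S.ABORTED} for state in PIPELINE}
for current, following in zip(PIPELINE, PIPELINE[1:]):
    ALLOWED[current].add(following)
ALLOWED[S.RUNNING] |= {S.DONE_OK, S.DONE_FAILED}
ALLOWED[S.DONE_OK] = {S.CLEARED}  # the user has retrieved the output


def advance(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def is_final(state: JobState) -> bool:
    """A state is final when no transition leaves it."""
    return not ALLOWED.get(state)


state = S.SUBMITTED
for nxt in PIPELINE[1:] + [S.DONE_OK, S.CLEARED]:
    state = advance(state, nxt)
print(state.value)
```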
Resource Manager
- Set of middleware components responsible for the distribution and management of jobs across Grid resources
- Two main components
  - Workload Manager: accepts and satisfies requests for job management; matchmaking is the process of assigning the best available resource
  - Logging & Bookkeeping: keeps track of job execution in terms of events (Submitted, Running, Done, Abort)

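Matchmaking can be sketched as a filter followed by a ranking. The record fields, host names and the ranking rule below (shortest queue first, then most free slots) are assumptions chosen for illustration; they are not the Workload Manager's actual policy.

```python
def matchmake(job, computing_elements):
    """Return the best CE for the job, or None if nothing matches."""
    # Filter: the CE must support the job's VO and have enough free slots.
    eligible = [
        ce for ce in computing_elements
        if job["vo"] in ce["vos"] and ce["free_slots"] >= job["slots"]
    ]
    # Assumed ranking: shortest queue first, then the most free slots.
    return min(
        eligible,
        key=lambda ce: (ce["queued"], -ce["free_slots"]),
        default=None,
    )


CES = [
    {"name": "ce01.site-a.example", "vos": {"examplevo"},
     "free_slots": 4, "queued": 10},
    {"name": "ce01.site-b.example", "vos": {"examplevo"},
     "free_slots": 2, "queued": 0},
    {"name": "ce01.site-c.example", "vos": {"othervo"},
     "free_slots": 50, "queued": 0},
]

job = {"vo": "examplevo", "slots": 1}
best = matchmake(job, CES)
print(best["name"])
```

The third CE has the most free slots but is never considered, because it does not support the job's VO: requirements are checked before any ranking takes place.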
Computing Element
- Service that represents a computing resource and is responsible for managing the queue of jobs to execute
- The CE may be used by
  - a generic client: an end user interacting directly with the Computing Element
  - the Resource Manager, which submits a given job to an appropriate CE found by the matchmaking process
- Two job submission models
  - PUSH (eager scheduling): jobs are pushed to the CE
  - PULL (lazy scheduling): jobs are received when the CE has free slots

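The difference between the two submission models is where a waiting job sits. A toy model, with a hypothetical `ComputingElement` class and job names, makes this visible: in the PUSH model the job waits in the CE's own queue, in the PULL model it stays with the Resource Manager until the CE asks for it.

```python
from collections import deque


class ComputingElement:
    def __init__(self, name: str, slots: int):
        self.name = name
        self.slots = slots
        self.queue = deque()  # jobs waiting locally on the CE
        self.running = []

    def free_slots(self) -> int:
        return self.slots - len(self.running)

    def accept(self, job: str) -> None:
        """PUSH: the job is handed over immediately and queues on the CE."""
        self.queue.append(job)
        while self.free_slots() > 0 and self.queue:
            self.running.append(self.queue.popleft())

    def pull_from(self, pending: deque) -> None:
        """PULL: the CE fetches work only while it has free slots."""
        while self.free_slots() > 0 and pending:
            self.running.append(pending.popleft())


# PUSH: all three jobs leave the manager; the third waits on the CE.
push_ce = ComputingElement("ce-push", slots=2)
for job in ["job1", "job2", "job3"]:
    push_ce.accept(job)
print(push_ce.running, list(push_ce.queue))

# PULL: the third job stays with the manager until a slot frees up.
pull_ce = ComputingElement("ce-pull", slots=2)
pending = deque(["job1", "job2", "job3"])
pull_ce.pull_from(pending)
print(pull_ce.running, list(pending))
```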
Grid components
- Security
- Access
- Information System
- Job Management
- Data/Metadata
  - Storage Elements
  - File Catalog

Storage Element
- Storage Element services
  - Storage back-end (drivers and hardware)
  - Abstraction layer (SRM): interface to manage the specific storage solution (dpm, rfio, …)
  - Transfer service (protocols: GridFTP (gsiftp), globus-url-copy, …)
  - Native POSIX-like file I/O API (GFAL)
  - Auxiliary accounting and logging services
- Data are stored on disk pool servers or mass storage systems
- File replicas: reliability, geographic coverage, fault tolerance, network latency

File Catalog
- Maps SE files to a human-readable ‘filename’
  - LFN (Logical File Name)
  - GUID (Grid Unique Identifier)
  - Symlinks
  - SURL (Site URL)
  - TURL (Transfer URL)

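The relationship between these names can be sketched as two lookup tables: LFNs and symlinks point to a GUID, and the GUID points to the SURLs of its replicas. The `FileCatalog` class, the paths and the host names below are hypothetical, and TURLs are left out because they are obtained from the Storage Element at transfer time rather than stored in the catalog.

```python
import uuid


class FileCatalog:
    def __init__(self):
        self.names = {}     # LFN or symlink -> GUID
        self.replicas = {}  # GUID -> list of SURLs

    def register(self, lfn: str, surl: str) -> str:
        """Register a new file and return the GUID assigned to it."""
        if lfn in self.names:
            raise ValueError(f"{lfn} is already registered")
        guid = str(uuid.uuid4())
        self.names[lfn] = guid
        self.replicas[guid] = [surl]
        return guid

    def add_replica(self, lfn: str, surl: str) -> None:
        self.replicas[self.names[lfn]].append(surl)

    def symlink(self, alias: str, lfn: str) -> None:
        """A symlink is another readable name for the same GUID."""
        self.names[alias] = self.names[lfn]

    def list_replicas(self, name: str) -> list:
        return list(self.replicas[self.names[name]])


catalog = FileCatalog()
lfn = "lfn:/grid/examplevo/data/run001.dat"
catalog.register(lfn, "srm://se01.site-a.example/examplevo/run001.dat")
catalog.add_replica(lfn, "srm://se01.site-b.example/examplevo/run001.dat")
catalog.symlink("lfn:/grid/examplevo/latest.dat", lfn)

for surl in catalog.list_replicas("lfn:/grid/examplevo/latest.dat"):
    print(surl)
```

Because both names resolve to the same GUID, a replica added under one name is visible under the other.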
Questions …