Middleware Overview University of the Free State Albert van Eck

Presentation transcript:

Middleware Overview University of the Free State Albert van Eck Grid Application and Users Training School, Cape Town

Outline: General overview; Security System (VOMS server, LCAS, LCMAPS); Information Service (Berkeley DB Information Index, BDII); Workload Management System (JDL, Computing Element, Logging and Bookkeeping); Questions.

gLite Middleware overview

The Middleware structure: Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware. Higher-Level Grid Services are meant to help users build their computing infrastructure but should not be mandatory. Foundation Grid Middleware is what is actually developed in EGEE: it must be complete and robust, should allow interoperation with other major grid infrastructures, and should not assume the use of Higher-Level Grid Services.

gLite Services Decomposition (diagram): Access (CLI, API); Security Services (Authentication, Authorization, Auditing); Information & Monitoring Services (Information & Monitoring, Service Discovery, Network Monitoring); Data Services (Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement); Job Management Services (Job Provenance, Package Manager, Accounting, Computing Element, Workload Management).

gLite infrastructure (diagram): Workload Management System (WMS) and Data Management.

Security System

gLite Security: Authentication is based on an X.509 PKI infrastructure. Certificate Authorities (CAs) issue long-lived certificates identifying individuals (much like a passport). Trust between CAs and sites is established offline. To reduce vulnerability, Grid users are identified by short-lived proxies of their certificates. Proxies can: be delegated to a service so that it can act on the user's behalf; include additional attributes (such as VO information via the VO Membership Service, VOMS); be stored in an external proxy store (MyProxy); be renewed (in case they are about to expire).

X.509 Proxy Certificate: A proxy is a GSI extension to X.509 identity certificates, signed by the normal end-entity certificate (or by another proxy). It enables single sign-on and supports some important features: delegation and mutual authentication. It has a limited lifetime (minimizing the risk of "compromised credentials"). It is created by the voms-proxy-init command. Options for voms-proxy-init: -hours <lifetime of credential>, -bits <length of key>.

GRID Security: Components (diagram): Users: large and dynamic population; different accounts at different sites; personal and confidential data; heterogeneous privileges (roles); desire for single sign-on. "Groups": "group" data; access patterns; membership. Grid Sites: heterogeneous resources; access patterns; local policies; membership.

VOMS concepts: The Virtual Organization Membership Service (VOMS) extends the proxy with information on VO membership, groups and roles, and is fully compatible with GSI. Each VO has a database containing group membership, role and capability information for each user. The user contacts the VOMS server requesting their authorization info; the server sends the authorization info to the client, which includes it in a proxy certificate. (Diagram: the VOMS client sends an authentication request; the server queries the Auth DB and, if OK, returns a VOMS AC that is added to the proxy /C=IT/O=INFN/L=CNAF/CN=Pinco Palla/CN=proxy.)

FQAN and AC: VOMS uses the Fully Qualified Attribute Name (FQAN) to express membership and other authorization info. Group membership, roles and capabilities may be expressed in a format that binds them together: <group>/Role=[<role>][/Capability=<capability>]. FQANs are included in an Attribute Certificate (AC). Attribute Certificates are used to bind a set of attributes (such as membership, roles and authorization info) to an identity, and they are digitally signed. VOMS uses an AC to include the attributes of a user in a proxy certificate.
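The FQAN format above can be unpacked mechanically. The following is a minimal sketch, assuming the `Role=` and `Capability=` suffixes appear exactly as in the slide and that an unset field is printed as the literal `NULL` (as in the `voms-proxy-info` output in this deck). The names `Fqan` and `parse_fqan` are illustrative and not part of any VOMS API.

```python
from typing import NamedTuple, Optional


class Fqan(NamedTuple):
    """Illustrative container for the three FQAN parts; not a VOMS data type."""
    group: str
    role: Optional[str]
    capability: Optional[str]


def parse_fqan(fqan: str) -> Fqan:
    """Split '<group>[/Role=<role>][/Capability=<capability>]' into its parts."""
    if not fqan.startswith("/"):
        raise ValueError(f"FQAN must start with '/': {fqan!r}")
    group_parts = []
    role = None
    capability = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            role = part[len("Role="):]
        elif part.startswith("Capability="):
            capability = part[len("Capability="):]
        else:
            group_parts.append(part)
    # Assumption: the literal string NULL marks an unset role or capability.
    if role == "NULL":
        role = None
    if capability == "NULL":
        capability = None
    return Fqan("/" + "/".join(group_parts), role, capability)


print(parse_fqan("/gilda/grelc/das/Role=NULL/Capability=NULL"))
```

Keeping the group path as a single string preserves the nesting (`/gilda/grelc/das` is a subgroup of `/gilda/grelc`), which a site could use for prefix-based authorization decisions.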

VOMS Certificate: The AC is included by the client in a well-defined, non-critical extension, ensuring compatibility with GT-based mechanisms.

    asli@levrek:~$ voms-proxy-init --voms gilda
    Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta/Email=Marco.Fargetta@ct.infn.it
    Enter GRID pass phrase:
    Creating temporary proxy .................................... Done
    Contacting voms.ct.infn.it:15001 [/C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it] "gilda" Done
    Creating proxy .................................. Done
    Your proxy is valid until Tue Jun 26 03:16
    asli@levrek:~$

VOMS Certificate Attributes:

    asli@levrek:~$ voms-proxy-info -all
    subject : /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta/CN=proxy
    issuer : /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta
    identity : /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta
    type : proxy
    strength : 512 bits
    path : /tmp/x509up_u18948
    timeleft : 11:57:20
    === VO gilda extension information ===
    VO : gilda
    subject : /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta
    issuer : /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it
    attribute : /gilda/Role=NULL/Capability=NULL
    attribute : /gilda/grelc/Role=NULL/Capability=NULL
    attribute : /gilda/grelc/das/Role=NULL/Capability=NULL
    attribute : /gilda/grelc/das/grelc.unile.it/Role=NULL/Capability=NULL
    attribute : /gilda/grelc/das/grelc.unile.it/sakila/Role=NULL/Capability=NULL
    attribute : /gilda/grelc/das/grelc02.unile.it/Role=NULL/Capability=NULL
    attribute : /gilda/grelc/das/grelc02.unile.it/sakila/Role=NULL/Capability=NULL
    timeleft : 11:57:48
    asli@levrek:~$

The "attribute" lines are the VOMS attributes (FQANs) carried by the proxy.
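A script that needs to know whether a proxy is about to expire, or which FQANs it carries, can read this output line by line. The sketch below assumes the `key : value` layout shown above, with one field per line; `parse_proxy_info` and `seconds_left` are hypothetical helper names, and `SAMPLE` is an excerpt of the listing above.

```python
def parse_proxy_info(text: str) -> dict:
    """Collect 'key : value' lines; repeated keys are kept in order of appearance."""
    info = {}
    for line in text.splitlines():
        # Skip the '=== VO ... ===' banner, shell prompts and anything without a field separator.
        if line.startswith("===") or " : " not in line:
            continue
        key, value = line.split(" : ", 1)
        info.setdefault(key.strip(), []).append(value.strip())
    return info


def seconds_left(timeleft: str) -> int:
    """Convert an 'HH:MM:SS' timeleft field to seconds."""
    hours, minutes, seconds = (int(part) for part in timeleft.split(":"))
    return hours * 3600 + minutes * 60 + seconds


SAMPLE = """\
type : proxy
strength : 512 bits
path : /tmp/x509up_u18948
timeleft : 11:57:20
=== VO gilda extension information ===
VO : gilda
attribute : /gilda/Role=NULL/Capability=NULL
attribute : /gilda/grelc/Role=NULL/Capability=NULL
timeleft : 11:57:48
"""

info = parse_proxy_info(SAMPLE)
print(info["attribute"])
print(seconds_left(info["timeleft"][0]))
```

Note that `timeleft` appears twice: once for the proxy itself and once for the VOMS extension, so the helper keeps every occurrence instead of overwriting the first.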

VOMS-enabled Grid: A user can be in multiple VOs, aggregating the rights of each. A VO can have groups, with different rights for each (e.g. different groups of experimentalists), and groups can be nested. A VO has roles, assigned to specific purposes (e.g. system admin); when a user assumes a role, the proxy certificate carries the additional attributes.

Information Service

Information Service: What? Why? How? Who? What? A system to collect information on the state of resources. Why? To discover the resources of the grid and their nature, to check the health status of resources, and to provide data for managing the workload more efficiently. How? By monitoring and publishing fresh data on the state of resources, adopting a well-known data model. Who? Users searching for specific resources for their activity, the Workload Management System, and other monitoring systems.

Information Service Systems: The gLite data model is based on the Grid Laboratory Uniform Environment (GLUE) Schema. The IS architecture used in gLite is the Berkeley DB Information Index (BDII), which was adopted in the LCG middleware as the Information System provider. It is an evolution of the Globus Monitoring and Discovery Service (MDS) and is based on Lightweight Directory Access Protocol (LDAP) servers.

GLUE Schema: Describes the Grid resource information stored in the IS, independently of the underlying technology. The current release is mapped onto LDAP, XML and ClassAd (the Condor matchmaking language). The entities of the GLUE Schema are organised hierarchically and include the concepts of Site, Cluster, Computing Element, Storage Element, and an abstraction of a service.

GRISes, local BDII and BDII: Abbreviations: BDII: Berkeley Database Information Index; GIIS: Grid Index Information Server; GRIS: Grid Resource Information Server. Local GRISes run on the CEs and SEs at each site and report dynamic and static information. At each site, a local BDII collects the information given by the GRISes. A BDII collects the information given by the local BDIIs. Speaker notes: This slide shows the BDII Information System in graphical form. At the bottom of the pyramid are the local Grid Resource Information Servers (GRISes), which run on the Computing Elements or Storage Elements and report dynamic and static information. Next comes the Grid Index Information Server (GIIS), one per site, which collects the information given by the GRISes. At the top are the BDIIs; a BDII collects the information given by the GIISes. Users and other grid services can query a BDII to get information about the Grid status.

BDII: Users and other Grid services (such as the WMS) can query BDIIs to get information about the Grid status. Each BDII collects information from the site GIISes (or local BDIIs) defined in a configuration file, which it accesses through a web interface. Every two minutes a cron job runs a script that collects information (pull model) from all the GIISes (local BDIIs) listed in the configuration file.
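The pull-based hierarchy described above (GRIS, then site-level BDII, then top-level BDII) can be sketched with plain in-memory objects. This is only a toy model under stated assumptions: a real BDII is an LDAP server refreshed by a periodic cron job, whereas here the refresh is collapsed into an on-demand call; `make_gris`, `make_bdii`, the host names and the attribute values are all hypothetical.

```python
from typing import Callable, Dict, List

# A provider is any callable returning {resource_id: attributes}.
Provider = Callable[[], Dict[str, dict]]


def make_gris(entries: Dict[str, dict]) -> Provider:
    """Stand-in for a GRIS publishing information about one resource."""
    return lambda: dict(entries)


def make_bdii(sources: List[Provider]) -> Provider:
    """Stand-in for a BDII: each call pulls from every configured source and merges the results."""
    def pull() -> Dict[str, dict]:
        merged: Dict[str, dict] = {}
        for source in sources:
            merged.update(source())
        return merged
    return pull


# Hypothetical site A with one CE and one SE, and site B with one CE.
ce_gris = make_gris({"ce01.site-a.example.org": {"GlueCEStateFreeCPUs": 12}})
se_gris = make_gris({"se01.site-a.example.org": {"GlueSAStateAvailableSpace": 500}})
site_a_bdii = make_bdii([ce_gris, se_gris])
site_b_bdii = make_bdii([make_gris({"ce01.site-b.example.org": {"GlueCEStateFreeCPUs": 3}})])

# The top-level BDII only knows the site-level providers listed in its "configuration".
top_bdii = make_bdii([site_a_bdii, site_b_bdii])

for resource, attributes in sorted(top_bdii().items()):
    print(resource, attributes)
```

Because every level exposes the same interface, a client such as the WMS can query the top level alone and still see every resource published below it.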

Summary: The security system of gLite is based on X.509 certificates, which identify users. The VOMS server links users to VOs, groups and roles by adding attributes to the proxy certificate. LCAS and LCMAPS control local access to resources by checking the user certificates. The Information System provided by gLite is the BDII, and its information is organised according to the GLUE Schema. The current implementation uses only the BDII to check the state of resources. Users can contact the top BDII in the hierarchy to get information on all resources.

gLite Workload Management System

Outline: gLite overview; Workload Management System; Security overview; WMS architecture; Job state machine; Job Description Language overview.

gLite services (diagram): components: User Interface, Workload Management, Logging & Bookkeeping, Information System, File and Replica Catalogs, Authorization Service, and Site X with its Computing Element and Storage Element. Arrow labels: submit, query, retrieve, discover services, update credential, publish state.

WMS Objectives: The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources. The purpose of the Workload Manager (WM) is to accept and satisfy requests for job management coming from its clients. A submission request passes responsibility for the job to the WM. The WM passes the job to an appropriate CE for execution, taking into account the requirements and preferences expressed in the job description file. The decision of which resource to use is the outcome of a matchmaking process.
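The matchmaking step can be pictured as "filter by requirements, then pick the best by preference". The sketch below is a simplification under stated assumptions, not the WMS algorithm: the `matchmake` function, the CE records and the use of free CPUs as the ranking are hypothetical. `GlueHostApplicationSoftwareRunTimeEnvironment` is the attribute used in the JDL example in this deck; `GlueCEUniqueID` and `GlueCEStateFreeCPUs` are used here for illustration.

```python
from typing import Callable, List, Optional


def matchmake(
    ces: List[dict],
    requirements: Callable[[dict], bool],
    rank: Callable[[dict], float],
) -> Optional[dict]:
    """Return the highest-ranked CE that satisfies the requirements, or None if none does."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Hypothetical CE records, as they might be assembled from the Information System.
ces = [
    {"GlueCEUniqueID": "ce-a.example.org", "GlueCEStateFreeCPUs": 40,
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["GLITE-3_0_0"]},
    {"GlueCEUniqueID": "ce-b.example.org", "GlueCEStateFreeCPUs": 8,
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["GLITE-3_1_0"]},
    {"GlueCEUniqueID": "ce-c.example.org", "GlueCEStateFreeCPUs": 2,
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["GLITE-3_1_0"]},
]

best = matchmake(
    ces,
    # Requirement: the CE advertises the GLITE-3_1_0 runtime environment tag.
    requirements=lambda ce: "GLITE-3_1_0" in ce["GlueHostApplicationSoftwareRunTimeEnvironment"],
    # Preference (assumed for illustration): more free CPUs is better.
    rank=lambda ce: ce["GlueCEStateFreeCPUs"],
)
print(best["GlueCEUniqueID"] if best else "no matching CE")
```

Separating the boolean requirement from the numeric rank mirrors the distinction the slide draws between requirements and preferences in the job description.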

Job Description Language: In gLite, the Job Description Language (JDL) is used to describe jobs for execution on the Grid. The JDL adopted within the gLite middleware is based on Condor's CLASSified ADvertisement language (ClassAd). A ClassAd is a record-like structure composed of a finite number of attributes separated by semicolons (;). It is highly flexible and can be used to represent arbitrary services. The JDL is the high-level language that all grid users must use to submit their applications to the grid. With it, the user specifies the characteristics, requirements and constraints of the application, which the WMS then analyzes during the matchmaking process to select the best resources that satisfy the job's requirements.
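Since a JDL file is a flat list of `Attribute = value;` pairs, it is easy to generate from a script. The following is a minimal sketch, assuming values contain no double quotes and that only strings, numbers, booleans, lists and raw expressions are needed; `to_jdl` and `Expr` are illustrative names, not part of any gLite API.

```python
class Expr(str):
    """Marks a value that must be written unquoted, e.g. a Requirements expression."""


def format_value(value) -> str:
    """Render one Python value in ClassAd-style syntax."""
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):  # checked before int, because bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ",".join(format_value(item) for item in value) + "}"
    text = str(value)
    if '"' in text:
        # Assumption of this sketch: no escaping rules are implemented.
        raise ValueError(f"double quotes are not supported here: {text!r}")
    return f'"{text}"'


def to_jdl(attributes: dict) -> str:
    """Render a dict as one 'Name = value;' line per attribute."""
    return "\n".join(f"{name} = {format_value(value)};" for name, value in attributes.items())


print(to_jdl({
    "Type": "Job",
    "Executable": "startGen4.sh",
    "StdOutput": "sample.out",
    "OutputSandbox": ["sample.err", "sample.out"],
    "Requirements": Expr('Member("GLITE-3_1_0",other.GlueHostApplicationSoftwareRunTimeEnvironment)'),
}))
```

The `Expr` wrapper exists because expressions such as `Requirements` are evaluated by the matchmaker and must not be written as quoted strings.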

JDL: an example

    Type = "Job";
    JobType = "Normal";
    Executable = "startGen4.sh";
    Environment = {"CLASSPATH=./gfal.jar:./gint.jar","LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH","LCG_GFAL_VO=gilda","LCG_RFIO_TYPE=dpm"};
    Arguments = " 0 0 10 4 10000 aliserv6.ct.infn.it lfn:/grid/gilda/valeria/2000pillar.dat /gilda/issgc07/";
    StdOutput = "sample.out";
    StdError = "sample.err";
    InputSandbox = {"startGen4.sh","gint.jar","gfal.jar","libGFalFile.so"};
    OutputSandbox = {"sample.err","sample.out","res.txt"};
    Requirements = Member("GLITE-3_1_0",other.GlueHostApplicationSoftwareRunTimeEnvironment);

This slide shows an example of JDL with some of the most important attributes. Executable: a string naming the executable or command that will be run remotely. Arguments: a string containing all the job's command-line arguments, appended to Executable. Environment: the list of environment variables that will be set at run time so the job runs properly. StdOutput: the remote file to which standard output is redirected. StdError: the remote file to which standard error is redirected. InputSandbox: the list of files on the UI local disk that the job needs; the listed files are automatically staged to the remote resource. OutputSandbox: the list of remote files, generated by the job, to be retrieved after execution. Requirements: the characteristics (hardware or software) that the computing resource must have.

Workflows of jobs: With a single request, multiple jobs can be generated and executed. A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs (diagram: a DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeE). A Collection is a group of jobs with no dependencies, basically a collection of JDLs. A Parametric job is a job having one or more attributes in the JDL whose values vary according to parameters. Using compound jobs, a (possibly very large, up to thousands) group of jobs can be submitted in one shot, with these benefits: reduced submission time (a single call to the WMProxy server and a single authentication and authorization process); sharing of files between jobs; availability of both a single job ID to manage the group as a whole and a job ID for each job in the group. Speaker notes: A job is not only the request and subsequent execution of a single computation; there may be several computations which, for some reason known to the user, are submitted as a single instance. This is the case of the so-called complex jobs.
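For a DAG job, the dependency information determines a valid execution order. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to compute one. The node names come from the slide's diagram, but the edges between them are hypothetical, since the slide text does not preserve them, and `execution_order` is an illustrative helper rather than anything the WMS exposes.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependencies: each job runs only after every job in its set has finished.
dependencies = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeD"},
}


def execution_order(deps: dict) -> list:
    """Return one order in which the jobs can run while respecting all dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)

# A cyclic description is not a DAG and is rejected.
try:
    execution_order({"x": {"y"}, "y": {"x"}})
except CycleError:
    print("cycle detected: not a valid DAG")
```

Jobs with no path between them (nodeB and nodeC here) may run in either order or concurrently, which is what lets a workload manager dispatch them to different Computing Elements.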

Job submission example:

    [issgc59@issgc-ui ~]$ glite-wms-job-submit -d emidio -o jobid-file sfk-explorer.jdl
    Connecting to the service https://gilda-wms-01.ct.infn.it:7443/glite_wms_wmproxy_server
    ====================== glite-wms-job-submit Success ======================
    The job has been successfully submitted to the WMProxy
    Your job identifier is:
    https://gilda-lb-01.ct.infn.it:9000/4OaQng0PdA1nZJZHMcilqA
    The job identifier has been saved in the following file:
    /home/issgc59/jid
    =====================================================================
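A wrapper script often needs the job identifier from this output in order to query the job later. The sketch below assumes the identifier is printed on its own line after the "Your job identifier is:" message, which is an assumption about the line layout since the transcript runs the output together; `extract_job_id` is a hypothetical helper and `SAMPLE` reproduces the output above.

```python
import re
from typing import Optional

# A job identifier is an https URL with a host, a port and an opaque path component.
JOB_ID_RE = re.compile(r"^https://[^\s/:]+:\d+/\S+$")


def extract_job_id(output: str) -> Optional[str]:
    """Return the first job identifier found after the announcement line, if any."""
    lines = [line.strip() for line in output.splitlines()]
    for index, line in enumerate(lines):
        if line.startswith("Your job identifier is:"):
            for candidate in lines[index + 1:]:
                if JOB_ID_RE.match(candidate):
                    return candidate
    return None


SAMPLE = """\
Connecting to the service https://gilda-wms-01.ct.infn.it:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:

https://gilda-lb-01.ct.infn.it:9000/4OaQng0PdA1nZJZHMcilqA

The job identifier has been saved in the following file:
/home/issgc59/jid
"""

print(extract_job_id(SAMPLE))
```

Anchoring the search on the announcement line avoids picking up the WMProxy service URL, which is also an https URL with a port.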

Logging and Bookkeeping: Every step of the job life cycle is logged on a service called Logging and Bookkeeping (LB). It is useful for users who want to know the status of their jobs. When a job is submitted, the UI logs it on the LB, and a job identifier is returned as the result of submission. The WMS logs each step of scheduling. The CE logs when it receives a job (scheduled), when the job is running and when it is done. Users can query the LB for the job status by providing the job ID, e.g. https://gilda-lb-01.ct.infn.it:9000/fw4Ua8b_7Z8Vd8oJC74NCw. Updates are asynchronous.
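The sequence of logged events can be viewed as a small state machine. The sketch below is deliberately simplified: the slide names only submission, scheduled, running and done, so the intermediate WAITING and READY states and the final CLEARED state are assumptions based on common gLite naming, and failure paths such as abort, cancel or resubmission are left out. `TRANSITIONS` and `replay` are illustrative names.

```python
# Allowed forward transitions in this simplified, assumed life cycle.
TRANSITIONS = {
    "SUBMITTED": {"WAITING"},
    "WAITING": {"READY"},
    "READY": {"SCHEDULED"},
    "SCHEDULED": {"RUNNING"},
    "RUNNING": {"DONE"},
    "DONE": {"CLEARED"},
    "CLEARED": set(),
}


def replay(events: list) -> str:
    """Replay logged states in order and return the final one, rejecting invalid jumps."""
    if not events or events[0] != "SUBMITTED":
        raise ValueError("a job history must start with SUBMITTED")
    state = events[0]
    for next_state in events[1:]:
        if next_state not in TRANSITIONS.get(state, set()):
            raise ValueError(f"invalid transition {state} -> {next_state}")
        state = next_state
    return state


history = ["SUBMITTED", "WAITING", "READY", "SCHEDULED", "RUNNING", "DONE"]
print(replay(history))
```

Replaying the history in order is one simple way for a client to tolerate asynchronous updates: an out-of-order event shows up as an invalid transition rather than as a silently wrong status.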

The Computing Element: The CE is the front-end machine (master node) to a local batch system; supported batch systems are PBS (Torque/MAUI), LSF and Condor. The WMS "pushes" job execution requests to the CE using Condor-G. When a CE receives a job, the job is placed on a queue and is then executed on the first available Worker Node (where the batch system clients run). When execution is complete, output files are copied to the CE using scp. If the job executed successfully, output files are copied back to the WMS using globus-url-copy. By querying the LB, users learn when a job is done and can retrieve the output.

Summary: The WMS accepts users' requests for job execution. Requests are expressed in JDL, which allows users to specify the requirements that selected resources must satisfy. The WMS processes each request and chooses (by matchmaking) a Computing Element for the actual execution. The WMS learns the status of resources by querying the BDII. The CE tries to execute the job and copies the output files back to the WMS. The status of execution is logged on the LB. Users query the LB, discover that their job is done, and download the output files from the WMS.

Outline: Grid Data Management challenge; gLite DM services: Storage Elements and SRM, File Catalog and data management tools (lcg_utils); the AMGA Metadata Service.

The Grid DM Challenge: There are three challenges: heterogeneity, distribution and data description. Heterogeneity: data are stored on different storage systems using different access technologies, so a common interface to storage resources is needed: the Storage Resource Manager (SRM). Distribution: storage systems are in different locations, in most cases with no shared file system or common namespace, and data need to be moved between locations; this requires keeping track of where data is stored (File and Replica Catalog) and scheduled, reliable file transfer (File Transfer Service). Data description: data are stored as files, so a way is needed to "describe" files and locate them according to their content, that is, to associate descriptive information with files and query it (Metadata Service).

Storage Resource Management: Storage resource management needs to take into account: transparent access to files (migration to/from disk pool), file pinning, space reservation, file status notification, and lifetime management. The SRM (Storage Resource Manager) takes care of all these details: it is a single interface that handles local storage interaction and provides a Grid interface to the outside world. In gLite, interactions with the SRM are hidden by higher-level services (DM lcg_utils tools and APIs).

gLite Storage Element (diagram): disk pools, NAS, MSS.

File naming conventions: Logical File Name (LFN): an alias created by a user to refer to some item of data, e.g. “lfn:/grid/gilda/20030203/run2/track1”. Globally Unique Identifier (GUID): a non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”. Site URL (SURL), or Physical File Name (PFN): the location of an actual piece of data on a storage system, e.g. “srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1” (SRM). Transport URL (TURL): a temporary locator of a replica plus its access protocol, understood by an SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”.
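The four kinds of name can be told apart by their prefix or URL scheme. The sketch below is a hedged illustration: `classify_name` is a hypothetical helper, and the set of transport schemes is an assumption drawn from the access and transfer protocols named elsewhere in this deck (rfio, dcap, gsiftp) rather than an exhaustive list.

```python
from urllib.parse import urlsplit

# Assumed set of transport protocols; a real SE may support others.
TRANSPORT_SCHEMES = {"rfio", "dcap", "gsiftp"}


def classify_name(name: str) -> str:
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' for a grid file name."""
    if name.startswith("lfn:"):
        return "LFN"
    if name.startswith("guid:"):
        return "GUID"
    scheme = urlsplit(name).scheme
    if scheme == "srm":
        return "SURL"
    if scheme in TRANSPORT_SCHEMES:
        return "TURL"
    raise ValueError(f"unrecognised grid file name: {name!r}")


for example in [
    "lfn:/grid/gilda/20030203/run2/track1",
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1",
    "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
]:
    print(classify_name(example), example)
```

Only the LFN is meant to be chosen and remembered by users; the other forms are normally resolved by the catalog and the Storage Element on the user's behalf.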

What is a file catalog (diagram): a gLite UI, a File Catalog and several SEs.

Metadata on the Grid: Information about files, but not only! Metadata can describe any grid entity or object, e.g. JobIDs (add logging information to your jobs). Monitoring of running applications: e.g. ongoing results from running jobs can be published on the metadata server. Input set for a storm of parametric jobs. Information exchange among grid peers, e.g. producer/consumer job collections: master jobs produce data to be analyzed, and slave jobs query the metadata server to retrieve input to "consume". Simplified DB access on the grid: Grid applications that need structured data can model their data schemas as metadata.

Data Management Services Summary: Storage Elements store data and provide a common interface: the Storage Resource Manager (SRM) (Castor, dCache, DPM, …); native access protocols (rfio, dcap, nfs, …); transfer protocols (gsiftp, ftp, …). Catalogs keep track of where data are stored: File Catalog and Replica Catalog (LCG File Catalog, LFC); Metadata Catalog (AMGA Metadata Catalogue). Data Movement schedules reliable file transfer: File Transfer Service (gLite FTS, which manages physical transfers).

gLite Grid Storage Requirements: The Storage Element is the service that allows a user or an application to store data for future retrieval. It manages local storage (disks) and networked storage (via gpfs/nfs), interfaces to Mass Storage Systems (tapes) such as HPSS, CASTOR and DiskeXtender (UNITREE), and presents the different storage backends as a single abstraction. gLite requirements: be able to manage different storage backends uniformly and transparently for the user (by providing an SRM interface); support basic file transfer protocols (Globus GridFTP is mandatory, others such as https and ftp if available); support a native I/O (remote file) access protocol, with a POSIX-like I/O client library for direct access to data.

References: gLite: http://www.glite.org; GILDA Infrastructure: https://gilda.ct.infn.it/; VOMS: http://infnforge.cnaf.infn.it/projects/voms; GGF Security: http://www.gridforum.org/security; GLUE Schema: http://glueschema.forge.cnaf.infn.it/; EGEE: http://www.eu-egee.org

Questions? www.glite.org