Introduction to Grid Technology
Antun Balaz, SCL, Institute of Physics Belgrade, Serbia
antun@ipb.ac.rs
25/03/2011
Agenda
NGI_AEGIS, EGI and EGI-InSPIRE
AEGIS infrastructure and management
gLite overview and basic services
Managing jobs with gLite
NGI_AEGIS and EGI
EGI.eu was created in February 2010 and established as an international consortium based in Amsterdam.
Serbia is represented in the EGI Council and other bodies by IPB.
EGI.eu coordinates the EGI-InSPIRE project (May 2010 – April 2014), in which IPB represents Serbia as a partner.
EGI-InSPIRE
FP7 project RI-261323 (ESFRI)
WP1: Management (NA1)
WP2: External relations (NA2)
WP3: User community coordination (NA3)
WP4: Operations (SA1)
WP5: Provisioning the software infrastructure (SA2)
WP6: Services for HUC (SA3)
WP7: Operational tools (JRA1)
IPB and EGI-InSPIRE
IPB is involved in NA2, NA3 and SA1.
Operations (SA1): AEGIS operations, coordination of middleware deployment, OMB, OTAG
AEGIS infrastructure (1)
Production:
AEGIS01-IPB-SCL (704 CPUs, 26 TB)
AEGIS02-RCUB (48 CPUs, 113 GB)
AEGIS03-ELEF-LEDA (64 CPUs, 1.5 TB)
AEGIS04-KG (48 CPUs, 480 GB)
AEGIS07-IPB-ATLAS (128 CPUs)
AEGIS11-MISANU (64 CPUs)
AEGIS infrastructure (2)
Certification: AEGIS05-ETFBG, AEGIS09-FTN-KM
Demo/training: AEGIS08-IPB-DEMO
New: UOB Faculty of Physics
AEGIS management
NGI_AEGIS management: A. Balaz, D. Vudragovic, V. Slavnic
Helpdesk: helpdesk.aegis.rs
Nagios: nagios.aegis.rs
Mailing lists
gLite – Grid middleware
The Grid relies on advanced software, the middleware, which interfaces between the resources and the applications.
The Grid middleware:
finds convenient places for applications to run;
optimises the use of resources;
organises efficient access to data;
deals with authentication at different sites;
runs the job and monitors its progress;
transfers the result back to the scientist.
gLite – Overview
First release in 2005; currently gLite 3.1 and 3.2
Developed from existing components (Globus, Condor, ...)
Interoperability and co-existence with the deployed infrastructure
Robust: performance and fault tolerance
Open source license
Set of basic Grid services
Job submission and management
File transfer (individual, queued)
Database access
Data management (replication, metadata)
Monitoring/indexing of system information
Advanced School in High Performance and GRID Computing – Concepts and Applications, ICTP, Trieste, Italy
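Basic services of gLite
[Diagram: User Interface, Authorization Service (VOMS), Workload Management System, Information System, File and Replica Catalog, Logging and Bookkeeping, and a site (Site X) with a Computing Element and a Storage Element. The UI creates a credential, submits the job, queries and retrieves the job status and retrieves the output; the WMS queries the Information System and the catalog and submits the job to the CE; the CE and SE publish their state to the Information System; job status is recorded by Logging and Bookkeeping.]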
User interface
From a local workstation, the user logs in via ssh to the UI (User Interface), which has the client software preinstalled.
The user describes the job in a text file using the Job Description Language (JDL) and submits it, usually through the command-line interface, to the WMS (Workload Management System), which sends it on to the CEs (Computing Elements).
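A JDL file is plain text made of attribute = value; pairs. As a minimal sketch, the Python helper below writes the classic /bin/hostname job description using the standard JDL attributes Executable, Arguments, StdOutput, StdError, InputSandbox and OutputSandbox. The helper names make_jdl and jdl_list are hypothetical, and real jobs may need further attributes (for example Requirements or VirtualOrganisation).

```python
def jdl_list(items):
    """Format a Python sequence as a JDL list, e.g. {"a", "b"}."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def make_jdl(executable, arguments="", stdout="stdout.txt", stderr="stderr.txt",
             input_sandbox=(), output_sandbox=None):
    """Return a minimal JDL job description as a string (hypothetical helper)."""
    if output_sandbox is None:
        # By default, bring back the files that capture stdout and stderr
        output_sandbox = [stdout, stderr]
    lines = [f'Executable = "{executable}";']
    if arguments:
        lines.append(f'Arguments = "{arguments}";')
    lines += [
        f'StdOutput = "{stdout}";',
        f'StdError = "{stderr}";',
    ]
    if input_sandbox:
        # Local files to ship with the job (e.g. a user script)
        lines.append(f"InputSandbox = {jdl_list(input_sandbox)};")
    lines.append(f"OutputSandbox = {jdl_list(output_sandbox)};")
    return "\n".join(lines) + "\n"


print(make_jdl("/bin/hostname"))
```

Saving the printed text as myjob.jdl gives a job description that can be passed to glite-wms-job-submit.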
Managing jobs with gLite
[Diagram: the user submits the job from the User Interface together with its input “sandbox”; with the help of the Information System, the job is sent to a Computing Element, where a worker node is allocated by the local jobmanager. The standard input stream is read from a file, and the standard output and error streams are redirected into files; in the example, /bin/hostname writes stdout.txt and stderr.txt, which return to the user in the output “sandbox” (Get output). Submit events and job status updates go to Logging & bookkeeping, which answers the user's status/log queries; the CE publishes its state to the Information System.]
Slide inherited from EDG – European Data Grid
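On the worker node, the job's standard input is read from a file and its standard output and error are redirected into files such as stdout.txt and stderr.txt. The Python sketch below imitates only that redirection step with the standard subprocess module; the function name run_like_worker_node is hypothetical, and the real job wrapper on a gLite worker node does much more.

```python
import contextlib
import subprocess


def run_like_worker_node(command, stdin_path=None,
                         stdout_path="stdout.txt", stderr_path="stderr.txt"):
    """Run `command` with stdin read from a file (if given) and stdout/stderr
    redirected into files, as described for the worker node. Returns the exit code."""
    with contextlib.ExitStack() as stack:
        if stdin_path is not None:
            stdin = stack.enter_context(open(stdin_path, "rb"))
        else:
            stdin = subprocess.DEVNULL  # no input file: give the job an empty stdin
        out = stack.enter_context(open(stdout_path, "wb"))
        err = stack.enter_context(open(stderr_path, "wb"))
        return subprocess.run(command, stdin=stdin, stdout=out, stderr=err).returncode
```

For the slide's example, run_like_worker_node(["/bin/hostname"]) leaves the host name in stdout.txt and an empty stderr.txt.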
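WMS components
[Diagram: the User Interface talks to the WMS, made up of the Network Daemon, the Workload Manager and the Job Controller (CondorG). The WMS relies on the Information Service, which holds the characteristics of resources (CE characteristics & status, SE characteristics & status), and on the LFC, which records the location of files. Jobs run on Computing Elements and access data on Storage Elements.]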
Job submission: glite-wms-job-submit myjob.jdl
The user runs glite-wms-job-submit myjob.jdl on the User Interface. The Network Daemon, responsible for accepting incoming requests, receives the JDL and the input sandbox files, which are stored in the RB storage.
Job state: submitted → waiting
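Submission is normally typed in a shell on the UI, but it is sometimes scripted. As a sketch, the Python function below only builds the argument list for glite-wms-job-submit. It assumes the gLite 3.x WMS UI options -a (automatic proxy delegation), -d (reuse an already delegated proxy by its ID) and -o (write the returned job ID to a file), which should be checked against glite-wms-job-submit --help on the UI actually used; the function name submit_command is hypothetical.

```python
import shlex


def submit_command(jdl_path, jobid_file=None, delegation_id=None):
    """Build (but do not run) a glite-wms-job-submit command line."""
    cmd = ["glite-wms-job-submit"]
    if delegation_id is not None:
        cmd += ["-d", delegation_id]  # reuse a proxy delegated earlier
    else:
        cmd.append("-a")              # delegate the user proxy automatically
    if jobid_file is not None:
        cmd += ["-o", jobid_file]     # store the returned job ID for later queries
    cmd.append(jdl_path)
    return cmd


print(shlex.join(submit_command("myjob.jdl", jobid_file="jobids.txt")))
```

On a UI with a valid proxy, subprocess.run(submit_command("myjob.jdl", "jobids.txt"), check=True) would perform the submission; keeping command construction separate makes the command easy to inspect before it is run.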
Workload Manager
The WM is responsible for taking the appropriate actions to satisfy the request.
Job state: waiting
Match-Maker/Broker
The Workload Manager passes the job to the Match-Maker/Broker: where can this job be executed?
Matchmaker
The Matchmaker is responsible for finding the “best” CE to which the job should be submitted.
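In JDL, the user can express constraints through the Requirements attribute and preferences through the Rank attribute (a common choice is ranking by free CPUs, e.g. Rank = other.GlueCEStateFreeCPUs;). The Python sketch below mimics that idea on a hypothetical, simplified list of CEs: drop the CEs that fail the requirements, then sort the rest by rank. The CE fields and the two requirements used here are illustrative assumptions, not the real Glue schema or the WMS matchmaking code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputingElement:
    # Hypothetical, simplified view of what the Information Service publishes
    ce_id: str
    free_cpus: int
    max_wallclock_min: int
    supported_vos: tuple


def matchmake(ces, vo, needed_wallclock_min):
    """Return CEs meeting the requirements, best-ranked (most free CPUs) first."""
    # Requirements: the CE must accept the VO and allow a long enough run
    eligible = [ce for ce in ces
                if vo in ce.supported_vos
                and ce.max_wallclock_min >= needed_wallclock_min]
    # Rank: more free CPUs is better; ties keep the input order (stable sort)
    return sorted(eligible, key=lambda ce: ce.free_cpus, reverse=True)
```

The first element of the returned list plays the role of the “best” CE; an empty list corresponds to a job for which no matching resources were found.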
Matchmaker queries
To choose a CE, the Matchmaker asks the LFC where the needed input data are, and the Information Service what the status of the Grid is (characteristics and status of CEs and SEs).
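Data location can also steer the choice: a CE whose close Storage Elements already hold replicas of the job's input files (listed in JDL under InputData) avoids wide-area transfers. As a sketch with hypothetical catalog contents, the Python function below scores each CE by how many input LFNs have a replica on one of its close SEs. This simple counting policy is an assumption for illustration, not the documented WMS algorithm.

```python
def rank_by_data_locality(ce_close_ses, replica_locations, input_lfns):
    """Order CEs by the number of input files with a replica on a close SE.

    ce_close_ses:      {CE id: iterable of close SE hostnames}
    replica_locations: {LFN: iterable of SE hostnames holding a replica},
                       i.e. the kind of answer a file catalog such as the LFC gives
    input_lfns:        LFNs the job needs
    """
    scores = {}
    for ce, close_ses in ce_close_ses.items():
        close = set(close_ses)
        scores[ce] = sum(1 for lfn in input_lfns
                         if close & set(replica_locations.get(lfn, ())))
    # Highest score first; ties broken alphabetically for a deterministic order
    return sorted(scores, key=lambda ce: (-scores[ce], ce))
```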
CE choice
The Match-Maker/Broker returns its choice of CE to the Workload Manager.
Job Adapter
The JA is responsible for the final “touches” to the job before submission (e.g. creation of the wrapper script).
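The Job Adapter's wrapper script is what eventually runs on the worker node. The Python sketch below generates only a toy version of such a script, covering the stream redirection of the worker-node step; the real wrapper created by the WMS also handles tasks such as sandbox transfers, and make_wrapper is a hypothetical name.

```python
import shlex


def make_wrapper(executable, arguments=(), stdin=None,
                 stdout="stdout.txt", stderr="stderr.txt"):
    """Return a minimal POSIX sh wrapper that runs the job with redirected streams."""
    command = " ".join(shlex.quote(part) for part in [executable, *arguments])
    redirects = f"> {shlex.quote(stdout)} 2> {shlex.quote(stderr)}"
    if stdin is not None:
        redirects = f"< {shlex.quote(stdin)} " + redirects
    return "\n".join([
        "#!/bin/sh",
        "# Hypothetical minimal wrapper: run the user's program with redirected",
        "# streams and propagate its exit code.",
        f"{command} {redirects}",
        "exit $?",
    ]) + "\n"
```

Quoting every argument with shlex.quote keeps arguments containing spaces intact when the generated script is executed by sh.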
Job Controller
The JC is responsible for the actual job management operations (done via CondorG).
Job state: ready
Job dispatch
The Job Controller sends the job, together with its input sandbox files, from the RB storage to the Computing Element.
Job state: scheduled
Job execution
The job runs on the Computing Element with its input sandbox and can perform “Grid enabled” data transfers and accesses to the Storage Element.
Job state: running
Job completion
When the job finishes, the output sandbox files are transferred from the Computing Element back to the RB storage on the WMS.
Job state: done
Output retrieval: glite-wms-job-output <jobID>
The user runs glite-wms-job-output <jobID> on the User Interface to retrieve the output sandbox files from the WMS.
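In gLite 3.x, the UI command that retrieves the output sandbox is glite-wms-job-output. As with submission, the sketch below only builds its argument list in Python. It assumes the --dir option (directory where the output sandbox files are stored) and the -i option (read job IDs from a file, such as one written by glite-wms-job-submit -o); the function name output_command and the example job ID in the usage note are hypothetical.

```python
def output_command(job_id=None, jobid_file=None, out_dir=None):
    """Build (but do not run) a glite-wms-job-output command line.

    Exactly one of job_id or jobid_file must be given.
    """
    if (job_id is None) == (jobid_file is None):
        raise ValueError("give exactly one of job_id or jobid_file")
    cmd = ["glite-wms-job-output"]
    if out_dir is not None:
        cmd += ["--dir", out_dir]     # where the output sandbox files are saved
    if jobid_file is not None:
        cmd += ["-i", jobid_file]     # job ID(s) stored at submission time
    else:
        cmd.append(job_id)
    return cmd
```

For example, output_command(job_id="https://wms.example.org:9000/abc123", out_dir="results") yields the command line for fetching stdout.txt and stderr.txt into the results directory.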
Job cleared
Once the output sandbox files have been retrieved, the job is cleared from the WMS.
Job states: submitted → waiting → ready → scheduled → running → done → cleared
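The states the job passes through in this walkthrough form a simple linear sequence. The Python sketch below encodes that successful path as an enumeration plus a successor table and checks whether an observed state history follows it. It deliberately omits other gLite job states such as Aborted and Cancelled, so it illustrates the slides rather than modelling the full WMS.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    CLEARED = "Cleared"


# Successful path only, in definition order (the order shown on the slides)
_ORDER = list(JobState)
NEXT_STATE = dict(zip(_ORDER, _ORDER[1:]))


def is_valid_history(history):
    """True if `history` starts at SUBMITTED and only takes allowed steps."""
    if not history or history[0] is not JobState.SUBMITTED:
        return False
    return all(NEXT_STATE.get(prev) is cur
               for prev, cur in zip(history, history[1:]))
```

CLEARED has no entry in NEXT_STATE, so any history that continues past it is rejected.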
Job monitoring
glite-wms-job-status <jobID>
glite-wms-job-logging-info <jobID>
The LB (Logging & Bookkeeping) service receives and stores job events from the WMS components and the Computing Element (via the LB proxy) and processes them into the corresponding job status. From the User Interface, the user queries the job status and the log of job events.
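LB turns a stream of events, logged by different components and possibly arriving out of order, into a single job status. The Python sketch below shows the idea with a hypothetical event record: the log is the job's events sorted by time, and the status is taken from the latest one. Ordering by timestamp is a simplifying assumption; the real LB service applies its own event-ordering and state-computation rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobEvent:
    job_id: str
    timestamp: float  # seconds since the epoch (assumed)
    source: str       # component that logged the event
    state: str        # job state implied by the event


def logging_info(events, job_id):
    """All events of one job in time order (cf. glite-wms-job-logging-info)."""
    return sorted((e for e in events if e.job_id == job_id),
                  key=lambda e: e.timestamp)


def job_status(events, job_id):
    """Current status = state of the most recent event (cf. glite-wms-job-status)."""
    history = logging_info(events, job_id)
    return history[-1].state if history else None
```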
Enjoy further details in presentations and hands-on sessions during the day!