INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org The gLite Workload Management System Elisabetta Molinari (INFN-Milan) on behalf of the JRA1.

Slides:



Advertisements
Similar presentations
Workload Management David Colling Imperial College London.
Advertisements

EU 2nd Year Review – Jan – Title – n° 1 WP1 Speaker name (Speaker function and WP ) Presentation address e.g.
Workload management Owen Maroney, Imperial College London (with a little help from David Colling)
INFSO-RI Enabling Grids for E-sciencE Workload Management System and Job Description Language.
Development of test suites for the certification of EGEE-II Grid middleware Task 2: The development of testing procedures focused on special details of.
INFSO-RI Enabling Grids for E-sciencE Architecture of the gLite Workload Management System Giuseppe Andronico INFN EGEE Tutorial.
E-infrastructure shared between Europe and Latin America 12th EELA Tutorial for Users and System Administrators Architecture of the gLite.
SEE-GRID-SCI Hands-On Session: Workload Management System (WMS) Installation and Configuration Dusan Vudragovic Institute of Physics.
INFSO-RI Enabling Grids for E-sciencE EGEE Middleware The Resource Broker EGEE project members.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Services Abderrahman El Kharrim
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid.
Special Jobs Claudio Cherubino INFN - Catania. 2 MPI jobs on gLite DAG Job Collection Parametric jobs Outline.
EGEE-II INFSO-RI Enabling Grids for E-sciencE International Summer School on Grid Computing 2006 gLite Information System and Workload.
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America Special Jobs Matias Zabaljauregui UNLP.
Workload Management WP Status and next steps Massimo Sgaravatto INFN Padova.
The gLite API – PART I Giuseppe LA ROCCA INFN Catania ACGRID-II School 2-14 November 2009 Kuala Lumpur - Malaysia.
INFSO-RI Enabling Grids for E-sciencE Logging and Bookkeeping and Job Provenance Services Ludek Matyska (CESNET) on behalf of the.
:: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :: GridKA School 2009 MPI on Grids 1 MPI On Grids September 3 rd, GridKA School 2009.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Experience with the gLite Workload Management.
Enabling Grids for E-sciencE Workload Management System on gLite middleware Matthieu Reichstadt CNRS/IN2P3 ACGRID School, Hanoi (Vietnam)
DataGrid WP1 Massimo Sgaravatto INFN Padova. WP1 (Grid Workload Management) Objective of the first DataGrid workpackage is (according to the project "Technical.
Nadia LAJILI User Interface User Interface 4 Février 2002.
INFSO-RI Enabling Grids for E-sciencE Workload Management System Mike Mineter
- Distributed Analysis (07may02 - USA Grid SW BNL) Distributed Processing Craig E. Tull HCG/NERSC/LBNL (US) ATLAS Grid Software.
INFSO-RI Enabling Grids for E-sciencE Status and Plans of gLite Middleware Erwin Laure 4 th ARDA Workshop 7-8 March 2005.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Security and Job Management.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks gLite job submission Fokke Dijkstra Donald.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Provenance Challenge gLite Job Provenance.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Using gLite API Vladimir Dimitrov IPP-BAS “gLite middleware Application Developers.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Feb. 06, Introduction to High Performance and Grid Computing Faculty of Sciences,
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The usage of the gLite Workload Management.
INFSO-RI Enabling Grids for E-sciencE Workflow Management in Giuseppe La Rocca INFN – Catania ICTP/INFM-Democritos Workshop on Porting.
1 Andrea Sciabà CERN Critical Services and Monitoring - CMS Andrea Sciabà WLCG Service Reliability Workshop 26 – 30 November, 2007.
EGEE is a project funded by the European Union under contract INFSO-RI Practical approaches to Grid workload management in the EGEE project Massimo.
Glite. Architecture Applications have access both to Higher-level Grid Services and to Foundation Grid Middleware Higher-Level Grid Services are supposed.
Enabling Grids for E-sciencE The gLite Workload Management System Alessandro Maraschini OGF20, Manchester,
INFSO-RI Enabling Grids for E-sciencE Claudio Cherubino, INFN Catania Grid Tutorial for users Merida, April 2006 Special jobs.
High-Performance Computing Lab Overview: Job Submission in EDG & Globus November 2002 Wei Xing.
INFSO-RI Enabling Grids for E-sciencE Job Workflows with gLite Emidio Giorgio INFN NA4 Generic Applications Meeting 10 January 2006.
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America Alexandre Duarte CERN IT-GD-OPS UFCG LSD 1st EELA Grid School.
Workload Management System Jason Shih WLCG T2 Asia Workshop Dec 2, 2006: TIFR.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks WMPROXY usage Álvaro Fernández IFIC (CSIC)
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid2Win : gLite for Microsoft Windows Roberto.
Testing and integrating the WLCG/EGEE middleware in the LHC computing Simone Campana, Alessandro Di Girolamo, Elisa Lanciotti, Nicolò Magini, Patricia.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Command Line Grid Programming Spiros Spirou Greek Application Support Team NCSR “Demokritos”
INFSO-RI Enabling Grids for E-sciencE EGEE is a project funded by the European Union under contract IST Job sandboxes.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks WMS tricks & tips – further scripting Giuseppe.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Practical using WMProxy advanced job submission.
EGEE 3 rd conference - Athens – 20/04/2005 CREAM JDL vs JSDL Massimo Sgaravatto INFN - Padova.
Biomed tutorial 1 Enabling Grids for E-sciencE INFSO-RI EGEE is a project funded by the European Union under contract IST JDL Flavia.
D.Spiga, L.Servoli, L.Faina INFN & University of Perugia CRAB WorkFlow : CRAB: CMS Remote Analysis Builder A CMS specific tool written in python and developed.
INFSO-RI Enabling Grids for E-sciencE Flexible Job Submission Using Web Services: The gLite WMProxy Experience Giuseppe Avellino.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) Advanced Job Riccardo Rotondo
Introduction to Computing Element HsiKai Wang Academia Sinica Grid Computing Center, Taiwan.
Introduction to Job Description Language (JDL) Alessandro Costa INAF Catania Corso di Calcolo Parallelo Grid Computing Catania - ITALY September.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
Enabling Grids for E-sciencE Work Load Management & Simple Job Submission Practical Shu-Ting Liao APROC, ASGC EGEE Tutorial.
Enabling Grids for E-sciencE Claudio Cherubino INFN DGAS (Distributed Grid Accounting System)
FESR Trinacria Grid Virtual Laboratory Practical using WMProxy advanced job submission Emidio Giorgio INFN Catania.
Practical using C++ WMProxy API advanced job submission
Architecture of the gLite WMS
Workload Management System on gLite middleware
Workload Management System ( WMS )
The gLite Workload Management System
EGEE tutorial, Job Description Language - more control over your Job Assaf Gottlieb Tel-Aviv University EGEE is a project.
Alexandre Duarte CERN Fifth EELA Tutorial Santiago, 06/09-07/09,2006
5. Job Submission Grid Computing.
Job Description Language (JDL)
Job Submission M. Jouvin (LAL-Orsay)
Presentation transcript:

INFSO-RI Enabling Grids for E-sciencE The gLite Workload Management System Elisabetta Molinari (INFN-Milan) on behalf of the JRA1 IT-CZ cluster

Enabling Grids for E-sciencE INFSO-RI Outline The gLite Workload Management System Main differences to LCG-2 File Catalogs Interfaces supported WMProxy and new Job Types supported: –Parametric jobs, collections, dags,... High level job control tools Some testing results

Enabling Grids for E-sciencE INFSO-RI What the WMS Is The Workload Management System (WMS) is a collection of components providing a service responsible for the distribution and management of tasks across resources available on a Grid, in such a way that applications are conveniently, efficiently and effectively executed Tasks = Jobs to be submitted to the WMS are described via JDL (Job Description Language) and passed to the match-maker to find the best available resource that satisfies the requirements

Enabling Grids for E-sciencE INFSO-RI WMS Internal Architecture from “EGEE Middleware Architecture”, EU deliverable DJRA1.1, August

Enabling Grids for E-sciencE INFSO-RI WMS Components Overview The WMS components handling jobs are: –WMProxy: a service providing access to WMS functionality through a Web Services base interface. It validates, converts and prepares jobs and sends them to the WMS. –WorkLoad Manager: the core component of the Workload Management System. Given a valid request it has to take the appropriate actions to satisfy it, among which finding the resources that best match the given requirements. –Logging and Bookkeeping: provides support for the job monitoring functionality, it stores logging and bookkeeping information concerning events generated by the various components of the WMS. Using this information, the LB service keeps a state machine view of each job.

Enabling Grids for E-sciencE INFSO-RI Job Description Language Uniform language to express the characteristics, requirements and preferences of a job – [ Executable = “my_exe”; StdOutput = “out”; Arguments = “a b c”; InputSandbox = {“/home/user_1/my_exe”}; OutputSandbox = {“out”}; Requirements = Member( other.GlueHostApplicationSoftwareRunTimeEnvironment, "ALICE “ ); Rank = -other.GlueCEStateEstimatedResponseTime; RetryCount = 3 ]

Enabling Grids for E-sciencE INFSO-RI LCG2 – gLite Main differences New components in gLite: –Task Queue: the WM keeps a queue of pending submission requests, a list of jobs to be submitted together with their requirements. Non-matching requests will be retried periodically. –Information Supermarket: read-only cached repository of information on available resources –WMProxy Server: web server interface to submit jobs –LBProxy: more efficient and reliable Logging and Bookeeping server. –CondorC: reliable job submission between the WM and the CE (more reliable than Globus GRAM)

Enabling Grids for E-sciencE INFSO-RI Lcg – gLite differences cont'd New Features in gLite –Bulk submission:  DAGs, collections and parametric jobs –Shallow resubmission: re-submission of the job if it fails before the user job starts running(found to greatly improve success rate) –VoViews: support for VoViews with VOMS FQAN-based access control rule –Automatic zipping of sandboxes, sandboxes from/to gridftp servers –Voms extension renewal (via the glite-proxy-renewald service) –Job perusal : allows job's files content inspection while the job is running –Short Deadline Jobs: the job is submitted to the right queue on the CE (ShortDeadlineJob = true) –High Load Limiter: script to prevent new submissions in case of High Load of the WMS –Prolog and Epilog: scripts that are run before and after the user job starts running

Enabling Grids for E-sciencE INFSO-RI GLite File Catalogs Interface from the WMS to the following data catalogs (via the broker info component) is supported: – DLI – LCG Data Location Interface DataRequirements = { [ DataCatalogType = "DLI"; DataCatalog = " InputData = {"lfn:/my/test/data1","guid:44rr44rr77hh77kkaa3", "lds:my.test.dataset", "query:my_query"}; ] –Storage Index – gLite Storage Index [ DataCatalogType = "SI"; DataCatalog = " InputData = {"lfn:/eo/test.file", "guid:ddffrg5451"}; ]

Enabling Grids for E-sciencE INFSO-RI gLite File Catalogs cont'd  RLS – LCG Replica Location Service [ DataCatalogType = "RLS"; DataCatalog = " InputData = {"lfn:/atlas/test.file", "guid:ggrgrg5656"}; ] In the case of Storage Index and DLI the Data Catalog attribute is optional, if not specified the endpoint is found via Service Discovery.

Enabling Grids for E-sciencE INFSO-RI Some supported job types The gLite WMS supports some new job types: –Interactive Jobs: jobs whose standard streams are forwarded to the submitting client –MPICH: a parallel application using MPICH-P4 implementation of MPI –DAGs: direct acyclic graphs, set of jobs with dependencies –Collections: group of jobs with no dependencies –Parametric Jobs: jobs having one or more attributes in the JDL that vary their values according to parameters, submitted jobs are instances of the same job, where a different value is assigned to parametric attributes

Enabling Grids for E-sciencE INFSO-RI Parametric jobs example [ JobType = "Parametric"; Executable = "cms_sim.exe"; StdInput = "input_PARAM_.txt"; StdOutput = "myoutput_PARAM_.txt"; StdError = "myerror_PARAM_.txt"; Parameters = 10000; ParameterStart = 1000; ParameterStep = 10; InputSandbox = { "file:///home/cms/cms_sim.exe", "file:///home/cms/data/input_PARAM_.txt " }; OutputSandbox = { "myoutput_PARAM_.txt", "myerror_PARAM_.txt" }; OutputSandboxDestURI = "gsiftp://neo.datamat.it:5432/tmp"; Requirements = other.GlueCEInfoTotalCPUs > 2; Rank = other.GlueCEStateFreeCPUs; ] The submission of the JDL will result in the generation of N jobs, where N = (Parameters – ParameterStart)/ParameterStep

Enabling Grids for E-sciencE INFSO-RI High Level Job Control tools WMProxy is a web service that lets the user to submit jobs to the WMS: –WSDL: web service based interface (client stubs can be genarted directly from it with the preferred tool/language) –API: C++, Java, Python bindings. Thin layer around the WS stubs –C++ command line interface –API Documentation: ubmissionhttp://trinity.datamat.it/projects/EGEE/wiki/wiki.php?n=WMProxyAPI.JobS ubmission LB: Logging and Bookeeping server that provides support to the job monitoring functionality: –It has a wsdl that also can be used to generate java client stubs. – It also can be queried via c api: 

Enabling Grids for E-sciencE INFSO-RI gLite WMS - CMS Results ~20000 jobs submitted –3 parallel UIs –33 Computing Elements –200 jobs/collection  Bulk submission Performances –~ 2.5 h to submit all jobs  0.5 seconds/job –~ 17 hours to transfer all jobs to a CE  3 seconds/job  jobs/day Job failures –Negligible fraction of failures due to the gLite WMS  Either application errors or site problems 1.1Worker Node problem 3.3CRL expired 0.2Gatekeeper down 3.9Remote batch system 28Application error Job fraction (%)Failure reason By A.Sciabà - 27 September 2006 ~7000 jobs/day on the LCG RB

Enabling Grids for E-sciencE INFSO-RI gLite WMS - ATLAS Results Official Monte Carlo production –Up to ~3000 jobs/day –Less than 1% of jobs failed because of the WMS in a period of 24 hours Synthetic tests –Shallow resubmission greatly improves the success rate for site-related problems  Efficiency =98% after at most 4 submissions From 7-jun to 9-nov By A.Sciabà – 10 November 2006

Enabling Grids for E-sciencE INFSO-RI Forthcoming features High Availability of the RB: solution that makes the RB more robust and resistant to failures Job Provenance: Store and retain data on finished jobs, complementary to RB Implementation of collections not using DAGman ICE: interface to the new web service based CE (CREAM) DGAS: storage accounting system that collects usage records on the CE