INFSO-RI-508833 Enabling Grids for E-sciencE (www.eu-egee.org)


Flexible Job Submission Using Web Services: The gLite WMProxy Experience
Giuseppe Avellino
CHEP06, Mumbai - India, February 2006

Outline
– Workload Manager Proxy (WMProxy)
– SOA Conformance & WS-I Compliance
– Used Technologies
– WMProxy Architecture
– AuthN, AuthZ and Delegation
– Job Submission Chain
– Improvements
– Conclusions and Future Plans

Workload Manager Proxy (WMProxy)
WMProxy is a Web service providing access to the gLite Workload Management System (WMS) through a simple interface. Main purposes:
– Take advantage of Web / Web Service technologies, adhering to the Service Oriented Architecture (SOA)
– Improve job submission and control performance
– Serve a large number of requests
– Provide additional features
– Improve usability

SOA Conformance & WS-I Compliance
Web Services and SOA benefits:
– The Web is simple
– The Web is ubiquitous: it makes the service available to a larger community of users (accessibility)
– Web services address the interoperability problem among different middleware stacks (integration)
– Loose coupling among interacting components

SOA Conformance & WS-I Compliance
WMProxy is a SOAP Web service.
Its interface is described by a WSDL file.
The WSDL file was written according to the WS-I Basic Profile:
– The WS-I Basic Profile defines how a set of Web service specifications should be used to promote interoperability
– A WS-I-compliant service is described by a WSDL file with a mandatory structure composed of specific parts
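To make "SOAP Web service" concrete: the WS-I Basic Profile allows only literal (not SOAP-encoded) message bodies, so a request is plain XML inside a SOAP 1.1 envelope, with the operation's element as the child of the Body. The minimal Python sketch below builds such an envelope with the standard library only. The namespace `urn:example:wmproxy` and the `jobId` child element are hypothetical placeholders; the real names come from the WMProxy WSDL, and in practice gSOAP-generated stubs produce these messages for you.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:wmproxy"  # hypothetical: the real namespace is in the WMProxy WSDL


def build_soap_request(operation: str, params: dict) -> bytes:
    """Build a literal SOAP 1.1 request whose Body holds one `operation` element."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV_NS)
    ET.register_namespace("ns", SERVICE_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    request = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter is a child element with plain text content (no SOAP encoding)
        ET.SubElement(request, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical operation parameter and Job Id, for illustration only
    print(build_soap_request("jobStart", {"jobId": "https://lb.example.org:9000/abc"}).decode())
```

The request would then be POSTed over HTTPS to the service endpoint; the WSDL is what lets clients in other languages generate equivalent code.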

Used Technologies
WMProxy runs in an Apache + FastCGI + GridSite container.
WMProxy is implemented in C++.
Stub generation is performed with gSOAP.
Why C++/gSOAP:
– driven by preliminary performance comparison tests
– need to integrate and re-use existing production-quality libraries and components
– gSOAP interoperability: the service is accessible from clients generated in multiple languages (e.g. C++, Java, Python, Perl) and on multiple OSs

Used Technologies
FastCGI features:
– High performance
– Persistent processes: a single process serves multiple requests
– Processes are spawned or killed dynamically depending on the workload
– Environment information, standard input, output and error are multiplexed over a single full-duplex connection
GridSite features:
– Support for Grid security credentials (GSI and VOMS)
– File transfer over HTTPS (GridHTTPS)
– Library for handling Grid Access Control Lists (GACL)
– Library of primitives for credential delegation
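The FastCGI multiplexing mentioned above works by framing every stream in records: each record starts with an 8-byte header holding the protocol version, the record type (for example FCGI_PARAMS for environment information, FCGI_STDIN, FCGI_STDOUT, FCGI_STDERR) and a request id, so several streams can share one connection. The Python sketch below encodes and decodes that header following the FastCGI 1.0 specification; it illustrates the wire format only and says nothing about how mod_fcgi or the WMProxy server are implemented.

```python
import struct

FCGI_VERSION_1 = 1
# Record types defined by the FastCGI 1.0 specification (subset)
FCGI_PARAMS, FCGI_STDIN, FCGI_STDOUT, FCGI_STDERR = 4, 5, 6, 7
# version, type, requestId, contentLength, paddingLength, reserved (big-endian, 8 bytes)
HEADER = struct.Struct(">BBHHBB")


def encode_record(rec_type: int, request_id: int, content: bytes) -> bytes:
    """Frame one FastCGI record, padding the content to a multiple of 8 bytes."""
    if len(content) > 0xFFFF:
        raise ValueError("FastCGI record content is limited to 65535 bytes")
    padding = (-len(content)) % 8
    header = HEADER.pack(FCGI_VERSION_1, rec_type, request_id, len(content), padding, 0)
    return header + content + b"\x00" * padding


def decode_records(stream: bytes):
    """Yield (type, request_id, content) tuples from a byte stream of records."""
    offset = 0
    while offset < len(stream):
        _version, rec_type, request_id, length, padding, _reserved = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        yield rec_type, request_id, stream[offset:offset + length]
        offset += length + padding  # skip content and padding
```

Interleaving `encode_record(FCGI_STDOUT, 1, ...)` and `encode_record(FCGI_STDERR, 1, ...)` on one socket is exactly the multiplexing the slide refers to; an empty record (content length 0) marks the end of a stream.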

WMProxy Architecture
[Figure: WMProxy integration with the WMS. Components shown: client (SOAP/HTTPS), server host running Apache with MOD SSL, MOD GridSite and MOD FCGI, WMProxy server, local file system, request queue, Workload Manager, LB Proxy, Logging & Bookkeeping server and LB database.]

WMProxy Architecture
[Figure: WMProxy modules. Client; WMProxy server made of a gSOAP layer and gSOAP-independent modules: WMProxy Operations, Authorization, Directory Manager, LB Access, Request Delivery, Delegation.]

WMProxy Architecture
WMProxy is composed of different modules:
– gSOAP Layer: an intermediate layer between gSOAP and the WMProxy core
– WMProxy Operations: implements the services published in the service WSDL
– Authorization: performs DN/FQAN-based authorization and user mapping
– Directory Manager: creates and manages the job's reserved area on the server file system
– LB Access: responsible for the interaction with the LB components
  • job registration
  • logging of job events
  • job information queries
– Request Delivery: delivers requests to the Workload Manager (WM)
– Delegation: provides access to the delegation services

AuthN, AuthZ and Delegation
WMProxy AuthN
– Request authentication is performed by GridSite
WMProxy AuthZ
– Request authorization is performed by setting and checking appropriate Grid Access Control Lists (GACL) at:
  • service level
  • job level
AuthN and AuthZ are performed for all operations.
WMProxy Delegation
– Delegation is used to transfer rights and privileges to another party
– The delegation port type is imported from an external WSDL file
– The delegation services implementation relies on GridSite primitives
Delegation is only required for:
– jobListMatch
– jobRegister
– jobSubmit
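Both service-level and job-level authorization come down to evaluating a GACL document against the requester's DN and VOMS FQANs. The sketch below is a simplified evaluator written for illustration, not GridSite's library: it recognises only `<person><dn>`, `<voms><fqan>` and `<any-user/>` credentials, and it assumes that permissions allowed by any matching entry are granted unless a matching entry denies them. The DNs and the FQAN in the example policy are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical job-level policy: the owner may read, list and write; VO members may only read.
EXAMPLE_ACL = """<gacl version="0.0.1">
  <entry><person><dn>/C=IT/O=INFN/CN=Alice</dn></person>
    <allow><read/><list/><write/></allow></entry>
  <entry><voms><fqan>/egee</fqan></voms>
    <allow><read/></allow><deny><write/></deny></entry>
</gacl>"""


def gacl_permissions(acl_xml: str, dn: str, fqans: set) -> set:
    """Return the permissions a user holds under a simplified GACL policy.

    Matching entries contribute their <allow> permissions; any permission
    named in a matching <deny> is removed (deny overrides allow).
    """
    allowed, denied = set(), set()
    for entry in ET.fromstring(acl_xml).findall("entry"):
        matches = (
            entry.find("any-user") is not None
            or any(e.text == dn for e in entry.findall("person/dn"))
            or any(e.text in fqans for e in entry.findall("voms/fqan"))
        )
        if not matches:
            continue
        for tag, bucket in (("allow", allowed), ("deny", denied)):
            node = entry.find(tag)
            if node is not None:
                bucket.update(child.tag for child in node)  # e.g. "read", "write"
    return allowed - denied
```

With this policy, Alice presenting the `/egee` FQAN loses `write`, because the VO entry denies it; that is a consequence of the deny-overrides rule assumed here.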

Job Submission Chain
Job submission requires a job description written in a high-level language called the Job Description Language (JDL), which:
– specifies the job's characteristics
– specifies constraints on Grid resources
Job submission is performed through two operations: jobRegister and jobStart.
After jobRegister and before jobStart, all preparatory actions can be performed, e.g.:
– input sandbox upload
– input data upload and registration to catalogs

Job Submission Chain
jobRegister
– A job identifier (Job Id) is generated
– Server-specific attributes needed by the WMS are inserted in the JDL
– The client is then mapped to a local user with LCMAPS
– Job local directories and files are created with appropriate ownership and permissions (i.e., those of the mapped local user)
– The job is registered to the LB: final JDL + Job Id
– The Job Id is returned to the user (multiple Job Ids in case of a compound request)
Successful completion of jobRegister indicates that the job has been taken into account within the Grid environment and is ready to be actually submitted to Grid resources.
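Two of the jobRegister steps, generating a Job Id and creating the job's local directories, can be sketched as follows. gLite Job Ids are URLs that name the LB server; the host, port, random suffix and the directory layout used here are assumptions for illustration only. Changing ownership to the local account selected by LCMAPS requires root privileges, so the sketch only sets permission bits.

```python
import os
import secrets
from pathlib import Path

# Hypothetical LB endpoint; real Job Ids embed the actual LB server host and port.
LB_BASE = "https://lb.example.org:9000"


def new_job_id() -> str:
    """Generate a unique URL-style Job Id with a random, URL-safe suffix."""
    return f"{LB_BASE}/{secrets.token_urlsafe(16)}"


def create_job_area(root: Path, job_id: str, mode: int = 0o770) -> Path:
    """Create a per-job reserved area with input/ and output/ sandbox subdirectories.

    Hypothetical layout: root/<first 2 chars of suffix>/<suffix>/{input,output}.
    A real service would also chown the tree to the LCMAPS-mapped account.
    """
    suffix = job_id.rsplit("/", 1)[-1]
    job_dir = root / suffix[:2] / suffix
    for sub in ("input", "output"):
        (job_dir / sub).mkdir(parents=True, exist_ok=False)  # fail if the job area already exists
    for path in (job_dir, job_dir / "input", job_dir / "output"):
        os.chmod(path, mode)  # chmod explicitly: mkdir's mode argument is filtered by the umask
    return job_dir
```

Bucketing by the first characters of the suffix keeps any single directory from holding an entry for every job on a busy WMS node.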

Job Submission Chain
jobStart
– WMProxy checks that the job has been previously registered and not yet started
– Server-specific attributes needed by the WMS are inserted in the JDL
– For compound requests:
  • sub-job registration to the LB is performed
  • sub-job local directories and files are created with appropriate ownership and permissions
– Input sandbox archive files (if any) are uncompressed
– The job is delivered to the WM

Job Submission Chain
Splitting job submission into the register + start sequence provides a sort of two-phase commit:
– Registration makes the job enter the system
– The job remains registered for a configurable period of time
– Starting the job tells the system that the request can actually be processed
Benefits of the jobRegister / jobStart approach:
– User control over the individual submission phases
– In case of failure, each operation can be repeated separately
– Specific time-consuming processing (for compound requests) is done only during the start operation
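The register/start split behaves like a small state machine: a job is first registered, may be started only once, and only while its registration is still valid. The in-memory Python sketch below models that check under stated assumptions (a single process, a configurable `ttl_seconds`, an injectable clock for testing); it does not reflect how WMProxy actually stores job state.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class State(Enum):
    REGISTERED = "registered"
    STARTED = "started"


@dataclass
class Registry:
    """In-memory sketch of two-phase (register, then start) submission."""
    ttl_seconds: float                          # how long a registration stays valid
    clock: Callable[[], float] = time.monotonic
    jobs: dict = field(default_factory=dict)    # job_id -> (State, registration time)

    def register(self, job_id: str) -> None:
        self.jobs[job_id] = (State.REGISTERED, self.clock())

    def start(self, job_id: str) -> None:
        """Accept the start only for a registered, unexpired, not yet started job."""
        if job_id not in self.jobs:
            raise KeyError(f"{job_id} was never registered")
        state, registered_at = self.jobs[job_id]
        if state is State.STARTED:
            raise ValueError(f"{job_id} has already been started")
        if self.clock() - registered_at > self.ttl_seconds:
            del self.jobs[job_id]  # expired registrations are dropped
            raise TimeoutError(f"registration of {job_id} has expired")
        # ... the time-consuming start work would happen here ...
        self.jobs[job_id] = (State.STARTED, registered_at)
```

A failed start leaves an unexpired registration untouched, so the client can simply call `start` again, which is the "repeat each operation separately" benefit listed above.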

Improvements
WMProxy implements several features mostly aimed at improving performance:
– Bulk submission
  • Directed Acyclic Graph (DAG) submission
  • Collection submission
  • Parametric job submission
– Job sandboxes
  • Shared input sandbox
  • Archived/compressed input sandbox transfer
– Asynchronous job start operation
Significant changes and extensions to the JDL library were necessary for this purpose.

Improvements
A Directed Acyclic Graph (DAG) is a group of jobs with dependencies.
A Collection is a group of jobs with no dependencies.
A Parametric job is a job where some attributes in the JDL are declared to be parametric:
– A group of different jobs is created before submission
– The submitted jobs are instances of the same job, where a value is assigned to the parametric attributes
– Values are assigned as specified by the user through specific JDL attributes

Improvements
Bulk-submission benefits:
– One-shot submission of a (possibly very large, up to thousands of jobs) group of jobs
– Reduced submission time
  • a single call to the WMProxy server
  • a single AuthN and AuthZ process
  • sharing of files between jobs
– Availability of both a single Job Id to manage the group as a whole and an Id for each job in the group

Improvements
The shared input sandbox is particularly useful when submitting compound jobs:
– When sub-job input sandboxes contain instances of the same file, the file is transferred once and made available by WMProxy to all the sub-jobs involved
Archived/compressed ISB: input sandbox files can be uploaded to the WMS node as compressed tar files
– WMProxy takes care of decompressing/extracting the files and makes them available in the proper locations (during jobStart)
Benefits:
– less data to transfer
– fewer calls to the file transfer service
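On the client side, the shared and compressed input sandbox ideas combine naturally: collect the files of all sub-jobs, store each distinct file once, and upload a single .tar.gz. The sketch below deduplicates by SHA-256 content digest and names archive members `<digest>/<filename>`; both the deduplication key and the naming scheme are assumptions, since the slides do not say how WMProxy identifies "the same file".

```python
import hashlib
import io
import tarfile
from pathlib import Path


def build_shared_isb(sub_jobs: dict) -> tuple:
    """Pack the input sandboxes of several sub-jobs into one .tar.gz archive.

    sub_jobs maps a sub-job name to a list of local Paths. Files are
    deduplicated by content digest, so a file listed by many sub-jobs is
    stored (and uploaded) once. Returns (archive bytes, sub-job -> member names).
    """
    members = {}   # archive member name -> local file
    per_job = {}
    for job, files in sub_jobs.items():
        names = []
        for path in files:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()[:16]
            name = f"{digest}/{path.name}"  # hypothetical naming scheme
            members.setdefault(name, path)
            names.append(name)
        per_job[job] = names
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, path in sorted(members.items()):
            tar.add(str(path), arcname=name)
    return buf.getvalue(), per_job
```

The server would then extract the archive once and link each member into the sub-jobs that reference it, which is the work the slide assigns to jobStart.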

Improvements
Asynchronous job start: WMProxy can be configured to complete the processing of the jobStart request in the background
– Control is returned to the client immediately after the request has been accepted
– All time-consuming actions are performed “behind the scenes” by the service
– If some of the processing fails, WMProxy logs the corresponding event to the LB, so that the user can check the result of the operation at any time by querying the LB
Benefits:
– From the user's point of view, submission time becomes (almost) independent of the number of jobs
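The asynchronous jobStart pattern (accept, return at once, finish in the background, report failures through the LB) can be sketched with a queue and a worker thread. In this Python sketch `process_job` and `log_event` are hypothetical callbacks standing in for delivery to the WM and for LB logging; they are not WMProxy functions.

```python
import queue
import threading


class AsyncStarter:
    """Sketch of an asynchronous jobStart: accept, return, process in background."""

    def __init__(self, process_job, log_event):
        self._queue = queue.Queue()
        self._process_job = process_job  # the time-consuming part of jobStart
        self._log_event = log_event      # stand-in for logging an event to the LB
        threading.Thread(target=self._worker, daemon=True).start()

    def job_start(self, job_id: str) -> str:
        self._queue.put(job_id)  # accept the request...
        return "accepted"        # ...and return control to the client immediately

    def _worker(self):
        while True:
            job_id = self._queue.get()
            try:
                self._process_job(job_id)
                self._log_event(job_id, "started")
            except Exception as exc:  # failures are logged for later querying, not returned
                self._log_event(job_id, f"failed: {exc}")
            finally:
                self._queue.task_done()

    def wait_idle(self):
        """Block until every accepted request has been processed (useful in tests)."""
        self._queue.join()
```

Because `job_start` only enqueues, its latency no longer grows with the size of a compound request; the client learns the outcome from the logged events instead of the return value.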

Conclusions and Future Plans
WMProxy: the Web service based interface to the gLite WMS
– Enabled the transition of the WMS to SOA
– Improved job submission performance
– Provided a set of new features
Positive feedback from “early adopters”
– Promising performance measurements
– Good perception of usability
– Ongoing close collaboration with some experiments/groups to evolve/improve the service according to users' needs
– Usage in “production” within EGEE during 2006 will provide further feedback of paramount importance for the service's evolution
Web Service technology has proved mature enough to be adopted for developing “production-quality” components/services.

Conclusions and Future Plans
Future activities will focus on:
High Availability and Scalability
– Clustered architecture
– HTTPS request dispatcher
– Redundant sets of WMProxy servers
– Shared server file system
– Support for huge job collections: interaction with the LB
Service Profiling
– First measurements of bulk submission performance are promising
  • gLite 1.5 => ~180 secs for 500 jobs
– The short-term goal is less than ~60 secs for 1000 jobs
New features:
– User authorization on a per-operation basis
– JSDL support (XML-based language for job description)
  • Currently a GGF recommendation; emerging as a standard
– Adherence to finalized WS-* specifications (where needed)
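Taking the approximate figures above at face value, the short-term goal corresponds to roughly a sixfold reduction of the per-job submission time:

```latex
\[
\frac{180\,\mathrm{s}}{500\ \text{jobs}} = 0.36\ \mathrm{s/job}\ (\approx 2.8\ \text{jobs/s}),
\qquad
\frac{60\,\mathrm{s}}{1000\ \text{jobs}} = 0.06\ \mathrm{s/job}\ (\approx 16.7\ \text{jobs/s}),
\qquad
\frac{0.36}{0.06} = 6 .
\]
```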

WMProxy Operations
The main operations provided by WMProxy are:
– jobListMatch: finds resources that match the job requirements
– jobRegister: registers a job for submission
– jobStart: starts a previously registered job
– jobSubmit: one-shot job submission (job registration + job start)
– jobCancel: cancels a previously submitted job
– jobPurge: cleans up the job's reserved area on the WMS node
– getOutputFileList: returns the URIs of the job output files
– getSandboxDestURI: returns the URI of the location where the job's input sandbox files have to be uploaded
– getPerusalFiles: allows inspection of the job's files while the job is running
– getFreeQuota: gets the disk space available to the user for job sandboxes on the WMS node
– get/add/removeACLItem[s]: handle the entries of the job GACL file

AuthN, AuthZ and Delegation
Delegation is the process used to transfer rights and privileges to another party.
[Figure: credential delegation sequence diagram]

Job Submission Chain
After job registration and before job start, all preparatory actions needed by the job for running can be performed, e.g.:
– input sandbox upload
– input data upload and registration to catalogs
Sandbox management is under the control of WMProxy:
– WMProxy creates the area on the WMS node where the job's sandboxes have to be stored
– WMProxy provides the URI of the job's area
Supported file transfer protocols are:
– gridFTP: the WMS installation includes a gridFTP server
– HTTPS: the Apache + GridSite installation for WMProxy
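Once the client knows where the job's area is, uploading the input sandbox is a matter of choosing one of the supported protocols and building one destination URI per file. The sketch assumes, for illustration, that the service returns one base URI per protocol (`https` and `gsiftp`, the usual URI scheme for gridFTP) and that files keep their base names; the function name and preference order are hypothetical, and any hosts, ports and paths you pass in are up to the caller.

```python
from urllib.parse import urlsplit


def plan_uploads(dest_uris: list, files: list, prefer=("https", "gsiftp")) -> dict:
    """Map each local input-sandbox file to a destination URI in the job's area.

    dest_uris: base URIs of the job's input area, one per protocol (assumed).
    Picks the first scheme in `prefer` that the service offers.
    """
    by_scheme = {urlsplit(u).scheme: u.rstrip("/") for u in dest_uris}
    for scheme in prefer:
        if scheme in by_scheme:
            base = by_scheme[scheme]
            # Keep only the file's base name inside the job's input area
            return {f: f"{base}/{f.rsplit('/', 1)[-1]}" for f in files}
    raise ValueError(f"no supported protocol among {sorted(by_scheme)}")
```

The actual transfer would then use an HTTPS client (for example one speaking GridHTTPS) or a gridFTP client for each pair in the returned mapping.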

Job Submission Chain
[Figure: job submission sequence diagram]

Improvements
A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs:
– The jobs are the nodes (vertices) of the graph
– The edges (arcs) identify the dependencies
– DAG features:
  • shared sandboxes
  • attribute inheritance
  • attribute references between nodes and to the ‘parent’ job
[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeE]

Improvements
Example DAG description in JDL:

    [
      Type = "dag";
      VirtualOrganisation = "EGEE";
      InputSandbox = {"/home/data/data1", "/home/data/data2"};
      Nodes = [
        nodeA = [
          Description = [
            Executable = "first.exe";
            InputSandbox = {"/home/data/data3.txt", root.InputSandbox};
            OutputSandbox = "nodeAoutput.txt";
            (…)
          ];
        ];
        nodeB = [
          Description = [
            Executable = "second.exe";
            InputSandbox = "/home/data/data3.txt";
            OutputSandbox = root.nodes.nodeA.description.OutputSandbox[0];
            (…)
          ];
        ];
        Dependencies = { {nodeA, nodeB} };
      ];
    ]

[Figure: nodeA → nodeB]
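The Dependencies attribute defines a partial order: a node can run only after all of its parents. A standard way to derive a valid order, and to reject descriptions that are not actually acyclic, is Kahn's topological sort, sketched below in Python. This illustrates the concept only; it is not the WMS's own DAG handling, and the node names are plain strings standing in for JDL node names.

```python
from collections import deque


def submission_order(nodes: list, dependencies: list) -> list:
    """Order DAG nodes so every node comes after its parents (Kahn's algorithm).

    Each pair (a, b) means b depends on a, as in JDL's Dependencies = { {nodeA, nodeB} }.
    Raises ValueError if the dependencies contain a cycle.
    """
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for parent, child in dependencies:
        children[parent].append(child)
        indegree[child] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)  # nodes with no pending parents
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("dependencies contain a cycle")
    return order


if __name__ == "__main__":
    print(submission_order(["nodeA", "nodeB"], [("nodeA", "nodeB")]))  # ['nodeA', 'nodeB']
```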

Improvements
Collection submission allows the submission of a group of jobs with no dependencies:
– The JDL description of a Collection basically consists of a list of JDL descriptions
– The WMProxy server performs a JDL conversion to make it “digestible” by the WMS
– The same features as for DAGs are available:
  • shared sandboxes
  • attribute inheritance
  • attribute references between nodes and to the ‘parent’ job

Improvements
Example Collection description in JDL:

    [
      Type = "collection";
      VirtualOrganisation = "EGEE";
      Nodes = {
        [ JobType = "normal"; Executable = "job1.exe"; (…) ],
        [ JobType = "normal"; Executable = "job2.exe"; (…) ],
        (…)
        [ JobType = "normal"; Executable = "jobn.exe"; (…) ]
      };
    ]
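The conversion that makes a collection "digestible" by the WMS can be pictured as flattening it into independent job descriptions, each inheriting the collection's shared attributes. The Python sketch below represents JDL as plain dictionaries and applies an assumed rule: every top-level attribute except Type and Nodes is inherited, and a node's own value always wins. The real WMProxy conversion and its inheritance rules may differ.

```python
STRUCTURAL = {"Type", "Nodes"}  # attributes describing the collection itself, not its jobs


def expand_collection(collection: dict) -> list:
    """Turn a collection description (JDL as a dict) into standalone job descriptions.

    Assumed inheritance rule, for illustration: each node inherits every
    top-level attribute it does not set itself; node values take precedence.
    """
    if collection.get("Type", "").lower() != "collection":
        raise ValueError("not a collection")
    inherited = {k: v for k, v in collection.items() if k not in STRUCTURAL}
    # Later keys win in a dict literal, so node attributes override inherited ones
    return [{"Type": "job", **inherited, **node} for node in collection["Nodes"]]
```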

Improvements
In parametric job submission, some attributes in the JDL are declared to be parametric:
– The WMProxy server performs a JDL conversion to make it “digestible” by the WMS
– The submitted jobs are instances of the job described in the JDL, where a value is assigned to the parametric attributes
– Values are assigned as specified by the user through specific JDL attributes

Improvements
Example parametric job description in JDL:

    [
      Type = "job";
      JobType = "parametric";
      Executable = "sim.exe";
      VirtualOrganisation = "EGEE";
      StdInput = "input_PARAM_.txt";
      StdOutput = "output_PARAM_.txt";
      Parameters = 10;
      ParameterStart = 1;
      ParameterStep = 1;
      InputSandbox = {"file:///home/user/sim.exe", "file:///home/user/data/input_PARAM_.txt"};
      OutputSandbox = "output_PARAM_.txt";
      (…)
    ]

is converted into the following jobs, for i = 1, 2, …, 10:

    [
      Type = "job";
      JobType = "normal";
      Executable = "sim.exe";
      VirtualOrganisation = "EGEE";
      StdInput = "inputi.txt";
      StdOutput = "outputi.txt";
      InputSandbox = {"file:///home/user/sim.exe", "file:///home/user/data/inputi.txt"};
      OutputSandbox = "outputi.txt";
      (…)
    ]
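The parametric conversion amounts to generating one job per parameter value and substituting `_PARAM_` in the string attributes. The Python sketch below follows the example's reading of the attributes (Parameters = 10, ParameterStart = 1, ParameterStep = 1 giving i = 1, …, 10), i.e. it assumes Parameters is the number of instances. The defaults for ParameterStart and ParameterStep and the JobType "normal" given to the generated jobs are assumptions of the sketch, not documented WMProxy behaviour.

```python
CONTROL = {"Parameters", "ParameterStart", "ParameterStep"}  # consumed by the expansion


def expand_parametric(jdl: dict) -> list:
    """Instantiate a parametric job (JDL as a dict) by replacing _PARAM_ in string attributes.

    Assumes Parameters is the number of instances, starting at ParameterStart
    (default 0 here) and advancing by ParameterStep (default 1 here).
    """
    count = jdl["Parameters"]
    start = jdl.get("ParameterStart", 0)
    step = jdl.get("ParameterStep", 1)

    def substitute(value, p):
        if isinstance(value, str):
            return value.replace("_PARAM_", str(p))
        if isinstance(value, list):  # e.g. InputSandbox lists
            return [substitute(v, p) for v in value]
        return value

    jobs = []
    for i in range(count):
        p = start + i * step
        job = {k: substitute(v, p) for k, v in jdl.items() if k not in CONTROL}
        job["JobType"] = "normal"  # generated instances are ordinary jobs (assumption)
        jobs.append(job)
    return jobs
```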

Improvements
[Figure: asynchronous jobStart sequence diagram]

Improvements
Job file perusal allows inspection of the content of a job's files while the job is running:
– A process running on the WN (worker node) transfers chunks of selected job files to the WMS node or to specified URIs
– The file selection can be specified/changed/removed by the user through a specific WMProxy operation
– File perusal can be started/stopped at any time after job submission
Benefits:
– The user can follow the job's progress
– Early detection of incorrect job behavior can avoid a considerable waste of resources
– Faster turnaround of debug sessions, trial runs and other kinds of tests
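The core of file perusal is incremental reading: each time the WN-side process runs, it should ship only what was appended to a file since the previous transfer. The Python sketch below keeps a per-file offset and returns the new bytes in fixed-size chunks; the class name and the default chunk size are hypothetical, and the actual transfer to the WMS node or to a user-specified URI is left out.

```python
from pathlib import Path


class PerusalTracker:
    """Track how much of each perused file was already shipped; return only new chunks."""

    def __init__(self, chunk_size: int = 64 * 1024):
        self.chunk_size = chunk_size
        self.offsets = {}  # Path -> number of bytes already returned

    def next_chunks(self, path: Path) -> list:
        """Return the bytes appended to `path` since the last call, split into chunks."""
        offset = self.offsets.get(path, 0)
        chunks = []
        with path.open("rb") as f:
            f.seek(offset)  # resume where the previous transfer stopped
            while chunk := f.read(self.chunk_size):
                chunks.append(chunk)
                offset += len(chunk)
        self.offsets[path] = offset
        return chunks
```

Shipping deltas rather than whole files keeps each perusal transfer small, which matters when a job writes a large log that the user only wants to follow.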