1
INFSO-RI-508833
Enabling Grids for E-sciencE (www.eu-egee.org)
Flexible Job Submission Using Web Services: The gLite WMProxy Experience
Giuseppe Avellino (giuseppe.avellino@datamat.it)
CHEP06, Mumbai - India, 13-17 February 2006
2
Enabling Grids for E-sciencE INFSO-RI-508833 CHEP06, Mumbai - India, 13-17 February 2006
Outline
Workload Manager Proxy (WMProxy)
SOA Conformance & WS-I Compliance
Used Technologies
WMProxy Architecture
AuthN, AuthZ and Delegation
Job Submission Chain
Improvements
Conclusions and Future Plans
3
Workload Manager Proxy (WMProxy)
WMProxy is a Web service providing access to the gLite Workload Management System (WMS) through a simple interface.
Main purposes:
– Take advantage of Web / Web Service technologies, adhering to a Service Oriented Architecture (SOA)
– Improve job submission and control performance
– Serve a large number of requests
– Provide additional features
– Improve usability
4
SOA Conformance & WS-I Compliance
Web Services and SOA benefits:
– The Web is simple
– The Web is ubiquitous: the service can be offered to a larger community of users (accessibility)
– Web services address the interoperability problem among different middleware stacks (integration)
– Loose coupling among interacting components
5
SOA Conformance & WS-I Compliance
WMProxy is a SOAP Web service.
Its interface is described through a WSDL file.
The WSDL file was written according to the WS-I Basic Profile:
– the WS-I Basic Profile provides a set of Web service specifications promoting interoperability
– a WS-I compliant service is described through a WSDL file with a mandatory structure composed of specific parts
6
Used Technologies
WMProxy runs in an Apache + FastCGI + GridSite container.
WMProxy is implemented in C++.
Stub generation is performed through gSOAP.
Why C++/gSOAP:
– driven by preliminary performance comparison tests
– need to integrate and re-use existing production-quality libraries and components
– gSOAP interoperability: the service is accessible from clients generated in multiple languages (e.g. C++, Java, Python, Perl) and on multiple OSs
7
Used Technologies
FastCGI features:
– High performance
– Persistent processes: a single process serves multiple requests
– Processes are spawned or killed dynamically depending on the workload
– Environment information, standard input, output, and error are multiplexed over a single full-duplex connection
GridSite features:
– Support for Grid security credentials (GSI and VOMS)
– File transfer over HTTPS (GridHTTPS)
– Library for handling Grid Access Control Lists (GACL)
– Library of primitives for credential delegation
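To make the multiplexing point concrete, here is a minimal sketch of FastCGI record framing following the record layout of the FastCGI 1.0 specification: an 8-byte header (version, type, request ID, content length, padding length, reserved) followed by content and padding. Records belonging to different requests and streams can be interleaved on one connection and separated again by the receiver. This is illustrative Python only, not the mod_fcgi or FastCGI library code that WMProxy actually uses.

```python
import struct
from collections import defaultdict

FCGI_VERSION_1 = 1
# Stream record types from the FastCGI 1.0 specification (subset).
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_STDOUT = 6
FCGI_STDERR = 7

# version, type, requestId, contentLength, paddingLength, reserved (big-endian, 8 bytes)
HEADER = struct.Struct(">BBHHBx")


def encode_record(rec_type: int, request_id: int, content: bytes) -> bytes:
    """Frame one record, padding it to an 8-byte boundary as the spec recommends."""
    if len(content) > 0xFFFF:
        raise ValueError("content too long for a single record")
    padding = (-len(content)) % 8
    header = HEADER.pack(FCGI_VERSION_1, rec_type, request_id, len(content), padding)
    return header + content + b"\x00" * padding


def demultiplex(stream: bytes) -> dict:
    """Split an interleaved byte stream back into per-(request ID, record type) payloads."""
    out = defaultdict(bytes)
    pos = 0
    while pos < len(stream):
        version, rec_type, request_id, length, padding = HEADER.unpack_from(stream, pos)
        if version != FCGI_VERSION_1:
            raise ValueError("unsupported FastCGI version")
        pos += HEADER.size
        out[(request_id, rec_type)] += stream[pos:pos + length]
        pos += length + padding
    return dict(out)


if __name__ == "__main__":
    # Two requests share one connection; their stdout/stderr records are interleaved.
    wire = (encode_record(FCGI_STDOUT, 1, b"hello ")
            + encode_record(FCGI_STDERR, 2, b"warn")
            + encode_record(FCGI_STDOUT, 1, b"world"))
    print(demultiplex(wire))
```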
8
WMProxy Architecture
[Figure: WMProxy integration with the WMS. Labels: Client; SOAP/HTTPS; Server Host with Apache (MOD SSL, MOD GridSite, MOD FCGI) and WMProxy Server; Local File System; Request Queue; Workload Manager; LB Proxy; Logging & Bookkeeping; LB Data Base.]
9
WMProxy Architecture
[Figure: WMProxy modules. Client; WMProxy Server with the gSOAP Layer and the gSOAP-independent modules WMProxy Operations, Authorization, Directory Manager, LB Access, Request Delivery and Delegation.]
10
WMProxy Architecture
WMProxy is composed of different modules:
– gSOAP Layer: an intermediate layer between gSOAP and the WMProxy core
– WMProxy Operations: implements the services published in the service WSDL
– Authorization: performs DN / FQAN-based authorization and user mapping
– Directory Manager: creates and manages the job's reserved area on the server file system
– LB Access: is responsible for the interaction with the LB components:
    job registration
    logging of job events
    job information queries
– Request Delivery: delivers requests to the WM
– Delegation: provides access to the delegation services
11
AuthN, AuthZ and Delegation
WMProxy AuthN
– Request authentication is performed by GridSite
WMProxy AuthZ
– Request authorization is performed by setting and checking appropriate Grid Access Control Lists at:
    service level
    job level
AuthN and AuthZ are performed for all operations.
WMProxy Delegation
– Delegation is used to transfer rights and privileges to another party
– The delegation port type is imported from an external WSDL file
– The delegation services implementation relies on GridSite primitives
Delegation is required only for:
– jobListMatch
– jobRegister
– jobSubmit
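The two authorization levels can be pictured with the hypothetical sketch below. The `Credential` and `Acl` classes, the rule that a DN or any FQAN may match an entry, and the permission names (`submit`, `cancel`) are illustrative assumptions, not the GACL file format or the GridSite API. A request passes only if the service-level list admits the caller and, for job-specific operations, the job's own list grants the permission too.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credential:
    dn: str                  # certificate subject DN
    fqans: tuple = ()        # VOMS FQANs carried by the proxy (illustrative values)


@dataclass
class Acl:
    # Hypothetical ACL: maps a DN or an FQAN to the set of permitted operations.
    entries: dict = field(default_factory=dict)

    def grants(self, cred: Credential, permission: str) -> bool:
        principals = (cred.dn, *cred.fqans)
        return any(permission in self.entries.get(p, set()) for p in principals)


def authorize(cred: Credential, permission: str, service_acl: Acl, job_acl: Acl = None) -> bool:
    """Service-level check first; job-level check only when a job ACL applies."""
    if not service_acl.grants(cred, permission):
        return False
    return job_acl is None or job_acl.grants(cred, permission)


if __name__ == "__main__":
    alice = Credential("/C=IT/O=Example/CN=Alice", ("/egee",))
    service = Acl({"/egee": {"submit", "cancel"}})
    job = Acl({"/C=IT/O=Example/CN=Alice": {"cancel"}})
    print(authorize(alice, "cancel", service, job))
```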
12
Job Submission Chain
Job submission requires a job description written in a high-level language called the Job Description Language (JDL), which:
– specifies the job's characteristics
– specifies constraints on Grid resources
Job submission is performed through two operations: jobRegister and jobStart.
After jobRegister and before jobStart, all preparatory actions can be performed, e.g.:
– input sandbox upload
– input data upload and registration to catalogs
13
Job Submission Chain
jobRegister
– A job identifier (Job Id) is generated
– Server-specific attributes needed by the WMS are inserted in the JDL
– The client is mapped to a local user with LCMAPS
– The job's local directories and files are created with appropriate ownership and permissions (i.e., those of the mapped local user)
– The job is registered to the LB: final JDL + Job Id
– The Job Id is returned to the user (multiple Job Ids in case of a compound request)
Successful completion of jobRegister indicates that the job has been taken into account within the Grid environment and is ready to be actually submitted to Grid resources.
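The order of the registration steps can be sketched as follows. Everything here is an assumption made for illustration: the job-ID format, the `LB_SERVER` endpoint, the `JobId`/`SandboxDir` attribute names, the in-memory `lb_events` list standing in for Logging & Bookkeeping, and the directory layout are all hypothetical, and LCMAPS mapping is reduced to a restrictive directory mode. For a compound request, one parent ID plus one ID per sub-job is returned, while sub-job LB registration and directories are left to jobStart.

```python
import os
import tempfile
import uuid

# Hypothetical LB endpoint used as the prefix of generated job IDs.
LB_SERVER = "https://lb.example.org:9000"
# In-memory stand-in for the Logging & Bookkeeping service.
lb_events = []


def new_job_id() -> str:
    return f"{LB_SERVER}/{uuid.uuid4().hex}"


def job_register(jdl: dict, sandbox_root: str, n_subjobs: int = 0) -> list:
    """Return [parent_id] for a single job, or [parent_id, sub_id_1, ...] for a compound one."""
    parent_id = new_job_id()
    job_dir = os.path.join(sandbox_root, parent_id.rsplit("/", 1)[1])
    final_jdl = dict(jdl)
    # Hypothetical server-side attributes inserted into the JDL.
    final_jdl["JobId"] = parent_id
    final_jdl["SandboxDir"] = job_dir
    # Job-reserved area; mode 0o700 stands in for "owned by the mapped local user only".
    os.makedirs(os.path.join(job_dir, "input"), mode=0o700)
    # Register the final JDL + job ID with the (stand-in) LB.
    lb_events.append(("REGISTER", parent_id, final_jdl))
    # Sub-job IDs are only generated here; their LB registration happens at jobStart.
    return [parent_id] + [new_job_id() for _ in range(n_subjobs)]


if __name__ == "__main__":
    print(job_register({"Executable": "sim.exe"}, tempfile.mkdtemp(), n_subjobs=2))
```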
14
Job Submission Chain
jobStart
– WMProxy checks that the job has been previously registered and not yet started
– Server-specific attributes needed by the WMS are inserted in the JDL
– For compound requests:
    sub-job registration to the LB is performed
    sub-job local directories and files are created with appropriate ownership and permissions
– Input sandbox archive files (if any) are uncompressed
– The job is delivered to the WM
15
Job Submission Chain
Splitting job submission into the register + start sequence provides a kind of two-phase commit:
– registration makes the job enter the system
– the job remains registered for a configurable period of time
– starting the job tells the system that the request can actually be processed
Benefits of the jobRegister / jobStart approach:
– the user controls the individual submission phases
– in case of failure, each operation can be repeated separately
– specific time-consuming processing (e.g., for compound requests) is done only during the start operation
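The register + start handshake behaves like a small state machine. In the hypothetical sketch below (class, state and method names are not WMProxy's), a registered job expires if it is not started within a configurable window, and `start` refuses unknown, expired or already-started jobs. Those checks are what make it safe to retry a failed phase on its own. The clock is injectable so that the expiry logic can be exercised without waiting.

```python
import time


class JobStateError(Exception):
    pass


class Registry:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds      # how long a registered job waits for jobStart
        self.clock = clock          # injectable time source
        self.jobs = {}              # job_id -> {"state": ..., "registered_at": ...}

    def register(self, job_id: str) -> None:
        if job_id in self.jobs:
            raise JobStateError(f"{job_id} already registered")
        self.jobs[job_id] = {"state": "REGISTERED", "registered_at": self.clock()}

    def start(self, job_id: str) -> None:
        job = self.jobs.get(job_id)
        if job is None:
            raise JobStateError(f"{job_id} was never registered")
        if job["state"] != "REGISTERED":
            raise JobStateError(f"{job_id} is {job['state']}, not REGISTERED")
        if self.clock() - job["registered_at"] > self.ttl:
            job["state"] = "EXPIRED"
            raise JobStateError(f"{job_id} registration expired")
        job["state"] = "STARTED"


if __name__ == "__main__":
    reg = Registry(ttl_seconds=3600)
    reg.register("job-1")
    reg.start("job-1")
    print(reg.jobs["job-1"]["state"])
```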
16
Improvements
WMProxy implements several features aimed mainly at improving performance:
– Bulk submission
    Directed Acyclic Graph (DAG) submission
    Collection submission
    Parametric job submission
– Job sandboxes
    Shared input sandbox
    Archived/compressed input sandbox transfer
– Asynchronous job start operation
Significant changes and extensions to the JDL library were necessary for this purpose.
17
Improvements
A Directed Acyclic Graph (DAG) is a group of jobs with dependencies.
A Collection is a group of jobs with no dependencies.
A Parametric job is a job where some attributes in the JDL are declared to be parametric:
– a group of different jobs is created before submission
– submitted jobs are instances of the same job, where a value is assigned to the parametric attributes
– values are assigned as specified by the user through dedicated JDL attributes
18
Improvements
Bulk-submission benefits:
– One-shot submission of a (possibly very large, up to thousands of jobs) group of jobs
– Reduced submission time:
    a single call to the WMProxy server
    a single AuthN and AuthZ process
    sharing of files between jobs
– Availability of both a single Job Id to manage the group as a whole and an Id for each single job in the group
19
Improvements
A shared input sandbox is particularly useful when submitting compound jobs:
– when sub-job input sandboxes contain instances of the same file, the file is transferred once and made available by WMProxy to all involved sub-jobs
Archived/compressed ISB: input sandbox files can be uploaded to the WMS node as compressed tar files
– WMProxy takes care of decompressing/extracting the files and makes them available in the proper locations (during jobStart)
Benefits:
– less data to transfer
– fewer calls to the file transfer service
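Shared input sandbox handling boils down to de-duplication: collect every sub-job's input files, upload each distinct file once, and let each sub-job refer to the shared copy. In this sketch, "the same file" means the same normalized local path; that identity rule and the function name `plan_transfers` are assumptions, since the slides do not say how WMProxy detects shared files.

```python
import os


def plan_transfers(subjob_sandboxes: dict) -> tuple:
    """Plan a shared input-sandbox upload.

    subjob_sandboxes maps a sub-job name to the local paths in its input sandbox.
    Returns (files_to_upload, refs): each distinct path appears once in
    files_to_upload, and refs maps every sub-job to indices into that list.
    """
    index = {}                          # normalized path -> position in the upload list
    refs = {}
    for job, paths in subjob_sandboxes.items():
        refs[job] = []
        for path in paths:
            key = os.path.normpath(path)
            if key not in index:
                index[key] = len(index)  # first occurrence: schedule one upload
            refs[job].append(index[key])
    return list(index), refs


if __name__ == "__main__":
    files, refs = plan_transfers({
        "nodeA": ["/home/data/data1", "/home/data/data3.txt"],
        "nodeB": ["/home/data/data3.txt"],
    })
    print(files, refs)
```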
20
Improvements
Asynchronous job start: WMProxy can be configured to complete the processing of the jobStart request in the background
– control is returned to the client immediately after the request has been accepted
– all time-consuming actions are performed "behind the scenes" by the service
– if some of the processing fails, WMProxy logs the corresponding event to the LB, so that the user can check the result of the operation at any time by querying the LB
Benefits:
– from the user's point of view, submission time becomes (almost) independent of the number of jobs
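The asynchronous mode decouples accepting a request from processing it. The minimal sketch below uses Python threads: `start_async` records acceptance, enqueues the job and returns at once; a background worker does the slow part; and a failure becomes a logged event that the client can look up later with `query`, standing in for an LB query. The class, method and event names are hypothetical.

```python
import queue
import threading


class AsyncStarter:
    def __init__(self, process):
        self.process = process              # the time-consuming part of jobStart
        self.requests = queue.Queue()
        self.events = {}                    # job_id -> logged events (LB stand-in)
        self.lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def _log(self, job_id, event):
        with self.lock:
            self.events.setdefault(job_id, []).append(event)

    def start_async(self, job_id):
        """Accept the request and return to the caller immediately."""
        self._log(job_id, "ACCEPTED")
        self.requests.put(job_id)

    def _worker(self):
        while True:
            job_id = self.requests.get()
            try:
                self.process(job_id)
                self._log(job_id, "STARTED")
            except Exception as exc:        # logged for later queries, never raised to the client
                self._log(job_id, f"FAILED: {exc}")
            finally:
                self.requests.task_done()

    def query(self, job_id):
        with self.lock:
            return list(self.events.get(job_id, []))


if __name__ == "__main__":
    starter = AsyncStarter(lambda job_id: None)
    starter.start_async("job-1")
    starter.requests.join()                 # only to make the demo deterministic
    print(starter.query("job-1"))
```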
21
Conclusions and Future Plans
WMProxy, the Web service based interface to the gLite WMS:
– allowed the transition to SOA for the WMS
– allowed improvement of job submission performance
– allowed provision of a set of new features
Positive feedback from "early adopters":
– promising performance measurements
– good perception of usability
– ongoing close collaboration with some experiments/groups to evolve and improve the service according to users' needs
– usage in "production" within EGEE during 2006 will provide further outcomes of paramount importance for the evolution of the service
Web Service technology has proven mature enough to be adopted for developing "production-quality" components/services.
22
Conclusions and Future Plans
Future activities will focus on:
High Availability and Scalability
– Clustered architecture
– HTTPS request dispatcher
– Redundant sets of WMProxy servers
– Shared server file system
– Support for huge job collections: interaction with LB
Service Profiling
– First measurements of bulk submission performance are promising:
    gLite 1.5: ~180 s for 500 jobs
– The short-term goal is to get below ~60 s for 1000 jobs
New features:
– User authorization on a per-operation basis
– JSDL support (XML-based language for job description)
    currently a GGF recommendation; emerging as a standard
– Adherence to finalized WS-* specifications (where needed)
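Expressed per job, the figures above show the size of the targeted improvement (both figures are approximate, as the "~" indicates):

```latex
\frac{180\ \text{s}}{500\ \text{jobs}} = 0.36\ \text{s/job},
\qquad
\frac{60\ \text{s}}{1000\ \text{jobs}} = 0.06\ \text{s/job},
\qquad
\frac{0.36}{0.06} = 6
```

That is, the short-term goal amounts to roughly a six-fold reduction in per-job submission time.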
23
WMProxy Architecture
[Figure: WMProxy integration with the WMS]
24
WMProxy Architecture
[Figure: WMProxy modules]
25
WMProxy Operations
The main operations provided by WMProxy are:
– jobListMatch: finds resources that match the job requirements
– jobRegister: registers a job for submission
– jobStart: starts a previously registered job
– jobSubmit: one-shot job submission (job registration + job start)
– jobCancel: cancels a previously submitted job
– jobPurge: cleans up the job's reserved area on the WMS node
– getOutputFileList: returns the URIs of the job's output files
– getSandboxDestURI: returns the URI of the location where the job's input sandbox files have to be uploaded
– getPerusalFiles: allows inspection of the job's files while the job is running
– getFreeQuota: gets the disk space available to the user for job sandboxes on the WMS node
– get/add/removeACLItem[s]: handle the entries of the job's GACL file
26
AuthN, AuthZ and Delegation
Delegation is the process used to transfer rights and privileges to another party.
[Figure: Credential Delegation sequence diagram]
27
Job Submission Chain
After job registration and before job start, all preparatory actions needed for the job to run can be performed, e.g.:
– input sandbox upload
– input data upload and registration to catalogs
Sandbox management is under the control of WMProxy:
– WMProxy creates the area on the WMS node where the job's sandboxes have to be stored
– WMProxy provides the URI of the job's area location
Supported file transfer protocols are:
– gridFTP: the WMS installation includes a gridFTP server
– HTTPS: provided by the Apache + GridSite installation for WMProxy
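Packing the input sandbox into one compressed archive before upload, and unpacking it into the job's area on the server, can be sketched with Python's standard `tarfile` module. The function names are hypothetical and the transfer itself (gridFTP or HTTPS) is not shown. The extraction step rejects archive members that would land outside the job's directory, a precaution any server unpacking user-supplied archives needs, and uses the `"data"` extraction filter on Python versions that provide it.

```python
import os
import tarfile
import tempfile


def pack_sandbox(files: list, archive_path: str) -> None:
    """Client side: bundle the input sandbox files into one gzip-compressed tar."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            tar.add(path, arcname=os.path.basename(path))


def unpack_sandbox(archive_path: str, dest_dir: str) -> list:
    """Server side: extract into the job's input area, refusing paths that escape it."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = os.path.realpath(os.path.join(dest, member.name))
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError(f"unsafe member path: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: also block links, devices, etc.
            tar.extractall(dest, members=members, filter="data")
        else:
            tar.extractall(dest, members=members)
    return sorted(member.name for member in members)


def demo() -> list:
    """Round trip: one input file, packed on the 'client', unpacked on the 'server'."""
    work = tempfile.mkdtemp()
    src = os.path.join(work, "data.txt")
    with open(src, "w") as f:
        f.write("input data\n")
    archive = os.path.join(work, "isb.tar.gz")
    pack_sandbox([src], archive)
    return unpack_sandbox(archive, os.path.join(work, "job", "input"))
```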
28
Job Submission Chain
[Figure: Job Submission sequence diagram]
29
Improvements
A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs:
– the jobs are nodes (vertices) in the graph
– the edges (arcs) identify the dependencies
– DAG features:
    shared sandboxes
    attribute inheritance
    attribute references between nodes and with the "parent" job
[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeE]
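Because the dependencies form a directed acyclic graph, any valid execution order of its nodes is a topological order. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to group nodes into "waves" whose parents have all completed, and to reject cyclic input. The edges used in the usage example are hypothetical, since the slide's figure does not survive in this transcript, and reading each pair `(a, b)` as "b runs after a" is an assumed interpretation of the JDL `Dependencies` pairs. This is not a description of how the WMS itself schedules DAG nodes.

```python
from graphlib import CycleError, TopologicalSorter


def submission_order(nodes, dependencies):
    """Group nodes into waves; every node in a wave has all of its parents in earlier waves."""
    ts = TopologicalSorter({n: set() for n in nodes})
    for parent, child in dependencies:
        ts.add(child, parent)            # child depends on parent
    ts.prepare()                         # raises CycleError if the graph is not acyclic
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves


if __name__ == "__main__":
    nodes = ["nodeA", "nodeB", "nodeC", "nodeD", "nodeE"]
    edges = [("nodeA", "nodeB"), ("nodeA", "nodeC"), ("nodeB", "nodeD"),
             ("nodeC", "nodeD"), ("nodeD", "nodeE")]       # hypothetical edges
    print(submission_order(nodes, edges))
```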
30
Improvements
Example DAG description in JDL:
    [
      Type = "dag";
      VirtualOrganisation = "EGEE";
      InputSandbox = { "/home/data/data1", "/home/data/data2" };
      Nodes = [
        nodeA = [
          Description = [
            Executable = "first.exe";
            InputSandbox = { "/home/data/data3.txt", root.InputSandbox };
            OutputSandbox = { "nodeAoutput.txt" };
            // (…)
          ];
        ];
        nodeB = [
          Description = [
            Executable = "second.exe";
            InputSandbox = "/home/data/data3.txt";
            OutputSandbox = root.nodes.nodeA.description.OutputSandbox[0];
            // (…)
          ];
        ];
        Dependencies = { { nodeA, nodeB } };
      ];
    ]
[Figure: two-node DAG with nodeA and nodeB]
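References such as `root.InputSandbox` or `root.nodes.nodeA.description.OutputSandbox[0]` are paths into the enclosing description, and the mixed capitalization (`Nodes`/`nodes`, `Description`/`description`) works because ClassAd attribute names are case-insensitive. The resolver below is a hypothetical sketch over nested Python dicts, not the gLite JDL library: it walks the dotted path with case-insensitive lookups and applies an optional list index at each step.

```python
import re

# One path step: an attribute name with an optional list index, e.g. "OutputSandbox[0]".
_STEP = re.compile(r"^(\w+)(?:\[(\d+)\])?$")


def _lookup(record: dict, name: str):
    """Case-insensitive attribute lookup, as in ClassAds."""
    for key, value in record.items():
        if key.lower() == name.lower():
            return value
    raise KeyError(name)


def resolve(root: dict, reference: str):
    """Resolve a reference such as 'root.nodes.nodeA.description.OutputSandbox[0]'."""
    head, *steps = reference.split(".")
    if head.lower() != "root":
        raise ValueError("reference must start with 'root'")
    current = root
    for step in steps:
        match = _STEP.match(step)
        if match is None:
            raise ValueError(f"malformed step: {step!r}")
        current = _lookup(current, match.group(1))
        if match.group(2) is not None:
            if not isinstance(current, list):
                raise TypeError(f"{match.group(1)} is not a list")
            current = current[int(match.group(2))]
    return current


if __name__ == "__main__":
    dag = {"Type": "dag", "InputSandbox": ["/home/data/data1", "/home/data/data2"],
           "Nodes": {"nodeA": {"Description": {"OutputSandbox": ["nodeAoutput.txt"]}}}}
    print(resolve(dag, "root.nodes.nodeA.description.OutputSandbox[0]"))
```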
31
Improvements
Collection submission allows submission of a group of jobs with no dependencies:
– the JDL description of a Collection basically consists of a list of JDL descriptions
– a JDL conversion is performed by the WMProxy server to make it "digestible" by the WMS
– the same features as for DAGs are available:
    shared sandboxes
    attribute inheritance
    attribute references between nodes and with the "parent" job
32
Improvements
Example Collection description in JDL:
    [
      Type = "collection";
      VirtualOrganisation = "EGEE";
      Nodes = {
        [
          JobType = "normal";
          Executable = "job1.exe";
          // (…)
        ],
        [
          JobType = "normal";
          Executable = "job2.exe";
          // (…)
        ],
        // (…)
        [
          JobType = "normal";
          Executable = "jobn.exe";
          // (…)
        ]
      };
    ]
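One way to picture attribute inheritance in a collection is to copy each top-level attribute (here `VirtualOrganisation`) into every node that does not define it itself, producing standalone job descriptions. This is a hypothetical sketch: the slides do not list which attributes are inherited or how conflicts are resolved, so the "node value wins" policy and the exclusion of `Type` and `Nodes` are assumptions.

```python
def expand_collection(collection: dict) -> list:
    """Return one standalone job description per node, with inherited parent attributes."""
    # Attributes describing the collection itself, assumed not to be inherited.
    not_inherited = {"type", "nodes"}
    parent = {k: v for k, v in collection.items() if k.lower() not in not_inherited}
    jobs = []
    for node in collection["Nodes"]:
        defined = {k.lower() for k in node}       # JDL attribute names are case-insensitive
        job = {k: v for k, v in parent.items() if k.lower() not in defined}
        job.update(node)                          # the node's own values take precedence
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    print(expand_collection({
        "Type": "collection",
        "VirtualOrganisation": "EGEE",
        "Nodes": [{"JobType": "normal", "Executable": "job1.exe"},
                  {"JobType": "normal", "Executable": "job2.exe"}],
    }))
```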
33
Improvements
In Parametric job submission some attributes in the JDL are declared to be parametric:
– a JDL conversion is performed by the WMProxy server to make it "digestible" by the WMS
– submitted jobs are instances of the job described in the JDL, where a value is assigned to the parametric attributes
– values are assigned as specified by the user through dedicated JDL attributes
34
Improvements
Parametric job description:
    [
      Type = "job";
      JobType = "parametric";
      Executable = "sim.exe";
      VirtualOrganisation = "EGEE";
      StdInput = "input_PARAM_.txt";
      StdOutput = "output_PARAM_.txt";
      Parameters = 10;
      ParameterStart = 1;
      ParameterStep = 1;
      InputSandbox = { "file:///home/user/sim.exe", "file:///home/user/data/input_PARAM_.txt" };
      OutputSandbox = "output_PARAM_.txt";
      // (…)
    ]
Resulting jobs, for i = 1, 2, …, 10 (the i in the file names stands for the parameter value):
    [
      Type = "job";
      JobType = "normal";
      Executable = "sim.exe";
      VirtualOrganisation = "EGEE";
      StdInput = "inputi.txt";
      StdOutput = "outputi.txt";
      InputSandbox = { "file:///home/user/sim.exe", "file:///home/user/data/inputi.txt" };
      OutputSandbox = "outputi.txt";
      // (…)
    ]
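The expansion from one parametric description to many job descriptions amounts to substituting each parameter value for the `_PARAM_` placeholder in every string attribute. The sketch takes the list of values explicitly. How WMProxy derives that list from `Parameters`, `ParameterStart` and `ParameterStep` (including whether the upper bound is inclusive) is deliberately not modelled. Dropping those three attributes and turning each instance into an ordinary (`"normal"`) job are assumptions of this sketch.

```python
PARAMETRIC_KEYS = {"parameters", "parameterstart", "parameterstep"}


def _subst(value, param):
    """Replace _PARAM_ in strings, recursing into lists; leave other values untouched."""
    if isinstance(value, str):
        return value.replace("_PARAM_", str(param))
    if isinstance(value, list):
        return [_subst(v, param) for v in value]
    return value


def expand_parametric(template: dict, values) -> list:
    jobs = []
    for param in values:
        job = {k: _subst(v, param) for k, v in template.items()
               if k.lower() not in PARAMETRIC_KEYS}
        job["JobType"] = "normal"        # assumption: each instance becomes an ordinary job
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    template = {"Type": "job", "JobType": "parametric", "Executable": "sim.exe",
                "StdInput": "input_PARAM_.txt", "StdOutput": "output_PARAM_.txt",
                "Parameters": 10, "ParameterStart": 1, "ParameterStep": 1}
    for job in expand_parametric(template, range(1, 4)):
        print(job["StdInput"], job["StdOutput"])
```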
35
Improvements
[Figure: Asynchronous jobStart sequence diagram]
36
Improvements
Job File Perusal allows inspection of the content of a job's files while the job is running:
– a process running on the WN (worker node) transfers chunks of selected job files to the WMS node or to specified URIs
– file selection can be specified/changed/removed by the user through a specific WMProxy operation
– file perusal can be started/stopped at any time after job submission
Benefits:
– the user can follow the job's progress
– early detection of incorrect job behavior can avoid a considerable waste of resources
– faster turnaround of debug sessions, trial runs and other kinds of tests
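Shipping successive chunks of a file that is still being written can be sketched as an offset-tracking reader. This hypothetical Python sketch covers only the worker-node side: `PerusalReader` is not a gLite component, and sending each chunk to the WMS node or to a user-specified URI is left to the caller.

```python
import os
import tempfile


class PerusalReader:
    """Returns only the bytes appended to a file since the previous call."""

    def __init__(self, path: str, max_chunk: int = 64 * 1024):
        self.path = path
        self.offset = 0               # bytes already shipped
        self.max_chunk = max_chunk    # upper bound on the size of one chunk

    def next_chunk(self) -> bytes:
        if not os.path.exists(self.path):
            return b""                # the job has not created the file yet
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read(self.max_chunk)
        self.offset += len(data)
        return data


def demo() -> list:
    """Simulate a job appending to its output while the reader polls it."""
    path = os.path.join(tempfile.mkdtemp(), "stdout.log")
    reader = PerusalReader(path)
    chunks = [reader.next_chunk()]    # file does not exist yet
    for line in ("step 1\n", "step 2\n"):
        with open(path, "a") as f:
            f.write(line)
        chunks.append(reader.next_chunk())
    chunks.append(reader.next_chunk())  # nothing new since the last call
    return chunks
```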