1 WS-PGRADE Tutorial
MTA SZTAKI Laboratory of Parallel and Distributed Systems (LPDS)
M. Kozlovszky, Research fellow
2 Contents
- History, family of products
  - P-GRADE Portal, WS-PGRADE, gUSE
- WS-PGRADE in a nutshell
  - Architecture, high level structures, Workflow handling
- WS-PGRADE features
  - Parameter Study (PS) support in WS-PGRADE
    - Generator-job-collector type PS
    - Xconnect-DOTconnect type PS
  - Repository
  - Internal Storage (home dir)
  - Automated workflow submission & Timing
  - Web services
  - Embedded Workflows
  - DB support
  - Conditional job submission
- Howto use WS-PGRADE (Basic + Advanced tasks)
- Case studies
  - CancerGrid portal, TINKER application
3 Family of P-GRADE Portal products
- P-GRADE portal
  - Creating (basic) workflows and parameter sweeps for clusters, service grids, desktop grids
- P-GRADE/GEMLCA portal (University of Westminster)
  - To wrap legacy applications into Grid Services
  - To add legacy code services to P-GRADE Portal workflows
- WS-PGRADE
  - Creating complex workflows and parameter sweeps for clusters, service grids, desktop grids, databases
  - Creating complex applications using embedded workflows, legacy codes and community components from the workflow repository
4 Motivations of creating gUSE
To overcome (most of) the limitations of the P-GRADE portal:
- To provide better modularity, so that any service can be replaced
- To improve scalability to millions of jobs
- To enable advanced dataflow patterns
- To interface with a wider range of resources
- To separate the Application Developer view from the Application User view
Result: the WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment) and gUSE (Grid User Support Environment) architecture
5 References and WS-PGRADE/gUSE installations
WS-PGRADE Portal service is available for:
- GILDA - Training VO of EGEE and other projects
- SEE-GRID - South-Eastern European Grid
- VOCE - Virtual Organization Central Europe of EGEE
- HunGrid - Hungarian Grid VO of EGEE
- NGS - UK National Grid Service
- Desktop Grids: SZTAKI DG
Users and projects using WS-PGRADE/gUSE:
- EDGeS project (Enabling Desktop Grids for e-Science)
  - Integrating EGEE with BOINC and XtremWeb technologies
  - User interfaces and tools
- ProSim project
  - In silico simulation of intermolecular recognition
  - JISC ENGAGE program
- University of Westminster Desktop Grid
  - Using AutoDock on institutional PCs
- CancerGrid project
  - Predicting various properties of molecules to find anti-cancer leads
  - Creating a science gateway for chemists
6 WS-PGRADE in a nutshell
- General purpose, workflow-oriented portal. Supports the development and execution of workflow-based applications
- Based on GridSphere
- Solves service+desktop Grids/clusters interoperability problems at workflow level
Services supported by the portal (table comparing P-GRADE with WS-PGRADE + gUSE):
- Job execution: EGEE grids (LCG2, gLite), Globus grids (GT2, GT4), Desktop grids, clusters
- File storage
- Certificate management
- Information system
- Brokering
- Job monitoring
- Workflow & job visualization
New functionalities:
- Web services
- DB connectors
- Embedded workflows
- Job level PS
- Conditional jobs
- Recursive graph
- Multi-generator, Multi-collector
- CROSS product PS, DOT product PS
7 WS-PGRADE architecture
- Graphical User Interface: WS-PGRADE (GridSphere portlets)
- Autonomous Services: high level middleware service layer (gUSE)
  - Workflow Engine, Workflow storage, File storage, Application repository, Logging, gUSE information system, gUSE Meta-broker, Submitters
- Resources: middleware service layer
  - Local resources, service grid VOs, Desktop Grid resources, Web services, Databases
8 Important high-level graph structures in WS-PGRADE
- Graph: Jobs, Edges, Ports
- Concrete Workflow: Algorithms, executable Resource references, Inputs
- Template: Constraints, Comments, Form Generators
- Workflow Instance: Running state, Outputs
- Repository Item: Application OR Project OR Workflow part (G, T, CW)
Legend (two arrow styles drawn from a to b): a must reference b; a may reference b
9 Workflow handling by Developer
Objects: Graph (Jobs, Edges, Ports); Graph2 (Jobs, Edges, Ports); Concrete Workflow (Algorithms, Resource references, Inputs); Template (Constraints, Comments, Form Generators); Workflow Instance (Running state, Outputs); Repository Item (Application OR Project OR Workflow part (G, T, CWI))
Operations shown in the diagram: New, Edit, Copy, Delete; New, Configure, Copy, Delete; New, Submit; Export, Import; Observe, Download, Suspend, Delete; Edit
1. Create and edit the Graph structure
2. Add a name to the graph and configure it (job, port, resources, etc.)
3. Submit the Concrete Workflow, observe its status and fetch its result
4. Restrict some parameters/features of the workflow and create a template
5. Add a name to the template and configure it (job, port, resources, etc.)
6. Export the workflow, template or graph to the repository (to be reused later by you or by end users)
7. Reuse an Application, Template, Graph or Workflow from the repository
8. Redefine the originally used graph within the workflow
10 Workflow handling by end user
Objects: Concrete Workflow (Algorithms, Resource references, Inputs); Template (Constraints, Comments, Form Generators); Workflow Instance (Running state, Outputs); Repository Item (Application OR Project OR Workflow part (G, T, CW, WI))
Operations shown in the diagram: Import; Configure, Delete; Submit; Observe, Download, Suspend, Delete
1. Import an Application/Template
2. Set the available parameters on the end-user View
3. Submit the Concrete Workflow, observe its status and fetch its result
12 Parameter Study (PS) support in WS-PGRADE
- Generator-job-collector type PS
  - Not exactly the same concept as in P-GRADE
  - P-GRADE: workflow level PS
  - WS-PGRADE: job level PS
- Xconnect-DOTconnect type PS
  - X (cross) product
  - . (dot) product
13 Generator-job-collector type PS
- A Generator type job has at least one multiple output port.
- Output Number/Max Size: a variable, port-specific number attribute.
- If a single run produces fewer files than Output Number, the generated files are reused cyclically in the subsequent jobs.
- If the number of files exceeds Output Number, the excess files are not used.
[Figure: three examples of a Generator with a *K multiple output port, showing how the files of one generator run feed the 1st, 2nd and 3rd run of the following job]
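The Output Number rule above can be illustrated with a minimal Python sketch. It is not gUSE code: the function name `select_generator_outputs` and the file names are hypothetical, and the sketch assumes that "cyclically" means wrapping around to the first generated file.

```python
from typing import List


def select_generator_outputs(produced: List[str], output_number: int) -> List[str]:
    """Return exactly `output_number` files for the jobs that follow a generator.

    Assumption (illustration only): fewer files than Output Number are reused
    cyclically, and files beyond Output Number are ignored.
    """
    if output_number < 1:
        raise ValueError("Output Number must be at least 1")
    if not produced:
        raise ValueError("the generator produced no files")
    # Index i wraps around when the generator produced too few files;
    # when it produced too many, indexes beyond output_number are never reached.
    return [produced[i % len(produced)] for i in range(output_number)]


if __name__ == "__main__":
    print(select_generator_outputs(["out_0", "out_1"], 5))
    print(select_generator_outputs(["out_0", "out_1", "out_2", "out_3"], 3))
```

The first call shows two files stretched over five slots, the second shows four files cut down to three.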
14 Xconnect-DOTconnect type PS
Configuring the Workflow: Overview
- Determine the number of accepted files on free input Ports.
- Make a Job a Generator by defining a Multiple output port. In this case the job may produce more than one file on the multiple output port within one job submission step.
- Determine the Dot or Cross product relation of the Input ports to define the number of job submissions.
- Make a Job a Collector by defining a Gathering Input Port. The Job execution is postponed until all input files to that Port have arrived, so that they can be processed in a single job submission step.
[Figure: example workflow with file counts h, m, n, *K and 1 on its ports. Legend: Cross Product, Dot Product]
15 Submitting the Workflow: Overview (animation of the number of generated output files)
- In case of a Generator job the number of job submissions may differ from the number of files on the Output Ports.
- In case of cross product an individual Job submission is generated for each possible input file combination.
- In case of dot product the Job is submitted with input files having a common index number in each of the input Ports.
[Figure: the example workflow annotated with file counts (h, m, n, K, 1) and job submission counts (m*n, h*K, m*n*h*K, h*K*K, S = max(m*n, h*K))]
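The difference between the two relations can be sketched in a few lines of Python. This is an illustration, not the gUSE implementation: the function names and file names are hypothetical. The slide only states that a dot product pairs files with a common index and gives S = max(...) as the submission count; reusing the shorter port cyclically to reach that count is an assumption of this sketch.

```python
from itertools import product
from typing import Dict, List, Tuple


def cross_product_submissions(ports: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """One job submission per possible combination of input files."""
    names = sorted(ports)
    return list(product(*(ports[name] for name in names)))


def dot_product_submissions(ports: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """One job submission per common index number across the input ports.

    Assumption: the number of submissions is the maximum file count, and a
    port holding fewer files is reused cyclically (padding rule not confirmed
    by the slides).
    """
    names = sorted(ports)
    if any(not ports[name] for name in names):
        raise ValueError("every input port needs at least one file")
    count = max(len(ports[name]) for name in names)
    return [
        tuple(ports[name][i % len(ports[name])] for name in names)
        for i in range(count)
    ]


if __name__ == "__main__":
    inputs = {"port0": ["a0", "a1"], "port1": ["b0", "b1", "b2"]}
    print(len(cross_product_submissions(inputs)), "cross product submissions")
    print(len(dot_product_submissions(inputs)), "dot product submissions")
```

With 2 files on one port and 3 on the other, the cross product yields 2*3 submissions and the dot product yields max(2, 3) submissions.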
16 Repository usage & services
Share the developed items with others:
- Graphs
- Workflows - Configured graphs
- Templates - Restricted workflows
- Applications - Semantically tested, trusted Workflows containing the definitions of any referenced Embedded Workflows (including their transitive closures)
- Projects - Applications which are not ready/fully functional yet
Build up a science/application gateway for end users:
- Restrict certain workflow parameters
- Provide an easy workflow configuration/parameter setup interface
- The repository acts as an interface between developers and end users
Note: Exported/imported “Projects” are not checked for completeness and correctness; “Applications” are checked. Name collisions are checked and handled by the system.
17 Repository Export
Step 1: Select the Export button of the requested WF
Step 2: Decide on the export type (in case of Application and Project the Embedded Workflows are exported as well)
Step 3: Enter free text; it appears in the Import List and describes and identifies the object for the user
Step 4: Execute the Export
Exporting a Graph or a Template works similarly, except that “Export type” cannot be selected there
18 Repository import views
- Materials are classified by their type: Application, Project, Concrete, Template, Graph
- Lists of the latest available items
- Button to import
19 Internal Storage (user workspace)
- Columns of individual instances; please note that outputs can be downloaded separately
- Inner slider to browse and access each Instance
- Columns of bulk download: all, or selected parts, of all instances of a given WF can be downloaded
- Information about the quota: the storage capacity allotted to the user on the Portal server
20 Upload Implementation to the internal Storage
Step 1: Select the compressed file on the client machine containing the requested Workflow
Step 2 (optional): Check the kind of name(s) you want to redefine
Step 3 (if Step 2 was performed): Enter the new name(s), which must not collide with existing ones
Step 4: Confirm the operation
21 Workflow submission
A workflow can be submitted in 3 different ways:
1. Started interactively by the user pressing the Submit button of the given concrete workflow in the portlet Workflow/Concrete (default)
2. Started by a crontab-like predefined time schedule. The corresponding timetable can be set in the portlet Workflow/Timing
3. Started by an external event. The event to wait for and the name of the Workflow can be defined in the portlet Workflow/Remoting
22 Workflow Submission (1): Interactively by the user
Step 1: The workflow is selected with the “Submit” button
Step 2: The submission can be confirmed or refused after optionally filling in a free description field that identifies the Workflow Instance for the user
23 Workflow Submission (2): Start automatically by an internal timer
- Start a workflow at a predefined time
- A valid certificate should be available (if a grid is used)
- Works like crontab
- Single run
Screen elements:
- List of scheduled Workflows
- Definition of a new element to be added
  - Checklist to select an existing Workflow to be added
  - Tool to define the submission time
  - Confirm button to append the new element to the list
- An item can be revoked by deleting it from the list
- Items leave the list when the scheduled time expires, even if the Workflow submission has failed for any reason
24 Workflow Submission (3): Start by external event
Two parts must be defined:
- On the Portal Server Side
  - the name of the Workflow
  - the identifier “Remote Key”
- On the Client Side, inside a WSDL description
  - the URN of the service call
  - the name of the service
  - the owner of the workflow (user)
  - the identifier “Remote Key”
  - a free string to be associated with the Workflow Instance
Note that a Service call referencing a Remote Key that is used in more than one Portal Server Side description submits all the associated Workflows
25 Workflow Submission (3): Start by external event - Portal Side
- Name of the WF to be appended to the list of remotely callable WFs; the check list can select any of the available workflows
- Definition of the Keyword that identifies the call
- Append button
- List of listening Workflows
- Button to see a stored Keyword
- Button to hide the stored Keyword and the Delete button
- Any Workflow can be deleted from the listeners. The Delete button can be accessed only in the not hidden state
26 Workflow Submission (3): Start by external event - Client side: WSDL
- URL of the Portal SERVER
- Predefined name of the Service
- pUser: Owner of the Workflow (Portal User)
- pID: Remote Key that has been defined to identify the call
- pText: Free string to identify the Workflow instance
The client should provide a Service call conforming to the WSDL.
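The Remote Key lookup on the portal side can be pictured with a small in-memory model. This is only a sketch of the rule that one call submits every Workflow registered under the same key. The class `RemoteListeners` and its methods are hypothetical; the argument names mirror the pUser, pID and pText fields of the slide, and no real portal service or WSDL is contacted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class RemoteListeners:
    """Toy model of the list of listening Workflows on the portal side."""

    def __init__(self) -> None:
        # (owner, remote key) -> names of the Workflows waiting for that call
        self._workflows: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def append(self, owner: str, workflow: str, remote_key: str) -> None:
        """Register a Workflow as remotely callable under a Remote Key."""
        self._workflows[(owner, remote_key)].append(workflow)

    def call(self, p_user: str, p_id: str, p_text: str) -> List[Tuple[str, str]]:
        """Simulate a service call; return (workflow, instance label) pairs.

        Every Workflow registered by p_user under the key p_id is submitted,
        and p_text labels each resulting Workflow Instance.
        """
        return [(wf, p_text) for wf in self._workflows.get((p_user, p_id), [])]


if __name__ == "__main__":
    listeners = RemoteListeners()
    listeners.append("alice", "wf_preprocess", "nightly")
    listeners.append("alice", "wf_report", "nightly")
    print(listeners.call("alice", "nightly", "triggered by external event"))
```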
27 Web service support
Principle: Job is a web service. The user can reach an existing remote service through the following attributes:
1. Type: Base standard type of the Web Service. The administrator of the portal server sets the list of standards the portal can understand. The default value is Axis.
2. Services: Selection list of the services of the given type that have been explored by the Portal Server
3. Methods: Selection list of the functions the selected service implements
[Figure: configuration hierarchy: Concrete WF List, Selected WF, Selected Job, Job Executable (Job is Workflow / Job is Service / Job is Binary), Job Inputs & Outputs, JDL/RSL, Job Config. History]
28 Web service support (contd.)
Step 1. Select Service as Job interpretation class
Step 2. Select the type of Service to be understood
Step 3. Select one of the found services
Step 4. Select a Method among the interface routines of the Service
29 Web service support - Parameter I/O
Rule: The sequence of port identifiers of a calling Job must correspond to the parameter sequence of a published Service.
- The parameter list of the service must be known by the user (the WSDL tag “parameterOrder” defines the order of parameters)
- Ports ordered by “Port Number” are associated with parameters ordered by “parameterOrder”.
Example: The Job “job” may be the invocation of the Service S with the following parameter list:
- First parameter: INPUT (corresponding to Port 0)
- Second parameter: OUTPUT (corresponding to Port 1)
- Third parameter: INPUT (corresponding to Port 2)
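The port-to-parameter rule amounts to sorting the ports by Port Number and pairing them positionally with the parameterOrder list. The following Python sketch shows that pairing for the example above. The function name, the dictionary layout of a port and the parameter names are hypothetical, since the slide does not name the parameters of Service S.

```python
from typing import Dict, List


def bind_ports_to_parameters(
    ports: List[Dict[str, object]], parameter_order: List[str]
) -> Dict[str, Dict[str, object]]:
    """Associate ports (ordered by Port Number) with service parameters
    (ordered as in the WSDL parameterOrder attribute)."""
    ordered_ports = sorted(ports, key=lambda port: port["number"])
    if len(ordered_ports) != len(parameter_order):
        raise ValueError("the Job needs exactly one port per service parameter")
    return dict(zip(parameter_order, ordered_ports))


if __name__ == "__main__":
    # Ports of the Job "job" from the example, listed out of order on purpose.
    job_ports = [
        {"number": 2, "kind": "INPUT"},
        {"number": 0, "kind": "INPUT"},
        {"number": 1, "kind": "OUTPUT"},
    ]
    # Hypothetical parameter names standing in for the parameterOrder of S.
    binding = bind_ports_to_parameters(job_ports, ["first", "second", "third"])
    for parameter, port in binding.items():
        print(parameter, "-> Port", port["number"], port["kind"])
```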
30 Embedded workflow support
Principle: Job is an embedded workflow
- To ensure the compatibility of interfaces, the embedded workflow must be defined by a Template
- In the original (calling) Workflow a dummy job stands in for the embedded workflow: its execution is replaced by the call of the embedded one
[Figure: Original Workflow and Embedded Workflow]
31 Embedded workflow support: Focus on the caller Job
The focus has been set by selecting a job
Step 1. Select (embedded) Workflow as Job interpretation class
Step 2. Select the Embedded (called) Workflow from the check list showing the possible templated Workflows
32 Embedded Workflow support (passing input parameter)
- The blue line indicates that the focus is on the given port of the caller
- The radio button should be set to “Yes” if we want to connect the given port of the caller to a port of the called workflow. Otherwise the input file is directed to “/dev/null”
- The checklist permits the selection of one of the permitted ports of the embedded Workflow
33 Embedded Workflow support (passing output parameter)
- The blue line indicates that the focus is on the given port of the caller
- The radio button should be set to “Yes” if we want to connect the given port of the caller to a port of the called workflow. Otherwise the output file is not directed to the output of the caller
- The checklist permits the selection of one of the permitted ports of the embedded Workflow
34 DB support (Datasource: SQL Database)
Example database (.mdb) with the tables Departments and Person:

Person
Age | Gender | Name
24  | M      | L
26  | F      | K
15  | M      | A

Port configuration fields shown: External File name, SQL, Upload, Remote, Value; for SQL: URL (UDBC), USER, PASSWORD and the query, e.g. SELECT Age from Person where Gender =“M”
Step 1 (offline): Create the Database on a remote site
Step 2 (online): Port configuration: set the Query
Step 3 (online): Port configuration: File generation from the result set
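The three steps can be imitated locally with Python's built-in `sqlite3` module. This is a stand-in, not the portal's mechanism: the portal connects to a remote database through the URL, USER and PASSWORD fields, whereas the sketch uses an in-memory SQLite database. The function names are hypothetical, the rows are the sample Person rows as read from the slide, and the one-value-per-line file layout is an assumption.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Step 1 stand-in: create the Person table with the slide's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (Age INTEGER, Gender TEXT, Name TEXT)")
    conn.executemany(
        "INSERT INTO Person (Age, Gender, Name) VALUES (?, ?, ?)",
        [(24, "M", "L"), (26, "F", "K"), (15, "M", "A")],
    )
    conn.commit()
    return conn


def result_set_to_file_text(conn: sqlite3.Connection, query: str) -> str:
    """Steps 2 and 3 stand-in: run the port's query and render the result set
    as the text of the input file (one row per line, assumed layout)."""
    rows = conn.execute(query).fetchall()
    return "\n".join(" ".join(str(value) for value in row) for row in rows)


if __name__ == "__main__":
    connection = build_sample_db()
    text = result_set_to_file_text(
        connection, "SELECT Age FROM Person WHERE Gender = 'M'"
    )
    print(text)
    connection.close()
```

The text returned by `result_set_to_file_text` is what a job would then receive as the content of that input port.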
35 Conditional job submission
Rule: A boolean conditional expression may be attached to any input port. It is evaluated on the value(s) of the referenced port(s). A false value inhibits the execution of the actual and subsequent jobs. The state of the inhibited job is indicated as “Term_false”; any subsequent jobs remain in “Init” state.
Expression syntax: the operator is one of == | != | contain. The left operand is the file value (regarded as a string) associated with the current port. The right operand is either a string or the string content of the file on a different port.
The comparison results in true or false depending on the string operation, where == means equal, != means not equal, and contain means that the left operand contains the right operand.
An AND operation is assumed when more than one port has a conditional expression: with conditions B1 and B2, the job is not executed if B1 AND B2 = FALSE
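The evaluation rule is simple enough to sketch in Python. The function names are hypothetical and the file contents are passed in as plain strings; only the three operators and the AND combination come from the slide.

```python
from typing import Iterable, Tuple

# (value of the current port, operator, other operand)
Condition = Tuple[str, str, str]


def evaluate_condition(left: str, operator: str, right: str) -> bool:
    """Compare the port's file value with the other operand as strings."""
    if operator == "==":
        return left == right
    if operator == "!=":
        return left != right
    if operator == "contain":
        # True when the left operand contains the right operand.
        return right in left
    raise ValueError(f"unknown operator: {operator}")


def job_may_run(conditions: Iterable[Condition]) -> bool:
    """AND over all ports that carry a conditional expression."""
    return all(evaluate_condition(*condition) for condition in conditions)


def job_state(conditions: Iterable[Condition]) -> str:
    """State names follow the slide; 'Submit' is a placeholder label."""
    return "Submit" if job_may_run(conditions) else "Term_false"


if __name__ == "__main__":
    b1 = ("status=OK", "contain", "OK")
    b2 = ("42", "!=", "42")
    print(job_state([b1]))
    print(job_state([b1, b2]))
```

In the second call B2 is false, so the AND of B1 and B2 is false and the job ends up in the “Term_false” state.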
36 Howto use WS-PGRADE: Basic + Advanced tasks
- Create a Graph, or reuse an existing one
- Create a Concrete Workflow from the Graph
- Set up the Concrete Workflow (ports, jobs)
- Submit and check the running Workflow
- Create a template and share it as an application through the repository
37 Graph Editor - Graph Creation
There are two ways to create a Workflow Graph (WfG):
- Open the Graph Editor (with the proper button) in the portlet Workflow/Graph
- Clone an existing Graph (save the actual Graph with a new name)
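38 Graph Editor detailed - Build up a WF Graph
- Create a New Job and a New Port
- Right click on the port and open Port Property: select Output; the port name can be changed and a comment can be inserted; close with OK
- Create another New Job and New Port
- Press the mouse button on the output port and drag to the Input Port to connect them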
40 Development cycle - Creating the Concrete Workflow
- Create the Concrete Workflow from a Graph, a Template or a different Workflow
- Give a name to the Concrete Workflow
- Give notes
42 Setup Concrete Workflow
Set up all the job properties:
- Execution model: Workflow, Service, Binary
- Type: GEMLCA, GLITE, DG, GT-4, GT-2, Local
- Grid resource
- Type of binary: SEQ, MPI, Java
- Number of MPI Nodes
- Executable
- Additional parameters
Set up all the port properties
43 Workflow configuration Hierarchy
[Figure: Concrete WF List, Selected WF, Selected Job, Job Executable (Job is Workflow / Job is Service / Job is Binary), Job Inputs & Outputs, JDL/RSL, Job Config. History]
44 Workflow Configuration: step-by-step
- Select Configure
- Select a job by mouse click
- Fill in the job properties. Details have been discussed previously.
- Select Port Property Configuration
- Fill in the port properties. Details have been discussed previously.
- Select JDL/RSL Configuration
- Select one of the JDL/RSL Configuration Parameters from the list box
- Insert a definition
- Confirm the settings
- Close the configuration of this job
- Save & Upload the Workflow configuration. Watch for any error messages!
- Return to the main view
Note the inner slider: by moving it you can reach, and make visible, any Port of the current job
46 Basic Workflow management with WS-PGRADE Portal
[Figure: navigation map of the portal views and actions]
- Views: Concrete WF List, Workflow History File, Selected Job (Job Exec Conf, Port Configurations, JDL/RSL file), Workflow Instance List, Job List, Job Instance List, WF visualization
- Actions: Info, Submit, Delete/Abort All, Configure, Details, Short Details, Suspend/Resume/Delete, Visualize, View Contents, Locate Item in graph
- Job Instance files: Output File, STD Error File, STD Out File, Log File
- Job configuration pages: Job Executable, Job Inputs & Outputs, JDL/RSL, Job Config. History
In case of PS Workflows the list “Job Instance” may contain more than one element, with different “PID”s
48 Creating the Template
Phase 1: Select the base Workflow of the Template and determine the main rule of the creation.
Phase 2: Decide individually about the free/immutable status of each characteristic that is not already immutable.
49 Contents
- History, family of products
  - P-GRADE Portal, WS-PGRADE, gUSE
- WS-PGRADE in a nutshell
  - Architecture, high level structures
  - Workflow handling: by developers, by end users
- WS-PGRADE features
  - Parameter Study (PS) support in WS-PGRADE
    - Generator-job-collector type PS
    - Xconnect-DOTconnect type PS
  - Workflow timing
  - Web service support
  - Embedded workflows
  - DB support
- Basic Workflow management with WS-PGRADE Portal
  - Repository usage
  - Principles of Workflow management
- Case studies
  - ProSim portal, CancerGrid portal, TINKER application
50 The CancerGrid project (Case Study 1)
- EU Framework Program 6
- Title: Grid Aided Computer System For Rapid Anti-Cancer Drug Design
- Project period: January 1, 2007 - December 31, 2009 (+ extended period)
- Goals:
  - Developing focused libraries with a high content of anti-cancer leads, building models for predicting various molecule properties
  - Developing a computer system based on grid technology, which helps to accelerate and automate the in silico design of libraries for drug discovery processes
51 The CancerGrid infrastructure
[Figure: components of the infrastructure]
- Portal and DesktopGrid server: Portal (executing workflows, browsing molecules), Portal Storage, 3G Bridge, BOINC server
- Molecule database server: molecule database
- Portal jobs (Job 1, Job 2, ... Job N) become DG jobs, i.e. BOINC work units (WU 1, WU 2, ... WU N)
- DG clients from all partners: BOINC client running work units (WU X, WU Y) with GenWrapper for batch execution of the Legacy Application
- Local Resource: Local jobs running the Legacy Application
52 The CancerGrid portal (gUSE & SZTAKI DG)
Integrated components of the CancerGrid portal:
- Workflow development & configuration
- Algorithms configuration
- Workflow execution
- Molecule database browser
- Structure viewer
53 TINKER Conformer Generator (Case study 2)
P-GRADE/gUSE Workflow - background
The application uses the open-source TINKER library, which contains tools for molecular design. The TINKER molecular modeling software (written in Fortran) is a complete and general package for molecular mechanics and dynamics, with some special features for biopolymers.
54 Problem: The application to be gridified
- A combination of these tools can be used for molecular modeling (so-called QSAR studies for drug development).
- The original application was developed by a Hungarian biochemist, Ferenc Ötvös, PhD, from the Biological Research Center, Szeged, Hungary.
- Target end users are biologists or chemists who need to examine conformers with the TINKER package.
- The application generates conformers by unconstrained molecular dynamics at high temperature to overcome conformational bias, then finishes each conformer by simulated annealing and/or energy minimization to obtain reliable structures.
- The aim is to obtain conformation ensemble(s) to be evaluated by multivariate statistical modeling.
55 Problem: The application to be gridified (contd.)
The sequential application generates conformers and executes TINKER algorithms on them. It runs for about 7 days on a 2 GHz single CPU machine with 1 GB memory.
[Flowchart: START -> Read input (peptide definition: sequence, chirality) -> Generate 1000 K trajectory snapshots (Parameter File 1) -> snapshots T1 ... Ti ... Tn. For each snapshot, three branches:
- minimize (Parameter File 2) -> TM1
- 300 K dynamics (Parameter File 3) -> TD1 -> minimize (Parameter File 2) -> TDM1
- Simulated annealing (Parameter File 4) -> TSA1 -> minimize (Parameter File 2) -> TSAM1
n = ]
56 Gridification process
- In the original sequential application, the TINKER algorithms are called by Bash Unix scripts, which are also applicable in EGEE grids.
- As can be seen from the previous figure:
  - The sequential generation of conformers cannot be parallelized.
  - The rest of the application performs 5 algorithms on each conformer in 3 threads, which can run in parallel.
[Figures: the gridified P-GRADE workflow; the gridified gUSE workflow]
57 Data handling
- The input TINKER package is 4.5 MB.
- The generator job (runs for around 15 hours) creates 50 tarballs, each containing 1000 conformers and the TINKER package. Each of these files is 6.1 MB.
- Offline runtime and output of the PS jobs:
  - First (dyn300k+min300k): ~40 min, 2.2 MB
  - Second (min1000k): ~35 min, 1.1 MB
  - Third (Simann+min): ~60 min, 2.2 MB
- The final output file is around 300 MB.
- The generator and collector input and output files are stored in storage elements.
Results:
- Tested in 3 VOs (VOCE, SEE-GRID-SCI, Biomed)
- A significant speedup: execution was 7 times faster on average.
58 Thank you for your attention! Questions?