06/08/10 PBS, LSF and ARC integration Zoltán Farkas MTA SZTAKI LPDS
Outline
- Introduction
- Requirements
- PBS and LSF
- ARC
- Architecture of P-GRADE Portal runtime layer
- PBS/LSF integration
- ARC integration
- Summary
Introduction
- P-GRADE Portal already supported gLite and Globus
- ETHZ requirement:
  - Make use of PBS local clusters
  - Make use of LSF local clusters (Brutus)
  - Sometimes make use of ARC grid resources
- All of this should be integrated within P-GRADE Portal
PBS (and LSF)
- Portable Batch System (Load Sharing Facility)
- Schedules users' jobs on a cluster
- Interactive login to a submission node
- Users execute different commands:
  - qsub (bsub): submit
  - qstat (bjobs): status
  - qdel (bkill): abort
[Figure: a submission node, a scheduler node and five cluster nodes]
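The command pairs listed above can be captured in a small lookup table. The following is a minimal sketch, not the portal's actual code; the names COMMANDS and command_for are hypothetical and only the six command names come from the slide.

```python
# Lookup table for the three user operations and their PBS / LSF commands.
# COMMANDS and command_for are illustrative names, not part of P-GRADE Portal.
COMMANDS = {
    "PBS": {"submit": "qsub", "status": "qstat", "abort": "qdel"},
    "LSF": {"submit": "bsub", "status": "bjobs", "abort": "bkill"},
}


def command_for(scheduler: str, action: str) -> str:
    """Return the command-line tool that performs `action` on `scheduler`."""
    try:
        return COMMANDS[scheduler.upper()][action]
    except KeyError:
        raise ValueError(
            f"unsupported scheduler/action: {scheduler}/{action}"
        ) from None


if __name__ == "__main__":
    for scheduler in COMMANDS:
        tools = [command_for(scheduler, a) for a in ("submit", "status", "abort")]
        print(scheduler, tools)
```

Keeping the mapping in one place means the calling code can stay identical for both schedulers and only the table differs.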
ARC
- Advanced Resource Connector
- Complete grid middleware with:
  - Information system
  - Command-line clients with integrated broker
  - Data management stack (GridFTP)
- Usable through client programs:
  - Job description: xRSL
  - ngsub: submit
  - ngstat: status update
  - ngkill: cancel
  - ngget: get results
P-GRADE Portal Architecture
- Workflow Editor-related components
- Portlet-related components
- Workflow data storage
- Execution layer (see next slide)
[Figure: P-GRADE Portal architecture diagram. Labels on the P-GRADE Portal machine: Apache Tomcat servlet container, GridSphere portal framework, P-GRADE Portal portlets, Workflow Editor servlet, DAGMan. Labels on the P-GRADE Portal's filesystem: user workflow data, common workflow and job execution scripts, Globus scripts, EGEE scripts, PBS scripts. Other labels: Workflow Editor client, Globus Grid, EGEE Grid, two PBS clusters.]
LSF and PBS integration I.
- Principal idea:
  - Users can configure a remote ssh connection to submission nodes through the Settings portlet
  - Connections are established using ssh key pairs
  - Established connections are reused in order to minimize ssh connection attempts
- Connections are used on a per-user, per-resource basis:
  - a given user's connection isn't accessible to other users
  - different resources use different connections
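The per-user, per-resource reuse described above can be illustrated with a small cache keyed by the (user, resource) pair. This is a minimal sketch under stated assumptions: the class name ConnectionPool and the connect callback are hypothetical, and the callback stands in for whatever actually opens the ssh session in the portal.

```python
from typing import Callable, Dict, Tuple


class ConnectionPool:
    """Caches one connection per (user, resource) pair and reuses it."""

    def __init__(self, connect: Callable[[str, str], object]):
        # `connect` is a hypothetical factory that opens an ssh session
        # for a user to a resource; it is only called on a cache miss.
        self._connect = connect
        self._connections: Dict[Tuple[str, str], object] = {}

    def get(self, user: str, resource: str) -> object:
        """Return the cached connection, opening one only if none exists."""
        key = (user, resource)
        if key not in self._connections:
            self._connections[key] = self._connect(user, resource)
        return self._connections[key]

    def drop(self, user: str, resource: str) -> None:
        """Forget a connection, for example after it has failed."""
        self._connections.pop((user, resource), None)


if __name__ == "__main__":
    opened = []

    def demo_connect(user: str, resource: str) -> object:
        opened.append((user, resource))
        return object()

    pool = ConnectionPool(demo_connect)
    pool.get("user1", "lsf-resource-1")
    pool.get("user1", "lsf-resource-1")   # reused, no second attempt
    pool.get("user2", "lsf-resource-1")   # different user, new connection
    print(len(opened), "connection attempts")
```

Because the user is part of the key, one user's connection is never handed to another user, and each resource gets its own entry.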
LSF and PBS integration II.
[Figure: the portal machine holds a separate connection pool for User 1 and for User 2, each shown with a private/public key pair (PRIV/PUB); the pools connect to LSF resources 1, 2 and 3 and PBS resources 1 and 2.]
LSF and PBS integration III.
- Job preparation:
  - wkf_pre_LSF.sh: prepare job, wrapper, collect files
  - wkf_pre_PBS.sh: prepare job, wrapper, collect files
- Job execution:
  - wkf_LSF.sh: submit and observe job using b* commands
  - wkf_PBS.sh: submit and observe job using q* commands
- Wrappers:
  - LSF_fake.sh: handle generator and collector jobs, run exe
  - PBS_fake.sh: handle generator and collector jobs, run exe
- Job post-processing:
  - No real task (wkf_post_LSF.sh and wkf_post_PBS.sh)
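The script names above follow a regular pattern per middleware, which can be sketched as a naming helper. The function job_scripts and its dictionary keys are hypothetical; only the file name patterns are taken from the slide, and how the portal actually locates its scripts is not stated in the source.

```python
def job_scripts(middleware: str) -> dict:
    """Return the script names used for one middleware, by stage.

    The keys ("prepare", "execute", "post", "wrapper") are illustrative
    labels; the file name patterns mirror the names listed on the slide.
    """
    return {
        "prepare": f"wkf_pre_{middleware}.sh",
        "execute": f"wkf_{middleware}.sh",
        "post": f"wkf_post_{middleware}.sh",
        "wrapper": f"{middleware}_fake.sh",
    }


if __name__ == "__main__":
    for name in ("LSF", "PBS"):
        print(name, job_scripts(name))
```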
LSF and PBS integration features
- Full parameter study (PS) support
- Very quick response time compared to grid middleware
- Support for any kind of executable
ARC integration I.
- Very similar to the EGEE support
- An ARC client stack has to be installed on the P-GRADE Portal machine
- Users can gain access with X.509 proxy certificates
- Two possible resource selections:
  - User can specify the target cluster
  - Cluster can be selected by the client broker
ARC integration II.
- Job preparation: wkf_pre_nordugrid.sh
  - Wrapper script preparation
  - Generator-related cleanups (as needed)
  - Autogenerator-related file uploads (as needed)
- Job execution: wkf_nordugrid.sh
  - xRSL prepared based on job properties
  - Job submission and management using ng* commands
  - Wrapper script: manage generator and collector jobs if needed
- Job post-processing: wkf_post_nordugrid.sh
  - No real job to perform
ARC integration features
- Full parameter study (PS) support
- Offers the possibility to select the execution resource
- Support for any kind of executable
- Multi-node job support
- Offers the possibility to specify runTimeEnvironment attributes
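The features above correspond to attributes of an xRSL job description. The following is a minimal sketch of how such a description might be assembled from job properties; build_xrsl and its parameters are hypothetical, and it is an assumption that the portal uses exactly these attributes (executable, jobName, cluster, count, runTimeEnvironment). Leaving the cluster unset is assumed to leave resource selection to the client broker.

```python
from typing import Optional, Sequence


def _quoted(text: str) -> str:
    """Wrap a value in double quotes; embedded quotes are rejected here
    to keep the sketch simple rather than implement escaping."""
    if '"' in text:
        raise ValueError("double quotes are not supported in this sketch")
    return f'"{text}"'


def build_xrsl(
    executable: str,
    job_name: str,
    cluster: Optional[str] = None,
    count: Optional[int] = None,
    runtime_environments: Sequence[str] = (),
) -> str:
    """Assemble a one-line xRSL description from a few job properties."""
    parts = [
        f"(executable={_quoted(executable)})",
        f"(jobName={_quoted(job_name)})",
    ]
    if cluster is not None:
        # User-selected target cluster; omitted when the broker should choose.
        parts.append(f"(cluster={_quoted(cluster)})")
    if count is not None:
        # Multi-node job: number of requested slots.
        parts.append(f"(count={count})")
    for rte in runtime_environments:
        parts.append(f"(runTimeEnvironment={_quoted(rte)})")
    return "&" + "".join(parts)


if __name__ == "__main__":
    # Host name and runtime environment name are placeholders.
    print(build_xrsl("run.sh", "demo"))
    print(build_xrsl("run.sh", "demo", cluster="cluster.example.org",
                     count=4, runtime_environments=["ENV/EXAMPLE"]))
```

The resulting string would be written to a file and handed to the ARC client for submission; the exact invocation is not shown here because the source does not specify it.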
Summary
- PBS, LSF and ARC integration was relatively simple thanks to the pluggable architecture of P-GRADE Portal
- However, the devil is in the details:
  - ssh connection sharing and parallel connection limits
  - Proper LSF job cancel
  - …