Direct gLExec integration with PanDA. Fernando H. Barreiro Megino, CERN IT-ES-VOS. (www.egi.eu, EGI-InSPIRE RI-261323)


1 Direct gLExec integration with PanDA
Fernando H. Barreiro Megino, CERN IT-ES-VOS
(Shortened, updated version of the presentation given at FNAL)
www.egi.eu, EGI-InSPIRE RI-261323, 11/18/2015

2 Introduction
- The WLCG Grid job and Worker Node Security Assessment (http://cern.ch/go/7vK9) requires the use of gLExec
- gLExec acts as a lightweight 'gatekeeper':
  1. Take grid credentials as input
  2. Consider local site policies to authenticate and authorize the credentials
  3. Switch to the new identity and execution sandbox to run a given command
- Here we focus on the direct integration of gLExec with the PanDA pilot
- gLExec usage comes for free with GlideInWMS
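The three gatekeeper steps can be sketched from the caller's side. This is a minimal illustration, not the pilot's or MyProxyUtils' actual code: the function name `build_glexec_call` and the binary path are hypothetical, and the environment variable names (`GLEXEC_CLIENT_CERT`, `GLEXEC_SOURCE_PROXY`) should be checked against the gLExec version installed at the site. The sketch only prepares the call; policy evaluation and the identity switch happen inside gLExec itself.

```python
import os

# Hypothetical default; the install location of the glexec binary varies by site.
GLEXEC_BIN = "/usr/sbin/glexec"


def build_glexec_call(user_proxy, command, glexec_bin=GLEXEC_BIN, base_env=None):
    """Return (argv, env) for running `command` under the payload owner's identity."""
    env = dict(os.environ if base_env is None else base_env)
    # Step 1: the grid credentials handed to gLExec as input.
    env["GLEXEC_CLIENT_CERT"] = user_proxy
    # Proxy that gLExec should copy into the target account (assumed variable name).
    env["GLEXEC_SOURCE_PROXY"] = user_proxy
    # Steps 2 and 3 (site policy check, identity switch) are done by gLExec,
    # which then runs the command that follows it on the command line.
    argv = [glexec_bin] + list(command)
    return argv, env
```

The returned pair could then be passed to something like `subprocess.call(argv, env=env)` on a worker node where gLExec is installed.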

3 Direct gLExec integration: Stage 1
[Diagram. Components shown: PanDA Server, MyProxy, Computing Element, PanDA pilot, gLExec, Job. Labelled flows: Job, User proxy.]

4 Refactoring of MyProxyUtils
- A few years ago Jose Caballero wrote MyProxyUtils
  - Python wrapper to MyProxy and gLExec
  - Some set-up for the PanDA pilot
- Re-factored the library:
  - Included usage of gLExec tools that were not available at the time
    - Environment wrap/unwrap scripts (http://cern.ch/go/h6Qz)
    - Creation of a secure sandbox through mkgltempdir (http://cern.ch/go/s6pt)
  - Reviewed the logic (e.g. removed chmod 777 of the pilot directory before identity switching)
  - Effort in imposing coding standards: a side objective of our activity is to help Paul Nilsson improve the overall pilot code
- Validated MyProxyUtils through standalone tests on lxbatch
  - Problems setting target-directory mode in mkgltempdir (to be followed up)
  - Ulrich had to fix the installation of the gLExec tools and myproxy-logon on part of the lxbatch WNs
  - Are we the first people to use the auxiliary gLExec tools?

5 Integration with PanDA Pilot
[Flow diagram of the pilot. Box labels, in extraction order: Pilot; Collect local info; Job recovery; Additional cleanup; Multi-job loop; Check proxy; Check local space; Collect local info; Get job; Fork sub process; Clean up; Monitoring loop; Check log size; (Check workDir); Looping check; Check local space; Setup job; Transfer input; Transfer output; Execute payload; Signal handler; Abort and clean up; gLExec; Signal handler; Abort and clean up.]
Once the user job is downloaded, switch identity ASAP

6 Integration with PanDA Pilot
- This is the hardest part of implementing the Stage 1 model
  - Paul Nilsson's guidance is important
- Improving the pilot's long-term sustainability
  - Getting familiar with the pilot code
  - Re-factoring the code for easier maintenance
- Breaking up the main pilot module (~4500 lines) is the complicated part of the job
  - Splitting the main pilot module (pilot.py) into two modules
  - Moving functions/classes between the two modules or to a utility library when needed
  - Side effects on other modules
  - Fixing warnings and errors that arise from the separation
- Modularize the code
  - Need to serialize/de-serialize the Python environment (variables, constants…) in order to share it through gLExec
  - Configuration manager
  - Custom encoder/decoder for serialization/deserialization of pilot objects
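One way to picture the custom encoder/decoder is a JSON round trip. This is a sketch under assumptions, not the pilot's implementation: the slides do not say which serialization format was used, and `Job`, `PilotEncoder`, `pilot_object_hook`, `dump_environment` and `load_environment` are hypothetical names. The idea is that the part of the pilot running before the identity switch writes its state as text, and the part running under gLExec rebuilds it.

```python
import json


class Job(object):
    """Hypothetical stand-in for a pilot object; not the real pilot class."""

    def __init__(self, job_id, work_dir, attempt=0):
        self.job_id = job_id
        self.work_dir = work_dir
        self.attempt = attempt


# Registry of classes the decoder is allowed to rebuild.
SERIALIZABLE = {"Job": Job}


class PilotEncoder(json.JSONEncoder):
    """Encode registered objects as a tagged dict of their attributes."""

    def default(self, obj):
        name = type(obj).__name__
        if name in SERIALIZABLE:
            return {"__class__": name, "state": dict(obj.__dict__)}
        # Anything else falls back to the standard TypeError.
        return json.JSONEncoder.default(self, obj)


def pilot_object_hook(dct):
    """Rebuild a registered object from its tagged dict; pass other dicts through."""
    cls = SERIALIZABLE.get(dct.get("__class__"))
    if cls is None:
        return dct
    obj = cls.__new__(cls)  # skip __init__, restore attributes directly
    obj.__dict__.update(dct["state"])
    return obj


def dump_environment(env):
    """Serialize a dict of variables, constants and pilot objects to text."""
    return json.dumps(env, cls=PilotEncoder, sort_keys=True)


def load_environment(text):
    """Inverse of dump_environment."""
    return json.loads(text, object_hook=pilot_object_hook)
```

A text format such as JSON keeps the hand-over file readable and avoids unpickling data across an identity boundary; the explicit registry limits which classes can be reconstructed.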

7 Integration with PanDA Pilot
- First refactoring being debugged on lxplus
  - Bypassing gLExec/MyProxyUtils for the moment
- Work remaining
  - Get the pilot to run jobs without any errors
  - Include gLExec/MyProxyUtils
  - Solve permission problems that will arise from running parts of the pilot under two different users and sandboxes (e.g. merging log files)
  - Merge changes with the latest dev pilot
  - Grid-wide testing

8 Drawbacks of Stage 1 model
- Users have to manage their proxy on MyProxy
  - What if it expires?
- ATLAS uses a large number of pilot certificates
  - Users would have to delegate to all pilot certificates
- Worker nodes are hammering the MyProxy server
  - John Hover successfully tested MyProxy at ~25 Hz (March 2012), which means over 2M accesses per day should be possible. Test conditions currently unknown
- MyProxy server as a single point of failure
- These shortcomings could be solved by the model proposed for Stage 2
  - Proposal. Not approved yet
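The "over 2M accesses per day" figure follows directly from the tested rate, assuming the ~25 Hz rate could be sustained around the clock (the slide notes that the test conditions are unknown). The function name below is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def daily_capacity(rate_hz):
    """Number of accesses per day at a constant rate in requests per second."""
    return rate_hz * SECONDS_PER_DAY


print(daily_capacity(25))  # 2160000
```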

9 Stage 2: PanDA server caching and client integration
[Diagram. Components shown: PanDA client, PanDA Server with Proxy cache, MyProxy, Computing Element, PanDA pilot, gLExec, Job. Numbered flows: 1. User proxy and job; 2. User proxy; 2. Job; 3. User proxy; 4. User proxy and job; 5. Notification: Proxy about to expire.]
Disclaimer: This model has not been presented for approval yet
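The proxy cache and the "about to expire" notification could look roughly like the toy model below. The slide only names the components, so everything here is an assumption for illustration: the class `ProxyCache`, its methods, the in-memory storage and the one-hour warning window are hypothetical and do not describe the PanDA server.

```python
import time


class ProxyCache(object):
    """Toy in-memory proxy cache; class and method names are hypothetical."""

    def __init__(self, warn_before=3600, clock=time.time):
        self.warn_before = warn_before  # seconds before expiry to start warning
        self.clock = clock              # injectable clock, eases testing
        self._entries = {}              # user DN -> (proxy text, expiry timestamp)

    def store(self, user_dn, proxy, expires_at):
        """Cache the proxy delegated for this user."""
        self._entries[user_dn] = (proxy, expires_at)

    def get(self, user_dn):
        """Return the cached proxy, or None if missing or already expired."""
        entry = self._entries.get(user_dn)
        if entry is None or entry[1] <= self.clock():
            return None
        return entry[0]

    def expiring_soon(self):
        """DNs whose proxy is still valid but expires within warn_before seconds."""
        now = self.clock()
        return sorted(dn for dn, (_, exp) in self._entries.items()
                      if now < exp <= now + self.warn_before)
```

In such a design the server would call `expiring_soon()` periodically and notify the listed users, while pilots would receive the proxy together with the job instead of contacting MyProxy themselves.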

10 Stage 1&2 comparison
[Side-by-side diagrams labelled Stage 1 and Stage 2.]

