Published by Everett Wells. Modified over 8 years ago.
1
www.egi.eu EGI-InSPIRE RI-261323
Direct gLExec integration with PanDA
Fernando H. Barreiro Megino, Simone Campana, Ramon Medrano (CERN IT-ES-VOS)
9/30/2016
2
Introduction
The WLCG Grid job and Worker Node Security Assessment (http://cern.ch/go/7vK9) requires the usage of gLExec
gLExec acts as a light-weight 'gatekeeper'
1. Take grid credentials as input
2. Consider local site policies to authenticate and authorize the credentials
3. Switch to new identity and execution sandbox to run a given command
gLExec usage comes for free with GlideInWMS, but alternatively we can integrate gLExec directly with the PanDA pilot
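The three gatekeeper steps above map onto a pilot-side call roughly as follows. This is a minimal sketch, not the PanDA pilot's code: `build_glexec_call` is a hypothetical helper, the proxy path is made up, and the `GLEXEC_CLIENT_CERT` / `GLEXEC_SOURCE_PROXY` variable names and the bare `glexec` executable name are assumptions to be checked against the gLExec documentation of the installed version. The function only assembles the environment and argument list; nothing is executed.

```python
import os


def build_glexec_call(payload_proxy, command, base_env=None, glexec_path="glexec"):
    """Return (argv, env) for running `command` under the identity in `payload_proxy`.

    Assumption: gLExec reads the target credential from GLEXEC_CLIENT_CERT and
    hands GLEXEC_SOURCE_PROXY over to the target account. Verify both names
    against the local gLExec documentation before relying on them.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GLEXEC_CLIENT_CERT"] = payload_proxy    # step 1: grid credentials as input
    env["GLEXEC_SOURCE_PROXY"] = payload_proxy   # proxy to be made available to the payload
    # Steps 2 and 3 (site policy decision, identity switch) happen inside gLExec.
    argv = [glexec_path] + list(command)
    return argv, env


# Hypothetical proxy path, for illustration only.
argv, env = build_glexec_call("/tmp/user_proxy", ["/usr/bin/id"], base_env={})
print(argv)
```

On a worker node where gLExec is installed, the pair could then be handed to `subprocess.run(argv, env=env)`.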
3
Common Analysis Framework
[Architecture diagram. Column headings: Client side, Server side, Grid resources. Legend: PanDA components; VO specific, external components; GlideInWMS components. Labelled boxes: (Optional) Client Service, VO-specific client, PanDA monitor and Dashboard, Historical views, Data Mgmt Services, PanDA Server, GlideIn WMS, PanDA Pilot Factories, Computing Element, glideIns, PanDA pilot, glexec, Job trans, Data Adaptor]
4
Direct gLExec integration: Stage 1
[Diagram labels: PanDA Server, MyProxy, Computing Element, PanDA pilot, gLExec, Job, User proxy]
Implementation steps
1. MyProxy & gLExec standalone tests
2. Refactoring of MyProxyUtils
3. gLExec SAM test
4. Integration into PanDA pilot
5
1. MyProxy & gLExec standalone tests
Gain experience in using MyProxy and gLExec
1. Upload & download proxies and test delegation between Fernando and Ramon
2. Test gLExec through bsub on lxbatch (i.e. no pilot involved yet), switching identity from Fernando to Ramon
   1. Run id, voms-proxy-info …
   2. Copy a file in and out of EOS SCRATCHDISK
6
2. Refactoring of MyProxyUtils
A few years ago Jose Caballero wrote MyProxyUtils: a Python wrapper to MyProxy and gLExec, with some set-up for the PanDA pilot
Re-factored the library:
  Included usage of gLExec tools that were not available back then
    Environment wrap/unwrap scripts (http://cern.ch/go/h6Qz)
    Creation of secure sandbox through mkgltempdir (http://cern.ch/go/s6pt)
  Reviewed the logic (e.g. removed chmod 777 of pilot directory before identity switching)
  Effort in imposing coding standards (pylint check): a side-objective of our activity is to help Paul Nilsson improve the overall pilot code
Validated MyProxyUtils by repeating previous standalone tests
  Problems setting target-directory mode in mkgltempdir (to be followed up)
  Ulrich had to fix the installation of the gLExec tools and myproxy-logon on part of the lxbatch WNs
  Are we the first people to use the auxiliary gLExec tools?
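The idea behind environment wrap/unwrap is to carry the caller's environment across the identity switch as one opaque string and restore it on the target-user side. The sketch below illustrates that idea only. It is not the gLExec wrap/unwrap scripts linked above, and its encoding (JSON, then base64) is an assumption chosen for illustration; `wrap_env`, `unwrap_env`, and the sample paths are hypothetical.

```python
import base64
import json


def wrap_env(env, exclude=()):
    """Encode an environment mapping as a single ASCII string."""
    kept = {k: v for k, v in env.items() if k not in exclude}
    raw = json.dumps(kept, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def unwrap_env(blob):
    """Inverse of wrap_env: recover the mapping on the target-user side."""
    return json.loads(base64.b64decode(blob).decode("utf-8"))


# Hypothetical values; the pilot's own proxy is excluded so that it is not
# handed to the payload identity.
source_env = {"PATH": "/usr/bin", "X509_USER_PROXY": "/tmp/pilot_proxy"}
blob = wrap_env(source_env, exclude=("X509_USER_PROXY",))
restored = unwrap_env(blob)
```

Excluding variables at wrap time is a design choice of this sketch: anything that should stay with the pilot identity never crosses the switch.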
7
3. gLExec SAM test
Existing gLExec SAM test is not very complete
  Switches from one proxy to itself and does nothing as target user
We would like to extend the test
1. Use MyProxy to download a different target credential
2. Check that the gLExec tools are installed in every site
   EGI sites should have them (Maarten Litmaath)
   OSG sites do not have mkgltempdir (Dave Dykstra, Igor Sfiligoi)
   Alternatively the script could be shipped with pilot
3. Check that the full gLExec workflow succeeds in every site
Alessandro pointed us to the SAM gLExec code and Ramon will implement the changes
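Point 2 of the extended test (are the tools installed?) could be probed along these lines. This is a sketch under assumptions, not the SAM gLExec code: `missing_tools` is a hypothetical helper, the default tool list only contains names mentioned in these slides, and it assumes the tools are reachable through `PATH`, which a given site may not guarantee.

```python
import shutil


def missing_tools(tools=("glexec", "mkgltempdir", "myproxy-logon"), which=shutil.which):
    """Return the names from `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the tools
    being installed; by default it is shutil.which.
    """
    return [name for name in tools if which(name) is None]


# Example with a fake lookup standing in for a site without mkgltempdir.
def fake_which(name):
    return None if name == "mkgltempdir" else "/usr/bin/" + name


report = missing_tools(which=fake_which)
```

A probe built on this would report the missing names per site, which also tells the pilot where a shipped copy of the script would be needed.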
8
4. Integration with PanDA Pilot
[Flow diagram of the pilot. Box labels as extracted: Pilot; Collect local info; Job recovery; Additional cleanup; Multi-job loop; Check proxy; Check local space; Collect local info; Get job; Fork sub process; Clean up; Monitoring loop; Check log size; (Check workDir); Looping check; Check local space; Setup job; Transfer input; Transfer output; Execute payload; Signal handler; Abort and clean up; gLExec; Signal handler; Abort and clean up]
Once user job is downloaded, switch identity ASAP
9
4. Integration with PanDA Pilot
This is the hardest part of Stage 1 model implementation
  Guidance of Paul Nilsson is important
Improving the pilot's long-term sustainability
  Getting familiar with the pilot code
  Re-factoring the code for easier maintenance
    Taking out the main pilot module (~4500 lines) is a lot of work
    Splitting main pilot module (pilot.py) into two modules
    Moving functions/classes between the two modules or to utility library when needed
    Fixing warnings and errors that arise from separation
  Modularize the code
Need to serialize/de-serialize Python environment (variables, constants…) in order to share through gLExec
  Ramon wrote a configuration manager
    Dictionary-like way to store configuration values
    In-depth serializable (json and pickle depending on available libraries)
    Thread/multi-process safe
Solve permission problems that will arise from running parts of the pilot under two different users and sandboxes (e.g. merging log files)
Testing will not be trivial
Still a lot of work to be done
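To make the configuration-manager requirements concrete, here is a minimal sketch of a dictionary-like, serializable, lock-protected store. It is not Ramon's implementation: `ConfigStore` and the sample keys are hypothetical. Two deliberate simplifications are worth naming. The pickle fallback is triggered by values JSON cannot represent, whereas the slide speaks of choosing by available libraries, and the lock only covers threads within one process, not multiple processes.

```python
import json
import pickle
import threading


class ConfigStore:
    """Dictionary-like configuration holder with lock-protected access."""

    def __init__(self, values=None):
        self._lock = threading.RLock()
        self._values = dict(values or {})

    def __getitem__(self, key):
        with self._lock:
            return self._values[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._values[key] = value

    def __contains__(self, key):
        with self._lock:
            return key in self._values

    def dumps(self):
        """Serialize to bytes: JSON when the values allow it, pickle otherwise."""
        with self._lock:
            try:
                return b"J" + json.dumps(self._values).encode("utf-8")
            except TypeError:
                return b"P" + pickle.dumps(self._values)

    @classmethod
    def loads(cls, blob):
        """Rebuild a store from dumps() output. Only unpickle trusted data."""
        tag, body = blob[:1], blob[1:]
        if tag == b"J":
            return cls(json.loads(body.decode("utf-8")))
        if tag == b"P":
            return cls(pickle.loads(body))
        raise ValueError("unknown serialization tag")


# Hypothetical keys and values, for illustration only.
cfg = ConfigStore({"queue": "ANALY_TEST", "retries": 3})
cfg["debug"] = True
copy = ConfigStore.loads(cfg.dumps())
```

The one-byte tag lets the receiving side, running as the target user, pick the right decoder without any side channel.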
10
Drawbacks of Stage 1 model
Users have to manage their proxy on MyProxy
  What if it expires?
ATLAS uses a large number of pilot certificates
  Users would have to delegate to all pilot certificates
Worker nodes are hammering MyProxy server
  John Hover successfully tested MyProxy at ~25 Hz (March 2012); at that rate, over 2M accesses per day should be possible. Test conditions currently unknown
MyProxy server as single point of failure
These shortcomings could be solved by the model proposed for Stage 2
  Proposal. Not approved yet
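The "over 2M accesses per day" figure follows directly from the quoted rate, assuming ~25 Hz can be sustained for a full day (the slide itself notes that the test conditions are unknown). The helper name below is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def accesses_per_day(rate_hz):
    """Requests served in one day at a sustained rate of `rate_hz` per second."""
    return rate_hz * SECONDS_PER_DAY


print(accesses_per_day(25))  # 2160000
```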
11
Stage 2: PanDA server caching and client integration
[Diagram labels: PanDA client, PanDA Server, Proxy cache, MyProxy, Computing Element, PanDA pilot, gLExec, Job. Numbered arrows: 1. User proxy and job; 2. User proxy; 2. Job; 3. User proxy; 4. User proxy and job; 5. Notification: Proxy about to expire]
Disclaimer: This model is only a proposal and has not been discussed or presented for approval
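As an illustration of the proposed proxy cache and its "about to expire" notification, the sketch below keeps proxies in memory with their expiry time and lists the users who should be warned. Everything here is an assumption for illustration: the class name, the keying by user DN, the six-hour warning window, and the in-memory storage are not taken from the proposal.

```python
import time


class ProxyCache:
    """In-memory cache mapping a user DN to (proxy_text, expiry_timestamp)."""

    def __init__(self, warn_before_s=6 * 3600, clock=time.time):
        self._entries = {}
        self._warn_before_s = warn_before_s  # hypothetical warning window
        self._clock = clock                  # injectable for testing

    def store(self, user_dn, proxy_text, expires_at):
        self._entries[user_dn] = (proxy_text, expires_at)

    def get(self, user_dn):
        """Return the cached proxy, or None if absent or already expired."""
        entry = self._entries.get(user_dn)
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]

    def expiring_soon(self):
        """User DNs whose proxy is still valid but expires within the warning window."""
        now = self._clock()
        return sorted(
            dn for dn, (_, exp) in self._entries.items()
            if now < exp <= now + self._warn_before_s
        )
```

A server-side periodic task could call `expiring_soon()` and send the notification of arrow 5, while pilots would be served from `get()` instead of contacting MyProxy directly.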
12
Questions?