LCG Pilot Jobs + glexec John Gordon, STFC-RAL GDB 7 November 2007.

Presentation transcript:

LCG Pilot Jobs + glexec John Gordon, STFC-RAL GDB 7 November 2007

LCG 2 Recent Progress
Multi-User Pilot Jobs Policy discussed in MB, JSPG, OSCT
At the October GDB we decided to refer the issue to the MB for a decision.
MB defined a draft policy
–Requires sites who support LHC experiments to support their pilot jobs frameworks.
–Mandates jobs running under the identity of the work, not some generic pilot job framework identity. Glexec in setuid mode seen as the only candidate for this functionality.
–Defines work to be done and agreed by MB before policy becomes effective.

LCG 3 WLCG policy on pilot jobs submitting work on behalf of third parties
The topic of pilot jobs has been discussed several times in the GDB, and in particular at the last two meetings.
At the October meeting it was agreed to make a proposal to the MB to adopt a policy requiring that sites support pilot jobs submitting work on behalf of third parties.
A summary note was prepared by J.Gordon (17/10/07) and presented to the MB on 23 October. This identified a number of issues and made recommendations for a pilot jobs policy.
After discussion the following policy statement was proposed and endorsed by the MB meeting on 6 November.

LCG 4 MB Policy
WLCG sites must allow job submission by the LHC VOs using pilot jobs that submit work on behalf of other users. It is mandatory to change the job identity to that of the real user to avoid the security exposure of a job submitted by one user running under the credentials of the pilot job user.
Implementation of this policy is subject to the following pre-requisites:
–The identity change and sub-job management must be executed by a commonly agreed mechanism that has been reviewed by a recognized group of security experts. At present the only candidate is glexec, and a positive review by the security teams of each of the grid infrastructures (OSG, EGEE) would be sufficient.
–All experiments wishing to use this service must publish a description of the distributed parts of their pilot job frameworks. A positive recommendation to the MB on the security aspects of the framework by a team of experts with representatives of OSG and EGEE is required. The frameworks should be compatible with the draft JSPG Grid Multi-User Pilot Jobs Policy document.
–glexec testing: glexec must be integrated and successfully tested with the commonly used batch systems (BQS, PBS, PBS pro, Condor, LSF, SGE).
–LCAS/LCMAPS: the server version of LCAS/LCMAPS must be completed, certified and deployed.
The policy will come into effect when the MB agrees that all of the above pre-requisites have been met.
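
The gating rule in this policy (it takes effect only once every pre-requisite is met) can be sketched as a simple checklist. This is a minimal illustration, not part of the original slides: the names `Prerequisite`, `batch_testing_met`, `policy_in_effect` and `outstanding` are hypothetical, and the status values are made up. In practice the MB decides whether each pre-requisite has been met.

```python
from dataclasses import dataclass

# Batch systems named in the policy statement.
BATCH_SYSTEMS = ["BQS", "PBS", "PBS pro", "Condor", "LSF", "SGE"]


@dataclass
class Prerequisite:
    """One pre-requisite of the policy (hypothetical model)."""
    name: str
    met: bool = False


def batch_testing_met(tested):
    """True only if glexec has been tested with every listed batch system."""
    return all(system in tested for system in BATCH_SYSTEMS)


def policy_in_effect(prereqs):
    """The policy is effective only when all pre-requisites are met."""
    return all(p.met for p in prereqs)


def outstanding(prereqs):
    """Names of the pre-requisites that still block the policy."""
    return [p.name for p in prereqs if not p.met]


# Made-up status, for illustration only.
prereqs = [
    Prerequisite("glexec security review (OSG, EGEE)"),
    Prerequisite("experiment framework review"),
    Prerequisite("glexec batch system testing",
                 met=batch_testing_met({"PBS", "LSF"})),
    Prerequisite("server version of LCAS/LCMAPS deployed"),
]

print(policy_in_effect(prereqs))
print(outstanding(prereqs))
```

Keeping batch system testing as its own function reflects that this pre-requisite is itself a conjunction over six systems, so partial testing does not count.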

LCG 5 JSPG(1)
Summary of discussion on Pilot Jobs and related issues
–(D Kelsey's personal notes - not yet approved by JSPG)
Discussed current draft policy (v0.3) on "Multi User Pilot Jobs"
–See
Discussion focussed on whether to require Sites to run the Grid-provided utility (today glexec) in the identity switching mode
–or whether Sites should have the right to choose
The view of the participants in the meeting room at CERN was
–there are significant security risks in not switching identity
–the users' workload runs under the same identity as the pilot job framework, resulting in the ability of users to take control of the framework and to interfere with the audit logs
Should therefore require identity switching
–But, this is contentious and may not be acceptable
The view of the OSG participants, including Ruth Pordes,
–had not been able to consult their Sites or discuss this suggestion internally
–could not agree to the proposal to require identity switching at this time
–They have concerns as to what extent they can impose such policies on their Sites.
JSPG concluded
–a very contentious issue
–need to discuss with the Sites and the Grid managements
–Can consensus be reached?

LCG 6 JSPG(2)
Also discussed the requirement that pilot jobs must clean up all local data files between different user jobs within one pilot job.
–Strong requirement coming out of discussion at the August GDB meeting
–but Oxana noted that this may conflict with experiment requirements
–JSPG agreed that there are security concerns, e.g. with one user job leaving infected files which the next job may pick up.
Again we concluded that further consultation is required.
Did not produce a new version of the policy document
–pointless until these issues are resolved
On the second day revisited the topic to decide best way forward
Concentrate on the requirements for traceability and logging
These are general requirements which apply not only to multi-user pilot jobs, but also to all other forms of job submission including, for example, Grid portals
Get agreement on these general principles, which can then be applied to the consideration of any particular service, such as pilot jobs

LCG 7 JSPG(3)
Draft words in new "Policy on Traceability and Logging"
–This will replace the old policy on "Audit Requirements"
–The words are not yet final and still need more work.
The main points are:
–Risk management is crucial for Grid operations.
–When security incidents happen it must be possible to identify the cause so that it can be contained while keeping services operational. It must also be possible to take action to prevent the incident happening again.
–The response to an incident needs to be commensurate with the scale of the problem. Banning a whole large VO, rather than just one user, is in most cases impossible as this affects too many users.
–Understanding and fixing the cause of an incident is essential before re-enabling access. If a VO has been blocked in its entirety, unless there is a way of understanding what happened, it will be impossible to fix and hence impossible to re-enable access.
–The minimum level of traceability for Grid usage is to be able to identify the source of all actions (executables, file transfers, pilot jobs, portal jobs, etc) that are part of an incident and the individual who initiated them. In addition, sufficiently fine-grained controls, such as blocking the originating user and monitoring to detect abnormal behaviour, are necessary for keeping services operational.
–There are trade-offs between knowledge of individual user identity and controlling the code which can be executed. Anonymous user access may well be possible if there is no possibility for the user to submit or modify executable code, i.e. the user just provides input parameters and/or data to control a job.
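
The point about fine-grained controls can be illustrated with a toy audit log. The sketch below is an assumption-laden illustration, not a description of any real logging format: the `Record` fields, the function names and the sample data are all hypothetical. It compares who must be blocked after an incident when payloads run under the real user's identity versus under a generic pilot identity.

```python
from collections import namedtuple

# Hypothetical audit record: which identities a payload job is associated with.
Record = namedtuple("Record", "job_id vo pilot_identity user_identity")


def visible_identity(record, identity_switching):
    """Identity a site can see and act on for this job."""
    return record.user_identity if identity_switching else record.pilot_identity


def identities_to_block(records, incident_job_ids, identity_switching):
    """Identities a site would block to contain the incident."""
    return {
        visible_identity(r, identity_switching)
        for r in records
        if r.job_id in incident_job_ids
    }


def users_affected(records, blocked, identity_switching):
    """Real users who lose access as a consequence of the block."""
    return {
        r.user_identity
        for r in records
        if visible_identity(r, identity_switching) in blocked
    }


# Made-up log: three users share one pilot framework identity.
log = [
    Record("j1", "vo_a", "pilot_a", "alice"),
    Record("j2", "vo_a", "pilot_a", "bob"),
    Record("j3", "vo_a", "pilot_a", "carol"),
]
incident = {"j2"}

for switching in (True, False):
    blocked = identities_to_block(log, incident, switching)
    print(switching, sorted(blocked), sorted(users_affected(log, blocked, switching)))
```

With identity switching only the originating user is blocked; without it the only identity the site sees is the pilot's, so containing the incident removes access for every user of that framework.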

LCG 8 Multi-User Pilot Job Policy Draft
Should be completed
But the ideas in it can be used to help review the experiment frameworks.

LCG 9 Review glexec
The identity change and sub-job management must be executed by a commonly agreed mechanism that has been reviewed by a recognized group of security experts. At present the only candidate is glexec, and a positive review by the security teams of each of the grid infrastructures (OSG, EGEE) would be sufficient.
OSG - Don?
EGEE - John?

LCG 10 Experiment Frameworks
All experiments wishing to use Pilot Jobs must publish a description of the distributed parts of their pilot job frameworks. A positive recommendation to the MB on the security aspects of the framework by a team of experts with representatives of OSG and EGEE is required. The frameworks should be compatible with the draft JSPG Grid Multi-User Pilot Jobs Policy document.
Do all experiments agree?
Are frameworks sufficiently documented?
Who will form the team?

LCG 11 gLExec and batch
glexec must be integrated and successfully tested with the commonly used batch systems (BQS, PBS, PBS pro, Condor, LSF, SGE).
Such testing can probably be done now using sudo.
Who will do this?
–BQS – CC-IN2P3
–PBS
–PBS pro
–Condor
–LSF
–SGE
When?

LCG 12 LCAS/LCMAPS
Local mode LCAS/LCMAPS does not scale to large sites
–Client-server version required
OSG uses GUMS – they have a common API
–Only one version of glexec required
LCAS/LCMAPS: the server version of LCAS/LCMAPS must be completed, certified and deployed.
Timescale?
JRA1
SA3
SA1

LCG 13 gLExec deployment
Status of certification
PPS testing
Safe configuration
Does this need to wait for server mode LCAS/LCMAPS?

LCG 14 Timescale
glexec review
Framework Review
Batch system testing
Glexec certification and

LCG 15 Any Other Issues?

LCG 16 Summary
The MB Policy aims to break the current deadlock
–One way or another
Still much work to do before the policy becomes effective
–Timescale
But … want to know now if sites will not allow glexec in setuid mode under any circumstances
–Tell Les and John
JSPG opinions, review of glexec and of frameworks could all help persuade institute management.