
Maarten Litmaath, GDB, 2008/06/11

Pilot Job Frameworks Review
GDB working group mandated by WLCG MB on Jan. 22, 2008
Mission
– Review security issues in the pilot job framework of each experiment (pilot jobs are taken to be multi-user in this context)
– Define a minimum set of security requirements
– Advise on improvements, per framework or common to all
– Report to GDB and MB; time frame is a few months
Members
– ALICE: Predrag Buncic
– ATLAS: Torre Wenaus
– CMS: Igor Sfiligoi
– LHCb: Andrei Tsaregorodtsev
– WLCG: Maarten Litmaath (chair)
– EGEE: David Groep
– FNAL: Eileen Berman
– GridPP: Mingchao Ma
– OSG: Mine Altunay

Status
6 phone conferences held
Discussion on mailing list
Each experiment is to provide a document about their system
– LHCb were the first
– CMS were next
– ALICE and ATLAS not ready yet; ATLAS have a lot of documentation on PanDA on their TWiki pages
A security questionnaire has been discussed
– Agreement on the relevance/scope of a question was not always evident
– Each document should provide the answers for an experiment
Adequacy judged per experiment, no formal criteria
LHCb answers deemed satisfactory, CMS to provide more details later

Questionnaire v1.0 (1/2)
Describe in a schematic way all components of the system.
– If a component needs to use IPC to talk to another component for any reason, describe what kind of authentication, authorization, integrity and/or privacy mechanisms are in place. If configurable, specify the typical, minimum and maximum protection you can get.
Describe how user proxies are handled from the moment a user submits a task to the central task queue to the moment that the user task runs on a WN, through any intermediate storage.
What happens around the identity change on the WN, e.g. how is each task sandboxed and to what extent?
How can running processes be accounted to the correct user?
How is a task spawned on the WN and how is it destroyed?
How can a site be blocked?

Questionnaire v1.0 (2/2)
What site security processes are applied to the machine(s) running the WMS? [Here WMS means the VO WMS, not the gLite WMS.]
– Who is allowed access to the machine(s) on which the service(s) run, and how do they obtain access?
– How are authorized individuals authenticated on the machine(s)?
– What is the process for keeping the service(s) and OS patched and up-to-date, especially with respect to security patches?
– Do you have an identified security contact?
– Describe the incident response plan for dealing with security incidents and reports of unauthorized use.
– What services (in general) run on the machine(s) that offer the WMS service?
– What processes exist to maintain audit logs (e.g. for use during an incident)?
– What monitoring exists on the machine(s) to aid detection of security incidents or unauthorized use?
Can you limit the users that can submit jobs to the VO WMS? How?

LHCb DIRAC WMS (architecture diagram)

DIRAC WMS workflow
Workload preparation on UI (input files, JDL)
Workload submission to DIRAC WMS
Requirements and priority analysis → insertion into Central Task Queue
Grid submission of pilot job with original requirements
– Using special credential (VOMS role)
Start of pilot on WN
– Install Job Agent, check WN capacity and environment
Request highest-priority matching workload from Task Queue
– Download associated limited user proxy
– Execute job wrapper through glexec
Preparation of input data, installation of missing LHCb software as needed
Parallel execution of user workload and watchdog process
Upload output data
– Cleanup of user workload and proxy
– Report resource consumption to DIRAC accounting system
Get another workload if enough CPU time remains
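The Job Agent loop above can be sketched roughly as follows. This is an illustrative stand-in, not the real DIRAC API: `FakeTaskQueue`, `run_with_glexec`, `pilot_loop` and all field names are invented for the example; the real agent talks to the central Task Queue over DISET and invokes the actual glexec binary.

```python
# Hypothetical sketch of the DIRAC pilot/Job Agent loop; all names are
# illustrative stand-ins, not the real DIRAC API.

def run_with_glexec(limited_proxy, workload):
    # In DIRAC the job wrapper is executed through glexec so the process
    # runs under the workload owner's identity; here we just record it.
    workload["ran_as"] = limited_proxy["owner"]

def pilot_loop(task_queue, cpu_seconds_left, min_cpu=600):
    """Keep requesting matching workloads while enough CPU time remains."""
    executed = []
    while cpu_seconds_left >= min_cpu:
        workload = task_queue.request_match(cpu_seconds_left)
        if workload is None:
            break  # no matching workload: pilot exits gracefully
        # download the limited proxy associated with the workload owner
        proxy = task_queue.download_limited_proxy(workload["owner"])
        run_with_glexec(proxy, workload)
        executed.append(workload)
        cpu_seconds_left -= workload["cpu_cost"]
    return executed

class FakeTaskQueue:
    """Minimal stand-in for the central Task Queue."""
    def __init__(self, jobs):
        # highest priority first, as in "highest-priority matching workload"
        self.jobs = sorted(jobs, key=lambda j: -j["priority"])
    def request_match(self, cpu_left):
        for j in self.jobs:
            if j["cpu_cost"] <= cpu_left:
                self.jobs.remove(j)
                return j
        return None
    def download_limited_proxy(self, owner):
        return {"owner": owner, "limited": True}

tq = FakeTaskQueue([
    {"owner": "alice_user", "priority": 5, "cpu_cost": 3000},
    {"owner": "bob_user", "priority": 9, "cpu_cost": 4000},
])
done = pilot_loop(tq, cpu_seconds_left=8000)
```

With 8000 CPU seconds the sketch runs the higher-priority 4000-second job first, then the 3000-second one, and exits when the queue is empty.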

LHCb document
“Workload Management with Pilot Agents in DIRAC”
1. Overview
2. Workflow in the DIRAC WMS
3. User authentication and authorization
4. Brief DISET overview [DISET = DIRAC SEcure Transport]
5. Job submission to the DIRAC WMS
6. Proxy handling in the DIRAC WMS
7. Pilot Job
8. Running user job
9. Job monitoring, user interaction with a job
10. User job accounting
11. Pilot Job Questionnaire
12. References

DIRAC WMS remarks
DISET provides secure communication channels
DIRAC WMS restricts users and can block them
It can block sites
– A mask is applied to pilot job submission and to workload matching
It is trusted by myproxy.cern.ch
It is hosted by the CERN IT department according to IT regulations
– Access restrictions, security updates, incident response procedures
A pilot job cannot renew its own proxy
– It can only download limited proxies for user workloads
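The site-mask idea can be sketched minimally: a masked-out site is excluded both when pilots are submitted and when workloads are matched. Site names and function names below are invented for illustration; DIRAC's actual mask handling differs in detail.

```python
# Minimal sketch of a site mask, assuming a simple site -> allowed mapping.
# Names are hypothetical; this is not the DIRAC implementation.

site_mask = {"LCG.CERN.ch": True, "LCG.Blocked.site": False}

def eligible_sites(sites, mask):
    """Sites that pilot jobs may be submitted to (unknown sites excluded)."""
    return [s for s in sites if mask.get(s, False)]

def can_match(pilot_site, mask):
    """A pilot running at a masked-out site gets no workload."""
    return mask.get(pilot_site, False)
```

Applying the same mask at both points means that blocking a site stops new pilots going there and starves any pilots already running there.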

CMS glideinWMS schematics (architecture diagram)

CMS glideinWMS workflow
System relies on standard Condor components and communication
User jobs are submitted to a set of condor_schedd on submit machines
– Waiting jobs are advertised in a Central Manager
Glidein factories submit pilot jobs, a.k.a. glideins
VO frontends regulate the number of glideins to be submitted
– Based on the number of jobs waiting in the condor_schedd queues
A condor_collector is used as a dashboard for message exchange
Generic Connection Brokers can provide connectivity across firewalls and in NAT environments
A glidein contacts the Central Manager for a job matching the WN
The user job is spawned through glexec and cleaned up afterwards
– The glidein may then wait a while for another job or exit
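The frontend regulation step can be sketched as a simple pressure calculation: request roughly as many new glideins as there are idle jobs not already covered by idle pilots, capped by an overall limit. The function and parameter names are invented for illustration; the real glideinWMS frontend configuration uses different knobs.

```python
# Hedged sketch of VO-frontend glidein regulation, assuming three inputs:
# idle jobs in the schedd queues, idle glideins, and running glideins.
# Not the actual glideinWMS algorithm.

def glideins_to_request(idle_jobs, glideins_idle, glideins_running,
                        max_glideins=100):
    # Don't request more pilots than there are uncovered waiting jobs,
    # and never exceed the overall cap on glideins at the site.
    wanted = idle_jobs - glideins_idle
    headroom = max_glideins - (glideins_idle + glideins_running)
    return max(0, min(wanted, headroom))
```

For example, with 50 idle jobs, 10 idle glideins and 20 running ones, the sketch requests 40 new glideins; with more idle glideins than idle jobs it requests none.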

Envisioned CMS production system (diagram)

Envisioned CMS analysis instance (diagram)

CMS document
“Workload management with glideinWMS”
1. Introduction
2. Structural overview
   2.1 The Condor pool
   2.2 Glidein handling
   2.3 Working in a firewalled world
   2.4 Credentials handling
3. Deployment scenarios by USCMS
   3.1 Current prototype production installation
   3.2 Planned worldwide production installation
   3.3 Planned analysis installation
4. Conclusions
Appendix
– Pilot Job Frameworks questionnaire

CMS glideinWMS remarks
Condor provides secure communication channels
Both production and analysis instances can restrict/block users
Sites can be blocked
– Using IP tables or Condor functionality
A submit machine is trusted by an associated MyProxy server
Users will not log in to the submit machine, but will interact with the CRAB server
– CRAB = CMS Remote Analysis Builder
Submit machines etc. hosted by T0/T1/T2 sites
– Need access restrictions, security updates, incident response procedures
Condor does not yet support limited delegation
– Accepted as feature request

ATLAS PanDA architecture (diagram)

PanDA workflow
PanDA server receives user job definitions with data requirements
Jobs are inserted into the global work queue
Brokerage module prioritizes jobs and assigns them to sites
Required input data is dispatched to the chosen site
– Interaction with the ATLAS Distributed Data Management system
– Jobs are released to the dispatcher when the input data has been made available
Delivery of pilot jobs to WNs is managed by an independent subsystem
Pilot jobs contact the job dispatcher for work
– If no appropriate work is available, the pilot may pause or exit
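The last two steps, a pilot contacting the dispatcher and pausing or exiting when nothing matches, can be sketched as below. `FakeDispatcher`, `pilot_cycle` and the job fields are hypothetical stand-ins; the real PanDA pilot talks to the dispatcher over HTTPS with a different protocol.

```python
# Schematic sketch of a PanDA-style pilot asking the job dispatcher for
# work. All names and fields are invented for illustration.

import time

class FakeDispatcher:
    """Stand-in for the PanDA job dispatcher."""
    def __init__(self, jobs):
        self.jobs = list(jobs)
        self.status = {}
    def get_job(self, node_spec):
        # hand out the first job whose requirements fit this worker node
        for j in self.jobs:
            if j["memory_mb"] <= node_spec["memory_mb"]:
                self.jobs.remove(j)
                return j
        return None
    def update_status(self, job_id, state):
        self.status[job_id] = state

def run_payload(job):
    job["done"] = True  # placeholder for executing the user payload

def pilot_cycle(dispatcher, node_spec, max_idle_cycles=3, pause=0.0):
    idle = 0
    while idle < max_idle_cycles:
        job = dispatcher.get_job(node_spec)  # contact dispatcher for work
        if job is None:
            idle += 1            # no appropriate work: pause, then retry
            time.sleep(pause)
            continue
        idle = 0
        run_payload(job)
        dispatcher.update_status(job["id"], "finished")
    # after repeated empty responses the pilot exits

d = FakeDispatcher([
    {"id": "j1", "memory_mb": 1000},
    {"id": "j2", "memory_mb": 2000},
])
pilot_cycle(d, {"memory_mb": 4000}, max_idle_cycles=1)
```

The sketch drains both matching jobs and then exits after one empty response, mirroring "the pilot may pause or exit".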

ALICE job submission in LCG
[Diagram: the user submits a job to the ALICE Job Catalogue via the central services; an Optimizer splits each job per input LFN and updates the Task Queue; the Computing Agent on the site VO-Box performs matchmaking (close SEs and software) and sends job agents to the site through the RB/WMS to the CE and WN; on the WN the agent checks the environment (dying gracefully if it is unsuitable), asks for and retrieves a workload, executes it, returns the job result, and registers the output (LFN → GUID → SEs) in the ALICE File Catalogue; a Package Manager provides the software.]

ALICE services
User (client)
– X509 certificate, VOMS registration in the ALICE VO
– Authenticated to the catalogue/job submission service through GSI authentication
– Receives a temporary session token
API server
– Server certificate authentication (GSI handshake), authorization
– Issues the temporary session token
Catalogue DB
– User files, unix-like ACLs
LDAP
– User DN → AliEn username and directory definition
– Groups/roles (VOMS)
Process DB
– User process IDs, job states
Job Manager
– Job splitting, job priorities
[Diagram: clients talk to the central services over SSL/SOAP; the services access the databases via DBI.]