INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org gLExec and OS compatibility David Groep Nikhef.



Outline
What is gLExec-on-WN (again)
OS and batch system interoperability
  – Starting and killing jobs
  – Cleaning up files
  – Pruning stray processes
gLExec and OS integration – JRA1 All Hands February 2008

Use Case for ‘gLExec on the WN’
1. Make the pilot job subject to normal site policies for jobs
   The VO submits a pilot job to the batch system
  – the VO ‘pilot job’ submitter is responsible for the pilot behaviour; this might be a specific role in the VO, or a locally registered ‘special’ user at each site
  – the pilot job obtains the true user job, and presents the user credentials and the job (executable name) to the site (gLExec) to request a decision on a cooperative basis
2. Prevent ‘back-manipulation’ of the pilot job
  – make sure the user workload cannot manipulate the pilot
  – protect sensitive data in the pilot environment (proxy!)
  – by changing the uid for the target workload away from that of the pilot

Pilot Jobs and gLExec
On success: the site sets the uid/gid for the new user’s job
On failure: gLExec returns with an error, and the pilot job can terminate or obtain another user’s job
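The success/failure behaviour above can be sketched as a small pilot loop. This is only an illustration, not gLExec's interface: `launch` stands in for whatever the pilot uses to hand a payload to gLExec, and `Refused`, `run_pilot` and `give_up_after` are hypothetical names introduced here.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class Refused(Exception):
    """Hypothetical signal that the site declined to run a payload."""


def run_pilot(payloads: Iterable[Tuple[str, List[str]]],
              launch: Callable[[str, List[str]], int],
              give_up_after: Optional[int] = None
              ) -> List[Tuple[str, Optional[int]]]:
    """Try each (credential, argv) payload in turn.

    `launch` is an assumed callable that returns the payload's exit code,
    or raises Refused when the site says no. A refusal is recorded as None;
    the pilot then either moves on to another user's job or terminates
    once `give_up_after` refusals have been seen.
    """
    results: List[Tuple[str, Optional[int]]] = []
    refusals = 0
    for credential, argv in payloads:
        try:
            results.append((credential, launch(credential, argv)))
        except Refused:
            results.append((credential, None))
            refusals += 1
            if give_up_after is not None and refusals >= give_up_after:
                break  # terminate the pilot
    return results
```

Whether a pilot should give up after the first refusal or keep fetching work is a VO policy choice; the sketch leaves it as a parameter.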

gLExec deployment modes
Identity Mapping Mode – ‘just like on the CE’
  – have the VO query (and by policy honour) all site policies
  – actually change uid based on the true user’s grid identity
  – enforce per-user isolation and auditing using uids and gids
  – requires gLExec to have setuid capability
Non-Privileged Mode – declare only
  – have the VO query (and by policy honour) all site policies
  – do not actually change uid: no isolation or auditing per user
  – the gLExec invocation will be logged, with the user identity
  – does not require setuid powers – job keeps running in pilot space
‘Empty Shell’ – do nothing but execute the command…

Identity change
Let’s assume you make it setuid. Fine. Where to map to:
To a shared set of common pool accounts
  – Uid and gid mapping on the CE corresponds to that on the WN
  – Requires SCAS or a shared state (gridmapdir) directory
  – Clear view on who-does-what
To a per-WN set of pool accounts
  – No site-wide configuration needed
  – Only a limited (and generic) set of pool uids on the WN
  – Need only as many pool accounts as you have job slots
  – Makes cleanup easier, ‘local’ to the node
Or something in between... e.g. one pool for the CE and another for the WN
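To illustrate why a per-WN pool needs only as many accounts as there are job slots, here is a minimal in-memory sketch of pool-account leasing. It is not how gridmapdir or SCAS store their state; the class name, the account names and the DN strings are all made up for the example.

```python
class PoolAccounts:
    """Lease local pool accounts to grid identities (illustrative only)."""

    def __init__(self, accounts):
        self._free = list(accounts)   # accounts not currently leased
        self._lease = {}              # grid identity (DN) -> account

    def map(self, dn):
        """Return the account for this identity, leasing one if needed."""
        if dn in self._lease:
            return self._lease[dn]    # same identity keeps the same account
        if not self._free:
            raise RuntimeError("no free pool account")
        account = self._free.pop(0)
        self._lease[dn] = account
        return account

    def release(self, dn):
        """Return the account to the pool once the job slot is vacated."""
        self._free.append(self._lease.pop(dn))
```

With one account per job slot the pool can never run dry while slots are full, and releasing the lease when the slot is vacated keeps cleanup local to the node.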

Starting and Killing Jobs
The batch system performs the following basic functions (from the test description of Ulrich Schwickerath):
1. Job Submission
2. Job Suspend/Resume
3. Job Kill
4. CPU time accounting
This does not yet address enforcing sanity and user compliance


Testing it
Can the batch system suspend/kill the gLExec’ed processes?
Test using gLExec itself
  – The most ‘true’ test
  – Requires installation of gLExec and all dependencies
Test using the sutest mini programme
  – Same logic, but a stand-alone small (50-line) C programme
  – No dependencies
  – A few hard-coded constants to be set before compilation
  – Trivial to test
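The logic being tested (switch identity, run the command as a child, wait for it) can be sketched in a few lines. This is an assumption-laden illustration in Python, not the sutest source: `run_as` and its parameters are hypothetical, and the identity switch only works when the caller is root.

```python
import os


def run_as(argv, uid=None, gid=None):
    """Fork, optionally switch identity in the child, exec argv, and wait.

    The child stays in the caller's session and process tree (no setsid),
    so a batch system that signals the tree still reaches it.
    Returns the exit code, 128 + signal number if the child was killed,
    or 127 if the identity switch or the exec failed.
    """
    pid = os.fork()
    if pid == 0:
        try:
            if gid is not None:
                os.setgroups([])   # drop supplementary groups (needs root)
                os.setgid(gid)     # group first, while still privileged
            if uid is not None:
                os.setuid(uid)     # irreversible: the child cannot switch back
            os.execv(argv[0], argv)
        finally:
            os._exit(127)          # only reached when something above failed
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return 128 + os.WTERMSIG(status)
    return os.WEXITSTATUS(status)
```

Called without `uid` and `gid` it is just fork/exec/wait, which is enough to check suspend and kill behaviour without any privileges.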

Session Preservation

tbn05:~:1018$ cat tmp/tt.pbs
#! /bin/sh
date
/project/sutest /bin/sleep 120
date

$ date && qsub -q test tmp/tt.pbs
Fri Feb 8 14:02:21 CET 2008
33.tbn05.nikhef.nl

and Torque will kill all processes in the tree:

$ date && ps --forest -eo pid,ppid,sess,user,uid,euid,cmd
Fri Feb 8 14:02:41 CET 2008
PID PPID SESS USER UID UID CMD
root 0 0 /usr/sbin/pbs_mom
davidg  \_ -bash
davidg      \_ /bin/sh …/jobs/33.tbn05.ni.SC
nobody          \_ /project/sutest /bin/sleep 120
nobody              \_ /bin/sleep 120

CPU accounting
No change with respect to current behaviour of jobs
Times are accumulated on wait and collated with the gLExec usage
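That times are 'accumulated on wait' is standard POSIX behaviour: once a parent has waited for a child, the child's CPU time is added to the parent's children-usage counters. A small sketch that observes this with `getrusage` follows; the helper names are made up for the example.

```python
import resource
import subprocess
import sys


def child_cpu_seconds():
    """CPU time (user + system) of all children already waited for."""
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_utime + usage.ru_stime


def cpu_charged_for(argv):
    """Run argv to completion and return the CPU seconds it added."""
    before = child_cpu_seconds()
    subprocess.run(argv, check=True)   # run() waits for the child
    return child_cpu_seconds() - before


if __name__ == "__main__":
    burn = [sys.executable, "-c", "sum(range(5_000_000))"]
    print(f"charged to parent: {cpu_charged_for(burn):.3f} s")
```

The same accumulation applies up the tree, which is why a parent that waits for its identity-switched child still sees that child's usage.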

Forcing havoc on yourself

$ ( date && ./sutest /bin/sleep 60 && date )
Fri Feb 8 16:41:24 CET 2008
Notice: identity changed to uid

root   0    /usr/sbin/sshd
root   0     \_ sshd: davidg [priv]
davidg 502   |   \_ sshd: davidg
davidg 502   |       \_ -bash
davidg 502   |           \_ -bash
nobody 99    |               \_ ./sutest /bin/sleep 60
nobody 99    |                   \_ /bin/sleep 60

# kill
Killed

nobody  ./sutest /bin/sleep 60
nobody   \_ /bin/sleep 60

Cleaning up files
File cleanup: what do sites use today?
Check for files owned by users not currently running a job?
  – Who ‘is running’ becomes ill defined
  – Need a ‘back-mapping’ tool that can trawl log files or a state dir
tmpwatch(8) for old files?
  – Change of uid does not influence this solution
Transient TMPDIR facilities (PBSPro, Torque 2+)?
  – Runs with root privileges anyway
  – TMPDIR is inherited by the gLExec’ed child
  – And is thus unaffected by gLExec
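An age-based sweep never looks at file ownership, so it keeps working whichever uid wrote the file. The sketch below shows that idea only; it is not tmpwatch, it selects on modification time for simplicity (tmpwatch itself defaults to access time), and `stale_files` is a name introduced here.

```python
import os
import time


def stale_files(root, max_age_seconds, now=None):
    """List regular directory entries under root older than max_age_seconds.

    Ownership is deliberately ignored: only the file's age matters.
    Returns the paths sorted, without deleting anything.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except FileNotFoundError:
                continue  # file vanished while we were walking
            if now - mtime > max_age_seconds:
                stale.append(path)
    return sorted(stale)
```

A real sweep would run as root and remove the listed files; returning the list first makes it easy to review what would go.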

Pruning stray processes
Killing stray user processes ‘not owned by a user with a currently running process’
  – E.g. used at CERN/LSF
  – Need a ‘back-mapping’ tool that can trawl log files or a state dir
  – But: is not trustworthy to begin with on multi-job-slot machines!
Kill processes that are ‘too old’
  – Will run as root anyway
  – Unaffected, but is not trustworthy either
Kill processes not ‘parented’ in a batch job
  – gLExec will preserve the process tree, and thus this will work
  – Will also slaughter daemonizing jobs today … which is a Good Thing™

Pruning User Processes
For Torque, in Perl (simple to migrate to other systems)
Kill processes that are not a child of a registered pbs_mom
Uses the momctl command on the node
Caveats
  – Will usually not kill processes with a uid < 99
  – May optionally preserve top-level sshd sessions (beware of MPI)
  – Does not protect against fork bombs
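The selection rule (kill what is not parented under pbs_mom, but leave low uids alone) can be sketched as a pure function over a process table. This is not the Perl script described above and does not call momctl: the table layout, the function name `stray_pids` and the pids in the example are assumptions made for illustration.

```python
def stray_pids(table, mom_pids, min_uid=99):
    """Return pids that are not descended from any pbs_mom process.

    table    : dict mapping pid -> (ppid, uid)   (assumed layout)
    mom_pids : set of pids belonging to pbs_mom
    min_uid  : processes with a uid below this are never selected
    """
    def under_mom(pid):
        seen = set()
        # Walk up the parent chain until we hit a mom, the top, or a loop.
        while pid in table and pid not in seen:
            if pid in mom_pids:
                return True
            seen.add(pid)
            pid = table[pid][0]
        return False

    return sorted(pid for pid, (_ppid, uid) in table.items()
                  if uid >= min_uid and not under_mom(pid))


if __name__ == "__main__":
    table = {
        1:   (0, 0),      # init
        100: (1, 0),      # pbs_mom
        200: (100, 502),  # job shell
        201: (200, 99),   # identity-switched child, still in the job tree
        300: (1, 99),     # re-parented to init: a stray
    }
    print(stray_pids(table, mom_pids={100}))
```

Because the rule follows parentage rather than ownership, an identity switch inside the job does not hide the process from its batch job, while anything re-parented to init is picked up.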

Where are we now?
You can deploy today if
  – You run LSF or Torque and don’t manage disk or processes
  – You run LSF or Torque and use TMPDIR and prune_userproc-style job slaughtering
You should wait for the back-mapping tool (and update your script) if
  – You use LSF or Torque and use uid recognition for pruning stray processes (but you ought to change this anyway)
  – You use uid recognition for file cleaning
Back-mapping tool is expected to be out of development in XXX weeks

Summary and References
References
  – sutest program: