Security aspects of the lcg-CE Maarten Litmaath (CERN) EGEE’08.

Slides:



Advertisements
Similar presentations
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MyProxy and EGEE Ludek Matyska and Daniel.
Advertisements

29 June 2006 GridSite Andrew McNabwww.gridsite.org VOMS and VOs Andrew McNab University of Manchester.
GUMS status Gabriele Carcassi PPDG Common Project 12/9/2004.
CERN LCG Overview & Scaling challenges David Smith For LCG Deployment Group CERN HEPiX 2003, Vancouver.
Andrew McNab - EDG Access Control - 14 Jan 2003 EU DataGrid security with GSI and Globus Andrew McNab University of Manchester
Network Printing. Printer sharing Saves money by only needing one printer Increases efficiency of managing resources.
New VOMS servers campaign GDB, 8 th Oct 2014 Maarten Litmaath IT/SDC.
Expanding scalability of LCG CE A.Kiryanov, PNPI.
Process Description and Control A process is sometimes called a task, it is a program in execution.
Ch 8-3 Working with domains and Active Directory.
Accounting Update Dave Kant Grid Deployment Board Nov 2007.
Log analysis and user traceability Eygene Ryabinkin, Russian Research Centre «Kurchatov Institute» March, 12 th 2009,
SUSE Linux Enterprise Desktop Administration Chapter 12 Administer Printing.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks What GGUS can do for you JRA1 All hands.
INFSO-RI Enabling Grids for E-sciencE Workload Management System Mike Mineter
Evolution of the Open Science Grid Authentication Model Kevin Hill Fermilab OSG Security Team.
Open Science Grid OSG CE Quick Install Guide Siddhartha E.S University of Florida.
Hands On UNIX II Dorcas Muthoni. Processes A running instance of a program is called a "process" Identified by a numeric process id (pid)‏  unique while.
Maarten Litmaath (CERN), GDB meeting, CERN, 2006/02/08 VOMS deployment Extent of VOMS usage in LCG-2 –Node types gLite 3.0 Issues Conclusions.
8-Jul-03D.P.Kelsey, LCG-GDB-Security1 LCG/GDB Security (Report from the LCG Security Group) RAL, 8 July 2003 David Kelsey CCLRC/RAL, UK
Chapter 3 & 6 Root Status and users File Ownership Every file has a owner and group –These give read,write, and execute priv’s to the owner, group, and.
CERN Using the SAM framework for the CMS specific tests Andrea Sciabà System Analysis WG Meeting 15 November, 2007.
Introduction to OSG Security Suchandra Thapa Computation Institute University of Chicago March 19, 20091GSAW 2009 Clemson.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks A GSI-secured job manager for connecting.
Role Based VO Authorization Services Ian Fisk Gabriele Carcassi July 20, 2005.
US LHC OSG Technology Roadmap May 4-5th, 2005 Welcome. Thank you to Deirdre for the arrangements.
Lab 10 Overview DNS. DNS name server Set up a local domain name server . is the root domain .lab is the WH302 lab’s TLD (top level domain)  hades.lab.
Derek Ross E-Science Department DCache Deployment at Tier1A UK HEP Sysman April 2005.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE Site Architecture Resource Center Deployment Considerations MIMOS EGEE Tutorial.
CREAM Installation&Configuration Sara Bertocco INFN Padova 11 th International GridKa School 2013 – Big Data, Clouds and Grids.
SAM Sensors & Tests Judit Novak CERN IT/GD SAM Review I. 21. May 2007, CERN.
VO Box Issues Summary of concerns expressed following publication of Jeff’s slides Ian Bird GDB, Bologna, 12 Oct 2005 (not necessarily the opinion of)
Last update 21/01/ :05 LCG 1Maria Dimou- cern-it-gd Current LCG User Registration, VO management and Authorisation Procedures VOMS workshop
Linux Operations and Administration
Certification and test activity ROC/CIC Deployment Team EGEE-SA1 Conference, CNAF – Bologna 05 Oct
OSG Site Admin Workshop - Mar 2008Using gLExec to improve security1 OSG Site Administrators Workshop Using gLExec to improve security of Grid jobs by Alain.
Andrew McNab - Security issues - 4 Mar 2002 Security issues for TB1+ (some personal observations from a WP6 and sysadmin perspective) Andrew McNab, University.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks User traceability and log analysis tools.
INFSO-RI Enabling Grids for E-sciencE - II SLCS, VASH, and LCAS/LCMAPS Plugins All-Hands Meeting Helsinki Placi Flury, SWITCH 19.
INFSO-RI Enabling Grids for E-sciencE SRMv2.2 in DPM Sophie Lemaitre Jean-Philippe.
Security aspects of the WLCG infrastructure: clients and services Maarten Litmaath CERN.
Andrew McNab - Dynamic Accounts - 2 July 2002 Dynamic Accounts in TB1.3 What we could do with what we’ve got now... Andrew McNab, University of Manchester.
1Maria Dimou- cern-it-gd LCG November 2007 GDB October 2007 VOM(R)S Workshop report Grid Deployment Board.
CERN Running a LCG-2 Site – Oxford July - 1 LCG2 Administrator’s Course Oxford University, 19 th – 21 st July Developed.
GGUS summary (3 weeks) VOUserTeamAlarmTotal ALICE4004 ATLAS CMS LHCb Totals
Open Science Grid Build a Grid Session Siddhartha E.S University of Florida.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks gLite configuration (plans) Robert Harakaly.
Dave Kant LCG Accounting Overview GDA 7 th June 2004.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Information System Tutorial Laurence Field.
Placeholder ES 1 CERN IT EGI Technical Forum, Experiment Support group AAI usage, issues and wishes for WLCG Maarten Litmaath CERN.
Gridification progress report David Groep, Oscar Koeroo Wim Som de Cerff, Gerben Venekamp Martijn Steenbakkers.
John Gordon Grid Accounting Update John Gordon (for Dave Kant) CCLRC e-Science Centre, UK LCG Grid Deployment Board NIKHEF, October.
II EGEE conference Den Haag November, ROC-CIC status in Italy
EGEE is a project funded by the European Union under contract IST The Workload Management System: an example Simone Campana LCG Experiment.
Incident Response Forensics and Review OSG Security Drill OSG Site Administrators workshop Indianapolis August Anand Padmanabhan UIUC.
Why you should care about glexec OSG Site Administrator’s Meeting Written by Igor Sfiligoi Presented by Alain Roy Hint: It’s about security.
3/5/2007Chris Green, FNAL / OSG VO-Level Site Validation.
Getting Started with Linux
AuthN and AuthZ in StoRM A short guide
Classic Storage Element
...looking a bit closer under the hood
Security aspects of the CREAM-CE
Farida Naz Andrea Sciabà
Hands On UNIX AfNOG 2010 Kigali, Rwanda
Short update on the latest gLite status
Hands On UNIX AfNOG X Cairo, Egypt
Lab 10 Overview DNS.
Dynamic DNS support for EGI Federated cloud
Configuring Internet-related services
Site availability Dec. 19 th 2006
Presentation transcript:

Security aspects of the lcg-CE Maarten Litmaath (CERN) EGEE’08

How to start/stop the services Maarten Litmaath (CERN), EGEE’082 Start the services /etc/init.d/globus-gass-cache-marshal start /etc/init.d/globus-job-manager-marshal start /etc/init.d/globus-gatekeeper start /etc/init.d/globus-gridftp start /etc/init.d/bdii start /etc/init.d/gLite start Stop the services /etc/init.d/globus-gatekeeper stop /etc/init.d/globus-gridftp stop /etc/init.d/globus-gass-cache-marshal stop /etc/init.d/globus-job-manager-marshal stop /etc/init.d/bdii stop /etc/init.d/gLite stop

Network usage Maarten Litmaath (CERN), EGEE’083 globus-gatekeeper – Listens on port 2119 globus-job-manager – Started by the gatekeeper – Listens in GLOBUS_TCP_PORT_RANGE – Connects back to RB/WMS/… grid_manager_monitor_agent – 1 per DN per RB/WMS/Condor-G submit node – globus-url-copy job state summaries back to RB/WMS/Condor-G globus-gridftp-server – Listens on port 2811 and in GLOBUS_TCP_PORT_RANGE glite-lb-logd + glite-lb-interlogd – glite-lb-logd listens on port 9002 Only WNs should connect – glite-lb-interlogd connects to LB servers bdii – Listens on port 2170

User mapping (1) Maarten Litmaath (CERN), EGEE’084 Gatekeeper and gridftp-server have authZ callout – /etc/grid-security/gsi-authz.conf /opt/glite/etc/lcas/lcas.db /opt/glite/etc/lcmaps/lcmaps.db /opt/glite/etc/lcas/lcas.db  authorization – lcas_userban.mod /opt/glite/etc/lcas/ban_users.db – lcas_voms.mod VOMS proxy with valid (!) extensions – Connection is closed if extensions have expired plain grid proxy

User mapping (2) Maarten Litmaath (CERN), EGEE’085 /opt/glite/etc/lcmaps/lcmaps.db  mapping – VOMS proxy with valid extensions lcmaps_voms_localgroup.mod – /etc/grid-security/groupmapfile  primary and secondary GIDs lcmaps_voms_localaccount.mod – /etc/grid-security/grid-mapfile  static account UID lcmaps_voms_poolaccount.mod – /etc/grid-security/grid-mapfile  pool account UID – /etc/grid-security/gridmapdir  remember mapping – Else lcmaps_localaccount.mod – /etc/grid-security/grid-mapfile  static account lcmaps_poolaccount.mod – /etc/grid-security/grid-mapfile  pool account – /etc/grid-security/gridmapdir  remember mapping

User mapping (3) Maarten Litmaath (CERN), EGEE’086 /etc/grid-security/groupmapfile – derived from YAIM's groups.conf and users.conf /etc/grid-security/grid-mapfile – Concatenation of 2 parts, needed for LCAS classic grid-mapfile – contents updated by edg-mkgridmap cron job /etc/grid-security/voms-grid-mapfile – derived from YAIM's groups.conf and users.conf

User mapping (4) Maarten Litmaath (CERN), EGEE’087 /etc/grid-security/gridmapdir – Should have mode 0770 – Contains root-owned empty files named after pool accounts – The file for an unused account has a hard link count of 1 gridmapdir]# ls -li ops rw-r--r-- 1 root root 0 Dec ops034 – The file for a used account has a hard link count of 2 gridmapdir]# ls -li ops rw-r--r-- 2 root root 0 Sep 14 03:57 ops022 – The other link’s name encodes the DN and VOMS UNIX groups gridmapdir]# ls -li | grep ^ rw-r--r-- 2 root root 0 Sep 14 03:57 %2fdc%3dch%2fdc%3dcern%2fcn%3darthur%20dent:ops rw-r--r-- 2 root root 0 Sep 14 03:57 ops022

User mapping (5) Maarten Litmaath (CERN), EGEE’088 lcg-expiregridmapdir cron job recycles pool accounts as needed – When usage exceeds a threshold (80%) – Decided per set of accounts having the same prefix – The oldest accounts are recycled until the usage falls again below the threshold – An account can only be recycled when it has been idle for a certain period (was 2 days, 10 days in latest YAIM) This should happen only rarely  create sufficient accounts! – /var/log/lcg-expiregridmapdir.log shows current situation VO dtmsgm: inuse / total = 13 / 99 = 0.13, thr = 0.8 VO ops: inuse / total = 24 / 50 = 0.48, thr = 0.8 VO dteam: inuse / total = 76 / 99 = 0.77, thr = 0.8

Logs for mapped clients Maarten Litmaath (CERN), EGEE’089 “JMA” records in gatekeeper log and /var/log/messages – DN, client IP address, RB/WMS job ID, local account, batch system job ID – For “lcg” job managers only /var/log/messages has batch system job ID /opt/edg/var/gatekeeper/grid-jobmap_* summarize job records – Also the user’s VOMS attributes – Directory is defined in /etc/sysconfig/globus /etc/sysconfig/dgas-add-record.conf /var/log/gridftp-session.log – DN, client hostname, local account, file transferred /var/log/globus-gridftp.log – Client IP address, local account, file transferred

Job traces (1) Maarten Litmaath (CERN), EGEE’0810 /var/log/globus-gatekeeper.log JMA 2008/09/17 20:15:04 GATEKEEPER_JM_ID :15: has EDG_WL_JOBID ' JMA 2008/09/17 20:15:12 GATEKEEPER_JM_ID :15: for /DC=ch/DC=cern/..... on JMA 2008/09/17 20:15:12 GATEKEEPER_JM_ID :15: mapped to opssgm (8892, 2688) JMA 2008/09/17 20:15:12 GATEKEEPER_JM_ID :15: has GRAM_SCRIPT_JOB_ID :lcglsf:internal_ : manager type lcglsf JMA 2008/09/17 20:15:13 GATEKEEPER_JM_ID :15: JM exiting /var/log/messages Sep 17 20:15:12 ce111 gridinfo[24172]: JMA 2008/09/17 20:15:12 GATEKEEPER_JM_ID :15: has GRAM_SCRIPT_JOB_ID :lcglsf:internal_ : manager type lcglsf Sep 17 20:15:48 ce111 gridinfo: [ ] Submitted job :lcglsf:internal_ : to batch system lcglsf with ID Sep 17 20:20:48 ce111 gridinfo: [ ] Job :lcglsf:internal_ : (ID ) has finished

Job traces (2) Maarten Litmaath (CERN), EGEE’0811 /opt/edg/var/gatekeeper/grid-jobmap_ "localUser=8892" "userDN=/DC=ch/DC=cern/....." "userFQAN=/ops/Role=lcgadmin/Capability=NULL" "userFQAN=/ops/Role=NULL/Capability=NULL" "jobID= "ceID=ce111.cern.ch:2119/jobmanager-lcglsf-grid_ops" "lrmsID= " "timestamp= :15:48" For job submission failures – lrmsID may be “FAILED” – Local account home directory may contain relevant gram_job_mgr_*.log file which may provide a clue

How to fix errors Maarten Litmaath (CERN), EGEE’0812 Keep your services up to date w.r.t. gLite updates GOC Wiki trouble shooting pages describe most common errors – – Some entries are out of date – Recently updated entries usually fairly correct Regional grids also have Wiki pages and mailing lists Your ROC will assist you if needed – Open a GGUS ticket The LCG-Rollout list may be of help

How to ban a user or a VO Maarten Litmaath (CERN), EGEE’0813 In /opt/glite/etc/lcas/ban_users.db – Add a line for each DN that is banned – Each line must start with the DN surrounded by double quotes One can simply copy the corresponding line from the grid-mapfile Trailing fields are ignored – Nothing needs to be restarted To ban a VO reconfigure the service without that VO – Will also adapt the information system

How to map suspect account to DN Maarten Litmaath (CERN), EGEE’0814 /opt/edg/var/gatekeeper/grid-jobmap_* – Only for jobs submitted to the batch system – Does not catch “fork” jobs running on the lcg-CE itself Used by RB/WMS/Condor-G for “grid_monitor” processes Useful for debugging or small-scale tests via globus-job-run Can be abused… /var/log/globus-gatekeeper.log, /var/log/messages – “JMA” records – Time stamps may disambiguate multiple DNs mapped to the same static account Which DN started this dubious “xyzsgm” process on my CE? /etc/grid-security/gridmapdir – Only works for pool accounts

Script to map pool account to DN Maarten Litmaath (CERN), EGEE’ Essence: my $gmd = "/etc/grid-security/gridmapdir"; my $gmf = "/etc/grid-security/grid-mapfile"; open(GMD, "ls -lai $gmd |"); while ( ) { = split; if ($f[-1] eq $user) { (my $dn = $map{$f[0]}) =~ s/%(..)/chr(hex($1))/eg; # decode... $dn =~ s/:.*//; # remove VOMS UNIX groups if ($dn eq "") { print "Not in use: $user\n"; exit 0; } open(GMF, "$gmf") or die "$gmf: $!\n"; while ( ) { if (/^\s*("$dn")\s/i) { print "$1\n"; exit 0; } print "Not in grid-mapfile: \"$dn\"\n"; # user probably left the VO exit 0; } $map{$f[0]} = $f[-1]; } print STDERR "Not found: $user\n"; exit 1;

How to handle suspicious jobs Maarten Litmaath (CERN), EGEE’08 – v216 Pause or stop the batch system queues Suspend all active jobs, if the batch system supports it Stop gatekeeper and gridftp-server while suspected DNs not yet identified Ban suspected DNs or VO Keep the active jobs submitted by the suspected accounts suspended if possible, to facilitate forensic investigations – Otherwise kill the jobs Follow the EGEE Incident Response Procedure –

Software manager compromise Maarten Litmaath (CERN), EGEE’0817 If a software manager (“sgm”) account is suspected – Prevent the start of new jobs for that VO – Suspend or kill running jobs of that VO – The software area for the affected VO must be reinstalled from scratch before new jobs can be accepted for that VO Avoid risk of Trojan horses

Security recommendations Maarten Litmaath (CERN), EGEE’0818 Use pool accounts instead of static accounts where possible – See Configure enough pool accounts so that recycling occurs rarely – Check /var/log/lcg-expiregridmapdir.log Ensure that each VO software area is only writable by the software managers for that VO – Group-writable only for the “sgm” accounts group, not the VO Do not mount the VO software areas on the lcg-CE, but only on the WNs and on the VOBOXes – Reduced exposure, reduced risk