Security aspects of the WLCG infrastructure: clients and services Maarten Litmaath CERN.

Outline
How it all should work
Proxies
Incoherence
Security model examples
Banning
Argus
Site authorization
Pilot jobs
Virtual machines and clouds
Data security
Other services
SSO, identity providers
Vulnerability aspects
This list is probably incomplete…
HEPiX , LBNL

How it all should work (1)
Users and services have digital certificates signed by trusted certificate authorities (CAs)
– Certificate lifetime is usually 1 year
Users are members of virtual organizations (VOs)
– WLCG: alice, atlas, cms, lhcb, dteam, ops, …
– Users need to re-sign the AUP every year
– Sites decide which VOs to support at which QoS
Services are rarely made members of a VO
– It would be desirable to some extent: a service could then prove that it is trusted by the VO
– Now: rely on the information system + filtering

How it all should work (2)
Users create short-lived proxies for grid access
Long-lived proxies are only found on MyProxy servers
Proxies are delegated to services as needed
– Some services can retrieve or renew proxies via MyProxy
Services interpret proxies consistently
– The same criteria are used by different services
– User jobs and data are protected as needed
Services log security-related information consistently
Users can easily be banned as needed

Where we want to be

Where we are

Proxies (1)
Plain grid proxy
– Usage: grid-proxy-init
– Mapping can only be based on the DN
– DNs in the grid-mapfile are harvested from VOMS servers; different subsets can be mapped differently
VOMS proxy
– Usage:
voms-proxy-init -voms vo
voms-proxy-init -voms vo:/vo/group
voms-proxy-init -voms vo:/vo/group/Role=role
– Plain grid proxy + set of attributes signed by the VOMS server
– Attributes: groups and/or roles
– Mapping can be based on the attributes and/or the DN; attributes are usually preferred
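
The two mapping inputs above can be sketched in Python: a grid-mapfile gives a DN-to-account table, while a VOMS proxy carries FQANs of the form /vo/group/Role=role. This is a minimal sketch, assuming the common grid-mapfile layout of a double-quoted DN followed by an account name (a leading "." is often used to denote a pool account); the function names are hypothetical, and nothing here verifies the proxy or the VOMS signature.

```python
import shlex

def parse_grid_mapfile(text):
    """Parse grid-mapfile lines of the form: "<DN>" <account>[,<account>...].

    Returns a dict mapping each DN to the first account listed for it.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex keeps the double-quoted DN (which may contain spaces) as one token
        parts = shlex.split(line)
        if len(parts) < 2:
            continue  # malformed line: no account given
        dn, accounts = parts[0], parts[1]
        mapping[dn] = accounts.split(",")[0]
    return mapping

def parse_fqan(fqan):
    """Split a VOMS FQAN like /atlas/prod/Role=production/Capability=NULL
    into (group, role); a Role of NULL means "no role"."""
    group_parts, role = [], None
    for element in fqan.strip("/").split("/"):
        if element.startswith("Role="):
            value = element[len("Role="):]
            role = None if value == "NULL" else value
        elif element.startswith("Capability="):
            continue  # capabilities are ignored in this sketch
        else:
            group_parts.append(element)
    return "/" + "/".join(group_parts), role
```

A DN-only service can use nothing but the first function; a VOMS-aware service would typically prefer the group and role from the second.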

Proxies (2)
Proxy lifetime should be “short”
– Cf. AFS/Kerberos token lifetime
– Default 12 hours; 24 hours is probably OK
– Current practice: LHC experiments use multi-day proxies to avoid potential problems with proxy renewal; CMS use 8-day proxies!
A long job needs its proxy to be renewed before it expires
Long-lived proxies can be stored on a MyProxy server
– Trusted services can retrieve or renew short-lived proxies
The MyProxy server currently is a single point of failure
– RFE: upload proxies to multiple servers, and try all of them when downloading proxies as needed
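
A minimal sketch of the renewal check and of the multi-server RFE above. It assumes a hypothetical fetch(server) callable that returns proxy bytes or raises OSError; the one-hour margin is an arbitrary choice, and no real MyProxy client API is used.

```python
from datetime import datetime, timedelta

def needs_renewal(expires_at, now, job_time_left, margin=timedelta(hours=1)):
    """True if the proxy would expire before the job (plus a safety margin) ends."""
    return expires_at - now < job_time_left + margin

def retrieve_proxy(servers, fetch):
    """Try each MyProxy server in order and return (server, proxy) from the first
    one that answers. `fetch` is a hypothetical callable raising OSError on failure."""
    failures = []
    for server in servers:
        try:
            return server, fetch(server)
        except OSError as exc:
            failures.append(f"{server}: {exc}")
    # Only fail when every replica has been tried
    raise RuntimeError("all MyProxy servers failed: " + "; ".join(failures))

if __name__ == "__main__":
    now = datetime.now()
    # A 12-hour proxy cannot cover a job with 20 hours left
    print(needs_renewal(now + timedelta(hours=12), now, timedelta(hours=20)))  # True
```

Trying the servers in order keeps the sketch simple; a real client might also randomize the order to spread the load.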

Incoherence
Different services treat proxies differently
– Libraries
– Mapping: plain proxies, VOMS proxies
– Logging
– Banning: not possible on certain services!
– Testing/debugging/forensics tools: available for some scenarios on some services
Try finding two gLite services with the same security model!
– OSG, ARC?

Security model examples
LCG Computing Element
– VOMS mapping with fallback on plain proxy mapping
CREAM Computing Element
– VOMS only
OSG Computing Element
– GUMS: VOMS, DN
Disk Pool Manager
– Virtual IDs
– VOMS mapping and plain proxy mapping
dCache
– gPlazma: GUMS, vo-role-map, …
Workload Management System
– VOMS authZ by 2 different libraries: GridSite, LCMAPS
– But the Condor-G engine only looks at the DN!
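
To make the incoherence concrete, here is a toy mapping function with three of the behaviours listed above. Assumptions: the fallback to DN-based mapping applies only when the proxy carries no VOMS attributes, the first (primary) FQAN decides, and the mapping tables are plain dictionaries; real services do this through LCMAPS, GUMS, gPlazma and so on, with much richer rules.

```python
def map_user(dn, fqans, policy, voms_map, dn_map):
    """Return a local account for a proxy, or None if access is refused.

    policy: "voms_with_fallback" (like the LCG-CE), "voms_only" (like CREAM)
            or "dn_only" (what the Condor-G engine effectively does).
    voms_map: dict FQAN -> account; dn_map: dict DN -> account (grid-mapfile).
    """
    if policy not in ("voms_with_fallback", "voms_only", "dn_only"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "dn_only":
        # VOMS attributes are ignored completely
        return dn_map.get(dn)
    if fqans:
        # VOMS proxy: the primary FQAN decides
        return voms_map.get(fqans[0])
    if policy == "voms_with_fallback":
        # Plain proxy: fall back on the grid-mapfile
        return dn_map.get(dn)
    return None  # voms_only: plain proxies are refused
```

The same credentials can thus end up in different accounts, or be refused, depending only on which service they are presented to.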

Banning
OSG have SAZ and GUMS, ARC have Charon
EGEE/gLite: the LCAS library and the SCAS/Argus services have banning plugins
– Easy to ban a DN
– LCG-CE, CREAM-CE, WMS
The DPM/LFC virtual ID table will get banning flags
– Currently only plain proxies can be fully banned, by mapping them to non-existent accounts/VOs
– VOMS proxies can be banned only from creating new files
Argus should make this consistent and easy
– It also can import a grid-wide ban list
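
One way to get consistent banning is to check a ban list before any mapping takes place, so that plain and VOMS proxies are rejected alike instead of relying on mapping tricks. A minimal sketch, assuming a hypothetical ban-list format of one DN per line (optionally double-quoted) and an arbitrary map_account callable:

```python
def load_ban_list(text):
    """Parse a (hypothetical) ban list: one DN per line, optionally
    double-quoted; lines starting with '#' are comments."""
    banned = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            banned.add(line.strip('"'))
    return banned

def authorize(dn, fqans, banned_dns, map_account):
    """Check the ban list *before* mapping, so that the outcome does not
    depend on whether the proxy carries VOMS attributes."""
    if dn in banned_dns:
        return None  # refused, whatever the proxy type
    return map_account(dn, fqans)
```

A grid-wide list imported by the site could simply be merged with the local one, e.g. banned = local | imported.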

Argus
Argus is the long-term gLite authorization framework
It should give all gLite services a consistent authZ model
It allows authZ decisions to be taken centrally per site
– A single place to pull the plug
It can import remote policies
– Regional, national, project-based, …
– Give priority to local/national/… users
– Banning of DNs, e.g. grid-wide
Policies can affect QoS for DNs or VOMS attributes
– Preferences
– Banning
Argus will be introduced gradually
– It can coexist with legacy services
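
The policy model described above (local rules plus imported regional or grid-wide rules, matching on DNs or VOMS attributes, with bans and QoS preferences) can be sketched as follows. This is a simplified illustration that uses deny-overrides combining and a hypothetical integer priority as the QoS hint; it is not Argus's actual policy language or combining algorithm, and /atlas/ch stands in for a hypothetical national group.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    source: str        # e.g. "local", "national", "grid-wide" (illustrative labels)
    attribute: str     # "dn" or "fqan"
    value: str
    effect: str        # "permit" or "deny"
    priority: int = 0  # hypothetical QoS hint for permitted requests

def matches(rule, dn, fqans):
    """Does this rule apply to a request with the given DN and FQANs?"""
    if rule.attribute == "dn":
        return rule.value == dn
    if rule.attribute == "fqan":
        return rule.value in fqans
    raise ValueError(f"unknown attribute: {rule.attribute}")

def decide(rules, dn, fqans):
    """Deny-overrides combination over local and imported rules.

    Returns ("Deny", None), ("Permit", priority) or ("NotApplicable", None).
    """
    applicable = [r for r in rules if matches(r, dn, fqans)]
    if any(r.effect == "deny" for r in applicable):
        return "Deny", None  # any ban, local or imported, wins
    permits = [r for r in applicable if r.effect == "permit"]
    if permits:
        # The best matching preference determines the QoS hint
        return "Permit", max(r.priority for r in permits)
    return "NotApplicable", None
```

Because every service would ask the same site-wide decision point, a single rule change (the "plug") takes effect everywhere at once.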

Site authorization
EGEE
– SCAS: released to production in early July for glexec on the WN; only deployed on the few sites that helped debug glexec and its use by ATLAS and LHCb
– Argus: in certification
OSG
– GUMS
– SAZ
ARC
– Charon
– Argus support foreseen

Pilot jobs (1)
A pilot job checks and prepares the worker node environment for a real job, i.e. a task that it downloads from a central task queue
– Late binding leads to good efficiency
A multi-user pilot job can pick up a task from any user in the VO
The task should run with its own associated proxy
– Access services, store data etc. with the correct identity
It should run under an account corresponding to that proxy
– Separate users as the CE head node would have done
– Protect the pilot proxy against malicious payloads
A setuid root utility is needed to switch to the correct identity
– Like “sudo” or Apache “suexec” → gLExec
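
The late-binding loop described above, as a minimal sketch: the pilot validates the node first and only then pulls tasks, and each task runs through a run_as(proxy, command) callable that stands in for an identity-switching tool such as gLExec. The environment checks, thresholds and task dictionary fields are hypothetical.

```python
from collections import deque

def check_environment(env):
    """Hypothetical sanity checks a pilot might run before asking for work."""
    return env.get("disk_gb", 0) >= 10 and env.get("has_software", False)

def run_pilot(task_queue, env, run_as):
    """Late binding: tasks are only taken from the queue once the worker
    node is known to be usable.

    run_as(proxy, command) stands in for an identity-switching tool: it must
    run `command` under the account mapped from the task's own proxy, never
    under the pilot's identity.
    """
    if not check_environment(env):
        return []  # bad node: no task was bound to it, so none is lost
    results = []
    while task_queue:
        task = task_queue.popleft()
        results.append(run_as(task["proxy"], task["command"]))
    return results
```

The efficiency gain comes from the early return: a broken node consumes a pilot, not a user's task.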

Pilot jobs (2)
Each experiment has a pilot job framework
– ALICE: AliEn
– ATLAS: PanDA
– CMS: glideinWMS, only used on OSG
– LHCb: DIRAC
All were examined by the GDB Pilot Job Frameworks Review group
Current usage
– Production managers run VO workload for many/all users
– Individual users may be able to run their own jobs
Foreseen usage
– Pilot jobs use glexec to run the payload under the user's account
Problem: we have no production experience with glexec, and there is little time left before the LHC starts
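
For the foreseen usage, the pilot would prefix the payload command with glexec and hand it the payload's own proxy. This hedged sketch only builds the argv and environment. The assumption is that gLExec reads the path of the payload owner's proxy from the GLEXEC_CLIENT_CERT environment variable; check this, and the install path (here a hypothetical /usr/sbin/glexec), against the gLExec documentation for the deployed version.

```python
import os

def glexec_invocation(glexec_path, payload_proxy, command, base_env=None):
    """Build (argv, env) for running a payload through gLExec.

    Assumption: gLExec takes the payload owner's proxy path from the
    GLEXEC_CLIENT_CERT environment variable and switches to the account
    mapped from that proxy before executing `command`.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GLEXEC_CLIENT_CERT"] = payload_proxy  # payload identity, not the pilot's
    return [glexec_path, *command], env
```

The result could then be passed to subprocess.run(argv, env=env); that call is left out here because it needs a real gLExec installation and site configuration.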

Virtual machines and clouds
Running each job in its own VM is desirable
– Reduces security interference between jobs
– The shared software area and shared services remain
– Local files left behind can be cleaned up completely
– Implemented at some sites and becoming more popular
The shared SW area is not needed when the SW is included in the image
– Avoids Trojan horses and a bottleneck
Complete images also are a natural fit for clouds
Some sites are experimenting with clouds

Data security (1)
Fine-grained security policies for data access are possible in principle
In practice there are only 2 levels of security today
– Production managers are responsible for the vast majority (99%) of a VO’s data volume
– Only they have write access to specific resources used in managing production data: reserved sub-trees in the catalog name space, reserved disk pools and tape access
– All the remaining resources are group-writable, by default for the whole VO!
Different groups in a VO can be shielded from each other
– If they are mapped differently
– This may require site admin intervention
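
The two-level model above amounts to a simple path-based rule. A minimal sketch, assuming a hypothetical reserved sub-tree /grid/atlas/prod/ and the production FQAN /atlas/Role=production; real catalogs and storage systems express this through ACLs and mappings rather than a hard-coded prefix list.

```python
PRODUCTION_FQAN = "/atlas/Role=production"  # hypothetical production role
RESERVED_PREFIXES = ("/grid/atlas/prod/",)  # hypothetical reserved sub-trees

def may_write(path, fqans, vo="atlas"):
    """Two-level model: reserved production sub-trees vs. a VO-writable rest."""
    if any(path.startswith(prefix) for prefix in RESERVED_PREFIXES):
        # Level 1: only production managers may write here
        return PRODUCTION_FQAN in fqans
    # Level 2: everything else is group-writable, by default for the whole VO
    return any(f == f"/{vo}" or f.startswith(f"/{vo}/") for f in fqans)
```

Shielding groups within the VO from each other would need a third case here, mapping e.g. /atlas/groupA and /atlas/groupB to different owners.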

Data security (2)
BeStMan
– Classic grid-mapfile, GUMS
CASTOR
– Classic grid-mapfile, insecure RFIO!!
dCache
– gPlazma supports GUMS, vo-role-mapfile, …
DPM, LFC
– Map to virtual UIDs and GIDs (defined in the DB)
– Native VOMS support, fallback on the classic grid-mapfile; the lcgdm-mapfile determines the VO for a plain grid proxy
– The grid-mapfile is needed by the DPM GridFTP server
StoRM
– Native VOMS support
– Uses just-in-time ACLs to give access to data on a cluster FS
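
A toy version of the DPM/LFC approach: DNs receive virtual UIDs and groups receive virtual GIDs, allocated on first use, with the lcgdm-mapfile (modelled here as a plain DN-to-VO dictionary) used to find the VO of a plain grid proxy. The starting ID values and the in-memory dictionaries are assumptions; DPM keeps these tables in its database.

```python
class VirtualIdTable:
    """Allocate virtual UIDs (per DN) and GIDs (per group) on first use."""

    def __init__(self, first_uid=101, first_gid=101):
        self.uids, self.gids = {}, {}
        self.next_uid, self.next_gid = first_uid, first_gid

    def uid_for(self, dn):
        if dn not in self.uids:
            self.uids[dn] = self.next_uid
            self.next_uid += 1
        return self.uids[dn]

    def gid_for(self, group):
        if group not in self.gids:
            self.gids[group] = self.next_gid
            self.next_gid += 1
        return self.gids[group]

    def ids_for(self, dn, fqans, vo_mapfile):
        """Return (uid, [gids]). VOMS proxies use their FQANs; plain proxies
        fall back on vo_mapfile (DN -> VO), standing in for the lcgdm-mapfile."""
        if fqans:
            groups = list(fqans)
        elif dn in vo_mapfile:
            groups = ["/" + vo_mapfile[dn]]
        else:
            # Refuse before allocating anything for an unknown user
            raise PermissionError(f"no VO known for {dn}")
        return self.uid_for(dn), [self.gid_for(g) for g in groups]
```

Because the IDs are virtual, they need not correspond to any Unix account on the disk servers.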

Other services
Information system
– Insecure LDAP: anyone can search for vulnerable hosts, and information can be corrupted (DNS spoofing, MITM attack)
– Any site can claim it supports any VO: the VO can configure a filter to get rid of unwanted sites, or run a private, static information system
– Filters currently work only for Computing and Storage Elements
Monitoring
– When secure, often viewable for any DN from a trusted CA
Accounting
– Secure
– Privacy
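
Since any site can advertise support for any VO, a VO-side filter has to combine the advertised claim with the VO's own list of vetted sites. A minimal sketch with hypothetical record fields; it also mirrors the current gap that only CE and SE records are filtered.

```python
def filter_endpoints(advertised, allowed_sites, vo):
    """Keep CE/SE records only from sites the VO has vetted.

    `advertised` is a list of dicts as they might come from an
    unauthenticated information system query (field names are hypothetical).
    """
    kept = []
    for rec in advertised:
        if rec["type"] in ("CE", "SE"):
            # A site's claim to support the VO is not trusted on its own:
            # the site must also be on the VO's own list.
            if vo in rec["vos"] and rec["site"] in allowed_sites:
                kept.append(rec)
        else:
            # Mirrors the current limitation: other service types pass unfiltered
            kept.append(rec)
    return kept
```

The unfiltered branch is deliberate here: it shows why a rogue non-CE/SE record can still reach the VO's clients.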

SSO, identity providers
SSO for services is popular
Identity providers
– Kerberos
– Shibboleth
– …
Why should grid usage be excluded?
SSO identity can be translated into grid identity
– FNAL Kerberos CA, SLCS
– SWITCH SLCS
– …

Vulnerability aspects
The EGEE Grid Security Vulnerability Group has >70 open issues
– The vast majority of them are deemed low risk… for now
A complete list of domains involved in WLCG could be used to configure service firewalls accordingly
– Outbound client connections might also be constrained
Jobs/payloads should be signed by the user proxy
– Close the door to “easy” injection of rogue jobs
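
A name-based allowlist check of the kind such a domain list would enable, as a sketch; the domains used with it are placeholders, not the actual list of domains involved in WLCG. Matching on names trusts DNS, which can be spoofed (see the information system item above), so real firewall rules would normally be expressed in terms of addresses.

```python
def host_allowed(hostname, allowed_domains):
    """True if hostname equals, or is a subdomain of, one of the allowed domains.

    Matching is on whole DNS labels, so "evilcern.ch" does not match "cern.ch".
    """
    host = hostname.lower().rstrip(".")  # normalize case and trailing root dot
    for domain in allowed_domains:
        d = domain.lower().strip(".")
        if host == d or host.endswith("." + d):
            return True
    return False
```

Checking the suffix with a leading dot is the essential detail: a plain endswith("cern.ch") would also accept an attacker-registered name such as evilcern.ch.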

Conclusions
Security aspects of WLCG clients and services show a forest of libraries, configurations and features
– A lot of legacy
More consistency and simplicity are highly desirable
Some important functionalities are only partially implemented
– Banning
– Site-wide policies
– Data protection
There are steady improvements and road maps
– To get us out of the woods…
