FermiGrid - PRIMA, VOMS, GUMS & SAZ

FermiGrid - PRIMA, VOMS, GUMS & SAZ Keith Chadwick Fermilab chadwick@fnal.gov

What is FermiGrid?
FermiGrid is:
- The Fermilab campus Grid.
- A set of common services to support the campus Grid: the site Globus gateway, VOMS, VOMRS, GUMS, SAZ, MyProxy, Gratia Accounting, etc.
- A forum for promoting stakeholder interoperability and resource sharing within Fermilab.
- The portal from the Open Science Grid to Fermilab Compute and Storage Services:
  - Production: fermigrid1, fngp-osg, fcdfosg1, fcdfosg2, docabosg2, sdss-tam, FNAL_FERMIGRID_SE (public dCache), stken, etc.
  - Integration: fgtest1, fnpcg, etc.
FermiGrid Web Site & Additional Documentation: http://fermigrid.fnal.gov/
23 Oct 2006 Keith Chadwick

FermiGrid - Infrastructure
- Site Globus Gateway: job forwarding gateway using Condor-G and CEMon. Makes use of the "accept limited" Globus gatekeeper option.
- VOMS & VOMRS: VO Membership Service & VO Management Registration Service. Allows users to select roles.
- GUMS: Grid User Mapping Service. Maps the FQAN in the X.509 proxy to a site-specific UID/GID.
- SAZ: Site AuthoriZation Service. Allows the site to make fine-grained job authorization decisions.
- MyProxy: service to securely store and retrieve signed X.509 proxies.

Site Gatekeeper Job Forwarding
Why?
- Single point of control.
- Hide site internal details.
- Facilitate resource sharing.
- Allow (some) load balancing.
- Support specification of user job requirements (via ClassAds).
Why not?
- Complicates problem diagnosis.
- Non-standard configuration.
- Can confuse users.
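The load-balancing and job-requirements points can be illustrated with a toy matchmaker. This is a minimal sketch, not Condor's ClassAd language: it assumes each cluster advertises only a hypothetical free-slot count and a hypothetical list of accepted VOs, and the gateway picks the least busy matching cluster. The cluster names come from the forwarding diagram; all numbers and VO lists are invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ClusterAd:
    """Hypothetical subset of what a cluster might advertise to the gateway."""
    name: str
    free_slots: int
    supported_vos: FrozenSet[str]


def select_cluster(ads: List[ClusterAd], vo: str,
                   min_free_slots: int = 1) -> Optional[str]:
    """Return the matching cluster with the most free slots, or None.

    "Matching" here only means the cluster accepts the job's VO and has at
    least min_free_slots idle slots; real ClassAd matchmaking is far richer.
    """
    candidates = [ad for ad in ads
                  if vo in ad.supported_vos and ad.free_slots >= min_free_slots]
    if not candidates:
        return None  # nothing to forward to; the gateway would hold or reject
    # Crude load balancing: prefer the least busy matching cluster.
    return max(candidates, key=lambda ad: ad.free_slots).name


ads = [
    ClusterAd("GP Farm", 40, frozenset({"fermilab", "sdss"})),
    ClusterAd("CDF OSG1", 120, frozenset({"cdf"})),
    ClusterAd("D0 CAB2", 10, frozenset({"dzero", "fermilab"})),
]
print(select_cluster(ads, "fermilab"))  # GP Farm
```

Preferring the cluster with the most free slots is just one simple policy; a real gateway could equally rank on queue depth or any other advertised attribute.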

Site Gateway Job Forwarding with CEMon and BlueArc (animated diagram)
- Step 1: the user issues voms-proxy-init and receives VOMS-signed credentials.
- Step 2: the user submits their grid job to the site gateway via globus-job-run, globus-job-submit, or condor-g.
- Step 3: the gateway requests a GUMS mapping based on VO & Role.
- Step 4: the gateway checks against the Site AuthoriZation (SAZ) service.
- Step 5: the grid job is forwarded to the target cluster.
The VOMS server and the GUMS server are linked by periodic synchronization. The clusters (CMS WC1, CDF OSG1, CDF OSG2, D0 CAB2, SDSS TAM, GP Farm, LQCD) send ClassAds via CEMon to the site-wide gateway. BlueArc storage also appears in the diagram.
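Steps 3 to 5 form a short decision pipeline on the gateway. The sketch below shows only that ordering. The GUMS, SAZ and forwarding calls are passed in as plain functions because their real interfaces are not described here; the function names, the account name "fnalgrid" and the return strings are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical signatures standing in for the real GUMS, SAZ and forwarding steps.
MapFn = Callable[[str, str, str], Optional[str]]
AuthzFn = Callable[[str, str, str], bool]
ForwardFn = Callable[[str], str]


def gateway_submit(dn: str, vo: str, role: str,
                   gums_map: MapFn, saz_authorize: AuthzFn,
                   forward: ForwardFn) -> str:
    """Steps 3-5 of the forwarding flow, after the user has a VOMS proxy
    (step 1) and has submitted the job to the gateway (step 2)."""
    # Step 3: ask GUMS which local account this DN/VO/Role maps to.
    account = gums_map(dn, vo, role)
    if account is None:
        return "rejected: no GUMS mapping"
    # Step 4: ask SAZ whether the site authorizes this DN/VO/Role.
    if not saz_authorize(dn, vo, role):
        return "rejected: not authorized by SAZ"
    # Step 5: forward the job to a target cluster under the mapped account.
    return forward(account)


mapping = {("fermilab", "/fermilab/Role=NULL/Capability=NULL"): "fnalgrid"}
result = gateway_submit(
    "/DC=org/DC=doegrids/OU=People/CN=Keith Chadwick 800325",
    "fermilab", "/fermilab/Role=NULL/Capability=NULL",
    gums_map=lambda dn, vo, role: mapping.get((vo, role)),
    saz_authorize=lambda dn, vo, role: True,
    forward=lambda account: f"forwarded as {account}",
)
print(result)  # forwarded as fnalgrid
```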

Globus gatekeeper - GUMS & SAZ interface
GUMS and SAZ are interfaced to the Globus gatekeeper through the gsi_authz callout, configured in /etc/grid-security/gsi_authz.conf:
    ##### PRIMA
    globus_mapping /usr/local/vdt/prima/lib/libprima_authz_module_gcc32dbg globus_gridmap_callout
    ##### SAZ
    globus_authorization /usr/local/vdt/saz/client/lib/libSAZ-gt3.2_gcc32dbg globus_saz_access_control_callout
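Each non-comment line of the file names a callout type, a shared library and the symbol to call in it. A small checker for that layout can catch typos before the gatekeeper loads the file. This is a sketch that assumes exactly the three-field form shown. The real Globus parser may accept variations it does not handle, and `parse_gsi_authz` is a hypothetical helper, not part of Globus.

```python
from typing import Dict, Tuple


def parse_gsi_authz(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each callout type to its (library, symbol) pair, skipping comments."""
    callouts: Dict[str, Tuple[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "##### PRIMA"-style comment headers
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"malformed callout line: {raw!r}")
        callout_type, library, symbol = fields
        callouts[callout_type] = (library, symbol)
    return callouts


CONF = """\
##### PRIMA
globus_mapping /usr/local/vdt/prima/lib/libprima_authz_module_gcc32dbg globus_gridmap_callout
##### SAZ
globus_authorization /usr/local/vdt/saz/client/lib/libSAZ-gt3.2_gcc32dbg globus_saz_access_control_callout
"""

for kind, (lib, sym) in parse_gsi_authz(CONF).items():
    print(kind, sym)
```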

SAZ - Site AuthoriZation Service
We deployed the Fermilab Site AuthoriZation (SAZ) service on the Fermilab Site Globus Gatekeeper (fermigrid1) on Monday, October 2, 2006.
SAZ allows Fermilab to make Grid job authorization decisions for the Fermilab site based on the DN, VO, Role and CA information contained in the proxy certificate provided by the user.
Fermilab has currently configured SAZ to operate in a default accept mode for user proxy credentials that are associated with VOs (user proxy credentials generated by voms-proxy-init).
Users who continue to use grid-proxy-init may no longer be able to execute on Fermilab Compute Elements.

SAZ Database Table Structure
    DN:   user_name, enabled, trusted, changedAt
    VO:   vo_name,   enabled, trusted, changedAt
    Role: role_name, enabled, trusted, changedAt
    CA:   ca_name,   enabled, trusted, changedAt
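The four tables share one shape: a key column plus enabled, trusted and changedAt. The sketch below renders that shape as SQLite DDL so that it can run in memory. The SQL types, the choice of primary key and the defaults are assumptions, not taken from the SAZ sources. The defaults ('Y' for enabled, 'F' for trusted) mirror the default-accept records described in the pseudo-code.

```python
import sqlite3

# Table name -> name of its key column, following the four rows above.
SAZ_TABLES = {"user": "user_name", "vo": "vo_name", "role": "role_name", "ca": "ca_name"}


def create_saz_schema(conn: sqlite3.Connection) -> None:
    """Create the four SAZ tables; column types and defaults are assumptions."""
    for table, key in SAZ_TABLES.items():
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{table}" ('
            f"{key} TEXT PRIMARY KEY, "
            "enabled TEXT NOT NULL DEFAULT 'Y', "   # 'Y' = enabled, 'F' = disabled
            "trusted TEXT NOT NULL DEFAULT 'F', "   # 'Y' only by explicit grant
            "changedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
        )


conn = sqlite3.connect(":memory:")
create_saz_schema(conn)
conn.execute('INSERT INTO "vo" (vo_name) VALUES (?)', ("fermilab",))
print(conn.execute('SELECT vo_name, enabled, trusted FROM "vo"').fetchall())
# [('fermilab', 'Y', 'F')]
```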

SAZ - Site AuthoriZation Pseudo-Code
The site authorization callout on the Globus gateway sends a SAZ authorization request (example):
    user: /DC=org/DC=doegrids/OU=People/CN=Keith Chadwick 800325
    VO:   fermilab
    Role: /fermilab/Role=NULL/Capability=NULL
    CA:   /DC=org/DC=DOEGrids/OU=Certificate Authorities/CN=DOEGrids CA 1
The SAZ server on fermigrid4 receives the SAZ authorization request, and:
    1. Verifies the certificate and trust chain.
    2. if [ the certificate does not verify or the trust chain is invalid ]; then SAZ returns "Not-Authorized"; fi
    3. Issues a select on "user:" against the SAZDB user table.
    4. if [ the select on "user:" fails ]; then a record corresponding to the "user:" is inserted into the SAZDB user table with (user.enabled=Y, user.trusted=F); fi
    5. Issues a select on "VO:" against the local SAZDB vo table.
    6. if [ the select on "VO:" fails ]; then a record corresponding to the "VO:" is inserted into the SAZDB vo table with (vo.enabled=Y, vo.trusted=F); fi
    7. Issues a select on "Role:" against the local SAZDB role table.
    8. if [ the select on "Role:" fails ]; then a record corresponding to the "VO-Role:" is inserted into the SAZDB role table with (role.enabled=Y, role.trusted=F); fi
    9. Issues a select on "CA:" against the local SAZDB ca table.
    10. if [ the select on "CA:" fails ]; then a record corresponding to the "CA:" is inserted into the SAZDB ca table with (ca.enabled=Y, ca.trusted=F); fi
    11. The SAZ server then returns the logical AND of (user.enabled, vo.enabled, vo-role.enabled, ca.enabled) to the SAZ client (which was called by either the Globus gatekeeper or glexec).
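The eleven steps translate almost directly into code. The sketch below uses in-memory dicts as a stand-in for the SAZDB tables and stubs certificate and trust-chain verification as a caller-supplied function. Both simplifications are assumptions made to keep the example self-contained. As in the pseudo-code, unknown keys are inserted in default-accept form and the answer is the AND of the four enabled flags.

```python
from typing import Callable, Dict

# Each SAZDB table maps a key (DN, VO, Role or CA) to its flags.
SazDB = Dict[str, Dict[str, Dict[str, str]]]


def saz_authorize(db: SazDB, user: str, vo: str, role: str, ca: str,
                  chain_is_valid: Callable[[], bool]) -> bool:
    # Steps 1-2: reject outright if the certificate or trust chain fails.
    if not chain_is_valid():
        return False
    enabled = []
    for table, key in (("user", user), ("vo", vo), ("role", role), ("ca", ca)):
        # Steps 3-10: look the key up; unknown keys are inserted in
        # default-accept form (enabled="Y", trusted="F").
        record = db[table].setdefault(key, {"enabled": "Y", "trusted": "F"})
        enabled.append(record["enabled"] == "Y")
    # Step 11: logical AND of the four enabled flags.
    return all(enabled)


db: SazDB = {"user": {}, "vo": {}, "role": {}, "ca": {}}
request = dict(
    user="/DC=org/DC=doegrids/OU=People/CN=Keith Chadwick 800325",
    vo="fermilab",
    role="/fermilab/Role=NULL/Capability=NULL",
    ca="/DC=org/DC=DOEGrids/OU=Certificate Authorities/CN=DOEGrids CA 1",
)
print(saz_authorize(db, **request, chain_is_valid=lambda: True))  # True
db["vo"]["fermilab"]["enabled"] = "F"  # an administrator disables the VO
print(saz_authorize(db, **request, chain_is_valid=lambda: True))  # False
```

The loop records every key before returning, so a first-time user is added to the tables even when another flag causes the request to be refused. This matches the pseudo-code, which performs all four selects before computing the result.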

SAZ - Animation (diagram elements: Job, Gatekeeper, SAZ, the DN, VO, Role and CA records, and an ADMIN)

SAZ - A Couple of Caveats
What about grid-proxy-init or voms-proxy-init without a VO?
- The "NULL" VO is specifically disabled (vo.enabled="F", vo.trusted="F").
- If a user has user.trusted="Y" in their user record, then >>> we allow them to execute jobs without VO "sponsorship" <<<.
- This granting of user.trusted="Y" is not automatic. The number of users with this privilege will be VERY limited.
What about pilot jobs / glide-in operation?
- To comply with the (draft) Fermilab policy on pilot jobs, VOs that submit pilot jobs will shortly be required to use glexec to launch the user portion of their glide-in jobs.
- SAZ authorization requests from glexec may require the VO to have role.trusted="Y" in the VO-specific role record that it uses for glide-in operations.
- The granting of role.trusted="Y" will not be automatic.
Authorization for trusted="Y" flags in the SAZ database tables is granted and revoked by the Fermilab Computer Security Executive based on explicit trust relationships.
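The caveats add two trust-based exceptions to the plain enabled-flag check. The sketch below is one possible reading of them, not SAZ's actual logic. A proxy without a VO hits the disabled "NULL" VO record and passes only for a user with user.trusted="Y". A request from glexec is assumed to need role.trusted="Y", although the slide says only that it "may" be required. The CA check is omitted for brevity, and the "NULL" key and the "/fermilab/Role=pilot" role are hypothetical.

```python
from typing import Dict, Tuple

# (table, key) -> {"enabled": "Y"/"F", "trusted": "Y"/"F"}; keys absent from
# the dict are treated as default-accept records (enabled="Y", trusted="F").
Flags = Dict[Tuple[str, str], Dict[str, str]]
DEFAULT = {"enabled": "Y", "trusted": "F"}
NULL_VO = "NULL"  # hypothetical key for a proxy that carries no VO


def caveat_decision(flags: Flags, user: str, vo: str, role: str,
                    via_glexec: bool = False) -> bool:
    u = flags.get(("user", user), DEFAULT)
    if u["enabled"] != "Y":
        return False
    v = flags.get(("vo", vo), DEFAULT)
    if v["enabled"] != "Y":
        # A disabled VO normally means rejection; for the "NULL" VO (no VO at
        # all), users with user.trusted="Y" may still run without sponsorship.
        return vo == NULL_VO and u["trusted"] == "Y"
    r = flags.get(("role", role), DEFAULT)
    if r["enabled"] != "Y":
        return False
    if via_glexec:
        # Assumed policy: requests from glexec need role.trusted="Y".
        return r["trusted"] == "Y"
    return True


flags: Flags = {
    ("vo", NULL_VO): {"enabled": "F", "trusted": "F"},
    ("user", "/CN=Trusted User"): {"enabled": "Y", "trusted": "Y"},
    ("role", "/fermilab/Role=pilot"): {"enabled": "Y", "trusted": "Y"},
}
print(caveat_decision(flags, "/CN=Trusted User", NULL_VO, ""))  # True
print(caveat_decision(flags, "/CN=Other User", NULL_VO, ""))    # False
```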

SAZ - Open Issues
Extra /CN=<random number> in DN. Examples:
    /DC=org/DC=doegrids/OU=People/CN=Leigh Grundhoefer (GridCat) 693100/CN=1173547087
    /DC=org/DC=doegrids/OU=People/CN=Leigh Grundhoefer (GridCat) 693100/CN=1642479879
    /DC=org/DC=doegrids/OU=People/CN=Leigh Grundhoefer (GridCat) 693100/CN=1769868279
- This is the result of a user issuing grid-proxy-init; it does not occur with voms-proxy-init.
- We are looking at code changes to handle the "extra CN problem".
Condor fails to properly delegate the full VOMS proxy attributes.
- This can be worked around in condor_config by setting: DELEGATE_JOB_GSI_CREDENTIALS=FALSE
- A ticket on this issue has been opened with the Condor developers.
Testing by Chris Green and John Weigand shows that Reliable File Transfer (RFT) with WS-GRAM is also failing to properly delegate the full VOMS attributes:
- RFT uses the full VOMS proxy for the first transaction, but uses a cached copy without the role information for the second transaction.
- A ticket on this issue has been opened with the Globus developers.
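One way to handle the extra CN problem is to normalize the DN before the user-table lookup by stripping trailing all-digit CN components. The sketch below is a hypothetical helper, not the code change under consideration. It covers only the numeric pattern in the examples; a genuine end-entity CN made entirely of digits would also be stripped.

```python
import re

# One or more trailing "/CN=<digits>" components, as in the examples.
_TRAILING_NUMERIC_CN = re.compile(r"(?:/CN=\d+)+$")


def strip_proxy_cns(dn: str) -> str:
    """Remove trailing all-digit CN components from a proxy subject DN."""
    return _TRAILING_NUMERIC_CN.sub("", dn)


print(strip_proxy_cns(
    "/DC=org/DC=doegrids/OU=People/CN=Leigh Grundhoefer (GridCat) 693100/CN=1173547087"))
# /DC=org/DC=doegrids/OU=People/CN=Leigh Grundhoefer (GridCat) 693100
```

A CN such as "Keith Chadwick 800325" is left alone because the pattern requires the whole component after "CN=" to be digits.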

Draft Fermilab VO Trust Relationship Policy
Fermilab will only accept jobs from Virtual Organizations (VOs) that have established trust relationships in good standing.
Trust relationships can be requested by VO management by contacting Fermilab Computer Security, and are granted and revoked by the Fermilab Computer Security Executive.
Some VOs, such as CDF, D0, MINOS and LQCD, already possess a valid trust relationship with Fermilab due to overlap of staff or the umbrella of Fermilab's own operational and management controls. Other VOs will be expected to establish the trust relationship as described below in order to continue using Fermilab resources.
Criteria for Establishing Trust Relationships:
- Policies and practices for mutual security are continually adjusted to meet changes in risk perceptions. (NIST)
- Acceptable use of Fermilab resources is governed by both the VO's and Fermilab's Acceptable Use Policies. The Open Science Grid's User AUP (V2.0, February 9, 2006) is an example of an AUP acceptable to Fermilab and applies to users operating under OSG's auspices.
- A VO must describe and operate its technical infrastructure in a transparent manner that permits verification of its functioning.
- A VO must have an operational organization with an appropriate number of staff members who respond to Fermilab requests (email and/or phone calls) within a reasonable time, generally during the normal business hours of its home site.
- A VO must have an established and published response plan to deal with security incidents and reports of unauthorized use, and the staff to implement the plan.
Non-compliance with site policies by a VO or its members may trigger early or frequent re-examination of the trust relationship with the VO.

Draft Pilot Job Policy
A Pilot Job (also called a glide-in or late-binding job) is a batch job that starts on a grid worker node but loads some other job, termed the User Job, which has been created by another user.
Rules:
- Pilot Jobs will only be acceptable from VOs whose trust relationships with Fermilab include authorization to use them.
- A Pilot Job must use the site-provided glexec facility to map the application and data files to the actual owner of the User Job. glexec will perform the necessary callout to the Grid User Management System (GUMS) and Site Authorization Service (SAZ), and the Pilot Job must respect the result of these Policy Decision Points.
- A Pilot Job and the User Job will not attempt to circumvent job accounting or limits placed on system resources by the batch system.
- A Pilot Job may launch multiple User Jobs in serial fashion, but must not attempt to maintain data files between jobs belonging to different users.
- When transferring a User Job into the worker node, the Pilot Job will use a level of security equivalent to that of the original job submission process.
Consequences:
- Fermilab reserves the right to terminate any batch jobs that appear to be operating beyond their authorization, including Pilot Jobs and User Jobs not in compliance with this policy.
- The DN of the Job Manager or the entire VO may be placed on the Site Black List until the situation is rectified.
- Fermilab expects any VO authorized to run Pilot Jobs to ensure compliance by its users.

glexec
- Joint development by David Groep / Gerben Venekamp / Oscar Koeroo (NIKHEF) and Dan Yocum / Igor Sfiligoi (Fermilab).
- Integrated (via "plugins") with the LCAS / LCMAPS infrastructure (for LCG) and the GUMS / SAZ infrastructure (for OSG).
- glexec is currently deployed on a couple of small clusters at Fermilab, moving towards a "significant" deployment at Fermilab this week.
- Will be included in Condor 6.9.x.

glexec block diagram

High Availability / Service Redundancy Plans
- Gatekeeper: redundant condor_master and condor_negotiator.
- VOMS: sticky problem. We have requested a change to VOMRS that will make things much easier.
- GUMS: we have a test active/standby GUMS service operating with Linux-HA, and believe that we know how to implement an active/active service.
- SAZ: can implement either active/standby or active/active.
- MyProxy: the need for MyProxy will be eliminated by the new CEMon-based job forwarding mechanism.

Metrics
In addition to the normal operational effort of installing, running and upgrading the various FermiGrid services over the past year, we have spent significant effort to collect and publish operational metrics. Examples:
- Globus gatekeeper calls by jobmanager per day
- Globus gatekeeper IP connections per day
- VOMS calls per day
- VOMS server IP connections per day
- GUMS calls per day
- GUMS server IP connections per day
- GUMS server unique Certificates and Mappings per day
- SAZ Authorizations and Rejections per day
- SAZ server IP connections per day
- SAZ server unique DN, VO, Role & CA per day
Metrics collection scripts run once a day and collect information for the previous day.
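A daily collection script of this kind reduces to filtering one day's log lines and counting them. The sketch below computes two of the listed metrics, SAZ authorizations/rejections and unique DNs, for the previous day. It assumes a hypothetical log line format "<YYYY-MM-DD> <HH:MM:SS> <DECISION> <DN>". The real SAZ log format is not described here, and the log lines are invented.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Set, Tuple


def daily_saz_metrics(lines: Iterable[str], day: date) -> Tuple[Counter, Set[str]]:
    """Count decisions and distinct DNs for one day.

    Assumes a hypothetical log format: "<YYYY-MM-DD> <HH:MM:SS> <DECISION> <DN>".
    """
    decisions: Counter = Counter()
    dns: Set[str] = set()
    prefix = day.isoformat()
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)  # the DN itself may contain spaces
        if len(parts) != 4 or parts[0] != prefix:
            continue
        decisions[parts[2]] += 1
        dns.add(parts[3])
    return decisions, dns


log = [
    "2006-10-22 09:15:02 AUTHORIZED /DC=org/DC=doegrids/OU=People/CN=Keith Chadwick 800325",
    "2006-10-22 09:20:41 REJECTED /DC=org/DC=doegrids/OU=People/CN=Leigh Grundhoefer (GridCat) 693100",
    "2006-10-22 10:02:13 AUTHORIZED /DC=org/DC=doegrids/OU=People/CN=Keith Chadwick 800325",
    "2006-10-23 00:00:07 AUTHORIZED /DC=org/DC=doegrids/OU=People/CN=Keith Chadwick 800325",
]
# A daily run covers the previous day.
counts, unique_dns = daily_saz_metrics(log, date(2006, 10, 23) - timedelta(days=1))
print(dict(counts), len(unique_dns))  # {'AUTHORIZED': 2, 'REJECTED': 1} 2
```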

Metrics - fermigrid1

Service Monitoring
Service monitor scripts run multiple times per day (typically once per hour). They gather detailed information about the service they are monitoring. They also verify the health of that service (together with any dependent services), notify administrators, and automatically restart the service(s) as necessary to ensure continuous operations.
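One monitoring pass follows a check, notify, restart and recheck pattern. In the sketch below, the health checks, the restart action and the notification channel are hypothetical callables that stand in for whatever the FermiGrid scripts actually do. The function would be invoked from something like an hourly cron job. One design choice is an assumption: when a dependent service is unhealthy, this version only notifies and skips the restart, because restarting the monitored service would not fix the dependency.

```python
from typing import Callable, Sequence, Tuple


def monitor_once(name: str,
                 is_healthy: Callable[[], bool],
                 restart: Callable[[], None],
                 notify: Callable[[str], None],
                 dependencies: Sequence[Tuple[str, Callable[[], bool]]] = ()) -> bool:
    """One pass of a hypothetical service monitor; returns final health."""
    # Check dependent services first: restarting this service will not help
    # if something it relies on is down.
    for dep_name, dep_ok in dependencies:
        if not dep_ok():
            notify(f"{name}: dependency {dep_name} is unhealthy")
            return False
    if is_healthy():
        return True
    notify(f"{name}: unhealthy, restarting")
    restart()
    healthy = is_healthy()
    if not healthy:
        notify(f"{name}: still unhealthy after restart")
    return healthy


state = {"running": False}
messages = []
ok = monitor_once(
    "gums",
    is_healthy=lambda: state["running"],
    restart=lambda: state.update(running=True),
    notify=messages.append,
    dependencies=[("database", lambda: True)],
)
print(ok, messages)  # True ['gums: unhealthy, restarting']
```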

Service Monitor - fermigrid1

Areas of Current Work within FermiGrid
- SAZ and glexec: nearing completion.
- BlueArc storage and public dCache storage element: ongoing.
- Further Metrics and Service Monitor development: ongoing.
- Gratia Accounting.
- Web Services.
- Xen.
- Service Failover.
- Research, Development & Deployment of future ITBs and OSG releases.

Parting Comments
- Extracting metrics and service monitor information needs to be easier: trawling through the (Globus gatekeeper, VOMS, GUMS, SAZ) log files is not an efficient method.
- Having a uniform standard time format (and some sort of unique process/thread ID) is essential.
- Problem diagnosis is also very difficult (our job forwarding gateway does compound this problem).
- David Bianco from Jefferson Lab gave a presentation on Sguil at the Fall 2006 HEPiX conference. Having a similar common interface for the Globus gatekeeper and service log files, together with the ability to correlate events from multiple sources, would significantly improve problem diagnosis.
https://indico.fnal.gov/conferenceDisplay.py?confId=384
https://indico.fnal.gov/materialDisplay.py?contribId=9&sessionId=17&materialId=slides&confId=384
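A uniform time format is what makes cross-service correlation cheap. If every service logged the same timestamp format, its per-service logs could be merged into one timeline with a streaming merge. The sketch below assumes a hypothetical uniform line format, "<ISO-8601 timestamp> <pid> <message>". The log contents are invented, and each source is assumed to be in time order already.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterator, List, Tuple

Event = Tuple[datetime, str, str, str]  # (timestamp, source, process id, message)


def _events(source: str, lines: List[str]) -> Iterator[Event]:
    # Hypothetical uniform format: "<ISO-8601 timestamp> <pid> <message>".
    for line in lines:
        ts, pid, message = line.split(" ", 2)
        yield datetime.fromisoformat(ts), source, pid, message


def correlate(sources: Dict[str, List[str]]) -> List[Event]:
    """Merge per-service logs (each already in time order) into one timeline."""
    return list(heapq.merge(*(_events(name, lines) for name, lines in sources.items())))


sources = {
    "gatekeeper": ["2006-10-23T10:00:01 4242 job request received",
                   "2006-10-23T10:00:05 4242 job forwarded"],
    "saz": ["2006-10-23T10:00:03 17 request authorized"],
}
for ts, source, pid, message in correlate(sources):
    print(ts.time(), source, pid, message)
```

`heapq.merge` reads the sorted inputs lazily, so in principle the same approach works on large log files without loading them all into memory. The final `list()` call here is only for the demonstration.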

fin Any questions?