Challenges Running an NFSv4-backed OSG Cluster, Kevin Coffman, Center for Information Technology Integration, University of Michigan


1 Challenges Running an NFSv4-backed OSG Cluster
Kevin Coffman
kwc@citi.umich.edu
Center for Information Technology Integration
University of Michigan

2 Overview
● Basic NFSv4 in production
● Open Science Grid (OSG) Overview
● OSG Installation
● OSG Configuration
● Submitting a job!
● Authentication differences (AFS vs. NFSv4)
● Authentication futures

3 Basic NFSv4 file service in production
● Basic file storage
● User name mappings
● Home directories
● Kernel builds, etc.

4 Open Science Grid Overview
● Architecture
  – Head node & worker nodes
  – Core is the NSF Middleware Initiative (including Globus, Condor, kx.509)
● Authentication
  – X.509, kx.509, proxy certs
● No cluster file system required
  – “Home”, Base, Data, Apps, Temp, Worker node temp

5 OSG Installation
● New Linux kernels, new NFSv4 code, new OSG releases, repeat!
● Base installation is done solely on the head node
● Credentials needed
  – Root access assumed for local file system access
● Machine credential mapping now necessary
  – Kerberos credentials for NFS file system access
● Name-to-UID mapping issues
  – Found the need for tools/scripts for flushing mappings
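
The slide does not say how mappings were flushed. As a hedged illustration of why flushing tools became necessary, the sketch below models a hypothetical name-to-UID cache (the class name, TTL, and resolver callback are assumptions, not the actual NFSv4 idmapping code). In this model a lookup that fails before a VO account exists is cached as a negative entry, so the account stays unmapped until the entry expires or is flushed.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class IdMapCache:
    """Hypothetical cache of NFSv4 names (e.g. "kwc@CITI.UMICH.EDU") to UIDs."""

    def __init__(self, resolve: Callable[[str], Optional[int]], ttl: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve  # authoritative lookup (passwd, LDAP, ...), assumed
        self._ttl = ttl          # seconds an entry is trusted without re-checking
        self._clock = clock
        self._entries: Dict[str, Tuple[Optional[int], float]] = {}

    def lookup(self, name: str) -> Optional[int]:
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and now - hit[1] < self._ttl:
            # May be stale: the account could have changed (or been created) since.
            return hit[0]
        uid = self._resolve(name)
        self._entries[name] = (uid, now)  # negative results (None) are cached too
        return uid

    def flush(self, name: Optional[str] = None) -> None:
        """Drop one mapping, or all of them, so the next lookup re-resolves."""
        if name is None:
            self._entries.clear()
        else:
            self._entries.pop(name, None)
```

With a dictionary as the resolver, looking up a VO account before it is created caches `None`; adding the account afterwards has no effect until `flush()` is called or the TTL passes.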

6 OSG Configuration
● Daemons (e.g., MonALISA and Condor) on the head node and worker nodes require authentication for file system access
  – Keytabs
  – More name-to-UID mapping required
● Virtual Organization (VO) accounts
  – DN to UNIX account name via grid-mapfile
  – Name-to-UID mappings required for file system access
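
A VO user therefore passes through two lookups before touching NFSv4 files: certificate DN to UNIX account (grid-mapfile), then account to UID. The sketch below assumes the common grid-mapfile line format `"<DN>" account[,account...]`; the function names, the example DN, and the `osg01` account are hypothetical, and the passwd database is stood in for by a plain mapping.

```python
import shlex
from typing import Dict, List, Mapping


def parse_grid_mapfile(text: str) -> Dict[str, List[str]]:
    """Parse lines of the form: "<certificate DN>" account[,account...]"""
    mapping: Dict[str, List[str]] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honors the double quotes around the DN
        # Accounts are comma-separated; tolerate spaces after the commas.
        accounts = [a for a in "".join(fields[1:]).split(",") if a]
        if not accounts:
            raise ValueError(f"line {lineno}: expected a quoted DN followed by an account")
        mapping[fields[0]] = accounts
    return mapping


def dn_to_uid(dn: str, gridmap: Mapping[str, List[str]],
              passwd: Mapping[str, int]) -> int:
    """DN -> UNIX account (grid-mapfile), then account -> UID (hypothetical passwd map)."""
    account = gridmap[dn][0]  # this sketch treats the first listed account as the default
    return passwd[account]
```

Either lookup can fail independently, which is why both the grid-mapfile and the name-to-UID mappings had to be maintained on every node that touches the file system.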

7 Submitting a job!
● Job submission uses X.509 authentication
  – Need Kerberos authentication for file system access
  – Gatekeeper forks a job manager process for each job
● Job manager process runs as the original user and needs the user’s credentials
● Verified that it works as expected using AUTH_SYS, without requiring Kerberos credentials

8 MGRID Architecture
[Diagram: user workstation (browser, kx509, kinit, libpkcs11); MGRID Portal (Apache with mod_ssl, mod_kx509, mod_kct, mod_jk, mod_php; Tomcat; CHEF); KDC, KCA, and KCT; grid resource (GateKeeper, resource manager). Links labeled Kerberos V5, SSL (client certificate required), GSI, Kerberos SASL, SASL, and LDAP authorization; steps numbered 1 to 8.]

9 Grid job authentication issues
● Jobs scheduled to run in the future
● Long-running jobs (refreshing credentials)
● Combination of both (future and long-running)
● Distribution of user credentials to worker nodes for file system access
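
The first three issues come down to whether renewable Kerberos credentials can stay valid from submission until the job finishes. The sketch below is a simplified model, assuming each renewal extends validity to `min(renewal time + lifetime, renew_till)`; the `Ticket` type, `renewal_schedule` function, and 300-second margin are hypothetical. It does not address the fourth issue, getting the credentials onto worker nodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ticket:
    endtime: float     # current expiry time (seconds)
    renew_till: float  # renewals cannot extend validity past this point
    lifetime: float    # validity granted by each renewal (simplified model)


def renewal_schedule(t: Ticket, now: float, needed_until: float,
                     margin: float = 300.0) -> List[float]:
    """Times at which to renew so credentials stay valid from `now` until
    `needed_until` (e.g. a future job's start time plus its run time)."""
    if t.endtime <= now:
        raise ValueError("ticket has already expired; new credentials are needed")
    if needed_until > t.renew_till:
        raise ValueError("job outlives renew_till; renewal alone cannot cover it")
    if t.lifetime <= margin:
        raise ValueError("lifetime must exceed the renewal margin")
    schedule: List[float] = []
    end = t.endtime
    while end < needed_until:
        renew_at = max(now, end - margin)  # renew shortly before expiry
        schedule.append(renew_at)
        end = min(renew_at + t.lifetime, t.renew_till)
    return schedule
```

For a job scheduled in the future, someone must perform these renewals while the job waits in the queue; a job whose end time falls after `renew_till` needs freshly obtained credentials, which renewal cannot supply.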

10 Authentication differences (AFS vs. NFSv4)
AFS | NFSv4
Kernel uses tokens | Kernel uses GSS contexts
Kernel assumes tokens were obtained prior to file access (klog) | Kernel requests a GSS context on demand at the time of the (first) file access
Single token for all file servers in a cell | Separate service ticket (really a GSS context) needed for each server
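
The NFSv4 column of the comparison can be modeled as a cache keyed by (user, server) that is filled lazily on first access. This is a toy sketch, not the Linux RPC client's data structure; the `establish` callback is a hypothetical stand-in for the upcall to GSSD.

```python
from typing import Callable, Dict, Tuple


class GssContextCache:
    """Toy model: one context per (uid, server), established on demand at the
    first file access to that server."""

    def __init__(self, establish: Callable[[int, str], str]):
        self._establish = establish  # hypothetical stand-in for the upcall to GSSD
        self._contexts: Dict[Tuple[int, str], str] = {}
        self.upcalls = 0

    def context_for(self, uid: int, server: str) -> str:
        key = (uid, server)
        if key not in self._contexts:
            self.upcalls += 1  # first access to this server by this user
            self._contexts[key] = self._establish(uid, server)
        return self._contexts[key]


if __name__ == "__main__":
    cache = GssContextCache(lambda uid, srv: f"ctx:{uid}:{srv}")
    for srv in ("screamer.citi.umich.edu", "troy.citi.umich.edu", "screamer.citi.umich.edu"):
        cache.context_for(20010, srv)
    print(cache.upcalls)
```

Reading from two servers costs one `klog` up front under AFS, but in this model costs two upcalls at first access, each of which needs a separate `nfs/<server>` service ticket; that upcall happens in whatever context the file access occurs, which is what makes grid jobs awkward.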

11 Current Architecture
[Diagram: client and server, each split into user and kernel space. Client: user process, GSSD, NFS, gss context cache. Server: SVC GSSD, NFSD, gss context cache. Also shown: credentials on disk, keytab, and the KDC (AS, TGS). Steps numbered 1 to 13.]

12 Authentication futures
● SPKM3
  – Allows us to stay in the X.509 world
  – Anonymous (DH)
● Certificate on server to prevent man-in-the-middle (MITM) attacks
  – X.509 certificates
● LIPKEY
  – Built on top of SPKM3
  – Allows TLS-like password authentication
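
Why an anonymous Diffie-Hellman exchange needs a server certificate can be shown with a toy, unauthenticated exchange. The group parameters and function names below are illustrative assumptions, far too weak for real use, and are not SPKM3's actual mechanism or parameters.

```python
import secrets
from typing import Tuple

# Toy group for illustration only (hypothetical parameters, NOT secure).
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair() -> Tuple[int, int]:
    """Return (private exponent, public value) for an anonymous DH exchange."""
    priv = secrets.randbelow(P - 3) + 2  # in [2, P - 2]
    return priv, pow(G, priv, P)


def shared_secret(priv: int, peer_public: int) -> int:
    return pow(peer_public, priv, P)


if __name__ == "__main__":
    a, A = keypair()  # client
    b, B = keypair()  # server
    assert shared_secret(a, B) == shared_secret(b, A)  # honest exchange agrees

    # Unauthenticated exchange: an attacker substitutes its own public value in
    # both directions and ends up sharing one key with each side.
    m, M = keypair()
    assert shared_secret(a, M) == shared_secret(m, A)
    assert shared_secret(b, M) == shared_secret(m, B)
```

Neither side can tell that it is talking to the attacker. If the server's public value is bound to an X.509 certificate the client verifies, the substituted `M` is detected, which is the role of the server certificate on the slide.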

13 Linux kernel keys support (a.k.a. keyring)
● General credential storage in-kernel
  – thread-specific keyring
  – process-specific keyring
  – session-specific keyring (PAG-like via JOIN_SESSION_KEYRING)
● Different key types: ‘user’, ‘rpcsec_gss context’
● Create, delete, link, search, revoke, etc.
● Quotas and permissions
● Referenced by serial # and description
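
The PAG-like behavior comes from how the three keyrings are searched and inherited. The pure-Python model below is a sketch modeled on the kernel's behavior (search thread, then process, then session keyring; children share the session keyring; joining a new session isolates a process tree). The classes and the `("user", "krb5cc")` key are hypothetical; the real interface is the `add_key`/`request_key`/`keyctl` system calls.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

KeyId = Tuple[str, str]  # (key type, description); example values are hypothetical


@dataclass
class Keyring:
    keys: Dict[KeyId, bytes] = field(default_factory=dict)


@dataclass
class Task:
    thread: Keyring = field(default_factory=Keyring)
    process: Keyring = field(default_factory=Keyring)
    session: Keyring = field(default_factory=Keyring)

    def search(self, ktype: str, desc: str) -> Optional[bytes]:
        # Thread keyring first, then process, then session.
        for ring in (self.thread, self.process, self.session):
            payload = ring.keys.get((ktype, desc))
            if payload is not None:
                return payload
        return None

    def fork(self) -> "Task":
        # The child starts with empty thread and process keyrings but shares
        # the parent's session keyring (the same object, not a copy).
        return Task(session=self.session)

    def join_new_session(self) -> None:
        # PAG-like isolation: replace the session keyring with a fresh one.
        self.session = Keyring()
```

A credential placed in the login session is visible to every job forked from it, while a process that joins a new session (for example, one job manager per grid job) gets its own credential namespace.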

14 MIT Kerberos ccache using keyring as backing storage
● Assumes a single “active” credentials cache
● Can store more than one ccache in the same session keyring
● All user-level code

Session
  |
  +---> krb5_cc_active (key: contains 0x00004f12)
  |
  +---> /tmp/krb5cc_20010_XF45C2 (keyring: id is 0x000023cd)
  |       |
  |       +---> kwc@CITI.UMICH.EDU (principal info)
  |       +---> krbtgt/CITI.UMICH.EDU@CITI.UMICH.EDU
  |       +---> nfs/screamer.citi.umich.edu@CITI.UMICH.EDU
  |       +---> nfs/troy.citi.umich.edu@CITI.UMICH.EDU
  |       +---> pop/citi.umich.edu@CITI.UMICH.EDU
  |       +---> afs@CITI.UMICH.EDU
  |
  +---> /tmp/krb5cc_20010_umich (keyring: id is 0x00004f12)
          |
          +---> kwc@UMICH.EDU (principal info)
          +---> krbtgt/UMICH.EDU@UMICH.EDU
          +---> imap/tremors.itd.umich.edu@UMICH.EDU
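
In the layout above, `krb5_cc_active` holds the serial number of the active ccache keyring (0x00004f12, the `umich` cache). The sketch models that indirection with the slide's example data; the class and method names are hypothetical and do not reflect MIT Kerberos internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CCacheKeyring:
    serial: int                   # keyring serial number
    principal: str                # the "principal info" entry
    tickets: List[str] = field(default_factory=list)


@dataclass
class SessionKeyring:
    active: int                   # payload of the krb5_cc_active key
    ccaches: Dict[str, CCacheKeyring] = field(default_factory=dict)

    def active_ccache(self) -> str:
        for name, ring in self.ccaches.items():
            if ring.serial == self.active:
                return name
        raise LookupError(f"no ccache keyring with serial {self.active:#010x}")

    def switch(self, name: str) -> None:
        # Making another ccache active only rewrites the pointer key.
        self.active = self.ccaches[name].serial


def slide_example() -> SessionKeyring:
    return SessionKeyring(
        active=0x00004F12,
        ccaches={
            "/tmp/krb5cc_20010_XF45C2": CCacheKeyring(
                0x000023CD, "kwc@CITI.UMICH.EDU",
                ["krbtgt/CITI.UMICH.EDU@CITI.UMICH.EDU",
                 "nfs/screamer.citi.umich.edu@CITI.UMICH.EDU",
                 "nfs/troy.citi.umich.edu@CITI.UMICH.EDU",
                 "pop/citi.umich.edu@CITI.UMICH.EDU",
                 "afs@CITI.UMICH.EDU"]),
            "/tmp/krb5cc_20010_umich": CCacheKeyring(
                0x00004F12, "kwc@UMICH.EDU",
                ["krbtgt/UMICH.EDU@UMICH.EDU",
                 "imap/tremors.itd.umich.edu@UMICH.EDU"]),
        },
    )
```

Switching between the CITI and UMICH identities then touches only one small key, and both caches remain in the session keyring.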

15 Mount using keyring support
● Mount program will use the keytab to set up machine credentials in the keyring
● /sbin/request-key is invoked and finds the machine credentials
● Context is negotiated and an “rpcsec_gss context” key is instantiated

16 User access using keyring support
● Assumes users have credentials in the keyring via kinit or PAM
  – No more looking around blindly for creds in the file system
  – /sbin/request-key is invoked and finds the user’s session-specific credentials
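
The mount and user-access flows share one step: the request-key handler looks only in the requester's keyring, then either negotiates a context or fails fast. The sketch below is a hypothetical handler; the function name, the `krb5_principal` entry, and the 60-second negative timeout are assumptions, and the `instantiate`/`negate` callbacks stand in for keyutils calls such as `keyctl_instantiate()` and `keyctl_negate()`.

```python
from typing import Callable, Mapping, Optional


def handle_gss_key_request(server: str,
                           session: Mapping[str, str],
                           negotiate: Callable[[str, str], bytes],
                           instantiate: Callable[[bytes], None],
                           negate: Callable[[int], None],
                           negative_timeout: int = 60) -> bool:
    """Hypothetical handler for an "rpcsec_gss context" key request.

    `session` models the requester's session keyring: for a mount it holds the
    machine credentials the mount program loaded from the keytab; for a user it
    holds credentials placed there by kinit or PAM.
    """
    principal: Optional[str] = session.get("krb5_principal")
    if principal is None:
        # No credentials in the keyring: fail fast with a negative key instead
        # of scanning the file system for a credentials cache.
        negate(negative_timeout)
        return False
    token = negotiate(principal, server)  # context establishment with the server
    instantiate(token)                    # becomes the "rpcsec_gss context" key payload
    return True
```

The negative key matters for the asynchronous case: repeated accesses by a process without credentials hit the cached failure rather than triggering a new upcall each time.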

17 Keyring issues
● Upcalls from asynchronous events
● Still need to tie “rpcsec_gss context” keys to Kerberos credentials

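18 Future Architecture
[Diagram: same client/server and user/kernel layout as the current architecture, with a request-key handler in place of the client GSSD and the client gss context cache held in the keyring. Server: SVC GSSD, NFSD, gss context cache. Also shown: TGT, keytab, and the KDC (AS, TGS). Steps numbered 1 to 11.]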

19 Questions / Discussion http://www.citi.umich.edu/projects

20 References
● Open Science Grid – http://www.opensciencegrid.org
● MonALISA – http://monalisa.cacr.caltech.edu
● Condor – http://www.cs.wisc.edu/condor
● Keyring – Linux kernel source: /Documentation/keys.txt

21 Backup Slides

22 Krb5: Obtaining gss context
● TGT: currently stored in the file system
● Per-NFSD service ticket: currently stored in the file system
● GSSD locates user credentials by convention (/tmp/krb5cc_uid)
● Synchronizing the gss_context and the credential is problematic
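
Locating credentials "by convention" means GSSD must guess which file in /tmp belongs to the user. The sketch below illustrates that search; the exact matching rules of rpc.gssd are not given on the slide, so the name patterns (`krb5cc_<uid>` and `krb5cc_<uid>_<suffix>`), the ownership check, and the newest-mtime rule are assumptions.

```python
import os
import stat
from typing import Optional


def find_ccache(uid: int, ccdir: str = "/tmp") -> Optional[str]:
    """Return the newest credentials cache for `uid` in `ccdir`, or None.

    Assumed convention: krb5cc_<uid> or krb5cc_<uid>_<suffix>
    (as in /tmp/krb5cc_20010_XF45C2).
    """
    exact, prefix = f"krb5cc_{uid}", f"krb5cc_{uid}_"
    best: Optional[str] = None
    best_mtime = float("-inf")
    with os.scandir(ccdir) as entries:
        for entry in entries:
            if entry.name != exact and not entry.name.startswith(prefix):
                continue
            try:
                st = entry.stat(follow_symlinks=False)
            except FileNotFoundError:
                continue  # removed while scanning
            # Only regular files owned by the user count, so another user
            # cannot plant a file with a matching name.
            if not stat.S_ISREG(st.st_mode) or st.st_uid != uid:
                continue
            if st.st_mtime > best_mtime:
                best, best_mtime = entry.path, st.st_mtime
    return best
```

The guess can pick the wrong cache when a user holds several (as in the CITI and UMICH caches shown earlier), and nothing ties the chosen file to the gss context later cached in the kernel, which is the synchronization problem the slide names.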

23 Linux credential interface
● New system calls for kernel credential placement
● Available for upcoming PAG implementation
● Passed via upcall to GSSD
● Credential vs. gss context management no longer a problem

24 Linux Krb5 kernel credential
● Pass TGT to kernel as credential
● Stored in user process (PAG)
● Passed to GSSD via gss_init_sec_context upcall
● GSSD manages Krb5 NFSD service tickets
● Multiple in-kernel TGTs vs. cross-realm authentication

25 Client: LIPKEY with SPKM3
● Initiator
  – Anonymous SPKM3 client
● Credential
  – LIPKEY username and password
  – sent to server encrypted in SPKM3 session key
● Context
  – per LIPKEY(?) and SPKM3 gss context

26 Linux LIPKEY kernel credential
● LIPKEY credential (username and password) is per server
● Not stored in kernel
● Instead, store information to be passed to GSSD, which will prompt the user for the LIPKEY password for each NFSD

27 Client: SPKM with X.509
● Initiator
  – password for long-term user X.509 private key
● Credential
  – short-term proxy X.509 credential and private key (grid-proxy-init)
● Context
  – per SPKM gss context

28 Linux SPKM kernel credential
● Pass proxy (short-term) X.509 credential and private key to kernel as credential
● Stored in user process (PAG)
● Passed to GSSD via gss_init_sec_context upcall
● GSSD manages CA hierarchy and credential checking

