nfsv4 and linux

peter honeyman
linux scalability project
center for information technology integration
university of michigan, ann arbor

open source reference implementation
• sponsored by sun microsystems
• part of citi’s linux scalability project
• ietf reference implementation
• 257 page spec
• linux and openbsd
• interoperates with solaris, java, network appliance, hummingbird, emc, ...
• september 1 code drop for linux

what’s new?
• lots of state
• compound rpc
• extensible security added to rpc layer
• delegation for files - client cache consistency
• lease-based, non-blocking, byte-range locks
• win32 share locks
• mountd: gone
• lockd, statd: gone

nfsv4 state
• state is new to nfs protocol
• nfsv3: lockd manages state
• compound rpc - server state
• dos share locks - server and client state
• delegation - server and client state
• server maintains per-thread global state
• client and server maintain file, lock, and lock owner state

server state per global thread
• compound operations often use the result of a previous operation as arguments
• nfs file handle is the coin of the realm
• current file handle: analogous to a current working directory
• some operations (rename) need two file handles - save file handle

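the two file handle slots can be sketched in a few lines of python. this is a toy model under stated assumptions, not the citi server code: `CompoundState`, the `dirs` table, and the handle strings are hypothetical, and rename is assumed to move a name from the directory in the saved slot to the directory in the current slot.

```python
class CompoundState:
    """Per-compound server state: the two file handle slots."""

    def __init__(self):
        self.current_fh = None
        self.saved_fh = None


def putfh(state, fh):
    """Make fh the current file handle."""
    state.current_fh = fh


def savefh(state):
    """Copy the current file handle into the saved slot."""
    state.saved_fh = state.current_fh


def rename(state, dirs, oldname, newname):
    """Move oldname from the saved directory to newname in the current one."""
    source = dirs[state.saved_fh]
    target = dirs[state.current_fh]
    target[newname] = source.pop(oldname)


# Hypothetical directories, keyed by file handle.
dirs = {"fh-a": {"x": "fh-x"}, "fh-b": {}}
state = CompoundState()
putfh(state, "fh-a")   # source directory becomes current
savefh(state)          # ...and is remembered in the saved slot
putfh(state, "fh-b")   # target directory becomes current
rename(state, dirs, "x", "y")
```
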
compound rpc
• hope is to reduce traffic
• complex calling interface
• partial results used
• rpc/xdr layering
• variable length: kmalloc buffer for args and recv
• want to xdr args directly into rpc buffer
• want to allow variable length receive buffer

rpc/xdr layering
• rpc layer does not interpret compound ops
• replay cache: locking vs. regular
• have to decode to decide which replay cache to use

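a minimal sketch of why decoding comes first: the cache choice depends on which operations the compound contains. the set of "locking" operations below is an assumption for illustration (the stateid mutating operations), and `pick_replay_cache` is a hypothetical helper, not a function from the implementation.

```python
# Assumption: these operations are the ones routed to the locking replay cache.
LOCKING_OPS = frozenset({"open", "close", "lock", "locku", "open_downgrade"})


def pick_replay_cache(decoded_ops):
    """Return which replay cache a compound belongs to.

    decoded_ops is the list of operation names, available only after
    the compound arguments have been decoded.
    """
    if any(op in LOCKING_OPS for op in decoded_ops):
        return "locking"
    return "regular"
```
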
example: mount compound rpc
• putrootfh
• lookup
• getattr
• getfh

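the mount compound can be modeled as a loop that threads a current file handle through the operations and stops at the first failure, so results may be partial. this is a sketch only: the `TREE` namespace, the handle strings, and `run_compound` are hypothetical, and nothing here reflects the wire encoding.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_NOENT = "NFS4ERR_NOENT"

# Hypothetical server namespace: file handle -> (attributes, children by name).
TREE = {
    "fh-root": ({"type": "dir"}, {"export": "fh-export"}),
    "fh-export": ({"type": "dir"}, {"home": "fh-home"}),
    "fh-home": ({"type": "dir"}, {}),
}


def run_compound(ops):
    """Evaluate ops in order; return (status, per-op results).

    Evaluation stops at the first failing operation, so the result
    list covers only the operations that were attempted.
    """
    current_fh = None
    results = []
    for op, *args in ops:
        if op == "putrootfh":
            current_fh = "fh-root"
            results.append((op, NFS4_OK, None))
        elif op == "lookup":
            child = TREE[current_fh][1].get(args[0])
            if child is None:
                results.append((op, NFS4ERR_NOENT, None))
                return NFS4ERR_NOENT, results
            current_fh = child
            results.append((op, NFS4_OK, None))
        elif op == "getattr":
            results.append((op, NFS4_OK, TREE[current_fh][0]))
        elif op == "getfh":
            results.append((op, NFS4_OK, current_fh))
        else:
            raise ValueError(f"unsupported op: {op}")
    return NFS4_OK, results


MOUNT = [("putrootfh",), ("lookup", "export"), ("getattr",), ("getfh",)]
status, results = run_compound(MOUNT)
```
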
nfsv4 mount
• server pseudofs joins exported subtrees with a read only virtual file system
• any client can mount into the pseudofs
• users browse the pseudofs (via lookup)

nfsv4 pseudofs
• access into exported subtrees based on user’s credentials and permissions
• client /etc/fstab doesn’t change with server’s export list
• server /etc/exports doesn’t need to maintain an ip based access list

mounting a pseudo file system
[figure: an nfsv4 client next to the server’s local fs and pseudo fs directory trees; legend: local fs directory, pseudo fs directory, exported directory; the client presents user creds]
• the server boots, parses /etc/exports, creates the pseudo fs, mirroring the local fs up to the exported directories.
• the local fs exported directories are mounted on their pseudo fs counterparts.
• the client boots and mounts a directory of the pseudo fs with the AUTH_SYS security flavor.
• user has read-only access to the pseudo fs, and traverses the pseudo fs until encountering an exported directory.
• the first nfsv4 procedure that acts on the exported directory causes nfsd to return NFS4ERR_WRONGSEC, causing the client to call SECINFO and obtain the list of security flavors on the exported directory.
• the user’s permissions in the negotiated security realm determine access to the exported directory.
• before the first open, the client calls SETCLIENTID to negotiate a per-server unique client identifier.

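the steps above can be sketched as two small functions: one derives the pseudo fs directories from the export list, the other models the security check on the first access. everything here is a simplification under stated assumptions: the export table format, the flavor labels "sys" and "krb5", and both function names are hypothetical, and the pseudo fs is assumed to accept any flavor for browsing.

```python
from pathlib import PurePosixPath


def build_pseudofs(exports):
    """Return the pseudo fs directories: every ancestor of an exported
    directory, up to and including the root, that is not itself exported."""
    pseudo = {"/"}
    for path in exports:
        for parent in PurePosixPath(path).parents:
            pseudo.add(str(parent))
    return pseudo - set(exports)


def access(path, flavor, exports):
    """Model the status of the first procedure acting on path.

    exports maps an exported directory to its list of security flavors.
    """
    # Check the longest export prefix first so nested exports win.
    for exported in sorted(exports, key=len, reverse=True):
        inside = path == exported or path.startswith(exported.rstrip("/") + "/")
        if inside:
            if flavor in exports[exported]:
                return "NFS4_OK"
            # The client would now call SECINFO to learn exports[exported].
            return "NFS4ERR_WRONGSEC"
    # Assumption: browsing the pseudo fs itself needs no negotiation.
    return "NFS4_OK"


# Hypothetical export list.
EXPORTS = {"/a/b": ["krb5"], "/d/e": ["krb5", "sys"]}
PSEUDO_DIRS = build_pseudofs(EXPORTS)
```
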
rpcsec_gss
• mit krb5 gssrpc and sesame are open source, but neither is really rpcsec_gss
• sun released their rpcsec_gss, a complete rewrite of onc
• gss & sun onc a tough match
• both are transport independent
• gss channel bindings / onc xprt
• overloading of programs’ null_proc

kernel rpcsec_gss
• rpc layering had to be violated
• gss implementations are not kernel safe
• security service code not kernel safe (kerberos 5)
• kernel security services implemented as rpc upcalls to a user-level daemon, gssd
• but only some services - e.g. encryption - need to be in the kernel

rpcsec_gss: where are we now?
• (mostly) complete user-level kerberos 5 implementation
• linux kernel implementation with kerberos 5
• mutual authentication
• session key setup
• no encryption
• gssd

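kerberos 5 security initialization
[figure: numbered message flow among the nfs client, nfsd, a gssd at user level on each host, the kernel on each host, and the kerberos 5 kdc. legend, by step number: 1,4,6,7 gssd rpc interface; 5,8 nfsv4 overloaded null procedure; …,3 kerberos 5 tcp/ip; …,10 nfsv4 compound procedure]
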
locking
• lease based locks
• no byte range callback mechanism
• server defines a lease for per client lock state
• server can reclaim client state if lease not renewed
• open sets lock state, including lock owner (clientid, pid)
• server returns lock stateid

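lease handling can be sketched as a table of expiry times. this is a minimal model, assuming a single lease period for all clients and caller-supplied timestamps; `LeaseTable` and its methods are hypothetical names.

```python
class LeaseTable:
    """Tracks, per client, when its lease on lock state lapses."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.expiry = {}  # clientid -> time at which the lease lapses

    def renew(self, clientid, now):
        """Record a renewal; the lease runs lease_seconds from now."""
        self.expiry[clientid] = now + self.lease_seconds

    def reclaim_expired(self, now):
        """Drop clients whose lease has lapsed and return their ids.

        A real server would also free the lock state of each client
        returned here.
        """
        expired = [c for c, t in self.expiry.items() if t <= now]
        for clientid in expired:
            del self.expiry[clientid]
        return expired
```
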
locking
• stateid mutating operations are ordered (open, close, lock, locku, open_downgrade)
• lock owner can request a byte range lock and then:
  • upgrade the initial lock
  • unlock a sub-range of the initial lock
• server is not required to support sub-range lock semantics

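one way to picture the ordering is a per lock owner sequence number checked on every stateid mutating operation. the sketch below assumes the usual rule: the next number is executed, the last number is treated as a replay and answered from the cached reply, and anything else is rejected. `LockOwner` and `apply_op` are hypothetical names.

```python
class LockOwner:
    """Server-side record for one lock owner."""

    def __init__(self):
        self.seqid = 0          # last sequence number accepted
        self.last_reply = None  # cached reply for replay detection


def apply_op(owner, seqid, op):
    """Apply a stateid mutating operation in sequence order."""
    if seqid == owner.seqid + 1:
        # In-order request: execute it and remember the reply.
        owner.seqid = seqid
        owner.last_reply = f"done:{op}"
        return owner.last_reply
    if seqid == owner.seqid and owner.last_reply is not None:
        # Retransmission of the last request: return the cached reply.
        return owner.last_reply
    return "NFS4ERR_BAD_SEQID"
```
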
server lock state
• need to associate file, lock, lock owner, & lease
• per lock owner: lock sequence number
• per file state: in hash table
• may move file state into struct file private area

server lock state
• lock owners in hash table
• server doesn’t own the inode
• lock state in linked list off file state
• stateid: handle to server lock state
• per client state: in hash table - lock lease

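the associations above can be sketched with dictionaries standing in for the hash tables and a python list standing in for the linked list off the file state. this is an illustration, not the server's data layout: `ServerLockState` is a hypothetical name, stateids are plain counters, and every lock is treated as exclusive.

```python
import itertools


class ServerLockState:
    """Toy server lock tables keyed by file handle and by stateid."""

    def __init__(self):
        self.files = {}     # file handle -> list of lock records
        self.stateids = {}  # stateid -> lock record
        self._ids = itertools.count(1)

    def lock(self, fh, owner, start, length):
        """Grant a byte range lock and return its stateid, or None on conflict."""
        end = start + length
        for rec in self.files.get(fh, []):
            overlaps = start < rec["end"] and rec["start"] < end
            if overlaps and rec["owner"] != owner:
                return None
        stateid = next(self._ids)
        rec = {"owner": owner, "start": start, "end": end, "stateid": stateid}
        self.files.setdefault(fh, []).append(rec)
        self.stateids[stateid] = rec
        return stateid

    def unlock(self, fh, stateid):
        """Release the lock named by stateid."""
        rec = self.stateids.pop(stateid)
        self.files[fh].remove(rec)
```
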
client lock state
• lock owners: in hash table
• per lock owner lock sequence number
• use struct file private data area
• client owns the inode, use private inode data area

client lock state
• use inode file_lock struct private data area for byte range lock state
• (eventually) store same locking state as the server for delegated files
• use the super block private data area to store per server state (returned clientid)

delegation
• intent is to reduce traffic
• server decides to hand out delegation at open
• if client accepts, client provides callback
• many read delegations, or one write delegation

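the "many read delegations, or one write delegation" rule can be sketched as a decision made at open. this is a simplified model: `DelegationTable` and `offer` are hypothetical names, and where a real server might recall a conflicting delegation through the callback, this sketch simply declines to hand one out.

```python
class DelegationTable:
    """Tracks which clients hold read or write delegations per file."""

    def __init__(self):
        self.readers = {}  # file handle -> set of clientids
        self.writer = {}   # file handle -> clientid

    def offer(self, fh, clientid, want_write):
        """Return "read", "write", or None (no delegation) for this open."""
        readers = self.readers.setdefault(fh, set())
        writer = self.writer.get(fh)
        if writer == clientid:
            return "write"  # the client already holds the write delegation
        if writer is not None:
            return None     # another client holds the write delegation
        if want_write:
            if readers - {clientid}:
                return None  # other clients hold read delegations
            readers.discard(clientid)
            self.writer[fh] = clientid
            return "write"
        readers.add(clientid)
        return "read"
```
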
delegation
• when a client holds a delegation for a cached file, it handles:
  • all locking, share and byte range
  • future opens
• client can’t reclaim a delegation without a new open
• no delegation for directories

server delegation state
• associates delegation with a file
• delegation state in linked list off file state
• stateid: separate from the lock stateid
• client call back path

linux vfs changes
• shared problem: open with o_excl described by peter braam
• nfsv4 implements win32 share locks, which require atomic open with create
• linux 2.2.x and linux 2.4 vfs is problematic

linux vfs changes
• to create and open a file, three inode operations are called in sequence
  • lookup resolves the last name component
  • create is called to create an inode
  • open is called to open the file

xopen
• inherent race condition means no atomicity
• we partially solved this problem
• we added a new inode operation which performs the open system call in one step
  • int xopen(struct file *filep, struct inode *dir_i, struct dentry *dentry, int mode)
• if the xopen() inode operation is null, the current two step code is used
• nfsv4 open subsumes lookup, create, open, access

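the fallback rule can be modeled at user level: use the single-step operation when a file system provides it, otherwise fall back to the separate operations. this is a python sketch of the dispatch only, not kernel code; the operation table is a plain dictionary, and `open_create` and the toy operations are hypothetical.

```python
def open_create(inode_ops, directory, name):
    """Open name in directory, creating it if needed."""
    xopen = inode_ops.get("xopen")
    if xopen is not None:
        # Single step: the file system resolves, creates, and opens at once.
        return xopen(directory, name)
    # Fallback: separate calls, with a window between lookup and create
    # in which another opener could create the same name.
    inode = inode_ops["lookup"](directory, name)
    if inode is None:
        inode = inode_ops["create"](directory, name)
    return inode_ops["open"](inode)


# Toy operations over a directory modeled as a dict of name -> inode.
def lookup(directory, name):
    return directory.get(name)


def create(directory, name):
    directory[name] = f"inode:{name}"
    return directory[name]


def open_inode(inode):
    return ("open", inode)


def xopen(directory, name):
    inode = directory.setdefault(name, f"inode:{name}")
    return ("open", inode)


LEGACY_OPS = {"lookup": lookup, "create": create, "open": open_inode}
ATOMIC_OPS = dict(LEGACY_OPS, xopen=xopen)
```
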
user name space
• local file system uses uid/gid
• protocol
• different security can produce different name spaces

user name space
• unix user name
• kerberos 5 realm
• pki realm - x500 or dn naming
• gssd to local file system representation

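a minimal sketch of the last bullet, assuming kerberos-style names of the form user@REALM and a static table; pki naming is not modeled. the function names, the table, the realm EXAMPLE.ORG, and the fallback uid 65534 are all hypothetical and say nothing about how gssd performs the translation.

```python
def split_principal(principal):
    """Split "user@REALM" into (user, realm)."""
    user, sep, realm = principal.rpartition("@")
    if not sep or not user or not realm:
        raise ValueError(f"not a user@realm name: {principal!r}")
    return user, realm


def to_local_uid(principal, table, nobody=65534):
    """Map a principal to a local uid; unknown names map to nobody."""
    return table.get(split_principal(principal), nobody)


# Hypothetical mapping table: (user, realm) -> local uid.
UID_TABLE = {("alice", "EXAMPLE.ORG"): 1001}
```
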
open issues
• local file system choices
  • currently ext2
  • acl implementation will determine fs for linux 2.4
• kernel additions and changes
  • rpc rewrite
  • crypto in the kernel
  • atomic open

next steps
• march 31 - full linux 2.4 implementation, without acl’s
• june 30 - acl’s added
• network appliance sponsored nfsv3/v4 linux performance project

any questions?