AFS and NFS … 20 Years Later. Mike Kazar (kazar@netapp.com), June 2005
Overview: Inspired by a talk by Brian Pawlowski (beepy), presented to a bunch of folks at IBM, including lots of AFS people. What were AFS’s goals? NFS’s goals? Innovation on client and server ends. Storage management. Then and now. Some thoughts about the future. Lessons learned. Questions. 9/20/2018
Who Am I? Part of AFS design team, with Bob Sidebotham (Rx, Volumes), Mike West (Server), Sherri Nichols (VLDB &c), M. Satyanarayanan (prototype RPC), and Dave Nichols (prototype client). Coded parts of AFS: cache manager; kernel port of Rx. Now at NetApp.
AFS Goals in 1984. Distributed: connect lots of workstations together. Scalability: did I mention *lots*? Key approach was caching; the fastest RPC is the one you don’t make. Security: needed isolation from random students. Management was an afterthought but turned out to be critical: volumes; volume *moves*; mirroring.
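The caching goal above ("the fastest RPC is the one you don’t make") can be illustrated with a minimal sketch. It is not AFS code: `CachingClient`, `fetch_from_server`, and the example path are hypothetical names, and the `invalidate` hook only stands in for whatever mechanism tells a client its copy is stale.

```python
class CachingClient:
    """Toy client-side file cache (illustrative only, not AFS code)."""

    def __init__(self, fetch_from_server):
        # fetch_from_server is a hypothetical stand-in for a real RPC.
        self._fetch = fetch_from_server
        self._cache = {}
        self.rpc_count = 0

    def read(self, path):
        # Only go to the server on a cache miss.
        if path not in self._cache:
            self.rpc_count += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]

    def invalidate(self, path):
        # Assumed hook: drop the cached copy when it is known to be stale.
        self._cache.pop(path, None)


# Hypothetical server contents, modeled as a plain dict.
server_files = {"/afs/example/readme": b"hello"}
client = CachingClient(server_files.__getitem__)

for _ in range(1000):
    data = client.read("/afs/example/readme")

print(client.rpc_count)
```

A thousand reads cost one simulated RPC here; the scalability argument is that server load then tracks cache misses rather than client activity.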
Key points from Beepy’s talk. What is NFS? IETF standard; bundled with all Unix/Linux systems; available on nearly everything; heterogeneous (systems, FSes). If NFS is the answer, what was the question?
Stick a fork in it… FTAM
Clients, Servers, an’ ‘at. So, why am I smiling? Clients, Protocols, and Servers: different goals for each.
Clients. OS integration; bug-free or you get calls all the time. Reference implementation helps portability a great deal, even though porting is still lots of work. “Vnode” layers in Windows, AIX, Solaris, &c.
Protocols. Where NFS Really Shines: public protocol spec (everyone knows they can implement it); interoperability tests (Connectathon): formally, helps certify who’s in the game; informally, helps communication! Set goals for future work. Reference implementation as education tool. Where NFS sucks: cache coherence (how did they blow this in NFS version 4?), but still relatively minor: few applications use DSM; locking makes most things work.
Servers: Where AFS Shines. Data management: global name space; cell name spaces; transparent move; transparent load balancing mirrors; flexibly allocated volumes; snapshots / clones. Usable ACLs. Surprisingly, caching (using memory caches today).
Today. NFS is the protocol of choice: open licensing made it a no-brainer; simplicity also a plus for growth. Some AFS data management available today: snapshots on NetApp filers; flexible volumes; non-transparent moves. Some AFS data management available “soon”: transparent move; load balancing mirrors; multiple cells.
OnTAP NG Architecture
Architecture Detail. ONTAP / NG: 2-Stage Distributed File System. Request switched to appropriate back-end; IP-based cluster network; no client code changes. [Diagram: Client Access over Gigabit Ethernet to two Network Function nodes (TCP termination, VLDB lookup, protocol translation to SpinFS); SpinFS Protocol across a Gigabit Ethernet Switch to two Disk Function nodes (caching, locking); Fibre Channel links below each Disk Function.]
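The two-stage split above (a network function that looks up the volume's location, then a disk function that serves it) can be sketched as follows. This is a toy model under stated assumptions, not the ONTAP or SpinFS implementation: the VLDB is modeled as a plain dict from volume name to back-end name, and `NetworkFunction`, `DiskFunction`, and `move_volume` are hypothetical names.

```python
class DiskFunction:
    """Back-end stage: owns volumes and serves their data (toy model)."""

    def __init__(self, name):
        self.name = name
        self.volumes = {}  # volume name -> {path: data}

    def read(self, volume, path):
        return self.volumes[volume][path]


class NetworkFunction:
    """Front-end stage: location lookup, then forward to the owning back-end."""

    def __init__(self, vldb, backends):
        self.vldb = vldb          # assumed VLDB shape: volume -> back-end name
        self.backends = backends  # back-end name -> DiskFunction

    def read(self, volume, path):
        backend = self.backends[self.vldb[volume]]
        return backend.read(volume, path)


def move_volume(vldb, backends, volume, dest):
    """Relocate a volume and update the location map (hypothetical helper)."""
    src = backends[vldb[volume]]
    backends[dest].volumes[volume] = src.volumes.pop(volume)
    vldb[volume] = dest


d1, d2 = DiskFunction("d1"), DiskFunction("d2")
d1.volumes["home"] = {"/a": b"data"}
backends = {"d1": d1, "d2": d2}
vldb = {"home": "d1"}
front = NetworkFunction(vldb, backends)

before = front.read("home", "/a")
move_volume(vldb, backends, "home", "d2")
after = front.read("home", "/a")  # same client call, different back-end
```

Because clients only ever talk to the front end, relocating a volume changes the location map and nothing on the client, which is one way to read "no client code changes".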
History as Knobs. No knobs: initial filers (one volume, period); Suns, &c (network parameters, exports, not much else). Too many knobs: tracking thousands of volumes; figuring out restores; where do I create a new volume? Just right: self-managing based on guidance, e.g. this part of the name space is a database.
The Future: Policy-based management. Declare part of name space “database”: inherit RAID level, drive speed; referenced abstractly. Constraint engine moves data around: with limited system impact; when desired. Tied to delegation: sub-admins tied to name space parts; sub-admins constrained by resource limits.
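The idea of declaring part of the name space "database" and having settings inherited below it can be sketched as a lookup that merges policies from the root down to a path. Everything here is an assumption for illustration: the `POLICIES` table, the paths, and the RAID and drive-speed values are hypothetical, and the constraint engine that would actually move data is not modeled.

```python
from pathlib import PurePosixPath

# Hypothetical policy declarations attached to name space subtrees.
POLICIES = {
    "/": {"raid_level": "raid4", "drive_speed": "standard"},
    "/projects/db": {
        "class": "database",
        "raid_level": "raid10",
        "drive_speed": "fast",
    },
}


def effective_policy(path, policies=POLICIES):
    """Merge policies from the root down to `path`; deeper entries win."""
    node = PurePosixPath(path)
    # parents runs from nearest ancestor to root, so reverse it.
    chain = list(node.parents)[::-1] + [node]
    merged = {}
    for ancestor in chain:
        merged.update(policies.get(str(ancestor), {}))
    return merged


db_policy = effective_policy("/projects/db/orders")
home_policy = effective_policy("/home/alice")
print(db_policy)
print(home_policy)
```

Anything created under the declared subtree picks up the database settings without per-volume configuration, while paths elsewhere fall back to the root defaults.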
The Future: Ties to historical data. What volumes are heavily loaded, and when? Which volumes grow and shrink? (What variation in size; what variation in load.) What volumes were on this server? (Backup database issue.) Charge-back.
The Future: Quality of Service. Important for managing applications, but no common framework, e.g. to connect job controller and storage. So, need to start somewhere: virtual servers, perhaps per volume? Priorities vs. guaranteed bandwidth or ops.
Lessons Learned: Perfection is highly overrated. POSIX semantics never really required (huge effort in DCE/DFS). Universities are similar to enterprises: pointless politics and empire building; 24x7 operation; availability and reliability; coordination with users nearly impossible; slightly cheaper.
Lessons Learned: Technology Transfer is Hard. 20 years to get AFS volume concept out; only happened because Blake Lewis and Ed Zayas went to NetApp. More for transparent moves: Spinnaker acquisition. Beware second system syndrome and ignoring customer requirements: DCE/DFS was a nearly complete waste of time. Morris’s point about 3rd systems: independence from inventor is critical; throw out bad ideas; allows simplification.
Lessons Learned: No one pays attention to system management, but it is critical to any technology deployment: QoS is part of this; scaling is part of this (managing resource pools); centralized error reporting; dynamic reconfiguration. Don’t be greedy! Know where your real value is; know how to get help. IBM lived in fear that Sun would productize AFS! So bungled licensing.