A Low-Bandwidth Network File System A. Muthitacharoen, MIT B. Chen, MIT D. Mazieres, NYU
Key Ideas A network file system for slow or wide-area networks Exploits similarities between files or versions of the same file Avoids sending data that can be found in the server’s file system or the client’s cache Also uses conventional compression and caching Requires 90% less bandwidth than traditional network file systems
Working on slow networks Make local copies Must worry about update conflicts Use remote login Only for text-based applications Use LBFS instead Better than remote login Must deal with issues like auto-saves blocking the editor for the duration of transfer
LBFS Exploits cross-file similarities especially with previous versions of the same file Auto-save files, … LBFS file server divides the files it stores into chunks and indexes the chunks by hash value LBFS client similarly indexes a large persistent file cache LBFS never transfers chunks that the recipient already has
Previous Work (I) AFS Callbacks require server to notify clients when a cached file has been modified Leases achieve same goal but have an expiration time Coda supports slow networks and even disconnected operation Defers some updates to save bandwidth OceanStore applies Bayou’s conflict resolution mechanisms to a file system
Previous Work (II) Operation-based updates (Lee et al.) Proxy-client close to the server duplicates client computations in the hope of duplicating its output files Spring and Wetherall propose to use two large cooperating caches storing identical copies of the last n megabytes of network traffic Rsync uses directory tree mirroring at client and server.
LBFS LBFS provides close-to-open consistency Similar to AFS session consistency LBFS assumes clients will have a cache large enough to contain a user’s entire working set of files When possible, LBFS reconstitutes files using chunks of existing data in the file system and client cache instead of transmitting those chunks over the network
Indexing Issues Major challenge is keeping the index a reasonable size while dealing with shifting offsets Indexing conventional file blocks would not work Indexing and hashing overlapping file blocks at all offsets would require too much space
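The shifting-offset problem can be illustrated with a small experiment. The sketch below is not part of LBFS; it hashes aligned fixed-size blocks (the 8 KB block size and the helper name `block_hashes` are assumptions chosen for illustration) and counts how many blocks of an edited file still match the original after a one-byte insertion at the front.

```python
import hashlib
import random

BLOCK_SIZE = 8192  # illustrative fixed block size (an assumption)

def block_hashes(data, block_size=BLOCK_SIZE):
    """Return the SHA-1 hex digest of each aligned, fixed-size block."""
    return [hashlib.sha1(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

# Deterministic pseudo-random "file" of ten blocks.
rng = random.Random(0)
n = 10 * BLOCK_SIZE
original = rng.getrandbits(8 * n).to_bytes(n, "big")

# Insert a single byte at the front: every later byte shifts by one.
edited = b"X" + original

old = set(block_hashes(original))
new = block_hashes(edited)
shared = sum(1 for h in new if h in old)
print(f"{shared} of {len(new)} blocks still match after a 1-byte insertion")
```

Because every block boundary now falls one byte later in the content, no block of the edited file should match a block of the original, even though the two files differ by a single byte.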
LBFS Solution Considers only non-overlapping chunks of files Sets chunk boundaries based on file contents to avoid sensitivity to shifting file offsets Examines every overlapping 48-byte region of the file to select boundary regions, or breakpoints, using Rabin fingerprints Expected chunk size is 8 KB plus the size of the 48-byte breakpoint window
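A minimal sketch of content-defined chunking follows. It is not the LBFS implementation: a simple polynomial rolling hash stands in for true Rabin fingerprints, and the constants `BASE`, `MOD`, and `MAGIC` are hypothetical. Only the 48-byte window comes from the slide; testing the low 13 bits of the hash is one way to get a breakpoint about every 2^13 = 8192 bytes on average, consistent with the 8 KB expected chunk size.

```python
import random

WINDOW = 48            # sliding window size in bytes (from the slide)
MASK = (1 << 13) - 1   # test the low 13 bits: about one breakpoint per 8 KB
MAGIC = 0              # hypothetical value the masked hash must equal
BASE = 257             # hypothetical polynomial base
MOD = (1 << 61) - 1    # hypothetical modulus (a Mersenne prime)

def breakpoints(data):
    """Return chunk end offsets chosen from the content of each 48-byte window."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = 0
    cuts = []
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                    # shift in the newest byte
        if i + 1 >= WINDOW and (h & MASK) == MAGIC:
            cuts.append(i + 1)
    if data and (not cuts or cuts[-1] != len(data)):
        cuts.append(len(data))                      # the file tail ends a chunk
    return cuts

def chunks(data):
    """Split data into non-overlapping chunks at the breakpoints."""
    out, start = [], 0
    for end in breakpoints(data):
        out.append(data[start:end])
        start = end
    return out

# Demo: insert a few bytes in the middle of a pseudo-random file.
rng = random.Random(1)
n = 200_000
original = rng.getrandbits(8 * n).to_bytes(n, "big")
edited = original[:100_000] + b"inserted!!" + original[100_000:]

old_chunks = chunks(original)
new_chunks = chunks(edited)
old_set = set(old_chunks)
changed = sum(1 for c in new_chunks if c not in old_set)
print(f"{changed} of {len(new_chunks)} chunks differ after the insertion")
```

Since a breakpoint depends only on the 48 bytes preceding it, an insertion can disturb only the chunk it lands in and, at most, a neighbour whose boundary window overlaps the edit; chunks further along keep their content and therefore their hashes.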
Handling Insertions
More Indexing Issues Pathological cases Very small chunks Sending hashes of chunks would consume as much bandwidth as just sending the file Very large chunks Cannot be sent in a single RPC LBFS imposes minimum and maximum chunk sizes
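One way to apply such limits is to post-process the candidate breakpoints, as in the sketch below. The function name `enforce_limits` is hypothetical, and the 2 KB and 64 KB defaults are assumptions (the slide does not state the limits). Candidates that would end a chunk too early are ignored, and a break is forced whenever a chunk would exceed the maximum.

```python
MIN_CHUNK = 2 * 1024    # assumed minimum chunk size
MAX_CHUNK = 64 * 1024   # assumed maximum chunk size

def enforce_limits(candidates, length, min_size=MIN_CHUNK, max_size=MAX_CHUNK):
    """Filter sorted candidate breakpoints so that every chunk except
    possibly the last is between min_size and max_size bytes long."""
    cuts = []
    start = 0
    for c in list(candidates) + [length]:
        # Force breaks while the next candidate is more than max_size away.
        while c - start > max_size:
            start += max_size
            cuts.append(start)
        # Accept the candidate if the chunk is long enough, or if it is
        # the end of the file (the tail may be shorter than min_size).
        if c - start >= min_size or (c == length and c > start):
            cuts.append(c)
            start = c
    return cuts

# Candidate at 100 is too early, 3500 is too close to 3000,
# and the long run up to 200000 is split by forced breaks.
print(enforce_limits([100, 3000, 3500], 200_000))
```

In this usage example the call should yield breaks at 3000, then every 65536 bytes, then the end of the file.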
The Chunk Database Indexes each chunk by the first 64 bits of its SHA-1 hash To avoid synchronization problems, LBFS always recomputes the SHA-1 hash of any data chunk before using it Simplifies crash recovery Recomputed SHA-1 values are also used to detect hash collisions in the database
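The lookup discipline described above can be sketched as follows. This is an in-memory illustration only: the class `ChunkIndex`, its methods, and the `files` dictionary standing in for the file system are all hypothetical. The index is keyed by the first 64 bits (8 bytes) of the SHA-1 digest, and every lookup recomputes the full SHA-1 of the stored bytes before trusting the entry, so a stale entry or a 64-bit prefix collision is treated as a miss.

```python
import hashlib

def sha1(data):
    """Full 20-byte SHA-1 digest of data."""
    return hashlib.sha1(data).digest()

class ChunkIndex:
    def __init__(self, files):
        self.files = files   # path -> bytes; hypothetical stand-in for the file system
        self.index = {}      # first 8 bytes of SHA-1 -> (path, offset, length)

    def add(self, path, offset, length):
        """Record where a chunk lives, keyed by the 64-bit hash prefix."""
        data = self.files[path][offset:offset + length]
        self.index[sha1(data)[:8]] = (path, offset, length)

    def lookup(self, digest):
        """Return the chunk with this full SHA-1 digest, or None."""
        loc = self.index.get(digest[:8])
        if loc is None:
            return None
        path, offset, length = loc
        data = self.files.get(path, b"")[offset:offset + length]
        # Always recompute: the file may have changed since indexing,
        # or two chunks may share the same 64-bit prefix.
        if sha1(data) != digest:
            return None
        return data

files = {"a.txt": b"hello world, and some more data"}
idx = ChunkIndex(files)
idx.add("a.txt", 0, 11)
wanted = sha1(b"hello world")
print(idx.lookup(wanted))          # chunk found and verified
files["a.txt"] = b"HELLO WORLD, and some more data"
print(idx.lookup(wanted))          # stale entry is rejected
```

Because the database is never trusted without verification, it does not have to be kept perfectly in sync with the file system, which is the property the slide credits with simplifying crash recovery.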
Protocol Based on NFS version 3 Adds Extensions to exploit inter-file commonality (GETHASH) Leases Compresses all traffic using conventional gzip
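The idea behind the hash-based extension can be sketched for the read path. The code below is a simplified in-memory model, not the LBFS wire protocol: `Server.get_hashes` and `Server.read_chunk` are hypothetical Python methods loosely modeled on the GETHASH idea, and leases and compression are omitted. The client asks for the chunk hashes of a file and transfers only the chunks missing from its cache.

```python
import hashlib

def digest(chunk):
    """SHA-1 digest used to name a chunk."""
    return hashlib.sha1(chunk).digest()

class Server:
    """Hypothetical in-memory server holding each file as a list of chunks."""
    def __init__(self, file_chunks):
        self.file_chunks = file_chunks   # path -> list of chunk bytes

    def get_hashes(self, path):
        """Return (hash, size) for every chunk of the file, in order."""
        return [(digest(c), len(c)) for c in self.file_chunks[path]]

    def read_chunk(self, path, i):
        """Return the bytes of the i-th chunk of the file."""
        return self.file_chunks[path][i]

def fetch(server, path, cache):
    """Rebuild a file, transferring only chunks not already in cache.
    cache maps digest -> chunk. Returns (data, bytes_transferred)."""
    parts, transferred = [], 0
    for i, (h, size) in enumerate(server.get_hashes(path)):
        chunk = cache.get(h)
        if chunk is None:                 # not cached: must go over the network
            chunk = server.read_chunk(path, i)
            transferred += size
            cache[h] = chunk
        parts.append(chunk)
    return b"".join(parts), transferred

server = Server({
    "report.v1": [b"intro...", b"body one", b"ending.."],
    "report.v2": [b"intro...", b"body TWO!", b"ending.."],
})
cache = {}
print(fetch(server, "report.v1", cache))   # cold cache: all chunks transferred
print(fetch(server, "report.v2", cache))   # only the changed chunk is transferred
```

In this toy run the second fetch moves only the one chunk that differs between the two versions; the remaining cost is the list of hashes, which is why very small chunks would defeat the scheme.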
File Consistency (I) Whenever a client makes any RPC on an LBFS file, it gets back a read lease on the file. If a user opens a file whose lease has expired, the client asks the server for the attributes of the file This request implicitly grants the client a lease on the file. Client can check if it has the current version of the file in its cache If the file times have changed, client must obtain new contents of file from server
File Consistency (II) No need for write leases LBFS provides close-to-open consistency Server never demands back a dirty file If multiple clients are writing the same file, the last one to close the file will overwrite changes from the others File updates are atomic Limits damage caused by concurrent updates
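The open-time check described above can be sketched as client-side logic. Everything named here is hypothetical: `FakeServer` is an in-memory stand-in, `LEASE_SECONDS` is an arbitrary duration, and a single modification time stands in for the file times. The sketch also omits server-to-client invalidation during a lease.

```python
from dataclasses import dataclass

LEASE_SECONDS = 60.0   # hypothetical lease duration

@dataclass
class CacheEntry:
    data: bytes
    mtime: float
    lease_expiry: float

class FakeServer:
    """In-memory stand-in for the file server; counts full-file reads."""
    def __init__(self):
        self.files = {}    # path -> (data, mtime)
        self.reads = 0

    def write(self, path, data, mtime):
        self.files[path] = (data, mtime)

    def getattr(self, path):
        return self.files[path][1]      # modification time only

    def read(self, path):
        self.reads += 1
        return self.files[path][0]

def open_file(path, cache, server, now):
    """Return file contents, contacting the server only when needed."""
    entry = cache.get(path)
    if entry is not None and now < entry.lease_expiry:
        return entry.data               # lease still valid: use the cache
    mtime = server.getattr(path)        # attribute request also renews the lease
    expiry = now + LEASE_SECONDS
    if entry is not None and entry.mtime == mtime:
        entry.lease_expiry = expiry     # cached copy is still current
        return entry.data
    data = server.read(path)            # file changed or never cached
    cache[path] = CacheEntry(data, mtime, expiry)
    return data

server = FakeServer()
server.write("notes.txt", b"v1", mtime=1.0)
cache = {}
open_file("notes.txt", cache, server, now=0.0)     # fetches the file
open_file("notes.txt", cache, server, now=10.0)    # served under the lease
open_file("notes.txt", cache, server, now=100.0)   # lease expired, times unchanged
server.write("notes.txt", b"v2", mtime=2.0)
open_file("notes.txt", cache, server, now=200.0)   # times changed: refetch
print(server.reads)
```

Only the first and last opens in the usage example need the file contents from the server; the other two are satisfied from the cache, one under a valid lease and one after an attribute check.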
Security Issues LBFS uses SFS security infrastructure Servers have public keys Messages are encrypted Specific security issue: A user could check whether the file system contains a particular chunk of data by observing subtle timing differences in the server’s answers to CONDWRITE requests
Implementation (I)
Implementation (II) Uses NFS Two NFS-related issues When server commits a temporary file to a target file, it must copy the contents of the temporary file onto the target file to preserve the target file i-node Hard to preserve previous contents of a truncated file Message order is guaranteed by TCP
Evaluation (I) Commonality of data in /usr/local
Evaluation (II) Normalized bandwidth consumption (2 of 3 benchmarks)
Key First four bars of each workload show upstream bandwidth, the second four downstream bandwidth. CIFS is the native Windows network file system “Leases+Gzip” uses LBFS file caching, leases, and data compression but not its chunking scheme “LBFS, new DB” is LBFS starting with a new database
Evaluation (III) Normalized application times
Key Execution times were normalized Measurements made over a cable modem link with 384 Kb/s uplink and 1.5 Mb/s downlink LAN data were obtained on a 100 Mb/s full-duplex LAN.
Conclusion Under normal circumstances, LBFS consumes 90% less bandwidth than traditional file systems. Makes transparent remote file access a viable and less frustrating alternative to running interactive programs on remote machines.