1
Scale and Performance in a Distributed File System
John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, Michael J. West. Carnegie Mellon University. ACM Transactions on Computer Systems, Vol. 6, No. 1, February 1988. Presentation by: Amberly Rowles-Lawson
2
Introduction & Motivation
The paper discusses the Andrew File System (AFS), a scalable distributed file system.
Focus on improving scalability: supporting a large number of users (on the order of thousands) without degradation of performance.
Support simplified security.
Simplify system administration.
The system was in operation at Carnegie Mellon University in 1988. Previous papers discussed the system; this one focuses on how to improve it when scaled.
3
Outline
What is AFS?
The prototype and its testing
Improvements: scalability and operability
Comparison with another distributed file system (NFS)
Conclusions
4
Overview of AFS – What are DFS?
Distributed file systems provide access to data stored at servers using file system interfaces.
Lots of challenges: fault tolerance, recoverability, high availability, consistency, predictability, etc.
File system interfaces: open, close and check on files; read/write data to files; lock files.
Overall, they let programs manage files; a minimal sketch of this interface follows.
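To make the interface concrete, here is a minimal C sketch of the calls an application makes against a (local or distributed) file system; the filename and sizes are illustrative only, and a DFS like AFS aims to make the same calls work on remote files.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("example.txt", O_RDWR | O_CREAT, 0644);    /* open/create */
    if (fd < 0) { perror("open"); return 1; }

    struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    if (fcntl(fd, F_SETLK, &lk) < 0) perror("lock");          /* lock file  */

    write(fd, "hello\n", 6);                                  /* write data */
    lseek(fd, 0, SEEK_SET);

    char buf[16];
    ssize_t n = read(fd, buf, sizeof buf);                    /* read data  */
    if (n > 0) fwrite(buf, 1, (size_t)n, stdout);

    struct stat st;
    if (fstat(fd, &st) == 0)                                  /* check file */
        printf("size: %lld bytes\n", (long long)st.st_size);

    close(fd);                                                /* close file */
    return 0;
}
```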
5
Prototype
Built to validate the basic file system architecture and gain feedback on the design.
For each client there is a dedicated server process that handles all of that client's requests and persists until communication is terminated; Venus is the user-level process on the client workstation.
User-level locking is implemented.
Each Vice server stores a directory hierarchy mirroring the structure of the Vice files:
.admin directory – Vice file status information
.stub directory – location database
6
Prototype – Cont’
The Vice-Venus interface names files by their full pathname.
Name resolution is performed by the servers; there is no low-level name such as an inode.
Venus considers all cached files suspect: it verifies the timestamp with the server responsible for the file, so each open involves at least one interaction with a server (a polling-type approach, sketched below).
Notes: an inode stores information about a file or directory. The inode number indexes a table of inodes at a known location on the device; from the inode number, the file-system driver portion of the kernel accesses the contents of the inode, including the location of the file's data, allowing access to the file.
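A minimal sketch of the prototype's polling-style validation, assuming hypothetical RPC helpers (rpc_get_timestamp, rpc_fetch_file) rather than the real Vice-Venus interface:

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

struct cache_entry {
    char   path[256];   /* prototype names files by full pathname */
    time_t stamp;       /* version stamp of the cached copy       */
    bool   present;
};

/* hypothetical RPCs to the Vice server that is custodian of this pathname */
static time_t rpc_get_timestamp(const char *path) { (void)path; return 42; }
static void   rpc_fetch_file(const char *path, struct cache_entry *e)
{
    snprintf(e->path, sizeof e->path, "%s", path);
    e->stamp = rpc_get_timestamp(path);
    e->present = true;
}

/* Every open() costs at least one server interaction: the cached copy is
 * always treated as suspect until the server confirms its timestamp. */
static void prototype_open(const char *path, struct cache_entry *e)
{
    if (!e->present || e->stamp != rpc_get_timestamp(path))
        rpc_fetch_file(path, e);   /* stale or missing: refetch whole file */
    printf("open %s (validated with server)\n", e->path);
}

int main(void)
{
    struct cache_entry e = { .present = false };
    prototype_open("/vice/usr/alice/notes.txt", &e);  /* cold: fetch         */
    prototype_open("/vice/usr/alice/notes.txt", &e);  /* warm: still one RPC */
    return 0;
}
```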
7
Limitations of Prototype
Commands involving Vice were noticeably slower than on local files.
The stat call would sometimes be issued more than once per file.
Server-side overload due to too many processes: high virtual memory paging demands, and network resources in the kernel frequently exhausted.
Moving user directories across servers was difficult; if a disk filled, it was easier to add another disk than to move directories.
Unable to implement disk quotas on users.
Notes: stat is called to obtain information about a file before opening it; the cache validity check had to be performed even for files already in the local cache. Remote procedure calls frequently exhausted kernel network resources. One benefit of a dedicated process per user: a failure affected only that one person.
8
Prototype – Benchmark
A collection of operations that represent the actions of an average user; one running instance corresponds to a load unit (about 5 AFS users).
MakeDir – constructs the target subtree
Copy – copies every file from the source to the target subtree
ScanDir – examines the status of every file in the subtree
ReadAll – scans every byte of every file in the subtree
Make – compiles and links all the files in the subtree
Each experiment was performed 3 times; numbers in parentheses are standard deviations. A rough sketch of the ScanDir and ReadAll phases is given below.
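A rough sketch of the ScanDir and ReadAll phases, combined into one tree walk for brevity (the real benchmark runs them as separate phases); the subtree path is a placeholder.

```c
#define _XOPEN_SOURCE 500
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static long long files, bytes;

static int visit(const char *path, const struct stat *st, int type, struct FTW *ftw)
{
    (void)st; (void)ftw;
    if (type == FTW_F) {                 /* ScanDir: status of every file */
        files++;
        int fd = open(path, O_RDONLY);   /* ReadAll: scan every byte      */
        if (fd >= 0) {
            char buf[4096];
            ssize_t n;
            while ((n = read(fd, buf, sizeof buf)) > 0)
                bytes += n;
            close(fd);
        }
    }
    return 0;
}

int main(void)
{
    nftw("./target-subtree", visit, 16, FTW_PHYS);
    printf("%lld files, %lld bytes\n", files, bytes);
    return 0;
}
```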
9
Prototype – Benchmark Testing
Looked at the distribution of calls to Vice, which is skewed towards:
TestAuth – validates cache entries
GetFileStat – gets status information about files absent from the cache
The servers mostly handle cache validations and status requests, which make up around 90% of all operations; only about 6% of operations are file transfers, and the fetch-to-store ratio is roughly 2:1.
10
Prototype – Benchmark Testing cont’
Tested the prototype with different loads; for the benchmark, TestAuth rose rapidly beyond a load of 5.
11
Prototype – Benchmark Testing cont’
Server CPU/disk utilization profile: the CPU is the performance bottleneck, due to frequent context switches (from the many processes) and time spent traversing full pathnames.
Notes: even over a short period the CPU utilization is already up to 75%, caused by context switching and pathname traversal. Context switching is the process of storing and restoring the state (context) of a CPU so that execution can resume from the same point later; this enables multiple processes to share a single CPU. Overall, the prototype is not great, with many places for improvement!
12
Problems with Prototype
Many problems: too slow, not very scalable, not administration/security friendly.
Solutions: better cache management, name resolution, communication and server process structure, low-level storage representation, and the Volume!
13
Overview of AFS – General Structure
Vice: a set of servers. Venus: a user-level process on each workstation that caches and stores files from Vice and contacts Vice only when a file is opened or closed.
Notes: VICE stands for Vast Integrated Computing Environment; it presents a homogeneous, location-transparent file name space to all client workstations. The operating system on each workstation intercepts file system calls and forwards them to Venus, the user-level process on that workstation. Venus caches files from Vice and stores modified copies back on the servers they came from, and contacts Vice only when a file is opened or closed. Reads and writes of individual bytes are performed directly on the cached copy and bypass Venus. VIRTUE stands for Virtue Is Reached Through Unix and Emacs. As much work as possible is performed by Venus, which keeps two caches, one for files and the other for file status. Vice covers only functions essential to integrity, availability or security, and there is only minimal communication between servers.
14
Improving the Prototype -Cache Management
Two caches: file status (kept in virtual memory to allow rapid servicing of stat calls) and file data (kept on the local disk).
Modifications to cached files are made locally and reflected back to Vice when the file is closed; whole files are cached.
Venus intercepts only the opening and closing of files.
Venus assumes cache entries are valid unless notified: the server promises to notify it before allowing any change (a callback), and each server and each Venus maintain callback state information.
Polling is replaced by event-based notification, which greatly reduces validation traffic; a possible downside is poor performance for reads that do not access the whole file. A simplified sketch of the callback idea follows.
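A simplified sketch of callback-based validation, with toy in-memory state on both sides; the actual Vice/Venus protocol and data structures are more involved.

```c
#include <stdbool.h>
#include <stdio.h>

struct cached_file {
    const char *name;
    bool has_callback;   /* server has promised to notify before changes */
};

/* --- client (Venus) side --------------------------------------------- */
static void venus_open(struct cached_file *f)
{
    if (f->has_callback) {
        /* valid until notified: no server traffic at all on this open */
        printf("open %s from cache, no validation RPC\n", f->name);
        return;
    }
    /* no callback: fetch/validate with the server and get a new promise */
    printf("open %s: contact server, re-establish callback\n", f->name);
    f->has_callback = true;
}

/* --- server (Vice) side ---------------------------------------------- */
static void vice_break_callback(struct cached_file *f)
{
    /* another client stored a new version: revoke the promise */
    printf("callback broken for %s\n", f->name);
    f->has_callback = false;
}

int main(void)
{
    struct cached_file f = { "/vice/proj/report.tex", false };
    venus_open(&f);            /* first open: one server interaction     */
    venus_open(&f);            /* later opens: served purely from cache  */
    vice_break_callback(&f);   /* file changed elsewhere                 */
    venus_open(&f);            /* must revalidate and refetch            */
    return 0;
}
```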
15
Improving the Prototype – Name Resolution
Venus was only aware of pathnames, with no notion of inodes, causing high CPU overhead on the servers. The fix is two-level names:
Each Vice file or directory is identified by a unique Fid (96 bits):
Volume number – identifies a collection of files located on one server
Vnode number – used as an index into a file storage information array
Uniquifier – ensures uniqueness of Fids, allowing vnode numbers to be reused
Volumes are located through a replicated volume location database of manageable size.
Venus performs the logical equivalent of a namei operation and maps Vice pathnames to Fids.
Moving files from one server to another does not invalidate the contents of directories cached on workstations. Aggregating files into volumes keeps the location database at a manageable size. A sketch of the Fid layout follows.
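A sketch of the 96-bit Fid as three 32-bit fields, following the description above; the exact in-memory layout shown is illustrative.

```c
#include <inttypes.h>
#include <stdio.h>

struct fid {
    uint32_t volume;      /* identifies a volume (a collection of files on one
                             server), located via the replicated volume
                             location database                              */
    uint32_t vnode;       /* index into the volume's file storage
                             information array                              */
    uint32_t uniquifier;  /* keeps Fids unique even when vnode numbers
                             are reused                                     */
};

int main(void)
{
    struct fid f = { .volume = 7, .vnode = 1234, .uniquifier = 2 };
    printf("fid = %" PRIu32 ".%" PRIu32 ".%" PRIu32 " (%zu bits)\n",
           f.volume, f.vnode, f.uniquifier, 8 * sizeof f);
    return 0;
}
```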
16
Improving the Prototype – Communication and Server Process Structure
Using one server process per client does not scale well. Instead, a single server process uses a user-level mechanism to support multiple Lightweight Processes (LWPs), each bound to a particular client only for the duration of a single server operation. This keeps communication out of the kernel and lets a server support many more clients (sketched below).
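A sketch of the idea of a pool of lightweight workers inside one server process; POSIX threads are used here only as a stand-in for the paper's non-preemptive user-level LWP package.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_LWPS 5

static void *lwp_main(void *arg)
{
    long id = (long)arg;
    /* in the real server: loop { accept one RPC, serve that client,
     * release the binding, wait for the next request } */
    printf("LWP %ld: bound to a client for one operation\n", id);
    return NULL;
}

int main(void)
{
    pthread_t pool[NUM_LWPS];
    for (long i = 0; i < NUM_LWPS; i++)          /* fixed pool inside one process */
        pthread_create(&pool[i], NULL, lwp_main, (void *)i);
    for (int i = 0; i < NUM_LWPS; i++)
        pthread_join(pool[i], NULL);
    return 0;
}
```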
17
Improving the Prototype – Low-Level Storage Representation
Vice data is held in ordinary files, but those files are accessed by their inodes rather than by pathnames.
The vnode information for a Vice file identifies the inode of the file storing its data, so data access is rapid: the vnode number in the Fid is used as an index into a table to look up the vnode information, and an iopen call is then used to read or write the data.
Nearly all pathname lookups are eliminated, since the vnode information is found via the Fid (sketched below).
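A sketch of the Fid-indexed lookup; iopen() here is a placeholder standing in for the kernel call that opens a file by inode, and the table layout is illustrative.

```c
#include <stdint.h>
#include <stdio.h>

struct vnode_info {
    uint32_t inode;       /* inode of the local file storing the data */
    uint32_t uniquifier;
};

/* one storage-information array per volume, indexed by vnode number */
static struct vnode_info vol_table[1024];

/* placeholder for the kernel's open-by-inode call */
static int iopen(uint32_t inode, int flags)
{
    printf("iopen(inode=%u, flags=%d)\n", inode, flags);
    return 3;  /* pretend file descriptor */
}

static int open_by_fid(uint32_t vnode_num)
{
    struct vnode_info *v = &vol_table[vnode_num];  /* index, not a path walk */
    return iopen(v->inode, 0);
}

int main(void)
{
    vol_table[42].inode = 98765;
    printf("fd = %d\n", open_by_fid(42));
    return 0;
}
```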
18
Fixing Operability of AFS - Problems
Vice was constructed out of collections of files:
Only entire disk partitions could be mounted, with a risk of fragmentation if partitions were not large enough.
Movement of files across servers was difficult.
It was impossible to implement a quota system.
It was hard to replicate files consistently.
Standard backup utilities were not enough for a multi-site system: a user's files could not be backed up unless the entire disk partition was taken offline.
19
Fixing Operability of AFS – Volumes
Volume – a collection of files forming a partial subtree of the Vice name space.
A volume resides within a single disk partition on a server; there are usually many volumes per partition, typically one per user.
Volumes can easily be moved; when moved, the volume location database is updated.
A volume is moved by making a frozen copy-on-write snapshot (clone) of it and shipping that to the new site. If the volume at the original site changes during this process, the process is repeated, cloning only the files that have changed (sketched below).
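A sketch of the volume-move loop using hypothetical helper functions (clone_volume, ship_clone, incremental_clone, update_location_db); it only mirrors the steps described above, not a real implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* hypothetical operations on volumes */
static int  clone_volume(int vol)                 { printf("clone volume %d\n", vol); return vol + 1000; }
static void ship_clone(int clone, const char *to) { printf("ship clone %d to %s\n", clone, to); }
static bool changed_since_clone(int vol)          { (void)vol; static int n; return n++ < 1; }
static int  incremental_clone(int vol)            { printf("re-clone changed files of %d\n", vol); return vol + 2000; }
static void update_location_db(int vol, const char *to) { printf("volume %d now at %s\n", vol, to); }

static void move_volume(int vol, const char *new_server)
{
    int snap = clone_volume(vol);            /* frozen copy-on-write snapshot  */
    ship_clone(snap, new_server);
    while (changed_since_clone(vol)) {       /* volume changed while shipping  */
        int delta = incremental_clone(vol);  /* clone only the changed files   */
        ship_clone(delta, new_server);
    }
    update_location_db(vol, new_server);     /* then point clients at new site */
}

int main(void)
{
    move_volume(17, "server-b");
    return 0;
}
```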
20
Fixing Operability of AFS
Quotas: assigned using volumes; each user is assigned a volume with a fixed quota.
Backup: to back up a volume, create a read-only clone, which is then dumped to tape.
Volumes provide operational transparency, allow disk usage quotas, and can be easily moved between servers.
21
Improving Prototype – Overall Design
Opening a file with pathname P on a workstation (client):
If a directory D along the path is in the cache and has a callback on it, it is used as-is.
If D is in the cache but has no callback, a new copy of D is fetched and a callback is established.
If D is not in the cache, it is fetched from the server and a callback is established.
The cached copy of the file is then opened; if the file is modified, it is written back to Vice on close.
[Diagram: workstation (kernel, Venus, cache) and server (Vice) exchanging file P, with callbacks established.]
At the end of the pathname traversal, all the intermediate directories and the target file are in the cache with callbacks on them, so future references to the file require no network communication at all. LRU replacement is periodically run to reclaim cache space. This is a simplified view; authentication, protection checking, etc. would also be involved. Although the first access may be complicated and slightly slow, all future accesses will be fast, taking advantage of locality. A sketch of this open path follows.
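A sketch of the client-side open path with callbacks, using a toy per-component cache; authentication and protection checks are omitted, as noted above, and the pathnames are illustrative.

```c
#include <stdbool.h>
#include <stdio.h>

struct entry { const char *name; bool cached; bool callback; };

/* toy "cache" for the components of one pathname */
static struct entry cache[] = {
    { "/vice",                        true,  true  },
    { "/vice/usr",                    true,  false },  /* cached, callback lost */
    { "/vice/usr/amberly",            false, false },  /* not cached at all     */
    { "/vice/usr/amberly/paper.tex",  false, false },
};

static void lookup(struct entry *e)
{
    if (e->cached && e->callback) {
        printf("%-32s cache hit, no network traffic\n", e->name);
    } else {
        /* fetch a fresh copy from Vice and establish a callback */
        printf("%-32s fetch from server, callback established\n", e->name);
        e->cached = e->callback = true;
    }
}

int main(void)
{
    /* open(): traverse all intermediate directories, then the file itself */
    for (size_t i = 0; i < sizeof cache / sizeof cache[0]; i++)
        lookup(&cache[i]);

    /* reads/writes go to the cached copy; on close(), a modified file
     * is written back to Vice */
    printf("close(): write modified file back to Vice\n");

    /* a second open of the same path needs no network communication */
    for (size_t i = 0; i < sizeof cache / sizeof cache[0]; i++)
        lookup(&cache[i]);
    return 0;
}
```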
22
Improving Prototype - Testing
Improvements in scalability are immediately visible. [Tables: benchmark results before improvements vs. after improvements.]
23
Improving Prototype – Testing Cont’
24
Improving Prototype – Effect Cont’
CPU and disk utilization are also down, though the CPU is still the performance bottleneck. Results are good!
25
Comparison with a Remote-Open File System
Other DFSs exist that differ from AFS: data in a file are not fetched en masse; instead, remote sites participate in each individual read and write operation.
Compare AFS to Sun's NFS. Why Sun NFS? It is successful, tuned and refined; it can run on the same hardware as AFS; and it is an industry 'standard'. (Note: NFS was not designed for large systems.)
26
Comparison Results
Used the benchmark to compare AFS with NFS, in two variants:
Cold cache – workstation caches were cleared before each trial
Warm cache – caches were left unaltered between trials
27
Comparison Results Cont’
NFS performs better than AFS at small loads but degrades quickly. NFS's lack of a disk cache and its need to check with the server on each file open make its time considerably more load dependent. The warm cache in Andrew improves the time only for the Copy phase of the benchmark.
28
Comparison Results Cont’
Difference in utilization: CPU and disk both saturate under NFS. NFS uses more resources even at low loads; at a load of one, NFS generates nearly three times more packets than AFS.
29
Comparison Results Cont’
The advantage of a remote-open file system is low latency (time to open a file, read one byte, and close it): when AFS has no data in its cache this takes a long time, while NFS latency is independent of file size.
Overall, AFS scales much better than NFS, although NFS performs better at low loads and with a 'cold' cache. Future work involves getting AFS to run in the kernel, with potential for further improvement.
30
Future Works
Keep testing at scale: be able to handle the full planned community of thousands of users.
Move Venus into the kernel to improve performance.
Use an industry-standard intercept mechanism to increase portability of the system.
Implement decentralized administration and physical dispersal of servers.
31
Conclusion
Implementing the Volume was helpful.
The tests focused on the effect of scale, with good results, although only around 3,500 users were involved. Overall, they were able to greatly improve their initial prototype.
Still possible problems:
Security – if multiple clients access a file, there is no mechanism to guarantee protection.
A lot of extra space is needed to move volumes; what about very large files?
32
Questions?