Scale and Performance in a Distributed File System

Presentation transcript:

Scale and Performance in a Distributed File System. John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, Michael J. West, Carnegie Mellon University. ACM Transactions on Computer Systems, Vol. 6, No. 1, February 1988. Presentation by: Amberly Rowles-Lawson

Introduction & Motivation The paper discusses the Andrew File System (AFS), a scalable distributed file system. The focus is on improving scalability: supporting a large number of users (5,000-10,000) without degrading performance, supporting simplified security, and simplifying system administration. The system was in production at Carnegie Mellon University in 1988. Previous papers described the system; this one focuses on how to improve it when scaled.

Outline What is AFS? The prototype and its testing. Improvements: scalability and operability. Testing of the improved system and comparison with another distributed file system (NFS). Conclusions.

Overview of AFS – What are DFSs? Distributed file systems provide access to data stored at servers through file system interfaces. They face many challenges: fault tolerance, recoverability, high availability, consistency, predictability, etc. The file system interface lets clients open, close, and check the status of files, read and write file data, and lock files; overall, it is how files are managed (see the sketch below).
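To make that interface concrete, here is a minimal C sketch of the operations just listed, using ordinary POSIX calls (open, write, read, flock, close). It is plain local file system code, not anything AFS-specific.

    /* Minimal illustration of the file system interface a DFS must present:
     * open, read/write, lock, close. Plain POSIX, shown only to make the
     * interface concrete; not AFS-specific code. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/file.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("example.txt", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        flock(fd, LOCK_EX);                      /* lock the file */
        const char *msg = "hello, dfs\n";
        write(fd, msg, strlen(msg));             /* write data */

        lseek(fd, 0, SEEK_SET);
        char buf[64];
        ssize_t n = read(fd, buf, sizeof buf - 1);  /* read it back */
        if (n > 0) { buf[n] = '\0'; fputs(buf, stdout); }

        flock(fd, LOCK_UN);                      /* unlock */
        close(fd);
        return 0;
    }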

Prototype Built to validate the basic file system architecture and to gain feedback on the design. On the servers, a dedicated process handled all requests from each client's Venus and persisted until communication was terminated. User-level locking was implemented. Each Vice server stored the directory hierarchy, mirroring the structure of the Vice files: .admin directories held Vice file status information, and stub directories encoded the location database (which parts of the name space lived on which servers). Venus ran on each client workstation.

Prototype – Cont' The Vice-Venus interface named files by their full pathname; name resolution was performed by the servers, with no low-level name such as an inode. Venus considered all cached files suspect: it verified a file's timestamp with the server responsible for that file, so each open involved at least one interaction with a server (a polling approach; see the sketch below). Background: an inode stores information about a file or directory; the inode number indexes a table of inodes at a known location on the device, from which the file system driver in the kernel can access the contents of the inode, including the location of the file's data, allowing access to the file.
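A minimal sketch of that polling behaviour, assuming hypothetical rpc_get_server_mtime and rpc_fetch_whole_file stand-ins (this is not AFS code): every open checks the cached copy's timestamp with the server before using it.

    /* Sketch of the prototype's polling approach to cache validation: every
     * open checks the cached copy's timestamp against the server. The "RPC"
     * calls below are hypothetical stand-ins. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    struct cache_entry {
        const char *path;      /* full Vice pathname (the prototype's only name) */
        time_t cached_mtime;   /* timestamp of the cached copy */
        bool   has_data;
    };

    /* Stand-in for an RPC asking the custodian server for the current mtime. */
    static time_t rpc_get_server_mtime(const char *path) { (void)path; return 1000; }

    /* Stand-in for fetching the whole file and recording its timestamp. */
    static void rpc_fetch_whole_file(struct cache_entry *e) {
        e->cached_mtime = rpc_get_server_mtime(e->path);
        e->has_data = true;
        printf("fetched %s from server\n", e->path);
    }

    /* Every cached file is suspect: at least one server interaction per open. */
    static void prototype_open(struct cache_entry *e) {
        time_t server_mtime = rpc_get_server_mtime(e->path);    /* poll */
        if (!e->has_data || server_mtime != e->cached_mtime)
            rpc_fetch_whole_file(e);                             /* refresh */
        printf("open %s from local cache\n", e->path);
    }

    int main(void) {
        struct cache_entry e = { "/cmu/usr/alice/notes.txt", 0, false };
        prototype_open(&e);   /* first open: validate, then fetch */
        prototype_open(&e);   /* second open: still polls the server */
        return 0;
    }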

Limitations of Prototype Commands involving Vice were noticeably slower than on local files. The stat call would sometimes be issued more than once per file: a stat to obtain information about a file before opening it triggered a cache validity check even if the file was already in the local cache. Servers were overloaded by too many processes, with high virtual memory paging demands, and remote procedure calls frequently exhausted network resources in the kernel. Moving user directories across servers was difficult; if a disk filled up, it was easier to add another disk than to move data. Disk quotas on users could not be implemented. One benefit of a dedicated process per client: a failure affected only that one person.

Prototype – Benchmark A collection of operations that represent the actions of an average user; one load unit corresponds to about five AFS users. MakeDir constructs the target subtree. Copy copies every file from the source to the target subtree. ScanDir examines the status of every file in the subtree. ReadAll scans every byte of every file in the subtree. Make compiles and links all the files in the subtree. Each experiment was performed three times; numbers in parentheses are standard deviations.
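The following C sketch mimics the ScanDir and ReadAll phases described above (stat every file, then read every byte), restricted to a single directory for brevity. It is an illustration of the workload, not the actual Andrew benchmark code.

    /* Illustrative sketch of the ScanDir and ReadAll benchmark phases. */
    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        const char *dir = (argc > 1) ? argv[1] : ".";
        DIR *d = opendir(dir);
        if (!d) { perror("opendir"); return 1; }

        struct dirent *ent;
        char path[4096], buf[8192];
        long files = 0, bytes = 0;

        while ((ent = readdir(d)) != NULL) {
            snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);

            struct stat st;                       /* ScanDir: status of every file */
            if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
                continue;
            files++;

            int fd = open(path, O_RDONLY);        /* ReadAll: scan every byte */
            if (fd < 0) continue;
            ssize_t n;
            while ((n = read(fd, buf, sizeof buf)) > 0)
                bytes += n;
            close(fd);
        }
        closedir(d);
        printf("%ld files, %ld bytes scanned\n", files, bytes);
        return 0;
    }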

Prototype – Benchmark Testing Looked at the distribution of calls to Vice, which is skewed towards TestAuth (validate cache entries) and GetFileStat (get status information about files absent from the cache). Servers mostly handle cache validations and status requests, which make up around 90% of all operations; only about 6% of operations are file transfers, and the fetch-to-store ratio is 2:1.

Prototype – Benchmark Testing cont' Tested the prototype with different loads; both the benchmark time and the TestAuth time rose rapidly beyond a load of 5.

Prototype – Benchmark Testing cont' Server CPU/disk utilization profile: the CPU is the performance bottleneck, due to frequent context switches (from the many server processes) and time spent traversing full pathnames. Even at modest loads, CPU utilization is already up to about 75%, caused by context switching and pathname traversal. Context switching is the process of storing and restoring the state (context) of a CPU so that execution can resume from the same point later; it is what lets multiple processes share a single CPU. Overall, the prototype is not great, with many places for improvement.

Problems with Prototype Many problems: too slow, not very scalable, not administration- or security-friendly. Solutions: better cache management, name resolution, communication and server process structure, low-level storage representation, and the Volume!

Overview of AFS – General Structure Vice: the set of servers. Venus: a user-level process on each workstation that caches files from Vice, stores modified copies back on the servers they came from, and contacts Vice only when a file is opened or closed. VICE stands for Vast Integrated Computing Environment: a homogeneous, location-transparent file name space presented to all client workstations. The operating system on each workstation intercepts file system calls and forwards them to Venus. Reads and writes of individual bytes of a file are performed directly on the cached copy and bypass Venus. VIRTUE: Virtue Is Reached Through Unix and Emacs. As much work as possible is performed by Venus, which keeps two caches, one for file data and the other for file status. Vice covers only functions essential to integrity, availability, or security, and there is only minimal communication between servers. Image: https://wiki.engr.illinois.edu/.../

Improving the Prototype – Cache Management Two caches: the status of files (kept in virtual memory to allow rapid servicing of stat calls) and file data (kept on the local disk). Modifications to cached files are made locally and reflected back to Vice when the file is closed. Whole files are cached, so Venus intercepts only the opening and closing of files. Venus assumes cache entries are valid unless notified otherwise: the server promises to notify it before allowing any changes (a callback), and each server and each Venus maintain callback state information. Moving from polling to event-based notification greatly reduces validation traffic, at the cost of possible poor performance for reads that do not access the whole file.
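A minimal sketch of the callback idea, with illustrative names (venus_open, break_callback) rather than real AFS code: a cached entry is trusted for as long as a callback is held, and the server is contacted only after the callback has been broken.

    /* Callback-based cache validation, contrasted with the prototype's polling. */
    #include <stdbool.h>
    #include <stdio.h>

    struct cache_entry {
        const char *name;
        bool cached;        /* whole file present in the local disk cache */
        bool has_callback;  /* server has promised to notify before any change */
    };

    /* Stand-in for fetching the whole file and establishing a callback. */
    static void fetch_and_set_callback(struct cache_entry *e) {
        e->cached = true;
        e->has_callback = true;
        printf("fetch %s from server, callback established\n", e->name);
    }

    /* Called when the server notifies us that another client changed the file. */
    static void break_callback(struct cache_entry *e) {
        e->has_callback = false;
        printf("callback on %s broken by server\n", e->name);
    }

    static void venus_open(struct cache_entry *e) {
        if (!e->cached || !e->has_callback)
            fetch_and_set_callback(e);           /* only now do we talk to Vice */
        printf("open %s from local cache (no validation traffic)\n", e->name);
    }

    int main(void) {
        struct cache_entry e = { "notes.txt", false, false };
        venus_open(&e);      /* miss: fetch and establish callback */
        venus_open(&e);      /* hit: no server interaction at all */
        break_callback(&e);  /* another workstation stored the file */
        venus_open(&e);      /* callback gone: re-fetch and re-establish */
        return 0;
    }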

Improving the Prototype – Name Resolution In the prototype, Venus was only aware of pathnames, with no notion of an inode, which caused high CPU overhead. The fix is two-level naming: each Vice file or directory is identified by a unique fid of 96 bits, consisting of a volume number (identifying a collection of files located on one server), a vnode number (used as an index into a per-volume file storage information array), and a uniquifier (which ensures the uniqueness of fids and allows vnode numbers to be reused). Volumes are located through a replicated volume location database of manageable size. Venus performs the logical equivalent of a namei operation to map Vice pathnames to fids. Moving files from one server to another does not invalidate the contents of directories cached on workstations, and the aggregation of files into volumes keeps the location database at a manageable size.
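A sketch of the fid as a C struct, assuming three 32-bit fields (an assumption consistent with the 96-bit total above); locate_volume stands in for a lookup in the replicated volume location database.

    /* The fid: volume number, vnode number, uniquifier. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    struct fid {
        uint32_t volume;      /* identifies a collection of files on one server */
        uint32_t vnode;       /* index into the volume's file storage information array */
        uint32_t uniquifier;  /* lets vnode numbers be reused without ambiguity */
    };

    /* Stand-in for the replicated volume location database: volume -> server. */
    static const char *locate_volume(uint32_t volume) {
        return (volume % 2 == 0) ? "vice-server-a" : "vice-server-b";
    }

    int main(void) {
        struct fid f = { 42, 7, 1 };
        /* Venus maps a pathname to a fid once; servers never see pathnames. */
        printf("fid <%" PRIu32 ".%" PRIu32 ".%" PRIu32 "> lives on %s\n",
               f.volume, f.vnode, f.uniquifier, locate_volume(f.volume));
        return 0;
    }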

Improving the Prototype – Communication and Server Process Structure Using a server process per client does not scale well. Instead, a user-level mechanism supports multiple Lightweight Processes (LWPs) within a single server process; an LWP is bound to a particular client only for the duration of a single server operation. Keeping communication out of the kernel lets each server support many more clients.
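A sketch of the worker-pool idea using POSIX threads as a stand-in for the user-level LWP package: a fixed set of workers, each bound to a request only for the duration of one operation. The queue and request encoding are illustrative.

    /* Fixed pool of workers serving queued client operations. */
    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 3
    #define NREQS    6

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
    static int queue[NREQS];
    static int head = 0, tail = 0;   /* simple fixed-size request queue */

    static void *worker(void *arg) {
        long id = (long)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (head == tail)
                pthread_cond_wait(&nonempty, &lock);
            int req = queue[head++];
            pthread_mutex_unlock(&lock);
            if (req < 0) break;                  /* shutdown marker */
            printf("worker %ld serves request %d\n", id, req);
        }
        return NULL;
    }

    int main(void) {
        pthread_t tids[NWORKERS];
        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&tids[i], NULL, worker, (void *)i);

        /* Enqueue a few client operations, then one shutdown marker per worker. */
        pthread_mutex_lock(&lock);
        for (int r = 0; r < NREQS - NWORKERS; r++) queue[tail++] = r;
        for (int w = 0; w < NWORKERS; w++)         queue[tail++] = -1;
        pthread_cond_broadcast(&nonempty);
        pthread_mutex_unlock(&lock);

        for (int i = 0; i < NWORKERS; i++) pthread_join(tids[i], NULL);
        return 0;
    }

The point of the design is that a worker is bound to a client only for one operation, so a small, fixed pool can serve many clients, unlike the prototype's one process per client.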

Improving the Prototype – Low-Level Storage Representation Files hold Vice data and are accessed by their inodes rather than by pathnames. The vnode information for a Vice file identifies the inode of the file storing its data, so data access is rapid: the vnode number from a fid is used as an index into a table to look up the vnode information, and an iopen call is used to read or write the data. Nearly all pathname lookups are eliminated.
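A sketch, with an invented vnode table and an open_by_inode stand-in for the iopen-style call, of how a server can go from the fid's vnode number straight to the inode holding the data, with no pathname traversal.

    /* fid components -> vnode table entry -> inode -> data. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    struct vnode_info {
        uint32_t uniquifier;   /* must match the fid's uniquifier */
        uint64_t inode;        /* inode of the local file holding the data */
    };

    /* One table per volume; the index is the vnode number from the fid. */
    static struct vnode_info vnode_table[] = {
        { 1, 20481 },   /* vnode 0 */
        { 1, 20482 },   /* vnode 1 */
        { 3, 20499 },   /* vnode 2: slot reused, newer uniquifier */
    };

    /* Stand-in for an iopen-style call that opens a file by inode number. */
    static int open_by_inode(uint64_t inode) {
        printf("open local file at inode %" PRIu64 " directly\n", inode);
        return 0;   /* would return a file descriptor */
    }

    static int server_open(uint32_t vnode, uint32_t uniquifier) {
        if (vnode >= sizeof vnode_table / sizeof vnode_table[0])
            return -1;                              /* bad vnode number */
        const struct vnode_info *v = &vnode_table[vnode];
        if (v->uniquifier != uniquifier)
            return -1;                              /* stale fid */
        return open_by_inode(v->inode);             /* no pathname traversal */
    }

    int main(void) {
        server_open(1, 1);   /* valid fid components: direct inode access */
        return 0;
    }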

Fixing Operability of AFS – Problems Vice was constructed out of collections of files on disk partitions: only entire disk partitions could be mounted, with a risk of fragmentation if partitions were not large enough; movement of files across servers was difficult; a quota system was impossible to implement; and files were hard to replicate consistently. Standard backup utilities were not enough for a multi-site system: a user's files could not be backed up unless the entire disk partition was taken offline.

Fixing Operability of AFS – Volumes A volume is a collection of files forming a partial subtree of the Vice name space. A volume resides within a single disk partition on a server; there are usually many volumes per partition, typically one per user. Volumes can easily be moved; when a volume is moved, the volume location database is updated. A move works by making a frozen copy-on-write snapshot of the volume and shipping it to the new site; if the volume at the original site changes during this process, the process is repeated, cloning only the files that have changed.
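A sketch of that two-phase move in illustrative C, with every operation reduced to a stand-in function: ship a frozen clone first, then ship a second clone containing only the files changed in the meantime, and finally update the location database.

    /* Two-phase volume move; all functions are stand-ins, not real volume code. */
    #include <stdio.h>

    static int clone_volume(const char *vol)  { printf("clone %s (copy-on-write)\n", vol); return 0; }
    static int ship_clone(const char *vol, const char *dst) { printf("ship clone of %s to %s\n", vol, dst); return 0; }
    static int changed_since_clone(const char *vol) { (void)vol; return 1; /* pretend updates arrived */ }
    static void update_location_db(const char *vol, const char *dst) { printf("volume location db: %s -> %s\n", vol, dst); }

    int main(void) {
        const char *vol = "user.alice", *dst = "vice-server-b";

        clone_volume(vol);                /* phase 1: frozen snapshot */
        ship_clone(vol, dst);             /* bulk of the data moves while users keep working */

        if (changed_since_clone(vol)) {   /* phase 2: catch up on recent changes */
            clone_volume(vol);
            ship_clone(vol, dst);         /* only the changed files need to go */
        }
        update_location_db(vol, dst);     /* clients find the volume at its new site */
        return 0;
    }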

Fixing Operability of AFS Quotas: assigned using volumes; each user is assigned a volume with a fixed quota. Backup: to back up a volume, create a read-only clone, which is then dumped to tape. Volumes provide operational transparency, allow disk usage quotas, and can easily be moved between servers.

Improving Prototype – Overall Design Opening a file with pathname P on a workstation (client): for each directory D along the path, if D is in the cache and has a callback on it, it is used as is; if D is in the cache but has no callback on it, a new copy of D is fetched and a callback is established; if D is not in the cache, it is fetched from the server and a callback is established. Venus then opens the cached copy of the file; if the file is modified, it is written back to Vice on close. At the end of the pathname traversal, all the intermediate directories and the target file are in the cache with callbacks on them, so future references to those files require no network communication at all. LRU replacement runs periodically to reclaim cache space. This is a simplified view that ignores authentication, protection checking, and so on; although the first access may be complicated and slightly slow, all future accesses will be fast, taking advantage of locality. (A sketch of this open path follows.)
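A sketch of this open path, with a toy in-memory cache and hypothetical lookup/fetch helpers: each path component is used from the cache if it holds a callback, and otherwise (re)fetched with a callback established.

    /* Walking a pathname under whole-file caching with callbacks. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    struct entry { const char *name; bool cached; bool callback; };

    /* A toy cache keyed by component name. */
    static struct entry cache[] = {
        { "usr",       true,  true  },   /* cached, callback held   */
        { "alice",     true,  false },   /* cached, callback broken */
        { "paper.tex", false, false },   /* not cached at all       */
    };

    static struct entry *lookup(const char *name) {
        for (size_t i = 0; i < sizeof cache / sizeof cache[0]; i++)
            if (strcmp(cache[i].name, name) == 0) return &cache[i];
        return NULL;
    }

    static void fetch(struct entry *e) {
        e->cached = true;
        e->callback = true;
        printf("  fetch %s from Vice, establish callback\n", e->name);
    }

    static void venus_open(const char *components[], int n) {
        for (int i = 0; i < n; i++) {
            struct entry *e = lookup(components[i]);
            if (!e) { printf("  %s unknown\n", components[i]); return; }
            if (e->cached && e->callback)
                printf("  %s: cached copy with callback, no traffic\n", e->name);
            else
                fetch(e);   /* missing or callback broken: (re)fetch */
        }
        printf("open cached copy of %s; write back to Vice on close if modified\n",
               components[n - 1]);
    }

    int main(void) {
        const char *path[] = { "usr", "alice", "paper.tex" };
        printf("opening /usr/alice/paper.tex:\n");
        venus_open(path, 3);
        return 0;
    }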

Improving Prototype – Testing The improvement in scalability is immediately visible when comparing benchmark results before and after the improvements.

Improving Prototype – Testing Cont’

Improving Prototype – Effect Cont' CPU and disk utilization are also down, although the CPU is still the performance bottleneck. Results are good!

Comparison with a Remote-Open File System There exist other DFSs that differ from AFS: file data is not fetched en masse, and the remote site participates in each individual read and write operation. AFS is compared to Sun's NFS. Why Sun NFS? It is successful, tuned, and refined; it can run on the same hardware as AFS; and it is the industry 'standard'. Note that NFS was not designed for large systems.

Comparison Results The benchmark was used to compare AFS with NFS, in two variants: Cold Cache, where workstation caches were cleared before each trial, and Warm Cache, where caches were left unaltered between trials.

Comparison Results Cont' NFS performs better than AFS at small loads but degrades quickly. NFS's lack of a disk cache and its need to check with the server on each file open make its times considerably more load-dependent. The warm cache in Andrew improves the time only for the Copy phase of the benchmark.

Comparison Results Cont' Difference in utilization: both CPU and disk saturate under NFS, and NFS uses more resources even at low loads; NFS generates nearly three times more network packets than AFS at a load of one.

Comparison Results Cont' The advantage of a remote-open file system is low latency, measured as the time to open a file, read one byte, and close it: AFS takes a long time when the data is not in its cache, whereas NFS latency is independent of file size. Overall, AFS scales much better than NFS, although NFS performs better at low loads and with a cold cache. Future work involves getting AFS to run in the kernel, with potential for further improvement.

Future Works Keep testing at scale, to handle 5,000-10,000+ users. Move Venus into the kernel to improve performance. Use an industry-standard interception mechanism to increase the portability of the system. Implement decentralized administration and physical dispersal of servers.

Conclusion Implementing the volume abstraction was helpful. The tests focus on the effects of scale, with good results, although the system only serves around 3,500 users so far. Overall, the initial prototype was greatly improved. Possible problems remain: security (if multiple clients access a file, there is no mechanism to assure protection), the large amount of extra space needed to move volumes, and the question of what happens with very large files, which must be cached whole.

Questions?