
Caching in the Sprite Network File System
Scale and Performance in a Distributed File System
COMP 520, September 21, 2004

Agenda: The Sprite file system, basic cache design, concurrency issues, benchmarking, Andrew

The Sprite file system is functionally similar to UNIX
– Read, write, open, and close calls provide access to files
– Sprite communicates kernel-to-kernel: remote procedure calls (RPC) allow kernels to talk to each other

Sprite uses caching on both the client and the server side
– Server workstations use caching to reduce delays caused by disk accesses
– Client workstations use caching to minimize the number of calls made to non-local disks
(Diagram: file traffic flows from client caches across the network to the server cache, which generates disk traffic to the server's disks.)

Three main issues are addressed by Sprite's caching system:
1. Should client caches be kept in main memory or on local disk?
2. What structure and addressing scheme should be used for caching?
3. When should a dirty block be written back to disk?

Sprite caches client data in main memory, not on local disk
– Allows clients to be diskless: cheaper and quieter
– Data access is faster
– Physical memory is large enough to provide a high hit ratio, and memory sizes will continue to grow
– A single caching mechanism can be used for both client and server

A virtual addressing structure is used for caching
– Data is organized into 4-Kbyte blocks, virtually addressed by a unique file identifier and a block index; both client and server cache data blocks
– The server also caches naming information, addressed by physical (disk) address
– All naming operations (open, close, etc.) are passed to the server
– Cached file information is lost if the server crashes
(Diagram: client and server caches hold data blocks; the server cache also holds management information and receives the clients' open, close, read, and write requests.)
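
The addressing can be made concrete with a small sketch. This is not Sprite's code, just a Python illustration in which BlockCache, fetch_block, and the key layout are assumptions:

```python
# Hypothetical sketch of a virtually addressed block cache: blocks are
# keyed by (file identifier, block index), so a cache hit never needs a
# file-to-disk address translation.
BLOCK_SIZE = 4096  # Sprite uses 4-Kbyte blocks

class BlockCache:
    def __init__(self):
        self.blocks = {}  # (file_id, block_index) -> bytes

    def read(self, file_id, offset, fetch_block):
        """Return the cached block covering `offset`, fetching on a miss.
        fetch_block(file_id, index) stands in for the RPC to the server
        (on a client) or the disk read (on the server)."""
        key = (file_id, offset // BLOCK_SIZE)
        if key not in self.blocks:
            self.blocks[key] = fetch_block(file_id, key[1])
        return self.blocks[key]
```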

Sprite uses a delayed-write policy to write dirty blocks to disk
– Every 30 seconds, dirty blocks that have not been changed in the last 30 seconds are written back
– A block written by a client therefore reaches the server's cache within 30-60 seconds, and the server's disk within 30-60 more seconds
– Limits server traffic and minimizes the damage in a crash
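
A rough sketch of the client side of this policy, assuming a background task invokes flush_old_blocks every 30 seconds and that write_back stands in for the RPC that ships a block to the server:

```python
import time

FLUSH_INTERVAL = 30  # seconds; both the scan period and the aging threshold

class DelayedWriteCache:
    def __init__(self, write_back):
        self.write_back = write_back  # write_back(key, data): ship block onward
        self.dirty = {}               # key -> (data, time of last modification)

    def write(self, key, data):
        self.dirty[key] = (data, time.time())  # stays local for now

    def flush_old_blocks(self):
        """Run every 30 seconds: write back blocks untouched for 30 seconds."""
        now = time.time()
        for key, (data, mtime) in list(self.dirty.items()):
            if now - mtime >= FLUSH_INTERVAL:
                self.write_back(key, data)
                del self.dirty[key]
```

Because the server applies the same aging policy before writing to disk, a block can take up to 60 seconds to reach the server's cache and up to 60 more to reach the disk.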

Agenda: The Sprite file system, basic cache design, concurrency issues, benchmarking, Andrew

Two unusual design optimizations differentiate the system and solve two problems
– Consistency is guaranteed: all clients see the most recent version of a file, which provides transparency to the user; concurrent and sequential write-sharing are permitted
– Cache sizes change: the virtual memory system and the file system negotiate over physical memory, and cache space is reallocated dynamically

Concurrent write-sharing makes the file system more user friendly
– Concurrent write-sharing occurs when a file is open on multiple clients and at least one of them has it open for writing

Concurrent write-sharing can jeopardize file consistency
– The server detects concurrent write-sharing when the conflicting open arrives
– The server instructs the client holding dirty blocks to write them back to the server
– The server notifies all clients that the file is no longer cacheable, and clients remove all cached blocks for it
– All future access requests are sent to the server, which serializes them
– The file becomes cacheable again once it is no longer open and undergoing write-sharing
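
The sequence above can be sketched as server-side bookkeeping. The client methods flush_dirty_blocks and invalidate_cache are hypothetical stand-ins for the server-to-client RPCs:

```python
class SpriteServer:
    def __init__(self):
        self.opens = {}          # file_id -> list of (client, mode)
        self.uncacheable = set()

    def open(self, client, file_id, mode):
        opens = self.opens.setdefault(file_id, [])
        has_writer = mode == "w" or any(m == "w" for _, m in opens)
        if opens and has_writer and file_id not in self.uncacheable:
            # Concurrent write-sharing detected.
            for other, m in opens:
                if m == "w":
                    other.flush_dirty_blocks(file_id)  # recall dirty data
                other.invalidate_cache(file_id)        # drop cached blocks
            self.uncacheable.add(file_id)              # serialize at server
        opens.append((client, mode))

    def close(self, client, file_id, mode):
        self.opens[file_id].remove((client, mode))
        if not any(m == "w" for _, m in self.opens[file_id]):
            self.uncacheable.discard(file_id)          # cacheable again
```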

Sequential write-sharing provides transparency, but not without risks
– Sequential write-sharing occurs when a file is modified by a client, closed, then opened by a second client
– Clients are always guaranteed to see the most recent version of the file

Sequential write-sharing: Problem 1
– Problem: client A modifies a file and closes it; client B opens the file using out-of-date cached blocks and thus sees an out-of-date version
– Solution: version numbers
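
A sketch of the version-number check, assuming hypothetical cache and server objects; on open, the client discards cached blocks whose version no longer matches the server's:

```python
def open_with_version_check(cache, server, file_id):
    current = server.version(file_id)       # bumped whenever the file changes
    if cache.version.get(file_id) not in (None, current):
        cache.discard_blocks(file_id)       # client B's blocks were stale
    cache.version[file_id] = current
```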

Sequential write-sharing: Problem 2
– Problem: the last client to write to a file did not flush its dirty blocks
– Solution: the server keeps track of the last writer, which is the only client allowed to have dirty blocks; when the server receives an open request, it notifies the last writer, which writes its dirty blocks back to the server
– Ensures the reader receives up-to-date data
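
And the last-writer bookkeeping, again with hypothetical names; the server recalls dirty blocks from the last writer before serving an open:

```python
class LastWriterTable:
    def __init__(self):
        self.last_writer = {}  # file_id -> client that may hold dirty blocks

    def note_write_open(self, client, file_id):
        self.last_writer[file_id] = client

    def before_open(self, opener, file_id):
        writer = self.last_writer.get(file_id)
        if writer is not None and writer is not opener:
            writer.flush_dirty_blocks(file_id)  # hypothetical recall RPC
            del self.last_writer[file_id]
        # the opener now reads fully up-to-date data through the server
```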

Cache consistency does increase server traffic
– Client caching reduced server traffic by over 70%, but 25% of the remaining traffic is due to cache consistency
– Table 2 gives an upper bound on what a cache consistency algorithm could save; the bound is unrealistic, since the runs that ignored consistency produced incorrect results

Dynamic cache allocation also sets Sprite apart
– Virtual memory and the file system battle over main memory
– Both modules keep a time of last access for each page, and each compares its oldest page with the other module's oldest page; the overall oldest page is recycled
– Virtual memory keeps pages in approximate LRU order using a clock algorithm; the file system keeps blocks in perfect LRU order by tracking read and write calls
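
The negotiation itself reduces to comparing two timestamps. A sketch, where each module is represented by a dict from page to time of last access:

```python
def choose_victim(vm_pages, fs_pages):
    """Return (module, page) whose page was least recently used overall.
    vm_pages / fs_pages: dict mapping page id -> time of last access."""
    oldest_vm = min(vm_pages, key=vm_pages.get)
    oldest_fs = min(fs_pages, key=fs_pages.get)
    if vm_pages[oldest_vm] <= fs_pages[oldest_fs]:
        return "virtual memory", oldest_vm
    return "file cache", oldest_fs
```

For example, choose_victim({"p1": 10.0}, {"b1": 5.0}) returns ("file cache", "b1"), because the file block was touched longer ago than any virtual-memory page.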

Negotiations could cause double-caching
– Problem: pages read from backing files could wind up in both the file cache and the virtual memory cache; a page evicted from the virtual memory pool could be forced into the file cache, where it would wait another 30 seconds before being written to the server
– Solution: when reading and writing backing files, virtual memory bypasses the local file cache

Multi-block pages create problems in shared caching
– Problem: virtual memory pages are big enough to hold multiple file blocks; which block's age should represent the LRU time of the page, and what should be done with the other blocks once one is relinquished?
– Solution: the age of a page is the age of its youngest block, and all blocks in a page are removed together (see the sketch below)
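
A sketch of both rules, using hypothetical per-block access times and the BlockCache from the earlier sketch:

```python
def page_lru_time(block_times):
    """A page's LRU time is that of its youngest (most recently used) block.
    block_times: dict mapping (file_id, block_index) -> last access time."""
    return max(block_times.values())

def evict_page(cache, block_times):
    """All blocks sharing the page are relinquished together."""
    for key in block_times:
        cache.blocks.pop(key, None)  # cache.blocks as in the BlockCache sketch
```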

Agenda: The Sprite file system, basic cache design, concurrency issues, benchmarking, Andrew

Micro-benchmarks show reading from a server cache is fast
– They give an upper limit on the cost of remote file access
– Two important results: a client can access its own cache 6-8 times faster than the server's cache, and a client can read and write the server's cache about as quickly as a local disk

Macro-benchmarks indicate disks and caching together run fastest
– With a warm start and client caching, diskless machines were at most 12% slower than machines with disks
– Without client caching, machines were 10-50% slower

Agenda: The Sprite file system, basic cache design, concurrency issues, benchmarking, Andrew

Andrew’s caching is notably different from Sprite’s
– Vice, a group of trusted servers, stores data and status information in separate files and maintains a directory hierarchy
– Venus, a user-level process on each client workstation, keeps a status cache in virtual memory for quick status checks and a data cache on the local disk
(Diagram: in Sprite, client and server memory caches hold data blocks plus naming information, and clients send open, close, read, and write calls to the server; in Andrew, Venus caches whole data files on local disk and status info in memory, fetching and storing files from Vice on open and close.)

…the pathname conventions are also very different
– Two-level naming: each Vice file or directory is identified by a unique fid; Venus maps Vice pathnames to fids, so servers see only fids
– Each fid is 96 bits long and has three parts: a 32-bit volume number, a 32-bit vnode number (an index into the volume), and a 32-bit uniquifier (which guarantees no fid is ever used twice)
– A fid contains no location information; volume locations are maintained in a Volume Location Database found on each server
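
Since the three fields are fixed-width, a fid is easy to pack and unpack. A sketch (the field order follows the list above; the exact on-wire layout is an assumption):

```python
import struct

def pack_fid(volume, vnode, uniquifier):
    """Pack a 96-bit fid as three big-endian 32-bit fields."""
    return struct.pack(">III", volume, vnode, uniquifier)

def unpack_fid(fid):
    return struct.unpack(">III", fid)  # (volume, vnode, uniquifier)

# The volume number says nothing about location: Venus must look it up
# in the Volume Location Database to find the server holding the volume.
```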

Andrew uses a write-on-close convention
– Sprite's delayed-write policy: changes are written back every 30 seconds, which avoids writing data that is quickly deleted, decreases the damage in a crash, and rations network traffic
– Andrew's write-on-close policy: changes are visible to the network only after the file is closed
– Little information is lost in a client crash, because Andrew caches on local disk rather than in main memory; however, a file still open when the client crashes is never seen by the network
– Since 75% of files are open less than 0.5 seconds and 90% less than 10 seconds, write-on-close can result in higher server traffic and delays the closing process
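
A sketch of write-on-close from Venus's point of view; store_on_server is a hypothetical stand-in for the Vice store RPC, and sparse writes are ignored:

```python
class CachedFile:
    """Writes stay in the local disk cache; the network sees nothing
    until close(), when the whole file is stored on the server."""
    def __init__(self, store_on_server):
        self.store_on_server = store_on_server
        self.data = bytearray()
        self.modified = False

    def write(self, offset, buf):
        self.data[offset:offset + len(buf)] = buf
        self.modified = True               # still invisible to the network

    def close(self):
        if self.modified:
            self.store_on_server(bytes(self.data))  # whole file at once
```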

Sequential consistency is guaranteed in Andrew and Sprite
– Clients are guaranteed to see the latest version of a file
– In Andrew, Venus assumes its cached entries are valid; the server maintains a callback on each cached entry and notifies the callback holders before allowing the file to be modified
– The server may break callbacks to reclaim storage
– Callbacks reduce server utilization, since communication occurs only when a file is changed
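
A sketch of the callback bookkeeping on the server; break_callback is a hypothetical server-to-client notification:

```python
class CallbackTable:
    def __init__(self):
        self.holders = {}  # file_id -> set of clients holding a callback

    def on_fetch(self, client, file_id):
        # A fetch establishes a callback: the client may treat its cached
        # copy as valid until told otherwise.
        self.holders.setdefault(file_id, set()).add(client)

    def on_store(self, writer, file_id):
        # Break callbacks before the new contents become visible.
        for client in self.holders.pop(file_id, set()):
            if client is not writer:
                client.break_callback(file_id)
        self.holders[file_id] = {writer}
```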

Concurrent write-sharing consistency is not guaranteed in Andrew
– Different workstations can perform the same operation on a file at the same time
– There is no implicit file locking; applications must coordinate if synchronization is an issue

Comparison to other systems
– With one client, Sprite is around 30% faster than NFS and 35% faster than Andrew
– Andrew has the greatest scalability: each client utilized about 2.4% of the server CPU, versus 5.4% in Sprite and 20% in NFS

Summary: Sprite vs. Andrew
– Cache location: Sprite, main memory; Andrew, local disk
– Client caching: Sprite clients cache data blocks; Andrew clients cache whole data files and status info
– Cache size: Sprite, variable; Andrew, fixed
– File path lookups: performed by the server in Sprite; by the clients in Andrew
– Concurrent write-sharing: Sprite uses a 30-second write delay with consistency guaranteed; Andrew writes on close with consistency not guaranteed
– Sequential write-sharing: both guarantee consistency; Sprite servers know which workstations have a file cached, while Andrew servers maintain callbacks to identify which workstations cache files
– Cache validation: Sprite validates on open; in Andrew the server notifies clients when a file is modified
– Kernel vs. user level: Sprite uses kernel-to-kernel communication; in Andrew the OS intercepts file system calls and forwards them to the user-level Venus

Conclusion: Both file systems have benefits and drawbacks
Sprite benefits:
– Guarantees sequential and concurrent consistency
– Faster runtime with a single client, due to memory caching and kernel-to-kernel communication
– Files can be cached in blocks
Sprite drawbacks:
– Lacks the scalability of Andrew
– Writing back only every 30 seconds can lose data in a crash
– Fewer files fit in a main-memory cache than in a disk cache
Andrew benefits:
– Better scalability, due in part to shifting path lookup to the client
– Transferring entire files reduces communication with the server: no read and write calls
– Tracking entire files is easier than tracking individual pages
Andrew drawbacks:
– Lacks concurrent write-sharing consistency guarantees
– Caching to disk slows runtime
– Files larger than the local disk cannot be cached

Backup

Cache consistency does increase server traffic
– Client caching reduced server traffic by over 70%, but 25% of the remaining traffic is due to cache consistency
– Table 2 gives an upper bound on what a cache consistency algorithm could save; the bound is unrealistic, since the runs that ignored consistency produced incorrect results
(Table 2: blocks read, blocks written, and total traffic ratio, with and without cache consistency, for client cache sizes of 0, 0.5, 1, 2, 4, and 8 Mbytes.)

Micro-benchmarks show reading from a server cache is fast
– They give an upper limit on the cost of remote file access
– Two important results: a client can access its own cache 6-8 times faster than the server's cache, and a client can read and write the server's cache about as quickly as a local disk
(Table: maximum read and write throughput, in Kbytes/second, for the local cache, server cache, local disk, and server disk.)

Macro-benchmarks indicate disks and caching together run fastest
– With a warm start and client caching, diskless machines were at most 12% slower than machines with disks
– Without client caching, machines were 10-50% slower

Benchmark   Local Disk, with Cache    Diskless, Server Cache Only   Diskless, Client and Server Caches
            Cold        Warm          Cold        Warm              Cold        Warm
Andrew      261 (105%)  249 (100%)    373 (150%)  363 (146%)        291 (117%)  280 (112%)
Fs-make     660 (102%)  649 (100%)    855 (132%)  843 (130%)        698 (108%)  685 (106%)
Simulator   161 (109%)  147 (100%)    168 (114%)  153 (104%)        167 (114%)  147 (100%)
Sort        65 (107%)   61 (100%)     74 (121%)   72 (118%)         66 (108%)   61 (100%)
Diff        22 (165%)   8 (100%)      27 (225%)   12 (147%)         27 (223%)   8 (100%)
Nroff       53 (103%)   51 (100%)     57 (112%)   56 (109%)         53 (105%)   52 (102%)
Times are in seconds, with normalized times in parentheses.

Status info on Andrew and Sprite
– The Sprite management cache contains file maps and disk management info

Volumes in Andrew
– A volume is a collection of files forming a partial subtree of the Vice name space
– Volumes are joined at mount points
– Each volume resides in a single disk partition and can be moved from server to server easily for load balancing
– Volumes enable quotas and backup

Sprite caching improves speed and reduces overhead
– Client-side caching enables diskless workstations: caching improves their runtime by 10-40%, and diskless workstations with caching are only 0-12% slower than workstations with disks
– Caching on both the server and the client side improves the overall system: server utilization drops from 5-18% to 1-9% per active client, and file-intensive benchmarking completed 30-35% faster on Sprite than on other systems