AFS vs YFS

- Use large numbers of small file servers
- Use many small partitions per file server
- Restrict the number of processors to 1 or 2
- Limit the network bandwidth to 1 Gbit/s
- Avoid workloads requiring:
  - Multiple clients creating or removing entries in a single directory
  - Multiple clients writing to or reading from a single file
  - More clients than file server worker threads accessing a single volume
  - Applications requiring features that AFS does not offer: byte-range locking, extended attributes, per-file ACLs, etc.

- Deployed isolation file servers and complex monitoring to detect hot volumes and quarantine them
- Developed complex workarounds, including vicep-access, OSD, and OOB
- Segregated RW and RO access into separate cells and constructed their own volume management systems to "vos release" volumes from the RW cell to the RO cells
- Used the AFS name space for some tasks and other "high performance" file systems (NFSv3, NFSv4, Lustre, GPFS, Panasas, and others) for the rest

- Additional servers cost money
  - US$6,800 per year, according to Cornell University
  - Including hardware depreciation, support contracts, maintenance, power and cooling, and staff time
- Increased complexity for end users
- Multiple backup strategies

- Maintain the data and the name space
- Fix the performance problems
- Enhance the functionality to match Apple's and Microsoft's first-class file systems
- Improve security
- Save money

- What are the bottlenecks in AFS, and why do they exist?
- What can be done to maximize the performance of an AFS file server?
- How scalable is a YFS file server?

- File server throughput is bounded by the amount of data the listener thread can read from the network in any given time period
- As Simon Wilkinson likes to say: "There are only two things wrong with AFS RX, the protocol and the implementation."

- Incorrect round-trip time (RTT) calculations
- Incorrect retransmission timeout (RTO) implementation (a reference sketch of the standard estimator follows below)
- Window size vs. congested networks
  - Broken window management makes congested networks worse
- Soft ACKs and hard ACKs
  - Twice as many ACKs as necessary
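
For reference, here is a minimal sketch of the standard smoothed-RTT/RTO estimator (in the style of RFC 6298) that correct implementations approximate. It is illustrative C only, not the RX code; the identifiers, sample values, and structure are assumptions made for this sketch.

    #include <math.h>
    #include <stdio.h>

    /* Illustrative RFC 6298-style RTT/RTO estimator.  This is NOT the RX
     * implementation; it only shows what a conventional calculation looks
     * like.  All identifiers are invented for this sketch. */
    struct rtt_state {
        double srtt;        /* smoothed round-trip time (seconds)  */
        double rttvar;      /* round-trip time variation (seconds) */
        double rto;         /* retransmission timeout (seconds)    */
        int    have_sample; /* has any sample been taken yet?      */
    };

    static void rtt_update(struct rtt_state *s, double r)
    {
        const double alpha = 0.125, beta = 0.25;    /* standard gain factors */
        if (!s->have_sample) {
            s->srtt = r;
            s->rttvar = r / 2.0;
            s->have_sample = 1;
        } else {
            /* RTTVAR is updated before SRTT, as the RFC specifies. */
            s->rttvar = (1.0 - beta) * s->rttvar + beta * fabs(s->srtt - r);
            s->srtt   = (1.0 - alpha) * s->srtt + alpha * r;
        }
        s->rto = s->srtt + 4.0 * s->rttvar;
        if (s->rto < 1.0)
            s->rto = 1.0;                           /* RFC 6298 minimum RTO */
    }

    int main(void)
    {
        struct rtt_state s = { 0.0, 0.0, 0.0, 0 };
        const double samples[] = { 0.120, 0.140, 0.090, 0.300, 0.110 };
        for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
            rtt_update(&s, samples[i]);
            printf("sample %.3fs -> srtt %.3fs, rto %.3fs\n",
                   samples[i], s.srtt, s.rto);
        }
        return 0;
    }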

- Lock contention
  - 20% of runtime spent waiting for locks
- UDP context switching
  - Every packet processed on a different CPU
  - Cache line invalidation

- To see the full details, see

- Lightweight processes (LWP) is the cooperative threading model used by the original AFS implementation
- Only one thread can execute at a time
- Threads yield voluntarily or when blocking for I/O
- Data access is implicitly protected by single-threaded execution
- Between yields, all lock state changes are effectively atomic. In other words:
  - Acquire + Release + Yield == Never Acquire (no other thread can ever observe the lock held)
  - Acquire A + Acquire B == Acquire B + Acquire A (acquisition order cannot cause a deadlock, because no other thread runs in between)

- When converting a cooperatively threaded application to pthreads, it is faster to add global locks protecting the data structures that are accessed across I/O than to redesign the data structures and the work flow (a minimal sketch of the pattern follows below)
- AFS 3.4 added pthreaded file servers by adding a minimal number of global locks to each package
- AFS 3.6 added finer-grained, but still global, locks
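
The following is a minimal sketch of that pattern, assuming a single hypothetical global mutex guarding a shared host table; the identifiers (H_LOCK, rpc_worker, host_count) are invented for illustration and are not the real OpenAFS symbols.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hedged sketch: one global lock protecting a shared structure, in the
     * style of the locks added when the LWP file server was converted to
     * pthreads.  All names are illustrative. */
    static pthread_mutex_t H_LOCK = PTHREAD_MUTEX_INITIALIZER;
    static int host_count;              /* stand-in for the shared host table */

    static void *rpc_worker(void *arg)
    {
        (void)arg;

        pthread_mutex_lock(&H_LOCK);    /* every worker serializes here       */
        host_count++;                   /* touch the shared structure         */
        pthread_mutex_unlock(&H_LOCK);  /* must drop before blocking for I/O  */

        usleep(1000);                   /* simulated disk or network I/O      */

        pthread_mutex_lock(&H_LOCK);    /* re-acquire to finish the call      */
        printf("worker done, hosts=%d\n", host_count);
        pthread_mutex_unlock(&H_LOCK);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, rpc_worker, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        return 0;
    }

Because every worker serializes on the same lock, adding processor cores adds cache-line traffic without adding throughput, which is exactly the behavior described in the following slides.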

- AFS file servers must acquire many mutexes while processing each RPC (* = global)
- RX
  - peer_hash*, conn_hash*, peer, conn_call, conn_data, stats*, free_packet_queue*, free_call_queue*, event_queue*, and more
- viced
  - H* [host table, callbacks]
  - FS* [stats]
  - VOL* [volume metadata]
  - VNODE [file/dir]

- Threads are scheduled onto a processor and must give up their time slice whenever a required lock is unavailable
- When there are multiple processors, threads may be scheduled onto different processors
- Any data not in the processor's cache, or that has been invalidated, must be fetched from memory; locks are represented as data in memory whose state changes every time they are acquired and released, so a contended lock's cache line bounces between processors
- Two side effects of global locks:
  - Only one thread at a time can make progress
  - Multiple processor cores hurt performance

- An AFS file server promises its clients that, for a fixed period of time, it will notify them if the metadata or data state of an accessed object changes (a rough sketch of what one promise records follows below)
- For read/write volumes, one callback promise per file object
- For read-only volumes, one callback promise per volume, regardless of how many file objects are accessed
- Today, many file servers are deployed with callback tables containing millions of entries
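
Roughly speaking, each promise records which client must be told when which object changes, and when the promise expires. The struct below is a hypothetical sketch for illustration only; the actual OpenAFS host and callback structures are different.

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Hypothetical sketch of what a single callback promise records.
     * Field and type names are invented; they are not the OpenAFS ones. */
    struct afs_fid {
        uint32_t volume;
        uint32_t vnode;      /* for a read-only volume one promise covers the */
        uint32_t unique;     /* whole volume, so these matter only for RW     */
    };

    struct callback_promise {
        struct afs_fid fid;        /* object (or volume) being tracked         */
        uint32_t       host_index; /* client to notify when the state changes  */
        time_t         expires;    /* the promise is only valid until this time */
    };

    int main(void)
    {
        printf("one promise is about %zu bytes in this sketch\n",
               sizeof(struct callback_promise));
        return 0;
    }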

- A host table, plus hash tables for looking up host entries by IP address and by UUID, is protected by a single global lock
- Host entries have their own locks. To avoid hard deadlocks, locking an entry requires dropping the global lock, obtaining the entry lock, and then re-acquiring the global lock (see the sketch below)
- Soft deadlocks occur when multiple threads are blocked on an entry lock while the thread holding it is itself blocked waiting for the global lock
- Lock contention occurs multiple times for each new RX connection and each time a call is scheduled
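
Here is a minimal sketch of that locking order, with a hypothetical global H_LOCK and a per-entry lock; the names and structure are illustrative and do not match the OpenAFS host package.

    #include <pthread.h>
    #include <stdio.h>

    /* Hedged sketch of the "drop global, take entry, retake global" order
     * used to avoid hard deadlocks.  All identifiers are invented. */
    static pthread_mutex_t H_LOCK = PTHREAD_MUTEX_INITIALIZER;  /* global */

    struct host {
        pthread_mutex_t lock;       /* per-entry lock */
        int             refcount;
        /* ... address, UUID, callback state ... */
    };

    static struct host the_host = { PTHREAD_MUTEX_INITIALIZER, 0 };

    static struct host *host_lookup_and_lock(void)
    {
        pthread_mutex_lock(&H_LOCK);    /* 1. find the entry under the global lock */
        struct host *h = &the_host;
        h->refcount++;                  /*    pin it so it cannot disappear        */
        pthread_mutex_unlock(&H_LOCK);  /* 2. drop the global lock first ...       */

        pthread_mutex_lock(&h->lock);   /* 3. ... then take the entry lock         */
        pthread_mutex_lock(&H_LOCK);    /* 4. re-acquire the global lock           */
        /* Soft deadlock: threads queued on h->lock make no progress while
         * this thread waits here for the global lock.                      */
        return h;
    }

    static void host_unlock(struct host *h)
    {
        h->refcount--;                  /* still holding both locks here */
        pthread_mutex_unlock(&H_LOCK);
        pthread_mutex_unlock(&h->lock);
    }

    int main(void)
    {
        struct host *h = host_lookup_and_lock();
        printf("host locked, refcount=%d\n", h->refcount);
        host_unlock(h);
        return 0;
    }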

- The callback table is protected by the same global lock as the host table
- Each new or updated callback promise requires exclusive access to the table
- Notifying registered clients of state changes (breaking callbacks) requires exclusive access
- Garbage collection of expired callbacks (at 5-minute intervals) requires exclusive access
- Exceeding the callback table limit requires exclusive access for immediate garbage collection and premature callback notification

- The larger the callback table, the longer exclusive access is held for garbage collection and callback breaks
- While exclusive access is held, no new calls can be scheduled and existing calls cannot complete

- Increasing the worker thread pool permits additional calls to be scheduled instead of blocking in the RX wait queue
- The primary benefit of scheduling a call is that the locks then act as a filter deciding which calls can make progress; calls left on the RX wait queue can never make progress while the thread pool is exhausted
- The downside of a larger thread pool is increased lock contention and more CPU time wasted on thread scheduling

- Start with the "large" configuration: -L
- Make the thread pool as large as possible
  - For 1.4: -p 128
  - For 1.6: -p 256
- Set the directory buffer size to twice the thread count: -b 512

- Volume cache larger than the total volume count: -vc
- Small vnode cache (files): -s
- Large vnode cache (directories): -l
- If volumes are very large, higher multiples may be required

- The callback table must be large enough to avoid thrashing: -cb
  - The -cb value multiplied by 72 bytes should not exceed 10% of the machine's physical memory (worked example below)
- Use "xstat_fs_test -collId 3 -once" to monitor the "GetSomeSpaces" value; if it is non-zero, increase the -cb value
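
As a worked example (the memory size here is hypothetical): on a server with 16 GB of RAM, 10% is roughly 1.6 GB, so the -cb value should stay below about 1.6 x 10^9 / 72 ≈ 22 million callback entries.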

- UDP receive buffer: -udpsize
  - Must be large enough to receive all packets for in-process calls
  - Will not take effect unless the OS is configured to match (see the sketch below)
- UDP send buffer: -sendsize
  - 2,097,152 bytes (2^21) unless the client chunk size is larger
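
The OS must be configured to match because the kernel may silently cap the requested socket buffer (on Linux, at the net.core.rmem_max sysctl). The standalone C sketch below, which is not file server code, requests a large receive buffer and reads back what was actually granted.

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        int want = 16 * 1024 * 1024;    /* example: ask for a 16 MB buffer */
        int got = 0;
        socklen_t len = sizeof got;

        if (fd < 0) {
            perror("socket");
            return 1;
        }

        /* Request the receive buffer; the kernel may silently reduce it. */
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof want);

        /* Read back what was actually granted. */
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);
        printf("requested %d bytes, kernel granted %d bytes\n", want, got);

        close(fd);
        return 0;
    }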

- The AFS protocol does not expose the last access time to clients
- Nor does the AFS file server make any use of it
- Turn off last-access-time updates to avoid large amounts of unnecessary disk I/O unrelated to serving clients (on Linux, for example, this typically means mounting the vice partitions with the noatime option)

- Syncing data to disk is very expensive. If you trust your UPS and have a good battery-backed caching storage adapter, we recommend reducing the frequency of sync operations.
- For 1.6.5, the new option: -sync onclose

- A YFS file server experiences much less contention between threads
- RPCs take less time to complete
  - Store operations do not block simultaneous Fetch requests
- One YFS file server can replace at least 30 AFS file servers
  - Maximum in-flight RPCs per AFS server = 240
  - Maximum in-flight RPCs per YFS server = 16,000 (dynamic)
  - 30 AFS servers x 240 = 7,200 in-flight RPCs, comfortably within one YFS server's limit

Up to 8.2 Gbit/s per listener thread

- SLAC has experienced file server meltdowns for years. A large number of file servers were deployed to distribute load and to isolate volume accesses by users.
- One YFS file server satisfied 500 client nodes for nearly 24 hours without noticeable delays
  - 1 Gbit/s NIC, 8 processor cores, 6 Gbit/s local RAID disk
  - 800 operations per second
  - 55 MB/s FetchData
  - 5 MB/s StoreData

- 2038-safe
- 100 ns timestamps
- 2^64 volumes
- 2^96 vnodes per volume
- 2^64 maximum quota / volume / partition size
- Per-file ACLs
- Volume security policies
  - Maximum ACL / wire privacy
- Servers do not run as "root"
- Linux O_DIRECT
- Mandatory locking
- IPv6 network stack

- RXGK
  - GSS-API authentication
  - AES-256/SHA-1 wire privacy
  - File server wire security policies
- File servers cannot serve volumes with stronger required policies
- Combined identity tokens
- Keyed cache managers / machine IDs
- Maximum volume ACL prevents data leaks