Garth A. Gibson*, David F. Nagle**, William Courtright II*, Nat Lanza*, Paul Mazaitis*, Marc Unangst*, Jim Zelenka* "NASD Scalable Storage Systems",USENIX99,


Garth A. Gibson*, David F. Nagle**, William Courtright II*, Nat Lanza*, Paul Mazaitis*, Marc Unangst*, Jim Zelenka* "NASD Scalable Storage Systems",USENIX99, Extreme Linux Workshop, Monterey, CA, June

Motivation
NASD minimizes server-based data movement and separates management and filesystem semantics from store-and-forward copying
Figure 1: Standalone server with attached disks
–Note the long path that requests and data take through OS layers and through various machines
Reference implementation of NASD for Linux 2.2, including:
–NASD device code that runs on a workstation or PC masquerading as a subsystem or disk drive
–NFS-like distributed file system that uses NASD subsystems or devices
–NASD striping middleware for large striped files

Figure 1 -- NetSCSI and NASD
NetSCSI
–Figure 1 outlines the data path: clients ask the server for data, and the server forwards the request to storage. The forwarded request is a DMA command to return data directly to the client.
–When the DMA is complete, status is returned to the server, collected, and forwarded to the client
NASD
–On first access, client contacts server for access checks
–Server grants reusable rights or capabilities
–Clients then present requests directly to storage
–Storage verifies capabilities and replies directly

NASD Interface
Read, write object data
Read, write object attributes
Create, resize, remove soft partitions
Construct copy-on-write version of object
Logical version number on file can be changed by file manager to revoke capability

NASD Security
Security protocol
–Capability has a public portion, CapArg, and a private key, CapKey
–CapArg specifies what rights are being granted for which object
–CapKey is a keyed message digest of CapArg and a secret key shared only with the target drive
–Client sends CapArg with each request and generates a CapKey-keyed digest of the request parameters and CapArg
–Each drive knows its secret keys and receives CapArg with each request
–Drive can therefore compute the client’s CapKey and verify the request
–If any field of CapArg or the request has been changed, digest comparison will fail
–Scheme protects integrity of requests but does not protect privacy of data
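
The capability check described on this slide can be sketched in a few lines. This is a minimal illustration, not the NASD implementation: HMAC-SHA256 stands in for the keyed message digest (the slides do not name an algorithm), and the function names, the byte encoding of CapArg, and the length-prefix framing are all hypothetical.

```python
import hashlib
import hmac


def make_cap_key(drive_secret: bytes, cap_arg: bytes) -> bytes:
    """File manager side: CapKey is a keyed digest of CapArg under the drive's secret."""
    return hmac.new(drive_secret, cap_arg, hashlib.sha256).digest()


def sign_request(cap_key: bytes, cap_arg: bytes, request: bytes) -> bytes:
    """Client side: digest of the request parameters and CapArg, keyed by CapKey."""
    # Length-prefix CapArg so the (CapArg, request) pair is encoded unambiguously.
    framed = len(cap_arg).to_bytes(4, "big") + cap_arg + request
    return hmac.new(cap_key, framed, hashlib.sha256).digest()


def drive_verify(drive_secret: bytes, cap_arg: bytes, request: bytes, digest: bytes) -> bool:
    """Drive side: recompute CapKey from the received CapArg, then check the digest."""
    cap_key = make_cap_key(drive_secret, cap_arg)
    expected = sign_request(cap_key, cap_arg, request)
    return hmac.compare_digest(expected, digest)
```

The drive never stores per-client state: it needs only its own secret and the CapArg that arrives with the request. Changing a field the drive checks, such as a version number carried inside CapArg, would make previously issued capabilities fail verification.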

Filesystems for NASD
Constructed distributed file system with NFS-like semantics tailored for NASD
Each file and directory occupies exactly one NASD object; offsets in files are the same as offsets in objects
File length and last file modify time correspond directly to NASD-maintained object attributes
Remainder of file attributes stored in uninterpreted section of object’s attributes
Data-moving operations (read, write) and attribute reads (getattr) are sent directly to the NASD drive
–File attributes are either computed from NASD object attributes (e.g. modify times and object size) or stored in the uninterpreted filesystem-specific attribute
Other requests are handled by the file manager
Capabilities are piggybacked on the file manager’s response to lookup operations

Access to Striped Files and Continuous Media
NASD-optimized parallel filesystem
Filesystem manages objects not directly backed by data
Backed by storage manager which redirects clients to component NASD objects
NASD PFS supports SIO low-level parallel filesystem interface on top of NASD-NFS files striped using user-level Cheops middleware
Figure 6
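
To make the redirection to component objects concrete, here is a minimal sketch of how a client could map a logical byte range onto component NASD objects. It assumes a simple round-robin layout with a fixed stripe unit; the slides do not specify the layout Cheops uses, and `locate`, `split_request`, and their parameters are hypothetical names.

```python
def locate(offset, stripe_unit, n_components):
    """Map a logical file offset to (component index, offset within that component)."""
    unit = offset // stripe_unit                 # which stripe unit the byte falls in
    component = unit % n_components              # round-robin over component objects
    component_offset = (unit // n_components) * stripe_unit + offset % stripe_unit
    return component, component_offset


def split_request(offset, length, stripe_unit, n_components):
    """Split a logical read or write into per-component (index, offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        component, component_offset = locate(offset, stripe_unit, n_components)
        # A piece never crosses a stripe-unit boundary.
        chunk = min(end - offset, stripe_unit - offset % stripe_unit)
        pieces.append((component, component_offset, chunk))
        offset += chunk
    return pieces
```

With an assumed 512 KB stripe unit over four components, `split_request(0, 2 * 1024 * 1024, 512 * 1024, 4)` yields one 512 KB piece per component, which the client can then issue to the drives in parallel.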

Garth A. Gibson, David F. Nagle, Khalil Amiri, Jeff Butler, Fay W. Chang, Howard Gobioff, Charles Hardin, Erik Riedel, David Rochberg and Jim Zelenka A cost-effective, high-bandwidth storage architecture. Architectural Support for Programming Languages and Operating Systems Proceedings of the 8th international conference on Architectural support for programming languages and operating systems October 2 - 7, 1998, San Jose, CA USA Pages

Evolution of storage architectures
Local Filesystem -- Simplest organization: aggregates application, file management (including concurrency control), and low-level storage management. Data makes one trip over a peripheral area network such as SCSI. Disks offer a fixed-size block abstraction.
Distributed Filesystem -- Intermediate server machine is introduced. Server offers simple file access interface to clients.
Distributed Filesystem with RAID controller -- Interpose another computer, the RAID controller.
Distributed Filesystem that employs DMA -- Can arrange to DMA data to clients rather than copy it through the server. HPSS is an example (although this is not how it is usually employed).
NASD-based DFS, NASD-Cheops-based DFS

Principles of NASD
Direct transfer -- Data moved between drive and client without indirection or store-and-forward through file server
Asynchronous oversight -- Ability of client to perform most operations without synchronous appeal to the file manager
Cryptographic integrity -- Drives ensure that commands and data have not been tampered with by generating and verifying cryptographic keyed digests
Object-based interface -- Drives export variable-length objects instead of fixed-size blocks. Allows disk drives to have direct knowledge of relationships between disk blocks and to minimize security overhead.

Prototype Implementation
NASD prototype drive runs on a 133MHz, 64MB DEC Alpha 3000/400 with two Seagate ST52160 disks attached by two 5 MB/s SCSI busses
Intended to simulate a controller and drive
NASD system implements its own internal object access, cache, and disk space management modules
Figure 6 -- Performance for sequential reads and writes
–Sequential bandwidth as a function of request size
–NASD better tuned for disk access on reads that miss cache
–FFS better tuned for cache accesses
–Apparent write performance of FFS is due to immediate acknowledgement of writes up to 64KB

Scalability
13 NASD drives, each linked by OC-3 ATM to 10 client machines
Each client issues a series of sequential 2MB read requests striped across four NASDs
Each NASD can deliver 32MB/s from cache to the RPC protocol stack
DCE RPC cannot push more than 80 Mb/s through a 155 Mb/s ATM link before the receiving client saturates
Figure 7 demonstrates close to linear scaling up to 10 clients
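
A back-of-envelope check of the figures on this slide shows which side bounds aggregate bandwidth. This is arithmetic on the quoted numbers only, assuming 8 bits per byte and that every read hits the drive cache; it is not a reproduction of the measurements in Figure 7, and the variable names are illustrative.

```python
BITS_PER_BYTE = 8

clients = 10
drives = 13
per_client_mbit_s = 80    # DCE RPC ceiling per receiving client, in Mb/s
per_drive_mbyte_s = 32    # per-NASD delivery rate from cache, in MB/s

# Aggregate ceiling on the client side, converted from megabits to megabytes.
client_ceiling_mbyte_s = clients * per_client_mbit_s / BITS_PER_BYTE

# Aggregate ceiling on the drive side if every request is served from cache.
drive_ceiling_mbyte_s = drives * per_drive_mbyte_s

# The smaller of the two bounds the achievable aggregate bandwidth.
bottleneck = "clients" if client_ceiling_mbyte_s < drive_ceiling_mbyte_s else "drives"
```

Under these assumptions the client-side RPC stack, not the drives, is the limiting factor, which is consistent with bandwidth continuing to grow as clients are added.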

Computational Requirements
Table 1 -- Number of instructions needed to service a given request size, including all communications (DCE RPC, UDP/IP)
Overhead mostly due to communications
Significantly more expensive than Seagate Barracuda

Filesystems for NASD
NFS covered in last paper
AFS -- Lookup operations carried out by parsing directory files locally
AFS RPCs added to obtain and relinquish capabilities explicitly
AFS’s sequential consistency provided by breaking callbacks (notifying holders of potentially stale copies) when a write capability is issued
File manager doesn’t know when a write operation has arrived at a drive, so it must tell clients when a write may occur
No new callbacks on a file with an outstanding write capability
AFS enforces per-volume quota on allocated disk space
File manager allocates space when it issues a capability, and it keeps track of how much space is actually written
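
The callback and quota bookkeeping on this slide can be sketched as a toy file manager. This is an illustration under stated assumptions, not the NASD-AFS code: the class, its method names, and the policy of refusing a callback (rather than refusing the read) are all hypothetical.

```python
class FileManager:
    """Toy file manager tracking callbacks and a per-volume quota."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.committed = 0      # bytes known to have been written
        self.reserved = {}      # (client, file) -> bytes reserved by a write capability
        self.callbacks = {}     # file -> set of clients holding cached copies

    def _write_outstanding(self, file):
        return any(f == file for (_, f) in self.reserved)

    def issue_read_capability(self, client, file):
        """Return True if a callback was granted along with the capability."""
        # No new callbacks on a file with an outstanding write capability.
        if self._write_outstanding(file):
            return False
        self.callbacks.setdefault(file, set()).add(client)
        return True

    def issue_write_capability(self, client, file, max_bytes):
        """Reserve space against the quota and break callbacks on the file."""
        if self.committed + sum(self.reserved.values()) + max_bytes > self.quota_bytes:
            raise ValueError("volume quota exceeded")
        # The manager will not see the write itself, so it notifies holders now.
        broken = self.callbacks.pop(file, set()) - {client}
        self.reserved[(client, file)] = max_bytes
        return broken

    def relinquish_write_capability(self, client, file, bytes_written):
        """Release the reservation and record how much space was actually written."""
        self.reserved.pop((client, file))
        self.committed += bytes_written
```

Reserving the full `max_bytes` at issue time is the conservative choice: the manager cannot observe writes at the drive, so it charges the quota up front and reconciles when the capability is relinquished.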

Active Disks
Provide full application-level programmability of drives
Customize functionality for data-intensive computations
NASD’s object-based interface provides knowledge of data at devices without having to use external metadata