pNFS extension for NFSv4
IETF-62, March 2005
Brent Welch

Page 2: Abstract
- pNFS extends NFSv4
- Scalable I/O problem (brief)
- Review of the proposal
- Status and next steps
- Object Storage and pNFS
- Object security scheme

Page 3: pNFS Extension to NFSv4
- NFS is THE file system standard
- Fewest additions that enable parallel I/O from clients
- Layouts are the key additional abstraction
  - Describe where data is located and how it is organized
- Clients access storage directly, in parallel
- Generalized support for asymmetric solutions
  - Files: clean way to do filer virtualization, eliminate the bottleneck
  - Objects: standard way to do object-based file systems
  - Blocks: standard way to do block-based SAN file systems

Page 4: Scalable I/O Problem
- Storage for 1000s of active clients => lots of bandwidth
- Scaling capacity through 100s of TB and into PB
- File server model has good sharing and manageability, but it is hard to scale
- Many other proprietary solutions: GPFS, CXFS, StorNext, Panasas, Sistina, SANergy, ...
  - Everyone has their own client
- We'd like a standards-based solution => pNFS

Page 5: Scaling and the Client
- Gary Grider's rule of thumb for HPC: 1 GB/sec of I/O for each teraflop of computing power
  - GHz processors => 6 TF => 6 GB/sec
  - One file server with 48 GE NICs? I don't think so.
  - 100 GB/sec I/O system in '08 or '09 for a 100 TF cluster
- Making movies: 1000-node rendering farm, plus 100s of desktops at night
- Oil and gas: 100s to 1000s of clients, lots of large files (10s of GB to TB each)
- EDA, compile farms, life sciences, ...
- Everyone has a Linux cluster these days

Page 6: Asymmetric File Systems
- Control path vs. data path ("out-of-band" control)
- Metadata servers (control)
  - File name space
  - Access control (ACL checking)
  - Sharing and coordination
- Storage devices (data)
  - Clients access storage directly
- SAN file systems: CXFS, EMC HighRoad, SANergy, SAN FS
- Object storage file systems: Panasas, Lustre
[Diagram: clients use NFSv4 + pNFS ops to the pNFS server and a storage protocol directly to the storage devices; the server manages the devices via a storage management protocol.]

Page 7: NFS and Cluster File Systems
- Currently, an NFS head can be a client of a cluster file system (instead of a client of the local file system)
- NFSv4 adds more state, which makes integration with a cluster file system more interesting
[Diagram: NFS clients attach to NFS "heads"; each head runs the cluster file system alongside native clients.]

Page 8: pNFS and Cluster File Systems
- Replace proprietary cluster file system protocols with pNFS
- Lots of room for innovation inside the cluster file system
[Diagram: pNFS clients and native clients access nodes running the cluster file system plus NFSv4.]

Page 9: pNFS Ops Summary
- GETDEVINFO: maps from the opaque device ID used in layout data structures to the storage protocol type and the addressing information needed for that device
- LAYOUTGET: fetch location and access control information (i.e., capabilities)
- LAYOUTCOMMIT: commit write activity; new file size and attributes become visible on storage
- LAYOUTRELEASE: give up the lease on the layout
- CB_LAYOUTRETURN: server callback to recall a layout lease
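To make the flow concrete, here is a minimal, hypothetical sketch of how a client might drive these ops: get a layout, resolve device IDs, write directly to the data servers, then commit and release. All class and method names (ToyDevice, ToyMetadataServer) are illustrative stand-ins, not the drafts' actual XDR interfaces.

```python
# Toy model of the pNFS op sequence; purely illustrative.

class ToyDevice:
    """Stands in for a data server reachable via some storage protocol."""
    def __init__(self):
        self.blocks = {}

    def write(self, addr, data):
        self.blocks[addr] = data

class ToyMetadataServer:
    def __init__(self, devices):
        self.devices = devices          # device_id -> ToyDevice

    def layoutget(self, fh, offset, length):
        # Grant a layout: here, one segment per device, round-robin.
        return [(dev_id, offset + i) for i, dev_id in enumerate(self.devices)]

    def getdevinfo(self, device_id):
        # Map an opaque device ID to addressing info (here, the object itself).
        return self.devices[device_id]

    def layoutcommit(self, fh, new_size):
        print(f"commit: {fh} now {new_size} bytes visible")

    def layoutrelease(self, fh):
        print(f"layout lease for {fh} released")

# Client side: get a layout, resolve devices, do the I/O directly
# against the data servers (the parallel data path), then commit/release.
mds = ToyMetadataServer({"dev-A": ToyDevice(), "dev-B": ToyDevice()})
layout = mds.layoutget("fh-42", offset=0, length=2)
for (device_id, addr), chunk in zip(layout, [b"hello ", b"world"]):
    mds.getdevinfo(device_id).write(addr, chunk)   # direct data path
mds.layoutcommit("fh-42", new_size=11)
mds.layoutrelease("fh-42")
```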

Page 10: Multiple Data Server Protocols
- BE INCLUSIVE!! Broaden the market reach
- Three (or more) flavors of out-of-band metadata attributes:
  - BLOCKS: SBC/FCP/FC or SBC/iSCSI ... for files built on blocks
  - OBJECTS: OSD/iSCSI/TCP/IP/GE for files built on objects
  - FILES: NFS/ONCRPC/TCP/IP/GE for files built on subfiles
- Inode-level encapsulation in server and client code
[Diagram: client apps over a pNFS client file system with a layout driver (1. SBC blocks, 2. OSD objects, 3. NFS files); pNFS server over a local file system; NFSv4 extended with orthogonal layout metadata attributes, with layout metadata grant and revoke.]
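A rough sketch of the inclusiveness point: generic client code treats the layout type as an opaque tag and dispatches to a per-type layout driver. The driver classes and registry below are assumptions for illustration, not structures from the drafts.

```python
# Hypothetical layout-driver dispatch in a pNFS client.

LAYOUT_BLOCKS, LAYOUT_OBJECTS, LAYOUT_FILES = range(3)

class BlockLayoutDriver:
    def read(self, layout, offset, length):
        return f"SBC read of {length} bytes via {layout}"

class ObjectLayoutDriver:
    def read(self, layout, offset, length):
        return f"OSD read of {length} bytes via {layout}"

class FileLayoutDriver:
    def read(self, layout, offset, length):
        return f"NFS read of {length} bytes via {layout}"

DRIVERS = {
    LAYOUT_BLOCKS: BlockLayoutDriver(),
    LAYOUT_OBJECTS: ObjectLayoutDriver(),
    LAYOUT_FILES: FileLayoutDriver(),
}

def pnfs_read(layout_type, layout, offset, length):
    # Generic code never interprets the layout itself; only the
    # matching driver knows how to talk to the data servers.
    return DRIVERS[layout_type].read(layout, offset, length)

print(pnfs_read(LAYOUT_OBJECTS, ["osd://dev-A/obj-7"], 0, 4096))
```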

Page 11: March 2005 Connectathon
- NetApp server prototype by Garth Goodson
  - NFSv4 for the data protocol
  - Layouts are an array of filehandles and stateIDs
  - Storage management protocol is also NFSv4
- Client prototype by Sun: Solaris 10 variant

Page 12: Object Storage
- Object interface is midway between files and blocks (think inode)
  - Create, Delete, Read, Write, GetAttr, SetAttr, ...
- Objects have numeric IDs, not pathnames
- Objects have a capability-based security scheme (details in a moment)
- Objects have extensible attributes to hold high-level FS information
- Based on NASD and OBSD research out of CMU (Gibson et al.)
- SNIA/T10 standards based: V1 complete, V2 in progress
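As a rough illustration of that interface (not the actual T10 OSD command set, which runs over iSCSI with its own byte layouts), here is a toy in-memory object store with numeric IDs, byte-range read/write, and extensible attributes:

```python
# Toy object store; the real thing is the T10 OSD command set.

class ToyObjectStore:
    def __init__(self):
        self.data = {}    # object_id -> bytearray of user data
        self.attrs = {}   # object_id -> dict of extensible attributes

    def create(self, object_id):
        self.data[object_id] = bytearray()
        self.attrs[object_id] = {}

    def delete(self, object_id):
        del self.data[object_id], self.attrs[object_id]

    def write(self, object_id, offset, buf):
        obj = self.data[object_id]
        obj.extend(b"\x00" * max(0, offset + len(buf) - len(obj)))
        obj[offset:offset + len(buf)] = buf

    def read(self, object_id, offset, length):
        return bytes(self.data[object_id][offset:offset + length])

    def setattr(self, object_id, key, value):
        self.attrs[object_id][key] = value

    def getattr(self, object_id, key):
        return self.attrs[object_id][key]

osd = ToyObjectStore()
osd.create(0x2A)                        # numeric ID, no pathname
osd.write(0x2A, 0, b"object data")
osd.setattr(0x2A, "owner", "welch")     # high-level FS info lives in attrs
print(osd.read(0x2A, 0, 6), osd.getattr(0x2A, "owner"))
```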

Page 13: Objects as Building Blocks
- An object is comprised of:
  - User data
  - Attributes
  - Layout
- Interface:
  - ID
  - Read/Write
  - Create/Delete
  - GetAttr/SetAttr
  - Capability-based
- File component: stripe files across storage nodes

Page 14: Objects and pNFS
- Clients speak pNFS to the metadata server
  - Or they will, eventually
- Clients speak iSCSI/OSD to the data servers (OSDs)
- Files are striped across multiple objects (sketched below)
- Metadata manager speaks iSCSI/OSD to the data servers
- Metadata manager uses extended attributes on objects to store metadata
  - High-level file system attributes like owners and ACLs
  - Storage maps
- Directories are just files stored in objects
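Here is a minimal sketch of the striping arithmetic, assuming a simple RAID-0-style layout with a fixed stripe unit and width (both parameters are invented for the example; real layouts and redundancy schemes are richer):

```python
# Map a file offset to (component object index, offset within object).

STRIPE_UNIT = 64 * 1024        # bytes per stripe unit (assumed)
STRIPE_WIDTH = 4               # number of component objects (assumed)

def locate(file_offset):
    stripe_number = file_offset // STRIPE_UNIT
    obj_index = stripe_number % STRIPE_WIDTH
    # Full stripe units this object already holds, plus the offset
    # within the current unit.
    obj_offset = (stripe_number // STRIPE_WIDTH) * STRIPE_UNIT \
                 + file_offset % STRIPE_UNIT
    return obj_index, obj_offset

for off in (0, 65536, 262144, 262144 + 100):
    print(off, "->", locate(off))
```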

Page 15: Object Security (1)
- Clean security model based on shared, secret device keys
  - Part of the T10 standard
- Metadata manager knows the device keys
  - Metadata manager signs capabilities, OSD verifies them (explained on the next slide)
- Metadata server returns a capability to the client
  - Capability encodes: object ID, operation, expire time, data range, cap version, and a signature
  - The data range allows serialization over file data, as well as different rights to different attributes
  - The cap version allows revocation, by changing it on the object
- May want to protect the capability with privacy (encryption) to avoid snooping on the path between metadata manager and client

Page 16: Object Security (2)
- Capability request (client to metadata manager)
  - CapRequest = object ID, operation, data range
- Capability signed with the device key known by the metadata manager
  - CapSignature = {CapRequest, Expires, CapVersion} KeyDevice
  - CapSignature is used as a signing key (!)
- OSD command (client to OSD)
  - Request contains {CapRequest, Expires, CapVersion} + other details + a nonce to prevent replay attacks
  - RequestSignature = {Request} CapSignature
  - OSD can compute CapSignature by signing CapRequest with its own key
  - OSD can then verify RequestSignature
- Caches and other tricks are used to make this go fast
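A compact sketch of the two-level signing scheme, using HMAC-SHA1 as the keyed MAC (an assumption for illustration; the T10 spec fixes the exact algorithm and byte layouts). The key point visible in code: the client never holds the device key, only CapSignature, which doubles as its signing key.

```python
import hmac, hashlib

def mac(key: bytes, msg: bytes) -> bytes:
    # Keyed MAC standing in for the slide's {message} key notation.
    return hmac.new(key, msg, hashlib.sha1).digest()

device_key = b"secret shared by MDS and this OSD"

# Metadata manager: build and sign the capability.
cap_request = b"object=0x2A;op=READ;range=0-65535"
cap_fields  = cap_request + b";expires=1110000000;capversion=3"
cap_signature = mac(device_key, cap_fields)   # {CapRequest, ...} KeyDevice
# cap_fields and cap_signature go to the client; device_key does not.

# Client: sign each OSD request with CapSignature.
request = cap_fields + b";nonce=8f3a"             # nonce prevents replay
request_signature = mac(cap_signature, request)   # {Request} CapSignature

# OSD: recompute CapSignature from its own device key, then verify.
osd_cap_signature = mac(device_key, cap_fields)
ok = hmac.compare_digest(request_signature, mac(osd_cap_signature, request))
print("request verified:", ok)
```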

Page 17: Scalable Bandwidth

Page 18: Per Shelf Bandwidth

Page 19: Scalable NFS

Page 20: Rendering load from 1000 clients

Page 21: Status
- pNFS ad-hoc working group
  - Dec '03 Ann Arbor, April '04 FAST, Aug '04 IETF, Sept '04 Pittsburgh
- Internet Drafts
  - draft-gibson-pnfs-problem-statement-01.txt, July 2004
  - draft-gibson-pnfs-reqs-00.txt, October 2004
  - draft-welch-pnfs-ops-00.txt, October 2004
  - draft-welch-pnfs-ops-01.txt, March 2005
- Next steps
  - Add NFSv4 layout and storage protocol to the current draft
  - Add text for additional error and recovery cases
  - Add separate drafts for object storage and block storage

Page 22: Backup

Page 23: Object Storage File System
- Out-of-band control path
- Direct data path

Page 24: "Out-of-band" Value Proposition
- Out-of-band allows a client to use more than one storage address for a given file, directory, or closely linked set of files
  - Parallel I/O direct from client to multiple storage devices
- Scalable capacity: file/dir uses space on all storage; can get big
- Capacity balancing: file/dir uses space on all storage; evenly
- Load balancing: dynamic access to file/dir over all storage; evenly
- Scalable bandwidth: dynamic access to file/dir over all storage; big
- Lower latency under load: no bottleneck developing deep queues
- Cost-effectiveness at scale: use streamlined storage servers
- pNFS standard leads to standard client SW: share client support $$$

Page 25: Scaling and the Server
- Tension between sharing and throughput
  - File server provides semantics, including sharing
  - Direct-attach I/O provides throughput, but no sharing
- File server is a bottleneck between clients and storage
  - Pressure to make the server ever faster and more expensive
  - Clustered NAS solutions, e.g., Spinnaker
- SAN file systems provide sharing and direct access
  - Asymmetric, out-of-band systems with distinct control and data paths
  - Proprietary solutions, vendor-specific clients
  - Physical security model, which we'd like to improve
[Diagram: clients funneled through a single server.]

Page 26: Symmetric File Systems
- Distribute storage among all the clients
  - GPFS (AIX), GFS, PVFS (user level)
- Reliability issues
  - Compute nodes are less reliable because of the disk
  - Storage is less reliable, unless replication schemes are employed
- Scalability issues
  - Stealing cycles from clients, which have other work to do
  - Coupling of computing and storage
  - Like the early days of engineering workstations: private storage