Petal and Frangipani

Petal/Frangipani
[Figure: layered storage stack. Labels: Petal, Frangipani, NFS, “SAN”, “NAS”.]

Petal/Frangipani
  NFS
    Untrusted
    OS-agnostic
  Frangipani
    FS semantics
    Sharing/coordination
  Petal
    Disk aggregation (“bricks”)
    Filesystem-agnostic
    Recovery and reconfiguration
    Load balancing
    Chained declustering
    Snapshots
    Does not control sharing
  Each “cloud” may resize or reconfigure independently. What indirection is required to make this happen, and where is it?

Remaining Slides
The following slides have been borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath’s site.
For CPS 212, several issues are important:
  Understand the role of each layer in the previous slides, and the strengths and limitations of each layer as a basis for innovating behind its interface (NAS/SAN).
  Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.
  Understand the similarities and differences between Petal and the other reconfigurable cluster service work we have studied: DDS and Porcupine.
  Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.
  Understand the nature, purpose, and role of the three key design elements added for Frangipani: leased locks, a write-ownership consistent caching protocol, and server logging for recovery.

Petal: Distributed Virtual Disks
Edward K. Lee, Chandramohan A. Thekkath
Systems Research Center, Digital Equipment Corporation
4/26/2015

Logical System View
[Figure: clients running AdvFS, NT FS, PC FS, and UFS connect over a scalable network to Petal, which presents the virtual disks /dev/vdisk1 through /dev/vdisk5.]

Physical System View
[Figure: Petal servers attached to a scalable network; a parallel database or cluster file system uses the shared device /dev/shared1.]

Virtual Disks
  Each disk provides a 2^64-byte address space.
  Created and destroyed on demand.
  Allocates disk storage on demand.
  Snapshots via copy-on-write.
  Online incremental reconfiguration.
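The on-demand allocation and copy-on-write snapshot bullets above can be illustrated with a small in-memory model. This is only a sketch under assumptions: the class name `VirtualDisk`, the 8 KB block size, and the dictionary-based block table are hypothetical and are not taken from the Petal slides.

```python
class VirtualDisk:
    """Toy sparse virtual disk with copy-on-write snapshots (illustrative only)."""

    BLOCK_SIZE = 8192  # assumed block size for this sketch

    def __init__(self):
        self.live = {}        # block number -> bytes; only written blocks take space
        self.snapshots = []   # each snapshot is a frozen copy of the block table

    def write(self, block_no, data):
        # Replace the table entry instead of mutating the old block, so any
        # snapshot that still references the old bytes is unaffected.
        self.live[block_no] = bytes(data)

    def read(self, block_no, snapshot=None):
        table = self.live if snapshot is None else self.snapshots[snapshot]
        # Unallocated blocks read back as zeros.
        return table.get(block_no, bytes(self.BLOCK_SIZE))

    def snapshot(self):
        # Copy the block table only; the block contents themselves are shared.
        self.snapshots.append(dict(self.live))
        return len(self.snapshots) - 1

    def allocated_blocks(self):
        return len(self.live)
```

The address space can be huge because storage is consumed only by blocks that have been written, and a snapshot costs one table copy rather than a copy of the data.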

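Virtual to Physical Translation
[Figure: a request (vdiskID, offset) is resolved through the Virtual Disk Directory and the GMap to (server, disk, diskOffset). Each of Server 0 through Server 3 holds its own physical map (PMap0 through PMap3), which maps (vdiskID, offset) to (disk, diskOffset).]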
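The translation path in the figure can be sketched as three lookups. The names `GMap`, `PMap`, and the tuple shapes follow the slide labels; the 64 KB granularity, the round-robin striping policy, and the function `translate` are assumptions made for illustration.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64 * 1024  # assumed translation granularity, not stated on the slide


@dataclass
class GMap:
    """Global map: decides which server is responsible for a given offset."""
    servers: list  # server ids, e.g. [0, 1, 2, 3]

    def server_for(self, offset):
        # Assumed policy: stripe blocks across the servers round-robin.
        return self.servers[(offset // BLOCK_SIZE) % len(self.servers)]


class PMap:
    """Per-server physical map: (vdiskID, block) -> (disk, base diskOffset)."""

    def __init__(self):
        self.entries = {}

    def lookup(self, vdisk_id, offset):
        disk, base = self.entries[(vdisk_id, offset // BLOCK_SIZE)]
        return disk, base + offset % BLOCK_SIZE


def translate(vdir, pmaps, vdisk_id, offset):
    """Resolve (vdiskID, offset) to (server, disk, diskOffset)."""
    gmap = vdir[vdisk_id]                                        # Virtual Disk Directory -> GMap
    server = gmap.server_for(offset)                             # GMap -> server
    disk, disk_offset = pmaps[server].lookup(vdisk_id, offset)   # PMap on that server
    return server, disk, disk_offset
```

Keeping the directory and GMap global while each PMap stays local to one server is the indirection that lets a virtual disk be reconfigured without the client's view of (vdiskID, offset) changing.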

Global State Management
  Based on Leslie Lamport’s Paxos algorithm.
  Global state is replicated across all servers.
  Consistent in the face of server and network failures.
  A majority is needed to update global state.
  Any server can be added or removed in the presence of failed servers.

Fault-Tolerant Global Operations
  Create/delete virtual disks.
  Snapshot virtual disks.
  Add/remove servers.
  Reconfigure virtual disks.

Data Placement & Redundancy
  Supports non-redundant and chained-declustered virtual disks.
  Parity can be supported if desired.
  Chained declustering tolerates any single component failure.
  Tolerates many common multiple failures.
  Throughput scales linearly with additional servers.
  Throughput degrades gracefully with failures.

Chained Declustering
  Server0  Server1  Server2  Server3
  D0       D1       D2       D3
  D3       D0       D1       D2
  D4       D5       D6       D7
  D7       D4       D5       D6

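Chained Declustering
[Figure: the same layout with Server1 separated from its data (D1, D0, D5, D4). Server0 still holds D0, D3, D4, D7; Server2 holds D2, D1, D6, D5; Server3 holds D3, D2, D7, D6, so every block on Server1 has a copy on a neighbor.]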
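The layout on the two Chained Declustering slides follows a simple rule: block Di has one copy on server i mod N and a second copy on the next server in the chain. The sketch below reproduces that placement and shows how a read falls over to the neighbor. The function names are hypothetical, and the real Petal server logic (load balancing between copies, write handling) is not modeled.

```python
def placement(block, n_servers):
    """Return (primary, secondary) server indices for data block D<block>."""
    primary = block % n_servers
    secondary = (primary + 1) % n_servers  # next server in the chain
    return primary, secondary


def layout(n_servers, n_blocks):
    """Map each server index to the set of block numbers it stores."""
    held = {s: set() for s in range(n_servers)}
    for block in range(n_blocks):
        for server in placement(block, n_servers):
            held[server].add(block)
    return held


def read_server(block, n_servers, failed=frozenset()):
    """Pick a live server that holds the block, preferring the primary."""
    for server in placement(block, n_servers):
        if server not in failed:
            return server
    raise IOError(f"both copies of D{block} are unavailable")
```

With four servers and eight blocks, `layout(4, 8)` gives Server0 the blocks D0, D3, D4, D7, matching the slide. If Server1 fails, reads of D1 and D5 go to Server2 while reads of D0 and D4 stay on Server0, so the extra load is split between the two neighbors.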

The Prototype
  Digital ATM network.
    155 Mbit/s per link.
  8 AlphaStation Model MHz Alpha running Digital Unix.
  72 RZ29 disks.
    4.3 GB, 3.5 inch, fast SCSI (10 MB/s).
    9 ms avg. seek, 6 MB/s sustained transfer rate.
  Unix kernel device driver.
  User-level Petal servers.

The Prototype
[Figure: hosts src-ss1, src-ss2, ..., src-ss8 and petal1, petal2, ..., petal8 connected by the Digital ATM Network (AN2); virtual disk /dev/vdisk1.]

Throughput Scaling

Virtual Disk Reconfiguration
[Figure labels: 6 servers, 8 servers. Workload: virtual disk with 1 GB of allocated storage; 8 KB reads and writes.]

Frangipani: A Scalable Distributed File System
C. A. Thekkath, T. Mann, and E. K. Lee
Systems Research Center, Digital Equipment Corporation

Why Not An Old File System on Petal?
  Traditional file systems (e.g., UFS, AdvFS) cannot share a block device
  The machine that runs the file system can become a bottleneck

Frangipani
  Behaves like a local file system
    multiple machines cooperatively manage a Petal disk
    users on any machine see a consistent view of data
  Exhibits good performance, scaling, and load balancing
  Easy to administer

Ease of Administration
  Frangipani machines are modular
    can be added and deleted transparently
  Common free space pool
    users don’t have to be moved
  Automatically recovers from crashes
  Consistent backup without halting the system

Components of Frangipani
  File system core
    implements the Digital Unix vnode interface
    uses the Digital Unix Unified Buffer Cache
    exploits Petal’s large virtual space
  Locks with leases
  Write-ahead redo log

Locks
  Multiple reader/single writer
  Locks are moderately coarse-grained
    each lock protects an entire file or directory
  Dirty data is written to disk before a lock is given to another machine
  Each machine aggressively caches locks
    uses lease timeouts for lock recovery
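The combination of multiple-reader/single-writer locks and lease timeouts can be sketched as a small lock table. Everything here is an assumption made for illustration: the class `LockService`, its method names, the 30-second default lease, and the use of an explicit `now` argument instead of a real clock. The sketch also drops an expired machine's locks immediately, whereas a real system would first replay that machine's log.

```python
class LockService:
    """Toy multiple-reader/single-writer lock table with leases (illustrative only)."""

    def __init__(self, lease_seconds=30.0):  # lease length is an assumed value
        self.lease_seconds = lease_seconds
        self.leases = {}    # machine -> time at which its lease expires
        self.holders = {}   # lock name -> {"mode": "r" or "w", "machines": set}

    def renew(self, machine, now):
        self.leases[machine] = now + self.lease_seconds

    def _drop_expired(self, now):
        # A machine whose lease has run out loses every lock it held.
        expired = {m for m, t in self.leases.items() if t <= now}
        for m in expired:
            del self.leases[m]
        for name in list(self.holders):
            entry = self.holders[name]
            entry["machines"] -= expired
            if not entry["machines"]:
                del self.holders[name]

    def acquire(self, machine, name, mode, now):
        """Try to take lock `name` in mode "r" or "w"; return True if granted."""
        self._drop_expired(now)
        self.renew(machine, now)  # in this sketch, any request renews the lease
        entry = self.holders.get(name)
        if entry is None:
            self.holders[name] = {"mode": mode, "machines": {machine}}
            return True
        if entry["machines"] == {machine}:
            if mode == "w":
                entry["mode"] = "w"  # sole holder may upgrade to writer
            return True
        if mode == "r" and entry["mode"] == "r":
            entry["machines"].add(machine)  # readers share the lock
            return True
        # Conflict: the current holders would be asked to write back dirty
        # data and release before the lock can move to the requester.
        return False

    def release(self, machine, name):
        entry = self.holders.get(name)
        if entry is not None:
            entry["machines"].discard(machine)
            if not entry["machines"]:
                del self.holders[name]
```

Because a machine keeps a lock until someone else asks for it, repeated access to the same file needs no further lock traffic, and the lease bounds how long a crashed machine can block the others.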

Logging
  Frangipani uses a write-ahead redo log for metadata
    log records are kept on Petal
  Data is written to Petal
    on sync, fsync, or every 30 seconds
    on lock revocation or when the log wraps
  Each machine has a separate log
    reduces contention
    independent recovery

Recovery
  Recovery is initiated by the lock service
  Recovery can be carried out on any machine
    log is distributed and available via Petal
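The Logging and Recovery slides together describe a per-machine write-ahead redo log stored on the shared disk, which any surviving machine can replay. The following is a minimal sketch of that idea. The classes `Petal` and `Machine`, the function `recover`, and the per-block version numbers used to make replay idempotent are assumptions for illustration and are not stated on the slides.

```python
class Petal:
    """Stand-in for the shared virtual disk that every machine can reach."""

    def __init__(self):
        self.blocks = {}  # address -> (version, value)
        self.logs = {}    # machine name -> list of (address, version, value)


class Machine:
    def __init__(self, name, petal):
        self.name = name
        self.petal = petal
        self.dirty = {}   # cached metadata updates not yet written in place
        petal.logs.setdefault(name, [])

    def _current_version(self, addr):
        if addr in self.dirty:
            return self.dirty[addr][0]
        return self.petal.blocks.get(addr, (0, None))[0]

    def update_metadata(self, addr, value):
        version = self._current_version(addr) + 1
        # Write-ahead: the redo record reaches the log before the update
        # is applied in place.
        self.petal.logs[self.name].append((addr, version, value))
        self.dirty[addr] = (version, value)

    def flush(self):
        # Apply cached updates in place, then the log records are no longer needed.
        self.petal.blocks.update(self.dirty)
        self.dirty.clear()
        self.petal.logs[self.name].clear()


def recover(petal, failed_machine):
    """Replay a failed machine's log; can run on any machine that reaches Petal."""
    for addr, version, value in petal.logs[failed_machine]:
        # Skip records that were already applied before the crash.
        if petal.blocks.get(addr, (0, None))[0] < version:
            petal.blocks[addr] = (version, value)
    petal.logs[failed_machine].clear()
```

Since each machine writes only to its own log, logging causes no contention between machines, and recovering one machine does not require the others to stop.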