FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment Presented by: Boon Thau Loo CS294-4 (Adapted from Adya's OSDI '02 slides)

Outline Introduction System Architecture File System features Evaluation Conclusion

FARSITE Goals A symbiotic, serverless, distributed file system Symbiotic – works among cooperating but not completely trusted clients. Serverless – runs entirely on client machines. Logically centralized but physically distributed. Ensures user privacy and data integrity. Scalable and efficient. Easy management and self-configuration.

Why serverless? Servers are more powerful, better secured, and better maintained, but… Reliant on system administrators Do you trust them with your data? Are they competent? Expensive (special hardware) High-performance I/O, RAID disks, special rooms Centralized points of failure High-value targets for attacks

Target Environment Large university or company High-bandwidth, low-latency network Clients are cooperative but not completely trusted Rough scale: ~10^5 total machines Machine availability: lower than dedicated servers, higher than Internet hosts. Uncorrelated machine downtimes. Workload: high access locality, low persistent update rates, usually sequential but rarely concurrent read/write sharing. Small but significant fraction of malicious users. No user-sensitive data persists beyond user logoff or reboot.

FARSITE Non-Goals Efficient large-scale write sharing Transactional semantics Versioning User-specifiable importance High-performance parallel I/O Disconnected operation with offline conflicts (Coda-like)

Enabling Technologies Availability: enough disk space for replicas Low disk costs Unused disk capacity Duplicate files: ~50% space savings Privacy: fast crypto Symmetric encryption: 72 MB/sec One-way hashing bandwidth: 53 MB/sec Disk sequential I/O bandwidth: 32 MB/sec Computing RSA signature: 6.5 msec < Rotation time for a 7200-RPM disk.

System Architecture File system construct: Hierarchical directory namespace Namespace - Logical repository of files (a directory). Namespace root – A virtual file server Created by administrator. Machine roles: Client – Directly interacts with users Directory group member – Manages file metadata. File host – Stores encrypted replicas of data files.

System Architecture (Cont…) Directory Group: Manages file metadata for a namespace Machines for the namespace root are chosen by the administrator. A subtree of the namespace can be delegated to a subgroup under heavy load. Separate data and metadata: Use a Byzantine agreement protocol in the directory group to protect metadata against malicious clients. Replicate and encrypt data on file hosts. Store a cryptographically secure hash of the data in the directory group for validation.

System Architecture (Cont…)

Certificates Machine certificates Associate a machine with its own public key. Establish the validity of the machine. Namespace certificates Associate namespace roots with the set of managing machines. The administrator grants the root certificate. User certificates Associate users with their public keys For read/write access control to files. Certification authorities (CAs) sign these certificates.

Client reads a file… 1. Send a message to the directory group. 2. Directory group proves its authority (recursively to the root) using namespace certificates. 3. Directory group grants the client: A lease on the file for a period of local access A one-way hash of the file contents for validation A list of file hosts storing the data 4. Client retrieves a replica from a file host: Validates content using the one-way hash Decrypts using its private key Works on a local cached copy for the lease period.
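The validation in step 4 is the only thing that lets the client trust an untrusted file host: a replica is accepted only if it hashes to the value the directory group vouched for. A minimal Python sketch of that check; `fetch_and_validate` and the host map are illustrative names, not FARSITE's actual interfaces.

```python
import hashlib

def fetch_and_validate(file_hosts, expected_hash):
    """Try replicas until one matches the hash vouched for by the
    directory group. `file_hosts` maps a host name to the (possibly
    corrupted or tampered) bytes that host returns."""
    for host, data in file_hosts.items():
        if hashlib.sha256(data).hexdigest() == expected_hash:
            return host, data  # first replica that passes validation
    raise IOError("no file host returned a valid replica")
```

A malicious or faulty host can serve garbage, but it cannot forge data that matches the directory group's hash, so the client simply falls through to the next replica.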

Client writes to a file… 1. Client sends an updated hash of the file contents to the directory group. 2. Directory group verifies the user's permission to write to the file using the user certificate. 3. Once verified, the directory group directs file hosts to retrieve the new data from the client. 4. Leases held by other clients may be recalled by the directory group to satisfy the new update request.

File System Features Reliability and Availability Security Consistency Scalability Efficiency Manageability Differences from NTFS

Reliability and Availability Goals: Long-term persistence Immediate accessibility of file data on request. Directories/metadata maintained via BFT Needs replicas to protect against malicious nodes. 3f + 1 replicas to tolerate f faults Files replicated via simple replication File replicas ensure a high degree of availability. f + 1 replicas to tolerate f faults Data migration & repair Away from unavailable machines Swap machine locations between high/low availability file replicas.
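The two replication factors above follow from the two fault models: metadata must survive f *malicious* directory members (Byzantine agreement needs 3f + 1), while file contents only need one surviving copy, since any copy can be validated against the metadata hash. A trivial sketch of the arithmetic (function names are ours, for illustration):

```python
def metadata_replicas(f: int) -> int:
    # Byzantine agreement tolerates f malicious members with 3f + 1 replicas
    return 3 * f + 1

def data_replicas(f: int) -> int:
    # File data need only survive f unavailable hosts: f + 1 copies suffice,
    # because any surviving copy is validated against the directory group's hash
    return f + 1
```

This asymmetry is why FARSITE keeps metadata small: the expensive 3f + 1 Byzantine replication applies only to hashes and directory state, not to file contents.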

Security Access control Directory groups maintain an access control list (ACL) of public keys of all authorized writers. Privacy During file creation, client generates a random file key used to encrypt the file. File key is encrypted with public keys of all authorized readers of the file. Encrypted file keys are stored with file. Data Integrity Merkle hash tree over file data blocks. Copy of tree stored with file. Copy of root hash kept in directory group. Logarithmic (in file size) validation time for any file block. Linear validation time for entire file.
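The logarithmic validation cost quoted above comes from the Merkle tree structure: verifying one block needs only the sibling hashes along its root path. A self-contained Python sketch under our own naming (FARSITE's actual tree layout may differ); it builds the tree over file blocks and checks a single block against the root hash held by the directory group.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_merkle(blocks):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate last node on odd levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof_for(levels, index):
    """Sibling hashes needed to verify one block: O(log n) of them."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sib = index ^ 1
        path.append((level[sib], sib < index))  # (hash, sibling-is-on-left?)
        index //= 2
    return path

def verify_block(block, path, root):
    """Recompute the root path for one block and compare to the trusted root."""
    node = h(block)
    for sib, sib_is_left in path:
        node = h(sib + node) if sib_is_left else h(node + sib)
    return node == root
```

Only the root hash lives in the (Byzantine-replicated) directory group; the tree itself travels with the file, so a client validates any single block in time logarithmic in the file size.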

Durability FARSITE does not update all replicas of a file at once: that would be too slow. It uses a background, delayed update mechanism. Updates to metadata are written to compressed logs on the client's local disk. The log is sent to the directory group periodically, and at end-of-lease.

Consistency Data consistency Content leases with expiration – Read/write or read-only Single-writer multi-reader semantics A request for a conflicting lease triggers a recall of the lease by the directory group. Single-client serialization of concurrent non-read accesses. Inappropriate for large-scale write sharing. Other leases (refer to paper) Name leases – Allow clients to modify private namespace regions without contacting the directory group. Mode leases – Windows file-sharing semantics (read, write, delete, exclude-read, exclude-write, exclude-delete) Access leases – Windows deletion semantics (public, protected, private)
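The single-writer/multi-reader recall behavior can be sketched as a tiny lease table for one file. This is a toy model under our own naming, not FARSITE's lease protocol: a conflicting request forces the directory group to recall the outstanding leases (here, just recorded and dropped).

```python
class ContentLeaseManager:
    """Toy single-writer/multi-reader content leases for one file."""
    def __init__(self):
        self.readers = set()
        self.writer = None
        self.recalled = []   # leases the directory group had to recall

    def request_read(self, client):
        if self.writer is not None:          # read conflicts with a writer
            self.recalled.append(self.writer)
            self.writer = None
        self.readers.add(client)

    def request_write(self, client):
        # a write lease conflicts with every outstanding lease
        self.recalled.extend(sorted(self.readers))
        if self.writer is not None:
            self.recalled.append(self.writer)
        self.readers.clear()
        self.writer = client
```

Recalls are what make content leases unsuitable for large-scale write sharing: every new writer forces a round of recalls to all current holders.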

Efficiency Space efficiency Users share identical files and applications. Detect identical files using file-content hashes. Half of all occupied disk space can be reclaimed. Convergent encryption –  Use a one-way hash of each file block as the key to encrypt that block.  The user's file key encrypts the hashes rather than the file blocks. Time efficiency (performance) Local caches (data and pathnames) Delay between file creation/updates and replica creation. Merkle trees for data integrity. Limit leases per file.
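Convergent encryption makes deduplication possible despite encryption: because the key is derived from the block's own content, two users encrypting the same block produce the same ciphertext. A minimal Python sketch; the SHA-256 counter-mode keystream is a stand-in toy cipher for illustration (FARSITE would use a real symmetric cipher such as those benchmarked earlier in the talk).

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy stream cipher: SHA-256 in counter mode (illustration only)."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def convergent_encrypt(block: bytes):
    key = hashlib.sha256(block).digest()   # key = one-way hash of the block
    cipher = bytes(a ^ b for a, b in zip(block, keystream(key, len(block))))
    return key, cipher                      # key is then wrapped per-user

def convergent_decrypt(key: bytes, cipher: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(cipher, keystream(key, len(cipher))))
```

Each user's file key encrypts only the per-block keys, so identical blocks stored by different users collapse to one ciphertext on the file hosts while remaining unreadable to anyone who never held the plaintext.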

Manageability Local-machine administration Removing an old disk is a special case of hardware failure. Supports different versions of software Argues that backup processes are unnecessary with FARSITE  Replicas are present anyway.  Versioning systems are more effective for archival purposes.

Differences from NTFS Snapshot handle vs. true file handle Overheads of consistency management (lease recalls and maintenance) Limits on outstanding leases per file A client may be granted a snapshot that can become stale. Lazy directory rename operations Potentially ~10^5 clients sharing a namespace.

Evaluation Method: 5 FARSITE machines 1-hour representative trace taken from 3-week traces Replayed in real time (~450,000 operations) Measured file operation latency Results (untuned code): 20% faster than Microsoft CIFS 5.5× slower than NTFS

Conclusion A scalable, decentralized, network file system. Uses loosely coupled, unsecured, and unreliable machines to establish a secure and reliable virtual file server. A synthesis of known techniques: replication, BFT, leases, namespaces, client caching, lazy updates, cryptography, certificates, etc. Would you use FARSITE?

Related work Distributed file systems NFS, AFS, xFS, Frangipani, Coda, etc. Peer-to-peer immutable storage systems CFS, PAST, Eternity Service, Freenet, PASIS, Intermemory Peer-to-peer mutable storage systems OceanStore, Pangaea, Ivy