CalvinFS: Consistent WAN Replication and Scalable Metadata Management for Distributed File Systems. Thomas Kao.

Presentation transcript:

CalvinFS: Consistent WAN Replication and Scalable Metadata Management for Distributed File Systems. Thomas Kao

Background
- Scalable solutions have been built for data storage; why not for file systems?

Motivation
- Distributed file systems are often bottlenecked by the metadata management layer
- Availability is susceptible to data center outages
- CalvinFS addresses both while still providing expected file system semantics

Key Contributions
- A distributed database system for scalable metadata management
- Strongly consistent geo-replication of file system state

Calvin: Log
- Many front-end servers
- Asynchronously replicated distributed block store
- Small number of "meta-data" log servers
- Transaction requests are replicated and appended, in order, to the "meta log"
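
The ordering described above can be sketched as a tiny append-only log (a hypothetical illustration; the class and method names are invented, and real log servers batch requests and replicate each batch via Paxos):

```python
# Hypothetical sketch of the "meta log": front-end servers submit
# transaction requests, log servers append them in a single global
# order, and every replica replays the same sequence.
class MetaLog:
    def __init__(self):
        self.entries = []  # ordered log of request batches (replicated in reality)

    def append_batch(self, requests):
        """Append a batch of transaction requests; return its log position."""
        self.entries.append(list(requests))
        return len(self.entries) - 1

    def read_from(self, position):
        """Replicas replay batches in log order starting at `position`."""
        for batch in self.entries[position:]:
            for request in batch:
                yield request

log = MetaLog()
log.append_batch(["create /a", "write /a"])
log.append_batch(["delete /a"])
assert list(log.read_from(0)) == ["create /a", "write /a", "delete /a"]
```

Because every replica reads the identical sequence, deterministic execution yields the same state everywhere without a distributed commit protocol.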

Calvin: Storage Layer
- Knowledge of physical data store organization and actual transaction semantics
- Read/write primitives that execute on one node
- Placement manager
- Multiversion key-value store at each node, plus a consistent hashing mechanism
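
A minimal sketch of how consistent hashing might place keys on storage nodes (illustrative only; the `HashRing` class, virtual-node count, and use of MD5 are assumptions, not CalvinFS's actual placement code):

```python
import hashlib

# Consistent-hashing sketch: each node gets several points ("virtual
# nodes") on a hash ring; a key is owned by the first node clockwise
# from the key's own hash.
class HashRing:
    def __init__(self, nodes, vnodes=8):
        self.ring = sorted(
            (self._h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )

    @staticmethod
    def _h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def owner(self, key):
        h = self._h(key)
        for point, node in self.ring:
            if point >= h:
                return node
        return self.ring[0][1]  # wrap around the ring

ring = HashRing(["node-a", "node-b", "node-c"])
# The same key deterministically maps to the same node:
assert ring.owner("/home/user/file") == ring.owner("/home/user/file")
```

Virtual nodes smooth out load across machines, and adding or removing a node only moves the keys adjacent to its ring points.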

Calvin: Scheduler
- Drives local transaction execution
- Fully examines each transaction before execution
- Deterministic locking
- Transaction protocol (no distributed commit protocol):
  1. Perform all local reads
  2. Serve remote reads
  3. Collect remote read results
  4. Execute transaction to completion
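
The four protocol steps above can be sketched as a single-node execution function (a simplified illustration; the transaction representation, `send`/`recv` callbacks, and field names are invented for this sketch):

```python
# Hypothetical sketch of deterministic transaction execution on one node:
# perform local reads, exchange them with the other participants, then
# execute to completion -- no distributed commit protocol is needed.
def execute_transaction(txn, local_store, send, recv):
    # 1. Perform all local reads from this node's partition.
    local_reads = {k: local_store[k] for k in txn["read_set"]
                   if k in local_store}
    # 2. Serve local read results to the other participating nodes.
    for peer in txn["participants"]:
        send(peer, local_reads)
    # 3. Collect remote read results until the full read set is known.
    reads = dict(local_reads)
    while len(reads) < len(txn["read_set"]):
        reads.update(recv())
    # 4. Execute deterministically; apply the writes this node owns.
    writes = txn["logic"](reads)
    for k, v in writes.items():
        if k in local_store or k in txn["write_set"]:
            local_store[k] = v
    return writes

store = {"x": 1}
txn = {"read_set": ["x"], "write_set": ["x"], "participants": [],
       "logic": lambda reads: {"x": reads["x"] + 1}}
execute_transaction(txn, store, send=lambda peer, msg: None, recv=lambda: {})
assert store == {"x": 2}
```

Since every replica runs the same logic on the same log-ordered input, all copies reach identical states without coordinating at commit time.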

CalvinFS Architecture
Components:
- Block store
- Calvin database
- Client library
Design principles:
- Main-memory metadata store
- Potentially many small files
- Scalable read/write throughput
- Tolerate slow writes
- Linearizable and snapshot reads
- Hash-partitioned metadata
- Optimize for single-file operations

CalvinFS Block Store
- Variable-size immutable blocks, 1 byte to 10 megabytes
- Block storage and placement: each block has a unique ID and is assigned to a block "bucket"; a global Paxos-replicated config file maps buckets to servers
- Compacts small blocks
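
A minimal sketch of this ID-to-bucket placement scheme (illustrative; the `BlockStore` class, the bucket count, and the use of UUIDs for block IDs are assumptions, and the bucket map stands in for the Paxos-replicated config file):

```python
import uuid

# Immutable variable-size blocks: each block gets a unique ID, IDs hash
# to a fixed number of buckets, and a (replicated, here just a dict)
# config maps each bucket to its replica servers.
NUM_BUCKETS = 16

class BlockStore:
    def __init__(self, bucket_map):
        self.bucket_map = bucket_map  # bucket -> list of replica servers
        self.blocks = {}              # block_id -> immutable bytes

    def put(self, data):
        block_id = uuid.uuid4().hex   # unique ID; the block is never modified
        self.blocks[block_id] = bytes(data)
        return block_id

    def bucket_of(self, block_id):
        return int(block_id, 16) % NUM_BUCKETS

    def replicas(self, block_id):
        return self.bucket_map[self.bucket_of(block_id)]

config = {b: [f"server-{b % 3}"] for b in range(NUM_BUCKETS)}
store = BlockStore(config)
bid = store.put(b"hello")
assert store.blocks[bid] == b"hello"
```

Because blocks are immutable, replicas never need to coordinate on updates; a "write" to a file just creates new blocks and updates metadata.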

CalvinFS Metadata Management
- Key-value store
- Key: absolute path of the file/directory
- Value: entry type, permissions, contents

Metadata Storage Layer
Six transaction types:
- Read(path)
- Create{File, Dir}(path)
- Resize(path, size)
- Write(path, file_offset, source, source_offset, num_bytes)
- Delete(path)
- EditPermissions(path, permissions)
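
The path-keyed layout and a couple of these operations can be sketched as follows (a simplified illustration; the entry fields and the block-reference format are assumptions, and `write` here only appends a block reference rather than implementing the full Write transaction):

```python
# Metadata key-value sketch: the key is the absolute path, the value
# records entry type, permissions, and contents (for files, a list of
# block references; for directories, child names).
metadata = {}

def create_file(path, permissions="rw-r--r--"):
    metadata[path] = {"type": "file", "permissions": permissions,
                      "contents": []}  # list of (block_id, num_bytes)

def create_dir(path, permissions="rwxr-xr-x"):
    metadata[path] = {"type": "dir", "permissions": permissions,
                      "contents": []}  # names of child entries

def write(path, block_id, num_bytes):
    """Append a block reference (simplified from the paper's Write txn)."""
    metadata[path]["contents"].append((block_id, num_bytes))

def read(path):
    return metadata.get(path)

create_dir("/home")
create_file("/home/a.txt")
write("/home/a.txt", "blk-1", 5)
assert read("/home/a.txt")["contents"] == [("blk-1", 5)]
```

Keying by absolute path is what makes single-file operations cheap: one lookup resolves the entry with no per-directory traversal.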

Recursive Operations on Directories
- Use OLLP (Optimistic Lock Location Prediction)
- Analyze phase: determines the affected entries and the read/write set
- Run phase: checks that the read/write set has not grown
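
The analyze/run loop can be sketched for a recursive permissions change (an illustrative sketch; `recursive_chmod`, the tuple-valued entries, and the retry limit are invented, and the log-append step between the two phases is elided to a comment):

```python
# OLLP sketch: an "analyze" pass predicts the read/write set cheaply,
# the transaction is then appended to the log with that prediction, and
# the "run" phase aborts and retries if the set grew in the meantime.
def affected_entries(metadata, dir_path):
    prefix = dir_path.rstrip("/") + "/"
    return {p for p in metadata if p == dir_path or p.startswith(prefix)}

def recursive_chmod(metadata, dir_path, permissions, max_retries=5):
    for _ in range(max_retries):
        predicted = affected_entries(metadata, dir_path)  # analyze phase
        # ... transaction appended to the log with `predicted` here ...
        actual = affected_entries(metadata, dir_path)     # run-phase check
        if actual <= predicted:
            for path in actual:
                entry_type, _ = metadata[path]
                metadata[path] = (entry_type, permissions)
            return True
        # Read/write set grew (e.g. a file was created); restart.
    return False

meta = {"/d": ("dir", "rwx"), "/d/f": ("file", "rw-"), "/e": ("file", "rw-")}
assert recursive_chmod(meta, "/d", "r--")
```

The optimism pays off because directory trees rarely change between the two phases; when they do, the transaction simply re-analyzes and retries.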

Performance: File Counts and Memory Usage
- 10 million files of varying size per machine
- Far less memory used per machine than HDFS
- Handles many more files than HDFS

Performance: Throughput
- Read and append throughput scale linearly with cluster size; file-creation throughput scales sub-linearly

Performance: Latency
- Write/append latency is dominated by WAN replication

Performance: Fault Tolerance
- Able to tolerate data center outages with little to no hit to availability

Discussion
Pros:
- Fast metadata management
- Deployments scale on large clusters
- Huge storage capacity
- High throughput for reads and updates
- Resistant to data center outages
Cons:
- File creation is a distributed transaction and doesn't scale well
- Metadata operations must recursively modify all entries in the affected subtree
- File fragmentation is addressed with a mechanism that entirely rewrites files

Discussion Questions
- Can it really support an unlimited number of files?
- What about larger files?