The Chubby lock service for loosely-coupled distributed systems Mike Burrows, Google Inc. Presented By: Harish Rayapudi, Shiva Prasad Malladi

Overview 1. Introduction 2. Why Chubby? 3. System Structure 4. Files, directories and handles 5. Locks and Sequencers 6. Events and API 7. Caching 8. Sessions and KeepAlives 9. Fail-overs and Backup 10. Scaling mechanisms 11. Summary

Introduction The Chubby lock service provides coarse-grained locking as well as reliable storage for loosely-coupled distributed systems. Chubby's design concentrates on availability and reliability rather than high performance. A Chubby instance, known as a Chubby cell, might serve ten thousand 4-processor machines connected by a high-speed LAN. Clients use the lock service to synchronize their activities and to agree on basic information about their environment. The Google File System (GFS) uses Chubby to elect a master server; before Chubby, Google used ad hoc methods for primary election.

Why Chubby? Google could have used a library implementing Paxos instead of a lock service, but a lock service is better in the following ways: 1. A lock service makes it easier to add availability, reliability, and primary election to an existing system than a consensus library does. 2. Serving small files through the lock service (for example, to advertise a primary's identity) reduces the number of servers that clients must depend on. 3. A lock-based interface is familiar to most programmers. 4. A lock service enables a client system to make progress correctly even when fewer than a majority of its own members are up.

System Structure 1. Chubby has two main components, a server and a client library, which communicate via RPC. 2. A Chubby cell consists of a small set of servers called replicas. 3. The replicas elect a master, and only the master may initiate reads and writes.

System Structure 4. Clients find the master by asking the replicas; once they have located it, they direct all their requests to the master. 5. Write requests are propagated by the master to all replicas, while read requests are satisfied by the master alone, which is safe as long as the master's lease has not expired. 6. If the master fails, its lease expires and the remaining replicas elect a new master. 7. If a replica fails and does not recover, a simple replacement system selects a fresh machine from a free pool.
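
Below is a minimal, illustrative Python sketch (not Google's client library) of how a client might locate the master by asking the replicas and then direct requests to it; the replica names and the ask_who_is_master()/send_request() helpers are hypothetical stand-ins.

    # Hypothetical stand-ins for the master-location RPCs described above.
    REPLICAS = ["replica-1", "replica-2", "replica-3", "replica-4", "replica-5"]

    def ask_who_is_master(replica):
        # Stand-in for a master-location RPC; here every replica names replica-3.
        return "replica-3"

    def find_master(replicas):
        for r in replicas:
            master = ask_who_is_master(r)
            if master is not None:
                return master
        raise RuntimeError("no replica answered")

    def send_request(master, request):
        # Non-master replicas would redirect the client back to the master;
        # here we simply show where the request goes.
        print(f"sending {request!r} to {master}")

    master = find_master(REPLICAS)
    send_request(master, "open /ls/foo/wombat/pouch")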

Files, directories, and handles Chubby exports a file-system interface similar to that of UNIX: a tree of files and directories, with name components separated by slashes. For example, in the name /ls/foo/wombat/pouch, the ls prefix is common to all Chubby names and stands for lock service; foo is the name of the Chubby cell, which is resolved to one or more Chubby servers via a DNS lookup; and /wombat/pouch is the path of the directory and file within that cell.
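
To make the naming structure concrete, here is a toy Python parser for names of this form; it is purely illustrative and not part of any real Chubby API.

    def parse_chubby_name(name):
        # Chubby-style names look like /ls/<cell>/<path...>.
        parts = name.strip("/").split("/")
        if len(parts) < 2 or parts[0] != "ls":
            raise ValueError("Chubby names start with /ls/<cell>")
        cell = parts[1]                       # resolved to servers via a DNS lookup
        path = "/" + "/".join(parts[2:])      # directory/file path within the cell
        return cell, path

    print(parse_chubby_name("/ls/foo/wombat/pouch"))  # ('foo', '/wombat/pouch')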

Files, directories, and handles contd. In Chubby, access to a file requires permissions on the file itself rather than on its directory. For example, if file F's write ACL name is foo, and the file foo in the ACL directory contains an entry bar, then user bar has permission to write F. The name space contains files and directories, collectively called nodes. Nodes may be either permanent or ephemeral; ephemeral files are used as temporary files and to indicate to others that a client is alive.

Files, directories, and handles contd. Handles are similar to UNIX file descriptors. They include: check digits that prevent clients from creating or guessing handles; a sequence number that tells a master whether the handle was created by it or by an earlier master; and mode information provided at open time that allows a newly restarted master to recreate its state when presented with an old handle.
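
A small Python sketch of the bookkeeping a handle might carry, based only on the description above; the field names and values are illustrative, not Chubby's actual representation.

    from dataclasses import dataclass

    @dataclass
    class Handle:
        node_path: str        # the node this handle refers to
        check_digits: int     # prevent clients from forging or guessing handles
        sequence_number: int  # lets a master tell its own handles from those
                              # created by a previous master
        mode: str             # open-time mode info, so a newly restarted master
                              # can recreate its state for an old handle

    h = Handle("/ls/foo/wombat/pouch", check_digits=0x5A3C,
               sequence_number=42, mode="read|write")
    print(h)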

Locks and sequencers In Chubby, a client handle may hold a lock in one of two modes: exclusive (writer) mode or shared (reader) mode. Locks are advisory: holding the lock named F is not necessary to access the file F, and holding it does not prevent other clients from doing so (unlike mandatory locks). Acquiring a lock requires write permission, so an unprivileged reader cannot prevent a writer from making progress. Chubby also introduces sequencers: a lock holder may request a sequencer, a byte string that describes the lock and its state.

Locks and sequencers contd. A sequencer consists of the name of the lock, the mode in which it was acquired (exclusive or shared), and the lock generation number. For servers that cannot check sequencers, Chubby provides an imperfect but easier mechanism to reduce the risk of delayed or re-ordered requests: if a lock becomes free because its holder has failed or become inaccessible, no other client may claim it for a period known as the lock-delay, which protects against faulty clients.
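
The Python sketch below shows how a resource server might check a sequencer before acting on a request. The Sequencer fields follow the description above, while the current_generation lookup and the request_is_safe() helper are assumptions made for illustration (in practice the server would consult Chubby or a cache of CheckSequencer results).

    from dataclasses import dataclass

    @dataclass
    class Sequencer:
        lock_name: str
        mode: str        # "exclusive" or "shared"
        generation: int  # lock generation number

    # Stand-in for asking Chubby which generation of the lock is current.
    current_generation = {"/ls/foo/service-leader": 7}

    def request_is_safe(seq: Sequencer) -> bool:
        # For a write request the lock must be held in exclusive mode, and the
        # generation must match; a stale holder (lost and reacquired lock) fails.
        return (seq.mode == "exclusive"
                and current_generation.get(seq.lock_name) == seq.generation)

    print(request_is_safe(Sequencer("/ls/foo/service-leader", "exclusive", 7)))  # True
    print(request_is_safe(Sequencer("/ls/foo/service-leader", "exclusive", 6)))  # False: stale holder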

Events Chubby clients may subscribe to a range of events, including: file contents modified; child node added, removed, or modified; Chubby master failed over; a handle has become invalid; a lock has been acquired; a conflicting lock request from another client. Events are delivered only after the corresponding action has taken place.
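
The toy Python sketch below illustrates the delivery model: callbacks are chosen when a node is opened and invoked only after the corresponding action has happened. The open_node() and dispatch_event() helpers are hypothetical stand-ins, not Chubby's API.

    subscriptions = {}

    def open_node(path, events):
        subscriptions[path] = events          # callbacks chosen at open time
        return path                           # stand-in for a handle

    def dispatch_event(path, event, detail=None):
        callback = subscriptions.get(path, {}).get(event)
        if callback:
            callback(path, detail)            # delivered after the action occurred

    open_node("/ls/foo/wombat/pouch",
              {"contents_modified": lambda p, d: print(f"{p} changed: {d}")})
    dispatch_event("/ls/foo/wombat/pouch", "contents_modified", "new leader id")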

API The main calls that act on a handle are:
GetContentsAndStat(): returns the contents and metadata of a file.
SetContents(): writes the contents of a file.
Delete(): deletes the node if it has no children.
Acquire(), TryAcquire(), Release(): acquire and release locks.
GetSequencer(): returns a sequencer describing any lock held by the handle.
SetSequencer(): associates a sequencer with a handle.
CheckSequencer(): checks whether a sequencer is valid.

Primary Election The API described above is used to perform primary election: all potential primaries open the lock file and attempt to acquire the lock. The one that succeeds becomes the primary; the rest become replicas. The primary then writes its identity into the lock file using SetContents(). The primary obtains a sequencer using GetSequencer(), which it passes to the servers it communicates with; they confirm it using CheckSequencer().
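
The following Python sketch mimics this election pattern with an in-memory stand-in for the Chubby calls (TryAcquire, SetContents, GetSequencer); in reality these are RPCs to a Chubby cell, so the code is illustrative only.

    # Toy in-memory stand-in for the lock file and its state.
    lock_state = {"holder": None, "generation": 0, "contents": b""}

    def try_acquire(candidate):
        if lock_state["holder"] is None:
            lock_state["holder"] = candidate
            lock_state["generation"] += 1
            return True
        return False

    def set_contents(data):
        lock_state["contents"] = data

    def get_sequencer():
        return ("/ls/cell/service-leader", "exclusive", lock_state["generation"])

    def elect(candidate):
        if try_acquire(candidate):              # winner becomes the primary
            set_contents(candidate.encode())    # advertise the primary's identity
            return get_sequencer()              # handed to servers, which would
                                                # validate it via CheckSequencer()
        return None                             # losers act as replicas

    print(elect("host-a:9000"))   # host-a wins and receives a sequencer
    print(elect("host-b:9000"))   # host-b loses; it will watch the lock file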

Caching The cache must be kept consistent; the server achieves this by sending invalidations to clients, as follows: when file data or metadata is about to change, the master sends invalidations to every client that may have cached it. Upon receiving an invalidation, a client flushes the invalidated state and acknowledges by making its next KeepAlive call. Chubby also allows clients to cache locks: an event informs a lock holder that another client has requested a conflicting lock, so the holder can release the lock.
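
Here is a toy Python sketch of that invalidation flow: the master flushes every client cache entry for a node (acknowledged, in reality, via the clients' next KeepAlive calls) before applying the write. All names are illustrative.

    # Toy per-client caches; client-1 has cached the node being changed.
    caches = {"client-1": {"/ls/foo/config"}, "client-2": set()}

    def invalidate(path):
        acked = []
        for client, cached in caches.items():
            if path in cached:
                cached.discard(path)      # client flushes the stale entry...
                acked.append(client)      # ...and acks on its next KeepAlive
        return acked

    def write(path, value, store):
        invalidate(path)                  # the modification waits for the acks
        store[path] = value

    store = {}
    write("/ls/foo/config", "v2", store)
    print(store)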

Sessions and KeepAlives A Chubby session is a relationship between a Chubby cell and a Chubby client. Each session has an associated lease, an interval of time during which the master guarantees not to terminate the session unilaterally. The end of this interval is the session lease timeout; the master advances the lease timeout when it receives a KeepAlive request from the client. The master extends the lease and informs the client (the default extension is 12 s).

Sessions and KeepAlives contd. If a client's local lease timeout expires, it becomes unsure whether the master has terminated its session. The session is then said to be in jeopardy, and the client waits for a grace period (45 s by default). If the client and master manage to exchange a successful KeepAlive before the grace period ends, the client re-enables its cache; otherwise it assumes the session has expired. If the session survives the communication problem, a safe event tells the client to proceed; otherwise an expired event is sent.
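
A back-of-envelope Python sketch of the client-side decision logic described above, assuming the default 12 s lease and 45 s grace period; next_action() is a hypothetical helper, not part of the Chubby library.

    LEASE_S = 12.0   # default lease extension
    GRACE_S = 45.0   # default grace period

    def next_action(now, lease_expiry, keepalive_succeeded):
        """Decide what the client library should do, per the rules above."""
        if keepalive_succeeded:
            return "extend the local lease; cache stays enabled"
        if now < lease_expiry:
            return "keep waiting for the KeepAlive reply; session still healthy"
        if now < lease_expiry + GRACE_S:
            return "jeopardy: disable cache, block API calls, keep trying KeepAlive"
        return "expired: report session loss to the application"

    print(next_action(5.0,  LEASE_S, True))    # healthy
    print(next_action(20.0, LEASE_S, False))   # local lease expired: jeopardy
    print(next_action(70.0, LEASE_S, False))   # grace period exhausted: expired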

Fail-Overs The original master grants the client lease M1, of which the client holds a conservative approximation C1; the master commits to lease M2, the client extends its approximation to C2, and the old master then dies. The client's lease C2 expires, so it flushes its cache and starts a timer for the grace period. When a new master is elected, it initially uses a conservative approximation M3 of the session leases its predecessor may have granted. The client's KeepAlive request (message 6 in the paper's figure) succeeds but typically does not extend the master's lease further, because M3 was already conservative. The reply (message 7) allows the client to extend its lease (C3) and to learn that its session is no longer in jeopardy.

Backup Every few hours, the master of each Chubby cell writes a snapshot of its database to a GFS file server. The GFS file server is in a separate building, which ensures that the backup will survive building damage and introduces no cyclic dependencies. The backups provide a means of disaster recovery and of initializing the database of a newly replaced replica.

Scaling Mechanisms Chubby clients are individual processes, so up to 90,000 clients may communicate directly with a single Chubby master, whose machine is identical to the client machines. Techniques are therefore needed to reduce client-master communication: creating an arbitrary number of Chubby cells, so that clients almost always use a nearby cell; increasing the lease time from the default 12 s up to around 60 s, which reduces KeepAlive traffic (see the arithmetic sketch below); having clients cache file data and metadata, reducing the number of calls to the server; and using protocol-conversion servers, which translate the Chubby protocol into less-complex protocols.
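
As referenced above, a quick back-of-envelope calculation (not a figure from the paper) shows why raising the lease period matters when each client sends roughly one KeepAlive per lease period:

    clients = 90_000
    for lease_s in (12, 60):
        # Roughly one KeepAlive per client per lease period.
        print(f"lease {lease_s:>2}s -> about {clients / lease_s:,.0f} KeepAlives/second at the master")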

Proxies Proxies are used to reduce the load on a Chubby cell. They can handle both KeepAlive and read requests, which make up the bulk of RPC traffic, but they cannot reduce write requests, which constitute less than 1% of Chubby's workload. If a proxy handles N_proxy clients, KeepAlive traffic is reduced by a factor of N_proxy (a minimal sketch follows below). Proxies add an extra RPC to writes and first-time reads, and they roughly double expected unavailability, since a client can now be affected by the failure of either its proxy or the Chubby master.
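
The minimal Python sketch below illustrates the fan-in a proxy provides: client KeepAlives are answered locally while only one session is maintained with the master. The Proxy class is purely illustrative and ignores writes and first-time reads, which would still be forwarded upstream.

    class Proxy:
        def __init__(self):
            self.clients = set()
            self.upstream_keepalives = 0

        def client_keepalive(self, client_id):
            self.clients.add(client_id)
            return "ok"                       # answered locally, no RPC upstream

        def upstream_keepalive(self):
            self.upstream_keepalives += 1     # one session covers the whole proxy

    proxy = Proxy()
    for i in range(1000):
        proxy.client_keepalive(f"client-{i}")
    proxy.upstream_keepalive()
    print(len(proxy.clients), "client KeepAlives ->",
          proxy.upstream_keepalives, "upstream KeepAlive")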

Use and behaviour (Statistics table taken from the paper.) About 230 k / 24 k ≈ 10 clients use each cached file, on average. Few clients hold locks, and shared locks are rare. KeepAlive messages dominate the RPC traffic. 61 outages were observed over a period of a few weeks.

Fail-over Problems Originally, every newly created session was written to the database, and with many sessions being created this became a significant overhead. The server was therefore modified to record a session only when it first performs a modification or acquires a lock. New read-only sessions are not written to the database, so if a fail-over occurs they are discarded; such discarded sessions could read stale data before their leases expire and they check in with the new master. Under the newer fail-over design, sessions are not recorded in the database at all.

Problems encountered Lack of Quotas: Chubby was never intended as a storage system and so had no storage quotas. A module was written to keep track of data uploads and to store the corresponding metadata in Chubby; other services then started using the same module. As a result, a single 1.5 MByte file was being rewritten on every user action, and the space used by this service exceeded the space needs of all other Chubby clients combined. Chubby now limits file size to 256 KBytes, and such services have been migrated to more appropriate storage systems.

Lessons Learnt Developers rarely consider availability. Fine-grained locking could be ignored. Poor API choices have unexpected effects. RPC use affects transport protocols.

Summary Chubby is a distributed lock service intended for coarse-grained synchronization of activities within Google's distributed systems. Caching, protocol-conversion servers, and simple load adaptation allow it to scale to tens of thousands of client processes per Chubby instance. GFS and Bigtable use Chubby to elect a primary from redundant replicas, and Chubby has become a standard repository for files that require high availability.

Thank you