The Chubby lock service for loosely-coupled distributed systems Mike Burrows (Google), OSDI 2006 Shimin Chen Big Data Reading Group.


Introduction

What is Chubby?
 Lock service for a loosely-coupled distributed system (e.g., 10K 4-processor machines connected by 1 Gbps Ethernet)
 Client interface similar to whole-file advisory locks, with notification of various events (e.g., file modifications)
 Primary goals: reliability, availability, easy-to-understand semantics

How is it used?
 Used inside Google: GFS, Bigtable, etc.
 To elect leaders, store small amounts of metadata, and act as the root of distributed data structures

What is this paper about? “Building Chubby was an engineering effort … it was not research. We claim no new algorithms or techniques. The purpose of this paper is to describe what we did and why, rather than to advocate it.”

Outline
 Introduction
 Design
 Mechanisms for scaling
 Use, surprises and design errors
 Summary

System Structure
 A Chubby cell consists of a small set of servers (replicas)
 A master is elected from the replicas via a consensus protocol
 Master lease: several seconds
 If the master fails, a new one is elected once the master lease expires
 Clients talk to the master via the Chubby library
 All replicas are listed in DNS; clients discover the master by asking any replica

System Structure
 Replicas maintain copies of a simple database
 Clients send read/write requests only to the master
 For a write:
 The master propagates it to the replicas via the consensus protocol
 It replies only after the write reaches a majority of replicas
 For a read:
 The master satisfies the read alone
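The write path above hinges on the majority rule: a write is durable once more than half the replicas have it, so any future majority (e.g., one electing a new master) must overlap with at least one replica that saw the write. A minimal sketch of that rule (illustrative helper names, not Chubby's actual code):

```python
def majority(num_replicas: int) -> int:
    """Smallest number of replicas that forms a majority quorum."""
    return num_replicas // 2 + 1

def write_committed(acks: int, num_replicas: int) -> bool:
    """The master replies to a write only after a majority of
    replicas have acknowledged it."""
    return acks >= majority(num_replicas)
```

In a typical 5-replica cell, 3 acknowledgments suffice; any two majorities of 3 must share a replica, which is what makes the master's local reads safe.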

System Structure
 If a replica fails and does not recover for a long time (a few hours):
 A fresh machine is selected as a new replica, replacing the failed one
 It updates DNS
 It obtains a recent copy of the database
 The current master polls DNS periodically, and so discovers the new replica

Simple UNIX-like File System Interface
 Chubby supports a strict tree of files and directories
 No symbolic links, no hard links
 Example: /ls/foo/wombat/pouch
 1st component (ls): lock service (common to all names)
 2nd component (foo): the Chubby cell (used in a DNS lookup to find the cell master)
 The rest: the name inside the cell
 Nodes can be accessed via Chubby's specialized API or via other file-system interfaces (e.g., GFS)
 Supports most normal operations (create, delete, open, write, …)
 Supports advisory reader/writer locks on a node
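The three-part name structure above can be sketched with a small parser. This is a hypothetical helper for illustration only; the real client library's internals are not described at this level in the paper:

```python
def parse_chubby_name(name: str) -> tuple:
    """Split a Chubby name like /ls/foo/wombat/pouch into
    (service prefix, cell name, path inside the cell)."""
    parts = name.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "ls":
        raise ValueError("Chubby names have the form /ls/<cell>/<path>")
    prefix = parts[0]            # "ls": the lock service, common to all names
    cell = parts[1]              # resolved via DNS to find the cell master
    rest = "/".join(parts[2:])   # interpreted only inside the cell
    return prefix, cell, rest
```

Because only the second component is needed to locate a master, clients can resolve a name with a single DNS-style lookup and send the remainder to that cell.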

ACLs and File Handles
 Access Control Lists (ACLs)
 A node has three ACL names (for read/write/change-ACL permissions)
 An ACL name names a file in the ACL directory
 That file lists the authorized users
 File handles:
 Have check digits encoded in them, so they cannot be forged
 Sequence number: lets a master tell whether a handle was created by a previous master
 Mode information recorded at open time: lets a newly restarted master reconstruct the mode of a handle created by a previous master
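One way to realize unforgeable check digits is a keyed hash over the handle's fields, so a client cannot fabricate a valid handle without the server's secret. The paper does not specify the scheme; the per-master secret and field layout below are assumptions for illustration:

```python
import hashlib

SECRET = b"master-secret"  # assumed server-side key, not in the paper

def make_handle(node_id: int, master_seq: int) -> tuple:
    """A handle carries the master's sequence number plus check
    digits derived from a secret, so clients cannot forge one."""
    digest = hashlib.sha256(
        f"{node_id}:{master_seq}".encode() + SECRET).hexdigest()[:8]
    return (node_id, master_seq, digest)

def verify_handle(handle: tuple) -> bool:
    """Recompute the check digits; a tampered handle fails."""
    node_id, master_seq, digest = handle
    expect = hashlib.sha256(
        f"{node_id}:{master_seq}".encode() + SECRET).hexdigest()[:8]
    return digest == expect
```

The embedded sequence number is what lets a new master recognize handles minted by its predecessor and rebuild their state on first use.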

Locks and Sequencers
 Locks are advisory rather than mandatory
 Potential lock problem in distributed systems:
 A holds lock L, issues request W, then fails
 B acquires L (because A failed) and performs its actions
 W arrives (out of order) after B's actions
 Solution #1 (backward compatible): lock-delay
 The lock server prevents other clients from acquiring a lock if it becomes inaccessible or its holder has failed
 The lock-delay period can be specified by clients
 Solution #2: sequencers
 A lock holder can obtain a sequencer from Chubby
 It attaches the sequencer to any request it sends to other servers (e.g., Bigtable)
 Those servers can verify that the sequencer is still valid

Chubby Events
Clients can subscribe to events (up-calls from the Chubby library):
 File contents modified: if the file contains the location of a service, this event can be used to monitor that location
 Master failed over
 Child node added, removed, or modified
 Handle becomes invalid: probably a communication problem
 Lock acquired (rarely used)
 Conflicting lock requested (rarely used)

APIs
 Open()
 Mode: read/write/change ACL; events; lock-delay
 Whether to create a new file or directory
 Close()
 GetContentsAndStat(), GetStat(), ReadDir()
 SetContents(): replaces all contents; SetACL()
 Delete()
 Locks: Acquire(), TryAcquire(), Release()
 Sequencers: GetSequencer(), SetSequencer(), CheckSequencer()

Example: Primary Election

Open("write mode");
if (successful) {
    // primary
    SetContents("identity");
} else {
    // replica
    Open("read mode", "file-modification event");
    // when notified of file modification:
    primary = GetContentsAndStat();
}
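The election pattern above can be exercised with a toy stand-in for the lock file; the class and method names here are invented for illustration and are not the real Chubby client API:

```python
class MockChubbyFile:
    """Toy lock file: first writer wins, contents name the primary."""
    def __init__(self):
        self.locked_by = None
        self.contents = ""

def elect(f: MockChubbyFile, me: str) -> str:
    """Try to open the file in write mode; the winner records its
    identity, losers read it to learn who the primary is."""
    if f.locked_by is None:          # Open("write mode") succeeded
        f.locked_by = me
        f.contents = me              # SetContents("identity")
        return me                    # I am the primary
    return f.contents                # replica: GetContentsAndStat()
```

Replicas would additionally subscribe to the file-modification event so they re-read the file when a new primary writes its identity.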

Caching
 Strict consistency: easy to understand
 Lease based
 The master invalidates cached copies upon a write request
 Write-through caches

Sessions, Keep-Alives, Master Fail-overs
 Session:
 A client sends a keep-alive request to the master
 The master responds with a keep-alive response
 Immediately after receiving the response, the client sends another request for an extension
 The master blocks keep-alives until close to the expiration of the session
 The extension defaults to 12s
 Clients maintain a local timer estimating the session timeout (clocks are not perfectly synchronized)
 If the local timer runs out, the client waits for a 45s grace period before ending the session
 This happens when a master fails over
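The client-side timing above has three states: healthy while the local lease is valid, "jeopardy" during the 45s grace period (typically spanning a master fail-over), and expired if no new master extends the session in time. A sketch, with the constants taken from the slide:

```python
LEASE = 12.0   # default lease extension, seconds
GRACE = 45.0   # grace period after the local lease runs out

def session_state(now: float, lease_expiry: float) -> str:
    """Client's local view of its session.
    healthy  -> lease still valid
    jeopardy -> lease expired; waiting out the grace period,
                e.g. while a new master is elected
    expired  -> grace period exhausted; session is over"""
    if now < lease_expiry:
        return "healthy"
    if now < lease_expiry + GRACE:
        return "jeopardy"
    return "expired"
```

An application layered on Chubby typically pauses work in jeopardy and aborts (drops its locks) only on expiry, so a fast fail-over is invisible apart from the pause.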

Explanation

Other Details
 Database implementation: a simple database with write-ahead logging and snapshotting
 Backup: a snapshot is written to a GFS server in a different building
 Mirroring files across multiple cells
 Used for configuration files (e.g., locations of other services, access control lists, etc.)

Outline
 Introduction
 Design
 Mechanisms for scaling
 Use, surprises and design errors
 Summary

Why is scaling important?
 Clients connect to a single master in a cell
 Many more client processes than machines
 Existing mechanisms:
 More Chubby cells
 Increasing the lease time from 12s to 60s to reduce KeepAlive messages (the dominant request type in experiments)
 Client caches
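The lease-time change pays off because each client sends roughly one KeepAlive per lease period, so lengthening the lease from 12s to 60s cuts that traffic fivefold. A back-of-the-envelope sketch (the 90,000-client figure is illustrative of the scale the paper discusses):

```python
def keepalives_per_hour(num_clients: int, lease_seconds: float) -> float:
    """Each client sends about one KeepAlive per lease period."""
    return num_clients * 3600 / lease_seconds
```

At 90,000 clients, a 12s lease implies on the order of 27 million KeepAlives per hour at one master; a 60s lease drops that to 5.4 million.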

New Mechanisms
 Proxies:
 Handle KeepAlive and read requests; pass write requests to the master
 Reduce traffic, but also reduce availability
 Partitioning: partition the name space
 A master handles the nodes with hash(name) mod N == id
 Only limited cross-partition messages are needed
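The partitioning rule is a plain modular hash of the node name. A sketch, using CRC32 as a stand-in for whatever hash function the real implementation would use:

```python
import zlib

def partition_for(name: str, num_partitions: int) -> int:
    """Route a node to the master whose id equals
    hash(name) mod N. zlib.crc32 is a deterministic
    stand-in hash for illustration."""
    return zlib.crc32(name.encode()) % num_partitions

def my_nodes(names: list, num_partitions: int, my_id: int) -> list:
    """The subset of nodes this partition's master serves."""
    return [n for n in names if partition_for(n, num_partitions) == my_id]
```

Because the mapping depends only on the name, any client or proxy can compute the right partition locally, which is what keeps cross-partition messages rare.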

Outline
 Introduction
 Design
 Mechanisms for scaling
 Use, surprises and design errors
 Summary

Typical cell in a 10-minute period

Cell Outage Statistics
 61 outages across all cells over a few weeks
 Most outages lasted 15s or less
 52 outages were under 30s, incurring under 30s of delay in applications
 The remaining nine were caused by network maintenance (4), suspected network connectivity problems (2), software errors (2), and overload (1)

Java Clients
 Google systems are mostly written in C++
 The Chubby client library is complex (7000 lines), and hard to port to Java and maintain there
 JNI is inefficient
 Java users instead run a protocol-conversion server exposing a simple RPC protocol
 No known way to avoid running this additional server

Use as a Name Service
 Chubby's most popular use
 With DNS, availability requires a short TTL, but short TTLs often overwhelm DNS servers
 Clients must poll each DNS entry: O(N²) traffic
 Chubby's caching is invalidation based
 Besides KeepAlives, no polling is needed
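The difference between the two caching styles is easy to quantify: TTL polling costs messages proportional to clients × names × poll periods regardless of churn, while invalidation costs only messages proportional to actual changes. A sketch with illustrative numbers (not measurements from the paper):

```python
def ttl_poll_messages(num_clients: int, names_per_client: int,
                      polls_per_hour: int) -> int:
    """TTL-based DNS caching: each client re-resolves every name it
    uses once per TTL, whether or not anything changed."""
    return num_clients * names_per_client * polls_per_hour

def invalidation_messages(num_clients: int, changes_per_hour: int) -> int:
    """Invalidation-based caching: traffic scales with actual changes
    (KeepAlives, which are needed anyway, are not counted here)."""
    return num_clients * changes_per_hour
```

With 1,000 clients each caching 100 names on a 60s TTL, polling costs 6,000,000 lookups per hour even if nothing changes; with 2 name changes per hour, invalidation sends 2,000 messages.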

Abusive Clients
 Uses are reviewed before clients may use shared Chubby cells
 Problem cases encountered:
 Repeated open/close calls to poll a file
 Fix: cache file handles aggressively
 Lack of quotas
 Fix: a 256 KB file-size limit
 Publish/subscribe systems built on Chubby: inefficient

Summary
 A lock service for loosely-coupled distributed systems
 UNIX-like file system interface
 Reliability and availability