Distributed Systems CS 15-440: Google Chubby and Message Ordering. Recitation 4, Sep 29, 2011. Majd F. Sakr, Vinay Kolar, Mohammad Hammoud.


Today… Last recitation session: Google Protocol Buffers and Publish-Subscribe. Today's session: Google Chubby (a Google library and infrastructure for synchronization) and Ordered Communication (ordering events and enforcing ordering while communicating). Announcement: Project 1 is due on Oct 3rd.

Overview Recap Google Chubby Ordered Communication

Recap: Google Physical Infrastructure. Google has created a large distributed system from commodity PCs. Commodity PCs are grouped into a rack: approx. 40 to 80 PCs and one Ethernet switch (internal = 100 Mbps, external = 1 Gbps). Racks are grouped into a cluster: approx. 30 racks (around 2400 PCs) and 2 high-bandwidth switches, with each rack connected to both switches for redundancy; placement and replication are generally done at the cluster level. Clusters are grouped into a data center.

Recap: Google Data Center Architecture. (To avoid clutter, the Ethernet connections are shown from only one of the clusters to the external links.)

Recap: Google System Architecture

Recap: Google Infrastructure

Overview Recap Google Chubby Ordered Communication

Google Chubby. Google Chubby offers coordination and storage services to other Google services (e.g., the Google File System). It provides coarse-grained distributed locks to synchronize distributed activities in a large-scale, asynchronous environment. It can be used to support the election of a primary in a set of replicas, and as a name service within Google. It also provides a file system offering reliable storage of small files. Chubby is an all-in-one package: a file system, a locking service, a naming service, and an election facilitator!

Chubby Interface. Chubby provides an abstraction based on a file-system concept: every data object is a file. Files are organized into a hierarchical namespace, e.g., /ls/chubby_cell/directory_name/…/file_name, where ls stands for the lock service and chubby_cell is an identifier naming the instance of Chubby.

Chubby as a file system and a locking service. The interface provides an easy mechanism to store small files. Chubby provides the following interfaces: general interfaces, file-system interfaces, and locking-service interfaces.

Chubby – General Interfaces. Chubby provides interfaces for opening, closing, and deleting a file in its namespace. Open call: opens a file or directory and returns a handle; the client can specify whether the file is to be opened for reading, writing, or locking. Close call: relinquishes the handle. Delete call: removes the file or directory.

Chubby – File-System Interfaces. Chubby provides two services. Whole-file reading and writing: single atomic operations are provided to read and write the complete data in a file, so Chubby can be used to store small files (but not large ones). Access control: a file is associated with an Access Control List (ACL), which can be read and set through interfaces.

Chubby – Locking Service Interfaces. In Chubby, a file can be opened as a lock; the owner of the lock holds the handle to the file. Chubby provides three interfaces: Acquire (gets a handle to the lock), Release (releases the lock), and TryAcquire (a non-blocking variant of Acquire). Chubby provides advisory locks, not mandatory locks. Advantage: extra flexibility and resilience. Disadvantage: the programmer has to manage conflicts.
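To make the locking interfaces concrete, here is a minimal Python sketch. The real Chubby client library is not public, so the `ChubbyLockStub` class and its in-process lock table are hypothetical stand-ins that merely mirror the Acquire/Release/TryAcquire calls described above:

```python
import threading

class ChubbyLockStub:
    """Hypothetical local stand-in for a Chubby cell: one advisory
    lock per file name. Only mimics the slide's interface names."""

    def __init__(self):
        self._locks = {}                    # file name -> threading.Lock
        self._guard = threading.Lock()      # protects the lock table

    def _lock_for(self, name):
        with self._guard:
            return self._locks.setdefault(name, threading.Lock())

    def Acquire(self, name):
        # Blocks until the advisory lock on `name` is free.
        self._lock_for(name).acquire()

    def TryAcquire(self, name):
        # Non-blocking variant: returns True iff the lock was obtained.
        return self._lock_for(name).acquire(blocking=False)

    def Release(self, name):
        self._lock_for(name).release()

# Typical use: a would-be primary tries the lock and acts only on success.
cell = ChubbyLockStub()
is_primary = cell.TryAcquire("/ls/cell/master")
if is_primary:
    cell.Release("/ls/cell/master")   # done acting as primary
```

Because the locks are advisory, nothing stops a buggy client from touching the protected resource without calling `Acquire`; the discipline is left to the programmer, as the slide notes.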

Summary of Chubby Interfaces

Chubby Architecture. A Chubby instance (or Chubby cell) is the first level of the hierarchy inside Chubby: /ls/chubby_cell/directory_name/…/file_name. A Chubby instance is implemented as a small number of replicated servers (typically 5) with one designated master. Clients access these replicas using the Chubby library, which uses Protocol Buffers to communicate. Replicas are placed at failure-independent sites: typically within a cluster, but not within a single rack.

Chubby Namespace Architecture. The hierarchical namespace of directories and files/locks is maintained in a database at each replica. The consistency of the replicated database is ensured through a consensus protocol that uses operation logs; the logs can be used to reconstruct the state of the system. Problem: logs can grow too large over time. Solution: Chubby periodically takes a snapshot of the system and erases the old logs.

Chubby Session. A Chubby session is the relationship between a client and a Chubby cell. KeepAlive messages maintain the session.

Client Caching and Consistency. The client caches file data, metadata, and open handles. Cache consistency: whenever a mutation is to occur, the associated operation is blocked until all caches are invalidated; invalidation messages are piggybacked on KeepAlive messages. Disadvantages: cached copies are invalidated rather than updated (clients must re-fetch the data), and an operation cannot progress until all caches are invalidated. Advantage: simple and elegant for small files and locks.

Chubby Architecture Diagram

Overview Recap Google Chubby Ordered Communication

In several applications, the ordering of events is vital. For example, consider a flight-booking system: the client sends a Reserve request followed by a Cancel request, but if the requests arrive out of order, the server cancels the reservation before booking it, even when the messages are reliably delivered! We will study how to ensure ordered delivery of events in group communication.

Ordered Multicast – An Example. An example where total ordering is necessary: in an eCommerce application, a bank database is replicated across many servers. Consider a 2-replica scenario with an initial balance of $1000, where Event 1 = add $1000 and Event 2 = add interest of 5%. A replica that applies Event 1 then Event 2 ends with a balance of $2100, while a replica that applies them in the reverse order ends with $2050. The updates from Event 1 and Event 2 must be performed in the same order on every replicated server; otherwise the data is inconsistent.
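The order-dependence in this example is easy to check directly. A minimal Python sketch (function names are illustrative; interest is computed in whole dollars to keep the arithmetic exact):

```python
def add_deposit(balance):
    # Event 1: add $1000 to the account.
    return balance + 1000

def add_interest(balance):
    # Event 2: add 5% interest (integer dollars).
    return balance + balance * 5 // 100

# Replica A applies Event 1 then Event 2; replica B applies the reverse.
replica_a = add_interest(add_deposit(1000))   # (1000 + 1000) + 5% = 2100
replica_b = add_deposit(add_interest(1000))   # (1000 + 5%) + 1000 = 2050
```

Because the two updates do not commute, any delivery order that differs between replicas leaves the copies permanently divergent, which is exactly what totally ordered multicast rules out.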

Three Types of Ordering FIFO Order Causal Order Total Order

FIFO Ordering. FIFO order: if a process multicasts a message m before m', then no correct process delivers m' if it has not already delivered m. In the example, F1 and F2 are in FIFO order. Drawback: FIFO order does not specify any order for messages generated by different processes; e.g., F1 and F3 can be delivered in any order.
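A common way to enforce FIFO order is to tag each message with a per-sender sequence number and buffer out-of-order arrivals at the receiver. A minimal sketch, not tied to any particular multicast library:

```python
from collections import defaultdict

class FifoReceiver:
    """Delivers each sender's messages in send order by buffering gaps.
    Messages carry (sender, seq), with seq starting at 1 per sender."""

    def __init__(self):
        self.expected = defaultdict(lambda: 1)   # next seq per sender
        self.buffer = defaultdict(dict)          # sender -> {seq: payload}
        self.delivered = []                      # delivery order, for inspection

    def receive(self, sender, seq, payload):
        self.buffer[sender][seq] = payload
        # Deliver as many consecutive in-order messages as possible.
        while self.expected[sender] in self.buffer[sender]:
            n = self.expected[sender]
            self.delivered.append((sender, n, self.buffer[sender].pop(n)))
            self.expected[sender] = n + 1

r = FifoReceiver()
r.receive("P1", 2, "m'")   # arrives early: buffered, not yet delivered
r.receive("P1", 1, "m")    # fills the gap: both delivered, in send order
```

Note that this scheme constrains only messages from the same sender, which is precisely the drawback the slide points out: messages from different senders remain unordered relative to one another.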

Causal Ordering. Causal order: if process Pi multicasts a message mi and Pj multicasts mj, and mi → mj (where '→' is Lamport's happened-before relation), then any correct process that delivers mj will deliver mi before mj. Relationship between FIFO and causal order: causal order implies FIFO order, but FIFO order does not imply causal order. In the example, C1 and C3 are in causal order. Drawback: the happened-before relation between mi and mj must be tracked by the communication layer.

Total Ordering. Total order: if process Pi multicasts a message mi and Pj multicasts mj, and one correct process delivers mi before mj, then every correct process delivers mi before mj. In the example, T1 and T2 are in total order. Drawback: total order does not imply FIFO or causal order.

Totally Ordered Multicast. Totally ordered multicast is a multicast communication paradigm that ensures all messages are delivered in the same order at all receivers. Approach: process Pi sends a timestamped multicast message msgi to all receivers in the group. At the sender, the message is buffered in a local queue queuei. Any incoming message at Pj is queued in queuej according to its timestamp, and acknowledged to every other process.

Totally Ordered Multicast (cont'd). A receiver delivers a message to the application only if the message is at the head of its queue and the message has been acknowledged by every other process. Assumptions in totally ordered multicast: communication is reliable, and there is no out-of-order delivery of messages transmitted by the same sender.
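The queue-plus-acknowledgement rule above can be sketched as follows. The `TOProcess` class and the way acknowledgements are fed in are illustrative simplifications: a real implementation would also maintain Lamport clocks and exchange the acks over the network.

```python
import heapq

class TOProcess:
    """One process in totally ordered multicast (sketch). Messages are
    queued by (timestamp, sender_id) and delivered only when they are at
    the head of the queue and acked by every other process."""

    def __init__(self, pid, all_pids):
        self.pid, self.all = pid, set(all_pids)
        self.queue = []        # min-heap of (ts, sender, payload)
        self.acks = {}         # (ts, sender) -> set of processes that acked
        self.delivered = []    # payloads, in delivery order

    def on_message(self, ts, sender, payload):
        heapq.heappush(self.queue, (ts, sender, payload))
        self.acks.setdefault((ts, sender), set())
        self._try_deliver()

    def on_ack(self, ts, sender, acker):
        self.acks.setdefault((ts, sender), set()).add(acker)
        self._try_deliver()

    def _try_deliver(self):
        while self.queue:
            ts, sender, payload = self.queue[0]
            # Deliver only if every other process has acknowledged the head.
            if self.acks.get((ts, sender), set()) >= (self.all - {sender}):
                heapq.heappop(self.queue)
                self.delivered.append(payload)
            else:
                return

# Example: "a" (ts 3, from P1) must be delivered before "b" (ts 5, from P2),
# and neither is delivered until it is fully acknowledged.
p = TOProcess(1, [1, 2, 3])
p.on_message(3, 1, "a")
p.on_message(5, 2, "b")
p.on_ack(3, 1, 2)
p.on_ack(3, 1, 3)   # "a" now acked by 2 and 3 -> delivered
p.on_ack(5, 2, 1)
p.on_ack(5, 2, 3)   # "b" now acked by 1 and 3 -> delivered
```

Every process runs the same deterministic rule on the same (timestamp, sender) keys, so all processes empty their queues in the same order, which is exactly the total-order guarantee.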

Application of Vector Clocks: Causally Ordered Multicast. In causally ordered communication, a message m is delivered to the application only if all messages that causally precede m have been delivered. Vector clocks allow an implementation of causally ordered multicast: each multicast message is delivered to the application in causal order. In a sense, causally ordered multicast is weaker than totally ordered multicast: if two messages are not causally related to each other, it does not matter in which order they are delivered to the application.

Causally Ordered Multicast – An Example

Causally Ordered Multicast – Approach. Clocks are adjusted only when sending and delivering messages. When process Pi sends a message m, it sets VCi[i] = VCi[i] + 1 and ts(m) = VCi. When Pj receives a message m with timestamp ts(m) from Pi, it delivers the message to the application only if: (1) ts(m)[i] = VCj[i] + 1, i.e., m is the next message that Pj was expecting from Pi; and (2) ts(m)[k] <= VCj[k] for all k != i, i.e., Pj has seen all the messages that Pi had seen when it sent m. When Pj delivers the message, it sets VCj[k] = max(VCj[k], ts(m)[k]) for all k.
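The delivery rule can be turned into a short Python sketch of the receiving side. The sender-side increment (VCi[i] += 1 producing ts(m)) is assumed to have already happened; the class and variable names are illustrative:

```python
class CausalReceiver:
    """Causally ordered multicast delivery at one process (sketch).
    A message from P_i with vector timestamp ts is delivered only when
    ts[i] == VC[i] + 1 and ts[k] <= VC[k] for all k != i; otherwise it
    is held back and retried after each delivery."""

    def __init__(self, n):
        self.vc = [0] * n      # this process's vector clock
        self.held = []         # (sender, ts, payload) not yet deliverable
        self.delivered = []    # payloads, in delivery order

    def _deliverable(self, i, ts):
        return ts[i] == self.vc[i] + 1 and all(
            ts[k] <= self.vc[k] for k in range(len(ts)) if k != i)

    def receive(self, i, ts, payload):
        self.held.append((i, ts, payload))
        progress = True
        while progress:        # keep delivering until nothing qualifies
            progress = False
            for msg in list(self.held):
                s, t, p = msg
                if self._deliverable(s, t):
                    self.held.remove(msg)
                    self.delivered.append(p)
                    # Merge the timestamp into the local vector clock.
                    self.vc = [max(a, b) for a, b in zip(self.vc, t)]
                    progress = True

# Example at P2 in a 3-process system: P0 sends m1 with ts [1,0,0];
# P1 delivers m1 and then sends m2 with ts [1,1,0]. If m2 overtakes m1
# on the network, P2 must hold m2 until m1 arrives.
c = CausalReceiver(3)
c.receive(1, [1, 1, 0], "m2")   # held: P2 has not yet seen m1 from P0
c.receive(0, [1, 0, 0], "m1")   # deliver m1, then the buffered m2
```

The two conditions correspond directly to the slide's rules: condition (1) enforces FIFO order per sender, and condition (2) ensures every message the sender had seen has also been delivered locally.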


Totally Ordered Logical Clocks. Lamport's clock does not specify a total order: the order of events is ambiguous if two events on different processes have numerically identical time values. We can create a total order by using the (time value, processId) tuple. An event a on process Pj is assigned the totally ordered time value TCj(a) = (Cj, j). If event a is generated on process Pi and b on Pj, then we define TCi(a) < TCj(b) if Ci < Cj, or if Ci = Cj and i < j. NOTE: totally ordered logical clocks do not imply the "total ordering" of messages that we studied today under "Ordered Communication".
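The (time value, processId) tuple comparison is just lexicographic ordering, which a few lines of Python make explicit (the class name is illustrative):

```python
from functools import total_ordering

@total_ordering
class TotalOrderTimestamp:
    """Lamport time made a total order by tie-breaking on process id."""

    def __init__(self, clock, pid):
        self.clock, self.pid = clock, pid

    def __eq__(self, other):
        return (self.clock, self.pid) == (other.clock, other.pid)

    def __lt__(self, other):
        # TC_i(a) < TC_j(b) iff C_i < C_j, or C_i == C_j and i < j.
        return (self.clock, self.pid) < (other.clock, other.pid)

a = TotalOrderTimestamp(4, 2)   # event with Lamport time 4 on process 2
b = TotalOrderTimestamp(4, 3)   # concurrent event, same time, process 3
```

Since no two events on the same process share a clock value and ties across processes are broken by the (unique) process id, any two distinct timestamps are comparable, giving a total order over all events.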