The Chubby lock service for loosely-coupled distributed systems
Mike Burrows, Google Inc.
Presented by: Harish Rayapudi, Shiva Prasad Malladi
Overview
1. Introduction
2. Why Chubby?
3. System Structure
4. Files, directories and handles
5. Locks and Sequencers
6. Events and API
7. Caching
8. Sessions and KeepAlives
9. Fail-overs and Backup
10. Scaling mechanisms
11. Summary
Introduction
Chubby is a lock service that provides coarse-grained locking and reliable (though low-volume) storage for a loosely-coupled distributed system. Its design emphasizes availability and reliability rather than high performance. A Chubby instance, known as a Chubby cell, might serve ten thousand 4-processor machines connected by a high-speed LAN. Clients use the lock service to synchronize their activities and to agree on basic information about their environment. For example, the Google File System (GFS) uses Chubby to elect a master server. Before Chubby, Google used many ad hoc methods for primary election.
Why Chubby?
A client library that implements Paxos could have been used instead of a lock service, but a lock service is better in the following ways:
1. It makes it easier to add availability, reliability and primary election to an existing program than a consensus library does.
2. It can also act as a name service for advertising results such as the identity of the primary, which reduces the number of servers a client depends on.
3. A lock-based interface is familiar to most programmers.
4. A lock service lets a client system make decisions correctly even when fewer than a majority of its own members are up.
System Structure
1. Chubby has two main components, a server and a client library, that communicate through RPCs.
2. A Chubby cell consists of a small set of servers known as replicas.
3. The replicas elect a master, and only the master initiates reads and writes of the database.
System Structure contd.
4. Clients find the master by asking the replicas; once a client has located the master, it sends all its requests to the master.
5. Write requests are propagated by the master to all replicas, while read requests are served by the master alone, which is safe as long as the master lease has not expired.
6. If the master fails, the other replicas elect a new master once the master lease expires.
7. If a replica fails and does not recover, a simple replacement system selects a fresh machine from a free pool.
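
In the paper, a write is acknowledged once it has reached a majority of the replicas. The majority rule can be sketched as follows; the function names are hypothetical and the five-replica cell is only an illustrative size, so this is not Chubby's actual consensus code:

```python
def majority(num_replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return num_replicas // 2 + 1


def write_committed(acks: int, num_replicas: int) -> bool:
    """A write counts as committed once a majority has acknowledged it."""
    return acks >= majority(num_replicas)


if __name__ == "__main__":
    # Illustrative cell of five replicas (an assumed size).
    for acks in range(6):
        print(acks, write_committed(acks, 5))
```

With five replicas the cell keeps accepting writes while any three of them are reachable, which is why losing one or two replicas does not stop the service.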
Files, directories, and handles
Chubby exports a file system interface similar to that of UNIX: a tree of files and directories, with name components separated by slashes. For example, in the name /ls/foo/wombat/pouch:
ls is the prefix common to all Chubby names and stands for lock service.
foo is the name of a Chubby cell, which is resolved to one or more Chubby servers through a DNS lookup.
/wombat/pouch is the name of the directory and file, interpreted within that cell.
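
The structure of a Chubby name can be illustrated with a small parser. This is a minimal sketch; `ChubbyName` and `parse_chubby_name` are hypothetical names introduced here, not part of the Chubby client library:

```python
from typing import NamedTuple


class ChubbyName(NamedTuple):
    prefix: str  # always "ls" (lock service)
    cell: str    # name of the Chubby cell, e.g. "foo"
    path: str    # name interpreted inside the cell, e.g. "/wombat/pouch"


def parse_chubby_name(name: str) -> ChubbyName:
    """Split '/ls/<cell>/<path...>' into prefix, cell and in-cell path."""
    parts = name.split("/")
    # A valid name splits into ['', 'ls', '<cell>', ...].
    if len(parts) < 3 or parts[0] != "" or parts[1] != "ls" or not parts[2]:
        raise ValueError(f"not a Chubby name: {name!r}")
    return ChubbyName("ls", parts[2], "/" + "/".join(parts[3:]))


if __name__ == "__main__":
    print(parse_chubby_name("/ls/foo/wombat/pouch"))
```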
Files, directories, and handles contd.
Access to a file is controlled by permissions on the file itself rather than on the directories on the path to it. ACLs are themselves files kept in an ACL directory. For example, if file F’s write ACL name is foo, and the ACL directory contains a file foo with an entry bar, then user bar is permitted to write F. The name space contains only files and directories, collectively called nodes. Nodes may be either permanent or ephemeral. Ephemeral files are used as temporary files and as indicators to others that a client is alive.
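
The ACL example above (file F, ACL name foo, user bar) can be sketched with two dictionaries. The data structures and the `may_access` helper are hypothetical in-memory stand-ins, not Chubby's implementation:

```python
# Hypothetical stand-in for the ACL directory: ACL file name -> principals listed in it.
acl_directory = {"foo": {"bar"}}

# Hypothetical stand-in for node meta-data: node -> access mode -> ACL file name.
node_acls = {"F": {"write": "foo"}}


def may_access(user: str, node: str, mode: str) -> bool:
    """Return True if `user` is listed in the ACL file that governs `mode` on `node`."""
    acl_name = node_acls.get(node, {}).get(mode)
    if acl_name is None:
        return False  # no ACL recorded for this mode in the toy model
    return user in acl_directory.get(acl_name, set())


if __name__ == "__main__":
    print(may_access("bar", "F", "write"))
    print(may_access("alice", "F", "write"))
```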
Files, directories, and handles contd.
Handles are similar to UNIX file descriptors. They include:
Check digits that prevent clients from creating or guessing handles.
A sequence number that tells a master whether the handle was created by it or by a previous master.
Mode information provided at open time, which allows the master to recreate its state when an old handle is presented to a newly restarted master.
Locks and sequencers
Each Chubby file or directory can act as a reader-writer lock: either one client handle holds the lock in exclusive (writer) mode, or any number of client handles hold it in shared (reader) mode. Locks are advisory: holding a lock called F is neither necessary to access file F nor does it prevent other clients from doing so (unlike mandatory locks). Acquiring a lock in either mode requires write permission, so that an unprivileged reader cannot prevent a writer from making progress. A lock holder may request a sequencer, which describes the state of the lock immediately after acquisition.
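
The two lock modes and the advisory behaviour can be sketched with a toy in-memory node. The class names are hypothetical, and the sketch ignores permissions, sessions and failures:

```python
class AdvisoryLock:
    """Toy reader-writer lock: one exclusive holder OR any number of shared holders."""

    def __init__(self) -> None:
        self.exclusive_holder = None
        self.shared_holders = set()

    def try_acquire(self, handle: str, exclusive: bool) -> bool:
        if self.exclusive_holder is not None:
            return False  # a writer holds the lock
        if exclusive:
            if self.shared_holders:
                return False  # readers hold the lock
            self.exclusive_holder = handle
            return True
        self.shared_holders.add(handle)
        return True

    def release(self, handle: str) -> None:
        if self.exclusive_holder == handle:
            self.exclusive_holder = None
        self.shared_holders.discard(handle)


class Node:
    """A file that doubles as a lock; the lock is advisory only."""

    def __init__(self) -> None:
        self.lock = AdvisoryLock()
        self.contents = b""

    def write(self, data: bytes) -> None:
        # Deliberately no lock check: advisory locks do not guard the file itself.
        self.contents = data
```

Because the lock is advisory, `Node.write` succeeds even while another handle holds the lock; only other attempts to acquire the lock conflict.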
Locks and sequencers contd.
A sequencer consists of the name of the lock, the mode in which it was acquired (exclusive or shared), and the lock generation number. For servers and clients that cannot use sequencers, Chubby provides an imperfect but easier mechanism to reduce the risk of delayed or re-ordered requests: the lock-delay. A lock released normally is immediately available, but if a lock becomes free because its holder has failed or become inaccessible, no other client can claim it until a period called the lock-delay has passed. This guards against faulty clients.
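
A sequencer and the lock-delay can be sketched together with a toy lock. `ToyLock`, its method names and the 60-second delay are assumptions made for illustration; the generation counter here simply increases each time the lock goes from free to held:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sequencer:
    lock_name: str
    mode: str        # "exclusive" or "shared"
    generation: int  # lock generation number


class ToyLock:
    def __init__(self, name: str, lock_delay: float) -> None:
        self.name = name
        self.lock_delay = lock_delay  # seconds; illustrative value chosen by the caller
        self.generation = 0
        self.holder: Optional[str] = None
        self.blocked_until = 0.0      # nobody may acquire before this time

    def acquire(self, client: str, now: float) -> Optional[Sequencer]:
        if self.holder is not None or now < self.blocked_until:
            return None
        self.holder = client
        self.generation += 1
        return Sequencer(self.name, "exclusive", self.generation)

    def release(self) -> None:
        """Normal release: the lock is immediately available."""
        self.holder = None

    def holder_failed(self, now: float) -> None:
        """Holder died or became unreachable: apply the lock-delay."""
        self.holder = None
        self.blocked_until = now + self.lock_delay

    def check_sequencer(self, seq: Sequencer) -> bool:
        """Valid only while the lock is still held in the same generation."""
        return (self.holder is not None
                and seq.lock_name == self.name
                and seq.generation == self.generation)
```

A server that receives a request carrying an old sequencer can reject it, because the generation number no longer matches once the lock has been re-acquired.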
Events
Chubby clients may subscribe to a range of events, including:
File contents modified.
Child node added, removed, or modified.
Chubby master failed over.
A handle (and its lock) has become invalid.
Lock acquired.
Conflicting lock request from another client.
Events are delivered after the corresponding action has taken place.
API
The main calls that act on a handle are:
GetContentsAndStat(): Returns the contents and meta-data of a file.
SetContents(): Writes the contents of a file.
Delete(): Deletes the node if it has no children.
Acquire(), TryAcquire(), Release(): Acquire and release locks.
GetSequencer(): Returns a sequencer that describes any lock held by the handle.
SetSequencer(): Associates a sequencer with a handle.
CheckSequencer(): Checks whether a sequencer is valid.
Primary Election
The API described above can be used to perform primary election:
All potential primaries open the lock file and attempt to acquire the lock.
The one that succeeds becomes the primary; the rest act as replicas.
The primary writes its identity into the lock file using SetContents(), so that clients and replicas can find it.
Ideally, the primary obtains a sequencer using GetSequencer() and passes it to the servers it communicates with, which confirm with CheckSequencer() that it is still the primary.
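
The election steps above can be sketched against a toy in-memory lock file. `LockFile` and `elect_primary` are hypothetical stand-ins whose method names only mirror the Chubby calls; a real client would issue these calls over RPC and also handle session loss:

```python
class LockFile:
    """Hypothetical in-memory stand-in for a Chubby lock file."""

    def __init__(self) -> None:
        self.holder = None
        self.contents = ""
        self.generation = 0

    def try_acquire(self, candidate: str) -> bool:
        if self.holder is not None:
            return False
        self.holder = candidate
        self.generation += 1
        return True

    def set_contents(self, data: str) -> None:
        self.contents = data


def elect_primary(candidates, lock_file: LockFile):
    """Every candidate tries the lock; the winner advertises itself in the file."""
    primary = None
    for candidate in candidates:
        if lock_file.try_acquire(candidate):
            lock_file.set_contents(candidate)  # identity readable by everyone else
            primary = candidate
    return primary


if __name__ == "__main__":
    lock_file = LockFile()
    print(elect_primary(["server-1", "server-2", "server-3"], lock_file))
    print(lock_file.contents)
```

The losers learn who won by reading the lock file's contents rather than by talking to each other.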
Caching
Clients cache file data and meta-data, and the master keeps these caches consistent by sending invalidations:
When file data or meta-data is about to change, the master blocks the modification and sends invalidations to every client that may have cached the data.
On receiving an invalidation, a client flushes the invalidated state and acknowledges by making its next KeepAlive call.
The modification proceeds only once every such client has acknowledged the invalidation or let its cache lease expire.
Chubby also allows clients to cache locks, that is, to hold them longer than strictly necessary. An event informs the lock holder when another client requests a conflicting lock, so that the holder can release it.
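
The invalidate-before-write idea can be sketched as follows. This is a simplified, synchronous model with hypothetical class names: a direct method call stands in for the invalidation and the acknowledgement that, per the slide, travels on the next KeepAlive call:

```python
class Master:
    def __init__(self) -> None:
        self.data = {}
        self.cachers = {}  # file name -> set of clients that may have cached it

    def fetch(self, client, name):
        # Remember that this client may now hold a cached copy.
        self.cachers.setdefault(name, set()).add(client)
        return self.data.get(name)

    def write(self, name, value) -> None:
        # Invalidate every possible cached copy BEFORE applying the change.
        for client in self.cachers.pop(name, set()):
            client.invalidate(name)  # stands in for invalidation + acknowledgement
        self.data[name] = value


class Client:
    def __init__(self, master: Master) -> None:
        self.master = master
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.master.fetch(self, name)
        return self.cache[name]

    def invalidate(self, name) -> None:
        self.cache.pop(name, None)
```

Because the cached copy is dropped before the new value is stored, a client in this model never serves the old value after the write has completed.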
Sessions and KeepAlives
A Chubby session is a relationship between a Chubby cell and a Chubby client. Each session has an associated lease, an interval of time extending into the future during which the master guarantees not to terminate the session unilaterally. The end of this interval is the session lease timeout. The master may advance the timeout, for example when it responds to a KeepAlive RPC from the client; the reply informs the client of the new timeout. The default extension is 12s.
Sessions and KeepAlives contd.
If a client’s local lease timeout expires, it becomes unsure whether the master has terminated its session. The client empties and disables its cache, and the session is said to be in jeopardy. The client then waits for a grace period (default 45s). If the client and master manage to exchange a successful KeepAlive before the end of the grace period, the client enables its cache again. Otherwise, the client assumes that the session has expired. If the session survives the communication problem, a safe event tells the client to proceed; if not, an expired event is sent.
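
The client-side behaviour described above amounts to a small state machine. A minimal sketch, assuming hypothetical names (`ClientSession`, `tick`, `keepalive_ok`) and explicit timestamps in place of real timers:

```python
from enum import Enum


class SessionState(Enum):
    VALID = "valid"
    JEOPARDY = "jeopardy"
    EXPIRED = "expired"


GRACE_PERIOD = 45.0  # seconds, the default quoted on the slide


class ClientSession:
    def __init__(self, lease_expiry: float) -> None:
        self.lease_expiry = lease_expiry  # client's local lease timeout
        self.state = SessionState.VALID
        self.cache_enabled = True
        self.events = []                  # events delivered to the application

    def tick(self, now: float) -> None:
        """Advance the local timers to `now`."""
        if self.state is SessionState.VALID and now >= self.lease_expiry:
            self.state = SessionState.JEOPARDY
            self.cache_enabled = False    # cache is emptied and disabled
            self.events.append("jeopardy")
        if (self.state is SessionState.JEOPARDY
                and now >= self.lease_expiry + GRACE_PERIOD):
            self.state = SessionState.EXPIRED
            self.events.append("expired")

    def keepalive_ok(self, now: float, new_lease_expiry: float) -> None:
        """A KeepAlive exchange with the master succeeded at time `now`."""
        self.tick(now)
        if self.state is SessionState.EXPIRED:
            return                        # too late, the session is gone
        if self.state is SessionState.JEOPARDY:
            self.events.append("safe")
            self.cache_enabled = True
        self.state = SessionState.VALID
        self.lease_expiry = new_lease_expiry
```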
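Fail-Overs
(The labels M1 to M3 and C1 to C3 and the message numbers refer to the fail-over figure in the paper.) The old master holds session lease M1 for the client, while the client holds a conservative approximation C1. The master commits to lease M2 before informing the client through a KeepAlive reply, after which the client extends its view of the lease to C2. The old master then dies. Eventually the client’s lease (C2) expires; the client flushes its cache and starts a timer for the grace period. When a new master is elected, it uses a conservative approximation M3 of the session lease. The client’s first KeepAlive to the new master is rejected because it carries the wrong master epoch number. The retried request (6) succeeds but typically does not extend the master lease further because M3 was conservative. The reply (7) allows the client to extend its lease (C3) and to inform the application that its session is no longer in jeopardy.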
Backup
Every few hours, the master of each Chubby cell writes a snapshot of its database to a GFS file server in a different building. Using a separate building ensures that the backup survives building damage and that the backups introduce no cyclic dependencies, since a GFS cell in the same building might itself rely on the Chubby cell to elect its master. Backups provide both disaster recovery and a means of initializing the database of a newly replaced replica.
Scaling Mechanisms
Chubby’s clients are individual processes, and up to 90,000 clients have been seen communicating directly with a single Chubby master. Because the master’s machine is identical to those of the clients, the clients can overwhelm it, so the most effective techniques are those that reduce client-master communication.
Techniques:
Create an arbitrary number of Chubby cells, so that clients always use a nearby cell.
Increase lease times from the default 12s up to around 60s under heavy load, which reduces the number of KeepAlive messages.
Have clients cache file data and meta-data, which reduces the number of calls to the server.
Use protocol-conversion servers, which translate the Chubby protocol into less complex protocols.
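
A rough back-of-the-envelope view of why longer leases help. The sketch assumes that each client completes about one KeepAlive per lease interval; that is a simplifying assumption made here, not a figure from the slides, and the function name is hypothetical:

```python
def keepalives_per_second(num_clients: int, lease_seconds: float) -> float:
    """Rough KeepAlive rate, ASSUMING one KeepAlive per client per lease interval."""
    return num_clients / lease_seconds


if __name__ == "__main__":
    clients = 90_000  # the largest client count quoted on the slide
    for lease in (12.0, 60.0):
        print(f"lease {lease:>4.0f}s -> {keepalives_per_second(clients, lease):.0f} KeepAlives/s")
```

Under this assumption, moving from a 12s to a 60s lease cuts the KeepAlive rate at the master by a factor of five.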
Proxies
Proxies are used to reduce the load on a Chubby cell.
They can handle both KeepAlive and read requests, which make up the major part of RPC traffic.
They cannot reduce write traffic, but writes constitute less than 1% of Chubby’s workload.
If a proxy handles N_proxy clients, KeepAlive traffic is reduced by a factor of N_proxy.
They add an additional RPC to writes and first-time reads.
A proxied client may see unavailability roughly twice as often, because it depends on two machines that may fail: its proxy and the Chubby master.
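
The two proxy trade-offs can be put into numbers. This sketch uses hypothetical function names and illustrative inputs, and the availability estimate additionally assumes that proxy and master fail independently:

```python
import math


def sessions_at_master(num_clients: int, n_proxy: int) -> int:
    """Sessions the master must keep alive when each proxy fronts n_proxy clients."""
    return math.ceil(num_clients / n_proxy)


def unavailability_via_proxy(u_master: float, u_proxy: float) -> float:
    """Fraction of time a proxied client is cut off, ASSUMING independent failures."""
    return 1.0 - (1.0 - u_master) * (1.0 - u_proxy)


if __name__ == "__main__":
    # Illustrative numbers only: 90,000 clients behind proxies of 1,000 clients each.
    print(sessions_at_master(90_000, 1_000))
    # Two components that are each unavailable 0.1% of the time.
    print(unavailability_via_proxy(0.001, 0.001))
```

When both components are rarely unavailable, the combined figure is close to the sum of the two, which matches the "roughly doubled" intuition on the slide.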
Use and behaviour
(Statistics taken from the table in the paper.)
On average, 230k / 24k ≈ 10 clients use each cached file.
Few clients hold locks, and shared locks are rare.
KeepAlive messages dominate the RPC traffic.
61 outages were observed over a period of a few weeks.
Fail-over Problems
In the original design, the master wrote every new session to the database when it was created, which became a significant overhead when many sessions were created at once.
The server was modified to store a session only when it first attempts a modification, a lock acquisition, or an open of an ephemeral file.
As a result, young read-only sessions may not be recorded in the database, and they are discarded if a fail-over occurs.
These discarded sessions could read stale data for a while, if the recorded sessions check in with the new master before the discarded sessions’ leases expire.
Under the new design, sessions are not recorded in the database at all.
Problems encountered
Lack of Quotas
Chubby was never intended to be used as a storage system for large amounts of data, so it has no storage quotas.
One service wrote a module to keep track of data uploads, storing some meta-data in Chubby.
Other services started using this module.
As a result, a single 1.5 MByte file was rewritten in its entirety on each user action.
The space used by this service exceeded the space needs of all other Chubby clients combined.
File size was then limited to 256 kBytes, and the services were encouraged to migrate to more appropriate storage systems.
Lessons Learnt
Developers rarely consider availability.
Fine-grained locking could be ignored.
Poor API choices have unexpected effects.
RPC use affects transport protocols.
Summary
Chubby is a distributed lock service intended for coarse-grained synchronization of activities within Google’s distributed systems.
Caching, protocol-conversion servers, and simple load adaptation are used to scale to tens of thousands of client processes per Chubby instance.
GFS and Bigtable use Chubby to elect a primary from redundant replicas.
Chubby is a standard repository for files that require high availability.
Thank you