ZOOKEEPER
CONTENTS
- ZooKeeper Overview
- ZooKeeper Basics
- ZooKeeper Architecture
- Getting Started with ZooKeeper
ZOOKEEPER OVERVIEW
What is ZooKeeper?
- A high-performance coordination service for distributed applications (naming, configuration management, synchronization, and group services)
- Used to implement consensus, group management, leader election, and presence protocols
- Runs in Java and has bindings for both Java and C
Features of ZooKeeper
- Shared hierarchical namespace: consists of znodes (data registers) kept in memory
- High performance: can be used in large distributed systems
- Reliability: the service is replicated, which keeps it from being a single point of failure
- Strictly ordered access: ZooKeeper stamps each update with a number, so sophisticated synchronization primitives can be implemented at the client
- Replication: the service is replicated over a set of hosts called an ensemble
MASTER-WORKER APPLICATION
- Master crashes: if the master is faulty and becomes unavailable, the system cannot allocate new tasks or reallocate tasks from workers that have also failed.
- Worker crashes: if a worker crashes, the tasks assigned to it will not be completed.
- Communication failures: if the master and a worker cannot exchange messages, the worker might not learn of new tasks assigned to it.
REQUIREMENTS FOR MASTER-WORKER ARCHITECTURE
- Master election: it is critical for progress to have a master available to assign tasks to workers.
- Crash detection: the master must be able to detect when workers crash or disconnect.
- Group membership management: the master must be able to figure out which workers are available to execute tasks.
- Metadata management: the master and the workers must be able to store assignments and execution statuses in a reliable manner.
ZOOKEEPER BASICS
- ZooKeeper does not expose primitives directly. Instead, it exposes a file system-like API comprising a small set of calls, which applications use to implement their own primitives.
- Recipes: implementations of such primitives, built from ZooKeeper operations that manipulate small data nodes, called znodes, which are organized hierarchically as a tree, just like in a file system.
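API OVERVIEW
- A znode holds arbitrary data as a byte array.
- ZooKeeper does not allow partial writes or reads of znode data: a write replaces the data entirely, and a read returns all of it.
ZooKeeper API
- create /path data: creates a znode named /path containing data
- delete /path: deletes the znode /path
- exists /path: checks whether /path exists
- setData /path data: sets the data of znode /path to data
- getData /path: returns the data in /path
- getChildren /path: returns the list of children under /path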
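To make the six calls concrete, here is a minimal in-memory sketch of a znode tree. It is a toy model for illustration only, not the ZooKeeper client API: the class name `TinyZnodeTree`, the snake_case method names, and the error types are all assumptions made for this example.

```python
class TinyZnodeTree:
    """Toy in-memory model of a znode tree (NOT the ZooKeeper client API)."""

    def __init__(self):
        # The root znode always exists; each path maps to a byte array.
        self._data = {"/": b""}

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError(f"node exists: {path}")
        if self._parent(path) not in self._data:
            raise KeyError(f"no parent for: {path}")
        self._data[path] = bytes(data)

    def delete(self, path):
        if path == "/":
            raise ValueError("cannot delete the root")
        if self.get_children(path):
            raise ValueError(f"not empty: {path}")
        del self._data[path]

    def exists(self, path):
        return path in self._data

    def set_data(self, path, data):
        if path not in self._data:
            raise KeyError(f"no node: {path}")
        # No partial writes: the whole value is replaced.
        self._data[path] = bytes(data)

    def get_data(self, path):
        # No partial reads: the whole value is returned.
        return self._data[path]

    def get_children(self, path):
        if path not in self._data:
            raise KeyError(f"no node: {path}")
        prefix = path.rstrip("/") + "/"
        children = []
        for candidate in self._data:
            rest = candidate[len(prefix):]
            if candidate.startswith(prefix) and rest and "/" not in rest:
                children.append(rest)
        return sorted(children)


tree = TinyZnodeTree()
tree.create("/tasks")
tree.create("/tasks/t1", b"run job A")
tree.set_data("/tasks/t1", b"run job B")
```

The flat dictionary keyed by full path keeps the sketch short; a real server stores considerably more per znode (versions, timestamps, and so on), none of which is modeled here.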
DIFFERENT MODES FOR ZNODES
- Persistent znodes: can be deleted only through a call to delete.
- Ephemeral znodes: deleted automatically if the client that created them crashes or closes its connection to ZooKeeper. They can also be deleted explicitly.
- Sequential znodes: assigned a unique, monotonically increasing integer, which is appended to the path used to create the znode.
To summarize, there are four options for the mode of a znode: persistent, ephemeral, persistent_sequential, and ephemeral_sequential.
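The four modes can be illustrated with a small in-memory sketch. This is a toy model, not ZooKeeper itself: `ModeDemo`, its parameters, and the string session ids are hypothetical names chosen for the example. The zero-padded ten-digit suffix mirrors the format ZooKeeper uses for sequential znodes.

```python
class ModeDemo:
    """Toy model of znode modes (NOT the ZooKeeper client API)."""

    def __init__(self):
        self.nodes = {}     # path -> owning session id (None if persistent)
        self.counters = {}  # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequential=False):
        if ephemeral and session is None:
            raise ValueError("an ephemeral znode needs an owning session")
        if sequential:
            # Append a monotonically increasing counter kept per parent.
            parent = path.rsplit("/", 1)[0] or "/"
            number = self.counters.get(parent, 0)
            self.counters[parent] = number + 1
            path = f"{path}{number:010d}"
        if path in self.nodes:
            raise KeyError(f"node exists: {path}")
        self.nodes[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral znodes disappear when their creator's session ends.
        for path in [p for p, owner in self.nodes.items() if owner == session]:
            del self.nodes[path]


zk = ModeDemo()
zk.create("/tasks")                                   # persistent
first = zk.create("/tasks/task-", sequential=True)    # persistent_sequential
second = zk.create("/tasks/task-", sequential=True)
zk.create("/master", session="s1", ephemeral=True)    # ephemeral
zk.close_session("s1")                                # /master is removed
```

After `close_session("s1")`, only the persistent znodes remain, which is the behavior that makes ephemeral znodes useful for detecting crashed masters and workers.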
WATCHES AND NOTIFICATIONS
VERSIONS
ZOOKEEPER ARCHITECTURE
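ZOOKEEPER QUORUMS
- In quorum mode, ZooKeeper replicates its data tree across all servers in the ensemble.
- In public administration, a quorum is the minimum number of legislators required to be present for a vote. In ZooKeeper, it is the minimum number of servers that must be running and available for the service to work, and the minimum number that must have stored a client's data before the client is told the data is safely stored.
- For instance, suppose we have five ZooKeeper servers and a quorum of three. As long as any three servers have stored the data, the client can continue, and the other two servers will eventually catch up and store the data.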
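The arithmetic behind the five-server example can be sketched as follows, assuming a majority quorum (more than half of the ensemble). The function names are hypothetical and chosen for this illustration.

```python
def quorum_size(ensemble_size):
    """Smallest majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def is_safely_stored(acks, ensemble_size):
    """True once a quorum of servers has stored the update."""
    return acks >= quorum_size(ensemble_size)


five = (quorum_size(5), tolerated_failures(5))  # (3, 2)
four = (quorum_size(4), tolerated_failures(4))  # (3, 1)
```

Under this assumption a four-server ensemble tolerates no more failures than a three-server one, which is why odd ensemble sizes are the usual choice.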
SESSIONS
- Before executing any request against a ZooKeeper ensemble, a client must establish a session with the service.
- The client uses a TCP connection to communicate with a server, but the session may be moved to a different server if the client has not heard from its current server for some time.
- Sessions offer ordering guarantees: requests in a session are executed in FIFO (first in, first out) order.
- Typically, a client has only a single session open, so its requests are all executed in FIFO order.
STATES AND THE LIFETIME OF A SESSION
- The lifetime of a session is the period between its creation and its end.
Timeout (t is the session timeout)
- Server side: if the service does not see messages associated with a given session during time t, it declares the session expired.
- Client side: if the client has heard nothing from the server after 1/3 of t, it sends a heartbeat message to the server. At 2/3 of t, the client starts looking for a different server, and it has the remaining 1/3 of t to find one.
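The thresholds above can be written as a small decision function. This is a sketch of the timing rule as described on this slide, not the actual client implementation. The function name and the returned labels are hypothetical, and times are integers in milliseconds so the comparisons avoid floating-point rounding.

```python
def client_action(silence_ms, timeout_ms):
    """What happens after `silence_ms` without hearing from the server.

    `timeout_ms` is the session timeout t. Expiry itself is decided by the
    service, not by the client.
    """
    if silence_ms >= timeout_ms:
        return "service expires session"
    if 3 * silence_ms >= 2 * timeout_ms:  # at or past 2/3 of t
        return "look for another server"
    if 3 * silence_ms >= timeout_ms:      # at or past 1/3 of t
        return "send heartbeat"
    return "wait"


t = 9000  # example session timeout of 9 seconds
schedule = [client_action(ms, t) for ms in (2000, 3000, 6000, 9000)]
```

With a 9-second timeout, the client in this sketch sends a heartbeat after 3 seconds of silence and starts looking for another server after 6 seconds, leaving 3 seconds to reconnect.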
EXAMPLE OF CLIENT RECONNECTING
ZOOKEEPER WITH QUORUMS
IMPLEMENTING A PRIMITIVE: LOCKS WITH ZOOKEEPER
Here we discuss a simple recipe, just to illustrate how applications can use ZooKeeper.
Process
- We have n processes trying to acquire a lock.
- To acquire the lock, each process p tries to create a znode, say /lock, as an ephemeral znode.
- If p succeeds in creating the znode, it holds the lock and can proceed to execute its critical section. It releases the lock by deleting /lock. Because the znode is ephemeral, the lock is also released if p crashes.
- Other processes that try to create /lock fail as long as the znode exists. They watch for changes to /lock and try to acquire the lock again once they detect that /lock has been deleted.
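The recipe can be simulated in memory to show the control flow. This is a single-threaded toy, not a ZooKeeper client: `LockDemo`, `contender`, and the callback-based watch are hypothetical stand-ins. It assumes a watch fires once, when /lock is deleted, and must then be set again.

```python
class LockDemo:
    """Toy model of the /lock znode and its watches (NOT a ZooKeeper client)."""

    def __init__(self):
        self.owner = None   # session that created the ephemeral /lock, if any
        self.watchers = []  # one-shot callbacks fired when /lock is deleted

    def try_acquire(self, session, on_deleted):
        if self.owner is None:
            self.owner = session             # create /lock succeeded
            return True
        self.watchers.append(on_deleted)     # create failed: watch /lock
        return False

    def release(self, session):
        """Models both an explicit delete and the owner's session ending."""
        if self.owner != session:
            return
        self.owner = None
        pending, self.watchers = self.watchers, []
        for notify in pending:
            notify()                         # each watcher retries


lock = LockDemo()
acquired = []  # order in which processes obtained the lock


def contender(name):
    def attempt():
        if lock.try_acquire(name, attempt):
            acquired.append(name)
    return attempt


for process in ("p1", "p2", "p3"):
    contender(process)()
lock.release("p1")  # p2 and p3 are notified; p2 wins, p3 watches again
lock.release("p2")  # p3 is notified and acquires the lock
```

Every release notifies all waiting processes, and all but one of them fail again. This is fine for illustration, but with many contenders it causes a burst of retries on each release.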
IMPLEMENTATION OF A MASTER-WORKER EXAMPLE
We will implement some of the functionality of the master-worker example using the zkCli tool.
- Master: the master watches for new workers and tasks, assigning tasks to available workers.
- Worker: workers register themselves with the system, to make sure that the master sees they are available to execute tasks, and then watch for new tasks.
- Client: clients create new tasks and wait for responses from the system.
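As a rough picture of how the three roles interact, the following sketch models the shared state as plain dictionaries rather than using zkCli or a real ensemble. The parent paths `/workers`, `/tasks`, and `/assign`, the round-robin policy, and the function name `assign_tasks` are all assumptions made for this illustration.

```python
def assign_tasks(tree):
    """Master step: move every pending task under /assign/<worker>.

    `tree` maps each (assumed) parent path to a dict of child name -> data.
    Tasks are handed out round-robin, a policy chosen only for this sketch.
    """
    workers = sorted(tree["/workers"])
    if not workers:
        return {}  # nobody is registered, so tasks stay pending
    assignments = {}
    for index, task in enumerate(sorted(tree["/tasks"])):
        worker = workers[index % len(workers)]
        tree["/assign"].setdefault(worker, {})[task] = tree["/tasks"][task]
        assignments[task] = worker
    tree["/tasks"].clear()
    return assignments


tree = {"/workers": {}, "/tasks": {}, "/assign": {}}

# Workers register themselves so that the master can see them.
tree["/workers"]["worker1"] = "host1"
tree["/workers"]["worker2"] = "host2"

# Clients create new tasks.
tree["/tasks"]["task-0000000000"] = "cmd-a"
tree["/tasks"]["task-0000000001"] = "cmd-b"
tree["/tasks"]["task-0000000002"] = "cmd-c"

# The master assigns pending tasks to the available workers.
result = assign_tasks(tree)
```

In a real deployment the master would react to watches on the worker and task znodes instead of being called by hand, and workers would typically register with ephemeral znodes so that crashes are detected.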
THE MASTER ROLE(1/2)
THE MASTER ROLE(2/2)
THE WORKER ROLE
THE CLIENT ROLE(1/2)
THE CLIENT ROLE(2/2)
CONCLUSION
- ZooKeeper Overview: a high-performance coordination service for distributed applications
- ZooKeeper Basics: exposes a file system-like API; watches and notifications
- ZooKeeper Architecture: quorums and sessions