MAHADEV KONAR Apache ZooKeeper
What is ZooKeeper?
» A highly available, scalable, distributed coordination kernel
Use Cases
» Leader Election
» Group Membership
» Work Queues
» Event Notifications/Workflow Management
» Configuration Management
» Cluster Management
» Sharding
What is ZooKeeper again?
» A file API without partial reads/writes
» No renames
» Ordered updates and strong persistence guarantees
» Conditional updates (version)
» Watches for data changes
» Ephemeral znodes
» Generated file names
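The "conditional updates (version)" item can be illustrated with a toy in-memory model. This is only a sketch: `Znode`, `BadVersion`, and `set_data` are hypothetical names for illustration and are not the ZooKeeper client library. It assumes each znode carries a version counter that increases on every write, and that an expected version of -1 skips the check.

```python
class BadVersion(Exception):
    """Raised when the caller's expected version is stale."""


class Znode:
    """Toy znode: a data blob plus a version counter (illustrative only)."""

    def __init__(self, data):
        self.data = data
        self.version = 0


def set_data(node, data, expected_version):
    """Replace the data only if the caller saw the current version."""
    # -1 means "update unconditionally".
    if expected_version != -1 and expected_version != node.version:
        raise BadVersion(
            "expected %d, found %d" % (expected_version, node.version)
        )
    node.data = data          # data is always replaced in its entirety
    node.version += 1
    return node.version


node = Znode(b"a")
v1 = set_data(node, b"b", expected_version=0)   # succeeds, version 0 -> 1

try:
    # A second writer that still believes the version is 0 is rejected.
    set_data(node, b"c", expected_version=0)
    stale_rejected = False
except BadVersion:
    stale_rejected = True
```

The rejected writer would normally re-read the znode, get the new version, and retry.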
Data Model
» Hierarchical namespace
» Each znode has data and children
» Data is read and written in its entirety
(Diagram: example znode tree rooted at /, with znodes named apps, users, locks, servers, app1, read-1, master, regionserver)
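A minimal sketch of the data model, as a toy in-memory tree rather than the real service. The class `ZnodeTree` and the exact layout used below (`/apps/app1`, `/users`) are assumptions for illustration; the slide's diagram only gives the znode names.

```python
class ZnodeTree:
    """Toy hierarchical namespace: every path maps to one whole data blob."""

    def __init__(self):
        self._data = {"/": b""}   # the root znode always exists

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._data:
            raise KeyError("parent does not exist: " + parent)
        if path in self._data:
            raise KeyError("znode already exists: " + path)
        self._data[path] = data
        return path

    def get_data(self, path):
        # Reads return the entire value; there are no partial reads.
        return self._data[path]

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        names = []
        for p in self._data:
            if p != path and p.startswith(prefix):
                rest = p[len(prefix):]
                if "/" not in rest:      # direct children only
                    names.append(rest)
        return sorted(names)


tree = ZnodeTree()
tree.create("/apps")
tree.create("/users")
tree.create("/apps/app1", b"config-v1")
```

A znode here is both a "file" (it has data) and a "directory" (it can have children), which is the main difference from an ordinary file system.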
ZooKeeper API
» String create(path, data, acl, flags)
» void delete(path, expectedVersion)
» Stat setData(path, data, expectedVersion)
» (data, Stat) getData(path, watch)
» Stat exists(path, watch)
» String[] getChildren(path, watch)
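The `watch` argument on the read calls registers a one-time notification. The sketch below models that with a toy class; `WatchedZnode` is a hypothetical name, not part of any ZooKeeper library, and it assumes the one-shot behaviour in which a watch fires once on the next change and must then be set again.

```python
class WatchedZnode:
    """Toy znode whose watches fire once and are then discarded."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get_data(self, watcher=None):
        # Passing a watcher registers it for the next data change only.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set_data(self, data):
        self.data = data
        # Clear the list before notifying: each watch is one-shot.
        pending, self._watchers = self._watchers, []
        for notify in pending:
            notify("NodeDataChanged")


events = []
node = WatchedZnode(b"v0")
node.get_data(watcher=events.append)   # read and leave a watch
node.set_data(b"v1")                   # the watch fires here
node.set_data(b"v2")                   # no watch is registered any more
```

A client that wants to keep following a znode re-reads it with a new watch each time it is notified.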
ZooKeeper Service
» All servers store a copy of the data (in memory)
» A leader is elected at startup
» Followers serve clients; all updates go through the leader
» Update responses are sent when a majority of servers have persisted the change
(Diagram: ZooKeeper service made up of servers, one of them the leader, with clients connected to the servers)
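The majority rule is simple arithmetic, sketched here with hypothetical helper names. It assumes a plain majority quorum over a fixed-size ensemble.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def is_committed(acks, ensemble_size):
    """An update can be answered once a majority has persisted it."""
    return acks >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size):
    """Servers that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, "servers: quorum", quorum_size(n),
          "tolerates", tolerated_failures(n), "failures")
```

Four servers tolerate no more failures than three, which is why ensembles are usually sized with an odd number of servers.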
ZooKeeper and HBase
» Master failover
» Region Server and Master discovery via ZooKeeper
» HBase clients connect to ZooKeeper to find configuration data
» Region Server and Master failure detection
HBase and ZooKeeper as of now!
(Diagram: znode tree containing root-region-server, rs, and master)
» master: if more than one master is running, they fight
» root-region-server: this znode holds the location of the server hosting the root of all tables in HBase
» rs: a directory in which there is one znode per HBase region server
» Region Servers register themselves with ZooKeeper when they come online
» On Region Server failure (detected via ephemeral znodes and notification via ZooKeeper), the master splits the edits out per region
» shutdown
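The "masters fight" behaviour and failure detection via ephemeral znodes can be sketched with a toy model. Everything here is an assumption for illustration: `ToyZooKeeper`, `try_become_master`, the session names, and the `/master` path are hypothetical, and this is not how HBase is actually coded. The idea shown is only that an ephemeral znode can have one creator at a time and disappears when its creator's session ends.

```python
class NodeExists(Exception):
    """Raised when the znode is already held by some session."""


class ToyZooKeeper:
    """Toy model of ephemeral znodes tied to client sessions."""

    def __init__(self):
        self._owner = {}   # path -> id of the session that created it

    def create_ephemeral(self, path, session_id):
        if path in self._owner:
            raise NodeExists(path)
        self._owner[path] = session_id

    def expire_session(self, session_id):
        # Ephemeral znodes vanish together with their session.
        for path in [p for p, s in self._owner.items() if s == session_id]:
            del self._owner[path]

    def owner(self, path):
        return self._owner.get(path)


def try_become_master(zk, session_id, path="/master"):
    """Return True if this session won the race for the master znode."""
    try:
        zk.create_ephemeral(path, session_id)
        return True
    except NodeExists:
        return False


zk = ToyZooKeeper()
a_won = try_become_master(zk, "master-a")        # first creator wins
b_won = try_become_master(zk, "master-b")        # loser stays on standby
zk.expire_session("master-a")                    # master-a dies
b_won_later = try_become_master(zk, "master-b")  # standby takes over
```

In the real service the standby would learn of the deletion through a watch rather than by retrying blindly.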
Common Problems/Error Cases
» Garbage collection at the Region Servers
  » Causes ZooKeeper clients to stall
  » Session expiry
» Low throughput and connection loss
  » Mostly due to under-provisioned ZooKeeper instances
  » Disk and memory usage
» Bad usage example: NameNode, RegionServer, JobTracker, and ZooKeeper running on the same node
Release 3.3.0: what's in it for HBase?
» Allow configuration of session timeout min/max bounds
  » HBase needs large session timeouts
» Improved logging information to detect issues
» Improved debugging tools
» Improved documentation
» Improved performance and robustness
» Queue implementation available
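A sketch of how min/max bounds interact with a client's requested session timeout. The function names are hypothetical, and the 2000 ms tick time is an assumed example value. The defaults of 2 and 20 ticks follow the documented defaults for `minSessionTimeout` and `maxSessionTimeout`. The pause check is a simplification: it treats any stall longer than the negotiated timeout as a session expiry.

```python
def negotiate_session_timeout(requested_ms, tick_time_ms=2000,
                              min_ms=None, max_ms=None):
    """Clamp the client's requested timeout to the server's bounds."""
    if min_ms is None:
        min_ms = 2 * tick_time_ms    # assumed default lower bound
    if max_ms is None:
        max_ms = 20 * tick_time_ms   # assumed default upper bound
    return max(min_ms, min(requested_ms, max_ms))


def survives_pause(pause_ms, session_timeout_ms):
    """Simplified: a stall longer than the timeout expires the session."""
    return pause_ms < session_timeout_ms


# With default bounds, a 60 s request is cut down to 40 s.
default_timeout = negotiate_session_timeout(60000)

# Raising the upper bound lets the client keep the full 60 s.
raised_timeout = negotiate_session_timeout(60000, max_ms=120000)

# A hypothetical 45 s garbage-collection pause on a region server:
expired_with_default = not survives_pause(45000, default_timeout)
expired_with_raised = not survives_pause(45000, raised_timeout)
```

This is why configurable bounds matter for HBase: a region server that pauses for long collections needs a timeout larger than the default ceiling.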
Upcoming 3.4 release
» No ConnectionLoss
» Use Netty, to allow encryption
» Testing
  » Mockito
  » More backwards-compatibility testing
More ZooKeeper in HBase?
» Table schema and state in ZooKeeper
  » read only, online
» Region Server state transitions via ZooKeeper
» Store region assignment in ZooKeeper for each Region Server
Questions?