Elastic Consistent Hashing for Distributed Storage Systems


1 Elastic Consistent Hashing for Distributed Storage Systems
Wei Xie and Yong Chen, Texas Tech University. 31st IEEE International Parallel & Distributed Processing Symposium, May 29 – June 2, 2017, Buena Vista Palace Hotel, Orlando, Florida, USA.

2 Elastic Distributed Data Store
Elasticity means resizing (the number of active servers) dynamically as the workload varies (to save power). This is more difficult than elastic computing. The goal is to reduce power consumption (cost) or improve resource utilization.

3 How Resizing Works
3-way replication, 10 servers; the same color represents the same data. To power down the last 3 servers, the red and blue data must be migrated first (otherwise they become unavailable). Once that data is re-replicated, the last 3 servers can be powered down. To power the 8th server back on, its data must be re-integrated.

4 Resize Agility is Critical
Resizing agility (how fast a resize can take place) determines how well the workload variation is tracked. (Xu et al., SpringFS: Bridging Agility and Performance in Elastic Distributed Storage)

5 Resize Footprint Should be Small
If resizing introduces a lot of I/O workload, the number of servers that need to stay active increases. The longer and more intensive the resize I/O, the smaller the machine-hour savings.

6 What Determines Agility and Resize Footprint?
Size down (agility): data must be migrated so that the data on the powering-down servers does not become unavailable. Size up (resize footprint): data must be migrated to bring the data on the powering-on servers up to date.

7 Consistent Hashing based Distributed Storage Systems
Distributed hash tables/databases: Cassandra, Dynamo, Voldemort, Riak. Distributed storage/file systems: GlusterFS, Sheepdog, Ceph (uses a variant of CH). Advantage: scalability; adding or removing a server only requires a small amount of data to be moved.
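The scalability claim is easy to see in a toy example. Below is a minimal consistent-hash ring sketch in Python (illustrative only; the class and helper names are not from any of the systems listed above), showing that removing one of ten servers remaps only about a tenth of the keys:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a point on the ring (128-bit space)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: no virtual nodes, no replication."""

    def __init__(self, servers):
        self._ring = sorted((_hash(s), s) for s in servers)
        self._points = [p for p, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the first server clockwise from the key's hash."""
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

# Removing one of ten servers remaps only the keys in that server's
# ring segment; all other keys keep their assignment.
before = HashRing([f"server-{i}" for i in range(10)])
after = HashRing([f"server-{i}" for i in range(9)])
keys = [f"obj-{i}" for i in range(1000)]
moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
print(f"{moved} of {len(keys)} objects moved")   # roughly 1/10 of the keys
```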

8 Elasticity with Consistent Hashing
Consistent hashing is a good fit for achieving elasticity: resizing requires only a small amount of data movement. Size-down: move the data on the deactivating servers to other servers. Size-up: move data back to the activating servers. Can we do better?

9 Elasticity with Consistent Hashing
Can we do better? Size-down: how about no data movement at all? Size-up: how about moving only the modified data? How? Make sure data always has copies on active servers, and track which data have been modified.

10 Size-down with No Movement
The problem is to prevent the servers being powered down from forming a copyset of any data item. A copyset is a combination of servers that holds all replicas of a data item, e.g., {1, 2, 3}, {2, 3, 4}, {3, 4, 5}.
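To make the copyset notion concrete, here is a small illustrative sketch (the placement dictionary and function names are hypothetical, not from the paper) that enumerates the copysets of a replicated placement and checks whether a candidate power-down set would take some data item entirely offline:

```python
def copysets(placement):
    """placement maps object id -> list of servers holding its replicas.
    Returns the distinct copysets as frozensets of servers."""
    return {frozenset(servers) for servers in placement.values()}

def safe_to_power_down(placement, shutdown):
    """True if no object has all of its replicas on the shutdown servers."""
    shutdown = frozenset(shutdown)
    return all(not cs <= shutdown for cs in copysets(placement))

# Example: 3-way replication on 10 servers with a sequential placement,
# so the copysets are {0,1,2}, {1,2,3}, ..., {9,0,1}.
placement = {obj: [obj % 10, (obj + 1) % 10, (obj + 2) % 10] for obj in range(100)}
print(safe_to_power_down(placement, {7, 8, 9}))  # False: {7, 8, 9} is a copyset
print(safe_to_power_down(placement, {1, 5, 9}))  # True: no copyset lies inside it
```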

11 Size-down with No Movement
Solution 1: shut down servers that do not form a copyset, e.g., shut down {1, 5, 9}. Limitation: CH usually uses virtual nodes for load balancing, which significantly increases the number of copysets. Solution 2: change the data layout. Reserve some servers such that every copyset contains one of them, and never shut those servers down.

12 Non-elastic Data Layout
A typical pseudo-random data layout, as seen in most CH-based distributed file systems: almost all servers must be "on" to ensure 100% availability, so there is no elastic resizing capability.

13 Elastic Data Layout
General rule: take advantage of replication. Always keep the first (primary) replica "on"; the other replicas can be activated on demand.

14 Primary Server Layout
Peak write performance: N/3 (the same as the non-elastic layout). Scaling is limited to N/3 servers.

15 Equal-work Data Layout
To scale down even further (fewer primary servers), each server must store a different amount of data. To achieve proportional performance scaling (performance ~ number of active servers), an equal-work data layout is needed. Servers may need different capacities to ensure balanced storage utilization.

16 Primary-server Layout with CH
Modifies the data placement of the original CH so that one replica is always placed on a primary server. To achieve the equal-work layout, the cluster must be configured accordingly.
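A rough sketch of this placement rule on top of consistent hashing, assuming (purely as an illustration, not the paper's exact algorithm) that the first replica is chosen from a ring restricted to the always-on primary servers and the remaining replicas follow ordinary consistent hashing over the full server set:

```python
import hashlib

RING_SIZE = 2 ** 128   # md5 gives 128-bit hash values

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def _clockwise_distance(key: str, server: str) -> int:
    return (_hash(server) - _hash(key)) % RING_SIZE

def place(key, primaries, all_servers, replicas=3):
    """Primary-server placement: the first replica always lands on a primary
    (always-on) server; the others follow normal CH over all servers."""
    chosen = [min(primaries, key=lambda s: _clockwise_distance(key, s))]
    for s in sorted(all_servers, key=lambda s: _clockwise_distance(key, s)):
        if len(chosen) == replicas:
            break
        if s not in chosen:
            chosen.append(s)
    return chosen

primaries = ["s0", "s1", "s2"]
servers = [f"s{i}" for i in range(10)]
print(place("object-42", primaries, servers))   # first entry is always a primary
```

With this rule every copyset contains a primary server, so any subset of the non-primary servers can be deactivated without making data unavailable.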

17 Equal-work Data Layout
Number of data chunks on a primary server:
Number of data chunks on a secondary server:
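A plausible form of these counts, assuming the equal-work layout of Rabbit/SpringFS that this design builds on (a reconstruction, not the slide's own notation), with B total data chunks, p primary servers, and servers numbered 1..N:

```latex
% Reconstruction under the Rabbit/SpringFS equal-work layout (assumed, not verbatim):
% every primary server stores an equal share, while the share of secondary
% server x shrinks as 1/x so that performance scales with the active-server count.
\[
  \text{chunks on each primary server} = \frac{B}{p},
  \qquad
  \text{chunks on secondary server } x = \frac{B}{x}, \quad p < x \le N.
\]
```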

18 Data Re-integration
After a node is powered down, no data is written to it. When the node joins again, any newly created or modified data may need to be re-integrated to it.

19 Data Re-integration
Data re-integration incurs a lot of I/O and degrades performance (or requires extra machines) when scaling up. 3-phase workload: high load -> low load -> high load. No resizing: 10 servers always on. With resizing: 10 servers -> 2 servers -> 10 servers.

20 Selective Data Re-Integration
Each resize is associated with a version and a membership table. A dirty table tracks all OIDs that are dirty; when re-integration of an OID finishes, it is removed from the table. The rate of re-integration is controlled by issuing a fixed number of OIDs per minute, as sketched below.
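A minimal sketch of this mechanism (the `DirtyTable` class, its method names, and the one-OID-at-a-time pacing are illustrative assumptions, not the paper's implementation):

```python
import time

class DirtyTable:
    """Tracks OIDs written while some replicas are offline.
    Each resize bumps a version and records the new membership."""

    def __init__(self):
        self.version = 0
        self.membership = set()
        self.dirty = set()              # OIDs whose offline replicas are stale

    def resize(self, active_servers):
        self.version += 1
        self.membership = set(active_servers)

    def record_write(self, oid):
        self.dirty.add(oid)

    def reintegrate(self, copy_object, oids_per_minute=600):
        """Re-replicate only the dirty objects, at a bounded rate, so the
        background traffic does not swamp foreground I/O."""
        interval = 60.0 / oids_per_minute
        while self.dirty:
            oid = self.dirty.pop()
            copy_object(oid)            # push a fresh replica to the rejoined server
            time.sleep(interval)        # throttle: fixed number of OIDs per minute
```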

21 Implementation
Primary-secondary data placement/replication is implemented in Sheepdog. 10 Sheepdog servers, 1 client server. The cluster is resized manually by killing and starting Sheepdog processes. Sheepdog uses Corosync to manage membership. Dirty-data tracking is implemented using Redis.
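For the Redis-based dirty-data tracking, a Redis set of OIDs is one natural fit; a hypothetical sketch using redis-py (the key name `dirty_oids` and these exact calls are assumptions, not details taken from the paper):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def mark_dirty(oid: str):
    """Called on every write that lands while some replicas are powered down."""
    r.sadd("dirty_oids", oid)

def next_dirty_oid():
    """Pop one OID to re-integrate; returns None when the table is empty."""
    oid = r.spop("dirty_oids")
    return oid.decode() if oid is not None else None
```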

22 Evaluation
Without selective re-integration, the size-up has a delay, which means more servers must be powered up to reach the same performance (and hence fewer machine-hour savings). With selective re-integration, the size-up delay is much smaller. The experiment demonstrates the effectiveness, but it is small scale.

23 Large-scale Trace Analysis
Use the Cloudera traces (CC-a: <100 machines, 1 month, 69 TB; CC-b: 300 machines, 9 days, 473 TB). Apply our policy and analyze the effect of resizing: compute the number of servers ideally needed for the workload, add delays for clean-up, and add servers to compensate for the re-integration workload.

24 Machine Hour Saving
Machine hours compared to the ideal resizing:
Trace   Original CH   + full   + selective
CC-a    1.32          1.24     1.21
CC-b    1.51          1.37     1.33
The elastic layout gives 6% to 10% extra savings in machine hours; selective data re-integration gives a further 2% improvement.

25 Summary
We propose a primary-secondary data placement/replication scheme to provide better elasticity in consistent-hashing-based data stores. We use a selective background data re-integration technique to reduce the I/O footprint when re-integrating nodes into a cluster. This is the first work studying elasticity for saving power in consistent-hashing-based stores. Future work: study the problem that capacity may fill up when scaling down; test larger cluster configurations.

26 Questions!
Welcome to visit our website for more details. DISCL lab: Personal site:

