Elastic Consistent Hashing for Distributed Storage Systems

Elastic Consistent Hashing for Distributed Storage Systems
Wei Xie and Yong Chen, Texas Tech University
31st IEEE International Parallel & Distributed Processing Symposium, May 29 – June 2, 2017, Buena Vista Palace Hotel, Orlando, Florida, USA

Elastic Distributed Data Store
- Elasticity: dynamically resizing the number of active servers as the workload varies (to save power)
- More difficult to achieve than elastic computing
- Goal: reduce power consumption (cost) or improve resource utilization

How Resizing Works
- Setup: 3-way replication, 10 servers; the same color represents the same data
- To power down the last 3 servers, the red and blue data must first be migrated (otherwise they become unavailable)
- Once the data is re-replicated, the last 3 servers can be powered down
- When the 8th server is powered back on, its data must be re-integrated

Resize Agility Is Critical
- Resizing agility (how quickly a resize can take place) determines how well workload variation is tracked
- Xu et al., "SpringFS: Bridging Agility and Performance in Elastic Distributed Storage"

Resize Footprint Should Be Small
- If resizing introduces heavy I/O, the number of servers that must stay active increases
- The longer and more intensive the resize I/O, the smaller the machine-hour savings

What Determines Agility and Resize Footprint?
- Size down: data must be migrated so that data on the powering-down servers does not become unavailable (determines agility)
- Size up: data must be migrated to bring the data on the powering-on servers up to date (determines the resize footprint)

Consistent Hashing Based Distributed Storage Systems
- Distributed hash tables/databases: Cassandra, Dynamo, Voldemort, Riak
- Distributed data storage/file systems: GlusterFS, Sheepdog, Ceph (uses a variant of CH)
- Advantage: scalability; adding or removing a server requires only a small amount of data to be moved
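To make the mechanism concrete, here is a minimal consistent-hashing ring with virtual nodes and replica lookup. This is an illustrative sketch, not the placement code of any of the systems named above; the hash function and virtual-node count are arbitrary choices.

```python
import hashlib
from bisect import bisect_right

def _hash(key):
    # Map a key to a point on the ring (MD5 is an arbitrary, hypothetical choice).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers, vnodes=4):
        # Each physical server contributes `vnodes` virtual nodes for load balance.
        self.ring = sorted(
            (_hash(f"{s}#{v}"), s) for s in servers for v in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    def lookup(self, key, replicas=3):
        # Walk clockwise from the key's hash, collecting distinct servers.
        i = bisect_right(self.points, _hash(key))
        owners = []
        while len(owners) < replicas:
            server = self.ring[i % len(self.ring)][1]
            if server not in owners:
                owners.append(server)
            i += 1
        return owners

ring = ConsistentHashRing([f"server{i}" for i in range(10)])
print(ring.lookup("object-42"))  # three distinct servers holding the replicas
```

Because only the ring segments adjacent to a joining or leaving server change ownership, adding or removing a server moves only a small fraction of the data, which is the property the elasticity scheme builds on.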

Elasticity with Consistent Hashing
- Consistent hashing is a good fit for achieving elasticity: resizing moves little data
- Size down: move the data on the deactivating servers to other servers
- Size up: move data back to the activating servers
- Can we do better?

Elasticity with Consistent Hashing
- Can we do better?
- Size down: how about no data movement at all?
- Size up: how about moving only the modified data?
- How to do it?
- Ensure that data always has copies on active servers
- Track which data has been modified

Size-down with No Movement
- The problem: prevent the set of powered-down servers from containing a complete copyset of any data
- Copyset: a combination of servers that together hold all replicas of a data item, e.g., {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

Size-down with No Movement
- Solution 1: shut down only servers that do not together form a copyset, e.g., shut down {1, 5, 9}
- Limitation: CH usually uses virtual nodes for load balance, which significantly increases the number of copysets
- Solution 2: change the data layout
- Reserve a set of servers such that every copyset contains one of them, and never shut those servers down
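Solution 1 reduces to a simple set check: a group of servers is safe to power down if and only if no copyset is entirely contained in it. A small sketch, with a hypothetical list of copysets for a 10-server, 3-way-replicated cluster:

```python
# Hypothetical copysets for a 10-server cluster with 3-way replication,
# extending the slide's example {1,2,3}, {2,3,4}, {3,4,5}.
copysets = [
    {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {5, 6, 7},
    {6, 7, 8}, {7, 8, 9}, {8, 9, 10}, {9, 10, 1}, {10, 1, 2},
]

def safe_to_shutdown(candidates, copysets):
    # Powering down `candidates` is safe iff no copyset lies entirely
    # inside it (otherwise all replicas of some data go offline).
    candidates = set(candidates)
    return not any(cs <= candidates for cs in copysets)

print(safe_to_shutdown({1, 5, 9}, copysets))  # safe: no copyset lost
print(safe_to_shutdown({1, 2, 3}, copysets))  # unsafe: {1,2,3} is a copyset
```

The limitation noted on the slide follows directly: with virtual nodes, the number of distinct copysets grows quickly, so the set of candidate groups that pass this check shrinks toward nothing.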

Non-elastic Data Layout
- A typical pseudo-random data layout, as seen in most CH-based distributed file systems
- Almost all servers must be on to ensure 100% availability
- No elastic resizing capability

Elastic Data Layout
- General rule: take advantage of replication
- Always keep the servers holding the first (primary) replicas on
- The other replicas can be activated on demand

Primary Server Layout
- Peak write performance: N/3 (same as non-elastic)
- Scaling is limited to N/3 servers

Equal-work Data Layout
- To scale down further (fewer primary servers), servers must store different amounts of data
- To achieve proportional performance scaling (performance ~ number of active servers), an equal-work data layout is needed
- Servers may need different capacities to keep storage utilization balanced

Primary-server Layout with CH
- Modifies the data placement of original CH so that one replica is always placed on a primary server
- To achieve the equal-work layout, the cluster must be configured accordingly
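The modification can be sketched as a two-stage placement: hash the first replica over the always-on primaries only, and the remaining replicas over the secondaries. This sketch uses rendezvous (highest-random-weight) hashing in place of the paper's ring for brevity; the server names and split are hypothetical, not the paper's configuration.

```python
import hashlib

# Hypothetical cluster: 3 always-on primary servers, 7 secondaries.
PRIMARIES = ["s0", "s1", "s2"]
SECONDARIES = ["s3", "s4", "s5", "s6", "s7", "s8", "s9"]

def score(server, oid):
    # Rendezvous-hash score of (server, object) pairs.
    return int(hashlib.md5(f"{server}:{oid}".encode()).hexdigest(), 16)

def place(oid, replicas=3):
    # First (primary) replica: chosen only among the always-on primaries,
    # so every object stays available however many secondaries are off.
    first = max(PRIMARIES, key=lambda s: score(s, oid))
    # Remaining replicas: top-scoring secondary servers.
    rest = sorted(SECONDARIES, key=lambda s: score(s, oid), reverse=True)[:replicas - 1]
    return [first] + rest

print(place("obj-123"))  # first entry is always a primary server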

Equal-work Data Layout
- Number of data chunks on a primary server: (formula shown on slide)
- Number of data chunks on a secondary server: (formula shown on slide)
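The slide's formulas did not survive transcription. As a plausible reconstruction of an equal-work layout (an assumption, not necessarily the paper's exact formulas): with B data chunks and p primary servers out of N, each primary holds about B/p primary chunks, while non-primary server i (p < i <= N) holds about B/i chunks, so any prefix of active servers shares write work roughly equally.

```python
def equal_work_layout(B, N, p):
    # Hypothetical equal-work layout: p primaries each hold ~B/p chunks;
    # secondary server i (p < i <= N, 1-indexed) holds ~B/i chunks.
    layout = {i: B // p for i in range(1, p + 1)}
    layout.update({i: B // i for i in range(p + 1, N + 1)})
    return layout

layout = equal_work_layout(B=6000, N=10, p=2)
print(layout)  # chunk counts shrink for later (more often powered-down) servers
```

The decreasing chunk counts are what make performance scale roughly in proportion to the number of active servers, and also why the slide notes that servers may need different capacities.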

Data Re-integration
- After a node is powered down, no data is written to it
- When the node rejoins, newly created or modified data may need to be re-integrated onto it

Data Re-integration
- Re-integration incurs many I/O operations and degrades performance (or requires extra machines) when scaling up
- 3-phase workload: high load -> low load -> high load
- No resizing: 10 servers always on
- With resizing: 10 servers -> 2 servers -> 10 servers

Selective Data Re-integration
- Each resize is associated with a version and a membership table
- A dirty table tracks all OIDs that are dirty
- When re-integration of an OID finishes, it is removed from the table
- The re-integration rate is controlled by limiting the number of OIDs issued per minute
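The dirty-table mechanism above can be sketched in a few lines. This is an illustrative in-memory version (the talk's implementation uses Redis); the class and method names are invented for the sketch.

```python
class DirtyTable:
    """Tracks OIDs modified while servers were powered down (sketch only;
    the talk implements this tracking with Redis)."""

    def __init__(self):
        self.dirty = {}  # oid -> membership version at modification time

    def mark(self, oid, version):
        # Called on every write while the cluster is sized down.
        self.dirty[oid] = version

    def reintegrate(self, rate_per_min, copy_fn):
        # Throttled re-integration: issue at most `rate_per_min` OIDs per
        # minute; copy_fn ships the fresh replica to the returning node.
        batch = list(self.dirty)[:rate_per_min]
        for oid in batch:
            copy_fn(oid)
            del self.dirty[oid]  # finished: remove the OID from the table
        return len(batch)

dt = DirtyTable()
for i in range(5):
    dt.mark(f"oid-{i}", version=1)
copied = []
dt.reintegrate(rate_per_min=3, copy_fn=copied.append)
print(len(copied), len(dt.dirty))  # 3 copied, 2 still pending
```

Capping the per-minute OID count is what keeps re-integration I/O from competing with foreground traffic, which is the "resize footprint" concern from earlier slides.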

Implementation
- Primary-secondary data placement/replication implemented in Sheepdog
- 10 Sheepdog servers, 1 client server
- Cluster resized manually by killing and starting Sheepdog processes
- Sheepdog uses Corosync to manage membership
- Dirty-data tracking implemented with Redis

Evaluation
- Without selective re-integration, size-up has a delay, so more servers must be powered up to achieve the same performance (i.e., fewer machine-hour savings)
- With selective re-integration, the size-up delay is much smaller
- The experiment demonstrates effectiveness, but at small scale

Large-scale Trace Analysis
- Cloudera traces: CC-a (<100 machines, 1 month, 69 TB); CC-b (300 machines, 9 days, 473 TB)
- Apply our policy and analyze the effect of resizing
- Start from the number of servers ideally needed for the workload
- Add delays for clean-up
- Add servers to compensate for the re-integration workload

Machine Hour Saving
Machine hours relative to ideal resizing:

Trace | Original CH | + full re-integration | + selective re-integration
CC-a  | 1.32        | 1.24                  | 1.21
CC-b  | 1.51        | 1.37                  | 1.33

- The elastic layout gives 6% to 10% extra savings in machine hours
- Selective data re-integration gives a further ~2% improvement

Summary
- We propose a primary-secondary data placement/replication scheme to provide better elasticity in consistent-hashing-based data stores
- We use a selective background data re-integration technique to reduce the I/O footprint when re-integrating nodes into a cluster
- First work to study elasticity for power saving in consistent-hashing-based stores
- Future work: handle the case where capacity becomes full when scaling down; test larger cluster configurations

Questions?
Welcome to visit our website for more details:
- DISCL lab: http://discl.cs.ttu.edu/
- Personal site: https://sites.google.com/site/harvesonxie/