Pattern-Directed Replication Scheme for Heterogeneous Object-based Storage
Jiang Zhou, Wei Xie, Dong Dai, and Yong Chen
Data-Intensive Scalable Computing Lab, Texas Tech University
CCGrid 2017

Outline
- Motivation
- Pattern-Directed Replication Scheme
  - Architecture Overview
  - I/O Trace Collection
  - Data Access Pattern Analysis
  - Pattern-Directed Replication
- Evaluation
- Conclusion & Future Work

Here is the outline of the presentation. I will begin with the motivation, then present the pattern-directed replication scheme and its evaluation, and close with conclusions and future work.

Motivation
- Object-based storage is an important storage architecture (e.g., Lustre, Ceph, Sheepdog)
  - Objects are the logical entities
  - Exploits the inherent advantages of storage devices
- Heterogeneous storage systems are promising
  - Magnetic media (hard disk drives, HDDs)
  - Non-volatile storage-class memory (SCM)
    - High bandwidth, low latency, and no mechanical components
    - Limited by small capacity, short lifetime, and high cost

The motivation of our work is to understand data access patterns for data replication in heterogeneous object-based storage systems. The object-based storage model, which abstracts files as multiple data objects stored on object-based devices, has become increasingly important for distributed storage systems in data centers and has been widely adopted in production systems such as Lustre, Ceph, and Sheepdog. Recently, heterogeneous storage systems that take advantage of newly emerged devices, such as non-volatile storage-class memory (SCM) including solid-state drives and phase-change memory, in addition to conventional hard disk drives (HDDs), have gained increasing attention. Compared with HDDs, SCM offers high bandwidth, low latency, and freedom from mechanical components, but is limited by small capacity, short lifetime, and high cost.

Motivation
- Replication is a common approach to enhance data availability in storage systems
  - Automatically copies data and distributes multiple replicas
  - Data can be retrieved from other replicas if one copy is inaccessible or corrupted
- Limitations of current replication schemes for object-based storage
  - Little work on data replication for heterogeneous object-based storage systems
  - They cannot take full advantage of heterogeneous device merits
  - They do not consider object access patterns well
    - Placing hot, frequently accessed data on SSDs and cold, sequentially accessed data on HDDs
    - Does not consider access correlation among objects

On the other hand, replication is a common approach to enhance data availability in storage systems. Its core idea is to automatically replicate data objects and distribute them across multiple devices; data can then be retrieved from other replicas if one copy is inaccessible or corrupted. Although numerous studies have been devoted to the design and development of data replication schemes, there has been little work on data replication for heterogeneous object-based storage systems. As the performance and capacity of heterogeneous devices differ, replicas should be placed according to device characteristics. Existing replication strategies mainly use frequency or sequentiality to work with heterogeneous storage, placing replicas of hot, frequently accessed data on SSDs and cold, sequentially accessed data on HDDs. However, this does not consider object access patterns well. For instance, two strongly correlated objects that are normally accessed successively in time might be placed on an HDD and an SSD, respectively, which leads to poor performance when the same access pattern appears again in the future.

Pattern-Directed Replication Scheme (PRS)
- Addresses two technical challenges
  - Analyzing object access patterns to guide data replication
  - Optimizing replica layout to take full advantage of heterogeneous device characteristics
- Approach
  - Selectively replicate data objects
  - Group objects based on their temporal locality or access frequency and reorganize them for replication
  - Use a pseudo-random algorithm to optimize replica layout, considering device bandwidth and capacity features

This work introduces a novel pattern-directed replication scheme (PRS) to achieve highly efficient data replication in heterogeneous object-based storage systems. The promise of PRS relies on addressing two technical challenges: analyzing object access patterns to guide data replication, and optimizing the replica layout to take full advantage of heterogeneous device characteristics. Different from traditional schemes, PRS selectively replicates data objects and distributes replicas to different storage devices according to their features. It reorganizes objects using object distance calculation and arranges replicas according to the application's identified data access patterns. In addition, PRS uses a pseudo-random algorithm to optimize the replica layout by considering the performance and capacity features of storage devices.

Architecture Overview

Next, we introduce the detailed design of the pattern-directed replication scheme (PRS) for heterogeneous object-based storage systems. First, as shown in the figure, applications running on physical machines or virtual machines (VMs) send I/O requests to the backend storage system. Data objects are placed with the original data layout for the first replica, e.g., distributed via the consistent hashing algorithm. Second, the second and third replicas of an object are created by PRS after pattern analysis. Data access pattern analysis is based on the runtime object I/O requests traced by a trace collector on each storage node. PRS analyzes the traces to find data access patterns (this can be performed periodically in real time or offline), groups objects based on the identified access patterns for the second and further replicas, and places them with a replica distribution algorithm in the background. Third, when the same application runs again, its I/O requests are redirected to the new copies in the optimized layout for better performance, provided the objects have been replicated with PRS.
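To make the first-replica placement concrete, below is a minimal consistent-hashing sketch in Python. It is an illustration only, not Sheepdog's actual implementation; the hash function, virtual-node count, and naming are our assumptions.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit hash of a key (illustrative choice, not Sheepdog's own hash).
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

class HashRing:
    """Minimal consistent-hashing ring for first-replica placement."""
    def __init__(self, nodes, vnodes=64):
        # Each node is hashed onto the ring at several virtual points.
        self._points = sorted((_hash(f"{n}#{i}"), n)
                              for n in nodes for i in range(vnodes))
        self._keys = [p for p, _ in self._points]

    def locate(self, object_id: str) -> str:
        # The object goes to the first virtual point clockwise from its hash.
        i = bisect.bisect(self._keys, _hash(object_id)) % len(self._points)
        return self._points[i][1]

ring = HashRing([f"node{i}" for i in range(16)])
print(ring.locate("vdi-obj-000042"))  # maps this object to one of the 16 nodes
```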

I/O Trace Collection
- Collected aspects of an object request:
  - Object ID
  - Start offset in the object
  - Size of the access
  - Access time
  - Operation code
- Example: an FIO trace on Sheepdog (one node's trace; columns: time, opcode, object ID, offset, size)

An I/O trace collector is responsible for collecting the runtime object information of the object-based storage. We collect the five aspects of each object request listed above for further pattern analysis. The figure on the right shows an example FIO trace on Sheepdog, a distributed storage system for virtual machine storage. In a distributed storage system, each storage node generates one trace file containing all the I/O requests that the node receives or forwards. The trace file is used for data access pattern analysis and for object reorganization for replication on that node.
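As a sketch, a trace record with these five fields and its parsing might look as follows; the field order, line format, and names are assumptions for illustration, since the exact trace format is not specified here.

```python
from dataclasses import dataclass

@dataclass
class TraceRecord:
    time: float     # access time (seconds)
    opcode: str     # operation code, e.g. 'READ' or 'WRITE'
    object_id: int  # object identifier
    offset: int     # start offset within the object (bytes)
    size: int       # size of the access (bytes)

def parse_line(line: str) -> TraceRecord:
    # Assumed whitespace-separated layout: time opcode object_id offset size
    t, op, oid, off, sz = line.split()
    return TraceRecord(float(t), op, int(oid, 16), int(off), int(sz))

rec = parse_line("12.0481 READ 0x807c2b0000000a 1048576 262144")
print(rec.object_id, rec.size)
```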

Data Access Pattern
- Object division and placement
  - The scheme is geared toward data replication for virtual machine storage
  - A file is divided into fixed-size 4MB objects when stored

The figure shows a typical data placement for a virtual machine using object-based storage: a 40GB image file is divided into fixed-size 4MB objects when stored.
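With fixed 4MB objects, mapping a byte range of the image to objects is simple arithmetic: a 40GB image comprises 40 * 1024 / 4 = 10,240 objects. A minimal sketch (the helper is ours, for illustration):

```python
OBJ_SIZE = 4 * 1024 * 1024  # fixed 4MB object size

def objects_for_range(offset: int, length: int):
    """Yield (object_index, offset_in_object, bytes_in_object) for a byte range."""
    end = offset + length
    while offset < end:
        idx = offset // OBJ_SIZE
        in_obj = offset % OBJ_SIZE
        n = min(OBJ_SIZE - in_obj, end - offset)
        yield idx, in_obj, n
        offset += n

# A 256KB request straddling an object boundary touches two objects:
print(list(objects_for_range(OBJ_SIZE - 128 * 1024, 256 * 1024)))
```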

Two Types of Object Correlation
- Local data access pattern
  - A temporal correlation: reflects the object access order within a period of time
  - Covers objects that are accessed together in a given interval and accessed periodically
- Global data access pattern
  - Represents the objects that are most frequently accessed during the application's run time
  - A configuration-aware method finds hot objects: a certain percentage of objects is considered hot according to the proportion of HDD and SSD capacity

To capture object access behavior, we define two types of data access patterns: a local data access pattern and a global data access pattern. The local pattern represents a temporal correlation, reflecting the object access order within a period of time; objects that are accessed together in a given interval and accessed periodically belong to this pattern. For the global pattern, the objects' behavior is no longer limited to accesses within a period of time; instead, it represents the objects that are most frequently accessed during the application's run time. We adopt a configuration-aware method to find the hot objects belonging to the global pattern, meaning a certain percentage of objects is considered hot data according to the proportion of HDD and SSD space capacity.
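The capacity-proportional cutoff for the global pattern might be implemented as in the sketch below (reusing TraceRecord from the earlier sketch; the frequency ranking and exact threshold policy are our assumptions):

```python
from collections import Counter

def hot_objects(trace, ssd_capacity: int, hdd_capacity: int) -> set:
    """Pick the top-k most frequently accessed objects as the global pattern,
    with k proportional to the SSD share of the total capacity."""
    freq = Counter(rec.object_id for rec in trace)  # access count per object
    ssd_share = ssd_capacity / (ssd_capacity + hdd_capacity)
    k = max(1, int(len(freq) * ssd_share))
    return {oid for oid, _ in freq.most_common(k)}
```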

Pattern-Directed Replication Scheme
- Groups objects that belong to local or global patterns
  - Using the object clustering algorithm
  - Objects in the same group/pattern are combined into a new object and replicated
- First replicates objects (hot data) in the global pattern
- Then replicates objects that belong to local patterns

With the object clustering algorithm, PRS groups objects by local or global pattern. Objects in the same group are combined into a new object and replicated. For the identified data access patterns, PRS first replicates objects in the global pattern, meaning a proportion of hot objects is grouped for replication first. Objects that do not belong to any global pattern are then grouped according to the identified local patterns.
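A sketch of this two-stage grouping follows, assuming local patterns arrive as lists of correlated object IDs and reusing hot_objects from the previous sketch; how the real clustering algorithm emits its groups is not specified here.

```python
def build_replica_groups(trace, local_patterns, ssd_cap, hdd_cap):
    """Return groups of object IDs to pack into new replica objects:
    the global (hot) group first, then each local pattern minus hot objects."""
    hot = hot_objects(trace, ssd_cap, hdd_cap)  # global pattern replicated first
    groups = [sorted(hot)]
    for pattern in local_patterns:              # then the local patterns
        rest = [oid for oid in pattern if oid not in hot]
        if rest:
            groups.append(rest)
    return groups
```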

Optimized Replica Placement
- Adopts a pseudo-random number algorithm to optimize replica placement
  - Heterogeneous nodes are first assigned to segments with different ranges according to each node's average bandwidth
  - Data is then mapped to one node/range by pseudo-random hashing functions
  - For m replicas, m random numbers are generated to map the m copies to different nodes/ranges

PRS first assigns heterogeneous nodes to segments on a number line; each node receives one or more segments according to its average bandwidth. After the segment assignment, data is mapped to a segment by pseudo-random hash functions. We choose pseudo-random number mapping because it distributes data proportionally to segment length. If the number of replicas is m, m random numbers are generated, mapping the m copies to different segments in the random number sequence. Since a faster node may hold more data for higher performance, replicas are placed on other nodes whenever one or more storage nodes lack sufficient space, preserving load balance.
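A minimal sketch of this bandwidth-proportional mapping follows; Python's random module is a stand-in for the SFMT generator mentioned in the evaluation, and the per-object seeding and distinct-node policy are our assumptions.

```python
import bisect
import random

def place_replicas(object_id: int, nodes, m: int = 3):
    """nodes: list of (name, avg_bandwidth) pairs, with m <= len(nodes).
    Returns m distinct nodes, chosen proportionally to bandwidth."""
    names = [n for n, _ in nodes]
    cum, total = [], 0.0
    for _, bw in nodes:             # cumulative bandwidth defines the segments
        total += bw
        cum.append(total)
    rng = random.Random(object_id)  # deterministic sequence per object
    chosen = []
    while len(chosen) < m:
        x = rng.random() * total    # pseudo-random point on the number line
        node = names[min(bisect.bisect(cum, x), len(cum) - 1)]
        if node not in chosen:      # each replica goes to a different node
            chosen.append(node)
    return chosen

nodes = [("ssd0", 500), ("ssd1", 500), ("hdd0", 150), ("hdd1", 150)]
print(place_replicas(0x807c2b, nodes))  # SSD nodes are picked more often
```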

Evaluation
- PRS evaluated on Sheepdog
  - SIMD-oriented Fast Mersenne Twister (SFMT) used to generate pseudo-random numbers
  - 32-node cluster with Linux kernel 2.6.32
    - 16 storage nodes: half HDD nodes, half SSD nodes
    - 16 client nodes hosting VMs
- Benchmark
  - 1 to 16 clients launch FIO on different virtual machines
  - Each job accesses an independent 100MB file asynchronously with a 256KB request size

We present the evaluation results of the proposed PRS based on the Sheepdog storage system. The experiments were conducted on a local 32-node cluster, including 16 storage nodes (half HDD nodes and half SSD nodes) and 16 client nodes hosting virtual machines. We use the SFMT algorithm to generate the pseudo-random numbers that optimize replica placement, and multiple clients running on different nodes perform read operations while we measure the aggregate bandwidth. We run 1 to 16 clients launching FIO on different virtual machines. Each FIO instance has 16 jobs, and each job accesses an independent 100MB file asynchronously with a 256KB request size.
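An fio job file approximating the described workload might look like the sketch below; the exact options were not given, so the ioengine, direct I/O setting, and target directory are assumptions.

```ini
; One FIO instance: 16 jobs, each on its own independent 100MB file,
; asynchronous I/O with 256KB requests. Use rw=read for the sequential test.
[global]
ioengine=libaio
direct=1
rw=randread
bs=256k
size=100m
numjobs=16

[prs-bench]
directory=/mnt/vdi   ; assumed mount point inside the VM
```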

Evaluation (Cont.)
- Test procedure
  - First run the benchmark once to collect the I/O trace
  - Objects that belong to identified access patterns are then replicated asynchronously
- Comparison
  - Consistent hashing algorithm
  - CRUSH algorithm (straw buckets)
  - Each scheme makes 3 replicas

In our tests, we first run the application once to collect the I/O trace. The objects with identified data access patterns are then replicated asynchronously, so the application can benefit from PRS replica access in future runs. We compare PRS with the replication scheme based on the consistent hashing algorithm in Sheepdog and with straw buckets, a typical replication algorithm in CRUSH, with each scheme making 3 replicas.

FIO Performance
Figures: (a) random read performance; (b) sequential read performance. Compared with the consistent hashing and CRUSH algorithms, PRS shows improved bandwidth.

As seen in the figures, compared with the consistent hashing and CRUSH algorithms, the PRS scheme shows improved bandwidth and achieves more rapid performance growth as the number of clients increases. Because PRS groups objects by their identified access patterns, it reduces the number of I/O accesses and the network communication cost. At the same time, PRS places replicas in an optimized layout, which efficiently exploits the performance advantage of SSDs.

Summary
- Introduced a novel pattern-directed replication scheme (PRS) that achieves highly efficient data replication in heterogeneous object-based storage systems
- Built a prototype of PRS within the Sheepdog distributed storage system and carried out evaluation tests
- Experimental results show that PRS considers the unique features of storage devices in a heterogeneous environment well and significantly improves performance

In summary, this work introduces a novel pattern-directed replication scheme to achieve highly efficient data replication in heterogeneous object-based storage systems. To verify its potential, we built a prototype of PRS within the Sheepdog distributed storage system and carried out evaluation tests. The experimental results show that PRS is an effective replication scheme for heterogeneous storage systems: it considers the unique features of storage devices in a heterogeneous environment well and significantly improves performance.

Thank You / Q&A
For more information: http://discl.cs.ttu.edu
Thanks! This research is supported in part by the National Science Foundation under grants CNS-1162488, CNS-1338078, IIP-1362134, and CCF-1409946.