1 Exploring Data Reliability Tradeoffs in Replicated Storage Systems NetSysLab The University of British Columbia Abdullah Gharaibeh Matei Ripeanu.

1 Exploring Data Reliability Tradeoffs in Replicated Storage Systems NetSysLab The University of British Columbia Abdullah Gharaibeh Matei Ripeanu

2 Motivating Example: GridFTP Server
- A high-performance data transfer protocol
- Widely used in data-intensive scientific communities
- Typical deployments employ cluster-based storage systems
Motivation: reduce the cost of a GridFTP server while maintaining performance and reliability

3 The Solution in a Nutshell
A hybrid architecture: combines scavenged storage with dedicated, low-bandwidth storage
Features:
- Low cost
- Reliable
- High performance

4 Outline
- The Opportunity
- The Solution

5 The Opportunity
- Scavenging idle storage
  - High percentage of available idle space (e.g., ~50% at Microsoft, ~60% at ORNL)
  - Well-connected machines
- Decoupling the two components of data reliability: durability and availability
  - Durability is more important than availability
  - Relax availability to reduce the overall reliability overhead

6 The Solution: Internal Design
- Scavenged nodes:
  - Maintain n replicas
  - Replication bandwidth b Mbps
- Durable component:
  - Durably maintains one replica
  - Replication bandwidth B Mbps
- Logically centralized metadata service
- Clients access the system via the scavenged nodes only
=> An object is available when at least one replica exists at the scavenged nodes
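The availability and durability rules above can be sketched as a small predicate. The class and field names below are illustrative assumptions, not part of the actual system:

```python
from dataclasses import dataclass

@dataclass
class ObjectState:
    """Illustrative state of one object in the hybrid architecture."""
    durable_copy_ok: bool          # the single copy on the durable, low-bandwidth node
    live_scavenged_replicas: int   # replicas currently held by scavenged nodes

    def is_available(self) -> bool:
        # Clients read only through scavenged nodes, so the object is
        # available only while at least one scavenged replica is live.
        return self.live_scavenged_replicas >= 1

    def is_durable(self) -> bool:
        # Data survives as long as any copy survives; the durable node
        # preserves the object even after all scavenged replicas fail.
        return self.durable_copy_ok or self.live_scavenged_replicas >= 1
```

An object whose only surviving copy sits on the durable node is durable but temporarily unavailable, which is exactly the tradeoff the design relaxes.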

7 Features Revisited
- Low cost
  - Idle resources + a low-cost durable component
- Reliable
  - Supports full durability
  - Configurable availability
- High performance
  - Aggregates multiple I/O channels
  - Decouples data and metadata management

8 Outline
- Availability Study
- Performance Evaluation: GridFTP Server

9 Availability Study
Questions:
- What is the advantage of having a durable component?
- What is the impact of parameter constraints (e.g., replication level and bandwidth) on availability and overhead?
- What replica placement scheme enables maximum availability?
To address these questions, two tools are used: an analytical model and a low-level simulator.
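A low-level simulator in this spirit can be approximated with a short Monte Carlo loop. The discretized dynamics and default rates below are illustrative assumptions, not the actual simulator's parameters:

```python
import random

def simulate_unavailability(n=4, lam=0.001, mu=0.1, lam0=0.05,
                            steps=200_000, dt=1.0, seed=42):
    """Fraction of time an object has no live scavenged replica.

    Assumed discretized birth-death dynamics: each live replica fails
    with probability lam*dt per step; at most one repair completes per
    step (bandwidth limited), with probability mu*dt when repairing
    from a scavenged replica, or lam0*dt when only the durable copy
    remains.
    """
    rng = random.Random(seed)
    live = n                      # live replicas on scavenged nodes
    unavailable_steps = 0
    for _ in range(steps):
        if live == 0:
            unavailable_steps += 1
        # Independent failures of the live replicas
        live -= sum(rng.random() < lam * dt for _ in range(live))
        # Bandwidth-limited repair: at most one new replica per step
        if live < n:
            rate = lam0 if live == 0 else mu
            if rng.random() < rate * dt:
                live += 1
    return unavailable_steps / steps
```

Shrinking the repair rates (a slower durable node, less scavenged bandwidth) increases the fraction of unavailable time, mirroring the parameter sweeps later in the deck.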

10 What is the advantage of adding a durable component?
- Evaluate the durability of the symmetric architecture
- Compare the replication overhead
- Evaluate the availability of the hybrid architecture

11 Durability of Symmetric Architecture
(n = replication level, b = replication bandwidth)
- Durability decreases when increasing the storage load
- Minimum configuration to support full durability: n = 8, b = 8 Mbps

12 Overhead: Hybrid vs. Symmetric Architecture
Configuration: Symmetric: n = 8 replicas, b = 8 Mbps; Hybrid: n = 4 replicas, b = 2 Mbps, B = 1 Mbps
[Table: replication traffic (Mbps) for both architectures (Mean, Median, 99th percentile, Maximum); the recoverable row shows a maximum of 89 Mbps for the hybrid vs. 26,472 Mbps for the symmetric architecture]
Advantages of adding the durable component:
- Reduces the amount of replication traffic ~2.5 times
- Reduces the peak bandwidth ~7 times
- Reduces replication traffic variability
- Increases storage efficiency by 50%

13 Availability of Hybrid Architecture
Configuration: n = 4 replicas, b = 2 Mbps, B = 1 Mbps
The hybrid system is able to support acceptable availability

14 Outline  Availability Study  Performance Evaluation: GridFTP Server

15 A Scavenged GridFTP Server
Main challenge: transparent integration of legacy components
Prototype components:
- Globus’ GridFTP server
- MosaStore scavenged storage system

16 Scavenged GridFTP Software Components
[Diagram: software components of Server A and Server B]

17 Evaluation -- Throughput
Throughput for 40 clients reading 100 files of 100 MB each. The GridFTP server is supported by 10 storage nodes, each connected at 1 Gbps.
Ability to support an intense workload => 60% increase in aggregate throughput

18 Summary and Contributions
This study demonstrates a hybrid storage architecture that combines scavenged and durable storage
Contributions:
- Integrating scavenged storage with low-bandwidth durable storage
- Tools to provision the system:
  - Analytical model => coarse-grained prediction
  - Low-level simulator => detailed predictions
- A prototype implementation => demonstrates high performance
Features:
- Reliable: full durability, configurable availability
- Low cost: built atop scavenged resources
- Offers high-performance throughput

19

20 Standard Deployments: Data Locality Limitation Explained
[Diagram: data path between Server A and Server B]

21 The Solution: Limitations
- Lower availability: trades off availability for stronger durability and lower maintenance overhead
- Asymmetric system: the hybrid nature of the system may increase its complexity
- The system mostly benefits read-dominant workloads, due to the limited bandwidth of the durable node

22 Another Usage Scenario
A data store geared towards read-mostly workloads: photo-sharing web services (e.g., Flickr, Facebook)

23 Analytical Modeling (1)
The number of replicas is modeled using a Markov chain; replica lifetimes and repair times are assumed exponentially distributed (with rates λ and μ, respectively).
=> Can be analyzed analytically as an M/M/K/K queue. Each state represents the number of available replicas at the volatile nodes. The rate λ0 depends on the durable node’s bandwidth.
Where ρ = λ/μ and γ = λ0/μ
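A chain of this shape is a birth-death process, so its steady state follows from detailed balance. The sketch below is a minimal, assumed reading of the model, not the paper's exact chain: each of the i live replicas fails at rate i·λ, repair proceeds at rate μ from a surviving scavenged replica and at rate λ0 from the durable node when none survives; unavailability is the steady-state probability of state 0.

```python
def steady_state_unavailability(n: int, lam: float, mu: float, lam0: float) -> float:
    """Probability that no scavenged replica is live (state 0 of the chain).

    States 0..n count live replicas on the volatile (scavenged) nodes.
    Assumed transition rates: i -> i-1 at i*lam (independent failures);
    0 -> 1 at lam0 (repair from the durable node); i -> i+1 at mu for
    0 < i < n (repair from a scavenged replica).
    """
    # Detailed balance for a birth-death chain:
    #   pi_i * down_rate(i) = pi_{i-1} * up_rate(i-1)
    weights = [1.0]                      # unnormalized pi_0
    for i in range(1, n + 1):
        up = lam0 if i == 1 else mu      # rate from state i-1 up to i
        down = i * lam                   # rate from state i down to i-1
        weights.append(weights[-1] * up / down)
    return weights[0] / sum(weights)
```

Lowering λ0 (a slower durable node) raises the probability of the all-replicas-lost state roughly in proportion, which is the qualitative behavior the bandwidth sweeps in the following slides examine.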

24 Analytical Modeling (2)
- Limitations:
  - The model does not capture transient failures
  - The model assumes exponentially distributed replica repair times and lifetimes
  - The model analyzes the state of a single object
- Advantages:
  - Unveils the key relationships between system characteristics
  - Offers a good approximation for availability, which enables validating the simulator

25 Distribution of Availability and Maintenance Overhead
What is the effect of having one replica stored on a medium with a low access rate on the resulting maintenance overhead and availability?
Configuration: n = 4 replicas, b = 2 Mbps, B = 1 Mbps
[Table: unavailability statistics (Mean, 99th percentile, Maximum) as a function of storage load (TB)]

26 Impact of Durable Node Replication Bandwidth
Sweep: durable node’s bandwidth B = 1, 2, 4, 8 Mbps
[Table: statistics of unavailability (Mean, 99th percentile, Maximum)]
[Table: statistics of aggregate replication bandwidth (Mean, 99th percentile, Maximum)]

27 Impact of Scavenged Nodes’ Replication Bandwidth
Sweep: volatile nodes’ bandwidth b = 1, 2, 4, 8 Mbps
[Table: statistics of unavailability (Mean, 99th percentile, Maximum)]
[Table: statistics of aggregate replication bandwidth (Mean, 99th percentile, Maximum)]

28 Impact of Replication Level
Sweep: replication level n = 3, 4, 5, 6
[Table: statistics of unavailability (Mean, 99th percentile, Maximum)]
[Table: statistics of aggregate replication bandwidth (Mean, 99th percentile, Maximum)]