Elastically Replicated Information Services: Sustaining the Availability of Distributed Storage Across Dynamic Topological Changes

1 Elastically Replicated Information Services: Sustaining the Availability of Distributed Storage Across Dynamic Topological Changes
Sponsored by the Program for Research in Computing and Information Sciences and Engineering (PRECISE), NSF-EIA Grant
Jose Torres-Berrocal, Dr. Bienvenido Velez-Rivera
Research in progress

2 Research Objective: Develop a method or algorithm that dynamically sustains the availability of a distributed storage system above a desired threshold value while its topology changes.

3 Availability Definition
• Availability generally refers to the probability P that a system is operating correctly at any given moment.
[State diagram: two states, Available (probability P) and Failed (probability 1 - P).]

4 Definition: Distributed Storage Cluster (DSC)
• A distributed storage cluster (DSC) comprises two or more storage nodes which function in a coordinated fashion as a single storage system.
[Figure: storage nodes 0 through N, holding data objects X0 through XN.]

5 Example of DSC Failures
• When a node fails, the objects it contains become unavailable.
• Thus the system becomes unavailable.
[Figure: DSC with no redundancy; nodes 1 and 2 hold objects X1 and X2. When a node fails, the system fails due to the missing object.]
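The failure argument above can be made quantitative under a simple model. The sketch below assumes, as an illustration not stated on the slide, that nodes fail independently, that each node is up with probability p, and that every node holds at least one object. Because losing any node loses an object, the system is up only when all n nodes are up, so its availability is p^n. The function name `availability_no_redundancy` is hypothetical.

```python
def availability_no_redundancy(p: float, n_nodes: int) -> float:
    """System availability of a DSC that stores each object exactly once.

    Hypothetical assumptions: nodes fail independently, each node is up with
    probability p, and every node holds at least one object. Losing any node
    loses an object, so the system is up only when all nodes are up.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p ** n_nodes


# Two nodes, as in the slide's figure, each up 99% of the time.
print(round(availability_no_redundancy(0.99, 2), 4))  # 0.9801
```

Every node added to such a cluster multiplies in another factor below 1, so availability can only fall as the cluster grows.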

6 Using Replication to Tolerate Failures on a DSC
[Figure: DSC with redundancy (50%): objects X1 and X2 each have a replica on another node. When a node fails, its object is still available at another node, so the system does not fail.]
• This is what RAID systems do.
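Under the same hypothetical assumptions (independent node failures, per-node availability p), replication changes the calculation. An object is lost only if every node holding one of its replicas is down, so an object with r replicas is available with probability 1 - (1 - p)^r. If each object has its own disjoint set of replica nodes, the system is up when all objects are up. This is a minimal sketch with illustrative function names, not the authors' model.

```python
def object_availability(p: float, replicas: int) -> float:
    """Probability that at least one of an object's replicas is on a live node."""
    return 1.0 - (1.0 - p) ** replicas


def availability_replicated(p: float, n_objects: int, replicas: int) -> float:
    """System availability when every object has its own disjoint set of replica nodes.

    Hypothetical assumptions: independent node failures with per-node
    availability p; the system is up only if every object is reachable.
    """
    return object_availability(p, replicas) ** n_objects


# Slide 6 layout: objects X1 and X2, each stored on two of four nodes.
print(round(availability_replicated(0.99, 2, 2), 6))  # 0.9998
```

With this layout a single failed node no longer brings the system down; it fails only when both replicas of the same object are lost.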

7 Storage Systems Must Adapt to Changes
[Figure: an Internet store, with 24/7 operation, dynamic changes, and unattended operation.]

8 Availability as Nodes Are Added, Compared to the Desired Threshold
• Adding nodes changes the topology.
• Topology changes can occur at any time, affecting availability.
[Figure: availability A(t) vs. number of nodes. The actual curve f(#nodes) is unknown (f(#nodes) = ?); the desirable curve g(#nodes) stays near constant above the threshold (minimal tolerable availability).]
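One way to read the desirable curve g(#nodes) is as a control problem: after each topology change, choose the smallest replication factor that keeps availability at or above the threshold. The sketch below illustrates that idea only and is not the authors' ERIS algorithm. It assumes independent node failures with per-node availability p and a layout in which the n nodes are split into n // r replica groups of r nodes (leftover nodes join the last group), each group holding identical data. The names `dsc_availability` and `min_replication` are hypothetical.

```python
def group_availability(p: float, size: int) -> float:
    """Probability that at least one node in a replica group of `size` nodes is up."""
    return 1.0 - (1.0 - p) ** size


def dsc_availability(p: float, n_nodes: int, r: int) -> float:
    """Availability when n_nodes are split into n_nodes // r groups of r nodes.

    Every group must have a live node. Leftover nodes (n_nodes % r) are
    added to the last group.
    """
    if not 1 <= r <= n_nodes:
        raise ValueError("need 1 <= r <= n_nodes")
    groups, leftover = divmod(n_nodes, r)
    full = group_availability(p, r) ** (groups - 1)
    return full * group_availability(p, r + leftover)


def min_replication(p: float, n_nodes: int, threshold: float):
    """Smallest replication factor r meeting the threshold, or None if none does."""
    for r in range(1, n_nodes + 1):
        if dsc_availability(p, n_nodes, r) >= threshold:
            return r
    return None


# Re-evaluate the replication factor as nodes are added (hypothetical p and threshold).
for n in (4, 8, 12, 16, 20):
    print(n, min_replication(0.9, n, 0.999))
```

Under this model a fixed replication factor eventually falls below any threshold as nodes are added, so keeping availability near constant requires the redundancy to grow with the cluster.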

9 Road Map
• State the problem
• Solution design constraints
• Ongoing research
• Previous work compliance
• Preliminary conclusions

10 Design Constraints for Method Desirability
• Distributed storage management
• 24/7 operation
• Minimal redundancy
• Works with write-intensive as well as read-intensive contexts
• Minimum human intervention
• Manages dynamic incidental changes due to the addition of nodes

11 Elastically Replicated Info Services: Research Methodology
• Develop a mathematical model for a Distributed Storage Cluster (DSC)
• Develop a simulator to derive system availability
• Parameters:
  - Mean Time to Failure (MTTF), provided by device manufacturers
  - Object count
  - Node count
  - Redundancy
  - Node utilization
• Test alternative algorithms
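A simulator driven by MTTF could take roughly the following shape. This is a minimal Monte Carlo sketch under hypothetical assumptions, not the authors' simulator: node lifetimes are exponential with mean MTTF, failures are independent, and there is no repair. Because there is no repair, it estimates the probability that every object is still reachable at a time t. Modeling steady-state availability would also require a repair-time parameter, which the slide does not list. The names `simulate_availability` and `node_up_probability` and the MTTF value are illustrative.

```python
import math
import random


def node_up_probability(mttf_hours: float, t_hours: float) -> float:
    """P(node still working at time t) for an exponential lifetime with mean MTTF."""
    return math.exp(-t_hours / mttf_hours)


def simulate_availability(placement, n_nodes, mttf_hours, t_hours,
                          trials=20_000, seed=0):
    """Monte Carlo estimate of P(every object has a live replica at time t).

    Hypothetical assumptions: independent exponential node lifetimes with mean
    mttf_hours and no repair before t_hours.
    placement[i] is the set of node ids that store a copy of object i.
    """
    rng = random.Random(seed)
    rate = 1.0 / mttf_hours
    successes = 0
    for _ in range(trials):
        # A node is alive at time t if its sampled lifetime exceeds t.
        alive = [rng.expovariate(rate) > t_hours for _ in range(n_nodes)]
        if all(any(alive[node] for node in nodes) for nodes in placement):
            successes += 1
    return successes / trials


# Hypothetical run: 4 nodes, 2 objects mirrored on disjoint node pairs,
# MTTF of 100,000 hours, evaluated after one year (8,760 hours).
mirrored = [{0, 1}, {2, 3}]
estimate = simulate_availability(mirrored, 4, 100_000, 8_760)
p = node_up_probability(100_000, 8_760)
print(estimate, (1 - (1 - p) ** 2) ** 2)
```

The second printed value is the closed form for this mirrored layout, which gives a quick sanity check on the estimate.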

12 Math Model of a DSC
[Figure: DSC math model; a DSC with 9 nodes/disks and 5 distinct objects (X0 through X4), drawn as a grid of nodes/disks vs. objects.]
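A grid of nodes vs. objects maps naturally onto a 0/1 placement matrix. The sketch below assumes a hypothetical encoding, `matrix[i][j] == 1` when node j stores a copy of object i, together with independent node failures at per-node availability p. It computes system availability exactly by enumerating every up/down state of the nodes (2^9 = 512 states for 9 nodes). The 5 x 9 layout is made up for illustration and is not the layout in the slide's figure.

```python
from itertools import product


def exact_availability(matrix, p):
    """Exact system availability for a DSC placement matrix.

    matrix[i][j] == 1 means node j stores a copy of object i.
    Hypothetical assumptions: nodes fail independently and each is up with
    probability p. Enumerates all 2**n up/down states, so keep n small.
    """
    n_nodes = len(matrix[0])
    total = 0.0
    for state in product((False, True), repeat=n_nodes):
        # The system is up in this state if every object has a copy on a live node.
        if all(any(row[j] and state[j] for j in range(n_nodes)) for row in matrix):
            ups = sum(state)
            total += p ** ups * (1.0 - p) ** (n_nodes - ups)
    return total


def utilization(matrix):
    """Fraction of nodes that store at least one object copy."""
    n_nodes = len(matrix[0])
    used = sum(1 for j in range(n_nodes) if any(row[j] for row in matrix))
    return used / n_nodes


# Hypothetical 5-object x 9-node layout (not the one in the slide's figure).
M = [
    [1, 0, 0, 0, 0, 1, 0, 0, 0],  # X0 on nodes 0 and 5
    [0, 1, 0, 0, 0, 0, 1, 0, 0],  # X1 on nodes 1 and 6
    [0, 0, 1, 0, 0, 0, 0, 1, 0],  # X2 on nodes 2 and 7
    [0, 0, 0, 1, 0, 0, 0, 0, 1],  # X3 on nodes 3 and 8
    [0, 0, 0, 0, 1, 0, 0, 0, 0],  # X4 on node 4 only
]
print(round(exact_availability(M, 0.99), 6), round(utilization(M), 3))
```

Exhaustive enumeration only works for small clusters. For larger ones, a sampling simulator has to take its place.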

13 Uniform Distribution Algorithm
[Figure: Uniform distribution. (a) DSC initial state. (b) DSC after adding one node. (c) DSC after adding the next node. (d) Nodes keep being added until #nodes = #objects.]
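The slide does not specify how copies are assigned, so the following is only one plausible reading of "uniform distribution": after each node addition, re-place all copies so that node loads are as even as possible. In this sketch, each copy goes to the least-loaded node that does not already hold that object, with ties broken by node id. The function name and tie-breaking rule are assumptions.

```python
def uniform_placement(n_objects, n_nodes, replicas):
    """Spread object copies as evenly as possible over all current nodes.

    Hypothetical sketch: each copy goes to the least-loaded node that does not
    already hold that object (ties broken by node id), so the replicas of an
    object land on distinct nodes and node loads differ by at most one.
    Returns a list whose i-th entry is the set of nodes holding object i.
    """
    if not 1 <= replicas <= n_nodes:
        raise ValueError("need 1 <= replicas <= n_nodes")
    load = [0] * n_nodes
    placement = []
    for _ in range(n_objects):
        chosen = set()
        for _ in range(replicas):
            node = min((j for j in range(n_nodes) if j not in chosen),
                       key=lambda j: (load[j], j))
            chosen.add(node)
            load[node] += 1
        placement.append(chosen)
    return placement


# Re-run the placement each time a node is added, until #nodes = #objects.
n_objects, replicas = 6, 2
for n_nodes in range(2, n_objects + 1):
    print(n_nodes, uniform_placement(n_objects, n_nodes, replicas))
```

Recomputing the placement from scratch keeps the sketch short. A real system would instead migrate only the copies needed to rebalance.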

14 Centric Algorithm
[Figure: Centric. (a) DSC initial state. (b) The DSC keeps each object at its initial location while nodes are added.]

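15 Utilization vs. Availability Relationship
[Figure: availability (A) and utilization (U) vs. #Nodes. Uniform distribution sits at one extreme (minimum availability, maximum utilization); the other extreme has maximum availability and minimum utilization; the region between them is marked "?".]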
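The two extremes can be compared in closed form under a simplified, hypothetical layout. Assume each object has r replicas, nodes fail independently with availability p, and the node count n is a multiple of r. Centric keeps every object on the initial r nodes, so availability stays at 1 - (1 - p)^r while utilization falls to r/n. Uniform distribution uses all n nodes, modeled here as n/r disjoint replica groups that must all stay reachable, so availability is (1 - (1 - p)^r)^(n/r) and utilization is 1. These formulas are illustrations, not the slides' own model.

```python
def centric(p: float, n_nodes: int, r: int):
    """(availability, utilization) when all objects stay on the initial r nodes."""
    return 1.0 - (1.0 - p) ** r, r / n_nodes


def uniform(p: float, n_nodes: int, r: int):
    """(availability, utilization) when all nodes are used as n/r replica groups."""
    if n_nodes % r:
        raise ValueError("this sketch assumes n_nodes is a multiple of r")
    groups = n_nodes // r
    return (1.0 - (1.0 - p) ** r) ** groups, 1.0


# Sweep the node count with a constant replication factor r = 2 (hypothetical p).
for n in (2, 4, 8, 16, 32):
    a_c, u_c = centric(0.95, n, 2)
    a_u, u_u = uniform(0.95, n, 2)
    print(f"n={n:2d}  centric A={a_c:.4f} U={u_c:.2f}  uniform A={a_u:.4f} U={u_u:.2f}")
```

With constant redundancy, spreading objects over more nodes raises utilization but lowers availability. The centric layout holds availability fixed at the cost of leaving most added nodes unused.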

16 Extreme Algorithm Results
[Figure: Uniform distribution algorithm.]
• Availability decreases even with the use of redundancy.
• Availability decreases rapidly as nodes are added when using Uniform distribution.

17 DSC Hybrid Model – Redundancy Calculation
[Figure: DSC matrix visualization, hybrid distribution; 10 original objects, 6 out of 10 copies.]

18 DSC Hybrid Model – Utilization Factor Calculation
[Figure: DSC matrix visualization, hybrid distribution; 4 out of 10 nodes, 2 out of 10 nodes.]

19 Hybrid Algorithm Results
[Figure panels: (1) Up distribution variable, Down distribution constant; (2) Up distribution constant, Down distribution variable.]
• The Down region utilization parameter affects availability more than the Up region parameter.
• Even though availability decreases, the family of curves follows a similar trend with no significant change.

20 Hybrid and Extreme Algorithms Comparison
[Figure: the hybrid plot is for u-50, d-5 at 50% redundancy.]
• Overall utilization decreases when using the Centric algorithm.
• The Hybrid algorithm sustains availability longer than Uniform distribution.
• Hybrid falls between Centric and Uniform in both parameters (availability and utilization).

21 Current Methods to Comply with Design Constraints
• Consensus Based
• Cache
• RAID
• Data Trading

22 Current Methods Compliance with Design Constraints
[Table: rows are the design constraints (distributed storage management; 24/7 operation; minimal redundancy; works with write-intensive as well as read-intensive contexts; manages dynamic changes due to the addition of nodes; minimum human intervention); columns are the goal method (ERIS) and the current methods (Consensus Based, Cache, RAID, Data Trading).]

23 Preliminary Conclusions
• Availability decreases rapidly as nodes are added when the system uses a constant replication value and maximum utilization.
• An ERIS-type method is needed.
• System utilization is the counterpart of availability: as utilization increases, availability decreases.
• What makes the system vulnerable in terms of utilization is that the more places an object can be located, the more opportunities there are to lose an object.
• The region (group of nodes) holding the fewest replicas is the predominant point of failure of the system: the chain breaks at its weakest link.
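The weakest-link conclusion can be illustrated numerically. The sketch below assumes, hypothetically, that object i has r_i replicas on its own disjoint nodes and that nodes fail independently with availability p. System availability is then the product of 1 - (1 - p)^(r_i) over all objects, and the object with the fewest replicas contributes by far the largest unavailability factor. The replica counts are made up for illustration.

```python
def system_availability(p: float, replica_counts):
    """Availability when object i has replica_counts[i] copies on its own disjoint nodes."""
    result = 1.0
    for r in replica_counts:
        result *= 1.0 - (1.0 - p) ** r
    return result


def weakest_object(p: float, replica_counts):
    """Index of the object whose unavailability (1 - p)**r is largest."""
    return max(range(len(replica_counts)), key=lambda i: (1.0 - p) ** replica_counts[i])


# Hypothetical layout: most objects have 3 replicas, one has only 1.
counts = [3, 3, 3, 3, 1]
print(weakest_object(0.95, counts), system_availability(0.95, counts))
# Raising only the weakest object's replica count recovers most of the loss.
print(system_availability(0.95, [3, 3, 3, 3, 3]))
```

Adding replicas to objects that are already well protected barely changes the product. Protecting the least-replicated region is where extra redundancy pays off.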

26 Current Methods Characteristics
• Pre-dynamic methods
  - Fit characteristics:
    - Distributed storage
    - Controlled redundancy
  - Partial fit characteristics:
    - Works with write-intensive as well as read-intensive contexts: depends on parameters preconfigured according to a priori studies
  - Unfit characteristics:
    - 24/7 operation: operation must stop to allow changes to preconfigured parameters
    - Do not manage dynamic incidental changes to any number of nodes
    - Not fully automatic

27 Consensus Based Characteristics

28 Cache Method Characteristics
[Figure: nodes connected through a network.]

29 RAID Characteristics

30 Data Trading Characteristics
[Figure: Node 8, Node 6, and Node 3 holding data objects A, B, C, and D.]

31 Simulator Validation
[Figure: theoretical vs. simulator calibration curves.]
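A calibration check of this kind can be sketched by sweeping the per-node availability and comparing a sampled estimate against a closed-form value. The following is a minimal illustration, not the authors' validation procedure. It assumes independent node failures and objects stored on disjoint replica sets, for which the theoretical availability is (1 - (1 - p)^r)^k for k objects with r replicas each. The function names, seed, and parameter values are hypothetical.

```python
import random


def theoretical(p: float, n_objects: int, replicas: int) -> float:
    """Closed-form availability for disjoint replica sets (independent nodes)."""
    return (1.0 - (1.0 - p) ** replicas) ** n_objects


def simulated(p: float, n_objects: int, replicas: int, trials: int, rng) -> float:
    """Monte Carlo estimate: draw independent up/down states for each object's replica nodes."""
    up = 0
    for _ in range(trials):
        if all(any(rng.random() < p for _ in range(replicas))
               for _ in range(n_objects)):
            up += 1
    return up / trials


# Calibration curve: compare both over a range of per-node availabilities.
rng = random.Random(42)
max_error = 0.0
for p in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99):
    t = theoretical(p, 5, 2)
    s = simulated(p, 5, 2, 20_000, rng)
    max_error = max(max_error, abs(t - s))
    print(f"p={p:.2f}  theoretical={t:.4f}  simulated={s:.4f}")
print("max |theoretical - simulated| =", round(max_error, 4))
```

Agreement within sampling error on a layout with a known closed form is a reasonable precondition before trusting the simulator on layouts, such as the hybrid distribution, that have no simple formula.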