Elastically Replicated Information Services: Sustaining the Availability of Distributed Storage Across Dynamic Topological Changes

1 Elastically Replicated Information Services: Sustaining the Availability of Distributed Storage Across Dynamic Topological Changes
Sponsored by the Program for Research in Computing and Information Sciences and Engineering (PRECISE), NSF-EIA Grant 99-77071
Jose Torres-Berrocal
Dr. Bienvenido Velez-Rivera
Research in Progress

2 Research Objective
Develop a method or algorithm that dynamically sustains the availability of a distributed storage system above a desired threshold value while the topology changes.

3 Availability Definition
• Availability generally refers to the probability (P) that a system is operating correctly at any given moment.
[Figure: state diagram with two states, Available (probability P) and Failed (probability 1 - P).]
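As a worked illustration of this definition, a component that alternates between the Available and Failed states has a steady-state availability given by the standard relation below. MTTR (mean time to repair) and the numeric values are assumptions introduced for the example; they do not come from the slides.

```latex
P = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}, \qquad
\text{e.g. } \mathrm{MTTF} = 1000\,\mathrm{h},\ \mathrm{MTTR} = 10\,\mathrm{h}
\;\Rightarrow\; P = \frac{1000}{1010} \approx 0.990, \quad 1 - P \approx 0.010 .
```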

4 Definition: Distributed Storage Cluster (DSC)
A distributed storage cluster (DSC) comprises two or more storage nodes that function in a coordinated fashion as a single storage system.
[Figure: storage nodes 0 through N, holding data objects X0 through XN.]

5 Example of a DSC failure
• When a node fails, the objects it contains become unavailable.
• Thus the system becomes unavailable.
[Figure: DSC with no redundancy. Nodes 1 and 2 hold objects X1 and X2; one node has failed, and the system fails due to the missing object.]
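A short worked equation makes this point quantitative. It assumes (the slides do not state this) that the N nodes fail independently, that each is available with the same probability P, and that every node holds at least one object stored nowhere else, so the system is up only when all N nodes are up.

```latex
A_{\text{sys}} = \prod_{i=1}^{N} P = P^{N}, \qquad
\text{e.g. } N = 2,\ P = 0.9 \;\Rightarrow\; A_{\text{sys}} = 0.9^{2} = 0.81 .
```

Under these assumptions, every added node that holds unreplicated data lowers the availability of the system as a whole.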

6 Using Replication to Tolerate Failures on a DSC
• An object in a failed node is available at another node.
• This is what RAID systems do.
[Figure: DSC with redundancy. Replicas of objects X1 and X2 are stored on different nodes; one node has failed. Figure label: 50%.]

7 Storage Systems Must Adapt to Changes
• Internet store
• 24/7 operation
• Dynamic changes
• Unattended

8 Availability as nodes are added, compared to the desired threshold
• Adding nodes changes the topology.
• The topology could change at any time, affecting availability.
[Figure: availability A(t) versus number of nodes. Unknown curve: f(#nodes) = ?. Desirable curve: g(#nodes) = near constant. Threshold: minimal tolerable availability.]

9 Road Map
• State the problem
• Solution design constraints
• Ongoing research
• Previous work compliance
• Preliminary conclusions

10 Design Constraints for a Desirable Method
• Distributed storage management
• 24/7 operation
• Minimal redundancy
• Works with write-intensive as well as read-intensive contexts
• Minimum human intervention
• Manages dynamic incidental changes due to the addition of nodes

11 Elastically Replicated Info Services Research Methodology
• Develop a mathematical model for a Distributed Storage Cluster (DSC)
• Develop a simulator to derive system availability
• Parameters
  • Mean Time to Failure (MTTF), provided by device manufacturers
  • Object count
  • Node count
  • Redundancy
  • Node utilization
• Test alternative algorithms
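The simulator step can be pictured with a minimal Monte Carlo sketch. This is not the authors' simulator: the function name, the `placement` representation (one list of node indices per object), and the use of a single per-node availability `node_availability` in place of MTTF are all assumptions made for illustration. Turning an MTTF into a per-node availability would also require a repair time, which the slides do not give.

```python
import random


def simulate_availability(placement, node_availability, trials=100_000, seed=0):
    """Estimate the probability that every object has at least one live replica.

    placement: hypothetical representation, one list of node indices per object.
    node_availability: assumed probability that any single node is up.
    Nodes are assumed to fail independently of each other.
    """
    rng = random.Random(seed)
    num_nodes = 1 + max(node for nodes in placement for node in nodes)
    available_trials = 0
    for _ in range(trials):
        # Draw an independent up/down state for every node.
        up = [rng.random() < node_availability for _ in range(num_nodes)]
        # The system is available only if each object has a replica on an up node.
        if all(any(up[node] for node in nodes) for nodes in placement):
            available_trials += 1
    return available_trials / trials


if __name__ == "__main__":
    # Two objects, each replicated on two distinct nodes.
    estimate = simulate_availability([[0, 1], [2, 3]], node_availability=0.9)
    print(f"estimated system availability: {estimate:.4f}")
```

For this small placement the estimate can be checked against the closed form (1 - 0.1^2)^2 = 0.9801, which is the kind of theoretical-versus-simulated comparison a validation step would rely on.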

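12 Math Model of a DSC
[Figure: DSC with 9 nodes/disks (0 to 8) and 5 distinct objects (X0 to X4); X0 to X3 appear twice, X4 once.]
DSC math model (rows are objects, columns are nodes/disks; 1 means the node holds a copy of the object):

Object | 0 1 2 3 4 5 6 7 8
-------+------------------
   0   | 1 0 0 0 0 1 0 0 0
   1   | 0 1 0 0 0 0 1 0 0
   2   | 0 0 1 0 0 0 0 1 0
   3   | 0 0 0 1 0 0 0 0 1
   4   | 0 0 0 0 1 0 0 0 0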
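To show how such a matrix can be turned into an availability figure, here is a minimal sketch that enumerates every up/down combination of the 9 nodes and sums the probability of the combinations in which all 5 objects remain reachable. The function name, the independence of node failures, and the shared per-node availability `p` are assumptions for illustration, not part of the presented model.

```python
from itertools import product

# Object-by-node matrix from the slide: MATRIX[i][j] == 1 means
# node j holds a copy of object i.
MATRIX = [
    [1, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
]


def system_availability(matrix, p):
    """Exact probability that every object has a copy on at least one up node.

    Assumes nodes fail independently and each is up with probability p.
    Enumerates all 2**num_nodes node states, so it only suits small clusters.
    """
    num_nodes = len(matrix[0])
    total = 0.0
    for state in product((0, 1), repeat=num_nodes):
        # Every object row needs at least one holder node that is up.
        if all(any(held and up for held, up in zip(row, state)) for row in matrix):
            nodes_up = sum(state)
            total += p ** nodes_up * (1 - p) ** (num_nodes - nodes_up)
    return total


if __name__ == "__main__":
    print(f"{system_availability(MATRIX, 0.9):.4f}")
```

With p = 0.9 the result is about 0.86. The four duplicated objects contribute 0.99 each, while object 4, which has a single copy, contributes only 0.9 and dominates the loss.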

13 Uniform Distribution algorithm
[Figure: uniform distribution. (a) DSC initial state. (b) DSC after adding one node. (c) DSC after adding the next node. (d) Nodes keep being added until #nodes = #objects.]

14 Centric algorithm
[Figure: centric. (a) DSC initial state. (b) The DSC keeps the objects where they were in the initial state while nodes are added.]
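The contrast between the two extreme placement policies can be sketched as follows. This is one possible reading of the figure captions, not the authors' algorithms: the function names and the round-robin assignment are assumptions, and the sketch ignores the constraint that replicas of the same object should sit on different nodes.

```python
def uniform_placement(num_copies, num_nodes):
    """Spread object copies round-robin over all current nodes.

    Returns a list where entry k is the node holding copy k.
    """
    return [k % num_nodes for k in range(num_copies)]


def centric_placement(num_copies, initial_nodes, num_nodes):
    """Keep every copy on the initial nodes; nodes added later stay empty."""
    if num_nodes < initial_nodes:
        raise ValueError("the cluster cannot shrink below its initial size")
    return [k % initial_nodes for k in range(num_copies)]


def nodes_in_use(placement):
    """Count how many distinct nodes hold at least one copy."""
    return len(set(placement))


if __name__ == "__main__":
    copies, initial = 9, 3
    for num_nodes in (3, 6, 9, 12):
        uniform = nodes_in_use(uniform_placement(copies, num_nodes))
        centric = nodes_in_use(centric_placement(copies, initial, num_nodes))
        print(f"{num_nodes} nodes: uniform uses {uniform}, centric uses {centric}")
```

Under this reading, uniform distribution occupies every node until the number of nodes reaches the number of copies, whereas centric placement never uses more than the initial nodes.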

15 Utilization vs. Availability relationship
[Figure: availability (A) and utilization (U) versus #nodes, with an open question mark. Labels: "Uniform distribution"; "No Disk"; "Minimum Availability (A), Maximum Utilization (U)"; "Maximum Availability (A), Minimum Utilization (U)".]

16 Extreme Algorithm Results
[Figure: uniform distribution algorithm.]
• Availability decreases even with the use of redundancy.
• With uniform distribution, availability decreases rapidly as nodes are added.

17 DSC Hybrid Model – Redundancy Calculation
[Figure: DSC matrix visualization, hybrid distribution. 10 original objects; 6 out of 10 copies.]

18 DSC Hybrid Model – Utilization Factor Calculation
[Figure: DSC matrix visualization, hybrid distribution. 4 out of 10 nodes; 2 out of 10 nodes.]

19 Hybrid Algorithm Results
[Figure panels: Up dist. variable and Down dist. constant; Up dist. constant and Down dist. variable.]
• The Down region utilization parameter affects availability more than the Up region parameter.
• Even though availability decreases, the family of curves follows a similar trend with no significant change.

20 Hybrid and Extreme Algorithms comparison
[Figure: the hybrid plot is for u-50 d-5 at 50% redundancy.]
• Overall utilization decreases when using the Centric algorithm.
• The Hybrid algorithm sustains availability longer than Uniform distribution.
• Hybrid falls between Centric and Uniform in both parameters.

21 Current Methods to Comply With Design Constraints
• Consensus Based
• Cache
• RAID
• Data Trading

22 Current methods compliance with design constraints
[Table: compliance of each method with each design constraint; cell values not recoverable.
Columns: Design constraints | Goal: ERIS | Current methods: Consensus Based, Cache, RAID, Data Trading
Rows: Distributed storage management; 24/7 operation; Minimal redundancy; Works with write-intensive as well as read-intensive contexts; Manages dynamic changes due to the addition of nodes; Minimum human intervention]

23 Preliminary Conclusions
• Availability decreases rapidly as nodes are added when the system uses a constant replication value and maximum utilization.
• An ERIS-type method is needed.
• The utilization of the system is the counterpart of its availability: as utilization increases, availability decreases.
• What makes the system vulnerable in terms of utilization is that the more places the objects can be located, the more opportunities there are to lose an object.
• The region or group of nodes with the fewest replicas is the predominant point of failure of the system (the chain breaks at its weakest link).

26 Current Methods Characteristics
• Pre-dynamic methods
  • Fit characteristics
    • Distributed storage
    • Controlled redundancy
  • Partial-fit characteristics
    • Works with write-intensive as well as read-intensive contexts: depends on a parameter preconfigured according to a priori studies
  • Unfit characteristics
    • 24/7 operation: has to stop operation to allow changes to the preconfigured parameters
    • Does not manage dynamic incidental changes to any number of nodes
    • Not fully automatic

27 Consensus Based Characteristics

28 Cache Method Characteristics
[Figure: network diagram with Node 3, Node 21, and Node 20; items labeled 9.]

29 RAID Characteristics

30 Data Trading Characteristics
[Figure: Node 8, Node 6, and Node 3 holding copies of objects A, B, C, and D.]

31 Simulator Validation
[Figure: theoretical vs. simulator calibration curves.]

