Published by Alisha Scott. Modified over 6 years ago.
1
Proactive Replication in Distributed Storage Systems Using Machine Availability Estimation
Authors: Alessandro Duminuco, Ernst Biersack, and Taoufik En-Najjary
Presented by Xiaoyu Sun, 9/21/2018
2
Outline
Motivation
Goal of this paper
An adaptive control problem
Impact of estimation time
A hybrid scheme for availability
Validation
Experiments
Conclusion
Questions to answer: What kind of system are we talking about? Why is the problem this paper tries to handle important? What approaches have people already used? What are the advantages and disadvantages of these approaches?
3
Motivation
Peer-to-peer (P2P) based distributed storage systems
Service guarantees
What is P2P? What can P2P be used for? Content delivery, networking, search (e.g., YaCy, a free distributed search engine)
4
Motivation
Durability: once stored, data are never lost, although the data may not be available all the time.
Availability: data can be retrieved at any moment.
5
Motivation
Methods used for redundancy in storage systems:
Replication of the original data
Parity encoding of the original data
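The two redundancy methods can be contrasted with a minimal sketch. It assumes a single XOR parity block over equally sized data blocks, which is only the simplest form of parity encoding and not necessarily the code used in the paper. The function names `replicate`, `xor_parity`, and `recover_missing` are hypothetical.

```python
from functools import reduce


def replicate(data: bytes, copies: int) -> list[bytes]:
    """Replication: store `copies` identical copies of the original data."""
    return [data] * copies


def xor_parity(blocks: list[bytes]) -> bytes:
    """Parity encoding (simplest case): byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_parity(surviving + [parity])


data_blocks = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(data_blocks)
# Pretend the second block was lost and rebuild it from the rest.
recovered = recover_missing([data_blocks[0], data_blocks[2]], parity)
```

With three data blocks, replication with two copies stores six blocks, while this parity scheme stores four and still survives the loss of any one block.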
6
Motivation
Types of failure behavior:
Permanent failure behavior: the system must cope with permanent failures to assure durability.
Transient failure behavior: the peer is later reintegrated into the system.
Hint: from traces of peer availability, we know that temporary disconnections are much more frequent than permanent ones.
7
Motivation
Advantages of the reactive approach: adaptiveness; availability
Disadvantages of the reactive approach: waste of resources; bursty use of resources
8
Motivation
Advantages of the proactive approach: a fixed repair rate; smooth resource usage
Disadvantages of the proactive approach: fails to handle changing failure behavior; durability may be compromised
9
Goal of this paper
Constraint: limited network bandwidth
Durability
Adaptiveness
Maximize the smoothness of the repair rate, and with it the smoothness of the bandwidth needed for the repairs
Avoid degrading the performance of other activities
10
An adaptive control problem
The estimator periodically infers the failure behavior of the peers.
Input: the number of available peers at time t
R(t): the repair rate, a time-dependent signal
∆T: the observation period used by the estimator, and the interval between two updates of R(t)
Signals mark the occurrence of a repair or a reconnection.
11
The system model
States: connected state, temporarily disconnected state, abandon state
µ: single-peer disconnection rate (governs the session time)
Single-peer reconnection rate (governs the disconnection time)
P: abandon probability
R(t): the repair rate, a time-dependent signal
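The three-state model can be sketched as a small simulation of one peer. This is an illustration under stated assumptions, not the paper's model: session and disconnection times are drawn from exponential distributions, and the names `simulate_peer`, `lam` (standing in for the reconnection rate), and `horizon` are hypothetical.

```python
import random


def simulate_peer(mu, lam, p, horizon, rng):
    """Return a list of (time, event) pairs for one peer up to `horizon`.

    mu  : disconnection rate (assumed exponential session times)
    lam : reconnection rate (assumed exponential disconnection times)
    p   : probability that a disconnection is an abandon (permanent)
    """
    t = 0.0
    events = []
    while True:
        t += rng.expovariate(mu)  # time spent in the connected state
        if t >= horizon:
            return events
        if rng.random() < p:
            events.append((t, "abandon"))  # absorbing state: peer never returns
            return events
        events.append((t, "disconnect"))
        t += rng.expovariate(lam)  # time spent temporarily disconnected
        if t >= horizon:
            return events
        events.append((t, "reconnect"))


# Example trace with illustrative parameter values.
trace = simulate_peer(mu=0.1, lam=0.5, p=0.05, horizon=100.0, rng=random.Random(42))
```

Each trace alternates disconnect and reconnect events until either the horizon is reached or the peer abandons, which mirrors the connected, temporarily disconnected, and abandon states listed on the slide.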
12
The system model
Q1 represents the peers in the connected state.
Q2 represents the peers in the disconnected state.
Queue notation, using G/G/1 as an example:
The first G stands for the probability distribution of the inter-arrival times.
The second G stands for the probability distribution of the service times.
1 stands for the number of servers.
Distribution symbols: M is an exponential probability density; G is any arbitrary probability distribution; D means all customers have the same value.
13
The system model
Little's law: L = λW
The long-term average number of customers in a stable system, L, is equal to the long-term average arrival rate, λ, multiplied by the long-term average time a customer spends in the system, W.
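A short worked example of Little's law, with illustrative numbers that are not taken from the paper. The function name `average_in_system` is hypothetical.

```python
def average_in_system(arrival_rate: float, mean_time_in_system: float) -> float:
    """Little's law: L = lambda * W for a stable system."""
    return arrival_rate * mean_time_in_system


# Illustrative values: peers enter a queue at 5 per hour and
# each stays 2 hours on average, so about 10 are present on average.
L = average_in_system(arrival_rate=5.0, mean_time_in_system=2.0)
```

The law holds regardless of the arrival and service distributions, which is why it applies to general queues such as G/G/1.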
14
The estimator
The estimator estimates two parameters: μ and P.
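The estimator's equations are not preserved in this transcript, so the following is only a generic sketch of how the two parameters could be estimated from counts in one observation window. It assumes exponential session times and, unrealistically, that each disconnection has already been labeled as permanent or temporary. The `Observation` type and both function names are hypothetical and are not the authors' estimator.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    connected_peer_time: float  # total time all peers spent connected in the window
    disconnections: int         # departures from the connected state (abandons included)
    abandons: int               # departures known to be permanent (assumed observable)


def estimate_mu(obs: Observation) -> float:
    """Rate estimate: disconnections per unit of connected peer-time."""
    if obs.connected_peer_time <= 0:
        raise ValueError("connected_peer_time must be positive")
    return obs.disconnections / obs.connected_peer_time


def estimate_p(obs: Observation) -> float:
    """Fraction of observed disconnections that were permanent."""
    if obs.disconnections == 0:
        raise ValueError("no disconnections observed in this window")
    return obs.abandons / obs.disconnections


window = Observation(connected_peer_time=200.0, disconnections=10, abandons=1)
mu_hat = estimate_mu(window)
p_hat = estimate_p(window)
```

A live system cannot tell at disconnection time whether a peer will return, which is the difficulty the real estimator has to work around.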
15
The estimator
16
The controller
17
Impact of estimation time
The estimation time ∆T is the most crucial parameter of this model.
Impact on bandwidth usage: with ∆T = 0, one tries to push system reactivity too much.
Robustness of the estimation: the time needed to estimate the parameters is dynamic.
Ideal choice: the maximum ∆T that divides time into segments in which the system can be approximated as statistically stable.
Any different choice would make the controller follow short-term fluctuations, causing uneven use of the bandwidth resources.
Risk: correlated failures of many nodes, where most of the available fragments suddenly disappear.
18
Impact of estimation time
The implementation in this paper does not fix ∆T.
D is the average number of disconnections observed during an estimation period.
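One way to read the role of D is that the estimation period stretches or shrinks so that about D disconnections are expected in it. The sketch below assumes each of `connected_peers` peers disconnects independently at rate `mu_estimate`, so the expected count in a window of length ∆T is connected_peers × mu_estimate × ∆T. This is an assumption for illustration and may differ from the paper's eq. (8). The function name is hypothetical.

```python
def estimation_period(target_disconnections: float,
                      connected_peers: int,
                      mu_estimate: float) -> float:
    """Length of the window in which about `target_disconnections` are expected."""
    if connected_peers <= 0 or mu_estimate <= 0:
        raise ValueError("connected_peers and mu_estimate must be positive")
    return target_disconnections / (connected_peers * mu_estimate)


# Illustrative values: D = 20 disconnections, 100 connected peers,
# estimated rate 0.01 disconnections per peer per hour.
delta_t = estimation_period(20, 100, 0.01)
```

Under this reading, a smaller μ yields a longer estimation period, so the window adapts to the order of magnitude of the disconnection rate.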
19
A hybrid scheme for availability
The objective of the controller is to make the repair rate equal to the rate of permanent failures.
Define a lower threshold, THpro. If the number of available fragments drops to THpro, the system switches to a purely reactive scheme.
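The threshold rule can be sketched as a simple mode switch. This is a minimal illustration, assuming the switch happens when the fragment count is at or below the threshold. The names `choose_mode` and `th_pro` are hypothetical, and the sketch does not model how repairs are carried out in either mode.

```python
def choose_mode(available_fragments: int, th_pro: int) -> str:
    """Return "reactive" at or below the threshold, "proactive" above it."""
    if available_fragments <= th_pro:
        return "reactive"
    return "proactive"


# Illustrative trace of available-fragment counts with a threshold of 8.
counts = [12, 10, 8, 7, 9, 11]
modes = [choose_mode(count, th_pro=8) for count in counts]
```

In this trace the system stays proactive while the count is above 8, switches to reactive at 8 and 7, and returns to proactive once the count recovers.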
20
Validation
System model validation
21
Validation
22
Validation
Estimator validation
The convergence time depends on μ. This leads us to say that in a changing environment we cannot use a constant estimation period; instead, ∆T_D should be adapted to the order of magnitude of the parameter μ, as we did in eq. (8).
23
Validation
Controller validation
24
Validation
Figure panels: upper left, lower right
25
Experiments
Goals of the experiments:
The capacity to assure durability
The smoothness of the repair rate
26
Experiments
The cost of the reactive scheme is bursty repair activity.
27
Experiments
When D is too big with respect to the parameter dynamics, the estimator is not able to cope with their changes.
For too small values of D the estimation is not reliable, and for too big values the distribution of the number of available fragments degrades too much.
28
Experiments
29
Experiments
30
Experiments
31
Conclusion
This system combines the resilience of reactive schemes with the smoothness of proactive schemes.
The authors validated the proposed scheme and demonstrated its effectiveness using synthetic data.