Efficient Replica Maintenance for Distributed Storage Systems
Byung-Gon Chun, Frank Dabek, Andreas Haeberlen, Emil Sit, Hakim Weatherspoon, M. Frans Kaashoek, John Kubiatowicz, and Robert Morris. 2006.
About the Paper
- 137 citations (Google Scholar).
- In Proceedings of the 3rd Conference on Networked Systems Design & Implementation (NSDI '06). USENIX Association, Berkeley, CA, USA.
- Research from the OceanStore project.
Credit
Modified version of http://nrl.iis.sinica.edu.tw/Web2.0/presentation/ERMfDSS.ppt
Outline
- Motivation
- Understanding durability
- Improving repair time
- Reducing transient costs
- Implementation
- Conclusion
Related Terms
- Distributed storage systems: storage systems that aggregate the disks of many nodes scattered throughout the Internet.
Motivation
- One of the most important goals of a distributed storage system: robustness. Solution? Replication.
- Durability: objects that an application has put into the system are not lost due to disk failure.
- Availability: a get will be able to return the object promptly.
Motivation: Failures
- Transient failure (hurts availability): loss of power, scheduled maintenance, network problems.
- Permanent failure (hurts durability): disk failure.
Contribution
- Develop and implement an algorithm, Carbonite, to store immutable objects durably and at low bandwidth cost in a distributed storage system:
  - Create replicas fast enough to handle failures.
  - Keep track of all replicas.
  - Use a model to determine a reasonable number of replicas.
Outline
- Motivation
- Understanding durability  (next: a model of the relationship between network capacity, the amount of replicated data, the number of replicas, and durability)
- Improving repair time
- Reducing transient costs
- Implementation
- Conclusion
Providing Durability
- Durability is more practical and useful than availability.
- Challenges to durability:
  - Create new replicas faster than they are lost.
  - Reduce network bandwidth.
  - Distinguish transient failures from permanent disk failures (reintegration).
Challenges to Durability
- Create new replicas faster than replicas are destroyed.
- If creation rate < failure rate, the system is infeasible; a higher number of replicas does not allow the system to survive a high average failure rate.
- If creation rate = failure rate + ε (ε small), bursts of failures may destroy all of the replicas.
Number of Replicas as a Birth-Death Process
- Assumption: independent, exponentially distributed inter-failure and inter-repair times.
- λf: average failure rate.
- μi: average repair rate at state i (i.e., when i replicas exist).
- rL: lower bound on the number of replicas; the target number of replicas needed to survive bursts of failures.
(A toy simulation of this chain is sketched below.)
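To make the birth-death picture concrete, here is a small simulation sketch (not from the paper or slides): each of the i live replicas fails at rate λ, a replacement is created at rate μ whenever fewer than rL copies exist, and the object is lost once no replica remains. Capping repair at rL copies and the example parameter values are my own simplifying assumptions.

```python
import random

def simulate_replicas(lam, mu, r_l, t_end, seed=1):
    """Toy Gillespie-style simulation of the replica birth-death chain.

    State i = number of live replicas. Each replica fails at rate lam
    (total death rate i*lam); a repair creates one new replica at rate mu
    whenever 0 < i < r_l. Returns the time at which the object was lost
    (all replicas gone), or None if it survived until t_end.
    """
    random.seed(seed)
    t, i = 0.0, r_l
    while t < t_end:
        death = i * lam
        birth = mu if i < r_l else 0.0
        t += random.expovariate(death + birth)
        if random.random() < death / (death + birth):
            i -= 1
            if i == 0:            # no copy left to repair from: object lost
                return t
        else:
            i += 1
    return None

# PlanetLab-like per-year rates, target of 3 replicas, 100-year horizon.
print(simulate_replicas(lam=0.44, mu=3.0, r_l=3, t_end=100.0))
```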
Model Simplification
- With fixed μ and λ, the equilibrium number of replicas is Θ = μ/λ.
- Higher values of Θ decrease the time it takes to repair an object.
Example: PlanetLab
- PlanetLab: a large research testbed for computer networking and distributed systems research, with nodes located around the world; different organizations each donate one or more computers.
- 490 nodes, average inter-failure time of 39.85 hours, 150 KB/s of bandwidth per node.
- Assumptions: 500 GB of data per node, rL = 3.
- λ = 365 days / (490 × (39.85 / 24) days) ≈ 0.439 disk failures per year.
- μ = 365 days / (time to copy 500 GB × 3 at 150 KB/s, ≈ 116 days) ≈ 3 disk copies per year.
- Θ = μ/λ ≈ 6.85.
- λ depends on the number of nodes and the frequency of failures; μ depends on the amount of data and the bandwidth. So Θ = μ/λ = constant × bandwidth × #nodes × inter-failure time / (amount of data × rL).
(A back-of-the-envelope check of these numbers follows below.)
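As a sanity check, the slide's arithmetic can be reproduced in a few lines; the constants come from the slide, while the 1024-based unit conversions are my own assumption.

```python
# Back-of-the-envelope check of the PlanetLab example (constants from the slide;
# the 1024-based unit conversions are my own assumption).
HOURS_PER_YEAR = 365 * 24
SECONDS_PER_YEAR = HOURS_PER_YEAR * 3600

nodes = 490
inter_failure_hours = 39.85        # testbed-wide average inter-failure time
bandwidth_kb_per_s = 150           # per-node access-link bandwidth
data_gb_per_node = 500
r_l = 3

# Per-disk failure rate: the testbed loses a disk every 39.85 h, so each of
# the 490 disks fails roughly once every 490 * 39.85 hours.
lam = HOURS_PER_YEAR / (nodes * inter_failure_hours)          # ~0.45 / year

# Repair rate: time to push r_l copies of one disk's data over one access link.
kb_to_copy = data_gb_per_node * 1024 * 1024 * r_l
copy_time_years = kb_to_copy / bandwidth_kb_per_s / SECONDS_PER_YEAR
mu = 1.0 / copy_time_years                                    # ~3 copies / year

print(f"lambda = {lam:.2f}/yr, mu = {mu:.2f}/yr, theta = {mu / lam:.2f}")
# Roughly matches the slide's 0.439, 3, and 6.85.
```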
Impact of Θ
- Bandwidth ↑ ⇒ μ ↑ ⇒ Θ ↑;  rL ↑ ⇒ μ ↓ ⇒ Θ ↓  (Θ = constant × bandwidth / rL).
- Θ is the theoretical upper limit on the number of replicas the system can sustain.
- If Θ < 1, the system can no longer maintain full replication, regardless of rL.
- (The accompanying figure marks the Θ = 1 boundary.)
Choosing rL
- Large enough to ensure durability: at least one more than the maximum burst of simultaneous failures.
- Small enough to avoid waste: if a low value of rL would suffice, the bandwidth spent maintaining extra replicas is wasted.
rL vs. Durability
- A higher rL means higher cost, but tolerates larger bursts of failures.
- A larger data size or a higher λ requires a higher rL.
- (Analytical results from 4 years of PlanetLab traces.)
Outline
- Motivation
- Understanding durability
- Improving repair time  (next: proper placement of replicas on servers)
- Reducing transient costs
- Implementation
- Conclusion
A Node's Scope
- Definition: each node n designates a set of other nodes that can potentially hold copies of the objects that n is responsible for. The size of that set is the node's scope.
- scope ∈ [rL, N], where N is the total number of nodes in the system.
Effect of Scope
- Small scope:
  - Easy to keep track of objects.
  - Needs more time to create new replicas.
- Large scope:
  - Reduces repair time, and thus increases durability.
  - Requires monitoring more nodes.
  - With a large number of objects and random placement, may increase the likelihood that a burst of simultaneous failures destroys all replicas of some object.
Scope vs. Repair Time
- Larger scope ⇒ repair work is spread over more access links and completes faster (see the sketch below).
- Lower rL ⇒ the scope must be larger to achieve the same durability.
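A rough illustration of the first point, using a toy formula of my own (not from the paper): if the replicas lost with a failed disk were spread over `scope` other nodes, up to `scope` access links can push replacement copies in parallel.

```python
def approx_repair_time(data_bytes, link_bw_bytes_per_s, scope):
    """Toy estimate of wall-clock repair time after one disk failure.

    Assumes the lost replicas were spread evenly over `scope` nodes, so up
    to `scope` source access links push replacement copies in parallel
    (a simplification: real repair is also limited by destination links).
    """
    return data_bytes / (link_bw_bytes_per_s * scope)

# 500 GB lost, 150 KB/s access links: small scope vs. large scope.
for scope in (3, 25, 490):
    days = approx_repair_time(500 * 2**30, 150 * 1024, scope) / 86400
    print(f"scope={scope:3d}: ~{days:.1f} days to re-replicate")
```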
Outline
- Motivation
- Understanding durability
- Improving repair time
- Reducing transient costs  (next: reduce bandwidth wasted on transient failures; how to distinguish transient from permanent failures; reintegrate replicas stored on nodes after transient failures)
- Implementation
- Conclusion
Reintegration
- Reintegrate replicas stored on nodes after transient failures.
- The system must be able to track all the replicas.
Effect of Node Availability
- a: the average fraction of time that a node is available.
- Pr[a new replica needs to be created] = Pr[fewer than rL replicas are available] = Σ_{i=0}^{rL−1} C(n, i) · a^i · (1 − a)^{n−i}, treating each of the n replicas as independently available with probability a.
- Chernoff bound: about 2rL/a replicas are needed to keep rL copies available.
(A small numerical check follows below.)
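The binomial tail above is easy to evaluate directly; the sketch below finds the smallest n that keeps the probability of having fewer than rL available copies under a threshold (the 1% threshold is my own choice) and compares it with the 2rL/a rule of thumb.

```python
from math import comb

def pr_fewer_than_rl_available(n, a, r_l):
    """Pr[fewer than r_l of n replicas are up], each up independently w.p. a."""
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(r_l))

def replicas_needed(a, r_l, threshold=0.01):
    """Smallest n keeping Pr[< r_l available] below `threshold` (my choice)."""
    n = r_l
    while pr_fewer_than_rl_available(n, a, r_l) > threshold:
        n += 1
    return n

for a in (0.5, 0.7, 0.9):
    print(f"a={a}: need {replicas_needed(a, r_l=3)} replicas "
          f"(2*rL/a rule of thumb: {2 * 3 / a:.1f})")
```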
Node Availability vs. Reintegration
- Reintegration can work safely with 2rL/a replicas.
- The factor 2/a is the penalty for not being able to distinguish transient from permanent failures.
- (Results shown for rL = 3.)
Create Replicas as Needed
- The probability of having to make new replicas depends on a.
- How can a be estimated?
Timeouts
- A heuristic to avoid misclassifying temporary failures as permanent.
- Setting a timeout can reduce the response to transient failures, but its success depends greatly on its relationship to the downtime distribution, and in some cases it can reduce durability as well.
Four Replication Algorithms
- Cates: fixed number of replicas (rL), with a timeout.
- Total Recall: batch repair (in addition to rL replicas, make e additional copies so that repairs are needed less frequently).
- Carbonite: timeout + reintegration (sketched below).
- Oracle: a hypothetical system that can differentiate transient failures from permanent failures.
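A minimal sketch of the Carbonite-style rule (timeout plus reintegration) as described on these slides. The class names, the `copy_fn` callback, and the default timeout are my own assumptions; the real implementation also handles placement, monitoring, and object synchronization.

```python
import time

class ObjectState:
    """Tracks every replica ever created for one object (needed so that a
    replica coming back from a transient failure can be reintegrated)."""
    def __init__(self, key, r_l):
        self.key = key
        self.r_l = r_l
        self.holders = set()      # all nodes known to store a replica
        self.last_seen = {}       # node -> last time it responded

class CarboniteMaintainer:
    """Sketch of the timeout + reintegration rule: make a new copy only when
    fewer than r_l holders appear alive, where a holder counts as failed only
    after `timeout_s` seconds of silence (the timeout value is my choice)."""

    def __init__(self, timeout_s=4 * 3600):
        self.timeout_s = timeout_s

    def live_holders(self, obj, now):
        return {n for n in obj.holders
                if now - obj.last_seen.get(n, 0) < self.timeout_s}

    def maintain(self, obj, candidates, copy_fn, now=None):
        """candidates: nodes in this node's scope; copy_fn(key, src, dst) copies a replica."""
        now = time.time() if now is None else now
        live = self.live_holders(obj, now)
        while live and len(live) < obj.r_l:
            fresh = [n for n in candidates if n not in obj.holders]
            if not fresh:
                break                      # nowhere new to put a copy
            dst = fresh[0]
            copy_fn(obj.key, src=next(iter(live)), dst=dst)
            obj.holders.add(dst)           # remembered forever: reintegration
            obj.last_seen[dst] = now
            live.add(dst)
```

Because `holders` is never pruned, a node that was merely offline counts again as soon as it reports in, so no bandwidth is spent replacing its copy; that is the reintegration behaviour the slides describe.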
Comparison
(Figure slide comparing the four algorithms; graph not reproduced.)
Outline
- Motivation
- Understanding durability
- Improving repair time
- Reducing transient costs
- Implementation  (next: challenges such as node monitoring)
- Conclusion
Node Monitoring for Failure Detection
- Carbonite requires that each node know the number of available replicas of each object for which it is responsible.
- The goal of monitoring is to allow nodes to track the number of available replicas.
- Challenges: monitoring consistent-hashing systems; monitoring host availability.
Outline
- Motivation
- Understanding durability
- Improving repair time
- Reducing transient costs
- Implementation
- Conclusion
Conclusion
- Described a set of techniques that allow wide-area systems to efficiently store and maintain large amounts of data.
- Carbonite keeps all data durable and uses 44% more network traffic than a hypothetical system that only responds to permanent failures.
- In comparison, Total Recall and DHash require almost a factor of two more network traffic than this hypothetical system.
Questions or Comments? Thanks!