
1 Efficient Replica Maintenance for Distributed Storage Systems
B-G Chun, F. Dabek, A. Haeberlen, E. Sit, H. Weatherspoon, M. Kaashoek, J. Kubiatowicz, and R. Morris. In Proc. of NSDI, May 2006.
Presenter: Fabián E. Bustamante (EECS 443 Advanced Operating Systems, Northwestern University, Fall 2005)

2 Replication in Wide-Area Storage
Applications put & get objects in/from the wide-area storage system
Objects are replicated for
–Availability: a get on an object will return promptly
–Durability: objects put by the app are not lost due to disk failures
–An object may be durably stored but not immediately available (see the sketch below)
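A minimal sketch of this distinction, with hypothetical names and structures (none of this is the paper's code): a get succeeds only if some replica is reachable right now, while durability only requires that a copy still exists on some disk.

```python
import random

class ReplicatedStore:
    """Toy wide-area store: every object is written to r_L nodes."""

    def __init__(self, nodes, r_l=3):
        self.nodes = nodes        # node_id -> {"up": bool, "disk": dict}
        self.r_l = r_l
        self.locations = {}       # key -> node_ids holding a replica

    def put(self, key, value):
        # Place r_L replicas on distinct, currently reachable nodes.
        up = [n for n, s in self.nodes.items() if s["up"]]
        chosen = random.sample(up, self.r_l)
        for n in chosen:
            self.nodes[n]["disk"][key] = value
        self.locations[key] = chosen

    def get(self, key):
        # Available: some replica sits on a node that is up *right now*.
        for n in self.locations.get(key, []):
            if self.nodes[n]["up"] and key in self.nodes[n]["disk"]:
                return self.nodes[n]["disk"][key]
        return None  # durable replicas may still exist on down nodes
```

If every node holding a key is transiently down, get returns None even though the object is still durable; only a disk failure that erases the last copy destroys it.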

3 Goal: durability at low bandwidth cost
Durability is a more practical & useful goal
Threat to durability
–Losing the last copy of an object
–So, create copies faster than they are destroyed (a rough feasibility check below)
Challenges
–Replication can eat your bandwidth
–Hard to distinguish between transient & permanent failures
–After recovery, some replicas may sit on nodes the lookup algorithm does not check
Paper presents Carbonite – an efficient wide-area replication technique for durability
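A back-of-the-envelope sketch of the bandwidth challenge, with purely illustrative numbers (none come from the paper or the trace): on average each node must re-create the data lost to its disk failures, so the required repair rate D/MTBF must stay below the per-node access-link capacity.

```python
# Illustrative numbers only (not from the paper or the PL trace).
D = 500e9               # bytes stored per node
mtbf = 365 * 86400      # mean time between disk failures, seconds
link = 1.5e6 / 8        # access-link capacity in bytes/sec (1.5 Mbps)

repair_rate = D / mtbf  # bytes/sec of new replicas a node must create
print(f"need {repair_rate:.0f} B/s of {link:.0f} B/s available:",
      "feasible" if repair_rate < link else "infeasible")
```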

4 System Environment
Use PlanetLab (PL) as representative
–>600 nodes distributed world-wide
–Historical traces collected by the CoMon project (every 5 minutes)
–Disk failures from event logs of PlanetLab Central
Synthetic traces (generation sketched below)
–632 nodes, as in PL
–Failure inter-arrival times drawn from an exponential dist. (mean session time and downtime as in PL)
–Two years instead of one, and avg node lifetime of 1 year
Simulation
–Trace-driven, event-based simulator
–Assumptions: network paths are independent; all nodes are reachable from all other nodes; each node has the same link capacity

PlanetLab trace (3/1/05 – 2/28/06, 632 hosts):
–Transient failures: 21355; disk failures: 219
–Transient host downtime (s), median/avg/90th: 1208 / 104647 / 14242
–Any-failure interarrival (s), median/avg/90th: 305 / 1467 / 3306
–Disk-failure interarrival (s), median/avg/90th: 54411 / 143476 / 490047
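A sketch of how such a synthetic trace could be generated (function and parameter names are mine; the slide only specifies exponential inter-arrival times with PL's mean session time and downtime):

```python
import random

def node_trace(mean_session, mean_downtime, horizon):
    """Alternating up/down intervals with exponentially
    distributed durations, as in the synthetic traces."""
    events, t, up = [], 0.0, True
    while t < horizon:
        mean = mean_session if up else mean_downtime
        dur = random.expovariate(1.0 / mean)
        events.append((t, "up" if up else "down", dur))
        t, up = t + dur, not up
    return events

# 632 nodes over two simulated years; the mean session/downtime
# values are placeholders to be set from the CoMon measurements.
traces = [node_trace(3.0e5, 1.0e5, 2 * 365 * 86400)
          for _ in range(632)]
```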

5 Understanding durability
To handle some avg. rate of failure – create new replicas faster than they are destroyed
–A function of the per-node access link, the number of nodes, and the amount of data stored per node
An infeasible system – one unable to keep pace w/ the avg. failure rate – will eventually adapt by discarding objects (which ones?)
If the creation rate is just above the failure rate, a failure burst may be a problem
Target number of replicas to maintain: r_L (a toy model below)
Durability does not increase continuously with r_L
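One way to see why durability gains flatten out with r_L is a small birth-death sketch (rates here are illustrative, not measured): each live replica fails independently, each missing replica is recreated at a fixed repair rate, and the object dies if the count ever hits zero.

```python
import random

def survives(r_l, fail_rate, repair_rate, horizon):
    """Each of n live replicas fails at fail_rate; each missing
    replica is recreated at repair_rate; death at zero copies."""
    n, t = r_l, 0.0
    while t < horizon:
        down = n * fail_rate             # total failure rate
        up = (r_l - n) * repair_rate     # total repair rate
        t += random.expovariate(down + up)
        if random.random() < down / (down + up):
            n -= 1
            if n == 0:
                return False             # lost the last copy
        else:
            n += 1
    return True

for r_l in (2, 3, 4, 5):
    trials = 200
    alive = sum(survives(r_l, 1 / (30 * 86400), 1 / 86400,
                         2 * 365 * 86400) for _ in range(trials))
    print(r_l, alive / trials)
```

Past the point where r_L covers the largest likely failure burst, extra replicas mostly add maintenance cost rather than durability.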

6 Improving repair time
Scope – the set of other nodes that can hold copies of the objects a node is responsible for (sketched below)
Small scope
–Easier to keep track of copies
–Effort of creating copies falls on a small set of nodes
–Addition of nodes may result in needless copying of objects (when combined w/ consistent hashing)
Large scope
–Spreads work among more nodes
–Network traffic sources/destinations are spread out
–Temporary failures will be noticed by more nodes
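A minimal sketch of scope on a consistent-hash ring (my own construction, not DHash's actual placement code): a node responsible for a key may place replicas only on the next `scope` successors of the key, so scope = r_L is the small-scope extreme and scope = N the large-scope one.

```python
import hashlib

def h(s):
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big")

def replica_candidates(key, node_ids, scope):
    """Nodes eligible to hold replicas of key: the `scope`
    successors of the key's position on the hash ring."""
    ring = sorted(node_ids, key=h)
    start = next((i for i, n in enumerate(ring) if h(n) >= h(key)), 0)
    return [ring[(start + i) % len(ring)] for i in range(scope)]

nodes = [f"node{i}" for i in range(16)]
print(replica_candidates("object-42", nodes, scope=4))
```

With a small scope, a newly joined node changes which successors are eligible and can force needless copying; with a large scope, any of many nodes can help repair, spreading the traffic.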

7 Reducing transient costs
Impossible to distinguish transient from permanent failures
To minimize network traffic due to transient failures: reintegrate replicas
Carbonite (a simplified loop below)
–Select a suitable value for r_L
–Respond to each detected failure by creating a new replica
–Reintegrate replicas when nodes return
[Figure: bytes sent by different maintenance algorithms]
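A simplified sketch of the maintenance loop this describes (data structures and names are mine, condensed from the paper's idea rather than copied from it): repair only when fewer than r_L replicas are reachable, and never prune the holder list, so copies on returning nodes are reintegrated.

```python
def maintain(key, holders, reachable, r_l, make_replica):
    """One maintenance round for one object.

    holders:   set of nodes believed to hold a replica; never
               pruned, so copies on returning nodes count again.
    reachable: set of nodes currently responding to probes.
    """
    live = holders & reachable
    while len(live) < r_l:            # repair only when short
        candidates = reachable - holders
        if not candidates:
            break                     # nowhere new to put a copy
        new = candidates.pop()
        make_replica(key, new)
        holders.add(new)
        live.add(new)
```

Because holders is never pruned on a timeout, a node that was only transiently down contributes its replica again as soon as it reappears, which is what keeps repair traffic close to the permanent-failure rate.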

8 Reducing transient costs
[Figure: bytes sent w/ and w/o reintegration]
[Figure: impact of timeouts on bandwidth and durability]

9 Assumptions
The PlanetLab testbed can be seen as representative of something
Immutable data
Relatively stable system membership & data loss driven by disk failures
Disk failures are uncorrelated
Simulation
–Network paths are independent
–All nodes are reachable from all other nodes
–Each node has the same link capacity

