Replica Placement Strategy for Wide-Area Storage Systems
Byung-Gon Chun and Hakim Weatherspoon
RADS Final Presentation, December 9, 2004
Final Presentation:2 Environment
Store large quantities of data persistently and availably.
Storage strategy:
- Redundancy: duplicate data to protect against data loss.
- Place data throughout the wide area for availability and durability; avoid correlated failures.
- Continuously repair lost redundancy as needed: detect permanent node failures and trigger data recovery.
Final Presentation:3 Assumptions
Data is maintained on nodes in the wide area, at well-maintained sites.
Sites contribute resources:
- Nodes (storage, CPU)
- Network bandwidth
Nodes collectively maintain data:
- Adaptive under constant change; self-organizing and self-maintaining.
Costs:
- Data recovery: the process of maintaining data availability.
- Limit the wide-area bandwidth used to maintain data.
Final Presentation:4 Challenge
Avoid correlated failures/downtime with careful data placement:
- Minimize the cost of resources (storage, bandwidth) used to maintain data.
- Maximize data availability.
Final Presentation:5 Outline
Analysis of correlated failures:
- Show that correlated failures exist and are significant.
Effects of a common subnet (administrative area, geographic location, etc.):
- Pick a threshold and extra redundancy.
Effects of extra redundancy:
- Vary extra redundancy.
- Compare random, random with constraints, and oracle placement.
- Show that the margin between oracle and random is small.
Final Presentation:6 Analysis of PlanetLab
Trace characteristics; trace-driven simulation modeling the maintenance of data on PlanetLab.
Create a trace using all-pairs pings*, collected from February 16, 2003 to October 6, 2004.
Measure:
- Correlated failures vs. time
- Probability of k nodes being down simultaneously
- {5th percentile, median} number of available replicas vs. time
- Cumulative number of triggered data recoveries vs. time
*Jeremy Stribling, http://infospect.planet-lab.org/pings
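One of the metrics above, the probability of k nodes being down simultaneously, can be sketched directly from an availability trace. This is a minimal sketch assuming the trace is a boolean up/down matrix (one row per time slot); the toy trace is hypothetical, not the PlanetLab data:

```python
from collections import Counter

def simultaneous_downtime_distribution(up_matrix):
    """up_matrix[t][i] is True if node i answered pings in time slot t.

    Returns a Counter mapping k -> number of slots in which exactly
    k nodes were down simultaneously; dividing each count by the
    number of slots gives the probability of k simultaneous failures.
    """
    dist = Counter()
    for slot in up_matrix:
        down = sum(1 for up in slot if not up)
        dist[down] += 1
    return dist

# Toy trace: 3 slots x 4 nodes (hypothetical data).
trace = [
    [True, True, False, True],   # one node down
    [False, False, True, True],  # two nodes down
    [True, True, True, True],    # all nodes up
]
print(simultaneous_downtime_distribution(trace))
```

A heavy tail in this distribution (many slots with large k) is what signals correlated, rather than independent, failures.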
Final Presentation:7 Analysis of PlanetLab II: Correlated Failures
Final Presentation:8 Analysis I - Node characteristics
Final Presentation:9 Analysis II - Correlated Failures
Final Presentation:10 Correlated Failures
Final Presentation:11 Correlated Failures (machines with downtime <= 1000 slots)
Final Presentation:12 Availability Trace
Final Presentation:13 Replica Placement Strategies
Random
RandomSite:
- Avoid placing multiple replicas in the same site.
- A site in PlanetLab is identified by its 2-byte IP address prefix.
RandomBlacklist:
- Avoid machines on a blacklist: the top k machines with the longest downtime.
RandomSiteBlacklist:
- Combine RandomSite and RandomBlacklist.
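The four strategies above can be sketched as follows. This is a minimal sketch, not the paper's implementation; the node names, `site_of` helper, and blacklist contents are hypothetical:

```python
import random

def place_random(nodes, n, rng=random):
    """Random: choose n replica holders uniformly at random."""
    return rng.sample(nodes, n)

def place_random_site(nodes, n, site_of, rng=random):
    """RandomSite: like Random, but at most one replica per site."""
    chosen, used_sites = [], set()
    for node in rng.sample(nodes, len(nodes)):  # scan in random order
        site = site_of(node)
        if site not in used_sites:
            chosen.append(node)
            used_sites.add(site)
            if len(chosen) == n:
                break
    return chosen

def place_random_blacklist(nodes, n, blacklist, rng=random):
    """RandomBlacklist: Random over nodes not on the blacklist."""
    return rng.sample([m for m in nodes if m not in blacklist], n)

def place_random_site_blacklist(nodes, n, site_of, blacklist, rng=random):
    """RandomSiteBlacklist: both constraints combined."""
    eligible = [m for m in nodes if m not in blacklist]
    return place_random_site(eligible, n, site_of, rng)

# Hypothetical nodes named by IP; a site is the 2-byte address prefix.
site_of = lambda ip: tuple(ip.split(".")[:2])
```

The site constraint is what avoids the subnet-level correlated failures measured in the trace analysis, at the cost of a smaller candidate pool.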
Final Presentation:14 Comparison of simple strategies (m=1, th=9, n=14, |blacklist|=35)

Strategy          Random   RandomSite   RandomBlacklist   RandomSiteBlacklist
# of repairs        9075         8581              8691                  8160
Improvement (%)        -         5.44              4.23                 10.08
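The improvement row is each strategy's repair count relative to the Random baseline; reproducing the arithmetic:

```python
# Repair counts from the comparison slide; Random is the baseline.
repairs = {
    "Random": 9075,
    "RandomSite": 8581,
    "RandomBlacklist": 8691,
    "RandomSiteBlacklist": 8160,
}
base = repairs["Random"]
improvement = {
    name: round(100.0 * (base - count) / base, 2)
    for name, count in repairs.items()
}
print(improvement)
# {'Random': 0.0, 'RandomSite': 5.44, 'RandomBlacklist': 4.23,
#  'RandomSiteBlacklist': 10.08}
```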
Final Presentation:15 Simulation setup
Placement algorithm:
- Random vs. Oracle
- Oracle strategies: Max-Lifetime-Availability; Min-Max-TTR, Min-Sum-TTR, Min-Mean-TTR
Simulation parameters:
- Replication m = 1, threshold th = 9, total replicas n = 15
- Initial repository size 2 TB
- Write rate 1 Kbps and 10 Kbps per node over 300 storage nodes: the system grows at 3 TB and 30 TB per year, respectively.
Metrics:
- Number of available nodes
- Number of data repairs
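In this setup, a repair is triggered whenever fewer than th of an object's n replicas are reachable. A minimal sketch of that trigger over an availability trace (the trace format and the re-replication policy here are simplifying assumptions, not the paper's simulator):

```python
def count_repairs(avail_trace, initial_replicas, th, n):
    """Count threshold-triggered repairs over an availability trace.

    avail_trace: one set of currently-up node ids per time slot.
    When fewer than th replicas are up, a repair re-replicates onto
    other up nodes until n replicas exist again; unreachable copies
    are treated as lost (a simplification of the real recovery model).
    """
    repairs = 0
    replicas = set(initial_replicas)
    for up in avail_trace:
        live = replicas & up
        if len(live) < th:
            repairs += 1
            spares = sorted(up - replicas)[: n - len(live)]
            replicas = live | set(spares)
    return repairs

# Toy example: replicas on nodes 1-3; nodes 2 and 3 go down.
trace = [{1, 2, 3, 4, 5}, {1, 4, 5}, {1, 4, 5}]
print(count_repairs(trace, {1, 2, 3}, th=2, n=3))  # 1
```

The gap n - th is the extra redundancy margin: the larger it is, the more simultaneous downtime the system absorbs before a repair (and its wide-area bandwidth cost) is triggered.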
Final Presentation:16 Comparison of simple strategies (m=1, th=9)
Final Presentation:17 Results - Random Placement (1Kbps)
Final Presentation:18 Results - Oracle Max-Lifetime-Avail (1Kbps)
Final Presentation:19 Results - Breakdown of Random (1Kbps)
Final Presentation:20 Results - Random (10Kbps)
Final Presentation:21 Results - Breakdown of Random (10Kbps)
Final Presentation:22 Conclusion
Correlated downtimes do exist.
Random placement is sufficient:
- A minimum data availability threshold plus extra redundancy absorbs most of the correlation.