StarFish: highly-available block storage. Eran Gabber, Jeff Fellin, Michael Flaster, Fengrui Gu, Bruce Hillyer, Wee Teck Ng, Banu Özden, Elizabeth Shriver. 2003.


1 StarFish: highly-available block storage Eran Gabber, Jeff Fellin, Michael Flaster, Fengrui Gu, Bruce Hillyer, Wee Teck Ng, Banu Özden, Elizabeth Shriver 2003 USENIX Annual Technical Conference Presenter: D00922019 林敬棋

2 Introduction Important data needs to be protected. ◦ Make replicas. Replication to remote sites ◦ Reduces the amount of data lost in a failure. ◦ Decreases the time required to recover from a catastrophic site failure.

3 StarFish A highly-available, geographically-dispersed block storage system. ◦ Does not require expensive dedicated communication lines to all replicas to achieve high availability. ◦ Achieves good performance even during recovery from a replica failure. ◦ Single-owner access semantics.

4 Architecture StarFish consists of ◦ One Host Element (HE) ▪ Provides storage virtualization and a read cache. ◦ N Storage Elements (SEs) ▪ Q: write quorum size. ▪ Updates are synchronous to a quorum of Q SEs and asynchronous to the rest.
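
The quorum rule on this slide implies that the host sees a write complete once the Q-th fastest SE has acknowledged it. The following is a minimal sketch of that idea, assuming (as a simplification not stated on the slides) that write latency is dominated by the network delay to each SE and that HE processing time can be ignored. The function name `write_ack_latency` and the delay values are hypothetical, chosen only for illustration.

```python
def write_ack_latency(se_delays_ms, q):
    """Latency until q SEs have acknowledged a write (simplified model)."""
    if not 1 <= q <= len(se_delays_ms):
        raise ValueError("q must be between 1 and the number of SEs")
    # The write completes when the q-th fastest SE acknowledges;
    # the remaining SEs are updated asynchronously.
    return sorted(se_delays_ms)[q - 1]


# Hypothetical delays (ms) for a local, a near, and a far SE.
delays = [0, 4, 8]
for q in (1, 2, 3):
    print(f"Q={q}: write acknowledged after {write_ack_latency(delays, q)} ms")
```

Under this model, with N = 3 and Q = 2 the far SE never sits on the critical path of a write, which is one way to read the recommended setup.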

5 Recommended Setup N = 3, Q = 2 MAN: Metropolitan Area Network WAN:Wide Area Network

6 Another Deployment

7 SE Recovery Write log ◦ HE keeps a circular buffer of recent writes. ◦ Each SE maintains a circular buffer of recent writes on a log disk. Three types of recovery ◦ Quick recovery ◦ Replay recovery ◦ Full recovery

8 Availability and Reliability Assume that the failure and recovery processes of the network links and SEs are i.i.d. Poisson processes with combined mean failure and recovery rates of λ and μ per second, respectively. Similarly, the HE fails and recovers as Poisson processes with rates λ_he and μ_he.

9 Availability The steady-state probability that at least Q SEs are available. Derived from the standard machine repairman model.
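
The definition above can be explored numerically. The sketch below is not the formula from the slides (which was shown as an image); it assumes that each SE fails and recovers independently, so that the number of available SEs is binomially distributed, and it ignores HE failures. The function names are hypothetical.

```python
from math import comb


def se_up_probability(failure_rate, recovery_rate):
    """Steady-state probability that one SE is up: mu / (lambda + mu)."""
    return recovery_rate / (failure_rate + recovery_rate)


def availability(n, q, p_up):
    """P(at least q of n independent SEs are up), each up with probability p_up."""
    return sum(
        comb(n, k) * p_up**k * (1 - p_up) ** (n - k)
        for k in range(q, n + 1)
    )


# Example: N = 3 SEs, each 99% available, for every quorum size.
for q in (1, 2, 3):
    print(f"N=3, Q={q}: availability = {availability(3, q, 0.99):.6f}")
```

For a fixed N, the printed availability drops as Q grows, since more SEs must be up at the same time.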

10 Machine Repairman Model

11 Availability (cont.)

12 Availability (cont.) ▪ X ★ 9: the number of 9s in an availability measure. Much higher availability is achieved when N = 2Q + 1. For fixed N, availability decreases with larger quorum size. ◦ Increasing the quorum size trades off availability for reliability.
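
The "number of 9s" is commonly computed as the negative base-10 logarithm of the unavailability. The short sketch below assumes that convention; the slides do not spell out the exact definition they use, and the function name `nines` is hypothetical.

```python
from math import log10


def nines(availability):
    """Number of 9s in an availability value, e.g. 0.999 -> about 3."""
    if not 0 <= availability < 1:
        raise ValueError("availability must be in [0, 1)")
    return -log10(1 - availability)


for a in (0.99, 0.999, 0.999702):
    print(f"availability {a}: {nines(a):.2f} nines")
```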

13 Reliability The probability of no data loss. Reliability increases with larger Q. Two approaches ◦ Make Q > ⌊N/2⌋ and require at least Q SEs to be available. ▪ Reduces availability and performance. ◦ Read-only consistency

14 Read-only Consistency The system stays available in read-only mode during a failure. ◦ Read-only mode obviates the need for Q SEs to be available to handle updates. ◦ Increases availability.

15 Availability with Read-only Consistency

16 Observations If ρ_he = 0, availability is independent of Q. ◦ Can always recover from the HE. As ρ_he increases, availability increases with Q. The largest increase occurs from Q = 1 to Q = 2 and is bounded by 3/16 when ρ = 1. ◦ Diminishing gains after Q = 2. ◦ Suggests Q = 2 in practical systems.

17 Availability with Read-only Consistency (cont.) N < 2Q

18 Implementation

19 Performance Measurements Compared with a direct-attached RAID unit.

20 Settings Different network delays ◦ 1, 2, 4, 8, 23, 36, 65 ms Different bandwidth limitations ◦ 31, 51, 62, 93, 124 Mb/s Benchmarks ◦ Micro-benchmark ▪ Read hit ▪ Read miss ▪ Write ◦ PostMark

21 Effects of network delays and HE cache size Near SE delay: 4 ms; far SE delay: 8 ms. No cache misses if the HE cache size is 400 MB.

22 Observation A large HE cache improves performance. ◦ The HE can respond to more read requests without communicating with an SE. ▪ Does not affect write requests. ◦ Especially beneficial when the local SE has significant delays. With Q = 2 and a 400 MB cache, performance is not influenced by the delay to the local SE. ◦ It depends on the near SE.

23 Normal Operation and placement of the far SE ▪ 1-8: 1, 2, 4, 8 ms; 4-12: 4, 8, 12 ms ▪ 23-65: 23, 36, 65 ms; 31-124: 31, 51, 62, 93, 124 Mb/s ▪ Local SE delay: 0 ms N = 3

24 Normal Operation and placement of the far SE(Cont.) N = 3 8 threads

25 Normal Operation and placement of the far SE(Cont.)

26 Observation Performance is influenced mostly by two parameters ◦ Write quorum size ◦ Delay to the SEs. StarFish can provide adequate performance when one of the SEs is placed in a remote location. ◦ At least 85% of the performance of a direct-attached RAID.

27 Recovery Performance degrades more during full recovery.

28 Conclusion The StarFish system reveals significant benefits from a third copy of the data at an intermediate distance. A StarFish system with 3 replicas, a write quorum size of 2, and read-only consistency yields better than 99.9999% availability, assuming individual Storage Element availability of 99%.

