Presentation transcript: "Low-Overhead Byzantine Fault-Tolerant Storage"

1 Low-Overhead Byzantine Fault-Tolerant Storage
James Hendricks, Gregory R. Ganger (Carnegie Mellon University)
Michael K. Reiter (University of North Carolina at Chapel Hill)

2 Motivation
- As systems grow in size and complexity, they must tolerate more faults and more types of faults
- Modern storage systems take an ad-hoc approach; it is not clear which faults to tolerate
- Instead: tolerate arbitrary (Byzantine) faults
- But is Byzantine fault tolerance expensive? Fast reads, but slow large writes

3 Write bandwidth
[Figure: write bandwidth (MB/s) vs. number of faults tolerated (f). Series: crash fault-tolerant erasure-coded storage (non-Byzantine, f+1-of-2f+1), replicated Byzantine fault-tolerant storage (3f+1 replicas), and low-overhead erasure-coded Byzantine fault-tolerant storage; an f+1-replica configuration is also labeled.]

4 Summary of Results
- We present a low-overhead Byzantine fault-tolerant erasure-coded block storage protocol
- Write overhead: 2 rounds plus a cryptographic checksum
- Read overhead: a cryptographic checksum
- Performance of our Byzantine-tolerant protocol nearly matches that of a protocol that tolerates only crashes: within 10% for large enough requests

5 Erasure codes
An m-of-n erasure code encodes block B into n fragments, each of size |B|/m, such that any m fragments can be used to reconstruct block B.
Example, 3-of-5 erasure code: d1, ..., d5 ← encode(B); read d1, d2, d3: B ← decode(d1, d2, d3); read d1, d3, d5: B ← decode(d1, d3, d5).
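
To make the m-of-n property concrete, here is a minimal sketch of a Reed-Solomon-style erasure code over the prime field GF(257), written for clarity rather than speed. The function names and padding scheme are illustrative only; the prototype described later uses an existing, far more efficient erasure-coding library.

```python
P = 257  # small prime, so every byte value 0..255 is a field element

def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through the given (xi, yi) points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(block, m, n):
    """Return n fragments, each ~|block|/m symbols, any m of which suffice."""
    if len(block) % m:
        block += b"\0" * (m - len(block) % m)          # toy padding scheme
    groups = [block[i:i + m] for i in range(0, len(block), m)]
    frags = []
    for x in range(1, n + 1):                          # fragment i lives at point x = i+1
        frags.append([g[x - 1] if x <= m else
                      _lagrange_eval(list(enumerate(g, start=1)), x)
                      for g in groups])
    return frags

def decode(indexed_frags, m):
    """Rebuild the block from any m (index, fragment) pairs (index is 0-based)."""
    out = bytearray()
    for pos in range(len(indexed_frags[0][1])):
        points = [(i + 1, frag[pos]) for i, frag in indexed_frags]
        out.extend(_lagrange_eval(points, x) for x in range(1, m + 1))
    return bytes(out)

# 3-of-5 example mirroring the slide: any 3 fragments reconstruct B
B = b"example block"
d = encode(B, 3, 5)
assert decode([(0, d[0]), (2, d[2]), (4, d[4])], 3).rstrip(b"\0") == B
```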

6 Design of Our Protocol

7 Parameters and interface
Parameters:
- f: number of faulty servers tolerated
- m ≥ f + 1: fragments needed to decode a block
- n = m + 2f ≥ 3f + 1: number of servers
Interface: read and write fixed-size blocks. Not a filesystem: no metadata, no locking, no access control, no support for variable-sized reads/writes. A building block for a filesystem.
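
A tiny sanity check of these parameter relations; the function name and the example values are purely illustrative.

```python
def num_servers(f: int, m: int) -> int:
    """Given f faults tolerated and m fragments needed to decode, return n."""
    assert m >= f + 1, "need at least f+1 fragments to decode"
    n = m + 2 * f
    assert n >= 3 * f + 1
    return n

print(num_servers(1, 2))   # f=1, m=2  -> n=4 servers
print(num_servers(3, 4))   # f=3, m=4  -> n=10 servers
```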

8 Write protocol: prepare & commit (2-of-4 erasure code example)
- Prepare: send a fragment of block B to each server; each server responds with a cryptographic token
- Commit: forward the tokens to the servers; servers return status

9 Read protocol: find & read
- Find timestamps: request timestamps; each server proposes a timestamp
- Read at timestamp: read block B at that timestamp; servers return fragments

10 Read protocol common case
- Request timestamps and optimistically read fragments
- Servers return fragments and timestamp
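
A toy illustration of this optimistic read path, assuming the encode/decode helpers from the erasure-code sketch above are in scope. The in-memory stores, toy_write, and toy_read are hypothetical stand-ins, and the slow fallback path is only hinted at.

```python
stores = [{} for _ in range(5)]                 # one in-memory store per server: ts -> fragment

def toy_write(block, ts, m=3, n=5):
    for store, frag in zip(stores, encode(block, m, n)):
        store[ts] = frag

def toy_read(m=3):
    # Ask the first m servers for their latest timestamp and that fragment.
    replies = [(max(store), store[max(store)]) for store in stores[:m]]
    if len({ts for ts, _ in replies}) == 1:     # common case: timestamps agree
        return decode(list(enumerate(frag for _, frag in replies)), m)
    raise RuntimeError("timestamps disagree: fall back to find & read")

toy_write(b"block v1", ts=1)
toy_write(b"block v2", ts=2)
print(toy_read().rstrip(b"\0"))                 # b'block v2'
```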

11 Issue 1 of 3: Wasteful encoding
- Erasure code m+2f fragments, hash m+2f fragments, send one to each server
- But the client only waits for m+f responses, so this is wasteful
(m: fragments needed to decode; f: number of faults tolerated)

12 Solution 1: Partial encoding
Instead:
- Erasure code m+f fragments
- Hash m+f fragments
- Hear from m+f servers
Pro: compute f fewer fragments. Con: the client may need to send the entire block on failure, which should happen rarely.
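
Back-of-the-envelope arithmetic for this saving, assuming the 64 kB blocks and m = f + 1 used in the evaluation later in the talk; the loop and values are illustrative only.

```python
# Each fragment is ~64 kB / m; encoding and hashing m+f fragments instead of
# m+2f saves the work for f fragments on every write.
block = 64 * 1024
for f in (1, 4, 10):
    m = f + 1
    frag = block // m
    full, partial = (m + 2 * f) * frag, (m + f) * frag
    print(f"f={f:2d}: process {partial} bytes instead of {full} bytes "
          f"(saves {full - partial} bytes per write)")
```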

13 Issue 2: Block must be unique
- Fragments must comprise a unique block; if not, different readers read different blocks
- Challenge: servers don't see the entire block, so they can't verify a hash of the block and can't verify the encoding of the block given hashes of the fragments

14 Solution 2: Homomorphic fingerprinting
- A fragment is consistent with the checksum if its hash and its homomorphic fingerprint [PODC07] match
- Key property: a block decoded from consistent fragments is unique
[Figure: a substituted fragment d'4 yields hash4' and fp4' that do not match the stored hash4 and fp4.]
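
A rough illustration of the homomorphic property being relied on: because the erasure code is linear and the fingerprint is linear over the same field, a verifier holding fingerprints of the data fragments can check a fragment that is any linear combination of them without seeing the block. This sketch uses a prime field and a simple polynomial-evaluation fingerprint; the construction in the PODC'07 paper differs in its field and details.

```python
import random

P = (1 << 61) - 1            # a Mersenne prime, used as the field modulus
r = random.randrange(1, P)   # random evaluation point, fixed per fingerprint

def fingerprint(vec):
    """Evaluate the vector, viewed as polynomial coefficients, at r (mod P)."""
    return sum(v * pow(r, i, P) for i, v in enumerate(vec)) % P

x = [random.randrange(P) for _ in range(8)]
y = [random.randrange(P) for _ in range(8)]
a, b = 3, 5
combo = [(a * xi + b * yi) % P for xi, yi in zip(x, y)]

# Fingerprinting commutes with linear combinations, so a verifier holding only
# fingerprint(x) and fingerprint(y) can check a fragment built from x and y.
assert fingerprint(combo) == (a * fingerprint(x) + b * fingerprint(y)) % P
print("homomorphic check passed")
```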

15 Issue 3: Write ordering
- Reads must return the most recently written block (required for linearizability / atomicity)
- A faulty server may propose an uncommitted write; this must be prevented
- Prior approaches: 4f+1 servers, signatures, or 3+ round writes
- Our approach: 3f+1 servers, MACs, 2-round writes

16 Solution 3: Hashes of nonces
Write:
- Prepare: each server stores hash(nonce) and returns its nonce; the client collects the nonces
- Commit: the client forwards the nonces; servers store them
Read:
- Find timestamps: servers return the timestamp and stored nonces
- Read at timestamp: each server returns its nonce_hash with its fragment; the reader compares hash(nonce) with nonce_hash
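
A toy simulation of the nonce-hash mechanics, with illustrative class and method names rather than the paper's code: each server commits to a fresh nonce at prepare time by storing hash(nonce), the client forwards all revealed nonces at commit time, and a reader checks each nonce against the hash its server stored, so a faulty server cannot fabricate a commit for a write that honest servers never prepared.

```python
import hashlib, os

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class Server:
    def __init__(self):
        self.nonce_hash = None      # set at prepare
        self.nonces = None          # set at commit (one nonce per server)

    def prepare(self) -> bytes:
        nonce = os.urandom(16)
        self.nonce_hash = sha(nonce)
        return nonce

    def commit(self, nonces):
        self.nonces = list(nonces)

# Two-round write, client side
servers = [Server() for _ in range(4)]
nonces = [s.prepare() for s in servers]     # round 1: collect nonces
for s in servers:
    s.commit(nonces)                        # round 2: forward all nonces

# Reader side: the nonce recorded for server i must hash to the value that
# server i itself stored at prepare time.
recorded = servers[0].nonces
assert all(sha(recorded[i]) == s.nonce_hash for i, s in enumerate(servers))
print("nonces verified")
```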

17 Bringing it all together: Write
- Client: erasure code m+f fragments, hash and fingerprint the fragments, send them to the first m+f servers
- Servers: verify hash and fingerprint, choose a nonce, generate a MAC, store the fragment
- Client: forward the MACs to the servers
- Servers: verify the MAC, free older fragments; write completed
(Overhead relative to a crash-only protocol: the hash, fingerprint, nonce, and MAC steps.)

18 Bringing it all together: Read
- Client: request fragments from the first m servers; request the latest nonce, timestamp, and checksum
- Servers: return the fragment (if requested) and the latest nonce, timestamp, and checksum
- Client: verify that the provided checksum matches each fragment's hash and fingerprint, verify that the timestamps match, verify the nonces; read complete
(Overhead relative to a crash-only protocol: the checksum and nonce verification steps.)

19 Evaluation

20 Experimental setup
- m = f + 1 (m: fragments needed to decode a block; f: number of faults tolerated)
- Single client, NVRAM at servers
- Write or read 64 kB blocks; fragment size decreases as f increases
- 3 GHz Pentium D, Intel PRO/1000

21 Prototype implementation
Four protocols implemented:
- Our protocol (Byzantine fault-tolerant, erasure-coded)
- Crash-only erasure-coded
- Crash-only replication-based
- PASIS [Goodson04] emulation (Byzantine fault-tolerant, erasure-coded): read validation requires decoding, encoding, and hashing 4f+1 fragments; 4f+1 servers, versioning, garbage collection
All use the same hashing and erasure-coding libraries.

22 Write throughput
[Figure: bandwidth (MB/s) vs. number of faults tolerated (f) for crash-only replication-based (f+1 = m replicas), PASIS-emulation, our protocol, and crash-only erasure-coded storage.]

23 Write throughput
[Figure: bandwidth (MB/s) vs. number of faults tolerated (f) for our protocol and crash-only erasure-coded storage; annotations: 64 kB, f+1, 16 kB fragments.]

24 Write response time
[Figure: latency (ms) vs. number of faults tolerated (f) for (A) crash-only erasure-coded, (B) our protocol, (C) PASIS-emulation, and (D) crash-only replication-based, with an overhead breakdown at f = 10 into RPC, hash, encode, and fingerprint.]

25 Read throughput
[Figure: bandwidth (MB/s) vs. number of faults tolerated (f) for crash-only replication-based, PASIS-emulation (compute bound), our protocol, and crash-only erasure-coded storage.]

26 Read response time
[Figure: latency (ms) vs. number of faults tolerated (f) for (A) erasure-coded, (B) our protocol, (C) PASIS-emulation, and (D) replicated, with an overhead breakdown at f = 10 into RPC, hash, encode, and fingerprint.]

27 Conclusions
- Byzantine fault-tolerant storage can rival crash-only storage performance
- We present a low-overhead Byzantine fault-tolerant erasure-coded block storage protocol and prototype
- Write overhead: 2 rounds, hash and fingerprint
- Read overhead: hash and fingerprint
- Close to the performance of systems that tolerate only crashes for reads and large writes

28 Backup slides

29 Why not multicast?
- May be unstable or unavailable (UDP)
- More data means more work at each server (network, disk, hash)
- Doesn't scale with the number of clients

30 Cryptographic hash overhead
Byzantine storage requires cryptographic hashing. Does this matter?
- Systems must tolerate non-crash faults, e.g., a "misdirected write"
- Many modern systems checksum data, e.g., the Google File System; ZFS supports the SHA-256 cryptographic hash function
- Systems may already hash data for authentication
Conclusion: BFT may not introduce new hashing

31 Is 3f+1 servers expensive?
Consider a typical storage cluster:
- Usually more primary drives than parity drives
- Usually several hot spares
Conclusion: such a cluster may already use 3f+1 servers (f = number of faults tolerated)
[Figure: primary drives, parity drives, and hot spares in a cluster.]

