1
Increasing Intrusion Tolerance Via Scalable Redundancy
Carnegie Mellon
Greg Ganger (greg.ganger@cmu.edu), Natassa Ailamaki, Mike Reiter, Priya Narasimhan, Chuck Cranor
2
Technical Objective
To design, implement, and evaluate new protocols for implementing intrusion-tolerant services that scale better. Here, "scale" refers to efficiency as the number of servers and the number of failures tolerated grow.
Targeting three types of services:
- Read-write data objects
- Custom "flat" object types for particular applications, notably directories for implementing an intrusion-tolerant file system
- Arbitrary objects that support object nesting
3
Expected Impact
Significant efficiency and scalability benefits over today's protocols for intrusion tolerance.
For example, for data services, we anticipate:
- At least a twofold latency improvement over the current best, even at small configurations (e.g., tolerating 3-5 Byzantine server failures), with the improvement growing as the system scales up
- A twofold improvement in throughput, again growing with system size
Without such improvements, intrusion tolerance will remain relegated to small deployments in narrow application areas.
4
The Problem Space
Distributed services manage redundant state across servers to tolerate faults.
We consider tolerance to Byzantine faults, as might result from an intrusion into a server or client:
- A faulty server or client may behave arbitrarily
We also make no timing assumptions in this work (an "asynchronous" system).
Primary existing practice: replicated state machines
- Offers no load dispersion, requires data replication, and degrades as the system scales, with O(N^2) messages
5
Our approach
Combine techniques to eliminate work in common cases:
- Server-side versioning
  - allows optimism, with read-time repair if necessary
  - allows work to be off-loaded to clients in lieu of server agreement
- Quorum systems (and erasure coding)
  - allows load dispersion (and more efficient redundancy for bulk data)
- Several others, applied to defend against Byzantine actions
Major risk: could be complex for arbitrary objects
6
Evaluation
We are in Scenario I: the "centralized server setting".
Baseline: the BFT library
- Popular, publicly available implementation of Byzantine fault-tolerant state machine replication (by Castro & Liskov)
- Reported to be an efficient implementation of that approach
Two measures:
- Average latency of operations, from the client's perspective
- Peak sustainable throughput of operations
Our consistency definition: linearizability of invocations
7
Outline
- Overview
- Read-write storage protocol
- Some results
- Continuing work
8
Read-write block storage
- Clients erasure-code/replicate blocks into fragments
- Storage-nodes version fragments on every write
[Figure: a client erasure-codes a data block into fragments F1-F5 and sends one fragment to each storage-node]
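A minimal sketch of the erasure-coding step, assuming a toy (m, m+1) XOR-parity code; a real deployment would use a general m-of-N code (e.g., Reed-Solomon), and the encode/decode names here are illustrative, not the PASIS API:

    import os

    def encode(block: bytes, m: int) -> list[bytes]:
        """Toy (m, m+1) erasure code: split into m stripes plus one XOR parity.
        Any m of the m+1 fragments suffice to rebuild the block."""
        stripe_len = -(-len(block) // m)                      # ceiling division
        padded = block.ljust(stripe_len * m, b"\0")
        stripes = [padded[i * stripe_len:(i + 1) * stripe_len] for i in range(m)]
        parity = bytearray(stripe_len)
        for s in stripes:
            for i, byte in enumerate(s):
                parity[i] ^= byte
        return stripes + [bytes(parity)]

    def decode(fragments: dict[int, bytes], m: int, block_len: int) -> bytes:
        """Rebuild the block from any m of the m+1 fragments (indexed 0..m)."""
        stripe_len = -(-block_len // m)
        stripes = [fragments.get(i) for i in range(m)]
        if None in stripes:                                   # one data stripe missing:
            missing = stripes.index(None)                     # recover it from the parity
            rebuilt = bytearray(fragments[m])
            for idx, s in enumerate(stripes):
                if idx != missing:
                    for i, byte in enumerate(s):
                        rebuilt[i] ^= byte
            stripes[missing] = bytes(rebuilt)
        return b"".join(stripes)[:block_len]

    if __name__ == "__main__":
        block = os.urandom(16 * 1024)                         # 16 KB data-item, as in the experiments
        frags = encode(block, m=4)
        available = {i: f for i, f in enumerate(frags) if i != 2}   # lose one fragment
        assert decode(available, m=4, block_len=len(block)) == block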
9
Challenges: Concurrency
Concurrent updates can violate linearizability.
[Figure: fragments of two concurrent updates interleaved across five servers]
10
Challenges: Server Failures
Faulty servers can attempt to mislead clients.
Typically addressed by "voting".
[Figure: one server returns a bogus fragment (4'), so the client must cross-check responses from multiple servers]
11
Challenges: Client Failures
Byzantine client failures can also mislead clients.
Typically addressed by submitting requests via an agreement protocol.
[Figure: a Byzantine client writes inconsistent fragments, leaving readers unsure what data was written]
12
Consistency via versioning
Leverage versioning storage-nodes for consistency.
Allow writes to proceed with versioning:
- All writes create new data versions
- Partial writes and concurrency won't destroy data
Reader detects and resolves update conflicts:
- Concurrency is rare in FS workloads (typically < 1%)
Offloads work to clients, resulting in greater scalability:
- Only perform extra work when needed
- Optimistically assume fault-free, concurrency-free operation
- Single round-trip for reads and writes in the common case
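A minimal sketch of a versioning storage-node; the class and method names are hypothetical (not the PASIS implementation) and only illustrate "every write creates a new version, nothing is overwritten":

    import bisect
    from collections import defaultdict

    class VersioningStorageNode:
        """Every write appends a new version keyed by its logical timestamp, so
        partial or concurrent writes cannot destroy older data."""

        def __init__(self):
            # block id -> sorted list of (timestamp, fragment) pairs
            self._history = defaultdict(list)

        def write(self, block_id, timestamp, fragment):
            # Insert in timestamp order; a retried write with the same timestamp
            # is kept only once (idempotent).
            versions = self._history[block_id]
            pos = bisect.bisect_left(versions, (timestamp,))
            if pos == len(versions) or versions[pos][0] != timestamp:
                versions.insert(pos, (timestamp, fragment))

        def read_latest(self, block_id):
            """Return the highest-timestamped version (or None)."""
            versions = self._history[block_id]
            return versions[-1] if versions else None

        def read_before(self, block_id, timestamp):
            """Return the latest version strictly earlier than `timestamp`; used
            by readers stepping back past an incomplete write."""
            versions = self._history[block_id]
            pos = bisect.bisect_left(versions, (timestamp,))
            return versions[pos - 1] if pos > 0 else None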
13
Our system model
Crash-recovery storage-node fault model:
- Up to t total bad storage-nodes (crashed or Byzantine)
- Up to b ≤ t of them Byzantine (arbitrary faults)
- So t - b faults are crash-recovery faults
Client fault model: any number of crash or Byzantine clients
Asynchronous timing model
Point-to-point authenticated channels
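A small sketch of how the fault-model parameters size the system, using the N = 2t + 2b + 1 relation that appears on the response-time results slide; read/write threshold choices are protocol details not spelled out in this deck:

    def configuration(t: int, b: int) -> dict:
        """Storage-node count used in the results later in the deck:
        N = 2t + 2b + 1 nodes tolerate t total faults, of which up to b
        may be Byzantine (b <= t)."""
        assert 0 <= b <= t, "the model requires b <= t"
        return {"N": 2 * t + 2 * b + 1, "byzantine": b, "crash_recovery": t - b}

    # configuration(4, 4) -> N = 17 and configuration(4, 1) -> N = 11,
    # matching the N values shown on the response-time slide.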
14
Read/write protocol
Unit of update: a block
- Complete blocks are read and written
- Erasure-coding may be used for space-efficiency
Update semantics: read-write
- No guarantee of contents between read and write
- Sufficient for block-based storage
Consistency: linearizability
Liveness: wait-freedom
15
R/W protocol: Write
1. Client erasure-codes the data-item into N data-fragments
2. Client tags write requests with a logical timestamp (a round-trip is required to read logical time)
3. Client issues requests to at least W storage-nodes
4. Storage-nodes validate the integrity of the request
5. Storage-nodes insert the request into the version history
6. Write completes after W requests have completed
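A runnable sketch of the six write steps, assuming plain replication (m = 1) to keep it short and an in-memory FakeNode stub in place of real storage-nodes; none of these names are the real PASIS API, and step 4's integrity validation is deferred to the cross-checksum slides later:

    import secrets
    from collections import defaultdict

    class FakeNode:
        """In-memory stand-in for a storage-node, only so the sketch runs."""
        def __init__(self):
            self.history = defaultdict(dict)        # block_id -> {timestamp: fragment}
        def get_logical_time(self, block_id):
            versions = self.history[block_id]
            return max(versions) if versions else 0
        def write(self, block_id, timestamp, index, fragment):
            self.history[block_id][timestamp] = fragment   # version; never overwrite older versions
            return True

    def write_block(block: bytes, nodes, W: int):
        """Client-side write following the six steps on this slide."""
        block_id = secrets.token_hex(8)
        fragments = [block] * len(nodes)                                   # step 1 (m = 1: replication)
        timestamp = max(n.get_logical_time(block_id) for n in nodes) + 1   # step 2
        acks = 0
        for index, node in enumerate(nodes):                               # step 3: issue to >= W nodes
            if node.write(block_id, timestamp, index, fragments[index]):   # steps 4-5 happen node-side
                acks += 1
            if acks >= W:                                                  # step 6
                return block_id, timestamp
        raise RuntimeError("write did not complete at W storage-nodes")

    if __name__ == "__main__":
        nodes = [FakeNode() for _ in range(5)]      # N = 5, W = 3
        print(write_block(b"hello", nodes, W=3))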
16
R/W protocol: Read
1. Client reads the latest version from a subset of storage-nodes (the read set is guaranteed to intersect the latest complete write)
2. Client determines the latest candidate write (the "candidate": the set of responses containing the latest timestamp)
3. Client classifies the candidate as one of: complete, incomplete, or repairable
For consistency, only complete writes can be returned.
17
R/W protocol: Read classification
Based on the client's (limited) system knowledge; failures and asynchrony lead to imperfect information.
Candidate classification rules:
- Complete: candidate exists on at least W nodes
  - candidate is decoded and returned
- Incomplete: candidate cannot exist on W nodes
  - Read the previous version to determine a new candidate
  - Iterate: perform classification on the new candidate
- Repairable: candidate may exist on W nodes
  - Repair and return the data-item
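A sketch of the read-side classification loop, pairing with the FakeNode write sketch above; it assumes the client sees each responding node's full version history (a simplification) and uses replication rather than erasure coding, so "decode" is just returning one copy:

    def read_block(block_id, responses, all_nodes, W):
        """responses maps a responding node to its {timestamp: fragment} history
        for the block; nodes absent from responses did not answer."""
        silent = len(all_nodes) - len(responses)             # nodes that might still hold the candidate
        timestamps = sorted({ts for hist in responses.values() for ts in hist},
                            reverse=True)
        for ts in timestamps:                                # latest candidate first
            holders = {node: hist[ts] for node, hist in responses.items() if ts in hist}
            value = next(iter(holders.values()))
            if len(holders) >= W:                            # Complete: decode and return
                return value
            if len(holders) + silent >= W:                   # Repairable: re-write, then return
                for node in responses:
                    if node not in holders:
                        node.write(block_id, ts, 0, value)
                        holders[node] = value
                if len(holders) >= W:
                    return value
            # Incomplete: step back and classify the previous version
        return None

Repair here simply re-issues the candidate to responding nodes that lack it; as long as at least W nodes responded, the candidate becomes complete before it is returned.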
18
Example: Successful read (N=5, W=3, t=1, b=0)
[Timeline figure: D0 is written to three storage-nodes by T0; a later write D1 reaches only one node by T1. A client read after T1 finds D1 as the latest candidate, classifies it incomplete, takes D0 as the new latest candidate, determines D0 complete, and returns D0.]
19
Example: Repairable read (N=5, W=3, t=1, b=0)
[Timeline figure: after writes D0 and D1, a write D2 reaches only some of the storage-nodes by T2. A client read after T2 finds D2 as the latest candidate, classifies it repairable, repairs D2 by writing it to additional nodes, and returns D2.]
20
Protecting against Byzantine storage-nodes
Must defend against servers that modify data in their possession.
Solution: cross checksums [Gong 89]
- Hash each data-fragment
- Concatenate all N hashes
- Append the cross checksum to each fragment
Clients verify hashes against fragments and use cross checksums as "votes".
[Figure: a data-item is split into data-fragments; the fragment hashes are concatenated into the cross checksum]
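A small sketch of the cross-checksum construction; SHA-256 is an illustrative hash choice, not necessarily the one used in PASIS:

    import hashlib

    def cross_checksum(fragments: list[bytes]) -> bytes:
        """Hash each fragment and concatenate all N hashes."""
        return b"".join(hashlib.sha256(f).digest() for f in fragments)

    def verify_fragment(index: int, fragment: bytes, checksum: bytes) -> bool:
        """A reader checks the fragment it received against the hash stored at
        its position in the cross checksum; matching cross checksums from
        enough servers act as 'votes' for the same fragment set."""
        d = hashlib.sha256().digest_size
        expected = checksum[index * d:(index + 1) * d]
        return hashlib.sha256(fragment).digest() == expected

    if __name__ == "__main__":
        frags = [b"frag-0", b"frag-1", b"frag-2"]
        cc = cross_checksum(frags)
        assert verify_fragment(1, b"frag-1", cc)
        assert not verify_fragment(1, b"tampered", cc)    # a Byzantine node's edit is caught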
21
Protecting against Byzantine clients
Must ensure all fragment sets decode to the same value.
Solution: validating timestamps
- Write: place a hash of the cross checksum in the timestamp (this also prevents multiple values being written at the same timestamp)
- Storage-nodes validate their fragment against the corresponding hash
- Read: regenerate the fragments and cross checksum
[Figure: a Byzantine encoding with a "poisonous" fragment makes different fragment subsets decode to different data-items]
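A sketch of a validating timestamp, assuming the timestamp is simply a (logical time, hash-of-cross-checksum) pair; the read-side regeneration of fragments from decoded data is not shown:

    import hashlib

    def make_timestamp(logical_time: int, fragments: list[bytes]):
        """The timestamp carries a hash of the cross checksum, binding the
        timestamp to one specific fragment set."""
        cc = b"".join(hashlib.sha256(f).digest() for f in fragments)
        return (logical_time, hashlib.sha256(cc).digest())

    def node_accepts(timestamp, index: int, fragment: bytes, cross_checksum: bytes) -> bool:
        """A storage-node validates a write request: the fragment must match its
        hash inside the cross checksum, and the cross checksum must match the
        hash embedded in the timestamp."""
        _, cc_hash = timestamp
        d = hashlib.sha256().digest_size
        fragment_ok = hashlib.sha256(fragment).digest() == cross_checksum[index * d:(index + 1) * d]
        checksum_ok = hashlib.sha256(cross_checksum).digest() == cc_hash
        return fragment_ok and checksum_ok

    if __name__ == "__main__":
        frags = [b"a", b"b", b"c"]
        cc = b"".join(hashlib.sha256(f).digest() for f in frags)
        ts = make_timestamp(7, frags)
        assert node_accepts(ts, 0, b"a", cc)
        assert not node_accepts(ts, 0, b"poison", cc)     # a poisonous fragment is rejected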
22
Experimental setup
Prototype system: PASIS
- 20-node cluster
- Dual 1 GHz Pentium III storage-nodes
- Single 2 GHz Pentium IV clients
- 100 Mb switched Ethernet
16 KB data-item size (before encoding)
- Blowup of N/m over the data-item size: each fragment is 1/m of the data-item size
23
PASIS response time
[Chart: mean response time (ms, 0-20) vs. total failures tolerated (t = 1-4) for reads and writes under two fault models, b = t and b = 1, with N = 2t + 2b + 1 (at t = 4: N = 17 for b = t, N = 11 for b = 1); a 1-way 16 KB ping is shown for reference. Annotations call out decode computation and network delay for redundant fragments.]
24
Throughput experiment
Same system setup as the response-time experiment:
- Clients issue read or write requests
- Increase the number of clients to increase load
Demonstrate the value of erasure-codes:
- Increase m to reduce per-storage-node load
Compare with Byzantine atomic broadcast:
- BFT library [Castro & Liskov 99]
- Supports arbitrary operations
- Replication (with multicast) limits write throughput
- O(N^2) messages limit performance scalability
25
PASIS vs. BFT: Write throughput
Reduce per-storage-node load with erasure-codes; BFT uses replication, which increases per-storage-node load.
[Chart: write throughput (req/s, 0-3500) vs. number of clients (0-8) for PASIS and BFT with b = t = 1; the legend lists (m, N) configurations of (2, 5), (3, 6), and (1, 4).]
PASIS has 60% higher write throughput than BFT.
26
PASIS vs. BFT: Read throughput
[Chart: read throughput (req/s, 0-3500) vs. number of clients (0-8) for PASIS and BFT with b = t = 1; the legend lists (m, N) configurations of (2, 5), (3, 6), and (1, 4).]
27
Continuing work
New testbed: 70 servers connected by switched Gbit/sec networking
- Experiments can then explore higher scalability points
- Both the baseline and our results will come from this testbed
Protocol for arbitrary deterministic functions on objects
- Built from the same basic primitives
Protocol for objects with nested objects
- Adds a requirement of replicated invocations
28
Summary
Goal: to design, implement, and evaluate new protocols for implementing intrusion-tolerant services that scale better ("scale" refers to efficiency as the number of servers and number of failures tolerated grow).
Started with a protocol for read-write storage based on versioning and quorums:
- Scales efficiently (and much better than BFT)
- Also flexible (can add assumptions to reduce costs)
Going forward (in progress):
- Generalize the types of objects and operations that can be supported
29
Questions?
30
Garbage collection
Pruning old versions is necessary to reclaim space:
- Versions prior to the latest complete write can be pruned
Storage-nodes need to know the latest complete write:
- In isolation they do not have this information
- Perform a read operation to classify the latest complete write
Many possible policies exist for when to clean what:
- Best to clean during idle time (if possible)
- Rank blocks in order of greatest potential gains
Work remains in this area.
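A minimal pruning sketch for one block, assuming a read operation has already classified the latest complete write; the history layout matches the storage-node sketches above:

    def garbage_collect(history: dict, latest_complete_ts) -> dict:
        """Keep the latest complete write and anything newer; discard every
        version strictly older than it."""
        return {ts: frag for ts, frag in history.items() if ts >= latest_complete_ts}

    if __name__ == "__main__":
        history = {1: b"v1", 2: b"v2", 3: b"v3-incomplete"}
        # Suppose a read classified timestamp 2 as the latest complete write:
        print(garbage_collect(history, latest_complete_ts=2))   # keeps 2 and 3, prunes 1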