
1 Carnegie Mellon Increasing Intrusion Tolerance Via Scalable Redundancy Greg Ganger greg.ganger@cmu.edu Natassa Ailamaki Mike Reiter Priya Narasimhan Chuck Cranor

2 Carnegie Mellon Technical Objective To design, implement and evaluate new protocols for implementing intrusion-tolerant services that scale better  Here, “scale” refers to efficiency as number of servers and number of failures tolerated grows Targeting three types of services  Read-write data objects  Custom “flat” object types for particular applications, notably directories for implementing an intrusion-tolerant file system  Arbitrary objects that support object nesting

3 Carnegie Mellon Expected Impact Significant efficiency and scalability benefits over today’s protocols for intrusion tolerance For example, for data services, we anticipate  At least a twofold latency improvement even at small configurations (e.g., tolerating 3-5 Byzantine server failures) over the current best  And improvements will grow as the system scales up  A twofold improvement in throughput, again growing with system size Without such improvements, intrusion tolerance will remain relegated to small deployments in narrow application areas

4 Carnegie Mellon The Problem Space Distributed services manage redundant state across servers to tolerate faults  We consider tolerance to Byzantine faults, as might result from an intrusion into a server or client  A faulty server or client may behave arbitrarily  We also make no timing assumptions in this work  An “asynchronous” system Primary existing practice: replicated state machines  Offers no load dispersion, requires data replication, and degrades as system scales with O(N²) messages

5 Carnegie Mellon Our approach Combine techniques to eliminate work in common cases  Server-side versioning  allows optimism with read-time repair, if necessary  allows work to be off-loaded to clients in lieu of server agreement  Quorum systems (and erasure coding)  allows load dispersion (and more efficient redundancy for bulk data)  Several others applied to defend against Byzantine actions Major risk?  could be complex for arbitrary objects

6 Carnegie Mellon Evaluation We are in Scenario I: “centralized server setting” Baseline: the BFT library  Popular, publicly available implementation of Byzantine fault-tolerant state machine replication (by Castro & Liskov)  Reported to be an efficient implementation of that approach Two measures  Average latency of operations, from the client’s perspective  Peak sustainable throughput of operations Our consistency definition: linearizability of invocations

7 Carnegie Mellon Outline Overview Read-write storage protocol Some results Continuing work

8 Carnegie Mellon Read-write block storage Clients erasure-code/replicate blocks into fragments Storage-nodes version fragments on every write [Figure: the client erasure-codes a data block into fragments F1–F5 and sends one fragment to each storage-node]
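As a rough illustration of the client-side step above, here is a minimal sketch in which plain striping stands in for a real m-of-N erasure code (a deployment would use something like Reed-Solomon or replication); the function names are illustrative and the 16 KB block size echoes the evaluation, but the code itself is not PASIS.

    # Illustrative sketch only: cut a block into N fragments with no redundancy.
    # A real erasure code would let any m of the N fragments reconstruct the block.
    def split_into_fragments(block: bytes, n: int) -> list[bytes]:
        size = (len(block) + n - 1) // n
        return [block[i * size:(i + 1) * size] for i in range(n)]

    def reassemble(fragments: list[bytes]) -> bytes:
        # Inverse of split_into_fragments when all fragments are available.
        return b"".join(fragments)

    block = b"x" * 16 * 1024              # 16 KB data block, as in the evaluation
    fragments = split_into_fragments(block, n=5)
    assert reassemble(fragments) == block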

9 Carnegie Mellon Challenges: Concurrency Concurrent updates can violate linearizability [Figure: two clients' updates interleaving across the servers]

10 Carnegie Mellon Challenges: Server Failures Can attempt to mislead clients  Typically addressed by “voting” [Figure: a faulty server returns a corrupted value (4′) alongside the correct responses]

11 Carnegie Mellon Challenges: Client Failures Byzantine client failures can also mislead clients  Typically addressed by submitting a request via an agreement protocol [Figure: a Byzantine client writes inconsistent values to different servers]

12 Carnegie Mellon Consistency via versioning Leverage versioning storage nodes for consistency Allow writes to proceed with versioning  All writes create new data versions  Partial writes and concurrency won’t destroy data Reader detects and resolves update conflicts  Concurrency rare in FS workloads (typically < 1%)  Offloads work to client resulting in greater scalability Only perform extra work when needed  Optimistically assume fault-free, concurrency-free operation  Single round-trip for reads and writes in common case

13 Carnegie Mellon Our system model Crash-recovery storage-node fault model  Up to t total bad storage-nodes (crashed/Byzantine)  Up to b ≤ t Byzantine (arbitrary faults)  So, t - b faults are crash-recovery faults Client fault model  Any number of crash or Byzantine clients Asynchronous timing model Point-to-point authenticated channels
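A small arithmetic sketch of how the fault parameters drive the configuration, using the N = 2t + 2b + 1 relation quoted on the response-time slide; the write threshold W shown here is only an illustrative assumption, not the exact PASIS constraint.

    # Hedged sketch: derive a configuration from the fault parameters t and b.
    def config(t: int, b: int) -> dict:
        assert 0 <= b <= t, "at most b of the t faulty storage-nodes are Byzantine"
        n = 2 * t + 2 * b + 1     # number of storage-nodes (from the slides)
        w = n - t                 # example write-completion threshold (assumption)
        return {"N": n, "W": w, "t": t, "b": b}

    print(config(t=1, b=1))       # {'N': 5, 'W': 4, 't': 1, 'b': 1}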

14 Carnegie Mellon Read/write protocol Unit of update: a block  Complete blocks are read and written  Erasure-coding may be used for space-efficiency Update semantics: Read–write  No guarantee of contents between read & write  Sufficient for block-based storage Consistency: Linearizability Liveness: wait-freedom

15 Carnegie Mellon R/W protocol: Write 1. Client erasure-codes data-item into N data-fragments 2. Client tags write requests with logical timestamp  Round-trip required to read logical time 3. Client issues requests to at least W storage-nodes 4. Storage-nodes validate integrity of request 5. Storage-nodes insert request into version history 6. Write completes after W requests have completed
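The step ordering above can be sketched roughly as follows; the node objects and their get_logical_time()/write() methods are stand-ins for the real request messages, and the erasure-coding step (1) is assumed to have been done by the caller.

    # Minimal, synchronous sketch of the write path (illustrative only).
    def write_block(fragments, nodes, w):
        # Step 2: round-trip to read the current logical time, then advance it.
        timestamp = max(node.get_logical_time() for node in nodes) + 1
        acks = 0
        for node, frag in zip(nodes, fragments):   # Step 3: issue write requests
            # Steps 4-5 happen server-side: validate the request, version the fragment.
            if node.write(timestamp, frag):
                acks += 1
        return acks >= w                           # Step 6: complete after W acks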

16 Carnegie Mellon R/W protocol: Read 1. Client reads latest version from storage-node subset  Read set guaranteed to intersect with latest complete write 2. Client determines latest candidate write (candidate)  Set of responses containing the latest timestamp 3. Client classifies the candidate as one of:  Complete  Incomplete  Repairable For consistency: only complete writes can be returned

17 Carnegie Mellon R/W protocol: Read classification Based on client’s (limited) system knowledge  Failures and asynchrony lead to imperfect information Candidate classification rules: Complete: candidate exists on ≥ W nodes  candidate is decoded and returned Incomplete: candidate exists on < W nodes  Read previous version to determine new candidate  Iterate…perform classification on new candidate Repairable: candidate may exist on ≥ W nodes  Repair and return data-item
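A hedged sketch of the classification step: count how many responses carry the candidate timestamp and compare against the thresholds. The exact PASIS rules also account for unresponsive and Byzantine nodes; the incomplete_threshold parameter here is purely illustrative.

    # responses: list of (timestamp, fragment) pairs returned by storage-nodes.
    def classify(responses, candidate_ts, w, incomplete_threshold):
        votes = sum(1 for ts, _ in responses if ts == candidate_ts)
        if votes >= w:
            return "complete"      # decode and return the candidate
        if votes < incomplete_threshold:
            return "incomplete"    # fall back to the previous version and re-classify
        return "repairable"        # re-write the candidate to enough nodes, then return it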

18 Carnegie Mellon Example: Successful read (N=5, W=3, t=1, b=0) [Figure: timeline across 5 storage-nodes in which D0 is written at T0 and D1 partially written at T1; a client read after T1 finds D1 the latest candidate, classifies it incomplete, falls back to D0 as the candidate, determines D0 complete, and returns D0]

19 Carnegie Mellon Example: Repairable read (N=5, W=3, t=1, b=0) [Figure: a client read after T2 finds D2 the latest candidate but present on too few storage-nodes; D2 is classified repairable, the client repairs it by writing D2 to additional nodes, and then returns D2]

20 Carnegie Mellon Protecting against Byzantine storage-nodes Must defend against servers that modify data in their possession Solution: Cross checksums [Gong 89]  Hash each data-fragment  Concatenate all N hashes  Append cross checksum to each fragment  Clients verify hashes against fragments and use cross checksums as “votes” [Figure: a data-item split into data-fragments, with the per-fragment hashes concatenated into the cross checksum]
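A sketch of the cross checksum construction: hash every fragment, concatenate the N hashes, and let readers check each received fragment against its slot. SHA-256 is our choice for illustration; the slides do not name a hash function.

    import hashlib

    def cross_checksum(fragments: list[bytes]) -> bytes:
        # Concatenation of the per-fragment hashes, appended to every fragment.
        return b"".join(hashlib.sha256(f).digest() for f in fragments)

    def verify_fragment(index: int, fragment: bytes, ccsum: bytes) -> bool:
        # A reader checks the fragment it received against its slot in the checksum.
        dlen = hashlib.sha256().digest_size
        expected = ccsum[index * dlen:(index + 1) * dlen]
        return hashlib.sha256(fragment).digest() == expected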

21 Carnegie Mellon Protecting against Byzantine clients Must ensure all fragment sets decode to same value Solution: Validating timestamps  Write: place hash of cross checksum in timestamp  also prevents multiple values being written at same timestamp  Storage-nodes validate their fragment against corresponding hash  Read: regenerate fragments and cross checksum [Figure: a Byzantine encoding with a “poisonous” fragment, where different fragment subsets decode to different (≠) data-items]
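The validating-timestamp idea can be sketched as below: the writer binds a hash of the cross checksum into the timestamp, so a Byzantine client cannot get fragment sets that decode to different values accepted under one timestamp. The timestamp layout here is an assumption for illustration.

    import hashlib

    def make_timestamp(logical_time: int, ccsum: bytes) -> tuple:
        # Bind the cross checksum into the timestamp carried by every write request.
        return (logical_time, hashlib.sha256(ccsum).digest())

    def node_validates(timestamp: tuple, index: int, fragment: bytes, ccsum: bytes) -> bool:
        # Storage-node check: the presented checksum matches the timestamp, and the
        # fragment hashes to its own slot within that checksum.
        _, ccsum_hash = timestamp
        dlen = hashlib.sha256().digest_size
        slot = ccsum[index * dlen:(index + 1) * dlen]
        return (hashlib.sha256(ccsum).digest() == ccsum_hash
                and hashlib.sha256(fragment).digest() == slot)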

22 Carnegie Mellon Experimental setup Prototype system: PASIS 20 node cluster  Dual 1 GHz Pentium III storage-nodes  Single 2 GHz Pentium IV clients 100 Mb switched Ethernet 16 KB data-item size (before encoding)  Blowup of N/m over the data-item size  Each fragment is 1/m of the data-item size
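A worked example of the blowup figures above, assuming an m-of-N erasure code (each fragment is 1/m of the block, so N fragments cost N/m times the original); the particular m and N are illustrative, not necessarily the exact configurations measured.

    block_kb = 16
    m, n = 2, 5                   # illustrative erasure-code parameters
    fragment_kb = block_kb / m    # 8 KB per fragment
    blowup = n / m                # 2.5x total storage vs. the 16 KB data-item
    print(fragment_kb, blowup)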

23 Carnegie Mellon PASIS response time [Figure: mean response time (ms) vs. total failures tolerated (t) for reads and writes under two fault models, b = t and b = 1, with N = 2t + 2b + 1 (e.g., N = 17 vs. N = 11 at t = 4); a 1-way 16 KB ping is included for reference; annotations note decode computation and NW delay for redundant fragments]

24 Carnegie Mellon Throughput experiment Same system set-up as the response time experiment Clients issue read or write requests  Increase number of clients to increase load Demonstrate value of erasure-codes  Increase m to reduce per storage-node load Compare with Byzantine atomic broadcast  BFT library [Castro & Liskov 99]  Supports arbitrary operations  Replication (with multicast): limits write throughput  O(N²) messages: limits performance scalability

25 Carnegie Mellon PASIS vs. BFT: Write throughput Reduce per storage-node load with erasure-codes BFT uses replication, which increases per storage-node load [Figure: write throughput (req/s) vs. number of clients for PASIS (b = t = 1; m = 2, N = 5 and m = 3, N = 6) and BFT (N = 4); PASIS has roughly 60% higher write throughput than BFT]

26 Carnegie Mellon PASIS vs. BFT: Read throughput [Figure: read throughput (req/s) vs. number of clients for PASIS (b = t = 1; m = 2, N = 5 and m = 3, N = 6) and BFT (N = 4)]

27 Carnegie Mellon Continuing work New testbed: 70 servers connected by a switched Gbit/sec network  experiments can then explore higher scalability points  baseline and our results will come from this testbed Protocol for arbitrary deterministic functions on objects  built from same basic primitives Protocol for objects with nested objects  adds requirement of replicated invocations

28 Carnegie Mellon Summary Goal: To design, implement and evaluate new protocols for implementing intrusion-tolerant services that scale better  Here, “scale” refers to efficiency as number of servers and number of failures tolerated grows Started with a protocol for read-write storage  based on versioning and quorums  scales efficiently (and much better than BFT)  also flexible (can add assumptions to reduce costs) Going forward (in progress)  generalize types of objects and operations that can be supported

29 Carnegie Mellon Questions?

30 Carnegie Mellon Garbage collection Pruning old versions is necessary to reclaim space  Versions prior to latest complete write can be pruned Storage-nodes need to know latest complete write  In isolation they do not have this information  Perform read operation to classify latest complete write Many possible policies exist for when to clean what Best to clean during idle time (if possible)  Rank blocks in order of greatest potential gains  Work remains in this area
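A minimal sketch of the pruning rule: once a read-style classification has established the latest complete write for a block, every strictly older version can be dropped. The version_history mapping and names are illustrative.

    # version_history maps logical timestamp -> fragment for one block.
    def prune(version_history: dict, latest_complete_ts) -> dict:
        # Keep the latest complete write and anything newer; drop older versions.
        return {ts: frag for ts, frag in version_history.items()
                if ts >= latest_complete_ts}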

