Presentation is loading. Please wait.

Presentation is loading. Please wait.

SOSP 2007 © 2007 Andreas Haeberlen, MPI-SWS 1 Practical accountability for distributed systems Andreas Haeberlen MPI-SWS / Rice University Petr Kuznetsov.

Similar presentations


Presentation on theme: "SOSP 2007 © 2007 Andreas Haeberlen, MPI-SWS 1 Practical accountability for distributed systems Andreas Haeberlen MPI-SWS / Rice University Petr Kuznetsov."— Presentation transcript:

1 SOSP 2007 © 2007 Andreas Haeberlen, MPI-SWS 1 Practical accountability for distributed systems Andreas Haeberlen MPI-SWS / Rice University Petr Kuznetsov MPI-SWS Peter Druschel MPI-SWS

2 SOSP 2007 2 © 2007 Andreas Haeberlen, MPI-SWS Motivation Distributed state, incomplete information General case: Multiple admins with different interests Admin

3 SOSP 2007 3 © 2007 Andreas Haeberlen, MPI-SWS General faults occur in practice Many faults are not 'fail-stop' Node is still running, but its behavior changes Examples: Hardware malfunctions Misconfigurations Software modifications by users Hacker attacks...

4 SOSP 2007 4 © 2007 Andreas Haeberlen, MPI-SWS Dealing with general faults is difficult How to detect faults? How to identify the faulty nodes? How to convince others that a node is (not) faulty? Incorrect message Responsible admin

5 SOSP 2007 5 © 2007 Andreas Haeberlen, MPI-SWS Learning from the 'offline' world Relies on accountability Example: Banks Can be used to detect, identify and convince But: Existing fault-tolerance work mostly focused on prevention Goal: A general+practical system for accountability RequirementSolution CommitmentSigned receipts Tamper-evident recordDouble-entry bookkeeping InspectionsAudits

6 SOSP 2007 6 © 2007 Andreas Haeberlen, MPI-SWS Outline Introduction What is accountability? How can we implement it? How well does it work?

7 SOSP 2007 7 © 2007 Andreas Haeberlen, MPI-SWS Ideal accountability Whenever a node is faulty in any way, the system generates a proof of misbehavior against that node Fault := Node deviates from expected behavior Recall that our goal is to detect faults identify the faulty nodes convince others that a node is (or is not) faulty Can we build a system that provides the following guarantee?

8 SOSP 2007 8 © 2007 Andreas Haeberlen, MPI-SWS Can we detect all faults? Problem: Faults that affect only a node's internal state Requires online trusted probes at each node Focus on observable faults: Faults that causally affect a correct node This allows us to detect faults without introducing any trusted components A A X C C 1001010110 0010110101 1100100100 0

9 SOSP 2007 9 © 2007 Andreas Haeberlen, MPI-SWS Can we always get a proof? Problem: He-said-she-said situation Three possible causes: A never sent X B refuses to accept X X was lost by the network Cannot get proof of misbehavior! Generalize to verifiable evidence: a proof of misbehavior, or a challenge that the node cannot answer What if, after a long time, no response has arrived? Does not prove the fault, but we can suspect the node A A X B B C C ? I sent X! I never received X! ?!

10 SOSP 2007 10 © 2007 Andreas Haeberlen, MPI-SWS Practical accountability We propose the following definition of a distributed system with accountability: This is useful Any (!) fault that affects a correct node is eventually detected and linked to a faulty node It can be implemented in practice Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node

11 SOSP 2007 11 © 2007 Andreas Haeberlen, MPI-SWS Outline Introduction What is accountability? How can we implement it? How well does it work?

12 SOSP 2007 12 © 2007 Andreas Haeberlen, MPI-SWS Adds accountability to a given system Implemented as a library Provides secure record, commitment, auditing, etc. Assumptions: Implementation: PeerReview 1. System can be modeled as collection of deterministic state machines 2. Nodes have reference implementations of the state machines 3. Correct nodes can eventually communicate 4. Nodes can sign messages

13 SOSP 2007 13 © 2007 Andreas Haeberlen, MPI-SWS M PeerReview from 10,000 feet All nodes keep a log of their inputs & outputs Including all messages Each node has a set of witnesses, who audit its log periodically If the witnesses detect misbehavior, they generate evidence make the evidence avai- lable to other nodes Other nodes check evi- dence, report fault A's log B's log A A B B M C C D D E E A's witnesses M

14 SOSP 2007 14 © 2007 Andreas Haeberlen, MPI-SWS PeerReview detects tampering A B Message Hash chain Send(X) Recv(Y) Send(Z) Recv(M) H0H0 H1H1 H2H2 H3H3 H4H4 B's log ACK What if a node modifies its log entries? Log entries form a hash chain Inspired by secure histories [Maniatis02] Signed hash is included with every message  Node commits to its current state  Changes are evident Hash(log)

15 SOSP 2007 15 © 2007 Andreas Haeberlen, MPI-SWS PeerReview detects inconsistencies What if a node keeps multiple logs? forks its log? Check whether the signed hashes form a single hash chain H3'H3' Read X H4'H4' Not found Read Z OK Create X H0H0 H1H1 H2H2 H3H3 H4H4 OK "View #1" "View #2"

16 SOSP 2007 16 © 2007 Andreas Haeberlen, MPI-SWS Module B PeerReview detects faults How to recognize faults in a log? Assumption: Node can be modeled as a deterministic state machine To audit a node: Replay inputs to a trusted copy of the state machine Check outputs against the log Module A Module B =? Log Network Input Output State machine if ≠ Module A

17 SOSP 2007 17 © 2007 Andreas Haeberlen, MPI-SWS PeerReview offers provable guarantees PeerReview guarantees that: 1) Faults will be detected 2) Good nodes cannot be accused Formal definitions and proof in a TR If node commits a fault + has a correct witness, then witness obtains a proof of misbehavior (PoM), or a challenge that the faulty node cannot answer If node is correct there can never be a PoM, and it can answer any challenge

18 SOSP 2007 18 © 2007 Andreas Haeberlen, MPI-SWS Outline Introduction What is accountability? How can we implement it? How well does it work? Is it widely applicable? How much does it cost? Does it scale?

19 SOSP 2007 19 © 2007 Andreas Haeberlen, MPI-SWS PeerReview is widely applicable App #1: NFS server in the Linux kernel Many small, latency-sensitive requests Tampering with files Lost updates App #2: Overlay multicast Transfers large volume of data Freeloading Tampering with content App #3: P2P email Complex, large, decentralized Denial of service Attacks on DHT routing More information in the paper Metadata corruption Incorrect access control Censorship

20 SOSP 2007 20 © 2007 Andreas Haeberlen, MPI-SWS How much does PeerReview cost? Dominant cost depends on number of witnesses W O(W 2 ) component Baseline 1 2 3 4 5 100 80 60 40 20 0 Avg traffic (Kbps/node) Number of witnesses Baseline traffic Signatures and ACKs Checking logs W dedicated witnesses

21 SOSP 2007 21 © 2007 Andreas Haeberlen, MPI-SWS Mutual auditing Small probability of error is inevitable Example: Replication Can use this to optimize PeerReview Accept that an instance of a fault is found only with high probability Asymptotic complexity: O(N 2 )  O(log N) Small random sample of peers chosen as witnesses Node

22 SOSP 2007 22 © 2007 Andreas Haeberlen, MPI-SWS PeerReview is scalable Assumption: Up to 10% of nodes can be faulty Probabilistic guarantees enable scalability Example: Email system scales to over 10,000 nodes with P=0.999999 DSL/cable upstream Email system w/o accountability O((log N) 2 ) O(log N) Email system + PeerReview (P=0.999999) Email system + PeerReview (P=1.0) System size (nodes) Avg traffic (Kbps/node)

23 SOSP 2007 23 © 2007 Andreas Haeberlen, MPI-SWS Summary Accountability is a new approach to handling faults in distributed systems detects faults identifies the faulty nodes produces evidence Our practical definition of accountability: Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node PeerReview: A system that enforces accountability Offers provable guarantees and is widely applicable Thank you! Andreas Haeberlen received a SOSP travel scholarship, which was supported by Infosys


Download ppt "SOSP 2007 © 2007 Andreas Haeberlen, MPI-SWS 1 Practical accountability for distributed systems Andreas Haeberlen MPI-SWS / Rice University Petr Kuznetsov."

Similar presentations


Ads by Google