SRG PeerReview: Practical Accountability for Distributed Systems Andreas Haeberlen, Petr Kouznetsov, and Peter Druschel SOSP’07.


Problems How to: Detect Byzantine faults whose effects are observed by a correct node. Link faults to faulty nodes. Defend correct nodes against false accusations.

Accountability Use accountability to detect and expose node faults. Maintain a tamper-evident record that captures all actions of each node. Detect a faulty node when its behavior deviates from that of a correct node.

Limitations of current systems Designed for a specific type of fault or for a specific application. Based on many strong assumptions. Do not provide verifiable evidence of misbehavior. Use a formal specification of the system to check for misbehavior. Can only detect faulty nodes that misbehave repeatedly.

Overview Model a node as a deterministic state machine. Each node keeps a secure log that records all sent and received messages, all inputs and outputs. To check a node j, node i will: Get j’s log. Replay j’s log using a reference implementation. Compare the results.

The problem of detection Ideal completeness: a faulty node should be exposed by all correct nodes. Ideal accuracy: no correct node is ever exposed by a correct node (no false positives).

Types of faults that can be detected Available data: messages sent and received among nodes. Can only detect faults that manifest themselves through messages. Can only detect faults that are observed by a correct node. Need to consider: Verifiability of outputs. Missing and long-delayed messages.

Problem statement Terms: Detectably faulty, detectably ignorant. Accomplices (of i): nodes that send messages caused by an incorrect message sent by i. Completeness: Eventually, every detectably ignorant node is suspected forever by every correct node. If node i is detectably faulty, then eventually, some faulty accomplice is exposed or suspected forever by every correct node.

Problem statement (cont) Accuracy: No correct node is forever suspected by a correct node. No correct node is ever exposed by a correct node.

System model Failure indications: exposed(j), suspected(j), trusted(j).

Assumptions The state machines S_i are deterministic. A message sent from one correct node to another is eventually received. Use a hash function H() that is pre-image resistant, second pre-image resistant, and collision resistant. Each node has a unique identifier. Nodes can sign messages, and faulty nodes cannot forge the signature of a correct node.

Assumptions (cont) Each node has access to a reference implementation of all S_i. The implementation can take a snapshot and can be initialized from a snapshot. A function ω maps each node to a set of witnesses. The set {i} ∪ ω(i) contains at least one correct node.

Tamper-evident logs Log entry. Hash value. Authenticator. If a prefix of a node’s log does not match the hash value, then that node is faulty.

Tamper-evident logs The authenticator α_k^j can be used to check whether j’s log contains e_k. To inspect x entries of j: i challenges j to return e_{k-(x-1)}, …, e_k and h_{k-x}. i recalculates h_k and compares it with the value in the authenticator.
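
The inspection step above can be sketched as a hash-chain check. This is a minimal illustration, not the authors' code: SHA-256, the length-prefixed framing, the all-zero base hash, and the names `H`, `entry_hash` and `verify_segment` are all assumptions made for the example. Each entry is taken to be a triple (s_k, t_k, c_k) of sequence number, type and content.

```python
import hashlib

def H(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (hash choice and framing are assumptions)."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()

def entry_hash(prev_hash: bytes, seq: int, etype: str, content: bytes) -> bytes:
    """Compute h_k from h_{k-1} and the entry (s_k, t_k, c_k)."""
    return H(prev_hash, seq.to_bytes(8, "big"), etype.encode(), H(content))

def verify_segment(h_before: bytes, entries, authenticator_hash: bytes) -> bool:
    """Recompute the chain over the returned entries and compare the final
    hash with the hash value carried in the authenticator."""
    h = h_before
    for seq, etype, content in entries:
        h = entry_hash(h, seq, etype, content)
    return h == authenticator_hash

# Usage: build j's log, then let i inspect the last two entries.
log = [(1, "RECV", b"hello"), (2, "SEND", b"world"), (3, "RECV", b"again")]
hashes = [bytes(32)]  # h_0: all-zero base hash (assumption)
for seq, etype, content in log:
    hashes.append(entry_hash(hashes[-1], seq, etype, content))

ok = verify_segment(hashes[1], log[1:], hashes[3])
tampered = [(2, "SEND", b"w0rld"), log[2]]
bad = verify_segment(hashes[1], tampered, hashes[3])
```

Because every hash covers its predecessor, changing any inspected entry changes the recomputed h_k, so the comparison with the authenticator fails.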

Commitment protocol Ensures that a node cannot add an entry for a message it has never received and that a node’s log is complete. When i sends a message m to j: i creates the entry (s_k, SEND, {j, m}), attaches h_{k-1}, s_k and σ_i(s_k || h_k) to m, and sends m. j recomputes h_k and verifies the signature; if it is valid, j creates the entry (s_l, RECV, {i, m}) and returns an ACK to i with h_{l-1}, s_l and σ_j(s_l || h_l). i verifies the signature in the ACK and sends a challenge to j’s witnesses if it is not valid.
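
The sender and receiver halves of this exchange might look roughly like the sketch below. It is an illustration under loud assumptions: HMAC with made-up shared keys stands in for the public-key signature σ (a real deployment would use asymmetric signatures so that third parties can verify), and `make_send`, `accept`, the wire-message layout and the hash framing are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-node keys. HMAC is only a stand-in for a real signature scheme.
KEYS = {"i": b"secret-of-i", "j": b"secret-of-j"}

def H(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (framing is an assumption)."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()

def entry_hash(prev_hash: bytes, seq: int, etype: str, content: bytes) -> bytes:
    """Compute h_k from h_{k-1} and the entry (s_k, t_k, c_k)."""
    return H(prev_hash, seq.to_bytes(8, "big"), etype.encode(), H(content))

def sign(node: str, seq: int, h: bytes) -> bytes:
    """Stand-in for sigma_node(s_k || h_k)."""
    return hmac.new(KEYS[node], seq.to_bytes(8, "big") + h, hashlib.sha256).digest()

def make_send(sender: str, dest: str, msg: bytes, h_prev: bytes, seq: int):
    """Sender side: log (s_k, SEND, {dest, msg}) and attach h_{k-1}, s_k, signature."""
    h_k = entry_hash(h_prev, seq, "SEND", H(dest.encode(), msg))
    wire = {"msg": msg, "h_prev": h_prev, "seq": seq, "sig": sign(sender, seq, h_k)}
    return h_k, wire

def accept(sender: str, me: str, wire: dict) -> bool:
    """Receiver side: recompute h_k from the attachments and check the signature."""
    h_k = entry_hash(wire["h_prev"], wire["seq"], "SEND", H(me.encode(), wire["msg"]))
    return hmac.compare_digest(wire["sig"], sign(sender, wire["seq"], h_k))

h_k, wire = make_send("i", "j", b"block 17", bytes(32), 1)
valid = accept("i", "j", wire)
forged = accept("i", "j", dict(wire, msg=b"block 18"))
```

The receiver never sees the sender's full log; the attached h_{k-1} and s_k are enough to recompute h_k and confirm that the sender has committed to a SEND entry for exactly this message.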

Consistency protocol A faulty node can try to hide its misbehavior by keeping more than one log, or a log with multiple branches.

Consistency protocol If i receives authenticators from j, it must eventually forward those authenticators to j’s witnesses. Periodically, each witness ω of j challenges j to return a list of entries (from k to l); ω then checks them for consistency. Finally, ω extracts all authenticators that j received from other nodes and sends them to the corresponding witness sets.
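
One way a witness could run the consistency check is to recompute the hash chain over the returned entries and compare it against every authenticator that other nodes have forwarded. The sketch below assumes authenticators reduce to (sequence number, hash) pairs with signatures already verified; the function names, hash framing and base hash are illustrative assumptions.

```python
import hashlib

def H(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (framing is an assumption)."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()

def entry_hash(prev_hash: bytes, seq: int, etype: str, content: bytes) -> bytes:
    """Compute h_k from h_{k-1} and the entry (s_k, t_k, c_k)."""
    return H(prev_hash, seq.to_bytes(8, "big"), etype.encode(), H(content))

def find_mismatches(h_before: bytes, entries, authenticators):
    """Return the forwarded authenticators (seq, hash) that contradict the log
    segment j returned. A non-empty result means j issued an authenticator
    for a log branch other than the one it now presents."""
    chain = {}
    h = h_before
    for seq, etype, content in entries:
        h = entry_hash(h, seq, etype, content)
        chain[seq] = h
    return [(s, a) for s, a in authenticators if s in chain and chain[s] != a]

# Usage: j shows one branch to its witness but signed another towards a peer.
base = bytes(32)  # all-zero base hash (assumption)
shown = [(1, "RECV", b"req"), (2, "SEND", b"reply-A")]
h1 = entry_hash(base, 1, "RECV", b"req")
honest_h2 = entry_hash(h1, 2, "SEND", b"reply-A")
forked_h2 = entry_hash(h1, 2, "SEND", b"reply-B")

clean = find_mismatches(base, shown, [(2, honest_h2)])
conflict = find_mismatches(base, shown, [(2, honest_h2), (2, forked_h2)])
```

In this sketch the mismatching authenticator together with the returned segment is the evidence: both are attributable to j, and they cannot belong to the same linear log.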

Audit protocol Checks whether a node’s behavior is consistent with its reference implementation. Each witness ω of i will: Look up the most recent authenticator of i. Challenge i to return all log entries since the last audit and add them to λ_{ωi}. Create an instance of i’s reference implementation and initialize it from the most recent snapshot. Replay all the inputs and compare the outputs. Expose i if the outputs are not equal.
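
The replay step can be illustrated with a toy deterministic state machine. Everything here is hypothetical: `EchoCounter` is an invented reference implementation, the log is simplified to (type, payload) pairs, and the snapshot is just an integer. The point is only to show inputs being replayed and logged outputs being compared.

```python
class EchoCounter:
    """Toy deterministic state machine (hypothetical reference implementation):
    it answers every input with a running count followed by the input."""

    def __init__(self, snapshot: int = 0):
        self.count = snapshot  # the snapshot is simply the counter value

    def handle(self, msg: str) -> list:
        self.count += 1
        return [f"{self.count}:{msg}"]

def audit(log, snapshot: int = 0) -> bool:
    """Replay logged inputs through the reference implementation and check
    that each logged output matches what the implementation produced.
    Returns False as soon as the log deviates (the node would be exposed)."""
    machine = EchoCounter(snapshot)
    expected = []  # outputs produced by replay but not yet seen in the log
    for etype, payload in log:
        if etype == "RECV":
            expected.extend(machine.handle(payload))
        elif etype == "SEND":
            if not expected or expected.pop(0) != payload:
                return False
    # Outputs still pending at the end of the segment are not treated as faults here.
    return True

honest = [("RECV", "a"), ("SEND", "1:a"), ("RECV", "b"), ("SEND", "2:b")]
faulty = [("RECV", "a"), ("SEND", "1:a"), ("RECV", "b"), ("SEND", "9:b")]
resumed = [("RECV", "c"), ("SEND", "3:c")]

honest_ok = audit(honest)
faulty_ok = audit(faulty)
resumed_ok = audit(resumed, snapshot=2)
```

Determinism is what makes this work: given the same snapshot and the same inputs, a correct node has only one possible sequence of outputs, so any difference is attributable to the audited node.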

Challenge/response protocol Audit challenge: Consists of two authenticators α_k^i and α_l^i (k < l). i’s log must contain the entries e_k through e_l; otherwise i is faulty. If i is correct, it returns the corresponding log segment.

Challenge/response protocol Send challenge: Consists of the message m with all needed information attached. i must acknowledge m; otherwise i is faulty. If i has not yet received m, it accepts m and returns an ACK. If i has already received m, it just resends the ACK.

Evidence transfer protocol Ensures that all correct nodes eventually collect the same evidence against faulty nodes. Every node i periodically fetches the challenges collected by the witnesses of every other node j. If a correct node i obtains a challenge for j, i indicates suspected(j). When i receives a message from j, i challenges j. If i receives valid answers to all pending challenges for j, i indicates trusted(j). If i obtains proof of misbehavior by j, i indicates exposed(j).
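
The three failure indications can be read as a small per-peer state machine. The sketch below is one possible bookkeeping structure, assuming challenges are identified by opaque ids and that validity checks happen elsewhere; the class and method names are invented for illustration.

```python
class PeerStatus:
    """Tracks node i's indication for one peer j (hypothetical helper)."""

    def __init__(self):
        self.pending = set()   # ids of challenges j has not answered yet
        self.has_proof = False

    def on_challenge(self, challenge_id: str) -> None:
        """i obtained a challenge for j, e.g. from j's witnesses."""
        self.pending.add(challenge_id)

    def on_valid_response(self, challenge_id: str) -> None:
        """j answered a pending challenge correctly."""
        self.pending.discard(challenge_id)

    def on_proof_of_misbehavior(self) -> None:
        """i obtained verifiable evidence against j; this is never revoked."""
        self.has_proof = True

    @property
    def indication(self) -> str:
        if self.has_proof:
            return "exposed"
        if self.pending:
            return "suspected"
        return "trusted"

j = PeerStatus()
history = [j.indication]
j.on_challenge("audit-7")
history.append(j.indication)
j.on_valid_response("audit-7")
history.append(j.indication)
j.on_proof_of_misbehavior()
history.append(j.indication)
```

Suspicion is reversible because an unanswered challenge may only mean that j is slow or unreachable, whereas exposure rests on evidence that any correct node can check for itself.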

Overhead Signing messages. Extra messages to implement the protocols. Taking snapshots of nodes. Replaying nodes’ execution.

Extension P_f: probability that an all-faulty witness set exists. P_m: probability that a given instance of misbehavior remains undetected. The message complexity grows as O(log N).
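
A rough way to see where a logarithmic term can come from is a union-bound calculation. This is not the derivation from the talk; it assumes each witness is faulty independently with probability φ and that there is one witness set of size w per node. Under those assumptions a given set is all-faulty with probability φ^w, so P_f ≤ N·φ^w, and the smallest w meeting a target P_f grows like log N. The function name and parameters are illustrative.

```python
def min_witnesses(n_nodes: int, faulty_fraction: float, p_f: float) -> int:
    """Smallest witness-set size w with n_nodes * faulty_fraction**w <= p_f.

    Assumes independent faults and one witness set per node (union bound);
    this is an illustrative model, not a formula taken from the slides.
    """
    if not 0.0 < faulty_fraction < 1.0:
        raise ValueError("faulty_fraction must be strictly between 0 and 1")
    if p_f <= 0.0:
        raise ValueError("p_f must be positive")
    w = 1
    while n_nodes * faulty_fraction ** w > p_f:
        w += 1
    return w

small = min_witnesses(1_000, 0.25, 0.01)
large = min_witnesses(1_000_000, 0.25, 0.01)
```

With a quarter of the nodes faulty and a target of 0.01, growing the system from a thousand to a million nodes raises the required witness-set size by only a handful of nodes.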

Applications Overlay multicast. NFS. P2P email (ePOST).

Evaluation Strategy of the freeloader in Overlay Multicast.

Evaluation (cont) Message latency in NFS

Evaluation (cont) Throughput of NFS

Evaluation (cont) Average traffic in ePOST

Evaluation (cont) Scalability

Evaluation (cont) Scalability