Byzantine Fault Isolation in the Farsite Distributed File System John R. Douceur and Jon Howell.


Definitions
Byzantine fault \'biz-ən-tēn fȯlt\ n (1982) : a failure of a system component that produces arbitrary behavior
Byzantine fault isolation \'biz-ən-tēn fȯlt ī-sə-'lā-shən\ n (2006) : methodology for designing a distributed system that can, under Byzantine failure, operate with application-defined partial correctness
BFI \bē-ef-'ī\ n (2006) : Byzantine fault isolation
Farsite \'fär-sīt\ n (2000) : serverless distributed file system developed at Microsoft Research, designed to be scalable, strongly consistent, and secure despite running on an untrusted infrastructure of desktop PCs

Talk Outline
Context – Farsite system
Why BFT doesn’t scale
Farsite’s use of multiple BFT groups
The need for isolating Byzantine faults
Formal system specification
BFI in Farsite

Farsite System
[Figure: clients and servers]

Farsite System – Metadata
[Figure: users, clients, and a BFT group holding metadata]

Farsite System – Metadata
[Figure: users, clients, and a BFT group]
Using Byzantine agreement protocol, assign sequence numbers to messages
Prepare-commit among 2T + 1 servers
T = tolerable faults
R = count of replicas
R > 3T
Deterministically update metadata
Reply to client
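The relationship between R and T on this slide can be worked through numerically. The following is a minimal sketch, assuming only the two constraints stated above (R > 3T and a prepare-commit quorum of 2T + 1); the function name `bft_parameters` and the dictionary keys are illustrative and do not come from Farsite.

```python
def bft_parameters(replicas: int) -> dict:
    """Derive fault-tolerance figures for a BFT group from its replica count R."""
    if replicas < 4:
        # R > 3T with T >= 1 requires at least 4 replicas.
        raise ValueError("a BFT group needs at least 4 replicas to tolerate 1 fault")
    # R > 3T  =>  the largest tolerable T is floor((R - 1) / 3).
    tolerable = (replicas - 1) // 3
    return {
        "replicas": replicas,
        "tolerable_faults": tolerable,
        "prepare_commit_quorum": 2 * tolerable + 1,
    }


if __name__ == "__main__":
    for r in (4, 7, 10):
        print(bft_parameters(r))
```

Group sizes 4, 7, and 10 are used here because they are the smallest groups that tolerate 1, 2, and 3 faults respectively.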

The Cost of BFT Groups
computation
messages
message delays

Throughput vs. Scale
[Plot: throughput multiple vs. machine count; curves: ideal, typical, flat, BFT]

Workload Sharing
[Figure: workload shared between client and server]

BFT at Scale

Multiple BFT Groups

Tree of BFT Groups

[Figure: file-system namespace tree; node labels: /, users, public, cruft, emacs, vi, Outlook, Alice, Bob, docs, code, C++, C#, foo, bar, Proj X, src, bin]

Delegation to New Group
[Figure: file-system namespace tree with a subtree delegated to a new BFT group]

Pathname Resolution
[Figure: file-system namespace tree; example path: /users/Alice/code/C#/bar]
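Resolving a pathname across a tree of BFT groups can be sketched as a walk that starts at the group managing the root and follows delegations downward. This is an illustrative model only: the transcript does not describe Farsite's actual resolution protocol, and the `Group` class, the `delegations` map, and the group names G1 to G3 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical stand-in for a BFT group that manages part of the namespace."""
    name: str
    # Maps a delegated directory path to the group that manages that subtree.
    delegations: dict = field(default_factory=dict)


def resolve(root: Group, path: str):
    """Return (managing group, names of groups contacted) for an absolute path."""
    parts = [p for p in path.split("/") if p]
    group, visited = root, [root.name]
    prefix = ""
    for part in parts:
        prefix += "/" + part
        delegate = group.delegations.get(prefix)
        if delegate is not None:
            # This subtree was delegated; continue resolution at the delegate.
            group = delegate
            visited.append(group.name)
    return group, visited


# Hypothetical layout: the root group delegated /users/Alice to G2,
# which in turn delegated /users/Alice/code/C# to G3.
g3 = Group("G3")
g2 = Group("G2", {"/users/Alice/code/C#": g3})
g1 = Group("G1", {"/users/Alice": g2})

owner, hops = resolve(g1, "/users/Alice/code/C#/bar")
print(owner.name, hops)
```

In this model every group on the chain from the root to the file's own group is contacted, which is why a fault in an ancestor group can affect operations on files it no longer manages directly.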

Machine Failures at Scale

Group Failures at Scale

System Failure at Scale

Quantitative Fault Analysis
Example system
– File system distributed among interacting BFT groups
Simplifying assumptions
– Files are partitioned evenly among BFT groups
– Machine failures are independent
Machine fault probability =
Evaluate: operational fault rate
– Probability that an operation on a randomly selected file exhibits a fault

Operational Faults vs. System Scale
[Plot: operational fault rate (log scale) vs. system scale (count of BFT groups, log scale); curves: BFT 4, no BFI; BFT 7, no BFI; BFT 10, no BFI; BFT 4, ideal BFI; BFT 4, tree (4) BFI; BFT 4, tree (16) BFI]
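The fault analysis above can be approximated with a small calculation. This is a sketch under several assumptions that are not stated in the transcript: a group fails when more than T = floor((R - 1) / 3) of its machines are faulty; without isolation, any failed group can affect any file; with ideal isolation, only the file's own group matters; with a tree, the file's group and each of its ancestors matter, and "tree (4)" is read as a branching factor of 4. The machine fault probability of 0.001 is a placeholder, since the slide's value did not survive in the transcript.

```python
from math import comb


def group_fault_probability(replicas: int, p: float) -> float:
    """Probability that more than T = (R - 1) // 3 of R independent machines are faulty."""
    t = (replicas - 1) // 3
    masked = sum(
        comb(replicas, k) * p**k * (1 - p) ** (replicas - k) for k in range(t + 1)
    )
    return 1.0 - masked


def operational_fault_rate(groups, replicas, p, isolation="none", branching=4):
    """Probability that an operation on a random file exhibits a fault (assumed model)."""
    g = group_fault_probability(replicas, p)
    if isolation == "none":
        exposed = groups  # assumption: any failed group can corrupt any file
    elif isolation == "ideal":
        exposed = 1  # assumption: only the file's own group matters
    elif isolation == "tree":
        # assumption: the file's group plus every ancestor up to the root matters
        exposed, reach = 1, 1
        while reach < groups:
            reach *= branching
            exposed += 1
    else:
        raise ValueError(f"unknown isolation model: {isolation}")
    return 1.0 - (1.0 - g) ** exposed


if __name__ == "__main__":
    P = 0.001  # placeholder machine fault probability
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(
            n,
            operational_fault_rate(n, 4, P, "none"),
            operational_fault_rate(n, 10, P, "none"),
            operational_fault_rate(n, 4, P, "ideal"),
            operational_fault_rate(n, 4, P, "tree", branching=4),
        )
```

Under this model the rate without isolation grows roughly linearly with the number of groups, the ideal-isolation rate stays constant, and the tree rate grows only with the depth of the tree.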

BFI versus no BFI

[Table: 4-member BFT groups with BFI compared with 10-member BFT groups without BFI; rows: computation, messages, throughput reduction]

BFI via Formal Specification
[Figure: semantic spec (state, actions, + faults) and distributed-system spec (state, actions, + faults) linked by refinement; annotations: "NEW", "Improved!"]

Farsite Semantic Spec
[Figure: semantic state: namespace tree (/, code, bin, tools, src, C++, emacs, a.h, a.cpp, a.obj, a.exe, cl.exe), open handles, and pending operations (open, read, move)]

Farsite Distributed-System Spec

Farsite Refinement
[Figure: namespace tree (/, code, bin, tools, src, C++, emacs, a.h, a.cpp, a.obj, a.exe, cl.exe), open handles, and pending operations (del, read, move)]

Actions are State Transitions
[Figure: state containing namespace root /, file a.cpp, open handles, and pending operations]

Proving Refinement Inductively
[Figure: state containing namespace root /, file a.cpp, open handles, and pending operations]
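An inductive refinement proof is usually stated as two obligations. The formulation below is the standard refinement-mapping form, not a transcription of the Farsite proof; the symbols are assumptions introduced here: d and d' are distributed-system states before and after an action, R is the refinement function that interprets a distributed-system state as a semantic state, and the subscripts D and S mark predicates of the distributed-system spec and the semantic spec.

```latex
\begin{align*}
\text{Base case:}\quad
  & \mathit{Init}_D(d) \implies \mathit{Init}_S\bigl(R(d)\bigr) \\
\text{Inductive step:}\quad
  & \mathit{Next}_D(d, d') \implies
    \mathit{Next}_S\bigl(R(d), R(d')\bigr) \;\lor\; R(d') = R(d)
\end{align*}
```

The second disjunct allows a distributed-system action, such as an internal message exchange, to leave the semantic state unchanged.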

Refinement with Byzantine Faults
[Figure: namespace tree (/, code, bin, tools, src, C++, emacs, a.h, a.cpp, a.obj, a.exe, cl.exe), open handles, and pending operations (del, read, move)]

Semantic Fault Specification
[Figure: namespace tree in which some files are tainted and show corrupted contents]
Safety
– A tainted file may have arbitrary contents and attributes
– A tainted file may appear not linked into namespace
– A tainted file may pretend not to have children it actually has
– A tainted file may pretend to have children that do not exist
– A tainted file may pretend another tainted file is a child or parent
Liveness
– Operations involving a tainted file may not complete

Distributed-System Improvements
Maintain redundant info across BFT group boundaries
Augment messages with info that justifies correctness
Ensure unambiguous chains of authority over data
Carefully order messages and state updates for operations involving multiple BFT groups

Summary of BFI Methodology
Formally specify your system
– Semantic spec: user’s view of system
– Distributed-system spec: designer’s view of system
– Refinement interprets distributed-system spec in semantic terms
Modify distributed-system spec to express Byzantine faults
Simultaneously
– Strategically weaken semantic spec to describe faults
– Improve distributed-system spec to quarantine faults
Refinement lets you know when you are done

Conclusions
BFT groups have negative throughput scaling
Scalable systems can be built from multiple BFT groups
System scale increases the probability of non-maskable Byzantine faults
If faults are not isolated, a single faulty group can corrupt the entire system
BFI is a methodology for isolating Byzantine faults
BFI uses formal system specification
Improves fault tolerance without hurting throughput, unlike increasing BFT group size

Contact Information

Backup Slides

Farsite Spec Stats
Semantic specification
– 1800 lines of TLA+
– 114 definitions
Distributed-system specification
– 11,500 lines of TLA+
– 775 definitions
Why so big?
– Windows file-system semantics are complex
– Scalability and strong consistency
– Byzantine fault isolation