No Time for Asynchrony
Marcos K. Aguilera (Microsoft Research Silicon Valley)
Michael Walfish (UCL, Stanford, UT Austin)

Problem: Nodes in Distributed Systems Fail

Pragmatic response: end-to-end timeouts
- Getting them right: hard.
- Getting them wrong: bad.

Current view/lore/wisdom: design for asynchrony
- Very general → guarantee of safety

[Figure: primary and backup nodes running Paxos, the primary's status unknown ("?"); each node is a stack of apps, OS, VM, protocols, and drivers]
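
To make the dilemma concrete, here is a minimal sketch (ours, not the talk's; the address and function name are placeholders) of the pragmatic response, an end-to-end timeout check:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// primaryLooksDead is a hypothetical end-to-end check: the backup tries
// to reach the primary and, if nothing answers within timeout, it
// assumes a crash. Too short a timeout declares a slow-but-live primary
// dead (risking two primaries); too long a timeout stalls failover.
func primaryLooksDead(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return true // silence is taken as death: a guess, not a fact
	}
	conn.Close()
	return false
}

func main() {
	// "10.0.0.1:7000" is a placeholder address for the primary.
	if primaryLooksDead("10.0.0.1:7000", 500*time.Millisecond) {
		fmt.Println("backup takes over, hoping the guess was right")
	}
}
```

Everything rides on the timeout constant: no single value is both safe and live under all network and load conditions.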

Different Points of View

1. Keep it simple: "rely on time and timeouts"
2. Keep it safe: "design for asynchrony"
3. Our view: there is good in both
   - We want simplicity, safety, and high availability
   - Our mantra: "no end-to-end timeouts"

A Proposal That Meets Our Goals

- Spies indicate crashed-or-not authoritatively
- Why do we want device drivers killing OSes?

[Figure: primary and backup hosts, each a stack of app, OS, VM/hypervisor, network driver, and network card, with a spy at each layer feeding a failure detector]

Scope
- Enterprises, data centers
- Not Byzantine failures

This Talk Will Argue:
1. Asynchrony is problematic (and often disregarded in practice)
2. Spy-based failure detection meets our goals

Asynchrony Detracts From Safety

1. "Safety under asynchrony" downplays liveness
   - But the highest layers in a system have deadlines
   - Lower layer loses liveness → at the deadline, the higher layer may be bereft → lose "whole-system" safety

Asynchrony Detracts From Safety (Cont'd.)

2. Under asynchrony, components hide useful information
   - Unresponsiveness → higher layers guess
   - Wrong guesses → loss of safety
3. Asynchrony → complex designs (example: Paxos)
   - Complexity → mistakes → safety violations

[Figure: an asynchronous component whose status is unknown ("?")]

Empirical Observations Against Asynchrony

- Paxos-using systems rely on synchrony for safety
  - Leases, … (Chubby [Burrows OSDI06], Petal [Lee ASPLOS96], WheelFS [Stribling et al. NSDI09], …)
- World fundamentally synchronous
  - Electrons, CPUs, human beings, organizations
- "Safety under asynchrony" hard to meet
- Generality of asynchrony maybe not needed in reality
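
As a concrete illustration of the first bullet, the sketch below (our names; not code from any of the cited systems) shows the lease pattern such systems lean on; its safety quietly rests on a synchrony assumption, namely bounded clock drift:

```go
package lease

import (
	"sync"
	"time"
)

// Lease captures the pattern the slide alludes to: the holder may act
// as leader only while the lease is unexpired. The hidden synchrony
// assumption: safety holds only if clocks drift by less than the slack
// built into the lease term.
type Lease struct {
	mu     sync.Mutex
	holder string
	expiry time.Time
}

// Grant hands the lease to node for duration d.
func (l *Lease) Grant(node string, d time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.holder = node
	l.expiry = time.Now().Add(d)
}

// MayLead reports whether node may act as leader right now. With badly
// skewed clocks, an old holder and a new one can both answer true:
// the "asynchronous" system was relying on time all along.
func (l *Lease) MayLead(node string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.holder == node && time.Now().Before(l.expiry)
}
```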

Recap: Argument Against Asynchrony

Appeal of asynchrony: generality → safety

Argument against asynchrony:
- Asynchronous components can lead to unsafe systems
- Hard to meet "safety under asynchrony"
- Asynchrony doesn't represent reality
- People are forced to depart from asynchrony anyway

Our Argument, Continued

1. Asynchrony is problematic (and often disregarded in practice)
2. Spy-based failure detection meets our goals

A Powerful Abstraction: Perfect Failure Detectors

- Asynchronous model: processes may be "up" or crashed, and their status is unknown ("?")
- Want a model where we can ask CRASHED?(process) and trust the answer
- A perfect failure detector (PFD) is an oracle [Chandra & Toueg, JACM 96]
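
The PFD contract fits in a small interface; this sketch uses our own names rather than anything from the paper:

```go
package pfd

// ProcessID names a process in the system.
type ProcessID string

// PerfectFailureDetector sketches Chandra & Toueg's oracle. The
// contract is what makes it perfect: if Crashed(p) returns true, p has
// really crashed (accuracy), and if p crashes, Crashed(p) eventually
// returns true (completeness).
type PerfectFailureDetector interface {
	Crashed(p ProcessID) bool
}
```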

PFDs → Safe, Simple Distributed Algorithms

- Replication by primary-backup instead of Paxos
- Other examples in the paper (not our contribution)

[Figure: primary and backup coordinated through a PFD]
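
Continuing the sketch above, primary-backup failover on top of a PFD collapses to a few lines (assumed names; a real system would subscribe to notifications rather than poll):

```go
package pfd

import "time"

// runBackup sketches primary-backup replication over a PFD. Because
// the oracle never falsely accuses, the backup can promote itself
// without fearing that the old primary is still alive and accepting
// writes: no consensus protocol is needed for this step.
func runBackup(fd PerfectFailureDetector, primary ProcessID, promote func()) {
	for !fd.Crashed(primary) {
		time.Sleep(100 * time.Millisecond) // poll the oracle
	}
	promote() // safe: the old primary is certainly gone
}
```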

How to Build a Perfect Failure Detector?

- Failure detection (not PFD) uses status messages
- Hard to make this FD a PFD: variable timing, and the system is a black box
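
For contrast, here is a minimal sketch of such a conventional status-message detector (our names); note that its output is suspicion, not knowledge:

```go
package pfd

import (
	"sync"
	"time"
)

// HeartbeatFD is the conventional detector: peers send periodic status
// messages, and silence past a timeout becomes suspicion. It is not a
// PFD: a GC pause, a wedged driver, or a slow link can delay messages,
// so Suspected can accuse a live process.
type HeartbeatFD struct {
	mu       sync.Mutex
	lastSeen map[ProcessID]time.Time
	timeout  time.Duration
}

// OnHeartbeat records that p was alive just now.
func (fd *HeartbeatFD) OnHeartbeat(p ProcessID) {
	fd.mu.Lock()
	defer fd.mu.Unlock()
	fd.lastSeen[p] = time.Now()
}

// Suspected guesses whether p has crashed.
func (fd *HeartbeatFD) Suspected(p ProcessID) bool {
	fd.mu.Lock()
	defer fd.mu.Unlock()
	last, ok := fd.lastSeen[p]
	return !ok || time.Since(last) > fd.timeout // a guess, not a verdict
}
```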

Realizing Perfect Failure Detectors

Current proposals are coarse [Fetzer IEEE Trans. 2003, Ricciardi & Birman PODC91]:
- Focus on end-to-end behavior
- Use end-to-end timeouts
- Kill/exclude any suspect

Recall our third goal: high availability. Our approach is "surgical":
- Operate inside layers
- Use only local timing
- Kill as a last resort

[Figure: PFD over a stack of app, OS, VM/hypervisor, network driver, and network card, with uncertainty ("?") localized to individual layers]

Spies Orchestrated to Form Surgical PFD

- Example: spy in the VM tracks OS state
- Lower-level spies also monitor higher-level ones
  - Allows localization of the smallest failed component

[Figure: a network switch plus a host stack (app, OS, VM/hypervisor, network driver, network card), each layer monitored by a spy feeding the PFD]
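
A single spy from the figure might look like the following sketch (guestAlive and kill are hypothetical stand-ins for hypervisor facilities):

```go
package pfd

import "time"

// vmSpy monitors the guest OS with local information (e.g., scheduler
// activity visible to the hypervisor) rather than end-to-end timeouts.
// If the guest stops making progress, the spy kills it as a last
// resort, which makes the "crashed" report it sends upward true by
// construction.
func vmSpy(guest ProcessID, guestAlive func() bool, kill func(), report chan<- ProcessID) {
	for guestAlive() {
		time.Sleep(10 * time.Millisecond) // purely local check
	}
	kill()          // assassination: turn suspicion into fact
	report <- guest // now the PFD can answer CRASHED? authoritatively
}
```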

Limitations and Discussion

1. Under network partition, the PFD module blocks
2. To realize spies, one must modify system infrastructure

We think this is okay in data centers:
- Partitions often cause blocking anyway
- One administrative domain

Harder to address in the wide area:
- Requires spies in Internet switches and routers
- Network-to-host feedback is not totally implausible

Summary and Conclusion

End-to-end timing assumptions are problematic. So:
- Avoid timing assumptions by using inside information and assassination
- Avoid end-to-end reliance by infiltrating many layers

The gain: simple, safe, and live distributed systems
But: PFDs and spies are not a good fit for all environments
Next step: get it implemented and deployed. This is a call to arms.