OPODIS 05: Reconfigurable Distributed Storage for Dynamic Networks. Gregory Chockler, Seth Gilbert, Vincent Gramoli, Peter M. Musial, Alexander A. Shvartsman.



Goals
– Reconfigurable Distributed Storage (RDS)
– Atomic consistency (read/write)
– Fault tolerance
…in dynamic and asynchronous systems.

Distributed Storage
Data is replicated at several network locations. Read and write operations follow an operation policy.

…in Dynamic Networks
Nodes join and leave the system. Distributed storage in dynamic networks therefore requires a reconfiguration process, achieved through agreement.

Model
Distributed
– Connected set of processors
– Each processor has a unique id i ∈ I
– MWMR (multi-writer/multi-reader): any processor is a potential client
Asynchronous
– Asynchronous processors
– Point-to-point asynchronous unreliable channels
Dynamic
– Processors join and leave the system
– Processors may crash

What is a configuration?
A configuration consists of:
– members, a set of processors
– read-quorums and write-quorums, two sets of quorums such that for all RQ ∈ read-quorums and all WQ ∈ write-quorums: RQ ⊆ members, WQ ⊆ members, and RQ ∩ WQ ≠ ∅ (intersection is required only within a given configuration)
Every client maintains a set of configurations, initially containing the default one.
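The structure above can be sketched as a small data class; the field and method names are illustrative, not the paper's notation. A minimal sketch in Python:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Configuration:
    """A configuration: a member set plus read- and write-quorums."""
    members: frozenset
    read_quorums: tuple   # each quorum is a frozenset of processor ids
    write_quorums: tuple

    def is_well_formed(self):
        """Quorum properties from the slide: every quorum is a subset of
        members, and every read-quorum intersects every write-quorum
        (required only within this single configuration)."""
        return (all(rq <= self.members for rq in self.read_quorums)
                and all(wq <= self.members for wq in self.write_quorums)
                and all(rq & wq for rq in self.read_quorums
                                for wq in self.write_quorums))

# A majority-based configuration over three processors, as in the experiments.
members = frozenset({1, 2, 3})
majorities = tuple(frozenset(q) for q in ({1, 2}, {1, 3}, {2, 3}))
c0 = Configuration(members, majorities, majorities)
assert c0.is_well_formed()
```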

Single Object Operations Overview
After [ABD95]. A tag is a pair in N × I; val is a possible value.
val = Read() at i : (tag, val) = query(); [prop(tag, val);]
Write(val) at i : (tag, val') = query(); prop(tag', val); where tag' is a new tag larger than tag
1. query() returns (tag, val): gathers the (tag, val) pairs of all processors of a read-quorum RQ and returns the one with the largest tag.
2. prop(tag, val): updates the (tag, val) pairs at all processors of a write-quorum WQ.
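The two-phase pattern above can be sketched with in-memory "replicas"; the tag layout (sequence number, processor id) and the query/prop structure follow the slide, but all Python names are illustrative and the network is replaced by direct dictionary access:

```python
# A tag is (sequence number, processor id) and compares lexicographically,
# so ties on the sequence number break on the writer's id.

def query(read_quorum, store):
    """Gather the (tag, val) pairs of a read-quorum and return the pair
    with the largest tag."""
    return max((store[p] for p in read_quorum), key=lambda pair: pair[0])

def propagate(write_quorum, store, tag, val):
    """Update (tag, val) at every processor of a write-quorum that still
    holds a smaller tag (prop on the slide)."""
    for p in write_quorum:
        if store[p][0] < tag:
            store[p] = (tag, val)

def write(i, val, read_quorum, write_quorum, store):
    """Write at processor i: query, then propagate a strictly larger tag."""
    (seq, _), _ = query(read_quorum, store)
    propagate(write_quorum, store, (seq + 1, i), val)

def read(i, read_quorum, write_quorum, store):
    """Read at processor i: query, then propagate the pair found so any
    later read sees a tag at least this large."""
    tag, val = query(read_quorum, store)
    propagate(write_quorum, store, tag, val)
    return val

# Three replicas, majority quorums; initial tag (0, 0), empty value.
store = {p: ((0, 0), None) for p in (1, 2, 3)}
write(1, "x", {1, 2}, {2, 3}, store)
assert read(2, {1, 3}, {1, 2}, store) == "x"
```

Because any read-quorum intersects any write-quorum, the query of a later operation always meets at least one replica holding the largest previously propagated tag.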

Reconfiguration Design Goals
– Sound: totally ordered configurations
– Flexible: no dependencies between configurations
– Non-intrusive: allows concurrent read/write operations
– Fast: strengthens fault tolerance

Decoupling Reconfiguration
Reconfiguration = replacing configurations:
– {I} Installing a new configuration
– {R} Removing old configuration(s)
If {R} ≺ {I}, operations are delayed.
If {I} ≺ {R}, a stronger configuration-viability assumption is required.

Solution
Neither {R} ≺ {I} nor {I} ≺ {R}: instead, run {I} and {R} together ({I} ∥ {R}), a tighter coupling between removal and installation.

RDS Reconfiguration
Reconfiguration is based on Paxos (a three-phase leader-based consensus algorithm).
– l is the leader
– c is the current configuration
– configs is the set of active configurations
– A ballot has a unique identifier b and a value v, which is a configuration
Paxos phases:
– Prepare: l creates a new ballot and chooses/gets the value to propose.
– Propose: l proposes and gathers votes from a majority.
– Propagate: l propagates the decision.

RDS Reconfiguration: Recon(c, c')
(Figures: the leader l exchanges messages with a read-quorum RQ and a write-quorum WQ of the current configuration.)

Prepare phase: l creates a new, larger ballot b and sends it to a read-quorum; l updates its ballot's value v with the one received and updates its configs set.

Propose phase: l proposes ⟨b, v⟩ to a write-quorum; its members update their tag and val and add v to their configs set.

Propagation phase: members update their tag and val and remove configuration c from their configs set.
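A much-simplified sketch of the three phases, assuming a single leader, no competing ballots, and direct access to every replica in place of real quorum messaging; all names are illustrative:

```python
def recon(ballots, replicas, old_cfg, new_cfg):
    # Prepare: create a new, larger ballot and collect any value the
    # replicas already voted for, adopting the one with the highest ballot.
    b = max(ballots) + 1 if ballots else 1
    ballots.append(b)
    prior = [r["voted"] for r in replicas.values() if r["voted"] is not None]
    v = max(prior)[1] if prior else new_cfg

    # Propose: the replicas vote for (b, v) and add v to their active
    # configuration sets.
    for r in replicas.values():
        r["voted"] = (b, v)
        r["configs"].add(v)

    # Propagate: the decision lets the replicas retire the old configuration.
    for r in replicas.values():
        r["configs"].discard(old_cfg)
    return v

replicas = {p: {"voted": None, "configs": {"c"}} for p in (1, 2, 3)}
decided = recon(ballots=[], replicas=replicas, old_cfg="c", new_cfg="c'")
assert decided == "c'"
assert all(r["configs"] == {"c'"} for r in replicas.values())
```

Real Paxos must additionally handle competing leaders, lost messages, and quorum (rather than all-replica) acknowledgements; this sketch only shows how a decided configuration replaces the old one in the configs sets.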

Proving Atomicity
Ordering configurations, ordering operations.
Theorem 1: The set of installed configurations in the system is totally ordered.
Theorem 2: If operation π1 precedes operation π2, then π1's tag is not larger than π2's tag.

Additional Assumptions
Eventual stabilization with:
– Unique leader l
– Message delay bound d (unknown to the algorithm)
– Gossip with frequency d
– Restricted reconfiguration rate
– Some quorums remain alive in active configurations
(Timeline: t_s is the system stabilization time; t_l, the algorithm stabilization time, comes 2d after t_s; let t_r be the request time.)

Reconfiguration Latency
Worst-case scenario: the last reconfiguration was done by a different leader. Starting at max(t_l, t_r), the leader runs the Prepare (2d), Propose (2d), and Propagate (d) phases; reconfiguration is complete at the end time t_e, within 5d.

Other cases: the leader made the previous reconfiguration. Starting at max(t_l, t_r), it skips Prepare and runs only Propose (2d) and Propagate (d); reconfiguration is complete at the end time t_e, within 3d.
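The two latency figures follow from the per-phase costs (a Prepare or Propose round trip costs 2d; Propagate is a one-way dissemination, d). A small sanity check, with d as a symbolic unit:

```python
# d is the (unknown) post-stabilization message-delay bound, used here as
# a symbolic unit.
d = 1.0

prepare, propose, propagate = 2 * d, 2 * d, d

worst = prepare + propose + propagate   # a different previous leader: all three phases
assert worst == 5 * d

same_leader = propose + propagate       # the leader can skip Prepare
assert same_leader == 3 * d
```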

Operation Latency
Phase latency: 2d is sufficient for a phase round trip. In some cases (pending reconfiguration), a phase discovers a new configuration and needs a second round trip, delaying it by 2d.
Operation latency: operations are bounded by 8d. In some cases, the propagation phase of a read operation can be skipped, leading to a possible bound of 2d.
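The 8d and 2d bounds follow from the phase accounting above; a small check, again with d as a symbolic unit:

```python
# d is the symbolic message-delay bound. A phase is one round trip (2d),
# but a pending reconfiguration can force a second round trip.
d = 1.0

round_trip = 2 * d
phase_worst = 2 * round_trip   # second round trip after a new configuration is discovered
op_worst = 2 * phase_worst     # query phase + propagation phase
assert op_worst == 8 * d

fast_read = round_trip         # a read that can skip propagation
assert fast_read == 2 * d
```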

Experimental Results
– IOA specifications translated to Java code following a set of rules.
– Implementation of the Attiya, Bar-Noy, and Dolev algorithm "ABD" (without reconfiguration) and of RDS, which shares parts of the ABD code.
– Using majority-based configurations.
– Measuring operation latency: (1) while varying configuration size; (2) while varying the number of algorithm instances.

Experimental Results
Operation latency of RDS is competitive with ABD, confirming the theory. Reconfiguration messages carry operation information, which may accelerate operations in RDS.

Conclusion
RDS, Reconfigurable Distributed Storage, with sound, flexible, non-intrusive, and fast reconfiguration. It solves two problems in one: configuration replacement and consensus. Reconfiguration is inexpensive (in time), and fault tolerance is strengthened. RAMBO can become more aggressive: that is exactly what we did here!