Timed Quorum Systems … for large-scale and dynamic environments Vincent Gramoli, Michel Raynal.


OPODIS’07 Gramoli, Raynal
Context
Large-scale dynamic distributed systems:
– nodes communicate through message-passing
– nodes join/leave the system at any time

Goal
To emulate a shared memory in this context, providing atomic (i.e., linearizable) read/write operations.

Roadmap
1. Model and preliminary definitions
2. Related work
3. Timed Quorum System (TQS)
4. An efficient implementation of TQS
5. Conclusion

Simple model
– A system of n interconnected nodes with unique IDs
– Asynchronous communication with neighbors (nodes whose ID is known)
– Dynamism intensity (i.e., churn) c
– A single object is considered (local atomicity)

Quorum System
Quorums are sets of nodes that mutually intersect. A quorum system (QS) is a set of quorums.
Example: 3 quorums Q1, Q2, Q3 of size q = 2, with Q1 ∩ Q2 ≠ Ø, Q1 ∩ Q3 ≠ Ø and Q2 ∩ Q3 ≠ Ø.
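The intersection property can be checked mechanically. This is a minimal sketch: is_quorum_system is a hypothetical helper name, and the node names a, b, c are illustrative stand-ins for the three-quorum example above.

```python
from itertools import combinations


def is_quorum_system(quorums):
    """Return True if every pair of quorums shares at least one node."""
    return all(set(a) & set(b) for a, b in combinations(quorums, 2))


# Hypothetical node names: three quorums of size q = 2 over nodes a, b, c.
example = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
print(is_quorum_system(example))           # True
print(is_quorum_system([{"a"}, {"b"}]))    # False
```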

Operations
Atomic quorum-based operations for static settings [Attiya, Bar-Noy, Dolev, JACM 1996].
Each node of the quorums maintains:
– a local value v of the object
– a unique tag t, the version number of this value

Operations
1) Reading a value
Phase 1: Consult the values and tags stored at a quorum and keep the most up-to-date value v, i.e., the one with the largest tag (v1, t1 in the example).
Phase 2: Propagate the consulted value, with its tag, to a quorum.
Output: v1
Theorem of Attiya and Welch 1998: « read must write » to prevent new/old inversions for an unbounded number of readers.
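The two-phase read can be sketched as a single-process simulation in the spirit of the quorum-based algorithm cited above; it is an illustration, not the authors' code. Replica, pick_quorum and read are hypothetical names, replicas are in-memory objects rather than remote nodes, quorums are random majorities, and tags are assumed to be (counter, writer_id) pairs so that they are unique and totally ordered.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    value: object = None
    tag: tuple = (0, 0)  # (counter, writer_id): a unique, totally ordered version


def pick_quorum(replicas, q, rng):
    """Hypothetical access strategy: q replicas chosen uniformly at random."""
    return rng.sample(replicas, q)


def read(replicas, q, rng):
    # Phase 1: consult a quorum and keep the value with the largest tag.
    best = max(pick_quorum(replicas, q, rng), key=lambda r: r.tag)
    value, tag = best.value, best.tag
    # Phase 2: write (value, tag) back to a quorum, so that a later read
    # cannot return an older value ("read must write").
    for r in pick_quorum(replicas, q, rng):
        if r.tag < tag:
            r.value, r.tag = value, tag
    return value


# Five replicas with majority quorums (q = 3); three of them hold (v1, (1, 1)).
replicas = [Replica() for _ in range(5)]
for r in replicas[:3]:
    r.value, r.tag = "v1", (1, 1)
print(read(replicas, 3, random.Random(0)))  # v1
```

Because any two majorities of five replicas overlap, the consultation phase always meets at least one replica holding v1, whichever quorum the random strategy picks.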

Operations
2) Writing a value (input: v2)
Phase 1: Consult the version numbers stored at a quorum (largest tag t1) and choose a new tag t2 strictly larger, t2 > t1.
Phase 2: Propagate the new value v2 associated with the new version, (v2, t2), to a quorum.
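The write follows the same two-phase pattern. As in the read sketch, this is an illustrative simulation under assumptions: Replica, pick_quorum and write are hypothetical names, quorums are random majorities, and tags are assumed to be (counter, writer_id) pairs, where the writer id is one common way to keep tags unique when several clients write concurrently.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    value: object = None
    tag: tuple = (0, 0)  # (counter, writer_id): a unique, totally ordered version


def pick_quorum(replicas, q, rng):
    """Hypothetical access strategy: q replicas chosen uniformly at random."""
    return rng.sample(replicas, q)


def write(replicas, q, value, writer_id, rng):
    # Phase 1: consult a quorum for the largest counter seen so far (t1) ...
    max_counter = max(r.tag[0] for r in pick_quorum(replicas, q, rng))
    # ... and build t2 > t1; the writer id breaks ties between concurrent writers.
    new_tag = (max_counter + 1, writer_id)
    # Phase 2: propagate (value, t2) to a quorum.
    for r in pick_quorum(replicas, q, rng):
        if r.tag < new_tag:
            r.value, r.tag = value, new_tag
    return new_tag


replicas = [Replica() for _ in range(5)]
rng = random.Random(0)
t1 = write(replicas, 3, "v1", writer_id=1, rng=rng)
t2 = write(replicas, 3, "v2", writer_id=2, rng=rng)
print(t1, t2)  # (1, 1) (2, 2)
```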

Dynamic Solutions
Reconfigurable storage: a failing QS is replaced by a new one.
– RAMBO: Shvartsman, Lynch, DISC’02
– RDS: Chockler et al., OPODIS’05
Structured dynamic quorums: failed servers are replaced by new ones.
– AM05: Abraham, Malkhi, Dist. Comp.
– NN05: Nadav, Naor, DISC’05
– SQUARE: Gramoli, Anceaume, Virgillito, SAC’07
All solutions require bounded churn during any finite period.

Dynamic Solutions
Reconfiguration complexity vs. operation latency tradeoff.
[Figure: RAMBO, RDS, SQUARE, AM05 and NN05 placed along the reconfiguration complexity and operation latency axes, annotated "Prevents scalability!"]

Timed Quorum System
Dynamic quorum systems should be:
– Probabilistic: the number of failures is not necessarily bounded
– Timed: no property can hold forever

Timed Quorum System
Timed access strategy ω: a mapping from any time t to a probability distribution on the possible quorums.
Δ-Timed Quorum System (TQS): for any quorums $Q(t_1)$ and $Q(t_2)$ accessed respectively with $\omega(t_1)$ and $\omega(t_2)$, if $|t_2 - t_1| \le \Delta$, then $Q(t_1) \cap Q(t_2) \neq \emptyset$ with high probability.
[Figure: quorums Q(t1), Q(t2), Q(t3), Q(t4), Q(t5) accessed along a time axis, with a window of length Δ; highlighted intersections Q(t1) ∩ Q(t2), Q(t2) ∩ Q(t3), Q(t3) ∩ Q(t4) and Q(t3) ∩ Q(t5). Example of a TQS: {Q(t1), Q(t2), Q(t3), Q(t4), Q(t5)}]
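The timed intersection property can be explored with a small Monte Carlo experiment. This is a sketch under a hypothetical churn model, not the paper's analysis: during each of Δ time units, every node independently leaves with probability c and its slot is immediately taken by a newcomer, so arrivals equal departures and the system keeps n nodes. Both quorums are drawn uniformly at random; timed_intersection_rate is an illustrative name.

```python
import random


def timed_intersection_rate(n, q, c, delta, trials=1000, seed=1):
    """Monte Carlo estimate of P(Q(t1) ∩ Q(t2) ≠ Ø) with |t2 - t1| = delta.

    Hypothetical churn model: in each of delta time units, every node leaves
    with probability c and its slot is taken by a newcomer, so the system
    keeps n nodes (arrival rate = departure rate = c).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        q1 = rng.sample(range(n), q)  # slots whose nodes form Q(t1)
        # Members of Q(t1) that survived all delta rounds of churn.
        alive = {s for s in q1 if all(rng.random() >= c for _ in range(delta))}
        q2 = rng.sample(range(n), q)  # Q(t2), drawn among the current nodes
        hits += any(s in alive for s in q2)
    return hits / trials


n, c, delta = 10_000, 0.1, 5
print(timed_intersection_rate(n, 200, 0.0, delta))  # static system, q = 2*sqrt(n)
print(timed_intersection_rate(n, 200, c, delta))    # same q under churn
print(timed_intersection_rate(n, 261, c, delta))    # q = ceil(2*sqrt(n*D)), D = (1-c)**-delta
```

With the same quorum size, churn visibly lowers the intersection rate; enlarging the quorums by a factor of about √D restores it, which is the intuition behind the quorum size stated in the results.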

Consistency
Probabilistic atomicity:
– In the real-time sequence of operations, an operation that satisfies atomicity with respect to all preceding successful operations is said to be successful; otherwise it is said to be unsuccessful.
– Any operation is successful with high probability.
Theorem 1: If at least one quorum is accessed every Δ period of time, then a Δ-TQS implements probabilistic atomicity.

Some observations
– Replication is necessary for data persistence.
– In large-scale systems, operations are frequent.
– The « read must write » theorem of Attiya and Welch indicates that some information must be replicated in any operation.

Efficient TQS Implementation
Underlying gossip-based shuffle of neighborhoods:
– each node constantly has a new random set of neighbors
Classical quorum-based operations:
– consulting v and t at some quorum
– choosing v’ and t’ to propagate
– propagating v’ and t’ to some quorum
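The slide only states that a gossip-based shuffle keeps renewing each node's random set of neighbors. One simple way to obtain that effect, loosely modeled on Cyclon-style view shuffling, is sketched below; it is an assumption for illustration, not the protocol used in the paper. merge, shuffle_round, view_size and exchange_len are hypothetical names, and all views live in one dict instead of on separate nodes.

```python
import random


def merge(view, sent, received, self_id):
    """Drop the entries we sent, add the ones we received, keep the view size."""
    size = len(view)
    new_view = [x for x in view if x not in sent]
    # Received entries first; our own sent entries only refill leftover slots.
    for x in list(received) + list(sent):
        if len(new_view) < size and x != self_id and x not in new_view:
            new_view.append(x)
    return new_view


def shuffle_round(views, exchange_len, rng):
    """One round: every node swaps exchange_len random entries with a random neighbor."""
    for a in list(views):
        b = rng.choice(views[a])
        sent_a = rng.sample(views[a], exchange_len)
        sent_b = rng.sample(views[b], exchange_len)
        views[a] = merge(views[a], sent_a, sent_b, a)
        views[b] = merge(views[b], sent_b, sent_a, b)


n, view_size = 50, 5
rng = random.Random(42)
# Start from a ring-like overlay: node i knows i+1, ..., i+view_size.
views = {i: [(i + j) % n for j in range(1, view_size + 1)] for i in range(n)}
initial = {i: list(v) for i, v in views.items()}
for _ in range(20):
    shuffle_round(views, exchange_len=2, rng=rng)
print(views[0], "was", initial[0])
```

The merge step never adds a node to its own view and never duplicates an entry, so every view keeps exactly view_size distinct neighbors while its contents drift away from the initial ring.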

Efficient TQS Implementation
Disseminate each operation until $q = O(\sqrt{n})$ nodes are contacted.
[Figure: dissemination starting at the client, each contacted node forwarding the request to k others]
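The dissemination step can be sketched as rounds of fan-out k: the client contacts k nodes, each newly contacted node forwards to k more, and so on until q distinct nodes have been reached, which takes roughly log_k q rounds. This is a simplified model, not the paper's algorithm: it assumes, as the shuffle is meant to ensure, that each forward reaches a node chosen uniformly at random, and disseminate and the factor 2 in q are hypothetical.

```python
import math
import random


def disseminate(n, q, k, rng):
    """Forward a request with fan-out k until q distinct nodes are contacted.

    Hypothetical model: neighbors are uniformly random, so each forward
    targets k random nodes. Returns the contacted set and the number of
    rounds (message delays).
    """
    contacted = set()
    senders = ["client"]
    rounds = 0
    while len(contacted) < q:
        rounds += 1
        newly = []
        for _ in senders:  # each sender forwards to k random nodes
            for target in rng.sample(range(n), k):
                if len(contacted) < q and target not in contacted:
                    contacted.add(target)
                    newly.append(target)
        senders = newly or senders  # keep the previous senders if nothing new was reached
    return contacted, rounds


n, k = 10_000, 4
q = math.ceil(2 * math.sqrt(n))  # q = O(sqrt(n)) with a hypothetical factor 2
contacted, rounds = disseminate(n, q, k, random.Random(7))
print(len(contacted), rounds, math.log(q, k))
```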

Efficient TQS Implementation
Assumptions:
– neighbors are chosen uniformly at random
– at least one operation succeeds every Δ time
– c = rate of arrival = rate of departure, with $c \in [0, 1)$
Results:
– this algorithm implements a TQS
– replication is piggybacked onto operations
– the quorum size is $O(\sqrt{nD})$, where $D = (1-c)^{-\Delta}$
– the operation latency is $O(\log_k \sqrt{nD})$ message delays
– when $D = O(1)$, the quorum size reduces to $O(\sqrt{n})$, the smallest quorum size for static systems, cf. [Malkhi, Reiter, Wool, Wright, Inf. and Comp. Journal 2001]
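The bounds can be evaluated numerically to see how churn inflates quorums. The sketch below makes the hidden constant explicit as a hypothetical factor ell, and its docstring gives a back-of-the-envelope intuition under the same churn model as the simulation earlier (each node departs with probability c per time unit); it is not the paper's proof. tqs_parameters is an illustrative name.

```python
import math


def tqs_parameters(n, c, delta, k, ell=2.0):
    """Evaluate the slide's bounds with an explicit (hypothetical) constant ell.

    A member of Q(t1) is still present after delta time units with probability
    (1 - c)**delta = 1/D, so a fresh quorum of size q is expected to meet about
    q * (q / D) / n = q**2 / (n * D) survivors. Taking q = ell * sqrt(n * D)
    keeps this expected overlap at ell**2 whatever n is.
    """
    D = (1 - c) ** (-delta)
    q = math.ceil(ell * math.sqrt(n * D))
    latency = math.log(math.sqrt(n * D), k)  # rounds of fan-out k to reach ~sqrt(nD) nodes
    return D, q, latency


for c in (0.0, 0.1):
    D, q, latency = tqs_parameters(n=10_000, c=c, delta=5, k=4)
    print(f"c={c}: D={D:.3f}, q={q}, latency~{latency:.2f} message delays")
```

With c = 0 the formula gives D = 1 and falls back to the static $O(\sqrt{n})$ size; with c = 0.1 and Δ = 5 the quorum grows only by a factor of about 1.3.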

Conclusion
We defined the Timed Quorum System, which:
– is inherently dynamic: NO underlying structure, timely intersection requirement
– ensures probabilistic atomicity
– scales well: $O(\sqrt{nD})$ messages per operation and $O(\log_k \sqrt{nD})$ time per operation
– is optimal: when $D = O(1)$, it translates into the best known static result, $O(\sqrt{n})$

Open Issue
TQS in mobile sensor networks:
– Consultation phase: gather motes to consult t and v; scatter motes to make t and v likely visible
– Propagation phase: gather motes to propagate t’ and v’; scatter motes to make t’ and v’ likely visible