Josef Widder1 Why, Where and How to Use the Θ-Model Josef Widder Embedded Computing Systems Group INRIA Rocquencourt, March 10, 2004

Josef Widder2 Work in Progress • Motivation • Ideas • Our Approach • First Results in certain types of networks

Josef Widder3 Overview • Why: because classic results are straightforward but have drawbacks • Where: a glance at synchrony in real networks • How: transfer of algorithms to real systems

Josef Widder4 Consensus • Dwork, Lynch, Stockmeyer 88 • Chandra, Toueg 96 • There exist algorithmic solutions if Δ holds • Δ is the upper bound on end-to-end message delays • What remains: show that your system ensures Δ
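To make the Δ assumption explicit, one standard way to write it (my formalization, consistent with [DLS88]; the slide itself only names Δ):

```latex
% Synchrony assumption behind [DLS88]/[CT96]-style solutions:
% every message m is delivered within Delta of being sent.
\exists \Delta > 0 \;\; \forall m :\;\; \mathit{recv}(m) - \mathit{send}(m) \;\le\; \Delta
```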

Josef Widder5 Diverse assumptions on Δ • Δ is known/unknown • Δ holds always / from some time on / sufficiently long • Δ holds for all/some (FD) msgs … [DLS88] / [CT96] • Δ holds eventually somewhere … [ADFT03] • These are weak assumptions, still Δ is in there

Josef Widder6 By the way... upper bounds look like this [image omitted]

Josef Widder7 Upper bounds do not look like this • Let's assume Δ = 8s and test it for a week • Approaches like [MF02]: the delay of a protocol is 5Δ • the delay should be at most 5s • so let's define Δ = 1s

Josef Widder8 Can upper bounds be derived properly? • Guarantees are (NP-)hard to derive (scheduling, queuing) → the problem must be simplified → simplification leads to incomplete guarantees

Josef Widder9 What do I have to analyze to ensure Δ? • local delays at the sender (processor load, task preemption, blocking factors) • outbound queues • net contention • inbound queues • local delays at the receiver (processor load, task preemption, blocking factors) • This is hard, yet delivers Δ only at some probability.

Josef Widder10 Assumption Coverage • the probability that our assumptions hold during operation • Our starting point: we can improve coverage by means of system models

Josef Widder11 The Θ-Model • τ+(t) … upper envelope of message delays at time t • τ−(t) … lower envelope of message delays at time t • Since τ+(t) is unbounded, local HW timers cannot be used to time out messages → time(r)-free algorithms
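For context, the model's central parameter can be read off the two envelopes: only the ratio of longest to shortest delay of messages simultaneously in transit is assumed bounded, while the delays themselves may grow without bound. A sketch of this assumption (my reconstruction; the slide shows only the envelopes):

```latex
% Theta-Model assumption: the delay tau(m) of any message m in transit at
% time t lies between the envelopes, and only their ratio is bounded.
\tau^-(t) \;\le\; \tau(m) \;\le\; \tau^+(t),
\qquad
\frac{\tau^+(t)}{\tau^-(t)} \;\le\; \Theta \quad \text{for all } t .
```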

Josef Widder12 Described Behavior (rough sketch) [plot omitted: end-to-end delays over time t, staying between the lower envelope τ−(t) and the upper envelope τ+(t)]

Josef Widder13 Coverage of Δ — what a derivation of Δ assumes, per delay component:
full worst case (coverage C = 1): wc sender delays | wc outbound queues | wc net contention | wc inbound queues | wc receiver delays
simplified best case (coverage C < 1): bc sender delays (no other tasks, no blocking…) | outbound queue empty | empty channel | inbound queue empty | bc receiver delays (no other tasks, no blocking…)

Josef Widder14 Coverage of the Θ-Model • How large is states(Θ) ∖ states(Δ), i.e., the set of states in which Θ holds but Δ does not? • And why is this interesting anyway?

Josef Widder15 Consensus in Real Networks • From FLP it follows: any solution to Consensus on a real network is a probabilistic solution • pure asynchrony: C(model) = 1 but p(solution) < 1 → probabilistic solutions • some synchrony: C(model) < 1 but p(solution) = 1 → correct solutions • … and this is not even talking about the coverage of fault models

Josef Widder16 How large is the coverage improvement? • Coverage cannot be worse than under the Δ assumption • if a relation between the upper and lower delays (τ+ and τ−) exists, the improvement is large • But even in networks without such a relation (if such networks exist?): if by chance there exists just one case where Θ holds while Δ does not, coverage is improved

Josef Widder17 Performance • termination times often look like [plot omitted] • hence: how large is Θ? • Step 1: timing uncertainty of networks • Step 2: establish τ+, τ−, and Θ on given networks, for a given system model, for given algorithms

Josef Widder18 Benchmark for Timing Uncertainty • in clock synchronization the best precision one can reach is ε = τ+ − τ− [LL84] … more precisely ε(1 − 1/n) • comparison of two clock-sync approaches in Ethernet → their results → conclude where to use our model
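Spelled out, the [LL84] result referenced here: with message delays anywhere between the two envelopes, the delay uncertainty and the resulting lower bound on clock-synchronization precision are

```latex
\varepsilon \;=\; \tau^+ - \tau^- ,
\qquad
\text{best achievable precision} \;=\; \varepsilon\left(1 - \tfrac{1}{n}\right)
\quad \text{for } n \text{ processes [LL84].}
```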

Josef Widder19 Clock Sync in Ethernets • NTP [Mills]: accuracy of ~1 ms • SynUTC [ ]: accuracy of ~100 ns • Why is there a difference of 4 orders of magnitude?

Josef Widder20 Where does the difference come from? • NTP runs at application level • SynUTC runs at a low level: the current clock value is copied directly onto the bus; upon message receipt, the receiver's clock value is written into the received message as well • interval-based clock sync algorithms [SS03]

Josef Widder21 Conclusions from this comparison • low-level clock sync → high-level applications use tightly synchronized clocks • But how does this help us in solving Consensus? • Fast Failure Detector approach [HLL02] ([CT96]: only the FD messages must satisfy the timing assumptions)

Josef Widder22 Fast Failure Detectors • low-level failure detection • high-priority FD messages • [plot omitted: … n = 16…1024]
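To make the idea concrete, here is a minimal, hypothetical sketch of timeout-free crash detection by message counting, in the spirit of the Θ-Model; the class name, round structure, and counting rule are illustrative assumptions of mine, not the algorithm of [HLL02] or of the Θ-Model papers.

```python
# Hypothetical sketch: crash detection without hardware timers, by counting
# messages instead of measuring time. Premise (Theta-Model): end-to-end delays
# of messages concurrently in transit differ by at most a factor THETA, so a
# process that falls too many rounds behind the fastest one can be suspected.

class CountingFailureDetector:
    def __init__(self, processes, theta):
        self.theta = theta                       # assumed delay-ratio bound
        self.latest = {p: 0 for p in processes}  # newest round seen per process
        self.suspected = set()

    def on_message(self, sender, round_no):
        """Call for every round-stamped message received."""
        self.latest[sender] = max(self.latest[sender], round_no)
        newest = max(self.latest.values())
        for p, r in self.latest.items():
            # If p lags the fastest process by more rounds than the ratio
            # bound allows, its message cannot merely be slow: suspect p.
            if newest - r > self.theta:
                self.suspected.add(p)
            else:
                self.suspected.discard(p)  # a late message rehabilitates p

# Usage sketch:
#   fd = CountingFailureDetector({"p1", "p2"}, theta=2)
#   fd.on_message("p1", 5)   # p2 lags by 5 > 2 rounds -> "p2" suspected
```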

Josef Widder23 Performance (after Step 1) • timing uncertainty differs within the same network depending on the layer the algorithm runs in → Θ should be reasonably good at lower levels • Step 2: establish τ+, τ−, and Θ on given networks, for a given system model, running given algorithms

Josef Widder24 Algorithms in Networks • end-to-end delays τ+, τ− • 1. Leader Election • 2. Token Circulation (1x) • 3. 1. followed by 2.

Josef Widder25 Theoretical Analysis • Leader Election: bc(leader) = τ− ... wc(leader) = τ+ • one Token Circulation: bc(token) = 3τ− ... wc(token) = 3τ+ • Leader → Token: bc(comb) = 4τ− ... wc(comb) = 4τ+

Josef Widder26 Establish Time Bounds • end-to-end delays: from the decision to send a message until the receiver makes its decision • τ− = t_s + trans + t_r • τ+ = 2t_s + trans + 2t_r • … message arrival laws, rate of transmission to one process
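An added observation (assuming the two bounds above are reconstructed correctly): with these expressions the ratio Θ stays below 2 regardless of the actual values of t_s, trans, and t_r, which matches the "Θ ≤ 2" property used on slide 33:

```latex
\Theta \;=\; \frac{\tau^+}{\tau^-}
       \;=\; \frac{2t_s + \mathit{trans} + 2t_r}{t_s + \mathit{trans} + t_r}
       \;=\; 2 - \frac{\mathit{trans}}{t_s + \mathit{trans} + t_r}
       \;<\; 2 .
```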

Josef Widder27 Termination Times [plots omitted: Leader Election, Token Circulation]

Josef Widder28 Termination Times (2) [plot omitted: Leader Election → Token Round — bounds obtained by adding the individual bounds vs. by direct examination]

Josef Widder29 Conclusions of Step 2a • during operation, τ+ and τ− do not only depend on the system • algorithms must be accounted for as well: how many messages are sent → network load • this was a toy example, BUT

Josef Widder30 Deterministic Ethernet … CSMA/DCR • bus → only one message on the medium at a time • deterministic collision resolution → upper bound on the physical message transmission (i.e. on trans, not on the end-to-end delay) • if a station wants to send a message at t1 and (after collision resolution) sends it at t2, then every station can send at most one message between t1 and t2
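One way to read the last property (an added inference, not stated on the slide): once a station wants to send, at most n − 1 other transmissions can precede its own, so queuing on the medium is bounded:

```latex
% With at most one message per station between t_1 and t_2, and trans_max an
% upper bound on one physical transmission (from deterministic collision
% resolution), the wait on the medium is bounded by
t_2 - t_1 \;\le\; (n-1)\cdot \mathit{trans}_{\max} .
```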

Josef Widder31 Hot: First Results in Deterministic Ethernet • ε = τ+ − τ− … is only relevant for one broadcast • what matters, in fact, is the time difference for receiving n − f msgs

Josef Widder32 First Results (2) … • how many messages can be transferred while any given message is in transit in deterministic Ethernet: [formula omitted] • but we require f + 1 msgs: [formula omitted]

Josef Widder33 First Results in Deterministic Ethernet (3) • n = 1024 and f = 511 … crash faults, hence n > 2f • derive properties which are equivalent to Θ ≤ 2 in the system model • the results apply in TDMA networks as well (due to the inefficiency of the bus arbitration, Θ might be even smaller)
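A quick check of these numbers (added arithmetic): n = 1024 and f = 511 satisfy the crash-fault requirement, and the n − f messages guaranteed to arrive cover the f + 1 messages required on slide 32:

```latex
n > 2f \;\Longleftrightarrow\; 1024 > 2 \cdot 511 = 1022 ,
\qquad
n - f \;=\; 513 \;\ge\; f + 1 \;=\; 512 .
```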

Josef Widder34 Conclusions • the Θ-Model reaches higher assumption coverage • small timing uncertainty at lower network levels • τ+, τ−, ε, and Θ are related to the real network and the algorithm • Θ remains within reasonable bounds

Josef Widder35 Θanks!