Hwajung Lee

Models are simple abstractions that help understand the variability -- abstractions that preserve the essential features, but hide the implementation details from observers who view the system at a higher level.
Results derived from an abstract model are applicable to a wide range of platforms.

Two primary models capture the essence of interprocess communication:
- Message-passing model
- Shared-memory model

Process actions in a distributed system can be represented in the following notation.
System topology is a graph G = (V, E), where
V = set of nodes (sequential processes),
E = set of edges (links or channels, bidirectional or unidirectional).
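As a purely illustrative sketch (not part of the original slides), one common way to represent such a topology in code is an adjacency list; the four-node graph below is made up for the example.

# Hypothetical topology with V = {0, 1, 2, 3}; edges are directed channels.
topology = {
    0: [1, 2],   # process 0 has outgoing channels to processes 1 and 2
    1: [3],
    2: [3],
    3: [],
}

def channels(graph):
    """Enumerate the edges (channels) of the topology graph G = (V, E)."""
    return [(u, v) for u, nbrs in graph.items() for v in nbrs]

print(channels(topology))   # [(0, 1), (0, 2), (1, 3), (2, 3)]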

Four types of actions by a process:
- Internal action: a process performs computations in its own address space, resulting in the modification of one or more of its local variables.
- Communication action: a process sends a message to another process or receives a message from another process.
- Input action: a process reads data from sources external to the system.
- Output action: a process sends data outside of the system (called the "environment").

Possible dimensions of variability in distributed systems:
- Type of channels: point-to-point or not; reliable vs. unreliable channels
- Type of system: synchronous vs. asynchronous systems

Channels in a message-passing model:
- Messages propagate along directed edges called channels.
- Communications are assumed to be point-to-point (a broadcast is treated as a set of point-to-point communications).
- Reliable vs. unreliable channels: with a reliable channel, loss or corruption of messages is not considered; unreliable channels will be considered in a later chapter.

Axiom 1. Message m sent ⇒ message m received.
Axiom 2. Message propagation delay is arbitrary but finite.
Axiom 3. m1 sent before m2 ⇒ m1 received before m2.
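A minimal simulation sketch of a channel obeying these three axioms (my own illustration in Python, not from the slides): nothing is lost, delivery may be delayed arbitrarily but not forever, and order is preserved.

import collections

class FifoChannel:
    """Reliable FIFO channel between P and Q: no loss, arbitrary finite delay,
    messages delivered in the order they were sent."""
    def __init__(self):
        self.in_transit = collections.deque()

    def send(self, m):
        self.in_transit.append(m)       # Axiom 1: the message is never dropped

    def receive(self):
        # Axiom 2 is modeled by the receiver polling until a message is present;
        # Axiom 3: popleft() delivers messages in sending order.
        return self.in_transit.popleft() if self.in_transit else None

ch = FifoChannel()
for m in ["m1", "m2", "m3"]:
    ch.send(m)
while ch.in_transit:
    print("received", ch.receive())     # m1, m2, m3, in that order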

Broad notion of synchrony: senders and receivers maintain synchronized clocks and operate with a rigid temporal relationship.
In practice, however, there are many aspects of synchrony, and the transition from a fully asynchronous to a fully synchronous model is a gradual one.
Thus, we will see a few examples of the behaviors characterizing a synchronous system.

Send & receive can be blocking or non-blocking. Postal communication is asynchronous; telephone communication is synchronous.
Synchronous clocks: physical clocks are synchronized within a bound.
Synchronous processes: lock-step synchrony.
Synchronous channels: bounded delay.
Synchronous message order: first-in first-out channels.
Synchronous communication: communication via handshaking.
Any restriction defines some form of synchrony …

Address spaces of processes overlap. Concurrent operations on a shared variable are serialized.

State reading model: each process can read the states of its neighbors.
Link register model: each process can read from and write to adjacent registers. The entire local state is not shared.

- Communication via broadcast
- Limited range
- Dynamic topology

Mobile agent:
- program code that migrates from one process to another
- can interact with the variables of programs running on the host machine, use its resources, and take autonomous routing decisions
- strong mobility / weak mobility
- examples: Dartmouth's D'Agents, IBM's Aglets

Model: (I, P, B)
- I is the agent identifier and is unique for every agent.
- P designates the agent program.
- B is the briefcase and represents the data variables to be used by the agent.

Computing the lowest price of an item that is being sold in n different stores:
- I: agent identifier
- P: price(i) denotes the price of the item in store i
- B: briefcase variable best

initially best = price(home)
while current ≠ home do
    if price(i) < best then best := price(i) else skip end if;
    visit next {next is determined by a traversal algorithm}
end while
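The same computation as a small Python sketch (my own rendering, not the textbook's code); the store names, prices, and route below are made-up example data, and the route list plays the role of the traversal algorithm that chooses "next".

# Hypothetical prices; "home" is where the agent starts and finishes.
price = {"home": 90, "store1": 75, "store2": 80, "store3": 60}
route = ["store1", "store2", "store3", "home"]   # produced by some traversal algorithm

def lowest_price_agent(price, route):
    best = price["home"]          # briefcase variable B, initialized at home
    for current in route:         # "visit next" until the agent returns home
        if current == "home":
            break
        if price[current] < best:
            best = price[current]
    return best

print(lowest_price_agent(price, route))   # 60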

One object (or operation) of a strong model = more than one object (or operation) of a weaker model. Often, weaker models are synonymous with fewer restrictions. One can add layers (additional restrictions) to create a stronger model from a weaker one.
Examples:
- The HLL model is stronger than the assembly language model.
- Asynchronous is weaker than synchronous.
- Bounded delay is stronger than unbounded delay (channel).

Stronger models
- simplify reasoning (designing an algorithm)
- simplify the correctness proof
- but need extra work to implement
Weaker models
- are easier to implement
- have a closer relationship with the real world
"Can model X be implemented using model Y?" is an interesting question in computer science. Sample problems (weak → strong):
- Non-FIFO to FIFO channel
- Message passing to shared memory
- Non-atomic broadcast to atomic broadcast

Non-FIFO to FIFO channel: sender P sends out m1, m2, m3, m4, … to receiver Q, which uses a buffer to restore the sending order.

{Sender process P}
var i : integer {initially 0}
repeat
    send m[i], i to Q;
    i := i + 1
forever

{Receiver process Q}
var k : integer {initially 0}
buffer : buffer[0..∞] of msg {initially ∀k: buffer[k] = empty}
repeat
    {STORE} receive m[i], i from P;
    store m[i] into buffer[i];
    {DELIVER} while buffer[k] ≠ empty do
    begin
        deliver content of buffer[k];
        buffer[k] := empty;
        k := k + 1;
    end
forever
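For concreteness, a runnable Python sketch of the receiver's STORE/DELIVER logic (my own rendering of the pseudocode above, with a dictionary standing in for the unbounded buffer):

def resequence(arrivals):
    """Deliver messages in sequence-number order, whatever order they arrive in.
    `arrivals` is an iterable of (seq, msg) pairs as tagged by sender P."""
    buffer = {}       # stands in for the unbounded array buffer[0..∞]
    k = 0             # next sequence number to deliver
    delivered = []
    for seq, msg in arrivals:        # STORE
        buffer[seq] = msg
        while k in buffer:           # DELIVER everything that is now contiguous
            delivered.append(buffer.pop(k))
            k += 1
    return delivered

# Messages m0..m3 arrive out of order over the non-FIFO channel:
print(resequence([(2, "m2"), (0, "m0"), (1, "m1"), (3, "m3")]))
# ['m0', 'm1', 'm2', 'm3']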

Observations: this needs (a) unbounded sequence numbers and (b) an unbounded number of buffer slots. Both are bad.
Now solve the same problem on a model where
(a) the propagation delay has a known upper bound of T,
(b) the messages are sent at a rate of r per unit time,
(c) the messages are received at a rate faster than r.
Then the buffer requirement drops to r × T. Thus, synchrony pays.
Question. How do we solve the problem using bounded buffer space if the propagation delay is arbitrarily large? By using acknowledgements.
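For a concrete (made-up) instance: if messages are sent at r = 1000 messages per unit time and the delay bound is T = 0.05 time units, then at most r × T = 50 messages can be in transit at any moment, so 50 buffer slots suffice no matter how long the protocol runs.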

{Read X by process i}
read x[i]

{Write X := v by process i}
- x[i] := v;
- atomically broadcast v to every other process j (j ≠ i);
- after receiving the broadcast, process j (j ≠ i) sets x[j] to v.

Understand the significance of atomic operations. It is not trivial, but it is very important in distributed systems.
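A minimal single-machine sketch of this read-locally / broadcast-on-write scheme (my own illustration; here the "atomic broadcast" is abstracted as a plain loop over the replicas, which hides exactly the hard part):

class SharedVariable:
    """Each process i keeps a local replica x[i]; a write is broadcast to all."""
    def __init__(self, n, initial=0):
        self.x = [initial] * n          # x[0..n-1], one copy per process

    def read(self, i):
        return self.x[i]                # read X by process i: purely local

    def write(self, i, v):
        self.x[i] = v                   # x[i] := v
        for j in range(len(self.x)):    # "atomically broadcast v to every other j"
            if j != i:
                self.x[j] = v           # process j sets x[j] to v on receipt

X = SharedVariable(n=4)
X.write(2, 42)
print([X.read(i) for i in range(4)])    # [42, 42, 42, 42]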

Atomic broadcast = either everybody or nobody receives the message.
{process i is the sender}
for j = 1 to n-1 (j ≠ i) send message m to neighbor[j]
(Easy! But…) There needs to be a lock, some type of synchronization, or acknowledgements to make sure all the receivers got the message and updated their data. We will discuss this in more detail in Chapter 4.4.
Now include crash failure as a part of our model. What if the sender crashes in the middle of the task? How do we implement atomic broadcast in the presence of crashes? Chapter 16 covers this issue.
Note: Faults and fault-tolerant systems are covered in Part D of the textbook, Chapters 12 to 17.
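To see why the naive loop is not atomic once crashes are allowed, here is a small Python sketch (my own, with an artificial crash_after parameter) in which the sender stops after delivering to only some of the receivers:

def naive_broadcast(n, sender=0, crash_after=None):
    """Sender sends m to the other n-1 processes one at a time; if it crashes
    after `crash_after` sends, only part of the group ever receives m."""
    received = [False] * n
    received[sender] = True
    sent = 0
    for j in range(n):
        if j == sender:
            continue
        if crash_after is not None and sent == crash_after:
            break                       # sender crashes mid-broadcast
        received[j] = True              # process j receives m
        sent += 1
    return received

print(naive_broadcast(5))                   # [True, True, True, True, True]
print(naive_broadcast(5, crash_after=2))    # [True, True, True, False, False]: not atomic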

Reactive vs. transformational systems
A reactive system never sleeps (e.g., a server or a set of servers).
A transformational (or non-reactive) system reaches a fixed point, and no further change occurs in the system after that. (Examples?)

Named vs. anonymous systems
In named systems, the process id is a part of the algorithm (e.g., routing table calculation). In anonymous systems, it is not: all processes are equal.
(-) Symmetry breaking is often a challenge: all processes start from the same initial state and execute identical instructions at every step, but the outcome needs to be asymmetric. (Ex.) Leader election algorithm.
(+) Easy to replace one process by another with no side effect.
(+) From a space-complexity point of view, saves the log₂ N bits each process would otherwise need to store its name, where N is the number of processes.

Space complexity: how much space is needed per process to run an algorithm? (Measured in terms of N, the size of the network.)
Time complexity: what is the maximum time (number of steps) needed to complete the execution of the algorithm?
Message complexity: how many messages are exchanged to complete the execution of the algorithm?

Bit complexity: measures how many bits are transmitted when the algorithm runs. It may be a better measure, since messages may be of arbitrary size.

Consider initializing the values of a variable x at the nodes of an n-cube. Process 0 is the leader, broadcasting a value v to initialize the cube. Here n = dimension = 3 and N = total number of processes = 2^n = 8.
Each process j > 0 has a variable x[j], whose initial value is arbitrary. Finally, x[0] = x[1] = x[2] = … = x[7] = v.

{Process 0}
m.value := x[0]; send m to all neighbors

{Process i > 0}
repeat
    receive m {m contains the value};
    if m is received for the first time then
        x[i] := m.value;
        send x[i] to each neighbor j > i
    else discard m
    end if
forever

What is the message complexity? Number of edges = |E| = ½ · N · log₂ N.
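A small simulation sketch (my own, not from the slides) that runs this broadcast on the 3-cube and counts messages; in an n-cube, neighbors are the nodes whose labels differ in exactly one bit.

from collections import deque

def hypercube_broadcast(n):
    """Broadcast from process 0 over the n-cube, forwarding only to higher-numbered
    neighbors as in the program above; returns (message count, everyone reached?)."""
    N = 2 ** n
    neighbors = lambda i: [i ^ (1 << b) for b in range(n)]   # flip one bit
    got_value = [False] * N
    messages = 0
    queue = deque()

    got_value[0] = True
    for j in neighbors(0):               # process 0 sends m to all neighbors
        queue.append(j)
        messages += 1

    while queue:
        i = queue.popleft()
        if got_value[i]:
            continue                     # m received before: discard
        got_value[i] = True              # x[i] := m.value
        for j in neighbors(i):
            if j > i:                    # send x[i] to each neighbor j > i
                queue.append(j)
                messages += 1
    return messages, all(got_value)

print(hypercube_broadcast(3))            # (12, True): 12 = |E| = ½ · 8 · log₂ 8 messages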

{Process 0} x[0] := v

{Process i > 0}
repeat
    if there exists a neighbor j < i : x[i] ≠ x[j]
        then x[i] := x[j] (PULL DATA) {this is a step}
        else skip
    end if
forever

What is the time complexity (i.e., how many steps are needed)? Arbitrarily large. Why? A sender is passive, and it is the receiver's responsibility to pull the value.

With this program, node 7 can keep copying from nodes 3, 5, and 6 indefinitely long before the value in node 0 is eventually copied into it.

Now use "large atomicity": in one step, a process j reads the state x[k] of each neighbor k < j, and updates x[j] only when these are all equal but different from x[j].
What is the space complexity per process?
What is the time complexity? How many steps are needed? Time complexity is now O(n²) = O((log₂ N)²), where n = dimension of the cube. Why?

Round complexity (time complexity in rounds)
Rounds are truly defined for synchronous systems. An asynchronous round consists of a number of steps in which every eligible process (including the slowest one) takes at least one step.
How many rounds will you need to complete the broadcast using the large atomicity model?
(Ex.) In a system with four processes 0, 1, 2, 3 and a given sequence of steps: how many rounds is this? Two rounds.
An easier way to measure complexity in rounds is to assume that processes execute their steps in lock-step synchrony.
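As a companion sketch (my own, with made-up initial values), here is a lock-step simulation of the large-atomicity program on the 3-cube that counts rounds, where one round means every process takes exactly one step against the same snapshot:

def large_atomicity_rounds(n, v=1):
    """Lock-step simulation of the large-atomicity pull program on the n-cube.
    Each round, every process j > 0 reads all neighbors k < j from the previous
    snapshot and copies their value only if they all agree and differ from x[j].
    Returns the number of rounds until every x[j] equals v."""
    N = 2 ** n
    neighbors = lambda i: [i ^ (1 << b) for b in range(n)]
    x = [v] + [-1] * (N - 1)          # arbitrary (here identical) initial values
    rounds = 0
    while any(xi != v for xi in x):
        old = list(x)                 # lock-step: everyone reads the same old state
        for j in range(1, N):
            lower = [old[k] for k in neighbors(j) if k < j]
            if lower and len(set(lower)) == 1 and lower[0] != old[j]:
                x[j] = lower[0]       # the single "large atomic" step
        rounds += 1
    return rounds

print(large_atomicity_rounds(3))      # 3 rounds on the 3-cube with these initial values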