
1 Distributed Systems and Concurrency: Synchronization in Distributed Systems
Majeed Kassis

2 Goals of Distributed Mutual Exclusion
The goals of distributed mutual exclusion are much like those of regular mutual exclusion:
- Safety: mutual exclusion
- Liveness: progress
- Fairness: bounded waiting time, and requests served in order. In what order? By logical time!
New issues arise from the distributed setting:
- Message traffic: we wish to reduce the number and size of messages needed to achieve mutual exclusion.
- Synchronization delay: we wish to reduce the time wasted waiting for permission to enter the critical section.

3 Distributed Mutual Exclusion Is Different
Regular mutual exclusion is solved using shared state, e.g. an atomic test-and-set of a shared variable, or a shared queue. Distributed mutual exclusion must be solved with message passing: there is no shared state.
Assumptions:
- The network is reliable. Ensuring reliability is someone else's job (the communication protocols).
- Processes may fail. This must be acceptable in a distributed system.
- The network is asynchronous. To preserve performance, we cannot enforce synchrony.

4 Solutions for Distributed Mutual Exclusion
- Centralized mutual exclusion: coordinator based
- Contention-based mutual exclusion: shared priority queue, timestamp prioritization schemes, voting schemes
- Controlled (token-based) mutual exclusion: ring based, tree based, broadcast based

5 Centralized: Coordinator Based
To enter the critical section: send REQUEST to the central server and wait for permission.
To leave: send RELEASE to the central server.
A sketch appears below.
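A minimal sketch of the coordinator protocol, assuming direct method calls stand in for the REQUEST/GRANT/RELEASE messages; the class and method names are illustrative, not from the slides:

```python
from collections import deque

class Coordinator:
    """Grants the critical section to one requester at a time (FIFO)."""
    def __init__(self):
        self.holder = None          # client currently in the critical section
        self.queue = deque()        # waiting clients, served in FIFO order

    def on_request(self, client):   # REQUEST arrives
        if self.holder is None:
            self.holder = client
            client.on_grant()       # GRANT: permission given immediately
        else:
            self.queue.append(client)

    def on_release(self, client):   # RELEASE arrives
        assert client is self.holder
        self.holder = self.queue.popleft() if self.queue else None
        if self.holder is not None:
            self.holder.on_grant()

class Client:
    def __init__(self, name, coordinator):
        self.name, self.coord = name, coordinator
    def enter(self):                # send REQUEST and wait for GRANT
        self.coord.on_request(self)
    def on_grant(self):
        print(f"{self.name} enters the critical section")
    def leave(self):                # send RELEASE
        self.coord.on_release(self)

coord = Coordinator()
c1, c2 = Client("P1", coord), Client("P2", coord)
c1.enter(); c2.enter()   # P1 is granted, P2 waits in the queue
c1.leave()               # P2 is granted
```

Note how a complete lock/unlock cycle costs exactly three messages (REQUEST, GRANT, RELEASE), matching the efficiency claim on the next slides.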

6 Centralized: Coordinator Based
Node 3 is the coordinator:
- Node 1 asks for permission; permission is granted.
- Node 2 asks for permission; it enters the wait queue.
- Node 1 sends a release; node 2 is granted permission.

7 Centralized: Coordinator Based
Use a single coordinator: ask it for permission to access the critical section. Requests are handled using a FIFO queue.
Advantages:
- Efficient: 3 messages per complete lock/unlock cycle
- Easy to implement
Disadvantages:
- Central point of failure
- Central performance bottleneck
- With an asynchronous network, in-order fairness is impossible to guarantee
- A central server must be elected/selected

8 A Shared Priority Queue
By Lamport, using Lamport clocks.
- Each process i locally maintains Qi, its copy of a shared priority queue.
- To run the critical section, a process must have replies from all other processes AND be at the front of Qi.
- When you have all replies: all other processes are aware of your request, and you are aware of any earlier requests for the mutex.

9 A Shared Priority Queue
To enter the critical section at process i:
- Stamp your request with the current time T.
- Add the request to Qi.
- Broadcast REQUEST(T) to all processes.
- Wait for all replies and for T to reach the front of Qi.
To leave:
- Pop the head of Qi and broadcast RELEASE to all processes.
On receipt of REQUEST(T') from process j:
- Add T' to Qi.
- If you are waiting for a REPLY from j for an earlier request T, wait until j replies to you (why?); otherwise, REPLY.
On receipt of RELEASE:
- Pop the head of Qi.
A sketch appears below.
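A minimal sketch of Lamport's protocol under the assumptions above (reliable, FIFO channels); all names are illustrative, and synchronous method calls stand in for messages. Because delivery here is instantaneous, the "wait until j replies" rule never actually triggers:

```python
import heapq

class LamportMutex:
    """One node in Lamport's shared-priority-queue mutex (a sketch)."""
    def __init__(self, pid, nodes):
        self.pid, self.nodes = pid, nodes   # nodes: pid -> LamportMutex
        self.clock = 0                      # Lamport clock
        self.queue = []                     # Qi: local copy of the shared queue
        self.replies = set()
        self.stamp = None                   # (T, pid) of our pending request

    def _tick(self, seen=0):
        self.clock = max(self.clock, seen) + 1
        return self.clock

    def request(self):
        self.stamp = (self._tick(), self.pid)    # stamp with current time T
        heapq.heappush(self.queue, self.stamp)   # add request to Qi
        self.replies.clear()
        for n in self.nodes.values():            # broadcast REQUEST(T)
            if n is not self:
                n.on_request(self.stamp)

    def on_request(self, stamp):
        self._tick(stamp[0])
        heapq.heappush(self.queue, stamp)        # add T' to Qi
        self.nodes[stamp[1]].on_reply(self.pid)  # REPLY right away

    def on_reply(self, sender):
        self.replies.add(sender)

    def can_enter(self):
        # All replies received AND our stamp is at the front of Qi.
        return (len(self.replies) == len(self.nodes) - 1
                and self.queue and self.queue[0] == self.stamp)

    def release(self):
        for n in self.nodes.values():            # broadcast RELEASE
            n.on_release()

    def on_release(self):
        heapq.heappop(self.queue)                # pop the head of Qi

nodes = {}
for i in range(3):
    nodes[i] = LamportMutex(i, nodes)
nodes[0].request()
print(nodes[0].can_enter())   # True: both replies in, (1, 0) heads every Qi
```

Ties on the timestamp T are broken by process id, which is why the queue holds (T, pid) pairs rather than bare timestamps.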

10-18 (Image-only slides: no transcript text survives.)

19 A Shared Priority Queue
Advantages:
- Fair (why?)
- Short synchronization delay
Disadvantages:
- Very unreliable: any process failure halts progress
- 3(N-1) messages per entry/exit

20 Timestamp: Ricart-Agrawala Algorithm
An improved version of Lamport's shared priority queue:
- Combines the functions of the REPLY and RELEASE messages.
- Delay your REPLY to any request later than your own.
- Send all delayed replies after you exit your critical section.

21 Timestamp: Ricart-Agrawala Algorithm
Requesting site: a requesting node Pi sends a request message RQ(timestamp, i) to all nodes.
Receiving site: upon receiving a request message, node Pj sends a timestamped reply RP(timestamp, j) if and only if:
- Pj is not requesting or executing the critical section, OR
- Pj is requesting the critical section but its own request carries a higher timestamp than Pi's.
Otherwise, Pj defers (delays) the reply message until it exits the critical section.
Entering the critical section: once all reply messages are received. A sketch appears below.
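A sketch of the Ricart-Agrawala rules, again with direct calls standing in for messages and with illustrative names; note how the deferred list plays the combined REPLY/RELEASE role:

```python
class RicartAgrawala:
    """One node in Ricart-Agrawala (a minimal sketch)."""
    def __init__(self, pid, nodes):
        self.pid, self.nodes = pid, nodes  # nodes: pid -> RicartAgrawala
        self.clock = 0
        self.requesting = False            # True from request() to release()
        self.stamp = None                  # (timestamp, pid) of our request
        self.replies = set()
        self.deferred = []                 # replies withheld until we leave

    def request(self):
        self.clock += 1
        self.requesting, self.stamp = True, (self.clock, self.pid)
        self.replies.clear()
        for n in self.nodes.values():      # RQ(timestamp, i) to all nodes
            if n is not self:
                n.on_request(self.stamp, self.pid)

    def on_request(self, stamp, sender):
        self.clock = max(self.clock, stamp[0]) + 1
        if self.requesting and self.stamp < stamp:
            self.deferred.append(sender)   # our request is earlier: defer
        else:
            self.nodes[sender].on_reply(self.pid)   # RP(timestamp, j)

    def on_reply(self, sender):
        self.replies.add(sender)

    def can_enter(self):
        return self.requesting and len(self.replies) == len(self.nodes) - 1

    def release(self):
        self.requesting = False
        for sender in self.deferred:       # combined REPLY + RELEASE
            self.nodes[sender].on_reply(self.pid)
        self.deferred.clear()

nodes = {}
for i in range(3):
    nodes[i] = RicartAgrawala(i, nodes)
nodes[0].request()
print(nodes[0].can_enter())    # True: both peers replied at once
nodes[1].request()
print(nodes[1].can_enter())    # False: node 0 has priority and defers
nodes[0].release()             # the deferred reply doubles as the release
print(nodes[1].can_enter())    # True
```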

22 Example: Ricart-Agrawala
Three nodes: S1, S2, S3.
- S1 sends its request (2,1), stamped at timestamp 2, to S2 and S3.
- S2 sends its request (1,2), stamped at timestamp 1, to S1 and S3.
- Does S2 answer S1? No: S2's timestamp is lower than S1's.
- Does S1 answer S2? Yes: S1's timestamp is higher than S2's.

23 Example: Ricart-Agrawala
- S1 replies to S2 (S1's timestamp is higher than S2's).
- S3 replies to S2.
- S3 replies to S1.
- S2 enters the critical section.

24 Example: Ricart-Agrawala
- Once S2 is done with the critical section, it sends its deferred reply to S1.
- S1 enters the critical section.

25 Ricart and Agrawala: Safety
Suppose request T1 is earlier than T2. Consider how the process that issued T2 collects its reply from the process that issued T1:
- T1 must already have been timestamped when request T2 was received; otherwise the Lamport clock would have been advanced past time T2, and T1 would not be the earlier request (why?).
- But then the T1 process must have delayed its reply to T2 until after it exited the critical section.
- Therefore T2 cannot conflict with T1.

26 Timestamp: Ricart-Agrawala Algorithm
Advantages:
- Fair
- Short synchronization delay
- Better than Lamport's algorithm (in what?)
Disadvantages:
- Unreliable: a single node failure causes starvation
- 2(N-1) messages per synchronization procedure

27 Voting: Majority Rules
Instead of collecting REPLYs, collect VOTEs:
- Each process VOTEs for which process may hold the mutex.
- Each process can cast only one VOTE at any given time.
- You hold the mutex if you have a majority of the VOTEs.
- Only one process can hold a majority at any given time!

28 Voting: Majority Rules
To enter the critical section at process i:
- Broadcast REQUEST(T) and collect VOTEs.
- Enter the critical section once you have collected a majority of VOTEs.
To leave:
- Broadcast RELEASE-VOTE to every process that VOTEd for you.
On receipt of REQUEST(T') from process j:
- If you have not VOTEd, VOTE for T'.
- Otherwise, add T' to Qi.
On receipt of RELEASE-VOTE:
- If Qi is not empty, VOTE for pop(Qi).
A sketch appears below.
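A sketch of majority-rules voting. The names are illustrative; the REQUEST(T) timestamps are omitted for brevity, and each node's FIFO `pending` queue stands in for the slide's Qi:

```python
from collections import deque

class Voter:
    """Majority-rules mutex (a minimal sketch, no vote rescinding)."""
    def __init__(self, pid, nodes):
        self.pid, self.nodes = pid, nodes  # nodes: pid -> Voter
        self.voted_for = None    # pid our single vote is lent to, or None
        self.pending = deque()   # requests queued while our vote is out
        self.votes = set()       # pids currently voting for us

    def request(self):
        self.votes.clear()
        for n in self.nodes.values():      # includes ourselves: we may
            n.on_request(self.pid)         # vote for our own request

    def on_request(self, sender):
        if self.voted_for is None:
            self.voted_for = sender
            self.nodes[sender].on_vote(self.pid)
        else:
            self.pending.append(sender)    # our one vote is already out

    def on_vote(self, sender):
        self.votes.add(sender)

    def has_mutex(self):
        return len(self.votes) > len(self.nodes) // 2   # strict majority

    def release(self):
        for voter in list(self.votes):     # RELEASE-VOTE to our voters
            self.nodes[voter].on_release_vote()
        self.votes.clear()

    def on_release_vote(self):
        self.voted_for = self.pending.popleft() if self.pending else None
        if self.voted_for is not None:     # vote for the next in line
            self.nodes[self.voted_for].on_vote(self.pid)

nodes = {}
for i in range(5):
    nodes[i] = Voter(i, nodes)
nodes[0].request()
print(nodes[0].has_mutex())    # True: 5 of 5 votes
nodes[1].request()
print(nodes[1].has_mutex())    # False: every vote is already lent out
nodes[0].release()
print(nodes[1].has_mutex())    # True: the returned votes go to node 1
```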

29 Voting: Majority Rules
Advantages:
- Can succeed even under node failures: up to N/2 - 1 failed nodes!
Disadvantages:
- Not fair
- Deadlock! There is no guarantee that anyone receives a majority of the votes.

30 Dealing with Deadlocks in Majority Rules
Allow processes to ask for their vote back:
- If you have already VOTEd for T' and receive an earlier request T, send RESCIND-VOTE for T'.
- If you receive a RESCIND-VOTE request and are not in the critical section, RELEASE-VOTE and re-REQUEST.
- This guarantees that some process will eventually collect a majority of VOTEs.
- Still not fair...

31 Voting: Maekawa Voting
Instead of asking permission from all processes, ask permission only from a subset of processes.
Subset characteristics:
- Subsets must overlap! Why? Any two subsets share a member, and that member votes only once, so two processes can never be granted at the same time.
- Subset sizes must be roughly the same.
Requesting:
- A process sends a vote request to every member of its subset.
- When all replies are received, access is granted.
Responding:
- If the process has not already given its vote away, it replies "YES" (otherwise it queues the request).
When a process exits its critical section:
- It informs its subset, whose members may then vote for other candidates.
Uses fewer messages than majority rules! One subset construction is sketched below.
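The slides do not specify how the subsets are built. One simple construction satisfying the overlap requirement is a grid quorum (an assumption here, not Maekawa's original scheme, which uses finite projective planes to get sets of size about sqrt(N)):

```python
import math

def grid_quorums(n):
    """Voting sets from a k x k grid (assumes n = k * k).

    Node i's set is its whole row plus its whole column, so any two
    sets share at least one node: the overlap Maekawa requires.
    """
    k = math.isqrt(n)
    assert k * k == n, "this simple construction needs a square grid"
    quorums = []
    for i in range(n):
        r, c = divmod(i, k)
        row = {r * k + j for j in range(k)}   # node i's row
        col = {j * k + c for j in range(k)}   # node i's column
        quorums.append(row | col)             # size 2k - 1, about 2*sqrt(n)
    return quorums

qs = grid_quorums(9)          # 3x3 grid: voting sets of size 5
assert all(qs[a] & qs[b] for a in range(9) for b in range(9))
print(qs[0])                  # {0, 1, 2, 3, 6}
```

The overlap is structural: the sets for nodes a and b always share the grid cell at a's row and b's column, so no pairwise agreement is needed.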

32 Voting: Maekawa Voting
- Can deadlock, and is not fair.
- A RESCIND-VOTE scheme can fix the deadlock...
- Unreliable: any failure within your voting set prevents you from getting the mutex.

33 Tokens: Ring-Based
A token is passed around a ring of processes (P0 -> P1 -> ... -> P5 -> P0):
- Once the token reaches a process, that process may enter the critical section.
- You can enter the critical section only if you hold the token.
- You hold the token until finished.
- You cannot use the same token twice in a row.
Notes:
- Long synchronization delay: you may need to wait for up to N-1 messages, for N processors.
- Not in-order.
- Very unreliable: any process failure breaks the ring.

34 A Fair Ring-Based Algorithm
The token contains the time t of the earliest known outstanding request.
To enter the critical section:
- Stamp your request with the current time Tr and wait for the token.
- When you receive the token carrying time t while waiting with a request from time Tr, compare Tr to t:
  - If Tr = t: hold the token and run the critical section.
  - If Tr > t: pass the token.
  - If t is not set or Tr < t: set the token time to Tr, pass the token, and keep waiting.
To leave the critical section:
- Set the token time to null (i.e., unset it) and pass the token.
A sketch appears below.
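A sketch of the fair ring, with the token modeled as a value threaded through the nodes; the names and the driver loop are illustrative:

```python
class RingNode:
    """One node in the fair ring (a minimal sketch)."""
    def __init__(self, pid):
        self.pid = pid
        self.request_time = None   # Tr of our outstanding request, if any

    def handle_token(self, t):
        """Return ('enter', t) to run the CS, or ('pass', new_t)."""
        tr = self.request_time
        if tr is not None and t == tr:
            return "enter", t          # Tr == t: ours is the earliest request
        if tr is not None and (t is None or tr < t):
            t = tr                     # t unset or Tr < t: claim the next slot
        return "pass", t
        # To leave the CS, the holder would reset t to None and pass the token.

def circulate(nodes, t=None):
    """Drive the token around until some node may enter its critical section."""
    i = 0
    while True:
        action, t = nodes[i].handle_token(t)
        if action == "enter":
            return nodes[i], t
        i = (i + 1) % len(nodes)

ring = [RingNode(i) for i in range(4)]
ring[2].request_time = 7     # two outstanding requests...
ring[3].request_time = 5     # ...node 3's is earlier
winner, _ = circulate(ring)
print(winner.pid)            # 3: the earlier request wins, whatever its position
```

The first full lap only records the earliest request time in the token; the second lap delivers the token to that requester, which is what makes the ring fair.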

35 Tree-Based Algorithms
Use a spanning tree to reduce the number of messages exchanged per critical-section access:
- The root of the tree holds the token at the beginning.
- The processes are organized in a logical tree structure, each node pointing to its parent.
- Each node maintains a queue of token-requesting neighbors.

36 Tree Structure
- Nodes are arranged as a tree; every node has a parent pointer.
- Each node has a FIFO queue of requests.
- There is one token, initially at the root, and the holder of the token may enter the critical section.
- Message complexity: approximately O(log N).

37 Dynamic Nature of the Tree
As the token moves across nodes, the parent pointers change. They always point towards the holder of the token. It is thus possible to reach the token by following parent pointers.

38 Tree Based: Raymond Algorithm
- If a node i (not holding the token) wishes to receive the token in order to enter its critical section, it sends a request to its parent, node j.
- If node j's FIFO queue is empty, node j adds i to its queue and issues a request to its own parent, k, indicating that it wants the token.
- If node j's FIFO queue is not empty, it simply adds i to the queue (it has already requested the token).
- When node k holds the token and receives the request from j, it sends the token to j and sets j as its parent.
- When node j receives the token from k, it forwards the token to i (the head of its queue) and removes i from the queue.
- If j's queue is not empty after forwarding the token to i, j must issue a request to i in order to get the token back.
A sketch appears below.
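A sketch of Raymond's algorithm with synchronous calls standing in for messages (the names are illustrative); parent == None marks the token holder:

```python
from collections import deque

class RaymondNode:
    """One node in Raymond's tree-based token algorithm (a minimal sketch)."""
    def __init__(self, pid):
        self.pid = pid
        self.parent = None       # None: this node currently holds the token
        self.queue = deque()     # FIFO of requesters (neighbors or self)
        self.in_cs = False

    def request_cs(self):
        self._on_request(self)

    def _on_request(self, requester):
        self.queue.append(requester)
        if self.parent is None and not self.in_cs:
            self._forward_token()               # idle holder: hand it over
        elif self.parent is not None and len(self.queue) == 1:
            self.parent._on_request(self)       # first waiter: ask our parent

    def _forward_token(self):
        head = self.queue.popleft()
        if head is self:
            self.in_cs = True                   # our own request: enter the CS
        else:
            self.parent = head                  # token leaves along this edge
            head._receive_token()
            if self.queue:                      # others still wait behind us:
                head._on_request(self)          # ask for the token back

    def _receive_token(self):
        self.parent = None
        self._forward_token()

    def release_cs(self):
        self.in_cs = False
        if self.queue:
            self._forward_token()

# A three-node chain: root (token holder) <- mid <- leaf.
root, mid, leaf = RaymondNode("root"), RaymondNode("mid"), RaymondNode("leaf")
mid.parent, leaf.parent = root, mid
leaf.request_cs()            # the token flows root -> mid -> leaf
print(leaf.in_cs)            # True; parent pointers now aim at leaf
root.request_cs()            # queued: leaf is still inside its CS
leaf.release_cs()            # the token flows back up; root enters
print(root.in_cs)            # True
```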

39 Example: Raymond Algorithm
Nodes 1, 4, and 7 want to enter their critical sections. Node 3 holds the token.
Step 1: Node 1 wants to enter its critical section. Queue 1 is empty, so 1 is added to queue 1 and a request is sent to 6. Queue 6 is empty, so 1 is added to queue 6 and a request is sent to 3. Queue 3 is empty, so 1 is added to queue 3; node 3 has no outgoing arrow (it holds the token), so the request stops.
Step 2: Node 4 wants to enter its critical section. Queue 4 is empty, so 4 is added to queue 4 and a request is sent to 7. Queue 7 is empty, so 4 is added to queue 7 and a request is sent to 6. Queue 6 is not empty (6 has already requested the token), so 4 is added to queue 6 and the request halts there.
Step 3: Node 7 wants to enter its critical section. Queue 7 is not empty, so 7 is added to queue 7 and the request halts there.
Resulting queues: queue 3 = (1), queue 6 = (1,4), queue 7 = (4,7).

40 Example: Raymond Algorithm
The topology is known. Node 3 finishes and wants to send the token to the next in line: node 1 is at the top of queue 3. Node 1 is removed from queue 3, and the token is sent to 6 (the edge toward 1). Node 6 checks the top of its queue (node 1), removes it, and sends the token on toward node 1, together with a request for the token back (node 4 is still waiting in queue 6).

41 Example: Raymond Algorithm
The arrows (parent pointers) change direction as the token travels: the two directed edges along the token's path reverse their direction.
The message complexity is O(diameter) of the tree; the authors claim that the average message complexity is O(log n).

42 Example: Raymond Algorithm
4 is added to queue 1 (the request that accompanied the token). Node 1 finishes executing, removes the top of its queue, and sends the token to 6, which forwards it on toward node 4.

43 Raymond Algorithm
- Safety: no two nodes can hold the token at the same time, and thus no two nodes can be in the critical section at once.
- No deadlock: requests cannot get lost, because the parent pointers at all times form a rooted tree pointing toward the token holder.
- No starvation: eventually a starved process's request reaches the front of all the request queues along the path. At that point it has the highest priority, and the token must flow back to the starved process.

44 Broadcast Based: Suzuki-Kasami
There is a single token in the system:
- If a process needs to enter a critical section, it broadcasts a request for the token!
- The process holding the token sends it to the requestor if it does not need it itself.
- Instead of enforcing a network topology (ring, tree), we simply broadcast requests to all nodes in the network.
Data structures:
- Each process has a "request" array containing the latest request numbers of the other processes.
- The token holder has a "last" array containing each process's latest completed access to the critical section.
- A queue "waiting" of processes wishing to enter the critical section travels with the token.

45 Suzuki-Kasami Algorithm
Pi entering the critical section:
- Increment request[i] and broadcast (i, request[i]).
- Wait until the token (the waiting queue and the last array) is received.
- Enter the critical section.
Pi exiting the critical section:
- Update last[i] = request[i]; this records the new latest access to the critical section.
- For every Pj: if j is not in the waiting queue and request[j] = last[j] + 1, add j to the waiting queue.
- If the waiting queue is not empty, send the token and the queue to the process at the front of the waiting queue (after removing it, of course!).
When Pj receives a new request (i, requestVal) from Pi:
- It updates request[i] = max(request[i], requestVal).
The check request[j] = last[j] + 1 ensures that node j has an outstanding request that has not yet been served; if the two are not equal, the recorded request is outdated.
A sketch appears below.
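A sketch of Suzuki-Kasami with direct calls standing in for broadcast (the names are illustrative); the token is the pair (last, waiting):

```python
from collections import deque

class SuzukiKasami:
    """One node in Suzuki-Kasami (a minimal sketch)."""
    def __init__(self, pid, n, nodes):
        self.pid, self.nodes = pid, nodes
        self.request = [0] * n   # highest request number heard from each node
        self.token = None        # (last, waiting) when we hold the token
        self.in_cs = False

    def request_cs(self):
        self.request[self.pid] += 1
        rn = self.request[self.pid]
        for node in self.nodes:                    # broadcast (i, request[i])
            if node is not self:
                node.on_request(self.pid, rn)
        if self.token is not None:                 # token may arrive at once
            self.in_cs = True

    def on_request(self, i, rn):
        self.request[i] = max(self.request[i], rn)
        if self.token is not None and not self.in_cs:
            last, _ = self.token
            if self.request[i] == last[i] + 1:     # current, not outdated
                self._pass_token_to(i)

    def release_cs(self):
        self.in_cs = False
        last, waiting = self.token
        last[self.pid] = self.request[self.pid]    # record our finished access
        for j in range(len(self.nodes)):           # enqueue fresh requesters
            if j not in waiting and self.request[j] == last[j] + 1:
                waiting.append(j)
        if waiting:
            self._pass_token_to(waiting.popleft())

    def _pass_token_to(self, j):
        token, self.token = self.token, None
        self.nodes[j].on_token(token)

    def on_token(self, token):
        self.token = token
        self.in_cs = True

nodes = []
for i in range(3):
    nodes.append(SuzukiKasami(i, 3, nodes))
nodes[0].token = ([0, 0, 0], deque())   # node 0 starts as the token holder
nodes[1].request_cs()                   # the idle holder hands the token over
print(nodes[1].in_cs)                   # True
nodes[0].request_cs()                   # node 1 is busy; request is recorded
nodes[1].release_cs()                   # request[0] == last[0]+1: node 0 next
print(nodes[0].in_cs)                   # True
```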

46 Broadcast Based: Suzuki-Kasami
The request and last arrays are used for tracking purposes only; the queue that travels with the token is what ensures the order and correctness of entering the critical section.
Each node has an array with one slot per node. Initial state: process 0 has sent a request to all and grabbed the token.
State: every node has req=[1,0,0,0,0]; the token at node 0 has last=[0,0,0,0,0], Q=().

47 Broadcast Based: Example
Nodes 1 and 2 send requests to enter the critical section; 1 and 2 are added to 0's Q.
State: every node has req=[1,1,1,0,0]; the token at node 0 has last=[0,0,0,0,0], Q=(1,2).

48 Broadcast Based: Example
Node 0 prepares to exit the critical section.
State: every node has req=[1,1,1,0,0]; the token at node 0 has last=[1,0,0,0,0], Q=(1,2).

49 Broadcast Based: Example
Node 0 passes the token (Q and last) to node 1.
State: every node has req=[1,1,1,0,0]; the token at node 1 has last=[1,0,0,0,0], Q=(2).

50 Broadcast Based: Example
Nodes 0 and 3 send requests.
State: every node has req=[2,1,1,1,0]; the token at node 1 has last=[1,0,0,0,0], Q=(2,0,3).

51 Broadcast Based: Example
Node 1 sends the token to node 2.
State: every node has req=[2,1,1,1,0]; the token at node 2 has last=[1,1,0,0,0], Q=(0,3).

52 Algorithm Characteristics
- Only the site holding the token can access the critical section.
- All processes are involved in the assignment of the critical section.
- Request messages are sent to all nodes and carry sequence numbers, used to keep track of outdated requests; the sequence numbers advance independently at each site.
- The main design issue of the algorithm is telling outdated requests apart from current ones.

53 Suzuki-Kasami Algorithm: Advantages
- A requesting process gets the lock in finite time (no deadlock, no starvation): the request reaches all processes in finite time, so by induction one of them will hold the token in finite time, and the current request will be added to its Q. A request is therefore served after at most N-1 others.
- No need to maintain a logical structure (ring, tree).

