0 Synchronization Chapter 5

1 Contents
Clock Synchronization
Logical Clocks
Global State
Election Algorithms
Mutual Exclusion
Distributed Transactions
Conclusion & Critical Idea

2 Clock Synchronization
A simple question: Is it possible to synchronize all the clocks in a distributed system?

3 Physical Clocks (1)
Key concepts: timer, counter, holding register, clock tick, clock skew.
Problems: How do we synchronize the clocks with real-world time? How do we synchronize the clocks with each other?
Mean solar second: measure a large number of days, take the average, and divide by 86,400.
TAI: the mean number of ticks of the cesium-133 clocks since 1/1/1958, divided by 9,192,631,770.

4 Physical Clocks (2)
TAI is highly stable, but TAI days drift behind solar days, so leap seconds are inserted to keep in step with the sun.
As an aside, power companies can make line-frequency clocks catch up by briefly raising their frequency from 60 Hz or 50 Hz to 61 Hz or 51 Hz.

5 Cristian's Algorithm
To keep clocks within δ of one another, machines must resynchronize at least every δ/2ρ seconds, where ρ is the maximum clock drift rate. Each machine sends a message to the time server (which has a WWV receiver) asking for the current time.
Problems: time must never run backward, and it takes a nonzero amount of time for the time server's reply to get back to the sender.
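
A minimal C sketch of the usual remedy for the nonzero reply time: timestamp the request and the reply locally and assume the server's answer describes the midpoint of the round trip. All timestamp values here are made up for illustration.

#include <stdio.h>

int main(void) {
    double t0 = 100.000;     /* client clock when the request was sent */
    double t1 = 100.020;     /* client clock when the reply arrived    */
    double server = 100.515; /* time reported by the time server       */

    double rtt = t1 - t0;                 /* measured round-trip time  */
    double estimate = server + rtt / 2.0; /* server time "now"         */
    double offset = estimate - t1;        /* how far the client is off */

    /* Time must never run backward: a negative offset is amortized by
       slowing the clock down rather than jumping it back. */
    printf("rtt = %.3f s, offset = %+.3f s\n", rtt, offset);
    return 0;
}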

6 The Berkeley Algorithm
a) The time daemon asks all the other machines for their clock values.
b) The machines answer.
c) The time daemon tells everyone how to adjust their clock.

7 Averaging Algorithms At the beginning of each interval, every machine broadcasts the current time according to its clock. Then it starts a local timer to collect all other broadcasts that arrive during some interval S. The simplest algorithm is just to average the values from all other machines. One of the most widely used algorithms in the Internet is the Network Time Protocol (NTP).
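
A minimal C sketch of that averaging step, with made-up readings; the same computation is also the core of the Berkeley time daemon's adjustment:

#include <stdio.h>

#define N 4   /* machines whose broadcasts arrived in this interval */

int main(void) {
    /* readings[0] is this machine's own clock value */
    double readings[N] = {3000.0, 3005.0, 2997.0, 3002.0};

    double sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += readings[i];
    double avg = sum / N;

    /* Each machine slews its clock toward the group average. */
    printf("average = %.1f, local correction = %+.1f\n",
           avg, avg - readings[0]);
    return 0;
}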

8 Assigning Times to Events (Lamport Timestamps)
If a happens before b in the same process, C(a) < C(b).
If a and b represent the sending and receiving of a message, respectively, C(a) < C(b).
For all distinct events a and b, C(a) ≠ C(b).
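
As a sketch of how these conditions are met, the C fragment below (not from the slides) applies Lamport's update rules: tick on every event and fast-forward past incoming timestamps. In practice the third condition is enforced by appending the process number to the clock value to break ties.

#include <stdio.h>

typedef struct { int clock; } process_t;

/* Rule 1: tick before every event, so successive local events get
   increasing timestamps. */
static int tick(process_t *p) { return ++p->clock; }

/* Rule 2: a receive must be timestamped later than the send, so take
   the max of the local clock and the message timestamp, then tick. */
static void receive(process_t *p, int msg_ts) {
    if (msg_ts > p->clock)
        p->clock = msg_ts;
    p->clock++;
}

int main(void) {
    process_t p1 = {0}, p2 = {0};
    int ts = tick(&p1);   /* p1 sends a message stamped 1            */
    tick(&p2);            /* p2 does unrelated local work: clock = 1 */
    receive(&p2, ts);     /* p2 receives: max(1, 1) + 1 = 2          */
    printf("C(p1) = %d, C(p2) = %d\n", p1.clock, p2.clock);
    return 0;
}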

9 Totally Ordered Multicasting
Lamport timestamps can be used to implement a totally ordered multicast.

10 Vector Timestamps
VT(a) < VT(b) means event a causally precedes event b.
Properties of vector timestamps:
Vi[i] is the number of events that have occurred so far at Pi.
If Vi[j] = k, then Pi knows that k events have occurred at Pj.
Suppose message r (from Pj) is a reaction to message a (from Pi). Pk processes message r only if:
vt(r)[j] = Vk[j] + 1 (r is the next message Pk expects from Pj), and
vt(r)[i] ≤ Vk[i] for all i ≠ j (Pk has already seen every message Pj had seen when it sent r).
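
A minimal C sketch of this delivery test for three processes, with illustrative vectors:

#include <stdbool.h>
#include <stdio.h>

#define N 3   /* number of processes */

/* vt: timestamp carried by message r sent by process j.
   V:  the receiver Pk's current vector clock. */
static bool can_deliver(const int vt[N], const int V[N], int j) {
    if (vt[j] != V[j] + 1)   /* r must be the next message from Pj  */
        return false;
    for (int i = 0; i < N; i++)
        if (i != j && vt[i] > V[i])
            return false;    /* Pk is missing something Pj had seen */
    return true;
}

int main(void) {
    int V[N]  = {2, 1, 0};   /* Pk's current vector                 */
    int vt[N] = {2, 2, 0};   /* message from Pj, here j = 1         */
    printf("deliver now? %s\n", can_deliver(vt, V, 1) ? "yes" : "no");
    return 0;
}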

11 Global State (1) A consistent cut An inconsistent cut

12 Global State (2) Organization of a process and channels for a distributed snapshot

13 Global State (3)
Process Q receives a marker for the first time and records its local state.
Q records all incoming messages.
Q receives a marker on an incoming channel and finishes recording the state of that incoming channel.
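
A minimal C sketch of those three marker rules for a single process; the channel and message plumbing is stubbed out with prints, and all names are illustrative:

#include <stdbool.h>
#include <stdio.h>

#define NCHAN 2   /* incoming channels of this process */

typedef struct {
    bool state_recorded;    /* local state already saved? */
    bool recording[NCHAN];  /* still recording channel c? */
} snap_t;

static void on_marker(snap_t *s, int chan) {
    if (!s->state_recorded) {
        /* First marker: save the local state, start recording on
           every incoming channel, and relay the marker. */
        printf("recording local state, sending markers\n");
        s->state_recorded = true;
        for (int c = 0; c < NCHAN; c++)
            s->recording[c] = true;
    }
    /* A marker closes its channel: the channel's state is every
       message that arrived on it while recording was on. */
    s->recording[chan] = false;
}

static void on_message(snap_t *s, int chan, int msg) {
    if (s->recording[chan])
        printf("channel %d state += message %d\n", chan, msg);
}

int main(void) {
    snap_t s = {false, {false, false}};
    on_marker(&s, 0);       /* first marker arrives on channel 0 */
    on_message(&s, 1, 42);  /* in-transit message on channel 1   */
    on_marker(&s, 1);       /* marker on channel 1: done         */
    return 0;
}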

14 Election Algorithms Election algorithms: algorithms for electing a coordinator (using this as a generic name for the special process). Election algorithms attempt to locate the process with the highest process number and designate it as coordinator. Goal: to ensure that when an election starts, it concludes with all processes agreeing on who the new coordinator is to be.

15 The Bully Algorithm (1)
The bully election algorithm:
Process 4 holds an election.
Processes 5 and 6 respond, telling 4 to stop.
Now 5 and 6 each hold an election.

16 The Bully Algorithm (2) Process 6 tells 5 to stop
Process 6 wins and tells everyone

17 Ring Algorithm
We assume that the processes are physically or logically ordered, so that each process knows who its successor is. When any process notices that the coordinator is not functioning, it builds an ELECTION message containing its own process number and sends the message to its successor. If the successor is down, the sender skips over it and goes to the next number along the ring, or the one after that, until a running process is located. At each step, the sender adds its own process number to the list in the message, effectively making itself a candidate to be elected as coordinator.
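
A minimal C sketch of one such election, with the ring simulated as an array and made-up liveness flags; once the message returns to the initiator, the highest collected number becomes coordinator:

#include <stdbool.h>
#include <stdio.h>

#define N 6

int main(void) {
    bool alive[N] = {true, true, false, true, true, false};
    int initiator = 3;   /* noticed the coordinator is down */

    int candidates[N], ncand = 0;
    candidates[ncand++] = initiator;

    /* The ELECTION message travels around the ring; dead successors
       are skipped, and every live process appends its own number. */
    for (int p = (initiator + 1) % N; p != initiator; p = (p + 1) % N)
        if (alive[p])
            candidates[ncand++] = p;

    /* Back at the initiator: the highest number wins, and a
       COORDINATOR message would announce it around the ring. */
    int coord = candidates[0];
    for (int i = 1; i < ncand; i++)
        if (candidates[i] > coord)
            coord = candidates[i];
    printf("new coordinator: process %d\n", coord);
    return 0;
}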

18 Ring Algorithm

19 Mutual Exclusion: Centralized Algorithm
Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
When process 1 exits the critical region, it tells the coordinator, which then replies to 2.

20 Distributed Algorithm
When a process wants to enter a critical region, it builds a message containing the name of the critical region it wants to enter, its process number, and the current time. It sends the message to all other processes, conceptually including itself. The sending of messages is assumed to be reliable.
When a process receives a request message from another process, the action it takes depends on its state with respect to the critical region named in the message. Three cases have to be distinguished.

21 Distributed Algorithm (contd.)
1. If the receiver is not in the critical region and does not want to enter it, it sends back an OK message to the sender.
2. If the receiver is already in the critical region, it does not reply. Instead, it queues the request.
3. If the receiver wants to enter the critical region but has not yet done so, it compares the timestamp in the incoming message with the one in the message it sent to everyone; the lowest timestamp wins. If the incoming message has the lower timestamp, the receiver sends back an OK message. If its own message has the lower timestamp, the receiver queues the incoming request and sends nothing.
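
A minimal C sketch of these three cases from the receiver's point of view; the reply and queueing machinery is stubbed out, and timestamps are assumed distinct (as Lamport clocks extended with process numbers guarantee):

#include <stdio.h>

typedef enum { RELEASED, WANTED, HELD } state_t;

static void send_ok(int to)     { printf("OK -> process %d\n", to); }
static void queue_req(int from) { printf("queued request of %d\n", from); }

/* my_ts is the timestamp of our own pending request (case 3 only). */
static void on_request(state_t state, int my_ts, int req_ts, int from) {
    if (state == RELEASED)
        send_ok(from);            /* case 1: not interested      */
    else if (state == HELD)
        queue_req(from);          /* case 2: inside the region   */
    else if (req_ts < my_ts)
        send_ok(from);            /* case 3: the other side wins */
    else
        queue_req(from);          /* case 3: our request wins    */
}

int main(void) {
    on_request(WANTED, 8, 12, 2);   /* our ts 8 < 12: queue process 2 */
    on_request(RELEASED, 0, 12, 2); /* not interested: OK at once     */
    return 0;
}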

22 Distributed Algorithm example
Two processes want to enter the same critical region at the same moment. Process 0 has the lowest timestamp, so it wins. When process 0 is done, it sends an OK also, so 2 can now enter the critical region.

23 Token Ring Algorithm An unordered group of processes on a network.
A logical ring constructed in software.

24 Comparison
A comparison of three mutual exclusion algorithms:

Algorithm   | Messages per entry/exit | Delay before entry (in message times) | Problems
Centralized | 3                       | 2                                     | Coordinator crash
Distributed | 2 (n – 1)               | 2 (n – 1)                             | Crash of any process; requires group communication
Token ring  | 1 to ∞                  | 0 to n – 1                            | Lost token, process crash

25 The Transaction Model (1)
Updating a master tape is fault tolerant.

26 The Transaction Model (2)
Examples of primitives for transactions:

Primitive         | Description
BEGIN_TRANSACTION | Mark the start of a transaction
END_TRANSACTION   | Terminate the transaction and try to commit
ABORT_TRANSACTION | Kill the transaction and restore the old values
READ              | Read data from a file, a table, or otherwise
WRITE             | Write data to a file, a table, or otherwise

27 Four Characteristics
Atomic: to the outside world, the transaction happens indivisibly.
Consistent: the transaction does not violate system invariants.
Isolated: concurrent transactions do not interfere with each other.
Durable: once a transaction commits, the changes are permanent.

28 Limitations of Flat Transactions
Main limitation: flat transactions do not allow partial results to be committed or aborted. Examples: updating all of the hyperlinks to a webpage W that has moved to a new location, or the trip reservation below, where the last leg being full forces the entire transaction to abort and the first two reservations to be lost:

BEGIN_TRANSACTION
  reserve WP -> JFK;
  reserve JFK -> Nairobi;
  reserve Nairobi -> Malindi;
END_TRANSACTION
(a)

BEGIN_TRANSACTION
  reserve WP -> JFK;
  reserve JFK -> Nairobi;
  reserve Nairobi -> Malindi (full) => ABORT_TRANSACTION
(b)

29 Classification of Transactions
A nested transaction A distributed transaction

30 Implementation: Private Workspace
If a file is only read, not modified, there is no need for a private copy.
a) The file index and disk blocks for a three-block file.
b) The situation after a transaction has modified block 0 and appended block 3.
c) After committing.

31 Writeahead Log
a) A transaction:
x = 0;
y = 0;
BEGIN_TRANSACTION;
  x = x + 1;
  y = y + 2;
  x = y * y;
END_TRANSACTION;

b) – d) The log before each statement is executed:
b) Log: [x = 0/1]
c) Log: [x = 0/1], [y = 0/2]
d) Log: [x = 0/1], [y = 0/2], [x = 1/4]

32 Concurrency Control (1)
General organization of managers for handling transactions

33 Concurrency Control (2)
General organization of managers for handling distributed transactions

34 Serializability
The whole idea behind concurrency control is to properly schedule conflicting operations (two read operations never conflict).
Synchronization can take place either through mutual exclusion mechanisms on shared data (i.e., locking), or by explicitly ordering operations using timestamps.

35 Two-Phase Locking
A transaction T is granted a lock if there is no conflict.
The scheduler will never release a lock for data item x until the data manager acknowledges it has performed the operation for which the lock was set.
Once the scheduler has released a lock on behalf of a transaction T, it will never grant another lock on behalf of T.
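
A minimal C sketch of the growing/shrinking discipline these rules impose on one transaction; conflict checks against other transactions are elided and all names are illustrative:

#include <stdbool.h>
#include <stdio.h>

#define NITEMS 3

typedef struct {
    bool held[NITEMS];  /* locks this transaction currently holds */
    bool shrinking;     /* has it released a lock yet?            */
} txn_t;

static bool acquire(txn_t *t, int item) {
    if (t->shrinking)      /* 2PL: no new lock after the first release */
        return false;
    t->held[item] = true;  /* conflict checks with other txns elided   */
    return true;
}

static void release(txn_t *t, int item) {
    t->held[item] = false;
    t->shrinking = true;   /* the growing phase is over for good       */
}

int main(void) {
    txn_t t = {{false}, false};
    printf("lock x: %d\n", acquire(&t, 0));  /* 1: growing phase */
    printf("lock y: %d\n", acquire(&t, 1));  /* 1: still growing */
    release(&t, 0);                          /* shrinking begins */
    printf("lock z: %d\n", acquire(&t, 2));  /* 0: refused by 2PL */
    return 0;
}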

36 Strict Two-Phase Locking
In centralized 2PL, a single site is responsible for granting and releasing locks.
In primary 2PL, each data item is assigned a primary copy.
In distributed 2PL, the schedulers on each machine not only ensure that locks are granted and released, but also forward the operation to the local data manager.

37 Pessimistic Timestamp Ordering
Concurrency control using timestamps.

38 Conclusion
Lamport timestamps: if a happens before b, then C(a) < C(b).
Determining the global state can be done by synchronizing all processes so that each collects its own local state, along with the messages that are currently in transit.
Synchronization between processes often requires choosing a coordinator, hence election algorithms.
Mutual exclusion algorithms can be centralized or distributed.

39 Conclusion & Critical Idea
A transaction consists of a series of operations.
A transaction is durable, meaning that if it completes, its effects are permanent.
Two-phase locking can lead to deadlock. Remedies include:
acquiring all locks in some canonical order to prevent hold-and-wait cycles (sketched below);
using deadlock detection by maintaining an explicit graph and checking it for cycles;
the priority inheritance protocol.
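
A minimal C sketch of the canonical-ordering remedy mentioned above; item ids and the stubbed lock_item are illustrative:

#include <stdio.h>
#include <stdlib.h>

static int by_id(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

static void lock_item(int id) { printf("lock item %d\n", id); }

static void acquire_in_order(int *items, int n) {
    /* Sort by global item id, then lock in ascending order. */
    qsort(items, n, sizeof items[0], by_id);
    for (int i = 0; i < n; i++)
        lock_item(items[i]);
}

int main(void) {
    /* Both transactions touch items 5 and 7, but since each locks in
       ascending id order, neither can hold 7 while waiting for 5. */
    int txn1[] = {7, 2, 5};
    int txn2[] = {5, 7};
    acquire_in_order(txn1, 3);
    acquire_in_order(txn2, 2);
    return 0;
}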


41 Consistency and Replication
Chapter 6

42 Contents
Definition of Consistency and Replication
Understanding Replication: reason for replication, problem of replication, and the only solution to the replication problem
Consistency Models: data-centric models & client-centric models
Distribution Protocols: distributing updates to replicas
Consistency Protocols: implementation of consistency models

43 Consistency and Replication
Introduction:
Replication: replication of data.
Reason for replication: enhance reliability or improve performance.
Consistency: consistency of replicated data.
Reason for consistency: keep the replicas the same.

44 Understanding Replication
Introduction: object replication.
Purpose: managing data in a distributed system; consider objects instead of data alone, to gain the benefits of encapsulating data together with the operations on it.
Consistency problem: whenever a replica is updated, that replica becomes different from the others, so the replicas must be synchronized.
Two approaches, differing in who deals with this: object-specific replication and middleware replication.

45 Understanding Replication
Two Approaches A distributed system for replication-aware distributed objects. A distributed system responsible for replica management

46 The Only Solution to the Consistency Problem
Introduction: synchronous replication gives consistency.
Key idea: updates are performed as a single atomic operation, or transaction.
Difficulty: the need to synchronize all replicas costs a lot of communication time, which is expensive in terms of performance.
Only solution: loosen the consistency constraints, so updates need not be executed as atomic operations and copies may not always be identical.

47 Consistency Models
Introduction: a consistency model is a contract between processes and the data store.
Two kinds of models:
Data-centric consistency models: guarantees for a number of processes that simultaneously update the store (e.g., sequential consistency).
Client-centric consistency models: guarantees for a single process, in the absence of simultaneous updates.

48 Data-centric Consistency Models
The general organization of a logical data store, physically distributed and replicated across multiple processes.

49 Data-centric Consistency Models
Data-centric consistency models (7 kinds), with constraints loosening down the list:
Strict consistency
Linearizability and sequential consistency
Causal consistency
FIFO consistency
Weak consistency
Release consistency
Entry consistency

50 Data-centric Consistency Models
Consistency     | Description
Strict          | Absolute time ordering of all shared accesses matters.
Linearizability | All processes must see all shared accesses in the same order. Accesses are furthermore ordered according to a (nonunique) global timestamp.
Sequential      | All processes see all shared accesses in the same order. Accesses are not ordered in time.
Causal          | All processes see causally-related shared accesses in the same order.
FIFO            | All processes see writes from each other in the order they were used. Writes from different processes may not always be seen in that order.
(a) Consistency models not using synchronization operations.

Consistency | Description
Weak        | Shared data can be counted on to be consistent only after a synchronization is done.
Release     | Shared data are made consistent when a critical region is exited.
Entry       | Shared data pertaining to a critical region are made consistent when a critical region is entered.
(b) Models with synchronization operations.

51 FIFO Consistency (1)
Necessary condition: writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
(By contrast, strict consistency requires that any read on a data item x return the value of the most recent write on x.)

52 FIFO Consistency (2) A valid sequence of events for FIFO consistency.

53 FIFO Consistency (3)
Program order at the three processes:
P1: x = 1; print(y, z);
P2: y = 1; print(x, z);
P3: z = 1; print(x, y);
(a) Prints: 00   (b) Prints: 10   (c) Prints: 01
Statement execution as seen by the three processes from the previous slide. The statements in bold are the ones that generate the output shown.

54 FIFO Consistency (4)
Process P1:               Process P2:
x = 1;                    y = 1;
if (y == 0) kill(P2);     if (x == 0) kill(P1);
Two concurrent processes.

55 Weak Consistency (1)
Properties:
Accesses to synchronization variables associated with a data store are sequentially consistent.
No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere.
No read or write operation on data items is allowed to be performed until all previous operations to synchronization variables have been performed.

56 Weak Consistency (2)
int a, b, c, d, e, x, y;   /* variables */
int *p, *q;                /* pointers */
int f(int *p, int *q);     /* function prototype */

a = x * x;                 /* a stored in register */
b = y * y;                 /* b as well */
c = a*a*a + b*b + a * b;   /* used later */
d = a * a * c;             /* used later */
p = &a;                    /* p gets address of a */
q = &b;                    /* q gets address of b */
e = f(p, q);               /* function call */

A program fragment in which some variables may be kept in registers.

57 Weak Consistency (3) A valid sequence of events for weak consistency.
An invalid sequence for weak consistency.

58 Client-centric Consistency Models
Different types of client-centric consistency models:
Eventual consistency
Monotonic-read consistency
Monotonic-write consistency
Read-your-writes consistency
Writes-follow-reads consistency

59 Eventual Consistency The principle of a mobile user accessing different replicas of a distributed database.

60 Monotonic Reads The read operations performed by a single process P at two different local copies of the same data store. A monotonic-read consistent data store A data store that does not provide monotonic reads.

61 Monotonic Writes The write operations performed by a single process P at two different local copies of the same data store A monotonic-write consistent data store. A data store that does not provide monotonic-write consistency.

62 Read Your Writes A data store that provides read-your-writes consistency. A data store that does not.

63 Writes Follow Reads A writes-follow-reads consistent data store
A data store that does not provide writes-follow-reads consistency

64 Distribution Protocols
Purpose: answer the following questions. What exactly is propagated? Where are updates propagated? By whom is propagation initiated?
Three distribution topics: replica placement, update propagation, and epidemic protocols.

65 Replica Placement The logical organization of different kinds of copies of a data store into three concentric rings.

66 Replica Placement
Permanent replicas: the initial set of replicas; they constitute a distributed data store.
Server-initiated replicas: copies of a data store created at the initiative of the data store itself; they exist to enhance performance.
Client-initiated replicas: created at the initiative of clients; commonly known as client caches.

67 Update Propagation
Introduction: update propagation is generally initiated at a client and subsequently forwarded to one of the copies.
Three design issues:
State vs. operations: what is actually to be propagated.
Pull vs. push protocols: whether updates are pulled or pushed.
Unicasting vs. multicasting: whether unicasting or multicasting should be used.

68 Epidemic Protocols
Introduction: update propagation in eventually-consistent data stores is often implemented by a class of algorithms known as epidemic protocols. They do not resolve update conflicts; their only concern is propagating updates to all replicas in as few messages as possible.
Update propagation models (example: anti-entropy):
P only pushes its own updates to Q
P only pulls in new updates from Q
P and Q send updates to each other
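
A minimal C sketch of one push-pull anti-entropy round between two replicas; the single versioned value stands in for the sets of updates a real protocol would exchange:

#include <stdio.h>

typedef struct {
    int value;
    int version;   /* higher version = more recent update */
} replica_t;

/* One push-pull round: whichever side is newer overwrites the other. */
static void anti_entropy(replica_t *p, replica_t *q) {
    if (p->version > q->version)
        *q = *p;   /* push: P's update spreads to Q */
    else if (q->version > p->version)
        *p = *q;   /* pull: P catches up from Q     */
    /* equal versions: already in sync              */
}

int main(void) {
    replica_t p = {42, 7};   /* P holds the newer update */
    replica_t q = {40, 6};   /* Q is one version behind  */
    anti_entropy(&p, &q);
    printf("P: v%d = %d, Q: v%d = %d\n",
           p.version, p.value, q.version, q.value);
    return 0;
}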

69 Consistency Protocols
Introduction: a consistency protocol describes the implementation of a specific consistency model. Models covered: sequential consistency, weak consistency with synchronization variables, and atomic consistency.
Three protocol families:
Primary-based protocols: remote-write protocols & local-write protocols.
Replicated-write protocols: active replication & quorum-based protocols.
Cache-coherence protocols.

70 Primary-Based Protocols
Remote-Write Protocols (1) Primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded.

71 Primary-Based Protocols
Remote-Write Protocols (2) The principle of primary-backup protocol.

72 Primary-Based Protocols
Local-Write Protocols (1) Primary-based local-write protocol in which a single copy is migrated between processes.

73 Primary-Based Protocols
Local-Write Protocols (2) Primary-backup protocol in which the primary migrates to the process wanting to perform an update.

74 Replicated-Write Protocols
Active Replication (1) The problem of replicated invocations.

75 Replicated-Write Protocols
Active Replication (2) Forwarding an invocation request from a replicated object. Returning a reply to a replicated object.

76 Replicated-Write Protocols
Quorum-Based Protocols
Three examples of the voting algorithm:
a) A correct choice of read and write set.
b) A choice that may lead to write-write conflicts.
c) A correct choice, known as ROWA (read one, write all).
A choice of read quorum NR and write quorum NW over N replicas is correct when NR + NW > N (preventing read-write conflicts) and NW > N/2 (preventing write-write conflicts).
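
A minimal C sketch of the two constraints behind those examples; the N = 12 and the three NR/NW pairs are illustrative values in the style of the voting-algorithm figure:

#include <stdbool.h>
#include <stdio.h>

/* Gifford's voting constraints: NR + NW > N prevents read-write
   conflicts, NW > N/2 prevents write-write conflicts. */
static bool valid_quorums(int n, int nr, int nw) {
    return (nr + nw > n) && (2 * nw > n);
}

int main(void) {
    int n = 12;
    printf("NR=3,  NW=10 -> %d\n", valid_quorums(n, 3, 10)); /* 1: correct  */
    printf("NR=7,  NW=6  -> %d\n", valid_quorums(n, 7, 6));  /* 0: w-w risk */
    printf("NR=1,  NW=12 -> %d\n", valid_quorums(n, 1, 12)); /* 1: ROWA     */
    return 0;
}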

77 End of Chapter 6 Thank you!

