Greetings. Those of you who don't yet know me... Today is... and I'll be talking about...
Coverage
Nature of Causality
Causality: Why is it Important/Useful?
Causality in Life vs. Causality in Distributed Systems
Modeling Distributed Events - Defining Causality
Logical Clocks
General Implementation of Logical Clocks
Scalar Logical Time
Demo: Scalar Logical Time with Asynchronous Unicast Communication between Multiple Processes
Conclusions
Questions / Reference
This is what I hope to cover... Point out that it might seem that I'm glossing over some topics, but that is only because we don't have time to cover the topic at a greater level of detail. Point out that during the course of the presentation, if anyone has questions, please do not hesitate to stop me.
Nature of Causality
Consider a distributed computation which is performed by a set of processes:
The objective is to have the processes work towards and achieve a common goal,
Processes do not share global memory,
Communication occurs through message passing only.
Common goal: whatever the task at hand may be... No global memory: for all intents and purposes, the processes could be running on different machines. Message passing: we use the underlying transport mechanism of choice, whether it be TCP, UDP, ...
Process Actions Actions are modeled as three types of events:
Internal Event: affects only the process which is executing the event,
Send Event: a process passes a message to another process,
Receive Event: a process receives a message from another process.
Explain diagram: horizontal lines - progress of a process; dots - events; arrows - message transfer. Point out the internal, send, and receive events for process 1: Internal Events: b, g, m; Send Events: d, o; Receive Events: i.
[Diagram: timelines for processes P1, P2, and P3 with labeled events a through r and message arrows between them]
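The three event types for process P1 in the diagram can be modeled in a few lines. This is a minimal Python sketch (the talk's demo is in Java); the `Event` class and field names are our own, and the labels come from the slide's example:

```python
from dataclasses import dataclass
from enum import Enum

class EventType(Enum):
    INTERNAL = "internal"
    SEND = "send"
    RECEIVE = "receive"

@dataclass(frozen=True)
class Event:
    process: str    # e.g. "P1"
    name: str       # label from the diagram, e.g. "b"
    kind: EventType

# Process P1's events as described on the slide
p1_events = [
    Event("P1", "b", EventType.INTERNAL),
    Event("P1", "d", EventType.SEND),
    Event("P1", "g", EventType.INTERNAL),
    Event("P1", "i", EventType.RECEIVE),
    Event("P1", "m", EventType.INTERNAL),
    Event("P1", "o", EventType.SEND),
]
```

Counting the kinds recovers the slide's breakdown: three internal events, two sends, one receive.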
Causal Precedence
Ordering of events for a single process is simple: they are ordered by their occurrence.
[Diagram: a single process P with events a, b, c, d in sequence]
Send and Receive events signify the flow of information between processes, and establish causal precedence between events at the sender and receiver. Point out the causally related events in the second diagram: a → b, b → c. Point out the events that are not causally related: c || d. Causal Precedence: we'll talk a lot more about what the phrase "causal precedence" means in the context of logical time.
[Diagram: two processes P1 and P2 exchanging a message, with events a, b, c, d]
Distributed Events
The execution of this distributed computation results in the generation of a set of distributed events. The causal precedence induced by the send and receive events establishes a partial order of these distributed events. The precedence relation in our case is "happened before": for two events a and b, a → b means "a happened before b". Set of distributed events: composed of all the internal, send, and receive events of all the processes. There is a more complete way of describing the precedence relation as a binary relation, using set theory, but we don't have time to go over it. The discussion so far will suffice.
[Diagram: event a on P1 precedes event b on P2, written a → b]
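One way to see that "happened before" is a partial order: take the direct edges (program order within each process, plus an edge from each send to its matching receive) and compute their transitive closure. The sketch below is a Python illustration with hypothetical event names, not part of the talk's demo:

```python
# Direct "happened before" edges: program order within each process,
# plus an edge from each send event to its matching receive event.
# Event names here are hypothetical, for illustration only.
edges = {
    ("a", "b"),      # program order on P1
    ("b", "snd"),    # P1's send event follows b
    ("snd", "rcv"),  # the send happens before its receive on P2
    ("x", "rcv"),    # program order on P2: x precedes the receive
    ("rcv", "c"),    # program order on P2
}

def transitive_closure(direct):
    """Floyd-Warshall-style reachability over the event graph."""
    hb = set(direct)
    events = {e for pair in direct for e in pair}
    for k in events:
        for i in events:
            for j in events:
                if (i, k) in hb and (k, j) in hb:
                    hb.add((i, j))
    return hb

hb = transitive_closure(edges)

def concurrent(x, y):
    """x || y: neither event happened before the other."""
    return (x, y) not in hb and (y, x) not in hb
```

Here a → c holds by transitivity through the message, while a and x are concurrent: no causal path connects them.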
Causality: Why is it important/useful?
This causality among events (induced by the "happened before" precedence relation) is important for several reasons. It helps us solve problems in distributed computing:
Ensures liveness and fairness in mutual exclusion algorithms,
Helps maintain consistency in replicated databases,
Facilitates the design of deadlock detection algorithms in distributed systems.
Mutual exclusion: deadlock avoidance and fair resource sharing. Replicated DB: multi-process DBMSs that use message passing to relay their operations on the database to others.
Importance of Causality (Continued)
Debugging of distributed systems: allows execution to be replayed and resumed in a causally consistent order,
System failure recovery: allows checkpoints to be built which allow a system to be restarted from a point other than the beginning,
Helps a process measure the progress of other processes in the system: allows processes to discard obsolete information and to detect the termination of other processes.
Importance of Causality
Allows distributed systems to optimize the concurrency of all processes involved. Knowing which events are causally dependent in a distributed system allows one to measure the concurrency of the system: all events that are not causally related can be executed concurrently.
Causality: Life vs. Distributed Systems
We use causality in our lives to determine the feasibility of daily, weekly, and yearly plans. We use global time and (loosely) synchronized clocks (wristwatches, wall clocks, PC clocks, etc.).
Causality (Continued)
However, events in real life do not (usually) occur at the same rate as those in a distributed system: distributed systems' event occurrence rates are much higher, and event execution times are much smaller. Also, distributed systems do not have a "global" clock that they can refer to. There is hope though: we can use "Logical Clocks" to establish order. No global time: one might suggest that NTP, the Network Time Protocol, might be sufficient; it is not, because it is not precise enough to capture causality.
Modeling Distributed Events: Defining Causality and Order
We model a distributed program as a set of asynchronous processes p1, p2, …, pn, which communicate through a network using message passing only. Process execution and message transfer are asynchronous.
[Diagram: four processes P1–P4 connected by a network]
Modeling Distributed Events
Notation: given two events e1 and e2,
e1 → e2 : e2 is causally dependent on e1.
If e1 ↛ e2 and e2 ↛ e1, then e1 and e2 are concurrent: e1 || e2.
[Diagram: one pair of processes where e1 → e2, and another where e1 || e2]
Logical Clocks
In a system of logical clocks, every process has a logical clock that is advanced using a set of rules. Rules: we will discuss this notion later, for the case of scalar logical time.
[Diagram: processes P1, P2, P3, each with its own logical clock]
Logical Clocks - Timestamps
Every event is assigned a timestamp (which the processes use to infer causality between events).
[Diagram: P1 sends a timestamped message ("Data") to P2]
Logical Clocks - Timestamps
The timestamps obey the monotonicity property, i.e., if an event a causally affects an event b, then the timestamp for a is smaller than the timestamp for b.
[Diagram: event a on P1 causally precedes event b on P2; a's timestamp is smaller than b's]
Formal Definition of Logical Clocks
The definition of a system of logical clocks: we have a logical clock C, which is a function that maps events to timestamps, i.e., for an event e, C(e) is its timestamp.
[Diagram: P1 sends event e's data to P2; the event carries timestamp C(e)]
Formal Definition of Logical Clocks
For the set H of all events in a distributed system, applying the function C to every event in H generates a set T:
∀ e ∈ H, C(e) ∈ T
[Diagram: events a, b on P1 and c, d on P2]
H = { a, b, c, d }   T = { C(a), C(b), C(c), C(d) }
Formal Definition of Logical Clocks
We define the relation "<" on timestamps to be our precedence relation: "happened before". Elements of the set T are partially ordered by this precedence relation, i.e., the timestamps for the events in the distributed system are partially ordered by their time of occurrence. More formally:
e1 → e2 ⟹ C(e1) < C(e2)
Formal Definition of Logical Clocks
What we’ve said so far is: “If e2 causally depends on e1, then e1’s timestamp is smaller than e2’s.” This enforces monotonicity for timestamps of events in the distributed system, and is sometimes called the clock consistency condition. This notion links the causal dependencies of events and the timestamps of the events. Clock consistency won't be discussed in more depth because we don't have enough time, and it is only interesting if we discuss vector and matrix logical clocks in addition to scalar clocks.
General Implementation of Logical Clocks
We need to address two issues: the data structure used to represent the logical clock, and the design of a protocol which dictates how the logical clock data structure is updated.
Logical Clock Implementation: Clock Structure
The structure of a logical clock should allow a process to keep track of its own progress and the value of the logical clock. There are three well-known structures:
Scalar: a single integer,
Vector: an n-element vector (n is the number of processes in the distributed system),
Matrix: an n × n matrix.
Scalar: used to keep track of local time and the logical time of the system. Vector: each process keeps track of its logical time and every other process's logical time (thus the need for an n-element vector). Matrix: each process keeps track of its own logical time, every other process's logical time, and every other process's view of everyone's logical time (thus the need for an n-by-n matrix).
Logical Clock Implementation: Clock Structure
Vector: each process keeps an n-element vector [C1, C2, C3]: for process 1, this holds process 1's own logical time, its view of process 2's logical time, and its view of process 3's logical time. Matrix: each process keeps an n-by-n matrix. For process 1, the first row is its own vector (its logical time and its view of process 2's and process 3's logical time), and each remaining row i is process 1's view of process i's view of everyone's logical time.
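The three structures can be sketched in a few lines of Python. This is a hedged illustration of the data shapes only (n = 3 processes is assumed), not of the update rules:

```python
n = 3  # number of processes in the distributed system (assumed)

# Scalar: a single integer per process.
scalar_clock = 0

# Vector: entry i is this process's view of process i's logical time.
vector_clock = [0] * n

# Matrix: row i is this process's view of process i's entire vector.
matrix_clock = [[0] * n for _ in range(n)]
```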
Logical Clock Implementation: Clock Update Protocol
The goal of the update protocol is to ensure that the logical clock is managed consistently; consequently, we’ll use two general rules: R1: Governs the update of the local clock when an event occurs (internal, send, receive), R2: Governs the update of the global logical clock (determines how to handle timestamps of messages received). All logical clock systems use some form of these two rules, even if their implementations differ; clock monotonicity (consistency) is preserved due to these rules.
Scalar Logical Time Scalar implementation – Lamport, 1978
Again, the goal is to have some mechanism that enforces causality between events, inducing a partial order of the events in a distributed system. Scalar logical time is a way to totally order all the events in a distributed system (ties between equal timestamps are broken using process identifiers). As with all logical time methods, we need to define both a structure and update methods.
Scalar Logical Time: Structure
Local time and logical time are represented by a single integer, i.e., each process pi uses an integer Ci to keep track of logical time.
[Diagram: P1 with clock C1, P2 with C2, P3 with C3]
Scalar Logical Time: Logical Clock Update Protocol
Next, we need to define the clock update rules. For each process pi:
R1: Before executing an event, pi executes the following: Ci = Ci + d (d > 0); d is a constant, typically the value 1.
R2: Each message contains the timestamp of the sending process. When pi receives a message with a timestamp Cmsg, it executes the following: Ci = max(Ci, Cmsg), then executes R1.
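Rules R1 and R2 translate directly into code. This is a minimal Python sketch with d = 1 (the class and method names are ours, not from the talk or its Java demo):

```python
class ScalarClock:
    """Scalar logical clock following rules R1 and R2, with d = 1."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """R1: advance the clock before executing any event."""
        self.time += 1
        return self.time

    def send(self):
        """A send event: apply R1 and stamp the outgoing message."""
        return self.tick()

    def receive(self, msg_time):
        """R2: take the max of local and message time, then apply R1."""
        self.time = max(self.time, msg_time)
        return self.tick()
```

For example, a process whose clock reads 2 that receives a message stamped 5 moves to max(2, 5) + 1 = 6.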
Scalar Logical Time: Update Protocol Example
P1: C1 = 1 (R1), C1 = 2 (R1), C1 = 3 (R1), C1 = max(3, 6) = 6 (R2), C1 = 7 (R1).
P2: C2 = 1 (R1), C2 = 2 (R1), C2 = max(2, 1) = 2 (R2), C2 = 3 (R1), C2 = 4 (R1), C2 = 5 (R1), C2 = 6 (R1), C2 = 7 (R1).
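The two receives in this example can be replayed in a few lines of Python (a self-contained sketch; `tick` and `recv` are our names for R1 and R2 with d = 1):

```python
def tick(c):
    """R1 with d = 1."""
    return c + 1

def recv(c, msg_time):
    """R2: max with the message timestamp, then R1."""
    return tick(max(c, msg_time))

# P2's side: two local events, then a receive of P1's message stamped 1.
c2 = 0
c2 = tick(c2)      # C2 = 1
c2 = tick(c2)      # C2 = 2
c2 = recv(c2, 1)   # max(2, 1) + 1 = 3

# P1's side: after three events (C1 = 3), it receives a message stamped 6.
c1 = 3
c1 = recv(c1, 6)   # max(3, 6) + 1 = 7
```

Both final values match the trace above: C2 = 3 after its receive, C1 = 7 after its receive.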
Scalar Logical Time: Properties
Properties of this implementation: it maintains the monotonicity and consistency properties, and provides a total ordering of events in a distributed system (once ties between equal timestamps are broken by process identifiers).
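The total ordering relies on the standard tie-breaking rule from Lamport's construction: when two events carry the same scalar timestamp, order them by process identifier. A small Python sketch (the event labels are hypothetical):

```python
# Events as (scalar timestamp, process id, label). Sorting the tuples
# orders primarily by timestamp and breaks ties by process id, which
# turns the partial order on timestamps into a total order on events.
events = [(2, 1, "b"), (1, 2, "x"), (2, 2, "y"), (1, 1, "a")]
total_order = sorted(events)
```

Events "a" and "x" share timestamp 1, and "b" and "y" share timestamp 2; the process-id tie-break yields the order a, x, b, y.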
Scalar Logical Time: Pros and Cons
Advantages:
We get a total ordering of events in the system; all the benefits gained from knowing the causality of events in the system apply,
Small overhead: one integer per process.
Disadvantage:
Clocks are not strongly consistent: a clock loses track of the timestamp of the event on which it depends. This is because we are using a single integer to store both the local and logical time.
Storage, maintenance, and communication overheads MUST be considered with vector and matrix clocks. There are several efficient implementations that allow these structures to be of feasible use. Show an example of why scalar logical clocks are not strongly consistent.
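A tiny counterexample shows why scalar clocks are not strongly consistent: C(a) < C(b) does not imply a → b. In the hypothetical trace below, P1 and P2 exchange no messages, so their events are concurrent even though the timestamps are ordered:

```python
# P1 executes one event; P2 executes two. No messages are exchanged,
# so no event on P1 is causally related to any event on P2.
timestamps = {"p1_a": 1, "p2_x": 1, "p2_y": 2}

# The only causal (program-order) edge in this trace:
causal_edges = {("p2_x", "p2_y")}

def happened_before(e1, e2):
    return (e1, e2) in causal_edges  # trace is tiny; no closure needed

# C(p1_a) < C(p2_y), yet p1_a || p2_y: the converse of the clock
# consistency condition fails, so scalar clocks are only weakly consistent.
```

Vector clocks repair exactly this: their timestamps let a process decide causality (or concurrency) from the timestamps alone.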
Demo - Simple Scalar Logical Time Application
Consists of several processes communicating asynchronously via unicast. Only send and receive events are used; internal events can be disregarded, since they only complicate the demo (imagine processes which perform no internal calculations). Scalar logical time is used. Written in Java.
Demo: Event Sequence Start one process (P1)
P1 uses a receive thread to process incoming messages asynchronously. P1 sleeps for a random number of seconds; upon waking, it attempts to send a message to a random process, emulating asynchronous and random sending. P1 repeats this process. Start process 2 (P2). The design of the application allows processes to know who is in the system at all times. P2 performs the same steps as P1…
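The event sequence above can be sketched as a single-machine Python simulation. The real demo is a Java unicast application; the queue-per-process structure, names, and timing constants here are our assumptions:

```python
import queue
import random
import threading
import time

def run_demo(n_procs=3, rounds=3):
    """Each process has a receive thread (applying R2, then R1, on
    arrival) and a main loop that sleeps for a random interval, then
    sends a stamped message to a random peer (R1 before the send)."""
    inboxes = {p: queue.Queue() for p in range(n_procs)}
    clocks = {p: 0 for p in range(n_procs)}
    locks = {p: threading.Lock() for p in range(n_procs)}

    def receiver(pid):
        while True:
            item = inboxes[pid].get()
            if item is None:                      # shutdown sentinel
                return
            msg_time, _sender = item
            with locks[pid]:                      # R2, then R1
                clocks[pid] = max(clocks[pid], msg_time) + 1

    threads = [threading.Thread(target=receiver, args=(p,))
               for p in range(n_procs)]
    for t in threads:
        t.start()

    for _ in range(rounds):
        for pid in range(n_procs):
            time.sleep(random.random() * 0.005)   # random sleep
            peer = random.choice([q for q in range(n_procs) if q != pid])
            with locks[pid]:
                clocks[pid] += 1                  # R1 before the send
                stamp = clocks[pid]
            inboxes[peer].put((stamp, pid))

    for p in range(n_procs):                      # all sends are queued
        inboxes[p].put(None)                      # before the sentinel
    for t in threads:
        t.join()
    return clocks
```

Because each inbox is a FIFO queue and the sentinel is enqueued after every send, each receiver drains all of its messages before exiting; every process ends with a clock of at least `rounds`, since it applied R1 for each of its own sends.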
Conclusions Logical time is used for establishing an ordering of events in a distributed system. Logical time is useful for several important areas and problems in Computer Science. Implementations of logical time in a distributed system range from simple (scalar-based) to complex (matrix-based), and cover a wide range of applications. Efficient implementations exist for vector- and matrix-based logical clocks, and must be considered for any large-scale distributed system.