Distributed Systems Dinesh Bhat - Advanced Systems (Some slides from 2009 class) CS 6410 – Fall 2010 Time Clocks and Ordering of events Distributed Snapshots.

Slides:



Advertisements
Similar presentations
Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport.
Advertisements

Global States.
Dan Deng 10/30/11 Ordering and Consistent Cuts 1.
Distributed Snapshots: Determining Global States of Distributed Systems Joshua Eberhardt Research Paper: Kanianthra Mani Chandy and Leslie Lamport.
Global States in a Distributed System By John Kor and Yvonne Cheng.
Distributed Computing 5. Snapshot Shmuel Zaks ©
Lecture 8: Asynchronous Network Algorithms
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Reliable Multicast Steve Ko Computer Sciences and Engineering University at Buffalo.
SES Algorithm SES: Schiper-Eggli-Sandoz Algorithm. No need for broadcast messages. Each process maintains a vector V_P of size N - 1, N the number of processes.
Uncoordinated Checkpointing The Global State Recording Algorithm.
Uncoordinated Checkpointing The Global State Recording Algorithm Cristian Solano.
Time and Global States Part 3 ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.
Synchronization Chapter clock synchronization * 5.2 logical clocks * 5.3 global state * 5.4 election algorithm * 5.5 mutual exclusion * 5.6 distributed.
Distributed Computing 5. Snapshot Shmuel Zaks ©
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Logical Clocks and Global State.
S NAPSHOT A LGORITHM. W HAT IS A S NAPSHOT - INTUITION Given a system of processors and communication channels between them, we want each processor to.
CS 582 / CMPE 481 Distributed Systems
Causality & Global States. P1 P2 P Physical Time 4 6 Include(obj1 ) obj1.method() P2 has obj1 Causality violation occurs when order.
Ordering and Consistent Cuts Presented By Biswanath Panda.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Distributed Systems Fall 2009 Logical time, global states, and debugging.
Ordering and Consistent Cuts
20101 Synchronization in distributed systems A collection of independent computers that appears to its users as a single coherent system.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 13 Wenbing Zhao Department of Electrical and Computer Engineering.
Lecture 13 Synchronization (cont). EECE 411: Design of Distributed Software Applications Logistics Last quiz Max: 69 / Median: 52 / Min: 24 In a box outside.
EEC 688/788 Secure and Dependable Computing Lecture 14 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Ordering and Consistent Cuts Presented by Chi H. Ho.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
Clock Synchronization and algorithm
Cloud Computing Concepts
EEC-681/781 Distributed Computing Systems Lecture 10 Wenbing Zhao Cleveland State University.
Lecture 12 Synchronization. EECE 411: Design of Distributed Software Applications Summary so far … A distributed system is: a collection of independent.
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport (1978) Presented by: Yoav Kantor.
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Fall 2003 URL: Logical Clocks and Global State.
Ordering and Consistent Cuts Edward Tremel 11/7/2013.
Chapter 5.
CIS 720 Distributed algorithms. “Paint on the forehead” problem Each of you can see other’s forehead but not your own. I announce “some of you have paint.
1 Distributed Systems CS 425 / CSE 424 / ECE 428 Global Snapshots Reading: Sections 11.5 (4 th ed), 14.5 (5 th ed)  2010, I. Gupta, K. Nahrtstedt, S.
Distributed Computing 5. Snapshot Shmuel Zaks ©
© Oxford University Press 2011 DISTRIBUTED COMPUTING Sunita Mahajan Sunita Mahajan, Principal, Institute of Computer Science, MET League of Colleges, Mumbai.
1 Distributed Process Management Chapter Distributed Global States Operating system cannot know the current state of all process in the distributed.
Lamport’s Logical Clocks & Totally Ordered Multicasting.
Distributed Systems Fall 2010 Logical time, global states, and debugging.
Event Ordering Greg Bilodeau CS 5204 November 3, 2009.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport Massachusetts Computer Associates,Inc. Presented by Xiaofeng Xiao.
D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University.
Several sets of slides by Prof. Jennifer Welch will be used in this course. The slides are mostly identical to her slides, with some minor changes. Set.
Event Ordering. CS 5204 – Operating Systems2 Time and Ordering The two critical differences between centralized and distributed systems are: absence of.
Fault tolerance and related issues in distributed computing Shmuel Zaks GSSI - Feb
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
CS533 Concepts of Operating Systems Class 8 Time, Clocks, and the Ordering of Events in a Distributed System By Constantia Tryman.
CSE 486/586 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
1 Chapter 11 Global Properties (Distributed Termination)
Distributed Systems Lecture 6 Global states and snapshots 1.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
Vector Clocks and Distributed Snapshots
CSE 486/586 Distributed Systems Global States
Theoretical Foundations
Lecture 9: Asynchronous Network Algorithms
EECS 498 Introduction to Distributed Systems Fall 2017
湖南大学-信息科学与工程学院-计算机与科学系
Logical Clocks and Casual Ordering
Non-Distributed Excercises
Chien-Liang Fok Distribution Seminar Chien-Liang Fok
Distributed Snapshot Distributed Systems.
CSE 486/586 Distributed Systems Global States
Distributed algorithms
CIS825 Lecture 5 1.
Presentation transcript:

Distributed Systems Dinesh Bhat - Advanced Systems (Some slides from 2009 class) CS 6410 – Fall 2010 Time Clocks and Ordering of events Distributed Snapshots – Determining Global States

Time, Clocks, and the Ordering of Events in a distributed System Leslie Lamport

Author Leslie Lamport – B.S., M.I.T., (Maths) – Ph D, Brandein University, 1972 – ACM SIGOPS Hall of Fame Award (2007) – IEEE John von Neumann Medal (2008) – Microsoft Research ( from 2001-present )

Outline  Introduction  Partial Ordering of Events  Logical Clocks and Total ordering  Distributed algorithm: An Example  Strong Clock Condition using Physical Clocks  Discussion

Introduction Defn 1 : Collection of nodes connected via network representing a single coherent system Defn 2 : Collection of processes within a computer WHY DS ? Resource sharing/ Load Balancing Computation speedup Reliability SW/HW preference Data access

Introduction Design issues in building a DS: Transparency – the distributed system should appear as a conventional, centralized system to the user. Fault tolerance – the distributed system should continue to function in the face of failure. Failure detection – in case of asynchronous DS, its hard to differentiate between message delay Vs lost message Scalability – as demands increase, the system should easily accept the addition of new resources to accommodate the increased demand.

Introduction Complexities in DS implementation : Unpredictable order of events in a distributed system Problem 1- Update A needs to occur before update B No global clock shared by processes – if there was any, FCFS could have solved all issues of event ordering Some ordering could be implicit Events in a single process happen in order Message between processes must be sent before they are received WHAT IS THE SOLUTION ???

Space Time diagram of events Process 1 Time Process 2 Process 3 a 1a 2 b1b 2 c 1 a3a4

Events Send Event Receive Event Local event Process

Partial Ordering of Events Causal events satisfying relation  “Happens Before” a) Two local events of a process : a  b b) Two endpoints of a same message : a  b c) Transitive : If a  b and b  c, then a  c Concurrent events a  b, a and b run independently

Space Time diagram of events Process 1 Time Process 2 Process 3 a 1a 2 b1b 2 c 1 c 2 b 3b 4 a3a4a5 b 5 a6

Logical Clocks to implement Partial Ordering of events Logical Clock = Way of assigning a number to an event Following Clock condition should be satisfied for partial ordering : – If a  b, then C(a) < C(b) “Converse is not true”

Logical Clocks to implement Partial Ordering of events

Logical Clocks(Contd..) ‘a’ and ‘b’ are local events, to hold C(a) < C(b), Assign distinct numbers to every successive local events in incremental fashion ‘a’ and ‘b’ are send and receive events of message M, to hold C(a) < C(b),

Total ordering of events Useful in implementing a distributed system Logical clocks could be extended to obtain total ordering If C(a) == C(b) && P(a) b Example : mutual exclusion problem – Multiple processes contending for one resource

Step 1: P i Sends Request Resource – P i sends Request T m :P i to P j – P i puts Request T m :P i on its request queue Total Ordering of Events : Example P1P1 P2P2 P3P3 T 1 :P 1 T 0 :P 1

Step 1: P i Sends Request Resource – P i sends Request T m :P i to P j – P i puts Request T m :P i on its request queue Time, Clocks, and the Ordering of Events in a Distributed System Total Ordering of Events : Example P1P1 P2P2 P3P3 T 0 :P 1 T 1 :P 1 T 0 :P 1 request

Step 2: P j Adds Message – P j puts Request T m :P i on its request queue – P j sends Acknowledgement T m :P j to P i Time, Clocks, and the Ordering of Events in a Distributed System Total Ordering of Events : Example P1P1 P2P2 P3P3 T 0 :P 1 T 1 :P 1 T 0 :P 1

Step 2: P j Adds Message – P j puts Request T m :P i on its request queue – P j sends Acknowledgement T m :P j to P i Time, Clocks, and the Ordering of Events in a Distributed System Total Ordering of Events : Example P1P1 P2P2 P3P3 T 2 :P 2 T 3 :P 3 T 0 :P 1 T 1 :P 1 T 0 :P 1 ack

Step 3: P i Sends Release Resource – P i removes Request T m :P i from request queue – P i sends Release T m :P i to each P j Time, Clocks, and the Ordering of Events in a Distributed System Total Ordering of Events : Example P1P1 P2P2 P3P3 T 0 :P 1 T 1 :P 1

Step 3: P i Sends Release Resource – P i removes Request T m :P i from request queue – P i sends Release T m :P i to each P j Time, Clocks, and the Ordering of Events in a Distributed System Total Ordering of Events : Example P1P1 P2P2 P3P3 T 4 :P 1 T 5 :P 1 T 0 :P 1 T 1 :P 1 release

Step 4: P j Removes Message – P j receives Release T m :P i from P i – P j removes Request T m :P i from request queue Time, Clocks, and the Ordering of Events in a Distributed System Total Ordering of Events : Example P1P1 P2P2 P3P3

Anomalous behavior Solution 1 : Attach a timestamp with T2:P2 P1P1 P2P2 P3P3 T 1 :P 2 T 3 :P 3 T1:P 2 T3:P 3 T2:P 2

Solution 2 : Physical Clocks Board diagram

Discussion Proof of concept Defines event ordering in distributed systems Building systems on top of Specifications, Set of assumptions ( No node failures, reliable channels ) Logical Clock counters – overflow issues ? Set of assumptions were strong ?

DISTRIBUTED SNAPSHOTS K. Mani Chandy and Leslie Lamport

Authors Leslie Lamport K Mani Chandy – Simon Ramo Professor, CalTech (since 1987) – Ph D, MIT, 1969 – Honeywell, IBM, UT Austin – IEEE Koji Kobayashi Award (1987) – Event driven architecture( marketed by Avaya )

Outline  Motivation  Cuts and Consistent cuts  Distributed system model  Distributed Snapshots  Global State Detection Algorithm  Properties of recorded Global state  Stability Detection

Motivation and Goals A process in a DS determines the global state of the system which is useful in detecting stability of the system Challenges : – Communication delay – Relative speeds of computations differ Stable properties examples: - Computation is terminated - kth computational phase is terminated - System is deadlocked State detection algorithm is similar to capturing panoramic view of migrating birds – Composite picture should be meaningful – Moving birds add complexity to the process

Distributed System Model State of processes and Channels – States of processes and Channels are defined by events and messages respectively Event e = - process p - s and s’ are states - channel c and message M

Cuts and Consistent Cuts

Consistent Global State A cut C is consistent if for all events e and e’ Intuitively if an event is part of a cut then all events that happened before it must also be part of the cut A consistent cut defines a consistent global state

Assumptions Processes do not fail Reliable communication channels FIFO delivery between a pair of processes Channels have infinite buffers

Distributed System Model Single Token System – compute global states QP QP QPQP In P In c In Q In c’

Nondeterministic System Distributed System Model Global state : S0 Global state : S1 Global state : S2 Global state: S3 Event P sends M Event P receives M’ Event Q sends M’ QPQPQPQP

Global State Detection Algorithm Marker Snapshot Protocol – P records its state and pushes an empty marker M on all outgoing channels – Q, when receives a marker M along its incoming channel c, If it was not in recording state, – Start recording Q’s state – Record c’s state as empty state – Pushes the Marker M onto all its outgoing channels If it was already in recording state, – Stops recording on incoming channel c, and records the state of c as the sequence of messages received since Q started recording and before Marker M is received on this channel When Q receives markers on all its incoming channels, stop recording the state of Q and its incoming channels and ‘call it a day’ for Q.

Properties of recorded global state

Stability Detection Algorithm – Start:definite=false, y(S 0 )=false – Repeat:record S*, definite=y(S*) Implications of “definite” – definite == false:can not say YES/NO stability – definite == true:stable property at termination Correctness – S 0 can lead to S*, S* can lead to S t – for all j: y(S j ) = y(S j +1)

Discussion ???

Other References WikiPedia Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms