Cloud Computing Concepts

Slides:



Advertisements
Similar presentations
Global States.
Advertisements

Distributed Snapshots: Determining Global States of Distributed Systems Joshua Eberhardt Research Paper: Kanianthra Mani Chandy and Leslie Lamport.
CS 542: Topics in Distributed Systems Diganta Goswami.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Reliable Multicast Steve Ko Computer Sciences and Engineering University at Buffalo.
Uncoordinated Checkpointing The Global State Recording Algorithm.
Uncoordinated Checkpointing The Global State Recording Algorithm Cristian Solano.
Time and Global States Part 3 ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.
Synchronization Chapter clock synchronization * 5.2 logical clocks * 5.3 global state * 5.4 election algorithm * 5.5 mutual exclusion * 5.6 distributed.
CS542 Topics in Distributed Systems Diganta Goswami.
Distributed Computing 5. Snapshot Shmuel Zaks ©
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Logical Clocks and Global State.
Distributed Systems Dinesh Bhat - Advanced Systems (Some slides from 2009 class) CS 6410 – Fall 2010 Time Clocks and Ordering of events Distributed Snapshots.
CS 582 / CMPE 481 Distributed Systems
Causality & Global States. P1 P2 P Physical Time 4 6 Include(obj1 ) obj1.method() P2 has obj1 Causality violation occurs when order.
Ordering and Consistent Cuts Presented By Biswanath Panda.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Distributed Systems Fall 2009 Logical time, global states, and debugging.
CPSC 668Set 12: Causality1 CPSC 668 Distributed Algorithms and Systems Fall 2009 Prof. Jennifer Welch.
Slides for Chapter 10: Time and Global State
20101 Synchronization in distributed systems A collection of independent computers that appears to its users as a single coherent system.
Ordering and Consistent Cuts Presented by Chi H. Ho.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
Composition Model and its code. bound:=bound+1.
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Fall 2003 URL: Logical Clocks and Global State.
Lecture 3-1 Computer Science 425 Distributed Systems Lecture 3 Logical Clock and Global States/ Snapshots Reading: Chapter 11.4&11.5 Klara Nahrstedt.
1 Distributed Systems CS 425 / CSE 424 / ECE 428 Global Snapshots Reading: Sections 11.5 (4 th ed), 14.5 (5 th ed)  2010, I. Gupta, K. Nahrtstedt, S.
Lecture 8-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) September 16, 2010 Lecture 8 The Consensus.
Lecture 6-1 Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2013 Indranil Gupta (Indy) September 12, 2013 Lecture 6 Global Snapshots Reading:
Distributed Snapshot. Think about these -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes?
“Virtual Time and Global States of Distributed Systems”
Communication & Synchronization Why do processes communicate in DS? –To exchange messages –To synchronize processes Why do processes synchronize in DS?
Distributed Systems Fall 2010 Logical time, global states, and debugging.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
CS294, Yelick Consensus revisited, p1 CS Consensus Revisited
Distributed Snapshot. One-dollar bank Let a $1 coin circulate in a network of a million banks. How can someone count the total $ in circulation? If not.
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 5 Instructor: Haifeng YU.
Chapter 21 Asynchronous Network Computing with Process Failures By Sindhu Karthikeyan.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
CSE 486/586 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Lecture 4-1 Computer Science 425 Distributed Systems (Fall2009) Lecture 4 Chandy-Lamport Snapshot Algorithm and Multicast Communication Reading: Section.
1 Chapter 11 Global Properties (Distributed Termination)
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Lecture 9: Multicast Sep 22, 2015 All slides © IG.
Distributed Systems Lecture 6 Global states and snapshots 1.
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
Global state and snapshot
CSE 486/586 Distributed Systems Reliable Multicast --- 1
PROTOCOL CORRECTNESS Tutorial 3 Theoretical
Global state and snapshot
Vector Clocks and Distributed Snapshots
CSE 486/586 Distributed Systems Global States
Global State and Gossip
EECS 498 Introduction to Distributed Systems Fall 2017
湖南大学-信息科学与工程学院-计算机与科学系
Slides for Chapter 14: Time and Global States
Time And Global Clocks CMPT 431.
Chapter 5 (through section 5.4)
Slides for Chapter 11: Time and Global State
Slides for Chapter 14: Time and Global States
ITEC452 Distributed Computing Lecture 8 Distributed Snapshot
CSE 486/586 Distributed Systems Global States
Jenhui Chen Office number:
CIS825 Lecture 5 1.
Slides for Chapter 14: Time and Global States
CSE 486/586 Distributed Systems Reliable Multicast --- 1
Chandy-Lamport Example
Distributed Snapshot.
Presentation transcript:

Cloud Computing Concepts Indranil Gupta (Indy) Topic: Snapshots Lecture A: What is a Global Snapshot? All slides © IG

Here’s a Snapshot Wikimedia commons

Distributed Snapshot More often, each country’s representative is sitting in their respective capital, and sending messages to each other (say emails). How do you calculate a “global snapshot” in that distributed system? What does a “global snapshot” even mean?

In the Cloud In a cloud: each application or service is running on mulitple servers Servers handling concurrent events and interacting with each other The ability to obtain a “global photograph” of the system is important Some uses of having a global picture of the system Checkpointing: can restart distributed application on failure Garbage collection of objects: objects at servers that don’t have any other objects (at any servers) with pointers to them Deadlock detection: Useful in database transaction systems Termination of computation: Useful in batch computing systems like Folding@Home, SETI@Home

What’s a Global Snapshot? Global Snapshot = Global State = Individual state of each process in the distributed system + Individual state of each communication channel in the distributed system Capture the instantaneous state of each process And the instantaneous state of each communication channel, i.e., messages in transit on the channels

Obvious First Solution Synchronize clocks of all processes Ask all processes to record their states at known time t Problems? Time synchronization always has error Your bank might inform you, “We lost the state of our distributed cluster due to a 1 ms clock skew in our snapshot algorithm.” Also, does not record the state of messages in the channels Again: synchronization not required – causality is enough!

Example Pi Pj Cij Cji

Pi Pj Cij Cji [$1000, 100 iPhones] [$600, 50 Androids] [empty] [Global Snapshot 0]

Pi Pj Cij Cji [$701, 100 iPhones] [$600, 50 Androids] [empty] [$299, Order Android ] [Global Snapshot 1]

Pi Pj Cij Cji [$701, 100 iPhones] [$101, 50 Androids] [$499, Order iPhone] [Global Snapshot 2] [$299, Order Android ]

Pi Pj Cij Cji [$1200, 1 iPhone order from Pj, 100 iPhones] [$101, 50 Androids] [empty] [Global Snapshot 3] [$299, Order Android ]

[ ($299, Order Android ), (1 iPhone) ] Pi Pj Cij Cji [$1200, 100 iPhones] [$101, 50 Androids] [Global Snapshot 4] [empty]

[ (1 iPhone) ] Pi Pj Cij Cji [$1200, 100 iPhones] [$400, 1 Android order from Pi, 50 Androids] [Global Snapshot 5] [empty]

[empty] Pi Pj Cij Cji [$1200, 100 iPhones] [$400, 1 Android order from Pi, 50 Androids, 1 iPhone] [Global Snapshot 6] … and so on …

Moving from State to State Whenever an event happens anywhere in the system, the global state changes Process receives message Process sends message Process takes a step State to state movement obeys causality Next: Causal algorithm for Global Snapshot calculation

Cloud Computing Concepts Indranil Gupta (Indy) Topic: Snapshots Lecture B: Global Snapshot Algorithm

System Model Problem: Record a global snapshot (state for each process, and state for each channel) System Model: N processes in the system There are two uni-directional communication channels between each ordered process pair : Pj  Pi and Pi  Pj Communication channels are FIFO-ordered First in First out No failure All messages arrive intact, and are not duplicated Other papers later relaxed some of these assumptions

Requirements Snapshot should not interfere with normal application actions, and it should not require application to stop sending messages Each process is able to record its own state Process state: Application-defined state or, in the worst case: its heap, registers, program counter, code, etc. (essentially the coredump) Global state is collected in a distributed manner Any process may initiate the snapshot We’ll assume just one snapshot run for now

Chandy-Lamport Global Snapshot Algorithm First, Initiator Pi records its own state Initiator process creates special messages called “Marker” messages Not an application message, does not interfere with application messages for j=1 to N except i Pi sends out a Marker message on outgoing channel Cij (N-1) channels Starts recording the incoming messages on each of the incoming channels at Pi: Cji (for j=1 to N except i)

Chandy-Lamport Global Snapshot Algorithm (2) Whenever a process Pi receives a Marker message on an incoming channel Cji if (this is the first Marker Pi is seeing) Pi records its own state first Marks the state of channel Cji as “empty” for j=1 to N except i Pi sends out a Marker message on outgoing channel Cij Starts recording the incoming messages on each of the incoming channels at Pi: Cji (for j=1 to N except i) else // already seen a Marker message Mark the state of channel Cji as all the messages that have arrived on it since recording was turned on for Cji

Chandy-Lamport Global Snapshot Algorithm (3) The algorithm terminates when All processes have received a Marker To record their own state All processes have received a Marker on all the (N-1) incoming channels at each To record the state of all channels Then, (if needed), a central server collects all these partial state pieces to obtain the full global snapshot

Example Example A B C D E P1 Time E F G P2 H I J P3 Message Instruction or Step

A B C D E P1 Time E F G P2 H I J P3 P1 is Initiator: Record local state S1, Send out markers Turn on recording on channels C21, C31 P2 Time P1 P3 A B C D E E F G H I J

A B C D E P1 Time E F G P2 H I J P3 S1, Record C21, C31 First Marker! Record own state as S3 Mark C13 state as empty Turn on recording on other incoming C23 Send out Markers

A B C D E P1 Time E F G P2 H I J P3 S1, Record C21, C31 S3

A B C D E P1 Time E F G P2 H I J P3 Duplicate Marker! State of channel C31 = < > S1, Record C21, C31 P2 Time P1 P3 A B C D E E F G H I J S3 C13 = < > Record C23

A B C D E P1 Time E F G P2 H I J P3 C31 = < > S1, Record C21, C31 C31 = < > P2 Time P1 P3 A B C D E E F G H I J First Marker! Record own state as S2 Mark C32 state as empty Turn on recording on C12 Send out Markers S3 C13 = < > Record C23

A B C D E P1 Time E F G P2 H I J P3 C31 = < > S1, Record C21, C31 C31 = < > P2 Time P1 P3 A B C D E E F G H I J S2 C32 = < > Record C12 S3 C13 = < > Record C23

A B C D E P1 Time E F G P2 H I J P3 C31 = < > S1, Record C21, C31 C31 = < > P2 Time P1 P3 A B C D E E F G H I J S2 C32 = < > Record C12 Duplicate! C12 = < > S3 C13 = < > Record C23

A B C D E P1 Time E F G P2 H I J P3 Duplicate! C21 = <message GD > S1, Record C21, C31 C31 = < > P2 Time P1 P3 A B C D E E F G H I J S2 C32 = < > Record C12 C12 = < > S3 C13 = < > Record C23

A B C D E P1 Time E F G P2 H I J P3 C21 = <message GD > S1, Record C21, C31 C31 = < > P2 Time P1 P3 A B C D E E F G H I J S2 C32 = < > Record C12 C12 = < > S3 C13 = < > Record C23 Duplicate! C23 = < >

Algorithm has Terminated C21 = <message GD > S1 C31 = < > P2 Time P1 P3 A B C D E E F G H I J S2 C32 = < > C12 = < > S3 C13 = < > C23 = < >

Collect the Global Snapshot Pieces C21 = <message GD > S1 C31 = < > P2 Time P1 P3 A B C D E E F G H I J S2 C32 = < > C12 = < > S3 C13 = < > C23 = < >

Next Global Snapshot calculated by Chandy-Lamport algorithm is causally correct What?

Cloud Computing Concepts Indranil Gupta (Indy) Topic: Snapshots Lecture C: Consistent Cuts

Cuts Cut = time frontier at each process and at each channel Events at the process/channel that happen before the cut are “in the cut” And happening after the cut are “out of the cut”

Consistent Cuts Consistent Cut: a cut that obeys causality A cut C is a consistent cut if and only if: for (each pair of events e, f in the system) Such that event e is in the cut C, and if f  e (f happens-before e) Then: Event f is also in the cut C

Example A B C D E P1 Time E F G P2 H I J P3 Inconsistent Cut G  D, but only D is in cut Consistent Cut

Our Global Snapshot Example … Time P1 P3 A B C D E E F G H I J S1 S3 C13 = < > C31 = < > S2 C32 = < > C12 = < > C21 = <message GD > C23 = < >

… is causally correct A B C D E P1 Time E F G P2 H I J P3 C21 = <message GD > S2 C32 = < > S3 C13 = < > Consistent Cut captured by our Global Snapshot Example C23 = < >

In fact… Any run of the Chandy-Lamport Global Snapshot algorithm creates a consistent cut

Chandy-Lamport Global Snapshot algorithm creates a consistent cut Let’s quickly look at the proof Let ei and ej be events occurring at Pi and Pj, respectively such that ei  ej (ei happens before ej) The snapshot algorithm ensures that if ej is in the cut then ei is also in the cut. That is: if ej  <Pj records its state>, then it must be true that ei  <Pi records its state>.

Chandy-Lamport Global Snapshot algorithm creates a consistent cut if ej  <Pj records its state>, then it must be true that ei  <Pi records its state>. By contradiction, suppose ej  <Pj records its state> and <Pi records its state>  ei Consider the path of app messages (through other processes) that go from ei  ej Due to FIFO ordering, markers on each link in above path will precede regular app messages Thus, since <Pi records its state>  ei , it must be true that Pj received a marker before ej Thus ej is not in the cut => contradiction

Next What is the Chandy-Lamport algorithm used for?

Cloud Computing Concepts Indranil Gupta (Indy) Topic: Snapshots Lecture D: Safety and Liveness

“Correctness” in Distributed Systems Can be seen in two ways Liveness and Safety Often confused – it’s important to distinguish from each other

Liveness Liveness = guarantee that something good will happen, eventually Eventually == does not imply a time bound, but if you let the system run long enough, then …

Liveness: Examples Liveness = guarantee that something good will happen, eventually Eventually == does not imply a time bound, but if you let the system run long enough, then … Examples in Real World Guarantee that “at least one of the atheletes in the 100m final will win gold” is liveness A criminal will eventually be jailed Examples in a Distributed System Distributed computation: Guarantee that it will terminate “Completeness” in failure detectors: every failure is eventually detected by some non-faulty process In Consensus: All processes eventually decide on a value

Safety Safety = guarantee that something bad will never happen

Safety: Examples Safety = guarantee that something bad will never happen Examples in Real World A peace treaty between two nations provides safety War will never happen An innocent person will never be jailed Examples in a Distributed System There is no deadlock in a distributed transaction system No object is orphaned in a distributed object system “Accuracy” in failure detectors In Consensus: No two processes decide on different values

Can’t we Guarantee both? Can be difficult to satisfy both liveness and safety in an asynchronous distributed system! Failure Detector: Completeness (Liveness) and Accuracy (Safety) cannot both be guaranteed by a failure detector in an asynchronous distributed system Consensus: Decisions (Liveness) and correct decisions (Safety) cannot both be guaranteed by any consensus protocol in an asynchronous distributed system Very difficult for legal systems (anywhere in the world) to guaranteed that all criminals are jailed (Liveness) and no innocents are jailed (Safety)

In the language of Global States Recall that a distributed system moves from one global state to another global state, via causal steps Liveness w.r.t. a property Pr in a given state S means S satisfies Pr, or there is some causal path of global states from S to S’ where S’ satisfies Pr Safety w.r.t. a property Pr in a given state S means S satisfies Pr, and all global states S’ reachable from S also satisfy Pr

Using Global Snapshot Algorithm Chandy-Lamport algorithm can be used to detect global properties that are stable Stable = once true, stays true forever afterwards Stable Liveness examples Computation has terminated Stable Non-Safety examples There is a deadlock An object is orphaned (no pointers point to it) All stable global properties can be detected using the Chandy-Lamport algorithm Due to its causal correctness

Summary The ability to calculate global snapshots in a distributed system is very important But don’t want to interrupt running distributed application Chandy-Lamport algorithm calculates global snapshot Obeys causality (creates a consistent cut) Can be used to detect stable global properties Safety vs. Liveness