Networks and Distributed Snapshot Sukumar Ghosh Department of Computer Science University of Iowa.


Contents
Part 1. The evolution of network topologies
Part 2. Distributed snapshot
Part 3. Tolerating failures

Random Graphs
How a connected topology evolves in the real world:
-- Erdős-Rényi graphs (ER graphs)
-- Power-law graphs
-- Small-world graphs

Random graphs: Erdős-Rényi model
The ER model is one of several models of random graphs. It presents a theory of how social webs are formed.
-- Start with a set of $n$ isolated nodes.
-- Connect each pair of nodes with a probability $p$.
The resulting graph is known as $G(n, p)$.

Erdős-Rényi model
The ER model $G(n, p)$ is different from the $G(n, m)$ model. The $G(n, m)$ model randomly selects one from the entire family of graphs with $n$ nodes and $m$ edges.

Properties of ER graphs
Property 1. The expected number of edges is $\frac{n(n-1)}{2} \cdot p$.
Property 2. The expected degree per node is $(n-1) \cdot p$.
Property 3. The expected diameter of $G(n, p)$ is $\log_{deg} n = \frac{\log n}{\log deg}$ [deg = expected degree of a node].

Diameter of a network
Let $d(i, j)$ denote the length of the shortest path between a pair of nodes $i$ and $j$. Over all such pairs of nodes, the largest value of $d(i, j)$ is known as the diameter of the network.

Degree distribution in random graphs
The probability that a node connects with a given set of $k$ nodes (and not with the remaining $n-1-k$ nodes) is $p^k (1-p)^{n-1-k}$. One can choose $k$ out of the remaining $n-1$ nodes in $\binom{n-1}{k}$ ways. So the probability distribution is
$P(k) = \binom{n-1}{k} \, p^k (1-p)^{n-1-k}$ (binomial distribution)
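The binomial degree distribution above can be checked numerically. The following is a minimal sketch; the function name `degree_pmf` and the values of `n` and `p` are illustrative choices, not part of the slides.

```python
from math import comb

def degree_pmf(n: int, p: float, k: int) -> float:
    """Probability that a given node of G(n, p) has degree k: Binomial(n-1, p)."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

n, p = 50, 0.1                       # illustrative parameters
pmf = [degree_pmf(n, p, k) for k in range(n)]

# The probabilities sum to 1, and the mean matches the expected degree (n-1)p.
total = sum(pmf)
expected_degree = sum(k * pk for k, pk in enumerate(pmf))
print(round(total, 6), round(expected_degree, 6))
```

The computed mean agrees with the expected degree per node, $(n-1)p$, which is the standard mean of a Binomial($n-1$, $p$) variable.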

Degree distribution in random graphs N(k) = Number of nodes with degree k

Properties of ER graphs
-- When $p \ll 1/n$, an ER graph is a collection of disjoint trees.
-- When $p = c/n$ ($c > 1$), suddenly one giant (connected) component emerges. Other components have a much smaller size. [Phase change]

Properties of ER graphs
When $p = c \log n / n$ ($c > 1$), the graph is almost always connected. These give “ideas” about how a network can evolve. But not all random topologies are ER graphs! For example, social networks are often “clustered”, but ER graphs have a poor (i.e. very low) clustering coefficient. (What is a clustering coefficient?)

Clustering coefficient
For a given node, its local clustering coefficient (CC) measures what fraction of the pairs of its neighbors are neighbors of each other.
[Figure: example graph on nodes A, B, C, D, E]
CC(B) = 3/6 = ½. B’s neighbors are {A, C, D, E}. Only (A,D), (D,E), (E,C) are connected.
CC(D) = 2/3 = CC(E)
The CC of a graph is the mean of the CC of its nodes.
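The local clustering coefficient can be computed directly from an adjacency structure. In the sketch below, the edge list is an assumption: it is inferred from the values quoted on the slide (B's neighbors and the three connected pairs), since the figure itself is not available.

```python
from itertools import combinations

# Assumed edge set, reconstructed from the values stated on the slide.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("B", "E"),
         ("A", "D"), ("D", "E"), ("E", "C")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def local_cc(node):
    """Fraction of pairs of neighbors of `node` that are themselves adjacent."""
    nbrs = sorted(adj[node])
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(nbrs, 2))
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)

def graph_cc():
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_cc(v) for v in adj) / len(adj)

for v in sorted(adj):
    print(v, local_cc(v))
print("graph CC:", graph_cc())
```

With this assumed edge set, the computed values for B, D and E agree with the ones given on the slide.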

The connectors
Malcolm Gladwell, a staff writer at the New Yorker magazine, describes in his book The Tipping Point a simple experiment to measure how social a person is.
-- He started with a list of 248 last names.
-- A person scores a point if he or she knows someone with a last name from this list. If he/she knows three persons with the same last name, then he/she scores 3 points.

The connectors (Outcome of the Tipping Point experiment)
Altogether 400 persons from different groups were tested. The scores found were
-- (min) 9, (max) 118 {from a random sample}
-- (min) 16, (max) 108 {from a highly homogeneous group}
-- (min) 2, (max) 95 {from a college class}
[Conclusion: Some people are very social, even in small or homogeneous samples. They are connectors.]

Connectors
Barabási observed that connectors are not unique to human society: they appear in many complex networks, ranging from biology to computer science, where some nodes have an anomalously large number of links. This was not quite expected in ER graphs. The world wide web, the ultimate forum of democracy, is not a random network, as Barabási’s web-mapping project revealed.

Anatomy of the World Wide Web
Barabási experimented with the Univ. of Notre Dame’s web.
-- 325,000 pages
-- 270,000 pages (i.e. 82%) had three or fewer links
-- 42 had more than 1,000 incoming links each.
-- The entire WWW exhibited even more disparity. 90% had ≤ 10 links, whereas a few (4-5) like Yahoo were referenced by close to a million pages!
These are the hubs of the web. They help create short paths between nodes (mean distance = 19 for the WWW, obtained via extrapolation). (Some dispute this figure now.)

Power law graph
The degree distribution of the web pages in the World Wide Web follows a power law. In a power-law graph, the number of nodes $N(k)$ with degree $k$ satisfies the condition $N(k) = C \cdot k^{-\gamma}$ for constants $C$ and $\gamma > 0$. Such a graph is also known as a scale-free graph. Other examples are
-- Income and number of people with that income
-- Magnitude and number of earthquakes of that magnitude
-- Population and number of cities with that population

Random vs. Power-law Graphs
[Figure: the degree distribution of the web pages in the World Wide Web follows a power law]

Random vs. Power-law networks

Example: Airline Routes Think of how new routes are added to an existing network

Preferential attachment
[Figure: a new node joining an existing network in which the sum of the node degrees = 8]
A new node connects with an existing node with a probability proportional to its degree. Also known as the “rich get richer” policy.

Preferential attachment continued Barabási and Albert showed that when large networks are formed via preferential attachment, the resulting graph exhibits a power-law distribution of the node degrees.
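Preferential attachment is easy to simulate. The sketch below makes simplifying assumptions that are not in the slides: each new node adds exactly one edge, and the network starts from a single edge between nodes 0 and 1. The names `preferential_attachment` and `endpoints` are illustrative.

```python
import random
from collections import Counter

def preferential_attachment(n, seed=0):
    """Grow a graph to n nodes; each new node links to one existing node
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}      # assumed seed network: a single edge 0-1
    endpoints = [0, 1]         # each node appears once per incident edge
    for new in range(2, n):
        # A uniform pick from the endpoint list is a degree-proportional pick.
        target = rng.choice(endpoints)
        degree[target] += 1
        degree[new] = 1
        endpoints += [new, target]
    return degree

degree = preferential_attachment(10_000)
histogram = Counter(degree.values())   # N(k): number of nodes with degree k
for k in sorted(histogram)[:5]:
    print(k, histogram[k])
```

Plotting `histogram` on log-log axes is one way to inspect whether the degree counts fall roughly on a straight line, as a power law would.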

Other properties of power law graphs
-- Graphs following a power-law distribution have a small diameter $d \sim \ln \ln n$ (n = number of nodes).
-- The clustering coefficient decreases as the node degree increases (power law).
-- Graphs following a power-law distribution tend to be highly resilient to random edge removal, but quite vulnerable to targeted attacks on the hubs.

The small-world model
Due to Watts and Strogatz (1998). They followed up on Milgram’s work (on six degrees of separation) and reasoned about why there is a small degree of separation between individuals in a social network. The research was originally inspired by Watts’s efforts to understand the synchronization of cricket chirps, which show a high degree of coordination over long ranges, as though the insects are being guided by an invisible conductor. Disease spreads faster over a small-world network.

Questions not answered by Milgram Milgram’s experiment tried to validate the theory of six degrees of separation between any two individuals on the planet. Why six degrees of separation? Any scientific reason? What properties do these social graphs have? Are there other situations in which this model is applicable? Time to reverse engineer this.

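What are small-world graphs?
[Figure: three graphs, from completely regular, through small-world, to completely random]
Small-world graphs: n >> k > ln(n) > 1, where n = number of nodes and k = number of neighbors of each node.

Completely regular
A ring lattice. The clustering coefficient is high, but the diameter grows linearly with n (roughly n/k). Diameter is too large!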

Completely random Diameter is small, but the Clustering coefficient is small too!

Small-world graphs Start with the regular graph, and with probability p rewire each link to a randomly selected node. It results in a graph that has high clustering coefficient but low diameter …
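The rewiring step can be sketched as follows. This is a simplified illustration, not the exact procedure of Watts and Strogatz: the helper names, the choice n = 200 and k = 4, and the rule that a rewired link keeps one endpoint and never duplicates an existing link are all assumptions made here.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbors on a ring (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

def rewire(adj, k, p, seed=0):
    """With probability p, replace link (i, i+j) by a link from i to a random node."""
    rng = random.Random(seed)
    n = len(adj)
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    for i in range(n):
        for j in range(1, k // 2 + 1):
            old = (i + j) % n
            if old in adj[i] and rng.random() < p:
                choices = [v for v in range(n) if v != i and v not in adj[i]]
                if choices:
                    new = rng.choice(choices)
                    adj[i].discard(old); adj[old].discard(i)
                    adj[i].add(new); adj[new].add(i)
    return adj

def mean_cc(adj):
    """Mean local clustering coefficient."""
    total = 0.0
    for nbrs in adj.values():
        nb = list(nbrs)
        d = len(nb)
        if d >= 2:
            links = sum(1 for a in range(d) for b in range(a + 1, d)
                        if nb[b] in adj[nb[a]])
            total += links / (d * (d - 1) / 2)
    return total / len(adj)

def avg_path_length(adj):
    """Mean BFS distance over all reachable ordered pairs."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice = ring_lattice(200, 4)
small_world = rewire(lattice, 4, p=0.1)
print(mean_cc(lattice), avg_path_length(lattice))
print(mean_cc(small_world), avg_path_length(small_world))
```

Comparing the two printed lines shows the effect of a small rewiring probability on clustering and on the average path length.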

Small-world graphs
[Figure: range of rewiring probability over which the small-world properties hold]

Limitation of Watts-Strogatz model
Jon Kleinberg argued that the Watts-Strogatz small-world model illustrates the existence of short paths between pairs of nodes, but it does not give any clue about how those short paths will be discovered. A greedy search for the destination will not lead to the discovery of these short paths.

Kleinberg’s Small-World Model
Consider an $n \times n$ grid. Each node has a link to every node at lattice distance $p$ (short range neighbors) and $q$ long range links. Choose long-range links at lattice distance $d$ with a probability proportional to $d^{-r}$.
[Figure: an $n \times n$ grid with r = 2, p = 1, q = 2]

Results
Theorem 1. There is a constant $\alpha_0$ (depending on $p$ and $q$ but independent of $n$), such that when $r = 0$, the expected delivery time of any decentralized algorithm is at least $\alpha_0 \cdot n^{2/3}$.

More results
Theorem 2. There is a decentralized algorithm A and a constant $\alpha_2$ dependent on $p$ and $q$ but independent of $n$, so that when $r = 2$ and $p = q = 1$, the expected delivery time of A is at most $\alpha_2 \cdot (\log n)^2$.

Variation of search time with r
[Plot: log T versus the exponent r]

Distributed Snapshot

Think about these
-- How many messages are in transit on the internet?
-- What is the total cash reserve in the Bank of America?
-- How many cars are on the streets of Kolkata now?
-- How much pollutant is there in the air (or water) now?
-- What are most people in the US thinking about the election?
How do we compute these?

UAV surveillance of traffic

Importance of snapshots Major uses in - data collection - surveillance - deadlock detection - termination detection - rollback recovery - global predicate computation

Importance of snapshots A snapshot may consist of the internal states of the recording processes, or it may consist of the state of external shared objects updated by an updater process.

Distributed Snapshot: First Case Assume that the snapshot consists of the internal states of the recording processes. The main issue is synchronization. An ad hoc combination of the local snapshots will not lead to a meaningful distributed snapshot.

One-dollar bank Let a $1 coin circulate in a network of a million banks. How can someone count the total $ in circulation? If not counted “properly,” then one may think the total $ in circulation to be one million.

Review Causal Ordering
Causality helps identify sequential and concurrent events in distributed systems, since clocks are not always reliable.
1. Local ordering: a → b → c (based on the local clock)
2. Message sent → message received [Thus joke → Re: joke]
3. If a → b and b → c then a → c
(→ denotes the causally-ordered-before, or happened-before, relation)

Consistent cut
A cut is a set of events. If a cut C is consistent, then $(a \in C) \wedge (b \rightarrow a) \Rightarrow b \in C$. If this is not true, then the cut C is inconsistent.

Consistent snapshot
The set of states immediately following the events (actions) in a consistent cut forms a consistent snapshot of a distributed system. A snapshot that is of practical interest is the most recent one. Let C1 and C2 be two consistent cuts and $C1 \subset C2$. Then C2 is more recent than C1. Analyze why certain cuts in the one-dollar bank are inconsistent.
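A cut is consistent when it is closed under the happened-before relation: if an event is in the cut, so is every event that causally precedes it. The sketch below checks this for cuts that contain a prefix of each process's local history, so only message edges need to be examined. The representation (`cut` as a per-process event count, `messages` as tuples) is an assumption made for illustration.

```python
def is_consistent(cut, messages):
    """cut[p] = number of events of process p included in the cut
    (a prefix of p's local history, so local ordering is respected).
    messages = list of (sender, send_idx, receiver, recv_idx), 1-based indices.
    The cut is inconsistent if some receive is inside it while the
    matching send is outside."""
    for sender, send_idx, receiver, recv_idx in messages:
        received_in_cut = recv_idx <= cut[receiver]
        sent_in_cut = send_idx <= cut[sender]
        if received_in_cut and not sent_in_cut:
            return False
    return True

# One-dollar bank with two banks: bank 0's first event sends the coin,
# bank 1's first event receives it.
messages = [(0, 1, 1, 1)]

print(is_consistent({0: 0, 1: 0}, messages))  # coin still at bank 0
print(is_consistent({0: 1, 1: 0}, messages))  # coin in transit
print(is_consistent({0: 0, 1: 1}, messages))  # receive without send
```

The third cut records bank 1 after the receive but bank 0 before the send, which is exactly the situation in which the coin would be counted twice.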

Consistent snapshot How to record a consistent snapshot? Note that 1. The recording must be non-invasive. 2. Recording must be done on-the-fly. You cannot stop the system.

Chandy-Lamport Algorithm
Works on (1) a strongly connected graph in which (2) each channel is FIFO. An initiator initiates the algorithm by sending out a marker.

White and red processes
Initially every process is white. When a process receives a marker, it turns red and remains red. Every action by a process, and every message sent by a process, gets the color of that process. So,
white action = action by a white process
red action = action by a red process
white message = message sent by a white process
red message = message sent by a red process

Two steps
Step 1. In one atomic action, the initiator
(a) turns red
(b) records its own state
(c) sends a marker along all outgoing channels
Step 2. Every other process, upon receiving a marker for the first time (and before doing anything else)
(a) turns red
(b) records its own state
(c) sends markers along all outgoing channels
The algorithm terminates when (1) every process turns red, and (2) every process has received a marker through each incoming channel.
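The two steps can be exercised on a small token-passing system. The following is a single-threaded simulation sketch under stated assumptions: a complete graph of `n` processes, FIFO channels modeled as deques, a random scheduler, and channel states recorded as the white messages that a red process receives before the marker on that channel. All names (`Process`, `snapshot`, `snapshot_total`) are illustrative.

```python
import random
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, pid, tokens):
        self.pid = pid
        self.tokens = tokens
        self.recorded = None       # recorded local state; None while white
        self.chan_state = {}       # sender -> tokens recorded in transit
        self.marker_seen = set()   # incoming channels that delivered a marker

def snapshot(n=3, tokens=1, warmup=5, seed=0, max_steps=100_000):
    rng = random.Random(seed)
    procs = [Process(i, tokens if i == 0 else 0) for i in range(n)]
    chans = {(s, d): deque() for s in range(n) for d in range(n) if s != d}

    def turn_red(p):
        # One atomic action: record own state, then send markers everywhere.
        p.recorded = p.tokens
        p.chan_state = {s: 0 for s in range(n) if s != p.pid}
        for d in range(n):
            if d != p.pid:
                chans[(p.pid, d)].append(MARKER)

    def done():
        return all(p.recorded is not None and len(p.marker_seen) == n - 1
                   for p in procs)

    for step in range(max_steps):
        if step == warmup:
            turn_red(procs[0])                     # process 0 is the initiator
        if done():
            return procs
        p = rng.choice(procs)
        incoming = [s for s in range(n) if s != p.pid and chans[(s, p.pid)]]
        if p.tokens > 0 and (not incoming or rng.random() < 0.5):
            d = rng.choice([x for x in range(n) if x != p.pid])
            p.tokens -= 1
            chans[(p.pid, d)].append("TOKEN")      # send a token
        elif incoming:
            s = rng.choice(incoming)
            msg = chans[(s, p.pid)].popleft()      # FIFO receive
            if msg == MARKER:
                if p.recorded is None:
                    turn_red(p)
                p.marker_seen.add(s)
            else:
                p.tokens += 1
                if p.recorded is not None and s not in p.marker_seen:
                    p.chan_state[s] += 1           # white message, red receiver
    raise RuntimeError("snapshot did not terminate within max_steps")

def snapshot_total(procs):
    """Tokens in recorded process states plus recorded channel states."""
    return sum(p.recorded + sum(p.chan_state.values()) for p in procs)

print(snapshot_total(snapshot()))
```

In this sketch the recorded process states plus the recorded channel states account for the circulating token, regardless of where it was when the markers passed.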

Why does it work? Lemma 1. No red message is received in a white action.

Why does it work?
Theorem. The global state recorded by the Chandy-Lamport algorithm is equivalent to the ideal snapshot state SSS.
Hint. A pair of actions (a, b) can be scheduled in any order if there is no causal order between them, so (a; b) is equivalent to (b; a).
[Figure: easy conceptualization of the snapshot state SSS, the boundary between all white and all red]

Why does it work?
Let an observer observe the following actions:
w[i] w[k] r[k] w[j] r[i] w[l] r[j] r[l] …
≡ w[i] w[k] w[j] r[k] r[i] w[l] r[j] r[l] … [Lemma 1]
≡ w[i] w[k] w[j] r[k] w[l] r[i] r[j] r[l] … [Lemma 1]
≡ w[i] w[k] w[j] w[l] r[k] r[i] r[j] r[l] … [done!]
Recorded state: the state reached after all the white actions and before any red action.

Example 1: Count the tokens
Let us verify that the Chandy-Lamport snapshot algorithm correctly counts the tokens circulating in the system.
[Figure: processes A, B, C passing a token, with two candidate cuts labeled 1 and 2] Are these consistent cuts?
How to account for the channel states? Compute this using the sent and received variables for each process.

Example 2: Communicating State Machines

Something unusual
Let machine i start the Chandy-Lamport snapshot before it has sent M along ch1. Also, let machine j receive the marker after it sends out M’ along ch2. Observe that the snapshot state is SSS = (down, ∅, up, M’). Doesn’t this appear strange? This state was never reached during the computation!

Understanding snapshot

The observed state is a feasible state that is reachable from the initial configuration. It may not actually be visited during a specific execution. The final state of the original computation is always reachable from the observed state.

Discussions What good is a snapshot if that state has never been visited by the system? - It is relevant for the detection of stable predicates. - Useful for checkpointing.

Discussions
What if the channels are not FIFO? Study how the Lai-Yang algorithm works. It does not use any marker.
LY1. The initiator records its own state. When it needs to send a message m to another process, it sends a message (m, red).
LY2. When a process receives a message (m, red), it records its state if it has not already done so, and then accepts the message m.
Question 1. Why will it work?
Question 2. Are there any limitations of this approach?

Food for thought Distributed snapshot = distributed read. Distributed reset = distributed write How difficult is distributed reset?

Distributed debugging (Marzullo and Neiger, 1991)
[Figure: each process of the distributed system reports every event e, together with its vector clock VC(e), to an observer]

Distributed debugging
Uses vector clocks. $S_{ij}$ is a global state after the i-th action by process 0 and the j-th action by process 1.

Distributed debugging
Possibly ϕ: At least one consistent global state S is reachable from the initial global state, such that ϕ(S) = true.
Definitely ϕ: All computations pass through some consistent global state S such that ϕ(S) = true.
Never ϕ: No computation passes through any consistent global state S such that ϕ(S) = true.
Definitely ϕ ⇒ Possibly ϕ

Examples
-- ϕ = (x+y = 12), true at $S_{21}$: Possibly ϕ
-- ϕ = (x+y > 15), true at $S_{31}$: Definitely ϕ
-- ϕ = (x = y = 5), true at $S_{40}$ and $S_{22}$: Never ϕ (neither $S_{40}$ nor $S_{22}$ is a consistent state)
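Vector clocks give the observer a simple test for whether a global state is consistent. The sketch below uses the standard condition that no process may have seen more events of process p than p itself has executed in that state; the function name and the two-process example are assumptions for illustration.

```python
def consistent_state(vcs):
    """vcs[p] is the vector clock of the latest event of process p that is
    included in the global state (all zeros if none is included).
    The state is consistent iff, for all p and q, vcs[q][p] <= vcs[p][p]."""
    n = len(vcs)
    return all(vcs[q][p] <= vcs[p][p] for p in range(n) for q in range(n))

# Two processes: the first action of process 0 sends a message, and the
# first action of process 1 receives it, so that receive carries clock [1, 1].
initial = [0, 0]
send_0 = [1, 0]
recv_1 = [1, 1]

print(consistent_state([initial, initial]))  # S_00
print(consistent_state([send_0, initial]))   # S_10: message in transit
print(consistent_state([initial, recv_1]))   # S_01: receive without send
print(consistent_state([send_0, recv_1]))    # S_11
```

An observer that enumerates states $S_{ij}$ can use such a test to discard the inconsistent ones before evaluating a predicate.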

Distributed Snapshot: Second case
The snapshot consists of the external observations of the recording processes, i.e. distributed snapshots of shared external objects.
1. How many cars are on the streets now?
2. How many trees have been downed by the storm?

Distributed snapshot of shared objects

The first algorithm
[Figure: shared memory locations 0, 1, 2, …, i]

Algorithm double collect
function read
    while true
        X[0..n-1] := collect;
        Y[0..n-1] := collect;
        if ∀ i ∈ {0,..,n-1} location i was not changed between two collects
            then return Y;
end
function update (i,v)
    M[i] := v;
end
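The double-collect idea can be sketched in a few lines. Two assumptions are made here that are not in the pseudocode: each location carries a sequence number so that a change between two collects is detectable even when the same value is rewritten, and concurrency is simulated by an optional `interfere` callback that runs between the two collects. The class and parameter names are hypothetical.

```python
class SharedMemory:
    def __init__(self, n):
        # Each location holds (sequence number, value).
        self.M = [(0, None)] * n

    def update(self, i, v):
        seq, _ = self.M[i]
        self.M[i] = (seq + 1, v)     # bump the sequence number on every write

    def collect(self):
        # Stands in for reading the n locations one at a time.
        return [self.M[i] for i in range(len(self.M))]

    def read(self, interfere=None, max_attempts=None):
        """Repeat double collects until two successive collects agree.
        interfere(attempt) simulates a writer running between the collects.
        max_attempts exists only so that the demonstration can stop."""
        attempt = 0
        while max_attempts is None or attempt < max_attempts:
            x = self.collect()
            if interfere is not None:
                interfere(attempt)
            y = self.collect()
            if x == y:               # no location changed between the collects
                return [v for _, v in y]
            attempt += 1
        return None                  # gave up: the writer never paused

mem = SharedMemory(3)
mem.update(0, "a")
print(mem.read())
# A writer that updates between every pair of collects starves the reader.
print(mem.read(interfere=lambda a: mem.update(2, a), max_attempts=5))
```

The second call illustrates how a persistent writer can keep every double collect from succeeding.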

Limitations of double collect Read may never terminate! Why? We need a better algorithm that guarantees termination.

Coordinated snapshot Engage multiple readers and ask them to record snapshots at the same time. It will work if the writer is sluggish and the clocks are accurately synchronized.

Faulty recorder Assume that there are n recorders. Each records a snapshot and shares with the others, so that each can form a complete snapshot. Easy when all recorders record correctly and transmit the information reliably. But what if one or more recorders are faulty or the communication is error prone?

Distributed Consensus
Consensus is essential for taking coordinated action. How can the recorders reach consensus in the presence of communication failures? The question reduces to the classic Byzantine Generals Problem.

Byzantine Generals Problem Describes and solves the consensus problem on the synchronous model of communication. The network topology is a completely connected graph. Processes undergo byzantine failures, the worst possible kind of failure. Shows the power of the adversary.

Byzantine Generals Problem n generals {0, 1, 2,..., n-1} decide about whether to "attack" or to "retreat" during a particular phase of a war. The goal is to agree upon the same plan of action. Some generals may be "traitors" and therefore send either no input, or send conflicting inputs to prevent the "loyal" generals from reaching an agreement. Devise a strategy, by which every loyal general eventually agrees upon the same plan, regardless of the action of the traitors.

Byzantine Generals
Attack = 1, Retreat = 0. Every general will broadcast his/her judgment to everyone else. These are inputs to the consensus protocol. The traitor may send conflicting input values.
[Figure: generals holding the input vectors {1, 1, 0, 0}, {1, 1, 0, 1} and {1, 1, 0, 0}; one general is a traitor]

Byzantine Generals
We need to devise a protocol so that every peer (call it a lieutenant) receives the same value from any given general (call it a commander). Clearly, the lieutenants will have to use secondary information. Note that the roles of the commander and the lieutenants will rotate among the generals.

Interactive consistency specifications
IC1. Every loyal lieutenant receives the same order from the commander.
IC2. If the commander is loyal, then every loyal lieutenant receives the order that the commander sends.
[Figure: a commander and the lieutenants]

The Communication Model
Oral Messages
1. Messages are not corrupted in transit. (Why? If the message gets altered, then blame the sender.)
2. Messages can be lost, but the absence of a message can be detected.
3. When a message is received (or its absence is detected), the receiver knows the identity of the sender (or the defaulter).
OM(m) represents an interactive consistency protocol in the presence of at most m traitors.

An Impossibility Result
Using oral messages, no solution to the Byzantine Generals problem exists with three or fewer generals and one traitor. Consider the two cases: (a) a lieutenant is the traitor, (b) the commander is the traitor. In (a), to satisfy IC2, lieutenant 1 must trust the commander, but in (b), the same idea leads to the violation of IC1.

Impossibility result (continued ) Using oral messages, no solution to the Byzantine Generals problem exists with 3m or fewer generals and m traitors (m > 0). The proof is by contradiction. Assume that such a solution exists. Now, divide the 3m generals into three groups of m generals each, such that all the traitors belong to one group. Let one general simulate each of these three groups. This scenario is equivalent to the case of three generals and one traitor. We already know that such a solution does not exist.

The OM(m) algorithm
Recursive algorithm: OM(m) → OM(m-1) → OM(m-2) → … → OM(0)
OM(m) = Consensus algorithm with oral messages in the presence of up to m traitors
OM(0) = Direct broadcast

The OM(m) algorithm
1. Commander i sends out a value v (0 or 1).
2. If m > 0, then every lieutenant j ≠ i, after receiving v, acts as a commander and initiates OM(m-1) with everyone except i.
3. Every lieutenant collects (n-1) values: (n-2) values received from the lieutenants using OM(m-1), and one direct value from the commander. Then he picks the majority of these values as the order from i.
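The recursion can be sketched as a small simulation. Several details below are assumptions made for illustration: a traitor always sends alternating bits to its receivers (one possible adversary among many), ties in the majority vote default to 0 (retreat), and the function returns the value each lieutenant settles on for the commander's order.

```python
from collections import Counter

def majority(values):
    """Majority of binary values; ties default to 0 (retreat)."""
    counts = Counter(values)
    return 1 if counts[1] > counts[0] else 0

def om(m, commander, lieutenants, value, traitors):
    """Simulate OM(m). Returns {lieutenant: value it attributes to commander}."""
    # Step 1: the commander sends a value to every lieutenant.
    # Assumed adversary: a traitor sends alternating bits to its receivers.
    received = {}
    for idx, j in enumerate(lieutenants):
        received[j] = (idx % 2) if commander in traitors else value
    if m == 0:
        return received

    # Step 2: each lieutenant j relays its value to the others via OM(m-1).
    relayed = {j: [] for j in lieutenants}
    for j in lieutenants:
        others = [x for x in lieutenants if x != j]
        sub_result = om(m - 1, j, others, received[j], traitors)
        for k, v in sub_result.items():
            relayed[k].append(v)

    # Step 3: majority over the direct value and the n-2 relayed values.
    return {j: majority([received[j]] + relayed[j]) for j in lieutenants}

# n = 4, m = 1: loyal commander 0 orders attack (1); lieutenant 3 is a traitor.
print(om(1, 0, [1, 2, 3], 1, traitors={3}))
# n = 4, m = 1: the commander itself is the traitor.
print(om(1, 0, [1, 2, 3], 1, traitors={0}))
```

In the first run the loyal lieutenants 1 and 2 should be compared against IC2, and in the second run lieutenants 1, 2 and 3 against IC1. The entry computed for a traitor lieutenant is not meaningful.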

Example of OM(1)

Example of OM(2) OM(2) OM(1) OM(0)

Proof of OM(m)
Lemma. Let the commander be loyal, and n > 2m + k, where m = maximum number of traitors. Then OM(k) satisfies IC2.

Proof of OM(m)
Proof. If k = 0, then the result trivially holds. Let it hold for k = r (r ≥ 0), i.e. OM(r) satisfies IC2. We have to show that it holds for k = r + 1 too.
By definition n > 2m + r + 1, so n - 1 > 2m + r. So OM(r) holds for the lieutenants in the bottom row. (“OM(r) holds” means each loyal lieutenant receives identical values from every loyal commander.)
Each loyal lieutenant collects n-m-1 identical good values and m bad values. So the bad values are voted out (n-m-1 > m+r implies n-m-1 > m).

The final theorem Theorem. If n > 3m where m is the maximum number of traitors, then OM (m) satisfies both IC1 and IC2. Proof. Consider two cases: Case 1. Commander is loyal. The theorem follows from the previous lemma (substitute k = m). Case 2. Commander is a traitor. We prove it by induction. Base case. m=0 trivial. (Induction hypothesis) Let the theorem hold for m = r. (Inductive step) We have to show that it holds for m = r+1 too.

Proof (continued)
There are n > 3(r + 1) generals and r + 1 traitors. Excluding the commander, there are > 3r + 2 generals, of which r are traitors. So > 2r + 2 lieutenants are loyal. Since 3r + 2 > 3r, OM(r) satisfies IC1 and IC2.

Proof (continued)
In OM(r+1), a loyal lieutenant chooses the majority from
(1) > 2r + 1 values obtained from the loyal lieutenants via OM(r),
(2) the r values from the traitors, and
(3) the value directly from the commander.
The set of values collected in parts (1) and (3) is the same for all loyal lieutenants: it is the same set of values that these lieutenants received from the commander. Also, by the induction hypothesis, in part (2) each loyal lieutenant receives identical values from each traitor. So every loyal lieutenant eventually collects the same set of values.

Conclusion
1. Distributed snapshot of shared objects can be tricky when the writer does not cooperate.
2. Approximate snapshots are useful for a rough view.
3. Failures add a new twist to the recording of snapshots.
4. Much work remains to be done for the upper layers of snapshot integration. (What can you make out from a trail of Twitter data with not much correlation?)