6.852: Distributed Algorithms Spring, 2008 Instructors: Nancy Lynch, Victor Luchangco Teaching Assistant: Calvin Newport Course Secretary: Joanne Hanley

Distributed algorithms
Algorithms that are supposed to work in distributed networks, or on multiprocessors.
Accomplish tasks like:
–Communication
–Data management
–Resource management
–Consensus
Must work in difficult settings:
–Concurrent activity at many locations
–Uncertainty of timing, order of events, inputs
–Failure and recovery of machines/processors, of communication channels
So the algorithms can be complicated:
–Hard to design
–Hard to prove correct, analyze

This course
Theoretical, mathematical viewpoint.
Approach:
–Define distributed computing environments.
–Define abstract problems.
–Describe algorithms that solve the problems.
–Analyze complexity.
–Identify inherent limitations, prove lower bounds and other impossibility results.
Like sequential algorithms (6.046), but more complicated.

Distributed algorithms research
An active research area for many years: PODC, DISC, SPAA, OPODIS; also ICDCS, STOC, FOCS, …
Abstract problems derived from practice, in networking and multiprocessor programming.
Static theory:
–Assumes a fixed network or shared-memory setting.
–The set of participants, the configuration, may be generally known.
–In [Lynch text], [Attiya, Welch text], [Supplementary course papers].
Dynamic theory:
–Client/server, peer-to-peer, wireless, mobile ad hoc.
–Participants may join, leave, move.
–Next term?
Theory for modern multiprocessors:
–Multicore processors
–Transactional memory
–New material for this term [Herlihy, Shavit book]
All are still active research areas!

Administrative info (Handout 1)
People and places
What the course is about
Prerequisites:
–Automata: traditional courses (6.045, 6.840) study the general theory of automata. We use automata as tools, to model processes, channels, … Important for clear understanding, because the algorithms are complicated.
Source material
Course requirements:
–Readings
–Problem sets: Tempo language, for writing pseudocode. Code should (at least) syntax-check!
–Grading
Going over code in class: bring book, papers.

Topic overview (Handout 2)
Variations in model assumptions:
–IPC method: shared memory, message-passing
–Timing: synchronous (rounds); asynchronous (arbitrary speeds); partially synchronous (some timing assumptions, e.g., bounds on message delay, processor speeds, clock rates)
–Failures: processor: stopping, Byzantine, occasional total state corruption; communication: message loss, duplication; channel failures, recovery
Most important factor is the timing model!
Synchronous model: Classes 1-6.
–Basic, easy to program.
–Not realistic, but sometimes can emulate in worse-behaved networks.
–Impossibility results for synchronous networks carry over to worse networks.
Asynchronous: most of the rest of the course.
–Realistic, but hard to cope with.
Partially synchronous: last few classes.
–Somewhere in between.

Detailed topic overview
Synchronous networks:
–Model
–Leader election (symmetry-breaking)
–Network searching, spanning trees
–Processor failures: stopping and Byzantine
–Fault-tolerant consensus: algorithms and lower bounds
–Other problems: commit, k-agreement
Asynchronous model (I/O automata)
Asynchronous networks, no failures:
–Model
–Leader election, network searching, spanning trees, revisited
–Synchronizers (used to run synchronous algorithms in asynchronous networks)
–Logical time, replicated state machines
–Stable property detection (termination, deadlock, snapshots)

More topics
Asynchronous shared-memory systems, no failures:
–Model
–Mutual exclusion algorithms and lower bounds
–Resource allocation; Dining Philosophers
Asynchronous shared memory, with failures:
–Impossibility of consensus
–Atomic (linearizable) objects, atomic read/write objects, atomic snapshots
–Wait-free computability; wait-free consensus; wait-free vs. f-fault-tolerant objects
Shared-memory multiprocessors:
–Contention, caching, locality
–Practical mutual exclusion algorithms
–Reader/writer locks
–List algorithms: locking algorithms, optimistic algorithms
–Lock-freedom
–Transactional memory: compositionality, obstruction-freedom, disjoint-access parallelism

Still more topics
Asynchronous networks with failures:
–Asynchronous networks vs. asynchronous shared memory
–Impossibility of consensus, revisited
–Failure detectors and consensus
–Paxos consensus algorithm
Self-stabilizing algorithms
Partially synchronous systems:
–Models
–Mutual exclusion, consensus
–Clock synchronization

Now start the course…
Rest of today:
–Synchronous network model
–Leader election problem, in simple ring networks
Reading: Chapter 2; Sections …
Next: Sections 3.6, …

Synchronous network model
Processes (or processors) at nodes of a network digraph communicate using messages.
Digraph: G = (V, E), n = |V|
–out-nbrs_i, in-nbrs_i
–distance(i, j): number of hops on a shortest path
–diam = max_{i,j} distance(i, j)
M: message alphabet, plus the placeholder ⊥.
For each i in V, a process consisting of:
–states_i, a nonempty, not necessarily finite, set of states
–start_i, a nonempty subset of states_i
–msgs_i : states_i × out-nbrs_i → M ∪ {⊥}
–trans_i : states_i × (vectors, indexed by in-nbrs_i, of M ∪ {⊥}) → states_i
Executes in rounds:
–Apply msgs_i to determine messages to send,
–send and collect messages,
–apply trans_i to determine the new state.
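To make the round structure concrete, here is a minimal Python sketch of a simulator for this model. It is illustrative only (the course uses Tempo/IOA-style pseudocode, not Python); the names Process and run_rounds, and the use of None for the placeholder ⊥, are assumptions of this sketch.

```python
# Minimal sketch of the synchronous round structure (illustrative, not course code).
# Each process supplies msgs(state, out_nbr) and trans(state, incoming);
# None plays the role of the placeholder.

class Process:
    def __init__(self, start_state, msgs, trans):
        self.state = start_state
        self.msgs = msgs      # msgs(state, out_nbr) -> message or None
        self.trans = trans    # trans(state, {in_nbr: message or None}) -> new state

def run_rounds(procs, out_nbrs, num_rounds):
    """procs: dict node -> Process; out_nbrs: dict node -> list of out-neighbors."""
    for _ in range(num_rounds):
        # Apply msgs_i to determine the messages to send this round.
        sent = {(i, j): procs[i].msgs(procs[i].state, j)
                for i in procs for j in out_nbrs[i]}
        # Collect messages and apply trans_i to determine each new state.
        for j in procs:
            incoming = {i: sent.get((i, j)) for i in procs if j in out_nbrs[i]}
            procs[j].state = procs[j].trans(procs[j].state, incoming)
    return {i: procs[i].state for i in procs}
```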

Remarks
No restriction on the amount of local computation.
Deterministic (a simplification).
Can define "halting states", but they are not used as accepting states as in traditional automata theory.
Later, can consider some complications:
–Variable start times
–Failures
–Random choices
Inputs and outputs: can encode in the states, e.g., in special input and output variables.

Executions
Formal notion used to describe how an algorithm operates.
Definition (p. 20):
–State assignment: a mapping from process indices to states.
–Message assignment: a mapping from ordered pairs of process indices to M ∪ {⊥}.
–Execution: an infinite sequence C_0, M_1, N_1, C_1, M_2, N_2, C_2, …, where the C's are state assignments, and the M's and N's are message assignments recording the messages sent and received, respectively.

Leader election
Network of processes. Want to distinguish exactly one, as the "leader".
Eventually, exactly one process should output "leader" (set a special status variable to "leader").
Motivation: the leader can take charge of:
–Communication
–Coordinating data processing (e.g., in commit protocols)
–Allocating resources
–Scheduling tasks
–Coordinating consensus protocols
–…

Special case: Ring network
Variations:
–Unidirectional or bidirectional communication
–Ring size n known or unknown
Nodes are numbered clockwise, but processes don't know the numbers; they know their neighbors only as "clockwise" and "counterclockwise".
Theorem 1: Suppose all processes are identical (same sets of states, transition functions, etc.). Then it's impossible to elect a leader, even under the best assumptions (bidirectional communication, ring size n known to all).

Proof of Theorem 1
By contradiction. Assume an algorithm that solves the problem.
Assume WLOG that each process has exactly one start state (could choose the same one for all).
Then there is exactly one execution C_0, M_1, N_1, C_1, M_2, N_2, C_2, …
Prove by induction on the number r of completed rounds that all processes are in identical states after r rounds:
–They generate the same messages, to corresponding neighbors.
–They receive the same messages.
–They make the same state changes.
Since the algorithm solves the leader election problem, someone eventually gets elected. Then everyone gets elected, contradiction.

So we need something more…
To solve the problem at all, we need something more: a way of distinguishing the processes.
Assume processes have unique identifiers (UIDs), which they "know".
–Formally, e.g., each process starts with its own UID in a special state variable.
UIDs are elements of some data type, with specified operations, e.g.:
–An arbitrary totally ordered set, with comparisons (<, =, >) only.
–Integers, with full arithmetic.
Different UIDs can be anywhere in the ring.

A basic algorithm [LeLann] [Chang, Roberts]
Assumes:
–Unidirectional communication (clockwise)
–Processes don't know n
–UIDs, comparisons only
Idea:
–Each process sends its UID in a message, to be relayed step-by-step around the ring.
–When a process receives a UID, it compares it with its own.
–If the incoming UID is: bigger, pass it on; smaller, discard; equal, the process declares itself the leader.
–Elects the process with the largest UID.

In terms of the formal model:
M = message alphabet = the set of UIDs
states_i: consists of values for state variables:
–u, holds its own UID
–send, a UID or ⊥, initially its own UID
–status, one of {?, leader}, initially ?
start_i: defined by the initializations
msgs_i: send the contents of the send variable to the clockwise neighbor
trans_i: defined by pseudocode (p. 28):
  if incoming = v, a UID, then
    case
      v > u: send := v
      v = u: status := leader
      v < u: no-op
    endcase
–The entire block of code is treated as atomic.
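As a concrete counterpart to the pseudocode above, here is a small self-contained Python sketch of LCR on a unidirectional ring (again illustrative, not the course's Tempo code); None plays the role of ⊥, and the dictionary keys u, send, status mirror the state variables.

```python
# LCR leader election on a unidirectional ring: Python sketch (not course code).

def lcr_round(states, incoming):
    """One synchronous round: each process applies the p. 28 transition."""
    for i, st in enumerate(states):
        v = incoming[i]
        st["send"] = None
        if v is not None:
            if v > st["u"]:
                st["send"] = v            # bigger: pass it on
            elif v == st["u"]:
                st["status"] = "leader"   # own UID came back around: elected
            # smaller: discard (no-op)

def lcr_elect(uids):
    """uids[i] = UID of the process at clockwise position i; returns elected positions."""
    n = len(uids)
    states = [{"u": u, "send": u, "status": "?"} for u in uids]
    for _ in range(n):
        # msgs: each process sends its send variable to its clockwise neighbor,
        # so process i receives from its counterclockwise neighbor (i-1) mod n.
        incoming = [states[(i - 1) % n]["send"] for i in range(n)]
        lcr_round(states, incoming)
    return [i for i, st in enumerate(states) if st["status"] == "leader"]

# Example: the process holding the largest UID (here 42, at position 1) is elected.
assert lcr_elect([17, 42, 5, 23]) == [1]
```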

Correctness proof
Prove that exactly one process ever gets elected leader. More strongly:
–Let i_max be the process with the maximum UID, u_max.
–Prove: i_max outputs "leader" by the end of round n, and no other process ever outputs "leader".

i_max outputs "leader" after n rounds
Prove by induction on the number of rounds. But we need to strengthen the claim, to say what happens after r rounds, 0 ≤ r ≤ n.
Lemma 2: For 0 ≤ r ≤ n-1, after r rounds, send at process (i_max + r) mod n contains u_max.
That is, u_max is making its way around the ring.
Proof of Lemma 2:
–Induction on r.
–Base: check the initialization.
–Inductive step: key fact: everyone else lets u_max pass through.
Use Lemma 2 for r = n-1, and a little argument about the nth round, to show the correct output happens. Key fact: i_max uses the arrival of u_max as the signal to set its status to leader.

Uniqueness
No one except i_max ever outputs "leader". Again, strengthen the claim:
Lemma 3: For any r ≥ 0, after r rounds, if i ≠ i_max and j is any process in the interval [i_max, i), then j's send doesn't contain u_i.
Thus, u_i doesn't get past i_max when moving around the ring.
Proof:
–Induction on r.
–Key fact: i_max discards u_i (if no one has already).
Use Lemma 3 to show that no one except i_max ever receives its own UID, so no one else ever elects itself.

Invariant proofs
Lemmas 2 and 3 are examples of invariants: properties that are true in all reachable states.
Another invariant: if r = n, then the status variable of i_max = leader.
Invariants are usually proved by induction on the number of steps in an execution.
–May need to strengthen the claim, to prove it by induction.
–The inductive step requires case analysis.
In this class:
–We'll outline the key steps of invariant proofs, not present all details.
–We'll assume you could fill in the details if you had to.
–Try some examples in detail.
Invariant proofs may seem like overkill here, but:
–Similar proofs work for asynchronous algorithms, and even partially synchronous algorithms.
–The properties, and proofs, are more subtle in those settings.
This is the main method for proving properties of distributed algorithms.

Complexity bounds
What to measure?
–Time = number of rounds until "leader": n
–Communication = number of single-hop messages: ≤ n²
Variations:
–Non-leaders announce "non-leader": any process announces "non-leader" as soon as it sees a UID higher than its own. No extra cost.
–Everyone announces who the leader is: at the end of n rounds, everyone knows the max. No extra cost. Relies on synchrony and knowledge of n.
–Or, the leader sends a special "report" message around the ring. Time: ≤ 2n. Communication: ≤ n² + n. Doesn't rely on synchrony or knowledge of n.

Halting
Add halt states: special "looping" states from which all transitions leave the state unchanged, and which generate no messages.
For all problem variations:
–Can halt after n rounds. Depends on synchrony and knowledge of n.
–Or, halt after receiving the leader's "report" message. Does not depend on synchrony or knowledge of n.
Q: Can a process just halt (in the basic problem) after it sees and relays some UID larger than its own?
A: No: it still has to stay alive to relay messages.

Reducing the communication complexity [Hirschberg, Sinclair]
O(n log n) messages, rather than O(n²).
Assumptions:
–Bidirectional communication
–Ring size not known
–UIDs with comparisons only
Idea:
–Successive doubling strategy, used in many distributed algorithms where the network size is unknown.
–Each process sends a UID token in both directions, to successively greater distances (doubling each time).
–Going outbound: the token is discarded if it reaches a node whose UID is bigger.
–Going inbound: everyone passes the token back.
–A process begins its next phase only if/when it gets its token back.
–A process that gets its own token while it is still going in the outbound direction elects itself the leader.

In terms of formal model: Needs local process description. Involves bookkeeping, with hop counts. LTTR (p. 33)
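Since the per-process code is left to the book (p. 33), here is only a rough Python illustration of the core rule a process might apply to an arriving token. The message format (uid, direction, hops_left) and the function name are assumptions of this sketch, not the book's; an outbound token for phase p is assumed to start with hops_left = 2^p.

```python
# Rough sketch of one process's handling of an arriving Hirschberg-Sinclair
# token (illustrative only; not the book's p. 33 code).
# A token is (uid, direction, hops_left); my_uid is this process's UID.

def handle_token(my_uid, token):
    uid, direction, hops_left = token
    if direction == "in":
        if uid == my_uid:
            return ("token returned",)           # owner got it back: may start next phase
        return ("relay", (uid, "in", 0))         # everyone else passes inbound tokens back
    # Outbound token:
    if uid < my_uid:
        return ("discard",)                      # my UID is bigger: kill the token
    if uid == my_uid:
        return ("elected",)                      # my own token went all the way around
    if hops_left > 1:
        return ("relay", (uid, "out", hops_left - 1))   # keep going outbound
    return ("turn around", (uid, "in", 0))       # reached its target distance: send it back
```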

Complexity bounds
Time:
–Worse than [LCR], but still O(n).
–The time for each phase is twice the previous one, so the total time is dominated by the last complete phase (geometric series).
–The last phase is O(n), so the total is also O(n).

Communication bound: O(n log n)
There are 1 + ⌈log n⌉ phases: 0, 1, 2, …
Phase 0: all processes send messages both ways: ≤ 4n messages.
Phase k > 0:
–Within any block of 2^(k-1) + 1 consecutive processes, at most one is still alive at the start of phase k; the others' tokens were discarded in earlier phases, and they stop participating.
–So at most ⌈n / (2^(k-1) + 1)⌉ processes start phase k.
–Each of them causes at most 4 · 2^k messages in phase k (out and back, in both directions, to the new distance 2^k), so the total number of messages in phase k is at most 4 (2^k ⌈n / (2^(k-1) + 1)⌉) ≤ 8n.
So the total communication is at most 8n (1 + ⌈log n⌉) = O(n log n).
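The same count, written out as a worked calculation (this only restates the bounds above in standard notation):

```latex
\begin{align*}
\text{phase } 0:&\quad \le 4n \text{ messages (one hop out and back, in both directions)}\\
\text{phase } k>0:&\quad \le 4\Bigl(2^{k}\Bigl\lceil \tfrac{n}{2^{k-1}+1} \Bigr\rceil\Bigr) \;\le\; 8n \text{ messages}\\
\text{total}:&\quad \le 8n\,\bigl(1+\lceil \log n\rceil\bigr) \;=\; O(n\log n)
\end{align*}
```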

Non-comparison-based algorithms
Q: Can we improve on the worst-case O(n log n) messages to elect a leader in a ring, if UIDs can be manipulated using arithmetic?
A: Yes, easily!
Consider the case where:
–n is known
–The ring is unidirectional
–UIDs are positive integers, allowing arithmetic
Algorithm (a sketch follows below):
–Phases 1, 2, 3, …, each consisting of n rounds.
–Phase k is devoted to UID k: if a process has UID k, it circulates it at the beginning of phase k. Others who receive it pass it on, then become passive (or halt).
–Elects the minimum UID.
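A round-by-round Python sketch of this time-slice idea (my own simulation, not course code); passivity is modeled with a heard flag, and the returned counts line up with the bounds on the next slide: about u_min · n rounds and exactly n messages.

```python
def time_slice_election(uids):
    """Time-slice leader election on a unidirectional (clockwise) ring (sketch).
    uids are distinct positive integers; n is known.
    Phase k (rounds (k-1)*n+1 .. k*n) is devoted to UID k."""
    n = len(uids)
    status = ["?"] * n
    heard = [False] * n                  # True once a process has become passive
    token = None                         # (position it arrives at next round, UID carried)
    rnd = msgs = 0
    while "?" in status:
        rnd += 1
        phase = (rnd - 1) // n + 1
        # Deliver the in-flight token, if any.
        if token is not None:
            pos, v = token
            token = None
            if uids[pos] != v:           # not yet back at its owner
                heard[pos] = True
                status[pos] = "non-leader"
                token = ((pos + 1) % n, v)   # relay one more hop
                msgs += 1
        # At the start of phase k, the process holding UID k circulates it,
        # unless it has already been made passive by an earlier (smaller) UID.
        if (rnd - 1) % n == 0:
            for i in range(n):
                if uids[i] == phase and not heard[i]:
                    status[i] = "leader"
                    token = ((i + 1) % n, uids[i])
                    msgs += 1
    return status.index("leader"), rnd, msgs

# Example: UID 3 is the smallest, so process 1 wins after 3*n = 12 rounds,
# using exactly n = 4 messages.
assert time_slice_election([5, 3, 8, 6]) == (1, 12, 4)
```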

Complexity bounds
Communication:
–Just n (one-hop) messages.
Time:
–u_min · n
–Not practical, unless the UIDs are small integers.
Q: What if n is unknown?
A: We can still get O(n) messages, though now the time is even worse: O(2^(u_min) · n).
–VariableSpeeds algorithm, Section …
–Different UIDs travel around the ring at different speeds, smaller UIDs traveling faster.
–UID u moves 1 hop every 2^u rounds.
–The smallest UID gets all the way around before the next smallest has gone half-way, etc.

Next time…
Lower bound on communication for leader election in rings.
Basic computational tasks in general synchronous networks:
–Leader election, breadth-first search, shortest paths, broadcast and convergecast.
Sections 3.6, …