Lectures on Parallel and Distributed Algorithms
COMP 523: Advanced Algorithmic Techniques
Lecturer: Dariusz Kowalski
Overview
These lectures:
Parallel machine
– Prefix computation
Distributed computing
– Consensus problem
Parallel machine - model
Set of n processors and m memory cells
Computation in synchronized rounds:
– During one round each processor performs either a local computation step (using a constant-size local cache) or a read from/write to shared memory
Minimize:
– Time
– Work (total number of processor steps)
– Number of processors
– Additional memory
Types of parallel machines
EREW: Exclusive Read Exclusive Write
CREW: Concurrent Read Exclusive Write
ERCW: Exclusive Read Concurrent Write
CRCW: Concurrent Read Concurrent Write
In each round a cell can be either read or written
Exclusive Read/Write: only one processor can read/write a memory cell during one round
Concurrent Read/Write: many processors can read/write a memory cell during one round
Concurrent Write: conflicting writes are resolved by a fixed rule (arbitrary, maximum, sum, etc.)
Problem - prefix computation
Input: m memory cells with integers
Goal: for each cell i compute F(1,i), where the function F( , ) is such that
– F(i,k) can be computed in constant time from F(i,j) and F(j+1,k) for any j with i ≤ j < k
– F(i,i) is the value stored originally in cell i
Examples:
– Computing a maximum (for every prefix)
– Computing a sum (for every prefix)
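The definition above can be illustrated with a minimal sequential reference sketch. The function name `prefix_compute` and the parameter `combine` (standing for the constant-time rule that builds F(i,k) from F(i,j) and F(j+1,k)) are hypothetical names chosen for this example; the sketch only fixes what the parallel algorithms are expected to output.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefix_compute(cells: List[T], combine: Callable[[T, T], T]) -> List[T]:
    """Return [F(1,1), F(1,2), ..., F(1,m)] for the values in `cells`."""
    result: List[T] = []
    for value in cells:
        # F(1,1) is the first cell; F(1,i) = combine(F(1,i-1), F(i,i)).
        result.append(value if not result else combine(result[-1], value))
    return result


# The two examples named on the slide: prefix maxima and prefix sums.
prefix_max = prefix_compute([3, 1, 4, 1, 5], max)
prefix_sum = prefix_compute([3, 1, 4, 1, 5], lambda a, b: a + b)
```

This takes m sequential steps; the parallel algorithms aim to produce the same output in far fewer rounds.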
CRCW - simple solution
Let the result of a concurrent write by two (or more) processors be determined by the function F( , )
m memory cells, m additional memory cells, m^2 processors
Algorithm: the processor with ID i·m + j, for i ≤ j ≤ m, reads cell i and then writes the value to (additional) cell j
Time: 2 rounds
Memory: m additional cells
Work: O(m^2)
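The two-round idea can be simulated sequentially as a rough sketch. The name `crcw_prefix` and the 0-based pair (i, j) used to identify a processor are assumptions of this example. The combining write is modeled with `functools.reduce`, which is only a faithful stand-in when `combine` is associative and commutative (as for maximum and sum), since a real machine gives no ordering among concurrent writers.

```python
from functools import reduce
from typing import Callable, List


def crcw_prefix(cells: List[int], combine: Callable[[int, int], int]) -> List[int]:
    """Simulate the 2-round CRCW scheme; processors are pairs (i, j), 0-based."""
    m = len(cells)
    # Round 1 (concurrent read): processor (i, j) with i <= j reads cell i.
    reads = {(i, j): cells[i] for j in range(m) for i in range(j + 1)}
    # Round 2 (concurrent write): all processors (., j) write to extra cell j,
    # and the machine resolves the conflict by combining the written values.
    return [
        reduce(combine, (reads[(i, j)] for i in range(j + 1)))
        for j in range(m)
    ]


print(crcw_prefix([3, 1, 4, 1, 5], max))
```

About m^2/2 processors are active in this sketch, which is where the quadratic work comes from.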
EREW - algorithm
m memory cells, n = m/log m processors with IDs 1, …, n
Additional array M[1…n]
Processor i is responsible for the block of cells (i-1) log m + 1, …, i log m
Recursive Algorithm:
Parallel Preprocessing: each processor i sequentially computes F((i-1) log m + 1, (i-1) log m + 1), …, F((i-1) log m + 1, i log m), then writes M[i] := F((i-1) log m + 1, i log m)
Parallel Recursion (pointer jumping): in step t, for 1 ≤ t ≤ log n, if i - 2^{t-1} > 0 then the processor with ID i reads M[i - 2^{t-1}] and combines it with its current value M[i], treating M[i - 2^{t-1}] as F((i - 2^t) log m + 1, (i - 2^{t-1}) log m) and M[i] as F((i - 2^{t-1}) log m + 1, i log m), and writes the result to M[i]
Parallel Post-processing: each processor i > 1 sequentially computes F(1, (i-1) log m + 1), …, F(1, i log m) using the value F(1, (i-1) log m) stored in M[i-1] and the values F((i-1) log m + 1, (i-1) log m + 1), …, F((i-1) log m + 1, i log m) computed in the preprocessing part; processor 1 already holds its results
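The three phases can be traced with a sequential simulation, given here as a sketch under stated assumptions rather than as the lecture's own code. The name `erew_prefix`, the optional `block` parameter (defaulting to about log m), and the 0-based block indices are choices made for this example. Copying `M` before each step models the rule that all reads of a round happen before its writes.

```python
import math
from typing import Callable, List, Optional


def erew_prefix(cells: List, combine: Callable, block: Optional[int] = None) -> List:
    """Simulate block preprocessing, pointer jumping and post-processing."""
    if not cells:
        return []
    m = len(cells)
    if block is None:
        block = max(1, int(math.log2(m)))  # block length of roughly log m
    n = math.ceil(m / block)  # number of simulated processors
    blocks = [cells[p * block:(p + 1) * block] for p in range(n)]

    # Preprocessing: each processor computes the prefixes inside its own block.
    local = []
    for blk in blocks:
        pref = [blk[0]]
        for v in blk[1:]:
            pref.append(combine(pref[-1], v))
        local.append(pref)
    M = [pref[-1] for pref in local]  # M[p] = F over the whole block p

    # Pointer jumping: once the pass with distance `step` has run,
    # M[p] covers blocks max(0, p - 2*step + 1) .. p.
    step = 1
    while step < n:
        old = M[:]  # snapshot: reads of this round precede its writes
        for p in range(step, n):
            M[p] = combine(old[p - step], old[p])
        step *= 2

    # Post-processing: prepend everything before the block to each local prefix.
    out = []
    for p, pref in enumerate(local):
        out.extend(pref if p == 0 else [combine(M[p - 1], v) for v in pref])
    return out


print(erew_prefix([5, 2, 9, 1, 7, 3, 8, 6], lambda a, b: a + b))
```

The left operand of `combine` always covers the earlier cells, so the sketch also works for operations that are associative but not commutative, such as string concatenation.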
Analysis
Correctness: it is sufficient to show that after step t of the recursive part each location M[i] contains the value F(max{1, (i - 2^t) log m + 1}, i log m)
Proof by induction: the base case t = 0 (before the first step) follows from the initialization of M in the preprocessing part; the inductive step follows immediately from the recursive algorithm
Memory: O(n) additional memory for M used during the recursion, or none if the original values are modified in place
Time: O(log m)
– Parallel preprocessing and post-processing: O(log m)
– Parallel recursion: O(log n) = O(log m)
Work: O(m), that is, time O(log m) times number of processors O(m/log m)
Conclusions
Prefix computation
– Finding maximum/minimum
– Computing sums
for all m prefixes, in optimal logarithmic time and linear work
Textbook and Questions
How can the prefix algorithms be modified for a smaller/larger number of processors?
Given an expression containing brackets of types ( ) and [ ], how can we check in parallel, in logarithmic time, whether it is a proper expression (each opening bracket has its corresponding closing counterpart)?
Is it easier if there is only one kind of bracket in the expression?
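For the single-bracket variant of the last question, one possible approach (an assumption of this sketch, not something stated in the lecture) is to reduce the check to prefix sums: count +1 for each "(" and -1 for each ")", then require that no prefix sum is negative and that the total is zero. The name `balanced_single_kind` is hypothetical, and the prefix sums are computed sequentially here with `itertools.accumulate`; in the parallel setting they would come from a prefix computation.

```python
from itertools import accumulate


def balanced_single_kind(expr: str) -> bool:
    """Check an expression with only ( ) brackets via prefix sums."""
    signs = [1 if c == "(" else -1 for c in expr if c in "()"]
    depths = list(accumulate(signs))  # nesting depth after each bracket
    # Proper iff the depth never drops below zero and ends at zero.
    return not depths or (min(depths) >= 0 and depths[-1] == 0)


print(balanced_single_kind("(()())"), balanced_single_kind("())("))
```

This criterion does not carry over directly to two kinds of brackets, since "( ]" has the right counts but the wrong pairing.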
Distributed message-passing model
Set of n processors/processes with different IDs {p_1, ..., p_n}
In each step each processor can do one of the following (depending on the algorithm)
– send a message to any subset of other processors
– receive incoming messages
– perform local computation
Computation can proceed in one of two ways (depending on the adversary)
– in synchronized rounds: in a round every processor performs three steps: local computation, sending and receiving, e.g., (p_1, p_2, p_3), (p_1, p_2, p_3), (p_1, p_2, p_3), ...
– in an asynchronous pattern: steps are done in some arbitrary order unknown to the processors, e.g., p_1, p_2, p_2, p_3, p_2, p_3, p_2, p_1, ...
Fault-tolerance
Failures in the system:
Lack of synchrony: the unknown order of steps is generated by the adversary
Processors’ crashes: the adversary decides which processors crash and chooses the steps at which these events happen
Lost messages (not properly sent or received): the malicious processors/links are selected by the adversary
Byzantine failures: processors may cheat, e.g., they can behave in the ways described above, corrupt the content of messages, pretend to have a different ID, etc.
Analysis of distributed algorithms
When designing an algorithm, our goal is to prove:
Correctness: nontrivial because of the lack of central information and because of failures
Termination: nontrivial because of the lack of central control
Efficiency:
– Time
– Work (total number of processor steps)
– Number of messages sent
– Total size of messages sent
Consensus in synchronous crash model
Consensus:
Each processor has its initial value
Goal: processors decide on the same value among the initial ones
We require from the algorithm:
– Agreement: no two processors decide on different values
– Termination: each processor eventually decides unless it fails
– Validity: if all initial values are the same then this value is the decision
Model for consensus problem
We consider the model with crash failures (easier than others, e.g., Byzantine failures): a crashed processor stops all activity, and messages sent during the crash are delivered or lost arbitrarily (depending on the adversary)
Asynchronous: impossible to solve even if only one processor can crash
Synchronous: requires at least f + 1 rounds if f processors can crash
Consensus can be viewed as a kind of maximum-finding problem: let's agree on the largest initial value (although it could be easier, since we could agree on any initial value)
Flooding algorithm for consensus
f-resilient algorithm: an algorithm that solves the consensus problem if at most f crashes occur
Flooding Algorithm:
During each round j, for 1 ≤ j ≤ f + 1, each processor sends to all other processors all the initial values it has learnt so far
Decision of a processor: if the set of collected initial values is a singleton then decide on this value, otherwise decide on the default value (e.g., the maximum)
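The flooding rule can be tried out in a small simulation, offered as a sketch under assumptions of its own. The name `flooding_consensus`, the 0-based processor indices, and the encoding of the adversary as `crashes[p] = (round, delivered)` are all hypothetical. The default decision is taken to be the maximum, so `max` covers both the singleton case and the default case.

```python
from typing import Dict, List, Set, Tuple


def flooding_consensus(
    initial: List[int],
    f: int,
    crashes: Dict[int, Tuple[int, Set[int]]],
) -> Dict[int, int]:
    """Simulate f + 1 synchronous rounds of flooding.

    crashes[p] = (r, delivered): p crashes in round r and only the processors
    in `delivered` receive its round-r message (the adversary's choice).
    Returns the decisions of the processors that never crash.
    """
    n = len(initial)
    known: List[Set[int]] = [{v} for v in initial]
    for rnd in range(1, f + 2):
        inbox: List[Set[int]] = [set() for _ in range(n)]
        for p in range(n):
            crash_round, delivered = crashes.get(p, (f + 2, set()))
            if rnd > crash_round:
                continue  # p crashed in an earlier round and stays silent
            for q in range(n):
                if q != p and (rnd < crash_round or q in delivered):
                    inbox[q] |= known[p]
        for q in range(n):
            known[q] |= inbox[q]
    # A singleton set yields its only value; otherwise the default (maximum).
    return {p: max(known[p]) for p in range(n) if p not in crashes}


# Hypothetical schedule: processor 0 crashes in round 1 reaching only
# processor 1; processor 1 crashes in round 2 reaching only processor 2.
schedule = {0: (1, {1}), 1: (2, {2})}
print(flooding_consensus([1, 0, 0, 0], 2, schedule))
```

With f = 2 the third round is clean and both survivors decide 1. Running the same schedule with only two rounds (`f=1`) leaves processor 3 unaware of the value 1, so the survivors disagree, which illustrates why f + 1 rounds are used.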
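Flooding algorithm - example
4 processors, f = 2 crashes, default: maximum
Table columns: Init | R1 | R2 | R3 | Decision (several entries are missing from the table)
p_1: (entries missing)
p_2: 0 | 0, (remaining entries missing)
p_3: 0 | 0 | 0,1 | 0,1 | 1
p_4: (leading entries missing) ,1 | 1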
Analysis of Flooding algorithm
Agreement: at most f crashes occur in f + 1 rounds, so there is a clean round j in which no crash occurs. During this round all non-faulty processors exchange messages, hence their sets of collected values are the same after this round. These sets do not change afterwards, and consequently all non-faulty processors decide on the same value
Termination: after round f + 1
Validity: if all initial values are the same, the set of collected initial values is always a singleton, and the decision is on this value; otherwise the decision is the maximum among the received values
Message complexity (total number of messages sent): O(f n^2)
Decreasing message complexity
Modification of the algorithm: a processor sends messages to all processors during the first round, and during round j > 1 only if in the previous round it has learnt a new initial value
Termination and Validity remain the same
Agreement: similar argument; the only difference is that the message exchange may not happen in the clean round itself, but it is complete by the end of the clean round: all previously learnt values were sent before this round, and new ones are sent during this round
Communication: assuming there is a constant number of different initial values, each of them is sent as a newly learnt value at most n times (once per processor), each time to at most n - 1 processors, hence O(n^2) messages in total
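The effect of the modification on the message count can be checked on a failure-free run. This is a minimal sketch: the name `flooding_send_on_change` is hypothetical, crashes are deliberately left out to keep the counting visible, and the maximum is assumed as the default decision.

```python
from typing import List, Set, Tuple


def flooding_send_on_change(initial: List[int], f: int) -> Tuple[List[int], int]:
    """Failure-free run of the modified flooding: (decisions, messages sent)."""
    n = len(initial)
    known: List[Set[int]] = [{v} for v in initial]
    send = [True] * n  # in the first round every processor sends
    messages = 0
    for _ in range(f + 1):
        inbox: List[Set[int]] = [set() for _ in range(n)]
        for p in range(n):
            if send[p]:
                for q in range(n):
                    if q != p:
                        inbox[q] |= known[p]
                        messages += 1
        for q in range(n):
            # Send in the next round only if something new was learnt now.
            send[q] = not (inbox[q] <= known[q])
            known[q] |= inbox[q]
    return [max(k) for k in known], messages


decisions, messages = flooding_send_on_change([0, 1, 0, 0], 2)
print(decisions, messages)
```

For n = 4 and f = 2 the unmodified algorithm sends (f + 1)·n·(n - 1) = 36 messages in a failure-free run, while this run sends 24: everyone sends in round 1, everyone learns a new value and sends again in round 2, and round 3 is silent.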
Conclusion and Reading
Distributed models
– Message-passing
– Synchronous/asynchronous
– Fault-tolerance
Distributed problems and algorithms
– Consensus in synchronous crash setting
Textbook:
Johnsonbaugh, Schaefer: Algorithms, Chapter 12
Attiya, Welch: Distributed Computing, Chapter 5