Parallel Algorithms CS170 Fall 2016
Parallel computation is here!
During your career, Moore's Law will probably slow down a lot (possibly to a grinding halt…). Google's engine (reportedly) has about 900,000 processors (recall Map-Reduce). The fastest supercomputers have > 10^7 cores and flops to match. So, in an Algorithms course we must at least mention parallel algorithms.
This lecture:
What are parallel algorithms, and how do they differ from (sequential) algorithms?
What are the important performance criteria for parallel algorithms?
What are the basic tricks?
What does the "landscape" look like?
Sketches of two sophisticated parallel algorithms: MST and connected components.
Parallel Algorithms need a completely new mindset!!
In sequential algorithms we care about Time.
Acceptable: O(n), O(n log n), O(n^2), O(|E||V|^2), … polynomial time.
Unacceptable: exponential time, 2^n.
Sometimes the unacceptable is the only thing possible: NP-complete problems.
How about in parallel algorithms?
To start, what is a parallel algorithm? What kinds of computers will it run on?
The PRAM (Parallel RAM): P processors (RAMs), all sharing the same memory, driven by the same clock, running synchronously.
Q: How about memory congestion?
A: It is OK to read a cell concurrently, but not OK to write one concurrently: the CREW (Concurrent Read, Exclusive Write) PRAM.
Language? Threads in Java, Python, etc.
Parallel languages facilitate parallel programming through syntax (parbegin/parend). In our pseudocode, instead of "for every edge (u,v) in E do" we may say "for every edge (u,v) in E do in parallel".
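For concreteness, here is one way the "do in parallel" loop might map onto real threads in Python; the edge list E and the per-edge function relax_edge below are made-up placeholders, not anything from the slides:

    from concurrent.futures import ThreadPoolExecutor

    E = [(0, 1), (1, 2), (0, 2)]        # hypothetical edge list

    def relax_edge(edge):
        u, v = edge
        return u + v                     # stand-in for real per-edge work

    # "for every edge (u, v) in E do in parallel"
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(relax_edge, E))
    print(results)                       # [1, 3, 2]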
And what do we care about?
Two things:
Work = the total number of instructions executed by all processors
Depth = clock time of the parallel execution
And what is acceptable? Polynomial work. Depth?
O(log n), or maybe O((log n)^2), etc.
Q: But how many processors? P = ?
A: Pretend to have as many as you want! Saturate the problem with processors!
The reason: Brent’s Principle
If you can solve a problem with depth D and work W with as many processors as you want… …then you can also solve it with P processors with work O(W) and depth D’ = D + W/P
Proof: We can simulate each parallel step t (which does work w_t) with ceil(w_t / P) steps of our P processors. Since ceil(w_t / P) ≤ w_t / P + 1, adding over all t ≤ D steps gives depth D' ≤ D + W/P.
To recap: Brent’s Principle
If a problem can be solved in parallel with work W, depth D, and arbitrarily many processors, then it can be solved with P processors with the same work W' = W and depth D' = W/P + D.
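As a sanity check on the proof, here is a tiny Python sketch of the simulation; step_work is a hypothetical list recording the work w_t of each parallel step:

    import math

    def simulated_depth(step_work, P):
        # each parallel step t (work w_t) becomes ceil(w_t / P) rounds
        # of the P processors, so the total is at most D + W/P
        return sum(math.ceil(w / P) for w in step_work)

    # D = 3 steps, W = 4 + 8 + 4 = 16, P = 4: the bound is D + W/P = 7
    print(simulated_depth([4, 8, 4], P=4))   # 4, within the bound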
Toy Problem: Sum
Sequential algorithm:
  sum = 0
  for i = 1 to n do: sum = sum + A[i]
  return sum
O(n) time. In parallel?
function sum(A[1..n]):
  if n = 1: return A[1]
  for i = 1, …, n/2 do in parallel:
    A[i] = A[2i-1] + A[2i]
  return sum(A[1..n/2])
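A minimal Python sketch of this recursion (in iterative form), with each "do in parallel" loop simulated as one synchronous round; it assumes n is a power of 2:

    def parallel_sum(A):
        A = list(A)
        n = len(A)
        while n > 1:
            # one parallel round: n/2 independent additions
            # (0-indexed version of A[i] = A[2i-1] + A[2i])
            for i in range(n // 2):
                A[i] = A[2 * i] + A[2 * i + 1]
            n //= 2
        return A[0]

    print(parallel_sum([3, 1, 4, 1, 5, 9, 2, 6]))   # 31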
Work? Time?
W(n) = W(n/2) + n/2 = O(n)
D(n) = D(n/2) + 2 = O(log n)
Work efficient = same work as the best sequential algorithm. Depth log n: as little as possible.
An important variant: prefix sums, i.e., all partial sums sums[j] = Σ_{i=1}^{j} A[i].
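Prefix sums admit the same kind of work-efficient recursion. A Python sketch (again assuming n is a power of 2), where the two inner loops each correspond to one O(1)-depth parallel round:

    def prefix_sums(A):
        # W(n) = W(n/2) + O(n) = O(n), D(n) = D(n/2) + O(1) = O(log n)
        n = len(A)
        if n == 1:
            return list(A)
        # one parallel round: pairwise sums
        pairs = [A[2 * i] + A[2 * i + 1] for i in range(n // 2)]
        S = prefix_sums(pairs)               # prefix sums of the half-size array
        out = [0] * n
        for i in range(n):                   # one parallel round: fill in
            if i % 2 == 1:
                out[i] = S[i // 2]           # odd positions come straight from S
            else:
                out[i] = (S[i // 2 - 1] if i >= 2 else 0) + A[i]
        return out

    print(prefix_sums([1, 2, 3, 4]))         # [1, 3, 6, 10]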
Another toy problem: compact
Given an array, only some of whose entries are marked as "live" (the slide shows the input and output arrays as pictures), make it into the compacted array of just the live entries, in their original order. Also work efficient: prefix sums of the 0/1 liveness indicators tell each live entry its position in the output.
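A Python sketch of compact along those lines; itertools.accumulate stands in for the parallel prefix-sums primitive, and keep is a hypothetical boolean mask marking the live entries:

    from itertools import accumulate

    def compact(A, keep):
        # pos[i] = number of live entries among A[0..i]
        pos = list(accumulate(1 if k else 0 for k in keep))
        out = [None] * (pos[-1] if pos else 0)
        for i in range(len(A)):          # one parallel round: independent writes
            if keep[i]:
                out[pos[i] - 1] = A[i]   # each live entry knows its slot
        return out

    print(compact([7, 0, 3, 0, 5], [True, False, True, False, True]))   # [7, 3, 5]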
Another Basic Problem: Find-Root
Every node v has a pointer next[v] one step up its chain, and roots point to themselves (the slide shows such a linked structure as a figure). The problem: find, for every node, the root its chain leads to.
Solution: pointer jumping
Repeat log n times:
  for every node v do in parallel:
    if next[v] ≠ v: next[v] = next[next[v]]
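A Python sketch of pointer jumping; the list comprehension reads the old pointer array and builds the new one, which matches the synchronous PRAM semantics (roots point to themselves, so the if-guard of the pseudocode is subsumed):

    import math

    def find_roots(nxt):
        nxt = list(nxt)
        n = len(nxt)
        # after k rounds, nxt[v] points 2^k hops up its chain
        for _ in range(math.ceil(math.log2(n)) if n > 1 else 0):
            nxt = [nxt[nxt[v]] for v in range(n)]   # one parallel round
        return nxt

    # chain 3 -> 2 -> 1 -> 0, with root 0 pointing to itself
    print(find_roots([0, 0, 1, 2]))   # [0, 0, 0, 0]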
The parallel algorithm landscape
These are some of the very basic tricks of parallel algorithm design (like "divide and conquer" or "greedy" in sequential algorithm design). There are a couple of others. They go a long way, but not all the way… So, what happens to the problems we learned how to solve sequentially in CS170?
Matrix multiplication (recall sum)
Merge sort (redesign: pquicksort, radixsort)
FFT (begging…)
Connected components (redesign)
DFS/SCC (redesign)
Shortest path (redesign)
MST (redesign)
LP, HornSAT (impossible, P-complete)
Huffman (redesign)
Hackattack: for i = 1 to n check if k[i] is the secret key (embarrassing parallelism)
MST
Prim? Applies the cut property to the component that contains S: sequential…
Kruskal? Goes through the edges in sorted order: sequential?
Borůvka’s Algorithm (1926)
(applies the "cut principle" to all components at once)
T = empty (the MST under construction)
C (the list of the connected components of T) = [{1}, {2}, …, {n}]
while |C| > 1:
  for each c in C do:
    find the shortest edge out of c and add it to T
  C = connected components of T
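A sequential Python sketch of Borůvka's algorithm; a small union-find stands in for the explicit component list C, and distinct edge weights are assumed (see the tie-breaking slide below):

    def boruvka_mst(n, edges):
        # edges: list of (weight, u, v) with distinct weights
        comp = list(range(n))
        def find(v):                         # union-find with path halving
            while comp[v] != v:
                comp[v] = comp[comp[v]]
                v = comp[v]
            return v
        T = []
        while True:
            # the cheapest edge out of each component: the cut property
            # applied to all components at once
            best = {}
            for w, u, v in edges:
                cu, cv = find(u), find(v)
                if cu == cv:
                    continue
                for c in (cu, cv):
                    if c not in best or w < best[c][0]:
                        best[c] = (w, u, v)
            if not best:
                return T                     # a single component remains
            for w, u, v in best.values():
                cu, cv = find(u), find(v)
                if cu != cv:
                    T.append((w, u, v))
                    comp[cu] = cv            # merge the two components

    # triangle plus a pendant edge, distinct weights
    print(boruvka_mst(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))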
Little problem: with equal edge weights, the cheapest-edge choices can create a cycle. Solution: perturb equal weights to make them distinct (the slide's figure turns a cluster of weight-3 edges into 3.001, 3.003, 3.007), or break edge ties lexicographically.
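One concrete way to break ties lexicographically: order edges by the tuple (weight, smaller endpoint, larger endpoint), so no two distinct edges ever compare equal. A hypothetical Python key:

    def edge_key(w, u, v):
        # equal weights are ordered by their sorted endpoints,
        # so distinct edges always compare unequal
        return (w, min(u, v), max(u, v))

    edges = [(3, 2, 1), (3, 0, 2), (3, 0, 1)]
    print(sorted(edges, key=lambda e: edge_key(*e)))
    # [(3, 0, 1), (3, 0, 2), (3, 2, 1)]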
Borůvka's Algorithm: O(|E| log |V|)
T = empty
C (the list of the connected components of T) = [{1}, {2}, …, {n}]
while |C| > 1:   (log |V| stages)
  for each c in C do:
    find the shortest edge out of c and add it to T   (O(|E|))
  C = connected components of T   (O(|V|))
Borůvka’s Algorithm in parallel?
T = empty
C (the list of the connected components of T) = [{1}, {2}, …, {n}]
while |C| > 1:   (log |V| stages)
  for each c in C do in parallel:
    find the shortest edge out of c and add it to T   (W = |E|, D = log |V|)
  C = connected components of T   (W = |E| log |V|, D = log |V|)
Total: W = O(|E| log^2 |V|), D = O(log^2 |V|)
Borůvka’s Algorithm in parallel?
T = empty
C (the list of the connected components of T) = [{1}, {2}, …, {n}]
while |C| > 1:
  for each c in C do in parallel:
    find the shortest edge out of c and add it to T   (How???)
  C = connected components of T   (How???)
Total: W = O(|E| log^2 |V|), D = O(log^2 |V|)
Connected Components
function cc(V, E) returns array[V] of V
initialize: for every node v do in parallel: leader[v] = fifty-fifty(), ptr[v] = v
for every non-leader node v do in parallel: choose an adjacent leader node u, if one exists, and set ptr[v] = u   (ptr is now a bunch of stars)
V' = {v : ptr[v] = v}   (the roots of the stars)
E' = {(u, v) : u ≠ v in V', there is (a, b) in E such that ptr[a] = u and ptr[b] = v}   ("contract" the graph)
label[] = cc(V', E')   (compute the connected components recursively on the contracted graph)
return cc[v] = label[ptr[v]] for every v
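A Python sketch of this recursion, with fifty-fifty() rendered as a fair coin flip; plain loops simulate the parallel hooking and contraction rounds, and since a constant fraction of the nodes is expected to disappear at each level, the expected recursion depth is O(log n):

    import random

    def cc(V, E):
        if not E:
            return {v: v for v in V}     # no edges: every node labels itself
        leader = {v: random.random() < 0.5 for v in V}   # fifty-fifty()
        ptr = {v: v for v in V}
        for a, b in E:                   # hook non-leaders onto adjacent leaders
            if not leader[a] and leader[b] and ptr[a] == a:
                ptr[a] = b
            if not leader[b] and leader[a] and ptr[b] == b:
                ptr[b] = a
        V2 = [v for v in V if ptr[v] == v]                          # roots of the stars
        E2 = {(ptr[a], ptr[b]) for a, b in E if ptr[a] != ptr[b]}   # contract
        label = cc(V2, list(E2))         # recurse on the contracted graph
        return {v: label[ptr[v]] for v in V}

    print(cc([1, 2, 3, 4], [(1, 2), (2, 3)]))
    # e.g. {1: 3, 2: 3, 3: 3, 4: 4}: nodes 1, 2, 3 share a label, node 4 is alone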