Parallel Algorithms CS170 Fall 2016

Parallel computation is here! During your career, Moore's Law will probably slow down a lot (possibly to a grinding halt…). Google's engine (reportedly) has about 900,000 processors (recall Map-Reduce). The fastest supercomputers have > 10^7 cores and 10^17 to 10^18 flops. So, in an Algorithms course we must at least mention parallel algorithms.

This lecture: What are parallel algorithms, and how do they differ from (sequential) algorithms? What are the important performance criteria for parallel algorithms? What are the basic tricks? What does the "landscape" look like? Sketches of two sophisticated parallel algorithms: MST and connected components.

Parallel algorithms need a completely new mindset!! In sequential algorithms we care about time. Acceptable: O(n), O(n log n), O(n^2), O(|E||V|^2)… that is, polynomial time. Unacceptable: exponential time 2^n. (Sometimes the unacceptable is all that is possible: NP-complete problems.) How about in parallel algorithms?

To start, what is a parallel algorithm? What kinds of computers will it run on? The PRAM: P processors (RAMs) attached to one shared memory, all driven by the same clock, fully synchronous. Q: How about memory congestion? A: It is OK to read a cell concurrently, but not OK to write one concurrently: the CREW (Concurrent Read, Exclusive Write) PRAM.

Language? Threads in Java, Python, etc. Parallel languages facilitate parallel programming through syntax (parbegin/parend). In our pseudocode, instead of "for every edge (u,v) in E do" we may say "for every edge (u,v) in E do in parallel".
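As a concrete illustration (not from the slides), a "do in parallel" loop can be expressed in plain Python with the standard concurrent.futures module; relax and edges here are hypothetical placeholders for the per-edge work:

    import concurrent.futures

    def relax(edge):
        u, v = edge
        # placeholder per-edge work, independent of every other edge
        return (v, u)

    edges = [(0, 1), (1, 2), (2, 0)]

    # "for every edge (u,v) in E do in parallel"
    with concurrent.futures.ThreadPoolExecutor() as pool:
        results = list(pool.map(relax, edges))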

And what do we care about? Two things: Work = the total number of instructions executed by all processors. Depth = clock time of the parallel execution.

And what is acceptable? Polynomial work. Depth? O(log n), or maybe O((log n)^2), etc. Q: But how many processors? P = ? A: Pretend to have as many as you want! Saturate the problem with processors!

The reason: Brent's Principle. If you can solve a problem with depth D and work W with as many processors as you want… …then you can also solve it with P processors, with work O(W) and depth D' ≤ D + W/P.

Proof: we can simulate each parallel step t (which performs work w_t) with ⌈w_t/P⌉ ≤ w_t/P + 1 steps of our P processors. Adding over all steps t, we get depth D' ≤ D + W/P.

To recap (Brent's Principle): if a problem can be solved in parallel with work W, depth D, and as many processors as we like, then it can be solved with P processors, with the same work W' = O(W) and depth D' = W/P + D.
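A quick sanity check with made-up numbers: summing n = 10^6 numbers (see the next slide) has W ≈ 10^6 and D ≈ 20, so with P = 1,000 processors Brent gives depth about 10^6/1000 + 20 ≈ 1,020 steps, roughly a 1000x speedup over the sequential O(n).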

Toy Problem: Sum. Sequential algorithm:
    sum = 0
    for i = 1 to n do: sum = sum + A[i]
    return sum
O(n) time. In parallel?

function sum(A[1..n])
    if n = 1: return A[1]
    for i = 1, …, n/2 do in parallel: A[i] = A[2i - 1] + A[2i]
    return sum(A[1..n/2])
Example: 2 3 1 7 5 4 8 6 → 5 8 9 14 → 13 23 → 36
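Here is a minimal runnable Python sketch of the above (an illustration, not the course's official code); each list comprehension corresponds to one parallel step of the PRAM:

    def parallel_sum(a):
        # pairwise summation: every pass halves the array, and all additions
        # within a pass are independent, i.e. one parallel step on a PRAM
        a = list(a)
        while len(a) > 1:
            if len(a) % 2:
                a.append(0)            # pad odd lengths with a neutral element
            a = [a[2 * i] + a[2 * i + 1] for i in range(len(a) // 2)]
        return a[0]

    print(parallel_sum([2, 3, 1, 7, 5, 4, 8, 6]))  # 36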

Work? Depth? W(n) = W(n/2) + n/2 ⇒ O(n). D(n) = D(n/2) + 2 ⇒ O(log n). Work-efficient = same work as the best sequential algorithm; depth log n (as little as possible). Important variant: prefix sums, i.e., all partial sums sums[j] = Σ_{i=1}^{j} A[i].
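The slides do not give the prefix-sums algorithm; the standard work-efficient way is a two-phase scan (Blelloch's up-sweep/down-sweep). A Python sketch, assuming n is a power of two, where each inner for loop is one parallel step:

    def prefix_sums(a):
        # Blelloch-style scan: up-sweep builds the sum tree, down-sweep pushes
        # prefixes back down; O(n) work, O(log n) depth; len(a) a power of two
        n = len(a)
        t = list(a)
        d = 1
        while d < n:                   # up-sweep: one parallel step per level
            for i in range(2 * d - 1, n, 2 * d):
                t[i] += t[i - d]
            d *= 2
        t[n - 1] = 0
        d = n // 2
        while d >= 1:                  # down-sweep: one parallel step per level
            for i in range(2 * d - 1, n, 2 * d):
                t[i - d], t[i] = t[i], t[i] + t[i - d]
            d //= 2
        return [t[i] + a[i] for i in range(n)]   # exclusive -> inclusive sums

    print(prefix_sums([2, 3, 1, 7, 5, 4, 8, 6]))  # [2, 5, 6, 13, 18, 22, 30, 36]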

Another toy problem: compact. Given the array 2 0 0 1 0 4 3 0 0 0 6 0 0 0 0 1, make it into 2 1 4 3 6 1. Also work-efficient.
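A hedged sketch of compact, reusing the prefix_sums function from the previous sketch (so the same power-of-two assumption applies): flags mark the survivors, and their prefix sums give each survivor its output slot, so all the writes are independent and form one parallel step.

    def compact(a):
        # flags mark the nonzero entries; their prefix sums give each survivor
        # its position in the output array
        flags = [1 if x != 0 else 0 for x in a]
        pos = prefix_sums(flags)       # the work-efficient scan sketched above
        out = [0] * pos[-1]
        for i, x in enumerate(a):      # independent writes: one parallel step
            if flags[i]:
                out[pos[i] - 1] = x
        return out

    print(compact([2, 0, 0, 1, 0, 4, 3, 0, 0, 0, 6, 0, 0, 0, 0, 1]))  # [2, 1, 4, 3, 6, 1]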

Another basic problem: Find-Root. Given a forest in which every node v has a pointer next[v] toward its root (and roots point to themselves), find every node's root.

Solution: pointer jumping.
    repeat log n times:
        for every node v do in parallel:
            if next[v] ≠ v: next[v] = next[next[v]]
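A runnable Python sketch of pointer jumping (the names find_roots and next_ptr are mine, not the slides'); building each new array in one go mimics the synchronous reads-before-writes semantics of the PRAM:

    import math

    def find_roots(next_ptr):
        # pointer jumping: after k rounds node v points 2^k hops up its tree,
        # so ceil(log2 n) rounds point every node at its root
        nxt = list(next_ptr)
        n = len(nxt)
        for _ in range(max(1, math.ceil(math.log2(n)))):
            # roots (nxt[v] == v) are left unchanged by this update
            nxt = [nxt[nxt[v]] for v in range(n)]
        return nxt

    print(find_roots([0, 0, 1, 2, 2]))  # [0, 0, 0, 0, 0]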

The parallel algorithm landscape. These are some of the very basic tricks of parallel algorithm design (like "divide and conquer" or "greedy" in sequential algorithm design). There are a couple of others. They go a long way, but not all the way… So, what happens to the problems we learned how to solve sequentially in CS170?

Matrix multiplication ✓ (recall sum)
Merge sort ✓ (redesign: parallel quicksort, radix sort)
FFT ✓ (begging…)
Connected components ✓ (redesign)
DFS/SCC ✓ (redesign)
Shortest path ✓ (redesign)
MST ✓ (redesign)
LP, HornSAT ✗ (impossible, P-complete)
Huffman ✓ (redesign)
Hackattack (for i = 1 to n check if k[i] is the secret key) ✓ (embarrassing parallelism)

MST. Prim? Applies the cut property to the one component grown so far ⇒ sequential… Kruskal? Goes through the edges in sorted order ⇒ sequential?

Borůvka's Algorithm (1926): applies the cut property to all components at once.
    T = empty (the MST under construction)
    C (list of the connected components of T) = [{1}, {2}, …, {n}]
    while |C| > 1:
        for each c in C: find the shortest edge out of c and add it to T
        C = connected components of T

Little problem…  Solution (or: break edge ties lexicographically) 3 3.007  3 3 3.003 3.001

Borůvka's Algorithm runs in O(|E| log |V|) time:
    T = empty
    C = [{1}, {2}, …, {n}]
    while |C| > 1:                                        (log |V| stages)
        for each c in C: find the shortest edge out of c and add it to T   (O(|E|))
        C = connected components of T                     (O(|V|))
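For reference, a sequential Python sketch of Borůvka's algorithm (my illustration, not the slides' code) that breaks ties lexicographically by comparing (weight, u, v) tuples, exactly the fix suggested above:

    def boruvka_mst(n, edge_list):
        # edge_list holds (weight, u, v) triples; tuple comparison breaks
        # weight ties lexicographically, making the MST unique
        parent = list(range(n))            # union-find over the components C
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path compression
                x = parent[x]
            return x
        edges = sorted(edge_list)
        mst, components = [], n
        while components > 1:
            best = {}                      # cheapest outgoing edge per component
            for e in edges:                # "find the shortest edge out of c"
                ru, rv = find(e[1]), find(e[2])
                if ru != rv:
                    best.setdefault(ru, e) # first hit is cheapest: edges sorted
                    best.setdefault(rv, e)
            if not best:
                break                      # graph is disconnected
            for w, u, v in set(best.values()):
                ru, rv = find(u), find(v)
                if ru != rv:               # "add it to T", merging components
                    parent[ru] = rv
                    mst.append((w, u, v))
                    components -= 1
        return mst

    print(boruvka_mst(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3)]))
    # [(1, 0, 1), (2, 1, 2), (3, 2, 3)]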

Borůvka's Algorithm in parallel?
    T = empty
    C = [{1}, {2}, …, {n}]
    while |C| > 1:
        for each c in C do in parallel: find the shortest edge out of c and add it to T   (W = |E|, D = log |V|)
        C = connected components of T                     (W = |E| log |V|, D = log |V|)
Total: W = O(|E| log^2 |V|), D = O(log^2 |V|)

Borůvka's Algorithm in parallel?
    T = empty
    C = [{1}, {2}, …, {n}]
    while |C| > 1:
        for each c in C do in parallel: find the shortest edge out of c and add it to T   (How???)
        C = connected components of T                     (How???)
Total: W = O(|E| log^2 |V|), D = O(log^2 |V|)

Connected Components
function cc(V, E) returns array[V] of V
    initialize: for every node v do in parallel: leader[v] = fifty-fifty(), ptr[v] = v
    for every non-leader node v do in parallel:
        choose an adjacent leader node u, if one exists, and set ptr[v] = u
        (ptr is now a bunch of stars)
    V' = {v : ptr[v] = v}   (the roots of the stars)
    E' = {(u, v) : u ≠ v in V', there is (a, b) in E such that ptr[a] = u and ptr[b] = v}   ("contract" the graph)
    label[] = cc(V', E')   (compute cc recursively on the contracted graph)
    return cc[v] = label[ptr[v]] for every node v
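Finally, a compact sequential Python sketch of this randomized-contraction cc (an illustration; on a PRAM the per-vertex and per-edge loops run in parallel, and the recursion has O(log n) expected depth since a constant fraction of the vertices hook in expectation each round):

    import random

    def cc(vertices, edges):
        # returns a dict mapping every vertex to its component's label
        if not edges:
            return {v: v for v in vertices}
        leader = {v: random.random() < 0.5 for v in vertices}   # fifty-fifty()
        ptr = {v: v for v in vertices}
        for a, b in edges:             # non-leaders hook onto adjacent leaders
            if not leader[a] and leader[b]:
                ptr[a] = b
            if not leader[b] and leader[a]:
                ptr[b] = a
        roots = {v for v in vertices if ptr[v] == v}        # centers of the stars
        contracted = {(ptr[a], ptr[b]) for a, b in edges if ptr[a] != ptr[b]}
        label = cc(roots, contracted)  # recurse on the contracted graph
        return {v: label[ptr[v]] for v in vertices}

    print(cc([0, 1, 2, 3, 4], [(0, 1), (1, 2), (3, 4)]))
    # {0,1,2} share one label and {3,4} another, e.g. {0: 2, 1: 2, 2: 2, 3: 4, 4: 4}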