Lecture 2: How to Parallelize an Algorithm
Today's Outline
- Quiz
- Functional Programming Review
- Algorithm Parallelization
- Announcements, Projects, Readings

Start by explaining a specific instantiation of the MapReduce model.
Quiz This one is graded (unlike the last one)
Fold & Map review

Fold:
  foldl f z []     = z
  foldl f z (x:xs) = foldl f (f z x) xs
  [foldr f z (x:xs) = f x (foldr f z xs)]
Applies a function to every element in the list.
Each iteration can access the result of the previous.

Map:
  map f []     = []
  map f (x:xs) = f x : map f xs
Applies a function to every element in the list.
Each iteration is independent of all others.

How would you parallelize Map? Fold?
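A concrete worked example of both behaviors (the inputs are made up for illustration, GHCi-style):

  foldl (+) 0 [1,2,3]   -- ((0+1)+2)+3 = 6; each step sees the previous accumulator
  map (*2) [1,2,3]      -- [2,4,6]; each element is transformed independently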
Group Exercises See handout
Answers to Group Exercises

Concat: given a list of lists, concatenates all the sublists.
  concat xss = foldr (++) [] xss

Group: collects the values for each key into a single (key, value-list) pair.
  group xss = foldl group_func [] xss

  group_func result (k,v) =
    if has k result
      then map (update (k,v)) result
      else (k, [v]) : result

  update (k1,v1) (k2,v_list) =
    if k1 == k2
      then (k2, v1 : v_list)
      else (k2, v_list)
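A quick sanity check of the definitions above (the inputs are made-up examples):

  concat [[1,2],[3],[4,5]]          -- [1,2,3,4,5]
  group [("a",1),("b",2),("a",3)]   -- [("b",[2]),("a",[3,1])]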
Why Parallelize?

Reasons to parallelize:
- Scalability
- Better utilization of resources

Reasons not to parallelize:
- Problem isn't easily parallelizable
- No extra resources to use
- Overhead of coordination larger than the benefits of parallelization

What issues are there in parallelization?
Implicit Serialization

Example DocInfo:
  f = read_file("file.txt")
  words = count_uniq_words(f)
  spaces = count_spaces(f)
  f = capitalize(f)
  f = remove_punct(f)
  words2 = count_uniq_words(f)
  puts("unique words: " + words)
  puts("num spaces: " + spaces)
  puts("unique scrubbed words: " + words2)

The point of this exercise is to introduce the concept that imperative programming introduces a lot of unnecessary serialization; the only "real" required serialization is a data dependency.

Which statements can be reordered?
Data Dependency Graph

Which operations can be done in parallel?

  f = read_file("file.txt")
  words = count_uniq_words(f)
  spaces = count_spaces(f)
  f = capitalize(f)
  f = remove_punct(f)
  words2 = count_uniq_words(f)
  puts("unique words: " + words)
  puts("num spaces: " + spaces)
  puts("unique scrubbed words: " + words2)

[Dataflow diagram: read_file produces f0; f0 feeds count_uniq_words (words), count_spaces (spaces), and capitalize (f1); f1 feeds remove_punct (f2); f2 feeds count_uniq_words (words2); words, spaces, and words2 each feed a puts writing console0, console1, console2.]

Show the dataflow dependency. Ask students to imagine a system where you could automatically generate and understand this graph.
Distributing Computation

[Diagram: the dependency graph scheduled onto hardware, with the file in Storage, f0, f1, and f2 in shared RAM, and the count and puts operations spread across Cpu1 through Cpu4.]

Takes 5 steps.
Eliminating Dependencies 1 of 3

[Diagram: the same dependency graph, with the shared values f[0,1,2] and console[0,1,2] highlighted as synchronization points.]

Synchronization points? f[0,1,2], console[0,1,2]
Ideas for removing them: Your ideas here

Show that you can shuffle operations and unnecessary side effects around by removing state from the shared context.
Eliminating Dependencies 2 of 3

capitalize and remove_punct can be combined and run first to create a copy of the data before "counting".

DocInfo 2.0:
  f = read_file("file.txt")
  scrubbed_f = scrub_words(f)
  words = count_uniq_words(f)
  spaces = count_spaces(f)
  words2 = count_uniq_words(scrubbed_f)
  puts("unique words: " + words)
  puts("num spaces: " + spaces)
  puts("unique scrubbed words: " + words2)

Show that you can shuffle operations and unnecessary side effects around by removing state from the shared context.
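A minimal sketch of the fused pass in Haskell; the concrete bodies of capitalize and remove_punct are stand-in assumptions, not from the slides:

  import Data.Char (toUpper, isPunctuation)

  capitalize, remove_punct, scrub_words :: String -> String
  capitalize   = map toUpper                   -- stand-in definition
  remove_punct = filter (not . isPunctuation)  -- stand-in definition
  scrub_words  = remove_punct . capitalize     -- both passes fused into one copy of the data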
Dependency Graph 2.0

  f = read_file("file.txt")
  scrubbed_f = scrub_words(f)
  words = count_uniq_words(f)
  spaces = count_spaces(f)
  words2 = count_uniq_words(scrubbed_f)
  puts("unique words: " + words)
  puts("num spaces: " + spaces)
  puts("unique scrubbed words: " + words2)

[Dataflow diagram: read_file produces f; f feeds scrub_words (scrubbed_f), count_uniq_words (words), and count_spaces (spaces); scrubbed_f feeds count_uniq_words (words2); words, spaces, and words2 each feed a puts writing console0, console1, console2.]

This model graph shows you can break dependencies by copying the data.
Distributing Computation 2.0

[Diagram: the 2.0 graph scheduled onto Storage, shared RAM (f, scrubbed_f), and Cpu1 through Cpu4; the three counts run concurrently.]

Takes 2 steps.
Eliminating Dependencies 3 of 3

capitalize and remove_punct only need to be applied to each word (not the whole file) before "counting".

DocInfo 3.0:
  f = read_file("file.txt")
  words = count_uniq_words(f)
  spaces = count_spaces(f)
  words2 = count_uniq_scrubbed_words(f)
  puts("unique words: " + words)
  puts("num spaces: " + spaces)
  puts("unique scrubbed words: " + words2)

Show that you can shuffle operations and unnecessary side effects around by removing state from the shared context.
Dependency Graph 3.0

  f = read_file("file.txt")
  words = count_uniq_words(f)
  spaces = count_spaces(f)
  words2 = count_uniq_scrubbed_words(f)
  puts("unique words: " + words)
  puts("num spaces: " + spaces)
  puts("unique scrubbed words: " + words2)

[Dataflow diagram: read_file produces f0; f0 feeds count_uniq_words (words), count_spaces (spaces), and count_uniq_scrubbed_words (words2); each result feeds a puts writing console0, console1, console2.]

The point of this exercise is to introduce the concept that imperative programming introduces a lot of unnecessary serialization; the only "real" required serialization is a data dependency.
Distributing Computation 3.0

[Diagram: the 3.0 graph scheduled onto Storage, shared RAM (f0), and Cpu1 through Cpu4; all three counts run in parallel straight from f0.]

Takes 2 steps.
Parallelization Summary

Parallelization tradeoff:
- Good: better scalability
- Bad: less algorithmic flexibility, higher complexity
- Neutral: optimizes for large input over small input

Why avoid data dependencies?
- lowers complexity
- makes parallelization possible

How do you avoid data dependencies?
- avoid stateful algorithms
- avoid side effects (clone state instead of modifying)
- avoid global variables and member variables
Parallelizing Map

Definition of map:
  map f []     = []
  map f (x:xs) = f x : map f xs

What's required to parallelize map?
- function needs to be stateless
- function available to each computation unit
- input accessible by all computation units
- output ordering isn't important
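A minimal sketch of a parallel map, assuming the parallel package (Control.Parallel.Strategies) is available; expensive is a made-up stand-in for a pure, stateless f:

  import Control.Parallel.Strategies (parMap, rseq)

  expensive :: Int -> Int            -- stand-in for a stateless function
  expensive n = sum [1 .. n * 100000]

  -- parMap sparks each (expensive x) for parallel evaluation; this is
  -- only safe because expensive has no state or side effects, and the
  -- result list preserves input order either way.
  main :: IO ()
  main = print (parMap rseq expensive [1 .. 8])

Built with ghc -threaded and run with +RTS -N, the eight evaluations can proceed on separate cores.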
Parallelizing Fold

Definition of fold:
  foldl f z []     = z
  foldl f z (x:xs) = foldl f (f z x) xs

What's required to parallelize fold? You can't.
Why can't you parallelize fold? Each step depends on the result of the previous.
How is fold useful in parallel computing then?
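One aside before the MapReduce answer: the sequential chain is forced by foldl accepting an arbitrary f. If f happens to be associative with identity z, the fold can be restructured as a balanced tree whose halves are independent; a sketch under exactly that assumption:

  -- Correct only when f is associative and z is its identity; plain
  -- foldl guarantees neither, which is why it cannot be parallelized
  -- in general.
  treeFold :: (a -> a -> a) -> a -> [a] -> a
  treeFold _ z []  = z
  treeFold _ _ [x] = x
  treeFold f z xs  = f (treeFold f z left) (treeFold f z right)
    where (left, right) = splitAt (length xs `div` 2) xs

  -- e.g. treeFold (+) 0 [1..8] == foldl (+) 0 [1..8] == 36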
MapReduce

MapReduce maps a fold over the sorted result of a map!

  mapreduce fm fr l = map (reducePerKey fr) (group (map fm l))
  reducePerKey fr (k, v_list) = (k, foldl (fr k) [] v_list)

Assume map here is actually concatMap. The argument l is a list of documents. The result of the first map is a list of key-value pairs. The function fr takes 3 arguments: key, context, current. With currying, this allows locking the value of "key" for each list during the fold.
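A runnable sketch of this skeleton in Haskell, reusing group from the exercise answers (renamed group' to avoid Data.List.group). The word-count workload and the names wordCount and step are illustrative assumptions, and this group' collects rather than sorts:

  -- mapreduce skeleton from the slide, with the first map made
  -- explicit as concatMap.
  mapreduce fm fr l = map (reducePerKey fr) (group' (concatMap fm l))
    where reducePerKey fr' (k, v_list) = (k, foldl (fr' k) [] v_list)

  group' :: Eq k => [(k, v)] -> [(k, [v])]
  group' = foldl step []
    where
      step result (k, v)
        | any ((== k) . fst) result = map (update (k, v)) result
        | otherwise                 = (k, [v]) : result
      update (k1, v1) (k2, vs)
        | k1 == k2  = (k2, v1 : vs)
        | otherwise = (k2, vs)

  -- Word count: the mapper emits (word, 1); the reducer fr receives the
  -- curried key, the context list, and the current value, and here just
  -- collects values, so the final count is the length of each list.
  wordCount :: [String] -> [(String, Int)]
  wordCount docs =
    [ (k, length vs)
    | (k, vs) <- mapreduce (\doc -> [ (w, 1 :: Int) | w <- words doc ])
                           (\_key ctx v -> v : ctx)
                           docs ]

  main :: IO ()
  main = print (wordCount ["the cat sat", "the dog sat"])
  -- [("dog",1),("sat",2),("cat",1),("the",2)]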