Published by Alexandrina Pearson. Modified over 9 years ago.
1
Lecture 2 – MapReduce: Theory and Implementation
CSE 490h – Introduction to Distributed Computing, Winter 2008
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
2
Last Class
How do I process lots of data?
  Distribute the work
Can I distribute the work?
  Maybe… if it’s not dependent on other tasks
  Example: Fibonacci
3
Last Class
What problems can occur?
  Large tasks
  Unpredictable bugs
  Machine failure
How do we solve or avoid these?
  Break up into small chunks?
  Restart tasks?
  Use known working solutions
4
MapReduce
Concept from functional programming
Implemented by Google
Applied to a large number of problems
5
Functional Programming Review
Java:
    int fooA(String[] list) { return bar1(list) + bar2(list); }
    int fooB(String[] list) { return bar2(list) + bar1(list); }
Do they give the same result?
6
Functional Programming Review
Functional Programming:
    fun fooA(l: int list) = bar1(l) + bar2(l)
    fun fooB(l: int list) = bar2(l) + bar1(l)
Do they give the same result?
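Whether the two versions agree depends on whether bar1 and bar2 have side effects. The following is a minimal Python sketch of that point. The bodies of the bar functions are hypothetical, since the slides leave them undefined: the impure pair shares mutable state, so the order of the calls changes the answer, while the pure pair gives the same sum either way.

```python
# Hypothetical stand-ins for bar1/bar2; the slides do not define them.

def impure_bar1(items):
    """Mutates its argument: appends a value, then returns the new length."""
    items.append(0)
    return len(items)

def impure_bar2(items):
    """Reads the argument, which bar1 may already have mutated."""
    return len(items) * 10

def pure_bar1(items):
    """Same computation as impure_bar1, but on a freshly built list."""
    return len(items + [0])

def pure_bar2(items):
    return len(items) * 10

def foo_a(bar1, bar2, items):
    return bar1(items) + bar2(items)   # Python evaluates operands left to right

def foo_b(bar1, bar2, items):
    return bar2(items) + bar1(items)

impure_a = foo_a(impure_bar1, impure_bar2, [1, 2, 3])  # 4 + 40
impure_b = foo_b(impure_bar1, impure_bar2, [1, 2, 3])  # 30 + 4
pure_a = foo_a(pure_bar1, pure_bar2, [1, 2, 3])        # 4 + 30
pure_b = foo_b(pure_bar1, pure_bar2, [1, 2, 3])        # 30 + 4
```

With side effects the results differ (44 versus 34); without them both orders give 34.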
7
Functional Programming Review
Operations do not modify data structures: they always create new ones
Original data still exists in unmodified form
8
Functional Updates Do Not Modify Structures
    fun foo(x, lst) =
      let val lst' = rev lst
      in rev (x :: lst')
      end
    foo: 'a * 'a list -> 'a list
The foo() function above reverses a list, adds a new element to the front, and reverses the result, which has the effect of appending the item. But it never modifies lst!
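The same reverse, prepend, reverse idea can be sketched in Python. This is an illustrative translation rather than the lecture's code; the name append_functional is made up for the example. Every step builds a new list, so the caller's list is left untouched.

```python
def append_functional(x, lst):
    """Return a new list with x at the end; lst itself is never modified."""
    reversed_lst = lst[::-1]        # new list: reversed copy of lst
    with_x = [x] + reversed_lst     # new list: x placed at the front
    return with_x[::-1]             # new list: reversed again, so x is last

original = [1, 2, 3]
result = append_functional(4, original)
```

After the call, result holds the extended list and original still has its three elements.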
9
Functions Can Be Used As Arguments
    fun DoDouble(f, x) = f (f x)
It does not matter what f does to its argument; DoDouble() will do it twice.
What is the type of this function?
    x: 'a
    f: 'a -> 'a
    DoDouble: ('a -> 'a) * 'a -> 'a
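As a rough Python counterpart, assuming type hints are an acceptable stand-in for the SML type variables, a function can be passed as an ordinary argument. The name do_double mirrors DoDouble and is otherwise arbitrary.

```python
from typing import Callable, TypeVar

A = TypeVar("A")

def do_double(f: Callable[[A], A], x: A) -> A:
    """Apply f to x, then apply f again to the result."""
    return f(f(x))

incremented = do_double(lambda n: n + 1, 5)      # 5 -> 6 -> 7
shouted = do_double(lambda s: s + "!", "hi")     # "hi" -> "hi!" -> "hi!!"
```

The same do_double works for integers and strings because it only requires that f maps a value to another value of the same type.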
10
map (Functional Programming)
Creates a new list by applying f to each element of the input list; returns output in order.
    map f lst: ('a -> 'b) -> ('a list) -> ('b list)
11
map Implementation
    fun map f [] = []
      | map f (x::xs) = (f x) :: (map f xs)
This implementation moves left-to-right across the list, mapping elements one at a time…
But does it need to?
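For readers less familiar with SML pattern matching, here is a sketch of the same recursive definition in Python. The name my_map is chosen to avoid shadowing the built-in map; it is not part of the lecture.

```python
def my_map(f, lst):
    """Recursive map: apply f to the head, then recurse on the tail."""
    if not lst:                 # corresponds to the [] case
        return []
    head, *tail = lst           # corresponds to the (x::xs) pattern
    return [f(head)] + my_map(f, tail)

squares = my_map(lambda n: n * n, [1, 2, 3, 4])
```

Like the SML version, this visits elements left to right, although nothing in the definition of map requires that order.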
12
Implicit Parallelism In map
In a purely functional setting, elements of a list being computed by map cannot see the effects of the computations on other elements
If the order in which f is applied to the elements does not affect the result, we can reorder or parallelize execution
This is the “secret” that MapReduce exploits
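A small Python sketch can illustrate the claim. It assumes a pure function (square, chosen for the example) and uses a thread pool from the standard library as a stand-in for a cluster of workers. Applying the function in forward order, in reverse order, or concurrently gives the same output list.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    """Pure function: the result depends only on n."""
    return n * n

data = list(range(10))

# Sequential, left to right.
sequential = [square(n) for n in data]

# Right to left, then put the results back in input order.
reversed_order = [square(n) for n in reversed(data)][::-1]

# Concurrent: pool.map yields results in input order,
# even if the individual calls finish out of order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, data))
```

If square wrote to shared state, the three results could diverge, which is why purity matters here.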
13
Fold
Moves across a list, applying f to each element plus an accumulator. f returns the next accumulator value, which is combined with the next element of the list
    fold f x0 lst: ('a*'b->'b)->'b->('a list)->'b
14
fold left vs. fold right
Order of list elements can be significant
Fold left moves left-to-right across the list
Fold right moves from right-to-left
SML Implementation:
    fun foldl f a [] = a
      | foldl f a (x::xs) = foldl f (f(x, a)) xs
    fun foldr f a [] = a
      | foldr f a (x::xs) = f(x, (foldr f a xs))
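To see why direction matters, here is a sketch of both folds in Python, written iteratively but keeping the SML argument order f(element, accumulator). The "cons" function used in the usage lines is an illustrative choice: it puts each element at the front of the accumulator, so the two folds produce different lists.

```python
def foldl(f, acc, lst):
    """Fold left: combine elements with the accumulator from left to right."""
    for x in lst:
        acc = f(x, acc)
    return acc

def foldr(f, acc, lst):
    """Fold right: combine elements with the accumulator from right to left."""
    for x in reversed(lst):
        acc = f(x, acc)
    return acc

def cons(x, acc):
    """Prepend x to acc, returning a new list."""
    return [x] + acc

left = foldl(cons, [], [1, 2, 3])    # reverses the list
right = foldr(cons, [], [1, 2, 3])   # rebuilds the list in its original order
```

For an operation such as addition both folds agree; for cons they do not, which shows that the order of list elements can be significant.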
15
Example
    fun foo(l: int list) = sum(l) + mul(l) + length(l)
How can we implement this?
16
Example (Solved)
    fun sum(lst) = foldl (fn (x,a) => x+a) 0 lst
    fun mul(lst) = foldl (fn (x,a) => x*a) 1 lst
    fun length(lst) = foldl (fn (x,a) => 1+a) 0 lst
    fun foo(l: int list) = sum(l) + mul(l) + length(l)
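The same three folds can be sketched with Python's functools.reduce. Note one difference from the SML above: reduce passes the accumulator first and the element second. The names total, product and count are chosen here to avoid shadowing Python built-ins.

```python
from functools import reduce

def total(lst):
    """Sum of the elements, starting from 0."""
    return reduce(lambda acc, x: acc + x, lst, 0)

def product(lst):
    """Product of the elements, starting from 1."""
    return reduce(lambda acc, x: acc * x, lst, 1)

def count(lst):
    """Number of elements: add 1 per element and ignore its value."""
    return reduce(lambda acc, _x: acc + 1, lst, 0)

def foo(lst):
    return total(lst) + product(lst) + count(lst)

answer = foo([1, 2, 3, 4])   # 10 + 24 + 4
```

Each helper differs only in the combining function and the starting accumulator, which is the point of expressing them as folds.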
17
Google MapReduce
Input Handling
Map Function
Partition Function
Compare Function
Reduce Function
Output Writer
18
Input Handling
Divides up data into bite-size chunks
Starts up tasks
Assigns tasks to idle workers
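As a toy illustration of these steps (not Google's implementation), the sketch below splits a list of records into fixed-size chunks and hands each chunk to the next idle worker. The function names, worker names and chunk size are all assumptions made for the example, and a worker is treated as idle again immediately after being assigned a task, which a real scheduler would not do.

```python
from collections import deque

def split_into_chunks(records, chunk_size):
    """Divide the input into consecutive chunks of at most chunk_size records."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

def assign_tasks(chunks, workers):
    """Pair each chunk (one task) with the next idle worker."""
    idle = deque(workers)
    assignments = []
    for task_id, chunk in enumerate(chunks):
        worker = idle.popleft()
        assignments.append((task_id, worker, chunk))
        idle.append(worker)   # simplification: the worker is idle again at once
    return assignments

chunks = split_into_chunks(list(range(10)), 4)
assignments = assign_tasks(chunks, ["worker-a", "worker-b"])
```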
19
Map
Input: Key, Value pair
Output: Key, Value pairs
Example: Annual Rainfall Per City
20
Map (Example)
Example: Annual Rainfall Per City
    map(String key, String value):
      // key: date
      // value: weather info
      foreach (City c in value)
        EmitIntermediate(c, c.rainfall)
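A runnable approximation of this map function in Python might look as follows. The record layout (a list of city and rainfall tuples per date) and the name map_weather are assumptions for illustration; the slide does not specify the format of the weather info. Yielding a pair plays the role of EmitIntermediate.

```python
def map_weather(date, readings):
    """Map step.

    date:     the input key (unused here, as in the slide's pseudocode)
    readings: assumed to be a list of (city, rainfall_mm) tuples for that date
    Yields one intermediate (city, rainfall_mm) pair per reading.
    """
    for city, rainfall_mm in readings:
        yield city, rainfall_mm

pairs = list(map_weather("2008-01-07", [("Seattle", 4), ("Spokane", 0)]))
```

Each input record produces zero or more intermediate pairs keyed by city, ready to be partitioned and sorted.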
21
Partition Function
Allocates each map output to a particular reduce task
Input: key, number of reduce tasks
Output: Index of the desired reduce task
Typical: hash(key) % numberOfReduces
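The typical scheme can be sketched in a few lines of Python. This example assumes string keys and uses zlib.crc32 as the hash, because Python's built-in hash() for strings varies between interpreter runs and so would not route a key consistently across machines. The function and variable names are illustrative.

```python
import zlib

def partition(key: str, num_reduces: int) -> int:
    """Return the index of the reduce task responsible for this key."""
    return zlib.crc32(key.encode("utf-8")) % num_reduces

cities = ["Seattle", "Spokane", "Tacoma", "Olympia"]
buckets = {city: partition(city, 3) for city in cities}
```

Because the index depends only on the key, every intermediate pair for a given city lands on the same reduce task, no matter which map task emitted it.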
22
Comparison
Sorts input for each reduce
Example: Annual rainfall per city
  Sorts rainfall data for each city
  Seattle: {0, 0, 0, 1, 4, 7, 10, …}
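One way to picture this step is an in-memory sketch that groups intermediate pairs by key and sorts each group with the default numeric comparison. The sample data and the name group_and_sort are made up for the example; a real system would sort far larger data on disk.

```python
from collections import defaultdict

def group_and_sort(pairs):
    """Group (key, value) pairs by key and sort the values within each group."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sort the keys as well, so each reduce sees its input in a fixed order.
    return {key: sorted(values) for key, values in sorted(grouped.items())}

intermediate = [("Seattle", 7), ("Spokane", 1), ("Seattle", 0), ("Seattle", 4)]
by_city = group_and_sort(intermediate)
```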
23
Reduce
Input: Key, Sorted list of values
Output: Single value
Example: Annual rainfall per city
25
Reduce (Example)
Example: Annual rainfall per city
    reduce(String key, Iterator values):
      // key: city
      // values: rainfall
      sum = 0, count = 0
      for each (v in values)
        sum += v
        count += 1
      Emit(sum / count)
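The pseudocode above translates fairly directly to Python. In this sketch the function name reduce_average is an assumption, the return value stands in for Emit, and the values are consumed through an iterator in a single pass, as in the slide. Like the pseudocode, it computes the mean of the values for one key.

```python
def reduce_average(city, values):
    """Reduce step: average all values seen for one key.

    city:   the intermediate key
    values: an iterator over the values for that key
    Returns (city, mean); assumes at least one value per key.
    """
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
    return city, total / count

result = reduce_average("Seattle", iter([0, 0, 4, 8]))
```

A reduce is only invoked for keys that appeared in the map output, so the division by count is safe under that assumption.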
26
Output
Writes the output to storage (GFS, etc.)
28
MapReduce for Google Local
Intersections
Rendering Tiles
Finding nearest gas stations