MapReduce.


1 MapReduce

2 Google and MapReduce
Google searches billions of web pages very, very quickly. How?
It uses a technique called “MapReduce” to distribute the work across a large number of computers, then combine the results.
This has made MapReduce a very popular approach.
Hadoop is an open source implementation of MapReduce.
Unless you work for Google, you will probably use Hadoop.

3 How it works
List(a, b, c, …).map(x => f(x)) gives List(f(a), f(b), f(c), …)
List(a, b, c, …).reduce((x, y) => x ⊕ y) gives a ⊕ b ⊕ c ⊕ …, where ⊕ is some binary operator
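To make the two operations concrete, here is a minimal Scala sketch. The function f (squaring) and the choice of + as the binary operator are assumptions made only for illustration, as is the object name MapReduceBasics. The last method hints at why the approach distributes well: when the operator is associative, each part of the list can be reduced separately and the partial results reduced again.

```scala
object MapReduceBasics {
  // Hypothetical f, chosen only for illustration: square the input.
  def f(x: Int): Int = x * x

  // map applies f to every element independently of the others.
  def mapped(xs: List[Int]): List[Int] = xs.map(x => f(x))

  // reduce combines the elements with a binary operator (here +).
  // Note: reduce throws on an empty list.
  def reduced(xs: List[Int]): Int = xs.reduce((x, y) => x + y)

  // Because + is associative, the list can be cut into parts, each part
  // reduced on its own, and the partial results reduced once more.
  // Assumes partSize > 0 and a non-empty list.
  def reducedInParts(xs: List[Int], partSize: Int): Int =
    xs.grouped(partSize).map(part => part.reduce(_ + _)).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4, 5)
    println(mapped(xs))                    // List(1, 4, 9, 16, 25)
    println(reduced(mapped(xs)))           // 55
    println(reducedInParts(mapped(xs), 2)) // 55
  }
}
```

Both reductions give the same answer only because + is associative; with a non-associative operator such as subtraction, the grouped version could differ from the plain reduce.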

4 Another view (in Japanese)

5 ForkJoin
How does ForkJoin differ from MapReduce?
Answers from Stack Overflow:
ForkJoin recursively partitions a task into several subtasks on a single machine, taking advantage of multiple cores.
MapReduce does only one big split, with no communication between the parts until the reduce step. It is massively scalable.
Java fork/join starts quickly and scales well for small inputs (<5MB), but it cannot process larger inputs because of the size restrictions of shared-memory, single-node architectures.
MapReduce takes tens of seconds to start up, but scales well for much larger inputs (>100MB) on a compute cluster.
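The structural difference can be sketched in Scala on a single machine. This is an illustration under stated assumptions, not a benchmark or a real cluster job: SumTask, forkJoinSum, oneSplitSum, and the threshold of 1000 are hypothetical names and values chosen for the example. The first part uses Java's ForkJoinPool and RecursiveTask to split recursively; the second only imitates the MapReduce shape (one split, independent parts, one reduce) with ordinary collections.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

object ForkJoinVsMapReduce {
  // Fork/join style: keep halving the range until it is small enough,
  // sum the small pieces directly, and combine results on the way back up.
  class SumTask(data: Array[Long], from: Int, until: Int, threshold: Int)
      extends RecursiveTask[Long] {
    override def compute(): Long =
      if (until - from <= threshold) {
        var total = 0L
        var i = from
        while (i < until) { total += data(i); i += 1 }
        total
      } else {
        val mid = from + (until - from) / 2
        val left = new SumTask(data, from, mid, threshold)
        val right = new SumTask(data, mid, until, threshold)
        left.fork()                   // run the left half asynchronously
        right.compute() + left.join() // do the right half here, then wait
      }
  }

  def forkJoinSum(data: Array[Long], threshold: Int = 1000): Long =
    ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length, threshold))

  // MapReduce shape, simulated locally: one split into independent parts,
  // each part summed without talking to the others, then a single reduce.
  // Assumes parts > 0.
  def oneSplitSum(data: Array[Long], parts: Int): Long = {
    val partSize = math.max(1, math.ceil(data.length.toDouble / parts).toInt)
    val partials = data.grouped(partSize).map(_.sum).toList // "map" phase
    partials.foldLeft(0L)(_ + _)                            // "reduce" phase
  }

  def main(args: Array[String]): Unit = {
    val data = Array.tabulate(10000)(i => (i + 1).toLong)
    println(forkJoinSum(data))    // 50005000
    println(oneSplitSum(data, 4)) // 50005000
  }
}
```

In the fork/join version every subtask shares the same in-memory array, which is why it is tied to one machine. In the one-split version each part could in principle live on a different computer, since the parts never need to see each other's data before the final reduce.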

6 The End

