1
MapReduce
2
Google and MapReduce
Google searches billions of web pages very, very quickly. How?
It uses a technique called “MapReduce” to distribute the work across a large number of computers, then combines the results.
This has made MapReduce a very popular approach.
Hadoop is an open source implementation of MapReduce.
Unless you work for Google, you will probably use Hadoop.
3
How it works
List(a, b, c, …).map(x => f(x)) gives List(f(a), f(b), f(c), …)
List(a, b, c, …).reduce((x, y) => x ⊕ y) gives a ⊕ b ⊕ c ⊕ …, where ⊕ is some binary operator
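A concrete sketch in Scala (the List syntax above is Scala's), assuming f squares its argument and ⊕ is integer addition; both choices are illustrative, not from the slide:

object MapReduceDemo {
  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4)
    // map: apply f to every element independently
    val mapped = xs.map(x => x * x)              // List(1, 4, 9, 16)
    // reduce: combine the results with the binary operator ⊕ (here, +)
    val total  = mapped.reduce((x, y) => x + y)  // 30
    println(total)
  }
}

Because + is associative, the reduce step can group the partial results in any order, which is what lets MapReduce combine answers computed on different machines.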
4
Another view (in Japanese)
5
ForkJoin How does ForkJoin differ from MapReduce?
Answers from Stack Overflow:
ForkJoin recursively partitions a task into several subtasks, on a single machine. It takes advantage of multiple cores (see the sketch after this list).
MapReduce only does one big split, with no communication between the parts until the reduce step. It is massively scalable.
Java fork/join starts quickly and scales well for small inputs (<5 MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single-node architectures. MapReduce takes tens of seconds to start up, but scales well for much larger inputs (>100 MB) on a compute cluster.
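A minimal fork/join sketch in Scala over Java's java.util.concurrent classes, summing an array by recursive splitting; the summing task and the threshold of 1000 are illustrative assumptions, not from the slide:

import java.util.concurrent.{ForkJoinPool, RecursiveTask}

class SumTask(xs: Array[Int], lo: Int, hi: Int) extends RecursiveTask[Long] {
  override def compute(): Long =
    if (hi - lo <= 1000) {
      // small enough: sum this range sequentially
      var s = 0L; var i = lo
      while (i < hi) { s += xs(i); i += 1 }
      s
    } else {
      // recursively split: fork the left half, compute the right half here
      val mid  = (lo + hi) / 2
      val left = new SumTask(xs, lo, mid)
      left.fork()
      new SumTask(xs, mid, hi).compute() + left.join()
    }
}

object ForkJoinDemo {
  def main(args: Array[String]): Unit = {
    val xs   = Array.tabulate(1000000)(_ + 1)  // 1 to 1,000,000
    val pool = new ForkJoinPool()              // one pool, all available cores
    println(pool.invoke(new SumTask(xs, 0, xs.length)))  // 500000500000
  }
}

Note that every subtask reads the same in-memory array, which is exactly the shared-memory, single-node restriction mentioned above; MapReduce instead ships each piece of its one big split to a different machine.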
6
The End