1
MapReduce
2
Google and MapReduce
Google searches billions of web pages very, very quickly. How?
It uses a technique called “MapReduce” to distribute the work across a large number of computers, then combines the results.
This has made MapReduce a very popular approach.
Hadoop is an open source implementation of MapReduce.
Unless you work for Google, you will probably use Hadoop.
3
How it works
List(a, b, c, …).map(x => f(x)) gives List(f(a), f(b), f(c), …)
List(a, b, c, …).reduce((x, y) => x ⊕ y) gives a ⊕ b ⊕ c ⊕ …, where ⊕ is some binary operator
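A concrete sketch in Scala (the List syntax above is Scala's), assuming f squares its argument and ⊕ is integer addition; both choices are illustrative, not from the slide:

object MapReduceDemo {
  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4)
    // map: apply f to every element independently
    val mapped = xs.map(x => x * x)              // List(1, 4, 9, 16)
    // reduce: combine the results with the binary operator ⊕ (here, +)
    val total  = mapped.reduce((x, y) => x + y)  // 30
    println(total)
  }
}

Because + is associative, the reduce step can group the partial results in any order, which is what lets MapReduce combine answers computed on different machines.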
4
Another view (in Japanese)
5
ForkJoin How does ForkJoin differ from MapReduce?
Answers from Stack Overflow:
ForkJoin recursively partitions a task into several subtasks, on a single machine. It takes advantage of multiple cores (see the sketch after this list).
MapReduce only does one big split, with no communication between the parts until the reduce step. It is massively scalable.
Java fork/join starts quickly and scales well for small inputs (<5 MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single-node architectures. MapReduce takes tens of seconds to start up, but scales well for much larger inputs (>100 MB) on a compute cluster.
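A minimal fork/join sketch in Scala over Java's java.util.concurrent classes, summing an array by recursive splitting; the summing task and the threshold of 1000 are illustrative assumptions, not from the slide:

import java.util.concurrent.{ForkJoinPool, RecursiveTask}

class SumTask(xs: Array[Int], lo: Int, hi: Int) extends RecursiveTask[Long] {
  override def compute(): Long =
    if (hi - lo <= 1000) {
      // small enough: sum this range sequentially
      var s = 0L; var i = lo
      while (i < hi) { s += xs(i); i += 1 }
      s
    } else {
      // recursively split: fork the left half, compute the right half here
      val mid  = (lo + hi) / 2
      val left = new SumTask(xs, lo, mid)
      left.fork()
      new SumTask(xs, mid, hi).compute() + left.join()
    }
}

object ForkJoinDemo {
  def main(args: Array[String]): Unit = {
    val xs   = Array.tabulate(1000000)(_ + 1)  // 1 to 1,000,000
    val pool = new ForkJoinPool()              // one pool, all available cores
    println(pool.invoke(new SumTask(xs, 0, xs.length)))  // 500000500000
  }
}

Note that every subtask reads the same in-memory array, which is exactly the shared-memory, single-node restriction mentioned above; MapReduce instead ships each piece of its one big split to a different machine.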
6
The End