MapReduce CS-4513 D-term
MapReduce
CS-4513 Distributed Computing Systems
(Slides include materials from Operating System Concepts, 7th ed., by Silberschatz, Galvin, & Gagne; Distributed Systems: Principles & Paradigms, 2nd ed., by Tanenbaum and Van Steen; and Modern Operating Systems, 2nd ed., by Tanenbaum)

Why MapReduce
An important new model of parallel and distributed computing
Particularly for problems dealing with “big data”
An abstraction to automate the mechanics of data handling and to let the programmer concentrate on the semantics of the problem

From Operating System course
Three fundamental models of parallel computing
– Data Parallelism
– Task Parallelism
– Pipelined Parallelism
Each requires a different set of tools
Each requires a different mode of thinking

MapReduce
A new model
Fundamentally different from previous models
Shares some elements with each one
Promise (hope?) of solving new classes of problems that were previously very tedious to solve
Not in textbooks
Not in previous Distributed Systems courses at WPI

Learning about MapReduce
Partition class into four teams
Each team responsible for understanding and teaching the rest of the class about one subtopic
minutes of class time per team
Two teams on April 4
Two teams on April 8

MapReduce subtopics
The abstraction itself and its algorithms
Distributed MapReduce
Class of problems that MapReduce can help solve
Google File System to support MapReduce

MapReduce abstraction
Explain the abstraction, what it does, etc.
Explain the algorithms
Show non-trivial programming examples
Focus on how to think about a problem
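
As a starting point for thinking about the abstraction, the following is a minimal single-process sketch of word counting in map/reduce style. It is not taken from the slides: the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and the driver simply groups intermediate values by key in memory, which a real MapReduce system would do across many machines.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: combine all values that were emitted for one key."""
    return word, sum(counts)


def run_mapreduce(inputs, mapper, reducer):
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce each key's list of values independently of the others.
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


# Illustrative input: (document id, document text) pairs.
docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog the end")]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
print(word_counts)
```

The programmer writes only `map_words` and `reduce_counts`; the grouping and iteration live in the driver, which is the part the abstraction is meant to automate.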

Distributed MapReduce
Show how it is naturally distributable and scalable
Up to terabytes of data and more
Show how mechanics of distribution and parallelization are automated
Focus on Performance, Reliability, Fault-tolerance, Failure recovery
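
One way to see why the model distributes naturally is sketched below, under stated assumptions: the input is split into independent chunks, each chunk is mapped by a separate worker, and every intermediate key is routed to a reduce task by a stable hash. A local thread pool stands in for a cluster here, and the function names, the chunk contents, and `NUM_REDUCERS` are all hypothetical. Fault tolerance and failure recovery are not modeled.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_REDUCERS = 3  # assumed number of reduce tasks


def partition(key, num_reducers=NUM_REDUCERS):
    """Route a key to a reduce task using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_chunk(chunk):
    """One map task: count words in its chunk, bucketed by target reducer."""
    buckets = [defaultdict(int) for _ in range(NUM_REDUCERS)]
    for line in chunk:
        for word in line.split():
            buckets[partition(word)][word] += 1
    return buckets


def reduce_bucket(partials):
    """One reduce task: merge the partial counts sent to it by every mapper."""
    total = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            total[word] += n
    return dict(total)


# Illustrative input: three chunks, each handled by its own map task.
chunks = [["a b a"], ["b c"], ["a c c"]]
with ThreadPoolExecutor() as pool:
    mapped = list(pool.map(map_chunk, chunks))
    # "Shuffle": reducer r receives bucket r from every map task.
    shuffled = [[m[r] for m in mapped] for r in range(NUM_REDUCERS)]
    reduced = list(pool.map(reduce_bucket, shuffled))

totals = {word: n for part in reduced for word, n in part.items()}
print(totals)
```

Because a given key always hashes to the same reduce task, no two reducers ever share a key, so the map tasks and the reduce tasks can each run in parallel without coordinating with one another.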

Classes of problems
Identify classes of problems on which to use MapReduce
Characterize them
Why were they difficult before?
Why are people so excited about MapReduce?
Why did Google rewrite 10,000 existing programs in MapReduce form?

Google File System
What is so special about it?
How different from traditional file systems?
How does it help MapReduce?
Focus on Performance, Reliability, Fault-tolerance, Failure recovery

Action items today
Form teams (one for each subtopic)
Roster to professor
Get organized to
– Do reading
– Prepare topic

References
See course web page