1 MapReduce
CS-4513 Distributed Computing Systems, D-term 2008
(Slides include materials from Operating System Concepts, 7th ed., by Silberschatz, Galvin, & Gagne; Distributed Systems: Principles & Paradigms, 2nd ed., by Tanenbaum and Van Steen; and Modern Operating Systems, 2nd ed., by Tanenbaum)

2 Why MapReduce
An important new model of parallel and distributed computing
Particularly for problems dealing with “big data”
An abstraction to automate the mechanics of data handling and to let the programmer concentrate on the semantics of the problem
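To make the idea of "automating the mechanics" concrete, here is a minimal single-process sketch in Python. It is not Google's implementation and not from the slides: the names `map_reduce`, `word_count_map`, and `word_count_reduce` are hypothetical, and the sample input is made up for illustration. The programmer supplies only the two problem-specific functions; the generic driver handles grouping values by key.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory driver: runs map, groups by key, runs reduce."""
    groups = defaultdict(list)
    # Map phase: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce phase: combine all values that share a key into one result.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Problem-specific part: the only code the programmer writes for word count.
def word_count_map(line):
    for word in line.lower().split():
        yield word, 1

def word_count_reduce(word, counts):
    return sum(counts)

# Illustrative input (assumed, not from the course material).
lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = map_reduce(lines, word_count_map, word_count_reduce)
print(counts["the"], counts["fox"])
```

In a real distributed system the grouping step would involve partitioning, shuffling data between machines, and recovering from failures; the point of the sketch is only that none of that appears in the two user-written functions.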

3 From Operating System course
Three fundamental models of parallel computing
– Data Parallelism
– Task Parallelism
– Pipelined Parallelism
Each requires a different set of tools
Each requires a different mode of thinking

4 MapReduce
A new model
Fundamentally different from previous models
Shares some elements with each one
Promise (hope?) of solving new classes of problems that were previously very tedious to solve
Not in textbooks
Not in previous Distributed Systems courses at WPI

5 Learning about MapReduce
Partition class into four teams
Each team responsible for understanding and teaching the rest of the class about one subtopic
30-40 minutes of class time per team
Two teams on April 4
Two teams on April 8

6 MapReduce subtopics
The abstraction itself and its algorithms
Distributed MapReduce
Class of problems that MapReduce can help solve
Google File System to support MapReduce

7 MapReduce abstraction
Explain the abstraction, what it does, etc.
Explain the algorithms
Show non-trivial programming examples
Focus on how to think about a problem

8 Distributed MapReduce
Show how it is naturally distributable and scalable
Up to terabytes of data and more
Show how mechanics of distribution and parallelization are automated
Focus on performance, reliability, fault-tolerance, failure recovery

9 Classes of problems
Identify classes of problems on which to use MapReduce
Characterize them
Why were they difficult before?
Why are people so excited about MapReduce?
Why did Google rewrite 10,000 existing programs in MapReduce form?

10 Google File System
What is so special about it?
How is it different from traditional file systems?
How does it help MapReduce?
Focus on performance, reliability, fault-tolerance, failure recovery

11 Action items today
Form teams (one for each subtopic)
Roster to professor
Get organized to:
– Do reading
– Prepare topic

12 References
See e-mails
See course web page

