1
Based on Lin and Dyer’s text: Chapter 3
2
Figure 2.6
4
A programmer has no control over:
◦ Where a mapper or reducer runs (i.e., on which node in the cluster).
◦ When a mapper or reducer begins or finishes.
◦ Which input key-value pairs are processed by a specific mapper.
◦ Which intermediate key-value pairs are processed by a specific reducer.
5
Ability to:
◦ Construct complex data types as keys and values for storage, processing, and communication.
◦ Specify and execute initialization code before a map and/or reduce task, and termination code after it.
◦ Preserve state across multiple keys in the mapper and/or the reducer.
◦ Control the sort order of intermediate keys.
◦ Control the partitioning of the key space, and thus the set of keys a particular reducer will process.
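The last two abilities (sort order and partitioning) can be illustrated with a small single-process simulation of the shuffle. This is only a sketch, not Hadoop code: the function names `hash_partition`, `first_letter_partition`, and `shuffle` are hypothetical, and a CRC32 hash stands in for whatever hash a real framework uses.

```python
import zlib


def hash_partition(key, num_reducers):
    """Default-style partitioner: spread keys over reducers by hash."""
    # crc32 is used here only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def first_letter_partition(key, num_reducers):
    """Hypothetical custom partitioner: keys that share a first letter
    always land on the same reducer."""
    return ord(key[0].lower()) % num_reducers


def shuffle(pairs, partition, num_reducers, sort_key=None):
    """Route intermediate (key, value) pairs to reducers, group values
    by key, and sort each reducer's keys with an optional custom order."""
    buckets = [{} for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].setdefault(key, []).append(value)
    order = sort_key if sort_key is not None else (lambda k: k)
    return [sorted(b.items(), key=lambda kv: order(kv[0])) for b in buckets]


pairs = [("apple", 1), ("banana", 1), ("avocado", 1), ("apple", 1)]
by_letter = shuffle(pairs, first_letter_partition, num_reducers=2)
```

Swapping the `partition` argument changes which reducer sees which keys, and `sort_key` changes the order in which each reducer sees them; the map and reduce logic itself stays untouched.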
6
Address the issues without creating a bottleneck for scalability:
◦ The gold standard that MR aims for is linear scalability.
◦ Storing and manipulating state has the potential to hinder scalability.
How to improve performance?
◦ Make the functions efficient?
◦ Make the transfer of intermediate data efficient.
◦ Aggregation of intermediate data is an important operation for efficiency.
◦ Shrink the intermediate key space.
◦ What else can we do?
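To make the point about aggregating intermediate data concrete, here is a minimal sketch of a combiner for word count, run in plain Python rather than on a cluster. The names `naive_map` and `combine` are illustrative assumptions; the idea is only that summing values per key on the mapper's side leaves fewer pairs to transfer.

```python
from collections import defaultdict


def naive_map(doc_id, text):
    """Baseline word-count mapper: one (term, 1) pair per token."""
    return [(term, 1) for term in text.split()]


def combine(pairs):
    """Combiner: sum the values of each key locally, before the shuffle."""
    totals = defaultdict(int)
    for term, count in pairs:
        totals[term] += count
    return sorted(totals.items())


raw = naive_map("d1", "a b a c a b")
combined = combine(raw)
print(len(raw), "pairs before combining,", len(combined), "after")
```

The combiner is safe here because addition is associative and commutative, so the reducer computes the same totals whether or not the combiner ran.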
7
http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapreduce/Mapper.html
http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapred/package-summary.html
http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api
8
class Mapper
  method Map(docid a, doc d)
    H ← new AssociativeArray
    for all term t ∈ doc d do
      H{t} ← H{t} + 1    // Tally counts for entire document
    for all term t ∈ H do
      Emit(term t, count H{t})
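The per-document tally above could be written in Python roughly as follows. This is a sketch under simple assumptions: a document is a whitespace-separated string, `map_document` is a hypothetical name, and `yield` stands in for Emit.

```python
from collections import Counter


def map_document(doc_id, text):
    """Tally term counts for one document, then emit one pair per
    distinct term instead of one pair per token."""
    counts = Counter(text.split())  # plays the role of the array H
    for term, count in counts.items():
        yield term, count


emitted = dict(map_document("d1", "to be or not to be"))
```

For this six-token document the mapper emits four pairs rather than six; the saving grows with the amount of repetition inside each document.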
9
class Mapper
  method Initialize
    H ← new AssociativeArray
  method Map(docid a, doc d)
    for all term t ∈ doc d do
      H{t} ← H{t} + 1    // Tally counts across documents
  method Close
    for all term t ∈ H do
      Emit(term t, count H{t})
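A Python rendering of this cross-document version is sketched below. The class name `InMapperCombiningMapper` and the driver `run_map_task` are hypothetical and only mimic the order in which a framework would call the hooks; in Hadoop's `org.apache.hadoop.mapreduce.Mapper`, the corresponding hooks are named `setup` and `cleanup`.

```python
from collections import defaultdict


class InMapperCombiningMapper:
    def initialize(self):
        # State that survives across calls to map() within one map task.
        self.counts = defaultdict(int)

    def map(self, doc_id, text):
        # Only update the tally; nothing is emitted per document.
        for term in text.split():
            self.counts[term] += 1

    def close(self):
        # Emit once, after every document in the task has been seen.
        for term, count in self.counts.items():
            yield term, count


def run_map_task(mapper, documents):
    """Hypothetical driver: initialize, map each document, then close."""
    mapper.initialize()
    for doc_id, text in documents:
        mapper.map(doc_id, text)
    return dict(mapper.close())


result = run_map_task(
    InMapperCombiningMapper(),
    [("d1", "a b a"), ("d2", "b c"), ("d3", "a")],
)
```

The trade-off is memory: the tally must hold every distinct term the task encounters, which is the kind of preserved state the earlier slide warns can hinder scalability.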