
Based on Lin and Dyer’s text: Chapter 3. Figure 2.6.


1 Based on Lin and Dyer’s text: Chapter 3

2 Figure 2.6

3

4
• A programmer has no control over:
  ◦ Where a mapper or reducer runs (i.e., on which node in the cluster).
  ◦ When a mapper or reducer begins or finishes.
  ◦ Which input key-value pairs are processed by a specific mapper.
  ◦ Which intermediate key-value pairs are processed by a specific reducer.
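
A minimal single-process sketch of that division of labor, assuming a hypothetical `run_mapreduce` driver (not part of Hadoop): user code supplies only `word_count_mapper` and `sum_reducer`, while the driver alone decides how intermediate pairs are grouped, in what order keys are reduced, and which reducer call sees which key. Scheduling across nodes and task timing, which a real cluster also controls, are not modeled here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]

def run_mapreduce(
    inputs: Iterable[Tuple[int, str]],
    mapper: Callable[[int, str], Iterator[Pair]],
    reducer: Callable[[str, List[int]], Iterator[Pair]],
) -> List[Pair]:
    """Hypothetical toy driver: it, not the user code, owns the shuffle and sort."""
    # Map phase: the driver decides which input pairs each mapper call sees.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for k, v in mapper(key, value):
            groups[k].append(v)  # shuffle: group intermediate values by key
    # Reduce phase: the driver decides key order and which call sees which key.
    output: List[Pair] = []
    for k in sorted(groups):
        output.extend(reducer(k, groups[k]))
    return output

def word_count_mapper(docid: int, text: str) -> Iterator[Pair]:
    for term in text.split():
        yield term, 1

def sum_reducer(term: str, counts: List[int]) -> Iterator[Pair]:
    yield term, sum(counts)

print(run_mapreduce([(1, "a b a"), (2, "b c")], word_count_mapper, sum_reducer))
# → [('a', 2), ('b', 2), ('c', 1)]
```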

5  Ability to:  Construct complex data types as keys and values for storage, processing and communications  Specify and execute initialization code before a map and/or reduce and the same for termination code after map and/or reduce.  To preserve state across multiple keys in map and/or in the reduce  To control sorting order of intermediate keys  To control partitioning of key space, and thus the set of keys a particular reduce will process

6
• Address these issues without creating a bottleneck for scalability:
  ◦ The gold standard MapReduce aims for is linear scalability.
  ◦ Storing and manipulating state can hinder scalability.
• How can we improve performance?
  ◦ Make the map and reduce functions efficient?
  ◦ Make the transfer of intermediate data efficient.
  ◦ Aggregate intermediate data: aggregation is an important operation for efficiency.
  ◦ Shrink the intermediate key space.
  ◦ What else can we do?
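
One standard way to aggregate intermediate data is a combiner that runs on each map task's output before the shuffle. This sketch simulates that in one process; the function names are hypothetical, and the number of shuffled pairs serves only as a rough proxy for transfer cost. Because the framework may run a combiner zero or more times, a summing combiner is safe here precisely because addition is associative and commutative.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]

def mapper(doc: str) -> List[Pair]:
    # Plain word-count mapper: one (term, 1) pair per token.
    return [(term, 1) for term in doc.split()]

def combiner(pairs: Iterable[Pair]) -> List[Pair]:
    # Sum partial counts per key on the map side, before the shuffle.
    totals: Counter = Counter()
    for term, count in pairs:
        totals[term] += count
    return list(totals.items())

def run(splits: List[List[str]], use_combiner: bool) -> Tuple[Dict[str, int], int]:
    shuffled: List[Pair] = []
    for split in splits:  # one simulated map task per input split
        task_output = [kv for doc in split for kv in mapper(doc)]
        if use_combiner:
            task_output = combiner(task_output)
        shuffled.extend(task_output)
    # Reduce phase: group by key and sum.
    result: Dict[str, int] = defaultdict(int)
    for term, count in shuffled:
        result[term] += count
    return dict(result), len(shuffled)

splits = [["to be or not to be"], ["be quick", "be brief"]]
plain, n_plain = run(splits, use_combiner=False)
combined, n_combined = run(splits, use_combiner=True)
print(n_plain, n_combined, plain == combined)
# → 10 7 True
```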

7
• http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapreduce/Mapper.html
• http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapred/package-summary.html
• http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api

8
class Mapper
    method Map(docid a, doc d)
        H ← new AssociativeArray
        for all term t ∈ doc d do
            H{t} ← H{t} + 1          // Tally counts for entire document
        for all term t ∈ H do
            Emit(term t, count H{t})
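
A direct Python rendering of the slide-8 mapper, assuming whitespace tokenization and modeling `Emit` as `yield`; `map_doc` is a hypothetical name. Because H is created inside each Map call, counts are combined within a single document only.

```python
from collections import Counter
from typing import Iterator, Tuple

def map_doc(docid: int, doc: str) -> Iterator[Tuple[str, int]]:
    # H is local to this call, so aggregation happens per document.
    h: Counter = Counter()
    for term in doc.split():
        h[term] += 1  # tally counts for the entire document
    for term, count in h.items():
        yield term, count  # Emit(term t, count H{t})

print(list(map_doc(1, "a rose is a rose")))
# → [('a', 2), ('rose', 2), ('is', 1)]
```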

9
class Mapper
    method Initialize
        H ← new AssociativeArray
    method Map(docid a, doc d)
        for all term t ∈ doc d do
            H{t} ← H{t} + 1          // Tally counts across documents
    method Close
        for all term t ∈ H do
            Emit(term t, count H{t})
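
A Python rendering of the slide-9 mapper, assuming whitespace tokenization; the class `InMapperCombiningMapper` and the `run_task` driver are hypothetical. In Hadoop's newer `org.apache.hadoop.mapreduce.Mapper` API, the analogous lifecycle hooks are `setup(Context)` and `cleanup(Context)`.

```python
from collections import Counter
from typing import Iterator, List, Tuple

class InMapperCombiningMapper:
    """Mirrors the slide's Initialize / Map / Close structure (hypothetical)."""

    def initialize(self) -> None:
        # State that survives across Map calls within one map task.
        self.h: Counter = Counter()

    def map(self, docid: int, doc: str) -> None:
        for term in doc.split():
            self.h[term] += 1  # tally counts across documents; nothing emitted yet

    def close(self) -> Iterator[Tuple[str, int]]:
        # Emit once, after the last input record of the task.
        for term, count in self.h.items():
            yield term, count

def run_task(records: List[Tuple[int, str]]) -> List[Tuple[str, int]]:
    # Drive the lifecycle as a framework would: initialize, map each record, close.
    mapper = InMapperCombiningMapper()
    mapper.initialize()
    for docid, doc in records:
        mapper.map(docid, doc)
    return list(mapper.close())

print(run_task([(1, "a b a"), (2, "b c")]))
# → [('a', 2), ('b', 2), ('c', 1)]
```

Because H lives for the whole map task, memory grows with the number of distinct terms seen; one common remedy is to emit and clear H whenever it grows past a chosen size.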

