
1 MapReduce. Presenter: 黃威凱, first-year master's student, Computer Science and Information Engineering

2 Outline ◦ Purpose ◦ Example ◦ Method ◦ Advanced

3 PURPOSE

4 Purpose ◦ Data mining ◦ Data processing

5 EXAMPLE

6 Example: Find the maximum temperature for each year, using data from the National Climatic Data Center (NCDC) ◦ The data is stored in a line-oriented ASCII format, in which each line is a record ◦ There is a directory for each year from 1901 to 2001, each containing a gzipped file for each weather station with its readings for that year

7 Example (Data format)

8 Example (Gzipped files, example for 1990) ◦ % ls raw/1990 | head ◦ 010010-99999-1990.gz ◦ 010014-99999-1990.gz ◦ 010015-99999-1990.gz ◦ 010016-99999-1990.gz ◦ 010017-99999-1990.gz ◦ 010030-99999-1990.gz ◦ 010040-99999-1990.gz ◦ 010080-99999-1990.gz ◦ 010100-99999-1990.gz ◦ 010150-99999-1990.gz

9 METHOD

10 Method ◦ Analyzing the data with Unix tools ◦ Analyzing the data with Hadoop

11 Method (Unix tools)

12 Method (Unix tools) Here is the beginning of a run: ◦ % ./max_temperature.sh ◦ 1901 317 ◦ 1902 244 ◦ 1903 289 ◦ 1904 256 ◦ 1905 283 ◦ ... The complete run for the century took 42 minutes in one run on a single EC2 High-CPU Extra Large Instance.

13 Method (Hadoop) Use MapReduce ◦ Map ▪ Shuffle ◦ Reduce

14 Method (Hadoop) Map function ◦ Pull out the year and the air temperature from each record ◦ Transform each record into a (year, temperature) key-value pair
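The map step on this slide can be sketched in plain Python, without Hadoop. This is only an illustration: the field offsets (year at columns 15-19, signed temperature at 87-92, quality code at 92), the 9999 missing-value sentinel, and the accepted quality codes are assumptions about the NCDC record layout, since the data-format slide did not survive in this transcript. `make_record` is a hypothetical helper that builds synthetic records at those offsets.

```python
import re

MISSING = 9999  # assumed sentinel for a missing temperature reading


def map_record(line):
    """Return a (year, temperature) pair, or None if the reading is unusable."""
    year = int(line[15:19])         # assumed offset of the year field
    temperature = int(line[87:92])  # assumed offset; signed, e.g. "+0022"
    quality = line[92:93]           # assumed offset of the quality code
    if temperature != MISSING and re.fullmatch(r"[01459]", quality):
        return (year, temperature)
    return None


def make_record(year, temperature, quality="1"):
    """Hypothetical helper: build a synthetic record with the assumed layout."""
    return "0" * 15 + str(year) + "0" * 68 + f"{temperature:+05d}" + quality
```

For example, `map_record(make_record(1950, 22))` yields the pair `(1950, 22)`, while a record carrying the missing-value sentinel or an unaccepted quality code is dropped.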

15 Method (Hadoop) Map function ◦ The shuffle ▪ Each reduce task is fed by many map tasks.
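What the shuffle does to the map output can be imitated in a few lines of Python. This is a single-process sketch of the grouping and sorting only, not of how Hadoop moves data between machines; the function name `shuffle` is chosen here for illustration.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group values by key and return (key, values) items sorted by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)  # values keep their arrival order
    return sorted(grouped.items())


map_output = [(1950, 0), (1950, 22), (1950, -11), (1949, 111), (1949, 78)]
reduce_input = shuffle(map_output)
```

Here `reduce_input` is `[(1949, [111, 78]), (1950, [0, 22, -11])]`: one entry per year, each carrying every reading emitted for that year.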

16 Method (Hadoop) Reduce function ◦ Iterate through the list and pick the maximum reading ◦ Input: ▪ (1949, [111, 78]) ▪ (1950, [0, 22, -11]) ◦ Output: ▪ (1949, 111) ▪ (1950, 22)
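The reduce step on this slide amounts to taking the maximum of each list. A minimal Python sketch using the slide's own input values (the name `reduce_max` is illustrative, not part of any Hadoop API):

```python
def reduce_max(year, readings):
    """Return (year, maximum reading) for one key and its list of values."""
    return (year, max(readings))


reduce_input = [(1949, [111, 78]), (1950, [0, 22, -11])]
reduce_output = [reduce_max(year, readings) for year, readings in reduce_input]
```

This gives `[(1949, 111), (1950, 22)]`, matching the output listed on the slide.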

17 Method (Hadoop) Data flow

18 Method (Hadoop) Java MapReduce: Mapper example

19 Method (Hadoop) Java MapReduce: Reducer example

20 Method (Hadoop) Java MapReduce: Job example

21 ADVANCED

22 Advanced: Case 1

23 Advanced: Case 2

24 Advanced: Case 3

25 Advanced: Combiner functions run on the map output ◦ Example ▪ Map 1 output: (1950, 0), (1950, 20), (1950, 10) ▪ Map 2 output: (1950, 25), (1950, 15) ▪ Grouped by key: Map 1: (1950, [0, 20, 10]), Map 2: (1950, [25, 15]) ▪ Reduce input without a combiner: (1950, [0, 20, 10, 25, 15]) ▪ Reduce input with a combiner: (1950, [20, 25])
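The effect of a combiner can be checked with a small Python sketch that uses the values from this slide. It assumes the combiner applies the same max operation as the reducer, once per map task; `combine_max` is an illustrative name, not a Hadoop API.

```python
def combine_max(pairs):
    """Collapse one map task's output to a single maximum per key."""
    best = {}
    for year, temperature in pairs:
        if year not in best or temperature > best[year]:
            best[year] = temperature
    return sorted(best.items())


map1 = [(1950, 0), (1950, 20), (1950, 10)]
map2 = [(1950, 25), (1950, 15)]

# With a combiner, each map task sends one pair per year instead of all pairs.
combined = combine_max(map1) + combine_max(map2)
with_combiner = max(t for _, t in combined)
without_combiner = max(t for _, t in map1 + map2)
```

Here `combined` is `[(1950, 20), (1950, 25)]`, and both `with_combiner` and `without_combiner` are 25. The result is unchanged because max is commutative and associative; an operation such as a mean would not survive this treatment without extra bookkeeping.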

