Published by Linda Newman. Modified over 9 years ago.
1
MapReduce
黃威凱, first-year master's student, Computer Science and Information Engineering
2
Outline
Purpose
Example
Method
Advanced
3
PURPOSE
4
Purpose
Data mining
Data processing
5
EXAMPLE
6
Example
Find the maximum temperature for each year
National Climatic Data Center (NCDC)
◦ The data is stored in a line-oriented ASCII format, in which each line is a record
◦ There is a directory for each year from 1901 to 2001, each containing a gzipped file for each weather station with its readings for that year
7
Example (Data format)
8
Example (Gzipped files, example for 1990)
◦ % ls raw/1990 | head
◦ 010010-99999-1990.gz
◦ 010014-99999-1990.gz
◦ 010015-99999-1990.gz
◦ 010016-99999-1990.gz
◦ 010017-99999-1990.gz
◦ 010030-99999-1990.gz
◦ 010040-99999-1990.gz
◦ 010080-99999-1990.gz
◦ 010100-99999-1990.gz
◦ 010150-99999-1990.gz
9
METHOD
10
Method
Analyzing the data with Unix tools
Analyzing the data with Hadoop
11
Method (Unix tools)
12
Method (Unix tools)
Here is the beginning of a run:
◦ % ./max_temperature.sh
◦ 1901 317
◦ 1902 244
◦ 1903 289
◦ 1904 256
◦ 1905 283
◦ ...
The complete run for the century took 42 minutes in one run on a single EC2 High-CPU Extra Large Instance.
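The sequential approach on this slide can be sketched as a single-process scan over the year directories. This is only an illustration, not the original max_temperature.sh: the function name and the simplified tab-separated "<year>, <temperature>" record layout are assumptions made for the example, and the real NCDC records are fixed-width.

```python
import gzip
from pathlib import Path


def max_temperature_by_year(root):
    """Scan root/<year>/*.gz one file at a time and return {year: max reading}.

    Assumption: each line is a simplified "<year>\t<temperature>" record,
    not the real fixed-width NCDC layout.
    """
    result = {}
    for year_dir in sorted(Path(root).iterdir()):
        if not year_dir.is_dir():
            continue
        best = None
        for gz_path in sorted(year_dir.glob("*.gz")):
            with gzip.open(gz_path, "rt", encoding="ascii") as lines:
                for line in lines:
                    line = line.strip()
                    if not line:
                        continue  # ignore blank lines
                    _, temperature = line.split("\t")
                    reading = int(temperature)
                    if best is None or reading > best:
                        best = reading
        if best is not None:
            result[year_dir.name] = best
    return result
```

Every file is read in turn by one process, which is why the running time grows with the total amount of data.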
13
Method (Hadoop)
Use MapReduce
◦ Map
◦ Shuffle
◦ Reduce
14
Method (Hadoop)
Map function
◦ Pull out the year and the air temperature from each record
◦ Emit them as (year, temperature) key-value pairs
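A minimal sketch of the map step in plain Python, not the Hadoop API. The function name `map_record` and the simplified "<year> <temperature>" record layout are assumptions made for illustration; the real NCDC records are fixed-width and would need position-based slicing.

```python
def map_record(offset, line):
    """Map one (offset, line) input pair to zero or one (year, temperature) pairs.

    Assumption: `line` is a simplified "<year> <temperature>" record.
    The offset key is ignored, since only the line content matters here.
    """
    fields = line.split()
    if len(fields) != 2:
        return []  # skip malformed records
    year, temperature = fields
    return [(int(year), int(temperature))]


lines = ["1950 0", "1950 22", "1950 -11", "1949 111", "1949 78"]
pairs = []
offset = 0
for line in lines:
    pairs.extend(map_record(offset, line))
    offset += len(line) + 1  # +1 for the newline that ended the line
print(pairs)
# [(1950, 0), (1950, 22), (1950, -11), (1949, 111), (1949, 78)]
```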
15
Method (Hadoop)
Map function
◦ The shuffle: the map output is sorted and grouped by key before it reaches the reduce tasks. Each reduce task is fed by many map tasks.
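The shuffle can be simulated in plain Python to show how one reduce task receives pairs from several map tasks. This is a sketch under assumptions: the `shuffle` function is hypothetical, and `key % num_reducers` stands in for a real partitioner and only works for integer keys.

```python
from collections import defaultdict


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair from all map tasks to one reducer, grouped by key.

    Returns one {key: [values]} dict per reducer, with keys in sorted order.
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for task_output in map_outputs:
        for key, value in task_output:
            # Assumption: integer keys, partitioned by a simple modulo.
            partitions[key % num_reducers][key].append(value)
    return [dict(sorted(p.items())) for p in partitions]


map_outputs = [
    [(1950, 0), (1949, 111)],  # output of map task 1
    [(1950, 22), (1949, 78)],  # output of map task 2
    [(1950, -11)],             # output of map task 3
]
print(shuffle(map_outputs, num_reducers=2))
# [{1950: [0, 22, -11]}, {1949: [111, 78]}]
```

All values for a given year end up at the same reducer, regardless of which map task emitted them.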
16
Method (Hadoop)
Reduce function
◦ Iterate through the list and pick the maximum reading
◦ Input: (1949, [111, 78]), (1950, [0, 22, -11])
◦ Output: (1949, 111), (1950, 22)
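The reduce step on this slide can be sketched in plain Python using the same input and output values. The function name `reduce_max` is hypothetical, and this is not the Hadoop Reducer API.

```python
def reduce_max(year, readings):
    """Iterate through the readings for one year and keep the largest."""
    best = None
    for reading in readings:
        if best is None or reading > best:
            best = reading
    return (year, best)


reduce_input = [(1949, [111, 78]), (1950, [0, 22, -11])]
output = [reduce_max(year, readings) for year, readings in reduce_input]
print(output)
# [(1949, 111), (1950, 22)]
```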
17
Method (Hadoop)
Data flow
18
Method (Hadoop)
Java MapReduce: Mapper example
19
Method (Hadoop)
Java MapReduce: Reducer example
20
Method (Hadoop)
Java MapReduce: Job example
21
ADVANCED
22
Advanced
Case 1
23
Advanced
Case 2
24
Advanced
Case 3
25
Advanced
Combiner functions run on the map output
◦ Example
Map 1 output: (1950, 0), (1950, 20), (1950, 10)
Map 2 output: (1950, 25), (1950, 15)
Grouped by key on each map:
Map 1: (1950, [0, 20, 10])
Map 2: (1950, [25, 15])
Reduce input without a combiner: (1950, [0, 20, 10, 25, 15])
Reduce input with a combiner: (1950, [20, 25])
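The combiner example can be reproduced with a short plain-Python sketch. The helper names `group_by_key` and `combine_max` are hypothetical, and this is not the Hadoop API. It works for the maximum because taking a maximum of partial maxima gives the same result as taking the maximum of all values; a function such as the mean would not allow this shortcut.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Group (key, value) pairs into {key: [values]}, preserving arrival order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def combine_max(pairs):
    """Combiner: collapse one map task's output to a single maximum per key."""
    return [(key, max(values)) for key, values in group_by_key(pairs).items()]


map1 = [(1950, 0), (1950, 20), (1950, 10)]
map2 = [(1950, 25), (1950, 15)]

without_combiner = group_by_key(map1 + map2)
with_combiner = group_by_key(combine_max(map1) + combine_max(map2))

print(without_combiner)  # {1950: [0, 20, 10, 25, 15]}
print(with_combiner)     # {1950: [20, 25]}
```

In both cases the reducer's answer for 1950 is 25; the combiner only reduces how much data is sent from the map tasks to the reduce task.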