1
Hadoop MapReduce Framework
Mr. Sriram
2
Objectives
MapReduce Concepts
MapReduce Job
MapReduce Data Flow
Analyze different use cases where MapReduce is used
Differentiate between Traditional way and MapReduce way
Learn about Hadoop 2.X MapReduce architecture and components
Understand execution flow of YARN MapReduce application
Implement basic MapReduce concepts
Run a MapReduce Program
Understand Input splits concepts in MapReduce
Understand MapReduce Job Submission Flow
Implement Combiner and Partitioner in MapReduce
3
MapReduce Concepts
Introduction to MapReduce
Functional Programming Concepts
Mapper
Reducer
Driver
4
Introduction to MapReduce
Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. A MapReduce job usually splits the input data set into independent chunks, which are processed by map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which then become the input to the reduce tasks. Typically both the input and the output of a job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks. The MapReduce framework and HDFS run on the same set of nodes, which allows the framework to schedule tasks on the nodes where the data is already present, resulting in very high aggregate bandwidth across the cluster. The framework consists of a single master (the JobTracker in Hadoop 1.x, the ResourceManager in Hadoop 2.x) and one slave per cluster node (the TaskTracker or the NodeManager, respectively).
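The flow described above (independent input chunks, parallel map tasks, a framework-side sort, then reduce tasks) can be sketched in plain Python. This is only an in-memory illustration, not Hadoop code: the function names `map_phase`, `shuffle_and_sort`, `reduce_phase` and `run_job` are hypothetical, and the "splits" are ordinary lists of lines rather than HDFS data.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle_and_sort(pairs):
    """Framework step: group values by key and return the groups sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return (key, sum(values))


def run_job(splits):
    """Run map over every split, then sort/group, then reduce each group.

    Each split is processed independently here, which is what lets a real
    cluster run the map tasks in parallel.
    """
    mapped = [pair
              for split in splits
              for line in split
              for pair in map_phase(line)]
    return [reduce_phase(key, values)
            for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    # Two hypothetical input splits, each a list of lines.
    print(run_job([["the cat sat"], ["the dog sat"]]))
```

In a real job the map and reduce functions would be written against the Hadoop API and the grouping step would be carried out by the framework across the network; the sketch only shows the order of the stages.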
5
Functional Programming Concepts
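As a small illustration of the functional ideas the framework is named after, the sketch below uses Python's built-in `map` and `functools.reduce`. The helper names `square`, `add` and `sum_of_squares` are made up for this example; the point is only that `map` applies a function to each element independently and `reduce` folds the results into one value without modifying the input.

```python
from functools import reduce


def square(x):
    """Pure function applied to each element independently (the 'map' idea)."""
    return x * x


def add(a, b):
    """Pure function that combines two partial results (the 'reduce' idea)."""
    return a + b


def sum_of_squares(numbers):
    """Map every number to its square, then fold the squares into one sum."""
    return reduce(add, map(square, numbers), 0)


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(sum_of_squares(data))  # the input list itself is left unchanged
```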
6
Mapper
7
Reducer
8
Driver
9
Where is MapReduce used?
10
Traditional Way
11
MapReduce Way
12
Why MapReduce?
13
Solving the problem with MapReduce
14
Hadoop 2.X MapReduce Architecture
15
Hadoop 2.X MapReduce Components
16
Anatomy of a MapReduce Program
17
MapReduce Paradigm
18
Physical Flow of MapReduce Program
19
Physical Flow of MapReduce Program
20
Life Cycle of MapReduce Job
Map function
Reduce function
Run this program as a MapReduce job
21
Input Splits
22
Relation between input splits and HDFS Blocks
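The relation between splits and blocks can be made concrete with a little arithmetic. The sketch below is a Python paraphrase of how Hadoop's file-based input formats are commonly described as sizing splits: the split size is the block size clamped between a minimum and a maximum, and the final split may be up to 10% larger than the split size so that a tiny trailing split is avoided. The function names are hypothetical and the 128 MB block size is an assumed example value.

```python
MB = 1024 * 1024


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Clamp the block size between the configured minimum and maximum."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_size, split_size, slop=1.1):
    """Return the length in bytes of each split for a single file.

    Full-size splits are cut while more than `slop` split sizes remain;
    whatever is left becomes the last split.
    """
    lengths = []
    remaining = file_size
    while remaining / split_size > slop:
        lengths.append(split_size)
        remaining -= split_size
    if remaining > 0:
        lengths.append(remaining)
    return lengths


if __name__ == "__main__":
    size = compute_split_size(128 * MB)
    # A 300 MB file with 128 MB blocks: two full splits plus a 44 MB remainder.
    print([n // MB for n in split_lengths(300 * MB, size)])
    # A 130 MB file fits within the slop, so it stays a single split.
    print([n // MB for n in split_lengths(130 * MB, size)])
```

With default settings the split size equals the block size, so each map task usually reads one block that is stored locally; a split is a logical range, while a block is the physical unit of storage.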
23
MapReduce Job Submission Flow
24
Overview of MapReduce
25
Combiners
26
Combiner
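A combiner performs a local reduce on one mapper's output before the data is sent to the reducers, which cuts the number of records that cross the network. The sketch below simulates that for word count in plain Python; `map_words` and `combine` are hypothetical names, and the input line is invented for illustration.

```python
from collections import Counter


def map_words(line):
    """Mapper output for one line: a (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the counts per key within a single mapper's output."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


if __name__ == "__main__":
    mapper_output = map_words("a b a a b")
    combined = combine(mapper_output)
    # Fewer records leave the mapper after combining.
    print(len(mapper_output), "records before,", len(combined), "after")
    print(combined)
```

Because Hadoop may run a combiner zero, one, or several times, the operation should be commutative and associative (sum and max qualify; a plain average does not).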
27
Partitioner - Redirecting output from Mapper
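A partitioner decides which reducer receives each mapper output key. Hadoop's default partitioner takes the key's hash modulo the number of reduce tasks; the sketch below imitates that idea in Python, using `zlib.crc32` as a stand-in deterministic hash (an assumption made for this example, not what Hadoop uses). The names `partition` and `route` are hypothetical.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer index for a key: non-negative hash modulo reducer count."""
    digest = zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF
    return digest % num_reducers


def route(pairs, num_reducers):
    """Distribute (key, value) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


if __name__ == "__main__":
    pairs = [("cat", 1), ("dog", 1), ("cat", 1), ("sat", 1)]
    for index, bucket in enumerate(route(pairs, 3)):
        print("reducer", index, "receives", bucket)
```

The property that matters is that every occurrence of a key lands on the same reducer, so that reducer sees all values for the key. A custom partitioner would replace `partition` with application-specific routing logic.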
28
Revisit – De-Identification Architecture
29
Demo 1 – Word Count Program
Demo of Word Count Data Program
30
Demo 2 – Word Size Word Count Program
Demo of Word Size Word Count Data Program
31
Demo 3 – Weather Data Program
Demo of Weather Data Program
32
Demo 4 – Patent Data Program
Demo of Patent Data Program
33
Demo 5 – Max Temp Data Program
Demo of Max Temp Data Program
34
Demo 6 – Average Salary Program
Demo of Average Salary Program
35
Demo 7 – DeIdentify Healthcare Program
Demo of DeIdentify Healthcare Program
36
Demo 8 – Music Track Program
Demo of Music Track Program
37
Demo 9 – Call Center Data Analytics Program
Demo of Call Center Data Analytics Program
38
MapReduce Job
Introduction
Job Submission
Job Initialization
Task Assignment
Task Execution
Progress and Status Updates
Job Completion
39
Introduction
40
Job Submission
41
Job Initialization
42
Task Assignment
43
Task Execution
44
Progress Measure
45
Progress and Status Updates
46
Progress and Status Updates (continued)
47
Job Completion
48
MapReduce Data Flow
Input Files
Input Format
Input Splits
Record Reader
Mapper
Partition and Shuffle
Sort
Reduce
Output Format
Record Writer
Output Files
49
MapReduce Data Flow Diagram
50
Input Files
51
Input Format
52
Input Splits
53
Record Reader
54
Mapper
55
Partition and Shuffle
56
Sort
57
Reduce
58
Output Format
59
Record Writer
60
Thank You!