Hadoop MapReduce Framework


1 Hadoop MapReduce Framework
Mr. Sriram

2 Objectives
MapReduce Concepts
MapReduce Job
MapReduce Data Flow
Analyze different use cases where MapReduce is used
Differentiate between Traditional way and MapReduce way
Learn about Hadoop 2.X MapReduce architecture and components
Understand execution flow of YARN MapReduce application
Implement basic MapReduce concepts
Run a MapReduce Program
Understand Input splits concepts in MapReduce
Understand MapReduce Job Submission Flow
Implement Combiner and Partitioner in MapReduce

3 MapReduce Concepts
Introduction to Map Reduce
Functional Programming Concepts
Mapper
Reducer
Driver

4 Introduction to Map Reduce
Hadoop Map/Reduce is a software framework for easily writing applications that process vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. A Map/Reduce job usually splits the input data set into independent chunks, which are processed by map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which then become the input to the reduce tasks. Typically both the input and the output of a job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks. The Map/Reduce framework and HDFS typically run on the same set of nodes, which allows the framework to schedule tasks on the nodes where the data is already present, resulting in very high aggregate bandwidth across the cluster. The framework consists of a single master (the JobTracker in Hadoop 1.x, the ResourceManager in Hadoop 2.x) and one slave per cluster node (the TaskTracker in Hadoop 1.x, the NodeManager in Hadoop 2.x).
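
The flow described above (independent chunks processed by map tasks, map output sorted by key, then reduce tasks) can be sketched in plain Python. This is an in-memory illustration only and does not use Hadoop's API; the word-count task and the names `map_task`, `reduce_task`, and `run_job` are assumptions chosen for the example.

```python
from itertools import groupby
from operator import itemgetter


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    for line in chunk:
        for word in line.lower().split():
            yield (word, 1)


def reduce_task(word, counts):
    """Reduce: sum all counts that were emitted for one word."""
    return (word, sum(counts))


def run_job(chunks):
    # Each chunk is independent, so a real cluster could map them in parallel.
    map_output = [pair for chunk in chunks for pair in map_task(chunk)]
    # The framework sorts map output by key before handing it to the reducers.
    map_output.sort(key=itemgetter(0))
    # Group the sorted pairs by key and reduce each group.
    return [
        reduce_task(word, (count for _, count in group))
        for word, group in groupby(map_output, key=itemgetter(0))
    ]


if __name__ == "__main__":
    # Hypothetical input: two chunks of text lines.
    chunks = [["the quick brown fox"], ["the lazy dog", "the end"]]
    for word, total in run_job(chunks):
        print(word, total)
```

In a real job the chunks would be read from a file system such as HDFS and the map and reduce tasks would run on different nodes; here everything runs in one process to keep the three phases visible.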

5 Functional Programming Concepts

6 Mapper

7 Reducer

8 Driver

9 Where MapReduce is used?

10 Traditional Way

11 MapReduce Way

12 Why MapReduce?

13 Solving the problem with MapReduce

14 Hadoop 2.X MapReduce Architecture

15 Hadoop 2.X MapReduce Components

16 Anatomy of a MapReduce Program

17 MapReduce Paradigm

18 Physical Flow of MapReduce Program

19 Physical Flow of MapReduce Program

20 Life Cycle of MapReduce Job
Map function
Reduce function
Run this program as a MapReduce job

21 Input Splits

22 Relation between input splits and HDFS Blocks

23 MapReduce Job Submission Flow

24 Overview of MapReduce

25 Combiners

26 Combiner

27 Partitioner - Redirecting output from Mapper
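
As a rough illustration of the two ideas named on the preceding slides, the sketch below shows a combiner that pre-aggregates one mapper's output locally and a partitioner that decides which reducer receives each key. It is plain Python, not Hadoop code. The function names and the use of CRC32 as a stable hash are assumptions made for the example; Hadoop's default `HashPartitioner` works on the same principle but uses the key's `hashCode()`.

```python
import zlib
from collections import Counter


def combine(map_output):
    """Combiner: pre-aggregate one mapper's (key, count) pairs locally."""
    totals = Counter()
    for key, count in map_output:
        totals[key] += count
    return sorted(totals.items())


def partition(key, num_reducers):
    """Partitioner: pick a reducer index from a stable hash of the key."""
    # Assumption: CRC32 stands in for the key's hash code.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(map_output, num_reducers):
    """Send the combined pairs to per-reducer buckets."""
    buckets = [[] for _ in range(num_reducers)]
    for key, count in combine(map_output):
        buckets[partition(key, num_reducers)].append((key, count))
    return buckets
```

Because the combiner runs before the pairs leave the mapper, fewer records cross the network, and because the partitioner is deterministic, every occurrence of a key reaches the same reducer.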

28 Revisit – De Identification Architecture

29 Demo 1– Word Count Program
Demo of Word Count Data Program

30 Demo 2– Word Size Word Count Program
Demo of Word Size Word Count Data Program

31 Demo 3– Weather Data Program
Demo of Weather Data Program

32 Demo 4– Patent Data Program
Demo of Patent Data Program

33 Demo 5– Max Temp Data Program
Demo of Max Temp Data Program

34 Demo 6– Average Salary Program
Demo of Average Salary Program

35 Demo 7– DeIdentify Healthcare Program
Demo of DeIdentify Healthcare Program

36 Demo 8– Music Track Program
Demo of Music Track Program

37 Demo 9– Call Center Data Analytics Program
Demo of Call Center Data Analytics Program

38 MapReduce Job
Introduction
Job Submission
Job Initialization
Task Assignment
Task Execution
Progress and Status Updates
Job Completion

39 Introduction

40 Job Submission

41 Job initialization

42 Job Assignment

43 Job Execution

44 Progress Measure

45 Progress and Status Updates

46 Progress and Status Updates..

47 Job Completion

48 MapReduce Data Flow
Input Files
Input Format
Input Splits
Record Reader
Mapper
Partition and Shuffle
Sort
Reduce
Output Format
Record Writer
Output Files
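
The stages listed above can be lined up as one small pipeline. The sketch below is a single-process Python illustration with one function per stage; it is not Hadoop's API. The function names, the line-based splits, the record keys, and the first-character partitioning rule are all assumptions made to keep the example short and deterministic.

```python
from collections import defaultdict


def input_splits(lines, split_size):
    """Input Format: cut the input into splits of at most split_size lines."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def record_reader(split):
    """Record Reader: turn a split into (key, value) records."""
    # Assumption: the key is the record's position within the split.
    return list(enumerate(split))


def mapper(_key, value):
    """Mapper: emit (word, 1) for each word in the record."""
    return [(word, 1) for word in value.split()]


def partition_and_shuffle(map_output, num_reducers):
    """Partition and Shuffle: route every pair to one reducer's input."""
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_output:
        # Assumption: partition on the first character of the key.
        reducer_inputs[ord(key[0]) % num_reducers][key].append(value)
    return reducer_inputs


def reducer(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def record_writer(results):
    """Output Format / Record Writer: one tab-separated line per result."""
    return [f"{key}\t{value}" for key, value in results]


def run(lines, split_size=2, num_reducers=2):
    map_output = []
    for split in input_splits(lines, split_size):
        for key, value in record_reader(split):
            map_output.extend(mapper(key, value))
    output_files = []
    for reducer_input in partition_and_shuffle(map_output, num_reducers):
        # Sort: each reducer sees its keys in sorted order.
        results = [reducer(k, reducer_input[k]) for k in sorted(reducer_input)]
        output_files.append(record_writer(results))
    return output_files
```

Each inner list returned by `run` stands for one reducer's output file, which mirrors how a job with several reducers produces several output files.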

49 MapReduce Data Flow Diagram

50 Input Files

51 Input Format

52 Input Splits

53 Record Reader

54 Mapper

55 Partition and Shuffle

56 Sort

57 Reduce

58 Output Format

59 Record Writer

60 Thank You!

