Hadoop MapReduce Framework

Mr. Sriram

Objectives
Topics covered: MapReduce Concepts, MapReduce Job, MapReduce Data Flow
Analyze different use cases where MapReduce is used
Differentiate between the traditional way and the MapReduce way
Learn about the Hadoop 2.X MapReduce architecture and components
Understand the execution flow of a YARN MapReduce application
Implement basic MapReduce concepts
Run a MapReduce program
Understand the concept of input splits in MapReduce
Understand the MapReduce job submission flow
Implement a Combiner and a Partitioner in MapReduce

MapReduce Concepts
Introduction to MapReduce
Functional Programming Concepts
Mapper
Reducer
Driver

Introduction to MapReduce
Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. A MapReduce job usually splits the input data set into independent chunks, which are processed by map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which then become the input to the reduce tasks. Typically, both the input and the output of a job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks. The MapReduce framework and HDFS typically run on the same set of nodes, which lets the framework schedule tasks on the nodes where the data is already present, resulting in very high aggregate bandwidth across the cluster. The framework consists of a single master (the JobTracker in Hadoop 1.x, the ResourceManager in Hadoop 2.x) and one worker per cluster node (the TaskTracker in Hadoop 1.x, the NodeManager in Hadoop 2.x).
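
The split, map, sort, and reduce steps described above can be illustrated with a small in-memory sketch. This is not Hadoop code and uses no Hadoop API; it is a plain Python simulation of a word count, and the function names (map_words, shuffle_and_sort, reduce_counts, run_job) and the sample input are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Group values by key and order the keys, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_counts(key, values):
    """Reduce phase: collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(splits):
    """Run map over every split, then shuffle/sort, then reduce each key group."""
    mapped = []
    for split in splits:  # on a cluster, each split would go to its own map task
        for line in split:
            mapped.extend(map_words(line))
    return [reduce_counts(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    # Two hypothetical input splits, standing in for chunks of a file in HDFS.
    splits = [["the quick brown fox", "the lazy dog"], ["the fox"]]
    print(run_job(splits))
```

In a real cluster the map calls run in parallel on different nodes and the grouping step moves data over the network; here everything runs sequentially in one process, so the sketch shows only the data flow, not the scheduling or fault tolerance.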

Functional Programming Concepts

Mapper

Reducer

Driver

Where MapReduce is used?

Traditional Way

MapReduce Way

Why MapReduce?

Solving the problem with MapReduce

Hadoop 2.X MapReduce Architecture

Hadoop 2.X MapReduce Components

Anatomy of a MapReduce Program

MapReduce Paradigm

Physical Flow of MapReduce Program

Life Cycle of MapReduce Job
Map function
Reduce function
Run this program as a MapReduce job

Input Splits

Relation between input splits and HDFS Blocks

MapReduce Job Submission Flow

Overview of MapReduce

Combiners

Combiner

Partitioner - Redirecting output from Mapper

Revisit – De-Identification Architecture

Demo 1 – Word Count Program
Demo of Word Count Data Program

Demo 2 – Word Size Word Count Program
Demo of Word Size Word Count Data Program

Demo 3 – Weather Data Program
Demo of Weather Data Program

Demo 4 – Patent Data Program
Demo of Patent Data Program

Demo 5 – Max Temp Data Program
Demo of Max Temp Data Program

Demo 6 – Average Salary Program
Demo of Average Salary Program

Demo 7 – DeIdentify Healthcare Program
Demo of DeIdentify Healthcare Program

Demo 8 – Music Track Program
Demo of Music Track Program

Demo 9 – Call Center Data Analytics Program
Demo of Call Center Data Analytics Program

MapReduce Job
Introduction
Job Submission
Job Initialization
Task Assignment
Task Execution
Progress and Status Updates
Job Completion

Introduction

Job Submission

Job Initialization

Task Assignment

Task Execution

Progress Measure

Progress and Status Updates

Progress and Status Updates (continued)

Job Completion

MapReduce Data Flow
Input Files
Input Format
Input Splits
Record Reader
Mapper
Partition and Shuffle
Sort
Reduce
Output Format
Record Writer
Output Files

MapReduce Data Flow Diagram

Input Files

Input Format

Input Splits

Record Reader

Mapper

Partition and Shuffle

Reduce

Output Format

Record Writer

Thank You!
