Published by Sabrina Bell. Modified over 8 years ago.
1
MSBIC Hadoop Series
Implementing MapReduce Jobs
Bryan Smith
email: bryan.smith@microsoft.com
twitter: @smithbryanc
2
MSBIC Hadoop Series
http://msbic.sqlpass.org/
Learn the basics of Hadoop through a combination of demonstration and lecture. Session participants are invited to follow along using emulation environments and Azure-based clusters; setting these up is covered in the first session.
March – Getting Started
April – Understanding the File System
May – Implementing MapReduce Jobs
June – Querying the Data with Hive
July – Processing the Data with Pig
August – On Vacation
September – Hadoop & MS BI
October – To Be Announced
November – Loading Social Media Data
December – DW Integration
3
Today’s Session
Objectives:
1. Understand Basics of MapReduce
2. Implement a MapReduce Job
3. Introduce Tez
4
Sample File: How Many Evens & Odds?
Sample file: 1 2 3 4 5 6 7 8 9
Step 1, map( ): label each value with a key: 1 odd, 2 even, 3 odd, 4 even, 5 odd, 6 even, 7 odd, 8 even, 9 odd
Step 2, key and value[ ]: odd {1,3,5,7,9}; even {2,4,6,8}
Step 3, reduce( ): odd 5; even 4
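The three steps on this slide can be sketched in a few lines of plain Python. This is only an in-memory simulation of the map, group-by-key, and reduce flow, not the .NET MapReduce API used in the demo; the function names (`map_record`, `shuffle`, `reduce_group`, `run_job`) are made up for illustration.

```python
from collections import defaultdict


def map_record(value):
    """Step 1 (map): emit one (key, value) pair per input record."""
    key = "even" if value % 2 == 0 else "odd"
    return key, value


def shuffle(pairs):
    """Step 2: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_group(key, values):
    """Step 3 (reduce): collapse each key's list of values to a count."""
    return key, len(values)


def run_job(records):
    """Run the three steps in sequence on an in-memory list of records."""
    pairs = [map_record(r) for r in records]
    groups = shuffle(pairs)
    return dict(reduce_group(k, v) for k, v in groups.items())


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
counts = run_job(sample)
print(counts)  # {'odd': 5, 'even': 4}
```

On a real cluster the map and reduce functions run as separate tasks on different nodes and the grouping happens over the network; here everything runs in one process to keep the data flow visible.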
5
Sample Files
[Diagram labels: Name Node; Data Node; X, Y, Z; Job; Map Task; Reduce Task; P0; P1]
6
Implementing MapReduce using .NET
Add the following packages:
- Microsoft .NET Map Reduce API for Hadoop
- Microsoft .NET API for Hadoop WebClient
- Windows Azure Storage (if running against Azure HDInsight)
Add the following directives:
    using Microsoft.Hadoop;
    using Microsoft.Hadoop.MapReduce;
    using Microsoft.Hadoop.WebClient.WebHCatClient;
If running against Azure HDInsight, change the project’s Target Platform to x64.
7
MapReduce Demo
8
Goodbye, MapReduce
- Distributable for Scale
- Resistant to Failure
- “Easy” to Program
- Disk Liberal/Memory Conservative
- Rigid Step Sequencing
9
MapReduce as a Graph
[Diagram: Map, Reduce, Map, Reduce; labels: Vertex, Edge]
10
Tez: An Alternative Model
Directed Acyclic Graph (DAG)
[Diagram labels: Vertex, Edge]
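The difference between a fixed Map/Reduce chain and a general DAG can be illustrated with a small scheduling sketch. It uses Python's standard `graphlib` module (Python 3.9+) and is not the Tez API; the vertex names (`scan_a`, `scan_b`, `join`, `aggregate`) and the `stages` helper are hypothetical, chosen only to show which vertices could run at the same time.

```python
from graphlib import TopologicalSorter

# Each graph maps a vertex to the set of upstream vertices it depends on.
# Two chained MapReduce jobs form a strictly linear graph.
mapreduce_chain = {
    "reduce_1": {"map_1"},
    "map_2": {"reduce_1"},
    "reduce_2": {"map_2"},
}

# A general DAG (hypothetical vertices) lets independent vertices run side by side.
dag = {
    "join": {"scan_a", "scan_b"},
    "aggregate": {"join"},
}


def stages(graph):
    """Group vertices into waves; every vertex in a wave has all its inputs ready."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(stages(mapreduce_chain))  # [['map_1'], ['reduce_1'], ['map_2'], ['reduce_2']]
print(stages(dag))              # [['scan_a', 'scan_b'], ['join'], ['aggregate']]
```

The chain yields four waves of one vertex each, while the DAG finishes in three waves because the two scans have no edge between them.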
11
MapReduce vs. Tez
MapReduce:
- Focused on Disk
- Rigid, Linear Step Sequencing
- Supports Hadoop Streaming
Tez:
- Focused on Memory
- Flexible, Parallel Step Sequencing
- ???
12
Guidance on MapReduce & Tez
- Most of your work will be at higher levels, i.e. Pig & Hive
- Movement from MapReduce will benefit performance & be transparent to you
- Apache Tez in HDP 2.1
- HDInsight lags a few months
- Microsoft a key contributor
13
Today’s Session
Objectives:
1. Understand Basics of MapReduce
2. Implement a MapReduce Job
3. Introduce Tez
14
For Next Session
Topic: Querying Data with Hive
- Implement a Hive table and query it using HQL
Requested Action(s):
- Come with working HDInsight Emulator
- Load sample data sets into HDFS on Emulator