Published by Baldwin Mosley. Modified over 9 years ago.
1
HDInsight on Azure and Map-Reduce
Richard Conway, Windows Azure MVP, Elastacloud Limited
3
Introduction
5
Big Data vs Big Compute
6
Compute Bound IO Bound
7
All distributed compute works by taking a large JOB and breaking it into many smaller TASKS, which are then run in parallel
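The job-to-tasks idea on this slide can be sketched on a single machine. The following is a minimal illustration only, not how Hadoop or an HPC scheduler is implemented: the class name `JobSplitter`, the fixed task size, and the choice of summing numbers as the "job" are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: the "job" is summing an array of integers.
public static class JobSplitter
{
    // Break one large job into tasks of at most taskSize items each.
    public static List<int[]> SplitIntoTasks(int[] job, int taskSize)
    {
        var tasks = new List<int[]>();
        for (int offset = 0; offset < job.Length; offset += taskSize)
        {
            int length = Math.Min(taskSize, job.Length - offset);
            var slice = new int[length];
            Array.Copy(job, offset, slice, 0, length);
            tasks.Add(slice);
        }
        return tasks;
    }

    // Run every task in parallel, then combine the partial results.
    public static long Run(int[] job, int taskSize)
    {
        var tasks = SplitIntoTasks(job, taskSize);
        var partials = new long[tasks.Count];

        // Each task writes only to its own slot, so no locking is needed.
        Parallel.For(0, tasks.Count, i =>
        {
            partials[i] = tasks[i].Sum(x => (long)x);
        });

        return partials.Sum();
    }
}
```

Calling `JobSplitter.Run(Enumerable.Range(1, 1000).ToArray(), 100)` splits the job into 10 tasks and returns 500500. In a real cluster the tasks would be scheduled across machines rather than threads, but the split, run, and combine shape is the same.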
8
Hadoop HPC
9
Understanding Big Data
10
$100 gets you 3 million times more storage than it did 30 years ago
1980: 10 MIPS/$
2005: 10M MIPS/$
>5.5 billion (70+% of global population)
>2 billion users
Web traffic: 2010: 130 exabytes (10^18 bytes); 2015: 1.6 zettabytes (10^21 bytes)
>10 billion
11
Axes: Volume against Velocity - Variety - Variability
Volume scale: Gigabytes (10^9), Terabytes (10^12), Petabytes (10^15), Exabytes (10^18)
ERP / CRM: Sales Pipeline, Payables, Payroll, Inventory, Contacts, Deal Tracking
WEB 2.0: Mobile, Advertising, Collaboration, eCommerce, Digital Marketing, Search Marketing, Web Logs, Recommendations
Internet of things: Audio / Video, Log Files, Text/Image, Social Sentiment, Data Market Feeds, eGov Feeds, Weather, Wikis / Blogs, Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates
Storage/GB: 1980: $190,000; 1990: $9,000; 2000: $15; 2010: $0.07
12
Big Data, BIG OPPORTUNITY
49% of CEOs and CIOs are planning big data projects
Software Growth
Services Growth
1. McKinsey & Company, McKinsey Global Survey Results, Minding Your Digital Business, 2012
2. IDC Market Analysis, Worldwide Big Data Technology and Services 2012–2015 Forecast, 2012
13
Invisible devices vs. laptops / tablets / smartphones
Trillions of networked nodes vs. billions of networked devices
Low-bandwidth last-mile connection vs. high-bandwidth access
Mostly addressed by local schemes vs. global addressing
Machine-centric vs. user-centric
Sensing-focus vs. communication-focus
14
Big Data Scenarios
16
Hadoop Distributed Architecture
17
Server Files Server
18
RUNTIME Code
19
TRADITIONAL RDBMS vs. HADOOP
Compared by: Data Size, Access, Updates, Structure, Integrity, Scaling, DBA Ratio
20
Windows Azure HDInsight Service
21
Demo
22
HDINSIGHT / HADOOP Eco-System
Distributed Storage (HDFS)
Distributed Processing (MapReduce)
Query (Hive)
Legend: Red = Core Hadoop; Blue = Data processing; Purple = Microsoft integration points and value adds; Orange = Data Movement; Green = Packages
23
Storing Data with HDInsight
24
Azure Blob Storage: Front end, Stream Layer, Partition Layer
DFS (1 Data Node per Worker Role) and Compute Cluster: Name Node, Data Node
Front end: HDFS API
Azure Storage (ASV) …
26
Map Reduce Examples in C#
32
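using Microsoft.Hadoop.MapReduce;

public class FrenchSessionsJob : HadoopJob<FrenchSessionsMapper, SessionsReducer>
{
    // Tells the framework where to read input and where to write results.
    public override HadoopJobConfiguration Configure(ExecutorContext context)
    {
        var config = new HadoopJobConfiguration()
        {
            InputPath = "\"/AllSessions/*.gz\"",
            OutputFolder = "/FrenchSessions/"
        };
        return config;
    }
}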
33
public class FrenchSessionsMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        // Emit one ("FR", "1") pair for every session line from France.
        if (inputLine.Contains("Country=France"))
        {
            context.IncrementCounter("FrenchSession");
            context.EmitKeyValue("FR", "1");
        }
    }
}
34
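using System.Collections.Generic;
using System.Linq;

public class SessionsReducer : ReducerCombinerBase
{
    public override void Reduce(string key, IEnumerable<string> values, ReducerContext context)
    {
        // All values for one key arrive together; emit how many there were.
        context.EmitKeyValue(key, values.Count().ToString());
    }
}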
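To see what the mapper and reducer on these slides compute without a cluster, the same logic can be run locally with LINQ. This is a hedged sketch, not the SDK's execution model: `LocalFrenchSessions` is a hypothetical class, the `GroupBy` call stands in for the shuffle step the framework performs between map and reduce, and the sample line format (`Session=1;Country=France`) is an assumption, since the slides do not show the input data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical local stand-in for the map, shuffle, and reduce phases.
public static class LocalFrenchSessions
{
    // Map step: emit ("FR", "1") for every line that mentions France.
    public static IEnumerable<KeyValuePair<string, string>> Map(string inputLine)
    {
        if (inputLine.Contains("Country=France"))
        {
            yield return new KeyValuePair<string, string>("FR", "1");
        }
    }

    // Reduce step: count the values collected for one key.
    public static KeyValuePair<string, string> Reduce(string key, IEnumerable<string> values)
    {
        return new KeyValuePair<string, string>(key, values.Count().ToString());
    }

    // Map every line, group the emitted pairs by key, then reduce each group.
    public static Dictionary<string, string> Run(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => Map(line))
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .Select(group => Reduce(group.Key, group))
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

With the three assumed input lines `"Session=1;Country=France"`, `"Session=2;Country=Spain"`, and `"Session=3;Country=France"`, `LocalFrenchSessions.Run` returns a dictionary whose `"FR"` entry is `"2"`. Lines that do not match emit nothing, so they never reach the reduce step.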
35
Demo
37
https://elastastorage.blob.core.windows.net/hdinsight/Map-Reduce HDInsight Lab.pdf
38
Questions?