HDInsight on Azure and Map-Reduce
Richard Conway, Windows Azure MVP, Elastacloud Limited

Introduction

Big Data vs Big Compute

Compute Bound IO Bound

All distributed compute works by taking a large JOB and breaking it into many smaller TASKS, which are then run in parallel.
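The job-to-tasks idea above can be sketched in a few lines of C#. This is a minimal illustration, not Hadoop or HPC code: the job (summing the squares of a range of numbers), the chunk size, and the names `JobSplitter`, `SplitIntoTasks` and `RunJob` are all hypothetical, and the tasks run on local threads through PLINQ rather than on a cluster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JobSplitter
{
    // Break one large job (a sequence of numbers) into smaller tasks
    // of at most chunkSize items each.
    public static List<int[]> SplitIntoTasks(IEnumerable<int> job, int chunkSize)
    {
        return job
            .Select((value, index) => new { value, index })
            .GroupBy(x => x.index / chunkSize)
            .Select(g => g.Select(x => x.value).ToArray())
            .ToList();
    }

    // Run every task in parallel, then combine the partial results.
    public static long RunJob(IEnumerable<int> job, int chunkSize)
    {
        return SplitIntoTasks(job, chunkSize)
            .AsParallel()
            .Select(task => task.Sum(n => (long)n * n))
            .Sum();
    }
}
```

For example, `JobSplitter.RunJob(Enumerable.Range(1, 10), 4)` splits the range into three tasks and returns 385, the same answer a single-threaded loop would give.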

Hadoop HPC

Understanding Big Data

$100 gets you 3 million times more storage (in 30 years)
MIPS/$
>5.5 billion (70+% of global population)
>2 billion users
Web traffic: Exabyte (10^18), Zettabyte (10^21)
>10 billion

Internet of things: Audio / Video, Log Files, Text/Image, Social Sentiment, Data Market Feeds, eGov Feeds, Weather, Wikis / Blogs, Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates
WEB 2.0: Mobile, Advertising, Collaboration, eCommerce, Digital Marketing, Search Marketing, Web Logs, Recommendations
ERP / CRM: Sales Pipeline, Payables, Payroll, Inventory, Contacts, Deal Tracking
Volume: Gigabytes (10^9), Terabytes (10^12), Petabytes (10^15), Exabytes (10^18)
Axes: Volume; Velocity - Variety - Variability; Storage/GB ($)

Big Data, BIG OPPORTUNITY
49% of CEOs and CIOs are planning big data projects
Software Growth / Services Growth
Sources: McKinsey & Company, McKinsey Global Survey Results, Minding Your Digital Business; IDC Market Analysis, Worldwide Big Data Technology and Services 2012–2015 Forecast, 2012

Invisible devices: Trillions of networked nodes; Low-bandwidth last-mile connection; Mostly addressed by local schemes; Machine-centric; Sensing-focus
Laptops / tablets / smartphones: Billions of networked devices; High-bandwidth access; Global addressing; User-centric; Communication-focus

Big Data Scenarios

Hadoop Distributed Architecture

Server Files Server

RUNTIME Code

TRADITIONAL RDBMS | HADOOP
Compared on: Data Size, Access, Updates, Structure, Integrity, Scaling, DBA Ratio

Windows Azure HDInsight Service

Demo

HDINSIGHT / HADOOP Eco-System
Distributed Storage (HDFS)
Query (Hive)
Distributed Processing (MapReduce)
Legend:
Red = Core Hadoop
Blue = Data processing
Purple = Microsoft integration points and value adds
Orange = Data Movement
Green = Packages

Storing Data with HDInsight

HDFS API
DFS (1 Data Node per Worker Role) and Compute Cluster: Name Node, Data Node …
Azure Storage (ASV): Azure Blob Storage (Front end, Partition Layer, Stream Layer)

Map Reduce Examples in C#

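using Microsoft.Hadoop.MapReduce;

// Job definition: tells Hadoop where to read input and where to write output.
// The generic arguments are an assumption: they name the mapper and reducer
// classes shown on the following slides.
public class FrenchSessionsJob : HadoopJob<FrenchSessionsMapper, SessionsReducer>
{
    public override HadoopJobConfiguration Configure(ExecutorContext context)
    {
        var config = new HadoopJobConfiguration()
        {
            // All gzip-compressed session logs
            InputPath = "\"/AllSessions/*.gz\"",
            OutputFolder = "/FrenchSessions/"
        };
        return config;
    }
}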

public class FrenchSessionsMapper : MapperBase
{
    // Called once per input line; emits ("FR", "1") for each French session.
    public override void Map(string inputLine, MapperContext context)
    {
        if (inputLine.Contains("Country=France"))
        {
            context.IncrementCounter("FrenchSession");
            context.EmitKeyValue("FR", "1");
        }
    }
}

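using System.Collections.Generic;
using System.Linq;

public class SessionsReducer : ReducerCombinerBase
{
    // Called once per key with every value emitted for it; emits the count.
    public override void Reduce(string key, IEnumerable<string> values, ReducerContext context)
    {
        context.EmitKeyValue(key, values.Count().ToString());
    }
}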
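To see what the mapper and reducer above compute without a cluster, the same map, group and reduce steps can be imitated in memory with LINQ. This is a sketch under assumptions: `LocalMapReduce`, `Map`, `Reduce` and `Run` are hypothetical names, the sample input lines are made up, and nothing here uses the HDInsight SDK. The `GroupBy` call stands in for the shuffle that Hadoop performs between the two phases.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LocalMapReduce
{
    // Map: emit ("FR", "1") for every line that records a French session.
    public static IEnumerable<KeyValuePair<string, string>> Map(string inputLine)
    {
        if (inputLine.Contains("Country=France"))
        {
            yield return new KeyValuePair<string, string>("FR", "1");
        }
    }

    // Reduce: count the values that were emitted for one key.
    public static KeyValuePair<string, string> Reduce(string key, IEnumerable<string> values)
    {
        return new KeyValuePair<string, string>(key, values.Count().ToString());
    }

    // Run: map every line, group by key (the shuffle stand-in), reduce each group.
    public static Dictionary<string, string> Run(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => Map(line))
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .Select(group => Reduce(group.Key, group))
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

Given the three made-up lines `"Id=1;Country=France"`, `"Id=2;Country=Spain"` and `"Id=3;Country=France"`, `LocalMapReduce.Run` returns a single entry mapping `"FR"` to `"2"`.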

Demo

t/Map-Reduce HDInsight Lab.pdf

Questions?