Data analytics with Hadoop in the Microsoft Azure cloud

HDInsight overview
Data analytics with Hadoop in the Microsoft Azure cloud

Who am I?
Larry Franks
Started with computers in the early '80s
Microsoft, 1994 - ?
Windows/Unix/Linux/OS X
C#, Java, JavaScript/Node.js, Ruby, Python, Clojure, Scala

What is HDInsight?
What is Hadoop?
  A computing ecosystem for data analytics
  Distributed storage & computing
  An ever-changing, multi-headed hydra of data analytics solutions
  Open source software
Azure specifics
  Storage of structured and unstructured data: Azure Storage Blob (WASB)
  Import and export of data: Azure Data Factory
  Underlying OS: Linux or Windows
  Azure configures everything for you
"Hadoop logo" by Apache Software Foundation - https://svn.apache.org/repos/asf/hadoop/logos/out_rgb/. Licensed under Apache License 2.0 via Commons - https://commons.wikimedia.org/wiki/File:Hadoop_logo.svg#/media/File:Hadoop_logo.svg

Why would I use HDInsight/Hadoop?
Usually you move to Hadoop because it has become too expensive to maintain a scaled-out PDW (Parallel Data Warehouse). Usually you move to HDInsight (or the cloud in general) because maintaining hardware in your own data center is expensive. The cloud allows you to create a cluster only when you need it. Since data is stored separately in Azure blobs, the data is still available after you delete the cluster. Need to process more data? Create a new cluster and point it at the data.
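
Pointing a new cluster at existing data mostly means addressing the same blob container again. A minimal Python sketch of how such a WASB address is put together, assuming the usual wasb[s]://container@account.blob.core.windows.net/path form; the function name and the container, account, and path values are hypothetical and not from the slides:

```python
def wasb_uri(container: str, account: str, path: str = "", secure: bool = True) -> str:
    """Build the address of a blob path as a Hadoop job would reference it.

    container/account/path are placeholders (assumptions for illustration).
    """
    # wasbs:// is the TLS-encrypted variant of wasb://
    scheme = "wasbs" if secure else "wasb"
    # Drop any leading slash so the result has exactly one separator
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # The same URI stays valid no matter which cluster is running,
    # or whether any cluster exists at all.
    print(wasb_uri("data", "mystore", "/logs/2015"))
```

Because the address names only the storage account and container, nothing in it depends on a particular cluster, which is why the cluster can be deleted and recreated freely.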

Scenarios
Batch processing of historical data: Hadoop (MapReduce), Pig, Hive
NoSQL data storage: HBase
Real-time event processing: Storm
Near real-time batch processing: Spark
But for some scenarios there's overlap. For example, Hive is sort of a NoSQL data store, and Spark has Spark SQL to query data. And both Storm and Spark have near real-time batch processing.

How do I create an HDInsight cluster?
Browser: any HTML5 browser - https://portal.azure.com
Command-line:
  PowerShell - https://azure.microsoft.com/en-us/documentation/articles/powershell-install-configure/
  Azure CLI (cross-platform command-line, for Bash or other Unix shells) - https://azure.microsoft.com/en-us/documentation/articles/xplat-cli/
SDK: .NET, Python, Node.js, etc. (NOTE: support may be limited to using ARM templates) - https://azure.microsoft.com/en-us/downloads/
Templates: declarative creation of Azure resources - https://github.com/Azure/azure-quickstart-templates
REST API
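
To give a feel for what "declarative creation" means, here is a rough Python sketch that builds only the outer shell of an ARM template containing one HDInsight cluster resource. It is not deployable as written: the apiVersion and the whole properties block (OS type, node sizes, storage account, credentials) are deliberately left out and should be taken from a quickstart template. The function name and the example values are hypothetical.

```python
import json


def hdinsight_template_skeleton(cluster_name: str, location: str) -> dict:
    """Return the outer shape of an ARM template with one cluster resource.

    Illustrative skeleton only; required fields are intentionally missing.
    """
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.HDInsight/clusters",
                "name": cluster_name,    # hypothetical value supplied by the caller
                "location": location,    # e.g. an Azure region name
                # "apiVersion" and "properties" omitted on purpose:
                # copy them from a real quickstart template.
            }
        ],
    }


if __name__ == "__main__":
    # The template describes *what* should exist; Azure works out how to create it.
    print(json.dumps(hdinsight_template_skeleton("demo-cluster", "westus"), indent=2))
```

The point of the declarative style is that the same file can be submitted from the portal, PowerShell, the CLI, or an SDK, so the choice of tool above becomes mostly a matter of taste.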

Demo: Create a cluster

How do I create solutions for HDInsight?
Basically: what language are you using, and what tools will build the bits?
C#? Visual Studio.
Java? Eclipse, Atom, NetBeans, Notepad, whatever. Maybe Maven or Gradle for project management.
Scala? SBT.
Clojure? Leiningen.
Etc.

Demos
MapReduce demo: basically, MapReduce is a bunch of code, and it can be hard to express your business logic as map and reduce steps.
Pig demo: transforming data; useful for turning unstructured data into structured data.
Hive demo: performing queries over structured data.
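
For readers without the live demo, the shape of a MapReduce job can be sketched in plain Python. This is an in-memory simulation of the map, shuffle, and reduce phases for a word count, not code that runs on Hadoop and not the code used in the demo; all function names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the framework does this on a cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data in the cloud"]))
```

Even for this trivial problem the logic has to be split into a per-record map and a per-key reduce, which is the modelling effort the demo refers to. Pig and Hive exist largely so that you can describe the transformation or query and let the tool generate those steps.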

How do I get my data into the cloud?
Currently, Azure Data Factory is the way to get data into the cloud. It can talk to a variety of data sources and store the data in a variety of cloud data stores, including the Blob storage used by HDInsight.

Questions?

Credits, attributions, etc.
Hadoop, Eclipse, Atom.io, Maven, Gradle, etc. logos are trademark, copyright, etc. of each respective company.
Notepad and Money icons are Creative Commons licensed.
Azure symbols are from the Microsoft Azure, Cloud, and Enterprise symbol pack (http://www.microsoft.com/en-us/download/details.aspx?id=41937)