Published by Anneli Korpela. Modified over 6 years ago.
HDInsight overview
Data analytics with Hadoop in the Microsoft Azure cloud
Who am I?
Larry Franks
Started with computers in the early '80s
Microsoft
Windows/Unix/Linux/OS X
C#, Java, JavaScript/Node.js, Ruby, Python, Clojure, Scala
What is HDInsight?
What is Hadoop?
- A computing ecosystem for data analytics
- Distributed storage and computing
- An ever-changing, multi-headed hydra of data analytics solutions
- Open source software
Azure specifics
- Storage of structured and unstructured data: Azure Storage Blob (WASB)
- Import and export of data: Azure Data Factory
- Underlying OS: Linux or Windows
- Azure configures everything for you
"Hadoop logo" by Apache Software Foundation - Licensed under Apache License 2.0 via Commons -
Why would I use HDInsight/Hadoop?
Usually you move to Hadoop because it has become too expensive to maintain a scaled-out Parallel Data Warehouse (PDW). Usually you move to HDInsight (or to the cloud in general) because maintaining hardware in your own data center is expensive. The cloud lets you create a cluster only when you need it. Because the data is stored separately, in Azure Storage blobs, it remains available even after you delete the cluster. Need to process more data? Create a new cluster and point it at the data.
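HDInsight clusters read and write Blob storage through the WASB driver, addressing data with URIs of the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`. Because the URI names the storage account and container rather than the cluster, a new cluster with access to the same account can work on the same data. The Python sketch below builds and parses such URIs; the helper names and the `mystorage`/`logs` account and container are hypothetical.

```python
from urllib.parse import urlparse


def wasb_uri(account: str, container: str, path: str = "", secure: bool = True) -> str:
    """Build a WASB URI for a blob path (hypothetical helper, for illustration)."""
    scheme = "wasbs" if secure else "wasb"  # wasbs = WASB over TLS
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def parse_wasb_uri(uri: str) -> dict:
    """Split a WASB URI back into account, container, and blob path."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri}")
    # netloc looks like "<container>@<account>.blob.core.windows.net"
    container, sep, host = parsed.netloc.partition("@")
    if not sep:
        raise ValueError(f"missing container in WASB URI: {uri}")
    account = host.split(".", 1)[0]
    return {"account": account, "container": container, "path": parsed.path.lstrip("/")}


if __name__ == "__main__":
    uri = wasb_uri("mystorage", "logs", "2016/01/app.log")
    print(uri)  # wasbs://logs@mystorage.blob.core.windows.net/2016/01/app.log
    print(parse_wasb_uri(uri))
```

A URI like this can be given as the input or output path of a job on whichever cluster is currently running, as long as that cluster has been given access to the storage account.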
Scenarios
- Batch processing of historical data: Hadoop (MapReduce), Pig, Hive
- NoSQL data storage: HBase
- Real-time event processing: Storm
- Near real-time batch processing: Spark
For some scenarios, though, there's overlap. Hive is sort of a NoSQL data store, and Spark has Spark SQL for querying data. Both Storm and Spark can also do near real-time batch processing.
How do I create an HDInsight cluster?
- Browser: any HTML5 browser, through the Azure portal
- Command line: Azure PowerShell, or the cross-platform Azure CLI from Bash (or another Unix shell)
- SDKs: .NET, Python, Node.js, etc. (note: SDK support may be limited to deploying ARM templates)
- Templates: Azure Resource Manager (ARM) templates for declarative creation of Azure resources
- REST API
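To make the "Templates" option concrete, here is a minimal Python sketch that builds an abbreviated ARM template for one HDInsight cluster and serializes it to JSON. The top-level template fields and the `Microsoft.HDInsight/clusters` resource type are standard ARM; the `apiVersion`, `clusterVersion`, names, and node counts are assumptions for illustration, and required settings such as cluster login credentials, SSH users, and VM sizes are deliberately omitted, so check the HDInsight template reference before deploying anything like it.

```python
import json


def hdinsight_template(cluster_name: str, storage_account: str, container: str,
                       worker_count: int = 4) -> dict:
    """Return an abbreviated ARM template describing one HDInsight cluster.

    Sketch only (hypothetical helper): credentials, VM sizes, and other
    required settings are omitted, and version strings are assumptions.
    """
    cluster = {
        "type": "Microsoft.HDInsight/clusters",
        "apiVersion": "2018-06-01-preview",  # assumption: check current API versions
        "name": cluster_name,
        "location": "[resourceGroup().location]",  # ARM expression, resolved at deploy time
        "properties": {
            "clusterVersion": "3.6",  # assumption
            "osType": "Linux",
            "clusterDefinition": {"kind": "hadoop"},
            # Data lives in Blob storage, independent of the cluster's lifetime.
            "storageProfile": {
                "storageaccounts": [{
                    "name": f"{storage_account}.blob.core.windows.net",
                    "isDefault": True,
                    "container": container,
                }]
            },
            "computeProfile": {
                "roles": [
                    {"name": "headnode", "targetInstanceCount": 2},
                    {"name": "workernode", "targetInstanceCount": worker_count},
                ]
            },
        },
    }
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {},
        "resources": [cluster],
    }


if __name__ == "__main__":
    print(json.dumps(hdinsight_template("mycluster", "mystorage", "data"), indent=2))
```

Once the omitted settings are filled in, a template file like this can be deployed from the command line, for example with `az deployment group create --resource-group <group> --template-file cluster.json`, or from PowerShell or an SDK. Keeping the cluster definition in a file is what makes "create it when you need it, delete it when you're done" repeatable.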
Demo: Create a cluster
How do I create solutions for HDInsight?
Basically: what language, and what tools, will you use to build the bits?
- C#: Visual Studio
- Java: Eclipse, Atom, NetBeans, Notepad, whatever; maybe Maven or Gradle for project management
- Scala: SBT
- Clojure: Leiningen
- Etc.
Demos
- MapReduce demo: MapReduce is basically a bunch of code, and it can be hard to model your business logic as MapReduce logic.
- Pig demo: transforming data; useful for turning unstructured data into structured data.
- Hive demo: performing queries over structured data.
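To show what "a bunch of code" means, here is a minimal word-count sketch in Python written in the style of Hadoop Streaming, where the mapper and reducer are separate programs that emit key/value pairs. This is not the code from the demo: to keep it self-contained, both phases are plain functions, and `run_local` (a hypothetical helper) imitates the shuffle/sort step Hadoop performs between them. Even this simple task has to be split into a map step and a reduce step, which hints at why modelling real business logic this way can be hard.

```python
import itertools
from typing import Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: expects pairs sorted by key, as Hadoop delivers them."""
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines: Iterable[str]) -> dict:
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    shuffled = sorted(mapper(lines))  # stands in for Hadoop's sort by key
    return dict(reducer(shuffled))


if __name__ == "__main__":
    print(run_local(["the quick brown fox", "The lazy dog", "the end"]))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a cluster, the two functions would become two scripts passed to Hadoop Streaming, and Hadoop, not `sorted`, would group the mapper's output by key before it reaches the reducer.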
How do I get my data in the cloud?
Currently, Azure Data Factory is the way to get data into the cloud. It can connect to a variety of data sources and store the data in a variety of cloud data stores, including the Blob storage used by HDInsight.
Questions?
Credits, attributions, etc.
Hadoop, Eclipse, Atom.io, Maven, Gradle, and other logos are trademarks or copyrights of their respective owners. The Notepad and Money icons are Creative Commons licensed. Azure symbols are from the Microsoft Azure, Cloud, and Enterprise symbol pack (