HDInsight makes Hadoop Easy

Presentation transcript:

HDInsight makes Hadoop Easy
Agenda
- Collect and Load Big Data
- HDInsight Clusters
- Cluster Customizations
- Visual Studio Tooling
- Tools

Collect and load big data
- Prerequisites
- Blob storage concepts
- Data types and sources
- Performance and scalability
- Administration
- Reliability
- Security
- Data processing
- Pre-processing data
- Serialization and compression
- Choosing tools and technologies
- Tools

Collect and load big data
Targets shown in the diagram: Azure blob, HDFS, HDInsight
- Interactive: PowerShell, Hadoop command line, AzCopy, Visual Studio, Cloudberry, 3rd party application
- Relational data: Azure Data Factory, Apache Sqoop, SQL Server Integration Services, PolyBase in APS
- Streaming data: Apache Storm on HDInsight, Azure Stream Analytics, Reactive extensions (RX), custom or 3rd party application
- Server log files: Apache Flume, SQL Server Integration Services, custom solution using the Azure SDK
- Automated: Azure Data Factory, PowerShell with task scheduler, SQL Server Integration Services, custom solution using the Azure SDK
9/18/2018
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Blob storage concepts
- Blob URL format: http://<account>.blob.core.windows.net/<container>/<blobname>
- Store large amounts of unstructured text or binary data with the fastest read performance
- Access a highly scalable, durable, and available file system
- Expose blobs publicly over HTTP
- Securely lock down permissions
Storage hierarchy (example from the slide diagram):
- Account: Contoso
- Container: Images, Video
- Blob: PIC01.JPG, PIC02.JPG, VID1.AVI
- Blocks/pages
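
The URL pattern on this slide can be illustrated with a small helper. This is only a sketch: the function name blob_url and the lowercase account name "contoso" are assumptions made for the example (Azure storage account names are lowercase), and the helper does no more than compose the string.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str, scheme: str = "http") -> str:
    """Compose a blob URL following the pattern shown on the slide."""
    if not account or not container or not blob_name:
        raise ValueError("account, container and blob_name are all required")
    # Percent-encode the blob name but keep "/" so virtual folder paths survive.
    encoded_name = quote(blob_name, safe="/")
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{encoded_name}"


print(blob_url("contoso", "images", "PIC01.JPG"))
# prints http://contoso.blob.core.windows.net/images/PIC01.JPG
```

Passing scheme="https" gives the TLS form of the same address.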

Data types and sources
Sources feeding Azure blob storage: web clickstreams, devices and sensors, geo-location data, social media, server logs
- Extract data in a form that can be easily consumed
- Stage data before submitting it to a big data cluster
- Submit data accurately to cluster storage, including data conversions
- Choose the right tool for your data
- If using HBase, use an appropriate technique to upload it

Reliability
- Upload tool should handle transient connectivity and transmission failures
- Monitor upload to detect failures early
- Record each stage in a process that raised an error
- Scale out with multiple upload instances
- Validate the data before you upload
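
One common way to handle transient failures is to retry with exponential backoff and log each failed attempt. The sketch below is an illustration under stated assumptions, not a method from the presentation: upload_with_retry is a hypothetical name, the upload callable stands in for whatever tool or SDK call performs the transfer, and treating only ConnectionError and TimeoutError as transient is a simplification.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def upload_with_retry(
    upload: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> T:
    """Call `upload` until it succeeds or `max_attempts` attempts have failed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except (ConnectionError, TimeoutError) as exc:
            # Record which attempt failed so problems are visible early.
            log(f"attempt {attempt}/{max_attempts} failed: {exc!r}")
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The sleep and log parameters are injectable so the retry policy can be exercised in tests without waiting.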

Security
- Protect data at rest and in motion with secure authentication
- Leverage local security policies and features
- Employ a robust auditing and monitoring process
- Remove non-essential sensitive data
- Encrypt essential sensitive data
(Diagram label: HDInsight clusters)

Serialization and compression
- Tools for Avro serialization and compression: SDK is available from NuGet
- Tools provided by the codec supplier, e.g. GZip and BZip2 for compression
- Use the classes in the .NET Framework to perform GZip and DEFLATE compression on your source files
- Create a query job that is configured to write output in compressed form using one of the built-in codecs
(Diagram label: Azure SDK)
HDInsight compression libraries:
Format | Codec | Extension | Splittable
DEFLATE | org.apache.hadoop.io.compress.DefaultCodec | .deflate | No
GZip | org.apache.hadoop.io.compress.GzipCodec | .gz | No
BZip2 | org.apache.hadoop.io.compress.BZip2Codec (this codec is not enabled by default in configuration) | .bz2 | Yes
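
Compressing source files before upload can be sketched as follows. The slide refers to the .NET Framework classes; this example swaps in Python's standard library purely for illustration. The name compress_for_upload is hypothetical, and mapping the .deflate extension to zlib assumes that Hadoop's DefaultCodec reads zlib-format streams.

```python
import bz2
import gzip
import zlib

# Extension -> (compress function, splittable flag), following the codec table on the slide.
# Assumption: .deflate files for Hadoop's DefaultCodec are zlib-format streams.
CODECS = {
    ".deflate": (zlib.compress, False),
    ".gz": (gzip.compress, False),
    ".bz2": (bz2.compress, True),
}


def compress_for_upload(name: str, data: bytes, extension: str = ".gz"):
    """Return (new_name, compressed_bytes, splittable) for one source file."""
    if extension not in CODECS:
        raise ValueError(f"unsupported extension: {extension}")
    compress, splittable = CODECS[extension]
    return name + extension, compress(data), splittable


new_name, payload, splittable = compress_for_upload("server.log", b"GET /index.html\n" * 1000, ".bz2")
print(new_name, splittable)
# prints server.log.bz2 True
```

The splittable flag matters when choosing a codec for large files: a splittable format lets several map tasks read different parts of one file, while a non-splittable file is processed by a single task.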

Tools: Interactive data ingestion
- UI-based tools: Cloudberry Explorer, Storage Explorer
- Hadoop command line: the hadoop dfs -copyFromLocal [source] [destination] command
- Codec supplier tools: GZip and BZip2
- Command-line tool: AzCopy, to upload large files
- PowerShell commands: take advantage of the Azure PowerShell cmdlets
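
The Hadoop copy command on this slide can be scripted. The sketch below only wraps the command line; it assumes the hadoop CLI is on PATH (for example on a cluster node), and the function names and the paths in the usage line are hypothetical. It uses hadoop fs, the generic file system shell, which accepts the same -copyFromLocal option as the hadoop dfs form written on the slide.

```python
import subprocess
from typing import List


def copy_from_local_cmd(source: str, destination: str) -> List[str]:
    """Build the argv for `hadoop fs -copyFromLocal <source> <destination>`."""
    if not source or not destination:
        raise ValueError("source and destination are required")
    return ["hadoop", "fs", "-copyFromLocal", source, destination]


def copy_from_local(source: str, destination: str) -> None:
    """Run the copy; raises CalledProcessError if the command exits non-zero."""
    subprocess.run(copy_from_local_cmd(source, destination), check=True)


# Hypothetical paths, shown only to illustrate the argument order.
print(" ".join(copy_from_local_cmd("C:/logs/server.log", "/example/data/server.log")))
# prints hadoop fs -copyFromLocal C:/logs/server.log /example/data/server.log
```

Building the argument list separately from running it keeps the command easy to inspect and log before anything is executed.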

Tools: Handling streaming data
- Apache Storm on HDInsight: Open-source framework that runs on a Hadoop cluster to capture streaming data
- Azure Stream Analytics: Stream processing service on Azure offering ease of use to consume and process event streams
- Custom event or stream capture solution: Feeds data into the cluster data store in real time or batches
- Microsoft StreamInsight: Complex event processing (CEP) engine with a framework API for building apps that consume and process event streams
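
For the custom capture option, the difference between real-time and batched feeding can be shown with a small buffer. This is a minimal sketch under assumptions: BatchingCapture and its sink callable are hypothetical, and the sink stands in for whatever code writes a batch to the cluster data store.

```python
from typing import Callable, List


class BatchingCapture:
    """Buffer incoming events and hand them to `sink` in batches."""

    def __init__(self, sink: Callable[[List[str]], None], batch_size: int = 100):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._sink = sink
        self._batch_size = batch_size
        self._buffer: List[str] = []

    def add(self, event: str) -> None:
        """Store one event and flush once the batch is full."""
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; does nothing when the buffer is empty."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._sink(batch)
```

With batch_size=1 every event is forwarded immediately, which approximates the real-time case; larger values trade latency for fewer writes. Call flush() on shutdown so a partial batch is not lost.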

Tools: Loading relational data
- Sqoop: Extract required data from a table, view, or query in the source database and save the results as a file in your cluster storage
- Interfaces that support connectivity to big data clusters
- Microsoft Analytics Platform System (APS) contains PolyBase, to expose a SQL-based interface for accessing data stored in Hadoop and HDInsight
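
A Sqoop import of the kind described here is usually launched from the command line. The sketch below builds such a command using Sqoop 1 import options as I understand them (--connect, --username, --table, --query, --target-dir, --num-mappers). The function name, connection string, user, and paths are hypothetical, and password handling is left out on purpose.

```python
from typing import List, Optional


def sqoop_import_cmd(
    jdbc_url: str,
    username: str,
    target_dir: str,
    table: Optional[str] = None,
    query: Optional[str] = None,
    num_mappers: int = 1,
) -> List[str]:
    """Build the argv for a `sqoop import` from either a table/view or a free-form query."""
    if (table is None) == (query is None):
        raise ValueError("pass exactly one of table or query")
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]
    if table is not None:
        cmd += ["--table", table]
    else:
        # Sqoop expects the literal token $CONDITIONS in a free-form query.
        if "$CONDITIONS" not in query:
            raise ValueError("query must contain the $CONDITIONS token")
        if num_mappers != 1:
            raise ValueError("parallel free-form queries also need --split-by, which this sketch omits")
        cmd += ["--query", query]
    return cmd


# Hypothetical connection details, for illustration only.
print(" ".join(sqoop_import_cmd("jdbc:sqlserver://example;database=sales", "loader", "/data/orders", table="Orders")))
```

The result is a list of arguments that could be passed to subprocess.run on a machine where Sqoop is installed.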

HDInsight Clusters

HDInsight cluster architecture
Diagram components:
- HTTP traffic, ODBC/JDBC
- Secure gateway (AuthN, HTTP proxy)
- WebHCatalog, Oozie, Ambari
- Highly available head nodes
- Worker nodes
- Azure VNet
- Azure Storage

HDInsight service entry points
Entry points to an HDInsight cluster shown in the diagram:
- Remote desktop
- Oozie
- Command line
- REST
- SDK
- ODBC
- Hive
- Pig
- M/R
- Query console
- PowerShell
- Excel
- Visual Studio plugin

Cluster Customizations

Cluster customization options
Via Azure portal:
- Hive/Oozie Metastore
- Storage accounts
- ScriptAction
Via scripting / SDK:
- Config values
- JAR file placement in cluster
Ad hoc:
- RDP to cluster, update config files (non-durable)
HDInsight cluster provisioning states (flowchart labels):
- Ready for deployment
- Accepted
- Cluster storage provisioned
- AzureVM configuration
- Customize cluster? (Yes / No)
- Cluster customization (custom script running)
- Configuring HDInsight
- Cluster operational
- Running
- Timed Out
- Error

Visual Studio Tooling

Visual Studio tooling
- Ships with the Azure SDK
- Supported for VS 2012, 2013 and 2015
- Enables Hive query authoring, submission and debugging
Features:
- Navigate Linked Resources
- Create Hive Tables
- Run Hive Queries
- View Hive Jobs
- Hive Script Local Validation
- IntelliSense Support for Hive (Preview)
Table creation and schema management are also supported.

Get started today! For more information visit: http://azure.microsoft.com/en-us/services/hdinsight/
