HDInsight makes Hadoop Easy

Presentation on theme: "HDInsight makes Hadoop Easy"— Presentation transcript:

1 HDInsight makes Hadoop Easy
Agenda: Collect and Load Big Data; HDInsight Clusters; Cluster Customizations; Visual Studio Tooling; Tools

2 Collect and load big data
Prerequisites; Blob storage concepts; Data types and sources; Performance and scalability; Administration; Reliability; Security; Data processing; Pre-processing data; Serialization and compression; Choosing tools and technologies; Tools

3 Collect and load big data
Interactive: PowerShell; Hadoop command line; AzCopy; Visual Studio; Cloudberry; 3rd party application
Relational data: Azure Data Factory; Apache Sqoop; SQL Server Integration Services; PolyBase in APS
Streaming data: Apache Storm on HDInsight; Azure Stream Analytics; Reactive Extensions (Rx); custom or 3rd party application
Server log files: Apache Flume; SQL Server Integration Services; custom solution using the Azure SDK
Automated: Azure Data Factory; PowerShell with task scheduler; SQL Server Integration Services; custom solution using the Azure SDK
Targets shown in the diagram: Azure blob; HDFS; HDInsight
9/18/2018
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

4 Blob storage concepts
Store large amounts of unstructured text or binary data with the fastest read performance
Access a highly scalable, durable, and available file system
Expose blobs publicly over HTTP
Securely lock down permissions
Hierarchy: Account (Contoso) > Container (Images, Video) > Blob (PIC01.JPG, PIC02.JPG, VID1.AVI) > Blocks/pages
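
The account, container, and blob levels on this slide map directly onto a blob's address. The following is a minimal sketch, assuming the default public endpoint form `https://<account>.blob.core.windows.net/<container>/<blob>` and the `wasb://` scheme that HDInsight uses to reach blob storage. The helper names are hypothetical, and the account and container names are the slide's examples, lowercased because storage account and container names must be lowercase.

```python
from urllib.parse import quote


def blob_http_url(account: str, container: str, blob: str) -> str:
    """HTTPS address of a blob (assumes the default Azure endpoint)."""
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob)}"


def blob_wasb_uri(account: str, container: str, blob: str) -> str:
    """Address of the same blob as a Hadoop job on HDInsight would refer to it."""
    return f"wasb://{container}@{account}.blob.core.windows.net/{quote(blob)}"


# Example names taken from the slide's diagram (hypothetical account).
print(blob_http_url("contoso", "images", "PIC01.JPG"))
print(blob_wasb_uri("contoso", "images", "PIC01.JPG"))
```

Whether the HTTPS address is readable without credentials depends on the container's access level, which is what "securely lock down permissions" refers to.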

5 Data types and sources
Sources: web clickstreams; devices and sensors; geo-location data; social media; server logs
Storage: Azure blob storage
Extract data in a form that can be easily consumed
Stage data before submitting it to a big data cluster
Submit data accurately to cluster storage, including data conversions
Choose the right tool for your data
If using HBase, use an appropriate technique to upload it

6 Collect and load big data
Prerequisites; Blob storage concepts; Data types and sources; Performance and scalability; Administration; Reliability; Security; Data processing; Pre-processing data; Serialization and compression; Choosing tools and technologies; Tools

7 Reliability
Upload tool should handle transient connectivity and transmission failures
Monitor upload to detect failures early
Record each stage in a process that raised an error
Scale out with multiple upload instances
Validate the data before you upload
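
One common way to handle transient connectivity failures is to retry with exponential backoff and log each failed attempt. The sketch below is illustrative only: `TransientUploadError`, `upload_with_retry`, and the `flaky_upload` stand-in are hypothetical names, not part of any Azure tool or SDK, and a real uploader would decide for itself which errors count as transient.

```python
import random
import time


class TransientUploadError(Exception):
    """Hypothetical error type standing in for a dropped connection or timeout."""


def upload_with_retry(upload, path, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call upload(path), retrying transient failures with exponential backoff.

    Returns the number of attempts used; re-raises after max_attempts failures.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            upload(path)
            return attempt
        except TransientUploadError as err:
            # Record the failing attempt so the error can be traced later.
            print(f"attempt {attempt} for {path} failed: {err}")
            if attempt == max_attempts:
                raise
            # Double the wait each time and add jitter so parallel upload
            # instances do not all retry at the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


# Demo with a stand-in uploader that fails twice, then succeeds.
_calls = []


def flaky_upload(path):
    _calls.append(path)
    if len(_calls) < 3:
        raise TransientUploadError("connection reset")


# sleep is replaced with a no-op so the demo finishes immediately.
print("attempts used:", upload_with_retry(flaky_upload, "logs/server.log", sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the waiting policy testable; in production the default `time.sleep` applies.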

8 Security
Protect data at rest and in motion with secure authentication
Leverage local security policies and features
Employ a robust auditing and monitoring process
Remove non-essential sensitive data
Encrypt essential sensitive data
(Diagram: HDInsight clusters)

9 Collect and load big data
Prerequisites; Blob storage concepts; Data types and sources; Performance and scalability; Administration; Reliability; Security; Data processing; Pre-processing data; Serialization and compression; Choosing tools and technologies; Tools

10 Serialization and compression
Tools for Avro serialization and compression: SDK is available from NuGet
Tools provided by the codec supplier, e.g. GZip and BZip2 for compression
Use the classes in the .NET Framework to perform GZip and DEFLATE compression on your source files
Create a query job that is configured to write output in compressed form using one of the built-in codecs
Azure SDK; HDInsight compression libraries

Format | Codec | Extension | Splittable
DEFLATE | org.apache.hadoop.io.compress.DefaultCodec | .deflate | No
GZip | org.apache.hadoop.io.compress.GzipCodec | .gz | No
BZip2 | org.apache.hadoop.io.compress.BZip2Codec (this codec is not enabled by default in configuration) | .bz2 | Yes
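
Compressing source files before upload can be sketched as follows. The slide refers to the .NET Framework classes; this sketch swaps in Python's standard-library `zlib`, `gzip`, and `bz2` modules to show the same three formats. The `CODECS` table and `compress_for_upload` helper are hypothetical names, and treating `zlib` output as what `DefaultCodec` expects for `.deflate` files is an assumption to verify against your cluster.

```python
import bz2
import gzip
import zlib

# Extension -> (compress, decompress, splittable), following the codec table above.
CODECS = {
    ".deflate": (zlib.compress, zlib.decompress, False),
    ".gz": (gzip.compress, gzip.decompress, False),
    ".bz2": (bz2.compress, bz2.decompress, True),
}


def compress_for_upload(name: str, data: bytes, extension: str) -> tuple[str, bytes]:
    """Return the new file name and the compressed bytes for the chosen codec."""
    compress, _, _ = CODECS[extension]
    return name + extension, compress(data)


# Made-up log content, repeated so the size reduction is visible.
sample = b"2014-09-18 GET /index.html 200\n" * 1000
for ext, (_, decompress, splittable) in CODECS.items():
    new_name, packed = compress_for_upload("server.log", sample, ext)
    assert decompress(packed) == sample  # round-trip check before uploading
    print(new_name, len(sample), "->", len(packed), "bytes, splittable:", splittable)
```

The splittable column matters for large files: a format that cannot be split has to be read by a single task, so the smallest file is not always the best choice.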

11 Interactive data ingestion
UI-based tools: Cloudberry Explorer, Storage Explorer
The hadoop dfs -copyFromLocal [source] [destination] command
Codec supplier tools: GZip and BZip2
Command line tool: AzCopy to upload large files
PowerShell commands: take advantage of the Azure PowerShell cmdlets

12 Handling streaming data
Apache Storm on HDInsight: open-source framework that runs on a Hadoop cluster to capture streaming data
Azure Stream Analytics: stream processing service on Azure offering ease of use to consume and process event streams
Custom event or stream capture solution: feeds data into the cluster data store in real time or batches
Microsoft StreamInsight: complex event processing (CEP) engine with a framework API for building apps that consume and process event streams
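
For the custom capture option, feeding the data store "in batches" usually means buffering events and writing one file per batch. The sketch below shows that idea only; `batch_events`, `write_batches`, the file naming pattern, and the in-memory `stored` dictionary that stands in for cluster storage are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def batch_events(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an event stream into lists of at most batch_size events."""
    batch: List[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final partial batch so no events are lost.
        yield batch


def write_batches(events: Iterable[str], batch_size: int,
                  write: Callable[[str, str], None]) -> int:
    """Send each batch to write(name, payload); return the number of batches."""
    count = 0
    for count, batch in enumerate(batch_events(events, batch_size), start=1):
        write(f"events-{count:05d}.txt", "\n".join(batch) + "\n")
    return count


# Demo: a dictionary stands in for the cluster data store.
stored = {}
total = write_batches((f"event {i}" for i in range(7)), 3, stored.__setitem__)
print(total, sorted(stored))
```

A production version would typically also flush on a timer so that a quiet stream does not hold events back indefinitely.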

13 Loading relational data
Sqoop: extract required data from a table, view, or query in the source database and save the results as a file in your cluster storage
Interfaces that support connectivity to big data clusters: Microsoft Analytics Platform System (APS) contains PolyBase, which exposes a SQL-based interface for accessing data stored in Hadoop and HDInsight
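
A Sqoop import of a table or a free-form query can be sketched as a small command builder. This is an illustrative helper, not part of Sqoop: `sqoop_import_args` is a hypothetical name, the JDBC URL, user name, and target directory are made up, and only the widely documented `sqoop import` options `--connect`, `--username`, `--table`, `--query`, `--target-dir`, and `--num-mappers` are used. It defaults to a single mapper because parallel imports need a split column, which this sketch does not handle.

```python
import shlex
from typing import List, Optional


def sqoop_import_args(jdbc_url: str, target_dir: str, table: Optional[str] = None,
                      query: Optional[str] = None, username: Optional[str] = None,
                      mappers: int = 1) -> List[str]:
    """Build the argument list for a `sqoop import` of one table (or view) or one query."""
    if (table is None) == (query is None):
        raise ValueError("pass exactly one of table or query")
    args = ["sqoop", "import", "--connect", jdbc_url,
            "--target-dir", target_dir, "--num-mappers", str(mappers)]
    if username:
        args += ["--username", username]
    if table:
        args += ["--table", table]
    else:
        # Sqoop expects the literal token $CONDITIONS in a free-form query.
        if "$CONDITIONS" not in query:
            raise ValueError("free-form query must contain $CONDITIONS")
        args += ["--query", query]
    return args


# Hypothetical connection details, for illustration only.
args = sqoop_import_args(
    "jdbc:sqlserver://example.database.windows.net:1433;database=sales",
    "/data/orders", table="Orders", username="loader")
print(shlex.join(args))
```

Returning a list rather than a single string lets the caller hand it to `subprocess.run` without shell quoting problems.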

14 HDInsight Clusters

15 HDInsight cluster architecture
Diagram labels: HTTP traffic; ODBC/JDBC; WebHCatalog; Oozie; Ambari; Azure VNet; Secure gateway (AuthN, HTTP proxy); Azure Storage; Highly available head nodes; Worker nodes

16 HDInsight service entry points
HDInsight cluster: Remote desktop; Oozie; Command line; REST; SDK; ODBC; Hive; Pig; M/R; Query console; PowerShell; Excel; Visual Studio plugin

17 Cluster Customizations

18 Cluster customization options
Via Azure portal: Hive/Oozie Metastore; Storage accounts; ScriptAction
Via scripting / SDK: Config values; JAR file placement in cluster
Ad hoc: RDP to cluster, update config files (non-durable)
HDInsight cluster provisioning states: Ready for deployment; Accepted; Cluster storage provisioned; AzureVM configuration; Customize cluster? (Yes/No); Cluster customization (custom script running); Configuring HDInsight; Running; Cluster operational; Timed Out; Error

19 Visual Studio Tooling

20 Visual Studio tooling
Ships with the Azure SDK
Supported for VS 2012, 2013 and 2015
Enables Hive query authoring, submission and debugging
Navigate Linked Resources; Create Hive Tables; Run Hive Queries; View Hive Jobs; Hive Script Local Validation; IntelliSense Support for Hive (Preview)
Table creation and schema management are also supported

21 Get started today! For more information visit:

22 © 2014 Microsoft Corporation. All rights reserved

