Server & Tools Business

Presentation on theme: "Server & Tools Business"— Presentation transcript:

1 Server & Tools Business
6/16/2019
Microsoft Big Data Essentials
Module 5 – Operationalize your Big Data Pipeline
Saptak Sen, Microsoft
Bill Ramos, Advaiya
I’m Saptak Sen and in this session I’m going to show how you can operationalize your Big Data pipeline.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 Server & Tools Business
Agenda
Microsoft .NET SDK for Hadoop
Running MapReduce Jobs on Azure HDInsight
WebHDFS Client
WebHCat
Windows PowerShell Integration
This session shows how you can use the Microsoft .NET SDK for Hadoop to run MapReduce jobs. Specifically, we’ll explore using the WebHDFS Client .NET APIs to perform basic task integration, the WebHCat APIs to schedule execution tasks, and the HDInsight cmdlets in PowerShell to manage cluster activities.

3 Server & Tools Business
Microsoft .NET SDK for Hadoop
Let’s start with the Microsoft .NET SDK for Hadoop.

4 Microsoft .NET SDK For Hadoop
.NET client libraries for Hadoop
Write MapReduce in Visual Studio using C# or F#
Debug against local data
Diagram: Microsoft Visual Studio, .NET Hadoop SDK, Job Tracker, Slave Nodes
Talk Track: Let’s first look at the Microsoft .NET SDK for Hadoop. The SDK provides .NET client libraries that make it easier to work with Hadoop (MapReduce) from .NET. Using this SDK, developers can quickly and easily build simple .NET-based applications that run Hive queries using the Windows Azure HDInsight Service. This enables developers to use their .NET skills to run jobs in an HDInsight cluster.
Key Points: Microsoft .NET SDK for Hadoop
References: DBI-B221_Bakhshi.pptx ( 5f06%2d07%2fBreakouts&FolderCTID=0x E65FFB7F3BD44D88BFF47F89775C98)

5 SDK components
MapReduce library
LINQ to Hive client library
WebClient library: WebHDFS client library, WebHCat client library
In Microsoft Visual Studio:
install-package Microsoft.Hadoop.MapReduce
install-package Microsoft.Hadoop.Hive
install-package Microsoft.Hadoop.WebClient
Diagram: C# Applications (using client libraries), Mapper, Reducer, Deploy and run
Talk Track: Now let’s talk about the components of the .NET SDK and take a brief look at how it works. The SDK includes the MapReduce library, which simplifies writing MapReduce jobs in .NET languages using the Hadoop streaming interface. It also includes the LINQ to Hive client library, which translates C# or F# LINQ queries into HiveQL queries and executes them on the Hadoop cluster. This library can also execute arbitrary HiveQL queries from a .NET application. Finally, the SDK includes the WebClient library, which contains client libraries for WebHDFS and WebHCat. The WebHDFS client library works with files in HDFS and Windows Azure Blob storage, while the WebHCat client library manages the scheduling and execution of jobs in an HDInsight cluster. The .NET SDK is easy to set up and use. First, in a Visual Studio C# Console Application, install the required client libraries. Then develop an application using the features available in the installed libraries. Finally, run the application.
Key Points: MapReduce library, for writing MapReduce jobs in .NET languages; LINQ to Hive client library; WebHDFS client library and WebHCat client library
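The Hadoop streaming interface mentioned in the talk track passes records to the mapper and reducer as lines of text, with the key and value separated by a tab. The sketch below illustrates that contract with a word count in Python rather than .NET. It is not the SDK's API: `mapper`, `reducer`, and `run_local` are hypothetical names, and `run_local` only imitates the cluster's sort-by-key step so the pair can be tried on local data.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper writes to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; records must arrive sorted by key."""
    keyed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Local stand-in for the cluster: map, sort by key (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_local(["big data big pipeline", "data pipeline"]):
        print(record)
```

Running the same mapper and reducer against a small local file before deploying is the kind of "debug against local data" workflow the earlier slide describes.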

6 WebClient Libraries in .NET
WebHDFS client library: works with files in HDFS and Windows Azure Blob storage
WebHCat client library: manages the scheduling and execution of jobs in an HDInsight cluster
WebHDFS: scalable REST API; move files in and out and delete from HDFS; perform file and directory functions
WebHCat: HDInsight job scheduling and execution
Talk Track: Now let’s look at how the two WebClient libraries, WebHDFS and WebHCat, differ in functionality. WebHDFS is the web service interface for HDFS. This scalable REST API enables easy access to HDFS. You can move files in and out of HDFS and delete them, taking advantage of the parallelism of the cluster. You can also perform numerous file and directory functions. The WebHCat client library, by contrast, manages the scheduling and execution of jobs in an HDInsight cluster. It is important to note the difference between the two: in this module, the WebHDFS client is used when creating Hive tables, by copying the data into storage, while WebHCat is usually used to run queries on those Hive tables.
Key Points: Difference in the functionality of WebHDFS and WebHCat
References: General: About WebHDFS: DBI-B221_Bakhshi.pptx
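To make the "scalable REST API" point concrete, the sketch below builds the HTTP verb and URL for a few WebHDFS file and directory operations. It shows the underlying REST calls, not the .NET client library's methods. The helper name `webhdfs_request` and the host `namenode.example:50070` are assumptions for illustration; an HDInsight cluster may expose the service at a different address.

```python
from urllib.parse import quote, urlencode

# HTTP verb for each WebHDFS operation used here, per the WebHDFS REST API.
WEBHDFS_VERBS = {
    "MKDIRS": "PUT",
    "CREATE": "PUT",  # the server answers with a redirect to the node that takes the data
    "OPEN": "GET",
    "LISTSTATUS": "GET",
    "GETFILESTATUS": "GET",
    "DELETE": "DELETE",
}


def webhdfs_request(base_url, path, op, **params):
    """Return (http_verb, url) for one WebHDFS operation on an HDFS path."""
    verb = WEBHDFS_VERBS[op]
    query = urlencode({"op": op, **params})
    return verb, f"{base_url}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    base = "http://namenode.example:50070"  # hypothetical endpoint
    print(webhdfs_request(base, "/example/data", "MKDIRS"))
    print(webhdfs_request(base, "/example/data", "LISTSTATUS"))
    print(webhdfs_request(base, "/example/data", "DELETE", recursive="true"))
```

Each tuple can be handed to any HTTP client; the sketch stops short of sending the request so it can be run without a cluster.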

7 Demo 1: Creating a Hive Table Using WebHDFS Client
Diagram: Batch Layer, Speed Layer, Serving Layer; .NET application (WebHDFS) to interact with HDInsight cluster; Windows Server; HDInsight; Windows Azure Blob storage; Hive table
Talk Track: Let’s move on to the demos. In this first demo, I’ll show you how to use the Microsoft .NET SDK for Hadoop to run MapReduce jobs. Specifically, we’ll use the WebHDFS Client .NET APIs to perform basic task integration.
First click: The .NET application uses WebHDFS to interact with the HDInsight cluster and copy data from the base machine to Azure Storage.
Second click: The data is then loaded into Hive tables.
Key Points: .NET application (WebHDFS) to interact with HDInsight cluster
References: Day 3 - Module 1 - Operationalize your Big Data Pipeline

8 Demo 2: Performing a Remote Job with WebHCat
Diagram: Batch Layer, Speed Layer, Serving Layer; .NET application (WebHCat) to interact with Hive tables; Windows Server; HDInsight; Hive table; Query output
Talk Track: Next we’ll explore how to use WebHCat to provide an abstract view of data on a Hadoop cluster and to coordinate the activities of different tools.
First click: The .NET application uses WebHCat to query the Hive table from .NET code.
Second click: The application gets the output of the query.
Key Points: Interacting with Hive tables using a .NET application (WebHCat)
References: Day 3 - Module 1 - Operationalize your Big Data Pipeline
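Under the client library, a remote Hive query is submitted by posting a form to WebHCat's `hive` resource. The sketch below builds that request in Python as an illustration of the REST call, not of the .NET API. The helper name `hive_job_request`, the host `headnode.example`, the user `hadoop`, the table `weblogs`, and the status directory are all hypothetical; 50111 is WebHCat's default port, but a given cluster may publish the service elsewhere.

```python
from urllib.parse import urlencode


def hive_job_request(base_url, user, query, status_dir):
    """Return (verb, url, form_body) for submitting a Hive query through WebHCat."""
    url = f"{base_url}/templeton/v1/hive"
    body = urlencode({
        "user.name": user,        # account the job runs as
        "execute": query,         # HiveQL text to run
        "statusdir": status_dir,  # directory where WebHCat writes job status and output
    })
    return "POST", url, body


if __name__ == "__main__":
    verb, url, body = hive_job_request(
        "http://headnode.example:50111",   # hypothetical WebHCat endpoint
        "hadoop",
        "SELECT * FROM weblogs LIMIT 10",  # hypothetical table
        "/example/status",
    )
    print(verb, url)
    print(body)
```

WebHCat responds with a job identifier rather than the query result, so a caller would typically poll the job status and then read the output from the status directory, which matches the two clicks in the demo.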

9 Server & Tools Business
Windows PowerShell Integration
Let’s now look at how PowerShell works with the SDK.

10 Windows PowerShell Integration
Manage an HDInsight cluster using a local management console
PowerShell scripts to build projects, import data into HDFS, and run samples
Repeatable management through scripting
Diagram: Develop PowerShell scripts; Run on local management console; Manage HDInsight cluster
Talk Track: Let’s see how you can manage an HDInsight cluster from a local management console using Windows PowerShell. With PowerShell scripts, you can perform tasks such as building projects, importing data into HDFS, and running jobs.
Key Points: Manage an HDInsight cluster using a local management console through PowerShell
References: powershell.aspx

11 Demo 3: Integrating PowerShell with HDInsight
Diagram: Batch Layer, Speed Layer, Serving Layer; PowerShell Integration; Windows Server; HDInsight
Talk Track: Finally, in this last demo, I’ll show you how to use the HDInsight cmdlets in PowerShell to manage cluster activities.
First click: The HDInsight cmdlets in PowerShell query the HDInsight cluster.
Second click: The desired output is returned.
Key Points: Use of PowerShell cmdlets to manage an HDInsight cluster
References: Day 3 - Module 1 - Operationalize your Big Data Pipeline
Command shown on the slide:
New-MapReduceStreamingJob -Input "/example/data/gutenberg/davinci.txt" -Output "/example/data/streamingoutput/wc.txt" -Mapper cat.exe -Reducer wc.exe -File "hdfs:///example/apps/wc.exe,hdfs:///example/apps/cat.exe" [-Define <# delimited key=value pairs>]
Workflow: Windows PowerShell: Create a Cluster; Run MapReduce Program; Delete the Cluster. Windows Azure HDInsight: View Progress; View Results on Azure Storage.
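The streaming job in the command above uses `cat.exe` as the mapper and `wc.exe` as the reducer, so the mapper passes each line through unchanged and the reducer reports line, word, and byte counts. The sketch below imitates that pair locally in Python to show what the job computes. The function names are hypothetical, and the sketch ignores what the cluster does between the two stages (sorting and key/value separators), so byte counts from a real run may differ.

```python
def cat_mapper(lines):
    """Identity mapper: pass every input line through unchanged, like cat."""
    yield from lines


def wc_reducer(lines):
    """Count lines, words, and bytes over everything received, like wc."""
    line_count = word_count = byte_count = 0
    for line in lines:
        line_count += 1
        word_count += len(line.split())
        byte_count += len(line.encode("utf-8")) + 1  # +1 for the newline
    return line_count, word_count, byte_count


if __name__ == "__main__":
    sample = ["the quick brown fox", "jumps"]
    print(wc_reducer(cat_mapper(sample)))  # (2, 5, 26)
```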

12 Server & Tools Business
Learn more
Microsoft .NET SDK For Hadoop
Managing Your HDInsight Cluster with PowerShell
That’s it for this session. To learn more about what I just showed you, check out these resource links for the Hadoop .NET SDK and Windows PowerShell. Thank you!
END OF PRESENTATION

13 Questions?
