02 | Getting Started with HDInsight
  Graeme Malcolm | Data Technology Specialist, Content Master
  Pete Harris | Learning Product Planner, Microsoft

Module Overview
  - HDInsight Architecture
  - Provisioning an HDInsight Cluster
  - Cluster Remote Access
  - Using HDFS

HDInsight Architecture
  - HDInsight cluster
      - One or more virtual machines
      - Hadoop
  - Windows Azure Storage
      - Blob storage for HDFS
  - Windows Azure SQL Database
      - Metadata store for Hive and Oozie
      - Use existing, or internal
  [Diagram: Windows Azure. HDInsight cluster (VMs), with HDFS in Blob Store container(s) and Hive/Oozie metadata in SQL Database]

Demo: Provisioning HDInsight
  In this demonstration, you will see how to:
  - Create an HDInsight Cluster

Cluster Remote Access
  - Remote desktop access disabled by default
      - Enable in Windows Azure Management Portal
      - Specify user credentials and expiration date
  - Use an RDP connection to the Name Node to:
      - Access Hadoop command line and utilities
      - Monitor Hadoop activity

Using HDFS
  - Hosted in a blob container in Windows Azure Storage
      - Retained even when the HDInsight cluster is deleted
  - Paths can be WASB or HDFS
      - wasb://data@myaccount.blob.core.windows.net/logs/file.txt
      - /logs/file.txt
  - HDFS shell commands
      - ls and lsr
      - cp, copyToLocal, and copyFromLocal
      - mv, moveToLocal, and moveFromLocal
      - mkdir
      - rm and rmr
      - cat

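The two path forms on the Using HDFS slide can be related with a short sketch. The Python below is illustrative only: the helper names split_wasb_uri and hadoop_fs_args are hypothetical, and it assumes the full form follows the pattern wasb://<container>@<account>.blob.core.windows.net/<path>, as in the slide's example. It also assumes the short form /logs/file.txt refers to the same file only when data on myaccount is the cluster's default container.

```python
from urllib.parse import urlparse


def split_wasb_uri(uri):
    """Split a wasb:// URI into (container, account, path).

    Hypothetical helper: assumes the form
    wasb://<container>@<account>.blob.core.windows.net/<path>.
    """
    parts = urlparse(uri)
    if parts.scheme != "wasb":
        raise ValueError("not a wasb:// URI: " + uri)
    if not parts.username or not parts.hostname:
        raise ValueError("expected wasb://<container>@<account-host>/<path>")
    container = parts.username               # text before the '@'
    account = parts.hostname.split(".")[0]   # first label of the storage host
    return container, account, parts.path


def hadoop_fs_args(command, *paths):
    """Build the argument list for one HDFS shell command, e.g. 'ls' or 'cat'."""
    return ["hadoop", "fs", "-" + command, *paths]


full = "wasb://data@myaccount.blob.core.windows.net/logs/file.txt"
container, account, path = split_wasb_uri(full)
print(container, account, path)                 # data myaccount /logs/file.txt
print(" ".join(hadoop_fs_args("ls", "/logs")))  # hadoop fs -ls /logs
```

The argument list from hadoop_fs_args could be passed to subprocess.run on a machine where the hadoop command is available, such as the Name Node over a remote desktop session. The sketch only builds the command and does not execute it.
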
Demo: Remote Desktop
  In this demonstration, you will see how to:
  - Configure Remote Access
  - Browse HDFS
  - Run a Map/Reduce Job

Module Summary
  - Provision HDInsight clusters as needed
      - Cluster nodes
      - Blob storage container(s)
      - SQL Database for Hive/Oozie metadata
  - Enable remote desktop access only if required