MSBIC Hadoop Series Hadoop & Microsoft BI Bryan Smith
MSBIC Hadoop Series
Learn the basics of Hadoop through a combination of demonstration and lecture. Session participants are invited to follow along leveraging emulation environments and Azure-based clusters, the setting up of which we will address in our first session.
March – Getting Started
April – Understanding the File System
May – Implementing MapReduce Jobs
June – Querying the Data with Hive
July – On Vacation
August – Processing the Data with Pig
September – OOF
October – Hadoop & MS BI
November – TBD
December – TBD
Today’s Session
Objectives:
1. Review interfaces available with Hadoop
2. Explore Microsoft BI tool integration with Hadoop
HDInsight
HDInsight on Azure
HDInsight Emulator
HDInsight on PDW
Hive Editor
HUE-like interface for submission of Hive queries
Results & log output available for download
Accessible on Azure HDInsight at
Hadoop Interfaces
WebHCat (Templeton)
Ambari
Oozie
HBase
WebHDFS
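As a rough illustration of the WebHDFS interface listed above, the sketch below builds a WebHDFS REST URL for listing a directory. The host name, path, and the NameNode HTTP port 50070 are assumptions for illustration only; substitute the values for your own cluster. The helper name `webhdfs_url` is hypothetical.

```python
from urllib.parse import urlencode, quote


def webhdfs_url(host, path, op, port=50070, params=None):
    """Build a WebHDFS URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>.

    host, port, and path are placeholders; 50070 is assumed here as the
    NameNode HTTP port and may differ on your cluster.
    """
    query = {"op": op}
    if params:
        query.update(params)
    # quote() keeps "/" intact, so the HDFS path stays readable in the URL.
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{urlencode(query)}"


if __name__ == "__main__":
    # Hypothetical host and directory; LISTSTATUS lists a directory's contents.
    print(webhdfs_url("namenode.example.com", "/user/demo", "LISTSTATUS"))
```

Issuing an HTTP GET against the resulting URL (for example with a browser or any HTTP client) should return a JSON description of the directory, provided WebHDFS is enabled on the cluster.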
ODBC Driver for Hive
Presents Hadoop as an ODBC-standard source
Hortonworks ODBC Driver available here
NOTE: When setting up the driver, consider using default as the database
[Diagram: WebHCat (Templeton), ODBC Driver for Hive]
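Once the driver is installed and a DSN is defined, any ODBC-capable client can query Hive. The sketch below assumes a DSN named `HiveDSN` (hypothetical), a table named `hivesampletable` (an assumption; use any table present in your cluster), and the third-party `pyodbc` package. It is one possible way to test the driver, not the only one.

```python
def build_connection_string(dsn, uid=None, pwd=None):
    """Assemble an ODBC connection string from a DSN and optional credentials."""
    parts = [f"DSN={dsn}"]
    if uid is not None:
        parts.append(f"UID={uid}")
    if pwd is not None:
        parts.append(f"PWD={pwd}")
    return ";".join(parts)


def fetch_rows(conn_str, sql):
    """Run a query over ODBC and return all rows.

    pyodbc is imported lazily so the helper above works without it installed.
    autocommit=True is used because Hive does not support ODBC transactions.
    """
    import pyodbc  # third-party: pip install pyodbc

    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # "HiveDSN" and the table name are hypothetical; the table is qualified
    # with the "default" database, following the note on the slide.
    conn_str = build_connection_string("HiveDSN")
    print(conn_str)
    # rows = fetch_rows(conn_str, "SELECT * FROM default.hivesampletable LIMIT 10")
```

The call to `fetch_rows` is left commented out because it needs a live cluster and a configured DSN.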
MS BI via ODBC
SQL Server Analysis Services (OLAP Mode)
SQL Server Analysis Services (Tabular Mode)
SQL Server Integration Services
SQL Server Reporting Services
SQL Server Database Engine (Linked Server)
[Diagram: WebHCat (Templeton), ODBC Driver for Hive]
PolyBase
Microsoft Analytics Platform System (formerly SQL Server Parallel Data Warehouse)
[Diagram: DB Table, External Table, PolyBase Bridge]
CREATE EXTERNAL TABLE ClickStream (
    url varchar(50),
    event_date date,
    user_IP varchar(50)
)
WITH (
    LOCATION = 'hdfs://MyHadoop:5000/tpch1GB/employee.tbl',
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);
Transparent to end-user
Filtering pushed to Hadoop as map job
Statistics optimize cross-platform execution
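To make "filtering pushed to Hadoop as map job" more concrete, here is a purely conceptual Python sketch of a map-only filter over pipe-delimited rows shaped like the ClickStream table. It is not how PolyBase is implemented; the function names and sample rows are invented for illustration.

```python
from datetime import date


def parse_row(line, terminator="|"):
    """Split one delimited text line into the three ClickStream columns."""
    url, event_date, user_ip = line.rstrip("\n").split(terminator)
    return {
        "url": url,
        "event_date": date.fromisoformat(event_date),
        "user_IP": user_ip,
    }


def map_filter(lines, predicate):
    """Map-only step: emit just the rows that satisfy the predicate.

    There is no reduce phase; each input line is handled independently,
    which is why a simple WHERE clause can run as a map job.
    """
    for line in lines:
        row = parse_row(line)
        if predicate(row):
            yield row


if __name__ == "__main__":
    # Invented sample data in the '|'-terminated layout from the slide.
    sample = [
        "/home|2014-10-01|10.0.0.1",
        "/cart|2014-10-02|10.0.0.2",
    ]
    for row in map_filter(sample, lambda r: r["event_date"] >= date(2014, 10, 2)):
        print(row["url"], row["user_IP"])
```

The point of the sketch is that only the matching rows leave the Hadoop side, so less data crosses the bridge into the warehouse.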
Power BI
[Diagram: WebHCat (Templeton), ODBC Driver for Hive, Power Query, PowerPivot, WebHDFS]
In Azure HDInsight, WebHDFS is disabled, so Power Query is actually speaking to the Azure Storage Blob REST interface.
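Since Power Query ends up talking to the Azure Storage Blob REST interface, it can help to see what such a request looks like. The sketch below builds the URL for the List Blobs operation on a container. The account and container names are hypothetical, and a real request also needs authorization (for example a shared key header or a SAS token), which is omitted here.

```python
from urllib.parse import urlencode


def list_blobs_url(account, container, prefix=None):
    """Build the Blob service URL that lists the blobs in a container.

    account and container are placeholders for your storage account and
    the container backing the HDInsight cluster.
    """
    params = {"restype": "container", "comp": "list"}
    if prefix:
        # Restrict the listing to blobs whose names start with this prefix.
        params["prefix"] = prefix
    return f"https://{account}.blob.core.windows.net/{container}?{urlencode(params)}"


if __name__ == "__main__":
    print(list_blobs_url("myaccount", "mycontainer"))
```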
HDInsight Interfaces
WebHCat (Templeton)
Ambari
Oozie
HBase
Sqoop
WebHDFS
Hadoop Command Line
Azure PowerShell
Azure Cross-Platform CLI
.NET SDK for Hadoop (more info): Cluster Management Library, Job Submission Library, Microsoft Avro Library, Map/Reduce Client, Linq to Hive Client, WebHCat Client, Oozie Client, Ambari Monitoring Client
ODBC Driver for Hive
Hive Editor
Power BI: Power Query, PowerPivot, Data Management Gateway
SQL Server Analysis Services
SQL Server Reporting Services
SQL Server Integration Services
SQL Server Database Engine
Analytics Platform System
System Center Operations Manager
Azure ML
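Many of the interfaces above ultimately wrap REST calls. As one example, the sketch below prepares a WebHCat (Templeton) request that would submit a Hive query to an HDInsight cluster. The cluster name, user, query, and status directory are hypothetical, and the request is only constructed, not sent; sending it would require an HTTP POST with the cluster's credentials.

```python
from urllib.parse import urlencode


def webhcat_hive_request(cluster, user, query, statusdir):
    """Return the (url, form_body) pair for a WebHCat Hive job submission.

    The URL pattern assumes an Azure HDInsight cluster reachable at
    <cluster>.azurehdinsight.net; on other distributions WebHCat is
    typically exposed directly on the cluster instead.
    """
    url = f"https://{cluster}.azurehdinsight.net/templeton/v1/hive"
    body = urlencode({
        "user.name": user,       # account the job runs as
        "execute": query,        # HiveQL statement to run
        "statusdir": statusdir,  # where WebHCat writes job output and logs
    })
    return url, body


if __name__ == "__main__":
    url, body = webhcat_hive_request(
        "mycluster", "admin", "SHOW TABLES;", "/example/status"
    )
    print(url)
    print(body)
```

The response to such a POST is a job identifier; the query output itself lands in the status directory, which is why that parameter is worth setting.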