Redmond Protocols Plugfest 2016 Casey Karst PolyBase in SQL Server 2016
Big Picture Provides a scalable, T-SQL language extension for combining data from both universes
PolyBase Use Cases
PolyBase Across the Enterprise SQL Product Load DataQuery DataAge Out Data HadoopWASBHadoopWASBHadoopWASB SQL Server 2016 YYYYYY Analytic Platform System (APS)Y YYYYY Azure SQL DW NYNNY
The Hadoop Ecosystem
Initially: MapReduce for insights from HDFS-resident data Recently: SQL-like data warehouse technologies on HDFS e.g. Hive, Impala, HAWQ, Spark/Shark Hadoop Evolution
All the interest in Big Data Increased number and variety of data sources that generate large quantities of data. Realization that data is “too valuable” to delete. Dramatic decline in the cost of hardware, especially storage.
PolyBase View
Step 1: Setup a Hadoop Cluster Hortonworks or Cloudera Distributions Hadoop 2.0 or above Linux or Windows On premise or in Azure
Or Azure Storage Account Azure Storage Blob (ASB) exposes an HDFS layer PolyBase reads and writes from ASB using Hadoop APIs No compute push-down support for ASB
Step 2: Install SQL Server Select PolyBase feature Adds new PolyBase services - PolyBase Engine - PolyBase Data Movement Service (DMS) Pre-requisite: download and install JRE
1. Install multiple SQL Server instances with PolyBase. Step 3: Scale-out 14 Head Node PolyBase Engine PolyBase DMS PolyBase Engine 2. Choose one as Head Node. 3. Configure remaining as Compute Nodes a.Run sp_polybase_join_group b.Restart PolyBase DMS
After Step 3 PolyBase Scale-out Group Head node is the SQL Server instance to which queries are submitted Compute nodes are used for scale out query processing for data in HDFS or Azure
Step 4 - Choose Hadoop flavor Latest Hadoop distributions supported in SQL16 RTM Cloudera CHD 5.5 on Linux Hortonworks 2.3 on Linux & Windows Server What happens under the covers? Loading the right client jars to connect to Hadoop distribution -- different numbers map to various Hadoop flavors -- example: value 4 stands for HDP 2.0 on Windows or ASB, value 5 for HDP 2.0 on Linux, value 6 for CHD 5.1/5.5 on Linux, value 7 for HDP 2.1/2.2/2.3 on Linux/Windows or ASB 7
After Step 4
PolyBase Design
Under-the-hood
Uses Hadoop RecordReaders/RecordWriters to read/write standard HDFS file types HDFS bridge in DMS
Under-the-hood
Namenode (HDFS) Hadoop Cluster File System Data moves between clusters in parallel SQL16
Under-the-hood
Creating External Tables Once per Hadoop Cluster Once per File Format HDFS File Path
Creating External Tables (secure Hadoop) Once per Hadoop User HDFS File Path Once per File Format Once per Hadoop Cluster per user
Under-the-hood