Published by Hope Lynch. Modified over 8 years ago.
1
Redmond Protocols Plugfest 2016 Casey Karst PolyBase in SQL Server 2016
2
Big Picture: PolyBase provides a scalable T-SQL language extension for combining data from both universes (relational data in SQL Server and data stored in Hadoop or Azure storage).
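To make "combining data from both universes" concrete, here is a minimal sketch of a single T-SQL query that joins a local table to an external table. The names dbo.Customers and dbo.WebClicks_Ext and their columns are hypothetical; the external table is assumed to have been created already over files in HDFS.

```sql
-- Hypothetical names: dbo.Customers is an ordinary SQL Server table,
-- dbo.WebClicks_Ext is a PolyBase external table over files in HDFS.
SELECT c.CustomerName,
       COUNT(*) AS ClickCount
FROM dbo.Customers AS c
JOIN dbo.WebClicks_Ext AS w
    ON w.CustomerId = c.CustomerId
GROUP BY c.CustomerName
ORDER BY ClickCount DESC;
```

The external table is referenced like any other table in the query.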
3
PolyBase Use Cases
4
PolyBase Across the Enterprise
SQL Product                      Load Data        Query Data       Age Out Data
                                 Hadoop  WASB     Hadoop  WASB     Hadoop  WASB
SQL Server 2016                  Y       Y        Y       Y        Y       Y
Analytic Platform System (APS)   Y       Y        Y       Y        Y       Y
Azure SQL DW                     N Y N N Y
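The three operations named above (load, query, age out) can be sketched in SQL Server 2016 T-SQL roughly as follows. All table names are hypothetical, and both external tables are assumed to exist already; Azure SQL DW and APS use their own statements for the same tasks.

```sql
-- Load: copy external data into a new local table.
-- (dbo.WebClicks_Ext is a hypothetical external table.)
SELECT *
INTO dbo.WebClicks_Local
FROM dbo.WebClicks_Ext;

-- Query: read the external data in place.
SELECT COUNT(*) AS ExternalRowCount
FROM dbo.WebClicks_Ext;

-- Age out: writing to an external table must be enabled first.
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

-- dbo.OrdersArchive_Ext is a hypothetical external table whose
-- columns match dbo.Orders; old rows are exported to Hadoop or WASB.
INSERT INTO dbo.OrdersArchive_Ext
SELECT *
FROM dbo.Orders
WHERE OrderDate < '2010-01-01';
```

The export only copies rows; removing them from dbo.Orders afterwards is a separate DELETE.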
5
The Hadoop Ecosystem
6
Hadoop Evolution. Initially: MapReduce for insights from HDFS-resident data. Recently: SQL-like data warehouse technologies on HDFS, e.g. Hive, Impala, HAWQ, Spark/Shark.
7
All the interest in Big Data: Increased number and variety of data sources that generate large quantities of data. Realization that data is “too valuable” to delete. Dramatic decline in the cost of hardware, especially storage.
8
PolyBase View
11
Step 1: Set up a Hadoop Cluster. Hortonworks or Cloudera distributions. Hadoop 2.0 or above. Linux or Windows. On premises or in Azure.
12
Or Azure Storage Account. Azure Storage Blob (ASB) exposes an HDFS layer. PolyBase reads and writes from ASB using Hadoop APIs. No compute push-down support for ASB.
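As a sketch of how an Azure storage account might be registered with PolyBase, the statements below create a credential holding the storage key and an external data source pointing at a blob container. The object names are hypothetical, and the bracketed placeholders must be replaced with real values. It is also assumed that the Hadoop connectivity setting is one that supports ASB.

```sql
-- A database master key is needed to protect the credential secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Hypothetical credential name. For Azure storage the SECRET is the
-- storage account key; the IDENTITY string is not used to authenticate.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user',
     SECRET = '<storage account key>';

-- Hypothetical data source name; ASB is addressed through the
-- Hadoop APIs, so the type is still HADOOP.
CREATE EXTERNAL DATA SOURCE AzureBlobStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);
```

No RESOURCE_MANAGER_LOCATION is given, which matches the slide's point that there is no compute push-down for ASB.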
13
Step 2: Install SQL Server. Select the PolyBase feature. This adds new PolyBase services: PolyBase Engine and PolyBase Data Movement Service (DMS). Pre-requisite: download and install the JRE.
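After setup, one quick way to check that the feature was installed on an instance is the server property below. This is a verification sketch, not part of the original slides.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance,
-- 0 when it is not.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
```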
14
Step 3: Scale-out
1. Install multiple SQL Server instances with PolyBase.
2. Choose one as Head Node.
3. Configure remaining as Compute Nodes:
a. Run sp_polybase_join_group
b. Restart PolyBase DMS
[Diagram: Head Node with PolyBase Engine and PolyBase DMS]
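Step 3a might look like the following when run on each instance that should become a compute node. The machine name HEADNODE01 is hypothetical; 16450 is the default DMS control channel port and MSSQLSERVER is the default instance name, both of which are assumptions about the head node's setup.

```sql
-- Run on each compute node, not on the head node.
-- HEADNODE01 is a hypothetical machine name for the head node.
EXEC sp_polybase_join_group
    @head_node_address = N'HEADNODE01',
    @dms_control_channel_port = 16450,
    @head_node_sql_server_instance_name = N'MSSQLSERVER';

-- Step 3b is done outside T-SQL: restart the PolyBase Data Movement
-- Service on this machine (for example from the Services console).
```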
15
After Step 3: PolyBase Scale-out Group. The head node is the SQL Server instance to which queries are submitted. Compute nodes are used for scale-out query processing for data in HDFS or Azure.
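To confirm the group membership, the PolyBase dynamic management views can be queried on the head node. This is a sketch for verification; the rows returned depend on the group.

```sql
-- One row per node in the scale-out group (head and compute nodes).
SELECT * FROM sys.dm_exec_compute_nodes;

-- Status information reported for each node.
SELECT * FROM sys.dm_exec_compute_node_status;
```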
16
Step 4 - Choose Hadoop flavor. Latest Hadoop distributions supported in SQL16 RTM: Cloudera CDH 5.5 on Linux; Hortonworks 2.3 on Linux & Windows Server. What happens under the covers? The right client jars are loaded to connect to the Hadoop distribution. Different numbers map to various Hadoop flavors, for example: value 4 stands for HDP 2.0 on Windows or ASB, value 5 for HDP 2.0 on Linux, value 6 for CDH 5.1/5.5 on Linux, value 7 for HDP 2.1/2.2/2.3 on Linux/Windows or ASB.
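The flavor is selected with the 'hadoop connectivity' server configuration option. A minimal sketch, assuming value 7 from the mapping above is the one that matches the target cluster:

```sql
-- Pick the value that matches the Hadoop distribution (7 is assumed here).
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- Inspect the configured value and the value currently in use.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'hadoop connectivity';
```

The new value takes effect only after SQL Server and the PolyBase services are restarted.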
17
After Step 4
18
PolyBase Design
19
Under-the-hood
20
HDFS bridge in DMS: uses Hadoop RecordReaders/RecordWriters to read/write standard HDFS file types.
21
Under-the-hood
22
Data moves between clusters in parallel. [Diagram: SQL16; Hadoop Cluster with Namenode (HDFS) and File System]
23
Under-the-hood
24
Creating External Tables: an external data source (once per Hadoop cluster), an external file format (once per file format), and the external table itself (HDFS file path).
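The three objects can be sketched as below. Every object name, the column list, the HDFS path, and the port numbers (8020 for the NameNode, 8050 for the resource manager) are assumptions for illustration and must be adapted to the actual cluster.

```sql
-- Once per Hadoop cluster. Giving RESOURCE_MANAGER_LOCATION allows
-- computation to be pushed down to Hadoop.
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://<namenode host>:8020',
    RESOURCE_MANAGER_LOCATION = '<resource manager host>:8050'
);

-- Once per file format (here: comma-delimited text).
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- One per data set; LOCATION is the HDFS file or directory path.
CREATE EXTERNAL TABLE dbo.WebClicks_Ext (
    CustomerId INT NOT NULL,
    Url        NVARCHAR(400) NOT NULL,
    ClickTime  DATETIME2 NOT NULL
)
WITH (
    LOCATION = '/data/webclicks/',
    DATA_SOURCE = MyHadoopCluster,
    FILE_FORMAT = CsvFormat
);
```

Only the table definition is stored in SQL Server; the data stays in HDFS.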
25
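Creating External Tables (secure Hadoop): a credential (once per Hadoop user), an external data source (once per Hadoop cluster per user), an external file format (once per file format), and the external table itself (HDFS file path).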
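For a secured cluster, the additional pieces are a database scoped credential and a data source that references it. The sketch below uses hypothetical object names and placeholder values, and assumes the Kerberos settings for PolyBase have already been configured on the SQL Server machine.

```sql
-- A database master key is needed to protect the credential secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Once per Hadoop user (hypothetical credential name).
CREATE DATABASE SCOPED CREDENTIAL HadoopUser1
WITH IDENTITY = '<hadoop user name>',
     SECRET = '<hadoop password>';

-- Once per Hadoop cluster per user: the data source carries the credential.
CREATE EXTERNAL DATA SOURCE MySecureHadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://<namenode host>:8020',
    CREDENTIAL = HadoopUser1
);
```

The file format and external table are then created as for an unsecured cluster, with the table's DATA_SOURCE set to this credentialed data source.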
26
Under-the-hood