PolyBase: T-SQL Reaching Beyond the Database

1 PolyBase: T-SQL Reaching Beyond the Database
DA336a Casey Karst

2 Agenda
5/12/2018 2:26 PM
- Logical Data Warehouse
- Enterprise Data Warehouse
- Data Virtualization in SQL Server
- Scalable data loads to Azure SQL DW
© 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

3 The Basics

4 What is PolyBase?
PolyBase provides a scalable T-SQL language extension for combining data from both universes: SQL and Hadoop.

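5 State of PolyBase today across the SQL platform
Table: support for external sources (Cloudera, HortonWorks, HDInsight (HDI), Azure Blob Storage (WASB), Azure Data Lake Store (ADLS)) by platform (SQL DW, SQL Server 2016, APS, SQL DB); the individual cell values (Yes, No, Yes*) are not recoverable.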

6 Why

7 Trends, Problems, & Solution
Trends:
- Data exploding in volume: sensors, devices, and apps are causing exponential data growth
- Data expanding in variety: JSON, XML, relational, columnar files, etc.
- Data proliferating across data stores: purpose-built data stores, acquisitions and mergers, cloud and on-premises
Current state: ETL to a central data warehouse
Problems:
- Costly custom development and maintenance
- Hinders ad-hoc exploratory analysis
- Delays time-to-insight
Proposed solution: ETL + data virtualization
Benefits:
- Enables ad-hoc data exploration
- Enables on-demand data integration
- Reduces time-to-insight

8 How

9 Under-the-hood
- Read/write arbitrary Hadoop file formats, e.g. Text, RCFile, ORC, Parquet
- Parallelize data transfers between DW nodes and HDFS data nodes
- Impose structure on semi-structured data using the external table concept
- Exploit the compute resources of Hadoop clusters with push-down computation

10 Single-node SQL Server instance
SQL Server 2016 runs two PolyBase Windows services alongside the database engine:
- PolyBase Engine: responsible for distributed query processing
- PolyBase DMS (Data Movement Service): responsible for moving data between the external source and SQL Server

11 HDFS Bridge in DMS
Diagram: SQL Server instance, DMS with its HDFS Bridge, Hadoop cluster.
The HDFS Bridge uses Hadoop RecordReaders/RecordWriters to read and write standard HDFS file types.
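How can several readers share one text file on HDFS without losing or duplicating records? Below is a simplified Python model of the split rules that Hadoop's line-oriented record reader (LineRecordReader) applies to text files. It is only an illustration: the real bridge runs Hadoop's own Java RecordReaders, and `read_split` and `make_splits` are hypothetical helper names.

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the records (lines) owned by the byte range [start, start + length).

    Simplified model of Hadoop's LineRecordReader rules:
    * a split that does not begin at offset 0 throws away everything up to and
      including the first newline at or after `start`, because the previous
      split reads that line;
    * a split keeps reading while the next line *starts* at or before its end
      offset, so a line that crosses the boundary is still read exactly once.
    """
    end = start + length
    pos = start
    if start != 0:
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])
        pos = stop + 1  # step past the newline
    return records


def make_splits(size: int, split_size: int) -> list[tuple[int, int]]:
    """Cut a file of `size` bytes into consecutive (start, length) splits."""
    return [(s, min(split_size, size - s)) for s in range(0, size, split_size)]
```

Every split after the first skips its partial leading line, and every split finishes the line it is in the middle of, so each record is read exactly once wherever the byte boundaries fall. That property is what lets independent readers work on one file in parallel.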

13 PolyBase Scale-out Group
Why did we build this? Moving data takes time, and every query moves some data (even with push-down).
What does it do?
- Parallel reads from external data sources for faster throughput
- Scaled-out local execution of queries (e.g. partial aggregates, joins on external tables)
Diagram: a head node (SQL Server 2016, PolyBase Engine, PolyBase DMS) and compute nodes.
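The "partial aggregates" bullet describes two-phase aggregation: each compute node aggregates its own share of the external data, and the head node only merges the small partial results. A minimal Python sketch of the idea for AVG, assuming rows are (key, value) pairs already split into per-node shards; the thread pool merely stands in for compute nodes and is not how PolyBase schedules work, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Compute-node step: running (sum, count) per key for the local shard."""
    acc = {}
    for key, value in rows:
        total, count = acc.get(key, (0.0, 0))
        acc[key] = (total + value, count + 1)
    return acc


def final_aggregate(partials):
    """Head-node step: merge the partial (sum, count) pairs, then divide."""
    merged = {}
    for part in partials:
        for key, (total, count) in part.items():
            m_total, m_count = merged.get(key, (0.0, 0))
            merged[key] = (m_total + total, m_count + count)
    return {key: total / count for key, (total, count) in merged.items()}


def scaled_out_avg(shards):
    """AVG(value) GROUP BY key, one worker per shard standing in for a compute node."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    return final_aggregate(partials)
```

The nodes ship sums and counts rather than per-node averages because averaging averages is wrong when shards differ in size: with shards `[(2, 60.0)]` and `[(2, 40.0), (2, 50.0)]`, the mean of the two local averages is 52.5, while the true average is 50.0.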

14 Data moves between clusters in parallel
Diagram: SQL Server 2016 head node (PolyBase Engine, PolyBase DMS); Hadoop cluster with its NameNode (HDFS) and data nodes (file system).

16 Create External Tables
-- Once per Hadoop cluster (the host names are missing from the original slide; supply your own)
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs:// :8020',
    RESOURCE_MANAGER_LOCATION = ' :8050'  -- optional; enables push-down to the cluster
);

-- Once per file format
CREATE EXTERNAL FILE FORMAT TextFile
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec',
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
);

-- LOCATION is the HDFS file path
CREATE EXTERNAL TABLE [dbo].[Customer] (
    [SensorKey]   int   NOT NULL,
    [CustomerKey] int   NOT NULL,
    [Speed]       float NOT NULL
)
WITH (
    LOCATION = '//Sensor_Data//May2014/sensordata.tbl',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = TextFile
);
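To make the file format options concrete, here is a rough Python model of what reading that file involves: gunzip the payload, split each line on the `|` field terminator, and convert each field to the column type of the external table. With USE_TYPE_DEFAULT = TRUE a missing value becomes the type's default (0 for numeric columns, an empty string for string columns) instead of NULL. `SCHEMA`, `TYPE_DEFAULTS`, and `read_delimited` are hypothetical names; the sketch does not reproduce PolyBase's reader or its rejection handling.

```python
import gzip

# Hypothetical column list mirroring the slide's [dbo].[Customer] external table.
SCHEMA = [("SensorKey", int), ("CustomerKey", int), ("Speed", float)]

# Defaults used for missing values when USE_TYPE_DEFAULT = TRUE.
TYPE_DEFAULTS = {int: 0, float: 0.0, str: ""}


def read_delimited(raw: bytes, schema, terminator="|", use_type_default=True):
    """Decompress a gzip payload and turn each delimited line into a typed row."""
    text = gzip.decompress(raw).decode("utf-8")
    rows = []
    for line in text.splitlines():
        fields = line.split(terminator)
        row = {}
        for i, (name, typ) in enumerate(schema):
            value = fields[i] if i < len(fields) else ""
            if value == "":
                # Missing value: type default if USE_TYPE_DEFAULT is on, else NULL (None).
                row[name] = TYPE_DEFAULTS[typ] if use_type_default else None
            else:
                row[name] = typ(value)
        rows.append(row)
    return rows
```

For example, `read_delimited(gzip.compress(b"2|101|\n"), SCHEMA)` yields a row whose Speed is 0.0, because the third field is empty.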

17 Connecting and querying a CDH cluster
SQL VM and CDH IaaS cluster in Azure

19 Push-down: How it works
Illustration: a sample SALESDB.CUSTOMER table (columns User, Location, Product, Sentiment, Rtwt, Hour, Date) annotated with dynamic binding, column filtering, and row filtering for the query:
SELECT [User], Product, Sentiment
FROM Customer
WHERE [Hour] = DATEPART(HOUR, DATEADD(HOUR, -1, GETDATE()))
  AND [Date] = CAST(DATEADD(HOUR, -1, GETDATE()) AS date)
  AND Sentiment > 0;
Column filtering means only User, Product, and Sentiment are returned; row filtering means only rows that satisfy the WHERE clause leave the source.
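The effect of column and row filtering can be modeled in a few lines of Python. The sketch compares how many values cross the wire for the same query with and without push-down. The `query` helper, the predicate, and the sample rows are hypothetical (they loosely reuse the slide's column names), and the model ignores the cost-based decision PolyBase makes about whether to push down at all.

```python
def query(rows, columns, predicate, pushdown):
    """Evaluate SELECT <columns> FROM rows WHERE <predicate>.

    Returns (result, cells_moved), where cells_moved counts the values that
    travel from the external source to SQL Server.
    """
    if pushdown:
        # Row and column filtering run at the source; only the survivors move.
        shipped = [{c: r[c] for c in columns} for r in rows if predicate(r)]
        return shipped, sum(len(r) for r in shipped)
    # No push-down: every row and column moves, then SQL Server filters locally.
    shipped = [dict(r) for r in rows]
    result = [{c: r[c] for c in columns} for r in shipped if predicate(r)]
    return result, sum(len(r) for r in shipped)


# Hypothetical sample data using a subset of the slide's columns.
ROWS = [
    {"User": "u1", "Location": "CA", "Product": "xbox", "Sentiment": 1, "Hour": 13},
    {"User": "u2", "Location": "WA", "Product": "excel", "Sentiment": -1, "Hour": 13},
    {"User": "u3", "Location": "CO", "Product": "sqls", "Sentiment": 5, "Hour": 9},
]
COLUMNS = ["User", "Product", "Sentiment"]


def last_hour_positive(r):
    """Stand-in for the WHERE clause, with 13 as a hypothetical 'previous hour'."""
    return r["Hour"] == 13 and r["Sentiment"] > 0
```

Both paths return the same answer; only the amount of data moved differs (3 values instead of 15 for these rows), which is the whole point of push-down.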

20 Push-down: Push-able operations
Applies to PolyBase in SQL Server; sources compared: Hadoop (CDH, HDP) and Azure Blob Storage.
- Column projections: YES for Hadoop, NO for Azure Blob Storage
- LIMIT
- Predicates
- Aggregates: PARTIAL
- Homogeneous joins

21 Agenda: Scalable data loads to Azure SQL DW

22 Single-gated client
Diagram: one client loads through the control node, which passes the data on to the compute nodes (each with DMS and a bridge).

23 Single-gated client, parallelised
Diagram: several clients load at the same time, but each still goes through the control node before the data reaches the compute nodes (DMS, bridge).

24 Parallel Loading with PolyBase to ASB
Diagram: the DMS bridge on every compute node reads from Azure Storage Blob (ASB) directly and in parallel, coordinated by the control node.

25 Parallel Loading with PolyBase to ADLS
Diagram: the DMS bridge on every compute node reads from Azure Data Lake Store (ADLS) directly and in parallel, coordinated by the control node.
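A back-of-the-envelope model helps explain why these diagrams emphasize many parallel readers. Assume, as a simplification, that each file is read start to finish by a single reader (as with gzip-compressed text, which cannot be split) and that every reader sustains the same throughput. Wall-clock load time is then set by the busiest reader, so one large file leaves most compute nodes idle. The round-robin assignment, the function names, and the throughput figures are hypothetical, not measured PolyBase numbers.

```python
def assign_round_robin(file_sizes_mb, readers):
    """Hypothetical scheduler: file i goes to reader i % readers."""
    buckets = [[] for _ in range(readers)]
    for i, size in enumerate(file_sizes_mb):
        buckets[i % readers].append(size)
    return buckets


def estimated_load_seconds(file_sizes_mb, readers, mb_per_sec_per_reader):
    """Wall-clock estimate: the load finishes when the busiest reader finishes."""
    buckets = assign_round_robin(file_sizes_mb, readers)
    busiest_mb = max(sum(b) for b in buckets)
    return busiest_mb / mb_per_sec_per_reader
```

Under these assumptions, with 60 readers at 10 MB/s each, 6,000 MB stored as a single file takes 600 seconds, while the same data split into sixty 100 MB files takes 10 seconds.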

26 ADF (Azure Data Factory) Performance Comparison (MB/s)

27 ADF Performance Comparison (MB/s)

28 Big Data Warehouse enriches and structures data
Microsoft Build 2016
Diagram: unknown-value data (XML, JSON, text) passes through preparation (pre-process, transpose, re-format), is loaded, then transformed and aggregated into high-value data that is consumed in batch and ad hoc.

29 Demo: Build out a Star Schema Data Warehouse

30 Taxi Cab Star Schema (row counts)
- Fact_Trip: 170,261,328
- Dim_HackneyLicense: 42,958
- Dim_Date: 5,844
- Dim_Weather: 526,330
- Dim_Geography: 304,129
- Dim_Medallion: 13,668
- Dim_Time: 86,400
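One of these counts can be checked with simple arithmetic: Dim_Time's 86,400 equals 24 × 60 × 60, the number of seconds in a day, which suggests a time dimension at one-second grain. That grain is an assumption (the slide does not state it), and the generator and column names below are hypothetical.

```python
def build_dim_time():
    """Hypothetical second-grain time dimension: one row per second of a day."""
    rows = []
    for key in range(24 * 60 * 60):  # 86,400 seconds in a day
        hour, rem = divmod(key, 3600)
        minute, second = divmod(rem, 60)
        rows.append({"TimeKey": key, "Hour": hour, "Minute": minute, "Second": second})
    return rows
```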

31 Common Loading Scenarios
- Load a small dimension table
- Load a large fact table
- Reload a dimension table after source updates

32 Scenario: Load a large fact table
Considerations:
- Resiliency
- Directory granularity
- Optimize for load speed
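One way to read "resiliency" together with "directory granularity": if the large fact table is loaded in independent units (for example one directory per day), a failure costs only a retry of that unit rather than a restart of the whole load. A minimal Python sketch of such a retry loop; `load_with_retries`, the unit names, and the assumption that transient failures surface as `OSError` are all hypothetical.

```python
def load_with_retries(units, load_unit, max_attempts=3):
    """Load each unit (e.g. one day's directory) independently, retrying failures.

    A failure in one unit never forces a reload of units that already
    succeeded. Returns (loaded, failed) lists in input order.
    """
    loaded, failed = [], []
    for unit in units:
        for attempt in range(1, max_attempts + 1):
            try:
                load_unit(unit)
            except OSError:
                # Assumed transient (e.g. a storage hiccup): retry until out of attempts.
                if attempt == max_attempts:
                    failed.append(unit)
                continue
            loaded.append(unit)
            break
    return loaded, failed
```

Finer directories make each retry cheaper, at the cost of more units to track; the failed list is exactly what needs to be rerun.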

33 Demo: Build out a Star Schema Data Warehouse

34 Large Fact Table file structure
ADLS
  Table Name
    2012
      Month (1-12)
        Day (1-31)
    2013
      Month (1-12)
        Day (1-31)
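The folder layout on this slide gives each calendar day its own directory under the table name and year, so a loader can enumerate the day folders for a date range and load (or reload) them one at a time. A Python sketch that builds those relative paths; the function name, the use of Fact_Trip as the table, and folder names without zero-padding are assumptions based on the Month (1-12) and Day (1-31) labels.

```python
from datetime import date, timedelta


def day_directories(table: str, first: date, last: date) -> list[str]:
    """Relative folder <table>/<year>/<month>/<day> for every day, inclusive."""
    paths = []
    day = first
    while day <= last:
        paths.append(f"{table}/{day.year}/{day.month}/{day.day}")
        day += timedelta(days=1)
    return paths
```

For 2012 and 2013 this yields 731 folders (2012 is a leap year), each small enough to load or retry on its own.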

35 Think about your loading as:
- Extract: cook data in the storage layer
- Load: optimize for speed and resiliency
- Transform: use SQL DW to transform your data into production tables

36 Continue your Ignite learning path
- Visit Channel 9 to access a wide range of Microsoft training and event recordings
- Head to the TechNet Eval Centre to download trials of the latest Microsoft products
- Visit Microsoft Virtual Academy for free online training

37 Thank you
Chat with me in the Speaker Lounge
Find me

