PolyBase: T-SQL Reaching Beyond the Database

Presentation transcript:

PolyBase: T-SQL Reaching Beyond the Database
  Session DA336a
  Casey Karst

Agenda
  Logical Data Warehouse
  Enterprise Data Warehouse
  Data Virtualization in SQL Server
  Scalable data loads to Azure SQL DW
5/12/2018 2:26 PM © 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

The Basics

What is PolyBase?
  PolyBase provides a scalable T-SQL language extension for combining data from both universes: SQL and Hadoop.

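State of PolyBase today across the SQL platform
  Platforms compared: SQL DW, SQL Server 2016, APS, SQL DB
  External sources compared: Cloudera, HortonWorks, HDI, WASB, ADLS
  (Support matrix with Yes / No / Yes* cells; the per-cell values did not survive in this transcript.)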

Why

Trends, Problems, & Solution
  Trends:
    Data exploding in volume: sensors, devices, and apps causing exponential data growth
    Data expanding in variety: JSON, XML, relational, columnar files, etc.
    Data proliferating across data stores: purpose-built data stores, acquisitions & mergers, cloud & on-premises
  Current state: ETL to a central data warehouse
    Problems:
      Costly custom development & maintenance
      Hinders ad-hoc exploratory analysis
      Delays time-to-insight
  Proposed solution: ETL + Data Virtualization
    Benefits:
      Enables ad-hoc data exploration
      Enables on-demand data integration
      Reduces time-to-insight

How

Under-the-hood
  Read/write arbitrary Hadoop file formats, e.g. Text, RCFILE, ORC, Parquet
  Parallelizing data transfers between DW nodes and HDFS data nodes
  Imposing structure on semi-structured data using the external table concept
  Exploiting compute resources of Hadoop clusters with push-down computation

Single Node SQL Server instance
  PolyBase Engine: Windows service responsible for distributed query processing.
  PolyBase DMS: Windows service responsible for moving data between the external source and SQL Server.
  (Diagram: SQL Server 2016 instance with the PolyBase Engine and PolyBase DMS services.)
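A minimal sketch of checking and configuring a single-node instance before using the two services described above. The connectivity value 7 is an assumption chosen for illustration; the correct value depends on the Hadoop distribution and version being targeted.

```sql
-- Returns 1 if the PolyBase feature is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Select the Hadoop connectivity level.
-- The value 7 is an assumption; look up the value for your distribution.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- The new setting takes effect only after SQL Server and the PolyBase
-- Engine and PolyBase DMS services have been restarted.
```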

HDFS bridge in DMS
  Uses Hadoop RecordReaders/RecordWriters to read/write standard HDFS file types.
  (Diagram: SQL Server instance with the HDFS bridge, connected to the Hadoop cluster.)

Under-the-hood
  Read/write arbitrary Hadoop file formats, e.g. Text, RCFILE, ORC, Parquet
  Parallelizing data transfers between DW nodes and HDFS data nodes
  Imposing structure on semi-structured data using the external table concept
  Exploiting compute resources of Hadoop clusters with push-down computation
5/12/2018 © 2015 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

PolyBase Scale-out Group
  Why did we build this?
    Moving data takes time.
    All queries move some data (even with pushdown).
  What does this do?
    Parallel reads from external data sources for faster throughput
    Scaled-out local execution of queries (e.g. partial aggregates, joins on external tables)
  (Diagram: SQL Server 2016 head node and compute nodes, with PolyBase Engine and PolyBase DMS.)
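As a rough sketch of how a compute node joins a scale-out group in SQL Server 2016: the machine name HEADNODE01 and the instance name MSSQLSERVER are hypothetical, and 16450 is assumed to be the DMS control channel port in use on the head node.

```sql
-- Run on each compute node that should join the group.
-- Arguments: head node machine name (hypothetical), DMS control channel
-- port on the head node, and the head node's SQL Server instance name.
EXEC sp_polybase_join_group N'HEADNODE01', 16450, N'MSSQLSERVER';

-- Restart the PolyBase Engine and PolyBase DMS services on the compute
-- node afterwards so the change takes effect.

-- Run on the head node to list the nodes that are part of the group.
SELECT * FROM sys.dm_exec_compute_nodes;
```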

Data moves between clusters in parallel
  (Diagram: SQL Server 2016 head node with PolyBase Engine and PolyBase DMS; Hadoop cluster with Namenode (HDFS) and data nodes with their file systems.)


Create External Tables

-- Once per Hadoop cluster
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.193.26.177:8020',
    RESOURCE_MANAGER_LOCATION = '10.193.26.178:8050'
);

-- Once per file format
CREATE EXTERNAL FILE FORMAT TextFile
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec',
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
);

-- Once per external table; LOCATION is the HDFS file path
CREATE EXTERNAL TABLE [dbo].[Customer] (
    [SensorKey] int NOT NULL,
    [CustomerKey] int NOT NULL,
    [Speed] float NOT NULL
)
WITH (
    LOCATION = '//Sensor_Data//May2014/sensordata.tbl',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = TextFile
);
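Once the three objects above exist, the external table can be queried like any other table. The sketch below assumes the dbo.Customer external table from the slide; the local table dbo.CustomerDim, its CustomerName column, and the speed threshold of 65 are hypothetical and only illustrate joining HDFS data with relational data.

```sql
-- Ad hoc query against the HDFS-backed external table.
SELECT TOP (10) SensorKey, CustomerKey, Speed
FROM dbo.Customer
WHERE Speed > 65;

-- Join external (Hadoop) data with a local SQL Server table.
-- dbo.CustomerDim and CustomerName are hypothetical.
SELECT d.CustomerName,
       AVG(s.Speed) AS AvgSpeed
FROM dbo.CustomerDim AS d
JOIN dbo.Customer AS s
    ON s.CustomerKey = d.CustomerKey
GROUP BY d.CustomerName;
```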

Connecting and querying a CDH cluster
  SQL VM and CDH IaaS cluster in Azure


Push-down: How it works
  Concepts illustrated: dynamic binding, column filtering, row filtering
  Example table: SALESDB.CUSTOMER
  Columns: User, Location, Product, Sentiment, Rtwt, Hour, Date
  Sample values as extracted (row layout not preserved):
    User: Sean, Suz, Audie, Tom, Sanjay, Roger, Steve
    Location: CA, WA, CO, IL, MN, TX, AL
    Product: xbox, excel, sqls, wp8, ssas, ssrs
    Numeric and date cells: -1 1 5 8 16 7 11 2-8-17
  Query:
    SELECT User, Product, Sentiment
    FROM Customer
    WHERE Hour = Current - 1
      AND Date = Today
      AND Sentiment > 0;

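Push-down: Push-able operations (applies to PolyBase in SQL Server)
  Sources compared: Hadoop (CDH, HDP); Azure Blob Storage
  Operations compared: column projections, limit, predicates, aggregates, homogeneous joins
  (Support matrix with YES / NO / PARTIAL cells; the per-cell values did not survive in this transcript.)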
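Push-down can also be controlled per query with a hint in SQL Server 2016. The sketch below reuses the dbo.Customer external table defined earlier in the deck; the threshold of 65 is an arbitrary illustrative value. Push-down to Hadoop is only possible when the external data source specifies RESOURCE_MANAGER_LOCATION, as the HadoopCluster source does.

```sql
-- Ask PolyBase to push the filter and partial aggregation down to Hadoop.
SELECT CustomerKey, COUNT(*) AS Readings
FROM dbo.Customer
WHERE Speed > 65
GROUP BY CustomerKey
OPTION (FORCE EXTERNALPUSHDOWN);

-- Same query, but all rows are streamed into SQL Server and the
-- filter and aggregation run locally.
SELECT CustomerKey, COUNT(*) AS Readings
FROM dbo.Customer
WHERE Speed > 65
GROUP BY CustomerKey
OPTION (DISABLE EXTERNALPUSHDOWN);
```

Comparing the two runs on the same data is a simple way to see whether push-down pays off for a given query shape.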

Agenda
  Logical Data Warehouse
  Enterprise Data Warehouse
  Data Virtualization in SQL Server
  Scalable data loads to Azure SQL DW

Single Gated Client
  (Diagram labels: Control Node, Compute Nodes, DMS, Bridge.)

Single Gated Client Parallelised
  (Diagram labels: Control Node, Compute Nodes, DMS, Bridge, Client.)

Parallel Loading with PolyBase to ASB
  (Diagram labels: Control Node with DMS; Compute Nodes, each with DMS and Bridge; Azure Storage Blob (ASB).)

Parallel Loading with PolyBase to ADLS
  (Diagram labels: Control Node with DMS; Compute Nodes, each with DMS and Bridge; Azure Data Lake Store (ADLS).)

ADF Performance Comparison (MB/s)
  (Chart; the plotted values did not survive in this transcript.)

Big Data Warehouse enriches and structures data
  (Diagram labels: Unknown value data, High value data; XML, JSON, TEXT; Preparation: Pre-process, Transpose, Re-format; Load, Transform, Aggregate, Consume; Batch, Batch, Ad-hoc.)

Demo: Build out a Star Schema Data Warehouse

Taxi Cab Star Schema (table, row count)
  Fact_Trip: 170,261,328
  Dim_Weather: 526,330
  Dim_Geography: 304,129
  Dim_Time: 86,400
  Dim_HackneyLicense: 42,958
  Dim_Medallion: 13,668
  Dim_Date: 5,844

Common Loading Scenarios
  Load a small dimension table
  Load a large fact table
  Reload a dimension table after source updates

Scenario: Load of a large fact table
  Considerations:
    Resiliency
    Directory granularity
    Optimize for load speed

Demo: Build out a Star Schema Data Warehouse

Large Fact Table file structure
  ADLS
    Table Name
      Year (2012, 2013)
        Month (1-12)
          Day (1-31)
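One way to map a directory in this layout to an external table is sketched below. Everything named here is an assumption for illustration: the data source AzureDataLakeStore and the file format TextFile are presumed to exist already, and the table name, column list, and directory path are hypothetical. Pointing LOCATION at a directory rather than a single file makes PolyBase read the files under that directory, including its subdirectories, so the chosen directory level sets the granularity of each load.

```sql
-- External table over one month of the fact data (hypothetical names and path).
CREATE EXTERNAL TABLE dbo.Ext_Fact_Trip_2013_01 (
    MedallionKey      int   NOT NULL,
    HackneyLicenseKey int   NOT NULL,
    PickupDateKey     int   NOT NULL,
    TripDistance      float NOT NULL
)
WITH (
    LOCATION = '/Fact_Trip/2013/01/',   -- Table Name / Year / Month
    DATA_SOURCE = AzureDataLakeStore,   -- assumed to exist
    FILE_FORMAT = TextFile,             -- assumed to exist
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0                    -- fail the query on the first bad row
);
```

Loading one month (or one day) at a time keeps a failed load cheap to retry, which is the resiliency consideration raised for large fact tables.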

Think about your loading as:
  Extract: Cook data in the storage layer
  Load: Optimize for speed and resiliency
  Transform: Use SQL DW to transform your data into production tables
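The load and transform steps can be sketched in Azure SQL DW with CREATE TABLE AS SELECT (CTAS). The external table dbo.Ext_Fact_Trip, the staging and production table names, the column names, and the filter are all hypothetical; the pattern is a fast load into a round-robin heap followed by a transform into a hash-distributed columnstore table.

```sql
-- Load: pull external data into a staging heap as fast as possible.
CREATE TABLE dbo.Stage_Fact_Trip
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
SELECT *
FROM dbo.Ext_Fact_Trip;   -- hypothetical external table

-- Transform: reshape the staged rows into a production-style table.
CREATE TABLE dbo.Fact_Trip_New
WITH (DISTRIBUTION = HASH(MedallionKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT MedallionKey,
       HackneyLicenseKey,
       PickupDateKey,
       TripDistance
FROM dbo.Stage_Fact_Trip
WHERE TripDistance > 0;   -- illustrative cleansing rule
```

Staging into a heap first keeps the load step free of index maintenance; the cost of building the columnstore is paid once, in the transform step.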

Continue your Ignite learning path
  Visit Channel 9 to access a wide range of Microsoft training and event recordings: https://channel9.msdn.com/
  Head to the TechNet Eval Centre to download trials of the latest Microsoft products: http://Microsoft.com/en-us/evalcenter/
  Visit Microsoft Virtual Academy for free online training: https://www.microsoftvirtualacademy.com

Thank you
  Chat with me in the Speaker Lounge
  Find me at cakarst@microsoft.com