Inside SQL Server Polybase

Presentation on theme: "Inside SQL Server Polybase"— Presentation transcript:

1 Inside SQL Server Polybase
Bob Ward, Principal Architect, Microsoft
4/12/2019 4:42 AM
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 Session learning objectives
At the end of this session, you should be better able to:
- Understand what Polybase is and how to use it
- Understand how Polybase works and what it can do for data virtualization
- Use Polybase to build a data hub with SQL Server

3 What is SQL Server Polybase?
“It’s all about Data Virtualization”
- Distributed compute engine integrated with SQL Server
- Query data where it lives using T-SQL
- Distributed, scalable query performance
- Manual deployment with SQL Server
- Auto deploy/optimize with Big Data Clusters
Diagram: analytics and apps send T-SQL to SQL Server; PolyBase external tables reach ODBC, NoSQL, relational databases, and big data sources (Excel, Cosmos DB, HDFS). Caption: intelligence over all data.
Speaker note: stop and do a poll of which data sources are most important to their customers.

4 The Journey to Data Virtualization in SQL Server 2019
- MSFT Jim Gray Labs project to query “big data” with SQL in 2011 (pictured: David Dewitt, Rimma Nehme)
- Polybase comes to PDW 2012
- Polybase ships with SQL Server: Polybase “classic”
- Microsoft acquires Metanautix in 2015, bringing new connectors
- Project Aris commences in 2017 to take Polybase to the next level
- SQL Server 2019 includes Polybase classic, new ODBC data sources, and Big Data Clusters (BDC)
- Linux support is coming
SQL Server 2019 is in Preview. Some details are subject to change.

5 Using Polybase in SQL Server: T-SQL EXTERNAL TABLE
1. Set up and configure Polybase (not simple without BDC)
2. Set up authentication (login and password)
3. Create EXTERNAL DATA SOURCE
4. Create EXTERNAL FILE FORMAT (only for HDFS)
5. Create EXTERNAL TABLE (metadata only)
6. Create statistics on key columns
7. Query like any other table, and join to any other table or external table
Diagram: WWI SQL holds the metadata and results are streamed back to it; the data lives in the source (HDFS, Cosmos DB). INSERT is supported only for HDFS.
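The steps above can be sketched in T-SQL roughly as follows. This is a minimal sketch against a remote SQL Server source, assuming Polybase is already installed on a SQL Server 2019 instance. The host `wwi-legacy`, the credential, data source, and table names, and the column list are hypothetical placeholders, not taken from the session demos; the join at the end assumes the local WideWorldImporters sample database.

```sql
-- 1. Enable Polybase on the instance (SQL Server 2019 configuration option).
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
GO

-- 2. Authentication: a database master key protects the credential.
--    Skip CREATE MASTER KEY if the database already has one.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
CREATE DATABASE SCOPED CREDENTIAL LegacySqlCredential      -- hypothetical name
WITH IDENTITY = '<remote login>', SECRET = '<remote password>';
GO

-- 3. External data source: the LOCATION prefix selects the connector.
CREATE EXTERNAL DATA SOURCE LegacySqlServer                 -- hypothetical name
WITH (
    LOCATION   = 'sqlserver://wwi-legacy:1433',             -- hypothetical host
    PUSHDOWN   = ON,
    CREDENTIAL = LegacySqlCredential
);
GO

-- 4. CREATE EXTERNAL FILE FORMAT is needed only for HDFS sources, so it is skipped here.

-- 5. External table: only the metadata is stored locally.
CREATE EXTERNAL TABLE dbo.LegacySuppliers                   -- hypothetical name
(
    SupplierID   INT NOT NULL,
    SupplierName NVARCHAR(100) COLLATE Latin1_General_100_CI_AS NOT NULL
)
WITH (
    LOCATION    = 'WideWorldImporters.Purchasing.Suppliers', -- <db>.<schema>.<table>
    DATA_SOURCE = LegacySqlServer
);
GO

-- 6. Statistics on a key column help the optimizer estimate remote row counts.
CREATE STATISTICS LegacySuppliers_SupplierID
ON dbo.LegacySuppliers (SupplierID) WITH FULLSCAN;
GO

-- 7. Query it like any other table, including joins to local tables.
SELECT s.SupplierName, COUNT(*) AS StockItemCount
FROM dbo.LegacySuppliers AS s
JOIN Warehouse.StockItems AS si ON si.SupplierID = s.SupplierID
GROUP BY s.SupplierName;
```

The character column carries an explicit COLLATE clause because external tables over relational sources generally need the collation stated; the value used here is only an example.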

6 The SQL Server Polybase Architecture
Diagram legend: control and execution; data flow.
- “Head” node (EE only): SQL Engine with DW dbs and tempdb, Polybase Engine, Polybase Data Movement Service
- “Compute” nodes (all editions): each with SQL Engine, DW dbs, tempdb, Polybase Engine, Polybase Data Movement Service
- The Polybase services run as mpdwsvc.exe
- Need more scale? Add compute nodes
- Against your data sources (Cosmos DB, HDFS): scan or pushdown, scale out with partitions, shuffle
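Adding a compute node to the head node can be sketched as below. This is a hedged sketch only: the machine name `HEADNODE01` and the instance name are hypothetical, 16450 is the default data movement control channel port, and the Polybase services on the compute node must be restarted afterwards (not shown).

```sql
-- Run on the instance that should become a compute node.
-- Arguments: head node address, DMS control channel port,
-- head node SQL Server instance name (all values hypothetical).
EXEC sp_polybase_join_group N'HEADNODE01', 16450, N'MSSQLSERVER';
GO

-- Run on the head node after the services restart:
-- lists the head node and every compute node in the group.
SELECT compute_node_id, type, name, address
FROM sys.dm_exec_compute_nodes;
```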

7 Demo Dive into Polybase
Follow the steps from

8 SQL Server Polybase and Hadoop
Polybase “Classic”
- Connect to Hadoop: Cloudera or HortonWorks on Windows and Linux
- Azure Blob Storage
- Direct streaming or Java MapReduce
- SQL Server or Azure SQL Data Warehouse
Polybase and HDFS in Big Data Clusters
- Direct access to HDFS via the SQL Server engine
- Hadoop cluster pre-installed with HDFS and Spark
- HDFS metadata handled within the cluster
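For the “classic” Hadoop path, a minimal sketch might look like the following. The host names, ports, HDFS path, object names, and columns are hypothetical, and the `hadoop connectivity` server option must already be set to the value that matches the distribution (not shown here).

```sql
-- External data source for a Hadoop cluster. RESOURCE_MANAGER_LOCATION is what
-- allows predicate pushdown through MapReduce; omit it to always stream directly.
CREATE EXTERNAL DATA SOURCE HadoopCluster                   -- hypothetical name
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020',          -- hypothetical host
    RESOURCE_MANAGER_LOCATION = 'namenode.example.com:8050' -- port varies by distribution
);
GO

-- File format: required for HDFS and blob storage sources.
CREATE EXTERNAL FILE FORMAT CsvFormat                       -- hypothetical name
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', USE_TYPE_DEFAULT = TRUE)
);
GO

CREATE EXTERNAL TABLE dbo.ClickstreamHdfs                   -- hypothetical table
(
    ClickDate DATE,
    UserId    INT,
    Url       VARCHAR(400)
)
WITH (
    LOCATION    = '/data/clickstream/',                     -- hypothetical HDFS directory
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = CsvFormat
);
GO

-- Ask for the MapReduce pushdown path explicitly ...
SELECT UserId, COUNT(*) AS Clicks
FROM dbo.ClickstreamHdfs
WHERE ClickDate >= '2019-01-01'
GROUP BY UserId
OPTION (FORCE EXTERNALPUSHDOWN);

-- ... or force direct streaming into SQL Server instead.
SELECT UserId, COUNT(*) AS Clicks
FROM dbo.ClickstreamHdfs
WHERE ClickDate >= '2019-01-01'
GROUP BY UserId
OPTION (DISABLE EXTERNALPUSHDOWN);
GO

-- INSERT is allowed only for HDFS-style sources and needs export enabled.
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;
GO
INSERT INTO dbo.ClickstreamHdfs
SELECT ClickDate, UserId, Url
FROM dbo.ClickstreamStaging;                                -- hypothetical local table
```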

9 Polybase and Other Connectors
The LOCATION “string” in EXTERNAL DATA SOURCE selects the connector.
Built-in connectors (use ODBC): sqlserver, oracle, teradata, mongodb
- Drivers are in \binn\Polybase\ODBC Drivers
- sqlserver covers SQL Server, Azure SQL Database, and Azure SQL Data Warehouse
- No client software install required
- Scale out with partitions
- mongodb covers MongoDB or CosmosDB (using MongoDB API)
ODBC connector: odbc
- You install the driver
- 64-bit, ODBC 3.0+ compliant
- Ex. SAP HANA (HDBCODBC Driver)
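The connector prefixes above might be used as in this sketch. Every host, port, credential, and data source name is a hypothetical placeholder, a database master key is assumed to exist already, and the generic ODBC connection string depends entirely on the driver you install, so it is left as a placeholder.

```sql
-- Credentials for each remote system (placeholder identities and secrets).
CREATE DATABASE SCOPED CREDENTIAL OracleCredential
WITH IDENTITY = '<oracle user>', SECRET = '<password>';
CREATE DATABASE SCOPED CREDENTIAL CosmosCredential
WITH IDENTITY = '<account name>', SECRET = '<account key>';
CREATE DATABASE SCOPED CREDENTIAL OdbcCredential
WITH IDENTITY = '<user>', SECRET = '<password>';
GO

-- Built-in connector: oracle
CREATE EXTERNAL DATA SOURCE OracleSales
WITH (LOCATION = 'oracle://oracle-host:1521', CREDENTIAL = OracleCredential);

-- Built-in connector: mongodb (also used for Cosmos DB through its MongoDB API).
-- The ssl option is an assumption about what the Cosmos DB endpoint requires.
CREATE EXTERNAL DATA SOURCE CosmosOrders
WITH (
    LOCATION = 'mongodb://<account>.documents.azure.com:10255',
    CONNECTION_OPTIONS = 'ssl=true',
    CREDENTIAL = CosmosCredential
);

-- Generic connector: odbc, using a 64-bit ODBC driver that you install yourself.
CREATE EXTERNAL DATA SOURCE GenericOdbcSource
WITH (
    LOCATION = 'odbc://odbc-host:30015',
    CONNECTION_OPTIONS = 'Driver={<installed ODBC driver name>}; ServerNode=odbc-host:30015',
    CREDENTIAL = OdbcCredential,
    PUSHDOWN = ON
);
```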

10 Polybase as a Semantic Layer
- Use an EXTERNAL table to map common names familiar to your SQL Server
- Exclude fields or columns from the data source that are not needed (planned)
- Use SQL Server views to abstract joins and data source access
- Use UNION to combine similar data from data sources and local SQL Server
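A view is one way to build such a layer. The sketch below assumes a local table `dbo.Suppliers` and an external table `dbo.LegacySuppliers` with the same two columns; both names, and the view name, are hypothetical.

```sql
-- The view hides where each row lives and gives callers one familiar shape.
CREATE VIEW dbo.AllSuppliers
AS
SELECT SupplierID   AS SupplierId,
       SupplierName AS Supplier,
       'Local'      AS SourceSystem
FROM dbo.Suppliers                -- local SQL Server table (hypothetical)
UNION ALL                         -- use UNION instead to remove duplicate rows
SELECT SupplierID,
       SupplierName,
       'Legacy'
FROM dbo.LegacySuppliers;         -- external table (hypothetical)
GO

-- Callers query the view and never reference the data source directly.
SELECT SourceSystem, COUNT(*) AS SupplierCount
FROM dbo.AllSuppliers
GROUP BY SourceSystem;
```

If the character columns on the two sides have different collations, one side needs an explicit COLLATE in the SELECT list for the UNION to resolve.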

11 External Tables vs Linked Servers
External Tables                           Linked Servers
Database scoped                           Instance scoped
ODBC drivers                              OLEDB providers
Read-only*                                Read/write
Scale-out queries with push-down          Single-threaded queries with push-down
Failover with AG                          Requires separate config from AG
Basic authentication                      Basic and integrated authentication
Distributed transactions not supported    Distributed transactions supported
* Insert into HDFS allowed
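The scoping difference shows up in how each object is defined and queried. In this sketch the linked server name `LEGACYSQL`, the host `wwi-legacy`, and the external table `dbo.LegacySuppliers` are hypothetical, and the MSOLEDBSQL provider is assumed to be installed.

```sql
-- Linked server: defined once per instance and addressed with a four-part name.
EXEC sp_addlinkedserver
    @server     = N'LEGACYSQL',
    @srvproduct = N'',
    @provider   = N'MSOLEDBSQL',
    @datasrc    = N'wwi-legacy';
GO
SELECT SupplierID, SupplierName
FROM LEGACYSQL.WideWorldImporters.Purchasing.Suppliers;

-- External table: defined inside one database and addressed like a local table.
SELECT SupplierID, SupplierName
FROM dbo.LegacySuppliers;

-- Where the metadata lives: database catalog views versus an instance-level view.
SELECT name, location FROM sys.external_data_sources;   -- current database only
SELECT name, location FROM sys.external_tables;         -- current database only
SELECT name, provider, data_source
FROM sys.servers
WHERE is_linked = 1;                                     -- whole instance
```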

12 SQL Server 2019: Data Virtualization
Diagram: SQL Server 2019 (WWI SQL, the WideWorldImporters database) at the center, connected to these data sets: Modern StockItems, Legacy Suppliers, Mobile App Orders, Accounts Receivable, Customers from Acquisition, Order Reviews.

13 Demo Data Virtualization around the WideWorld

14 Lessons Learned
- COLLATE is required for character columns. Compat may be required.
- Oracle is case sensitive for LOCATION = <instance>.<schema>.<table>
- LOCATION for SQL Server is <db>.<schema>.<table>
- EXTERNAL tables don’t support these types (there may be more): VARCHAR(MAX), GEOGRAPHY, computed columns, JSON
- MongoDB (CosmosDB) observations: be careful of types in your document; LOCATION = <database>.<collection>
- Need to dive into EXTERNAL TABLE compatibility (e.g. Row Level Security)
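Two of these lessons can be illustrated with external table definitions. This sketch assumes external data sources named `OracleSales` and `CosmosOrders` already exist; those names, the table and column names, the collation, and the LOCATION values are all hypothetical.

```sql
-- Oracle: LOCATION is <instance>.<schema>.<table> and is case sensitive,
-- so it is written here exactly as Oracle would store the (uppercase) names.
CREATE EXTERNAL TABLE dbo.AccountsReceivable
(
    ARID          INT NOT NULL,
    ARDESCRIPTION NVARCHAR(100) COLLATE Latin1_General_100_CI_AS NOT NULL  -- COLLATE on character columns
)
WITH (
    LOCATION    = 'XE.FINANCE.ACCOUNTSRECEIVABLE',
    DATA_SOURCE = OracleSales
);
GO

-- MongoDB / Cosmos DB: LOCATION is <database>.<collection>.
-- Column types have to agree with what the documents actually contain.
CREATE EXTERNAL TABLE dbo.MobileAppOrders
(
    [_id]       NVARCHAR(100) COLLATE Latin1_General_100_CI_AS NOT NULL,
    [OrderId]   INT NOT NULL,
    [OrderDate] NVARCHAR(100) COLLATE Latin1_General_100_CI_AS NOT NULL    -- kept as text to avoid type surprises
)
WITH (
    LOCATION    = 'wwi_mobile.orders',
    DATA_SOURCE = CosmosOrders
);
```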

15 SQL Server 2019 Big Data Clusters and Polybase
The Problem
- Customers want to integrate with Big Data and other data sources easily
- Polybase is not simple to set up, configure, maintain, and scale elastically
- Customers may not have a Hadoop cluster or want to build one
- Polybase “classic” (MapReduce) could be better
The Solution
- Kubernetes and containers to deploy and scale elastically
- Everything pre-installed, including an HDFS cluster
- Build a control plane to help manage and monitor
- Enhance SQL Server to read from HDFS natively
- Provide a data mart for cached results
- Introduce Spark and Notebooks for data scientists

16 Polybase in SQL Server 2019 Big Data Clusters
Diagram (labels recovered from the slide):
- Consumers: Analytics, Custom apps, BI
- Control Plane: Controller Svc, Azure FSM Engine, Configuration Store (SQL Server), Kibana, Grafana, Elastic Search, InfluxDB
- SQL Server master instance: Polybase head node, in a Linux container
- Compute pool: SQL Compute Nodes, the Polybase compute nodes, in Linux containers
- Storage pool: Kubernetes pods, each with SQL Server, Spark, and an HDFS Data Node; SQL Server reads directly from HDFS; persistent storage
- Data mart: SQL Data Nodes
- External data sources; IoT data
- “Built-in” data sources; MapReduce is not used
- mpdwsvc.exe uses SQLPAL
Note: the controller svc is important in this architecture. It is used by the cluster to direct and control access to the Data Mart and Storage Pool for the built-in data sources.

17 Storage and Data Pools Data Sources
Both data sources are preinstalled in the model database of the Master Instance. The LOCATION points at the mssql-controller REST endpoint.

Storage pool:
    CREATE EXTERNAL DATA SOURCE SqlStoragePool
    WITH (LOCATION = 'sqlhdfs://service-mssql-controller:8080');

    CREATE EXTERNAL TABLE …
    WITH (
        DATA_SOURCE = SqlStoragePool,
        LOCATION = '/clickstream_data',
        FILE_FORMAT = csv_file
    );

Data pool:
    CREATE EXTERNAL DATA SOURCE SqlDataPool
    WITH (LOCATION = 'sqldatapool://service-mssql-controller:8080/datapools/default');

    CREATE EXTERNAL TABLE …
    WITH (
        DATA_SOURCE = SqlDataPool,
        DISTRIBUTION = ROUND_ROBIN
    );
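A fuller sketch of the two patterns, filling in the parts the slide elides with hypothetical table and column names. The endpoint strings are copied from the slide and reflect the preview build, so they may differ in later releases; the statements are only meaningful on the master instance of a Big Data Cluster.

```sql
-- Data sources are normally inherited from model; create them only if missing.
IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
    CREATE EXTERNAL DATA SOURCE SqlStoragePool
    WITH (LOCATION = 'sqlhdfs://service-mssql-controller:8080');

IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE name = 'SqlDataPool')
    CREATE EXTERNAL DATA SOURCE SqlDataPool
    WITH (LOCATION = 'sqldatapool://service-mssql-controller:8080/datapools/default');

IF NOT EXISTS (SELECT 1 FROM sys.external_file_formats WHERE name = 'csv_file')
    CREATE EXTERNAL FILE FORMAT csv_file
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', USE_TYPE_DEFAULT = TRUE)
    );
GO

-- Storage pool: an external table over CSV files in the cluster's HDFS.
CREATE EXTERNAL TABLE dbo.clickstream_hdfs      -- hypothetical name and columns
(
    click_date_sk BIGINT,
    user_sk       BIGINT,
    web_page_sk   BIGINT
)
WITH (
    DATA_SOURCE = SqlStoragePool,
    LOCATION    = '/clickstream_data',
    FILE_FORMAT = csv_file
);
GO

-- Data pool: an external table whose rows are spread round-robin
-- across the SQL Data Nodes, used here to cache an aggregate.
CREATE EXTERNAL TABLE dbo.clickstream_daily_counts   -- hypothetical name and columns
(
    click_date_sk BIGINT,
    click_count   BIGINT
)
WITH (
    DATA_SOURCE  = SqlDataPool,
    DISTRIBUTION = ROUND_ROBIN
);
GO

-- Populate the cached result from HDFS, then query the cache.
INSERT INTO dbo.clickstream_daily_counts
SELECT click_date_sk, COUNT_BIG(*)
FROM dbo.clickstream_hdfs
GROUP BY click_date_sk;

SELECT TOP (10) click_date_sk, click_count
FROM dbo.clickstream_daily_counts
ORDER BY click_count DESC;
```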

18 Azure Data Studio and Data Virtualization
- Using PROSE for intelligent import and schema detection
- Code generation in Notebooks
- Import wizards
- External Table detection for HDFS

19 Session takeaways
- Polybase = Data Virtualization = Reduced Need for ETL
- Polybase provides distributed read scale performance
- Big Data Clusters automate the deployment of Polybase
- Download and try it yourself
- Sign up for EAP for SQL Server 2019 Big Data Clusters
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

20 Questions? aka.ms/SQLBits19

21 Session resources
- SQL Server 2019 Polybase documentation
- SQL Server Big Data Clusters documentation
- Polybase demos on GitHub
- Azure SQL Database Elastic Query documentation
- Loading data into Azure SQL Data Warehouse with Polybase
