Inside SQL Server Polybase

Presentation transcript:

Inside SQL Server Polybase
Bob Ward, Principal Architect, Microsoft
4/12/2019
https://aka.ms/bobsql
https://aka.ms/bobwardms
https://aka.ms/bobsqldemos
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Session learning objectives
At the end of this session, you should be better able to:
- Understand what Polybase is and how to use it.
- Understand how Polybase works, so that you know its capabilities for data virtualization.
- Use Polybase to build a data hub with SQL Server.

What is SQL Server Polybase?
“It’s all about Data Virtualization”
- Distributed compute engine integrated with SQL Server
- Query data where it lives using T-SQL
- Distributed, scalable query performance
- Manual deploy with SQL Server
- Auto deploy/optimize with Big Data Clusters
Diagram labels: Analytics; T-SQL; Apps; SQL Server; PolyBase external tables; ODBC; NoSQL; Relational databases; Big data; Excel; Cosmos DB; HDFS; Intelligence over all data.
Note: Stop and do a poll of which data sources are most important to their customers.

The Journey to Data Virtualization in SQL Server 2019
- MSFT Jim Gray Labs project to query “big data” with SQL in 2011
- Polybase comes to PDW in 2012
- Polybase ships with SQL Server 2016 (Polybase “classic”)
- Microsoft acquires Metanautix in 2015, bringing new connectors
- Project Aris commences in 2017 to take Polybase to the next level
- SQL Server 2019 includes Polybase classic, new ODBC data sources, and Big Data Clusters (BDC)
- Linux support is coming
People named on the slide: David DeWitt, Rimma Nehme.
Note: SQL Server 2019 is in Preview. Some details subject to change.

Using Polybase in SQL Server: T-SQL EXTERNAL TABLE
1. Set up and configure Polybase (not simple without BDC)
2. Set up authentication (login and password)
3. Create EXTERNAL DATA SOURCE
4. Create EXTERNAL FILE FORMAT (only for HDFS)
5. Create EXTERNAL TABLE
6. Create statistics on key columns
7. Query like any other table, and join to any other table or external table
Diagram labels: WWI; SQL; HDFS; Cosmos DB; metadata; results streamed; data lives here; INSERT only for HDFS.
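
The steps on this slide can be sketched in T-SQL roughly as follows. This is a minimal sketch, assuming SQL Server 2019 with the Polybase feature installed and a remote SQL Server as the data source. The credential, data source, host, database, table, and column names are all hypothetical and are not taken from the session demos.

```sql
-- 1. Enable Polybase on the instance (SQL Server 2019 configuration option).
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
GO

-- 2. Set up authentication. The master key protects the credential secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
CREATE DATABASE SCOPED CREDENTIAL RemoteSqlCredential   -- hypothetical name
WITH IDENTITY = '<remote login>', SECRET = '<remote password>';
GO

-- 3. Create the external data source (hypothetical host and port).
CREATE EXTERNAL DATA SOURCE RemoteSqlServer
WITH (
    LOCATION   = 'sqlserver://remotehost.example.com:1433',
    CREDENTIAL = RemoteSqlCredential,
    PUSHDOWN   = ON
);
GO

-- 4. CREATE EXTERNAL FILE FORMAT is only needed for HDFS, so it is skipped here.

-- 5. Create the external table. Only metadata is stored locally.
CREATE EXTERNAL TABLE dbo.RemoteSuppliers
(
    SupplierID   int NOT NULL,
    SupplierName nvarchar(100) COLLATE Latin1_General_100_CI_AS NOT NULL
)
WITH (
    LOCATION    = 'RemoteDb.dbo.Suppliers',   -- <db>.<schema>.<table>
    DATA_SOURCE = RemoteSqlServer
);
GO

-- 6. Create statistics on key columns to help the optimizer.
CREATE STATISTICS RemoteSuppliers_SupplierID
ON dbo.RemoteSuppliers (SupplierID) WITH FULLSCAN;
GO

-- 7. Query like any other table, joined to a (hypothetical) local table.
SELECT s.SupplierName, COUNT(*) AS OrderCount
FROM dbo.RemoteSuppliers AS s
JOIN dbo.PurchaseOrders AS po ON po.SupplierID = s.SupplierID
GROUP BY s.SupplierName;
```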

The SQL Server Polybase Architecture
- Legend: control and execution; data flow
- “Head” node (EE only): SQL Engine, DW dbs, tempdb, Polybase Engine, Polybase Data Movement Service
- “Compute” nodes (all editions): SQL Engine, DW dbs, tempdb, Polybase Engine, Polybase Data Movement Service
- Need more scale? Add compute nodes
- Process: mpdwsvc.exe
- Data movement: scan or pushdown; scale out with partitions; shuffle
- Your data sources: Cosmos DB, HDFS
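
As a rough illustration of "add compute nodes", the sketch below joins an instance to a head node's scale-out group and then inspects the group from the head node. It assumes Polybase is already installed on every instance; the machine name HEADNODE01 is hypothetical, and 16450 and MSSQLSERVER are the commonly used default control channel port and default instance name, which may differ in a given environment.

```sql
-- Run on each compute node: join the scale-out group owned by the head node.
-- Arguments: head node address, DMS control channel port, head node instance name.
EXEC sp_polybase_join_group N'HEADNODE01', 16450, N'MSSQLSERVER';
-- The Polybase Engine and Data Movement services on the compute node
-- must be restarted afterwards for the change to take effect.
GO

-- Run on the head node: list the nodes that are part of the group.
SELECT * FROM sys.dm_exec_compute_nodes;

-- Run on the head node: inspect distributed requests and their steps,
-- for example to see which steps were pushed down or shuffled.
SELECT * FROM sys.dm_exec_distributed_requests;
SELECT * FROM sys.dm_exec_distributed_request_steps;
```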

Demo: Dive into Polybase
Follow the steps from https://github.com/Microsoft/bobsql/tree/master/demos/sqlserver/polybase/fundamentals

SQL Server Polybase and Hadoop
Polybase “Classic”
- Connect to Hadoop: Cloudera or Hortonworks on Windows and Linux
- Azure Blob Storage
- Direct streaming or Java MapReduce
- SQL Server or Azure SQL Data Warehouse
Polybase and HDFS in Big Data Clusters
- Direct access to HDFS via the SQL Server engine
- Hadoop cluster pre-installed with HDFS and Spark
- HDFS metadata handled within the cluster
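
A minimal sketch of the Polybase "classic" path to Hadoop might look like the following. The host names, ports, HDFS path, and table definition are hypothetical, and the 'hadoop connectivity' value shown is only an assumption: the right value depends on the Hadoop distribution and version in use.

```sql
-- Select the Hadoop distribution (value 7 is an assumption; check the
-- documentation for your distribution). A restart of SQL Server and the
-- Polybase services is needed after changing this option.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;
GO

-- Data source for the cluster. RESOURCE_MANAGER_LOCATION is what allows
-- computation to be pushed down as MapReduce jobs instead of streamed.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020',
    RESOURCE_MANAGER_LOCATION = 'namenode.example.com:8050'
);

-- File format describing the delimited text files in HDFS.
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
);

-- External table over a directory of files.
CREATE EXTERNAL TABLE dbo.WebClicks
(
    ClickDate date NOT NULL,
    UserId    bigint NOT NULL,
    Url       nvarchar(400) COLLATE Latin1_General_100_CI_AS NULL
)
WITH (
    LOCATION    = '/data/webclicks/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = CsvFormat
);
GO

-- Force the MapReduce path for this query...
SELECT UserId, COUNT(*) AS Clicks
FROM dbo.WebClicks
WHERE ClickDate >= '2019-01-01'
GROUP BY UserId
OPTION (FORCE EXTERNALPUSHDOWN);

-- ...or force direct streaming of the rows into SQL Server.
SELECT UserId, COUNT(*) AS Clicks
FROM dbo.WebClicks
WHERE ClickDate >= '2019-01-01'
GROUP BY UserId
OPTION (DISABLE EXTERNALPUSHDOWN);
```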

Polybase and Other Connectors
The LOCATION “string” in EXTERNAL DATA SOURCE selects the connector.
Built-in connectors (use ODBC): sqlserver, oracle, teradata, mongodb
- Drivers are in \binn\Polybase\ODBC Drivers
- No client software install required
- sqlserver: SQL Server, Azure SQL Database, Azure SQL Data Warehouse
- mongodb: MongoDB or Cosmos DB (using the MongoDB API)
- Scale out with partitions
ODBC connector: odbc
- You install the driver
- 64-bit, ODBC 3.0+ compliant
- Ex. SAP HANA (HDBODBC driver)
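
The connector prefixes on this slide appear at the start of the LOCATION string. The sketch below shows one data source per connector, assuming the named database scoped credentials already exist in the database. All host names, ports, credential names, and the ODBC connection options are hypothetical.

```sql
-- Built-in connectors: the prefix of LOCATION selects the connector.
CREATE EXTERNAL DATA SOURCE AzureSqlSource
WITH (LOCATION = 'sqlserver://myserver.database.windows.net',
      CREDENTIAL = AzureSqlCred, PUSHDOWN = ON);

CREATE EXTERNAL DATA SOURCE OracleSource
WITH (LOCATION = 'oracle://oraclehost.example.com:1521',
      CREDENTIAL = OracleCred, PUSHDOWN = ON);

CREATE EXTERNAL DATA SOURCE TeradataSource
WITH (LOCATION = 'teradata://tdhost.example.com:1025',
      CREDENTIAL = TeradataCred, PUSHDOWN = ON);

-- MongoDB connector, here pointed at a Cosmos DB account through its MongoDB API.
CREATE EXTERNAL DATA SOURCE CosmosMongoSource
WITH (LOCATION = 'mongodb://myaccount.documents.azure.com:10255',
      CREDENTIAL = CosmosCred, PUSHDOWN = ON);

-- Generic ODBC connector: you install a 64-bit driver yourself and name it
-- in CONNECTION_OPTIONS. The SAP HANA options below are an assumption.
CREATE EXTERNAL DATA SOURCE SapHanaSource
WITH (LOCATION = 'odbc://hanahost.example.com:30015',
      CONNECTION_OPTIONS = 'Driver={HDBODBC};ServerNode=hanahost.example.com:30015',
      CREDENTIAL = SapHanaCred, PUSHDOWN = ON);
```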

Polybase as a Semantic Layer
- Use an EXTERNAL table to map common names familiar to your SQL Server
- Exclude fields or columns from the data source that are not needed (planned)
- Use SQL Server views to abstract joins and data source access
- Use UNION to combine similar data from data sources and local SQL Server
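
One way to picture the view and UNION ideas is the sketch below. It assumes a local table dbo.Customers and an external table dbo.AcquiredCustomers already exist; both names, their columns, and the view name are hypothetical.

```sql
-- A view hides where each row lives and gives callers one consistent shape.
-- COLLATE DATABASE_DEFAULT avoids collation conflicts when the external
-- table's character columns were declared with a different collation.
CREATE VIEW dbo.AllCustomers
AS
SELECT c.CustomerID,
       c.CustomerName COLLATE DATABASE_DEFAULT AS CustomerName,
       'Local' AS SourceSystem
FROM dbo.Customers AS c            -- local SQL Server table
UNION ALL
SELECT a.CustID    AS CustomerID,  -- rename source columns to common names
       a.CustName COLLATE DATABASE_DEFAULT AS CustomerName,
       'Acquisition' AS SourceSystem
FROM dbo.AcquiredCustomers AS a;   -- external table
GO

-- Callers query the view without knowing which rows are remote.
SELECT SourceSystem, COUNT(*) AS CustomerCount
FROM dbo.AllCustomers
GROUP BY SourceSystem;
```

UNION ALL is used here rather than UNION because the two sources are assumed not to overlap; plain UNION would additionally remove duplicate rows.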

External Tables vs Linked Servers

External Tables                           Linked Servers
Database scoped                           Instance scoped
ODBC drivers                              OLEDB providers
Read-only*                                Read/write
Scale-out queries with push-down          Single-threaded queries with push-down
Failover with AG                          Requires separate config from AG
Basic authentication                      Basic and integrated authentication
Distributed transactions not supported    Distributed transactions supported

* Insert into HDFS allowed
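
To make the scoping and read-only differences concrete, here is a small sketch. The linked server REMOTELINK, the external tables dbo.RemoteSuppliers and dbo.WebClicksArchive, and the local table dbo.LocalClicks are hypothetical and assumed to exist already.

```sql
-- Linked servers are instance scoped and listed in sys.servers.
SELECT name, provider FROM sys.servers WHERE is_linked = 1;

-- External data sources are database scoped and listed per database.
SELECT name, location FROM sys.external_data_sources;

-- Linked server: addressed with a four-part name.
SELECT TOP (10) SupplierID, SupplierName
FROM [REMOTELINK].[RemoteDb].[dbo].[Suppliers];

-- External table: addressed like a local table in the current database.
SELECT TOP (10) SupplierID, SupplierName
FROM dbo.RemoteSuppliers;

-- The one write path for external tables: export into HDFS.
-- It must be enabled at the instance level first.
EXEC sp_configure @configname = 'allow polybase export', @configvalue = 1;
RECONFIGURE;
GO

INSERT INTO dbo.WebClicksArchive      -- external table over an HDFS directory
SELECT ClickDate, UserId, Url
FROM dbo.LocalClicks;
```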

SQL Server 2019: Data Virtualization
Diagram labels: WWI; SQL; SQL Server 2019 WideWorldImporters; Modern StockItems; Legacy Suppliers; Mobile App Orders; Accounts Receivable; Customers from Acquisition; Order Reviews.

Demo: Data Virtualization around the WideWorld

Lessons Learned
- COLLATE required for character columns. Compat may be required
- Oracle is case sensitive for LOCATION = <instance>.<schema>.<table>
- LOCATION for SQL Server is <db>.<schema>.<table>
- EXTERNAL tables don’t support these types (there may be more): VARCHAR(MAX), GEOGRAPHY, computed columns, JSON
- MongoDB (Cosmos DB) observations: be careful of types in your document; LOCATION = <database>.<collection>
- Need to dive into EXTERNAL TABLE compatibility (ex. Row Level Security)
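
Several of these lessons show up directly in the external table definitions. The sketch below assumes external data sources named OracleSource and MongoSource already exist; those names, the Oracle service name XE, and every schema, table, collection, and column name are hypothetical.

```sql
-- Oracle: LOCATION is <instance>.<schema>.<table> and is case sensitive,
-- so the names are written exactly as Oracle stores them (often upper case).
CREATE EXTERNAL TABLE dbo.LegacySuppliers
(
    SUPPLIERID   int NOT NULL,
    -- Character columns carry an explicit COLLATE clause.
    SUPPLIERNAME nvarchar(100) COLLATE Latin1_General_100_CI_AS NOT NULL
)
WITH (
    LOCATION    = 'XE.WWI.SUPPLIERS',
    DATA_SOURCE = OracleSource
);

-- MongoDB or Cosmos DB (MongoDB API): LOCATION is <database>.<collection>.
-- Column types must agree with the types actually stored in the documents.
CREATE EXTERNAL TABLE dbo.MobileOrders
(
    [_id]      nvarchar(24) COLLATE Latin1_General_100_CI_AS NOT NULL,
    OrderID    int NOT NULL,
    CustomerID int NOT NULL,
    -- A bounded length is used because VARCHAR(MAX) is not supported.
    Comments   nvarchar(4000) COLLATE Latin1_General_100_CI_AS NULL
)
WITH (
    LOCATION    = 'WideWorldImporters.Orders',
    DATA_SOURCE = MongoSource
);
```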

SQL Server 2019 Big Data Clusters and Polybase
The Problem
- Customers want to integrate with Big Data and other data sources easily
- Polybase is not simple to set up, configure, maintain, and scale elastically
- Customers may not have a Hadoop cluster or want to build one
- Polybase “classic” (MapReduce) could be better
The Solution
- Kubernetes and containers to deploy and scale elastically
- Everything pre-installed, including an HDFS cluster
- Build a control plane to help manage and monitor
- Enhance SQL Server to read from HDFS natively
- Provide a data mart for cached results
- Introduce Spark and Notebooks for data scientists

Polybase in SQL Server 2019 Big Data Clusters
Diagram components:
- Clients: Analytics, Custom apps, BI
- SQL Server master instance: Polybase head node, in a Linux container (mpdwsvc.exe uses SQLPAL)
- Compute pool: SQL compute nodes; Polybase compute nodes, in Linux containers
- Storage pool: SQL Server, Spark, and HDFS data node in a Kubernetes pod; directly read from HDFS; persistent storage
- Data mart: SQL data nodes
- Control plane: Controller Svc, FSM Engine, SQL Configuration Store (SQL Server), Kibana, Grafana, Elastic Search, InfluxDB
- Data sources: external data sources, IoT data, “built-in” data sources
- Other labels: Azure, Cluster, Node, Storage
- MapReduce: not used
Note: The controller svc is important in this architecture. It is used by the cluster to direct and control access to the Data Mart and Storage Pool for the built-in data sources.

Storage and Data Pools Data Sources

Storage pool:

    CREATE EXTERNAL DATA SOURCE SqlStoragePool
    WITH (LOCATION = 'sqlhdfs://service-mssql-controller:8080');

    CREATE EXTERNAL TABLE … WITH (
        DATA_SOURCE = SqlStoragePool,
        LOCATION = '/clickstream_data',
        FILE_FORMAT = csv_file
    );

Data pool:

    CREATE EXTERNAL DATA SOURCE SqlDataPool
    WITH (LOCATION = 'sqldatapool://service-mssql-controller:8080/datapools/default');

    CREATE EXTERNAL TABLE … WITH (
        DATA_SOURCE = SqlDataPool,
        DISTRIBUTION = ROUND_ROBIN
    );

Slide notes:
- Preinstalled in model of Master Instance
- mssql-controller REST endpoint
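
A fuller version of the slide's statements might look like the sketch below, run on the master instance of a Big Data Cluster. The LOCATION strings are the ones shown on the slide, which come from a preview build and may differ in later releases. The csv_file options, table names, and columns are hypothetical.

```sql
-- Create the data sources only if they are not already present.
IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
    CREATE EXTERNAL DATA SOURCE SqlStoragePool
    WITH (LOCATION = 'sqlhdfs://service-mssql-controller:8080');

IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE name = 'SqlDataPool')
    CREATE EXTERNAL DATA SOURCE SqlDataPool
    WITH (LOCATION = 'sqldatapool://service-mssql-controller:8080/datapools/default');

-- File format for the CSV files in HDFS (options are an assumption).
CREATE EXTERNAL FILE FORMAT csv_file
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2)
);

-- Storage pool: external table over files in the cluster's HDFS.
CREATE EXTERNAL TABLE dbo.clickstream_hdfs
(
    user_id     bigint NULL,
    category_id bigint NULL,
    click_time  bigint NULL
)
WITH (
    DATA_SOURCE = SqlStoragePool,
    LOCATION    = '/clickstream_data',
    FILE_FORMAT = csv_file
);

-- Data pool: external table whose rows are distributed across data nodes.
CREATE EXTERNAL TABLE dbo.clickstream_summary_pool
(
    user_id     bigint NULL,
    category_id bigint NULL,
    clicks      bigint NULL
)
WITH (
    DATA_SOURCE  = SqlDataPool,
    DISTRIBUTION = ROUND_ROBIN
);
GO

-- Cache an aggregate of the HDFS data in the data pool, then query it.
INSERT INTO dbo.clickstream_summary_pool (user_id, category_id, clicks)
SELECT user_id, category_id, COUNT_BIG(*)
FROM dbo.clickstream_hdfs
GROUP BY user_id, category_id;

SELECT TOP (10) user_id, SUM(clicks) AS total_clicks
FROM dbo.clickstream_summary_pool
GROUP BY user_id
ORDER BY total_clicks DESC;
```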

Azure Data Studio and Data Virtualization
- Using PROSE for intelligent import and schema detection
- Code generation in Notebooks
- Import wizards
- External Table detection for HDFS

Session takeaways
- Polybase = Data Virtualization = Reduced Need for ETL
- Polybase provides distributed read scale performance
- Big Data Clusters automate the deployment of Polybase
- Download and try it yourself
- Sign up for EAP for SQL Server 2019 Big Data Clusters
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Questions? aka.ms/SQLBits19

Session resources
- SQL Server 2019 Polybase documentation
- SQL Server Big Data Clusters documentation
- Polybase demos on GitHub
- Azure SQL Database Elastic Query documentation
- Loading data into Azure SQL Data Warehouse with Polybase
https://aka.ms/bobsql
https://aka.ms/bobwardms
https://aka.ms/bobsqldemos
