Henk van der Valk, Oct. 15, 2016, Level: Beginner
SQL Server 2016 PolyBase. Henk van der Valk, Oct. 15, 2016, Level: Beginner. PolyBase has been a high-end feature of SQL Server APS (Analytics Platform System) and is now also introduced in SQL Server 2016, SQL DB and SQL DW! It allows you to use regular T-SQL statements for ad-hoc access to data stored in Hadoop and/or Azure Blob Storage from within SQL Server. This session will show you how it works and how to get started!
Starting SQL2016 on a server with 24 TB RAM
Microsoft Worldwide Partner Conference 2016: starting SQL Server 2016 on a server with 24 TB RAM. Just for fun!
Thanks to our platinum sponsors:
PASS SQLSaturday Holland
Thanks to our gold and silver sponsors:
APS Onsite! PASS SQLSaturday Holland
Speaker Introduction 2002- Largest SQL DWH in the world (SQL2000)
@HenkvanderValk. Speaker introduction: 10+ years active in the SQLPASS community; 10 years at the Unisys EMEA Performance Center; 2002: largest SQL Server DWH in the world (SQL Server 2000); Project REAL (SQL Server 2005); ETL world record: loading 1 TB within 30 minutes (SQL Server 2008); contributor to SQL Server performance whitepapers and perf tips & tricks; Schuberg Philis: 100% uptime for mission-critical apps; since April 1st, 2011: Microsoft Data Platform! All info represents my own personal opinion (based upon my own experience) and not that of Microsoft.
Agenda Intro - What is PolyBase & Why? Getting started
- SQL Server product versions supported - Installation & setup - Creating external tables, running hybrid queries - Monitoring - Tips to improve Hadoop performance - Scale-out groups
SQL Server 2016 as fraud detection scoring engine
HTAP (Hybrid Transactional/Analytical Processing): 8 sockets, 192 cores, 16 TB RAM
The Big Data lake Challenge
Different types of data: webpages, logs, and clicks; hardware and software sensors; semi-structured/unstructured data. Large scale: hundreds of servers. Advanced data analysis: integration between structured and unstructured data, the power of both. How to orchestrate?
PolyBase builds the Bridge
PolyBase builds the bridge between Azure Blob Storage, the RDBMS and Hadoop: just-in-time data integration across relational and non-relational data; fast, simple data loading; the best of both worlds; T-SQL compatible; uses computational power at the source; opportunity for new types of analysis. Access any data.
PolyBase View in SQL Server 2016
Execute T-SQL queries against relational data in SQL Server and 'semi-structured' data in HDFS and/or Azure. Leverage existing T-SQL skills and BI tools to gain insights from different data stores. Expand the reach of SQL Server to Hadoop (HDFS & WASB). Access any data.
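A minimal sketch of the kind of hybrid query this enables, assuming a regular SQL Server table dbo.Customer_CRM and an external table dbo.Customer_Clicks defined over HDFS/WASB data (all table and column names here are hypothetical):

-- Join relational data with Hadoop data in a single T-SQL statement
-- (table and column names are illustrative only)
SELECT c.CustomerID, c.CustomerName, COUNT(*) AS ClickCount
FROM dbo.Customer_CRM AS c          -- regular SQL Server table
JOIN dbo.Customer_Clicks AS k       -- external table over HDFS/WASB
  ON k.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.CustomerName
ORDER BY ClickCount DESC;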
Remove the complexity of big data: T-SQL over Hadoop
Remove the complexity of big data: T-SQL over Hadoop. PolyBase (new) lets SQL Server and Hadoop manage structured & unstructured data together: simple T-SQL to query Hadoop data (HDFS), plus new T-SQL JSON support.
PolyBase use cases Load data Interactively query Age-out data
PolyBase use cases. Load data: use Hadoop as an ETL tool to cleanse data before loading it into the data warehouse with PolyBase. Interactively query: analyze relational data together with semi-structured data using split-based query processing. Age-out data: age out data to HDFS and use it as 'cold' but queryable storage (a sketch of the age-out pattern follows below). Access any data.
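A minimal sketch of the age-out pattern, assuming PolyBase export is enabled, a hypothetical external table dbo.Orders_Cold over HDFS/WASB and a local dbo.Orders table (names, columns and cutoff are illustrative only):

-- Age out closed orders older than 2 years to cheap, queryable storage
INSERT INTO dbo.Orders_Cold          -- external table (HDFS or Azure blob)
SELECT *
FROM dbo.Orders
WHERE OrderDate < DATEADD(YEAR, -2, GETDATE());

DELETE FROM dbo.Orders
WHERE OrderDate < DATEADD(YEAR, -2, GETDATE());

-- The aged-out rows remain queryable with regular T-SQL via dbo.Orders_Cold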
PolyBase: turning raw tweet data into information
Query & store Hadoop data: bi-directional, seamless & fast
Azure Blob Storage
SQL Server 2016 & SQL DW PolyBase!
Setup & Query BCP out vs RTC
Prerequisites: an instance of SQL Server 2016 (64-bit), Enterprise or Developer Edition; Microsoft .NET Framework 4.5; Oracle Java SE Runtime Environment (JRE) version 7.51 or higher (64-bit), either JRE or Server JRE will work (go to Java SE downloads). Note: the installer will fail if the JRE is not present. Minimum memory: 4 GB. Minimum hard disk space: 2 GB. TCP/IP connectivity must be enabled.
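A quick sanity check after installation (a minimal sketch; the server property returns 1 when the PolyBase feature is installed on the instance):

-- Verify that the PolyBase feature is installed on this instance
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;  -- 1 = installed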
Step 2: Install SQL Server
Step 2: Install SQL Server. Install one or more SQL Server instances with PolyBase. The PolyBase DLLs (Engine and DMS) are installed and registered as Windows services. Prerequisite: the user must download and install the (Oracle) JRE. Access any data.
Components introduced in SQL Server 2016
PolyBase Engine service; PolyBase Data Movement Service (DMS, with HDFS bridge); external table constructs; MapReduce (MR) pushdown computation support. Access any data.
How to use PolyBase in SQL Server 2016
Steps: (1) set up a Hadoop cluster or Azure Storage blob, (2) install SQL Server, (3) configure a PolyBase group, (4) choose the Hadoop flavor, (5) attach the Hadoop cluster or Azure Storage. PolyBase T-SQL queries are submitted to the head node and can only refer to tables and/or external tables there; the compute nodes and the Hadoop cluster handle the scale-out work. Access any data.
Step 1: Set up a Hadoop Cluster…
Step 1: Set up a Hadoop cluster. Hortonworks or Cloudera distributions, Hadoop 2.0 or above, Linux or Windows, on-premises or in Azure. Access any data.
Step 1: …Or set up an Azure Storage blob
Step 1 (alternative): ...or set up an Azure Storage blob. Azure Storage blob (ASB) exposes an HDFS layer; PolyBase reads and writes from ASB using the Hadoop RecordReader/RecordWriter; there is no compute pushdown support for ASB. Access any data.
Step 2: Configure a PolyBase group
Step 2: Configure a PolyBase group. The head node is the SQL Server 2016 instance to which queries are submitted; compute nodes are used for scale-out query processing for data in HDFS or Azure. Together (PolyBase Engine + PolyBase DMS per node) they form a PolyBase scale-out group.
Step 3: Choose the Hadoop flavor
Supported Hadoop distributions: Cloudera CDH 5.x on Linux; Hortonworks 2.x on Linux and Windows Server. What happens under the covers? The right client JARs are loaded to connect to the Hadoop distribution (see the configuration sketch below). Access any data.
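Choosing the flavor is done with the 'hadoop connectivity' server option; a minimal sketch (the numeric value maps to a distribution in the Microsoft documentation; the value 7 below is only an example, commonly used for Hortonworks 2.x on Linux and Azure blob storage, so check the docs for your distribution):

-- Enable the connectivity level that matches your Hadoop distribution
-- (7 is an example value; consult the documentation for the exact mapping)
EXEC sp_configure 'hadoop connectivity', 7;
RECONFIGURE;
-- Restart the SQL Server, PolyBase Engine and PolyBase DMS services afterwards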
Step 4: Attach Hadoop Cluster or Azure Storage
Attach a Hadoop cluster or an Azure Storage volume to the SQL Server 2016 head node (PolyBase Engine + PolyBase DMS). Access any data.
PolyBase T-SQL queries submitted here
After setup: PolyBase T-SQL queries are submitted to the head node; compute nodes are used for scale-out query processing on external tables in HDFS. Tables on compute nodes cannot be referenced by queries submitted to the head node. The number of compute nodes can be dynamically adjusted by the DBA, and Hadoop clusters can be shared between multiple SQL Server 2016 PolyBase groups. Query performance is improved by scale-out computation on external data (PolyBase scale-out groups) and by faster data movement from HDFS to SQL Server and between the PolyBase Engine and SQL Server. Access any data. A sketch of adding a compute node follows below.
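A minimal sketch of enrolling an instance as a compute node in an existing scale-out group, run on the compute node after installing PolyBase there; the head node name, the port 16450 (the default DMS control channel port) and the instance name are placeholders:

-- Run on the compute node to join the scale-out group of the head node
-- (server name, port and instance name below are placeholders)
EXEC sp_polybase_join_group N'HEADNODE01', 16450, N'MSSQLSERVER';
-- Restart the PolyBase Engine and PolyBase DMS services to complete the join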
PolyBase configuration
--1: Create a master key on the database.
--   Required to encrypt the credential secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'SQLSat#551';
-- select * from sys.symmetric_keys

-- Create a database scoped credential for Azure blob storage.
-- IDENTITY: any string (this is not used for authentication to Azure storage).
-- SECRET: your Azure storage account key.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'wasbuser',
     SECRET = '1abcdEFGb3Mcn0F9UdJS/10taXmr5L17xrEO17rlMRL8SNYg==';
Create external Data Source
--2: Create an external data source.
--   LOCATION: Azure storage account name and blob container name.
--   CREDENTIAL: the database scoped credential created above.
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (
    TYPE = HADOOP,
    -- The LOCATION value was omitted on the slide; the value below is a
    -- placeholder: replace <container> and <storageaccount> with your own.
    LOCATION = 'wasbs://<container>@<storageaccount>.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);

-- view list of external data sources
select * from sys.external_data_sources
Create External file format
-- select * from sys.external_file_formats
--3: Create an external file format.
--   FORMAT_TYPE: type of format in Hadoop
--   (DELIMITEDTEXT, RCFILE, ORC, PARQUET).
-- With GZIP:
CREATE EXTERNAL FILE FORMAT TextDelimited_GZIP
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);
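The external table on the next slide references an uncompressed format named TextFileFormat that is not shown on the slides; a minimal sketch of what that definition could look like, assuming the same pipe-delimited layout:

-- Plain (uncompressed) delimited text format assumed by the next example
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
);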
Create External Table --4: Create an external table.
-- The external table points to data stored in Azure storage.
-- LOCATION: path to a file or directory that contains the data
--           (relative to the blob container).
-- To point to all files under the blob container, use LOCATION = '/'
CREATE EXTERNAL TABLE [dbo].[lineitem4]
(
    [ROWID1]     [bigint]        NULL,
    [L_SHIPDATE] [smalldatetime] NOT NULL,
    [L_ORDERKEY] [bigint]        NOT NULL,
    [L_DISCOUNT] [smallmoney]    NOT NULL,
    -- ... remaining columns elided on the slide ...
    [L_COMMENT]  [varchar](44)   NOT NULL
)
WITH (
    LOCATION = '/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextFileFormat,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);
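Optionally, statistics can be created on external table columns so the optimizer can make better pushdown and join decisions; a minimal sketch (the column choice and statistics name are illustrative):

-- Statistics on an external table column help cost-based pushdown decisions
CREATE STATISTICS stat_lineitem4_shipdate
ON [dbo].[lineitem4] ([L_SHIPDATE])
WITH FULLSCAN;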
Import
-- IMPORT data from WASB into a NEW table:
SELECT *
INTO [dbo].[LINEITEM_MO_final_temp]
FROM (
    SELECT * FROM [dbo].[lineitem1]
) AS Import;
Export data (Gzipped) -- Enable Export/ INSERT into external table
sp_configure 'allow polybase export', 1;
RECONFIGURE;

CREATE EXTERNAL TABLE [dbo].[lineitem_export]
(
    [ROWID1]         [bigint]      NULL,
    -- ... columns elided on the slide ...
    [L_SHIPINSTRUCT] [varchar](25) NOT NULL,
    [L_COMMENT]      [varchar](44) NOT NULL
)
WITH (
    LOCATION = '/gzipped',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextDelimited_GZIP,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);
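With export enabled and the external table in place, the actual export is a plain INSERT ... SELECT; a minimal sketch, assuming a local dbo.lineitem table as the source:

-- Export rows into the external table; PolyBase writes GZIP-compressed
-- delimited files under /gzipped in the attached Azure blob container
INSERT INTO [dbo].[lineitem_export]
SELECT * FROM [dbo].[lineitem];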
Manage External resources
SSMS / VSTS, new: External Tables; External Resources (Ext. Data Sources, Ext. File Formats)
PolyBase query example #1
-- select on an external table (data in HDFS)
SELECT * FROM Customer WHERE c_nationkey = 3 AND c_acctbal < 0;

A possible execution plan: (1) create temp table T, executed on the compute nodes; (2) import from HDFS: the Customer file is read into T; (3) execute the query: SELECT * FROM T WHERE T.c_nationkey = 3 AND T.c_acctbal < 0. Additionally there is support for exporting data to an external data source via INSERT INTO <external table> SELECT ... FROM <table>, support for push-down computation to Hadoop for string operations (compare, LIKE), and support for the ALTER EXTERNAL DATA SOURCE statement. Access any data.
PolyBase query example #2
-- select and aggregate on an external table (data in HDFS)
SELECT AVG(c_acctbal) FROM Customer WHERE c_acctbal < 0 GROUP BY c_nationkey;

What happens here? Step 1: the query optimizer compiles the predicate into Java and generates a MapReduce (MR) job. Step 2: the engine submits the MR job to the Hadoop cluster; the filter and aggregate are applied to Customer on Hadoop and the output is left in hdfsTemp (e.g. <US, $…>, <FRA, $…>, <UK, $-63.52>). Access any data.
PolyBase query example #2
-- select and aggregate on an external table (data in HDFS)
SELECT AVG(c_acctbal) FROM Customer WHERE c_acctbal < 0 GROUP BY c_nationkey;

Execution plan: the predicate and aggregate are pushed into the Hadoop cluster as a MapReduce job; the query optimizer makes a cost-based decision on which operators to push. (1) Run the MR job on Hadoop: apply the filter and compute the aggregate on Customer, output left in hdfsTemp. (2) Create temp table T on the compute nodes. (3) Import hdfsTemp: read hdfsTemp into T. (4) Return operation: SELECT * FROM T. Access any data. Pushdown can also be forced or disabled per query, as sketched below.
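The cost-based decision can be overridden per query with the external pushdown query hints; a minimal sketch using the same example query:

-- Force the predicate/aggregate to be pushed to Hadoop as a MapReduce job
SELECT AVG(c_acctbal)
FROM Customer
WHERE c_acctbal < 0
GROUP BY c_nationkey
OPTION (FORCE EXTERNALPUSHDOWN);

-- Or disable pushdown and stream the rows into SQL Server instead
SELECT AVG(c_acctbal)
FROM Customer
WHERE c_acctbal < 0
GROUP BY c_nationkey
OPTION (DISABLE EXTERNALPUSHDOWN);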
Summary: PolyBase, query relational and non-relational data with T-SQL
Capability: T-SQL for querying relational and non-relational data across SQL Server and Hadoop. Benefits: new business insights across your data lake; leverage existing skill sets and BI tools; faster time to insights and a simplified ETL process; query relational and non-relational data, on-premises and in Azure. When it comes to key BI investments, Microsoft is making it much easier to manage relational and non-relational data: PolyBase allows you to query Hadoop data and SQL Server relational data through a single T-SQL query. One of the challenges with Hadoop is that there are not enough people knowledgeable in Hadoop and MapReduce, and this technology simplifies the skill set needed to manage Hadoop data. This also works across your on-premises environment or SQL Server running in Azure. Access any data.
Monitoring PolyBase queries
Lots of new DMV’s ----------------------------------------
-- Monitoring Polybase / All DMV's : SELECT * FROM sys.external_tables SELECT * FROM sys.external_data_sources SELECT * FROM sys.external_file_formats SELECT * FROM sys.dm_exec_compute_node_errors SELECT * FROM sys.dm_exec_compute_node_status SELECT * FROM sys.dm_exec_compute_nodes SELECT * FROM sys.dm_exec_distributed_request_steps SELECT * FROM sys.dm_exec_dms_services SELECT * FROM sys.dm_exec_distributed_requests SELECT * FROM sys.dm_exec_distributed_sql_requests SELECT * FROM sys.dm_exec_dms_workers SELECT * FROM sys.dm_exec_external_operations SELECT * FROM sys.dm_exec_external_work
Find the longest running query
SELECT execution_id, st.text, dr.total_elapsed_time
FROM sys.dm_exec_distributed_requests dr
CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
ORDER BY total_elapsed_time DESC;
Find the longest running step of the distributed query plan
SELECT execution_id, step_index, operation_type, distribution_type,
       location_type, status, total_elapsed_time, command
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = 'QID1120'
ORDER BY total_elapsed_time DESC;
Details on a Step_index
SELECT execution_id, step_index, dms_step_index, compute_node_id,
       type, input_name, length, total_elapsed_time, status
FROM sys.dm_exec_external_work
WHERE execution_id = 'QID1120' AND step_index = 7
ORDER BY total_elapsed_time DESC;
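For the data movement side of a step, sys.dm_exec_dms_workers can be filtered the same way; a minimal sketch (SELECT * is used because the exact column list is version-dependent):

-- Data movement (DMS) worker details for one step of a distributed query
SELECT *
FROM sys.dm_exec_dms_workers
WHERE execution_id = 'QID1120' AND step_index = 7;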
Optimizations
PolyBase: data compression to minimize data movement
Enable Pushdown configuration (Hadoop)
Improves query performance. Find the file yarn-site.xml in the installation path of SQL Server, e.g. C:\Program Files\Microsoft SQL Server\MSSQL13.SQL2016RTM\MSSQL\Binn\Polybase\Hadoop\conf\yarn-site.xml. On the Hadoop machine, in the Hadoop configuration directory, copy the value of the configuration key yarn.application.classpath. On the SQL Server machine, in the yarn-site.xml file, find the yarn.application.classpath property and paste the value from the Hadoop machine into the value element.
APS Cybercrime video & demo!
Time to insights: APS Cybercrime video & demo! Various sources, a single query.
Further Reading https://msdn.microsoft.com/en-us/library/mt163689.aspx
Get started with PolyBase: Data compression tests:
Henk.vanderValk@microsoft.com www.henkvandervalk.com
Q&A
Please fill in the evaluation forms