1
PolyBase overview Speaker Name
Mission-critical performance with Microsoft SQL Server 2016
12/3/2017 10:03 PM
© 2015 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
2
Learning objectives
Overview of PolyBase
What is PolyBase?
Why PolyBase?
PolyBase prerequisites
PolyBase hands-on lab
PolyBase use cases
3
Overview of PolyBase
4
PolyBase unites structured data, unstructured data, and business data …for a better-together world of analytics
Objective: This slide introduces the concept of PolyBase. PolyBase unites two powerful worlds of data, structured and unstructured, and makes both available to the business so that they can be analyzed together.
Talking points: PolyBase addresses one of the main customer pain points in data warehousing: accessing distributed data sets. With increasing volumes of unstructured or semi-structured data, users are storing data sets in more cost-effective, distributed, and scalable systems, such as Apache Hadoop and cloud environments (for example, Azure Storage). Because the data is distributed across heterogeneous systems, it is both difficult and time-consuming to access. New skill sets are required to perform even simple operations and to combine data sets or applications to get new insights.
To overcome these challenges, SQL Server 2016 introduced PolyBase. The goal of PolyBase is to make interacting with unstructured, semi-structured, and non-relational data stored in Hadoop as easy as writing Transact-SQL statements. PolyBase allows users to query non-relational data in Hadoop, Azure blobs, and files, and to combine it with their existing relational data in SQL Server.
5
Interest in big data
Increased number and variety of data sources that generate large quantities of data
Realization that data is “too valuable” to delete
Dramatic decline in the cost of hardware, especially storage
Adoption of big-data technologies like Apache Hadoop
Objective: This slide explains why PolyBase was introduced in SQL Server 2016: to support the increasing volume and variety of non-relational data that lands in cost-effective storage systems and high-scale data processing systems such as Hadoop. The goal of PolyBase in SQL Server 2016 is to extend the reach of SQL Server to these external data stores using known technologies and languages.
Talking points: Data volume is exploding: YouTube videos, Facebook posts, credit-card transactions, store inventory, and your last grocery purchase. Trillions of pieces of information are collected, stored, and analyzed almost daily, with ever greater speed. This increasingly large and complex data challenges traditional database systems and is convincing businesses to adopt big-data technologies like Hadoop. Over the last few decades, computing and storage capacity have grown exponentially while their cost has fallen dramatically. New technologies like Hadoop are significantly changing the economics of large-scale data processing by letting customers analyze petabytes of data on industry-standard hardware. With big-data technologies, database administrators (DBAs) and system administrators are far less often forced to delete data, and they need to spend less time on capacity planning.
6
What is PolyBase?
Objective: Provide a brief overview of PolyBase.
Talking points: PolyBase originated in the Microsoft Analytics Platform System (APS), a no-compromise, all-in-one modern data warehouse. APS combines Microsoft SQL Server Parallel Data Warehouse (PDW) with HDInsight, a 100 percent Apache Hadoop distribution from Microsoft based on the Hortonworks Data Platform (HDP) for Windows, so that both operate seamlessly side by side. PolyBase can query Hadoop clusters in place, eliminating the need to extract, transform, and load (ETL) the data into the relational data warehouse. PolyBase works with HDInsight in the appliance, as well as with Microsoft Azure HDInsight for hybrid cloud solutions. PolyBase also supports Hadoop distributions from Hortonworks and Cloudera running within your organization.
7
What is PolyBase?
[Diagram: PolyBase bridging an RDBMS and Hadoop]
PolyBase provides a scalable, T-SQL-compatible query-processing framework for combining data from both universes: a channel between Hadoop and relational databases that allows users to query files (structured, semi-structured, and unstructured) in Hadoop using the familiar SQL language.
Objective: This slide defines PolyBase and explains how it bridges the gap for the SQL Server professional.
Talking points: Hadoop was originally designed with the Linux environment in mind. Working with Hadoop data therefore required knowing how to work with Linux and how to write MapReduce jobs in Java. SQL-like languages for Hadoop, such as Hive (HiveQL), Pivotal HAWQ, and Teradata SQL-H, were developed for this purpose, but you still need to learn some Linux to interact with Hadoop data through HiveQL. PolyBase enables a SQL Server professional to work with Hadoop data using familiar tools like SQL Server Management Studio, SQL Server Data Tools, Microsoft Office, and Power BI, with the Transact-SQL language and its PolyBase extensions.
So what is PolyBase? PolyBase is a transparent access bridge from SQL to Hadoop.
8
Why PolyBase?
9
Why PolyBase?
Component of the PDW region in APS
Transparent Data Encryption (TDE) and Always Encrypted
Highly parallelized, distributed query engine accessing heterogeneous data via SQL
Unique, innovative technology
Seamless integration
Objective: This slide shows the competitive advantages of PolyBase.
Talking points: The main competitive advantages of PolyBase include the following (animation: on click):
PolyBase, the component of the PDW region in the Analytics Platform System, is now built into SQL Server, extending the power to extract value from structured and unstructured data using your existing T-SQL skills. PolyBase also supports Transparent Data Encryption (TDE) and Always Encrypted.
PolyBase addresses one of the main customer pain points in data warehousing: accessing distributed data sets. It provides a highly parallelized, distributed query engine that accesses heterogeneous data via SQL.
PolyBase integrates with BI tools that connect to SQL Server. Use it with Microsoft BI products, such as SQL Server Reporting Services, SQL Server Analysis Services, PowerPivot, PowerQuery, and PowerView, or with third-party BI tools, such as Tableau, MicroStrategy, or Cognos.
PolyBase does not require additional software to be installed in the user’s Hadoop or Azure environment. It is transparent to the end user: no knowledge of Hadoop or Azure is required to use PolyBase successfully.
10
How it works
[Diagram: user perspective (external data source, external file format, external table) and systems perspective (PDW engine service bridging SQL Server with Hadoop and Microsoft Azure)]
Objective: This slide describes how PolyBase works.
Talking points: Setting up and configuring PolyBase and connecting it to your Hadoop environment is a multi-step process.
(Animation, first click) From a user perspective: first, establish the SQL Server-to-Hadoop connection. Next, configure your external data source, then the external file format, and finally the external table. Working with external data is generally a three-step process:
1. Define the external data source, such as Azure Blob Storage or Hadoop (either Microsoft HDInsight or an external Hadoop cluster), passing in security credentials where needed.
2. Define the external file format (delimited text, Hive RCFile, Hive ORC, or Parquet).
3. Define the external table.
Supported data definition language (DDL) statements include CREATE and DROP, ALTER EXTERNAL DATA SOURCE, and CREATE STATISTICS.
(Animation, second click) From a systems perspective: PolyBase acts as a PDW engine service that bridges the traditional relational database structures held in SQL Server with the massive amounts of data held in Hadoop. It works in real time: once you define your Hadoop data set within SQL Server, you can query it immediately with T-SQL, even joining relational tables with Hadoop-backed tables.
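The three user-side steps above might look like the following in T-SQL. This is a minimal sketch for SQL Server 2016 against a Hadoop cluster without Kerberos; the object names (`MyHadoopCluster`, `PipeDelimitedText`, `dbo.SensorReadings`), the IP address, the ports, the HDFS folder, and the column list are all hypothetical placeholders. `RESOURCE_MANAGER_LOCATION` is optional and is only needed if you want PolyBase to push computation down to Hadoop.

```sql
-- Step 1: external data source (hypothetical name, address, and ports).
-- 8020 is a common HDFS name node port. RESOURCE_MANAGER_LOCATION is optional
-- and enables pushdown; its port depends on the distribution (often 8050 on
-- HDP and 8032 on CDH), so check your cluster's configuration.
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.10.10.10:8020',
    RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'
);
GO

-- Step 2: external file format for pipe-delimited text files.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
);
GO

-- Step 3: external table over all files in the hypothetical /sensordata/ folder.
-- SQL Server stores only the metadata; the rows stay in HDFS.
CREATE EXTERNAL TABLE dbo.SensorReadings (
    SensorKey   int   NOT NULL,
    CustomerKey int   NOT NULL,
    Speed       float NOT NULL,
    ReadingYear int   NOT NULL
)
WITH (
    LOCATION = '/sensordata/',
    DATA_SOURCE = MyHadoopCluster,
    FILE_FORMAT = PipeDelimitedText
);
GO
```

Once created, the external table can be queried like any other table, for example `SELECT TOP (10) * FROM dbo.SensorReadings;`. Each such query reads the files from HDFS at run time.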
11
PolyBase benefits
Query relational and non-relational data with T-SQL, on-premises and in Azure
[Diagram: apps issue a T-SQL query that spans SQL Server and Hadoop]
Capability: T-SQL for querying relational and non-relational data across Microsoft SQL Server and Hadoop
Benefits: new business insights across your Azure Data Lake; leverage of existing skill sets and BI tools; faster time to insights and a simplified ETL process
Objective: This slide summarizes the PolyBase capability and its benefits.
Talking points: For key business intelligence (BI) investments, PolyBase makes it much easier to manage relational and non-relational data, because it lets you query Hadoop data and SQL Server relational data through a single T-SQL query.
Benefits: PolyBase also lets you move data in a hybrid scenario, which ties into the concept of a data lake. A data lake can be thought of as providing full access to raw big data without moving it. This is an alternative to first processing big data to make its analysis easier and then moving or synchronizing it into a data warehouse. SQL Server 2016 and PolyBase can be important components in setting up a data lake, combining it with your relational data, and running analysis and BI on it.
One of the challenges with Hadoop is that few people have Hadoop and MapReduce skill sets. PolyBase reduces the skill set needed to work with Hadoop data. It works with SQL Server on-premises or SQL Server running in Azure, and it can access and query data that is either on-premises or in the cloud, so you can run analytics and BI on that data wherever it is located. There is no need for a separate ETL or import tool.
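A single T-SQL query that spans both worlds can be sketched as follows. It assumes two hypothetical tables: `dbo.Customers`, an ordinary SQL Server table, and `dbo.SensorReadings`, a PolyBase external table over files in Hadoop with `CustomerKey` and `Speed` columns. The names, columns, and the speed threshold are illustrative only.

```sql
-- dbo.Customers: ordinary relational table in SQL Server (hypothetical).
-- dbo.SensorReadings: PolyBase external table over Hadoop files (hypothetical).
SELECT c.CustomerKey,
       c.FirstName,
       c.LastName,
       COUNT(*)     AS FastReadings,
       MAX(r.Speed) AS TopSpeed
FROM dbo.Customers AS c
JOIN dbo.SensorReadings AS r
    ON r.CustomerKey = c.CustomerKey
WHERE r.Speed > 65            -- filter on the Hadoop-side data
GROUP BY c.CustomerKey, c.FirstName, c.LastName
ORDER BY TopSpeed DESC;
```

From the user's point of view this is plain T-SQL; PolyBase decides how the external rows are fetched.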
12
PolyBase view in SQL Server 2016
Allow SQL Server 2016 customers to execute T-SQL queries against relational data in SQL Server and “semi-structured” data in HDFS and Azure Blob Storage
Leverage existing T-SQL skills and BI tools to gain insights from different data stores
Expand the reach of SQL Server to Hadoop (HDFS)
[Diagram: a query sent to SQL Server returns results drawn from SQL Server, Hadoop, and Azure Blob Storage]
Objective: This slide shows how PolyBase fits into SQL Server.
Talking points: Customers will continue to invest in relational database systems, and they want to leverage existing T-SQL skills and BI tools to gain insights from different data stores. PolyBase expands the reach of SQL Server to the Hadoop Distributed File System (HDFS): it lets users query non-relational data in Hadoop, Azure blobs, and files, and combine it with their existing relational data in SQL Server.
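For Azure Blob Storage, the external data source needs a credential. The following is a sketch assuming SQL Server 2016 with `hadoop connectivity` already set to a value that includes Azure Blob Storage; the password, storage key, container, account, and object names are hypothetical placeholders.

```sql
-- A database master key is required to protect the credential's secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
GO

-- Credential holding the storage account key. IDENTITY must be supplied,
-- but it is not used for key-based authentication to Azure Blob Storage.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage account access key>';
GO

-- wasbs:// uses an encrypted connection to the blob container.
CREATE EXTERNAL DATA SOURCE AzureBlobData
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);
GO
```

External file formats and external tables are then defined against `AzureBlobData` in the same way as for a Hadoop source.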
13
Capabilities and functions
[Diagram: users send queries to PolyBase, which returns results from Hadoop and Azure Blob Storage]
Use T-SQL statements to access and query data in Hadoop or Azure Blob Storage
Use T-SQL to store data from Hadoop or Azure Blob Storage
Export data to Hadoop or Azure Blob Storage
Objective: Discuss the high-level capabilities and functions of PolyBase.
Talking points: PolyBase lets you use T-SQL statements to access data stored in Hadoop or Azure Blob Storage and query it ad hoc. For this purpose, it introduces three T-SQL constructs that are fully integrated into SQL Server as first-class objects: external file formats, external data sources, and external tables.
PolyBase lets you use T-SQL to store data that originates in Hadoop or Azure Blob Storage as regular tables. Users and applications can then use mature SQL Server features, such as columnstore technology, for frequent BI reports. There is no need for a separate ETL or import tool.
PolyBase lets you export data to Hadoop or Azure Blob Storage, so you can use them as cold but queryable data stores.
Other notable capabilities of PolyBase:
PolyBase works with BI tools such as SQL Server Reporting Services, SQL Server Analysis Services, PowerPivot, PowerQuery, and PowerView, as well as third-party BI tools such as Tableau, MicroStrategy, or Cognos.
PolyBase does not require any additional software to be installed in the user’s Hadoop or Azure environment. It is transparent to the end user: no knowledge of Hadoop or Azure is required to use PolyBase successfully.
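Storing external data as a regular table and then applying columnstore could look like this. It is a sketch that assumes a hypothetical external table `dbo.SensorReadings` (columns `SensorKey`, `CustomerKey`, `Speed`, `ReadingYear`) already exists; the target table and index names are also hypothetical.

```sql
-- Import: copy a subset of the external rows into a new local table.
SELECT SensorKey, CustomerKey, Speed, ReadingYear
INTO dbo.SensorReadingsLocal
FROM dbo.SensorReadings
WHERE ReadingYear = 2015;
GO

-- Convert the new heap into a clustered columnstore table for repeated BI queries.
CREATE CLUSTERED COLUMNSTORE INDEX cci_SensorReadingsLocal
    ON dbo.SensorReadingsLocal;
GO
```

After this, reports run against `dbo.SensorReadingsLocal` entirely inside SQL Server, without reading HDFS again.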
14
Agnostic architecture
PolyBase is vendor independent: no vendor lock-in
PolyBase supports Hadoop on Linux and Windows
HDInsight in APS and external Hadoop clusters
PolyBase integrates with the cloud
Objective: This slide shows that PolyBase is agnostic, not proprietary, in its approach and its architecture.
Talking points: PolyBase is vendor independent. It integrates with Hadoop clusters that reside inside or outside the appliance. PolyBase does not force users to decide on a single solution: it gives them the freedom to use the HDInsight region, to query an external Hadoop cluster, or to leverage Azure services (such as HDInsight on Azure).
This agnostic approach also shows in its Hadoop distribution support, which covers both Windows and Linux. PolyBase works with either type of Hadoop cluster (Linux or Windows), whether it is a separate cluster or its nodes are co-located with the appliance. PolyBase supports two Hadoop providers: Hortonworks Data Platform (HDP) and Cloudera Distribution including Apache Hadoop (CDH). Hortonworks can run on either Windows or Linux, and that choice is part of the configuration.
PolyBase builds bridges to where the data is. Once a bridge has been defined (a few Data Definition Language (DDL) commands), users simply write T-SQL queries. These queries can target data in APS, Hadoop, and Azure at the same time. PolyBase also integrates with cloud data sets: you can store data in Azure Blob Storage or Hadoop and query it as if it were a database table.
15
PolyBase builds a bridge
PolyBase = runtime integration
Just-in-time data integration
Best of both worlds
Existing analytical skills
Familiar query tools, including Power BI
Objective: This slide shows how PolyBase interacts with data.
Talking points:
Just-in-time data integration: PolyBase lets users write T-SQL queries across relational and non-relational data. These queries can target data in APS, Hadoop, and Azure at the same time. For example, you can read data from Hadoop, transform and enrich it in APS, and persist the result back in Hadoop or Azure. This is called round-tripping the data, and it is just a taste of what hybrid query support makes possible. PolyBase offers fast, run-time integration across relational data stored in APS and non-relational data stored in both Hadoop and Microsoft Azure Storage Blobs.
Best of both worlds: PolyBase can also use computational resources available at the data source: it can selectively issue MapReduce jobs against a Hadoop cluster. This is called split query execution. PolyBase dissects a query into pushable and non-pushable expressions; the pushable ones are considered for submission as MapReduce jobs, and the non-pushable parts are processed by APS.
Existing analytical skills: all of this is possible with T-SQL alone. Apart from defining the external objects, there is little new to learn in order to write PolyBase queries: if you can write T-SQL, you can query any PolyBase-enabled data source with familiar SQL semantics and behavior.
Familiar query tools: with PolyBase, you can use SQL Server Data Tools (SSDT) to carry out database design work for SQL Server and SQL Azure inside the familiar Visual Studio environment.
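Split query execution can be steered per query with the `FORCE EXTERNALPUSHDOWN` and `DISABLE EXTERNALPUSHDOWN` query hints. This is a sketch that assumes a hypothetical external table `dbo.SensorReadings` over a Hadoop data source created with `RESOURCE_MANAGER_LOCATION` (without it, pushdown is not possible). In SQL Server 2016, pushdown applies to Hadoop sources, not to Azure Blob Storage.

```sql
-- Ask the optimizer to push the filter to Hadoop as a MapReduce job,
-- so only matching rows travel to SQL Server.
SELECT SensorKey, Speed
FROM dbo.SensorReadings
WHERE Speed > 65
OPTION (FORCE EXTERNALPUSHDOWN);

-- Same query, but stream the files to SQL Server and filter there.
-- Useful when MapReduce job spin-up time would dominate a small query.
SELECT SensorKey, Speed
FROM dbo.SensorReadings
WHERE Speed > 65
OPTION (DISABLE EXTERNALPUSHDOWN);
```

Without either hint, the optimizer makes the pushdown decision itself on a cost basis.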
16
PolyBase prerequisites
17
Prerequisites
64-bit SQL Server Evaluation Edition
External data source: either Azure Blob Storage or a supported Hadoop version
Microsoft .NET Framework 4.0
Oracle Java SE Runtime Environment (JRE) version 7.51 or higher (64-bit)
4 GB minimum memory
2 GB minimum hard-disk space
Objective: This slide describes the prerequisites for PolyBase.
Talking points: To run PolyBase, you need all of the items listed on the slide. The Hadoop version or Azure Blob Storage is selected with sp_configure, and the installer fails if the JRE is not present. PolyBase can be installed on only one SQL Server instance per machine. After installation, make sure SQL Server can connect to the external data source. The type of connectivity strongly influences query performance: for example, a 10 Gbit/s Ethernet link gives faster response times for PolyBase queries than a 1 Gbit/s link.
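Checking the installation and choosing the Hadoop version with `sp_configure` can be sketched as follows. The value 7 is only an example: as far as the SQL Server 2016 documentation goes, 7 covers recent Hortonworks HDP 2.x releases plus Azure Blob Storage and 6 covers Cloudera CDH 5.x on Linux, but check the `hadoop connectivity` documentation for the exact version matrix before picking a value.

```sql
-- Returns 1 if the PolyBase feature is installed on this instance.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
GO

-- Select the external data source family (example value; see lead-in).
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
GO
RECONFIGURE;
GO

-- Show the configured and running values.
EXEC sp_configure @configname = 'hadoop connectivity';
GO

-- The new value takes effect only after restarting SQL Server, the
-- SQL Server PolyBase Engine service, and the PolyBase Data Movement service.
```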
18
PolyBase scale-out groups
[Diagram: a PolyBase scale-out group. The head node (SQL Server 2016, PolyBase Engine, PolyBase DMS) receives PolyBase queries; two compute nodes (SQL Server 2016, PolyBase DMS) share the work; all nodes read from a Hadoop cluster (HDFS name node and four data nodes)]
Objective: This slide describes how to create PolyBase scale-out groups.
Talking points: A standalone SQL Server instance with PolyBase can become a performance bottleneck when dealing with massive data sets in Hadoop or Azure Blob Storage. The PolyBase group feature lets you create a cluster of SQL Server instances that process large data sets from external data sources, such as Hadoop or Azure Blob Storage, in a scale-out fashion for better query performance. You can install PolyBase in a head-only configuration, but this can create a significant bottleneck.
The head node contains the SQL Server instance to which PolyBase queries are submitted. Each PolyBase group can have only one head node. A head node is a logical group of the SQL Server Database Engine, the PolyBase Engine, and the PolyBase Data Movement Service (DMS) on that SQL Server instance.
A compute node contains a SQL Server instance that assists with scale-out query processing on external data. A compute node is a logical group of SQL Server and the PolyBase Data Movement Service on that instance. A PolyBase group can have multiple compute nodes.
Source: Microsoft Developer Network, PolyBase groups for scale-out computation.
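Adding a compute node to a group is done with the `sp_polybase_join_group` stored procedure. A sketch, assuming all machines already run PolyBase and meet the scale-out group requirements; the head node machine name `HEADNODE01` is hypothetical, 16450 is the default DMS control channel port, and `MSSQLSERVER` is the default instance name. The post-join service steps are summarized from the documentation as we understand it; verify them there.

```sql
-- Run on each compute node (not on the head node). Arguments:
-- head node machine name, DMS control channel port, head node instance name.
EXEC sp_polybase_join_group 'HEADNODE01', 16450, 'MSSQLSERVER';
GO
-- Afterwards, stop the PolyBase Engine service and restart the PolyBase
-- Data Movement service on this compute node.

-- On the head node: list the nodes that currently form the group.
SELECT * FROM sys.dm_exec_compute_nodes;
GO

-- To remove a compute node from the group later (run on that node):
EXEC sp_polybase_leave_group;
GO
```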
19
PolyBase in CTP3
[Diagram: a T-SQL query sent to SQL Server combines data from SQL Server and Hadoop]
Sample rows shown on the slide:
Name | DOB | State
Jim Gray | 11/13/58 | WA
Ann Smith | 04/29/76 | ME
Objective: This slide shows the new capabilities of PolyBase in SQL Server 2016 CTP3.
Talking points: PolyBase in CTP3 includes the following new capabilities:
Improved query performance through scale-out computation on external data (PolyBase scale-out groups).
Improved query performance through faster data movement from HDFS to SQL Server and between the PolyBase Engine and SQL Server.
Support for exporting data to an external data source via INSERT INTO <external table> SELECT FROM <table>.
Support for pushing string operations (comparison, LIKE) down to Hadoop.
Support for the ALTER EXTERNAL DATA SOURCE statement.
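Exporting with INSERT INTO an external table can be sketched as follows. It assumes SQL Server 2016, an existing external data source and file format (called `MyHadoopCluster` and `PipeDelimitedText` here, both hypothetical), and a hypothetical local table `dbo.Orders`; the HDFS folder and column list are placeholders as well.

```sql
-- Export into external tables is disabled by default.
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;
GO

-- External table pointing at an archive folder in HDFS.
CREATE EXTERNAL TABLE dbo.OrdersArchive (
    OrderKey    int           NOT NULL,
    CustomerKey int           NOT NULL,
    OrderYear   int           NOT NULL,
    Amount      decimal(18,2) NOT NULL
)
WITH (
    LOCATION = '/archive/orders/',
    DATA_SOURCE = MyHadoopCluster,
    FILE_FORMAT = PipeDelimitedText
);
GO

-- Write the selected rows out as files under /archive/orders/.
INSERT INTO dbo.OrdersArchive (OrderKey, CustomerKey, OrderYear, Amount)
SELECT OrderKey, CustomerKey, OrderYear, Amount
FROM dbo.Orders
WHERE OrderYear < 2014;
GO
```

The archived rows stay queryable through `dbo.OrdersArchive`, which is the “cold but queryable” pattern described in this deck.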
20
Hands-on lab (HOL)
21
Use cases
22
PolyBase use cases Load data Interactively query Age-out data
Load data: Use Hadoop as an ETL tool to cleanse data before loading it into the data warehouse with PolyBase.
Interactively query: Analyze relational data together with semi-structured data using split-based query processing.
Age-out data: Age out data to HDFS and use it as “cold” but queryable storage.
Objective: This slide depicts the various use cases for PolyBase.
Talking points:
It is already a painful reality that many enterprises store and maintain data in different systems, each optimized for different workloads and applications. Admins spend a great deal of time moving, organizing, and synchronizing this data. This reality creates another key challenge that PolyBase addresses: in addition to querying data in external data sources, users get a simpler and better-performing ETL (extraction, transformation, loading) process. Unlike existing connector technologies such as Sqoop, PolyBase lets users import data from external data sources (CREATE TABLE AS SELECT, or CTAS) or export data to external data sources (CREATE EXTERNAL TABLE AS SELECT, or CETAS) with plain T-SQL statements.
Interactively query
With PolyBase, the user can import data in a very simple fashion. The PolyBase query optimizer decides which parts of the query are executed in Hadoop and which parts in PDW. This optimization, called split-based query processing, allows parts of the query to run as Hadoop MapReduce (MR) jobs that are generated on demand and are completely transparent to the end user. To choose the optimal query plan, the optimizer takes into account factors such as the spin-up time of MR jobs and the available statistics. As usual with performance, the answer is “it depends on the actual use case or query.” PolyBase gives users the freedom to leverage the capabilities of PDW, Hadoop, or both, based on their needs and application requirements.
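To make the cost trade-off behind split-based query processing concrete, here is a toy model in Python. It is a minimal sketch under stated assumptions, not PolyBase's actual optimizer: the hypothetical CostParams class, its cost constants, and the functions plan_cost and choose_plan are invented for illustration. It compares streaming every row of an external table into SQL Server and filtering there against paying a fixed MapReduce spin-up cost to filter in Hadoop and then moving only the matching rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    """Hypothetical costs in arbitrary time units (not PolyBase's real cost model)."""
    mr_spin_up: float = 20.0           # fixed cost to launch a MapReduce job
    mr_per_row: float = 0.00001        # cost for Hadoop to scan and filter one row
    transfer_per_row: float = 0.0001   # cost to move one row into SQL Server
    sql_per_row: float = 0.00002       # cost for SQL Server to filter one row


def plan_cost(rows: int, selectivity: float, p: CostParams) -> dict:
    """Estimate the cost of each plan for a filter over an external table."""
    # Plan A: stream every row into SQL Server and filter there.
    no_pushdown = rows * (p.transfer_per_row + p.sql_per_row)
    # Plan B: pay the MR spin-up, filter in Hadoop, then move only matching rows.
    pushdown = (p.mr_spin_up
                + rows * p.mr_per_row
                + rows * selectivity * p.transfer_per_row)
    return {"no_pushdown": no_pushdown, "pushdown": pushdown}


def choose_plan(rows: int, selectivity: float, p: CostParams = CostParams()) -> str:
    """Return the name of the cheaper plan under the toy cost model."""
    costs = plan_cost(rows, selectivity, p)
    return min(costs, key=costs.get)
```

With these made-up constants, a 100,000-row scan is cheaper without pushdown because the MR spin-up dominates, while a 100-million-row scan with 1% selectivity favors pushdown. In SQL Server 2016 a query can also override the optimizer's choice with OPTION (FORCE EXTERNALPUSHDOWN) or OPTION (DISABLE EXTERNALPUSHDOWN).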
23
What’s the sweet spot for PolyBase?
Profiles: Consumer / Analyst / Scientist
Data volume: Medium to low / Reasonable / High -> huge
Degree of structure: Very high / Some / Low -> none
Number of users: Medium / Low
Transformation complexity: Medium to high / High
Analytics complexity
24
PolyBase applications and scenarios
Application: Ad-hoc query scenario (combining relational and semi-structured data)
Scenario: Semi-structured data sets stored in Hadoop or Azure Blob Storage; relational data set stored in SQL Server
Benefits: Mature T-SQL combines the data sets via various JOIN expressions; pushdown computation improves query performance for Hadoop data sources; no in-depth knowledge of Hadoop internals is needed

Application: Frequent BI reports
Scenario: Focus on a subset of data stored in Hadoop or Azure Blob Storage; this subset changes infrequently; a series of reports is needed on this subset
Benefits: Parallelized import from Hadoop or Azure Blob Storage into SQL Server; ability to create a columnstore table at any time via T-SQL to leverage SQL Server's world-class columnstore technology; minimized time to insight; no separate ETL tool or code to maintain

Application: Ad-hoc query scenario (Hadoop data only)
Scenario: Ad-hoc queries against data stored in Hadoop or Azure Blob Storage
Benefits: Mature T-SQL runs the queries and combines data stored in different Hadoop systems or Azure storage containers
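The Frequent BI reports scenario rests on a simple break-even argument: importing a stable subset once pays off when enough reports run against it. A minimal Python sketch, assuming a hypothetical helper break_even_reports and cost figures in arbitrary units (none of these come from PolyBase itself):

```python
import math


def break_even_reports(import_cost: float,
                       external_query_cost: float,
                       local_query_cost: float) -> int:
    """Smallest number of reports for which importing once beats querying external data each time.

    Importing wins when import_cost + n * local_query_cost < n * external_query_cost,
    i.e. when n > import_cost / (external_query_cost - local_query_cost).
    """
    saving_per_report = external_query_cost - local_query_cost
    if saving_per_report <= 0:
        # Importing never pays off if local queries are not cheaper.
        raise ValueError("local queries must be cheaper than external queries")
    # Strict inequality, so take the next whole report after the ratio.
    return math.floor(import_cost / saving_per_report) + 1
```

For example, break_even_reports(300, 50, 5) returns 7: with a saving of 45 units per report, the one-time import cost of 300 is recovered by the seventh report.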
25
Microsoft Corporation