Strategic sponsors, silver sponsors. PolyBase – data beyond tables. Hubert Kobierzewski.


1 Strategic sponsors, silver sponsors

2 PolyBase – data beyond tables Hubert Kobierzewski

3 Hubert K. Kobierzewski
- BI Consultant at Codec-dss (aka Codec Systems) for over 8 years
- Specialized in data warehousing, ETL processes and business intelligence
- Ex-developer
- MS SQL Server certified (MCDBA, MCTS, MCITP, MCSE: BI, ex-MCT)
- Member of Data Platform Advisors (internal MS group)
- Co-leader of the Warsaw PLSSUG chapter

4 What is Big Data and why is it valuable to the business?
Evolution in the nature and use of data in the enterprise.
[Diagram labels: data complexity (variety and velocity, petabytes); value to the business; historical analysis, insight analysis, predictive analytics, predictive forecasting.]

5 Hadoop cluster: compute & storage
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware.
[Diagram layers: Core Services, Operational Services, Data Services. Components shown: HDFS, YARN, MapReduce, Hive & HCatalog, Pig, HBase, Falcon, Oozie, Ambari; load & extract: Sqoop, Flume, NFS, WebHDFS.]

6 HDFS, MapReduce, Hive, Sqoop
- HDFS: a distributed, scalable, fault-tolerant file system
- MapReduce: a framework for writing fault-tolerant, scalable distributed applications
- Hive: a relational DBMS that stores its tables in HDFS and uses MapReduce as its target execution language
- Sqoop: a library and framework for moving data between HDFS and a relational DBMS

7 Hadoop alone is not the answer to all Big Data challenges
- Steep learning curve, slow and inefficient
- Move HDFS data into the warehouse before analysis
- Learn new skills
- Build, integrate, manage, maintain, support
[Diagram labels: ETL, T-SQL, Hadoop ecosystem, new data sources.]

8 Background
- Research done by the Gray Systems Lab, led by Technical Fellow David DeWitt
High-level goals for PolyBase
- Seamless integration with Hadoop via regular T-SQL
- Enhancing the MPP query engine to process data coming from the Hadoop Distributed File System (HDFS)
- Fully parallelized query processing for high-performance data import from and export to HDFS
- Integration with various Hadoop implementations: Hadoop on Windows Server, Hortonworks and Cloudera

9 Prerequisites for installing PolyBase
- 64-bit SQL Server Evaluation edition
- Microsoft .NET Framework 4.0
- Oracle Java SE Runtime Environment (JRE) version 7.51 or higher. NOTE: Java JRE version 8 does not work.
- Minimum memory: 4 GB
- Minimum hard disk space: 2 GB

10 Using the installation wizard for PolyBase
1. Run SQL Server Installation Center (insert the SQL Server installation media and double-click Setup.exe).
2. Click Installation, then click "New Standalone SQL Server installation or add features".
3. On the Feature Selection page, select "PolyBase Query Service for External Data".
4. On the Server Configuration page, configure the PolyBase Engine Service and the PolyBase Data Movement Service to run under the same account.
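
Once setup has finished, one way to confirm that the feature is present is to query a server property. This is a minimal sketch, assuming a SQL Server 2016 (or later) instance where the `IsPolyBaseInstalled` property is available; it is not part of the original slides.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
```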

11 Choose Hadoop data source with sp_configure
    -- Run sp_configure 'hadoop connectivity'
    -- and set an appropriate value
    EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
    GO
    RECONFIGURE;
    GO
    -- List the configuration settings for
    -- one configuration name
    EXEC sp_configure @configname = 'hadoop connectivity';
    GO
Option values
- 0: Disable Hadoop connectivity
- 1: Hortonworks HDP 1.3 on Windows Server; Azure blob storage (WASB[S])
- 2: Hortonworks HDP 1.3 on Linux
- 3: Cloudera CDH 4.3 on Linux
- 4: Hortonworks HDP 2.0 on Windows Server; Azure blob storage (WASB[S])
- 5: Hortonworks HDP 2.0 on Linux
- 6: Cloudera 5.1 on Linux
- 7: Hortonworks 2.1 and 2.2 on Linux; Hortonworks 2.2 on Windows Server; Azure blob storage (WASB[S])

12 Start the PolyBase services
After running sp_configure, you must stop and restart the SQL Server engine service.
1. Run services.msc.
2. Find the PolyBase services and stop each one.
3. Restart the services.
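
After the restart, the configured value can be compared with the value actually in use. The query below is a sketch that assumes the 'hadoop connectivity' option is exposed through the `sys.configurations` catalog view on the instance; if `value` and `value_in_use` differ, the new setting has presumably not taken effect yet.

```sql
-- Compare the configured value with the value currently in use.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'hadoop connectivity';
```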

13 External tables
    CREATE EXTERNAL TABLE table_name
        ({ <column_definition> } [,..n ])
        { WITH (
            DATA_SOURCE = <data_source>,   -- referencing external data source
            FILE_FORMAT = <file_format>,   -- referencing external file format
            LOCATION = '<file_path>',      -- path of the Hadoop file/folder
            [REJECT_VALUE = <value>], …    -- (optional) reject parameters
        ) }
    [;]
- Internal representation of data residing outside of the appliance
- Supports a wide array of data types, excluding text, ntext and similar but including binary and varbinary
- SQL permissions required: CREATE TABLE, ALTER ANY SCHEMA and ALTER ANY DATA SOURCE
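
A concrete instance of this syntax might look like the sketch below. All object names are hypothetical: the data source `TwitterHdfs` and the file format `PipeDelimitedText` are assumed to have been created beforehand, and the column list is only a guess at the shape of the Twitter sample used later in the deck.

```sql
-- Hypothetical external table over a folder of delimited text files in HDFS.
CREATE EXTERNAL TABLE dbo.Twitter_Table
(
    [User]     NVARCHAR(50) NOT NULL,
    [Location] CHAR(2)      NULL,
    Product    NVARCHAR(50) NULL,
    Sentiment  INT          NULL,
    Rtwt       INT          NULL,
    [Hour]     INT          NULL,
    [Date]     DATE         NULL
)
WITH
(
    LOCATION     = '/social_media/twitter/',  -- assumed folder path inside HDFS
    DATA_SOURCE  = TwitterHdfs,               -- assumed to exist already
    FILE_FORMAT  = PipeDelimitedText,         -- assumed to exist already
    REJECT_TYPE  = VALUE,                     -- count rejected rows as an absolute number
    REJECT_VALUE = 10                         -- fail the query after more than 10 bad rows
);
```

The reject options are what keep a handful of malformed rows from failing every query against the table; the threshold of 10 is arbitrary.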

14 External data sources
    CREATE EXTERNAL DATA SOURCE datasource_name
        { WITH (
            TYPE = <data_source>,                    -- type of external data source
            LOCATION = '<location>',                 -- location of external data source
            [JOB_TRACKER_LOCATION = '<jb_location>'] -- enabling or disabling of MapReduce job generation
        ) }
    [;]
- Internal representation of an external data source
- Supports Hadoop and Windows Azure Blob Storage (WASB, formerly known as ASV) as data sources
- Enables and disables split-based query processing: MapReduce jobs are generated on the fly, fully transparently for the end user
- ALTER ANY EXTERNAL DATA SOURCE permission required
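
For the Azure Blob Storage case, a data source also needs a credential. The following is a sketch assuming SQL Server 2016 syntax; the credential name, container, storage account and secrets are all placeholders, not values from the presentation.

```sql
-- A database master key is required before a database scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';

-- For blob storage the IDENTITY value is not used for authentication; SECRET is the storage account key.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage account key>';

-- TYPE = HADOOP is also used for WASB[S] locations.
CREATE EXTERNAL DATA SOURCE AzureBlobTwitter
WITH
(
    TYPE       = HADOOP,
    LOCATION   = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);
```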

15 External file format
    CREATE EXTERNAL FILE FORMAT fileformat_name
        { WITH (
            FORMAT_TYPE = <type>,                    -- type of external file format
            [SERDE_METHOD = '<serde_method>']        -- (de)serialization method [Hive RCFile]
            [DATA_COMPRESSION = '<compr_method>']    -- compression method
            [FORMAT_OPTIONS (<format_options>)]      -- (optional) format options [text files]
        ) }
    [;]
- Internal representation of an external file format
- Supports delimited text files, Hive RCFiles and Hive ORC
- Enables and disables split-based query processing: MapReduce jobs are generated on the fly
- ALTER ANY EXTERNAL FILE FORMAT permission required
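
As an illustration of the RCFile branch of this syntax, the sketch below defines a file format with an explicit SerDe and compression codec. The format name is hypothetical; the two class names are the standard Hive and Hadoop ones, assumed to be available on the target cluster.

```sql
-- Hypothetical file format for Hive RCFiles compressed with the default Hadoop codec.
CREATE EXTERNAL FILE FORMAT RcFileDefaultCodec
WITH
(
    FORMAT_TYPE      = RCFILE,
    SERDE_METHOD     = 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe',
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.DefaultCodec'
);
```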

16 Format options for delimited text files
    <format_options> ::=
        [,FIELD_TERMINATOR = 'Value'],
        [,STRING_DELIMITER = 'Value'],
        [,DATE_FORMAT = 'Value'],
        [,USE_TYPE_DEFAULT = 'Value']
- FIELD_TERMINATOR: indicates the column delimiter
- STRING_DELIMITER: specifies the delimiter for string data type fields
- DATE_FORMAT: specifies a particular date format
- USE_TYPE_DEFAULT: specifies how missing entries in text files are treated
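
Putting the four options together, a delimited-text format could be declared roughly as follows. The format name and the chosen delimiter, quote character and date pattern are illustrative assumptions.

```sql
-- Hypothetical format for pipe-delimited text files with double-quoted strings.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS
    (
        FIELD_TERMINATOR = '|',           -- column delimiter
        STRING_DELIMITER = '"',           -- delimiter around string fields
        DATE_FORMAT      = 'yyyy-MM-dd',  -- how date values are written in the files
        USE_TYPE_DEFAULT = TRUE           -- missing entries become the column type's default value
    )
);
```

With `USE_TYPE_DEFAULT = FALSE`, missing entries would be stored as NULL instead.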

17 PolyBase – Predicate pushdown
HDFS file / directory: //hdfs/social_media/twitter and //hdfs/social_media/twitter/Daily.log
[Figure: sample Twitter table stored in Hadoop with columns User, Location, Product, Sentiment, Rtwt, Hour and Date, annotated with column filtering, row filtering and dynamic binding.]
    SELECT User, Product, Sentiment
    FROM Twitter_Table
    WHERE Hour = Current - 1
      AND Date = Today
      AND Sentiment <= 0
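
The query on this slide is pseudo-SQL. In T-SQL it could be written roughly as below, assuming a hypothetical external table `dbo.Twitter_Table` with an integer `[Hour]` column and a `[Date]` column of type DATE; whether the predicates are actually pushed down to Hadoop depends on the optimizer and the data source configuration.

```sql
-- Column filtering: only three columns are requested.
-- Row filtering: only the previous hour of today, with non-positive sentiment.
SELECT [User], Product, Sentiment
FROM dbo.Twitter_Table
WHERE [Hour] = DATEPART(HOUR, GETDATE()) - 1   -- evaluates to -1 just after midnight, matching no rows
  AND [Date] = CAST(GETDATE() AS DATE)
  AND Sentiment <= 0;
```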

18 Query Capabilities: Push-Down Computation
    SELECT DISTINCT C.FirstName, C.LastName, C.MaritalStatus
    FROM Insurance_Customer_SQL AS C -- table in SQL Server
    …
    OPTION (FORCE EXTERNALPUSHDOWN) -- push-down computation
    CREATE EXTERNAL DATA SOURCE [ds-hdp]
    WITH (
        TYPE = HADOOP,
        LOCATION = 'hdfs://10.193.27.52:8020',
        RESOURCE_MANAGER_LOCATION = '10.193.27.52:8032'
    );
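
The hint has a counterpart that keeps all computation inside SQL Server, which is handy for comparing the two strategies. The sketch below joins the local table from the slide with a hypothetical external table; `dbo.SensorData_Ext`, `CustomerKey` and `Speed` are invented for illustration and do not come from the presentation.

```sql
SELECT DISTINCT C.FirstName, C.LastName, C.MaritalStatus
FROM dbo.Insurance_Customer_SQL AS C       -- local SQL Server table
JOIN dbo.SensorData_Ext AS S               -- hypothetical external table over HDFS
    ON S.CustomerKey = C.CustomerKey       -- hypothetical join column
WHERE S.Speed > 35                         -- hypothetical filter on the external data
OPTION (DISABLE EXTERNALPUSHDOWN);         -- swap for FORCE EXTERNALPUSHDOWN to compare
```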


21 PolyBase Demo

22 Use cases where PolyBase simplifies using Hadoop data
- Bringing islands of Hadoop data together
- Running queries against Hadoop data
- Archiving data warehouse data to Hadoop (move)
- Exporting relational data to Hadoop (copy)
- Importing Hadoop data into a data warehouse (copy)
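
The import and export cases can be sketched in a few statements. This assumes SQL Server 2016, where writing to an external table has to be enabled through the 'allow polybase export' option; all table names except `Insurance_Customer_SQL` are hypothetical, and both external tables are assumed to exist already.

```sql
-- Import (copy): materialize Hadoop data as a local table.
SELECT [User], Product, Sentiment
INTO dbo.Twitter_Local
FROM dbo.Twitter_Table;                 -- hypothetical external table

-- Export (copy): inserts into external tables are disabled by default.
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

INSERT INTO dbo.Customer_Archive_Ext    -- hypothetical external table
SELECT FirstName, LastName, MaritalStatus
FROM dbo.Insurance_Customer_SQL;
```

Archiving (a move rather than a copy) would additionally delete the exported rows from the local table once the export has been verified.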

23 The MPP Engine's Integration Method – without PolyBase
[Diagram labels: HDInsight Hadoop cluster with data nodes; MPP DWH engine with compute nodes.]

24 The MPP Engine's Integration Method – with PolyBase
[Diagram labels: HDInsight Hadoop cluster with data nodes; MPP DWH engine with compute nodes.]

25 Major Competitors
- Oracle, since version 9i (ca. 2003)
- IBM PureData System
- Pivotal Greenplum
- Oracle BDA (Big Data Appliance)

26 Read and watch more…
- MSDN documentation: https://msdn.microsoft.com/en-ie/library/mt163689.aspx
- Brief introduction on Channel 9: https://channel9.msdn.com/Shows/Data-Exposed/PolyBase-in-SQL-Server-2016
- SQL Server blog on PolyBase in APS: http://blogs.technet.com/b/dataplatforminsider/archive/tags/polybase/

27 Questions


