SQL Server PolyBase and Dell EMC Isilon storage


Smith, Matt F | matthew.smith@dell.com

A little bit about me
- I work for the Dell EMC Big Data/IoT Solution Engineering Team as a Solutions Architect (consulting & delivery).
- Started with SQL Server 7 back in the dot-com days doing application development: ASP 3.0, VB6 and SQL Server.
- Lots of work with SQL Server over the years: Dev, DBA, BI.
- Moved into the Big Data space a few years ago: Pivotal, Hortonworks, Cloudera.

Why I chose this topic
- Personal interest in creating a performant entry-level Big Data solution for small to mid-sized Hadoop File System data sets (1-5 TB).
- Reduce the common barriers to entry for Big Data: training and experience, financial investment, infrastructure.
- Introduce Big Data into organizations through known and trusted products, leveraging existing skill sets.
- Realize value in helping DBAs, developers and organizations start their big data journeys through low-risk pilot projects.

SQL Server PolyBase – a quick overview
- Introduced as a component of SQL Server 2012 Parallel Data Warehouse (PDW), now known as Analytics Platform System (APS).
- Added to the SQL Server product family in 2016.
- Allows you to access semi-structured data (schema on read) located in HDFS-compliant file storage through SQL Server.
- Connect to Dell EMC Isilon, Azure Blob storage, Hadoop or Cloudera.

What is Isilon?
- Scale-out, multi-protocol network-attached storage.
- Write with NFS, SMB3, FTP or HTTP and read immediately with another protocol.
- 7000+ customers; scales from 15 TB to 68 PB.
- OneFS file system: the Isilon OS.
- Each node includes processors, RAM, network and disk. Compute on every node.
- Network: 10GbE, 40GbE.
- A wide selection of disk choices for nodes allows tiering: hot data (flash), hybrid and archive storage.
- Automate data movement across storage tiers.
- Active Directory & Kerberos integration.

Integrated Isilon and PolyBase
[Diagram. Labels: Web click data, Decision Support Databases, OLAP, EDW; SMB, NFS, HTTP, FTP, HDFS; PolyBase cluster (head and compute nodes); Isilon (name node, data node).]
- Step 1: Much or all of the data lives on the Isilon/Hadoop cluster.
- Step 2: Jobs are run.

Notes: Getting data into Isilon in particular is very easy, with a wide variety of protocol support. So no matter what data source you're contemplating, you can be assured that populating your Hadoop instance is going to be as simple as it possibly can be.

POC Architecture – MS SQL with Isilon HDFS
[Diagram. Labels: Clickstream, DSS, Sensor, OLAP, EDW; MS SQL 2016 Enterprise Edition; PolyBase ScaleOut Group.]
- Isilon serves as HDFS storage and NameNodes.
- Enables Microsoft T-SQL queries in the Hadoop environment.
- Parallel operation on Hadoop and MS SQL in the same database.
- Very efficient methods (SMB, FTP, NFS, HDFS) for data import by Dell EMC Isilon.
- MS SQL integration:
  1. Direct queries from SQL to Isilon HDFS
  2. External pushdown
- Multiple Hadoop applications can access the same dataset (Isilon) at the same time.

Notes: Integrating these with SQL PolyBase is very simple. You start with a Hadoop instance and populate it with data from whatever sources you'd like, whether it's clickstream data, other OLTP sources, sensor data and so forth. From the perspective of PolyBase, the EMC-enhanced Hadoop instance is simply another Hadoop instance. You just start using it.

Configuring Isilon for PolyBase
POC steps:
1. Create an Access Zone
2. Configure the HDFS service
3. Configure the network
4. Create pdw_user
5. Add Active Directory NS record & test

Configuring Isilon – POC: Verify licenses, create an Access Zone

Configuring Isilon – POC: Configure HDFS

Configuring Isilon – POC: Configure SmartConnect

Configuring Isilon – POC: Create pdw_user

Configuring Isilon – POC: Add DNS entry, test

Configuring Isilon - Resources
- EMC Isilon Best Practices Guide for Hadoop Data Storage (Dell EMC): https://www.emc.com/collateral/.../h12877-wp-emc-isilon-hadoop-best-practices.pdf
- OneFS with HDFS Reference Guide (Dell EMC): https://www.emc.com/collateral/TechnicalDocument/docu84284.pdf

Install & Enable PolyBase on SQL Server
- Install the Oracle Java SE Runtime Environment (JRE) 7.51 (x64) or 8.
- Do not install JRE 9!
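A minimal T-SQL sketch of the enabling step, run once the JRE and the PolyBase feature are installed. The connectivity value 7 is an assumption for illustration only; the right value depends on the Hadoop distribution and version being targeted, so check Microsoft's connectivity table first.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Select the Hadoop connectivity level.
-- ASSUMPTION: 7 is only an example value; use the one that matches your distribution.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- Show the configured and running values for the option.
EXEC sp_configure @configname = 'hadoop connectivity';
```

The new value takes effect after SQL Server and the PolyBase services are restarted.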

PolyBase Data Sources
- Data sources include Isilon, Azure, Hortonworks and Cloudera.
- Add multiple data sources (pending support).
- Hortonworks HDP 1.3, 2.0, 2.1, 2.2
- Cloudera CDH 4.3, 5.1
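A sketch of what an external data source pointing at an HDFS endpoint might look like. The data source name `IsilonHdfs`, the host name and port 8020 are all assumptions chosen for illustration; substitute the SmartConnect zone name and HDFS port configured on your own cluster.

```sql
-- ASSUMPTION: IsilonHdfs, the host name and the port are hypothetical values.
CREATE EXTERNAL DATA SOURCE IsilonHdfs
WITH (
    TYPE = HADOOP,                                       -- HDFS-compatible storage
    LOCATION = 'hdfs://isilon-hdfs.example.com:8020'     -- name node endpoint
);

-- List the external data sources defined in the current database.
SELECT name, type_desc, location
FROM sys.external_data_sources;
```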

PolyBase External File Format
- File formats are required.
- You create and define your own file formats.
- Supported file formats include:
  - Text (delimited)
  - Hive ORC: Optimized Row Columnar
  - Hive RCFile: Record Columnar (key-value)
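Two file format definitions, sketched under the assumption that the source files are pipe-delimited text or ORC. The format names and the delimiter choices are hypothetical; match them to how your files were actually written.

```sql
-- ASSUMPTION: PipeDelimitedText and OrcFormat are hypothetical names.
-- Delimited text: fields separated by '|', strings optionally quoted with '"'.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = '|',
        STRING_DELIMITER = '"',
        USE_TYPE_DEFAULT = TRUE    -- replace missing values with the column type's default
    )
);

-- Hive ORC files need no field-level options.
CREATE EXTERNAL FILE FORMAT OrcFormat
WITH (FORMAT_TYPE = ORC);
```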

PolyBase External Tables
- External tables are schema on read.
- You define the external table columns.
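An external table sketch showing schema on read: the column list is applied to the files only when they are queried. It assumes an external data source named `IsilonHdfs` and a file format named `PipeDelimitedText` already exist; those names, the table name, the columns and the `/clickstream/` directory are all hypothetical.

```sql
-- ASSUMPTION: every object name, column and path here is hypothetical.
CREATE EXTERNAL TABLE dbo.ClickstreamExternal (
    EventTime DATETIME2(0)   NOT NULL,
    UserId    BIGINT         NOT NULL,
    Url       NVARCHAR(2000) NULL
)
WITH (
    LOCATION     = '/clickstream/',      -- directory (or file) on the HDFS side
    DATA_SOURCE  = IsilonHdfs,
    FILE_FORMAT  = PipeDelimitedText,
    REJECT_TYPE  = VALUE,                -- count rejected rows as an absolute number
    REJECT_VALUE = 0                     -- fail the query on the first malformed row
);

-- The files are read and parsed at query time.
SELECT TOP (10) EventTime, UserId, Url
FROM dbo.ClickstreamExternal;
```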

Table Statistics
- Add statistics to optimize query performance.
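A sketch of adding statistics to an external table. The table `dbo.ClickstreamExternal` and its `UserId` column are hypothetical names used for illustration.

```sql
-- ASSUMPTION: dbo.ClickstreamExternal and UserId are hypothetical names.
-- FULLSCAN reads the entire external data set, so expect this to be slow on large files.
CREATE STATISTICS Stat_ClickstreamExternal_UserId
ON dbo.ClickstreamExternal (UserId)
WITH FULLSCAN;
```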

Query Testing...
- Used Adam Machanic's sp_WhoIsActive so I could get a better idea of what was going on.
- Observations:
  - Lots of temp tables!
  - Some data moving back and forth
  - Things seemed to take a long time
  - Data came back!

Insert into / Select from worked best
- Use Insert into / Select from to load the data you want to work with locally.
- Create indexes (columnstore or other) and then join to local tables for best performance.
- Joining from local tables to external tables just didn't work well.
- Nightly ETL processes to move relevant data from HDFS external tables into SQL Server make sense.

Opportunities for improvement
- Add additional scale-out compute nodes (I plan to test SQL Express!); also containers & PowerShell provisioning.
- TempDB on flash or SSD.
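The load-then-join pattern described above could look roughly like this. All object names, columns and the date filter are hypothetical; the sketch assumes an external table `dbo.ClickstreamExternal` and a local table `dbo.Users` already exist.

```sql
-- ASSUMPTION: all table names, columns and the date filter are hypothetical.
-- 1. Local table that will hold the slice of HDFS data worth working with.
CREATE TABLE dbo.ClickstreamLocal (
    EventTime DATETIME2(0)   NOT NULL,
    UserId    BIGINT         NOT NULL,
    Url       NVARCHAR(2000) NULL
);

-- 2. Insert into / Select from: pull the relevant rows out of the external table.
INSERT INTO dbo.ClickstreamLocal (EventTime, UserId, Url)
SELECT EventTime, UserId, Url
FROM dbo.ClickstreamExternal
WHERE EventTime >= '2017-01-01';

-- 3. Index the local copy.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_ClickstreamLocal
ON dbo.ClickstreamLocal;

-- 4. Join local to local instead of local to external.
SELECT u.UserName, COUNT(*) AS Clicks
FROM dbo.ClickstreamLocal AS c
JOIN dbo.Users AS u ON u.UserId = c.UserId
GROUP BY u.UserName;
```

Steps 2 and 3 are the part a nightly ETL job would repeat.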

DMVs for troubleshooting and analysis
- Many DMVs exist. The link at the bottom of this slide includes a great process on how to use them.
https://msdn.microsoft.com/library/ce9078b7-a750-4f47-b23e-90b83b783d80
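As a starting point, two of the PolyBase DMVs can be queried like this: first find the slowest distributed requests, then look at the steps of one of them. The execution id shown is a placeholder, not a real value; copy one from the first result set.

```sql
-- Longest-running PolyBase (distributed) requests.
SELECT TOP (10) execution_id, status, start_time, end_time, total_elapsed_time
FROM sys.dm_exec_distributed_requests
ORDER BY total_elapsed_time DESC;

-- Steps of a single request, in execution order.
-- ASSUMPTION: 'QID1234' is a placeholder; use an execution_id from the query above.
DECLARE @execution_id NVARCHAR(32) = N'QID1234';

SELECT step_index, operation_type, location_type, status,
       total_elapsed_time, row_count, command
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = @execution_id
ORDER BY step_index;
```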

Planning for Growth
- Data lakes tend to fill up.

HDFS: Standard Hadoop Cluster
[Diagram. Labels: Web click data, Decision Support Databases, OLAP, EDW; Landing Zone Servers (SMB/CIFS, NFS, HTTP, FTP); Hadoop cluster with name nodes and compute/data nodes running MapReduce, each file stored as three copies.]
- Step 1: Data is copied into the landing zone.
- Step 2: Data is copied into the cluster (3 times).
- Step 3: Hadoop jobs are run.

The EMC Isilon Advantage for Analytics
1. Scale-out storage platform: multiple applications & workflows
2. No single point of failure: distributed NameNode
3. End-to-end data protection: SnapshotIQ, SyncIQ, NDMP backup
4. Industry-leading storage efficiency: >80% storage utilization
5. Independent scalability: add compute & storage separately
6. Multi-protocol: industry-standard protocols (NFS, CIFS, FTP, HTTP, HDFS)

Notes: EMC Isilon has recently introduced a new scale-out NAS solution for Hadoop that is designed to readily support business analytics as well as other enterprise applications and workflows. (This eliminates the siloed infrastructure approach used in many initial Hadoop deployments.)

The new EMC solution also eliminates the "single point of failure" issue. We do this by enabling all nodes in an EMC Isilon storage cluster to become, in effect, NameNodes. This greatly improves the resiliency of your Hadoop environment.

The EMC solution for Hadoop also provides reliable, end-to-end data protection for Hadoop data, including snapshotting for backup and recovery and data replication (with SyncIQ) for disaster recovery.

Our new Hadoop solution also takes advantage of the outstanding efficiency of EMC Isilon storage systems. With our solutions, customers can achieve 80% or more storage utilization.

EMC Hadoop solutions can also scale easily and independently. This means that if you need to add more storage capacity, you don't need to add another server (and vice versa). With EMC Isilon, you also get the added benefit of linear increases in performance as the scale increases.

EMC also recently announced that we are the first vendor to integrate HDFS (the Hadoop Distributed File System) into our storage solutions. This means that with EMC Isilon storage, you can readily use your Hadoop data with other enterprise applications and workloads while eliminating the need to manually move data around as you would with direct-attached storage.
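A rough worked comparison of raw capacity, using only figures quoted in this deck: the 2.5 TB test data set, the three copies of each file kept by a standard Hadoop cluster, and the 80% utilization figure quoted for Isilon. It ignores every other overhead, so treat it as an illustration rather than a sizing guide.

```latex
\begin{align*}
\text{raw capacity with 3 copies} &= 2.5\,\text{TB} \times 3 = 7.5\,\text{TB} \\
\text{raw capacity at 80\% utilization} &= \frac{2.5\,\text{TB}}{0.80} = 3.125\,\text{TB}
\end{align*}
```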

Hadoop Architecture with Isilon
[Diagram. Labels: R (RHIPE), Mahout, Hive, HBase, Pig; NameNode, JobTracker, TaskTracker, DataNode; compute nodes; Ethernet; Isilon nodes (name node, data node).]

Notes: By leveraging an EMC platform like ECS, Isilon, or even DSSD, the data node capabilities are separated from the compute nodes. This offers a number of advantages.

Can you really do big data with SQL Server 2016+?
- Yes you can, to a point.
- PolyBase and a Hadoop Distributed File System (HDFS) allow you to get started with Big Data.
- Many organizations use Dell EMC Isilon storage, which supports HDFS.
- This is a reasonable, workable solution, tested with 2.5 TB of unstructured marketing analytics (clickstream) data.

Your Next Step: Big data pilot project
1. Acquire some space on your organization's Isilon cluster. Alternatively, you could look into Azure Blob Storage and Azure Compute (SQL Server & Windows Server).
2. Load your data sets into your newly acquired storage.
3. Install SQL Server 2016+ (head node) and one or two compute nodes.
4. Create your External Data Source, External File Format and External Tables, and start experimenting.

https://www.emc.com/products-solutions/trial-software-download/isilon.htm

Special thanks to:
- Dell EMC Big Data team: @DellEMCbigdata
- Christian Scharrer, Dell EMC Senior Systems Engineer
- Rob Sonders, Dell EMC Microsoft Specialist (SQL Server): @RobertSonders
- Michael Wells, Dell EMC Senior Systems Engineer: @SqlTechMike
- Dell EMC Denver, CO office, Isilon Engineering Team: @DellEMCStorage
