Big Data Clusters: SQL Server 2019 Meets Big Data
11/28/2019
Sorin Pește, Cloud Solutions Architect, Data & AI, Microsoft
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
A Modern Data Warehouse
Traditional business analytics process
1. Start with end-user requirements to identify the desired reports and analyses.
2. Define the corresponding database schema and queries.
3. Identify the required data sources.
4. Create an Extract-Transform-Load (ETL) pipeline, typically with dedicated ETL tools (e.g. SSIS), that extracts the required data (curation) and transforms it to the target schema ("schema-on-write").
5. Create reports and analyze the data.
Data comes from relational LOB applications into a defined schema; all data not immediately required is discarded or archived. New requirements start the cycle again.
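To make "schema-on-write" concrete, here is a minimal Python sketch, assuming a hypothetical `sales` target schema with three columns: the ETL step converts each source record to that schema before it is stored, and any field the schema does not name is dropped, mirroring the "discarded or archived" point above. It illustrates the idea only; it is not how SSIS or any particular ETL tool works.

```python
from datetime import date

# Hypothetical target schema, defined up front (schema-on-write):
# column name -> conversion applied during the ETL step.
TARGET_SCHEMA = {
    "order_id": int,
    "order_date": date.fromisoformat,
    "amount": float,
}

def etl_load(source_rows):
    """Transform raw rows to the target schema before storing them.

    Fields not named in the schema are dropped; rows that cannot be
    converted are rejected instead of being stored.
    """
    stored, rejected = [], []
    for row in source_rows:
        try:
            stored.append({col: convert(row[col]) for col, convert in TARGET_SCHEMA.items()})
        except (KeyError, ValueError):
            rejected.append(row)
    return stored, rejected

raw = [
    {"order_id": "1", "order_date": "2019-11-28", "amount": "19.99", "browser": "Edge"},
    {"order_id": "2", "order_date": "not a date", "amount": "5.00"},
]
stored, rejected = etl_load(raw)
print(stored)    # the 'browser' field is gone; values are typed
print(rejected)  # the malformed row never reaches the warehouse
```

The schema is fixed before any data arrives, so every stored row is clean and typed, but information the schema did not anticipate is lost at load time.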
New big data thinking: all data has value
- All data has potential value, so it is hoarded rather than discarded.
- There is no defined schema: data is stored in its native format.
- Schema is imposed and transformations are done at query time (schema-on-read); apps and users interpret the data as they see fit.
- Iterate: gather data from all sources, store it indefinitely, analyze it, see results.
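Schema-on-read keeps records in their native form and imposes structure only when a query runs. The sketch below, with hypothetical field names and sample records, stores raw JSON lines untouched and lets each consumer apply its own schema at query time, so the same data serves different interpretations.

```python
import json

# Raw records are stored as-is (native format); nothing is discarded.
raw_store = [
    '{"order_id": "1", "amount": "19.99", "browser": "Edge"}',
    '{"order_id": "2", "amount": "5.00", "sensor": {"temp": 21.5}}',
    '{"event": "click", "browser": "Firefox"}',
]

def read_with_schema(store, schema):
    """Apply a schema at query time: parse, project and convert on the fly.

    Records missing a required field are skipped for this query only;
    they stay in the store for readers with other schemas.
    """
    for line in store:
        record = json.loads(line)
        try:
            yield {col: convert(record[col]) for col, convert in schema.items()}
        except (KeyError, ValueError, TypeError):
            continue

# Two consumers interpret the same raw data differently.
sales_view = list(read_with_schema(raw_store, {"order_id": int, "amount": float}))
browser_view = list(read_with_schema(raw_store, {"browser": str}))
print(sales_view)
print(browser_view)
```

The trade-off is the mirror image of schema-on-write: loading is trivial and nothing is lost, but every query pays the parsing and conversion cost and must cope with irregular records.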
Data Lake + Data Warehouse: Better Together
Data sources: relational systems (OLTP, ERP, CRM, LOB applications) and non-relational sources (devices, social, video, web, sensors, clickstream).
ETL loads the data warehouse, which feeds BI and analytics: dashboards and reporting.
- Descriptive analytics: What happened?
- Diagnostic analytics: Why did it happen?
- Predictive analytics: What will happen?
- Prescriptive analytics: How can we make it happen?
Data Lake and Data Warehouse
Data Lake | Data Warehouse
Complementary to the DW | Can be sourced from the data lake
Schema-on-read | Schema-on-write
Detailed data | Refined data
Optimized for cost | Optimized for latency
Data discovery | Data reusability
Low user concurrency | High user concurrency
Varying query performance | Predictable query performance
Big Data Clusters in SQL Server 2019
Scenarios: Data Virtualization, Data Lake, Scale-out Data Marts
Scenarios: Integrated Machine Learning
Use Cases: Every Industry Benefits from Big Data
Industry sector | Primary use cases
Retail | Demand prediction; in-store analytics; supply chain optimization; customer retention; cost/revenue analytics; HR analytics; inventory control
Finance | Cyberattack prevention; fraud detection; customer segmentation; market analysis; risk analysis; blockchain
Healthcare | Fiscal control analytics; disease prevention, prediction and classification; clinical trials optimization; patient load analysis; episode analytics
Public sector | Revenue prediction; education effectiveness analysis; transportation analysis and prediction; energy demand and supply prediction and control; defense readiness predictions and threat analysis
Manufacturing | Predictive maintenance (PdM); anomaly detection; pattern analysis
Agriculture | Food safety analysis; crop forecasting; market forecasting; pipeline optimization
SQL Server, Spark and HDFS
Apache Spark
A unified, distributed, open-source engine for large-scale data processing.
Spark unifies batch processing, interactive SQL, real-time processing, machine learning, deep learning and graph processing. On top of the Spark Core Engine it provides:
- Spark SQL: interactive queries
- Spark Streaming and Spark Structured Streaming: stream processing
- Spark MLlib: machine learning
- GraphX: graph computation
Cluster scheduling: YARN, Mesos or the standalone scheduler.
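Spark's core idea is to split a dataset into partitions, process each partition independently, and merge the partial results. The plain-Python sketch below needs no Spark installation; the partitioning scheme, sample lines and function names are illustrative assumptions. It mimics a word count in which each partition is pre-aggregated locally before the merge, similar in spirit to Spark's `reduceByKey`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

lines = [
    "spark unifies batch and streaming",
    "spark sql runs interactive queries",
    "mllib brings machine learning to spark",
    "graphx handles graph computation",
]

def partition(data, num_partitions):
    """Split the input into roughly equal chunks, like RDD partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]

def count_partition(chunk):
    """Map side: count words within one partition (local pre-aggregation)."""
    return Counter(word for line in chunk for word in line.split())

def word_count(data, num_partitions=2):
    parts = partition(data, num_partitions)
    # Process partitions concurrently; threads stand in for Spark executors.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_partition, parts))
    # Reduce side: merge the per-partition counts.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

counts = word_count(lines)
print(counts.most_common(1))  # [('spark', 3)]
```

In PySpark, assuming an existing SparkContext `sc` and `import operator`, the same pipeline would typically be written as `sc.parallelize(lines).flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(operator.add)`, with Spark handling partitioning, scheduling and the merge.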
https://bigdata.ro/2019/09/26/ml-on-spark-workshop/
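HDFS
A scalable, reliable and highly distributed file system for storing structured and unstructured data.
- NameNode: holds the namespace state and the block map (which DataNodes hold each block).
- DataNodes: store the blocks and replicate them to other DataNodes.
- Adding a block (write): the client calls create and addBlock on the NameNode, then writes the block data to DataNodes.
- Reading a block: the client calls getFileInfo and getLocations on the NameNode, then reads the block from a DataNode.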
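A toy model of the NameNode's block map makes the write and read paths on this slide concrete. This is a simplified sketch under stated assumptions: fixed-size 128 MB blocks and a replication factor of 3 (both HDFS defaults), but simple round-robin replica placement instead of HDFS's rack-aware policy, and no actual data transfer or DataNode-to-DataNode pipelining. The class and method names are hypothetical and only loosely echo the `create` / `addBlock` / `getLocations` calls.

```python
from itertools import cycle

class ToyNameNode:
    """Keeps namespace state (path -> block ids) and the block map (block id -> DataNodes)."""

    def __init__(self, datanodes, block_size, replication=3):
        self.block_size = block_size
        # Never ask for more replicas than there are DataNodes.
        self.replication = min(replication, len(datanodes))
        self.namespace = {}   # path -> list of block ids
        self.block_map = {}   # block id -> list of DataNode names
        self._next_block = 0
        self._placement = cycle(datanodes)  # round-robin, NOT rack-aware

    def create(self, path):
        self.namespace[path] = []

    def add_block(self, path):
        """Allocate a new block for `path` and choose DataNodes for its replicas."""
        block_id = self._next_block
        self._next_block += 1
        targets = [next(self._placement) for _ in range(self.replication)]
        self.namespace[path].append(block_id)
        self.block_map[block_id] = targets
        return block_id, targets

    def get_locations(self, path):
        """Return (block id, DataNodes) pairs a client would read from."""
        return [(b, self.block_map[b]) for b in self.namespace[path]]


def write_file(namenode, path, size_bytes):
    """Client side: create the file, then request one block per block_size chunk."""
    namenode.create(path)
    num_blocks = -(-size_bytes // namenode.block_size)  # ceiling division
    for _ in range(num_blocks):
        namenode.add_block(path)

nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"], block_size=128 * 1024 * 1024)
write_file(nn, "/data/clicks.csv", 300 * 1024 * 1024)  # 300 MB -> 3 blocks
for block_id, nodes in nn.get_locations("/data/clicks.csv"):
    print(block_id, nodes)
```

The key design point carried over from HDFS is that the NameNode only stores metadata; clients ask it where blocks live and then talk to DataNodes directly, which keeps the NameNode off the data path.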
Scenarios (source: dilbert.com)
VMs vs Containers
Container Orchestration: Containers at Scale
A Kubernetes master runs application tiers (web tier, business logic, data tier) as containers across the cluster and provides:
- Horizontal scaling
- Load balancing
- Self-healing
- Storage orchestration
- Service discovery
- Automated rollouts and rollbacks
- Secret and configuration management
- Batch execution
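Self-healing and horizontal scaling both follow from Kubernetes' declarative model: controllers repeatedly compare the desired state with the observed state and act to close the gap. The Python sketch below illustrates only that reconciliation idea; it does not use the Kubernetes API, and the `reconcile` function and `web-N` pod names are hypothetical.

```python
import itertools

_ids = itertools.count(1)  # hypothetical pod name generator

def reconcile(desired_replicas, running_pods):
    """One pass of a toy reconciliation loop.

    Returns the new list of running pods plus the actions taken, so that
    the observed replica count converges to the desired count.
    """
    pods = list(running_pods)
    actions = []
    while len(pods) < desired_replicas:   # too few: start replacements
        pod = f"web-{next(_ids)}"
        pods.append(pod)
        actions.append(("create", pod))
    while len(pods) > desired_replicas:   # too many: scale in
        pod = pods.pop()
        actions.append(("delete", pod))
    return pods, actions

# Start from nothing and ask for three web-tier pods.
pods, actions = reconcile(3, [])
# A node failure takes out one pod; the next pass replaces it (self-healing).
pods.remove(pods[0])
pods, actions = reconcile(3, pods)
print(actions)  # [('create', 'web-4')]
# Scaling in is the same loop with a smaller target.
pods, actions = reconcile(2, pods)
```

Because the loop only looks at the difference between desired and actual state, the same mechanism covers first deployment, recovery from failures and scaling in either direction.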
Azure Kubernetes Service (AKS)
Ship faster, operate easily, and scale confidently with managed Kubernetes on Azure:
- Manage Kubernetes with ease
- Accelerate containerized development
- Build on an enterprise-grade, secure foundation
- Run anything, anywhere
Container Orchestration: Containers at Scale
- Containers live in pods.
- Pods are abstractions that run within nodes.
- Nodes are physical machines (PCs) or VMs; each node runs a kubelet and kube-proxy.
- Clusters are groups of nodes, managed by the Kubernetes master.
- Storage is provided by volumes, mounted through a (persistent volume) claim.
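The container, pod, volume and claim relationship is easiest to see in a manifest. The sketch below builds a minimal PersistentVolumeClaim and Pod as Python dictionaries and serializes them as a `v1` `List`, a JSON form that `kubectl apply -f` accepts. The field names follow the Kubernetes core/v1 API; the resource names, image tag and storage size are placeholders chosen for illustration.

```python
import json

CLAIM_NAME = "demo-claim"  # hypothetical claim name

# The claim requests storage; the cluster binds it to a matching volume.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": CLAIM_NAME},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

# The container lives in the pod; the pod mounts the volume through the claim.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod"},  # hypothetical pod name
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "nginx:1.25",  # placeholder image
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }
        ],
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": CLAIM_NAME}}
        ],
    },
}

# Bundle both objects so they can be applied from a single file.
manifest = {"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}
print(json.dumps(manifest, indent=2))
```

The pod never names a disk directly: it references the claim by name, and the scheduler places the pod on a node where that storage can be attached.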
SQL Server Platform Evolution
SQL Server runs on Windows, on Linux and in containers, deployed on premises, in public or private clouds, or in hybrid environments.
Deployment
https://landscape.cncf.io/
Deployment
Deploy:
  azdata bdc create --accept-eula=yes
Deploy with a custom configuration: generate it from the aks-dev-test profile, edit the files in the custom directory as needed, then create the cluster from it:
  azdata bdc config init --source aks-dev-test --target custom
  azdata bdc create -c custom --accept-eula=yes
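When scripting the deployment, passing each command as an argument list rather than a single shell string avoids quoting problems, including the typographic dash that slides often substitute for `-`. The helper below is a hypothetical wrapper: it reuses only the `azdata` commands and flags shown on this slide, assumes `azdata` is on the PATH and already pointed at your Kubernetes cluster, and executes nothing unless `dry_run=False`.

```python
import subprocess

def bdc_deploy_commands(profile_source="aks-dev-test", target="custom"):
    """Build the azdata commands for a customized deployment as argument lists."""
    return [
        ["azdata", "bdc", "config", "init", "--source", profile_source, "--target", target],
        ["azdata", "bdc", "create", "-c", target, "--accept-eula=yes"],
    ]

def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence if a step fails.
            subprocess.run(argv, check=True)

cmds = bdc_deploy_commands()
run(cmds)  # dry run: prints the commands without executing them
```

With `dry_run=False` the two steps run back to back, so the generated profile is used unedited; to customize it, run the `config init` step, edit the files in the target directory, and only then run `create`.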
Big Data Clusters for SQL Server 2019 – Architecture
SQL Server 2019 and Big Data: OLTP, Data Virtualization, Data Mart and Big Data
A big data cluster (BDC) runs on Kubernetes and is made up of:
- Controller: shared services that control the BDC
- SQL Server master instance
- Compute pool: SQL Server instances
- Data pool: SQL Server instances
- Storage pool: SQL Server, Spark and HDFS
- App pool: ML Server, jobs (SSIS) and web apps
Resources
- Official documentation: aka.ms/bdc
- In-depth training: aka.ms/sqlworkshops
Demo: SQL Server 2019 Big Data Clusters
Big Data Clusters for SQL Server 2019 – Data Virtualization
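SQL Server 2019 and Big Data: Data Virtualization
Scale-out PolyBase follows a PDW (Parallel Data Warehouse) style design: an orchestrator plans each query and distributes the work to DMS (Data Movement Service) executors, which perform the operations. Each executor reaches an external source, such as a NoSQL store or another RDBMS, through a PolyBase connector. The source is described by an external data source (and, for file-based data, a format) and is queried through an external table.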
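The orchestrator/executor split can be sketched in plain Python. In this simplified model (all names and data are hypothetical, not PolyBase internals), each source stands behind its own connector; the orchestrator sends the same filter to every executor, each executor applies it where the data lives (predicate pushdown) and returns only matching rows, and the orchestrator combines the results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical external sources, each reached through its own "connector".
SOURCES = {
    "rdbms": [{"customer": "A", "country": "RO", "spend": 120}],
    "nosql": [
        {"customer": "B", "country": "RO", "spend": 80},
        {"customer": "C", "country": "DE", "spend": 200},
    ],
    "hdfs": [{"customer": "D", "country": "RO", "spend": 45}],
}

def executor(source_name, predicate):
    """Executor: scan one source and apply the filter at the source (pushdown)."""
    return [row for row in SOURCES[source_name] if predicate(row)]

def orchestrator(predicate):
    """Orchestrator: fan the query out to one executor per source, then merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda name: executor(name, predicate), SOURCES))
    return [row for partial in partials for row in partial]

result = orchestrator(lambda row: row["country"] == "RO")
print(sorted(r["customer"] for r in result))  # ['A', 'B', 'D']
```

Filtering inside each executor means non-matching rows (customer C here) never travel to the orchestrator, which is the main reason pushdown matters when the sources are large.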
Big Data Clusters for SQL Server 2019 – Data Mart
SQL Server 2019 and Big Data: Data Mart
PolyBase connectors bring data from HDFS, Cosmos DB and other RDBMS sources through the compute pool into the SQL Server data pool, which stores the data mart.
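The data pool spreads a data mart's rows across several SQL Server instances. The sketch below models that idea assuming round-robin distribution of rows across shards, followed by a scatter-gather aggregate; the shard count, sample rows and function names are illustrative assumptions, not the data pool's implementation.

```python
from itertools import cycle

def distribute_round_robin(rows, num_shards):
    """Ingest: assign rows to data pool shards in turn."""
    shards = [[] for _ in range(num_shards)]
    targets = cycle(range(num_shards))
    for row in rows:
        shards[next(targets)].append(row)
    return shards

def local_totals(shard):
    """Each shard aggregates its own rows, where the data is stored."""
    totals = {}
    for product, qty in shard:
        totals[product] = totals.get(product, 0) + qty
    return totals

def query_total_by_product(shards):
    """Scatter the aggregate to every shard, then gather and combine partial sums."""
    combined = {}
    for partial in map(local_totals, shards):
        for product, qty in partial.items():
            combined[product] = combined.get(product, 0) + qty
    return combined

# Hypothetical rows ingested from HDFS, Cosmos DB or another RDBMS.
rows = [("bike", 2), ("helmet", 5), ("bike", 1), ("lock", 3), ("helmet", 1)]
shards = distribute_round_robin(rows, num_shards=2)
print(query_total_by_product(shards))  # {'bike': 3, 'helmet': 6, 'lock': 3}
```

Round-robin keeps shards evenly sized regardless of the data's values; the cost is that any aggregate must visit every shard and combine partial results.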
Q&A