Big Data Clusters SQL Server 2019 Meets Big Data

Presentation on theme: "Big Data Clusters SQL Server 2019 Meets Big Data"— Presentation transcript:

1 Big Data Clusters SQL Server 2019 Meets Big Data
11/28/2019
Sorin Pește, Cloud Solutions Architect, Data & AI, Microsoft
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 A Modern Data Warehouse

3 Traditional business analytics process
Start with end-user requirements to identify desired reports and analysis
Define corresponding database schema and queries
Identify the required data sources
Create an Extract-Transform-Load (ETL) pipeline to extract required data (curation) and transform it to target schema (‘schema-on-write’)
Create reports. Analyze data
All data not immediately required is discarded or archived
Cycle: New requirements > Identify data schema and queries > Identify data sources > Create ETL pipeline > Create reports > Do analytics
Diagram: Relational LOB Applications > ETL pipeline (dedicated ETL tools, e.g. SSIS) > Defined schema > Queries > Results
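As a rough illustration of the schema-on-write step described on this slide, the sketch below fixes a target schema first and lets an ETL function keep only what fits it. The `SaleRow` schema, the `etl` function and the sample records are all made up for the example; a real pipeline would use a dedicated tool such as SSIS.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SaleRow:
    """Target schema, fixed before any data is loaded (schema-on-write)."""
    order_id: int
    sold_on: date
    amount: float


def etl(raw_records):
    """Extract the required fields, transform them to the target schema,
    and count the records that do not fit it."""
    rows, discarded = [], 0
    for rec in raw_records:
        try:
            rows.append(SaleRow(
                order_id=int(rec["order_id"]),
                sold_on=date.fromisoformat(rec["sold_on"]),
                amount=float(rec["amount"]),
            ))
        except (KeyError, ValueError):
            discarded += 1  # record has no place in the schema
    return rows, discarded


# Hypothetical line-of-business records; extra fields are not carried over.
raw = [
    {"order_id": "1", "sold_on": "2019-11-01", "amount": "19.90", "clickpath": "a>b>c"},
    {"order_id": "2", "sold_on": "2019-11-02", "amount": "5.00"},
    {"sensor": "t-17", "reading": "21.5"},
]
rows, discarded = etl(raw)
total = sum(r.amount for r in rows)
```

Note that the `clickpath` field and the sensor record are lost at load time, which is the "discarded or archived" behaviour the slide points out.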

4 New big data thinking: All data has value
All data has potential value
Data hoarding
No defined schema: stored in native format
Schema is imposed and transformations are done at query time (schema-on-read).
Apps and users interpret the data as they see fit
Cycle: Gather data from all sources > Store indefinitely > Analyze > See results > Iterate
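A minimal sketch of schema-on-read, assuming the raw data is kept as JSON lines: everything is stored untouched, and each query brings its own schema. The `RAW_STORE` contents and the `read_with_schema` helper are hypothetical names used only for illustration.

```python
import json

# Raw events kept in their native format (JSON lines here); nothing is
# rejected or reshaped at write time. The records are made up.
RAW_STORE = [
    '{"order_id": 1, "amount": 19.9, "clickpath": "a>b>c"}',
    '{"sensor": "t-17", "reading": 21.5}',
    '{"order_id": 2, "amount": 5.0}',
    '{"sensor": "t-17", "reading": 22.1}',
]


def read_with_schema(raw_lines, schema):
    """Impose `schema` (field name -> converter) at query time.
    Lines lacking a field are skipped for this query only; they stay stored."""
    for line in raw_lines:
        rec = json.loads(line)
        if all(name in rec for name in schema):
            yield {name: convert(rec[name]) for name, convert in schema.items()}


# Two consumers interpret the same store in two different ways.
sales = list(read_with_schema(RAW_STORE, {"order_id": int, "amount": float}))
readings = list(read_with_schema(RAW_STORE, {"sensor": str, "reading": float}))
```

The same store serves both queries, and a third interpretation can be added later without reloading anything.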

5 Data Lake + Data Warehouse Better Together
Diagram: Data sources (OLTP, ERP, CRM, LOB) > ETL > Data warehouse > BI and analytics (Dashboards, Reporting)
Other sources: LOB applications, Devices, Social, Video, Relational, Web, Sensors, Clickstream
Descriptive Analytics: What happened?
Diagnostic Analytics: Why did it happen?
Predictive Analytics: What will happen?
Prescriptive Analytics: How can we make it happen?

6 Data Lake and Data Warehouse

7 Data Lake and Data Warehouse
Data Lake | Data Warehouse
Complementary to DW | Can be sourced from Data Lake
Schema-on-read | Schema-on-write
Detailed Data | Refined Data
Optimized for Cost | Optimized for Latency
Data Discovery | Data Reusability
Low User Concurrency | High User Concurrency
Varying Query Perf | Predictable Query Perf

8 Big Data Clusters in SQL Server 2019

9 Scenarios: Data Virtualization, Data Lake, Scale-out Data Marts

10 Scenarios: Integrated Machine Learning

11 Use Cases Every Industry benefits from Big Data
Industry Sector | Primary Use-Cases
Retail | Demand prediction; In-store analytics; Supply chain optimization; Customer retention; Cost/Revenue analytics; HR analytics; Inventory control
Finance | Cyberattack Prevention; Fraud detection; Customer segmentation; Market analysis; Risk analysis; Blockchain
Healthcare | Fiscal control analytics; Disease Prevention prediction and classification; Clinical Trials optimization; Patient load analysis; Episode analytics
Public Sector | Revenue prediction; Education effectiveness analysis; Transportation analysis and prediction; Energy demand and supply prediction and control; Defense readiness predictions and threat analysis
Manufacturing | Predictive Maintenance (PdM); Anomaly Detection; Pattern analysis
Agriculture | Food Safety analysis; Crop forecasting; Market forecasting; Pipeline Optimization

12 SQL Server, Spark and HDFS

13 Spark Structured Streaming
Apache Spark: a unified, distributed, open source engine for large-scale data processing
Spark unifies: Batch Processing, Interactive SQL, Real-time processing, Machine Learning, Deep Learning, Graph Processing
Spark Core Engine
Schedulers: Yarn, Mesos, Standalone Scheduler
Spark SQL: Interactive Queries
Spark MLlib: Machine Learning
Spark Streaming: Stream processing
Spark Structured Streaming: Stream processing
GraphX: Graph Computation
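The "unified" claim can be illustrated without Spark itself. The plain-Python sketch below is not the Spark API; it only shows the idea behind Structured Streaming that one piece of aggregation logic can run over a complete data set or incrementally over micro-batches and arrive at the same answer. The function name and the sample lines are assumptions made for the example.

```python
from collections import Counter


def count_words(lines, state=None):
    """Word count that can start fresh (batch) or continue from earlier
    state (one micro-batch of a stream)."""
    counts = Counter() if state is None else state
    for line in lines:
        counts.update(line.lower().split())
    return counts


data = ["spark sql", "spark streaming", "structured streaming"]

# Batch: process the whole data set at once.
batch_result = count_words(data)

# Streaming: the same logic applied to micro-batches as they arrive.
stream_state = None
for micro_batch in (data[:1], data[1:2], data[2:]):
    stream_state = count_words(micro_batch, stream_state)
```

In Spark the engine manages the running state and the scheduling; here the loop stands in for both.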

14 https://bigdata.ro/2019/09/26/ml-on-spark-workshop/

HDFS: a scalable, reliable and highly distributed file system to store structured and unstructured data
Diagram: Client; Name Node (Name Space State, Block Map); Data Nodes (Replicate)
Adding a Block: create, addBlock, Write
Reading a Block: getLocations, getFileinfo, Read
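To make the name node's role concrete, here is a toy model of the two structures named on the slide, the namespace and the block map. The method names follow the slide's labels rather than the real HDFS RPC interface, and the rotating replica placement is a simplification chosen for the example; actual HDFS placement also takes racks into account.

```python
import itertools


class NameNode:
    """Toy name node: keeps the namespace (file -> block ids) and the
    block map (block id -> data nodes holding a replica)."""

    def __init__(self, data_nodes, replication=3):
        self.data_nodes = list(data_nodes)
        self.replication = min(replication, len(self.data_nodes))
        self.namespace = {}
        self.block_map = {}
        self._ids = itertools.count(1)
        self._next = 0  # rotating start position for replica placement

    def create(self, path):
        """Register an empty file in the namespace."""
        self.namespace[path] = []

    def add_block(self, path):
        """Allocate a block and choose the data nodes that will store it."""
        block_id = next(self._ids)
        n = len(self.data_nodes)
        targets = [self.data_nodes[(self._next + i) % n]
                   for i in range(self.replication)]
        self._next = (self._next + 1) % n
        self.namespace[path].append(block_id)
        self.block_map[block_id] = targets
        return block_id, targets

    def get_locations(self, path):
        """Per block of the file, the data nodes a client can read from."""
        return [(b, self.block_map[b]) for b in self.namespace[path]]


nn = NameNode(["dn1", "dn2", "dn3", "dn4"], replication=3)
nn.create("/data/sales.csv")
first = nn.add_block("/data/sales.csv")
second = nn.add_block("/data/sales.csv")
locations = nn.get_locations("/data/sales.csv")
```

The client never sends file contents to the name node: it asks where blocks go (or where they are) and then writes to or reads from the data nodes directly.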

16 Scenarios source: dilbert.com

17 VMs vs Containers

18 Container Orchestration Containers at Scale
Kubernetes features:
Horizontal scaling
Load balancing
Self-healing
Storage orchestration
Service discovery
Automated rollouts and rollbacks
Secret and configuration management
Batch execution
Diagram: a Kubernetes Master managing containers for the Web Tier, Business Logic and Data Tier
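Self-healing and horizontal scaling both come down to comparing a desired state with what is actually running. The sketch below shows that comparison in its simplest form; it is not Kubernetes code, the `reconcile` function is a hypothetical name, and the replica counts simply mirror the tiers drawn on the slide.

```python
def reconcile(desired, running):
    """Return the actions needed to move `running` (tier -> healthy replica
    count) to `desired` (tier -> wanted replica count)."""
    actions = []
    for tier, want in sorted(desired.items()):
        have = running.get(tier, 0)
        if have < want:
            actions.append(("start", tier, want - have))
        elif have > want:
            actions.append(("stop", tier, have - want))
    return actions


# Desired replica counts per tier.
desired = {"web": 3, "business-logic": 2, "data": 5}

# Observed state: one data-tier container has crashed, web has one too many.
running = {"web": 4, "business-logic": 2, "data": 4}

actions = reconcile(desired, running)
```

An orchestrator runs this kind of check continuously, so a crashed container is replaced without anyone issuing a command.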

19 Azure Kubernetes Service (AKS)
Ship faster, operate easily, and scale confidently with managed Kubernetes on Azure
Manage Kubernetes with ease
Accelerate containerized development
Build on an enterprise-grade, secure foundation
Run anything, anywhere

20 Container Orchestration Containers at Scale
Container(s) live in Pods
Pod(s) are abstractions within Nodes
Node(s) are PCs or VMs
Cluster(s) are groups of Nodes
Storage is by means of Volume(s) mounted through a Claim
Diagram: a Kubernetes Master and several Nodes; each Node runs kubelet and kube-proxy and hosts Pods
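The containment hierarchy on this slide maps naturally onto nested data structures. The sketch below is only a mental model, not the Kubernetes object model; every class and instance name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pod:
    name: str
    containers: List[str]  # container image names
    volume_claims: List[str] = field(default_factory=list)


@dataclass
class Node:
    name: str  # a physical machine or a VM
    pods: List[Pod] = field(default_factory=list)


@dataclass
class Cluster:
    nodes: List[Node] = field(default_factory=list)

    def containers(self):
        """Walk cluster -> node -> pod -> container."""
        return [(n.name, p.name, c)
                for n in self.nodes for p in n.pods for c in p.containers]


cluster = Cluster(nodes=[
    Node("node-1", [Pod("sql-0", ["mssql-server"], ["data-claim"])]),
    Node("node-2", [Pod("web-0", ["frontend", "log-agent"])]),
])
placement = cluster.containers()
```

A pod with two containers, as in `web-0`, is scheduled onto a node as one unit, and storage is attached to the pod through its claims rather than to the node.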

21 SQL Server Platform Evolution
Hybrid: On Premises, Public/Private cloud
SQL Server on Windows, SQL Server on Linux, SQL Server in Containers

22 Deployment https://landscape.cncf.io/

23 Deployment azdata bdc create --accept-eula=yes
azdata bdc create --accept-eula=yes
azdata bdc config init --source aks-dev-test --target custom
azdata bdc create -c custom --accept-eula=yes
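The two deployment paths on this slide can be scripted. The sketch below only assembles and prints the commands (a dry run) and uses nothing beyond the flags shown on the slide; the helper names are hypothetical, and whether these flags match your installed azdata version is something to check against its own help output.

```python
import shlex


def default_deployment():
    """Command for a deployment with the default configuration."""
    return ["azdata", "bdc", "create", "--accept-eula=yes"]


def custom_deployment(source="aks-dev-test", target="custom"):
    """Commands that copy a built-in profile to `target` and deploy from it."""
    return [
        ["azdata", "bdc", "config", "init", "--source", source, "--target", target],
        ["azdata", "bdc", "create", "-c", target, "--accept-eula=yes"],
    ]


def render(cmd):
    """Quote an argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


for cmd in [default_deployment(), *custom_deployment()]:
    print(render(cmd))  # dry run: print instead of executing
```

Keeping each command as an argument list means it could later be handed to `subprocess.run` unchanged, with any edits to the copied profile made between the two custom-deployment steps.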

24 Big Data Clusters for SQL Server 2019 – Architecture

25 SQL Server 2019 and Big Data Control BDC
OLTP, Data Virtualization, Data Mart and Big Data
Diagram components:
Control: Kubernetes Master, Controller
SQL Server Master
Compute Pool: SQL Server
Data Pool: SQL Server
Storage Pool: SQL Server, Spark, HDFS
App Pool: ML Server, Job (SSIS), Web Apps
Shared Services

26 Resources Official documentation – aka.ms/bdc
Official documentation: aka.ms/bdc
In-depth training: aka.ms/sqlworkshops

27 Demo SQL Server 2019 Big Data Clusters

28 Big Data Clusters for SQL Server 2019 – Data Virtualization

29 SQL Server 2019 and Big Data Data Virtualization
Diagram: Scale-Out PolyBase
PDW (Orchestrator)
DMS (Executor: performs operations), several instances, each with a PolyBase Connector
External Table > Data Source (Format): NoSQL, RDBMS
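The orchestrator and executor split in this diagram is a scatter and gather pattern. The sketch below shows that pattern in plain Python under simplifying assumptions: threads stand in for executors, a list of dictionaries stands in for the external table, and the function names are made up. It says nothing about how PolyBase actually partitions work.

```python
from concurrent.futures import ThreadPoolExecutor


def executor_scan(partition, predicate):
    """One executor: read its partition of the external data and apply the
    operation handed to it (a filter here)."""
    return [row for row in partition if predicate(row)]


def orchestrate(rows, predicate, executors=3):
    """Orchestrator: split the work, run it on the executors, merge results."""
    partitions = [rows[i::executors] for i in range(executors)]
    with ThreadPoolExecutor(max_workers=executors) as pool:
        parts = pool.map(executor_scan, partitions, [predicate] * executors)
    merged = [row for part in parts for row in part]
    return sorted(merged, key=lambda r: r["id"])


# Made-up rows standing in for an external table.
external_rows = [{"id": i, "amount": i * 10} for i in range(1, 10)]
big_orders = orchestrate(external_rows, lambda r: r["amount"] >= 60)
```

The caller sees a single result set, which is the point of data virtualization: the query is written once and the fan-out stays hidden.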

30 Big Data Clusters for SQL Server 2019 – Data Mart

31 SQL Server 2019 and Big Data Data Mart
Diagram: external sources (HDFS, Cosmos DB, RDBMS) reached through the PolyBase Connector and the Compute Pool, feeding the SQL Server Data Pool
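A scale-out data mart spreads ingested rows over several SQL Server instances and combines partial results at query time. The sketch below assumes round-robin distribution across two instances and uses made-up rows and function names; it illustrates the idea only and is not how the data pool is programmed.

```python
def ingest_round_robin(rows, instances=2):
    """Spread incoming rows across data pool instances in turn."""
    shards = [[] for _ in range(instances)]
    for i, row in enumerate(rows):
        shards[i % instances].append(row)
    return shards


def total_by_region(shards):
    """Each shard aggregates its own rows; the partial sums are then combined."""
    combined = {}
    for shard in shards:
        partial = {}
        for row in shard:
            partial[row["region"]] = partial.get(row["region"], 0) + row["amount"]
        for region, amount in partial.items():
            combined[region] = combined.get(region, 0) + amount
    return combined


# Made-up rows standing in for data pulled from HDFS, Cosmos DB or an RDBMS.
rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 20},
    {"region": "EU", "amount": 5},
    {"region": "US", "amount": 1},
    {"region": "EU", "amount": 2},
]
shards = ingest_round_robin(rows, instances=2)
totals = total_by_region(shards)
```

Because each instance only scans its own shard, adding instances shortens the scan while the combined answer stays the same.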

32 Q&A
