Microsoft Analytics Platform System

1 Microsoft Analytics Platform System
02 – Hardware/Software Architecture
Brian Walker | Microsoft Architect – Data Insights COE
Jesse Fountain | Microsoft WW TSP Lead
November 23, 2018

2 Agenda
SMP and MPP differences
What is APS?
APS hardware components
High availability
PDW region overview
PDW tools of the trade

3 Differences between SMP and MPP

4 SQL upgrade prospect scenario: SQL Server 2014 or APS
SQL Server customers with scale and performance needs have a choice to make.
Start here: Customer Application on SQL 2008 R2

SQL Server 2014 path:
Step 1: Upgrade. Affects all systems. Total upgrade, two releases ahead.
Infrastructure re-architecture
Replace/upgrade SAN
Regression testing
Re-optimization is required for the new version
Step 2: Optimize
Re-optimization required
Unknown maintenance and cost requirements
Unknown gain in performance
Heavy testing and thought required
New requirements (more business, more growth): Solution? New hardware and back to Step 1

APS path:
Step 1: Migrate. Migrate once. Quickly deliver value.
Pre-optimized infrastructure
Best-practice configuration built in
Automatic data distribution and placement
New requirements? Solution: add capacity
Push-button upgrade
Automatic data distribution
Infrastructure optimized
Data center rack optimized
Migrate once and upgrades are managed

5 Data warehousing comparison: SMP vs. MPP
The spiderweb depicts important attributes to consider when evaluating data warehousing options. Big Data support is the newest dimension.
APS: multi-dimensional scalability
SMP: tunable in one dimension at the cost of other dimensions
Chart axes and scale labels:
Data Volume: 10 TB, 50 TB, 100 TB, 500 TB, 5 PB
Query Concurrency: 100, 1,000, 10,000
Mixed Workload: Strategic; Strategic, Tactical; Loads; Loads, SLA
Data Freshness: Weekly; Daily Load; Near Real Time Data Feeds
Query Complexity: 3-5 Way Joins; 5-10 Way Joins; Joins + OLAP operations + Aggregation + Complex “Where” constraints + Views; Parallelism
Schema Sophistication: Simple Star; Multiple, Integrated Stars; Multiple, Integrated Stars and Normalized; Normalized
Query Freedom: Batch Reporting, Repetitive Queries; Ad Hoc Queries; Data Analysis/Mining
Query Data Volume: MBs, GBs, TBs

6 Management simplicity and lower operational costs
Table columns: Database Administration Task | APS (MPP SQL Server) | SQL Server 2014 (SMP SQL Server)
Table cells as extracted: Logical Data Modeling High Physical Data Modeling Low Data Partitioning Definition Data Placement Definition Auto Free Space Management Data Balancing Control Data Reorganization None Moderate Index Reorganization Workspace Management Query Tuning Change Management Rearchitect Environment Never Often

Database management in APS
Built-in high availability and failover
Linear scalability by adding nodes
Minimal tuning efforts and troubleshooting
Simple database and table definition
Unified administration console
Minimum ongoing maintenance
No need to manage disk or database subsystems
No detailed space management needed
No memory/cache management needed
No optimizer hints needed
No need to manage parallelism
No need to manage physical computing nodes
No index reorgs needed
No index rebuilds needed
Use DBAs for higher value activity, not low-level system management

7 What is APS?

8 Microsoft Analytics Platform System
Low-Cost, Rapid Value
Appliance: system is pre-configured at the factory
Industry Standard: co-engineered with HP/Dell/Quanta
Automatic Compression: 5X-15X with columnar technology

Insight on All Data
POLYBASE-SQL: easy access to both relational and Hadoop data
Near Real-Time: access multiple data sources quickly
Shared-Nothing: allows linear scalability to store all historical data

High-Performance Analytics MPP
Powerful: Massively Parallel Processing (MPP) engine
Mature: parallel cost-based optimizer from SQL Server
Dedicated: direct-attached high-speed servers and storage

9 The foundation for data warehousing and advanced analytics
Combines hardware and software to provide a turn-key, balanced platform specific to data warehouse and analytical workloads
Built for easy scale-out as data warehouse capacity requirements grow
Deep, native integration with Hadoop
High-performance, concurrent data warehouse workloads for simultaneous data loading and query
Built-in development engineering best practices
Integrated systems monitoring and management

10 New features come first to APS
Updateable Column Store
Agnostic Hadoop Integration via Polybase
Cardinality Estimation
Cost-Based Distributed SQL Query Engine
Hub and Spoke Architecture Support
Analytical Functions (e.g. Lag and Lead)
Incremental functional releases each year

11 APS hardware components

12 Rack and network
Contains:
Rack
Ethernet Switches
InfiniBand Switches
Power Units (PDU)

13 PDW Base Scale Unit
Contains:
Orchestration Host
Passive Host
Optional Passive Host
Data Scale Unit

14 Hadoop Base Scale Unit
Contains:
Rack & Network
PDW Base Scale Unit
Orchestration Host
Passive Host
Data Scale Unit

15 Data Scale Unit
Servers “active” in WFC
Unit of growth
Used by both regions
Varies in size: by vendor, by appliance size
Uses existing switches

16 Details: HP configuration
2 – 56 compute nodes
1 – 7 racks
1, 2, or 3 TB drives
15.1 – TB raw
53 – 6342 TB user data
Up to 7 spare nodes available across the entire appliance

Rack 1, Base Unit (6U): Redundant InfiniBand, Redundant Ethernet, Mgmt & Control (Active), Rack Failover Node (Passive)
Racks 2 and 3, Extension Base Unit (5U): Redundant InfiniBand, Redundant Ethernet, Rack Failover Node (Passive)
Reserved Customer Space (9U or 8U): ETL Servers, Backup Servers, Passive Unit (additional spares)
Reserved Space (9U)

Each Scale Unit (7U): 2 HP 1U servers (16 cores each, 32 total), one 5U JBOD with 1 TB drives, user data capacity 75 TB
The first scale unit in a rack is labeled Base Unit (rack 1) or Extension Base Unit (racks 2 and 3)
Rack 1: JBOD 1 – 4, Compute Node 1 – 8
Rack 2: JBOD 5 – 8, Compute Node 9 – 16
Rack 3: JBOD 9 – 12, Compute Node 17 – 24
JBOD n serves Compute Node 2n-1 and Compute Node 2n

Raw capacity (1 TB drives):
¼ rack: 15 TB
½ rack: 30 TB
Full rack: 60 TB
1¼ rack: 75.5 TB
1½ rack: 90.6 TB
2 racks: 120.8 TB
3 racks: 181.2 TB

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

17 Dell and Quanta configuration
2 – 54 compute nodes
1 – 6 racks
1, 2, or 3 TB drives
22.65 – TB raw
79 – 6116 TB user data
Up to 6 spare nodes available across the entire appliance

PDW Backplane (6U): Redundant InfiniBand, Redundant Ethernet, Mgmt & Control (Active), Rack Failover Node (Passive)
Reserved (6U)

Each Base Unit (10U): 3 servers in a 2U enclosure (16 cores each, 48 total), 2 JBODs of 4U each with 1 TB drives, user data capacity 79 TB
Base Unit 1: JBOD 1 – 2, Compute Node 1 – 3
Base Unit 2: JBOD 3 – 4, Compute Node 4 – 6
Base Unit 3: JBOD 5 – 6, Compute Node 7 – 9

Raw capacity (1 TB drives):
1/3 rack: 22.6 TB
2/3 rack: 45.3 TB
Full rack: 67.9 TB
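
The raw-capacity labels on the HP and Dell/Quanta slides appear to follow from simple multiplication: about 15.1 TB raw per HP scale unit (two compute nodes, four units per rack) and 22.65 TB raw per Dell/Quanta base unit (three compute nodes, three units per rack), both with 1 TB drives. The Python sketch below reproduces that arithmetic. The dictionary and function names are illustrative only and not part of any APS tool, and the slides truncate some labels (for example 15 TB for a quarter HP rack).

```python
# Per-unit figures read off the slides (1 TB drives); names are illustrative.
HP = {"raw_tb_per_unit": 15.1, "nodes_per_unit": 2, "units_per_rack": 4}
DELL_QUANTA = {"raw_tb_per_unit": 22.65, "nodes_per_unit": 3, "units_per_rack": 3}


def appliance_size(vendor, scale_units):
    """Return (compute_nodes, racks, raw_tb) for a number of scale units."""
    nodes = scale_units * vendor["nodes_per_unit"]
    racks = scale_units / vendor["units_per_rack"]
    raw_tb = round(scale_units * vendor["raw_tb_per_unit"], 2)
    return nodes, racks, raw_tb


if __name__ == "__main__":
    for units in (1, 2, 4, 5, 6, 8, 12):
        print("HP", units, appliance_size(HP, units))
    for units in (1, 2, 3):
        print("Dell/Quanta", units, appliance_size(DELL_QUANTA, units))
```

Under these assumptions, 28 HP scale units give the 56 compute nodes and 7 racks quoted as the HP maximum, and 18 Dell/Quanta base units give 54 compute nodes and 6 racks.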

18 High availability

19 Failover in Action: Control host failure
HST01 node marked as failed (AD02 persists on HST02)
Cluster fails over to HST02
HST02 is already “warm”, so failover is very fast
Diagram (cluster WFOHST01):
HST01: CTL01, AD01, VMM
HST02: CTL01, AD02, VMM
HSA01: CMP01, ISCSI01 and HSA02: CMP02, ISCSI02 (storage DAS01)
HSA03: CMP03, ISCSI03 and HSA04: CMP04, ISCSI04 (storage DAS02)
HSA05: CMP05, ISCSI05 and HSA06: CMP06, ISCSI06 (storage DAS03)

20 Failover in Action: Compute node failure
Compute node marked as failed
PDW cluster restarts the compute node on a passive server
ISCSI VM does not fail over
Diagram (cluster WFOHST01):
HST01: CTL01, AD01, VMM
HST02: CMP01
HSA01: CMP01, ISCSI01 and HSA02: CMP02, ISCSI02 (storage DAS01)
HSA03: CMP03, ISCSI03 and HSA04: CMP04, ISCSI04 (storage DAS02)
HSA05: CMP05, ISCSI05 and HSA06: CMP06, ISCSI06 (storage DAS03)
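
The compute-node failover described on this slide can be pictured as a change in VM placement. The sketch below is a toy Python model, not how Windows failover clustering is implemented: it assumes a placement is just a mapping from host name to VM names, and the function name is hypothetical. It moves the compute VM to the passive host and leaves the iSCSI VM behind, matching the slide's statement that the ISCSI VM does not fail over.

```python
def fail_compute_host(placement, failed_host, passive_host):
    """Return a new VM placement after a compute host fails.

    placement maps host name -> list of VM names.
    Compute VMs (names starting with "CMP") restart on the passive host;
    iSCSI VMs go down with the failed host because they do not fail over.
    """
    new_placement = {host: list(vms) for host, vms in placement.items()}
    stranded = new_placement.pop(failed_host)
    moved = [vm for vm in stranded if vm.startswith("CMP")]
    new_placement.setdefault(passive_host, []).extend(moved)
    return new_placement


if __name__ == "__main__":
    before = {
        "HST01": ["CTL01", "AD01", "VMM"],
        "HST02": ["AD02"],               # passive host
        "HSA01": ["CMP01", "ISCSI01"],
        "HSA02": ["CMP02", "ISCSI02"],
    }
    after = fail_compute_host(before, "HSA01", "HST02")
    print(after)
```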

21 APS disk layout: LUNs and filegroups/files
Design details:
Each LUN is composed of 2 drives in a RAID 1 mirroring configuration
Distributions are now split into 2 files
TempDB and Log are across all 16 LUNs
No fixed TempDB or log size allocation
VHDXs are on JBODs to ensure high availability
Disk I/O further parallelized relative to V1: bandwidth to increase by ~70% in V2 RTM

JBOD disk map:
Disks 1 – 2: Node 1, Distribution A, file 1 (plus TempDB and Log)
Disks 3 – 4: Node 1, Distribution A, file 2 (plus TempDB and Log)
Disks 5 – 8: Node 1, Distribution B, files 1 and 2
...
Disks 29 – 30: Node 1, Distribution H, file 1
Disks 31 – 32: Node 1, Distribution H, file 2
Disks 33 – 34: Node 2, Distribution A, file 1 (plus TempDB and Log)
Disks 35 – 36 onward: remaining Node 2 distributions
...
Disks 65 – 66: Fabric storage (VHDXs for node)
Disks 67 – 70: Hot spares
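
The disk map on this slide follows a regular pattern: 8 distributions per node, 2 files per distribution, and one RAID 1 LUN of 2 disks per file, which gives 16 LUNs (32 disks) per node. The Python sketch below computes which disk pair holds a given file. It assumes LUNs are numbered in node, distribution, file order with file 1 before file 2, which is a reading of the diagram rather than a documented rule, and the function name is hypothetical.

```python
import string

DISTRIBUTIONS_PER_NODE = 8   # Distribution A..H on the slide
FILES_PER_DISTRIBUTION = 2   # "Distributions are now split into 2 files"
DISKS_PER_LUN = 2            # RAID 1 mirrored pair


def disks_for(node, distribution, file_no):
    """Return the (first, second) disk numbers of the LUN for one file.

    node is 1 or 2, distribution is a letter "A".."H", file_no is 1 or 2.
    Assumes LUNs are numbered in the order node, distribution, file.
    """
    dist_index = string.ascii_uppercase.index(distribution)
    valid = node in (1, 2) and dist_index < DISTRIBUTIONS_PER_NODE and file_no in (1, 2)
    if not valid:
        raise ValueError("outside the layout shown on the slide")
    luns_per_node = DISTRIBUTIONS_PER_NODE * FILES_PER_DISTRIBUTION
    lun = (node - 1) * luns_per_node + dist_index * FILES_PER_DISTRIBUTION + (file_no - 1)
    first = lun * DISKS_PER_LUN + 1
    return first, first + 1


if __name__ == "__main__":
    print(disks_for(1, "A", 1))
    print(disks_for(1, "H", 2))
    print(disks_for(2, "A", 1))
```

With this numbering the last data LUN (Node 2, Distribution H, file 2) lands on disks 63 and 64, leaving the remaining disks of the 70-disk JBOD for fabric storage and hot spares.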

22 Hadoop region HA
Head/Controlling Nodes behave exactly the same as for PDW
Data Nodes are different
APS relies on Hadoop data replication for data availability
Disks are not mirrored
Data Nodes do not fail over
Replication factor is configurable
Table columns: #Scale Units | Replication Factor | Polybase
Table cells as extracted: =1 2 3 >1

23 PDW region overview

24 Appliance: PDW Region and Hadoop Region
Diagram (cluster WFOHST01):
Hosts: HST01, HST02, HSA01, HSA02
Virtual machines: CTL01, AD01, AD02, VMM, ISCSI01 – ISCSI06, CMP01 – CMP06
Storage: DAS01 – DAS03

25 PDW region nodes
PDW Nodes: Control, Compute (>1)
Infrastructure Nodes: Virtual Machine Manager, Fabric Active Directory
Diagram (cluster WFOHST01):
Hosts: HST01, HST02, HSA01 – HSA06
Virtual machines: CTL01, AD01, AD02, VMM, ISCSI01 – ISCSI06, CMP01 – CMP06
Storage: DAS01 – DAS03

26 Control and Compute workload nodes
Diagram: one Control Node and six Compute Nodes

27 Services inside the Control node
PDW Services:
PDW Engine
DMS Core
PDW Agent
SQL Server
Admin Console (IIS)

Responsibilities:
Parse SQL / Syntax Check
Validate & Authorize
Generate D-SQL Plan
Orchestrate D-SQL Plan
Collate Diagnostic Info
Admin Console Web App

28 Services inside the Compute node
PDW Services:
DMS Core
PDW Agent
SQL Server

Responsibilities:
Hold User Data
Process Queries
Move Data
Load Data
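
The split of responsibilities between the Control node (plan and orchestrate) and the Compute nodes (hold data, process queries) can be illustrated with a small scatter-gather sketch in Python. This is a simplified model under stated assumptions: it uses `zlib.crc32` as a stand-in hash (PDW's actual distribution hash is not described in these slides), six compute nodes as in the diagrams, and 8 distributions per node as in the disk layout slide. All function names are hypothetical.

```python
import zlib
from collections import defaultdict

COMPUTE_NODES = 6            # CMP01..CMP06 in the diagrams
DISTRIBUTIONS_PER_NODE = 8   # Distribution A..H per node


def distribution_for(key):
    """Map a distribution-column value to (compute node, distribution)."""
    total = COMPUTE_NODES * DISTRIBUTIONS_PER_NODE
    bucket = zlib.crc32(str(key).encode("utf-8")) % total  # stand-in hash
    return bucket // DISTRIBUTIONS_PER_NODE, bucket % DISTRIBUTIONS_PER_NODE


def load(rows):
    """'Load Data': place each (key, amount) row on its owning compute node."""
    nodes = defaultdict(list)
    for key, amount in rows:
        node, _dist = distribution_for(key)
        nodes[node].append((key, amount))
    return nodes


def total_amount(nodes):
    """Control-node role: collect a partial sum per compute node, then combine."""
    partials = [sum(amount for _key, amount in rows) for rows in nodes.values()]
    return sum(partials)


if __name__ == "__main__":
    placed = load((order_id, 10) for order_id in range(1000))
    print({node: len(rows) for node, rows in sorted(placed.items())})
    print(total_amount(placed))
```

Because each row lives on exactly one node, every node can aggregate its own rows independently and the control step only has to combine the partial results, which is the essence of the shared-nothing design described earlier in the deck.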

29 Virtual Machine Manager (VMM)
Deployment of virtual machines
Configuration of virtual machines
Hosts Windows Server Update Services (WSUS)
Sits in the Fabric domain

30 PDW tools of the trade

31 Tools of the trade
PDW Development:
SSDT (Visual Studio)
SQLCMD
SSIS / SSAS / SSRS
Power BI
SSIS adapters
dwloader.exe

Management:
Management Console
dwconfig.exe
pav.exe
PowerShell
System Center

32 Tool distribution with PDW
Location: Download us/download/details.aspx?id=45294
Tools distributed:
SSIS Destinations
Clienttools.msi
dwloader
Adventureworks
Help File

33 Connecting to PDW
Management Console
Development tools
TCP Port 17001
Security
Connecting from SQL 2012? Use SNAC 11
Connecting from SQL 2008 R2? Use SNAC 10
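
As a rough illustration of the connection details on this slide, the Python sketch below assembles an ODBC-style connection string that targets TCP port 17001 and picks the SQL Server Native Client driver by client version. The function name, host name, database, and credentials are placeholders, and the exact set of keywords a given client accepts should be checked against its own documentation.

```python
PDW_TCP_PORT = 17001  # port listed on the slide

# ODBC driver names for the two SNAC versions named on the slide.
SNAC_DRIVERS = {
    "2008 R2": "SQL Server Native Client 10.0",
    "2012": "SQL Server Native Client 11.0",
}


def pdw_connection_string(host, database, user, password, client="2012"):
    """Build an ODBC connection string for the PDW control node."""
    driver = SNAC_DRIVERS[client]
    return (
        f"Driver={{{driver}}};"
        f"Server={host},{PDW_TCP_PORT};"
        f"Database={database};Uid={user};Pwd={password};"
    )


if __name__ == "__main__":
    # Placeholder host and credentials, for illustration only.
    print(pdw_connection_string("aps.example.com", "AdventureWorksPDW", "loader", "secret"))
```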

34 SQL Server Data Tools (SSDT)
Used for writing queries against PDW
SSMS is not a supported tool
SSDT for PDW is part of the standard SSDT deployment model

35 SQL Server Integration Services
Destination adapters available for:
SSIS 2008 R2
SSIS 2012
SSIS 2014

36 BI solution development
Power Query
Power Pivot
Power Map
Power View
SQL Server Analysis Services: ROLAP facts, MOLAP dimensions
SQL Server Reporting Services

37 SQLCMD support
Location: ..\Microsoft SQL Server\110\Tools\Binn\
SNAC 10 for 2008 R2
SNAC 11 for 2012+
QUOTED IDENTIFIER ON is mandatory
Must be set at SQLCMD invocation: SQLCMD.exe -I
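
Since the `-I` switch has to be present at invocation, one way to avoid forgetting it is to build the command line in a helper. The Python sketch below only assembles the argument list and does not run anything. The helper name and the example server are hypothetical; `-S`, `-U`, `-P`, `-I`, and `-Q` are standard sqlcmd switches, and the port default of 17001 is taken from the "Connecting to PDW" slide.

```python
def sqlcmd_args(server, user, password, query, port=17001):
    """Build an argument list for sqlcmd with quoted identifiers enabled."""
    return [
        "sqlcmd",
        "-S", f"{server},{port}",  # host,port form
        "-U", user,
        "-P", password,
        "-I",                      # SET QUOTED_IDENTIFIER ON for the session
        "-Q", query,               # run the query, then exit
    ]


if __name__ == "__main__":
    # Placeholder server and credentials, for illustration only.
    print(sqlcmd_args("aps.example.com", "loader", "secret", "SELECT 1;"))
```

The resulting list can be handed to `subprocess.run(args, check=True)` on a machine where sqlcmd is installed.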

38 Third-party tools
Attunity Replicate
PDW Region supported target
warehousing/microsoft-pdw
MicroDesigner (MicroERD)
Data modelling tool

39 PDW region configuration

42 Region must be restarted once reset

44 Management & monitoring

45 Management console
Read-only view of management information
Can cancel user sessions, queries, loads
Can easily visualize DSQL structures
Requires VIEW SERVER STATE
All data accessible via DMVs
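
Because the console's data is also exposed through DMVs, the same information can be pulled with a query. The Python sketch below just builds the SQL text. It assumes the `sys.dm_pdw_exec_requests` view and the listed columns as documented for PDW, which should be verified against the appliance version in use, and the helper name is hypothetical.

```python
def active_requests_sql(limit=10):
    """Return a query listing the most recent requests from a PDW DMV."""
    limit = int(limit)  # keep the TOP value numeric
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        f"SELECT TOP {limit} request_id, session_id, status, "
        "submit_time, total_elapsed_time, command "
        "FROM sys.dm_pdw_exec_requests "
        "ORDER BY submit_time DESC;"
    )


if __name__ == "__main__":
    print(active_requests_sql(5))
```

Running the query requires the same VIEW SERVER STATE permission that the slide lists for the console.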

47 Demo | Admin Console

48 Microsoft Analytics Platform System
© 2014 Microsoft Corporation. All rights reserved.

