2012 © Trivadis BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
Welcome
Introduction to Parallel Data Warehouse
November 2012, Meinrad Weiss
Data Warehouse – Products Positioning
- Axes: Scale, Complexity, HA by default, SW-HW integration
- Products shown: SQL Server 2008 R2 Fast Track; SQL Server 2008 R2 Enterprise; PDW; SQL Server 2008 R2 Data Center; PDW with Distributed Data Architecture
- Minimal HW tune-up/optimization; supports mixed workloads
- Balanced solution for mostly scan-centric workloads
- Max HW tune-up for most DW scenarios
- Most flexible architecture for handling all DW scenarios
Microsoft Data Warehousing Solutions
Four offerings, in the order shown on the slide:
- Scalable and reliable platform for data warehousing on any hardware. Ideal for data marts or small to mid-sized EDWs. Software only. 10s of TB.
- Reference Architectures offering best price performance for data warehousing. Ideal for data marts or small to mid-sized DWs with scan-centric workloads. Reference Architectures (software and hardware). 2 to 80 TB.
- Scalable and reliable platform for data warehousing on any hardware. Ideal for large data marts or mid-sized EDWs. Software only. 10s of TB.
- Appliance for high-end data warehousing requiring highest scalability, performance, or complexity. DW appliance (fully integrated software and hardware). 10s to 100s of TB.
- Offers flexibility in hardware and architecture
- Scale-up DW versus scale-out DW with MPP
Data Warehouse – Products Positioning
- 100% SQL Server 2008 R2 compatibility
- Axes: Scale, Complexity, HA by default, SW-HW integration
- Products shown: SQL Server 2008 R2 with Fast Track Reference Architecture; SQL Server 2008 R2 Enterprise; PDW; SQL Server 2008 R2 Data Center; PDW with Distributed Data Architecture
MPP vs. SMP
SMP (Symmetric Multiprocessing)
- Multiple CPUs used to complete individual processes simultaneously
- All CPUs share the same memory, disks, and network controllers
- All SQL Server implementations up until now have been SMP
MPP (Massively Parallel Processing)
- Uses many separate CPUs running in parallel to execute a single program
- Each CPU has its own memory and disks
- High-speed communications between nodes
- Applications must be segmented
Two hardware vendors: HP and Dell
- Microsoft + Dell: Parallel Data Warehouse Appliance
- Microsoft + HP: Enterprise Data Warehouse Appliance
Control Rack: SQL Control Node, Management Node, Landing Zone, Backup Node
Data Rack(s)
Control Node (Control Rack)
- Client connections always go through the control node
- Windows Failover Cluster for availability
- Contains no persistent user data
- Processes SQL requests
- Prepares execution plan
- Orchestrates distributed execution
- Local SQL Server processes final query plan and aggregates results
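The control node's role of orchestrating distributed execution and aggregating results can be pictured with a small scatter-gather sketch. This is only an illustration in Python, not how the appliance is implemented: the function names, the (store_id, dollars) row shape and the sample data are all hypothetical.

```python
from collections import defaultdict


def node_partial(rows):
    """Runs on one compute node: aggregate only the rows stored locally."""
    partial = defaultdict(float)
    for store_id, dollars in rows:
        partial[store_id] += dollars
    return dict(partial)


def control_node_query(node_slices):
    """Control node: run the same step on every node, then merge the partials."""
    # On an appliance these calls would run in parallel, one per compute node.
    partials = [node_partial(rows) for rows in node_slices]
    final = defaultdict(float)
    for partial in partials:
        for store_id, dollars in partial.items():
            final[store_id] += dollars
    return dict(final)


# Hypothetical data: three compute nodes, each holding a slice of the sales rows.
slices = [
    [(1, 10.0), (2, 5.0)],
    [(1, 2.5)],
    [(2, 4.0), (3, 1.0)],
]
totals = control_node_query(slices)
```

The control node itself never stores the rows; it only sees the small per-node partial results, which matches the "contains no persistent user data" point above.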
Management Node (Control Rack)
- Provides support and patching for the appliance
- Holds image for re-deployment of compute node
- Holds Active Directory
Landing Zone (Control Rack)
- Provides high-capacity storage for data files from ETL processes
- Is available as a sandbox for other applications and scripts that run on the internal network
- Provides SQL Server Integration Services
- Load path: Source -> Landing Zone (files) -> Data Loader -> Compute Nodes, using DWLoader or SQL Server Integration Services
Backup Node (Control Rack)
- Provides integrated backup solution
- Integrates with 3rd party backup products
- Orderable in different sizes
Data Rack(s)
- Servers: 5/10 active + 1 passive per rack
- InfiniBand, FC and Ethernet switching
- Expansion: grow from 1/2 to 4 data racks, storage options, test/dev system
- Consists of compute nodes and storage nodes
- Shared nothing
- Spare node provides failover in case of node failure
Connectivity and Tools
- Nexus Query Chameleon
- DWSQL
Creating a Database
    CREATE DATABASE PDW
    WITH (AUTOGROW = ON,
          REPLICATED_SIZE = 1024 GB,   -- per node
          DISTRIBUTED_SIZE = GB,       -- whole system
          LOG_SIZE = 1024 GB);
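Because REPLICATED_SIZE is given per node and DISTRIBUTED_SIZE for the whole system, the space used on each compute node can be estimated with a little arithmetic. The sketch below assumes the distributed space is split evenly across nodes and leaves the log out; the function name and the distributed size of 10240 GB are hypothetical, since the slide does not give a value for DISTRIBUTED_SIZE.

```python
def per_node_gb(replicated_gb, distributed_gb, compute_nodes):
    """Estimate data space per compute node.

    Replicated space is reserved in full on every node; distributed space
    is assumed (for this sketch) to be divided evenly across the nodes.
    """
    return replicated_gb + distributed_gb / compute_nodes


# Hypothetical values for illustration only.
example = per_node_gb(replicated_gb=1024, distributed_gb=10240, compute_nodes=10)
print(f"{example:.0f} GB per node")
```

With these made-up numbers, each of the 10 nodes would hold the full 1024 GB of replicated space plus one tenth of the distributed space.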
Distribution and Replication of Data: Replicate
- Time Dim (TD): Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
- Store Dim (SD): Store Dim ID, Store Name, Store Mgr, Store Size
- Product Dim (PD): Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
- Mktg Campaign Dim (MD): Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
- Sales Facts (slices SF-1 to SF-4): Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold
- Smaller (< 5 GB) dimension tables are replicated on every compute node, so each node holds its own copy of TD, PD, SD and MD
- Result: fact-dimension joins can be performed locally
Create Replicated Table
    CREATE TABLE DimProduct(
        ProductId   BIGINT NOT NULL,
        Description VARCHAR(50),
        CategoryId  INT NOT NULL,
        ListPrice   DECIMAL(12,2))
    WITH (DISTRIBUTION = REPLICATE);
- Creates tables on each of the individual compute nodes and assigns them to the REPLICATED file group
- Data compression is automatically turned on
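Why replication lets fact-dimension joins run locally can be shown with a toy model: every node gets a full copy of the small dimension, so joining its slice of the fact table needs no data from other nodes. This is a minimal Python sketch; the sample rows, the two-node layout and the helper name are hypothetical.

```python
# Hypothetical rows of a replicated dimension: product id -> category name.
dim_product = {1: "Bikes", 2: "Helmets"}

# Hypothetical fact slices, one list per compute node: (product_id, qty_sold).
fact_slices = [
    [(1, 3), (2, 1)],
    [(2, 4)],
]


def local_join(fact_rows, dim_copy):
    """Join one node's fact rows against that node's own dimension copy."""
    return [(dim_copy[product_id], qty) for product_id, qty in fact_rows]


# Each node works with its own full copy of the dimension (dict(...) copies it).
node_results = [local_join(rows, dict(dim_product)) for rows in fact_slices]
```

No row has to move between nodes to complete the join, which is the benefit the replicated file group is meant to provide for small dimension tables.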
Distribution and Replication of Data: Distribute
- Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold
- Larger (> 10 GB) fact table is hash distributed across all compute nodes (slices SF-1, SF-2, SF-3, SF-4)
- Time Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
- Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
- Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
- Mktg Campaign Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
Distribution on a PDW
    CREATE TABLE myTable (column Defs)
    WITH (DISTRIBUTION = HASH (id));
- On each PDW node (Node 1, Node 2, … Node 10): Create Table _a, Create Table _b, … Create Table _h, giving 8 tables per node
- Final result: 80 individual tables across a 10-node (1 data rack) appliance
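The mapping from a distribution key to one of the 80 physical tables can be sketched as follows. This is an illustration only: CRC32 is a stand-in, since the slides do not say which hash function the appliance uses, and the `placement` helper is hypothetical. The node count, the 8 tables per node and the _a to _h suffixes come from the slide.

```python
import zlib
from collections import Counter

NODES = 10                   # one data rack, as on the slide
DISTRIBUTIONS_PER_NODE = 8   # tables _a to _h on each node
SUFFIXES = "abcdefgh"


def placement(key):
    """Map a distribution-key value to (node number, table suffix)."""
    # Assumption: CRC32 stands in for the appliance's real hash function.
    digest = zlib.crc32(str(key).encode("utf-8"))
    bucket = digest % (NODES * DISTRIBUTIONS_PER_NODE)
    node, slot = divmod(bucket, DISTRIBUTIONS_PER_NODE)
    return node + 1, "_" + SUFFIXES[slot]


# Count how many of 100,000 sequential ids land in each physical table.
counts = Counter(placement(i) for i in range(100_000))
print(len(counts), "tables used; largest holds", max(counts.values()), "rows")
```

The same key always lands in the same table, so rows with equal `id` values are co-located. A key with few distinct values or heavy skew would fill some tables far more than others, which is why the choice of the hash column matters.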
Reference Case: Today's process flow / Building blocks
- Building blocks: DB_GSAPOP, DB_MasterTables, DB_ReportTables, FinanceCube, DW_Finance Transactions, MasterFinance table population
- Baseline: once data is extracted from SAP, creating the end-to-end reports and cube insights takes 13+ hours (in production typically 20+ hours with multiple companies)
- Step timings shown: 6 hours 21 min, 6 hours, 1 hour
- Suspicious words Reports: 3 hr 21 min
Reference Case: Audit Process with PDW
- Building blocks: DB_GSAPOP, DB_MasterTables, DB_ReportTables, FinanceCube, DW_Finance Transactions, MasterFinance table population
- Once data is extracted from SAP: creating 5 CM Reports and the FSCP Finance Cube takes 30 minutes
- Step timings shown: 8 min 50 sec load from flat file, 23 min, 10 min 10 sec, 11 min
- All 5 Reports within 6 min (80)
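The headline improvement in the reference case can be worked out from the two totals quoted on the slides. The sketch below uses only those two figures; because the baseline is stated as "13+ hours", the result is a lower bound rather than an exact factor.

```python
baseline_minutes = 13 * 60   # "13+ hours" on the baseline slide, so a lower bound
pdw_minutes = 30             # "30 minutes" on the PDW slide

speedup = baseline_minutes / pdw_minutes
print(f"at least {speedup:.0f}x faster")   # prints "at least 26x faster"
```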
Appliance Update AU3
- Performance: up to 10x improvement
- Data Movement Services
- New cost-based query optimizer
- New Data Movement Service
- 1/2 rack appliances from HP and Dell
- System Center 2012 integration (SCOM pack)
- And YES … support for stored procedures (subset)
- Collations: full support for international data
- Native SQL Server drivers
Hub and Spoke
- Central EDW Hub
- Landing Zone
- ETL Tools
- Departmental Reporting
- Regional Reporting
- High-Performance Reporting
- Regional Reporting with Business Decision Appliance
- Third-Party RDBMS
- Third-Party Data Integration
- Mobile Applications
Web-Based Management Dashboard
System Center (SCOM)
Seamless Scalability
- Chart: system throughput versus number of SQL Server compute nodes
- Regular SQL Server (1 node)
- Half Rack PDW (5 nodes)
- Full Rack PDW (10 nodes)
- 2 Rack PDW (20 nodes)
- 3 Rack PDW (30 nodes)
- 4 Rack PDW (40 nodes)
Let's go.
Competition