SQL Server 2008 R2 Parallel Data Warehouse: Under the Hood Brian Mitchell Senior Premier Field Engineer
Introducing Parallel Data Warehouse
Tier 1 Enterprise Data Warehouse Appliance Offering
– High scalability from 10s to 100s of terabytes
– High performance through MPP system
Flexibility and Choice
– Choice of deployment options through distributed architecture
– Highly scalable
Most Comprehensive Solution
– Complete data warehouse solution spanning desktop, enterprise data warehouse (EDW), and data marts
– Deep integration with Microsoft business intelligence (BI)
– Comprehensive toolset for BI, ETL, MDM, and streaming data
Agenda SQL Server 2008 R2 PDW Overview Disk CPU Memory
Appliance Model
Sold as a “black box” to customers
End-to-end solution includes software and hardware
Preconfigured from vendor
Based on a balanced reference architecture
Hardware specifications promote data warehousing workloads
Provides enterprise-level redundancy
Appliance Hardware Schema
PDW: High Availability
Failover Clustering
Dual Networking
– Dual InfiniBand
– Dual Ethernet
– Dual Fibre Channel
Dual Power
Storage
– RAID 0
– Hot Spare
PDW Benefits
Appliance Model
– System arrives assembled with software pre-installed
Appliance optimized for DW workloads
CPU and I/O bandwidth is balanced for scan-intensive queries
Simple to get running and productive
PDW Advantages
All loads and queries are highly parallel, automatically
All DML (inserts, updates) is also parallel *
Can increase scale and reduce execution time by adding compute racks
Fewer ‘knobs’, less complexity at DBA level
– Eliminates physical file layout considerations from database and table creation
– Memory, parallelism, and many other SQL configuration options are preset and fixed
PDW: Built on Tech You Know
Windows Server 2008 (SP2)
SQL Server 2008 (SP2)
Failover Clustering
Web Based Admin Console
SQL Server 2008 R2 BI Tools connect natively
– Analysis Services
– Reporting Services
– Integration Services
– PowerPivot
Demo: PDW Built on Tech You Know
PDW: Basic Concepts
Create Database
Syntax
    CREATE DATABASE database_name
    WITH (
        [ AUTOGROW = ON | OFF, ]
        REPLICATED_SIZE = replicated_size [ GB ],
        DISTRIBUTED_SIZE = distributed_size [ GB ],
        LOG_SIZE = log_size [ GB ]
    ) [;]
Example
    CREATE DATABASE BigData
    WITH (
        AUTOGROW = ON,
        REPLICATED_SIZE = 1024,
        DISTRIBUTED_SIZE = 16384,
        LOG_SIZE = 1024
    );
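A rough sizing sketch for the BigData example. It assumes (this is not stated on the slide) that REPLICATED_SIZE is allocated on each compute node, while DISTRIBUTED_SIZE and LOG_SIZE are appliance-wide totals divided evenly across compute nodes. The 10-node appliance and the function name are hypothetical, chosen only to make the arithmetic concrete.

```python
DISTRIBUTIONS_PER_NODE = 8  # per the "Ultra Shared Nothing" slide


def per_node_footprint_gb(replicated_gb, distributed_gb, log_gb, compute_nodes):
    """Estimate the space one compute node must provide, in GB.

    Assumption (not stated in the deck): REPLICATED_SIZE applies to each
    compute node, while DISTRIBUTED_SIZE and LOG_SIZE are appliance-wide
    totals divided evenly across compute nodes.
    """
    distributed_per_node = distributed_gb / compute_nodes
    return {
        "replicated": float(replicated_gb),
        "distributed": distributed_per_node,
        "per_distribution": distributed_per_node / DISTRIBUTIONS_PER_NODE,
        "log": log_gb / compute_nodes,
    }


if __name__ == "__main__":
    # BigData example from the slide on a hypothetical 10-node appliance.
    plan = per_node_footprint_gb(1024, 16384, 1024, compute_nodes=10)
    for name, gb in plan.items():
        print(f"{name:>16}: {gb:8.1f} GB")
```

If the assumption holds, adding compute nodes shrinks the distributed and log share per node but leaves the replicated share unchanged.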
Create Table Examples
Replicated Table
    CREATE TABLE myTable (
        id integer NOT NULL,
        lastName varchar(20),
        zipCode varchar(6)
    );
Distributed Table
    CREATE TABLE myTable (
        id integer NOT NULL,
        lastName varchar(20),
        zipCode varchar(6)
    )
    WITH ( DISTRIBUTION = HASH (id) );
PDW: Handling Disk I/O
Replicating Tables
[Figure: star schema with factSales (Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold) at the center, joined to dimTime (Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day), dimStore (Store Dim ID, Store Name, Store Mgr, Store Size), dimProduct (Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc), and dimMktCampaign (Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End). Each compute node is shown holding a copy of all four dimension tables (TD, PD, SD, MD).]
Smaller Dimension Tables are Replicated on Every Compute Node
Ultra Shared Nothing
SQL Server PDW
– Stores a portion of each distributed table on each compute node
– Stores 8 “portions” per compute node, called distributions
– Table scan: all distributions on all nodes
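A minimal sketch of how rows could be spread over distributions, assuming a simple modulo over a hash of the distribution key. `zlib.crc32` is only a stand-in; the deck does not describe the hash function PDW actually uses, and `place_row` and `spread` are hypothetical names.

```python
import zlib
from collections import Counter

DISTRIBUTIONS_PER_NODE = 8  # from the slide: 8 distributions per compute node


def place_row(key, compute_nodes):
    """Return (node, distribution) for a distribution-key value.

    zlib.crc32 is a stand-in hash; the real PDW hash function is not
    described in this deck.
    """
    total = compute_nodes * DISTRIBUTIONS_PER_NODE
    slot = zlib.crc32(str(key).encode("utf-8")) % total
    return slot // DISTRIBUTIONS_PER_NODE, slot % DISTRIBUTIONS_PER_NODE


def spread(keys, compute_nodes):
    """Count how many rows land in each (node, distribution) slot."""
    return Counter(place_row(k, compute_nodes) for k in keys)


if __name__ == "__main__":
    counts = spread(range(100_000), compute_nodes=10)
    print(len(counts), "distributions hold rows")
    print("smallest:", min(counts.values()), "largest:", max(counts.values()))
```

Because every distribution holds part of the table, a full table scan has to touch all of them, which is what lets the scan run in parallel on every node.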
Distributing Tables
[Figure: the same star schema of factSales, dimTime, dimStore, dimProduct, and dimMktCampaign. Each compute node holds copies of the four dimension tables (TD, PD, SD, MD) plus one slice of the fact table, labelled SF-1 through SF-4.]
Larger Fact Table is Hash Distributed Across All Compute Nodes
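A sketch of why this layout helps joins: with the fact table hash distributed and a dimension replicated, each node can join its own fact slice to its local dimension copy without moving rows. The hash, the choice of store_dim_id as distribution key, and the sample rows are all assumptions for illustration.

```python
import zlib


def node_for(key, nodes):
    """Pick a compute node for a distribution-key value (stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % nodes


def distribute(rows, key, nodes):
    """Hash-distribute rows: each row lands on exactly one node."""
    slices = [[] for _ in range(nodes)]
    for row in rows:
        slices[node_for(row[key], nodes)].append(row)
    return slices


def replicate(rows, nodes):
    """Replicate rows: every node gets a full copy."""
    return [list(rows) for _ in range(nodes)]


def join(fact_rows, dim_rows, key):
    """Inner join of fact rows to dimension rows on `key`."""
    lookup = {d[key]: d for d in dim_rows}
    return [{**f, **lookup[f[key]]} for f in fact_rows if f[key] in lookup]


if __name__ == "__main__":
    NODES = 4  # matches the SF-1..SF-4 slices on the slide
    dim_product = [{"prod_dim_id": p, "prod_category": f"cat{p % 3}"} for p in range(5)]
    fact_sales = [
        {"store_dim_id": s, "prod_dim_id": s % 5, "qty_sold": s + 1} for s in range(20)
    ]

    fact_slices = distribute(fact_sales, "store_dim_id", NODES)
    dim_copies = replicate(dim_product, NODES)

    # Each node joins only its own data; the results are then concatenated.
    parallel = [
        row
        for n in range(NODES)
        for row in join(fact_slices[n], dim_copies[n], "prod_dim_id")
    ]
    print(len(parallel), "joined rows from", NODES, "nodes")
```

The per-node results add up to the same rows a single-node join would produce, even though the join column differs from the distribution key.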
PDW: Database Filegroups
PDW: Database Files
PDW Compute Node SAN Architecture **N+1 cluster architecture
Handling Processing Throughput
CPU
Each Compute Node is set up using Soft-NUMA
Each Compute Node listens on multiple ports
Each port is mapped to a Soft-NUMA node
PDW: Affinity
[Figure: the PDW Engine sends SELECT Name FROM tableA WHERE state = 'TX' to a Compute Node over 8 connections to SQL Server, labelled A through H. The connections are affinitized to cores (Soft-NUMA), to tables on filegroups, and to disks (LUNs).]
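The affinity chain on this slide can be sketched as a lookup table. This assumes a strict one-to-one mapping from each distribution (A through H) to a port, a soft-NUMA node, a filegroup, and a LUN, which the slide suggests but does not spell out. The port numbers, the FG_ filegroup names, and the function names are hypothetical.

```python
from dataclasses import dataclass

DISTRIBUTIONS = "ABCDEFGH"  # the eight labels shown on the slide
BASE_PORT = 15000           # hypothetical; the deck gives no port numbers


@dataclass(frozen=True)
class Affinity:
    distribution: str    # A..H
    port: int            # TCP port the connection uses
    soft_numa_node: int  # soft-NUMA node the port is mapped to
    filegroup: str       # filegroup holding this distribution's tables
    lun: int             # LUN backing that filegroup


def build_affinity_map(base_port=BASE_PORT):
    """One entry per distribution, assuming a strict 1:1 chain."""
    return {
        name: Affinity(name, base_port + i, i, f"FG_{name}", i)
        for i, name in enumerate(DISTRIBUTIONS)
    }


def fan_out(query, affinity_map):
    """Return (port, soft_numa_node, query) for each of the 8 connections."""
    return [(a.port, a.soft_numa_node, query) for a in affinity_map.values()]


if __name__ == "__main__":
    plan = fan_out("SELECT Name FROM tableA WHERE state = 'TX'", build_affinity_map())
    for port, numa, sql in plan:
        print(f"port {port} -> soft-NUMA node {numa}: {sql}")
```

Under that assumption, each of the eight connections works on its own cores and its own disks, so the connections do not compete for the same resources.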
PDW: Memory For Everyone
PDW Memory: Resource Governor
[Figure: on Compute Node 1, the PDW Engine's query SELECT Name FROM tableA WHERE state = 'TX' is served by eight Resource Governor groups, QueryGroup_A through QueryGroup_H, paired with eight pools, QueryPool_A through QueryPool_H. Each pool is labelled 11% of the node's RAM.]
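The arithmetic behind the figure, as a small sketch: eight pools at 11% each account for 88% of a node's RAM. The slide does not say whether 11% is a minimum or a maximum, or what the remaining 12% is used for, so the "unassigned" label and the 96 GB node are assumptions.

```python
POOLS = [f"QueryPool_{c}" for c in "ABCDEFGH"]
POOL_SHARE_PERCENT = 11  # per pool, as labelled on the slide


def memory_plan(total_ram_gb):
    """Split a node's RAM into the eight pool shares plus the remainder."""
    per_pool = total_ram_gb * POOL_SHARE_PERCENT / 100
    plan = {pool: per_pool for pool in POOLS}
    # Whatever the eight pools do not cover; its purpose is not stated in the deck.
    plan["unassigned"] = total_ram_gb - per_pool * len(POOLS)
    return plan


if __name__ == "__main__":
    # 96 GB is a hypothetical node size, used only to make the split concrete.
    for name, gb in memory_plan(96).items():
        print(f"{name:>12}: {gb:6.2f} GB")
```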
Monitoring PDW
Admin Console
DMVs
DBCC Commands
PDW Logs
DMS Logs
SQL Server Logs
Event Logs
Cluster Logs
Monitoring PDW - Demo
Please Complete the Evaluation Form Pick up your evaluation form: In each presentation room Drop off your completed form Near the exit of each presentation room At the registration area Presented by Dell
THANK YOU! For attending this session and PASS SQLRally Orlando, Florida