Published by Pamela Lee. Modified over 9 years ago.
1
Rushabh Mehta Managing Director (India) | Solid Quality Mentors rmehta@solidq.com
2
About me: Rushabh Mehta. Professional Association for SQL Server: President. Solid Quality Mentors (SolidQ): Business Intelligence Mentor; Managing Director, India. SQL Server MVP. rmehta@solidq.com ◊ www.solidq.com ◊ @rushabhmehta
3
Agenda: Microsoft Data Warehousing Overview; SMP vs. MPP Architecture; Microsoft Parallel Data Warehouse Architecture and Components.
4
Microsoft Data Warehousing Offerings
5
Microsoft's Commitment to DW and BI. Capabilities shown on the slide: Pervasive Insight, Data Warehousing, OLAP and ETL, Data Mining, Managed Reporting, Ad-hoc Reporting, DW Scale, Data Profiling, Compression, VS Integration, KPIs, Multiple sources, Resource Governor, Partitioning, Power Pivot, Load Optimize, Parallel Processing, Scale to 100s of TB. Gartner Leaders Quadrant for Business Intelligence, since 2008. Gartner Leaders Quadrant for Data Warehouse, since 2008. Leader in "The Forrester Wave: Enterprise Data Warehousing Platforms, Q1 2009". Fastest growing of top 5 data warehouse vendors (IDC). Microsoft as a company spends $9.1 billion on research annually.
6
SQL Server Fast Track Data Warehouse: A method for designing a cost-effective, balanced system for Data Warehouse workloads. Reference hardware configurations developed in conjunction with hardware partners using this method. Best practices for data layout, loading and management. Solution to help customers and partners accelerate their data warehouse deployments.
7
Fast Track Scope [diagram]. Column headings: Supporting Systems, BI Data Storage Systems, Presentation Layer Systems. Components shown: Integration Services ETL; Data Staging, Bulk Loading; Data Warehouse; SAN, Storage Array; Subject Area Data Marts; Analysis Services Cubes; PerformancePoint Services; Reporting Services; Web Analytic Tools; SharePoint Services; Microsoft Office SharePoint. Legend: Data Path, Presentation Data, Reference Architecture Scope (dashed).
8
Fast Track Value Proposition
9
SMP Architecture: SMP = Symmetric Multiprocessing. Two or more identical processors connected to a single shared main memory and controlled by a single OS instance. Any processor can work on any task. Tasks can easily be moved between processors to balance the workload efficiently. All SQL Server implementations up until now have been SMP.
10
MPP Architecture: MPP = Massively Parallel Processing. Uses many separate CPUs running in parallel to execute a single program. Each CPU has its own memory. Applications must be segmented, using high-speed communications between nodes.
11
Advantages of MPP Architecture
12
Parallel Data Warehouse [diagram]: Control Rack and Data Rack/s.
13
Parallel Data Warehouse [diagram]: Compute Nodes, Storage Nodes, Spare Compute Node, Dual Fibre Channel. Compute Node + Storage Node = PDW Node.
14
Compute Nodes: Each MPP node is a highly tuned SMP node with standard interfaces. Dedicated hardware, database and storage. Running SQL Server 2008 EE. SQL as primary interface.
15
Architecture: Compute Server Node. Hardware options pre-configured for each SQL Server instance on each compute node. Drives configured as RAID1 to avoid appliance failover for a single drive failure. Dell compute nodes have 2 LUNs (2 RAID1 pairs); HP compute nodes have 3 LUNs (3 RAID1 pairs). tempdb is used for the following purposes: sort-work area for data loading into clustered index tables; spill area for hash joins not fitting into memory; temporary PDW tables. Diagram labels: Enterprise Class DBMS, TempDB Workspace, Dual Multi-Core Processors, Dual 4Gb FC, Dual InfiniBand, CPU, RAM.
16
Data Layout. Replicated: A table structure that exists as a full copy within each discrete PDW Node. Distributed: A table structure that is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the DBMS. Ultra shared nothing: The ability to design a schema of both distributed and replicated tables to minimize data movement between nodes. Small sets of data can be more efficiently stored in full (replicated). Certain set operations are more efficient against full sets of data (i.e., single node operations).
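As a rough illustration of the two layouts, the Python sketch below hashes a distribution column to choose a node and copies a small table to every node. The node count, the CRC32 hash, and the table and column names are assumptions made for the example; the slides do not say which hash function PDW uses.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical appliance size, chosen for the example


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-column value to a node using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, column, num_nodes=NUM_NODES):
    """Distributed table: each row is stored on exactly one node, chosen by hashing `column`."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[column], num_nodes)].append(row)
    return nodes


def replicate(rows, num_nodes=NUM_NODES):
    """Replicated table: every node gets its own full copy."""
    return {n: list(rows) for n in range(num_nodes)}


# Made-up sample data: a fact table and a small dimension table.
sales = [{"store_id": s, "qty": q} for s, q in [(1, 5), (2, 3), (1, 7), (3, 2), (4, 9)]]
stores = [{"store_id": i, "name": f"Store {i}"} for i in range(1, 5)]

sales_by_node = distribute(sales, "store_id")   # one slice per node
stores_by_node = replicate(stores)              # full copy per node
```

Rows with the same distribution-column value always land on the same node; how evenly the rows spread depends on how the values of that column are distributed.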
17
Data Layout [diagram]. Star schema: Sales Fact (Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold) with four dimensions: Date Dim (Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day), Store Dim (Store Dim ID, Store Name, Store Mgr, Store Size), Item Dim (Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc), Promo Dim (Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End). Each node in the diagram is labeled with the dimension tables (DD, SD, ID, and MD or PD) and one slice of the fact table (SF 1 to SF 5).
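One way to picture why a mix of distributed and replicated tables limits data movement: when every node holds a full copy of a small dimension table, each node can join its own slice of the fact table locally. The sketch below assumes two nodes and made-up store and sales values; it models the idea only, not PDW's engine.

```python
# Hypothetical layout: two nodes, each holding a slice of the fact table
# and its own full copy of the small store dimension.
store_dim = {1: "North", 2: "South", 3: "East"}

nodes = [
    {"sales_fact": [(1, 10.0), (3, 4.0)], "store_dim": dict(store_dim)},
    {"sales_fact": [(2, 6.5), (1, 2.5)], "store_dim": dict(store_dim)},
]


def local_join(node):
    """Join the node's fact slice to its own dimension copy; no other node is consulted."""
    dim = node["store_dim"]
    return [(dim[store_id], dollars) for store_id, dollars in node["sales_fact"]]


# Each node produces its joined rows independently; the rows are only concatenated at the end.
joined = [row for node in nodes for row in local_join(node)]
```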
18
Parallel Data Warehouse [architecture diagram]: Control Nodes (Active / Passive), Compute Nodes, Storage Nodes, Spare Compute Node, Landing Zone, Backup Node, Management Servers; Dual Fibre Channel and Dual InfiniBand; Client Drivers, ETL Load Interface, Corporate Backup Solution, Support / Patching; Corporate Network and Private Network.
19
Control Node & Client Drivers. Client connections always go through the control node. The Control Node contains no persistent user data. PDW 'Secret Sauce': processes SQL requests; prepares the execution plan; orchestrates distributed execution; uses a local SQL Server for final query plan processing and result aggregation. Client drivers: provided by DataDirect; ODBC, OLE-DB, JDBC and ADO.NET client drivers; available for 32 and 64 bits.
20
PDW Benefits – Massively Parallel Processing [diagram: Control Rack, Data Rack]. Query 1 is standard T-SQL submitted to SQL Server on the Control Node. The query is executed on all 10 nodes. Results are sent back to the client.
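The flow on this slide resembles a scatter-gather pattern. The following Python sketch is a simplified model under stated assumptions, not PDW's implementation: threads stand in for compute nodes, the slices and function names are hypothetical, and the control step merges per-node partial sums into a final result.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node slices of a distributed table: (store_id, dollars_sold).
node_slices = [
    [(1, 10.0), (2, 5.0)],
    [(1, 2.5), (3, 4.0)],
    [(2, 1.0), (3, 6.0)],
]


def node_partial_sum(rows):
    """Runs on one (simulated) compute node: aggregate only the local slice."""
    partial = {}
    for store_id, dollars in rows:
        partial[store_id] = partial.get(store_id, 0.0) + dollars
    return partial


def control_node_query(slices):
    """Send the same step to every node in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(node_partial_sum, slices))
    total = {}
    for partial in partials:
        for store_id, dollars in partial.items():
            total[store_id] = total.get(store_id, 0.0) + dollars
    return total


# Roughly: SELECT store_id, SUM(dollars_sold) ... GROUP BY store_id
result = control_node_query(node_slices)
```

Only the small per-node partial results travel to the control step, which is the reason for aggregating locally before merging.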
21
PDW Benefits – Massively Parallel Processing [diagram: Control Rack, Data Rack]. Blazing fast performance by parallelizing queries on highly optimized ultra shared nothing nodes. Multiple queries are simultaneously executed across all nodes. PDW supports querying while data is loading.
22
Parallel Data Warehouse [architecture diagram, as on slide 18], with callout: Management Nodes (Active / Passive Cluster), Support / Patching.
23
Management Node: Runs a separate domain controller (Active Directory). Used for deploying patches to all nodes in the appliance. Holds images in case a node needs reimaging. High availability using Active / Passive clustering.
24
Parallel Data Warehouse [architecture diagram, as on slide 18], with callout: Landing Zone, ETL Load Interface.
25
Landing Zone: Provides high-capacity storage for data files from ETL processes. Integration Services available on the landing zone. Connected to the internal network. Available as a sandbox for other applications and scripts that run on the internal network. Diagram labels: Source, Landing Zone, Files, Data Loader, Compute Nodes.
26
Parallel Data Warehouse [architecture diagram, as on slide 18], with callout: Backup Node, Corporate Backup Solution.
27
Backup Node: Coordinated backup across the nodes. Database-level backup. Full or differential. Metadata backup. Can restore to a larger appliance. Optional item – 1 size per config. Up to 524TB of capacity. Available in XS, S, M, L and XL.
28
PDW Software Architecture [diagram]. Legend: Built by DWPU; Existing MS software; 3rd Party. Nodes: Control Node, Compute Nodes, Landing Zone, Backup Node, Management Node. Component labels: SQL Server, DW Authentication, DW Configuration, DW Queue, DW Schema, PDW Services, DSQL Core Engine Services, DMS, DMS Manager, DMS Loader Client, Admin Console, IIS, User Data, SSIS, HPC, AD, SQL, OS. Client access: 3rd Party Tools, Nexus Query Tool, MS BI (AS, RS), over JDBC, OLE-DB, ODBC, ADO.NET.
29
Conclusion: MPP architecture supports massive scale through increased parallelization and shared-nothing architecture. Microsoft SQL Server 2008 R2 Parallel Data Warehouse Edition brings massive scale wrapped in the simplicity of an appliance.
30
References Microsoft Parallel Data Warehouse official site http://www.microsoft.com/pdw
31
Feedback / QnA. Your feedback is important! Please take a few moments to fill out our online feedback form. For detailed feedback, use the form at http://www.connectwithlife.co.in/vtd/helpdesk.aspx or email us at vtd@microsoft.com. Use the Question Manager on LiveMeeting to ask your questions now!
32
Contact SolidQ www.solidq.com Email Address rmehta@solidq.com
33
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.