Published by Alfredo Miranda Marín. Modified over 6 years ago.
1
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
2
A First Look at Large-Scale Data Warehousing in Microsoft SQL Server Code Name "Madison"
Matt Hollingsworth, Microsoft
DAT301
3
Agenda
  Concepts and principles
  Madison functional overview
  Early adoption
4
Symmetric Multiprocessing (SMP)
  Single DB instance
  “Shared Everything” architecture
  Servers/CPUs share memory and disks
  Can lead to resource contention as you scale
5
Massively Parallel Processing (MPP)
  Servers/CPUs have their own dedicated resources
  “Shared Nothing” architecture
  “Secret sauce” is parallelizing operations
  Lightning-fast queries, data loads and updates
  Linear scalability
  Problem needs to be partitionable
6
SMP vs MPP
SMP
  HW advancements increasing ability to scale up
  Scaling is limited
  High-end SMP very expensive
  Extremely high concurrency for some workloads
  Less than 1-2 TB of data: SMP will almost always be better
  Full SQL Server functionality
  HA must be architected in
MPP
  HW advancements increasing ability to scale up & scale out
  Scaling to 1 PB+
  Scale out is relatively low cost
  Relatively high concurrency for complex workloads
  > 2 TB up to 1 PB
  Limited SQL Server functionality
  HA is built in
7
Sequential I/O
Sequential I/O
  Ideal for data warehousing
  Scalable, predictable performance
  Large reads & writes
  Requires 1/3 or fewer drives for same performance
Random I/O
  Ideal for OLTP
  Not as predictable & scalable for data warehousing
  Small reads and writes
  Requires large number of drives
Best practices focus on preserving the sequential order of data
8
About DATAllegro…
[Stack diagram. Labels: Proprietary Appliance; Technology Partners]
  Management and MPP Database
  Open Source Database and OS
  Industry Standard Servers
  Industry Standard Networking
  Industry Standard Storage
9
Integration Plans
  Provide scale out through MPP on SQL Server and Windows
  Offer “appliance-like” user experience to Data Warehouse customers
  Lower TCO for high-end Data Warehousing
  Offer integrated BI platform to small and very large enterprises
[Stack diagram. Labels: Microsoft BI; Open Source Database & OS; Reference Hardware Platforms; Industry Standard Servers; Industry Standard Networking; Industry Standard Storage]
10
A Holistic Approach: Balanced Across All Components
SQL Server 2008: potential performance bottlenecks
[Diagram of the I/O path. Components: Server (CPU cores, cache, SQL Server, Windows); FC HBA (A, B); FC switch (A, B); Storage controller (A, B, cache); LUN; Disk]
Rates to balance along the path:
  CPU feed rate
  SQL Server read-ahead rate
  HBA port rate
  Switch port rate
  SP port rate
  LUN read rate
  Disk feed rate
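The balancing idea on this slide can be sketched in a few lines: sustained throughput of the whole path is capped by its slowest stage. This is a minimal sketch, not a sizing tool. The function name `find_bottleneck` and every throughput figure below are hypothetical and do not come from the presentation.

```python
def find_bottleneck(rates_mb_s):
    """Return (component, rate) for the slowest stage in the I/O path."""
    name = min(rates_mb_s, key=rates_mb_s.get)
    return name, rates_mb_s[name]

# Hypothetical per-stage rates in MB/s, named after the rates on the slide.
rates = {
    "cpu_feed_rate": 1600,
    "sql_read_ahead_rate": 1500,
    "hba_port_rate": 1400,
    "switch_port_rate": 1400,
    "sp_port_rate": 1200,
    "lun_read_rate": 1300,
    "disk_feed_rate": 1000,
}

component, rate = find_bottleneck(rates)
print(f"Sustained throughput is capped at {rate} MB/s by {component}")
```

In a balanced configuration the rates sit close together, so no single stage leaves the others idle.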
11
Sequential I/O
  Physical table structures, file layouts and SQL Server settings to maximize sequential I/O
  Enough disks to feed available CPU cores
  Carefully designed storage infrastructure to maximize and sustain sequential I/O
  No bottlenecks
  Where possible, separate I/O paths and disks for data, TempDB and logs
12
Fast Track DW
Accelerate scalable Data Warehouse deployments at lower TCO
  Pre-configured, pre-tested HW reference architectures (4-32 TB)
  SI solution templates
  Appliance-like time to value
  Flexibility through choice of HW platforms
  Low TCO through commodity hardware and value pricing
  Reduced risk through pre-tested and pre-tuned configurations
  Provides a clear upgrade path to “Madison” via Hub/Spoke
13
MPP Additional Considerations
  Principles & approach of SMP carry forward
  Deeper level of complexity:
    High availability
    Parallelization
    Inter-node data movement
14
Modular building blocks
Balanced CPU and storage
  Both SMP and MPP are based on building blocks that scale by the CPU core
  Adds network, storage processing and disk bandwidth for each core
  Based on maximizing & sustaining true sequential I/O while minimizing disks
  Generally changes the balance of systems so more can be spent on CPU and SW than on storage, to give better overall performance for a given budget
  Building blocks can be adjusted for multiple MPP configurations: high performance, archive and extreme performance
15
The Future of SQL Server Data Warehousing: Project "Madison"
  Build on proven scale for SQL Server Data Warehousing
  Predictable scale out through MPP
  Customers with over 400 TB data warehouses
  Accelerate plan to support largest Data Warehouses
  Provide massive scale with low TCO
  Integrated with Microsoft BI
16
SQL Server MPP: 10,000-foot view
Appliance-like model
  Hardware and software in unison and in balance: no bottlenecks
  Achieve max performance per component
For each HW component and each SW module:
  Define max performance
  Identify optimum workload type
  Adjust surrounding HW/SW to achieve optimum
Packages engineering talent
  Lots of knowledge, many hours of tuning, trying, testing
17
Commodity Hardware
  Lower cost
  Frequent performance improvements
  Easier upgrade and maintenance
  Higher customer comfort
  Better compatibility
18
Madison MPP Data Warehouse Architecture
[Architecture diagram. Labels: Corporate Network; SQL Client Drivers; ETL Load Interface; Corporate Backup Solution; Control Node (Active/Passive, SQL); Compute Nodes (SQL); Distributed DB; Industry Standard SAN Storage; Landing Zone; Backup; Spare Node; Configuration & Monitoring; Microsoft Cluster Server; Private Network]
19
Ultra Shared Nothing
An extension of traditional shared nothing design
  Push shared nothing architecture into the SMP node
  I/O and CPU affinity within SMP nodes
  Eliminate contention per user query
  Use full resources for each user query
Multiple physical instances of tables
  Distribute large tables
  Replicate small tables
  Distribute AND replicate medium tables
  Re-distribute rows “on the fly” when necessary
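The difference between distributing and replicating a table can be illustrated with a small sketch. It assumes hash distribution on a key column, which the slide does not spell out. The CRC32 hash and the function names are illustrative assumptions, not Madison's actual mechanism.

```python
import zlib

def distribution_for(key, num_distributions):
    """Map a distribution key to a distribution number with a stable hash.

    CRC32 is an arbitrary choice for this sketch, not the product's hash.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_distributions

def distribute(rows, key_column, num_distributions):
    """Split a large table so each row lives on exactly one distribution."""
    parts = [[] for _ in range(num_distributions)]
    for row in rows:
        parts[distribution_for(row[key_column], num_distributions)].append(row)
    return parts

def replicate(rows, num_distributions):
    """Copy a small table in full to every distribution."""
    return [list(rows) for _ in range(num_distributions)]

# Hypothetical data: a large fact table and a small dimension table.
sales = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
regions = [{"region_id": 1, "name": "EMEA"}, {"region_id": 2, "name": "APAC"}]

sales_parts = distribute(sales, "customer_id", 4)
region_copies = replicate(regions, 4)
```

With the small table present on every distribution, a join between `sales` and `regions` needs no data movement. Joining two distributed tables on a column other than the distribution key is the case where rows would have to be re-distributed on the fly.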
20
Control Node & Client Drivers
Control node
  Client connections always go through the control node
  Clustered to a passive node
  Processes SQL requests
  Prepares execution plan
  Orchestrates distributed execution
  Local SQL Server to do final query plan processing / result aggregation
Client drivers
  Will use same set of drivers used by DATAllegro
  Provided by DataDirect
  ODBC, OLE DB, JDBC and ADO.NET client drivers
  Wire protocol (SequeLink)
  Available drivers for 32 and 64 bits
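The control node's role (fan a request out to the compute nodes, then aggregate the partial results locally) follows a scatter-gather pattern. The sketch below assumes that pattern and uses threads to stand in for compute nodes. All names and data are hypothetical, and it says nothing about how Madison actually plans or ships queries.

```python
from concurrent.futures import ThreadPoolExecutor

def node_partial_sum(rows):
    """Work done on one compute node: aggregate only its local rows."""
    total = 0
    count = 0
    for amount in rows:
        total += amount
        count += 1
    return total, count

def control_node_average(node_data):
    """Fan the partial aggregate out to every node, then combine the results."""
    if not node_data:
        return None
    with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
        partials = list(pool.map(node_partial_sum, node_data))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

# Hypothetical local data held by three compute nodes.
node_data = [[10, 20], [30], [40, 50, 60]]
print(control_node_average(node_data))
```

An average is computed from per-node sums and counts rather than per-node averages, because nodes hold different numbers of rows.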
21
Compute Nodes
  A SQL Server 2008 instance
  DB engine nodes autonomous on local data
  SQL as primary interface
  Each MPP node is a highly tuned SMP node with standard interfaces
22
Landing Zone
  Provides high-capacity storage for data files from ETL processes
  Integration Services available on the landing zone
  Connected to internal network
  Available as sandbox for other applications and scripts that run on internal network
[Data flow diagram: Source, Landing Zone Files, Data Loader, Compute Nodes]
23
Backup Node
  Builds on SQL Server native backup/restore facility
  Use VDI interface to plug into backup pipeline
  Database-level backup
  Coordinated backup across the nodes
  Quiesce write activity to synchronize
  Can only restore to another appliance with exactly the same number of distributions
24
Configuration and Monitoring
  Challenge: Is it an appliance or a collection of nodes?
  Madison services instrumented: logs and performance counters
  Capture and forward SNMP alerts from devices within the appliance
  Small subset of DMVs to union underlying node DMVs
  Leverage HPC for monitoring
25
High Availability
Multiple levels of redundancy:
  Leveraging MSCS for node availability
  Cluster-aware services: SQL Server, Madison, DMS
  Leveraging MSCS for SQL Services, DMS
  1 spare node for every 6* compute nodes (6x1)
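The spare-node ratio translates into simple arithmetic. The sketch below assumes the ratio is applied by rounding up, so a partly filled group of compute nodes still gets its own spare. The slide's asterisk suggests a qualification that is not preserved here, so treat the rounding rule and the function name as assumptions.

```python
import math

def spare_nodes_needed(compute_nodes, nodes_per_spare=6):
    """Spares required at one spare per `nodes_per_spare` compute nodes.

    Rounding up is an assumption of this sketch, not a documented rule.
    """
    if compute_nodes < 0:
        raise ValueError("compute_nodes must be non-negative")
    return math.ceil(compute_nodes / nodes_per_spare)

for n in (6, 7, 12, 40):
    print(n, "compute nodes ->", spare_nodes_needed(n), "spare node(s)")
```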
26
Security and Encryption
Retain DA v3 design
  Authentication and authorization done by Madison server
  Users and roles as first-class principals
  Nested role capabilities
  Connection to SQL back-ends through high-privilege account
  SQL nodes reside on private network
  No support for integrated auth
Leverages TDE to expose DB-level encryption
  Supports key rotation
27
The Logical Data Model
Multiple databases per appliance
  Each user database maps to one SQL Server db per node
Tables
  Replicated, Distributed, Replicated + Distributed
  Leverage SQL Server compression
  Supports partitioning
  Supports secondary indexes
Views
28
SQL Server Data Types
[Support matrix with columns DAv3 and Madison; the per-type check marks did not survive extraction. Types listed:]
  bigint
  binary
  bit
  char / nchar
  date, time
  datetime (was date in DA)
  datetime2
  datetimeoffset
  decimal
  float
  geometry / geography
  hierarchyid
  int (was integer in DA)
  money
  real
  smalldatetime
  smallint
  smallmoney
  sql_variant
  text / ntext / image
  timestamp
  tinyint
  varchar / nvarchar / varbinary
  v*(max)
  uniqueidentifier
  xml
Data Types
  Most scalar data types supported by SQL Server 2008 are supported by Madison
  Main exceptions:
    Character and binary strings limited to 8K (i.e. no BLOB support)
    XML
    sql_variant
    System and CLR UDTs
  Latin1_General with binary comparison only
29
Supported SQL Syntax
  Aligned with ANSI SQL 92
  Basic INSERT, UPDATE, DELETE, SELECT
  CREATE TABLE AS SELECT
  Limited analytical function support
  Teradata extensions: Quantile, Sample,…
30
Manageability
Web-based main administrative user interface
  Based on DATAllegro manageability UI
  Monitoring system health and activity
Leveraging HPC Pack 2008
  Systems management
  Monitoring cluster health
31
Query Tools
GUI tool: Nexus (CoffingDW)
  Table & view object explorer
  Interactive query execution
Command line tool:
  Replacement for DA-SQL
  Flavor of SqlCmd
32
Demo Tools Walk through
33
MS BI Integration
Integration Services
  Madison enabled as a source
  Data movement, lookup operations, etc.
  Will add a new SSIS destination
  Ensure integrated high-performance loads
Reporting Services
  Fully supported, including parameterized queries
  Will customize experience for Report Builder and Report Designer
Analysis Services
  Will get connectivity through OLE DB provider
  Will enable both MOLAP and ROLAP storage
34
Madison - Hub & Spoke
[Hub-and-spoke diagram. Labels: Finance; Sales; HR; Manufacturing; Madison Hub; Madison Spoke; SQL Server AS Spoke; SQL Server DM Spoke (two)]
Each business unit has own Data Marts
  More responsive to business needs
  Fits budget realities
Hub provides centralized data governance platform
Node-to-node data movement
  Parallel over InfiniBand or 10 Gig networks
  ~500 GB per min with minimal overhead
35
Benefits of Hub-And-Spoke
  All systems connect via a dedicated high-speed network
  Parallel database copy: speeds of up to 500 GB per min
  Simplification of data mart ETL / ELT processes with publishing model
  Separation of management and user workloads
  Integration of SMP SS08 and MPP systems
  Ability to independently expand any system
  Ability to add additional spokes without impacting other users
  Deployment of development and test environments that leverage parallel connectivity
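To put the quoted copy rate in perspective, a back-of-the-envelope estimate is enough. The sketch assumes the full "up to 500 GB per min" rate is sustained for the whole copy and that 1 TB = 1000 GB. The 4 TB data mart size is a hypothetical example, not a figure from the talk.

```python
def copy_minutes(size_gb, rate_gb_per_min=500):
    """Best-case minutes to copy `size_gb` at a sustained `rate_gb_per_min`."""
    if rate_gb_per_min <= 0:
        raise ValueError("rate_gb_per_min must be positive")
    return size_gb / rate_gb_per_min

# Hypothetical 4 TB data mart published from the hub to a spoke.
mart_size_gb = 4 * 1000
print(f"{copy_minutes(mart_size_gb):.0f} minutes at the peak quoted rate")
```

Because the quoted figure is an upper bound, real copies would take at least this long.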
36
Early Adoption
MTP: Madison Technology Preview
  Our flavor of CTP
  Assess product and field/partner readiness
  Provide roadmap for competitive situations
  Location: MTCs, partners, other MS facilities, …
  Working with partners to secure hardware
  2-3 week engagements
TAP: Technology Adoption Program
  Closer to traditional TAP
  Assess production readiness
  Longer engagement
  Go-live requirements
  Customer secures hardware
37
High Level Release Definitions
“Madison” (aka v1)
  Focus on time to market
  Compatibility with DATAllegro v3
  MS BI integration
  H1 2010
  Will start running MTPs in the summer
V2+
  Closer functional alignment with SQL Server
  Better integration with SQL and MS ecosystem, tools and technologies
38
Recap
SQL Server Fast Track
  Data Warehousing Reference Architectures available today!
SQL Server “Madison”
  Built for advanced, large-scale data warehouses
  Shared-nothing MPP architecture
  Early evaluation programs starting soon
All feedback welcome:
Thank you!
39
question & answer
40
Get your copy autographed by Lynn or Stephen
Monday, 3rd 17:00 to 18:00 Intersoft Book Shop