Architecting a Large-Scale Data Warehouse with SQL Server 2005
Mark Morton, Senior Technical Consultant, IT Training Solutions
DAT313
What We Will Cover
- Discuss the role of partitioning and indexing in a relational data warehouse
- Review enhancements in Microsoft SQL Server 2005 that facilitate partitioning and indexing strategies
- Load large volumes of fact table data with optimal performance
- Bring it all together
Partitioning
Role of partitioning in a relational data warehouse: partitioning is the process of segmenting the rows of a table horizontally into smaller, more manageable sets. It can:
- Improve scalability and availability
- Enhance manageability and simplify maintenance
- Speed up incremental loads
- Reduce query times
Partitioning (cont’d)
Criteria for partitioning a relational data warehouse:
- Consider partitioning for large fact tables that require high availability or that exhibit poor query or maintenance performance
- Partitioning is not typically implemented or recommended for dimension tables
- Implementing a partitioning strategy is complex; do not undertake it if the fact table is not sufficiently large, or if none of query performance, maintenance, or availability is a concern
Partitioning (cont’d)
In SQL Server 2000:
- A partitioned view combines (with UNION ALL) horizontally partitioned data from a set of member tables across one or more servers, making the data appear to come from a single table
- Implemented by creating physically separate member tables
- Non-overlapping CHECK constraints are required on the partitioning column
- Queries against the partitioned view that filter on the partitioning column access only the member tables needed to resolve the query
- Limited to 256 member tables
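The pruning behavior can be illustrated with a small conceptual model in Python. This is not SQL Server code: each hypothetical member table (Sales2003, Sales2004, Sales2005) carries a date range standing in for a constraint such as CHECK (OrderDate >= '20040101' AND OrderDate < '20050101'), and the view itself would be the UNION ALL of those tables. The sketch assumes half-open ranges and reports which member tables a filtered query would still need to read.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MemberTable:
    """One member table of a partitioned view (hypothetical).

    Models CHECK (col >= lo AND col < hi) on the partitioning column.
    """
    name: str
    lo: date  # inclusive lower bound from the CHECK constraint
    hi: date  # exclusive upper bound from the CHECK constraint


def validate_members(members):
    """Raise ValueError if any two CHECK ranges overlap."""
    ordered = sorted(members, key=lambda m: m.lo)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.lo < prev.hi:
            raise ValueError(f"{prev.name} and {cur.name} have overlapping ranges")


def tables_to_scan(members, lo, hi):
    """Member tables whose CHECK range intersects the filter lo <= col < hi.

    Tables whose constraint rules out the whole filter range are skipped,
    mirroring how member tables are pruned from the plan.
    """
    validate_members(members)
    return [m.name for m in members if m.lo < hi and lo < m.hi]


members = [
    MemberTable("Sales2003", date(2003, 1, 1), date(2004, 1, 1)),
    MemberTable("Sales2004", date(2004, 1, 1), date(2005, 1, 1)),
    MemberTable("Sales2005", date(2005, 1, 1), date(2006, 1, 1)),
]
print(tables_to_scan(members, date(2004, 6, 1), date(2004, 9, 1)))   # ['Sales2004']
print(tables_to_scan(members, date(2004, 12, 1), date(2005, 2, 1)))  # ['Sales2004', 'Sales2005']
```

The overlap check matters because the pruning is only safe when every value of the partitioning column can belong to at most one member table.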
Partitioning (cont’d)
In SQL Server 2005:
- Data is partitioned horizontally by dividing table and index data into subsets, which can be spread across multiple filegroups
- A partition function defines the boundary values that divide the data into ranges (partitions)
- A partition scheme maps each partition defined by a partition function to a specific filegroup
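A partition function and partition scheme can be modeled in a few lines of Python to show how rows are assigned. The sketch below is a simplified, hypothetical model rather than SQL Server's implementation: it assumes RANGE LEFT places each boundary value in the partition to its left and RANGE RIGHT in the partition to its right, returns the 1-based partition number of the kind the built-in $PARTITION function reports, and requires exactly one filegroup per partition. The filegroup names are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import date


class PartitionFunction:
    """Simplified model: sorted, unique boundary values plus RANGE LEFT/RIGHT."""

    def __init__(self, boundaries, range_type="RIGHT"):
        if range_type not in ("LEFT", "RIGHT"):
            raise ValueError("range_type must be 'LEFT' or 'RIGHT'")
        values = sorted(boundaries)
        if len(set(values)) != len(values):
            raise ValueError("boundary values must be unique")
        self.boundaries = values
        self.range_type = range_type

    @property
    def partition_count(self):
        # n boundary values always define n + 1 partitions
        return len(self.boundaries) + 1

    def partition_number(self, value):
        """1-based partition number for a value of the partitioning column."""
        if self.range_type == "LEFT":
            # RANGE LEFT: a boundary value belongs to the partition on its left
            return bisect_left(self.boundaries, value) + 1
        # RANGE RIGHT: a boundary value belongs to the partition on its right
        return bisect_right(self.boundaries, value) + 1


class PartitionScheme:
    """Maps partitions, in order, to filegroups (one filegroup per partition here)."""

    def __init__(self, function, filegroups):
        if len(filegroups) != function.partition_count:
            raise ValueError(
                f"need {function.partition_count} filegroups, got {len(filegroups)}")
        self.function = function
        self.filegroups = list(filegroups)

    def filegroup_for(self, value):
        return self.filegroups[self.function.partition_number(value) - 1]


pf = PartitionFunction([date(2005, 2, 1), date(2005, 3, 1), date(2005, 4, 1)], "RIGHT")
ps = PartitionScheme(pf, ["FG_Old", "FG_Feb", "FG_Mar", "FG_New"])
print(pf.partition_number(date(2005, 3, 1)))  # 3
print(ps.filegroup_for(date(2005, 2, 15)))    # FG_Feb
```

Three boundary values yield four partitions, so the scheme needs four filegroups; with RANGE RIGHT, March 1 itself lands in the March partition.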
Partitioning (cont’d)
In SQL Server 2005:
- Tables, indexes, and indexed views can be created directly on a partition scheme instead of a filegroup
- Queries and maintenance operations that target a subset of the data are optimized because only the partitions needed to complete the operation are accessed
- Limited to 1,000 partitions per table or index
- Supports local partitions only: all partitions reside in the same database
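Partition elimination for a range predicate follows directly from the boundary values. This Python sketch, assuming a RANGE RIGHT function with hypothetical monthly boundaries encoded as yyyymmdd integers, lists the partitions that a BETWEEN filter on the partitioning column would have to read and enforces the 1,000-partition limit. It models the idea only, not the optimizer's actual plan.

```python
from bisect import bisect_left, bisect_right

MAX_PARTITIONS = 1000  # SQL Server 2005 limit per table or index


def partition_of(boundaries, value, range_right=True):
    """1-based partition number for value, given sorted boundary values."""
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


def partitions_touched(boundaries, lo, hi, range_right=True):
    """Partitions that may hold rows matching col BETWEEN lo AND hi.

    Every partition outside the returned range is skipped entirely.
    """
    if len(boundaries) + 1 > MAX_PARTITIONS:
        raise ValueError("a partitioned table or index is limited to 1000 partitions")
    if lo > hi:
        return []
    first = partition_of(boundaries, lo, range_right)
    last = partition_of(boundaries, hi, range_right)
    return list(range(first, last + 1))


# Hypothetical RANGE RIGHT boundaries: Feb 1 through May 1, 2005 (5 partitions)
boundaries = [20050201, 20050301, 20050401, 20050501]
print(partitions_touched(boundaries, 20050215, 20050310))  # [2, 3]
```

A query covering mid-February to early March reads two of the five partitions; the same reasoning is what lets maintenance work on one partition without touching the rest.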
Partitioning (cont’d)
Defining your partition strategy:
- Define your partitioning column: identify the single column by which the data will be partitioned
- Define your partition function: identify the number of partitions and the boundary values that separate them
- Define your partition scheme: identify and create the filegroups that will store the partitions of the table or index
Defining and Creating Partitions
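Once the filegroups exist (created beforehand with ALTER DATABASE ... ADD FILEGROUP), the three strategy steps come down to two DDL statements. The Python helper below is a sketch that emits that T-SQL as text; the names pfOrderDate, psOrderDate, and FG2005_01 through FG2005_12 are hypothetical, and it assumes a datetime partitioning column with monthly RANGE RIGHT boundaries.

```python
from datetime import date


def monthly_boundaries(year):
    """First day of February through December: 11 boundaries, 12 partitions."""
    return [date(year, month, 1) for month in range(2, 13)]


def partition_ddl(function_name, scheme_name, column_type, boundaries, filegroups,
                  range_type="RIGHT"):
    """Emit CREATE PARTITION FUNCTION and CREATE PARTITION SCHEME as T-SQL text."""
    if len(filegroups) != len(boundaries) + 1:
        raise ValueError("supply exactly one filegroup per partition")
    # 'YYYYMMDD' literals are read the same way regardless of DATEFORMAT settings
    values = ", ".join(f"'{b.strftime('%Y%m%d')}'" for b in boundaries)
    groups = ", ".join(f"[{fg}]" for fg in filegroups)
    return (
        f"CREATE PARTITION FUNCTION {function_name} ({column_type})\n"
        f"AS RANGE {range_type} FOR VALUES ({values});\n"
        "GO\n"
        f"CREATE PARTITION SCHEME {scheme_name}\n"
        f"AS PARTITION {function_name} TO ({groups});\n"
        "GO"
    )


fgs = [f"FG2005_{m:02d}" for m in range(1, 13)]
print(partition_ddl("pfOrderDate", "psOrderDate", "datetime", monthly_boundaries(2005), fgs))
```

With RANGE RIGHT and boundaries from February 1 through December 1, partition 1 holds everything before February 2005 and partition 12 holds everything from December 1 onward. A table would then be placed on the scheme with a statement such as CREATE TABLE dbo.FactSales (...) ON psOrderDate (OrderDate), where dbo.FactSales and OrderDate are again hypothetical.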
Indexing
Role of indexing in a relational data warehouse:
- Dimension tables typically use a surrogate key, created specifically for the data warehouse, as the primary key
- Fact tables typically use the combination of all related dimension surrogate keys as a composite primary key
- A data warehouse may contain many additional non-clustered indexes to improve the efficiency and responsiveness of ad hoc queries
- Covering indexes (indexes that contain every column referenced in a query) are often implemented to support well-defined, frequently executed queries
Indexing (cont’d)
Covering indexes in SQL Server 2000:
- Create a composite index whose key includes every column referenced in the query
- The result is a large key, possibly with many key columns that are not used for filtering or lookups
- Maximum of 16 key columns and 900 bytes per key
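To see why wide covering keys become a problem, the following Python sketch checks a proposed composite key against the 16-column and 900-byte limits. The column names and sizes are hypothetical (int is 4 bytes, money 8 bytes, and nvarchar(500) up to 1,000 bytes), and the check uses declared maximum sizes, so it is conservative for variable-length columns.

```python
MAX_KEY_COLUMNS = 16
MAX_KEY_BYTES = 900


def check_composite_key(columns):
    """Check a proposed composite index key against the column and byte limits.

    columns: list of (name, max_bytes) pairs using declared maximum sizes.
    Returns a list of problems; an empty list means the key fits.
    """
    problems = []
    if len(columns) > MAX_KEY_COLUMNS:
        problems.append(f"{len(columns)} key columns exceeds {MAX_KEY_COLUMNS}")
    total = sum(size for _, size in columns)
    if total > MAX_KEY_BYTES:
        problems.append(f"{total} key bytes exceeds {MAX_KEY_BYTES}")
    return problems


# In SQL Server 2000, every column the query references must be a key column.
query_columns = [
    ("DateKey", 4),         # int
    ("ProductKey", 4),      # int
    ("SalesAmount", 8),     # money
    ("ProductName", 1000),  # nvarchar(500): 2 bytes per character
]
print(check_composite_key(query_columns))  # ['1016 key bytes exceeds 900']
```

A single wide descriptive column is enough to push the key past the limit, even though that column is never used for filtering.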
Indexing (cont’d)
Covering indexes in SQL Server 2005:
- A non-clustered index can be extended to include non-key columns
- Non-key columns are stored only at the leaf level, much like the data columns of a clustered index
- Because every column used by the query is present at the leaf level, the query can be resolved from the index pages alone, without lookups into the base table
- Maximum of 1,023 included columns (8,060 bytes total)
Creating Indexes With Include Columns
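With included columns, only the columns used for seeks and filters need to be part of the key. This Python helper is a sketch that builds the CREATE NONCLUSTERED INDEX ... INCLUDE statement as text and applies the key limits only to the key columns. The index, table, and column names are hypothetical, and the 8,060-byte limit on included data is not checked.

```python
MAX_KEY_COLUMNS = 16
MAX_KEY_BYTES = 900
MAX_INCLUDED_COLUMNS = 1023


def covering_index_ddl(index_name, table, key_columns, included_columns):
    """Build CREATE NONCLUSTERED INDEX ... INCLUDE (...) as T-SQL text.

    key_columns: (name, max_bytes) pairs used for seeks and filters; the
    16-column / 900-byte key limits apply only to these.
    included_columns: names stored at the leaf level only.
    """
    if len(key_columns) > MAX_KEY_COLUMNS:
        raise ValueError("too many key columns")
    if sum(size for _, size in key_columns) > MAX_KEY_BYTES:
        raise ValueError("index key exceeds 900 bytes")
    if len(included_columns) > MAX_INCLUDED_COLUMNS:
        raise ValueError("too many included columns")
    keys = ", ".join(name for name, _ in key_columns)
    ddl = f"CREATE NONCLUSTERED INDEX {index_name} ON {table} ({keys})"
    if included_columns:
        ddl += f" INCLUDE ({', '.join(included_columns)})"
    return ddl + ";"


print(covering_index_ddl(
    "IX_FactSales_Date_Product", "dbo.FactSales",
    [("DateKey", 4), ("ProductKey", 4)],
    ["SalesAmount", "ProductName"],
))
# CREATE NONCLUSTERED INDEX IX_FactSales_Date_Product ON dbo.FactSales (DateKey, ProductKey) INCLUDE (SalesAmount, ProductName);
```

The key stays at 8 bytes, so seeks remain cheap, while the wide nvarchar column rides along at the leaf level to keep the index covering.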
Indexing (cont’d)
Alternatives to covering indexes:
- If an effective clustered index can be used, non-clustered covering indexes may not be required
- Create single-column non-clustered indexes on each referenced column and allow the query optimizer to use index intersection
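Index intersection can be pictured as combining the row locators returned by two separate index seeks. The Python model below is conceptual and uses hypothetical fact rows: each single-column index is a sorted list of keys with their row locators, and a two-column equality filter is answered by intersecting the two seek results. Whether SQL Server actually chooses intersection is a cost-based optimizer decision, and any columns not held in either index still require a lookup.

```python
from bisect import bisect_left, bisect_right


def build_index(rows, column):
    """Single-column non-clustered index: sorted keys with matching row locators."""
    entries = sorted((row[column], rid) for rid, row in rows.items())
    return [key for key, _ in entries], [rid for _, rid in entries]


def seek(index, value):
    """Row locators whose key equals value (an index seek on a sorted key list)."""
    keys, rids = index
    return set(rids[bisect_left(keys, value):bisect_right(keys, value)])


# Hypothetical fact rows keyed by row locator
rows = {
    1: {"DateKey": 20050101, "ProductKey": 7},
    2: {"DateKey": 20050101, "ProductKey": 9},
    3: {"DateKey": 20050102, "ProductKey": 7},
    4: {"DateKey": 20050101, "ProductKey": 7},
}
by_date = build_index(rows, "DateKey")
by_product = build_index(rows, "ProductKey")

# WHERE DateKey = 20050101 AND ProductKey = 7: intersect the two seek results
matches = seek(by_date, 20050101) & seek(by_product, 7)
print(sorted(matches))  # [1, 4]
```

Each narrow index can serve many different queries, which is the trade-off against one wide covering index tuned for a single query.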
Loading and Maintenance
Managing and maintaining partitions:
- Load data into an empty partition
- Remove all data from an existing partition
- Move all data in one partition of a partitioned table to another partitioned table
- Split one partition into two partitions
- Merge two partitions into one partition
Managing and Maintaining Partitions
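The partition operations listed above are commonly combined into a sliding window that removes the oldest period and adds a new one. The Python sketch below emits one cycle of that rotation as T-SQL text. It is one possible ordering under stated assumptions, not a definitive procedure: it assumes a RANGE RIGHT function, an empty partition at each end of the table so that MERGE and SPLIT touch only empty partitions, an empty archive table and a loaded staging table (with a CHECK constraint matching its target range) on the correct filegroups, and hypothetical object names throughout.

```python
def sliding_window_ddl(table, archive_table, stage_table, function_name, scheme_name,
                       oldest_boundary, new_boundary, next_filegroup, stage_partition):
    """Emit one sliding-window cycle as a list of T-SQL statements.

    Assumes RANGE RIGHT, an always-empty first partition, an empty last
    partition, and that the oldest populated data is in partition 2.
    Boundary values are 'YYYYMMDD' strings.
    """
    return [
        # Remove the oldest data: a metadata-only move into an empty archive table
        f"ALTER TABLE {table} SWITCH PARTITION 2 TO {archive_table};",
        # Partitions 1 and 2 are now both empty, so merging them moves no rows
        f"ALTER PARTITION FUNCTION {function_name}() MERGE RANGE ('{oldest_boundary}');",
        # Filegroup for the partition that the following SPLIT creates
        f"ALTER PARTITION SCHEME {scheme_name} NEXT USED [{next_filegroup}];",
        # Split the empty last partition to make room for the next period
        f"ALTER PARTITION FUNCTION {function_name}() SPLIT RANGE ('{new_boundary}');",
        # Load new data: switch the loaded staging table into its target partition
        f"ALTER TABLE {stage_table} SWITCH TO {table} PARTITION {stage_partition};",
    ]


for statement in sliding_window_ddl(
        "dbo.FactSales", "dbo.FactSales_Archive", "dbo.FactSales_Stage",
        "pfOrderDate", "psOrderDate", "20050101", "20060301", "FG2006_03", 14):
    print(statement)
```

In the usage lines, partition 14 assumes the table starts with fifteen partitions: an empty first partition, one partition per month from January 2005 through January 2006, and an empty last partition. After the cycle, partition 14 covers February 2006.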
Loading and Maintenance
Optimizing bulk load performance:
- Prefer native format over character (ASCII) format
- Run multiple bulk loads concurrently
- Set the database recovery model to bulk-logged
- Use the TABLOCK hint to minimize locking
- Load each data file in a single batch
Loading Data
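The loading guidelines translate into a short T-SQL script. This Python sketch emits it as text, assuming a hypothetical SalesDW database, a hypothetical dbo.FactSales_Stage table, and native-format data files whose paths are visible to the server. DATAFILETYPE = 'native' selects native format, TABLOCK requests a table lock, and omitting BATCHSIZE loads each file as a single batch.

```python
def bulk_load_script(database, table, data_files):
    """Return (setup, loads) T-SQL text for a native-format bulk load.

    setup switches the database to the bulk-logged recovery model; each
    statement in loads is intended to run from its own session so the
    files load concurrently.
    """
    setup = f"ALTER DATABASE {database} SET RECOVERY BULK_LOGGED;"
    loads = []
    for path in data_files:
        # Double any single quotes so the path is a valid T-SQL string literal
        quoted = path.replace("'", "''")
        # Native format, table lock, and no BATCHSIZE (one batch per file)
        loads.append(
            f"BULK INSERT {table} FROM '{quoted}' "
            "WITH (DATAFILETYPE = 'native', TABLOCK);"
        )
    return setup, loads


setup, loads = bulk_load_script(
    "SalesDW", "dbo.FactSales_Stage",
    [r"D:\load\sales_01.dat", r"D:\load\sales_02.dat"],
)
print(setup)
for statement in loads:
    print(statement)
```

Concurrent TABLOCK loads into one table are possible when the target has no indexes, which suits loading a heap staging table before building its indexes and switching it into the fact table. Afterwards, switch the recovery model back (for example, to FULL) and take a log backup.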
© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.