Praveen Srivatsa Director| AstrhaSoft Consulting blogs.asthrasoft.com/praveens |
Table and Index Partitioning Designed for … VLDB w/very large tables (100’s GB) Large machines with 8, 16, 32, and more ‘real’ CPUs Replacing partitioned views where the partitions are in a single database Improving large data set manageability Loading Efficient dropping of whole partitions
How do you do Partitioning? Before a table is partitioned two things have to be created, a partition Function and Schema Partition Function Each row of an index/table is assigned to a partition (numbered 1, 2, 3,...) by using a “Partition Function” SQL Server 2005 supports Range partitioning only on a single column User defines the key column, the number of partitions, and the partition boundary points Partition Scheme Each partition for a partition function is mapped to a physical storage location (Filegroup) through a “Partition Scheme”
Rules For Partitions Restrictions are derived from the principles of no data movement and no scans All target indexes must exist in the source All indexes must be aligned (use the same partition function as was used on the table) Corresponding target index is in the same filegroup as the source There must be a constraint on the source to ensure all its data “fits” into the target partition
Example Code CREATE PARTITION FUNCTION testtablepart_fn (int) AS RANGE LEFT FOR VALUES (-1,10, 20, 30) CREATE PARTITION SCHEME testtablepart_ps AS PARTITION testtablepart_fn TO ([Filegroup4], [Filegroup1], [Filegroup1], [Filegroup3], [Filegroup2]) The order of filegroups being displayed is significant CREATE TABLE Employees (EmpId int, EmpName varchar(50)) ON testtablepart_ps(EmpId)
Partitioning Key restrictions Must be based on columns in the table Restrictions similar to index key limitations Only 1 column is permitted No Text/NText/Image in partition key No varchar(max) No timestamp Only “native types” – no user-defined types Column values must be deterministic Computed columns must be persisted (a new feature in 2005) Maximum 1000 partitions per table in SQL Server 2005 (practical limitation, not tested beyond 1000) All partitions of a single table or index must be in a single database Partitioned views can be used in conjunction with table partitioning to span databases and servers
Partitioning Benefits Allow easy management of very large tables and indexes (data scale-up) For example Fast Insert or Delete of large quantities of data (per-partition) Index defragmentation or rebuild on one partition using ALTER INDEX … PARTITION ( ) In some cases may experience some performance improvements for some operators, e.g. hash join would be more efficient take advantage of collocation joins of large tables
Partitioning Benefits Without interfering with access to the rest of a table: Add 100's of millions of rows to a table that already has billions of rows Delete 100's of millions of rows from that table Solution … use partitions with ranges Very fast (1 second), just metadata changes Switch data in and out Table and partition of another table Two partitions of different tables Bulk data loading (into separate table - then switch into main table) Index maintenance per partition Two key best practices to improve manageability Index Alignment (partitioned like the table) Storage Alignment (table partitions in the same filegroup\file)
Partitioning Benefits When you have to BCP 100 million rows into a table that already has 600 million row, need to reindex Table has indexes => load is slow; about 10+ times slower than loading into a heap and creating indexes afterwards Reindexing made table unavailable Using a partitioned table, can load into the table, and index it while everyone still using the rest of the partitions When done loading and indexing, just switch it in Delete is slow – deleting rows is orders of magnitude slower than truncating a multi-GB table But you don’t want to truncate the table, just delete 20 or 30% of it With partitioning, just switch out the unneeded partition Cost of running a utility usually grows linearly with the table size DBCC CHECKTABLE, use it on a partition at a time
Table Partitioning and Switching Bring a ‘partition’ of new data in or take a ‘partition’ of old data out of a partitioned table very fast A table has 12 billion rows divided into 12 months You want to archive off the 13 th month and then bring in the new month You have partitioned by month Data is ‘moved’ (not really) between tables by SWITCHing the pointers to physical data locations DATA IS NOT MOVED or even scanned, so size doesn’t matter It’s a metadata operation Takes a second
Example Switch Code ALTER TABLE table_name1 SWITCH [PARTITION ] TO table_name2 [PARTITION ] table_name1 is called “source” and table_name2 “target” of the SWITCH Target table (or partition_number2) is empty
SWITCH Partition Performance – allows building new and removing old partition fast Availability – allows adding new and removing old partition with minimal downtime Note: Schema modification lock is acquired for the duration of the ALTER TABLE … SWITCH Per-partition manageability – enables taking single partition out and run utility on it CREATE INDEX, ALTER INDEX … REBUILD And other operations that are not intended to work on individual partition numbers, DBCC CHECKTABLE Efficiently supports Sliding Window scenario
Indexes can also be Partitioned Syntax is the same as for tables “ON PartitionScheme(col)” They may be partitioned differently from the base table However, index alignment is the best practice If an index uses a similar partition function and same partitioning key as the base table, then the index is “aligned” Similar means the same number of partitions and same boundary points One-to-one correspondence between data in table and index partition All index entries in one partition map to data in a single partition of the base table
Can I Change a Partition? Yes, but do you want to? Not if there are already 100’s of millions of rows Plan partitioning very carefully and avoid a huge I/O problem later But if you have to add or remove a partition: SPLIT – adding a partition Taking 1 partition and splitting it into 2 partitions ALTER PARTITION FUNCTION pfname() SPLIT RANGE (new_boundary_value) MERGE – removing a partition Joining 2 partitions into 1 ALTER PARTITION FUNCTION pfname() MERGE RANGE (old_boundary_value) All affected tables are locked during the operation by EXCLUSIVE lock Logging is in effect similarly as in INSERT INTO … SELECT FROM No data moves in or out of the table; only rows of tables and indexes are moved from one partition to another This is the I/O killer, so plan ahead of time to avoid doing this
Partitioning Best Practices Plan Plan Plan ahead and save yourself from pain Ensure minimal data movement for SPLIT MERGE Have many partition functions so any alteration of one function will have limited impact Use storage and index alignment Empty partition should be the one that is “SPLIT” Watch how filegroups are assigned to partitions Spread the filegroups over many drives – let SQL Server handle the I/O distribution (e.g. each partition on a separate disk may not be the best choice)
Feedback / QnA Your Feedback is Important! Please take a few moments to fill out our online feedback form For detailed feedback, use the form at Or us at Use the Question Manager on LiveMeeting to ask your questions now!
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.