Doing fast! Optimizing Query performance with ColumnStore Indexes in SQL Server 2012 Margarita Naumova | SQL Master Academy
Agenda What is a columnstore index How it works Limits and restrictions When and how to use it
Working with SQL Server from v6.5 MCT from 1998 SQL Server Trainer and Consultant with over 50 projects in Bulgaria, Finland, Germany, UK, Greece.. 5 years Senior Consultant in Microsoft, member of Microsoft Worldwide Technical Leadership Team Microsoft Certified Master SQL Server 2008 BG SQL UG Leader MVP SQL Server Now teaching and presenting SQL Server, running SQL Master Academy Training Program About me
What is a Columnstore Index A new column store in SQL Server Database Engine Advanced Query processing techniques Accelerates specific workloads (DW)
Performance Example of CSI 1TB DW Database, 1.44 billion rows in fact table, 32 logical procs with 256GB RAM
DEMO CSI Performance Example
A columnstore Index Structure Stores data column-wise Better compression –Uses VertiPaq compression technology Less IO –Better Buffer hit rates Pure Column Store Fetches only needed columns from disk Only needed cols are fetched Segment (values for 1 col for a set of about 1mnl of rows)
Advanced Query processing techniques Processing mode –Batch mode –Row mode Best performance - when processes column data in batches using highly efficient algorithms Optimizer makes a cost based decision Segment elimination is an additional optimization Optimizer allows you to use hints
DEMO CSI Internals
DO’s for using CSI Effectively Put CSI on large tables only –Large fact tables and very large dimension table Include every column of the table in the columnstore index Use in read mostly workload, where most updates are appending new data Workflow permits using partitioning (or drop rebuild index) to handle new data Structure your queries as star joins with grouping and aggregation as much as possible or entail scanning and aggregating large amounts of data
ColumnStore Index Optimizations Rules DOP>2 and enough memory On ColumnStore indexed tables avoid using: –joins and filters on string columns –OUTER JOIN –NOT IN ( ) –UNION ALL to directly combine columnstore-indexed tables with other tables
Loading data Table with columnstore index can be read, not updated –INSERT, UPDATE, DELETE and MERGE is not allowed –Partition switching is allowed There are three possible methods for loading data –Disabling index –Partition switching –UNION ALL (tickle update)
Loading Method 1 Disable (or drop) the index –ALTER INDEX ….DISABLE Update the table Rebuild the columnstore index ALTER INDEX … REBUILD
Loading Method 2 Load new data into a staging table Build a columnstore index on the staging table –CREATE NONCLUSTERED COLUMNSTORE INDEX myindex ON Staging(……) Switch the partition into the main table –ALTER TABLE Staging SWITCH TO PARTITION 5 5/11 Staging table Load Switch
DEMO Loading and updating data in CSI
Loading Method 3 Build CSI index on primary table Create staging/delta table with no CSI Insert new data into (row-based) delta table Query both tables using UNION ALL to combine the results –Perform local-global aggregation using CTE aggregate the data from the columnstore and from the delta table in separate queries combine the results with a second level of grouping and aggregation –Do not make a view with UNION ALL Keep delta table small
Other Usage and limitations Must have a base table structure – heap or Clustered Index CSI is not appropriate for lookups, seeks and in OLTP as a general workload type Supported data types should be <8 Bytes The following data types cannot be included in a columnstore index: –binary and varbinary, ntext, text, and image –varchar(max) and nvarchar(max) –Uniqueidentifier, rowversion (and timestamp) –decimal (and numeric) with precision greater than 18 digits –datetimeoffset with scale greater than 2 –CLR types (hierarchyid and spatial types), xml
Good news Most things just work with columnstore indexes –Backup and restore –Mirroring –Log Shipping –SSMS –Administration tools –TDE
CSI Summary Benefits –Interactive experience with data Near instant response times Ad hoc queries Large data sets –Easy to set up No need for summary tables, indexed views Fewer indexes to design, create and maintain Reduces need to manually tune queries Consider limitation in V1 and overcome with –Good coding practices to achieve batch processing –Respective method of loading data
Thank you! Margarita Naumova| SQL Master Academy