The Five Ws of Columnstore Indexes Madison | APR 7 2018
Agenda Who? What? Why? How? Where? When? Demos
Who? Eureka Dr. Seuss Whoville Whos Deco Trim via Amazon.com
Who is this guy? John Eisbrener @johnedba john@dbatlas.com DBA: Default Blame Assignee DBA for over 10 years MSSQL, Oracle, Greenplum, Postgres Owner/Principal Consultant of a boutique consulting firm, DB Atlas http://www.dbatlas.com
Who are you? DBAs Architects Developers Analysts Management Others
What? Knowyourmeme.com
What makes Columnstore Indexes Special? What is an Index? Key differences between Rowstore and Columnstore Indexes Key differences between Clustered and Nonclustered Columnstore Indexes
What is an Index? Structure that contains data or pointers to data Designed to search for data efficiently Designed to perform as a database grows in size The type of index determines how data is stored on disk Highly customizable Columnstore Unofficial Versions SQL 2012 – Alpha SQL 2014 – Beta SQL 2016 – Version 1.0 SQL 2017 – Version 1.1
Difference between Rowstore and Columnstore Indexes
Rowstore Index: Row-wise format; Compression is optional; Returns all columns defined within the index; B+ Trees
Columnstore Index: Column-wise format; Compression is required; Returns only the columns needed; Header and Data
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/indexes https://en.wikipedia.org/wiki/B%2B_tree
Row-wise vs Column-Wise Storage
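The contrast between the two layouts can be sketched with a toy model. This is only an illustration in Python, not how SQL Server lays out pages or segments; the table, column names, and values are made up for the example.

```python
# Toy model: the same three rows stored row-wise and column-wise.
# Column names and values are hypothetical.
rows = [
    {"order_id": 1, "region": "WI", "amount": 120.0},
    {"order_id": 2, "region": "WI", "amount": 75.5},
    {"order_id": 3, "region": "MN", "amount": 310.0},
]

# Row-wise: each stored unit holds every column of one row.
row_store = [tuple(r.values()) for r in rows]

# Column-wise: each stored unit holds every value of one column.
column_store = {name: [r[name] for r in rows] for name in rows[0]}

# SUM(amount) has to walk whole rows in the row store ...
total_from_rows = sum(r[2] for r in row_store)
# ... but reads only the "amount" column in the column store.
total_from_columns = sum(column_store["amount"])
```

Both totals are the same; the difference is how much unrelated data has to be read to compute them.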
Clustered vs Nonclustered
Clustered Columnstore Index (CCI): One Per Table; Table is Stored in Column-wise format; Significant Table Compression; Cannot define a filter
Nonclustered Columnstore Index (NCCI): One Per Table; Sits on top of Heap or Clustered (Rowstore) Index; Copy of Data, uses more space; Can define a filter
Why? Youtube.com
Why do Columnstore Indexes work so well? Importance of Compression Brief Overview of Dictionary-based algorithms Column Elimination Rowgroup Elimination
Importance of Compression Reduce Limitations imposed by Data Storage Disk Memory Throughput Proprietary Compression Algorithm Dictionary Based https://blogs.msdn.microsoft.com/sqlserverstorageengine/2007/09/30/data-compression-techniques-and-trade-offs/
Dictionary-Based Compression Lossless. General Approach: Build a Dictionary of Symbols (e.g. words, numbers, etc.). Assign minimal binary codes to each Symbol; smaller binary codes are assigned to more common symbols. Replace raw data Symbols with Binary Codes to reduce the size of the data. Works best when Symbols are homogeneous. https://en.wikipedia.org/wiki/Huffman_coding https://en.wikipedia.org/wiki/LZ77_and_LZ78
Dictionary-Based Compression
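A minimal sketch of the general dictionary approach, assuming plain integer codes ordered by frequency. SQL Server's actual algorithm is proprietary, and this is not Huffman coding either (no variable-length bit codes); the function names and the sample column are hypothetical.

```python
from collections import Counter

def dictionary_encode(values):
    """Replace each value with a small integer code.

    The most common values receive the smallest codes.
    """
    counts = Counter(values)
    dictionary = [value for value, _ in counts.most_common()]
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def dictionary_decode(dictionary, codes):
    """Lossless: look every code back up in the dictionary."""
    return [dictionary[c] for c in codes]

column = ["WI", "MN", "WI", "WI", "IL", "MN", "WI"]
dictionary, codes = dictionary_encode(column)
restored = dictionary_decode(dictionary, codes)
```

The fewer distinct values a column holds, the smaller the dictionary and the codes become, which is why homogeneous column data compresses so well.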
Column Elimination Return only those columns used within the Query. Better compression ratios for data being returned because data is homogeneous. Column ordering in the (N)CCI Index Definition doesn’t matter; Column Elimination will happen regardless. NCCI ordering is defined by the underlying Rowstore Indexes. CCI Ascending/Descending order can be implied by how the data is loaded, using WITH (MAXDOP = 1). Partitioning can also help. https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-performance-column-elimination/ https://orderbyselectnull.com/2017/07/19/cci-partitioning-part-1-rowgroup-elimination-fragmentation/
Rowgroup Elimination Also referred to as Segment Elimination. If the Segment doesn’t contain values identified within the Query Predicate, the entire Rowgroup is eliminated. Occurs prior to Column Elimination. Not utilized for LOB-based, string-based, or binary datatypes. Works by evaluating the Segment Header, which stores the Min/Max of the values within the Segment. https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-performance-rowgroup-elimination/ http://www.nikoport.com/2015/06/28/columnstore-indexes-part-57-segment-alignment-maintenance/
Rowgroup Elimination Example
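The min/max check can be sketched as follows. This is a toy model in Python with hypothetical function names and a rowgroup size of 4 instead of roughly a million rows; it only shows the idea of skipping a rowgroup from its header alone.

```python
def build_segments(values, rowgroup_size):
    """Split one column into segments and record min/max in each header."""
    segments = []
    for start in range(0, len(values), rowgroup_size):
        chunk = values[start:start + rowgroup_size]
        segments.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return segments

def scan_equal(segments, target):
    """Return matching values and the number of segments actually read."""
    matches, scanned = [], 0
    for seg in segments:
        if target < seg["min"] or target > seg["max"]:
            continue  # eliminated by looking at the header only
        scanned += 1
        matches.extend(v for v in seg["values"] if v == target)
    return matches, scanned

# Data loaded in order: segment ranges do not overlap.
ordered = build_segments(list(range(1, 13)), rowgroup_size=4)
ordered_matches, ordered_scanned = scan_equal(ordered, 6)

# Same values loaded in arbitrary order: every segment's range covers 6.
unordered = build_segments([1, 12, 5, 9, 2, 11, 6, 10, 3, 8, 7, 4], rowgroup_size=4)
unordered_matches, unordered_scanned = scan_equal(unordered, 6)
```

With ordered data only one of the three segments is read; with the unordered load all three are read, which is why load order and partitioning matter for elimination.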
How? How it’s Made
How do Columnstore Indexes work with changing data? Rowgroups DeltaStore Inserts Deletes and Updates Tuple Mover ColumnStore Batch Execution Mode
Rowgroups Buckets of up to 1,048,576 rows (roughly 1 million). Can be in one of 3 states: Open, Closed, or Compressed. Open/Closed rowgroups are stored in Row-wise format; Compressed rowgroups are stored in Column-wise format.
DeltaStore
Tuple Mover
ColumnStore
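The rowgroup lifecycle described on the preceding slides (inserts land in an Open rowgroup, a full rowgroup becomes Closed, the Tuple Mover turns Closed rowgroups into Compressed ones) can be sketched as a small state machine. The class and method names are hypothetical, and the rowgroup size is shrunk to 4 rows so the example stays readable.

```python
class ToyColumnstore:
    """Toy model of the Open -> Closed -> Compressed lifecycle."""

    def __init__(self, max_rows=4):  # the real maximum is 1,048,576 rows
        self.max_rows = max_rows
        self.rowgroups = []

    def insert(self, row):
        # Inserts go to the open (row-wise) delta rowgroup, created on demand.
        open_rg = next((rg for rg in self.rowgroups if rg["state"] == "OPEN"), None)
        if open_rg is None:
            open_rg = {"state": "OPEN", "rows": []}
            self.rowgroups.append(open_rg)
        open_rg["rows"].append(row)
        if len(open_rg["rows"]) == self.max_rows:
            open_rg["state"] = "CLOSED"  # full, but still stored row-wise

    def tuple_mover(self):
        # Background step: convert every closed rowgroup to column-wise form.
        for rg in self.rowgroups:
            if rg["state"] == "CLOSED":
                rows = rg.pop("rows")
                rg["columns"] = {name: [r[name] for r in rows] for name in rows[0]}
                rg["state"] = "COMPRESSED"

store = ToyColumnstore()
for i in range(6):
    store.insert({"id": i, "region": "WI"})
states_before = [rg["state"] for rg in store.rowgroups]
store.tuple_mover()
states_after = [rg["state"] for rg in store.rowgroups]
```

After six inserts there is one full Closed rowgroup and one Open rowgroup holding the remaining two rows; running the Tuple Mover compresses only the Closed one.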
Batch Execution Mode Introduced in SQL 2012 along with Columnstore Indexes Columnstore Index is required on the table Only usable by certain execution plan operators Aggregates/Scans/Hash Matches/Window Aggregates Passes a batch of up to 900 rows between execution plan operators Basically a turbo button for execution plans https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance http://www.sqlservercentral.com/articles/Stairway+Series/145064/
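To illustrate why passing batches between operators helps, here is a rough sketch that counts hand-offs for a simple SUM. It assumes a fixed batch size of 900 rows taken from the slide; the function name is hypothetical and this is not a model of the actual execution engine.

```python
def batches(rows, batch_size=900):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

amounts = list(range(2000))

# Batch mode: the aggregate receives a whole batch per hand-off.
batch_handoffs = 0
batch_total = 0
for batch in batches(amounts):
    batch_handoffs += 1
    batch_total += sum(batch)

# Row mode for comparison: one hand-off per row.
row_handoffs = len(amounts)
```

For 2,000 rows the aggregate is invoked 3 times in the batch sketch versus 2,000 times row by row, while producing the same total.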
Where? Lego.com
Where can you use Columnstore Indexes? Datatype Restrictions NCCI Restrictions Optimal Workloads CCIs NCCIs
Datatype Restrictions Will not work with the following datatypes: ntext, text, and image; nvarchar(max), varchar(max), and varbinary(max) (this restriction does not apply to CCIs as of SQL Server 2017); rowversion (and timestamp); sql_variant; CLR types (hierarchyid and spatial types); xml https://docs.microsoft.com/en-us/sql/t-sql/statements/create-columnstore-index-transact-sql#LimitRest
NCCI Restrictions Cannot have more than 1024 columns Cannot be created on a view or indexed view Cannot include a sparse column Cannot be redefined by using the ALTER INDEX statement Use CREATE INDEX WITH (DROP_EXISTING = ON) Cannot include large object (LOB) columns of type nvarchar(max), varchar(max), and varbinary(max) https://docs.microsoft.com/en-us/sql/t-sql/statements/create-columnstore-index-transact-sql#LimitRest
Optimal Workloads - CCIs Traditional DWH Fact Tables Dimension Tables with over 1 million rows Insert Mostly Workloads History Table of a Temporal Table Logging Tables Updates/Deletes < 10% of all DML Create Nonclustered (Rowstore) Indexes on CCI Improve Query Performance by avoiding Full-Table Scans Large In-Memory OLTP tables https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-which-columnstore-index-is-right-for-my-workload/ https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-why-do-i-need-to-create-clustered-columnstore-index-on-in-memory-oltp-tables-for-analytics/
Optimal Workloads - NCCIs OLTP tables with more than 1 million rows. Tables that may feed a large number of analytical/aggregate queries. Common tables feeding SSRS/Power BI Reports. Tables that generate a high amount of Scans. Very wide tables that are not easy to create Covering Indexes on. Tables that could benefit from being a CCI, but cannot be offline for a long period of time. https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-which-columnstore-index-is-right-for-my-workload/
Identify Candidate Tables Several Scripts have been developed by the community Niko Neugebauer (GitHub Library CISL) Sunil Agarwal (Microsoft Blog Post)
When? Apple.com
When to use various Columnstore Features? Compression Delay Filtered NCCIs Maintenance Routines With other features in SQL Server
Compression Delay Keyword used to delay the Tuple Mover from moving a Closed Rowgroup to a Compressed Rowgroup. Max value is 10080 minutes, or 7 days. Helpful for frequently-updated “hot” data. Closed Rowgroups can still be updated/deleted; the data becomes immutable only once a Rowgroup is compressed. Compressing a Closed Rowgroup requires system resources, and you may want these operations to run off-hours.
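The effect of the delay can be sketched as a simple eligibility check: a Closed rowgroup is only handed to compression once it has stayed closed for at least the configured number of minutes. The function name and timestamps are hypothetical; only the 10080-minute ceiling comes from the slide.

```python
from datetime import datetime, timedelta

MAX_DELAY_MINUTES = 10080  # 7 days * 24 hours * 60 minutes

def eligible_for_compression(closed_at, now, delay_minutes):
    """True once a closed rowgroup has aged past the compression delay."""
    if not 0 <= delay_minutes <= MAX_DELAY_MINUTES:
        raise ValueError("delay must be between 0 and 10080 minutes")
    return now - closed_at >= timedelta(minutes=delay_minutes)

closed_at = datetime(2018, 4, 7, 9, 0)
still_hot = eligible_for_compression(
    closed_at, closed_at + timedelta(minutes=30), delay_minutes=60)
cooled_off = eligible_for_compression(
    closed_at, closed_at + timedelta(hours=2), delay_minutes=60)
```

With a 60-minute delay, the rowgroup is left alone 30 minutes after closing and becomes eligible once two hours have passed.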
Filtered NCCIs Use when Compression Delay isn’t long enough. Query Engine will use what it can from the Filtered NCCI and pull the remaining data from the Rowstore Index. Must be redefined using CREATE NONCLUSTERED COLUMNSTORE INDEX WITH (DROP_EXISTING=ON). Requires specific SET Options. https://docs.microsoft.com/en-us/sql/relational-databases/indexes/get-started-with-columnstore-for-real-time-operational-analytics https://docs.microsoft.com/en-us/sql/t-sql/statements/create-columnstore-index-transact-sql#filtered-indexes
Maintenance Routines Reorganize: Physically removes rows from a rowgroup when 10% or more of the rows have been logically deleted. Combines one or more compressed rowgroups to increase rows per rowgroup up to the maximum of 1,048,576 rows. Manually compresses any Closed RowGroups. Compresses all Closed AND Open RowGroups when using the WITH (COMPRESS_ALL_ROW_GROUPS = ON) option.
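The 10% rule above amounts to simple arithmetic per rowgroup. The sketch below picks out compressed rowgroups whose logically deleted rows reach that threshold; the dictionary layout and the row counts are made up for illustration and do not mirror any system view.

```python
def reorganize_candidates(rowgroups):
    """Ids of compressed rowgroups with at least 10% of rows logically deleted."""
    return [
        rg["id"]
        for rg in rowgroups
        # Integer form of deleted / total >= 0.10, avoiding float rounding.
        if rg["state"] == "COMPRESSED" and rg["deleted_rows"] * 10 >= rg["total_rows"]
    ]

rowgroups = [
    {"id": 0, "state": "COMPRESSED", "total_rows": 1_000_000, "deleted_rows": 50_000},
    {"id": 1, "state": "COMPRESSED", "total_rows": 1_000_000, "deleted_rows": 100_000},
    {"id": 2, "state": "COMPRESSED", "total_rows": 400_000, "deleted_rows": 250_000},
    {"id": 3, "state": "OPEN", "total_rows": 10_000, "deleted_rows": 0},
]
candidates = reorganize_candidates(rowgroups)
```

Rowgroup 0 (5% deleted) is left alone, while rowgroups 1 (exactly 10%) and 2 (62.5%) qualify for cleanup.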
Maintenance Routines (Continued) Rebuild Re-compresses all data into the columnstore Historically (e.g. 2014 and 2012) used to be the only way to reduce fragmentation Locks the table during the rebuild operation SQL 2017 introduces ONLINE rebuilds for NCCIs only Will be used primarily when there is a lot of fragmentation within the Compressed Rowgroups
Other features that work well with Columnstore Indexes Temporal Tables CCI on History Table Availability Groups with Read-Only Replicas Point your reports there! Partitioned Tables
Demos OhMaGif.com