1
The Five Ws of Columnstore Indexes
Chicago | MAR The Five Ws of Columnstore Indexes
2
Agenda
- Who?
- What?
- Why?
- How?
- Where?
- When?
- Demos
3
Who? Eureka Dr. Seuss Whoville Whos Deco Trim via Amazon.com
4
Who is this guy?
John Eisbrener
@johnedba
john@dbatlas.com
- DBA: Default Blame Assignee
- DBA for over 10 years
- MSSQL, Oracle, Greenplum, Postgres
- Owner/Principal Consultant of a boutique consulting firm, DB Atlas
5
Who are you?
- DBAs
- Architects
- Developers
- Analysts
- Management
- Others
6
What? Knowyourmeme.com
7
What makes Columnstore Indexes Special?
- What is an Index?
- Key differences between Rowstore and Columnstore Indexes
- Key differences between Clustered and Nonclustered Columnstore Indexes
8
What is an Index?
- Structure that contains data or pointers to data
- Designed to search for data efficiently
- Designed to perform as a database grows in size
- The type of index determines how data is stored on disk
- Highly customizable
Columnstore Unofficial Versions
- SQL 2012 – Alpha
- SQL 2014 – Beta
- SQL 2016 – Version 1.0
- SQL 2017 – Version 1.1
9
Difference between Rowstore and Columnstore Indexes
| Rowstore Index | Columnstore Index |
| --- | --- |
| Row-wise format | Column-wise format |
| Compression is optional | Compression is required |
| Returns all columns defined within the index | Returns only the columns needed |
| B+ Trees | Header and Data |
10
Row-wise vs Column-Wise Storage
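The difference between the two layouts can be sketched in a few lines of Python. This is only a toy model, not how SQL Server lays out pages: the table, its column names, and both helper functions are made up for illustration.

```python
# Toy table: three rows, three columns (hypothetical data).
rows = [
    {"order_id": 1, "region": "East", "amount": 10.0},
    {"order_id": 2, "region": "West", "amount": 25.5},
    {"order_id": 3, "region": "East", "amount": 7.25},
]

# Row-wise layout: each stored unit holds every column of one row.
row_store = [tuple(r.values()) for r in rows]

# Column-wise layout: each stored unit holds every value of one column.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def total_amount_row_store(store):
    # Walks whole rows even though only the third column is needed.
    return sum(row[2] for row in store)


def total_amount_column_store(store):
    # Reads only the "amount" column; the other columns are never touched.
    return sum(store["amount"])
```

Both functions return the same total; the column-wise version simply never looks at `order_id` or `region`, which is the idea behind returning only the columns needed.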
11
Clustered vs Nonclustered
| Clustered Columnstore Index (CCI) | Nonclustered Columnstore Index (NCCI) |
| --- | --- |
| One Per Table | One Per Table |
| Table is Stored in Column-wise format | Sits on top of Heap or Clustered (Rowstore) Index |
| Significant Table Compression | Copy of Data; uses more space |
| Cannot define a filter | Can define a filter |
12
Why? Youtube.com
13
Why do Columnstore Indexes work so well?
- Importance of Compression
- Brief Overview of Dictionary-based algorithms
- Column Elimination
- Rowgroup Elimination
14
Importance of Compression
- Reduce Limitations imposed by Data Storage
  - Disk
  - Memory
  - Throughput
- Proprietary Compression Algorithm
  - Dictionary Based
15
Dictionary-Based Compression
- Lossless
- General Approach
  - Build a Dictionary of Symbols (e.g. words, numbers, etc.)
  - Assign minimal binary codes to each Symbol
  - Smaller binary codes are assigned to more common Symbols
  - Replace raw data Symbols with Binary Codes to reduce the size of the data
- Works best when Symbols are homogeneous
16
Dictionary-Based Compression
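The general approach can be illustrated with a toy dictionary encoder. This is a minimal sketch, not SQL Server's proprietary algorithm: it uses small integers as stand-ins for binary codes, and the function names and sample column are assumptions made for the example.

```python
from collections import Counter


def build_dictionary(values):
    """Map each distinct symbol to an integer code; the most common symbol gets 0."""
    counts = Counter(values)
    # Sort by descending frequency, then by symbol so the result is deterministic.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {symbol: code for code, symbol in enumerate(ordered)}


def encode(values, dictionary):
    # Replace each raw symbol with its (smaller) code.
    return [dictionary[v] for v in values]


def decode(codes, dictionary):
    # Lossless: inverting the dictionary restores the original symbols.
    reverse = {code: symbol for symbol, code in dictionary.items()}
    return [reverse[c] for c in codes]


column = ["East", "West", "East", "East", "North", "West", "East"]
dictionary = build_dictionary(column)
codes = encode(column, dictionary)
```

A column with few distinct, repeated values yields a small dictionary and short codes, which is why homogeneous data compresses best.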
17
Column Elimination
- Return only those columns used within the Query
- Better compression ratios for data being returned because data is homogeneous
- Column ordering in the (N)CCI Index Definition doesn’t matter; Column Elimination will happen regardless
  - NCCI ordering is defined by the underlying Rowstore Indexes
  - CCI Ascending/Descending order can be implied with how the data is loaded
    - WITH (MAXDOP = 1)
  - Partitioning can also help
18
Rowgroup Elimination
- Also referred to as Segment Elimination
- If the Segment doesn’t contain values identified within the Query Predicate, the entire Rowgroup is eliminated
- Occurs prior to Column Elimination
- Not utilized for LOB-based, string-based, or binary datatypes
- Evaluation of the Segment Header
  - Stores Min/Max of values within Segment
19
Rowgroup Elimination Example
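A rough sketch of the idea, assuming each segment keeps only a min and a max in its header. The `Segment` class and `scan_between` function are hypothetical names for this example, not engine internals.

```python
class Segment:
    """One column segment plus the min/max metadata kept in its header."""

    def __init__(self, values):
        self.values = list(values)
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_between(segments, low, high):
    """Return matching values and the number of segments actually read."""
    matches, segments_read = [], 0
    for seg in segments:
        # Header check only: skip the segment if its range cannot overlap the predicate.
        if seg.max_value < low or seg.min_value > high:
            continue
        segments_read += 1
        matches.extend(v for v in seg.values if low <= v <= high)
    return matches, segments_read


# Three segments with non-overlapping ranges, as if the data had been loaded in order.
segments = [Segment(range(1, 101)), Segment(range(101, 201)), Segment(range(201, 301))]
matches, segments_read = scan_between(segments, 150, 160)
```

With ordered data only one of the three segments is read. If the same values were scattered across all segments, every min/max range would overlap the predicate and nothing could be skipped, which is why load order matters.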
20
How? How it’s Made
21
How do Columnstore Indexes work with changing data?
- Rowgroups
- DeltaStore
- Inserts
- Deletes and Updates
- Tuple Mover
- ColumnStore
- Batch Execution Mode
22
Rowgroups
- Buckets of up to 1 million rows
- Can be in one of 3 states
  - Open
  - Closed
  - Compressed
- Open/Closed are stored in Row-wise format
- Compressed is stored in Column-wise format
23
DeltaStore
24
Tuple Mover
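The rowgroup lifecycle (Open, then Closed, then Compressed by the Tuple Mover) can be modelled with a small simulation. This is a simplified sketch: `MAX_ROWS` is shrunk to 4 so the demo stays readable, and the class and method names are invented for illustration.

```python
MAX_ROWS = 4  # stand-in for the real rowgroup cap, shrunk for the demo


class Rowgroup:
    def __init__(self):
        self.state = "OPEN"
        self.rows = []       # row-wise storage while OPEN or CLOSED
        self.columns = None  # column-wise storage once COMPRESSED


class ColumnstoreSketch:
    def __init__(self):
        self.rowgroups = [Rowgroup()]

    def insert(self, row):
        # Inserts land in the current OPEN rowgroup.
        current = self.rowgroups[-1]
        current.rows.append(row)
        if len(current.rows) == MAX_ROWS:
            # A full rowgroup is CLOSED and a new OPEN one takes its place.
            current.state = "CLOSED"
            self.rowgroups.append(Rowgroup())

    def tuple_mover(self):
        # Converts every CLOSED rowgroup from row-wise to column-wise format.
        for rg in self.rowgroups:
            if rg.state == "CLOSED":
                rg.columns = [list(col) for col in zip(*rg.rows)]
                rg.rows = []
                rg.state = "COMPRESSED"


store = ColumnstoreSketch()
for i in range(6):
    store.insert((i, f"name{i}"))
states_before = [rg.state for rg in store.rowgroups]
store.tuple_mover()
states_after = [rg.state for rg in store.rowgroups]
```

After six inserts the first rowgroup is Closed and the second is still Open; running the tuple mover compresses only the Closed one.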
25
ColumnStore
26
Batch Execution Mode
- Introduced in SQL 2012 along with Columnstore Indexes
- Columnstore Index is required on the table
- Only usable by certain execution plan operators
  - Aggregates/Scans/Hash Matches/Window Aggregates
- Passes a batch of up to 900 rows between execution plan operators
- Basically a turbo button for execution plans
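To get a feel for why batching helps, the sketch below counts how many hand-offs between two operators are needed for the same input in row mode and in batch mode. It is an analogy built on Python generators, not a model of the actual execution engine; the function names are hypothetical.

```python
from itertools import islice

BATCH_SIZE = 900  # upper bound on rows per batch, per the slide above


def row_mode(rows):
    # One hand-off between operators per row.
    for row in rows:
        yield row


def batch_mode(rows, batch_size=BATCH_SIZE):
    # One hand-off per batch of up to batch_size rows.
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


rows = range(2000)
row_handoffs = sum(1 for _ in row_mode(rows))
batch_handoffs = sum(1 for _ in batch_mode(rows))
batch_total = sum(sum(batch) for batch in batch_mode(rows))
```

For 2,000 rows, row mode makes 2,000 hand-offs while batch mode makes 3, and the aggregate result is the same either way.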
27
Where? Lego.com
28
Where can you use Columnstore Indexes?
- Datatype Restrictions
- NCCI Restrictions
- Optimal Workloads
  - CCIs
  - NCCIs
29
Datatype Restrictions
Will not work with the following datatypes:
- ntext, text, and image
- nvarchar(max), varchar(max), and varbinary(max)
  - Exception: this restriction does not apply to CCIs, in SQL Server 2017 only
- rowversion (and timestamp)
- sql_variant
- CLR types (hierarchyid and spatial types)
- xml
30
NCCI Restrictions
- Cannot have more than 1024 columns
- Cannot be created on a view or indexed view
- Cannot include a sparse column
- Cannot be redefined by using the ALTER INDEX statement
  - Use CREATE INDEX WITH (DROP_EXISTING = ON)
- Cannot include large object (LOB) columns of type nvarchar(max), varchar(max), and varbinary(max)
31
Optimal Workloads - CCIs
- Traditional DWH Fact Tables
- Dimension Tables with over 1 million rows
- Insert Mostly Workloads
  - History Table of a Temporal Table
  - Logging Tables
  - Updates/Deletes < 10% of all DML
- Create Nonclustered (Rowstore) Indexes on CCI
  - Improve Query Performance by avoiding Full-Table Scans
- Large In-Memory OLTP tables
32
Optimal Workloads - NCCIs
- OLTP tables with more than 1 million rows
- Tables that may feed a large number of analytical/aggregate queries
  - Common tables feeding SSRS/Power BI Reports
- Tables that generate a high amount of Scans
- Very wide tables that are not easy to create Covering Indexes on
- Tables that could benefit from being a CCI, but cannot be offline for a long period of time
33
Identify Candidate Tables
Several Scripts have been developed by the community:
- Niko Neugebauer (GitHub Library CISL)
- Sunil Agarwal (Microsoft Blog Post)
34
When? Apple.com
35
When to use various Columnstore Features?
- Compression Delay
- Filtered NCCIs
- Maintenance Routines
- With other features in SQL Server
36
Compression Delay Keyword
- Used to delay the Tuple Mover from moving a Closed Rowgroup to a Compressed Rowgroup
- Max value is 10080 minutes, or 7 days
- Helpful for frequently-updated “hot” data
  - Closed Rowgroups can still be updated/deleted; the data becomes immutable only once a Rowgroup is compressed
  - Compressing a Closed Rowgroup requires system resources, and you may want these operations to run off-hours
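The arithmetic behind the maximum, and the effect of the delay, can be sketched as follows. This assumes the delay is measured in minutes from the moment a rowgroup is closed; the function name and parameters are hypothetical and do not reflect how the engine schedules the work.

```python
from datetime import datetime, timedelta

# 7 days * 24 hours * 60 minutes = 10080 minutes
MAX_COMPRESSION_DELAY_MINUTES = 7 * 24 * 60


def eligible_for_compression(closed_at, now, delay_minutes):
    """True once a Closed rowgroup has waited at least delay_minutes."""
    if not 0 <= delay_minutes <= MAX_COMPRESSION_DELAY_MINUTES:
        raise ValueError("delay must be between 0 and 10080 minutes")
    return now - closed_at >= timedelta(minutes=delay_minutes)


closed_at = datetime(2024, 3, 1, 9, 0)
still_hot = eligible_for_compression(closed_at, datetime(2024, 3, 1, 9, 30), delay_minutes=60)
cooled_off = eligible_for_compression(closed_at, datetime(2024, 3, 1, 10, 0), delay_minutes=60)
```

With a 60-minute delay, the rowgroup stays updatable in row-wise format for the first hour and only then becomes a candidate for compression.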
37
Filtered NCCIs
- Use when Compression Delay isn’t long enough
- Query Engine will use what it can from the Filtered NCCI and pull remaining data from the Rowstore Index
- Must redefine using CREATE NONCLUSTERED COLUMNSTORE INDEX WITH (DROP_EXISTING = ON)
- Requires specific SET options
38
Maintenance Routines
Reorganize
- Physically removes rows from a rowgroup when 10% or more of the rows have been logically deleted
- Combines one or more compressed rowgroups to increase rows per rowgroup, up to the maximum of 1,048,576 rows
- Manually compresses any Closed Rowgroups
- Compresses all Closed AND Open Rowgroups when using the WITH (COMPRESS_ALL_ROW_GROUPS = ON) option
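The two thresholds can be expressed as simple checks. This is a rough sketch under stated assumptions: the cap is taken as 1,048,576 rows (2^20), and the greedy packing in `plan_merges` is a made-up illustration, since the engine's real merge policy is more involved.

```python
MAX_ROWGROUP_ROWS = 1_048_576      # 2**20 rows per compressed rowgroup
DELETED_FRACTION_THRESHOLD = 0.10  # 10% or more logically deleted rows


def should_purge_deleted(total_rows, deleted_rows):
    """True when enough rows are logically deleted to justify physical removal."""
    return total_rows > 0 and deleted_rows / total_rows >= DELETED_FRACTION_THRESHOLD


def plan_merges(live_row_counts):
    """Greedily pack adjacent rowgroups (by live row count) without exceeding the cap."""
    merged, current = [], 0
    for count in live_row_counts:
        if current and current + count > MAX_ROWGROUP_ROWS:
            merged.append(current)
            current = 0
        current += count
    if current:
        merged.append(current)
    return merged


merged_sizes = plan_merges([500_000, 400_000, 300_000, 100_000])
```

In this example four undersized rowgroups collapse into two, because adding the third to the first pair would exceed the cap.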
39
Maintenance Routines (Continued)
Rebuild
- Re-compresses all data into the columnstore
- Historically (e.g. in SQL 2012), this was the only way to reduce fragmentation
- Locks the table during the rebuild operation
  - SQL 2017 introduces ONLINE rebuilds for NCCIs only
- Will be used primarily when there is a lot of fragmentation within the Compressed Rowgroups
40
Other features that work well with Columnstore Indexes
- Temporal Tables
  - CCI on History Table
- Availability Groups with Read-Only Replicas
  - Point your reports there!
- Partitioned Tables
41
Demos OhMaGif.com