The Five Ws of Columnstore Indexes

Presentation transcript:

The Five Ws of Columnstore Indexes
Madison | APR 7 2018

Agenda
- Who?
- What?
- Why?
- How?
- Where?
- When?
- Demos

Who?

Who is this guy?
John Eisbrener
@johnedba
john@dbatlas.com
DBA: Default Blame Assignee
- DBA for over 10 years
- MSSQL, Oracle, Greenplum, Postgres
- Owner/Principal Consultant of a boutique consulting firm, DB Atlas
http://www.dbatlas.com

Who are you?
- DBAs
- Architects
- Developers
- Analysts
- Management
- Others

What?

What makes Columnstore Indexes Special?
- What is an Index?
- Key differences between Rowstore and Columnstore Indexes
- Key differences between Clustered and Nonclustered Columnstore Indexes

What is an Index?
- Structure that contains data or pointers to data
- Designed to search for data efficiently
- Designed to perform as a database grows in size
- The type of index determines how data is stored on disk
- Highly customizable

Columnstore Unofficial Versions
- SQL 2012 – Alpha
- SQL 2014 – Beta
- SQL 2016 – Version 1.0
- SQL 2017 – Version 1.1

Difference between Rowstore and Columnstore Indexes

Rowstore Index
- Row-wise format
- Compression is optional
- Returns all columns defined within the index
- B+ Trees

Columnstore Index
- Column-wise format
- Compression is required
- Returns only the columns needed
- Header and Data

https://docs.microsoft.com/en-us/sql/relational-databases/indexes/indexes
https://en.wikipedia.org/wiki/B%2B_tree

Row-wise vs Column-Wise Storage
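
The contrast between the two layouts can be sketched in a few lines of Python. This is a toy illustration only: the table, the column names (order_id, region, amount) and the helper functions are hypothetical, and a real storage engine works with pages, segments and compression rather than Python lists.

```python
# Toy illustration only: the table and column names are made up for this sketch.
rows = [
    {"order_id": 1, "region": "WI", "amount": 10.0},
    {"order_id": 2, "region": "WI", "amount": 25.5},
    {"order_id": 3, "region": "MN", "amount": 7.25},
]

# Row-wise layout: each record keeps all of its column values together.
row_store = [tuple(r.values()) for r in rows]

# Column-wise layout: all values of one column are stored together.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def total_amount_row_store(store):
    # Every full row is visited even though only one column is needed.
    return sum(row[2] for row in store)


def total_amount_column_store(store):
    # Only the "amount" column is read; the other columns are never touched.
    return sum(store["amount"])


print(total_amount_row_store(row_store))
print(total_amount_column_store(column_store))
```

Both functions return the same total; the difference is how much data has to be touched to compute it, which is the idea behind returning only the columns a query needs.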

Clustered vs Nonclustered

Clustered Columnstore Index (CCI)
- One per table
- Table is stored in column-wise format
- Significant table compression
- Cannot define a filter

Nonclustered Columnstore Index (NCCI)
- One per table
- Sits on top of Heap or Clustered (Rowstore) Index
- Copy of data; uses more space
- Can define a filter

Why?

Why do Columnstore Indexes work so well?
- Importance of Compression
- Brief Overview of Dictionary-based algorithms
- Column Elimination
- Rowgroup Elimination

Importance of Compression
- Reduce limitations imposed by data storage
  - Disk
  - Memory
  - Throughput
- Proprietary compression algorithm
  - Dictionary based
https://blogs.msdn.microsoft.com/sqlserverstorageengine/2007/09/30/data-compression-techniques-and-trade-offs/

Dictionary-Based Compression
- Lossless
- General approach
  - Build a dictionary of symbols (e.g. words, numbers, etc.)
  - Assign minimal binary codes to each symbol
  - Smaller binary codes are assigned to more common symbols
  - Replace raw data symbols with binary codes to reduce the size of the data
- Works best when symbols are homogeneous
https://en.wikipedia.org/wiki/Huffman_coding
https://en.wikipedia.org/wiki/LZ77_and_LZ78
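
The general approach above can be sketched as follows. This is a simplified illustration and not SQL Server's proprietary algorithm: it assigns small integer codes in frequency order instead of true variable-length binary codes, and the function names and sample column are hypothetical.

```python
from collections import Counter


def build_dictionary(values):
    """Map each distinct symbol to an integer code; common symbols get smaller codes."""
    counts = Counter(values)
    # Sort by descending frequency, then by the symbol itself for a stable order.
    ordered = sorted(counts, key=lambda s: (-counts[s], str(s)))
    return {symbol: code for code, symbol in enumerate(ordered)}


def encode(values, dictionary):
    # Replace every raw symbol with its code.
    return [dictionary[v] for v in values]


def decode(codes, dictionary):
    # Invert the dictionary to recover the original symbols (lossless).
    reverse = {code: symbol for symbol, code in dictionary.items()}
    return [reverse[c] for c in codes]


column = ["Madison", "Appleton", "Madison", "Wausau", "Madison", "Appleton"]
dictionary = build_dictionary(column)
codes = encode(column, dictionary)

print(dictionary)
print(codes)
assert decode(codes, dictionary) == column  # round trip loses nothing
```

A column with few distinct values repeated many times yields a small dictionary and a long run of small codes, which is why homogeneous data compresses well.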

Dictionary-Based Compression

Column Elimination
- Return only those columns used within the query
- Better compression ratios for data being returned because data is homogeneous
- Column ordering in the (N)CCI index definition doesn’t matter; Column Elimination will happen regardless
  - NCCI ordering is defined by the underlying Rowstore Indexes
  - CCI ascending/descending order can be implied by how the data is loaded
    - WITH (MAXDOP = 1)
    - Partitioning can also help
https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-performance-column-elimination/
https://orderbyselectnull.com/2017/07/19/cci-partitioning-part-1-rowgroup-elimination-fragmentation/

Rowgroup Elimination
- Also referred to as Segment Elimination
- If the Segment doesn’t contain values identified within the Query Predicate, the entire Rowgroup is eliminated
- Occurs prior to Column Elimination
- Not utilized for LOB-based, string-based, or binary datatypes
- Evaluation of the Segment Header
  - Stores Min/Max of values within Segment
https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-performance-rowgroup-elimination/
http://www.nikoport.com/2015/06/28/columnstore-indexes-part-57-segment-alignment-maintenance/
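
The min/max check described above can be sketched as follows. This is a minimal model under stated assumptions: the Segment class and scan_equals function are hypothetical names, the segments hold a single integer column, and only an equality predicate is handled.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    min_value: int   # kept in the segment header
    max_value: int   # kept in the segment header
    values: list     # the segment's data


def make_segment(values):
    values = list(values)
    return Segment(min(values), max(values), values)


def scan_equals(segments, target):
    """Return matching values and how many segments had to be read."""
    matches, scanned = [], 0
    for seg in segments:
        if target < seg.min_value or target > seg.max_value:
            continue  # eliminated by looking at the header only
        scanned += 1
        matches.extend(v for v in seg.values if v == target)
    return matches, scanned


# Data loaded in order: value ranges do not overlap between segments.
segments = [make_segment(range(0, 100)),
            make_segment(range(100, 200)),
            make_segment(range(200, 300))]
matches, scanned = scan_equals(segments, 150)
print(matches, scanned)

# Data loaded out of order: every segment spans nearly the full range.
overlapping = [make_segment([0, 150, 299]), make_segment([1, 200, 298])]
print(scan_equals(overlapping, 150))
```

With ordered data only one of the three segments is read; with overlapping ranges nothing can be skipped. That is why the way data is loaded and partitioned affects how much elimination is possible.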

Rowgroup Elimination Example

How?

How do Columnstore Indexes work with changing data?
- Rowgroups
- DeltaStore
- Inserts
- Deletes and Updates
- Tuple Mover
- ColumnStore
- Batch Execution Mode

Rowgroups
- Buckets of up to 1,048,576 (roughly 1 million) rows
- Can be in one of 3 states
  - Open
  - Closed
  - Compressed
- Open/Closed are stored in row-wise format
- Compressed is stored in column-wise format
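
The rowgroup lifecycle can be sketched as a small state machine. This is a toy model, not SQL Server's implementation: the class names are hypothetical, the rowgroup size is shrunk to 4 rows for the demo, and the tuple_mover method simply pivots closed rowgroups into per-column lists without any compression.

```python
MAX_ROWS = 4  # demo value; the real rowgroup maximum is 1,048,576 rows


class RowGroup:
    def __init__(self):
        self.state = "OPEN"
        self.rows = []        # row-wise storage while OPEN or CLOSED
        self.columns = None   # column-wise storage once COMPRESSED


class ToyColumnstore:
    def __init__(self, max_rows=MAX_ROWS):
        self.max_rows = max_rows
        self.rowgroups = [RowGroup()]

    def insert(self, row):
        current = self.rowgroups[-1]
        if current.state != "OPEN":
            # The last rowgroup is full, so start a new open one.
            current = RowGroup()
            self.rowgroups.append(current)
        current.rows.append(row)
        if len(current.rows) == self.max_rows:
            current.state = "CLOSED"

    def tuple_mover(self):
        # Convert every CLOSED rowgroup from row-wise to column-wise storage.
        for rg in self.rowgroups:
            if rg.state == "CLOSED":
                rg.columns = [list(col) for col in zip(*rg.rows)]
                rg.rows = []
                rg.state = "COMPRESSED"


store = ToyColumnstore()
for i in range(6):
    store.insert((i, i * 10))

print([rg.state for rg in store.rowgroups])  # before the tuple mover runs
store.tuple_mover()
print([rg.state for rg in store.rowgroups])  # after the tuple mover runs
```

The first rowgroup fills up, is closed, and is converted to column-wise form by the background step, while the newer, partly filled rowgroup stays open and row-wise.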

DeltaStore

Tuple Mover

ColumnStore

Batch Execution Mode
- Introduced in SQL 2012 along with Columnstore Indexes
- Columnstore Index is required on the table
- Only usable by certain execution plan operators
  - Aggregates/Scans/Hash Matches/Window Aggregates
- Passes a batch of up to 900 rows between execution plan operators
- Basically a turbo button for execution plans
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance
http://www.sqlservercentral.com/articles/Stairway+Series/145064/
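
The idea of handing rows between operators in batches rather than one at a time can be sketched as below. This is only an analogy: the function names are hypothetical, the "operator" is a plain Python sum, and the call counters stand in for the per-row overhead that batching reduces.

```python
def batches(rows, batch_size=900):
    # Yield consecutive slices of at most batch_size rows.
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def sum_row_mode(rows):
    """Hand rows to the aggregate one at a time; count the hand-offs."""
    total, handoffs = 0, 0
    for value in rows:
        total += value
        handoffs += 1
    return total, handoffs


def sum_batch_mode(rows, batch_size=900):
    """Hand rows to the aggregate one batch at a time; count the hand-offs."""
    total, handoffs = 0, 0
    for batch in batches(rows, batch_size):
        total += sum(batch)
        handoffs += 1
    return total, handoffs


rows = list(range(2000))
print(sum_row_mode(rows))
print(sum_batch_mode(rows))
```

Both modes produce the same total, but the batch version makes 3 hand-offs for 2,000 rows instead of 2,000.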

Where?

Where can you use Columnstore Indexes?
- Datatype Restrictions
- NCCI Restrictions
- Optimal Workloads
  - CCIs
  - NCCIs

Datatype Restrictions
Will not work with the following datatypes:
- ntext, text, and image
- nvarchar(max), varchar(max), and varbinary(max)
  - This restriction does not apply to CCIs, in SQL Server 2017 only
- rowversion (and timestamp)
- sql_variant
- CLR types (hierarchyid and spatial types)
- xml
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-columnstore-index-transact-sql#LimitRest

NCCI Restrictions
- Cannot have more than 1024 columns
- Cannot be created on a view or indexed view
- Cannot include a sparse column
- Cannot be redefined by using the ALTER INDEX statement
  - Use CREATE INDEX WITH (DROP_EXISTING = ON)
- Cannot include large object (LOB) columns of type nvarchar(max), varchar(max), and varbinary(max)
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-columnstore-index-transact-sql#LimitRest

Optimal Workloads - CCIs
- Traditional DWH
  - Fact tables
  - Dimension tables with over 1 million rows
- Insert-mostly workloads
  - History table of a Temporal Table
  - Logging tables
  - Updates/Deletes < 10% of all DML
- Create Nonclustered (Rowstore) Indexes on CCI
  - Improve query performance by avoiding full-table scans
- Large In-Memory OLTP tables
https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-which-columnstore-index-is-right-for-my-workload/
https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-why-do-i-need-to-create-clustered-columnstore-index-on-in-memory-oltp-tables-for-analytics/

Optimal Workloads - NCCIs
- OLTP tables with more than 1 million rows
- Tables that may feed a large number of analytical/aggregate queries
  - Common tables feeding SSRS/Power BI reports
  - Tables that generate a high number of scans
- Very wide tables on which covering indexes are not easy to create
- Tables that could benefit from being a CCI, but cannot be offline for a long period of time
https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-which-columnstore-index-is-right-for-my-workload/

Identify Candidate Tables
Several scripts have been developed by the community:
- Niko Neugebauer (GitHub Library CISL)
- Sunil Agarwal (Microsoft Blog Post)

When?

When to use various Columnstore Features?
- Compression Delay
- Filtered NCCIs
- Maintenance Routines
- With other features in SQL Server

Compression Delay
- Keyword used to delay the Tuple Mover from moving a Closed Rowgroup to a Compressed Rowgroup
- Max value is 10080 minutes, or 7 days
- Helpful for frequently updated “hot” data
  - Closed Rowgroups can still be updated/deleted; only once a Rowgroup is compressed is the data immutable
  - Compressing a Closed Rowgroup requires system resources, and you may want these operations to run off-hours

Filtered NCCIs
- Use when Compression Delay isn’t long enough
- Query Engine will use what it can from the Filtered NCCI and pull the remaining data from the Rowstore Index
- Must redefine using CREATE NONCLUSTERED COLUMNSTORE INDEX WITH (DROP_EXISTING=ON)
- Requires specific SET options
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/get-started-with-columnstore-for-real-time-operational-analytics
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-columnstore-index-transact-sql#filtered-indexes

Maintenance Routines
Reorganize
- Physically removes rows from a rowgroup when 10% or more of the rows have been logically deleted
- Combines one or more compressed rowgroups to increase rows per rowgroup up to the maximum of 1,048,576 rows
- Manually compresses any Closed Rowgroups
- Compresses all Closed AND Open Rowgroups when using the WITH (COMPRESS_ALL_ROW_GROUPS = ON) option

Maintenance Routines (Continued)
Rebuild
- Re-compresses all data into the columnstore
- Historically (e.g. 2014 and 2012) used to be the only way to reduce fragmentation
- Locks the table during the rebuild operation
  - SQL 2017 introduces ONLINE rebuilds for NCCIs only
- Will be used primarily when there is a lot of fragmentation within the Compressed Rowgroups

Other features that work well with Columnstore Indexes
- Temporal Tables
  - CCI on History Table
- Availability Groups with Read-Only Replicas
  - Point your reports there!
- Partitioned Tables

Demos