ColumnStore Index Primer

Slides:



Advertisements
Similar presentations
Adam Jorgensen Pragmatic Works Performance Optimization in SQL Server Analysis Services 2008.
Advertisements

Big Data Working with Terabytes in SQL Server Andrew Novick
Dos and don’ts of Columnstore indexes The basis of xVelocity in-memory technology What’s it all about The compression methods (RLE / Dictionary encoding)
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Module 7 Reading SQL Server® 2008 R2 Execution Plans.
Parallel Execution Plans Joe Chang
INTRODUCING SQL SERVER 2012 COLUMNSTORE INDEXES Exploring and Managing SQL Server 2012 Database Engine Improvements.
2012 © Trivadis BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN Welcome November 2012 Columnstore Indexes.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
5 Trends in the Data Warehousing Space Source: TDWI Report – Next Generation DW.
October 15-18, 2013 Charlotte, NC Accelerating Database Performance Using Compression Joseph D’Antoni, Solutions Architect Anexinet.
Boosting DWH-Performance with SQL Server 2016 ColumnStore Index.
--A Gem of SQL Server 2012, particularly for Data Warehousing-- Present By Steven Wang.
SQLUG.be Case study: Redesign CDR archiving on SQL Server 2012 By Ludo Bernaerts April 16,2012.
Turbocharge your DW Queries with ColumnStore Indexes Susan Price Senior Program Manager DW and Big Data.
Column Oriented Database By: Deepak Sood Garima Chhikara Neha Rani Vijayita Gumber.
Doing fast! Optimizing Query performance with ColumnStore Indexes in SQL Server 2012 Margarita Naumova | SQL Master Academy.
Best Practices for Columnstore Indexes Warner Chaves SQL MCM / MVP SQLTurbo.com Pythian.com.
A Lap Around Columstore Martin Catherall SQL Saturday #464, Melbourne 20 th February 2016.
SQL IMPLEMENTATION & ADMINISTRATION Indexing & Views.
Memory-Optimized Tables Querying at the speed of light.
Clustered Columnstore index deep dive
In-Memory Capabilities
CPS216: Data-intensive Computing Systems
5/25/2018 5:29 AM BRK3081 Delivering High Performance Analytics with Columnstore Index on Traditional DW and HTAP Workloads Sunil Agarwal (Microsoft) Aaron.
Operational Analytics in SQL Server 2016 and Azure SQL Database
Lecture 16: Data Storage Wednesday, November 6, 2006.
UFC #1433 In-Memory tables 2014 vs 2016
Finding more space for your tight environment
Four Rules For Columnstore Query Performance
Evaluation of Relational Operations
A developers guide to Azure SQL Data Warehouse
The Five Ws of Columnstore Indexes
Database Administration for the Non-DBA
Blazing-Fast Performance:
Migrating a Disk-based Table to a Memory-optimized one in SQL Server
Please support our sponsors
PREMIER SPONSOR GOLD SPONSORS SILVER SPONSORS BRONZE SPONSORS SUPPORTERS.
Cardinality Estimator 2014/2016
Azure SQL Data Warehouse Performance Tuning
Introduction to columnstore indexes
Relational Operations
TechEd /20/ :49 PM © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered.
20 Questions with Azure SQL Data Warehouse
11/29/2018 © 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks.
TechEd /2/2018 7:32 AM © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks.
Selected Topics: External Sorting, Join Algorithms, …
Azure SQL DWH: Optimization
The Five Ws of Columnstore Indexes
Microsoft Ignite /1/ :19 PM
Realtime Analytics OLAP & OLTP in the mix
Adding Lightness Better Performance through Compression
Sunil Agarwal | Principal Program Manager
Please support our sponsors
Four Rules For Columnstore Query Performance
Clustered Columnstore Indexes (SQL Server 2014)
Implementation of Relational Operations
Lecture 13: Query Execution
Evaluation of Relational Operations: Other Techniques
From adaptive to intelligent: query processing in SQL Server 2019
Using Columnstore indexes in Azure DevOps Services. Lessons learned
Using Columnstore indexes in Azure DevOps Services. Lessons learned
SQL Server Columnar Storage
Lecture 20: Representing Data Elements
From adaptive to intelligent:
Using Columnstore indexes in Azure DevOps Services. Lessons learned.
When to use indexing pro Features
Sunil Agarwal | Principal Program Manager
An Introduction to Partitioning
Presentation transcript:

ColumnStore Index Primer Sepand Gojgini ColumnStore Index Primer

SQL Summit Annual International Conference November 6 -9 | Seattle, WA 2 Days of Pre-Cons 200+ sessions over 3 days Over 5,000 SQL Professionals Evening Networking Activities

Please Support Our Sponsors

Let’s Not Distract Others Please silent your phone We can chat later Please don’t snore

Background Product Director at Datateam 10+ years of experience designing data model and ETL engines used globally Worked on world biggest relational Data Warehouse at Amazon Professional Dodgeball player (paid in Beer and Nachos) Introduce Jason and Datateam Built Data Warehouse solution used daily by Credit Union and Banks across Canada and internationally 15+ Petabyte data warehouse with single table exceeding 700+ Terabyte Experience building DW with MPP platforms like Redshift and Microsoft PDW Make

Agenda Introduction Architecture Terminology Query Execution Walkthrough Under The Cover DEMO Summary SQL Server 2012 Limitation SQL Server 2014 Improvement SQL Server 2016 Improvement SQL Server 2016 SP1 Questions Resources

Overview

ColumnStore: What? It is data logically organized by rows and column but physically stored in columnar data format Data Is compressed, managed and stored as collection partial columns

ColumnStore: Why? Designed to optimize access to large DWs Optimized for join on integer keys, aggregations, scans, reporting Faster query response time Transparent to the application or reports Details Highly compressed – Allows more data remain in RAM Aggressive Read Ahead Processes data in units called “batches”

Non-Clustered ColumnStore Does not need to include all of the columns in a table Can be combined with other indexes on the table After being added to table it becomes Read-Only (2012/2014) Requires additional storage to store a copy of column in the index

Clustered ColumnStore Available on Enterprise edition starting with SQL Server 2014/2016 SQL Server 2016 SP1 All Editions Include ALL the column in table Is the only index on the table Replaces B+ tree for storage with Columnar technology Table remain updatable*

Architecture

ColumnStore Vs. RowStore … Heaps, B-trees store data row-wise C1 C2 C3 C4 Columnstore indexes store data column-wise Each page stores data from a single column Highly compressed About 10x better than Row Store More data fits in memory Each column accessed independently Fetch only needed columns Can dramatically decrease I/O

Terminology – Part I Column Segment Row Group Deleted Bitmap Segment C1 C2 C3 C5 C6 C4 Column Segment Segment contains values from one column for a set of rows Segments for the same set of rows comprise a row group Segments are compressed independently Segment is unit of transfer between disk and memory Each segment stored in a separate LOB Row Group Up to 1 million logically contiguous rows Collection of column segments Row Group DeltaStore

Terminology – Part II DeltaStore Deleted Bitmap Segment C1 C2 C3 C5 C6 C4 DeltaStore Introduced with Clustered ColumnStore in SQL Server 2014 Is a RowStore table that holds rows until the number of rows is large enough to move into ColumnStore Uses traditional B-Tree for storage There could be multiple DeltaStores per ColumnStore Holds Maximum of 1,048,576 or 2^20 rows before new one is created Deleted Bitmap Identifies records deleted from Row Group Row Group DeltaStore

Terminology at Glance Data is highly compressed. Dramatically reduced IO. More Data can fit into memory.

Query Execution Walkthrough OrderDateKey ProductKey StoreKey RegionKey Quantity SalesAmount 20101107 106 01 1 6 30.00 103 04 2 17.00 109 20.00 03 05 3 4 20101108 02 5 25.00 102 14.00 10.00 20101109 Let’s pretend we’re looking at a few hundred million rows...

1. Horizontally Partition (Row Groups) OrderDateKey ProductKey StoreKey RegionKey Quantity SalesAmount 20101107 106 01 1 6 30.00 103 04 2 17.00 109 20.00 03 05 3 4 20101108 02 5 25.00 Up to 1M rows OrderDateKey ProductKey StoreKey RegionKey Quantity SalesAmount 20101108 102 02 1 14.00 106 03 2 5 25.00 109 01 10.00 20101109 04 4 20.00 103 17.00

2. Vertically Partition via Columns (Segments) OrderDateKey 20101107 20101108 ProductKey 106 103 109 StoreKey 01 04 03 05 02 RegionKey 1 2 3 Quantity 6 1 2 4 5 SalesAmount 30.00 17.00 20.00 25.00 OrderDateKey 20101108 20101109 ProductKey 102 106 109 103 StoreKey 02 03 01 04 RegionKey 1 2 Quantity 1 5 4 SalesAmount 14.00 25.00 10.00 20.00 17.00

3. Compress Each Segment* OrderDateKey 20101107 20101108 ProductKey 106 103 109 StoreKey 01 04 03 05 02 RegionKey 1 2 Quantity 6 1 2 4 5 SalesAmount 30.00 17.00 20.00 25.00 Represents Up to 1M rows OrderDateKey 20101108 20101109 ProductKey 102 106 109 103 StoreKey 02 03 01 04 RegionKey 1 2 Quantity 1 5 4 SalesAmount 14.00 25.00 10.00 20.00 17.00 Summary: Three fundamental steps occur *under the covers* during Columnstore Index creation (this is transparent to the user): Horizontal partitioning into Row Groups (1M rows each) Vertical partitioning into Segments (constituent columns) Compression (aggressive—because columnar data tends to be homogenous) Some segments will compress more than others *Encoding is explained in future slide

Fetch only needed columns SELECT ProductKey, SUM (SalesAmount) FROM SalesTable WHERE OrderDateKey < 20101108 GROUP BY ProductKey StoreKey 01 04 03 05 02 RegionKey 1 2 3 Quantity 6 1 2 4 5 OrderDateKey 20101107 20101108 ProductKey 106 103 109 SalesAmount 30.00 17.00 20.00 25.00 StoreKey 02 03 01 04 RegionKey 1 2 Quantity 1 5 4 OrderDateKey 20101108 20101109 ProductKey 102 106 109 103 SalesAmount 14.00 25.00 10.00 20.00 17.00 Depending on query, the Optimizer performs Row Group (or Partition) & Segment elimination

Under the covers

Basic Operations Inserts1: Deletes: Updates: Added to one of currently open DeltaStores Deletes: If deleted row is found in RowGroup then the Deleted Bitmap information is updated with RowId of respective row If deleted row is still in DeltaStore then it is simply removed from B-Tree Updates: They are handled as combination of delete and insert using workflow above Heaps, B-trees store data row-wise Do not over-complicate this—fundamentally it’s as easy as it looks In DW scenario Updates are once a day operation 1 Bulk Insert API skips DeltaStore

Creation of new Row Group Tuple Mover: Once a DeltaStore reaches 1,048,576 it is closed Tuple Mover runs every 5 minutes and converts any closed DeltaStore into Row Group Added to one of currently Open DeltaStores There is single Tuple Mover per instance Bulk Insert API: If there are more than 102,400 records inserted into table in single operation it is triggered Skip insertion of rows into DeltaStore and create a new Row Group for that operation Heaps, B-trees store data row-wise Do not over-complicate this—fundamentally it’s as easy as it looks

Data Encoding - Integer (Value, Start Position, Length) Run-length Encoding RLE

Data Encoding - VarChar Dictionary Encoding

Summary

SQL Server 2012 Limitation Non-clustered ColumnStore—underlying B-tree still required to support the ColumnStore Some queries, even the schema, might have to be modified to fully leverage ColumnStore Read only, not updatable Not compatible with indexed views, filtered indexes, sparse columns, computed columns Only Inner Join operation is supported Datatype support significant but not complete int, real, string, money, datetime, decimal <= 18 digits

SQL Server 2014 Improvement Clustered ColumnStore – Underlying storage is Columnar Query rewrite is no longer necessary in order to use ColumnStore Clustered ColumnStore is now updatable Non-Clustered ColumnStore is still ready only Datatype support has been expanded Everything except: blobs, CLR, NVarChar(max), VarBinary(max), XML, Spatial Improvement to Query Engine Support for all flavors of JOINs OUTER JOIN Semi-join: IN, NOT IN UNION ALL Scalar aggregates Mixed mode plans Improvements in bitmaps, spill support, … Archival Compression was introduced approximately 27% space saving using second compression at page level Columnstore can replace Row Store in

SQL Server 2016 Improvements Additional B-Tree Indexes on Clustered ColumnStore index Updatable Non-Clustered ColumnStore Support for filtered Non-Clustered ColumnStore Non-Clustered ColumnStore support for In-Memory Tables Improvement to Query Engine Expanded Support for Batch mode execution Sort, Aggregates with multiple distinct functions Window Aggregate functions Window Aggregate Analytical functions Single-Threaded queries can also run in batch mode

SQL Server 2016 SP1 Now available in all editions ColumnStore Segment Cache Standard Edition – 32 GB Web Edition – 16 GB Express Edition – 352 MB MAXDOP Standard Edition – 2 Cores All Other Editions – 1Core Non-Enterprise Limitations Aggregate Pushdown = No String Predicate Pushdown = No Local Aggregation = No Index Build/Rebuild = Limited to 1 Core SIMD Support – No

SQL Server 2017 Batch execution for single thread operation Support for non-persisted computed columns NCCI Online Rebuild (Enterprise Only) LOBs for CCI Trivial Plans (Batch execution Support) Machine Learning Services (Batch execution Support) Batch Mode Adaptive Joins Batch Mode Memory Grant Feedback 1. Inline, within the Row Group – compressing all strings with the total length below 8000 bytes. All values in this case are stored within the dictionaries. 2. Compressed Off-Row – compressing values between 8000 bytes and 16 Megabytes. The strings are individually block-compressed. Only the headers/pointers to these off-row LOBs are stored in dictionaries. 3. Pass-through – containing values above 16 MB, these are simply uncompressed LOBs with headers/pointers stored in the dictionaries.

Questions?

Resources ColumnStore Overview – MSDN Niko Neugebauer https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview Niko Neugebauer http://www.nikoport.com/columnstore/ SQL Server 2016 SP1 https://docs.microsoft.com/en-us/sql/sql-server/editions-and-supported-features-for-sql-server-2016 SQL Server Tiger Team https://blogs.msdn.microsoft.com/sql_server_team/tag/columnstore-index/