1
Inside SQL Server Columnstore Indexes
Inside SQL Server Columnstore Indexes
Bob Ward, Microsoft
Ryan Stonecipher, Microsoft
2
How will we spend the next 75 minutes
An Inside Talk: What, Why, How
Background and Architecture
Inside building a Columnstore Index
Inside accessing a Columnstore Index
What happens next?
Bits and Bytes
We will focus on CCI today
3
Background and Architecture
4
Project Apollo becomes Columnstore
Project Gemini in PowerPivot and SSAS (IMBI): 2008
Project Apollo: 2010
NCCI: SQL Server 2012
CCI: SQL Server 2014
Updateable NCCI, performance enhancements: SQL Server 2016
Online NCCI build: SQL Server 2017
Online CCI build: SQL Server 2019
NCCI = Nonclustered Columnstore Index; CCI = Clustered Columnstore Index
Similar to, but a different engine than, SSAS
5
You need to know… Is this separate from SQL Server?
Built into the engine; available now in all editions.
Is this an "in-memory" database or index? No, but our compression helps fit more into memory. It executes fastest when the entire index fits into memory, but we built it to be pageable.
Is this the same as In-Memory OLTP? No. In-Memory OLTP has memory-optimized tables that must fit into memory. You can build a columnstore index on top of a memory-optimized table.
Why would I use this? Warehouse or HTAP scenarios. Great for scans and large range read queries. No application changes required.
"I wanted to show yes, a big elephant like SQL Server with millions of lines of legacy code can still dance" - Hanuma Kodavalla
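A minimal sketch of creating these indexes, assuming hypothetical table names (dbo.SalesHistory and dbo.Orders are illustrative, not from the deck):

-- Clustered columnstore index (CCI): the table data itself becomes columnstore.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesHistory ON dbo.SalesHistory;
GO
-- Nonclustered columnstore index (NCCI) on an OLTP table for HTAP reporting; the base table stays rowstore.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders
    ON dbo.Orders (OrderDate, CustomerID, Quantity, UnitPrice);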
6
It’s all about speed to access data
The Index Narrative
The nature of indexes:
B-tree indexes are optimized for seek performance but not storage space.
Heaps are great for concurrent inserts but not for seeks or storage space.
Columnstore indexes are optimized for scans and storage space but are not great for updates.
What is the target for Columnstore? Optimize for space with compression: reduce I/O and fit more into memory. Build the most efficient way to maximize CPU and memory to scan and filter data.
So how do we do it? Combine metadata with highly compressed data in an efficient storage format. Use segments to easily find only the columns you want. Use rowgroups to skip the data you don't need.
"Make getting answers to your queries fast when you're scanning massive data over and over again" - Ryan Stonecipher
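The metadata the engine uses to skip data is visible in catalog views; a hedged sketch against the Fact.Sale table used later in the deck (works for any table with a columnstore index):

-- One row per rowgroup: state, row counts, size.
SELECT row_group_id, state_description, total_rows, deleted_rows, size_in_bytes
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('Fact.Sale');

-- One row per column segment, including the min/max value ids used to skip rowgroups.
SELECT s.column_id, s.segment_id, s.row_count, s.min_data_id, s.max_data_id
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON p.partition_id = s.partition_id
WHERE p.object_id = OBJECT_ID('Fact.Sale');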
7
The Columnstore Index Architecture
sqlservr: Query Processor and Execution; Apollo Engine ("Kernel") with the "Tuple Mover", Access Methods, Batch Processing, Encoding, Vertipaq Compress and Merge, Flush Delete Buffer; Object Pool; Buffer Pool.
On disk and in metadata: System Tables, Btree Index, LOB Pages; Partitions, Allocation units, Rowgroups, Segments, Dictionaries, Delta Store, Delete Bitmap, Delete Buffer, Mapping Index.
In memory: CACHESTORE_COLUMNSTOREOBJECTPOOL holds compressed segments, compressed dictionaries, and delete bitmaps.
Speaker notes: Change the drawing to show the SQL Server engine as a box that includes all of this. Show the differences between code that executes and the data and disk structures. Major pieces: metadata in system tables, LOB pages, b-trees for other structures, the object pool, and the code components that put and pull data in these structures.
Questions for Ryan: The Tuple Mover appears to be an on-demand task based on a timer:
static CAutoRegisterTimerTask<TaskTupleMover> s_TaskTupleMover(COLUMN_STORE_TUPLE_MOVER_PERIOD_FAST, ONDEMAND_TASK_COLUMN_STORE_TUPLE_MOVER);
COLUMN_STORE_TUPLE_MOVER_PERIOD_FAST looks like 5, so 5 seconds I guess. This is a TASK_MANAGER task, so you can't really see it in a DMV and would need to track it via XEvents. Is there such a thing as the "Apollo Engine" that does all the compression work?
8
Inside building a Columnstore index
Let’s dive deeper
9
Column Store Object Pool
Building an Index: rows are encoded and compressed into segments and dictionaries in the Column Store Object Pool, serialized into LOB pages through the Buffer Pool, and then checkpointed and flushed to disk. Visible via sys.dm_column_store_object_pool.
TODO: Describe the context here. Is this building an index on an existing set of data for a clustered index or heap? We should call out that we don't sort the data like b-trees. We should add the delta rowgroup here.
Questions: What is the far left-hand side here? Is this data ingestion directly into columnstore rowgroups and segments? Does this diagram mean that when we build the index we build out the rowgroups and segments compressed in the object pool and then convert these into LOB pages to write to disk? What about dictionaries? Are these also stored in LOB pages on disk? How do you find your allocation units for them? When building the index, how do we decide what to leave in the pool vs. what goes to disk? Do we discard LOB pages after the build?
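A hedged peek at what the object pool is holding during or after a build, using the sys.dm_column_store_object_pool DMV named on the slide (the exact column list may vary by version):

SELECT object_type_desc,
       COUNT(*)                  AS cached_objects,
       SUM(memory_used_in_bytes) AS memory_used_bytes
FROM sys.dm_column_store_object_pool
WHERE database_id = DB_ID()
GROUP BY object_type_desc
ORDER BY memory_used_bytes DESC;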
10
The Real Architecture
Questions: Are all column segments in that allocation unit? Yes. If so, how do we find a specific column segment for a column we need to scan? Ryan will show this in the lifecycle of a query. TODO: Make a comment about partitions here.
11
Rowgroups
Problem: segments can be larger than main memory. How do you manage them efficiently?
Rowgroups divide a columnstore index into horizontal slices of related columns. Each rowgroup is self-contained and can be paged independently.
TODO: We haven't said what a segment is. Question: If all rowgroups for a partition go into one allocation unit, how do we page a rowgroup independently? TODO: We should add rowgroup elimination as a benefit of rowgroups.
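A sketch of inspecting rowgroups on SQL Server 2016 or later, assuming the Fact.Sale CCI; trim_reason_desc shows why a rowgroup ended up smaller than the maximum:

SELECT row_group_id, state_desc, total_rows, deleted_rows, size_in_bytes, trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('Fact.Sale')
ORDER BY row_group_id;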
12
Column Segments Compressed storage for all data in a column.
Two representations: serialized on disk (as LOB pages), deserialized in memory (in the column store object pool). Always stored as “encoded” in memory.
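A hedged sketch of looking at both representations, assuming the Fact.Sale CCI; the 'COLUMN_SEGMENT' filter value for the object pool DMV is an assumption about its object_type_desc values:

-- On disk: serialized segment sizes (stored as LOB pages).
SELECT s.column_id, s.segment_id, s.encoding_type, s.on_disk_size
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON p.partition_id = s.partition_id
WHERE p.object_id = OBJECT_ID('Fact.Sale');

-- In memory: deserialized segments currently cached in the column store object pool.
SELECT column_id, row_group_id, object_type_desc, memory_used_in_bytes
FROM sys.dm_column_store_object_pool
WHERE database_id = DB_ID()
  AND object_id = OBJECT_ID('Fact.Sale')
  AND object_type_desc = 'COLUMN_SEGMENT';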
13
Compression Why compress? Trades CPU cost for I/O.
Columns compress better than rows (roughly 1:10 vs. 1:3). Rows contain data from different domains (int vs. string vs. float), so rows have higher entropy and are more difficult to pack densely.
SQL Server uses multiple compression schemes together for values in the same column segment:
Run-length encoding (RLE)
Dictionaries
Various forms of value encoding for scalars
Bit packing
Huffman encoding (a form of prefix compression) for strings
Binary stream encoding (COLUMNSTORE_ARCHIVE)
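Binary stream encoding is surfaced through the COLUMNSTORE_ARCHIVE compression option; a sketch assuming the Fact.Sale CCI (the index name CCI_FactSale is an assumption, not necessarily the real name in the demo database):

-- Archive compression: smaller on disk, more CPU to decompress.
ALTER INDEX CCI_FactSale ON Fact.Sale
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- Back to the default columnstore compression.
ALTER INDEX CCI_FactSale ON Fact.Sale
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE);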
14
Compression: Dictionaries
Ryan to combine slides 12 and 13
15
Inside the Dictionaries
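A sketch of listing the dictionaries for a columnstore index, assuming the Fact.Sale CCI; by convention dictionary_id 0 is typically the global (primary) dictionary shared across rowgroups and higher ids are local dictionaries, though that mapping is an assumption here:

SELECT d.column_id, d.dictionary_id, d.type, d.entry_count, d.on_disk_size
FROM sys.column_store_dictionaries AS d
JOIN sys.partitions AS p ON p.partition_id = d.partition_id
WHERE p.object_id = OBJECT_ID('Fact.Sale')
ORDER BY d.column_id, d.dictionary_id;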
16
Frame of Reference/Delta Encoding
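A worked illustration of the idea with invented values: frame-of-reference (delta) encoding stores a base value per segment and bit-packs only the small offsets from it.

Raw values:    1,000,004   1,000,001   1,000,007   1,000,002
Base (min):    1,000,001
Deltas stored:         3           0           6           1
The largest delta (6) fits in 3 bits, so four 8-byte integers bit-pack into a few bytes plus the base.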
17
Compression: Run-Length Encoding
Ryan, can you talk about Vertipaq compression here, whether we use it or not, and why?
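A small invented illustration of run-length encoding on a repetitive column:

Column values: 'US' 'US' 'US' 'US' 'UK' 'UK' 'DE'
RLE output:    ('US', 4)  ('UK', 2)  ('DE', 1)
In a column segment the values are typically dictionary-encoded first, so the runs are stored as (value id, run length) pairs.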
18
Looking inside a Columnstore Index
Demo: Ryan, 15 minutes. Four pieces to show the internals of CCI and how this fits together. Try to use WWI or WWIDW here.
1. System tables
2. Object Pool - DBCC CSINDEX
3. CCI on disk - system tables + LOB pages with DBCC PAGE or DBCC SHOWTEXT
4. Delete and Delta stuff - system tables + DBCC PAGE
19
Inside accessing a Columnstore index
The power of analytics
20
Accessing CCI: batch mode
select [Sale Key], [Customer Key], sum(Quantity) as quantity_sold
from Fact.Sale
where [Invoice Date Key] = ' '
group by [Sale Key], [Customer Key]
Callouts: number of batches; rowgroup elimination; read in segments and dictionaries; really rowgroup elimination. Disable batch mode with TF 9453.
Question: Why do we show lob logical reads when all the compressed segments should be in the object pool?
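A hedged sketch for comparing batch mode and row mode with the trace flag named on the slide; the date parameter is left open because the slide leaves the literal blank, and QUERYTRACEON as a query hint requires sysadmin:

DECLARE @InvoiceDate date;   -- pick any date present in Fact.Sale

-- Batch mode (the default against a columnstore index).
SELECT [Sale Key], [Customer Key], SUM(Quantity) AS quantity_sold
FROM Fact.Sale
WHERE [Invoice Date Key] = @InvoiceDate
GROUP BY [Sale Key], [Customer Key];

-- Row mode for comparison: disable batch mode with trace flag 9453 for this query only.
SELECT [Sale Key], [Customer Key], SUM(Quantity) AS quantity_sold
FROM Fact.Sale
WHERE [Invoice Date Key] = @InvoiceDate
GROUP BY [Sale Key], [Customer Key]
OPTION (QUERYTRACEON 9453);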
21
How does this work internally?
TODO: Ryan to fix and complete. TODO: Put in a note about the broker system and rowgroup eviction.
22
The magic of batch mode processing
Contiguous column data allows for vectorized execution that leverages advanced CPU features (SIMD): the same operation is applied on a vector of values, with 128-bit and 256-bit operations.
23
Inside the access to a columnstore index
Demo: Ryan, 8 minutes.
1. Restart SQL Server.
2. Run a scan.
3. Show how the object pool is populated, combined with the system tables, to see how we access it from a query. Compare this to a clustered index scan, which uses metadata to find the root page, walks down the tree to the left side, and then runs the linked list. For CCI, how do we 1) only get the segments we want and 2) walk rowgroups?
4. Do the same thing, but show rowgroup elimination and how that works.
5. What if you have a bookmark lookup? How does that work?
6. How do we access a delta store and the object pool together?
24
This is all way cool but….
You don't have enough rows to see the benefit of compression: the Delta Store (rowgroup, or RG) is a clustered index using row compression.
You may need to update or delete data: deletes are logical (delete bitmap), and queries must filter out deleted rows.
You want to add a b-tree index on a CCI: we enforce the key, but we can't use the Vertipaq optimization.
You want to add a CCI on a memory-optimized table: the CCI must fit into memory, cannot use the Vertipaq optimization, and this can affect compression ratios and performance.
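A minimal sketch of the last two scenarios, with illustrative object names (dbo.SalesHistory, dbo.OrdersInMemory, and their columns are assumptions, not from the deck):

-- A nonclustered b-tree index on top of a CCI (SQL Server 2016+), for example to enforce a key.
CREATE UNIQUE NONCLUSTERED INDEX IX_SalesHistory_SaleKey
    ON dbo.SalesHistory (SaleKey);

-- A clustered columnstore index added to an existing memory-optimized table.
ALTER TABLE dbo.OrdersInMemory
    ADD INDEX CCI_OrdersInMemory CLUSTERED COLUMNSTORE;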
25
The Tuple Mover and Delta Stores
Delta RG states: OPEN (fewer than 1,048,576 rows) -> CLOSED (reaches 1,048,576 rows) -> TOMBSTONE once compressed. The Tuple Mover runs every 5 minutes and turns CLOSED delta rowgroups into Compressed RGs (state COMPRESSED). COMPRESSION_DELAY is configurable at the index level. Force it with ALTER INDEX.
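A sketch of the knobs mentioned above, again assuming the index name CCI_FactSale:

-- Do the tuple mover's work now: compress closed (and remaining open) delta rowgroups.
ALTER INDEX CCI_FactSale ON Fact.Sale
REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);

-- Delay compression of closed delta rowgroups, useful when recent rows are still being updated.
ALTER INDEX CCI_FactSale ON Fact.Sale
SET (COMPRESSION_DELAY = 10 MINUTES);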
26
Rowgroup loading magic
Read the merge policies.
Loads of >= 102,400 rows go directly into a Compressed RG (stored as LOB pages); loads of < 102,400 rows land in a Delta RG (a b-tree). ALTER INDEX compresses closed delta rowgroups and performs compaction (merging) of compressed rowgroups.
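A sketch of the two load paths, with dbo.SalesHistory and dbo.SalesStaging as hypothetical tables:

-- A batch of at least 102,400 rows compresses straight into a compressed rowgroup,
-- bypassing the delta store; TABLOCK also enables parallel insert (SQL Server 2016+).
INSERT INTO dbo.SalesHistory WITH (TABLOCK)
SELECT * FROM dbo.SalesStaging;

-- Smaller trickle inserts land in an OPEN delta rowgroup (a b-tree) until it fills.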
27
The Delete Bitmap
Deletes are logical: we just track them, we don't actually delete the rows. Persisted in a "hidden" clustered index (compressed pages). Cached in the Columnstore Object Pool. Flushed by the Tuple Mover. Cost: more deleted rows means slower queries. Cleaned by reorganizing, rebuilding, or truncating the table. TODO: What about memory pressure?
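A hedged way to see the hidden rowsets behind a CCI on SQL Server 2016 or later (the column list may differ slightly by version):

-- Delta stores, delete bitmaps, delete buffers, and mapping indexes per partition.
SELECT internal_object_type_desc, partition_number, rows
FROM sys.internal_partitions
WHERE object_id = OBJECT_ID('Fact.Sale')
ORDER BY internal_object_type_desc, partition_number;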
28
Inside the Delta Store and Delete Bitmaps
Demo Bob
29
Bits and Bytes
Using Large Pages with -T834: just Enterprise Edition? -T834 can boost performance because the object pool will use large pages, but there can be issues. CCI is available in all editions, but with limits on the object pool.
What about Azure SQL Database? Available in Standard and Premium tiers and Managed Instance.
What about the competition? See the list of other columnar database systems.
What about Azure SQL Data Warehouse? Ryan will describe Azure DW: segments in blob store, local disk tiered storage.
What about the future? Ryan on the future. Did you see the 1PB demo during the keynote?
30
Resources
Get the deck at http://aka.ms/bobwardms
Get the demos at [link]. Sunil Agarwal's blog posts at [link] and Niko Neugebauer's blog posts at [link]. Columnstore docs at [link]. VLDB tutorial at [link].
31
Bonus Material
32
Metadata Model
A TABLE has at least one INDEX; an index has at least one PARTITION; a partition has at least one and up to three ALLOCATION UNITS (HoBt, LOB, SLOB). The columnstore metadata lives in the system tables syscsrowgroups, syscscolsegments, and syscsdictionaries. Each allocation unit has a way to seek (from the root) and to scan (from the left, 'first', to the right, 'end').
Questions: Are all column segments in that allocation unit? Yes. If so, how do we find a specific column segment for a column we need to scan? Ryan will show this in the lifecycle of a query. TODO: Make a comment about partitions here.
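A sketch of walking the same chain through the public catalog views, assuming the Fact.Sale CCI; compressed segments live in the LOB_DATA allocation units:

SELECT p.index_id, p.partition_number, au.type_desc, au.total_pages, au.used_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
  ON au.container_id = p.partition_id AND au.type = 2   -- 2 = LOB_DATA
WHERE p.object_id = OBJECT_ID('Fact.Sale')
ORDER BY p.partition_number;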
33
Rowgroups and Segments
Problem: segments can be larger than main memory. How do you manage them efficiently?
Rowgroups divide a columnstore index into horizontal slices of related columns (RG1, RG2, RG3, RG4 in the diagram). Each rowgroup is self-contained and can be paged independently.
TODO: We haven't said what a segment is. Question: If all rowgroups for a partition go into one allocation unit, how do we page a rowgroup independently? TODO: We should add rowgroup elimination as a benefit of rowgroups.
34
Column Segments
Compressed storage for all data in a column. Two representations: serialized on disk (as LOB pages), deserialized in memory (in the column store object pool). Always stored as "encoded" in memory.
Segment layout: Header; Index into RLE array (not persisted); Bookmark Data Array; RLE Data (run-length encoding); Bitpacked scalars or variable-length string array (Bitpack Data or VLD Store).
35
Column Store Object Pool
Column segments are just stored as LOB values on disk. During index build, contiguous memory is serialized to individual 8KB LOB pages (this goes through the buffer pool). During a segment read, the entire LOB is read into memory via the buffer pool, then into contiguous memory in the column store object pool. Segments are cached in the object pool, not the buffer pool; buffer pool pages are Discard()-ed after the read.
Diagram: table T1, <rows-to-cols>, segments C1..Cn, <serialize>/<deserialize>, <read/write lob>, Buffer Pool, Column Store Object Pool, MDF/NDF.
TODO: We should merge this with the previous slide. Question: When do we uncompress column segments? When we retrieve data?
36
Why so fast, Jim?
Compression and in-memory caching of data reduces or eliminates the number of I/Os required to get the data. Scan and materialize only the data necessary for the query: segment elimination. Delay reconstruction of rows from the column segments as long as possible: late materialization. Push down filter and aggregate evaluation into the storage engine, where they are evaluated on compressed data. Contiguous column data allows for vectorized execution that leverages advanced CPU features (SIMD): the same operation applied on a vector of values, with 128-bit and 256-bit operations.
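A hedged way to observe segment (rowgroup) elimination from the session messages; the range predicate and date literal are purely illustrative:

SET STATISTICS IO ON;

SELECT SUM(Quantity) AS quantity_sold
FROM Fact.Sale
WHERE [Invoice Date Key] >= '20160101';   -- illustrative range predicate

-- The messages output includes a line like:
--   Table 'Sale' ... segment reads N, segment skipped M.
-- "segment" is reported per rowgroup here; skipped ones were eliminated via the segment min/max metadata.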