TechEd 2013 12/2/2018 7:32 AM © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
What’s New for Columnstore Indexes and Batch Mode Processing
DBI-B322
Igor Stanko, igorstan@microsoft.com
Agenda
Trends in the Data Warehousing Space
How Does Columnstore Work?
What’s New in Columnstore?
Demo
In Summary
Trends in the Data Warehousing Space: Understanding the Opportunity
DW systems continue to grow at a fast pace, so scalability is a key concern: systems grow from tens of TBs, to hundreds of TBs, to PBs.
Performance at scale: the ability to analyze massive amounts of data while still offering interactive response times.
Data warehousing for the masses: drive down the price per TB.
“Data warehousing has shifted almost entirely towards the appliance model due to the speed of the balanced appliance and the scalability of scale-out (MPP) solutions.” (Jim Kobielus, Forrester Research)
Source: TDWI Report, Next Generation DW
Columnstore packaged into an appliance delivers this.
Columnstore Refresher: How Is It Different?
Row store: data stored as rows. Columnstore: data stored as columns (diagram: columns C1 to C5).
Benefits:
Improved compression: data from the same domain compresses better.
Reduced I/O: fetch only the columns needed.
Improved performance: more data fits in memory …
Columnstore Terminology
(Diagram: a table with columns C1 to C6, showing a row group and a column segment.)
Column segment: contains the values from one column for a set of rows.
Row group: the segments for the same set of rows comprise a row group.
Segments are compressed.
Each segment is stored in a separate LOB.
The segment is the unit of transfer between disk and memory.
Columnstore Index Example
(Table: sample SalesTable rows with columns OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity and SalesAmount, for OrderDateKey values 20101107, 20101108 and 20101109.)
1. Horizontally Partition (Create Row Groups)
The rows are divided into row groups of about 1M rows each. (Diagram: the sample table split into two row groups, the first holding rows for OrderDateKey 20101107 and 20101108, the second for 20101108 and 20101109.)
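A minimal Python sketch of horizontal partitioning, assuming rows arrive as tuples in load order. The function name `make_row_groups` and the sample rows are hypothetical; the 1,048,576-row cap is SQL Server's maximum row group size (the slide's “~1M rows”), and a small `max_rows` is passed in the usage lines only to keep the example readable.

```python
from typing import Iterable, Iterator, List, Tuple

# SQL Server's maximum row group size (the slide's "~1M rows").
MAX_ROWS_PER_ROW_GROUP = 1_048_576


def make_row_groups(rows: Iterable[Tuple],
                    max_rows: int = MAX_ROWS_PER_ROW_GROUP) -> Iterator[List[Tuple]]:
    """Horizontally partition a stream of rows into row groups of at most max_rows rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be positive")
    group: List[Tuple] = []
    for row in rows:
        group.append(row)
        if len(group) == max_rows:
            yield group          # row group is full: hand it off for compression
            group = []
    if group:
        yield group              # the last row group may be partially full


# Usage with hypothetical (OrderDateKey, ProductKey, SalesAmount) rows.
sample = [(20101107, 106, 30.00), (20101107, 103, 17.00), (20101108, 109, 20.00),
          (20101108, 102, 14.00), (20101109, 106, 25.00)]
groups = list(make_row_groups(sample, max_rows=3))
```

Here the five sample rows become row groups of 3 and 2 rows; only the final row group can be smaller than `max_rows`.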
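2. Vertically Partition (Create Segments)
Each column of a row group becomes its own segment. Distinct values in each segment:
Row group 1: OrderDateKey (20101107, 20101108); ProductKey (106, 103, 109); StoreKey (01, 04, 03, 05, 02); RegionKey (1, 2, 3); Quantity (6, 1, 2, 4, 5); SalesAmount (30.00, 17.00, 20.00, 25.00).
Row group 2: OrderDateKey (20101108, 20101109); ProductKey (102, 106, 109, 103); StoreKey (02, 03, 01, 04); RegionKey (1, 2); Quantity (1, 5, 4); SalesAmount (14.00, 25.00, 10.00, 20.00, 17.00).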
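Vertical partitioning can be sketched the same way: each row group is pivoted into one list per column, and each list plays the role of a segment. The column names come from the slide's SalesTable; `make_segments` and the sample row values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

COLUMNS = ("OrderDateKey", "ProductKey", "StoreKey", "RegionKey", "Quantity", "SalesAmount")


def make_segments(row_group: Sequence[Tuple],
                  columns: Sequence[str] = COLUMNS) -> Dict[str, List]:
    """Vertically partition one row group: one segment (list of values) per column."""
    segments: Dict[str, List] = {name: [] for name in columns}
    for row in row_group:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}")
        for name, value in zip(columns, row):
            segments[name].append(value)
    return segments


# Hypothetical two-row row group.
segments = make_segments([
    (20101107, 106, "01", 1, 6, 30.00),
    (20101107, 103, "04", 2, 1, 17.00),
])
```

Every segment of a row group holds the same number of values, one per row, which is what lets a reader line segments back up into rows.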
3. Compress Each Segment
Some segments will compress more than others.
*Encoding and reordering not shown.
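Since the slide leaves encoding out, here is an illustrative sketch of two common columnar encodings, dictionary encoding followed by run-length encoding, to show why some segments compress more than others. This is an assumption-labeled illustration in Python, not SQL Server's actual on-disk format; `compress_segment` and its helpers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dictionary_encode(segment: Sequence) -> Tuple[List, List[int]]:
    """Replace each value with a small integer id into a list of distinct values."""
    dictionary: List = []
    ids: Dict = {}
    encoded: List[int] = []
    for value in segment:
        if value not in ids:
            ids[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(ids[value])
    return dictionary, encoded


def run_length_encode(values: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def compress_segment(segment: Sequence) -> Tuple[List, List[Tuple[int, int]]]:
    dictionary, encoded = dictionary_encode(segment)
    return dictionary, run_length_encode(encoded)


region_key = [1, 1, 1, 2, 2, 3]              # few distinct values, long runs
sales_amount = [30.00, 17.00, 20.00, 25.00]  # every value distinct
small = compress_segment(region_key)         # ([1, 2, 3], [(0, 3), (1, 2), (2, 1)])
large = compress_segment(sales_amount)       # four runs of length 1
```

The low-cardinality segment collapses to three runs, while the all-distinct segment gains nothing from run-length encoding. Because runs only collapse consecutive repeats, row order matters, which is presumably why the slide mentions reordering.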
4. Read the Data: Column Elimination and Segment Elimination
SELECT ProductKey, SUM(SalesAmount) FROM SalesTable WHERE OrderDateKey < 20101108 GROUP BY ProductKey
Column elimination: only the OrderDateKey, ProductKey and SalesAmount segments are read; the StoreKey, RegionKey and Quantity segments are skipped.
Segment elimination: the second row group holds only OrderDateKey values 20101108 and 20101109, so none of its rows can satisfy OrderDateKey < 20101108 and its segments are skipped entirely.
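Both kinds of elimination for the slide's query can be modeled in plain Python. The sketch assumes, as a simplification, that each row group keeps per-column min/max metadata computed once when the row group is built. `RowGroup`, `sum_sales_by_product`, the returned count of row groups read, and the sample values are hypothetical, not SQL Server internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RowGroup:
    segments: Dict[str, List]                       # column name -> values
    min_max: Dict[str, Tuple] = field(init=False)   # per-segment metadata

    def __post_init__(self) -> None:
        # Computed once when the row group is built, so a scan can later skip
        # the row group without reading any of its segments.
        self.min_max = {name: (min(vals), max(vals)) for name, vals in self.segments.items()}


def sum_sales_by_product(row_groups: List[RowGroup],
                         date_limit: int) -> Tuple[Dict[int, float], int]:
    """SELECT ProductKey, SUM(SalesAmount) FROM SalesTable
    WHERE OrderDateKey < date_limit GROUP BY ProductKey.
    Returns the totals and the number of row groups actually read."""
    totals: Dict[int, float] = {}
    groups_read = 0
    for rg in row_groups:
        low, _high = rg.min_max["OrderDateKey"]
        if low >= date_limit:
            continue                                # segment elimination
        groups_read += 1
        # Column elimination: only the three referenced columns are touched.
        dates = rg.segments["OrderDateKey"]
        products = rg.segments["ProductKey"]
        amounts = rg.segments["SalesAmount"]
        for d, p, a in zip(dates, products, amounts):
            if d < date_limit:
                totals[p] = totals.get(p, 0.0) + a
    return totals, groups_read


rg1 = RowGroup({"OrderDateKey": [20101107, 20101107, 20101108], "ProductKey": [106, 103, 106],
                "StoreKey": [1, 4, 2], "SalesAmount": [30.00, 17.00, 25.00]})
rg2 = RowGroup({"OrderDateKey": [20101108, 20101109], "ProductKey": [102, 106],
                "StoreKey": [2, 4], "SalesAmount": [14.00, 20.00]})
totals, groups_read = sum_sales_by_product([rg1, rg2], 20101108)
```

The second row group's minimum OrderDateKey is 20101108, so it is skipped without touching its segments, and StoreKey is never read in either row group.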
Multi-Row Batch: Batch Processing
Motivation: the columnstore significantly reduces the I/O required. Once I/O is reduced, CPU usage becomes the major bottleneck, and batch processing reduces CPU usage.
Functionality: instead of moving individual rows between iterators, the engine moves a “set of rows” called a batch, usually about 900 rows at a time. Batches are organized in columnar form, with an extra vector indicating the qualifying rows. The batch object is moved from iterator to iterator, so the number of function calls per row processed drops by a few orders of magnitude. Many operations can be implemented without copying data, through only slight modifications to the batch.
(Diagram: a batch object made of column vectors C1, C2 and C3 plus a bitmap of qualifying rows.)
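A small Python sketch of the batch idea, assuming a batch is a dict of column vectors plus a list of booleans for the qualifying-rows vector. `Batch`, `scan`, `filter_batch` and `sum_batch` are hypothetical names and this is not SQL Server's batch format. It shows two points from the slide: each operator is called once per batch rather than once per row, and the filter only clears bits instead of copying column data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List

BATCH_SIZE = 900  # "usually ~900 rows at a time", per the slide


@dataclass
class Batch:
    columns: Dict[str, List]  # one vector per column
    qualifying: List[bool]    # the extra vector: True while the row still qualifies


def scan(segments: Dict[str, List], size: int = BATCH_SIZE) -> Iterator[Batch]:
    """Scan iterator: emits the segments as a sequence of columnar batches."""
    n = len(next(iter(segments.values())))
    for start in range(0, n, size):
        cols = {name: vals[start:start + size] for name, vals in segments.items()}
        yield Batch(cols, [True] * min(size, n - start))


def filter_batch(batch: Batch, column: str, predicate: Callable[[object], bool]) -> Batch:
    """Filter iterator: clears bits in the qualifying vector; column data is not copied."""
    values = batch.columns[column]
    for i, keep in enumerate(batch.qualifying):
        if keep and not predicate(values[i]):
            batch.qualifying[i] = False
    return batch


def sum_batch(batch: Batch, column: str) -> float:
    """Aggregate iterator: adds up only the rows whose qualifying bit is still set."""
    return sum((v for v, keep in zip(batch.columns[column], batch.qualifying) if keep), 0.0)


# Hypothetical data: SUM(SalesAmount) WHERE Quantity > 2, with tiny batches of 2 rows.
segments = {"Quantity": [6, 1, 2, 4, 5], "SalesAmount": [30.00, 17.00, 20.00, 25.00, 10.00]}
batches = list(scan(segments, size=2))
total = sum(sum_batch(filter_batch(b, "Quantity", lambda q: q > 2), "SalesAmount")
            for b in batches)
```

The scan copies slices only because Python lists have no cheap views; the filter and aggregate steps work on the batch in place.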
Columnstore Benefits
Improved compression: data from the same domain compresses better.
Reduced I/O: fetch only the columns needed.
Improved performance: more data fits in memory, plus batch processing.
Columnstore.Next: Motivation
SQL Server 2012 columnstore functionality: nonclustered columnstore indexes; improved compression compared to ROW/PAGE compression; improved query performance.
Gaps: no DML support (no updates or data refresh); only secondary, nonclustered columnstore indexes supported; poor memory management (Resource Governor was not honored during index build/rebuild or at run time); no batch hash join spilling; limited data type support; a limited set of batch operations supported.
Goals for the new columnstore functionality: competitive load performance and efficient index creation; leading compression ratios and competitive query performance; functional parity with the row store.
Clustered Columnstore Index
(Chart: space used**, showing 91% savings. **Space used = table space + index space.)
Why is a clustered index important? It saves space used. It simplifies management: there are no secondary indexes to maintain.
Columnstore (and the clustered columnstore index) will be the PREFERRED storage engine for DW scenarios. We encourage users either to move existing tables to CCI or to start using CCI for new tables.
Additional data types are supported (including high-precision decimal, binary, varbinary, etc.).
Updatable Columnstore Index
A table consists of a column store and a row store (the delta store). DML operations (UPDATE, DELETE, INSERT) leverage the delta store.
INSERT: values always land in the delta store.
DELETE: a logical operation; the data is physically removed only after a REBUILD operation is performed.
UPDATE: a DELETE followed by an INSERT.
BULK INSERT: if the batch is smaller than 100k rows, the inserts go into the delta store; otherwise they go directly into the column store.
SELECT: unifies data from the column and row stores through an internal UNION operation.
The “tuple mover” converts data into columnar format once a segment is full (1M rows). The REORGANIZE statement forces the tuple mover to start.
(Diagram: a delta (row) store with columns C1 to C6 feeding, via the tuple mover, into the column store with columns C1 to C6.)
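The rules on this slide can be captured in a toy Python model of a columnstore table with a delta store. Everything here is a hypothetical sketch (`UpdatableColumnstore`, row locators, the `(key, value)` row shape), not SQL Server's implementation: compressed row groups are plain lists, and the tuple mover runs only when `reorganize()` is called, whereas the real one also runs in the background. The 100k and 1M thresholds from the slide are parameters so the usage can stay small.

```python
from typing import Iterator, List, Set, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (key, value)


class UpdatableColumnstore:
    def __init__(self, bulk_threshold: int = 100_000, row_group_size: int = 1_000_000):
        self.bulk_threshold = bulk_threshold
        self.row_group_size = row_group_size
        self.open_delta: List[Tuple[int, Row]] = []           # accepting inserts
        self.closed_deltas: List[List[Tuple[int, Row]]] = []  # full, waiting for tuple mover
        self.row_groups: List[List[Tuple[int, Row]]] = []     # stand-in for compressed data
        self.deleted: Set[int] = set()                        # logical deletes, by locator
        self._next_locator = 0

    def _locate(self, row: Row) -> Tuple[int, Row]:
        """Give each stored row a unique locator so deletes can mark exact rows."""
        self._next_locator += 1
        return (self._next_locator, row)

    def insert(self, row: Row) -> None:
        # INSERT always lands in the delta store; a full delta store is closed.
        self.open_delta.append(self._locate(row))
        if len(self.open_delta) == self.row_group_size:
            self.closed_deltas.append(self.open_delta)
            self.open_delta = []

    def bulk_insert(self, rows: List[Row]) -> None:
        if len(rows) < self.bulk_threshold:
            for row in rows:
                self.insert(row)
        else:
            # Large batches go straight to the column store.
            self.row_groups.append([self._locate(r) for r in rows])

    def delete(self, key: str) -> None:
        # DELETE is logical: rows are only marked; space is reclaimed by a REBUILD.
        for loc, row in self._all_rows():
            if row[0] == key:
                self.deleted.add(loc)

    def update(self, row: Row) -> None:
        # UPDATE = DELETE followed by INSERT.
        self.delete(row[0])
        self.insert(row)

    def reorganize(self) -> None:
        # Tuple mover: compress every full (closed) delta store into a row group.
        self.row_groups.extend(self.closed_deltas)
        self.closed_deltas = []

    def _all_rows(self) -> Iterator[Tuple[int, Row]]:
        for group in self.row_groups + self.closed_deltas + [self.open_delta]:
            yield from group

    def select(self) -> List[Row]:
        # SELECT unions column store and delta store, skipping logically deleted rows.
        return [row for loc, row in self._all_rows() if loc not in self.deleted]


t = UpdatableColumnstore(bulk_threshold=3, row_group_size=2)
t.insert(("a", 1))
t.insert(("b", 2))                               # delta store is now full and closed
t.update(("a", 10))                              # old "a" marked deleted, new "a" inserted
t.bulk_insert([("c", 3), ("d", 4), ("e", 5)])    # at the threshold: bypasses delta store
t.reorganize()                                   # closed delta store becomes a row group
```

After these calls the old `("a", 1)` row is still physically stored but marked deleted, so `select()` returns only the current rows.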
Memory-Sensitive Columnstore Index
Streaming functionality for columnstore utilities (build, rebuild, load): columnstore segments are built in memory. Memory consumption adjusts under memory pressure (e.g. during data load or index build/rebuild). The same memory grant and reservation process is used by the different operations (build/rebuild/load).
Run-time memory management: batch mode spilling has been implemented (there is no need to fall back to row mode execution when spilling).
Available memory can affect columnstore segment quality. The ideal segment size is 1M rows. The number of segments (that is, the number of columns in the table) drives the memory requirements. The product always attempts to create ideal segments by reserving “enough” memory. Under memory pressure, DOP is reduced first, followed by a reduction in segment size.
In PDW, available memory equates to the Resource Governor settings on the compute nodes.
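The degradation order on this slide (reduce DOP first, then segment size) can be sketched with a deliberately crude, hypothetical memory model: each of `dop` build threads holds one row group in memory, so memory = DOP * rows per segment * columns * bytes per value. The model, the 8-byte value size, the `min_rows` floor and the halving strategy are all assumptions for illustration; the slide does not describe SQL Server's actual grant calculation.

```python
from typing import Tuple

IDEAL_SEGMENT_ROWS = 1_048_576  # ~1M rows per segment


def memory_needed(dop: int, rows_per_segment: int, columns: int, bytes_per_value: int) -> int:
    """Hypothetical cost model: every thread buffers one segment per column."""
    return dop * rows_per_segment * columns * bytes_per_value


def plan_build(available: int, max_dop: int, columns: int,
               bytes_per_value: int = 8, min_rows: int = 1024) -> Tuple[int, int]:
    """Pick (DOP, rows per segment): keep ideal segments while lowering DOP,
    and only then shrink the segment size at DOP 1."""
    rows = IDEAL_SEGMENT_ROWS
    for dop in range(max_dop, 0, -1):
        if memory_needed(dop, rows, columns, bytes_per_value) <= available:
            return dop, rows
    dop = 1
    while rows > min_rows and memory_needed(dop, rows, columns, bytes_per_value) > available:
        rows //= 2
    # If even min_rows does not fit, min_rows is still returned (not modeled further).
    return dop, rows


per_thread = IDEAL_SEGMENT_ROWS * 10 * 8                       # 10 columns, 8 bytes each
roomy = plan_build(4 * per_thread, max_dop=8, columns=10)       # enough for 4 threads
tight = plan_build(per_thread // 2, max_dop=8, columns=10)      # not enough for 1 thread
```

With memory for four full-size threads, the plan keeps ideal 1M-row segments at DOP 4; with half a thread's worth, it drops to DOP 1 and halves the segment size.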
Columnstore That Improves Performance
Batch hash join spilling is implemented.
Mixed-mode (row and batch) query execution: the presence of row-mode operators does not prevent other operators from executing in batch mode.
Additional batch operators: joins (inner, outer); partial aggregates with and without GROUP BY (local aggregation; global aggregation is not in batch mode); the UNION ALL operator.
Notes: distinct aggregates and UNION operators continue to execute in row mode. No changes to PDW query processing: Q tables are still present, and they are built using the row store.
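The split between partial (local) aggregation and global aggregation can be sketched as a two-phase SUM ... GROUP BY: each thread aggregates its own share of rows, and a final step merges the partial results. This plain-Python sketch does not model batch versus row mode; it only shows why the local phase shrinks what the global phase has to combine. Function names and sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def local_aggregate(rows: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Partial (local) aggregate: SUM(amount) GROUP BY key over one thread's rows."""
    partial: Dict[int, float] = {}
    for key, amount in rows:
        partial[key] = partial.get(key, 0.0) + amount
    return partial


def global_aggregate(partials: List[Dict[int, float]]) -> Dict[int, float]:
    """Global aggregate: merge the partial results produced by every thread."""
    total: Dict[int, float] = {}
    for partial in partials:
        for key, amount in partial.items():
            total[key] = total.get(key, 0.0) + amount
    return total


# Hypothetical (ProductKey, SalesAmount) rows split across two threads.
thread_rows = [
    [(106, 30.00), (103, 17.00), (106, 25.00)],
    [(102, 14.00), (106, 20.00)],
]
partials = [local_aggregate(rows) for rows in thread_rows]
result = global_aggregate(partials)
```

The five input rows reach the global step as four partial entries; with real data volumes the reduction is far larger, since each thread emits at most one entry per group.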
More Performance Results
Columnstore with Competitive Compression …
Table compression options: DATA_COMPRESSION = { NONE | ROW | PAGE | COLUMNSTORE | COLUMNSTORE_ARCHIVE }
COLUMNSTORE compression: the default compression when creating a table with a clustered columnstore index. Typical customer workloads get 5-7x compression ratios. Measured ratios: TPCH 3.1X, TPCDS 2.8X, Customer 1 3.9X, Customer 2 4.3X (** compression measured against the raw data file).
ARCHIVAL compression: enables an additional 30% compression for the whole table and/or chosen partitions. You can go back and forth between COLUMNSTORE and COLUMNSTORE_ARCHIVE compression. sys.partitions exposes the compression info (3 = COLUMNSTORE, 4 = COLUMNSTORE_ARCHIVE).
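A back-of-the-envelope Python sketch of the sizing arithmetic behind these figures, plus the sys.partitions `data_compression` codes. The key assumption, stated in the code as well, is that “an additional 30% compression” means the archive copy is 30% smaller than the COLUMNSTORE copy; `estimate_sizes` is a hypothetical helper and real savings depend on the data.

```python
from typing import Tuple

# data_compression codes exposed by sys.partitions. The slide gives 3 and 4;
# 0 to 2 are the rowstore settings.
DATA_COMPRESSION = {0: "NONE", 1: "ROW", 2: "PAGE", 3: "COLUMNSTORE", 4: "COLUMNSTORE_ARCHIVE"}


def estimate_sizes(raw_gb: float, columnstore_ratio: float,
                   archive_saving: float = 0.30) -> Tuple[float, float]:
    """Estimate on-disk sizes for COLUMNSTORE and COLUMNSTORE_ARCHIVE.

    Assumption: the archive copy is `archive_saving` (30%) smaller than the
    COLUMNSTORE copy. Actual savings depend on the data."""
    if columnstore_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    columnstore_gb = raw_gb / columnstore_ratio
    archive_gb = columnstore_gb * (1.0 - archive_saving)
    return columnstore_gb, archive_gb


# 100 GB of raw data at the 4.3x ratio reported for "Customer 2".
cs_gb, archive_gb = estimate_sizes(100.0, 4.3)
```

Under that reading, 100 GB of raw data comes to roughly 23.3 GB with COLUMNSTORE and roughly 16.3 GB with COLUMNSTORE_ARCHIVE.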
Demo
Agenda
Trends in the Data Warehousing Space
How Does Columnstore Work?
What’s New in Columnstore?
Demo
Optimizing Database and Index Design
In Summary
Do We Need Nonclustered Columnstores?
Yes, if you need constraints or triggers on the table. Creating the CCI will fail if there is a B-tree enforcing a key constraint. Instead, create the table with a clustered index and an NCCI. You won’t be able to update the table.
No, if constraints aren’t needed. Create the table and add a CCI. There are no other indexes to worry about! You can insert, update and delete in the table, with consistently fast query performance.
Recommended methods for loading into a table with an NCCI: disable the index, update the data, and rebuild; or use partition switching; or use a delta table and UNION ALL.
Other Indexes and CCI; Partitioning
No other indexes are needed with a CCI, which saves space and maintenance work. There really isn’t much need for other indexes with an NCCI either, except perhaps the clustered index.
Partitioning: partitioning works with both CCI and NCCI. It is good for managing the lifecycle of data, such as aging off old data, especially for an NCCI, where deletes aren’t possible. Consider the COLUMNSTORE_ARCHIVE option if disk space is critical.
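A toy Python model of aging off data by partition, assuming monthly partitions keyed on the YYYYMM part of a date key. `PartitionedTable` and `age_off` are hypothetical. Removing a whole partition stands in for switching it out: no row-level deletes are issued, which is the property that matters for an NCCI table.

```python
from typing import Dict, List, Tuple


class PartitionedTable:
    """Toy model: rows partitioned by month (YYYYMM) of their date key."""

    def __init__(self) -> None:
        self.partitions: Dict[int, List[Tuple[int, float]]] = {}

    def load(self, date_key: int, amount: float) -> None:
        month = date_key // 100          # e.g. 20101107 -> 201011
        self.partitions.setdefault(month, []).append((date_key, amount))

    def age_off(self, keep_months: int) -> List[int]:
        """Drop whole partitions older than the newest `keep_months` months.
        No individual rows are deleted; each expired partition leaves as a unit."""
        months = sorted(self.partitions)
        expired = months[:-keep_months] if keep_months > 0 else months
        for month in expired:
            del self.partitions[month]
        return expired


t = PartitionedTable()
for date_key, amount in [(20101007, 5.00), (20101107, 30.00), (20101108, 25.00), (20101201, 14.00)]:
    t.load(date_key, amount)
expired = t.age_off(keep_months=2)
```

Here the October 2010 partition is removed as a unit, while November and December remain untouched.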
Design Out Strings from Columnstores
Joining on string columns is slow. Factor strings out to dimensions; it’s generally good DW design practice anyway.
Fact table with a string column (before):
Date      LicenseNum  Measure
20120301  XYZ123      100
20120302  ABC777      200
Fact table (after):
Date      LicenseId   Measure
20120301  1           100
20120302  2           200
Dimension table:
LicenseId  LicenseNum
1          XYZ123
2          ABC777
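The refactoring on this slide can be sketched in Python using the slide's own rows: each distinct LicenseNum string gets an integer surrogate LicenseId, the fact table keeps only the integer, and the strings move to the dimension. `factor_out_strings` and the 1-based key assignment are assumptions for illustration; in a real warehouse this would typically happen in the ETL or in SQL.

```python
from typing import Dict, List, Tuple


def factor_out_strings(fact_rows: List[Tuple[int, str, int]]
                       ) -> Tuple[List[Tuple[int, int, int]], List[Tuple[int, str]]]:
    """Replace LicenseNum in each (Date, LicenseNum, Measure) row with an integer
    LicenseId, building the (LicenseId, LicenseNum) dimension along the way."""
    dimension: Dict[str, int] = {}
    new_fact: List[Tuple[int, int, int]] = []
    for date, license_num, measure in fact_rows:
        if license_num not in dimension:
            dimension[license_num] = len(dimension) + 1   # surrogate keys 1, 2, ...
        new_fact.append((date, dimension[license_num], measure))
    dim_rows = [(license_id, num) for num, license_id in dimension.items()]
    return new_fact, dim_rows


fact, dim = factor_out_strings([(20120301, "XYZ123", 100), (20120302, "ABC777", 200)])
```

The result matches the slide's “after” fact table and dimension table, so joins and filters in the columnstore now operate on integers.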
Making the Move to CCIs
For existing tables: drop the indexes and constraints, then create the clustered columnstore index. This is best done when users aren’t querying.
If you run a 24/7 operation and can’t manage a window for the update:
1. Create a view over the fact table that redirects to the existing table.
2. Create a new table with a clustered columnstore index.
3. Copy all data to the new table.
4. When the new table is up to date with all recent additions, change the view to redirect to the new table.
Windows Server 2012 Storage Spaces