11/29/2018 © 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Columnstore Technical Deep Dive (DBI-B411)
Sunil Agarwal, Program Manager, SQL Server
sunila@Microsoft.com
Agenda
- Trends In Data Warehousing Space
- Columnstore architecture
- New for ColumnStore in SQL 2014
- Where to learn more
Trends in the Data Warehousing Space
- Scale more: DW systems continue to grow at a fast pace and scalability is a key concern, with systems growing from 10s of TBs, to 100s of TBs, to PBs.
- Performance at scale: the ability to analyze massive amounts of data while offering interactive query response.
- Data warehousing for the masses: drive down the price per TB.
Source: TDWI Report, Next Generation DW
Columnstore is designed to address these needs.
In-memory Technologies
- In-Memory OLTP: 5-20X performance gain for OLTP, integrated into SQL Server. Applicable to transactional workloads: concurrent data entry, processing and retrieval.
- In-Memory DW: 5-25X performance gain and high data compression; updatable and clustered. Applicable to decision support workloads: large scans and aggregates.
- SSD Bufferpool Extension: 4-10X of RAM and up to 3X performance gain, transparently for apps. Applicable to disk-based transactional workloads: large working (data) set.
ColumnStore - How is it different?
[Figure: the same table stored as rows and stored as columns C1, C2, C3, C4, C5, …]
- Improved compression: data from the same domain compresses better.
- Reduced I/O: fetch only the columns needed.
- Improved performance: more data fits in memory.
- Optimized for CPU utilization.
Columnstore Index Terminology
Row Group
- Set of rows (typically 1 million).
Column Segment
- Contains the values from one column of the row group.
- Segments are compressed.
- Each segment is stored separately.
- A segment is the unit of transfer between disk and memory.
[Figure: a row group spanning columns C1 to C6, with one column segment highlighted]
ColumnStore Index - Example
[Table: sample sales rows with columns OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount; OrderDateKey values run from 20101107 to 20101109]
Step-1: Horizontally Partition (create Row Groups)
The table is cut horizontally into row groups of ~1M rows each.
[Figure: the sample table split into two row groups, the first covering OrderDateKey 20101107 to 20101108 and the second 20101108 to 20101109]
Step-2: Vertically Partition (create Segments)
Each row group is cut vertically into one segment per column: OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount.
[Figure: the two row groups from Step-1, each shown as six separate column segments]
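Steps 1 and 2 can be sketched in a few lines of Python. This is only a toy model of the layout, not how the storage engine is implemented: the function names, the tiny `rows_per_group` value and the sample rows are all made up for illustration (a real row group holds about a million rows).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def make_row_groups(rows: List[Row], rows_per_group: int) -> List[List[Row]]:
    """Step 1: cut the table horizontally into consecutive row groups."""
    return [rows[i:i + rows_per_group] for i in range(0, len(rows), rows_per_group)]


def make_segments(row_group: List[Row]) -> Dict[str, List[Any]]:
    """Step 2: cut one row group vertically, giving one segment per column."""
    segments: Dict[str, List[Any]] = {}
    for row in row_group:
        for column, value in row.items():
            segments.setdefault(column, []).append(value)
    return segments


# Made-up rows for illustration only (not the values from the slide).
sales = [
    {"OrderDateKey": 20101107, "ProductKey": 106, "SalesAmount": 30.0},
    {"OrderDateKey": 20101107, "ProductKey": 103, "SalesAmount": 17.0},
    {"OrderDateKey": 20101108, "ProductKey": 102, "SalesAmount": 14.0},
    {"OrderDateKey": 20101109, "ProductKey": 106, "SalesAmount": 20.0},
]

row_groups = make_row_groups(sales, rows_per_group=2)
segments = [make_segments(group) for group in row_groups]
```

Here `segments[0]["ProductKey"]` holds the ProductKey values of the first row group only, which is the unit that would later be compressed and read independently.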
Step-3: Compress Each Segment
- Some segments will compress more than others.
- Encoding and reordering are not shown.
[Figure: the column segments from Step-2, each shrunk to its compressed size]
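To see why some segments compress more than others, the sketch below applies plain run-length encoding to two made-up segments. Run-length encoding is used here only as a simple stand-in; the slide does not show the actual encoding, and the helper names and sample values are assumptions for illustration.

```python
from itertools import groupby


def rle_encode(segment):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(segment)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original segment."""
    return [value for value, count in runs for _ in range(count)]


# Made-up segments: the date column repeats, the amount column does not.
date_segment = [20101107, 20101107, 20101107, 20101107, 20101107, 20101108]
amount_segment = [30.0, 17.0, 20.0, 17.0, 20.0, 25.0]

date_runs = rle_encode(date_segment)      # two runs for six values
amount_runs = rle_encode(amount_segment)  # six runs for six values
```

The date segment shrinks to two runs because equal values sit next to each other, while the amount segment gains nothing from this particular encoding.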
Query Processing - Read The Data Needed
    SELECT ProductKey, SUM(SalesAmount)
    FROM SalesTable
    WHERE OrderDateKey < 20101108
    GROUP BY ProductKey
- Column elimination: only the OrderDateKey, ProductKey and SalesAmount segments are read; the StoreKey, RegionKey and Quantity segments are skipped.
- Segment elimination: the second row group holds only OrderDateKey values 20101108 and 20101109, so none of its segments can satisfy the predicate and they are not read.
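Column and segment elimination can be modelled roughly as follows. The sketch assumes that a minimum and maximum value is kept per segment and checked before the segment is read; the function names, the `segments_read` counter and the sample data are hypothetical and exist only to make the effect visible.

```python
def build_metadata(row_groups, column):
    """Record the (min, max) of one column for every row group."""
    return [(min(group[column]), max(group[column])) for group in row_groups]


def sum_sales_before(row_groups, date_ranges, cutoff):
    """Model of: SELECT ProductKey, SUM(SalesAmount) ...
    WHERE OrderDateKey < cutoff GROUP BY ProductKey."""
    totals = {}
    segments_read = 0
    for group, (lowest, _highest) in zip(row_groups, date_ranges):
        # Segment elimination: no row in this group can qualify, skip it.
        if lowest >= cutoff:
            continue
        # Column elimination: only the three referenced columns are touched.
        dates = group["OrderDateKey"]
        products = group["ProductKey"]
        amounts = group["SalesAmount"]
        segments_read += 3
        for date, product, amount in zip(dates, products, amounts):
            if date < cutoff:
                totals[product] = totals.get(product, 0.0) + amount
    return totals, segments_read


# Made-up row groups, already split into column segments.
row_groups = [
    {"OrderDateKey": [20101107, 20101107, 20101108],
     "ProductKey": [106, 103, 106],
     "StoreKey": [1, 4, 2],
     "SalesAmount": [30.0, 17.0, 25.0]},
    {"OrderDateKey": [20101108, 20101109],
     "ProductKey": [102, 106],
     "StoreKey": [2, 4],
     "SalesAmount": [14.0, 20.0]},
]

date_ranges = build_metadata(row_groups, "OrderDateKey")
totals, segments_read = sum_sales_before(row_groups, date_ranges, 20101108)
```

Of the eight segments in this toy table, only three are read: the StoreKey segments are never touched, and the whole second row group is skipped on its metadata alone.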
Multi-Row Batch - Batch Processing
Motivation:
- Column store significantly reduces the I/O required, so the next bottleneck is CPU usage.
- Batch processing addresses CPU usage.
Functionality:
- Batch = columnar format + filter vector (column vectors C1, C2, C3 plus a bitmap of qualifying rows).
- Rows move through the plan as a set: a batch of ~900 rows.
- Batches are passed between iterators.
- Near-zero data copying, with only slight updates to the batch.
- Number of function calls reduced by orders of magnitude.
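The batch object described above can be approximated in Python as a set of column vectors plus a filter vector. This is a minimal sketch of the idea, not the engine's data structure: the `Batch` class and the two operator functions are invented names, and the four-row batch stands in for the ~900 rows mentioned on the slide.

```python
class Batch:
    """Column vectors plus a filter vector marking the qualifying rows."""

    def __init__(self, columns):
        self.columns = columns                    # {name: list of values}
        size = len(next(iter(columns.values())))
        self.qualifying = [True] * size           # filter vector


def filter_batch(batch, column, predicate):
    """Filter operator: clears bits in the filter vector, copies no column data."""
    values = batch.columns[column]
    for i in range(len(values)):
        if batch.qualifying[i] and not predicate(values[i]):
            batch.qualifying[i] = False
    return batch


def sum_batch(batch, column):
    """Aggregate operator: one call consumes the whole batch."""
    values = batch.columns[column]
    return sum(v for v, keep in zip(values, batch.qualifying) if keep)


# Made-up batch with two columns.
c1 = [1, 2, 3, 4]
c2 = [10.0, 20.0, 30.0, 40.0]
batch = Batch({"C1": c1, "C2": c2})

filter_batch(batch, "C1", lambda value: value > 2)
total = sum_batch(batch, "C2")
```

Each operator is called once per batch rather than once per row, and the filter only flips entries in `qualifying`, which is the "slight batch update" that avoids copying column data between iterators.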
Agenda
- Trends In Data Warehousing Space
- How Does Columnstore Work?
- What’s New In Columnstore?
- Demo
- In Summary
Motivation for SQL Server 2014 Improvements
SQL Server 2012 columnstore functionality:
- Non-clustered columnstore indexes.
- Improved compression compared to ROW/PAGE compression.
- Improved query performance.
Gaps:
- No DML support, no updates (data refresh).
- Only secondary, non-clustered columnstore indexes supported.
- Poor memory management (resource governor was not honored during index build/rebuild or at run time).
- No batch hash join spilling.
- Limited data type support.
- Limited set of batch operations supported.
Goals for new columnstore functionality:
- Competitive load performance and efficient index creation.
- Leading compression ratios and competitive query performance.
- Functional parity with row store, as much as possible.
Clustered Columnstore Index
Why is a clustered index important?
- Saves space.
- Simplifies management: no secondary indexes to maintain.
Columnstore (and the clustered columnstore index) is the PREFERRED storage engine for DW scenarios.
- We encourage users to either move existing tables to CCI or start using CCI for new tables.
Additional data types are supported:
- High precision decimal, datetimeoffset, binary, varbinary, uniqueidentifier, etc.
- Unsupported types: spatial, XML, max types.
DDL supported:
- Evolve your schema design as needed.
[Chart: 91% space savings. Space Used = Table space + Index space]
Updatable Columnstore Index
- The table consists of a column store and a row store.
- DML (update, delete, insert) operations leverage the delta store.
- INSERT Values: always lands in the delta store.
- DELETE: a logical operation. Data is physically removed after a REBUILD operation is performed.
- UPDATE: a DELETE followed by an INSERT.
- BULK INSERT: if the batch is < 100k rows, inserts go into the delta store; otherwise they go into the columnstore.
- SELECT: unifies data from the column and row stores with an internal UNION operation.
- The “tuple mover” converts data into columnar format once a delta store is full (~1M rows).
- The REORGANIZE statement converts the delta store into columnar storage.
[Figure: the tuple mover moving rows from the delta (row) store into the column store, both with columns C1 to C6]
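The DML behaviour listed above can be mimicked with a small in-memory model. Everything in it is an assumption made for illustration: the class and method names, the `id` key column, the position-based delete set, and the tiny `delta_capacity` (the slide gives ~1M rows). Deletes against rows still in a delta store are simply removed here, which is a simplification.

```python
class UpdatableColumnstore:
    """Toy model: columnar row groups plus row-oriented delta stores."""

    def __init__(self, delta_capacity):
        self.delta_capacity = delta_capacity
        self.open_delta = []       # delta store still accepting inserts
        self.closed_deltas = []    # full delta stores awaiting the tuple mover
        self.row_groups = []       # columnar row groups: {column: [values]}
        self.deleted = set()       # (row group index, row index) marked deleted

    def insert(self, row):
        """INSERT always lands in the delta store."""
        self.open_delta.append(dict(row))
        if len(self.open_delta) == self.delta_capacity:
            self.closed_deltas.append(self.open_delta)
            self.open_delta = []

    def delete(self, row_id):
        """DELETE is logical for columnar rows: they are only marked."""
        for delta in self.closed_deltas + [self.open_delta]:
            for i, row in enumerate(delta):
                if row["id"] == row_id:
                    del delta[i]
                    return True
        for g, group in enumerate(self.row_groups):
            for i, value in enumerate(group["id"]):
                if value == row_id and (g, i) not in self.deleted:
                    self.deleted.add((g, i))
                    return True
        return False

    def update(self, row_id, new_row):
        """UPDATE is a DELETE followed by an INSERT."""
        if self.delete(row_id):
            self.insert(new_row)

    def tuple_mover(self):
        """Convert every closed delta store into a columnar row group."""
        for delta in self.closed_deltas:
            if delta:
                self.row_groups.append(
                    {name: [row[name] for row in delta] for name in delta[0]})
        self.closed_deltas = []

    def select(self):
        """SELECT unions the column store (minus deleted rows) with the deltas."""
        rows = []
        for g, group in enumerate(self.row_groups):
            for i in range(len(group["id"])):
                if (g, i) not in self.deleted:
                    rows.append({name: group[name][i] for name in group})
        for delta in self.closed_deltas + [self.open_delta]:
            rows.extend(delta)
        return rows


store = UpdatableColumnstore(delta_capacity=2)
store.insert({"id": 1, "qty": 6})
store.insert({"id": 2, "qty": 1})      # fills and closes the first delta store
store.insert({"id": 3, "qty": 4})
store.tuple_mover()                    # rows 1 and 2 become a columnar row group
store.delete(1)                        # marked deleted, not physically removed
store.update(2, {"id": 2, "qty": 99})  # old row marked, new row goes to delta
result = sorted(store.select(), key=lambda row: row["id"])
```

After these calls the columnar row group still physically contains both original rows, which mirrors the point that deleted data is only reclaimed by a rebuild.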
Improved Query Performance
- Batch hash join spilling implemented.
- Mixed mode (row and batch) query execution: the presence of row operators does not prevent other operators from being executed in batch mode.
- Additional batch operators:
  - joins (inner, outer)
  - partial/global aggregates, with and without GROUP BY
  - UNION ALL operator
Note: Distinct aggregates and UNION operators continue to be executed in row mode.
Columnstore Performance Benefits
[Chart: Power Metric]
Columnstore with Competitive Compression
Table compression options: DATA_COMPRESSION = { NONE | ROW | PAGE | COLUMNSTORE | COLUMNSTORE_ARCHIVE }
COLUMNSTORE compression
- Default compression when creating a table with a clustered columnstore index.
- Typical customer workloads get 5-7x compression ratios.
    Workload      Compression ratio **
    TPCH          3.1X
    TPCDS         2.8X
    Customer 1    8X
    Customer 2    5.5X
    ** Compression measured against the raw data file.
ARCHIVAL compression
- Enables an additional 30% compression for the whole table and/or chosen partitions, at the cost of CPU overhead.
- Data can be switched back and forth between COLUMNSTORE and COLUMNSTORE_ARCHIVE compression.
- sys.partitions exposes the compression setting (3 = columnstore, 4 = columnstore_archive).
Columnstore Index: TSQL Commands
Index Build: creates a clustered columnstore index.
    CREATE CLUSTERED COLUMNSTORE INDEX …                               -- from HEAP
    CREATE CLUSTERED COLUMNSTORE INDEX … WITH (DROP_EXISTING = ON)     -- from CI
Index Rebuild: re-creates the clustered columnstore index completely.
    ALTER TABLE … REBUILD
    ALTER INDEX … REBUILD
    CREATE CLUSTERED COLUMNSTORE INDEX … WITH (DROP_EXISTING = ON)
Reorganize: forces the delta store to be moved into columnar storage.
    ALTER INDEX … REORGANIZE                                           -- compresses closed row groups
    … REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON)                   -- compresses all row groups
Columnstore Index: DMVs
sys.column_store_row_groups
- Visibility into all columnstore row groups (in columnar + delta store).
- Use this DMV to determine the number of delta stores.
- Notes:
  - Every partition has at least one delta store.
  - Each partition can have multiple delta stores.
sys.column_store_segments
- Visibility into columnstore segments.
- Use this DMV to determine the quality of a clustered columnstore index: if segments contain <1M rows, investigate why.
Columnstore Index: Data Load
- Loading performance is comparable to loading into a CI (in fact, loading into a CCI is a bit faster).
- Load data directly into the CCI (presort the data file if possible).
Demo
Track resources
- Download Microsoft SQL Server 2014: http://www.trySQLSever.com
- Try out Power BI for Office 365! http://www.powerbi.com
- Sign up for Microsoft HDInsight today! http://microsoft.com/bigdata
Resources
- Sessions on Demand: http://channel9.msdn.com/Events/TechEd
- Learning (Microsoft Certification & Training Resources): www.microsoft.com/learning
- TechNet (Resources for IT Professionals): http://microsoft.com/technet
- msdn (Resources for Developers): http://microsoft.com/msdn
Complete an evaluation and enter to win!