Presentation is loading. Please wait.

Presentation is loading. Please wait.

Clustered Columnstore index deep dive

Similar presentations


Presentation on theme: "Clustered Columnstore index deep dive"— Presentation transcript:

1 Clustered Columnstore index deep dive
Microsoft Ignite 2016 12/19/2017 4:11 PM Clustered Columnstore index deep dive Sunil Agarwal (Principal Program Manager , SQL Engineering Team) © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 Outline Overview New Functionality in SQL 2016 Data Load Strategies
Query Performance Guidelines Index Fragmentation Summary

3 Demo – Value Proposition
Microsoft 2016 12/19/2017 4:11 PM Demo – Value Proposition © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

4 Columnstore Index - Overview
12/19/2017 Columnstore Index - Overview Data stored as columns C5 C1 C2 C3 C4 rowgroup segment Improved compression: Data from same domain compress better Reduced I/O: Fetch only columns needed Improved Performance: More data fits in memory Optimized for CPU utilization Ideal for DW Workload © 2015 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

5 World Record Breaking Performance (TPC-H)
SQL Server Leading the TPC-H benchmark

6 World Record Breaking Performance (TPC-H)
SQL Server 2016 gives 40% improved performance over SQL Server 2014

7 Outline Overview New Functionality Data Load Query Performance
Index Fragmentation Summary

8 Columnstore Index: Clustered (CCI)
BTREE (NCI) Delete bitmap SQL Server 2016 SQL Server 2014 Master copy of the data (10x compression) Additional Btree indexes for efficient equality and short-range searches and PK/FK constraints. Locking granularity at row level using NCI index path DDL – ALTER, REBUILD, REORGANIZE

9 Clustered Columnstore Index: Concurrency
Microsoft Ignite 2016 12/19/2017 4:11 PM Clustered Columnstore Index: Concurrency Operation SQL 2014 SQL 2016 Query Rowgroup granularity No support of RCSI or SI Recommendation: use Read Uncommitted Support of SI and RCSI (non blocking) Insert Lock at row level (trickle insert) Rowgroup level for set of rows No changes Delete Lock at Rowgroup level Row level lock in conjunction with NCI Update Implemented as Delete/Insert © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

10 Clustered Columnstore Index: High Availability
Operation SQL 2014 SQL 2016 Backup/Restore Just like any other index. Each SEGMENT is persisted using LOB datatype No changes AlwaysOn Failover Clustering (FCI) Fully Supported AlwaysON Availability Groups Fully supported except Readable Secondary Fully Supported with Readable Secondary Index Create/Rebuild Offline

11 Outline Overview New Functionality Data Load Query Performance
Index Fragmentation Summary

12 Data Loading into Columnstore: Bulk Import
Bulk Import (batchsize < 100k) X rowgroup lock X rowgroup lock Bulk Import (batchsize < 100k) Compressed RGs X rowgroup lock Bulk Import (batchsize >= 100k) Delta RGs Key Points Import from External Files (BCP, Bulk Insert, SSIS) Each Bulk Import thread takes X lock on the RG. If batchsize < 100k, the rows are inserted into a delta RG otherwise a new compressed RG is generated Reduced Logging (only when imported into compressed rowgroup) Don’t specify TABLOCK when importing unlike rowstore where it is required

13 Data Loading into Columnstore: Staging Table
Load data from External Files from staging table IX rowgroup lock Rows < 100k Insert into <columnstore> select * from <rowstore-staging> X rowgroup lock Rows >= 100k Compressed RGs Delta RGs Key Points: Similar to Bulk Import but with single BATCH Residual rows < 100k are inserted into delta store SQL 2016: Parallel Insert into Columnstore index

14 Data Loading – Merge Operation

15 Demo – Data Load Microsoft 2016 12/19/2017 4:11 PM
© 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

16 Outline Overview New Functionality Data Load Query Performance
Index Fragmentation Summary

17 Query Processing - Read The Data Needed
SELECT ProductKey, SUM (SalesAmount) FROM SalesTable WHERE OrderDateKey < OrderDateKey ProductKey 106 103 109 StoreKey 01 04 03 05 02 RegionKey 1 2 3 Quantity 6 1 2 4 5 SalesAmount 30.00 17.00 20.00 25.00 Column Elimination Rowgroup Elimination OrderDateKey StoreKey 02 03 01 04 ProductKey 102 106 109 103 RegionKey 1 2 Quantity 1 5 4 SalesAmount 14.00 25.00 10.00 20.00 17.00

18 Demo – Rowgroup Elimination
Microsoft 2016 12/19/2017 4:11 PM Demo – Rowgroup Elimination © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

19 First American - Business Problem
Tech Ready 15 First American - Business Problem 12/19/2017 Scope of Data:150 Million Properties in USA Querying Should support Dynamic search across 150 columns in different permutations Application requires counts to be displayed instantaneously when search criteria is added/modified No Aggregates Query can return 10+ columns with 100s of rows Interactive Map based search in combination with other data elements Daily ETL Updating > 1 million rows Not a traditional Columnstore Index Scenario Failed Solution Can’t index all 150 columns row store Limited time for data preparation (ETL). Multiple indexes added to the overhead © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

20 First American - Results with Columnstore Index
Tech Ready 15 12/19/2017 First American - Results with Columnstore Index Data Compression CPU Consumption Query Performance with 100 concurrent users With Rowstore (PAGE Compressed) With CCI Database Size (including Indexes) 560 GB 44 GB (13x) © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

21 Columnstore Index – Partitioning?
Natural Partitioning with RowGroups Example – Sales data is naturally ordered on date/time When Should I Partition? When data is not naturally ordered on commonly used column for filtering Example – Real-Estate properties with common filtering by state/county

22 Demo – Using NCI Microsoft 2016 12/19/2017 4:11 PM
© 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

23 Columnstore Index – Batch Mode Processing
12/19/2017 Columnstore Index – Batch Mode Processing Batch object Filtered Batch Process multiple rows together for efficiency Significant reduction in instructions No copying of data Buffer size 64K Rows 64 to 900 Column vectors Predicate X C1 C10 C14 Get Batch Scan New in SQL 2016 Single Thread Batch Mode Execution Multiple count DISTINCT Window Aggregates Sort © 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

24 Performance: Aggregate Pushdown (SQL2016)
Select SUM(sales) from <columnstore> Performance gains using Fast Path Opportunistic evaluation of aggregation in storage layer Efficient Aggregation on compressed/encoded data in cache-friendly execution (column at a time) Leverages SIMD Datatype <= 8 bytes no strings Aggregates – MIN, MAX, SUM, COUNT, AVG Works with GroupBy Scan Aggregate SQL 2014 10 million rows(batches) 1 row Scan Aggregate SQL 2016 1 row 0 row

25 Demo – Aggregate Pushdown
Microsoft 2016 12/19/2017 4:11 PM Demo – Aggregate Pushdown © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

26 Outline Overview New Functionality Data Load Query Performance
Index Fragmentation Summary

27 Supportability: Index Maintenance
Tech Ready 15 12/19/2017 Supportability: Index Maintenance Operation SQL 2014 SQL 2016 Removing Deleted Rows Requires Index REBUILD Index REORGANIZE Remove deleted from single compressed RG Merge one or more compressed RGs with deleted rows Done ONLINE Smaller RG size resulting from Smaller BATCHSIZE Memory Pressure Index build Residual Deleted Rows Index REBUILD Ordering Rows Create clustered index Create columnstore index by dropping clustered Index No Changes © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

28 Columnstore Index Defragmentation
Two types of rowgroups Delta Rowgroup: Organized as clustered btree indexes. Can get fragmented Transient – Tuple Mover compresses them into compressed rowgroup Compressed Rowgroup: Organized as columns but read-only. Deleted rows are not physically removed leading to fragmentation Permanent Alter Index <index> on <table> REORGANIZE Runs a policy on compressed rowgroups Combine one or more rowgroups total number of rows < 1,024,576 Self merge if number of deleted rows > are deleted

29 Columnstore Index Defragmentation: Example
Current size Deleted rows Self-Merge 300k 120 k Yes Rowgroup1 Rowgroup2 Merge? 950k 920k No (don’t qualify for merge) 900k No (we can’t exceed 1 million in the merge size) 400k 500k Yes, 900k 1 million (200k deleted 500k (300k deleted) Yes, 1 million

30 Demo – Index Fragmentation
Microsoft 2016 12/19/2017 4:11 PM Demo – Index Fragmentation Sunil Agarwal © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

31 Supportability: DMVs, Perfmon, XEvents
Microsoft Ignite 2015 12/19/2017 4:11 PM Supportability: DMVs, Perfmon, XEvents PerfMon Counters (SQL2016) New DMVs sys.dm_column_store_object_pool sys.dm_db_column_store_row_group_operational_stats sys.dm_db_column_store_row_group_physical_stats sys.dm_db_index_physical_stats sys.dm_db_index_operational_stats New XEvents Under new columnstore category State transition of rowgroups Merging of compressed rowgroups © 2015 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

32 In Review: Session Objectives And Takeaways
Tech Ready 15 12/19/2017 In Review: Session Objectives And Takeaways Session Objective(s): Rationale for using columnstore index for DW Showcase new enhancements to columnstore index in SQL AzureDB and SQL 2016 Key Takeaways Significant improvements for DW query performance in SQL AzureDB and SQL 2016 Scale-out Analytics with AlwaysOn configuration Enhancements in Higher Concurrency and data load This slide is required. Do NOT delete. This should be the first slide after your Title Slide. This is an important year and we need to arm our attendees with the information they can use to Grow Share! Please ensure that your objectives are SMART (defined below) and that they will enable them to go in and win against the competition to grow share. If you have questions, please contact your Track PM for guidance. We have also posted guidance on writing good objectives, out on the Speaker Portal ( This slide should introduce the session by identifying how this information helps the attendee, partners and customers be more successful. Why is this content important? This slide should call out what’s important about the session (sort of the why should we care, why is this important and how will it help our customers/partners be successful) as well as the key takeaways/objectives associated with the session. Call out what attendees will be able to execute on using the information gained in this session. What will they be able to walk away from this session and execute on with their customers. Good Objectives should be SMART (specific, measurable, achievable, realistic, time-bound). Focus on the key takeaways and why this information is important to the attendee, our partners and our customers. Each session has objectives defined and published on please work with your Track PM to call these out here in the slide deck. If you have questions, please contact your Track PM. See slide 5 in this template for a complete list of Tracks and TPMs. © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

33 Resources Sample Database and Scripts Video: Blogs
Short video on columnstore Blogs Sample Database and Scripts

34 DW Architecture - Traditional
Microsoft Ignite 2015 12/19/2017 4:11 PM DW Architecture - Traditional BI and analytics Dashboards Reporting MOLAP Data Latency Reports limited to Cubes Data Refresh Periodically Vertipaq Query Exec SSAS 2016 Multi-Dimensional (MOLAP) Tabular (VertiPaq) Data Refresh ETL SQL Server Relational DW AS – (a) data model variance from budget, forecasting using MDX and DAX (b) caching Hourly, Daily, Weekly Database © 2015 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

35 DW Architecture with SQL 2016
Microsoft Ignite 2015 12/19/2017 4:11 PM DW Architecture with SQL 2016 Key Value Proposition Interactive Analytics Querying Reduced complexity Significant Improvement in Tabular (Direct Query) BI and analytics Dashboards Reporting Multi-Dimensional Tabular Direct Query ROLAP SSAS 2016 Query exec ETL SQL Server Relational DW (columnStore) Hourly, Daily, Weekly Database © 2015 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

36 Performance: String Predicate Pushdown (SQL 2016)
Microsoft Ignite 2015 12/19/2017 4:11 PM Performance: String Predicate Pushdown (SQL 2016) SQL 2014 SQL 2016 select * from Orders where Item like ‘%tool%’ Scan Filter 50 rows 100k rows (batches) (R2) 50 rows 20 rows (R1) 30 rows Scan Filter 10 million rows (batches) 10 million rows Execution Strategy Apply Filter on Dictionary entries Find rows that refer to dictionary entries that quality (R1) Find rows not eligible for this optimization (R2) Scan returns (R1 + R2) rows Filter node applies string predicate on (R2) Row returned by Filter node = (R1 + filtered R2) R2 - Rows in delta RGs + Compressed Segments with no dictionary or too big a dictionary + Expression evaluating to NULL Execution Strategy Retrieve 10 million rows by converting dictionary encoded value to string Apply string predicate on 10 million rows © 2015 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

37 Analytics with Window Aggregates (SQL 2016)

38 12/19/2017 4:11 PM Thank you Chat with me in the Speaker Lounge Find me © 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.


Download ppt "Clustered Columnstore index deep dive"

Similar presentations


Ads by Google