Published by Suzanna Scott
1
Turbocharge your DW Queries with ColumnStore Indexes
Susan Price
Senior Program Manager, DW and Big Data
2
Waiting …
April 2012, PNWSQL
4
Why use ColumnStore Indexes?
Faster, interactive query response time
▫Easier data exploration
▫Better decisions
Reduced physical DB design effort
▫Fewer indexes
▫Reduced need for summary aggregates and indexed views
▫May eliminate need for OLAP cubes
▫Transparent to the application
Lower TCO
5
Demo
6
Agenda
Columnstore indexes
Batch mode processing
How to use ColumnStore Indexes
Best practices
Troubleshooting ColumnStore Indexes
Resources for more information
7
How do columnstore indexes speed up queries?
Columnstore indexes store data column-wise
▫Each page stores data from a single column
Highly compressed
▫About 2x better than PAGE compression
▫More data fits in memory
Each column can be accessed independently
▫Fetch only needed columns
▫Can dramatically decrease IO
[Figure: columns C1, C2, C3, C4 stored column-wise, compared with heaps and B-trees, which store data row-wise]
8
Columnstore index
[Figure: columns C1 to C6, each divided into column segments]
A segment contains values from one column for a set of rows
Segments for the same set of rows comprise a row group
Segments are compressed
Each segment is stored in a separate LOB
A segment is the unit of transfer between disk and memory
9
Index creation and storage
[Figure: a base table with columns A, B, C, D is split into row groups of about 1M rows; each column of each row group is encoded and compressed into a column segment, stored as a blob in the columnstore index and tracked by a segment directory]
New catalog view: sys.column_store_segments
▫Includes segment metadata: size, min, max, …
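As a rough illustration of this layout, here is a minimal Python sketch. It is a toy model, not SQL Server's implementation. It cuts a table into row groups, splits each row group into one segment per column, and records a min and max for each segment in a directory, loosely analogous to the metadata in sys.column_store_segments. The `Segment` class, the `build_columnstore` function, and the tiny `ROW_GROUP_SIZE` of 6 rows are all hypothetical. The slides describe row groups of about 1M rows, and real segments hold encoded, compressed data rather than plain lists. The sample rows are the OrderDateKey, ProductKey, and SalesAmount columns from the example table later in the deck.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence

# Hypothetical toy size; the slides describe row groups of about 1M rows.
ROW_GROUP_SIZE = 6


@dataclass
class Segment:
    column: str
    row_group: int
    values: List[Any]  # stand-in for an encoded, compressed blob
    min_value: Any     # per-segment metadata kept in the segment directory
    max_value: Any


def build_columnstore(rows: Sequence[Dict[str, Any]], columns: Sequence[str],
                      row_group_size: int = ROW_GROUP_SIZE) -> List[Segment]:
    """Split rows into row groups, then each row group into one segment per column."""
    directory: List[Segment] = []
    for group_id, start in enumerate(range(0, len(rows), row_group_size)):
        group = rows[start:start + row_group_size]   # horizontal partition
        for col in columns:                           # vertical partition
            values = [row[col] for row in group]
            directory.append(Segment(col, group_id, values, min(values), max(values)))
    return directory


# OrderDateKey, ProductKey, SalesAmount from the example fact table.
COLUMNS = ("OrderDateKey", "ProductKey", "SalesAmount")
ROWS = [dict(zip(COLUMNS, r)) for r in [
    (20101107, 106, 30.00), (20101107, 103, 17.00), (20101107, 109, 20.00),
    (20101107, 103, 17.00), (20101107, 106, 20.00), (20101108, 106, 25.00),
    (20101108, 102, 14.00), (20101108, 106, 25.00), (20101108, 109, 10.00),
    (20101109, 106, 20.00), (20101109, 106, 25.00), (20101109, 103, 17.00),
]]

for seg in build_columnstore(ROWS, COLUMNS):
    print(seg.row_group, seg.column, seg.min_value, seg.max_value)
```

With 12 rows and 6 rows per group, this yields 2 row groups × 3 columns = 6 segments. For the first row group, it prints OrderDateKey 20101107 to 20101108, ProductKey 103 to 109, and SalesAmount 17.0 to 30.0.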
10
Observed compression ratios
Data set        Uncompressed table size (MB)   Columnstore index size (MB)   Compression ratio
Cosmetics       1,302                          88.5                          14.7
SQM             1,431                          166                           8.6
Xbox            1,045                          202                           5.2
MSSales         642,000                        126,000                       5.1
Web Analytics   2,560                          553                           4.6
Telecom         2,905                          727                           4.0
1.8x better compression than SQL Server's PAGE compression
11
Columnstore index example
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00
12
Horizontally partition (Row Groups)
Row group 1
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00
Row group 2
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00
13
Vertically partition (Segments)
Row group 1
OrderDateKey: 20101107 20101108
ProductKey: 106 103 109 103 106
StoreKey: 01 04 03 05 02
RegionKey: 1 2 2 2 3 1
Quantity: 6 1 2 1 4 5
SalesAmount: 30.00 17.00 20.00 17.00 20.00 25.00
Row group 2
OrderDateKey: 20101108 20101109
ProductKey: 102 106 109 106 103
StoreKey: 02 03 01 04 01
RegionKey: 1 2 1 2 2 1
Quantity: 1 5 1 4 5 1
SalesAmount: 14.00 25.00 10.00 20.00 25.00 17.00
14
Compress each segment*
[Figure: each of the twelve column segments from the previous slide (six columns × two row groups) is compressed independently]
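The slides do not say which encodings VertiPaq applies to a segment. Purely as an assumed illustration, this Python sketch applies two common column-compression techniques to one segment from the example: dictionary encoding, then run-length encoding. The function names `dictionary_encode`, `run_length_encode`, and `decode` are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


def dictionary_encode(values: Sequence[Any]) -> Tuple[List[Any], List[int]]:
    """Map each distinct value to a small integer id (its position in a sorted dictionary)."""
    dictionary = sorted(set(values))
    ids: Dict[Any, int] = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [ids[v] for v in values]


def run_length_encode(ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (id, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in ids:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def decode(dictionary: Sequence[Any], runs: Sequence[Tuple[int, int]]) -> List[Any]:
    """Undo both steps to recover the original segment values."""
    return [dictionary[v] for v, n in runs for _ in range(n)]


# ProductKey segment of the first row group in the example.
product_key = [106, 103, 109, 103, 106, 106]
dictionary, ids = dictionary_encode(product_key)
runs = run_length_encode(ids)
print(dictionary, runs)
assert decode(dictionary, runs) == product_key
```

For this segment the dictionary is [103, 106, 109], and the trailing pair of 106s collapses into one run, leaving five runs for six rows. Sorted, highly repetitive columns such as OrderDateKey compress far better: the first row group's six dates become just two runs.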
15
Fetch only needed columns
[Figure: the column segments regrouped so that the StoreKey, RegionKey, and Quantity segments are set apart from the OrderDateKey, ProductKey, and SalesAmount segments, which are the only ones fetched]
16
Creating ColumnStore Indexes
Create a columnstore index
▫Create the table
▫Load data into the table
▫Create a nonclustered columnstore index on all, or some, columns
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci ON myTable(OrderDate, ProductID, SaleAmount);
[Screenshot: creating the index from Object Explorer]
17
Memory management
Memory management is automatic
Columnstore is persisted on disk
Needed columns are fetched into memory
Columnstore segments flow between disk and memory
SELECT C2, SUM(C4) FROM T GROUP BY C2;
[Figure: for this query, only the T.C2 and T.C4 segments move into memory; T.C1 and T.C3 stay on disk]
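A toy Python model of this point: the table is stored as one list per column, and the query reads only the columns it names. `ColumnTable` and `sum_c4_by_c2` are hypothetical names. The real engine moves compressed segments between disk and a memory cache, not Python lists.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ColumnTable:
    """Toy column store: one list per column, tracking which columns a query touches."""

    def __init__(self, columns: Dict[str, List[int]]) -> None:
        self._columns = columns
        self.fetched: Set[str] = set()

    def column(self, name: str) -> List[int]:
        # Stands in for bringing this column's segments from disk into memory.
        self.fetched.add(name)
        return self._columns[name]


def sum_c4_by_c2(t: ColumnTable) -> Dict[int, int]:
    """Equivalent of: SELECT C2, SUM(C4) FROM T GROUP BY C2;"""
    totals: Dict[int, int] = defaultdict(int)
    for c2, c4 in zip(t.column("C2"), t.column("C4")):
        totals[c2] += c4
    return dict(totals)


T = ColumnTable({"C1": [1, 2, 3, 4], "C2": [10, 20, 10, 20],
                 "C3": [0, 0, 0, 0], "C4": [5, 6, 7, 8]})
print(sum_c4_by_c2(T), sorted(T.fetched))
```

The query returns {10: 12, 20: 14}, and only C2 and C4 are ever fetched. C1 and C3 cost nothing, however wide they are.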
18
IO and caching
New (large) object cache
▫Cache for column segments and dictionaries
Aggressive read-ahead
▫At segment level
▫At page level within segment
New memory broker
▫Brokers memory between buffer pool and object cache
19
Data reduction
Early segment elimination based on segment metadata
▫Min and max values stored in metadata for each segment
Simple filters evaluated in storage engine during CS index scan
▫Conjunctions of comparisons, in-list
Bitmap filters
▫Evaluated during index scan
▫Built by Hash Table Build operator
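This minimal Python sketch illustrates the bitmap-filter idea, assuming a simple single-hash bitmap. The hash table build over the filtered dimension also sets one bit per join key. The fact scan then discards rows whose bit is clear before they reach the join. The function names and the 64-bit size are hypothetical, and the slides do not describe SQL Server's bitmap format. The ProductKey values come from the example fact table.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical bitmap size; the slides do not describe how SQL Server sizes bitmaps.
BITMAP_BITS = 64


def build_hash_table_and_bitmap(dim_rows: Sequence[Tuple[int, str]]) -> Tuple[Dict[int, str], int]:
    """Hash table build over the (filtered) dimension, setting one bitmap bit per key."""
    table: Dict[int, str] = {}
    bitmap = 0
    for key, payload in dim_rows:
        table[key] = payload
        bitmap |= 1 << (hash(key) % BITMAP_BITS)
    return table, bitmap


def scan_with_bitmap(fact_keys: Sequence[int], bitmap: int) -> List[int]:
    """Evaluated during the fact-table scan: drop rows whose bit is not set.
    Rows can pass by collision (false positives), but no matching row is dropped."""
    return [k for k in fact_keys if (bitmap >> (hash(k) % BITMAP_BITS)) & 1]


def hash_join(fact_keys: Sequence[int], table: Dict[int, str]) -> List[Tuple[int, str]]:
    """The join probes the hash table, which removes any false positives."""
    return [(k, table[k]) for k in fact_keys if k in table]


# Hypothetical: dimension rows that survived a filter such as ProductLine = 'M'.
table, bitmap = build_hash_table_and_bitmap([(106, "M"), (109, "M")])
fact_product_keys = [106, 103, 109, 103, 106, 106, 102, 106, 109, 106, 106, 103]
survivors = scan_with_bitmap(fact_product_keys, bitmap)
print(len(fact_product_keys), "rows scanned,", len(survivors), "passed the bitmap")
print(len(hash_join(survivors, table)), "rows joined")
```

In this run, 8 of the 12 fact rows pass the bitmap, and all 8 join. A key such as 170 maps to the same bit as 106, so it would pass the bitmap and be removed only by the join. The filter only has to be cheap and never drop a true match; it does not have to be exact.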
20
Segment elimination
Columns: OrderDateKey, ProductKey, SalesAmount
Row group 1
Min: 20101107  103  17.00
Max: 20101108  109  30.00
20101107  106  30.00
20101107  103  17.00
20101107  109  20.00
20101107  103  17.00
20101107  106  20.00
20101108  106  25.00
Row group 2
Min: 20101108  102  10.00
Max: 20101109  109  25.00
20101108  102  14.00
20101108  106  25.00
20101108  109  10.00
20101109  106  20.00
20101109  106  25.00
20101109  103  17.00
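The min/max check can be sketched in a few lines of Python. This is a simplified model that assumes inclusive range predicates on a single column. `SegmentMeta` and `segments_to_scan` are hypothetical names. The directory holds the OrderDateKey min/max values of the first and second row groups shown on this slide, with ids 1 and 2.

```python
from typing import List, NamedTuple, Optional


class SegmentMeta(NamedTuple):
    row_group: int
    min_value: int
    max_value: int


def segments_to_scan(directory: List[SegmentMeta], low: Optional[int] = None,
                     high: Optional[int] = None) -> List[int]:
    """Row groups whose [min, max] can overlap the predicate low <= value <= high."""
    keep: List[int] = []
    for seg in directory:
        if low is not None and seg.max_value < low:
            continue  # every value is below the range: segment eliminated
        if high is not None and seg.min_value > high:
            continue  # every value is above the range: segment eliminated
        keep.append(seg.row_group)
    return keep


# OrderDateKey metadata for the two row groups in the example.
ORDER_DATE = [SegmentMeta(1, 20101107, 20101108), SegmentMeta(2, 20101108, 20101109)]

print(segments_to_scan(ORDER_DATE, 20101109, 20101109))  # WHERE OrderDateKey = 20101109
print(segments_to_scan(ORDER_DATE, 20101108, 20101108))  # WHERE OrderDateKey = 20101108
print(segments_to_scan(ORDER_DATE, high=20101107))       # WHERE OrderDateKey < 20101108
```

For OrderDateKey = 20101109, only row group 2 is read, because row group 1's maximum is 20101108. For OrderDateKey = 20101108, both ranges contain the value, so neither group can be skipped. For OrderDateKey < 20101108, only row group 1 survives.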
22
Segment elimination
Best practice: Create CS index from a clustered index
▫Rows distributed to row groups in clustered index order
▫Does not affect ordering within segments (VertiPaq orders data within row groups)
▫Good segment elimination for filters on leading key column using min/max values
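To see why building from a clustered index helps, this Python sketch distributes the same 16 date keys into 4-row row groups in two different orders. It then counts how many row groups survive min/max elimination for a single-date filter. The data, the interleaved arrival order, and the function names are hypothetical. The sketch models only how rows are distributed across row groups, which is what the slide says the clustered index order controls.

```python
from typing import List, Sequence, Tuple


def row_group_ranges(keys: Sequence[int], group_size: int) -> List[Tuple[int, int]]:
    """(min, max) of the key column for each row group, in load order."""
    return [(min(keys[i:i + group_size]), max(keys[i:i + group_size]))
            for i in range(0, len(keys), group_size)]


def groups_scanned(ranges: List[Tuple[int, int]], value: int) -> int:
    """Row groups that survive min/max elimination for the predicate key = value."""
    return sum(1 for lo, hi in ranges if lo <= value <= hi)


# Hypothetical: 4 days x 4 rows, 4 rows per row group.
days = [20101107, 20101108, 20101109, 20101110]
arrival_order = [d for _ in range(4) for d in days]  # days interleaved on arrival
clustered_order = sorted(arrival_order)              # clustered index order on the date key

for name, keys in (("arrival", arrival_order), ("clustered", clustered_order)):
    print(name, groups_scanned(row_group_ranges(keys, 4), 20101109), "of 4 row groups scanned")
```

In arrival order, every row group spans all four dates, so a filter on 20101109 must read all 4. In clustered order, each row group holds a single date, and 3 of the 4 are eliminated.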
23
Batch mode query execution
Vector-oriented processing
Compact data representation
Highly efficient algorithms
Better parallelism
[Figure: “Would you rather process your data like this … or like this?”]
24
Processing data
Columnstore index scan can produce batches or rows
▫Batch-enabled operators get batches
▫Non-batch operators get rows
Query optimizer decides
[Figure: a batch object containing column vectors and a list of qualifying rows]
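This simplified Python model follows the batch object in the figure: column vectors plus a list of qualifying rows. A filter operator shrinks the qualifying list instead of copying rows, and downstream operators read the shared vectors through that list. `Batch`, `filter_batch`, and `sum_column` are hypothetical names. Real batches use far more compact representations than Python lists.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Batch:
    """Simplified batch object: column vectors plus a list of qualifying rows."""
    columns: Dict[str, list]
    qualifying: Optional[List[int]] = None  # None means every row qualifies

    def rows(self) -> List[int]:
        n = len(next(iter(self.columns.values())))
        return list(range(n)) if self.qualifying is None else self.qualifying


def filter_batch(batch: Batch, column: str, predicate: Callable[[float], bool]) -> Batch:
    """Evaluate the predicate over one column vector and shrink the qualifying list;
    the column vectors themselves are shared, not copied."""
    vector = batch.columns[column]
    return Batch(batch.columns, [i for i in batch.rows() if predicate(vector[i])])


def sum_column(batch: Batch, column: str) -> float:
    vector = batch.columns[column]
    return sum(vector[i] for i in batch.rows())


# First row group of the example: Quantity and SalesAmount vectors.
b = Batch({"Quantity": [6, 1, 2, 1, 4, 5],
           "SalesAmount": [30.0, 17.0, 20.0, 17.0, 20.0, 25.0]})
big_orders = filter_batch(b, "Quantity", lambda q: q >= 4)
print(big_orders.qualifying, sum_column(big_orders, "SalesAmount"))
```

Filtering on Quantity >= 4 leaves rows [0, 4, 5], and the aggregate over SalesAmount is 75.0. Both operators worked a whole vector at a time, and no row was materialized.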
25
Using ColumnStore Indexes
Let the query optimizer do the work
▫Optimizer makes a cost-based decision on the data access method (columnstore index | B-tree index | heap) and the processing mode (batch mode | row mode)
Most things “just work”
▫Backup and restore
▫Mirroring, log shipping
▫SSMS
26
Limitations on using columnstore indexes
Creating columnstore index
▫Only on common business data types
  Yes: int, real, string, money, datetime, decimal <= 18 digits
  No: decimal > 18 digits, binary, varbinary, CLR, (n)varchar(max), varbinary(max), uniqueidentifier, datetimeoffset with precision > 2
Maintain table: limited operations
▫Can read but not update the data
▫Can switch partitions in and out
Processing queries
▫All read-only T-SQL queries run
▫Some queries are accelerated more than others
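Before creating the index, it can help to screen a table's column types against these lists. This Python helper is hypothetical and was written by hand from the slide's "Yes"/"No" lists. It assumes that "string" means char, varchar, nchar, and nvarchar, and that numeric behaves like decimal. Types the slide does not name, such as bigint or date, are reported as "not on the slide's list" rather than as unsupported, so check the product documentation for those.

```python
from typing import Optional, Tuple

# Assumption: hand-written from the slide's "Yes"/"No" lists (SQL Server 2012);
# it is not an API, and the product documentation is the authority.
SUPPORTED = {"int", "real", "money", "datetime", "decimal", "numeric",
             "char", "varchar", "nchar", "nvarchar", "datetimeoffset"}


def columnstore_eligible(type_name: str, precision: Optional[int] = None,
                         is_max: bool = False) -> Tuple[bool, str]:
    """Return (eligible, reason) for one column's data type."""
    t = type_name.lower()
    if t not in SUPPORTED:
        return False, f"{t} is not on the slide's list; check the documentation"
    if is_max and t in ("varchar", "nvarchar"):
        return False, f"{t}(max) is not supported"
    if t in ("decimal", "numeric") and precision is not None and precision > 18:
        return False, "decimal precision > 18; convert to <= 18 if possible"
    if t == "datetimeoffset" and precision is not None and precision > 2:
        return False, "datetimeoffset with precision > 2 is not supported"
    return True, "ok"


for col in [("int", None, False), ("decimal", 19, False),
            ("nvarchar", None, True), ("uniqueidentifier", None, False)]:
    print(col[0], columnstore_eligible(*col))
```

The decimal check matches the best practice of converting decimals to precision <= 18 where the data allows it.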
27
Loading new data
Table can be read, not updated
▫Partition switching is allowed
▫INSERT, UPDATE, DELETE, and MERGE are not allowed
Methods for loading data
▫Disable, update, rebuild
▫Partition switching
▫UNION ALL between a large table with a columnstore index and a smaller updatable table
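The UNION ALL method can be modeled in a few lines of Python. A large read-only part stands in for the table with the columnstore index, and a small list receives new rows. Every scan reads both parts, and a periodic rebuild folds the new rows into the read-only part, standing in for the disable, update, rebuild cycle. `ReadOnlyWithDelta` is a hypothetical name. In SQL Server the two parts would be separate tables combined with UNION ALL in queries.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, float]  # (OrderDateKey, SalesAmount), hypothetical


class ReadOnlyWithDelta:
    """Large read-only part plus a small updatable delta, read together like UNION ALL."""

    def __init__(self, rows: List[Row]) -> None:
        self._main: Tuple[Row, ...] = tuple(rows)  # immutable, like the indexed table
        self.delta: List[Row] = []                 # new rows land here

    def scan(self) -> Iterator[Row]:
        yield from self._main  # large table with the columnstore index
        yield from self.delta  # UNION ALL the small updatable table

    def rebuild(self) -> None:
        """Fold the delta into the read-only part, like a periodic rebuild."""
        self._main = self._main + tuple(self.delta)
        self.delta = []


sales = ReadOnlyWithDelta([(20101107, 30.0), (20101108, 25.0)])
sales.delta.append((20101109, 17.0))               # today's load goes to the delta
print(sum(amount for _, amount in sales.scan()))   # includes the new row
sales.rebuild()
```

Queries see new rows immediately (the sum here is 72.0), while the expensive index rebuild happens only in the load window.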
28
When to build a columnstore index
Workload
▫Read mostly
▫Most updates are appends
▫Star joins
▫Queries that scan and aggregate large data volumes
Workflow
▫Permits partition switching (or drop and rebuild index)
▫Typically nightly load window
Table size
▫Large fact tables
▫Consider for large dimension tables
▫Very wide tables
29
When not to build a columnstore index
Workload
▫Frequent loads
▫Many updates and deletes to existing data, especially if in multiple or unpredictable partitions
▫Frequent small look-up queries; B-tree indexes may give better performance
▫Your workload does not benefit
Workflow
▫Partition switching or rebuilding the index does not fit your workflow
30
Best practices for creating the index
Use a star schema when possible
▫Build CS index on fact tables
▫Consider for large dimension tables
Include all the columns in the CS index
▫Don’t use to seek into a row
▫Order of listed columns not important
Convert decimal to precision <= 18 if possible
Use integer types whenever possible
31
Best practices for creating the index
Ensure enough memory to build the CS index
Consider table partitioning to facilitate updates
Consider creating the CS index from a clustered index
▫Better segment elimination when predicate on key
▫Slightly better compression (no RID)
32
Best practices for writing queries
Consider modifying queries to hit the “sweet spot”
▫Star joins
▫Inner joins
▫Group By
Keep statistics up to date
Use MAXDOP > 1
▫Batch mode processing is used only for parallel queries
34
Troubleshooting: Creating the index
Are you getting out of memory errors?
▫Ensure enough memory
▫Memory requirement is related to the number of columns, the data, and DOP
▫Memory available ≠ memory on the box when there is concurrent activity
▫By default, a query is restricted to 25% even when Resource Governor (RG) is not enabled
▫Check showplan XML for memory grant info
▫Rough estimate (see the FAQ on the TechNet wiki):
  Memory grant request in MB = [(4.2 * Num of columns in the CS index) + 68] * DOP + (Num of string cols * 34)
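The rough estimate from the FAQ can be turned into a small Python helper for planning an index build. `grant_estimate_mb` and `max_dop_within` are hypothetical names. The formula is only the slide's approximation. The second helper assumes DOP is the knob you can turn, since the estimate grows linearly with DOP.

```python
def grant_estimate_mb(num_columns: int, dop: int, num_string_columns: int) -> float:
    """FAQ rough estimate: [(4.2 * columns) + 68] * DOP + (string columns * 34)."""
    return (4.2 * num_columns + 68) * dop + num_string_columns * 34


def max_dop_within(budget_mb: float, num_columns: int, num_string_columns: int,
                   max_dop: int = 64) -> int:
    """Largest DOP whose estimate fits the budget (0 if even DOP 1 does not fit)."""
    best = 0
    for dop in range(1, max_dop + 1):
        if grant_estimate_mb(num_columns, dop, num_string_columns) <= budget_mb:
            best = dop
    return best


print(grant_estimate_mb(10, 4, 2))  # 508.0
print(max_dop_within(400, 10, 2))   # 3
```

For a 10-column index with 2 string columns at DOP 4, the estimate is (4.2 × 10 + 68) × 4 + 2 × 34 = 508 MB. With about 400 MB actually available to the query, the largest DOP that fits is 3 (398 MB).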
38
Troubleshooting: Creating the index
Why is my index not building in parallel?
▫Index build is parallel only if the table has > 1M rows
How big is my columnstore index?
▫For size and other info, check the new catalog views
  sys.column_store_segments
  sys.column_store_dictionaries
▫Queries in the FAQ make it easy
40
Troubleshooting: Query performance
Is the columnstore index being used?
41
Troubleshooting: Query performance
If the columnstore index is not being used:
▫Are all needed columns present?
▫Cardinality estimate? If the query is selective, the optimizer will choose a B-tree
Are other nonclustered indexes being used?
▫Too many indexes + bad statistics can confuse the optimizer
▫Consider using hints and/or disabling other indexes
If the columnstore is being used, are there other issues?
▫Sorts, spills?
▫Table spools?
▫Is a lot of data being returned to the client? Not all bottlenecks are query processing
43
Troubleshooting: Query performance
Is batch mode being used to process most of the data?
44
Troubleshooting: Query performance
If batch mode is not being used to process most of the data:
▫Is there a columnstore index being used?
▫Outer joins?
▫DOP?
▫Loop join? Check cardinality estimate
▫Operators not enabled for batch mode?
  Batch-enabled: scan, filter, project; local hash partial aggregation; hash inner join, hash table build
45
Troubleshooting: Query performance
Filters or joins on strings?
▫Filters on strings are not pushed into the storage engine
▫Joins on integers are more efficient
Filter with “OR”?
▫IN-lists are pushed down, but OR filters are not
Hash tables don’t fit into memory?
▫Usually due to a small memory grant caused by a cardinality estimate (CE) error, not a physical memory limitation
▫Fall back to row mode processing
▫Slower than a row mode join
46
Real customer experiences
Customer  Segment / application  Measure                                  Without CS index (sec)  With CS index (sec)  Improvement
External  Online services        Query                                    1020                    3                    340x
External  Retail                 Query                                    1080                    63                   17.1x
External  Healthcare             Set of 6 queries                         73894                   12782                5.8x
Internal  HR reporting           Avg. response time on production system  220                     66                   3.3x
Internal  Financial reporting    3 queries                                                                             Each > 50x
Internal  Financial reporting    Queries taking longer than 10 min                                                     90% reduction
47
Take-aways
Columnstore indexes can enable phenomenal performance gains
Batch mode processing is an essential ingredient for speedup
Some adjustments to schema and loading processes may be necessary
Some queries can benefit from tuning
Columnstore indexes are not a magic bullet
48
Resources
Columnstore FAQ:
▫http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-index-faq.aspx
Tuning Guide:
▫http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-performance-tuning.aspx
SIGMOD paper:
▫http://dl.acm.org/citation.cfm?doid=1989323.1989448
49
Thank you!
Questions?
50
Data Warehouse Workload
51
Data warehouse workload
Read-mostly
▫Load large amounts of data
▫Append new data incrementally
▫Rarely update existing data
▫Often retain data for a given window of time (e.g. 1 yr, 3 yr, 7 yr)
  Sliding window data management
Queries touch large amounts of data
▫Join multiple tables
▫Large “fact” tables
Star schema is common
Star joins are common
52
Sliding window
[Figure: yearly partitions 2010, 2009, 2008, 2007; the window slides forward to 2011, 2010, 2009, 2008 as the newest year is added and the oldest is removed]
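The sliding window can be modeled with a deque of yearly partitions, newest first. This Python sketch is hypothetical (`slide_window` is not an API). It assumes, as is usual with partition switching, that the incoming year has been loaded and indexed in a separate staging table before it is switched in.

```python
from collections import deque
from typing import Deque


def slide_window(window: Deque[int], new_year: int) -> int:
    """Switch in a newer yearly partition and switch out the oldest one.
    `window` holds partition years, newest first; returns the year removed."""
    if window and new_year <= window[0]:
        raise ValueError("the incoming partition must be newer than the newest one")
    window.appendleft(new_year)  # switch in (assumed loaded and indexed beforehand)
    return window.pop()          # switch out: the oldest partition leaves the table


window = deque([2010, 2009, 2008, 2007])
removed = slide_window(window, 2011)
print(list(window), removed)
```

After the call, the window holds 2011, 2010, 2009, 2008, and 2007 has been switched out. Both steps are metadata operations on whole partitions, which is why this workflow fits a read-only columnstore table.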
53
Star schema
[Figure: fact table FactSales surrounded by dimension tables DimCustomer, DimProduct, DimDate, DimEmployee, DimStore]
FactSales(CustomerKey int, ProductKey int, EmployeeKey int, StoreKey int, OrderDateKey int, SalesAmount money)
DimCustomer(CustomerKey int, FirstName nvarchar(50), LastName nvarchar(50), Birthdate date, EmailAddress nvarchar(50))
DimProduct …
54
Star join query
SELECT TOP 10 p.ModelName, p.EnglishDescription,
       SUM(f.SalesAmount) AS SalesAmount
FROM FactResellerSalesPart f, DimProduct p, DimEmployee e
WHERE f.ProductKey = p.ProductKey
  AND e.EmployeeKey = f.EmployeeKey
  AND f.OrderDateKey >= 20030601
  AND p.ProductLine = 'M' -- Mountain
  AND p.ModelName LIKE '%Frame%'
  AND e.SalesTerritoryKey = 1
GROUP BY p.ModelName, p.EnglishDescription
ORDER BY SUM(f.SalesAmount) DESC;
55
“Typical” data warehouse queries
Process large amounts of data
▫Joins, aggregation, filtering
Reporting queries
Ad hoc queries
Often slow (minutes to hours)
DBAs spend considerable effort
▫Designing indexes, tuning queries
▫Building summary tables, indexed views, OLAP cubes