Turbocharge your DW Queries with ColumnStore Indexes
Susan Price, Senior Program Manager, DW and Big Data

Waiting … (April 2012, PNWSQL)

Why use ColumnStore Indexes?
• Faster, interactive query response time
  ▫ Easier data exploration
  ▫ Better decisions
• Reduced physical DB design effort
  ▫ Fewer indexes
  ▫ Reduced need for summary aggregates and indexed views
  ▫ May eliminate need for OLAP cubes
  ▫ Transparent to the application
• Lower TCO

Demo

Agenda
• Columnstore indexes
• Batch mode processing
• How to use ColumnStore Indexes
• Best practices
• Troubleshooting ColumnStore Indexes
• Resources for more information

How do columnstore indexes speed up queries?
• Columnstore indexes store data column-wise
  ▫ Each page stores data from a single column
• Highly compressed
  ▫ About 2x better than PAGE compression
  ▫ More data fits in memory
• Each column can be accessed independently
  ▫ Fetch only needed columns
  ▫ Can dramatically decrease IO
[figure: columns C1, C2, C3, C4 stored separately in a columnstore, contrasted with heaps and B-trees, which store data row-wise]

Columnstore index
• Column segment
  ▫ A segment contains values from one column for a set of rows
  ▫ Segments for the same set of rows comprise a row group
  ▫ Segments are compressed
  ▫ Each segment is stored in a separate LOB
  ▫ A segment is the unit of transfer between disk and memory
[figure: columns C1 through C6 divided into segments, grouped into row groups]

Index creation and storage
[figure: base table columns A, B, C, D are each encoded and compressed into column segments of about 1M rows per row group; the segments of each row group are stored as blobs in the columnstore index, and a segment directory tracks them]
• Segment directory
  ▫ New system table: sys.column_store_segments
  ▫ Includes segment metadata: size, min, max, …
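The segment directory described above can be inspected with an ordinary query. The sketch below assumes a hypothetical table dbo.FactSales that already has a nonclustered columnstore index; it lists each segment's row count, size, and min/max data IDs, which is the metadata segment elimination relies on. The min_data_id and max_data_id values are encoded values and may not equal the raw column values, and column_id is reported as the catalog view gives it.

```sql
-- Sketch: per-segment metadata for a hypothetical table dbo.FactSales.
SELECT s.column_id,          -- columnstore column ID as reported by the view
       s.segment_id,
       s.row_count,
       s.min_data_id,        -- encoded minimum, used for segment elimination
       s.max_data_id,        -- encoded maximum
       s.on_disk_size        -- bytes
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
  ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.FactSales')
ORDER BY s.column_id, s.segment_id;
```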

Observed compression ratios
[table: Data set | Uncompressed table size (MB) | Column store index size (MB) | Compression ratio, with rows for Cosmetics, SQM, Xbox, MSSales, Web Analytics, and Telecom (footnote markers 1 and 2); the per-row values are not recoverable]
…X better compression than SQL's page compression

Columnstore index example
[figure: a fact table with columns OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount]

Horizontally partition (Row Groups)
[figure: the table's rows split into two row groups]

Vertically partition (Segments)
[figure: each row group split into one segment per column]

Compress each segment
[figure: each segment compressed independently]

Fetch only needed columns
[figure: only the segments for the columns a query needs are read]

Creating ColumnStore Indexes
• Create the table
• Load data into the table
• Create a nonclustered columnstore index on all, or some, columns
  CREATE NONCLUSTERED COLUMNSTORE INDEX ncci ON myTable(OrderDate, ProductID, SaleAmount)
[screenshot: creating the index from Object Explorer]
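The three steps on the slide can be sketched end to end as follows. Only the table, index, and column names come from the slide; the column types, the dbo schema, and the sample rows are assumptions for illustration. Loading happens before the index is built because, in SQL Server 2012, a table with a nonclustered columnstore index is read-only.

```sql
-- 1. Create the table (column types are assumed).
CREATE TABLE dbo.myTable (
    OrderDate  date  NOT NULL,
    ProductID  int   NOT NULL,
    SaleAmount money NOT NULL
);

-- 2. Load data while the table is still updatable (sample rows are hypothetical).
INSERT INTO dbo.myTable (OrderDate, ProductID, SaleAmount)
VALUES ('20120401', 1, 10.00),
       ('20120402', 2, 25.50);

-- 3. Create the nonclustered columnstore index.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci
ON dbo.myTable (OrderDate, ProductID, SaleAmount);

-- Check that the index exists; its type_desc should read NONCLUSTERED COLUMNSTORE.
SELECT name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.myTable');
```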

Memory management
• Memory management is automatic
• Columnstore is persisted on disk
• Needed columns fetched into memory
• Columnstore segments flow between disk and memory
  SELECT C2, SUM(C4) FROM T GROUP BY C2;
[figure: segments of T.C2 and T.C4 move into memory, while T.C1 and T.C3 stay on disk]

IO and caching
• New (large) object cache
  ▫ Cache for column segments and dictionaries
• Aggressive read-ahead
  ▫ At segment level
  ▫ At page level within segment
• New memory broker
  ▫ Brokers memory between buffer pool and object cache
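To see how much memory the new object cache is holding at a given moment, the memory clerks DMV can be filtered to the columnstore object pool. A minimal sketch, assuming the clerk type name CACHESTORE_COLUMNSTOREOBJECTPOOL and the pages_kb column as they appear in SQL Server 2012; verify both on your build.

```sql
-- Sketch: current size of the columnstore object pool (clerk name assumed).
SELECT type,
       SUM(pages_kb) / 1024.0 AS size_mb
FROM sys.dm_os_memory_clerks
WHERE type = N'CACHESTORE_COLUMNSTOREOBJECTPOOL'
GROUP BY type;
```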

Data reduction
• Early segment elimination based on segment metadata
  ▫ Min and max values stored in metadata for each segment
• Simple filters evaluated in the storage engine during the columnstore index scan
  ▫ Conjunctions of comparisons, IN-lists
• Bitmap filters
  ▫ Evaluated during index scan
  ▫ Built by the Hash Table Build operator

Segment elimination
[figure: segments of OrderDateKey, ProductKey, and SalesAmount in two row groups, each labeled with its min and max values; the values are not recoverable]

Segment elimination
• Best practice: create the columnstore index from a clustered index
  ▫ Rows are distributed to row groups in clustered index order
  ▫ Does not affect ordering within segments
    ◦ VertiPaq orders data within row groups
  ▫ Good segment elimination for filters on the leading key column, using min/max values
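A sketch of the best practice above, assuming a hypothetical dbo.FactSales whose OrderDateKey holds yyyymmdd integers and is the column most queries filter on. The clustered index is built first, so the columnstore build reads rows in OrderDateKey order and each row group covers a narrow key range; a range filter on that column can then skip row groups whose min/max values fall outside it.

```sql
-- Hypothetical table and index names; OrderDateKey assumed to be a yyyymmdd int.
CREATE CLUSTERED INDEX cix_FactSales_OrderDateKey
ON dbo.FactSales (OrderDateKey);

-- Built from the clustered index, so row groups follow OrderDateKey order.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales
ON dbo.FactSales (CustomerKey, ProductKey, EmployeeKey, StoreKey, OrderDateKey, SalesAmount);

-- A range predicate on the leading key column is a candidate for segment elimination.
SELECT StoreKey, SUM(SalesAmount) AS SalesAmount
FROM dbo.FactSales
WHERE OrderDateKey >= 20120101 AND OrderDateKey < 20120201
GROUP BY StoreKey;
```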

Batch mode query execution
• Vector-oriented processing
• Compact data representation
• Highly efficient algorithms
• Better parallelism
[figure: "Would you rather process your data like this … … or like this?"]

Processing data
• Columnstore index scan can produce batches or rows
  ▫ Batch-enabled operators get batches
  ▫ Non-batch operators get rows
• Query optimizer decides
[figure: a batch object containing column vectors and a list of qualifying rows]

Using ColumnStore Indexes
• Let the query optimizer do the work
  ▫ Optimizer makes a cost-based decision
    ◦ Data access method: columnstore index | B-tree index | heap
    ◦ Processing mode: batch mode | row mode
• Most things “just work”
  ▫ Backup and restore
  ▫ Mirroring, log shipping
  ▫ SSMS
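Letting the optimizer decide does not stop you from checking its choice. A minimal sketch, assuming a hypothetical dbo.FactSales with a columnstore index: the same query runs once as the optimizer chooses and once with the SQL Server 2012 query hint IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX, which restricts it to row-store access, so the two timings can be compared.

```sql
SET STATISTICS TIME ON;

-- Optimizer's choice (may use the columnstore index and batch mode).
SELECT StoreKey, SUM(SalesAmount) AS SalesAmount
FROM dbo.FactSales
GROUP BY StoreKey;

-- Same query restricted to the row store, for comparison.
SELECT StoreKey, SUM(SalesAmount) AS SalesAmount
FROM dbo.FactSales
GROUP BY StoreKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

SET STATISTICS TIME OFF;
```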

Limitations on using columnstore indexes
• Creating a columnstore index
  ▫ Only on common business data types
    ◦ Yes: int, real, string, money, datetime, decimal <= 18 digits
    ◦ No: decimal > 18 digits, binary, varbinary, CLR, (n)varchar(max), varbinary(max), uniqueidentifier, datetimeoffset with precision > 2
• Maintaining the table: limited operations
  ▫ Can read but not update the data
  ▫ Can switch partitions in and out
• Processing queries
  ▫ All read-only T-SQL queries run
  ▫ Some queries are accelerated more than others
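Before creating the index, the catalog can flag columns that would block it. This sketch checks only the "No" list on the slide against a hypothetical table dbo.FactSales; the complete list of unsupported types is longer (xml, text, and timestamp are also excluded, for example), and alias types are not resolved to their base types here.

```sql
-- Sketch: columns whose types the slide lists as unsupported.
SELECT c.name AS column_name,
       t.name AS type_name,
       c.precision,
       c.scale,
       c.max_length
FROM sys.columns AS c
JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.FactSales')
  AND (   (t.name IN (N'decimal', N'numeric') AND c.precision > 18)
       OR  t.name IN (N'binary', N'varbinary', N'uniqueidentifier')
       OR (t.name IN (N'varchar', N'nvarchar') AND c.max_length = -1)  -- (n)varchar(max)
       OR (t.name = N'datetimeoffset' AND c.scale > 2)
       OR  t.is_assembly_type = 1);                                    -- CLR types
```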

Loading new data
• Table can be read, not updated
  ▫ Partition switching is allowed
  ▫ INSERT, UPDATE, DELETE, and MERGE are not allowed
• Methods for loading data
  ▫ Disable, update, rebuild
  ▫ Partition switching
  ▫ UNION ALL between a large table with a columnstore index and a smaller updatable table
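Two of the loading methods above can be sketched as follows. All object names (dbo.FactSales, ncci_FactSales, staging.NewSales, dbo.FactSalesDelta, dbo.FactSalesAll) are hypothetical. Disabling the columnstore index makes the table updatable again, and REBUILD re-enables it; the UNION ALL view lets queries see both the read-only table and a small updatable one.

```sql
-- Method: disable, update, rebuild.
ALTER INDEX ncci_FactSales ON dbo.FactSales DISABLE;

INSERT INTO dbo.FactSales (CustomerKey, ProductKey, EmployeeKey, StoreKey, OrderDateKey, SalesAmount)
SELECT CustomerKey, ProductKey, EmployeeKey, StoreKey, OrderDateKey, SalesAmount
FROM staging.NewSales;

ALTER INDEX ncci_FactSales ON dbo.FactSales REBUILD;  -- re-enables the index
GO

-- Method: UNION ALL with a small updatable table that has no columnstore index.
CREATE TABLE dbo.FactSalesDelta (
    CustomerKey  int   NOT NULL,
    ProductKey   int   NOT NULL,
    EmployeeKey  int   NOT NULL,
    StoreKey     int   NOT NULL,
    OrderDateKey int   NOT NULL,
    SalesAmount  money NOT NULL
);
GO

CREATE VIEW dbo.FactSalesAll
AS
SELECT CustomerKey, ProductKey, EmployeeKey, StoreKey, OrderDateKey, SalesAmount
FROM dbo.FactSales
UNION ALL
SELECT CustomerKey, ProductKey, EmployeeKey, StoreKey, OrderDateKey, SalesAmount
FROM dbo.FactSalesDelta;
GO
```

New rows go into dbo.FactSalesDelta during the day; a periodic job can move them into the large table with one of the other methods.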

When to build a columnstore index
• Workload
  ▫ Read-mostly
  ▫ Most updates are appends
  ▫ Star joins
  ▫ Queries that scan and aggregate large data volumes
• Workflow
  ▫ Permits partition switching (or drop and rebuild index)
  ▫ Typically a nightly load window
• Table size
  ▫ Large fact tables
  ▫ Consider for large dimension tables
  ▫ Very wide tables

When not to build a columnstore index
• Workload
  ▫ Frequent loads
  ▫ Many updates and deletes to existing data
    ◦ Especially if in multiple/unpredictable partitions
  ▫ Frequent small lookup queries
    ◦ B-tree indexes may give better performance
  ▫ Your workload does not benefit
• Workflow
  ▫ Partition switching or rebuilding the index does not fit your workflow

Best practices for creating the index
• Use a star schema when possible
  ▫ Build the columnstore index on fact tables
  ▫ Consider for large dimension tables
• Include all the columns in the columnstore index
  ▫ Don't use it to seek into a row
  ▫ Order of listed columns is not important
• Convert decimal to precision <= 18 if possible
• Use integer types whenever possible
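A sketch of two of these practices, assuming a hypothetical dbo.FactSales with a hypothetical decimal(38, 4) column DiscountAmount. The column is narrowed first (the ALTER fails if any stored value needs more than 18 digits), then the index column list is generated from the catalog so that every column is included. FOR XML PATH is used for string concatenation because SQL Server 2012 has no STRING_AGG.

```sql
-- Narrow a hypothetical decimal(38, 4) column to 18 digits.
ALTER TABLE dbo.FactSales
ALTER COLUMN DiscountAmount decimal(18, 4) NOT NULL;  -- match the column's existing nullability

-- Build a comma-separated list of every column in the table.
DECLARE @cols nvarchar(max) =
    STUFF((SELECT N', ' + QUOTENAME(c.name)
           FROM sys.columns AS c
           WHERE c.object_id = OBJECT_ID(N'dbo.FactSales')
           ORDER BY c.column_id
           FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'');

DECLARE @sql nvarchar(max) =
    N'CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales ON dbo.FactSales ('
    + @cols + N');';

PRINT @sql;                      -- review the generated statement
EXEC sys.sp_executesql @sql;
```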

Best practices for creating the index (continued)
• Ensure enough memory to build the columnstore index
• Consider table partitioning to facilitate updates
• Consider creating the columnstore index from a clustered index
  ▫ Better segment elimination when the predicate is on the key
  ▫ Slightly better compression (no RID)
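Partitioning makes updates possible by switching: new data is loaded into an ordinary staging table, indexed, and switched into an empty partition of the read-only table. A minimal sketch with hypothetical names, monthly boundaries on a yyyymmdd integer key, and everything on PRIMARY. The staging table needs the same columns, a matching columnstore index, the same filegroup, and a check constraint that proves its rows belong in the target partition, which must be empty.

```sql
CREATE PARTITION FUNCTION pfOrderDate (int)
AS RANGE RIGHT FOR VALUES (20120101, 20120201, 20120301);
GO
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.FactSalesPtnd (
    OrderDateKey int   NOT NULL,
    ProductKey   int   NOT NULL,
    StoreKey     int   NOT NULL,
    SalesAmount  money NOT NULL
) ON psOrderDate (OrderDateKey);

-- Aligned with the table's partition scheme by default.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSalesPtnd
ON dbo.FactSalesPtnd (OrderDateKey, ProductKey, StoreKey, SalesAmount);

-- Staging table for March 2012 on the same filegroup.
CREATE TABLE dbo.FactSalesStage (
    OrderDateKey int   NOT NULL
        CONSTRAINT ck_FactSalesStage_Mar2012
        CHECK (OrderDateKey >= 20120301 AND OrderDateKey < 20120401),
    ProductKey   int   NOT NULL,
    StoreKey     int   NOT NULL,
    SalesAmount  money NOT NULL
) ON [PRIMARY];

-- Load while the staging table is still updatable (source name is hypothetical).
INSERT INTO dbo.FactSalesStage (OrderDateKey, ProductKey, StoreKey, SalesAmount)
SELECT OrderDateKey, ProductKey, StoreKey, SalesAmount
FROM staging.MarchSales;

-- Give the staging table a matching columnstore index.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSalesStage
ON dbo.FactSalesStage (OrderDateKey, ProductKey, StoreKey, SalesAmount);

-- With RANGE RIGHT and these boundaries, partition 4 holds OrderDateKey >= 20120301.
ALTER TABLE dbo.FactSalesStage
SWITCH TO dbo.FactSalesPtnd PARTITION 4;
```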

Best practices for writing queries
• Consider modifying queries to hit the “sweet spot”
  ▫ Star joins
  ▫ Inner joins
  ▫ Group By
• Keep statistics up to date
• Use MAXDOP > 1
  ▫ Batch mode processing only for parallel queries
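A sketch of a query shaped for the sweet spot, with hypothetical dbo.FactSales and dbo.DimProduct tables: statistics are refreshed first so the cardinality estimates are current, and the star join uses an inner join plus GROUP BY. The MAXDOP 4 hint caps the degree of parallelism at 4; it permits a parallel plan (and so batch mode) but does not force one.

```sql
-- Refresh statistics on the fact table.
UPDATE STATISTICS dbo.FactSales;

-- Inner-join star query with aggregation, allowed to run in parallel.
SELECT p.ProductLine,
       SUM(f.SalesAmount) AS SalesAmount
FROM dbo.FactSales AS f
JOIN dbo.DimProduct AS p
  ON p.ProductKey = f.ProductKey
GROUP BY p.ProductLine
OPTION (MAXDOP 4);
```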

Troubleshooting: Creating the index
• Are you getting out-of-memory errors?
  ▫ Ensure enough memory
  ▫ The memory requirement depends on the number of columns, the data, and the DOP
  ▫ Memory available ≠ memory on the box when there is concurrent activity
  ▫ By default, a query's memory grant is capped at 25%, even when Resource Governor (RG) is not enabled
  ▫ Check showplan XML for memory grant info
  ▫ Rough estimate (see the FAQ on the TechNet wiki):
    Memory grant request in MB = [(4.2 * Num of columns in the CS index) + 68] * DOP + (Num of string cols * 34)
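Plugging illustrative numbers into the rough estimate (assumed for this example: 10 columns in the columnstore index, 2 of them strings, DOP 8):

$$
\begin{aligned}
\text{grant (MB)} &= [(4.2 \times 10) + 68] \times 8 + (2 \times 34) \\
&= 110 \times 8 + 68 \\
&= 948
\end{aligned}
$$

So this build would request roughly 0.9 GB. Because the first term scales with DOP, building with a lower DOP (for example through the MAXDOP index option) is one way to shrink the request.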

Troubleshooting: Creating the index
• Why is my index not building in parallel?
  ▫ The index build is parallel only if the table has > 1M rows
• How big is my columnstore index?
  ▫ For size and other info, check the new catalog views
    ◦ sys.column_store_segments
    ◦ sys.column_store_dictionaries
  ▫ Queries in the FAQ make it easy
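A sketch of such a size query, in the spirit of the FAQ queries, for a hypothetical table dbo.FactSales: it sums the on-disk size of all segments and all dictionaries that belong to the table's partitions and reports each in MB.

```sql
-- Sketch: columnstore size = segments + dictionaries (hypothetical table).
SELECT
    (SELECT SUM(s.on_disk_size)
     FROM sys.column_store_segments AS s
     JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
     WHERE p.object_id = OBJECT_ID(N'dbo.FactSales')) / 1048576.0 AS segments_mb,
    (SELECT SUM(d.on_disk_size)
     FROM sys.column_store_dictionaries AS d
     JOIN sys.partitions AS p ON p.hobt_id = d.hobt_id
     WHERE p.object_id = OBJECT_ID(N'dbo.FactSales')) / 1048576.0 AS dictionaries_mb;
```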

Troubleshooting: Query performance
• Is the columnstore index being used?

Troubleshooting: Query performance
• If the columnstore index is not being used:
  ▫ Are all needed columns present?
  ▫ Cardinality estimate?
    ◦ If the query is selective, the optimizer will choose a B-tree
• Are other nonclustered indexes being used?
  ▫ Too many indexes + bad statistics → optimizer confusion
  ▫ Consider using hints and/or disabling other indexes
• If the columnstore index is being used, are there other issues?
  ▫ Sorts, spills?
  ▫ Table spools?
  ▫ Is a lot of data being returned to the client?
    ◦ Not all bottlenecks are query processing
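Both suggestions from the slide can be tried directly. In this sketch, dbo.FactSales, ncci_FactSales, and the competing B-tree index ix_FactSales_StoreKey are hypothetical names. A table hint forces the columnstore index for one query; disabling the competing index takes it out of consideration until it is rebuilt.

```sql
-- Force the columnstore index for this query only.
SELECT StoreKey, SUM(SalesAmount) AS SalesAmount
FROM dbo.FactSales WITH (INDEX (ncci_FactSales))
WHERE OrderDateKey >= 20120101
GROUP BY StoreKey;

-- Take a competing B-tree index out of play while testing.
ALTER INDEX ix_FactSales_StoreKey ON dbo.FactSales DISABLE;

-- Rebuilding re-enables it afterwards.
ALTER INDEX ix_FactSales_StoreKey ON dbo.FactSales REBUILD;
```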

Troubleshooting: Query performance
• Is batch mode being used to process most of the data?

Troubleshooting: Query performance
• If batch mode is not being used to process most of the data:
  ▫ Is a columnstore index being used?
  ▫ Outer joins?
  ▫ DOP?
  ▫ Loop join? Check the cardinality estimate
  ▫ Operators not enabled for batch mode?
    ◦ Batch-enabled: scan, filter, project; local hash partial aggregation; hash inner join, hash table build
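To check many cached queries at once, the plan cache can be searched for plans that scan a columnstore and report whether any operator runs in batch mode. A sketch, assuming the SQL Server 2012 showplan schema, in which a RelOp element carries an EstimatedExecutionMode attribute and an IndexScan over a columnstore carries Storage="ColumnStore"; verify these attribute names against your build.

```sql
WITH XMLNAMESPACES (DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       st.text AS query_text,
       qp.query_plan.exist(N'//RelOp[@EstimatedExecutionMode="Batch"]') AS has_batch_operator,
       qs.total_elapsed_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist(N'//IndexScan[@Storage="ColumnStore"]') = 1   -- columnstore scans only
ORDER BY qs.total_elapsed_time DESC;
```

Rows where has_batch_operator is 0 are the ones worth walking through the checklist above.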

Troubleshooting: Query performance
• Filters or joins on strings?
  ▫ Filters on strings are not pushed into the storage engine
  ▫ Joins on integers are more efficient
• Filter with “OR”?
  ▫ IN-lists, but not OR filters, are pushed down
• Hash tables don't fit into memory?
  ▫ Usually due to a small memory grant based on a cardinality estimation (CE) error, not a physical memory limitation
  ▫ Falls back to row mode processing
  ▫ Slower than a row mode join
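Two rewrites that follow from this slide, sketched on hypothetical dbo.FactSales and dbo.DimProduct tables. The first expresses OR predicates on one column as an IN-list, which the slide says can be pushed down. The second filters the string in the dimension table and reaches the fact table through an integer join key, instead of filtering a string column in the fact table.

```sql
-- OR filter rewritten as an IN-list.
-- Before: WHERE f.StoreKey = 10 OR f.StoreKey = 20 OR f.StoreKey = 30
SELECT f.ProductKey, SUM(f.SalesAmount) AS SalesAmount
FROM dbo.FactSales AS f
WHERE f.StoreKey IN (10, 20, 30)
GROUP BY f.ProductKey;

-- String predicate on the dimension, integer join to the fact table.
SELECT f.StoreKey, SUM(f.SalesAmount) AS SalesAmount
FROM dbo.FactSales AS f
JOIN dbo.DimProduct AS p
  ON p.ProductKey = f.ProductKey
WHERE p.ProductLine = 'M'
GROUP BY f.StoreKey;
```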

Real customer experiences
Customer type | Industry segment/application | Measure | Without ColumnStore Index (sec) | With ColumnStore Index | Improvement
External | Online services | Query | … | … | … x
External | Retail | Query | … | … | … x
External | Healthcare | Set of 6 queries | … | … | … x
Internal | HR reporting | Avg. response time on production system | … | … | … x
Internal | Financial reporting | 3 queries | … | … | Each > 50x
Internal | Financial reporting | Queries taking longer than 10 min | … | … | 90% reduction

Take-aways
• Columnstore indexes can enable phenomenal performance gains
• Batch mode processing is an essential ingredient for speedup
• Some adjustments to schema and loading processes may be necessary
• Some queries can benefit from tuning
• Columnstore indexes are not a magic bullet

Resources
• Columnstore FAQ:
  ▫ http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-index-faq.aspx
• Tuning Guide:
  ▫ http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-performance-tuning.aspx
• SIGMOD paper:
  ▫ http://dl.acm.org/citation.cfm?doid=…9448

Thank you! Questions?

Data Warehouse Workload

Data warehouse workload
• Read-mostly
  ▫ Load large amounts of data
  ▫ Append new data incrementally
  ▫ Rarely update existing data
  ▫ Often retain data for a given window of time (e.g., 1 yr, 3 yr, 7 yr)
    ◦ Sliding window data management
• Queries touch large amounts of data
  ▫ Join multiple tables
  ▫ Large “fact” tables
• Star schema is common
• Star joins are common

Sliding window

Star schema
[figure: fact table FactSales joined to dimension tables DimCustomer, DimProduct, DimDate, DimEmployee, and DimStore]
FactSales(CustomerKey int, ProductKey int, EmployeeKey int, StoreKey int, OrderDateKey int, SalesAmount money)
DimCustomer(CustomerKey int, FirstName nvarchar(50), LastName nvarchar(50), Birthdate date, Address nvarchar(50))
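The two table definitions on the slide translate into T-SQL as below, with the columnstore index placed on the fact table as the best-practice slides recommend. The column names and types come from the slide; the dbo schema, the NULL/NOT NULL choices, the primary key on DimCustomer, and the index name are assumptions.

```sql
CREATE TABLE dbo.DimCustomer (
    CustomerKey int          NOT NULL PRIMARY KEY,
    FirstName   nvarchar(50) NULL,
    LastName    nvarchar(50) NULL,
    Birthdate   date         NULL,
    Address     nvarchar(50) NULL
);

CREATE TABLE dbo.FactSales (
    CustomerKey  int   NOT NULL,
    ProductKey   int   NOT NULL,
    EmployeeKey  int   NOT NULL,
    StoreKey     int   NOT NULL,
    OrderDateKey int   NOT NULL,
    SalesAmount  money NOT NULL
);

-- All fact-table columns in the columnstore index.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales
ON dbo.FactSales (CustomerKey, ProductKey, EmployeeKey, StoreKey, OrderDateKey, SalesAmount);
```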

Star join query
SELECT TOP 10 p.ModelName, p.EnglishDescription,
       SUM(f.SalesAmount) AS SalesAmount
FROM FactResellerSalesPart f, DimProduct p, DimEmployee e
WHERE f.ProductKey = p.ProductKey
  AND e.EmployeeKey = f.EmployeeKey
  AND f.OrderDateKey >= …
  AND p.ProductLine = 'M' -- Mountain
  AND p.ModelName LIKE '%Frame%'
  AND e.SalesTerritoryKey = 1
GROUP BY p.ModelName, p.EnglishDescription
ORDER BY SUM(f.SalesAmount) DESC;

“Typical” data warehouse queries
• Process large amounts of data
  ▫ Joins, aggregation, filtering
• Reporting queries
• Ad hoc queries
• Often slow (minutes to hours)
• DBAs spend considerable effort
  ▫ Designing indexes, tuning queries
  ▫ Building summary tables, indexed views, OLAP cubes