Columnstore Indexes in SQL Server 2012 Conor Cunningham Principal Architect, Microsoft SQL Server Representing Microsoft Development Center Serbia

What This Talk Covers
SQL Server’s upcoming “Denali” release contains a new feature that speeds up Data Warehouse queries
This talk provides an overview of the new surface area and some details about how it works

Who am I?
I’ve worked at Microsoft on the SQL Core Engine team as an Architect for many years
I work mostly on Query Processors
I wrote the SQL 2008 Internals book on how the Query Optimizer works
I blog at “Conor vs. SQL”
I like to talk to customers about how they use the product so that I can improve things in future releases

Agenda
Data Warehouse Introduction
New Feature and Demo
How the Feature Works
Restrictions in this release

Data Warehouse Introduction
Data Warehouses support reporting and business intelligence operations in organizations
Store facts that can be aggregated over different dimensions
They often store lots and lots of facts (rows)
– This leads to a design pattern called a star schema where fact tables are “over”-normalized to reduce row width
– Dimension tables are frequently joined
– Results are very often aggregated
Example: Show me the sales totals for each department by month for the past 3 years
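
The example request above can be sketched as a small star-schema query. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the warehouse, and every table name, column name, and data value (FactSales, DimDepartment, DimDate, and so on) is a hypothetical choice, not something taken from the talk.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimDepartment (DeptKey INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER);
    CREATE TABLE FactSales (DeptKey INTEGER, DateKey INTEGER, Amount REAL);
""")
conn.executemany("INSERT INTO DimDepartment VALUES (?, ?)",
                 [(1, "Toys"), (2, "Books")])
conn.executemany("INSERT INTO DimDate VALUES (?, ?, ?)",
                 [(20110101, 2011, 1), (20110102, 2011, 1), (20110201, 2011, 2)])
conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?)",
                 [(1, 20110101, 10.0), (1, 20110102, 5.0),
                  (2, 20110101, 7.0), (1, 20110201, 3.0)])


def sales_by_department_and_month(connection, first_year):
    """Join the narrow fact table to its dimensions, then aggregate."""
    query = """
        SELECT d.DeptName, t.Year, t.Month, SUM(f.Amount) AS Total
        FROM FactSales AS f
        JOIN DimDepartment AS d ON d.DeptKey = f.DeptKey
        JOIN DimDate AS t ON t.DateKey = f.DateKey
        WHERE t.Year >= ?
        GROUP BY d.DeptName, t.Year, t.Month
        ORDER BY d.DeptName, t.Year, t.Month
    """
    return connection.execute(query, (first_year,)).fetchall()


rows = sales_by_department_and_month(conn, 2009)
for row in rows:
    print(row)
```

The fact table holds only keys and a measure, which keeps rows narrow; the descriptive attributes live in the dimension tables and are pulled in by the joins.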

Data Warehouse Challenges
These kinds of databases become difficult once they get big.
– Query latency
– ETL load times
– Backup time and size
– Index rebuilding
– Finding time to load new data
– Query plan selection issues/limitations

Opportunity
What If…
– We made DW queries 10+ times faster?
Example: Business Analyst does ROLAP reports against SQL Server 2008
– Click to drill down into a report
– Go get some coffee
– Click again
– Go get more coffee
We aim to make that experience interactive
– (However, coffee shop profits may plunge!)

Demo

How Does It Work?
New Index Type: ColumnStore
New Query Execution Algorithms: “Batch” mode
Specifically Target Star Join Queries
▫ Not all queries are faster in the initial release
▫ Customers will want to consider this in their application design
Supported Pattern:
SELECT SUM(…), cols
FROM FactTbl JOIN DimTbl1 JOIN DimTbl2 …
WHERE …
GROUP BY cols

Index Storage Design
Column-Orientation
– Store data vertically instead of per-row
– String Dictionaries for variable-length data
Segment data into groups (1 million rows/group)
Benefits
– DW queries usually pick only a subset of columns
– You can do the IO only for those columns
– We can also compress that data effectively since it often has lots of duplicates
– Space savings of 1.5x-2x vs. a row-based page-compressed equivalent
[Figure: IO patterns for CI scan, column-based scan of 3 columns, and column-based scan with compression]
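
The storage ideas above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the engine's actual format: the function names are hypothetical, and run-length encoding is used only as one illustrative way to exploit duplicates (the slide does not say which compression schemes are applied).

```python
from itertools import groupby


def to_columns(rows, column_names):
    """Pivot row tuples into one list per column (a column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def dictionary_encode(values):
    """Replace each value with a small integer id; return (dictionary, ids)."""
    dictionary = {}
    ids = []
    for value in values:
        # A new value gets the next free id; a repeated value reuses its id.
        ids.append(dictionary.setdefault(value, len(dictionary)))
    return list(dictionary), ids


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def split_into_segments(values, rows_per_segment):
    """Cut a column into fixed-size row groups (the talk cites 1 million rows)."""
    return [values[i:i + rows_per_segment]
            for i in range(0, len(values), rows_per_segment)]


rows = [("Toys", "WA", 10), ("Toys", "WA", 12), ("Books", "WA", 7), ("Books", "OR", 7)]
columns = to_columns(rows, ["dept", "state", "qty"])

dept_dictionary, dept_ids = dictionary_encode(columns["dept"])
print(dept_dictionary, dept_ids)
print(run_length_encode(dept_ids))
print(run_length_encode(columns["state"]))
print(split_into_segments(list(range(5)), 2))
```

A query that only needs `qty` touches a single list here, which mirrors the point that IO is done only for the columns a query references.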

Speedup from the Index
If the IO required is cut in half…
– We don’t get to 10 times faster (yet)
– We need to improve the memory utilization and CPU utilization to get the rest of the speedup
So how do we improve Query Execution 10x???
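
A back-of-the-envelope calculation shows why halving IO alone cannot reach 10x. The sketch below is an Amdahl-style estimate that assumes IO time shrinks in proportion to IO volume; the function name and the sample IO fractions are illustrative assumptions, not measurements from the talk.

```python
def overall_speedup(io_fraction, io_reduction_factor):
    """Estimate total speedup when only the IO share of run time shrinks.

    io_fraction: share of the original run time spent on IO (0.0 to 1.0).
    io_reduction_factor: how many times less IO is done (2.0 means half).
    """
    remaining_time = (1.0 - io_fraction) + io_fraction / io_reduction_factor
    return 1.0 / remaining_time


# Even a query that is 100% IO-bound only doubles in speed when IO is halved.
print(overall_speedup(1.0, 2.0))
# A query that spends 60% of its time on IO gains noticeably less.
print(round(overall_speedup(0.6, 2.0), 2))
```

Under these assumptions the ceiling is 2x, so the remaining gain has to come from CPU and memory efficiency.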

What takes time in a CPU?
Memory IO takes time
– Cache Misses stall the CPU
– L2 cache misses stall the CPU even more
– So we reduced cache misses
Instructions take time
– Instructions also go through the caches
– So we reduced instructions
Disk Access takes time
– So we biased the memory policies for this index to work best when in memory
Over time, CPU speed has increased faster than memory speed, making all of these worse

Query Execution Row Mode Changes
Each operator calls its child once for each row
This works fine for smaller numbers of rows, poorly for large batches of rows
In bigger queries, the CPU cycles instructions in and out of its caches (L2 cache misses)
So this model suffers in DW with too many instructions, too many cache misses

Batch Format
Column-Oriented
Sized to fit within L2 cache
Multiple Operators work on a batch sequentially
Goal: Reduce avg. per-tuple cost
– Compression
– Reducing L2 data and instruction cache misses
– Probabilistic data representations
– Probabilistic operator execution algorithms
This gets us to 10x faster (avg)
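
The difference between row-at-a-time and batch-at-a-time execution can be sketched as follows. This toy example only counts how often the per-call overhead is paid; Python cannot model L2 cache behaviour, and the function names, column names, and batch size are hypothetical.

```python
def sum_row_mode(rows, predicate_column, value_column, threshold):
    """Row mode: the filter/aggregate logic is entered once per row."""
    calls = 0
    total = 0
    for row in rows:
        calls += 1  # one "get next row" call per input row
        if row[predicate_column] > threshold:
            total += row[value_column]
    return total, calls


def sum_batch_mode(columns, predicate_column, value_column, threshold, batch_size):
    """Batch mode: the same logic is entered once per batch of column values."""
    calls = 0
    total = 0
    row_count = len(columns[value_column])
    for start in range(0, row_count, batch_size):
        calls += 1  # one "get next batch" call per batch
        keys = columns[predicate_column][start:start + batch_size]
        values = columns[value_column][start:start + batch_size]
        total += sum(v for k, v in zip(keys, values) if k > threshold)
    return total, calls


rows = [{"qty": i, "amount": 2 * i} for i in range(10)]
columns = {"qty": [r["qty"] for r in rows], "amount": [r["amount"] for r in rows]}

print(sum_row_mode(rows, "qty", "amount", 4))
print(sum_batch_mode(columns, "qty", "amount", 4, batch_size=4))
```

Both functions return the same total, but the batch version pays the call overhead a few times instead of once per row, which is the per-tuple cost the slide aims to reduce.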

SQL 2012 Restrictions
Create index:
– Only on common business data types
Maintain table: limited operations
– Can read but not update the data
– However: One can switch partitions in and out
Process queries: all read-only T-SQL queries run
– Some queries are accelerated more than others
Data type support:
Yes: int, real, string, money, datetime, decimal <= 18 digits
No: decimal > 18 digits, binary, BLOB, CLR, (n)varchar(max), uniqueidentifier, datetimeoffset with precision > 2

Using Apollo: Loading new data
Table with columnstore index can be read, not updated
– Partition switching is allowed
– INSERT, UPDATE, DELETE, and MERGE not allowed
Three possible methods for loading data
– Disable, update, rebuild
– Partition switching
– UNION ALL between large table with columnstore and smaller updateable table
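
The third loading method, UNION ALL over a large read-only table and a small updateable one, can be sketched as below. SQLite (via Python's sqlite3 module) has no columnstore index, so this only demonstrates the query shape; the table and view names are hypothetical, and in SQL Server the history table would be the one carrying the columnstore index.

```python
import sqlite3

# SQLite stands in for SQL Server here; it only illustrates the UNION ALL shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Large table: treated as read-only (would carry the columnstore index).
    CREATE TABLE FactSalesHistory (SaleId INTEGER, Amount REAL);
    -- Small table: receives all new rows between bulk reloads.
    CREATE TABLE FactSalesDelta (SaleId INTEGER, Amount REAL);
    -- Queries read from the view so they always see both tables.
    CREATE VIEW FactSalesAll AS
        SELECT SaleId, Amount FROM FactSalesHistory
        UNION ALL
        SELECT SaleId, Amount FROM FactSalesDelta;
""")

# Initial bulk load of the large table.
conn.executemany("INSERT INTO FactSalesHistory VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0)])


def record_sale(connection, sale_id, amount):
    """New data goes only into the small updateable table."""
    connection.execute("INSERT INTO FactSalesDelta VALUES (?, ?)", (sale_id, amount))


def total_sales(connection):
    """Reports query the view and therefore see history plus recent rows."""
    return connection.execute("SELECT SUM(Amount) FROM FactSalesAll").fetchone()[0]


record_sale(conn, 3, 5.0)
print(total_sales(conn))
```

Periodically the delta rows would be folded into the large table using one of the other two methods (disable/update/rebuild, or partition switching).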

Query performance issues
Not all operators are batch-mode enabled; the ones that are:
– Scan, Filter, Project
– Local hash partial aggregation
– Hash inner join, hash table build
Only parallel queries can use batch mode
If hash tables don’t fit into memory, fall back to row-mode processing
– Memory grant request depends on cardinality est.
– Falling back to row-mode is slow
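
The interaction between cardinality estimates, memory grants, and the row-mode fallback can be sketched with a deliberately simplified model. Everything here is an assumption made for illustration: the function name, the linear "rows times bytes" sizing, and the string return values do not reflect how SQL Server actually computes grants.

```python
def choose_execution_mode(estimated_rows, actual_rows, bytes_per_row, is_parallel_plan):
    """Toy model of the batch-mode conditions listed on the slide.

    The memory grant is sized from the estimate; if the real hash table
    needs more than that, the query falls back to row mode.
    """
    if not is_parallel_plan:
        return "row"  # only parallel queries can use batch mode
    memory_grant = estimated_rows * bytes_per_row  # based on the estimate
    memory_needed = actual_rows * bytes_per_row    # based on reality
    return "batch" if memory_needed <= memory_grant else "row"


# Good estimate: the hash table fits in the grant.
print(choose_execution_mode(1000, 900, 16, True))
# Underestimate: the hash table outgrows the grant and row mode is used.
print(choose_execution_mode(1000, 5000, 16, True))
# Serial plan: batch mode is not available at all.
print(choose_execution_mode(1000, 900, 16, False))
```

The model makes one point from the slide concrete: an underestimated row count can silently cost the batch-mode speedup, so accurate statistics matter.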

Revisiting Our Example Scenario
For SQL Server 2012, our customer will be able to:
– Have specific queries go very fast (with less coffee)
– DW Application developers
   Must design their code to load/unload data online
   Can use hints to control user experience for the fast and slow cases
– Hint index: if it fails to get a plan, then you can present UI to the user to “maybe go get coffee” and then run in row mode
This story will continue to improve as we add more capabilities to Batch processing

Summary
New Index and Execution Algorithms for DW
Significant speedup for conforming applications
Opportunities for customers who can build their code to leverage the benefits

Thank You! Questions?
