• An independent SQL Consultant
• A user of SQL Server from version 2000 onwards, with 12+ years' experience.

Presentation transcript:


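CPU Cache, Memory and IO Subsystem Latency
[Figure: latency scale running from the core and its L1, L2 and L3 caches out to memory and the IO subsystem, with marks at 1 ns, 10 ns, 100 ns, 10 us, 100 us and 10 ms]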

The “Cache out” Curve
[Figure: throughput against touched data size, with regions for CPU cache, TLB, NUMA remote and storage]
Every time we drop out of a cache and use the next slower one down, we pay a big throughput penalty.

Sequential Versus Random Page CPU Cache Throughput
[Figure: CPU caches; service time + wait time]

Moore's Law vs. Advancements in Disk Technology
• “Transistors per square inch on integrated circuits has doubled every two years since the integrated circuit was invented”
• Spinning disk state of play:
  • Interfaces have evolved
  • Areal density has increased
  • Rotation speed has peaked at 15K RPM
  • Not much else...
• Up until NAND flash, disk-based IO subsystems have not kept pace with CPU advancements.
• With next-generation storage (resistive RAM etc.), CPUs and storage may follow the same curve.

How do rows travel between iterators? Row by row.
[Diagram: control flow and data flow between iterators]

• Query execution which leverages CPU caches.
• Breakthrough levels of compression to bridge the performance gap between IO subsystems and modern processors.
• Better query execution scalability as the degree of parallelism increases.

• First introduced in SQL Server 2012, greatly enhanced in 2014.
• A batch is roughly 1000 rows in size and is designed to fit into the L2/L3 cache of the CPU; remember the slide on latency.
• Moving batches around is very efficient*: one test showed that a regular row-mode hash join consumed about 600 instructions per row, while the batch-mode hash join needed about 85 instructions per row and, in the best case (small, dense join domain), was as low as 16 instructions per row.
* From: Enhancements To SQL Server Column Stores, Microsoft Research
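The difference between handing rows to the next operator one at a time and handing them over in batches can be sketched in Python. This is an illustration only and not SQL Server's implementation: the function names, the `BATCH_SIZE` constant (taken from the "roughly 1000 rows" figure) and the hand-off counter are all assumptions made for the example.

```python
from itertools import islice

BATCH_SIZE = 1000  # assumed, following the "roughly 1000 rows" figure


def scan_rows(table):
    """Row mode: hand one row at a time to the consumer."""
    for row in table:
        yield row


def scan_batches(table, batch_size=BATCH_SIZE):
    """Batch mode: hand a list of up to batch_size rows to the consumer."""
    it = iter(table)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def sum_row_mode(table):
    """Aggregate row by row; returns (total, number of hand-offs)."""
    total, handoffs = 0, 0
    for row in scan_rows(table):
        total += row
        handoffs += 1
    return total, handoffs


def sum_batch_mode(table):
    """Aggregate batch by batch; returns (total, number of hand-offs)."""
    total, handoffs = 0, 0
    for batch in scan_batches(table):
        total += sum(batch)  # tight loop over data that is already together
        handoffs += 1
    return total, handoffs


order_quantities = list(range(10_000))  # made-up column values
row_total, row_handoffs = sum_row_mode(order_quantities)
batch_total, batch_handoffs = sum_batch_mode(order_quantities)
print(row_total == batch_total, row_handoffs, batch_handoffs)
```

Both functions return the same total, but the batch version makes one hand-off per 1,000 rows instead of one per row, which is the kind of per-row overhead the instruction counts above refer to.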

SELECT p.EnglishProductName,
       SUM([OrderQuantity]),
       SUM([UnitPrice]),
       SUM([ExtendedAmount]),
       SUM([UnitPriceDiscountPct]),
       SUM([DiscountAmount]),
       SUM([ProductStandardCost]),
       SUM([TotalProductCost]),
       SUM([SalesAmount]),
       SUM([TaxAmt]),
       SUM([Freight])
FROM [dbo].[FactInternetSales] f
JOIN [dbo].[DimProduct] p ON f.ProductKey = p.ProductKey
GROUP BY p.EnglishProductName

xperf -on base -stackwalk profile
xperf -d stackwalk.etl
xperfview stackwalk.etl

Conceptual view... and what's happening in the call stack
[Diagram: CPU and blob (LOB) cache. Load segments into the blob cache, then break blobs into batches and pipeline them into the CPU cache.]

x12 at DOP 2

Feature | SQL Server 2012 | SQL Server 2014
Presence of column store indexes | Yes | Yes
Parallel execution plan | Yes | Yes
No outer joins, NOT INs or UNION ALLs | Yes | No
Hash joins do not spill from memory | Yes | No
Scalar aggregates cannot be used | Yes | No

Batch mode Hash Match Aggregate: 78,400 ms*
vs.
Row mode Hash Match Aggregate: 445,585 ms*
* Timings are a statistical estimate
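As a rough worked calculation, the ratio between the two Hash Match Aggregate timings can be computed directly. The variable names are only for illustration, and since the timings are described as statistical estimates, the ratio is an estimate too.

```python
row_mode_ms = 445_585    # row mode Hash Match Aggregate timing from the slide
batch_mode_ms = 78_400   # batch mode Hash Match Aggregate timing from the slide

speedup = row_mode_ms / batch_mode_ms
print(f"batch mode is about {speedup:.1f}x faster for this operator")
```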

[Diagram: a Colour column (Red, Blue, Green) is encoded through a dictionary and stored as a segment of run lengths]

Dictionary
Lookup ID | Label
1 | Red
2 | Blue
3 | Green

Segment
Lookup ID | Run Length

• Compressing data going down the column is far superior to compressing data going across the row; also, we only retrieve the column data that is of interest.
• Run length compression is used in order to achieve this.
• SQL Server 2012 introduces column store compression; SQL Server 2014 adds more features to this.
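A minimal Python sketch of the two encodings named on this slide: dictionary encoding followed by run-length encoding. The function names and the sample `colour` column are made up for illustration (the run lengths on the original slide did not survive), and this is not how SQL Server lays segments out internally.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer lookup ID."""
    dictionary = {}
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary) + 1  # IDs start at 1, as on the slide
        ids.append(dictionary[value])
    return dictionary, ids


def run_length_encode(ids):
    """Collapse consecutive repeats into (lookup_id, run_length) pairs."""
    runs = []
    for lookup_id in ids:
        if runs and runs[-1][0] == lookup_id:
            runs[-1][1] += 1
        else:
            runs.append([lookup_id, 1])
    return [tuple(run) for run in runs]


def decode(dictionary, runs):
    """Expand the runs back into the original column values."""
    labels = {lookup_id: label for label, lookup_id in dictionary.items()}
    return [labels[lookup_id] for lookup_id, length in runs for _ in range(length)]


# Hypothetical column contents.
colour = ["Red", "Red", "Red", "Blue", "Blue", "Green", "Green", "Green", "Green"]
dictionary, ids = dictionary_encode(colour)
segment = run_length_encode(ids)
print(dictionary)
print(segment)
```

Nine string values shrink to a three-entry dictionary plus three (ID, run length) pairs, and `decode(dictionary, segment)` restores the original column. Long runs of repeated values are what make compressing down a column so effective.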

SQL Server 2014 Column Store Storage Internals
[Diagram: row groups of columns A, B and C are encoded and compressed into segments, which are stored as blobs; delta stores hold < 102,400 rows]

• Inserts of 102,400 rows and over
• Inserts of fewer than 102,400 rows, and updates (update = insert into the delta store + insert into the deletion bitmap)
[Diagram labels: delta store (B-tree), tuple mover, column store segments, local dictionary, global dictionary, deletion bitmap]
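The routing rule on this slide can be sketched as a toy model in Python. The class, its attribute names and the use of plain lists and a set are assumptions for illustration; the real delta store is a B-tree, the tuple mover is not modelled here, and nothing below reflects SQL Server's actual code.

```python
BULK_LOAD_THRESHOLD = 102_400  # row count quoted on the slide


class ColumnStoreSketch:
    """Toy model of where inserted and updated rows land."""

    def __init__(self):
        self.compressed_segments = []  # stand-in for compressed column store segments
        self.delta_store = []          # stand-in for the B-tree delta store
        self.deleted = set()           # stand-in for the deletion bitmap

    def insert(self, rows):
        """Route a batch of rows by size and report where it went."""
        rows = list(rows)
        if len(rows) >= BULK_LOAD_THRESHOLD:
            self.compressed_segments.append(rows)
            return "column store segments"
        self.delta_store.extend(rows)
        return "delta store"

    def update(self, old_row, new_row):
        """update = insert into delta store + insert into the deletion bitmap."""
        self.deleted.add(old_row)
        self.delta_store.append(new_row)


store = ColumnStoreSketch()
big = store.insert(range(102_400))  # large insert
small = store.insert(range(500))    # small insert
store.update(7, 7_000_000)          # hypothetical row values
print(big, small, len(store.delta_store), store.deleted)
```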

SELECT [ProductKey],[OrderDateKey],[DueDateKey],[ShipDateKey],[CustomerKey],[PromotionKey],[CurrencyKey].
INTO FactInternetSalesBig
FROM [dbo].[FactInternetSales]
CROSS JOIN master..spt_values AS a
CROSS JOIN master..spt_values AS b
WHERE a.type = 'p'
AND b.type = 'p'
AND a.number <= 80
AND b.number <=

,116,038 rows
[Chart: Size (Mb), with labels 57 %, 74 %, 92 % and 94 %]

* Posts tables from the four largest Stack Exchange sites combined (superuser, serverfault, maths and Ubuntu)
[Chart labels: 59 %, 53 %, 64 %, 72 %]

Feature | SQL Server 2012 | SQL Server 2014
Column store indexes | Yes | Yes
Clustered column store indexes | No | Yes
Updateable column store indexes | No | Yes
Column store archive compression | No | Yes
Columns in a column store index can be dropped | No | Yes
Support for GUID, binary, datetimeoffset precision > 2, numeric precision > 18 | No | Yes
Enhanced compression by storing short strings natively (instead of 32 bit IDs) | No | Yes
Bookmark support (row_group_id:tuple_id) | No | Yes
Mixed row / batch mode execution | No | Yes
Optimized hash build and join in a single iterator | No | Yes
Hash memory spills cause row mode execution | No | Yes
Iterators supported | Scan, filter, project, hash (inner) join and (local) hash aggregate | Yes

Disclaimer: your own mileage may vary depending on your data, hardware and queries

Hardware  2 x 2.0 Ghz 6 core Xeon CPUs  Hyper threading enabled  22 GB memory  Raid 0: 6 x 250 GB SATA III HD 10K RPM  Raid 0: 3 x 80 GB Fusion IO Software  Windows server 2012  SQL Server 2014 CTP 2  AdventureWorksDW DimProductTable  Enlarged FactInternetSales table

SELECT SUM([OrderQuantity]),
       SUM([UnitPrice]),
       SUM([ExtendedAmount]),
       SUM([UnitPriceDiscountPct]),
       SUM([DiscountAmount]),
       SUM([ProductStandardCost]),
       SUM([TotalProductCost]),
       SUM([SalesAmount]),
       SUM([TaxAmt]),
       SUM([Freight])
FROM [dbo].[FactInternetSalesBig]

[Chart labels: 2050 Mb/s, 678 Mb/s, 256 Mb/s; 85% CPU, 98% CPU, 98% CPU]

Page compression: 1,340,097 ms*
vs.
No compression: 545,761 ms*
* All stack trace timings are a statistical estimate

52 Mb/s 27 Mb/s 99% CPU 56% CPU

Clustered column store index with archive compression: 61,196 ms
vs.
Clustered column store index: 60,651 ms

CPU used for IO consumption + CPU used for decompression < total CPU capacity
Compression works for you
What most people tend to have

CPU used for IO consumption + CPU used for decompression > total CPU capacity
Compression works against you

CPU used for IO consumption + CPU used for decompression = total CPU capacity
Nothing to be gained or lost from using compression
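The three cases (less than, greater than and equal to total CPU capacity) amount to a simple comparison, sketched here in Python. The function name, the choice of percent-of-total-CPU as the unit and the sample figures are all assumptions for illustration and are not measurements from the tests.

```python
def compression_verdict(cpu_for_io, cpu_for_decompression, total_cpu_capacity=100):
    """Classify a workload by comparing CPU demand with CPU capacity.

    All arguments are in the same unit (assumed here: percent of total CPU).
    """
    demand = cpu_for_io + cpu_for_decompression
    if demand < total_cpu_capacity:
        return "compression works for you"
    if demand > total_cpu_capacity:
        return "compression works against you"
    return "nothing to be gained or lost"


# Hypothetical workloads.
print(compression_verdict(40, 30))  # spare CPU headroom
print(compression_verdict(70, 50))  # CPU is the bottleneck
print(compression_verdict(60, 40))  # exactly at capacity
```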

SELECT p.EnglishProductName,
       SUM([OrderQuantity]),
       SUM([UnitPrice]),
       SUM([ExtendedAmount]),
       SUM([UnitPriceDiscountPct]),
       SUM([DiscountAmount]),
       SUM([ProductStandardCost]),
       SUM([TotalProductCost]),
       SUM([SalesAmount]),
       SUM([TaxAmt]),
       SUM([Freight])
FROM [dbo].[FactInternetSalesBig] f
JOIN [dbo].[DimProduct] p ON f.ProductKey = p.ProductKey
GROUP BY p.EnglishProductName

• We will look at the best we can do without column store indexes:
  • Partitioned heap fact table with page compression for spinning disk
  • Partitioned heap fact table without any compression for our flash storage
• Non-partitioned column store indexes on both types of storage, with and without archive compression.

Join Scalability DOP / Time (ms)
[Chart: time (ms) against degree of parallelism]

Join Scalability DOP / Time (ms)

• A simple join between a dimension and fact table using batch mode is an order of magnitude faster than the row mode equivalent.
• For flash, the cost of decompressing the column store is more than offset by:
  • CPU cycle savings made by moving rows around in batches.
  • CPU cycle savings made through the reduction of cache misses.

Hypothesis: could main memory be unable to keep up?
[Table: Wait | Wait_S | Resource_S | Signal_S | Waits | Percentage, with rows for HTBUILD, SOS_SCHEDULER_YIELD, QUERY_TASK_ENQUEUE_MUTEX, LATCH_EX and HTDELETE; values not preserved]
Total spinlock spins =

Going past one memory channel per physical core

Memory bandwidth
Function of:
• Memory channels
• Number of DIMMs
• DIMM speed
= Total CPU core consumption capacity
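A back-of-the-envelope Python sketch of how those three factors combine into a theoretical peak bandwidth per socket. It rests on assumptions not stated in the slides: each memory channel is 64 bits (8 bytes) wide, one DIMM is enough to populate a channel, and DIMM speed is given in mega-transfers per second. The function name and the sample configuration are hypothetical, and real sustained bandwidth will be lower than this peak.

```python
BYTES_PER_TRANSFER = 8  # assumed 64-bit wide memory channel


def peak_bandwidth_mb_s(memory_channels, dimm_count, dimm_speed_mt_s):
    """Theoretical peak memory bandwidth in MB/s for one socket.

    A channel only contributes if at least one DIMM is plugged into it,
    so the number of usable channels is capped by the DIMM count.
    """
    populated_channels = min(memory_channels, dimm_count)
    return populated_channels * dimm_speed_mt_s * BYTES_PER_TRANSFER


# Hypothetical configuration: 4 channels, 1600 MT/s DIMMs.
print(peak_bandwidth_mb_s(4, 4, 1600))  # all channels populated
print(peak_bandwidth_mb_s(4, 2, 1600))  # only half the channels populated
```

Under these assumptions, leaving channels unpopulated halves the peak figure, which is one way the cores could end up able to consume data faster than main memory can supply it.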

• Enhancements To Column Store Indexes (SQL Server 2014), Microsoft Research
• SQL Server Clustered Columnstore Tuple Mover, Remus Rusanu
• SQL Server Columnstore Indexes at TechEd 2013, Remus Rusanu
• The Effect of CPU Caches and Memory Access Patterns, Thomas Kejser

Thomas Kejser Former SQL CAT member and CTO of Livedrive

ChrisAdkin8