Optimizing SQL Server and Databases for Large Fact Tables


Optimizing SQL Server and Databases for Large Fact Tables
=tg= Thomas Grohser, NTT Data, SQL Server MVP
SQL Server Performance Engineering
SQL Saturday #518, June 4th 2016, Portland, Maine

select * from =tg= where topic = @@Version

Version – Remark
SQL 4.21 – First SQL Server ever used (1994)
SQL 6.0 – First log shipping with failover
SQL 6.5 – First SQL Server cluster (NT 4.0 + Wolfpack)
SQL 7.0 – 2+ billion rows/month in a single table
SQL 2000 – 938 days with 100% availability
SQL 2000 IA64 – First SQL Server on Itanium IA64
SQL 2005 IA64 – First OLTP long-distance database mirroring
SQL 2008 IA64 – First replication into mirrored databases
SQL 2008R2 IA64 – First 256 CPUs & >500,000 statements/sec
SQL 2008R2 x64 – First scale-out >1,000,000 statements/sec; first time 1.2+ trillion rows in a table
SQL 2012 – >220,000 transactions per second; >1.3 trillion rows in a table
SQL 2014 – >400,000 transactions per second; fully automated deployment and management
SQL 2016 – AlwaysOn automatic HA and DR; crossed the PB mark in storage
SQL vNext – Can't wait to push the limits even further

=tg= Thomas Grohser, NTT DATA
Senior Director, Technical Solutions Architecture
email: Thomas.grohser@nttdata.com / tg@grohser.com
Focus on SQL Server security, performance engineering, infrastructure, and architecture
New papers coming in 2016
Close relationship with SQLCAT (SQL Server Customer Advisory Team), SCAN (SQL Server Customer Advisory Network), TAP (Technology Adoption Program), and the product teams in Redmond
Active PASS member and PASS Summit speaker
22 years with SQL Server

NTT DATA Overview
Why NTT DATA for MS Services:
20,000 professionals – optimizing balanced global delivery
$1.6B – annual revenues with a history of above-market growth
Long-term relationships – >1,000 clients, mid-market to large enterprise
Delivery excellence – enabled by process maturity, tools, and accelerators
Flexible engagement – spans consulting, staffing, managed services, outsourcing, and cloud
Industry expertise – driving depth in select industry verticals
NTT DATA is a Microsoft Gold Certified Partner covering the entire MS stack, from applications to infrastructure to the cloud
Proven track record, with 500+ MS solutions delivered in the past 20 years

Agenda
Defining the issue/problem
Looking at the tools
Using the right tools
Q&A
ATTENTION: Important information may be displayed on any slide at any time! Without warning!

Definition of a large fact table
A moving, individual target over time:
2001: big for me was > 1 billion rows, > 90 GB
2011: big for me was > 1.3 trillion rows, > 250 TB
2016: ??? 10 PB ???

Size matters not! Having the right tools in place and knowing how to use them to handle the data is the solution.

The Problem
Trying to run 30 reports on a big fact table, each of which needs to scan the whole table…
The data is ready at 5am in the morning…
Reports need to be ready by 9am…
The baseline: each report takes about 2 hours to finish…
That is roughly 60 hours of scanning to squeeze into a 4-hour window.

Tools
Hardware (server, storage)
SQL Server (Standard, (BI), Enterprise)
Clever configuration
Clever query scheduling
Good news for people with SA (Software Assurance)!

Hardware “The grade of steel”

CPU is not the limit
On a modern CPU, each core can process about 500 MB/s.
How many cores do we have in a commodity server?
4-22 cores per socket (that's 4 more since April 2016)
1-8 sockets
That's 4 to 176 cores, or ~2 to ~88 GB per second, or ~7 to ~300 TB per hour.
CPU capacity is rarely the bottleneck.

Understanding how SQL Server scans data
SQL Server reads the data page by page.
SQL Server may perform read-ahead:
Dynamically adjusts the read-ahead size per table
Standard Edition: up to 128 pages
Enterprise Edition: up to 512 pages
That's up to 1 MB (Std) or 4 MB (Ent) per read.
Read ahead as much as possible… Why? Reading 4 MB takes about as long as reading 8 KB. So let's help SQL Server do it (see the sketch below).
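One way to watch read-ahead in action is STATISTICS IO; this is a minimal sketch, and dbo.FactSales is a made-up stand-in for your own fact table:

    -- Hypothetical fact table name; substitute your own.
    SET STATISTICS IO ON;
    SELECT COUNT(*) FROM dbo.FactSales;
    -- The Messages tab reports "read-ahead reads" next to logical and
    -- physical reads; a high read-ahead count means SQL Server is
    -- fetching large contiguous chunks ahead of the scan.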

Read-ahead happens if…
…the next data needed is in contiguous pages on the disk.
The problem: two or more tables growing at the same time end up with their pages interleaved.

Multiple Data Files
[Diagram: how pages of two concurrently growing tables are allocated across one or more data files – the interleaved allocation patterns (1-3-5-7-9-…, 2-4-6-8-…, 1-2-4-5-7-8-…, 3-6-9-…) show each table's pages ending up non-contiguous]
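A minimal sketch of spreading a database across several data files so allocations round-robin between them; the database name, file names, and paths are all made up:

    -- Hypothetical database, file names, and paths.
    ALTER DATABASE BigFacts
        ADD FILE (NAME = data2, FILENAME = 'D:\Data\BigFacts_2.ndf',
                  SIZE = 100GB, FILEGROWTH = 4GB);
    ALTER DATABASE BigFacts
        ADD FILE (NAME = data3, FILENAME = 'E:\Data\BigFacts_3.ndf',
                  SIZE = 100GB, FILEGROWTH = 4GB);
    -- SQL Server fills the files proportionally, spreading extents
    -- across the underlying spindles/paths.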

Multiple File Groups
[Diagram: two filegroups, FG1 and FG2, each holding one of the growing tables – each table's pages now stay contiguous (1-2-3-4-5-6-7-8-9-…)]
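A sketch of giving a large, concurrently growing table its own filegroup so its pages stay contiguous; all names are illustrative:

    -- Hypothetical names; one filegroup per large, concurrently growing table.
    ALTER DATABASE BigFacts ADD FILEGROUP FG1;
    ALTER DATABASE BigFacts
        ADD FILE (NAME = fg1_data, FILENAME = 'F:\Data\BigFacts_FG1.ndf',
                  SIZE = 500GB)
        TO FILEGROUP FG1;

    CREATE TABLE dbo.FactSales
    (
        SalesID   BIGINT NOT NULL,
        OrderDate DATE   NOT NULL,
        Amount    MONEY  NOT NULL
    ) ON FG1;  -- the table now grows only inside FG1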

SQL Server Startup Options
-E can be your friend if you have large tables.
-E allocates 64 extents at a time; that is 4 MB at a time for each table instead of 64 KB.
The cost of it: every table is at least 4 MB (including all the ones in tempdb!). A way to verify the option is sketched below.
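Startup options are set in SQL Server Configuration Manager; as a sketch (assuming rights to read the server registry DMV), you can verify that -E is active like this:

    -- Lists the configured startup parameters (SQLArg0, SQLArg1, ...).
    SELECT value_name, value_data
    FROM sys.dm_server_registry
    WHERE value_name LIKE 'SQLArg%';
    -- Look for a row whose value_data is '-E'.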

Multiple Data Files Revisited

IO and Storage Path

Read speed factor – Direct Attached
[Diagram: read throughput relative to a single drive (1X) for direct-attached RAID layouts – roughly 1-2X for RAID 1 and anywhere from 0.25X to 4X for RAID 5, depending on layout and stripe count]

Read speed factor – SAN
On a SAN, the paths to the array are most likely the limiting factor.
Ensure there are enough paths to the array.
Try disabling the read cache if possible (most of the time this makes it faster).

Understand the path to the drives
[Diagram: the full I/O path – HBA and Fiber Channel ports, a switch, the SAN controllers/processors with cache, or a RAID controller for DAS, down to the disks, SSDs, and NVRAM]

IO Bottlenecks
Rotating disks (10-160 MB/sec): ~0.1 GB/s
Disk interface / SSD (3-12 Gb/sec): ~1 GB/s
RAID controller (1-8 GB/sec): ~8 GB/s
Ethernet (1 or 10 Gb/sec): ~1 GB/s
Fiber Channel (2-16 Gb/sec): ~2 GB/s
Host bus adapter (2-32 Gb/sec): ~4 GB/s
PCI Express bus (0.25-32 GB/sec): ~32 GB/s
System (4-16 PCIe busses): ~512 GB/s

Schema and Indexes

Choose the clustered index key wisely
If you have a lot of queries that range scan (WHERE value BETWEEN x AND y) and multiple dates in the table (e.g. order, ship, delivery date, …), which date do you choose? None.
Put the clustered index on a unique ID and maintain a helper table (Date, DateType, MinID, MaxID) that maps each date to an ID range; a sketch follows below.
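A minimal sketch of the helper-table approach; all table and column names are made up:

    -- Hypothetical helper table: one row per calendar date and date type,
    -- holding the range of fact IDs that fall on that date.
    CREATE TABLE dbo.DateRange
    (
        DateValue DATE    NOT NULL,
        DateType  TINYINT NOT NULL,  -- e.g. 1 = order, 2 = ship, 3 = delivery
        MinID     BIGINT  NOT NULL,
        MaxID     BIGINT  NOT NULL,
        PRIMARY KEY (DateValue, DateType)
    );

    -- Resolve the date to an ID range first, then seek on the
    -- clustered unique ID of the fact table.
    SELECT f.*
    FROM dbo.DateRange AS r
    JOIN dbo.FactSales AS f
        ON f.SalesID BETWEEN r.MinID AND r.MaxID
    WHERE r.DateValue = '2016-06-04'
      AND r.DateType  = 1;  -- order date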

Table Partitioning
A great tool for making database maintenance easier, but it does not buy us much performance; it could actually slow us down.
It might still be needed to spread data across multiple filegroups (a sketch follows below).
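A minimal sketch of partitioning by year to spread data across filegroups; the filegroups FG2014 through FG2016 and the boundary dates are illustrative:

    -- Hypothetical yearly partitioning; the filegroups must already exist.
    CREATE PARTITION FUNCTION pfYear (DATE)
        AS RANGE RIGHT FOR VALUES ('2015-01-01', '2016-01-01');

    CREATE PARTITION SCHEME psYear
        AS PARTITION pfYear TO (FG2014, FG2015, FG2016);

    -- A table created ON psYear(OrderDate) spreads its rows across
    -- the three filegroups by year.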

Row and Page Compression
ROW compression:
Almost no overhead
Can save several unused bytes in each row
Remember: 1 byte less on 1 billion rows is 1 GB
PAGE compression:
Some overhead
Can save a lot on repeating patterns (same values within a page)
New data is not compressed!
Never compress lookup data. (A sketch of estimating and enabling compression follows below.)
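A sketch of estimating and then enabling page compression on the hypothetical fact table:

    -- Estimate the savings first (the table name is made up).
    EXEC sys.sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'FactSales',
        @index_id         = NULL,
        @partition_number = NULL,
        @data_compression = 'PAGE';

    -- Rebuild with page compression if the estimate looks good.
    ALTER TABLE dbo.FactSales
        REBUILD WITH (DATA_COMPRESSION = PAGE);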

Merry-Go-Round (Piggyback) Scan
Enterprise Edition only.
A second query can hook into a table scan already in progress and share the pages it reads.
Automatically invoked; with planning, much better results.

Column Store Index
With SQL 2016 finally fully usable (updatable without workarounds, can be the clustered index).
~40% faster than before.
Awesome compression ratios.
Even better results if a lot of queries only need a few columns of the fact table; a sketch follows below.
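A minimal sketch of converting the hypothetical fact table to a clustered columnstore index (SQL Server 2016+); it assumes any existing rowstore clustered index has been dropped first:

    -- Hypothetical table name.
    CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
        ON dbo.FactSales;
    -- Queries that touch only a few columns now read just those
    -- column segments, and the rest are skipped entirely.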

THANK YOU! And may the force be with you…
Questions? thomas.grohser@nttdata.com / tg@grohser.com