Optimizing SQL Server and Databases for Large Fact Tables


Optimizing SQL Server and Databases for Large Fact Tables
=tg= Thomas Grohser, NTT Data, SQL Server MVP
SQL Server Performance Engineering
SQL Saturday #518, June 4th 2016, Portland, Maine

select * from =tg= where topic = @@Version

Version – Remark
SQL 4.21 – First SQL Server ever used (1994)
SQL 6.0 – First log shipping with failover
SQL 6.5 – First SQL Server cluster (NT 4.0 + Wolfpack)
SQL 7.0 – 2+ billion rows/month in a single table
SQL 2000 – 938 days with 100% availability
SQL 2000 IA64 – First SQL Server on Itanium IA64
SQL 2005 IA64 – First OLTP long-distance database mirroring
SQL 2008 IA64 – First replication into mirrored databases
SQL 2008R2 IA64 – First 256 CPUs & >500,000 statements/sec
SQL 2008R2 x64 – First scale-out >1,000,000 statements/sec; first time 1.2+ trillion rows in a table
SQL 2012 – >220,000 transactions per second; >1.3 trillion rows in a table
SQL 2014 – >400,000 transactions per second; fully automated deployment and management
SQL 2016 – AlwaysOn automatic HA and DR; crossed the PB mark in storage
SQL vNext – Can't wait to push the limits even further

=tg= Thomas Grohser, NTT DATA
Senior Director, Technical Solutions Architecture
email: Thomas.grohser@nttdata.com / tg@grohser.com
Focus on SQL Server security, performance engineering, infrastructure, and architecture
New papers coming in 2016
Close relationship with SQLCAT (SQL Server Customer Advisory Team), SCAN (SQL Server Customer Advisory Network), TAP (Technology Adoption Program), and the product teams in Redmond
Active PASS member and PASS Summit speaker
22 years with SQL Server

NTT DATA Overview
Why NTT DATA for MS Services:
20,000 professionals – optimizing balanced global delivery
$1.6B – annual revenues with a history of above-market growth
Long-term relationships – >1,000 clients, mid-market to large enterprise
Delivery excellence – enabled by process maturity, tools, and accelerators
Flexible engagement – spans consulting, staffing, managed services, outsourcing, and cloud
Industry expertise – driving depth in select industry verticals
NTT DATA is a Microsoft Gold Certified Partner covering the entire MS stack, from applications to infrastructure to the cloud
Proven track record, with 500+ MS solutions delivered in the past 20 years

Agenda
Defining the issue/problem
Looking at the tools
Using the right tools
Q&A
ATTENTION: Important information may be displayed on any slide at any time! Without warning!

Definition of a large fact table
A moving, individual target over time:
2001: big for me was > 1 billion rows, > 90 GB
2011: big for me was > 1.3 trillion rows, > 250 TB
2016: ??? 10 PB ???

Size matters not! Having the right tools in place and knowing how to use them to handle the data is the solution.

The Problem
Trying to run 30 reports on a big fact table, each of which needs to scan the whole table…
The data is ready at 5am in the morning…
Reports need to be ready by 9am…
The baseline: each report takes about 2 hours to finish…
That is roughly 60 hours of scanning to squeeze into a 4-hour window.

Tools
Hardware (server, storage)
SQL Server (Standard, (BI), Enterprise)
Clever configuration
Clever query scheduling
Good news for people with SA (Software Assurance)!

Hardware “The grade of steel”

CPU is not the limit
On a modern CPU, each core can process about 500 MB/s.
How many cores do we have in a commodity server?
4-22 cores per socket (that's 4 more since April 2016)
1-8 sockets
That's 4 to 176 cores, or ~2 to ~88 GB per second, or ~7 to ~300 TB per hour.
CPU capacity is rarely the bottleneck.

Understanding how SQL Server scans data
SQL Server reads the data page by page.
SQL Server may perform read-ahead:
Dynamically adjusts the read-ahead size per table
Standard Edition: up to 128 pages
Enterprise Edition: up to 512 pages
That's up to 1 MB (Std) or 4 MB (Ent) per read.
Read ahead as much as possible… Why? Reading 4 MB takes about as long as reading 8 KB. So let's help SQL Server do it (see the sketch below).
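One way to watch read-ahead in action is STATISTICS IO; this is a minimal sketch, and dbo.FactSales is a made-up stand-in for your own fact table:

    -- Hypothetical fact table name; substitute your own.
    SET STATISTICS IO ON;
    SELECT COUNT(*) FROM dbo.FactSales;
    -- The Messages tab reports "read-ahead reads" next to logical and
    -- physical reads; a high read-ahead count means SQL Server is
    -- fetching large contiguous chunks ahead of the scan.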

Read-ahead happens if…
…the next data needed is in contiguous pages on the disk.
The problem: two or more tables growing at the same time end up with their pages interleaved.

Multiple Data Files
[Diagram: how pages of two concurrently growing tables are allocated across one or more data files – the interleaved allocation patterns (1-3-5-7-9-…, 2-4-6-8-…, 1-2-4-5-7-8-…, 3-6-9-…) show each table's pages ending up non-contiguous]
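A minimal sketch of spreading a database across several data files so allocations round-robin between them; the database name, file names, and paths are all made up:

    -- Hypothetical database, file names, and paths.
    ALTER DATABASE BigFacts
        ADD FILE (NAME = data2, FILENAME = 'D:\Data\BigFacts_2.ndf',
                  SIZE = 100GB, FILEGROWTH = 4GB);
    ALTER DATABASE BigFacts
        ADD FILE (NAME = data3, FILENAME = 'E:\Data\BigFacts_3.ndf',
                  SIZE = 100GB, FILEGROWTH = 4GB);
    -- SQL Server fills the files proportionally, spreading extents
    -- across the underlying spindles/paths.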

Multiple File Groups
[Diagram: two filegroups, FG1 and FG2, each holding one of the growing tables – each table's pages now stay contiguous (1-2-3-4-5-6-7-8-9-…)]
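A sketch of giving a large, concurrently growing table its own filegroup so its pages stay contiguous; all names are illustrative:

    -- Hypothetical names; one filegroup per large, concurrently growing table.
    ALTER DATABASE BigFacts ADD FILEGROUP FG1;
    ALTER DATABASE BigFacts
        ADD FILE (NAME = fg1_data, FILENAME = 'F:\Data\BigFacts_FG1.ndf',
                  SIZE = 500GB)
        TO FILEGROUP FG1;

    CREATE TABLE dbo.FactSales
    (
        SalesID   BIGINT NOT NULL,
        OrderDate DATE   NOT NULL,
        Amount    MONEY  NOT NULL
    ) ON FG1;  -- the table now grows only inside FG1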

SQL Server Startup Options
-E can be your friend if you have large tables.
-E allocates 64 extents at a time; that is 4 MB at a time for each table instead of 64 KB.
The cost of it: every table is at least 4 MB (including all the ones in tempdb!). A way to verify the option is sketched below.
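Startup options are set in SQL Server Configuration Manager; as a sketch (assuming rights to read the server registry DMV), you can verify that -E is active like this:

    -- Lists the configured startup parameters (SQLArg0, SQLArg1, ...).
    SELECT value_name, value_data
    FROM sys.dm_server_registry
    WHERE value_name LIKE 'SQLArg%';
    -- Look for a row whose value_data is '-E'.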

Multiple Data Files Revisited

IO and Storage Path

Read speed factor – Direct Attached
[Diagram: read throughput relative to a single drive (1X) for direct-attached RAID layouts – roughly 1-2X for RAID 1 and anywhere from 0.25X to 4X for RAID 5, depending on layout and stripe count]

Read speed factor – SAN
On a SAN, the paths to the array are most likely the limiting factor.
Ensure there are enough paths to the array.
Try disabling the read cache if possible (most of the time this makes it faster).

Understand the path to the drives
[Diagram: the full I/O path – HBA and Fiber Channel ports, a switch, the SAN controllers/processors with cache, or a RAID controller for DAS, down to the disks, SSDs, and NVRAM]

IO Bottlenecks
Rotating disks (10-160 MB/sec): ~0.1 GB/s
Disk interface / SSD (3-12 Gb/sec): ~1 GB/s
RAID controller (1-8 GB/sec): ~8 GB/s
Ethernet (1 or 10 Gb/sec): ~1 GB/s
Fiber Channel (2-16 Gb/sec): ~2 GB/s
Host bus adapter (2-32 Gb/sec): ~4 GB/s
PCI Express bus (0.25-32 GB/sec): ~32 GB/s
System (4-16 PCIe busses): ~512 GB/s

Schema and Indexes

Choose the clustered index key wisely
If you have a lot of queries that range scan (WHERE value BETWEEN x AND y) and multiple dates in the table (e.g. order, ship, delivery date, …), which date do you choose? None.
Put the clustered index on a unique ID and maintain a helper table (Date, DateType, MinID, MaxID) that maps each date to an ID range; a sketch follows below.
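A minimal sketch of the helper-table approach; all table and column names are made up:

    -- Hypothetical helper table: one row per calendar date and date type,
    -- holding the range of fact IDs that fall on that date.
    CREATE TABLE dbo.DateRange
    (
        DateValue DATE    NOT NULL,
        DateType  TINYINT NOT NULL,  -- e.g. 1 = order, 2 = ship, 3 = delivery
        MinID     BIGINT  NOT NULL,
        MaxID     BIGINT  NOT NULL,
        PRIMARY KEY (DateValue, DateType)
    );

    -- Resolve the date to an ID range first, then seek on the
    -- clustered unique ID of the fact table.
    SELECT f.*
    FROM dbo.DateRange AS r
    JOIN dbo.FactSales AS f
        ON f.SalesID BETWEEN r.MinID AND r.MaxID
    WHERE r.DateValue = '2016-06-04'
      AND r.DateType  = 1;  -- order date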

Table Partitioning
A great tool for making database maintenance easier, but it does not buy us much performance; it could actually slow us down.
It might still be needed to spread data across multiple filegroups (a sketch follows below).
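A minimal sketch of partitioning by year to spread data across filegroups; the filegroups FG2014 through FG2016 and the boundary dates are illustrative:

    -- Hypothetical yearly partitioning; the filegroups must already exist.
    CREATE PARTITION FUNCTION pfYear (DATE)
        AS RANGE RIGHT FOR VALUES ('2015-01-01', '2016-01-01');

    CREATE PARTITION SCHEME psYear
        AS PARTITION pfYear TO (FG2014, FG2015, FG2016);

    -- A table created ON psYear(OrderDate) spreads its rows across
    -- the three filegroups by year.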

Row and Page Compression
ROW compression:
Almost no overhead
Can save several unused bytes in each row
Remember: 1 byte less on 1 billion rows is 1 GB
PAGE compression:
Some overhead
Can save a lot on repeating patterns (same values within a page)
New data is not compressed!
Never compress lookup data. (A sketch of estimating and enabling compression follows below.)
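A sketch of estimating and then enabling page compression on the hypothetical fact table:

    -- Estimate the savings first (the table name is made up).
    EXEC sys.sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'FactSales',
        @index_id         = NULL,
        @partition_number = NULL,
        @data_compression = 'PAGE';

    -- Rebuild with page compression if the estimate looks good.
    ALTER TABLE dbo.FactSales
        REBUILD WITH (DATA_COMPRESSION = PAGE);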

Merry-Go-Round (Piggyback) Scan
Enterprise Edition only.
A second query can hook into a table scan already in progress and share the pages it reads.
Automatically invoked; with planning, much better results.

Column Store Index
With SQL 2016 finally fully usable (updatable without workarounds, can be the clustered index).
~40% faster than before.
Awesome compression ratios.
Even better results if a lot of queries only need a few columns of the fact table; a sketch follows below.
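A minimal sketch of converting the hypothetical fact table to a clustered columnstore index (SQL Server 2016+); it assumes any existing rowstore clustered index has been dropped first:

    -- Hypothetical table name.
    CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
        ON dbo.FactSales;
    -- Queries that touch only a few columns now read just those
    -- column segments, and the rest are skipped entirely.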

THANK YOU! And may the force be with you…
Questions? thomas.grohser@nttdata.com / tg@grohser.com