Big Data Working with Terabytes in SQL Server

Big Data Working with Terabytes in SQL Server Andrew Novick www.NovickSoftware.com

Agenda: What’s Big?; Concerns; Architecture; Solutions; ETL/Load Performance; Query Performance; Backup/Restore Performance

Introduction: Andrew Novick, Novick Software, Inc. Business application consulting: SQL Server, .Net. www.NovickSoftware.com. Books: Transact-SQL UDFs; SQL 2000 XML Distilled.

SQL PASS 2008: November 18-21, Seattle

What’s big?

What’s Big? Hundreds of gigabytes, up to tens of terabytes. 100,000,000 rows, up to hundreds of billions of rows.

Big Scenarios: Data warehouses; very large OLTP databases (usually with reporting functions)

Big Hardware: Multi-core, 8-64 cores; RAM 16 GB to 256 GB; SANs or direct-attach RAID; 64-bit SQL Server

Concerns

What me worry?

Concerns: Load Speed (ETL); Query Speed; Data Management; Backup/Restore; DBCC CHECKDB, remove fragmentation

Architecture: What do we have to work with?

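SQL Server Storage Architecture [Diagram: tables (Table1, Table2) are stored on filegroups (FileGroupA, FileGroupB); filegroups are made of files (FileA1, FileB1, FileB2); files live on the logical disk system, i.e. Windows drives (C:, D:, E:); drives map to disks in the physical I/O subsystem.]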
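The layering in the diagram can be sketched in T-SQL. This is a minimal sketch, not the presenter's script: the database name BigDemo, the file paths, and the choice of which table goes on which filegroup are assumptions made for illustration.

```sql
-- Hypothetical database name and file paths; adjust for your server.
CREATE DATABASE BigDemo
ON PRIMARY
    (NAME = BigDemo_Primary, FILENAME = 'C:\SQLData\BigDemo_Primary.mdf'),
FILEGROUP FileGroupA
    (NAME = FileA1, FILENAME = 'D:\SQLData\FileA1.ndf'),
FILEGROUP FileGroupB
    -- Two files on two drives: SQL Server spreads FileGroupB's data across both.
    (NAME = FileB1, FILENAME = 'D:\SQLData\FileB1.ndf'),
    (NAME = FileB2, FILENAME = 'E:\SQLData\FileB2.ndf')
LOG ON
    (NAME = BigDemo_Log, FILENAME = 'C:\SQLData\BigDemo_Log.ldf');
GO
USE BigDemo;
GO
-- A table is placed on a filegroup, never directly on a file or drive.
CREATE TABLE dbo.Table1 (Id int NOT NULL PRIMARY KEY, Payload varchar(100) NULL)
    ON FileGroupA;
CREATE TABLE dbo.Table2 (Id int NOT NULL PRIMARY KEY, Payload varchar(100) NULL)
    ON FileGroupB;
```

The point of the indirection is that data placement can be changed by adding files to a filegroup without touching the table definitions.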

Solutions

Solution to what? Load Speed (ETL); Query Speed; Data Management; Backup/Restore; DBCC CHECKDB, remove fragmentation

Solutions: Use multiple FileGroups/Files; spread data to maximize resource use; use a sliding window if there is a time dimension; use partitioned tables and/or views; for ETL, insert into empty unindexed tables; use READ_ONLY FileGroups to minimize maintenance needs

I/O Performance: Little has changed in 50 years; watch out for bottlenecks in the I/O path; memory reduces the need for I/O; disks can only do so many I/O operations per second; the more disk heads you have, the higher the I/O throughput

At 3 PM on the 1st of the month: Where do you want your data to be?

Spread to as many disk resources as possible.

Sliding Window [Diagram: a block of "Always There Data" alongside monthly blocks of temporal data for 2008-01, 2008-02, 2008-03, 2008-04, and 2008-05.]
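One common way to implement a sliding window on a partitioned table is sketched below. It is an illustration under stated assumptions, not the presenter's demo: the names Slide_PF, Slide_PS, FactSlide, and FactSlideOld are hypothetical, and every partition is mapped to [PRIMARY] only so the script runs without extra filegroups.

```sql
-- Monthly boundaries for 2008-01 through 2008-05 (hypothetical names throughout).
CREATE PARTITION FUNCTION Slide_PF (smalldatetime)
AS RANGE RIGHT FOR VALUES ('20080101', '20080201', '20080301', '20080401', '20080501');
GO
-- All partitions on PRIMARY to keep the sketch runnable; a real design would
-- map each month to its own filegroup.
CREATE PARTITION SCHEME Slide_PS
AS PARTITION Slide_PF ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.FactSlide (
    FactDate smalldatetime NOT NULL,
    Amount   money         NOT NULL
) ON Slide_PS (FactDate);

-- Empty table with the same structure, on the same filegroup as the partition
-- that will be switched out.
CREATE TABLE dbo.FactSlideOld (
    FactDate smalldatetime NOT NULL,
    Amount   money         NOT NULL
) ON [PRIMARY];
GO
-- Leading edge: name the filegroup for the next partition, then add 2008-06.
ALTER PARTITION SCHEME Slide_PS NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION Slide_PF() SPLIT RANGE ('20080601');
GO
-- Trailing edge: with RANGE RIGHT, partition 2 holds 2008-01.
-- Switching it out is a metadata operation; no rows are copied.
ALTER TABLE dbo.FactSlide SWITCH PARTITION 2 TO dbo.FactSlideOld;
-- Remove the now-empty range, then archive or drop the old month.
ALTER PARTITION FUNCTION Slide_PF() MERGE RANGE ('20080101');
DROP TABLE dbo.FactSlideOld;
```

Splitting and merging only empty ranges keeps both operations cheap, which is why the new boundary is added before data for that month arrives and the old boundary is removed after its rows are switched out.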

Read_Only FileGroups: ALTER DATABASE <database> MODIFY FILEGROUP <filegroup> READ_ONLY. Require only one backup; don’t require page or row locks; don’t require maintenance. The ALTER requires exclusive access to the database before SQL 2008.

Concern - Load Performance (ETL): 4-hour maximum window for any load; loading into large indexed tables takes unacceptably long. Example: a 2 million row insert into a 400 million row table with 10 indexes took 12 hours.

Concern - Query Performance: Users have little patience. Data warehouse queries: frequent small-to-medium queries to support the UI; less frequent large queries on fact tables that may access tens of GB.

Fact Table Queries: Concentrated time period; most recent; year ago; may go against the full table to get year-against-year comparisons

Dimension Table Queries: Smaller than fact table queries; sometimes involve millions of rows; frequent, to support the UI

Partitioning

Partitioned Views: Available in SQL Server Standard; created like any view; check constraints tell SQL Server which data is in which table.
CREATE VIEW Fact AS SELECT * FROM Fact_20080405 UNION ALL SELECT * FROM Fact_20080406
ALTER TABLE Fact_20080405 ADD CONSTRAINT CK_FACT_20080405_Date CHECK (FactDate >= '2008-04-05' AND FactDate < '2008-04-06')
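A fuller version of the slide's fragment might look like the sketch below. The column list (CustID, Amount) is an assumption, since the slide does not show the table definitions, and the date literals use the unseparated YYYYMMDD form so they do not depend on the session's language or DATEFORMAT setting.

```sql
-- One member table per day; the CHECK constraint declares which day each holds.
-- Column names other than FactDate are assumptions for illustration.
CREATE TABLE dbo.Fact_20080405 (
    FactDate smalldatetime NOT NULL
        CONSTRAINT CK_Fact_20080405_Date
        CHECK (FactDate >= '20080405' AND FactDate < '20080406'),
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
);
CREATE TABLE dbo.Fact_20080406 (
    FactDate smalldatetime NOT NULL
        CONSTRAINT CK_Fact_20080406_Date
        CHECK (FactDate >= '20080406' AND FactDate < '20080407'),
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
);
GO
-- The view is an ordinary UNION ALL over the member tables.
CREATE VIEW dbo.Fact
AS
SELECT FactDate, CustID, Amount FROM dbo.Fact_20080405
UNION ALL
SELECT FactDate, CustID, Amount FROM dbo.Fact_20080406;
GO
-- The constant date lets the optimizer compare the predicate with each
-- CHECK constraint and skip member tables that cannot contain the row.
SELECT FactDate, CustID, Amount
FROM dbo.Fact
WHERE CustID = 334343
  AND FactDate = '20080405';
```

Listing the columns explicitly instead of SELECT * keeps the view stable if a member table later gains a column.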

Partitioned View - 2: Looks to a query like any table or view; can take advantage of parallel execution; limited to 256 tables; can cross servers (performance warning).
SELECT FactDate, ….. FROM Fact WHERE CustID=334343 AND FactDate = '2008-04-05'

Partitioned View SQL Server Storage [Diagram: the view Fact sits over member tables stored on filegroups FGF1 through FGF4, backed by files F1 through F4; Table1 and Table2 remain on FileGroupA and FileGroupB (files FileA1, FileB1, FileB2). Files live on the logical disk system, i.e. Windows drives (C:, D:, E:), which map to disks in the physical I/O subsystem.]

Partition Elimination: The query compiler can eliminate partitions from consideration in the plan. Partition elimination happens at query compile time. Values matched against the partitioning column must be constants to allow partition elimination.

Demo 1 – Partitioned Views

Partitioned Tables: SQL Server Enterprise; SQL Server 2005 and above; require a non-null partitioning column; check constraints tell SQL Server what data is in each partition; all tables are partitioned!

Partitioned Tables 2: Partition Function: defines how to split data. Partition Scheme: defines where to store each range of data.
CREATE PARTITION FUNCTION Fact_PF(smalldatetime) AS RANGE RIGHT FOR VALUES ('2001-07-01', '2001-07-02')
CREATE PARTITION SCHEME Fact_PS AS PARTITION Fact_PF TO ([PRIMARY], FG_20010701, FG_20010702)
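To see how a partition function assigns rows, here is a small self-contained sketch. The object names (Fact_PF_Demo, Fact_PS_Demo, FactDemo) and the extra columns are hypothetical, and all partitions are mapped to [PRIMARY] so the script runs without creating the date-named filegroups.

```sql
-- Two boundaries give three partitions. With RANGE RIGHT each boundary value
-- belongs to the partition on its right.
CREATE PARTITION FUNCTION Fact_PF_Demo (smalldatetime)
AS RANGE RIGHT FOR VALUES ('20010701', '20010702');
GO
-- Assumption: everything on PRIMARY to keep the sketch runnable.
CREATE PARTITION SCHEME Fact_PS_Demo
AS PARTITION Fact_PF_Demo ALL TO ([PRIMARY]);
GO
-- The table is created on the scheme, naming the partitioning column.
CREATE TABLE dbo.FactDemo (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
) ON Fact_PS_Demo (FactDate);
GO
INSERT INTO dbo.FactDemo (FactDate, CustID, Amount) VALUES ('20010630', 1, 10.00);
INSERT INTO dbo.FactDemo (FactDate, CustID, Amount) VALUES ('20010701', 2, 20.00);
INSERT INTO dbo.FactDemo (FactDate, CustID, Amount) VALUES ('20010702', 3, 30.00);

-- $PARTITION reports the partition number for a value:
-- 2001-06-30 -> 1, 2001-07-01 -> 2, 2001-07-02 -> 3.
SELECT FactDate,
       $PARTITION.Fact_PF_Demo(FactDate) AS PartitionNumber
FROM dbo.FactDemo
ORDER BY FactDate;
```

RANGE RIGHT is the usual choice for date boundaries because each partition then starts exactly at midnight of its boundary date.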

Partitioned Table SQL Server Storage [Diagram: the table Fact is split into partitions ($PARTITION = 1 through 4) stored on filegroups FGF1 through FGF4, backed by files F1 through F4; Table1 and Table2 remain on FileGroupA and FileGroupB (files FileA1, FileB1, FileB2). Files live on the logical disk system, i.e. Windows drives (C:, D:, E:), which map to disks in the physical I/O subsystem.]

Demo 2 – Partitioned Tables

Partitioning Goals: Adequate import speed; maximize query performance; make use of all available resources; data management; migrate data to cheaper resources; delete old data easily

Achieving Load Speed: Insert into empty tables; index and add foreign keys after the insert; add the slices to partitioned views or partitioned tables
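For a partitioned table, the load pattern on this slide can be sketched as follows. This is one possible arrangement rather than the presenter's demo; the names Load_PF, Load_PS, FactLoadTarget, and FactLoadStage are hypothetical, and all partitions sit on [PRIMARY] so the script is runnable as written.

```sql
CREATE PARTITION FUNCTION Load_PF (smalldatetime)
AS RANGE RIGHT FOR VALUES ('20080405', '20080406');
GO
CREATE PARTITION SCHEME Load_PS
AS PARTITION Load_PF ALL TO ([PRIMARY]);
GO
-- The large, indexed target table.
CREATE TABLE dbo.FactLoadTarget (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
) ON Load_PS (FactDate);
CREATE CLUSTERED INDEX CIX_FactLoadTarget
    ON dbo.FactLoadTarget (FactDate, CustID) ON Load_PS (FactDate);
GO
-- 1. Empty, unindexed staging table on the same filegroup as the target partition.
CREATE TABLE dbo.FactLoadStage (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
) ON [PRIMARY];
GO
-- 2. Load the slice (a single row stands in for the bulk load here).
INSERT INTO dbo.FactLoadStage (FactDate, CustID, Amount)
VALUES ('20080405 10:00', 334343, 12.50);
GO
-- 3. Index and constrain after the insert, matching the target's index
--    and the boundaries of the destination partition.
CREATE CLUSTERED INDEX CIX_FactLoadStage
    ON dbo.FactLoadStage (FactDate, CustID);
ALTER TABLE dbo.FactLoadStage WITH CHECK
    ADD CONSTRAINT CK_FactLoadStage_Date
    CHECK (FactDate >= '20080405' AND FactDate < '20080406');
GO
-- 4. Add the slice: partition 2 holds 2008-04-05 under RANGE RIGHT.
--    The switch is a metadata change, so the big table's indexes are not rebuilt.
ALTER TABLE dbo.FactLoadStage SWITCH TO dbo.FactLoadTarget PARTITION 2;
```

The expensive work (inserting and index building) happens on a small table, so the cost no longer grows with the size of the existing fact table.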

Achieving Query Speed: Eliminate access to partitions during query compile; all disk resources should be used (parallel access); all available memory should be used; all available CPUs should be used (parallel query)

Solution: Partition at a sufficiently high grain; spread dimension data to all usable disks; separate data and index FileGroups; multiple files per FileGroup; spread fact data by partition key to all usable disks; rotate file locations to maximize dispersion

Concern – Data Management (Backup) Let’s say you have a 10 TB database. Now back that up.

Backup Calculation: 10 TB = 10,000 GB. Typical backup speed: low end 1 GB per minute, high end 10 GB per minute. At 10 GB/minute that is 1,000 minutes (nearly 17 hours). Who’s got 1,000 minutes?

Achieving Backup Performance: Back up less! Maintain data in a READ_ONLY state; compress backups

Partial Backup: Partial Base: backs up read_write filegroups. Partial Differential: differential backup of read_write filegroups.
BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS …..
BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS WITH DIFFERENTIAL ….
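Filled in with concrete names, the backup routine might look like the sketch below. The database name BigDemo, the filegroup FG_2007, and the backup paths are all assumptions for illustration; the slide elides these details.

```sql
-- Hypothetical database, filegroup, and paths; substitute your own.

-- One time only: freeze a historical filegroup, then back it up once.
ALTER DATABASE BigDemo MODIFY FILEGROUP FG_2007 READ_ONLY;
BACKUP DATABASE BigDemo
    FILEGROUP = N'FG_2007'
    TO DISK = N'X:\Backup\BigDemo_FG_2007.bak';

-- Regularly: partial base backup covering only the read_write filegroups.
BACKUP DATABASE BigDemo
    READ_WRITE_FILEGROUPS
    TO DISK = N'X:\Backup\BigDemo_partial_base.bak';

-- More often: partial differential, i.e. changes since the last partial base.
BACKUP DATABASE BigDemo
    READ_WRITE_FILEGROUPS
    TO DISK = N'X:\Backup\BigDemo_partial_diff.bak'
    WITH DIFFERENTIAL;
```

The read-only filegroup backup stays valid for as long as the filegroup remains read-only, which is what lets the recurring backups shrink to the active data.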

Maintenance Operations: Maintain only READ_WRITE data; DBCC CHECKFILEGROUP; ALTER INDEX with REBUILD PARTITION = or REORGANIZE PARTITION =; avoid SHRINK
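The filegroup-level and partition-level commands can be sketched as below. The names Maint_PF, Maint_PS, FactMaint, and CIX_FactMaint are hypothetical, and everything is placed on [PRIMARY] so the script runs as written; in practice the check would target only the read_write filegroups.

```sql
CREATE PARTITION FUNCTION Maint_PF (smalldatetime)
AS RANGE RIGHT FOR VALUES ('20080101', '20080201');
GO
CREATE PARTITION SCHEME Maint_PS
AS PARTITION Maint_PF ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.FactMaint (
    FactDate smalldatetime NOT NULL,
    Amount   money         NOT NULL
) ON Maint_PS (FactDate);
CREATE CLUSTERED INDEX CIX_FactMaint
    ON dbo.FactMaint (FactDate) ON Maint_PS (FactDate);
GO
-- Check one filegroup instead of running DBCC CHECKDB over the whole database.
DBCC CHECKFILEGROUP (N'PRIMARY') WITH NO_INFOMSGS;

-- Rebuild only the current partition (3 = 2008-02-01 onward under RANGE RIGHT).
ALTER INDEX CIX_FactMaint ON dbo.FactMaint REBUILD PARTITION = 3;

-- Or the lighter-weight alternative for the same partition.
ALTER INDEX CIX_FactMaint ON dbo.FactMaint REORGANIZE PARTITION = 3;
```

Older partitions that sit on read-only filegroups are left alone, so maintenance time tracks the size of the active data rather than the whole database.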

SQL Server 2008 - What’s New: Row, page, and backup compression; filtered indexes; optimization for star joins; MERGE (T-SQL DML); Resource Governor; fewer operations require exclusive access to the database

New England Visual Basic Pro: Focused on VB.Net development. Meetings @ MS Waltham, MPR C, 1st Thursday, 6:15 to 8:30 PM. Sept 4: Jim O’Neil, ASP.Net Dynamic Data. Sept 25: Chris Hammond, DotNetNuke. Oct 2: Kathleen Dollard, XML Literals in VB 9. Nov 6: Joe Stagner, Stupid Hacker Tricks and How 2 Defend. Feb 5 ’09: Joe Hill, Novell, Mono/VB/etc…. www.NEVB.com

Thanks for Coming Andrew Novick anovick@NovickSoftware.com www.NovickSoftware.com