Large Data Operations Joe Chang



Large Data Operations Overview
- Updates & Deletes: modifying large row counts can be very slow? Dropping indexes improves performance?
- Inserts: see SQLDev.Net; covered in various presentations by Gert Drapers

Execution Plan with Indexes
1. Insert multiple rows into table with clustered index
2. Rows are spooled
3. Nonclustered indexes are modified from the spooled data
Operations with indexes in place should be faster. Exception: large inserts where bulk log requirements are met.

Execution Plan Cost Formula Review
Table Scan or Index Scan
- I/O: per page
- CPU: per row
Index Seek (plan formula)
- I/O Cost = per additional page (≤1GB)
- I/O Cost = per additional page (>1GB)
- CPU Cost = per additional row
Bookmark Lookup
- I/O Cost = multiple of (≤1GB)
- I/O Cost = multiple of (>1GB)
- CPU Cost = per row
Insert, Update & Delete (IUD)
- IUD I/O Cost ~ – (>100 rows)
- IUD CPU Cost = per row

Plan Cost – Unit of Measure
- Time in seconds? CPU time?
- sec -> 160/sec
- -> 1350/sec (8KB) -> 169/sec (64K) -> 10.8MB/sec
- SQL Server 2000 Books Online (Administering SQL Server, Managing Servers, Setting Configuration Options: cost threshold for parallelism Option): "Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration."
- Too fast for 7200RPM disk random I/Os. About right for 1997 sequential disk transfer rate?
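The rates on this slide can be cross-checked with a few lines of arithmetic. The sketch below is not from the deck: it takes the two quoted rates (160/sec and 1350/sec) as given, assumes a cost unit is read as one second, and derives the per-operation cost and the 64K and MB/sec figures. The constant and function names are illustrative.

```python
# Rates quoted on the slide; everything below is derived from them.
RANDOM_IO_PER_SEC = 160     # rate implied by the random I/O coefficient
SEQ_PAGES_PER_SEC = 1350    # rate implied by the per-page scan coefficient
PAGE_KB = 8                 # SQL Server page size
LARGE_READ_KB = 64          # large sequential read size used on the slide


def implied_cost(rate_per_sec: float) -> float:
    """Cost charged per operation if one cost unit equals one second."""
    return 1.0 / rate_per_sec


random_cost = implied_cost(RANDOM_IO_PER_SEC)   # cost per random I/O
page_cost = implied_cost(SEQ_PAGES_PER_SEC)     # cost per sequential page
reads_64k_per_sec = SEQ_PAGES_PER_SEC * PAGE_KB / LARGE_READ_KB
mb_per_sec = SEQ_PAGES_PER_SEC * PAGE_KB / 1000

print(f"cost per random I/O      : {random_cost:.5f}")
print(f"cost per sequential page : {page_cost:.6f}")
print(f"64K reads per second     : {reads_64k_per_sec:.2f}")
print(f"sequential MB per second : {mb_per_sec:.1f}")
```

Under these assumptions the 64K read rate comes out just under the 169/sec shown on the slide, and the transfer rate matches the 10.8MB/sec figure.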

Test Table
    CREATE TABLE M3C_00 (
        ID int NOT NULL,
        ID2 int NOT NULL,
        ID3 int NOT NULL,
        ID4 int NOT NULL,
        ID5 int NOT NULL,
        ID6 int NOT NULL,
        SeqID int NOT NULL,
        DistID int NOT NULL,
        Value char(10) NOT NULL,
        rDecimal decimal (9,4) NOT NULL,
        rMoney money NOT NULL,
        rDate datetime NOT NULL,
        sDate datetime NOT NULL )
    CREATE CLUSTERED INDEX IX_M3C_00 ON M3C_00 (ID) WITH SORT_IN_TEMPDB
- 10M rows in table, 99 rows per page, 101,012 pages, 808MB
- 100K rows for each distinct value of SeqID and DistID
- Common SeqID values are in adjacent rows
- Common DistID values are in separate 8KB pages (100 rows apart)
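The 99 rows per page figure can be reproduced from the column list. The following is a rough sketch, not part of the original deck: it assumes the standard SQL Server fixed-length row layout (4-byte row header, 2-byte column count, null bitmap, 2-byte slot entry per row, 96-byte page header) and that no uniqueifier bytes are stored because ID values are unique. All names in the code are illustrative.

```python
import math

# Storage size in bytes for each fixed-length column of the test table.
COLUMN_BYTES = {
    "ID": 4, "ID2": 4, "ID3": 4, "ID4": 4, "ID5": 4, "ID6": 4,
    "SeqID": 4, "DistID": 4,   # int
    "Value": 10,               # char(10)
    "rDecimal": 5,             # decimal(9,4): precision 1-9 takes 5 bytes
    "rMoney": 8,               # money
    "rDate": 8, "sDate": 8,    # datetime
}

# Assumed page and row overheads (standard layout, see lead-in).
PAGE_BYTES = 8192
PAGE_HEADER_BYTES = 96
ROW_HEADER_BYTES = 4
COLUMN_COUNT_BYTES = 2
SLOT_BYTES = 2

null_bitmap_bytes = math.ceil(len(COLUMN_BYTES) / 8)
row_bytes = (ROW_HEADER_BYTES + sum(COLUMN_BYTES.values())
             + COLUMN_COUNT_BYTES + null_bitmap_bytes)
rows_per_page = (PAGE_BYTES - PAGE_HEADER_BYTES) // (row_bytes + SLOT_BYTES)

# Minimum number of full pages for 10M rows, and the size of the
# 101,012 pages reported on the slide.
min_pages = math.ceil(10_000_000 / rows_per_page)
table_mb = 101_012 * 8 / 1000

print(row_bytes, rows_per_page, min_pages, round(table_mb))
```

With these assumptions each row takes 79 bytes plus its slot entry, giving 99 rows per page. The minimum page count lands within one page of the 101,012 pages on the slide, and 101,012 pages of 8KB come to about 808MB.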

Data Population Script
    int = = = = = = =
    BEGIN
        BEGIN TRANSACTION
        =
        BEGIN
            INSERT M3C_00 (ID,ID2,ID3,ID4,ID5,ID6,SeqID,DistID,Value,rDecimal,rMoney,rDate,sDate)
            CHAR(65+26*rand())+CHAR(65+26*rand())+CHAR(65+26*rand())
                +CONVERT(char(6),CONVERT(int,100000*(9.0*rand()+1.0)))+CHAR(65+26*rand()),
            10000*rand(), 10000*rand(),
            DATEADD(hour,100000*rand(),' '), )
        END
        COMMIT TRANSACTION
        CHECKPOINT
        PRINT CONVERT(char,GETDATE(),121)+' row ' + Complete'
    END

Data Population Script Notes
- Double WHILE loop
- Each Insert/Update/Delete statement is an implicit transaction and gets a separate transaction log entry
- An explicit transaction generates a single transaction log write (max 64KB per I/O)
- A single transaction for the entire loop requires an excessively large log file
- Inserts are therefore grouped into intermediate-size batches
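The double-loop batching pattern described in these notes can be sketched in a runnable form. The example below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table `demo`, its columns, and the function `populate` are hypothetical. What it shows is the structure: an outer loop that opens and commits one explicit transaction per batch, and an inner loop of single-row inserts.

```python
import random
import sqlite3


def populate(conn: sqlite3.Connection, total_rows: int, batch_size: int) -> int:
    """Insert total_rows rows, committing once per batch; return batch count."""
    batches = 0
    next_id = 1
    while next_id <= total_rows:              # outer loop: one transaction per batch
        conn.execute("BEGIN")
        batch_end = min(next_id + batch_size - 1, total_rows)
        while next_id <= batch_end:           # inner loop: single-row inserts
            conn.execute(
                "INSERT INTO demo (id, seq_id, value) VALUES (?, ?, ?)",
                (next_id, (next_id - 1) // 100, random.random()),
            )
            next_id += 1
        conn.execute("COMMIT")                # one commit covers the whole batch
        batches += 1
    return batches


# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT statements above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE demo (id INTEGER PRIMARY KEY, "
    "seq_id INTEGER NOT NULL, value REAL NOT NULL)"
)
batches = populate(conn, total_rows=10_000, batch_size=1_000)
row_count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
print(batches, row_count)
```

The batch size is the tuning knob: one row per transaction means one commit per row, while one transaction for all rows keeps the whole load in a single open transaction.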

Indexes
    CREATE INDEX IX_M3C_01_Seq ON M3C_01 (SeqID) WITH SORT_IN_TEMPDB
    CHECKPOINT
    CREATE INDEX IX_M3C_01_Dist ON M3C_01 (DistID) WITH SORT_IN_TEMPDB
    CHECKPOINT
    UPDATE STATISTICS M3C_01 (IX_M3C_01_Seq) WITH FULLSCAN
    UPDATE STATISTICS M3C_01 (IX_M3C_01_Dist) WITH FULLSCAN
- Common SeqID values are in adjacent rows
- Common DistID values are in separate 8KB pages (100 rows apart)

Test Queries
    -- Sequential rows, table scan
    SELECT AVG(rMoney) FROM M3C_01 WHERE SeqID =
    -- Sequential rows, index seek and bookmark lookup
    SELECT AVG(rMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Seq)) WHERE SeqID =
    -- Distributed rows, table scan
    SELECT AVG(rMoney) FROM M3C_01 WHERE DistID =
    -- Distributed rows, index seek and bookmark lookup
    SELECT AVG(rMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Dist)) WHERE DistID = 91

Execution Plans - Select
- Table scan involves 101,012 pages
- Bookmark Lookup involves 100,000 rows
- 1 Bookmark Lookup (BL) is ~3.6X more expensive than 1 page in a table scan

Table Scan Cost Detail
Table Scan Formula
- I/O: x 101,012 = 74.8
- CPU: x 10M = 11.0
I/O and CPU costs occasionally show ½ the expected value, but the combined cost shows the expected value.
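The per-page and per-row coefficients can be backed out of the totals on this slide. The sketch below is a worked check rather than anything from the deck: it divides the quoted totals (74.8 and 11.0) by the quoted page and row counts, so the results carry the rounding of those totals. Variable names are illustrative.

```python
# Figures quoted on the slide.
PAGES = 101_012
ROWS = 10_000_000
IO_COST_TOTAL = 74.8
CPU_COST_TOTAL = 11.0

# Coefficients implied by the totals.
io_cost_per_page = IO_COST_TOTAL / PAGES
cpu_cost_per_row = CPU_COST_TOTAL / ROWS

# If one cost unit is one second, this is the scan rate the model assumes.
implied_pages_per_sec = 1.0 / io_cost_per_page

# Combined plan cost for the table scan.
table_scan_cost = IO_COST_TOTAL + CPU_COST_TOTAL

print(f"I/O cost per page    : {io_cost_per_page:.7f}")
print(f"CPU cost per row     : {cpu_cost_per_row:.7f}")
print(f"implied pages/sec    : {implied_pages_per_sec:.0f}")
print(f"total table scan cost: {table_scan_cost:.1f}")
```

The implied scan rate of roughly 1350 pages per second agrees with the 1350/sec (8KB) figure in the unit-of-measure discussion.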

Index and Bookmark Details
Bookmark Lookup
- I/O: x 100K x 0.998 =
- CPU: x 100K = 0.11
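The stated ratio of about 3.6X between one bookmark lookup and one table scan page is enough to see why the optimizer prefers the table scan for 100K rows. The following sketch is an approximation built only from figures in the slides (table scan I/O 74.8, CPU 11.0, 101,012 pages, 100,000 lookups). It ignores the cost of the index seek itself, and the break-even row count is a derived estimate, not a number from the deck.

```python
# Figures quoted in the slides.
PAGES = 101_012
LOOKUP_ROWS = 100_000
TABLE_SCAN_COST = 74.8 + 11.0   # I/O + CPU totals for the full scan
BL_TO_PAGE_RATIO = 3.6          # "1 BL ~3.6X more expensive than 1 page"

# Cost of one scanned page (I/O plus CPU for the rows on it), and the
# approximate cost of one bookmark lookup derived from the ratio.
cost_per_scan_page = TABLE_SCAN_COST / PAGES
cost_per_lookup = BL_TO_PAGE_RATIO * cost_per_scan_page

# Approximate cost of fetching all 100K rows by bookmark lookup.
bookmark_plan_cost = LOOKUP_ROWS * cost_per_lookup
plan_ratio = bookmark_plan_cost / TABLE_SCAN_COST

# Row count at which the two plans cost about the same in this model.
break_even_rows = TABLE_SCAN_COST / cost_per_lookup

print(f"bookmark lookup plan cost : {bookmark_plan_cost:.1f}")
print(f"table scan plan cost      : {TABLE_SCAN_COST:.1f}")
print(f"lookup plan / scan plan   : {plan_ratio:.2f}")
print(f"break-even rows (approx.) : {break_even_rows:.0f}")
```

In this rough model the lookup plan costs several times the table scan at 100K rows, and the two plans cross over at somewhere under 30,000 rows.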

Measured Query Times
SELECT query, 100K rows

                        Sequential rows             Distributed rows
                        Index + BL    Table Scan    Index + BL    Table Scan
256M server mem
  Query time (sec)
  Rows or pages/sec     333,333 (R)   9,620 (P)     599 (R)       9,620 (P)
  Disk IO/sec           Low           ~1,200        ~600          ~1,200
  Avg. bytes/read       N/A           64K           8K            64K
1154MB server mem
  Query time (sec)
  Rows or pages/sec     376,000       93,877        268,000       92,672

Test system: 2x2.4GHz Xeon, data on two 15K disk drives
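The elapsed times for the 256M configuration can be estimated from the throughput figures that are given. This is a sketch under one assumption: that each rate is the number of rows (R) or pages (P) processed divided by the elapsed query time. The dictionary layout and function name are illustrative.

```python
ROWS = 100_000      # rows touched by each test query
PAGES = 101_012     # pages read by a full table scan
PAGE_KB = 8

# (row distribution, plan) -> (unit, rate per second), 256M configuration.
RATES_256M = {
    ("sequential", "index+BL"): ("rows", 333_333),
    ("sequential", "table scan"): ("pages", 9_620),
    ("distributed", "index+BL"): ("rows", 599),
    ("distributed", "table scan"): ("pages", 9_620),
}


def elapsed_seconds(unit: str, rate_per_sec: float) -> float:
    """Elapsed time implied by a rows/sec or pages/sec rate."""
    work = ROWS if unit == "rows" else PAGES
    return work / rate_per_sec


times = {key: elapsed_seconds(*value) for key, value in RATES_256M.items()}
scan_mb_per_sec = 9_620 * PAGE_KB / 1000   # table scan bandwidth

for (distribution, plan), seconds in times.items():
    print(f"{distribution:12s} {plan:10s} {seconds:8.1f} sec")
print(f"table scan bandwidth: {scan_mb_per_sec:.0f} MB/sec")
```

On that reading, the distributed index plus bookmark lookup case takes minutes where the sequential case takes a fraction of a second, and the table scan takes about ten seconds either way.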

Disk Bound Select Query Cost
- Performance is limited by disk capability
- Random I/O: 300/sec per disk (small portion of 18GB drive and high queue depth)
- Sequential I/O: 38MB/sec (Seagate ST318451, first-generation 15K drive)
- Disk drive random I/O has gained ~2X since the mid-1990s; sequential I/O ~5X
- Cost formulas underestimate current-generation disk drive sequential performance relative to random
- However, SQL Server cost formulas do not reflect in-memory costs
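The claim that the cost formulas underestimate sequential performance relative to random can be put in numbers. The sketch below compares the sequential-to-random ratio implied by the cost model rates quoted earlier (1350 pages/sec and 160 I/O per second) with the ratio measured on the test drives. Expressing the 38MB/sec figure in 8KB pages is my own conversion, and the names are illustrative.

```python
PAGE_KB = 8
DISKS = 2

# Rates implied by the cost model.
MODEL_SEQ_PAGES_PER_SEC = 1350
MODEL_RANDOM_IO_PER_SEC = 160

# Per-disk rates measured on the test system.
DISK_SEQ_MB_PER_SEC = 38
DISK_RANDOM_IO_PER_SEC = 300

model_ratio = MODEL_SEQ_PAGES_PER_SEC / MODEL_RANDOM_IO_PER_SEC
disk_seq_pages_per_sec = DISK_SEQ_MB_PER_SEC * 1000 / PAGE_KB
measured_ratio = disk_seq_pages_per_sec / DISK_RANDOM_IO_PER_SEC

# Two-disk totals, for comparison with the measured query figures.
array_random_io_per_sec = DISKS * DISK_RANDOM_IO_PER_SEC
array_seq_mb_per_sec = DISKS * DISK_SEQ_MB_PER_SEC

print(f"model sequential:random    = {model_ratio:.1f} : 1")
print(f"measured sequential:random = {measured_ratio:.1f} : 1")
print(f"two-disk random I/O per sec = {array_random_io_per_sec}")
print(f"two-disk sequential MB/sec  = {array_seq_mb_per_sec}")
```

The measured drives favor sequential access by nearly twice the margin the cost model assumes, and the two-disk totals line up with the ~600 random I/O per second and roughly 77MB/sec scan bandwidth seen in the measured SELECT queries.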

Update Operation

Update Details

Actual Cost - Update
UPDATE query, 100K rows

                          Sequential rows          Distributed rows
                          Index       Table Scan   Index       Table Scan
256M server mem
  Query time (sec)
  Checkpoint time (sec)
  Rows/sec                57,471
MB server mem
  Query time (sec)
  Checkpoint time (sec)
  Rows/sec                100,000     71,429       4,184       4,082

Update Variation
- Default plan is now a table scan
- Column value is not in the index, so a bookmark lookup is required
- However, the data page must be loaded into the buffer cache before it can be modified regardless!

Delete Operation

Delete Details

Delete Details (2)

Delete - Actual Costs
DELETE query, 100K rows

                          Sequential rows          Distributed rows
                          Index       Table Scan   Index       Table Scan
256M server mem
  Query time (sec)
  Checkpoint time (sec)
  Rows/sec                7,576
MB server mem
  Query time (sec)
  Checkpoint time (sec)
  Rows/sec                12,821      9,708        3,048       2,949

Delete - No Indexes
DELETE query, no index, 100K rows, table scan (sequential rows and distributed rows)
256M server mem
- Query time (sec):
- Checkpoint time (sec): 0.14
- Rows/sec: 8,621 (sequential rows)
MB server mem
- Query time (sec):
- Checkpoint time (sec): 0.222
- Rows/sec: 47,619 (sequential rows), 4,255 (distributed rows)
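The opening question, whether dropping indexes improves performance, can be checked against the delete rates that survive in these tables. The sketch below uses only the Rows/sec values from the second memory configuration (12,821, 9,708, 3,048 and 2,949 with indexes in place; 47,619 and 4,255 with no indexes). It assumes each rate is 100K rows divided by elapsed time, and it is not clear from the slides whether checkpoint time is included. Names are illustrative.

```python
ROWS = 100_000

# Rows/sec with nonclustered indexes in place: (distribution, plan) -> rate.
WITH_INDEXES = {
    ("sequential", "index"): 12_821,
    ("sequential", "table scan"): 9_708,
    ("distributed", "index"): 3_048,
    ("distributed", "table scan"): 2_949,
}

# Rows/sec with no nonclustered indexes (table scan only).
NO_INDEXES = {
    "sequential": 47_619,
    "distributed": 4_255,
}


def seconds_for(rate_per_sec: float) -> float:
    """Elapsed time to delete ROWS rows at the given rate."""
    return ROWS / rate_per_sec


# Speedup of the no-index delete over the table scan plan with indexes.
speedup_sequential = NO_INDEXES["sequential"] / WITH_INDEXES[("sequential", "table scan")]
speedup_distributed = NO_INDEXES["distributed"] / WITH_INDEXES[("distributed", "table scan")]

print(f"sequential : {seconds_for(NO_INDEXES['sequential']):.1f} sec without indexes, "
      f"{speedup_sequential:.1f}x faster")
print(f"distributed: {seconds_for(NO_INDEXES['distributed']):.1f} sec without indexes, "
      f"{speedup_distributed:.1f}x faster")
```

On these figures the delete itself is several times faster without indexes for sequential rows and only modestly faster for distributed rows; the time to rebuild the dropped indexes afterward is not part of this comparison.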

Delete with Foreign Keys

Summary
When large updates and deletes are slow:
- Examine the execution plan
- Look for nonclustered index seeks on modified tables with high row counts
- Use an index hint to force a table scan

Additional Information
- SQL Server Quantitative Performance Analysis
- Server System Architecture
- Processor Performance
- Direct Connect Gigabit Networking
- Parallel Execution Plans
- Large Data Operations
- Transferring Statistics
- SQL Server Backup Performance with Imceda LiteSpeed