Update Queries Deep Dive Conor Cunningham, Principal Software Architect, SQL QP Team, Microsoft.

Who Is Conor?
I’ve been at Microsoft for 13+ years building database engines (mostly Query Processors)
Spent 1-2 years outside the company
  ▫ Startups + Consulting
I like to talk to customers to help improve our future product offerings
I wrote the Optimizer chapter of the SQL 2008 Internals book
I blog at “Conor vs. SQL” on all things query

What You Will Learn In This Talk
How to read Insert/Update/Delete plans
Why the Optimizer picks various Update plan shapes
How the architecture of the system supports Update queries
This is a deep-dive, white-box discussion of the SQL QP
Note: There is far more on this subject than one can learn in an hour. We’ll cover a lot, but don’t expect to be an expert on everything after the talk.
Note 2: Most of this talk is beyond what CSS will support if you call them

Agenda
Basic Updates + Architecture Overview
Halloween Protection
Narrow vs. Wide Plans
Split, Sort, Collapse
FK validation
Locking Considerations
Indexed views
Updating views (not necessarily indexed)
Table/Index Partitioning

Vocabulary
Inserts, Updates, Deletes, and Merge are all related
I will say “Update” but usually mean all of the various data change commands (I/U/D/M)
  ▫ The internal operators are all the same, so I will often just call it “Update”

Basic Updates (I/U/D/M)
1. Read rows
2. Compute new values
3. Update rows
Every Update plan has this basic shape
Everything in Updates starts with this template
We do optimize a few cases down to a single operator (“Simple” Updates)

Logical Engine Architecture
Queries are executed across multiple code components
  ▫ Query Execution (QE), Storage Engine (SE), Buffer Pool (BP), Lock Manager (LM)
When a query reads rows from the SE, the SE locks those rows
Pages are read into the Buffer Pool and cached
The SQL QP does not _directly_ control locking
  ▫ The plan tells the SE the locking mode
  ▫ The SE does lock management and escalation
Uniqueness (UNIQUE, PK constraints) is enforced in the SE when a row is written

Index Physical Details
Indexes have extra columns
  ▫ Heaps have RIDs/Bookmarks (8 bytes)
  ▫ Non-unique Clustered Indexes have uniquifiers (4 bytes)
  ▫ Unique CIs have no extra columns
Secondary indexes link to the Heap/Clustered Index using these columns
Update plans maintain these extra columns for you
The QP uses the extra columns to do bookmark lookup/fetch
This means that when you update a clustering index key, the secondary indexes need to be updated as well
It also means that rebuilding a heap has to rebuild its secondary indexes, since RIDs are physical locators
Uniquifiers are assigned on row creation and are not changed during reorg/rebuild operations

Halloween Problem + Protection
(Originally found on Halloween)
    UPDATE SalaryTable SET Salary = Salary * 1.1 WHERE Salary < 25000;
Expectation: Each row is updated once
Actual (in this case): Every salary was multiplied by 1.1 repeatedly until none was below 25000
Problem: while scanning and updating the rows, rows were moving ahead in the scan order and being seen again (and again)
Solution: “Phase separation”. In SQL Server, this means Spooling. This is also called “Halloween Protection”
We have fancy logic to determine when we need phase separation (actually, when we can skip it)
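
The effect can be reproduced with a toy model. This is a minimal sketch, not SQL Server's implementation: the "index" is just a sorted Python list of salary values, `raise_salary` is an integer stand-in for `Salary * 1.1`, and the function names are made up for illustration. It assumes salaries of at least 10 so every raise actually increases the value.

```python
import bisect

LIMIT = 25000


def raise_salary(salary):
    # Integer form of "Salary * 1.1" so the toy result is deterministic.
    # Assumption: salary >= 10, otherwise the value would not grow.
    return salary * 11 // 10


def update_while_scanning(salaries):
    """No phase separation: update rows while scanning a salary-ordered index."""
    index = sorted(salaries)  # toy index: only the key values, in key order
    updates = 0
    # Every row behind the scan position has already moved ahead of it,
    # so the next row the scan reads is always the first entry.
    while index and index[0] < LIMIT:
        salary = index.pop(0)
        bisect.insort(index, raise_salary(salary))  # row re-enters ahead of the scan
        updates += 1
    return index, updates


def update_with_spool(salaries):
    """Phase separation: read every qualifying row first, then write."""
    index = sorted(salaries)
    spool = [s for s in index if s < LIMIT]  # phase 1: read only
    for salary in spool:                     # phase 2: write only
        index.remove(salary)
        bisect.insort(index, raise_salary(salary))
    return index, len(spool)
```

With salaries 10000, 20000 and 30000, the unprotected version keeps raising the first two rows until both reach the limit, while the spooled version touches each qualifying row exactly once.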

HP Example
On an empty table, the plan was a table scan with no HP (why is this legal?)
When I added enough rows, the plan became a Seek + Spool (why did it do this?)
Bottom line: some spools are needed for correctness
Bonus question: does one need this spool in snapshot isolation?

Narrow Plans, Wide Plans
Per-Row (narrow) vs. Per-Index (wide)
Updates that touch lots of rows tend to use wide update plans
  ▫ Sequential IO is cheaper than Random IO
  ▫ but there is a greater cost to batch/sort/spool
Some functional logic requires wide updates
  ▫ Indexed Views, Query Notifications

Narrow (Row at a Time) Plans
Narrow plans take 1 row, update all indexes, THEN go to the next row
You can see this in the SSMS Properties page
  ▫ Look at the object list

Wide Update Example
Accesses an index at a time
Common Subexpression Spools let us save off the set of changes and re-read them
Split, Spool, then Sort/Collapse per Index
Pattern: Update Heap/Clustered Idx, then other access paths
Engineering Limitation: some schema constructs ONLY work with wide update plans (Indexed Views, Query Notification)
Note: I cut out ComputeScalars
[Plan diagram labels: Write to Spool, Read Spool]
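
The difference between the two shapes is mostly the order in which index writes happen. The sketch below is a simplified model under stated assumptions: `changes` is a list of row dictionaries, each index is a hypothetical `(name, key_function)` pair, and a plain list stands in for the common subexpression spool. It only records the write order; it does not model the Split or Collapse steps.

```python
def narrow_plan(changes, indexes):
    """Per-row: apply one row's change to every index before taking the next row."""
    writes = []
    for row in changes:
        for name, key_of in indexes:
            writes.append((name, key_of(row)))
    return writes


def wide_plan(changes, indexes):
    """Per-index: save the changes once, then replay them for each index in key order."""
    spool = list(changes)  # stands in for the common subexpression spool
    writes = []
    for name, key_of in indexes:
        # Sorting by this index's key turns random index access into sequential access.
        for row in sorted(spool, key=key_of):
            writes.append((name, key_of(row)))
    return writes


# Hypothetical table with a clustered index on id and a secondary index on name.
CHANGES = [{"id": 3, "name": "a"}, {"id": 1, "name": "c"}, {"id": 2, "name": "b"}]
INDEXES = [("clustered", lambda r: r["id"]), ("ix_name", lambda r: r["name"])]
```

Calling `narrow_plan(CHANGES, INDEXES)` interleaves the two indexes row by row, whereas `wide_plan(CHANGES, INDEXES)` finishes all clustered index writes in id order before touching `ix_name` in name order.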

Split/Sort/Collapse
It is possible to get phantom (false) UNIQUE/PRIMARY KEY violations
Because the Storage Engine enforces uniqueness as each row is written, the order in which we apply changes matters: the SE will raise an error if the plan updates row 1 from v1 to v2 before it updates row 2 from v2 to v3
Example: Update T set col = col+1
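
The order dependence is easy to demonstrate with a toy unique index. This is an illustrative sketch only: a Python set stands in for the unique index, `UniqueViolation` and `apply_in_order` are made-up names, and uniqueness is checked on every write, as the slide describes for the Storage Engine.

```python
class UniqueViolation(Exception):
    """Raised when a write would create a duplicate key in the toy index."""


def apply_in_order(keys, updates):
    """Apply (old_key, new_key) updates one at a time, checking uniqueness per write."""
    index = set(keys)
    for old, new in updates:
        index.remove(old)
        if new in index:
            raise UniqueViolation(f"duplicate key {new}")
        index.add(new)
    return index


# "Update T set col = col + 1" over keys 1, 2, 3, expressed as key changes.
ASCENDING = [(1, 2), (2, 3), (3, 4)]
DESCENDING = [(3, 4), (2, 3), (1, 2)]
```

`apply_in_order({1, 2, 3}, ASCENDING)` raises `UniqueViolation` on the very first write even though the final state `{2, 3, 4}` is valid, while the same changes in `DESCENDING` order succeed. The violation is an artifact of the processing order, not of the statement.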

Split/Sort/Collapse Continued
The “Action” column controls Insert, Update, Delete
Sort is on (index key, operation)
Input:          Update 1→2, Update 2→3, Update 3→4, …
After Split:    Delete 1, Insert 2, Delete 2, Insert 3, Delete 3, Insert 4
After Sort:     Delete 1, Delete 2, Insert 2, Delete 3, Insert 3, Insert 4
After Collapse: Delete 1, Update 2→2, Update 3→3, Insert 4
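
The three steps can be sketched in a few lines. This is a simplified model that tracks keys only (real operators carry the full row and an action column); the function names are illustrative, and the assumption that deletes sort before inserts on the same key is taken from the ordering shown on the slide.

```python
def split(updates):
    """Turn each (old_key, new_key) update into a delete plus an insert."""
    ops = []
    for old, new in updates:
        ops.append(("delete", old))
        ops.append(("insert", new))
    return ops


def sort_ops(ops):
    """Sort on (index key, operation), with deletes ahead of inserts for the same key."""
    rank = {"delete": 0, "insert": 1}
    return sorted(ops, key=lambda op: (op[1], rank[op[0]]))


def collapse(ops):
    """Merge an adjacent delete+insert on the same key into a single update."""
    result = []
    i = 0
    while i < len(ops):
        paired = (
            i + 1 < len(ops)
            and ops[i][0] == "delete"
            and ops[i + 1][0] == "insert"
            and ops[i][1] == ops[i + 1][1]
        )
        if paired:
            result.append(("update", ops[i][1]))
            i += 2
        else:
            result.append(ops[i])
            i += 1
    return result


def split_sort_collapse(updates):
    return collapse(sort_ops(split(updates)))
```

For the updates 1→2, 2→3, 3→4 this yields delete 1, update 2, update 3, insert 4. Applied in that order, no write ever collides with an existing key, so the false uniqueness violation cannot occur.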

Referential Integrity (Foreign Keys)
Implemented as Semi-Join + Assert
Located AFTER the Update change
We impose restrictions on the FK topology so that it is a “tree”
Note: Updates surface the “after” image of a row
Bonus: the OUTPUT clause uses this stream as well
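
A rough model of the Semi-Join + Assert pattern follows. It is a sketch under assumptions: rows are dictionaries, the parent table is reduced to a set of key values, and `semi_join_probe` and `assert_references` are hypothetical names. It treats a NULL foreign key as unchecked, which is standard SQL behaviour.

```python
def semi_join_probe(changed_rows, parent_keys, fk_column):
    """Pair each changed child row with a flag: does its FK value have a parent?"""
    for row in changed_rows:
        value = row[fk_column]
        # A NULL (None) foreign key is not validated against the parent.
        yield row, value is None or value in parent_keys


def assert_references(probed_rows):
    """Fail the whole statement if any row arrives without a matching parent."""
    passed = []
    for row, matched in probed_rows:
        if not matched:
            raise ValueError(f"foreign key violation: {row!r}")
        passed.append(row)
    return passed
```

The probe runs over the "after" image of the changed rows, which is why it sits after the update operator in the plan: `assert_references(semi_join_probe(rows, parent_keys, "customer_id"))` either returns every row or aborts.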

Locking Considerations
Locks are taken implicitly, driven by the plan shape
We have heuristics on locking within a plan
We can still have deadlocks, however
Solutions:
  ▫ Read committed snapshot (optimistic concurrency)
  ▫ Query hints (examine both shapes)

Indexed View Updates: Delta Algebra
Update plans maintain IVs like secondary indexes
IVs have restrictions
  ▫ Only plan shapes with efficient maintenance
This is an “Update” example with a Group By
  ▫ SUM is commutative! (other aggregate functions may not be)
Key ideas:
  ▫ Start with the View Definition tree
  ▫ Replace the table in the view graph with a “delta” table
  ▫ Modify other operators in the view to maintain it
[Diagram. Original View: Group By (A := SUM(col1)) over MyTable. Delta View: Group By’ (Expr := F(…)) over ∆MyTable. Maintenance Plan, top to bottom: Update view, Collapse, Sort, Compute Scalar (compute new aggregate), Left Outer Join (Orig.Gbcols = Delta.Gbcols) of Delta View and Orig. View]
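
The delta idea can be sketched for a SUM ... GROUP BY view. This is a toy model, not the actual maintenance plan: the view is a dictionary mapping each group to `(sum, row_count)`, changes are `(action, group, value)` tuples, and the function names are illustrative. The row count plays the role of the COUNT_BIG(*) column that grouped indexed views must include, and tells us when a group disappears.

```python
from collections import defaultdict


def delta_view(changes):
    """Aggregate the changed rows the same way the view aggregates the table."""
    delta = defaultdict(lambda: [0, 0])  # group -> [sum delta, row count delta]
    for action, group, value in changes:
        sign = 1 if action == "insert" else -1  # a delete subtracts its old value
        delta[group][0] += sign * value
        delta[group][1] += sign
    return delta


def maintain_view(view, changes):
    """Join the delta groups to the stored view and write back the new aggregates."""
    for group, (dsum, dcount) in delta_view(changes).items():
        # Left outer join: a group missing from the view starts from (0, 0).
        old_sum, old_count = view.get(group, (0, 0))
        new_sum, new_count = old_sum + dsum, old_count + dcount
        if new_count == 0:
            view.pop(group, None)                # last row left: delete the view row
        else:
            view[group] = (new_sum, new_count)   # insert or update the view row
    return view
```

An update to a base row is expressed as a delete of the old value plus an insert of the new one. Because the view is adjusted from the delta alone, the base table never has to be rescanned.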

Indexed View Example
Every IV will have similar logic in Showplan
[Plan diagram labels: Original View, Delta View, Insert, Delete, Update]

Updating Views (not necessarily indexed)
An UPDATE against a view gets translated into an update against a base table
This then becomes an almost regular update case
There is logic in some views to guarantee that changes fit the view restrictions (WITH CHECK OPTION)
Same algebra as in indexed views

Table/Index Partitioning
Plan shapes differ in 2005 vs
  ▫ We continued to improve on the initial implementation to make it faster/use less memory
Partitioning adds the notion of finding the target partition and moving rows between partitions
Conceptually, the partition id is part of the key for Updates
  ▫ Split/Sort/Collapse works to understand deletes and inserts across partition boundaries
Most plans will look the same, but there are a few plan shapes that are “per-partition” plans
Note: Storage Engine rowsets are not re-entrant for updates. Partitions are each separate rowsets

Partitioning Example (non-collocated)
Partitioning causes different update plan behavior
  ▫ Usually, just add the partitioning keys to the base plan shapes
  ▫ Updates compute the partitioning function to find/update the rows
Differences
  ▫ ComputeScalar computes the target partition id
  ▫ Split/Sort/Collapse uses ptn function (to move rows to new ptn)
  ▫ Update operator is marked as “partitioned” (and switches partitions)
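
Treating the partition id as part of the key can be sketched as follows. This is a minimal model under assumptions: the partition function is a sorted list of boundary values where each boundary belongs to the partition on its left, partition numbers start at 1, and `partition_id` and `route_update` are hypothetical helper names.

```python
import bisect


def partition_id(boundaries, key):
    """1-based partition number; a boundary value falls in the partition on its left."""
    return bisect.bisect_left(boundaries, key) + 1


def route_update(boundaries, old_key, new_key):
    """Decide whether a partitioning key change stays in place or moves the row."""
    old_ptn = partition_id(boundaries, old_key)
    new_ptn = partition_id(boundaries, new_key)
    if old_ptn == new_ptn:
        return [("update", old_ptn, new_key)]
    # Crossing a boundary: the row is deleted from one rowset and inserted into another.
    return [("delete", old_ptn, old_key), ("insert", new_ptn, new_key)]
```

With boundaries `[10, 20]`, changing a key from 5 to 8 stays an update inside partition 1, while changing it from 5 to 15 becomes a delete in partition 1 and an insert in partition 2.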

THANK YOU
Thanks for attending the session
Questions?