SingingSQL Presents: Fixing Broken SQL March, 2006

Presentation transcript:

SingingSQL Presents: Fixing Broken SQL March, 2006 ©2006 Dan Tow, All Rights Reserved dantow@singingsql.com www.singingsql.com

Overview Introduction – the Game of Fixing Broken SQL SQL That Hits Duplicate Rows SQL That Returns Unexpected Results SQL That Doesn’t Grow with the Data

Introduction – Game of Fixing Broken SQL
Think of tuning as a game, with rules based on my own experience over more than a decade:
A SQL tuning specialist can tune much more SQL than he or she can originate, so you will mostly tune other people’s SQL!
You’ll know far less about the application than the original SQL coder, who knows the purpose best!
You could just focus on making individual queries faster, simply being a better optimizer than the CBO (Oracle’s cost-based optimizer).

Introduction – Game of Fixing Broken SQL
Don’t stop with just a performance-only fix!
Changing only how the SQL executes will often fail to deliver the best performance; the best performance fix is often combined with a functional fix!
Even without knowing the purpose of the SQL code, you can often deliver a functional fix that is independent of the performance fix!

Introduction – Game of Fixing Broken SQL If the SQL coder knows the code’s purpose best, how can you find a functional fix? As a SQL tuning specialist, you see much more SQL than an average coder – you will know SQL better than the average coder! There are recurring patterns that frequently point to functionally broken SQL, and you can recognize these patterns even without knowing the application, without knowing the detailed purpose of the SQL!

Problem 1
What’s wrong with this query?
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;

Answer 1
Note that Order_Types is never referenced outside the FROM clause! There is no join to T!
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;

Problem 1.1
What happens here?
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;

Answer 1.1
First, the database will get a Cartesian product of all the rows in Order_Types and all the rows returned by the query you want (the same query without Order_Types).
Then, the database will perform a sort-unique operation to discard all the duplicates generated by that Cartesian product.
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;

Answer 1.1 (cont.)
Functionally, the query result is probably fine! (When is it not?!)
The physical I/O is probably fine, since any extra reads will be to the same blocks, redundantly. This will actually improve your cache hit ratio (while harming your performance!).
The logical I/O may (or may not) be increased significantly, depending on the execution plan, but the CPU usage will almost certainly be increased significantly. (Looking for SQL with excessive logical I/O won’t always find these problems!)
Without the DISTINCT, we lose an implicit sort. We might need an “ORDER BY” to restore that sort.
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;

Fixed Query 1, Conservative Fix
Note the added “ORDER BY”.
SELECT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID
ORDER BY O.Order_ID, L.Line_ID;
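The effect of the unjoined table in Problem 1 can be reproduced on a toy schema. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for Oracle, keeps just three of the slide's tables, and all rows and the customer ID 10 are invented.

```python
import sqlite3

# Hypothetical miniature schema: table and column names follow the slides,
# the rows are invented. SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Customer_ID INTEGER);
    CREATE TABLE Order_Lines (Line_ID INTEGER PRIMARY KEY, Order_ID INTEGER,
                              Quantity INTEGER);
    CREATE TABLE Order_Types (Order_Type_ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Orders VALUES (1, 10);
    INSERT INTO Order_Lines VALUES (100, 1, 2), (101, 1, 5);
    INSERT INTO Order_Types VALUES (1, 'STANDARD'), (2, 'RUSH'), (3, 'BACKORDER');
""")

select_list = "O.Order_ID, L.Line_ID, L.Quantity"
where = "WHERE O.Customer_ID = 10 AND L.Order_ID = O.Order_ID"

# Unjoined Order_Types, no DISTINCT: each wanted row appears once per order type.
raw = conn.execute(
    f"SELECT {select_list} FROM Orders O, Order_Types T, Order_Lines L {where}"
).fetchall()

# DISTINCT hides the duplicates, but only after they have been generated.
masked = conn.execute(
    f"SELECT DISTINCT {select_list} "
    f"FROM Orders O, Order_Types T, Order_Lines L {where}"
).fetchall()

# Conservative fix: drop the unjoined table and the DISTINCT, add an ORDER BY.
fixed = conn.execute(
    f"SELECT {select_list} FROM Orders O, Order_Lines L {where} "
    "ORDER BY O.Order_ID, L.Line_ID"
).fetchall()

print(len(raw), len(masked), len(fixed))  # 6 2 2
```

With three order types and two order lines, the unjoined query builds six rows that DISTINCT then has to reduce back to two; the fixed query never builds the extra rows.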

Overview Introduction – the Game of Fixing Broken SQL SQL That Hits Duplicate Rows SQL That Returns Unexpected Results SQL That Doesn’t Grow with the Data

Problem 2
What’s wrong with this query?
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;

Answer 2
Maybe nothing is wrong with this query.
Possibly, the DISTINCT is just a back-handed way to get a necessary sort.
Possibly, the sort is unnecessary and is a (probably mild) waste of time.
However, the DISTINCT may be a red flag pointing to a hidden many-to-many join.
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;

Problem 2.1
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;
.1 SELECT STATEMENT
..2 SORT UNIQUE
...3 NESTED LOOPS
....4 NESTED LOOPS
.....5 NESTED LOOPS
......6 TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS
.......7 INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id
......6 TABLE ACCESS BY INDEX ROWID 1*ORDERS
.......7 INDEX RANGE SCAN ORDERS_N4: customer_id
.....5 TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES
......6 INDEX RANGE SCAN ORDER_LINES_N1: order_id
....4 TABLE ACCESS BY INDEX ROWID 3*ITEMS
.....5 INDEX RANGE SCAN ITEMS_U1: item_id,org_id

Clues 2.1
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;
.1 SELECT STATEMENT
..2 SORT UNIQUE
...3 NESTED LOOPS
....4 NESTED LOOPS
.....5 NESTED LOOPS
......6 TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS
.......7 INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id
......6 TABLE ACCESS BY INDEX ROWID 1*ORDERS
.......7 INDEX RANGE SCAN ORDERS_N4: customer_id
.....5 TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES
......6 INDEX RANGE SCAN ORDER_LINES_N1: order_id
....4 TABLE ACCESS BY INDEX ROWID 3*ITEMS
.....5 INDEX RANGE SCAN ITEMS_U1: item_id,org_id

Answer 2.1
The WHERE clause is missing a condition on I.Org_ID that would make the join to Items unique.
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;
.1 SELECT STATEMENT
..2 SORT UNIQUE
...3 NESTED LOOPS
....4 NESTED LOOPS
.....5 NESTED LOOPS
......6 TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS
.......7 INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id
......6 TABLE ACCESS BY INDEX ROWID 1*ORDERS
.......7 INDEX RANGE SCAN ORDERS_N4: customer_id
.....5 TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES
......6 INDEX RANGE SCAN ORDER_LINES_N1: order_id
....4 TABLE ACCESS BY INDEX ROWID 3*ITEMS
.....5 INDEX RANGE SCAN ITEMS_U1: item_id,org_id

Issues with Problem 2
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID;
Observations regarding the non-unique join to Items:
If every Item sharing the same Item_ID also shares the same Name, then this query will return one row per order line (as long as the lines themselves differ), thanks to the DISTINCT.
If different Orgs might name an Item differently, then the query is almost certainly wrong, functionally.
The inner join on I.Item_ID alone might succeed where an inner join on both I.Item_ID and I.Org_ID would fail, causing a subtle functional difference between the two alternatives.
The query implies a design question: should there be an Items table keyed on Item_ID alone, and an Items_Orgs table keyed on Item_ID, Org_ID?
What are the odds that the original developers thought all this through and got it right?!? What are the odds that the corner cases were well tested?!?

Fixed Query 2, Alternative Fixes
SELECT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND I.Org_ID = :B2
  AND C.Customer_ID = O.Customer_ID
ORDER BY O.Order_ID, L.Quantity, I.Name;
Alternative: in place of the condition on :B2, use
  AND I.Org_ID = L.Shipping_Org_ID
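A small experiment makes the partial-key join concrete. This is a sketch under stated assumptions: SQLite (via Python's sqlite3) stands in for Oracle, the schema is cut down to two tables, and the rows, including the org-specific item names, are invented. The last query is one possible way to check whether a join column is unique on its own; it is not something the slides prescribe.

```python
import sqlite3

# Hypothetical data: Items is keyed on (Item_ID, Org_ID), as the unique index
# ITEMS_U1 in the slides suggests. Rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Order_Lines (Line_ID INTEGER PRIMARY KEY, Order_ID INTEGER,
                              Item_ID INTEGER, Quantity INTEGER);
    CREATE TABLE Items (Item_ID INTEGER, Org_ID INTEGER, Name TEXT,
                        PRIMARY KEY (Item_ID, Org_ID));
    INSERT INTO Order_Lines VALUES (100, 1, 7, 2);
    INSERT INTO Items VALUES (7, 1, 'Widget'), (7, 2, 'Widget (EU)');
""")

# Join on Item_ID alone: only part of the key, so one line matches two Items
# rows. DISTINCT cannot collapse them because the names differ by org.
partial_key = conn.execute("""
    SELECT DISTINCT L.Order_ID, L.Quantity, I.Name
    FROM Order_Lines L, Items I
    WHERE I.Item_ID = L.Item_ID
    ORDER BY I.Name
""").fetchall()

# Join on the full key: one row per order line, and no DISTINCT is needed.
full_key = conn.execute("""
    SELECT L.Order_ID, L.Quantity, I.Name
    FROM Order_Lines L, Items I
    WHERE I.Item_ID = L.Item_ID
      AND I.Org_ID = ?
""", (1,)).fetchall()

# Sanity check: which Item_ID values are NOT unique on their own?
dup_keys = conn.execute("""
    SELECT Item_ID, COUNT(*) FROM Items
    GROUP BY Item_ID HAVING COUNT(*) > 1
""").fetchall()

print(partial_key)  # [(1, 2, 'Widget'), (1, 2, 'Widget (EU)')]
print(full_key)     # [(1, 2, 'Widget')]
print(dup_keys)     # [(7, 2)]
```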

Many-to-Many Joins Many-to-many joins are almost always wrong! Many-to-many joins come from a combination of SQL errors and database-design errors. DISTINCT is a common band-aid that covers up most, but usually not all of the row errors that result from erroneous many-to-many joins! (I do not mean by this that the resulting SQL is correct!) The underlying problem calls for more than a partial, band-aid fix!

A Modest Proposal Oracle should have a parameter, settable at the session level especially for testing, that triggers an error any time the session executes SQL with a many-to-many join! More-sophisticated uniqueness constraints (for example, uniqueness of a column combination just for a particular subset of a table) would help, here.

Problem 3
What’s wrong with this query?
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name Customer, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND I.Org_ID = :B2
  AND C.Customer_ID = O.Customer_ID;

Problem 3
Consider a possible result: 2 lines, shipped to two addresses:
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name Customer, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND I.Org_ID = :B2
  AND C.Customer_ID = O.Customer_ID;
ORDER_ID ORDER_DATE CUSTOMER   QTY ITEM
-------- ---------- ---------- --- ----------------
   38485 07-MAR-06  C. Millsap   1 SQL For Dummies
   38485 07-MAR-06  C. Millsap   2 SQL For Dummies
What if the second line had QTY=1 ?!?

Problem 3
Consider a possible result: 2 lines, shipped to two addresses, same Qty, same Item:
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name Customer, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND I.Org_ID = :B2
  AND C.Customer_ID = O.Customer_ID;
ORDER_ID ORDER_DATE CUSTOMER   QTY ITEM
-------- ---------- ---------- --- ----------------
   38485 07-MAR-06  C. Millsap   1 SQL For Dummies
Does this accurately reflect the shipped order ?!?
Get rid of the DISTINCT!
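The lost order line can be reproduced in a few statements. In this sketch (Python's sqlite3 standing in for Oracle) the schema is deliberately simplified: the item name is stored directly on the hypothetical Order_Lines table instead of being joined from Items, and the two rows mirror the scenario on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical table: the item name is stored on the line.
    CREATE TABLE Order_Lines (Line_ID INTEGER PRIMARY KEY, Order_ID INTEGER,
                              Item_Name TEXT, Quantity INTEGER);
    -- Two lines of one order: same item, same quantity, different addresses.
    INSERT INTO Order_Lines VALUES (1, 38485, 'SQL For Dummies', 1),
                                   (2, 38485, 'SQL For Dummies', 1);
""")

# DISTINCT silently merges the two legitimate lines into one row.
with_distinct = conn.execute(
    "SELECT DISTINCT Order_ID, Quantity, Item_Name FROM Order_Lines"
).fetchall()

# Without DISTINCT both lines survive, and the shipped total is right.
without_distinct = conn.execute(
    "SELECT Order_ID, Quantity, Item_Name FROM Order_Lines ORDER BY Line_ID"
).fetchall()
total_shipped = sum(qty for _, qty, _ in without_distinct)

print(len(with_distinct), len(without_distinct), total_shipped)  # 1 2 2
```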

Problem 4
What’s wrong with this query?
SELECT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID
GROUP BY O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name;

Answer 4
This is just Problem 1 in disguise: the GROUP BY just performs a back-handed DISTINCT operation!
SELECT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND C.Customer_ID = O.Customer_ID
GROUP BY O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name;

Problem 5
Is there a difference?
SELECT … FROM … WHERE … AND :B1 = 1
UNION
SELECT … FROM … WHERE … AND :B1 = 2;
Versus:
SELECT … FROM … WHERE … AND :B1 = 1
UNION ALL
SELECT … FROM … WHERE … AND :B1 = 2;

Problem 5
SELECT … FROM … WHERE … AND :B1 = 1
UNION
SELECT … FROM … WHERE … AND :B1 = 2;
We normally think of UNION as an operation that discards duplicates between different SELECT blocks, but the above case never has such duplicates.
However, UNION also discards duplicates (most likely unintentionally!) within each SELECT block!
UNION acts like DISTINCT, here!
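The difference is easy to observe when one branch legitimately returns two identical rows. The sketch below is illustrative only: the Payments table, its rows, and the run helper are hypothetical, and SQLite (through Python's sqlite3) stands in for Oracle; the bind variable plays the role of :B1, switching on exactly one branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; two open payments happen to look identical.
    CREATE TABLE Payments (Payment_ID INTEGER PRIMARY KEY, Customer TEXT,
                           Amount INTEGER, Status TEXT);
    INSERT INTO Payments VALUES (1, 'Acme', 50, 'OPEN'),
                                (2, 'Acme', 50, 'OPEN'),
                                (3, 'Zenith', 75, 'CLOSED');
""")

def run(set_operator, b1):
    """Run the two-branch query with UNION or UNION ALL; b1 picks the branch."""
    sql = f"""
        SELECT Customer, Amount FROM Payments
        WHERE Status = 'OPEN' AND :b1 = 1
        {set_operator}
        SELECT Customer, Amount FROM Payments
        WHERE Status = 'CLOSED' AND :b1 = 2
    """
    return conn.execute(sql, {"b1": b1}).fetchall()

# Only the first branch is active, so there are no cross-branch duplicates,
# yet UNION still removes the duplicate inside that branch.
union_rows = run("UNION", 1)
union_all_rows = run("UNION ALL", 1)

print(union_rows)      # [('Acme', 50)]
print(union_all_rows)  # [('Acme', 50), ('Acme', 50)]
```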

Unnecessary Sort-Unique Operations
DISTINCT is more often wrong than right, frequently (imperfectly!) patching a deeper, underlying problem.
DISTINCT is not the only way to get a bad sort-unique step, though!
UNION can trigger an incorrect sort-unique step: remember that it does not just discard duplicates between different SELECT blocks!
GROUP BY can trigger an incorrect sort-unique step, too.

Overview Introduction – the Game of Fixing Broken SQL SQL That Hits Duplicate Rows SQL That Returns Unexpected Results SQL That Doesn’t Grow with the Data

SQL That Returns Unexpected Results Fun with Outer Joins View-Using SQL

Fun with Outer Joins Ever heard something like this? “No wonder this application has so many performance problems – just look at how many outer joins are scattered all through the SQL!”

Fun with Outer Joins Ever heard something like this? “No wonder this application has so many performance problems – just look at how many outer joins are scattered all through the SQL!” The “problem of outer joins” is largely a myth! I have never seen a query where an outer join was the true root cause of a query’s poor performance! Correct outer joins perform as well as inner joins! However, outer joins are subtle and easy to mis-code.

Problem 6
What’s wrong with this query?
SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID(+) = L.Item_ID
  AND NVL(I.Org_ID,:B2) = :B2
  AND C.Customer_ID = O.Customer_ID;

Problem 6
SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID(+) = L.Item_ID
  AND NVL(I.Org_ID,:B2) = :B2
  AND C.Customer_ID = O.Customer_ID;
It’s functionally fine in the case where there is a matching Item having the correct Org_ID (even if there are also matching Items with other Org_IDs).
It’s functionally fine when there are no matching Items at all.

Problem 6
SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID(+) = L.Item_ID
  AND NVL(I.Org_ID,:B2) = :B2
  AND C.Customer_ID = O.Customer_ID;
It is almost certainly wrong in the (possibly very rare) case where there are matching Items, but they all have the wrong Org_IDs!

Problem 6
SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID(+) = L.Item_ID
  AND NVL(I.Org_ID,:B2) = :B2
  AND C.Customer_ID = O.Customer_ID;
It performs badly because it joins to more Items than necessary!

Fixed Query 6
SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID(+) = L.Item_ID
  AND I.Org_ID(+) = :B2
  AND C.Customer_ID = O.Customer_ID;
Now, the outer join is clean: order lines are never discarded, and performance is better!
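The three cases discussed for Problem 6 can be checked on a toy data set. This sketch makes several assumptions: SQLite (via Python's sqlite3) stands in for Oracle, so the Oracle-specific (+) notation is rewritten as an ANSI LEFT JOIN and NVL as COALESCE; the schema is reduced to two tables; and all rows are invented. Putting the Org_ID condition in the ON clause corresponds to writing I.Org_ID(+) = :B2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Order_Lines (Line_ID INTEGER PRIMARY KEY, Item_ID INTEGER);
    CREATE TABLE Items (Item_ID INTEGER, Org_ID INTEGER, Name TEXT,
                        PRIMARY KEY (Item_ID, Org_ID));
    -- Line 1: item exists in the wanted org (1) and in another org.
    -- Line 2: item exists, but only in the wrong org.
    -- Line 3: item does not exist at all.
    INSERT INTO Order_Lines VALUES (1, 7), (2, 8), (3, 9);
    INSERT INTO Items VALUES (7, 1, 'Widget'), (7, 2, 'Widget (EU)'),
                             (8, 2, 'Gadget (EU)');
""")

# Problem 6 pattern: outer join on Item_ID only, Org_ID filtered afterwards.
broken = conn.execute("""
    SELECT L.Line_ID, I.Name
    FROM Order_Lines L
    LEFT JOIN Items I ON I.Item_ID = L.Item_ID
    WHERE COALESCE(I.Org_ID, :b2) = :b2
    ORDER BY L.Line_ID
""", {"b2": 1}).fetchall()

# Fixed pattern: the Org_ID condition is part of the outer join itself.
fixed = conn.execute("""
    SELECT L.Line_ID, I.Name
    FROM Order_Lines L
    LEFT JOIN Items I ON I.Item_ID = L.Item_ID AND I.Org_ID = :b2
    ORDER BY L.Line_ID
""", {"b2": 1}).fetchall()

print(broken)  # [(1, 'Widget'), (3, None)]  -> line 2 was silently dropped
print(fixed)   # [(1, 'Widget'), (2, None), (3, None)]
```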

Problem 7
What’s wrong with this query?
SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item, IT.Name Type
FROM Orders O, Order_Lines L, Items I, Customers C, Item_Types IT
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID(+) = L.Item_ID
  AND I.Org_ID(+) = :B2
  AND C.Customer_ID = O.Customer_ID
  AND I.Item_Type_ID = IT.Item_Type_ID;

Problem 7
SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item, IT.Name Type
FROM Orders O, Order_Lines L, Items I, Customers C, Item_Types IT
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID(+) = L.Item_ID
  AND I.Org_ID(+) = :B2
  AND C.Customer_ID = O.Customer_ID
  AND I.Item_Type_ID = IT.Item_Type_ID;
The inner join from I to IT discards the outer case of the outer join from L to I.
The “(+)”s could be removed without changing the functional result, and this would be clearer, if this is the desired result.
If you don’t want all inner joins, make the join from I to IT an outer join.

First Fix 7
SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item, IT.Name Type
FROM Orders O, Order_Lines L, Items I, Customers C, Item_Types IT
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID = L.Item_ID
  AND I.Org_ID = :B2
  AND C.Customer_ID = O.Customer_ID
  AND I.Item_Type_ID = IT.Item_Type_ID;
The result is now consistent with the original functionality, but is clearer to understand!

Alternate Fix 7
SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item, IT.Name Type
FROM Orders O, Order_Lines L, Items I, Customers C, Item_Types IT
WHERE O.Customer_ID = :B1
  AND L.Order_ID = O.Order_ID
  AND I.Item_ID(+) = L.Item_ID
  AND I.Org_ID(+) = :B2
  AND C.Customer_ID = O.Customer_ID
  AND I.Item_Type_ID = IT.Item_Type_ID(+);
The result is now consistent with the intent implied by the original outer join from L to I!
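The way a downstream inner join cancels an upstream outer join can be seen with three small tables. As before, this is a sketch with assumptions: SQLite through Python's sqlite3 replaces Oracle, ANSI JOIN syntax replaces (+), the Org_ID condition is left out for brevity, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Order_Lines (Line_ID INTEGER PRIMARY KEY, Item_ID INTEGER);
    CREATE TABLE Items (Item_ID INTEGER PRIMARY KEY, Name TEXT,
                        Item_Type_ID INTEGER);
    CREATE TABLE Item_Types (Item_Type_ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Item_Types VALUES (1, 'Book');
    INSERT INTO Items VALUES (7, 'SQL Guide', 1);
    -- Line 2 points at an item that has no Items row.
    INSERT INTO Order_Lines VALUES (1, 7), (2, 99);
""")

def lines(join_to_items, join_to_types):
    """Run the query with the given join keywords for L->I and I->IT."""
    return conn.execute(f"""
        SELECT L.Line_ID, I.Name, IT.Name
        FROM Order_Lines L
        {join_to_items} Items I ON I.Item_ID = L.Item_ID
        {join_to_types} Item_Types IT ON IT.Item_Type_ID = I.Item_Type_ID
        ORDER BY L.Line_ID
    """).fetchall()

mixed = lines("LEFT JOIN", "JOIN")           # Problem 7 as written
all_inner = lines("JOIN", "JOIN")            # First Fix 7
all_outer = lines("LEFT JOIN", "LEFT JOIN")  # Alternate Fix 7

print(mixed == all_inner)  # True: the outer join bought nothing
print(all_outer)           # [(1, 'SQL Guide', 'Book'), (2, None, None)]
```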

Problem 8
What’s wrong with this query?
SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, Em.Last_Name, Em.First_Name
FROM Customers C, Customer_Events CE, Events Ev, Employees Em
WHERE C.Customer_ID = :B1
  AND C.Customer_ID = CE.Customer_ID
  AND CE.Event_ID = Ev.Event_ID
  AND CE.Recorded_By = Em.Employee_ID
ORDER BY CE.Event_Date;

View-Using SQL
SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, Em.Last_Name, Em.First_Name
FROM Customers C, Customer_Events CE, Events Ev, Employees Em
WHERE C.Customer_ID = :B1
  AND C.Customer_ID = CE.Customer_ID
  AND CE.Event_ID = Ev.Event_ID
  AND CE.Recorded_By = Em.Employee_ID
ORDER BY CE.Event_Date;
The query seems fine, but you happen to notice that there is no “Employees” table mentioned in the execution plan! You look up the Employees definition:

View-Using SQL
Does this view create a problem in the context of the view-using SQL in the last slide?
CREATE VIEW Employees AS
SELECT P.Person_ID Employee_ID, P.Last_Name, P.First_Name, D.Name Department_Name, M.Full_Name Manager_Name
FROM Persons P, Persons M, Person_Roles R, Departments D
WHERE P.Mgr_ID = M.Person_ID(+)
  AND P.Department_ID = D.Department_ID(+)
  AND P.Person_ID = R.Person_ID
  AND R.Role_Code = 'EMP'
  AND SYSDATE BETWEEN R.Start_Date AND NVL(R.End_Date,SYSDATE);

First Fix, Problem 8
Don’t lose records just because they were recorded by someone who is no longer an employee!
SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, Em.Last_Name, Em.First_Name
FROM Customers C, Customer_Events CE, Events Ev, Employees Em
WHERE C.Customer_ID = :B1
  AND C.Customer_ID = CE.Customer_ID
  AND CE.Event_ID = Ev.Event_ID
  AND CE.Recorded_By = Em.Employee_ID(+)
ORDER BY CE.Event_Date;

Second Fix, Problem 8
Don’t null out recorded-by names just because they were recorded by someone who is no longer an employee! This performs better! Are we done?
SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, Em.Last_Name, Em.First_Name
FROM Customers C, Customer_Events CE, Events Ev, Persons Em
WHERE C.Customer_ID = :B1
  AND C.Customer_ID = CE.Customer_ID
  AND CE.Event_ID = Ev.Event_ID
  AND CE.Recorded_By = Em.Person_ID(+)
ORDER BY CE.Event_Date;

Problem 8 (Cont.)
Are we done?
SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, Em.Last_Name, Em.First_Name
FROM Customers C, Customer_Events CE, Events Ev, Persons Em
WHERE C.Customer_ID = :B1
  AND C.Customer_ID = CE.Customer_ID
  AND CE.Event_ID = Ev.Event_ID
  AND CE.Recorded_By = Em.Person_ID(+)
ORDER BY CE.Event_Date;

Third Fix, Problem 8
Now we are done!
SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, Em.Last_Name, Em.First_Name
FROM Persons C, Customer_Events CE, Events Ev, Persons Em
WHERE C.Person_ID = :B1
  AND C.Person_ID = CE.Customer_ID
  AND CE.Event_ID = Ev.Event_ID
  AND CE.Recorded_By = Em.Person_ID(+)
ORDER BY CE.Event_Date;
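The hidden filter inside a view can be demonstrated with a cut-down version of the Employees view. This sketch is an approximation, not the slide's schema: SQLite (via Python's sqlite3) stands in for Oracle, the view keeps only the role join, the date-range test on SYSDATE is replaced by the simpler assumption that a NULL End_Date means "still employed", and all names and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (Person_ID INTEGER PRIMARY KEY, Last_Name TEXT);
    CREATE TABLE Person_Roles (Person_ID INTEGER, Role_Code TEXT, End_Date TEXT);
    CREATE TABLE Customer_Events (Event_ID INTEGER PRIMARY KEY,
                                  Recorded_By INTEGER, Event_Date TEXT);
    -- Simplified stand-in for the Employees view: current employees only.
    CREATE VIEW Employees AS
        SELECT P.Person_ID AS Employee_ID, P.Last_Name
        FROM Persons P, Person_Roles R
        WHERE P.Person_ID = R.Person_ID
          AND R.Role_Code = 'EMP'
          AND R.End_Date IS NULL;
    INSERT INTO Persons VALUES (1, 'Nguyen'), (2, 'Okafor');
    INSERT INTO Person_Roles VALUES (1, 'EMP', NULL), (2, 'EMP', '2005-12-31');
    INSERT INTO Customer_Events VALUES (1, 1, '2006-01-10'), (2, 2, '2005-06-01');
""")

def events(join_clause):
    """Return (event, recorder name) pairs for the given join to the recorder."""
    return conn.execute(f"""
        SELECT CE.Event_ID, Em.Last_Name
        FROM Customer_Events CE {join_clause}
        ORDER BY CE.Event_ID
    """).fetchall()

# Original: inner join to the view drops events recorded by ex-employees.
original = events("JOIN Employees Em ON CE.Recorded_By = Em.Employee_ID")
# First fix: outer join keeps the event but nulls out the recorder's name.
first_fix = events("LEFT JOIN Employees Em ON CE.Recorded_By = Em.Employee_ID")
# Second fix: outer join to the base table keeps the event and the name.
second_fix = events("LEFT JOIN Persons Em ON CE.Recorded_By = Em.Person_ID")

print(original)    # [(1, 'Nguyen')]
print(first_fix)   # [(1, 'Nguyen'), (2, None)]
print(second_fix)  # [(1, 'Nguyen'), (2, 'Okafor')]
```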

Overview Introduction – the Game of Fixing Broken SQL SQL That Hits Duplicate Rows SQL That Returns Unexpected Results SQL That Doesn’t Grow with the Data

Problem 9
What’s wrong with this query?
SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE
WHERE O.Customer_ID = :B1
  AND O.Order_ID = L.Order_ID
  AND O.Order_ID = E.Order_ID
  AND E.Event_Type = 'LATE_SHIPMENT'
  AND L.Line_ID = LE.Line_ID
ORDER BY O.Due_Date;

Problem 9
SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE
WHERE O.Customer_ID = :B1
  AND O.Order_ID = L.Order_ID
  AND O.Order_ID = E.Order_ID
  AND E.Event_Type = 'LATE_SHIPMENT'
  AND L.Line_ID = LE.Line_ID
ORDER BY O.Due_Date;
Note Order_Events and Order_Line_Events, with line events presumably serving as details for the order events.

Problem 9
SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE
WHERE O.Customer_ID = :B1
  AND O.Order_ID = L.Order_ID
  AND O.Order_ID = E.Order_ID
  AND E.Event_Type = 'LATE_SHIPMENT'
  AND L.Line_ID = LE.Line_ID
ORDER BY O.Due_Date;
In any row this query returns, Order_Events and Order_Line_Events both join to the same order, but do they pertain to the same event?!?
This query implies significant built-in assumptions:

Problem 9
SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE
WHERE O.Customer_ID = :B1
  AND O.Order_ID = L.Order_ID
  AND O.Order_ID = E.Order_ID
  AND E.Event_Type = 'LATE_SHIPMENT'
  AND L.Line_ID = LE.Line_ID
ORDER BY O.Due_Date;
Assumptions (these may be valid today, but tomorrow?):
There is at most one event of type 'LATE_SHIPMENT' per order.
There are no line event details for any event types other than the 'LATE_SHIPMENT' event.
There is at most one order with any given due date per customer.

Fix for Problem 9
SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, E.Order_Event_ID, L.Line_ID, L.Item_ID
FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE
WHERE O.Customer_ID = :B1
  AND O.Order_ID = L.Order_ID
  AND O.Order_ID = E.Order_ID
  AND E.Event_Type = 'LATE_SHIPMENT'
  AND E.Order_Event_ID = LE.Order_Event_ID
  AND L.Line_ID = LE.Line_ID
ORDER BY O.Due_Date, O.Order_ID, E.Order_Event_ID, L.Line_ID;
Robust behavior:
Now guarantees one row per late-shipment event line, even under future data conditions.
Sorts sensibly: due date first, then orders, then order events, and order lines last.
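What happens when the first assumption breaks (a second late-shipment event on the same order) can be simulated directly. The sketch below assumes SQLite via Python's sqlite3 in place of Oracle, drops the Orders table and the customer filter for brevity, and uses invented IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Order_Lines (Line_ID INTEGER PRIMARY KEY, Order_ID INTEGER);
    CREATE TABLE Order_Events (Order_Event_ID INTEGER PRIMARY KEY,
                               Order_ID INTEGER, Event_Type TEXT);
    CREATE TABLE Order_Line_Events (Order_Event_ID INTEGER, Line_ID INTEGER);
    -- One order, two lines, and TWO late-shipment events (one per line).
    INSERT INTO Order_Lines VALUES (10, 1), (11, 1);
    INSERT INTO Order_Events VALUES (500, 1, 'LATE_SHIPMENT'),
                                    (501, 1, 'LATE_SHIPMENT');
    INSERT INTO Order_Line_Events VALUES (500, 10), (501, 11);
""")

base = """
    SELECT E.Order_Event_ID, L.Line_ID
    FROM Order_Lines L, Order_Events E, Order_Line_Events LE
    WHERE L.Order_ID = E.Order_ID
      AND E.Event_Type = 'LATE_SHIPMENT'
      AND L.Line_ID = LE.Line_ID
      {extra}
    ORDER BY E.Order_Event_ID, L.Line_ID
"""

# Original join: events and line events are tied only through the order,
# so every line is paired with every late-shipment event of that order.
original = conn.execute(base.format(extra="")).fetchall()

# Fix: also tie each line event to the order event it belongs to.
fixed = conn.execute(
    base.format(extra="AND E.Order_Event_ID = LE.Order_Event_ID")
).fetchall()

print(original)  # [(500, 10), (500, 11), (501, 10), (501, 11)]
print(fixed)     # [(500, 10), (501, 11)]
```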

SQL That Doesn’t Grow with the Data Usually requires SQL changes. Often requires schema design changes (hard to make, but even harder to make later!). Seeing the assumptions requires at least a little bit of understanding of the application, but most often no more than you can guess from table and column names!

Questions?