Download presentation
Presentation is loading. Please wait.
Published byAmos Stewart Modified over 6 years ago
1
SingingSQL Presents: Fixing Broken SQL March, 2006
©2006 Dan Tow, All Rights Reserved
2
Overview Introduction – the Game of Fixing Broken SQL
SQL That Hits Duplicate Rows SQL That Returns Unexpected Results SQL That Doesn’t Grow with the Data
3
Introduction – Game of Fixing Broken SQL
Think of tuning as a game: based on my own experiences over more than the past decade: A SQL tuning specialist can tune much more SQL than he or she can originate – tune other people’s SQL! You’ll know far less about the application than the original SQL coder – they know the purpose best! You could just focus on making individual queries faster – just be a better optimizer than the CBO.
4
Introduction – Game of Fixing Broken SQL
Don’t stop with just a performance-only fix! Just changing the performance of the SQL will often fail to deliver the best performance fix! – the best performance fix is often combined with a functional fix! Even without knowing the purpose of the SQL code, you can often deliver a functional fix that is independent of the performance fix!
5
Introduction – Game of Fixing Broken SQL
If the SQL coder knows the code’s purpose best, how can you find a functional fix? As a SQL tuning specialist, you see much more SQL than an average coder – you will know SQL better than the average coder! There are recurring patterns that frequently point to functionally broken SQL, and you can recognize these patterns even without knowing the application, without knowing the detailed purpose of the SQL!
6
What’s wrong with this query?
Problem 1 What’s wrong with this query? SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID;
7
Note that Order_Types is never referenced!
Answer 1 Note that Order_Types is never referenced! There is no join to T!! SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID;
8
Problem 1.1 What happens here?
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID;
9
Answer 1.1 First, the database will get a Cartesian Product of all the rows in Order_Types and all the rows returned by the query you want (the query in black). Then, the database will perform a sort-unique operation to discard all the duplicates generated by that Cartesian product. SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID;
10
Answer 1.1(cont.) Functionally, the query result is probably fine! (When is it not?!) The physical I/O is probably fine, since any extra reads will be to the same blocks, redundantly. This will actually improve your cache hit ratio (while harming your performance!). The logical I/O may (or may not) be increased significantly, depending on the execution plan, but the CPU usage will almost certainly be increased significantly. (Looking for SQL with excessive logical I/O won’t always find these problems!) Without the DISTINCT, we lose an implicit sort. We might need an “ORDER BY” to restore that sort. SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID;
11
Fixed Query 1, Conservative Fix
Note the added “ORDER BY” SELECT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID ORDER BY O.Order_ID, L.Line_ID;
12
Overview Introduction – the Game of Fixing Broken SQL
SQL That Hits Duplicate Rows SQL That Returns Unexpected Results SQL That Doesn’t Grow with the Data
13
What’s wrong with this query?
Problem 2 What’s wrong with this query? SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID;
14
Answer 2 Maybe, nothing is wrong with this query.
Possibly, the DISTINCT is just a back-handed way to get a necessary sort. Possibly, the sort is unnecessary and is a (probably mild) waste of time. However, the DISTINCT may be a red flag pointing to a hidden many-to-many join. SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID;
15
Problem 2.1 SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID; .1 SELECT STATEMENT ..2 SORT UNIQUE ...3 NESTED LOOPS NESTED LOOPS NESTED LOOPS TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id TABLE ACCESS BY INDEX ROWID 1*ORDERS INDEX RANGE SCAN ORDERS_N4: customer_id TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES INDEX RANGE SCAN ORDER_LINES_N1: order_id TABLE ACCESS BY INDEX ROWID 3*ITEMS INDEX RANGE SCAN ITEMS_U1: item_id,org_id
16
Clues 2.1 AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID;
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID; .1 SELECT STATEMENT ..2 SORT UNIQUE ...3 NESTED LOOPS NESTED LOOPS NESTED LOOPS TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id TABLE ACCESS BY INDEX ROWID 1*ORDERS INDEX RANGE SCAN ORDERS_N4: customer_id TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES INDEX RANGE SCAN ORDER_LINES_N1: order_id TABLE ACCESS BY INDEX ROWID 3*ITEMS INDEX RANGE SCAN ITEMS_U1: item_id,org_id
17
Answer 2.1 SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID; .1 SELECT STATEMENT ..2 SORT UNIQUE ...3 NESTED LOOPS NESTED LOOPS NESTED LOOPS TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id TABLE ACCESS BY INDEX ROWID 1*ORDERS INDEX RANGE SCAN ORDERS_N4: customer_id TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES INDEX RANGE SCAN ORDER_LINES_N1: order_id TABLE ACCESS BY INDEX ROWID 3*ITEMS INDEX RANGE SCAN ITEMS_U1: item_id,org_id WHERE clause is missing condition on I.Org_ID that would make the join to Items unique.
18
Issues with Problem 2 SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID; Observations regarding the non-unique join to Items: If every Item sharing the same Item_ID also shares the same Name, then this query will return one row per Order, thanks to the DISTINCT. If different Orgs might name an Item differently, then the query is almost certainly wrong, functionally. The inner join to I.Item_ID might succeed where an inner join to I.Item_ID and to I.Org_ID might fail, causing a subtle functional difference between the two alternatives. The query implies a design question: should there be an Items table keyed on Item_ID, alone, and an Items_Orgs table keyed on Item_ID, Org_ID? What are the odds that the original developers thought all this through and got it right?!? What are the odds that the corner cases were well tested?!?
19
Fixed Query 2, Alternative Fixes
SELECT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND I.Org_ID = :B2 AND C.Customer_ID = O.Customer_ID ORDER BY O.Order_ID, L.Quantity, I.Name; AND I.Org_ID = L.Shipping_Org_ID
20
Many-to-Many Joins Many-to-many joins are almost always wrong!
Many-to-many joins come from a combination of SQL errors and database-design errors. DISTINCT is a common band-aid that covers up most, but usually not all of the row errors that result from erroneous many-to-many joins! (I do not mean by this that the resulting SQL is correct!) The underlying problem calls for more than a partial, band-aid fix!
21
A Modest Proposal Oracle should have a parameter, settable at the session level especially for testing, that triggers an error any time the session executes SQL with a many-to-many join! More-sophisticated uniqueness constraints (for example, uniqueness of a column combination just for a particular subset of a table) would help, here.
22
What’s wrong with this query?
Problem 3 What’s wrong with this query? SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name Customer, L.Quantity Qty, I.Name Item FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND I.Org_ID = :B2 AND C.Customer_ID = O.Customer_ID;
23
What if the second line had QTY=1 ?!?
Problem 3 Consider a possible result: 2 lines, shipped to two addresses: SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name Customer, L.Quantity Qty, I.Name Item FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND I.Org_ID = :B2 AND C.Customer_ID = O.Customer_ID; ORDER_ID ORDER_DATE CUSTOMER QTY ITEM MAR-06 C. Millsap 1 SQL For Dummies MAR-06 C. Millsap 2 SQL For Dummies What if the second line had QTY=1 ?!?
24
Does this accurately reflect the shipped order ?!?
Problem 3 Consider a possible result: 2 lines, shipped to two addresses, same Qty, Same Item: SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name Customer, L.Quantity Qty, I.Name Item FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND I.Org_ID = :B2 AND C.Customer_ID = O.Customer_ID; ORDER_ID ORDER_DATE CUSTOMER QTY ITEM MAR-06 C. Millsap 1 SQL For Dummies Does this accurately reflect the shipped order ?!? Get rid of the DISTINCT!
25
What’s wrong with this query?
Problem 4 What’s wrong with this query? SELECT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID GROUP BY O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name;
26
Answer 4 This is just Problem 1 in disguise: the GROUP BY just performs a back-handed DISTINCT operation! SELECT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND C.Customer_ID = O.Customer_ID GROUP BY O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name;
27
Problem 5 Is there a difference? SELECT … FROM … WHERE … AND :B1 = 1
UNION SELECT … FROM … WHERE … AND :B1 = 2; Versus: UNION ALL SELECT … FROM … WHERE …
28
Problem 5 SELECT … FROM … WHERE … AND :B1 = 1
UNION SELECT … FROM … WHERE … AND :B1 = 2; We normally think of UNION as an operation that discards duplicates between different SELECT blocks, but the above case never has such duplicates. However, UNION also discards duplicates (most likely unintentionally!) within each SELECT block! UNION acts like DISTINCT, here!
29
Unnecessary Sort-Unique Operations
DISTINCT is more often wrong than right, frequently (imperfectly!) patching a deeper, underlying problem. DISTINCT is not the only way to get a bad sort-unique step, though! UNION can trigger an incorrect sort-unique step: remember that it does not just discard duplicates in different SELECT blocks!. GROUP BY can trigger an incorrect sort-unique step, too.
30
Overview Introduction – the Game of Fixing Broken SQL
SQL That Hits Duplicate Rows SQL That Returns Unexpected Results SQL That Doesn’t Grow with the Data
31
SQL That Returns Unexpected Results
Fun with Outer Joins View-Using SQL
32
Fun with Outer Joins Ever heard something like this?
“No wonder this application has so many performance problems – just look at how many outer joins are scattered all through the SQL!”
33
Fun with Outer Joins Ever heard something like this?
“No wonder this application has so many performance problems – just look at how many outer joins are scattered all through the SQL!” The “problem of outer joins” is largely a myth! I have never seen a query where an outer join was the true root cause of a query’s poor performance! Correct outer joins perform as well as inner joins! However, outer joins are subtle and easy to mis-code.
34
What’s wrong with this query?
Problem 6 What’s wrong with this query? SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID(+) = L.Item_ID AND NVL(I.Org_ID,:B2) = :B2 AND C.Customer_ID = O.Customer_ID;
35
Problem 6 SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID(+) = L.Item_ID AND NVL(I.Org_ID,:B2) = :B2 AND C.Customer_ID = O.Customer_ID; It’s functionally fine in the case where there is a matching item having the correct Org_ID (even if there are also matching items with other Org_IDs). It’s functionally fine when there are no matching Items at all.
36
Problem 6 SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID(+) = L.Item_ID AND NVL(I.Org_ID,:B2) = :B2 AND C.Customer_ID = O.Customer_ID; It is almost certainly wrong in the (possibly very rare) case where there are matching Items, but they all have the wrong Org_IDs!
37
Problem 6 SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID(+) = L.Item_ID AND NVL(I.Org_ID,:B2) = :B2 AND C.Customer_ID = O.Customer_ID; It performs badly because it joins to more Items than necessary!
38
Fixed Query 6 SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item FROM Orders O, Order_Lines L, Items I, Customers C WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID(+) = L.Item_ID AND I.Org_ID(+) = :B2 AND C.Customer_ID = O.Customer_ID; Now, the outer join is clean: order lines are never discarded, and performance is better!
39
What’s wrong with this query?
Problem 7 What’s wrong with this query? SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item, IT.Name Type FROM Orders O, Order_Lines L, Items I, Customers C, Item_Types IT WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID(+) = L.Item_ID AND I.Org_ID(+) = :B2 AND C.Customer_ID = O.Customer_ID AND I.Item_Type_ID = IT.Item_Type_ID;
40
Problem 7 SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item, IT.Name Type FROM Orders O, Order_Lines L, Items I, Customers C, Item_Types IT WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID(+) = L.Item_ID AND I.Org_ID(+) = :B2 AND C.Customer_ID = O.Customer_ID AND I.Item_Type_ID = IT.Item_Type_ID; The inner join from I to IT discards the outer case of the outer join from L to I. The “(+)”s could be removed without changing the functional result, and this would be clearer, if this is the desired result. If you don’t want all inner joins, make I to IT outer.
41
First Fix 7 SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item, IT.Name Type FROM Orders O, Order_Lines L, Items I, Customers C, Item_Types IT WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID = L.Item_ID AND I.Org_ID = :B2 AND C.Customer_ID = O.Customer_ID AND I.Item_Type_ID = IT.Item_Type_ID; The result is now consistent with the original functionality, but is clearer to understand!
42
Alternate Fix 7 SELECT O.Order_ID, O.Order_Date, C.Name Customer, L.Type, L.Cost, L.Quantity Qty, I.Name Item, IT.Name Type FROM Orders O, Order_Lines L, Items I, Customers C, Item_Types IT WHERE O.Customer_ID = :B1 AND L.Order_ID = O.Order_ID AND I.Item_ID(+) = L.Item_ID AND I.Org_ID(+) = :B2 AND C.Customer_ID = O.Customer_ID AND I.Item_Type_ID = IT.Item_Type_ID(+); The result is now consistent with the intent implied by the original outer join from L to I!
43
What’s wrong with this query?
Problem 8 What’s wrong with this query? SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, EM.Last_Name, Em.First_Name FROM Customers C, Customer_Events CE, Events Ev, Employees Em WHERE C.Customer_ID = :B1 AND C.Customer_ID = CE.Customer_ID AND CE.Event_ID = Ev.Event_ID AND CE.Recorded_By = Em.Employee_ID ORDER BY CE.Event_Date;
44
View-Using SQL SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, Em.Last_Name , Em.First_Name FROM Customers C, Customer_Events CE, Events Ev, Employees Em WHERE C.Customer_ID = :B1 AND C.Customer_ID = CE.Customer_ID AND CE.Event_ID = Ev.Event_ID AND CE.Recorded_By = Em.Employee_ID ORDER BY CE.Event_Date; The query seems fine, but you happen to notice that there is no “Employees” table mentioned in the execution plan! You look up the Employees definition:
45
View-Using SQL Does this view create a problem in the context of the view-using SQL in the last slide? CREATE VIEW Employees AS SELECT P.Person_ID Employee_ID, P.Last_Name, P.First_Name, D.Name Department_Name, M.Full_Name Manager_Name FROM Persons P, Persons M, Person_Roles R, Departments D WHERE P.Mgr_ID = M.Person_ID(+) AND P.Department_ID = D.Department_ID(+) AND P.Person_ID = R.Person_ID AND R.Role_Code = 'EMP' AND SYSDATE BETWEEN R.Start_Date AND NVL(R.END_DATE,SYSDATE);
46
First Fix, Problem 8 Don’t lose records just because they were recorded by someone who is no longer an employee! SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, EM.Last_Name, Em.First_Name FROM Customers C, Customer_Events CE, Events Ev, Employees Em WHERE C.Customer_ID = :B1 AND C.Customer_ID = CE.Customer_ID AND CE.Event_ID = Ev.Event_ID AND CE.Recorded_By = Em.Employee_ID(+) ORDER BY CE.Event_Date;
47
Second Fix, Problem 8 Don’t null out recorded-by names just because they were recorded by someone who is no longer an employee! This performs better! Are we done? SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, EM.Last_Name, Em.First_Name FROM Customers C, Customer_Events CE, Events Ev, Persons Em WHERE C.Customer_ID = :B1 AND C.Customer_ID = CE.Customer_ID AND CE.Event_ID = Ev.Event_ID AND CE.Recorded_By = Em.Person_ID(+) ORDER BY CE.Event_Date;
48
Problem 8 (Cont.) Are we done?
SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, EM.Last_Name, Em.First_Name FROM Customers C, Customer_Events CE, Events Ev, Persons Em WHERE C.Customer_ID = :B1 AND C.Customer_ID = CE.Customer_ID AND CE.Event_ID = Ev.Event_ID AND CE.Recorded_By = Em.Person_ID(+) ORDER BY CE.Event_Date;
49
Third Fix, Problem 8 Now we are done!
SELECT C.Last_Name, C.First_Name, CE.Event_Date, Ev.Name, EM.Last_Name, Em.First_Name FROM Persons C, Customer_Events CE, Events Ev, Persons Em WHERE C.Person_ID = :B1 AND C.Person_ID = CE.Customer_ID AND CE.Event_ID = Ev.Event_ID AND CE.Recorded_By = Em.Person_ID(+) ORDER BY CE.Event_Date;
50
Overview Introduction – the Game of Fixing Broken SQL
SQL That Hits Duplicate Rows SQL That Returns Unexpected Results SQL That Doesn’t Grow with the Data
51
What’s wrong with this query?
Problem 9 What’s wrong with this query? SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE WHERE O.Customer_ID = :B1 AND O.Order_ID = L.Order_ID AND O.Order_ID = E.Order_ID AND E.Event_Type = 'LATE_SHIPMENT' AND L.Line_ID = LE.Line_ID ORDER BY O.Due_Date;
52
Problem 9 SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE WHERE O.Customer_ID = :B1 AND O.Order_ID = L.Order_ID AND O.Order_ID = E.Order_ID AND E.Event_Type = 'LATE_SHIPMENT' AND L.Line_ID = LE.Line_ID ORDER BY O.Due_Date; Note Order_Events and Order_Line_Events, with line events presumably serving as details for the order events.
53
Problem 9 SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE WHERE O.Customer_ID = :B1 AND O.Order_ID = L.Order_ID AND O.Order_ID = E.Order_ID AND E.Event_Type = 'LATE_SHIPMENT' AND L.Line_ID = LE.Line_ID ORDER BY O.Due_Date; Order_Events and Order_Line_Events both join to the same order, in any row this query returns, but do they pertain to the same event?!? This query implies significant built-in assumptions:
54
Problem 9 SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE WHERE O.Customer_ID = :B1 AND O.Order_ID = L.Order_ID AND O.Order_ID = E.Order_ID AND E.Event_Type = 'LATE_SHIPMENT' AND L.Line_ID = LE.Line_ID ORDER BY O.Due_Date; Assumptions (These may be valid, today, but tomorrow?): There is at most one event of type 'LATE_SHIPMENT' per order. There are no line event details for any event types other than the 'LATE_SHIPMENT' event. There is at most one order with any given due date per customer.
55
Fix for Problem 9 SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, E.Order_Event_ID, L.Line_ID, L.Item_ID FROM Orders O, Order_Lines L, Order_Events E, Order_Line_Events LE WHERE O.Customer_ID = :B1 AND O.Order_ID = L.Order_ID AND O.Order_ID = E.Order_ID AND E.Event_Type = 'LATE_SHIPMENT' AND E.Order_Event_ID = LE.Order_Event_ID AND L.Line_ID = LE.Line_ID ORDER BY O.Due_Date, O.Order_ID, E.Order_Event_ID, L.Line_ID; Robust Behavior: Now guarantees one row per late-shipment event line, even under future data conditions. Sorts sensibly, due date first, orders next, order events, next, order lines last.
56
SQL That Doesn’t Grow with the Data
Usually requires SQL changes. Often requires schema design changes (hard to make, but even harder to make later!). Seeing the assumptions requires at least a little bit of understanding of the application, but most often no more than you can guess from table and column names!
57
Questions?
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.