Published by Arabella Lorin Ramsey. Modified over 8 years ago.
1
Installment 17: Toil and Trouble
Graduate Institute of Information Management, first-year master's student, 690530012, 涂延坤
2
Outline
C. J. Date’s position:
- Duplicate rows should never have been permitted -- David Beech’s argument
- Given that they are permitted, they ought to be avoided in practice -- Expression Transformation
Conclusion
Technical Correspondence
3
David Beech’s argument
Part 1: Date’s opinion 1
- Why are duplicates good?
- Why are duplicates bad?
- Positional addressing
Part 2: Date’s opinion 2
- Bags are defined in terms of sets
4
David Beech’s argument, Part 1: Why are duplicates good?
- Duplicates occur naturally in practice.
- Given that, it is a burden to invent an artificial identifier just to distinguish between them.
5
David Beech’s argument, Part 1: Why are duplicates bad?
- Individual objects must be identifiable.
- The objects in a collection with no duplicates (a mathematical set) obviously have identity: they are self-identifying.
  Ex: {3, 6, 8, 11} is a set; {3, 6, 6, 8, 8, 8, 11} is not a set but a multiset, or bag.
- What is the identification mechanism in a collection that permits duplicates?
6
David Beech’s argument, Part 1: Positional addressing
- An artificial value: an ordering for the collection of objects.
  Ex: in {3, 6, 6, 8, 8, 8, 11}, the two “6”s occupy the second and third positions with respect to that ordering.
  This does not work in the relational model, because it needs additional operators (e.g. “insert this new row here”).
- A “natural” identifier: a column value.
  This still works in the relational model.
7
David Beech’s argument, Part 2: Bags are defined in terms of sets
- Bag theory assumes that there is a way to count duplicates.
- Each bag element has a hidden identifying tag that distinguishes it.
- The bag is a set of tag/element pairs.
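The point that a bag reduces to a set of tag/element pairs can be sketched in a few lines of Python. This is only an illustration: the function names and the choice of sequence numbers as the hidden tags are assumptions made here, not something the slides specify.

```python
from collections import Counter

def bag_to_tagged_set(bag):
    """Model a bag as a set of (tag, element) pairs.

    The tags are hypothetical sequence numbers; any values that
    distinguish the duplicates would serve equally well.
    """
    return {(tag, element) for tag, element in enumerate(bag, start=1)}

def count_duplicates(tagged):
    """Counting occurrences is possible only because the tags keep duplicates apart."""
    return Counter(element for _tag, element in tagged)

bag = [3, 6, 6, 8, 8, 8, 11]          # the bag from the slides
tagged = bag_to_tagged_set(bag)       # a genuine set: all 7 pairs are distinct
counts = count_duplicates(tagged)
plain_set = set(bag)                  # without tags the duplicates collapse

print(len(tagged), counts[6], counts[8], sorted(plain_set))
# prints: 7 2 3 [3, 6, 8, 11]
```

Once the tags are dropped, the three 8s are indistinguishable and there is nothing left to count, which is the sense in which the bag depends on the underlying set.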
9
Expression Transformation
- Expression transformations that are valid in the relational model are not necessarily valid in the presence of duplicates.
- Ex: List part numbers for parts that either are screws or are supplied by supplier S1, or both.

P                  SP
P#   PNAME         S#   P#
P1   Screw         S1   P1
P1   Screw         S1   P1
P1   Screw         S1   P2
P2   Screw
10
1. SELECT P# FROM P
   WHERE PNAME = 'Screw'
   OR P# IN ( SELECT P# FROM SP WHERE S# = 'S1' );
   Result: P1 × 3, P2 × 1
2. SELECT P# FROM SP
   WHERE S# = 'S1'
   OR P# IN ( SELECT P# FROM P WHERE PNAME = 'Screw' );
   Result: P1 × 2, P2 × 1
11
3. SELECT P.P# FROM P, SP
   WHERE ( S# = 'S1' AND P.P# = SP.P# )
   OR PNAME = 'Screw';
4. SELECT SP.P# FROM P, SP
   WHERE ( S# = 'S1' AND P.P# = SP.P# )
   OR PNAME = 'Screw';
The Cartesian product of P and SP has 4 × 3 = 12 rows, and every one of them satisfies PNAME = 'Screw', so all 12 rows qualify.
   Result 3 (P.P#): P1 × 9, P2 × 3
   Result 4 (SP.P#): P1 × 8, P2 × 4
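Why formulations 3 and 4 differ can be checked directly. The sketch below loads the slides' sample rows into an in-memory SQLite database through Python's sqlite3 module and counts how often each part number comes back. The column names PNO and SNO are stand-ins chosen here for P# and S#, and SQLite is just one convenient engine; another DBMS may behave differently.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- PNO and SNO stand in for the slides' P# and S# (an assumption).
    -- No keys are declared, so duplicate rows are accepted.
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

condition = "(SNO = 'S1' AND P.PNO = SP.PNO) OR PNAME = 'Screw'"

# Size of the Cartesian product that both formulations filter.
product_size = conn.execute("SELECT COUNT(*) FROM P, SP").fetchone()[0]

# Formulation 3 projects the part number from P, formulation 4 from SP.
f3 = Counter(r[0] for r in conn.execute(f"SELECT P.PNO FROM P, SP WHERE {condition}"))
f4 = Counter(r[0] for r in conn.execute(f"SELECT SP.PNO FROM P, SP WHERE {condition}"))

print(product_size, dict(sorted(f3.items())), dict(sorted(f4.items())))
# prints: 12 {'P1': 9, 'P2': 3} {'P1': 8, 'P2': 4}
```

Each row of P is paired with all three rows of SP, and each row of SP with all four rows of P, so projecting from one side or the other multiplies the duplicates differently.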
12
5. SELECT P# FROM P WHERE PNAME = 'Screw'
   UNION ALL
   SELECT P# FROM SP WHERE S# = 'S1';
   Result: P1 × 5, P2 × 2
13
6. SELECT DISTINCT P# FROM P WHERE PNAME = 'Screw'
   UNION ALL
   SELECT P# FROM SP WHERE S# = 'S1';
   Result: P1 × 3, P2 × 2
7. SELECT P# FROM P WHERE PNAME = 'Screw'
   UNION ALL
   SELECT DISTINCT P# FROM SP WHERE S# = 'S1';
   Result: P1 × 4, P2 × 2
14
8. SELECT DISTINCT P# FROM P
   WHERE PNAME = 'Screw'
   OR P# IN ( SELECT P# FROM SP WHERE S# = 'S1' );
   Result: P1 × 1, P2 × 1
9. SELECT DISTINCT P# FROM SP
   WHERE S# = 'S1'
   OR P# IN ( SELECT P# FROM P WHERE PNAME = 'Screw' );
   Result: P1 × 1, P2 × 1
15
10. SELECT P# FROM P
    GROUP BY P#, PNAME
    HAVING PNAME = 'Screw'
    OR P# IN ( SELECT P# FROM SP WHERE S# = 'S1' );
    GROUP BY P#, PNAME leaves two groups: (P1, Screw) and (P2, Screw).
    Result: P1 × 1, P2 × 1
16
11. SELECT P.P# FROM P, SP
    GROUP BY P.P#, PNAME, S#, SP.P#
    HAVING ( S# = 'S1' AND P.P# = SP.P# )
    OR PNAME = 'Screw';
    GROUP BY P.P#, PNAME, S#, SP.P# collapses the 12-row Cartesian product of P and SP into four groups:
    (P1, Screw, S1, P1), (P1, Screw, S1, P2), (P2, Screw, S1, P1), (P2, Screw, S1, P2).
    Result: P1 × 2, P2 × 2
17
12. SELECT P# FROM P WHERE PNAME = 'Screw'
    UNION
    SELECT P# FROM SP WHERE S# = 'S1';
    Result: P1 × 1, P2 × 1
18
Expression Transformation
- The twelve different formulations produce nine different results.
- Duplicate rows therefore act as a significant optimization inhibitor, with these consequences:
  - The optimizer code is expensive and unreliable.
  - System performance is awful.
  - The user is involved in figuring out the best way to state a given query.
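The "twelve formulations, nine results" count can be reproduced with a short script. The sketch below runs all twelve queries against the slides' sample rows in an in-memory SQLite database. PNO and SNO are stand-in column names for P# and S#, and SQLite is only one possible engine, so this checks the count on one system rather than making a claim about every DBMS.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- PNO and SNO stand in for the slides' P# and S# (an assumption).
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);   -- no keys: duplicates allowed
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

FORMULATIONS = {
    1: "SELECT PNO FROM P WHERE PNAME = 'Screw' "
       "OR PNO IN (SELECT PNO FROM SP WHERE SNO = 'S1')",
    2: "SELECT PNO FROM SP WHERE SNO = 'S1' "
       "OR PNO IN (SELECT PNO FROM P WHERE PNAME = 'Screw')",
    3: "SELECT P.PNO FROM P, SP "
       "WHERE (SNO = 'S1' AND P.PNO = SP.PNO) OR PNAME = 'Screw'",
    4: "SELECT SP.PNO FROM P, SP "
       "WHERE (SNO = 'S1' AND P.PNO = SP.PNO) OR PNAME = 'Screw'",
    5: "SELECT PNO FROM P WHERE PNAME = 'Screw' "
       "UNION ALL SELECT PNO FROM SP WHERE SNO = 'S1'",
    6: "SELECT DISTINCT PNO FROM P WHERE PNAME = 'Screw' "
       "UNION ALL SELECT PNO FROM SP WHERE SNO = 'S1'",
    7: "SELECT PNO FROM P WHERE PNAME = 'Screw' "
       "UNION ALL SELECT DISTINCT PNO FROM SP WHERE SNO = 'S1'",
    8: "SELECT DISTINCT PNO FROM P WHERE PNAME = 'Screw' "
       "OR PNO IN (SELECT PNO FROM SP WHERE SNO = 'S1')",
    9: "SELECT DISTINCT PNO FROM SP WHERE SNO = 'S1' "
       "OR PNO IN (SELECT PNO FROM P WHERE PNAME = 'Screw')",
    10: "SELECT PNO FROM P GROUP BY PNO, PNAME "
        "HAVING PNAME = 'Screw' OR PNO IN (SELECT PNO FROM SP WHERE SNO = 'S1')",
    11: "SELECT P.PNO FROM P, SP GROUP BY P.PNO, PNAME, SNO, SP.PNO "
        "HAVING (SNO = 'S1' AND P.PNO = SP.PNO) OR PNAME = 'Screw'",
    12: "SELECT PNO FROM P WHERE PNAME = 'Screw' "
        "UNION SELECT PNO FROM SP WHERE SNO = 'S1'",
}

# For each formulation, count how many times each part number is returned.
results = {n: Counter(row[0] for row in conn.execute(sql))
           for n, sql in FORMULATIONS.items()}

# Two formulations agree only if they return the same part numbers
# with the same degree of duplication.
distinct_results = {tuple(sorted(c.items())) for c in results.values()}

for n, counts in results.items():
    print(n, dict(sorted(counts.items())))
print(len(distinct_results), "distinct results")   # prints: 9 distinct results
```

Only formulations 8, 9, 10 and 12 agree with one another; each of the other eight returns its own pattern of duplicates.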
19
The puzzle corner problem: try out the twelve formulations, and any others you can think of, on your own DBMS.
   SELECT DISTINCT P.P# FROM P
   WHERE PNAME = 'Screw'
   OR EXISTS ( SELECT * FROM SP WHERE P.P# = SP.P# AND S# = 'S1' );
20
Expression Transformation
- To ensure that query results contain no duplicates, specify DISTINCT.
- Note: SQL systems are unable to optimize properly over duplicate elimination, because they lack knowledge of key inheritance (see installment 9).
22
Conclusion
- Duplicate row support should be dropped.
- Codd’s corrective steps for duplicate rows:
  1. Install a “two-position switch” in the DBMS so that the DBA can specify whether duplicates are to be eliminated (a) automatically or (b) only on user request.
  2. Support for (b) will be phased out in about two years.
  3. Drop the support for duplicate rows, and improve the optimizer.
24
Technical Correspondence
Argument 1: constructing a theory of bags does not require assuming that there is a way to count duplicates.
Date’s response: bag theory
- must be more complex than set theory;
- includes set theory as a proper subset;
- is reducible to set theory.
25
Technical Correspondence
Argument 2: the example of rats and ratlets
Date’s response:
- When there’s a new litter, we make the obvious entry in LITTERS.
- When an individual ratlet becomes “interesting”, we make the obvious entry in RATLETS.
   LITTERS ( LITTER_ID, #_OF_RATLETS )
      PRIMARY KEY ( LITTER_ID )
   RATLETS ( RATLET_ID, LITTER_ID )
      PRIMARY KEY ( RATLET_ID )
      FOREIGN KEY ( LITTER_ID ) REFERENCES LITTERS
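The LITTERS/RATLETS design can be tried out as follows. This is a minimal sketch on SQLite via Python's sqlite3 module: the column name NUM_RATLETS replaces #_OF_RATLETS so that it needs no quoting, and the sample identifiers ('L1', 'R1', 'L9') and the litter size of 8 are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE LITTERS (
        LITTER_ID   TEXT PRIMARY KEY,
        NUM_RATLETS INTEGER NOT NULL       -- stands in for #_OF_RATLETS
    );
    CREATE TABLE RATLETS (
        RATLET_ID TEXT PRIMARY KEY,
        LITTER_ID TEXT NOT NULL REFERENCES LITTERS (LITTER_ID)
    );
""")

# A new litter is one row carrying a count, not eight indistinguishable rows.
conn.execute("INSERT INTO LITTERS VALUES ('L1', 8)")

# A ratlet gets its own row, and its own identifier, only once it is "interesting".
conn.execute("INSERT INTO RATLETS VALUES ('R1', 'L1')")

# A ratlet that refers to an unknown litter is rejected by the foreign key.
try:
    conn.execute("INSERT INTO RATLETS VALUES ('R2', 'L9')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

litter_rows = conn.execute("SELECT COUNT(*) FROM LITTERS").fetchone()[0]
ratlet_rows = conn.execute("SELECT COUNT(*) FROM RATLETS").fetchone()[0]
print(litter_rows, ratlet_rows, orphan_rejected)   # prints: 1 1 True
```

Neither table ever holds duplicate rows: anonymous ratlets are represented by the count, and identified ones by their key.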