Installment 17: Toil and Trouble. Graduate Institute of Information Management, first-year master's student, 690530012, 涂延坤.


Outline
C. J. Date's position:
Duplicate rows should never have been permitted (David Beech's argument).
Given that they are permitted, they ought to be avoided in practice (Expression Transformation).
Conclusion
Technical Correspondence

David Beech's argument
Part 1 (Date's opinion 1): Why are duplicates good? Why are duplicates bad? Positional addressing.
Part 2 (Date's opinion 2): Bags are defined in terms of sets.

David Beech's argument, Part 1: Why are duplicates good?
Duplicates occur naturally in practice.
Given that this is true, it is a burden to have to invent some artificial identifier just to distinguish between them.

David Beech's argument, Part 1: Why are duplicates bad?
Individual objects must be identifiable.
In a collection of objects with no duplicates (a mathematical set), each object obviously has identity: the collection is self-identifying.
Example: {3, 6, 8, 11} is a set. {3, 6, 6, 8, 8, 8, 11} is not a set, but a multiset, or bag.
What is the identification mechanism in a collection that permits duplicates?
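To make the set/bag distinction concrete, here is a small Python sketch (an illustration, not part of the original slides) that uses a built-in `set` and, as a stand-in for a bag, `collections.Counter`. The set collapses the repeated values; the Counter remembers how many copies there are, but it still offers no way to tell one 6 from the other.

```python
from collections import Counter

# A mathematical set: elements are self-identifying, so writing 6 twice
# still yields a single element 6.
s = {3, 6, 6, 8, 8, 8, 11}

# A bag (multiset): the number of copies is kept, but nothing
# distinguishes one copy of 6 from the other.
b = Counter([3, 6, 6, 8, 8, 8, 11])

print(sorted(s))        # [3, 6, 8, 11]
print(b[6], b[8])       # 2 3
print(sum(b.values()))  # 7
```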

David Beech's argument, Part 1: Positional addressing
An artificial identifier: impose an ordering on the collection of objects. Example: in {3, 6, 6, 8, 8, 8, 11}, the two 6s occupy the second and third positions with respect to that ordering. This does not work in the relational model, because it requires additional operators (for example, "insert this new row here").
A "natural" identifier: a column value. This still works in the relational model.
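A minimal Python sketch of positional addressing, assuming the bag is stored as an ordered list (the helper `positions` is hypothetical). It also shows one practical weakness of the approach: a positional "insert here" operator changes the identifiers of elements that were already present.

```python
# The bag from the slide, stored as a list so that positions exist.
bag = [3, 6, 6, 8, 8, 8, 11]

def positions(xs, value):
    # 1-based positions where `value` occurs: its positional "identifiers".
    return [i for i, x in enumerate(xs, start=1) if x == value]

before = positions(bag, 6)
print(before)  # [2, 3]

# "Insert this new row here": a positional operator the relational model lacks.
bag.insert(1, 5)  # 5 becomes the second element
after = positions(bag, 6)
print(after)   # [3, 4]: the two 6s' identifiers have shifted
```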

David Beech's argument, Part 2: Bags are defined in terms of sets
Bag theory assumes that there is a way to count duplicates.
Each bag element has a hidden identifying tag that distinguishes it, so the bag is really a set of tag/element pairs.
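The tag/element construction can be sketched in a few lines of Python. The running integer used as the hidden tag, and the names `bag_as_set` and `multiplicities`, are assumptions chosen for illustration; any distinct tags would do. The point is that once every element carries a tag, the pairs form an ordinary set, and "counting duplicates" is simply counting tags per element.

```python
from collections import Counter

def bag_as_set(elements):
    # Attach a hidden identifying tag (here a running number) to each
    # element; all (tag, element) pairs are distinct, so they form a set.
    return {(tag, e) for tag, e in enumerate(elements, start=1)}

def multiplicities(tagged):
    # Counting duplicates = counting how many tags share the same element.
    return Counter(e for _, e in tagged)

tagged = bag_as_set([3, 6, 6, 8, 8, 8, 11])
print(len(tagged))                # 7
print(multiplicities(tagged)[8])  # 3
```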

Expression Transformation
A transformation that is valid in the relational model is not necessarily valid in the presence of duplicates.
Example: list part numbers for parts that either are screws or are supplied by supplier S1, or both, given the following data:
P                  SP
P#   PNAME         S#   P#
P1   Screw         S1   P1
P1   Screw         S1   P1
P1   Screw         S1   P2
P2   Screw

1. SELECT P# FROM P WHERE PNAME = 'Screw' OR P# IN ( SELECT P# FROM SP WHERE S# = 'S1' );
Result: P1 ×3, P2 ×1 (every row of P qualifies).
2. SELECT P# FROM SP WHERE S# = 'S1' OR P# IN ( SELECT P# FROM P WHERE PNAME = 'Screw' );
Result: P1 ×2, P2 ×1 (every row of SP qualifies).
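One way to try formulations 1 and 2 is SQLite through Python's standard `sqlite3` module. This is a sketch under stated assumptions: the columns P# and S# are renamed PNO and SNO (since `#` is not legal in an unquoted SQLite identifier), and the helper `bag` is hypothetical. Results are collected into `collections.Counter` so that duplicate counts stay visible.

```python
import sqlite3
from collections import Counter

# Slide data, with P# and S# renamed PNO and SNO. No keys are declared,
# so duplicate rows are allowed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

def bag(sql):
    # Query result as a multiset: value -> number of occurrences.
    return Counter(row[0] for row in db.execute(sql))

q1 = bag("""SELECT PNO FROM P
            WHERE PNAME = 'Screw'
               OR PNO IN (SELECT PNO FROM SP WHERE SNO = 'S1')""")
q2 = bag("""SELECT PNO FROM SP
            WHERE SNO = 'S1'
               OR PNO IN (SELECT PNO FROM P WHERE PNAME = 'Screw')""")
print(sorted(q1.items()))  # [('P1', 3), ('P2', 1)]
print(sorted(q2.items()))  # [('P1', 2), ('P2', 1)]
```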

3. SELECT P.P# FROM P, SP WHERE ( S# = 'S1' AND P.P# = SP.P# ) OR PNAME = 'Screw';
4. SELECT SP.P# FROM P, SP WHERE ( S# = 'S1' AND P.P# = SP.P# ) OR PNAME = 'Screw';
Both queries range over the Cartesian product of P and SP, which has 4 × 3 = 12 rows. Every row satisfies PNAME = 'Screw', so all 12 qualify.
Result of 3: P1 ×9, P2 ×3. Result of 4: P1 ×8, P2 ×4.
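Formulations 3 and 4 can be checked the same way. As before, this sketch assumes SQLite via Python's `sqlite3`, renames P# and S# to PNO and SNO, and uses a hypothetical `bag` helper that returns a `Counter`.

```python
import sqlite3
from collections import Counter

# Slide data (P# -> PNO, S# -> SNO); no keys, so duplicates are allowed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

def bag(sql):
    # Query result as a multiset: value -> number of occurrences.
    return Counter(row[0] for row in db.execute(sql))

# Same WHERE clause; only the projected column differs.
where = "WHERE (SNO = 'S1' AND P.PNO = SP.PNO) OR PNAME = 'Screw'"
q3 = bag("SELECT P.PNO FROM P, SP " + where)
q4 = bag("SELECT SP.PNO FROM P, SP " + where)
print(sorted(q3.items()))  # [('P1', 9), ('P2', 3)]
print(sorted(q4.items()))  # [('P1', 8), ('P2', 4)]
```

Projecting P.P# repeats each P row once per SP row, while projecting SP.P# repeats each SP row once per P row, which is why the two counts differ.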

5. SELECT P# FROM P WHERE PNAME = 'Screw' UNION ALL SELECT P# FROM SP WHERE S# = 'S1';
Result: P1 ×5, P2 ×2 (the four rows from P plus the three rows from SP).
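A sketch of formulation 5 on SQLite through Python's `sqlite3`, again assuming the PNO/SNO renames and a hypothetical `bag` helper. UNION ALL keeps every row from both operands.

```python
import sqlite3
from collections import Counter

# Slide data (P# -> PNO, S# -> SNO); no keys, so duplicates are allowed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

def bag(sql):
    # Query result as a multiset: value -> number of occurrences.
    return Counter(row[0] for row in db.execute(sql))

q5 = bag("""SELECT PNO FROM P WHERE PNAME = 'Screw'
            UNION ALL
            SELECT PNO FROM SP WHERE SNO = 'S1'""")
print(sorted(q5.items()))  # [('P1', 5), ('P2', 2)]
```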

6. SELECT DISTINCT P# FROM P WHERE PNAME = 'Screw' UNION ALL SELECT P# FROM SP WHERE S# = 'S1';
Result: P1 ×3, P2 ×2.
7. SELECT P# FROM P WHERE PNAME = 'Screw' UNION ALL SELECT DISTINCT P# FROM SP WHERE S# = 'S1';
Result: P1 ×4, P2 ×2.
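Formulations 6 and 7 differ only in which operand of UNION ALL carries DISTINCT. A sketch on SQLite through Python's `sqlite3`, assuming the PNO/SNO renames and a hypothetical `bag` helper; in SQLite, DISTINCT inside a compound SELECT applies only to the simple SELECT it appears in.

```python
import sqlite3
from collections import Counter

# Slide data (P# -> PNO, S# -> SNO); no keys, so duplicates are allowed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

def bag(sql):
    # Query result as a multiset: value -> number of occurrences.
    return Counter(row[0] for row in db.execute(sql))

# DISTINCT on the P side only.
q6 = bag("""SELECT DISTINCT PNO FROM P WHERE PNAME = 'Screw'
            UNION ALL
            SELECT PNO FROM SP WHERE SNO = 'S1'""")
# DISTINCT on the SP side only.
q7 = bag("""SELECT PNO FROM P WHERE PNAME = 'Screw'
            UNION ALL
            SELECT DISTINCT PNO FROM SP WHERE SNO = 'S1'""")
print(sorted(q6.items()))  # [('P1', 3), ('P2', 2)]
print(sorted(q7.items()))  # [('P1', 4), ('P2', 2)]
```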

8. SELECT DISTINCT P# FROM P WHERE PNAME = 'Screw' OR P# IN ( SELECT P# FROM SP WHERE S# = 'S1' );
Result: P1, P2 (each once).
9. SELECT DISTINCT P# FROM SP WHERE S# = 'S1' OR P# IN ( SELECT P# FROM P WHERE PNAME = 'Screw' );
Result: P1, P2 (each once).
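Adding DISTINCT to formulations 1 and 2 makes them agree. A sketch on SQLite through Python's `sqlite3`, with the same assumptions as before (PNO/SNO renames, hypothetical `bag` helper):

```python
import sqlite3
from collections import Counter

# Slide data (P# -> PNO, S# -> SNO); no keys, so duplicates are allowed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

def bag(sql):
    # Query result as a multiset: value -> number of occurrences.
    return Counter(row[0] for row in db.execute(sql))

q8 = bag("""SELECT DISTINCT PNO FROM P
            WHERE PNAME = 'Screw'
               OR PNO IN (SELECT PNO FROM SP WHERE SNO = 'S1')""")
q9 = bag("""SELECT DISTINCT PNO FROM SP
            WHERE SNO = 'S1'
               OR PNO IN (SELECT PNO FROM P WHERE PNAME = 'Screw')""")
print(sorted(q8.items()))  # [('P1', 1), ('P2', 1)]
print(q8 == q9)            # True
```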

10. SELECT P# FROM P GROUP BY P#, PNAME HAVING PNAME = 'Screw' OR P# IN ( SELECT P# FROM SP WHERE S# = 'S1' );
GROUP BY P#, PNAME collapses P into two groups, (P1, Screw) and (P2, Screw), and both qualify.
Result: P1, P2 (each once).
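Formulation 10 removes duplicates through grouping rather than DISTINCT. A sketch on SQLite through Python's `sqlite3`, assuming the PNO/SNO renames and a hypothetical `bag` helper:

```python
import sqlite3
from collections import Counter

# Slide data (P# -> PNO, S# -> SNO); no keys, so duplicates are allowed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

def bag(sql):
    # Query result as a multiset: value -> number of occurrences.
    return Counter(row[0] for row in db.execute(sql))

# One output row per (PNO, PNAME) group that passes the HAVING test.
q10 = bag("""SELECT PNO FROM P
             GROUP BY PNO, PNAME
             HAVING PNAME = 'Screw'
                 OR PNO IN (SELECT PNO FROM SP WHERE SNO = 'S1')""")
print(sorted(q10.items()))  # [('P1', 1), ('P2', 1)]
```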

11. SELECT P.P# FROM P, SP GROUP BY P.P#, PNAME, S#, SP.P# HAVING ( S# = 'S1' AND P.P# = SP.P# ) OR PNAME = 'Screw';
Grouping the 12-row Cartesian product of P and SP by P.P#, PNAME, S#, SP.P# yields four groups: (P1, Screw, S1, P1), (P1, Screw, S1, P2), (P2, Screw, S1, P1) and (P2, Screw, S1, P2). All four qualify.
Result: P1 ×2, P2 ×2.
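Formulation 11 groups the Cartesian product, so duplicates are removed only with respect to all four grouping columns. A sketch on SQLite through Python's `sqlite3`, assuming the PNO/SNO renames and a hypothetical `bag` helper:

```python
import sqlite3
from collections import Counter

# Slide data (P# -> PNO, S# -> SNO); no keys, so duplicates are allowed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

def bag(sql):
    # Query result as a multiset: value -> number of occurrences.
    return Counter(row[0] for row in db.execute(sql))

# Four groups survive: (P1,S1,P1), (P1,S1,P2), (P2,S1,P1), (P2,S1,P2).
q11 = bag("""SELECT P.PNO FROM P, SP
             GROUP BY P.PNO, PNAME, SNO, SP.PNO
             HAVING (SNO = 'S1' AND P.PNO = SP.PNO) OR PNAME = 'Screw'""")
print(sorted(q11.items()))  # [('P1', 2), ('P2', 2)]
```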

12. SELECT P# FROM P WHERE PNAME = 'Screw' UNION SELECT P# FROM SP WHERE S# = 'S1';
Result: P1, P2 (each once; UNION, unlike UNION ALL, eliminates duplicates).
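For contrast with formulation 5, here is formulation 12 on SQLite through Python's `sqlite3`, assuming the PNO/SNO renames and a hypothetical `bag` helper:

```python
import sqlite3
from collections import Counter

# Slide data (P# -> PNO, S# -> SNO); no keys, so duplicates are allowed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

def bag(sql):
    # Query result as a multiset: value -> number of occurrences.
    return Counter(row[0] for row in db.execute(sql))

q12 = bag("""SELECT PNO FROM P WHERE PNAME = 'Screw'
             UNION
             SELECT PNO FROM SP WHERE SNO = 'S1'""")
print(sorted(q12.items()))  # [('P1', 1), ('P2', 1)]
```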

Expression Transformation
The twelve different formulations produce nine different results.
Duplicate rows therefore act as a significant optimization inhibitor, with these consequences:
The optimizer code is expensive and unreliable.
System performance is awful.
The user gets involved in figuring out the best way to state a given query.
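The "nine different results" claim can be checked mechanically. This sketch runs all twelve formulations on SQLite through Python's `sqlite3` (assuming the PNO/SNO renames) and compares the results as multisets; the name `QUERIES` is hypothetical. Another DBMS could in principle behave differently, which is exactly the point of the puzzle that follows.

```python
import sqlite3
from collections import Counter

# Slide data (P# -> PNO, S# -> SNO); no keys, so duplicates are allowed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

SUB_SP = "(SELECT PNO FROM SP WHERE SNO = 'S1')"
SUB_P = "(SELECT PNO FROM P WHERE PNAME = 'Screw')"
JOIN = "(SNO = 'S1' AND P.PNO = SP.PNO) OR PNAME = 'Screw'"
QUERIES = {
    1: f"SELECT PNO FROM P WHERE PNAME = 'Screw' OR PNO IN {SUB_SP}",
    2: f"SELECT PNO FROM SP WHERE SNO = 'S1' OR PNO IN {SUB_P}",
    3: f"SELECT P.PNO FROM P, SP WHERE {JOIN}",
    4: f"SELECT SP.PNO FROM P, SP WHERE {JOIN}",
    5: "SELECT PNO FROM P WHERE PNAME = 'Screw' UNION ALL SELECT PNO FROM SP WHERE SNO = 'S1'",
    6: "SELECT DISTINCT PNO FROM P WHERE PNAME = 'Screw' UNION ALL SELECT PNO FROM SP WHERE SNO = 'S1'",
    7: "SELECT PNO FROM P WHERE PNAME = 'Screw' UNION ALL SELECT DISTINCT PNO FROM SP WHERE SNO = 'S1'",
    8: f"SELECT DISTINCT PNO FROM P WHERE PNAME = 'Screw' OR PNO IN {SUB_SP}",
    9: f"SELECT DISTINCT PNO FROM SP WHERE SNO = 'S1' OR PNO IN {SUB_P}",
    10: f"SELECT PNO FROM P GROUP BY PNO, PNAME HAVING PNAME = 'Screw' OR PNO IN {SUB_SP}",
    11: f"SELECT P.PNO FROM P, SP GROUP BY P.PNO, PNAME, SNO, SP.PNO HAVING {JOIN}",
    12: "SELECT PNO FROM P WHERE PNAME = 'Screw' UNION SELECT PNO FROM SP WHERE SNO = 'S1'",
}

# Each result as a hashable multiset: sorted (value, count) pairs.
results = {n: tuple(sorted(Counter(r[0] for r in db.execute(sql)).items()))
           for n, sql in QUERIES.items()}
for n, res in results.items():
    print(n, res)
print(len(set(results.values())), "distinct results")  # 9 distinct results
```

Formulations 8, 9, 10 and 12 coincide (P1 and P2 once each); every other formulation yields its own distinct multiset.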

The puzzle corner problem: try out the twelve formulations, and any others you can think of, on your own DBMS. For example:
SELECT DISTINCT P.P# FROM P WHERE PNAME = 'Screw' OR EXISTS ( SELECT * FROM SP WHERE P.P# = SP.P# AND S# = 'S1' );
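As one possible attempt at the puzzle, here is the EXISTS formulation on SQLite through Python's `sqlite3`, assuming the PNO/SNO renames and a hypothetical `bag` helper. Because of the DISTINCT, it lands in the same group as formulations 8, 9, 10 and 12.

```python
import sqlite3
from collections import Counter

# Slide data (P# -> PNO, S# -> SNO); no keys, so duplicates are allowed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

def bag(sql):
    # Query result as a multiset: value -> number of occurrences.
    return Counter(row[0] for row in db.execute(sql))

# Correlated subquery: SP rows are matched against the current P row.
q_exists = bag("""SELECT DISTINCT P.PNO FROM P
                  WHERE PNAME = 'Screw'
                     OR EXISTS (SELECT * FROM SP
                                WHERE P.PNO = SP.PNO AND SNO = 'S1')""")
print(sorted(q_exists.items()))  # [('P1', 1), ('P2', 1)]
```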

Expression Transformation
To ensure that query results contain no duplicates, specify DISTINCT.
Note: SQL systems are unable to optimize properly over duplicate elimination, because they lack knowledge of key inheritance (see Installment 9).

Conclusion
Duplicate row support should be dropped.
Codd's corrective steps for duplicate rows:
Install a "two-position switch" in the DBMS so that the DBA can specify whether duplicates are to be eliminated (a) automatically or (b) only on user request.
Support for (b) will be phased out in about two years.
Drop support for duplicate rows, and improve the optimizer.

Argument 1: A theory of bags can be constructed without assuming that there is a way to count duplicates.
Date's response: such a bag theory must be more complex than set theory; it includes set theory as a proper subset; and it is reducible to set theory.

Technical Correspondence
Argument 2: the example of rats and ratlets.
Date's response: when there is a new litter, we make the obvious entry in LITTERS; when an individual ratlet becomes "interesting", we make the obvious entry in RATLETS.
LITTERS ( LITTER_ID, #_OF_RATLETS )
    PRIMARY KEY ( LITTER_ID )
RATLETS ( RATLET_ID, LITTER_ID )
    PRIMARY KEY ( RATLET_ID )
    FOREIGN KEY ( LITTER_ID ) REFERENCES LITTERS
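A minimal sketch of Date's LITTERS/RATLETS design on SQLite through Python's `sqlite3`. Assumptions: `#_OF_RATLETS` is renamed N_OF_RATLETS (`#` is not legal in an unquoted SQLite identifier), and the litter L1, its size of 8, and the ratlet IDs are made-up sample values. Indistinguishable ratlets are represented by a count rather than by duplicate rows; a ratlet gets its own row (and identity) only once it becomes interesting.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when enabled
    CREATE TABLE LITTERS (
        LITTER_ID    TEXT PRIMARY KEY,
        N_OF_RATLETS INTEGER NOT NULL      -- #_OF_RATLETS on the slide
    );
    CREATE TABLE RATLETS (
        RATLET_ID TEXT PRIMARY KEY,
        LITTER_ID TEXT NOT NULL REFERENCES LITTERS (LITTER_ID)
    );
""")

# A new litter of 8 arrives: one row in LITTERS, none in RATLETS.
db.execute("INSERT INTO LITTERS VALUES ('L1', 8)")
# One ratlet becomes "interesting" and gets its own row.
db.execute("INSERT INTO RATLETS VALUES ('R1', 'L1')")

# Ratlets not yet individually identified = litter size minus identified ones.
anonymous = db.execute("""
    SELECT L.N_OF_RATLETS - COUNT(R.RATLET_ID)
    FROM LITTERS L LEFT JOIN RATLETS R ON R.LITTER_ID = L.LITTER_ID
    WHERE L.LITTER_ID = 'L1'
    GROUP BY L.LITTER_ID
""").fetchone()[0]
print(anonymous)  # 7

# The foreign key rejects a ratlet from a litter that was never recorded.
try:
    db.execute("INSERT INTO RATLETS VALUES ('R2', 'L99')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```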
