Relational Database Design Theory

Relational Database Design Theory
Database Design Concepts Update, deletion and insertion anomalies Functional dependency 5/4/2019 CS319 Theory of Databases

Database Design Concepts 0
The attributes in a database are determined by the set of observables that determine the state of the enterprise These attributes can in general be clustered into subsets according to the contexts for observation in which they are encountered Database design is concerned with how attributes can be organised into plausible clusters of attributes as relations within a relational scheme – more formally ... 5/4/2019 CS319 Theory of Databases

A relational database scheme can be regarded as defined by an abstract relation R(A1, A2, ..., An) where A1, A2, ..., An are all the attributes of interest. This is simplest possible relation scheme for the database. In general, when setting up the database, decompose the relation R to form a new relational scheme. Issues what is the nature of this decomposition? does it matter what is associated in tables? if so, what principles apply to the design of tables? 5/4/2019 CS319 Theory of Databases

Answer the association of attributes should be guided by data dependency failure to take account of dependency creates anomalies on modifying the database anomalies can be largely eliminated through an appropriate choice of tables and normalisation cf. an object-oriented approach, where attributes are grouped according to the objects that contain them 5/4/2019 CS319 Theory of Databases

The relational database design problem Any relational database could – in principle – be organised as a single relation R(A1, A2, ..., An) where A1, A2, ..., An are all the attributes of interest. Why not do it this way? convenience: easy to handle small relations ... but also need to consider semantics of attributes In practice, typically choose to decompose R into a set of smaller relations R1, R2, ..., Rn, where the set of attributes in R = the set of attributes in R1, R2, ..., Rn. 5/4/2019 CS319 Theory of Databases

Illustrative example of design E.g. could take HVFC as comprising a single relation HVFC (NAME, ADDRESS, BALANCE, ORDER_NO, ITEM, QUANTITY, SNAME, SADDRESS, PRICE) defined by the natural join of the relations MEMBERS(NAME, ADDRESS, BALANCE) ORDERS(ORDER_NO, NAME, ITEM, QUANTITY) SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE) No information loss involved in this: can recover all the original relations by projection ... 5/4/2019 CS319 Theory of Databases

Data dependencies in the HVFC In HVFC DB, can infer some attributes from others: e.g. know SNAME Þ know SADDRESS know SNAME and ITEM Þ know PRICE NB knowledge of such dependencies is derived from real-world meaning of the attributes This informs good decomposition of a relational scheme 5/4/2019 CS319 Theory of Databases

E.g. if we have a single relation for HVFC, then every tuple with SNAME "S" has SADDRESS "A" Þ very high level of redundancy in the table Actually have this redundancy in the relation: SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE) SMITH LONDON NUT 20p SMITH LONDON BOLT 25p JONES HULL PIN 15p ... this redundancy has consequences for update 5/4/2019 CS319 Theory of Databases

Anomalies and the SUPPLIER relation Form of the SUPPLIER relation leads to anomalies on update, deletion and insertion: update anomalies: when supplier address is updated, must be updated in all tuples, else two addresses recorded for same supplier insertion anomalies: can't record info on supplier unless he supplies part, whence .... deletion anomalies: delete all items supplied by one supplier and you lose supplier's address. 5/4/2019 CS319 Theory of Databases

A better design of the HVFC relation scheme will resolve these problems. Thus decomposition SA (SNAME, SADDRESS) SIP (SNAME, ITEM, PRICE) resolves the anomalies. Relational database design theory guides the choice of decomposition for a given relation scheme. Use the data dependencies for this: if no data dependencies, then no decomposition 5/4/2019 CS319 Theory of Databases

Functional Dependency 1
Formalising Data Dependency Let R º R(A1, A2, ..., An) be a relation scheme, and let X and Y be subsets of { A1, A2, ..., An }. Think of R as an intensional relation with many possible extensions. Definition of functional dependency "X functionally determines Y" if, in every extension r of R, two tuples in r which agree on the attribute set X also agree on Y. 5/4/2019 CS319 Theory of Databases

Vocabulary and notation for functional dependency "X functionally determines Y" if, in every extension r of R, two tuples in r which agree on the attribute set X also agree on Y … Also say "Y functionally depends on X” and denote dependency by X ® Y [to be read as "X determines Y"] 5/4/2019 CS319 Theory of Databases

Notes on functional dependency Functional dependency is a semantic notion defined in terms of the intensional rather than extensional data-base. Can only infer (possibly) that certain functional dependencies do not hold by inspecting the extensional part of a db. The set of functional dependencies can be viewed as a set of integrity constraints on the relation scheme, and where possible it should be preserved under decomposition. 5/4/2019 CS319 Theory of Databases

An illustrative example Abbreviate names of attributes in SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE) to S, A, I, P respectively. Use notation S ® A, SI ® P to represent functional dependencies (abbrev. FDs). The set of ALL functional dependencies is usually large: SI ® AP, SP ® SAP, S ® SA, SI ® SAIP are all examples of valid functional dependencies 5/4/2019 CS319 Theory of Databases

If X is a set (modulo discounting the empty set): a functional dependency on X  a pair of subsets of X  each functional dependency on X  (X)  (X) If X is the basis of a relational scheme, then the set of all functional dependencies on X is a [special kind of] subset of (X)  (X). Set X has N attributes  22N instances of FDs on set X  around “2 to the 2 to the 2N” patterns of FD on X … even though this set of patterns is constrained 5/4/2019 CS319 Theory of Databases

Suppose that R(X) is a relational scheme based on X Number of FDs in a R(X) can be of size 2N, where |X|=N  is in general very inefficient, if not infeasible, to list all the FDs in a relational scheme on X Number of patterns of FDs on a set X grows in a doubly exponential manner with the size of X  relational tables are semantically inconceivably rich 5/4/2019 CS319 Theory of Databases

Two agendas for the study of functional dependency: attempting to automate database design by finding computationally feasible ways to manipulate FDs investigating the principles that bind the particular pattern of FDs in a relational scheme to the real-world observables that it is intended to model Relational theory addresses both of these agendas … 5/4/2019 CS319 Theory of Databases

Number of FDs in a R(X) can be of size 2N, where |X|=N Number of patterns of FDs on a set X grows in a doubly exponential manner with the size of X … Problematic issues: representing set of ALL FDs in a relational scheme X processing the set of ALL functional dependencies determining how patterns of FDs are constrained Solution find axioms to characterise functional dependency represent the set of all FDs as the set of FDs that can be inferred from a basic set of FDs 5/4/2019 CS319 Theory of Databases

Inference of Functional Dependencies Suppose R is a relation scheme, and F is a set of functional dependencies for R. If X, Y are subsets of attributes of R and all relations r which satisfy the FDs in F also satisfy X ® Y, then say F logically implies X ® Y, written F |= X ® Y e.g. if F = { A ® B, B ® C } then F |= A ® C if F = { S ® A, SI ® P } then F |= S I ® AP, F |= SP ® SAP etc. 5/4/2019 CS319 Theory of Databases

Inference of Functional Dependencies (cont.) Definition if F is a set of FDs, then F+  { X ® Y ½ F |= X ® Y } Informally, F+ is set of ALL functional dependencies that can be inferred from the functional dependencies in F. ... usually F+ is much too large even to enumerate! 5/4/2019 CS319 Theory of Databases

More about the set F+ ... usually F+ is much too large even to enumerate! Example with 3 attributes, 2 FDs and 35 dependencies: If R = ABC and F = { A ® B, B ® C } then F+ is A ® S for all [non-empty] subsets S of ABC 7 FDs B ® BC, B ® B, B ® C 3 FDs C ® C FD AB ® S for all [n-e] subsets S of ABC 7 FDs AC ® S for all [n-e] subsets S of ABC 7 FDs BC ® BC, BC ® B, BC ® C 3 FDs ABC ® S for all [n-e] subsets S of ABC 7 FDs 5/4/2019 CS319 Theory of Databases

Can interpret keys in terms of functional dependency ... Definition of a superkey: If R(A1, A2, ..., An) is a relation scheme, and F is the set of FDs in R then a subset X of { A1, A2, ..., An } is a superkey for R if F |= X ® Ai for 1 ≤ i ≤ n Definition of a key: X is a key if no proper subset Y of X is a superkey 5/4/2019 CS319 Theory of Databases

Armstrong’s axioms for reasoning about FDs 1
Armstrong [1974] proved following theorem: Theorem: F+ is the set of dependencies which can be deduced from F by applying three inference rules: (i) reflexivity if Y  X  U then X ® Y (ii) augmentation if X ® Y then XZ ® YZ (" Z  U) (iii) transitivity if X ® Y and Y ® Z then X ® Z (i) (ii) and (iii) are Armstrong's axioms. The theorem asserts that Armstrong’s axioms are a sound and complete set of inference rules. 5/4/2019 CS319 Theory of Databases

First prove the easy part of Armstrong’s Theorem … Axioms are sound … if Y  X  U then X ® Y if X ® Y, then XZ ® YZ (" Z  U) if X ® Y and Y ® Z then X ® Z (i) if r is a relation with two tuples s and t that agree on X, and Y  X, then s and t agree on Y (ii) if r is a relation with two tuples s and t that agree on X and Z and X ® Y, then s and t must agree on Y and Z (iii) if r is a relation with two tuples s and t that agree on X and both X ® Y and Y ® Z then s and t must agree on Y, and hence they must agree on Z 5/4/2019 CS319 Theory of Databases

Completeness of Armstrong’s axioms: if Y  X  U then X ® Y if X ® Y, then XZ ® YZ (" Z  U) if X ® Y and Y ® Z then X ® Z First define X+, the closure of X  U with respect to F: X+ is the set of attributes A such that X ® A can be deduced from Armstrong’s axioms. Note that we can deduce that X ® Y for some set Y by consulting Armstrong’s axioms if and only if Y  X+: if Y contains A then X ® Y and Y ® A, so X ® A if X ® A and X ® B, then X=XX ® XA = AX ® AB 5/4/2019 CS319 Theory of Databases

Completeness of Armstrong’s axioms (continued): if Y  X  U then X ® Y if X ® Y, then XZ ® YZ (" Z  U) if X ® Y and Y ® Z then X ® Z To establish completeness, it is sufficient to show: if X ® Y cannot be deduced from Armstrong’s axioms (the syntactic characterisation of ®), then there is a relational extension for R in which all the dependencies in F are true, but X ® Y does not hold (the semantic characterisation of ®) 5/4/2019 CS319 Theory of Databases

Completeness of Armstrong’s axioms (continued): Suppose can’t deduce X ® Y from Armstrong’s axioms Consider the relational extension R0 for R with 2 tuples: attributes of X+ other attributes 1 1 … … 1 1 1 … … 0 (assuming boolean attributes, or more generally that the two tuples agree on X+ but disagree elsewhere) Need to establish that R0 is a valid extension of R: i.e. that the dependencies in F are respected in R0. 5/4/2019 CS319 Theory of Databases

Completeness of Armstrong’s axioms (continued): Consider the relational extension R0 for R with 2 tuples: attributes of X+ other attributes 1 1 … … 1 1 1 … … 0 Check that all the dependencies in F are true: Suppose that V ® W is a dependency in F If V is not a subset of X+, the dependency holds in R0 If V is a subset of X+, then both X ® V, and X ® W can be deduced by Armstrong’s axioms. This means that W is a subset of X+, and thus V ® W holds in R0 5/4/2019 CS319 Theory of Databases

Completeness of Armstrong’s axioms (continued): Consider the relational extension R0 for R with 2 tuples: attributes of X+ other attributes 1 1 … … 1 1 1 … … 0 Confirm that X ® Y does not hold in R0: Recall that we can deduce that X ® Y for some set Y by consulting Armstrong’s axioms if and only if Y  X+. By assumption, we can’t deduce that X ® Y holds in R0. Hence Y contains an attribute not in the subset X+, confirming that X ® Y does not hold in R0. 5/4/2019 CS319 Theory of Databases

Computational aspects of dependencies 1
Computing the closure of an attribute set Given relation scheme R with set of attributes U and a set of dependencies F involving attributes in U The closure of the set of attributes X from U is X+  { A ½ F |= X ® A } Note that X+ is a (small) set of attributes, unlike F+, which is a (large) set of functional dependencies. … can't compute F+ efficiently, but can compute X+. 5/4/2019 CS319 Theory of Databases

Computing the closure of an attribute set (cont.) Can't compute F+ efficiently, but can compute X+: Construct sequence of attribute sets: X(0) = X X(i+1) = X(i) È { A ½ V ® W Î F, V  X(i), A Î W } Since number of attributes in U is finite, get X(j+1) = X(j), and have X+ = X(j) at this point. Complexity of this construction is O(|F|.|R|) – need to check whether X  X(i) in an efficient way, but otherwise amount of computation determined by number of dependencies processed at each iteration and number of iterations ( £ number of attributes in U) 5/4/2019 CS319 Theory of Databases

Illustrative example Take F to be the set of dependencies AB ® C ACD ® B CG ® BD C ® A D ® EG CE ® AG BC ® D BE ® C Let X = BD. Then can compute X+ thus: X(0) = BD, X(1) = BDEG=BEDG, X(2) = BECDG=BCEDG, X(3) = BCEAGDG=ABCDEG = X(4). Computation of this type is used for key checking etc 5/4/2019 CS319 Theory of Databases

Representing the set of dependencies Obviously better to work with F rather than F+. Issue How small and simple can we choose G for G+ = F+? ... if G+ = F+, describe this as G generates F+ ... if G+ Ê F+, say that G covers F motivates the definition of a minimal cover ... 5/4/2019 CS319 Theory of Databases

Representing the set of dependencies (cont.) Definition: G is a minimal cover for F if a) G+ =F+ b) every RHS in G is a single attribute c) every LHS minimal subject to determining RHS i.e. for no X ® Y Î G is there a proper subset Z of X such that G \ { X ® Y } È { Z ® Y } generates F+ d) no proper subset of G also generates F+. Minimal cover isn't necessarily unique. 5/4/2019 CS319 Theory of Databases

Computing a minimal cover for F Have a reasonably efficient algorithm for computing a minimal cover. ‘Reasonably’ means efficient relative to the size of the given set F that generates F+. Can't expect efficiency irrespective of size of |F|. Minimal cover isn't necessarily unique … the algorithm to compute a minimal cover is non-deterministic (that is to say, it involves arbitrary choices) 5/4/2019 CS319 Theory of Databases

Computing a minimal cover for F (cont.) Input to the algorithm is a set F that generates F+. Transform F in 3 stages 1. replace X ® Y by the { X ® A such that A Î Y } 2. check that no X ® A is redundant 3. eliminate any redundant attribute of X in X ® A Need to carry out each step efficiently ... 5/4/2019 CS319 Theory of Databases

Computing a minimal cover for F (cont.) 1. replace X ® Y by the { X ® A such that A Î Y } 2. check that no X ® A is redundant 3. eliminate any redundant attribute of X in X ® A In Stage 1, have already transformed F so that all dependencies are of the form X ® A, where |A|=1. To check for redundancy, take each dependency X ® A of F in turn, and compute the closure X+ of X relative to the set of dependencies F \ {X ® A}. If A Î X+, then discard X ® A from F, and continue the stage 2 redundancy checking with the new F. 5/4/2019 CS319 Theory of Databases

Computing a minimal cover for F (cont.) 1. replace X ® Y by the { X ® A such that A Î Y } 2. check that no X ® A is redundant 3. eliminate any redundant attribute of X in X ® A At Stage 3, can assume that all dependencies in F are of the form X ® A, where |A|=1, and no X ® A dependency is redundant. For each dependency X ® A in turn, and for each attribute B in X, compute the closure (X \ {B})+ relative to the set of dependencies F. If AÎ(X \ {B})+, then replace X ® A by X \ {B} ® A in F, and continue the Stage 3 redundancy checking with the new F. 5/4/2019 CS319 Theory of Databases

Computing a minimal cover for F (cont.) Note: eliminating redundancy at Stage 3 relies on the fact that if X \ {B} ® A, then X ® A, but in general the converse implication is false. From this fact it follows that if X \ {B} ® A can be inferred from the dependencies in F then X ® A can be replaced by X \ {B} ® A Algorithm is tolerably efficient (ex.), since none of this checking involves explicit computation with F+. 5/4/2019 CS319 Theory of Databases

Illustrative example Find a minimal cover for F, where F is: AB ® C ACD ® B CG ® BD C ® A D ® EG CE ® AG BC ® D BE ® C Stage 1 transforms F to form: AB ® C ACD ® B CG ® B CG ® D C ® A D ® E D ® G CE ® A CE ® G BC ® D BE ® C 5/4/2019 CS319 Theory of Databases

Illustrative example of computing a minimal cover ... AB ® C ACD ® B CG ® B CG ® D C ® A D ® E D ® G CE ® A CE ® G BC ® D BE ® C Stage 2 finds that CE ® A is redundant, since C ® A. After CE ® A eliminated, there is still redundancy CG ® B can be derived from the 3 dependencies: CG ® D, C ® A and ACD ® B There is no further redundancy Thus CE ® A and CG ® B are eliminated at Stage 2. 5/4/2019 CS319 Theory of Databases

Illustrative example of computing a minimal cover ... AB ® C ACD ® B CG ® D ... after Stage 2 C ® A D ® E D ® G CE ® G BC ® D BE ® C Stage 3 replaces ACD ® B by CD ® B, since C ® A and as a result (CD)+ relative to F contains B AB ® C CD ® B CG ® D C ® A D ® E D ® G minimal cover To get another minimal cover (ex.): at Stage 2 eliminate ACD ® B, CG ® D, CE ® A 5/4/2019 CS319 Theory of Databases

Relational Database Design Theory

Similar presentations

Presentation on theme: "Relational Database Design Theory"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Relational Database Design Theory

Similar presentations

Presentation on theme: "Relational Database Design Theory"— Presentation transcript:

Similar presentations

About project

Feedback