Some slides are from Dr. Sara Cohen Design Theory Some slides are from Dr. Sara Cohen
Overview Starting Point: Set of functional dependencies that describe real-world constraints Goal: Create tables that do not contain redundancies, so that there is less wasted space there is less of a chance to introduce errors in the database
Design Theory Armstrong's axioms defined, so that we can derive functional dependencies Need to identify a key: find a single key find all keys Both algorithms use as a subroutine an algorithm that computes the closure. In class a polynomial algorithm was given. A linear algorithm will be shown.
Compute Closure in Linear Time
Closure of a Set of Attributes Let U be a set of attributes and F be a set of functional dependencies on U. Suppose that X U is a set of attributes. Definition: X+ = { A | F X A} We would like to compute X+ |=
Algorithm From Class Compute Closure(X, F) C := X While there is a V W in F such that (V C)and (W C) do C := C W 3.Return C Complexity O(|U||F|)
Example R=ABCDE F={ABC, CEB, DA, BCE} {A}+ = {A,B}+ = {B,D}+=
A More Efficient Algorithm We start by creating a table, with a row for each FD and a column for each attribute. The table will have 2 additional columns called size and tail. In the row for a dependency X Y, there will be the value true in each column corresponding to an attribute in X. The size column will contain the size of the set X. The tail column will contain Y.
Example Table A B C D E Size Tail A → C B → D AD → E F = {A → C, B → D, AD → E} A B C D E Size Tail A → C 1 B → D AD → E 2
Compute Closure(X, F, T) /* T is the table, n is the number of FDs in F */ C := X Q := X While Q is not empty A := Q.dequeue() for i=1..n if T[i, A]=true then T[i,size] := T[i, size] –1 if T[i,size]=0, then Q := Q (T[i,tail]\C) C := C T[i,tail]
Computing AB+ A B C D E Size Tail A → C B → D AD → E 1 2 Start: X+ = {A,B}, Q = {A, B} A B C D E Size Tail A → C 1 B → D AD → E 2
Computing AB+ A B C D E Size Tail A → C B → D AD → E 1 Iteration of A: X+ = {A,B,C}, Q = {B,C} A B C D E Size Tail A → C B → D 1 AD → E
Computing AB+ A B C D E Size Tail A → C B → D AD → E 1 Iteration of B: X+ = {A,B,C,D}, Q = {C,D} A B C D E Size Tail A → C B → D AD → E 1
Computing AB+ A B C D E Size Tail A → C B → D AD → E 1 Iteration of C: X+ = {A,B,C,D}, Q = {D} A B C D E Size Tail A → C B → D AD → E 1
Computing AB+ A B C D E Size Tail A → C B → D AD → E Iteration of D: X+ = {A,B,C,D,E}, Q = {E} A B C D E Size Tail A → C B → D AD → E
Computing AB+ A B C D E Size Tail A → C B → D AD → E Iteration of E: X+ = {A,B,C,D,E}, Q = {} A B C D E Size Tail A → C B → D AD → E
Complexity? A B C D E Size Tail A → C B → D AD → E 1 2 To get an efficient algorithm, we assume that there are pointers from each “true” box in the table to the next “true” box in the same column. A B C D E Size Tail A → C 1 B → D AD → E 2
Complexity Complexity:O(|F|) Dequeue each attribute of X+ (attributes appearing at right in FDs of F) The number of changes of size in the table is the number of attributes appearing at left in FDs of F
Decomposition Characteristics
Characteristics of a Decomposition Two important characteristics of a decomposition: lossless join: necessary, otherwise original relation cannot be recreated, even if tables are not modified dependency preserving: allows us to check that inserts/updates are correct without joining the sub-relations
Lossless Join T C S Smith DB Cohen Jones OS Levy C S DB Cohen OS Levy
Checking Check for a lossless join using the algorithm from class (with the a-s and b-s) Check for dependency preserving using an algorithm shown today
Dependency Preservation R=ABC Decomposition {AB, AC} Dependencies {AB, BC}. Is it lossless? Does this decomposition preserve BC?
Dependency Preservation (cont’d) B A 100 10 1 2 300 20 3 B A 10 1 2 30 3 4 C A 100 1 2 300 3 400 4
Definitions We define S (F) to be the set of dependencies XY in F+ such that X and Y are in S. We say that a decomposition R1...Rn of R is dependency preserving if for all instances r of R that satisfy the FDs of R: (R1 (F) U ... U Rn (F))+ = F+ Note that one inclusion clearly holds always. This definition implies an exponential algorithm to check if a decomposition is dependency preserving We give a polynomial algorithm
Algorithm Let R be a relation, decomposed into R1, R2,…,Rn Let F be a set of functional dependencies To check whether R1,…,Rn preserves all the functional dependencies in F, run the algorithm on the next slide for each X -> Y in F If the answer is “Yes” for all FDs, then the decomposition preserves F If the answer is “No” for at least one FD, then the decomposition does not preserve F
Testing Dependency Preservation To check if the decomposition preserves XY: Z:=X while changes to Z occur do for i=1 to n do Z:= Z ((Z Ri)+ Ri) if YZ return “yes” else return “no”
Example (1) R=ABCD F = {A -> B, B -> C, C -> D, D -> A} R1=AB, R2=BC, R3=CD Is this decomposition dependency preserving?
Example (2) R = ABCDE F = {A -> ABCDE, BC -> A, DE -> C} Suppose we decompose R into ABDE and DEC. Is the decomposition dependency preserving?
Normal Forms
Non-Redundant Cover Algorithm for decomposition to 3NF that has a lossless join and is dependency preserving uses a non-redundant cover
Finding a Non-Redundant Cover 3 Steps: Define G as the result of putting F in standard form by decomposing each FD so that it has a single attribute on the right side For each XA in G and for each B in X, check whether G X-BA. If so, remove B For each XA in G, check whether G-{XA} XA. If so, remove XA |= |=
Normal Forms The basic idea: if a relation is in one of these forms, then it avoids certain problems (e.g., redundancy) Normal Forms: BCNF: Every dependency X->A in F+ must be (1) trivial or (2) X is a super-key 3NF: Every dependency X->A in F+ must be (1) trivial, (2) X is a super-key or (3) A is an attribute of a key
Example Reminder F+ = {X -> X+ | exist Y->Z in F st Y in X and Z not in X} Suppose that R = ABC. For each of the following values of F, decide whether R is in BCNF/3NF: F = {} F = {A -> B} F = {A -> B, A -> C} F = {A -> B, B -> C} F = {A -> B, BC -> A}
Decomposition into 3NF Given a relation R with functional dependencies F Step 1: Find a non-redundant cover G of F Step 2: For each FD XA in G, create a schema XA Step 3: If no schema created so far contains a key, add a key as a schema Step 4: Remove schemas that are contained in other schemas The result is a decomposition into 3NF that is dependency preserving and has a lossless join
Example Find a decomposition into 3NF for the relation R = ABCDEFGH, with the functional dependencies F = {AB, ABCDE, EFGH, ACDFEG}
Example Non-redundant cover G = {AB, ACDE, EFG, EFH} Key ACDF Schema: AB, ACDE, EFG, EFH, ACDF
Decomposition into BCNF There always exists a decomposition into BCNF that has a lossless join There does not always exist a decomposition into BCNF that is dependency preserving Example: Consider the relation SBD (sailor, boat, date) with the FDs {SBD} and {DB} There is a polynomial algorithm for finding such a decomposition
Algorithm from Class Suppose R is not in BCNF Suppose that XA violates the BCNF condition for R Decompose R into R-A, XA Continue recursively with R-A and XA Note: We must find violations to the BCNF condition in FR-A and FXA
Polynomial Algorithm for Decomposition into BCNF
Lemmas Lemma 1: Every 2-attributes scheme is in BCNF Lemma 2: If a schema R is not in BCNF, then we can find attributes A and B in R, such that (R – AB) A. It may or may not be the case that (R – AB) B as well.
Algorithm Check whether the schema R is into BCNF If not, decompose R R – A XA such that X -> A there is no YX such that YA
Algorithm Z:=R; // at all times, Z is the one scheme of the decomposition that may not be in BCNF repeat decompose Z into Z–A and XA, where XA is in BCNF and XA; // use the decomposition procedure add XA to the decomposition; Z:=Z–A; Until Z cannot be decomposed //by Lemma 2 add Z to the decomposition
Decomposition Procedure if Z contains no A and B such that A is in (Z-AB)+ // all closures are taken with respect to F then return that Z is in BCNF and cannot be decomposed else begin find one such A and B; Y:=Z–B; while Y contains A and B such that A is in (Y-AB)+ do Y:=Y–B; return the decomposition Z–A and Y; // Y is in the form XA, XA end
Example Schema R=CTHRSG C = course T = teacher H = hour R = room S = student G = grade
FDs C T Each course has one teacher HR C Only one course can meet in a room at one time HT R A teacher can be in only one room at one time CS G Each student has one grade in each course HS R A student can be in only one room at one time
Running the Algorithm (1) Z = CTHRSG Check A=C, B=T: C in (HRSG)+ Y = CHRSG A=R, B=C, Y = HRSG A=R, B=G, Y = HRS Add HRS to the decomposition and Z=CTHSG
Running the Algorithm (2) Z = CTHSG Check A=T, B=H Y = CTSG A=T, B=S, Y = CTG A=T, B=G, Y = CT Add CT to the decomposition and Z=CHSG
Running the Algorithm (3) Z = CHSG Check A=G, B=H Y = CSG Add CSG to the decomposition and Z=CHS CHS is into BCNF
Decomposition into BCNF HRS, CT, CSG, CHS Is it lossless join? Is it dependency preserving?