Download presentation
Presentation is loading. Please wait.
1
Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University
2
Functional Dependency Knowing the value of an item (or items) means you know the values of other items in the row e.g., if we know the person’s name, then we know the address In our example, if we know the Parcel-ID, we know the Alderman, Township name, and other Township attributes: Parcel-ID - > Alderman Parcel-ID - > Thall_add Parcel-ID - > Tship-ID Parcel-ID - > Tship_name
3
Moving from First Normal Form (1NF to Second Normal Form (2NF), we need to: Identify functional dependencies Place in separate tables, one key per table
4
Normal Forms Summary No repeat columns (create new records such that there are multiple records per entry) Split the tables, so that all non-key attributes depend on a primary key. Split tables further, if there are transitive functional dependencies. This results in tables with a single, primary key per table.
5
if any two rows never agree on value, then is trivially preserved. e.g course_ID course_name is not trivially preserved e.g. student_ID, course_ID course_name is trivially preserved
6
Normal Forms Are Good Because: It reduces total data storage Changing values in the database is easier It “insulates” information – it is easier to retain important data Many operations are easier to code
7
The table instance satisfies the following student_name student_name (a trivial dependency) student_name, course_name student_name (also trivial) there are many trivial dependencies – R.H.S. subset of L.H.S. student_ID, course_ID (student_ID, student_name, course_ID, course_Name ) - student_ID, course_ID is a key
8
is a superkey for R iff R. where R is taken as the schema for relation R. is a candidate key for R iff R, and for no that is a proper subset of , R. (student_ID, course_ID) is a candidate key (student_ID, course_ID, course_name) is not a candidate key
9
F – a set of functional dependencies f – an individual functional dependency f is implied by F if whenever all functional dependencies in F are true, then f is true. For example, consider Workers(id, name, office, did, since) {id did, did office } implies id office Reasoning about FDs
10
Closure of a set of FDs The set of all FDs implied by a given set F of FDs is called the closure of F, denoted as F +. Armstrong’s Axioms, can be applied repeatedly to infer all FDs implied by a set of FDs. Suppose X,Y, and Z are sets of attributes over a relation. (notation: XZ is X U Z) Armstrong’s Axioms Reflexivity: if Y X, then X Y Augmentation: if X Y, then XZ YZ Transitivity: if X Y and Y Z, then X Z
11
reflexivity : student_ID, student_name student_ID student_ID, student_name student_name (trivial dependencies) augmentation : student_ID student_name implies student_ID, course_name student_name, course_name transitivity : course_ID course_name and course_name department_name Implies course_ID department_name
12
Armstrong’s Axioms is sound and complete. –Sound: they generate only FDs in F +. –Complete: repeated application of these rules will generate all FDs in F +. The proof of soundness is straight forward, but completeness is harder to prove.
13
Proof of Armstrong’s Axioms (soundness) Notation: We use t[X] for X [ t ] for any tuple t. (note that we used t.X before) Reflexivity: If Y X, then X Y Assume t 1, t 2 such that t 1 [X] = t 2 [X] then t 1 [ Y ] = t 2 [ Y ] since Y X Hence X Y
14
Augmentation: if X Y, then XZ YZ Assume t 1, t 2 such that t 1 [ XZ ] = t 2 [ XZ] t 1 [Z]= t 2 [Z], since Z XZ ------ (1) t 1 [X]= t 2 [X], since X XZ t 1 [Y] = t 2 [Y], definition of X Y ------ (2) t 1 [YZ] = t 2 [ YZ ] from (1) and (2) Hence, XZ YZ
15
Transitivity: If X Y and Y Z, then X Z. Assume t 1, t 2 such that t 1 [X] = t 2 [X] Then t 1 [Y] = t 2 [Y], definition of X Y Hence, t 1 [Z] = t 2 [Z], definition of Y Z Therefore, X Z
16
Additional rules Sometimes, it is convenient to use some additional rules while reasoning about F +. These additional rules are not essential in the sense that their soundness can be proved using Armstrong’s Axioms. Union: if X Y and X Z, then X YZ. Decomposition: if X YZ, then X Y and X Z.
17
To show the correctness of the union rule: X Y and X Z, then X YZ ( union ) Proof: X Y … (1) ( given ) X Z … (2) ( given ) XX XY … (3) ( augmentation on (1) ) X XY … (4) ( simplify (3) ) XY ZY … (5) ( augmentation on (2) ) X ZY … (6) ( transitivity on (4) and (5) )
18
To show the correctness of the decomposition rule: if X YZ, then X Y and X Z (decomposition) Proof: X YZ … (1) ( given ) YZ Y … (2) ( reflexivity ) X Y … (3) ( transitivity on (1), (2) ) YZ Z … (4) ( reflexivity ) X Z … (5) ( transitivity on (1), (4) )
19
R= ( A, B, C ) F = {A B, B C } F + = {A A, B B, C C, AB AB, BC BC, AC AC, ABC ABC, AB A, AB B, BC B, BC C, AC A, AC C, ABC AB, ABC BC, ABC AC, ABC A, ABC B, ABC C, A B, … (1) ( given ) B C, … (2) ( given ) A C, … (3) ( transitivity on (1) and (2) ) AC BC, … (4) ( augmentation on (1) ) AC B,… (5) ( decomposition on (4) ) A AB,… (6) ( augmentation on (1) ) AB AC, AB C, B BC, A AC, AB BC, AB ABC, AC ABC, A BC, A ABC } Using reflexivity, we can generate all trivial dependencies Note that A, B, C, are attributes We refer to the set {A,B} simply as AB
20
Attribute Closure Computing the closure of a set of FDs can be expensive In many cases, we just want to check if a given FD X Y is in F +. X - a set of attributes F - a set of functional dependencies X + - closure of X under F set of attributes functionally determined by X under F.
21
Example: F= { A B, B C } A + = ABC ….. A X where X ABC B + = BC C + = C AB + = ABC
22
Algorithm to compute closure of attributes X + under F closure := X ; Repeat for each U V in F do begin if U closure then closure := closure V ; end Until (there is no change in closure)
23
R= ( A, B, C, G, H, I ) F= {A B, A C, CG H, CG I, B H } To compute AG + closure = AG closure = ABG ( A B ) closure = ABCG ( A C ) closure = ABCGH ( CG H ) closure = ABCGHI ( CG I ) Is AG a candidate key? AG R A + R ? G + R ?
24
How to Compute Meaning - Armstrong’s inference rules Rules of the computation: –reflexivity: if Y X, then X Y –Augmentation: if X Y, then WX WY –Transitivity: if X Y and Y Z, then X Z Derived rules: –Union: if X Y and X Z, the X YZ –Decomposition: if X YZ, then X Y and X Z –Pseudotransitivity: if X Y and WY Z, then XW Z Armstrong’s Axioms: –sound –complete
25
Example (continued) Answers: Inferred FD Which Rule did we apply ? 4. name, category name Trivial rule 5. name, category color Transitivity on 4, 1 6. name, category category Trivial rule 7. name, category color, category Split/combine on 5, 6 8. name, category price Transitivity on 3, 7 1. name color 2. category department 3. color, category price 1. name color 2. category department 3. color, category price
26
Another Example Enrollment(student, major, course, room, time) student major major, course room course time What else can we infer ?
27
Another Rule If then Augmentation follows from trivial rules and transitivity How ? A 1, A 2, …, A n B A 1, A 2, …, A n, C 1, C 2, …, C p B Augmentation
28
Problem: infer ALL FDs Given a set of FDs, infer all possible FDs How to proceed ? Try all possible FDs, apply all 3 rules –E.g. R(A, B, C, D): how many FDs are possible ? Drop trivial FDs, drop augmented FDs –Still way too many Better: use the Closure Algorithm (next)
29
Lossless Decomposition A decomposition is lossless if we can recover: R(A, B, C) Decompose R1(A, B) R2(A, C) Recover R ’ (A, B, C) Thus,R ’ = R
30
Closure of a set of Attributes Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } +, is the set of attributes B s.t. A 1, …, A n B Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } +, is the set of attributes B s.t. A 1, …, A n B name color category department color, category price name color category department color, category price Example: Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color}
31
Closure Algorithm Start with X={A1, …, An}. Repeat until X doesn’t change do: if B 1, …, B n C is a FD and B 1, …, B n are all in X then add C to X. Start with X={A1, …, An}. Repeat until X doesn’t change do: if B 1, …, B n C is a FD and B 1, …, B n are all in X then add C to X. {name, category} + = {name, category, color, department, price} name color category department color, category price name color category department color, category price Example:
32
Example Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } R(A,B,C,D,E,F) A, B C A, D E B D A, F B A, B C A, D E B D A, F B
33
Using Closure to Infer ALL FDs A, B C A, D B B D A, B C A, D B B D Example: Step 1: Compute X +, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD Step 2: Enumerate all FD’s X Y, s.t. Y X + and X Y = : AB CD, AD BC, ABC D, ABD C, ACD B
34
Relation Decomposition Break the relation into two: NameSSNCity Fred123-45-6789Seattle Joe987-65-4321Westfield SSNPhoneNumber 123-45-6789206-555-1234 123-45-6789206-555-6543 987-65-4321908-555-2121 Anomalies have gone: No more repeated data Easy to move Fred to “Bellevue” (how ?) Easy to delete all Joe’s phone number (how ?) NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield
35
Relational Schema Design Person buys Product name pricenamessn Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies Decompositions should always be lossless Lossless decomposition ensure that the information in the original relation can be accurately reconstructed based on the information represented in the decomposed relations.
36
Decompositions in General R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p )
37
Decomposition Sometimes it is correct: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NamePrice Gizmo19.99 OneClick24.99 Gizmo19.99 NameCategory GizmoGadget OneClickCamera GizmoCamera Lossless decomposition
38
Incorrect Decomposition Sometimes it is not: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NameCategory GizmoGadget OneClickCamera GizmoCamera PriceCategory 19.99Gadget 24.99Camera 19.99Camera What’s incorrect ?? Lossy decomposition Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation – information loss
39
Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) If A 1,..., A n B 1,..., B m Then the decomposition is lossless R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) Example: name price, hence the first decomposition is lossless Note: don’t need necessarily A 1,..., A n C 1,..., C p
40
Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Third Normal Form (3NF) = this lecture Boyce Codd Normal Form (BCNF) = this lecture Others...
41
Boyce-Codd Normal Form A simple condition for removing anomalies from relations: In English (though a bit vague): Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R. A relation R is in BCNF if: If A 1,..., A n B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R A relation R is in BCNF if: If A 1,..., A n B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R
42
BCNF Decomposition Algorithm A’s Others B’s R1R1 Is there a 2-attribute relation that is not in BCNF ? Repeat choose A 1, …, A m B 1, …, B n that violates the BNCF condition split R into R 1 (A 1, …, A m, B 1, …, B n ) and R 2 (A 1, …, A m, [others]) continue with both R 1 and R 2 Until no more violations R2R2
43
Example What are the dependencies? SSN Name, City What are the keys? {SSN, PhoneNumber} Is it in BCNF? NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield Joe987-65-4321908-555-1234Westfield
44
Decompose it into BCNF NameSSNCity Fred123-45-6789Seattle Joe987-65-4321Westfield SSNPhoneNumber 123-45-6789206-555-1234 123-45-6789206-555-6543 987-65-4321908-555-2121 987-65-4321908-555-1234 SSN Name, City Let’s check anomalies: Redundancy ? Update ? Delete ?
45
Summary of BCNF Decomposition Find a dependency that violates the BCNF condition: A’s Others B’s R1R2 Heuristics: choose B, B, … B “as large as possible” 12m Decompose: 2-attribute relations are BCNF Continue until there are no BCNF violations left. A 1, A 2, …, A n B 1, B 2, …, B m
46
Example Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN name, age age hairColor Decompose in BCNF (in class): Step 1: find all keys (How ? Compute S +, for various sets S) Step 2: now decompose
47
Other Example R(A,B,C,D) A B, B C Key: AD Violations of BCNF: A B, A C, A BC Pick A BC: split into R1(A,BC) R2(A,D) What happens if we pick A B first ?
48
Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C) R1(A,B) R2(A,C) R’(A,B,C) should be the same as R(A,B,C) R’ is in general larger than R. Must ensure R’ = R Decompose Recover
49
Lossless Decompositions Given R(A,B,C) s.t. A B, the decomposition into R1(A,B), R2(A,C) is lossless
50
3NF: A Problem with BCNF Unit Company Product Unit Company Unit Product FD’s: Unit Company; Company, Product Unit So, there is a BCNF violation, and we decompose. Unit Company No FDs Notice: we loose the FD: Company, Product Unit
51
So What’s the Problem? Unit Company Product Unit CompanyUnit Product Galaga99 UW Galaga99 databases Bingo UW Bingo databases No problem so far. All local FD’s are satisfied. Let’s put all the data back into a single table again (anomalies?): Galaga99 UW databases Bingo UW databases Violates the dependency: company, product -> unit!
52
Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n B for R, then {A 1, A 2,..., A n } is a key for R, or B is part of a key. A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n B for R, then {A 1, A 2,..., A n } is a key for R, or B is part of a key. Tradeoff: BCNF = no anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies
53
How to Compute Meaning -the meaning of a set of FDs, F+ umbrella: a collapsible shade consisting of fabric stretched over hinged ribs radiating from a central pole Given the ribs of an umbrella, the FDs, what does the whole umbrella, F +, look like? Determine each set of attributes, X, that appears on a left-hand side of a FD. Determine the set, X +, the closure of X under F.
54
How to Compute Meaning when do sets of FDs mean the same? F covers E if every FD in E is also in F + F and E are equivalent if F covers E and E covers F. We can determine whether F covers E by calculating X + with respect to F for each FD, X Y in E, and then checking whether this X + includes the attributes in Y +. If this is the case for every FD in E, then F covers E. F E F+F+
55
How to Compute Meaning - minimal cover of a set of FDs Is there a minimal set of ribs that will hold the umbrella open? F is minimal if: every dependency in F has a single attribute as right-hand side we can’t replace any dependency X A in F with a dependency Y A where Y X and still have a set of dependencies equivalent with F we can’t remove any dependency from F and still have a set of dependencies equivalent with F
56
How to guarantee lossless joins Decompose relation, R, with functional dependencies, F, into relations, R 1 and R 2, with attributes, A 1 and A 2, and associated functional dependencies, F 1 and F 2. The decomposition is lossless iff: A 1 A 2 A 1 \A 2 is in F +, or A 1 A 2 A 2 \A 1 is in F + R 1 R 2 =R
57
Example R1 (A1, A2, A3, A5) R2 (A1, A3, A4) R3 (A4, A5) FD1: A1 A3 A5 FD2: A5 A1 A4 FD3: A3 A4 A2
58
Example (con ’ t) A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) b(2,5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)
59
Example (con ’ t) By FD1: A1 A3 A5 A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) b(2,5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)
60
Example (con ’ t) By FD1: A1 A3 A5 we have a new result table A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) a(5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)
61
Example (con ’ t) By FD2: A5 A1 A4 A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) a(5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)
62
Example (con ’ t) FD2: A5 A1 A4 we have a new result table A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) a(4) a(5) R2 a(1) b(2,2) a(3) a(4) a(5) R3 a(1) b(3,2) b(3,3) a(4) a(5)
63
How to guarantee preservation of FDs Decompose relation, R, with functional dependencies, F, into relations, R 1,..., R k, with associated functional dependencies, F 1,..., F k. The decomposition is dependency preserving iff: F + =(F 1 ... F k ) +
64
3NF that is not BCNF A B C Candidate keys:{A,B} and {A,C} Determinants:{A,B} and {C} A decomposition: Lossless, but not dependency preserving! A B C R C B R1R1 A C R2R2
65
Major Results in Normalization Theory Theorem: There is an algorithm for testing a decomposition for lossless join wrt. a set of FDs Theorem: There is an algorithm for testing a decomposition for dependency preservation Theorem: There is an algorithm for lossless join decomposition into BCNF Theorem: There is an algorithm for dependency preserving decomposition into 3NF
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.