Presentation is loading. Please wait.

Presentation is loading. Please wait.

Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.

Similar presentations


Presentation on theme: "Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee."— Presentation transcript:

1 Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

2 Armstrong’s Axioms For computing the set of FDs that follow a given FD, the following rules called Armstrong’s axioms are useful: 1.Reflexivity: If B  A, then A  B 2.Augmentation: If A  B, then A  C  B  C Note also that if A  B, then A  C  B for any set of attributes C. 3.Transitivity: If A  B and B  C then A  C

3 Example of 3NF but not BCNF  R(A B C D)  1 2 1 2  1 3 2 1  2 1 1 1  1 3 2 2  2 1 3 2  Is this table BCNF?

4  FD graph of R is  A B C D  B  A  AC  B  *BD  A, C  *CD  A, B  Prime Attribute = key attributes are {B,C,D}  B  A and B is not a key, So R is not 3NF  R is not BCNF

5 Projecting FDs Given a relation R (A,B,C,D) and F(R) = {A  B, B  C, C  D}. Suppose S is projected from R as S(A,C,D). What is F(S). To compute F(S), start by computing the closures of all attributes in S. In R, A + = {A  B, A  C, A  D} In S, A + = {A  C, A  D} C + = {C  D} and D + = {D} Since A + contains all attributes of S, it is not required to compute (AC) +, (AD) + or (ACD) +.

6 Inference Rules for FD’s Is equivalent to Splitting rule and Combining rule A1...AmB1...Bm A 1, A 2, …, A n  B 1, B 2, …, B m A 1, A 2, …, A n  B 1 A 1, A 2, …, A n  B 2..... A 1, A 2, …, A n  B m A 1, A 2, …, A n  B 1 A 1, A 2, …, A n  B 2..... A 1, A 2, …, A n  B m

7 Inference Rules for FD’s (continued) Trivial Rule Why ? A1A1 … AmAm where i = 1, 2,..., n A 1, A 2, …, A n  A i

8 Inference Rules for FD’s (continued) Transitive Closure Rule If and then Why ? A 1, A 2, …, A n  B 1, B 2, …, B m B 1, B 2, …, B m  C 1, C 2, …, C p A 1, A 2, …, A n  C 1, C 2, …, C p

9 A1A1 … AmAm B1B1 … BmBm C1C1...CpCp

10 Example (continued) Start from the following FDs: Infer the following FDs: 1. name  color 2. category  department 3. color, category  price 1. name  color 2. category  department 3. color, category  price Inferred FD Which Rule did we apply ? 4. name, category  name 5. name, category  color 6. name, category  category 7. name, category  color, category 8. name, category  price

11 Another Rule If then Augmentation follows from trivial rules and transitivity How ? A 1, A 2, …, A n  B A 1, A 2, …, A n, C 1, C 2, …, C p  B Augmentation

12 Problem: infer ALL FDs Given a set of FDs, infer all possible FDs How to proceed ?  Try all possible FDs, apply all 3 rules E.g. R(A, B, C, D): how many FDs are possible ?  Drop trivial FDs, drop augmented FDs Still way too many  Better: use the Closure Algorithm (next)

13 Closure of a set of Attributes Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } +, is the set of attributes B s.t. A 1, …, A n  B Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } +, is the set of attributes B s.t. A 1, …, A n  B name  color category  department color, category  price name  color category  department color, category  price Example: Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color}

14 Closure Algorithm Start with X={A1, …, An}. Repeat until X doesn’t change do: if B 1, …, B n  C is a FD and B 1, …, B n are all in X then add C to X. Start with X={A1, …, An}. Repeat until X doesn’t change do: if B 1, …, B n  C is a FD and B 1, …, B n are all in X then add C to X. {name, category} + = {name, category, color, department, price} name  color category  department color, category  price name  color category  department color, category  price Example:

15 Example Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } R(A,B,C,D,E,F) A, B  C A, D  E B  D A, F  B A, B  C A, D  E B  D A, F  B

16 Using Closure to Infer ALL FDs A, B  C A, D  B B  D A, B  C A, D  B B  D Example: Step 1: Compute X +, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD Step 2: Enumerate all FD’s X  Y, s.t. Y  X + and X  Y =  : AB  CD, AD  BC, ABC  D, ABD  C, ACD  B

17 Problem: Finding FDs  Approach 1: During Database Design Designer derives them from real-world knowledge of users Problem: knowledge might not be available  Approach 2: From a Database Instance Analyze given database instance and find all FD ’ s satisfied by that instance Useful if designers don ’ t get enough information from users Problem: FDs might be artifical for the given instance

18 Find All FDs StudentDeptCourseRoom AliceCSEC++020 BobCSEC++020 AliceEEHW040 CarolCSEDB045 DanCSEJava050 ElsaCSEDB045 FrankEECircuits020 Do all FDs make sense in practice ?

19 Answer Course  Dept, Room Dept, Room  Course Student, Dept  Course, Room Student, Course  Dept, Room Student, Room  Dept, Course Course  Dept, Room Dept, Room  Course Student, Dept  Course, Room Student, Course  Dept, Room Student, Room  Dept, Course Do all FDs make sense in practice ?

20 Keys  A key is a set of attributes A 1,..., A n s.t. for any other attribute B, we have A 1,..., A n  B  A minimal key is a set of attributes which is a key and for which no subset is a key  Note: book calls them superkey and key

21 Computing Keys  Compute X + for all sets X  If X + = all attributes, then X is a key  List only the minimal keys Note: there can be many minimal keys !  Example: R(A,B,C), AB  C, BC  A Minimal keys: AB and BC

22 Examples of Keys  Product(name, price, category, color) name, category  price category  color Keys are: {name, category} and all supersets  Enrollment(student, address, course, room, time) student  address room, time  course student, course  room, time Keys are:

23 Relational Schema Design (or Logical Schema Design) Main idea:  Start with some relational schema  Find out its FD ’ s  Use them to design a better relational schema

24 Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Update anomalies: need to change in several places Delete anomalies: may lose data when we don ’ t want

25 Relational Schema Design Anomalies: Redundancy = repeat data Update anomalies = Fred moves to “Bellevue” Deletion anomalies = Joe deletes his phone number: what is his city ? Example: Persons with several phones SSN  Name, City NameSSNPhoneNumbe r City Fred123-45-6789206-555- 1234 Seattle Fred123-45-6789206-555- 6543 Seattle Joe987-65-4321908-555- 2121 Westfield but not SSN  PhoneNumber

26 Relation Decomposition Break the relation into two: NameSSNCity Fred123-45-6789Seattle Joe987-65-4321Westfield SSNPhoneNumbe r 123-45-6789206-555- 1234 123-45-6789206-555- 6543 987-65-4321908-555- 2121 Anomalies have gone: No more repeated data Easy to move Fred to “Bellevue” (how ?) Easy to delete all Joe’s phone number (how ?) NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield

27 Relational Schema Design Person buys Product name pricenamessn Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies

28 Decompositions in General R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p )

29 Decomposition  Sometimes it is correct: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NamePrice Gizmo19.99 OneClick24.99 Gizmo19.99 NameCategory GizmoGadget OneClickCamera GizmoCamera Lossless decomposition

30 Incorrect Decomposition  Sometimes it is not: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NameCategory GizmoGadget OneClickCamera GizmoCamera PriceCategory 19.99Gadget 24.99Camera 19.99Camera What’s incorrect ?? Lossy decomposition

31 Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) If A 1,..., A n  B 1,..., B m Then the decomposition is lossless R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) Example: name  price, hence the first decomposition is lossless Note: don’t need necessarily A 1,..., A n  C 1,..., C p

32 Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Third Normal Form (3NF) = this lecture Boyce Codd Normal Form (BCNF) = this lecture Others...

33

34 R (J, K, L) F = (JK  L, L  K) Two candidate keys: JK and JL R is in 3NF  JK  LJK is a superkey  L  KK is prime BCNF decomposition yields:  R1 (L,K), R2 (L,J)  testing for JK  L requires a join There is some redundancy in R

35 Boyce-Codd Normal Form A simple condition for removing anomalies from relations: In English (though a bit vague): Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R. A relation R is in BCNF if: If A 1,..., A n  B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R A relation R is in BCNF if: If A 1,..., A n  B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R

36

37

38 BCNF Decomposition Algorithm A’s Others B’s R1R1 Is there a 2-attribute relation that is not in BCNF ? Repeat choose A 1, …, A m  B 1, …, B n that violates the BNCF condition split R into R 1 (A 1, …, A m, B 1, …, B n ) and R 2 (A 1, …, A m, [others]) continue with both R 1 and R 2 Until no more violations R2R2

39 Example What are the dependencies? SSN  Name, City What are the keys? {SSN, PhoneNumber} Is it in BCNF? NameSSNPhoneNumbe r City Fred123-45-6789206-555- 1234 Seattle Fred123-45-6789206-555- 6543 Seattle Joe987-65-4321908-555- 2121 Westfield Joe987-65-4321908-555- 1234 Westfield

40 Decompose it into BCNF NameSSNCity Fred123-45-6789Seattle Joe987-65-4321Westfield SSNPhoneNumbe r 123-45-6789206-555- 1234 123-45-6789206-555- 6543 987-65-4321908-555- 2121 987-65-4321908-555- 1234 SSN  Name, City Let’s check anomalies: Redundancy ? Update ? Delete ?

41 Summary of BCNF Decomposition Find a dependency that violates the BCNF condition: A’s Others B’s R1R2 Heuristics: choose B, B, … B “as large as possible” 12m Decompose: 2-attribute relations are BCNF Continue until there are no BCNF violations left. A 1, A 2, …, A n  B 1, B 2, …, B m

42 Example Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Decompose in BCNF (in class): Step 1: find all keys (How ? Compute S +, for various sets S) Step 2: now decompose

43 Other Example  R(A,B,C,D) A  B, B  C  Key: AD  Violations of BCNF: A  B, A  C, A  BC  Pick A  BC: split into R1(A,BC) R2(A,D)  What happens if we pick A  B first ?

44 Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C) R1(A,B) R2(A,C) R ’ (A,B,C) should be the same as R(A,B,C) R’ is in general larger than R. Must ensure R’ = R Decompose Recover

45 Lossless Decompositions  Given R(A,B,C) s.t. A  B, the decomposition into R1(A,B), R2(A,C) is lossless

46 3NF: A Problem with BCNF Unit Company Product Unit Company Unit Product FD’s: Unit  Company; Company, Product  Unit So, there is a BCNF violation, and we decompose. Unit  Company No FDs Notice: we loose the FD: Company, Product  Unit

47 So What’s the Problem? Unit Company Product Unit CompanyUnit Product Galaga99 UW Galaga99 databases Bingo UW Bingo databases No problem so far. All local FD’s are satisfied. Let’s put all the data back into a single table again (anomalies?): Galaga99 UW databases Bingo UW databases Violates the dependency: company, product -> unit!

48 Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n  B for R, then {A 1, A 2,..., A n } is a key for R, or B is part of a key. A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n  B for R, then {A 1, A 2,..., A n } is a key for R, or B is part of a key. Tradeoff: BCNF = no anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies

49 Purpose of Normalization  To reduce the chances for anomalies to occur in a database.  normalization prevents the possible corruption of databases stemming from what are called “ insertion anomalies," "deletion anomalies," and "update anomalies."

50 Insertion Anomaly  A failure to place a new database entry into all the places in the database where that new entry needs to be stored.  In a properly normalized database, a new entry needs to be inserted into only one place in the database

51 Deletion Anomaly  A failure to remove an existing database entry when it is time to remove that entry.  In a properly normalized database, an old, to-be-gotten- rid-of entry needs to be deleted from only one place in the database

52 Update anomaly  An update of a database involves modifications that may be additions, deletions, or both. Thus "update anomalies" can be either of the kinds of anomalies discussed above.

53

54

55

56

57

58

59

60

61

62

63

64


Download ppt "Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee."

Similar presentations


Ads by Google