Schema Normalization, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October 11, 2005 Some slide content.

1 Schema Normalization, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October 11, 2005 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 Announcements
- Decide on 3-person project groups by 1 week from Thursday (10/20)
- Homework 2 answers posted on Web
- Homework 3 due Thursday
- No class next Tuesday (Fall Break)
- Midterm: Thursday 10/20

3 Not All Designs are Equally Good
Why is this a poor schema design?
    Stuff(sid, name, serno, subj, cid, exp-grade)
And why is this one better?
    Student(sid, name)
    Course(serno, cid)
    Subject(cid, subj)
    Takes(sid, serno, exp-grade)

4 Functional Dependencies Describe “Key-Like” Relationships
A key is a set of attributes where: if keys match, then the tuples match.
A functional dependency (FD) is a generalization: if an attribute set determines another, written X → Y, then if two tuples agree on attribute set X, they must agree on Y:
    sid → name
What other FDs are there in this data?
- FDs are independent of our schema design choice

5 Formal Definition of FD’s
Def. Given a relation schema R and subsets X, Y of R:
An instance r of R satisfies FD X → Y if, for any two tuples t1, t2 ∈ r, t1[X] = t2[X] implies t1[Y] = t2[Y]
- For an FD to hold for schema R, it must hold for every possible instance r of R
(Can a DBMS verify this? Can we determine this by looking at an instance?)
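The definition above translates almost directly into a check over one instance. The following is a minimal sketch, assuming tuples are represented as Python dicts; the function name `satisfies` and the attribute spelling `exp_grade` are illustrative choices, not part of the slides. The sample rows are the two Stuff tuples shown on the lossless-join slide.

```python
def satisfies(rows, lhs, rhs):
    """Return True if no two tuples in `rows` agree on `lhs` but differ on `rhs`."""
    seen = {}  # maps each observed X-value to the Y-value it first appeared with
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

# The two sample tuples of Stuff used later in the deck.
stuff = [
    {"sid": 1, "name": "Sam", "serno": 570103, "subj": "AI", "cid": 570, "exp_grade": "B"},
    {"sid": 23, "name": "Nitin", "serno": 550103, "subj": "DB", "cid": 550, "exp_grade": "A"},
]

print(satisfies(stuff, ["sid"], ["name"]))  # True
print(satisfies(stuff, ["name"], ["sid"]))  # True, but only by accident of this data

# A hypothetical third tuple that reuses sid 1 with a different name.
bad = stuff + [dict(stuff[0], name="Samuel", serno=550103)]
print(satisfies(bad, ["sid"], ["name"]))    # False
```

This suggests an answer to the slide's two questions: a single instance can refute an FD, but it cannot establish that the FD holds for the schema, since `name → sid` passes here without being a rule anyone intended.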

6 General Thoughts on Good Schemas
We want all attributes in every tuple to be determined by the tuple’s key attributes, i.e. part of a superkey (for key X → Y, a superkey is a “non-minimal” X)
What does this say about redundancy?
But:
- What about tuples that don’t have keys (other than the entire value)?
- What about the fact that every attribute determines itself?

7 Armstrong’s Axioms: Inferring FDs
Some FDs exist due to others; can compute using Armstrong’s axioms:
- Reflexivity: If Y ⊆ X then X → Y (trivial dependencies)
    name, sid → name
- Augmentation: If X → Y then XW → YW
    serno → subj so serno, exp-grade → subj, exp-grade
- Transitivity: If X → Y and Y → Z then X → Z
    serno → cid and cid → subj so serno → subj

8 Armstrong’s Axioms Lead to…
- Union: If X → Y and X → Z then X → YZ
- Pseudotransitivity: If X → Y and WY → Z then XW → Z
- Decomposition: If X → Y and Z ⊆ Y then X → Z
Let’s prove a few of these from Armstrong’s Axioms
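One possible way to carry out those proofs, using only reflexivity, augmentation, and transitivity, is sketched below. Other derivations exist; this is just one short route per rule.

```latex
\begin{align*}
\textbf{Union.}\quad
  & X \to Y \;\Rightarrow\; X \to XY
      && \text{augment with } X \text{ (note } XX = X\text{)} \\
  & X \to Z \;\Rightarrow\; XY \to YZ
      && \text{augment with } Y \\
  & X \to XY,\; XY \to YZ \;\Rightarrow\; X \to YZ
      && \text{transitivity} \\[6pt]
\textbf{Pseudotransitivity.}\quad
  & X \to Y \;\Rightarrow\; XW \to YW
      && \text{augment with } W \\
  & XW \to YW,\; WY \to Z \;\Rightarrow\; XW \to Z
      && \text{transitivity} \\[6pt]
\textbf{Decomposition.}\quad
  & Z \subseteq Y \;\Rightarrow\; Y \to Z
      && \text{reflexivity} \\
  & X \to Y,\; Y \to Z \;\Rightarrow\; X \to Z
      && \text{transitivity}
\end{align*}
```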

9 Closure of a Set of FD’s
Defn. Let F be a set of FD’s. Its closure, F+, is the set of all FD’s:
    {X → Y | X → Y is derivable from F by Armstrong’s Axioms}
Which of the following are in the closure of our Student-Course FD’s?
    name → name
    cid → subj
    serno → subj
    cid, sid → subj
    cid → sid

10 Attribute Closures: Is Something Dependent on X?
Defn. The closure of an attribute set X, X+, is:
    X+ = ∪ {Y | X → Y ∈ F+}
- This answers the question “is Y determined (transitively) by X?”; compute X+ by:
    closure := X;
    repeat until no change {
        if there is an FD U → V in F such that U ⊆ closure
            then add V to closure
    }
- Does sid, serno → subj, exp-grade?
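The fixpoint loop on this slide can be sketched in Python as follows. The representation of an FD as a `(lhs, rhs)` pair of sets and the name `closure` are assumptions made for illustration; the FD set is the one the deck uses for the Student-Course schema in Example 1.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until no change
        changed = False
        for lhs, rhs in fds:
            # U -> V applies once all of U is already in the closure
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# FD set from the deck's Example 1 (exp-grade written as exp_grade).
FDS = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid"}),
    ({"cid"}, {"subj"}),
    ({"sid", "serno"}, {"exp_grade"}),
]

print(sorted(closure({"cid"}, FDS)))  # ['cid', 'subj']
# sid, serno -> subj, exp-grade holds iff both attributes are in {sid, serno}+
print({"subj", "exp_grade"} <= closure({"sid", "serno"}, FDS))  # True
```

Testing whether X → Y is in F+ then reduces to checking Y ⊆ X+, which avoids materializing F+ at all.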

11 Equivalence of FD sets
Defn. Two sets of FD’s, F and G, are equivalent if their closures are equal, F+ = G+
e.g., these two sets are equivalent:
    {XY → Z, X → Y} and {X → Z, X → Y}
- F+ contains a huge number of FD’s (exponential in the size of the schema)
- Would like to have smallest “representative” FD set
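Because F+ is exponentially large, equivalence is normally tested without building it: F and G are equivalent exactly when every FD of each set follows from the other, which attribute closures can decide. A minimal sketch, with the helper names `implies` and `equivalent` chosen here for illustration:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure of `fds`."""
    return set(rhs) <= closure(lhs, fds)

def equivalent(f, g):
    """True if F+ = G+, checked by mutual implication."""
    return (all(implies(g, l, r) for l, r in f)
            and all(implies(f, l, r) for l, r in g))

# The slide's example pair.
F = [({"X", "Y"}, {"Z"}), ({"X"}, {"Y"})]
G = [({"X"}, {"Z"}), ({"X"}, {"Y"})]

print(equivalent(F, G))                  # True
print(equivalent(F, [({"X"}, {"Y"})]))   # False: X -> Z is lost
```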

12 Minimal Cover
Defn. A FD set F is minimal if:
1. Every FD in F is of the form X → A, where A is a single attribute
2. For no X → A in F is: F – {X → A} equivalent to F
3. For no X → A in F and Z ⊂ X is: (F – {X → A}) ∪ {Z → A} equivalent to F
Defn. F is a minimal cover for G if F is minimal and is equivalent to G.
e.g., {X → Z, X → Y} is a minimal cover for {XY → Z, X → Z, X → Y}
Condition 2: in a sense, each FD is “essential” to the cover
Conditions 1 and 3: we express each FD in simplest form
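One common way to compute a minimal cover follows the three conditions in order: split right-hand sides, drop extraneous left-hand attributes, then drop redundant FDs. The sketch below is an assumption about how one might implement that; the slide defines the property but does not give a procedure, and a different processing order can yield a different (equally valid) cover.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    """Return a minimal cover as a list of (frozenset lhs, single attribute) pairs."""
    # Condition 1: one attribute on every right-hand side.
    cover = [(frozenset(l), a) for l, r in fds for a in sorted(r)]
    as_fds = [(l, {a}) for l, a in cover]
    # Condition 3: remove left-hand attributes that are not needed.
    reduced = []
    for lhs, a in cover:
        lhs = set(lhs)
        for b in sorted(lhs):
            if a in closure(lhs - {b}, as_fds):
                lhs.discard(b)
        reduced.append((frozenset(lhs), a))
    # Condition 2: remove FDs implied by the remaining ones.
    result = list(dict.fromkeys(reduced))  # drop duplicates, keep order
    for fd in list(result):
        rest = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], [(l, {a}) for l, a in rest]):
            result = rest
    return result

# The slide's example: {XY -> Z, X -> Z, X -> Y}
G = [({"X", "Y"}, {"Z"}), ({"X"}, {"Z"}), ({"X"}, {"Y"})]
for lhs, a in minimal_cover(G):
    print(sorted(lhs), "->", a)
```

On the slide's example, XY → Z first shrinks to X → Z (Y is extraneous because X → Y), and the resulting duplicate is then dropped.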

13 More on Closures
If F is a set of FD’s and X → Y ∉ F+, then for some attribute A ∈ Y, X → A ∉ F+
Proof by contradiction. Assume otherwise and let Y = {A1, ..., An}.
Since we assume X → A1, ..., X → An are in F+, then X → A1...An is in F+ by the union rule;
hence X → Y is in F+, which is a contradiction.

14 Why Armstrong’s Axioms?
Why are Armstrong’s axioms (or an equivalent rule set) appropriate for FD’s? They are:
- Consistent (sound): any relation satisfying FD’s in F will satisfy those in F+
- Complete: if an FD X → Y cannot be derived by Armstrong’s axioms from F, then there exists some relational instance satisfying F but not X → Y
- In other words, Armstrong’s axioms derive all the FD’s that should hold
- What is the goal of using these axioms?

15 Decomposition
Consider our original “bad” attribute set:
    Stuff(sid, name, serno, subj, cid, exp-grade)
We could decompose it into:
    Student(sid, name)
    Course(serno, cid)
    Subject(cid, subj)
But this decomposition loses information about the relationship between students and courses. Why?

16 Lossless Join Decomposition
R1, … Rk is a lossless join decomposition of R w.r.t. an FD set F if for every instance r of R that satisfies F,
    π_R1(r) ⋈ ... ⋈ π_Rk(r) = r
Consider:
    sid  name   serno   subj  cid  exp-grade
    1    Sam    570103  AI    570  B
    23   Nitin  550103  DB    550  A
What if we decompose on (sid, name) and (serno, subj, cid, exp-grade)?
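The question on this slide can be explored by actually projecting the two sample tuples and joining the pieces back together. This is a small sketch under the assumption that a relation is a list of dicts; `project`, `natural_join`, and `same_relation` are illustrative helper names, and `exp_grade` stands in for exp-grade.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen = {frozenset((a, t[a]) for a in attrs) for t in rows}
    return [dict(s) for s in seen]

def natural_join(r1, r2):
    """Natural join: pair up tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in t1.keys() & t2.keys()):
                out.add(frozenset({**t1, **t2}.items()))
    return [dict(s) for s in out]

def same_relation(r1, r2):
    return {frozenset(t.items()) for t in r1} == {frozenset(t.items()) for t in r2}

stuff = [
    {"sid": 1, "name": "Sam", "serno": 570103, "subj": "AI", "cid": 570, "exp_grade": "B"},
    {"sid": 23, "name": "Nitin", "serno": 550103, "subj": "DB", "cid": 550, "exp_grade": "A"},
]

# Decomposition from the slide: no shared attribute, so the join is a cross product.
lossy = natural_join(project(stuff, ["sid", "name"]),
                     project(stuff, ["serno", "subj", "cid", "exp_grade"]))
print(len(lossy), same_relation(lossy, stuff))  # 4 False

# Keeping sid in the second relation lets the join reassemble the original rows.
ok = natural_join(project(stuff, ["sid", "name"]),
                  project(stuff, ["sid", "serno", "subj", "cid", "exp_grade"]))
print(len(ok), same_relation(ok, stuff))        # 2 True
```

The lossy join returns extra tuples (Sam paired with Nitin's course and vice versa), which is how "losing information" shows up in practice: spurious rows, not missing ones.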

17 Testing for Lossless Join
R1, R2 is a lossless join decomposition of R with respect to F iff at least one of the following dependencies is in F+:
    (R1 ∩ R2) → R1 – R2
    (R1 ∩ R2) → R2 – R1
So for the FD set:
    sid → name
    serno → cid, exp-grade
    cid → subj
Is (sid, name) and (serno, subj, cid, exp-grade) a lossless decomposition?
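The two-relation test on this slide can be checked mechanically, since (R1 ∩ R2) → R1 – R2 is in F+ exactly when R1 – R2 is contained in the attribute closure of R1 ∩ R2. A minimal sketch, assuming FDs are `(lhs, rhs)` pairs of sets and using the FD set listed on the slide; `is_lossless` is an illustrative name.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the shared attributes must determine one side."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure

# FD set as listed on the slide (exp-grade written as exp_grade).
F = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid", "exp_grade"}),
    ({"cid"}, {"subj"}),
]

print(is_lossless(["sid", "name"],
                  ["serno", "subj", "cid", "exp_grade"], F))         # False
print(is_lossless(["sid", "name"],
                  ["sid", "serno", "subj", "cid", "exp_grade"], F))  # True
```

The first decomposition fails because the two schemas share no attributes, so the closure of the intersection is empty.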

18 Dependency Preservation
Ensures we can check whether a FD X → Y is violated during DB updates, without using a join:
- F_Z, the projection of FD set F onto attribute set Z, is:
    {X → Y | X → Y ∈ F+, X ∪ Y ⊆ Z}
  i.e., it is those FDs only applicable to Z’s attributes
- A decomposition R1, …, Rk is dependency preserving if
    F+ = (F_R1 ∪ ... ∪ F_Rk)+
  (note we need an extra closure!)
We don’t lose the ability to test the “cover” of our FDs in a single table, just because we decompose
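Computing each projection F_Ri explicitly is expensive, so a common textbook shortcut tests every FD X → Y in F by growing X using only what each Ri can "see". The sketch below uses that shortcut, which is an assumption on my part and not something the slide spells out; `preserves` is an illustrative name. The two test schemas are the ones posed as Example 2 and the "more disturbing example" elsewhere in this deck.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(decomp, fds):
    """True if every FD in `fds` follows from the union of the projections onto `decomp`."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomp:
                ri = set(ri)
                # attributes derivable from z using only FDs that live inside ri
                gained = (closure(z & ri, fds) & ri) - z
                if gained:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True

# Example 2: customer/order schema.
F2 = [({"name"}, {"street", "city"}), ({"street", "city"}, {"st"}),
      ({"street", "city"}, {"zip"}), ({"name", "item"}, {"price"})]
print(preserves([{"name", "street", "city", "st", "zip"},
                 {"name", "item", "price"}], F2))            # True

# R(sid, fid, subj) with fid -> subj and sid, subj -> fid.
F3 = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]
print(preserves([{"sid", "fid"}, {"fid", "subj"}], F3))      # False
```

In the second case sid, subj → fid spans both relations, so no single table can check it after an update.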

19 Example 1
For schema R(sid, name, serno, cid, subj, exp-grade) and FD set:
    sid → name
    serno → cid
    cid → subj
    sid, serno → exp-grade
Is R1(sid, name) and R2(serno, subj, cid, exp-grade):
- A lossless decomposition?
- Is it dependency-preserving?
How about R1(sid, name) and R2(sid, serno, subj, cid, exp-grade)?

20 Example 2
Given schema R(name, street, city, st, zip, item, price), FD set
    name → street, city
    street, city → st
    street, city → zip
    name, item → price
and decomposition R1(name, street, city, st, zip) and R2(name, item, price)
- Is it lossless?
- Is it dependency preserving?
What if we replaced the first FD with name, street → city?

21 A More Disturbing Example…
Given schema R(sid, fid, subj) and FD set:
    fid → subj
    sid, subj → fid
Consider the decomposition R1(sid, fid) and R2(fid, subj)
- Is it lossless?
- Is it dependency preserving?
- If it isn’t, can you think of a decomposition that is? Can you do this non-redundantly?

22 Redundancy vs. FDs
Ideally, we want a design s.t. for each nontrivial dependency X → Y, X is a superkey for some relation schema in R
- We just saw that this isn’t always possible in a non-redundant way…
Thus we have two kinds of normal forms, Boyce-Codd and Third Normal Form

23 Two Important Normal Forms
Boyce-Codd Normal Form (BCNF). For every relation scheme R and for every X → A that holds over R,
    either A ∈ X (it is trivial),
    or X is a superkey for R
Third Normal Form (3NF). For every relation scheme R and for every X → A that holds over R,
    either A ∈ X (it is trivial),
    or X is a superkey for R,
    or A is a member of some key for R
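Both definitions can be checked for a whole schema by examining only the FDs listed in F (a standard result; it does not carry over to testing a decomposed relation against projected FDs). The sketch below assumes a brute-force search for candidate keys, which is fine for classroom-sized schemas only; all function names are illustrative. The example schema is R(sid, fid, subj) from the "more disturbing example" slide.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the full schema (exponential search)."""
    attrs = set(attrs)
    keys = []
    for k in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), k):
            c = set(combo)
            # smaller keys were found first, so skip supersets of a known key
            if closure(c, fds) >= attrs and not any(key <= c for key in keys):
                keys.append(c)
    return keys

def is_bcnf(attrs, fds):
    """Every listed FD is trivial or has a superkey on the left."""
    return all(set(r) <= set(l) or closure(l, fds) >= set(attrs) for l, r in fds)

def is_3nf(attrs, fds):
    """As BCNF, but also allow right-hand attributes that belong to some key."""
    prime = set().union(*candidate_keys(attrs, fds))
    return all(a in set(l) or closure(l, fds) >= set(attrs) or a in prime
               for l, r in fds for a in r)

R = {"sid", "fid", "subj"}
F = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]

print(sorted(sorted(k) for k in candidate_keys(R, F)))  # [['fid', 'sid'], ['sid', 'subj']]
print(is_bcnf(R, F))  # False: fid is not a superkey
print(is_3nf(R, F))   # True: subj is part of the key (sid, subj)
```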

24 Normal Forms Compared
BCNF is preferable, but sometimes in conflict with the goal of dependency preservation
- It’s strictly stronger than 3NF
Let’s see algorithms to obtain:
- A BCNF lossless join decomposition (nondeterministic)
- A 3NF lossless join, dependency preserving decomposition

25 BCNF Decomposition Algorithm (from Korth et al.; our book gives a recursive version)
    result := {R}
    compute F+
    while there is a relation schema Ri in result that isn’t in BCNF {
        let A → B be a nontrivial FD on Ri s.t. A → Ri is not in F+ and A and B are disjoint
        result := (result – Ri) ∪ {(Ri – B), (A, B)}
    }
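The loop on this slide can be sketched in Python without materializing F+. Here a violating FD on Ri is found by trying each attribute subset A of Ri and taking B = (A+ ∩ Ri) – A; that search strategy is an assumption made for this sketch (it is exponential, and the choice of violation is what makes the algorithm nondeterministic). The schema and FDs are the Stuff example from this deck.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(ri, fds):
    """Return (A, B) with A -> B nontrivial on ri and A not a superkey of ri, else None."""
    ri = set(ri)
    for k in range(1, len(ri)):
        for combo in combinations(sorted(ri), k):
            a = set(combo)
            b = (closure(a, fds) & ri) - a  # what A determines inside ri, disjoint from A
            if b and not (a | b) >= ri:
                return a, b
    return None

def bcnf_decompose(attrs, fds):
    pending, done = [set(attrs)], []
    while pending:
        ri = pending.pop()
        violation = find_violation(ri, fds)
        if violation is None:
            done.append(ri)
        else:
            a, b = violation
            pending += [ri - b, a | b]  # replace Ri with (Ri - B) and (A, B)
    return done

FDS = [({"sid"}, {"name"}), ({"serno"}, {"cid"}),
       ({"cid"}, {"subj"}), ({"sid", "serno"}, {"exp_grade"})]
stuff = {"sid", "name", "serno", "subj", "cid", "exp_grade"}

for schema in sorted(sorted(s) for s in bcnf_decompose(stuff, FDS)):
    print(schema)
```

For this input the result matches the Student, Course, Subject, and Takes schemas shown on the "Not All Designs are Equally Good" slide. Each split is lossless because the shared attributes A determine B.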

26 3NF Decomposition Algorithm
    Let F be a minimal cover
    i := 0
    for each FD A → B in F {                  (build dep.-preserving decomp.)
        if none of the schemas Rj, 1 ≤ j ≤ i, contains AB {
            increment i
            Ri := (A, B)
        }
    }
    if no schema Rj, 1 ≤ j ≤ i, contains a candidate key for R {      (ensure lossless decomp.)
        increment i
        Ri := any candidate key for R
    }
    return (R1, …, Ri)
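The synthesis procedure on this slide is short enough to sketch directly. The version below assumes its input is already a minimal cover, treats "contains a candidate key" as "is a superkey of R" (the two are equivalent), and finds one key by greedily dropping attributes; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_key(attrs, fds):
    """One candidate key, found by removing attributes while the rest is still a superkey."""
    key = set(attrs)
    for a in sorted(attrs):
        if closure(key - {a}, fds) >= set(attrs):
            key.discard(a)
    return key

def synthesize_3nf(attrs, cover):
    """3NF synthesis; `cover` is assumed to be a minimal cover of the FDs on `attrs`."""
    schemas = []
    for lhs, rhs in cover:  # one schema per FD: dependency preservation
        ab = set(lhs) | set(rhs)
        if not any(ab <= s for s in schemas):
            schemas.append(ab)
    if not any(closure(s, cover) >= set(attrs) for s in schemas):  # lossless join
        schemas.append(candidate_key(attrs, cover))
    return schemas

stuff = {"sid", "name", "serno", "subj", "cid", "exp_grade"}
FDS = [({"sid"}, {"name"}), ({"serno"}, {"cid"}),
       ({"cid"}, {"subj"}), ({"sid", "serno"}, {"exp_grade"})]
for s in synthesize_3nf(stuff, FDS):
    print(sorted(s))

# R(sid, fid, subj): dependency preserving, but (fid, subj) is repeated inside the second schema.
F3 = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]
for s in synthesize_3nf({"sid", "fid", "subj"}, F3):
    print(sorted(s))
```

On the Stuff schema no extra key relation is needed, because (sid, serno, exp-grade) already contains the key (sid, serno).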

27 Summary of Normalization
- We can always decompose into 3NF and get:
    - Lossless join
    - Dependency preservation
- But with BCNF we are only guaranteed lossless joins
- BCNF is stronger than 3NF: every BCNF schema is also in 3NF
- The BCNF algorithm is nondeterministic, so there is not a unique decomposition for a given schema R

28 XML: A Semi-Structured Data Model

29 Why XML?
XML is the confluence of several factors:
- The Web needed a more declarative format for data
- Documents needed a mechanism for extended tags
- Database people needed a more flexible interchange format
- “Lingua franca” of data
- It’s parsable even if we don’t know what it means!
Original expectation:
- The whole web would go to XML instead of HTML
Today’s reality:
- Not so… But XML is used all over “under the covers”

30 Why DB People Like XML
Can get data from all sorts of sources
- Allows us to touch data we don’t own!
- This was actually a huge change in the DB community
Interesting relationships with DB techniques
- Useful to do relational-style operations
- Leverages ideas from object-oriented, semistructured data
Blends schema and data into one format
- Unlike relational model, where we need schema first
- … But too little schema can be a drawback, too!

31 XML Anatomy
[XML sample with callouts: Processing Instr., Element, Attribute, Open-tag, Close-tag. Surviving text content of the sample:]
- Kurt P. Brown; PRPL: A Database Workload Specification Language; 1992; Univ. of Wisconsin-Madison
- Paul R. McJones; The 1995 SQL Reunion; Digital System Research Center Report SRC1997-018; 1997; db/labs/dec/SRC1997-018.html; http://www.mcjones.org/System_R/SQL_Reunion_95/

32 Well-Formed XML
A legal XML document – fully parsable by an XML parser
- All open-tags have matching close-tags (unlike so many HTML documents!), or a special <tag/> shortcut for empty tags (equivalent to <tag></tag>)
- Attributes (which are unordered, in contrast to elements) only appear once in an element
- There’s a single root element
- XML is case-sensitive
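Each rule on this slide can be observed with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which raises `ParseError` on input that is not well-formed; the helper name `is_well_formed` and the tiny sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if `text` parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b/></a>"))         # True: <b/> is the empty-tag shortcut
print(is_well_formed("<a><b></a>"))          # False: <b> is never closed
print(is_well_formed('<a x="1" x="2"/>'))    # False: attribute x appears twice
print(is_well_formed("<a/><b/>"))            # False: more than one root element
print(is_well_formed("<a></A>"))             # False: names are case-sensitive
```

Well-formedness says nothing about meaning or schema: the parser accepts `<a><b/></a>` without knowing what `a` or `b` stand for.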

