CMSC424: Database Design Instructor: Amol Deshpande
Today… Relational Database Design
Where did we come up with the schema that we used ? E.g. why not store the actor names with movies ? Topics: Formal definition of what it means to be a “good” schema. How to achieve it.
Movies Database Schema Movie(title, year, length, inColor, studioName, producerC#) StarsIn(movieTitle, movieYear, starName) MovieStar(name, address, gender, birthdate) MovieExec(name, address, cert#, netWorth) Studio(name, address, presC#) Movie(title, year, length, inColor, studioName, producerC#, starName) MovieStar(name, address, gender, birthdate) MovieExec(name, address, cert#, netWorth) Studio(name, address, presC#) Changed to:
TitleYearLengthinColorStudioNameprodC#StarName Star wars YesFox128Hamill Star wars YesFox128Fisher Star wars YesFox128H. Ford King Kong YesUniversal150Watts King Kong noRKO20Fay Issues: 1.Redundancy higher storage, inconsistencies (“anomalies”) 2.Need nulls Unable to represent some information without using nulls How to store movies w/o actors (pre-productions etc) ? Movie(title, year, length, inColor, studioName, producerC#, starName)
TitleYearLengthinColorStudioNameprodC#StarNames Star wars YesFox128{Hamill, Fisher, H. ford} King Kong YesUniversal150Watts King Kong noRKO20Fay Issues: 3. Avoid sets - Hard to represent - Hard to query Movie(title, year, length, inColor, studioName, producerC#, starNames)
NameAddress FoxAddress1 Studio2Address1 UniversialAddress2 This process is also called “decomposition” Issues: 4. Requires more joins (w/o any obvious benefits) 5. Hard to check for some dependencies What if the “address” is actually the presC#’s address ? No easy way to ensure that constraint (w/o a join). Split Studio(name, address, presC#) into: Studio1 (name, presC#) Studio2(name, address)??? NamepresC# Fox101 Studio2101 Universial102 Smaller schemas always good ????
movieTitlestarName Star WarsHamill King KongWatts King KongFaye Issues: 6. “joining” them back results in more tuples than what we started with (King Kong, 1933, Watts) & (King Kong, 2005, Faye) This is a “lossy” decomposition We lost some constraints/information The previous example was a “lossless” decomposition. Decompose StarsIn(movieTitle, movieYear, starName) into: StarsIn1(movieTitle, movieYear) StarsIn2(movieTitle, starName) ??? movieTitlemovieYear Star wars1977 King Kong1933 King Kong2005 Smaller schemas always good ????
Desiredata No sets Correct and faithful to the original design Avoid lossy decompositions As little redundancy as possible To avoid potential anomalies No “inability to represent information” Nulls shouldn’t be required to store information Dependency preservation Should be possible to check for constraints
Approach We will encode and list all our knowledge about the schema somehow Functional dependencies (FDs) SSN name (SSN “implies” length) If two tuples have the same “SSN”, they must have the same “name” movietitle length --- Not true. But, (movietitle, movieYear) length --- True. We will define a set of rules that the schema must follow to be considered good “Normal forms”: 1NF, 2NF, 3NF, BCNF, 4NF, … Rules specify constraints on the schemas and FDs
Functional Dependencies Let R be a relation schema R and R The functional dependency holds on R iff for any legal relations r(R), whenever two tuples t 1 and t 2 of r have same values for , they have same values for . t 1 [ ] = t 2 [ ] t 1 [ ] = t 2 [ ] On this instance, A B does NOT hold, but B A does hold A B
Functional Dependencies Difference between holding on an instance and holding on all legal relation Title Year holds on this instance Is this a true functional dependency ? No. Two movies in different years can have the same name. Can’t draw conclusions based on a single instance TitleYearLengthinColorStudioNameprodC#StarName Star wars YesFox128Hamill Star wars YesFox128Fisher Star wars YesFox128H. Ford King Kong noRKO20Fay
Functional Dependencies Functional dependencies and keys: A key constraint is a specific form of a FD. E.g. if A is a superkey for R, then: A R Similarly for candidate keys and primary keys.
Definitions A relation instance r satisfies a set of functional dependencies, F, if the FDs hold on the relation F holds on a relation schema R if not legal (allowable) relation instance of R violates it A functional dependency, A B, is trivial if: B is a subset of A e.g. Movieyear, length length
Definitions 4. Given a set of functional dependencies, F, its closure, F +, is all FDs that are implied by FDs in F. e.g. If A B, and B C, then clearly A C We will see a formal method for inferring this later
BCNF Given a relation schema R, and a set of functional dependencies F, if every FD, A B, is either: Trivial A is a superkey of R Then, R is in BCNF (Boyce-Codd Normal Form) Why is BCNF good ?
BCNF What if the schema is not in BCNF ? Decompose (split) the schema into two pieces. Careful: you want the decomposition to be lossless Rule: A decomposition of R into (R1, R2) is lossless, iff: R1 ∩ R2 R1 or R1 ∩ R2 R2