Schema Refinement and Normal Forms


1 Schema Refinement and Normal Forms
ENGI3675 Database Systems

2 Overview
Cow textbook, chapter 19
ER models give a good overview of the DB structure and design.
They are then translated into a conceptual schema and implemented with SQL commands.
However, ER models do not fully account for integrity constraints (ICs).
Taking ICs into account lets us refine the conceptual schema and create a better DB.
Refinement works by eliminating redundancy in the DB.

3 The Evils of Redundancy
Redundancy: storing the same data in several places and/or in several forms.
It is at the root of several problems associated with relational schemas:
Redundant storage: at the very least, a waste of disk space and I/O operations.
Update anomalies: unless all copies of the data are updated at once, there will be discrepancies.
Insertion anomalies: inserting some information may require inserting other, unrelated information.
Deletion anomalies: deleting information may cause the loss of other, unrelated information.

4 The Evils of Redundancy
Example: salary is a function of rating.
Storage: we store three instances of the (8, 100,000) pair, and two of the (5, 70,000) pair.
Update: if we increase a salary without changing the rating, the DB becomes inconsistent.
Insert: we cannot add a rated employee without knowing the salary for that rating.
Delete: if we delete all employees rated 8, we lose the salary for that rating.
[Table (sin, name, rating, salary); the sin values and repeated cells did not survive in the transcript. Names: Bob, Dave, Lisa, Jimmy. Rating 8 goes with salary 100,000; rating 5 goes with salary 70,000.]

5 The Evils of Redundancy
The problem comes from using a schema that forces an association between attributes.
Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems.
The solution is decomposition: replacing a large redundant schema by a set of smaller, non-redundant schemas.
[Table (sin, name, rating); the sin values and repeated cells did not survive in the transcript. Names: Bob, Dave, Lisa, Jimmy. Ratings: 8 and 5.]
Table (rating, salary):
rating | salary
5 | 70,000
8 | 100,000

6 Functional Dependencies
A functional dependency (FD) is a kind of integrity constraint, defined as follows.
Given a table T with (sets of) attributes X and Y, there is an FD X → Y if, for all pairs of tuples (t1, t2) in T, t1.X = t2.X implies t1.Y = t2.Y.
In words: for all pairs of tuples, if the X values agree, then the Y values must also agree.
An FD must hold for all possible legal instances of the relation schema.
It must be identified from the semantics of the real-world relation being modelled (FDs represent real-world relationships); a single instance can refute an FD but cannot prove it.
An FD implies redundancy: if we know the value of X, we automatically know the value of Y, so storing both X and Y is redundant.
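As an illustration of the definition above, here is a minimal Python sketch that tests whether X → Y is contradicted by one given instance. The function name fd_holds and the sample rows are hypothetical and are not taken from the slides. Since an FD must hold for all legal instances, a True result only means that this particular instance does not violate the FD.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no pair of rows violates the FD lhs -> rhs."""
    seen = {}  # maps each X value to the Y value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two rows agree on X but disagree on Y
    return True


# Hypothetical instance, for illustration only.
employees = [
    {"name": "Bob", "rating": 8, "salary": 100000},
    {"name": "Dave", "rating": 8, "salary": 100000},
    {"name": "Lisa", "rating": 5, "salary": 70000},
]

print(fd_holds(employees, ["rating"], ["salary"]))  # checks rating -> salary
print(fd_holds(employees, ["rating"], ["name"]))    # checks rating -> name
```

In this sample, rating → salary is not contradicted, while rating → name is, because Bob and Dave share a rating.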

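7 Functional Dependencies
[Five example tables, each labelled with the FDs it satisfies. Repeated and merged cells did not survive in the transcript, so only the surviving values are listed, in their original order.]
Example 1 (name, rating, salary): rating → salary and salary → rating. Surviving cells: Bob, 8, 100,000, Dave, Lisa, 5, 70,000, Jimmy.
Example 2 (name, rating, salary): rating → salary. Surviving cells: Bob, 8, 100,000, Dave, Lisa, 5, 70,000, 6, Jimmy.
Example 3 (name, rating, salary): no FD. Surviving cells: Bob, 8, 100,000, Dave, 90,000, 5, 70,000, Jimmy.
Example 4 (name, dept, rating, salary): {dept, rating} → salary. Surviving cells: Bob, HR, 8, 90,000, Dave, IT, 100,000, Lisa, 5, 60,000, 70,000, Jimmy.
Example 5 (name, rating, salary, benefits): rating → {salary, benefits}. Surviving cells: Bob, 8, 100,000, full, Dave, Lisa, 5, 70,000, dental, Jimmy.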

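8 Functional Dependencies
A key constraint is a special case of FD.
Given a set of attributes X: if X → Y holds for any and all sets of attributes Y, and no proper subset of X has that property, then X is a key.
If there is a V ⊆ X such that V → Y holds for all Y, then X is a superkey: a set of attributes that includes a key.
But an FD is not necessarily a key constraint.
[Table (sin, name, rating, salary); the sin values and repeated cells did not survive in the transcript. Names: Bob, Dave, Lisa, Jimmy. Rating 8 goes with salary 100,000; rating 5 goes with salary 70,000.]
sin → salary (key constraint)
{sin, name} → salary (superkey constraint)
rating → salary (FD but not key constraint)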

9 Functional Dependencies
Given some FDs, we can infer additional FDs.
Transitivity: X → Y and Y → Z, therefore X → Z.
Example: a rating value determines salary, and salary determines benefits, therefore rating determines benefits.
Augmentation: X → Y, therefore XZ → YZ.
Reflexivity: X ⊇ Y, therefore X → Y.
Union and decomposition: X → YZ, therefore X → Y and X → Z, and vice-versa.
An FD f is implied by a set of FDs F if f holds on every instance where F holds.
The set of all FDs implied by F is the closure set of F, noted F+.

10 Closure Set Example
Table is {rating R, salary S, benefits B}, with R → S and S → B.
The closure set includes:
Trivial FDs (from the definition of the table): R → R, S → S, B → B, RS → RS, RS → R, RS → S, RB → RB, RB → R, RB → B, SB → SB, SB → S, SB → B, RSB → RSB, RSB → R, RSB → S, RSB → B, RSB → RS, RSB → RB, RSB → SB
Transitive FDs (from R → S and S → B): R → B
Augmented FDs (from R → S, S → B, and R → B): RB → SB, SR → BR, RS → BS
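Listing a closure set by hand gets long quickly. A common shortcut, sketched below in Python, is to compute the closure of a set of attributes instead: X → Y is implied by F exactly when Y is contained in the attribute closure of X. The function names attribute_closure and implied are hypothetical; the FDs are the R → S and S → B of the example.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know rhs.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implied(fds, lhs, rhs):
    """Return True if lhs -> rhs belongs to the closure set of fds."""
    return set(rhs) <= attribute_closure(lhs, fds)


# Each attribute is one letter, so "RB" stands for the set {R, B}.
F = [("R", "S"), ("S", "B")]

print(sorted(attribute_closure("R", F)))
print(implied(F, "RB", "SB"))
print(implied(F, "S", "R"))
```

With these FDs, the closure of R is {R, S, B}, so R → B and RB → SB are implied, while S → R is not.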

11 Decomposition
We can detect redundancy by detecting FDs.
We can eliminate redundancy by decomposition: replacing a large redundant schema by a set of smaller, non-redundant schemas.
A decomposition of a relation schema R consists of replacing R by two or more relations such that:
Each new relation schema contains a subset of the attributes of R.
Every attribute of R appears as an attribute of at least one of the new relations.
No attribute that is not in R appears in the new relations.
But decomposition can cause problems:
It may not be possible to recover the original tuples.
It may not be possible to recover the original dependencies.
Some queries may become more expensive.

12 Lossless-Join Decompositions
R is decomposed into R1 and R2.
If we can join R1 and R2 to recover R exactly, then it is a lossless-join decomposition.
No tuples of R disappear, and no new tuples appear from the join.

13 Lossless-Join Decompositions
[Example tables; the sin values and repeated cells did not survive in the transcript.]
R (sin, name, rating, salary): names Bob, Dave, Lisa, Jimmy; rating 8 with salary 100,000, rating 5 with salary 70,000.
R is decomposed into R1 (sin, name, rating) and R2 (name, salary).
Joining R1 and R2 on name = name gives back R (sin, name, rating, salary).
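To make the lossless-join property concrete, the sketch below projects one instance onto two attribute sets, joins the projections back, and compares the result with the original. This is a check on a single instance only, not a proof for the schema. The helper names and the sin values 1 to 4 are made up for illustration, because the real sin values are missing from the transcript.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, dropping duplicate tuples."""
    distinct = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in distinct]


def natural_join(left, right):
    """Join two lists of dicts on all attributes they share."""
    out = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                out.append({**l, **r})
    return out


def is_lossless_on(rows, attrs1, attrs2):
    """Return True if joining the two projections gives back exactly rows."""
    def as_set(rs):
        return {tuple(sorted(r.items())) for r in rs}
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    return as_set(joined) == as_set(rows)


# Hypothetical instance: the sin values are placeholders.
R = [
    {"sin": 1, "name": "Bob", "rating": 8, "salary": 100000},
    {"sin": 2, "name": "Dave", "rating": 8, "salary": 100000},
    {"sin": 3, "name": "Lisa", "rating": 5, "salary": 70000},
    {"sin": 4, "name": "Jimmy", "rating": 5, "salary": 70000},
]

print(is_lossless_on(R, ["sin", "name", "rating"], ["name", "salary"]))
print(is_lossless_on(R, ["sin", "name"], ["rating", "salary"]))
```

On this instance the first decomposition joins back to R. The second has no common attribute, so the join is a cross product and produces tuples that were never in R.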

14 Dependency-Preserving Decompositions
R is decomposed into R1 and R2.
If we can discover and enforce all the FDs of R using R1 and R2, then it is a dependency-preserving decomposition.
Each FD can then be enforced during insert/update by checking one of the new relations, without a join.

15 Dependency-Preserving Decompositions
R (dept, rating, salary):
dept | rating | salary
HR | 8 | 100,000
PR | 7 | 70,000
IT | 5 | 50,000
R1 (dept, rating):
dept | rating
HR | 8
PR | 7
IT | 5
R2 (rating, salary):
rating | salary
8 | 100,000
7 | 70,000
5 | 50,000
F: dept → rating, rating → salary, salary → dept
FR1: dept → rating, rating → dept
FR2: rating → salary, salary → rating
F+: dept → rating, rating → salary, salary → dept, dept → salary, rating → dept, salary → rating
(FR1 ∪ FR2)+: dept → rating, rating → salary, salary → dept, dept → salary, rating → dept, salary → rating
(FR1 ∪ FR2)+ = F+, so the decomposition is dependency-preserving.
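The comparison of F+ with (FR1 ∪ FR2)+ can be automated without enumerating either closure set. The sketch below uses the standard attribute-closure test: for each FD X → Y in F, grow X using only what each decomposed schema can see, and check that Y is reached. The function names are hypothetical; the first example is the dept/rating/salary decomposition, the second is a decomposition of (sin, name, rating, salary) into (sin, name, rating) and (name, salary) with rating → salary as the only FD.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, schemas):
    """Return True if every FD in fds can be enforced on the schemas alone."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                s = set(schema)
                # What this schema lets us derive from the attributes it holds.
                gained = closure(reached & s, fds) & s
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not set(rhs) <= reached:
            return False
    return True


F = [(("dept",), ("rating",)),
     (("rating",), ("salary",)),
     (("salary",), ("dept",))]
print(preserves(F, [("dept", "rating"), ("rating", "salary")]))

G = [(("rating",), ("salary",))]
print(preserves(G, [("sin", "name", "rating"), ("name", "salary")]))
```

The first decomposition preserves all three FDs, including salary → dept, which is recovered through rating. The second cannot enforce rating → salary because no schema holds both attributes or a chain between them.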

16 Dependency-Preserving Decompositions
[Example tables; the sin values and repeated cells did not survive in the transcript.]
R (sin, name, rating, salary), with FD rating → salary.
R1 (sin, name, rating): no dependencies.
R2 (name, salary): no dependencies.
Not dependency-preserving: there is no way to link rating and salary.

17 Efficient Decompositions
Decomposition breaks a relation into several relations.
Queries might now be selecting attributes from multiple relations.
This requires a join, where the original relation only required a projection.
This increases the computational cost of queries.
We need to study the workload of the DB.
Do not split attributes that are often queried together, even if keeping them together adds redundancy.

18 Efficient Decompositions
[Example tables; the sin values and repeated cells did not survive in the transcript.]
R (sin, name, rating, salary) is decomposed into R1 (sin, name, rating) and R2 (rating, salary), where R2 holds (5, 70,000) and (8, 100,000).
This is a lossless-join, dependency-preserving decomposition that eliminates redundancy.
But if the workload includes a lot of {name, salary} queries, the decomposition makes the system less efficient.

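19 Properties of Decomposition
A decomposition can be lossless-join without being dependency-preserving, and vice-versa.
[Example tables; repeated cells did not survive in the transcript, so surviving values are listed in their original order.]
R (name, dept, rating, salary), with FD rating → salary. Surviving cells: Bob, HR, 8, 100,000, Dave, IT, Lisa, 5, 70,000, Jimmy.
R1 (name, rating, salary), with FD rating → salary. Surviving cells: Bob, 8, 100,000, Dave, Lisa, 5, 70,000, Jimmy.
R2 (name, dept). Surviving cells: Bob, HR, Dave, IT, Lisa, Jimmy.
Dependency-preserving, but not lossless-join.
The join creates tuples (Bob, IT, 8, ) and (Bob, HR, 5, 70000).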

20 Properties of Decomposition
A decomposition can be lossless-join without being dependency-preserving, and vice-versa.
[Example tables; repeated cells did not survive in the transcript, so surviving values are listed in their original order.]
R (rating, dept, salary, city, interests), with FDs {rating, dept} → salary and {city, interests} → dept. Surviving cells: 8, HR, 80,000, TO, Admin, 5, 50,000, IT, 90,000, TB, Tech.
R1 (dept, city, interests), with FD {city, interests} → dept: (HR, TO, Admin) and (IT, TB, Tech).
R2 (rating, salary, city). Surviving cells: 5, 50,000, TO, 8, 80,000, 90,000, TB.
Lossless-join but not dependency-preserving: we lost {rating, dept} → salary.
Nothing stops us from inserting (8, 90000, TO).

21 Normal Forms
We want to decompose relations into smaller non-redundant ones.
But we saw that decompositions can lead to problems. How do we avoid them?
Decompose towards normal forms.
If a relation is in a given normal form, it is known that certain kinds of problems are avoided or minimized.
This can be used to guide decomposition, and to help decide whether decomposition is needed.
This process is called normalization.

22 Normal Forms
Many possible normal forms:
First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form
Fourth Normal Form
Fifth Normal Form
Each is more restrictive on what FDs are allowed.
Each eliminates more redundancy.
Each creates a simpler DB schema.
A higher normal form may not always be possible.

23 First Normal Form
The domain of values of each attribute contains only atomic values.
Lists and sets of values are not allowed.
Each attribute can only have a single value.
This has been the case in our course so far.

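24 First Normal Form
Employee table violates 1NF.
Dept allows a list of values for employees working in multiple departments.
[Table Employee (sin, name, dept); the sin values and multi-valued cells did not survive in the transcript. Surviving cells in order: Bob, HR, Dave, IT, Lisa, Admin, Jimmy.]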

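25 First Normal Form
Option 1: keep one table with one row per (employee, dept) pair. The Employee table is 1NF, but it loses its primary key.
Option 2: split into two tables. The Employee table is 1NF with primary key sin, and the WorksIn table is 1NF with primary key (sin, dept). A join is needed to recover the tuples.
[Tables Employee (sin, name, dept), Employee (sin, name) and WorksIn (sin, dept); the sin values and repeated cells did not survive in the transcript. Names: Bob, Dave, Lisa, Jimmy. Departments: HR, IT, Admin.]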

26 Second Normal Form
Table is 1NF.
No partial dependencies, that is, no attributes that are dependent on only part of a key: X → Z where the pair (X, Y) is a key.
[Example table (name, dept, city); repeated cells did not survive in the transcript. Surviving cells in order: Bob, HR, TO, Lisa, Admin, Dave, IT, TB, Jimmy.]
The table is 1NF and its key is the pair (name, dept).
Partial dependency: dept → city. This violates 2NF.
Notice the update anomaly danger: updating Bob’s location to TB would cause queries such as “where is the HR dept?” to give conflicting results!

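27 Second Normal Form
Solve by decomposing to isolate the partial dependency.
X → Z becomes its own table with X as key.
Z is removed from the first table; X remains.
[Tables (name, dept) and (dept, city); repeated cells did not survive in the transcript. Surviving cells in order, first table: Bob, HR, Lisa, Admin, Dave, IT, Jimmy. Second table: HR, TO, Admin, IT, TB.]
Notice that the update anomaly is now impossible: we can only update dept cities, not employees.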

28 Third Normal Form (3NF)
A relation schema R is in 3NF if, for all FDs X → A, at least one of the following holds:
A ∈ X (it’s a trivial FD)
X is a superkey
A is part of a key of R
Recall: a key is a minimal set of attributes; it is not enough for A to be part of a superkey.
Finding all keys of R is a problem: we need to study the relation to discover all candidate keys.

29 Boyce-Codd Normal Form (BCNF)
A relation schema R is in BCNF if, for all FDs X → A, at least one of the following holds:
A ∈ X (it’s a trivial FD)
X is a superkey
More restrictive than 3NF.
No redundancy in R can be predicted using FDs alone.
For two tuples with the same X, either A is part of X and therefore already known, or X is a superkey and therefore there cannot be two tuples with the same X.

30 3NF and BCNF
Two keys: sin (primary key) and (name, dept)
(city, salary) → city: X → A where A ∈ X. Trivial FD.
(sin, city) → salary: X → A where X is a superkey. The key (sin) functionally determines everything.
city → dept: X → A where A is part of a key. OK for 3NF but not BCNF.
[Table (sin, name, dept, city, salary); the sin values and repeated cells did not survive in the transcript. Surviving cells in order: Bob, HR, TO, 50,000, Dave, 60,000, Lisa, NY, IT, TB, Jimmy, 70,000.]
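The 3NF and BCNF conditions can be checked mechanically once the FDs are written down. The Python sketch below does this for the example above, under the assumption that the FD set is: sin determines every other attribute, (name, dept) determines every other attribute, and city → dept. The function names are hypothetical, and the brute-force key search is only practical for small schemas. The check inspects only the FDs that are given, not the whole closure set.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for all minimal attribute sets that determine schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(schema):
                keys.append(combo)
    return keys


def violations(schema, fds, form="BCNF"):
    """List (X, A) pairs from fds that break the chosen normal form."""
    prime = {a for key in candidate_keys(schema, fds) for a in key}
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # X is a superkey
        for a in sorted(set(rhs) - set(lhs)):  # ignore the trivial part
            if form == "3NF" and a in prime:
                continue  # A is part of some key
            bad.append((lhs, a))
    return bad


schema = ("sin", "name", "dept", "city", "salary")
fds = [
    (("sin",), ("name", "dept", "city", "salary")),
    (("name", "dept"), ("sin", "city", "salary")),
    (("city",), ("dept",)),
]

print(violations(schema, fds, "BCNF"))
print(violations(schema, fds, "3NF"))
print(candidate_keys(schema, fds))
```

Under these assumed FDs, city → dept is reported as a BCNF violation but not as a 3NF violation, matching the slide. The key search also reports (city, name) as a key, which follows from city → dept together with the key (name, dept).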

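31 Normalization into BCNF
Given a relation schema R with FD set F, which is not BCNF:
BCNF is violated by some FD X → A; therefore A ∉ X and X is not a superkey.
Decompose into R1 = R – A and R2 = XA.
R1 does not have A: the violating FD is gone.
R2 has X as key: the FD no longer violates BCNF, and it is preserved.
If either R1 or R2 is not BCNF, repeat.
[Example tables; repeated cells did not survive in the transcript, so surviving values are listed in their original order.]
R (ID, dept, city), with dept → city. Surviving cells: 1, HR, TO, 2, IT, TB, 3.
R1 (ID, dept). Surviving cells: 1, HR, 2, IT, 3.
R2 (dept, city), with dept → city: (HR, TO) and (IT, TB).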
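The decomposition step R1 = R – A, R2 = XA can be sketched as a short recursive function. The version below is an assumption-laden illustration, not the course's algorithm: it finds violations by brute force over attribute subsets, picks the first one it meets, and does no backtracking, so it says nothing about dependency preservation. The function names are hypothetical, and the FD ID → {dept, city} is assumed (the slide only labels dept → city).

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, A) where X -> A is non-trivial and X is not a superkey of schema."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for x in combinations(attrs, size):
            reach = closure(x, fds) & set(attrs)  # what X determines inside schema
            extra = reach - set(x)
            if extra and reach != set(attrs):
                return set(x), min(extra)
    return None


def bcnf_decompose(schema, fds):
    """Split schema on violating FDs until every piece is in BCNF."""
    hit = find_violation(schema, fds)
    if hit is None:
        return [frozenset(schema)]
    x, a = hit
    r1 = set(schema) - {a}   # R1 = R - A
    r2 = x | {a}             # R2 = XA
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


fds = [(("ID",), ("dept", "city")), (("dept",), ("city",))]
for piece in bcnf_decompose({"ID", "dept", "city"}, fds):
    print(sorted(piece))
```

For the example, the only violation is dept → city, and the result is the pair of schemas (ID, dept) and (dept, city).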

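32 Normalization into BCNF
May not be dependency-preserving.
[Example tables; repeated cells did not survive in the transcript, so surviving values are listed in their original order.]
R (rating, dept, salary, city, interests), with {rating, dept} → salary and {city, interests} → dept. Surviving cells: 8, HR, 80,000, TO, Admin, 5, 50,000, IT, 90,000, TB, Tech.
Decomposing on {city, interests} → dept gives:
(dept, city, interests), with {city, interests} → dept: (HR, TO, Admin) and (IT, TB, Tech).
(rating, salary, city, interests), with no dependencies. Surviving cells: 8, 80,000, TO, Admin, 5, 50,000, 90,000, TB, Tech.
The problem won’t occur if we start with {rating, dept} → salary instead.
The order in which we pick the dependencies and decompositions matters!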

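33 Normalization into BCNF
May not be possible.
R (rating, dept, city), with {rating, dept} → city and city → dept.
[Example table; repeated cells did not survive in the transcript. Surviving cells in order: 5, HR, TO, IT, TB, 8, NY, LA.]
Option 1: decompose on {rating, dept} → city. R1: {rating, dept}, R2: {rating, dept, city}. But R2 = R.
Option 2: decompose on city → dept. R1: {rating, city}, R2: {dept, city}. We lost a dependency.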

34 Normalization into BCNF
Requires a backtracking tree-search algorithm.
Root: R with the list of all BCNF-violating dependencies.
Pick a BCNF-violating dependency and decompose the tables.
If the decomposition is not dependency-preserving or does not lead to BCNF tables, backtrack and try a different dependency.
End when:
All tables are BCNF (success)
The tree has been completely searched (impossible)

35 Normalization into 3NF
Notice: the BCNF algorithm is the same as the one used for 2NF.
It works for 3NF as well.
It has the same flaw: decomposition does not guarantee dependency preservation.
Simply flag 2NF and 3NF decompositions during the search, and return them if the search fails to find BCNF.

36 Practice Exercise: 19.7

37 Practice Exercise: 19.7

38 Practice Exercise: 19.7

39 Other Dependencies
FDs are the most common type of dependency.
They are the easiest and most useful to guide schema refinement.
Others include:
Multivalued Dependencies
Join Dependencies

40 Multivalued Dependencies
[Table (Course, Prof, Textbook); repeated cells did not survive in the transcript. Surviving cells in order: Physics, Green, Mechanics, Optics, Brown, Math, Vectors, Geometry.]
This schema is BCNF.
Yet there is redundancy:
Each physics textbook is stored twice.
Each physics prof is stored twice.
Each math prof is stored thrice.
The problem is that prof and textbook are independent, and that’s not an FD.
It is a multivalued dependency: Course →→ Prof.
Since prof and book are independent, they should not be in a single ternary relationship.

41 Multivalued Dependencies
“If there is a tuple t1 showing a course X being taught by prof Y, and a tuple t2 showing course X using textbook Z, then there is a tuple t3 showing course X being taught by prof Y using textbook Z.”
X →→ Y: each value of X is associated with a set of values of Y, independently of the other attributes.
The MVD X →→ Y over relation R holds if, for all pairs of tuples (t1, t2) and given Z = R – XY: if t1.X = t2.X, then there exists a t3 such that t1.XY = t3.XY and t2.Z = t3.Z.
[Table (Course, Prof, Textbook) with rows labelled t1, t3 and t2; repeated cells did not survive in the transcript. Surviving cells in order: t1, Physics, Green, Mechanics, t3, Optics, Brown, t2, Math, Vectors, Geometry.]
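The MVD definition translates almost word for word into a check over pairs of tuples. The Python sketch below builds the required t3 for every pair (t1, t2) that agrees on X and verifies that it is present. The function name mvd_holds is hypothetical, the rows are illustrative Physics tuples rather than the full slide table, and the check assumes a non-empty instance. As with FDs, it can only refute an MVD on one instance.

```python
def mvd_holds(rows, x, y):
    """Return True if X ->> Y is not violated by this instance (list of dicts)."""
    attrs = list(rows[0].keys())
    z = [a for a in attrs if a not in x and a not in y]  # Z = R - XY
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes X and Y from t1, and Z from t2.
                t3 = {**{a: t2[a] for a in z}, **{a: t1[a] for a in x + y}}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


# Illustrative instance: two profs and two textbooks for one course.
courses = [
    {"course": "Physics", "prof": "Green", "textbook": "Mechanics"},
    {"course": "Physics", "prof": "Green", "textbook": "Optics"},
    {"course": "Physics", "prof": "Brown", "textbook": "Mechanics"},
    {"course": "Physics", "prof": "Brown", "textbook": "Optics"},
]

print(mvd_holds(courses, ["course"], ["prof"]))
print(mvd_holds(courses[:-1], ["course"], ["prof"]))
```

With all four combinations present, course →→ prof is not violated. Dropping the last row removes the (Physics, Brown, Optics) tuple that the definition requires, so the check fails.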

42 Fourth Normal Form
More strict version of BCNF.
Recall, in BCNF, for all FDs X → A in relation R:
A ∈ X (it’s a trivial FD), or
X is a superkey
We add: for all MVDs X →→ Y:
Y ⊆ X or XY = R (it’s a trivial MVD), or
X is a superkey

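43 Fourth Normal Form
The (Course, Prof, Textbook) table is not in 4NF: Course →→ Prof is non-trivial and Course is not a key.
Decompose into CourseProf and CourseTextbook. Both are 4NF.
[Tables; repeated cells did not survive in the transcript, so surviving values are listed in their original order.]
(Course, Prof, Textbook). Surviving cells: Physics, Green, Mechanics, Optics, Brown, Math, Vectors, Geometry.
CourseProf (Course, Prof). Surviving cells: Physics, Green, Brown, Math.
CourseTextbook (Course, Textbook). Surviving cells: Physics, Mechanics, Optics, Math, Vectors, Geometry.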

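44 Join Dependencies
If a decomposition of R into R1, …, Rn is a lossless-join decomposition, then the join dependency ⋈{R1, …, Rn} holds.
Generalization of MVDs: X →→ Y in R can be expressed as ⋈{XY, X(R – Y)}.
Example: CP and CT are a lossless-join decomposition of CPT, which corresponds to C →→ P.
[Tables; repeated cells did not survive in the transcript. CT (Course, Textbook), surviving cells in order: Physics, Mechanics, Optics, Math, Vectors, Geometry. CP (Course, Prof), surviving cells in order: Physics, Green, Brown, Math.]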

45 Fifth Normal Form
More strict version of 4NF.
Add that, for all JDs over R1, …, Rn, one of the following holds:
Ri = R for some i: the entire relation is one of the components of the decomposition (it’s a trivial JD).
The JD is obtained from a decomposition using FDs where the left side is a key of R: the join is done on key attributes.

46 Schema Refinement
An important step in DB design is building an ER model.
It is a complex and subjective process: different ER models are possible.
Constraints and dependencies are not clearly expressed in ER models.
Normalisation helps improve the ER model.
A higher NF usually leads to a better DB design.

47 Constraints on Entity Set
[ER diagram: entity set Employees with attributes ssn, name, salary, lot, rating.]
Two FDs:
{ssn} → {ssn, name, lot, rating, salary}
{rating} → {salary}
But the second one doesn’t appear in the ER model!

48 Constraints on Relation Set
[ER diagram: Departments, Suppliers and Parts linked by Contract; labelled attributes: sid, did, cid, quantity, pid.]
Contract C says that department D buys a quantity Q of part P from supplier S.
Company policy: a department can only get one part from any given supplier.
This gives the dependency DS → P.
Split into DSP and CQSD.
This is not an intuitive relation in the ER model.

49 Schema Refinement
[ER diagram: Employees and Departments linked by Works_In; labelled attributes: dname, budget, did, since, name, ssn.]
Each employee is assigned to a parking lot.
Suppose parking lots are assigned by department: did → lot.
Redundancy problems!
Redundant storage
Update anomalies
Insertion anomalies
Deletion anomalies

50 Schema Refinement
Solution: decompose {did, lot} into a separate relation.
Notice: this relation has the same key as Departments.
We can further refine by making lot an attribute of Departments.

51 Schema Refinement
[ER diagram: Employees and Departments linked by Works_In; labelled attributes: lot, dname, budget, did, since, name, ssn.]
Solves redundancy issues:
No redundant storage: lot is stored once per did.
We can update a did’s lot by updating a single value.
We can insert new employees without associating them to a lot.
A department with no employees still has its associated lot.

52 Summary
Redundancy is evil: redundant storage, update anomalies, insertion anomalies, deletion anomalies.
We can decompose a large redundant schema into a set of smaller, non-redundant schemas.
Functional dependencies can guide this decomposition, using the closure set F+ of a set of FDs F.
A good decomposition has some properties:
Lossless-join
Dependency-preserving
Does not make the system workload inefficient

53 Summary
We can reduce problems and improve ER design using normal forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)

54 Summary
We then learned how to apply these notions to refine ER diagrams and relation schemas.
[ER diagram: Employees and Departments linked by Works_In; labelled attributes: lot, dname, budget, did, since, name, ssn.]

55 Exercises 19.2 19.3 19.5 19.6 19.7 19.8 19.10 19.13a & 4

