1
Database Systems Relational Database Design Functional Dependencies, Normalisation Theory
Gergely Lukács Pázmány Péter Catholic University Faculty of Information Technology Budapest, Hungary
2
Overview
Redundancy: insert, update, delete anomalies
Goals of normalisation
Keys: super key, candidate key, primary key
Functional dependencies
1NF, 2NF, 3NF, BCNF
Multivalued dependencies, 4NF
3
Example: Repetition of information
Data for building and budget are repeated for each dept_name.
Problems:
Redundancy: complicates updating, introducing the possibility of inconsistency; wastes space.
Null values: cannot store information about a department if no instructor exists. Null values can be used, but they are difficult to handle.
4
Update, Insertion, Deletion anomalies
Uncontrolled redundancy leads to:
Update anomaly: if one instance is updated, then all other occurrences have to be updated to the same new value (otherwise: inconsistency).
Insertion anomaly: it is not possible to store some information unless other information is stored as well.
Deletion anomaly: it is not possible to delete some pieces of information without losing other pieces of information.
5
Features of a good design
Free the database of modification anomalies.
Make the database schema more informative to users.
Minimize redesign when extending the database schema.
Avoid bias towards any particular pattern of querying.
6
The role of normalisation
Theoretical approach: starting with one relation, containing all attributes (universal relation).
Some points in the design:
Getting information on the miniworld
E/R model
Relational model
Normalisation
Redesign, reverse-engineering
8
Decomposition
A decomposition of R = (A1, A2, ..., An) is a set of relations R1, ..., Rk:
R1(A11, A12, ..., A1i), R2(A21, A22, ..., A2j), ..., Rk(Ak1, Ak2, ..., Akn)
such that the following 2 properties hold:
1. ∪ Anm = {A1, A2, ..., An}
2. an instance of Rk is rk = π_{Ak1, Ak2, ..., Akn}(r), where r is an instance of R (π = projection)
9
Decomposition All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R1 ∪ R2
10
Example: Lossy Decomposition
11
Definition of lossless decomposition
Theorem: if relations R1, ..., Rk form a decomposition of R, then r ⊆ r1 ⋈ r2 ⋈ ... ⋈ rk (⋈ = natural join).
Definition: if relations R1, ..., Rk form a decomposition of R, then it is said to be a lossless decomposition if r = r1 ⋈ r2 ⋈ ... ⋈ rk.
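The theorem and the definition above can be tried out on a small instance. The following is a minimal sketch, not a general tool: the relation, its attribute names (id, name, city) and its rows are made up for illustration, and the helper names project, natural_join and is_lossless are hypothetical.

```python
def project(rows, attrs):
    """Projection onto attrs; rows are dicts, duplicates are dropped."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Natural join of two relations given as sets of frozenset((attr, value))."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            # Tuples combine only if they agree on all shared attributes.
            if all(d1[a] == d2[a] for a in common):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the instance."""
    r = project(rows, list(rows[0].keys()))
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == r


# Hypothetical instance: two different people who share a name.
rows = [
    {"id": 1, "name": "Kim", "city": "Budapest"},
    {"id": 2, "name": "Kim", "city": "Szeged"},
]

# Splitting on name is lossy: the join produces spurious tuples.
print(is_lossless(rows, ["id", "name"], ["name", "city"]))  # False
# Splitting on id is lossless on this instance.
print(is_lossless(rows, ["id", "name"], ["id", "city"]))    # True
```

In the lossy case the join contains more tuples than r, which is exactly the r ⊆ r1 ⋈ r2 situation of the theorem with strict inclusion.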
12
Functional Dependencies
A functional dependency is a constraint: the value of a certain set of attributes uniquely determines the value of another set of attributes. It is a generalization of the notion of a key.
13
Functional Dependencies
Let R be a relation schema, α ⊆ R and β ⊆ R.
The functional dependency α → β holds on R if and only if for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is, t1[α] = t2[α] ⇒ t1[β] = t2[β].
Example (with one attribute on each side): consider r(A, B) with the following instance of r. [instance table not recoverable from the slide text] On this instance, A → B does NOT hold, but B → A does hold.
Functional dependencies are related to the semantics of the relationships, not to the particular data in the tables!
A dependency X⟶A is full if the dependency fails for every proper subset X' of X; the dependency is partial otherwise, i.e. if there is a proper subset X' of X such that X'⟶A.
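The definition translates directly into a check on one instance. The sketch below uses a hypothetical helper fd_holds and a made-up instance of r(A, B), chosen only so that A → B fails and B → A holds. Such a check can refute a dependency on given data, but it cannot establish it, because a dependency is a statement about all legal instances.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds on this instance (rows are dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical instance of r(A, B), for illustration only.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

print(fd_holds(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(fd_holds(r, ["B"], ["A"]))  # True: every B value appears with one A value
```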
14
Armstrong's axioms
F1: reflexivity: if Y ⊆ X then X → Y
F2: augmentation: if X → Y then XZ → YZ
F3: transitivity: if X → Y and Y → Z then X → Z
Additional rules derived from axioms
Union: if A → B and A → C, then A → BC
Decomposition: if A → BC, then A → B and A → C
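In practice, the rules above are usually applied through the attribute closure X+, the set of all attributes determined by X. The slides do not show this algorithm; the following is a sketch of the standard fixpoint computation, with hypothetical function names and the simplifying assumption that every attribute is a single letter, so a string such as "AB" stands for the set {A, B}.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs.

    Assumption: attributes are single characters, so "AB" means {A, B}.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


fds = [("A", "B"), ("B", "C")]      # A -> B, B -> C

print(sorted(closure("A", fds)))    # ['A', 'B', 'C']
print(implies(fds, "A", "C"))       # True, by transitivity
print(implies(fds, "A", "BC"))      # True, by the union rule
print(implies(fds, "C", "A"))       # False
```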
15
Super/Candidate/Primary Keys
K is a superkey for relation schema R if and only if K → R.
K is a candidate key for R if and only if K → R, and for no α ⊂ K, α → R.
Primary key: one selected candidate key.
Prime attribute: an attribute that belongs to some candidate key.
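These definitions can be turned into a brute-force search: K is a superkey if its closure is all of R, and a candidate key if no smaller key is contained in it. The sketch below is exponential in the number of attributes and only meant for small examples. The schema R(A, B, C, D) and its dependencies are made up, the function names are hypothetical, and attributes are assumed to be single letters.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; attributes are assumed to be single characters."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    # Sizes are tried in increasing order, so every key found is minimal.
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue  # contains a smaller key: superkey, not a candidate key
            if closure(k, fds) == schema:
                keys.append(k)
    return keys


# Hypothetical schema R(A, B, C, D) with AB -> C, C -> D, D -> A.
fds = [("AB", "C"), ("C", "D"), ("D", "A")]
keys = candidate_keys("ABCD", fds)
prime = set().union(*keys)

print([sorted(k) for k in keys])  # [['A', 'B'], ['B', 'C'], ['B', 'D']]
print(sorted(prime))              # ['A', 'B', 'C', 'D']
```

In this example every attribute is prime, because each one belongs to at least one of the three candidate keys.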
16
First normal form
First normal form (1NF) is now considered part of the formal definition of the relational model.
A relational schema R is in first normal form if the domains of all attributes of R are atomic (indivisible) and the value of any attribute in a tuple is a single value from the domain.
Example of non-atomic domains: composite attributes.
Non-atomic values complicate storage and encourage redundant (repeated) storage of data. Example: a set of accounts stored with each customer, and a set of owners stored with each account.
NOTE: Object-relational databases (used e.g. for geographic or XML databases) have moved away from this restriction.
17
Second normal form (2NF)
No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key.
If K represents the set of attributes making up a candidate key, then every nonprime attribute A (that is, an attribute that is not a member of any candidate key) is functionally dependent on K (i.e. K⟶A), but this fails for any proper subset of K (no proper subset of K functionally determines A).
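The 2NF condition can be checked mechanically by looking for partial dependencies. The sketch below assumes the candidate keys are already known and passed in by hand. The schema R(S, C, G, N), standing for student, course, grade and student name, is a made-up example, the function names are hypothetical, and attributes are assumed to be single letters.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; attributes are assumed to be single characters."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(schema, keys, fds):
    """(subset, attribute) pairs where a proper subset of a candidate key
    determines a nonprime attribute, i.e. the 2NF violations."""
    prime = set().union(*keys)
    nonprime = set(schema) - prime
    found = []
    for key in keys:
        key = sorted(key)
        for size in range(1, len(key)):          # proper, non-empty subsets
            for subset in combinations(key, size):
                for a in sorted(closure(subset, fds) & nonprime):
                    found.append((set(subset), a))
    return found


# Hypothetical example: SC -> G and S -> N, with candidate key {S, C}.
fds = [("SC", "G"), ("S", "N")]
print(partial_dependencies("SCGN", [{"S", "C"}], fds))  # [({'S'}, 'N')]

# After splitting off R2(S, N), the remaining R1(S, C, G) has no violation.
print(partial_dependencies("SCG", [{"S", "C"}], [("SC", "G")]))  # []
```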
19
Third normal form (3NF)
The relation is in 2NF and there is no nontrivial dependency X⟶A for a nonprime attribute A and an attribute set X that does not contain a key (i.e. X is not a superkey). In other words, if X⟶A holds for some nonprime A that is not in X, then X must be a superkey.
For comparison, 2NF says that if X⟶A for nonprime A, then X cannot be a proper subset of any key, but X can still overlap with a key or be disjoint from a key.
22
Boyce-Codd Normal Form (BCNF)
BCNF requires that whenever there is a nontrivial functional dependency X⟶A, then X is a superkey. It differs from 3NF in that 3NF requires either that X be a superkey or that A be prime (a member of some key). BCNF bans all nontrivial nonsuperkey dependencies X⟶A; 3NF makes an exception if A is prime.
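The difference between the two normal forms can be made concrete with a small checker that classifies each given dependency. This is a sketch under several assumptions: the candidate keys are supplied by hand, only the listed dependencies are examined, attributes are single letters, and the function names are hypothetical. The first example is the well-known street, city, zip schema R(S, C, Z), which is not taken from the slides.

```python
def closure(attrs, fds):
    """Attribute closure; attributes are assumed to be single characters."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations(schema, fds, keys):
    """Return (bcnf_violations, third_nf_violations) as lists of (X, A)."""
    schema = set(schema)
    prime = set().union(*keys)
    bcnf, third = [], []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):    # ignore the trivial part
            if closure(lhs, fds) == schema:
                continue                         # X is a superkey: always allowed
            bcnf.append((lhs, a))                # BCNF bans every other case
            if a not in prime:
                third.append((lhs, a))           # 3NF only bans nonprime A
    return bcnf, third


# SC -> Z and Z -> C; candidate keys are {S, C} and {S, Z}.
print(violations("SCZ", [("SC", "Z"), ("Z", "C")], [{"S", "C"}, {"S", "Z"}]))
# ([('Z', 'C')], [])  -> in 3NF, but not in BCNF

# A -> B and B -> C; candidate key is {A}.
print(violations("ABC", [("A", "B"), ("B", "C")], [{"A"}]))
# ([('B', 'C')], [('B', 'C')])  -> violates both
```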
24
“I swear to construct my tables so that all nonkey columns are dependent on the key, the whole key and nothing but the key, so help me Codd.”
25
(BCNF and Dependency Preservation)
Constraints, including functional dependencies, are costly to check in practice unless they pertain to only one relation.
If it is sufficient to test only those dependencies on each individual relation of a decomposition in order to ensure that all functional dependencies hold, then that decomposition is dependency preserving.
It is not always possible to achieve both BCNF and dependency preservation.
26
(BCNF cont.)
All databases enforce primary-key constraints. One could use a CHECK constraint to enforce the lost dependency FD2, but this is often a lost cause.
    CHECK (NOT EXISTS (
        SELECT ay.county, ax.lot_num, ax.property_ID, ax2.property_ID
        FROM LOTS1AX ax, LOTS1AX ax2, LOTS1AY ay
        WHERE ax.area = ay.area AND ax2.area = ay.area   -- join condition
          AND ax.lot_num = ax2.lot_num
          AND ax.property_ID <> ax2.property_ID))
We might be better off ignoring FD5 here, and just allowing for the possibility that area does not determine county, or determines it only "by accident".
Generally, it is good practice to normalize to 3NF, but it is sometimes not possible to achieve BCNF while preserving all dependencies.
27
Multivalued Dependencies
There are database schemas in BCNF that do not seem to be sufficiently normalized.
Consider a database classes(course, teacher, book) such that (c, t, b) ∈ classes means that t is qualified to teach c, and b is a required textbook for c.
The database is supposed to list for each course the set of teachers any one of which can be the course's instructor, and the set of books, all of which are required for the course (no matter who teaches it).
28
Multivalued Dependencies (Cont.)
There are no non-trivial functional dependencies and therefore the relation is in BCNF.
Insertion anomalies: e.g., if Sara is a new teacher that can teach database, two tuples need to be inserted:
(database, Sara, DB Concepts)
(database, Sara, Ullman)
classes (column values as listed on the slide):
course: database, operating systems
teacher: Avi, Hank, Sudarshan, Jim
book: DB Concepts, Ullman, OS Concepts, Shaw
29
Multivalued Dependencies (Cont.)
Therefore, it is better to decompose classes into:
teaches (column values as listed on the slide):
course: database, operating systems
teacher: Avi, Hank, Sudarshan, Jim
text (column values as listed on the slide):
course: database, operating systems
book: DB Concepts, Ullman, OS Concepts, Shaw
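The effect of this decomposition on the insertion anomaly can be sketched as follows. The pairing of teachers and books with courses is an assumption made for illustration (database: Avi, Hank, Sudarshan with DB Concepts and Ullman; operating systems: Jim with OS Concepts and Shaw), and join_on_course is a hypothetical helper.

```python
# Assumed contents of the two decomposed relations (row pairing is illustrative).
teaches = {
    ("database", "Avi"), ("database", "Hank"), ("database", "Sudarshan"),
    ("operating systems", "Jim"),
}
text = {
    ("database", "DB Concepts"), ("database", "Ullman"),
    ("operating systems", "OS Concepts"), ("operating systems", "Shaw"),
}


def join_on_course(teaches, text):
    """Natural join of teaches(course, teacher) and text(course, book)."""
    return {(c, t, b) for (c, t) in teaches for (c2, b) in text if c == c2}


classes = join_on_course(teaches, text)
print(len(classes))  # 8 tuples: 3 teachers x 2 books + 1 teacher x 2 books

# In the decomposed schema, adding Sara is a single insert into teaches ...
teaches_with_sara = teaches | {("database", "Sara")}

# ... while the combined classes relation gains one tuple per required book.
new_rows = join_on_course(teaches_with_sara, text) - classes
print(sorted(new_rows))
# [('database', 'Sara', 'DB Concepts'), ('database', 'Sara', 'Ullman')]
```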
30
Multivalued Dependencies (MVDs)
Multivalued dependency: a constraint between two sets of attributes in a relation (in fact: 3 sets of attributes).
A multivalued dependency requires that certain tuples be present in a relation.
Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency α →→ β holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R – β] = t2[R – β]
t4[β] = t2[β]
t4[R – β] = t1[R – β]
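The definition can be checked on a concrete instance by looking for the required tuple t3 for every ordered pair (t1, t2); the tuple t4 is covered when the pair is visited in the opposite order. The sketch below assumes that α and β are disjoint, uses a hypothetical helper mvd_holds, and builds the classes instance from an assumed pairing of teachers and books with courses.

```python
def mvd_holds(rows, schema, alpha, beta):
    """Check alpha ->> beta on an instance; rows are tuples ordered like schema.

    Assumption: alpha and beta are disjoint lists of attribute names.
    """
    idx = {a: i for i, a in enumerate(schema)}
    rest = [a for a in schema if a not in alpha and a not in beta]

    def pick(row, attrs):
        return tuple(row[idx[a]] for a in attrs)

    rowset = {tuple(r) for r in rows}
    present = {(pick(r, alpha), pick(r, beta), pick(r, rest)) for r in rowset}
    for t1 in rowset:
        for t2 in rowset:
            if pick(t1, alpha) == pick(t2, alpha):
                # t3 agrees with t1 on beta and with t2 on the remaining attributes.
                t3 = (pick(t1, alpha), pick(t1, beta), pick(t2, rest))
                if t3 not in present:
                    return False
    return True


schema = ["course", "teacher", "book"]
# Assumed instance: every teacher of a course combined with every book of it.
offers = [
    ("database", ["Avi", "Hank", "Sudarshan"], ["DB Concepts", "Ullman"]),
    ("operating systems", ["Jim"], ["OS Concepts", "Shaw"]),
]
classes = [(c, t, b) for c, teachers, books in offers for t in teachers for b in books]

print(mvd_holds(classes, schema, ["course"], ["teacher"]))  # True
print(mvd_holds(classes, schema, ["course"], ["book"]))     # True

# Dropping one combination breaks the dependency.
broken = [row for row in classes if row != ("database", "Sudarshan", "Ullman")]
print(mvd_holds(broken, schema, ["course"], ["teacher"]))   # False
```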
31
MVD (Cont.) Tabular representation of α →→ β
32
Example (Cont.)
In our example: course →→ teacher and course →→ book.
The above formal definition is supposed to formalize the notion that a particular value of Y (course) has associated with it a set of values of Z (teacher) and a set of values of W (book), and that these two sets are in some sense independent of each other.
Note: if Y → Z then Y →→ Z.
33
4NF
4NF: avoiding multivalued dependencies (more precisely, nontrivial multivalued dependencies whose left-hand side is not a superkey).
34
YouTube (good, somewhat too detailed)
Functional Dependencies: Part 1, Part 2
BCNF: Part 1, Part 2
Multivalued Dependencies: Part 1, Part 2