Chapter 3: Design theory for relational Databases

Whittney Schwarz | Trevor Russ

INTRODUCTION
There are many ways to design a relational database schema for an application. Whatever approach is chosen, the initial schema usually has room for improvement, especially by eliminating redundancy. Often the problems come from trying to combine too much into one relation. The theory of “dependencies” is well developed for relational databases. It is the basis for deciding what makes a good relational database schema and what we can do about a schema that has flaws.

Overview
3.1: Functional Dependencies (a generalization of the idea of a key for a relation)
3.2: Rules about Functional Dependencies
3.3: Design of Relational Database Schemas
3.4: Decomposition: The Good, Bad, and Ugly
3.5: Third Normal Form

3.1: Functional dependencies
Definition: A functional dependency (FD) is a statement that two tuples of a relation that agree on some particular set of attributes must also agree on some other particular set of attributes.
A functional dependency on a relation R is a statement of the form “If two tuples of R agree on all of the attributes A1, A2, …, An, then they must also agree on all of another list of attributes B1, B2, …, Bm.” We then say that “A1, A2, …, An functionally determine B1, B2, …, Bm”.
Formally: A1, A2, …, An -> B1, B2, …, Bm

Functional Dependencies are used to determine keys.
Reminder: a key is a constraint on a relation that specifies uniqueness.
Relation: Cars(Make, Model, Color, Year)
Values shown in the table:
Make: Kia, Honda, Toyota
Model: Optima, Civic, Camry, Forte
Color: Black, Blue, Red
Year: 2008, 2012, 2015
Functional Dependency: Model -> Make
NOT Functionally Dependent: Make -> Model, Make -> Color, Make -> Year
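The check behind "agree on the left, so must agree on the right" can be sketched in a few lines of Python. This is only an illustration: the function name fd_holds and the sample rows are assumptions, and the way values are paired into rows is made up for the example rather than taken from the slide's table.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first right-side value seen for each left-side value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical Cars rows (the row pairing is an assumption for illustration).
cars = [
    {"Make": "Kia", "Model": "Optima", "Color": "Black", "Year": 2008},
    {"Make": "Kia", "Model": "Forte", "Color": "Red", "Year": 2015},
    {"Make": "Honda", "Model": "Civic", "Color": "Blue", "Year": 2012},
    {"Make": "Toyota", "Model": "Camry", "Color": "Black", "Year": 2015},
]

print(fd_holds(cars, ["Model"], ["Make"]))  # True
print(fd_holds(cars, ["Make"], ["Model"]))  # False
```

Note that a single instance can only refute an FD. Passing this check does not prove that the FD holds for every legal instance of the relation.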

Superkeys
A set of attributes that contains a key is called a superkey, short for “superset of a key”. Every key is a superkey, but some superkeys are not (minimal) keys. A superkey still identifies each tuple uniquely; it may simply carry extra attributes that are not needed for uniqueness, so it is not necessarily minimal.

Example
We have the following key: {artist, albumName, year}
A superkey could be: {artist, age, albumName, songName, year}
The superkey contains the key plus extra attributes, and it still identifies the same unique tuple.

3.2: Rules About functional dependencies
This section introduces several useful rules about Functional Dependencies. In general, these rules let us replace one set of FD’s with an equivalent set, or add to a set the FD’s that follow from the original set.

Splitting/combining rule
Consider the following Functional Dependency:
artist albumName year -> songName age
This is equivalent to:
artist albumName year -> songName
artist albumName year -> age
This rule ONLY applies to the right side. The left side cannot be split: artist albumName year -> songName does not imply that artist, albumName, or year alone determines songName.

Trivial functional dependencies
A constraint is trivial if it holds for every instance of the relation, regardless of other constraints.
A1, A2, …, An -> B1, B2, …, Bm is trivial when {B1, B2, …, Bm} ⊆ {A1, A2, …, An}
Simply stated: a trivial Functional Dependency has a right side that is a subset of the left side.
Examples:
make model -> make
make -> make
Both are trivial FD’s.

Trivial dependency rule
When some, but not all, of the attributes on the right side of a Functional Dependency also appear on the left, the FD can be simplified by removing those attributes from the right side.
Attribute1 Attribute2 -> Attribute1 Attribute2 Attribute3
simplifies to
Attribute1 Attribute2 -> Attribute3

Closure of attributes
Starting with the given set of attributes, we repeatedly expand the set by adding the right side of an FD as soon as the set includes its left side. Eventually the set cannot be expanded any further, and the resulting set is the closure. The closure of an attribute set R is denoted {R}+.
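The expansion loop described above can be sketched as follows. This is a minimal sketch, assuming FDs are stored as pairs of attribute sets; the helper name fd and the choice of sample FDs (borrowed from the social/lastName/firstName example used elsewhere in this chapter) are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from two space-separated attribute lists (helper for the example)."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Expand attrs with the right side of every FD whose left side is already included."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [fd("social", "lastName"), fd("lastName", "firstName")]
print(sorted(closure({"social"}, fds)))  # ['firstName', 'lastName', 'social']
```

A set of attributes is a superkey exactly when its closure contains every attribute of the relation, so the same function can be reused to test candidate keys.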

Transitive rule
If A -> B and B -> C, then A -> C
Example:
social -> lastName
lastName -> firstName
The transitive rule allows us to combine the two Functional Dependencies to get a new FD:
social -> firstName

Closing Sets of Functional Dependencies:
Given a set of FD’s S, any set of FD’s equivalent to S is said to be a basis for S.
FD’s with singleton right sides are easier to work with; if need be, we can apply the splitting rule to make the right sides singletons.
A minimal basis for a relation is a basis B that satisfies three conditions:
- All the FD’s in B have singleton right sides.
- If any FD is removed from B, the result is no longer a basis.
- If for any FD in B we remove one or more attributes from its left side, the result is no longer a basis.
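One common way to compute a minimal basis is sketched below: split the right sides, shrink the left sides, then drop FDs that follow from the rest. This is an illustrative sketch and not a procedure stated on the slide; the function names and the sample FDs over attributes A, B, C are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: keep adding right sides whose left side is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_basis(fds: List[FD]) -> List[FD]:
    # Condition 1: split every FD so each right side is a single attribute.
    work = [[set(lhs), a] for lhs, rhs in fds for a in sorted(rhs)]

    def as_fds(items):
        return [(frozenset(l), frozenset([a])) for l, a in items]

    # Condition 3: drop left-side attributes that are not needed
    # (left sides are never reduced to the empty set in this sketch).
    for item in work:
        for attr in sorted(item[0]):
            reduced = item[0] - {attr}
            if reduced and item[1] in closure(reduced, as_fds(work)):
                item[0] = reduced

    # Condition 2: drop FDs that already follow from the remaining ones.
    kept = list(work)
    for item in work:
        rest = [x for x in kept if x is not item]
        if item[1] in closure(item[0], as_fds(rest)):
            kept = rest
    return as_fds(kept)


# Hypothetical input: A -> B, B -> C, A -> C, AB -> C
example = [
    (frozenset("A"), frozenset("B")),
    (frozenset("B"), frozenset("C")),
    (frozenset("A"), frozenset("C")),
    (frozenset("AB"), frozenset("C")),
]
basis = minimal_basis(example)
for lhs, rhs in basis:
    print(sorted(lhs), "->", sorted(rhs))
```

A relation can have more than one minimal basis; which one this sketch returns depends on the order in which attributes and FDs are examined.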

Armstrong’s axioms
A set of rules from which it is possible to derive any FD that follows from a given set.
Reflexivity: If {B1, B2, …, Bm} ⊆ {A1, A2, …, An}, then A1, A2, …, An -> B1, B2, …, Bm (these are the trivial FDs).
Augmentation: If A1, A2, …, An -> B1, B2, …, Bm, then A1, A2, …, An, C1, C2, …, Ck -> B1, B2, …, Bm, C1, C2, …, Ck for any set of attributes C1, C2, …, Ck. Since some C’s may also be A’s or B’s, we should eliminate duplicate attributes from the left side and do the same for the right side.
Transitivity: If A1, A2, …, An -> B1, B2, …, Bm and B1, B2, …, Bm -> C1, C2, …, Ck, then A1, A2, …, An -> C1, C2, …, Ck

Database Design*
Careless implementation of a relational database can carry severe downsides. As with most things in Computer Science, we want a plan of action to avoid these design issues. To understand the plan of action, we must first understand the problems.

Problems caused by poorly designed databases are known as anomalies.
Three major types:
- Redundancy anomalies – Occur when we have unnecessary repetitions of data.
- Update anomalies – Occur when information is changed in a single tuple but is not properly updated elsewhere.
- Deletion anomalies – If sets of values become empty we may lose other information as a side effect.

How do we solve anomalies?
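Decomposition – The process of breaking a relation down into a collection of smaller relations. Together, the smaller relations must cover every attribute of the original relation (A = B ∪ C, where A is the attribute set of the original relation and B and C are the attribute sets of the smaller ones), and combining them must give back the original relation.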

*Note the lack of previous anomalies

What is the goal of decomposition?
To replace a poorly designed relation with several well designed relations. There is a simple condition that ensures this, called Boyce-Codd Normal Form or BCNF. A relation is in BCNF only when the left side of every nontrivial FD that holds on it is a superkey.

Does this table follow BCNF?
To answer this we must first look at its Functional Dependencies:
title year -> length genre studioName
However, {title, year} is NOT a superkey, because it does not determine starName. The left side of this FD is not a superkey, so the table is not in BCNF.
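The BCNF condition can be checked mechanically with attribute closure. The sketch below is illustrative only: the function names are hypothetical, and the attribute set is assumed to be exactly the attributes named in the text (title, year, length, genre, studioName, starName).

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: keep adding right sides whose left side is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes: Iterable[str], fds: List[FD]) -> List[FD]:
    """Return the given nontrivial FDs whose left side is not a superkey."""
    everything = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != everything
    ]


# Assumed attribute set, taken from the attribute names mentioned in the text.
movie_attrs = {"title", "year", "length", "genre", "studioName", "starName"}
movie_fds = [
    (frozenset({"title", "year"}), frozenset({"length", "genre", "studioName"})),
]

violations = bcnf_violations(movie_attrs, movie_fds)
print(len(violations))  # 1
print("starName" in closure({"title", "year"}, movie_fds))  # False
```

The closure of {title, year} misses starName, so the FD is reported as a violation, which matches the reasoning on the slide.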

Repeatedly choosing proper decompositions creates sub relations with the following properties:
1) Each sub relation is in BCNF.
2) The data from the original undivided relation is still properly represented.
In general, we must continue applying the decomposition rule until every sub relation is in BCNF (the left side of every nontrivial FD of each table is a superkey).

Recovering information - Lossless Join
Lossless join – When joining the sub relations recreates exactly the original relation they were decomposed from. The join used must be the natural join (⋈). Reason: at any single step of the recursive BCNF decomposition, the relation being split is always equal to the natural join of its two projections. This means that if we perform projections onto our sub relations, we can join their tuples to recover every tuple of the original.

Given R{A,B,C} and its FD is B -> C.
R may be decomposed into R1{A,B} and R2{B,C}.
Assume a tuple t = (a,b,c) of R, where a, b, c are its components.
Projecting t onto R1 yields (a,b).
Projecting t onto R2 yields (b,c).
Joining (a,b) and (b,c) with the natural join (⋈) yields t again.
This means that regardless of what tuple t we start with, we can always join its projections to get the original tuple back.
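The project-then-join round trip can be tried on a small instance. The following is a sketch under stated assumptions: the helper names rel, project, and natural_join are hypothetical, and the three sample tuples are made up so that B -> C holds.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Row = FrozenSet[Tuple[str, object]]


def rel(rows: List[Dict[str, object]]) -> Set[Row]:
    """Store a relation as a set of rows so duplicates disappear."""
    return {frozenset(r.items()) for r in rows}


def project(relation: Set[Row], attrs: Iterable[str]) -> Set[Row]:
    """Keep only the listed attributes of every row."""
    keep = set(attrs)
    return {frozenset((a, v) for a, v in row if a in keep) for row in relation}


def natural_join(left: Set[Row], right: Set[Row]) -> Set[Row]:
    """Combine every pair of rows that agree on all shared attributes."""
    out = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


# Hypothetical instance of R(A, B, C) in which B -> C holds.
r = rel([
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 2, "C": 3},
    {"A": 5, "B": 6, "C": 7},
])
r1 = project(r, {"A", "B"})
r2 = project(r, {"B", "C"})

print(natural_join(r1, r2) == r)  # True
```

One instance passing this check does not prove the decomposition is lossless in general; that is what the FD B -> C guarantees.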

Tuple Facts
- Every tuple of R is guaranteed to be produced by the natural join of R’s projections.
- The natural join is both associative and commutative, meaning the projections can be joined in any order.
- The natural join equals R if and only if every tuple in the join is also in R.
- The natural join is the operation used to reconstruct a relation from its projections; no other operation is considered.

Chase test – An organized way to check whether a tuple “t” in the join of the sub relations can be proved to be a tuple of R. To do this test we draw a picture of everything we know, called a tableau. We then use the given set of FDs to try to prove that “t” really does exist in the base relation R.

Question: Why does a successful chase test mean the join is lossless? Because the chase itself is a proof that one of the tuples of R whose projections were joined must in fact be the tuple “t” we get from the join. The entire point of the chase is to check that every tuple produced by the join already exists in R.

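Given R(A,B,C,D) and its sub relations: S1 = {A,B}, S2 = {B,C}, and S3 = {C,D}.
Given B -> AD.
Step 1, draw the tableau, with one row per sub relation. An unnumbered letter marks an attribute that belongs to that sub relation; a numbered letter marks an unknown value:
A   B   C   D
a   b   c1  d1
a2  b   c   d2
a3  b3  c   d
Step 2, find rows whose unnumbered values match on B and apply our B -> AD rule. Rows 1 and 2 agree on B, so their A values must be equal (a2 becomes a) and their D values must be equal (d2 becomes d1).

Step 3, rewrite the tableau:
A   B   C   D
a   b   c1  d1
a   b   c   d1
a3  b3  c   d
Because we were given no C -> … rules we are done. No lossless join exists, why? Because no row is free of numbered values, the chase cannot prove that t = (a,b,c,d) is in R, so joining all of our projections can produce tuples that are not in R.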
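The chase can also be run mechanically. The sketch below is one possible encoding and is an assumption, not the slide's notation: each tableau cell holds 0 for an unnumbered symbol and the row number for a numbered one, and the function name chase_is_lossless is hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def chase_is_lossless(attributes: Sequence[str],
                      schemas: List[Set[str]],
                      fds: List[FD]) -> bool:
    """Chase test: True if some tableau row ends up with no numbered symbols."""
    # Cell value 0 is the unnumbered symbol; i + 1 is the symbol numbered for row i.
    tableau = [
        {a: (0 if a in schema else i + 1) for a in attributes}
        for i, schema in enumerate(schemas)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in tableau:
                for r2 in tableau:
                    if r1 is r2 or any(r1[a] != r2[a] for a in lhs):
                        continue
                    # The rows agree on the left side, so equate the right side.
                    for a in rhs:
                        if r1[a] != r2[a]:
                            old, new = max(r1[a], r2[a]), min(r1[a], r2[a])
                            for row in tableau:
                                if row[a] == old:
                                    row[a] = new
                            changed = True
    return any(all(v == 0 for v in row.values()) for row in tableau)


# The example from the slides: R(A,B,C,D), S1={A,B}, S2={B,C}, S3={C,D}, B -> AD.
lossy = chase_is_lossless(
    "ABCD",
    [{"A", "B"}, {"B", "C"}, {"C", "D"}],
    [(frozenset("B"), frozenset("AD"))],
)
# The earlier example: R(A,B,C), R1={A,B}, R2={B,C}, B -> C.
lossless = chase_is_lossless(
    "ABC",
    [{"A", "B"}, {"B", "C"}],
    [(frozenset("B"), frozenset("C"))],
)
print(lossy, lossless)  # False True
```

Using the smaller number when two symbols are equated means an unnumbered symbol always wins, which mirrors the usual hand convention.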

What we want decomposition to do
- Elimination of anomalies.
- Recovery of information.
- Preservation of dependencies – The FD’s projected onto the sub relations, taken together, still enforce all of the original FD’s when the sub relations are joined back into the original relation.
It is not always possible to get all 3 of these at once. BCNF guarantees only the first two. Third Normal Form guarantees only the last two.

Third Normal Form (3NF)
A table is in 3NF if, for every nontrivial FD X -> A that holds on it (A is not a subset of X), at least one of the following is true:
- X (the left side of our FD) is a superkey, OR
- The attributes on the right side of our FD that are not in X are all prime attributes.
Prime attribute – an attribute that is a member of some key.

The Synthesis Algorithm for 3NF
The goals of this algorithm are:
- The relations of the decomposition are all in 3NF.
- The decomposition has a lossless join.
- The decomposition preserves dependencies.
Steps of this algorithm:
1) Find a minimal basis for the set of functional dependencies that hold for R.
2) For each FD X -> A in the minimal basis, use XA as the schema of one of the relations in the decomposition.
3) If no relation schema from step 2 is a superkey for R, add another relation whose schema is a key for R.

How does this algorithm work?
We justify it by showing three things: that the lossless join and dependency preservation both hold, and that all relations are in 3NF.
- For the lossless join – We can use the same chase test from before.
- For the dependency preservation – Each FD of the minimal basis has all of its attributes in some relation of the decomposition.
- For 3NF – If we have to add a relation whose schema is a key, all attributes of that relation are prime, so no 3NF violation can occur in it.

Example
Original relation: Vendor(ID, Name, Acc_No, Bank_Code_No, Bank)
FDs:
ID -> Name, Acc_No, Bank_Code_No
Bank_Code_No -> Bank
Decomposition:
Vendor(ID, Name, Acc_No, Bank_Code_No)
Bank(Bank_Code_No, Bank)
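The synthesis steps can be sketched on the Vendor example. This is an illustration under assumptions: the input FDs are taken to already form a minimal basis, FDs that share a left side are merged into one schema (a common refinement that matches the two tables in the example), and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: keep adding right sides whose left side is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_key(attributes: Iterable[str], fds: List[FD]) -> Set[str]:
    """Shrink the full attribute set until no attribute can be dropped."""
    everything = set(attributes)
    key = set(everything)
    for a in sorted(everything):
        if closure(key - {a}, fds) == everything:
            key.discard(a)
    return key


def synthesize_3nf(attributes: Iterable[str], basis: List[FD]) -> List[FrozenSet[str]]:
    """3NF synthesis, assuming `basis` is already a minimal basis."""
    everything = set(attributes)
    # Step 2: one schema per left side (FDs with the same left side are merged).
    grouped = {}
    for lhs, rhs in basis:
        grouped.setdefault(lhs, set(lhs)).update(rhs)
    schemas = list(dict.fromkeys(frozenset(s) for s in grouped.values()))
    # Drop schemas that are strictly contained in another schema.
    schemas = [s for s in schemas if not any(s < t for t in schemas)]
    # Step 3: add a key if no schema is already a superkey for R.
    if not any(closure(s, basis) == everything for s in schemas):
        schemas.append(frozenset(find_key(everything, basis)))
    return schemas


vendor_attrs = {"ID", "Name", "Acc_No", "Bank_Code_No", "Bank"}
vendor_fds = [
    (frozenset({"ID"}), frozenset({"Name", "Acc_No", "Bank_Code_No"})),
    (frozenset({"Bank_Code_No"}), frozenset({"Bank"})),
]
result = synthesize_3nf(vendor_attrs, vendor_fds)
for schema in result:
    print(sorted(schema))
```

Here the schema built from ID already contains the key {ID}, so step 3 adds nothing and the result is the two tables shown in the example.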

Sources
Database Systems: The Complete Book, Second Edition.
