Chapter 3: Design theory for relational Databases


Chapter 3: Design theory for relational Databases Whittney Schwarz | Trevor Russ

INTRODUCTION There are many ways to go about designing a relational database schema for an application. Whatever approach is chosen, it is common for an initial relational schema to have room for improvement, especially by eliminating redundancy. Often, the problems with a schema come from trying to combine too much into one relation. The theory of “dependencies” is well developed for relational databases. It tells us what makes a good relational database schema, and what we can do about a schema that has flaws.

Overview
3.1: Functional Dependencies (a generalization of the idea of a key for a relation)
3.2: Rules about Functional Dependencies
3.3: Design of Relational Database Schemas
3.4: Decomposition: The Good, Bad, and Ugly
3.5: Third Normal Form

3.1: Functional dependencies
Definition: A functional dependency (FD) is a statement that two tuples of a relation that agree on some particular set of attributes must also agree on some other particular set of attributes.
A functional dependency on a relation R is a statement of the form “If two tuples of R agree on all of the attributes A1, A2, …, An, then they must agree on all of another list of attributes B1, B2, …, Bm”. We say that “A1, A2, …, An functionally determine B1, B2, …, Bm”.
Formally: A1, A2, …, An → B1, B2, …, Bm

Functional Dependencies are used to determine keys.
Reminder: a key is a constraint on a relation that specifies uniqueness.
Relation: Cars, with attributes Make, Model, Color, Year.
Values appearing in the table: Make: Kia, Honda, Toyota. Model: Optima, Civic, Camry, Forte. Color: Black, Blue, Red. Year: 2008, 2012, 2015.
Functional Dependency: Model → Make (each model belongs to exactly one make).
NOT Functional Dependencies: Make → Model, Make → Color, Make → Year (one make can appear with several models, colors, and years).
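The definition can be checked mechanically against a relation instance. The sketch below is only an illustration: the function name `fd_holds` and the sample rows are assumptions, not part of the slides, and only the Make and Model columns are used because the full table rows are not recoverable from the transcript. Note that an instance can only refute an FD. An FD that holds in one instance is not thereby proved to be a constraint of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance.

    rows: list of dicts mapping attribute name -> value.
    lhs, rhs: lists of attribute names.
    """
    seen = {}  # left-side value combination -> right-side values seen with it
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False  # two tuples agree on lhs but disagree on rhs
    return True


# Hypothetical sample rows, restricted to the Make and Model columns.
cars = [
    {"make": "Kia", "model": "Optima"},
    {"make": "Kia", "model": "Forte"},
    {"make": "Honda", "model": "Civic"},
    {"make": "Toyota", "model": "Camry"},
]

print(fd_holds(cars, ["model"], ["make"]))  # Model -> Make
print(fd_holds(cars, ["make"], ["model"]))  # Make -> Model
```

Here Model → Make holds, while Make → Model fails because Kia appears with two different models.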

Superkeys
A set of attributes that contains a key is called a superkey, short for “superset of key”. Every key is a superkey, but some superkeys are not (minimal) keys. A superkey contains a key and may also contain additional attributes. It still identifies each tuple uniquely, but it is not necessarily the smallest set of attributes that does so.

Example
We have the following key: {artist, albumName, year}
A superkey could be: {artist, age, albumName, songName, year}
The key is contained in the superkey. The extra attributes (age, songName) do not change which tuple is identified.

3.2: Rules about functional dependencies
This section introduces several useful rules about Functional Dependencies. In general, these rules let us replace one set of FD’s by an equivalent set, or add to a set of FD’s other FD’s that follow from the original set.

Splitting/combining rule
Consider the following Functional Dependency:
artist albumName year → songName age
This is equivalent to:
artist albumName year → songName
artist albumName year → age
This rule ONLY applies to the right side. You cannot split the left side: artist albumName year → songName does not imply artist → songName, because the full set of left-side attributes is what determines the right side.

Trivial functional dependencies
A constraint is trivial if it holds for every instance of the relation, regardless of other constraints.
A1, A2, …, An → B1, B2, …, Bm is trivial when {B1, B2, …, Bm} ⊆ {A1, A2, …, An}.
Simply stated: a trivial Functional Dependency has a right side that is a subset of the left side.
Examples:
make model → make
make → make
Both are trivial FD’s.

Trivial dependency rule
When some, but not all, of the attributes on the right side of a Functional Dependency are also on the left, the FD can be simplified by removing from the right side those attributes that appear on the left.
Attribute1 Attribute2 → Attribute1 Attribute2 Attribute3
simplifies to:
Attribute1 Attribute2 → Attribute3

Closure of attributes
Denoted: {R}+, where R here stands for the starting set of attributes.
Starting with the given set of attributes, we repeatedly expand the set by adding the right side of any FD whose left side is already entirely included. Eventually, the set cannot be expanded any further, and the resulting set is the closure.
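The expansion procedure described above can be sketched in a few lines. This is a minimal illustration, assuming FD's are represented as pairs of Python sets. The function name `closure` and the attribute names A through F are hypothetical.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs.

    fds: list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole left side is included, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Hypothetical FDs: A B -> C, C -> D, D E -> F
fds = [({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"D", "E"}, {"F"})]

print(sorted(closure({"A", "B"}, fds)))
```

Starting from {A, B}, the first FD adds C and the second adds D. The third FD never fires because E is not in the set, so the closure is {A, B, C, D}.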

Transitive rule If AB and BC, then AC social  lastName Example: social  lastName lastName  firstName The transitive rule allows us to combine the two Functional Dependencies to get a new FD: social  firstName

Closing Sets of Functional Dependencies
Given a set of FD’s S, any set of FD’s equivalent to S is said to be a basis for S. It is easier to work with singleton right sides, so, if need be, we can apply the splitting rule to make the right sides singletons.
A minimal basis for a relation is a basis B that satisfies three conditions:
1. All the FD’s in B have singleton right sides.
2. If any FD is removed from B, the result is no longer a basis.
3. If for any FD in B we remove one or more attributes from its left side, the result is no longer a basis.
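The three conditions suggest a procedure for computing a minimal basis: split right sides, shrink left sides, then drop FD's that follow from the rest. The sketch below is one common way to do this, not necessarily the procedure the slides had in mind. The names `closure` and `minimal_basis` and the attributes A, B, C are assumptions for illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_basis(fds):
    """Return a minimal basis as a list of (lhs_set, single_attribute)."""
    # Condition 1: split every right side into singletons.
    basis = [(set(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]

    def as_fds(items):
        return [(lhs, {a}) for lhs, a in items]

    # Condition 3: drop left-side attributes that are not needed.
    for i in range(len(basis)):
        lhs, a = basis[i]
        for x in sorted(lhs):
            if a in closure(lhs - {x}, as_fds(basis)):
                lhs = lhs - {x}
                basis[i] = (lhs, a)

    # Condition 2: drop FDs that follow from the remaining ones.
    i = 0
    while i < len(basis):
        lhs, a = basis[i]
        others = basis[:i] + basis[i + 1:]
        if a in closure(lhs, as_fds(others)):
            basis = others
        else:
            i += 1
    return basis


# Hypothetical FDs: A -> B, B -> C, A -> C, A B -> C
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"}), ({"A", "B"}, {"C"})]
print(minimal_basis(fds))
```

For these FD's, A B → C shrinks to B → C, and A → C is dropped because it follows from A → B and B → C by transitivity, leaving A → B and B → C.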

Armstrong’s axioms
A set of rules from which it is possible to derive any FD that follows from a given set.
Reflexivity: If {B1, B2, …, Bm} ⊆ {A1, A2, …, An}, then A1, A2, …, An → B1, B2, …, Bm (these are the trivial FD’s).
Augmentation: If A1, A2, …, An → B1, B2, …, Bm, then A1, A2, …, An, C1, C2, …, Ck → B1, B2, …, Bm, C1, C2, …, Ck for any set of attributes C1, C2, …, Ck. Since some C’s may also be A’s or B’s, we should eliminate duplicate attributes from the left side and do the same for the right side.
Transitivity: If A1, A2, …, An → B1, B2, …, Bm and B1, B2, …, Bm → C1, C2, …, Ck, then A1, A2, …, An → C1, C2, …, Ck.

Database Design* Careless implementation of a relational database can carry severe downsides. As with most things in Computer Science, we want a plan of action to avoid these design issues. To understand that plan of action, we must first understand the problems.

Problems caused by poorly designed databases are known as anomalies. There are three major types:
Redundancy anomalies – Occur when we have unnecessary repetitions of data.
Update anomalies – Occur when information is changed in a single tuple but is not properly updated elsewhere.
Deletion anomalies – If sets of values become empty we may lose other information as a side effect.

How do we solve anomalies? Decomposition – The process of breaking a relation down into a collection of smaller relations. The attributes of the smaller relations, taken together, must equal the attributes of the original relation (A = B ∪ C).

*Note the lack of previous anomalies

What is the goal of decomposition? To replace a poorly designed relation with several well designed relations. There is a simple condition that ensures this, called Boyce-Codd Normal Form or BCNF. A relation is in BCNF if and only if the left side of every nontrivial FD that holds in it is a superkey.

Does this table follow BCNF? To answer this we must first look at its Functional Dependencies:
title, year → length, genre, studioName
However, {title, year} is NOT a superkey, as it does not determine starName. The left side of a nontrivial FD is therefore not a superkey, so the table is not in BCNF.
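The BCNF condition can be tested with attribute closures: a left side is a superkey exactly when its closure contains every attribute of the relation. The following sketch applies this to the movie FD above. The function names `closure` and `bcnf_violations` are assumptions for illustration, and the attribute list is taken from the FD and the mention of starName in the text.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FDs never violate BCNF
        if not set(attributes) <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


movie_attrs = {"title", "year", "length", "genre", "studioName", "starName"}
movie_fds = [({"title", "year"}, {"length", "genre", "studioName"})]

print(bcnf_violations(movie_attrs, movie_fds))
```

The closure of {title, year} is missing starName, so the single FD is reported as a violation.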

Repeatedly choosing proper decompositions creates sub-relations with the following properties:
1) The sub-relations are in BCNF.
2) The data from the original undivided relation is still faithfully represented.
In general, we must continue applying the decomposition rule until every sub-relation is in BCNF (the left side of every nontrivial FD of each table is a superkey).
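The repeated decomposition can be sketched recursively. This illustration assumes the usual textbook split: for a violating left side X, one relation takes the closure X+ and the other takes X together with the attributes outside X+. It searches all attribute subsets for violations, which is exponential and only suitable for small examples. The function names are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_decompose(attrs, fds):
    """Split attrs into BCNF schemas; fds are the FDs of the original relation."""
    attrs = frozenset(attrs)
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            # Closure restricted to this sub-relation (projection of the FDs).
            x_plus = closure(x, fds) & attrs
            # Violation: x determines something new but is not a superkey.
            if x < x_plus < attrs:
                r1 = x_plus
                r2 = x | (attrs - x_plus)
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [attrs]  # no violation found: already in BCNF


movie_attrs = {"title", "year", "length", "genre", "studioName", "starName"}
movie_fds = [({"title", "year"}, {"length", "genre", "studioName"})]

for schema in bcnf_decompose(movie_attrs, movie_fds):
    print(sorted(schema))
```

For the movie relation this yields two schemas: {title, year, length, genre, studioName} and {title, year, starName}.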

Recovering information - Lossless Join
Lossless join – A decomposition has a lossless join when joining the sub-relations recreates exactly the original relation they were decomposed from. The join used must be the natural join (⋈).
Reason: in each step of the recursive BCNF decomposition, the relation being decomposed is equal to the join of its projections. This means that if we perform projections onto our sub-relations, we can join their tuples to recover every tuple of the original.

Given R{A,B,C} and its FD B → C, R may be decomposed into R1{A,B} and R2{B,C}.
Assume a tuple t = (a,b,c) of R, where a, b, c are its components.
Projecting t onto R1 yields (a,b).
Projecting t onto R2 yields (b,c).
Joining (a,b) and (b,c), which agree on B, yields (a,b,c) = t.
This means that regardless of what tuple t we start with, we can always join its projections to get that original tuple back.
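The projection-and-join argument can be tried on a small instance. In this sketch the helper names and the sample rows are hypothetical. The rows are chosen so that B → C holds. Decomposing onto {A,B} and {B,C} recovers the relation exactly, while decomposing onto {A,C} and {B,C} (which no FD justifies) produces extra tuples.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, removing duplicates."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rows}]


def natural_join(left, right):
    """Join tuples that agree on all attributes the two relations share."""
    joined = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(t.items())) for t in rows}


# Hypothetical instance of R(A, B, C) in which B -> C holds.
r = [
    {"A": 1, "B": "x", "C": "p"},
    {"A": 2, "B": "x", "C": "p"},
    {"A": 3, "B": "y", "C": "q"},
    {"A": 4, "B": "z", "C": "p"},
]

good = natural_join(project(r, ["A", "B"]), project(r, ["B", "C"]))
bad = natural_join(project(r, ["A", "C"]), project(r, ["B", "C"]))

print(as_set(good) == as_set(r))  # lossless: exactly the original tuples
print(len(bad) > len(r))          # lossy: spurious tuples appear
```

The second join pairs every A value with every B value that shares the same C, which is why tuples such as (1, z, p) appear even though they were never in R.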

Tuple Facts
Every tuple of R is guaranteed to be in the natural join of R's projections.
The natural join is both associative and commutative, meaning the join does not need a fixed order.
The natural join equals R if and only if every tuple in the join is also in R.
The natural join is the only way to reconstruct a relation from its projections like this, so it must be used.

Chase test – An organized way to test whether a tuple t in the join of a group of sub-relations can be proved to be a tuple of R. To do this test we draw a picture of everything we know, called a tableau. We then use the given set of FD’s to try to prove that tuple t really does exist in the base relation R.

Question: Why does a successful chase test mean the decomposition has a lossless join? Because the chase itself is a proof that one of the tuples of R whose projections were joined must in fact be the tuple t we get from the join. The entire point of the chase is to check that every joined tuple already exists in R.

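Given R(A,B,C,D) and its sub-relations S1 = {A,B}, S2 = {B,C}, and S3 = {C,D}. Given B → AD.
Step 1: draw the tableau, with one row per sub-relation. A row has an unsubscripted symbol for each attribute in its sub-relation and a subscripted symbol elsewhere.
Step 2: find rows that agree on B, and apply our B → AD rule to make their A and D symbols equal.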

Step 3: rewrite the tableau. Because we were given no C → … rules, we are done. No lossless join exists. Why? Because no row of the final tableau consists entirely of unsubscripted symbols, so the chase cannot prove that the joined tuple t = (a,b,c,d) is in R.
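The chase steps can be sketched in code for the example above. This is an illustrative implementation under stated assumptions: each tableau cell stores only a subscript, with 0 standing for the unsubscripted symbol and i for the symbol subscripted with row number i. When two symbols are equated, the smaller subscript is kept. The function name `chase_lossless` is hypothetical.

```python
def chase_lossless(attrs, schemas, fds):
    """Chase test: return (is_lossless, final_tableau)."""
    # One row per schema: 0 = unsubscripted, i = subscripted with row number.
    tableau = [
        {a: 0 if a in schema else i for a in attrs}
        for i, schema in enumerate(schemas, start=1)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in tableau:
                for r2 in tableau:
                    if not all(r1[a] == r2[a] for a in lhs):
                        continue
                    # Rows agree on the left side: equate the right side.
                    for a in rhs:
                        if r1[a] != r2[a]:
                            keep = min(r1[a], r2[a])
                            drop = max(r1[a], r2[a])
                            for row in tableau:
                                if row[a] == drop:
                                    row[a] = keep
                            changed = True
    lossless = any(all(v == 0 for v in row.values()) for row in tableau)
    return lossless, tableau


attrs = ["A", "B", "C", "D"]
schemas = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
fds = [({"B"}, {"A", "D"})]

lossless, tableau = chase_lossless(attrs, schemas, fds)
print(lossless)
for row in tableau:
    print(row)
```

Rows 1 and 2 agree on B, so B → AD makes their A symbols and their D symbols equal. Row 3 shares no B value with the others, and no FD starts from C, so no row ever becomes fully unsubscripted and the test reports that the join is not lossless.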

What we want decomposition to do
1. Elimination of anomalies.
2. Recovery of information.
3. Preservation of dependencies – If the projected FD’s hold in the sub-relations, then the relation reconstructed by joining them also satisfies the original FD’s.
It is not always possible to get all 3 of these at once. BCNF guarantees only the first two. Third Normal Form guarantees only the last two.

Third Normal Form (3NF)
A table is in 3NF if, for every nontrivial FD X → A (that is, A is not a subset of X), at least one of the following holds:
X (the left side of our FD) is a superkey, OR
every attribute on the right side that is not also on the left side is a prime attribute.
Prime attribute – an attribute that is a member of some key.

The Synthesis Algorithm for 3NF
The goals of this algorithm are:
The relations of the decomposition are all in 3NF.
The decomposition has a lossless join.
The decomposition preserves dependencies.
Steps of this algorithm:
1. Find a minimal basis for the set of functional dependencies that hold for R.
2. For each FD X → A in the minimal basis, use XA as the schema of one of the relations in the decomposition.
3. If no relation schema from step 2 is a superkey for R, add another relation whose schema is a key for R.

How does this algorithm work? - We show three things: that the lossless join and dependency preservation both hold, and that all relations are in 3NF.
For the lossless join – We can use the same chase test from before.
For the dependency preservation – Each FD of the minimal basis has all of its attributes in some relation of the decomposition.
For 3NF – If we have to add a relation whose schema is a key, that relation is in 3NF because all of its attributes are prime, so no 3NF violation can occur in it. The relations built from the FD’s of the minimal basis are also in 3NF; the proof is omitted here.

Example
Before: Vendor(ID, Name, Acc_No, Bank_Code_No, Bank)
FDs:
ID → Name, Acc_No, Bank_Code_No
Bank_Code_No → Bank
After: Vendor(ID, Name, Acc_No, Bank_Code_No) and Bank(Bank_Code_No, Bank)
http://www.gitta.info/LogicModelin/en/html/DataConsiten_Norm3NF.html
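The synthesis steps can be applied to the Vendor example in code. This sketch makes several assumptions that are not in the slides: the input FD's are already a minimal basis, FD's that share a left side are merged into one schema (a common refinement, which is what produces the two-table result above), and schemas contained in another schema are dropped. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def synthesize_3nf(attrs, basis):
    """3NF synthesis; basis is assumed to be a minimal basis already."""
    attrs = set(attrs)
    # Step 2: one schema per FD, merging FDs with the same left side.
    merged = {}
    for lhs, rhs in basis:
        key = frozenset(lhs)
        merged[key] = merged.get(key, set(lhs)) | rhs
    schemas = list(dict.fromkeys(frozenset(s) for s in merged.values()))
    # Drop any schema that is strictly contained in another one.
    schemas = [s for s in schemas if not any(s < t for t in schemas)]
    # Step 3: if no schema is a superkey, add a schema that is a key.
    if not any(closure(s, basis) >= attrs for s in schemas):
        key = set(attrs)
        for a in sorted(attrs):
            if closure(key - {a}, basis) >= attrs:
                key -= {a}  # attribute a is not needed in the key
        schemas.append(frozenset(key))
    return schemas


vendor_attrs = {"ID", "Name", "Acc_No", "Bank_Code_No", "Bank"}
vendor_basis = [
    ({"ID"}, {"Name"}),
    ({"ID"}, {"Acc_No"}),
    ({"ID"}, {"Bank_Code_No"}),
    ({"Bank_Code_No"}, {"Bank"}),
]

for schema in synthesize_3nf(vendor_attrs, vendor_basis):
    print(sorted(schema))
```

The schema built from ID already determines every attribute, so it is a superkey and step 3 adds nothing. The output matches the Vendor and Bank tables of the example.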

Sources
Database Systems: The Complete Book, Second Edition.
http://www.gitta.info/LogicModelin/en/html/DataConsiten_Norm3NF.html