Prof. Steven A. Demurjian, Sr.

Chap 15 & 16 6e - 14 5e: Relational DB Design, Functional Dependencies and Normalization
Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut 191 Auditorium Road, Box U-155 Storrs, CT (860) A portion of these slides are being used with the permission of Dr. Ling Lui, Associate Professor, College of Computing, Georgia Tech. Other slides and figures have been adapted from the AWL web site for the textbook.

Normalizing a Relational DB Schema
Recall: Defining Relations - Deciding which Attributes belong Together in Each Relation Choosing Appropriate Names for the Relations and Their Attributes (with Domains and Data Types) Identifying the Candidate Keys and Choosing a PK for Each Relation, and Specifying All Foreign Keys Two Techniques for Relational Schema Design Using ER-to-Relational Mapping (Chapter 9) Relational Normalization Theory (Chaps 15/16) In Normalization, we Strive for an “Optimal” Design in Terms of Redundancy - Improve Performance Anomalies - Eliminate “Problems”

Design Process - Where are we?
Conceptual Design Conceptual Schema (ER Model) Step 2: Normalization Analyzing the Schema from Performance/Efficiency Perspectives to arrive at “Optimal” Schema Logical Design Analysis of Schema Logical Schema (Relational Model) Normalized Schema

Focus of this Chapter Informal Design Guidelines for Relational Databases Semantics of the Relation Attributes Redundant Information in Tuples/Update Anomalies Null Values in Tuples Spurious Tuples Functional Dependencies (FDs) Recall Key Concepts from Chapter 7 Inference Rules for FDs Normal Form and Normalization First, Second, and Third Normal Forms Boyce-Codd Normal Form

Informal DB Design Guidelines
What is Relational Database Design? The Grouping of Attributes to Form "Good" Relation Schemas Two Levels of Relation Schemas: The Logical "User View" Level The Storage "Base Relation" Level Design is Concerned Mainly with Base Relations What are the Criteria for "Good" Base Relations? We’ll Start with Informal Guidelines for Good Relational Design

What are Commandments for DB Design?
The Four Commandments: Thou Shalt Commit No Redundancy of Fact Thou Shalt Clutter No Facts Thou Shalt Preserve Information Thou Shalt Preserve Functional Dependencies © Leo Mark, Database Group, Georgia Tech

What is a “Good” DB Schema?
Focus on the “Semantics” of the Relations What Does Each Relation Mean? Do the “Semantics” of Each Relation Make Sense? Each Relation has a Consistent Meaning Dependencies Between Relations are Clear What about Keys? Are Primary Keys Well Defined? Do Links to Foreign Keys Make Sense? How Does Relational Schema Relate to ER or EER Predecessor? What is an Example of a “Good” Schema? Why?

A Well Defined DB Schema

Relational Instances for Prior Example
What does DEPT_LOCATIONS Represent?

Relational Instances for Prior Example
What does WORKS_ON Represent?

Guideline 1: Represent a Single Entity
GUIDELINE 1: Informally, Each Tuple in a Relation Should Represent One Entity or Relationship Instance (Applies to Individual Relations and their Attributes) Attributes of Different Entities should not be Mixed in the Same Relation Only FKs should be used to Refer to Other Entities Entity and Relationship Attributes should be Kept Apart as Much as Possible Bottom Line: Design a Schema that can be Explained Easily Relation by Relation The Semantics of Attributes should be Easy to Interpret

What is a “Lousy” Relation? Why?
Represents a “Single” Employee in Each Line as Identified via SSN What is it Trying to Represent? Each Employee Works in a Department Identified by DNUMBER, DNAME, and DMGRSSN What is the Problem with this Design? What Happens When you Update? Delete?

What Happens When you Update? Delete? Update “Research” to “R&D” What are the Problems? Research misspelled in Table Query Needs to Find All Rows Department Name no longer in 1 Location

What is the Problem with this Design? Mixing Attributes from EMPLOYEE and PROJECT Relations! Significant Amounts of Redundant Data More Critically - Anomalies in Update, Delete, Insert

Where are the Redundancies? ENAMEs, PNAMEs, PLOCATIONs Are SSNs Redundant?

Guideline 2: Redundant Information and Update Anomalies
Mixing Attributes of Multiple Entities (see Prior Two Slides) May Cause Problems Key Problem: Information is Stored Redundantly There are Two Consequences: Wasting Storage Problems with Update Anomalies Insertion Anomalies - Inserting New Tuples Deletion Anomalies - Removing Existing Tuples Modification Anomalies - Changing Existing Tuples

Insertion Anomalies What Happens When Insert a new Employee Who Works in Department 5? Must Enter DNUMBER, DNAME, and DMGRSSN Must Be Exact w.r.t. Other Dept. 5 Employees! What Happens if you Enter: “Resaerch” or “ ”? What are Implications?

Insertion Anomalies What are Some Specific Problems with Table?
Can’t Add New Department without Employee Redundant Project Names Can you Delete a Department?

Insertion Anomalies What Happens When you Want to Insert a New Department? (3, “Education”, ) Can you do the Insert? If so, How? If Not, Why Not?

Deletion Anomalies What Happens When you Delete “Borg, James” from the EMP_DEPT Table? Is the Resulting Table OK? Why or Why Not?

Modification Anomalies
What Happens When you Want to Change “Research” to “R and D”? What is Required in this Case? Change Multiple Tuples What is the Responsibility of the DB Application Programmer or Anyone Doing an Update? Know the DB Content to Write a Correct Query

Another Schema with Problems
Two relation schemas suffering from update anomalies (a) Dname and Dmgr_ssn Replicated for Each Employee (b) Ename, Pname, Ploc Replicated each SSN/Pnumber combo

Consider Three Tables What Happens Join EMP/DEPT and EMP/PROJ?

Joining EMP/DEPT and EMP/PROJ

Example of an Update Anomaly
Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) Update Anomaly: Changing the name of project number P1 from “Billing” to “Customer-Accounting” requires update to be made for all 100 employees working on project P1.

Example of an Insert Anomaly
Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) Insert Anomaly: Cannot insert a project unless an employee is assigned to it. Conversely, cannot insert an employee unless he/she is assigned to a project.

Example of a Delete Anomaly
Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
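The update and delete anomalies described for EMP_PROJ can be reproduced on a tiny in-memory table. This is only a sketch: the rows, employee names, and the helper functions `rename_project`, `delete_employee`, and `known_projects` are hypothetical and are not taken from the slides.

```python
# Hypothetical rows of EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
emp_proj = [
    ("E1", "P1", "Smith", "Billing", 10),
    ("E2", "P1", "Wong", "Billing", 20),
    ("E3", "P2", "Zelaya", "Payroll", 15),
]


def rename_project(rows, proj, new_name):
    """Return the updated rows and how many rows had to be rewritten."""
    updated = [
        (emp, p, ename, new_name if p == proj else pname, hours)
        for emp, p, ename, pname, hours in rows
    ]
    touched = sum(1 for row in rows if row[1] == proj)
    return updated, touched


def delete_employee(rows, emp):
    """Remove every row that belongs to one employee."""
    return [row for row in rows if row[0] != emp]


def known_projects(rows):
    """Map each project number still present to its name."""
    return {row[1]: row[3] for row in rows}


renamed, rows_touched = rename_project(emp_proj, "P1", "Customer-Accounting")
after_delete = delete_employee(emp_proj, "E3")

print(rows_touched)                  # one rewrite per employee working on P1
print(known_projects(after_delete))  # P2 disappears with its only employee
```

With 100 employees on P1, as in the slide, the same rename would rewrite 100 rows.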

Yet Another Example 1. Each Dept. has several students, and a student may enroll in one Dept. for his/her major 2. A Dept. has only one head, i.e., the Dept. chair 3. A student may register more than one course, and each course may have many students. 4. Each student registered for a course must have a corresponding grade STUDENT_DEPT(S#, DName, DHead, CN, Grade)

Yet Another Example STUDENT_DEPT(S#, DName, DHead, CN, Grade)
s1, CSE, Smith, 1010, B s1, CSE, Smith, 2102, C s1, CSE, Smith, 4701, F s2, CSE, Smith, 1010, A- s2, CSE, Smith , D

Update Anomalies Insertion:
STUDENT_DEPT(S#, DName, DHead, CN, Grade) Insertion: What Must Occur When a New Student is Inserted? (Insert “Correct” DName and DHead) How is the New “BioInformatics” Dept. Added? Can you even Add New Dept Easily? s1, CSE, Smith, 1010, B s1, CSE, Smith, 2102, C s2, CSE, Smith, 1010, A- s2, CSE, Smith , D BIOI, Arthurs

Update Anomalies Deletion:
STUDENT_DEPT(S#, DName, DHead, CN, Grade) Deletion: What Happens When the Last Student in the “Puppetry” Department is Deleted? s1, CSE, Smith, 1010, B s1, CSE, Smith, 2102, C s2, Puppetry, Doe, 2222, C s2, CSE, Smith, 1010, A- s2, CSE, Smith , D s1, CSE, Smith, 1010, B s1, CSE, Smith, 2102, C s2, CSE, Smith, 1010, A- s2, CSE, Smith , D

Update Anomalies Update: What Must Occur When a New DHead Takes Over?
STUDENT_DEPT(S#, DName, DHead, CN, Grade) Update: What Must Occur When a New DHead Takes Over? Smith -> Jones s1, CSE, Smith, 1010, B s1, CSE, Smith, 2102, C s1, CSE, Smith, 4701, F s2, CSE, Smith, 1010, A- s2, CSE, Smith , D s1, CSE, Jones, 1010, B s1, CSE, Jones, 2102, C s1, CSE, Jones, 4701, F s2, CSE, Jones, 1010, A- s2, CSE, Jones , D

Guideline 3: Null Values in Tuples
Guideline 3: Relations should be Designed such that their Tuples will have as Few NULL Values as Possible Attributes that are NULL Frequently Could Be Placed in Separate Relations (With the Primary Key) Reasons for Null Values Attribute Not Applicable or Invalid Attribute Value Unknown (May Exist) Value Known to Exist, but Unavailable So Many Types of Null Values Become Impossible to Assess/Understand How can a Developer Understand Data?

Guideline 3 Why do we Need NULL in a Relation?
A “Flat” Relation With Many Attributes May Have Faster Queries (Since Fewer Joins Required) More Null Values Example: Not Every Student Will Enroll in Every Course Offered by the Department Problems with NULL Waste of Space at Storage Level (less of an issue) Aggregate Operations (Sum, Count) Cannot Apply Cannot Join Multiple Meanings, e.g., Unknown, Not Available, Known but Absent

Sample Null Values

How Else Can Null Values Occur?
Recall Options 3 and 4 on Specialization For Each Specialization with m Subclasses {S1, …, Sm} and Generalized Superclass C, where the Attributes of C are {k, A1, …, An} (k is the PK), Convert According to the Following: Option 3: For Disjoint Subclasses: Create a Single Relation U which Contains all the Attributes of all Si and {k, A1, …, An} and t Use k as the Primary Key of U The Attribute t Indicates the Type Attribute According to which Specialization is Performed

Step 8 – Option 3 Example What is True for the Three
Secretary Tech Engr What is True for the Three Attributes Boxed Above?

How Else Can Null Values Occur?
Recall Options 3 and 4 on Specialization Option 4: For Overlapping Subclasses: Create a single relation U which contains all Attributes of all Si and all Attributes of C ({k, A1, …, An}) and {t1, …, tm} Use k as the Primary Key of U The Attributes ti are Boolean Valued, Indicating if a Tuple Belongs to Subclass Si Note: May Generate a Large Number of Null Values in the Relation

Step 8 – Option 4 Example What is True regarding the Attributes
Boolean Boolean What is True regarding the Attributes Introduced for MANUFACTURED_PART and PURCHASED_PART?

Information Loss and Spurious Tuples
We’ve had Guidelines for: One Concept/Relation, Avoiding Update Anomalies, Null Values Two Other “Related” Concerns Can Arise First, in Decomposing (Splitting) a Relation Apart, we May “Lose” Information Second, in Attempting to Reassemble Two or More Relations into One (via a Join), Spurious Tuples may Result A Spurious Tuple “Wasn’t” Present Originally and Makes No Sense - It Didn’t Exist and its Existence is an Inconsistency

Suppose Split EMP_PROJ

What are Semantics of Split?
EMP_LOCS Means the Employee ENAME Works on Some Project at PLOCATION EMP_PROJ1 Means the Employee Identified by SSN Works HOURS per Week on Project Identified by PNAME, PNUMBER, PLOCATION What has been Lost in the Split? ENAME to SSN Connection! Can the Information Even be Recovered?

Recall EMP_PROJ

What are Tuples After Split?

What is the Issue? Suppose EMP_PROJ1 and EMP_LOCS used in Place of EMP_PROJ The Split is Legitimate if we Can Recover the Information Originally in EMP_PROJ How could you Recover the Information? Natural Join on EMP_PROJ1 and EMP_LOCS What would be the Result? Note: “*’ed” Entries are Spurious Tuples We do not Obtain the “Correct” Information We have Conducted a “Lossy” Decomposition

What Happens When we Join?
What do “*”ed Tuples Represent?

Guideline 4: Spurious Tuples
The Relations should be Designed to Satisfy the Lossless Join Condition No Spurious Tuples Should Be Generated by Doing a Natural-join of Any Relations Two Important Properties of Decompositions: a. Non-additivity (Losslessness) of Corresponding Join b. Preservation of the Functional Dependencies Property (a) is Extremely Important and Cannot Be Sacrificed Property (b) is Less Stringent and May Be Sacrificed
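The lossless join condition can be tested on a concrete instance by projecting a relation and joining the pieces back. The sketch below assumes a two-row EMP_PROJ instance with invented values, and the helpers `relation`, `project`, and `natural_join` are illustrative names, not part of any library. Passing on one instance does not prove a decomposition is lossless in general; failing on one instance does prove it is lossy.

```python
def relation(attrs, tuples):
    """Build a relation as a set of rows; each row is a frozenset of (attr, value)."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rows, attrs):
    """Projection with duplicate elimination."""
    return {frozenset((a, dict(row)[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Join rows that agree on every attribute name they share."""
    joined = set()
    for left_row in left:
        for right_row in right:
            lt, rt = dict(left_row), dict(right_row)
            if all(lt[a] == rt[a] for a in lt.keys() & rt.keys()):
                joined.add(frozenset({**lt, **rt}.items()))
    return joined


# Hypothetical instance: two employees on different projects at one location.
EMP_PROJ = relation(
    ("SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"),
    [
        ("111", 1, 32.5, "Smith", "ProductX", "Bellaire"),
        ("222", 2, 10.0, "Wong", "ProductY", "Bellaire"),
    ],
)

# Lossy split: EMP_LOCS and EMP_PROJ1 share only PLOCATION.
emp_locs = project(EMP_PROJ, ("ENAME", "PLOCATION"))
emp_proj1 = project(EMP_PROJ, ("SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"))
spurious = natural_join(emp_proj1, emp_locs) - EMP_PROJ

# Alternative split that keeps SSN on both sides.
emp = project(EMP_PROJ, ("SSN", "ENAME"))
lossless = natural_join(emp_proj1, emp) == EMP_PROJ

print(len(spurious), lossless)
```

The spurious rows pair each employee name with the other employee's project, which is exactly the lost ENAME to SSN connection.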

Guideline 4: Lost Information
A First Example of Lost Information What is Lost in the Join of R and S? R = (A, B, C) S = (D, C) [Figure: instances of R and S and their join RS(A, B, C, D), with the lost information marked]

Guideline 4: Spurious Tuples
A Second Example of Spurious Tuples Decompose R into R1 and R2 by Projection Rejoin R1 and R2 What are Spurious in the Join of R1 and R2? [Figure: an instance of R(A, B, C, D), its projections (labeled R1(B, D) and R2(A, D) on the slide), and the join of R1 and R2]

Let’s Review Some Other Examples
Some NULL values for Join Attribute DNUM

Natural Join on Employee Department Shown Below What is the Problem? Berger and Benitez Employees are no Longer Present

The Outer Join as Described in Chapter 8 Works
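The difference between the natural join and the outer join on a NULL join attribute can be sketched as follows. The department numbers and names are assumptions for illustration, and `None` stands in for a NULL DNUM.

```python
# Hypothetical instances; None stands for a NULL DNUM.
employees = [("Smith", 5), ("Wong", 5), ("Berger", None), ("Benitez", None)]
departments = {5: "Research", 4: "Administration"}

# Natural (inner) join: rows whose DNUM matches no department are dropped.
inner_join = [
    (name, dnum, departments[dnum])
    for name, dnum in employees
    if dnum in departments
]

# Left outer join: every employee is kept, padded with None where no match exists.
left_outer_join = [
    (name, dnum, departments.get(dnum))
    for name, dnum in employees
]

print([row[0] for row in inner_join])
print([row[0] for row in left_outer_join])
```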

Dangling Tuples Decompose

Dangling Tuples If Decompose when Try to Rejoin on SSN
Lose Both Berger and Benitez These are Called Dangling Tuples

Functional Dependencies (FDs)
FDs are used to Specify Formal Measures of the "Goodness" of Relational Designs FDs and Keys are used to Define Normal Forms for Relations FDs are Constraints that are Derived from the Meaning and Interrelationships of the Data Attributes A Set of Attributes X Functionally Determines a Set of Attributes Y if the Value of X Determines a Unique Value for Y FDs are Derived from the Real-World Constraints on the Attributes A Relational Schema is Relations with Keys and FDs!

Functional Dependencies
A Functional Dependency Exists Between Two (or Two Sets of) Single Valued Attributes X and Y of Relation R, if Each Value of X Corresponds to Precisely One Value of Y Denoted by X -> Y X is Called the Left Hand Side of FD Y is Called the Right Hand Side of FD Read as X Functionally Determines Y in R FD Defined on a Table-by-Table Basis For any t1, t2 in r(R), if t1[X] = t2[X], then t1[Y] = t2[Y], We say that X -> Y holds in R

Functional Dependencies – Another view
X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then t1[Y]=t2[Y] X Uniquely Determines Y X and Y can be 1 Attribute or a Set of 2 or More Attributes X -> Y in R specifies a constraint on all relation instances r(R) Written as X -> Y; can be displayed graphically on a relation schema as in Figures. (denoted by the arrow) FDs are derived from the real-world constraints on the attributes
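The tuple-pair definition translates directly into a check over one relation instance. The sketch below uses invented rows and a hypothetical helper `fd_holds`. Note the limitation: an instance can only refute an FD. Whether the FD holds on the schema depends on the real-world semantics, as the slides stress.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance of a relation with SSN, PNUMBER, HOURS, ENAME.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 7.5, "ENAME": "Smith"},
    {"SSN": "222", "PNUMBER": 2, "HOURS": 10.0, "ENAME": "Wong"},
]

print(fd_holds(emp_proj, ["SSN"], ["ENAME"]))             # not refuted by this instance
print(fd_holds(emp_proj, ["SSN", "PNUMBER"], ["HOURS"]))  # not refuted by this instance
print(fd_holds(emp_proj, ["PNUMBER"], ["HOURS"]))         # refuted: project 2 has two HOURS values
```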

Examples of FD constraints
Social security number determines employee name SSN -> ENAME Project number determines project name and location PNUMBER -> {PNAME, PLOCATION} Employee ssn and project number determines the hours per week that the employee works on the project {SSN, PNUMBER} -> HOURS

What is an FD in a Relation?
Consider the TEACH Relation: Are the following FDs? Does Text Uniquely Determine Course? Text -> Course? Does Teacher Uniquely Determine Course? Teacher -> Course?

Textual and Graphical Notations
EMP_DEPT FDs: SSN -> Ename, Bdate, Address Dnumber -> Dname, Dmgr_ssn EMP_PROJ FDs: {SSN, Pnumber} -> Hours SSN -> Ename Pnumber -> Pname, Plocation

Examples of FD constraints
An FD is a Property of Attributes in the Schema FDs Must Hold on Every Relation Instance r(R) If K is a Key of R, then K Functionally Determines All Attributes in R Since we Never have Two Distinct Tuples with t1[K] = t2[K]

Example of FDs – Textual and Graphical
STUDENT_DEPT (S#, DName, DHead, CN, Grade) FDs over STUDENT_DEPT: fd1: {S#, CN} -> Grade, fd2: S# -> DName, fd3: DName -> DHead.

Example of FDs
SSN -> {ENAME, BDATE, ADDRESS, DNUMBER} DNUMBER -> {DNAME, DMGRSSN} SSN -> ENAME PNUMBER -> {PNAME, PLOCATION} {SSN, PNUMBER} -> HOURS

Determining FDs Must Understand the Semantics of Data Based on Schema or Current/Future Instances Recall FD: TEXT -> COURSE What if we add a row to the table? Need to Understand the Potential Future Data James Web Databases Al-Nour

Inference Rules for FDs
Given a set of FDs F, we can Infer Additional FDs that Hold whenever the FDs in F Hold For Example, Consider: F = {SSN -> {EName, BDate, Address, DNumber}, DNumber -> {DName, DMGRSSN}} What are Additional FDs? SSN -> EName SSN -> BDate SSN -> SSN SSN -> Address SSN -> DNumber DNumber -> DName DNumber -> DMGRSSN

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold Armstrong's inference rules: IR1. (Reflexive) If Y subset-of X, then X -> Y IR2. (Augmentation) If X -> Y, then XZ -> YZ IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z IR1, IR2, IR3 form a sound and complete set of inference rules These rules hold (soundness), and all other rules that hold can be deduced from these (completeness)

Some additional inference rules that are useful: Decomposition: If X -> YZ, then X -> Y and X -> Z Union: If X -> Y and X -> Z, then X -> YZ The last Two inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)
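One common way to apply the inference rules mechanically is to compute the closure of an attribute set: all attributes it determines under F. The closure algorithm is not shown on the slides; it is brought in here as a standard companion technique, and the function names `closure` and `implies` are hypothetical. The FD set F is the one from the earlier inference example.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)  # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is already determined, transitivity adds rhs.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(closure({"SSN"}, F)))
print(implies(F, {"SSN"}, {"DNAME"}))    # follows by transitivity via DNUMBER
print(implies(F, {"DNUMBER"}, {"SSN"}))  # cannot be inferred
```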

Summary of Inference Rules
Armstrong’s Inference Rules: 1. Reflexive: If Y ⊆ X, then X -> Y. 2. Augmentation: If {X -> Y} then XZ -> YZ. 3. Transitive: If {X -> Y, Y -> Z} then X -> Z. Derived Inference Rules: 4. Decomposition: If {X -> YZ} then X -> Y. 5. Additive (Union): If {X -> Y, X -> Z} then X -> YZ.

Example of Inference Rules - Reflexive
SSN FNAME LNAME DNO DNAME John Smith 5 Research Jane Doe 4 Payroll Peter Jones Reflexive: If Y ⊆ X, then X -> Y X = {FNAME, LNAME} and Y = {FNAME} Y ⊆ X means {FNAME} ⊆ {FNAME, LNAME} Therefore X -> Y means {FNAME, LNAME} -> {FNAME}

Example of Inference Rules - Augmentation
SSN FNAME LNAME DNO DNAME John Smith 5 Research Jane Doe 4 Payroll Peter Jones Augmentation: If {X -> Y} then XZ -> YZ X = {SSN} Y = {FNAME} Z = {DNAME} X -> Y means {SSN} -> {FNAME} Therefore XZ -> YZ means {SSN, DNAME} -> {FNAME, DNAME}

Example of Inference Rules – Transitivity
SSN FNAME LNAME DNO DNAME John Smith 5 Research Jane Doe 4 Payroll Peter Jones Transitive: If {X -> Y, Y -> Z} then X -> Z X = {SSN} Y = {DNO} Z = {DNAME} X -> Y means {SSN} -> {DNO} Y -> Z means {DNO} -> {DNAME} Therefore X -> Z means {SSN} -> {DNAME}

Towards Normalization of Relations
We take each Relation Individually and “Improve” Them in Terms of the Desired Characteristics Normalization Decomposes Relations into Smaller Relations, Resulting in No Information Loss Support for Reconstruction No Spurious Joins Query Execution Time May Increase Denormalization May Be Necessary Later on Objectives: Minimizing Redundancy Minimizing Insertion, Deletion, and Update Anomalies

What is the Normalization Process?
Provides DB Designers with the Ability to “Improve” their Relations Deal with Redundancies and Anomalies Normalization Procedure Provides DB Designers with A Formal Framework for Analyzing Relation Schemas based on their Keys and on the Functional Dependencies among their Attributes A Series of Normal Form Tests that can be Carried out on Individual Relation Schemas so the Relational DB can be Normalized to Desired Degree

What are Normal Forms? A Normal Form is a Condition using Keys and FDs to Certify Whether a Relation Schema meets Criteria Primary keys (1NF, 2NF, 3NF) All Candidate Keys (2NF, 3NF, BCNF) Multivalued Dependencies (4NF) Join Dependencies (5NF) [Figure: 5NF, 4NF, 3NF, 2NF, 1NF]

How is Normalization Attained?
Typically, Normalization is Attained through a Process of Decomposition that Breaks Apart Relations to Remove Redundancies and Anomalies In Process, we must Maintain Two Properties: Lossless Join or Nonadditive Join Property Guarantees the Spurious Tuple Generation Problem does not occur on Decomposed Relations Dependency Preservation Property Ensures that each FD is Represented in some Individual Relation(s) after Decomposition Premise: Relational Schema with Primary Keys and Functional Dependencies Specified

50,000 foot View of Normalization – Part I
Remove Composite/ Multi-Value Attributes 1NF “Lousy Tables” Eliminate the Partial Functional Dependencies of Non-prime Attributes to Key Attributes Lossless Decomposition and Dependency Preserving 2NF Eliminate the Transitive Functional Dependencies of Non-prime Attributes to Key Attributes 3NF Eliminate the Partial and Transitive Functional Dependencies of Prime (Key) Attributes to Key Lossless Decomposition but not Dependency Preserving BCNF (Boyce Codd) “Wonderful Tables”

Recall Key Constraints
Superkey (SK): Any Subset of Attributes Whose Values are Guaranteed to Distinguish Among Tuples Candidate Key (CK): A Superkey with a Minimal Set of Attributes (No Attribute Can Be Removed Without Destroying the Uniqueness -- Minimal Identity) A Value of an Attribute or a Set of Attributes in a Relation That Uniquely Identifies a Tuple There may be Multiple Candidate Keys Examples are given in class: Student (S#, SSN, SN, DN, Birthdate) Candidate Key: S#, SSN Superkey: Any combination of attributes containing S# or SSN Primary Key: SSN Foreign Key: none COURSE(C#, Sec#, Instructor, Classroom) S-C(S#, C#, Grade) Candidate Key = Primary key = (S#, C#) Superkeys: (S#, C#), (S#, C#, Grade) Foreign Key: S# with respect to S# in Student relation, C# w.r.t. C# in Course relation

Recall Key Constraints
Primary Key (PK): Choose One From Candidate Keys The Primary Key Attributes are Underlined Foreign Key (FK): An Attribute or a Combination of Attributes (Say A) of Relation R1 Which Occurs as the Primary Key of another Relation R2 (Defined on the Same Domain) Allows Linkages Between Relations that are Tracked and Establish Dependencies Useful to Capture ER Relationships

Superkeys vs. Candidate Keys
Superkey of R: A Superkey SK is a Set of Attributes of R Such that No Two Tuples in Any Valid Relation Instance r(R) will Have the Same Value for SK Given R(U), U is the Set of Attributes of R and a Relation Instance of R, Denoted As r(R), For Any Distinct Tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK] For Cars, Valid Superkeys Must Contain: SerialNo OR State, Reg# OR Both For EMPLOYEE {SSN} is a Key and {SSN}, {SSN, ENAME}, {SSN, ENAME, BDATE} are all SUPERKEYS Example: The CAR relation schema: CAR(State, Reg#, SerialNo, Make, Model, Year) has two keys Key1 = {State, Reg#}, Key2 = {SerialNo}, which are also superkeys. {SerialNo, Make} is a superkey but not a key. If a relation has several candidate keys, one is chosen arbitrarily to be the primary key. The primary key attributes are underlined.

Superkeys vs. Candidate Keys
Candidate Key of R: A "Minimal" Superkey: a Candidate Key K is a Superkey s.t. Removal of any Attribute From K Results in a Set of Attributes that is Not a Superkey Given R(U), U is the Set of Attributes of R and a Relation Instance of R, Denoted as r(R) K is a Candidate Key iff K is a Superkey and for any A in K, there can exist Two Distinct Tuples t1 and t2 in some r(R) such that t1[K-A] = t2[K-A] In Previous (State, Reg#, Make, Model) is SK Is it a CK? Why or Why Not?

Example and Remaining Definitions
CAR(State, Reg#, SerialNo, Make, Model, Year) Primary key is {State, Reg#} It has two candidate keys (also superkeys) Key1 = {State, Reg#} Key2 = {SerialNo} {SerialNo} can also be Chosen as Primary Key Definition: Prime Attribute - Attribute A of R that is a Member of some Candidate Key K of R Definition: Non-Prime Attribute - An Attribute that is not Prime (i.e., Not a Member of Any Candidate Key) WORKS_ON – SSN, Pnumber PRIME
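Candidate keys and prime attributes can be enumerated by brute force as the minimal attribute sets whose closure covers the whole relation. This sketch assumes two FDs for CAR that simply restate its two keys; those FDs, and the helpers `closure` and `candidate_keys`, are illustrative assumptions. The search is exponential in the number of attributes, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal superkeys, trying smaller attribute sets first."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


CAR = {"State", "Reg#", "SerialNo", "Make", "Model", "Year"}
# Assumed FDs: each of the two keys determines all other attributes.
car_fds = [
    ({"State", "Reg#"}, {"SerialNo", "Make", "Model", "Year"}),
    ({"SerialNo"}, {"State", "Reg#", "Make", "Model", "Year"}),
]

keys = candidate_keys(CAR, car_fds)
prime = set().union(*keys)
print(keys)
print(sorted(prime), sorted(CAR - prime))
```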

First Normal Form (1NF) All Attributes Must Be Atomic Values:
Only Simple and Indivisible Values in the Domain of Attributes. Each Attribute in a 1NF Relation is a Single Value Disallows Composite Attributes, Multivalued Attributes, and Nested Relations (Non-Atomic) 1NF Relation cannot have an Attribute Value that is: A Set of Values (Set-Value) A Tuple of Values (Nested Relation) 1NF is a Standard Assumption of Relational DBs

One Example of 1NF Consider Following Department Relation
What is the Inherent Problem? DLOCATIONS is Multi-valued

What are Possible Solutions?
Decompose: Move the Attribute DLOCATIONS that Violates 1NF into a Separate Relation DEPT_LOCATIONS(DNUMBER, DLOCATION)
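The decomposition option can be sketched as a small transformation that moves the multi-valued attribute into its own relation keyed by (DNUMBER, DLOCATION). The department rows and the helper `to_1nf` are hypothetical, chosen only to illustrate the split.

```python
# Hypothetical non-1NF DEPARTMENT rows: DLOCATIONS holds a set of values.
department = [
    {"DNAME": "Research", "DNUMBER": 5, "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNAME": "Administration", "DNUMBER": 4, "DLOCATIONS": ["Stafford"]},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DLOCATIONS": ["Houston"]},
]


def to_1nf(rows):
    """Split off DLOCATIONS into DEPT_LOCATIONS(DNUMBER, DLOCATION)."""
    dept = [{k: v for k, v in row.items() if k != "DLOCATIONS"} for row in rows]
    dept_locations = [
        (row["DNUMBER"], location)
        for row in rows
        for location in row["DLOCATIONS"]
    ]
    return dept, dept_locations


dept, dept_locations = to_1nf(department)
print(dept)
print(dept_locations)
```

Each department now appears once in the first relation, and every location is a single atomic value in the second.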

Expand the key to have a Separate Tuple in the DEPARTMENT relation for each location (below)

Introduce DLOC1, DLOC2, DLOC3, if there are Three Maximum Locations Problems with Each? Best Solution? DLOC1 DLOC2 DLOC3 Storrs Vernon Storrs Vernon Bolton Coventry Bolton Vernon

How Does Transition Occur?

Another 1NF Example - Nested Relations
EMP_PROJ - Table and Tuples Transition to:

Another 1NF Example – Redundant data
Transition to: Smith, John B. English, Joyce A. Wong, FrankLin T Wong, FrankLin T

Second Normal Form (2NF)
Second Normal Form Focuses on the Concepts of Primary Keys and Full Functional Dependencies Intuitively: A Relation Schema R is in Second Normal Form (2NF) if Every Non-Prime Attribute A in R is Fully Functionally Dependent on the Primary Key R can be Decomposed into 2NF Relations via the Process of 2NF Normalization Successful Process Typically Involves Decomposing R into Two or More Relations Iteratively Applying to Each Relation in Schema

Full Functional Dependency
Full FD - Formally: Given R(U) and X, Y ⊆ U. If X -> Y holds, and there exists no X’ such that X’ ⊂ X and X’ -> Y holds over R, then Y is fully dependent on X, denoted as X -> Y with an f over the arrow Full FD - Intuitively: An FD X -> Y where Removal of any Attribute from X means the FD no Longer Holds {SSN, PNUMBER} -> HOURS is Full since Neither SSN -> HOURS nor PNUMBER -> HOURS holds {SSN, PNUMBER} -> ENAME is Partial since {SSN} -> ENAME (conversely, PNUMBER alone DOES NOT determine ENAME)

Partial Functional Dependency
Partial FD - Formally: Given R(U) and X, Y ⊆ U. If X -> Y holds but Y is not fully dependent on X, then Y is partially functionally dependent on X, denoted by X -> Y with a p over the arrow Partial FD - Intuitively: Removal of an Attribute from the L.H.S. still Results in a Valid FD {SSN, PNUMBER} -> ENAME is Partial since Removing PNUMBER still Results in the Valid FD SSN -> ENAME

What are Examples of Full and Partial FDs?
COURSES (DNUM, CNUM, DNAME, CTITLE) with Key {DNUM, CNUM} and two FDs: {DNUM, CNUM} -> {DNAME, CTITLE} - Holds on the Full Key {DNUM} -> {DNAME} – Partial since DNAME Depends on only “Part” of Key CUSTOMER(CUSTID, NAME, ORDERID) with Key {CUSTID, ORDERID} {CUSTID, ORDERID} -> {NAME} - Holds on the Full Key {CUSTID} -> {NAME} – Partial since NAME Depends on only “Part” of Key

What are Examples of Full and Partial FDs?
CLASS (COURSE#, STUDID, STUDNAME, FACID, SCHED, ROOM, GRADE) FDs on the Full Key: {COURSE#, STUDID} → {STUDNAME} {COURSE#, STUDID} → {FACID} {COURSE#, STUDID} → {SCHED} {COURSE#, STUDID} → {ROOM} {COURSE#, STUDID} → {GRADE} Partial FDs: Depend on only part of Key COURSE# → FACID COURSE# → SCHED COURSE# → ROOM STUDID → STUDNAME

Second Normal Form (2NF)
Formal 2NF Definition R ∈ 2NF iff (i) R ∈ 1NF; (ii) all Non-Key Attributes in R are Fully Functionally Dependent on Every Key. Alternative Definition: R ∈ 2NF iff Each Attribute is Either Part of a Candidate Key, or Fully Dependent on Every Key. Reason: Partial Functional Dependencies may cause Update Problems Objective: Identify and Remove Partial FDs!

Another Way to View the Problem
If the Primary Key Contains a Single Attribute, then No Need to Test for Problems This is 1NF but not 2NF since Ename, a non-prime attribute in FD2, Violates 2NF since it Depends on Part of Key (SSN) Pname and Plocation, two non-prime attributes in FD3, Violate 2NF since they Depend on Part of Key (Pnumber)

One Example of 2NF Consider the Example Below
STUDENT_DEPT(S#, DName, DHead, CN, Grade) fd1: {S#, CN} -> Grade fd2: S# -> DName fd3: DName -> DHead STUDENT_DEPT ∈ 1NF But STUDENT_DEPT ∉ 2NF {S#, CN} -> {DName, DHead} is a Partial FD since S# -> DName; the Partial FD causes Anomalies

Recall the Anomalies… STUDENT_DEPT( S# , DName, DHead, CN, Grade)
Insertion Anomalies: No Department Can Be Recorded if it has No Student Who Enrolls in Courses Deletion Anomalies: Deleting the Last Student in a Department will also Delete the Department Update Anomalies: Changing the Head of a Department must Modify All Students in that Department Due to Redundancies

One Example of 2NF (Continued)
Decomposition into 2NF by Separating Course Information from Department Information (Link S#) Note: Original FDs Maintained S_D(S#, DName, DHead) with fd2: S# -> DName and fd3: DName -> DHead S_C(S#, CN, Grade) with fd1: {S#, CN} -> Grade

Another Example of 2NF EMP_PROJ is 1NF with Key SSN, PNUMBER but…
SSN -> ENAME - Means ENAME, a Non-Prime Attribute, Depends Partially on {SSN, PNUMBER}, i.e., Depends on Only SSN and not Both PNUMBER -> {PNAME, PLOCATION} - Means PNAME, PLOCATION, two Non-Prime Attributes, Depend Partially on {SSN, PNUMBER}, i.e., Depend on Only PNUMBER and not Both

Another Example of 2NF What Does Decomposition Below Accomplish?
ENAME Fully Dependent on SSN PNAME, PLOC Fully Dependent on PNUMBER Result: 2NF for EP1, EP2, and EP3
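The 2NF test can be sketched as a search for non-prime attributes determined by a proper subset of a candidate key. The helpers `closure` and `partial_dependencies` are hypothetical names, candidate keys are supplied by hand rather than computed, and the attribute lists for EP1 and EP2 are assumed from the FDs stated for EMP_PROJ.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(attributes, keys, fds):
    """List (part_of_key, non_prime_attrs) pairs that violate 2NF."""
    prime = set().union(*keys)
    non_prime = set(attributes) - prime
    found = []
    for key in keys:
        # Try every proper, non-empty subset of the key.
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                dependents = closure(part, fds) & non_prime
                if dependents:
                    found.append((set(part), dependents))
    return found


emp_proj_fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
violations = partial_dependencies(EMP_PROJ, [{"SSN", "PNUMBER"}], emp_proj_fds)

# After decomposition (attribute lists assumed from the FDs above).
ep1 = partial_dependencies({"SSN", "PNUMBER", "HOURS"}, [{"SSN", "PNUMBER"}],
                           [({"SSN", "PNUMBER"}, {"HOURS"})])
ep2 = partial_dependencies({"SSN", "ENAME"}, [{"SSN"}], [({"SSN"}, {"ENAME"})])

print(violations)
print(ep1, ep2)
```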

Yet Another Example of 2NF
LOTS(PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE) {COUNTY_NAME, LOT#} - Candidate Key Consider 1NF LOTS to Track Building Lots for Towns What is the 2NF Problem? FD3: COUNTY_NAME -> TAX_RATE Means TAX_RATE Depends Partially on Candidate Key {COUNTY_NAME, LOT#} All Other Non-Prime Attributes are Fine

What Does Decomposition Below Accomplish? TAX_RATE Fully Dependent on COUNTY_NAME Result: 2NF for LOTS1 and LOTS2

Third Normal Form (3NF)
Third Normal Form Focuses on the Concepts of Primary Keys and Transitive Functional Dependencies Intuitively: A Relation Schema R is in Third Normal Form (3NF) if it is in 2NF and no Non-Prime Attribute A in R is Transitively Dependent on the Primary Key R can be Decomposed into 3NF Relations via the Process of 3NF Normalization In X -> Y and Y -> Z, with X as the Primary Key, there is a Problem only if Y is not a Candidate Key. EMP(SSN, Emp#, Salary), SSN -> Emp# -> Salary isn’t a Problem Since Emp# is a Candidate Key

Transitive Partial FDs
Transitive FD - Formally: Given R(U) and X, Y, Z ⊆ U. If X -> Y, Y ⊈ X, Y does not determine X, and Y -> Z, then Z is called transitively functionally dependent on X. Transitive FD - Intuitively: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z SSN -> ENAME is non-transitive Since there is no set of Attributes X where SSN -> X and X -> ENAME For FD X -> Z that can be derived from two FDs X -> Y and Y -> Z, if Y is a Candidate Key – No Problem

Third Normal Form (3NF)
Formal 3NF Definition R ∈ 3NF iff (i) R ∈ 2NF; (ii) No Non-Key Attribute of R is Transitively Dependent on any Candidate Key. Alternative Definition: R ∈ 3NF iff for every nontrivial FD X -> Y, either X is a superkey, or Y is a key (prime) attribute. Reason: Transitive Functional Dependencies may cause Update Problems

One Example of 3NF
STUDENT_DEPT(S#, DName, DHead, CN, Grade) ∉ 2NF S_D(S#, DName, DHead) ∈ 2NF, S_D ∉ 3NF S_C(S#, CN, Grade) ∈ 2NF, S_C ∈ 3NF “S# -> DHead” is a Transitive FD in S_D and “DHead” is a non-key attribute since S# (X) -> DName (Y) and DName (Y) -> DHead (Z) imply S# -> DHead fd1: {S#, CN} -> Grade fd2: S# -> DName fd3: DName -> DHead

One Example of 3NF
S_C(S#, CN, Grade) ∈ 2NF S_D(S#, DName, DHead) ∈ 2NF with fd2: S# -> DName, fd3: DName -> DHead, and the Transitive fd: S# -> DHead Decompose to Eliminate the Transitivity Within S_D: S_D(S#, DName) and DEPT(DName, DHead), both ∈ 3NF

Another Example of 3NF
EMP_DEPT is 2NF with Key SSN, but there are Two Transitive Dependencies (Undesirable) SSN -> DNUMBER and DNUMBER -> DNAME Means DNAME, Neither Key Nor Subset of Key, is Transitively Dependent on SSN SSN is the Only Candidate Key of EMP_DEPT! Note: Also Similar Problem with SSN and DMGRSSN via DNUMBER SSN -> DNUMBER & DNUMBER -> DMGRSSN

Another Example of 3NF To Attain 3NF, Decompose into ED1 and ED2
Intuitively - we are Separating Out Employees and Departments from One Another

Recall 2NF Solution for Building Lots Problem
What is the 3NF Problem? Violates the Alternative Definition
In LOTS1, FD4: AREA → PRICE, where AREA is not a Superkey and PRICE is not a Prime Attribute of LOTS1
PROPERTY_ID# → AREA & AREA → PRICE

Decompose to Introduce a Separate Key AREA Result: 3NF for LOTS1A and LOTS1B

1NF and 2NF – Maintain FDs!

Transition to 3NF – Maintain FDs!

Summary of Progression – Maintain FDs!
[Figure: STUDENT_DEPT(S#, DName, DHead, CN, Grade) in 1NF with fd1, fd2, fd3; Eliminating Partial FDs gives S_C(S#, CN, Grade) and S_D(S#, DName, DHead) in 2NF; Eliminating Transitive FDs gives S_C(S#, CN, Grade), S_D(S#, DName), and DEPT(DName, DHead) in 3NF]

Another Example – Un-normal to 1NF
Color is Multi-Valued – not allowed

Another Example – Un-normal to 1NF
For 2NF – Split into Two tables

Another Example – From 1NF to 2NF
CustomerID, StoreID - Composite Primary Key
CustomerID, StoreID → PurchaseLocation
StoreID → PurchaseLocation – Partial FD since PurchaseLocation Depends on Only Part of Key

BookID → GenreID
GenreID → GenreType
BookID → GenreID & GenreID → GenreType: GenreType is Transitively Dependent on BookID through GenreID

One More Example – 1NF to 2NF
Symbol, Date – Primary Key
FD1: Symbol, Date → Company, Headquarters, Close Price
FD2: Symbol → Company, Headquarters
Note Company and Headquarters Depend on Part of Key (Symbol)
One Solution is to Create Two Tables: Company (Company, Symbol, Headquarters) and Stock_Prices (Symbol, Date, Close_Price)

One More Example – 2NF to 3NF
Symbol – Primary Key
FD1: Symbol → Company
FD2: Company → Headquarters
Symbol → Company & Company → Headquarters: Headquarters is Transitively Dependent on the Key through a Non-Key Attribute (Company)
One Solution is to Create Two Tables: Stock_Symbols (Company, Symbol) and Company_Headquarters (Company, Headquarters)

Summary of 1NF, 2NF, 3NF Concepts
1NF
Test: Relation should have no nonatomic attributes or nested relations.
Remedy (Normalization): Form new relations for each nonatomic attribute or nested relation.

2NF
Test: For relations where primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
Remedy (Normalization): Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.

3NF
Test: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
Remedy (Normalization): Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).

Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form Focuses on Searching for Remaining Anomalies that can Arise in FDs
Intuitively: A Relation Schema R is in Boyce-Codd Normal Form (BCNF) if Whenever a Nontrivial FD X → A Holds in R, then X is a Superkey of R
R can be Decomposed into BCNF Relations via the Process of BCNF Normalization
There exist Relations that are in 3NF but not in BCNF
The Goal is to have each Relation in BCNF (or 3NF)

Boyce-Codd Normal Form (BCNF)
Formal BCNF Definition R BCNF iff (i) R 1NF; (ii) for every FD X  Y, X is a Superkey, i.e., if X  Y and YX, then X Contains a Key. Properties of BCNF R BCNF iff for every FD X  Y, either All Non-key Attributes Fully Dependent on Every Key All Key Attributes Fully Dependent on the Keys that they do not Belong to No Attribute Fully Dependent on any Set of Non-key Attributes

Comparing the Normal Forms
[Figure: Progression of Normal Forms, annotated “Poor Relational Schema Design” and “Developed as Stepping Stone”]
1NF → 2NF: Eliminate Partial FDs of Non-key Attributes to Key
Eliminate the Non-trivial Functional Dependencies of Non-key Attributes to Key
2NF → 3NF: Eliminate Transitive FDs of Non-key Attributes to Key
3NF → BCNF: Eliminate Partial and Transitive FDs of Key Attributes to Key
Most 3NF Relations are in BCNF - BCNF Eliminates All Update Anomalies Caused by FDs

One Example of BCNF Recall 3NF Solution for Building Lots Problem
Suppose that AREA is Size in Acres, with AREAs in Tolland County 0.5, 0.6, …, 1.0 and AREAs in Windham County 1.1, 1.2, …, 2.0
Adding FD5: “AREA → COUNTY_NAME”
What Does Data in LOTS1A Look like for a Given Set of Properties?

One Example of BCNF What is the Problem Here? What if you Delete W11?
[Table: sample LOTS1A(PROPERTY_ID#, COUNTY_NAME, LOT#, AREA) tuples for Tolland and Windham properties]
You have “Lost” the “Windham, 1.1” Combination
Also - Redundancy since “County Name, Area” is Repeated in Multiple Tuples Throughout LOTS1A
Even Though LOTS1A is in 3NF - Still Problems
Problems with FD5: “AREA → COUNTY_NAME”

Transition to BCNF – Maintain FDs!
Add new FD5

One Example of BCNF FD5: “AREA  COUNTY_NAME”
Satisfies 3NF: COUNTY_NAME is Prime Attribute Violates BCNF: AREA not a SuperKey of LOTS1A So Do One More Split
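The BCNF test for LOTS1A can be sketched as a closure computation. This is an illustrative sketch under assumptions: only FD5 appears on the slides above, while FD1 (PROPERTY_ID# → COUNTY_NAME, LOT#, AREA) and FD2 ({COUNTY_NAME, LOT#} → PROPERTY_ID#, AREA) are assumed here from the textbook's LOTS example. The function names are hypothetical, and the FDs of the two split relations are written out by hand rather than computed by projection.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema, fds):
    """Nontrivial FDs whose left side is not a superkey of schema."""
    return [(set(lhs), set(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != schema]


def fd(lhs, rhs):
    """Helper: build an FD from two space-separated attribute strings."""
    return frozenset(lhs.split()), frozenset(rhs.split())


LOTS1A = frozenset({"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"})
lots1a_fds = [
    fd("PROPERTY_ID#", "COUNTY_NAME LOT# AREA"),  # FD1 (assumed)
    fd("COUNTY_NAME LOT#", "PROPERTY_ID# AREA"),  # FD2 (assumed)
    fd("AREA", "COUNTY_NAME"),                    # FD5 from the slide
]
print(bcnf_violations(LOTS1A, lots1a_fds))  # [({'AREA'}, {'COUNTY_NAME'})]

# After one more split, each relation is checked against its own FDs.
LOTS1AX = frozenset({"PROPERTY_ID#", "LOT#", "AREA"})
LOTS1AY = frozenset({"AREA", "COUNTY_NAME"})
print(bcnf_violations(LOTS1AX, [fd("PROPERTY_ID#", "LOT# AREA")]))  # []
print(bcnf_violations(LOTS1AY, [fd("AREA", "COUNTY_NAME")]))        # []
```

Only FD5 is flagged in LOTS1A, since the closure of AREA is {AREA, COUNTY_NAME} rather than the whole relation. Note that FD2 no longer fits inside either split relation, which is the price paid for reaching BCNF here.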

One Example of BCNF
[Tables: LOTS1A(PROPERTY_ID#, COUNTY_NAME, LOT#, AREA) Split into LOTS1AX(PROPERTY_ID#, LOT#, AREA) and LOTS1AY(AREA, COUNTY_NAME)]

Another Example of BCNF
Consider the TEACH Relation, TEACH(STUDENT, COURSE, INSTRUCTOR): in 3NF but NOT BCNF with
FD1: {STUDENT, COURSE} → INSTRUCTOR
FD2: INSTRUCTOR → COURSE
3 Possible Decompositions of TEACH:
1. T1(STUDENT, INSTRUCTOR), T2(STUDENT, COURSE)
2. T1(COURSE, INSTRUCTOR), T2(COURSE, STUDENT)
3. T1(INSTRUCTOR, COURSE), T2(INSTRUCTOR, STUDENT)
All Three “Lose” FD1! 3rd is Best Since After Join, Recaptures FD1 and Doesn’t Generate any Spurious Tuples
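Why the third decomposition avoids spurious tuples can be checked with the standard binary lossless-join test: a split into R1 and R2 is lossless when the common attributes functionally determine all of R1 or all of R2. The sketch below applies that test to the three TEACH decompositions; the function names are hypothetical, not part of the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) must determine R1 or R2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


teach_fds = [
    (frozenset({"STUDENT", "COURSE"}), frozenset({"INSTRUCTOR"})),  # FD1
    (frozenset({"INSTRUCTOR"}), frozenset({"COURSE"})),             # FD2
]
decompositions = [
    (frozenset({"STUDENT", "INSTRUCTOR"}), frozenset({"STUDENT", "COURSE"})),
    (frozenset({"COURSE", "INSTRUCTOR"}), frozenset({"COURSE", "STUDENT"})),
    (frozenset({"INSTRUCTOR", "COURSE"}), frozenset({"INSTRUCTOR", "STUDENT"})),
]
results = [is_lossless_binary(t1, t2, teach_fds) for t1, t2 in decompositions]
print(results)  # [False, False, True]
```

In the third case the shared attribute is INSTRUCTOR, whose closure {INSTRUCTOR, COURSE} covers T1, so the join cannot introduce spurious tuples.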

What Does Table Look Like?
Note TEACH in 3NF but NOT BCNF

Reflections on Normalization
A Tool for Validating the Quality of the Schema, Rather than Merely as a Method for Designing a Relational Schema Promotes Each Concept of the Application Domain Mapping to Exactly One Concept of the Schema Normalization Process Actually a Process of Concept Separation Concept Separation is Result of Applying a Top-down Methodology for Producing a Schema Via Subsequent Refinements and Decompositions

Relational DB Design Process
Normalization Process Focused on Decomposition Raises Number of Questions How do we Decompose a Schema into a Desirable Normal Form? What Criteria Should the Decomposed Schemas Follow in order to Preserve the Semantics of the Original Schema? Can we Guarantee the Decomposition’s Quality? Can we Prevent the “Loss” of Information? Are Dependencies Maintained in Decomposition?

Recall Transitive FD/Update Anomalies
R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}

S#   DName  DHead
S1   D1     John
S2   D1     John
S3   D2     Smith
S4   D3     Black

“S# → DHead” is a Transitive FD
When S4 Graduates, Head Information of D3 is Lost
Similarly, If D5 has No Students Yet, then the Head Information cannot be Stored in this Database
Updating the Head of Any Department Requires an Update to Every Student Enrolled in the Dept.

What are Possible Decompositions?
R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}
Projections: S# (S1, S2, S3, S4), DName (D1, D2, D3), DHead (John, Smith, Black)
ρ1 = { R1({S#}, ∅), R2({DName}, ∅), R3({DHead}, ∅) } is Neither Lossless nor FD-Preserving

R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}
Projections: R1(S#, DName) and R2(S#, DHead)
ρ2 = { R1({S#, DName}, {S# → DName}), R2({S#, DHead}, {S# → DHead}) } is Lossless but not FD-Preserving
DName → DHead is Lost in the Decomposition

R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}
Projections: R1(S#, DName) and R3(DName, DHead)
ρ3 = { R1({S#, DName}, {S# → DName}), R3({DName, DHead}, {DName → DHead}) } is both Lossless and FD-Preserving
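Whether a decomposition preserves dependencies can also be tested without computing projected FD sets explicitly. The sketch below uses the usual textbook procedure (an assumption about which test to apply, since the slides only state the results): for each FD X → Y, grow Z from X by repeatedly adding (Z ∩ Ri)+ ∩ Ri for every component Ri, and the FD is preserved when Y ends up inside Z. The name `lost_dependencies` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def lost_dependencies(decomposition, fds):
    """Return the FDs of fds that cannot be enforced within the components alone."""
    lost = []
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # What Z's part inside Ri determines, restricted to Ri.
                gained = (closure(z & ri, fds) & ri) - z
                if gained:
                    z |= gained
                    changed = True
        if not rhs <= z:
            lost.append((set(lhs), set(rhs)))
    return lost


F = [
    (frozenset({"S#"}), frozenset({"DName"})),
    (frozenset({"DName"}), frozenset({"DHead"})),
]
# Second decomposition above: (S#, DName) and (S#, DHead)
second = [frozenset({"S#", "DName"}), frozenset({"S#", "DHead"})]
# Third decomposition above: (S#, DName) and (DName, DHead)
third = [frozenset({"S#", "DName"}), frozenset({"DName", "DHead"})]

print(lost_dependencies(second, F))  # [({'DName'}, {'DHead'})]
print(lost_dependencies(third, F))   # []
```

The output agrees with the slides: splitting on S# loses DName → DHead, while splitting on DName keeps both FDs.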

Summary of Normalization
1NF → 2NF: Eliminate the Partial Functional Dependencies of Non-prime Attributes to Key Attributes
2NF → 3NF: Eliminate the Transitive Functional Dependencies of Non-prime Attributes to Key Attributes
3NF → BCNF: Eliminate the Partial and Transitive Functional Dependencies of Prime (Key) Attributes to Key
3NF: Lossless Decomposition and Dependency Preserving
BCNF: Lossless Decomposition but not Dependency Preserving

The Entire Normalization Picture
1NF → 2NF: Eliminate Partial FDs of Non-prime Attributes to Key
2NF → 3NF: Eliminate Transitive FDs of Non-prime Attributes to Key
3NF → BCNF: Eliminate Partial and Transitive FDs of Prime Attributes to Key
BCNF → 4NF: Eliminate Non-trivial and Non-functional Multi-Valued Dependencies
4NF → 5NF: Eliminate Join Dependencies that are Not Implied by Candidate Key

What are Multi-Valued Dependencies?
Focused on the Concept of Multi-Valued Dependencies
An MVD X →→ Y Indicates that a Value of X Corresponds to Multiple Values of Y
Consider EMP with MVDs:
ENAME →→ PNAME (E works on many P)
ENAME →→ DNAME (E has many Dependents)

What is Fourth Normal Form (4NF)?
A Relation Schema R is in Fourth Normal Form (4NF) w.r.t. Dependencies F (FD and MVD) if for every Non-Trivial MVD X →→ Y in F+, X is a Superkey for R
Reconsider EMP with MVDs:
ENAME →→ PNAME (E works on many P)
ENAME →→ DNAME (E has many Dependents)
ENAME is Not a Superkey of R since we Need the Triple of ENAME, PNAME, and DNAME to Distinguish Tuples
We need to Decompose EMP!

Decomposition into 4NF
ENAME →→ PNAME is a Trivial MVD: ENAME ∪ PNAME is Equal to EMP_PROJECTS (same for ENAME →→ DNAME)
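The MVD condition can be tested directly on a relation instance: X →→ Y holds if, for any two tuples that agree on X, the tuple that takes its X and Y values from the first and its remaining values from the second is also present. The sketch below uses hypothetical sample rows for EMP (the names and values are illustrative, not taken from the slides) and a hypothetical second projection on (ENAME, DNAME), then confirms that joining the two projections returns exactly the original tuples.

```python
def satisfies_mvd(rows, x, y):
    """Check X ->> Y on a relation instance given as a list of dicts."""
    attrs = set(rows[0])
    xy = set(x) | set(y)
    rest = attrs - xy
    present = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes X and Y from t1 and every other attribute from t2.
                t3 = {a: t1[a] for a in xy}
                t3.update({a: t2[a] for a in rest})
                if tuple(sorted(t3.items())) not in present:
                    return False
    return True


# Hypothetical EMP instance: one employee, two projects, two dependents.
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(satisfies_mvd(emp, ["ENAME"], ["PNAME"]))      # True
print(satisfies_mvd(emp[:3], ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) is missing

# Project onto the two 4NF relations and join them back on ENAME.
emp_projects = {(r["ENAME"], r["PNAME"]) for r in emp}
emp_dependents = {(r["ENAME"], r["DNAME"]) for r in emp}
rejoined = {(e, p, d) for (e, p) in emp_projects
            for (e2, d) in emp_dependents if e == e2}
original = {(r["ENAME"], r["PNAME"], r["DNAME"]) for r in emp}
print(rejoined == original)  # True
```

The four rows in EMP collapse to two rows in each projection, which is the redundancy that the 4NF decomposition removes.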

What about the Supply Table?
In 4NF But Not in 5NF since: Supplier supplies Parts, Supplier supplies Projects, & Parts Used on Projects Removes Join Dependencies – Many-many-many

Concluding Remarks What have we Learned in Chapter 14?
Guidelines for “Good” Relational Design Avoiding Anomalies Functional Dependencies Augment Schema Normalization “Improves” Design Lossless Joins and Dependency Preservation Quick Look at 4NF (Informally) How is Chapter 14 Related to the Semester Project? Phase II in the Semester Project Step 1: ER to Relational Transformation (Chapter 9) Step 2: Relational Normalization (Chapter 14) which Includes Identification of FDs!

Suggest you Practice on
Problem th ed Problem th ed Problem th ed Problem th ed Problems 3, 4, and 5 in Spring 2015 Midterm Exam Problems 3 and 4 in Fall 2015 Midterm Exam We will Review in Class on Thursday!

Other Problems from Textbook Problem 15.29 6th ed
Consider the table: Orders (O#, I#, Odate, Cust#, Total_amount, Qty_ordered, Total_price, Discount%)
What are the Functional Dependencies?
O# → Odate, Cust#, Total_amount
O#, I# → Qty_ordered, Total_price, Discount%
Is it in 2NF? No! Odate, Cust#, Total_amount are Partially Dependent on only Part of the Key {O#, I#}, namely O#

How do you Fix the Single Orders Table?
Create Two Tables: Order (O#, Odate, Cust#, Total_amount) and OrderedItem (O#, I#, Qty_ordered, Total_price, Discount%)
Is it in 3NF? Yes – there are No Transitive Functional Dependencies X → Y and Y → Z, with X as the Primary Key, where Y is not a Candidate Key
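The 2NF test for Orders can be sketched as a search over proper subsets of the key: a non-key attribute is partially dependent if some proper subset of the key already determines it. The function name `partial_dependencies` is hypothetical, and the FD sets for the two new tables are written by hand from the two FDs of the problem.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def partial_dependencies(schema, key, fds):
    """Map each non-key attribute to the proper key subsets that determine it."""
    found = {}
    for attr in sorted(schema - key):
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if attr in closure(part, fds):
                    found.setdefault(attr, []).append(set(part))
    return found


order_fd = (frozenset({"O#"}), frozenset({"Odate", "Cust#", "Total_amount"}))
item_fd = (frozenset({"O#", "I#"}),
           frozenset({"Qty_ordered", "Total_price", "Discount%"}))

ORDERS = frozenset({"O#", "I#", "Odate", "Cust#", "Total_amount",
                    "Qty_ordered", "Total_price", "Discount%"})
before = partial_dependencies(ORDERS, frozenset({"O#", "I#"}), [order_fd, item_fd])
print(before)  # Cust#, Odate, and Total_amount each depend on {'O#'} alone

# After the split, neither table has a partial dependency left.
ORDER = frozenset({"O#", "Odate", "Cust#", "Total_amount"})
ORDERED_ITEM = frozenset({"O#", "I#", "Qty_ordered", "Total_price", "Discount%"})
print(partial_dependencies(ORDER, frozenset({"O#"}), [order_fd]))               # {}
print(partial_dependencies(ORDERED_ITEM, frozenset({"O#", "I#"}), [item_fd]))  # {}
```

A single-attribute key has no proper non-empty subset, so Order is trivially free of partial dependencies.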

Consider the table: CAR_SALE(Car#, Date_sold, Salesman#, Commission%, Discount_amt)
Assumptions: Car can be Sold by Multiple Salespersons; Thus, Primary Key: Car#, Salesman# w/ FDs
Car# → Date_sold
Car# → Discount_amt
Car# → Salesman#
Salesman# → Commission%
Is it in 2NF? No! Car# → Date_sold and Car# → Discount_amt are not FFD on Primary Key – since they Depend on Only Part

Other Problems from Textbook Problem 15.30
How Do you Convert to 2NF? Recall FDs:
Car# → Date_sold
Car# → Discount_amt
Car# → Salesman#
Salesman# → Commission%
Split into Three Tables: CAR1(Car#, Date_sold, Discount_amt), CAR2(Car#, Salesman#), CAR3(Salesman#, Commission%)

Consider the table: TreatPatient (Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge)
Assumptions and FDs:
Patient Treated by Physician on Date with a Diagnosis, Treatment Code, and Charge
Every Treatment Code has a Charge
{Doctor#, Patient#, Date} → {Diagnosis, Treat_code, Charge}
{Treat_code} → {Charge}
Is it in 3NF? No - Convert

Other Problems from Textbook Problem 15.33
What do you look for? Transitivity What is the Problem? Charge a non-Key Attribute is Dependent on Treat Code another non-Key Attribute Split into Two Tables FDs Now are Fine TreatPatient (Doctor#, Patient#, Date, Diagnosis, Treat_code) BillingAmount(Treat_code, Charge)

Problem 15.35 Given Table Below - What is Candidate Key?
BOOK (Book_Name, Author, Edition, Year)
Candidate Key: Book_Name, Author, Edition
Why? Sometimes an Edition Issues Twice in 1 Year
What is the main FD? Book_Name, Edition → Year
Is it in 2NF? No: Since Year is Dependent on only Part of the Key
Convert: BOOK (Book_Name, Author, Edition) and BOOK_YEAR (Book_Name, Edition, Year)

Problem 15.35 What are the Multi-Valued Dependencies?
BOOK(Book_Name, Author, Edition), BOOK_YEAR(Book_Name, Edition, Year)
What are the Multi-Valued Dependencies? How Do You Separate the Dependencies?
Book_Name →→ Author
Book_Name →→ Edition
SPLIT INTO THREE TABLES: BOOK (Book_Name, Edition), BOOK_AUTHOR (Book_Name, Edition, Author), BOOK_YEAR (Book_Name, Edition, Year)

Review of Fall 2015 Midterm PLAYER(PLName, PFName, StartYear, NumYears, UniformNumber); COACH(CLName, CFName, StartYear, EndYear); TEAM (TeamID, Year, Squad); ROSTERS(TeamID, PLName, CLName); RSRECORD(TeamID, Wins, Losses); PORECORD(TeamID, Wins, Losses); STATISTICS(PLName, TeamID, PPG, RPG, APG); TITLES(TeamID, TitleType);

Problem 3: Define FDs a. Define functional dependencies (FDs) for ONLY the tables Computer, Inventory, and SoftwareVendor. Computer(CInventNum, ComputerName, ComputerType, AccID) Inventory(InvenNum, SerialNum, PONum, PODate, DeliveredDate, POCost, VendorID) SoftwareVendor (SVendorID, SVName, SVAddr, SWName, SWVersion, SWDesc) b. Identify all multi-valued dependencies in ONLY the table SoftwareVendor. SoftwareVendor (SVendorID, SVName, SVAddr, SWName, SWVersion, SWDesc)

Problem 3- Functional Dependencies (Spr15)
Computer(CInventNum, ComputerName, ComputerType, AccID);
CInventNum → ComputerName, ComputerType
CInventNum, AccID → ComputerName, ComputerType

Inventory(InvenNum, SerialNum, PONum, PODate, DeliveredDate, POCost, VendorID);
PONum → InventNum, SerialNum, PODate, DeliveredDate, etc.
InventoryNum → SerialNum (or some other one not involving PO)

SoftwareVendor(SVendorID, SVName, SVAddr, SWName, SWVersion, SWDesc);
SVendorID → SVName, SVAddr
SVendorID, SVName, SWName, SWVersion → SWDesc, SWAddr
SVendorID →→ SWName (first two basically equivalent)
SVName →→ SWName
SWName →→ SWVersion

Prob 4- Relational Schema Analysis (Spr15)
Analyze the Schema w.r.t. “What is a Good DB Schema?” (Slide 14-7) and the four Guidelines (Slides 14-11, 14-21, 14-26/27, and 14-39). Consider the Cases of: Represent a Single Entity, Redundant Information, Update/Insert/Delete Anomalies, Null Values, Spurious Tuples

Computer Accessories Schema
Computer(CInventNum, ComputerName, ComputerType, AccID); Accessory(AccID, AInventNum, HVendorID, AccName, AccType, AccSize); Software(SInventNum, SVendorID, SWName, SWVersion); Inventory(InvenNum, SerialNum, PONum, PODate, DeliveredDate, POCost, VendorID); InstalledSoftware(CInventNum, SWInventNum); HardwareVendor (HVendorID, HVName, HVAddr, ModelNum, ModelName, ModelDescr); SoftwareVendor (SVendorID, SVName, SVAddr, SWName, SWVersion, SWDesc); Vendor(VendorID, HWFlag, HVendorID, SWFlag, SVendorID);

Prob 4 – Relational Schema Analysis (Spr15)
Computer(CInventNum, ComputerName, ComputerType, AccID);
Insert - For a new computer, you must always insert an accessory (since it is part of the key). If there are N accessories, there are N rows for each computer.
Update – if you change the Name or Type, you must change all N tuples.
Delete - No obvious delete anomalies.
Conclusion: Computer represents two different entities (Guideline 1) – the computer and its accessories, and as a result, violates Guideline 2 with regard to insert anomalies. A better design would separate accessories in a similar manner to the Installed Software table.

Accessory(AccID, AInventNum, HVendorID, AccName, AccType, AccSize);
From an inventory control perspective, there is no way to track the total number of each accessory that has been purchased. You may have 10 USB 120 Gig external hard drives, and each one would have its own AccID and AInventNum. The other problem is related to Guideline 3 due to null values for AccSize (limited problem).
Insert, Delete, and Update: No obvious anomalies.
Conclusion: The table is OK – but it could be improved by separating out the different types of accessories (that have been purchased). It may also make sense not to track these in full detail at all – many companies (UConn included) don’t track equipment that costs less than $1000, and many of these accessories fit into that category.

Software(SInventNum, SVendorID, SWName, SWVersion);
The only real problem in this table is that SVendorID, SWName, and SWVersion are foreign keys into the SoftwareVendor table, and as a result, this information is replicated in both tables.
Conclusion: There may be a better way to design the Software, Installed Software, and SoftwareVendor tables, particularly with regard to reducing the key size (and hence the foreign key linkages).

Inventory(InvenNum, SerialNum, PONum, PODate, DeliveredDate, POCost, VendorID);
This table violates two guidelines: Guideline 1 with regard to representing two different entities (inventory and purchase orders), and Guideline 3 with regard to an excessive number of null values.
Insert, Delete, and Update: No obvious anomalies.
Conclusion: Split into two different tables: Inventory (InvenNum, SerialNum, PONum) and PurchaseOrder (PONum, PODate, DeliveredDate, POCost, VendorID), which will address Guideline 1 and will not result in an Inventory tuple until the item is actually received. DeliveredDate will still be null for all outstanding orders.

InstalledSoftware(CInventNum, SWInventNum); Vendor(VendorID, HWFlag, HWVendorID, SWFlag, SWVendorID);
InstalledSoftware is dealing with two foreign key references to the Computer and Software tables, respectively. Vendor is allowing us to unify the different ID tracking systems for software and hardware vendors. The only problem with Vendor is that there are potentially null values for companies that sell either hardware or software but not both. You could argue that the flags are not needed in Vendor as well, since the null (or non-null) values already carry this information.
Conclusion: In the case of Vendor (and VendorID, SVendorID, HVendorID), this may be a poor design, and if the database has not been deployed, it may make sense to redesign this entirely around a single identifier. This would allow the current Vendor table to be eliminated and the common vendor information to be separated into a single Vendor (VendorID, VName, VAddr). This would eliminate the null value (Guideline 3) problem of Vendor. Thus, changes to Vendor would impact both the HardwareVendor and SoftwareVendor Tables.

HardwareVendor(HVendorID, HVName, HVAddr, ModelNum, ModelName, ModelDescr); SoftwareVendor(SVendorID, SVName, SVAddr, SWName, SWVersion, SWDesc);
Both tables have insert anomalies (can’t insert HWV or SWV without inserting a product), delete anomalies (if you delete the last item for a vendor you delete the vendor), and update anomalies (if you change an address, or a name after a buyout, you need to change multiple tuples) – Guideline 2 is a real issue in this regard.
Both tables are representing two different entities: the contact information for a vendor (id, name, and address) and the vendor’s products – violating Guideline 1, which asks that a relation represent only a single entity. As a result, the keys are convoluted – you need to have a ModelNum for HardwareVendors and a compound key for SoftwareVendors; this makes the foreign key references more complicated.
Conclusion: As mentioned, redesign the tables Vendor, HWVendor, and SWVendor to pull out their commonalities and unify the identifier. Use Vendor as defined on the previous slide, and use VendorID in the HWVendor and SWVendor tables while dropping Name and Addr from those tables.

Problem 5- Normalization (Spr15)
Consider the INVOICE relation in First Normal Form given below with key indicated:
INVOICE(OrderID, OrderDate, CustID, CustName, CustAddr, ProdID, ProdDesc, UnitPrice, OrderedQuantity)
with the functional dependencies:
{OrderID, ProdID} → OrderedQuantity
OrderID → {OrderDate, CustID, CustName, CustAddr}
CustID → {CustName, CustAddr}
ProdID → {ProdDesc, UnitPrice}
a. Identify the Partial, Full, Transitive Dependencies
b. Provide both 2NF and 3NF

Problem 5- Normalization (Spr15)
INVOICE(OrderID, OrderDate, CustID, CustName, CustAddr, ProdID, ProdDesc, UnitPrice, OrderedQuantity)
FULL A. {OrderID, ProdID} → OrderedQuantity
PART B. OrderID → {OrderDate, CustID, CustName, CustAddr}
TRANS C. CustID → {CustName, CustAddr}
PART D. ProdID → {ProdDesc, UnitPrice}

Remove Partial Dependencies (2NF):
ORDER_LINE(OrderID, ProdID, OrderedQuantity)
PRODUCT(ProdID, ProdDesc, UnitPrice)
CUST_ORDER(OrderID, OrderDate, CustID, CustName, CustAddr)

Remove Transitive Dependency in CUST_ORDER (3NF):
ORDER_LINE(OrderID, ProdID, OrderedQuantity)
PRODUCT(ProdID, ProdDesc, UnitPrice)
ORDER(OrderID, OrderDate, CustID)
CUSTOMER(CustID, CustName, CustAddr)
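As an illustration of what the 3NF result buys, the sketch below builds the four tables in an in-memory SQLite database using Python's standard `sqlite3` module, rebuilds the original INVOICE rows with a join, and then changes a customer address in exactly one place. The sample rows and column types are hypothetical; only the table and column names come from the solution above. ORDER is quoted because it is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CustID TEXT PRIMARY KEY, CustName TEXT, CustAddr TEXT);
CREATE TABLE "ORDER" (OrderID TEXT PRIMARY KEY, OrderDate TEXT,
                      CustID TEXT REFERENCES CUSTOMER(CustID));
CREATE TABLE PRODUCT (ProdID TEXT PRIMARY KEY, ProdDesc TEXT, UnitPrice REAL);
CREATE TABLE ORDER_LINE (OrderID TEXT REFERENCES "ORDER"(OrderID),
                         ProdID TEXT REFERENCES PRODUCT(ProdID),
                         OrderedQuantity INTEGER,
                         PRIMARY KEY (OrderID, ProdID));
""")

# Hypothetical sample data: one customer, two orders, two products.
conn.execute("INSERT INTO CUSTOMER VALUES ('C1', 'Ann', '1 Main St')")
conn.executemany('INSERT INTO "ORDER" VALUES (?, ?, ?)',
                 [("O1", "2015-03-01", "C1"), ("O2", "2015-03-02", "C1")])
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?)",
                 [("P1", "Widget", 2.5), ("P2", "Gadget", 4.0)])
conn.executemany("INSERT INTO ORDER_LINE VALUES (?, ?, ?)",
                 [("O1", "P1", 10), ("O1", "P2", 1), ("O2", "P1", 3)])

REBUILD_INVOICE = """
SELECT o.OrderID, o.OrderDate, c.CustID, c.CustName, c.CustAddr,
       p.ProdID, p.ProdDesc, p.UnitPrice, l.OrderedQuantity
FROM ORDER_LINE l
JOIN "ORDER" o ON o.OrderID = l.OrderID
JOIN CUSTOMER c ON c.CustID = o.CustID
JOIN PRODUCT p ON p.ProdID = l.ProdID
ORDER BY o.OrderID, p.ProdID
"""
invoice_rows = conn.execute(REBUILD_INVOICE).fetchall()
print(len(invoice_rows))  # 3 rows, one per order line

# The address is stored once, so a single UPDATE fixes every rebuilt row.
conn.execute("UPDATE CUSTOMER SET CustAddr = '9 Oak Ave' WHERE CustID = 'C1'")
addresses = {row[4] for row in conn.execute(REBUILD_INVOICE)}
print(addresses)  # {'9 Oak Ave'}
```

In the single INVOICE table the same change would have touched three rows, which is the update anomaly that the decomposition removes.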

Review of Fall 2015 Midterm Computer(CInventNum, ComputerName, ComputerType, AccID); Accessory(AccID, AInventNum, HVendorID, AccName, AccType, AccSize); Software(SInventNum, SVendorID, SWName, SWVersion); Inventory(InvenNum, SerialNum, PONum, PODate, DeliveredDate, POCost, VendorID); InstalledSoftware(CInventNum, SWInventNum); HardwareVendor (HVendorID, HVName, HVAddr, ModelNum, ModelName, ModelDescr); SoftwareVendor (SVendorID, SVName, SVAddr, SWName, SWVersion, SWDesc); Vendor(VendorID, HWFlag, HVendorID, SWFlag, SVendorID);

Problem 3 - Update Anomalies (Fa15)
No Modify Anomaly - only one entry per player/coach (unique LNames) Insert Anomaly - Player in past can’t be coach in future - Player or coach leaves for 1 (or more years) and then returns - no way to store his/her return. No Delete Anomaly-only one entry per player/coach (unique LNames) Note: Lots of null values due to capturing two types of people.

There are numerous problems for this table, since many values are replicated. For example, teams that win multiple titles (NCAA and BigEastRS) must have each player and coach listed twice to capture this data. If you insert a player for a past team (that was omitted), you would have to make sure you inserted the player for all Titles of TeamID. Specifically:
Insert: Can’t have a Team without having a player. Can’t have TitleType unless there is a Team with that title
Delete: Last Player on a team - lose the team
Modify: Change PLName, impact all TeamIDs for the player
Basically - I looked for reasoning and a solid argument for this table.

Modify Anomaly - Whenever RSWins or RSLosses is modified (for a win or a loss), TTLWins or TTLLosses must be incremented. No Insert Anomaly - only one entry per team. No Delete Anomaly - no information is lost on anything but the team. No Modify Anomaly - PPG, RPG, and APG are independent of one another and no values in common across different players. No Insert Anomaly - only one entry per player. No Delete Anomaly- no information is lost on anything but the player.

Problem 4 - Functional Dependencies (Fa15)
LName → FName, StartYear
LName, PFlag → NumYears, UniformNumber
LName, CFlag → EndYear
Year, Squad → CLName
CLName → TitleType
TeamID → Year, Squad
PLName → TitleType
CLName → TeamId, Year, Squad
TeamID, Year, Squad → TitleType
TeamID, Year, Squad → PLName

Problem 4 - Functional Dependencies (Fa15)
TeamID → RSWins, RSLosses, TTLWins, TTLLosses
PLName, TeamID → PPG, RPG, APG
PLName →→ PPG, RPG, APG
PLName →→ TeamID
TeamID →→ PLName

Problem 5 (Fa15) - Normalization
NBAPLAYER(Name, Year, Coach, Team, State, Salary)
2NF:
Team Depends on {Year, Name} which is Part of the Key Name, Year, Coach
Salary Depends on Coach which is Part of the Key Name, Year, Coach
5pts NBAPLAYER1(Name, Year, Coach, Salary)
4pts NBAPLAYER2(Name, Year, Team, State)

Problem 5 (Fa15) - Normalization
NBAPLAYER(Name, Year, Coach, Team, State, Salary)
3NF – Look at NBAPLAYER2 TABLE
For Team → State – State Depends on Team which is NOT Part of a Key
Transitive from Name, Year to Team to State
3pts NBAPLAYER2a(Name, Year, Team)
3pts NBAPLAYER2b(Team, State)
