ABSTRACT OF FIRST LECTURE then … the second lesson.


Normalization Ioan Despi

Logical modeling: to develop a “good” description of the data, its relationships and its constraints.
For the relational model: to identify a suitable set of relations.
Normalization = a logical design process that produces a stable set of relations, free of anomalies and with reduced redundancy.
Anomaly = an inconsistent, incomplete, or contradictory state of the database.
Kinds of anomalies: update, insertion, deletion.

A relation is in a specific NF if it satisfies the set of requirements or constraints for that form.
All the NFs are nested: each satisfies the constraints of the previous one but is a “better” form because it eliminates flaws found in the previous one.
[Figure: nested normal forms, from the outermost set to the innermost: all relations, 1NF, 2NF, 3NF, BCNF, 4NF, 5NF]

Design objective: to put the schema in the highest NF that is practical and appropriate for the application.
Normalization = putting a relation into a higher NF using functional dependencies, multivalued dependencies and join dependencies.
Functional dependency: B is functionally dependent on A if each value of A in R has associated with it exactly one value of B in R.
A --> B: “A functionally determines B”, “A determines B” (R.A --> R.B to be more specific).
The (set of) attribute(s) on the left side of the arrow is called the determinant.

STUDENT (Sid, Sname, Major, Credits, Status, SSN)
Sid --> Sname
Sid --> Sname, Major, Credits, Status, SSN
SSN --> Sname, Major, Credits, Status, Sid
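The definition of a functional dependency can be tested against a sample of tuples. The sketch below is only an illustration: the function name `fd_holds` and the three STUDENT tuples are made up for this example and do not come from the slides. Note that a sample can only refute an FD; an FD is a statement about all legal instances, so a passing check does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs --> rhs is not violated by this set of tuples."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # Each determinant value may be associated with exactly one dependent value.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

# Hypothetical sample data (not from the slides).
students = [
    {"Sid": "S1", "Sname": "Smith", "Major": "CSC", "Credits": 90, "Status": "Senior", "SSN": "111"},
    {"Sid": "S2", "Sname": "Smith", "Major": "Math", "Credits": 15, "Status": "Freshman", "SSN": "222"},
    {"Sid": "S3", "Sname": "Jones", "Major": "CSC", "Credits": 95, "Status": "Senior", "SSN": "333"},
]

print(fd_holds(students, ["Sid"], ["Sname"]))    # True
print(fd_holds(students, ["Sname"], ["Major"]))  # False: two Smiths, two majors
```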

Inference rules (Armstrong’s axioms): used to find all the FDs logically implied by a set of FDs.

First normal form (1NF): A relation is in 1NF if and only if every attribute is single-valued for each tuple.
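In practice, the FDs implied by a set of FDs are usually explored through the closure of a set of attributes, which follows from Armstrong’s axioms. The sketch below is a minimal version of that standard computation; the function name `closure` and the representation of FDs as (determinant, dependents) pairs are choices made for this example. It uses two FDs from the STUD relation that appears later in these slides.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, its dependents follow.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"Sid"}, {"Credits"}), ({"Credits"}, {"Status"})]
print(sorted(closure({"Sid"}, fds)))      # ['Credits', 'Sid', 'Status']
print(sorted(closure({"Credits"}, fds)))  # ['Credits', 'Status']
```

Since Status is in the closure of Sid, the FD Sid --> Status is implied (transitivity), even though it was not listed.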

2NF
Class (Course#, Sid, Sname, Facid, Sched, Room, Grade)
FD:
{Course#, Sid} --> Sname, Facid, Sched, Room, Grade
Course# --> Facid, Sched, Room
Sid --> Sname
{Course#, Sid} --> Grade
Definition. Attribute B is fully functionally dependent on a set of attributes A if B is functionally dependent on A but not functionally dependent on any proper subset of A.
Definition. A relation is in 2NF iff it is in 1NF and all the nonkey attributes are fully functionally dependent on the key.

A 1NF relation that is not in 2NF can be transformed into an equivalent set of 2NF relations. We perform projections on the original relation such that it is possible to get back the original by taking the join of the projections.
To make a relation 2NF:
1. Identify each nonfull FD.
2. Form projections by removing the attributes that depend on each of the determinants so identified.
3. Place these determinants in separate relations along with their dependent attributes.
The original relation still contains the composite key and any attributes that are fully functionally dependent on it.

Second Normal Form (2NF)
A relation is in 2NF iff it is in 1NF and all the nonkey attributes are fully functionally dependent on the key.
Class (Course#, Sid, Sname, Facid, Sched, Room, Grade) is replaced by the projections:
CLASS2 (Course#, Sid, Grade)
STUD (Sid, Sname)
COURSE (Course#, Facid, Sched, Room)
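The projection procedure for 2NF can be sketched in a few lines. This is an illustration under simplifying assumptions, not a general algorithm: the function name `to_2nf` is invented here, and it only looks at the FDs that are listed explicitly (it does not derive implied ones).

```python
def to_2nf(attrs, key, fds):
    """Split off attributes that depend on a proper subset of the key."""
    key = frozenset(key)
    remaining = set(attrs)
    projections = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        if lhs < key:                      # proper subset of the key: nonfull FD
            moved = rhs - key
            projections.append(set(lhs | moved))
            remaining -= moved             # dependents leave the original relation
    return [remaining] + projections

class_attrs = {"Course#", "Sid", "Sname", "Facid", "Sched", "Room", "Grade"}
class_fds = [
    ({"Course#"}, {"Facid", "Sched", "Room"}),
    ({"Sid"}, {"Sname"}),
    ({"Course#", "Sid"}, {"Grade"}),
]

for relation in to_2nf(class_attrs, {"Course#", "Sid"}, class_fds):
    print(sorted(relation))
```

The three attribute sets produced correspond to CLASS2, COURSE and STUD.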

3NF
Transitive dependency:
STUD (Sid, Sname, Major, Credits, Status)
Sid --> Credits
Credits --> Status
Sid --> Status
Definition. A relation is in 3NF if it is in 2NF and no nonkey attribute is transitively dependent on the key.
STUD (Sid, Sname, Major, Credits, Status) is replaced by the projections:
STUD2 (Sid, Sname, Major, Credits)
STATS (Credits, Status)
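The same projection idea removes a transitive dependency. The sketch below is a simplified illustration: `split_transitive` is a name invented for this example, it treats any listed FD from nonkey attributes to nonkey attributes as the transitive link, and the first FD in the list (Sid determining every other attribute) is an assumption consistent with the earlier STUDENT example.

```python
def split_transitive(attrs, key, fds):
    """Move attributes determined by nonkey attributes into their own relations."""
    key = set(key)
    remaining = set(attrs)
    new_relations = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if not (lhs & key) and not (rhs & key):   # nonkey determines nonkey
            new_relations.append(lhs | rhs)
            remaining -= rhs                      # the determinant stays behind
    return remaining, new_relations

stud = {"Sid", "Sname", "Major", "Credits", "Status"}
stud_fds = [
    ({"Sid"}, {"Sname", "Major", "Credits", "Status"}),
    ({"Credits"}, {"Status"}),
]

stud2, others = split_transitive(stud, {"Sid"}, stud_fds)
print(sorted(stud2))                 # ['Credits', 'Major', 'Sid', 'Sname']
print([sorted(r) for r in others])   # [['Credits', 'Status']]
```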

3NF: “each nonkey attribute must depend on the key, the whole key and nothing but the key”

Definition. A relation is in BCNF if and only if every determinant is a candidate key.
For a relation with only one candidate key, 3NF and BCNF are equivalent. In fact, some authors refer to this definition as the standard one for 3NF.
Note that, unlike the previous forms, which started with relations already in the lower normal form, this definition does not state that the relation must first be 2NF and then satisfy an additional condition. Therefore, to check for BCNF, we simply identify all the determinants and make sure that they are candidate keys. However, all relations that are BCNF are also 3NF.

Definition. A relation is in BCNF if and only if every determinant is a candidate key.

STUD (Sid, Sname, Major, Credits, Status)
Sid --> Credits
Credits --> Status
The determinants: Sid, Credits.
Since Credits is not a candidate key, this relation is not in BCNF. Performing the projections we did in slide 12, the resulting relations are BCNF:
STUD2 (Sid, Sname, Major, Credits)
STATS (Credits, Status)

Class (Course#, Sid, Sname, Facid, Sched, Room, Grade)
{Course#, Sid} --> Sname, Facid, Sched, Room, Grade
Course# --> Facid, Sched, Room
Sid --> Sname
{Course#, Sid} --> Grade
The determinants Course# and Sid are not (by themselves) candidate keys. Therefore, the Class relation is not BCNF. However, the relations resulting from the projections are BCNF:
CLASS2 (Course#, Sid, Grade)
STUD (Sid, Sname)
COURSE (Course#, Facid, Sched, Room)
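The BCNF test as stated on the slide (identify the determinants, check each against the candidate keys) translates almost literally into code. In this sketch the function name `bcnf_violations` is invented for the example, and the candidate keys are supplied by hand rather than computed.

```python
def bcnf_violations(fds, candidate_keys):
    """Return the determinants that are not candidate keys."""
    keys = [frozenset(k) for k in candidate_keys]
    return [set(lhs) for lhs, _ in fds if frozenset(lhs) not in keys]

# STUD with its single candidate key {Sid}
stud_fds = [({"Sid"}, {"Sname", "Major", "Credits"}), ({"Credits"}, {"Status"})]
print(bcnf_violations(stud_fds, [{"Sid"}]))               # [{'Credits'}]

# Class with its single candidate key {Course#, Sid}
class_fds = [
    ({"Course#"}, {"Facid", "Sched", "Room"}),
    ({"Sid"}, {"Sname"}),
    ({"Course#", "Sid"}, {"Grade"}),
]
print(bcnf_violations(class_fds, [{"Course#", "Sid"}]))   # [{'Course#'}, {'Sid'}]

# After projection, STATS (Credits, Status) has Credits as its key
print(bcnf_violations([({"Credits"}, {"Status"})], [{"Credits"}]))  # []
```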

Definition. A relation is in BCNF if and only if every determinant is a candidate key.
Example with overlapping keys:
FACULTY (Facname, Dept, Office, Rank, Datehired)
Assume:
  faculty names are not unique
  no two faculty members within a single department have the same name
  each faculty member has only one office
  a department may have several faculty offices
  faculty members from the same department may share offices
FDs:
Office --> Dept
Facname, Dept --> Office, Rank, Datehired
Facname, Office --> Dept, Rank, Datehired
Candidate keys: {Facname, Dept} and {Facname, Office}
If we choose {Facname, Dept} as PK, we are left with a determinant, Office, that is not a candidate key.
The relation is 3NF because there is no transitive dependency. Even though Office determines Dept, Dept is part of the key, so 3NF is not violated.
BCNF relations:
FAC1 (Dept, Office)
FAC2 (Facname, Office, Rank, Datehired)
Note, however, that our final scheme does not show the functional dependency Facname, Dept --> Office, Rank, Datehired.

Note. If we had chosen {Facname, Office} as the primary key of the original FACULTY relation, we would still have Office --> Dept. Since Office is not a candidate key, the relation would not be BCNF. In fact, it would not be 2NF, since Dept would not be fully functionally dependent on the key {Facname, Office}.

A relation may have overlapping candidate keys and still be BCNF. For example, consider the relation:
STU (Sname, Sadd, Major, GPA)
We will assume that a student can have only one major and that both the combinations {Sname, Sadd} and {Sname, Major} are unique, so we have two overlapping candidate keys. If we choose {Sname, Sadd} as the primary key, our only other determinant, {Sname, Major}, is a candidate key. Therefore, the relation STU is already in BCNF and there is no need to decompose it.

Any relation that is not BCNF can be decomposed into BCNF relations by the method illustrated. However, it may not always be desirable to transform the relation into BCNF. If there is a functional dependency that is not preserved when we perform the decomposition (i.e., the determinant and the attributes it determines end up in different relations), it is difficult to enforce the functional dependency in the database and an important constraint is lost. In that case, it is preferable to settle for 3NF, which always allows us to preserve dependencies.

Comprehensive example of functional dependencies: table WORK

1. Each project has a unique name, but names of employees and managers are not unique.
2. Each project has one manager, whose name is stored in Prjmgr.
3. Many employees may be assigned to work on each project, and an employee may be assigned to more than one project. Hours tells the number of hours per week that a particular employee is assigned to work on a particular project.
4. Budget stores the amount budgeted for a project, and Startdate gives the starting date for a project.
5. Salary gives the annual salary of an employee.
6. Empmgr gives the name of the employee’s manager, who is not the same as the project manager.
7. Empdept gives the employee’s department. Department names are unique. The employee’s manager is the manager of the employee’s department.
8. Rating gives the employee’s rating for a particular project. The project manager assigns the rating at the end of the employee’s work on that project.
FDs:
Prjname --> Prjmgr, Budget, Startdate
Empid --> Empname, Salary, Empmgr, Empdept
Prjname, Empid --> Hours, Rating

Since we assumed that people’s names were not unique, Empmgr does not functionally determine Empdept. However, since department names are unique and each department has only one manager, we add
Empdept --> Empmgr
Because people’s names are not unique, Prjmgr does not determine Prjname. Remember that FD does not mean “causes” or “figures out”, so there is no FD from Prjmgr to Budget or Rating:
Prjmgr [NOT -->] {Budget, Rating}
Since we see that every attribute is functionally dependent on the combination {Prjname, Empid}, we will choose that combination as our primary key.
1NF: with our composite key, each cell would be single-valued, so WORK is in 1NF.
2NF: We found partial (nonfull) dependencies:
Prjname --> Prjmgr, Budget, Startdate
Empid --> Empname, Salary, Empmgr, Empdept
We transform the relation into an equivalent set of 2NF relations by projection, resulting in:
PROJ (Prjname, Prjmgr, Budget, Startdate)
EMP (Empid, Empname, Salary, Empmgr, Empdept)
WORK1 (Prjname, Empid, Hours, Rating)

3NF: Using the set of the above projections, we test each relation for 3NF.
PROJ is 3NF because no nonkey attribute functionally determines another nonkey attribute.
In EMP we have a transitive dependency, Empdept --> Empmgr. Therefore, we need to rewrite EMP as:
EMP1 (Empid, Empname, Salary, Empdept)
DEPT (Empdept, Empmgr)
In WORK1, there is no FD between Hours and Rating, so the relation is already in 3NF.
BCNF: Our new set of relations is also BCNF, since in each relation the only determinant is the primary key:
PROJ (Prjname, Prjmgr, Budget, Startdate)
WORK1 (Prjname, Empid, Hours, Rating)
EMP1 (Empid, Empname, Salary, Empdept)
DEPT (Empdept, Empmgr)
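The choice of {Prjname, Empid} as the key of WORK can be double-checked mechanically. The sketch below is a brute-force illustration, assuming the WORK attributes are exactly those named in the FDs above; the function names `closure` and `candidate_keys` are invented for this example, and the exhaustive search is only reasonable for small schemas like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):       # smallest sets first
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue                        # contains a smaller key: not minimal
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys

work = {"Prjname", "Prjmgr", "Budget", "Startdate", "Empid", "Empname",
        "Salary", "Empmgr", "Empdept", "Hours", "Rating"}
work_fds = [
    ({"Prjname"}, {"Prjmgr", "Budget", "Startdate"}),
    ({"Empid"}, {"Empname", "Salary", "Empmgr", "Empdept"}),
    ({"Prjname", "Empid"}, {"Hours", "Rating"}),
    ({"Empdept"}, {"Empmgr"}),
]

print([sorted(k) for k in candidate_keys(work, work_fds)])  # [['Empid', 'Prjname']]
```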

We already know that our original relation could not have been BCNF, since it was not even in 2NF. To verify that WORK was not BCNF, all we needed to do was find a determinant that was not a candidate key. Any one of Empid, Empdept or Prjname is sufficient to show that the original WORK relation was not BCNF.

Although Boyce-Codd normal form removes any anomalies due to functional dependencies, further research by Fagin led to the identification of another type of dependency, called a multivalued dependency, that can cause similar design problems.

Multivalued Dependencies
FACULTY (Facid, Dept, Committee)
Here we will assume that a faculty member can belong to more than one department. For example, a professor can be hired jointly by the CSC and Math departments. A faculty member can belong to several college-wide committees, each identified by the committee name. There is no relationship between department and committees.
To make the relation 1NF, we must rewrite it as shown in the next slide. Notice that we are forced to write all the combinations of Dept values with Committee values for each faculty member, or else it would appear that there is some relationship between Dept and Committee.

The key is {Facid, Dept, Committee}.
The resulting relation is BCNF, since the only determinant is the key.
Although we have taken care of all FDs, there are still update, insertion and deletion anomalies:
Update: if we want to change a committee that F101 belongs to from Budget to Advancement, we need to make two changes or have inconsistent data.
Insertion: if we want to insert the record of a faculty member who belongs to one or more departments but who is not on any committee, we are unable to do so, since Committee is part of the key, so null values are not permitted in that column.
Deletion: if F221 drops membership on the Library Committee, we lose all the rest of the information stored for him, since we are not permitted to have a null value for an attribute of the key.

Definition. Let R be a relation having attributes or sets of attributes A, B and C. There is a multivalued dependency of attribute B on attribute A if and only if the set of B values associated with a given A value is independent of the C values.
A -->> B: “A multidetermines B”
In R (A, B, C), if A -->> B, then A -->> C as well.
Earlier we found that similar problems were caused by functional dependencies, but there are none in this example, so we need to identify a new cause. Although a faculty member is not associated with only one department, he is certainly associated with a particular set of departments. Similarly, a faculty member is associated with a specific set of committees at any given time. The set of departments for a particular Facid is independent of the set of committees for that faculty member. This independence is the cause of the problems. To see how we can correct it, we need another definition.

Definition. A relation is in 4NF if and only if it is in BCNF and there are no nontrivial multivalued dependencies.
The FACULTY relation from slide 27 is not in 4NF because of the nontrivial multivalued dependencies:
Facid -->> Dept
Facid -->> Committee
When a relation is BCNF but not 4NF, we can transform it into an equivalent set of 4NF relations by projection. We form two separate relations, placing in each the attribute that multidetermines the others, along with one of the multidetermined attributes or sets of attributes.

FAC1 (Facid, Dept) FAC2 (Facid, Committee)
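The effect of this projection can be seen on a small instance. The tuples below are hypothetical (the original table is not reproduced in this transcript) and were only chosen to be consistent with the F101 and F221 remarks above; `mvd_holds` is a name invented for this sketch.

```python
from itertools import product

# Hypothetical FACULTY (Facid, Dept, Committee) instance in 1NF.
faculty = {
    ("F101", "CSC", "Budget"), ("F101", "CSC", "Curriculum"),
    ("F101", "Math", "Budget"), ("F101", "Math", "Curriculum"),
    ("F221", "Biology", "Library"),
}

def mvd_holds(rows):
    """Check Facid -->> Dept: every Dept/Committee combination must be present."""
    for facid in {f for f, _, _ in rows}:
        depts = {d for f, d, _ in rows if f == facid}
        committees = {c for f, _, c in rows if f == facid}
        if {(facid, d, c) for d, c in product(depts, committees)} - rows:
            return False
    return True

# Projections onto FAC1 (Facid, Dept) and FAC2 (Facid, Committee)
fac1 = {(f, d) for f, d, _ in faculty}
fac2 = {(f, c) for f, _, c in faculty}

# Natural join of the projections on Facid
rejoined = {(f, d, c) for f, d in fac1 for f2, c in fac2 if f == f2}

print(mvd_holds(faculty))    # True
print(rejoined == faculty)   # True: nothing lost, nothing spurious
print(len(fac2))             # 3: F101's Budget membership is now stored once
```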

Definition. A decomposition of a relation R is a set of relations {R1, R2, …, Rn} such that each Ri is a subset of R and the union of all the Ri is R. Note that the Ri need not be disjoint.
Definition. A decomposition {R1, R2, …, Rn} of a relation R is called a lossless decomposition for R if the natural join of R1, R2, …, Rn produces exactly the relation R.
Not every decomposition is lossless. It is possible to produce a decomposition that is lossy, one that loses information.

Each tuple in this table shows that a certain student received the specified grade in the course listed.

A lossy decomposition shows up as a join which produces more tuples than were in the original table. These spurious tuples were introduced by the operations performed. Since, without the original table, we would have no way of identifying which tuples are genuine and which are spurious, we actually lose information (even though we have more tuples) if we substitute the projections for the original relation.
We can guarantee that the decomposition is lossless by making sure that, for each pair of relations that will be joined, the set of common attributes is a determinant of one of the relations. We can do this by placing functionally dependent attributes in a relation with their determinants and keeping the determinants themselves in the original relation.
More formally, if R is decomposed into two relations {R1, R2}, the join is lossless if and only if either of the following holds in the closure of the set of FDs for R:
R1 ∩ R2 --> R1 - R2
or
R1 ∩ R2 --> R2 - R1
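Both halves of this argument can be tried out in code. The sketch below first joins two projections of a hypothetical two-tuple MARKS (Student, Course, Grade) table (the original table is not reproduced in this transcript) to show spurious tuples, then applies the formal two-relation test. The function names and sample data are assumptions made for this example.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless(r1, r2, fds):
    """Two-relation test: the common attributes determine R1 - R2 or R2 - R1."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

# Hypothetical MARKS tuples: (Student, Course, Grade)
marks = {("Ann", "DB", "A"), ("Bob", "OS", "A")}
by_student = {(s, g) for s, _, g in marks}   # projection on (Student, Grade)
by_course = {(c, g) for _, c, g in marks}    # projection on (Course, Grade)
joined = {(s, c, g) for s, g in by_student for c, g2 in by_course if g == g2}

print(len(marks), len(joined))   # 2 4
print(sorted(joined - marks))    # [('Ann', 'OS', 'A'), ('Bob', 'DB', 'A')]

marks_fds = [({"Student", "Course"}, {"Grade"})]
print(lossless({"Student", "Grade"}, {"Course", "Grade"}, marks_fds))   # False

stud_fds = [({"Sid"}, {"Sname", "Major", "Credits"}), ({"Credits"}, {"Status"})]
print(lossless({"Sid", "Sname", "Major", "Credits"}, {"Credits", "Status"}, stud_fds))  # True
```

The second call confirms that the earlier STUD2/STATS projection is lossless, because the common attribute Credits determines Status.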

Definition. A relation is in 5NF if no remaining nonloss projections are possible, except the trivial one in which the key appears in each projection.
An alternative formulation of 5NF uses the notion of join dependency. A join dependency means that a relation can be reconstructed by taking the join of its projections. Thus, if R (A, B, C) is decomposed into R1 (A, C) and R2 (B, C), a join dependency exists if we can get back R by taking the join of R1 and R2:
R = R1 JOIN R2
As we have seen from our MARKS example, not all projections have this property.

Definition. A relation is in domain-key normal form (DKNF) if every constraint is a logical consequence of domain constraints or key constraints.
Fagin has proved that a relation in this form cannot have update, insertion or deletion anomalies. Therefore, this form represents the ultimate NF with respect to these defects.
Domain = the set of allowable values for an attribute
Key = a unique identifier for each entity (we called it superkey)
Constraint = a logical predicate which we can verify by examining instances of the relation
Although FDs, MVDs and JDs are constraints, there are other types as well, called “general constraints” (intrarelation dependencies).