DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-1 David M. Kroenke’s Chapter Three: The Relational Model and Normalization.


1 David M. Kroenke’s Chapter Three: The Relational Model and Normalization
Database Processing: Fundamentals, Design, and Implementation

2 Chapter Premise
We have received one or more tables of existing data. The data is to be stored in a new database. Should the data
– be stored as received, or
– be transformed for storage?

3 Example 1

4 Example 2

5 How Many Tables?
Should we store these two tables as they are, or should we combine them into one table in our new database?

6 How Many Tables?

7 Data Redundancy
Data redundancy can result in data inconsistency:
– Different and conflicting versions of the same data appear in different places
– Errors are more likely to occur when the same data must be entered in several different places
Data anomalies develop when required changes to redundant data are not made successfully.

8 Modification Anomalies
Update anomalies
– Occur when changes must be made to existing records
Insertion anomalies
– Occur when entering new records
Deletion anomalies
– Occur when deleting records

9 Modification Anomalies
The EQUIPMENT_REPAIR table before and after an incorrect update operation on AcquisitionCost for Type = Drill Press:

10 The Relational Model
Introduced in 1970
Created by E.F. Codd
– IBM researcher
– The model is based on the mathematics of relations; its operations are formalized as “relational algebra”
Today it is the standard for commercial DBMS products

11 Important Relational Model Terms
Entity
Relation (table)
Functional Dependency
Determinant
Candidate Key
Composite Key
Primary Key
Surrogate Key
Foreign Key
Referential integrity constraint
Normal Form
Multivalued Dependency

12 Entity
An entity is some identifiable thing that users want to track:
– Customer
– Computer
– Sale
– Student
– Invoice
– Department
– Course
– Policy

13 Relation
Data about entities is stored in relations.
A relation is a two-dimensional table with these characteristics:
– Rows contain data about an entity
– No two rows may contain identical data
– Columns contain data about attributes of the entity
– All entries in a column are of the same data type
– Each column has a unique name
– Cells of the table hold a single value

14 A Very Generic Relation

15 Objectives of Normalization
– Develop a good description of the data, its relationships, and its constraints
– Produce a stable set of relations that
  – is a faithful model of the enterprise
  – is highly flexible
  – reduces redundancy, which saves space and reduces data inconsistency
  – is free of insertion, deletion, and update anomalies

16 Anomalies are very bad!

17 Anomalies
An anomaly is an inconsistent, incomplete, or contradictory state of the database:
– Insertion anomaly – the user cannot insert a new record when it should be possible to do so
– Deletion anomaly – when a record is deleted, other information that is tied to it is also deleted (not by design)
– Update anomaly – a record is updated, but other appearances of the same data are not

18 Data redundancy leads to anomalies
Find examples of insertion, deletion, and update anomalies.

19 Normalization
Normalization plays an important role in database design. Through it, we decompose relations (tables) in stages from lower to higher normal forms:
– 1NF, 2NF, 3NF, BCNF
– Other normal forms are 4NF, 5NF, DKNF
We use normalization and E-R modeling together for good database design.
It all starts with identifying functional dependencies (FDs).

20 Functional Dependency
If A and B are (sets of) attributes of relation R, B is functionally dependent on A if a particular value of A determines a unique value of B.
Example: Emp_Name is functionally dependent on Emp_Num: given a particular value of Emp_Num, you can find the name of the employee (Emp_Name) with that Emp_Num.
A → B says A determines B, or equivalently, B is dependent on A.
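The definition can be checked mechanically against one table instance: A → B is violated as soon as two rows agree on A but differ on B. Below is a minimal Python sketch of that check; the helper name `fd_holds` and the employee rows are hypothetical. Keep in mind that a sample can only refute an FD; passing the check on one instance does not prove the dependency holds for all data the table may ever contain.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def fd_holds(rows: Iterable[Row], lhs: List[str], rhs: List[str]) -> bool:
    """Return True if lhs -> rhs is not violated in this table instance."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        a = tuple(row[c] for c in lhs)
        b = tuple(row[c] for c in rhs)
        if a in seen and seen[a] != b:
            return False  # the same A value is paired with two different B values
        seen.setdefault(a, b)
    return True


# Hypothetical sample rows, for illustration only.
employees = [
    {"Emp_Num": 101, "Emp_Name": "Ortiz", "Dept": "Sales"},
    {"Emp_Num": 102, "Emp_Name": "Chen", "Dept": "Sales"},
    {"Emp_Num": 103, "Emp_Name": "Ortiz", "Dept": "Accounting"},
]

print(fd_holds(employees, ["Emp_Num"], ["Emp_Name"]))  # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Num"]))  # False: two employees named Ortiz
```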

21 Example of FDs
R = NewStudent(stuId, lastName, major, credits, status, ssn)
Some FDs in R:
(stuId) → (lastName), but not the reverse
(stuId) → (lastName, major, credits, status, ssn, stuId)
(ssn) → (stuId, lastName, major, credits, status, ssn)
(credits) → (status), but not (status) → (credits)

22 Functional Dependency
A functional dependency occurs when the value of one attribute (or set of attributes) determines the value of a second attribute (or set of attributes):
StudentID → StudentName
StudentID → (DormName, DormRoom, Fee)
The attribute on the left side of the functional dependency is called the determinant.
Functional dependencies may be based on equations:
ExtendedPrice = Quantity × UnitPrice
(Quantity, UnitPrice) → ExtendedPrice
But functional dependencies are not equations!

23 Composite Determinants
A determinant of a functional dependency may itself consist of more than one attribute:
(StudentName, ClassName) → (Grade)
Note that StudentName alone does not determine Grade (StudentName ↛ Grade).

24 Functional Dependencies in the SKU_DATA Table
Can you find three FDs?

25 Functional Dependencies in the ORDER_ITEM Table
Can you find two?

26 Keys are determinants
A key is a combination of one or more attributes that is used to identify rows in a relation.
A composite key is a key that consists of two or more attributes.

27 Keys & FDs
Superkey – a set of attributes that functionally determines all attributes in a relation
Candidate key – a superkey that is a minimal identifier (no proper subset of it is also a superkey)
Primary key – the chosen candidate key
– Must always be filled (non-null); this is the entity integrity rule
– Must be unique
– May be composite
– Ideally, it is short, numeric, and never changes
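Whether a set of attributes is a superkey can be decided from the FDs alone by computing its closure: every attribute it determines, directly or by chaining FDs. The following minimal Python sketch applies this to the NewStudent FDs listed earlier. The helpers `fd`, `closure`, `is_superkey`, and `candidate_keys` are illustrative names, and the brute-force key search is only practical for small relations.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names (hypothetical helper)."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs: Set[str], schema: Set[str], fds: List[FD]) -> bool:
    return closure(attrs, fds) >= schema


def candidate_keys(schema: Set[str], fds: List[FD]) -> List[Set[str]]:
    """Minimal superkeys, found by trying smaller attribute sets first."""
    keys: List[Set[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(s, schema, fds):
                keys.append(s)
    return keys


# NewStudent FDs from the example above
schema = {"stuId", "lastName", "major", "credits", "status", "ssn"}
fds = [
    fd("stuId", "lastName major credits status ssn"),
    fd("ssn", "stuId lastName major credits status"),
    fd("credits", "status"),
]
print(is_superkey({"stuId", "major"}, schema, fds))  # True, but not minimal
print(candidate_keys(schema, fds))  # [{'ssn'}, {'stuId'}]
```

Both stuId and ssn come out as candidate keys; either could be chosen as the primary key.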

28 Surrogate Keys
A surrogate key is an artificial column added to a relation to serve as a primary key:
– DBMS supplied
– Short, numeric, and never changes: an ideal primary key!
– Has artificial values that are meaningless to users
– Normally hidden in forms and reports

29 Utility of Surrogate Keys
RENTAL_PROPERTY without a surrogate key:
RENTAL_PROPERTY (Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate)
RENTAL_PROPERTY with a surrogate key:
RENTAL_PROPERTY (PropertyID, Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate)
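As a concrete sketch, the surrogate-key version might be declared as follows with Python's built-in sqlite3 module. It relies on SQLite assigning a value to an INTEGER PRIMARY KEY column when an INSERT omits it, so the DBMS supplies PropertyID. The column names StateProvince and ZipPostalCode are assumptions (slashes are not legal in unquoted SQL identifiers), and the addresses are hypothetical. The UNIQUE constraint keeps the old natural key as a candidate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE RENTAL_PROPERTY (
        PropertyID    INTEGER PRIMARY KEY,  -- surrogate key, assigned by SQLite
        Street        TEXT NOT NULL,
        City          TEXT NOT NULL,
        StateProvince TEXT,
        ZipPostalCode TEXT,
        Country       TEXT NOT NULL,
        Rental_Rate   NUMERIC,
        -- the old natural key is still a candidate key
        UNIQUE (Street, City, StateProvince, ZipPostalCode, Country)
    )
    """
)

insert = (
    "INSERT INTO RENTAL_PROPERTY "
    "(Street, City, StateProvince, ZipPostalCode, Country, Rental_Rate) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)

# Hypothetical properties; PropertyID is omitted so the DBMS supplies it.
new_ids = []
for row in [
    ("100 Main St", "Springfield", "OR", "97477", "USA", 1200),
    ("22 Harbour Rd", "Victoria", "BC", "V8V 1A1", "Canada", 1450),
]:
    new_ids.append(conn.execute(insert, row).lastrowid)

print(new_ids)  # [1, 2]
```

Other foreign keys can then reference the short, stable PropertyID instead of the five-column address.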

30 Foreign Keys
A foreign key is the primary key of one relation that is placed in another relation to form a link between the relations:
– A foreign key can be a single column or a composite key
– The term refers to the fact that key values are foreign to the relation in which they appear as foreign key values

31 Foreign Keys
In the relations below, DepartmentName is the primary key of DEPARTMENT and EmployeeNumber is the primary key of EMPLOYEE; DepartmentName in EMPLOYEE is a foreign key:
DEPARTMENT (DepartmentName, BudgetCode, ManagerName)
EMPLOYEE (EmployeeNumber, EmployeeName, DepartmentName)

32 Referential Integrity
Relations must exhibit referential integrity. A foreign key field must either
– contain a value that equals a primary key value in the corresponding relation, or
– be NULL

33 Referential Integrity Example
In the relations below, SKU is the primary key of SKU_DATA, (OrderNumber, SKU) is the primary key of ORDER_ITEM, and SKU in ORDER_ITEM is a foreign key:
SKU_DATA (SKU, SKU_Description, Department, Buyer)
ORDER_ITEM (OrderNumber, SKU, Quantity, Price, ExtendedPrice)
Where ORDER_ITEM.SKU must exist in SKU_DATA.SKU

34 Another Referential Integrity Example
When might a foreign key field be NULL?
– A CUSTOMER may have no AGENT
– An EMPLOYEE may have no DEPARTMENT
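The referential integrity rule, including the NULL case, can be observed in a DBMS that enforces foreign keys. The sketch below uses Python's sqlite3 module with the DEPARTMENT and EMPLOYEE relations from the foreign key slide. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; the column types and row values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute(
    "CREATE TABLE DEPARTMENT ("
    " DepartmentName TEXT PRIMARY KEY, BudgetCode TEXT, ManagerName TEXT)"
)
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    " EmployeeNumber INTEGER PRIMARY KEY,"
    " EmployeeName TEXT NOT NULL,"
    " DepartmentName TEXT REFERENCES DEPARTMENT (DepartmentName))"
)

# Hypothetical rows
conn.execute("INSERT INTO DEPARTMENT VALUES ('Accounting', 'BC-100', 'Smith')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Jones', 'Accounting')")  # matches a PK value
conn.execute("INSERT INTO EMPLOYEE VALUES (2, 'Lee', NULL)")  # no department: allowed

try:
    # 'Marketing' is not a DepartmentName in DEPARTMENT, so this breaks the rule.
    conn.execute("INSERT INTO EMPLOYEE VALUES (3, 'Burns', 'Marketing')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```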

35 Normal Forms
1NF – A table that qualifies as a relation is in 1NF.
2NF – A relation is in 2NF if it is in 1NF and all of its nonkey attributes are dependent on all of the attributes in the primary key.
3NF – A relation is in 3NF if it is in 2NF and has no transitive dependencies (no nonkey attribute determines another nonkey attribute).
Boyce-Codd Normal Form (BCNF) – A relation is in BCNF if every determinant is a candidate key.

36 First Normal Form (1NF)
A relation is in 1NF if every attribute is single-valued for each tuple, i.e., each cell of the table contains only one value.
Domains of attributes are atomic:
– No sets
– No lists
– No repeating fields or groups allowed

37 This relation is not in 1NF (assume students can have double majors):
stuId  lastName  major          credits  status  ssn
S1001  Smith     History        90       Sr      100429500
S1003  Jones     Math           95       Sr      010124567
S1006  Lee       CSC, Math      15       Fr      088520876
S1010  Burns     Art, English   63       Jr      099320985
S1060  Jones     CSC            25       Fr      064624738

38 Decomposing into 1NF
Create a new table for each multi-valued attribute:
– Put the PK of the original table and the multi-valued attribute in this table
– The PK of this new table is composite
– The new table will have an additional row for each value of the attribute
Remove the multi-valued attribute from the original table:
NewStu2(stuId, lastName, credits, status, ssn)
Majors(stuId, major)
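The two steps above can be sketched in a few lines of Python, using the student rows from the previous slide. Representing each student as a tuple with a list of majors is an assumption made to model the multi-valued cell; the names `students`, `new_stu2`, and `majors` are illustrative.

```python
# Unnormalized rows from the slide; the third field holds every major of a student.
students = [
    ("S1001", "Smith", ["History"], 90, "Sr", "100429500"),
    ("S1003", "Jones", ["Math"], 95, "Sr", "010124567"),
    ("S1006", "Lee", ["CSC", "Math"], 15, "Fr", "088520876"),
    ("S1010", "Burns", ["Art", "English"], 63, "Jr", "099320985"),
    ("S1060", "Jones", ["CSC"], 25, "Fr", "064624738"),
]

# NewStu2 keeps the single-valued attributes; the multi-valued one is removed.
new_stu2 = [(sid, last, cr, st, ssn) for sid, last, _, cr, st, ssn in students]

# Majors holds the original PK plus one row per value of the multi-valued attribute,
# so its primary key is the composite (stuId, major).
majors = [(sid, m) for sid, _, ms, _, _, _ in students for m in ms]

for row in majors:
    print(row)
```

The printed Majors rows match the seven-row table on the next slide.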

39 Two new tables
NewStu2:
stuId  lastName  credits  status  ssn
S1001  Smith     90       Sr      100429500
S1003  Jones     95       Sr      010124567
S1006  Lee       15       Fr      088520876
S1010  Burns     63       Jr      099320985
S1060  Jones     25       Fr      064624738
Majors:
stuId  major
S1001  History
S1003  Math
S1006  CSC
S1006  Math
S1010  Art
S1010  English
S1060  CSC

40 Another method for 1NF
“Flatten” the original table by making the multi-valued attribute part of a new composite key:
Student(stuId, lastName, major, credits, status, ssn)
– See next slide…

41 Flattened and in 1NF
NewStu table with PK (stuId, major):
stuId  lastName  major    credits  status  ssn
S1001  Smith     History  90       Sr      100429500
S1003  Jones     Math     95       Sr      010124567
S1006  Lee       CSC      15       Fr      088520876
S1006  Lee       Math     15       Fr      088520876
S1010  Burns     Art      63       Jr      099320985
S1010  Burns     English  63       Jr      099320985
S1060  Jones     CSC      25       Fr      064624738

42 Another (generally not-so-good) method for 1NF
If the number of repeats is specified or limited, you can make additional columns for multiple values:
Student(stuId, lastName, major1, major2, credits, status, ssn)

43 This relation is also not in 1NF

44 The “flattening” approach works best here
But we are still prone to all forms of anomalies, so we must go on to transform this into 2NF and above.

45 Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in first normal form and all the non-key attributes are fully functionally dependent on the key.
– No non-key attribute is FD on just part of the key
– If R’s key has only one attribute (i.e., is not composite) and R is in 1NF, R is automatically in 2NF

46 Full Functional Dependency
Transforming to 2NF requires that all dependencies within a relation are full functional dependencies, i.e., there are no partial dependencies on the key.
In relation R, a set of attributes B is fully functionally dependent on a set of attributes A if B is functionally dependent on A but not functionally dependent on any proper subset of A.
This means every attribute in A is needed to functionally determine B.
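The "no proper subset" condition can be tested by computing, for each proper subset of A, the set of attributes it determines (its closure). Here is a minimal Python sketch using the FDs of the NewClass relation from the next slide; the function names `closure` and `is_full_fd` are illustrative, not from the source.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not rhs <= closure(lhs, fds):
        return False  # not a functional dependency at all
    for size in range(len(lhs)):  # sizes 0 .. len(lhs) - 1: proper subsets only
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(set(subset), fds):
                return False  # a smaller determinant suffices: partial dependency
    return True


# NewClass FDs, written as (determinant, dependents) pairs of attribute sets
fds = [
    ({"courseNo", "stuId"}, {"grade"}),
    ({"courseNo"}, {"facId", "schedule", "room"}),
    ({"stuId"}, {"lastName"}),
]
key = {"courseNo", "stuId"}
print(is_full_fd(key, {"grade"}, fds))  # True
print(is_full_fd(key, {"facId"}, fds))  # False: courseNo alone determines facId
```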

47 NewClass is not in 2NF
NewClass(courseNo, stuId, lastName, facId, schedule, room, grade)
FDs:
(courseNo, stuId) → (lastName)
(courseNo, stuId) → (facId)
(courseNo, stuId) → (schedule)
(courseNo, stuId) → (room)
(courseNo, stuId) → (grade)
Looks like we’ve found a primary key: (courseNo, stuId).
courseNo → facId
courseNo → schedule
courseNo → room
stuId → lastName
…plus trivial FDs that are partial…
But these are all partially dependent on the key.

48 Decomposing into 2NF
Identify each partial FD. Create a new relation for each part of the PK that determines other attributes:
– Remove the attributes that depend on each of these determinants from the original relation and put them in the new relation
– I.e., place all determinants in separate relations along with their dependent attributes
In the original relation, keep the composite key and any attributes that are fully functionally dependent on all of it.
Even if the composite key has no dependent attributes, keep that relation to connect logically to the other relations.

49 Putting NewClass into 2NF
NewClass(courseNo, stuId, lastName, facId, schedule, room, grade)
FDs grouped by determinant:
(courseNo) → (courseNo, facId, schedule, room)
(stuId) → (stuId, lastName)
(courseNo, stuId) → (courseNo, stuId, facId, schedule, room, lastName, grade)
Create tables grouped by determinants:
Course(courseNo, facId, schedule, room)
Stu(stuId, lastName)
Keep the relation with the original composite key, with the attributes FD on it:
NewStu2(courseNo, stuId, grade)
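The grouping by determinant can be automated with attribute closures. The following is a simplified Python sketch, assuming a single known primary key and the NewClass FDs above: it tries parts of the key from smallest to largest, moves each part's non-key dependents into a new relation, and keeps the full key with whatever remains. The function `decompose_2nf` is a hypothetical name, not a standard library routine.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_2nf(schema, key, fds):
    """Split off each proper part of the key that determines non-key attributes."""
    nonkey = schema - key
    relations, moved = [], set()
    # Smaller parts first, so each attribute goes to its smallest determinant.
    for size in range(1, len(key)):
        for combo in combinations(sorted(key), size):
            part = set(combo)
            dependents = (closure(part, fds) & nonkey) - moved
            if dependents:
                relations.append(part | dependents)
                moved |= dependents
    # The original relation keeps the full key plus the fully dependent attributes.
    relations.append(key | (nonkey - moved))
    return relations


schema = {"courseNo", "stuId", "lastName", "facId", "schedule", "room", "grade"}
key = {"courseNo", "stuId"}
fds = [
    ({"courseNo", "stuId"}, {"grade"}),
    ({"courseNo"}, {"facId", "schedule", "room"}),
    ({"stuId"}, {"lastName"}),
]
for rel in decompose_2nf(schema, key, fds):
    print(sorted(rel))
# ['courseNo', 'facId', 'room', 'schedule']
# ['lastName', 'stuId']
# ['courseNo', 'grade', 'stuId']
```

The three results correspond to Course, Stu, and NewStu2.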

50 2NF - Putting it all together
We started with:
– NewClass(courseNo, stuId, lastName, facId, schedule, room, grade)
It was already in 1NF.
– We decomposed it into 2NF:
Course(courseNo, facId, schedule, room)
Stu(stuId, lastName)
NewStu2(courseNo, stuId, grade)

51 Third Normal Form
A relation is in 3NF if, whenever a nontrivial functional dependency X → A exists,
– either X is a superkey, or
– A is a member of some candidate key.
To be in 3NF, a relation must be in 2NF and have no transitive dependencies:
– I.e., no non-key attribute may determine another non-key attribute
– Here “key” includes any candidate key

52 Transitive Dependency
If A, B, and C are attributes of relation R such that A → B and B → C, then C is transitively dependent on A.
NewStudent (stuId, lastName, major, credits, status)
FD: credits → status …but credits is not a key
By transitivity: stuId → credits AND credits → status implies stuId → status
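A transitive dependency shows up as an FD whose determinant is neither part of the key nor a superkey. The minimal Python sketch below flags such FDs for the NewStudent example, assuming stuId is the only key; `transitive_dependencies` is an illustrative name. It also confirms via the closure that stuId → status follows by transitivity.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(schema, key, fds):
    """Find key -> X -> A chains where X is neither part of the key nor a superkey."""
    found = []
    for lhs, rhs in fds:
        if lhs <= key or closure(lhs, fds) >= schema:
            continue  # lhs is (part of) the key or a superkey: not transitive
        for attr in sorted(rhs - lhs - key):
            found.append((sorted(key), sorted(lhs), attr))
    return found


schema = {"stuId", "lastName", "major", "credits", "status"}
key = {"stuId"}
fds = [
    ({"stuId"}, {"lastName", "major", "credits"}),
    ({"credits"}, {"status"}),
]
print("status" in closure({"stuId"}, fds))  # True: stuId -> status by transitivity
print(transitive_dependencies(schema, key, fds))  # [(['stuId'], ['credits'], 'status')]
```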

53 Decomposing into 3NF
NewStudent (stuId, lastName, major, credits, status)
FD: credits → status
– Remove the dependent attribute, status, from the relation
– Create a new table with the dependent attribute and its determinant, credits
– Keep the determinant in the original table
NewStu2 (stuId, lastName, major, credits)
Stats (credits, status)
Both relations are now in 3NF.
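Because credits stays in both tables, joining them on credits rebuilds the original rows, so the decomposition loses no information. The sketch below checks this in plain Python using the stuId, lastName, credits, and status values from the 1NF example; the major column is omitted to keep it short (it would simply stay in NewStu2). The helpers `project` and `natural_join` are illustrative implementations of those relational operations.

```python
from operator import itemgetter

# Rows from the NewStu2 table in the 1NF example (major omitted for brevity).
new_student = [
    {"stuId": "S1001", "lastName": "Smith", "credits": 90, "status": "Sr"},
    {"stuId": "S1003", "lastName": "Jones", "credits": 95, "status": "Sr"},
    {"stuId": "S1006", "lastName": "Lee", "credits": 15, "status": "Fr"},
    {"stuId": "S1010", "lastName": "Burns", "credits": 63, "status": "Jr"},
    {"stuId": "S1060", "lastName": "Jones", "credits": 25, "status": "Fr"},
]


def project(rows, cols):
    """Relational projection: keep only cols and drop duplicate rows."""
    out = []
    for row in rows:
        t = {c: row[c] for c in cols}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Combine rows that agree on every column the two relations share."""
    common = set(left[0]) & set(right[0])
    return [{**a, **b} for a in left for b in right if all(a[c] == b[c] for c in common)]


new_stu2 = project(new_student, ["stuId", "lastName", "credits"])
stats = project(new_student, ["credits", "status"])

rejoined = natural_join(new_stu2, stats)
by_id = itemgetter("stuId")
print(sorted(rejoined, key=by_id) == sorted(new_student, key=by_id))  # True
```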

54 1NF/2NF/3NF Process
Before moving on to BCNF:
– Put the relation in 1NF: remove all multi-valued attributes, repeating groups, etc.
– List all FDs and find a key
– Put the relation in 2NF: remove all partial dependencies on the key
– Put the relation in 3NF: remove all transitive dependencies

55 Comprehensive example
Work(projName, projMgr, empID, hours, empName, budget, startDate, salary, empMgr, empDept, rating)
– If not in 1NF, fix it
– List all FDs
– Find a key
– Remove all partial dependencies (2NF)
– Remove all transitive dependencies (3NF)

56 Boyce-Codd Normal Form
A relation is in BCNF if, whenever a nontrivial FD X → A exists, X is a superkey:
– i.e., every determinant in the table is a candidate key
BCNF is stricter than 3NF, which allows A to be part of a candidate key.
– If there is just one candidate key, then 3NF is the same as BCNF

57 3NF Table Not in BCNF

58 Decomposition into BCNF

59 Decomposition into BCNF

60 Another BCNF Example
NewFac (facName, dept, office, rank, dateHired)
FDs:
office → dept
(facName, dept) → (office, rank, dateHired)
(facName, office) → (dept, rank, dateHired)
NewFac is not in BCNF because office is not a superkey.
So, move the dependent attributes to a new relation, with the determinant as the key. Project into
Fac1 (office, dept)
Fac2 (facName, office, rank, dateHired)
Note that we have lost an FD: we are no longer able to see that (facName, dept) is a determinant, since those attributes are now in different relations.
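The difference between BCNF and 3NF shows up clearly in NewFac: office → dept violates BCNF, yet 3NF tolerates it because dept belongs to the candidate key (facName, dept). The Python sketch below checks both conditions from the FDs; all helper names are illustrative, and the candidate key search is brute force, which is fine for a five-attribute relation.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= schema


def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and is_superkey(s, schema, fds):
                keys.append(s)
    return keys


def bcnf_violations(schema, fds):
    """Nontrivial FDs whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not is_superkey(lhs, schema, fds)]


def third_nf_violations(schema, fds):
    """BCNF violations whose dependent attributes are not all in some candidate key."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs, rhs) for lhs, rhs in bcnf_violations(schema, fds)
            if not (rhs - lhs) <= prime]


schema = {"facName", "dept", "office", "rank", "dateHired"}
fds = [
    ({"office"}, {"dept"}),
    ({"facName", "dept"}, {"office", "rank", "dateHired"}),
    ({"facName", "office"}, {"dept", "rank", "dateHired"}),
]
print(bcnf_violations(schema, fds))      # [({'office'}, {'dept'})]
print(third_nf_violations(schema, fds))  # []
```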

61 Eliminating Modification Anomalies from Functional Dependencies in Relations
Put all relations into Boyce-Codd Normal Form (BCNF):

62 Putting a Relation into BCNF: EQUIPMENT_REPAIR

63 Putting a Relation into BCNF: EQUIPMENT_REPAIR
EQUIPMENT_REPAIR (ItemNumber, Type, AcquisitionCost, RepairNumber, RepairDate, RepairAmount)
ItemNumber → (Type, AcquisitionCost)
RepairNumber → (ItemNumber, Type, AcquisitionCost, RepairDate, RepairAmount)
ITEM (ItemNumber, Type, AcquisitionCost)
REPAIR (ItemNumber, RepairNumber, RepairDate, RepairAmount)
Where REPAIR.ItemNumber must exist in ITEM.ItemNumber

64 Putting a Relation into BCNF: New Relations

65 Putting a Relation into BCNF: SKU_DATA

66 Putting a Relation into BCNF: SKU_DATA
SKU_DATA (SKU, SKU_Description, Department, Buyer)
SKU → (SKU_Description, Department, Buyer)
SKU_Description → (SKU, Department, Buyer)
Buyer → Department
SKU_DATA (SKU, SKU_Description, Buyer)
BUYER (Buyer, Department)
Where SKU_DATA.Buyer must exist in BUYER.Buyer
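The same split can be derived mechanically with the textbook BCNF decomposition: find a set X inside the relation that determines more than itself without determining everything, move X together with what it determines into a new relation, and repeat on both pieces. Below is a minimal Python sketch run on the SKU_DATA FDs; `bcnf_decompose` is an illustrative name, and testing closures against the original FDs stands in for projecting FDs onto each piece.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_decompose(schema, fds):
    """Recursively split schema on a BCNF violation until none remain."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            x_plus = closure(x, fds) & schema  # what x determines inside this schema
            if x_plus != x and x_plus != schema:
                # x determines more than itself but is not a superkey here: violation.
                return (bcnf_decompose(x_plus, fds)
                        + bcnf_decompose(schema - (x_plus - x), fds))
    return [schema]


schema = {"SKU", "SKU_Description", "Department", "Buyer"}
fds = [
    ({"SKU"}, {"SKU_Description", "Department", "Buyer"}),
    ({"SKU_Description"}, {"SKU", "Department", "Buyer"}),
    ({"Buyer"}, {"Department"}),
]
for rel in bcnf_decompose(schema, fds):
    print(sorted(rel))
# ['Buyer', 'Department']
# ['Buyer', 'SKU', 'SKU_Description']
```

As the NewFac example notes, this procedure guarantees BCNF but not that every original FD survives within a single relation.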

67 Putting a Relation into BCNF: New Relations

68 Multivalued Dependencies
A multivalued dependency occurs when a determinant determines a particular set of values:
Employee →→ Degree
Employee →→ Sibling
PartKit →→ Part
The determinant of a multivalued dependency can never be a primary key.

69 Multivalued Dependencies

70 Eliminating Anomalies from Multivalued Dependencies
Multivalued dependencies are not a problem if they are in a separate relation, so:
– Always put multivalued dependencies into their own relation
– This is known as Fourth Normal Form (4NF)
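To see why Employee →→ Degree and Employee →→ Sibling belong in separate relations, note that a single combined table must pair every degree with every sibling; otherwise it implies facts that were never stated. Adding one sibling then means adding one row per degree. The Python sketch below uses a hypothetical employee and values to compare the combined table with the two 4NF relations, and checks that joining the two relations on Employee reproduces the combined table.

```python
from itertools import product

# Hypothetical data: one employee's degrees and siblings are independent facts.
degrees = {"Jones": ["BS", "MBA"]}
siblings = {"Jones": ["Fred", "Sally", "Frank"]}

# A combined EMPLOYEE_DEGREE_SIBLING table needs every degree/sibling pairing.
combined = [(e, d, s) for e in degrees for d, s in product(degrees[e], siblings[e])]

# 4NF: each multivalued dependency gets its own relation.
emp_degree = [(e, d) for e, ds in degrees.items() for d in ds]
emp_sibling = [(e, s) for e, ss in siblings.items() for s in ss]

# Joining the two 4NF relations on Employee gives back the combined table.
rejoined = [(e, d, s) for e, d in emp_degree for e2, s in emp_sibling if e == e2]

print(len(combined), len(emp_degree) + len(emp_sibling))  # 6 5
print(sorted(rejoined) == sorted(combined))  # True
```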

71 David M. Kroenke’s Database Processing: Fundamentals, Design, and Implementation (10th Edition)
End of Presentation: Chapter Three

