Normalization CSC 3800 Fall 2008.

Slides:



Advertisements
Similar presentations
Designing MS-Access Tables
Advertisements

Normalization By Jason Park Fall 2005 CS157A. Database Normalization Database normalization is the process of removing redundant data from your tables.
BUSINESS DRIVEN TECHNOLOGY Plug-In T4 Designing Database Applications.
Normalization What is it?
Racoosin Database Normalization TJ Racoosin 2 Dec 1998 CPCUG Access SIG.
+ Review: Normalization and data anomalies CSCI 2141 W2013 Slide set modified from courses.ischool.berkeley.edu/i257/f06/.../Lecture06_257.ppt.
Functional Dependency CS157a Sec. 2 Koichiro Hongo.
Normalization of Database Tables
Chapter 8 Normal Forms Based on Functional Dependencies Deborah Costa Oct 18, 2007.
Database Design Conceptual –identify important entities and relationships –determine attribute domains and candidate keys –draw the E-R diagram Logical.
Database Normalization Il-Han Yoo CS 157A Professor: Sin-Min Lee.
1 © Prentice Hall, 2002 Chapter 5: Logical Database Design and the Relational Model Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B.
Normalization of Database Tables
SLIDE 1IS 257 – Fall 2004 Database Design: Normalization and The Relational Model University of California, Berkeley School of Information.
Chapter 5 Normalization of Database Tables
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 5 Normalization of Database Tables.
Why Normalization? To Reduce Redundancy to 1.avoid modification, insertion, deletion anomolies 2.save space Goal: One Fact in One Place.
Bad DB Design Duplicate of data Duplicate of data Updating Updating Deleting Deleting.
Normalization.
Normalization of Tables “Between two evils, choose neither; between two goods, choose both.” Tryon Edwards.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Week 6 Lecture Normalization
DAY 15: ACCESS CHAPTER 2 Larry Reaves October 7,
Relational Database Concepts. Let’s start with a simple example of a database application Assume that you want to keep track of your clients’ names, addresses,
Normalization. Database Normalization Database normalization is the process of removing redundant data from your tables in to improve storage efficiency,
Cambridge TEC - Level 3 Certificate/Diploma IT. ICT Dept ScenarioLO1LO2LO3.
MIS 301 Information Systems in Organizations Dave Salisbury ( )
Database Systems: Design, Implementation, and Management Tenth Edition
RDBMS Concepts/ Session 3 / 1 of 22 Objectives  In this lesson, you will learn to:  Describe data redundancy  Describe the first, second, and third.
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 6 Normalization of Database Tables.
1 DATABASE SYSTEMS DESIGN IMPLEMENTATION AND MANAGEMENT INTERNATIONAL EDITION ROB CORONEL CROCKETT Chapter 7 Normalisation.
Normalization (Codd, 1972) Practical Information For Real World Database Design.
Concepts of Relational Databases. Fundamental Concepts Relational data model – A data model representing data in the form of tables Relations – A 2-dimensional.
BIS Database Systems School of Management, Business Information Systems, Assumption University A.Thanop Somprasong Chapter # 5 Normalization of Database.
SALINI SUDESH. Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of.
In this chapter, you learn about the following: ❑ Anomalies ❑ Dependency and determinants ❑ Normalization ❑ A layman’s method of understanding normalization.
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall, Modified by Dr. Mathis 3-1 David M. Kroenke’s Chapter Three: The Relational.
CORE 2: Information systems and Databases NORMALISING DATABASES.
MS Access: Creating Relational Databases Instructor: Vicki Weidler Assistant: Joaquin Obieta.
Copyright 2008 McGraw-Hill Ryerson 1 TECHNOLOGY PLUG-IN T5 DESIGNING DATABASE APPLICATIONS.
Normalization Well structured relations and anomalies Normalization First normal form (1NF) Functional dependence Partial functional dependency Second.
1 A Guide to MySQL 2 Database Design Fundamentals.
1 5 Normalization. 2 5 Database Design Give some body of data to be represented in a database, how do we decide on a suitable logical structure for that.
M1G Introduction to Database Development 4. Improving the database design.
ITN Table Normalization1 ITN 170 MySQL Database Programming Lecture 3 :Database Analysis and Design (III) Normalization.
GIS Data Models GEOG 370 Christine Erlien, Instructor.
In this session, you will learn to: Describe data redundancy Describe the first, second, and third normal forms Describe the Boyce-Codd Normal Form Appreciate.
Programming Logic and Design Fourth Edition, Comprehensive Chapter 16 Using Relational Databases.
Normalisation RELATIONAL DATABASES.  Last week we looked at elements of designing a database and the generation of an ERD  As part of the design and.
Logical Database Design and the Relational Model.
IST Database Normalization Todd Bacastow IST 210.
NormalisationNormalisation Normalization is the technique of organizing data elements into records. Normalization is the technique of organizing data elements.
Logical Database Design and Relational Data Model Muhammad Nasir
2006­02-08 | Database Normalization | © MySQL AB 2006 | 1 An Introduction to Database Normalization Mike Hillyer – MySQL AB.
SLIDE 1IS 257 – Fall 2006 Normalization Normalization theory is based on the observation that relations with certain properties are more effective.
Database Planning Database Design Normalization.
MS Access. Most A2 projects use MS Access Has sufficient depth to support a significant project. Relational Databases. Fairly easy to develop a good user.
Lecture # 17 Chapter # 10 Normalization Database Systems.
1 CS490 Database Management Systems. 2 CS490 Database Normalization.
NORMALISATION OF DATABASES. WHAT IS NORMALISATION? Normalisation is used because Databases need to avoid have redundant data, which makes it inefficient.
Database, tables and normal forms
Normalization Karolina muszyńska
INFORMATION TECHNOLOGY – INT211
Database Normalization
Chapter 5: Logical Database Design and the Relational Model
Normalization By Jason Park Fall 2005 CS157A.
Chapter 4.1 V3.0 Napier University Dr Gordon Russell
Normalization Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting.
Normalization By Jason Park Fall 2005 CS157A.
Normalisation 1 Unit 3.1 Dr Gordon Russell, Napier University
Presentation transcript:

Normalization CSC 3800 Fall 2008

Database Normalization Database normalization is the process of removing redundant data from your tables to improve storage efficiency, data integrity, and scalability. In the relational model, methods exist for quantifying how efficient a database is. These classifications are called normal forms (or NF), and there are algorithms for converting a given database between them. Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.

History Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form in his paper A Relational Model of Data for Large Shared Data Banks Codd stated: “There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition nonsimple domains are replaced by ‘domains whose elements are atomic (nondecomposable) values.’”

Normal Form Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF. There are now others that are generally accepted, but 3NF is widely considered to be sufficient for most applications. Most tables when reaching 3NF are also in BCNF (Boyce-Codd Normal Form).

Why Normalize? Flexibility Data Integrity Efficiency Structure supports many ways to look at the data Data Integrity “Modification Anomalies” Deletion Insertion Update Efficiency Eliminate redundant data and save space The data structure should be able to answer questions about the data. Using the Northwind example, we can ask, how many customers are in this region? What is the aggregate buying profile. What time of year do they tend to buy? The database design should also help keep as accurate of data as possible. A deletion anomaly occurs when one piece of data is unintentionally deleted along with it. Insertion anomaly occurs when a typo means adding a new something unintentionally rather than relating it to an existing group. When you correct data in one place, but it stays wrong other places, it implies an Update anomaly. A good example of efficiency is pulling up city and state information by typing in a zip code. You’ve probably experienced this when placing a phone order for catalog items. This allows storing just 5 characters versus at least 10 more with city and state added.

Normalization Defined “In relational database design, the process of organizing data to minimize duplication. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.” - Webopedia, http://webopedia.internet.com/TERM/n/normalization.html This is my favorite definition which I got off Webopiedia Remember when I said normalization would save coding time? The last item there brings it home. Change one and propagate through all.

The Normal Forms A series of logical steps to take to normalize data tables First Normal Form Second Third Boyce Codd There’s more, but beyond scope of this class

Un-normalized Table or

First Normal Form Remove horizontal redundancies No two columns hold the same information No single column holds more than a single item Each row must be unique Use a primary key Benefits Easier to query/sort the data More scalable Each row can be identified for updating Back to original table, our phone columns are redundant, our name field holds more than one piece of info, we have redundant email addresses. And even the cell and pager info is redundant in the sense that they are both phone numbers to reach you at.

First Normal Form All columns (fields) must be atomic Means : no repeating items in columns Solution: make a separate table for each set of attributes with a primary key (parser, append query) Customers CustomerID Name Orders OrderID Item CustomerID OrderDate

First Normal Form Tables Customers Orders

Second Normal Form (2NF) In 1NF and every non-key column is fully dependent on the (entire) primary key Means : Do(es) the key field(s) imply the rest of the fields? Do we need to know both OrderID and Item to know the Customer and Date? Clue: repeating fields Orders

Second Normal Form Table must be in First Normal Form Remove vertical redundancy The same value should not repeat across rows Composite keys All columns in a row must refer to BOTH parts of the key Benefits Increased storage efficiency Less data repetition So, we need to remove the vertical redundancy of the company name, and the type column in the joining table violates 2NF, the type has more to do with the phone line than with the user and phone together.

Second Normal Form (2NF) In 1NF and every non-key column is fully dependent on the (entire) primary key Means : Do(es) the key field(s) imply the rest of the fields? Do we need to know both OrderID and Item to know the Customer and Date? Clue: repeating fields Solution: Remove to a separate table (Make Table) Orders OrderID CustomerID OrderDate OrderDetails OrderID Item

Second Normal Form Tables Customers Orders OrderDetails

Third Normal Form (3NF) In 2NF and every non-key column is mutually independent means : Calculations

Third Normal Form Table must be in Second Normal Form If your table is 2NF, there is a good chance it is 3NF All columns must relate directly to the primary key Benefits No extraneous data

Third Normal Form (3NF) In 2NF and every non-key column is mutually independent means : Calculations Solution: Put calculations in queries and forms OrderDetails OrderID Item Quantity Price Put expression in text control or in query: =Quantity * Price

Third Normal Form (3NF) OrderDetails Put expression in text control or in query: =Quantity * Price SELECT OrderID, Item, Quantity, Price, Price*Quantity FROM OrderDetails

Boyce-Codd Form (3NF) - Examples Kumar Madurai: http://www.mgt.buffalo.edu/courses/mgs/404/mfc/lecture4.ppt Boyce-Codd Form (3NF) - Examples A more restricted version of 3NF (known as Boyce-Codd Normal Form) requires that the determinant of every functional dependency in a relation be a key - for every FD: X => Y, X is a key Consider the following relation: STU-MAJ-ADV (Student-Id, Major, Advisor) Advisor => Major, but Advisor is not a key Boyce-Codd Normal Form for above: STU-ADV (Student-Id, Advisor) ADV-MAJ (Advisor, Major) 2/16/98 2/16/98 10 10 MGS 404 8 9

Primary Key Unique Identifier for every row in the table Integers vice Text to save memory, increase speed Can be “composite” Surrogate is best bet! Meaningless, numeric column acting as primary key in lieu of something like SSN or phone number - (both can be reissued!)

Relationships One to many to enforce “Referential Integrity” I can “auto add” to the lookup table usually Two “foreign” keys make a composite primary key and “relate” many to many tables A look up table - it doesn’t reference any others

Table Prefixes Aid Development First, we’ll get replace text PK with number The Items table is a “look up” with tlkp prefix tlkp “lookup” table (no “foreign keys”) OrderDetails is renamed “trelOrderItem” a “relational” table trel “relational” (or junction or linking) two foreign keys make a primary OrderDetails OrderID Item tblOrders OrderID CustomerID OrderDate trelOrderItem OrderID ItemID tlkpItems ItemID ItemName

Referential Integrity Every piece of “foreign” key data has a primary key on the one site of the relationship No “orphan” records. Every child has a parent Can’t delete records from primary table if in related table Benefits - Data Integrity and Propagation If update fields in main table, reflected in all queries Can’t add a record in related table without adding it to main Cascade Delete: If delete record from primary table, all children deleted - use with care! Better idea to “archive” Cascade Update: If change the primary key field, will change foreign key

When Not to Normalize Want to keep tables simple so user can make their own queries Avoid processing multiple tables Archiving Records If no need to perform complex queries or “resurrect” Flatten and store in one or more tables Testing shows Normalization has poorer performance “Sounds Like” field example Can also try temp tables produced from Make Table queries

Real World - School Data Student Student Previous Current Last First Parent 1 Parent 2 Teacher Teacher Smith Renee Ann Jones Theodore Smith Hamil Burke Mills Lucy Barbara Mills Steve Mills Hamil Burke Jones Brendan Jennifer Jones Stephen Jones Hamil Burke …. Street Address City State Postal Code Home Phone 5551 Private Hill Annandale Virginia 22003- (703) 323-0893 4902 Acme Ct Annandale Virginia 22003- (703) 764-5829 5304 Gains Street Fairfax Virginia 22032- (703) 978-1083 …. Why this is important Here is an example of a database that I had to normalize. Using this skill helps make sense of the data. That saves coding time, so it’s more than an academic exercise. Here’s what I’ll cover... First Year Last Year Age Program Enrolled Attended Birthday inSept Map Coord Notes PF / 0 0 6/25/93 5 22 A-3 PF 96/97 0 8/14/93 5 21 F-3 PH 96/97 0 6/13/94 4 21 A-4

One Possible Design Limitations: What if students switch classes mid-year? Business Rules aren’t taken care of in Normalization - watch the assumptions!

Examples

1. Eliminate Repeating Groups In the original member list, each member name is followed by any databases that the member has experience with. Some might know many, and others might not know any. To answer the question, "Who knows DB2?" we need to perform an awkward scan of the list looking for references to DB2. This is inefficient and an extremely untidy way to store information. Moving the known databases into a seperate table helps a lot. Separating the repeating groups of databases from the member information results in first normal form. The MemberID in the database table matches the primary key in the member table, providing a foreign key for relating the two tables with a join operation. Now we can answer the question by looking in the database table for "DB2" and getting the list of members. 1NF Tables Original Table

2. Eliminate Redundant Data In the Database Table, the primary key is made up of the MemberID and the DatabaseID. This makes sense for other attributes like "Where Learned" and "Skill Level" attributes, since they will be different for every member/database combination. But the database name depends only on the DatabaseID. The same database name will appear redundantly every time its associated ID appears in the Database Table. Suppose you want to reclassify a database - give it a different DatabaseID. The change has to be made for every member that lists that database! If you miss some, you'll have several members with the same database under different IDs. This is an update anomaly. Or suppose the last member listing a particular database leaves the group. His records will be removed from the system, and the database will not be stored anywhere! This is a delete anomaly. To avoid these problems, we need second normal form. To achieve this, separate the attributes depending on both parts of the key from those depending only on the DatabaseID. This results in two tables: "Database" which gives the name for each DatabaseID, and "MemberDatabase" which lists the databases for each member. Now we can reclassify a database in a single operation: look up the DatabaseID in the "Database" table and change its name. The result will instantly be available throughout the application.

3. Eliminate Columns Not Dependent On Key The Member table satisfies first normal form - it contains no repeating groups. It satisfies second normal form - since it doesn't have a multivalued key. But the key is MemberID, and the company name and location describe only a company, not a member. To achieve third normal form, they must be moved into a separate table. Since they describe a company, CompanyCode becomes the key of the new "Company" table. The motivation for this is the same for second normal form: we want to avoid update and delete anomalies. For example, suppose no members from the IBM were currently stored in the database. With the previous design, there would be no record of its existence, even though 20 past members were from IBM!

BCNF. Boyce-Codd Normal Form Boyce-Codd Normal Form states mathematically that: A relation R is said to be in BCNF if whenever X -> A holds in R, and A is not in X, then X is a candidate key for R. BCNF covers very specific situations where 3NF misses inter-dependencies between non-key (but candidate key) attributes. Typically, any relation that is in 3NF is also in BCNF. However, a 3NF relation won't be in BCNF if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are common attributes between the keys. Basically, a humorous way to remember BCNF is that all functional dependencies are: "The key, the whole key, and nothing but the key, so help me Codd."

Example 2

Example 2 The First Normal Form For a table to be in first normal form, data must be broken up into the smallest units possible. For example, the following table is not in first normal form. Name Address Phone Sally Singer 123 Broadway New York, NY, 11234 (111) 222-3345 Jason Jumper 456 Jolly Jumper St. Trenton NJ, 11547 (222) 334-5566

Example 2 To conform to first normal form, this table would require additional fields. The name field should be divided into first and last name and the address should be divided by street, city state, and zip like this. ID First Last Street City State Zip Phone 564 Sally Singer 123 Broadway New York NY 11234 (111) 222-3345 565 Jason Jumper 456 Jolly Jumper St. Trenton NJ 11547 (222) 334-5566

Example 2 In addition to breaking data up into the smallest meaningful values, tables in first normal form should not contain repetitions groups of fields such as in the following table. Rep ID Representative Client 1 Time 1 Client 2 Time 2 Client 3 Time 3 TS-89 Gilroy Gladstone US Corp. 14 hrs Taggarts 26 hrs Kilroy Inc. 9 hrs RK-56 Mary Mayhem Italiana 67 hrs Linkers 2 hrs  

Example 2 The problem here is that each representative can have multiple clients not all will have three. Some may have less as is the case in the second record, tying up storage space in your database that is not being used, and some may have more, in which case there are not enough fields. The solution to this is to add a record for each new piece of information. Rep ID Rep First Name Rep Last Name Client Time With Client TS-89 Gilroy Gladstone US Corp 14 hrs Taggarts 26 hrs Kilroy Inc. 9 hrs RK-56 Mary Mayhem Italiana 67 hrs Linkers 2 hrs Notice the splitting of the first and last name fields again.

Example 2 This table is now in first normal form. Note that by avoiding repeating groups of fields, we have created a new problem in that there are identical values in the primary key field, violating the rules of the primary key. In order to remedy this, we need to have some other way of identifying each record. This can be done with the creation of a new key called client ID. Rep ID* Rep First Name Rep Last Name Client ID* Client Time With Client TS-89 Gilroy Gladstone 978 US Corp 14 hrs 665 Taggarts 26 hrs 782 Kilroy Inc. 9 hrs RK-56 Mary Mayhem 221 Italiana 67 hrs 982 Linkers 2 hrs This new field can now be used in conjunction with the Rep ID field to create a multiple field primary key. This will prevent confusion if ever more than one Representative were to serve a single client.

Example 2 Second Normal Form The second normal form applies only to tables with multiple field primary keys.  Take the following table for example. Rep ID* Rep First Name Rep Last Name Client ID* Client Time With Client TS-89 Gilroy Gladstone 978 US Corp 14 hrs 665 Taggarts 26 hrs 782 Kilroy Inc. 9 hrs RK-56 Mary Mayhem 221 Italiana 67 hrs 982 Linkers 2 hrs RK-56  Mary  4 hrs This table is already in first normal form.  It has a primary key consisting of Rep ID and Client ID since neither alone can be considered a unique value.  

Example 2 The second normal form states that each field in a multiple field primary key table must be directly related to the entire primary key. Or in other words, each non-key field should be a fact about all the fields in the primary key. Only fields that are absolutely necessary should show up in our table, all other fields should reside in different tables.  In order to find out which fields are necessary we should ask a few questions of our database.  In our preceding example, I should ask the question "What information is this table meant to store?" Currently, the answer is not obvious. It may be meant to store information about individual clients,  or it could be holding data for employees time cards.  As a further example, if my database is going to contain records of employees I may want a table of demographics and a table for payroll.  The demographics will have all the employees personal information and will assign them an ID number.  I should not have to enter the data twice, the payroll table on the other hand should refer to each employee only by their ID number. I can then link the two tables by a relationship and will then have access to all the necessary data.  

Example 2 In the table of the preceding example we are devoting three field to the identification of the employee and two to the identification of the client.  I could identify them with only one field each -- the primary key.  I can then take out the extraneous fields and put them in their own table.  For example,  my database would then look like the following. Rep ID* Client ID* Time With Client TS-89 978 14 hrs 665 26 hrs 782 9 hrs RK-56 221 67 hrs 982 2 hrs 4 hrs The above table contains time card information.

Example 2 Rep ID* First Name Last Name TS-89 Gilroy Gladstone RK-56 Mary Mayhem The above table contains Employee Information.

Example 2 The above table contains Client Information Client ID* Client Name 978 US Corp 665 Taggarts 782 Kilroy Inc. 221 Italiana 982 Linkers The above table contains Client Information These tables are now in normal form.  By splitting off the unnecessary information and putting it in its own tables, we have eliminated redundancy and put our first table in second normal form.  These tables are now ready to be linked through relationship to each other. 

Example 2 Third Normal Form Third normal form is the same as second normal form except that it only refers to tables that have a single field as their primary key.  In other words, each non-key field in the table should be a fact about the primary key. Either of the preceding two tables act as an example of third normal form since all the fields in each table are necessary to describe the primary key.  Once all the tables in a database have been taken through the third normal form, we can begin to set up relationships. 

Example 3

Repeating Groups And Normalization To First Normal Form (1nf) SALES-INFORMATION Invoice# Date Customer# Salesperson Region Item# Description Price Quantity 1001 1002 1003 7/1/92 7/1/92 7/1/92 456 329 897 John Mary Al West East West 121 348 540 Widget Gear Bolt $2.25 $3.70 $0.40 45 10 5 INVOICE-ITEMS (1NF) Invoice# Item# Description Price Quantity INVOICES (2NF) Invoice# Date Customer# Salesperson Region

What Is The Problem With Description/Price? Insert anomalies Delete anomalies Update anomalies

Decomposition Of A First-normal-form (1nf) Table INVOICE-ITEMS (1NF) Invoice# Item# Description Price Quantity ITEMS (2NF) INVOICE-ITEMS-QTY (2NF) Item# Description Price Invoice# Item# Quantity You can only have a 2nd Normal Form problem if there is a composite primary Key

Database Normalization Functional dependency is key in understanding the process of normalization. Functional dependency means that if there is only one possible value of Y for every value of X, then Y is functionally dependent on X.

Database Normalization Think of an invoice table. Two fields would be invoice # and date. Which field is functionally dependent on the other? INVOICE # DATE Date is functionally dependent on invoice number.

Dependencies Functional Dependency is “good”. With functional dependency the primary key (Attribute A) determines the value of all the other non-key attributes (Attributes B,C,D,etc.) Transitive dependency is “bad”. Transitive dependency exists if the primary key (Attribute A) determines non-key Attribute B, and Attribute B determines non-key Attribute C.

Decomposition Of A Second-normal-form (2nf) Table SALES (2NF) Invoice# Date Customer# Salesperson Region This is a transitive dependency which must be eliminated for 3NF INVOICES (3NF) SALESPERSON-REGION (3NF) Invoice# Date Customer# Salesperson Salesperson Region

Summary Of 3nf Relations For Sales Database INVOICES (3NF) SALESPERSON-REGION (3NF) Invoice# Date Customer# Salesperson Salesperson Region 1001 1002 1003 7/1/92 7/1/92 7/1/92 456 329 897 John Mary Al John Mary Al West East West INVOICE-ITEMS-QTY (3NF) ITEMS (3NF) Invoice# Item# Quantity Item# Description Price 1001 1002 1003 121 348 540 45 10 5 121 348 540 Widget Gear Bolt $2.25 $3.70 $0.40

END

Other Slides

Table 1 Title Author1 Author2 ISBN Subject Pages Publisher Database System Concepts Abraham Silberschatz Henry F. Korth 0072958863 MySQL, Computers 1168 McGraw-Hill Operating System Concepts 0471694665 Computers 944

Orders Table Problems This table is not very efficient with storage. This design does not protect data integrity. Third, this table does not scale well.

First Normal Form In our Table, we have two violations of First Normal Form: First, we have more than one author field, Second, our subject field contains more than one piece of information. With more than one value in a single field, it would be very difficult to search for all books on a given subject.

First Normal Table Table 2 Title Author ISBN Subject Pages Publisher Database System Concepts Abraham Silberschatz 0072958863 MySQL 1168 McGraw-Hill Henry F. Korth Computers Operating System Concepts 0471694665 944

Additional Problems We now have two rows for a single book. Additionally, we would be violating the Second Normal Form… A better solution to our problem would be to separate the data into separate tables- an Author table and a Subject table to store our information, removing that information from the Book table:

Second Normal Tables Subject Table Author Table Book Table Subject_ID 1 MySQL 2 Computers Author_ID Last Name First Name 1 Silberschatz Abraham 2 Korth Henry Book Table ISBN Title Pages Publisher 0072958863 Database System Concepts 1168 McGraw-Hill 0471694665 Operating System Concepts 944

Additional Problems Each table has a primary key, used for joining tables together when querying the data. A primary key value must be unique with in the table (no two books can have the same ISBN number), and a primary key is also an index, which speeds up data retrieval based on the primary key. Now to define relationships between the tables

Relationships Book_Author Table Book_Subject Table ISBN Author_ID 0072958863 1 2 0471694665 ISBN Subject_ID 0072958863 1 2 0471694665

Second Normal Form As the First Normal Form deals with redundancy of data across a horizontal row, Second Normal Form (or 2NF) deals with redundancy of data in vertical columns. As stated earlier, the normal forms are progressive, so to achieve Second Normal Form, the tables must already be in First Normal Form. The Book Table will be used for the 2NF example

2NF Table Publisher Table Book Table Publisher_ID Publisher Name 1 McGraw-Hill Book Table ISBN Title Pages Publisher_ID 0072958863 Database System Concepts 1168 1 0471694665 Operating System Concepts 944

2NF Here we have a one-to-many relationship between the book table and the publisher. A book has only one publisher, and a publisher will publish many books. When we have a one-to-many relationship, we place a foreign key in the Book Table, pointing to the primary key of the Publisher Table. The other requirement for Second Normal Form is that you cannot have any data in a table with a composite key that does not relate to all portions of the composite key.

Third Normal Form Third normal form (3NF) requires that there are no functional dependencies of non-key attributes on something other than a candidate key. A table is in 3NF if all of the non-primary key attributes are mutually independent There should not be transitive dependencies

Boyce-Codd Normal Form BCNF requires that the table is 3NF and only determinants are the candidate keys