1 Data Normalization Text book Chapter 3: Jerry Post Copyright © 2003.


1 Data Normalization. Textbook Chapter 3: Jerry Post. Copyright © 2003

2 Introduction
A database is a powerful tool. It provides many advantages over traditional programming. However, you get these advantages only if you design the database correctly.

3 What is data normalization?
Normalization splits your data into several tables that are connected to each other through the data they contain.
Before data can be normalized, you must understand the business rules.
Your tables must match the business rules.

4 Primary and composite keys
Primary key: a column that uniquely identifies a row in a table, e.g. Iqama Number, Saudi ID, etc.
Composite key: a primary key that is made up of more than one column.

5 Identifying Key Columns
Orders
  OrderID  Date    Customer
  8367     5-5-04  6794
  8368     5-6-04  9263
OrderItems
  OrderID  Item  Quantity
  8367     229   2
  8367     253   4
  8367     876   1
  8368     555   4
  8368     229   1
Each order has only one customer, so Customer is not part of the key.
Each order has many items, and each item can appear on many orders, so OrderID and Item are both part of the key.
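
The key choices on this slide can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than the DBMS products named in the deck, and the row values follow one reading of the run-together slide data, so treat both as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (
    OrderID  INTEGER PRIMARY KEY,  -- one customer per order: OrderID alone is the key
    Date     TEXT,
    Customer INTEGER
);
CREATE TABLE OrderItems (
    OrderID  INTEGER,
    Item     INTEGER,
    Quantity INTEGER,
    PRIMARY KEY (OrderID, Item)    -- composite key: both columns identify a row
);
""")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(8367, "5-5-04", 6794), (8368, "5-6-04", 9263)])
# Item 229 appears on two different orders, which the composite key allows.
conn.executemany("INSERT INTO OrderItems VALUES (?, ?, ?)",
                 [(8367, 229, 2), (8367, 253, 4), (8367, 876, 1),
                  (8368, 555, 4), (8368, 229, 1)])

# Repeating the same (OrderID, Item) pair violates the composite key.
try:
    conn.execute("INSERT INTO OrderItems VALUES (8367, 229, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

item_rows = conn.execute("SELECT COUNT(*) FROM OrderItems").fetchone()[0]
```

If Customer were also keyed in Orders, the same order number could be stored with several customers, which contradicts the one-customer-per-order rule.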

6 Identifying Key Columns
If you are uncertain about which columns to key, write them down and evaluate the business rules.
Example: OrderID, CustomerID
For a given order, can there ever be more than one customer? If yes, then key CustomerID. In most businesses there is only one customer per order, so do not key it.
For a given customer, can there ever be more than one order? If yes, then key OrderID; otherwise, do not key it. All businesses hope to get more than one order from a customer, so OrderID must be keyed.

7 Surrogate Keys
Real-world keys sometimes cause problems in a database.
Example: Customer. Avoid phone numbers: people may not notify you when their numbers change.
It is often best to let the DBMS generate unique values:
  Access: AutoNumber
  SQL Server: Identity
  Oracle: Sequences (but these require additional programming)
Drawback: the numbers are not related to any business data, so the application needs to hide them and provide other lookup mechanisms.
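
As a rough illustration of a DBMS-generated key, the sketch below uses SQLite's AUTOINCREMENT through Python's sqlite3 module. SQLite is not one of the products listed on the slide, and the `find_by_name` helper and the replacement phone number are hypothetical, added only to show a lookup that hides the generated number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key generated by the DBMS
    LastName   TEXT,
    Phone      TEXT                                -- ordinary data, free to change
)""")

# The application never supplies CustomerID; the database assigns it.
first_id = conn.execute(
    "INSERT INTO Customer (LastName, Phone) VALUES (?, ?)",
    ("Washington", "502-777-7575")).lastrowid
second_id = conn.execute(
    "INSERT INTO Customer (LastName, Phone) VALUES (?, ?)",
    ("Lasater", "615-888-4474")).lastrowid

# A phone number change (made-up value) leaves the key untouched.
conn.execute("UPDATE Customer SET Phone = ? WHERE CustomerID = ?",
             ("502-000-0000", first_id))

def find_by_name(last_name):
    """Hypothetical lookup so users search by name, not by the generated number."""
    return conn.execute(
        "SELECT CustomerID, Phone FROM Customer WHERE LastName = ?",
        (last_name,)).fetchall()
```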

8 Problems with Repeating Sections
RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent))
  TransID  RentDate  CustomerID  LastName    Phone         Address             VideoID  Copy#  Title                  Rent
  1        4/18/04   3           Washington  502-777-7575  95 Easy Street      1        2      2001: A Space Odyssey  $1.50
  1        4/18/04   3           Washington  502-777-7575  95 Easy Street      6        3      Clockwork Orange       $1.50
  2        4/30/04   7           Lasater     615-888-4474  67 S. Ray Drive     8        1      Hopscotch              $1.50
  2        4/30/04   7           Lasater     615-888-4474  67 S. Ray Drive     2        1      Apocalypse Now         $2.00
  2        4/30/04   7           Lasater     615-888-4474  67 S. Ray Drive     6        1      Clockwork Orange       $1.50
  3        4/18/04   8           Jones       615-452-1162  867 Lakeside Drive  9        1      Luggage Of The Gods    $2.50
  3        4/18/04   8           Jones       615-452-1162  867 Lakeside Drive  15       1      Fabulous Baker Boys    $2.00
  3        4/18/04   8           Jones       615-452-1162  867 Lakeside Drive  4        1      Boy And His Dog        $2.50
  4        4/18/04   3           Washington  502-777-7575  95 Easy Street      3        1      Blues Brothers         $2.00
  4        4/18/04   3           Washington  502-777-7575  95 Easy Street      8        1      Hopscotch              $1.50
  4        4/18/04   3           Washington  502-777-7575  95 Easy Street      13       1      Surf Nazis Must Die    $2.50
  4        4/18/04   3           Washington  502-777-7575  95 Easy Street      17       1      Witches of Eastwick    $2.00
The repeating section (VideoID, Copy#, Title, Rent) causes duplication.
Storing data in this raw form would not work very well. For example, repeating sections will cause problems. Note the duplication of data. Also, what if a customer has not yet checked out a movie: where do we store that customer's data?

9 First Normal Form Problems (Data)
  TransID  RentDate  CustID  Phone         LastName    FirstName  Address             City               State  ZipCode
  1        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
  2        4/30/04   7       615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
  3        4/18/04   8       615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
  4        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171

  TransID  VideoID  Copy#  Title                  Rent
  1        1        2      2001: A Space Odyssey  $1.50
  1        6        3      Clockwork Orange       $1.50
  2        8        1      Hopscotch              $1.50
  2        2        1      Apocalypse Now         $2.00
  2        6        1      Clockwork Orange       $1.50
  3        9        1      Luggage Of The Gods    $2.50
  3        15       1      Fabulous Baker Boys    $2.00
  3        4        1      Boy And His Dog        $2.50
  4        3        1      Blues Brothers         $2.00
  4        8        1      Hopscotch              $1.50
  4        13       1      Surf Nazis Must Die    $2.50
  4        17       1      Witches of Eastwick    $2.00
1NF splits repeating groups.
Still have problems:
  Replication
  Hidden dependency: if a video has not been rented yet, then what is its title?
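
The 1NF split can be sketched in plain Python. The nested dictionary layout and the function name `to_first_normal_form` are assumptions made for illustration; only the first two rental forms from the slide data are included.

```python
# Each form carries a repeating section: the list of rented videos.
rental_forms = [
    {"TransID": 1, "RentDate": "4/18/04", "CustomerID": 3,
     "Videos": [(1, 2, "2001: A Space Odyssey", 1.50),
                (6, 3, "Clockwork Orange", 1.50)]},
    {"TransID": 2, "RentDate": "4/30/04", "CustomerID": 7,
     "Videos": [(8, 1, "Hopscotch", 1.50),
                (2, 1, "Apocalypse Now", 2.00),
                (6, 1, "Clockwork Orange", 1.50)]},
]

def to_first_normal_form(forms):
    """Split the repeating section into its own table.

    The parent key (TransID) is copied into every child row so the
    two tables can be recombined later.
    """
    rental_rows, line_rows = [], []
    for form in forms:
        rental_rows.append((form["TransID"], form["RentDate"], form["CustomerID"]))
        for video_id, copy_no, title, rent in form["Videos"]:
            line_rows.append((form["TransID"], video_id, copy_no, title, rent))
    return rental_rows, line_rows

rental_rows, line_rows = to_first_normal_form(rental_forms)
```

The title and rent still repeat in `line_rows` (Clockwork Orange appears twice), which is the replication problem the slide points out.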

10 Second Normal Form
A relation is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key.

11 Second Normal Form Example (Data)
VideosRented(TransID, VideoID, Copy#)
  TransID  VideoID  Copy#
  1        1        2
  1        6        3
  2        2        1
  2        6        1
  2        8        1
  3        4        1
  3        9        1
  3        15       1
  4        3        1
  4        8        1
  4        13       1
  4        17       1
Videos(VideoID, Title, Rent)
  VideoID  Title                        Rent
  1        2001: A Space Odyssey        $1.50
  2        Apocalypse Now               $2.00
  3        Blues Brothers               $2.00
  4        Boy And His Dog              $2.50
  5        Brother From Another Planet  $2.00
  6        Clockwork Orange             $1.50
  7        Gods Must Be Crazy           $2.00
  8        Hopscotch                    $1.50
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode) (unchanged)

12 Second Normal Form Example
RentalLine(TransID, VideoID, Copy#, Title, Rent) is split into:
  VideosRented(TransID, VideoID, Copy#)
  Videos(VideoID, Title, Rent)
Title depends only on VideoID: each VideoID can have only one title.
Rent depends on VideoID. This statement is actually a business rule, and it might be different at different stores. Some stores might charge a different rent for each video depending on the day (or time).
After the split, each non-key column depends on the whole key of its table.
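
A minimal Python sketch of this split follows. The function name `split_second_normal_form` is hypothetical, and the consistency check simply encodes the slide's business rule that a VideoID has exactly one title and one rent.

```python
# RentalLine rows: (TransID, VideoID, Copy#, Title, Rent)
rental_line = [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
]

def split_second_normal_form(rows):
    """Move columns that depend only on VideoID into a Videos table."""
    videos = {}          # VideoID -> (Title, Rent)
    videos_rented = []   # (TransID, VideoID, Copy#)
    for trans_id, video_id, copy_no, title, rent in rows:
        # If the same VideoID ever carried a different title or rent,
        # the assumed dependency would not hold for this data.
        if videos.setdefault(video_id, (title, rent)) != (title, rent):
            raise ValueError(f"VideoID {video_id} has conflicting title or rent")
        videos_rented.append((trans_id, video_id, copy_no))
    return videos_rented, videos

videos_rented, videos = split_second_normal_form(rental_line)
```

A store that charges by day or time would fail this check, which signals that Rent belongs in a different table under that store's rules.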

13 Second Normal Form Problems (Data)
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
  TransID  RentDate  CustID  Phone         LastName    FirstName  Address             City               State  ZipCode
  1        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
  2        4/30/04   7       615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
  3        4/18/04   8       615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
  4        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
Even in 2NF, problems remain:
  Replication
  Hidden dependency: if a customer has not rented a video yet, where do we store their personal data?
Solution: split the table.

14 Third Normal Form Definition
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
Each non-key column must depend on nothing but the key.
Some columns depend on columns that are not part of the key. Split those into a new table.
Example: a customer's name does not change for every transaction.
  Depend on TransID: RentDate, CustomerID
  Depend only on CustomerID: Phone, Name, Address, City, State, ZipCode
Dependence (definition): if, given a value for the key, you always know the value of the property in question, then that property is said to depend on the key.

15 Third Normal Form Example Data
Rentals(TransID, RentDate, CustomerID)
  TransID  RentDate  CustomerID
  1        4/18/04   3
  2        4/30/04   7
  3        4/18/04   8
  4        4/18/04   3
Customers(CustomerID, Phone, Name, Address, City, State, ZipCode)
  CustomerID  Phone         LastName    FirstName  Address             City               State  ZipCode
  1           502-666-7777  Johnson     Martha     125 Main Street     Alvaton            KY     42122
  2           502-888-6464  Smith       Jack       873 Elm Street      Bowling Green      KY     42101
  3           502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
  4           502-333-9494  Adams       Samuel     746 Brown Drive     Alvaton            KY     42122
  5           502-474-4746  Rabitz      Victor     645 White Avenue    Bowling Green      KY     42102
  6           615-373-4746  Steinmetz   Susan      15 Speedway Drive   Portland           TN     37148
  7           615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
  8           615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
  9           502-222-4351  Chavez      Juan       673 Industry Blvd.  Caneyville         KY     42721
  10          502-444-2512  Rojo        Maria      88 Main Street      Cave City          KY     42127

16 Third Normal Form Tables (3NF)
Rentals(TransID, RentDate, CustomerID)
Customers(CustomerID, Phone, Name, Address, City, State, ZipCode)
VideosRented(TransID, VideoID, Copy#)
Videos(VideoID, Title, Rent)
Relationships (from the diagram):
  Customers 1 to * Rentals
  Rentals 1 to * VideosRented
  VideosRented * to 1 Videos
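
To check that the four 3NF tables can be recombined into the original rental form, here is a small sketch using Python's sqlite3 module. The column types, the renaming of Copy# to CopyNo, the trimmed Customers columns, and the (TransID, VideoID) key on VideosRented are assumptions; only a subset of the slide data is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Phone TEXT, LastName TEXT);
CREATE TABLE Rentals (
    TransID    INTEGER PRIMARY KEY,
    RentDate   TEXT,
    CustomerID INTEGER REFERENCES Customers (CustomerID)
);
CREATE TABLE Videos (VideoID INTEGER PRIMARY KEY, Title TEXT, Rent REAL);
CREATE TABLE VideosRented (
    TransID INTEGER REFERENCES Rentals (TransID),
    VideoID INTEGER REFERENCES Videos (VideoID),
    CopyNo  INTEGER,                 -- Copy# on the slides
    PRIMARY KEY (TransID, VideoID)   -- assumed key
);
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    (1, "502-666-7777", "Johnson"),   # has no rentals yet, but can still be stored
    (3, "502-777-7575", "Washington"),
    (7, "615-888-4474", "Lasater"),
])
conn.executemany("INSERT INTO Rentals VALUES (?, ?, ?)",
                 [(1, "4/18/04", 3), (2, "4/30/04", 7)])
conn.executemany("INSERT INTO Videos VALUES (?, ?, ?)", [
    (1, "2001: A Space Odyssey", 1.50), (2, "Apocalypse Now", 2.00),
    (6, "Clockwork Orange", 1.50), (8, "Hopscotch", 1.50),
])
conn.executemany("INSERT INTO VideosRented VALUES (?, ?, ?)",
                 [(1, 1, 2), (1, 6, 3), (2, 8, 1), (2, 2, 1), (2, 6, 1)])

# Rejoin the pieces to rebuild the rows of the original rental form.
rental_form_rows = conn.execute("""
    SELECT r.TransID, r.RentDate, c.LastName, v.Title, v.Rent
    FROM Rentals r
    JOIN Customers c     ON c.CustomerID = r.CustomerID
    JOIN VideosRented vr ON vr.TransID = r.TransID
    JOIN Videos v        ON v.VideoID = vr.VideoID
    ORDER BY r.TransID, v.VideoID
""").fetchall()
customer_count = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
```

Customer 1 has no rentals and still has a home in Customers, which addresses the "where do we store their personal data?" question raised for the 2NF design.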

17 3NF Rules/Procedure
Split out repeating sections.
  Be sure to include a key from the parent section in the new piece so the two parts can be recombined.
Verify that the keys are correct.
  Is each row uniquely identified by the primary key?
  Are one-to-many and many-to-many relationships correct?
  Check "many" for keyed columns and "one" for non-key columns.
Make sure that each non-key column depends on the whole key and nothing but the key.
  No hidden dependencies.

18 Fourth Normal Form (Keys)
Problems arise when there are two binary relationships. In some cases, there are hidden relationships between key properties.
Example: EmployeeTasks(EID, Specialty, ToolID), which is in 3NF now.
Business rules:
  Each employee has many specialties.
  Each employee has many tools.
  Tools and specialties are unrelated.
Split EmployeeTasks(EID, Specialty, ToolID) into:
  EmployeeSpecialty(EID, Specialty)
  EmployeeTools(EID, ToolID)
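
The redundancy that 4NF removes can be seen in a short Python sketch. The employee numbers, specialties, and tool IDs below are made-up sample values, not data from the deck.

```python
# Two independent facts about employees (hypothetical sample data).
specialties = {(1, "Welding"), (1, "Plumbing"), (2, "Painting")}
tools = {(1, "T10"), (1, "T20"), (1, "T30"), (2, "T10")}

# Storing both facts in one all-key table forces every specialty to be
# paired with every tool for the same employee.
employee_tasks = {(e, s, t)
                  for (e, s) in specialties
                  for (e2, t) in tools
                  if e == e2}

# 4NF split: project the combined table onto the two independent relationships.
employee_specialty = {(e, s) for (e, s, _) in employee_tasks}
employee_tools = {(e, t) for (e, _, t) in employee_tasks}

# Joining the two projections on EID recreates the combined table exactly.
rejoined = {(e, s, t)
            for (e, s) in employee_specialty
            for (e2, t) in employee_tools
            if e == e2}
```

With these values the combined table needs 7 rows where the two split tables need 3 and 4, and the gap grows multiplicatively as an employee gains specialties and tools.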

19 Domain-Key Normal Form (DKNF)
This describes the ultimate goal in designing a database.
If a table is in DKNF, it must also be in 4NF, 3NF, and all of the other normal forms.
The catch is that there is no defined method to get a table into DKNF. In fact, it is possible that some tables can never be converted to DKNF.

20 DKNF (Continued)
The goal of DKNF is to have each table represent one topic.
All business rules are explicitly described by the table rules. For example, prices cannot be negative.
All other business rules must be expressed in terms of relationships with keys.
In particular, there can be no hidden relationships.

21 No Hidden Dependencies
The simple normalization rules:
  Remove repeating sections.
  Each non-key column must depend on the whole key and nothing but the key.
  There must be no hidden dependencies.
Solution: split the table.
  Make sure you can rejoin the two pieces to recreate the original data relationships.
For some hidden dependencies within keys, double-check the business assumption to be sure that it is realistic. Sometimes you are better off with a more flexible assumption.

22 Create Tables with SQL
CREATE TABLE Customer (
  CustomerID   NUMBER(38),
  LastName     NVARCHAR2(25),
  FirstName    NVARCHAR2(25),
  Phone        NVARCHAR2(25),
  Email        NVARCHAR2(120),
  Address      NVARCHAR2(50),
  City         NVARCHAR2(50),
  State        NVARCHAR2(25),
  ZIP          NVARCHAR2(15),
  Gender       NVARCHAR2(15),
  DateOfBirth  DATE,
  CONSTRAINT pk_Customer PRIMARY KEY (CustomerID),
  CONSTRAINT ck_CustGender CHECK (Upper(Gender) IN ('FEMALE', 'MALE', 'UNIDENTIFIED'))
);
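
The effect of the ck_CustGender constraint can be tried in SQLite through Python's sqlite3 module. This is only a sketch: the table is cut down to three columns, SQLite type names replace the Oracle ones on the slide, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer (
    CustomerID INTEGER,
    LastName   TEXT,
    Gender     TEXT,
    CONSTRAINT pk_Customer PRIMARY KEY (CustomerID),
    CONSTRAINT ck_CustGender
        CHECK (upper(Gender) IN ('FEMALE', 'MALE', 'UNIDENTIFIED'))
)""")

# Lowercase input passes because the check compares the uppercased value.
conn.execute("INSERT INTO Customer VALUES (1, 'Jones', 'female')")

# A value outside the allowed set is rejected by the database itself.
try:
    conn.execute("INSERT INTO Customer VALUES (2, 'Carson', 'n/a')")
    bad_value_rejected = False
except sqlite3.IntegrityError:
    bad_value_rejected = True

stored_rows = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
```

A NULL Gender would still be accepted, since a CHECK constraint only rejects rows where the condition is false; adding NOT NULL would close that gap if the business rule requires a value.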

23 Data Rules and Integrity
Simple business rules
  Limits on data ranges
    Price > 0
    Salary < 100,000
    DateHired > 1/12/1995
  Choosing from a set
    Gender = M, F, Unknown
    Jurisdiction = City, County, State, Federal
Referential integrity
  Foreign key values in one table must exist in the master table.
  Order(O#, Odate, C#,…): C# must exist in the Customer table.
Order
  O#    Odate   C#   …
  1173  1-4-97  321
  1174  1-5-97  938
  1185  1-8-97  337
  1190  1-9-97  321
  1192  1-9-97  776
Customer
  C#   Name     Phone  …
  321  Jones    9983-
  337  Sanchez  7738-
  938  Carson   8738-
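
One way to find rows that break referential integrity in existing data is an outer join, sketched here with Python's sqlite3 module. The names ONo and CNo stand in for O# and C#, the table is called Orders to avoid the reserved word, and the row values follow one reading of the run-together slide data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CNo INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (ONo INTEGER PRIMARY KEY, Odate TEXT, CNo INTEGER);
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(321, "Jones"), (337, "Sanchez"), (938, "Carson")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1173, "1-4-97", 321), (1174, "1-5-97", 938), (1185, "1-8-97", 337),
    (1190, "1-9-97", 321), (1192, "1-9-97", 776),
])

# Orders whose customer number has no match in the master table.
orphan_orders = conn.execute("""
    SELECT o.ONo, o.CNo
    FROM Orders o
    LEFT JOIN Customer c ON c.CNo = o.CNo
    WHERE c.CNo IS NULL
    ORDER BY o.ONo
""").fetchall()
```

With the sample rows, order 1192 refers to customer 776, which does not exist in Customer, so it is the only row returned.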

24 SQL Foreign Key (Oracle, SQL Server)
CREATE TABLE "Order" (  -- ORDER is a reserved word, so the table name is quoted
  OID    NUMBER(9) NOT NULL,
  Odate  DATE,
  CID    NUMBER(9),
  CONSTRAINT pk_Order PRIMARY KEY (OID),
  CONSTRAINT fk_OrderCustomer FOREIGN KEY (CID)
    REFERENCES Customer (CID)  -- Customer must have a key column named CID
    ON DELETE CASCADE
);
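
The behavior of this constraint can be observed in SQLite, used here through Python's sqlite3 module as a stand-in for Oracle or SQL Server. Note the assumptions: SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, the column types differ from the slide, and the Customer table with a CID key is a minimal invented counterpart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE Customer (CID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE "Order" (
    OID   INTEGER NOT NULL,
    Odate TEXT,
    CID   INTEGER,
    CONSTRAINT pk_Order PRIMARY KEY (OID),
    CONSTRAINT fk_OrderCustomer FOREIGN KEY (CID)
        REFERENCES Customer (CID)
        ON DELETE CASCADE
);
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(321, "Jones"), (337, "Sanchez")])
conn.executemany('INSERT INTO "Order" VALUES (?, ?, ?)',
                 [(1173, "1-4-97", 321), (1185, "1-8-97", 337), (1190, "1-9-97", 321)])

# An order for a customer that does not exist is refused.
try:
    conn.execute('INSERT INTO "Order" VALUES (1192, ?, 776)', ("1-9-97",))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting a customer cascades to that customer's orders.
conn.execute("DELETE FROM Customer WHERE CID = 321")
remaining_orders = conn.execute('SELECT OID FROM "Order" ORDER BY OID').fetchall()
```

ON DELETE CASCADE silently removes dependent rows, so it suits cases where orders have no meaning without their customer; a design that must keep order history would leave it out so the delete is refused instead.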

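25 Relationships: Department and Employee
[Diagram: an Employee table (EmployeeID, TaxpayerID, LastName, FirstName, Address, Phone, City, State, ZIP) linked to a Department reference table (Description). The link is labeled 1…1 and 1…*, with the annotations "Foreign Key" and "Reference Table".]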

26 Estimating Database Size
  Column       Data type  Average bytes
  CustomerID   Long       4
  LastName     Text(50)   30
  FirstName    Text(50)   20
  Phone        Text(50)   24
  Email        Text(150)  50
  Address      Text(50)   50
  State        Text(50)   2
  ZIP          Text(15)   14
  Gender       Text(15)   10
  DateOfBirth  Date       8
  Average bytes per customer     212
  Customers per week (winter)    * 200
  Weeks (winter)                 * 25
  Bytes added per year           1,060,000
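
The arithmetic on this slide can be reproduced in a few lines of Python. The per-column averages are taken from the slide; the variable names are illustrative.

```python
# Average stored bytes per column for one Customer row (values from the slide).
average_bytes = {
    "CustomerID": 4, "LastName": 30, "FirstName": 20, "Phone": 24,
    "Email": 50, "Address": 50, "State": 2, "ZIP": 14,
    "Gender": 10, "DateOfBirth": 8,
}

bytes_per_customer = sum(average_bytes.values())

customers_per_week = 200   # winter season
weeks_per_year = 25        # winter season

new_customers_per_year = customers_per_week * weeks_per_year
bytes_added_per_year = bytes_per_customer * new_customers_per_year

print(f"{bytes_per_customer} bytes per customer, "
      f"{bytes_added_per_year:,} bytes added per year")
```

The same pattern (average row size times expected row count) applies to each remaining table once its row count is derived from the data assumptions.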

27 Data Assumptions
  200 customers per week for 25 weeks
  2 skills per customer
  2 rentals per customer per year
  3 items per rental
  20 percent of customers buy items
  4 items per sale
  100 manufacturers
  20 models per manufacturer
  5 items (sizes) per model

28 Database Table Sizes

