1 Data Normalization Text book Chapter 3: Jerry Post Copyright © 2003.

Presentation transcript:

1 Data Normalization Text book Chapter 3: Jerry Post Copyright © 2003

2 Introduction
A database is a powerful tool. It provides many advantages over traditional programming. However, you get these advantages only if you design the database correctly.

3 What is data normalization?
Normalization splits your data into several tables that are connected to each other through the data within them.
Before data can be normalized, you must understand the business rules. Your tables must match the business rules.

4 Primary and composite keys
Primary key: a column that uniquely identifies a row in a table, e.g. Iqama Number or Saudi ID.
Composite key: a primary key that is made up of more than one column.

5 Identifying Key Columns
Orders(OrderID, Date, Customer)
OrderItems(OrderID, Item, Quantity)
Each order has only one customer, so Customer is not part of the key.
Each order has many items, and each item can appear on many orders, so OrderID and Item are both part of the key.
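The key choice above can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than any DBMS named in the slides, and the sample rows (dates, item names, quantities) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (
    OrderID  INTEGER PRIMARY KEY,      -- one customer per order: OrderID alone is the key
    Date     TEXT,
    Customer TEXT
);
CREATE TABLE OrderItems (
    OrderID  INTEGER,
    Item     TEXT,
    Quantity INTEGER,
    PRIMARY KEY (OrderID, Item)        -- many items per order, many orders per item
);
""")

# Illustrative rows only (not from the slides).
conn.execute("INSERT INTO Orders VALUES (1, '2004-04-18', 'Washington')")
conn.execute("INSERT INTO Orders VALUES (2, '2004-04-30', 'Lasater')")
conn.execute("INSERT INTO OrderItems VALUES (1, 'Skis', 1)")
conn.execute("INSERT INTO OrderItems VALUES (1, 'Boots', 2)")  # same order, another item
conn.execute("INSERT INTO OrderItems VALUES (2, 'Skis', 1)")   # same item, another order


def try_insert(sql):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A repeated (OrderID, Item) pair violates the composite key.
duplicate_pair_ok = try_insert("INSERT INTO OrderItems VALUES (1, 'Skis', 5)")
# A second row for order 1 would mean a second customer on the same order.
duplicate_order_ok = try_insert("INSERT INTO Orders VALUES (1, '2004-04-19', 'Jones')")
item_rows = conn.execute("SELECT COUNT(*) FROM OrderItems").fetchone()[0]
```

Keying Customer as well would let the second Orders row through, which is exactly the "more than one customer per order" case the business rule excludes.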

6 Identifying Key Columns
If you are uncertain about which columns to key, write them down and evaluate the business rules. Example: OrderID, CustomerID.
For a given order, can there ever be more than one customer? If yes, then key CustomerID. In most businesses there is only one customer per order, so do not key it.
For a given customer, can there ever be more than one order? If yes, then key OrderID; otherwise, do not key it. All businesses hope to get more than one order from a customer, so OrderID must be keyed.

7 Surrogate Keys
Real-world keys sometimes cause problems in a database. Example: Customer. Avoid phone numbers as keys: people may not notify you when their numbers change.
It is often best to let the DBMS generate unique values:
Access: AutoNumber
SQL Server: Identity
Oracle: Sequences (but these require additional programming)
Drawback: the numbers are not related to any business data, so the application needs to hide them and provide other lookup mechanisms.
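As a rough illustration of a generated key, the sketch below uses SQLite's INTEGER PRIMARY KEY AUTOINCREMENT as a stand-in for AutoNumber, Identity, or a sequence; it is not the syntax of those products. The phone numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key generated by the DBMS
    LastName   TEXT,
    Phone      TEXT                                -- ordinary data, free to change
)""")

cur = conn.execute("INSERT INTO Customer (LastName, Phone) VALUES (?, ?)",
                   ("Washington", "555-0100"))
first_id = cur.lastrowid
cur = conn.execute("INSERT INTO Customer (LastName, Phone) VALUES (?, ?)",
                   ("Jones", "555-0101"))
second_id = cur.lastrowid

# The phone number changes; the key, and any rows that refer to it, stay the same.
conn.execute("UPDATE Customer SET Phone = ? WHERE CustomerID = ?",
             ("555-0199", first_id))

# The application looks customers up by business data and keeps the number hidden.
found_id = conn.execute("SELECT CustomerID FROM Customer WHERE LastName = ?",
                        ("Washington",)).fetchone()[0]
```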

8 Problems with Repeating Sections
RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent))

TransID  RentDate  CustomerID  LastName    Phone  Address             VideoID  Copy#  Title                  Rent
1        4/18/04   3           Washington  …      95 Easy Street      1        2      2001: A Space Odyssey  $1.50
1        4/18/04   3           Washington  …      95 Easy Street      6        3      Clockwork Orange       $1.50
2        …/30/04   7           Lasater     …      67 S. Ray Drive     8        1      Hopscotch              $1.50
2        …/30/04   7           Lasater     …      67 S. Ray Drive     2        1      Apocalypse Now         $2.00
2        …/30/04   7           Lasater     …      67 S. Ray Drive     6        1      Clockwork Orange       $1.50
3        …/18/04   8           Jones       …      867 Lakeside Drive  9        1      Luggage Of The Gods    $…
3        …/18/04   8           Jones       …      867 Lakeside Drive  15       1      Fabulous Baker Boys    $…
3        …/18/04   8           Jones       …      867 Lakeside Drive  4        1      Boy And His Dog        $2.50
4        4/18/04   3           Washington  …      95 Easy Street      3        1      Blues Brothers         $2.00
4        4/18/04   3           Washington  …      95 Easy Street      8        1      Hopscotch              $1.50
4        4/18/04   3           Washington  …      95 Easy Street      13       1      Surf Nazis Must Die    $…
4        4/18/04   3           Washington  …      95 Easy Street      17       1      Witches of Eastwick    $2.00

The (VideoID, Copy#, Title, Rent) columns form the repeating section, which causes duplication.
Storing data in this raw form would not work very well. Repeating sections cause problems: note the duplication of customer data on every line. Also, what if a customer has not yet checked out a movie? Where do we store that customer's data?

9 First Normal Form Problems (Data)

TransID  RentDate  CustID  Phone  LastName    FirstName  Address             City               State  ZipCode
1        4/18/04   3       …      Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
2        …/30/04   7       …      Lasater     Les        67 S. Ray Drive     Portland           TN     …
3        …/18/04   8       …      Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     …
4        4/18/04   3       …      Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171

TransID  VideoID  Copy#  Title                  Rent
1        1        2      2001: A Space Odyssey  $1.50
1        6        3      Clockwork Orange       $1.50
2        8        1      Hopscotch              $1.50
2        2        1      Apocalypse Now         $2.00
2        6        1      Clockwork Orange       $1.50
3        9        1      Luggage Of The Gods    $…
3        15       1      Fabulous Baker Boys    $…
3        4        1      Boy And His Dog        $2.50
4        3        1      Blues Brothers         $2.00
4        8        1      Hopscotch              $1.50
4        13       1      Surf Nazis Must Die    $…
4        17       1      Witches of Eastwick    $2.00

1NF splits out the repeating groups, but problems remain:
Replication: customer data and video titles are still repeated.
Hidden dependency: if a video has not been rented yet, then what is its title?

10 Second Normal Form
A relation is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key, that is, on the whole key and not just part of it.

11 Second Normal Form Example (Data)
VideosRented(TransID, VideoID, Copy#)
Videos(VideoID, Title, Rent)
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode) (unchanged)

VideoID  Title                         Rent
1        2001: A Space Odyssey         $1.50
2        Apocalypse Now                $2.00
3        Blues Brothers                $2.00
4        Boy And His Dog               $2.50
5        Brother From Another Planet   $2.00
6        Clockwork Orange              $1.50
7        Gods Must Be Crazy            $2.00
8        Hopscotch                     $1.50

12 Second Normal Form Example
RentalLine(TransID, VideoID, Copy#, Title, Rent) is split into VideosRented(TransID, VideoID, Copy#) and Videos(VideoID, Title, Rent).
Title depends only on VideoID: each VideoID can have only one title.
Rent depends on VideoID. This statement is actually a business rule, and it might be different at different stores. Some stores might charge a different rent for each video depending on the day (or time).
After the split, each non-key column depends on the whole key of its table.
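The split can be checked mechanically. The sketch below uses Python's sqlite3 module with a few rows taken from the rental data in the slides; it assumes the key of RentalLine is (TransID, VideoID), and it renames Copy# to CopyNo only because # is awkward in an unquoted identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE RentalLine (
    TransID INTEGER, VideoID INTEGER, CopyNo INTEGER, Title TEXT, Rent REAL,
    PRIMARY KEY (TransID, VideoID)   -- assumed composite key
)""")
conn.executemany("INSERT INTO RentalLine VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),   # title and rent repeated for video 6
])

# Title and Rent depend only on VideoID (part of the key), so they move out.
conn.executescript("""
CREATE TABLE Videos AS
    SELECT DISTINCT VideoID, Title, Rent FROM RentalLine;
CREATE TABLE VideosRented AS
    SELECT TransID, VideoID, CopyNo FROM RentalLine;
""")

original = conn.execute(
    "SELECT * FROM RentalLine ORDER BY TransID, VideoID").fetchall()
rejoined = conn.execute("""
    SELECT r.TransID, r.VideoID, r.CopyNo, v.Title, v.Rent
    FROM VideosRented r JOIN Videos v ON v.VideoID = r.VideoID
    ORDER BY r.TransID, r.VideoID""").fetchall()
video_count = conn.execute("SELECT COUNT(*) FROM Videos").fetchone()[0]
```

Joining the two pieces on VideoID gives back the original rows, while each title and rent is now stored once per video instead of once per rental line.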

13 Second Normal Form Problems (Data)
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)

TransID  RentDate  CustID  Phone  LastName    FirstName  Address             City               State  ZipCode
1        4/18/04   3       …      Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
2        …/30/04   7       …      Lasater     Les        67 S. Ray Drive     Portland           TN     …
3        …/18/04   8       …      Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     …
4        4/18/04   3       …      Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171

Even in 2NF, problems remain:
Replication: a customer's details are repeated for every transaction.
Hidden dependency: if a customer has not rented a video yet, where do we store their personal data?
Solution: split the table.

14 Third Normal Form Definition
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
Each non-key column must depend on nothing but the key. Some columns depend on columns that are not part of the key; split those into a new table. Example: a customer's name does not change for every transaction.
Dependence (definition): if, given a value for the key, you always know the value of the property in question, then that property is said to depend on the key.
In RentalForm2, RentDate and CustomerID depend on TransID, while Phone, Name, Address, City, State, and ZipCode depend only on CustomerID.

15 Third Normal Form Example Data
Rentals(TransID, RentDate, CustomerID)
Customers(CustomerID, Phone, Name, Address, City, State, ZipCode)

TransID  RentDate  CustomerID
1        4/18/04   3
2        …/30/04   7
3        …/18/04   8
4        4/18/04   3

CustomerID  Phone  LastName    FirstName  Address             City               State  ZipCode
…           …      Johnson     Martha     125 Main Street     Alvaton            KY     …
…           …      Smith       Jack       873 Elm Street      Bowling Green      KY     …
3           …      Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
…           …      Adams       Samuel     746 Brown Drive     Alvaton            KY     …
…           …      Rabitz      Victor     645 White Avenue    Bowling Green      KY     …
…           …      Steinmetz   Susan      15 Speedway Drive   Portland           TN     …
7           …      Lasater     Les        67 S. Ray Drive     Portland           TN     …
8           …      Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     …
…           …      Chavez      Juan       673 Industry Blvd.  Caneyville         KY     …
…           …      Rojo        Maria      88 Main Street      Cave City          KY     42127

16 Third Normal Form Tables (3NF)
Rentals(TransID, RentDate, CustomerID)
Customers(CustomerID, Phone, LastName, FirstName, Address, City, State, ZipCode)
VideosRented(TransID, VideoID, Copy#)
Videos(VideoID, Title, Rent)
Relationships: Customers 1 to * Rentals; Rentals 1 to * VideosRented; Videos 1 to * VideosRented.
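A minimal sketch of the Customers/Rentals part of this design, using Python's sqlite3 module, shows the two benefits the slides point to: the form can still be recreated with a join, and a customer can be stored before renting anything. Only a few columns are kept, and customer 99 ("Prospect") is a hypothetical row added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    LastName   TEXT,
    City       TEXT
);
CREATE TABLE Rentals (
    TransID    INTEGER PRIMARY KEY,
    RentDate   TEXT,
    CustomerID INTEGER REFERENCES Customers (CustomerID)
);
INSERT INTO Customers VALUES (3, 'Washington', 'Smith''s Grove');
INSERT INTO Customers VALUES (99, 'Prospect', 'Alvaton');   -- hypothetical, no rentals yet
INSERT INTO Rentals VALUES (1, '4/18/04', 3);
INSERT INTO Rentals VALUES (4, '4/18/04', 3);
""")

# Recreate the rows of the original rental form by joining on CustomerID.
form_rows = conn.execute("""
    SELECT r.TransID, r.RentDate, c.CustomerID, c.LastName, c.City
    FROM Rentals r JOIN Customers c ON c.CustomerID = r.CustomerID
    ORDER BY r.TransID""").fetchall()

# Customers who have not rented anything still have a home in the database.
no_rentals = conn.execute("""
    SELECT c.LastName
    FROM Customers c LEFT JOIN Rentals r ON r.CustomerID = c.CustomerID
    WHERE r.TransID IS NULL""").fetchall()
```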

17 3NF Rules/Procedure
1. Split out repeating sections. Be sure to include a key from the parent section in the new piece so the two parts can be recombined.
2. Verify that the keys are correct. Is each row uniquely identified by the primary key? Are one-to-many and many-to-many relationships correct? Check "many" for keyed columns and "one" for non-key columns.
3. Make sure that each non-key column depends on the whole key and nothing but the key.
4. Check that there are no hidden dependencies.

18 Fourth Normal Form (Keys)
Problems arise when there are two binary relationships. In some cases, there are hidden relationships between key properties.
Example: EmployeeTasks(EID, Specialty, ToolID), which is already in 3NF.
Business rules: Each employee has many specialties. Each employee has many tools. Tools and specialties are unrelated.
Solution: split EmployeeTasks(EID, Specialty, ToolID) into EmployeeSpecialty(EID, Specialty) and EmployeeTools(EID, ToolID).
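To see why the combined table is a problem, the sketch below models the tables as plain Python sets of tuples. The employee, specialty, and tool values are hypothetical. Because tools and specialties are unrelated, the three-column table has to carry every combination for an employee, whereas the two smaller tables store each fact once and can be joined to rebuild it.

```python
# Hypothetical facts: one employee with three specialties and three tools.
employee_specialty = {("E1", "Welding"), ("E1", "Plumbing"), ("E1", "Wiring")}
employee_tools = {("E1", "T10"), ("E1", "T20"), ("E1", "T30")}

# EmployeeTasks(EID, Specialty, ToolID): every specialty paired with every tool.
employee_tasks = {
    (eid, specialty, tool)
    for (eid, specialty) in employee_specialty
    for (eid2, tool) in employee_tools
    if eid == eid2
}

# Projecting the combined table back onto its two halves loses nothing...
projected_specialty = {(eid, s) for (eid, s, _tool) in employee_tasks}
projected_tools = {(eid, t) for (eid, _spec, t) in employee_tasks}

# ...and joining the halves on EID reproduces the combined table exactly.
rejoined = {
    (eid, s, t)
    for (eid, s) in projected_specialty
    for (eid2, t) in projected_tools
    if eid == eid2
}

combined_rows = len(employee_tasks)
split_rows = len(employee_specialty) + len(employee_tools)
```

With these values the combined table needs 9 rows against 6 in the split design, and adding one more tool would add three rows to the combined table but only one to EmployeeTools.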

19 Domain-Key Normal Form (DKNF)
DKNF describes the ultimate goal in designing a database. If a table is in DKNF, it must also be in 4NF, 3NF, and all of the other normal forms.
The catch is that there is no defined method to get a table into DKNF. In fact, it is possible that some tables can never be converted to DKNF.

20 DKNF (Continued)
The goal of DKNF is to have each table represent one topic.
Simple business rules are explicitly described by table rules; for example, prices cannot be negative.
All other business rules must be expressed in terms of relationships with keys. In particular, there can be no hidden relationships.

21 No Hidden Dependencies
The simple normalization rules:
Remove repeating sections.
Each non-key column must depend on the whole key and nothing but the key.
There must be no hidden dependencies.
Solution: split the table. Make sure you can rejoin the two pieces to recreate the original data relationships.
For some hidden dependencies within keys, double-check the business assumption to be sure that it is realistic. Sometimes you are better off with a more flexible assumption.

22 Create Tables with SQL
CREATE TABLE Customer (
    CustomerID   NUMBER(38),
    LastName     NVARCHAR2(25),
    FirstName    NVARCHAR2(25),
    Phone        NVARCHAR2(25),
    …            NVARCHAR2(120),
    Address      NVARCHAR2(50),
    City         NVARCHAR2(50),
    State        NVARCHAR2(25),
    ZIP          NVARCHAR2(15),
    Gender       NVARCHAR2(15),
    DateOfBirth  DATE,
    CONSTRAINT pk_Customer PRIMARY KEY (CustomerID),
    CONSTRAINT ck_CustGender CHECK (Upper(Gender) IN ('FEMALE', 'MALE', 'UNIDENTIFIED'))
);

23 Data Rules and Integrity
Simple business rules
Limits on data ranges: Price > 0; Salary < 100,000; DateHired > 1/12/1995
Choosing from a set: Gender = M, F, Unknown; Jurisdiction = City, County, State, Federal
Referential integrity
Foreign key values in one table must exist in the master table.
Order(O#, Odate, C#, …): C# must exist in the Customer table.

Order
O#  Odate  C#  …

Customer
C#   Name     Phone  …
321  Jones    …
…    Sanchez  …
…    Carson   8738-…
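The two kinds of simple rule (a range limit and a choice from a set) map onto CHECK constraints. The sketch below uses Python's sqlite3 module; the table name RuleDemo and the test values are hypothetical, and only the Price and Jurisdiction rules from the slide are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE RuleDemo (
    ID           INTEGER PRIMARY KEY,
    Price        REAL CHECK (Price > 0),                       -- limit on a data range
    Jurisdiction TEXT CHECK (Jurisdiction IN
                     ('City', 'County', 'State', 'Federal'))   -- choice from a set
)""")


def accepted(price, jurisdiction):
    """Try to insert one row; report whether the table rules allowed it."""
    try:
        conn.execute("INSERT INTO RuleDemo (Price, Jurisdiction) VALUES (?, ?)",
                     (price, jurisdiction))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accepted(9.99, "City"),      # satisfies both rules
    accepted(-1.00, "City"),     # negative price
    accepted(5.00, "Province"),  # value outside the allowed set
]
```

The DBMS enforces the rules on every insert and update, so the application code does not have to repeat them.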

24 SQL Foreign Key (Oracle, SQL Server)
CREATE TABLE "Order" (
    OID    NUMBER(9) NOT NULL,
    Odate  DATE,
    CID    NUMBER(9),
    CONSTRAINT pk_Order PRIMARY KEY (OID),
    CONSTRAINT fk_OrderCustomer FOREIGN KEY (CID)
        REFERENCES Customer (CID)
        ON DELETE CASCADE
);
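The behaviour of this constraint can be sketched with Python's sqlite3 module. SQLite is used here only as a convenient stand-in (it needs PRAGMA foreign_keys = ON, which Oracle and SQL Server do not), the column types are simplified, and the sample rows are illustrative. The table name is quoted because ORDER is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific: enforcement is off by default
conn.executescript("""
CREATE TABLE Customer (
    CID  INTEGER PRIMARY KEY,
    Name TEXT
);
CREATE TABLE "Order" (
    OID   INTEGER PRIMARY KEY,
    Odate TEXT,
    CID   INTEGER,
    CONSTRAINT fk_OrderCustomer FOREIGN KEY (CID)
        REFERENCES Customer (CID)
        ON DELETE CASCADE
);
INSERT INTO Customer VALUES (321, 'Jones');
INSERT INTO "Order" VALUES (1, '2004-04-18', 321);
""")

# An order for a customer that does not exist breaks referential integrity.
try:
    conn.execute('INSERT INTO "Order" VALUES (?, ?, ?)', (2, "2004-04-19", 999))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# ON DELETE CASCADE: deleting the customer also deletes that customer's orders.
conn.execute("DELETE FROM Customer WHERE CID = ?", (321,))
remaining_orders = conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0]
```

Cascading deletes are a design choice: they keep the tables consistent, but they also remove order history, so some designs prefer to block the delete instead.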

25 Relationships: Department and Employee
Employee: EmployeeID, TaxpayerID, LastName, FirstName, Address, Phone, City, State, ZIP
Department: Description
Diagram labels: 1…1, 1…*, Foreign Key, Reference Table

26 Estimating Database Size

Column       Type       Average bytes
CustomerID   Long       4
LastName     Text(50)   30
FirstName    Text(50)   20
Phone        Text(50)   24
…            Text(150)  50
Address      Text(50)   50
State        Text(50)   2
ZIP          Text(15)   14
Gender       Text(15)   10
DateOfBirth  Date       8

Average bytes per customer     212
Customers per week (winter)    × 200
Weeks (winter)                 × 25
Bytes added per year           1,060,000
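The arithmetic behind the estimate is easy to reproduce. The sketch below simply re-adds the average column sizes from the slide and multiplies by the volume figures; the dictionary key for the column whose name is missing in the slide is a placeholder.

```python
# Average bytes per column, as listed on the slide.
average_bytes = {
    "CustomerID": 4,
    "LastName": 30,
    "FirstName": 20,
    "Phone": 24,
    "(unnamed Text(150) column)": 50,   # column name is missing in the slide
    "Address": 50,
    "State": 2,
    "ZIP": 14,
    "Gender": 10,
    "DateOfBirth": 8,
}

bytes_per_customer = sum(average_bytes.values())
customers_per_week = 200   # winter season
weeks = 25                 # winter season

bytes_per_year = bytes_per_customer * customers_per_week * weeks
print(f"{bytes_per_customer} bytes per customer, {bytes_per_year:,} bytes added per year")
```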

27 Data Assumptions
200 customers per week for 25 weeks
2 skills per customer
2 rentals per customer per year
3 items per rental
20 percent of customers buy items
4 items per sale
100 manufacturers
20 models per manufacturer
5 items (sizes) per model
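These assumptions translate into rough yearly row counts per table. The sketch below is one possible reading: the table names are hypothetical, and it assumes one sale per buying customer per year, which the slide does not state.

```python
customers = 200 * 25               # customers per week x weeks

# Hypothetical table names; counts follow directly from the listed assumptions.
rows_per_year = {
    "Customer": customers,
    "CustomerSkill": customers * 2,          # 2 skills per customer
    "Rental": customers * 2,                 # 2 rentals per customer per year
    "RentalItem": customers * 2 * 3,         # 3 items per rental
    "Sale": customers * 20 // 100,           # 20 percent buy (assumed: one sale each)
    "SaleItem": customers * 20 // 100 * 4,   # 4 items per sale
    "Manufacturer": 100,
    "Model": 100 * 20,                       # 20 models per manufacturer
    "Item": 100 * 20 * 5,                    # 5 sizes per model
}

for table, count in rows_per_year.items():
    print(f"{table:14s} {count:>7,}")
```

Multiplying each count by that table's average row size, as in the customer estimate, gives the bytes each table adds per year.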

28 Database Table Sizes