DBSYSTEMS Chapter 3 Data Normalization Get data properly tabled! Based on G. Post, DBMS: Designing & Building Business Applications University of Manitoba.

Presentation transcript:

DBSYSTEMS Chapter 3 Data Normalization Get data properly tabled! Based on G. Post, DBMS: Designing & Building Business Applications University of Manitoba Asper School of Business 3500 DBMS Bob Travica Updated 2015

DBSYSTEMS 2 of 23 Normalization
- The process of putting data into the format of relational databases, or organizing data into correctly designed tables.
- Tables should be designed so that:
  a) problems (anomalies) with insertion, deletion and modification of data are avoided
  b) redundancy is reduced
  c) data quality is preserved (completeness, consistency)

DBSYSTEMS 3 of 23 Relational Database Terminology
- Relational database: A collection of tables (relations). Tables store atomic data.
- Table: A collection of columns (attributes, properties, fields) describing an entity (class). A table is also a collection of rows (records), each with the same number of columns.
- Each row represents an object (an instance of a class).

Entity (Class): Employee. Table: Employee. Columns are attributes/properties; rows are objects.

EmployeeID | TaxpayerID | LastName | FirstName | HomePhone | Address
… | … | Cartom | Abdul | (603) … | … South Street
… | … | Venetiaan | Roland | (804) … | … Paramaribo Ln
… | … | Johnson | John | (703) … | … Main Street
… | … | Stenheim | Susan | (410) … | … W. Maple

DBSYSTEMS 4 of 23 Relational Database Terminology – Primary Key
- Every table has a primary key (key): an attribute that uniquely identifies each row (e.g., EmployeeID on the previous slide).
- A primary key can span more than one column; it is then called a combined (composite, concatenated) key. Example: OrderItem(OrderID, ItemID, Quantity).
  Note: Watch for data types (e.g., number vs. text) and naming rules (arbitrary but consistent).
- A primary key can be generated automatically by the DBMS (a surrogate key).
- Other attributes are called non-key columns. A non-key column depends on the key.

DBSYSTEMS 5 of 23 Relational Database Shorthand Notation
Customer(CustomerID, LastName, FirstName, Address, City, State, ZipPostalCode, TelephoneNumber)
- Customer is the table name. The primary key is underlined. The other columns are non-key columns.
- Note: Telephone number can be used as a “backup key.”
- Shorthand notation is good for analysis but not for official diagrams. Do not use it in your assignments and exams.

DBSYSTEMS 6 of 23 Class Diagram to Schema
[Figure: Class Diagram (Non-Normalized) with the classes Customer, Order, Salesperson and Item, the associations "places", "serves" and "contains", and a many-to-many (* to *) association between Order and Item. Tables Diagram – Schema (Normalized) with the same classes plus the association class OrderItem (also named ItemOrdered, OrderDetail, etc.), connected by 1-to-* associations.]
Another new detail: Foreign keys are shown in a complete schema.

DBSYSTEMS 7 of 23 Shorthand Notation for Normalized Tables Diagram – Foreign Key
Customer(CustomerID, Name, Address, City, Phone)
Salesperson(EmployeeID, Name, DateHired)
Order(OrderID, OrderDate, CustomerID, EmployeeID)
OrderItem(OrderID, ItemID, Quantity)
Item(ItemID, Description, ListPrice)
- Foreign Key (FK) = an attribute that is a (primary) key in another table (e.g., CustomerID in Order).
- Logic and naming of OrderItem: it replaces the Order-Item M:M relationship with two 1:M relationships. Another common name is OrderDetail.
- The OrderItem key is a combination of FKs (OrderID + ItemID).
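The five tables in shorthand notation can be sketched as SQL table definitions. This is a minimal sketch using Python's built-in sqlite3 module, not part of the original slides. The column types, the table name Orders (ORDER is an SQL keyword) and the helper names build_order_db and try_insert are assumptions made for illustration.

```python
import sqlite3

# Column types and the table name "Orders" are assumptions; the slides
# give only table and column names.
DDL = """
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, City TEXT, Phone TEXT
);
CREATE TABLE Salesperson (
    EmployeeID INTEGER PRIMARY KEY, Name TEXT, DateHired TEXT
);
CREATE TABLE Item (
    ItemID INTEGER PRIMARY KEY, Description TEXT, ListPrice REAL
);
CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    OrderDate TEXT,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),   -- FK
    EmployeeID INTEGER NOT NULL REFERENCES Salesperson(EmployeeID) -- FK
);
CREATE TABLE OrderItem (
    OrderID INTEGER NOT NULL REFERENCES Orders(OrderID),  -- FK
    ItemID INTEGER NOT NULL REFERENCES Item(ItemID),      -- FK
    Quantity INTEGER,
    PRIMARY KEY (OrderID, ItemID)  -- combined key made of the two FKs
);
"""


def build_order_db():
    """Create an in-memory database with the five tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def try_insert(conn, sql, params):
    """Return True if the row was accepted, False if a key constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With these definitions, an order that refers to a non-existent customer is rejected by the foreign key, and a second OrderItem row with the same OrderID and ItemID is rejected by the combined key.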

DBSYSTEMS 8 of 23

DBSYSTEMS 9 of 23 Video Store Transaction Processing System (VSTPS): Classes, Columns & Business Rules
Master Data (“Static”): Market & Inventory Entities (don’t change often)
- Customer table. Key: CustomerID. Attributes: Name, Address, Phone
- Video table. Key: VideoID. Attributes: Title, RentalFee, Rating…
Transaction Data (“Dynamic”): Operations Entities (change more often)
- RentalTransaction table. Key: TransactionID. Attributes: CustomerID, Date
- VideoRented table. Key: TransactionID + VideoID. Attributes: Copy#

DBSYSTEMS 10 of 23 Business Rules and Class Diagram for VSTPS
Business Rules:
- A customer can have many rental transactions, each being for a specific customer.
- A transaction can include many video titles, and a title is in many transactions.
- A transaction can include just one copy of a video title.
[Figure: class diagram in which Customer "has" RentalTransaction (1 to *), RentalTransaction "includes" VideoTitle (* to *), and the association class VideoRented is marked with "?".]

DBSYSTEMS 11 of 23 Schema for VSTPS
Customer(CustomerID, LastName, FirstName, Address, City, …)
VideoRented(TransID, VideoID, Copy#)
Video(VideoID, Title, RentalFee)
RentalTransaction(TransID, RentDate, CustomerID)
Transaction data: VideoRented and RentalTransaction.
You can draw a normalized schema based on the knowledge of multiplicity and data analysis you already have!

DBSYSTEMS 12 of 23 Why Normalize – Avoiding Data Anomalies
How do we get to those four tables using normalization logic? Why not use a simple design for recording rentals, a single VideoRental table?
VideoRental(Rec#, CustomerID, LastName, FirstName, … VideoID, Title, RentalFee, Copy#, Date)
Poor design because:
- Master data (Customer, Video) repeat for each transaction: high redundancy.
- Deleting transaction data also deletes master data, and the reverse. Deletion anomaly: you cannot delete just the target data, but more (or less) than wanted.
- A new customer can’t be added without adding a new video, and the reverse. Insertion anomaly: data can’t be added without corrupting other data.
- To change a customer name, all of that customer's records must be rewritten. Update anomaly: data can’t be updated in a single master record only.
Conclusion: From the normalization perspective, data must be properly designed in order to avoid CRUD (create, read, update, delete) anomalies and reduce redundancy.
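The update and deletion anomalies of the single-table design can be illustrated with a few rows held in plain Python. This is a minimal sketch, not part of the original slides. The rows are loosely based on the rental sample used in this deck, and the names video_rental, rename_customer, delete_rental and known_customers are hypothetical.

```python
# Hypothetical rows for the single-table VideoRental design (illustrative values).
video_rental = [
    {"Rec": 1, "CustomerID": 3, "LastName": "Washington", "VideoID": 1, "Title": "2001: A Space Odyssey"},
    {"Rec": 2, "CustomerID": 3, "LastName": "Washington", "VideoID": 6, "Title": "Clockwork Orange"},
    {"Rec": 3, "CustomerID": 7, "LastName": "Lasater", "VideoID": 6, "Title": "Clockwork Orange"},
]


def rename_customer(rows, customer_id, new_name):
    """Update anomaly: every row of the customer must be rewritten.

    Returns the number of rows that had to be touched.
    """
    touched = 0
    for row in rows:
        if row["CustomerID"] == customer_id:
            row["LastName"] = new_name
            touched += 1
    return touched


def delete_rental(rows, rec):
    """Deletion anomaly: dropping one rental may drop the last trace of a customer."""
    return [row for row in rows if row["Rec"] != rec]


def known_customers(rows):
    """Customers the database still knows about after a change."""
    return {row["CustomerID"] for row in rows}
```

Renaming customer 3 touches two rows instead of one master record, and deleting rental 3 removes customer 7 from the data altogether.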

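DBSYSTEMS 13 of 23 Normalization
A process of splitting a chunk of data to arrive at clear master and transactional classes. Each many-to-many relationship must be replaced by two one-to-many relationships.
[Figure: the many-to-many association Customer "rents" Video (* to *) is resolved in two steps. 1. RentalTransaction is introduced: Customer "has" RentalTransaction (1 to *), and RentalTransaction "includes" Video (* to *). 2. The association class VideoRented (Copy#) is introduced, so that the remaining many-to-many association becomes two one-to-many associations.]
How to track copies of the same video?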

DBSYSTEMS 14 of 23 Normalization Process
- Interview users and understand the output needed. Put data into a large table (RentalForm).
- Pick out attributes.
- Find repeating groups (sections).
- Look for potential keys.
- Identify computed values.
RentalForm(TransID, RentDate, (CustomerID, Name, Address, City, State, …), (VideoID, Copy#, Title, RentalFee))
Note: The focus here is on the logic; this process is not really used in practice.

DBSYSTEMS 15 of 23 Problems with Repeating Groups (Sections)
RentalForm(TransID, RentDate, (CustomerID, Phone, Name, Address, City, State, …), (VideoID, Copy#, Title, Rent))

TransID | RentDate | CustomerID | LastName | Phone | Address | VideoID | Copy# | Title | Rent
1 | 4/18/02 | 3 | Washington | … | … Easy Street | 1 | 2 | 2001: A Space Odyssey | $…
… | …/18/02 | 3 | Washington | … | … Easy Street | 6 | 3 | Clockwork Orange | $…
… | …/30/02 | 7 | Lasater | … | … S. Ray Drive | 8 | 1 | Hopscotch | $…
… | …/30/02 | 7 | Lasater | … | … S. Ray Drive | 2 | 1 | Apocalypse Now | $…
… | …/30/02 | 7 | Lasater | … | … S. Ray Drive | 6 | 1 | Clockwork Orange | $1.50

Repeating groups cause:
- high redundancy
- update anomaly (must run through all records)
- insertion anomaly as errors in data (fake CustomerID if a new video is added)
- deletion anomaly (can’t simply delete what is needed)
If there are repeating sections, the table is not in the first normal form (1NF).

DBSYSTEMS 16 of 23 First Normal Form (1NF)
- 1NF: A table is in 1NF if it does not have repeating sections.
- Normalization Procedure:
  - Remove repeating sections by splitting the initial table into new tables.
  - Preserve associations between the initial table and new tables by replicating the initial key.
RentalTransaction(TransID, RentDate) (remainder of the initial table)
Video(TransID, VideoID, Copy#, Title, RentalFee) (new)
Customer(TransID, CustomerID, Phone, Name, Address, City, State, ZipCode) (new)
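The 1NF procedure can be sketched in a few lines of Python: each repeating section becomes its own table, and the initial key (TransID) is replicated into every new table. This is a minimal sketch under stated assumptions, not part of the original slides. The nested rental_form record, its values and the function name to_1nf are hypothetical.

```python
# Hypothetical nested RentalForm record; values are illustrative only.
rental_form = {
    "TransID": 1,
    "RentDate": "2002-04-18",
    "customer": {"CustomerID": 3, "Name": "Washington"},
    "videos": [
        {"VideoID": 1, "Copy#": 2, "Title": "2001: A Space Odyssey", "RentalFee": 1.50},
        {"VideoID": 6, "Copy#": 3, "Title": "Clockwork Orange", "RentalFee": 1.50},
    ],
}


def to_1nf(form):
    """Split the repeating sections into separate tables.

    The initial key (TransID) is replicated into each new table so that
    the association with the initial table is preserved.
    """
    key = form["TransID"]
    rental_transaction = [{"TransID": key, "RentDate": form["RentDate"]}]
    customer = [{"TransID": key, **form["customer"]}]
    video = [{"TransID": key, **v} for v in form["videos"]]
    return rental_transaction, customer, video
```

Calling to_1nf(rental_form) returns three lists of flat rows that correspond to RentalTransaction, Customer and Video in 1NF.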

DBSYSTEMS 17 of 23 Problems with First Normal Form
- There are problems in the relationship between the key and non-keys.
- Concept of Functional Dependence:
  - An attribute depends on another attribute if a change in the other attribute's value causes a change in its own value.
  - The key column must be sufficient for determining the values of the non-key columns.
- Problems apply only to tables with combined keys! (A single-key table in 1NF is also in 2NF.)
[Table: Video(TransID, VideoID, Copy#, Title, RentalFee) with five rows for the titles 2001: A Space Odyssey, Clockwork Orange, Hopscotch, Apocalypse Now and Clockwork Orange.]

DBSYSTEMS 18 of 23 Problems with First Normal Form (cont.)
- If any non-key column depends on just a part of the key, there is partial functional dependence and the table is not in 2NF.
Video(TransID, VideoID, Copy#, Title, RentalFee)
- VideoID alone is sufficient to determine Title and RentalFee. Therefore, there is partial functional dependence between the combined key and Title and RentalFee.
- Copy# depends on the full key (TransID + VideoID): full functional dependence on the key.
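A functional dependence can be checked against sample rows with a short Python function. This is a minimal sketch, not part of the original slides. The function name determines is hypothetical, and the TransID values in the sample rows are assumptions. A check on data can only refute a dependence; whether it really holds is a matter of business rules.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, equal values of the lhs columns
    always come with equal values of the rhs columns."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        # Remember the first rhs seen for this lhs; any later mismatch
        # means lhs does not determine rhs.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Sample rows for Video in 1NF; the TransID values are assumptions.
video = [
    {"TransID": 1, "VideoID": 1, "Copy#": 2, "Title": "2001: A Space Odyssey"},
    {"TransID": 1, "VideoID": 6, "Copy#": 3, "Title": "Clockwork Orange"},
    {"TransID": 2, "VideoID": 8, "Copy#": 1, "Title": "Hopscotch"},
    {"TransID": 2, "VideoID": 2, "Copy#": 1, "Title": "Apocalypse Now"},
    {"TransID": 2, "VideoID": 6, "Copy#": 1, "Title": "Clockwork Orange"},
]
```

On these rows, VideoID alone determines Title (partial dependence on the combined key), while Copy# is determined only by TransID and VideoID together.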

DBSYSTEMS 19 of 23 Second Normal Form (2NF)
- 2NF: A table is in 2NF if (a) it is in 1NF and (b) non-key columns depend on the entire key.
- Normalization Procedure:
  - Move TransID and Copy# into a new table VideoRented.
  - Preserve the association between Video and VideoRented by replicating VideoID in table VideoRented.
Video(TransID, VideoID, Copy#, Title, RentalFee): move TransID and Copy#, replicate VideoID
VideoRented(TransID, VideoID, Copy#) (new)
Video(VideoID, Title, RentalFee) (resulting Video table)
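The 2NF split can be sketched as two projections of the 1NF Video table, and a join on VideoID shows that no information is lost. This is a minimal sketch, not part of the original slides. The helper names project and natural_join are hypothetical, and the sample values (including the TransID values and rental fees) are assumptions.

```python
COLUMNS = ["TransID", "VideoID", "Copy#", "Title", "RentalFee"]

# Sample rows for Video in 1NF; TransID values and fees are assumptions.
video_1nf = [dict(zip(COLUMNS, values)) for values in [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
]]


def project(rows, columns):
    """Keep only the given columns and drop duplicate rows (first-seen order)."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(columns, values)))
    return out


def natural_join(left, right, on):
    """Join two tables on one shared column."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


# New table: columns that depend on the entire key (TransID + VideoID).
video_rented = project(video_1nf, ["TransID", "VideoID", "Copy#"])
# Resulting Video table: columns that depend on VideoID alone.
video = project(video_1nf, ["VideoID", "Title", "RentalFee"])
```

Joining video_rented and video on VideoID gives back the five original rows, while the title and fee of each video are now stored only once.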

DBSYSTEMS 20 of 23 Finalize 2NF…
Table Customer must also be brought into 2NF by moving TransID into table RentalTransaction (already there) and replicating CustomerID (see Slide 15).
Customer(TransID, CustomerID, Phone, Name, Address, City, State, …): move TransID, replicate CustomerID
RentalTransaction(TransID, RentDate, CustomerID) (completed)
Customer(CustomerID, LastName, FirstName, Address, City, …) (resulting Customer table)

DBSYSTEMS 21 of 23 Third Normal Form (3NF)
- Violation of 3NF: If any non-key column depends on some other non-key column, there is transitive dependence and the table is not in 3NF.
- 3NF: A table is in 3NF if it is (a) in 2NF, and (b) each non-key attribute depends on the key only (the key and nothing but the key).
- Our design is already in 3NF! Check it below:
Customer(CustomerID, LastName, FirstName, Address, City, …)
VideoRented(TransID, VideoID, Copy#)
Video(VideoID, Title, RentalFee)
RentalTransaction(TransID, RentDate, CustomerID)

DBSYSTEMS 22 of 23 3NF Example
Table in 2NF: Sale(SaleID, CustomerID, SalespersonID, SalespersonRank, …)
Violation of 3NF: SalespersonRank (non-key) depends on SalespersonID, not on SaleID.
Solution: split the table into two tables:
Sale(SaleID, CustomerID, SalespersonID)
Salesperson(SalespersonID, SalespersonRank)
Normal forms beyond 3NF are rarely needed, so reaching 3NF is sufficient for most practical purposes. When we say “create schema”, we mean “create tables that are in 3NF”.
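The effect of the 3NF split can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not part of the original slides. The column types, the sample rows and the function names build_sales_db, promote and sales_with_rank are assumptions made for illustration.

```python
import sqlite3


def build_sales_db():
    """Create the two 3NF tables in memory with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Salesperson (
            SalespersonID INTEGER PRIMARY KEY,
            SalespersonRank TEXT
        );
        CREATE TABLE Sale (
            SaleID INTEGER PRIMARY KEY,
            CustomerID INTEGER,
            SalespersonID INTEGER REFERENCES Salesperson(SalespersonID)
        );
        INSERT INTO Salesperson VALUES (1, 'Junior'), (2, 'Senior');
        INSERT INTO Sale VALUES (100, 3, 1), (101, 7, 1), (102, 3, 2);
    """)
    return conn


def promote(conn, salesperson_id, new_rank):
    """Change a rank; returns the number of rows that had to be updated."""
    cur = conn.execute(
        "UPDATE Salesperson SET SalespersonRank = ? WHERE SalespersonID = ?",
        (new_rank, salesperson_id),
    )
    return cur.rowcount


def sales_with_rank(conn):
    """Rebuild the view of the original 2NF table with a join."""
    return conn.execute("""
        SELECT Sale.SaleID, Salesperson.SalespersonRank
        FROM Sale JOIN Salesperson USING (SalespersonID)
        ORDER BY Sale.SaleID
    """).fetchall()
```

Because the rank is stored once per salesperson, a promotion updates a single row, and every sale of that salesperson shows the new rank when the tables are joined.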

DBSYSTEMS 23 of 23 Simplified Schema for VSTPS Using Different Key Design
Customer(CustomerID, LastName, FirstName, Address, City, …)
Video(VideoID, Title, RentalFee)
RentalTransaction(TransID, CustomerID, VideoID, RentDate)
Multiplicities: Customer 1 to * RentalTransaction; Video 1 to * RentalTransaction.
Note: The Video key can be made unique: VideoID = 85.1 (decimal place designates a copy), or 85c1 (text type), or use a bar code for each video and copy (ItemID).

DBSYSTEMS 24 of 23 Summary of Normal Forms (Must know by heart!)
1) If a table has repeating sections, there is huge redundancy, different classes are mixed together, and all anomalies occur. Split the table so that classes are clearly differentiated. Result: 1NF.
2) If a table has a combined key, non-key columns may depend on just a part of the primary key, and so there is partial functional dependency. Split the table so that in the new tables non-keys depend on the entire key. Result: 2NF.
3) If a non-key depends on another non-key, there is transitive dependency. Split the table so that in the new tables each non-key depends on the key and nothing but the key. Result: 3NF.
Definitions:
- 1NF: A table is in 1NF if it does not have repeating sections.
- 2NF: A table is in 2NF if it is in 1NF and non-key columns depend on the entire key.
- 3NF: A table is in 3NF if it is in 2NF and all non-key columns depend on the key only.