University of Manitoba Asper School of Business 3500 DBMS Bob Travica

Presentation transcript:

University of Manitoba Asper School of Business 3500 DBMS Bob Travica Chapter 3 Data Normalization Based on G. Post, DBMS: Designing & Building Business Applications Updated 2010

Normalization
The process of putting data into the format of relational databases (or, organizing data for relational databases).
Practically boils down to defining tables so that
a) problems (anomalies) with insertion, deletion and modification of data are avoided
b) data quality is preserved (completeness, integrity)
c) redundancy is reduced

Relational Database Terminology
Relational database: A collection of tables (relations). Tables store atomic data.
Table: A collection of columns (attributes, properties, fields) describing an entity (class). A table is also a collection of rows (records), each with the same number of columns. Each row stores data on one object (entity instance).
Entity (Class): Employee. Table: Employee. Columns are attributes/properties; rows are objects.

EmployeeID  TaxpayerID   LastName   FirstName  HomePhone       Address
12512       888-22-5552  Cartom     Abdul      (603) 323-9893  252 South Street
15293       222-55-3737  Venetiaan  Roland     (804) 888-6667  937 Paramaribo Ln
22343       293-87-4343  Johnson    John       (703) 222-9384  234 Main Street
29387       837-36-2933  Stenheim   Susan      (410) 330-9837  8934 W. Maple

Relational Database Terminology – Primary Key
Every table has a primary key (key): an attribute that uniquely identifies each row (e.g., EmployeeID on the previous slide).
A primary key can span more than one column; it is then called a combined (composite, concatenated) key, as in OrderItem:

OrderItem
OrderID  ItemID  Quantity
1        229     2
1        253     4
2        229     1
2        555     4

Other attributes are called non-key columns.
A primary key can be generated automatically by the DBMS (surrogate key).
Note: Watch for data types (e.g., number vs. text) and naming rules (arbitrary but consistent).
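As a minimal sketch of a combined key, the OrderItem table above can be declared in SQLite through Python's standard sqlite3 module. The column types and the helper name try_insert are assumptions made for illustration; the slide does not prescribe them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE OrderItem (
        OrderID  INTEGER NOT NULL,
        ItemID   INTEGER NOT NULL,
        Quantity INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ItemID)  -- combined (composite) key
    )
    """
)
# Rows taken from the OrderItem table on the slide.
conn.executemany(
    "INSERT INTO OrderItem VALUES (?, ?, ?)",
    [(1, 229, 2), (1, 253, 4), (2, 229, 1), (2, 555, 4)],
)


def try_insert(order_id, item_id, quantity):
    """Hypothetical helper: True if the row is accepted, False if the key exists."""
    try:
        conn.execute(
            "INSERT INTO OrderItem VALUES (?, ?, ?)", (order_id, item_id, quantity)
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Same OrderID + ItemID as an existing row: rejected by the primary key.
duplicate_accepted = try_insert(1, 229, 5)
# Each value already occurs on its own, but not as a pair: accepted.
new_pair_accepted = try_insert(1, 555, 1)
row_count = conn.execute("SELECT COUNT(*) FROM OrderItem").fetchone()[0]
```

Neither OrderID nor ItemID is unique by itself; only the pair is, which is why the key has to span both columns.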

Relational Database Shorthand Notation
Customer(CustomerID, LastName, FirstName, Address, City, State, ZipPostalCode, TelephoneNumber)
Table name: Customer. Primary key (underlined on the slide): CustomerID. The remaining attributes are non-key columns.
* Note: Telephone number can be used as a “backup key.”

Order Management Application
[Figure: non-normalized class diagram with classes Customer, Order, Salesperson and Item, and normalized tables diagram (schema) with Customer, Order, Salesperson, Item and OrderItem, connected by 1-to-* relationships.]
OrderItem is an association class (also named ItemOrdered, OrderDetail, etc.).

Shorthand Notation for Normalized Tables Diagram – Foreign Key
Customer(CustomerID, Name, Address, City, Phone)
Salesperson(EmployeeID, Name, DateHired)
Order(OrderID, OrderDate, CustomerID, EmployeeID)
OrderItem(OrderID, ItemID, Quantity)
Item(ItemID, Description, ListPrice)
Foreign Key = Attribute that is a key in another table (e.g., CustomerID in Order).
Logic & naming of OrderItem: It replaces the Order-Item many-to-many relationship with two 1:M relationships. OrderItem has a combined key: OrderID + ItemID.
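The schema above could be written out roughly as follows, using Python's standard sqlite3 module. This is a sketch under assumptions: the column types, the sample rows and the helper order_accepted are invented for illustration, and Order is quoted only because it is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, City TEXT, Phone TEXT);
    CREATE TABLE Salesperson (
        EmployeeID INTEGER PRIMARY KEY, Name TEXT, DateHired TEXT);
    CREATE TABLE "Order" (
        OrderID    INTEGER PRIMARY KEY,
        OrderDate  TEXT,
        CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),     -- foreign key
        EmployeeID INTEGER NOT NULL REFERENCES Salesperson(EmployeeID)   -- foreign key
    );
    CREATE TABLE Item (
        ItemID INTEGER PRIMARY KEY, Description TEXT, ListPrice REAL);
    CREATE TABLE OrderItem (
        OrderID  INTEGER NOT NULL REFERENCES "Order"(OrderID),
        ItemID   INTEGER NOT NULL REFERENCES Item(ItemID),
        Quantity INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ItemID)   -- combined key of two foreign keys
    );
    """
)
# Hypothetical sample rows, not from the slides.
conn.execute(
    "INSERT INTO Customer VALUES (?, ?, ?, ?, ?)",
    (1, "Sample Customer", "1 Sample St", "Sample City", "555-0100"),
)
conn.execute(
    "INSERT INTO Salesperson VALUES (?, ?, ?)", (10, "Sample Seller", "2010-01-15")
)


def order_accepted(order_id, customer_id, employee_id):
    """Hypothetical helper: True if the order row passes the foreign key checks."""
    try:
        conn.execute(
            'INSERT INTO "Order" VALUES (?, ?, ?, ?)',
            (order_id, "2010-02-01", customer_id, employee_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid_order = order_accepted(100, 1, 10)     # customer 1 and employee 10 exist
orphan_order = order_accepted(101, 999, 10)  # customer 999 does not exist
```

The second insert fails because CustomerID in Order must match a key value in Customer, which is what the foreign key definition expresses.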

NORMALIZATION

Video Store Transaction Management System (VTMS): Classes, Columns & Business Rules
“Static (Master) Data”: Market & Inventory Entities (don’t change often)
Customer table. Key: CustomerID. Attributes: Name, Address, Phone
Video table. Key: VideoID. Attributes: Title, RentalFee, Rating…
“Dynamic Data” (Transaction Data): Operations Entities (change more often)
RentalTransaction table. Key: TransactionID. Attributes: CustomerID, Date
VideoRented table. Key: TransactionID + VideoID. Attributes: Copy#
Business Rules:
A customer can have many transactions…
Each transaction can include many videos…
A transaction can include only one copy of a particular video...

Normalized Schema for VTMS in Shorthand Notation
Customer(CustomerID, LastName, FirstName, Address, City, …)
RentalTransaction(TransID, RentDate, CustomerID)
VideoRented(TransID, VideoID, Copy#)
Video(VideoID, Title, RentalFee)
Transaction data is stored in 2 tables (RentalTransaction and VideoRented) due to the business rule that a rental transaction can include just 1 copy of a video.

Why Normalize – Avoiding data anomalies
How do we get to those four tables from the business rules? Aren't these two tables enough?
[Class diagram: Customer * rents * Video]
Partial schema for this class diagram:
Customer(CustomerID, LastName, FirstName, … VideoID, Date)
Video(VideoID, Title, RentalFee)
Not good because:
Redundancy: Transaction data would have to be part of table Customer (or Video), which repeats Customer data for each transaction.
Deletion anomaly: Deleting transaction data also deletes customer data.
Insertion anomaly: New customers cannot be added without a rental, because VideoID, as part of the key, cannot be empty.
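The three problems can be reproduced on a small scale. The sketch below uses Python's standard sqlite3 module and assumes that the key of the two-table Customer design is CustomerID + VideoID; the column types, the NOT NULL declarations and the customer named Newcomer are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Customer (
        CustomerID INTEGER NOT NULL,
        LastName   TEXT    NOT NULL,
        VideoID    INTEGER NOT NULL,   -- part of the key, so it cannot be empty
        RentDate   TEXT    NOT NULL,
        PRIMARY KEY (CustomerID, VideoID)
    )
    """
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?, ?)",
    [
        (3, "Washington", 1, "4/18/02"),
        (3, "Washington", 6, "4/18/02"),
        (7, "Lasater", 8, "4/30/02"),
    ],
)

# Redundancy: the customer's name is stored once per rented video.
name_copies = conn.execute(
    "SELECT COUNT(*) FROM Customer WHERE CustomerID = 3"
).fetchone()[0]

# Insertion anomaly: a customer with no rental yet cannot be stored.
try:
    conn.execute("INSERT INTO Customer (CustomerID, LastName) VALUES (9, 'Newcomer')")
    insert_blocked = False
except sqlite3.IntegrityError:
    insert_blocked = True

# Deletion anomaly: removing the only rental of customer 7 removes the customer too.
conn.execute("DELETE FROM Customer WHERE CustomerID = 7 AND VideoID = 8")
lasater_rows = conn.execute(
    "SELECT COUNT(*) FROM Customer WHERE CustomerID = 7"
).fetchone()[0]
```

After the delete, nothing in the database records that customer 7 ever existed, which is the situation the four-table design avoids.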

Normalization
Rule of Thumb: Each many-to-many relationship must be replaced by 2 one-to-many relationships (see Customer-Order-Item above).
[Diagram, starting point: Customer * rents * Video]
1. [Diagram: Customer 1 has * RentalTransaction; RentalTransaction includes Video] How to track different copies of the same video? The relationship between RentalTransaction and Video is still M:M.
2. [Diagram: Customer 1 has * RentalTransaction; RentalTransaction contains VideoRented; Video is rented in VideoRented] Table VideoRented tracks each copy of a particular video. Multiplicity on the video side is forced down to 1, which enforces the business rule that only 1 copy of a video can be rented out in a transaction (slides 9 & 10).

Normalization – Step by Step
RentalForm(TransID, RentDate, (CustomerID, Name, Address, City, State, …), (VideoID, Copy#, Title, RentalFee))
Interview users, understand output needed.
Put data into a large table (RentalForm).
Pick out attributes.
Find repeating groups.
Look for potential keys.
Identify computed values.

Problems with Repeating Groups (Sections)
RentalForm(TransID, RentDate, (CustomerID, Phone, Name, Address, City, State, …), (VideoID, Copy#, Title, Rent))
The two parenthesized sections are the repeating groups.

TransID  RentDate  CustomerID  LastName    Phone         Address          VideoID  Copy#  Title                  Rent
1        4/18/02   3           Washington  502-777-7575  95 Easy Street   1        2      2001: A Space Odyssey  $1.50
1        4/18/02   3           Washington  502-777-7575  95 Easy Street   6        3      Clockwork Orange       $1.50
2        4/30/02   7           Lasater     615-888-4474  67 S. Ray Drive  8        1      Hopscotch              $1.50
2        4/30/02   7           Lasater     615-888-4474  67 S. Ray Drive  2        1      Apocalypse Now         $2.00
2        4/30/02   7           Lasater     615-888-4474  67 S. Ray Drive  6        1      Clockwork Orange       $1.50

Problems:
Insertion Anomaly: Inserting a Customer creates blank space in the video and transaction columns. With VideoID as part of the key, customer and video data must be inserted at the same time.
Deletion Anomaly: Delete transaction data => delete customer and video data.
Useless redundancy & wasted storage.
If there are repeating sections, the table is not in the first normal form (1NF).

First Normal Form (1NF)
1NF: A table is in 1NF if it does not have repeating sections.
Normalization Procedure: Remove repeating sections by splitting the initial table into new tables. Link new tables on the key from the initial table.
RentalTransaction(TransID, RentDate) [remainder of initial table]
Video(TransID, VideoID, Copy#, Title, RentalFee) [new]
Customer(TransID, CustomerID, Phone, Name, Address, City, State, ZipCode) [new]
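The split can be sketched in plain Python as a projection of the flat RentalForm rows onto the column sets of the three tables, each carrying TransID as the link. The rows are the sample data from the repeating-groups slide, which labels the fee column Rent; the helper name project and the reduced Customer column set are assumptions for illustration.

```python
# Flat RentalForm rows: one row per rented video.
COLUMNS = ("TransID", "RentDate", "CustomerID", "LastName", "Phone",
           "Address", "VideoID", "Copy#", "Title", "Rent")
RENTAL_FORM = [
    (1, "4/18/02", 3, "Washington", "502-777-7575", "95 Easy Street", 1, 2, "2001: A Space Odyssey", 1.50),
    (1, "4/18/02", 3, "Washington", "502-777-7575", "95 Easy Street", 6, 3, "Clockwork Orange", 1.50),
    (2, "4/30/02", 7, "Lasater", "615-888-4474", "67 S. Ray Drive", 8, 1, "Hopscotch", 1.50),
    (2, "4/30/02", 7, "Lasater", "615-888-4474", "67 S. Ray Drive", 2, 1, "Apocalypse Now", 2.00),
    (2, "4/30/02", 7, "Lasater", "615-888-4474", "67 S. Ray Drive", 6, 1, "Clockwork Orange", 1.50),
]


def project(rows, columns, wanted):
    """Keep only the wanted columns and drop duplicate rows, preserving order."""
    indexes = [columns.index(name) for name in wanted]
    result = []
    for row in rows:
        picked = tuple(row[i] for i in indexes)
        if picked not in result:
            result.append(picked)
    return result


# Remainder of the initial table.
rental_transaction = project(RENTAL_FORM, COLUMNS, ("TransID", "RentDate"))
# New tables, each linked back through TransID.
video = project(RENTAL_FORM, COLUMNS, ("TransID", "VideoID", "Copy#", "Title", "Rent"))
customer = project(RENTAL_FORM, COLUMNS,
                   ("TransID", "CustomerID", "Phone", "LastName", "Address"))
```

The customer data that appeared five times in the flat table now appears once per transaction, while the video table still has one row per rented video.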

Problems with First Normal Form
These problems apply only to tables with concatenated keys, and they concern the relationship between the key and the non-key columns.

Video
TransID  VideoID  Copy#  Title                  RentalFee
1        1        2      2001: A Space Odyssey  $1.50
1        6        3      Clockwork Orange       $1.50
2        8        1      Hopscotch              $1.50
2        2        1      Apocalypse Now         $2.00
2        6        1      Clockwork Orange       $1.50

Concept of Functional Dependence: An attribute depends on another attribute if changing the latter causes a change of the former. The key column(s) must be sufficient for determining the values of the non-key columns.

Problems with First Normal Form (cont.)
Video(TransID, VideoID, Copy#, Title, RentalFee)
TransID and VideoID combined determine Copy#: Copy# depends on the full key (TransID + VideoID). This is Full Functional Dependency on the key.
VideoID alone is sufficient to determine Title and RentalFee. This is Partial Functional Dependency between the combined key and Title and RentalFee.
If any non-key column depends on just a part of the key (there is partial functional dependence), the table is not in 2NF.
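One way to test these statements against the sample rows is a small dependency check in plain Python. The function name determines is an assumption for illustration. Note that sample data can only refute a dependency; whether a dependency really holds follows from the business rules, not from five rows.

```python
COLUMNS = ("TransID", "VideoID", "Copy#", "Title", "RentalFee")
VIDEO = [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
]


def determines(rows, columns, lhs, rhs):
    """True if equal values in the lhs columns always go with equal rhs values."""
    lhs_idx = [columns.index(name) for name in lhs]
    rhs_idx = [columns.index(name) for name in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        value = tuple(row[i] for i in rhs_idx)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Copy# needs the full key.
full_key_copy = determines(VIDEO, COLUMNS, ("TransID", "VideoID"), ("Copy#",))
video_only_copy = determines(VIDEO, COLUMNS, ("VideoID",), ("Copy#",))
trans_only_copy = determines(VIDEO, COLUMNS, ("TransID",), ("Copy#",))
# Title and RentalFee need only part of the key: a partial dependency.
video_title_fee = determines(VIDEO, COLUMNS, ("VideoID",), ("Title", "RentalFee"))
```

In this data VideoID 6 appears with Copy# 3 and Copy# 1, so VideoID alone cannot determine Copy#, while every VideoID always appears with the same Title and RentalFee.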

Second Normal Form (2NF)
2NF: A table is in 2NF if (a) it is in 1NF and (b) non-key columns depend on the entire key.
Normalization Procedure: Move TransID and Copy# into a new table VideoRented. Preserve a link between Video and VideoRented by importing VideoID into table VideoRented.
Initial table: Video(TransID, VideoID, Copy#, Title, RentalFee)
New table (TransID and Copy# moved, VideoID exported): VideoRented(TransID, VideoID, Copy#)
Resulting Video table: Video(VideoID, Title, RentalFee)
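The procedure can be sketched in plain Python on the sample rows: split the 1NF Video table into VideoRented and Video, then join the two back on VideoID to confirm that no information was lost. The variable names are assumptions for illustration.

```python
# 1NF table Video(TransID, VideoID, Copy#, Title, RentalFee) from the slides.
VIDEO_1NF = [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
]

# New table VideoRented(TransID, VideoID, Copy#): columns that need the full key.
video_rented = sorted({(t, v, c) for t, v, c, _title, _fee in VIDEO_1NF})

# Resulting table Video(VideoID, Title, RentalFee): columns that need VideoID only.
video = sorted({(v, title, fee) for _t, v, _c, title, fee in VIDEO_1NF})

# Join the two tables back on VideoID to rebuild the 1NF rows.
by_video_id = {v: (title, fee) for v, title, fee in video}
rejoined = sorted((t, v, c, *by_video_id[v]) for t, v, c in video_rented)
lossless = rejoined == sorted(VIDEO_1NF)
```

Video shrinks from five rows to four because Clockwork Orange, rented in both transactions, is now stored once.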

Finalize 2NF…
Table Customer must also be brought into 2NF by moving TransID into table RentalTransaction (already there) and exporting CustomerID.
Initial table: Customer(TransID, CustomerID, Phone, Name, Address, City, State,…)
Completed table (TransID moved, CustomerID exported): RentalTransaction(TransID, RentDate, CustomerID)
Resulting Customer table: Customer(CustomerID, LastName, FirstName, Address, City, …)

Third Normal Form (3NF)
Problem addressed by 3NF: If any non-key column depends on some other non-key column, there is a transitive dependency and the table is not in 3NF.
3NF: A table is in 3NF if (a) it is in 2NF, and (b) each non-key attribute depends on the key only.
Our design is already in 3NF!
Customer(CustomerID, LastName, FirstName, Address, City, …)
VideoRented(TransID, VideoID, Copy#)
Video(VideoID, Title, RentalFee)
RentalTransaction(TransID, RentDate, CustomerID)
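To see that the four tables still carry everything that was on the rental form, they can be loaded into SQLite and joined back together. This is a sketch using Python's standard sqlite3 module; the column types, the reduced Customer columns and the name CopyNo (standing in for Copy#) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY, LastName TEXT, Phone TEXT, Address TEXT);
    CREATE TABLE Video (
        VideoID INTEGER PRIMARY KEY, Title TEXT, RentalFee REAL);
    CREATE TABLE RentalTransaction (
        TransID    INTEGER PRIMARY KEY,
        RentDate   TEXT,
        CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID));
    CREATE TABLE VideoRented (
        TransID INTEGER NOT NULL REFERENCES RentalTransaction(TransID),
        VideoID INTEGER NOT NULL REFERENCES Video(VideoID),
        CopyNo  INTEGER NOT NULL,
        PRIMARY KEY (TransID, VideoID));

    INSERT INTO Customer VALUES
        (3, 'Washington', '502-777-7575', '95 Easy Street'),
        (7, 'Lasater', '615-888-4474', '67 S. Ray Drive');
    INSERT INTO Video VALUES
        (1, '2001: A Space Odyssey', 1.50), (2, 'Apocalypse Now', 2.00),
        (6, 'Clockwork Orange', 1.50), (8, 'Hopscotch', 1.50);
    INSERT INTO RentalTransaction VALUES (1, '4/18/02', 3), (2, '4/30/02', 7);
    INSERT INTO VideoRented VALUES
        (1, 1, 2), (1, 6, 3), (2, 8, 1), (2, 2, 1), (2, 6, 1);
    """
)

# Rebuild the rental form by following the foreign keys.
rental_form = conn.execute(
    """
    SELECT t.TransID, t.RentDate, c.LastName, v.Title, r.CopyNo, v.RentalFee
    FROM VideoRented AS r
    JOIN RentalTransaction AS t ON t.TransID = r.TransID
    JOIN Customer AS c ON c.CustomerID = t.CustomerID
    JOIN Video AS v ON v.VideoID = r.VideoID
    ORDER BY t.TransID, v.VideoID
    """
).fetchall()
```

Each customer and each video is stored once, yet the join returns the same five rental lines as the original flat table.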

3NF Example
Table in 2NF: Sales(CustomerID, CustomerName, Salesperson, Region)
Violation of 3NF: Region (non-key) is dependent on Salesperson (also non-key).
Solution: split the table into 2 tables:
Sales(CustomerID, CustomerName, Salesperson)
Salesperson(Salesperson, Region)
Normal forms beyond the 3rd are rarely applied, and reaching 3NF is sufficient for practical purposes.
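A short sketch of what the split buys, again with Python's standard sqlite3 module. The customer names, salesperson names and regions are invented sample data, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Salesperson (
        Salesperson TEXT PRIMARY KEY,
        Region      TEXT NOT NULL);
    CREATE TABLE Sales (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerName TEXT NOT NULL,
        Salesperson  TEXT NOT NULL REFERENCES Salesperson(Salesperson));

    -- Hypothetical sample rows.
    INSERT INTO Salesperson VALUES ('Mason', 'East'), ('Hicks', 'West');
    INSERT INTO Sales VALUES (1, 'Acme', 'Mason'), (2, 'Birch', 'Mason'), (3, 'Cedar', 'Hicks');
    """
)

# Region is stored once per salesperson, so a change touches a single row,
# however many customers that salesperson serves.
cursor = conn.execute("UPDATE Salesperson SET Region = 'North' WHERE Salesperson = 'Mason'")
rows_updated = cursor.rowcount

regions = conn.execute(
    """
    SELECT s.CustomerName, p.Region
    FROM Sales AS s
    JOIN Salesperson AS p ON p.Salesperson = s.Salesperson
    ORDER BY s.CustomerID
    """
).fetchall()
```

In the unsplit Sales table the same change would have to be repeated on every row of that salesperson's customers, with the risk of leaving some rows inconsistent.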

Schema for VTMS Allowing Multiple Copies per Transaction
Customer(CustomerID, LastName, FirstName, Address, City, …)
Video(VideoID, Title, RentalFee)
RentalTransaction(TransID, CustomerID, VideoID, RentDate)
[Diagram multiplicities: Customer 1 to * RentalTransaction; RentalTransaction * to 1 Video]
Note: Video key can be made unique: VideoID = 85.1 (decimal place designates a copy), or 85c1 (text type), or use a bar code for each video and copy (ItemID).

Normalization Summary (Must know!)
1) If a table has repeating sections, there is huge redundancy and different classes are mixed together. Split the table, so that classes are clearly differentiated. Result: 1NF.
1NF: A table is in 1NF if it does not have repeating sections.
2) If a table has a combined key, non-key columns may depend on just a part of the primary key, and so there is partial functional dependency. Split the table so that in new tables non-keys depend on the entire key. Result: 2NF.
2NF: A table is in 2NF if it is in 1NF and non-key columns depend on the entire key.
3) If a non-key depends on another non-key, there is transitive dependency. Split the table so that in new tables each non-key depends on the key and nothing but the key. Result: 3NF.
3NF: A table is in 3NF if it is in 2NF and all non-keys depend on the key only.