Database Design - Normalization

Presentation transcript:

1 Database Design - Normalization
• Normalization is a set of techniques for organizing data into tables in order to...
  – Eliminate most redundancy
  – Prevent incompleteness
• Much of this presentation is devoted to a rather long example... we will work through it together

2 Two Steps to Normalization
• 1. Put the data into tabular form (by removing repeating groups)
• 2. Remove duplicated data into separate tables

3 First - A Simple Example
• Graphic: Employee Qualifications

4 Put Data Into Tabular Form
• Problem: What do we do if an employee has more than one qualification?
• Graphic 2-2

5 1. Divide Data Into Two Tables
• Employee Table
  – Emp#, Name, Dept#, DeptName, DeptLocation
• Qualification Table
  – Emp#, Qualification Desc, Qualification Year
• Graphic 2-3

6 2. Remove Duplicated Data
• Notice that Dept. Number "05" is "Auditing" and is located at "HO"
• It is repeated for every employee in that department
  – This wastes space
  – Makes updating data more complicated
  – "Elegance" rule is violated

7 Whose Data is it Anyway?
• The basic problem is that department name and location are really data about departments rather than employees
• It belongs in a separate Department table
• Graphic 2-4
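
The three-table split from slides 5 to 7 can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the presenter's own schema: the snake_case names, the employee names, the qualification rows, the 'Branch' location, and the choice of (emp_no, qualification_desc) as the Qualification key are all assumptions. Only department "05", "Auditing", "HO" comes from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_no       TEXT PRIMARY KEY,
    dept_name     TEXT NOT NULL,
    dept_location TEXT NOT NULL
);
CREATE TABLE employee (
    emp_no  TEXT PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_no TEXT NOT NULL REFERENCES department(dept_no)
);
CREATE TABLE qualification (
    emp_no             TEXT NOT NULL REFERENCES employee(emp_no),
    qualification_desc TEXT NOT NULL,
    qualification_year INTEGER NOT NULL,
    PRIMARY KEY (emp_no, qualification_desc)  -- assumed key
);
""")

# Department facts are stored once, not once per employee.
conn.execute("INSERT INTO department VALUES ('05', 'Auditing', 'HO')")
# Hypothetical employees and qualifications, for illustration only.
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("E1", "A. Example", "05"), ("E2", "B. Sample", "05")])
conn.executemany("INSERT INTO qualification VALUES (?, ?, ?)",
                 [("E1", "B.Sc.", 1990), ("E1", "M.Sc.", 1994)])

# Moving the department is now a single-row update.
conn.execute("UPDATE department SET dept_location = 'Branch' WHERE dept_no = '05'")

rows = conn.execute("""
    SELECT e.name, d.dept_name, d.dept_location
    FROM employee e
    JOIN department d ON d.dept_no = e.dept_no
    ORDER BY e.emp_no
""").fetchall()
print(rows)
```

Because the location lives in one Department row, every employee in department "05" sees the new value through the join.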

8 A Very Basic Example
• This example was presented informally
• The rules of normalization have their foundation in mathematics
  – On one hand, we can have confidence in normalization as a technique
  – On the other hand, it's very easy to become lost in mathematical terminology and proofs
  – Remember that data modeling is "design", and we should be careful about anything that leads us to one "right" answer

9 Relational Notation
• The sample tables we've seen so far take up a lot of space
• We need a more concise notation
• If we eliminate the sample rows, we are left with the table names and columns
• Graphic 2-5

10 Relational Notation - cont.
• Textbooks usually use data displayed as in Graphic 2-5
• Designers usually want a little more information
• Graphic 2-6

11 Exercise
• Normalize the database shown in the graphic
• Take about 10 minutes
• Remember two steps...
  – Put the information in tabular format
  – Eliminate redundant information
• Graphic 2-E1

12 A More Complex Example
• We'll introduce the rules of normalization as we proceed
• The rules can be daunting at first, but we'll look at the problems they solve

13 Hospital Survey Example
• The form displayed in Graphic 2-7 is one used in an actual survey of antibiotic drug usage in Australian hospitals
• The survey was used to determine which drugs and dosages were being used for various operations, to ensure that patients were properly prescribed for and that the public was not paying for unnecessary drugs

14 One Form for Each Operation
• Each hospital in the survey was given a unique hospital number
• All hospital numbers were prefixed with 'H'
• Hospitals fell into three categories:
  – 'P' for public
  – 'V' for private
  – 'T' for training
    » All training hospitals were public, so 'T' implied 'P'

15 Operations
• All Operation Numbers were assigned sequentially by each hospital
• Operation Code was a standard international code for the named operation
• Procedure Group was a broader classification

16 Surgeons
• Surgeon Number was allocated by individual hospitals to allow surgeons to remain anonymous
• The prefix 'S' stood for surgeon

17 Drugs
• The Total Drug Cost was the total cost of all drug doses for the operation
• The bottom of the form recorded the individual antibiotic drugs used in the operation
• The Drug Code consisted of the short name for the drug and the size of the dose

18 As the Study Continued...
• ... it was decided to replace the heaps of forms with a computerized database
• Graphic 2-8 shows the initial database design using the relational notation
• This was done by one person, who was the data modeler, the physical database designer, and the programmer

19 Exercise
• Normalize the database as shown in Graphic 2-8
• Take about 15 minutes
• Remember two steps...
  – Put the information in tabular format
  – Eliminate redundant information

20 Determining Columns
• Normalization relies on certain assumptions about the way data is represented
  – We need to make sure that these are valid
• There are some problems normalization does not solve
  – It is better to address these at the outset, rather than carry excess baggage through the whole normalization process
• The following steps are necessary...

21 One Fact per Column
• First we make sure that each column in the table represents one fact only
  – The Drug Code column holds both a short name for the drug and a dosage size
  – The dosage size consists of a numeric size and a unit of measure
  – The Hospital Category really provides two facts
    » Is the hospital public or private?
    » Does the hospital provide training?
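
As a rough sketch of what "one fact per column" means here, the two helper functions below split the overloaded values into separate facts. Both function names are hypothetical. The drug code layout (letters, then a number, then a unit, as in "AMP250MG") is an assumption, since the slides do not show the real format, and reading 'P' as "public, non-training" is an inference from the statement that 'T' implied 'P'.

```python
import re


def split_hospital_category(category):
    """Turn the one-letter category into two separate facts.

    Returns (is_public, provides_training).
    """
    facts = {
        "P": (True, False),   # public; assumed to mean non-training
        "V": (False, False),  # private
        "T": (True, True),    # training hospitals were all public
    }
    return facts[category]


def split_drug_code(code):
    """Split a drug code into (short_name, dose_size, unit_of_measure).

    Assumes a hypothetical layout such as 'AMP250MG'.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+(?:\.\d+)?)([A-Za-z]+)", code)
    if match is None:
        raise ValueError(f"unrecognized drug code: {code!r}")
    short_name, size, unit = match.groups()
    return short_name, float(size), unit


print(split_hospital_category("T"))
print(split_drug_code("AMP250MG"))
```

After the split, each resulting value can be stored in its own column and queried directly, for example all training hospitals or all doses of a given size.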

22 Hidden Data
• Make sure that we have not lost any data in the translation to tabular form
• Common problem: We cannot rely on the rows of the table being stored in any particular order
  – Suppose the original survey forms had been filed in order of return
  – We would need a Return Date or Sequence column

23 Derivable Data
• We need to remove any data that can be derived from other data in the table and amend the columns accordingly
  – Remember our basic objective is non-redundancy
  – The Total Drug Cost was derived by adding together the Drug Costs multiplied by the Number of Doses
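
The derivation described above is simple enough to compute on demand instead of storing it. A minimal sketch, with made-up drug names and prices and a hypothetical function name:

```python
def total_drug_cost(doses):
    """Derive the total from the per-dose cost and the number of doses."""
    return sum(d["cost_per_dose"] * d["number_of_doses"] for d in doses)


# Illustrative rows for one operation; the values are invented.
doses = [
    {"drug": "drug A", "cost_per_dose": 2.50, "number_of_doses": 4},
    {"drug": "drug B", "cost_per_dose": 10.00, "number_of_doses": 1},
]

total = total_drug_cost(doses)
print(total)
```

Since the total can always be recomputed this way, a stored Total Drug Cost column would add nothing except a second copy that can drift out of step with the detail rows.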

24 Determining the Key
• Key - a minimal set of columns that holds a different combination of values for each row in the table
• The value of the Key uniquely identifies one row in the table
  – Here: the combination of Hospital Number and Operation Number
  – In relational notation, underline the key column(s)
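
The definition of a key (unique and minimal) can be tested against sample rows. The sketch below uses hypothetical helper names and invented rows. Note the limitation: sample data can only rule a key out, because a key is a statement about every row the table could ever hold, not just the rows present today.

```python
def is_unique(rows, columns):
    """True if no two rows share the same values in `columns`."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, columns):
    """True if `columns` is unique and no column can be dropped."""
    if not is_unique(rows, columns):
        return False
    # Minimality: removing any single column must break uniqueness.
    return not any(
        is_unique(rows, [c for c in columns if c != dropped])
        for dropped in columns
    )


# Invented sample rows for the operation table.
operations = [
    {"hospital_number": "H17", "operation_number": 1, "surgeon_number": "S01"},
    {"hospital_number": "H17", "operation_number": 2, "surgeon_number": "S01"},
    {"hospital_number": "H18", "operation_number": 1, "surgeon_number": "S01"},
]

print(is_key(operations, ["hospital_number", "operation_number"]))
print(is_key(operations, ["hospital_number"]))
```

Adding surgeon_number to the two key columns still gives unique rows, but the set is no longer minimal, so `is_key` rejects it.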

25 Exercise
• Using Graphic 2-8, clean up the design (i.e. list all the columns that are valid in the new design)
• Remember...
  – One fact per column
  – Don't lose hidden data
  – Remove data that can be derived
  – Determine the Key
• Take about 10 minutes

26 After Tidying Up...
• Graphic 2-9

27 Repeating Groups & 1st Normal Form
• Our first task was to put the data in tabular format…
  – it might seem like we've done this
  – but we've actually hidden a problem about the drug administered data

28 Limiting Number of Occurrences
• The drug administration data is the major cause of the table's complexity and inelegance
  – Drug Short Name 2, Drug Short Name 4, etc.
• The columns needed to accommodate up to four drugs account for most of the complexity

29 Why Only Four Drugs?
• Why not five or six or more?
• Four drugs represented the maximum arrived at by asking one of the survey teams "What would be the maximum number of drugs used in an operation?"
• In fact, this number was frequently exceeded
  – some operations had 10 or more

30 What's the Problem?
• Part of the problem is that the question was poorly put… a line on the form was required for each drug-dose combination, rather than just for each different drug
• The maximum number of drugs could increase later, so this model still rates poorly against the stability criterion

31 Paper Over the Differences
• With the paper form, a continuation sheet was simply attached when more drugs were needed
• We could add more columns to the table easily, but the application program changes would be much more difficult

32 What Was Done...
• The original designer decided to handle continuations by suffixing the operation number with "a", "b", or "c" to indicate a continuation
• This caused program changes and compromised the original simplicity of the system

33 Data Reusability & Program Complexity
• The main difficulties are with data reusability and program complexity
• A program can easily answer questions such as…
  – How many operations were performed by neurosurgeons?
  – Which hospital is spending the most money on drugs?
• But not…
  – How much money was spent on Ampicillin?

34 Another Way
• You might argue that some queries are always going to be more difficult than others
• What would happen if we had designed the table on the basis of "one row per drug"?

35 Recognizing Repeating Groups
• A set of columns repeated a number of times - a "repeating group" - results in…
  – inflexibility
  – complexity
  – poor data reusability
• Graphic 2-10
  – curly brackets indicate a repeating group with an indefinite number of occurrences

36 Solution
• A general and flexible solution should not set any limit on the maximum number of occurrences
• It should also handle few or no occurrences (the drug-free operation)

37 Removing Repeating Groups
• First step of Normalization:
  – Put the data in tabular form by identifying and eliminating repeating groups
• Split the file into two tables
  – basic operation data
  – (repeating) drug administration table

38 Split into Two Tables
• Remove all repeating group columns to a new table (each occurrence of the group is a row in the new table)
• Include the primary key of the original table in the new table (this makes a foreign key in the new table)

39 Split into Two Tables - cont.
• Add a 'Sequence' column if needed
• Name the new table
• Identify and underline the primary key of the new table
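
The split in slides 38 and 39 can be sketched as a small transformation over "wide" rows. The function name and the numbered column names (drug_short_name_1, number_of_doses_1, and so on) are assumptions made for illustration, and the rows are invented; the real layout is in Graphics 2-9 and 2-10, which are not reproduced here.

```python
def remove_repeating_groups(wide_rows, max_groups=4):
    """Split wide rows into operation rows and drug administration rows."""
    operations, administrations = [], []
    for row in wide_rows:
        # The primary key of the original table is carried into the new
        # table, where it acts as a foreign key.
        key = {
            "hospital_number": row["hospital_number"],
            "operation_number": row["operation_number"],
        }
        operations.append({**key, "surgeon_number": row["surgeon_number"]})
        for i in range(1, max_groups + 1):
            name = row.get(f"drug_short_name_{i}")
            if name is None:
                continue  # unused slot in the repeating group
            # Each occurrence of the group becomes one row.
            administrations.append({
                **key,
                "drug_short_name": name,
                "number_of_doses": row[f"number_of_doses_{i}"],
            })
    return operations, administrations


wide_rows = [
    {"hospital_number": "H17", "operation_number": 1, "surgeon_number": "S01",
     "drug_short_name_1": "AMP", "number_of_doses_1": 3,
     "drug_short_name_2": "XYZ", "number_of_doses_2": 1},
    # A drug-free operation: no repeating-group columns are filled in.
    {"hospital_number": "H17", "operation_number": 2, "surgeon_number": "S02"},
]

operations, administrations = remove_repeating_groups(wide_rows)
print(len(operations), len(administrations))
```

The `max_groups` limit exists only to read the old wide layout. The resulting drug administration table has no such limit, and the drug-free operation simply contributes no rows to it.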

40 Exercise
• Using Graphic 2-10, remove all repeating groups

41 Solution
• Graphic 2-11

42 Determining Key of New Table
• Not always an easy task
• Question: What is the minimal combination of columns needed to uniquely identify one row?

43 A Six Column Key
• Hospital Number - FK
• Operation Number - FK
• Drug Short Name
• Dose
• Unit of Measure
• Method of Administration

44 First Normal Form
• Our tables are now technically in first normal form (1NF). What have we achieved?
  – All data of the same kind is now held in the same place
  – The number of different drug dosages that can be recorded for an operation is effectively unlimited
  – An operation that doesn't use any drugs is allowed
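
With one row per drug administration, the question that was awkward before ("How much money was spent on Ampicillin?") becomes a single query. The sketch below uses sqlite3 with the six-column key from slide 43. The snake_case names, the short names 'AMP' and 'XYZ', and all data values are invented for illustration, and the operation table is trimmed to its key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE operation (
    hospital_number  TEXT NOT NULL,
    operation_number INTEGER NOT NULL,
    PRIMARY KEY (hospital_number, operation_number)
);
CREATE TABLE drug_administration (
    hospital_number          TEXT NOT NULL,
    operation_number         INTEGER NOT NULL,
    drug_short_name          TEXT NOT NULL,
    dose                     REAL NOT NULL,
    unit_of_measure          TEXT NOT NULL,
    method_of_administration TEXT NOT NULL,
    cost_of_dose             REAL NOT NULL,
    number_of_doses          INTEGER NOT NULL,
    PRIMARY KEY (hospital_number, operation_number, drug_short_name,
                 dose, unit_of_measure, method_of_administration),
    FOREIGN KEY (hospital_number, operation_number)
        REFERENCES operation (hospital_number, operation_number)
);
""")
conn.executemany("INSERT INTO operation VALUES (?, ?)",
                 [("H17", 1), ("H17", 2), ("H18", 1)])
conn.executemany(
    "INSERT INTO drug_administration VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("H17", 1, "AMP", 250, "mg", "oral", 2.0, 3),
        ("H17", 1, "XYZ", 500, "mg", "IV", 5.0, 1),
        ("H18", 1, "AMP", 250, "mg", "oral", 2.0, 2),
    ],
)

# Total spend on one drug across all operations.
ampicillin_spend = conn.execute("""
    SELECT SUM(cost_of_dose * number_of_doses)
    FROM drug_administration
    WHERE drug_short_name = 'AMP'
""").fetchone()[0]

# Operation H17/2 used no drugs and still appears, with a count of 0.
drug_rows_per_operation = conn.execute("""
    SELECT o.hospital_number, o.operation_number, COUNT(d.drug_short_name)
    FROM operation o
    LEFT JOIN drug_administration d
      ON d.hospital_number = o.hospital_number
     AND d.operation_number = o.operation_number
    GROUP BY o.hospital_number, o.operation_number
    ORDER BY o.hospital_number, o.operation_number
""").fetchall()

print(ampicillin_spend)
print(drug_rows_per_operation)
```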

45 Problems with First Normal Form
• Look at the Operation table in Graphic 2-11
• Every row for an operation (e.g. Hospital number 17) will contain the facts that its name is St. Vincent's and Fred Fleming is the contact person
• Criterion of non-redundancy is not being met

46 More Problems
• Change a fact about a hospital and you will have to change it for every operation in the hospital
• If we delete the last operation for the hospital, then we effectively delete the hospital

47 Eliminating Redundancy
• Solve all these problems by removing the hospital information to a separate table, in which each hospital number appears once only (which becomes the primary key)
• Graphic 2-12

48 Determinants
• For a given hospital number there could be only one...
  – hospital name
  – hospital type
  – contact person
  – training status
• The hospital number is a determinant for the other columns
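
Whether one set of columns determines another can be checked against sample rows. This is a rough sketch with a hypothetical function name. "St. Vincent's" and "Fred Fleming" come from slide 45; the 'H' prefix on the number follows slide 14, and the second hospital is invented. As with keys, sample data can only disprove a determinant; confirming one takes knowledge of the business rules.

```python
def determines(rows, determinant, dependent):
    """True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        lhs = tuple(row[c] for c in determinant)
        rhs = tuple(row[c] for c in dependent)
        # setdefault stores the first rhs seen for this lhs and returns it.
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


operations = [
    {"hospital_number": "H17", "operation_number": 1,
     "hospital_name": "St. Vincent's", "contact_person": "Fred Fleming"},
    {"hospital_number": "H17", "operation_number": 2,
     "hospital_name": "St. Vincent's", "contact_person": "Fred Fleming"},
    {"hospital_number": "H18", "operation_number": 1,
     "hospital_name": "Example General", "contact_person": "A. Contact"},
]

print(determines(operations, ["hospital_number"],
                 ["hospital_name", "contact_person"]))
print(determines(operations, ["hospital_number"], ["operation_number"]))
```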

49 Formal Procedure
• 1. Identify any determinants (other than the primary key) and the columns they determine
• 2. Create a separate table for each determinant and its columns (the determinant becomes the primary key)
• 3. Name the new table
• 4. Remove the determined columns from the original table, but leave the determinant
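
Steps 2 and 4 of the procedure can be sketched as one function over in-memory rows. The function name and the sample data are hypothetical (only "St. Vincent's" and "Fred Fleming" appear in the slides), and the sketch assumes the determinant has already been confirmed in step 1.

```python
def split_by_determinant(rows, determinant, determined):
    """Move `determined` columns to a new table keyed by `determinant`.

    Returns (new_table_rows, remaining_rows).
    """
    new_table = {}
    remaining = []
    for row in rows:
        key = tuple(row[c] for c in determinant)
        # Step 2: one row per determinant value in the new table.
        new_table.setdefault(
            key, {c: row[c] for c in (*determinant, *determined)}
        )
        # Step 4: drop the determined columns, keep the determinant.
        remaining.append(
            {c: v for c, v in row.items() if c not in determined}
        )
    return list(new_table.values()), remaining


operations = [
    {"hospital_number": "H17", "operation_number": 1,
     "hospital_name": "St. Vincent's", "contact_person": "Fred Fleming"},
    {"hospital_number": "H17", "operation_number": 2,
     "hospital_name": "St. Vincent's", "contact_person": "Fred Fleming"},
]

hospitals, operations_only = split_by_determinant(
    operations, ["hospital_number"], ["hospital_name", "contact_person"]
)
print(hospitals)
print(operations_only)
```

The determinant stays behind in the original table as a foreign key, so the hospital details can still be reached from each operation.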

50 Other Determinants
• Hospital Number + Surgeon Number => Surgeon Specialty
• Operation Code => Operation Name, Procedure Group
• Drug Short Name => Drug Name, Manufacturer
• Drug Short Name + Method of Administration + Size of Dose + Unit of Measure => Cost of Dose

51 Exercise
• Finish normalizing Graphic 2-10

52 Third Normal Form
• Graphic 2-13 is the final model
• This model is in Third Normal Form (3NF)

53 Where is Second Normal Form?
• Our approach took us directly to Third Normal Form
• Most texts treat this as a two-stage process
  – deal first with the determinants that are part of the table's key (Second Normal Form)
  – then with the non-key determinants

54 Is Third Normal Form the End?
• Unfortunately, no
• Boyce-Codd Normal Form
• Fourth Normal Form
• Fifth Normal Form
• We'll discuss later...

55 Third Normal Form Hint
• Every non-key column must be a fact about the key, the whole key, and nothing but the key

56 Candidate Keys
• Sometimes more than one column or combination of columns could serve as a primary key (e.g. Drug Name instead of Drug Short Name)
• These are called Candidate Keys
• In step 1 we should really say...
  – Identify any determinants, other than candidate keys...

57 Foreign Keys
• When we removed repeating groups we carried the primary key of the original table to cross-reference back
• These columns are called Foreign Keys

58 Self-Referencing Tables
• A Foreign Key may refer back to the Primary Key of the same table
  – Example: An Employees table might have Employee Id as its Primary Key and a Foreign Key of Manager Id which refers to another row in the same table
• The convention to represent Foreign Keys is an asterisk...
  – EMPLOYEE (Employee Id, Name, Manager Id*)
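
The EMPLOYEE example can be sketched in sqlite3 as a table whose foreign key points at its own primary key. The employee names are invented, and the snake_case column names are an assumption. SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
conn.execute("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    -- NULL for an employee with no manager
    manager_id  INTEGER REFERENCES employee(employee_id)
)
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Manager One", None), (2, "Worker Two", 1), (3, "Worker Three", 1)],
)

# Join the table to itself to pair each employee with their manager.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM employee e
    JOIN employee m ON m.employee_id = e.manager_id
    ORDER BY e.employee_id
""").fetchall()
print(pairs)
```

The employee with no manager has a NULL Manager Id, so the inner join leaves that row out of the result.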

59 Referential Integrity
• If the Employees table has a Department Id foreign key referring back to the Departments table's Primary Key, then we would expect to find a valid Department Id on the Departments table for every value of Department Id in the Employees table
• If not, our database lacks Referential Integrity
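
A database engine can enforce this rule itself. The sketch below uses sqlite3, where enforcement has to be enabled per connection with the `foreign_keys` pragma. The table layout and sample rows are assumptions for illustration; department '99' is deliberately missing from the Departments table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
conn.executescript("""
CREATE TABLE department (
    department_id TEXT PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id TEXT NOT NULL REFERENCES department(department_id)
);
""")
conn.execute("INSERT INTO department VALUES ('05', 'Auditing')")

# A valid Department Id is accepted.
conn.execute("INSERT INTO employee VALUES (1, 'A. Example', '05')")

# A Department Id with no matching department row is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'B. Sample', '99')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, employee_count)
```

Without the pragma, SQLite would accept the second insert and the table would hold an employee whose department does not exist.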

60 Summary
• Normalization is a set of techniques for organizing data into tables to eliminate certain types of redundancy and incompleteness
• Normalization relies on correct identification of determinants and keys

61 Last Slide - Normalization
• Assignment #9 due next week