Normalisation Africamuseum 5 June 2013


What is ‘Normalisation’?
- Theoretical: satisfying the requirements of the different ‘Normal Forms’, as spelled out (mainly) by E.F. Codd
- Practical: make sure each piece of data is in your database once and only once
  - Repeated data go to a separate table
  - Relationships between the tables are part of the ‘model’ of the database

Earlier example

Species | # legs | # eyes | Place | Country | Date
Asterias rubens | 5 | 0 | Oostende | Belgium | 12/3/2004
Asterias rubens | 5 | 0 | Zeebrugge | Belgium | 13/3/2005
Asterias rubens | 5 | 0 | Zeebrugge | Belgium | 14/3/2005
Cancer pagurus | 10 | 2 | De Panne | Belgium | 12/3/2004
Cancer pagurus | 10 | 2 | Oostende | Belgium | 12/3/2004
Cancer pagurus | 10 | 2 | Zeebrugge | Belgium | 14/3/2004
Asterias rubens | 5 | 0 | Wimereux | France | 13/3/2005
Asterias rubens | 5 | 0 | Wimereux | France | 14/3/2005
Cancer pagurus | 10 | 2 | Wimereux | France | 12/3/2004

Why normalise
- Save space on disk by avoiding repetition
  - But huge disk space makes this less important
  - Zipping would replace repeated strings by a code
- Avoid ‘modification anomalies’
- Make the model intuitive and informative
- Make the database unbiased with respect to patterns of querying

Modification anomalies
- Update anomalies
  - Potential source of conflicting data
- Insertion anomalies
  - Some relevant data can’t be stored
- Deletion anomalies
  - Some relevant data are lost while deleting other data

Update anomalies
If data are present more than once, it is possible to create conflicting information by updating one copy of the data and not the others.

Species | # legs | # eyes | Place | Country | Date
Asterias rubens | 6 | 0 | Oostende | Belgium | 12/3/2004
Asterias rubens | 5 | 0 | Zeebrugge | Belgium | 13/3/2005
Asterias rubens | 5 | 1 | Zeebrugge | France | 14/3/2005
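A minimal sketch of how such a conflict could be detected in a flat table. The function name `conflicting_values` and the dictionary layout are assumptions made for illustration; the rows are taken from the earlier example.

```python
# Flat table: the species facts (legs, eyes) are repeated on every row.
rows = [
    {"species": "Asterias rubens", "legs": 5, "eyes": 0,
     "place": "Oostende", "country": "Belgium", "date": "12/3/2004"},
    {"species": "Asterias rubens", "legs": 5, "eyes": 0,
     "place": "Zeebrugge", "country": "Belgium", "date": "13/3/2005"},
    {"species": "Asterias rubens", "legs": 5, "eyes": 0,
     "place": "Zeebrugge", "country": "Belgium", "date": "14/3/2005"},
]

# Careless update: only one copy of the repeated fact is changed.
rows[0]["legs"] = 6


def conflicting_values(rows, key, attribute):
    """Return {key value: set of attribute values} wherever more than one value exists."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], set()).add(row[attribute])
    return {k: v for k, v in seen.items() if len(v) > 1}


conflicts = conflicting_values(rows, "species", "legs")
print(conflicts)  # the species now has two different leg counts
```

If the leg count were stored once per species, there would be only one value to update and no conflict could arise.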

Insertion anomalies
If two concepts are mixed in one table, we can’t store information on new items of one type unless we have, at the same time, information on the other.

Species | # legs | # eyes | Place | Country | Date
Asterias rubens | 5 | 0 | Oostende | Belgium | 12/3/2004
Asterias rubens | 5 | 0 | Zeebrugge | Belgium | 13/3/2005
Asterias arenata | 5 | 0 | | |

Deletion anomalies
If two concepts are mixed in one table, we lose information on one concept when the last instance of the other concept is deleted.

Species | # legs | # eyes | Place | Country | Date
Asterias rubens | 5 | 0 | Oostende | Belgium | 12/3/2004
Asterias rubens | 5 | 0 | Zeebrugge | Belgium | 13/3/2005
Asterias arenata | 5 | 0 | Zeebrugge | Belgium | 13/3/2005
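The deletion anomaly can be sketched in a few lines. The helper `species_facts` and the in-memory layout are illustrative assumptions, not part of the original slides.

```python
# Flat table from the slide: (species, legs, eyes, place, country, date).
flat = [
    ("Asterias rubens", 5, 0, "Oostende", "Belgium", "12/3/2004"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "13/3/2005"),
    ("Asterias arenata", 5, 0, "Zeebrugge", "Belgium", "13/3/2005"),
]


def species_facts(rows):
    """Collect what the flat table knows about each species: (legs, eyes)."""
    return {species: (legs, eyes) for species, legs, eyes, *_ in rows}


# Deleting the only distribution record of Asterias arenata also deletes
# everything the flat table knew about that species.
flat_after_delete = [row for row in flat if row[0] != "Asterias arenata"]
lost = set(species_facts(flat)) - set(species_facts(flat_after_delete))

# Split design: species facts live apart from the distribution records.
species = species_facts(flat)
records = [(s, place, date) for s, _, _, place, _, date in flat]
records_after_delete = [r for r in records if r[0] != "Asterias arenata"]

print(lost)                           # species forgotten by the flat table
print("Asterias arenata" in species)  # still known in the split design
```

The same split also addresses the insertion anomaly: a species can sit in `species` without having any distribution record yet.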

Making model more intuitive
A good model should reflect the reality it tries to mirror, including the relationships between the entities. Entities that are separate in real life (they can be abstract) should be modelled separately.

Species | # legs | # eyes | Place | Country | Date
Asterias rubens | 5 | 0 | Oostende | Belgium | 12/3/2004
Asterias rubens | 5 | 0 | Zeebrugge | Belgium | 13/3/2005

Column groups: shared (species), biological (# legs, # eyes), biogeographical (place, country, date)

… and robust
- Entries in a database should be ‘atomic’. An entry should:
  - Not be a combination of several smaller entities, such as ‘Oostende, Belgium’
  - Contain no qualifiers (such as Asterias cfr rubens; Asterias ?rubens…)
  - Not be dependent on the value of another field
  - Not contain repeated values (e.g. several authors for a multi-author publication)

Avoid bias
- Asterias rubens
  - Oostende, Belgium, 12/3
  - Zeebrugge, Belgium, 13/3
  - Wimereux, France, 13/3
- Asterias arenata
  - Den Osse, Netherlands, 17/3
- Cancer pagurus
  - Oostende, Belgium, 12/3
  - De Panne, Belgium, 12/3
  - Den Osse, Netherlands, 14/5
- Abra alba
  - Oostende, Belgium, 14/5

A ‘nested list’ is easier to query on the grouping factor of the list. It is easy to find in which countries Asterias rubens occurs; to find out which species occur in, say, France, we must read our complete database.
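A small sketch of this bias, with the nested list held as a Python dictionary (the variable names are assumptions for illustration). Looking up by species is a single key access; looking up by country has to walk every entry.

```python
# Nested list from the slide: grouped by species.
nested = {
    "Asterias rubens": [("Oostende", "Belgium", "12/3"),
                        ("Zeebrugge", "Belgium", "13/3"),
                        ("Wimereux", "France", "13/3")],
    "Asterias arenata": [("Den Osse", "Netherlands", "17/3")],
    "Cancer pagurus": [("Oostende", "Belgium", "12/3"),
                       ("De Panne", "Belgium", "12/3"),
                       ("Den Osse", "Netherlands", "14/5")],
    "Abra alba": [("Oostende", "Belgium", "14/5")],
}

# Query along the grouping factor: one lookup, then a short list.
countries_of_rubens = sorted({country for _, country, _ in nested["Asterias rubens"]})

# Query across the grouping factor: every species and every record is read.
species_in_france = sorted(
    species
    for species, records in nested.items()
    if any(country == "France" for _, country, _ in records)
)

print(countries_of_rubens)
print(species_in_france)
```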

The formal process
The key,
The whole key,
And nothing but the key…
So help me (E.F.) Codd

N1NF (non-1 Normal Form)
- Asterias rubens
  - Oostende, Belgium, 12/3
  - Zeebrugge, Belgium, 13/3
  - Wimereux, France, 13/3
- Asterias arenata
  - Den Osse, Netherlands, 17/3
- Cancer pagurus
  - Oostende, Belgium, 12/3
  - De Panne, Belgium, 12/3
  - Den Osse, Netherlands, 14/5
- Abra alba
  - Oostende, Belgium, 14/5

N1NF
- Structure of the ‘table’:
  drs (species, legs, eyes, place1, country1, date1, place2, country2, date2, place3, country3, date3)
- Entries are not atomic and are difficult to query
- What if we have a fourth distribution record?
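A sketch of how the repeating place/country/date slots could be unfolded into one row per distribution record. The function `to_long` and the use of missing keys for empty slots are assumptions; legs and eyes are left out to keep the example short.

```python
# One wide N1NF row per species; unused slots are simply absent.
wide_rows = [
    {"species": "Asterias rubens",
     "place1": "Oostende", "country1": "Belgium", "date1": "12/3",
     "place2": "Zeebrugge", "country2": "Belgium", "date2": "13/3",
     "place3": "Wimereux", "country3": "France", "date3": "13/3"},
    {"species": "Asterias arenata",
     "place1": "Den Osse", "country1": "Netherlands", "date1": "17/3"},
]


def to_long(wide_row, slots=3):
    """Turn one wide row into a list of (species, place, country, date) rows."""
    rows = []
    for i in range(1, slots + 1):
        place = wide_row.get(f"place{i}")
        if place is None:  # empty slot
            continue
        rows.append((wide_row["species"], place,
                     wide_row[f"country{i}"], wide_row[f"date{i}"]))
    return rows


long_rows = [row for wide in wide_rows for row in to_long(wide)]

# A fourth record for Asterias rubens is just one more row, no new columns.
long_rows.append(("Asterias rubens", "Zeebrugge", "Belgium", "14/3"))
print(len(long_rows))
```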

1NF

Species | # legs | # eyes | Place | Country | Date
Asterias rubens | 5 | 0 | Oostende | Belgium | 12/3/2004
Asterias rubens | 5 | 0 | Zeebrugge | Belgium | 13/3/2005
Asterias rubens | 5 | 0 | Zeebrugge | Belgium | 14/3/2005
Cancer pagurus | 10 | 2 | De Panne | Belgium | 12/3/2004
Cancer pagurus | 10 | 2 | Oostende | Belgium | 12/3/2004
Cancer pagurus | 10 | 2 | Zeebrugge | Belgium | 14/3/2004
Asterias rubens | 5 | 0 | Wimereux | France | 13/3/2005
Asterias rubens | 5 | 0 | Wimereux | France | 14/3/2005
Cancer pagurus | 10 | 2 | Wimereux | France | 12/3/2004

1NF: the key
- A distribution record (a line in our table) is unique when taking into account species, place and date
  drs (species, place, date, legs, eyes, country)
- Table names are usually plural, field (column) names singular.
- In this type of analysis keys are underlined; here the key fields are species, place and date.
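A quick check of this claim against the nine rows of the 1NF table. The helper `is_unique` is an assumption for illustration. Note that data can only refute a candidate key; that a key holds in general follows from what the data mean, not from one sample.

```python
from itertools import combinations

columns = ("species", "legs", "eyes", "place", "country", "date")
table = [
    ("Asterias rubens", 5, 0, "Oostende", "Belgium", "12/3/2004"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "13/3/2005"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "14/3/2005"),
    ("Cancer pagurus", 10, 2, "De Panne", "Belgium", "12/3/2004"),
    ("Cancer pagurus", 10, 2, "Oostende", "Belgium", "12/3/2004"),
    ("Cancer pagurus", 10, 2, "Zeebrugge", "Belgium", "14/3/2004"),
    ("Asterias rubens", 5, 0, "Wimereux", "France", "13/3/2005"),
    ("Asterias rubens", 5, 0, "Wimereux", "France", "14/3/2005"),
    ("Cancer pagurus", 10, 2, "Wimereux", "France", "12/3/2004"),
]


def is_unique(rows, cols):
    """True if no two rows share the same values in the given columns."""
    idx = [columns.index(c) for c in cols]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)


key = ("species", "place", "date")
key_is_unique = is_unique(table, key)

# No pair of key fields is enough on its own in this sample.
smaller_keys = [pair for pair in combinations(key, 2) if is_unique(table, pair)]

print(key_is_unique, smaller_keys)
```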

2NF: the whole key
- Move attributes that depend on only part of the compound key to separate entities, and look for a key for each new entity
  - Distribution records (species, place, date), key: species, place, date
  - Species (species, legs, eyes), key: species
  - Places (place, country), key: place
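The partial dependencies behind this split can be probed with a small functional-dependency check. The function `determines` is an illustrative assumption; as with keys, a data sample can disprove a dependency but cannot prove it.

```python
columns = ("species", "legs", "eyes", "place", "country", "date")
table = [dict(zip(columns, values)) for values in [
    ("Asterias rubens", 5, 0, "Oostende", "Belgium", "12/3/2004"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "13/3/2005"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "14/3/2005"),
    ("Cancer pagurus", 10, 2, "De Panne", "Belgium", "12/3/2004"),
    ("Cancer pagurus", 10, 2, "Oostende", "Belgium", "12/3/2004"),
    ("Cancer pagurus", 10, 2, "Zeebrugge", "Belgium", "14/3/2004"),
    ("Asterias rubens", 5, 0, "Wimereux", "France", "13/3/2005"),
    ("Asterias rubens", 5, 0, "Wimereux", "France", "14/3/2005"),
    ("Cancer pagurus", 10, 2, "Wimereux", "France", "12/3/2004"),
]]


def determines(rows, lhs, rhs):
    """True if equal values in the lhs columns always come with equal rhs values."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


# legs and eyes depend on species alone; country depends on place alone.
print(determines(table, ["species"], ["legs", "eyes"]))
print(determines(table, ["place"], ["country"]))

# Projecting out each dependency gives the three tables of the slide.
species = sorted({(r["species"], r["legs"], r["eyes"]) for r in table})
places = sorted({(r["place"], r["country"]) for r in table})
drs = [(r["species"], r["place"], r["date"]) for r in table]
print(len(species), len(places), len(drs))
```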

2NF: foreign keys
- The one original table was split in three
  - Distribution records (drs), species, places
- Tables drs and species share a field, species, that allows us to find related records
  - Field species is a foreign key in table drs
  - Same with drs and places
- Species and places can be populated from reference tables (CoL; Gazetteer)
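A sketch of the three tables with foreign keys, using Python's built-in sqlite3 module and an in-memory database. The column types and the choice of SQLite are assumptions; the slides do not prescribe a database system. Only a few rows of the example are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE species (species TEXT PRIMARY KEY, legs INTEGER, eyes INTEGER);
CREATE TABLE places  (place TEXT PRIMARY KEY, country TEXT);
CREATE TABLE drs (
    species TEXT NOT NULL REFERENCES species(species),
    place   TEXT NOT NULL REFERENCES places(place),
    date    TEXT NOT NULL,
    PRIMARY KEY (species, place, date)
);
""")
conn.executemany("INSERT INTO species VALUES (?, ?, ?)",
                 [("Asterias rubens", 5, 0), ("Cancer pagurus", 10, 2)])
conn.executemany("INSERT INTO places VALUES (?, ?)",
                 [("Oostende", "Belgium"), ("De Panne", "Belgium"), ("Wimereux", "France")])
conn.executemany("INSERT INTO drs VALUES (?, ?, ?)", [
    ("Asterias rubens", "Oostende", "12/3/2004"),
    ("Asterias rubens", "Wimereux", "13/3/2005"),
    ("Cancer pagurus", "De Panne", "12/3/2004"),
    ("Cancer pagurus", "Wimereux", "12/3/2004"),
])

# The shared field lets us find related records: which species occur in France?
species_in_france = [row[0] for row in conn.execute("""
    SELECT DISTINCT d.species
    FROM drs AS d JOIN places AS p ON p.place = d.place
    WHERE p.country = 'France'
    ORDER BY d.species
""")]

# A distribution record for a species missing from the species table is refused.
try:
    conn.execute("INSERT INTO drs VALUES ('Unknown species', 'Oostende', '1/1/2000')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(species_in_france, rejected)
```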

3NF: nothing but the key
- Move attributes that are functionally dependent on a non-key attribute to a separate entity
- Possible structure (in this case the same as 2NF)
  - Distribution records (species, place, date)
  - Places (place, country)
  - Species (species, legs, eyes)

Elaborating further: IDs
- The key of drs is compound, composed of three fields; it is better to replace it with a ‘synthetic’ key (id: ‘autonumber’ or ‘sequence’)
- Keys of ‘places’ and ‘species’ are names with real meaning; anything with meaning in real life can change, so these are also better replaced with an artificial key
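A sketch of the same model with artificial keys, again assuming SQLite through Python's sqlite3 module. The column names `species_id` and `place_id` and the new species name used in the update are hypothetical, chosen only to show that a name change touches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE species (id INTEGER PRIMARY KEY, species TEXT NOT NULL UNIQUE,
                      legs INTEGER, eyes INTEGER);
CREATE TABLE places  (id INTEGER PRIMARY KEY, place TEXT NOT NULL, country TEXT NOT NULL);
CREATE TABLE drs (
    id         INTEGER PRIMARY KEY,            -- synthetic key, assigned automatically
    species_id INTEGER NOT NULL REFERENCES species(id),
    place_id   INTEGER NOT NULL REFERENCES places(id),
    date       TEXT NOT NULL,
    UNIQUE (species_id, place_id, date)         -- the old compound key stays unique
);
""")

species_id = conn.execute(
    "INSERT INTO species (species, legs, eyes) VALUES (?, ?, ?)",
    ("Asterias rubens", 5, 0)).lastrowid
place_id = conn.execute(
    "INSERT INTO places (place, country) VALUES (?, ?)",
    ("Oostende", "Belgium")).lastrowid
conn.execute("INSERT INTO drs (species_id, place_id, date) VALUES (?, ?, ?)",
             (species_id, place_id, "12/3/2004"))

# Hypothetical name change: one row in species is updated, drs is untouched.
conn.execute("UPDATE species SET species = ? WHERE id = ?",
             ("Asterias rubens (revised name)", species_id))

record = conn.execute("""
    SELECT s.species, p.place, d.date
    FROM drs AS d
    JOIN species AS s ON s.id = d.species_id
    JOIN places  AS p ON p.id = d.place_id
""").fetchone()
print(record)
```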

Elaborating further: traits
- Our database now has information on number of legs and number of eyes. What if we want to start storing colour?
  - Requires a rewrite of the database
- Alternative: split out data on biological traits into a table with ‘property/value’ pairs
  - Species (id, species, author, parent_id…)
  - Traits (species_id, trait, value)
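A sketch of the property/value alternative, assuming SQLite through Python's sqlite3 module. The species table is cut down to id and species, values are stored as text, and the colour value is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE species (id INTEGER PRIMARY KEY, species TEXT NOT NULL);
CREATE TABLE traits (
    species_id INTEGER NOT NULL REFERENCES species(id),
    trait      TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (species_id, trait)
);
""")

sid = conn.execute("INSERT INTO species (species) VALUES (?)",
                   ("Asterias rubens",)).lastrowid
conn.executemany("INSERT INTO traits VALUES (?, ?, ?)",
                 [(sid, "legs", "5"), (sid, "eyes", "0")])

columns_before = [row[1] for row in conn.execute("PRAGMA table_info(traits)")]

# A new kind of trait is a new row, not a new column.
# The value 'orange' is a placeholder, not a recorded observation.
conn.execute("INSERT INTO traits VALUES (?, ?, ?)", (sid, "colour", "orange"))

columns_after = [row[1] for row in conn.execute("PRAGMA table_info(traits)")]
traits = dict(conn.execute(
    "SELECT trait, value FROM traits WHERE species_id = ?", (sid,)))
print(traits)
```

The price of this flexibility is that every value shares one column type, so numeric checks on legs or eyes have to be done by the application.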

Model

Remarks
- Sometimes it is better not to normalise completely
  - Surname & first name as 1 attribute instead of 2
  - Calculated fields to speed up queries
- Sometimes it is better to denormalise completely
  - Exchange formats such as Darwin Core
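Denormalising for exchange can be sketched as joining the normalised tables back into one flat file. The column names below are those of the example table, not actual Darwin Core terms, and the in-memory layout is an assumption.

```python
import csv
import io

# Normalised data: each fact is stored once.
species = {"Asterias rubens": {"legs": 5, "eyes": 0},
           "Cancer pagurus": {"legs": 10, "eyes": 2}}
places = {"Oostende": "Belgium", "Wimereux": "France"}
drs = [("Asterias rubens", "Oostende", "12/3/2004"),
       ("Cancer pagurus", "Wimereux", "12/3/2004")]

# Denormalise: look up the repeated facts for every distribution record.
flat = [
    {"species": sp, "legs": species[sp]["legs"], "eyes": species[sp]["eyes"],
     "place": place, "country": places[place], "date": date}
    for sp, place, date in drs
]

# Write the flat rows as CSV text, ready to hand to someone else.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["species", "legs", "eyes", "place", "country", "date"])
writer.writeheader()
writer.writerows(flat)
print(buffer.getvalue())
```

The normalised tables stay the master copy; the flat file is regenerated whenever it is needed, so the repetition in it cannot drift out of sync.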

Final remarks
- Normalisation is a means, not a goal
- Intelligent denormalising is as much an art as normalising!