Digital recordkeeping and preservation I

Presentation transcript:

Digital recordkeeping and preservation I: The relational model
ARK2100 Digital recordkeeping and preservation I, 2017
Thomas Sødring, thomas.sodring@hioa.no, P48-R407, 67238287

The relational model
So far we have looked at databases (DBMS) and discussed their properties.
Now we take a closer look at the relational model and its use.
Then we will do some practical work with a DBMS called MySQL.
Later, we look at the relationship between the Noark 5 standard and the standard's implementation in a relational database.

Learning
What do the following concepts mean?
The relational model: schema, relationship, tuple, attribute, primary key, foreign key
Normalization: anomalies, why and how
Referential integrity: what it is and what kind of anomalies may arise because of it

The relational model
A database is called a schema.
Data is stored in relations* (tables).
Access to data is (usually) with keys.
Two central key types are used in relations: the primary key and the foreign key.
*Formally, duplicate rows are not allowed in a relation, while they are allowed in a table.

DBMS
[Figure: a DBMS containing Schema A and Schema B, each holding several relations (r1 to r7)]
A database management system can contain many schemas. People often call a schema a database and use the terms interchangeably.

Data is stored in a relation (table)
Relation: Cars
RegistrationNr  ChassisNr  Colour  Manufacturer  Model
LH12984         10946534   Red     Volkswagen    Golf
DK23491         9648573    Blue    Toyota        Yaris
BP12349         5523840    Green   Skoda         Fabia
ZT97495         2643923    White   Seat          Leon

Relation, attributes, tuples
Relation: Cars (5 attributes, 4 tuples)
RegistrationNr  ChassisNr  Colour  Manufacturer  Model
LH12984         10946534   Red     Volkswagen    Golf
DK23491         9648573    Blue    Toyota        Yaris
BP12349         5523840    Green   Skoda         Fabia
ZT97495         2643923    White   Seat          Leon

Roughly speaking ...
A relation is a table*.
An attribute is a column.
A tuple is a row.
*Again, a table can have duplicate rows; a relation cannot.
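The difference between a table and a relation can be tried out directly. The sketch below is not from the slides: it assumes Python's built-in sqlite3 module (SQLite rather than the MySQL used in the course) and a hypothetical two-column table named cars_no_key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A plain table with no key: the DBMS accepts the same row twice.
conn.execute("CREATE TABLE cars_no_key (RegistrationNr TEXT, Colour TEXT)")
conn.execute("INSERT INTO cars_no_key VALUES ('LH12984', 'Red')")
conn.execute("INSERT INTO cars_no_key VALUES ('LH12984', 'Red')")
duplicate_rows = conn.execute("SELECT COUNT(*) FROM cars_no_key").fetchone()[0]

# A relation is a set of tuples, so the duplicate collapses into one tuple.
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM cars_no_key)"
).fetchone()[0]

print(duplicate_rows, distinct_rows)
```

The table holds two identical rows, while the set of distinct tuples (the relation view) holds one.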

Primary key
A primary key is a value that can be used to identify a unique row (record) in a relation.
The primary key identifies a unique object (row) within a set of objects (rows).
A social security number identifies a person.
An ISBN identifies a book.
A registration number identifies a car.
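A minimal sketch of what a primary key does in practice, assuming Python's built-in sqlite3 module (SQLite, not the MySQL used in the course). The first two rows come from the Cars relation in the slides; the third, rejected row is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Cars (
        RegistrationNr TEXT PRIMARY KEY,
        ChassisNr      INTEGER,
        Colour         TEXT,
        Manufacturer   TEXT,
        Model          TEXT
    )
""")
conn.execute("INSERT INTO Cars VALUES ('LH12984', 10946534, 'Red', 'Volkswagen', 'Golf')")
conn.execute("INSERT INTO Cars VALUES ('DK23491', 9648573, 'Blue', 'Toyota', 'Yaris')")

# The key value picks out exactly one row.
golf = conn.execute(
    "SELECT Manufacturer, Model FROM Cars WHERE RegistrationNr = ?", ("LH12984",)
).fetchone()

# A second row with the same key value is rejected (hypothetical row).
try:
    conn.execute("INSERT INTO Cars VALUES ('LH12984', 1, 'Black', 'Volvo', 'V70')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(golf, duplicate_rejected)
```

Declaring the key is what turns "a registration number identifies a car" from a convention into a rule the DBMS enforces.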

Foreign key
A foreign key is a field (attribute) in a table in a relational database that points to a field (attribute) in (usually) another table.
The field it points to is often that table's primary key, but it does not have to be.
This allows us to connect related information between tables.
The table using the foreign key is often called the child table, while the table in which the key's value is defined is called the parent.

Primary and foreign keys
Student (parent)
StudentNr  Firstname  Surname
12345      Jan        Karlson
23456      Pål        Solberg
34567      Mette      Johansen
45678      Ingrid     Aleksandersen
StudentTelephoneNr (child): columns StudentNr and TelephoneNr; the telephone numbers shown are 76543829, 90783298, 99456543 and 45990234.
StudentNr is the primary key in both relations.
StudentNr in the StudentTelephoneNr relation is a foreign key to StudentNr in the Student relation.

Primary and foreign keys
Customers (parent)
Firstname  Surname  CustomerNr
Pål        Solberg  1
Nils       Nilsen   2
Ari        Hansen   3
Account (parent)
AccountNr  Balance
06584585   2,000
15486110   8,000
06759425   -3,000
95486110
AccountOwner (child): columns CustomerNr and AccountNr, linking customers 1, 2 and 3 to accounts 15486110, 06584585, 95486110 and 06759425.
AccountNr is the primary key in the Account relation.
CustomerNr is the primary key in the Customers relation.
CustomerNr and AccountNr together form the primary key in the AccountOwner relation.
In AccountOwner, AccountNr is a foreign key to the Account relation and CustomerNr is a foreign key to the Customers relation.
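The three relations can be sketched as follows, assuming Python's built-in sqlite3 module (SQLite, not MySQL). Customer names and the three known balances are taken from the slide; the pairing of customers with accounts in AccountOwner is hypothetical, because the slide text does not make it recoverable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerNr INTEGER PRIMARY KEY, Firstname TEXT, Surname TEXT);
    CREATE TABLE Account   (AccountNr TEXT PRIMARY KEY, Balance INTEGER);
    CREATE TABLE AccountOwner (
        CustomerNr INTEGER REFERENCES Customers(CustomerNr),
        AccountNr  TEXT    REFERENCES Account(AccountNr),
        PRIMARY KEY (CustomerNr, AccountNr)   -- composite primary key
    );
    INSERT INTO Customers VALUES (1, 'Pål', 'Solberg'), (2, 'Nils', 'Nilsen'), (3, 'Ari', 'Hansen');
    INSERT INTO Account VALUES ('06584585', 2000), ('15486110', 8000), ('06759425', -3000);
    -- Hypothetical pairing of customers and accounts, for illustration only.
    INSERT INTO AccountOwner VALUES (1, '15486110'), (2, '06584585'), (3, '06759425');
""")

# Follow the foreign keys from the child (AccountOwner) to both parents.
rows = conn.execute("""
    SELECT c.Firstname, c.Surname, a.AccountNr, a.Balance
    FROM Customers c
    JOIN AccountOwner o ON o.CustomerNr = c.CustomerNr
    JOIN Account a      ON a.AccountNr  = o.AccountNr
    ORDER BY c.CustomerNr
""").fetchall()

for row in rows:
    print(row)
```

Account numbers are stored as text here so that leading zeros survive. The composite key lets one customer own several accounts and one account have several owners, while each (customer, account) pair appears only once.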

Let's recap
A database is called a schema.
Data is stored in relations* (tables).
Access to data is (usually) with keys.
Two central key types are used in relations: the primary key and the foreign key.
A tuple is a row of information.
An attribute is a column.
*Formally, duplicate rows are not allowed in a relation, while they are allowed in a table.

Another example
Telenor is a provider of mobile telephony, internet, television (Canal Digital), landline and IP telephony.
Try to explain a structure that minimizes duplicated data, showing primary and foreign keys.

Telenor Group
Customers
CustomerNr  Surname   Firstname
1           Hansen    Thomas
2           Lie       Mona
3           Rørvik    Eli
4           Andersen  Børre
Mobil
CustomerNr  Number
1           45764389
2           95794873
3           91265238
Landline
CustomerNr  Subscription
2           1234567
3           2345678
4           3456789
CanalDigital
CustomerNr  Subscription
1           1234567
3           2345678
4           3456789
MobilephoneConversations
MobilFrom  ToNumber  Time            Length
45764389   93473422  1.1.2017 13.45  45
95794873   32793455  1.1.2017 13.49  32
91265238   22109344  1.1.2017 13.52  500
CanalDigitalSubscription
Subscription  Type
1234567       Package 1
2345678       Basic Pkg
3456789       Sport Pkg

Relations
Telenor Group: Customers, Mobilephone, Landline, CanalDigital, MobilephoneConversations, CanalDigitalSubscription

Attributes
Telenor Group
Customers: CustomerNr, Surname, Firstname
Mobil: CustomerNr, Number
Landline: CustomerNr, Subscription
CanalDigital: CustomerNr, Subscription
MobilephoneConversations: MobilFrom, ToNumber, Time, Length
CanalDigitalSubscription: Subscription, Type

Tuples
Telenor Group: the same six tables as on the 'Telenor Group - Customers' slide, with their rows (tuples) shown: four in Customers and three in each of the other relations.

Primary keys
Telenor Group
Customers: CustomerNr
Mobil: CustomerNr
Landline: CustomerNr
CanalDigital: CustomerNr
MobilephoneConversations: MobilFrom, ToNumber, Time
CanalDigitalSubscription: Subscription

Foreign keys
Telenor Group
Customers: CustomerNr, Surname, Firstname
Mobil: CustomerNr, Number
Landline: CustomerNr, Subscription
CanalDigital: CustomerNr, Subscription
MobilephoneConversations: MobilFrom, ToNumber, Time, Length
CanalDigitalSubscription: Subscription, Type

Foreign keys (with data)
Telenor Group: the same six tables and rows as on the 'Telenor Group - Customers' slide.

Is it as easy as tables?
Can we just store data in tables, or are there other things we have to take into account?
Redundancy and anomalies: insertion, updating, deletion.
We are going to work on a fictional scenario.
You have a small rental company where you rent out 3-4 cars and record everything in Excel (a flat file).
We use this example to explore issues relating to data modelling.

Redundancy
Redundancy means that your data repeats itself, which makes your database unnecessarily large.
This can result in errors in your data.

Insertion anomaly
Every time a customer rents a car, all customer data and vehicle data are reinserted.
If car information is required, we cannot insert data about a customer unless they rent a car.
If customer information is required, we cannot insert data about a car unless it is rented.

Update anomaly
If the colour of a car is entered incorrectly and subsequently has to be updated, you must first find all relevant occurrences.
If not all instances are found and changed, our data will be inconsistent.
[Figure: original data and updated data]
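The update anomaly is easy to reproduce. The sketch below uses plain Python and a hypothetical flat file of rentals (the customer labels and rows are invented for illustration; only the registration numbers are borrowed from the Cars relation).

```python
# Hypothetical flat "spreadsheet": one row per rental, car details repeated.
rentals = [
    {"customer": "Customer A", "reg": "LH12984", "colour": "Red"},
    {"customer": "Customer B", "reg": "DK23491", "colour": "Blue"},
    {"customer": "Customer C", "reg": "LH12984", "colour": "Red"},
]

# Careless update: only the first matching row is corrected.
for row in rentals:
    if row["reg"] == "LH12984":
        row["colour"] = "Green"
        break

# The same car now has two different colours on record.
colours = {row["colour"] for row in rentals if row["reg"] == "LH12984"}
inconsistent = len(colours) > 1

print(colours, inconsistent)
```

If the colour were stored once, in a separate car table, there would be only one place to update and no way for the rows to disagree.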

Deletion anomaly
If a customer rents a new car and subsequently cancels the rental, all information about the car disappears.
[Figure: original data and data after deletion]

More about the scenario
It is unlikely that this is a problem for a single person keeping track of a few cars.
You could probably work with data in this format.
But suppose your company grows and you have 30 cars and hundreds of customers,
you have an agreement with another company that you will help each other,
and you have to employ people.
Then this data model will quickly result in problems.

Normalisation
A method used to verify that you have a good database model.
Why do we normalise?
To prevent data anomalies from occurring.
To minimise duplication of data.
During data updates, the system must be consistent and data integrity must be ensured.
When do we normalise?
Early in the database design process.

Normal forms
[Figure: 1NF, 2NF and 3NF leading to good design]

First normal form
Atomic values means that each field of a row can contain only one value.
A table is in first normal form if and only if all columns contain atomic values.

1NF
We have to analyse each field in the database and identify whether or not the values are atomic.
To convert the data to 1NF, we have to make the non-atomic fields atomic.
This may result in a duplication of rows or the introduction of new columns.

1NF
Sometimes, 1NF will require the creation of new columns. You see this with Name -> fname, sname.
Other times, the change to 1NF will require a duplication of data. You see this with License.

Second normal form (2NF)
A table is in second normal form (2NF) if and only if it is in 1NF and all columns that are not part of the primary key are dependent on the entire primary key, and not just part of it.

Second normal form (2NF)
A table is in second normal form (2NF) if and only if it is in 1NF and all columns that are not part of the primary key are dependent on the entire primary key, and not just part of it.
Violation of 2NF: in a table with columns A, B, C, D and E, where A and B together form the primary key, E is dependent only on B.

2NF
To get the table to 2NF, we first have to identify primary keys.
Remember, a primary key is a unique key that identifies a unique row in a relation.
A primary key can be made up of one or more columns.
We are trying to find out whether multiple primary keys are present in the table and which data is associated with each of them.
Next, we have to see which columns are dependent on the primary key(s).

Identify Primary Keys

Identify dependencies
[Figure: columns grouped by what they depend on: rental, customer, car, telephonenr]

Solution 2NF
The solution is to break the table up into four tables: Customer, Car, Rental and TelephoneNr.

Third normal form (3NF)
A table is in third normal form if and only if it is in second normal form and all columns that are not part of the primary key are mutually independent.

Third normal form (3NF)
A table is in third normal form if and only if it is in second normal form and all columns that are not part of the primary key are mutually independent.
Violation of 3NF: in a table with columns A, B, C, D and E, where A and B together form the primary key, there is a dependency between C and E.

Are all columns mutually independent?
[Figure: the Customer, Rental, Car and TelephoneNr tables]

Solution 3NF
[Figure: the Customer, Rental, Car, TelephoneNr and Postnr tables; one table is marked '?']

Something to think about
Normalisation is both an art and a science.
You can systematically go through the steps and arrive at a decent solution.
But intuition and experience will play a big part in solving the problem.
You will very rarely have to work like this.
In this scenario the modelling job was so bad that the system was useless.
But you have to understand what problem normalisation solves to understand its importance.

Another example
Gerd Berget's book has another example on anomalies and normalisation.
The scenario this time is a film rental company that records all information in a flat file / Excel spreadsheet.
Again, the point is to understand normalisation by looking at a scenario that is badly modelled.

Simplified film spreadsheet
Cust ID  Surname   Firstname  Address          Postnr  Town   Film ID  Title                 Year        Length    Company
1        Lie       Mona       Storgata 4       0182    Oslo   1,2      Citizen Kane, Psycho  1941, 1960  115, 104  Universal Pictures, Universal Pictures
2        Hansen    Thomas     Bakken 8b        1406    Ski    3        The Godfather         1972        175       Paramount
3        Rørvik    Eli        Saturnringen 47  1808    Askim  2        Psycho                1960        104       Universal Pictures
4        Andersen  Børre      Bekkefaret 5     0348    Oslo            Psycho                1998        109       Universal Pictures

Redundancy and anomalies
Redundancy: data in the database repeats itself, which makes the database unnecessarily large and can also introduce errors in the data.
Insertion anomaly: each time a new customer rents a film, all the data about the customer and the film has to be reinserted.
Update anomaly: if a film has the wrong date and needs updating, we first have to find all relevant instances. If not all instances are found, we will have inconsistent data.
Deletion anomaly: if the first customer that rents a film subsequently cancels, we lose the information about the film.

Redundancy example
[Table: the same film spreadsheet as on the 'Simplified film spreadsheet' slide]
Reinserting information about films means that the information is duplicated, resulting in a database that is larger than it has to be and increasing the potential for data errors.

Insertion anomaly
[Table: the same film spreadsheet as on the 'Simplified film spreadsheet' slide]
When customer 3 (Eli Rørvik) wanted to rent a film ('Psycho'), we had to reinsert all data about the film: title, year, length and company.

Update anomaly
[Table: the same film spreadsheet as on the 'Simplified film spreadsheet' slide]
If we find out that the wrong data about the movie 'Psycho' was registered, e.g. that the length was 108 min and not 104 min, then we need to find all the rows that contain 'Psycho' and update them. (Simply searching for the title will not work.)

Deletion anomaly
[Table: the same film spreadsheet as on the 'Simplified film spreadsheet' slide]
If Customer 1 (Mona Lie) and Customer 3 (Eli Rørvik) cancel their film rentals, we will lose all data about the movie 'Psycho' from 1960.
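The deletion anomaly can be reproduced in a few lines. This is only a sketch in plain Python, with the film spreadsheet reduced to customer IDs and (title, year) pairs; the function name known_films is made up for the example.

```python
# Simplified rows from the film spreadsheet: (Cust ID, [(Title, Year), ...]).
sheet = [
    (1, [("Citizen Kane", 1941), ("Psycho", 1960)]),
    (2, [("The Godfather", 1972)]),
    (3, [("Psycho", 1960)]),
    (4, [("Psycho", 1998)]),
]

def known_films(rows):
    """Every film the spreadsheet still holds any information about."""
    return {film for _, films in rows for film in films}

before = known_films(sheet)

# Customers 1 and 3 cancel, so their rows are deleted from the spreadsheet.
after = known_films([row for row in sheet if row[0] not in (1, 3)])

# Films that existed only inside the deleted rows are gone.
lost = before - after
print(sorted(lost))
```

Besides 'Psycho' (1960), 'Citizen Kane' disappears too, since customer 1 was the only one who had rented it. The 1998 'Psycho' survives because it lives in customer 4's row.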

Normal forms
[Figure: 1NF, 2NF and 3NF leading to good design]

First normal form
Atomic values means that each field of a row can contain only one value.
A table is in first normal form if and only if all columns contain atomic values.

1NF
A table is in first normal form if and only if all columns contain atomic values.
[Table: the same film spreadsheet as on the 'Simplified film spreadsheet' slide; in customer 1's row, the Film ID, Title, Year, Length and Company cells each hold two values]

Solution 1NF
Cust ID  Firstname  Surname   Address          Postnr  Town   Film ID  Title          Year  Length  Company
1        Mona       Lie       Storgata 4       0182    Oslo   1        Citizen Kane   1941  115     Universal Pictures
1        Mona       Lie       Storgata 4       0182    Oslo   2        Psycho         1960  104     Universal Pictures
2        Thomas     Hansen    Bakken 8b        1406    Ski    3        The Godfather  1972  175     Paramount
3        Eli        Rørvik    Saturnringen 47  1808    Askim  2        Psycho         1960  104     Universal Pictures
4        Børre      Andersen  Bekkefaret 5     0348    Oslo   4        Psycho         1998  109     Universal Pictures
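The step from the spreadsheet to 1NF, splitting cells that hold several values into separate rows, can be sketched as below. This is plain Python and not from the slides; the function name to_1nf is hypothetical, the address columns are left out for brevity, and it assumes every multi-valued cell in a row holds the same number of comma-separated items.

```python
def to_1nf(row, multi_valued):
    """Split one spreadsheet row into several rows with atomic values.

    `multi_valued` names the columns whose cells hold comma-separated lists;
    all such cells in a row are assumed to have the same number of items.
    """
    parts = {col: [v.strip() for v in row[col].split(",")] for col in multi_valued}
    count = len(next(iter(parts.values())))
    # Copy the single-valued columns and take the i-th item of each list.
    return [
        {**row, **{col: parts[col][i] for col in multi_valued}}
        for i in range(count)
    ]

mona = {
    "Cust ID": "1", "Firstname": "Mona", "Surname": "Lie",
    "Film ID": "1,2", "Title": "Citizen Kane, Psycho",
    "Year": "1941, 1960", "Length": "115, 104",
    "Company": "Universal Pictures, Universal Pictures",
}
rows = to_1nf(mona, ["Film ID", "Title", "Year", "Length", "Company"])

for r in rows:
    print(r["Cust ID"], r["Firstname"], r["Film ID"], r["Title"], r["Year"])
```

Mona Lie's single row becomes two rows, one per film, and her customer data is duplicated in both. That duplication is the price of 1NF and is what the later normal forms remove.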

2NF
To get the table to 2NF, we first have to identify primary keys.
Remember, a primary key is a unique key that identifies a unique row in a relation.
A primary key can be made up of one or more columns.
We are trying to find out whether multiple primary keys are present in the table and which data is associated with each of them.
Next, we have to see which columns are dependent on the primary key(s).

Identify primary keys
[Table: the 1NF table from the 'Solution 1NF' slide]
FilmID and CustID stand out as the best primary keys. Why?

Second normal form (2NF)
A table is in second normal form (2NF) if and only if it is in 1NF and all columns that are not part of the primary key are dependent on the entire primary key, and not just part of it.
Violation of 2NF: in a table with columns A, B, C, D and E, where A and B together form the primary key, E is dependent only on B.

First ..... we have to identify dependencies between columns .....
[Table: the 1NF table from the 'Solution 1NF' slide]
Separated out to own relations: telephonenr

Solution 2NF
The solution is to separate the table into three different relations: Customer, Film and Booking.
Customer
Cust ID  Firstname  Surname   Address          Postnr  Town
1        Mona       Lie       Storgata 4       0182    Oslo
2        Thomas     Hansen    Bakken 8b        1406    Ski
3        Eli        Rørvik    Saturnringen 47  1808    Askim
4        Børre      Andersen  Bekkefaret 5     0348    Oslo
Film
Film ID  Title          Year  Length  Company
1        Citizen Kane   1941  115     Universal Pictures
2        Psycho         1960  104     Universal Pictures
3        The Godfather  1972  175     Paramount
4        Psycho         1998  109     Universal Pictures
Booking
CustID  Film ID
1       1
1       2
2       3
3       2
4       4
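The decomposition can be sketched as three projections of the 1NF rows. This is plain Python, not the author's method; the address, length and company columns are left out to keep the sketch short.

```python
# The 1NF rows: (CustID, Firstname, Surname, FilmID, Title, Year).
rows_1nf = [
    (1, "Mona", "Lie", 1, "Citizen Kane", 1941),
    (1, "Mona", "Lie", 2, "Psycho", 1960),
    (2, "Thomas", "Hansen", 3, "The Godfather", 1972),
    (3, "Eli", "Rørvik", 2, "Psycho", 1960),
    (4, "Børre", "Andersen", 4, "Psycho", 1998),
]

# Project onto the columns that depend on each part of the key.
# Building a set removes the repeated rows.
customer = sorted({(c, fn, sn) for c, fn, sn, _, _, _ in rows_1nf})
film = sorted({(f, t, y) for _, _, _, f, t, y in rows_1nf})
booking = sorted({(c, f) for c, _, _, f, _, _ in rows_1nf})

print(customer)
print(film)
print(booking)
```

Mona Lie and 'Psycho' (1960) each appear twice in the 1NF rows but only once in Customer and Film. Booking keeps all five rows, because each (CustID, FilmID) pair is a separate rental.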

Third normal form (3NF)
A table is in third normal form if and only if it is in second normal form and all columns that are not part of the primary key are mutually independent.
Violation of 3NF: in a table with columns A, B, C, D and E, where A and B together form the primary key, there is a dependency between C and E.

Is Film in 3NF?
Film ID  Title          Year  Length  Company
1        Citizen Kane   1941  115     Universal Pictures
2        Psycho         1960  104     Universal Pictures
3        The Godfather  1972  175     Paramount
4        Psycho         1998  109     Paramount
Are there any dependencies between two columns where one of them is not part of the primary key?

Is Customer in 3NF?
Customer
Cust ID  Firstname  Surname   Address          Postnr  Town
1        Mona       Lie       Storgata 4       0182    Oslo
2        Thomas     Hansen    Bakken 8b        1406    Ski
3        Eli        Rørvik    Saturnringen 47  1808    Askim
4        Børre      Andersen  Bekkefaret 5     0348    Oslo
Are there any dependencies between two columns where one of them is not part of the primary key?

Solution 3NF
The solution is to separate postnumber/town into their own relation.
Customer
Cust ID  Firstname  Surname   Address          Postnr
1        Mona       Lie       Storgata 4       0182
2        Thomas     Hansen    Bakken 8b        1406
3        Eli        Rørvik    Saturnringen 47  1808
4        Børre      Andersen  Bekkefaret 5     0348
Zip
Postnr  Town
0182    Oslo
1406    Ski
1808    Askim
0348    Oslo
Film and Booking are the same as in the 2NF solution.
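Whether one column determines another can be checked against sample data. The sketch below is plain Python with a hypothetical helper called determines; note that sample rows can only refute a dependency, they cannot prove that it holds for all future data.

```python
def determines(rows, lhs, rhs):
    """True if every value of column `lhs` maps to a single value of `rhs`."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

customers = [
    {"CustID": 1, "Surname": "Lie", "Postnr": "0182", "Town": "Oslo"},
    {"CustID": 2, "Surname": "Hansen", "Postnr": "1406", "Town": "Ski"},
    {"CustID": 3, "Surname": "Rørvik", "Postnr": "1808", "Town": "Askim"},
    {"CustID": 4, "Surname": "Andersen", "Postnr": "0348", "Town": "Oslo"},
]

postnr_gives_town = determines(customers, "Postnr", "Town")
town_gives_postnr = determines(customers, "Town", "Postnr")

# Postnr -> Town is a dependency between two non-key columns,
# so the pair moves to its own relation.
zip_relation = sorted({(c["Postnr"], c["Town"]) for c in customers})

print(postnr_gives_town, town_gives_postnr, zip_relation)
```

In this data each Postnr maps to one Town, but Oslo has two postal codes, so the dependency runs in one direction only. That is why Postnr stays in Customer as the link to the new Zip relation.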

Learning
What do the following concepts mean?
The relational model: schema, relationship, tuple, attribute, primary key, foreign key
Normalization*: anomalies, why and how
Referential integrity: what it is and what kind of anomalies may arise because of it
*http://en.wikibooks.org/wiki/Relational_Database_Design/Normalization

Referential integrity
So far we have only looked at intra-relation issues; there can also be inter-relation issues that we have to concern ourselves with.
Referential integrity is an important inter-relation concept.
[Figure: a parent table with two child tables, Child (1) and Child (2)]

Referential integrity - insertion
What happens if I try to insert a rental with no corresponding customer?
The car would be blocked for the rental period and I could lose money.
[Figure: the Customer and Rental tables]

Referential integrity - deletion
What happens if I try to delete a customer?
Try to delete customer number 2.
The rental table will have a missing foreign key reference.
What if the renter damaged the car and I later need to find out who rented it?
[Figure: the Customer and Rental tables]

Referential integrity - update
What happens if I try to change a customer's primary key?
E.g. I change the primary key from 2 to 9.
We suddenly lose the entire rental history of that customer.
[Figure: the Customer and Rental tables]

Handling referential integrity
Referential integrity is a database mechanism that can be switched on or off.
The consequence of this is really important.
Referential integrity goes from a parent to a child table.
With referential integrity on the Customer (parent) to Rental (child) relationship:
You cannot delete a customer without deleting all of that customer's rentals.
You cannot add a rental if the customer does not exist.

Handling referential integrity
When referential integrity is enabled, certain rules apply.
You cannot add a row in a child table if there is no corresponding row in the parent table (FK). Example: you cannot add a rental without a corresponding car or customer.
You cannot delete a row from a parent table if a corresponding row exists in a child table. Example: you cannot delete a customer that has rentals.
You cannot change values in the primary key in a parent table if there is a related row in a child table. Example: you cannot change the customer number in the customer relation if the customer has rented a car.
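The three rules can be tried out with a small sketch. It assumes Python's built-in sqlite3 module (SQLite, not MySQL), where foreign key enforcement has to be switched on per connection with PRAGMA foreign_keys. The table layout, the sample rows and the helper name rejected are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE Customer (CustomerNr INTEGER PRIMARY KEY, Surname TEXT);
    CREATE TABLE Rental (
        RentalNr   INTEGER PRIMARY KEY,
        CustomerNr INTEGER NOT NULL REFERENCES Customer(CustomerNr)
    );
    INSERT INTO Customer VALUES (2, 'Lie');
    INSERT INTO Rental VALUES (1, 2);
""")

def rejected(sql):
    """Run one statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Rule 1: no child row without a parent row (there is no customer 99).
insert_orphan = rejected("INSERT INTO Rental VALUES (2, 99)")
# Rule 2: no deleting a parent row that still has children.
delete_parent = rejected("DELETE FROM Customer WHERE CustomerNr = 2")
# Rule 3: no changing a parent key that children refer to.
update_key = rejected("UPDATE Customer SET CustomerNr = 9 WHERE CustomerNr = 2")

print(insert_orphan, delete_parent, update_key)
```

All three statements are refused, so the rental can always be traced back to its customer.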

Handling referential integrity
The referential integrity mechanism is often configurable in a database:
Set to null on delete.
Delete all children automatically.
Automatically update the foreign key values in the child relation.
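In SQL these options are usually written as ON DELETE and ON UPDATE clauses on the foreign key. The sketch below assumes Python's built-in sqlite3 module (SQLite, not MySQL) and a made-up Customer/Rental pair; it combines "set to null on delete" with "automatically update the foreign key".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE Customer (CustomerNr INTEGER PRIMARY KEY);
    CREATE TABLE Rental (
        RentalNr   INTEGER PRIMARY KEY,
        CustomerNr INTEGER REFERENCES Customer(CustomerNr)
                   ON DELETE SET NULL
                   ON UPDATE CASCADE
    );
    INSERT INTO Customer VALUES (2), (3);
    INSERT INTO Rental VALUES (10, 2), (11, 3);
""")

# Changing the parent key is now allowed and is copied to the child rows.
conn.execute("UPDATE Customer SET CustomerNr = 9 WHERE CustomerNr = 2")
# Deleting the parent is allowed too; the child keeps its row but loses the link.
conn.execute("DELETE FROM Customer WHERE CustomerNr = 3")

rentals = conn.execute(
    "SELECT RentalNr, CustomerNr FROM Rental ORDER BY RentalNr"
).fetchall()
print(rentals)
```

Rental 10 follows its customer to the new key 9, and rental 11 is kept with an empty customer reference. Writing ON DELETE CASCADE instead would delete rental 11 together with its customer, which corresponds to "delete all children automatically".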

Another example
A combination of an ER-diagram and table information.
It shows relation names, attribute names, primary keys, foreign keys and relationships.
http://www.jpmensah.com/ITEC485/images/er_diagram.gif

Tables and ER-diagrams
The previous slide is a good example of what we are going to learn in this course: the relationship between tables in the relational model as defined in an ER-diagram.
An ER-diagram ultimately defines the structure of the schema.
But we first have to understand the basic concepts of schema, relations, attributes, tuples, primary keys, foreign keys and relationships.

Finally
We have explored a lot of the problems associated with relations in a database.
We should now have a good grasp of the terminology and the subject area.
Next, we will look at how we model a database using ER-modelling and generate ER-diagrams, before we start developing a model and implementing it in a database.