Component 2 5A-F.


Assessment Outcomes
5A(A2) - Explain what is meant by data consistency, data redundancy and data independence.
5B - Describe and discuss the benefits and drawbacks of relational database systems and other contemporary database systems.
5C - Explain what is meant by relational database organisation and data normalisation (first, second and third normal forms).
5D - Restructure data into third normal form.
5E - Explain and apply entity relationship modelling and use it to analyse simple problems.
5F - Describe the use of primary keys, foreign keys, and indexes.

What is a database?
A database "is a (large) collection of data items and links between them, structured in a way that allows it to be accessed by a number of different applications programs". Databases were originally all paper-based. Each piece of information about a person, a club member, a company or a product, for example, was known as an 'attribute'. All of the attributes together (all of the details about a person, a company etc.) were known as a 'record' and were typically written on one card. The record cards were then collected together to form a 'file', and the file was put into a filing cabinet. Users would then go to the cabinet to find information in the records. A 'database' was simply one or more files. For example, you might have a file for animals' details, another for vets' details, another for all of the zoos in the country and so on. All the files together (although there might be just one) were known as a database.

Database Definitions Entities Attributes
An ‘entity’ is the term used to describe something we keep information about. Attributes are the pieces of information we keep about an entity. For example, we might have a pupil database. The entity here is 'Pupil' because that is what we keep information about. The attributes are the pieces of information about a pupil, such as their first name, surname, date of birth, form group and so on. In a library, you might keep information about books. 'Book' would be the entity, and the name of the book, the author, the ISBN and so on would be the attributes.

Database Definitions Tables
Once we have identified an entity, and we know what attributes we want to keep about each entity, we can then store the actual pieces of information about each entity. We can think of a database that holds entities and attributes as a table. Consider this example. It holds records about dogs.

Database Definitions Files Databases
Another name for a table of records is a 'file'. A database can be defined as one or more files. You can have a simple database, with all the information held in just one table. You could also have a bigger database, where the entities have been logically split up into different tables. For example, a school database might have one table for students, another for staff, another for all the information about different qualifications, a table for records about each room and its facilities, and so on.

Database Definitions Primary Keys Candidate Keys
Each record in a table must have one field (or combination of fields) whose value is unique, so that each record can be told apart from the others even if all the other fields are the same. This is the primary key. You all have a unique student number and a unique NI number; you may also have a tax number, a driving licence number, a club membership number and so on. These numbers are always different for each record, even if, for example, someone else has the same name as you. When you look at any table, most of the time the primary key will stand out. Often, it will be called something like Order Number, Student ID or Member ID, and these clearly will be unique. However, it may be that more than one of the attributes (i.e. more than one of the columns) in a table is unique, or perhaps more than one combination of attributes is unique (compound keys). Together these are known as the 'candidate keys' for that table, as any of them could ultimately be chosen as the primary key.
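As a minimal sketch of these ideas, the Python snippet below uses the standard sqlite3 module with a hypothetical Student table. StudentID is chosen as the primary key and NINumber, another candidate key, is declared UNIQUE. The table, the column names and the sample values are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,   -- the chosen primary key
        NINumber  TEXT NOT NULL UNIQUE,  -- another candidate key
        FirstName TEXT NOT NULL,
        Surname   TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO Student VALUES (1, 'AB123456C', 'Sam', 'Jones')")
# Two students may share a name: their key values still differ.
conn.execute("INSERT INTO Student VALUES (2, 'CD654321E', 'Sam', 'Jones')")


def try_insert(row):
    """Return True if the row was stored, False if a key value was duplicated."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_id_ok = try_insert((1, "EF111111G", "Alex", "Smith"))  # reuses StudentID 1
duplicate_ni_ok = try_insert((3, "AB123456C", "Alex", "Smith"))  # reuses an NI number
student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
```

Both attempted inserts are refused, so only the first two records remain in the table.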

Database Definitions Foreign Keys
Relational databases have more than one table. Records from each of the tables are combined to form the complete record of someone or something. Foreign keys are used to link records in different tables, so the database software knows which record in one table belongs to which record in another table; in other words, foreign keys link entities. A foreign key in one table is a primary key in another table. Although a primary key cannot have duplicate values in its own table, a foreign key most definitely can: the same attribute must be unique in one table but may be repeated in another. When you have a one-to-many relationship between two entities, you will need to link them using a foreign key. To do this, always copy the primary key from the entity on the 'one' side of the relationship and put it in the table on the 'many' side. In the table on the 'one' side, it is known as a ‘primary key’. In the table on the 'many' side, it is known as a ‘foreign key’.

Database Definitions Referential Integrity
Referential integrity is the term used to describe when all the links between tables using foreign keys are present and valid. If a record in one table refers to a record in another table, and that record is actually missing for some reason, then we talk about a lack of referential integrity. Consider these two tables. The first table is a list of Dog Owners and the second table is a list of Dogs. The relationship between the two tables is one-to-many: each owner can own many dogs, and each dog is owned by just one owner.
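The owners-and-dogs relationship can be sketched with Python's standard sqlite3 module as follows. The table names, column names and sample rows are hypothetical. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on, so other database systems may behave differently by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE Owner (OwnerID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE Dog (
        DogID   INTEGER PRIMARY KEY,
        Name    TEXT NOT NULL,
        OwnerID INTEGER NOT NULL REFERENCES Owner(OwnerID)  -- foreign key
    )
    """
)
conn.execute("INSERT INTO Owner VALUES (1, 'Ann')")
# One owner, many dogs: the foreign key value 1 is repeated.
conn.execute("INSERT INTO Dog VALUES (10, 'Rex', 1)")
conn.execute("INSERT INTO Dog VALUES (11, 'Fido', 1)")


def violates_integrity(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A dog whose owner does not exist would break referential integrity.
orphan_dog_rejected = violates_integrity("INSERT INTO Dog VALUES (12, 'Spot', 99)")
# So would deleting an owner who still has dogs linked to them.
owner_delete_rejected = violates_integrity("DELETE FROM Owner WHERE OwnerID = 1")
dogs_of_owner_1 = conn.execute("SELECT COUNT(*) FROM Dog WHERE OwnerID = 1").fetchone()[0]
```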

5A - Explain what is meant by data consistency, data redundancy and data independence.
Data consistency is ensuring that data is still correct after it has been processed. For example, if you had to calculate someone's age from their date of birth, and the age was calculated incorrectly, then you would say that the data has become inconsistent. This may have happened because the data was entered incorrectly, was calculated incorrectly, or for another reason. Similarly, if you had to convert a measurement from one unit into another and the result was incorrect, then the data has become inconsistent, for the same kinds of reasons.
Data redundancy in a database means that the same data is present in more than one table or, in the case of a flat file database, that there are records with partly duplicated data. For example:
Jones, 48, Male, Teacher
Jones, 48, 3 Advent Drive
Jones, employee number 22345
As you can see in the records above, the name is stored three times and the age twice. Redundancy like this is usually a mark of an inefficient database and people go to great lengths to avoid it. A relational database can avoid this duplication. In order to reduce duplicated data, you can use the three 'normal forms' of database design, i.e. First Normal Form, Second Normal Form and the most efficient (but most complex) Third Normal Form.

5A - Explain what is meant by data consistency, data redundancy and data independence.
When a database is set up, its structure is created and the data that is kept in it is defined. Data is then entered into the database. You can add to the data, edit it and delete it. Then you have the applications that actually access the database and make use of the data. An organisation can create an entire database, and then allow others to access the data in it for their own applications. The database of data and the applications that use the data are in fact completely separate. If you change the data in the database, or the way it is stored and organised, the programs that access it should not need to be rewritten. This is the whole idea of a Database Management System (DBMS), and we talk about data being independent of the applications that use it, or data independence.
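A minimal sketch of this separation, using Python's standard sqlite3 module, is given below. The Pupil table, its columns and the surnames_in_form function are hypothetical. The function stands in for an application: it asks for data by name, and it keeps working unchanged after the database structure is extended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Pupil (PupilID INTEGER PRIMARY KEY, Surname TEXT, FormGroup TEXT)"
)
conn.execute("INSERT INTO Pupil VALUES (1, 'Jones', '10A')")


def surnames_in_form(form_group):
    """Application code: requests data by name, not by how it is stored."""
    rows = conn.execute(
        "SELECT Surname FROM Pupil WHERE FormGroup = ? ORDER BY Surname",
        (form_group,),
    )
    return [surname for (surname,) in rows]


before = surnames_in_form("10A")

# The database owner later extends the structure; the application is untouched.
conn.execute("ALTER TABLE Pupil ADD COLUMN DateOfBirth TEXT")
conn.execute("INSERT INTO Pupil VALUES (2, 'Smith', '10A', '2008-05-01')")

after = surnames_in_form("10A")
```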

5B - Describe and discuss the benefits and drawbacks of relational database systems and other contemporary database systems.
Flat File Databases
When there is only a single table in the database, this is called a 'flat file database'. A flat file database looks something like this:
ID | Title | First name | Surname | Address | City | Postcode | Telephone
1 | Mr | Tom | Smith | 42 Mill Street | London | WE13GW |
2 | Mrs | Sandra | Jones | 10 Low Lane | Hull | HU237HJ |
 | | John | | | | |
A flat file database is an excellent way of storing a relatively small number of records (a few thousand, perhaps). For example, a spreadsheet application such as Excel can be used as a flat file database. Each row in a worksheet can be a record and each column a field. The worksheet is effectively a table.

Flat File advantages
Everyday things like business contacts, customer lists and so on can be stored and used in a flat file database. Placing data in a flat file database has the following advantages:
All records are stored in one place.
Easy to set up using a number of standard office applications.
Easy to understand.
Simple sorting of records can be carried out.
Records can be viewed or extracted on the basis of simple criteria.

Flat file Disadvantages
Potential duplication. As more and more records are added to the database it becomes difficult to avoid duplicate records. This is because there is no mechanism built in to the system to prevent duplication. Later you will see how 'primary keys' are used to prevent this.
Non-unique records. Notice that Mr & Mrs Jones have identical IDs. This is because the person producing this database decided they may want to sort on identical telephone numbers and so has applied an identical ID to the two records. This is fine for that purpose, but suppose you only wanted to extract Mrs Jones' record. Now it is much more difficult.
Harder to update. Suppose that this flat file database also stored their work place details. This will result in multiple records for each person. Again, this is fine, but suppose Sandra Jones now wanted to be known as 'Sandra Thompson' after re-marrying? The change will have to be made over potentially many records, and so flat file updates are more error-prone than other methods.
Inherently inefficient. Consider a situation where the database now needs to hold an extra field to hold their address. If there are tens of thousands of records, there may be many people having no address, but each record in a flat file database has to have the same fields, whether they are used or not. Other methods avoid this wasted storage.
Harder to change data format. Suppose the telephone numbers now have to have a dash between the area code and the rest of the number. Adding that extra dash over tens of thousands of records would be a significant task in a flat file database.
Poor at complex queries. If we wanted to find all records with a specific telephone number, this is a simple single-field criterion that a flat file can easily deal with. But now suppose we wanted all people living in Hull who share the same surname and a similar postcode? The criteria can quickly become too complex for a flat file to manage.
Poor at limiting access. Suppose this flat file database held a confidential field in each record that only certain staff are allowed to see, perhaps salaries. This is difficult to achieve in a flat file database: once a person has entered a valid password to gain access, that person is able to see everything.
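The 'harder to update' problem can be illustrated with a short Python sketch. The records and workplace values below are hypothetical, and a list of dictionaries stands in for the flat file. Because the surname is repeated in every record for the same person, a change of name has to touch each copy.

```python
# Flat file: every record repeats the person's details (hypothetical data).
flat_records = [
    {"ID": 2, "FirstName": "Sandra", "Surname": "Jones", "Workplace": "Hull office"},
    {"ID": 2, "FirstName": "Sandra", "Surname": "Jones", "Workplace": "Leeds office"},
    {"ID": 1, "FirstName": "Tom", "Surname": "Smith", "Workplace": "London office"},
]


def rename(records, first_name, old_surname, new_surname):
    """Update every matching record and return how many had to change."""
    changed = 0
    for record in records:
        if record["FirstName"] == first_name and record["Surname"] == old_surname:
            record["Surname"] = new_surname
            changed += 1
    return changed


records_changed = rename(flat_records, "Sandra", "Jones", "Thompson")
surnames = [record["Surname"] for record in flat_records]
```

If any one of the copies were missed, the file would hold two different surnames for the same person.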

Relational Databases
To overcome the limitations of a simple flat file database that has only a single table, another type of database has been developed called a 'relational database'. A relational database holds its data over a number of tables instead of one. Records within the tables are linked (related) to records held in other tables.

Relational Databases
The picture on the right shows two tables. The main one is called 'customers'. This contains almost the same fields as we have seen in the flat file database. But there is one key difference: the city is now held in a separate table called 'city'. The line between them shows there is a link (relationship) between a record in the city table and records in the main table.

Relational Databases Advantages
1. Data is only stored once. In the previous example, the city data was gathered into one table so now there is only one record per city. The advantages of this are:
No multiple record changes needed.
More efficient storage.
Simple to delete or modify details. All records in other tables having a link to that entry will show the change.
2. Complex queries can be carried out. A language called SQL has been developed to allow programmers to 'Select', 'Insert', 'Update' and 'Delete' records, and to 'Create' and 'Drop' tables. Select, Update and Delete actions can be further refined by a 'Where' clause. For example:
SELECT * FROM Customer WHERE ID = 2
This SQL statement will extract record number 2 from the Customer table. Far more complicated queries can be written that can extract data from many tables at once.
3. Better security. By splitting data into tables, certain tables can be made confidential. When a person logs on with their username and password, the system can then limit access only to those tables whose records they are authorised to view. For example, a receptionist would be able to view employee location and contact details but not their salary. A salesman may see his team's sales performance but not that of competing teams.
4. Cater for future requirements. By having data held in separate tables, it is simple to add records that are not yet needed but may be in the future. For example, the city table could be expanded to include every city and town in the country, even though no other records are using them all as yet. A flat file database cannot do this.
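The first two advantages can be sketched in Python with the standard sqlite3 module. The Customer and City tables below loosely follow the example in the text, but the column names, the sample rows and the new city name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE City (CityID INTEGER PRIMARY KEY, CityName TEXT NOT NULL);
    CREATE TABLE Customer (
        ID      INTEGER PRIMARY KEY,
        Surname TEXT NOT NULL,
        CityID  INTEGER REFERENCES City(CityID)
    );
    INSERT INTO City VALUES (1, 'London'), (2, 'Hull');
    INSERT INTO Customer VALUES (1, 'Smith', 1), (2, 'Jones', 2);
    """
)

# The statement from the text: extract record number 2.
record_2 = conn.execute("SELECT * FROM Customer WHERE ID = 2").fetchone()

join_sql = """
    SELECT Customer.Surname, City.CityName
    FROM Customer JOIN City ON Customer.CityID = City.CityID
    WHERE Customer.ID = ?
"""
# A query that draws on both tables at once.
before_update = conn.execute(join_sql, (2,)).fetchone()

# The city name is stored once, so a single UPDATE changes it for every customer.
conn.execute("UPDATE City SET CityName = 'Kingston upon Hull' WHERE CityID = 2")
after_update = conn.execute(join_sql, (2,)).fetchone()
```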

Relational Database Advantages over Flat file
Avoids data duplication.
Avoids inconsistent records.
Easier to change data.
Easier to change data format.
Data can be added and removed easily.
Easier to maintain security.

5C - Explain what is meant by relational database organisation and data normalisation (first, second and third normal forms).
Desired characteristics of a database include it being efficient in terms of storage and easy to maintain. The first point, storage, means that redundant data should be avoided. The second point, maintenance, means that a good design will logically separate data into tables. Normalisation is a design method that can be used to achieve this:
“a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems”
Normalisation provides rules that help:
organise the data efficiently.
eliminate redundant data.
ensure that only related data are stored in a table.
Over time some good rules have been developed that allow a database to be designed with different levels of efficiency. These sets of rules are called the 'Normal Forms' and are numbered from 0 to 5, where 0 means un-normalised data. So there is a first, second, third, fourth and fifth normal form. The shorthand for these is 0NF, 1NF, 2NF, 3NF, 4NF, 5NF.

Normalisation First Normal Form (1NF) Second Normal Form (2NF)
For a database to be in first normal form (1NF), the following rules have to be met for each table in the database:
There are no columns with repeated or similar data.
Each data item cannot be broken down any further.
Each row is unique, i.e. it has a primary key.
Each field has a unique name.
A database is in second normal form (2NF) if it satisfies the following conditions:
It is in first normal form.
All non-key attributes are fully functionally dependent on the primary key.

Normalisation Third Normal Form (3NF)
Now our main concern is to get to 3NF. To go from 2NF to 3NF, ask: are all of the columns fully dependent upon the primary key? Checking the table, we notice that the Total is not fully dependent on the primary key, since it can be derived by multiplying the Unit Price by the Quantity. Therefore, the Total field can be eliminated from our table to comply with 3NF.
Before:
Order Number | Customer Number | Unit Price | Quantity | Total
1 | 241 | $10 | 2 | $20
2 | 842 | $9 | 20 | $180
3 | 919 | $19 | |
4 | | $12 | 10 | $120
After:
Order Number | Customer Number | Unit Price | Quantity
1 | 241 | $10 | 2
2 | 842 | $9 | 20
3 | 919 | $19 |
4 | | $12 | 10
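Dropping the Total column loses nothing, because the value can be worked out whenever it is needed. The sketch below, using Python's standard sqlite3 module, shows one way to do that. The table name Orders and the column names are assumptions, prices are stored as whole dollars, and only two rows from the example are used (order number 2 for the second row is assumed from the sequence).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Orders (
        OrderNumber    INTEGER PRIMARY KEY,
        CustomerNumber INTEGER,
        UnitPrice      INTEGER,  -- whole dollars, for simplicity
        Quantity       INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [(1, 241, 10, 2), (2, 842, 9, 20)],
)

# Total is not stored; it is derived from Unit Price and Quantity on demand.
totals = conn.execute(
    "SELECT OrderNumber, UnitPrice * Quantity AS Total "
    "FROM Orders ORDER BY OrderNumber"
).fetchall()
```

The derived totals match the $20 and $180 values that the original Total column held for these two orders.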

5D - Restructure data into third normal form

Take the following table. StudentID is the primary key.
Is it 1NF?

No. There are repeating groups (subject, subjectcost, grade).
How can you make it 1NF?

Create new rows so each cell contains only one value.
But now look – is the studentID primary key still valid?

No – the studentID no longer uniquely identifies each row
You now need to declare studentID and subject together to uniquely identify each row. So the new key is StudentID and Subject.

So. We now have 1NF. Is it 2NF?

Studentname and address are dependent on studentID (which is part of the key). This is good. But they are not dependent on Subject (the other part of the key).

And 2NF requires… All non-key fields are dependent on the ENTIRE key (studentID + subject)

So it’s not 2NF. How can we fix it?

Make new tables:
Make a new table for each primary key field.
Give each new table its own primary key.
Move columns from the original table to the new table that matches their primary key.

Step 1
STUDENT TABLE (key = StudentID)

Step 2
STUDENT TABLE (key = StudentID)
SUBJECTS TABLE (key = Subject)

Step 3
STUDENT TABLE (key = StudentID)
SUBJECTS TABLE (key = Subject)
RESULTS TABLE (key = StudentID+Subject)

Step 4 - relationships
The STUDENT TABLE and the SUBJECTS TABLE are each linked to the RESULTS TABLE.

Step 4 - cardinality
Each student can only appear ONCE in the student table.
Each subject can only appear ONCE in the subjects table.
A subject can be listed MANY times in the results table (for different students).
A student can be listed MANY times in the results table (for different subjects).
So STUDENT TABLE (1) links to RESULTS TABLE (many), and SUBJECTS TABLE (1) links to RESULTS TABLE (many).

A 2NF check
SUBJECTS TABLE (key = Subject): SubjectCost is only dependent on the primary key, Subject.
RESULTS TABLE (key = StudentID+Subject): Grade is only dependent on the primary key (StudentID + Subject).
STUDENT TABLE (key = StudentID): Name and Address are only dependent on the primary key (StudentID).

So it is 2NF! But is it 3NF?

A 3NF check
In the STUDENT TABLE (key = StudentID), HouseName is dependent on both StudentID + HouseColour.
Or HouseColour is dependent on both StudentID + HouseName.
But either way, non-key fields are dependent on MORE THAN THE PRIMARY KEY (StudentID).
And 3NF says that non-key fields must depend on nothing but the key.

WHAT DO WE DO?

Again, carve off the offending fields

A 3NF fix
Move the offending fields (HouseName and HouseColour) out of the STUDENT TABLE into a new table of their own. The SUBJECTS TABLE (key = Subject) and the RESULTS TABLE (key = StudentID+Subject) stay as they are.

A 3NF win!

The Reveal
[Diagrams: the table structure before and after normalisation]
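One possible end result of this walkthrough is sketched below using Python's standard sqlite3 module. The field names come from the walkthrough, but the name of the House table, the choice of HouseName as its key, the data types and all sample values are assumptions, since the original diagrams are not available here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE House (
        HouseName   TEXT PRIMARY KEY,   -- assumption: HouseName chosen as the key
        HouseColour TEXT NOT NULL
    );
    CREATE TABLE Student (
        StudentID   INTEGER PRIMARY KEY,
        StudentName TEXT NOT NULL,
        Address     TEXT,
        HouseName   TEXT REFERENCES House(HouseName)
    );
    CREATE TABLE Subjects (
        Subject     TEXT PRIMARY KEY,
        SubjectCost INTEGER NOT NULL
    );
    CREATE TABLE Results (
        StudentID INTEGER REFERENCES Student(StudentID),
        Subject   TEXT REFERENCES Subjects(Subject),
        Grade     TEXT,
        PRIMARY KEY (StudentID, Subject)  -- compound key
    );
    INSERT INTO House VALUES ('Kestrel', 'Red');
    INSERT INTO Student VALUES (1, 'A. Pupil', '1 High Street', 'Kestrel');
    INSERT INTO Subjects VALUES ('Maths', 30), ('Art', 45);
    INSERT INTO Results VALUES (1, 'Maths', 'B'), (1, 'Art', 'A');
    """
)

# Join the four tables to produce one wide row per result.
report = conn.execute(
    """
    SELECT s.StudentName, h.HouseColour, r.Subject, sub.SubjectCost, r.Grade
    FROM Results r
    JOIN Student s    ON s.StudentID = r.StudentID
    JOIN House h      ON h.HouseName = s.HouseName
    JOIN Subjects sub ON sub.Subject = r.Subject
    ORDER BY r.Subject
    """
).fetchall()
```

Each fact (a student's address, a subject's cost, a house's colour) is stored once, yet the joined query can still present everything together.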

5E - Explain and apply entity relationship modelling and use it to analyse simple problems.

Types of Relationship
There are three different “degrees” of relationship between two entities. A relationship may be:
One-to-one: examples include the relationship between husband and wife, or between householder and main residence.
One-to-many: examples include the relationship between mother and children, between customer and order, or between borrower and library books.
Many-to-many: examples include the relationship between student and course, between stock item and supplier, or between film and film star.

Entity – Relationship Diagram
An E-R diagram is a diagrammatic way of representing the relationships between the entities in a database. To show the relationship between two entities, both the degree and the name of the relationship need to be specified.

Example
In the relationship shown below, the degree is one-to-one and the name of the relationship is 'drives':
Employee (1) drives (1) Company Car

Example
In the relationship shown below, the degree is one-to-many and the name of the relationship is 'holds':
Ward (1) holds (many) Patient

Example
In the relationship shown below, the degree is many-to-many and the name of the relationship is 'features':
Album (many) features (many) Singers

Class Example
The data requirements for a hospital in-patient system are defined as follows:
A hospital is organised into a number of wards. Each ward has a ward number and a name recorded, along with the number of beds in that ward.
Each ward is staffed by nurses. Nurses have their staff number and name recorded, and are assigned to a single ward.
Each patient in the hospital has a patient ID number, and their name, address and DOB recorded. Each patient is under the care of a single consultant and is assigned to a single ward.
Each consultant is responsible for a number of patients. Consultants have their staff number, name and specialism recorded.
State four entities for the hospital in-patient system and suggest an identifier for each of these entities.
Draw an ER diagram to show the relationships between the entities.

Answer
Entity | Identifier
Ward | WardID
Nurse | StaffID
Patient | PatientID
Consultant | StaffID

ER Diagram Answer
WARD holds PATIENT (one-to-many)
WARD is staffed by NURSE (one-to-many)
CONSULTANT sees PATIENT (one-to-many)

Note on diagram
Note that a one-to-many relationship does not necessarily imply that every ward, for example, has many patients, merely that it is possible for a ward to have more than one patient. It is possible that some wards have no patients at all.
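The hospital model can be turned into tables as in the sketch below, written in Python with the standard sqlite3 module. The entities and identifiers follow the answer given above, but the remaining column names, the data types and the sample rows are assumptions. Each one-to-many relationship becomes a foreign key on the 'many' side, and the final query illustrates the note that a ward may hold no patients at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Ward (WardID INTEGER PRIMARY KEY, Name TEXT, Beds INTEGER);
    CREATE TABLE Nurse (
        StaffID INTEGER PRIMARY KEY,
        Name    TEXT,
        WardID  INTEGER NOT NULL REFERENCES Ward(WardID)        -- WARD staffed by NURSE
    );
    CREATE TABLE Consultant (StaffID INTEGER PRIMARY KEY, Name TEXT, Specialism TEXT);
    CREATE TABLE Patient (
        PatientID    INTEGER PRIMARY KEY,
        Name         TEXT,
        Address      TEXT,
        DOB          TEXT,
        WardID       INTEGER NOT NULL REFERENCES Ward(WardID),          -- WARD holds PATIENT
        ConsultantID INTEGER NOT NULL REFERENCES Consultant(StaffID)    -- CONSULTANT sees PATIENT
    );
    INSERT INTO Ward VALUES (1, 'Ward A', 20), (2, 'Ward B', 12);
    INSERT INTO Consultant VALUES (100, 'Dr Patel', 'Cardiology');
    INSERT INTO Patient VALUES (1, 'P. One', '', '1990-01-01', 1, 100),
                               (2, 'P. Two', '', '1985-06-30', 1, 100);
    """
)

# LEFT JOIN keeps wards with no patients, which then count as zero.
patients_per_ward = conn.execute(
    """
    SELECT Ward.Name, COUNT(Patient.PatientID)
    FROM Ward LEFT JOIN Patient ON Patient.WardID = Ward.WardID
    GROUP BY Ward.WardID
    ORDER BY Ward.WardID
    """
).fetchall()
```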

5F - Describe the use of primary keys, foreign keys, and indexes.

Database Definitions Primary Keys Candidate Keys
Each record in a table must have one field (or combination of fields) whose value is unique, so that each record can be told apart from the others even if all the other fields are the same. This is the primary key. You all have a unique student number and a unique NI number; you may also have a tax number, a driving licence number, a club membership number and so on. These numbers are always different for each record, even if, for example, someone else has the same name as you. When you look at any table, most of the time the primary key will stand out. Often, it will be called something like Order Number, Student ID or Member ID, and these clearly will be unique. However, it may be that more than one of the attributes (i.e. more than one of the columns) in a table is unique, or perhaps more than one combination of attributes is unique (compound keys). Together these are known as the 'candidate keys' for that table, as any of them could ultimately be chosen as the primary key.

Database Definitions Foreign Keys
Relational databases have more than one table. Records from each of the tables are combined to form the complete record of someone or something. Foreign keys are used to link records in different tables, so the database software knows which record in one table belongs to which record in another table; in other words, foreign keys link entities. A foreign key in one table is a primary key in another table. Although a primary key cannot have duplicate values in its own table, a foreign key most definitely can: the same attribute must be unique in one table but may be repeated in another. When you have a one-to-many relationship between two entities, you will need to link them using a foreign key. To do this, always copy the primary key from the entity on the 'one' side of the relationship and put it in the table on the 'many' side. In the table on the 'one' side, it is known as a ‘primary key’. In the table on the 'many' side, it is known as a ‘foreign key’.
