Information Systems Database Systems (H)
A Simple Database
- A “flat-file” database contains only one table.
- It has a very simple structure.
- The data is made up of records and fields.
Adams | Andrea | D | 64 Carluke Street, Jamestown | Glasgow | 0141 092 7721
Baird | Hamish | J | 7 Cedar Walk | Aberdeen | 01224 928722
Donald | Lewis | Y | 8 Walker Road, Torry | Aberdeen | 01224 645182
Hastings | Paul | I | 12 Seaview Terrace | Aviemore | 01479 971871
Kind | Shona | E | 159 Broomhill Court | Aberdeen | 01224 313148
Lees | Brian | K | RanchRock, Clearance | Glasgow | 0141 766 8621
Robertson | Roddy | P | 254 North Road, Culter | Aberdeen | 01224 651116
St.John | Ian | W | 18 Wilson Street, Muir | Tain | 01561 908727
Holman | Clare | C | 16a David Hall | Elgin | 01330 728716
Williams | Kirsten | A | 16 West Hillside | Edinburgh | 0131 972 4678
Update Anomalies
DVD Code | Title | Cost | Date Out | Date Due | Member Number | Name | Telephone
002 | Finding Nemo | £2.50 | 03/09/04 | 04/09/04 | 1034 | John Silver | 142536
003 | American Pie | | 27/08/04 | 28/08/04 | 1056 | Fred Flintstone | 817263
 | | | 01/09/04 | 02/09/04 | 1012 | Isobel Ringer | 293847
008 | The Pianist | | 06/09/04 | | 1097 | Annette Kirton | 384756
- There is no way of storing the details of a member who hasn’t rented any DVDs.
- A value must be provided for both DVD Code and Member Number for the key.
- This is called an insertion anomaly.
Update Anomalies
- If a member’s details have to be amended, this must be done in each record with those details.
- This can lead to data inconsistency if there is an error or omission in making the change.
- This is called a modification anomaly.
Update Anomalies
- If a DVD is removed from the database, then it may also remove the only record of a member’s details.
- This is called a deletion anomaly.
Update Anomalies
- Insertion anomalies
- Modification anomalies
- Deletion anomalies
These are characteristics of poorly designed databases. The solution is to use a relational database. We use normalisation to help work out what tables are required and which data items should be stored in each table.
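The insertion and deletion anomalies can be reproduced in a few lines. The following is only a sketch: it assumes SQLite (through Python's standard sqlite3 module) as the database, the table and column names are invented for illustration, and "New Member" with number 2000 is a hypothetical member. The three rows are taken from the DVD rentals table on the earlier slides.

```python
import sqlite3


def build_flat_table():
    """Create one flat table holding DVD and member details together."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE rental (
               dvd_code      TEXT    NOT NULL,
               title         TEXT,
               member_number INTEGER NOT NULL,
               name          TEXT,
               telephone     TEXT,
               PRIMARY KEY (dvd_code, member_number)
           )"""
    )
    conn.executemany(
        "INSERT INTO rental VALUES (?, ?, ?, ?, ?)",
        [
            ("002", "Finding Nemo", 1034, "John Silver", "142536"),
            ("003", "American Pie", 1056, "Fred Flintstone", "817263"),
            ("008", "The Pianist", 1097, "Annette Kirton", "384756"),
        ],
    )
    return conn


def insertion_anomaly(conn):
    """Return True if a member with no rented DVD cannot be stored."""
    try:
        conn.execute(
            "INSERT INTO rental (dvd_code, member_number, name) "
            "VALUES (NULL, 2000, 'New Member')"  # hypothetical member
        )
        return False
    except sqlite3.IntegrityError:
        return True  # rejected because part of the key is missing


def deletion_anomaly(conn):
    """Return True if removing DVD 008 also removes member 1097."""
    conn.execute("DELETE FROM rental WHERE dvd_code = '008'")
    remaining = conn.execute(
        "SELECT COUNT(*) FROM rental WHERE member_number = 1097"
    ).fetchone()[0]
    return remaining == 0
```

Both functions return True for this table, which is the behaviour a relational design is meant to avoid.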
Data Modelling
Data modelling produces a plan for a database system that can be implemented using any relational database software.
- Identify data in the current system.
- Remove repeating groups by normalising the data.
- Determine relationships between data items and create entity/relationship diagrams.
- Create a data dictionary to describe the data items in the system.
- Identify the inputs, processes and outputs required to make the system function.
A Relational Database
- Data is stored in a set of tables.
- Tables are joined by relational links.
- Reduces duplication of data in the database.
- Allows greater flexibility and efficiency.
Tables in a Relational Database
Primary Key
Each row in an entity must be unique. Each entity must have a primary key (known initially as a candidate key).
student_id | title | firstname | surname | date_of_birth | year_of_study | dept_code
9701111 | Mr | Charlie | Burton | 05/01/1975 | 4th | COMP
9806666 | | Sandy | Ogston | 14/12/1982 | 2nd | ENG
9802600 | Miss | Susan | Low | 12/09/1986 | | OENG
A primary key is one or more columns of the entity whose values are used to uniquely identify each instance.
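A short sketch of how a primary key keeps rows unique is given here. It assumes SQLite through Python's sqlite3 module; the table name student and the helper try_insert are chosen for illustration, and only some of the columns from the student table are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE student (
           student_id INTEGER PRIMARY KEY,
           firstname  TEXT,
           surname    TEXT,
           dept_code  TEXT
       )"""
)
conn.execute("INSERT INTO student VALUES (9701111, 'Charlie', 'Burton', 'COMP')")


def try_insert(conn, row):
    """Return True if the row is stored, False if the key is already in use."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

Inserting a second row with student_id 9701111 is refused, while the same student details with a different student_id are accepted.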
Primary Key Candidates – Meaningful Keys
Meaningful primary keys tend to change over time and this can introduce significant problems in a database system. Avoid using meaningful primary keys if you possibly can. Use arbitrary codes of numbers or letters instead.
Surrogate Keys
A surrogate key is an arbitrary single-column primary key which is created specifically for an entity. A surrogate key is created when the compound key for an entity is too complex to allow the key to be used efficiently, or when there is no unique collection of columns available in the entity.
viewing_id | property_id | client_name | potential_buyer_no | potential_buyer_name | date_of_viewing
164 | | | | | 17.04.2004
165 | | | 1982 | Perkins | 19.04.2004
166 | | | 2983 | Patel |
167 | P101 | Smith | | |
168 | | | 1282 | Jones | 29.05.2004
169 | P106 | Parker | 7225 | Mitchell | 23.04.2004
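Many database systems can generate surrogate key values automatically. The sketch below assumes SQLite through Python's sqlite3 module, where a column declared INTEGER PRIMARY KEY is given the next free number when no value is supplied. The table name viewing and the helper add_viewing are illustrative, and only some of the viewing columns are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE viewing (
           viewing_id         INTEGER PRIMARY KEY,  -- surrogate key, assigned by SQLite
           property_id        TEXT,
           potential_buyer_no INTEGER,
           date_of_viewing    TEXT
       )"""
)


def add_viewing(conn, property_id, buyer_no, date_of_viewing):
    """Store a viewing and return the surrogate key it was given."""
    cursor = conn.execute(
        "INSERT INTO viewing (property_id, potential_buyer_no, date_of_viewing) "
        "VALUES (?, ?, ?)",
        (property_id, buyer_no, date_of_viewing),
    )
    return cursor.lastrowid
```

Two viewings with identical details still receive different viewing_id values, so each row stays uniquely identifiable without a compound key.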
Foreign Keys
Foreign keys are vital in relational databases. Relationships between entities in a database are created by linking a foreign key in one entity with its related primary key in a different entity.
Data Integrity
Entity integrity relates to primary keys. This rule states that every entity must have a primary key, and the column or columns selected for the primary key must be unique and not null.
Referential integrity is concerned with foreign keys. The referential integrity rule states that every foreign key value must match a primary key value in the related entity.
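Referential integrity can be checked by the database itself. This sketch assumes SQLite through Python's sqlite3 module, which only enforces foreign keys once PRAGMA foreign_keys is switched on. The department table is hypothetical and was added so that dept_code has something to refer to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE department (dept_code TEXT PRIMARY KEY NOT NULL)")
conn.execute(
    """CREATE TABLE student (
           student_id INTEGER PRIMARY KEY,
           surname    TEXT,
           dept_code  TEXT REFERENCES department (dept_code)
       )"""
)
conn.execute("INSERT INTO department VALUES ('COMP')")


def add_student(conn, student_id, surname, dept_code):
    """Return True if the student is stored, False if an integrity rule is broken."""
    try:
        conn.execute(
            "INSERT INTO student VALUES (?, ?, ?)", (student_id, surname, dept_code)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A student in department COMP is accepted, but a student in ENG is refused until ENG exists in the department table.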
Normalisation
The process of normalisation takes the data items (called attributes) of the existing entities and produces new entities that are easier to implement in a relational database. Generally, normalisation will produce a final set of “real world” entities such as “Customers”, “Orders” etc.
First Normal Form
To place data into first normal form we remove repeating groups within the primary entities. These repeating groups then become new entities linked together by a one-to-many relationship. Relationships are created by including a primary key from one entity as a foreign key in another entity.
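Removing a repeating group can be sketched in plain Python. Everything in this example is hypothetical (the order data, the field names and the function name); it only shows the repeating group of items moving into its own set of rows, with order_no carried across as the foreign key.

```python
def to_first_normal_form(orders):
    """Split each order's repeating group of items into separate rows."""
    order_rows = []
    item_rows = []
    for order in orders:
        order_rows.append(
            {"order_no": order["order_no"], "customer": order["customer"]}
        )
        for item in order["items"]:
            # order_no is copied into each item row as a foreign key
            item_rows.append({"order_no": order["order_no"], **item})
    return order_rows, item_rows


# Hypothetical un-normalised record: one order holding a repeating group
unnormalised = [
    {
        "order_no": 1,
        "customer": "A. Example",
        "items": [
            {"item_code": "X1", "quantity": 2},
            {"item_code": "Y2", "quantity": 1},
        ],
    }
]
```

The single order becomes one order row and two item rows, joined by a one-to-many relationship on order_no.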
First Normal Form
[Figure: un-normalised data from an existing system]
[Figure: data in First Normal Form]
Second Normal Form
To produce data in second normal form we remove attributes that are only dependent on part of the primary key. This only applies to entities with concatenated primary keys (that is, primary keys made up of two or more attributes).
Second Normal Form
The customer details (such as customer name, warehouse number etc.) are only dependent on the customer number. They are removed to a new entity called Customers, and the remaining attributes now hold data concerned with the details of sales between customers and salespersons (Sales-Details). There are one-to-many relationships between Customers and Sales-Details and between Salespersons and Sales-Details.
Third Normal Form
To present data in 3NF we need to remove attributes that do not depend on the key. This means that if an attribute depends on another non-key attribute, rather than directly on the key, it is removed to a new entity.
Third Normal Form
The warehouse name is dependent on the warehouse number, not the customer number, so we remove the warehouse data to a new entity. A new one-to-many relationship has been formed between Warehouse and Customers.
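The move to third normal form can be sketched in plain Python. The customer and warehouse values here are hypothetical, as are the field and function names; the point is only that warehouse_name follows warehouse_no, so it is stored once per warehouse.

```python
def to_third_normal_form(rows):
    """Move warehouse names out of the customer rows into their own entity."""
    warehouses = {}
    customers = []
    for row in rows:
        # One warehouse row per warehouse number, however many customers use it
        warehouses[row["warehouse_no"]] = {
            "warehouse_no": row["warehouse_no"],
            "warehouse_name": row["warehouse_name"],
        }
        # The customer keeps only the foreign key to the warehouse
        customers.append(
            {
                "customer_no": row["customer_no"],
                "customer_name": row["customer_name"],
                "warehouse_no": row["warehouse_no"],
            }
        )
    return customers, list(warehouses.values())


# Hypothetical customer rows in second normal form
customers_2nf = [
    {"customer_no": 1, "customer_name": "Customer A", "warehouse_no": 10, "warehouse_name": "North"},
    {"customer_no": 2, "customer_name": "Customer B", "warehouse_no": 10, "warehouse_name": "North"},
    {"customer_no": 3, "customer_name": "Customer C", "warehouse_no": 20, "warehouse_name": "South"},
]
```

Three customer rows produce two warehouse rows, so a change to a warehouse name now has to be made in one place only.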
Normalisation
Once the data is in 3NF we formally specify the structure of each entity.
Domain Constraints
The domain of an attribute is the set of permitted values (e.g. the name must only contain letters). Each domain has a set of domain constraints. These constraints apply to the type and value of the data that the attribute can hold. The domain constraints define the data that can be legally held by an attribute.
Domain Constraints Cont’d
- Size of attributes
- Constrained by permitted values
- Constrained by range
- Constrained by format
- Storage requirements
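One way to picture domain constraints is as a small set of rules checked before a value is accepted. This is a sketch only: the attributes, the size limit of 30, the range of 1 to 4 and the function name is_permitted are all assumptions made for illustration.

```python
import re

# Hypothetical domain constraints for two attributes
DOMAIN_CONSTRAINTS = {
    "name": {"type": str, "max_size": 30, "format": re.compile(r"[A-Za-z]+")},
    "year_of_study": {"type": int, "range": (1, 4)},
}


def is_permitted(attribute, value):
    """Return True if the value lies inside the attribute's domain."""
    rule = DOMAIN_CONSTRAINTS[attribute]
    if not isinstance(value, rule["type"]):
        return False  # wrong data type
    if "max_size" in rule and len(value) > rule["max_size"]:
        return False  # too large for the attribute
    if "format" in rule and not rule["format"].fullmatch(value):
        return False  # wrong format, e.g. digits in a name
    if "range" in rule:
        low, high = rule["range"]
        if not low <= value <= high:
            return False  # outside the permitted range
    return True
```

Here "Burton" is a permitted name but "Burton1" is not, and 4 is a permitted year of study but 7 is not.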
Data Types
- Text
- Integer
- Real
- Object
- Boolean
- Date
- Time
- Link
Cardinality
The cardinality of a relationship can be one of three types:
- One-to-One, for example one customer has one address.
- One-to-Many, for example one customer can place many orders.
- Many-to-Many, for example many salespersons can sell to many customers.
Entity/Relationship Occurrences
[Occurrence diagram linking Client Nbr values (0921, 2711) to Van Reg values (D842 YSA, YT71 7YE, KY51 AFZ, P385 ASA)]
This assists identification of the relationship.
Entity/Relationship Diagrams (ERD)
- Entity set: the name must never be a plural; it must always be singular, e.g. person rather than people.
- Relationship: the link between two entity sets. Relationships illustrate how two entity sets share information.
ERD
[Diagram: notation for One-to-One, One-to-Many and Many-to-Many relationships, with Many-to-Many marked NOT]
Sample E/R Diagram
Films (Film Number, Film Title, Film Certificate, Film Rental Cost)
Video Tapes (Video Tape Number, *Film Number, Video Tape in Stock)
Transactions (Transaction Number, *Customer Number, *Video Tape Number, Date Booked Out, Date Returned)
Customers (Customer Number, Customer Title, Customer Firstname, Customer Initials, Customer Surname, Customer Address, Customer Post Code, Customer Tel. No.)
Relationships (1:M): Customers “make” Transactions; Films “on” Video Tapes; Video Tapes “are rented on” Transactions.
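The sample diagram could be turned into tables along the following lines. This is a sketch that assumes SQLite through Python's sqlite3 module; the column types are guesses, the starred attributes are treated as foreign keys, and most of the customer attributes are left out to keep the example short.

```python
import sqlite3

SCHEMA = """
CREATE TABLE films (
    film_number      INTEGER PRIMARY KEY,
    film_title       TEXT,
    film_certificate TEXT,
    film_rental_cost REAL
);
CREATE TABLE video_tapes (
    video_tape_number   INTEGER PRIMARY KEY,
    film_number         INTEGER NOT NULL REFERENCES films (film_number),
    video_tape_in_stock INTEGER
);
CREATE TABLE customers (
    customer_number  INTEGER PRIMARY KEY,
    customer_surname TEXT
    -- remaining customer attributes omitted for brevity
);
CREATE TABLE transactions (
    transaction_number INTEGER PRIMARY KEY,
    customer_number    INTEGER NOT NULL REFERENCES customers (customer_number),
    video_tape_number  INTEGER NOT NULL REFERENCES video_tapes (video_tape_number),
    date_booked_out    TEXT,
    date_returned      TEXT
);
"""


def create_rental_database():
    """Build an empty in-memory database from the sample E/R diagram."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the 1:M links
    conn.executescript(SCHEMA)
    return conn
```

Each one-to-many relationship appears as a foreign key on the "many" side, so a transaction cannot be stored for a customer or video tape that does not exist.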
Data Dictionary
A data dictionary is a collection of data about data (metadata). It describes the attributes and their properties:
- type
- required (or not)
- range
- format
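A data dictionary can itself be held as structured data. In this sketch the attribute names come from the Films entity in the sample E/R diagram, while the types, ranges and formats are assumptions, as are the names DictionaryEntry and required_attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DictionaryEntry:
    """One line of the data dictionary: an attribute and its properties."""
    entity: str
    attribute: str
    data_type: str
    required: bool
    value_range: Optional[str] = None
    format: Optional[str] = None


# Assumed properties for three attributes of the Film entity
DATA_DICTIONARY = [
    DictionaryEntry("Film", "Film Number", "Integer", True, value_range=">= 1"),
    DictionaryEntry("Film", "Film Title", "Text", True),
    DictionaryEntry("Film", "Film Rental Cost", "Real", False, format="£0.00"),
]


def required_attributes(entity):
    """List the attributes of an entity that must always hold a value."""
    return [
        entry.attribute
        for entry in DATA_DICTIONARY
        if entry.entity == entity and entry.required
    ]
```

Calling required_attributes("Film") lists Film Number and Film Title, the two entries marked as required.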
Sample Data Dictionary
Functions (I/P/O)
Functions that act on the database are either input, processing or output operations, e.g.
- Editing data is an input operation.
- Carrying out a calculation is a process.
- Producing a report based on the data is an output.
All the functions that act on one, some or all of the entities are identified.
What can functions do?
Functions can be used to extend the capabilities of the database system beyond what the manual system currently does, e.g.
- statistical reports can be produced
- data in entities can be cross-referenced to ensure integrity
- operations can be automated (such as looking up overdue books in a library system).
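The overdue books lookup might be automated as a single query. This sketch assumes SQLite through Python's sqlite3 module and a hypothetical loan table; the book titles, member names and dates are invented, and dates are stored as YYYY-MM-DD text so that they compare correctly.

```python
import sqlite3


def overdue_loans(conn, today):
    """Return (book_title, member_name) for loans past their due date."""
    return conn.execute(
        "SELECT book_title, member_name FROM loan "
        "WHERE date_returned IS NULL AND date_due < ? "
        "ORDER BY date_due",
        (today,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loan (book_title TEXT, member_name TEXT, "
    "date_due TEXT, date_returned TEXT)"
)
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?, ?)",
    [
        ("Book A", "Member 1", "2004-09-01", None),          # overdue, not returned
        ("Book B", "Member 2", "2004-09-10", None),          # not yet due
        ("Book C", "Member 3", "2004-08-20", "2004-08-25"),  # already returned
    ],
)
```

With today given as "2004-09-06", only Book A is reported as overdue.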
Example Functions
Database Design
This outcome is concerned with designing database structures for implementation. Ensure that the design closely matches the data model produced during the analysis. The finished design will include:
- data item names
- database structure
- data item characteristics
- validity checks
Choosing field and table names
Most relational database management systems have rules that govern how fields and tables are named. An RDBMS may only allow field and table names of a particular length and may prohibit certain characters from being included in data item names. For example, in Microsoft Access field and table names:
- Can be up to 64 characters long.
- Can include any combination of letters, numbers, spaces, and special characters except a period (.), an exclamation point (!), an accent grave (`), and brackets ([ ]).
- Can't begin with leading spaces.
- Can't include control characters (ASCII values 0 through 31).
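The Microsoft Access naming rules listed above can be checked before a design is implemented. The function name is_valid_access_name is made up for this sketch, and treating an empty name as invalid is an assumption rather than one of the listed rules.

```python
FORBIDDEN_CHARACTERS = set(".!`[]")


def is_valid_access_name(name):
    """Check a field or table name against the rules listed for Access."""
    if not 1 <= len(name) <= 64:
        return False  # empty (assumed invalid) or longer than 64 characters
    if name.startswith(" "):
        return False  # leading space
    if any(ch in FORBIDDEN_CHARACTERS for ch in name):
        return False  # period, exclamation point, accent grave or bracket
    if any(ord(ch) < 32 for ch in name):
        return False  # control character (ASCII 0 through 31)
    return True
```

For example, "Customer Number" passes, while "Tel.No" fails because of the period.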
Database Structure
The structures designed for implementation in the RDBMS should match those in the analysis, e.g.
- Tables will be implemented from entities.
- Fields will represent attributes.
- Relationships between tables should be established (one-to-many etc.).
- Functions should be created for the inputs, processes and outputs identified.
Keys and Indexes
- A primary key uniquely identifies each record in a table.
- A foreign key is a primary key from one table included in another table to form a relationship.
- An index is a quick reference created by an RDBMS to speed up the use of the database.
If the RDBMS has the facility to index fields then all foreign and primary keys should be indexed.
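Indexing a foreign key usually takes one extra statement. This sketch assumes SQLite through Python's sqlite3 module and reuses two tables from the sample E/R diagram; the index name and the helper index_names are illustrative. In SQLite an INTEGER PRIMARY KEY column is already used for fast lookup, so only the foreign key needs an explicit index here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE films (
        film_number INTEGER PRIMARY KEY,
        film_title  TEXT
    );
    CREATE TABLE video_tapes (
        video_tape_number INTEGER PRIMARY KEY,
        film_number       INTEGER REFERENCES films (film_number)
    );
    -- Index the foreign key so tapes can be found quickly by film
    CREATE INDEX idx_video_tapes_film_number ON video_tapes (film_number);
    """
)


def index_names(conn, table):
    """List the names of the indexes defined on a table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    )
    return [row[0] for row in rows]
```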
Data Item Characteristics
Appropriate field types should be selected to meet the requirements of the data dictionary produced from the analysis. The field types must be selected from those available in the RDBMS, e.g.
- Dates should be represented using the Date type, not Numbers.
- Money should be represented using the Currency data type.
Data types available in Microsoft Access
Validity Checks
Data entered into the database should be valid and correct. Validity checks ensure that the data entered meets certain rules such as:
- Presence (whether the data must be in the field or not).
- Range (whether the value entered is within a particular range).
- Restricted choice (a value entered must be from a specified list of possible values).
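The three validity checks can be written into the table definition. This sketch assumes SQLite through Python's sqlite3 module; the cost range of 0 to 10 and the list of certificates are assumptions chosen for illustration, as is the helper is_accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE films (
           film_number      INTEGER PRIMARY KEY,
           film_title       TEXT NOT NULL,  -- presence check
           film_rental_cost REAL
               CHECK (film_rental_cost BETWEEN 0 AND 10),  -- range check
           film_certificate TEXT
               CHECK (film_certificate IN ('U', 'PG', '12', '15', '18'))  -- restricted choice
       )"""
)


def is_accepted(conn, title, cost, certificate):
    """Return True if the film passes every validity check and is stored."""
    try:
        conn.execute(
            "INSERT INTO films (film_title, film_rental_cost, film_certificate) "
            "VALUES (?, ?, ?)",
            (title, cost, certificate),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A film with a title, a cost of 2.50 and certificate "U" is accepted. A missing title, a cost of 25 or a certificate of "X" each causes the row to be refused.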
Detailed Data Dictionary
The detailed data dictionary specifies how the data model will be implemented. It contains the following details about each field:
- Name
- Type
- Size
- Validation Check
- Format
- Required (or not)
- Indexed (or not)
Example Detailed Data Dictionary
Relationships between fields
Relationships between fields can be shown in the following format:
[table.fieldname] 1:M [table.fieldname]
Relationships are shown as follows:
- 1:M (one-to-many)
- M:M (many-to-many)
- 1:1 (one-to-one)
For example:
Appropriate Design
If you have completed the detailed data dictionary and specified the relationships between the fields in the tables, whilst considering the restrictions placed on you by the RDBMS, then you have successfully met this performance criterion. When you are designing the database structures for your implementation you must always consider what your RDBMS is capable of doing. There is no point in designing a system that cannot be implemented with the software that you have available.