CS263 Information Modeling Course tutor: Tony Browne Some diagrams in these notes are small when printed 6 to a page, however all notes are available on departmental webpage: Now: Download and install: MySQL 5 from DBDesigner from Database System by P. Rob, Thompson Publishers, ISBN (out this year) A guide to MySQL by P. J. Pratt, Thompson Publishing,
Definitions Data: Meaningful facts, text, graphics, images, sound, video segments. Database: An organized collection of logically related data. Information: Data processed to be useful in decision making. Metadata: Data that describes data.
Evolution of database systems 1960’s – file processing systems: punch cards, paper tape, magnetic tape – sequential access and batch processing 1970s - Hierarchical and Network (legacy, some still used today) – difficulties = hard to access data (navigational record-at-a-time procedures), limited data independence, no widely accepted theoretical model (unlike relational) 1980s - Relational – E.F. Codd and others developed this theoretically well-founded model – all data represented in the form of tables – Oracle, DB2, Ingres 1990s - Object-oriented, but some organisations have to handle large amounts of both structured and unstructured data, so Object-relational databases developed.
Evolution of database systems 2000 and beyond – multi –tier, client-server, distributed environments, web-based, content- addressable storage, data mining
Evolution of database technologies
Three file processing systems
Disadvantages of file processing systems Still widely used today (e.g. for backup) but have the following problems: Program-Data Dependence (see Fig.) – file descriptions are stored within each application that accesses file, so change to file structure requires changes to all file descriptions in all programs. Data Redundancy (Duplication of data) – wasteful, inconsistent, loss of metadata integrity (same data has different names in different files, or same name may be used for different data in different files). Limited Data Sharing – users have little opportunity to share data outside their own applications. Lengthy Development Times – little opportunity to re-use previous development efforts. Excessive Program Maintenance – factors above combine to create heavy maintenance load
Advantages of the database approach Minimal Data Redundancy/Improved Consistency Improved Data Sharing Increased Application Development Productivity Improved Data Quality (Constraints) Better Data Accessibility/ Responsiveness Security, Backup/Recovery, Concurrency DISADVANTAGES? Specialized Personnel required, Management Cost and Complexity, Organizational Conflict
Pine valley furniture company We will be using this fictitious company as a case study The company’s first step was to create an Enterprise Data Model (a model of the organisation that provides valuable information about how the organisation functions, as well as important constraints – it stresses the integration of data and processes by focussing on entities, relationships and business rules) Enterprise data model is a graphical model that shows the high-level entities and the associations among them (see Fig. 3): An ENTITY-RELATIONSHIP DIAGRAM.
Pine valley furniture company The three associations (relationships) are shown by lines connecting the entities Each CUSTOMER places any number of ORDERS (conversely, each ORDER is placed by exactly one CUSTOMER) Each ORDER contains any number of ORDER LINEs (conversely, each ORDER LINE is contained in exactly one ORDER) Each PRODUCT has any number of ORDER LINES (conversely, each ORDER LINE is for exactly one PRODUCT)
Tables Relational databases views all data in the form of tables Each column represents an attribute, e.g. the Customer table has attributes ID, Name, Address..etc. Relationships between entities are represented by values stored in columns of the corresponding tables, e.g. Customer_ID is an attribute of both the Customer table and the Order table. This makes it easy to link an order with its customer.
SQL: Structured Query Language (some for historical reasons call it ‘Sequel’) The standard and most common language for relational database management systems An SQL-based relational database application involves a user interface, a set of tables in the database, and a RDBMS with an SQL capability Within the RDBMS SQL will be used to create the tables, translate user requests, maintain the data dictionary and system catalog, update an maintain the tables, establish security, and carry out backup and recovery procedures
The SQL environment. Each database will have a set of schemas associated with a catalog. Schema = the structure that contains descriptions of objects created by a user (base tables, views, constraints) FOLLOWING IS A CRASH-COURSE IN SQL – DON’T PANIC! – WE WILL BE GOING OVER THESE CONCEPTS IN THE FOLLOWING WEEKS
3 types of SQL commands 1. Data Definition Language (DDL) commands - that define a database, including creating, altering, and dropping tables and establishing constraints 2. Data Manipulation Language (DML) commands - that maintain and query a database 3. Data Control Language (DCL) commands - that control a database, including administering privileges and committing data
DDL, DML, DCL, and the database development process
SQL Data types CHAR(n) – fixed-length character data, n characters long Maximum length = 2000 bytes VARCHAR2 (n) – variable length character data, maximum 4000 bytes LONG – variable-length character data, up to 4GB. Maximum 1 per table NUMBER(p,q) – general purpose numeric data type INTEGER(p) – signed integer, p digits wide FLOAT(p) – floating point in scientific notation with p binary digits precision DATE – fixed-length date/time in dd-mm-yy form
SQL database definition Each of the previous create commands may be reversed using a DROP command, so DROP TABLE will destroy a table (including its definitions and contents) Usually only the table creator may delete the table. ALTER TABLE may be used to change the definition of an existing table
Creating tables Once data model is designed and normalised, the columns needed for each table can be defined using the CREATE TABLE command. The syntax for this is shown in the following Fig. These are the seven steps to follow: 1. Identify the appropriate datatype for each, including length and precision 2. Identify those columns that should accept null values. Column controls that indicate a column cannot be null are established when a table is created and are enforced for every update of the table
Creating tables 3. Identify those columns that need to be UNIQUE - when the data in that column must have a different value (no duplicates) for each row of data within that table. Where a column or set of columns is designated as UNIQUE, this is a candidate key. Only one candidate key may be designated as a PRIMARY KEY 4. Identify all primary key-foreign key mates. Foreign keys can be established immediately or later by altering the table. The parent table in such a parent-child relationship should be created first. The column constraint REFERENCES can be used to enforce referential integrity
Creating tables 5. Determine values to be inserted into any columns for which a DEFAULT value is desired - can be used to define a value that is automatically inserted when no value is provided during data entry. 6. Identify any columns for which domain specifications may be stated that are more constrained than those established by data type. Using CHECK it is possible to establish validation rules for values to be inserted into the database 7. Create the table and any desired indexes using the CREATE TABLE and CREATE INDEX statements
Table creation
The following Fig. Shows SQL database definition commands Here some additional column constraints are shown, and primary and foreign keys are given names For example, the CUSTOMER table’s primary key is CUSTOMER_ID The primary key constraint is named CUSTOMER_PK, without the constraint name a system identifier would be assigned automatically and the identifier would be difficult to read
SQL database definition commands for Pine Valley Furniture
STEP 1 Defining attributes and their data types
STEP2 Non-nullable specifications Note: primary keys should not be null
STEP 3 Identifying primary keys This is a composite primary key
STEP 4 Identifying foreign keys and establishing relationships
STEPS 5 and 6 Default values and domain constraints
STEP 7 Overall table definitions