Information System Solutions: A Project Approach
Chapter Four: Data Modeling
McGraw-Hill/Irwin © 2006 The McGraw-Hill Companies, Inc., All Rights Reserved.
Data Models
- Represent the data content of an information system
- May provide little or no information on process and infrastructure
- Follow a structure of rules and conventions to facilitate good communication
Entity Relationship (ER) Models
The ER model represents the system in the following terms:
- Entities: the things about which the organization wishes to maintain data
- Relationships: the organizational relationships between the entities
- Attributes: the items of data the client wishes to collect for each entity
Entities
- Person: customer, student, employee, member
- Event: rental, sale, repair, enrollment, flight
- Object: video, product, vehicle, tool
- Place: city, zip-code, area, store, plant
- Concept: military unit, department, bank account
Entity Class and Instance
- Entity class: a set or group of things. For example, the entity class Customer represents all of the customers of an organization.
- Entity instance: one thing or member within an entity class. For example, one customer with the name Joe Smith.
- In this text, "entity" means entity class.
Attributes
- Define the properties or characteristics of the entity
- Can include such properties as names, descriptions, dates, sizes, and others
- For the entity Customer, can include Tel-No, Name, and Address
- An entity must have an attribute with unique values over all instances to serve as the primary key or identifier
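The distinction between entity class, entity instance, and identifying attribute can be sketched in Python. This is only an illustration: the `Customer` fields, the sample values, and the helper `is_valid_identifier` are assumptions made for the example, not part of the GB Video case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Entity class: describes every customer of the organization."""
    member_no: int  # hypothetical identifier attribute
    name: str
    tel_no: str


def is_valid_identifier(instances, attribute):
    """Return True if `attribute` has a distinct value for every instance."""
    values = [getattr(instance, attribute) for instance in instances]
    return len(values) == len(set(values))


# Entity instances: individual members of the entity class (made-up data).
customers = [
    Customer(1, "Joe Smith", "555-0100"),
    Customer(2, "Ann Lee", "555-0101"),
    Customer(3, "Joe Smith", "555-0102"),
]
```

Here `member_no` could serve as the primary key, while `name` could not, because two instances share the same name.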
Relationships
- Link or connect instances of one entity to instances of another entity
- Describe the way the organization operates
- In the GB Video example, the entity Customer is related to the entity Rental in that a customer can engage in or make rental transactions
Maximum Cardinalities of Relationships
- One-to-one: one instance of entity A can link to at most one instance of entity B, and vice versa
- One-to-many: one instance of A can link to more than one instance of B, but each instance of B links to only one instance of A
- Many-to-many: one instance of A can link to more than one instance of B, and one instance of B can link to more than one instance of A
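As a rough illustration of the three maximum cardinalities, the sketch below classifies a set of observed links. The function name `max_cardinality` and the sample identifiers are hypothetical. Note the limitation: a real maximum cardinality is a business rule, so sample data can only show what has been observed, not what is allowed.

```python
from collections import Counter


def max_cardinality(links):
    """Classify observed (a, b) links as '1:1', '1:m', 'm:1' or 'm:n'."""
    links = set(links)  # ignore duplicate links
    per_a = Counter(a for a, _ in links)  # number of B instances linked to each A
    per_b = Counter(b for _, b in links)  # number of A instances linked to each B
    a_links_many = any(n > 1 for n in per_a.values())
    b_links_many = any(n > 1 for n in per_b.values())
    if a_links_many and b_links_many:
        return "m:n"
    if a_links_many:
        return "1:m"
    if b_links_many:
        return "m:1"
    return "1:1"


# Hypothetical sample links.
customer_rental = [("c1", "r1"), ("c1", "r2"), ("c2", "r3")]
rental_video = [("r1", "v1"), ("r1", "v2"), ("r2", "v1")]
```

With this data, `max_cardinality(customer_rental)` gives `"1:m"` and `max_cardinality(rental_video)` gives `"m:n"`.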
ERD Symbols (Figure 4.2)
- Component symbols: Entity, Relationship, Attribute, Multi-valued Attribute
- Relationship types: one-to-one, one-to-many, many-to-many
Entity Relationship Diagram (ERD) (Figure 4.3)
- Entities: CUSTOMER, RENTAL, VIDEO
- Relationships: Makes, Contains
- Attributes shown: Member-No, Name, Tel-No, Address (composed of Street, City, State, Zip), Credit-Card-No, Expire-Date, Rental-No, Date, Employee-No, Pay-Type, Video-No, Title, Date-Acquired, Vendor, Rent-Charge/Day, Cost, Due-Date, Return-Date, Overdue-Charge
Basic ERD Rules
- An entity is a thing about which the organization wishes to keep data
- An entity contains more than one instance
- An entity has a primary key attribute with unique values for every instance
- An entity has two or more attributes
- A many-to-many relationship may have attributes
- A relationship represents a situation that exists or that the organization wants to exist
ERD Naming Rules
- Every component on an ERD has a unique name and/or label
- An entity name consists of a singular noun in all capital letters
- A relationship name consists of a verb or a phrase in upper and lower case letters
- Attribute names are nouns or noun phrases in upper and lower case letters
- When an attribute serves as the primary key for an entity, the attribute name is underlined
Multivalued Attributes
- A multivalued attribute can have several values for a single instance of an entity
- For example, Customer may contain attributes for the names of family members. A single customer (instance) may have more than one family member: spouse, children, etc.
- An entity can replace a multivalued attribute
Multivalued Attributes (Figure 4.4)
- Entity: CUSTOMER
- Multivalued attributes: Person-first-name, Person-last-name
Weak Entity (Figure 4.5)
- Entities: CUSTOMER, FAMILY MEMBER
- Attributes of FAMILY MEMBER: Person-ID, Person-first-name, Person-last-name
The Role of an Associative Entity
- An associative entity (AE) replaces or resolves a many-to-many (m:n) relationship
- One-to-many relationships connect the AE to the original entities in the m:n relationship
- The many side of each one-to-many relationship always connects to the AE
- Attributes of the m:n relationship become attributes of the AE
Associative Entity (Figure 4.6)
- Entities: RENTAL, VIDEO
- Associative entity: RENTAL/VIDEO
- Relationships: Contains, Held by
Degree of a Relationship
- The number of entity classes that participate in a relationship defines the degree of the relationship
- A unary relationship links an entity to itself
- A binary relationship links two entities
- A ternary relationship links three entities
Unary Relationships (Figure 4.7)
- Entity: POLICE OFFICER
- Unary relationships: Commands, Partner of
Ternary Relationship (Figure 4.8)
- Entities: ARTIST, HALL, WORK
- Relationship: Performs, with the attribute Time-Period
Ternary Relationship with an Associative Entity (Figure 4.9)
- Entities: ARTIST, HALL, WORK
- Associative entity: PERFORMANCE, with the attribute Time-Period
Cardinality
- Maximum cardinality specifies the maximum number of instances of entity B that can link to one instance of entity A. It is drawn with the straight line (one) or crowfoot (many) symbol.
- Minimum cardinality, or optionality, specifies the minimum number of instances of B that can link to one instance of A. It is drawn with the 0 (optional) or 1 (mandatory) symbol.
Minimum and Maximum Cardinalities (Figure 4.10)
- Entities: CUSTOMER, RENTAL
- Relationship: Makes
- A customer may make zero (minimum) or many (maximum) rentals; a rental is made by one (minimum) and only one (maximum) customer
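The (min, max) reading of Figure 4.10 can be checked against sample data with a small helper. This is a sketch under assumptions: the function `satisfies_cardinality`, the identifiers, and the links are made up for illustration, and `maximum=None` is used here to stand for "many".

```python
def satisfies_cardinality(a_instances, links, minimum, maximum=None):
    """True if every A instance links to between `minimum` and `maximum`
    B instances; maximum=None means 'many' (no upper bound)."""
    pairs = set(links)  # ignore duplicate links
    for a in a_instances:
        count = sum(1 for left, _ in pairs if left == a)
        if count < minimum:
            return False
        if maximum is not None and count > maximum:
            return False
    return True


customers = ["c1", "c2"]
rentals = ["r1", "r2"]
makes = [("c1", "r1"), ("c1", "r2")]  # (customer, rental) links

# CUSTOMER side: a customer may make zero or many rentals -> (0, many).
customer_side_ok = satisfies_cardinality(customers, makes, 0)

# RENTAL side: a rental is made by one and only one customer -> (1, 1).
made_by = [(rental, customer) for customer, rental in makes]
rental_side_ok = satisfies_cardinality(rentals, made_by, 1, 1)
```

Customer `c2` has no rentals, which the optional (0) minimum allows; a rental with no customer would fail the mandatory (1) minimum.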
Supertype and Subtype Entities
- A supertype has one or more subtypes
- A supertype entity holds the attributes common to all of its subtype entities
- A subtype may have additional attributes
- A subtype entity may participate in relationships with other entities
- Supertype and subtype entities are linked by one-to-one relationships
Supertype with a Subtype (Figure 4.11)
- Supertype: VIDEO
- Subtype: ED-VIDEO
- Other labels shown: MATERIAL, Video-No, Area, Age-Group
Specialization
- Total specialization: every instance of the supertype must link to one instance in one of the subtypes
- Partial specialization: an instance of the supertype may link to zero instances in all of the subtypes
- In the GB Video example, some instances of regular videos link to zero instances in the subtype
Disjoint and Overlap
A supertype instance can link to subtype instances in one of two ways:
- Disjoint: a supertype instance can link to an instance of only one subtype
- Overlap: a supertype instance can link to one instance in each of several subtypes
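The total/partial and disjoint/overlap distinctions can be illustrated by inspecting which supertype instances appear in which subtypes. The function `classify_specialization` and the sample identifiers below are hypothetical, and the result describes only the sample data; the actual rule is a design decision that data alone cannot prove.

```python
def classify_specialization(supertype_ids, subtypes):
    """subtypes maps a subtype name to the set of supertype ids it contains.
    Returns (completeness, exclusivity) as observed in the data."""
    # Count the number of subtypes each supertype instance appears in.
    counts = {
        instance: sum(instance in members for members in subtypes.values())
        for instance in supertype_ids
    }
    completeness = "total" if all(n >= 1 for n in counts.values()) else "partial"
    exclusivity = "overlap" if any(n > 1 for n in counts.values()) else "disjoint"
    return completeness, exclusivity


# Hypothetical data: only video 102 is an educational video.
videos = {101, 102, 103}
video_subtypes = {"ED-VIDEO": {102}}
gb_video_result = classify_specialization(videos, video_subtypes)
```

Videos 101 and 103 link to no subtype instance, so this sample matches the partial specialization described for GB Video.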
Simplified ERD Rules (SERD)
- Omit the relationship diamonds
- Keep the relationship names
- List attributes in the entity box
- Replace composite attributes with the component attributes
- Replace many-to-many relationships with associative entities
- Replace multi-valued attributes with entities
A Simplified ERD or SERD (Figure 4.12)
- CUSTOMER: Member-No, Name, Street, City, State, Zip, Tel-No, Credit-Card-No, Expire-Date
- RENTAL: Rental-No, Date, Employee-No, Pay-Type
- VIDEO: Video-No, Title, Date-Acquired, Rent-Charge/Day, Vendor
- RENTAL/VIDEO: Rental-No, Video-No, Due-Date, Cost, Return-Date, Overdue-Charge
- Relationships: Makes, Contains, Held by
Model Types
- Conceptual: a conceptual data model presents data with no constraints from physical technologies
- Logical: a logical data model observes constraints within a technology class, for example, relational tables
- Physical: a physical data model follows the constraints of one specific technology, such as MS Access
Conceptual Data Model (CDM) (Figure 4.13), SERD format
- CUSTOMER: Cust-No, F-Name, L-Name, Ads1, Ads2, City, State, Zip, Tel-No, CC-No, Expire
- RENTAL: Rental-No, Date, Clerk-No, Pay-Type, CC-No, Expire, CC-Approval
- LINE: Line-No, Due-Date, Return-Date, OD-Charge, Pay-Type
- VIDEO: Video-No, One-Day-Fee, Extra-Days, Weekend
- TITLE: Title-No, Name, Vendor-No, Cost
- Relationships: Requestor of, Owner of, Holder of, Name for
Entity Metadata (Table 4.1)
Entity: Description
- CUSTOMER: Contains all the available information about each customer who has made a transaction in the last year
- LINE: Contains the information on each video associated with a rental transaction
- RENTAL: Contains the information on each rental transaction
- TITLE: Contains information on each distinct title of the videos
- VIDEO: Contains information on each individual video
Attribute Metadata for RENTAL (Table 4.1)
Attribute: Description
- Rental-No: Unique key assigned to each rental
- Date: Date of the rental
- Clerk-No: Employee number of the clerk entering the rental
- Pay-Type: Cash, check, or credit card
- CC-No: Credit card number
- Expire: Expiration date of the credit card
- CC-Approval: Credit card approval code
Relationship Metadata (Table 4.1)
Columns: Relationship Name; Description; Entity 1 with (min, max) cardinality; Entity 2 with (min, max) cardinality
- Requestor of: Links each customer to rentals made by the customer. CUSTOMER: a rental must be for one customer (1, 1). RENTAL: a customer may make many rentals (0, many).
- Owner of: Links each rental to the associative entity. RENTAL: a line must belong to one rental (1, 1). LINE: a rental must contain one or many lines (1, many).
- Holder of: Links the associative entity to a specific videotape. LINE: a video may be held by many lines (0, many). VIDEO: a line must hold one videotape (1, 1).
- Name for: Links a title to the videotapes that use the title. VIDEO: a title may name many videos (0, many). TITLE: a videotape must be named by one title (1, 1).
Enterprise Data Model (EDM) Rules
- The EDM provides an overview of the organizational area under study
- Associative entities are included if they represent an important "thing" in the organization
- Weak entities are omitted
- Attributes of the entities are not shown
- Relationships are shown with maximum cardinalities only and relationship phrases; relationship diamonds are omitted
- Many-to-many relationships are acceptable
Enterprise Data Model (EDM) (Figure 4.14)
- Entities: CUSTOMER, RENTAL, VIDEO, VENDOR, EMPLOYEE
- Relationships: Requests, Makes, Supplies, Contains
Logical Data Models
- Logical data models translate conceptual data models into a specific data storage structure
- Many information systems built between 1950 and 1980 used a logical structure of sequential or "flat" files stored physically on magnetic tape
- Today, the most common logical model for data storage is the relational model
- Many physical database implementations use the relational model
ERD and Corresponding Relational Tables (Figure 4.15)
ERD: CUSTOMER (Cust-No, L-Name, City, State) Makes RENTAL (Rental-No, Date, Clerk-No)
CUSTOMER table:
Cust-No  L-Name   City         State
23278    Clinton  Little Rock  AR
10995    Dole     Wichita      KS
22671    Kerry    Boston       MA
00987    Bush     Crawford     TX
RENTAL table (cell values shown as x placeholders):
Rental-No  Date  Clerk-No  Cust-No
Cust-No in RENTAL is the foreign key.
ERDs and Relational Models (Table 4.2)
E-R Model Terms and Concepts: Relational Model Terms and Concepts
- Entity (regular, weak or associative): Table or relation
- Single-valued attribute: Column or attribute
- Multi-valued attribute: (Not allowed)
- Instance: Row or tuple
- Primary key (or primary identifier): Primary key
- One-to-one or one-to-many relationship: Foreign key
- Many-to-many relationship: (Not allowed)
Rules for Relational Models
- Every table name and the full name of every column must be unique
- A column must have a single value for each row
- The meaning of a column is determined only by the name
- A row is defined only by the content of the information in the row
- The content of each row must be unique
Foreign Keys
- Foreign keys express the relationships between instances in different tables
- A foreign key may have any unique full name
- For 1:m relationships, the foreign key always goes in the table on the many side of the relationship
- For 1:1 relationships, the foreign key may appear in either table
- Relational tables do not allow for the implementation of m:n relationships
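A foreign key on the many side of a 1:m relationship can be sketched with Python's built-in `sqlite3` module. This is an illustration only, not the text's implementation: the lowercase table and column names, the rental rows, and the helper `try_insert_orphan_rental` are assumptions, while the customer row comes from Figure 4.15. `Cust-No` is stored as text so that a value such as 00987 keeps its leading zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    cust_no TEXT PRIMARY KEY,
    l_name  TEXT,
    city    TEXT,
    state   TEXT
);
-- RENTAL is on the many side of Makes, so it carries the foreign key.
CREATE TABLE rental (
    rental_no   INTEGER PRIMARY KEY,
    rental_date TEXT,
    clerk_no    INTEGER,
    cust_no     TEXT NOT NULL REFERENCES customer(cust_no)
);
""")

conn.execute("INSERT INTO customer VALUES ('23278', 'Clinton', 'Little Rock', 'AR')")
# Hypothetical rental row for an existing customer.
conn.execute("INSERT INTO rental VALUES (1, '2006-01-15', 7, '23278')")


def try_insert_orphan_rental():
    """Attempt to store a rental whose customer does not exist."""
    try:
        conn.execute("INSERT INTO rental VALUES (2, '2006-01-16', 7, '99999')")
    except sqlite3.IntegrityError:
        return False  # rejected by the foreign key constraint
    return True


orphan_accepted = try_insert_orphan_rental()
```

The second rental is rejected because its `cust_no` matches no row in `customer`.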
Relational Schema Representations (Figure 4.16)
Column heading schema:
CUSTOMER: Cust-No | L-Name | City | State
RENTAL: Rental-No | Date | Clerk-No | Cust-No
Set notation schema:
CUSTOMER(Cust-No, City, L-Name, State)
RENTAL(Rental-No, Cust-No, Date, Clerk-No)
Box schema:
CUSTOMER (Cust-No, L-Name, City, State) 1 to * RENTAL (Rental-No, Date, Cust-No, Clerk-No)
Rules for Converting ERDs to Relational Schemas (I)
- Convert all the many-to-many relationships, if any, to associative entities
- Convert all multi-valued attributes to entities
- Convert every entity to a table with a column for each attribute of the entity
- For every one-to-many relationship between two entities, add a foreign key to the table that corresponds to the entity on the many side of the relationship
Rules for Converting ERDs to Relational Schemas (II)
- For a one-to-one relationship between two entities, add or identify a foreign key in either of the tables
- Add a referential integrity arrow or line from each foreign key to the corresponding primary key
- With unary relationships, add the foreign key to the single table. The referential integrity arrow connects the primary and foreign key in the same table.
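Some of the conversion rules can be sketched as a small Python function that produces set-notation schemas. The function `to_relational_schema` and its input format are assumptions made for this example. It covers only binary 1:1, 1:m, and m:n relationships between distinct entities; multi-valued attributes, relationship attributes, and unary relationships are left out.

```python
def to_relational_schema(entities, relationships):
    """entities: {name: [attributes]}, with the primary key listed first.
    relationships: (entity_a, entity_b, kind) tuples, kind in '1:1', '1:m', 'm:n'.
    For '1:m', entity_a is the 'one' side and entity_b the 'many' side."""
    tables = {name: list(attrs) for name, attrs in entities.items()}
    for a, b, kind in relationships:
        if kind == "m:n":
            # Resolve m:n with an associative entity keyed by both primary keys.
            tables[f"{a}/{b}"] = [entities[a][0], entities[b][0]]
        else:
            # 1:m: the foreign key goes on the many side (b).
            # 1:1: either table is allowed; this sketch arbitrarily picks b.
            foreign_key = entities[a][0]
            if foreign_key not in tables[b]:
                tables[b].append(foreign_key)
    return {name: f"{name}({', '.join(cols)})" for name, cols in tables.items()}


# A cut-down GB Video model (attribute lists shortened for illustration).
gb_entities = {
    "CUSTOMER": ["Member-No", "Name"],
    "RENTAL": ["Rental-No", "Date"],
    "VIDEO": ["Video-No", "Title"],
}
gb_relationships = [("CUSTOMER", "RENTAL", "1:m"), ("RENTAL", "VIDEO", "m:n")]
gb_schema = to_relational_schema(gb_entities, gb_relationships)
```

In the result, RENTAL gains Member-No as a foreign key, and a new RENTAL/VIDEO table holds the two primary keys of the entities it connects.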
Relational Schema for GB Video (Figure 4.17), box schema format
- CUSTOMER: Member-No, Name, Street, City, State, Zip, Tel-No, Credit-Card-No, Expire-Date
- RENTAL: Rental-No, Member-No, Date, Employee-No, Pay-Type
- VIDEO: Video-No, Title, Date-Acquired, Vendor, Rent-Charge/Day
- RENTAL/VIDEO: Rental-No, Video-No, Due-Date, Cost, Return-Date, Overdue-Charge
Unary Relationships in Relational Tables (Figure 4.18)
ERD: POLICE OFFICER (ID#, Name, Rank) with the unary relationships Partner of and Commands
Relational table: POLICE OFFICER (ID#, Name, Rank, Partner#, Commander#)
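The POLICE OFFICER table can be sketched in SQLite through Python's `sqlite3` module, with both foreign keys pointing back at the same table. The column names, officer names, and ranks below are hypothetical; only the table structure follows Figure 4.18.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement
conn.execute("""
CREATE TABLE police_officer (
    id           INTEGER PRIMARY KEY,
    name         TEXT,
    officer_rank TEXT,
    partner_no   INTEGER REFERENCES police_officer(id),  -- Partner of
    commander_no INTEGER REFERENCES police_officer(id)   -- Commands
)""")

# Made-up officers; each foreign key value refers to a row inserted earlier.
conn.executemany(
    "INSERT INTO police_officer VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ames", "Captain", None, None),
        (2, "Baker", "Officer", None, 1),
        (3, "Cruz", "Officer", 2, 1),
    ],
)
# Complete the partnership in the other direction once both rows exist.
conn.execute("UPDATE police_officer SET partner_no = 3 WHERE id = 2")

# A self-join follows the Commander# foreign key back into the same table.
commanded_by = conn.execute("""
    SELECT o.name, c.name
    FROM police_officer AS o
    JOIN police_officer AS c ON o.commander_no = c.id
    ORDER BY o.id
""").fetchall()
```

The query pairs each officer with a commander by joining the table to itself, which is how a unary relationship is read back out of a single table.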
A Unary m:n Relationship in an ERD (Figure 4.19)
ERD representation for the bill of materials problem:
- Entity: PART (ID#, Description, Weight)
- Unary m:n relationship: Goes into or contains, with the attribute Quantity
Other Unary m:n Representations (Figures 4.20 & 4.21)
Associative entity representation:
- PART: ID#, Description, Cost
- PART/PART: Line#, Quantity
- Relationships: Contains, Goes into
Relational schema representation:
- PART: ID# | Description | Cost
- PART/PART: Line# | Assembly-Part# | Component-Part# | Quantity
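The PART/PART table can be walked recursively to answer a typical bill of materials question. The sketch below uses made-up parts and quantities; the function names `components_of` and `explode` are hypothetical, and the recursion assumes that no part contains itself, directly or indirectly.

```python
# PART/PART rows: (Line#, Assembly-Part#, Component-Part#, Quantity), made-up data.
part_part = [
    (1, "bike", "wheel", 2),
    (2, "bike", "frame", 1),
    (3, "wheel", "spoke", 32),
    (4, "wheel", "rim", 1),
]


def components_of(part):
    """'Contains' direction: direct components of an assembly with quantities."""
    return [(comp, qty) for _, assembly, comp, qty in part_part if assembly == part]


def explode(part, multiplier=1):
    """Total quantity of each lowest-level part needed to build `part`."""
    children = components_of(part)
    if not children:
        return {part: multiplier}  # a part with no components is a leaf
    totals = {}
    for component, quantity in children:
        for leaf, needed in explode(component, multiplier * quantity).items():
            totals[leaf] = totals.get(leaf, 0) + needed
    return totals


bike_parts = explode("bike")
```

One bike needs 2 wheels with 32 spokes each, so `bike_parts` reports 64 spokes, 2 rims, and 1 frame.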
Normalization
- Addresses the improvement of a logical data design to avoid possible problems with data duplication and with deletion and updating of data
- Converts an un-normalized table into two or more smaller, normalized tables
- Consists of six steps that successively transform a relation into First, Second, Third, Boyce/Codd, Fourth, and Fifth Normal Forms
Normalization Steps 1 to 3
1. First Normal Form: remove multi-valued attributes
2. Second Normal Form: remove partial dependencies, where the value of a non-key attribute depends on only part of the composite primary key
3. Third Normal Form: remove transitive dependencies, where a non-key attribute depends on another non-key attribute
Second Normal Form (Table 4.3 & Figure 4.22)
Second normal form violation:
- Rental-No | Video-No | Due-Date | Title
Tables in second normal form:
- Rental-No | Video-No | Due-Date
- Video-No | Title
Third Normal Form (Table 4.4 and Figure 4.23)
Third normal form violation:
- Video-No | Title-No | Vendor | Date-Acquired
Tables in third normal form:
- Video-No | Title-No | Date-Acquired
- Title-No | Vendor
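Both decompositions amount to projecting the offending table onto two smaller column sets and dropping duplicate rows. The Python sketch below shows the idea; the helper `project` and all row values (titles, vendor, dates) are made up for illustration, and only the column names come from the tables above.

```python
def project(rows, columns):
    """Project dict rows onto `columns`, keeping each distinct row once."""
    result = []
    for row in rows:
        projected = {column: row[column] for column in columns}
        if projected not in result:
            result.append(projected)
    return result


# 2NF violation: Title depends on Video-No alone, part of the composite key.
rental_video = [
    {"Rental-No": 1, "Video-No": 10, "Due-Date": "2006-01-05", "Title": "Casablanca"},
    {"Rental-No": 2, "Video-No": 10, "Due-Date": "2006-01-09", "Title": "Casablanca"},
    {"Rental-No": 2, "Video-No": 11, "Due-Date": "2006-01-09", "Title": "Vertigo"},
]
lines_2nf = project(rental_video, ["Rental-No", "Video-No", "Due-Date"])
titles_2nf = project(rental_video, ["Video-No", "Title"])

# 3NF violation: Vendor depends on Title-No, which is not the key.
video = [
    {"Video-No": 10, "Title-No": 100, "Vendor": "Acme", "Date-Acquired": "2005-03-01"},
    {"Video-No": 12, "Title-No": 100, "Vendor": "Acme", "Date-Acquired": "2005-06-01"},
]
videos_3nf = project(video, ["Video-No", "Title-No", "Date-Acquired"])
vendors_3nf = project(video, ["Title-No", "Vendor"])
```

After the split, each title and each vendor is stored once instead of being repeated on every rental line or every copy of a video.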
Structured Query Language (SQL)
- A programming language for relational databases that exists at both the logical and physical level
- SQL provides commands for a:
  - Data Definition Language (DDL) to create, alter, and drop tables
  - Data Manipulation Language (DML) to insert, update, delete, and retrieve data
  - Data Control Language (DCL) to grant and revoke access privileges for a database
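The DDL and DML command groups can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch with a hypothetical `video` table and made-up rows. SQLite has no user accounts and does not implement GRANT or REVOKE, so the Data Control Language is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create and alter a table.
conn.execute("CREATE TABLE video (video_no INTEGER PRIMARY KEY, title TEXT)")
conn.execute("ALTER TABLE video ADD COLUMN vendor TEXT")

# DML: insert, update, delete, and retrieve rows (made-up data).
conn.execute("INSERT INTO video (video_no, title, vendor) VALUES (1, 'Casablanca', 'Acme')")
conn.execute("INSERT INTO video (video_no, title, vendor) VALUES (2, 'Vertigo', 'Acme')")
conn.execute("UPDATE video SET vendor = 'Zenith' WHERE video_no = 2")
conn.execute("DELETE FROM video WHERE video_no = 1")
remaining = conn.execute("SELECT video_no, title, vendor FROM video").fetchall()
conn.commit()

# DDL again: drop the table.
conn.execute("DROP TABLE video")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

After these statements, `remaining` holds the single updated row and `tables_left` is empty.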
Dimensional Data Models
- Basic idea: use static data generated by operations to gain insight for strategic and tactical decisions
- The typical dimensional model data structure is a data mart designed around a central fact table that contains numeric values for analysis
- The data mart model supplies the framework for creating a data warehouse
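A central fact table surrounded by descriptive dimension tables can be sketched in SQLite through Python's `sqlite3` module. Everything below is an assumption made for illustration: the table names (`rental_fact`, `store_dim`, `date_dim`), the measures, and the figures are not taken from the GB Video case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the context of each fact.
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE date_dim  (date_key  INTEGER PRIMARY KEY, month TEXT);

-- The central fact table holds the numeric values for analysis.
CREATE TABLE rental_fact (
    store_key    INTEGER REFERENCES store_dim(store_key),
    date_key     INTEGER REFERENCES date_dim(date_key),
    rental_count INTEGER,
    revenue      REAL
);

INSERT INTO store_dim VALUES (1, 'Boston'), (2, 'Wichita');
INSERT INTO date_dim VALUES (1, '2006-01'), (2, '2006-02');
INSERT INTO rental_fact VALUES (1, 1, 10, 30.0), (1, 2, 12, 36.0), (2, 1, 5, 15.0);
""")

# Summarize the numeric facts along one dimension.
revenue_by_city = conn.execute("""
    SELECT s.city, SUM(f.revenue)
    FROM rental_fact AS f
    JOIN store_dim AS s ON f.store_key = s.store_key
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
```

The query totals revenue per city; grouping by `date_dim.month` instead would give the same facts summarized by time.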