Database Management Systems: Entity-Relationship Model
Database Design: Why do we need it? To agree on the structure of the database before deciding on a particular implementation. Consider issues such as: what entities to model, how entities are related, what constraints exist in the domain, and how to achieve good designs.
Database Design Formalisms: 1. Object Definition Language (ODL): closer in spirit to object-oriented models. 2. Entity/Relationship model (E/R): more relational in nature. Both can be translated (semi-automatically) to relational schemas. ODL to OO schema: a direct transformation (for C++- or Smalltalk-based systems).
Purpose of E/R Model: The E/R model allows us to sketch the design of a database informally. Designs are pictures called entity-relationship diagrams. There are fairly mechanical ways to convert E/R diagrams to real implementations, such as relational databases.
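Entity / Relationship Diagrams: entities, attributes, and relationships between entities. [Figure: E/R diagram with entity sets Product (name, category, price), Company (name, stockprice), and Person (name, ssn, address), and relationships makes (Company and Product), employs (Company and Person), and buys (Person and Product).]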
University Example: A college contains many departments. Each department can offer any number of courses. Many instructors can work in a department. An instructor can work in only one department. For each department there is a Head. An instructor can be head of only one department. Each instructor can teach any number of courses. A course can be taught by only one instructor. A student can enroll in any number of courses. Each course can have any number of students.
Modeling: A database can be modeled as a collection of entities and relationships among entities. An entity is an object that exists and is distinguishable from other objects. Example: person, company, event, plant. Entities have attributes. Example: people have names and addresses. An entity set is a set of entities of the same type that share the same properties. Example: the set of all persons, companies, trees, holidays.
Entity Sets instructor and student [Figure: entity set instructor (instructor_ID, instructor_name) and entity set student (student_ID, student_name).]
ER Case Study: Banks Database. Each bank has a unique name. Each branch has a number, a name, an address (number, street, city), and a set of phones. Each customer has a name, a set of addresses (P.O. Box, city, zip code, country), a set of phones, and a social security number. Accounts have numbers, types (e.g., saving, checking), and a balance. Different branches might use the same number for their accounts, so to identify an account uniquely we need both the number of the branch to which the account belongs and the account number. Bank customers need not own accounts, and a customer may have at most 5 accounts in the bank. Each account belongs to exactly one customer. A customer may have accounts in several different branches.
Relationship Sets: A relationship is an association among several entities. Example: student entity 44553 (Ahmed), relationship set advisor, instructor entity 22222 (Hassan). A relationship set is a mathematical relation among $n \geq 2$ entities, each taken from entity sets: $\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$, where $(e_1, e_2, \ldots, e_n)$ is a relationship. Example: $(44553, 22222) \in \textit{advisor}$.
Relationship Set advisor
Relationship Sets (Cont.): An attribute can also be a property of a relationship set. For instance, the advisor relationship set between entity sets instructor and student may have the attribute date, which tracks when the student started being associated with the advisor.
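What is a Relation? A mathematical definition: if A and B are sets, then a relation R is a subset of $A \times B$. Example: $A = \{1, 2, 3\}$, $B = \{a, b, c, d\}$, $R = \{(1, a), (1, c), (3, b)\}$. Likewise, makes is a subset of $\textit{Product} \times \textit{Company}$. [Figure: the pairs of R drawn as lines between the elements of A and B, and the makes relationship between Product and Company.]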
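The definition can be checked mechanically. A minimal Python sketch, assuming Python sets stand in for the sets A and B; the helper name `is_relation` is hypothetical:

```python
from itertools import product

# The sets and the relation from the example above.
A = {1, 2, 3}
B = {"a", "b", "c", "d"}
R = {(1, "a"), (1, "c"), (3, "b")}

def is_relation(r, left, right):
    """Return True if r is a subset of the Cartesian product left x right."""
    return r <= set(product(left, right))

print(len(set(product(A, B))))        # 12 possible pairs
print(is_relation(R, A, B))           # True
print(is_relation({(4, "a")}, A, B))  # False: 4 is not an element of A
```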
Identifying Relationship: Relationships indicate a meaningful connection between two entity types. Relationships may have attributes, but they cannot have key attributes. An identifying relationship connects a weak entity type to another (owner) entity type; it indicates where the weak entity gets the key attributes that complete its own partial key. [Figure: WorksOn (relationship) and DependentOf (identifying relationship).]
Multiplicity of E/R Relations: one-one, many-one, many-many. [Figure: one mapping between {1, 2, 3} and {a, b, c, d} for each multiplicity.]
Mapping Cardinalities: one to one; one to many. Note: some elements in A and B may not be mapped to any elements in the other set.
Mapping Cardinalities: many to one; many to many. Note: some elements in A and B may not be mapped to any elements in the other set.
One-to-One Relationship: a one-to-one relationship between an instructor and a student means that an instructor is associated with at most one student via advisor, and a student is associated with at most one instructor via advisor.
One-to-Many Relationship: a one-to-many relationship between an instructor and a student means that an instructor is associated with several (including 0) students via advisor, and a student is associated with at most one instructor via advisor.
Many-to-Many Relationship: an instructor is associated with several (possibly 0) students via advisor, and a student is associated with several (possibly 0) instructors via advisor.
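The cases differ only in how many partners an entity on each side may have, so an instance of a binary relationship set can be classified by counting. A minimal sketch, assuming advisor is stored as a set of (instructor_ID, student_ID) pairs; the function name and the IDs other than 22222 and 44553 are hypothetical. A sample instance can only show which constraints it violates; the cardinality itself is a design decision about all possible instances.

```python
from collections import Counter

def cardinality(pairs):
    """Classify a set of (a, b) pairs as one-to-one, one-to-many,
    many-to-one, or many-to-many (reading from the A side to the B side)."""
    per_a = Counter(a for a, _ in pairs)  # number of B partners of each a
    per_b = Counter(b for _, b in pairs)  # number of A partners of each b
    a_has_many = any(n > 1 for n in per_a.values())
    b_has_many = any(n > 1 for n in per_b.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"

# (instructor_ID, student_ID) pairs of the advisor relationship set
print(cardinality({(22222, 44553)}))                                  # one-to-one
print(cardinality({(22222, 44553), (22222, 12345)}))                  # one-to-many
print(cardinality({(22222, 44553), (10101, 44553), (10101, 12345)}))  # many-to-many
```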
Alternative Notation for Cardinality Limits: Cardinality limits can also express participation constraints.
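One common form of this notation writes a limit l..h on the line between an entity set and a relationship set: each entity must take part in at least l and at most h relationships, and * means no upper limit; a lower limit of at least 1 expresses total participation. A minimal sketch of checking such limits against sample data, assuming the pairs are (instructor_ID, student_ID) and, for illustration only, the limits 0..* for instructor and 1..1 for student; the function name and the IDs are hypothetical:

```python
from collections import Counter

def limit_violations(entities, pairs, side, low, high=None):
    """Return the entities whose number of appearances in position `side`
    (0 or 1) of the relationship pairs lies outside low..high.
    high=None stands for '*', i.e. no upper limit."""
    counts = Counter(pair[side] for pair in pairs)
    return {e for e in entities
            if counts[e] < low or (high is not None and counts[e] > high)}

instructors = {22222, 10101}
students = {44553, 12345, 98765}
advisor = {(22222, 44553), (22222, 12345)}  # (instructor_ID, student_ID)

print(limit_violations(instructors, advisor, side=0, low=0))       # set()
print(limit_violations(students, advisor, side=1, low=1, high=1))  # {98765}
```

Student 98765 has no advisor, so it breaks the 1..1 limit; every instructor satisfies 0..*.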
Multi-way Relationships: How do we model a purchase relationship between buyers, products, and stores? [Figure: the relationship Purchase connecting Product, Person, and Store.] We can still model it as a mathematical set (how?).
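One answer, sketched in Python: Purchase is a subset of the Cartesian product Person × Product × Store, i.e., a set of triples. The sketch also shows why the triples cannot simply be replaced by two binary relationships (who buys which product, which store sells which product): joining the two projections back together can produce purchases that never happened. All names and data are hypothetical.

```python
from itertools import product

persons = {"Ahmed", "Hassan"}
products = {"gizmo", "widget"}
stores = {"Downtown", "Mall"}

# Each triple records that a person bought a product at a store.
purchase = {("Ahmed", "gizmo", "Downtown"), ("Hassan", "gizmo", "Mall")}
assert purchase <= set(product(persons, products, stores))

# Project onto two binary relationships ...
buys = {(person, prod) for person, prod, _ in purchase}
sells = {(prod, store) for _, prod, store in purchase}

# ... and join them back on the product.
rejoined = {(person, prod, store)
            for person, prod in buys
            for prod2, store in sells if prod == prod2}

print(len(rejoined))                           # 4
print(("Ahmed", "gizmo", "Mall") in purchase)  # False
print(("Ahmed", "gizmo", "Mall") in rejoined)  # True
```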
Relational Roles: It is sometimes convenient to name an entity's role in a relationship. Role names are particularly useful in recursive relationships, where they remove ambiguity about the direction of the relationship. [Figure: EMPLOYEE related to itself through Supervision, in the roles supervisor and supervisee.]
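A recursive relationship set stores pairs drawn from the same entity set, so the role names are what tell the two positions apart. A minimal sketch in which a `NamedTuple` gives the positions the role names supervisor and supervisee; the employee names and helper functions are hypothetical:

```python
from typing import NamedTuple

class Supervision(NamedTuple):
    supervisor: str  # EMPLOYEE in the supervisor role
    supervisee: str  # EMPLOYEE in the supervisee role

supervision = {
    Supervision(supervisor="Hassan", supervisee="Ahmed"),
    Supervision(supervisor="Hassan", supervisee="Omar"),
    Supervision(supervisor="Omar", supervisee="Sara"),
}

def supervisees_of(employee):
    return {s.supervisee for s in supervision if s.supervisor == employee}

def supervisors_of(employee):
    return {s.supervisor for s in supervision if s.supervisee == employee}

print(sorted(supervisees_of("Hassan")))  # ['Ahmed', 'Omar']
print(sorted(supervisors_of("Sara")))    # ['Omar']
```

Omar appears in both roles, which is exactly the situation where unnamed positions would be ambiguous.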
Roles in Relationships: What if we need an entity set twice in one relationship? [Figure: Purchase relating Product, Store, and Person, where Person appears twice, in the roles buyer and salesperson.]
Data Modeling Case Study: The following is a description by a pharmacy owner: “Ahmed Hassan catches a cold and what he suspects is a flu virus. He makes an appointment with his family doctor, who confirms his diagnosis. The doctor prescribes an antibiotic and nasal decongestant tablets. Ahmed leaves the doctor's office and drives to his local drug store. The pharmacist packages the medication and types the labels for the pill bottles. The label includes information about the customer, the doctor who prescribed the drug, the drug (e.g., Penicillin), when to take it and how often, the content of the pill (250 mg), the number of refills, the expiration date, and the date of purchase.” Please develop a data model for the entities and relationships within the context of a pharmacy.
Attributes: An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set. Example: instructor = (ID, name, street, city, salary), course = (course_id, title, credits). Domain: the set of permitted values for each attribute. Attribute types: simple and composite attributes; single-valued and multivalued attributes (example of a multivalued attribute: phone_numbers); derived attributes, which can be computed from other attributes (example: age, given date_of_birth).
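The attribute types above map naturally onto a class definition. A minimal Python sketch, assuming street and city are grouped into a composite address and that date_of_birth is stored so that age can be derived; the field values are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:
    """Composite attribute: a group of simple attributes with no key of its own."""
    street: str
    city: str

@dataclass
class Instructor:
    ID: str              # simple, single-valued (the key)
    name: str            # simple, single-valued
    address: Address     # composite
    date_of_birth: date  # stored, used to derive age
    salary: int
    phone_numbers: set[str] = field(default_factory=set)  # multivalued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

hassan = Instructor("10101", "Hassan", Address("Main St", "Cairo"),
                    date(1980, 6, 15), 65000, {"555-0100", "555-0101"})
print(hassan.age(date(2024, 6, 14)))  # 43
print(hassan.age(date(2024, 6, 15)))  # 44
```

Computing age on demand rather than storing it keeps it from going stale.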
Sets and Derived Attributes: Multivalued attributes are drawn as double-lined ovals; multivalued means set-valued, i.e., there may be more than one value for the attribute. Derived attributes are drawn as dashed ovals; the attribute is computed from other data. [Figure: Locations (multivalued attribute) and NumEmployees (derived attribute).]
Composite Attributes: A composite attribute is a tree composed of other attributes. It is used for a set of related attributes when the set is not a conceptual entity; the composite does not have identity, that is, it does not have a key. [Figure: composite attribute Address composed of Street, City, State, and ZipCode.]
Composite Attributes
Attributes on Relationships [Figure: the Purchase relationship among Product, Person, and Store, with the attribute date.]
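Converting Multi-way Relationships to Binary [Figure: Purchase becomes an entity set with the attribute date, connected to Product, Store, and Person through the binary relationships ProductOf, StoreOf, and BuyerOf.]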
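A minimal Python sketch of the conversion, assuming a hypothetical surrogate key purchase_id for the new Purchase entity set. Because ProductOf, StoreOf, and BuyerOf are many-one, each can be stored as a mapping from a purchase to exactly one product, store, or person, and joining the three on purchase_id recovers the original triples. The data are hypothetical.

```python
from typing import NamedTuple

class Purchase(NamedTuple):
    purchase_id: int  # hypothetical surrogate key for the new entity set
    date: str         # the attribute that was on the relationship

purchases = [Purchase(1, "2024-01-05"), Purchase(2, "2024-01-06")]

# Many-one binary relationships: each purchase maps to exactly one partner.
product_of = {1: "gizmo", 2: "widget"}
store_of = {1: "Downtown", 2: "Mall"}
buyer_of = {1: "Ahmed", 2: "Hassan"}

# Joining the three relationships on purchase_id rebuilds the triples.
triples = {(buyer_of[p.purchase_id], product_of[p.purchase_id],
            store_of[p.purchase_id]) for p in purchases}
print(sorted(triples))
# [('Ahmed', 'gizmo', 'Downtown'), ('Hassan', 'widget', 'Mall')]
```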
ER Case Study: Television Series Database. A television network wishes to create a database to keep track of its TV series. A television series has one or more episodes. A television series is identified by its name and season number, and also records its production company name and Num_of_Episodes (i.e., the total number of episodes in a specific season of a series). An episode of a specific season of a series is identified by its episode number and has a title and a length. No episode can exist without a corresponding television series. Each episode has only one writer. A writer is identified by name and also has a birth date and a literary agency that represents him or her. An actor appears as a performer in a television series or as a guest star in an episode. An actor is identified by name and also has a nationality and a birth date. An actor plays a particular character in a television series or episode.
Keys in E/R Diagrams: Every entity set must have a key. [Figure: entity set Product with attributes name, category, and price.]
Attributes and Keys: Key attributes must be unique for each entity. Keys are used to identify particular entities. Partial keys are unique only among the weak entities that share the same owner; they are used for weak entity types. [Figure: notation for an attribute (Age), a key attribute (SSN), and a partial key attribute (Date).]
Keys: A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity. A candidate key of an entity set is a minimal super key: ID is a candidate key of instructor, and course_id is a candidate key of course. Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
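The definitions can be tested against stored data. A minimal Python sketch; the instance, the dept and salary attributes, and the helper names are hypothetical. A sample instance can only rule a key out: on this data (name, dept) also looks like a candidate key, but whether it really is one is a decision about the domain, not about the rows that happen to be stored.

```python
from itertools import combinations

# A hypothetical instance of the instructor entity set.
instructor = [
    {"ID": "10101", "name": "Hassan", "dept": "CS", "salary": 65000},
    {"ID": "12121", "name": "Ahmed", "dept": "CS", "salary": 90000},
    {"ID": "15151", "name": "Hassan", "dept": "Music", "salary": 65000},
    {"ID": "32343", "name": "Ahmed", "dept": "Music", "salary": 65000},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    values = {tuple(row[a] for a in attrs) for row in rows}
    return len(values) == len(rows)

def candidate_keys(rows, attributes):
    """Minimal superkeys: smaller sets are tried first, and any superset
    of a key already found is skipped."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_superkey(rows, attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys

print(is_superkey(instructor, ("ID", "name")))  # True, but not minimal
print(candidate_keys(instructor, ["ID", "name", "dept", "salary"]))
# [('ID',), ('name', 'dept')]
```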
Keys in E/R Diagrams: Key attributes are underlined. [Figure: Product (name, category, price) and Person (name, ssn, address), each with its key attributes underlined.] There is no formal way to specify multiple keys in E/R diagrams.
Entity With Composite, Multivalued, and Derived Attributes
E-R Diagrams: Rectangles represent entity sets. Diamonds represent relationship sets. Attributes are listed inside the entity rectangle. Underlining indicates primary key attributes.
Relationship Sets with Attributes
Entity Set to Relation: the entity set Product (name, category, price) becomes the relation Product(name, category, price). Example tuple:
name    category   price
gizmo   gadgets    $19.99
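A minimal sketch of this translation using Python's built-in sqlite3 module, assuming name is the key of Product and storing price as a REAL for simplicity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity set -> table; attributes -> columns; key attribute -> primary key.
conn.execute("""
    CREATE TABLE Product (
        name     TEXT PRIMARY KEY,
        category TEXT,
        price    REAL
    )
""")
# Each entity becomes one row (tuple) of the relation.
conn.execute("INSERT INTO Product VALUES (?, ?, ?)", ("gizmo", "gadgets", 19.99))
rows = conn.execute("SELECT name, category, price FROM Product").fetchall()
print(rows)  # [('gizmo', 'gadgets', 19.99)]
```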
Classroom Design Exercise: Imagine we are creating a database for a dorm, which includes a cooperative kitchen. We want to record certain information about each resident. What? Not all residents belong to the kitchen coop. Those that do interact in various ways: they take turns at various jobs: preparer, cleanup, buyer (for supplies). No one should have two jobs on one day. They may or may not be vegetarian. Each meal must have at least one vegetarian entrée. They pay fees to the coop. For each meal, there is a menu. Each menu item requires certain ingredients, which must be on hand.
Constraints in E/R Diagrams: Finding constraints is part of the modeling process. Commonly used constraints: Keys: a social security number uniquely identifies a person. Single-value constraints: a person can have only one father. Referential integrity constraints: if you work for a company, it must exist in the database. Other constraints: people's ages are between 0 and 150.
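Each kind of constraint has a direct counterpart in SQL. A minimal sketch with Python's sqlite3 module; the table and column names and the sample rows are hypothetical, and the single-value constraint is expressed simply by making father one column that holds one value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Company (name TEXT PRIMARY KEY);
    CREATE TABLE Person (
        ssn      TEXT PRIMARY KEY,                       -- key
        name     TEXT,
        age      INTEGER CHECK (age BETWEEN 0 AND 150),  -- other constraint
        father   TEXT REFERENCES Person(ssn),            -- single value: one column
        employer TEXT REFERENCES Company(name)           -- referential integrity
    );
    INSERT INTO Company VALUES ('Acme');
""")

def try_insert(row):
    """Insert a Person row and report whether SQLite accepted it."""
    try:
        conn.execute("INSERT INTO Person VALUES (?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(("111", "Ahmed", 30, None, "Acme")),     # accepted
    try_insert(("111", "Hassan", 40, None, "Acme")),    # duplicate key
    try_insert(("222", "Sara", 200, None, "Acme")),     # age out of range
    try_insert(("333", "Omar", 25, None, "NoSuchCo")),  # unknown company
]
print(results)  # ['ok', 'rejected', 'rejected', 'rejected']
```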
Other Constraints [Figure: the makes relationship between Product and Company, with the constraint <100 on one of its edges.] What does this mean?
Entity Types: Entity types are drawn as boxes; weak entity types as double boxes. [Figure: EMPLOYEE (entity) and DEPENDENT (weak entity).]
Weak Entity Sets: Entity sets are weak when their key comes from other classes to which they are related. [Figure: weak entity set Team (sport, number) connected to University (name) through the supporting relationship affiliation.]
Weak Entity Sets: Sometimes an entity set (E.S.) E's key comes not (completely) from its own attributes, but from the keys of one or more E.S.'s to which E is linked by a supporting many-one relationship. Such an E is called a weak E.S. It is represented by a double rectangle around E and a double diamond around each supporting relationship. The many-one-ness of the supporting relationship (which includes 1-1) is essential: with many-many, we would not know which entity provided the key value. “Exactly one” is also essential, or else we might not be able to extract key attributes by following the supporting relationship.
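A minimal Python sketch of how a weak entity's key is assembled, using the EMPLOYEE/DEPENDENT example from this section; the field names eid and name and the data are hypothetical. The owner field plays the role of the supporting many-one relationship DependentOf: every dependent has exactly one owner, so the owner's key can always be found and combined with the partial key.

```python
from typing import NamedTuple

class Employee(NamedTuple):
    eid: str   # key of the owner (strong) entity set
    name: str

class Dependent(NamedTuple):
    name: str        # partial key: unique only among one employee's dependents
    age: int
    owner: Employee  # DependentOf: exactly one owner per dependent

def dependent_key(d: Dependent) -> tuple:
    """Full key of a weak entity = owner's key + partial key."""
    return (d.owner.eid, d.name)

ahmed = Employee("E1", "Ahmed")
hassan = Employee("E2", "Hassan")
dependents = [Dependent("Sara", 5, ahmed),
              Dependent("Sara", 7, hassan),
              Dependent("Omar", 3, ahmed)]

print(len({d.name for d in dependents}))            # 2: names alone collide
print(len({dependent_key(d) for d in dependents}))  # 3: full keys are unique
```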
Example: Logins (Email Addresses). Login name = user name + host name, e.g., ark@soe.ucsc.edu. A “login” entity corresponds to a user name on a particular host, but the passwd table doesn't record the host, just the user name, e.g., ark. Key for a login = the user name at the host (which is unique for that host only) + the IP address of the host (which is unique globally). [Figure: weak entity set Logins (name) connected to Hosts (name) through the supporting relationship @.]
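Example Schema [Figure: EMPLOYEE (EID, Name, Phone) connected to PROJECT (Name, Budget) through WorksOn, which has the attribute StartDate; weak entity DEPENDENT (Name, Age) connected to EMPLOYEE through the identifying relationship DependentOf.]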
Design Principles: Avoid Redundancy. Setting: the client has a (possibly vague) idea of what he/she wants. You must design a database that represents these thoughts and only these thoughts. Avoiding redundancy means not saying the same thing more than once; redundancy wastes space and encourages inconsistency. Example, Good: [Figure: entity set product (name) connected through ManfBy to entity set Manfs (name, addr).]
Example, Bad: [Figure: product with attributes name, manf, and manf addr] repeats the manufacturer's address for each product it manufactures. Bad: [Figure: product (name, manf) connected through ManfBy to Manfs (name, addr)] states the manufacturer's name twice.
Entity Sets vs. Attributes: You may be unsure which concepts are worthy of being entity sets and which are handled more simply as attributes. This is especially tricky for the class design project, since there is a temptation to create needless entity sets to make the project “larger.” Wrong: [Figure: Cars (name) connected through ManfBy to Manfs (name).] Right: [Figure: Cars with attributes name and manf.]
Intuitive Rule for E.S. vs. Attribute: Make an entity set only if it either: is more than a name of something, i.e., it has nonkey attributes or relationships with a number of different entity sets; or is the “many” in a many-one relationship.
Don't Overuse Weak E.S.: There is a tendency to feel that no E.S. has its entities uniquely determined without following some relationships. However, in practice, we almost always create unique IDs to compensate: social security numbers, VINs, etc. The only times weak E.S.'s seem necessary are when: we can't easily create such IDs, e.g., no one is going to accept a “species ID” as part of the standard nomenclature (species is a weak E.S. supported by membership in a genus); or there is no global authority to create them, e.g., crews and studios.
Notation Summary
Equivalent Schema defined in UML
Design an ER schema for the following enterprise:
University Example: A college contains many departments. Each department can offer any number of courses. Many instructors can work in a department. An instructor can work in only one department. For each department there is a Head. An instructor can be head of only one department. Each instructor can teach any number of courses. A course can be taught by only one instructor. A student can enroll in any number of courses. Each course can have any number of students.
Steps in ER Modeling. Step 1: Identify the entities: DEPARTMENT, STUDENT, COURSE, INSTRUCTOR.
Steps in ER Modeling. Step 2: Find the relationships. One course is enrolled in by multiple students and one student enrolls in multiple courses; hence the cardinality between course and student is many to many. The department offers many courses and each course belongs to only one department; hence the cardinality between department and course is one to many. One department has multiple instructors and one instructor belongs to one and only one department; hence the cardinality between department and instructor is one to many. For each department there is a “Head of Department,” and an instructor can be head of only one department; hence the cardinality between department and head (instructor) is one to one. One course is taught by only one instructor, but an instructor teaches many courses; hence the cardinality between course and instructor is many to one.
Steps in ER Modeling. Step 3: Identify the key attributes. Deptname is the key attribute for the “Department” entity, as it identifies the department uniquely. Course# (CourseId) is the key attribute for the “Course” entity. Student# (Student Number) is the key attribute for the “Student” entity. Instructor Name is the key attribute for the “Instructor” entity. Step 4: Identify other relevant attributes. For the department entity, the relevant attribute is location. For the course entity: course name, duration, prerequisite. For the instructor entity: room#, telephone#. For the student entity: student name, date of birth.
Steps in ER Modeling. Step 5: Draw the complete E-R diagram with all attributes, including the primary key.
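Steps 1 to 4 can be carried through to a relational schema. A minimal sketch using Python's sqlite3 module; table and column names are hypothetical spellings of the attributes above, and the sample rows are invented. Each many-one relationship becomes a foreign key on the "many" side, the one-to-one head relationship becomes a UNIQUE foreign key, and the many-to-many enrollment gets its own table. Because Department and Instructor reference each other, head is left nullable and filled in after the instructor exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Department (
        dept_name TEXT PRIMARY KEY,
        location  TEXT,
        head      TEXT UNIQUE REFERENCES Instructor(instructor_name)  -- 1:1 head
    );
    CREATE TABLE Instructor (
        instructor_name TEXT PRIMARY KEY,
        room            TEXT,
        telephone       TEXT,
        dept_name       TEXT NOT NULL REFERENCES Department(dept_name)  -- one dept only
    );
    CREATE TABLE Course (
        course_id       TEXT PRIMARY KEY,
        course_name     TEXT,
        duration        TEXT,
        prerequisite    TEXT,
        dept_name       TEXT REFERENCES Department(dept_name),       -- offered by
        instructor_name TEXT REFERENCES Instructor(instructor_name)  -- taught by
    );
    CREATE TABLE Student (
        student_no    TEXT PRIMARY KEY,
        student_name  TEXT,
        date_of_birth TEXT
    );
    CREATE TABLE Enrollment (  -- many-to-many between Student and Course
        student_no TEXT REFERENCES Student(student_no),
        course_id  TEXT REFERENCES Course(course_id),
        PRIMARY KEY (student_no, course_id)
    );
    INSERT INTO Department VALUES ('CS', 'Building A', NULL), ('Math', 'Building B', NULL);
    INSERT INTO Instructor VALUES ('Hassan', '101', '555-0100', 'CS');
    UPDATE Department SET head = 'Hassan' WHERE dept_name = 'CS';
    INSERT INTO Course VALUES ('CS101', 'Databases', 'one term', NULL, 'CS', 'Hassan');
    INSERT INTO Student VALUES ('S1', 'Ahmed', '2003-01-01'), ('S2', 'Sara', '2002-05-09');
    INSERT INTO Enrollment VALUES ('S1', 'CS101'), ('S2', 'CS101');
""")

try:  # UNIQUE(head): an instructor may head only one department
    conn.execute("UPDATE Department SET head = 'Hassan' WHERE dept_name = 'Math'")
    second_head_allowed = True
except sqlite3.IntegrityError:
    second_head_allowed = False

enrolled = conn.execute(
    "SELECT COUNT(*) FROM Enrollment WHERE course_id = 'CS101'").fetchone()[0]
print(second_head_allowed, enrolled)  # False 2
```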
E-R Diagram for a University Enterprise
Case Study 2: Design a DB representing cities, counties, and states in the US. For states, record the name, population, and state capital (a city). For counties, record the name, the population, and the state in which the county is located. For cities, record the name, the population, and the state and county in which the city is located. Uniqueness assumptions: Names of states are unique. Names of counties are unique within a state (e.g., 26 states have Washington Counties). Cities are unique only within a state (e.g., there are 24 Springfields among the 50 states). Some counties and cities have the same name, even within a state (e.g., Los Angeles). All cities are located within a single county.
Design 1: bad [Figure: entity set cities (Ci. name, Ci. Popu., Co. name, Co. Popu.) connected to states (name, Popu.) through Located and capital.] Problem: county population is repeated for each city.
Design 2: good [Figure: counties (Co. name, Co. Popu.) connected to states (name, Popu.) through Located; cities (Ci. name, Ci. Popu.) connected to counties through Belongs-to; states connected to cities through capitals.]
Case Study 3: Design a DB consistent with the following facts. Trains are either local trains or express trains, but never both. A train has a unique number and an engineer. Stations are either express stops or local stops, but never both. A station has a unique name and an address. All local trains stop at all stations. Express trains stop only at express stations. For each train and each station the train stops at, there is a time.
Design 1: bad [Figure: trains (number, engineer, type) connected to stations (name, addr, type) through StopsAt, which has the attribute time.] Problem: this design does not capture the constraints that express trains stop only at express stations and that local trains stop at all stations.
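Because Design 1 records the kind of train or station only as a type attribute, nothing in the schema stops an instance from breaking these rules; they would have to be checked separately. A minimal Python sketch of such a check; the function name, train numbers, station names, and times are all hypothetical:

```python
def violations(trains, stations, stops_at):
    """trains: number -> 'local' | 'express'; stations: name -> 'local' | 'express';
    stops_at: (train number, station name) -> time. Returns rule violations."""
    problems = []
    for train, station in stops_at:
        if trains[train] == "express" and stations[station] != "express":
            problems.append(f"express train {train} stops at local station {station}")
    for train, kind in trains.items():
        if kind == "local":
            for station in stations:
                if (train, station) not in stops_at:
                    problems.append(f"local train {train} skips station {station}")
    return problems

trains = {"101": "local", "202": "express"}
stations = {"Central": "express", "Elm St": "local"}
stops_at = {("101", "Central"): "08:00", ("101", "Elm St"): "08:10",
            ("202", "Central"): "08:05"}

print(violations(trains, stations, stops_at))  # []

bad = dict(stops_at)
bad[("202", "Elm St")] = "08:15"  # express train at a local station
del bad[("101", "Elm St")]        # local train skipping a station
print(violations(trains, stations, bad))
```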
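Design 2: good [Figure: separate entity sets local trains (Lnumber, engineer), express trains (Enumber, engineer), local stations (Lname, address), and express stations (Ename, address), connected by the relationships StopsAt1 and StopsAt2, each with the attribute time.]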
Case Study 4 (Pine Valley Furniture Company): 1. The company sells a number of different furniture products. These products are grouped into several product lines. The identifier for a product is Product_ID, while the identifier for a product line is Product_Line_ID. Referring to the customer invoice, we identify the following additional attributes for product: Product_Description, Product_Finish, and Unit_Price. Another attribute for product line is Product_Line_Name. A product line may group any number of products, but must group at least one product. Each product must belong to exactly one product line.
Case Study 4: 2. Customers submit orders for products. The identifier for an order is Order_ID, and another attribute is Order_Date. A customer may submit any number of orders, but need not submit any orders. Each order is submitted by exactly one customer. The identifier for a customer is Customer_ID. Other attributes include Customer_Name and Customer_Address.
Case Study 4: 3. A given customer order must request at least one product. A product sold by Pine Valley Furniture might not be requested on any order, or may be requested on one or more orders. An attribute associated with each order and product is Quantity, which is the number of units requested.
Case Study 4: 4. Pine Valley Furniture has established sales territories for its customers. Each customer does business in one or more of these sales territories. The identifier for a sales territory is Territory_ID. A sales territory may have any number of customers, or may not have any customers doing business.
Case Study 4: 5. Pine Valley Furniture Company has several salespersons. The identifier for a salesperson is Salesperson_ID. Other attributes include Salesperson_Name, Salesperson_Telephone, and Salesperson_Fax. A salesperson serves exactly one sales territory. Each sales territory is served by one or more salespersons.
Case Study 4: 6. Each product is assembled from one or more raw materials. The identifier for the raw material entity is Material_ID. Other attributes include Unit_of_Measure and Unit_Price. Each raw material may be assembled into one or more products.
Case Study 4: 7. Raw materials are supplied by vendors. The identifier for a vendor is Vendor_ID. Other attributes include Vendor_Name and Vendor_Address. Each raw material can be supplied by one or more vendors. A vendor may supply any number of raw materials, or may not supply any raw materials to Pine Valley Furniture. An attribute of the relationship between vendor and raw material is Unit_Price.
Case Study 4: 8. Pine Valley Furniture has established a number of work centers. The identifier for a work center is Work_Center_ID. Another attribute is Location. Each product is produced in one or more work centers. A work center may be used to produce any number of products, or may not be used to produce any products.
Case Study 4: 9. The company has over 100 employees. The identifier for employee is Employee_ID. Other attributes are Employee_Name, Employee_Address, and Skill. An employee may have more than one skill. Each skill can be mastered by many employees or none.
Case Study 4: 10. Each employee works in one or more work centers. A work center must have at least one employee working in that center, but may have any number of employees.
Case Study 4: 11. Each employee has exactly one supervisor. An employee who is a supervisor may supervise any number of employees, but not all employees are supervisors.
Library Case Study When a library first receives a book from a publisher it is sent, together with the accompanying delivery note, to the library desk. Here the delivery note is checked against a file of books ordered. If no order can be found to match the note, a letter of enquiry is sent to the publishers. If a matching order is found, a catalogue note is prepared from the details on the validated delivery note. The catalogue note, together with the book, is sent to the registration department. The validated delivery note is sent to the accounts department where it is stored. On receipt of an invoice from the publisher, the accounts department checks its store of delivery notes. If the corresponding delivery note is found then an instruction to pay the publishers is made, and subsequently a cheque is sent. If no corresponding delivery note is found, the invoice is stored in a pending file.
Conference centre booking system: A conference centre takes bookings from clients who wish to hold courses or conferences at the centre. When clients make bookings they specify how many people are included in the booking, and of these, how many will be resident during the booking, and how many will require catered or non-catered accommodation at the centre. The centre contains a number of facilities which may be required by clients making bookings, as follows: A. There are 400 bedrooms for clients who will be resident during the course or conference. B. A maximum of 250 catered people can be handled at any one time. C. Six main lecture theatres providing seating for 200 people. D. Twenty seminar rooms, each able to accommodate 25 people. E. Video conference facilities. The video conference facilities consist of four separate video conference networks. Each video conference network has a large screen based in one of the main lecture theatres, along with 3 satellite screens, each of which is based in one of the seminar rooms.
Shipping company example The London and Ireland Shipping Company PLC (LISC) was founded in 1852 and owns a fleet of cargo ships. The company had historically run passenger liners, but recent policy decisions involved the sale of all passenger-carrying vessels. The company currently has 14 vessels, including one oil tanker and one tugboat operating out of Liverpool. Most of the vessels are registered in Liberia for tax reasons. Each ship has one or more holds divided into spaces. The holds are defined by steel bulkheads and the spaces are defined by shelf racks or other physical dividers. Sister ships, built by the same shipbuilders and to the same designs have similar names, such as Pride of Ireland, Queen of Ireland, Song of Ireland and Warrior of Ireland. Sister ships also have identical cargo storage facilities. LISC issues contracts to agents for one or more manifests (lists of cargo items to be shipped). LISC's charges for cargo carried are based on the number of spaces the cargo requires for storage. The types of cargo typically carried by LISC include grain, coal and ores (carried only in ships equipped with bulk cargo holds). They also transport sacked grain, heavy cases, containers (which may be carried on deck), pallets and so on.
Shipping company example: Cargo items may take up less than one space in a hold, or one or more spaces, depending on the size of the item. A space may therefore contain several small cargo items. The ships owned by LISC are kept as busy and as full as possible, in order to maximise the profits that each vessel makes and minimise running & operating costs. LISC's ships ply most of the seas of the world, but tend to operate mainly in the Mediterranean, the North and Mid Atlantic, and the Indian Ocean. Different ships require different crew complements. LISC intends to create a computer-based information system that will be able to perform the following tasks:
• record the voyages of each ship with the start and end ports
• record the cargo held by a ship on each voyage
• keep records of their employees and the ships they are assigned to
• produce invoices for agents and customers
• keep a record of customers' payments on invoices
• analyse the efficiency of use of cargo space and the percentage of wasted cargo space for ships' voyages
Film Club Case Study: Film Club UK is a company that owns or leases a number of small cinemas in the UK. They have commissioned a database designer to design a database solution to enable them to maintain details about their cinemas and the films that they show. Note that it is possible to have two cinemas in the same location with the same name (there used to be two Odeons in Newcastle). It is also possible to have different films with the same title (for example, different versions of a Shakespeare play). Films are scheduled for one or more showings at a cinema within a ‘season’. Season details are to be notified in advance, with the dates and times of showings, takings, etc. to be notified later. Any one film may have more than one season at any one cinema (for example, a cinema showing ‘The Snowman’ each Christmas). At present, all cinemas are single-screen.