LIS 557 Database Design and Management William Voon Michael Cole Spring '04
Data modeling 19 February 2004
One can't proceed from the informal to the formal by formal means. -- Alan J. Perlis
The Zepplin View ● To build an RDB, our data model starts as a conceptual model and then becomes a logical design and finally is physically implemented.
RDB in Seven Steps 1. Interview users and domain experts 2. Identify the data elements and their relationships. 3. Create a data model. 4. Select the database management software (DBMS). 5. Map data-model elements to tables, and normalize them. 6. Create data type definitions and a database structure. 7. Design the application.
Data Modeling ● Entity relationships show us (more or less) how many tables we need and how they may be linked – They do not tell us how to make an optimally efficient design of linked tables ● Tonight, we will learn detailed procedures that can be used to go from entity relationships to a well-designed database. – These procedures are not a substitute for insight into the system we wish to model
What is being modeled? ● There are several levels of modeling that are possible: – Physical model – Internal model – External models – Conceptual model Recall that Codd's RDB is intended to hide the physical details of data storage and use, so we can concentrate on the Conceptual, Internal and External models.
The Conceptual Model ● We have already seen this – E-R diagrams are the most widely used conceptual modelling tool ● ERDs provide a conceptual schema, that is, a plan for the RDB ● Much more information can be packed into an ERD
Tiny College Entities for Conceptual Model
The Model ● Consider each of the entities and write out the relationships to the other entities ● EXERCISE: Teams of two – Enumerate the relationships that make sense between the entities of Tiny College
The Basic TC Model
Internal Model ● Taking the conceptual model, match the characteristics and constraints to the specific software (DBMS) used – So internal models are software dependent ● Usually this means indicating the data location in a storage group. ● The internal model is (for RDBs) the implementation model. – So, M:N relationships must be resolved using bridging tables
TC Internal Model
TC External Schema(s) External models are what users see. For TC, think of two web sites, one for students, one for professors.
External Models ● Models the database slice that is seen and used by distinct groups of users ● Each uses a subset of the data in the system ● Even when two groups use overlapping data sets (or even the same data set) there may be different constraints to implement
TC External Model Constraints ● Student registration constraints – A class is limited to 25 students – A student can enroll for up to 5 classes ● Class scheduling – A room can be used by many classes, but a class may only use one room – A class is taught by one professor – A professor may teach up to three classes
Entity Relationship Models ● We have already seen Entity Relationship Diagrams, ● Now, we will now add some elements for a richer description. ● The goal is to have an ERD that describes the complete structure of all the tables and their relationships to one another (THE RDB!!)
Attributes in ERDs ● Attributes are ovals attached to an entity (remember in this use the entity is entity set = table) ● Attributes have a domain of values, e.g. (T F), (0, 10), (female male) etc. ● An entity is a collection of attributes: Car = CAR(CAR_ID_NUM, MOD_CODE, CAR_YEAR, CAR_COLOR)
Entity with Attributes Multi-valued attribute (double line) Need to split into distinct attributes for implementation Key attribute (underlined)
Derived Attribute A derived attribute is calculated or inferred (usually directly from other entity attributes)
Relationships ● Degree: unary, binary, ternary,... – How many entities are involved in the relationship? ● Connectivity: 1:1, 1:M, M:N ● Cardinality – Number of entity occurrences associated with one occurrence of related entity
Relationships Cardinalities express constraints in the internal model. Here, a professor can teach between one and four classes.
Cardinality Exercise ● A car can have four body colors (r b y g), draw the ERD. ● A manufacturer makes three car types (sedan SUV van), with the four body colors, draw the ERD.
Another example Interpret this ERD.
Existence Dependency ● If the existence of one or more entities is required for another entity, that entity is existence-dependent – e.g. A course must exist for there to be a specific class ● Existence-dependence is important because it determines the order in which tables must be created
Relationships: Mandatory or Optional? ● Constraints show where a relationship must exist for a specific entity and where the relationship is optional
Weak Entity Relationships ● A weak entity – Is existence-dependent (i.e. Other entities must exist if it exists) – Has primary key that is derived (at least in part, usually totally) from the parent entity ● Use a double rectangle to denote a weak entity
Weak Entity Diagram
Recursive Entities ● A unary relationship is a recursive entity – One consequence is that only one table is required for the relationship – Example: Prerequisites for a course – Other examples? ● Common when we model part-whole relationships and in state transformations (so a subsequent state is related to its predecessor state)
Composite Entity ● Resolution of M:N relationships requires a bridge entity. This is a composite entity. ● Notice that a composite entity is existence- dependent (Why?) ● The composite entity can have attributes that are not required for the bridging function
Bridging from M:N to 1:M To break down a M:N relationship, create a new entity that bridges between the two entities So from STUDENT to CLASS build a new table with a link (=key) to STUDENT and a link to CLASS This is efficient because only the key information is redundant. These tables are composite or bridge entities
Bridging Tables
The Bridging Process
A fully specified bridge
Entity subtypes ● Entities that have subtypes are often best handled by a generalization hierarchy ● Isolate the common attributes in a superclass
Generalization Hierarchy
A Table View
Developing an E-R Diagram ● It is an interative process – A reworking of the conceptual model to elaborate all of the details of the database – Adding the constraints to reach the internal model ● Thinking about what parts of the internal model are exposed to users (external models)
The Seven Steps (Again) 1.Interview users and domain experts 2.Identify the data elements and their relationships. 3. Create a data model. 4. Select the database management software (DBMS). 5. Map data-model elements to tables, and normalize them. 6. Create data type definitions and a database structure. 7. Design the application.