Principles of Database Systems With Internet and Java Applications
Today’s Topic
Chapter 2: Representing Information with Data Models

The lecture notes that I have prepared are quite simple. My own style is to use the projected notes to direct the lectures. I do not often put detailed comments in the lecture notes. My intention is for the students to focus on what I’m saying, rather than on what is projected. I also hope in the lectures to motivate students to read the textbook.

I have used the notes pages to include my motivation for the slides and something about what I talk about when each is presented. I always make the slides available to students before class so that they can anticipate the lecture and will not have to write down the presented material. These simple lecture notes do not substitute for attendance at lectures, an obvious fact for students.

It’s probably best to remove the notes pages before distributing the lecture notes. I typically print the slides at 3 slides per page of output as a postscript file and include the ppt file.

I have removed most of the references to myself. You should edit the master and title master, including header and footer, to put your own identity into the presentations. You can expect that any URLs in the document need to be modified for your presentation. Of course, you are free to modify these notes. Please send revisions or comments to me at riccardi@acm.org.

Greg Riccardi
Instructor’s name and information go here
Please see the notes pages for more information.
Chapter 2: Data Models
January 11, 2000

Chapter 2, Representing Information with Data Models
  Entity Relationship (ER) Model
    high-level, conceptual data model
  Specify conceptual schema
    conceptual database design
  Identify the data requirements of users and detailed descriptions of data types, relationships and constraints.
  Concentrate on specifying the properties of the data, not storage.

Overview of the chapter.

An Example of ER Modeling
  Company database
    Department
      name, number, manager (employee), start date of manager
    Projects controlled by department
      name, number, single location
    Employees
      name, ssn, address, salary, sex, birthdate
      assigned to department, several projects
    Dependents of employees

This is a simple, intuitive example. The scenario is that a company wants to keep track of its employees and begins to consider which basic concepts (entity classes) and attributes are needed. As we begin to distinguish between entities and relationships, the use of manager attributes and the “assigned to department” phrase will be seen as evidence of relationships. This can be gently pointed out before the concepts are defined in the next slide.

Principles of ER Modeling
  Entities and classes
    Entity, a thing in the real world
    Entity Class, the structure of a collection of similar entities
  Attributes
    Attribute, a property of an entity
    Each entity has a value for each of its attributes
  Types of attributes
    simple vs. composite, single-valued vs. multi-valued, stored vs. derived
    domains of attributes

Very important definitions. These should be related to the previous slide’s list of objects and data. An object-oriented view of classes is appropriate here. There are real objects and we want to represent some, but not all, of their characteristics in a computer system. We divide the real objects into groups whose individuals share common characteristics and differ in the values of the characteristics. Each group is represented by an entity class. Similarly, the real characteristics of the individuals are abstracted as attributes and the actual characteristics are represented (incompletely) as attribute values.

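To connect these definitions to code, an entity class can be sketched as a Java type whose fields are the attributes, with each object standing for one entity. This is only an illustration under stated assumptions: the Address breakdown, the phones attribute, and the ageOn method are hypothetical, chosen to show a composite, a multi-valued, and a derived attribute. Only name, ssn, address, and birthdate come from the Company example.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.List;

// Composite attribute: made of several simpler parts (hypothetical breakdown).
record Address(String street, String city, String state) {}

// Entity class: the structure shared by all employee entities.
// Each Employee object is one entity with a value for every attribute.
record Employee(
        String ssn,            // simple, single-valued
        String name,           // simple, single-valued
        Address address,       // composite
        List<String> phones,   // multi-valued (assumed attribute, not on the slide)
        LocalDate birthdate) { // stored

    // Derived attribute: computed from birthdate on request, never stored.
    int ageOn(LocalDate today) {
        return Period.between(birthdate, today).getYears();
    }
}
```

The record declaration plays the part of the entity class, and each constructed Employee value plays the part of an entity.
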
Relationships Between Entities
  Relationship type defines a set of associations among given types.
  Relationship Instances are particular relationships among objects.
  Examples of relationship types in company database
    Manages: 1:1 between employee and department
    Works-for: 1:N between department and employee
    Controls: 1:N between department and project

Our intention is to separate relationships and attributes. The main reason to make this separation is that relationships are conceptually important to users and to developers. Another reason is that there are many ways to represent relationships in logical models and we want to make sure to preserve our ability to choose representations. As the discussion develops, the difference between the cardinalities will be very important; it will enhance the semantics of the model and expose potential errors.

Relationships, Roles, and Structural Constraints
  Roles are attributes that signify the function of a particular entity (type) in a relationship
    Employee manages department
    Department is managed by employee
    Employee works-for department
    Department has employees who work for it
  Constraints can be
    cardinality
      Each department can have no more than one manager
    participation
      Each department must have a manager

There are 2 objects related by each relationship. The relationship is not symmetric in its meaning. Each object fulfills one of the two roles of the relationship. The examp

Conceptual modeling attempts to restrict (constrain) the facts that can be represented. A cardinality constraint limits the number of objects that may be related to a specific object. It does not limit the number of objects that may participate in relationships, just how many may be related to an individual. A participation constraint specifies that an object must be related to some other object. That is, a zero cardinality is not allowed in the presence of a participation constraint.

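The two kinds of constraint can be sketched in Java for the Manages relationship type. This is a minimal sketch, not a prescribed implementation: the class and method names are hypothetical, and departments and employees are reduced to name strings to keep it short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical store of Manages relationship instances:
// department name -> name of the managing employee.
final class Manages {
    private final Map<String, String> managerOf = new HashMap<>();

    // Cardinality constraint: a department has no more than one manager.
    // Returns false (and changes nothing) if the department already has one.
    boolean assign(String department, String employee) {
        return managerOf.putIfAbsent(department, employee) == null;
    }

    // Participation constraint: every department must have a manager.
    // True only when each listed department appears in some instance.
    boolean allParticipate(Set<String> departments) {
        return managerOf.keySet().containsAll(departments);
    }
}
```

A second assign for the same department is rejected, which is the cardinality constraint at work. The participation check fails for as long as any department has zero managers.
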
ER schema diagram for Company

Give some examples of the members of the classes and relationship types. It is important to emphasize the distinction between the classes and the members. The attributes are missing from this diagram. A relationship type (diamond and lines) represents the possibility that some individuals will be related. The diagram does not say anything about whether a particular object is related to another particular object. Some of the notations should be deferred. In particular, the double boxes and double diamonds, which represent weak entity classes and their defining relationships, should be deferred.

Entity Classes for BigHit Video

This table lists some of the entity classes. The description is a very important part and should never be omitted. You can also point out that this is an example of the representation of information in a table. This form of representation will certainly be important in this course!

Sample Attribute Specifications

Here, we present the idea that attributes are somewhat independent of entities. We can define a specific attribute and use it in more than one entity class. Some discussion of an enterprise data dictionary is in order here.

Entity Classes, Attributes and Constraints

This table puts the two previous tables together and forms a partial conceptual model. Constraints have been added in the form of key, not null, and derived. At this point it’s appropriate to explain what a key constraint is.

Entities, instances of classes

Previous tables represent schemas, or models. This one finally represents real objects. The basic structure of the table is the same as before: the column headings contain the names of the attributes, each row represents a single entity, and a cell contains the value of the attribute for the object. It’s appropriate to point out that all of these tables are explicitly represented in a database, since it keeps track of the logical model as well as the data values. The tables of the previous slides are the model, or meta-data, and are kept in the database in the same form as all other data.

Relationships Between Entities
  Relationship type defines a set of associations among given types.
  Relationship Instances are particular relationships among objects.
  Examples of relationship types in company database
    Manages: 1:1 between employee and department
    Works-for: 1:N between department and employee
    Controls: 1:N between department and project

Here we finally define the concept of relationship type and relationship instance. I don’t think we can overemphasize the difference between schema and instance. The model shows the schema but not the instances.

Relationships, Roles, and Structural Constraints
  Roles are attributes that signify the function of a particular entity (type) in a relationship
    Employee manages department
    Employee works-for department
  Constraints can be
    cardinality
      Each department can have no more than one manager
    participation
      Each department must have a manager

More definitions, to make the concepts clear.

Relationship Types and Instances
  Marriage relationship type
    Person related to Person
    One person has the role of “wife”, one has the role of “husband”
    Relationship type may have one or more attributes
      e.g. weddingDate
  Marriage relationship (instance)
    Jane Block is married to Joe Block (relationship)
    Jane Block is the wife of Joe Block (role)
    Joe Block is the husband of Jane Block (role)
  Parent-child relationship type
    A person may have zero or more children

This slide includes several examples of types and instances, of roles and of cardinality constraints.

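The split between a relationship type and its instances maps loosely onto a Java type and its objects. The sketch below is an illustration only, with hypothetical record names; it reuses the role names and the weddingDate attribute from the slide.

```java
import java.time.LocalDate;

// Entity class Person, reduced to a single attribute for the example.
record Person(String name) {}

// Relationship type Marriage: Person related to Person.
// The component names wife and husband are the roles, and
// weddingDate is an attribute of the relationship itself.
record Marriage(Person wife, Person husband, LocalDate weddingDate) {}
```

Each Marriage value is one relationship instance. Reading its wife component gives the person who fills that role in that particular instance.
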
Relationships are always one-to-one
  A relationship is an instance
  These pictures are sets of instances

Cardinality constraints are specified as part of the schema and enforced among the instances. The cardinality of the relationship (e.g. one-to-many) is enforced as two separate roles. One role is to-one and the other is to-many. Each relationship (instance) relates one object to another. An object whose role is to-many may have several relationship instances, each with a different object. It is not possible for an object to be related to more than one other by a single relationship. It is possible to have multiple relationships of a single type, if the role is to-many.

All of this discussion is based on having binary relationships. There are natural examples of multi-ary relationships. The discussion of relationships of higher degree is on p. 38.

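The point that every instance pairs exactly one object with one other, while cardinality is enforced per role, can be sketched for Works-for (1:N between department and employee). The names below are hypothetical and entities are reduced to name strings.

```java
import java.util.ArrayList;
import java.util.List;

// One relationship instance: exactly one employee related to exactly one department.
record WorksFor(String employee, String department) {}

// Hypothetical container for the set of Works-for instances.
final class WorksForSet {
    private final List<WorksFor> instances = new ArrayList<>();

    // The employee's role is to-one: an employee works for at most one department.
    // The department's role is to-many, so nothing is checked on that side.
    boolean add(String employee, String department) {
        for (WorksFor w : instances) {
            if (w.employee().equals(employee)) {
                return false; // would give this employee a second department
            }
        }
        instances.add(new WorksFor(employee, department));
        return true;
    }

    // A to-many role shows up as several instances, each with a different employee.
    long employeesOf(String department) {
        return instances.stream()
                .filter(w -> w.department().equals(department))
                .count();
    }
}
```

A department with many employees is never stored as one instance holding a list; it appears as many one-to-one instances that share the same department.
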
Find the Entities, Attributes and Relationships

With this slide, I try to get the class to break this receipt into distinct entities. In particular, the account id represents the relationship between the rental of the videotape and the customer. The customer name and address are associated with the customer and are not attributes of the rental. Similarly, the videotape ID represents a relationship. The dates and cost are certainly attributes of the rental. I might draw the resulting ER diagram on the chalkboard to illustrate the design process.

ER schema diagram for BigHit Video

This slide presents an opportunity to discuss many features of ER diagrams and to look into what the symbols mean and what meaning they bring to the diagram. The numberRentals attribute is derived because it is a count of the number of Rents relationships for the customer. The cardinality of 1 between Rents and Customer implies that a Videotape can have only one customer at a time. Hence, Rents represents the current rentals. Clearly a tape can be rented more than once in its lifetime, but by only one customer at a time. “A videotape may be rented by one customer.” The cardinality of M between Rents and Videotape allows a customer to have many tapes rented. “A customer may rent any number of videotapes.”

A discussion of keys is also in order, as are discussions of multi-valued and composite attributes and attributes of relationship types. The next slide has a discussion of keys.

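The reading of Rents given in these notes (one current customer per tape, many tapes per customer, numberRentals derived by counting) can be sketched as follows. The class, its method names, and the use of integer ids are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store of current Rents instances: videotapeId -> accountId.
final class Rents {
    private final Map<Integer, Integer> customerOfTape = new HashMap<>();

    // Cardinality 1 toward Customer: a tape has at most one current renter.
    boolean rent(int videotapeId, int accountId) {
        return customerOfTape.putIfAbsent(videotapeId, accountId) == null;
    }

    // Returning a tape removes the instance, so the tape can be rented again.
    void giveBack(int videotapeId) {
        customerOfTape.remove(videotapeId);
    }

    // Derived attribute numberRentals: counted from the instances, never stored.
    long numberRentals(int accountId) {
        return customerOfTape.values().stream()
                .filter(a -> a == accountId)
                .count();
    }
}
```

Because numberRentals is computed from the relationship instances, it cannot drift out of step with them, which is the usual reason for marking an attribute as derived.
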
Keys of entities
  A key is a set of attributes that uniquely identifies one entity within the class
    accountId is a key for Customer
    may be multiple attributes (examples follow)
  A key constraint specifies a restriction on a set of entities
    no 2 entities in the set may have the same values for the key
    an attempt to add a new entity with the same key as another entity is not allowed

Emphasize the need for unique identification. I take every opportunity to tell students that a constraint places limits on the contents of the database. An object (entity) cannot be added if it violates a constraint. An object cannot be modified if the resulting object would violate some constraint.

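A key constraint can be sketched as a collection that refuses a second entity with the same key value. The sketch assumes accountId is an integer and adds a hypothetical lastName attribute so that two customers can differ in something other than the key.

```java
import java.util.HashMap;
import java.util.Map;

// Customer entity; lastName is an assumed attribute for illustration.
record Customer(int accountId, String lastName) {}

// Hypothetical store for the Customer entity class, with accountId as its key.
final class CustomerSet {
    private final Map<Integer, Customer> byKey = new HashMap<>();

    // Key constraint: reject a new entity whose key matches an existing one.
    boolean add(Customer c) {
        return byKey.putIfAbsent(c.accountId(), c) == null;
    }

    // The key identifies at most one entity; null means no such customer.
    Customer find(int accountId) {
        return byKey.get(accountId);
    }
}
```

The rejected add leaves the stored entity unchanged, matching the statement that an object cannot be added if it violates a constraint.
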
Weak Entity Classes
  An entity class that has no key is a weak entity class
  A weak entity is identified by its relationships
    The relationships are called identifying relationships
  A weak entity may exist only if it is related to other entities by its identifying relationships
  Examples
    Rental
    TimeCard

The developers of ER modeling invented the notion of weak classes to cover a frequent situation. An object that is clearly distinct from its related objects has no identifying information. It exists only in relationship to other objects. A weak entity may be represented as a collection of attributes of one or more related objects. However, the designers find that it is too important to be treated that way, or that it is naturally independent. The weak entity class allows these classes to be directly represented in the model. Other extensions to the ER model are covered in Chapter 3.

More facets of ER diagrams

The discussion of this should focus on the need for identifying roles, since each relationship type connects a class with itself. Without the role names, we wouldn’t know which individual was which.

This is a strange diagram for several reasons. You might want to create a table on the board to illustrate some things that are allowed and some that are not. First, it doesn’t require that the husband or wife is of a particular sex. The diagram does not enforce monogamy. In essence, a person may be the wife of one person and the husband of another or a person may be married to himself! The diagram does not enforce simple natural laws of parenthood. For instance, Jill is the parent of Joe and Joe is the parent of Jill.

Finally, the cardinality of IsChildOf is improper. Each person has either 1 or 2 parents. Hence, either the set of people is infinite or someone is his or her own ancestor. A simple cardinality argument shows that this is true. In database systems, we are always and only interested in finite sets. This exposes a common fallacy with data modeling. We are not trying to represent real objects and certainly not all objects. We are representing some information about some individuals. The database will not include all of the parents of all of the people in the database. Hence, the cardinality should be a limit of 2, but not a lower bound of 1. For our particular set of people, some of their parents will be o

Treating Rents as an entity class
  Should Rental be an entity class?
    instead of relationship type Rents
  A rental entity represents the possession of a videotape by a customer

This is a standard transformation which may take place during the data modeling or during the translation into a logical model. Identifying the rental as an entity is appropriate, especially since there is a receipt (document) associated with it. The word “rental” is likely used in the description of the rental business.

A discussion of the role of the identifying relationship types is appropriate. Here, the relationship type with Videotape is identifying but the relationship type with Customer is not. That is, since each Videotape has no more than one Rental, the relationship of a Rental object with a Videotape completely identifies the Rental. It is not necessary to include the Customer to determine the Rental. It is reasonable to mark the relationship type with Customer as identifying, but it is not necessary.

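The identifying relationship described in these notes can be sketched by storing each Rental under the id of its Videotape. Everything named here is an assumption for illustration, including the dateRented attribute and the use of integer ids; the Rental itself carries no key attribute.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Weak entity: no key attribute of its own. dateRented is an assumed attribute.
record Rental(int accountId, LocalDate dateRented) {}

// Hypothetical store in which the related Videotape identifies each Rental.
final class RentalSet {
    private final Set<Integer> videotapeIds; // ids of existing Videotape entities
    private final Map<Integer, Rental> byTape = new HashMap<>();

    RentalSet(Set<Integer> videotapeIds) {
        this.videotapeIds = videotapeIds;
    }

    // A Rental may exist only if it is related to an existing Videotape,
    // and each Videotape has no more than one Rental.
    boolean add(int videotapeId, Rental rental) {
        if (!videotapeIds.contains(videotapeId)) {
            return false;
        }
        return byTape.putIfAbsent(videotapeId, rental) == null;
    }

    // The Videotape alone determines the Rental; the Customer is not needed.
    Rental rentalOf(int videotapeId) {
        return byTape.get(videotapeId);
    }
}
```

The lookup takes only a videotape id, which mirrors the observation that the relationship with Customer does not have to be identifying.
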
Subtleties in Meaning of Purchase Order

This is an example of how ER modeling can expose errors in understanding. The upper picture appears first and the lower one after a click. Explain the problem with an attempt to purchase many copies of a videotape. Students usually understand that such an order should be for a quantity of a single movie, rather than many orders for the same movie. After explaining the problem, the instructor can show the lower picture. This shows a much more accurate model.

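The two pictures are not reproduced in these notes, so the following is only a guess at the idea behind the corrected model: an order records a quantity per movie instead of repeating the movie. The class and its methods are hypothetical, and movies are reduced to title strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one order detail per movie, carrying a quantity.
final class PurchaseOrder {
    private final Map<String, Integer> quantityByMovie = new LinkedHashMap<>();

    // Ordering more copies of a movie raises its quantity instead of
    // creating a second detail for the same movie.
    void order(String movieTitle, int copies) {
        if (copies <= 0) {
            throw new IllegalArgumentException("copies must be positive");
        }
        quantityByMovie.merge(movieTitle, copies, Integer::sum);
    }

    // Quantity ordered for a movie, or 0 if the movie is not on the order.
    int quantityOf(String movieTitle) {
        return quantityByMovie.getOrDefault(movieTitle, 0);
    }

    // Number of distinct movies on the order.
    int detailCount() {
        return quantityByMovie.size();
    }
}
```

Here quantity behaves as an attribute of the association between the order and the movie, which is one way to read the instructor's point about ordering many copies.
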
Summary of Chapter 2
  Information system is a repository of facts about an organization
  Discovery of the structure of information
    Data model specifies the structure and meaning of data
    Limitations on systems created by faulty models
    Options and alternatives are exposed by the process
    Forms the basis for system development
    Provides basis for agreement between developer and user
  Entity-relationship modeling
    Entity is a thing, entity class is a set of things
    Relationship type represents the possibility that two entities are related in a specific way
    E-R diagram is an appropriate way to represent a data model
    E-R diagrams are the deliverables of the initial phase of information system development