Introduction
The presentation will address the following questions:
What is systems modeling, and what is the difference between logical and physical system models?
What is data modeling, and what are its benefits?
Can you recognize and understand the basic concepts and constructs of a data model?
Can you read and interpret an entity-relationship data model?
When in a project are data models constructed, and where are they stored?
Can you discover entities and relationships?
Can you construct an entity-relationship context diagram?
Can you discover or invent keys for entities?
Can you construct a fully attributed entity-relationship diagram and describe all data structures and attributes to the repository or encyclopedia?

This is the first of three graphical systems modeling chapters. In this chapter you will learn how to use a popular data modeling tool, entity-relationship diagrams, to document the data that a system must capture and store, independently of how that data is or will be used; that is, independently of specific inputs, outputs, and processing.

An Introduction to Systems Modeling
One way to structure unstructured problems is to draw models. A model is a representation of reality. Just as a picture is worth a thousand words, most system models are pictorial representations of reality. Models can be built for existing systems as a way to better understand those systems, or for proposed systems as a way to document business requirements or technical designs.

What are Logical Models? Logical models show what a system ‘is’ or ‘does’. They are implementation-independent; that is, they depict the system independent of any technical implementation. As such, logical models illustrate the essence of the system.

In the last chapter you were introduced to activities that called for drawing system models. System models play an important role in systems development.

What are Physical Models? Physical models show not only what a system ‘is’ or ‘does’, but also how the system is physically and technically implemented. They are implementation-dependent because they reflect technology choices, and the limitations of those technology choices. Systems analysts use logical system models to depict business requirements, and physical system models to depict technical designs.

Systems analysis activities tend to focus on logical system models for the following reasons:
Logical models remove biases that are the result of the way the current system is implemented, or the way that any one person thinks the system might be implemented. Thus, we overcome the "we've always done it that way" syndrome. Consequently, logical models encourage creativity.
Logical models reduce the risk of missing business requirements through being too preoccupied with technical details. Such errors can be costly to correct after the system is implemented. By separating what the system must do from how the system will do it, we can better analyze the requirements for completeness, accuracy, and consistency.
Logical models allow us to communicate with end-users in non-technical or less technical languages. Thus, we don't lose requirements in the technical jargon of the computing discipline.

Data modeling is a technique for organizing and documenting a system's DATA, and thereby for defining the business requirements for a database. Data modeling is sometimes called database modeling because a data model is usually implemented as a database. It is also sometimes called information modeling. Many experts consider data modeling to be the most important of the modeling techniques.

Why is data modeling considered crucial? Data is viewed as a resource to be shared by as many processes as possible. As a result, data must be organized in a way that is flexible and adaptable to unanticipated business requirements, and that is the purpose of data modeling.

Further reasons data modeling is considered crucial: Data structures and properties are reasonably permanent, certainly a great deal more stable than the processes that use the data. Often the data model of a current system is nearly identical to that of the desired system. Data models are much smaller than process and object models and can be constructed more rapidly. The process of constructing data models helps analysts and users quickly reach consensus on business terminology and rules.

Figure 5.1 An Entity Relationship Data Model
The diagram in the figure above makes the following business assertions:
We need to store data about CUSTOMERs, ORDERs, and INVENTORIED PRODUCTs.
The value of CUSTOMER NUMBER uniquely identifies one and only one CUSTOMER.
The value of ORDER NUMBER uniquely identifies one and only one ORDER.
The value of PRODUCT NUMBER uniquely identifies one and only one INVENTORIED PRODUCT.
For a CUSTOMER, we need to know the CUSTOMER NAME, SHIPPING ADDRESS, BILLING ADDRESS, and BALANCE DUE.
For an ORDER, we need to know ORDER DATE and ORDER TOTAL COST.
For an INVENTORIED PRODUCT, we need to know PRODUCT NAME, PRODUCT UNIT OF MEASURE, and PRODUCT UNIT PRICE.
A CUSTOMER has placed zero, one, or more ORDERs.
An ORDER is placed by exactly one CUSTOMER. The value of CUSTOMER NUMBER (as recorded in ORDER) identifies that CUSTOMER.
An ORDER sold one or more ORDERED PRODUCTs. Thus, an ORDER must contain at least one ORDERED PRODUCT.
An INVENTORIED PRODUCT may have been sold as zero, one, or more ORDERED PRODUCTs.
An ORDERED PRODUCT identifies a single INVENTORIED PRODUCT on a single ORDER. The ORDER NUMBER (for an ORDERED PRODUCT) identifies the ORDER, and the PRODUCT NUMBER (for an ORDERED PRODUCT) identifies the INVENTORIED PRODUCT. Together, they identify one and only one ORDERED PRODUCT.
For each ORDERED PRODUCT, we need to know QUANTITY ORDERED and UNIT PRICE AT TIME OF ORDER.
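To make these assertions concrete, here is a minimal sketch that mirrors Figure 5.1 as Python dataclasses. The class and field names are our own transliterations of the entity and attribute names; ORDER TOTAL COST is derived here rather than stored (a choice made only for this sketch), and `validate` checks just two assertions: every ORDER contains at least one ORDERED PRODUCT, and ORDER NUMBER plus PRODUCT NUMBER identifies one and only one ORDERED PRODUCT.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class Customer:
    customer_number: str               # primary key
    customer_name: str
    shipping_address: str
    billing_address: str
    balance_due: Decimal = Decimal("0.00")


@dataclass
class InventoriedProduct:
    product_number: str                # primary key
    product_name: str
    product_unit_of_measure: str
    product_unit_price: Decimal


@dataclass
class OrderedProduct:
    order_number: str                  # key part: identifies the ORDER
    product_number: str                # key part: identifies the INVENTORIED PRODUCT
    quantity_ordered: int
    unit_price_at_time_of_order: Decimal


@dataclass
class Order:
    order_number: str                  # primary key
    customer_number: str               # identifies exactly one CUSTOMER
    order_date: date
    ordered_products: list[OrderedProduct] = field(default_factory=list)

    @property
    def order_total_cost(self) -> Decimal:
        # Derived in this sketch; Figure 5.1 lists it as a stored attribute.
        return sum(
            (p.quantity_ordered * p.unit_price_at_time_of_order for p in self.ordered_products),
            Decimal("0"),
        )

    def validate(self) -> None:
        # An ORDER must contain at least one ORDERED PRODUCT.
        if not self.ordered_products:
            raise ValueError(f"ORDER {self.order_number} has no ORDERED PRODUCTs")
        keys = set()
        for line in self.ordered_products:
            if line.order_number != self.order_number:
                raise ValueError("ORDERED PRODUCT belongs to a different ORDER")
            key = (line.order_number, line.product_number)
            # ORDER NUMBER + PRODUCT NUMBER identify one and only one ORDERED PRODUCT.
            if key in keys:
                raise ValueError(f"duplicate ORDERED PRODUCT key {key}")
            keys.add(key)


order = Order("O-1", "C-7", date(2024, 3, 1),
              [OrderedProduct("O-1", "P-42", 3, Decimal("2.50"))])
order.validate()
print(order.order_total_cost)  # 7.50
```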

System Concepts for Data Modeling
Most systems analysis techniques are strongly rooted in systems thinking. Systems thinking is the application of formal systems theory and concepts to systems problem solving. There are several notations for data modeling, but the actual model is frequently called an entity relationship diagram (ERD). An ERD depicts data in terms of the entities and relationships described by the data.

Most people can learn the techniques of systems analysis. But if they understand the underlying concepts, they can adapt the techniques to ever-changing problems and conditions. They can also improve upon the techniques, recognize the advantages of new techniques, and see opportunities to integrate different techniques. Therein lies your true opportunity for competitive advantage and security in today's business world. If you understand theory, concepts, and techniques, you will be able to do more than just solve textbook exercises: you will be able to solve real-world problems and command the premium salaries paid to today's best problem solvers!

Most ERD notations are named after their inventor (e.g., Chen, Martin, Bachman, Merise) or after a published standard (e.g., IDEF1X). These data modeling ‘languages’ generally support the same fundamental concepts and constructs. We have adopted the Martin (Information Engineering) notation because of its widespread use and CASE tool support.

Entities
All systems contain data. Data describes ‘things’. A concept to abstractly represent all instances of a group of similar ‘things’ is called an entity. Informally, an entity is something about which we want to store data; more precisely, an entity is a class of persons, places, objects, events, or concepts about which we need to capture and store data. Synonyms include entity type and entity class. An entity instance is a single occurrence of an entity.

Consider a school system. A school system includes data that describes things such as STUDENTs, TEACHERs, COURSEs, and CLASSROOMs. For any of these things, it is not difficult to imagine some of the data that describes any given instance of the thing. For example, the data that describes a particular STUDENT might include name, address, phone number, date of birth, gender, race, major, and grade point average, to name but a few data items.

Examples of entities include:
Persons: AGENCY, CONTRACTOR, CUSTOMER, DEPARTMENT, DIVISION, EMPLOYEE, INSTRUCTOR, OFFICE, STUDENT, SUPPLIER. Notice that a person entity can represent individuals, groups, or organizations.
Places: SALES REGION, BUILDING, ROOM, BRANCH OFFICE, CAMPUS.
Objects: BOOK, MACHINE, PART, PRODUCT, RAW MATERIAL, SOFTWARE LICENSE, SOFTWARE PACKAGE, TOOL, VEHICLE MODEL, VEHICLE. An object entity can represent actual objects (such as SOFTWARE LICENSE), or specifications for a type of object (such as SOFTWARE PACKAGE).
Events: APPLICATION, AWARD, CANCELLATION, CLASS, FLIGHT, INVOICE, ORDER, REGISTRATION, RENEWAL, REQUISITION, RESERVATION, SALE, TRIP.
Concepts: ACCOUNT, BLOCK OF TIME, BOND, COURSE, FUND, QUALIFICATION, STOCK.

The entity STUDENT may have multiple instances: Mary, Joe, Mark, Susan, Deborah, and so forth. In data modeling, we do not concern ourselves with individual students because we recognize that each student is described by similar pieces of data.

Attributes
The pieces of data that we want to store about each instance of a given entity are called attributes. An attribute is a descriptive property or characteristic of an entity. Synonyms include element, property, and field. Some attributes can be logically grouped into super-attributes called compound attributes. A compound attribute is one that actually consists of more primitive attributes. Synonyms in different data modeling languages are numerous: concatenated attribute, composite attribute, and data structure.

As noted at the beginning of this section, each instance of the entity STUDENT might be described by the following attributes: NAME, ADDRESS, PHONE NUMBER, DATE OF BIRTH, GENDER, RACE, MAJOR, GRADE POINT AVERAGE, and others. A student's NAME is actually a composite attribute that consists of LAST NAME, FIRST NAME, and MIDDLE INITIAL.
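In programming terms, an entity resembles a class and an entity instance resembles an object of that class. The hypothetical sketch below models STUDENT with a subset of the attributes listed above and represents the compound attribute NAME as its own small type made of LAST NAME, FIRST NAME, and MIDDLE INITIAL. All sample values are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Name:
    """Compound attribute NAME, built from more primitive attributes."""
    last_name: str
    first_name: str
    middle_initial: str = ""


@dataclass
class Student:
    """The entity STUDENT: one definition describes every instance."""
    name: Name                  # compound attribute
    address: str
    phone_number: str
    date_of_birth: date
    major: str
    grade_point_average: float


# Two entity instances (occurrences) of STUDENT, described by the same attributes.
mary = Student(Name("Smith", "Mary", "J"), "12 Elm St", "555-0100",
               date(2003, 5, 1), "History", 3.4)
joe = Student(Name("Jones", "Joe"), "9 Oak Ave", "555-0101",
              date(2002, 11, 20), "Biology", 2.9)
```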

Attributes: Domains
The values for each attribute are defined in terms of three properties: data type, domain, and default.
The data type for an attribute defines what class of data can be stored in that attribute. For purposes of systems analysis and business requirements definition, it is useful to declare logical (non-technical) data types for our business attributes.
An attribute's data type determines its domain. The domain of an attribute defines what values the attribute can legitimately take on.
Every attribute should have a logical default value. The default value for an attribute is the value that will be recorded if the user does not specify one.
Because an attribute is a piece of data, when analyzing a system we should define which values of that attribute are legitimate, that is, which values make sense for the business.

Table 5.1 Representative Logical Data Types for Attributes

Table 5.2 Representative Logical Domains for Logical Data Types

Table 5.3 Permissible Default Values for Attributes
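A repository entry for an attribute can be thought of as a small record holding its data type, domain, and default. The sketch below is hypothetical: `AttributeDefinition` is not the API of any real CASE tool, the domain is expressed as a Python predicate, and the GRADE POINT AVERAGE range of 0.0 to 4.0 and the MAJOR default of "UNDECIDED" are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class AttributeDefinition:
    """Hypothetical repository entry: data type, domain, and default."""
    name: str
    data_type: type
    domain: Callable[[Any], bool]      # True if a value is legitimate
    default: Optional[Any] = None      # recorded when the user supplies nothing

    def accept(self, value: Optional[Any] = None) -> Optional[Any]:
        """Return the value to record, applying the default and checking the domain."""
        if value is None:
            value = self.default
        if value is None:
            return None                # no value and no default: nothing is recorded
        if not isinstance(value, self.data_type):
            raise TypeError(f"{self.name}: expected {self.data_type.__name__}")
        if not self.domain(value):
            raise ValueError(f"{self.name}: {value!r} is outside the domain")
        return value


# Assumed domains and defaults, for illustration only.
GPA = AttributeDefinition("GRADE POINT AVERAGE", float, lambda v: 0.0 <= v <= 4.0, 0.0)
MAJOR = AttributeDefinition("MAJOR", str, lambda v: 0 < len(v) <= 30, "UNDECIDED")
```

Calling `GPA.accept()` with no value records the default 0.0, while `GPA.accept(4.5)` is rejected as outside the domain.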

Attributes: Identification
An entity typically has many instances, perhaps thousands or millions, and we need to uniquely identify each instance based on the data value of one or more attributes. Every entity must have an identifier or key. A key is an attribute, or a group of attributes, that assumes a unique value for each entity instance. It is sometimes called an identifier. For example, each instance of the entity STUDENT might be uniquely identified by the key STUDENT NUMBER. No two students can have the same STUDENT NUMBER.

Sometimes more than one attribute is required to uniquely identify an instance of an entity. A group of attributes that uniquely identifies an instance of an entity is called a concatenated key. Synonyms include composite key and compound key. For example, each TAPE entity instance in a video store might be uniquely identified by the concatenation of TITLE NUMBER plus COPY NUMBER. TITLE NUMBER by itself would be inadequate because we may own many copies of a single title. COPY NUMBER by itself would also be inadequate since we would have a copy #1 for every single title we own. We need both pieces of data to identify a specific tape (e.g., copy #7 of Jurassic Park). In this book, we will give a name to the group as well as the individual attributes. For example, the concatenated key for TAPE would be recorded as follows:

TAPE ID (PRIMARY KEY)
    TITLE NUMBER
    COPY NUMBER
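The TAPE ID example can be sketched in a few lines. This is a minimal in-memory illustration, not a database: the concatenated key is a frozen (and therefore hashable) dataclass, and the hypothetical `TapeTable` class rejects a second tape with the same TITLE NUMBER and COPY NUMBER. The CONDITION attribute is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeId:
    """Concatenated primary key TAPE ID = TITLE NUMBER + COPY NUMBER."""
    title_number: int
    copy_number: int


@dataclass
class Tape:
    tape_id: TapeId
    condition: str = "GOOD"            # hypothetical non-key attribute


class TapeTable:
    """Hypothetical store that enforces uniqueness of the concatenated key."""

    def __init__(self) -> None:
        self._rows: dict[TapeId, Tape] = {}

    def insert(self, tape: Tape) -> None:
        if tape.tape_id in self._rows:
            raise KeyError(f"duplicate TAPE ID {tape.tape_id}")
        self._rows[tape.tape_id] = tape

    def get(self, title_number: int, copy_number: int) -> Tape:
        # Neither part alone is enough; together they identify one specific tape.
        return self._rows[TapeId(title_number, copy_number)]


tapes = TapeTable()
tapes.insert(Tape(TapeId(101, 1)))
tapes.insert(Tape(TapeId(101, 7)))    # another copy of the same title
tapes.insert(Tape(TapeId(202, 1)))    # copy #1 of a different title
```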

Frequently, an entity may have more than one key. Each of these keys is called a candidate key. A candidate key is a ‘candidate to become the primary identifier’ of instances of an entity. It is sometimes called a candidate identifier. (Note: A candidate key may be a single attribute or a concatenated key.) For example, the entity EMPLOYEE may be uniquely identified by SOCIAL SECURITY NUMBER, by a company-assigned EMPLOYEE NUMBER, or by ADDRESS.

A primary key is the candidate key that will most commonly be used to uniquely identify a single entity instance. Any candidate key that is not selected to become the primary key is called an alternate key. The default for a primary key is always NOT NULL. Why? Because if the key has no value, then it cannot identify an instance of an entity.
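Sample data can help test whether an attribute (or group of attributes) could serve as a candidate key: it must be non-null and unique for every instance. The sketch below assumes rows are plain dictionaries keyed by attribute name, and the EMPLOYEE values are invented. Data can only disprove a candidate key; whether values must always stay unique is a business rule, and choosing which candidate becomes the primary key is a modeling decision (the sketch simply takes the first).

```python
from typing import Any, Sequence


def is_candidate_key(rows: Sequence[dict[str, Any]], attributes: tuple[str, ...]) -> bool:
    """True if the attribute(s) are NOT NULL and unique across all rows."""
    seen = set()
    for row in rows:
        value = tuple(row.get(a) for a in attributes)
        if any(v is None for v in value) or value in seen:
            return False
        seen.add(value)
    return True


employees = [
    {"EMPLOYEE NUMBER": "E-1", "SOCIAL SECURITY NUMBER": "000-00-0001", "DEPARTMENT": "IT"},
    {"EMPLOYEE NUMBER": "E-2", "SOCIAL SECURITY NUMBER": "000-00-0002", "DEPARTMENT": "IT"},
]

options = [("EMPLOYEE NUMBER",), ("SOCIAL SECURITY NUMBER",), ("DEPARTMENT",)]
candidates = [attrs for attrs in options if is_candidate_key(employees, attrs)]
primary_key, alternate_keys = candidates[0], candidates[1:]
```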

Sometimes it is also necessary to identify a subset of entity instances rather than a single instance. For example, we may require a simple way to identify all male students and all female students. A subsetting criterion is an attribute (or concatenated attribute) whose finite values divide all entity instances into useful subsets. Some methods call this an inversion entry. For example, in our STUDENT entity, the attribute GENDER divides the instances of STUDENT into two subsets: male students and female students. In general, subsetting criteria are only useful when an attribute has a finite (meaning limited) number of legitimate values. For example, GRADE POINT AVERAGE would not be a good subsetting criterion because there are 999 possible values of that attribute.
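Applying a subsetting criterion amounts to grouping instances by the criterion's value. In this minimal sketch the rows are invented dictionaries, and the `max_subsets` threshold is an arbitrary assumption used only to flag attributes with too many distinct values to be useful subsetting criteria.

```python
from collections import defaultdict
from typing import Any


def subsets_by(rows: list[dict[str, Any]], criterion: str,
               max_subsets: int = 10) -> dict[Any, list[dict[str, Any]]]:
    """Divide entity instances into subsets by the value of a subsetting criterion."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[criterion]].append(row)
    # A useful criterion has a small, finite set of values (threshold is assumed).
    if len(groups) > max_subsets:
        raise ValueError(f"{criterion} yields {len(groups)} subsets; not a useful criterion")
    return dict(groups)


students = [
    {"STUDENT NUMBER": 1, "GENDER": "F", "GRADE POINT AVERAGE": 3.4},
    {"STUDENT NUMBER": 2, "GENDER": "M", "GRADE POINT AVERAGE": 2.9},
    {"STUDENT NUMBER": 3, "GENDER": "F", "GRADE POINT AVERAGE": 3.8},
]
by_gender = subsets_by(students, "GENDER")
```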

Relationships
Conceptually, entities and attributes do not exist in isolation. Entities interact with, and impact, one another via relationships to support the business mission. A relationship is a natural business association that exists between one or more entities. The relationship may represent an event that links the entities, or merely a logical affinity that exists between the entities. A connecting line between two entities on an ERD represents a relationship, and a verb phrase describes it. All relationships are implicitly bidirectional, meaning that they can be interpreted in both directions.

Consider, for example, the entities STUDENT and CURRICULUM. We can make the following business assertions that link students and curricula:
A current STUDENT IS ENROLLED IN one or more CURRICULA.
A CURRICULUM IS BEING STUDIED BY zero, one, or more STUDENTs.
The verb phrases IS ENROLLED IN and IS BEING STUDIED BY define business relationships that exist between the two entities. Because a STUDENT can be enrolled in many CURRICULA, and a CURRICULUM can enroll many STUDENTs, this is often called a many-to-many relationship.

Figure 5.2 A Relationship (Many-to-Many)
The figure above also shows the complexity or degree of each relationship. For example, in the above business assertions, we must also answer the following questions: Must there exist an instance of STUDENT for each instance of CURRICULUM? No! Must there exist an instance of CURRICULUM for each instance of STUDENT? Yes! How many instances of CURRICULUM can exist for each instance of STUDENT? Many! How many instances of STUDENT can exist for each instance of CURRICULUM? Many!

Relationships: Cardinality
Each relationship on an ERD also depicts the complexity or degree of the relationship, and this is called cardinality. Cardinality defines the minimum and maximum number of occurrences of one entity for a single occurrence of the related entity. Because all relationships are bidirectional, cardinality must be defined in both directions for every relationship.

Figure 5.3 Cardinality Notations
Conceptually, cardinality tells us the following rules about the data we want to store:
When we insert a STUDENT instance in the database, we must link (associate) that STUDENT to at least one instance of CURRICULUM. In business terms, "a student cannot be admitted without declaring a major." (Note: Most schools would include an instance of CURRICULUM called "undecided" or "undeclared".)
A STUDENT can study more than one CURRICULUM, and we must be able to store data that indicates all CURRICULA for a given STUDENT.
We must insert a CURRICULUM before we can link (associate) STUDENTs to that CURRICULUM. That is why a CURRICULUM can have zero students: no students have yet been admitted to that CURRICULUM.
Once a CURRICULUM has been inserted into the database, we can link (associate) many STUDENTs with that CURRICULUM.
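These insert rules can be expressed as checks. The hypothetical `Enrollments` class below stores only identifiers and enforces the rules listed above: a STUDENT must be linked to at least one existing CURRICULUM when inserted, a CURRICULUM must exist before students are linked to it, and a CURRICULUM may exist with zero STUDENTs. Identifiers and curriculum names are invented.

```python
class Enrollments:
    """Hypothetical store enforcing the STUDENT/CURRICULUM cardinality rules."""

    def __init__(self) -> None:
        self.curricula: set[str] = set()
        self.student_curricula: dict[str, set[str]] = {}

    def insert_curriculum(self, name: str) -> None:
        # A CURRICULUM can exist with zero STUDENTs (minimum cardinality 0).
        self.curricula.add(name)

    def insert_student(self, student_id: str, curricula: list[str]) -> None:
        # A STUDENT IS ENROLLED IN one or more CURRICULA (minimum cardinality 1).
        if not curricula:
            raise ValueError("a student cannot be admitted without declaring a major")
        # Each CURRICULUM must be inserted before students can be linked to it.
        missing = set(curricula) - self.curricula
        if missing:
            raise ValueError(f"unknown curricula: {sorted(missing)}")
        self.student_curricula[student_id] = set(curricula)

    def students_in(self, curriculum: str) -> set[str]:
        # Maximum cardinality 'many': any number of students may be returned.
        return {s for s, cs in self.student_curricula.items() if curriculum in cs}


db = Enrollments()
for name in ("UNDECLARED", "HISTORY", "ECONOMICS"):
    db.insert_curriculum(name)
db.insert_student("S-1", ["HISTORY", "ECONOMICS"])   # one student, two curricula
```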

Relationships: Degree
The degree of a relationship is the number of entities that participate in the relationship. A binary relationship has a degree of 2, because two different entities participate in the relationship. Relationships may also exist between different instances of the same entity. This is called a recursive relationship (sometimes called a unary relationship; degree = 1). For example, in your school a course may be a prerequisite for other courses. Similarly, a course may have several other courses as its prerequisites.

Figure 5.4 A Recursive Relationship
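A recursive relationship links instances of the same entity, so it can be sketched as a mapping from one COURSE to other COURSEs. The course identifiers below are invented, and the traversal that collects indirect prerequisites is only an illustration of reading the relationship repeatedly; it is not part of the data model itself.

```python
def direct_prerequisites(course: str, prerequisites: dict[str, set[str]]) -> set[str]:
    # COURSE HAS AS PREREQUISITE zero, one, or more COURSEs.
    return prerequisites.get(course, set())


def courses_requiring(course: str, prerequisites: dict[str, set[str]]) -> set[str]:
    # The other direction: COURSE IS A PREREQUISITE FOR zero, one, or more COURSEs.
    return {c for c, pre in prerequisites.items() if course in pre}


def all_prerequisites(course: str, prerequisites: dict[str, set[str]]) -> set[str]:
    # Follow the recursive relationship until no new courses are found.
    result: set[str] = set()
    stack = list(prerequisites.get(course, ()))
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(prerequisites.get(current, ()))
    return result


PREREQUISITES = {
    "CS101": set(),
    "MATH120": set(),
    "CS201": {"CS101"},
    "CS301": {"CS201", "MATH120"},
}
```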

Relationships can also exist among more than two different entities. These are sometimes called N-ary relationships. A relationship existing among three entities is called a 3-ary or ternary relationship. An N-ary relationship may be associated with an associative entity. An associative entity is an entity that inherits its primary key from more than one other entity (its parents). Each part of that concatenated key points to one and only one instance of each of the connecting entities.

Figure 5.5 A Ternary Relationship
In the figure above, the associative entity SCHEDULED CLASS (notice the unique shape) matches a COURSE, a ROOM, and an INSTRUCTOR. For each instance of SCHEDULED CLASS, the key indicates which COURSE ID, which ROOM ID, and which INSTRUCTOR ID are combined to form that class. As also shown above, an associative entity can be described by its own non-key attributes. In addition to the primary key, a SCHEDULED CLASS is described by the attributes DIVISION NUMBER, DAYS OF WEEK, START TIME, and END TIME. If you think about it, none of these attributes describes a COURSE, ROOM, or INSTRUCTOR; they describe a single instance of the relationship among one instance of each of those three entities.
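The associative entity in Figure 5.5 can be sketched as a record whose concatenated key combines one COURSE ID, one ROOM ID, and one INSTRUCTOR ID, plus its own non-key attributes. The `schedule` helper is hypothetical; it only checks that each part of the key points to an existing parent instance, and all identifiers and times are invented.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class ScheduledClassKey:
    course_id: str          # points to one COURSE
    room_id: str            # points to one ROOM
    instructor_id: str      # points to one INSTRUCTOR


@dataclass
class ScheduledClass:
    key: ScheduledClassKey
    # Non-key attributes describe the relationship instance, not any one parent.
    division_number: int
    days_of_week: str
    start_time: time
    end_time: time


def schedule(key: ScheduledClassKey, courses: set[str], rooms: set[str],
             instructors: set[str], **attributes) -> ScheduledClass:
    """Create a SCHEDULED CLASS only if every key part matches a parent instance."""
    for part, parents in ((key.course_id, courses), (key.room_id, rooms),
                          (key.instructor_id, instructors)):
        if part not in parents:
            raise ValueError(f"{part!r} does not identify an existing parent")
    return ScheduledClass(key, **attributes)


section = schedule(ScheduledClassKey("CS101", "ROOM-12", "I-9"),
                   courses={"CS101"}, rooms={"ROOM-12"}, instructors={"I-9"},
                   division_number=1, days_of_week="MWF",
                   start_time=time(9, 0), end_time=time(9, 50))
```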

Relationships: Foreign Keys
A relationship implies that instances of one entity are related to instances of another entity. To be able to identify those instances for any given entity, the primary key of one entity must be migrated into the other entity as a foreign key. A foreign key is a primary key of one entity that is contributed to (duplicated in) another entity for the purpose of identifying instances of a relationship. A foreign key (always in a child entity) always matches the primary key (in a parent entity). For example, consider a relationship between the entities CURRICULUM and DEPARTMENT. A CURRICULUM is taught by exactly one DEPARTMENT. For a CURRICULUM, which DEPARTMENT teaches it? A DEPARTMENT teaches one or more CURRICULA. For a CURRICULUM, which STUDENTs are enrolled in that CURRICULUM?

Figure 5.6 How to Show Foreign Keys
In the figure above, we demonstrate the concept of foreign keys with our simple data model. In this case, DEPARTMENT is called the parent entity and CURRICULUM is the child entity. The primary key is always contributed by the parent to the child as a foreign key. Thus, an instance of CURRICULUM now has a foreign key, DEPARTMENT NAME, whose value points to the correct instance of DEPARTMENT that offers that curriculum. (Foreign keys are never contributed from child to parent.)
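The parent/child rule can be sketched with two in-memory tables keyed by their primary keys. This is an illustration under assumptions (plain dictionaries, invented department and curriculum names): the foreign key DEPARTMENT NAME lives only in the child, and both directions of the relationship are answered from it.

```python
departments = {
    "Computer Science": {"DEPARTMENT NAME": "Computer Science"},
    "History": {"DEPARTMENT NAME": "History"},
}
curricula: dict[str, dict[str, str]] = {}


def insert_curriculum(name: str, department_name: str) -> None:
    # The foreign key must match the primary key of an existing parent.
    if department_name not in departments:
        raise ValueError(f"no DEPARTMENT named {department_name!r}")
    curricula[name] = {"CURRICULUM NAME": name, "DEPARTMENT NAME": department_name}


def department_of(curriculum: str) -> str:
    # For a CURRICULUM, which DEPARTMENT teaches it? Read the foreign key.
    return curricula[curriculum]["DEPARTMENT NAME"]


def curricula_of(department: str) -> list[str]:
    # Which CURRICULA does a DEPARTMENT teach? Search the children by foreign key.
    return sorted(c for c, row in curricula.items() if row["DEPARTMENT NAME"] == department)


insert_curriculum("Information Systems", "Computer Science")
insert_curriculum("Software Engineering", "Computer Science")
insert_curriculum("Public History", "History")
```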

When you cannot tell which entity in a relationship is the parent and which is the child, the relationship is called a non-specific relationship. A non-specific (or many-to-many) relationship is one in which many instances of one entity are associated with many instances of another entity. Such relationships are suitable only for preliminary data models and should be resolved as quickly as possible. All non-specific relationships can be resolved into a pair of one-to-many relationships by inserting an associative entity between the two original entities.

Figure 5.7 Resolving Nonspecific Relationships with an Associative Entity
In figure (a) above, we see that a CURRICULUM is being studied by zero, one, or more STUDENTs. At the same time, we see that a STUDENT is studying one or more CURRICULA. The maximum cardinality on both sides is ‘many’. So which is the parent and which is the child? You can't tell! This is called a non-specific relationship. All non-specific relationships can be resolved into a pair of one-to-many relationships. As illustrated in figure (b) above, each entity becomes a parent, and a new associative entity is introduced as the child of each parent. In the figure, each instance of MAJOR represents one STUDENT's enrollment in one CURRICULUM. If a student is pursuing two majors, that student will have two instances of the entity MAJOR.
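Resolving the non-specific relationship can be sketched as follows. Each MAJOR record carries a concatenated key made of two foreign keys, STUDENT ID and CURRICULUM NAME, so STUDENT and CURRICULUM each become a parent in a one-to-many relationship. The identifiers and the DECLARED ON attribute are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MajorKey:
    student_id: str        # foreign key to STUDENT
    curriculum_name: str   # foreign key to CURRICULUM


majors: dict[MajorKey, date] = {}   # MAJOR: key -> DECLARED ON (hypothetical attribute)


def declare_major(student_id: str, curriculum_name: str, declared_on: date) -> None:
    key = MajorKey(student_id, curriculum_name)
    if key in majors:
        raise ValueError(f"{student_id} already has a MAJOR in {curriculum_name}")
    majors[key] = declared_on


def curricula_for(student_id: str) -> set[str]:
    # One STUDENT is the parent of one or more MAJORs.
    return {k.curriculum_name for k in majors if k.student_id == student_id}


def students_in(curriculum_name: str) -> set[str]:
    # One CURRICULUM is the parent of zero, one, or more MAJORs.
    return {k.student_id for k in majors if k.curriculum_name == curriculum_name}


# A student pursuing two majors has two MAJOR instances.
declare_major("S-1", "History", date(2024, 9, 1))
declare_major("S-1", "Economics", date(2025, 1, 15))
declare_major("S-2", "History", date(2024, 9, 1))
```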

Relationships: Generalization
Generalization is an approach that seeks to discover and exploit the commonalities between entities. It is a technique wherein the attributes that are common to several types of an entity are grouped into their own entity, called a supertype. An entity supertype is an entity whose instances store attributes that are common to one or more entity subtypes. The entity supertype will have one or more one-to-one relationships to entity subtypes. These relationships are sometimes called IS A relationships (or WAS A, or COULD BE A) because each instance of the supertype ‘is also an’ instance of one or more subtypes.

Most people associate the concept of generalization with modern object-oriented techniques. In reality, data modelers have applied the concept for many years. Consider, for example, an extension of the hypothetical academic scenario we've been using throughout this chapter. Our school enrolls STUDENTs and employs EMPLOYEEs. There are several attributes that are common to both entities; for example, NAME, GENDER, RACE, MARITAL STATUS, and possibly even a key based on SOCIAL SECURITY NUMBER. What if we consolidated these common attributes into an entity supertype called PERSON?

An entity subtype is an entity whose instances inherit some common attributes from an entity supertype and then add other attributes that are unique to instances of the subtype. An entity can be both a supertype and a subtype. Through inheritance, generalization in data models reduces the number of attributes by carefully sharing common attributes. The subtypes inherit not only the attributes but also the data types, domains, and defaults of those attributes. In addition to inheriting attributes, subtypes also inherit relationships to other entities.

Figure 5.8 A Generalization Hierarchy
In the above example, "a PERSON is an employee, or a student, or both." The top half of the figure illustrates this generalization as a hierarchy. Notice that the subtypes STUDENT and EMPLOYEE have inherited attributes from PERSON, as well as adding their own. (Unfortunately, most CASE tools do not actually migrate the inherited attributes.) Notice that a STUDENT (which is a subtype of PERSON) has its own subtypes. In the diagram, we see that a STUDENT is either a PROSPECT, a CURRENT STUDENT, or a FORMER STUDENT (having left for any reason other than graduation), and a STUDENT could also be an ALUMNUS. Each of these additional subtypes inherits all of the attributes from STUDENT, as well as those from PERSON. Notice also that all EMPLOYEEs and STUDENTs inherit the relationship between PERSON and PERSONAL ADDRESS. But only EMPLOYEEs inherit the relationship with EMPLOYMENT CONTRACTs, and only an ALUMNUS can be related to a DEGREE.
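In code, the IS A relationships of a generalization hierarchy resemble class inheritance. The sketch below uses Python dataclass inheritance so that STUDENT and EMPLOYEE inherit PERSON's attributes and ALUMNUS inherits from STUDENT; the attribute choices are a simplified assumption. Note that one Python object of a single class cannot be both a STUDENT and an EMPLOYEE, so this sketch does not capture the "or both" case, which the data model expresses through separate one-to-one relationships from PERSON to each subtype.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Person:
    """Supertype: attributes common to every subtype."""
    person_id: str
    name: str
    gender: str
    marital_status: str


@dataclass
class Employee(Person):
    """Subtype: inherits PERSON's attributes and adds its own."""
    hire_date: date


@dataclass
class Student(Person):
    curriculum: str


@dataclass
class Alumnus(Student):
    """A subtype of STUDENT, so it inherits from both STUDENT and PERSON."""
    degree: str


grad = Alumnus("P-3", "Ann Lee", "F", "SINGLE", "History", "BA")
inherited = [f.name for f in fields(Alumnus)]
# ['person_id', 'name', 'gender', 'marital_status', 'curriculum', 'degree']
```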

The Process of Logical Data Modeling
Strategic Data Modeling
Many organizations select application development projects based on strategic information system plans. Strategic planning is a separate project. This project produces an information systems strategy plan that defines an overall vision and architecture for information systems. Almost always, the architecture includes an enterprise data model.

Data modeling may be performed during various types of projects, and in multiple phases of projects. Data models are progressive: there is no such thing as the "final" data model for a business or application. Instead, a data model should be considered a living document that will change in response to a changing business. Data models should ideally be stored in a repository so that they can be retrieved, expanded, and edited over time.

An enterprise data model typically identifies only the most fundamental of entities. The entities are typically defined (as in a dictionary), but they are not described in terms of keys or attributes. The enterprise data model may or may not include relationships (depending on the planning methodology's standards and the level of detail desired by executive management). If relationships are included, many of them will be non-specific.

How does an enterprise data model impact subsequent applications development? Part of the information strategy plan identifies application development projects and prioritizes them according to whatever criteria management deems appropriate. As those projects are started, the appropriate subsets of the information system architecture, including a subset of the enterprise data model, are provided to the application development team as a point of departure. The enterprise data model is usually stored in a corporate repository. When the application development project is started, the subset of the enterprise data model (as well as the other models) is exported from the corporate repository into a project repository. Once the project team completes systems analysis and design, the expanded and refined data models are imported back into the corporate repository.

Data Modeling During Systems Analysis
The data model for a single system or application is usually called an application data model. Logical data models have a DATA focus and a SYSTEM USER perspective. Logical data models are typically constructed as deliverables of the study and definition phases of a project. Logical data models are not concerned with implementation details or technology. They may be constructed (through reverse engineering) from existing databases.

Data models are rarely constructed during the survey phase of systems analysis; the short duration of that phase makes them impractical. On the other hand, if an enterprise data model exists, the subset of that model that is applicable to the project might be retrieved and reviewed as part of the survey phase requirement to establish context. Alternatively, the project team could identify a simple list of entities: the things about which they think the system will have to capture and store data.

Figure 5.9 Data Modeling in the FAST Methodology

Data modeling is rarely associated with the study phase of systems analysis. Most analysts prefer to draw process models to document the current system. However, many analysts report that data models are far superior for the following reasons:
Data models help analysts to quickly identify business vocabulary more completely than process models.
Data models are almost always built more quickly than process models.
A complete data model can fit on a single sheet of paper, whereas process models often require dozens of sheets.
Process modelers too easily get hung up on unnecessary detail.

Another reason: data models for existing and proposed systems are far more similar than process models for existing and proposed systems. Consequently, there is less work to throw away as you move into later phases.
A study phase model should include only entities and relationships, but no attributes: a context data model. The intent is to refine the understanding of scope, not to get into details about the entities and business rules.

The definition phase data model will be constructed in at least two stages:
First, a key-based data model will be drawn. This model will eliminate non-specific relationships, add associative entities, and include primary, alternate, and foreign keys, plus precise cardinalities and any generalization hierarchies.
Second, a fully attributed data model will be constructed. The fully attributed model includes all remaining descriptive attributes and subsetting criteria. Each attribute is defined in the repository with data types, domains, and defaults.
The completed data model represents all of the business requirements for a system's database.

Looking Ahead to Systems Configuration and Design
The logical data model from systems analysis describes business data requirements, not technical solutions. The purpose of the configuration phase is to determine the best way to implement those requirements with database technology. During system design, the logical data model will be transformed into a physical data model (called a database schema) for the chosen database management system. This model will reflect the technical capabilities and limitations of that database technology, as well as the performance tuning requirements suggested by the database administrator. The physical data model will also be analyzed for adaptability and flexibility through a process called normalization.

Fact-Finding and Information Gathering for Data Modeling
Data models cannot be constructed without appropriate facts and information as supplied by the user community. These facts can be collected by a number of techniques, such as sampling existing forms and files, researching similar systems, surveying users and management, and interviewing users and management. The fastest method of collecting facts and information while simultaneously constructing and verifying the data models is Joint Application Development (JAD).

Table 5.4 JAD and Interview Questions for Data Modeling
The table above summarizes some questions that may be useful for fact finding and information gathering as it pertains to data modeling.

Computer-Aided Systems Engineering (CASE) for Data Modeling
Data models are stored in the repository. In a sense, the data model is metadata, that is, data about the business's data. Computer-aided systems engineering (CASE) technology provides the repository for storing the data model and its detailed descriptions.

Computer-Aided Systems Engineering (CASE) for Data Modeling (continued)
Using a CASE product, you can easily create professional, readable data models without the use of paper, pencil, erasers, and templates. The models can be easily modified to reflect corrections and changes suggested by end-users. Most CASE products provide powerful analytical tools that can check your models for mechanical errors, completeness, and consistency.

Computer-Aided Systems Engineering (CASE) for Data Modeling (continued)
Not all data model conventions are supported by all CASE products. Any given CASE product is likely to force the company to adapt its methodology's data modeling symbols or approach so that they are workable within the limitations of that CASE tool.

How to Construct Data Models
1st Step - Entity Discovery
The first task in data modeling is to discover the fundamental entities in the system that are or might be described by data. Several techniques may be used to identify entities:
- During interviews or JAD sessions with system owners and users, pay attention to key words in their discussion.
- During interviews or JAD sessions, specifically ask the system owners and users to identify things about which they would like to capture, store, and produce information.
- Study existing forms and files.
- Use CASE tools that can reverse engineer existing files and databases into physical data models.
For example, during an interview with an individual discussing SoundStage's business environment and activities, a user may state, "We have to keep track of all our members and the many clubs in which they are enrolled." Notice that the key words in this statement are MEMBERs and CLUBs. Both are entities!
Another technique for identifying entities is to study existing forms and files. Some forms identify event entities; examples include ORDERs, REQUISITIONs, PAYMENTs, DEPOSITs, and so forth. But most of these same forms also contain data that describe other entities. Consider a registration form used in your school's course registration system. A REGISTRATION is itself an event entity, but the average registration form also contains data that describe other entities, such as STUDENT (a person), COURSEs (which are concepts), INSTRUCTORs (other persons), ADVISOR (yet another person), DIVISIONs (another concept), and so forth. These same entities could also be discovered by studying the computerized registration system's files, databases, or outputs.
When a CASE tool reverse engineers existing files and databases into a physical data model, the analyst must usually clean up the resulting model by replacing physical names, codes, and comments with their logical, business-friendly equivalents.

1st Step - Entity Discovery (continued)
A true entity has multiple instances: dozens, hundreds, thousands, or more! Entities should be named with nouns that describe the person, event, place, or tangible thing about which we want to store data. Try not to abbreviate or use acronyms. Names should be singular so as to distinguish the logical concept of the entity from the actual instances of the entity. Names may include appropriate adjectives or clauses to better describe the entity; for instance, an externally generated CUSTOMER ORDER must be distinguished from an internally generated PURCHASE ORDER.
Define each entity in business terms. Don't define the entity in technical terms, and don't define it as 'data about …'. Your entity names and definitions should establish an initial glossary of business terminology that will serve both you and future analysts and users for years to come.

Table 5.5 Fundamental Entities for the SoundStage Project

2nd Step - The Context Data Model
The second task in data modeling is to construct the context data model. The context data model includes the fundamental, or independent, entities that were previously discovered. An independent entity is one that exists regardless of the existence of any other entity; its primary key contains no attributes that would make it dependent on the existence of another entity. Independent entities are almost always the first entities discovered in your conversations with the users.
Relationships should be named with verb phrases that, when combined with the entity names, form simple business sentences or assertions. Always name the relationship from parent to child. Some CASE tools, such as System Architect, let you name the relationships in both directions.

Figure 5.11 The SoundStage Context Data Model
The ERD communicates the following:
- A CLUB establishes one or more AGREEMENTs. Members learn about these agreements through advertisements and other marketing programs.
- An AGREEMENT is established by exactly one CLUB. The double hash marks mean one-and-only-one.
- An AGREEMENT binds zero, one, or more MEMBERs; members join clubs via such an agreement. Why zero? Because a club may be new, with no members as yet.
- A MEMBER is bound by one or more AGREEMENTs. Note: this is a non-specific (or many-to-many) relationship.
- A MEMBER belongs to one or more CLUBs.
- A CLUB enrolls zero, one, or more MEMBERs. Again, the club may be new.
- Each month or quarter, a CLUB sponsors zero, one, or more PROMOTIONs. Why zero? Again, a club may be just starting and not yet offering promotions.
- A PROMOTION is sponsored by exactly one CLUB.
- Each PROMOTION features exactly one PRODUCT.
- A PRODUCT is featured in zero, one, or more PROMOTIONs. For example, a CD that appeals to both country/western and light rock audiences might be featured in the promotions for both. Since products greatly outnumber promotions, most products are never featured in a promotion.
- A PROMOTION generates many MEMBER ORDERs. These are dated orders to which a member must reply by the specified date; otherwise, the order is filled. The promotion always generates more than one order; in fact, it generates one order per club member.
- A MEMBER ORDER is generated by zero or one PROMOTION. Why zero? In the desired system, a member can initiate their own order.
More than one relationship may exist between the same two entities if the separate relationships communicate different business events or associations. Thus:
- A MEMBER responds to zero, one, or more MEMBER ORDERs. This relationship supports the promotion-generated orders.
- A MEMBER places zero, one, or more MEMBER ORDERs. This relationship supports member-initiated orders.
- In both cases, a MEMBER ORDER is placed by (or responded to by) exactly one MEMBER.
- A MEMBER ORDER sells one or more PRODUCTs.
- A PRODUCT is sold on zero, one, or more MEMBER ORDERs.
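Because relationships are named so that they read as business sentences, the readings listed above can be generated mechanically from a small description of each relationship. The Python sketch below is illustrative only: the `Relationship` class and the cardinality phrase table are assumptions, not features of System Architect or any other CASE tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from (minimum, maximum) cardinality to the phrases used
# in the ERD reading; None as the maximum means "many".
CARDINALITY_PHRASES = {
    (1, 1): "exactly one",
    (0, 1): "zero or one",
    (1, None): "one or more",
    (0, None): "zero, one, or more",
}


@dataclass(frozen=True)
class Relationship:
    from_entity: str            # entity that starts the sentence
    verb_phrase: str            # e.g. "establishes", "is featured in"
    to_entity: str
    min_count: int              # minimum instances of to_entity
    max_count: Optional[int]    # maximum instances of to_entity; None = many

    def sentence(self) -> str:
        """Read the relationship as a simple business assertion."""
        article = "An" if self.from_entity[0] in "AEIOU" else "A"
        phrase = CARDINALITY_PHRASES[(self.min_count, self.max_count)]
        # Append a lowercase "s" only when many instances are possible.
        target = self.to_entity + ("s" if self.max_count is None else "")
        return f"{article} {self.from_entity} {self.verb_phrase} {phrase} {target}."


# A few parent-to-child relationships from the SoundStage context model.
context_model = [
    Relationship("CLUB", "establishes", "AGREEMENT", 1, None),
    Relationship("CLUB", "sponsors", "PROMOTION", 0, None),
    Relationship("PRODUCT", "is featured in", "PROMOTION", 0, None),
]

for rel in context_model:
    print(rel.sentence())
```

Running it prints "A CLUB establishes one or more AGREEMENTs." followed by the other two readings. If a generated sentence does not read as a sensible business assertion, that is a hint the verb phrase or cardinality needs another look with the users.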

3rd Step - The Key-Based Data Model
The third task is to identify the keys of each entity. The following guidelines are suggested for keys:
- The value of a key should not change over the lifetime of each entity instance. For example, NAME would be a poor key, since a person's last name could change through marriage or divorce.
- The value of a key cannot be null.
- Controls must be installed to ensure that the value of a key is valid.
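The three key guidelines can also be enforced in application code. The Python sketch below is one way to do it; the `MemberKey` class and its seven-digit validation rule are hypothetical, and a real rule would come from the business.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a key value cannot be reassigned after creation
class MemberKey:
    member_number: str

    def __post_init__(self) -> None:
        # Guideline: the value of a key cannot be null (or empty).
        if not self.member_number:
            raise ValueError("MEMBER NUMBER is required")
        # Guideline: controls must ensure the key value is valid.
        # Hypothetical control: a member number is exactly seven digits.
        if not (self.member_number.isdigit() and len(self.member_number) == 7):
            raise ValueError(f"invalid MEMBER NUMBER: {self.member_number!r}")
```

Freezing the dataclass only keeps the key from changing within one running program. In a database, the same intent is typically expressed with NOT NULL and CHECK constraints on the key column.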

3rd Step - The Key-Based Data Model (continued)
Some experts suggest that you avoid intelligent keys because the key may change over the lifetime of the entity instance. An intelligent key is a business code whose structure communicates data about an entity instance (such as its classification, size, or other properties). A code is a group of characters and/or digits that identifies and describes something in the business system. Other experts favor intelligent keys because business codes can return value to the organization: they can be quickly processed by humans without the assistance of a computer.

3rd Step - The Key-Based Data Model (continued)
Consider inventing a surrogate key to substitute for large concatenated keys of independent entities. This suggestion is not practical for associative entities, because each part of the concatenated key is a foreign key that must precisely match its parent entity's primary key.
If you cannot define keys for an entity, it may be that the entity doesn't really exist; that is, multiple occurrences of the so-called entity do not exist.
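The surrogate key suggestion can be sketched as a registry that hands out short serial values in place of a long concatenated key. Everything here is illustrative: the `SurrogateKeyRegistry` class is hypothetical, and the example natural key (subject, course number, catalog year) is an assumed entity, not one from SoundStage.

```python
from itertools import count
from typing import Dict, Hashable, Tuple


class SurrogateKeyRegistry:
    """Hypothetical helper: maps concatenated natural keys to surrogate keys."""

    def __init__(self) -> None:
        self._next_value = count(1)   # serial surrogate values 1, 2, 3, ...
        self._by_natural_key: Dict[Tuple[Hashable, ...], int] = {}

    def key_for(self, *natural_key: Hashable) -> int:
        """Return the surrogate for a natural key, assigning one on first use."""
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next_value)
        # The same natural key always maps to the same surrogate, so the
        # surrogate does not change over the instance's lifetime.
        return self._by_natural_key[natural_key]


registry = SurrogateKeyRegistry()
first = registry.key_for("CIS", "3050", "2024")
again = registry.key_for("CIS", "3050", "2024")
other = registry.key_for("ACC", "2010", "2024")
```

The original concatenated key would normally still be recorded, for example as an alternate key, so that duplicate instances can be detected.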

3rd Step - The Key-Based Data Model: Business Codes
There are several types of codes, and they can be combined to form effective means for entity instance identification.
- Serial codes assign sequentially generated numbers to entity instances. Many database management systems can generate and constrain serial codes to a business's requirements.
- Block codes are similar to serial codes except that serial numbers are divided into groups that have some business meaning. For instance, a satellite television provider might assign a separate block of channel numbers to each of the following groups: PAY PER VIEW channels, CABLE CHANNELS, SPORT channels, ADULT PROGRAMMING channels, MUSIC-ONLY channels, INTERACTIVE GAMING channels, INTERNET channels, PREMIUM CABLE channels, and PREMIUM MOVIE AND EVENT channels.
- Alphabetic codes use finite combinations of letters (and possibly numbers) to describe entity instances. For example, each STATE has a unique two-character alphabetic code. Alphabetic codes must usually be combined with serial or block codes in order to uniquely identify instances of most entities.
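A block code behaves like one serial counter per block. The sketch below allocates codes that way. The category names come from the satellite example above, but the numeric ranges are invented for illustration because the example does not specify them.

```python
from typing import Dict, Iterator

# Hypothetical blocks: these ranges are illustrative only.
CHANNEL_BLOCKS: Dict[str, range] = {
    "PAY PER VIEW": range(100, 200),
    "SPORT": range(200, 300),
    "MUSIC-ONLY": range(300, 400),
}


class BlockCodeAllocator:
    """Assigns serial numbers within the block reserved for each category."""

    def __init__(self, blocks: Dict[str, range]) -> None:
        # One independent serial counter per block.
        self._counters: Dict[str, Iterator[int]] = {
            name: iter(block) for name, block in blocks.items()
        }

    def next_code(self, category: str) -> int:
        try:
            return next(self._counters[category])
        except StopIteration:
            # The block is full, so the scheme was not expandable enough.
            raise ValueError(f"block for {category!r} is exhausted") from None
```

The first two SPORT channels would receive 200 and 201, and the first PAY PER VIEW channel would receive 100. The number alone therefore tells a person which group a channel belongs to.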

3rd Step - The Key-Based Data Model: Business Codes (continued)
- In significant position codes, each digit or group of digits describes a measurable or identifiable characteristic of the entity instance. Significant position codes are frequently used to code inventory items. The codes you see on tires and light bulbs are examples; they tell us about characteristics such as tire size and wattage, respectively.
- Hierarchical codes provide a top-down interpretation for an entity instance. Every item coded is factored into groups, subgroups, and so forth. For instance, we could code employee positions as follows:
  - The first digit identifies the classification (e.g., clerical, faculty).
  - The second and third digits indicate the level within the classification.
  - The fourth and fifth digits indicate the calendar of employment.
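Each position in a hierarchical code has a fixed meaning, so decoding it is mostly a matter of slicing. The sketch below decodes the five-digit employee position code described above. The digit-to-classification table is hypothetical; the text names only "clerical" and "faculty".

```python
from typing import Dict, Union

# Hypothetical first-digit values; only the two example classifications are shown.
CLASSIFICATIONS: Dict[str, str] = {"1": "clerical", "2": "faculty"}


def parse_position_code(code: str) -> Dict[str, Union[str, int]]:
    """Split a five-digit hierarchical position code into its groups."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected five digits, got {code!r}")
    return {
        "classification": CLASSIFICATIONS.get(code[0], "unknown"),  # 1st digit
        "level": int(code[1:3]),                                    # 2nd-3rd digits
        "calendar": code[3:5],                                      # 4th-5th digits
    }
```

For example, under these assumptions "20312" decodes to a faculty position at level 3 with calendar "12".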

3rd Step - The Key-Based Data Model: Business Codes (continued)
The following guidelines are suggested when creating a business coding scheme:
- Codes should be expandable to accommodate growth.
- The full code must result in a unique value for each entity instance.
- Codes should be large enough to describe the distinguishing characteristics, but small enough to be interpreted by people without a computer.
- Codes should be convenient. A new instance should be easy to create.
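Some of these guidelines can be checked automatically before a scheme is adopted. The function below is a sketch only: its name, its parameters, and the 80 percent growth threshold are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List


def check_coding_scheme(codes: Iterable[str], max_length: int, capacity: int) -> List[str]:
    """Return guideline violations for a proposed set of business codes.

    capacity is the total number of codes the scheme can ever express.
    """
    codes = list(codes)
    problems: List[str] = []
    # Uniqueness: the full code must identify exactly one entity instance.
    duplicates = sorted(code for code, n in Counter(codes).items() if n > 1)
    if duplicates:
        problems.append(f"duplicate codes: {duplicates}")
    # Size: long codes are hard for people to interpret without a computer.
    too_long = sorted(code for code in set(codes) if len(code) > max_length)
    if too_long:
        problems.append(f"codes longer than {max_length}: {too_long}")
    # Expandability: hypothetical rule that at least 20% of the space stays free.
    if len(set(codes)) > 0.8 * capacity:
        problems.append("less than 20% of the code space remains for growth")
    return problems
```

Convenience cannot be measured this way. It still depends on how easily clerks can create a new code when a new instance appears.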

Figure 5.12 The SoundStage Key-Based Data Model
The figure above is the key-based data model for the SoundStage project. We have eliminated all non-specific relationships by resolving them into associative entities and one-to-many relationships (as described earlier in the chapter). Since all of our relationships are now one-to-many, we have adopted the common practice of naming each relationship from parent to child; the inverse relationship, while not shown, is implicit. We call your attention to the following noteworthy items:
- Many entities have a simple, single-attribute primary key (PK1).
- In the PRODUCT entity, either of two attributes could uniquely identify an instance of the entity. We designate them as separate primary keys (PK1 and PK2).
- Notice how the primary keys for AGREEMENT and PROMOTION were constructed. Each has a concatenated key, part of which is inherited from the parent entity CLUB. You can tell because CLUB NAME is also marked as a foreign key (FK). When one entity contributes its key to another entity across a relationship, the relationship is said to be identifying, because it helps to identify the child entity. Notice that all of the attributes that make up the concatenated key have the same primary key number, PK1.
- We resolved the non-specific relationship between ORDER and PRODUCT by introducing the associative entity ORDER ON A PRODUCT. Each associative entity instance represents one product on one order. The parent entities contributed their own primary keys to make up the associative entity's concatenated key (PK1), and each attribute in that concatenated key is a foreign key that points back to the correct parent instance.
- CLUB MEMBERSHIP is a ternary relationship that simultaneously associates one MEMBER, CLUB, and AGREEMENT. Thus, its concatenated key consists of four attributes contributed by the three participating parent entities.
All relationships contribute foreign keys from parent to child. You just learned that if the contributed foreign key helps to uniquely identify instances of the child entity, the relationship is said to be identifying. On the other hand, if the foreign key plays no role in identifying instances of the child entity, it is recorded as non-key data in our model; its only purpose is to point to the child entity instance's specific parent. For example, MEMBER NUMBER in the MEMBER ORDER entity serves only to point to the correct MEMBER entity instance for an order. In this case, the relationship is called non-identifying.
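To make identifying and non-identifying relationships concrete, the sketch below expresses part of the key-based model as SQLite tables through Python's standard `sqlite3` module. It simplifies under stated assumptions. Table and column names such as `promotion_date` and `ordered_product` are hypothetical. The logical model is normally transformed into a physical schema only during system design.

```python
import sqlite3

# Hypothetical physical names for part of the SoundStage key-based model.
DDL = """
CREATE TABLE club (
    club_name TEXT PRIMARY KEY
);

-- Identifying relationship: CLUB's key is part of PROMOTION's concatenated key.
CREATE TABLE promotion (
    club_name      TEXT NOT NULL REFERENCES club (club_name),
    promotion_date TEXT NOT NULL,
    PRIMARY KEY (club_name, promotion_date)
);

CREATE TABLE member (
    member_number INTEGER PRIMARY KEY
);

-- Non-identifying relationship: member_number is non-key data that only
-- points to the correct parent MEMBER.
CREATE TABLE member_order (
    order_number  INTEGER PRIMARY KEY,
    member_number INTEGER NOT NULL REFERENCES member (member_number)
);

CREATE TABLE product (
    product_number INTEGER PRIMARY KEY
);

-- Associative entity: one product on one order. Each part of the
-- concatenated key is a foreign key to a parent's primary key.
CREATE TABLE ordered_product (
    order_number   INTEGER NOT NULL REFERENCES member_order (order_number),
    product_number INTEGER NOT NULL REFERENCES product (product_number),
    PRIMARY KEY (order_number, product_number)
);
"""


def build_schema() -> sqlite3.Connection:
    """Create the tables in an in-memory database with FK checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(DDL)
    return conn
```

With foreign keys enabled, an order for an unknown member or a second row for the same product on the same order is rejected with `sqlite3.IntegrityError`.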

4th Step - Generalization Hierarchies
At this time, it would be useful to identify any generalization hierarchies in the business problem.

Figure 5.13 The SoundStage Key-Based Data Model with a Generalization Hierarchy
The SoundStage minicase at the beginning of this chapter identified at least one supertype/subtype structure. Subsequent discussions uncovered the generalization hierarchy shown in the figure above. We had to lay out the model somewhat differently because of the hierarchy; however, the relationships and keys that were previously defined have been retained. We call your attention to the following:
- The SoundStage CASE tool automatically draws a dashed box around the generalization hierarchy.
- The subtypes inherited the keys of the supertype.
- We disconnected PROMOTION from PRODUCT, where it was shown earlier, and reconnected it to the subtype TITLE. This was done to properly assert that MERCHANDISE is never featured in a PROMOTION.
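A generalization hierarchy maps naturally onto class inheritance. In the Python sketch below, the subtypes inherit the supertype's key, and PROMOTION accepts only a TITLE. The entity names come from the SoundStage model; the attributes `title_name`, `description`, and `club_name` on these classes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:               # supertype: holds the key shared by all subtypes
    product_number: int


@dataclass(frozen=True)
class Title(Product):        # subtype: inherits product_number as its key
    title_name: str = ""


@dataclass(frozen=True)
class Merchandise(Product):  # subtype: never featured in a promotion
    description: str = ""


@dataclass(frozen=True)
class Promotion:
    club_name: str
    featured: Title          # relationship now points at the TITLE subtype

    def __post_init__(self) -> None:
        # Enforce the business rule at runtime as well as in the type hint.
        if not isinstance(self.featured, Title):
            raise TypeError("only a TITLE can be featured in a PROMOTION")
```

Attaching the relationship to the subtype rather than the supertype is what lets the model state the business rule directly, instead of leaving it to a comment or a validation routine.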

5th Step - The Fully Attributed Data Model
The fifth task is to identify the remaining data attributes. The following guidelines are offered for attribution:
- Many organizations have naming standards and approved abbreviations. The data or repository administrator usually maintains such standards.
- Choose attribute names carefully. Many attributes share common base names such as NAME, ADDRESS, and DATE. Unless the attributes can be generalized into a supertype, it is best to give each variation a unique name, such as CUSTOMER NAME, CUSTOMER ADDRESS, ORDER DATE, SUPPLIER NAME, SUPPLIER ADDRESS, INVOICE DATE, EMPLOYEE NAME, EMPLOYEE ADDRESS, and FLIGHT DATE. Some organizations maintain reusable, global templates for these common base attributes, which promotes consistent data types, domains, and defaults across all applications.
- Names must be distinguishable across projects.
- Logical attribute names should not be abbreviated. Physical attribute names on existing forms and reports are frequently abbreviated to save space, but logical attribute names should be clearer; for example, translate the order form's attribute COD into its logical equivalent, AMOUNT TO COLLECT ON DELIVERY; translate QTY into QUANTITY ORDERED; and so forth.
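The naming guidelines lend themselves to a simple lookup. The sketch below translates abbreviated physical names into logical ones and flags bare base names. The translation table holds only the two examples from the text; in practice it would come from the organization's approved abbreviations.

```python
from typing import Dict, Set

# Illustrative entries taken from the examples above.
LOGICAL_NAMES: Dict[str, str] = {
    "COD": "AMOUNT TO COLLECT ON DELIVERY",
    "QTY": "QUANTITY ORDERED",
}

# Common base names that should be qualified with an entity or event name.
BASE_NAMES: Set[str] = {"NAME", "ADDRESS", "DATE"}


def to_logical_name(physical_name: str) -> str:
    """Translate an abbreviated form or report field name, if it is known."""
    name = physical_name.strip().upper()
    return LOGICAL_NAMES.get(name, name)


def needs_qualifier(attribute_name: str) -> bool:
    """True for a bare base name such as NAME instead of CUSTOMER NAME."""
    return attribute_name.strip().upper() in BASE_NAMES
```

Unknown names pass through unchanged, so an analyst still has to review anything the table does not cover.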

5th Step - The Fully Attributed Data Model (continued)
The following guidelines are also offered for attribution:
- For attributes that have only YES or NO values, name them as questions; for example, CANDIDATE FOR A DEGREE?
- Each attribute should be mapped to only one entity. Foreign keys are the exception: they identify associated instances of related entities.
- An attribute's domain should not be based on logic. For example, in the SoundStage case we learned that the values of MEDIA depended on the type of product. If the product type is video, the media could be VHS tape, 8mm tape, laserdisc, or DVD; if the product type is audio, the media could be cassette tape, CD, or MD. The best solution is to define two separate attributes, each with its own domain: AUDIO MEDIA and VIDEO MEDIA.
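Splitting MEDIA into two attributes gives each one a fixed domain that can be checked without consulting the product type. A sketch using Python enumerations follows. The member names are assumptions; the values come from the SoundStage example.

```python
from enum import Enum


class AudioMedia(Enum):
    """Domain of the AUDIO MEDIA attribute."""
    CASSETTE_TAPE = "cassette tape"
    CD = "CD"
    MD = "MD"


class VideoMedia(Enum):
    """Domain of the VIDEO MEDIA attribute."""
    VHS_TAPE = "VHS tape"
    EIGHT_MM_TAPE = "8mm tape"
    LASERDISC = "laserdisc"
    DVD = "DVD"


def is_valid(domain: type, value: str) -> bool:
    """Check a value against one attribute's domain; no product-type logic needed."""
    try:
        domain(value)
    except ValueError:
        return False
    return True
```

With a single MEDIA attribute, validating "DVD" would require knowing the product type first. With two attributes, `is_valid(AudioMedia, "DVD")` is simply false.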

Figure 5.14 The SoundStage Fully Attributed Data Model

6th Step - The Fully Described Model
The last task is to fully describe the data model. This task is the most time-consuming. It can be started in parallel with the key-based or fully attributed model, but it is usually the last data modeling task completed. At this point the descriptions of the attributes are still incomplete because they require domains. Most CASE tools provide extensive facilities for describing the data types, domains, and defaults for all attributes to the repository.

6th Step - The Fully Described Model (continued)
Additional descriptive properties may be recorded for attributes, such as:
- Who should be able to create, delete, update, and access each attribute?
- How long should each attribute (or entity) be kept before the data is deleted or archived?
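A repository entry for one attribute might record these properties alongside the data type, domain, and default. The dataclass below is a hypothetical sketch of such an entry, not the format of any particular CASE repository. The MEMBER STATUS attribute and its values are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class AttributeDescription:
    """Hypothetical repository entry describing one attribute."""
    name: str
    data_type: str                    # e.g. "CHAR(1)"
    domain: str                       # allowed values, in business terms
    default: Optional[str] = None
    # Who may create, delete, update, or access the attribute (action -> roles).
    permissions: Dict[str, Set[str]] = field(default_factory=dict)
    # How long to keep values before deleting or archiving; None = indefinitely.
    retention_years: Optional[int] = None

    def allows(self, role: str, action: str) -> bool:
        return role in self.permissions.get(action, set())

    def due_for_archive(self, age_years: int) -> bool:
        return self.retention_years is not None and age_years >= self.retention_years


member_status = AttributeDescription(
    name="MEMBER STATUS",
    data_type="CHAR(1)",
    domain="A = active, I = inactive",
    default="A",
    permissions={"update": {"membership clerk"}},
    retention_years=7,
)
```

Keeping access rules and retention beside the domain means one repository entry can answer both of the questions above for each attribute.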

The Next Generation
Data modeling should remain a value-added skill for many years. The demand for data modeling as a skill depends on two factors: (1) the need for databases, and (2) the use of relational database management system technology to implement those databases. Some believe that relational database technology will eventually be replaced by object technology; if that were to happen, data modeling would be replaced by object modeling techniques. Even as object database technology becomes available, however, we expect the relational database industry to add object features and technologies to its product lines.

The Next Generation (continued)
CASE technology will continue to improve. Today's better CASE tools provide two-way synchronization between the logical data models and their database designs. This synchronization will likely extend further as CASE vendors enable their tools to communicate and interoperate directly with database management systems and working databases.

Summary
- Introduction
- An Introduction to Systems Modeling
- System Concepts for Data Modeling
- The Process of Logical Data Modeling
- How to Construct Data Models
- The Next Generation
