1
Conventional Files Versus the Database
Introduction All information systems create, read, update and delete data. This data is stored in files and databases. Files are collections of similar records. Databases are collections of interrelated files. The key word is interrelated. The records in each file must allow for relationships (think of them as ‘pointers’) to the records in other files. In the file environment, data storage is built around the applications that will use the files. In the database environment, applications will be built around the integrated database. A database is not merely a collection of files. For example, a SALES database might contain ORDER records that are somehow ‘linked’ to their corresponding CUSTOMER and PRODUCT records. The database is not necessarily dependent on the applications that will use it. In other words, given a database, new applications can be built to share that database. Each environment has its advantages and disadvantages.
2
395-396 Figure 11.1 Conventional Files versus the Database
3
Conventional Files Versus the Database
The Pros and Cons of Database Pros: The principal advantage of a database is the ability to share the same data across multiple applications and systems. Database technology offers the advantage of storing data in flexible formats. Databases allow the use of the data in ways not originally specified by the end-users (data independence). The database scope can even be extended without impacting existing programs that use it. New fields and record types can be added to the database without affecting current programs. A common misconception about the database approach is that you can build a single, super-database that contains all data items of interest to an organization. This notion, however desirable, is not currently practical, because such a complex database would take far too long to build. Realistically, most organizations build several databases, each one sharing data with several information systems. Thus, there will be some redundancy between databases. However, this redundancy is both greatly reduced and, ultimately, controlled. Flexible storage formats are possible because databases are defined separately from the information systems and application programs that will use them. Theoretically, this allows us to use the data in ways not originally specified by the end-users. Care must be taken to truly achieve this data independence. If the database is well designed, different combinations of the same data can be easily accessed to fulfill future report and query needs.
4
Conventional Files Versus the Database
The Pros and Cons of Database Cons: Database technology is more complex than file technology. Special software, called a database management system (DBMS), is required. A DBMS is still somewhat slower than file technology. Database technology requires a significant investment. The cost of developing databases is higher because analysts and programmers must learn how to use the DBMS. In order to achieve the benefits of database technology, analysts and database specialists must adhere to rigorous design principles. Another potential problem with the database approach is the increased vulnerability inherent in the use of shared data. You are literally placing all your eggs in one basket. Therefore, backup and recovery, and security and privacy, become important issues in the world of databases. Despite the problems discussed, database usage is growing by leaps and bounds. The technology will continue to improve, and performance limitations will all but disappear. Design methods and tools will also improve. For these reasons, this chapter will focus on database design as an important skill for tomorrow’s systems analysts.
5
Conventional Files Versus the Database
Database Design in Perspective To fully exploit the advantages of database technology, a database must be carefully designed. The end product is called a database schema, a technical blueprint of the database. Database design translates the data models that were developed for the system users during the definition phase into data structures supported by the chosen database technology. Subsequent to database design, system builders will construct those data structures using the language and tools of the chosen database technology.
6
Figure 11.2 Database Design in the Information Systems Framework No additional notes provided.
7
Database Concepts Databases
Databases provide for the technical implementation of entities and relationships. The history of information systems has led to one inescapable conclusion: Data is a resource that must be controlled and managed! Out of necessity, database technology was created so an organization could maintain and use its data as an integrated whole instead of as separate data files. As described earlier, stand-alone, application-specific files were once the lifeblood of most information systems; however, they are being slowly but surely replaced with databases. Recall that a database may loosely be thought of as a set of interrelated files. By interrelated, we mean that records in one file may be associated with the records in a different file. For example, a STUDENT record may be linked to all of that student’s COURSE records. In turn, a COURSE record may be linked to the STUDENT records that indicate completion of that course. This two-way linking and flexibility allows us to eliminate most of the need to redundantly store the same fields in the different record types. Thus, in a very real sense, multiple files are consolidated into a single file: the database. So many applications are now being built around database technology that database design has become an important skill for the analyst. Indeed, database technology, once considered important only to the largest corporations with the largest computers, is now common for applications developed on microcomputers and departmental networks. Few, if any, information systems staffs have avoided the frustration of uncontrolled growth and duplication of data stored in their systems. As systems were developed, implemented, and maintained, the common data needed by the different systems was duplicated in multiple, conventional files.
This duplication carried with it a number of costs: extra storage space required, duplicated input to maintain redundantly stored data and files, and data integrity problems (e.g., the ADDRESS for a specific customer not matching in the various files that contain that customer’s ADDRESS).
8
401-402 Figure 11.3 A Typical Modern Data Architecture
The figure above illustrates the data architecture into which many companies have evolved. As shown in the figure, most companies still have numerous conventional file-based information system applications, most of which were developed prior to the emergence of high performance database technology. In many cases, the processing efficiency of these files or the projected cost to redesign these files has slowed conversion of the systems to database.
9
Database Concepts Databases Database Architecture:
Database architecture refers to the database technology including the database engine, database management utilities, database CASE tools for analysis and design, and database application development tools. The control center of a database architecture is its database management system. A database management system (DBMS) is specialized computer software available from computer vendors that is used to create, access, control, and manage the database. The core of the DBMS is often called its database engine. The engine responds to specific commands to create database structures, and then to create, read, update, and delete records in the database. The database management system is purchased from a database technology vendor such as Oracle, IBM, Microsoft, or Sybase.
10
Database Concepts Databases Database Architecture:
A systems analyst, or database analyst, designs the structure of the data in terms of record types, fields contained in those record types, and relationships that exist between record types. These structures are defined to the database management system using its data definition language. Data definition language (or DDL) is used by the DBMS to physically establish those record types, fields, and structural relationships. Additionally, the DDL defines views of the database. Views restrict the portion of a database that may be used or accessed by different users and programs. DDLs record the definitions in a permanent data repository.
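To make the role of a DDL concrete, here is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) stands in for the DBMS. The table, column, and view names are hypothetical and chosen only for illustration. Note that a SQLite view only narrows what a query sees; per-user access control, as described above, would need a multi-user DBMS.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (an assumption for illustration).
conn = sqlite3.connect(":memory:")

# DDL: define a record type (table) and its fields (columns).
conn.execute("""
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT NOT NULL,
        balance         REAL NOT NULL DEFAULT 0
    )
""")

# DDL: define a view that exposes only part of the table.
conn.execute("""
    CREATE VIEW customer_directory AS
    SELECT customer_number, customer_name FROM customer
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada', 250.0)")

# Programs that read through the view never see the balance column.
cursor = conn.execute("SELECT * FROM customer_directory")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```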
11
403-404 Figure 11.4 A Typical Database Architecture
12
Database Concepts Databases Database Architecture:
The database management system also provides a data manipulation language to access and use the database in applications. A data manipulation language (or DML) is used to create, read, update, and delete records in the database, and to navigate between different records and types of records. The DBMS and DML hide the details concerning how records are organized and allocated to the disk. In general, the DML is very flexible in that it may be used by itself to create, read, update, and delete records; or its commands may be ‘called’ from a separate host programming language such as COBOL, Visual Basic, or Powerbuilder. Metadata (the data about the data) is stored in a data dictionary or repository (which may or may not be provided by the DBMS vendor). Some data dictionaries include formal, elaborate software that helps database specialists track metadata such as record and field definitions, synonyms, data relationships, validation rules, help messages, and so forth. To help design databases, CASE tools may be provided either by the database technology vendor (e.g., Oracle) or by a third-party CASE tool vendor (e.g., Popkin, Logic Works, etc.).
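As a rough illustration of DML commands being ‘called’ from a host programming language, the sketch below uses Python as the host language and SQLite as the DBMS. Both are stand-ins chosen for convenience, not the tools named above, and the PRODUCT table layout is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_number INTEGER PRIMARY KEY, title TEXT, price REAL)"
)

# Create: add a record.
conn.execute("INSERT INTO product VALUES (?, ?, ?)", (100, "Widget", 9.5))

# Read: fetch a field of that record.
title_before = conn.execute(
    "SELECT title FROM product WHERE product_number = ?", (100,)
).fetchone()[0]

# Update: change a field, then read it back.
conn.execute("UPDATE product SET price = ? WHERE product_number = ?", (12.0, 100))
price_after = conn.execute(
    "SELECT price FROM product WHERE product_number = ?", (100,)
).fetchone()[0]

# Delete: remove the record.
conn.execute("DELETE FROM product WHERE product_number = ?", (100,))
remaining = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```

The host program never states where or how the record is stored on disk; it only names the table, the fields, and the criteria.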
13
Database Concepts Databases Database Architecture:
Many DBMSs don’t require the use of a DDL to construct the database, or a DML to access the database. They provide their own tools and commands to perform those tasks. This is especially true of PC-based DBMSs such as Microsoft Access. Access provides a simple graphical user interface to create the tables, and a form-based environment to access, browse, and maintain the tables. Many DBMSs also include proprietary report writing and inquiry tools to allow users to access and format data without directly using the DML. Some DBMSs include a transaction processing monitor (or TP monitor) that manages on-line accesses to the database, and ensures that transactions that impact multiple tables are fully processed as a single unit. Most high-end DBMSs are designed to interact with popular third-party transaction processing monitors such as CICS and Tuxedo.
14
Database Concepts Databases Relational Database Management Systems:
There are several types of database management systems and they can be classified according to the way they structure records. Early database management systems organized records in hierarchies or networks implemented with indexes and linked lists. Relational databases implement data in a series of tables that are ‘related’ to one another via foreign keys. Files are seen as simple two-dimensional tables, also known as relations. The rows are records. The columns correspond to fields.
15
405 Figure 11.5 A Simple, Logical Data Model
16
405-406 Figure 11.6 A Simple, Physical Database Schema
17
Database Concepts for the Systems Analyst
Databases Relational Database Management Systems: Both the DDL and DML of most relational databases are called SQL (which stands for Structured Query Language). SQL supports not only queries, but complete database creation and maintenance. A fundamental characteristic of relational SQL is that commands return ‘a set’ of records, not necessarily just a single record (as in non-relational database and file technology). To access tables and records, SQL supports the following basic relational operations (the examples are written as pseudocode, not literal SQL syntax):
SELECT specific records from a table based on specific criteria (e.g., SELECT CUSTOMER WHERE BALANCE > )
PROJECT out specific fields from a table (e.g., PROJECT CUSTOMER TO INCLUDE ONLY CUSTOMER_NUMBER, CUSTOMER_NAME, BALANCE)
JOIN two or more tables across a common field, a primary and foreign key (e.g., JOIN CUSTOMER AND ORDER USING CUSTOMER_NUMBER)
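The three operations above map onto ordinary SQL roughly as follows. This is a sketch under stated assumptions: SQLite via Python's sqlite3 module is used as the DBMS, the sample rows and the balance threshold of 100 are invented for illustration, and the ORDER table is named orders because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT,
        balance         REAL
    );
    CREATE TABLE orders (
        order_number    INTEGER PRIMARY KEY,
        customer_number INTEGER REFERENCES customer(customer_number)
    );
    INSERT INTO customer VALUES (1, 'Ada', 500.0), (2, 'Ben', 50.0);
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Selection: a WHERE clause keeps only the rows that meet the criteria.
selected = conn.execute(
    "SELECT * FROM customer WHERE balance > 100 ORDER BY customer_number"
).fetchall()

# Projection: the column list keeps only the named fields.
projected = conn.execute(
    "SELECT customer_number, customer_name FROM customer ORDER BY customer_number"
).fetchall()

# Join: rows are matched where the foreign key equals the primary key.
joined = conn.execute("""
    SELECT c.customer_name, o.order_number
    FROM customer AS c
    JOIN orders AS o ON o.customer_number = c.customer_number
    ORDER BY o.order_number
""").fetchall()
```

Each query returns a set of rows (here, a Python list of tuples), which matches the set-oriented behavior described above.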
18
Database Concepts for the Systems Analyst
Databases Relational Database Management Systems: High-end relational databases also extend the SQL language to support triggers and stored procedures. Triggers are programs embedded within a table that are automatically invoked by updates to another table. Stored procedures are programs embedded within a table that can be called from an application program. Both triggers and stored procedures are reusable because they are stored with the tables themselves. This eliminates the need for application programmers to create the equivalent logic within each application that uses the tables. Triggers - For example, if a record is deleted from a PASSENGER AIRCRAFT table, a trigger can force the automatic deletion of all corresponding records in a SEATS table for that aircraft. Stored procedures - For example, a complex data validation algorithm might be embedded in a table to ensure that new and updated records contain valid data before they are stored. Examples of high performance relational DBMSs include Oracle Corporation’s Oracle, IBM’s Database Manager, Microsoft’s SQL Server (being used in the SoundStage project), and Sybase Corporation’s Sybase. Many of these databases run on mainframes, minicomputers, and network database servers. Additionally, most personal computer DBMSs are relational (or partially so). Examples include Microsoft’s Access and Foxpro, and Borland’s Paradox and dBASE. These systems can run on both stand-alone personal computers and local area network file servers.
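The PASSENGER AIRCRAFT and SEATS trigger example can be sketched as follows, assuming SQLite as the DBMS (SQLite supports triggers but not stored procedures, so only the trigger is shown). The column names and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE passenger_aircraft (
        aircraft_id INTEGER PRIMARY KEY,
        model       TEXT
    );
    CREATE TABLE seats (
        aircraft_id INTEGER,
        seat_label  TEXT,
        PRIMARY KEY (aircraft_id, seat_label)
    );

    -- The trigger is stored with the database, not in any application program.
    CREATE TRIGGER delete_aircraft_seats
    AFTER DELETE ON passenger_aircraft
    BEGIN
        DELETE FROM seats WHERE aircraft_id = OLD.aircraft_id;
    END;

    INSERT INTO passenger_aircraft VALUES (1, 'Model A'), (2, 'Model B');
    INSERT INTO seats VALUES (1, '1A'), (1, '1B'), (2, '1A');
""")

# The application deletes only the aircraft; the DBMS removes its seats.
conn.execute("DELETE FROM passenger_aircraft WHERE aircraft_id = 1")
remaining_seats = conn.execute(
    "SELECT aircraft_id, seat_label FROM seats"
).fetchall()
```

Any application that deletes an aircraft gets the same cleanup without containing the logic itself, which is the reuse argument made above.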
19
Data Analysis for Database Design
What is a Good Data Model? A good data model is simple. As a general rule, the data attributes that describe an entity should describe only that entity. A good data model is essentially non-redundant. This means that each data attribute, other than foreign keys, describes at most one entity. A good data model should be flexible and adaptable to future needs. We should make the data models as application-independent as possible to encourage database structures that can be extended or modified without impact to current programs. While a data model effectively communicates database requirements, it does not necessarily represent a good database design. It may contain structural characteristics that reduce flexibility and expansion, or create unnecessary redundancy. Therefore, we must ‘prepare’ the data model for database design and implementation.
20
Data Analysis for Database Design
Data analysis is a process that prepares a data model for implementation as a simple, non-redundant, flexible, and adaptable database. The specific technique is called normalization. Normalization is a technique that organizes data attributes such that they are grouped together to form stable, flexible, and adaptive entities.
21
Data Analysis for Database Design
Normalization is a three-step technique that places the data model into first normal form, second normal form, and third normal form. An entity is in first normal form (1NF) if there are no attributes that can have more than one value for a single instance of the entity. An entity is in second normal form (2NF) if it is already in 1NF, and if the values of all non-primary key attributes are dependent on the full primary key, not just part of it. An entity is in third normal form (3NF) if it is already in 2NF, and if the values of its non-primary key attributes are not dependent on any other non-primary key attributes. Any attributes that can have multiple values actually describe a separate entity, possibly an entity (and relationship) that we haven’t yet included in our data model. Any non-key attributes that are dependent on only part of the primary key should be moved to an entity where that partial key becomes the full key. Again, this may require creating a new entity and relationship on the model. Any non-key attributes that are dependent on other non-key attributes must be moved or deleted. Again, new entities and relationships may have to be added to the data model.
22
Data Analysis for Database Design
Normalization Example First Normal Form: The first step in data analysis is to place each entity into 1NF. Figure 11.8 shows the unnormalized data model; the figures that follow it, beginning with Figure 11.9, demonstrate how to place the entities that are not in 1NF into 1NF. In each of those figures, the original entity is depicted on the left side of the page and the 1NF entities are on the right side. Each figure shows how normalization changed the data model and attribute assignments.
23
409-410 Figure 11.8 An Unnormalized SoundStage Data Model
Referring to the figure above, you should find three entities that are not in 1NF: MEMBER, MEMBER ORDER and CLUB. Each contains a repeating group, that is, a group of attributes that have multiple values for a single instance of the entity (denoted by the brackets). Consider, for example, the entity MEMBER. A single MEMBER can belong to multiple CLUBs and, therefore, have multiple values for CLUB NAME and AGREEMENT NUMBER, one for each club to which he or she belongs. For a single instance of MEMBER, the number of clubs and agreements may vary. Similarly, a MEMBER ORDER can contain data about more than one ORDERED PRODUCT. And a CLUB can sponsor more than one AGREEMENT. How do we fix these anomalies in our model?
24
409-412 Figure 11.9 First Normal Form
Let’s examine the MEMBER ORDER entity in the figure above. First, we remove the attributes that can have more than one value for an instance of the entity. That alone places MEMBER ORDER in 1NF. But what do we do with the removed attributes? These attributes repeat many times ‘as a group’. Therefore, we moved the entire group of attributes to a new entity, MEMBER ORDERED PRODUCT. Each instance of these attributes describes one PRODUCT on a single MEMBER ORDER. Thus, if a specific ORDER contains five PRODUCTs, there will be five instances of the new MEMBER ORDERED PRODUCT entity. Each entity instance has only one value for each attribute; therefore, the new entity is also in first normal form. Notice how the primary key of the new entity was created: by combining the primary key of the original entity, ORDER NUMBER, with the implicit key attribute of the group, PRODUCT NUMBER. Thus, we have what was described in Chapter 5 as a concatenated key. Since we know from Chapter 5 that each part of a concatenated key is a foreign key back to another entity, we added relationships (and cardinality) from the new MEMBER ORDERED PRODUCT entity to both the MEMBER ORDER and PRODUCT entities.
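The MEMBER ORDER step can be sketched in a few lines of plain Python. This is only an illustration of the idea: the sample values are invented, the function name to_first_normal_form is hypothetical, and real normalization is done on the data model rather than on individual records.

```python
# One unnormalized MEMBER ORDER instance: the product attributes repeat as a group.
unnormalized_order = {
    "order_number": 5001,
    "member_number": 42,
    "products": [
        {"product_number": "P1", "quantity_ordered": 2},
        {"product_number": "P2", "quantity_ordered": 1},
    ],
}


def to_first_normal_form(order):
    """Split the repeating group into its own entity, keyed by order + product."""
    # MEMBER ORDER keeps only the single-valued attributes.
    member_order = {k: v for k, v in order.items() if k != "products"}
    # Each repeated group becomes one MEMBER ORDERED PRODUCT instance whose
    # concatenated key is (order_number, product_number).
    ordered_products = [
        {"order_number": order["order_number"], **line}
        for line in order["products"]
    ]
    return member_order, ordered_products


member_order, member_ordered_products = to_first_normal_form(unnormalized_order)
concatenated_keys = [
    (p["order_number"], p["product_number"]) for p in member_ordered_products
]
```

An order with two products yields two MEMBER ORDERED PRODUCT instances, each with exactly one value per attribute.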
25
412 Figure 11.10 First Normal Form
Another example of 1NF is shown in the figure above for the CLUB entity. The attributes that can have many values (commonly called ‘repeating attributes’) are easy to spot. They include attributes such as AGREEMENT ACTIVE DATE and OBLIGATION PERIOD. As before, we created a new entity, AGREEMENT (as named by the users), keyed by the concatenation of CLUB NAME and AGREEMENT NUMBER. We moved the repeating attributes to that new entity. Once again, we also created a relationship between AGREEMENT and CLUB.
26
412-413 Figure 11.11 First Normal Form
To place the MEMBER entity in 1NF, we removed the repeating attributes. Those attributes seemed dependent on a combination of CLUB NAME and AGREEMENT NUMBER, so we created a new entity called CLUB MEMBERSHIP with that key. The repeating attributes were then moved to that entity. It was then that we noticed that the CLUB MEMBERSHIP entity was, in fact, a ternary associative entity (review Chapter 5). Each part of the concatenated key (MEMBER NUMBER, CLUB NAME, and AGREEMENT NUMBER) was a foreign key back to different entities. Thus, we completed our model by adding relationships (with cardinality) from that associative entity back to the MEMBER, CLUB, and AGREEMENT entities.
27
Data Analysis for Database Design
Normalization Example Second Normal Form: The next step of data analysis is to place the entities into 2NF. It is assumed that you have already placed all entities into 1NF. 2NF looks for an anomaly called a partial dependency, meaning one or more attributes whose values are determined by only part of the primary key. Entities that have a single attribute primary key are already in 2NF. Only those entities that have a concatenated key need to be checked.
28
414-415 Figure 11.12 Second Normal Form
First, let’s check the MEMBER ORDERED PRODUCT entity. Most of the attributes are dependent on the full primary key. For example, QUANTITY ORDERED makes no sense unless you have both an ORDER NUMBER and a PRODUCT NUMBER. Think about it! By itself, ORDER NUMBER is inadequate since the order could have as many quantities ordered as there are products on the order. Similarly, by itself, PRODUCT NUMBER is inadequate since the same product could appear on many orders. Thus, QUANTITY ORDERED requires both parts of the key and is fully dependent on the key. The same could be said of QUANTITY SHIPPED and PURCHASE UNIT PRICE. But what about ORDERED PRODUCT DESCRIPTION and ORDERED PRODUCT TITLE? Do we really need ORDER NUMBER to determine a value for either? No! Instead, the values of these attributes are dependent only on the value of PRODUCT NUMBER. Thus, the attributes are not dependent on the full key; we have uncovered a partial dependency error that must be fixed. How do we fix this type of normalization error? To fix the problem, we simply move the non-key attributes, ORDERED PRODUCT DESCRIPTION and ORDERED PRODUCT TITLE, to an entity that only has PRODUCT NUMBER as its key. If necessary, we would have to create this entity, but the PRODUCT entity with that key already exists. But we had to be careful because PRODUCT is a supertype. Upon inspection of the subtypes (see Figure 11.12), we discover that the attributes are already in the MERCHANDISE and TITLE entities, albeit under a synonym. Thus, we didn’t actually have to move the attributes from the MEMBER ORDERED PRODUCT entity; we just deleted them as redundant data.
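The reasoning above can be mimicked with a small check over sample rows. This is a hedged sketch: the helper depends_only_on and the sample data are hypothetical, and sample data can only refute a dependency, never prove one, since real dependencies come from the business rules.

```python
def depends_only_on(rows, determinant, attribute):
    """Return True if each determinant value maps to one attribute value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        # setdefault records the first value seen for this key and returns it.
        if seen.setdefault(key, row[attribute]) != row[attribute]:
            return False
    return True


# Sample MEMBER ORDERED PRODUCT rows; key is (order_number, product_number).
rows = [
    {"order_number": 1, "product_number": "P1",
     "quantity_ordered": 2, "ordered_product_title": "Blue Album"},
    {"order_number": 2, "product_number": "P1",
     "quantity_ordered": 5, "ordered_product_title": "Blue Album"},
    {"order_number": 1, "product_number": "P2",
     "quantity_ordered": 1, "ordered_product_title": "Red Album"},
]

# The title is fixed by PRODUCT NUMBER alone: a partial dependency candidate.
title_is_partial = depends_only_on(rows, ["product_number"], "ordered_product_title")

# The quantity is not fixed by either part of the key on its own.
quantity_by_product = depends_only_on(rows, ["product_number"], "quantity_ordered")
quantity_by_order = depends_only_on(rows, ["order_number"], "quantity_ordered")
```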
29
Data Analysis for Database Design
Normalization Example Third Normal Form: Entities are assumed to be in 2NF before beginning 3NF analysis. Third normal form analysis looks for two types of problems, derived data and transitive dependencies. In both cases, the fundamental error is that non-key attributes are dependent on other non-key attributes. Derived attributes are those whose values can either be calculated from other attributes, or derived through logic from the values of other attributes. If you think about it, storing a derived attribute makes little sense. First, it wastes disk storage space. Second, it complicates simple updates. Why? Every time you change the base attributes, you must remember to re-perform the calculation and also change its result. A transitive dependency exists when a non-key attribute is dependent on another non-key attribute (other than by derivation). Transitive analysis is only performed on those entities that do not have a concatenated key.
30
Data Analysis for Database Design
Normalization Example Third Normal Form: Third normal form analysis looks for two types of problems, derived data and transitive dependencies. (continued) A transitive dependency exists when a non-key attribute is dependent on another non-key attribute (other than by derivation). This error usually indicates that an undiscovered entity is still embedded within the problem entity. Such a condition, if not corrected, can cause future flexibility and adaptability problems if a new requirement eventually requires us to implement that undiscovered entity as a separate database table. Transitive analysis is only performed on those entities that do not have a concatenated key. “An entity is said to be in third normal form if every non-primary key attribute is dependent on the primary key, the whole primary key, and nothing but the primary key.” Before we leave the subject of normalization, we should acknowledge that several normal forms beyond 3NF exist. Each successive normal form makes the data model simpler, less redundant, and more flexible. However, systems analysts (and most database experts) rarely take data models beyond 3NF unless absolutely necessary.
31
416 Figure 11.13 Third Normal Form
For example, look at the MEMBER ORDERED PRODUCT entity in the figure above. The attribute EXTENDED PRICE is calculated by multiplying QUANTITY ORDERED by PURCHASE UNIT PRICE. Thus, EXTENDED PRICE (a non-key attribute) is not dependent on the primary key as much as it is dependent on QUANTITY ORDERED and PURCHASE UNIT PRICE. Thus, we simplify the entity by deleting EXTENDED PRICE. Sounds simple, right? Well, not always! There is disagreement on how far you take this rule. Some experts argue that the rule should be applied only within a single entity. Thus, these experts would not delete a derived attribute if the attributes required for the derivation are assigned to different entities. Other experts argue that the rule should be applied regardless of where the base attributes are stored. We tend to agree with the latter view, based on the argument that a derived attribute that involves multiple entities presents a greater danger for data inconsistency caused by updating an attribute in one entity and forgetting to subsequently update the derived attribute in another entity. (The exception to this rule would be those databases that support triggers, described earlier in this chapter, that could automatically update the derived attributes.) Transitive analysis is only performed on those entities that do not have a concatenated key. In our example, this includes PRODUCT, MEMBER ORDER, MEMBER, and CLUB. For the entity PRODUCT, all of the non-key attributes are dependent on the primary key, and only the primary key. Thus, PRODUCT is already in third normal form.
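One way to see why EXTENDED PRICE need not be stored is to compute it at query time. The sketch below assumes SQLite as the DBMS and uses invented sample values; the column names follow the attribute names in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member_ordered_product (
        order_number        INTEGER,
        product_number      TEXT,
        quantity_ordered    INTEGER,
        purchase_unit_price REAL,
        PRIMARY KEY (order_number, product_number)
    );
    INSERT INTO member_ordered_product VALUES (1, 'P1', 3, 10.0);
""")

# EXTENDED PRICE is derived in the query rather than stored as a column.
extended_price_query = """
    SELECT quantity_ordered * purchase_unit_price
    FROM member_ordered_product
    WHERE order_number = 1 AND product_number = 'P1'
"""
before_update = conn.execute(extended_price_query).fetchone()[0]

# Changing a base attribute needs no second update to keep the result consistent.
conn.execute(
    "UPDATE member_ordered_product SET quantity_ordered = 4 "
    "WHERE order_number = 1 AND product_number = 'P1'"
)
after_update = conn.execute(extended_price_query).fetchone()[0]
```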
32
416-417 Figure 11.14 Third Normal Form
Now look at the entity MEMBER ORDER in the figure above. In particular, examine the attributes MEMBER NAME and MEMBER ADDRESS. Are these attributes dependent on the primary key, MEMBER ORDER NUMBER? No! The primary key MEMBER ORDER NUMBER in no way determines the value of MEMBER NAME and MEMBER ADDRESS. On the other hand, the values of MEMBER NAME and MEMBER ADDRESS are dependent on the value of another non-primary key in the entity, MEMBER NUMBER. How do we fix this problem? MEMBER NAME and MEMBER ADDRESS need to be moved from the MEMBER ORDER entity to an entity whose key is just MEMBER NUMBER. If necessary, we would create that entity, but in our case we already have a MEMBER entity with the required primary key. And as it turns out, we don’t really need to move the problem attributes since they are already assigned to the MEMBER entity. We did, however, have to notice that MEMBER ADDRESS was a synonym for MEMBER STREET ADDRESS. We elected to keep the latter term in MEMBER.
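After the transitive dependency is removed, MEMBER ORDER carries only MEMBER NUMBER as a foreign key, and the name and address are looked up in MEMBER. The following sketch assumes SQLite as the DBMS and uses invented sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (
        member_number         INTEGER PRIMARY KEY,
        member_name           TEXT,
        member_street_address TEXT
    );
    -- MEMBER ORDER holds only the foreign key, not the member's name or address.
    CREATE TABLE member_order (
        member_order_number INTEGER PRIMARY KEY,
        member_number       INTEGER NOT NULL REFERENCES member(member_number)
    );
    INSERT INTO member VALUES (42, 'Ada', '1 Main St');
    INSERT INTO member_order VALUES (5001, 42), (5002, 42);
""")

# The address is stored once, so one update is seen by every order.
conn.execute(
    "UPDATE member SET member_street_address = '9 Oak Ave' WHERE member_number = 42"
)
addresses = conn.execute("""
    SELECT o.member_order_number, m.member_street_address
    FROM member_order AS o
    JOIN member AS m ON m.member_number = o.member_number
    ORDER BY o.member_order_number
""").fetchall()
```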
33
Data Analysis for Database Design
Simplification by Inspection: When several analysts work on a common application, it is not unusual to create problems that won’t be taken care of by normalization. These problems are best solved through simplification by inspection, a process wherein a data entity in 3NF is further simplified by such efforts as addressing subtle data redundancy. Please refer to figure on page 419 in the textbook. The authors apologize that this figure is not available at this time. Also through inspection, we realized that the CLUB MEMBERSHIP attributes for ‘taste’ and ‘media’ preferences were in fact different depending on the club to which a member belongs. For example, ‘media’ has a different set of possible values based on club. In an AUDIO CLUB, the value set is CASSETTE, COMPACT DISC, MINI-DISC, and DIGITAL VERSATILE DISC. In the VIDEO CLUB, the value set is VHS TAPE, LASER DISC, 8MM TAPE, and DIGITAL VERSATILE DISC. In the GAME CLUB, media values include CD-ROM, DIGITAL VERSATILE DISC, and various CARTRIDGE formats. Thus, what we thought was one attribute, MEDIA PREFERENCE, was in fact three attributes: AUDIO MEDIA PREFERENCE, VIDEO MEDIA PREFERENCE, and GAME MEDIA PREFERENCE.
34
Data Analysis for Database Design
CASE Support for Normalization: Most CASE tools can only normalize to first normal form. They accomplish this in one of two ways. They look for many-to-many relationships and resolve those relationships into associative entities. They look for attributes specifically described as having multiple values for a single entity instance. It is exceedingly difficult for a CASE tool to identify second and third normal form errors. That would require the CASE tool to have the intelligence to recognize partial and transitive dependencies.
35
Database Design The Database Schema
The design of a database is depicted as a special model called a database schema. A database schema is the physical model or blueprint for a database. It represents the technical implementation of the logical data model. A relational database schema defines the database structure in terms of tables, keys, indexes, and integrity rules. A database schema specifies details based on the capabilities, terminology, and constraints of the chosen database management system.
36
Database Design The Database Schema
Rules and guidelines for transforming the logical data model into a physical relational database schema: Each fundamental, associative, and weak entity is implemented as a separate table. The primary key is identified as such and implemented as an index into the table. Each secondary key is implemented as its own index into the table. Each foreign key will be implemented as such. Attributes will be implemented with fields. These fields correspond to columns in the table.
37
Database Design The Database Schema
Rules and guidelines for transforming the logical data model into a physical relational database schema: (continued) The following technical details must usually be specified for each attribute.
Data type. Each DBMS supports different data types, and terms for those data types. For example, different systems may designate a large alphanumeric field differently (e.g., MEMO in Access and LONG VARCHAR in Oracle). Also, some databases allow the choice of no compression versus compression of unused space (e.g., CHAR versus VARCHAR in Oracle).
Size of the Field. Different DBMSs express precision of real numbers differently. For example, in Oracle, a size specification of NUMBER (3,2) supports a range from to 9.99.
NULL or NOT NULL. Must the field have a value before the record can be committed to storage? Again, different DBMSs may require different reserved words to express this property. Primary keys can never be allowed to have null values.
Domains. Many DBMSs can automatically edit data to ensure that fields contain legal data. This can be a great benefit to ensuring data integrity independent from the application programs. If the programmer makes a mistake, the DBMS catches the mistake. But for DBMSs that support data integrity, the rules must be precisely specified in a language that is understood by the DBMS.
Default. Many DBMSs allow a default value to be automatically set in the event that a user or programmer submits a record without a value.
Many of the above specifications were documented as part of a complete data model. If that data model was developed with a CASE tool, the CASE tool may be capable of automatically translating the data model into the language of the chosen database technology.
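The attribute details listed above can be sketched as a single table definition. This assumes SQLite as the DBMS, which differs from Oracle and Access in an important way: SQLite treats declared types loosely and does not enforce field sizes, so only NULL rules, a domain (as a CHECK constraint), and a default are demonstrated. The table layout and the allowed media values are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE club_membership (
        member_number    INTEGER NOT NULL,             -- data type, NOT NULL
        club_name        TEXT    NOT NULL,
        media_preference TEXT    NOT NULL
            DEFAULT 'COMPACT DISC'                     -- default value
            CHECK (media_preference IN                 -- domain of legal values
                   ('CASSETTE', 'COMPACT DISC', 'MINI-DISC')),
        PRIMARY KEY (member_number, club_name)
    )
""")

# Default: the record is submitted without a media preference.
conn.execute(
    "INSERT INTO club_membership (member_number, club_name) VALUES (42, 'AUDIO CLUB')"
)
default_value = conn.execute(
    "SELECT media_preference FROM club_membership WHERE member_number = 42"
).fetchone()[0]

# Domain: the DBMS, not the application, rejects an illegal value.
try:
    conn.execute("INSERT INTO club_membership VALUES (43, 'AUDIO CLUB', 'VINYL')")
    domain_rejected = False
except sqlite3.IntegrityError:
    domain_rejected = True

# NOT NULL: a record with a missing required field is rejected.
try:
    conn.execute(
        "INSERT INTO club_membership (member_number, club_name) VALUES (44, NULL)"
    )
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```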
38
Database Design The Database Schema
Transforming the logical data model into a physical relational database schema rules and guidelines: (continued) Supertype/subtype entities present additional options as follows: Most CASE tools do not currently support object-like constructs such as supertypes and subtypes. Most CASE tools default to creating a separate table for each entity supertype and subtype. If the subtypes are of similar size and data content, a database administrator may elect to collapse the subtypes into the supertype to create a single table. Evaluate and specify referential integrity constraints. Would you ever want to compromise the third normal form entities when designing the database? For example, would you ever want to combine two third normal form entities into a single table (that would, by default, no longer be in third normal form)? Usually not! Although a DBA may create such a compromise to improve database performance, he or she should carefully weigh the advantages and disadvantages. Although such compromises may mean greater convenience through fewer tables or better overall performance, such combinations may also lead to the possible loss of data independence—should future, new fields necessitate resplitting the table into two tables, programs will have to be rewritten. As a general rule, combining entities into tables is not recommended. Please refer to figure on page 422 in the textbook. The authors apologize that this figure is not available at this time.
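The two supertype/subtype options can be compared side by side. The sketch below assumes a hypothetical MEMBER supertype with STUDENT_MEMBER and CORPORATE_MEMBER subtypes; none of these names come from the textbook, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Option 1: a separate table for the supertype and for each subtype.
# Each subtype table reuses the supertype's primary key as its own key.
SEPARATE_TABLES = """
CREATE TABLE MEMBER (
    MEMBER_NUMBER INTEGER PRIMARY KEY,
    MEMBER_NAME   TEXT NOT NULL
);
CREATE TABLE STUDENT_MEMBER (
    MEMBER_NUMBER INTEGER PRIMARY KEY REFERENCES MEMBER (MEMBER_NUMBER),
    SCHOOL_NAME   TEXT NOT NULL
);
CREATE TABLE CORPORATE_MEMBER (
    MEMBER_NUMBER INTEGER PRIMARY KEY REFERENCES MEMBER (MEMBER_NUMBER),
    COMPANY_NAME  TEXT NOT NULL
);
"""

# Option 2: subtypes collapsed into the supertype. A type column records
# which subtype a row belongs to, and subtype-only columns must allow NULL.
COLLAPSED_TABLE = """
CREATE TABLE MEMBER (
    MEMBER_NUMBER INTEGER PRIMARY KEY,
    MEMBER_NAME   TEXT NOT NULL,
    MEMBER_TYPE   TEXT NOT NULL
                  CHECK (MEMBER_TYPE IN ('STUDENT', 'CORPORATE')),
    SCHOOL_NAME   TEXT,
    COMPANY_NAME  TEXT
);
"""


def table_names(script: str) -> list:
    """Build the schema in a scratch database and list its tables."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(script)
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    connection.close()
    return [row[0] for row in rows]


print(table_names(SEPARATE_TABLES))
print(table_names(COLLAPSED_TABLE))
```

The collapsed form needs fewer tables and joins, at the cost of nullable columns that apply to only some rows.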
39
Data and Referential Integrity
Database Design Data and Referential Integrity There are at least three types of data integrity that must be designed into any database - key integrity, domain integrity and referential integrity. Key Integrity: Every table should have a primary key (which may be concatenated). The primary key must be controlled such that no two records in the table have the same primary key value. The primary key for a record must never be allowed to have a NULL value. 423
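Key integrity can be demonstrated with a concatenated primary key. The ORDER_ITEMS table below is a hypothetical example, run on SQLite through Python's sqlite3 module. The key columns are declared NOT NULL explicitly, because SQLite does not always forbid NULLs in a primary key by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with a concatenated (two-column) primary key.
conn.execute("""
CREATE TABLE ORDER_ITEMS (
    ORDER_NUMBER   INTEGER NOT NULL,
    PRODUCT_NUMBER INTEGER NOT NULL,
    QUANTITY       INTEGER NOT NULL,
    PRIMARY KEY (ORDER_NUMBER, PRODUCT_NUMBER)
)
""")


def insert_ok(order_number, product_number, quantity) -> bool:
    """Try to store one record; report whether the DBMS accepted it."""
    try:
        conn.execute(
            "INSERT INTO ORDER_ITEMS VALUES (?, ?, ?)",
            (order_number, product_number, quantity),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    insert_ok(1, 10, 2),     # accepted: first record with this key
    insert_ok(1, 11, 1),     # accepted: same order, different product
    insert_ok(1, 10, 5),     # rejected: duplicate primary key value
    insert_ok(None, 10, 1),  # rejected: NULL in part of the primary key
]
print(results)
```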
40
Data and Referential Integrity
Database Design Data and Referential Integrity Domain Integrity: Appropriate controls must be designed to ensure that no field takes on a value that is outside of the range of legal values. Referential Integrity: A referential integrity error exists when a foreign key value in one table has no matching primary key value in the related table. 423 For example, if GRADE POINT AVERAGE is defined to be a number between 0.00 and 4.00, then controls must be implemented to prevent negative numbers and numbers greater than 4.00. Not long ago, application programs were expected to perform all data editing. Today, most database management systems are capable of data editing. For the foreseeable future, the responsibility for data editing will continue to be shared between the application programs and the DBMS. The architecture of relational databases implements relationships between the records in tables via foreign keys. The use of foreign keys increases the flexibility and scalability of any database, but it also increases the risk of referential integrity errors. For example, an INVOICES table usually includes a foreign key, CUSTOMER NUMBER, to ‘reference back to’ the matching CUSTOMER NUMBER primary key in the CUSTOMERS table. What happens if we delete a CUSTOMER record? There is the potential that we may have INVOICE records whose CUSTOMER NUMBER has no matching record in the CUSTOMERS table. Essentially, we have compromised the referential integrity between the two tables.
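The CUSTOMERS and INVOICES scenario from the notes can be reproduced in a few lines. In this sketch the tables are deliberately created without a foreign key constraint, so nothing stops the deletion. The column names and sample rows are assumptions, and the query is one common way to find orphaned records, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No FOREIGN KEY clause on purpose: the DBMS is not protecting the link,
# so deleting a customer silently leaves that customer's invoices behind.
conn.executescript("""
CREATE TABLE CUSTOMERS (
    CUSTOMER_NUMBER INTEGER PRIMARY KEY,
    CUSTOMER_NAME   TEXT NOT NULL
);
CREATE TABLE INVOICES (
    INVOICE_NUMBER  INTEGER PRIMARY KEY,
    CUSTOMER_NUMBER INTEGER NOT NULL
);
INSERT INTO CUSTOMERS VALUES (1, 'Acme'), (2, 'Baker');
INSERT INTO INVOICES VALUES (100, 1), (101, 2), (102, 2);
DELETE FROM CUSTOMERS WHERE CUSTOMER_NUMBER = 2;
""")


def orphaned_invoices(connection: sqlite3.Connection) -> list:
    """List invoices whose CUSTOMER_NUMBER matches no CUSTOMERS record."""
    rows = connection.execute("""
        SELECT i.INVOICE_NUMBER
        FROM INVOICES AS i
        LEFT JOIN CUSTOMERS AS c
               ON c.CUSTOMER_NUMBER = i.CUSTOMER_NUMBER
        WHERE c.CUSTOMER_NUMBER IS NULL
        ORDER BY i.INVOICE_NUMBER
    """)
    return [row[0] for row in rows]


orphans = orphaned_invoices(conn)
print(orphans)
```

Invoices 101 and 102 still carry customer number 2, which no longer exists in CUSTOMERS. That is the referential integrity error described above.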
41
Data and Referential Integrity
Database Design Data and Referential Integrity Referential Integrity: Referential integrity is specified in the form of deletion rules as follows: No restriction. Any record in the table may be deleted without regard to any records in any other tables. Delete:Cascade. A deletion of a record in the table must be automatically followed by the deletion of matching records in a related table. Delete:Restrict. A deletion of a record in the table must be disallowed until any matching records are deleted from a related table.
42
Data and Referential Integrity
Database Design Data and Referential Integrity Referential Integrity: Referential integrity is specified in the form of deletion rules as follows: (continued) Delete:Set Null. A deletion of a record in the table must be automatically followed by setting any matching keys in a related table to the value NULL. 424 The final database schema, complete with referential integrity rules, is illustrated in Figure 11.17, on page 425 in the textbook. The authors apologize that this figure is not available at this time. This is the blueprint for writing the SQL code (or equivalent) to create the tables and data structures.
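The deletion rules map closely onto the ON DELETE clause of a foreign key. The sketch below tries each rule against the same hypothetical CUSTOMERS and INVOICES data, using SQLite via Python's sqlite3 module. Two assumptions are worth noting: SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and the "No restriction" rule is taken here to mean that no foreign key constraint is declared at all, so it is not shown.

```python
import sqlite3


def try_delete(rule: str):
    """Delete a customer that has one invoice, under the given ON DELETE rule.

    Returns (deleted, invoice_rows): whether the DBMS allowed the deletion,
    and what the INVOICES table contains afterwards.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
    conn.execute("""
        CREATE TABLE CUSTOMERS (
            CUSTOMER_NUMBER INTEGER PRIMARY KEY,
            CUSTOMER_NAME   TEXT NOT NULL
        )
    """)
    # The rule is spliced into the DDL; it comes from this script, not a user.
    conn.execute(f"""
        CREATE TABLE INVOICES (
            INVOICE_NUMBER  INTEGER PRIMARY KEY,
            CUSTOMER_NUMBER INTEGER
                REFERENCES CUSTOMERS (CUSTOMER_NUMBER) ON DELETE {rule}
        )
    """)
    conn.execute("INSERT INTO CUSTOMERS VALUES (1, 'Acme')")
    conn.execute("INSERT INTO INVOICES VALUES (100, 1)")

    try:
        conn.execute("DELETE FROM CUSTOMERS WHERE CUSTOMER_NUMBER = 1")
        deleted = True
    except sqlite3.IntegrityError:
        deleted = False

    invoice_rows = conn.execute(
        "SELECT INVOICE_NUMBER, CUSTOMER_NUMBER FROM INVOICES"
    ).fetchall()
    conn.close()
    return deleted, invoice_rows


for rule in ("CASCADE", "RESTRICT", "SET NULL"):
    print(rule, try_delete(rule))
```

With CASCADE the invoice disappears along with the customer. With RESTRICT the deletion is refused and both records remain. With SET NULL the invoice survives but its CUSTOMER_NUMBER becomes NULL, which is why that column cannot be declared NOT NULL in this sketch.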
43
Database Design Roles Some database shops insist that no two fields have exactly the same name. This presents an obvious problem with foreign keys. A role name is an alternate name for a foreign key that clearly distinguishes the purpose that the foreign key serves in the table. The decision to require role names or not is usually established by the data or database administrator. 424 Some database shops insist that no two fields have exactly the same name. This constraint serves to simplify documentation, help systems, and metadata definitions. This presents an obvious problem with foreign keys. By definition, a foreign key must have a corresponding primary key. During logical data modeling, using the same name suited our purpose of helping the users understand that these foreign keys allow us to match up related records in different entities. But in a physical database, it is not always necessary or even desirable to have these redundant field names in the database.
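A short sketch may help show what a role name looks like in a physical schema. Here a hypothetical INVOICES table refers to CUSTOMERS twice, and each foreign key carries a role name (BILL_TO_CUSTOMER_NUMBER, SHIP_TO_CUSTOMER_NUMBER) instead of repeating the primary key's name. The tables and role names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (
    CUSTOMER_NUMBER INTEGER PRIMARY KEY,
    CUSTOMER_NAME   TEXT NOT NULL
);
CREATE TABLE INVOICES (
    INVOICE_NUMBER INTEGER PRIMARY KEY,
    -- Role names: each says what purpose the foreign key serves here.
    BILL_TO_CUSTOMER_NUMBER INTEGER NOT NULL
        REFERENCES CUSTOMERS (CUSTOMER_NUMBER),
    SHIP_TO_CUSTOMER_NUMBER INTEGER NOT NULL
        REFERENCES CUSTOMERS (CUSTOMER_NUMBER)
);
""")


def foreign_key_roles(connection: sqlite3.Connection, table: str) -> list:
    """Return (role name, parent table, parent key) for each foreign key."""
    # PRAGMA foreign_key_list yields rows of (id, seq, table, from, to, ...).
    rows = connection.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return sorted((row[3], row[2], row[4]) for row in rows)


roles = foreign_key_roles(conn, "INVOICES")
for role_name, parent_table, parent_key in roles:
    print(f"{role_name} -> {parent_table}.{parent_key}")
```

No two fields share a name, yet the catalog still records which primary key each role name corresponds to.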