PHASE 3: SYSTEMS DESIGN Chapter 7 Data Design
Chapter Objectives Explain data design concepts and data structures Describe file processing systems and various types of files Understand database systems and define the components of a database management system (DBMS) Describe Web-based data design
Chapter Objectives Explain data design terminology, including entities, fields, common fields, records, files, tables, and key fields Describe data relationships, draw an entity-relationship diagram, define cardinality, and use cardinality notation Explain the concept of normalization Explain the importance of codes and describe various coding schemes
Chapter Objectives Describe relational and object-oriented database models Explain data warehousing and data mining Differentiate between logical and physical storage and records Explain data control measures
Introduction You will develop a physical plan for data organization, storage, and retrieval Begins with a review of data design concepts and terminology, then discusses file-based systems and database systems, including Web-based databases Concludes with a discussion of data storage and access, including strategic tools such as data warehousing and data mining, physical design issues, logical and physical records, data storage formats, and data control
Data Design Concepts Data Structures A file or table contains data about people, places, things, or events that interact with the system File-oriented system File processing system Database system Data Structure: A framework for organizing and storing data in an information system. It consists of files or tables linked in various ways. Depending on how they are linked, the result is either a file processing system or a database management system.
Data Design Concepts Overview of File Processing Potential problems: data redundancy, data integrity, rigid data structure. File Processing: A system that uses files that contain all the data necessary for inquiries and reporting. It is still used, and it can be efficient because: File processing systems do not require the additional processor time and memory space needed by preprogrammed database functions. They are simple to create, especially when a data file is accessed by only one application. They can be tailored tightly to specific application or business needs. Disadvantages: Data redundancy results because each department has its own files, and the same data (such as an employee name) frequently appears in each of those files. The need to make changes to data in multiple files (which results from data redundancy) can compromise data integrity. The rigid data structure results from a department’s data files usually being closely tied to the department’s applications, which can make it difficult for managers who require information from multiple departments. Example (previous slide): three items of information (mechanic number, name, and pay rate) are stored in both data files.
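The redundancy and integrity problem described above can be sketched in a few lines. This is only an illustration: the two "files" are modeled as Python dictionaries, and the file names, mechanic number, name, and pay rates are all made-up values, not taken from the slide.

```python
# Two departmental "files", each keeping its own copy of the same mechanic data.
# File names and field values are hypothetical.
payroll_file = {112: {"name": "Mechanic A", "pay_rate": 18.50}}
job_records_file = {112: {"name": "Mechanic A", "pay_rate": 18.50}}


def give_raise(data_file, mechanic_no, new_rate):
    """Update the pay rate in ONE file only, as a single department's program would."""
    data_file[mechanic_no]["pay_rate"] = new_rate


def find_inconsistencies(file_a, file_b):
    """Return mechanic numbers whose duplicated fields no longer agree."""
    return [no for no in file_a if no in file_b and file_a[no] != file_b[no]]


# The payroll department records a raise; the other file is never told.
give_raise(payroll_file, 112, 19.75)
mismatched = find_inconsistencies(payroll_file, job_records_file)
print(mismatched)
```

Because the same three fields live in two places, every change must be repeated in both files, and any missed update leaves the copies disagreeing.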
Data Design Concepts Overview of File Processing A file processing system uses various types of files: master file, table file, transaction file, work file (scratch file), security file, and history file. Master File: A dynamic file that stores relatively permanent data and contains one record for each instance of an entity. Ex. A school maintains master files for courses, students, and faculty. Table File: Contains reference data used by the information system. It is static and is not updated by the information system. Ex. Tax tables, postage rate tables, zip code tables. Transaction File: Stores records that contain day-to-day business and operational data. It is an input file that updates a master file. Ex. Charges, payments. Work File or Scratch File: A temporary file created by an information system for a single task. It is usually created by one process in the information system and used by another process within the same system. Ex. Sorted files, report files, output reports until printed. Security File: Created and saved for backup and recovery purposes. Ex. Audit trail files; backups of master, table, and transaction files. History File: Created for archiving purposes. Ex. Inactive student file: if a student has not registered in the last 2 semesters, the student is deleted from the active student master file and added to the inactive file. If the student re-registers, the record is deleted from the inactive file and added back to the active student master file.
Data Design Concepts Overview of Database Systems A properly designed database system offers a solution to the problems of file processing Provides an overall framework that avoids data redundancy and supports a real-time, dynamic environment Database management system (DBMS) The main advantage of a DBMS is that it offers timely, interactive, and flexible data access
Data Design Concepts Overview of Database Systems Advantages: scalability, better support for client/server systems, economy of scale, flexible data sharing, enterprise-wide application (database administrator, DBA), stronger standards. Scalability: The system can be expanded, modified, or downsized to meet the changing needs of the business. Client/Server Support: These systems require the power and flexibility of a database design. Economy of Scale: A company that uses an enterprise-wide database with a powerful mainframe server instead of several smaller computers saves money through economies of scale. The more processing, the cheaper it gets. Flexible Data Sharing: A database allows users to access information from anywhere and view consistent information in different ways. DBA: The database administrator typically administers the database. Stronger Standards: It is important to standardize data names, formats, and documentation throughout the organization; the DBA helps to ensure this.
Data Design Concepts Overview of Database Systems Advantages: controlled redundancy, better security, increased programmer productivity, data independence. Controlled Redundancy: Data is not stored in numerous places, which reduces inconsistency and data errors. Data items do not need to be duplicated in multiple locations. Better Security: The DBA defines authorization procedures to ensure correct access to the database. Programmer Productivity: Programmers do not have to create the underlying file structure for a database. They can concentrate on logical design, and a new database application can be developed more quickly than in a file-oriented system. Data Independence: Systems that interact with a DBMS are relatively independent of how the physical data is maintained, so you can alter data structures without modifying the systems that use the data.
Data Design Concepts Database Tradeoffs Because DBMSs are powerful, they require more expensive hardware, software, and data networks capable of supporting a multi-user environment More complex than a file processing system Procedures for security, backup, and recovery are more complicated and critical
DBMS Components Interfaces for Users, Database Administrators, and Related Systems Users Query language Query by example (QBE) SQL (structured query language) Database Administrators A DBA is responsible for DBMS management and support To work with the data within a DBMS there are several options: Query Language: A query is a request for specific data from a database. Query languages allow users to specify the data to display, print, or store. Each query language has its own grammar and vocabulary. Even without a programming background, most query languages can be learned in a short time. Query By Example: Query by example provides a graphical user interface to assist users with retrieving data. QBE is a relatively simple concept in which users are led intuitively through a query, step by step. Most DBMSs provide a QBE feature. (Access) SQL (Structured Query Language): SQL truly is a multiplatform tool. SQL is available in popular database programs for personal computers and networks, and most relational databases for midrange servers and mainframes include SQL. At one time, only professional programmers could access mainframe data. Database Administrators: The DBA maintains the database and assesses its requirements. Many large companies have both a DBA and a database analyst (DA), or data modeler, who focuses on the meaning and use of data. In smaller companies, one person often assumes both roles.
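As a small illustration of a query language, the sketch below runs one SQL query through Python's built-in sqlite3 module. The STUDENT table, its columns, and the rows are hypothetical; the point is only that the query states which data to retrieve, not how to find it.

```python
import sqlite3

# In-memory database with a hypothetical STUDENT table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_number INTEGER PRIMARY KEY, last_name TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1035, "Lopez", 3.4), (2210, "Chen", 2.7), (3187, "Baker", 3.9)],
)

# The query is a request for specific data: students with a GPA of at least 3.0.
rows = conn.execute(
    "SELECT student_number, last_name FROM student WHERE gpa >= ? ORDER BY last_name",
    (3.0,),
).fetchall()
print(rows)
```

A QBE tool would let a user build the same request by filling in a grid instead of typing the statement.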
DBMS Components Interfaces for Users, Database Administrators, and Related Systems Related information systems A DBMS can support several related information systems that provide input to, and require specific data from, the DBMS Related information systems: With these systems, unlike a human interface, no human intervention is required for the 2-way communication that occurs between the DBMS and the related systems.
DBMS Components Data Manipulation Language Schema A data manipulation language (DML) controls database operations, including storing, retrieving, updating, and deleting data. The complete definition of a database, including descriptions of all fields, tables, and relationships, is called a schema. You also can define one or more subschemas. Data Manipulation Language: DML serves the user who is querying a database. Some DBMSs, such as Microsoft Access, use QBE to hide the DML from a user. Schema: Describes the structure of the database and contains descriptive information about the stored data, including access and content controls, relationships among data elements, and details of physical data store organization. Subschema: A view of the database that a particular system or user needs or is allowed to access. For example, to protect privacy, a project management system does not retrieve employee pay rates. A subschema also can be used to restrict the level of access that a user is given to the schema: some users can only update data, while others can delete and create data.
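One way to approximate a subschema in a SQL database is a view that exposes only some of a table's fields. The sketch below assumes SQLite and uses hypothetical table, view, and column names; a real DBMS would pair this with its own authorization rules, which are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the full definition of the (hypothetical) EMPLOYEE table.
conn.execute(
    "CREATE TABLE employee ("
    "employee_no INTEGER PRIMARY KEY, name TEXT NOT NULL, pay_rate REAL NOT NULL)"
)

# DML: store and then update a record.
conn.execute("INSERT INTO employee VALUES (1, 'Employee A', 31.25)")
conn.execute("UPDATE employee SET pay_rate = 32.00 WHERE employee_no = 1")

# Subschema-like view for a project management system: pay_rate is left out.
conn.execute("CREATE VIEW project_staff AS SELECT employee_no, name FROM employee")

cursor = conn.execute("SELECT * FROM project_staff")
visible_columns = [col[0] for col in cursor.description]
visible_rows = cursor.fetchall()
print(visible_columns, visible_rows)
```

A program that reads only project_staff never sees pay rates, even though they remain part of the underlying schema.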
DBMS Components Physical Data Repository The data dictionary is transformed into a physical data repository, which also contains the schema and subschemas The physical repository might be centralized, or distributed at several locations ODBC – open database connectivity JDBC – Java database connectivity ODBC Uses SQL statements that the DBMS understands and can execute JDBC Enables Java applications to exchange data with any database that uses SQL statements and is JDBC-compliant
Web-Based Database Design Characteristics of Web-Based Design In a Web-based design, the Internet serves as the front end, or interface, for the database management system Internet technology provides enormous power and flexibility Web-based systems are popular because they offer ease of access, cost-effectiveness, and worldwide connectivity
Web-Based Database Design Connecting a Database to the Web Database must be connected to the Internet or intranet Middleware Macromedia’s ColdFusion To access data in a Web-based system, the database must be connected to the Internet or an intranet. The database and the Internet speak two different languages, however: databases have nothing to do with HTML, the language of the Web. Middleware: Software that allows a database to connect to the Web and enables data to be viewed and updated. ColdFusion: A middleware product used to accelerate the deployment of Web applications.
Web-Based Database Design Data Security Web-based data must be totally secure, yet easily accessible to authorized users To achieve this goal, well-designed systems provide security at three levels: The database itself The Web server The telecommunication links that connect the components of the system
Data Design Terminology Definitions Entity Table or file Field Attribute Common field Record Tuple Now it is time for the systems analyst to select a design approach and begin to construct the system. Entity: Something we are collecting and maintaining data about: a person, place, thing, or event. Table/File: Data is organized into tables or files, which contain related records about the entity. The structure consists of columns and rows. Field: An attribute or characteristic of the entity. Ex. First Name, Last Name, Address. A common field is an attribute that appears in more than one entity and can be used to link entities in various types of relationships (primary and foreign keys). Record: Also called a tuple; a set of related fields that describes one instance or occurrence of an entity.
Data Design Terminology Key Fields Primary key, composite key (also called combination key, concatenated key, or multi-valued key), candidate key, nonkey field, foreign key, secondary key. Primary Key: A field or combination of fields that uniquely identifies one instance of an entity. Would you want to use a field like Name for a key field? No. Why not? It is not unique. Composite Key: A combination of 2 or more fields that together make up the primary key. Candidate Key: A field that could be a choice for the primary key before you choose one. If it is not chosen, it is treated as a nonkey field. Foreign Key: A field in one table that matches a primary key in another table to establish the relationship between the two tables. The foreign key does not have to be unique. Ex. Carlton Smith has Advisor 49. The value 49 must be unique in the ADVISOR table because it is the PK, but 49 can appear any number of times in the STUDENT table, where the advisor number is a foreign key. Turn to pg. 317, Figure 7-12 (bottom): Student-Number and Course-ID are foreign keys that together serve as the PK in the GRADE table. Using both of these fields as the PK ensures that the grade will be assigned to the proper student in the proper course. Secondary Key: A key whose values are not unique. Ex. Zip code.
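The ADVISOR and STUDENT example above can be sketched with SQLite. Advisor number 49 and the name Carlton Smith come from the slide; the other names, student numbers, zip codes, and the index name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE advisor (
        advisor_number INTEGER PRIMARY KEY,   -- primary key: must be unique
        advisor_name   TEXT
    );
    CREATE TABLE student (
        student_number INTEGER PRIMARY KEY,
        student_name   TEXT,
        zip_code       TEXT,
        advisor_number INTEGER REFERENCES advisor(advisor_number)  -- foreign key
    );
    -- Secondary key: an index on a field whose values are not unique.
    CREATE INDEX student_zip ON student (zip_code);
    """
)
conn.execute("INSERT INTO advisor VALUES (49, 'Advisor A')")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Carlton Smith", "60601", 49), (2, "Student B", "60601", 49)],
)

# A second advisor numbered 49 violates primary key uniqueness.
try:
    conn.execute("INSERT INTO advisor VALUES (49, 'Advisor C')")
    duplicate_pk_rejected = False
except sqlite3.IntegrityError:
    duplicate_pk_rejected = True

# The same value may repeat freely as a foreign key in STUDENT.
advisees = conn.execute(
    "SELECT COUNT(*) FROM student WHERE advisor_number = 49"
).fetchone()[0]
print(duplicate_pk_rejected, advisees)
```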
Data Design Terminology Referential Integrity Validity checks can help avoid data input errors. Referential integrity is a type of validity check: a set of rules that prevents data inconsistencies and quality problems. In a relational database, it means that a foreign key value cannot be entered in one table unless a matching primary key value exists in another table. For example, you cannot enter a customer order into the ORDER table unless that customer already exists in the CUSTOMER table. Without referential integrity, you could end up with an orphan order because there is no related customer. Similarly, with an ORDER MASTER table and an ORDER DETAIL table, you could delete an order’s record from the ORDER MASTER table and leave the matching ORDER DETAIL records as orphans.
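A minimal sketch of referential integrity, assuming SQLite, where foreign key enforcement must be switched on per connection with PRAGMA foreign_keys. The table is named customer_order because ORDER is a reserved word in SQL; all names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE customer (customer_number INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE customer_order ("
    "order_number INTEGER PRIMARY KEY, "
    "customer_number INTEGER NOT NULL REFERENCES customer(customer_number))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Customer A')")
conn.execute("INSERT INTO customer_order VALUES (100, 1)")


def is_rejected(sql):
    """Run a statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# An order for a customer that does not exist would be an orphan.
orphan_insert_rejected = is_rejected("INSERT INTO customer_order VALUES (101, 999)")
# Deleting a customer who still has orders would orphan those orders.
parent_delete_rejected = is_rejected("DELETE FROM customer WHERE customer_number = 1")
print(orphan_insert_rejected, parent_delete_rejected)
```

Both statements fail, so neither kind of orphan record described above can be created.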
Entity-Relationship Diagrams An entity is a person, place, thing, or event for which data is collected and maintained Provides an overall view of the system, and a blueprint for creating the physical data structures Entity-relationship diagram ERD a model that shows the logical relationships and interaction among system entities. Provides an overall view of the system and blueprint for creating physical data structures
Entity-Relationship Diagrams Drawing an ERD The first step is to list the entities that you identified during the fact-finding process and to consider the nature of the relationships that link them Entities are labeled with singular nouns Relationships are diamond shapes ERDs depict relationships, not data or information flows
Entity-Relationship Diagrams Types of Relationships One-to-one relationship (1:1) One-to-many relationship (1:M) Many-to-many relationship (M:N) Associative entity There are 3 types of relationships that can exist between entities. 1:1 exists when exactly one of the 2nd entity occurs for each instance of the 1st entity (Figure 7-15, pg. 318). 1:M exists when one occurrence of the 1st entity can relate to many instances of the 2nd entity, but each instance of the 2nd entity can associate with only one instance of the 1st entity (Figure 7-16, pg. 319). M:N exists when one instance of the 1st entity can relate to many instances of the 2nd entity, and one instance of the 2nd entity can relate to many instances of the 1st entity (Figure 7-17, pg. 319). Notice that M:N is different from 1:1 or 1:M in that the event or transaction that links the 2 entities together is actually a third entity, called an associative entity, that has its own characteristics.
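An M:N relationship and its associative entity can be sketched as three tables. The STUDENT, COURSE, and REGISTRATION names and all rows below are assumptions for illustration; the associative table carries its own attribute (grade) and a composite primary key built from the two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT);
    -- Associative entity: one row per student/course pairing.
    CREATE TABLE registration (
        student_number INTEGER REFERENCES student(student_number),
        course_id      TEXT    REFERENCES course(course_id),
        grade          TEXT,
        PRIMARY KEY (student_number, course_id)
    );
    INSERT INTO student VALUES (1, 'Student A'), (2, 'Student B');
    INSERT INTO course  VALUES ('CS101', 'Course X'), ('MA201', 'Course Y');
    INSERT INTO registration VALUES (1, 'CS101', 'A'), (1, 'MA201', 'B'), (2, 'CS101', 'B');
    """
)

# One student relates to many courses...
courses_for_student_1 = [
    row[0]
    for row in conn.execute(
        "SELECT course_id FROM registration WHERE student_number = 1 ORDER BY course_id"
    )
]
# ...and one course relates to many students.
students_in_cs101 = [
    row[0]
    for row in conn.execute(
        "SELECT student_number FROM registration WHERE course_id = 'CS101' "
        "ORDER BY student_number"
    )
]
print(courses_for_student_1, students_in_cs101)
```

Each side of the M:N relationship becomes a 1:M relationship with the associative table.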
Entity-Relationship Diagrams Cardinality Cardinality notation Crow’s foot notation Cardinality: After drawing the initial ERD then you need to determine how many instances of one entity relate to instances of the other entity. This technique is called cardinality. Cardinality Notation: Modeling this interaction is done by using special symbols that represent the relationship, termed cardinality notation. Crow’s Foot Notation: The symbols used for the notation include circles, bars, and symbols that resemble crow’s feet. Figure 7-20, Pg. 321 Notation Single Bar | indicates one Double Bar | | indicates one and only one Circle indicates zero Crow’s Foot indicates many Go through Figure 7-20, pg. 321
Normalization Table design Involves four stages: unnormalized design, first normal form, second normal form, and third normal form Most business-related databases must be designed in third normal form We now start the process of creating our tables, but this is only a logical view. These tables may or may not translate into actual tables in the physical design. Normalization: A process in which we create table designs by assigning specific fields to each table in the database, based on a set of rules, with the goal of correcting inherent problems in our table design. It involves 4 stages: unnormalized design, 1NF, 2NF, and 3NF. The 3 normal forms constitute a progression in which 3NF represents the best design. Most business databases must be designed in 3NF.
Normalization Standard Notation Format Designing tables is easier if you use a standard notation format to show a table’s structure, fields, and primary key Example: NAME (FIELD 1, FIELD 2, FIELD 3)
Normalization Repeating Groups and Unnormalized Designs Often occur in manual documents prepared by users Unnormalized design The first inherent problem with unnormalized data is something called repeating groups. A repeating group is a group of fields that can occur any number of times in a single record, with each occurrence having different values. Repeating groups often occur in manual documents. Ex. A report card with the student’s information at the top, followed by a list of courses and grades at the bottom. This academic or grade information would represent a repeating group (Figure 7-22, pg. 323). Think of a repeating group as a set of child (subsidiary) records contained within the parent (main) record.
Normalization First Normal Form and Second Normal Form A table is in first normal form (1NF) if it does not contain a repeating group To convert, you must expand the table’s primary key to include the primary key of the repeating group Second Normal Form To understand second normal form (2NF), you must understand the concept of functional dependence Functionally dependent 1NF: A table is in 1NF if it DOES NOT contain a repeating group. You give the new table the new primary key and include that as a FK in the original table. Pg. 323, Figure 7-22 (raw, unnormalized); pg. 324, Figure 7-23 (1NF). 2NF: The table is in 1NF AND all fields that are not part of the primary key are functionally dependent on the entire primary key.
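The 1NF conversion (expanding the primary key to include the key of the repeating group) can be sketched with plain Python records. The order and product field names and values are hypothetical and are not taken from Figures 7-22 or 7-23.

```python
# Unnormalized design: each order record contains a repeating group of items.
unnormalized_orders = [
    {
        "order_num": 40311,
        "order_date": "2024-03-11",
        "items": [
            {"product_num": 304, "product_desc": "Blue gadget", "num_ordered": 7},
            {"product_num": 633, "product_desc": "Assembly", "num_ordered": 1},
        ],
    },
    {
        "order_num": 40312,
        "order_date": "2024-03-11",
        "items": [
            {"product_num": 304, "product_desc": "Blue gadget", "num_ordered": 2},
        ],
    },
]


def to_first_normal_form(orders):
    """Flatten the repeating group so every row holds exactly one item."""
    rows = []
    for order in orders:
        for item in order["items"]:
            rows.append(
                {"order_num": order["order_num"], "order_date": order["order_date"], **item}
            )
    return rows


rows_1nf = to_first_normal_form(unnormalized_orders)
# The expanded primary key is the combination (order_num, product_num).
primary_keys = [(row["order_num"], row["product_num"]) for row in rows_1nf]
print(primary_keys)
```

Note that the order date and product description now repeat across rows, which is the redundancy that 2NF addresses.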
Normalization Second Normal Form A standard process exists for converting a table from 1NF to 2NF Create and name a separate table for each field in the existing primary key Create a new table for each possible combination of the original primary key fields Study the three tables and place each field with its appropriate primary key
Normalization Second Normal Form Four kinds of problems are found with 1NF designs that do not exist in 2NF Consider the work necessary to change a particular product’s description 1NF tables can contain inconsistent data Adding a new product is a problem Deleting a product is a problem 2NF: Consider the work to make changes to fields: If there are 500 current orders for Product #304, we have to modify 500 records. Updating would be cumbersome and expensive. 1NF tables can contain inconsistent data: Entering data manually leaves you wide open to mismatched data. Adding a new product is a problem: Because the PK includes Order Number and Product #, you need values for both fields to add a record. How do you enter a new product that has not yet been ordered by a customer? You could use a dummy order number, but that can create difficulties. Pg. 326, Figure 7-24
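The following sketch shows how a 2NF split removes the update and insertion problems listed above. It assumes a 1NF table keyed on (order_num, product_num); the field names and values are hypothetical, with Product #304 borrowed from the slide.

```python
# 1NF rows: product_desc depends only on product_num, and order_date only on
# order_num, so neither depends on the ENTIRE composite key.
rows_1nf = [
    {"order_num": 40311, "order_date": "2024-03-11", "product_num": 304,
     "product_desc": "Blue gadget", "num_ordered": 7},
    {"order_num": 40311, "order_date": "2024-03-11", "product_num": 633,
     "product_desc": "Assembly", "num_ordered": 1},
    {"order_num": 40312, "order_date": "2024-03-11", "product_num": 304,
     "product_desc": "Blue gadget", "num_ordered": 2},
]


def to_second_normal_form(rows):
    """Place each field in the table whose key it fully depends on."""
    orders, products, order_lines = {}, {}, {}
    for row in rows:
        orders[row["order_num"]] = {"order_date": row["order_date"]}
        products[row["product_num"]] = {"product_desc": row["product_desc"]}
        order_lines[(row["order_num"], row["product_num"])] = {
            "num_ordered": row["num_ordered"]
        }
    return orders, products, order_lines


orders, products, order_lines = to_second_normal_form(rows_1nf)

# Changing a description now touches one record, however many orders exist.
products[304]["product_desc"] = "Green gadget"
# A product nobody has ordered yet can be added without a dummy order number.
products[912] = {"product_desc": "New product"}
print(len(orders), len(products), len(order_lines))
```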
Normalization Third Normal Form 3NF design avoids redundancy and data integrity problems that still can exist in 2NF designs A table design is in third normal form (3NF) if it is in 2NF and if no nonkey field is dependent on another nonkey field To convert the table to 3NF, you must remove all fields from the 2NF table that depend on another nonkey field and place them in a new table that uses the nonkey field as a primary key Figure 7-26, pg. 327 2NF Figure 7-27, pg 328 3NF
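The 3NF rule can be illustrated the same way. In this hypothetical CUSTOMER table (not taken from Figures 7-26 or 7-27), the sales rep name depends on the sales rep number, which is itself a nonkey field, so the dependent field moves to a new table keyed on that nonkey field.

```python
# 2NF table keyed on customer_num; sales_rep_name depends on sales_rep_num,
# a nonkey field, rather than directly on the primary key.
customers_2nf = [
    {"customer_num": 108, "customer_name": "Customer A",
     "sales_rep_num": 41, "sales_rep_name": "Rep X"},
    {"customer_num": 233, "customer_name": "Customer B",
     "sales_rep_num": 41, "sales_rep_name": "Rep X"},
    {"customer_num": 254, "customer_name": "Customer C",
     "sales_rep_num": 22, "sales_rep_name": "Rep Y"},
]


def to_third_normal_form(rows):
    """Move fields that depend on a nonkey field into their own table."""
    customers, sales_reps = {}, {}
    for row in rows:
        customers[row["customer_num"]] = {
            "customer_name": row["customer_name"],
            "sales_rep_num": row["sales_rep_num"],  # stays behind as a foreign key
        }
        sales_reps[row["sales_rep_num"]] = {"sales_rep_name": row["sales_rep_name"]}
    return customers, sales_reps


customers, sales_reps = to_third_normal_form(customers_2nf)
print(customers, sales_reps)
```

Each sales rep name is now stored once, so renaming a rep cannot leave customer records disagreeing with one another.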
Normalization A Normalization Example To show the normalization process, consider the familiar situation, which depicts several entities in a school advising system: ADVISOR, COURSE, and STUDENT
Steps in Database Design Create the initial ERD Assign all data elements to entities Create 3NF designs for all tables, taking care to identify all primary, secondary, and foreign keys Verify all data dictionary entries After creating your final ERD and normalized table designs, you can transform them into a database Create ERD: Review the DFDs to identify system entities, talk to users Create a draft ERD Analyze each relationship of the ERD to see if it is 1:1, 1:m, or M:N Figure 7-39, p. 337 Assign all data elements to entities: Verify that every data element in DD is associated logically with an entity Create 3NF designs for all tables, taking care to identify all primary, secondary, and foreign keys Generate the final ERD that will include new entities identified during normalization Figure 7-40, p. 338 Verify all data dictionary entries Verify and check all entries have been made for data stores, records, and data elements, and codes. Transformation into a database will take place once finished and other steps have been completed.
Chapter Summary Any questions? Files and tables contain data about people, places, things, or events that affect the information system DBMS designs are more powerful and flexible than traditional file-oriented systems Data design tasks include creating an initial ERD; assigning data elements to an entity; normalizing all table designs; and completing the data dictionary entries for files, records, and data elements