Chapter 1 The Database Environment and Development Process Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration Gonzaga University Spokane, WA 99258 chen@jepson.gonzaga.edu
Importance of Information Information (and energy) are at the core of everything around us. Our entire existence (including businesses) is a process of gathering, analyzing, understanding, and acting on the information. Modern organizations are said to be drowning in data but starving for information.
CAREER EARNING POTENTIAL
- Employment for Computer Systems Analysts is expected to grow much faster than average.
- Jobs in this area are expected to increase by 25% between 2012 and 2020.
- With a bachelor's degree, the median salary per year for this occupation is $79,680.
- It is expected that 127,000 new jobs will be created in this area by 2020.
Source: U.S. Bureau of Labor Statistics.
Objectives
- Define terms
- Name limitations of conventional file processing
- Explain advantages of databases
- Identify costs and risks of databases
- List components of database environment
- Identify categories of database applications
- Describe database system development life cycle
- Explain prototyping and agile development approaches
- Explain roles of individuals
- Explain the three-schema architecture for databases
Definitions
- Data: Meaningful facts, text, graphics, images, sound, video segments.
- Database: An organized collection of logically related data.
- Information: Data processed to be useful in decision making.
- Metadata: Data that describes data (data about the data).
Business and Modeling Environment Examples: Model Meta-Data Invoice Core/Essence Data Debt, Revenue Furniture Store Business Reality
Types of Data Processing
Two types of data processing:
- File-based data processing (e.g., applications developed with Java)
- Database-based data processing (e.g., applications developed with Oracle or MS Access)
Figure 1-2: Old file processing systems at Pine Valley Furniture
Q: What is the main problem in the company's processing systems?
A: Duplicate data
Disadvantages of File Processing
- Program-Data Dependence: all programs maintain metadata for each file they use
- Data Redundancy (duplication of data): different systems/programs have separate copies of the same data
- Limited Data Sharing: no centralized control of data
- Lengthy Development Times: programmers must design their own file formats
- Excessive Program Maintenance: 80% of the information systems budget
Problems with Data Dependency
- Each application programmer must maintain their own data
- Each application program needs to include code for the metadata of each file
- Each application program must have its own processing routines for reading, inserting, updating, and deleting data
- Lack of coordination and central control
- Non-standard file formats
Problems with Data Redundancy (cont.)
- Wastes space by storing duplicate data
- Causes more maintenance headaches
- The biggest problem: when data changes in one file but not in the other copies, the files become inconsistent
- Compromises data integrity
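The inconsistency problem can be illustrated with a minimal Python sketch. The two dictionaries stand in for the separate files kept by two hypothetical applications (an order system and an invoicing system); the customer ID, name, and addresses are made up for illustration and do not come from the textbook.

```python
# Two applications each keep their own copy of the same customer data,
# as in a file processing environment (all values are hypothetical).
order_file = {"C001": {"name": "Acme Interiors", "address": "12 Oak St"}}
invoice_file = {"C001": {"name": "Acme Interiors", "address": "12 Oak St"}}


def update_address(file_records, customer_id, new_address):
    """Update the address in ONE file only; other copies are untouched."""
    file_records[customer_id]["address"] = new_address


def find_inconsistencies(file_a, file_b):
    """Return the customer IDs whose records differ between the two files."""
    return [cid for cid in file_a if cid in file_b and file_a[cid] != file_b[cid]]


# The order system learns that the customer moved; the invoicing system does not.
update_address(order_file, "C001", "200 Main St")
mismatched = find_inconsistencies(order_file, invoice_file)
print(mismatched)
```

With a single shared copy of the customer record there would be nothing to compare, which is the point of the database approach introduced next.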
SOLUTION: The DATABASE Approach
- Central repository of shared data
- Data is managed by a controlling agent
- Stored in a standardized, convenient form
- Requires a Database Management System (DBMS)
Database Management System
A software system that is used to create, maintain, and provide controlled access to user databases.
Figure: Application #1 (Order Filing), Application #2 (Invoicing System), and Application #3 (Payroll System) all reach a database containing centralized shared data through the DBMS.
The DBMS manages data resources like an operating system manages hardware resources.
Database Management System
A DBMS is a data storage and retrieval system that permits data to be stored non-redundantly while making it appear to the user as if the data is well integrated. In short, a DBMS is a software package that manages a database.
Advantages of Database Approach
- Program-Data Independence
  - Metadata is stored in the DBMS, so applications don't need to worry about data formats (you will see this when you learn Oracle)
  - Data queries/updates are managed by the DBMS, so programs don't need to include their own data access routines
  - Results in increased application development and maintenance productivity
- Minimal Data Redundancy
  - Leads to increased data integrity/consistency
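Program-data independence can be sketched in a few lines of Python. This uses SQLite from the standard library rather than Oracle, purely so the example is self-contained; the `customer` table, its columns, and the sample row are hypothetical.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.execute(
    "INSERT INTO customer (customer_id, name, city) VALUES (1, 'Acme Interiors', 'Spokane')"
)

# The DBMS stores the metadata, so a program can ask for it instead of hard-coding it.
# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(columns)


def customer_names(connection):
    """Refer to data by name, not by byte offset or record layout."""
    query = "SELECT name FROM customer ORDER BY customer_id"
    return [row[0] for row in connection.execute(query)]


before = customer_names(conn)

# The table structure changes, yet the program's query does not need to.
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")
after = customer_names(conn)
print(before == after)
```

In a file processing system, adding a field to the record layout would typically force every program that reads the file to be changed.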
Advantages of Database Approach
- Program-data independence
- Planned data redundancy
- Improved data consistency
- Improved data sharing
  - Different users get different views of the data
- Improved productivity of application development
- Enforcement of standards
  - All data access is done in the same way
- Improved data quality
  - Constraints, data validation rules
- Improved data accessibility and responsiveness
  - Use of a standard data query language (SQL)
- Reduced program maintenance
- Improved decision support
- Security, Backup/Recovery, Concurrency (not in Table 1-3)
  - Disaster recovery is easier
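The "constraints, data validation rules" item can be made concrete with a small sketch, again using Python's built-in SQLite module as a stand-in DBMS. The `product` table, its columns, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        unit_price  REAL NOT NULL CHECK (unit_price >= 0)
    )
    """
)


def try_insert(connection, product_id, description, unit_price):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "INSERT INTO product (product_id, description, unit_price) VALUES (?, ?, ?)",
                (product_id, description, unit_price),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(conn, 1, "End table", 175.0),  # valid row
    try_insert(conn, 2, "Desk", -5.0),        # violates the CHECK constraint
    try_insert(conn, 1, "Bookcase", 300.0),   # duplicate primary key
    try_insert(conn, 3, None, 50.0),          # violates NOT NULL
]
row_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(results, row_count)
```

Because the rules live in the database, every application that inserts products is held to the same standard without repeating the validation code.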
Costs and Risks of the Database Approach
- Up-front costs
  - Installation and management cost and complexity
  - Conversion costs
- Ongoing costs
  - Requires new, specialized personnel
  - Need for explicit backup and recovery
- Organizational conflict
  - Old habits die hard
- Other hidden costs
Why Do We Still Learn File Processing Systems?
- File processing systems are still widely used today, especially for backing up database systems.
- Understanding the problems and limitations inherent in file processing systems can help us avoid these same problems when designing databases.
Elements of the Database Approach
- Data models
  - Graphical system capturing the nature and relationships of data
  - Enterprise Data Model: high-level entities and relationships for the organization
  - Project Data Model: more detailed view, matching the data structure in a database or data warehouse
- Entities
  - Noun form describing a person, place, object, event, or concept
  - Composed of attributes
- Relationships
  - Between entities
  - Usually one-to-many (1:M) or many-to-many (M:N)
- Relational Databases
  - Database technology involving tables (relations) representing entities and primary/foreign keys representing relationships
A Data Model on Customer and Order (segment of an enterprise data model)
Entities: CUSTOMER, ORDER
Q1. One CUSTOMER normally places ___ ORDER(s)? Answer: M (many)
Q2. One ORDER normally is placed by ___ CUSTOMER(s)? Answer: 1 (one)
How about the relationship between ORDER and PRODUCT? (see next slide)
Figure 1-3: Comparison of enterprise and project level data models
ORDER fields shown: ORDER_NUMBER, Customer_ID, Order_Date
Q3. One PRODUCT normally is contained in ___ ORDER(s)? Answer: M (many)
Q4. One ORDER normally contains ___ PRODUCT(s)? Answer: M (many)
Figure 1-3: Comparison of enterprise and project level data models
(a) Segment of an Enterprise Data Model
(b) Segment of a Project-Level Data Model
Q: What are two major differences between (a) and (b)?
A: (a) is high-level (entities only), while (b) includes fields; the M:N relationship in (a) is broken into two 1:M relationships in (b).
Figure 1-3 Segment from Enterprise Data Model
An enterprise data model is a graphical model that shows the high-level entities for the organization and the relationships among these entities (E/R diagram).
Figure 1-3 Segment from enterprise data model One customer may place many orders, but each order is placed by a single customer One-to-many relationship
Figure 1-3 Segment from enterprise data model One order has many order lines; each order line is associated with a single order One-to-many relationship
Figure 1-3 Segment from enterprise data model One product can be in many order lines; each order line refers to a single product One-to-many relationship
Figure 1-3 Segment from enterprise data model Therefore, one order involves many products and one product is involved in many orders Many-to-many relationship
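The way two one-to-many relationships add up to a many-to-many relationship can be sketched as relational tables. The example below uses Python's built-in SQLite module; the table names, column names, and sample rows are hypothetical and only loosely follow the figure (the ORDER entity is named `customer_order` here because ORDER is a reserved word in SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
    CREATE TABLE customer_order (
        order_number INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer (customer_id),
        order_date   TEXT NOT NULL
    );
    CREATE TABLE order_line (
        order_number INTEGER NOT NULL REFERENCES customer_order (order_number),
        product_id   INTEGER NOT NULL REFERENCES product (product_id),
        quantity     INTEGER NOT NULL,
        PRIMARY KEY (order_number, product_id)
    );
    INSERT INTO customer VALUES (1, 'Acme Interiors');
    INSERT INTO product VALUES (10, 'End table'), (20, 'Desk');
    INSERT INTO customer_order VALUES (100, 1, '2024-01-15'), (101, 1, '2024-02-01');
    INSERT INTO order_line VALUES (100, 10, 2), (100, 20, 1), (101, 10, 4);
    """
)


def counts(connection, sql):
    """Run a grouping query and return all rows as a list of tuples."""
    return connection.execute(sql).fetchall()


# 1:M, one customer places many orders.
orders_per_customer = counts(
    conn, "SELECT customer_id, COUNT(*) FROM customer_order GROUP BY customer_id ORDER BY customer_id"
)
# M:N, seen from each side through the order_line table.
products_per_order = counts(
    conn, "SELECT order_number, COUNT(*) FROM order_line GROUP BY order_number ORDER BY order_number"
)
orders_per_product = counts(
    conn, "SELECT product_id, COUNT(*) FROM order_line GROUP BY product_id ORDER BY product_id"
)

# A foreign key keeps an order line from pointing at an order that does not exist.
try:
    conn.execute("INSERT INTO order_line VALUES (999, 10, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orders_per_customer, products_per_order, orders_per_product, orphan_rejected)
```

Each row of `order_line` belongs to exactly one order and exactly one product, yet order 100 spans two products and product 10 appears on two orders.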
Figure 1-4 Enterprise data model for Figure 1-3 segments
Figure 1-5 Components of the Database Environment
Components of the Database Environment
- CASE Tools: computer-aided software engineering
- Repository: centralized storehouse of metadata
- Database Management System (DBMS): software for managing the database
- Database: storehouse of the data
- Application Programs: software using the data
- User Interface: text and graphical displays to users
- Data/Database Administrators: personnel responsible for maintaining the database
- System Developers: personnel responsible for designing databases and software
- End Users: people who use the applications and databases
Evolution of Database Technologies
- Flat files: 1960s to 1980s
- Hierarchical: 1970s to 1990s
- Network: 1970s to 1990s
- Relational: 1980s to present
- Object-oriented: 1990s to present
- Object-relational: 1990s to present
- Data warehousing: 1980s to present
Figure 1-10a Evolution of Database Technologies
Figure 1-10b Database architecture
The Range of Database Applications
- Personal databases
- Two-tier client/server databases (local area network)
- Multitier (N-tier) client/server databases, also called web-enabled databases (wide area network, WAN)
- Enterprise applications
  - Enterprise resource planning (ERP) systems
  - Data warehousing implementations
Figure 1-11 Two-tier database with local area network
Figure 1-12 Three-tiered client/server database architecture
Multi/N-tier: Web-Enabled Databases
- Web applications requiring databases
  - Customer relationship management (CRM)
  - Business-to-consumer (B2C)
  - Electronic data interchange (EDI)
  - Private intranets
  - XML-defined Web services
Enterprise Applications
- Enterprise Resource Planning (ERP)
  - Integrates all enterprise functions (manufacturing, finance, sales, marketing, inventory, accounting, human resources)
- Data Warehousing implementation
  - Integrated decision support system derived from various operational databases
Break! (Ch. 1)
Exercise #1 (p. 44)
Homework:
1. Complete the Chapter 1 quiz by Sunday evening.
2. HW #12 (a) only (homework assignment)
   - high-level (no attributes)
   - draw clearly with Visio
   - turn in a hardcopy next class
Be Prepared for ...
Discuss HW HW#12 (a) Volunteer?
Enterprise Data Model
- An enterprise data model is a graphical model that shows the high-level entities for the organization and the relationships among these entities (E/R diagram).
- Enterprise data modeling is the first step in database development, in which the scope and general contents of organizational databases are specified:
  - Descriptions of entity types
  - Relationships between entities
  - Business rules
- Q: Is EDM (Enterprise Data Modeling) a top-down or bottom-up approach? A: Top-down (p. 24)
- Q: Is ER/M a top-down or bottom-up approach?
Steps in the Database Development Process
- Enterprise Modeling
- Conceptual Data Modeling
  - Cuts across the Project Initiation and Planning phase and the Analysis phase of the SDLC
- Logical Database Design (E/R)
- Physical Database Design and Creation
- Database Implementation
- Database Maintenance
Life Cycle Phases of DA and DBA (Ch. 11)
- Database Planning
- Database Analysis
- Database Design
- Database Implementation
- Operations and Maintenance
- Growth and Change
Two Approaches to Database and IS Development
- SDLC
  - System Development Life Cycle
  - Detailed, well-planned (and structured) development process
  - Time-consuming, but comprehensive
  - Long development cycle
- Prototyping
  - Rapid application development (RAD)
  - Cursory attempt at conceptual data modeling
  - Define the database during development of the initial prototype
  - Repeat implementation and maintenance activities with new prototype versions
  - Tool: Oracle Designer
Database Design
- Systems Development Life Cycle (SDLC)
- Entity-relationship model (E-R model)
- Normalization
Systems Development Life Cycle
Figure: stages of the traditional information systems development cycle and their products
- Systems Investigation (Definition). Product: Feasibility Study
- Systems Analysis. Product: Functional Requirements
- Systems Design. Product: System Specifications
- Systems Implementation. Product: Operational System
- Systems Maintenance. Product: Improved System
Figure labels: Understand the Business Problem or Opportunity; Develop an Information System Solution; Implement the Information System Solution
The traditional information systems development cycle is based upon the stages in the systems approach to problem solving, where each step depends on the previous step:
- Systems Investigation. This stage may begin with a formal information systems planning process to help sort out choices from many opportunities. Typically, due to the expense associated with information systems development, this stage includes a cost/benefit analysis as part of a feasibility study.
- Systems Analysis. This stage includes an analysis of the information needs of end users, the organizational environment, and any system currently used, in order to develop the functional requirements of a new system.
- Systems Design. This stage develops specifications for the hardware, software, people, network, and data resources of the system. The information products the system is expected to produce are also designated.
- Systems Implementation. Here the organization develops or acquires the hardware and software needed to implement the system design. Testing of the system and training of people to operate and use the system are also part of this stage. Finally, the organization converts to the new system.
- Systems Maintenance. In this stage, management uses a postimplementation review process to monitor, evaluate, and modify the system as needed.
2000 McGraw-Hill Companies
Prototyping
Prototyping is one of the most popular rapid application development (RAD) methods. It is an iterative process of system development in which requirements are converted to a working system that is continually revised through close work between analysts and users. A prototype is a small but working system that contains only the most important features, not the complete set.
Systems Development Life Cycle (see also Figure 1-7)
1. Project Identification and Selection
2. Project Initiation and Planning
3. Analysis
4. Logical Design
5. Physical Design
6. Implementation
7. Maintenance
Systems Development Life Cycle (see also Figure 1-7): purpose, deliverable, and database activity for each phase
- Project Identification and Selection
  - Purpose: preliminary understanding
  - Deliverable: request for project
  - Database activity: enterprise modeling
- Project Initiation and Planning
  - Purpose: state business situation and solution
  - Deliverable: request for analysis
  - Database activity: conceptual data modeling
- Analysis
  - Purpose: thorough analysis
  - Deliverable: functional system specifications
  - Database activity: conceptual data modeling
- Logical Design
  - Purpose: information requirements structure
  - Deliverable: detailed design specifications
  - Database activity: logical database design
- Physical Design
  - Purpose: develop technology specs
  - Deliverable: program/data structures, technology purchases, organization redesigns
  - Database activity: physical database design
- Implementation
  - Purpose: programming, testing, training, installation, documenting
  - Deliverable: operational programs, documentation, training materials
  - Database activity: database implementation
- Maintenance
  - Purpose: monitor, repair, enhance
  - Deliverable: periodic audits
  - Database activity: database maintenance
Figure 1-7: Database development activities during the systems development life cycle (SDLC)
SDLC phases: Project Identification and Selection; Project Initiation and Planning; Analysis; Logical Design; Physical Design; Implementation; Maintenance
Figure annotations: Planning (Enterprise modeling); Conceptual data modeling; Integrate database views and perform normalization; Upper CASE tool (Front-end); Lower CASE tool (Back-end); Growth and Change
Figure 1-8 The prototyping methodology and database development process
Managing Projects
- Project: a planned undertaking of related activities to reach an objective that has a beginning and an end
- Initiated and planned in the planning stage of the SDLC
- Executed during analysis, design, and implementation
- Closed at the end of implementation
Managing Projects: People Involved
A project is a planned undertaking of related activities to reach an objective that has a beginning and an end.
People involved:
- Business analysts: work with management and users to analyze the business
- Systems analysts: analyze the business situation and IS needs
- Database analysts and modelers: determine requirements and design for the database component of the IS
- Users: assess their information needs and monitor that the developed system meets those needs
- Programmers: design and write computer programs
- Database architects: establish standards
- Database and data administrators: ensure database consistency and integrity, and provide consulting, training, etc.
- Project managers: oversee assigned projects
- Other technical experts: network administrators, testers, technical writers
The Technology Level of Models
- Conceptual models focus on the underlying content of an information system, with no assumptions about technology
- Logical models assume a general class of technology (hardware/software independent), for example, a relational database
- Internal models assume specific technologies, for example, an Oracle database engine
Database Schema
- External Schema (during the analysis and logical design phases)
  - User views
  - Subsets of the conceptual schema
  - Can be determined from business-function/data entity matrices
  - DBA determines the schema for different users
  - This is part of people management in databases
- Conceptual Schema
  - ER models (during the analysis phase): covered in Chapters 2 and 3
- Internal Schema
  - Logical structures: covered in Chapter 4
  - Physical structures: covered in Chapter 5
Figure 1-9 Three-schema architecture
- Different people have different views of the database; these are the external schemas
- The internal schema is the underlying design and implementation
Figure 1-9: Three-schema database architecture
Figure labels: External Schema (Ch. 4); E/R, OO ... (Ch. 2, 3, 4); Relations; Database; Internal Schema (Ch. 5); Meta-data/Repository/D.D.
All good (database) things come in threes!
- The pictures at the top level indicate that some views are for users, some for reports, and some for application programs.
- The logical level is where the data is seen as being composed of tables.
- At the physical level, files are stored on disk.
- The mappings are specified by developers, and because of the capabilities of DBMSs, no programs need to be written to support these levels.
What is the "Three-schema Architecture" for? It helps to identify user requirements in an efficient, correct way:
 - iterative process
 - systematic way
Levels of database schemas
Different schemas are presented to different users.
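One way to picture the three levels is with SQL views, sketched here with Python's built-in SQLite module. The `customer` table stands in for the conceptual/logical level, the two views play the role of external schemas, and the index is an example of a physical-level choice. All table, view, column, and index names are hypothetical, and treating views as external schemas is a common illustration rather than something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id  INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        address      TEXT NOT NULL,
        credit_limit REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Acme Interiors', '12 Oak St', 5000.0);

    CREATE VIEW shipping_view AS
        SELECT customer_id, name, address FROM customer;
    CREATE VIEW credit_view AS
        SELECT customer_id, name, credit_limit FROM customer;
    """
)


def visible_columns(connection, relation):
    """Return the column names a user of the given table or view can see."""
    cursor = connection.execute(f"SELECT * FROM {relation}")
    return [col[0] for col in cursor.description]


# External level: each user group sees only the subset it needs.
shipping_columns = visible_columns(conn, "shipping_view")
credit_columns = visible_columns(conn, "credit_view")

# Internal level: a physical change (adding an index) is invisible to the views.
rows_before = conn.execute("SELECT * FROM shipping_view").fetchall()
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")
rows_after = conn.execute("SELECT * FROM shipping_view").fetchall()

print(shipping_columns, credit_columns, rows_before == rows_after)
```

The shipping users never see `credit_limit`, and neither group has to change anything when the physical storage is tuned.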
Exercise/Homework
Homework:
(1) #17, p. 45: three views with E/R
    - Statement View
    - Deposit View
    - Conceptual View (an integrated view of the above two views)
    Hint: account/customer is one of the common entities
    (Draw with Visio/Word, turn in a hardcopy, due next class)
(2) Online Quiz #1 (due midnight Sunday)
FIGURE 1-15 (a): Preliminary data model for Home Office product line marketing support system
FIGURE 1-15: Project data model for Home Office product line marketing support system