Chapter 4: Databases and Data Warehouses


Chapter 4: Databases and Data Warehouses Every organization is awash with information resources of all kinds, and it takes considerable effort to bring together the people, technology, and processes needed to manage those resources effectively. This chapter explores the structure and quality of information, and how people organize, store, manipulate, and retrieve it. Copyright © 2015 Pearson Education, Inc.

Learning objectives
1. Information resources
2. Database advantages
3. Relational database
4. Master data management
5. Data warehouse
6. Information management

The material in this chapter will enable you to:
1. Explain the nature of information resources in terms of structure and quality, and show how metadata can be used to describe these resources.
2. Compare file processing systems to the database, explaining the database’s advantages.
3. Describe how a relational database is planned, accessed, and managed, and how the normalization process works.
4. Explain why multiple databases emerge, and how master data management helps address the challenge of integration.
5. Describe why a data warehouse is valuable for integration and strategic planning, and explain the three steps involved in creating one.
6. Explain how the human element and ownership issues affect information management.

Information resources Structured information Unstructured information Semi-structured information Metadata Every organization relies on structured information, usually considered to be facts and data. It is reasonably orderly, and can be broken down into component parts and organized into hierarchies. For example, your credit card company maintains your customer record in a structured format. It can be broken down into elements such as last name, first name, street address, phone number, and e-mail address. In contrast, unstructured information has no inherent structure or order, and the parts can’t be easily linked together. It is more difficult to break down, categorize, organize, and query. Consider a company involved in a touchy lawsuit. The information related to the lawsuit could include letters, e-mails, business cards, post-it notes attached to legal documents, meeting minutes, phone calls, voice mail messages, video recordings, progress reports, project timelines, resumes, and photos in file cabinets. Drawing information out of unstructured collections presents challenges. Between the extremes of structured and unstructured information, semi-structured information shows at least some structure. Metadata is data about data, and clarifies the nature of the information. For structured information, metadata describes the definitions for each field and table, and the relationships among them. For semi-structured and unstructured information, metadata describes properties of a document or other resource. Flickr, a popular photo sharing website with more than 4 billion queries per day, relies on metadata to search its enormous photo collection.

Quality of information
The most important characteristics that affect information quality are:
• Accuracy: Mistakes reduce the quality of information.
• Precision: Rounding might not reduce quality in some settings, but would be unacceptable in other settings.
• Completeness: While omitting the zip code on the customer’s address record might not be a problem because the zip can be determined by the address, leaving off the house number would delay the order.
• Consistency: Reports that show what appears to be the same information may conflict because the people generating the reports are using slightly different definitions.
• Timeliness: Outdated information has less value than up-to-date information, and thus is lower quality unless you are specifically looking for historical trends.
• Bias: Information that is biased lacks objectivity, a feature that reduces its value and quality.
• Duplication: Information can be redundant, resulting in misleading and exaggerated summaries.

Managing information Record Field Table The record is a means to represent an entity, which might be a person, a product, a purchase order, or some other “thing” that has meaning to people. The record is made up of attributes of that thing, and each of the attributes is called a field. For a record intended to represent an event, common fields are event name, start date, and end date. Fields can contain numeric data or text, or a combination of the two. Each field should have a data definition that specifies its characteristics, such as the type of data it will hold or the maximum number of characters it can contain. For example, in an employee record the fields might include the person’s first name, last name, birth date, employee number, gender, e-mail address, and office phone. A group of records, with one record for each employee of the company, would be logically organized into a table, so that each unique employee is a row and the fields are the table’s columns.
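As a rough illustration of records, fields, and data definitions, the sketch below builds a small employee table with Python's built-in sqlite3 module. The table name, column names, sample values, and length limits are all assumptions made for this example (only a subset of the fields listed above is included); they are not taken from the slides.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Each column is a field; its type and CHECK rule act as a simple data definition.
# All names and limits here are hypothetical.
conn.execute(
    """
    CREATE TABLE employee (
        employee_number INTEGER PRIMARY KEY,
        first_name      TEXT NOT NULL,
        last_name       TEXT NOT NULL,
        birth_date      TEXT,
        email           TEXT CHECK (length(email) <= 254),
        office_phone    TEXT CHECK (length(office_phone) <= 20)
    )
    """
)

# Each row is one record representing one employee.
conn.execute(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)",
    (1001, "Ada", "Lopez", "1990-04-17", "ada.lopez@example.com", "555-0100"),
)

record = conn.execute("SELECT * FROM employee").fetchone()
# PRAGMA table_info returns one row per column; index 1 holds the column name.
field_names = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
```

Trying to insert an office phone longer than 20 characters would violate the CHECK rule and raise `sqlite3.IntegrityError`, which is one way a data definition can be enforced by the software rather than by convention.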

File processing systems Redundancy and inconsistency Lack of integration Inconsistent definitions Dependence Initially, electronic records were created and stored as computer files, and programmers wrote computer programs to add, delete, or edit the records. Each department maintained its own records with its own computer files, each containing information that was required for operations. However, it didn’t take long for problems to surface as other offices began to develop their own file processing systems. Data redundancy and inconsistency. Because each set of computer programs operated on its own records, much information was redundant and inconsistent. Lack of data integration. Integrating data from the separate systems was a struggle. Inconsistent data definitions. When data definitions are inconsistent, the meaning of different fields will vary across departments and summaries will be misleading. Data dependence. Maintenance was difficult because the programs and their files were so interconnected and dependent on one another. The disadvantages of the file processing approach led to a better way of organizing structured data, one that relies on the database.

Databases
The foundation of today’s information management relies on the database and the software that manages it. The database is an integrated collection of information that is logically related and stored in such a way as to minimize duplication and facilitate rapid retrieval. Its major advantages over file processing systems include:
• Reduced redundancy and inconsistency
• Improved information integrity and accuracy
• Improved ability to adapt to changes
• Improved performance and scalability
• Increased security
Database management software (DBMS) is the software that manages the database, and provides tools for ensuring security, replication, retrieval, and other administrative and housekeeping tasks. The DBMS serves as a gateway to the database and as a manager to handle creation, performance tuning, transaction processing, general maintenance, access rights, deletion, and backups.

Database architecture
To be most useful, a database must handle three types of relationships with a minimum of redundancy:
• One to one (1:1)
• One to many (1:M)
• Many to many (M:M)
The one-to-one relationship (1:1) is relatively easy to accommodate. For example, each person has one and only one birth date. The one-to-many relationship (1:M) between records is more challenging. A person might have one or more e-mail addresses, for example, or one or more employees reporting to him or her. The many-to-many relationship (M:M) is also more complicated. This might involve a situation in which a person might be working on any number of projects, each of which can have any number of employees assigned to it.

Relational database Tables of records Link field in one table to field in another table Separates data from paths to retrieve data E. F. Codd, a British mathematician working at IBM, invented the relational database, which organizes information into tables of records that are related to one another by linking a field in one table to a field in another table with matching data. The approach separates the data from the paths to retrieve them, thus making the database less dependent on the hardware and its particular operating system. His invention eventually came to dominate the field. Consider the two tables about students in this slide. The first table shows the student ID, last name, first name, and birth date. The second table shows student registrations with fields that display student ID, class code, and grade. Note that student ID is also included in this table, which makes it possible to link records in the two tables together.
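The two student tables described above can be sketched with Python's sqlite3 module to show how a shared field links records. The field lists follow the description (student ID, names, birth date; student ID, class code, grade), but the table names, column names, and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables that share the student_id field; all data below is made up.
conn.executescript(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        last_name  TEXT,
        first_name TEXT,
        birth_date TEXT
    );
    CREATE TABLE registrations (
        student_id INTEGER,
        class_code TEXT,
        grade      TEXT
    );
    INSERT INTO students VALUES
        (1, 'Chen', 'Mei', '2001-03-09'),
        (2, 'Okafor', 'Tunde', '2000-11-21');
    INSERT INTO registrations VALUES
        (1, 'BUS101', 'A'),
        (1, 'CS120', 'B'),
        (2, 'BUS101', 'B');
    """
)

# Link the tables by matching student_id values in both.
rows = conn.execute(
    """
    SELECT s.last_name, s.first_name, r.class_code, r.grade
    FROM students AS s
    JOIN registrations AS r ON r.student_id = s.student_id
    ORDER BY s.student_id, r.class_code
    """
).fetchall()
```

The query never says where or how the rows are stored; it only names the matching fields, which is the separation of data from retrieval paths that the relational approach provides.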

Data model (1 of 2) Entities and attributes Primary key Normalization The data model for a relational database includes entities and attributes, primary keys and uniqueness, and normalization. Each entity represented in the model will have attributes, or fields, that describe the entity. For example, employees is a straightforward entity with attributes such as personnel number, last name, first name, birth date, e-mail address, and phone number. The entity and its attributes (or fields) will become a record, and a collection of records will become a table. Each record in a table must have one primary key, which is a field or group of fields that makes the record unique in that table. Normalization is the process of refining entities and their relationships, and helps minimize the duplication of information in the tables. For example, in the employee table, one goal of normalization is to make each attribute functionally dependent on the employee ID number, which uniquely identifies each employee. Functional dependence means that for each value of employee ID, there is exactly one value for each of the attributes included in the record, and that the employee ID determines that value. There would be just one employee e-mail address, one birth date, one last name, one first name, and one department.
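Functional dependence can be checked mechanically on sample data. The helper below is a minimal sketch, not part of any database product: the function name, the dictionary-based rows, and the sample employees are all assumptions chosen for this example.

```python
def functionally_dependent(rows, key, attribute):
    """Return True if every value of `key` maps to exactly one value of `attribute`."""
    seen = {}
    for row in rows:
        key_value, attr_value = row[key], row[attribute]
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key_value, attr_value) != attr_value:
            return False
    return True


# Hypothetical, not-yet-normalized data: employee 1 appears with two e-mail addresses.
employee_rows = [
    {"employee_id": 1, "last_name": "Lopez", "email": "ada@example.com"},
    {"employee_id": 1, "last_name": "Lopez", "email": "a.lopez@example.com"},
    {"employee_id": 2, "last_name": "Singh", "email": "raj@example.com"},
]

last_name_ok = functionally_dependent(employee_rows, "employee_id", "last_name")
email_ok = functionally_dependent(employee_rows, "employee_id", "email")
```

Here `last_name_ok` is true and `email_ok` is false, which signals that e-mail addresses are a one-to-many attribute in this sample and would need their own table (or a one-address rule) before the employee table meets the goal described above.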

Data model (2 of 2) Relationships and foreign keys Complex relationships When a primary key appears as an attribute in a different table, it is called a foreign key. A foreign key can be used to link records across tables. Normalization uncovers one-to-many and many-to-many relationships. For example, for a company in which each employee can work on multiple projects (many-to-many relationship), we can normalize the relationship between employees and projects. We would have one table with employees that uses EmployeeID as the primary key, a second table with projects that uses ProjectID as its primary key, and a third table that links employees with projects and contains EmployeeID and ProjectID.
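The three-table design for employees and projects can be sketched as follows. EmployeeID and ProjectID come from the description above; the linking table's name (`assignments`), the other columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript(
    """
    CREATE TABLE employees (
        EmployeeID INTEGER PRIMARY KEY,
        LastName   TEXT NOT NULL
    );
    CREATE TABLE projects (
        ProjectID INTEGER PRIMARY KEY,
        Title     TEXT NOT NULL
    );
    -- Linking table: each row pairs one employee with one project.
    -- Both columns are foreign keys, and together they form the primary key.
    CREATE TABLE assignments (
        EmployeeID INTEGER NOT NULL REFERENCES employees(EmployeeID),
        ProjectID  INTEGER NOT NULL REFERENCES projects(ProjectID),
        PRIMARY KEY (EmployeeID, ProjectID)
    );
    INSERT INTO employees VALUES (1, 'Lopez'), (2, 'Singh');
    INSERT INTO projects VALUES (10, 'Website redesign'), (20, 'Payroll upgrade');
    INSERT INTO assignments VALUES (1, 10), (1, 20), (2, 10);
    """
)

# One employee, many projects.
projects_for_lopez = [
    title
    for (title,) in conn.execute(
        """
        SELECT p.Title
        FROM projects AS p
        JOIN assignments AS a ON a.ProjectID = p.ProjectID
        WHERE a.EmployeeID = ?
        ORDER BY p.ProjectID
        """,
        (1,),
    )
]

# One project, many employees.
staff_on_redesign = [
    name
    for (name,) in conn.execute(
        """
        SELECT e.LastName
        FROM employees AS e
        JOIN assignments AS a ON a.EmployeeID = e.EmployeeID
        WHERE a.ProjectID = ?
        ORDER BY e.EmployeeID
        """,
        (10,),
    )
]
```

Because each employee and each project is stored once, adding another assignment means adding one short row to the linking table rather than repeating names or titles.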

Retrieving information Structured query language (SQL) Interactive voice response (IVR) Natural language interfaces Most people access the database through an application interface with user-friendly web-based forms to securely enter, edit, delete, and retrieve data. Web-based forms also make it easy to let customers and suppliers access the database, with appropriate security controls. Application software can be created in many different development environments and programming languages, and DBMS software vendors include their own tools for creating applications. As the front end or gateway, the application software performs a number of duties in addition to allowing users to enter, edit, or retrieve information. Although the application software can be developed in a number of programming languages, it usually communicates with the database itself through a query language, and structured query language (SQL) is the most popular. SQL uses queries to retrieve records, insert and edit data, and delete records. Other platforms for application software include interactive voice response (IVR), which uses signals transmitted via phone to access the database, retrieve account information, and enter data. Natural language queries that respond to spoken or typed language are also improving, especially for narrower domains and databases with limited vocabularies.
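The four kinds of SQL operations mentioned above (insert, edit, delete, retrieve) can be sketched with Python acting as the application front end. The `customers` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# Insert: add new records. The ? placeholders keep user input separate from the SQL text.
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Avery", "Boston"), ("Blake", "Denver"), ("Casey", "Boston")],
)

# Edit: change a field in an existing record.
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Austin", "Blake"))

# Delete: remove a record.
conn.execute("DELETE FROM customers WHERE name = ?", ("Casey",))

# Retrieve: query the records that remain.
remaining = conn.execute(
    "SELECT name, city FROM customers ORDER BY name"
).fetchall()
```

A web form would typically issue statements like these behind the scenes, so the person entering data never sees the SQL.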

Managing the database Performance tuning and scalability Integrity, security, and recovery Documentation The database needs tuning for optimal performance, and the tuning process takes into account the way the end users access the data. Optimizing performance for speedy retrieval of information may require slowing down other tasks such as data entry or editing. Scalability refers to a system’s ability to handle rapidly increasing demand, and this is another performance issue. The database administrator (DBA) manages the rules that help ensure the integrity of the data. The software can enforce many different rules, such as the referential integrity constraint, which ensures that every foreign key entry actually exists as a primary key entry in its main table. The DBMS provides tools to handle access control and security, such as password protection, user authentication, and access rights. The data model can be documented using a database schema, which graphically shows the tables, attributes, keys, and logical relationships. The data dictionary should contain the details of each field, including descriptions written in language users can easily understand in the context of the business. These details are sometimes omitted when developers rush to implement a project, but the effort pays off later.
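To make the referential integrity constraint concrete, the sketch below asks SQLite to reject an employee row whose department does not exist. The table names, columns, and values are assumptions; note that SQLite in particular only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign key enforcement must be switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        last_name   TEXT,
        dept_id     INTEGER REFERENCES departments(dept_id)
    )
    """
)

conn.execute("INSERT INTO departments VALUES (1, 'Finance')")
conn.execute("INSERT INTO employees VALUES (100, 'Lopez', 1)")  # department 1 exists

try:
    # Department 99 has no primary key entry, so this foreign key value is invalid.
    conn.execute("INSERT INTO employees VALUES (101, 'Singh', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```

The invalid row never reaches the table, so `employee_count` stays at one. The rule lives in the database rather than in each application, which is why the DBA can manage it centrally.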

Multiple databases Integration challenges Shadow systems Master data management Data stewards As organizations grow, the same disadvantages that troubled file processing systems creep back in because the number of databases used to handle operations multiplies. With multiple databases, software applications, and inconsistent data definitions for similar entities, integration is a major challenge. Mergers and acquisitions contribute to the complexity, leading to many different databases operating under the same corporate umbrella. Shadow systems are smaller databases developed by individuals or departments, because people see advantages to managing their own information resources, especially the ability to control their data and make changes rapidly. They are not managed by central IT staff, who may not even know they exist. Organizations approach integration challenges in several ways. One approach that addresses the underlying inconsistencies in the way people use data is master data management (MDM), which attempts to achieve consistent and uniform definitions for entities and attributes across all business units. Teams from different business units participate in master data management to negotiate and resolve differences. Data stewards may be assigned to ensure people adhere to the definitions for the master data in their organizational units. MDM requires constant coordination among organizational units that are building, buying, or modifying databases.

Data warehouses Building data warehouses Extract, transform, and load (ETL) Data mining Another integration strategy involves data warehouses. The data warehouse is a central data repository containing information drawn from multiple sources that can be used for analysis, intelligence gathering, and strategic planning. The process used to build a data warehouse is called extract, transform, and load (ETL). The first step is to extract data from its home database, and the second is to transform and cleanse it so that it adheres to common data definitions. After transformation, the data is loaded into the data warehouse, typically another database. At frequent intervals, the load process repeats to keep the warehouse up to date. The data warehouse makes data easily accessible for strategic planning, and opens up a wealth of opportunities for managers seeking insights about their markets, customers, industry, and more. It often becomes the main source of business intelligence that managers tap to understand their customers and markets, and make strategic plans. For example, data mining is a type of intelligence gathering that uses statistical techniques to explore records in a data warehouse, hunting for hidden patterns and relationships that are undetectable in routine reports. The ability to draw high-quality information from an organization’s databases to spot trends, identify patterns, generate reports for compliance purposes, do research, and plan strategy is critical for any organization.
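The three ETL steps can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the two source systems, their field names and date formats, the fixed exchange rate, and the `sales_fact` warehouse table.

```python
import sqlite3

# Two hypothetical source systems that describe the same kind of sale differently.
sales_us = [{"cust": "Avery Ltd", "amount_usd": 1200.0, "date": "03/15/2015"}]
sales_uk = [{"customer_name": "BLAKE PLC", "amount_gbp": 800.0, "date": "2015-03-16"}]
GBP_TO_USD = 1.5  # assumed fixed rate, for illustration only


def extract():
    """Extract: pull raw rows from each home system, tagged with their source."""
    return [("us", row) for row in sales_us] + [("uk", row) for row in sales_uk]


def transform(source, row):
    """Transform: map each row onto one common definition (name, USD amount, ISO date)."""
    if source == "us":
        month, day, year = row["date"].split("/")
        return (row["cust"].title(), row["amount_usd"], f"{year}-{month}-{day}")
    return (row["customer_name"].title(), row["amount_gbp"] * GBP_TO_USD, row["date"])


def load(conn, rows):
    """Load: write the cleansed rows into the warehouse table."""
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (customer TEXT, amount_usd REAL, sale_date TEXT)"
)
load(warehouse, [transform(source, row) for source, row in extract()])

loaded = warehouse.execute(
    "SELECT customer, amount_usd, sale_date FROM sales_fact ORDER BY sale_date"
).fetchall()
total_usd = warehouse.execute("SELECT SUM(amount_usd) FROM sales_fact").fetchone()[0]
```

Once both sources share one currency and one date format, a single query such as the total above can summarize them together, which is the kind of enterprise-wide view the warehouse is meant to support.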

Human element Ownership issues Databases without boundaries Stakeholders Managing information resources is not just about managing technology. It is also about people and processes, and understanding how people view, guard, and share information resources is a critical ingredient for any successful strategy. Although a company may set the policy that all information resources are company-owned, in practice people often view these resources more protectively, even when compliance and security don’t demand tight access controls. Ownership issues have to be negotiated among many stakeholders. Another challenge is the time required to make changes to an integrated enterprise database because those changes affect many more people who will want to have input. Another example of how the human element interacts with database management involves databases without boundaries, in which people outside the enterprise enter and manage most of the records. These contributors feel strong ownership over their records. A valuable lesson from the efforts to build databases without boundaries is simply the need to plan for scalability. Meeting the needs of all stakeholders, including management, operating units, and customers, is a balancing act that requires leadership, compromise, negotiation, and well-designed databases.

Summary 1. Information resources 2. Database advantages 3. Relational database 4. Master data management 5. Data warehouse 6. Information management Information resources can be described as structured, unstructured, or semi-structured. Metadata provides details about information structure and properties. Information quality is affected by several characteristics, such as accuracy, completeness, and timeliness. The database approach creates a shared resource with minimal redundancy. In a relational database, information is organized into tables in which each row represents a record. Relationships between tables are created by linking a field in one table to a field in another table with matching data. Primary keys ensure that each record in a table is unique, and foreign keys help establish relationships among tables. Integration strategies, such as master data management, are needed to provide enterprise-wide summaries for strategic planning. The data warehouse draws information from multiple sources to create one information storehouse that can be used for reporting and analysis. Enterprise information management is not just about technology. It involves a variety of challenges that touch on the human element. Leadership, cooperation, negotiation, and a well-designed database are all needed to balance all stakeholder requirements.

UK police case Video surveillance Automatic number plate recognition Database Queries and data mining Privacy Almost every city street in London is under constant video surveillance. The cameras are equipped with automatic number plate recognition (ANPR) capabilities, which use optical character recognition to decipher license plate numbers and letters. Camera data is sent to the national ANPR Data Centre in north London, which also houses the Police National Computer. Police on the beat can query it to see whether a nearby vehicle is flagged for some reason. Crosschecking the license plate information against the crime database can turn up vehicles involved in crimes, or registered to wanted criminals. Beyond criminal activity, the police database contains extensive information linked to the license plate data. For instance, a query might show that a car is registered to someone who owes parking fines, or who is uninsured. The data is maintained for five years, creating a rich repository for data mining. Privacy advocates, however, are concerned about the mounting power of integrated databases and surveillance technologies to scrutinize human behavior. The United Kingdom is tightening regulations to provide better protections for citizens in an attempt to balance privacy concerns against the enormous value these databases offer to law enforcement.

Colgate-Palmolive case $15 billion sales, 70 countries Consistency in products and data Colgate Business Planning (CBP): profit, loss, and ROI by product, region, and retailer Reinvested $100 million in most profitable promotions, goal $300 million With over $15 billion in annual sales, Colgate-Palmolive’s global operations span more than 70 countries. Managing this sprawling global empire requires a dedication to consistency, not just in the products themselves, but in the data the company uses to track every aspect of its operations and performance. Colgate’s integrated backend database and enterprise software, supplied by SAP, support a consistent approach to master data management. The database supports the Colgate Business Planning (CBP) initiative, which guides Colgate’s investment decisions around the world. CBP, combined with the integrated master database, allows Colgate management to measure actual profit, loss, and return on investment for individual products, regions, and retailers, providing a very clear window into how much any investment contributed to the company’s profit. Corporate headquarters uses these finely tuned results to plan new investments. Based on CBP, Colgate reinvested $100 million in promotions found to be more profitable, and its long-term goal is $300 million, a sum that could be reinvested in promotions or added to Colgate’s bottom line.
