Dr. Chen, Data Base Management. Chapter 10: Data Quality and Integration


Slide 1: Chapter 10: Data Quality and Integration
Dr. Chen, Data Base Management
Jason C. H. Chen, Ph.D., Professor of MIS
School of Business Administration, Gonzaga University, Spokane, WA

Slide 2: Objectives
- Define terms
- Describe importance and goals of data governance
- Describe importance and measures of data quality
- Define characteristics of quality data
- Describe reasons for poor data quality in organizations
- Describe a program for improving data quality
- Describe three types of data integration approaches
- Describe the purpose and role of master data management
- Describe four steps and activities of ETL for data integration for a data warehouse
- Explain various forms of data transformation for data warehouses

Slide 3: Data Governance
- Data governance: high-level organizational groups and processes overseeing data stewardship across the organization
- Data steward: a person responsible for ensuring that organizational applications properly support the organization's data quality goals

Slide 4: Requirements for Data Governance
- Sponsorship from both senior management and business units
- A data steward manager to support, train, and coordinate data stewards
- Data stewards for different business units, subjects, and/or source systems
- A governance committee to provide data management guidelines and standards

Slide 5: Importance of Data Quality
If the data are bad, the business fails. Period.
- GIGO: garbage in, garbage out
- Sarbanes-Oxley (SOX) compliance by law sets data and metadata quality standards
Purposes of data quality:
- Minimize IT project risk
- Make timely business decisions
- Ensure regulatory compliance
- Expand customer base

Slide 6: Characteristics of Quality Data
- Uniqueness
- Accuracy
- Consistency
- Completeness
- Timeliness
- Currency
- Conformance
- Referential integrity
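
Three of the characteristics listed on this slide (uniqueness, completeness, referential integrity) lend themselves to mechanical checks. The following is a minimal sketch in Python, assuming hypothetical customer and order rows; the field names and helper functions are illustrative and do not come from the slides.

```python
# Hypothetical sample rows; all field names are illustrative assumptions.
customers = [
    {"customer_id": 1, "name": "Ann", "state": "WA"},
    {"customer_id": 2, "name": "Bob", "state": None},
    {"customer_id": 2, "name": "Bob", "state": "ID"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
]


def duplicate_keys(rows, key):
    """Uniqueness: return the key values that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes


def completeness(rows, field):
    """Completeness: fraction of rows with a non-missing value in `field`."""
    filled = sum(1 for row in rows if row[field] not in (None, ""))
    return filled / len(rows)


def orphans(child_rows, fk, parent_rows, pk):
    """Referential integrity: child rows whose foreign key has no parent row."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]
```

Accuracy, timeliness, and currency usually need an external reference (the real-world value, or a clock), so they are left out of this sketch.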

Slide 7: Causes of Poor Data Quality
- External data sources: lack of control over data quality
- Redundant data storage and inconsistent metadata: proliferation of databases with uncontrolled redundancy and metadata
- Data entry: poor data capture controls
- Lack of organizational commitment: not recognizing poor data quality as an organizational issue

Slide 8: Steps in Data Quality Improvement
- Get business buy-in
- Perform data quality audit
- Establish data stewardship program
- Improve data capture processes
- Apply modern data management principles and technology
- Apply total quality management (TQM) practices

Slide 9: Business Buy-in
- Executive sponsorship
- Building a business case
- Prove a return on investment (ROI)
- Avoidance of cost
- Avoidance of opportunity loss

Slide 10: Data Quality Audit
- Statistically profile all data files
- Document the set of values for all fields
- Analyze data patterns (distribution, outliers, frequencies)
- Verify whether controls and business rules are enforced
- Use specialized data profiling tools
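
The profiling steps on this slide (value sets, frequencies, outliers) can be sketched with the Python standard library. This is an illustrative sketch, not a profiling tool: the rows, the field names, and the z-score threshold of 2.0 are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def profile_field(rows, field):
    """Document the set of values and their frequencies for one field."""
    values = [row.get(field) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "missing": len(values) - len(present),
        "distinct": sorted(set(present)),
        "frequencies": Counter(present),
    }


def outliers(numbers, z_threshold=2.0):
    """Flag values more than `z_threshold` standard deviations from the mean."""
    spread = pstdev(numbers)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    center = mean(numbers)
    return [n for n in numbers if abs(n - center) / spread > z_threshold]


# Hypothetical data file with one missing state and one suspicious quantity.
rows = [
    {"state": "WA", "qty": 10},
    {"state": "WA", "qty": 12},
    {"state": "ID", "qty": 11},
    {"state": "", "qty": 13},
    {"state": "WA", "qty": 12},
    {"state": "OR", "qty": 95},
]
state_profile = profile_field(rows, "state")
qty_outliers = outliers([row["qty"] for row in rows])
```

A z-score rule is only one possible outlier test; on small or skewed samples a quantile-based rule may behave better.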

Slide 11: Data Stewardship Program
Roles:
- Oversight of data stewardship program
- Manage data subject area
- Oversee data definitions
- Oversee production of data
- Oversee use of data
Report to: business unit vs. IT organization?

Slide 12: Improving Data Capture Processes
- Automate data entry as much as possible
- Manual data entry should be selected from preset options
- Use trained operators when possible
- Follow good user interface design principles
- Immediate data validation for entered data
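
Two of these points, preset options and immediate validation, can be illustrated with a small entry check. The sketch below is hypothetical: the allowed state list, the ZIP format, and the `validate_entry` function are assumptions chosen for the example.

```python
import re

# Preset options a data entry form might offer (hypothetical list).
ALLOWED_STATES = {"WA", "ID", "OR"}
# Assumed format: five digits, optionally followed by a ZIP+4 suffix.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def validate_entry(entry):
    """Return a list of problems found in one entered record (empty if valid)."""
    errors = []
    if not entry.get("name", "").strip():
        errors.append("name is required")
    if entry.get("state") not in ALLOWED_STATES:
        errors.append("state must be one of the preset options")
    if not ZIP_PATTERN.fullmatch(entry.get("zip", "")):
        errors.append("zip must be 5 digits or ZIP+4")
    return errors
```

Running the check at entry time, before the record is stored, is what makes the validation "immediate": the operator sees the error list while the source document is still at hand.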

Slide 13: Apply Modern Data Management Principles and Technology
Software tools for analyzing and correcting data quality problems:
- Pattern matching
- Fuzzy logic
- Expert systems
Sound data modeling and database design
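
As a rough illustration of the first two tool types, the sketch below uses a regular expression for pattern matching and approximate string matching from Python's `difflib` as a simple stand-in for fuzzy techniques. The reference city list, the 0.8 similarity cutoff, and both function names are assumptions made for the example.

```python
import difflib
import re

# Hypothetical reference list of valid city names.
REFERENCE_CITIES = ["Spokane", "Seattle", "Tacoma"]


def standardize_city(raw, cutoff=0.8):
    """Map a possibly misspelled city to its closest reference value, or None."""
    lookup = {city.lower(): city for city in REFERENCE_CITIES}
    matches = difflib.get_close_matches(
        raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


def normalize_phone(raw):
    """Pattern matching: strip non-digits and reformat a 10-digit number."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return None  # does not fit the expected pattern
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
```

A lower cutoff corrects more misspellings but also raises the risk of matching the wrong city, so the threshold is a tuning decision rather than a fixed value.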

Slide 14: TQM Principles and Practices
TQM: Total Quality Management
TQM principles:
- Defect prevention
- Continuous improvement
- Use of enterprise data standards
- Strong foundation of measurement
Balanced focus:
- Customer
- Product/Service

Slide 15: Master Data Management (MDM)
Disciplines, technologies, and methods to ensure the currency, meaning, and quality of reference data within and across various subject areas.
Three main architectures:
- Identity registry: master data remains in source systems; registry provides applications with location
- Integration hub: data changes broadcast through central service to subscribing databases
- Persistent: central "golden record" maintained; all applications have access. Requires applications to push data. Prone to data duplication.

Slide 16: Data Integration
Data integration creates a unified view of business data.
Other possibilities:
- Application integration
- Business process integration
- User interaction integration
Any approach requires changed data capture (CDC):
- Indicates which data have changed since previous data integration activity
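
One simple way to see what changed data capture has to report is to compare two snapshots of the same table. The sketch below assumes each snapshot is a dictionary keyed by primary key; the data and the `detect_changes` function are hypothetical, and production CDC more often reads database logs or timestamps than compares full snapshots.

```python
def detect_changes(previous, current):
    """Compare two snapshots keyed by primary key and classify the differences."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical customer table at the previous and the current integration run.
previous = {
    1: {"name": "Ann", "city": "Spokane"},
    2: {"name": "Bob", "city": "Boise"},
}
current = {
    1: {"name": "Ann", "city": "Seattle"},
    3: {"name": "Cy", "city": "Tacoma"},
}
changes = detect_changes(previous, current)
```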

Slide 17: Techniques for Data Integration
- Consolidation (ETL): consolidating all data into a centralized database (like a data warehouse)
- Data federation (EII): provides a virtual view of data without actually creating one centralized database
- Data propagation (EAI and EDR): duplicates data across databases, with near real-time delay


Slide 19: The Reconciled Data Layer
Typical operational data is:
- Transient: not historical
- Not normalized (perhaps due to denormalization for performance)
- Restricted in scope: not comprehensive
- Sometimes poor quality: inconsistencies and errors

Slide 20: The Reconciled Data Layer
After ETL, data should be:
- Detailed: not summarized yet
- Historical: periodic
- Normalized: 3rd normal form or higher
- Comprehensive: enterprise-wide perspective
- Timely: data should be current enough to assist decision-making
- Quality controlled: accurate with full integrity

Slide 21: The ETL Process
- Capture/Extract
- Scrub or data cleansing
- Transform
- Load and Index
ETL = Extract, transform, and load
- During initial load of Enterprise Data Warehouse (EDW)
- During subsequent periodic updates to EDW

Slide 22: Capture/Extract (Figure 10-1 Steps in data reconciliation)
Obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse.
- Static extract: capturing a snapshot of the source data at a point in time
- Incremental extract: capturing changes that have occurred since the last static extract
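
The difference between the two extract types can be sketched as follows, assuming each source row carries a last-modified timestamp. The `modified` field, the sample rows, and both functions are hypothetical.

```python
from datetime import datetime

# Hypothetical source table; `modified` is an assumed last-change timestamp.
source_rows = [
    {"id": 1, "amount": 50, "modified": datetime(2024, 1, 5)},
    {"id": 2, "amount": 75, "modified": datetime(2024, 2, 1)},
    {"id": 3, "amount": 20, "modified": datetime(2024, 2, 9)},
]


def static_extract(rows):
    """Static extract: copy every row as it stands at this point in time."""
    return [dict(row) for row in rows]


def incremental_extract(rows, last_extract_time):
    """Incremental extract: copy only rows changed since the last extract."""
    return [dict(row) for row in rows if row["modified"] > last_extract_time]


snapshot = static_extract(source_rows)
delta = incremental_extract(source_rows, datetime(2024, 1, 31))
```

A timestamp filter like this does not notice rows deleted from the source; capturing deletions needs a log or a snapshot comparison.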

Slide 23: Scrub/Cleanse (Figure 10-1 Steps in data reconciliation, cont.)
Uses pattern recognition and AI techniques to upgrade data quality.
- Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
- Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
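
A few of the listed fixes (reformatting, erroneous dates, missing data, duplicate data, error logging) can be shown with plain rules. The sketch below is rule-based rather than AI-based, and the field names, accepted date formats, and sample rows are assumptions.

```python
from datetime import datetime

# Assumed date formats used by the hypothetical source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return a date if `text` matches a known format and is a real date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def scrub(rows):
    """Reformat names and dates, drop duplicates, and log rejected rows."""
    clean, error_log, seen = [], [], set()
    for row in rows:
        name = " ".join(row.get("name", "").split()).title()
        order_date = parse_date(row.get("order_date", ""))
        if not name:
            error_log.append((row, "missing name"))
            continue
        if order_date is None:
            error_log.append((row, "erroneous date"))
            continue
        key = (name, order_date)
        if key in seen:
            error_log.append((row, "duplicate"))
            continue
        seen.add(key)
        clean.append({"name": name, "order_date": order_date.isoformat()})
    return clean, error_log


raw_rows = [
    {"name": "  ann   lee ", "order_date": "2024-02-30"},  # no such date
    {"name": "ann lee", "order_date": "02/03/2024"},
    {"name": "Ann Lee", "order_date": "2024-02-03"},  # same order, other format
    {"name": "", "order_date": "2024-02-03"},
]
clean_rows, rejected = scrub(raw_rows)
```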

Slide 24: Transform (Figure 10-1 Steps in data reconciliation, cont.)
Convert data from format of operational system to format of data warehouse.
Record-level:
- Selection: data partitioning
- Joining: data combining
- Aggregation: data summarization
Field-level:
- Single-field: from one field to one field
- Multi-field: from many fields to one, or one field to many

Slide 25: Load/Index (Figure 10-1 Steps in data reconciliation, cont.)
Place transformed data into the warehouse and create indexes.
- Refresh mode: bulk rewriting of target data at periodic intervals
- Update mode: only changes in source data are written to data warehouse
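
The two load modes can be contrasted in a small sketch that models the warehouse table as a dictionary keyed by `id`. The in-memory table, the sample rows, and the function names are hypothetical; a real load would go through the DBMS's bulk-load and indexing facilities.

```python
def refresh_load(source_rows):
    """Refresh mode: rebuild the whole target from a full extract."""
    return {row["id"]: dict(row) for row in source_rows}


def update_load(target, changed_rows):
    """Update mode: write only the changed rows into the existing target."""
    for row in changed_rows:
        target[row["id"]] = dict(row)
    return target


def build_index(target, field):
    """Index step: map each value of `field` to the keys of matching rows."""
    index = {}
    for key, row in target.items():
        index.setdefault(row[field], []).append(key)
    return index


# Initial full load, then a later run that carries only the changes.
warehouse = refresh_load([
    {"id": 1, "region": "West"},
    {"id": 2, "region": "East"},
])
update_load(warehouse, [
    {"id": 2, "region": "West"},
    {"id": 3, "region": "East"},
])
region_index = build_index(warehouse, "region")
```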

Slide 26: Record Level Transformation Functions
- Selection: the process of partitioning data according to predefined criteria
- Joining: the process of combining data from various sources into a single table or view
- Normalization: the process of decomposing relations with anomalies to produce smaller, well-structured relations
- Aggregation: the process of transforming data from detailed to summary level
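
Selection, joining, and aggregation can be sketched over a list of records. The order and customer data and the three helper functions below are hypothetical; normalization is a design-time decomposition and is not shown.

```python
from collections import defaultdict

# Hypothetical detailed order rows and a customer lookup from another source.
orders = [
    {"order_id": 1, "customer_id": 10, "region": "West", "amount": 100},
    {"order_id": 2, "customer_id": 11, "region": "East", "amount": 40},
    {"order_id": 3, "customer_id": 10, "region": "West", "amount": 60},
]
customer_names = {10: "Ann", 11: "Bob"}


def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predefined criterion."""
    return [row for row in rows if predicate(row)]


def join_customer(rows, names):
    """Joining: combine each order with the customer name from the other source."""
    return [
        {**row, "customer_name": names[row["customer_id"]]}
        for row in rows
        if row["customer_id"] in names
    ]


def aggregate_amount(rows, group_field):
    """Aggregation: summarize detailed amounts into one total per group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row["amount"]
    return dict(totals)


west_orders = select(orders, lambda row: row["region"] == "West")
joined = join_customer(orders, customer_names)
totals_by_region = aggregate_amount(orders, "region")
```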

Slide 27: Figure 10-2 Single-field transformation, a) Basic representation
In general, some transformation function translates data from old form to new form.

Slide 28: Figure 10-2 Single-field transformation (cont.), b) Algorithmic
Algorithmic transformation uses a formula or logical expression.
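
As an illustration of an algorithmic transformation, the sketch below converts a temperature field from Fahrenheit to Celsius with the standard formula. The choice of temperature and the field names `temp_f` and `temp_c` are assumptions, since the figure itself is not part of this transcript.

```python
def fahrenheit_to_celsius(temp_f):
    """Algorithmic transformation: apply the formula C = (F - 32) * 5 / 9."""
    return (temp_f - 32) * 5 / 9


def transform_record(record):
    """Replace the source field `temp_f` with the target field `temp_c`."""
    target = {k: v for k, v in record.items() if k != "temp_f"}
    target["temp_c"] = fahrenheit_to_celsius(record["temp_f"])
    return target
```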

Slide 29: Figure 10-2 Single-field transformation (cont.), c) Table lookup
Table lookup uses a separate table keyed by source record code.
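
A table-lookup transformation can be sketched with a dictionary standing in for the separate lookup table. The state codes, the field names, and the decision to return `None` for an unknown code are assumptions for the example.

```python
# Hypothetical lookup table keyed by the code stored in the source record.
STATE_LOOKUP = {"WA": "Washington", "ID": "Idaho", "OR": "Oregon"}


def lookup_state_name(record, lookup=STATE_LOOKUP):
    """Add a `state_name` field found by looking up the source `state_code`."""
    transformed = dict(record)
    transformed["state_name"] = lookup.get(record["state_code"])
    return transformed
```

Unlike an algorithmic transformation, the mapping here lives in data rather than in a formula, so new codes can be supported by adding rows to the lookup table.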

Slide 30: Figure 10-3 Multi-field transformation, a) Many sources to one target

Slide 31: Figure 10-3 Multi-field transformation (cont.), b) One source to many targets
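
Both directions of multi-field transformation can be sketched briefly. The examples are hypothetical, since the figures are not part of this transcript: two name fields are combined into one target field (many sources to one target), and one address string is split into city, state, and ZIP fields (one source to many targets).

```python
import re

# Assumed source layout: "City, ST 12345".
ADDRESS_PATTERN = re.compile(r"\s*(.+?),\s*([A-Z]{2})\s+(\d{5})\s*")


def combine_name(record):
    """Many sources to one target: first and last name become one full name."""
    return f"{record['first_name'].strip()} {record['last_name'].strip()}"


def split_address(address):
    """One source to many targets: split one string into three fields."""
    match = ADDRESS_PATTERN.fullmatch(address)
    if match is None:
        return None  # source value does not follow the assumed layout
    city, state, zip_code = match.groups()
    return {"city": city, "state": state, "zip": zip_code}
```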