Lecture 12: Data Quality and Integration

Modern Database Management, 9th Edition
Jeffrey A. Hoffer, Mary B. Prescott, Heikki Topi
© 2009 Pearson Education, Inc. Publishing as Prentice Hall

Importance of Data Quality
- Minimize IT project risk
- Make timely business decisions
- Ensure regulatory compliance
- Expand customer base

Characteristics of Quality Data
- Uniqueness
- Accuracy
- Consistency
- Completeness
- Timeliness
- Currency
- Conformance
- Referential integrity
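Several of these characteristics can be checked mechanically. The sketch below, under the assumption that data sits in small in-memory tables, tests four of them (uniqueness, completeness, conformance, referential integrity). The records, field names, and the set of allowed state codes are hypothetical and only for illustration.

```python
from collections import Counter

# Hypothetical customer and order records, for illustration only.
customers = [
    {"customer_id": 1, "name": "Ada Lopez", "state": "WA"},
    {"customer_id": 2, "name": "Ben Ortiz", "state": "wash"},
    {"customer_id": 2, "name": "Ben Ortiz", "state": "WA"},
    {"customer_id": 3, "name": None, "state": "OR"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 4},
]

VALID_STATES = {"WA", "OR", "ID"}  # assumed domain for the conformance check


def duplicate_keys(rows, key):
    """Uniqueness: key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def missing_values(rows, field):
    """Completeness: rows whose field is empty."""
    return [row for row in rows if row.get(field) in (None, "")]


def nonconforming(rows, field, domain):
    """Conformance: values that fall outside the allowed domain."""
    return [row[field] for row in rows if row[field] not in domain]


def orphans(child_rows, fk, parent_rows, pk):
    """Referential integrity: child rows whose foreign key has no parent."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


print(duplicate_keys(customers, "customer_id"))  # [2]
print(len(missing_values(customers, "name")))  # 1
print(nonconforming(customers, "state", VALID_STATES))  # ['wash']
print([o["order_id"] for o in orphans(orders, "customer_id", customers, "customer_id")])  # [11]
```

Characteristics such as accuracy and currency usually need an outside reference (a trusted source or a date of last verification), so they are harder to check with rules alone.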

Causes of Poor Data Quality
- External data sources
  - Lack of control over data quality
- Redundant data storage and inconsistent metadata
  - Proliferation of databases with uncontrolled redundancy and metadata
- Data entry
  - Poor data capture controls
- Lack of organizational commitment
  - Failure to recognize poor data quality as an organizational issue

Data Quality Improvement
- Perform a data quality audit
- Improve data capture processes
- Establish a data stewardship program
- Apply total quality management (TQM) practices
- Apply modern DBMS technology
- Estimate return on investment
- Start with a high-quality data model

Improving Data Capture Processes
- Automate data entry as much as possible
- For manual data entry, select values from preset options
- Use trained operators when possible
- Follow good user interface design principles
- Validate entered data immediately
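Two of these practices, preset options and immediate validation, can be combined in the entry layer. The following is a minimal sketch assuming an order-entry form; the field names, the `SHIPPING_METHODS` picklist, and the rules are hypothetical. The record is checked as soon as it is entered, and errors are returned to the user instead of being stored.

```python
from datetime import date

# Preset options for a manual-entry field (hypothetical picklist).
SHIPPING_METHODS = ("GROUND", "AIR", "FREIGHT")


def validate_order_entry(entry: dict) -> list[str]:
    """Return validation errors for one entered record; an empty list means accept."""
    errors = []
    # Preset options: only values from the picklist are accepted.
    if entry.get("shipping_method") not in SHIPPING_METHODS:
        errors.append("shipping_method must be one of " + ", ".join(SHIPPING_METHODS))
    # Range check on a numeric field.
    qty = entry.get("quantity")
    if not isinstance(qty, int) or qty <= 0:
        errors.append("quantity must be a positive integer")
    # Format and plausibility checks on a date field.
    try:
        order_date = date.fromisoformat(entry.get("order_date", ""))
    except (TypeError, ValueError):
        errors.append("order_date must be an ISO date (YYYY-MM-DD)")
    else:
        if order_date > date.today():
            errors.append("order_date cannot be in the future")
    return errors


print(validate_order_entry({"shipping_method": "AIR", "quantity": 2, "order_date": "2024-01-15"}))  # []
for message in validate_order_entry({"shipping_method": "boat", "quantity": 0, "order_date": "15/01/2024"}):
    print(message)
```

Rejecting bad values at the point of capture is generally cheaper than cleansing them later in the ETL process.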

Data Stewardship Program
- Data steward: a person responsible for ensuring that organizational applications properly support the organization’s data quality goals
- Data governance: high-level organizational groups and processes overseeing data stewardship across the organization

Principles for High-Quality Data Models
- Entity types represent the underlying nature of an object
- Entity types are part of a subtype/supertype hierarchy that provides universal context
- Activities and associations are represented by (event) entity types, not by relationships
- Relationships are used only to represent the involvement of entity types with activities or associations
- Candidate attributes should be suspected of representing relationships to other entity types
- Entity types should have a single attribute as the primary unique identifier

Figure 12-1 Example of a many-to-many relationship as an entity type
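The figure itself is not reproduced in this transcript. As a hypothetical illustration of the same principle, the sketch below uses Python's built-in `sqlite3` module to model a many-to-many association between employees and courses as its own entity type, `certificate`, with a single-attribute primary key and an attribute of its own. All table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    employee_name TEXT NOT NULL
);
CREATE TABLE course (
    course_id    INTEGER PRIMARY KEY,
    course_title TEXT NOT NULL
);
-- The association is an entity type with its own single-attribute identifier,
-- rather than a bare pair of foreign keys.
CREATE TABLE certificate (
    certificate_no INTEGER PRIMARY KEY,
    employee_id    INTEGER NOT NULL REFERENCES employee(employee_id),
    course_id      INTEGER NOT NULL REFERENCES course(course_id),
    date_completed TEXT NOT NULL
);
""")
conn.execute("INSERT INTO employee VALUES (1, 'Ada Lopez')")
conn.execute("INSERT INTO course VALUES (100, 'SQL Basics')")
conn.execute("INSERT INTO certificate VALUES (5001, 1, 100, '2024-03-01')")
# The same employee can complete the course again; each completion is a distinct instance.
conn.execute("INSERT INTO certificate VALUES (5002, 1, 100, '2025-03-01')")

count = conn.execute(
    "SELECT COUNT(*) FROM certificate WHERE employee_id = 1 AND course_id = 100"
).fetchone()[0]
print(count)  # 2

# Referential integrity: a certificate for a nonexistent employee is rejected.
try:
    conn.execute("INSERT INTO certificate VALUES (5003, 99, 100, '2024-01-01')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Giving the association its own identifier lets it carry attributes and repeat over time, which a composite key of the two foreign keys alone would not allow.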

Data Integration
- Data integration creates a unified view of business data
- Other possibilities:
  - Application integration
  - Business process integration
  - User interaction integration
- Any approach requires changed data capture (CDC)
  - Indicates which data have changed since the previous data integration activity
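CDC can be implemented in several ways, for example by reading database logs, comparing timestamps, using triggers, or comparing snapshots; the slide does not say which. The sketch below uses snapshot comparison, assuming each snapshot is a dictionary keyed by primary key. The function name `changed_data` and the sample rows are hypothetical.

```python
def changed_data(previous: dict, current: dict) -> dict:
    """Compare two keyed snapshots and classify every change since the last run."""
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    updated = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical snapshots taken at two successive integration runs.
before = {1: ("Ada", "WA"), 2: ("Ben", "OR")}
after = {1: ("Ada", "ID"), 3: ("Cy", "WA")}

changes = changed_data(before, after)
print(changes["inserted"])  # {3: ('Cy', 'WA')}
print(changes["updated"])   # {1: ('Ada', 'ID')}
print(changes["deleted"])   # {2: ('Ben', 'OR')}
```

Snapshot comparison is simple but reads the full source each time; log-based approaches avoid that cost at the price of depending on DBMS-specific log formats.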

Techniques for Data Integration
- Consolidation (ETL: extract, transform, load)
  - Consolidates all data into a centralized database (like a data warehouse)
- Data federation (EII: enterprise information integration)
  - Provides a virtual view of data without actually creating one centralized database
- Data propagation (EAI: enterprise application integration, and EDR: enterprise data replication)
  - Duplicates data across databases, with near real-time delay
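The three techniques differ mainly in when data moves and where the combined result lives. The toy sketch below contrasts them over two hypothetical in-memory sources (`crm` and `billing`); the function names are illustrative and do not correspond to any ETL, EII, or EAI product.

```python
# Two hypothetical source systems holding customer data in different shapes.
crm = {"C1": {"name": "Ada Lopez"}, "C2": {"name": "Ben Ortiz"}}
billing = [("C1", 120.0), ("C2", 75.5), ("C1", 30.0)]


def consolidate(crm, billing):
    """Consolidation: copy and combine the sources into one central store in advance."""
    store = {cid: {"name": a["name"], "total_billed": 0.0} for cid, a in crm.items()}
    for cid, amount in billing:
        store[cid]["total_billed"] += amount
    return store


def federated_lookup(cid, crm, billing):
    """Federation: answer a query by reading each source at query time;
    no combined copy is stored."""
    total = sum(amount for c, amount in billing if c == cid)
    return {"name": crm[cid]["name"], "total_billed": total}


def propagate(change, replica):
    """Propagation: apply each source change to a duplicate copy as it occurs."""
    cid, field, value = change
    replica.setdefault(cid, {})[field] = value


warehouse = consolidate(crm, billing)
print(warehouse["C1"])  # {'name': 'Ada Lopez', 'total_billed': 150.0}
print(federated_lookup("C2", crm, billing))  # {'name': 'Ben Ortiz', 'total_billed': 75.5}

replica = {}
propagate(("C1", "name", "Ada L. Lopez"), replica)
print(replica)  # {'C1': {'name': 'Ada L. Lopez'}}
```

Consolidation pays the integration cost once per load, federation pays it on every query, and propagation pays it on every change.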

Table 12-3 Comparison of Consolidation, Federation, and Propagation Forms of Data Integration

Master Data Management (MDM)
- The disciplines, technologies, and methods used to ensure the currency, meaning, and quality of reference data within and across various subject areas
- Three main approaches:
  - Identity registry
  - Integration hub
  - Persistent
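In the identity registry approach, master data stays in the source systems and a central registry records which source records refer to the same real-world entity. The sketch below shows only that bookkeeping. Record matching is assumed to happen elsewhere and is passed in through a hypothetical `match_of` argument; the class and system names are also hypothetical.

```python
class IdentityRegistry:
    """Maps (source system, local id) pairs to a shared master id.
    The registry stores no attribute data; that stays in the sources."""

    def __init__(self):
        self._next_id = 1
        self._by_source = {}  # (system, local_id) -> master_id
        self._members = {}    # master_id -> set of (system, local_id)

    def register(self, system, local_id, match_of=None):
        """Register a source record; match_of names an already-registered
        (system, local_id) that a separate matching step judged identical."""
        key = (system, local_id)
        if key in self._by_source:
            return self._by_source[key]
        if match_of is not None:
            master_id = self._by_source[match_of]
        else:
            master_id = self._next_id
            self._next_id += 1
        self._by_source[key] = master_id
        self._members.setdefault(master_id, set()).add(key)
        return master_id

    def sources_for(self, master_id):
        """Where applications should look for this entity's data."""
        return sorted(self._members.get(master_id, ()))


reg = IdentityRegistry()
m = reg.register("CRM", "C-17")
reg.register("BILLING", "88201", match_of=("CRM", "C-17"))
print(m, reg.sources_for(m))  # 1 [('BILLING', '88201'), ('CRM', 'C-17')]
```

An integration hub or persistent approach would instead keep (or broadcast) the attribute values themselves, not only the cross-reference.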

The Reconciled Data Layer
- Typical operational data is:
  - Transient–not historical
  - Not normalized (perhaps due to denormalization for performance)
  - Restricted in scope–not comprehensive
  - Sometimes poor quality–inconsistencies and errors
- After ETL, data should be:
  - Detailed–not summarized yet
  - Historical–periodic
  - Normalized–3rd normal form or higher
  - Comprehensive–enterprise-wide perspective
  - Timely–data should be current enough to assist decision-making
  - Quality controlled–accurate with full integrity

The ETL Process
- Capture/Extract
- Scrub or data cleansing
- Transform
- Load and Index
ETL = Extract, transform, and load
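The four steps can be chained as one small pipeline. This is a toy sketch assuming a CSV extract and an in-memory SQLite table standing in for the warehouse; the data, column names, and table name `customer_dim` are hypothetical, and each step is deliberately minimal.

```python
import csv
import io
import sqlite3

# Hypothetical operational extract: CSV text with messy values.
SOURCE_CSV = """cust_id,name,state
1, Ada Lopez ,wa
2,Ben Ortiz,OR
2,Ben Ortiz,OR
"""


def extract(text):
    """Capture/Extract: read the chosen subset of source data."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: trim whitespace, standardize codes, drop exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        fixed = {k: v.strip() for k, v in row.items()}
        fixed["state"] = fixed["state"].upper()
        key = tuple(fixed.items())
        if key not in seen:
            seen.add(key)
            clean.append(fixed)
    return clean


def transform(rows):
    """Transform: convert to the warehouse's types and column layout."""
    return [(int(r["cust_id"]), r["name"], r["state"]) for r in rows]


def load_and_index(conn, records):
    """Load and Index: write the records, then build an index for queries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_dim (cust_id INTEGER, name TEXT, state TEXT)"
    )
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)", records)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_customer_state ON customer_dim(state)")
    conn.commit()


conn = sqlite3.connect(":memory:")
load_and_index(conn, transform(scrub(extract(SOURCE_CSV))))
print(conn.execute("SELECT * FROM customer_dim ORDER BY cust_id").fetchall())
# [(1, 'Ada Lopez', 'WA'), (2, 'Ben Ortiz', 'OR')]
```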

Figure 12-2 Steps in data reconciliation
Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
- Static extract = capturing a snapshot of the source data at a point in time
- Incremental extract = capturing changes that have occurred since the last static extract
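One common way to implement an incremental extract is a timestamp watermark. The sketch below assumes, hypothetically, that every source row carries a last-modified timestamp. It takes a static snapshot first, records the latest timestamp seen, and later extracts only rows modified after that point.

```python
from datetime import datetime

# Hypothetical source table; each row carries a last-modified timestamp.
source_rows = [
    {"id": 1, "amount": 100, "modified": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "amount": 250, "modified": datetime(2024, 1, 2, 14, 30)},
    {"id": 3, "amount": 75, "modified": datetime(2024, 1, 3, 8, 15)},
]


def static_extract(rows):
    """Static extract: a full snapshot of the chosen subset at one point in time."""
    return [dict(r) for r in rows]


def incremental_extract(rows, since):
    """Incremental extract: only rows changed after the previous extract's watermark."""
    return [dict(r) for r in rows if r["modified"] > since]


snapshot = static_extract(source_rows)
watermark = max(r["modified"] for r in snapshot)  # remember how far we have extracted

# Later, one row is changed and one is added in the source.
source_rows[0] = {"id": 1, "amount": 120, "modified": datetime(2024, 1, 4, 10, 0)}
source_rows.append({"id": 4, "amount": 60, "modified": datetime(2024, 1, 4, 11, 0)})

delta = incremental_extract(source_rows, watermark)
print([r["id"] for r in delta])  # [1, 4]
```

A limitation of this design: rows deleted in the source leave no timestamp behind, so deletions need a separate mechanism such as log-based capture or snapshot comparison.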

Figure 12-2 Steps in data reconciliation (cont.)
Scrub/Cleanse: uses pattern recognition and AI techniques to upgrade data quality
- Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
- Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
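The sketch below shows a few of these cleansing tasks (reformatting names, normalizing ZIP codes, parsing dates written in several formats, removing duplicates, and logging errors). It uses simple hand-written rules rather than the pattern-recognition or AI techniques mentioned on the slide. The accepted date formats and the field names are assumptions.

```python
import re
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")  # assumed formats seen in sources


def parse_date(text):
    """Try each known format; return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def cleanse(rows):
    """Return (clean_rows, error_log); rejected rows are logged, not silently dropped."""
    clean, errors, seen = [], [], set()
    for i, row in enumerate(rows):
        name = re.sub(r"\s+", " ", row["name"]).strip().title()  # reformat
        zip_code = re.sub(r"\D", "", row["zip"])                 # keep digits only
        hired = parse_date(row["hired"])
        if hired is None:
            errors.append((i, "unparseable date: " + row["hired"]))
            continue
        if len(zip_code) != 5:
            errors.append((i, "bad zip: " + row["zip"]))
            continue
        key = (name, zip_code)                                   # duplicate detection
        if key in seen:
            errors.append((i, "duplicate of earlier record"))
            continue
        seen.add(key)
        clean.append({"name": name, "zip": zip_code, "hired": hired.isoformat()})
    return clean, errors


raw = [
    {"name": "  ada   LOPEZ", "zip": "99258", "hired": "03/15/2021"},
    {"name": "Ada Lopez", "zip": "99258", "hired": "2021-03-15"},
    {"name": "Ben Ortiz", "zip": "9925", "hired": "2020-07-01"},
    {"name": "Cy Park", "zip": "98101", "hired": "31/12/2020"},
]
clean, errors = cleanse(raw)
print(clean)   # [{'name': 'Ada Lopez', 'zip': '99258', 'hired': '2021-03-15'}]
print(errors)
```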

Figure 12-2 Steps in data reconciliation (cont.)
Transform: convert data from the format of the operational system to the format of the data warehouse
- Record-level:
  - Selection–data partitioning
  - Joining–data combining
  - Aggregation–data summarization
- Field-level:
  - Single-field–from one field to one field
  - Multi-field–from many fields to one, or one field to many
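The three record-level functions can be shown on a handful of rows. In this sketch the sales rows and the store lookup are hypothetical. Selection keeps one partition of the data, joining attaches store attributes to each sale, and aggregation summarizes quantity by city and product.

```python
from collections import defaultdict

# Hypothetical operational rows.
sales = [
    {"store_id": 1, "product": "A", "qty": 3, "year": 2023},
    {"store_id": 1, "product": "B", "qty": 5, "year": 2024},
    {"store_id": 2, "product": "A", "qty": 2, "year": 2024},
    {"store_id": 2, "product": "A", "qty": 4, "year": 2024},
]
stores = {1: "Spokane", 2: "Seattle"}

# Selection: keep only the partition of rows we want (here, year 2024).
selected = [r for r in sales if r["year"] == 2024]

# Joining: combine each sale with its store's attributes.
joined = [{**r, "city": stores[r["store_id"]]} for r in selected]

# Aggregation: summarize quantity by city and product.
totals = defaultdict(int)
for r in joined:
    totals[(r["city"], r["product"])] += r["qty"]

print(dict(totals))  # {('Spokane', 'B'): 5, ('Seattle', 'A'): 6}
```

Because the reconciled layer is meant to stay detailed (not summarized yet), aggregation like this typically belongs to later, derived data rather than to the reconciled data itself.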

Figure 12-2 Steps in data reconciliation (cont.)
Load/Index: place transformed data into the warehouse and create indexes
- Refresh mode: bulk rewriting of target data at periodic intervals
- Update mode: only changes in source data are written to the data warehouse
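The difference between the two load modes can be sketched with a list standing in for a warehouse table. Since warehouse data is described above as historical and periodic, this sketch assumes update mode appends time-stamped rows instead of overwriting earlier ones. The `load_date` column and the function names are hypothetical.

```python
from datetime import date


def refresh_load(target: list, full_extract: list, load_date: date) -> None:
    """Refresh mode: rewrite the target in bulk from a complete extract."""
    target.clear()
    target.extend({**row, "load_date": load_date} for row in full_extract)


def update_load(target: list, changed_rows: list, load_date: date) -> None:
    """Update mode: write only the changed rows. Earlier rows are kept,
    so the target accumulates a time-stamped history."""
    target.extend({**row, "load_date": load_date} for row in changed_rows)


warehouse = []
refresh_load(warehouse, [{"id": 1, "price": 10}, {"id": 2, "price": 20}], date(2024, 1, 1))
update_load(warehouse, [{"id": 2, "price": 22}], date(2024, 2, 1))

print(len(warehouse))  # 3
print([r["price"] for r in warehouse if r["id"] == 2])  # [20, 22]
```

Refresh mode is simpler but rewrites everything each time; update mode moves far less data once the warehouse is large.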

Figure 12-3 Single-field transformation
- In general, some transformation function translates data from the old form to the new form
- Algorithmic transformation uses a formula or logical expression
- Table lookup, another approach, uses a separate table keyed by source record code
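Both single-field approaches fit in a few lines. In the sketch below, a temperature conversion is an example of algorithmic transformation, and a state-code table is an example of table lookup. The specific fields, the lookup table, and the `"UNKNOWN"` fallback are illustrative assumptions.

```python
# Algorithmic transformation: a formula converts the old value to the new form.
def fahrenheit_to_celsius(f: float) -> float:
    return round((f - 32) * 5 / 9, 1)


# Table lookup: a separate table keyed by the source code supplies the target value.
STATE_CODE_TO_NAME = {"WA": "Washington", "OR": "Oregon", "ID": "Idaho"}  # hypothetical table


def decode_state(code: str) -> str:
    # Unknown codes are flagged rather than guessed, so they can be logged and fixed.
    return STATE_CODE_TO_NAME.get(code, "UNKNOWN")


print(fahrenheit_to_celsius(212))  # 100.0
print(decode_state("OR"))          # Oregon
print(decode_state("XX"))          # UNKNOWN
```

A lookup table is preferable when the mapping has no formula or changes over time, because the table can be maintained without changing code.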

Figure 12-4 Multi-field transformation
- M:1–from many source fields to one target field
- 1:M–from one source field to many target fields
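The two directions of multi-field transformation are sketched below. The M:1 case builds a single hypothetical customer key from three source fields. The 1:M case splits a product code into its parts, assuming a hypothetical `BRAND-CATEGORY-SIZE` layout. Neither format comes from the original figure.

```python
def to_customer_key(last_name: str, zip_code: str, birth_year: int) -> str:
    """M:1: several source fields combine into one target field."""
    return f"{last_name.upper()}-{zip_code}-{birth_year}"


def split_product_code(product_code: str) -> dict:
    """1:M: one source field is decomposed into several target fields.
    Assumes a hypothetical code layout BRAND-CATEGORY-SIZE, e.g. 'ACME-SHOE-10'."""
    brand, category, size = product_code.split("-")
    return {"brand": brand, "category": category, "size": int(size)}


print(to_customer_key("Lopez", "99258", 1990))  # LOPEZ-99258-1990
print(split_product_code("ACME-SHOE-10"))       # {'brand': 'ACME', 'category': 'SHOE', 'size': 10}
```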

Table 12-4 Samples of Tools to Support Data Reconciliation and Integration