Chapter 10: Data Quality and Integration
Modern Database Management, 10th Edition
Jeffrey A. Hoffer, V. Ramesh, Heikki Topi
© 2011 Pearson Education, Inc. Publishing as Prentice Hall

Objectives
- Define terms
- Describe importance and goals of data governance
- Describe importance and measures of data quality
- Define characteristics of quality data
- Describe reasons for poor data quality in organizations
- Describe a program for improving data quality
- Describe three types of data integration approaches
- Describe the purpose and role of master data management
- Describe four steps and activities of ETL for data integration for a data warehouse
- Explain various forms of data transformation for data warehouses

Data Governance
- Data governance: high-level organizational groups and processes overseeing data stewardship across the organization
- Data steward: a person responsible for ensuring that organizational applications properly support the organization's data quality goals

Requirements for Data Governance
- Sponsorship from both senior management and business units
- A data steward manager to coordinate data stewards
- Data stewards for different business units, subjects, and/or source systems
- A governance committee to provide data management guidelines and standards

Importance of Data Quality
- Minimize IT project risk
- Make timely business decisions
- Ensure regulatory compliance
- Expand customer base

Characteristics of Quality Data
- Uniqueness
- Accuracy
- Consistency
- Completeness
- Timeliness
- Currency
- Conformance
- Referential integrity
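Three of the characteristics above (uniqueness, completeness, referential integrity) can be checked mechanically. The following is a minimal sketch, not something from the slides: the record layout (customer_id, email, order_id) and the function name check_quality are hypothetical.

```python
def check_quality(customers, orders):
    """Return simple quality findings for two lists of record dicts."""
    ids = [c["customer_id"] for c in customers]

    # Uniqueness: each customer_id should appear exactly once.
    duplicate_ids = sorted({i for i in ids if ids.count(i) > 1})

    # Completeness: a required field (here, email) must not be empty.
    incomplete = [c["customer_id"] for c in customers if not c.get("email")]

    # Referential integrity: every order must reference an existing customer.
    known = set(ids)
    orphan_orders = [o["order_id"] for o in orders if o["customer_id"] not in known]

    return {
        "duplicate_ids": duplicate_ids,
        "incomplete": incomplete,
        "orphan_orders": orphan_orders,
    }
```

Accuracy, timeliness, and currency usually need an external reference (the real-world value, or the time the fact changed), so they are not covered by this kind of check.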

Causes of poor data quality
- External data sources
  - Lack of control over data quality
- Redundant data storage and inconsistent metadata
  - Proliferation of databases with uncontrolled redundancy and metadata
- Data entry
  - Poor data capture controls
- Lack of organizational commitment
  - Not recognizing poor data quality as an organizational issue

Data quality improvement
- Get business buy-in
- Perform data quality audit
- Establish data stewardship program
- Improve data capture processes
- Apply modern data management principles and technology
- Apply total quality management (TQM) practices

Business Buy-in
- Executive sponsorship
- Building a business case
- Prove a return on investment (ROI)
- Avoidance of cost
- Avoidance of opportunity loss

Data Quality Audit
- Statistically profile all data files
- Document the set of values for all fields
- Analyze data patterns (distribution, outliers, frequencies)
- Verify whether controls and business rules are enforced
- Use specialized data profiling tools
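A rough idea of what profiling a single field involves, as a sketch using only the Python standard library. The function name profile_field and the z-score cutoff of 2.0 are assumptions for illustration; specialized profiling tools do considerably more.

```python
from collections import Counter
from statistics import mean, pstdev


def profile_field(values, z_threshold=2.0):
    """Profile one field: missing count, value set, frequencies, outliers."""
    present = [v for v in values if v is not None]
    freq = Counter(present)
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(freq),
        "frequencies": dict(freq),
        "outliers": [],
    }
    # Outlier detection only makes sense when every present value is numeric.
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if len(numeric) >= 2 and len(numeric) == len(present):
        mu, sigma = mean(numeric), pstdev(numeric)
        if sigma > 0:
            profile["outliers"] = [v for v in numeric
                                   if abs(v - mu) / sigma > z_threshold]
    return profile
```

For an age column containing a stray 300 among values near 30, the stray value would be reported as an outlier and any None entries counted as missing.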

Data Stewardship Program
- Roles:
  - Oversight of data stewardship program
  - Manage data subject area
  - Oversee data definitions
  - Oversee production of data
  - Oversee use of data
- Report to: business unit vs. IT organization?

Improving Data Capture Processes
- Automate data entry as much as possible
- Manual data entry should be selected from preset options
- Use trained operators when possible
- Follow good user interface design principles
- Immediate data validation for entered data
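Preset options and immediate validation can be sketched as a check that runs when an entry is submitted. The field names, the allowed state list, and the deliberately loose email pattern below are all hypothetical.

```python
import re

# Hypothetical preset options offered to the operator (e.g. in a drop-down).
ALLOWED_STATES = {"WA", "OR", "CA"}
# Loose pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_entry(entry):
    """Return a list of error messages; an empty list means the entry passes."""
    errors = []
    if entry.get("state") not in ALLOWED_STATES:
        errors.append("state must be one of the preset options")
    if not EMAIL_PATTERN.match(entry.get("email") or ""):
        errors.append("email is not well formed")
    return errors
```

Returning every error at once, rather than stopping at the first, lets the entry form flag all problem fields in a single pass.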

TQM Principles and Practices
- TQM: Total Quality Management
- TQM principles:
  - Defect prevention
  - Continuous improvement
  - Use of enterprise data standards
- Balanced focus:
  - Customer
  - Product/Service

Master Data Management (MDM)
- The disciplines, technologies, and methods to ensure the currency, meaning, and quality of reference data within and across various subject areas
- Three main architectures:
  - Identity registry
  - Integration hub
  - Persistent

Data Integration
- Data integration creates a unified view of business data
- Other possibilities:
  - Application integration
  - Business process integration
  - User interaction integration
- Any approach requires changed data capture (CDC)
  - Indicates which data have changed since previous data integration activity
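One simple way to realize changed data capture is to compare the current state of a source with the snapshot taken at the previous integration run. This is only a sketch under that assumption; real systems often rely on timestamps, triggers, or database logs instead. The function name changed_data is hypothetical.

```python
def changed_data(previous, current):
    """Compare two snapshots, each a dict mapping primary key -> record."""
    prev_keys, curr_keys = previous.keys(), current.keys()
    return {
        # Keys that exist now but did not exist at the previous run.
        "inserted": {k: current[k] for k in curr_keys - prev_keys},
        # Keys that existed at the previous run but are gone now.
        "deleted": {k: previous[k] for k in prev_keys - curr_keys},
        # Keys present in both snapshots whose record content differs.
        "updated": {k: current[k] for k in curr_keys & prev_keys
                    if current[k] != previous[k]},
    }
```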

Techniques for Data Integration
- Consolidation (ETL)
  - Consolidating all data into a centralized database (like a data warehouse)
- Data federation (EII)
  - Provides a virtual view of data without actually creating one centralized database
- Data propagation (EAI and EDR)
  - Duplicate data across databases, with near real-time delay
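The difference between consolidation and federation can be illustrated with in-memory dictionaries standing in for source systems. This is a toy sketch; consolidate and FederatedView are hypothetical names, and real ETL and EII products work against live databases.

```python
def consolidate(sources):
    """Consolidation: copy every source record into one central store."""
    central = {}
    for source in sources:
        for key, record in source.items():
            central.setdefault(key, {}).update(record)
    return central


class FederatedView:
    """Federation: keep no copy; query the sources at request time."""

    def __init__(self, sources):
        self.sources = sources

    def find(self, key):
        result = {}
        for source in self.sources:
            result.update(source.get(key, {}))
        return result
```

If a source record changes after consolidation, the central copy stays stale until the next load, while the federated view returns the new value on the next request. That trade-off (freshness versus a stable, query-friendly copy) is the practical difference between the two techniques.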

The Reconciled Data Layer
- Typical operational data is:
  - Transient: not historical
  - Not normalized (perhaps due to denormalization for performance)
  - Restricted in scope: not comprehensive
  - Sometimes poor quality: inconsistencies and errors
- After ETL, data should be:
  - Detailed: not summarized yet
  - Historical: periodic
  - Normalized: 3rd normal form or higher
  - Comprehensive: enterprise-wide perspective
  - Timely: data should be current enough to assist decision-making
  - Quality controlled: accurate with full integrity

The ETL Process
ETL = Extract, transform, and load
- Capture/Extract
- Scrub or data cleansing
- Transform
- Load and Index

Figure 10-1 Steps in data reconciliation
Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
- Static extract = capturing a snapshot of the source data at a point in time
- Incremental extract = capturing changes that have occurred since the last static extract
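The two extract styles can be sketched as follows, assuming each source row carries an updated_at timestamp. That column and both function names are hypothetical; sources without such a timestamp need another change-detection method.

```python
from datetime import datetime


def static_extract(rows):
    """Snapshot of all chosen source rows at this point in time."""
    return [dict(row) for row in rows]


def incremental_extract(rows, last_extract_time):
    """Only rows changed since the previous extract."""
    return [dict(row) for row in rows if row["updated_at"] > last_extract_time]
```

Copying each row with dict(row) keeps the extract stable even if the source rows are modified afterwards.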

Figure 10-1 Steps in data reconciliation (cont.)
Scrub/Cleanse: uses pattern recognition and AI techniques to upgrade data quality
- Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
- Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
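A small rule-based sketch of scrubbing follows. It uses a fixed correction table and two accepted date formats rather than the pattern recognition and AI techniques the slide mentions, and the field names (name, city, signup) and correction entries are hypothetical.

```python
from datetime import datetime

# Hypothetical correction table for known misspellings.
CITY_FIXES = {"seatle": "Seattle", "spokan": "Spokane"}
# Date formats accepted from the source systems (an assumption).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Reformat a date to ISO 8601, or return None if it is erroneous."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def scrub(records):
    """Return (cleaned, rejected): fix spellings, dates, and duplicates."""
    cleaned, rejected, seen = [], [], set()
    for rec in records:
        name = rec["name"].strip()
        city = rec["city"].strip()
        city = CITY_FIXES.get(city.lower(), city)
        date = parse_date(rec["signup"].strip())
        if date is None:
            rejected.append(rec)  # error detection/logging
            continue
        key = (name.lower(), city, date)
        if key in seen:  # duplicate data
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city, "signup": date})
    return cleaned, rejected
```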

Figure 10-1 Steps in data reconciliation (cont.)
Transform = convert data from format of operational system to format of data warehouse
- Record-level:
  - Selection: data partitioning
  - Joining: data combining
  - Aggregation: data summarization
- Field-level:
  - Single-field: from one field to one field
  - Multi-field: from many fields to one, or one field to many
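The three record-level operations can be sketched on lists of dictionaries. The record layout (customer_id, region, amount) and the function names are hypothetical; in practice these operations are usually expressed in SQL or an ETL tool.

```python
from collections import defaultdict


def select(rows, predicate):
    """Selection: keep only the rows that satisfy a condition."""
    return [row for row in rows if predicate(row)]


def join(orders, customers):
    """Joining: combine each order with its customer's region."""
    by_id = {c["customer_id"]: c for c in customers}
    return [{**o, "region": by_id[o["customer_id"]]["region"]}
            for o in orders if o["customer_id"] in by_id]


def aggregate(rows, group_key, value_key):
    """Aggregation: total a value per group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)
```

Chaining the three (select large orders, join them to customers, total the amounts by region) yields a summary keyed by region.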

Figure 10-1 Steps in data reconciliation (cont.)
Load/Index = place transformed data into the warehouse and create indexes
- Refresh mode: bulk rewriting of target data at periodic intervals
- Update mode: only changes in source data are written to data warehouse
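The two load modes differ in what they do to existing target data. In this sketch the warehouse table is a dictionary keyed by a hypothetical id field; index creation is not shown.

```python
def load_refresh(target, source_rows):
    """Refresh mode: discard the target contents and rewrite them in bulk."""
    target.clear()
    target.update({row["id"]: row for row in source_rows})


def load_update(target, changed_rows):
    """Update mode: write only new or changed rows, leaving the rest alone."""
    for row in changed_rows:
        target[row["id"]] = row
```

Note that update mode as sketched here never removes rows; deletions in the source would need separate handling.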

Figure 10-2 Single-field transformation
a) Basic representation
In general, some transformation function translates data from old form to new form

Figure 10-2 Single-field transformation (cont.)
b) Algorithmic
Algorithmic transformation uses a formula or logical expression
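As an illustration of an algorithmic transformation, a temperature stored in Fahrenheit in the source can be converted to Celsius for the warehouse with a fixed formula. The choice of temperature as the example field is an assumption here, since the figure itself is not reproduced in this transcript.

```python
def fahrenheit_to_celsius(temp_f):
    """Single-field algorithmic transformation: C = (F - 32) * 5 / 9."""
    return (temp_f - 32) * 5 / 9
```

No external data is needed: the target value is computed from the source value alone.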

Figure 10-2 Single-field transformation (cont.)
c) Table lookup
Table lookup is another approach; it uses a separate table keyed by source record code
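A table lookup replaces a source code with the value stored against it in a separate table. In this sketch the lookup table is a small hypothetical dictionary of state codes; in a warehouse it would typically be a reference table.

```python
# Hypothetical lookup table keyed by the source code.
STATE_NAMES = {"WA": "Washington", "OR": "Oregon", "ID": "Idaho"}


def lookup_state(code, table=STATE_NAMES, default=None):
    """Single-field table-lookup transformation: code -> full name."""
    return table.get(code.strip().upper(), default)
```

Returning a default for unknown codes, instead of raising an error, lets the load continue while the unmatched codes are reported for review.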

Figure 10-3 Multi-field transformation
a) Many sources to one target
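A many-to-one transformation combines several source fields into one target field. The address example below is hypothetical and is not the example used in the figure.

```python
def build_mailing_address(street, city, state, postal_code):
    """Many source fields -> one target field."""
    return (f"{street.strip()}, {city.strip()}, "
            f"{state.strip().upper()} {postal_code.strip()}")
```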

Figure 10-3 Multi-field transformation (cont.)
b) One source to many targets
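A one-to-many transformation splits one source field into several target fields. The composite product code format below (category, item number, size, separated by hyphens) is a hypothetical example.

```python
def split_product_code(code):
    """One source field -> many target fields."""
    category, item_number, size = code.strip().split("-")
    return {"category": category, "item_number": int(item_number), "size": size}
```

A code that does not have exactly three hyphen-separated parts raises ValueError, which an ETL job would catch and log as a rejected record.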

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall