
Data Quality David Loshin

Course Structure
Overview of Data Quality
  – Data Ownership and Data Roles
  – Cost Analysis of Poor Data Quality
Dimensions of Data Quality
  – Data models, Data values, Presentation
Data Extraction and Transformation
  – ETL, Data transformation

Course Structure (2)
Data Quality Improvement
Metadata and Enterprise Reference Data
  – Domains and Mappings
Data Quality Rules
  – Definition of Rules
  – Discovery of Rules

Course Structure (3)
Using Data Quality Rules
  – Message Transformation and Routing
  – Data warehouse validation
  – GUI Generation
Data Warehouse Population

Course Structure (4)
Data Cleansing
  – Data Parsing
  – Standardization
  – Linkage
  – Duplicate Elimination
  – Approximate Searching
Scalability Issues

Project
Build a data quality tool
  – rule definition
  – data parsing
  – data element standardization
  – record linkage
Apply the tool in characterizing real-world data (I’ll supply some, don’t worry ;-)

Some Examples
Frequent Flyer Miles and Long-Distance Service
Corporate Credit Card
Direct Marketing Event
CD Club Scam

What is Data?
Working definitions:
  – Data: arbitrary values (with their own representation)
  – Information: data within a context
  – Knowledge: understanding of information within its context
  – Metadata: data about data

Who Owns Data?
An important question, because the answers indicate where responsibility for data quality lies
Data quality can be difficult to effect because of complicating notions
Data processing as an “Information Factory”
Actors in the information factory and their roles

Actors and Their Roles
Supplier
Acquirer
Creator
Processor
Packager
Delivery Agent
Consumer
Middle Manager
Senior Manager
Decision-maker

Ownership Responsibilities
Definition of data
Authorization and Security
User support
Data packaging and delivery
Maintenance
Data quality
Management of business rules
Management of metadata
Standards management
Supplier management

Ownership Paradigms
Creator
Consumer
Compiler
Enterprise
Funder
Decoder
Packager
Reader
Subject
Purchaser
Everyone

Complicating Notions
Ownership is affected by the value of data
Privacy
Turf
Fear
Bureaucracy

The Data Ownership Policy
Order of enforcement
Identify stakeholders
Identify data sets
Allocation of ownership
Ownership roles and responsibilities
Dispute Resolution

The Data Ownership Policy (2)
Maintain a metadata database for data ownership
  – Parties table
  – Data set table
  – Roles and responsibilities
  – Policies (e.g., dispute resolution, communication)
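
The ownership metadata database listed above could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module: the table names, column names, and sample rows are assumptions made for the example and do not come from the slides.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an
# illustrative assumption, not taken from the course material.
SCHEMA = """
PRAGMA foreign_keys = ON;
CREATE TABLE party (
    party_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE data_set (
    data_set_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE ownership_role (
    party_id       INTEGER NOT NULL REFERENCES party(party_id),
    data_set_id    INTEGER NOT NULL REFERENCES data_set(data_set_id),
    role           TEXT NOT NULL,
    responsibility TEXT NOT NULL,
    PRIMARY KEY (party_id, data_set_id, role)
);
CREATE TABLE policy (
    policy_id   INTEGER PRIMARY KEY,
    kind        TEXT NOT NULL,   -- e.g. 'dispute resolution'
    description TEXT NOT NULL
);
"""


def build_ownership_db():
    """Create an in-memory ownership metadata database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def roles_for(conn, data_set_name):
    """Return (party name, role) pairs recorded for one data set."""
    return conn.execute(
        """
        SELECT p.name, r.role
        FROM ownership_role AS r
        JOIN party AS p ON p.party_id = r.party_id
        JOIN data_set AS d ON d.data_set_id = r.data_set_id
        WHERE d.name = ?
        ORDER BY p.name, r.role
        """,
        (data_set_name,),
    ).fetchall()


# Made-up sample rows, for illustration only.
conn = build_ownership_db()
conn.executemany("INSERT INTO party VALUES (?, ?)",
                 [(1, "Billing"), (2, "Marketing")])
conn.execute("INSERT INTO data_set VALUES (1, 'customer_master')")
conn.executemany(
    "INSERT INTO ownership_role VALUES (?, ?, ?, ?)",
    [(1, 1, "Steward", "Maintains definitions and data quality"),
     (2, 1, "Data Consumer", "Reports defects to the steward")],
)
conn.execute(
    "INSERT INTO policy (kind, description) VALUES (?, ?)",
    ("dispute resolution", "Escalate unresolved disputes to the trustee"),
)
print(roles_for(conn, "customer_master"))
```

Keeping roles in a separate table keyed by party and data set lets one party hold different roles on different data sets, which seems to fit the slide's separation of parties, data sets, and roles.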

Ownership Roles
CIO
CKO
Trustee
Policy Manager
Registrar
Steward
Custodian
Data Administrator
Security Administrator
Information Flow
Information Processing
Application development
Data Provider
Data Consumer

The Information Factory
Information processing can be broken down into a graph
Each node in the graph is a data producer, data consumer, or both
The edges represent communication paths
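
A minimal sketch of the graph view described above, assuming the information chain is given as a list of directed (source, target) communication paths. The function name and the sample node names are hypothetical.

```python
from collections import Counter


def classify_nodes(edges):
    """Label each node as 'producer', 'consumer', or 'both'.

    edges is an iterable of (source, target) pairs. A node that
    sends data on at least one edge produces; a node that receives
    data on at least one edge consumes.
    """
    edges = list(edges)
    sends = Counter(src for src, _ in edges)
    receives = Counter(dst for _, dst in edges)
    roles = {}
    for node in sorted(set(sends) | set(receives)):
        if sends[node] and receives[node]:
            roles[node] = "both"
        elif sends[node]:
            roles[node] = "producer"
        else:
            roles[node] = "consumer"
    return roles


# Illustrative chain; the node names are made up for this example.
chain = [
    ("supplier", "acquirer"),
    ("acquirer", "processor"),
    ("processor", "consumer"),
]
print(classify_nodes(chain))
```

Nodes in the middle of the chain come out as "both", which is the case the slide calls out explicitly.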

What is Data Quality?
“Fitness for Use”
Different rules for different data sets
Includes, but is more than:
  – Data cleansing
  – Standardization
  – Deduplication
  – Merge-purge
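
One way to picture "different rules for different data sets" is to attach a separate set of named predicates to each data set and report the share of records that pass each rule. The sketch below is an assumption-laden illustration: the rule names, record layouts, and sample values are invented for the example.

```python
def fitness_report(records, rules):
    """Return the fraction of records that pass each named rule.

    rules maps a rule name to a predicate taking one record (a dict).
    An empty data set is reported as passing every rule.
    """
    total = len(records)
    report = {}
    for name, predicate in rules.items():
        passed = sum(1 for record in records if predicate(record))
        report[name] = passed / total if total else 1.0
    return report


# Hypothetical rules for a customer data set.
customer_rules = {
    "name_present": lambda r: bool(r["name"].strip()),
    "zip_is_5_digits": lambda r: len(r["zip"]) == 5 and r["zip"].isdigit(),
}

# A different data set gets its own, unrelated rules.
order_rules = {
    "quantity_positive": lambda r: r["quantity"] > 0,
}

customers = [
    {"name": "A. Smith", "zip": "10001"},
    {"name": "", "zip": "1000"},
]
orders = [{"quantity": 3}, {"quantity": 0}, {"quantity": 5}, {"quantity": 2}]

print(fitness_report(customers, customer_rules))
print(fitness_report(orders, order_rules))
```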

Lather, Rinse, Repeat
Data quality is a process:
1. Assess the current state of the quality of data
2. Determine the area that needs most improvement
3. Determine success criteria
4. Implement the improvement
5. Measure against success threshold
6. If success: goto 2
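
The cycle above can be sketched as a loop. The assumptions here are that quality is summarized as an integer score per area, that one shared threshold serves as the success criterion, and that the loop stops on a failed measurement (the slide only says what happens on success). All names are hypothetical.

```python
def improvement_cycle(scores, threshold, improve, max_rounds=10):
    """Run the assess / pick / improve / measure loop.

    scores:    dict mapping area name to its assessed score (step 1)
    threshold: success criterion applied to every area (step 3)
    improve:   callable (area, score) -> new score (step 4)
    Returns a list of (area, new_score, success) tuples.
    """
    scores = dict(scores)  # do not mutate the caller's assessment
    history = []
    for _ in range(max_rounds):
        # Step 2: the lowest-scoring area needs the most improvement.
        area = min(scores, key=scores.get)
        if scores[area] >= threshold:
            break  # every area already meets the criterion
        # Step 4: implement the improvement.
        scores[area] = improve(area, scores[area])
        # Step 5: measure against the success threshold.
        success = scores[area] >= threshold
        history.append((area, scores[area], success))
        # Step 6: only loop back to step 2 on success (assumption:
        # a failure ends the run so it can be investigated).
        if not success:
            break
    return history


# Made-up assessment on a 0-100 scale.
assessed = {"addresses": 60, "phones": 80, "names": 95}
result = improvement_cycle(
    assessed, threshold=90, improve=lambda area, s: min(100, s + 35)
)
print(result)
```

The max_rounds cap is an added safeguard so that the sketch always terminates; it is not part of the process on the slide.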

Data Quality is Hard to Do
No one wants to admit mistakes
Denial of responsibility
Lack of understanding
“Dirty work”
Lack of recognition

Steps to Data Quality
Training
Data ownership policy
Economic model of data quality
Current state assessment and requirements analysis
Project selection and implementation