Data Quality, by David Loshin


1 Data Quality David Loshin

2 Course Structure
Overview of Data Quality
– Data Ownership and Data Roles
– Cost Analysis of Poor Data Quality
Dimensions of Data Quality
– Data models, Data values, Presentation
Data Extraction and Transformation
– ETL, Data transformation

3 Course Structure (2)
Data Quality Improvement
Metadata and Enterprise Reference Data
– Domains and Mappings
Data Quality Rules
– Definition of Rules
– Discovery of Rules

4 Course Structure (3)
Using Data Quality Rules
– Message Transformation and Routing
– Data warehouse validation
– GUI Generation
Data Warehouse Population

5 Course Structure (4)
Data Cleansing
– Data Parsing
– Standardization
– Linkage
– Duplicate Elimination
– Approximate Searching
Scalability Issues

6 Project
Build a data quality tool
– rule definition
– data parsing
– data element standardization
– record linkage
Apply the tool in characterizing real-world data (I’ll supply some, don’t worry ;-)

7 Some Examples Frequent Flyer Miles and Long-Distance Service Corporate Credit Card Direct Marketing Event CD Club Scam

8 What is Data?
Working definitions:
– Data: arbitrary values (with their own representation)
– Information: data within a context
– Knowledge: understanding of information within its context
– Metadata: data about data

9 Who Owns Data?
Important question, because the answers indicate where responsibility for data quality lies
Data quality can be difficult to effect because of complicating notions
Data processing as an “Information Factory”
Actors in the information factory and their roles

10 Actors and Their Roles
Supplier
Acquirer
Creator
Processor
Packager
Delivery Agent
Consumer
Middle Manager
Senior Manager
Decision-maker

11 Ownership Responsibilities
Definition of data
Authorization and Security
User support
Data packaging and delivery
Maintenance
Data quality
Management of business rules
Management of metadata
Standards management
Supplier management

12 Ownership Paradigms
Creator
Consumer
Compiler
Enterprise
Funder
Decoder
Packager
Reader
Subject
Purchaser
Everyone

13 Complicating Notions
Ownership is affected by the value of data
Privacy
Turf
Fear
Bureaucracy

14 The Data Ownership Policy
Order of enforcement
Identify stakeholders
Identify data sets
Allocation of ownership
Ownership roles and responsibilities
Dispute Resolution

15 The Data Ownership Policy (2)
Maintain a metadata database for data ownership
– Parties table
– Data set table
– Roles and responsibilities
– Policies (e.g., dispute resolution, communication, etc.)
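The ownership metadata database on this slide could be laid out roughly as below. This is only a sketch using Python's built-in sqlite3 module: the slide names the four kinds of information but not their layout, so every table name, column name, and the helper functions build_ownership_db and owners_of are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: one table per item listed on the slide.
SCHEMA = """
CREATE TABLE parties (
    party_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE data_sets (
    data_set_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE roles (
    party_id       INTEGER NOT NULL REFERENCES parties(party_id),
    data_set_id    INTEGER NOT NULL REFERENCES data_sets(data_set_id),
    role           TEXT NOT NULL,
    responsibility TEXT NOT NULL
);
CREATE TABLE policies (
    policy_id   INTEGER PRIMARY KEY,
    kind        TEXT NOT NULL,   -- e.g. 'dispute resolution'
    description TEXT NOT NULL
);
"""


def build_ownership_db():
    """Create an empty in-memory ownership metadata database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def owners_of(conn, data_set_name):
    """Return (party name, role) pairs recorded for one data set."""
    rows = conn.execute(
        "SELECT p.name, r.role "
        "FROM roles r "
        "JOIN parties p ON p.party_id = r.party_id "
        "JOIN data_sets d ON d.data_set_id = r.data_set_id "
        "WHERE d.name = ? ORDER BY p.name",
        (data_set_name,),
    )
    return rows.fetchall()
```

Keeping roles in a separate linking table lets one party hold different roles on different data sets, which is one plausible reading of "roles and responsibilities" here.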

16 Ownership Roles
CIO
CKO
Trustee
Policy Manager
Registrar
Steward
Custodian
Data Administrator
Security Administrator
Information Flow
Information Processing
Application development
Data Provider
Data Consumer

17 The Information Factory
Information processing can be broken down into a graph
Each node in the graph is a data producer, data consumer, or both
The edges represent communication paths
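As an illustration of the graph view, the sketch below stores communication paths as directed edges and classifies each node by whether data flows out of it, into it, or both. The class name InformationFactory, its methods, and the stage names in the usage note are hypothetical; the slide does not prescribe a representation.

```python
from collections import defaultdict


class InformationFactory:
    """Directed graph of processing stages (illustrative only)."""

    def __init__(self):
        # Maps each node to the set of nodes it sends data to.
        self.edges = defaultdict(set)
        self.nodes = set()

    def add_path(self, src, dst):
        """Record a communication path from src to dst."""
        self.nodes.update((src, dst))
        self.edges[src].add(dst)

    def role(self, node):
        """Classify a node as 'producer', 'consumer', 'both', or 'isolated'."""
        produces = bool(self.edges.get(node))
        consumes = any(node in targets for targets in self.edges.values())
        if produces and consumes:
            return "both"
        if produces:
            return "producer"
        if consumes:
            return "consumer"
        return "isolated"
```

For example, after add_path("supplier", "processor") and add_path("processor", "consumer"), the processor node is classified as "both".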

18 What is Data Quality?
“Fitness for Use”
Different rules for different data sets
Includes, but is more than:
– Data cleansing
– Standardization
– Deduplication
– Merge-purge
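One way to picture "different rules for different data sets" is a small rule registry keyed by data set name. The sketch below is an assumption-laden illustration: the data set names, the two example rules, and the violations function are invented here and do not come from the course.

```python
# Hypothetical rule registry: each data set has its own list of
# (rule name, check function) pairs.
RULES = {
    "customers": [
        ("zip is 5 digits",
         lambda r: len(r.get("zip", "")) == 5 and r["zip"].isdigit()),
    ],
    "orders": [
        ("quantity is positive", lambda r: r.get("quantity", 0) > 0),
    ],
}


def violations(data_set, records):
    """Return (record index, rule name) for every failed rule."""
    failed = []
    for index, record in enumerate(records):
        for name, check in RULES.get(data_set, []):
            if not check(record):
                failed.append((index, name))
    return failed
```

A record is "fit for use" in this sketch only with respect to the rules registered for its own data set; the same record could pass under one set of rules and fail under another.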

19 Lather, Rinse, Repeat
Data quality is a process:
1. Assess the current state of the quality of data
2. Determine the area that needs most improvement
3. Determine success criteria
4. Implement the improvement
5. Measure against success threshold
6. If success: goto 2
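The cycle above can be sketched as a loop. This is a simplified illustration under several assumptions: quality is a single score per area between 0 and 1, the success criterion is one shared target, the function name improvement_cycle and its parameters are hypothetical, and the loop stops on a failed round because the slide does not say what happens in that case.

```python
def improvement_cycle(scores, target, improve, max_rounds=10):
    """Run the assess/improve/measure loop over per-area quality scores.

    scores  -- dict mapping area name to current score (step 1, the assessment)
    target  -- success threshold applied to every area (assumed)
    improve -- callable (area, score) -> new score (step 4)
    Returns a list of (area, new score, success) tuples, one per round.
    """
    history = []
    for _ in range(max_rounds):
        area = min(scores, key=scores.get)      # step 2: worst area first
        if scores[area] >= target:              # nothing left below target
            break
        threshold = target                      # step 3: success criterion
        scores[area] = improve(area, scores[area])   # step 4
        success = scores[area] >= threshold     # step 5
        history.append((area, scores[area], success))
        if not success:                         # step 6 only loops on success
            break
    return history
```

With scores of 0.5 for "address" and 0.625 for "phone", a target of 0.75, and an improvement step that adds 0.25, the loop works on "address" first and then on "phone" before stopping.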

20 Data Quality is Hard to Do
No one wants to admit mistakes
Denial of responsibility
Lack of understanding
“Dirty work”
Lack of recognition

21 Steps to Data Quality
Training
Data ownership policy
Economic model of data quality
Current state assessment and requirements analysis
Project selection and implementation

