
Slide 1. Data Quality Case Study
Prepared by ORC Macro

Slide 2
– Background
  – Data Correction Tracking system
  – SAS/AF query application
  – Guidelines
– Profile Analysis
  – SSNs
  – Names
– Data Correction

Slide 3. Profile Analysis: SSNs

Slide 4. Shared SSNs (n=7,100)
– Different names: 27% (candidates for correction)
– Same or similar names: 73% (candidates for collapse)
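The correction/collapse split above can be sketched as two steps: group records by SSN, then compare the names within each group. This is a minimal sketch, not the method ORC Macro used. It assumes hypothetical (person_id, ssn, name) tuples. It treats names as "same or similar" when the difflib similarity ratio on normalized names is at least 0.85, which is an arbitrary threshold. The function names are illustrative only.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical cutoff for treating two names as "same or similar".
SIMILARITY_THRESHOLD = 0.85


def normalize(name):
    """Lower-case a name and replace punctuation with single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def names_similar(a, b):
    """True when two normalized names meet the similarity threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= SIMILARITY_THRESHOLD


def classify_shared_ssns(records):
    """Split shared-SSN groups into collapse and correction candidates.

    records: iterable of (person_id, ssn, name) tuples.
    Returns (collapse, correction), each a dict of ssn -> [(person_id, name), ...].
    SSNs held by only one person are not shared and are skipped.
    """
    by_ssn = defaultdict(list)
    for person_id, ssn, name in records:
        by_ssn[ssn].append((person_id, name))

    collapse, correction = {}, {}
    for ssn, people in by_ssn.items():
        if len(people) < 2:
            continue
        # Every pair must match for the whole group to be a collapse candidate.
        if all(names_similar(a[1], b[1]) for a, b in combinations(people, 2)):
            collapse[ssn] = people
        else:
            correction[ssn] = people
    return collapse, correction
```

Consider "Smith, John A." and "SMITH JOHN A" filed under one SSN: they normalize to the same string, so the group lands in the collapse bucket. Two unrelated names under one SSN land in the correction bucket instead.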

Slide 5. Profile Analysis: Names
– Possible duplicates: 23% (n=79,300)
– Unique persons: 77% (n=267,081)
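A name-based duplicate screen first has to decide which records to compare. One common approach is blocking: records that share a cheap key are flagged as possible duplicates for closer review. Here the key is last name plus first initial, assuming names are stored as "Last, First Middle". This is a hypothetical sketch, not the IMPAC II algorithm.

```python
from collections import defaultdict


def blocking_key(name):
    """Hypothetical blocking key: (last name, first initial), lower-cased.

    Assumes names are stored as "Last, First Middle".
    """
    last, _, rest = name.partition(",")
    first = rest.strip()
    return (last.strip().lower(), first[:1].lower())


def possible_duplicates(people):
    """Return the person IDs that share a blocking key with another person.

    people: iterable of (person_id, name) pairs.
    """
    blocks = defaultdict(set)
    for person_id, name in people:
        blocks[blocking_key(name)].add(person_id)
    flagged = set()
    for ids in blocks.values():
        if len(ids) > 1:  # more than one person in the block: review them
            flagged |= ids
    return flagged
```

For scale, the slide's counts sum to 346,381 (79,300 + 267,081). Since 79,300 / 346,381 is about 22.9%, the counts are consistent with the 23% shown.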

Slide 6. Profile Analysis: Names

Slide 7. Definition, Statistics, and Status: OLTP/Commons Cases

Slide 8. Data Correction
– Identifying the extent of the problem
– Investigating based on type of error
– Validating the investigation
– Implementing the change
– Tracking the identification, investigation, validation, and implementation
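The tracking requirement amounts to recording each correction as it moves through the steps above in order. The sketch below assumes a hypothetical in-memory CorrectionTicket record. A real tracking system would persist this history in the database.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """The four correction steps, in the order the slide lists them."""
    IDENTIFIED = 1
    INVESTIGATED = 2
    VALIDATED = 3
    IMPLEMENTED = 4


@dataclass
class CorrectionTicket:
    """Hypothetical tracking record for one data correction."""
    person_id: int
    description: str
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Opening a ticket is the identification step.
        self.history.append((Stage.IDENTIFIED, self.description))

    @property
    def stage(self):
        return self.history[-1][0]

    def advance(self, note):
        """Move to the next stage with a note; steps cannot be skipped or repeated."""
        if self.stage == Stage.IMPLEMENTED:
            raise ValueError("correction already implemented")
        next_stage = Stage(self.stage + 1)
        self.history.append((next_stage, note))
        return next_stage
```

Because advance() only ever moves one step forward, the history doubles as an audit trail of who did what at each stage.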

Slide 9. Data Correction: An Example
PERSON_ID=3070908 (PPRF record)
– Identification of problem: two different middle initials found
– Investigation of problem: TA module; scripts run
– Validation of information: name, SSN, degree(s), grant(s); sources
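The identification step in this example, two different middle initials on one PERSON_ID, is the kind of check that can be scripted. The sketch is hypothetical: the slide does not show the TA module or the scripts that were run, and the (person_id, first, middle, last) row layout is assumed.

```python
from collections import defaultdict


def conflicting_middle_initials(name_rows):
    """Return {person_id: sorted initials} for people recorded with 2+ middle initials.

    name_rows: iterable of (person_id, first, middle, last) tuples; this layout is
    a hypothetical stand-in for however the profile stores name variants.
    """
    initials = defaultdict(set)
    for person_id, _first, middle, _last in name_rows:
        middle = (middle or "").strip()
        if middle:  # a missing middle name is not treated as a conflict
            initials[person_id].add(middle[0].upper())
    return {pid: sorted(found) for pid, found in initials.items() if len(found) > 1}
```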

Slide 10. Data Correction: An Example
PERSON_ID=3070908 (PPRF record)
– Implementation of correction: grants report submitted to NIH OD
– Tracking of correction: internal tracking system
– Post-correction: loss of control of data

Slide 11. Developing a Data Quality Business Plan

Slide 12. Focus of Our Activities
– Examination of the database, procedures, and interface
– Development of modified use cases (Unified Modeling Language)
– Identification and extraction of business rules
– Identification of business model

Slide 13. Data Quality Issues
– Type-over of information
– Generation of duplicate persons
– Collapsing
– Changes in degree and address data
– Generation of orphans
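One way orphans can arise is during collapsing. If a duplicate person is merged away without re-pointing the applications that referenced it, those applications are left without a PI record. The sketch below assumes that reading of "orphans". It uses hypothetical in-memory dicts in place of database tables.

```python
def collapse(persons, applications, keep_id, drop_id, repoint=True):
    """Merge person drop_id into keep_id.

    persons: dict of person_id -> name.
    applications: dict of application_id -> PI person_id.
    With repoint=False the dropped person's applications are not updated,
    which is one way orphans can be produced.
    """
    del persons[drop_id]
    if repoint:
        for app_id, pi_id in applications.items():
            if pi_id == drop_id:
                applications[app_id] = keep_id  # updating a value, not adding a key


def find_orphans(persons, applications):
    """Return application IDs whose PI person_id no longer exists."""
    return sorted(app_id for app_id, pi_id in applications.items() if pi_id not in persons)
```

In a relational database, a foreign-key constraint from applications to persons would reject the orphaning delete outright rather than detect it afterwards.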

Slide 14. Type-Over Practices
Intentions:
– Assign a new principal investigator (PI) to a grant
– Change the name of a PI on a grant
– Correct a misspelled name
Consequences:
– Inclusion of incorrect information in a person profile
– Absence of linkages between PIs and grant applications
– Creation of false linkages between PIs and grant applications
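The consequences listed above follow from where the edit lands. This hypothetical sketch contrasts two ways of making the same change. The first types a new PI's name over an existing person record. The second re-points the grant to a different person record. The dict-based model and the function names are illustrative only.

```python
def type_over(persons, person_id, new_name):
    """Anti-pattern: overwrite an existing person's name in place."""
    persons[person_id] = new_name


def reassign_pi(grants, grant_id, new_pi_id):
    """Intended change: point one grant at a different person record."""
    grants[grant_id] = new_pi_id


def pi_name(persons, grants, grant_id):
    """Name shown as PI for a grant."""
    return persons[grants[grant_id]]
```

Suppose person 10 (Smith, John) is PI on grants G1 and G2, and "Doe, Jane" is typed over person 10 to move G1 to her. The result matches all three consequences on the slide:
– John Smith's profile now carries her name (incorrect profile information).
– Both grants now show her as PI (a false linkage on G2).
– Her own person record is still linked to nothing (absent linkage).
By contrast, reassign_pi changes only G1.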

Slide 15. Factors Affecting Quality
– Relatively easy access to person-related data elements
– Lack of self-validation routines
– Interface issues

Slide 16. Solutions
– Restricted access
– Quality control validation
– Interface simplification
– Self-validation algorithm

Slide 17. Data Quality Validation
Who does it?
– ICs
– A Quality Assurance group
– Other
How is it done?
– Staging areas
– Manual and intelligent filtering
– Architecture
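A staging area with intelligent filtering could route each proposed change either to automatic approval or to a manual review queue. The rules below are assumptions for illustration:
– SSN changes always go to review.
– A name change is auto-approved only when it looks like a small spelling fix (a difflib ratio of at least 0.8, an arbitrary cutoff).
– All other fields pass automatically.
The field names are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical field names and cutoff; adjust to the real schema.
NAME_FIELDS = {"first_name", "middle_name", "last_name"}
SPELLING_FIX_RATIO = 0.8


@dataclass
class ProposedChange:
    person_id: int
    field_name: str
    old_value: str
    new_value: str


def needs_review(change):
    """Decide whether a staged change must go to a human reviewer."""
    if change.field_name == "ssn":
        return True  # identifier changes are always reviewed
    if change.field_name in NAME_FIELDS:
        ratio = SequenceMatcher(None, change.old_value.lower(), change.new_value.lower()).ratio()
        # A small edit looks like a spelling fix; a large one looks like type-over.
        return ratio < SPELLING_FIX_RATIO
    return False


def filter_staged(changes):
    """Split staged changes into (auto_approved, manual_review) lists."""
    approved, review = [], []
    for change in changes:
        (review if needs_review(change) else approved).append(change)
    return approved, review
```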

Slide 18. GM Module Screen GM1040

Slide 19. GM Module Screen COM1100

Slide 20. Self-Validation
– Name-matching algorithm
– Consistency checking
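Consistency checking can start with structural rules that need no reference data. This sketch checks required name fields and SSN structure. The SSN rules reflect values the SSA has never assigned: area 000, 666, or 900 to 999; group 00; serial 0000. The record layout and function names are hypothetical.

```python
import re

# Area, group, and serial number; hyphens optional.
SSN_PATTERN = re.compile(r"^(\d{3})-?(\d{2})-?(\d{4})$")


def ssn_problems(ssn):
    """List structural problems with an SSN string (empty list if none)."""
    match = SSN_PATTERN.match(ssn.strip())
    if not match:
        return ["SSN must have 9 digits"]
    area, group, serial = match.groups()
    problems = []
    if area in ("000", "666") or area >= "900":  # string compare works for 3 digits
        problems.append("invalid area number")
    if group == "00":
        problems.append("invalid group number")
    if serial == "0000":
        problems.append("invalid serial number")
    return problems


def check_person(record):
    """Run simple consistency checks on a person record (a dict)."""
    problems = [f"missing {f}" for f in ("first_name", "last_name") if not record.get(f, "").strip()]
    problems += ssn_problems(record.get("ssn", ""))
    return problems
```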

Slide 21. Higher-Level Analysis
The following are being examined for their effect on quality:
– Commons interface with IMPAC II
– Database redundancy
– Business rules in the database
– Master person file
– Front-end design
– Human factors
– Ownership

Slide 22. Development of a Data Quality Model

Slide 23. Quality Improvements Plan for Personal Identifiers
Major Goals:
– Evaluate the different identification algorithms currently in use for IMPAC II
– Develop identification algorithm(s) and procedures
– Serve as consultant and guarantor of the efficacy of the algorithm implementation
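Evaluating the identification algorithms in use requires a common yardstick. One option, assumed here rather than taken from the slides, is to score each algorithm's "same person" pairs against a hand-labeled sample. Precision measures how many flagged pairs are real matches. Recall measures how many real matches were flagged.

```python
def evaluate(predicted_pairs, true_pairs):
    """Precision and recall of a matcher's "same person" pairs against a labeled sample.

    Pairs are unordered: (1, 2) and (2, 1) count as the same pair.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    truth = {frozenset(p) for p in true_pairs}
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall
```

Low precision means the algorithm would drive false collapses. Low recall means duplicates slip through. The two need to be weighed separately rather than folded into one number.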

Slide 24. Moving Forward
– Understanding the technical infrastructure
– Identification of specific areas of concern
– Development/proposal of data quality expectations
– Development/proposal of appropriate, acceptable solutions

Slide 25. Data Quality White Paper
"Knowledge assets are very real and carry tremendous value."
Outline:
– Definition
– Rules
– Risks and Costs
– NIH Expectations
– Process
– Measurements/Metrics
– Testing
– Continuous Improvements
– Conclusions

Slide 26. Conclusion
– Development/Proposal of Data Quality Expectations
– Development/Proposal of Appropriate, Acceptable Solutions
– Identification of Specific Areas of Concern
– Understanding the Technical Infrastructure
– Examination of the Database, Procedures, and Interface
– Development of Modified Use Cases
– Unified Modeling Language
– Identification and Extraction of Business Rules
– Identification of Business Model
