1
Practical Considerations for Data Validation
NJ CDISC User Group, Novartis, East Hanover, NJ
Sergiy Sirichenko
February 27, 2017
2
Abbreviations
CT – CDISC Controlled Terminology
FN – False Negative
FP – False Positive
P21C – Pinnacle 21 Community Validator
SDRG – Study Data Reviewer's Guide
TCG – FDA Study Data Technical Conformance Guide
3
Why validation?
Today, data conformance is a standard and required part of regulatory submissions
Mistakes in data validation can be very costly
4
Potential business case
A large pharma company filed a submission to PMDA. It was returned due to data issues
The sponsor was not aware that PMDA has special Rejection business rules
Some of them are related to CDISC CT, in particular to FLAG variables
P21 uses CDISC CT ODM files with extensions for validation needs. For example, a (FLAG) codelist with a single term ‘Y’ is introduced in addition to the (NY) codelist
The sponsor used the original CDISC CT ODM files in P21C, or used an outdated version of P21C
As a result, some PMDA rejection rules were not executed due to the missing (FLAG) codelist in the configuration files
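The failure mode above is silent: nothing errors out, the rule simply never runs. A minimal Python sketch, assuming a hypothetical rule engine that looks up a codelist by name and skips the rule when the codelist is absent. The function name, the data layout and the use of LBBLFL as the example FLAG variable are illustrative assumptions, not P21 internals.

```python
# Hypothetical sketch: a "value must be in codelist" rule that is skipped
# when its codelist is missing from the loaded configuration.

def run_codelist_rule(records, variable, codelist_name, codelists):
    """Return (executed, findings) for a simple codelist membership rule."""
    terms = codelists.get(codelist_name)
    if terms is None:
        # Missing codelist: the rule is not executed and reports nothing.
        return False, []
    findings = [
        (i, rec[variable])
        for i, rec in enumerate(records)
        if rec.get(variable) not in (None, "") and rec[variable] not in terms
    ]
    return True, findings


records = [{"LBBLFL": "Y"}, {"LBBLFL": "N"}, {"LBBLFL": ""}]

original_ct = {"NY": {"N", "Y", "U", "NA"}}      # CT without the validation extension
extended_ct = {**original_ct, "FLAG": {"Y"}}     # adds the single-term (FLAG) codelist

print(run_codelist_rule(records, "LBBLFL", "FLAG", original_ct))
print(run_codelist_rule(records, "LBBLFL", "FLAG", extended_ct))
```

With the original CT the rule reports no findings because it never ran; with the extended CT it flags the 'N' value. An empty report is therefore not evidence of clean data unless the configuration is known to be complete.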
5
Most submissions have major problems with data validation
Typical examples are:
  Incorrect installation and configuration of the Validator
  Incorrect execution of data validation
  Missing validation for define.xml
  Incorrect interpretation of validation results
  Lack of understanding of the validation process
  Invalid, incorrect, or irrelevant explanations for reported validation findings
  Lack of knowledge of how to evaluate and handle particular data issues
6
Major reason – lack of knowledge
Data conformance is a new concept
The simplicity and ease of use of P21C may be misleading in terms of its correct use
Lack of documentation and educational resources for data validation
7
“Good Data Validation Practice”
Our goal is to fix these deficiencies by introducing dedicated education and training:
  What is data validation?
  How to configure the Validator?
  How to perform data validation?
  How to interpret validation results?
  How to evaluate the risk of data issues?
  How to fix data errors?
  How to explain data issues?
8
Standards and data validation
Originally, OpenCDISC Validator was created to help with the implementation of CDISC standards
Standardized data allows automation (reuse of programming code), including data validation
Today vendors can provide data in SDTM format within 1 week after the first subject's first visit
FDA/PMDA now have the capability to check and enforce data quality
Data Conformance is a required part of the Reviewer’s Guide
9
FDA definition for Data Validation
“Data validation is a process that attempts to ensure that submitted data are both compliant and useful. Compliant means the data conform to the applicable and required standards. Useful means that the data can support the intended use (i.e., regulatory review and analysis).”
*Source: FDA Study Data Technical Conformance Guide
10
Source: http://www.gphaonline
11
Source: http://www.gphaonline
12
Source: http://www.phusewiki
13
Changing landscape - focus is moving from Compliance to Data Quality
Adoption of data standards has been accomplished
Compliance is a commodity now
Compliance is a prerequisite for data processing (SDTM is similar to XPT)
The major value of data is its content, not its format
A drug does not work because of the SDTM format
Reviewers are interested in Data Quality
FDA has outsourced Standard Compliance to CDISC
14
Executable Checks vs. Manual Review
P21 Errors – checks based on an executable algorithm that produce issue reports with 100% confidence
  Start Date is after End Date
  New terms in Non-Extensible CT
P21 Warnings – reports of potential data issues for manual review
  New terms in Extensible CT
  Missing Units on Results
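To illustrate the distinction, here is a minimal Python sketch of the four examples above, assuming records are plain dicts keyed by SDTM variable names. The function names and the example codelist are hypothetical, and the date check only handles complete YYYY-MM-DD values (partial ISO 8601 dates are skipped rather than interpreted).

```python
from datetime import date


def start_after_end(records, start_var, end_var):
    """Error-style check: a start date after the end date is always an issue.

    Sketch assumption: only complete dates are compared; partial dates are skipped.
    """
    findings = []
    for i, rec in enumerate(records):
        start, end = rec.get(start_var) or "", rec.get(end_var) or ""
        if len(start) >= 10 and len(end) >= 10:
            if date.fromisoformat(start[:10]) > date.fromisoformat(end[:10]):
                findings.append(i)
    return findings


def new_terms(values, codelist, extensible):
    """New terms are Errors for non-extensible CT and Warnings for extensible CT."""
    severity = "Warning" if extensible else "Error"
    return [(term, severity) for term in sorted(set(values) - set(codelist))]


def missing_units(records, result_var, unit_var):
    """Warning-style report: a result without units may be legitimate, so review it."""
    return [i for i, rec in enumerate(records) if rec.get(result_var) and not rec.get(unit_var)]


ae = [
    {"AESTDTC": "2016-03-05", "AEENDTC": "2016-03-01"},
    {"AESTDTC": "2016-03-01", "AEENDTC": "2016-03-02"},
    {"AESTDTC": "2016-03", "AEENDTC": "2016-03-02"},  # partial date, skipped
]
lb = [{"LBORRES": "5.2", "LBORRESU": "mmol/L"}, {"LBORRES": "7.1", "LBORRESU": ""}]

print(start_after_end(ae, "AESTDTC", "AEENDTC"))
print(new_terms(["MILD", "SEVERE", "VERY SEVERE"], {"MILD", "MODERATE", "SEVERE"}, extensible=False))
print(missing_units(lb, "LBORRES", "LBORRESU"))
```

The first check is deterministic, so every hit is a real issue; the missing-units report only nominates records, because some results (for example, qualitative ones) legitimately have no units.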
15
Reports for DQ assessment
P21 Reports* – additional diagnostics, a source of useful information
  Death info reconciliation
  Quality of MedDRA Coding
  Content of SUPPQUALs
  Missing BASELINE (all records for the reported subject)
*Available only in the Enterprise edition
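The "Missing BASELINE" report lists all records for a subject rather than a single finding. The P21 Enterprise report itself is not described here, so this is only a hypothetical Python sketch of the idea, assuming LB-style data where LBBLFL = "Y" marks the baseline record and grouping by subject and test code.

```python
from collections import defaultdict


def missing_baseline_report(records, test_var="LBTESTCD", flag_var="LBBLFL"):
    """Return all records for each (subject, test) that has no baseline-flagged record.

    Hypothetical sketch: variable names default to the LB domain.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["USUBJID"], rec[test_var])].append(rec)
    return {
        key: groups[key]
        for key in sorted(groups)
        if not any(r.get(flag_var) == "Y" for r in groups[key])
    }


lb = [
    {"USUBJID": "ABC-001", "LBTESTCD": "GLUC", "LBBLFL": "Y"},
    {"USUBJID": "ABC-001", "LBTESTCD": "GLUC", "LBBLFL": ""},
    {"USUBJID": "ABC-002", "LBTESTCD": "GLUC", "LBBLFL": ""},
    {"USUBJID": "ABC-002", "LBTESTCD": "GLUC"},
]

for key, recs in missing_baseline_report(lb).items():
    print(key, len(recs), "records")
```

Returning every record for the reported subject, instead of one line per subject, gives the reviewer the context needed to decide whether the baseline was never collected or simply not flagged.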
16
Major implementation challenge
Data quality is defined by intended use, as the absence of errors that matter
All users are different
Each user is interested only in the data issues directly relevant to him or her
  Janus Data Warehouse
  FDA JumpStart analyst
  Reviewer
Level of detail
  Project Manager vs. Programmer vs. Data Manager
17
“Severity” is user-specific
FDA vs. PMDA
  PMDA Rejection rules
FDA/PMDA Severity is the risk for a particular use
There are many different users with different needs and assessments of Severity
P21 Severity is a property of the check algorithm
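One way to reconcile an algorithm-level severity with user-specific risk is a per-user override layer. The following Python sketch is hypothetical throughout: the check IDs, severity labels and user names are invented for illustration and do not correspond to P21 configuration.

```python
# Hypothetical check IDs and algorithm-level severities, for illustration only.
ALGORITHM_SEVERITY = {"CHECK_A": "Error", "CHECK_B": "Warning", "CHECK_C": "Warning"}

# Hypothetical per-user overrides: the same finding carries a different risk per use.
USER_OVERRIDES = {
    "PMDA reviewer": {"CHECK_B": "Reject"},
    "Data Manager": {"CHECK_C": "Ignore"},
}


def severity_for(check_id, user):
    """Return the user-specific severity, falling back to the algorithm's severity."""
    return USER_OVERRIDES.get(user, {}).get(check_id, ALGORITHM_SEVERITY[check_id])


for user in ("FDA reviewer", "PMDA reviewer", "Data Manager"):
    print(user, {check: severity_for(check, user) for check in ALGORITHM_SEVERITY})
```

The check algorithm stays the same for everyone; only the interpretation of its output changes with the user.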
18
Message and description are user-specific
P21 checks are designed for Programmers:
  “Inconsistent LBSTRESU”
  “Missing Seriousness Criteria for SAE”
Examples of Reviewer-friendly messages:
  “Conversion of collected results into the same units was not done for all records”
  “No seriousness qualifiers (AES*) were collected for all Serious Adverse Events (AESER=Y)”
19
User-specific messages
Examples of Data Management-friendly messages:
  “Subject ABC-001 has potentially invalid Visit Dates: VISIT 2: , VISIT 3: ”
  “Seriousness Criteria are missing for ‘Myocardial Infarction’ Adverse event of subject ABC-001 started on and flagged by Investigator as Serious”
The message includes all the diagnostic information needed for tracking, issue resolution, or generating a DM query
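The same finding can be rendered differently per audience. A minimal Python sketch, assuming AE records as dicts and treating an SAE as lacking criteria when none of the SDTM seriousness qualifiers (AESCAN, AESCONG, AESDISAB, AESDTH, AESHOSP, AESLIFE, AESOD, AESMIE) is populated. The function names, audience labels and example data are hypothetical.

```python
# SDTM AE seriousness qualifiers (the AES* variables).
SERIOUSNESS_VARS = ["AESCAN", "AESCONG", "AESDISAB", "AESDTH", "AESHOSP", "AESLIFE", "AESOD", "AESMIE"]


def sae_without_criteria(ae_records):
    """Serious AEs (AESER=Y) for which no seriousness qualifier was collected."""
    return [
        r for r in ae_records
        if r.get("AESER") == "Y" and not any(r.get(v) for v in SERIOUSNESS_VARS)
    ]


def message(rec, audience):
    """Render one finding as a programmer-style or a DM-style message."""
    if audience == "Programmer":
        return "Missing Seriousness Criteria for SAE"
    # DM-style: carry the identifiers needed to track the issue or raise a query.
    return (
        f"Seriousness Criteria are missing for '{rec['AETERM']}' adverse event of subject "
        f"{rec['USUBJID']} started on {rec['AESTDTC']} and flagged by Investigator as Serious"
    )


ae = [
    {"USUBJID": "ABC-001", "AETERM": "Myocardial Infarction", "AESTDTC": "2016-05-10", "AESER": "Y"},
    {"USUBJID": "ABC-002", "AETERM": "Headache", "AESTDTC": "2016-05-11", "AESER": "Y", "AESHOSP": "Y"},
]

for rec in sae_without_criteria(ae):
    print(message(rec, "Programmer"))
    print(message(rec, "Data Manager"))
```

The detection logic is shared; only the wording changes, so adding a new audience does not require touching the check itself.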
20
Scope of P21C validation checks
It is impossible to please everybody with a simple solution
Who is the “final” user?
  CDISC? Sponsors do not submit data to CDISC
  FDA/PMDA Reviewers!
“Minimal standard compliance”
  Need for review-specific data elements: EPOCH, --BLFL, etc.
  FDA may override the CDISC standard. For example, a missing value in the Required ARMCD variable for non-randomized subjects
Data Quality
21
Review specific checks
SD2236: ACTARMCD does not equal ARMCD
This is not a violation of SDTM compliance
It is quite a common case in study conduct
Goal: to report all subjects who received the wrong study treatment and provide explanations to Reviewers in the SDRG
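The goal of SD2236 is a reviewer-facing list, not a compliance fix. A minimal Python sketch, assuming DM records as dicts; the helper names, the explanation text and the row layout for the SDRG are hypothetical.

```python
def arm_mismatches(dm_records):
    """List subjects whose actual arm (ACTARMCD) differs from the planned arm (ARMCD)."""
    return [
        (r["USUBJID"], r["ARMCD"], r["ACTARMCD"])
        for r in dm_records
        if r.get("ARMCD") != r.get("ACTARMCD")
    ]


def sdrg_lines(mismatches, explanations):
    """Draft SDRG rows; subjects without an explanation are marked for follow-up."""
    return [
        f"{subj}: planned {planned}, actual {actual}: {explanations.get(subj, 'EXPLANATION NEEDED')}"
        for subj, planned, actual in mismatches
    ]


dm = [
    {"USUBJID": "ABC-001", "ARMCD": "A", "ACTARMCD": "A"},
    {"USUBJID": "ABC-002", "ARMCD": "A", "ACTARMCD": "B"},
    {"USUBJID": "ABC-003", "ARMCD": "B", "ACTARMCD": "A"},
]
explanations = {"ABC-002": "Kit dispensing error at Visit 2"}

for line in sdrg_lines(arm_mismatches(dm), explanations):
    print(line)
```

Marking unexplained subjects explicitly keeps the finding open until someone supplies a reason for the SDRG.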
22
P21 Community
P21C is a personal desktop tool for SAS Programmers to QC their work for regulatory submissions
For everything else, there is P21E (Pinnacle 21 Enterprise)
23
Source of data validation checks
Standards specifications
New CDISC Validation Rules documents
FDA/PMDA business rules
Data management
Tool-specific requirements
Additional requests from users
There are no “exact” implementations due to multiple stakeholders and programming limitations
24
True and false “false-positive” validation messages
Issue exists, and it is reported – True Positive
Issue does not exist, but it is reported – False Positive
Issue exists, but it is not reported – False Negative
Issue does not exist, and it is not reported – True Negative
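The four outcomes above depend on two independent facts: whether the issue is real and whether it was reported. A minimal Python sketch of that mapping; the reviewed pairs are hypothetical outcomes of a manual review, not real validation results.

```python
from collections import Counter


def classify(issue_exists: bool, reported: bool) -> str:
    """Map a (real issue?, reported?) pair onto the four outcomes."""
    if issue_exists:
        return "True Positive" if reported else "False Negative"
    return "False Positive" if reported else "True Negative"


# Hypothetical manual-review outcomes: (issue exists, reported by the validator).
reviewed = [(True, True), (False, True), (True, False), (False, False), (True, True)]
print(Counter(classify(exists, rep) for exists, rep in reviewed))
```

Note that a False Negative never appears in a validation report; it can only be counted when some other source, such as a manual review, reveals the issue.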
25
Diagnostic tests
[Figure: blood sugar levels (Low/Normal to High/Abnormal) in diabetics and non-diabetics; a test cut-point divides results into True Positive, False Positive, True Negative and False Negative]
26
Balancing between FP and FN
All P21 Warning checks produce False Positive messages and can also miss real issues (False Negatives) due to the nature of these business rules
Tuning the algorithms yields different specificities and sensitivities in detecting real data errors
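Tuning a Warning check is similar to moving the cut-point of a diagnostic test. A minimal Python sketch, assuming a numeric score per record and a known "true issue" label for each; the data are hypothetical and both classes must be present to avoid division by zero. Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP).

```python
def sens_spec(values, has_issue, cut_point):
    """Sensitivity and specificity when values >= cut_point are flagged."""
    tp = fp = tn = fn = 0
    for value, truth in zip(values, has_issue):
        flagged = value >= cut_point
        if truth and flagged:
            tp += 1
        elif truth:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores and true labels (assumes both classes are present).
values = [90, 100, 110, 120, 130, 140, 150, 160]
has_issue = [False, False, False, True, False, True, True, True]

for cut in (115, 125, 145):
    print(cut, sens_spec(values, has_issue, cut))
```

A lower cut-point catches every real issue at the cost of more False Positives; a higher one removes False Positives but starts missing real issues. Which trade-off is right depends on the cost of each kind of error.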
27
What to do?
Minimize False Positives
  if confirmatory diagnostics for Warnings are expensive, for example, extra work for data vendors or missed planned timelines
Maximize True Positives (minimize False Negatives)
  if the penalty for unfixed errors is high, for example, a delay in submission review or compromised study results
28
Almost nobody complains about False Negatives
However, an issue is still an issue whether it is reported or not
  Incorrect algorithms for existing checks
  New checks not yet implemented
Example: “Date is after RFPENDTC”
  The check was introduced in OpenCDISC v1.4.1 and removed in v2.0, which was limited to official FDA checks
  However, it is still part of P21E and FDA DataFit
  Most studies have problems with this rule
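A minimal Python sketch of a "Date is after RFPENDTC" check, assuming DM and a findings domain as lists of dicts. The function name is hypothetical, and this is not the P21E or DataFit implementation: it compares only the date part of values with at least full-date precision and skips partial dates, which a production check would need to handle.

```python
from datetime import date


def dates_after_rfpendtc(records, dm, date_var):
    """Report records whose date_var is after the subject's RFPENDTC.

    Sketch assumption: only values with full date precision (YYYY-MM-DD...)
    are compared; partial dates are skipped.
    """
    end_by_subject = {r["USUBJID"]: r.get("RFPENDTC") or "" for r in dm}
    findings = []
    for rec in records:
        end = end_by_subject.get(rec["USUBJID"], "")
        dtc = rec.get(date_var) or ""
        if len(end) >= 10 and len(dtc) >= 10:
            if date.fromisoformat(dtc[:10]) > date.fromisoformat(end[:10]):
                findings.append((rec["USUBJID"], date_var, dtc, end))
    return findings


dm = [{"USUBJID": "ABC-001", "RFPENDTC": "2016-06-30"}]
lb = [
    {"USUBJID": "ABC-001", "LBDTC": "2016-06-15"},
    {"USUBJID": "ABC-001", "LBDTC": "2016-07-02T09:30"},
    {"USUBJID": "ABC-001", "LBDTC": "2016-07"},  # partial date, skipped
]

print(dates_after_rfpendtc(lb, dm, "LBDTC"))
```

Every record after the end of participation is reported, whether it is a data error or a legitimate follow-up assessment; deciding which is which is the manual review step.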
29
False Positive messages
Warnings are not Checks, but Reports for review
User-specific issues
  “If I don’t care about it, then it’s not an issue!”
A programming bug is a bug
  Report bugs to P21 for a fast fix
  Check the list of known bugs
  Use the auto-update functionality for patch releases
  BETA releases are introduced with new checks
An invalid business rule leads to an invalid P21 check
30
CDISC ADaM Checks 1.3 example
#279: AESEVN is not equal to 1, 2, 3, or null
#282: ASEVN is not equal to 1, 2, or 3
#281: There is more than one value of AESEVN for a given value of AESEV (AESEVN & AESEV 1:1 map)
#190: A variable with a prefix of R2A and a suffix of LO has a y fragment appended after R2A that is not a single-digit integer [1-9]
  Expectations are R2A1LO, R2A2LO, …
  But apparently R2ALO is also a legal name
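A literal implementation of a business rule inherits the rule's gaps. A minimal Python sketch of checks #190 and #281 as worded above; the relaxed pattern is a hypothetical tuning, not the CDISC rule text, and the function names are illustrative.

```python
import re

# Literal reading of #190: after "R2A" there must be a single digit 1-9, then "LO".
STRICT_R2A_LO = re.compile(r"^R2A[1-9]LO$")
# Hypothetical relaxed pattern that also accepts R2ALO (no index).
RELAXED_R2A_LO = re.compile(r"^R2A[1-9]?LO$")


def check_190(names, pattern):
    """Return R2A...LO variable names that the given pattern would flag."""
    return [n for n in names if n.startswith("R2A") and n.endswith("LO") and not pattern.match(n)]


def check_281(records):
    """Return AESEV values mapped to more than one AESEVN value (expected 1:1)."""
    seen = {}
    for r in records:
        seen.setdefault(r["AESEV"], set()).add(r["AESEVN"])
    return sorted(sev for sev, nums in seen.items() if len(nums) > 1)


names = ["R2A1LO", "R2A2LO", "R2ALO"]
print(check_190(names, STRICT_R2A_LO))
print(check_190(names, RELAXED_R2A_LO))

adae = [
    {"AESEV": "MILD", "AESEVN": 1},
    {"AESEV": "MILD", "AESEVN": 2},
    {"AESEV": "SEVERE", "AESEVN": 3},
]
print(check_281(adae))
```

Under the strict pattern the legal name R2ALO becomes a False Positive: the code is correct, but the rule it implements is not.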
31
Continuous tuning of checks is expected
ARMCD example
  FDA asked to populate a missing value for ARMCD instead of SCRNFAIL, NOTASSGN
  Many checks use filters for Screen Failure and Not Treated subjects
  A missing value needs to be added to the ARMCD filter
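A minimal Python sketch of the filter change, assuming checks skip subjects through a predicate on DM.ARMCD. The predicate names and data are hypothetical; the point is that a subject with a missing ARMCD falls through the original filter and can trigger False Positives in checks meant to skip screen failures.

```python
EXCLUDED_ARMCD = {"SCRNFAIL", "NOTASSGN"}


def excluded_before(subject):
    """Original filter: only coded values identify screen failures / not assigned."""
    return subject.get("ARMCD") in EXCLUDED_ARMCD


def excluded_after(subject):
    """Tuned filter: a missing ARMCD also marks a subject who was never assigned."""
    armcd = (subject.get("ARMCD") or "").strip()
    return armcd == "" or armcd in EXCLUDED_ARMCD


dm = [
    {"USUBJID": "ABC-001", "ARMCD": "A"},
    {"USUBJID": "ABC-002", "ARMCD": "SCRNFAIL"},
    {"USUBJID": "ABC-003", "ARMCD": ""},
]

print([s["USUBJID"] for s in dm if not excluded_before(s)])
print([s["USUBJID"] for s in dm if not excluded_after(s)])
```

With the original filter, ABC-003 is still checked as if it were a randomized subject; the tuned filter excludes it.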
32
Summary
Data validation is an important and required process
Most studies have deficiencies in data conformance
  Due to a lack of knowledge
  P21 started “Good Data Validation Practice” efforts
Data Quality is defined by Intended Use
  All users are different
P21C is a personal desktop tool for SAS programmers to QC their work for FDA/PMDA submissions
P21 Errors are Checks; Warnings are Reports to review
33
Contact info: Sergiy Sirichenko