Validation of CDISC data sets, current practice and future Jozef Aerts University of Applied Sciences FH Joanneum XML4Pharma
Who is Jozef Aerts? CDISC Volunteer since 2002 Teaching Medical Informatics in Graz Owner of XML4Pharma, a software and consultancy company Member of several CDISC development teams
Validation of CDISC datasets - current practice SDTM SEND ADaM define.xml
Validation of CDISC datasets - current practice Based on (arbitrary) interpretation of the standards documents and implementation guides By a single company With extensions from FDA and PMDA
Validation of CDISC datasets - the problem CDISC never published any validation rules (except recently for ADaM) These rules are not machine-readable CDISC Implementation guides do not provide clear rules
The old solution A company stepped into the gap left by CDISC's lack of initiative and developed validation software Originally released as "open source" The new license, however, strongly limits user rights Available as "Enterprise" and as "Community" edition The tool is also used by FDA and PMDA
The old solution - problems The tool used by FDA and PMDA reflects one company's interpretation of the standards and IGs Not necessarily the interpretation of CDISC (teams) Leading to many discussions about what exactly the rules are Rule implementations are not transparent You cannot see how a rule is implemented (unless you look into the source code) Many false positive errors FDA/PMDA rules sometimes contradict the IG Some of them look more like the result of a "complaint box" Copyright Charles Hope, Flickr https://www.flickr.com/photos/charleshope/4056571043/
The old solution - problems - examples https://www.pinnacle21.net/forum
Solution for the problems with the old solution It is advised to document any deviations (including false positive errors) in the "Reviewers Guide" Is this the ideal world?
The future … CDISC has established "Validation Rule Teams" For ADaM => rules already published For SDTM => work in progress These rules still come as Excel worksheets or PDF documents So not readily machine-readable (although there are some first attempts), and therefore possibly open to different interpretations Idea/principle: what is not in the set of rules is not a rule Validation tools should not add other/new ones Except when mandated by the FDA, PMDA, …
CDISC SDTM validation rules (provisional)
The role of define.xml Define.xml is "the sponsor's truth" Helps the reviewers understand the submission One cannot validate submission datasets without define.xml Should validation be a 2-step mechanism? Step 1: validate contents of define.xml against SDTM/ADaM/SEND standard Step 2: validate contents of dataset against define.xml
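The 2-step mechanism can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the dictionaries stand in for the standard's metadata and for the contents of define.xml, the function names are hypothetical, and a real check would compare far more than variable names.

```python
# Hypothetical, simplified stand-in for the variables a standard allows per
# dataset. A real tool would read this from the standard's own metadata.
STANDARD_VARIABLES = {"DM": {"STUDYID", "DOMAIN", "USUBJID", "SEX", "AGE"}}


def validate_define(define_vars, standard_vars):
    """Step 1: report define.xml content that the standard does not know."""
    issues = []
    for dataset, variables in define_vars.items():
        allowed = standard_vars.get(dataset)
        if allowed is None:
            issues.append(f"{dataset}: dataset not in standard")
            continue
        for name in sorted(variables - allowed):
            issues.append(f"{dataset}.{name}: variable not in standard")
    return issues


def validate_dataset(dataset, records, define_vars):
    """Step 2: report record variables that define.xml does not declare."""
    declared = define_vars.get(dataset, set())
    issues = []
    for number, record in enumerate(records, start=1):
        for name in sorted(set(record) - declared):
            issues.append(f"{dataset} record {number}: {name} not in define.xml")
    return issues
```

The point of the split is that step 2 never consults the standard directly: the dataset is judged only against what the sponsor declared in define.xml.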
Machine-readable rules Machine-readable rules should also be human-readable Reason: TRANSPARENCY Must have a precondition and a postcondition When does the rule apply? What is the consequence of the rule being violated (Error, Warning, …)? In case of XML, XML-Schema is always the first step But we are still using SAS-XPT!
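A rule with an explicit precondition, postcondition and severity could be modelled roughly as follows. This is a Python sketch, not a proposed format: the class, the rule ID "EX001" and the AGE/AGEU check are illustrative assumptions and are not taken from any published rule set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Rule:
    rule_id: str
    description: str
    severity: str                            # e.g. "Error" or "Warning"
    precondition: Callable[[Record], bool]   # when does the rule apply?
    postcondition: Callable[[Record], bool]  # what must hold when it applies?


def apply_rule(rule: Rule, records: List[Record]) -> List[str]:
    """Return one message per record where the rule applies and is violated."""
    messages = []
    for number, record in enumerate(records, start=1):
        if rule.precondition(record) and not rule.postcondition(record):
            messages.append(
                f"{rule.severity} {rule.rule_id}, record {number}: {rule.description}"
            )
    return messages


# Hypothetical example rule, for illustration only.
EXAMPLE_RULE = Rule(
    rule_id="EX001",
    description="AGEU must be populated when AGE is populated",
    severity="Error",
    precondition=lambda r: r.get("AGE", "") != "",
    postcondition=lambda r: r.get("AGEU", "") != "",
)
```

Keeping the precondition separate from the postcondition makes it visible when a rule does not apply at all, which is one source of false positives when the two are mixed.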
Machine-readable rules: define.xml Step 1: validation against the XML-Schema (also see "XML Schema Validation for Define.xml White Paper") ~ 50% of the rules Step 2: validation against Schematron ~ 40% of the rules
Machine-readable rules: Submission datasets Hmmm - it's still all in XPT … Validation rules programmed in SAS? Not vendor neutral … Anyone volunteering to do so? We need something else …
Machine-readable rules in XQuery XQuery = querying language for XML documents Is a W3C standard (W3C = World Wide Web Consortium) However: only applicable to XML But we can easily transform XPT to CDISC Dataset-XML FDA and PMDA will have to move to CDISC Dataset-XML some day anyway …
CDISC Dataset-XML CDISC's alternative to SAS-XPT
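Once the data are in XML, any standard XML tooling can read them. The sketch below uses Python's standard library rather than XQuery, purely to show the idea. The sample document is hand-written and heavily reduced (most required ODM attributes are omitted), and the OIDs are made up; treat the structure as an assumption to be checked against the Dataset-XML specification.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
DATA_NS = "http://www.cdisc.org/ns/Dataset-XML/v1.0"

# Hand-written, reduced sample with made-up OIDs; not a complete document.
SAMPLE = f"""<ODM xmlns="{ODM_NS}" xmlns:data="{DATA_NS}">
  <ClinicalData StudyOID="ST.EXAMPLE" MetaDataVersionOID="MDV.1">
    <ItemGroupData ItemGroupOID="IG.DM" data:ItemGroupDataSeq="1">
      <ItemData ItemOID="IT.DM.USUBJID" Value="EX-001"/>
      <ItemData ItemOID="IT.DM.SEX" Value="F"/>
    </ItemGroupData>
    <ItemGroupData ItemGroupOID="IG.DM" data:ItemGroupDataSeq="2">
      <ItemData ItemOID="IT.DM.USUBJID" Value="EX-002"/>
      <ItemData ItemOID="IT.DM.SEX" Value="M"/>
    </ItemGroupData>
  </ClinicalData>
</ODM>"""


def read_records(xml_text):
    """Return (dataset OID, record number, {item OID: value}) per record."""
    root = ET.fromstring(xml_text)
    records = []
    for group in root.iter(f"{{{ODM_NS}}}ItemGroupData"):
        values = {
            item.get("ItemOID"): item.get("Value")
            for item in group.findall(f"{{{ODM_NS}}}ItemData")
        }
        sequence = int(group.get(f"{{{DATA_NS}}}ItemGroupDataSeq"))
        records.append((group.get("ItemGroupOID"), sequence, values))
    return records
```

Each record carries its own record number and refers to its variables by OID, so a rule can look up the variable definition in define.xml without guessing.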
Validation rules in XQuery Human-readable & machine-executable No "wiggle room" for different interpretations Proven technology (W3C standard), vendor neutral, software language independent (Java, C#, C++, PHP, …) But only applicable to XML documents …
Validation rules in XQuery - Example Rule FDAC066: Invalid IDVAR: IDVAR must have a valid value of variables from the referenced domain Get the define.xml Get the RELREC, CO and SUPPxx dataset definitions Iterate over the RELREC, CO and SUPPxx datasets Get the physical location Get the IDVAR variable
Validation rules in XQuery - Example Rule FDAC066: Invalid IDVAR: IDVAR must have a valid value of variables from the referenced domain Iterate over all the records in the dataset Get the record number Get the value of the IDVAR Check whether it is defined in the define.xml Error message
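The steps listed for rule FDAC066 can be re-expressed as a short Python sketch. This is not the XQuery implementation from the slides: it assumes that define.xml has already been reduced to a dictionary mapping each domain to its defined variable names, that the referenced domain is taken from RDOMAIN, and that an empty IDVAR is outside the scope of the rule. The function name is hypothetical.

```python
def check_idvar(dataset_name, records, define_variables):
    """Flag IDVAR values that are not defined for the referenced domain.

    define_variables: {domain name: set of variable names}, assumed to have
    been extracted from define.xml beforehand.
    """
    errors = []
    for number, record in enumerate(records, start=1):
        idvar = record.get("IDVAR", "")
        if idvar == "":
            continue  # assumption: an empty IDVAR is not covered by this rule
        rdomain = record.get("RDOMAIN", "")
        if idvar not in define_variables.get(rdomain, set()):
            errors.append(
                f"FDAC066: Invalid IDVAR '{idvar}' in {dataset_name} "
                f"record {number}: not defined for domain '{rdomain}'"
            )
    return errors


# Illustrative data only.
DEFINE_VARIABLES = {"AE": {"STUDYID", "USUBJID", "AESEQ", "AETERM"}}
SUPPAE = [
    {"RDOMAIN": "AE", "IDVAR": "AESEQ", "IDVARVAL": "1"},
    {"RDOMAIN": "AE", "IDVAR": "AESPID", "IDVARVAL": "7"},
    {"RDOMAIN": "AE", "IDVAR": "", "IDVARVAL": ""},
]
```

Calling `check_idvar("SUPPAE", SUPPAE, DEFINE_VARIABLES)` reports only the second record, because AESPID is not among the variables defined for AE in this made-up metadata.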
This is work in progress … ~90% of FDA and PMDA rules done (SDTM/SEND) But ~10% of these rules are nonsense ~30% of CDISC-ADaM rules done Need help from community as I am not an ADaM specialist A few CDISC-SDTM rules done But overlap with FDA rules Available at: http://xml4pharmaserver.com/RulesXQuery/index.html Also available through a RESTful web service: "give me the latest version of rule XYZ …"
Shouldn't the rules be in the IG itself? Do we really need separate documents with the rules? Why aren't the rules described in the IG itself? But the IGs are PDFs, not machine-readable … But highly structured, so they could be …
The future? Machine-readable Implementation Guides? Some first attempts … - rules
The future? Machine-readable Implementation Guides? Some first attempts … - codelists
Conclusions Current validation rules and their implementation are unsatisfactory Not transparent One-sided interpretation Many false-positive errors CDISC teams want to take back control of the validation rules Publishing CDISC-owned validation rules In future, CDISC hopes to publish validation rules in human-readable, machine-executable format through SHARE Ideally, validation rules should be within a machine-readable IG
Links / Disclaimer http://cdisc-end-to-end.blogspot.com http://cdiscguru.blogspot.com The information in this presentation contains statements that are the personal opinion of Jozef Aerts and not necessarily the opinion of Jozef Aerts. None of the pictures in this presentation is owned by or requires a license from APA PictureDesk, a Vienna-based "Picture Troll" company.
Thank you for your attention! Contact: Jozef.Aerts@XML4Pharma.com