« Lost » in Traceability, from SDTM to ADaM … finally Analysis Results Metadata
Presented by Angelo Tinazzi, Cytel Inc., Geneva, Switzerland
CDISC Interchange Europe, Vienna, 27-28 April 2016
It is always a challenge to trace the results in the clinical study report back to the original raw data source. The challenge intensifies when raw data are converted to SDTM after the fact, while the analysis datasets and the study report still trace back to the original raw data source. This project will discuss and define traceability considerations and best practices for study-level and integrated dataset conversion across a variety of data flow scenarios.
Traceability - Definition
Traceability is the ability to verify the history, location, or application of an item by means of documented recorded identification. ("Glossary," ASME Boiler and Pressure Vessel Code, Section III, Article NCA-9000)
All clinical trial information should be recorded, handled, and stored in a way that allows its accurate reporting, interpretation and verification. (ICH E6 GCP, Section 2.10, “GCP Principles”)
The ASME code is an international code establishing rules of safety relating to pressure integrity in the fabrication of boilers, pressure vessels, and nuclear power plant components, and it mentions traceability in its quality assurance plan. We also have a kind of definition in GCP. In fact, throughout the clinical development process, documentation is key, together with the ability to trace back.
Traceability - Definition
An important component of a regulatory review is an understanding of the provenance of the data (i.e., traceability of the sponsor’s results back to the CRF data). Traceability permits an understanding of the relationships between the analysis results, analysis datasets, tabulation datasets, and source data. (FDA Study Data Technical Conformance Guide – March 2016)
FDA has mentioned several times the importance of traceability in the review process, in particular so that reviewers can understand the provenance of the data: how analysis datasets are built from the collected data, which observations and which algorithms were used to derive variables, and which statistical procedure options were used to calculate a particular statistical measure. The guide again puts the emphasis on being able to trace back from the analysis results to the analysis datasets and on to the source data.
Traceability - Definition
The property that enables the understanding of the data’s lineage and/or the relationship between an element and its predecessor(s). Traceability facilitates transparency, which is an essential component in building confidence in a result or conclusion. Ultimately, traceability in ADaM permits the understanding of the relationship between the analysis results, the ADaM datasets, the SDTM datasets, and the data collection instrument. Traceability is built by clearly establishing the path between an element and its immediate predecessor. The full path is traced by going from one element to its predecessors, then on to their predecessors, and so on, back to the SDTM datasets, and ultimately to the data collection instrument. (Analysis Data Model Implementation Guide v1.1 – February 2016)
The CDISC definition from the ADaM IG reinforces the concept of being able to trace back and to establish a clear path between each element and its predecessor, at every step from data collection to the production of the analysis results that will be used in the CSR.
Traceability – The Why, What and How
Why: transparency and understanding
What: Analysis results → ADaM → SDTM → CRF, or CRF → SDTM → ADaM → Analysis results
How: by establishing the path (a link) between an element and its immediate predecessor
All these definitions give a rationale for why traceability is needed. They also tell us what we should be able to trace, either from right to left (the approach reviewers take when assessing the material they have received) or from left to right (the sequence in which traceability and the corresponding materials are developed), and how this can or should be done.
Traceability – More on the “How”
“Data-point” vs “Metadata” Traceability
Data-point traceability points directly to the specific predecessor record(s) and should be implemented if practical and feasible.
Metadata traceability facilitates the understanding of the relationship of the analysis variable to its source dataset(s) and variable(s). This traceability is established by describing (via metadata) the algorithm used or steps taken to derive or populate an analysis variable from its immediate predecessor. Metadata traceability is also used to establish the relationship between an analysis result and ADaM dataset(s).
Traceability can be built into the data itself or provided through metadata in any form.
Traceability – A Case
Primary Endpoint: Overall Survival Analysis
[Figure labels: ADaM define.xml (ARM extension); aCRF; ADaM define.xml; SDTM define.xml; DM]
Traceability - How
SDTM
  Data point: «Not applicable/Not needed»
  Metadata: aCRF, define.xml
  Supportive documents: SDRG, mapping specifications
ADaM
  Data point: SDTM variables copied; --SEQ from SDTM; SRCDOM/SRCVAR/SRCSEQ; ADTF; ASEQ; DTYPE; ANLxxFL; occurrence flags in OCCDS; intermediate ADaMs
  Supportive documents: ADRG, SAP
Analysis Results
  Metadata: define.xml (ARM extension)
This table summarizes the tools available in the data standards content, in particular in the CDISC framework. Data-point techniques apply to ADaM, either ADaM to SDTM or ADaM to ADaM, where for example you have to handle imputations. Then there are all the tools that give you space to provide additional information, either through metadata, with define.xml and the CRF, or through additional documentation specifically developed to support cases where neither data-point nor metadata traceability is feasible, so basically the SDRG and the ADRG. The SAP belongs here too, and I would say the Study Protocol as well.
Traceability – How – Analysis Results → ADaM
Analysis Results Metadata (ARM)
1. ADEFFTTE is the source ADaM
2. OS is the parameter to be used
3. AVAL is the time to event, CNSR is the censoring «flag»
4. Statistical model applied in SAS
5. Reference to the SAP and statistical details
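The five callouts on this slide can be pictured as one metadata record plus a data selection. The following is a minimal Python sketch only: the dictionary layout, the `select_analysis_records` helper, the `PFS` parameter code and the ADEFFTTE rows are all assumptions made for illustration, and the real metadata lives in define.xml (ARM extension), not in a Python dictionary.

```python
# Hypothetical, simplified stand-in for one ARM entry (not the Define-XML schema).
arm_entry = {
    "dataset": "ADEFFTTE",                   # 1: source ADaM dataset
    "where": {"PARAMCD": "OS"},              # 2: parameter to select
    "analysis_variables": ["AVAL", "CNSR"],  # 3: time to event and censoring flag
    "programming_code": "(statistical model code as run in SAS)",  # 4: placeholder
    "documentation": "(reference to the SAP section)",             # 5: placeholder
}

# Made-up records, for illustration only.
adefftte = [
    {"USUBJID": "001", "PARAMCD": "OS", "AVAL": 210, "CNSR": 0},
    {"USUBJID": "001", "PARAMCD": "PFS", "AVAL": 150, "CNSR": 0},
    {"USUBJID": "002", "PARAMCD": "OS", "AVAL": 365, "CNSR": 1},
]


def select_analysis_records(rows, entry):
    """Return the rows and variables that the ARM entry says feed the result."""
    keep = ["USUBJID"] + entry["analysis_variables"]
    return [
        {name: row[name] for name in keep}
        for row in rows
        if all(row[var] == value for var, value in entry["where"].items())
    ]


os_records = select_analysis_records(adefftte, arm_entry)
```

A reviewer reading the metadata can reproduce this selection without opening the analysis program, which is the point of linking the result to its dataset, parameter and variables.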
Traceability – How – ADaMs → SDTMs
Keeping SDTM variables for traceability
ADaM is derived from SDTM.
Variables copied from SDTM should keep their original attributes (name, label, etc.), with some exceptions, e.g. DCSREAS (Reason for Discontinuation from Study) instead of DS.DSDECOD.
Keep the --SEQ variable from the parent SDTM domain.
Use SRCDOM/SRCVAR/SRCSEQ when multiple SDTM datasets are used to build the ADaM dataset, e.g. SRCDOM=SS/SRCVAR=SSORRES/SRCSEQ=3.
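To make the SRCDOM/SRCVAR/SRCSEQ idea concrete, here is a minimal Python sketch of how a reviewer could follow those three variables from an ADaM record back to its SDTM source record. The records, the subject identifier and the `trace_to_sdtm` helper are made up for illustration; only the variable names come from the slide.

```python
# Made-up SDTM records, keyed by domain; SSSEQ and DSSEQ are the --SEQ variables.
sdtm = {
    "SS": [
        {"USUBJID": "001", "SSSEQ": 2, "SSORRES": "ALIVE"},
        {"USUBJID": "001", "SSSEQ": 3, "SSORRES": "DEAD"},
    ],
    "DS": [
        {"USUBJID": "001", "DSSEQ": 1, "DSDECOD": "COMPLETED"},
    ],
}

# One ADaM record carrying its data-point traceability variables.
adam_record = {
    "USUBJID": "001",
    "PARAMCD": "OS",
    "SRCDOM": "SS",
    "SRCVAR": "SSORRES",
    "SRCSEQ": 3,
}


def trace_to_sdtm(record, sdtm_data):
    """Return (source record, source value) for an ADaM record, or None."""
    domain = record["SRCDOM"]
    seq_var = domain + "SEQ"  # --SEQ naming: two-letter domain prefix + SEQ
    for source in sdtm_data.get(domain, []):
        if (source["USUBJID"] == record["USUBJID"]
                and source[seq_var] == record["SRCSEQ"]):
            return source, source[record["SRCVAR"]]
    return None


source_record, source_value = trace_to_sdtm(adam_record, sdtm)
```

Because the lookup needs nothing beyond the three source variables and the subject identifier, the link survives even when several SDTM domains feed the same ADaM dataset.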
Traceability – How – ADaMs → SDTMs
Use of Analysis Flags
Example: derivation of the Weekly Mean Daily Pain Intensity Score, excluding observations that occurred during the wash-out period.
Use an analysis flag whenever you decide to exclude a particular record from a derivation. This is a matter of being transparent but, most importantly, it allows the reviewer to reproduce the results you have obtained.
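The wash-out example can be sketched in a few lines of Python. This is an illustration under stated assumptions: the records, the `WASHOUT` column and the `weekly_mean` helper are hypothetical, and ANL01FL is used as one instance of the ANLxxFL flags.

```python
# Made-up daily pain scores for one subject and one week.
# WASHOUT is a hypothetical column marking wash-out period observations.
daily_scores = [
    {"USUBJID": "001", "ADY": 1, "AVAL": 6, "WASHOUT": True},
    {"USUBJID": "001", "ADY": 2, "AVAL": 5, "WASHOUT": True},
    {"USUBJID": "001", "ADY": 3, "AVAL": 4, "WASHOUT": False},
    {"USUBJID": "001", "ADY": 4, "AVAL": 2, "WASHOUT": False},
]

# Keep every record in the dataset; flag only those used in the derivation.
for row in daily_scores:
    row["ANL01FL"] = "" if row["WASHOUT"] else "Y"


def weekly_mean(rows):
    """Mean of AVAL over flagged records only; None if nothing is flagged."""
    used = [row["AVAL"] for row in rows if row["ANL01FL"] == "Y"]
    return sum(used) / len(used) if used else None


mean_score = weekly_mean(daily_scores)
```

The excluded records stay in the dataset, so the reviewer can see both what was left out and what was used, and can recompute the mean from the flagged rows alone.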
Traceability – How – ADaMs → ADaMs
Use of Intermediate Datasets
Very complex derivations may require the creation of intermediate analysis datasets. In these situations the ADaM IG suggests submitting the intermediate datasets along with their associated metadata. This is an example of practical use from the draft Breast Cancer TAUG:
For the purpose of analyzing the events of the patient, it is important to identify and collect the activities for that patient which occurred during the conduct of the clinical trial. These activities, such as the dates and results of the radiologic image or other interventions, are described either in the protocol or within the SAP. These activities are used to determine the different time to event analysis. By creating a database of these activities for each subject, the analyst can aid in the review of the time to event analysis and show why the date for one activity was used instead of another activity.
Proposed in Breast Cancer TAUG (draft)
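A rough Python sketch of such an intermediate dataset follows. Everything in it is an assumption for illustration: the activity rows, the source domains and sequence numbers, and above all the "earliest date wins" rule, which stands in for whatever selection rule the protocol or SAP actually defines.

```python
# Hypothetical intermediate dataset: one row per candidate activity for a subject.
activities = [
    {"USUBJID": "001", "ACTIVITY": "Radiologic progression",
     "ADT": "2015-04-20", "SRCDOM": "RS", "SRCSEQ": 4},
    {"USUBJID": "001", "ACTIVITY": "Death",
     "ADT": "2015-06-01", "SRCDOM": "DS", "SRCSEQ": 2},
]


def flag_selected(rows):
    """Flag the earliest activity (illustrative rule only) and return it."""
    first = min(rows, key=lambda row: row["ADT"])  # ISO 8601 dates sort as text
    for row in rows:
        row["ANL01FL"] = "Y" if row is first else ""
    return first


selected = flag_selected(activities)
```

The time-to-event dataset would then be built from the flagged row, while the unflagged rows remain available to show which other dates were considered and not used.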
Traceability – How – SDTMs → CDASH
Traceability – How – SDTMs → Legacy
The SDTM aCRF is not a specification document for the legacy conversion.
SDTM migration from legacy data should be appropriately documented (the same applies with CDASH, probably with fewer derivations in place).
We should not forget that we also need to be able to trace back to the original data collection instrument. Regardless of whether CDASH or legacy standards were used, we need to be able to document how the migration was performed, for example how legacy terminology was converted to CDISC CT.
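One way to document a terminology conversion is to keep the legacy value next to the converted one and to report anything that did not map. The Python sketch below assumes hypothetical legacy codes ("1", "2", "9") and a hypothetical `convert` helper; it is not a description of any specific migration tool.

```python
# Hypothetical mapping from legacy codes to controlled terminology values.
legacy_to_ct = {"SEX": {"1": "M", "2": "F"}}


def convert(variable, legacy_value, mapping):
    """Return a log entry keeping the legacy value next to the converted one."""
    converted = mapping.get(variable, {}).get(legacy_value)
    return {
        "variable": variable,
        "legacy": legacy_value,
        "converted": converted,
        "mapped": converted is not None,
    }


# Conversion log for three made-up legacy values.
log = [convert("SEX", value, legacy_to_ct) for value in ["1", "2", "9"]]

# Values without a mapping are surfaced instead of being silently dropped.
unmapped = [entry for entry in log if not entry["mapped"]]
```

A log of this kind can feed the mapping specifications or the SDRG, so that the reviewer sees how each legacy term reached its submitted value.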
“Lost” in Traceability
Data-Point Traceability at all Costs?
Datapoint traceability points directly to the specific predecessor record(s) and should be implemented if practical and feasible.
Do we need that?
This is the first place where we could get lost in traceability. I would like to stress that the definition of traceability uses the words «practical» and «feasible». It might not be worth putting in place a complicated methodology just to achieve data-point traceability, one that in the end may not be easy for the reviewer to use. It can be better to spend more time explaining the derivation through metadata or supportive documents.
“Lost” in Traceability
Unclear Specifications
Comments/unclear specifications: avoid copy & paste of SAS code, such as
if aval ne . and visitnum=1 then do; if chg>0 then do; …..30 more lines of code…..
Provide the rationale and the detailed steps involved in the derivation of the variable, e.g. why visitnum=1? Make use of English and «pseudo-code».
Related to that, the documentation of your algorithm should not be a simple cut & paste from your SAS code. Try to describe the algorithm using pseudo-code and give the rationale for the different steps involved.
“Lost” in Traceability
Multiple Traceability
In ADTTE:
- STARTDT (Time to Event Origin Date for Subject), usually DM.RFSTDTC
- ADT (Analysis Date)
[Diagram: Time to response (STARTDT to ADT) and Duration of Response (STARTDT to ADT), where the ADT of time to response is the STARTDT of duration of response]
Multiple traceability is a case currently not supported by the available standards and tools. It arises, for example, with a time-to-event endpoint whose starting point is not necessarily the date of randomization or the first exposure date (which we could otherwise assume all time-to-event endpoints use). The example is an oncology endpoint, duration of response, whose starting point is the occurrence of another time-to-event endpoint, time to response.
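The dependency between the two endpoints can be sketched in Python as follows. The parameter codes `TTR` and `DOR`, the dates, the "+1 day" convention and the `SRCPARAM` link are all assumptions made for illustration; in particular `SRCPARAM` is not a standard ADaM variable, which is exactly the gap this slide points at.

```python
from datetime import date

# Made-up time-to-response record for one subject.
time_to_response = {
    "USUBJID": "001",
    "PARAMCD": "TTR",
    "STARTDT": date(2015, 1, 5),  # origin date, e.g. taken from DM.RFSTDTC
    "ADT": date(2015, 3, 2),      # date of first response
}


def derive_duration_of_response(ttr, end_date):
    """Build a duration-of-response record whose origin is the response date."""
    return {
        "USUBJID": ttr["USUBJID"],
        "PARAMCD": "DOR",
        "STARTDT": ttr["ADT"],       # origin comes from another ADaM record
        "ADT": end_date,
        "AVAL": (end_date - ttr["ADT"]).days + 1,  # assumed day-count convention
        "SRCPARAM": ttr["PARAMCD"],  # hypothetical link back to the TTR record
    }


dor = derive_duration_of_response(time_to_response, date(2015, 9, 14))
```

Here STARTDT of the second record traces to an ADaM record rather than to SDTM, so the explanation of that link has to be carried by metadata or by the ADRG.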
“Lost” in Traceability - Integration
Non-Linear Process
Original CSR based on legacy raw and analysis data, with post SDTM conversion for ISS/ISE pooling
Original CSR based on legacy raw and analysis data, with post SDTM and ADaM conversion for ISS/ISE pooling
Study Level Traceability Considerations - Best Practices for Non-Linear Data Flow. PhUSE-CSS WG, October 2014
FDA Technical Conformance Guide, March 2016
Traceability in the CDISC content assumes you follow a linear process, that is, your analysis results are created from ADaM, which is derived from SDTM. Traceability is definitely broken when a linear process is not used, for example when you decide to prospectively migrate to SDTM and eventually re-derive ADaM from SDTM. The CSR that will be submitted was created before the decision to migrate, so it does not make use of SDTM and ADaM, and it might therefore not be possible to reproduce its results. In this case different options should be considered when submitting data, including the re-creation of the analysis results and/or the submission of the legacy data.
Conclusions
Plan traceability ahead
Traceability and its related documentation might help the reviewer understand what you have done (without asking!!)
When a linear process is not followed, plan ahead, discuss possible strategies with FDA reviewers, and submit a Legacy Data Conversion Plan and Report
Plan traceability ahead, as it can be a lot of work if added retrospectively. Traceability in any of the forms we have seen can help the reviewer understand what you have done without coming back to you with a list of questions. Be careful when a linear process is not followed: this might definitely get you lost in translation.
References
Traceability: Plan Ahead for Future Needs – S. Minjoe, T. Petrowitsch – PhUSE RG06 [2014]
Traceability and Data Flow – PhUSE-CSS WG – http://www.phusewiki.org/wiki/index.php?title=Traceability_and_Data_Flow
Summary of Traceability References – PhUSE-CSS Wiki Page – http://www.phusewiki.org/wiki/index.php?title=Summary_of_Traceability_References
Here are some references that could help in understanding where traceability plays a role, in particular the work done by a PhUSE-CSS working group and the white papers it produced in 2014.
Thank you for your time!
Angelo Tinazzi – Associate Director – Statistical Programming
angelo.tinazzi@cytel.com
Abstract
“An important component of a regulatory review is an understanding of the provenance of the data (i.e. traceability of the sponsor’s results back to the CRF data). Traceability permits an understanding of the relationships between the analysis results, analysis datasets, tabulation datasets, and source data” (cf. FDA Study Data Technical Conformance Guide, June 2015). With the release of the first version of the “Analysis Results Metadata (ARM) Specification Version 1.0 for Define-XML Version 2” (January 2015) we now have all the technical "instruments" to make this possible. The purpose of this presentation is to illustrate the technical elements in the existing CDISC standards that make full traceability possible. It will show examples of how to link SDTM and ADaM through either “metadata” or “data-point” traceability, including examples of good versus bad specifications, and of how to link ADaM to statistical outputs with the use of ARM.