© 2008 Octagon Research Solutions, Inc. All Rights Reserved.
Octagon Research Solutions, Inc.
Leading the Electronic Transformation of Clinical R&D
Dan Crawford, Director, Clinical Data Strategies
March 12, 2008

Basic Concepts of SDTM
Captures all submitted tabulation data as a series of observations in domains based on a standard, specified structure
– SDTM does not specify content!
Raw collected data
No imputed values
Defines specific rules for variable names and structure within each domain
No derived or analysis variables except for those defined in SDTM
– RFSTDTC (Reference Start Date)
– RFENDTC (Reference End Date)

Some Common Mistakes when Converting to SDTM
Adding derived variables to CRT datasets
Imputing data
– Completing partial dates
– Example: AE start date is 06/2005. Do not record it as 06/01/2005; record only what was collected. Imputation is done in the analysis datasets.
Plugging holes in the data
– If you didn't collect it, don't try to create it now
– Example: If the collection date is missing, do not create an algorithm to populate it. Leave it blank.
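
The partial-date rule can be sketched in Python. The helper below is hypothetical: the function name and the accepted input layouts (MM/DD/YYYY, MM/YYYY, YYYY) are assumptions, not something the slides specify. It writes only the date components that were collected, using ISO 8601 reduced precision, and returns an empty string rather than inventing a value.

```python
import re

def to_iso8601_partial(raw):
    """Convert a collected date to ISO 8601 without imputing missing parts.

    Accepts 'MM/DD/YYYY', 'MM/YYYY', or 'YYYY' (assumed input layouts).
    Returns '' for missing or unrecognized values, so nothing is invented.
    """
    raw = (raw or "").strip()
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", raw)
    if m:
        month, day, year = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    m = re.fullmatch(r"(\d{1,2})/(\d{4})", raw)
    if m:
        month, year = m.groups()
        # Day was not collected, so it is left out rather than set to 01.
        return f"{year}-{int(month):02d}"
    if re.fullmatch(r"\d{4}", raw):
        return raw
    return ""

print(to_iso8601_partial("06/2005"))  # month and year only, no day
```

A missing collection date stays blank under this sketch, which matches the "leave it blank" guidance above.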

SDTM and ADaM
SDTM
– Source or raw data
– Vertical
– No redundancy
– Character variables
– Each domain is specific to itself
– Dates are ISO 8601 character strings
ADaM
– Derived data
– Structure may not necessarily be vertical
– Redundancy is needed for easy analysis
– Numeric variables
– Combines variables across multiple domains
– Dates are formatted as numeric (e.g. SAS dates) to allow manipulation
Source: Susan Kenny, Inspire Pharmaceuticals Inc
BOTH ARE NEEDED FOR FDA REVIEW!
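
The difference in date handling can be illustrated with a small Python sketch. It assumes the usual SAS convention that a date value is the number of days since 1 January 1960; the function name is hypothetical. Only complete dates are converted. A partial SDTM date returns None here, because deciding how to impute it is an analysis decision.

```python
from datetime import date, datetime

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this date

def iso_to_sas_date(dtc):
    """Return days since 1960-01-01 for a complete 'YYYY-MM-DD' value, else None."""
    try:
        # Only the date part is used; any time part after position 10 is ignored.
        parsed = datetime.strptime(dtc[:10], "%Y-%m-%d").date()
    except (ValueError, TypeError):
        return None  # partial or missing date: no numeric value without imputation
    return (parsed - SAS_EPOCH).days

print(iso_to_sas_date("1960-01-02"))  # one day after the SAS epoch
print(iso_to_sas_date("2005-06"))     # partial date, not converted
```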

Process Flow
1. Source Data Evaluation
2. Author Data Conversion Specifications
3. Migrate Data from Source to SDTM Target
4. Data Pooling to Create Integrated Database
5. Data Standardization

Required Items
Normalized datasets
All-inclusive lab data
Gaps between record content and the formats catalog will be identified
Verification that all fields on the CRF are captured in the datasets
Supporting documentation on study design and data collection

Source Data Evaluation
Source data review is the key to a successful SDTM conversion project. Because SDTM is so granular, conversion requires thorough knowledge of the legacy data and supporting documentation.

Legacy Data Challenges
Issue: Missing data/documentation
Resolution: Perform QC data audits, which may include individual case report forms and/or CSR listings. Work with the sponsor/vendor to identify and locate missing documentation (if it exists).
Issue: Non-English databases and/or documentation
Resolution: Identify early and perform translation.
Issue: Incomplete/incorrect formats catalogs
Resolution: Identify discrepancies and update the formats catalogs, or manually link metadata with the proper formats, and then programmatically update the data and apply decodes.
Issue: Data discrepancies/oddities
Resolution: Indicate data anomalies in the “Reviewer's Guide” or create “Notes to Reviewer” in the define file.

Source Data Evaluation
Metadata analyses: conduct a series of metadata analyses that scan the clinical data for common attributes and structures. The results allow you to group similar studies, reducing units of work and maximizing efficiency.
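
One way to picture such a metadata analysis is the Python sketch below. It is an illustration only, not the tool used in the presentation: the function name, the input layout (study ID to dataset to variable names), and the study IDs are all hypothetical. Studies whose datasets expose exactly the same variables land in the same group.

```python
from collections import defaultdict

def group_studies_by_structure(study_metadata):
    """Group study IDs whose datasets expose identical (dataset, variable) sets."""
    groups = defaultdict(list)
    for study_id, datasets in study_metadata.items():
        # The signature ignores case and ordering, so only structure matters.
        signature = frozenset(
            (ds.upper(), var.upper())
            for ds, variables in datasets.items()
            for var in variables
        )
        groups[signature].append(study_id)
    return sorted(sorted(ids) for ids in groups.values())

metadata = {
    "S001": {"ae": ["subj", "aeterm"]},
    "S002": {"AE": ["SUBJ", "AETERM"]},
    "S003": {"AE": ["PT", "EVENT"]},
}
print(group_studies_by_structure(metadata))
```

An exact match is a deliberately strict criterion. A real analysis would probably also compare variable types, lengths, and labels, and tolerate small differences.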

Project Design
Group similar studies based on review of the automated report, source documentation, and source data.
– Example: Studies coming from the same CDM system
– Example: Studies in the same phase or conducted by the same CRO
Data conversion specifications are developed based on similarities within groups of studies.
– Example: Data conversion specifications created for the first study in a group serve as a template for subsequent studies in that group.

Project Design
Data conversion specifications will be created based on the type of domain (Standard or Custom) and then by class (Interventions, Events, and Findings).
[Diagram: Standard and Custom domain types, each divided into Interventions, Events, and Findings]

Project Design
To ensure full accountability for all data points, each study should include a Mapping Specifications document detailing the CDISC SDTM target for each source dataset and variable.
Storing these instructions in a database (e.g. Excel or Access) allows you to replicate the process for studies that have identical or similar structures.
[Diagram: One/Many mapping relationships between CRF and SDTM]
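
A mapping specification stored as rows lends itself to two simple operations: checking that every source variable is accounted for, and applying the mapping to a record. The Python sketch below assumes a four-column layout (source dataset, source variable, SDTM domain, SDTM variable). The source dataset and variable names (ADVERSE, DEMOG, EVENT, ONSET, SEVERITY) are hypothetical, as are the function names.

```python
MAPPING_SPEC = [
    # (source dataset, source variable, SDTM domain, SDTM variable)
    ("ADVERSE", "EVENT", "AE", "AETERM"),
    ("ADVERSE", "ONSET", "AE", "AESTDTC"),
    ("DEMOG", "SEX", "DM", "SEX"),
]

def unmapped_variables(source_metadata, spec):
    """List (dataset, variable) pairs in the source that no spec row accounts for."""
    mapped = {(ds, var) for ds, var, _, _ in spec}
    return sorted(
        (ds, var)
        for ds, variables in source_metadata.items()
        for var in variables
        if (ds, var) not in mapped
    )

def apply_spec(dataset_name, record, spec):
    """Rename one source record's fields to their SDTM targets, grouped by domain."""
    out = {}
    for ds, var, domain, target in spec:
        if ds == dataset_name and var in record:
            out.setdefault(domain, {})[target] = record[var]
    return out

source = {"ADVERSE": ["EVENT", "ONSET", "SEVERITY"], "DEMOG": ["SEX"]}
print(unmapped_variables(source, MAPPING_SPEC))  # variables still needing a target
print(apply_spec("ADVERSE", {"EVENT": "Headache", "ONSET": "2005-06"}, MAPPING_SPEC))
```

Because the specification is plain data, the same rows can be reused as a starting point for the next study with a similar structure.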

Data Migration Process
An Extract, Transform, Load (ETL) tool is used to migrate data from source to target datasets
Graphical modeling of data flow
Pluggable maps for reusability of logic

Quality Control
Automated Quality Control
– Mapping specification utility: built-in SDTM compliance wizard
– CDISC SDTM compliance verified using software developed in-house or by manual review
Manual Quality Control
– Completion of quality control checklists:
» 100% QC of converted data against mapping specifications
» QC of all data points against raw data for 2 subjects per domain
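
As a rough idea of what an automated compliance check might look at, the Python sketch below tests a converted domain for a few structural rules: uppercase variable names of at most 8 characters, populated STUDYID, DOMAIN, and USUBJID, and a DOMAIN value that matches the dataset. This is a tiny illustrative subset, not the in-house software mentioned above; the function name and record layout are assumptions.

```python
REQUIRED_IDENTIFIERS = ("STUDYID", "DOMAIN", "USUBJID")

def check_domain(domain, records):
    """Return human-readable findings for one converted domain (empty list = clean)."""
    findings = []
    for row, rec in enumerate(records, start=1):
        for var in rec:
            # SDTM variable names are uppercase and at most 8 characters long.
            if len(var) > 8 or var != var.upper():
                findings.append(f"row {row}: bad variable name {var!r}")
        for var in REQUIRED_IDENTIFIERS:
            if not rec.get(var):
                findings.append(f"row {row}: {var} is missing")
        if rec.get("DOMAIN") not in (None, "", domain):
            findings.append(f"row {row}: DOMAIN is {rec['DOMAIN']!r}, expected {domain!r}")
    return findings

good = {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AETERM": "Headache"}
bad = {"STUDYID": "S1", "DOMAIN": "DM", "usubjid": "S1-001"}
for finding in check_domain("AE", [good, bad]):
    print(finding)
```

Checks like these complement the manual QC steps rather than replace them. They cannot tell whether a value was mapped from the right source field.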

Recommendations
Adopt and move SDTM standards as far “upstream” as possible
Design CRFs with SDTM in mind
Standardize controlled terminology
Convert datasets to SDTM
Generate analysis datasets and CSR from SDTM

Data Pooling Challenges

Data Pooling
During pooling, data content is standardized
Unique Subject Identifiers
Terms are mapped to a common standard
– Laboratory data
– Any collected data with “controlled terminology”
» AE Outcome
» AE Relationship
» Race
Dictionary encoding of adverse events and concomitant medications, and possibly medical history
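
Mapping collected terms to a common standard is essentially a lookup with an explicit list of leftovers. The Python sketch below uses AE Outcome as the example. The raw values on the left are hypothetical, and the standard terms on the right are illustrative and should be confirmed against the controlled terminology version in use. Values with no mapping are reported for review rather than guessed.

```python
AE_OUTCOME_MAP = {
    # raw collected value (upper-cased): standard term (illustrative)
    "RESOLVED": "RECOVERED/RESOLVED",
    "RECOVERED": "RECOVERED/RESOLVED",
    "ONGOING": "NOT RECOVERED/NOT RESOLVED",
    "DEATH": "FATAL",
}

def standardize_terms(values, term_map):
    """Map raw values to standard terms; return (mapped list, sorted unmapped values)."""
    standardized, unmapped = [], set()
    for raw in values:
        key = raw.strip().upper()
        if key in term_map:
            standardized.append(term_map[key])
        else:
            standardized.append(None)  # left unresolved rather than guessed
            unmapped.add(raw)
    return standardized, sorted(unmapped)

mapped, needs_review = standardize_terms(["Resolved", " ongoing ", "Improving"], AE_OUTCOME_MAP)
print(mapped)
print(needs_review)
```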

Challenge #1: Standardization of Data
Laboratory Data
– Standardization of units
» Impacts results and normal range values
– Normal ranges
» Normal ranges are often not incorporated into the laboratory datasets. Find them.
» Some laboratory normal ranges are based on gender and age.
– Create a library of standard analyte names and units (SI), along with all conversion factors

Laboratory Standardization Example
LBTEST  | LBORRES | LBORRESU | Conversion Factor | LBSTRESN | LBSTRESU
Albumin | 36      | g/L (SI) | N/A               | 36       | g/L
Albumin | 3.4     | g/dL     | 10                | 34       | g/L
Albumin | 612     | µmol/L   |                   |          | g/L
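
A conversion-factor library of this kind can be sketched as a lookup keyed by analyte and original unit. In the Python sketch below, the g/dL factor of 10 comes from the example above; the factor of 1.0 for values already in g/L stands in for the slide's "N/A". The µmol/L row is left out because its factor is not given in the slide text. The function and dictionary names are hypothetical.

```python
CONVERSION_LIBRARY = {
    # (analyte, original unit): (conversion factor, standard unit)
    ("ALBUMIN", "g/L"): (1.0, "g/L"),    # already in the standard unit
    ("ALBUMIN", "g/dL"): (10.0, "g/L"),  # factor taken from the example above
}

def standardize_lab(lbtest, lborres, lborresu, library=CONVERSION_LIBRARY):
    """Return (LBSTRESN, LBSTRESU), or (None, None) when no factor is on file."""
    key = (lbtest.upper(), lborresu)
    if key not in library:
        return None, None  # flag for review instead of guessing a factor
    factor, std_unit = library[key]
    # Rounding removes floating-point noise from the multiplication.
    return round(float(lborres) * factor, 6), std_unit

print(standardize_lab("Albumin", "3.4", "g/dL"))
print(standardize_lab("Albumin", "612", "µmol/L"))  # no factor on file
```

The same factor would also have to be applied to the normal range limits so that results and ranges stay in the same unit.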

Challenge #2: USUBJID
Do you have the same subjects enrolled in more than one trial?
If so, do you have a database that tracks these subjects?

Challenge #2: USUBJID
When combining studies to create a pooled database for an ISS/ISE, those subjects will need to have the same USUBJID across all studies.
USUBJID Database:
– Pool the necessary variables from all studies (these will most likely come from different source datasets: DM, VS, MH)
– Output all subjects with matching DOB and gender
– Use additional information to determine whether a subject is a match
– Assign USUBJID
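
The "matching DOB and gender" step can be sketched as a grouping on those two values across studies. The Python sketch below is illustrative: the function name and the subject records are hypothetical, and BRTHDTC and SEX are used as the field names on the assumption that the pooled variables have already been given SDTM-style names. It only lists candidates. Deciding whether two records are really the same person, and assigning the USUBJID, remains a review step.

```python
from collections import defaultdict

def candidate_matches(subjects):
    """Group subject records from different studies that share DOB and sex."""
    by_key = defaultdict(list)
    for subj in subjects:
        by_key[(subj["BRTHDTC"], subj["SEX"])].append(subj)
    # Keep only groups that span more than one study.
    return [
        group for group in by_key.values()
        if len({s["STUDYID"] for s in group}) > 1
    ]

subjects = [
    {"STUDYID": "S001", "SUBJID": "101", "BRTHDTC": "1950-03-14", "SEX": "F"},
    {"STUDYID": "S002", "SUBJID": "207", "BRTHDTC": "1950-03-14", "SEX": "F"},
    {"STUDYID": "S002", "SUBJID": "208", "BRTHDTC": "1961-11-02", "SEX": "M"},
]
for group in candidate_matches(subjects):
    print([(s["STUDYID"], s["SUBJID"]) for s in group])
```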

Challenge #3: Recoding – Medical History
Does your ISE require analysis based on a subset of the population, i.e. subjects with cardiovascular disease?
Medical history is not coded in many studies and can be problematic to code for an ISS/ISE
Some CRFs may be designed to allow more than one term per line
Coding medical history typically involves splitting many terms

MHTERM (identical on all five records): HIP DYSPLASIA OPERATED IN TORN MENISCUS (R) OPERATED ON OCCASIONAL LEFT KNEE PAIN
MHMODIFY | MHBODSYS | MHDECOD
HIP DYSPLASIA | CONGENITAL, FAMILIAL AND GENETIC DISORDERS | HIP DYSPLASIA
HIP DYSPLASIA OPERATED IN APPROXIMATELY | SURGICAL AND MEDICAL PROCEDURES | HIP SURGERY
TORN MENISCUS (R) OPERATED ON 1994 | SURGICAL AND MEDICAL PROCEDURES | MENISCUS OPERATION
TORN MENISCUS (R) | INJURY, POISONING AND PROCEDURAL COMPLICATIONS | MENISCUS LESION
OCCASIONAL LEFT KNEE PAIN | MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS | ARTHRALGIA

Challenge #3: Coding – “Splits”
Dataset | USUBJID | AESEQ | AETERM          | AEMODIFY | AEGRPID
CRT     |         |       | Nausea/Vomiting |          |
Pool    |         |       | Nausea/Vomiting | Nausea   | 2
Pool    |         |       | Nausea/Vomiting | Vomiting | 2
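
The split shown above can be sketched in Python as follows. Several things here are assumptions rather than facts from the slide: the trailing value 2 is read as AEGRPID, AEGRPID is taken from the original record's AESEQ, the separator is "/", and the function name and USUBJID value are hypothetical. The verbatim AETERM is kept unchanged on every split record, and the component term goes into AEMODIFY.

```python
def split_term(record, separator="/"):
    """Split a combined AETERM into one record per component term."""
    parts = [p.strip() for p in record["AETERM"].split(separator) if p.strip()]
    if len(parts) < 2:
        return [dict(record)]  # nothing to split; return an unchanged copy
    return [
        # AETERM stays verbatim; AEGRPID ties the split records together.
        dict(record, AEMODIFY=part, AEGRPID=str(record["AESEQ"]))
        for part in parts
    ]

crt = {"USUBJID": "S001-101", "AESEQ": 2, "AETERM": "Nausea/Vomiting"}
for rec in split_term(crt):
    print(rec["AETERM"], rec["AEMODIFY"], rec["AEGRPID"])
```

After a split, AESEQ would need to be renumbered to stay unique within each subject; that step is omitted here. In practice the decision to split is a coding judgment, not something a separator character can settle on its own.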