Clinical database management: From raw data through study tabulations to analysis datasets. Thank you for your kind introduction and for the opportunity to give this talk. The title of the talk is "Clinical database management: From raw data through study tabulations to analysis datasets". (Say a little about background: CRO, academia, SAS, Stata.) Inge Christoffer Olsen, PhD, Diakonhjemmet Hospital, Norway
Introduction: “An experiment is a question which science poses to Nature and a measurement is the recording of Nature's answer” (Max Planck). I will begin with this quote from the famous physicist Max Planck. It means that we cannot understand Nature without measurements. We need to take care of our measurements!
Background: Patient; Study; electronic Case Report Form (eCRF); Study database (SDB)
Objective: Make the study database transparent, logical, transferable, and ready to analyse
CDISC: Clinical Data Interchange Standards Consortium. Open; multidisciplinary; neutral; non-profit. The CDISC mission is to develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare. The standards are developed in cooperation with international pharmaceutical, academic and governmental stakeholders. They are essential for FDA regulatory submissions of new pharmaceuticals.
Standards: Protocol. Clinical Data Acquisition Standards Harmonization (CDASH): eCRF standard. Laboratory Data Model (LAB): standard for exchanging lab results. Study Data Tabulation Model (SDTM): structures CRF data within pre-specified domains. Analysis Data Model (ADaM): standards for analysis-ready datasets.
Idea: Protocol; CDASH; SDTM; ADaM; Statistical analyses; Report
Strengths: Standardized programs; recognizable; transferable; potentially very efficient. Shown to reduce the resources needed by 60% or more when implemented from the protocol onwards.
Weaknesses: Rigid; programming demanding; designed to be transferable (text variables; no labels, label values or other Stata-specific features); extreme long format; not suitable for “non-programmers”
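To make the "extreme long format" point concrete, here is a minimal sketch in plain Python of what an analyst has to do before such data can be analysed: pivot one row per subject, visit and test into one row per subject and visit. The variable names `TESTCD` and `STRESN` only mimic the SDTM naming pattern, and all subject identifiers and values are made up for illustration. In Stata the same step would normally be done with `reshape wide`.

```python
from collections import defaultdict

# Hypothetical long-format findings records: one row per subject, visit and test.
# Names mimic the SDTM pattern; every value here is invented for illustration.
long_rows = [
    {"USUBJID": "S-01", "VISITNUM": 21, "TESTCD": "DINTENS", "STRESN": 3.0},
    {"USUBJID": "S-01", "VISITNUM": 21, "TESTCD": "HOSPITAL", "STRESN": 12.0},
    {"USUBJID": "S-02", "VISITNUM": 21, "TESTCD": "DINTENS", "STRESN": 0.0},
    {"USUBJID": "S-02", "VISITNUM": 21, "TESTCD": "HOSPITAL", "STRESN": 7.0},
]


def long_to_wide(rows):
    """Pivot to one row per subject and visit, with one column per test code."""
    wide = defaultdict(dict)
    for row in rows:
        key = (row["USUBJID"], row["VISITNUM"])
        wide[key][row["TESTCD"]] = row["STRESN"]
    return [
        {"USUBJID": subj, "VISITNUM": visit, **tests}
        for (subj, visit), tests in sorted(wide.items())
    ]


wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

With many test codes per domain, this pivot has to be repeated for every findings dataset, which is part of why the format is demanding for non-programmers.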
Example: SDTM Findings class from a Myeloid Leukemia RCT. Note: no treatment column.
STUDYID DOMAIN USUBJID XRSEQ XRTESTCD XRTEST XRCAT XRORRES XRORRESU XRSTRESC XRSTRESN XTSTRESU XRSTAT VISITNUM VISIT
TESTSTUDY XR TESTSTUDY-101-02 1 DINTENS In-Patient Days in Intensive Care Unit HOSPITALISATION DAYS 21 Day 21
TESTSTUDY-101-03 2 DREASON In-Patient Days Due to Other Reasons
TESTSTUDY-101-04 3 DSTUDY In-Patient Days Due to Study Treatment 22
TESTSTUDY-101-05 4 HOSPITAL Days Hospitalized in this Course
TESTSTUDY-101-06 5 BLAST Blasts in Blood TREATMENT RESPONSE 13.11.2010 .
TESTSTUDY-101-07 6 BM Bone Marrow NOT DONE
TESTSTUDY-101-08 7 NEUT Neutrophils
TESTSTUDY-101-09 8 PLAT Platelets
TESTSTUDY-101-10 9 RESPONSE Treatment Response NR NO RESPONSE
TESTSTUDY-101-11 10 CONDITIO Patient Condition FOLLOW-UP ALIVE 600 Safety Follow up Assessment
TESTSTUDY-101-12 11 NEVER Never Had a Remission Y
TESTSTUDY-101-13 12 THERAPY Further Anticancer Therapy
TESTSTUDY-101-14 13 RELAPSE / SURVIVAL DEAD 701 Survival / Relapse Assessment (MONTH 1)
TESTSTUDY-101-15 14
TESTSTUDY-101-16 15
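Because a findings dataset of this kind carries no treatment column, any comparison between arms first requires a merge with a subject-level dataset. The sketch below shows that step in plain Python under stated assumptions: the `treatment` table, the subject identifiers and the result values are hypothetical, and only the variable names `USUBJID`, `XRTESTCD` and `XRSTRESC` are taken from the example header.

```python
# Hypothetical findings rows (no treatment column) and a separate
# subject-level treatment table; identifiers and values are illustrative only.
findings = [
    {"USUBJID": "S-01", "XRTESTCD": "RESPONSE", "XRSTRESC": "NR"},
    {"USUBJID": "S-02", "XRTESTCD": "RESPONSE", "XRSTRESC": "CR"},
    {"USUBJID": "S-03", "XRTESTCD": "RESPONSE", "XRSTRESC": "NR"},
]
treatment = [
    {"USUBJID": "S-01", "ARM": "A"},
    {"USUBJID": "S-02", "ARM": "B"},
]

# Lookup from subject to treatment arm (one row per subject assumed).
arm_by_subject = {row["USUBJID"]: row["ARM"] for row in treatment}


def add_treatment(rows, arms):
    """Left join: keep every findings row and attach ARM (None if unknown)."""
    return [{**row, "ARM": arms.get(row["USUBJID"])} for row in rows]


analysis_rows = add_treatment(findings, arm_by_subject)
for row in analysis_rows:
    print(row)
```

Subjects without a treatment record keep their findings rows with `ARM` set to `None`, so unmatched records stay visible instead of being dropped silently.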
Idea: Use the basics from CDISC, but without the rigidity of that approach. Use only the SDTM and ADaM utilities; long but not so long; keep Stata labelling features; make accessible for non-programmers.
Overview: Raw output from eCRF: imported into Stata. Tabulation Datasets (TD): all data in semi-standardized datasets; no manipulation. Analysis Datasets (AD): formatted datasets ready for analyses; possibly manipulated.
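The separation between unmanipulated tabulation data and derived analysis data can be sketched as follows. This is only an illustration in plain Python, not the setup used in the talk: the dataset `td_scores`, its variables and the derived variables `base` and `chg` are hypothetical names, and all values are invented.

```python
import copy

# Hypothetical tabulation dataset: values exactly as entered in the eCRF.
td_scores = [
    {"subjid": "S-01", "visit": 0, "score": 5.2},
    {"subjid": "S-01", "visit": 1, "score": 3.9},
    {"subjid": "S-02", "visit": 0, "score": 4.4},
    {"subjid": "S-02", "visit": 1, "score": None},  # missing at follow-up
]


def make_analysis_dataset(td):
    """Derive an analysis dataset while leaving the tabulation data untouched."""
    ad = copy.deepcopy(td)  # work on a copy so the TD stays as entered
    baseline = {r["subjid"]: r["score"] for r in ad if r["visit"] == 0}
    for r in ad:
        base = baseline.get(r["subjid"])
        r["base"] = base
        if r["score"] is None or base is None:
            r["chg"] = None  # no change from baseline without both values
        else:
            r["chg"] = round(r["score"] - base, 2)
    return ad


ad_scores = make_analysis_dataset(td_scores)
for row in ad_scores:
    print(row)
```

All derivations live in the analysis dataset, so anyone receiving the database can trace every derived value back to an unchanged tabulation record.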
Examples of TDs: tddm: Demographics. tdsv: Study visits. tdds: Disposition (important events during the study, such as informed consent (ICF) date, randomisation date, study end date, withdrawal date etc.). tdtrt: Study treatment information. tdie, tdae, tdcm, tdlb, tdvs etc.
Examples of ADs: adsl: Subject-level analysis dataset (treatment, populations, baseline adjustment variables). adds: Disposition (for the patient flow figure). adbl: Baseline (demographics and baseline characteristics). addisact: Disease activity measures, imputed (for primary and secondary analyses).
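As a rough illustration of how a subject-level analysis dataset such as adsl might be assembled from tabulation datasets, the sketch below combines demographics, disposition events and treatment into one row per subject. The dataset names follow the slides, but every variable name (`subjid`, `trt`, `randdt`, `completed`), the event codes and all values are assumptions made for this example.

```python
# Hypothetical tabulation datasets; variable names and values are made up.
tddm = [{"subjid": "S-01", "age": 54}, {"subjid": "S-02", "age": 61}]
tdds = [
    {"subjid": "S-01", "event": "RANDOMISED", "date": "2010-01-12"},
    {"subjid": "S-02", "event": "RANDOMISED", "date": "2010-01-15"},
    {"subjid": "S-02", "event": "WITHDRAWN", "date": "2010-03-02"},
]
tdtrt = [{"subjid": "S-01", "trt": "A"}, {"subjid": "S-02", "trt": "B"}]


def build_adsl(dm, ds, trt):
    """Build one row per subject from demographics, disposition and treatment."""
    trt_by_subj = {r["subjid"]: r["trt"] for r in trt}
    # Collapse the long disposition events into one dict of dates per subject.
    events = {}
    for r in ds:
        events.setdefault(r["subjid"], {})[r["event"]] = r["date"]
    adsl = []
    for r in dm:
        ev = events.get(r["subjid"], {})
        adsl.append({
            "subjid": r["subjid"],
            "age": r["age"],
            "trt": trt_by_subj.get(r["subjid"]),
            "randdt": ev.get("RANDOMISED"),
            # Illustrative flag: subjects with no withdrawal event count as completers.
            "completed": "WITHDRAWN" not in ev,
        })
    return adsl


adsl = build_adsl(tddm, tdds, tdtrt)
for row in adsl:
    print(row)
```

The other analysis datasets can then merge in `trt` and population flags from this single subject-level table instead of re-deriving them.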
Discussion: I have presented a setup for clinical study databases based on the CDISC standards. I find the organisation simple and clear, with a clear distinction between manipulated and non-manipulated data. Transfer of the database should be accompanied by a document describing its content. Researchers used to one large file might find the number of datasets overwhelming and confusing.
The end Thank you!