1
Most Common Issues in Define.xml files
NJ CDISC User Group
Sergiy Sirichenko
October 21, 2015
2
Abbreviations
CT – CDISC Controlled Terminology
VLM – Value Level Metadata
3
Major problems in Define.xml
Usage of outdated Define.xml v1.0
Inconsistency in metadata
Missing study-specific metadata
Lack of expertise
4
Outdated Define.xml v1.0 is still used
Define.xml v1.0 has many limitations that come from the standard itself
  “The first” versions are never perfect
  Define.xml v1.0 is 11 years old
  Is anybody still using SDTM IG 3.1.1?
Define.xml v2.0 is robust enough to handle current submission needs
A separate presentation or webinar will be dedicated to this topic
5
Lack of structural consistency in v1.0
Structural consistency of metadata in Define.xml v2.0 helps prevent errors
Example: the variable's Source value determines which other attributes are expected
  “CRF” -> pages are expected
  “Derived” -> a computational algorithm is expected
Define.xml v1.0 allows entering CRF pages for derived variables, leaving expected attributes empty, etc.
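The dependency between the Source (Origin) value and the other attributes can be sketched as a small check. This is only an illustration: the dictionary keys (`name`, `origin`, `pages`, `method`) are hypothetical stand-ins for the corresponding define.xml attributes, not the standard's own element names.

```python
def check_origin_consistency(variables):
    """Return (variable name, problem) tuples for inconsistent metadata."""
    problems = []
    for var in variables:
        origin = var.get("origin")
        pages = var.get("pages") or []
        method = var.get("method")
        if origin == "CRF" and not pages:
            problems.append((var["name"], "CRF origin without page references"))
        elif origin == "Derived":
            if not method:
                problems.append((var["name"], "Derived origin without a computational method"))
            if pages:
                problems.append((var["name"], "Derived origin with CRF pages"))
    return problems


# Hypothetical metadata records, for illustration only
example = [
    {"name": "AETERM", "origin": "CRF", "pages": [12, 41, 57]},
    {"name": "RFSTDTC", "origin": "CRF", "pages": []},
    {"name": "AGE", "origin": "Derived", "pages": [3], "method": None},
]

for name, problem in check_origin_consistency(example):
    print(f"{name}: {problem}")
```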
6
Limited and confusing VLM in v1.0
In v1.0, Value Level Metadata does not provide a reference to the variable it applies to
It cannot handle multiple conditions
A confusing and complex hierarchical VLM structure is used instead
Example: the LB domain has VLM assigned to LBCAT
  LBCAT has VLM for LBSPEC, LBSPEC -> LBMETHOD, etc.
  Properties of the LBORRES (or other?) variable are described at some point of this tree structure
v2.0 has an explicit single expression with multiple conditions assigned to a particular variable
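The v2.0 idea of one expression with several conditions attached to a specific variable can be sketched as follows. The tuple layout and the sample values are assumptions made for illustration; only the comparator names (EQ, NE, IN, NOTIN) follow the ones used in v2.0 where clauses, and the sketch handles just those four.

```python
def matches(record, conditions):
    """True when the record satisfies every (variable, comparator, values) condition."""
    for variable, comparator, values in conditions:
        actual = record.get(variable)
        if comparator == "EQ":
            ok = actual == values[0]
        elif comparator == "NE":
            ok = actual != values[0]
        elif comparator == "IN":
            ok = actual in values
        elif comparator == "NOTIN":
            ok = actual not in values
        else:
            raise ValueError(f"unsupported comparator: {comparator}")
        if not ok:
            return False
    return True


# Hypothetical value-level definition that applies to LBORRES:
# both conditions must hold for the metadata to apply.
lborres_glucose_serum = [
    ("LBTESTCD", "EQ", ["GLUC"]),
    ("LBSPEC", "EQ", ["SERUM"]),
]

record = {"LBTESTCD": "GLUC", "LBSPEC": "SERUM", "LBORRES": "5.4"}
print(matches(record, lborres_glucose_serum))
```

Because the condition list names the variables it tests and is attached to LBORRES directly, there is no tree to walk to find out which metadata applies.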
7
Some sponsors try to mimic v2.0
To use functionality of v2.0
Example:
  v1.0 does not have attributes for NCI Codes
  Sponsor added NCI Codes as a part of the Decode value
v2.0:
  Permitted Value (Code)
  mmol/L [C64387]
  ng/mL [*]
v1.0:
  Code Value    Code Text
  mmol/L        mmol/L [C64387]
  ng/mL         ng/mL [*]
It’s an invalid usage of the v1.0 standard! Why not switch to v2.0 instead?
8
Some sponsors use custom stylesheet
Often done to mimic the functionality of v2.0
Regulatory reviewers like consistency, so please use the standard stylesheet provided by CDISC
9
Non-relevant metadata
Variable Role is used for standards development, but does not add any value to study metadata
  Example: STUDYID and USUBJID can only be “Identifier”
Does anyone actually use this info?
The Define.xml 2.0 stylesheet doesn’t display it
10
Order of datasets and variables
Alphabetical
  Example: AE, CM, DM, …
  Correct: logical order as defined by the standard: by Class, then by domain name
Random
  Example:
    Order #  Variable  Label
    1        AECAT     Category for Adverse Event
    2        AEDECOD   Dictionary-Derived Term
    3        AEGRPID   Group ID
    4        AESEQ     Sequence Number
    5        AETERM    Reported Term for the Adverse Event
    6        DOMAIN    Domain Abbreviation
    7        STUDYID   Study Identifier
    8        USUBJID   Unique Subject Identifier
    9        AEBODSYS  Body System or Organ Class
    10       AEOUT     Outcome of Adverse Event
    …
  Correct: as variables are present in the dataset
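A quick way to compare the variable order in define.xml with the order in the dataset is a position-by-position comparison. In this sketch both lists are hypothetical inputs; how they are read from define.xml and from the dataset is left out.

```python
from itertools import zip_longest


def order_mismatches(define_order, dataset_order):
    """Return (position, define variable, dataset variable) where the two orders differ."""
    return [
        (position, in_define, in_dataset)
        for position, (in_define, in_dataset) in enumerate(
            zip_longest(define_order, dataset_order), start=1
        )
        if in_define != in_dataset
    ]


# Hypothetical orders, for illustration only
define_order = ["AECAT", "AEDECOD", "STUDYID", "USUBJID"]
dataset_order = ["STUDYID", "USUBJID", "AEDECOD", "AECAT"]

for position, in_define, in_dataset in order_mismatches(define_order, dataset_order):
    print(f"#{position}: define.xml has {in_define}, dataset has {in_dataset}")
```

`zip_longest` is used so that a variable present in only one of the two lists is also reported, paired with `None`.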
11
Missing or invalid Origin
No references to CRF pages
  Example: Origin=“CRF”, instead of “CRF Page 12, 41, 57”
Inconsistencies in Origin/Comments
  Example: RFSTDTC has Origin = “CRF”
    No annotations on CRF (as expected)
    Comments: “First dose of study medication” (it looks like a Derived variable)
12
Missing or invalid Derivations
Example 1: AGE: “Calculation: = Min DOV - BRTHDTC in AGEU”
  What is DOV?
  How can I use a Character value (BRTHDTC) in an arithmetic formula?
  How were missing or partially missing dates handled?
  Derivations should be provided in terms of available data
Example 2: “ZX021_AE_DURATION” ???
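To show what a derivation "in terms of available data" has to spell out, here is one possible AGE calculation from ISO 8601 character values. It is a sketch under stated assumptions, not the sponsor's actual rule: the reference date is whatever study date the derivation names, and returning no age when either date is missing or partial is just one policy a sponsor might document.

```python
from datetime import date


def parse_complete_date(iso_value):
    """Return a date when the value has a complete YYYY-MM-DD part, else None."""
    if not iso_value or len(iso_value) < 10:
        return None  # missing or partial date such as "1970" or "1970-05"
    try:
        return date.fromisoformat(iso_value[:10])  # ignore any time part
    except ValueError:
        return None


def age_in_years(brthdtc, reference_dtc):
    """Completed years between BRTHDTC and a reference date; None if either is incomplete."""
    birth = parse_complete_date(brthdtc)
    reference = parse_complete_date(reference_dtc)
    if birth is None or reference is None:
        return None  # assumed policy: do not impute partial dates
    years = reference.year - birth.year
    if (reference.month, reference.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


print(age_in_years("1970-05-17", "2015-10-21T09:00"))
print(age_in_years("1970", "2015-10-21"))
```

Each step answers one of the questions above: the character values are converted to dates before any arithmetic, and the handling of incomplete dates is explicit.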
13
Invalid Value Level Metadata
VLM should be described at the same level of detail as regular variables: Codelist, DataType, Length, Origin, Derivation, etc.
A common issue is missing or invalid metadata at Value Level
Consider VLM items as new variables with properties independent of the “host” variable
  Example: Treatment Emergent Flag in SUPPAE has length=1, not 200 like the QVAL variable
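One way to derive a Value Level length is to measure the data per QNAM instead of inheriting the length of QVAL. The records below are hypothetical, including the QNAM values, and the sketch assumes the SUPPAE rows are already available as dictionaries.

```python
from collections import defaultdict


def max_qval_length_by_qnam(records):
    """Return {QNAM: longest QVAL length observed for that QNAM}."""
    lengths = defaultdict(int)
    for rec in records:
        lengths[rec["QNAM"]] = max(lengths[rec["QNAM"]], len(rec["QVAL"]))
    return dict(lengths)


# Hypothetical SUPPAE rows, for illustration only
suppae = [
    {"QNAM": "AETRTEM", "QVAL": "Y"},
    {"QNAM": "AETRTEM", "QVAL": "N"},
    {"QNAM": "AEREASOT", "QVAL": "Subject withdrew consent"},
]

print(max_qval_length_by_qnam(suppae))
```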
14
Duplicate records
Code List Term
Variables
Order Number
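Duplicates of any of these kinds can be found with a simple count. The sketch assumes the values (codelist terms, variable names, or order numbers) have already been extracted from define.xml into a list; the sample lists are hypothetical.

```python
from collections import Counter


def find_duplicates(values):
    """Return the sorted values that occur more than once."""
    counts = Counter(values)
    return sorted(value for value, count in counts.items() if count > 1)


# Hypothetical extracts, for illustration only
codelist_terms = ["mg", "mL", "mg"]
order_numbers = [1, 2, 2, 3]

print(find_duplicates(codelist_terms))
print(find_duplicates(order_numbers))
```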
15
External dictionaries
Info on external dictionaries (MedDRA, WHODrug) is not provided correctly
  As comments to a variable (not machine-readable)
ISO8601 is defined as an External Dictionary
  It’s a data format associated with all date, datetime, etc. variables
  No specific reference to ISO8601 is needed if Data Type is defined correctly
16
Missing study specific metadata
Study-specific information is crucial for reviewers
However, it is missing in most submission packages
The value of define.xml, SDRG, and aCRFs is to explain what is unique in this particular study
17
Missing Codelists
Codelists are limited to variables which are assigned to standard CT
Study-specific Codelists are commonly missing for variables
  Category (--CAT), Subcategory (--SCAT)
  EXTRT, ARMCD, --TESTCD/--TEST, QNAM, TPT
  RDOMAIN in CO and RELREC domains
  XXTOX, …
18
Merged Codelists
Due to confusion between a Standard CT Codelist and a study Variable Codelist
Example: Define.xml has one codelist (UNIT) assigned to all --DOSU, --VAMTU, --ORRESU, --STRESU variables
  This codelist includes all unique terms across all study “units” variables and has 450 items, while, for example, the EXDOSU variable is populated with the single term “mg”
  A reference to a 450-term codelist is not relevant
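A merged codelist tends to show up as a large gap between the number of terms it lists and the number each variable actually uses. The sketch below reports that ratio per variable; the data are hypothetical, and a low ratio is only a hint for review, since unused terms can be legitimate when they were options at data collection.

```python
def codelist_usage(codelist_terms, data_by_variable):
    """Return {variable: (terms used by the variable, terms in the codelist)}."""
    terms = set(codelist_terms)
    return {
        variable: (len(terms & set(values)), len(terms))
        for variable, values in data_by_variable.items()
    }


# Hypothetical shared UNIT codelist and data, for illustration only
unit_codelist = ["mg", "mL", "mmol/L", "ng/mL", "kg"]
data = {
    "EXDOSU": ["mg", "mg", "mg"],
    "LBORRESU": ["mmol/L", "ng/mL"],
}

for variable, (used, total) in codelist_usage(unit_codelist, data).items():
    print(f"{variable}: uses {used} of {total} codelist terms")
```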
19
What is define.xml Codelist?
A Define.xml Codelist describes the data collection process and should be limited to the terms used for data collection of a specific data element (a particular Variable or Value Level)
  For example, LBSTRESU, EGORRESU, EXDOSU usually have separate Codelists based on the same (UNIT) standard CT
If data is collected as free text, then a Codelist may not be applicable
  Common examples are CMDOSU, CMDOSFRQ, CELOC, etc.
20
Missing terms in Codelist
Term is present in data
  SD0037 check
  Programming error
    Due to misspelling, leading space characters, etc.
    Due to a missing Decoded value for some items
      CodeList vs. EnumeratedItem
Codelist was populated based on collected data, but some options from the CRF page are not included
  Example: Only race “WHITE” is collected, while 6 options are present on the CRF
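The first case (a term present in data but absent from the codelist) can be triaged with a small comparison that separates probable programming errors from genuinely missing terms. This is a sketch, not the SD0037 rule itself; it only recognizes differences in case and surrounding whitespace, and the sample values are hypothetical.

```python
def classify_missing_terms(codelist_terms, data_values):
    """Return (value, finding) for every data value that is not in the codelist."""
    terms = set(codelist_terms)
    normalized = {term.strip().upper(): term for term in terms}
    findings = []
    for value in sorted(set(data_values) - terms):
        candidate = normalized.get(value.strip().upper())
        if candidate is not None:
            findings.append((value, f"case or whitespace variant of '{candidate}'"))
        else:
            findings.append((value, "not in codelist"))
    return findings


# Hypothetical RACE codelist and collected values, for illustration only
race_codelist = ["WHITE", "ASIAN"]
race_data = ["WHITE", " WHITE", "Asian", "OTHER"]

for value, finding in classify_missing_terms(race_codelist, race_data):
    print(f"{value!r}: {finding}")
```

The second case (CRF options that were never collected) cannot be detected from data alone; it needs the list of options on the CRF page.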
21
Missing or invalid Value Level Metadata
Content of SUPPQUAL domains must be described
22
Missing description of --SPID
--SPID is often a Key Variable in a domain
A clear and detailed description is required to understand study data
  Why was --SPID introduced? How was it derived? …
Often Sponsors copy the Notes text from the CDISC IG. It’s a completely invalid approach! Study-specific information is expected.
  SDTM IG text: “Sponsor-defined reference number. Perhaps pre-printed on the CRF as an explicit line identifier or defined in the sponsor’s operational database. Example: Line number on a CRF Page.”
23
Missing description of variables
Study-specific variables are the most important
  RFPENDTC, RFSTDTC, RFXSTDTC, --GRPID, --LNKID, --SPID, …
SDTM text is not a variable description!
  See the --SPID slide as an example
24
Invalid Key Variables
Too long a list of variables
  Example: “STUDYID, USUBJID, EXSPID, EXTRT, EXCAT, EXDOSTXT, EXDOSU, EXDOSFRM, EXDOSFRQ, EXDOSTOT, EXROUTE, EXSTDTC, EXENDTC, EXSTDY, EXENDY, EXTPT, EXTPTNUM, EXTPTREF, VISIT”
Inconsistency between Key Variables and domain Structure
  Example:
    Structure: “One record per event”
    Key Variables: “USUBJID, AETERM, AEDECOD, AESTDTC, AESEV, AESER, AEACN, VISIT”
Usage of --SEQ as Key Variable
  Example: “USUBJID, AESEQ”
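Whether a proposed set of Key Variables identifies records uniquely can be tested directly against the data. The sketch assumes the domain records are available as dictionaries; the records and key choices below are hypothetical.

```python
from collections import Counter


def duplicate_keys(records, key_variables):
    """Return the key value combinations that identify more than one record."""
    counts = Counter(
        tuple(record.get(variable) for variable in key_variables) for record in records
    )
    return [key for key, count in counts.items() if count > 1]


# Hypothetical AE records, for illustration only
ae = [
    {"USUBJID": "001", "AETERM": "HEADACHE", "AESTDTC": "2015-01-10", "AESEQ": 1},
    {"USUBJID": "001", "AETERM": "HEADACHE", "AESTDTC": "2015-02-03", "AESEQ": 2},
    {"USUBJID": "001", "AETERM": "NAUSEA", "AESTDTC": "2015-02-03", "AESEQ": 3},
]

print(duplicate_keys(ae, ["USUBJID", "AETERM"]))
print(duplicate_keys(ae, ["USUBJID", "AETERM", "AESTDTC"]))
```

A key such as USUBJID plus AESEQ would pass this check, but only because the sequence number is unique within a subject by construction, so it tells a reviewer nothing about the record structure.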
25
Non-compliance with eCTD
Define.xml file is located in a different folder than the datasets
  Example:
    define.xml in …\tabulation
    Data in …\tabulation\sdtm
File name is not “define.xml”
  “define_study_001_sdtm.xml”
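Both conditions are easy to verify before packaging a submission. The sketch below works on path strings only and does not touch the file system; the folder names in the usage example are hypothetical.

```python
from pathlib import PureWindowsPath


def check_define_location(define_path, dataset_paths):
    """Return a list of problems with the define file name and location."""
    define = PureWindowsPath(define_path)
    problems = []
    if define.name != "define.xml":
        problems.append(f"file name is '{define.name}', expected 'define.xml'")
    dataset_folders = {str(PureWindowsPath(path).parent) for path in dataset_paths}
    for folder in sorted(dataset_folders - {str(define.parent)}):
        problems.append(f"datasets are in a different folder: {folder}")
    return problems


# Hypothetical package layout, for illustration only
problems = check_define_location(
    r"study\tabulations\define_study_001_sdtm.xml",
    [r"study\tabulations\sdtm\ae.xpt", r"study\tabulations\sdtm\dm.xpt"],
)
for problem in problems:
    print(problem)
```

`PureWindowsPath` is used because the example paths use backslashes; it parses them the same way on any operating system.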
26
Contact info: Sergiy Sirichenko