Published by Oliver Randall. Modified over 8 years ago
1
SDTM Metadata Curation Process Dianne Reeves
2
Session Outline Submit Candidate Terminology – Example spreadsheet Load new terms into EVS (Enterprise Vocabulary Services) Curate CDEs (Common Data Elements) using caDSR Tools and new EVS terms – Well-formed Metadata – Business Rules / Best Practices – Versioning
3
SDTM Code List Submission Spreadsheet
4
Review – Data Element Fundamentals Lesson 1: Naming Conventions and Rules Lesson 2: Process for Creating Data Element Concept, Value Domain and Data Element Names Lesson 3: Composing Definitions for Data Element Concepts, Value Domains and Data Elements Lesson 4: Applying Skills for Reuse of Data Element Concepts and Value Domains Creating Well-formed Metadata
5
Review - ISO/IEC 11179 Administered Components Administered Component: A registry item for which administrative information is recorded. Data Element Concept (DEC) – An idea that can be represented in the form of a data element, described independently of any particular representation. Value Domain (VD) – A set of attributes describing representational characteristics of instance data with or without permissible values. Data Element (DE) – A unit of data for which the definition, representation and permissible values are specified by means of a set of attributes.
6
Data Element Fundamentals [Diagram] Data Element Concept (DEC) = Object Class + Property. Value Domain (VD) = Representation Term. DEC + VD = Data Element (DE), so Object Class + Property + Rep Term = Data Element.
7
Data Element Fundamentals - Example [Diagram] DEC: Person Address. VD: Zip Code. DEC + VD = DE: Person Address Zip Code.
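The composition shown on the two slides above can be sketched in a few lines of Python. This is only an illustration, not caDSR code: the class names are hypothetical, and splitting "Person Address" into Object Class "Person" and Property "Address" is an assumption read off the diagram layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """DEC = Object Class + Property (hypothetical class for illustration)."""
    object_class: str
    prop: str

    @property
    def name(self) -> str:
        return f"{self.object_class} {self.prop}"


@dataclass(frozen=True)
class ValueDomain:
    """VD, named here only by its Representation Term."""
    representation_term: str

    @property
    def name(self) -> str:
        return self.representation_term


@dataclass(frozen=True)
class DataElement:
    """DE = DEC + VD."""
    dec: DataElementConcept
    vd: ValueDomain

    @property
    def name(self) -> str:
        return f"{self.dec.name} {self.vd.name}"


# Assumed split of the slide's example into Object Class and Property.
dec = DataElementConcept(object_class="Person", prop="Address")
vd = ValueDomain(representation_term="Zip Code")
de = DataElement(dec, vd)
print(de.name)
```

Keeping the DEC and VD as separate objects is what allows them to be reused in other pairings.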
8
Metadata as Libraries of Reusable Components - 1 [Diagram: a library of DECs and a library of VDs. One DEC and one VD are paired to form the DE with ID# 106, Person Address Zip Code.]
9
Metadata as Libraries of Reusable Components - 2 [Diagram: the same libraries of DECs and VDs, now showing ID# 106 alongside ID# 77, Person Address State Code.]
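One way to picture the "library" idea is a pair of lookup tables whose entries are referenced, not copied, by each data element. The sketch below assumes that ID# 106 is Person Address Zip Code and ID# 77 is Person Address State Code, and that both reuse the same DEC; the dictionary keys are hypothetical and do not come from the caDSR.

```python
# Libraries of reusable components (keys are made up for this sketch).
dec_library = {"dec_person_address": "Person Address"}
vd_library = {"vd_zip_code": "Zip Code", "vd_state_code": "State Code"}

# Each data element stores only references to the components it pairs.
data_elements = {
    106: ("dec_person_address", "vd_zip_code"),
    77: ("dec_person_address", "vd_state_code"),
}


def data_element_name(de_id: int) -> str:
    """Build the data element name from the referenced DEC and VD."""
    dec_key, vd_key = data_elements[de_id]
    return f"{dec_library[dec_key]} {vd_library[vd_key]}"


for de_id in data_elements:
    print(de_id, data_element_name(de_id))
```

Both data elements point at the same DEC entry, so a change to that DEC would be visible from either of them.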
10
Lesson 1: Naming Conventions By the completion of this lesson, the attendee will be able to: – Identify the six things to consider when developing a context naming convention – Identify different types of vocabulary lists used for data element naming – Identify why a Thesaurus is a good source for naming terms – List the basics of a data element name – Describe the rules for a data element Long Name – Identify the three types of data element Short Names and rules for their creation
11
Developing a Naming Convention Establish a scope for the convention Determine the authority that establishes a name Develop semantic rules for the source and content of words used in a name Formulate syntax rules for required word order Develop lexical rules covering controlled word lists, name length, character set, and language Set guidelines on uniqueness of names in context
12
Types of Vocabulary Lists Vocabulary – All the words of a language. The sum of words used by, understood by, or at the command of a particular person or group. Lexicon – A stock of terms used in a particular profession, subject, or style. Ontology of Program – A set of representational terms. Definitions associate the names of entities in a logic grouping (e.g. classes, relations, functions or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms.
13
Types of Vocabulary Lists (cont’d) Axiom – An established rule, principle, or law. Terminology – The vocabulary of technical terms used in a particular field, subject, science, or art; nomenclature. Code Sets – A select list of terminology.
14
Using a Thesaurus Source of name components Provides semantic linking of preferred terms Gives guidance in using homographs Shows equivalence, hierarchy, and association Allows a controlled vocabulary
15
Different Types of Data Element Names Name – The word or combination of words by which something is known. Long Name – A 255 character (database max.) fully annotated name describing an administered component. Short Name – A 30 character (database max.) abbreviated name for an administered component. The short name may be generated by the database system, abbreviated by the system, or entered by a user. Alternate Name – Other names that identify an administered component, e.g. DICOM tags, SAS column names, UML model names.
16
Long Name Rules Long Name (maximum characters = 255) – A readable and descriptive phrase describing the administered component. – Use mixed case and capitalize the first letter of major terms. – Separate terms with spaces. – Avoid using an overly unique naming convention. Terms need to be searchable. Long Name Components – 1 – Object – 2 – Property – 3 – Representation – 4 – Qualifiers In most cases the long name will be typed out in its entirety. If it is determined that an abbreviation is needed, the context should agree on the abbreviation.
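The Long Name rules above lend themselves to a simple automated check. The following is a minimal sketch, not a caDSR feature: the function name is hypothetical, and the list of "minor" words that may stay lowercase is an assumption, since the slide does not define what counts as a major term.

```python
from typing import List

MAX_LONG_NAME_LENGTH = 255  # database maximum stated on the slide

# Assumption: words treated as minor terms that need no capital letter.
MINOR_WORDS = {"a", "an", "and", "for", "in", "of", "or", "the"}


def long_name_problems(name: str) -> List[str]:
    """Return a list of rule violations; an empty list means no problem found."""
    problems = []
    if len(name) > MAX_LONG_NAME_LENGTH:
        problems.append(f"longer than {MAX_LONG_NAME_LENGTH} characters")
    if "_" in name:
        problems.append("terms should be separated by spaces, not underscores")
    for term in name.split():
        if term.lower() in MINOR_WORDS:
            continue
        if not term[0].isupper():
            problems.append(f"major term '{term}' should start with a capital letter")
    return problems


print(long_name_problems("Person Address Zip Code"))
print(long_name_problems("person Address Zip Code"))
```

A check like this covers only the mechanical rules; whether a name is readable, descriptive, and searchable still needs human review.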
17
Short Name Rules System Generated: DEC Public ID and version with the VD Public ID and version, separated by a colon. – Example: 2145678v1.0:2356987v3.0 Abbreviated: Truncated terms from the Long Name. 4 characters, mixed case, separated by an underscore. – Example: Clinical Stage Disease Text Name would become Clin_Stag_Dise_Text_Name User Entered: A standard list of abbreviations is used by caDSR (original source CTEP). – If a standard abbreviation doesn’t exist, use the default truncated abbreviation if appropriate. – No punctuation and only upper case, separated by underscore. – If default abbreviation isn’t appropriate, create a new abbreviation. – Submit the new abbreviation to the Context Administrators.
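The first two Short Name styles are mechanical enough to sketch in code. The functions below are illustrative only (their names are hypothetical and they are not part of any caDSR tool); they reproduce the two examples given on the slide. The 30-character limit comes from the Short Name definition earlier in the deck.

```python
MAX_SHORT_NAME_LENGTH = 30  # database maximum for a Short Name


def system_generated_short_name(dec_id: int, dec_version: str,
                                vd_id: int, vd_version: str) -> str:
    """DEC Public ID and version, a colon, then VD Public ID and version."""
    return f"{dec_id}v{dec_version}:{vd_id}v{vd_version}"


def abbreviated_short_name(long_name: str, width: int = 4) -> str:
    """Truncate each term of the Long Name to `width` characters, join with underscores."""
    return "_".join(term[:width] for term in long_name.split())


def fits_short_name_limit(short_name: str) -> bool:
    """Check the result against the 30-character maximum."""
    return len(short_name) <= MAX_SHORT_NAME_LENGTH


print(system_generated_short_name(2145678, "1.0", 2356987, "3.0"))
print(abbreviated_short_name("Clinical Stage Disease Text Name"))
```

User Entered short names depend on the standard abbreviation list, which is not reproduced in this deck, so they are left out of the sketch.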
18
Abbreviations
19
Lesson 1 Review 1. Identify the six things to consider when developing a context naming convention (slide 11) 2. Identify different types of vocabulary lists used for data element naming (slides 12 and 13) 3. Identify why a Thesaurus is a good source for naming terms (slide 14) 4. List the 3 different data element names (slide 15) 5. Describe the rules for a data element Long Name (slide 16) 6. Identify the three types of data element Short Names and rules for their creation (slide 17)
20
Lesson 2: Creating Names By the completion of this lesson, the attendee will be able to: – Summarize the process for creating Data Element Concept, Value Domain and Data Element names – List the components needed for Data Element Concept, Value Domain and Data Element names – Identify the caDSR tool used to create DEC, VD and CDE names – Name the source of component terms – Identify the rules used in creating Short names
21
Step 1: Considering Components of the Data Ask yourself: – What information do you want to capture to describe the data collected? – If I wanted to look in the database for data, what words would I use for the most efficient search? Example: – I want to collect the race of participants in a protocol. I have a list of possible responses to the question, – “What is your race?” – White, Black, Asian, Not Reported
22
Step 2: Data Element Concept Name Our question is: What is your race? To create a DEC name you will need: – Object Class: What is the focus or action of the data being captured? – Property: What is the characteristic of the object class that makes it identifiable? – Qualifiers: Does the object class or property need additional description? Where will we find these terms?* *Each component is created from a registered term in the NCI Thesaurus.
23
The Curation Tool links to EVS Finding the Object Class
24
The Curation Tool links to EVS Finding the Object Class
25
The Curation Tool links to EVS There are so many – how do I choose? *Consider the source (NCI Thesaurus, Metathesaurus, etc.), the definition, the workflow status, and how others have used the term in the caDSR. Finding the Object Class
26
The Object Class is Populated:
27
Repeat the process to find the Property Term Finding the Property
28
The Curation Tool links to EVS Finding the Property
29
Here there are also many choices. Remember to look at the source, the definition, and similar use in the caDSR. Finding the Property
30
The Property is Populated:
31
The Data Element Concept Name is Created
32
The Data Element Concept Short Name is Created
33
Step 3: Value Domain Name Our question is: What is your race? There is a list of responses – White, Black, Asian, Not Reported To create a VD name you will need:* – Representation: What is the form of the data being collected? – Qualifiers: Does the representation term need additional description? Do you need an Object Class and a Property? *Each component is created from a registered term in the NCI Thesaurus.
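Because the race question comes with a fixed list of responses, its value domain can be thought of as enumerated: it carries permissible values. The sketch below illustrates that idea only. The class is hypothetical, and the name "Race Category" is a placeholder, since the actual VD name is built in the Curation Tool on the following slides.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EnumeratedValueDomain:
    """A value domain that carries an explicit list of permissible values."""
    name: str
    permissible_values: Tuple[str, ...]

    def accepts(self, response: str) -> bool:
        """True only when the response matches a permissible value exactly."""
        return response in self.permissible_values


race_vd = EnumeratedValueDomain(
    name="Race Category",  # placeholder name, not taken from the slides
    permissible_values=("White", "Black", "Asian", "Not Reported"),
)

print(race_vd.accepts("Asian"))
print(race_vd.accepts("Unknown"))
```

The match is deliberately exact, so a response such as "asian" would be rejected; a real system might normalize input first.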
34
Consider the Standard Representation Term List
35
The Curation Tool links to EVS Finding the Representation Term:
36
The Curation Tool links to EVS Finding the Representation:
37
The Curation Tool links to EVS *Consider the source (NCI Thesaurus, Metathesaurus, etc.), the definition, the workflow status, and how others have used the term in the caDSR. Finding the Representation:
38
The Representation is populated:
39
The Value Domain Long Name and Short Name are Formed:
40
Step 4: The Data Element Name To create a CDE name you will need: – Data Element Concept Name – Value Domain Name
41
The data element name is composed of the components of the DEC and VD Creating the Data Element Name
42
What happened?? Creating the Data Element Short Name
44
Lesson 2: Review 1. List the components needed for Data Element Concept and Value Domain names. Identify required components. 2. Identify the caDSR tool used to create DEC and VD names. 3. Name the source of component terms. 4. Identify the types of Short names. 5. List the components needed to create a Data Element name. 6. Identify the rules used in creating CDE Long and Short names.
45
Lesson 3: Composing Definitions By the completion of this lesson, the attendee will be able to: Describe the purpose of a definition List the six ISO guidelines for an effective definition Name the tool used for creating definitions and the source for Administered Components definitions Compose meaningful definitions for Data Element Concepts, Value Domains and Data Elements Explain how to create an Explanatory Comment and identify cases in which it would be necessary to include one.
46
Purpose of Definitions The purpose of a data element definition is to define a data element with words or phrases that describe, explain, or make clear its meaning. Good definitions promote the standardization and reuse of data elements, leading to data sharing and interoperability of information systems. The challenge is to create a definition that is specific enough to meet a study/organization’s needs and is generic enough to be used across a community in order to promote harmonization.
47
Data Element Definition Guidance A metadata definition should: – Be unique – Be stated in the singular – State what the concept is, not what it is not – Be a descriptive phrase or sentence – Contain only commonly understood abbreviations – Contain no embedded definitions
48
Data Element Definitions Created by the Curation Tool As the Object Class, Property, and Representation are selected, a definition is built by default by concatenating the definitions of the administered components. The definitions are from EVS. Not all Default definitions are appropriate.
49
The Building of A Default Data Element Concept Definition: Default Definition: Person, a human being._ Major living subspecies of man differentiated by genetic and physical characteristics. There are four racial groups: Australoid, Caucasoid, Mongoloid, and Negroid.
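The default definition shown above is a plain concatenation of component definitions, joined with an underscore. A minimal sketch of that behavior follows; the function name is hypothetical and the underscore separator is inferred from the slide's example text, not from Curation Tool documentation.

```python
def default_definition(*component_definitions: str) -> str:
    """Join the non-empty component definitions with an underscore."""
    cleaned = [text.strip() for text in component_definitions if text.strip()]
    return "_".join(cleaned)


object_class_definition = "A human being."
property_definition = (
    "Major living subspecies of man differentiated by genetic "
    "and physical characteristics."
)

print(default_definition(object_class_definition, property_definition))
```

The output makes the problem plain: two unrelated sentences are glued together, which is why the default usually has to be rewritten by a curator.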
50
Restructuring the Data Element Concept Definition The Definition should provide unambiguous clarification: The concise description of the Object Class along with a description of how the Property provides differentiation to the Object Class Person, a human being._ Major living subspecies of man differentiated by genetic and physical characteristics. There are four racial groups: Australoid, Caucasoid, Mongoloid, and Negroid. Modified Definition: A person's self-declared racial origination.
51
Building a Default Value Domain Definition Default Definition: A human being._An arbitrary classification of taxonomic group that is a division of a species; usually arise as a consequence of geographical isolation within a species and characterised by shared heredity, physical attributes and behavior, and in case of humans, by common history, nationality, or geographic distribution._Category; used informally to mean a class of things.
52
Restructuring the Value Domain Definition The Definition should provide unambiguous clarification: The concise description of the Representation term and its relationship to the Object Class and Property. Default Definition: A human being._An arbitrary classification of taxonomic group that is a division of a species; usually arise as a consequence of geographical isolation within a species and characterised by shared heredity, physical attributes and behavior, and in case of humans, by common history, nationality, or geographic distribution._Category; used informally to mean a class of things. Modified Definition: The classifications that describe a person's self-declared racial origination.
53
The Building of A Default Data Element Definition: Default Definition: A single human being._Major living subspecies of man differentiated by genetic and physical characteristics. There are four racial groups: Australoid, Caucasoid, Mongoloid, and Negroid._A human being._An arbitrary classification of taxonomic group that is a division of a species; usually arise as a consequence of geographical isolation within a species and characterised by shared heredity, physical attributes and behavior, and in case of humans, by common history, nationality, or geographic distribution._Category; used informally to mean a class of things.
54
Restructuring the Data Element Definition The Definition should provide unambiguous clarification: Identify the representation of the data and the relationship to the Object Class and Property. The definition should be unique. It should be composed of words in the singular that make a statement expressed in positive descriptive phrases or sentences. When necessary, it should include commonly understood abbreviations without embedded definitions. Modified Definition: The classifications that describe a person’s self-declared racial origination.
55
Explanatory Comments When the definition requires additional information to provide uniqueness or clarity, that information may be added in an Explanatory Comment. Explanatory Comments can provide examples of any broad concepts in the DEC or VD. Explanatory Comments should be included in the Comment field of the CDE.
56
Creating an Explanatory Comment
57
Lesson 3 - Review 1. Describe the purpose of a definition (slide 46) 2. List the six ISO guidelines for an effective definition (slide 47) 3. Name the tool used for creating definitions and the source for Administered Components definitions 4. Compose meaningful definitions for Data Element Concepts, Value Domains and Data Elements 5. Explain an Explanatory Comment and identify the process for creation.
58
Lesson 4 – Reuse of Data Element Concepts and Value Domains within a Context Consider the Data by Analyzing the Question Question 1: How many times have you mixed pesticides? – Responses: Never, < 50, > 50 Question 2: How many times have you mixed household cleaners? – Responses: Never, < 50, > 50
59
Specific Components of a Data Element Concept Data Element Concept 1: *Object Class: Pesticide; *Property: Mixing; Qualifier: None. Data Element Concept Name 1: Pesticide Mixing. Data Element Concept 2: *Object Class: Cleaner; Qualifier: Household; *Property: Mixing. Data Element Concept Name 2: Household Cleaner Mixing.
60
Specific Components of a Value Domain Question Responses: Never, < 50, > 50. Can we use a generic term to describe both Question Object Classes? Value Domain (shared): Object Class: Material; Qualifier: Chemical; Property: Mixing; *Representation: Text Code; Qualifier: Frequency; *Permissible Values. Value Domain Name (shared): Chemical Material Mixing Frequency Text Code.
61
Before creating a new DEC or VD, search the caDSR for Components that can be reused Reuse in the caDSR
62
Combine Components to Create Unique Pairings [Diagram: DEC 1 and DEC 2 are each paired with the shared Value Domain, giving Unique Data Element Name 1 and Unique Data Element Name 2.]
63
Combine Components to Create Unique Pairings [Diagram: the shared Data Element Concept is paired with VD 1 and with VD 2, giving Unique Data Element Name 1 and Unique Data Element Name 2.]
64
Creating the Data Element Name Data Element Concept 1: Pesticide Mixing. Data Element Concept 2: Household Cleaner Mixing. Value Domain (shared): Chemical Material Mixing Frequency Text Code.
65
Creating Data Element Names Unique Data Element Name 1: Pesticide Mixing Frequency Text Code. Unique Data Element Name 2: Household Cleaner Mixing Frequency Text Code. Value Domain (shared): Chemical Material Mixing Frequency Text Code. CDE 1 = Pesticide Mixing Chemical Material Mixing Frequency Text Code. CDE 2 = Household Cleaner Mixing Chemical Material Mixing Frequency Text Code.
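The CDE names on this slide are the DEC name followed by the shared VD name. A short sketch of that concatenation, using only the names given on the slide (the function name is hypothetical):

```python
SHARED_VD_NAME = "Chemical Material Mixing Frequency Text Code"


def cde_name(dec_name: str, vd_name: str = SHARED_VD_NAME) -> str:
    """CDE name = DEC name followed by the VD name."""
    return f"{dec_name} {vd_name}"


for dec_name in ("Pesticide Mixing", "Household Cleaner Mixing"):
    print(cde_name(dec_name))
```

Two different DECs paired with one shared VD yield two distinct CDE names, which is the reuse pattern this lesson describes.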
66
Sharing a Data Element Concept [Diagram: the shared Data Element Concept is paired with VD 1 and with VD 2, giving Unique Data Element Name 1 and Unique Data Element Name 2.]
67
Use Case: Karnofsky Performance Status Score or Lansky Performance Status Score?
Karnofsky Performance Status Score
100 Normal, no complaints, no evidence of disease
90 Able to carry on normal activity; minor signs or symptoms of disease
80 Normal activity with effort; some signs or symptoms of disease
70 Cares for self, unable to carry on normal activity or to do active work
60 Requires occasional assistance, but is able to care for most of his/her needs
50 Requires considerable assistance and frequent medical care
40 Disabled, requires special care and assistance
30 Severely disabled, hospitalization indicated. Death not imminent
20 Very sick, hospitalization indicated. Death not imminent
10 Moribund, fatal processes progressing rapidly
0 Dead
Lansky Performance Status Score
100 Fully active, normal
90 Minor restrictions in physically strenuous activity
80 Active, but tires more quickly
70 Both greater restriction of and less time spent in play activity
60 Up and around, but minimal active play; keeps busy with quieter activities.
50 Gets dressed, but lies around much of the day; no active play, able to participate in all quiet play and activities.
40 Mostly in bed; participates in quiet activities.
30 In bed; needs assistance even for quiet play.
20 Often sleeping; play entirely limited to very passive activities.
10 No play; does not get out of bed.
0 Unresponsive
68
Performance Status Shared Data Element Concept [Diagram: the shared DEC Performance Status pairs with VD Karnofsky to form CDE Karnofsky, and with VD Lansky to form CDE Lansky.]
69
Lesson 4 Review Whenever appropriate, try to reuse the DEC, VD, and/or DE. When creating DECs and VDs, consider generic terms to promote reuse. Within a context, a Data Element is created by a unique pairing of a VD and a DEC. The DEC, VD, and DE must have all required components. The selection of component terms will create well-formed metadata. [Diagram: DEC, VD, CDE]
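The rule that a Data Element is a unique pairing of a DEC and a VD within a context can be pictured as a small registry that refuses duplicates. This is a sketch under stated assumptions: the class, the exception, and the context name "Example Context" are all hypothetical, and the caDSR's real enforcement mechanism is not described in this deck.

```python
class DuplicatePairingError(ValueError):
    """Raised when a DEC/VD pairing already exists in the context."""


class ContextRegistry:
    """Tracks the DEC/VD pairings registered within one context."""

    def __init__(self, context: str) -> None:
        self.context = context
        self._pairings = set()

    def register(self, dec_name: str, vd_name: str) -> str:
        """Register a pairing and return the resulting data element name."""
        pairing = (dec_name, vd_name)
        if pairing in self._pairings:
            raise DuplicatePairingError(
                f"{dec_name!r} + {vd_name!r} already exists in {self.context!r}"
            )
        self._pairings.add(pairing)
        return f"{dec_name} {vd_name}"


registry = ContextRegistry("Example Context")
shared_vd = "Chemical Material Mixing Frequency Text Code"
print(registry.register("Pesticide Mixing", shared_vd))
print(registry.register("Household Cleaner Mixing", shared_vd))
```

Reusing the shared VD with a second DEC succeeds, while registering the same DEC and VD a second time would raise `DuplicatePairingError`.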
70
Creating Well-formed Metadata Naming Conventions and Rules Process for Creating Data Element Concept, Value Domain and Data Element Names Composing Definitions for Data Element Concepts, Value Domains and Data Elements Applying Skills for Reuse of Data Element Concepts and Value Domains Module Review
71
Questions?