1 SDTM Metadata Curation Process  Dianne Reeves

2 Session Outline  Submit Candidate Terminology – Example spreadsheet  Load new terms into EVS (Enterprise Vocabulary Services)  Curate CDEs (Common Data Elements) using caDSR Tools and new EVS terms – Well-formed Metadata – Business Rules / Best Practices – Versioning

3 SDTM Code List Submission Spreadsheet

4 Creating Well-formed Metadata  Review – Data Element Fundamentals  Lesson 1: Naming Conventions and Rules  Lesson 2: Process for Creating Data Element Concept, Value Domain and Data Element Names  Lesson 3: Composing Definitions for Data Element Concepts, Value Domains and Data Elements  Lesson 4: Applying Skills for Reuse of Data Element Concepts and Value Domains

5 Review - ISO/IEC 11179 Administered Components  Administered Component: A registry item for which administrative information is recorded.  Data Element Concept (DEC) – An idea that can be represented in the form of a data element, described independently of any particular representation.  Value Domain (VD) – A set of attributes describing representational characteristics of instance data with or without permissible values.  Data Element (DE) – A unit of data for which the definition, representation and permissible values are specified by means of a set of attributes.

6 Data Element Fundamentals [Diagram] Object Class + Property = Data Element Concept (DEC); Representation Term = Value Domain (VD); DEC + VD = Data Element (DE); Object Class + Property + Rep Term = Data Element

7 Data Element Fundamentals - Example [Diagram] DEC: Person Address; VD: Zip Code; DEC + VD = DE: Person Address Zip Code

8 Metadata as Libraries of Re-usable Components - 1 [Diagram] A library of DECs and a library of VDs; one DEC and one VD are paired to form a DE. Labels: ID# 106; Person Address Zip Code

9 Metadata as Libraries of Reusable Components - 2 [Diagram] The same libraries of DECs and VDs, with a further pairing. Labels: ID# 106; ID# 77; Person Address State Code
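
The DEC + VD = DE relationship on slides 6 to 9 can be sketched in code. This is a minimal illustration, not the caDSR data model: the class and field names are hypothetical, and it assumes that "Person Address State Code" reuses the same "Person Address" DEC as "Person Address Zip Code".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """Hypothetical DEC: an idea, independent of any representation."""
    name: str


@dataclass(frozen=True)
class ValueDomain:
    """Hypothetical VD: how the data is represented."""
    name: str


@dataclass(frozen=True)
class DataElement:
    """A DE is one DEC paired with one VD."""
    dec: DataElementConcept
    vd: ValueDomain

    @property
    def name(self) -> str:
        # The DE name is built from the DEC name followed by the VD name.
        return f"{self.dec.name} {self.vd.name}"


# One DEC taken from the "library" and reused with two different VDs
# (assumption: both DEs on the slides share this DEC).
person_address = DataElementConcept("Person Address")
zip_de = DataElement(person_address, ValueDomain("Zip Code"))
state_de = DataElement(person_address, ValueDomain("State Code"))

print(zip_de.name)
print(state_de.name)
```

Making the classes immutable reflects the library idea: a registered component is shared by reference rather than copied and edited per data element.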

10 Lesson 1: Naming Conventions  By the completion of this lesson, the attendee will be able to: – Identify the six things to consider when developing a context naming convention – Identify different types of vocabulary lists used for data element naming – Identify why a Thesaurus is a good source for naming terms – List the basics of a data element name – Describe the rules for a data element Long Name – Identify the three types of data element Short Names and rules for their creation

11 Developing a Naming Convention  Establish a scope for the convention  Determine the authority that establishes a name  Develop semantic rules for the source and content of words used in a name  Formulate syntax rules for required word order  Develop lexical rules covering controlled word lists, name length, character set, and language  Set guidelines on uniqueness of names in context

12 Types of Vocabulary Lists  Vocabulary – All the words of a language. The sum of words used by, understood by, or at the command of a particular person or group.  Lexicon – A stock of terms used in a particular profession, subject, or style.  Ontology of Program – A set of representational terms. Definitions associate the names of entities in a logic grouping (e.g. classes, relations, functions or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms.

13 Types of Vocabulary Lists (cont’d)  Axiom – An established rule, principle, or law  Terminology – The vocabulary of technical terms used in a particular field, subject, science, or art; nomenclature.  Code Sets – A select list of terminology.

14 Using a Thesaurus  Source of name components  Provides semantic linking of preferred terms  Gives guidance in using homographs  Shows equivalence, hierarchy, and association  Allows a controlled vocabulary

15 Different Types of Data Element Names  Name – The word or combination of words by which something is known.  Long Name – A 255 character (database max.) fully annotated name describing an administered component.  Short Name – A 30 character (database max.) abbreviated name for an administered component. The short name may be generated by the database system, abbreviated by the system, or entered by a user.  Alternate Name – Other names that identify an administered component, e.g. DICOM tags, SAS column names, UML model name.

16 Long Name Rules  Long Name (maximum characters = 255) – A readable and descriptive phrase describing the administered component. – Use mixed case and capitalize the first letter of major terms. – Separate terms with spaces. – Avoid using an overly unique naming convention. Terms need to be searchable.  Long Name Components – 1 – Object – 2 – Property – 3 – Representation – 4 – Qualifiers  In most cases the long name will be typed out in its entirety. If it is determined that an abbreviation is needed, the context should agree on the abbreviation.
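
The Long Name rules above lend themselves to a simple automated check. The sketch below is only an illustration: the function name, the list of "minor" words exempt from capitalization, and the single-space requirement are assumptions, not rules stated on the slide.

```python
from typing import List

MAX_LONG_NAME_LENGTH = 255  # database maximum given on the slide

# Assumption: words treated as "minor" and therefore not required to be capitalized.
MINOR_WORDS = {"a", "an", "and", "for", "in", "of", "or", "the", "to"}


def long_name_problems(name: str) -> List[str]:
    """Return a list of rule violations for a candidate Long Name."""
    problems = []
    if len(name) > MAX_LONG_NAME_LENGTH:
        problems.append(f"longer than {MAX_LONG_NAME_LENGTH} characters")
    # Assumption: "separate terms with spaces" is read as single spaces only.
    if name != " ".join(name.split()):
        problems.append("terms should be separated by single spaces")
    for word in name.split():
        if word.lower() in MINOR_WORDS:
            continue
        # Only alphabetic first characters can be checked for capitalization.
        if word[0].isalpha() and not word[0].isupper():
            problems.append(f"major term {word!r} should start with a capital letter")
    return problems


print(long_name_problems("Person Address Zip Code"))
print(long_name_problems("person address"))
```

An empty list means the name passed these mechanical checks; whether the name is readable, descriptive, and searchable still needs a curator's judgment.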

17 Short Name Rules  System Generated: DEC Public ID and version with the VD Public ID and version, separated by a colon. – Example: 2145678v1.0:2356987v3.0  Abbreviated: Truncated terms from the Long Name. 4 characters, mixed case, separated by an underscore. – Example: Clinical Stage Disease Text Name would become Clin_Stag_Dise_Text_Name  User Entered: A standard list of abbreviations is used by caDSR (original source CTEP). – If a standard abbreviation doesn’t exist, use the default truncated abbreviation if appropriate. – No punctuation and only upper case, separated by underscore. – If default abbreviation isn’t appropriate, create a new abbreviation. – Submit the new abbreviation to the Context Administrators.
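
The System Generated and Abbreviated rules can be sketched as two small functions. The function names are hypothetical, and raising an error when the result exceeds the 30-character Short Name limit is an assumption, since the slides do not say what the tool does in that case. User Entered names are not covered because they depend on the caDSR abbreviation list.

```python
SHORT_NAME_MAX = 30  # database maximum for Short Names (slide 15)


def system_generated_short_name(dec_id: str, dec_version: str,
                                vd_id: str, vd_version: str) -> str:
    """DEC Public ID and version, a colon, then VD Public ID and version."""
    return f"{dec_id}v{dec_version}:{vd_id}v{vd_version}"


def abbreviated_short_name(long_name: str, term_length: int = 4) -> str:
    """Truncate each Long Name term to 4 characters and join with underscores."""
    short = "_".join(term[:term_length] for term in long_name.split())
    if len(short) > SHORT_NAME_MAX:
        # Assumption: an over-long result is rejected rather than truncated further.
        raise ValueError(f"{short!r} exceeds {SHORT_NAME_MAX} characters")
    return short


print(system_generated_short_name("2145678", "1.0", "2356987", "3.0"))
print(abbreviated_short_name("Clinical Stage Disease Text Name"))
```

With the slide's own inputs, these calls reproduce the two examples given above: 2145678v1.0:2356987v3.0 and Clin_Stag_Dise_Text_Name.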

18 Abbreviations

19 Lesson 1 Review 1. Identify the six things to consider when developing a context naming convention (slide 11) 2. Identify different types of vocabulary lists used for data element naming (slides 12 and 13) 3. Identify why a Thesaurus is a good source for naming terms (slide 14) 4. List the 3 different data element names (slide 15) 5. Describe the rules for a data element Long Name (slide 16) 6. Identify the three types of data element Short Names and rules for their creation (slide 17)

20 Lesson 2: Creating Names  By the completion of this lesson, the attendee will be able to: – Summarize the process for creating Data Element Concept, Value Domain and Data Element names – List the components needed for Data Element Concept, Value Domain and Data Element names – Identify the caDSR tool used to create DEC, VD and CDE names – Name the source of component terms – Identify the rules used in creating Short names

21 Step 1: Considering Components of the Data  Ask yourself: – What information do you want to capture to describe the data collected? – If I wanted to look in the database for data, what words would I use for the most efficient search?  Example: – I want to collect the race of participants in a protocol. I have a list of possible responses to the question, – “What is your race?” – White, Black, Asian, Not Reported

22 Step 2: Data Element Concept Name  Our question is: What is your race?  To create a DEC name you will need: – Object Class: What is the focus or action of the data being captured? – Property: What is the characteristic of the object class that makes it identifiable? – Qualifiers: Does the object class or property need additional description?  Where will we find these terms?* *Each component is created from a registered term in the NCI Thesaurus.

23 Finding the Object Class: The Curation Tool links to EVS

24 Finding the Object Class: The Curation Tool links to EVS

25 Finding the Object Class: The Curation Tool links to EVS There are so many – how do I choose? *Consider the source (NCI Thesaurus, Metathesaurus, etc.), the definition, the workflow status, and how others have used the term in the caDSR

26 The Object Class is Populated:

27 Finding the Property: Repeat the process to find the Property Term

28 Finding the Property: The Curation Tool links to EVS

29 Finding the Property: Here there are also many choices. Remember to look at the source, the definition and for similar use in the caDSR

30 The Property is Populated:

31 The Data Element Concept Name is Created

32 The Data Element Concept Short Name is Created

33 Step 3: Value Domain Name Our question is: What is your race? There is a list of responses – White, Black, Asian, Not Reported To create a VD name you will need:  Representation – What is the form of the data being collected?  Qualifiers – Does the representation term need additional description?  Do you need an Object Class and a Property? *Each component is created from a registered term in the NCI Thesaurus.

34 Consider the Standard Representation Term List

35 Finding the Representation Term: The Curation Tool links to EVS

36 Finding the Representation: The Curation Tool links to EVS

37 Finding the Representation: The Curation Tool links to EVS *Consider the source (NCI Thesaurus, Metathesaurus, etc.), the definition, workflow status, and how others have used the term in the caDSR

38 The Representation is populated:

39 The Value Domain Long Name and Short Name are Formed:

40 Step 4: The Data Element Name  To create a CDE name you will need: – Data Element Concept Name – Value Domain Name

41 Creating the Data Element Name: The data element name is composed of the components of the DEC and VD

42 Creating the Data Element Short Name: What happened??

43

44 Lesson 2: Review 1. List the components needed for Data Element Concept and Value Domain names. Identify required components. 2. Identify the caDSR tool used to create DEC and VD names 3. Name the source of component terms. 4. Identify the types of Short names 5. List the components needed to create a Data Element name 6. Identify the rules used in creating CDE Long and Short names

45 Lesson 3: Composing Definitions By the completion of this lesson, the attendee will be able to:  Describe the purpose of a definition  List the six ISO guidelines for an effective definition  Name the tool used for creating definitions and the source for Administered Components definitions  Compose meaningful definitions for Data Element Concepts, Value Domains and Data Elements  Explain how to create an Explanatory Comment and identify cases in which it would be necessary to include one.

46 Purpose of Definitions  The purpose of a data element definition is to define a data element with words or phrases that describe, explain, or make clear its meaning.  Good definitions promote the standardization and reuse of data elements, leading to data sharing and interoperability of information systems.  The challenge is to create a definition that is specific enough to meet a study/organization’s needs and is generic enough to be used across a community in order to promote harmonization.

47 Data Element Definition Guidance A metadata definition should be:  Unique  Singular  A statement of what the concept is, not only of what it is not  A descriptive phrase or sentence  Limited to commonly understood abbreviations  Without embedded definitions

48 Data Element Definitions Created by the Curation Tool  As the Object Class, Property, and Representation are selected, a definition is built by default by concatenating the definitions of the administered components.  The definitions are from EVS.  Not all Default definitions are appropriate.
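
The concatenation step can be sketched as follows. This is an illustration rather than the Curation Tool's actual code: the function name is hypothetical, and the underscore separator is inferred from the default definitions shown later in this deck. The two sample definitions are taken from those default definitions.

```python
def default_definition(*component_definitions: str, separator: str = "_") -> str:
    """Join the definitions of the selected components in the order given."""
    cleaned = (d.strip() for d in component_definitions)
    return separator.join(d for d in cleaned if d)


# Sample component definitions, quoted from this deck's default definitions.
object_class_definition = "A human being."
representation_definition = "Category; used informally to mean a class of things."

print(default_definition(object_class_definition, representation_definition))
```

The output shows why not every default is appropriate: the joined text is a sequence of independent definitions, not a single statement about the data element, so it usually needs to be rewritten by the curator.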

49 The Building of A Default Data Element Concept Definition: Default Definition: Person, a human being._ Major living subspecies of man differentiated by genetic and physical characteristics. There are four racial groups: Australoid, Caucasoid, Mongoloid, and Negroid.

50 Restructuring the Data Element Concept Definition  The Definition should provide unambiguous clarification: The concise description of the Object Class along with a description of how the Property provides differentiation to the Object Class Person, a human being._ Major living subspecies of man differentiated by genetic and physical characteristics. There are four racial groups: Australoid, Caucasoid, Mongoloid, and Negroid.  Modified Definition: A person's self-declared racial origination.

51 Building a Default Value Domain Definition Default Definition: A human being._An arbitrary classification of taxonomic group that is a division of a species; usually arise as a consequence of geographical isolation within a species and characterised by shared heredity, physical attributes and behavior, and in case of humans, by common history, nationality, or geographic distribution._Category; used informally to mean a class of things.

52 Restructuring the Value Domain Definition  The Definition should provide unambiguous clarification: The concise description of the Representation term and its relationship to the Object Class and Property. Default Definition: A human being._An arbitrary classification of taxonomic group that is a division of a species; usually arise as a consequence of geographical isolation within a species and characterised by shared heredity, physical attributes and behavior, and in case of humans, by common history, nationality, or geographic distribution._Category; used informally to mean a class of things.  Modified Definition: The classifications that describe a person's self-declared racial origination.

53 The Building of A Default Data Element Definition: Default Definition: A single human being._Major living subspecies of man differentiated by genetic and physical characteristics. There are four racial groups: Australoid, Caucasoid, Mongoloid, and Negroid._A human being._An arbitrary classification of taxonomic group that is a division of a species; usually arise as a consequence of geographical isolation within a species and characterised by shared heredity, physical attributes and behavior, and in case of humans, by common history, nationality, or geographic distribution._Category; used informally to mean a class of things.

54 Restructuring the Data Element Definition  The Definition should provide unambiguous clarification: Identify the representation of the data and the relationship to the Object Class and Property. The definition should be unique and composed of words in the singular that make a statement expressed in positive descriptive phrases or sentences. When necessary, it may include commonly understood abbreviations, but no embedded definitions.  Modified Definition: The classifications that describe a person’s self-declared racial origination.

55 Explanatory Comments  When the definition requires additional information to provide uniqueness or clarity, that information may be added in an Explanatory Comment.  Explanatory Comments can provide examples of any broad concepts in the DEC or VD.  Explanatory Comments should be included in the Comment field of the CDE.

56 Creating an Explanatory Comment

57 Lesson 3 - Review 1. Describe the purpose of a definition (slide 46) 2. List the six ISO guidelines for an effective definition (slide 47) 3. Name the tool used for creating definitions and the source for Administered Components definitions 4. Compose meaningful definitions for Data Element Concepts, Value Domains and Data Elements 5. Explain an Explanatory Comment and identify the process for creation.

58 Lesson 4 – Reuse of Data Element Concepts and Value Domains within a Context Consider the Data by Analyzing the Question  Question 1: How many times have you mixed pesticides? – Responses: Never; < 50; > 50  Question 2: How many times have you mixed household cleaners? – Responses: Never; < 50; > 50

59 Specific Components of a Data Element Concept  Data Element Concept 1 – *Object Class: Pesticide – *Property: Mixing – Qualifier: None – Data Element Concept Name 1: Pesticide Mixing  Data Element Concept 2 – *Object Class: Cleaner – Qualifier: Household – *Property: Mixing – Data Element Concept Name 2: Household Cleaner Mixing

60 Specific Components of a Value Domain Question Responses: Never; < 50; > 50 Can we use a generic term to describe both Question Object Classes?  Value Domain (shared) – Object Class: Material – Qualifier: Chemical – Property: Mixing – *Representation: Text Code – Qualifier: Frequency – *Permissible Values – Value Domain Name (shared): Chemical Material Mixing Frequency Text Code
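
The way the component terms on slides 59 and 60 combine into names can be sketched as below. The function and parameter names are hypothetical, and the word order (each qualifier placed directly before the term it qualifies) is inferred from the names shown on the slides rather than taken from a stated rule.

```python
from typing import Optional


def compose_name(*parts: Optional[str]) -> str:
    """Join the components that are present, separated by single spaces."""
    return " ".join(p for p in parts if p)


def dec_name(object_class: str, prop: str,
             object_qualifier: Optional[str] = None) -> str:
    # Object Class and Property are the starred (required) components.
    return compose_name(object_qualifier, object_class, prop)


def vd_name(representation: str,
            representation_qualifier: Optional[str] = None,
            object_class: Optional[str] = None,
            object_qualifier: Optional[str] = None,
            prop: Optional[str] = None) -> str:
    # Only the Representation term is starred on the slide; the rest is optional.
    return compose_name(object_qualifier, object_class, prop,
                        representation_qualifier, representation)


print(dec_name("Pesticide", "Mixing"))
print(dec_name("Cleaner", "Mixing", object_qualifier="Household"))
print(vd_name("Text Code", representation_qualifier="Frequency",
              object_class="Material", object_qualifier="Chemical",
              prop="Mixing"))
```

Choosing the generic "Chemical Material" for the Value Domain is what allows one VD to serve both the pesticide question and the household cleaner question.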

61 Before creating a new DEC or VD, search the caDSR for Components that can be reused Reuse in the caDSR

62 Combine Components to Create Unique Pairings [Diagram] DEC 1 + Value Domain (shared) = Unique Data Element Name 1; DEC 2 + Value Domain (shared) = Unique Data Element Name 2

63 Combine Components to Create Unique Pairings [Diagram] Data Element Concept (shared) + VD 1 = Unique Data Element Name 1; Data Element Concept (shared) + VD 2 = Unique Data Element Name 2

64 Creating the Data Element Name  Data Element Concept 1: Pesticide Mixing  Data Element Concept 2: Household Cleaner Mixing  Value Domain (shared): Chemical Material Mixing Frequency Text Code

65 Creating Data Element Names  Unique Data Element Name 1: Pesticide Mixing Frequency Text Code  Unique Data Element Name 2: Household Cleaner Frequency Text Code  Value Domain (shared): Chemical Material Mixing Frequency Text Code  CDE 1 = Pesticide Mixing + Chemical Material Mixing Frequency Text Code  CDE 2 = Household Cleaner Mixing + Chemical Material Mixing Frequency Text Code

66 Sharing a Data Element Concept [Diagram] Data Element Concept (shared) + VD 1 = Unique Data Element Name 1; Data Element Concept (shared) + VD 2 = Unique Data Element Name 2

67 Use Case:
Karnofsky Performance Status Score?
100 Normal, no complaints, no evidence of disease
90 Able to carry on normal activity; minor signs or symptoms of disease
80 Normal activity with effort; some signs or symptoms of disease
70 Cares for self, unable to carry on normal activity or to do active work
60 Requires occasional assistance, but is able to care for most of his/her needs
50 Requires considerable assistance and frequent medical care
40 Disabled, requires special care and assistance
30 Severely disabled, hospitalization indicated. Death not imminent
20 Very sick, hospitalization indicated. Death not imminent
10 Moribund, fatal processes progressing rapidly
0 Dead
Lansky Performance Status Score?
100 Fully active, normal
90 Minor restrictions in physically strenuous activity
80 Active, but tires more quickly
70 Both greater restriction of and less time spent in play activity
60 Up and around, but minimal active play; keeps busy with quieter activities.
50 Gets dressed, but lies around much of the day; no active play, able to participate in all quiet play and activities.
40 Mostly in bed; participates in quiet activities.
30 In bed; needs assistance even for quiet play.
20 Often sleeping; play entirely limited to very passive activities.
10 No play; does not get out of bed.
0 Unresponsive

68 Performance Status Shared Data Element Concept [Diagram] DEC Performance Status + VD Karnofsky = CDE Karnofsky; DEC Performance Status + VD Lansky = CDE Lansky

69 Lesson 4 Review  Whenever appropriate, try to reuse the DEC, VD, and/or DE.  When creating DECs and VDs, consider generic terms to promote reuse.  Within a context, a Data Element is created by a unique pairing of a VD and a DEC.  The DEC, VD, and DE must have all required components. The selection of component terms will create well-formed metadata. DEC VD CDE
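
The "unique pairing within a context" rule can be sketched as a small registry. This is not how the caDSR is implemented: the class, its methods, and the context name are hypothetical, and rejecting a duplicate with an error is an assumption. The DEC and VD names come from the Performance Status use case.

```python
from typing import Set, Tuple


class ContextRegistry:
    """Hypothetical registry that allows each DEC/VD pairing once per context."""

    def __init__(self, context: str) -> None:
        self.context = context
        self._pairings: Set[Tuple[str, str]] = set()

    def exists(self, dec_name: str, vd_name: str) -> bool:
        """Search before creating, so existing components can be reused."""
        return (dec_name, vd_name) in self._pairings

    def register(self, dec_name: str, vd_name: str) -> Tuple[str, str]:
        pairing = (dec_name, vd_name)
        if pairing in self._pairings:
            # Assumption: a repeated pairing is rejected rather than versioned.
            raise ValueError(f"{pairing} already exists in {self.context!r}")
        self._pairings.add(pairing)
        return pairing


registry = ContextRegistry("Example Context")  # hypothetical context name
# One shared DEC paired with two different VDs yields two distinct data elements.
registry.register("Performance Status", "Karnofsky")
registry.register("Performance Status", "Lansky")
print(registry.exists("Performance Status", "Karnofsky"))
```

Keying on the pair rather than on either component alone is what lets a DEC or a VD be reused freely while each data element stays unique.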

70 Module Review  Creating Well-formed Metadata  Naming Conventions and Rules  Process for Creating Data Element Concept, Value Domain and Data Element Names  Composing Definitions for Data Element Concepts, Value Domains and Data Elements  Applying Skills for Reuse of Data Element Concepts and Value Domains

71  Questions?

