Semantic Interoperability: caCORE and the Cancer Data Standards Repository (caDSR) Jennifer Brush
Session Outline Audience Interview – What do you want to learn? Standard Vocabularies – NCI Thesaurus as part of EVS Metadata and Data Elements – Their differences and why we use them caCORE Infrastructure and caDSR – How it all fits together caDSR Tools – CDE Browser – CDE Curation Tool – Sentinel Tool Semantic Interoperability – UML Model Browser / Semantic Integration Workbench
Standard Vocabularies Facilitate translational research Integrate diverse data systems Improve the links between clinical research and the healthcare delivery system
Enterprise Vocabulary Services (EVS) Address NCI’s needs for controlled vocabulary and semantics Components: NCI Thesaurus and NCI Metathesaurus NCI Thesaurus – Stand-alone reference terminology – One definition for cancer research – Designed for annotation and database coding to facilitate data analysis and retrieval NCI Metathesaurus – Relational: links to multiple terminologies – One or more definitions from multiple sources – Designed for mapping cancer terms across terminologies throughout the cancer research community to facilitate integration
Use EVS Check and compare dictionary definitions. Find synonyms. Determine relationships to other concepts/terms. Identify and evaluate potential options when curating new CDEs or adding terms to permissible value lists. Provide links to related research publications. If you can’t find a term, you can submit a new one
Exercise 1 - Examine an EVS Term Complete: Exercise 1 from the “Semantic Interoperability” exercise handout. Time: 2 minutes
Exercise 1 - Examine an EVS Term 1. Navigate to the NCI Terminology Browser (nciterms.nci.nih.gov) 2. Select the NCI Thesaurus 3. Select “Connect” 4. Enter “gene” in the Quick Search entry field, then select “Gene” from the results
concept code
Define Metadata Metadata is data about data Metadata describes the content, quality, condition, and other characteristics of data Example: If a question on a form reads: “What is your age?” – What is the data? – What is the metadata?
caDSR Overview: Metadata Example: Age Data (stored in a local database) – “What is your age?”: 33 Metadata (stored in the caDSR metadata repository) describes the data: – Person Self Reported Age (data element) – Person Self Reported Age (data element concept) – Age Values (value domain) – Person (object class) – Self Reported Age (property) – Datatype: Numeric – Max length: 10 – Version: 2.0 – High Value: 999 – Low Value: 0 – Type: Non-enumerated
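The Age example above can be sketched in code to show how metadata constrains data. This is a minimal illustration only: the class names, field names, and the `accepts` method are assumptions made for this sketch and are not part of any caDSR API; only the values (Numeric, max length 10, low 0, high 999, version 2.0, non-enumerated) come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    """Hypothetical stand-in for a non-enumerated numeric value domain."""
    name: str
    datatype: str
    max_length: int
    low_value: int
    high_value: int
    enumerated: bool

    def accepts(self, raw: str) -> bool:
        """Return True if the raw answer fits the length and range limits."""
        if len(raw) > self.max_length:
            return False
        if not (raw.isascii() and raw.isdigit()):
            return False
        return self.low_value <= int(raw) <= self.high_value


@dataclass(frozen=True)
class DataElement:
    """Hypothetical record tying a name, its parts, and a value domain together."""
    long_name: str
    object_class: str
    prop: str
    value_domain: ValueDomain
    version: str


# Values taken from the slide; the structure itself is an assumption.
age_values = ValueDomain("Age Values", "Numeric", 10, 0, 999, False)
age_element = DataElement(
    "Person Self Reported Age", "Person", "Self Reported Age", age_values, "2.0"
)

# The data ("33") lives in a local database; the metadata decides whether it is valid.
print(age_element.value_domain.accepts("33"))
print(age_element.value_domain.accepts("1000"))
```

The point of the sketch is the separation: the answer "33" is data, while everything in `age_element` is metadata that describes and validates it.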
Data Elements A data element is a standard way of describing and representing metadata – e.g., caDSR contains metadata based on the ISO/IEC 11179 metadata standard. “Semantically Immutable Metadata” are data elements that are made up of one or more terms from a standard vocabulary. “Semantically Interoperable Systems” base their data models (in our case, UML Class Diagrams) on metadata that is semantically immutable.
Data Element Fundamentals Data Element Concept (DEC) + Value Domain (VD) = Data Element (DE) – The DEC is made up of an Object Class and a Property – The VD carries the Representation Term – Object Class + Property + Rep Term = Data Element
Data Element Fundamentals Example: – Person (Object Class) + Address (Property) = Person Address (DEC) – Person Address (DEC) + Zip Code (VD) = Person Address Zip Code (DE) – Representation Term: Zip Code
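The composition rule from these two slides (Object Class + Property + Rep Term = Data Element) can be sketched as follows. The class and function names are hypothetical and chosen only for illustration; they do not correspond to a caDSR API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """Hypothetical DEC: an object class paired with a property."""
    object_class: str
    prop: str

    @property
    def name(self) -> str:
        return f"{self.object_class} {self.prop}"


@dataclass(frozen=True)
class ValueDomain:
    """Hypothetical VD, identified here only by its representation term."""
    representation_term: str


def data_element_name(dec: DataElementConcept, vd: ValueDomain) -> str:
    """Join the DEC name and the representation term into a data element name."""
    return f"{dec.name} {vd.representation_term}"


person_address = DataElementConcept(object_class="Person", prop="Address")
zip_code = ValueDomain(representation_term="Zip Code")

print(data_element_name(person_address, zip_code))  # Person Address Zip Code
```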
Libraries of Reusable Components [Diagram: a library of DECs and a library of VDs; one DEC and one VD combine into a DE] ID# 106 Person Address Zip Code
Libraries of Reusable Components [Diagram: a library of DECs and a library of VDs; one DEC and one VD combine into a DE] ID# 106 ID# 77 Person Address State Code
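The reuse idea on these slides, where the same Data Element Concept is paired with different Value Domains, might be sketched like this. The dictionary keys are hypothetical labels invented for the sketch and are deliberately not the ID numbers shown on the slides; the function name is likewise an assumption.

```python
# Hypothetical component libraries; keys are illustrative labels, not caDSR IDs.
dec_library = {"dec-person-address": "Person Address"}
vd_library = {
    "vd-zip-code": "Zip Code",
    "vd-state-code": "State Code",
}


def build_data_element(dec_key: str, vd_key: str) -> str:
    """Combine one DEC and one VD from the libraries into a data element name."""
    return f"{dec_library[dec_key]} {vd_library[vd_key]}"


# One DEC reused with two different VDs yields two data elements.
elements = [
    build_data_element("dec-person-address", vd_key)
    for vd_key in ("vd-zip-code", "vd-state-code")
]
print(elements)
```

Because each component is stored once and referenced by key, adding a new value domain makes a new data element available without redefining the concept.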
How Data Elements are Used On Forms for data collection (CRFs) In Databases to describe database field attributes and constraints In information/UML Modeling Support APIs To describe application user interface components, validation rules, display name and format
Cancer Data Standards Repository (caDSR) Metadata repository and registry Based on the ISO/IEC standard for metadata registries Designed to integrate caCORE infrastructure Supports the development and deployment of Data Elements that are used as metadata descriptors
ISO: International Organization for Standardization ISO is a non-governmental network of the national standards institutes of 151 countries ISO has standards for mathematics, manufacturing, electrical, mechanical, and civil engineering, imaging, electronics, and information technology Benefits of using ISO/IEC 11179: – Metadata model fully supports the variations needed for biomedical applications – Easier to understand and share cancer research information
caCORE Components Enterprise Vocabulary Data Standards Bioinformatics Objects
caCORE Infrastructure Vocabulary for CDE specification Dictionary, thesaurus services Domain object metadata Common data elements Public APIs Common data elements (CDEs)
caDSR Tools: Purpose caDSR Tools are designed to: – Create, consume, distribute and promote ISO/IEC compliant metadata – Enable semantic consistency across research domains – Support the metadata life-cycle and governance processes
caDSR Tools CDE Browser / FormBuilder – Search for and Download Data Elements – Collect Data Elements onto Forms and Download Forms CDE Curation Tool – Curate (Create and Edit) Data Element Concepts, Value Domains and Data Elements Sentinel Tool – Create Alert Definitions to monitor changes to caDSR metadata
CDE Browser (Search & Download) caDSR Search Tree: Displays all the current caDSR Contexts. Users can search for groups of DEs by navigating the tree. Data Element Search Pane: This is the main search window. Users looking for Data Elements can enter a keyword or phrase. Navigation Menu: Use these buttons to navigate to the CDE cart, Form Builder, or back to Home (that is, back to this page).
Exercise 2 – Examine a Data Element in the CDE Browser Complete: Exercise 2 from the “Semantic Interoperability” exercise handout. Time: 5 minutes
Exercise 2 – Examine a Data Element in the CDE Browser Navigate to the CDE Browser – Select the third option, “At least one of the terms” – Enter “gene” in the search term field – Scroll down to “Gene Identifier java.lang.Long” in the results list; select the Long Name to open the Data Element details window
Exercise 2 – Examine a Data Element in the CDE Browser
Answer the following questions: – What is the Long Name of the Data Element? – What is the Public ID of the Data Element? – What context owns the Data Element? – What is the Data Element Concept Long Name? – Are there permissible values for this Data Element?
Exercise 2 – Examine a Data Element in the CDE Browser Answers: – What is the Long Name of the Data Element? Gene Name java.lang.String – What is the Public ID of the Data Element? – What context owns the Data Element? caCORE – What is the Data Element Concept Long Name? Gene Name – Are there permissible values for this Data Element? NO
CDE Curation Tool (Create/Edit Metadata Using EVS)
CDE Curation Tool (Create/Edit Existing Metadata)
Sentinel Tool (Monitor Changes to Metadata) What to watch When to Watch What to watch for What to report
Sentinel Tool Reports (View Changes Made to Metadata) Change Blocks Associated Blocks
Semantic Integration Tools UML Model Browser – Browse administered items that are part of registered UML Models – Supports browsing, searching, and exporting the classes, attributes and relationships between classes of a UML domain model Semantic Integration Workbench – Guides users through the workflow process required for annotating a UML domain model – Tags UML Models with matching semantic concepts from the NCI Thesaurus
UML Model Browser Web-based – Designed for UML model owners Search for and view UML model components in caDSR – classes – class attributes – associations between classes and attributes – ISO Components (metadata) related to those classes and attributes
UML Model Browser Interface UML Model Search Tree: Search for model components. Basic Class/Attribute Search Pane: Users looking for classes and attributes can enter search criteria here. Navigation Menu: Access other caDSR tools and resources.
UML Model Browser: UML Model Search Tree Displays current caDSR Contexts For each Context, lists all the UML classes, grouped by project, subproject and package Search for classes by navigating the tree and clicking on a context, project, subproject or package Search for attributes by clicking on a class Tree levels: project, subproject, package, class
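The search-tree behavior described above, where clicking a node at any level lists the classes beneath it, can be sketched with a nested dictionary. This is an illustrative model only: the functions `select` and `classes_under` are hypothetical, and the second package and its class are placeholders invented to show grouping. The caCORE names are taken from Exercise 3 in this deck.

```python
# Nested levels: context -> project -> subproject -> package -> list of classes.
model_tree = {
    "caCORE": {  # context
        "caCORE": {  # project
            "Cancer Bioinformatic Infrastructure Objects": {  # subproject
                "gov.nih.nci.cabio.domain": ["Gene"],
                # Hypothetical package and class, added only for illustration.
                "example.hypothetical.pkg": ["ExampleClass"],
            }
        }
    }
}


def select(tree, *path):
    """Walk down the tree along the given path and return the node reached."""
    node = tree
    for key in path:
        node = node[key]
    return node


def classes_under(node):
    """Collect every class name found at or below the given node."""
    if isinstance(node, list):
        return list(node)
    found = []
    for child in node.values():
        found.extend(classes_under(child))
    return found


# Clicking a context lists all classes beneath it; clicking a package narrows the list.
print(classes_under(select(model_tree, "caCORE")))
print(classes_under(select(
    model_tree,
    "caCORE",
    "caCORE",
    "Cancer Bioinformatic Infrastructure Objects",
    "gov.nih.nci.cabio.domain",
)))
```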
UML Model Browser : UML Class - Model Tree Search Results # Matches ‘crumb trail’ Class Search Results Package
Exercise 3 – View Classes & Attributes in the UML Model Browser Complete: Exercise 3 from the “Semantic Interoperability” exercise handout. Time: 5 minutes
Exercise 3 – View Classes & Attributes in the UML Model Browser 1. Navigate to the UML Model Browser 2. Use the tree to navigate to the caCORE Project: caCORE > Projects > caCORE > Cancer Bioinformatic Infrastructure Objects > gov.nih.nci.cabio.domain 3. Scroll down the list of classes, select the “Gene” class 4. Answer the following: – What are the two attributes in the Gene class? – What project does the Gene class belong to? – What context is this project in? – What is the Public ID of the “Gene Name” data element?
Exercise 3 – View Classes & Attributes in the UML Model Browser Answers: – What are the two attributes in the Gene class? Gene clusterId, Gene fullName – What project does the Gene class belong to? caCORE – What context is this project in? caCORE – What is the Public ID of the “Gene Name” data element?
Semantic Integration Workbench Audience: caCORE SDK UML Model developers/users performing semantic annotation Performs the tasks associated with semantic annotation and review for loading of UML Models into caDSR Benefits: – Users select NCI Thesaurus concepts or existing metadata for UML model annotation Recommended Prerequisites – EVS terms – Enterprise Architect – UML Class Diagram as your domain model
SIW in the caCORE SDK Workflow 1. Design system and draw model (UML tool) 2. Perform Semantic Integration (SIW - Semantic Integration Workbench) 3. Register metadata (UML Loader) 4. Generate and deploy system (Code Generator)
Using the Semantic Integration Workbench SIW Viewer Window UML Entities Mapped Concept
NCICB Application Support Live Support: Monday – Friday 8 am – 8 pm Eastern Time – Telephone support is available Monday to Friday, 8 am – 8 pm Eastern Time, excluding government holidays. – You may leave a message, send an email, or submit a support request via the Web at any time. Phone: Toll-free: Web:
Questions