NCI Enterprise Vocabulary Services (EVS) and Semantic Integration at NCI: An Overview
Frank Hartel, PhD, Enterprise Vocabulary Services, National Cancer Institute

2 Outline
- Terminology management and semantic integration at NCI
- NCI Enterprise Vocabulary Services
- NCI Thesaurus (NCIt)
- NCI Metathesaurus (NCI Meta)
- Collaborations

3 NCI biomedical informatics Goal: A virtual web of interconnected data, individuals, and organizations redefines how research is conducted, care is provided, and patients/participants interact with the biomedical research enterprise

4 Interoperability
- in·ter·op·er·a·bil·i·ty: ability of a system... to use the parts or equipment of another system. Source: Merriam-Webster web site
- interoperability: ability of two or more systems or components to exchange information and to use the information that has been exchanged. Source: IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries, IEEE, 1990
- Interoperability: semantic interoperability, syntactic interoperability
Courtesy: Charlie Mead

5 No Controlled Terminology? No Interoperability
- Systems cannot exchange or use information if they use incompatible codes or tokens to signify meaning
- Terminology services provide tokens and codes
- Proper use of them assures consistent meaning across the enterprise
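
The point of this slide can be illustrated with a minimal Python sketch. It assumes two hypothetical sites that each keep their own local tokens and a lookup table from those tokens to a shared concept code. The code C19151 and the synonyms "MET" and "Tumor Cell Migration" come from the Metastasis example later in the deck; the table names and the function are illustrative only and are not part of any EVS API.

```python
# Hypothetical per-site lookup tables from local tokens to a shared concept code.
# C19151 is the NCI Thesaurus code shown for "Metastasis" later in this deck.
SITE_A_TO_CONCEPT = {"MET": "C19151"}
SITE_B_TO_CONCEPT = {"tumor cell migration": "C19151"}


def same_meaning(token_a: str, token_b: str) -> bool:
    """Return True only when both local tokens resolve to the same shared concept."""
    concept_a = SITE_A_TO_CONCEPT.get(token_a)
    concept_b = SITE_B_TO_CONCEPT.get(token_b.lower())
    # A token with no mapping cannot be compared, so it never matches.
    return concept_a is not None and concept_a == concept_b


if __name__ == "__main__":
    print(same_meaning("MET", "Tumor Cell Migration"))
    print(same_meaning("MET", "unknown term"))
```

Without the shared code, the two sites would have no reliable way to tell that their different tokens refer to the same thing.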

6 Can it be done? caCORE - An Example
[Diagram labels: vocabulary for CDE specification; dictionary, thesaurus, ontology services via caBIO API; domain object metadata; common data elements (CDEs); public APIs; via downloads]

7 cancer Common Ontologic Representation Environment (caCORE)
- Information integration
- Cross-discipline reasoning
[Diagram layers: biomedical objects, common data elements, controlled vocabulary]

8 Common Data Elements
- Structured data reporting elements
- Precisely defining the questions and answers
- What question are you asking, exactly?
- What are the possible answers, and what do they mean?
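
As a rough illustration of "the question, the possible answers, and what they mean", here is a minimal Python sketch of a data element with an enumerated value domain. The class name, the question text and the permissible values are hypothetical and do not reproduce the caDSR data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CommonDataElement:
    """Hypothetical data element: one precise question plus its allowed answers."""
    question: str
    # Maps each permissible value to its stated meaning.
    permissible_values: Dict[str, str] = field(default_factory=dict)

    def meaning_of(self, answer: str) -> str:
        """Return the meaning of an answer, rejecting values outside the domain."""
        if answer not in self.permissible_values:
            raise ValueError(f"{answer!r} is not a permissible value for: {self.question}")
        return self.permissible_values[answer]


# Illustrative element only; not an actual registered CDE.
metastasis_present = CommonDataElement(
    question="Has the disease metastasized?",
    permissible_values={"Y": "Yes", "N": "No", "U": "Unknown"},
)

if __name__ == "__main__":
    print(metastasis_present.meaning_of("Y"))
```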

9 Biomedical Information Objects
- Data service infrastructure developed using OMG’s Model Driven Architecture approach
- Object models expressed in UML represent actual biomedical research entities such as genes, sequences, chromosomes, cellular pathways, ontologies, clinical protocols, etc.
- The object models form the basis for uniform APIs (Java, SOAP, HTTP-XML, Perl) that provide an abstraction layer and interfaces for developers to access information without worrying about the back-end data stores

10 Binding Data, Metadata to Terminology - caCORE SDK
- UML Modeling Tool (provided by user): information model that will define data classes, attributes and relationships
- Semantic Connector: annotates the UML model with ontology concepts; bridges the world of databases to that of structured semantics
- UML Loader (run by NCI staff): loads the model into the caDSR metadata registry; model and associated semantics are available at runtime
- Code Generator: takes the model and a code template as inputs; creates the ‘caCORE-like’ n-tier software system with Java and Web Services APIs
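
The annotation step described for the Semantic Connector can be sketched in a few lines of Python. This is not the caCORE SDK: the classes, the `annotate` function and the name-based matching rule are all assumptions made for illustration. Only the concept code C19151 (Metastasis) is taken from the deck.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class UmlAttribute:
    """Hypothetical stand-in for one attribute of a UML class."""
    name: str
    concept_code: Optional[str] = None  # filled in by annotation


@dataclass
class UmlClass:
    """Hypothetical stand-in for one class in the user's information model."""
    name: str
    attributes: List[UmlAttribute]


def annotate(model: UmlClass, term_to_code: Dict[str, str]) -> List[str]:
    """Attach concept codes to attributes by name; return the names left unannotated."""
    unmatched = []
    for attr in model.attributes:
        code = term_to_code.get(attr.name.lower())
        if code is None:
            unmatched.append(attr.name)  # needs curator review
        else:
            attr.concept_code = code
    return unmatched


if __name__ == "__main__":
    finding = UmlClass("Finding", [UmlAttribute("Metastasis"), UmlAttribute("localNote")])
    print(annotate(finding, {"metastasis": "C19151"}))
    print(finding.attributes[0])
```

In practice the matching would be reviewed by people rather than done by exact name lookup; the sketch only shows where the concept code ends up on the model.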

11 caCORE SDK

12 Extending Interoperability Beyond the Enterprise: cancer Biomedical Informatics Grid (caBIG)
- Common, widely distributed infrastructure permits the cancer research community to focus on innovation
- Shared vocabulary, data elements, and data models facilitate information exchange
- Collection of interoperable applications developed to a common standard
- Raw cancer research data is available for mining and integration

13 caBIG - facilitate sharing of infrastructure, applications, and data

14 Cancer Center Cancer Center NCI caGrid OTHER caBIG SERVICE PROVIDERS OTHER TOOLKITS

15 [caGrid diagram; recoverable labels follow]
Components: caGRID GUI, Admin, Security, caBIO, caDSR, EVS, caARRAY, rProteomics, other caBIG data resources, other caBIG analysis tools, caBIG Analytical Service, Registry, Query, Invocation, GRAM, Grid client API, Globus, Resource API, OGSA-DAI
- Data source exposed as objects
- Well-defined objects using caDSR / EVS
- Mobius GME for schemas
- Metadata identifies services, objects exposed, relationships between objects, relationships between services
- Standard Grid interfaces
- Standard query language and interface
- Advertisement and Discovery
- Security
- Invocation / Schedule
- Execution / coordination
- Identifiers

16 caGrid Standard Service Metadata
- Common Metadata describes generic information about the Cancer Center providing the service
- Data Service Metadata describes the data exposed using terminology and objects from caDSR/EVS
- Analytical Service Metadata describes the supported operations and their inputs and outputs using terminology and objects from caDSR/EVS
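
One reason to describe services with shared concept codes is that clients can then discover them by meaning. The following Python sketch shows the idea under stated assumptions: the `ServiceMetadata` record, the `discover` function, and the provider names are hypothetical and do not reflect the actual caGrid metadata schema or registry API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceMetadata:
    """Hypothetical, simplified service description."""
    provider: str   # common metadata: the hosting cancer center
    kind: str       # "data" or "analytical"
    concept_codes: List[str] = field(default_factory=list)  # concepts the service exposes or consumes


def discover(registry: List[ServiceMetadata], concept_code: str,
             kind: Optional[str] = None) -> List[ServiceMetadata]:
    """Return services annotated with the concept, optionally filtered by kind."""
    return [
        svc for svc in registry
        if concept_code in svc.concept_codes and (kind is None or svc.kind == kind)
    ]


# Illustrative registry contents; the providers are placeholders.
REGISTRY = [
    ServiceMetadata("Cancer Center A", "data", ["C19151"]),
    ServiceMetadata("Cancer Center B", "analytical", ["C19151"]),
    ServiceMetadata("Cancer Center C", "data", []),
]

if __name__ == "__main__":
    for svc in discover(REGISTRY, "C19151", kind="data"):
        print(svc.provider)
```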

17 Enterprise Vocabulary
- NCI Metathesaurus (cross-maps standard vocabularies/ontologies, e.g. SNOMED, MedDRA, ICD)
  - Semantic integration, inter-vocabulary mapping
  - UMLS Metathesaurus extended with cancer-oriented vocabularies
  - 930,000 concepts, 2,200,000 terms and phrases
  - Mappings among over 50 vocabularies
- NCI Thesaurus
  - Description logic-based
  - 48,000 “Concepts”
  - Concept is the semantic unit
  - Terms are Concept labels (synonymy)
  - Semantic relationships between Concepts
- Other standard terminologies: MedDRA, MGED, SNOMED, GO, etc.

18 NCI builds on EVS via caCORE Infrastructure

19 Production EVS Servers in caCORE

20 Enterprise Vocabulary Services
- Services and resources that address NCI's needs for controlled vocabulary
- A collaboration:
  - NCI Office of Communications: Physician Data Query (PDQ), Cancer Information Service and the NCI web portal
  - NCI Center for Bioinformatics: Bioinformatics Core Infrastructure (caCORE), including metadata repository (caDSR) and object models built using EVS terminology for core semantics

21 NCI EVS Goal: Integration by Meaning
- Clinical, translational, and basic research terminologies have overlapping but specialized needs; EVS therefore helps to:
  - Integrate different conceptual frameworks
  - Create terminological and taxonomic conventions across systems
- Vocabulary products:
  - NCI Thesaurus: an ontology-like terminology
  - NCI Metathesaurus: maps vocabularies
  - External vocabularies maintained and served: MedDRA, HL7, NDF-RT, LOINC, etc.

22 Terminology Development Guidelines
- Develop a content model
- Leverage existing sources where appropriate (VA NDF-RT, RxNorm, LOINC, etc.)
- Develop unique content where needed (cancer genes and diagnoses, drugs and therapies, molecular abnormalities, clinical trial standard terminology, etc.)
- Link to other information sources and standards using URLs where possible (GO, Swissprot, drug formularies, trial protocols)
- Federate, merge or map with other standard terminology for semantic integration

23 NCI Thesaurus (NCIt)
- Reference Terminology for NCI, Partners
- A Federal Standard Terminology
- Broad coverage of the cancer research and clinical domain including prevention and treatment trials:
  - Neoplastic and other Diseases
  - Findings and Abnormalities
  - Anatomy, Tissues, Subcellular Structures
  - Agents, Drugs, Chemicals
  - Genes, Gene Products, Biological Processes
  - Animal Models: Mouse, other
  - Research techniques and management, apparatus, clinical and lab, radiology, imagery

24 NCI Thesaurus (2)
- Published monthly
- Public domain, open content license
- Available on-line and by download (OWL, Ontylog XML, flat files)
- 48,000+ “Concepts”, hierarchically organized
- Description-logic based
- “Roles” establish machine-readable semantic relationships between Concepts, e.g.:
  - “Carcinoma” Clinically_associated_with “Lytic Bone Lesions”
  - “TP53” Gene_associated_with_Disease “Breast Carcinoma”
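
To make "machine-readable semantic relationships" concrete, here is a minimal Python sketch that stores the two role examples from this slide as (source concept, role, target concept) triples and indexes them for lookup. The triple representation and the function names are assumptions for illustration; the Thesaurus itself is distributed as OWL, Ontylog XML and flat files, not in this form.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (source concept, role, target concept)

# The two role assertions quoted on the slide.
ROLES: List[Triple] = [
    ("Carcinoma", "Clinically_associated_with", "Lytic Bone Lesions"),
    ("TP53", "Gene_associated_with_Disease", "Breast Carcinoma"),
]


def build_index(triples: List[Triple]) -> Dict[Tuple[str, str], List[str]]:
    """Group target concepts by (source concept, role) for direct lookup."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for source, role, target in triples:
        index[(source, role)].append(target)
    return dict(index)


INDEX = build_index(ROLES)


def related(concept: str, role: str) -> List[str]:
    """Return the concepts linked from `concept` by `role`, or an empty list."""
    return INDEX.get((concept, role), [])


if __name__ == "__main__":
    print(related("TP53", "Gene_associated_with_Disease"))
```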

25 NCI Thesaurus is Deployed
- (full documentation)
- API: caCORE public access
- Fulfills NCI and collaborators’ needs for controlled vocabulary
- Public domain, open content license

26 Example Concept Details
Concept Details
URI:
Version: August 2005 (05.09e)
Concept: Metastasis
Identifiers:
- name: Metastasis
- code: C19151
Relationships to other concepts:
- Biological_Process_Has_Result_Biological_Process: Tumor Expansion
- Biological_Process_Has_Initiator_Process: Pathologic Process
Information about this concept:
- Synonym: MET
- Synonym: metastasis
- Synonym: Tumor Cell Migration
- Synonym with source data: Metastasis|PT|CADSR
- Synonym with source data: MET|AB|CADSR
- Synonym with source data: Tumor Cell Migration|SY|NCI
- Synonym with source data: Metastasis|PT|NCI
- Synonym with source data: metastasis|SY|NCI-GLOSS|CDR
- NCI_META_CUI: CL CL
- Semantic_Type: Phenomenon or Process
- Related_Lash_Concept: metastasis
- Preferred_Name: Metastasis
- DEFINITION: NCI|Metastasis is the spread or migration of cancer cells from one part of the body (the organ in which it first appeared) to another. The secondary tumor contains cells that are like those in the original (primary) tumor. For example, breast cancer cells may spread (metastasize) to the lungs and cause the growth of a new tumor. When this happens, the disease is called metastatic breast cancer. (NCI)
- Synonym: Metastasis
- DEFINITION: NCI-GLOSS|(meh-TAS-ta-sis) The spread of cancer from one part of the body to another. A tumor formed from cells that have spread is called a secondary tumor, a metastatic tumor, or a metastasis. The secondary tumor contains cells that are like those in the original (primary) tumor. The plural form of metastasis is metastases (meh-TAS-ta-seez).
Superconcepts: Cancer Progression
Subconcepts: Distant Metastasis, Intravascular Metastasis
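
The "Synonym with source data" values above are pipe-delimited. A short Python sketch of how a client might split them follows. The field interpretation (term, term type, source, optional trailing qualifier) is an assumption inferred from the values shown on this slide, not a documented format; the names `SourcedSynonym` and `parse_synonym` are illustrative.

```python
from typing import NamedTuple, Optional


class SourcedSynonym(NamedTuple):
    term: str                 # the label itself
    term_type: str            # e.g. PT, SY, AB as they appear on the slide
    source: str               # contributing source, e.g. NCI, CADSR, NCI-GLOSS
    qualifier: Optional[str]  # any trailing field(s); meaning assumed, not documented here


def parse_synonym(raw: str) -> SourcedSynonym:
    """Split a 'term|type|source[|qualifier]' value into named fields."""
    parts = raw.split("|")
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 pipe-delimited fields, got: {raw!r}")
    # Rejoin anything past the third field so no data is dropped.
    qualifier = "|".join(parts[3:]) or None
    return SourcedSynonym(parts[0], parts[1], parts[2], qualifier)


if __name__ == "__main__":
    for value in ("Metastasis|PT|CADSR", "metastasis|SY|NCI-GLOSS|CDR"):
        print(parse_synonym(value))
```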

27 Other Examples
- Use URI to view details of a drug concept: ser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C620
- Use GUI to search for and view hierarchy
- Fluvastatin Sodium

28 NCI Metathesaurus
- Filtered UMLS Metathesaurus extended with additional required vocabularies
- 930,000+ concepts, 2,200,000 terms and phrases with definitions
- Mappings among over 50 vocabularies
- Extensive synonymy: over 40,000 terms for neoplasms mapped to 7,000 concepts
- Used as online dictionary and thesaurus, for mapping and document indexing

29 NCI Metathesaurus (2)
- Minor releases monthly, major releases twice a year
- Provides a mapped overlap and partial inter-relation of current versions of NCI and partner required vocabularies, e.g. the ICDs, MedDRA, SNOMED, MeSH (NLM Medical Subject Headings), HCPCS (procedures), LOINC (lab values), drug terminologies (VA NDF-RT, AOD, RxNORM, Multum, NCI Thesaurus drugs, etc.)

30

31 EVS Products & Services Are Open
- NCI Thesaurus is Open Content: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/ThesaurusTermsofUse.htm
- NCI Metathesaurus is Mostly Open Source; See Each Source’s License: cesServlet
- NCI EVS Servers Are Freely Accessible. On the Web: Via API:
- All Software Developed by NCI EVS is Public Open Source and Free for the Asking: and

32 EVS Collaborations
Many active collaborations:
- Federal: FDA, VA, CDC, and various NIH Institutes such as NHLBI, NIDCR
- Major standards organizations: HL7, CDISC, W3C, FHA
- Cancer Centers and Cancer Cooperative Groups (caBIG, caGRID)
- Numerous research collaborators such as the Microarray Gene Expression Data Society (MGED Ontology, FuGO)

33 Areas of Collaboration
- FDA (Terminology for Drugs, Devices, and Clinical Trial Terminology Initiatives)
- VA (Drugs, Common Clinical Trials Semantics, Terminology Operations)
- CDC (Cancer Incidence and Prevention, Terminology Operations)
- Cancer Centers (Clinical Trials, Experimental Organism Terminology, Micronutrients, Open Terminology Servers, other (caBIG))
- CDISC/HL7 RCRIM (Clinical Research Data Standards)

34 Contact: Frank Hartel, PhD NCI Center for Bioinformatics