Ontology Web Services from the National Center for Biomedical Ontology. Mark Musen and Nigam Shah

Presentation transcript:

NCBO: Key activities We create and maintain a library of biomedical ontologies. We build tools and Web services to enable the use of ontologies and their derivatives. We collaborate with scientific communities that develop and use ontologies.

Go to BioPortal

Total Monthly Visits to BioPortal

PART-I

Ontology Services: Download, Traverse, Search, Comment
Widgets: Tree-view, Auto-complete, Graph-view
Annotation: Term recognition
Data Access: Fetch “data” annotated with a given term
Mapping Services: Create, Download, Upload
Views

ONTOLOGY SERVICES Accessing, browsing, searching and traversing ontologies in Your application

Code, Specific UI

Wikipathways uses Ontology Services

Biositemaps Editor

VIEWS Custom subsets of large ontologies

Views and Value Sets
Users can contribute their derivatives of BioPortal ontologies, which become first-class objects in BioPortal and can be used as all other ontologies are (e.g., as value sets).
Recently added: a view-extractor service, which enables users to extract a subtree of an ontology in OWL.

Views in BioPortal

MAPPINGS Using NCBO technology to integrate terminologies and ontologies

Mappings
Ontology A: Root, Term-1, Term-2, Term-3, Term-4, Term-5
Ontology B: R, t1, t2, t3, t4, t5, t6, t7
Mapped pairs shown: Term-2 and t1, Term-5 and t5
Upload or Download mapping subsets

Using Mappings for query federation
Direct Mappings FROM (site #1) TO (site #2)
Terms shown: Seizure, Single Seizure, Partial Seizure, Complex Seizure, Seizure NOS, Epilepsy, Temporal Epilepsy, Partial Epilepsy, Convulsion disorder

WIDGETS Using NCBO technology on your web pages

Ontology Widgets
UI components with “BioPortal inside”:
– term-selection widget for a specific ontology
– form fields with auto-complete from a specific BioPortal ontology
– RSS feed for an ontology
– Visualization widget
– Tree widget

ANNOTATOR SERVICE Using Ontologies to Annotate Your Data

Annotation as a Web service Process textual metadata to automatically tag text with as many ontology terms as possible.

Annotator: workflow
“Melanoma is a malignant tumor of melanocytes which are found predominantly in skin but also in the bowel and the eye.”
– 39228/DOID:1909, Melanoma in Human Disease
Transitive closure
– 39228/DOID:191, Melanocytic neoplasm, direct parent of Melanoma in Human Disease
– 39228/DOID: , cell proliferation disease, grandparent of Melanoma in Human Disease
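The transitive-closure step in this workflow can be sketched in a few lines. This is only an illustration under stated assumptions: the `PARENTS` map is a hypothetical in-memory hierarchy built from the three labels on the slide, and `expand_annotations` is an illustrative helper name, not part of the NCBO Annotator interface.

```python
# Hypothetical child -> parents map, using only the labels from the slide.
PARENTS = {
    "Melanoma": ["Melanocytic neoplasm"],
    "Melanocytic neoplasm": ["cell proliferation disease"],
    "cell proliferation disease": [],
}


def ancestors_with_distance(term, parents):
    """Return {ancestor: distance} for every ancestor reachable from term."""
    found = {}
    frontier = [(term, 0)]
    while frontier:
        current, dist = frontier.pop(0)  # breadth-first: nearest ancestors first
        for parent in parents.get(current, []):
            if parent not in found:
                found[parent] = dist + 1
                frontier.append((parent, dist + 1))
    return found


def expand_annotations(direct_terms, parents):
    """Add every ancestor of each directly recognized term.

    Distance 0 marks a direct annotation, 1 a direct parent, 2 a grandparent.
    """
    expanded = {term: 0 for term in direct_terms}
    for term in direct_terms:
        for ancestor, dist in ancestors_with_distance(term, parents).items():
            if ancestor not in expanded or dist < expanded[ancestor]:
                expanded[ancestor] = dist
    return expanded


if __name__ == "__main__":
    for label, dist in expand_annotations(["Melanoma"], PARENTS).items():
        print(dist, label)
```

Keeping the distance alongside each expanded term lets a consumer weight direct annotations above inherited ones.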

Annotator service: multiple ways to access
– Code
– Specific UI
– Excel
– UIMA platform
– Word Add-in to call the Annotator Service?

DATA SERVICE Using Ontologies to Access Public Data

Resource index: The Basic Idea The index can be used for: Search Data mining

Resources index: Example

Resource Index: multiple ways to access
– Code
– Specific UI
– Resource Tab
Resources annotated = 22
Total records = 3.5 million
Direct annotations = … million
After transitive closure = 16.4 billion

PART-II

Use-cases based on ontology services

Sample user needs
– I need to restrict user input to a certain value set
– I need to extract the disease branch from SNOMEDCT
– I need to identify all terms mapped to UMLS CUI C
– I need to code/annotate free-text with ontology terms
  – For data exchange, export to standard formats

Use-cases for users of i2b2

Aim 1: Integrate NCBO services in i2b2
Preliminary results:
– Export any ontology stored in BioPortal into the format used by i2b2’s ontology cell
Future work:
– Make the export code available as a service
– Embed the extraction code into the i2b2 Ontology Cell to “pull” content
– Ensure we have the latest versions of ontologies used by i2b2 and CTSA users (ICD9, ICD10, SNOMEDCT, RXNORM, LOINC, CPT)

Aim 2: Mappings for query federation
Preliminary result:
– Worked out the workflow for using mappings for query translation
– Detailed discussions with the HOM and OpenMDR groups to define use-case and elicit requirements
Future work:
– Use BioPortal as the shared repository for inter-terminology mappings
– Tackle access, IP, performance, and institutional issues
Key features:
– Import outside mappings
– Update mappings when versions change
– Mechanism to curate mappings
– Support proprietary curation and content
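As a rough illustration of query translation with direct mappings, the sketch below rewrites a list of site #1 terms into site #2 terms. The mapping pairs in `DIRECT_MAPPINGS` are hypothetical and chosen only for illustration; the slides do not state which term maps to which. `translate_query` is likewise an illustrative name, not a BioPortal service.

```python
# HYPOTHETICAL direct mappings FROM site #1 TO site #2 (illustration only).
DIRECT_MAPPINGS = {
    "Seizure": ["Convulsion disorder"],
    "Partial Seizure": ["Partial Epilepsy"],
}


def translate_query(terms, mappings):
    """Translate site #1 query terms into site #2 terms.

    Returns (translated, unmapped) so that a caller can report terms
    that still need a curated mapping instead of dropping them silently.
    """
    translated, unmapped = [], []
    for term in terms:
        targets = mappings.get(term)
        if targets:
            for target in targets:
                if target not in translated:  # keep order, avoid duplicates
                    translated.append(target)
        else:
            unmapped.append(term)
    return translated, unmapped


if __name__ == "__main__":
    print(translate_query(["Seizure", "Seizure NOS"], DIRECT_MAPPINGS))
```

Returning the unmapped terms separately reflects the curation need listed among the key features: a missing mapping is a gap to fix, not an empty result.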

Using Mappings for query federation
Direct Mappings FROM (site #1) TO (site #2)
Terms shown: Seizure, Single Seizure, Partial Seizure, Complex Seizure, Seizure NOS, Epilepsy, Temporal Epilepsy, Partial Epilepsy, Convulsion disorder

Use-cases based on automated annotation

Ontology based annotation 20 diseases

Disease card

Linking annotations to data (by Simon Twigger)
RGD genes shown: Tm2d1, Svs4, Hbb, Scgb2a1, Alb

Hbb is_expressed_in rat kidney Tm2d1 is_expressed_in rat kidney

Annotation Analytics

Generic GO based analysis routine
– Get annotations for each gene in list
– Count the occurrence (x) of each annotation term in gene list
– Count the occurrence (y) of that term in some reference set (whole genome?)
– P-value for how “surprising” it is to find x, given y
Diagram labels: Set, Reference, x, y
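The routine above can be sketched as follows. The slide does not name a statistical test; this sketch assumes a one-sided hypergeometric test, which is a common choice for this kind of enrichment analysis. The gene identifiers and term names in the demo data are hypothetical.

```python
from collections import Counter
from math import comb


def term_counts(genes, annotations):
    """Count, for each term, how many of the given genes carry it."""
    counts = Counter()
    for gene in genes:
        counts.update(set(annotations.get(gene, ())))
    return counts


def enrichment_p_value(x, n, y, N):
    """P(X >= x) when n genes are drawn from N reference genes, y of which carry the term.

    Assumption: hypergeometric model (sampling without replacement).
    """
    total = comb(N, n)
    tail = sum(comb(y, k) * comb(N - y, n - k) for k in range(x, min(n, y) + 1))
    return tail / total


def enrich(gene_list, reference, annotations):
    """Return {term: (x, y, p_value)} for every term seen in gene_list."""
    x_counts = term_counts(gene_list, annotations)
    y_counts = term_counts(reference, annotations)
    n, N = len(gene_list), len(reference)
    return {
        term: (x, y_counts[term], enrichment_p_value(x, n, y_counts[term], N))
        for term, x in x_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data: four reference genes, two terms.
    annotations = {"g1": ["T1"], "g2": ["T1", "T2"], "g3": ["T2"], "g4": []}
    reference = ["g1", "g2", "g3", "g4"]
    for term, (x, y, p) in enrich(["g1", "g2"], reference, annotations).items():
        print(term, x, y, round(p, 4))
```

In practice the p-values would also need correction for multiple testing, since one test is run per term.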

Annotation Analytics Landscape
Ontologies: SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs, Chemicals
Sets: Gene Sets, Grant Sets, Paper Sets, Patient Sets, Drug Sets, …
Health Indicator Warehouse datasets

Mutation enrichment

Annotation Analytics Landscape
Ontologies: SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs, Chemicals
Sets: Gene Sets, Grant Sets, Paper Sets, Patient Sets, Drug Sets, …, Mut (? ?)
Health Indicator Warehouse datasets

Ontology neutral enrichment analysis

Set Reference x ?

Using ontologies other than GO
ERCC6 → nucleoplasm
PARP1 → protein N-terminus binding
ERCC6 →
PARP1 →

Enrichment Analysis with the DO
ERCC6 GO: PMID:
PARP1 GO: PMID:
{ERCC6, PARP1} → PMID:
NCBO Annotator: {ERCC6, PARP1} → {Cockayne syndrome, DNA damage}

P35226, P04626, P38646, P50539, O95622, P04150, P07900, Q12805, P01375, P54098, P00533, P02545, P02649, P04637, P05067, P05549, P08047, P08138, P10636, P15692, P25963, P29353, P29590, P49768, P62993, Q00987, Q04206, Q13526, Q16643, Q8N726, P00441, P05019, P05231, P35354, P10909, Q06830, P15502, Q9UEF7, P01137, P04271, O15379, O95831, P09874, Q13315, Q7Z2E3, Q9UNE7, P01127, P01308, P02656, P07203, P09619, P17936, P18031, P19838, P27169, P42771, P45984, Q07869, Q14191, P08069, P68104, P01344, P06400, P09884, P10809, P25445, O43684, P17948, P48507, P28069, P16885, P18146, P35558, Q99683, P18074, P19447, P28715, Q03468, Q13216, Q13888, P16220, P35222, Q16665, P07949, P11362, P01023, P01286, Q9NYJ7, O00555, O15530, P01138, P17252, P31749, P63165, P55851, O76070, P01241, P13232, P16871, P22061, P28340, P31785, P48047, P63279, P48637, P01100, P17535, O14746, O15297, O60934, O96017, P00519, P01106, P04040, P05412, P06493, P07992, P09429, P10415, P11388, P12004, P12956, P13010, P16104, P21675, P23025, P26583, P27361, P27694, P27695, P35249, P35638, P38398, P39748, P40692, P43351, P45983, P49715, P49841, P51587, P54132, P54274, P55072, P60484, P63104, P78527, Q02880, Q05655, Q06609, Q07812, Q13535, Q13547, Q15554, Q16539, Q92769, Q92793, Q92889, Q96EB6, Q96ST3, Q9H3D4, P20700, Q07960, O75360, P10912, P50402, P04179, O75376, O75907, P01116, P17676, P23560, P60568, P62136, P98164, Q14186, Q14289, Q08050, Q00653, Q05195, P42858, Q9GZV9, P48357, P03372, P10275, P15336, P35568, Q02643, Q12778, Q9Y4H2, P06213, P08107, P11142, O60674, P42229, P51692, Q9UJ68, Q02297, P60953, P00749, P55916, Q96G97, P01112, P09211, P09936, P48506, Q15831, P11387, Q13253, O60566, P01133, P10599, P15923, P19235, P20226, P20248, P27986, P40763, P42338, P61244, P62979, Q05397, Q06124, Q09472, Q14526, Q15648, Q9UBK2, O60381, O94761, P29279, Q9UBX0, P42345, Q01094, P06746, Q8N6T7, O43524, P50542, O00327, O15120, O15217, O15243, O15516, O75844, O95985, P00390, P00395, P09629, P13639, P20382, P25874, P32745, 
P36969, P61278, P62987, P78406, P98177, Q00613, Q13219, Q99643, Q99807, Q9UBI1
Profiling a set of Aging genes: Ageing-related genes (261)

Profiling patient sets
Patient Reports: ICD (Abdominal pain, unspecified)
Patient records processed from the U. Pittsburgh NLP Repository with IRB approval.

Annotation Analytics Landscape
Ontologies: SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs, Chemicals
Sets: Gene Sets, Grant Sets, Paper Sets, Aging, Patient Sets, Drug Sets, …, EMRs, Mut
Health Indicator Warehouse datasets
What questions can we ask?

ANNOTATION ANALYTICS - II Analysis of semantically tagged data from electronic health records

Annotation Workflow
Creating clean lexicons: BioPortal knowledge graph (Diseases, Procedures, Drugs) → Term-1 … Term-n, filtered by syntactic types and frequency
Term recognition tool: NCBO Annotator (text clinical note → terms recognized)
Negation detection: NegEx Patterns, NegEx Rules
Generation of tagged data: for each patient (P1, P2, P3, … Pn), ICD9 codes plus tagged notes, e.g. P1: “T1, T2, no T4”, “T5, T4, T3”, “T4, T3, T1”, “T8, T9, T4”, “T6, T8, T10”
Terms form a temporal series of tags → Cohort of Interest → Further Analysis
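The term-recognition and negation-detection steps can be illustrated with a deliberately small sketch. It is not the NCBO Annotator and not the NegEx rule set: the lexicon, the three trigger words and the five-token look-back window are all assumptions made for the example.

```python
import re

LEXICON = ["chest pain", "fever", "cough"]       # hypothetical clean lexicon
NEGATION_TRIGGERS = {"no", "denies", "without"}  # tiny illustrative subset
WINDOW = 5  # assumption: a trigger negates terms up to 5 tokens after it


def tag_note(text, lexicon=LEXICON):
    """Return [(term, negated)] for every lexicon term found in the note."""
    tags = []
    # Negation scope is limited to a sentence, so split on sentence breaks first.
    for sentence in re.split(r"[.;\n]+", text.lower()):
        tokens = re.findall(r"[a-z0-9]+", sentence)
        for term in lexicon:
            term_tokens = term.split()
            size = len(term_tokens)
            for i in range(len(tokens) - size + 1):
                if tokens[i:i + size] == term_tokens:
                    preceding = tokens[max(0, i - WINDOW):i]
                    negated = any(tok in NEGATION_TRIGGERS for tok in preceding)
                    tags.append((term, negated))
    return tags


if __name__ == "__main__":
    note = "Patient reports fever. No cough or chest pain."
    for term, negated in tag_note(note):
        print(("no " if negated else "") + term)
```

Running the same function over a patient's notes in date order would give the kind of temporal series of tags described above, with negated mentions kept separate from positive ones.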

Adverse drug events
ROR of 2.058, CI of [1.804, 2.349]; PRR of 1.828, CI of [1.645, 2.032]. The uncorrected χ² statistic has p-value <
ROR = 1.524, CI = [0.872, 2.666]; PRR = 1.508, CI = [0.8768, 2.594]; χ² p-value =
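For readers unfamiliar with these measures, the sketch below computes a reporting odds ratio (ROR), a proportional reporting ratio (PRR), their 95% confidence intervals and an uncorrected chi-square p-value from a 2x2 table, using the standard textbook formulas. The counts in the demo are hypothetical and are not the data behind the figures above.

```python
from math import erfc, exp, log, sqrt


def disproportionality(a, b, c, d, z=1.96):
    """Compute ROR, PRR and an uncorrected chi-square test for a 2x2 table.

    a: drug and event      b: drug, no event
    c: no drug, event      d: no drug, no event
    All four cells must be non-zero.
    """
    ror = (a * d) / (b * c)
    se_ror = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    prr = (a / (a + b)) / (c / (c + d))
    se_prr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    def interval(estimate, se):
        # The interval is symmetric on the log scale.
        return exp(log(estimate) - z * se), exp(log(estimate) + z * se)

    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # chi-square survival function, 1 degree of freedom
    return {
        "ROR": (ror, interval(ror, se_ror)),
        "PRR": (prr, interval(prr, se_prr)),
        "chi2": chi2,
        "p_value": p_value,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in disproportionality(20, 80, 10, 90).items():
        print(name, value)
```

A confidence interval that excludes 1 indicates a disproportionality signal, while one that includes 1 does not.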

Off-label use

Analyses on semantically tagged data
Ontologies: SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs, Chemicals
Sets: Gene Sets, Grant Sets, Paper Sets, Aging, Patient Sets, Drug Sets, …, EMRs, Mut
Health Indicator Warehouse datasets
1. Discovering or predicting adverse drug events
2. Predicting a labeled outcome (readmissions)
3. Learning associations between terms of type intervention, disease, finding, side effects, drugs
4. Predicting rejection rates in billing/claims processing
5. Learning off-label usage patterns

THE END

Credits: Mark Musen, PI. The NIH Roadmap grant U54 HG004028