Ontology Web Services from the National Center for Biomedical Ontology
Mark Musen and Nigam Shah {musen,
NCBO: Key activities
– We create and maintain a library of biomedical ontologies.
– We build tools and Web services to enable the use of ontologies and their derivatives.
– We collaborate with scientific communities that develop and use ontologies.
Go to BioPortal
Total Monthly Visits to BioPortal
PART-I
Ontology Services: Download, Traverse, Search, Comment
Widgets: Tree-view, Auto-complete, Graph-view
Annotation: Term recognition
Data Access: Fetch “data” annotated with a given term
Mapping Services: Create, Download, Upload
Views
ONTOLOGY SERVICES
Accessing, browsing, searching and traversing ontologies in your application
Code, Specific UI
Wikipathways uses Ontology Services
Biositemaps Editor
VIEWS
Custom subsets of large ontologies
Views and Value Sets
– Users can contribute derivatives of BioPortal ontologies. These become first-class objects in BioPortal and can be used like any other ontology (e.g., as value sets).
– Recently added: a view-extractor service, which enables users to extract a subtree of an ontology in OWL.
Views in BioPortal
MAPPINGS
Using NCBO technology to integrate terminologies and ontologies
Mappings
[Figure: terms of Ontology A (Root, Term-1 to Term-5) mapped to terms of Ontology B (R, t1 to t7), for example Term-2 to t1 and Term-5 to t5. Mapping subsets can be uploaded or downloaded.]
Using Mappings for query federation
[Figure: direct mappings FROM (site #1) TO (site #2). Terms shown: Seizure, Single Seizure, Partial Seizure, Complex Seizure, Seizure NOS, Epilepsy, Temporal Epilepsy, Partial Epilepsy, Convulsion disorder.]
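To make the query-federation idea concrete, the following is a minimal sketch of rewriting a query from one site's terminology into another's through direct mappings. The mapping pairs and the function name `translate_query` are illustrative assumptions; they are not taken from BioPortal and do not represent an NCBO API.

```python
# Hypothetical direct mappings FROM site #1 terms TO site #2 terms.
# These pairs are invented for illustration; real mappings would be
# downloaded from a shared mapping repository.
DIRECT_MAPPINGS = {
    "Single Seizure": ["Single Seizure"],
    "Partial Seizure": ["Partial Epilepsy"],
    "Seizure": ["Convulsion disorder"],
}


def translate_query(terms, mappings):
    """Rewrite site #1 query terms into site #2 terms.

    Returns (translated, unmapped): the sorted list of target terms and
    the list of source terms for which no mapping was found.
    """
    translated = set()
    unmapped = []
    for term in terms:
        targets = mappings.get(term)
        if targets:
            translated.update(targets)
        else:
            unmapped.append(term)
    return sorted(translated), unmapped


translated, unmapped = translate_query(
    ["Seizure", "Partial Seizure", "Seizure NOS"], DIRECT_MAPPINGS
)
print("Query at site #2:", translated)
print("No mapping for:", unmapped)
```

Returning the unmapped terms separately keeps gaps in the mapping set visible, so a federated query does not silently drop part of the original request.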
WIDGETS
Using NCBO technology on your web pages
Ontology Widgets
UI components with “BioPortal inside”:
– Term-selection widget for a specific ontology
– Form fields with auto-complete from a specific BioPortal ontology
– RSS feed for an ontology
– Visualization widget
– Tree widget
ANNOTATOR SERVICE
Using Ontologies to Annotate Your Data
Annotation as a Web service
Process textual metadata to automatically tag text with as many ontology terms as possible.
Annotator: workflow
“Melanoma is a malignant tumor of melanocytes which are found predominantly in skin but also in the bowel and the eye.”
– 39228/DOID:1909, Melanoma in Human Disease
Transitive closure
– 39228/DOID:191, Melanocytic neoplasm, direct parent of Melanoma in Human Disease
– 39228/DOID: , cell proliferation disease, grandparent of Melanoma in Human Disease
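The transitive-closure step in the workflow above can be sketched as a walk up the is-a hierarchy from each directly recognized term. The hierarchy below is a hypothetical three-term fragment built from the names in the Melanoma example (it assumes the grandparent is the direct parent of Melanocytic neoplasm), and `expand_with_ancestors` is an illustrative name, not part of the Annotator service.

```python
from collections import deque

# Hypothetical is-a fragment (child -> direct parents), using the term
# names from the Melanoma example. A real hierarchy would come from the
# ontology itself.
PARENTS = {
    "Melanoma": ["Melanocytic neoplasm"],
    "Melanocytic neoplasm": ["cell proliferation disease"],
    "cell proliferation disease": [],
}


def expand_with_ancestors(direct_terms, parents):
    """Map each direct term (distance 0) and every ancestor to its
    shortest distance from a directly recognized term."""
    distance = {term: 0 for term in direct_terms}
    queue = deque(direct_terms)
    while queue:
        term = queue.popleft()
        for parent in parents.get(term, []):
            if parent not in distance:  # first visit is the shortest path
                distance[parent] = distance[term] + 1
                queue.append(parent)
    return distance


for term, dist in expand_with_ancestors(["Melanoma"], PARENTS).items():
    print(dist, term)
```

Keeping the distance alongside each expanded term lets a consumer weight a direct annotation more heavily than one inferred from a distant ancestor.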
Annotator service: multiple ways to access
– Code
– Specific UI
– Excel
– UIMA platform
– Word Add-in to call the Annotator Service?
DATA SERVICE
Using Ontologies to Access Public Data
Resource index: The Basic Idea
The index can be used for:
– Search
– Data mining
Resource index: Example
Resource Index: multiple ways to access
– Code
– Specific UI
– Resource Tab
Resources annotated = 22
Total records = 3.5 million
Direct annotations = … million
After transitive closure = 16.4 billion
PART-II
Use-cases based on ontology services
Sample user needs
– I need to restrict user input to a certain value set
– I need to extract the disease branch from SNOMEDCT
– I need to identify all terms mapped to UMLS CUI C
– I need to code/annotate free-text with ontology terms
   – For data exchange, export to standard formats
Use-cases for users of i2b2
Aim 1: Integrate NCBO services in i2b2
Preliminary results:
– Export any ontology stored in BioPortal into the format used by i2b2’s ontology cell
Future work:
– Make the export code available as a service
– Embed the extraction code into the i2b2 Ontology Cell to “pull” content
– Ensure we have the latest versions of ontologies used by i2b2 and CTSA users (ICD9, ICD10, SNOMEDCT, RXNORM, LOINC, CPT)
Aim 2: Mappings for query federation
Preliminary results:
– Worked out the workflow for using mappings for query translation
– Held detailed discussions with the HOM and OpenMDR groups to define the use case and elicit requirements
Future work:
– Use BioPortal as the shared repository for inter-terminology mappings
– Tackle access, IP, performance, and institutional issues
Key features:
– Import outside mappings
– Update mappings when versions change
– Mechanism to curate mappings
– Support proprietary curation and content
Use-cases based on automated annotation
Ontology based annotation 20 diseases
Disease card
Linking annotations to data (by Simon Twigger)
[Figure labels: Tm2d1, RGD, Svs4, Hbb, Scgb2a1, Alb]
Hbb is_expressed_in rat kidney
Tm2d1 is_expressed_in rat kidney
Annotation Analytics
Generic GO-based analysis routine
– Get annotations for each gene in the list
– Count the occurrences (x) of each annotation term in the gene list
– Count the occurrences (y) of that term in some reference set (whole genome?)
– Compute a p-value for how “surprising” it is to find x, given y
[Figure labels: Set, Reference, x, y]
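The routine above can be sketched in a few lines. The slide does not say which test produces the p-value; this sketch assumes the commonly used one-sided hypergeometric test and assumes the gene list is a subset of the reference set. The gene names, term labels, and function names are hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_tail(x, n, y, N):
    """P(X >= x) when n genes are drawn from N, of which y carry the term."""
    upper = min(n, y)
    favourable = sum(comb(y, k) * comb(N - y, n - k) for k in range(x, upper + 1))
    return favourable / comb(N, n)


def enrichment(gene_list, reference, annotations):
    """Return {term: p-value} for every term annotated in gene_list.

    annotations maps gene -> iterable of terms. gene_list is assumed to
    be a subset of reference.
    """
    def term_counts(genes):
        # Count each term at most once per gene.
        return Counter(t for g in genes for t in set(annotations.get(g, ())))

    x_counts = term_counts(gene_list)   # x: occurrences in the gene list
    y_counts = term_counts(reference)   # y: occurrences in the reference set
    n, N = len(gene_list), len(reference)
    return {t: hypergeom_tail(x, n, y_counts[t], N) for t, x in x_counts.items()}


# Toy data, invented for illustration: 10 reference genes, term "A" on
# three of them, term "B" on all of them.
reference = [f"g{i}" for i in range(10)]
annotations = {g: ["B"] for g in reference}
for g in ("g0", "g1", "g2"):
    annotations[g] = ["A", "B"]

p_values = enrichment(["g0", "g1"], reference, annotations)
for term, p in sorted(p_values.items()):
    print(term, round(p, 4))
```

A term carried by every reference gene gets a p-value of 1, which is the expected behavior: finding it in the gene list is not surprising at all. In practice the resulting p-values would also need correction for multiple testing.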
Annotation Analytics Landscape
[Figure: grid of ontologies (SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs, Chemicals) against annotated sets (Gene Sets, Grant Sets, Paper Sets, Patient Sets, Drug Sets, ..., Health Indicator Warehouse datasets)]
Mutation enrichment
Annotation Analytics Landscape
[Figure: grid of ontologies (SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs, Chemicals) against annotated sets (Gene Sets, Grant Sets, Paper Sets, Patient Sets, Drug Sets, ..., Health Indicator Warehouse datasets). Cells marked: Mut, ?, ?]
Ontology neutral enrichment analysis
[Figure labels: Set, Reference, x, ?]
Using ontologies other than GO
[Figure: genes ERCC6 and PARP1 with their GO annotations, ERCC6 with nucleoplasm and PARP1 with protein N-terminus binding]
Enrichment Analysis with the DO
[Figure: GO annotation records for ERCC6 and PARP1 (each with a GO: identifier and a PMID: reference) are passed through the NCBO Annotator, linking {ERCC6, PARP1} to {Cockayne syndrome, DNA damage}]
P35226, P04626, P38646, P50539, O95622, P04150, P07900, Q12805, P01375, P54098, P00533, P02545, P02649, P04637, P05067, P05549, P08047, P08138, P10636, P15692, P25963, P29353, P29590, P49768, P62993, Q00987, Q04206, Q13526, Q16643, Q8N726, P00441, P05019, P05231, P35354, P10909, Q06830, P15502, Q9UEF7, P01137, P04271, O15379, O95831, P09874, Q13315, Q7Z2E3, Q9UNE7, P01127, P01308, P02656, P07203, P09619, P17936, P18031, P19838, P27169, P42771, P45984, Q07869, Q14191, P08069, P68104, P01344, P06400, P09884, P10809, P25445, O43684, P17948, P48507, P28069, P16885, P18146, P35558, Q99683, P18074, P19447, P28715, Q03468, Q13216, Q13888, P16220, P35222, Q16665, P07949, P11362, P01023, P01286, Q9NYJ7, O00555, O15530, P01138, P17252, P31749, P63165, P55851, O76070, P01241, P13232, P16871, P22061, P28340, P31785, P48047, P63279, P48637, P01100, P17535, O14746, O15297, O60934, O96017, P00519, P01106, P04040, P05412, P06493, P07992, P09429, P10415, P11388, P12004, P12956, P13010, P16104, P21675, P23025, P26583, P27361, P27694, P27695, P35249, P35638, P38398, P39748, P40692, P43351, P45983, P49715, P49841, P51587, P54132, P54274, P55072, P60484, P63104, P78527, Q02880, Q05655, Q06609, Q07812, Q13535, Q13547, Q15554, Q16539, Q92769, Q92793, Q92889, Q96EB6, Q96ST3, Q9H3D4, P20700, Q07960, O75360, P10912, P50402, P04179, O75376, O75907, P01116, P17676, P23560, P60568, P62136, P98164, Q14186, Q14289, Q08050, Q00653, Q05195, P42858, Q9GZV9, P48357, P03372, P10275, P15336, P35568, Q02643, Q12778, Q9Y4H2, P06213, P08107, P11142, O60674, P42229, P51692, Q9UJ68, Q02297, P60953, P00749, P55916, Q96G97, P01112, P09211, P09936, P48506, Q15831, P11387, Q13253, O60566, P01133, P10599, P15923, P19235, P20226, P20248, P27986, P40763, P42338, P61244, P62979, Q05397, Q06124, Q09472, Q14526, Q15648, Q9UBK2, O60381, O94761, P29279, Q9UBX0, P42345, Q01094, P06746, Q8N6T7, O43524, P50542, O00327, O15120, O15217, O15243, O15516, O75844, O95985, P00390, P00395, P09629, P13639, P20382, P25874, P32745, 
P36969, P61278, P62987, P78406, P98177, Q00613, Q13219, Q99643, Q99807, Q9UBI1
Profiling a set of Aging genes
Ageing-related genes (261) –
Profiling patient sets
Patient Reports: ICD (Abdominal pain, unspecified)
Patient records processed from the U. Pittsburgh NLP Repository with IRB approval.
Annotation Analytics Landscape
[Figure: grid of ontologies (SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs, Chemicals) against annotated sets (Gene Sets, Grant Sets, Paper Sets, Patient Sets, Drug Sets, ..., Health Indicator Warehouse datasets). Cells marked: Aging, EMRs, Mut]
What questions can we ask?
ANNOTATION ANALYTICS - II
Analysis of semantically tagged data from electronic health records
Annotation Workflow
– Creating clean lexicons: terms (Term-1 ... Term-n) from the BioPortal knowledge graph (Diseases, Procedures, Drugs), filtered by syntactic types and frequency
– Term recognition tool: NCBO Annotator
– Negation detection: NegEx patterns, NegEx rules
– Generation of tagged data: text clinical note, then terms recognized, then negation detection
– Terms form a temporal series of tags for each patient (P1, P2, P3, ..., Pn), for example P1: ICD9; “T1, T2, no T4”, “T5, T4, T3”, ...
– Cohort of Interest
– Further Analysis
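The tagging step of the workflow above (recognize terms, then flag negated mentions) can be illustrated with a deliberately simplified sketch. This is not NegEx and not the NCBO Annotator: it assumes single-word lexicon terms, three hard-coded trigger words, and a fixed forward window, all chosen only for illustration.

```python
import re

# Tiny illustrative subset of negation triggers; NegEx uses a much
# larger, curated list with additional rule types.
NEGATION_TRIGGERS = {"no", "denies", "without"}
WINDOW = 5  # tokens after a trigger that are treated as negated (assumption)


def tag_note(text, lexicon):
    """Return [(term, negated)] for lexicon terms found in text, in order.

    Only single-word, lowercase lexicon terms are supported.
    """
    tags = []
    for sentence in re.split(r"[.;\n]", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        negated_until = -1  # index of the last token still under negation
        for i, token in enumerate(tokens):
            if token in NEGATION_TRIGGERS:
                negated_until = i + WINDOW
            elif token in lexicon:
                tags.append((token, i <= negated_until))
    return tags


note = "Patient reports nausea and vomiting. No fever or rash."
for term, negated in tag_note(note, {"nausea", "vomiting", "fever", "rash"}):
    print(("no " if negated else "") + term)
```

Applying the same function to each note of a patient, in date order, yields the kind of temporal series of tags described in the workflow.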
Adverse drug events
ROR of 2.058, CI of [1.804, 2.349]
PRR of 1.828, CI of [1.645, 2.032]
The uncorrected χ² statistic has p-value <
ROR=1.524, CI=[0.872, 2.666]
PRR=1.508, CI=[0.8768, 2.594]
χ² p-value=
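For readers unfamiliar with the two disproportionality measures quoted above, the following sketch computes the reporting odds ratio (ROR) and proportional reporting ratio (PRR) from a 2x2 table, with 95% confidence intervals from the usual log-scale normal approximation. The counts are invented and do not reproduce the slide's figures; the function name `ror_prr` is hypothetical.

```python
from math import exp, log, sqrt

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def ror_prr(a, b, c, d, z=Z_95):
    """ROR and PRR with confidence intervals from a 2x2 table.

    a: exposed to drug, event observed      b: exposed, no event
    c: not exposed, event observed          d: not exposed, no event
    All four counts must be non-zero.
    """
    ror = (a * d) / (b * c)
    se_log_ror = sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    prr = (a / (a + b)) / (c / (c + d))
    se_log_prr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    def interval(estimate, se):
        return (exp(log(estimate) - z * se), exp(log(estimate) + z * se))

    return {
        "ROR": (ror, interval(ror, se_log_ror)),
        "PRR": (prr, interval(prr, se_log_prr)),
    }


# Invented counts, for illustration only.
result = ror_prr(a=20, b=80, c=10, d=90)
for name, (estimate, (low, high)) in result.items():
    print(f"{name}={estimate:.3f}, CI=[{low:.3f}, {high:.3f}]")
```

An interval that excludes 1 is conventionally read as a signal of disproportionate reporting, while an interval that contains 1 is read as no signal.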
Off-label use
Analyses on semantically tagged data
[Figure: grid of ontologies (SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs, Chemicals) against annotated sets (Gene Sets, Grant Sets, Paper Sets, Patient Sets, Drug Sets, ..., Health Indicator Warehouse datasets). Cells marked: Aging, EMRs, Mut]
1. Discovering or predicting adverse drug events
2. Predicting a labeled outcome (readmissions)
3. Learning associations between terms of type intervention, disease, finding, side effects, drugs
4. Predicting rejection rates in billing/claims processing
5. Learning off-label usage patterns
THE END
Credits
Mark Musen, PI
The NIH Roadmap grant U54 HG004028