Using ontologies to make sense of unstructured medical data
Nigam Shah, MBBS, PhD

NCBO: Key activities
We create and maintain a library of biomedical ontologies.
We build tools and Web services to enable the use of ontologies and their derivatives.
We collaborate with scientific communities that develop and use ontologies.

Ontology Services: Download, Traverse, Search, Comment
Widgets: Tree-view, Auto-complete, Graph-view
Annotation: Term recognition
Data Access: Fetch “data” annotated with a given term
Mapping Services: Create, Download, Upload
Views: Create, Download, Upload

Annotation service
Processes textual metadata to automatically tag text with as many ontology terms as possible.
90 million calls, ~700 GB of data
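Tagging text with ontology terms is, at its core, dictionary-based concept recognition. The sketch below shows that idea with a greedy longest-match over word tokens. The lexicon, the concept IDs, and the matching strategy are illustrative assumptions, not the NCBO Annotator's actual implementation or API.

```python
import re

# Hypothetical lexicon: lower-cased term string -> made-up ontology concept ID.
LEXICON = {
    "myocardial infarction": "DEMO:0001",
    "infarction": "DEMO:0002",
    "aspirin": "DEMO:0003",
    "chest pain": "DEMO:0004",
}
MAX_TERM_WORDS = max(len(term.split()) for term in LEXICON)


def annotate(text):
    """Tag text with lexicon terms using greedy longest-match over word tokens.

    Returns a list of (start_char, end_char, matched_text, concept_id).
    """
    tokens = [(m.start(), m.end(), m.group().lower()) for m in re.finditer(r"\w+", text)]
    annotations = []
    i = 0
    while i < len(tokens):
        matched = None
        # Try the longest candidate phrase first, so "myocardial infarction"
        # wins over the shorter "infarction".
        for n in range(min(MAX_TERM_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for _, _, tok in tokens[i:i + n])
            if phrase in LEXICON:
                matched = (n, phrase)
                break
        if matched is None:
            i += 1
            continue
        n, phrase = matched
        start, end = tokens[i][0], tokens[i + n - 1][1]
        annotations.append((start, end, text[start:end], LEXICON[phrase]))
        i += n  # skip past the matched phrase
    return annotations


print(annotate("Patient with chest pain; rule out myocardial infarction. Given aspirin."))
```

A production annotator would also handle synonyms, stemming, and ontology-specific preferred labels; the longest-match rule here only illustrates why overlapping terms need a tie-breaking policy.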

Resource Index
Resources: PubMed abstracts, Adverse Events (AERS), GEO, …, Clinical Trials, DrugBank
Won 1st prize at the 2010 Semantic Web Challenge (ISWC).
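The Resource Index lets users fetch records annotated with a given term. A minimal sketch of the underlying idea, assuming an inverted index from concepts to record IDs plus a toy is_a hierarchy, so that a query for a general concept also returns records tagged with its descendants. The concepts, record IDs, and expansion rule are hypothetical.

```python
from collections import defaultdict

# Hypothetical is_a hierarchy: child concept -> list of parent concepts.
IS_A = {
    "myocardial infarction": ["heart disease"],
    "heart disease": ["disease"],
    "cockayne syndrome": ["disease"],
}


def ancestors(concept):
    """Return all transitive is_a ancestors of a concept."""
    seen, stack = set(), list(IS_A.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def build_index(annotated_records):
    """Map each concept (and each of its ancestors) to the IDs of records tagged with it."""
    index = defaultdict(set)
    for record_id, concepts in annotated_records.items():
        for concept in concepts:
            for c in {concept} | ancestors(concept):
                index[c].add(record_id)
    return index


# Hypothetical annotated records from different resources.
records = {
    "pubmed:A": {"myocardial infarction"},
    "aers:B": {"heart disease"},
    "geo:C": {"cockayne syndrome"},
}
index = build_index(records)
print(sorted(index["heart disease"]))  # records tagged with heart disease or a descendant
```

Expanding at indexing time makes lookups a single dictionary access, at the cost of a larger index; expanding at query time is the usual alternative.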

Creating Lexicons
[Figure: a frequency counter runs terms (Term-1 … Term-n) against the sentences of clinical notes (Sentence-1 … Sentence-m) and records, for each term ID, its term frequency (tf), document frequency (df), and a value for each syntactic type (NN, JJ, …, VP, …).]
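The frequency counter can be sketched as below. This assumes sentences are already tokenized and part-of-speech tagged with Penn Treebank tags (NN, JJ, VB, …), treats each sentence as the unit for df, takes the syntactic type of a multi-word term from its last token, and applies a made-up cleaning rule (keep terms mostly used as nouns or adjectives). The slide does not state which rule was actually used.

```python
from collections import Counter


def count_terms(terms, tagged_sentences):
    """For each term compute tf (occurrences), df (sentences containing it),
    and the fraction of occurrences carrying each syntactic type.

    tagged_sentences: list of sentences, each a list of (token, pos_tag) pairs.
    Assumption: a multi-word match takes the tag of its last token.
    """
    stats = {}
    for term in terms:
        words = term.lower().split()
        n = len(words)
        tf, df = 0, 0
        types = Counter()
        for sentence in tagged_sentences:
            tokens = [tok.lower() for tok, _ in sentence]
            hits = 0
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == words:
                    hits += 1
                    types[sentence[i + n - 1][1]] += 1
            tf += hits
            df += 1 if hits else 0
        fractions = {t: c / tf for t, c in types.items()} if tf else {}
        stats[term] = {"tf": tf, "df": df, "types": fractions}
    return stats


def clean_lexicon(stats, keep_types=("NN", "NNS", "JJ"), min_fraction=0.5):
    """Hypothetical cleaning rule: keep terms mostly used as nouns/adjectives."""
    return [term for term, s in stats.items()
            if s["tf"] and sum(f for t, f in s["types"].items() if t in keep_types) >= min_fraction]


# Made-up tagged sentences; "lead" is ambiguous (metal vs. verb).
sentences = [
    [("Patient", "NN"), ("has", "VBZ"), ("a", "DT"), ("cold", "NN")],
    [("Hands", "NNS"), ("feel", "VBP"), ("cold", "JJ")],
    [("Patient", "NN"), ("will", "MD"), ("lead", "VB"), ("the", "DT"), ("walk", "NN")],
    [("Blood", "NN"), ("lead", "NN"), ("level", "NN"), ("normal", "JJ")],
    [("They", "PRP"), ("lead", "VBP"), ("well", "RB")],
]
stats = count_terms(["cold", "lead"], sentences)
print(stats["lead"])
print(clean_lexicon(stats))
```

The point of such a filter is that terms which are usually verbs or function words in clinical text produce many false-positive annotations.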

Annotation Analytics
Analyzing tagged data for hypothesis generation in bioinformatics

Generic GO-based analysis routine
[Figure: a study set of genes drawn from a reference set (the genome).]
1. Get the annotations for each gene in the study set.
2. Count the occurrences of each annotation term in the study set.
3. Count the occurrences of that term in some reference set (the whole genome?).
4. Compute a p-value for how surprising their overlap is.
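Step 4 is commonly implemented with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the slide does not name the test, so treat this as an assumed choice. With N genes in the reference set, K of them annotated with a term, n genes in the study set, and k of those annotated, the p-value is P(X ≥ k). The genes and terms below are hypothetical.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the reference set (e.g. whole genome)
    K: reference genes annotated with the term
    n: genes in the study set
    k: study genes annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrich(study_genes, reference_genes, annotations):
    """Count each term in the study and reference sets and score the overlap.

    annotations: gene -> set of ontology terms. Returns (term, k, K, p) sorted by p.
    """
    study = set(study_genes) & set(reference_genes)
    N, n = len(reference_genes), len(study)
    terms = {t for g in study for t in annotations.get(g, ())}
    results = []
    for term in terms:
        K = sum(1 for g in reference_genes if term in annotations.get(g, ()))
        k = sum(1 for g in study if term in annotations.get(g, ()))
        results.append((term, k, K, enrichment_p_value(N, K, n, k)))
    return sorted(results, key=lambda r: r[3])


# Hypothetical example: 10 reference genes, term T1 on 3 genes, T2 on 8 genes.
reference = [f"g{i}" for i in range(10)]
annotations = {g: {"T2"} for g in reference[:8]}
for g in reference[:3]:
    annotations[g].add("T1")
print(enrich(["g0", "g1", "g2"], reference, annotations))
```

Because every term in the study set is tested, the raw p-values would normally be corrected for multiple testing (e.g. Bonferroni or false discovery rate) before interpretation; that step is omitted here.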

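Annotation Analytics Landscape
[Figure: ontologies (SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs/Chemicals) alongside the sets they can annotate (gene sets, grant sets, paper sets, patient sets, drug sets, …, Health Indicator Warehouse datasets); some combinations are marked “?”.]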

Open questions
1. Can we use something other than the GO?
2. Lack of annotations: even today, roughly 20% of genes lack any GO annotation.
3. Annotation bias: annotations with certain ontology terms are not independent of each other.
4. Lack of a systematic mechanism to define a level of abstraction.

Profiling a set of aging genes
[Figure: 261 age-related genes within the genome; Disease Ontology annotations cover ~30% of the genome.]

Using ontologies other than GO
[Figure: GO annotations ERCC6 → nucleoplasm and PARP1 → protein N-terminus binding; the same genes are then shown as ERCC6 → … and PARP1 → … for another ontology.]

Enrichment Analysis with the DO
[Figure: each GO annotation of ERCC6 and PARP1 (GO: …) is linked to a PubMed article (PMID: …). The NCBO Annotator processes those articles: {ERCC6, PARP1} → PMID: … → NCBO Annotator → {Cockayne syndrome, DNA damage}, which yields the annotations {ERCC6, PARP1} → {Cockayne syndrome, DNA damage} for enrichment analysis with the Disease Ontology (DO).]
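The path on this slide, from genes to the PubMed articles linked to their GO annotations and then to Disease Ontology terms recognized in those articles, can be sketched as below. The PMIDs and article texts are placeholders, and `recognize_terms` is a hypothetical stand-in for a term-recognition service such as the NCBO Annotator (here a simple substring lookup).

```python
from collections import defaultdict

# Hypothetical gene -> PMIDs linked to the gene's GO annotations.
GENE_TO_PMIDS = {
    "ERCC6": ["PMID_1", "PMID_2"],
    "PARP1": ["PMID_2", "PMID_3"],
}

# Hypothetical article text (title/abstract) for each placeholder PMID.
ABSTRACTS = {
    "PMID_1": "ERCC6 mutations cause Cockayne syndrome.",
    "PMID_2": "PARP1 and ERCC6 cooperate in the response to DNA damage.",
    "PMID_3": "PARP1 is activated by DNA damage.",
}

# Stand-in lexicon of disease terms: lower-cased key -> label.
DO_TERMS = {"cockayne syndrome": "Cockayne syndrome", "dna damage": "DNA damage"}


def recognize_terms(text):
    """Hypothetical stand-in for an annotator call: substring lookup of DO terms."""
    lowered = text.lower()
    return {label for key, label in DO_TERMS.items() if key in lowered}


def transfer_annotations(gene_to_pmids, abstracts):
    """Annotate each gene with the DO terms found in the articles linked to it."""
    gene_to_do = defaultdict(set)
    for gene, pmids in gene_to_pmids.items():
        for pmid in pmids:
            gene_to_do[gene] |= recognize_terms(abstracts.get(pmid, ""))
    return dict(gene_to_do)


print(transfer_annotations(GENE_TO_PMIDS, ABSTRACTS))
```

The resulting gene-to-disease annotations can then be run through the same enrichment routine used for GO annotations, which is how an ontology other than the GO becomes usable for gene-set analysis.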

Annotation Analytics on EMR data
Analysis of tagged data from electronic health records

Profiling patient sets
[Figure: 86k patient reports; ICD (Abdominal pain, unspecified site).]
Patient records were processed from the U. Pittsburgh NLP Repository with IRB approval.

Annotation (Clinical Text)

Annotation Workflow
[Figure: (1) Creating clean lexicons: terms (Term-1 … Term-n) with their syntactic types and frequencies, drawn from the BioPortal knowledge graph (diseases, procedures, drugs). (2) Term recognition: each text clinical note is processed by a term recognition tool (NCBO Annotator). (3) Negation detection with NegEx patterns and NegEx rules. (4) Generation of tagged data: for each patient (P1, P2, P3, …, Pn), together with ICD-9 codes, the recognized terms form a temporal series of tags (e.g. “T1, T2, no T4”, “T5, T4, T3”, …). (5) The tagged data for a cohort of interest go on to further analysis.]
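Two steps of this workflow, negation detection and the generation of a per-patient temporal series of tags, can be sketched as below. The trigger list, termination words, and five-word scope are a heavily simplified, assumed version of NegEx-style rules (NegEx itself uses a much larger set of pre- and post-negation triggers). Term matching is a plain lookup, and the patient notes are made up.

```python
import re
from collections import defaultdict

NEGATION_TRIGGERS = ["no evidence of", "denies", "without", "no"]  # longest first; simplified subset
TERMINATORS = {"but", "however"}  # words that end a negation scope
SCOPE = 5  # assumed number of words after a trigger that can be negated
TERMS = ["chest pain", "fever", "myocardial infarction"]  # hypothetical lexicon


def tag_sentence(sentence):
    """Return (term, negated) pairs for lexicon terms found in one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    negated_positions = set()
    for i in range(len(words)):
        for trig in NEGATION_TRIGGERS:
            t = trig.split()
            if words[i:i + len(t)] == t:
                j = i + len(t)
                while j < len(words) and j < i + len(t) + SCOPE and words[j] not in TERMINATORS:
                    negated_positions.add(j)
                    j += 1
                break  # use only the longest trigger starting at this word
    tags = []
    for term in TERMS:
        t = term.split()
        for k in range(len(words) - len(t) + 1):
            if words[k:k + len(t)] == t:
                # A term counts as negated if its first word is in a negation scope.
                tags.append((term, k in negated_positions))
    return tags


def patient_timelines(notes):
    """notes: list of (patient_id, iso_date, text). Returns patient -> [(date, tags)] in date order."""
    timeline = defaultdict(list)
    for patient, date, text in sorted(notes, key=lambda n: (n[0], n[1])):
        tags = []
        for sentence in re.split(r"[.;]", text):
            for term, negated in tag_sentence(sentence):
                tags.append("no " + term if negated else term)
        timeline[patient].append((date, tags))
    return dict(timeline)


notes = [
    ("P1", "2004-03-01", "Patient denies chest pain. Fever noted."),
    ("P1", "2004-01-15", "Chest pain without fever."),
    ("P2", "2004-02-10", "No evidence of myocardial infarction, but chest pain persists."),
]
print(patient_timelines(notes))
```

Keeping negated mentions as explicit “no T” tags, rather than dropping them, mirrors the “T1, T2, no T4” style of the tagged data on the slide.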

Detecting the Vioxx Risk Signal
[Figure: Vioxx patients (1,560), RA patients (14,079), MI patients (1,827); Vioxx ∩ MI (339), p-value < 1.3 × 10^-24.]
ROR of 2.058, CI of [1.804, 2.349]; the χ² statistic has p-value < …
ROR = 1.524, CI = [0.872, 2.666]; χ² p-value = …
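A reporting odds ratio (ROR) and its confidence interval are usually computed from a 2×2 table of drug exposure versus event. A minimal sketch, assuming the standard log-scale (Woolf) 95% interval and an uncorrected Pearson χ² test with 1 degree of freedom; the slide does not say which variants were used, and the counts below are hypothetical, not the Vioxx data.

```python
from math import erfc, exp, log, sqrt


def ror_with_ci(a, b, c, d, z=1.96):
    """Reporting odds ratio from a 2x2 table.

    a: exposed (drug) with event      b: exposed without event
    c: unexposed with event           d: unexposed without event
    Returns (ror, (lower, upper)) using the log-scale (Woolf) interval.
    """
    ror = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(ROR)
    return ror, (exp(log(ror) - z * se), exp(log(ror) + z * se))


def chi_square_p(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction) and its p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical counts: 30 of 100 exposed patients had the event vs. 10 of 100 unexposed.
print(ror_with_ci(30, 70, 10, 90))
print(chi_square_p(30, 70, 10, 90))
```

An interval whose lower bound stays above 1 (as in the first ROR on this slide) indicates a signal, while one that spans 1 (as in the second) does not.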

Detecting Adverse Events

Linear space features | Logarithmic space features
Drug frequency | Disease frequency
Observed drug-first fraction | Observed co-mention count
Drug-first fraction z-score (fixed drug) | Co-mention count z-score (fixed drug)
Drug-first fraction z-score (fixed disease) | Co-mention count z-score (fixed disease)
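Features like these could be computed from per-patient timelines of drug and disease mentions as sketched below. The definitions are assumptions for illustration, since the table does not spell them out. A co-mention is a patient with both concepts. The drug-first fraction is the share of co-mentioning patients whose earliest drug mention precedes their earliest disease mention. Z-scores are taken across all pairs that share the same drug (or the same disease). Count-based features use log(1 + count). The data are made up.

```python
from collections import defaultdict
from math import log
from statistics import mean, pstdev


def first_mentions(timeline):
    """timeline: list of (day, concept). Returns concept -> earliest day."""
    first = {}
    for day, concept in timeline:
        if concept not in first or day < first[concept]:
            first[concept] = day
    return first


def zscores(values):
    """Standardize a dict's values (population SD); zero spread gives z = 0."""
    vals = list(values.values())
    mu, sd = mean(vals), pstdev(vals)
    return {k: (v - mu) / sd if sd else 0.0 for k, v in values.items()}


def pair_features(patients, drugs, diseases):
    """patients: patient_id -> list of (day, concept). Returns (drug, disease) -> feature dict."""
    drug_freq, disease_freq = defaultdict(int), defaultdict(int)
    co, drug_first = defaultdict(int), defaultdict(int)
    for timeline in patients.values():
        first = first_mentions(timeline)
        for dr in drugs:
            drug_freq[dr] += dr in first
        for di in diseases:
            disease_freq[di] += di in first
        for dr in drugs:
            for di in diseases:
                if dr in first and di in first:
                    co[dr, di] += 1
                    drug_first[dr, di] += first[dr] < first[di]
    pairs = [(dr, di) for dr in drugs for di in diseases]
    frac = {p: drug_first[p] / co[p] if co[p] else 0.0 for p in pairs}
    log_co = {p: log(1 + co[p]) for p in pairs}
    feats = {p: {"log_drug_freq": log(1 + drug_freq[p[0]]),
                 "log_disease_freq": log(1 + disease_freq[p[1]]),
                 "drug_first_fraction": frac[p],
                 "log_co_mention": log_co[p]} for p in pairs}
    # z-scores with the drug held fixed (across diseases), then the disease held fixed (across drugs).
    for group_key, suffix in ((0, "fixed_drug"), (1, "fixed_disease")):
        groups = defaultdict(list)
        for p in pairs:
            groups[p[group_key]].append(p)
        for members in groups.values():
            zf = zscores({p: frac[p] for p in members})
            zc = zscores({p: log_co[p] for p in members})
            for p in members:
                feats[p]["drug_first_z_" + suffix] = zf[p]
                feats[p]["co_mention_z_" + suffix] = zc[p]
    return feats


# Hypothetical patient timelines (day, concept).
patients = {
    "P1": [(1, "drug_A"), (5, "disease_X")],
    "P2": [(3, "disease_X"), (4, "drug_A")],
    "P3": [(2, "drug_A"), (9, "disease_X"), (1, "drug_B")],
    "P4": [(1, "drug_B"), (2, "disease_Y")],
}
drugs, diseases = ["drug_A", "drug_B"], ["disease_X", "disease_Y"]
print(pair_features(patients, drugs, diseases)[("drug_A", "disease_X")])
```

Normalizing against the same drug and against the same disease separates pairs that are merely common from pairs whose temporal ordering stands out, which is the kind of signal a downstream classifier could use.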

Detecting Adverse Events

Detecting Off-label use

Annotation Analytics Landscape
[Figure: ontologies (SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs/Chemicals) alongside the sets they can annotate (gene sets, grant sets, paper sets, aging, patient sets, drug sets, …, Health Indicator Warehouse datasets), now including EMRs.]
What questions can we ask?

Associations and outcomes
[Figure: a matrix with Gene, Disease, Drug, Device, Procedure, and Environment on both axes; labeled cells include side effects, off-label use, indications, and enrichment.]
What questions can we ask?

Acknowledgements
Paea LePendu, Yi Liu, Srinivasan Iyer, Steve Racunas, Anna Bauer-Mehren, Clement Jonquet, Rong Xu, Mark Musen
NIH – NCBO funding
Mayo Team: Hongfang Liu, Stephen Wu
Sylvia Holland, Alex Skrenchuk

Mining Annotations of Grants and Publications
Grants: from 1972 to …, … funding agencies
Publications: from MEDLINE, only “Journal articles”

Sponsorship and Allocation

Who funds what