Download presentation
Presentation is loading. Please wait.
Published byDustin Sims Modified over 9 years ago
1
Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu
2
NCBO: Key activities We create and maintain a library of biomedical ontologies. We build tools and Web services to enable the use of ontologies and their derivatives. We collaborate with scientific communities that develop and use ontologies.
3
http://rest.bioontology.org Ontology Services Download Traverse Search Comment Download Traverse Search Comment Widgets Tree-view Auto-complete Graph-view Tree-view Auto-complete Graph-view Annotation Data Access Mapping Services Create Download Upload Create Download Upload Views Term recognition Fetch “data” annotated with a given term http://bioportal.bioontology.org
4
Annotation service Process textual metadata to automatically tag text with as many ontology terms as possible. 90 million calls, ~700 GB of data
5
Resource index Pubmed Abstracts Adverse Events (AERS) GEO : : Clinical Trials Drug Bank Won 1 st prize at the 2010 Semantic Web Challenge @ ISWC
6
Creating Lexicons Term – 1 : Term – n Sentence in Clinical Note – 1 : Sentence in Clinical Note – m tfdfNNJJ…VP… IDTerm-1150,87990,0000.900.05…0.03… ID: Term-n Syntactic types Frequency Frequency counter
7
Annotation Analytics Analyzing tagged data for hypothesis generation in bioinformatics
8
Genome Generic GO based analysis routine Reference set Study Set Get annotations for each gene in a set Count the occurrence of each annotation term in the study set Count the occurrence of that term in some reference set (whole genome?) P-value for how surprising their overlap is.
9
Annotation Analytics Landscape SNOMED-CT Gene Ontology Gene Sets NCIT ICD-9 Human Disease Cell Type MeSH Drugs, Chemicals Grant Sets Paper Sets Patient Sets Drug Sets : ? ? Health Indicator Warehouse datasets
10
Open questions 1.Can we use something other than the GO? 2.Lack of annotations—even today, roughly 20% of genes lack any GO annotation. 3.Annotation bias—annotation with certain ontology terms is not independent of each other. 4.Lack of a systematic mechanism to define a level of abstraction.
11
Profiling a set of Aging genes Disease Ontology ~ 30% of genome 261 Age-related genes Genome
12
Using ontologies other than GO ERCC6 nucleoplasm PARP1 protein N-terminus binding ERCC6 nucleoplasm PARP1 protein N-terminus binding ERCC6 PARP1 ERCC6 PARP1
13
ERCC6GO:0005654PMID:16107709 ERCC6GO:0008094PMID:16107709 PARP1GO:0047485PMID:16107709 ERCC6GO:0005730PMID:16107709 PARP1GO:0003950PMID:16107709 http://www.geneontology.org/GO.downloads.annotations.shtml Enrichment Analysis with the DO www.ncbi.nlm.nih.gov/pubmed/16107709 NCBO Annotator: http://bioportal.bioontology.org NCBO Annotator: http://bioportal.bioontology.org {ERCC6, PARP1} PMID:16107709 {ERCC6, PARP1} {Cockayne syndrome, DNA damage} {ERCC6, PARP1} {Cockayne syndrome, DNA damage}
14
Annotation Analytics on EMR data Analysis of tagged data from electronic health records
15
Profiling patient sets 86k patient Reports ICD9 789.00 (Abdominal pain, unspecified site) Patient records processed from U. Pittsburgh NLP Repository with IRB approval.
16
Annotation (Clinical Text)
17
Term – 1 : Term – n Syntactic types Frequency Term recognition tool NCBO Annotator NegEx Patterns NegEx Rules – Negation detection P1ICD9 P1T1, T2, no T4 …T5, T4, T3 …T4, T3, T1 T8, T9, T4 …T6, T8, T10 T1, T2, no T4 P2 P3 : : Pn Terms form a temporal series of tags Cohort of Interest Diseases Procedures Drugs BioPortal – knowledge graph Creating clean lexicons Annotation Workflow Further Analysis Text clinical note Terms Recognized Negation detection Generation of tagged data
18
ROR of 2.058, CI of [1.804, 2.349] The X 2 statistic has p-value < 10 -7 ROR=1.524, CI=[0.872, 2.666] X 2 p-value = 0.06816. Detecting the Vioxx Risk Signal Vioxx Patients (1,560) RA Patients (14,079) MI Patients (1,827) Vioxx MI (339) p-value < 1.3x10 -24
19
Detecting Adverse Events
20
Linear Space FeaturesLogarithmic Space Features Drug frequency Disease frequency Observed drug-first fractionObserved co-mention count Drug-first fraction z-score (fixed drug) Co-mention count z-score (fixed drug) Drug-first fraction z-score (fixed disease) Co-mention count z-score (fixed disease)
21
Detecting Adverse Events
22
Detecting Off-label use
23
Annotation Analytics Landscape SNOMED-CT Gene Ontology Gene Sets NCIT ICD-9 Human Disease Cell Type MeSH Drugs, Chemicals Grant Sets Paper Sets Agin g Patient Sets Drug Sets : EMRs What questions can we ask? Health Indicator Warehouse datasets
24
Associations and outcomes GeneDiseaseDrugDeviceProcedureEnvironment Gene Disease Drug Device Procedure Environment Side effects Off-label Indications Enrichment What questions can we ask?
25
Acknowledgements Paea LePendu Yi Liu Srinivasan Iyer Steve Racunas Anna Bauer-Mehren Clement Jonquet Rong Xu Mark Musen NIH – NCBO funding Mayo Team Hongfang Liu Stephen Wu Sylvia Holland Alex Skrenchuk
26
Mining Annotations of Grants, Publications Grants from 1972 to 2007 30 funding agencies Publications from Medline Only “Journal articles”
27
Sponsorship and Allocation
28
Who funds what
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.