Presentation is loading. Please wait.

Presentation is loading. Please wait.

Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD

Similar presentations


Presentation on theme: "Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD"— Presentation transcript:

1 Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

2 NCBO: Key activities We create and maintain a library of biomedical ontologies. We build tools and Web services to enable the use of ontologies and their derivatives. We collaborate with scientific communities that develop and use ontologies.

3 http://rest.bioontology.org Ontology Services Download Traverse Search Comment Download Traverse Search Comment Widgets Tree-view Auto-complete Graph-view Tree-view Auto-complete Graph-view Annotation Data Access Mapping Services Create Download Upload Create Download Upload Views Term recognition Fetch “data” annotated with a given term http://bioportal.bioontology.org

4 Annotation service Process textual metadata to automatically tag text with as many ontology terms as possible. 90 million calls, ~700 GB of data

5 Resource index Pubmed Abstracts Adverse Events (AERS) GEO : : Clinical Trials Drug Bank Won 1 st prize at the 2010 Semantic Web Challenge @ ISWC

6 Creating Lexicons Term – 1 : Term – n Sentence in Clinical Note – 1 : Sentence in Clinical Note – m tfdfNNJJ…VP… IDTerm-1150,87990,0000.900.05…0.03… ID: Term-n Syntactic types Frequency Frequency counter

7 Annotation Analytics Analyzing tagged data for hypothesis generation in bioinformatics

8 Genome Generic GO based analysis routine Reference set Study Set Get annotations for each gene in a set Count the occurrence of each annotation term in the study set Count the occurrence of that term in some reference set (whole genome?) P-value for how surprising their overlap is.

9 Annotation Analytics Landscape SNOMED-CT Gene Ontology Gene Sets NCIT ICD-9 Human Disease Cell Type MeSH Drugs, Chemicals Grant Sets Paper Sets Patient Sets Drug Sets : ? ? Health Indicator Warehouse datasets

10 Open questions 1.Can we use something other than the GO? 2.Lack of annotations—even today, roughly 20% of genes lack any GO annotation. 3.Annotation bias—annotation with certain ontology terms is not independent of each other. 4.Lack of a systematic mechanism to define a level of abstraction.

11 Profiling a set of Aging genes Disease Ontology ~ 30% of genome 261 Age-related genes Genome

12 Using ontologies other than GO ERCC6  nucleoplasm PARP1  protein N-terminus binding ERCC6  nucleoplasm PARP1  protein N-terminus binding ERCC6  PARP1  ERCC6  PARP1 

13 ERCC6GO:0005654PMID:16107709 ERCC6GO:0008094PMID:16107709 PARP1GO:0047485PMID:16107709 ERCC6GO:0005730PMID:16107709 PARP1GO:0003950PMID:16107709 http://www.geneontology.org/GO.downloads.annotations.shtml Enrichment Analysis with the DO www.ncbi.nlm.nih.gov/pubmed/16107709 NCBO Annotator: http://bioportal.bioontology.org NCBO Annotator: http://bioportal.bioontology.org {ERCC6, PARP1}  PMID:16107709 {ERCC6, PARP1}  {Cockayne syndrome, DNA damage} {ERCC6, PARP1}  {Cockayne syndrome, DNA damage}

14 Annotation Analytics on EMR data Analysis of tagged data from electronic health records

15 Profiling patient sets 86k patient Reports ICD9 789.00 (Abdominal pain, unspecified site) Patient records processed from U. Pittsburgh NLP Repository with IRB approval.

16 Annotation (Clinical Text)

17 Term – 1 : Term – n Syntactic types Frequency Term recognition tool NCBO Annotator NegEx Patterns NegEx Rules – Negation detection P1ICD9 P1T1, T2, no T4 …T5, T4, T3 …T4, T3, T1 T8, T9, T4 …T6, T8, T10 T1, T2, no T4 P2 P3 : : Pn Terms form a temporal series of tags  Cohort of Interest Diseases Procedures Drugs BioPortal – knowledge graph Creating clean lexicons Annotation Workflow Further Analysis Text clinical note Terms Recognized Negation detection Generation of tagged data

18 ROR of 2.058, CI of [1.804, 2.349] The X 2 statistic has p-value < 10 -7 ROR=1.524, CI=[0.872, 2.666] X 2 p-value = 0.06816. Detecting the Vioxx Risk Signal Vioxx Patients (1,560) RA Patients (14,079) MI Patients (1,827) Vioxx  MI (339) p-value < 1.3x10 -24

19 Detecting Adverse Events

20 Linear Space FeaturesLogarithmic Space Features Drug frequency Disease frequency Observed drug-first fractionObserved co-mention count Drug-first fraction z-score (fixed drug) Co-mention count z-score (fixed drug) Drug-first fraction z-score (fixed disease) Co-mention count z-score (fixed disease)

21 Detecting Adverse Events

22 Detecting Off-label use

23 Annotation Analytics Landscape SNOMED-CT Gene Ontology Gene Sets NCIT ICD-9 Human Disease Cell Type MeSH Drugs, Chemicals Grant Sets Paper Sets Agin g Patient Sets Drug Sets : EMRs What questions can we ask? Health Indicator Warehouse datasets

24 Associations and outcomes GeneDiseaseDrugDeviceProcedureEnvironment Gene Disease Drug Device Procedure Environment Side effects Off-label Indications Enrichment What questions can we ask?

25 Acknowledgements Paea LePendu Yi Liu Srinivasan Iyer Steve Racunas Anna Bauer-Mehren Clement Jonquet Rong Xu Mark Musen NIH – NCBO funding Mayo Team Hongfang Liu Stephen Wu Sylvia Holland Alex Skrenchuk

26 Mining Annotations of Grants, Publications Grants from 1972 to 2007 30 funding agencies Publications from Medline Only “Journal articles”

27 Sponsorship and Allocation

28 Who funds what


Download ppt "Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD"

Similar presentations


Ads by Google