W3C Semantic Web for HealthCare and Life Sciences Interest Group


HCLS Workshop @ ISWC Eric Neumann and Tonya Hongsermeier University of Georgia, Nov 6, 2006

W3C Semantic Web for HealthCare and Life Sciences Interest Group Launched Nov 2005: http://www.w3.org/2001/sw/hcls Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode) Chartered to develop and support the use of SW technologies and practices to improve collaboration, research and development, and innovation adoption in the Health Care and Life Science domains Based on a foundation of semantically rich specifications that support process and information interoperability HCLS Objectives: Core vocabularies and ontologies to support cross-community data integration and collaborative efforts Guidelines and Best Practices for Resource Identification to support integrity and version control Better integration of Scientific Publication with people, data, software, publications, and clinical trials

HCLS Philosophy Share use-cases, applications, demonstrations, experiences Expose collections as RDF using public tools Develop (where appropriate) core vocabularies for data integration

HCLS Activities BioRDF - data + NLP as RDF BioONT - ontology coordination Adaptive Clinical Protocols and Pathways Drug Safety and Efficacy Scientific Publishing - evidence management

Outline Basic Informatics Challenges Bench-to-Bedside Applications What is the Semantic Web? Current Activities… Case Studies

Drug Discovery and Medicine Health Practice Safety Prevention Privacy Knowledge Hygieia, G. Klimt

Data Expansion Large Data Sets Variables >> Samples Many New Data Types Which Formats? Combine

Where Information Advances are Most Needed Supporting Innovative Applications in R&D Translational Medicine (Biomarkers) Molecular Mechanisms (Systems) Data Provenance, Rich Annotation Clinical Information eHealth Records, EDC, Clinical Submission Documents Safety Information, Pharmacovigilance, Adverse Events, Biomarker data Standards Central Data Sources Genomics, Diseases, Chemistry, Toxicology MetaData Ontologies Vocabularies

The Big Picture - Hard to understand from just a few Points of View

Complete view tells a very different Story

Distributed Nature of R&D Silos of Data…

Data Integration: Biology Requirements Disease Proteins Genes Papers Retention Policy Audit Trail Curation Tools Ontology Experiment Assays Compounds

New Regulatory Issues Confronting Pharmaceuticals Tox/Efficacy ADME Optim from Innovation or Stagnation, FDA Report March 2004

Translational Medicine in Drug R&D Early Middle Late Cellular Systems Human In Vitro Studies Animal Studies Clinical Studies Disease Models (Therapeutic Relevance) Toxicities Target/System Efficacy $500K $5M $500M

Translational Research Improve communication between basic and clinical science so that more therapeutic insights may be derived from new scientific ideas - and vice versa. Testing of theories emerging from preclinical experimentation on disease-affected human subjects. Information obtained from preliminary human experimentation can be used to refine our understanding of the biological principles underpinning the heterogeneity of human disease and polymorphism(s). http://www.translational-medicine.com/info/about Reference NIH Digital Roadmap activity

HCLS Framework: Biomedical Research Molecular, Cellular and Systems Biology/Physiology Organism as an integrated and interacting network of genes, proteins and biochemical reactions Human body as a system of interacting organs Molecular Cell Biology/Genomic and Proteomic Research Gene Sequencing, Genotyping, Protein Structures Cell Signaling and other Pathways Biomarker Research Discovery of genes and gene products that can be used to measure disease progression or impacts of drug Pharmaco-genomics Impact of genetic inheritance on Drug Discovery and Translational Research Use of preclinical research to identify promising drug candidates

HCLS Framework: Clinical Research Clinical Trials Determination of efficacy, impact and safety of drugs for particular diseases Pharmaco-vigilance/ADE Surveillance Monitoring of impacts of drugs on patients, especially safety and adverse event related information Patient Cohort Identification and Management Identifying patient cohorts for drug trials is a challenging task Translational Research Test theories emerging from pre-clinical experimentation on disease affected human subjects Development of EHRs/EMRs for both clinical research and practice Currently EHRs/EMRs focussed on clinical workflow processes Re-using that information for clinical research and trials is a challenging task

Ecosystem: Goal State /* Need to expand this with Biomedical Research + Clinical Practice */ Biomedical Research Clinical Practice /* Need to expand this to include Healthcare and Biomedical Research Players as well… Show an integrated picture with “continuous” information flow */

What is the Semantic Web ? It’s Text Extraction It’s AI It’s Semantic Webs It’s Web 2.0 It’s Data Tracking It’s Ontologies It’s a Global Conspiracy http://www.w3.org/2006/Talks/0125-hclsig-em/

The Current Web What the computer sees: “Dumb” links No semantics - <a href> treated just like <bold> Minimal machine-processable information

The Semantic Web Machine-processable semantic information Semantic context published – making the data more informative to both humans and machines

Understanding the Semantic Web Vision Some day in the future… Today-> describing data Core Concept: TRIPLES… Specifications RDF, OWL, GRDDL; coming soon: SPARQL, RIF Applications Data Aggregation: Recombinant Data Statements: Annotating things Practices Everything gets a URI… New definition of Data Interoperability: DTA: Data Transit Authority Triple example: Subject <Patient HB2122>, Property <shows_sign>, Object <Disease Pneumococcal_Meningitis>
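The triple on this slide can be sketched in plain Python as a (subject, property, object) tuple. This is only an illustration: the `EX` namespace, the second statement, and the `match` helper are assumptions introduced here, not part of any HCLS vocabulary or tool.

```python
# Hypothetical namespace, used only so that every term is a URI.
EX = "http://example.org/hcls/"

# Each statement is a (subject, property, object) triple.
triples = {
    # The statement shown on the slide.
    (EX + "Patient/HB2122", EX + "shows_sign", EX + "Disease/Pneumococcal_Meningitis"),
    # A made-up second statement, added only to show aggregation on one subject.
    (EX + "Patient/HB2122", EX + "treated_at", EX + "Clinic/A"),
}

def match(store, s=None, p=None, o=None):
    """Return the triples that agree with every term that is not None."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Ask: which signs does patient HB2122 show?
signs = match(triples, s=EX + "Patient/HB2122", p=EX + "shows_sign")
print(signs[0][2])
```

Because every term is a URI, statements from different sources that mention the same URI land on the same node, which is the sense of "recombinant data" on the slide.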

Application Space : Semantic Web Drug DD Therapeutics safety Critical Path Chem Lib manufacturing NDA Production Genomics HTS Clinical Studies eADME Compound Opt Patent Biology DMPK genes informatics

URI - A key element Uniform Resource Identifier Specification used in HTML, XML, and RDF-OWL Fundamental to RDF: It IS the only valid SW identifier! Two forms: HTTP- http://biopax.org/pathway/kreb_cycle.owl URN- urn:lsid:biopax.org:pathway:kreb_cycle Resolution Mapping retrievable data to a URI Does not mean getting everything known about a URI Not clear how to best handle versioning See Alan’s slides…
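As a rough sketch of the two URI forms named above, the snippet below splits the slide's HTTP URI and LSID URN into labeled parts using only the Python standard library. The function name `describe_uri` is an assumption made for this example; the LSID field order (authority, namespace, object, optional revision) follows the LSID convention.

```python
from urllib.parse import urlparse

def describe_uri(uri):
    """Split an HTTP URI or an LSID URN into labeled parts."""
    parts = urlparse(uri)
    if parts.scheme in ("http", "https"):
        return {"form": "HTTP", "authority": parts.netloc, "path": parts.path}
    if parts.scheme == "urn" and parts.path.lower().startswith("lsid:"):
        # LSID layout: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
        fields = parts.path.split(":")[1:]
        keys = ("authority", "namespace", "object", "revision")
        return {"form": "URN", **dict(zip(keys, fields))}
    raise ValueError(f"unsupported URI form: {uri}")

http_form = describe_uri("http://biopax.org/pathway/kreb_cycle.owl")
urn_form = describe_uri("urn:lsid:biopax.org:pathway:kreb_cycle")
print(http_form)
print(urn_form)
```

Both forms carry the same authority (`biopax.org`), but only the HTTP form says how to resolve it; an LSID needs a separate resolution step, which is the open question the slide raises.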

REST-fulness REST is a term coined by Roy Fielding to describe an architectural style of networked systems. REST is an acronym standing for Representational State Transfer. http://www.molbio.org/gene (get gene list) http://www.molbio.org/gene/hugsk3b (get gene info) Can REST == URI, and if so, when? Yes, if we agree return function is identical to URI resolution Issues: Should it return RDF always? - standardized Resolution is only a subset of services, how do we handle non-resolution services: are these URIs as well?

Opportunities for Semantics in HealthCare Enhanced interoperability via: Semantic Tagging Grounding of concepts in Standardized Vocabularies Complex Definitions Semantics-based Observation Capture Inference on Diseases Phenotypes Genetics Mechanisms Semantics-based Clinical Decision Support Guided Data Interpretation Guided Ordering Semantics-based Knowledge Management

Data Semantics in the Life Sciences
Figure labels: Unstructured Data Types; Structured and Complex Data Types; Pathways, Biomarkers; Publications; Systems Biology; Gene expression; genomics; Clinical Findings; Clinical trials; Histology; Profiling; Text; Text + data items; Image + Text; Publications + data; Data Items; Categorical/Taxonomic Data Items; Complex Objects; Composite Objects with Embedded “process”
Glossary: A collection of terms of interest with associated meanings.
Thesaurus: A collection of terms organized in a hierarchical structure.
Database Schema: A collection of table definitions representing concepts and relationships, and column definitions representing properties. Used to describe a structured (typically relational) database.
RDF(S): W3C standard called the Resource Description Framework (Schema), used to define and capture knowledge; typically richer than a database schema.
Ontylog: Special kind of logical language based on description logics, used to represent medical ontologies such as SNOMED.
OWL: W3C standard called Web Ontology Language, used to represent ontologies. Based on a family of description logics, with richer representational constructs than Ontylog.
IEEE SUO: An IEEE Working Group working to specify an upper ontology to support computer applications such as data interoperability, information search and retrieval, automated inferencing, and natural language processing. Consists of a wide variety of rich domain-independent concepts.
Cyc: Very well known effort to capture human common sense knowledge. Uses a rich representational language called CycL, which uses higher-order logics to capture knowledge.
GO (Gene Ontology).
KEGG (Kyoto Encyclopedia of Genes and Genomes): A bioinformatics resource for understanding higher-order functional meanings and utilities of the cell or the organism from its genome information.
TAMBIS (Transparent Access to Multiple Bioinformatics Information Sources): Aims to aid researchers in biological science by providing a single access point for biological information sources around the world.
EcoCyc: A part of the BioCyc library; a scientific database for the bacterium Escherichia coli. The EcoCyc project performs literature-based curation of the entire E. coli genome, and of E. coli transcriptional regulation, transporters, and metabolic pathways.
BioPAX (Biological Pathways Exchange).

RDB => RDF Virtualized RDF
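One way to picture the RDB-to-RDF mapping is sketched below: each row becomes a subject URI, each column a property, and each cell a literal, with the triples produced on demand rather than stored (one reading of "virtualized"). The base URI, the `target` table, and its single row are assumptions made up for this illustration; the slide does not name a specific mapping tool.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base URI

def rows_to_triples(conn, table, key):
    """Map each row to triples: row -> subject URI, column -> property, cell -> literal."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            # The key column is already encoded in the subject URI.
            if column != key and value is not None:
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples

# Illustrative in-memory table; names and values are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id TEXT PRIMARY KEY, name TEXT, disease TEXT)")
conn.execute("INSERT INTO target VALUES ('t1', 'GSK3beta', 'DiabetesT2')")

triples = rows_to_triples(conn, "target", "id")
for t in triples:
    print(t)
```

Nothing is copied out of the database ahead of time here; the triples are a view computed when asked for, so the relational store remains the system of record.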

XML => RDF (GRDDL) XSL XML RDF GRDDL

RDFa: Bridging the Hypertext and Semantic Webs
<div xmlns:cc="http://web.resource.org/cc/" xmlns:dc="http://purl.org/dc/1.1/" about="photo2.jpg">
  This photo was taken by <span property="dc:creator">Ben Adida</span>
  and is licensed under a
  <a rel="cc:license" href="http://cc.org/licenses/by/2.5/">Creative Commons License</a>.
</div>
Graph: photo2.jpg dc:creator Ben Adida; photo2.jpg cc:license licenses/by/2.5/
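To make the RDFa idea concrete, here is a deliberately tiny sketch that pulls the two statements out of the slide's markup using Python's standard `html.parser`. It is not a conforming RDFa processor: the class `TinyRDFa` is an assumption of this example, it handles only `about`, `property`, and `rel`/`href`, and it leaves prefixes such as `dc:` unexpanded.

```python
from html.parser import HTMLParser

class TinyRDFa(HTMLParser):
    """Collect triples from `about`, `property`, and `rel`/`href` attributes only."""

    def __init__(self):
        super().__init__()
        self.subject = None   # value of the most recent `about`
        self.pending = None   # property waiting for its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "property" in a:
            self.pending = a["property"]
        if "rel" in a and "href" in a:
            self.triples.append((self.subject, a["rel"], a["href"]))

    def handle_data(self, data):
        # The first non-blank text after a `property` attribute is its literal value.
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

markup = '''<div xmlns:cc="http://web.resource.org/cc/"
     xmlns:dc="http://purl.org/dc/1.1/" about="photo2.jpg">
  This photo was taken by <span property="dc:creator">Ben Adida</span>
  and is licensed under a
  <a rel="cc:license" href="http://cc.org/licenses/by/2.5/">Creative Commons License</a>.
</div>'''

parser = TinyRDFa()
parser.feed(markup)
parser.close()
for triple in parser.triples:
    print(triple)
```

The same HTML a browser renders for people yields two machine-readable statements about `photo2.jpg`, which is the bridge the slide title refers to.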

Example: Knowledge Aggregation Courtesy of BG-Medicine

Case Study: Omics Subject → Verb → Object: ApoA1 … is produced by the Liver; … is expressed less in Atherosclerotic Liver; … is correlated with DKK1; … is cited regarding Tangier’s disease; … has Tx Reg elements like HNFR1

Knowledge Mining using Semantic Web “Gene Prioritization through Data Fusion” Aerts et al, 2006, Nature Use of quantitative and qualitative information for statistical ranking. Can be used to identify novel genes involved in diseases

Potential Linked Clinical Ontologies SNOMED CDISC Disease Descriptions Clinical Obs ICD10 Applications Clinical Trials ontology RCRIM (HL7) Disease Models Pathways (BioPAX) Mechanisms IRB Tox Genomics Molecules Extant ontologies Under development Bridge concept

Case Study: BioPAX (Pathways)
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>

Case Study: BioPAX (Pathways)
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
Graph: Modulation affectedBy CHIR99102
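The point of this slide is that a statement from another vocabulary (`drug:affectedBy`) can sit inside a BioPAX description without changing the BioPAX schema. A minimal sketch of reading that link back out with Python's standard `xml.etree.ElementTree` follows. The slide omits the namespace declarations, so the `BP` and `DRUG` namespace URIs below are placeholders assumed for this example; only the `rdf` namespace is the real W3C one. The fragment is trimmed to the elements needed.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://example.org/biopax#"    # placeholder; the slide omits this namespace
DRUG = "http://example.org/drug#"    # placeholder; the slide omits this namespace

# Trimmed, well-formed version of the slide's fragment.
doc = f'''<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}" xmlns:drug="{DRUG}">
  <bp:MODULATION rdf:ID="xDshToXGSK3b">
    <bp:left rdf:resource="#xDsh"/>
    <bp:right rdf:resource="#xGSK-3beta"/>
    <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
  </bp:MODULATION>
</rdf:RDF>'''

root = ET.fromstring(doc)

# Collect (modulation ID, compound URI) pairs.
links = []
for mod in root.iter(f"{{{BP}}}MODULATION"):
    for hit in mod.findall(f"{{{DRUG}}}affectedBy"):
        links.append((mod.get(f"{{{RDF}}}ID"), hit.get(f"{{{RDF}}}resource")))

print(links)
```

A consumer that knows only BioPAX can ignore the `drug:` element, while one that knows both vocabularies can follow the link from pathway step to compound.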

Case Study: Drug Discovery Dashboards Dashboards and Project Reports Next generation browsers for semantic information via Semantic Lenses Renders OWL-RDF, XML, and HTML documents Lenses act as information aggregators and logic style-sheets add { ls:TheraTopic hs:classView:TopicView }

Drug Discovery Dashboard http://www.w3.org/2005/04/swls/BioDash Topic: GSK3beta Topic Target: GSK3beta Disease: DiabetesT2 Alt Dis: Alzheimers Cmpd: SB44121 CE: DBP Team: GSK3 Team Person: John Related Set Path: WNT

Bridging Chemistry and Molecular Biology Semantic Lenses: Different Views of the same data BioPax Components Target Model urn:lsid:uniprot.org:uniprot:P49841 Apply Correspondence Rule: if ?target.xref.lsid == ?bpx:prot.xref.lsid then ?target.correspondsTo.?bpx:prot
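The correspondence rule on this slide can be read as a join on shared cross-reference LSIDs. The sketch below shows that reading in plain Python; the record layout, the identifiers `target:GSK3beta`, `bpx:xGSK-3beta`, and `bpx:xDsh`, and the placeholder LSID ending in `EXAMPLE` are assumptions for illustration, while the UniProt LSID is the one shown on the slide.

```python
def correspondences(targets, proteins):
    """Pair a target with a protein whenever they share an xref LSID."""
    # Index proteins by each LSID they reference.
    by_lsid = {}
    for prot in proteins:
        for lsid in prot["xref"]:
            by_lsid.setdefault(lsid, []).append(prot["id"])
    # Emit one correspondsTo statement per shared LSID.
    return sorted(
        (t["id"], "correspondsTo", pid)
        for t in targets
        for lsid in t["xref"]
        for pid in by_lsid.get(lsid, [])
    )

# Hypothetical records from a target model and a BioPAX model.
targets = [{"id": "target:GSK3beta", "xref": ["urn:lsid:uniprot.org:uniprot:P49841"]}]
proteins = [
    {"id": "bpx:xGSK-3beta", "xref": ["urn:lsid:uniprot.org:uniprot:P49841"]},
    {"id": "bpx:xDsh", "xref": ["urn:lsid:uniprot.org:uniprot:EXAMPLE"]},
]

links = correspondences(targets, proteins)
print(links)
```

The rule asserts `correspondsTo`, not identity: the two records stay distinct objects that happen to point at the same external reference.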

Bridging Chemistry and Molecular Biology Lenses can aggregate, accentuate, or even analyze new result sets Behind the lens, the data can be persistently stored as RDF-OWL Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references

Pathway Polymorphisms Merge directly onto pathway graph Identify targets with lowest chance of genetic variance Predict parts of pathways with highest functional variability Map genetic influence to potential pathway elements Select mechanisms of action that are minimally impacted by polymorphisms Non-synonymous polymorphisms from db-SNP

BioRDF Neuro Tasks Aggregate facts and models around Parkinson’s Disease BIRN / Human Brain Project SWAN: scientific annotations and evidence NeuroCommons Use RDF and OWL to describe 'Brain Connectivity' Neuronal data in SenseLab

BioRDF: Reagents RDF resources that describe various kinds of experimental reagents, starting with antibodies: Initial RDF that captures: Gene, the fact that this is an antibody, various kinds of pages about the antibody, such as vendor documentation, and any other properties that are explicitly captured in the source material Work with the Ontology task force to identify appropriate ontologies and vocabularies to use in the RDF. Write queries against the RDF to answer questions of the sort posed on the Alzforum's

BioRDF: NCBI NCBI Data: URIs and as RDF (Olivier Bodenreider) Terminology Integration: NLM’s UMLS, MeSH, SNOMED…

Conclusions: Key Semantic Web Principles Plan for change Free data from the application that created it Lower reliance on overly complex Middleware The value in "as needed" data integration Big wins come from many little ones The power of links - network effect Open-world, open solutions are cost effective Importance of "Partial Understanding"