RDB2RDF: Incorporating Domain Semantics in Structured Data. Satya S. Sahoo, Kno.e.sis Center, Computer Science and Engineering Department

Presentation transcript:

RDB2RDF: Incorporating Domain Semantics in Structured Data
Satya S. Sahoo
Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, OH, USA

Acknowledgements
- Dr. Olivier Bodenreider (U.S. NLM, NIH)
- Dr. Amit Sheth (Kno.e.sis Center, Wright State University)
- Dr. Joni L. Rutter (NIDA, NIH)
- Dr. Karen J. Skinner (NIDA, NIH)
- Lee Peters (U.S. NLM, NIH)
- Kelly Zeng (U.S. NLM, NIH)

Outline
- RDB to RDF: Objectives
- Method I: RDB to RDF without ontology
- Application I: Genome ↔ Phenotype
- Method II: RDB to RDF with ontology
- Application II: Genome ↔ Biological Pathway integration
- Conclusion

Objectives of Modeling Data in RDF
RDF data model (figure): APP (EG id-351) [subject] is_associated_with [predicate] Alzheimer’s Disease [object]
- RDF enables modeling of logical relationships between entities
- Relations are at the heart of the Semantic Web*
- RDF data captures the logical structure of the information
- Reasoning over RDF data → knowledge discovery
* Relationships at the Heart of Semantic Web: Modeling, Discovering, and Exploiting Complex Semantic Relationships; Relationship Web: Blazing Semantic Trails between Web Resources
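The triple on this slide can be written down concretely. The following is a minimal sketch in plain Python, assuming a made-up namespace (`http://example.org/`) and hypothetical helper names (`Triple`, `to_ntriples`); it is not the representation used in the talk, only an illustration of the subject-predicate-object structure.

```python
from typing import NamedTuple

# Hypothetical namespace, used only for illustration.
EX = "http://example.org/"


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialize a triple whose three terms are all IRIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# The statement from the slide:
# APP (Entrez Gene id 351) is_associated_with Alzheimer's Disease.
app_ad = Triple(EX + "gene_351", EX + "is_associated_with", EX + "Alzheimers_Disease")
line = to_ntriples(app_ad)
print(line)
```

Because the predicate is itself a named resource, a query or rule can refer to `is_associated_with` directly, which is what the slide means by modeling logical relationships.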

Data: NCBI Entrez Gene
NCBI Entrez Gene: gene-related information from sequenced genomes and model organisms
- 2 million gene records
- Gene information for genomic maps, sequences, homology, and protein expression
- Available in XML, ASN.1, and as a web page

Entrez Gene Web Interface (figure): record for APP (GeneID: 351), annotated with the relation APP has_product amyloid beta A4 protein

Method I: RDB to RDF without ontology
- Mapped 106 of the 124 element tags to named relations
- 50GB XML file → 39GB RDF file (411 million RDF triples)
- Oracle 10g release 2 with part of the patch
- On a machine with 2 dual-core Intel Xeon 3.2GHz processors running Red Hat Enterprise Linux 4 (RHEL4)
- XSLT stylesheet excerpt: <xsl:when test='$currNode="Entrezgene_track-info"'>
Figure (pipeline components): Entrez Gene XML, XSLT stylesheet (JAXP), Entrez Gene RDF, Jena API, Oracle 10g
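The idea of mapping element tags to named relations can be sketched as follows. The talk used an XSLT stylesheet driven through JAXP; this sketch swaps in Python's standard `xml.etree.ElementTree` purely for illustration. The namespace, the tag-to-relation table, the relation names, and the flat sample record are all assumptions and do not reproduce the actual 106-tag mapping or the real Entrez Gene XML layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and tag-to-relation table (illustrative only).
EX = "http://example.org/entrez/"
TAG_TO_RELATION = {
    "Gene-ref_locus": "has_symbol",
    "Prot-ref_name_E": "has_product",
}

# Tiny, flattened stand-in for one Entrez Gene record (not real export data).
SAMPLE_XML = """
<Entrezgene>
  <Gene-track_geneid>351</Gene-track_geneid>
  <Gene-ref_locus>APP</Gene-ref_locus>
  <Prot-ref_name_E>amyloid beta A4 protein</Prot-ref_name_E>
  <Unmapped-tag>ignored</Unmapped-tag>
</Entrezgene>
"""


def record_to_triples(xml_text):
    """Turn the mapped elements of one record into (subject, relation, literal) triples."""
    root = ET.fromstring(xml_text.strip())
    gene_id = root.findtext("Gene-track_geneid")
    subject = f"{EX}gene_{gene_id}"
    triples = []
    for elem in root.iter():
        relation = TAG_TO_RELATION.get(elem.tag)
        # Elements without a mapping are skipped, like the unmapped tags in the talk.
        if relation is not None and elem.text:
            triples.append((subject, EX + relation, elem.text.strip()))
    return triples


triples = record_to_triples(SAMPLE_XML)
for t in triples:
    print(t)
```

Keeping the mapping in a table separates the decision of which tags become relations from the traversal code, roughly the role the `xsl:when` branches play in a stylesheet.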

Application I: Genome ↔ Phenotype
From glycosyltransferase to congenital muscular dystrophy*
Figure (RDF graph):
- EG:9215 LARGE has_molecular_function GO: acetylglucosaminyltransferase
- GO: acetylglucosaminyltransferase isa GO: glycosyltransferase
- EG:9215 LARGE has_associated_phenotype MIM: Muscular dystrophy, congenital, type 1D
* From "glycosyltransferase" to "congenital muscular dystrophy": Integrating knowledge from NCBI Entrez Gene and the Gene Ontology
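The path shown on this slide, from a molecular function class to a phenotype, can be sketched as a small traversal over triples. This is a hedged illustration in plain Python: the triples are hand-written from the slide, the identifiers are labels rather than real GO or MIM accession numbers, and the helper names (`objects`, `is_a`, `phenotypes_for_function`) are hypothetical.

```python
# Hand-written triples echoing the slide; labels stand in for accession numbers.
TRIPLES = [
    ("EG:9215 LARGE", "has_molecular_function", "acetylglucosaminyltransferase"),
    ("acetylglucosaminyltransferase", "isa", "glycosyltransferase"),
    ("EG:9215 LARGE", "has_associated_phenotype",
     "Muscular dystrophy, congenital, type 1D"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def is_a(term, ancestor):
    """True if term equals ancestor or reaches it by following isa links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(objects(current, "isa"))
    return False


def phenotypes_for_function(function_class):
    """Map each gene whose function falls under function_class to its phenotypes."""
    result = {}
    genes = {s for s, p, _ in TRIPLES if p == "has_molecular_function"}
    for gene in genes:
        functions = objects(gene, "has_molecular_function")
        if any(is_a(f, function_class) for f in functions):
            result[gene] = objects(gene, "has_associated_phenotype")
    return result


found = phenotypes_for_function("glycosyltransferase")
print(found)
```

The query starts from a class of functions rather than from a specific gene, and the `isa` traversal is what connects the general term to the more specific annotation on the gene.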

Data: Entrez Gene + HomoloGene + Biological Pathway
- In collaboration with the National Institute on Drug Abuse (NIH)
- List of 449 human genes putatively involved with nicotine dependence (identified by Saccone et al.*)
- Goal: understand gene functions and interactions, including their involvement in biological pathways
- List of queries:
  - Which genes participate in a large number of pathways?
  - Which genes (or gene products) interact with each other?
  - Which genes are expressed in the brain?
* S.F. Saccone, A.L. Hinrichs, N.L. Saccone, G.A. Chase, K. Konvicka and P.A. Madden et al., Cholinergic nicotinic receptor genes implicated in a nicotine dependence association study targeting 348 candidate genes with 3713 SNPs, Hum Mol Genet 16 (1) (2007), pp. 36–49

Method II: RDB to RDF with ontology
- Method I cannot answer the query “Which genes participate in a large number of pathways?”
  - It needs a particular instance of a gene or pathway as the starting point in the RDF graph
- Need to classify RDF instance data: Schema + Instance
Figure:
- SCHEMA: gene has_product protein (other labels shown: source organism, sequence)
- INSTANCE: ekom:gene_1141 [subject] has_product [predicate] ekom:protein_ [object]
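Once instances carry a class, a query can start from the class instead of from one known gene. The sketch below illustrates this in plain Python under stated assumptions: only `ekom:gene_1141` comes from the slide, while the other gene, the `ex:participates_in` predicate, the class names, and the pathway identifiers are invented for illustration.

```python
from collections import Counter

# Hypothetical typed instance data (schema classes + instances).
TRIPLES = [
    ("ekom:gene_1141", "rdf:type", "ekom:gene"),
    ("ekom:gene_351", "rdf:type", "ekom:gene"),
    ("ex:pathway_A", "rdf:type", "ex:pathway"),
    ("ex:pathway_B", "rdf:type", "ex:pathway"),
    ("ekom:gene_1141", "ex:participates_in", "ex:pathway_A"),
    ("ekom:gene_1141", "ex:participates_in", "ex:pathway_B"),
    ("ekom:gene_351", "ex:participates_in", "ex:pathway_A"),
]


def instances_of(cls):
    """All subjects typed as cls."""
    return {s for s, p, o in TRIPLES if p == "rdf:type" and o == cls}


def pathway_counts():
    """Count, for every instance of the gene class, the pathways it participates in."""
    genes = instances_of("ekom:gene")
    pathways = instances_of("ex:pathway")
    counts = Counter()
    for s, p, o in TRIPLES:
        if p == "ex:participates_in" and s in genes and o in pathways:
            counts[s] += 1
    return counts


# Genes ordered by the number of pathways they take part in.
ranking = pathway_counts().most_common()
print(ranking)
```

Without the `rdf:type` statements there would be no way to enumerate all genes, which is the limitation of Method I that this slide points out.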

Entrez Knowledge Model (OWL-DL)
- No ontology was available for Entrez Gene data
- Created a standalone model specific to NCBI Entrez Gene: the Entrez Knowledge Model (EKoM)
- Integrated with the BioPAX ontology (biological pathway data)
Figure labels: Domain concepts; Information model concepts

Application II: Genome ↔ Biological Pathway
* An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence

- Application-driven approach for RDB to RDF: Biomedical Knowledge Integration
- Explicit modeling of domain semantics using named relations for
  - Accurate context-based querying
  - Enhanced reasoning using relation-based logic rules
- Use of ontology as reference knowledge model
- GRDDL-compatible approach (using XSLT stylesheet) for transformation of RDB to RDF

More information at: Thank you