Representation of molecular structures and related computations on the Semantic Web. Universal Data Model and its Ontology. Mirek Sopek*, Neil Ostlund,


Representation of molecular structures and related computations on the Semantic Web: Universal Data Model and its Ontology. Mirek Sopek*, Neil Ostlund, Jacob W.G. Bloom, Stuart Chalk. Chemical Semantics Inc., 1115 NW 4th Street, Gainesville, Florida. *sopek@chemicalsemantics.com

Chemical Semantics goals (http://chemsem.com): interoperable PUBLISHING of computational chemistry calculations; semantic REPRESENTATION of data for both humans and machines; FEDERATION of published data with existing web-based chemical datasets; cloud-like ARCHIVING of computational chemistry calculation results, input/output files, etc.

CSI Portal: a short review. chemsem.com is an EXISTING PLATFORM FOR DATA PUBLISHING.

CSI Portal: what's new? Enhanced stability and security; a SPARQL query generator based on chemical drawings; and an extended range of supported QC packages: ADF, DALTON, GAMESS, GAMESS-UK, Gaussian, Jaguar, Molpro, NWChem, ORCA, Psi4 and Q-Chem (thanks to the use of cclib).

Data Models in chemistry

What is a data model and why is it important? A data model organizes data elements and standardizes how they relate to one another. As such, a data model should be distinguished from its serializations (i.e., file formats). The most important place where we work directly with data models is in software!
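A minimal sketch of that distinction, assuming a toy in-memory model (the hypothetical Atom class below, holding element symbols and illustrative Cartesian coordinates in ångströms for water): the same model object is written out as a tabular XYZ-style text and as a small XML tree, and reading the table back yields the identical model.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass(frozen=True)
class Atom:
    """Hypothetical in-memory model element: symbol plus Cartesian position (Å)."""
    element: str
    x: float
    y: float
    z: float

# Illustrative water geometry, used only to exercise the two serializations.
WATER = [
    Atom("O", 0.000, 0.000, 0.117),
    Atom("H", 0.000, 0.757, -0.469),
    Atom("H", 0.000, -0.757, -0.469),
]

def to_xyz(atoms, comment="water"):
    """Tabular serialization: atom count, comment line, one row per atom."""
    lines = [str(len(atoms)), comment]
    lines += [f"{a.element} {a.x:.3f} {a.y:.3f} {a.z:.3f}" for a in atoms]
    return "\n".join(lines)

def to_xml(atoms):
    """Tree-shaped serialization of the same model."""
    root = ET.Element("molecule")
    for i, a in enumerate(atoms, start=1):
        ET.SubElement(root, "atom", id=f"a{i}", element=a.element,
                      x=f"{a.x:.3f}", y=f"{a.y:.3f}", z=f"{a.z:.3f}")
    return ET.tostring(root, encoding="unicode")

def from_xyz(text):
    """Parse the tabular form back into the in-memory model."""
    rows = text.splitlines()[2:]
    return [Atom(e, float(x), float(y), float(z))
            for e, x, y, z in (row.split() for row in rows)]

print(to_xyz(WATER))
print(to_xml(WATER))
assert from_xyz(to_xyz(WATER)) == WATER  # the model survives the round trip
```

The file formats differ, but the software works with the Atom list either way.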

Data Models in Chemistry: TABULAR data models (most popular: MOL files, MOLDEN files, ZMT, GJF, HIN, relational DBs, etc.); TREE-based data models (CML, AniML, CSX, etc.); KEY-VALUE/MIXED data models (CIF, the new PDB/mmCIF, JCAMP-DX).

Why we need new data models and standards. Existing data models have various levels of extensibility, but all of them fall short when a new kind of data, unknown or unforeseen when the model was created, has to be added. Such data usually breaks the model or, at best, is ignored. There is no provision for dynamic sharing of data, where people can add new data in real time.

What is the solution? We are convinced that the solution is a GRAPH-based data model built on the smallest possible data pattern: the TRIPLE. The best implementation is offered by RDF (Resource Description Framework), known from semantic technologies.

Why triples? Arbitrary N-tuples can be constructed out of 3-tuples, as proved by W. V. Quine (Mathematical Logic, Harvard University Press, 1940).
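The usual construction behind this claim is to introduce a fresh node for each instance of an n-ary relation and attach each of the n arguments to it with its own binary predicate, so an n-tuple becomes n triples (plus a type triple). The sketch below shows that generic n-ary relation pattern for a bond record (first atom, second atom, bond order); all ex: names and the _:r node labels are hypothetical, not taken from the CSI model.

```python
from itertools import count

_ids = count(1)  # source of fresh node labels

def ntuple_to_triples(relation_type, roles, values, prefix="_:r"):
    """Encode one n-tuple as triples hanging off a fresh node.

    relation_type: class of the relation instance (stored as an rdf:type triple)
    roles: one predicate name per position in the tuple
    values: the tuple itself
    """
    if len(roles) != len(values):
        raise ValueError("need exactly one role per value")
    node = f"{prefix}{next(_ids)}"
    triples = {(node, "rdf:type", relation_type)}
    triples |= {(node, role, value) for role, value in zip(roles, values)}
    return node, triples

def triples_to_ntuple(node, triples, roles):
    """Recover the original n-tuple from the triples about `node`."""
    by_pred = {p: o for s, p, o in triples if s == node}
    return tuple(by_pred[r] for r in roles)

# Hypothetical bond record: (first atom, second atom, bond order)
roles = ("ex:firstAtom", "ex:secondAtom", "ex:bondOrder")
node, ts = ntuple_to_triples("ex:Bond", roles, ("ex:atom1", "ex:atom2", 2))
print(sorted(ts, key=str))
```

Because every position gets its own named predicate, the argument order of the original tuple is preserved and can be reconstructed.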

RDF data model. Anatomy of the triple: Subject (thing), Predicate (property), Object (value). For example: <molecule> gc:hasInChIKey "DUGIDELPOPULAW-UHFFFAOYSA-N" and <molecule> gnvc:hasInChIString "1S/H2O/h1H2".

RDF data model. A typical data set contains large numbers of triples forming a DIRECTED GRAPH. Nodes are identified and addressed via the URI scheme, a generalization of URLs (standard web addresses).
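A minimal sketch of the graph view using only the standard library: each triple is treated as a directed edge, labelled by its predicate, from the subject node to the object node. The gc: URIs follow the http://purl.org/gc/ prefix used in this presentation, while the http://example.org/ node URIs are hypothetical; element symbols are plain strings standing in for literals.

```python
from collections import defaultdict

GC = "http://purl.org/gc/"
EX = "http://example.org/"  # hypothetical namespace for the example nodes

triples = {
    (EX + "water", GC + "hasAtom", EX + "O1"),
    (EX + "water", GC + "hasAtom", EX + "H1"),
    (EX + "water", GC + "hasAtom", EX + "H2"),
    (EX + "O1", GC + "isElement", "O"),
    (EX + "H1", GC + "isElement", "H"),
    (EX + "H2", GC + "isElement", "H"),
}

def adjacency(triples):
    """Outgoing edges: subject -> list of (predicate, object)."""
    out = defaultdict(list)
    for s, p, o in triples:
        out[s].append((p, o))
    return out

def reverse_adjacency(triples):
    """Incoming edges: object -> list of (predicate, subject)."""
    inc = defaultdict(list)
    for s, p, o in triples:
        inc[o].append((p, s))
    return inc

out = adjacency(triples)
inc = reverse_adjacency(triples)
# Edges are directed: the molecule points to its atoms, not the other way round.
print(sorted(o for _, o in out[EX + "water"]))
print(inc[EX + "O1"])
```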

RDF data model in software. In software, the RDF data model is usually represented as an unordered SET of TRIPLES (3-tuples). For example, in Python each triple is a 3-tuple: (subject, predicate, object).
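A minimal sketch of that representation: the graph is a Python set of (subject, predicate, object) tuples, so a repeated triple collapses into one and order carries no meaning, and lookups use a triple pattern in which None acts as a wildcard, the same convention rdflib's Graph.triples() follows. The match and value helpers are hypothetical names; the terms are copied from the InChI example above as plain strings.

```python
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("<molecule>", "gc:hasInChIKey", "DUGIDELPOPULAW-UHFFFAOYSA-N"),
    ("<molecule>", "gnvc:hasInChIString", "1S/H2O/h1H2"),
}

# Adding an existing triple again changes nothing: the model is a set.
graph.add(("<molecule>", "gc:hasInChIKey", "DUGIDELPOPULAW-UHFFFAOYSA-N"))

def match(graph, s=None, p=None, o=None):
    """Yield triples matching the pattern; None matches any term."""
    for t in graph:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t

def value(graph, s, p):
    """Return one object for (s, p, ?) or None, similar in spirit to rdflib's Graph.value()."""
    return next((o for _, _, o in match(graph, s, p)), None)

print(len(graph))
print(value(graph, "<molecule>", "gnvc:hasInChIString"))
print(sorted(p for _, p, _ in match(graph, s="<molecule>")))
```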

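How do we interact with the model? Through SPARQL queries, for example a query for the named graphs containing something (with a type and a label) that has an F, Cl or Br atom:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?graph WHERE {
    GRAPH ?graph {
        {
            ?something gc:hasAtom ?atom1 ;
                rdf:type ?somethingType ;
                rdfs:label ?somethingLabel .
            ?atom1 gc:isElement "F" .
        }
        UNION
        {
            ?something gc:hasAtom ?atom2 ;
                rdf:type ?somethingType ;
                rdfs:label ?somethingLabel .
            ?atom2 gc:isElement "Cl" .
        }
        UNION
        {
            ?something gc:hasAtom ?atom3 ;
                rdf:type ?somethingType ;
                rdfs:label ?somethingLabel .
            ?atom3 gc:isElement "Br" .
        }
        # UNION (…) further branches are elided on the slide
    }
}
```

Through specific API calls in your language of preference, for example in Python with rdflib:

```python
import rdflib
from rdflib import RDF, URIRef

ua = URIRef('http://purl.org/gc/Atom')
um = URIRef('http://purl.org/gc/Molecule')
ur = URIRef('http://purl.org/gc/Residue')
# urn (location of the Turtle document) and uhr (the molecule-to-residue
# predicate) are defined elsewhere in the original script.

g = rdflib.Graph()
g.parse(urn, format="turtle")

nmc = 0  # molecule counter
for m in g.subjects(RDF.type, um):
    nmc += 1
    napm = 0  # number of atoms per molecule
    res = list(g.objects(m, uhr))  # residues of this molecule, if any
    if res:
        ...  # (…) remainder of the loop is elided on the slide

# A separate fragment: reading single property values of a node vURI.
# graph, vURI and the gcn namespace come from another part of the script.
v = graph.value(subject=vURI, predicate=RDF.type)
h = graph.value(subject=vURI, predicate=gcn.hasName)
a = graph.value(subject=vURI, predicate=gcn.hasValue)
```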
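To show what the UNION query computes, here is a plain-Python sketch under stated assumptions: the data is held as a set of quads (subject, predicate, object, graph name), prefixed names stand in for full URIs, and only the three element branches visible on the slide (F, Cl, Br) are included. The function name and sample graphs are hypothetical. It returns a set, which corresponds to SELECT DISTINCT ?graph; the query as written may return the same graph more than once.

```python
RDF_TYPE = "rdf:type"
RDFS_LABEL = "rdfs:label"
HAS_ATOM = "gc:hasAtom"
IS_ELEMENT = "gc:isElement"
HALOGENS = {"F", "Cl", "Br"}  # the three UNION branches shown on the slide

def graphs_with_halogen(quads, elements=HALOGENS):
    """Names of graphs where something has a type, a label and an atom of one of `elements`."""
    found = set()
    for g in {q[3] for q in quads}:
        triples = {(s, p, o) for s, p, o, gg in quads if gg == g}
        for thing in {s for s, _, _ in triples}:
            has_type = any(s == thing and p == RDF_TYPE for s, p, _ in triples)
            has_label = any(s == thing and p == RDFS_LABEL for s, p, _ in triples)
            if not (has_type and has_label):
                continue
            atoms = {o for s, p, o in triples if s == thing and p == HAS_ATOM}
            if any((a, IS_ELEMENT, el) in triples for a in atoms for el in elements):
                found.add(g)
                break
    return found

quads = {
    ("ex:m1", RDF_TYPE, "gc:Molecule", "ex:g1"),
    ("ex:m1", RDFS_LABEL, "chloromethane", "ex:g1"),
    ("ex:m1", HAS_ATOM, "ex:a1", "ex:g1"),
    ("ex:a1", IS_ELEMENT, "Cl", "ex:g1"),
    ("ex:m2", RDF_TYPE, "gc:Molecule", "ex:g2"),
    ("ex:m2", RDFS_LABEL, "water", "ex:g2"),
    ("ex:m2", HAS_ATOM, "ex:a2", "ex:g2"),
    ("ex:a2", IS_ELEMENT, "O", "ex:g2"),
}
print(graphs_with_halogen(quads))  # {'ex:g1'}
```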

Software interaction with the model. Of all the data models, the RDF GRAPH offers almost unlimited extensibility. Its serializations JSON-LD and Turtle are the most convenient to work with.

[Figure: diagram labelled "Software", "Other software" and "Original software".]

Data model and its serializations. We must never forget that these are just SERIALIZATIONS of the underlying, more fundamental data model. There are a number of serializations for RDF graphs: RDF/XML, N-Triples, Turtle, JSON-LD, etc. The most important today are JSON-LD and Turtle.
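A minimal sketch of one graph written out in two serializations, N-Triples and expanded-form JSON-LD, using only the standard library. The IRI marker class and the http://example.org/ subject are hypothetical; gc:hasInChIKey and gc:hasAtom are taken from this presentation and expanded with its http://purl.org/gc/ prefix. A real RDF library keeps IRIs and literals as distinct term types; here a literal whose text equals an IRI would compare equal to it.

```python
import json

class IRI(str):
    """Marks a term as an IRI; plain str values are treated as literals."""

def nt_term(term):
    """Render one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term}>"
    escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def to_ntriples(triples):
    """One line per triple: subject, predicate, object, then a final dot."""
    return "\n".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in sorted(triples))

def to_jsonld(triples):
    """Expanded JSON-LD: one node object per subject, every value inside an array."""
    nodes = {}
    for s, p, o in sorted(triples):
        node = nodes.setdefault(s, {"@id": str(s)})
        obj = {"@id": str(o)} if isinstance(o, IRI) else {"@value": o}
        node.setdefault(str(p), []).append(obj)
    return json.dumps(list(nodes.values()), indent=2)

EX = "http://example.org/"  # hypothetical
GC = "http://purl.org/gc/"
triples = {
    (IRI(EX + "water"), IRI(GC + "hasInChIKey"), "DUGIDELPOPULAW-UHFFFAOYSA-N"),
    (IRI(EX + "water"), IRI(GC + "hasAtom"), IRI(EX + "O1")),
}
print(to_ntriples(triples))
print(to_jsonld(triples))
```

Both outputs are generated from the same set of triples; neither format is the model itself.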

Chemical Semantics Graph Data models

CSI Molecular Data Models. The existing model, currently used on our portal, closely follows the CSX (XML) data model presented here last year. The new data model features: alternative methods of describing molecular geometry (Cartesian, fractional and internal coordinates); flexible representation of molecular hierarchies (molecules, residues, groups, chains, templates, etc.); cleaner serializations to both JSON-LD and Turtle, which are also easier for humans to work with; and closer integration with the Gainesville Core Ontology.
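The Cartesian and fractional descriptions are linked by the unit-cell vectors. A minimal sketch of the usual conversion, assuming the common crystallographic convention with a along x and b in the xy plane (not necessarily the convention used in the CSI model); cell_vectors and frac_to_cart are hypothetical names.

```python
import math

def cell_vectors(a, b, c, alpha, beta, gamma):
    """Lattice vectors for cell lengths (Å) and angles (degrees); a along x, b in the xy plane."""
    al, be, ga = (math.radians(t) for t in (alpha, beta, gamma))
    va = (a, 0.0, 0.0)
    vb = (b * math.cos(ga), b * math.sin(ga), 0.0)
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c * c - cx * cx - cy * cy)  # |vc| must equal c
    return va, vb, (cx, cy, cz)

def frac_to_cart(frac, vectors):
    """Cartesian position = u*a + v*b + w*c for fractional coordinates (u, v, w)."""
    u, v, w = frac
    va, vb, vc = vectors
    return tuple(u * va[i] + v * vb[i] + w * vc[i] for i in range(3))

# Body centre of a cubic cell with a = 4 Å
print(frac_to_cart((0.5, 0.5, 0.5), cell_vectors(4, 4, 4, 90, 90, 90)))
# Tip of the b vector in a hexagonal cell (gamma = 120 degrees)
print(frac_to_cart((0, 1, 0), cell_vectors(3, 3, 5, 90, 90, 120)))
```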

CSI Molecular Data Model: geometrical objects, top-level class hierarchy

CSI Molecular Data Model

CSI Molecular Data Model: Cartesian coordinates representation

CSI Molecular Data Model: molecular hierarchy
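A minimal sketch of why a graph makes the hierarchy flexible: the levels are just triples with containment predicates, so a consumer can walk from a molecule down through chains and residues to atoms, and a new level only needs a new predicate in the containment set. The ex: predicate and node names are hypothetical, not the actual CSI vocabulary.

```python
from collections import defaultdict

# Hypothetical containment predicates; the real CSI / Gainesville Core names may differ.
CONTAINS = {"ex:hasChain", "ex:hasResidue", "ex:hasGroup", "ex:hasAtom"}

triples = {
    ("ex:protein", "ex:hasChain", "ex:chainA"),
    ("ex:chainA", "ex:hasResidue", "ex:ALA1"),
    ("ex:chainA", "ex:hasResidue", "ex:GLY2"),
    ("ex:ALA1", "ex:hasAtom", "ex:ALA1_N"),
    ("ex:ALA1", "ex:hasAtom", "ex:ALA1_CA"),
    ("ex:GLY2", "ex:hasAtom", "ex:GLY2_N"),
    ("ex:ALA1", "rdfs:label", "alanine"),  # not a containment triple, so ignored by the walk
}

def children(triples):
    """Map each node to its contained nodes (sorted for a stable order)."""
    kids = defaultdict(list)
    for s, p, o in sorted(triples):
        if p in CONTAINS:
            kids[s].append(o)
    return kids

def walk(node, kids, depth=0, seen=None):
    """Depth-first listing of the hierarchy below `node` as (depth, node) pairs."""
    seen = set() if seen is None else seen
    if node in seen:  # guard against accidental cycles in the data
        return []
    seen.add(node)
    rows = [(depth, node)]
    for child in kids.get(node, []):
        rows += walk(child, kids, depth + 1, seen)
    return rows

rows = walk("ex:protein", children(triples))
for depth, node in rows:
    print("  " * depth + node)
```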

CSI Molecular Data Model: internal coordinates
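Internal coordinates (bond length, bond angle, torsion) can be derived from Cartesian coordinates, and a Z-matrix row can be turned back into a Cartesian position. A minimal sketch, assuming plain 3-tuples for positions and degrees for angles; the torsion uses the IUPAC sign convention, and place_atom is the standard NeRF construction rather than anything specific to the CSI model. All function names are hypothetical.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def scale(a, k): return tuple(x * k for x in a)
def unit(a): return scale(a, 1.0 / norm(a))
def add(*vs): return tuple(map(sum, zip(*vs)))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def distance(p1, p2):
    return norm(sub(p2, p1))

def angle(p1, p2, p3):
    """Bond angle p1-p2-p3 in degrees."""
    u, v = sub(p1, p2), sub(p3, p2)
    c = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def dihedral(p1, p2, p3, p4):
    """Torsion p1-p2-p3-p4 in degrees, range (-180, 180]."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2)))

def place_atom(a, b, c, bond, theta, phi):
    """Z-matrix step (NeRF): D from distance C-D, angle B-C-D and torsion A-B-C-D."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    t, p = math.radians(theta), math.radians(phi)
    d = (-bond * math.cos(t), bond * math.sin(t) * math.cos(p), bond * math.sin(t) * math.sin(p))
    return add(c, scale(bc, d[0]), scale(m, d[1]), scale(n, d[2]))

A, B, C = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)
D = place_atom(A, B, C, 1.09, 109.5, -60.0)
print(distance(C, D), angle(B, C, D), dihedral(A, B, C, D))
```

The round trip (internal coordinates to Cartesian and back) is a convenient consistency check for either direction.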

POC: representation of residues. A proof of concept based on AMBER residues (http://ambermd.org/doc/prep.html). It is as simple as adding a few more triples to the existing structure. This is another example of the data model's flexibility and of the processing software's immunity to changes in the data patterns.

Amber residues

The contents

Amber residues. Creating residue templates from internal coordinate representations adds completely new data to the system. However, the existing information is still readable by software that "knew" how to interpret it, while the new data can now be extracted by software that "knows" about residues.
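A minimal sketch of the compatibility argument, with hypothetical ex: names for the residue data: an older reader that only understands molecules, atoms and elements gives the same answer before and after residue triples are added, while a newer reader extracts the residues.

```python
# Hypothetical predicates for illustration; the actual CSI / Gainesville Core names may differ.
base = {
    ("ex:m1", "rdf:type", "gc:Molecule"),
    ("ex:m1", "gc:hasAtom", "ex:a1"),
    ("ex:m1", "gc:hasAtom", "ex:a2"),
    ("ex:a1", "gc:isElement", "N"),
    ("ex:a2", "gc:isElement", "C"),
}

# New kind of data: a residue grouping and its template name, added as extra triples.
extension = {
    ("ex:m1", "ex:hasResidue", "ex:r1"),
    ("ex:r1", "ex:residueName", "ALA"),
    ("ex:r1", "gc:hasAtom", "ex:a1"),
    ("ex:r1", "gc:hasAtom", "ex:a2"),
}

def old_reader(graph):
    """Written before residues existed: sorted element symbols of each gc:Molecule."""
    molecules = {s for s, p, o in graph if p == "rdf:type" and o == "gc:Molecule"}
    element = {s: o for s, p, o in graph if p == "gc:isElement"}
    return {m: sorted(element[a] for s, p, a in graph if s == m and p == "gc:hasAtom")
            for m in molecules}

def residue_reader(graph):
    """Newer software that knows about residues: residue names per molecule."""
    names = {s: o for s, p, o in graph if p == "ex:residueName"}
    owners = {s for s, p, _ in graph if p == "ex:hasResidue"}
    return {m: sorted(names[r] for s, p, r in graph if s == m and p == "ex:hasResidue")
            for m in owners}

extended = base | extension
print(old_reader(extended))      # {'ex:m1': ['C', 'N']}
print(residue_reader(extended))  # {'ex:m1': ['ALA']}
```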

Use in software: Excel, Python and PHP examples (http://chemicalsemantics.com/rda/).

Ontological description of the data model. The structure of the RDF data model can be described in an ontology: http://purl.org/gc
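One thing such an ontology enables is inference over the class hierarchy. A minimal sketch of the RDFS subclass rules (rdfs11 for transitivity of rdfs:subClassOf, rdfs9 for propagating rdf:type to superclasses), with hypothetical ex: class names loosely modelled on the geometrical-object hierarchy mentioned above rather than the actual terms published at http://purl.org/gc.

```python
from collections import defaultdict

# Hypothetical class names, for illustration only.
ontology = {
    ("ex:Atom", "rdfs:subClassOf", "ex:GeometricalObject"),
    ("ex:Molecule", "rdfs:subClassOf", "ex:GeometricalObject"),
    ("ex:Residue", "rdfs:subClassOf", "ex:Molecule"),
}
data = {("ex:r1", "rdf:type", "ex:Residue"), ("ex:a1", "rdf:type", "ex:Atom")}

def superclasses(cls, schema):
    """All classes reachable from `cls` via rdfs:subClassOf (transitive, cycle-safe)."""
    parents = defaultdict(set)
    for s, p, o in schema:
        if p == "rdfs:subClassOf":
            parents[s].add(o)
    seen, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def inferred_types(data, schema):
    """Add the rdf:type triples entailed by the subclass hierarchy."""
    extra = {(s, "rdf:type", sup)
             for s, p, o in data if p == "rdf:type"
             for sup in superclasses(o, schema)}
    return data | extra

for t in sorted(inferred_types(data, ontology)):
    print(t)
```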

Conclusions. The RDF data model delivers the maximum possible extensibility while preserving compatibility with the software used to create and consume it. It is suitable not only for knowledge representation and metadata encoding but is also the best data model for encoding molecular structure information.

Acknowledgements. I would like to thank the following people for making this presentation possible: Dr. Neil S. Ostlund, Dr. Jacob W.G. Bloom, Dr. Bing Wang and Dr. Stuart Chalk.

Thank you! Mirek Sopek, PhD Chemical Semantics, Inc. 1115 NW 4th Street 32601 Gainesville, Florida cell: +1 917 3467500 web: www.chemicalsemantics.com email: sopek@chemicalsemantics.com