
Stuart J. Chalk, Department of Chemistry University of North Florida




1 Stuart J. Chalk, Department of Chemistry University of North Florida
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
Stuart J. Chalk, Department of Chemistry, University of North Florida
#ACSCINFDataSummit, CINF Paper 171, 251st ACS Meeting, Spring 2016

2 Scientific Data Should be Open
Simple: openness as the norm, not the exception. Data made available without restriction, so it's useful. Mechanisms/tools to make data available. Formats that allow others to get the data… …but also make it easy to use. Annotate the data to make it easy to find. Community-driven promotion of, and action on, this issue.

3 Options for Storing Data?
Research notebook, spectral files (JCAMP-DX, proprietary), Excel spreadsheets, personal databases, online databases, PDF files? No! RDF (Resource Description Framework)? Yes!

4 The Linked Data Platform
From: W3C Recommendation 2015 Specification - Primer -

5 JSON for Linked Data (JSON-LD)
Use JavaScript Object Notation (JSON) as a text format for storing data and metadata so that it can be converted to RDF:
{
  "@context": { "name": "…", "isAlive": "…", "age": "…", "height": "…" },
  "@id": "…",
  "name": "Stuart Chalk",
  "isAlive": true,
  "age": 49,
  "height": 188.0
}

6 JSON for Linked Data (JSON-LD)
Converted to RDF (N-Triples):
<…> <…> "49"^^<…> .
<…> <…> "true"^^<…> .
<…> <…> "188"^^<…> .
<…> <…> "Stuart Chalk" .
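To make the JSON-to-RDF step concrete, here is a minimal sketch of turning a flat JSON-LD-style object into N-Triples. It is not a conformant JSON-LD processor: it assumes every context term maps straight to a full IRI, and all IRIs below (`example.org`) are hypothetical. It does follow the JSON-LD 1.1 rule that a number with no fractional part becomes an `xsd:integer`, so `188.0` serializes as `"188"`.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value) -> str:
    """Serialize a JSON scalar as an N-Triples literal (simplified JSON-LD rules)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, (int, float)):
        if abs(value) < 1e21 and float(value).is_integer():
            # Numbers with no fractional part map to xsd:integer (188.0 -> "188").
            return f'"{int(value)}"^^<{XSD}integer>'
        # Simplification: repr() is not the canonical xsd:double lexical form.
        return f'"{value!r}"^^<{XSD}double>'
    # Strings become plain literals; JSON string escaping is valid N-Triples escaping.
    return json.dumps(str(value), ensure_ascii=False)


def to_ntriples(doc: dict) -> list:
    """Convert a flat JSON-LD-style object into a list of N-Triples lines."""
    context = doc["@context"]
    subject = f"<{doc['@id']}>"
    return [
        f"{subject} <{context[key]}> {literal(value)} ."
        for key, value in doc.items()
        if not key.startswith("@")  # skip keywords such as @context and @id
    ]


doc = {
    "@context": {  # hypothetical term-to-IRI mappings
        "name": "http://example.org/vocab#name",
        "isAlive": "http://example.org/vocab#isAlive",
        "age": "http://example.org/vocab#age",
        "height": "http://example.org/vocab#height",
    },
    "@id": "http://example.org/people/1",  # hypothetical subject IRI
    "name": "Stuart Chalk",
    "isAlive": True,
    "age": 49,
    "height": 188.0,
}
triples = to_ntriples(doc)
for line in triples:
    print(line)
```

A production pipeline would hand this to a real JSON-LD library, which also handles remote contexts, nested objects, and value objects.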

7 Store all Scientific Data in RDF?
Nice idea, but because anything can be linked to anything else to form a graph of variable structure, the result is difficult to search and hard to maintain. OK, then use a regular relational database? That means a rigid schema, and it is not good to force data to fit the schema. Use a hybrid approach! Encode some structure in RDF using a framework, then add data to the structured graph in an organized way.

8 What Metadata is Important for Data?
Consider the FAIR Principles:
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A2. metadata are accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
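Several of these principles (F4, A1, I1) concern infrastructure rather than the contents of a record, but some can be checked on the metadata itself. The sketch below flags a few of them; the mapping from principle to field name (`identifier`, `description`, `license`, `provenance`) is a hypothetical illustration, not part of FAIR or of any standard.

```python
# Hypothetical mapping from a few FAIR principles to the metadata field that
# would satisfy each; the field names are illustrative, not from any standard.
EXPECTED_FIELDS = {
    "F1": "identifier",    # globally unique and persistent identifier
    "F2": "description",   # rich metadata describing the data
    "R1.1": "license",     # clear and accessible data usage license
    "R1.2": "provenance",  # detailed provenance
}


def missing_fair_fields(record: dict) -> dict:
    """Return {principle: field} for each expected field that is absent or empty."""
    return {p: f for p, f in EXPECTED_FIELDS.items() if not record.get(f)}


# Hypothetical record: the license is empty and provenance is absent.
record = {
    "identifier": "https://example.org/dataset/42",
    "description": "pH measurements of a buffer solution",
    "license": "",
}
missing = missing_fair_fields(record)
print(missing)  # {'R1.1': 'license', 'R1.2': 'provenance'}
```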

9 What Should a Data Model Represent?
Define the scope as data obtained from an experiment, a series of experiments, or a project. Who did the work, and where are they? Metadata about the data “packet”. The raw data… …and its associated metadata (enough to properly contextualize the data). Access rights. Published location.
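The items on this slide map naturally onto a record type. The following is a minimal sketch, assuming hypothetical class and field names (`DataPacket`, `Author`, `packet_metadata`, and so on) rather than SciData's actual keys, with the scope restricted to the three options listed.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

SCOPES = {"experiment", "series", "project"}  # the three scopes named on the slide


@dataclass
class Author:
    name: str
    affiliation: str  # "where are they?"


@dataclass
class DataPacket:
    """Hypothetical container mirroring the items the slide lists."""
    scope: str                  # "experiment", "series" (of experiments), or "project"
    authors: List[Author]       # who did the work
    packet_metadata: dict       # metadata about the data "packet" itself
    raw_data: List[Any]         # the raw data...
    data_metadata: dict         # ...and the metadata that contextualizes it
    access_rights: str          # e.g. a license statement
    published_location: Optional[str] = None  # where it is published, if anywhere

    def __post_init__(self) -> None:
        if self.scope not in SCOPES:
            raise ValueError(f"scope must be one of {sorted(SCOPES)}, got {self.scope!r}")


# Hypothetical packet; names and values are placeholders.
packet = DataPacket(
    scope="experiment",
    authors=[Author("A. Researcher", "Example University")],
    packet_metadata={"title": "Buffer pH measurement"},
    raw_data=[7.01, 7.02, 6.99],
    data_metadata={"quantity": "pH", "temperature_C": 25.0},
    access_rights="CC-BY-4.0",
)
```

Keeping the raw data and its contextual metadata as separate fields mirrors the slide's distinction between the data and the metadata needed to interpret it.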

10 General Framework SciData – Scientific Data Model (SDM)
Overview – GitHub Repo –

11 General Framework - The Context
The “@context” contains the context definition: it can refer to other context files, define namespace abbreviations, and set the default vocabulary. In a term definition, “@id” links to an ontology term and “@type” states the data type.
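As an illustration of how a processor uses those pieces, this sketch expands terms with a small context: a term definition contributes its ontology IRI and datatype, a namespace abbreviation expands a compact IRI (`prefix:suffix`), and anything else falls back to the default vocabulary. All prefixes and IRIs other than `xsd` are hypothetical, and references to other context files are not handled.

```python
from typing import Optional

# Hypothetical context: every prefix and IRI except xsd is illustrative only.
CONTEXT = {
    "@vocab": "https://example.org/vocab#",       # default vocabulary
    "xsd": "http://www.w3.org/2001/XMLSchema#",   # namespace abbreviation
    "ex": "https://example.org/onto#",            # namespace abbreviation
    "value": {"@id": "ex:numericValue", "@type": "xsd:float"},  # ontology term + datatype
}


def expand_iri(term: str, context: dict) -> str:
    """Expand a term or compact IRI to a full IRI (simplified JSON-LD rules)."""
    definition = context.get(term)
    if isinstance(definition, dict):
        term = definition["@id"]          # term definition: follow its @id
    elif isinstance(definition, str):
        term = definition                 # simple term-to-IRI mapping
    if "://" in term:
        return term                       # already an absolute IRI
    prefix, sep, suffix = term.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + suffix   # compact IRI via a namespace abbreviation
    return context["@vocab"] + term       # fall back to the default vocabulary


def datatype_of(term: str, context: dict) -> Optional[str]:
    """Return the expanded datatype IRI declared for a term, if any."""
    definition = context.get(term)
    if isinstance(definition, dict) and "@type" in definition:
        return expand_iri(definition["@type"], context)
    return None


print(expand_iri("value", CONTEXT))        # https://example.org/onto#numericValue
print(expand_iri("temperature", CONTEXT))  # https://example.org/vocab#temperature
print(datatype_of("value", CONTEXT))       # http://www.w3.org/2001/XMLSchema#float
```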

12 Methodology, System, and Dataset

13 Example Data - pH

14 Example Data - Literature Value
“scope” provides an internal link to the value. Each value of a name/value pair has a default data type that can be overridden by expanding the value into a JSON object and adding “@value” and “@type”.
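In JSON-LD a value object pairs `"@value"` with an explicit `"@type"`. A minimal sketch of resolving a value's datatype, assuming the default datatype is simply passed in (in the model it would come from the context) and using full IRIs for brevity; the numbers are hypothetical:

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def lexical(v) -> str:
    """Lexical form of a JSON scalar; JSON booleans become "true"/"false"."""
    return str(v).lower() if isinstance(v, bool) else str(v)


def typed_value(raw, default_type: str) -> tuple:
    """Return (lexical form, datatype IRI) for a plain or expanded value.

    A plain value takes the default datatype; an expanded value object
    ({"@value": ..., "@type": ...}) overrides it with its own @type.
    """
    if isinstance(raw, dict) and "@value" in raw:
        return lexical(raw["@value"]), raw.get("@type", default_type)
    return lexical(raw), default_type


# Hypothetical literal value whose default datatype is xsd:float.
plain = typed_value(1.05, XSD + "float")
override = typed_value({"@value": "1.050", "@type": XSD + "decimal"}, XSD + "float")
print(plain)     # ('1.05', 'http://www.w3.org/2001/XMLSchema#float')
print(override)  # ('1.050', 'http://www.w3.org/2001/XMLSchema#decimal')
```

Writing the overridden value as a string (`"1.050"`) preserves the trailing zero, which a JSON number would lose.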

15 Example Data - NMR Spectrum
“dataseries” are JSON arrays of data on one axis. Bring them together in a “datagroup” and we can represent a spectrum. “parameter” is a generic container for data or metadata.
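A sketch of that idea, assuming hypothetical keys (`name`, `axis`, `values`) rather than the model's exact layout: each dataseries holds one axis, and the datagroup enforces equal lengths so the axes can be paired into spectrum points. The three-point spectrum is made up.

```python
def make_datagroup(name: str, **series: list) -> dict:
    """Bring several dataseries (one JSON array per axis) together in a datagroup."""
    if len({len(values) for values in series.values()}) > 1:
        raise ValueError("all dataseries in a datagroup must have the same length")
    return {
        "name": name,
        "dataseries": [{"axis": axis, "values": list(values)} for axis, values in series.items()],
    }


def points(group: dict) -> list:
    """Pair the axes point by point, e.g. (chemical shift, intensity)."""
    return list(zip(*(s["values"] for s in group["dataseries"])))


# Hypothetical three-point NMR spectrum: chemical shift (ppm) vs. intensity.
spectrum = make_datagroup("spectrum", shift_ppm=[7.26, 2.17, 1.25], intensity=[0.12, 1.0, 0.45])
print(points(spectrum))  # [(7.26, 0.12), (2.17, 1.0), (1.25, 0.45)]
```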

16 Example Data – CC Calculation
“datagroup”s are structures that aggregate data at any level, and they can be nested to any depth. “uid” is optional and can be used to uniquely identify any piece of data.
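Because datagroups nest to any depth, code that consumes them is naturally recursive. This sketch walks a hypothetical calculation tree (the `name`, `uid`, and child `datagroup` keys are assumptions for illustration) to list every addressable item and to look one up by its optional uid.

```python
from typing import Iterator, Optional, Tuple


def walk_uids(group: dict, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield (path, uid) for every datagroup, at any depth, that carries a uid."""
    here = path + (group["name"],)
    if "uid" in group:  # uid is optional
        yield "/".join(here), group["uid"]
    for child in group.get("datagroup", []):
        yield from walk_uids(child, here)


def find_by_uid(group: dict, uid: str) -> Optional[dict]:
    """Depth-first search for the datagroup with the given uid."""
    if group.get("uid") == uid:
        return group
    for child in group.get("datagroup", []):
        found = find_by_uid(child, uid)
        if found is not None:
            return found
    return None


# Hypothetical nested calculation result.
calculation = {
    "name": "calculation", "uid": "calc-1",
    "datagroup": [
        {"name": "geometry", "uid": "geom-1",
         "datagroup": [{"name": "atom", "uid": "atom-1"}]},
        {"name": "energy"},  # no uid: still part of the tree, just not addressable
    ],
}
print(list(walk_uids(calculation)))
# [('calculation', 'calc-1'), ('calculation/geometry', 'geom-1'), ('calculation/geometry/atom', 'atom-1')]
```

"Any depth" is bounded in practice by Python's recursion limit (about 1000 frames by default); an explicit stack would remove that limit.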

17 The SDM Ontology
SciData Ontology – Scientific Data Model Ontology (SDMO) OWL File –

18 Future Work
Get community feedback; refine, extend, and standardize. Generate a large corpus of disparate data in JSON-LD, ingest it into a triple store, and query it with SPARQL. Evaluate inferencing on the triple store data. Push adoption through collaboration. Run hackathons to build developer implementations. Develop an Electronic Laboratory Notebook (ELN) to generate data in JSON-LD. Get feedback from the data community, RDA -. Test using the NDS -.
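To show what querying the ingested triples involves, here is a toy in-memory matcher in which `None` plays the role of a SPARQL variable, and a two-pattern join that mimics a basic graph pattern. It is not SPARQL and not a triple store; the `ex:` terms and values are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def match(store: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard variable."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Hypothetical triples with made-up ex: terms.
store = [
    ("ex:m1", "ex:property", "pH"),
    ("ex:m1", "ex:value", "7.01"),
    ("ex:m2", "ex:property", "temperature"),
    ("ex:m2", "ex:value", "298.15"),
]

# Roughly: SELECT ?v WHERE { ?x ex:property "pH" . ?x ex:value ?v }
subjects = [t[0] for t in match(store, p="ex:property", o="pH")]
values = [t[2] for x in subjects for t in match(store, s=x, p="ex:value")]
print(values)  # ['7.01']
```

A real triple store adds indexing, a SPARQL parser, and inferencing, but evaluates basic graph patterns by the same bind-and-join idea.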

19 Pain Points?
Challenges: normalization; user perspective; gaps in data; gaps in ontology coverage.
Opportunities: tools to generate metadata automatically; gather stakeholders to work on standards; broad knowledge domain representation; IUPAC, RDA Chemistry Research Data IG.
Priorities? Data annotation and representation; data exchange (repo <-> repo, user <-> user); structure representation (chiral centers); curation infrastructures; domain vocabulary translations; units of measure.

20 Reality Check
“to err is human; to forgive, divine” (Alexander Pope)
“to err is human; to really screw things up requires a computer” (Paul Ehrlich)
“to err is human; all hell will break loose if you don’t provide accurate semantics to a computer” (Stuart Chalk)

21 Questions?
schalk@unf.edu Phone: 904-620-1938 Skype: stuartchalk
LinkedIn/SlideShare: ORCID: ResearcherID:

