An Open Repository Model for Acquiring Knowledge about Scientific Experiments EKAW 2016, November 21st, 2016, Bologna, Italy Martin O’Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, and Mark A. Musen Stanford University, Stanford, CA, USA metadatacenter.org Science has a reproducibility problem: many scientific experiments are not reproducible. I am going to talk about a metadata model and an associated system called CEDAR that aims to address part of that challenge.
Reproducibility Problem in Science Amgen researchers made headlines in 2012 when they announced that they had been unable to reproduce the findings of 47 of 53 cancer research studies.
Metadata Key to Addressing Problem Crucial for reproducibility in biomedicine Locate experimental datasets online Understand how the experiments were performed Reuse the data to perform new analyses Journals and funding agencies increasingly require making experimental data and metadata available
Many Metadata Standards have been Developed
However: Metadata Submission is Hard
Metadata Submission is Hard - II Summary Data Matrix Submission Interface Raw Data
Result: Poor Metadata age Age AGE `Age age (after birth) age (in years) age (y) age (year) age (years) Age (years) Age (Years) age (yr) age (yr-old) age (yrs) Age (yrs) age [y] age [year] age [years] age in years age of patient Age of patient age of subjects age(years) Age(years) Age(yrs.) Age, year age, years age, yrs age.year age_years Variants of ‘age’ metadata field in Gene Expression Omnibus (GEO) repository
Our Solution: CEDAR - A Metadata Ecosystem Overcome the impediments to creating high-quality metadata Facilitate Creation Acquisition Use Evaluation Refinement Key goal: create a sharable metadata exchange format – a template model - for publishing, searching, exchanging metadata
CEDAR Template Model Goals Must describe composite structure of templates Implemented using standard formats Express semantics Metadata instances: Linked to controlled terms Easily serializable Easily validated Easily indexed Interchange with RDF Highly readable Produced/consumed via REST APIs and usable in JavaScript front ends Meets FAIR goals (the model, but also the standards used to implement it)
Using JSON Schema and JSON-LD for CEDAR Template Model JSON Schema + JSON-LD JSON-LD
What is JSON Schema? Technology for describing and validating the structure of JSON documents Provides a structural description of any JSON document JSON documents that are specified with JSON Schema can be structurally validated against their associated schemas Analogous to XML Schema
What is JSON-LD? A lightweight syntax to serialize Linked Data in JSON Allows existing JSON to be interpreted as Linked Data with minimal changes JSON-LD is primarily intended to be a way to: use Linked Data in Web-based programming environments build interoperable Web services store Linked Data in JSON-based storage engines Core contribution: add semantics to JSON documents W3C Recommendation: https://www.w3.org/TR/json-ld/
Using JSON Schema to Define Template Structure
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "@type": "https://repo.metadatacenter.org/core/Template",
  "@id": "https://repo.metadatacenter.org/templates/434334",
  "title": "Study",
  "description": "Study template",
  "type": "object",
  "_ui": {...},
  "properties": {
    "title": {...},
    "description": {...},
    "principalInvestigator": {...}
  },
  "required": ["title", "description", "principalInvestigator"],
  "additionalProperties": false
}
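To make the roles of "required" and "additionalProperties" concrete, here is a minimal sketch of the structural check they imply. It is not a JSON Schema validator and not CEDAR code: the function `check_structure` is hypothetical, it handles only those two keywords plus "type": "object", and the nested field schemas that the slide elides are reduced to empty objects as an assumption.

```python
import json

# Simplified stand-in for the Study template. The per-field schemas are
# elided on the slide; empty objects are used here as an assumption.
STUDY_TEMPLATE = {
    "type": "object",
    "properties": {
        "title": {},
        "description": {},
        "principalInvestigator": {},
    },
    "required": ["title", "description", "principalInvestigator"],
    "additionalProperties": False,
}


def check_structure(instance, schema):
    """Return a list of structural problems; an empty list means none found.

    Hypothetical helper: only 'type': 'object', 'required', and
    'additionalProperties': false are checked.
    """
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return ["instance is not a JSON object"]
    problems = []
    # Every name listed under 'required' must be present.
    for name in schema.get("required", []):
        if name not in instance:
            problems.append(f"missing required property: {name}")
    # With additionalProperties false, only declared properties are allowed.
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        for name in sorted(instance):
            if name not in allowed:
                problems.append(f"unexpected property: {name}")
    return problems


good = json.loads('{"title": {}, "description": {}, "principalInvestigator": {}}')
bad = json.loads('{"title": {}, "descripton": {}}')  # misspelled key, missing fields

print(check_structure(good, STUDY_TEMPLATE))
print(check_structure(bad, STUDY_TEMPLATE))
```

Because "additionalProperties" is false, a misspelled field name is reported as an unexpected property rather than silently accepted, which is the kind of error a free-form submission interface lets through.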
Using JSON-LD to add Semantics to Metadata Instances
{
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}
Using JSON-LD to add Semantics to Metadata Instances - II
{
  "@type": "http://semantic-dicom.org/dcm#Study",
  "@id": "https://repo.metadatacenter.org/template_instances/55417",
  "@context": {
    "title": "https://schema.org/title",
    "name": "https://schema.org/name",
    "description": "https://schema.org/description",
    "zip": "https://schema.org/postalCode",
    "principalInvestigator": "https://myschema.org/property/hasPI",
    "institution": "https://myschema.org/property/hasInstitution"
  },
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "@type": "https://schema.org/Person",
    "@id": "https://repo.metadatacenter.org/template_elements/557",
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "@type": "https://schema.org/Organization",
      "@id": "https://repo.metadatacenter.org/template_elements/37",
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}
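A property only gains semantics if its key is mapped in "@context"; JSON-LD processors drop keys that have no term definition when they expand a document. The sketch below flags such keys. It assumes a flat context of term-to-IRI strings (no "@vocab", prefixes, or nested contexts), and `unmapped_terms` is a hypothetical helper, not part of any JSON-LD library. The example instance is illustrative: its context declares an abbreviated key "pi" while the data uses "principalInvestigator".

```python
def unmapped_terms(node, context):
    """Collect property names that have no IRI mapping in a flat context."""
    missing = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("@"):
                continue  # JSON-LD keywords such as @id, @type, @value, @context
            if key not in context:
                missing.add(key)
            # Descend into nested objects to check their keys as well.
            missing |= unmapped_terms(value, context)
    elif isinstance(node, list):
        for item in node:
            missing |= unmapped_terms(item, context)
    return missing


# Illustrative instance: 'pi' is mapped, but the data key is spelled out in full.
instance = {
    "@context": {
        "title": "https://schema.org/title",
        "name": "https://schema.org/name",
        "pi": "https://myschema.org/property/hasPI",
    },
    "title": {"@value": "Immune biomarkers study"},
    "principalInvestigator": {"name": {"@value": "Dr. P.I"}},
}

print(unmapped_terms(instance, instance["@context"]))
```

Running a check like this before publishing an instance helps ensure that every field ends up in the resulting Linked Data rather than being ignored.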
CEDAR Metadata Instances can be transformed to an RDF Graph
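The transformation can be pictured as a walk over the instance: each object with an "@id" becomes a subject, each mapped key becomes a predicate, and each "@value" becomes a literal. The following is a simplified sketch, not the JSON-LD to-RDF algorithm and not CEDAR's implementation. It assumes a flat context, that every nested object carries an "@id" (no blank nodes), and that every property value is a single object. The function `to_triples` and the mapping of "principalInvestigator" to the hasPI IRI are assumptions made for illustration; literal quoting via `json.dumps` only approximates N-Triples escaping.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

INSTANCE = {
    "@type": "http://semantic-dicom.org/dcm#Study",
    "@id": "https://repo.metadatacenter.org/template_instances/55417",
    "@context": {
        "title": "https://schema.org/title",
        "name": "https://schema.org/name",
        # Assumed mapping for the full property name used in the instance.
        "principalInvestigator": "https://myschema.org/property/hasPI",
    },
    "title": {"@value": "Immune biomarkers study"},
    "principalInvestigator": {
        "@type": "https://schema.org/Person",
        "@id": "https://repo.metadatacenter.org/template_elements/557",
        "name": {"@value": "Dr. P.I"},
    },
}


def to_triples(node, context):
    """Return N-Triples-style lines for a node and its nested nodes."""
    subject = node["@id"]
    lines = []
    if "@type" in node:
        lines.append(f"<{subject}> <{RDF_TYPE}> <{node['@type']}> .")
    for key, value in node.items():
        if key.startswith("@") or key not in context:
            continue  # skip keywords and keys without a term definition
        predicate = context[key]
        if "@value" in value:
            # Literal object: quote the value as a string.
            lines.append(f"<{subject}> <{predicate}> {json.dumps(value['@value'])} .")
        else:
            # Nested node: link to it by IRI, then emit its own triples.
            lines.append(f"<{subject}> <{predicate}> <{value['@id']}> .")
            lines.extend(to_triples(value, context))
    return lines


triples = to_triples(INSTANCE, INSTANCE["@context"])
for line in triples:
    print(line)
```

The study node links to the investigator node through the hasPI predicate, so the nested JSON structure becomes edges in a graph that any RDF tool can load.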
Model drives CEDAR Workbench CEDAR Template Model Controlled terminologies
Template Designer provides Template Creation Basic template designer screen showing a Study template being created with ontology terms being specified as constraints for the disease field
Metadata Editor automatically generates an Acquisition Interface
Metadata Editor Adds Semantics Basic Metadata Editor screen showing Study instance being populated and Disease controlled term field being auto-completed
Initial Results Public alpha release in September 2016 Represented all public metadata in ImmPort repository (146 studies) Represented an array of public ISA-created biomedical studies (~300) Represented 60k ISO 11179-based Common Data Elements from NCI Currently working with Stanford Digital Repository and several research groups
Summary We have developed a standards-based template model for representing, publishing, and sharing templates and metadata Provides strong interoperation with Linked Open Data Metadata easy to create/consume using off-the-shelf tools Very easy to work with using CEDAR tools
CEDAR Resources Web site: http://metadatacenter.org Workbench: https://cedar.metadatacenter.net GitHub: https://metadatacenter.github.io