1
Stanford University, Stanford, CA, USA
An Open Repository Model for Acquiring Knowledge about Scientific Experiments
EKAW 2016, November 21st, 2016, Bologna, Italy
Martin O’Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, and Mark A. Musen
metadatacenter.org
Science has a reproducibility problem: many scientific experiments are not reproducible. I am going to talk about a metadata model and an associated system, CEDAR, that aims to address part of that challenge.
2
Reproducibility Problem in Science
The reproducibility problem drew wide attention in 2012, when researchers at Amgen announced that they had been unable to reproduce the findings of 47 out of 53 cancer research studies.
3
Metadata Key to Addressing Problem
Crucial for reproducibility in biomedicine
Locate experimental datasets online
Understand how the experiments were performed
Reuse the data to perform new analyses
Journals and funding agencies increasingly require making experimental data and metadata available
4
Many Metadata Standards have been Developed
5
However: Metadata Submission is Hard
6
Metadata Submission is Hard - II
Summary Data Matrix Submission Interface Raw Data
7
Result: Poor Metadata
age, Age, AGE, `Age, age (after birth), age (in years), age (y), age (year), age (years), Age (years), Age (Years), age (yr), age (yr-old), age (yrs), Age (yrs), age [y], age [year], age [years], age in years, age of patient, Age of patient, age of subjects, age(years), Age(years), Age(yrs.), "Age, year", "age, years", "age, yrs", age.year, age_years
Variants of the ‘age’ metadata field in the Gene Expression Omnibus (GEO) repository
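To make the cost of these variants concrete, the following is a minimal sketch of the clean-up a data consumer ends up writing when field names are free text. The function name `normalize_field_name` and its rule (strip punctuation, then treat any label whose first word is "age" as the same field) are assumptions for illustration only, not part of GEO or CEDAR.

```python
import re


def normalize_field_name(raw: str) -> str:
    """Collapse a free-text field label to a canonical key (hypothetical rule)."""
    text = raw.strip().lower()
    # Replace every run of non-letters (punctuation, brackets, underscores) with a space.
    text = re.sub(r"[^a-z]+", " ", text).strip()
    tokens = text.split()
    # Any label whose first word is "age" is treated as the same field.
    if tokens and tokens[0] == "age":
        return "age"
    return text


# A sample of the variants listed on the slide.
variants = [
    "age", "Age", "AGE", "`Age", "age (in years)", "age [y]",
    "Age(yrs.)", "age_years", "Age, year", "age of patient",
]
canonical = {normalize_field_name(v) for v in variants}
print(canonical)
```

A rule like this is lossy: it discards the unit and qualifiers such as "after birth", which is exactly the information a shared template would capture explicitly at submission time.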
8
Our Solution: CEDAR - A Metadata Ecosystem
Overcome the impediments to creating high-quality metadata
Facilitate metadata creation, acquisition, use, evaluation, and refinement
Key goal: create a sharable metadata exchange format (a template model) for publishing, searching, and exchanging metadata
9
CEDAR Template Model Goals
Must describe composite structure of templates
Implemented using standard formats
Express semantics
Metadata instances:
Linked to controlled terms
Easily serializable
Easily validated
Easily indexed
Interchange with RDF
Highly readable
Produced/consumed via REST APIs and usable in JavaScript front ends
Meets FAIR goals
Model (but also the standards used to implement it)
10
Using JSON Schema and JSON-LD for CEDAR Template Model
JSON Schema + JSON-LD JSON-LD
11
What is JSON Schema?
Technology for describing and validating the structure of JSON documents
Provides a structural description of any JSON document
JSON documents that are specified with JSON Schema can be structurally validated against their associated schemas
Analogous to XML Schema
12
What is JSON-LD?
A lightweight syntax to serialize Linked Data in JSON
Allows existing JSON to be interpreted as Linked Data with minimal changes
JSON-LD is primarily intended to be a way to:
use Linked Data in Web-based programming environments
build interoperable Web services
store Linked Data in JSON-based storage engines
Core contribution: add semantics to JSON documents
W3C Recommendation:
13
Using JSON Schema to Define Template Structure
{
  "$schema": " ",
  "title": "Study",
  "description": "Study template",
  "type": "object",
  "_ui": {...},
  "properties": {
    "title": {...},
    "description": {...},
    "principalInvestigator": {...}
  },
  "required": ["title", "description", "principalInvestigator"],
  "additionalProperties": false
}
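The structural checks that such a template implies can be sketched in a few lines of Python. This is a hand-rolled illustration covering only three keywords (`type: object`, `required`, and `additionalProperties: false`); it is not a JSON Schema implementation and not CEDAR's validator. The function name `validate_structure` and the sample instances are assumptions made for the example.

```python
def validate_structure(instance, schema):
    """Return a list of error strings; an empty list means these checks passed."""
    errors = []
    if schema.get("type") == "object":
        if not isinstance(instance, dict):
            return ["instance is not an object"]
        properties = schema.get("properties", {})
        # Every name listed under "required" must be present.
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"missing required property: {name}")
        # With additionalProperties set to false, undeclared names are rejected.
        if schema.get("additionalProperties") is False:
            for name in instance:
                if name not in properties:
                    errors.append(f"unexpected property: {name}")
    return errors


# The Study template from the slide, with the elided parts left empty.
study_schema = {
    "title": "Study",
    "description": "Study template",
    "type": "object",
    "properties": {"title": {}, "description": {}, "principalInvestigator": {}},
    "required": ["title", "description", "principalInvestigator"],
    "additionalProperties": False,
}

# Hypothetical instances: one complete, one with a missing and an extra field.
good = {"title": "T", "description": "D", "principalInvestigator": {}}
bad = {"title": "T", "funder": "unknown"}

print(validate_structure(good, study_schema))
print(validate_structure(bad, study_schema))
```

In practice an off-the-shelf JSON Schema validator would replace this function; the sketch only shows what "structurally validated" means for the template above.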
14
Using JSON-LD to add Semantics to Metadata Instances
{
  "title": { "Immune biomarkers study" },
  "description": { "Immune biomarkers …" },
  "principalInvestigator": {
    "name": { "Dr. P.I" },
    "institution": {
      "name": { "Stanford" },
      "zip": { "94305" }
    }
  }
}
15
Using JSON-LD to add Semantics to Metadata Instances - II
{
  " " {
    "title": " ",
    "name": " ",
    "description": " ",
    "zip": " ",
    "pi": " ",
    "institution": " "
  },
  "title": { "Immune biomarkers study" },
  "description": { "Immune biomarkers …" },
  "principalInvestigator": {
    " "
    "name": { "Dr. P.I" },
    "institution": {
      " "
      "name": { "Stanford" },
      "zip": { "94305" }
    }
  }
}
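The idea behind the JSON-LD annotations is that a context maps each short field name to an IRI, so that the same JSON can be read as Linked Data. The sketch below illustrates that mapping with a toy key-expansion function. It is not the JSON-LD expansion algorithm, and everything specific in it is an assumption: the `https://example.org/terms/` IRIs are placeholders (the IRIs on the slide did not survive extraction), and the use of the standard JSON-LD keywords `@context` and `@value` is a guess at what the slide's shorthand stands for.

```python
def expand_terms(node, context):
    """Recursively replace keys found in the context with their IRIs (toy subset)."""
    if isinstance(node, dict):
        return {
            context.get(key, key): expand_terms(value, context)
            for key, value in node.items()
            if key != "@context"  # the context itself is not part of the data
        }
    if isinstance(node, list):
        return [expand_terms(item, context) for item in node]
    return node


BASE = "https://example.org/terms/"  # placeholder namespace, not from the talk

doc = {
    "@context": {
        "title": BASE + "title",
        "name": BASE + "name",
        "principalInvestigator": BASE + "pi",
        "institution": BASE + "institution",
    },
    "title": {"@value": "Immune biomarkers study"},
    "principalInvestigator": {
        "name": {"@value": "Dr. P.I"},
        "institution": {"name": {"@value": "Stanford"}},
    },
}

expanded = expand_terms(doc, doc["@context"])
print(expanded)
```

After expansion, every field is identified by an IRI rather than a local name, which is what lets two repositories agree that their "title" fields mean the same thing.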
16
CEDAR Metadata Instances can be transformed to an RDF Graph
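As a rough illustration of what such a transformation involves, the sketch below walks a nested instance and emits subject-predicate-object triples, creating a blank node for each nested object. It is a simplified stand-in for a real JSON-LD-to-RDF conversion: the function name `to_triples`, the subject IRI, and the `https://example.org/terms/` namespace are all hypothetical, and literal escaping and datatypes are ignored.

```python
from itertools import count


def to_triples(node, subject, base, ids=None):
    """Flatten a nested dict into (subject, predicate, object) triples."""
    ids = count(1) if ids is None else ids
    triples = []
    for key, value in node.items():
        predicate = f"<{base}{key}>"
        if isinstance(value, dict):
            # Nested objects become blank nodes linked from the parent.
            child = f"_:b{next(ids)}"
            triples.append((subject, predicate, child))
            triples.extend(to_triples(value, child, base, ids))
        else:
            # Plain values become literals (no escaping in this sketch).
            triples.append((subject, predicate, f'"{value}"'))
    return triples


instance = {
    "title": "Immune biomarkers study",
    "principalInvestigator": {
        "name": "Dr. P.I",
        "institution": {"name": "Stanford", "zip": "94305"},
    },
}

triples = to_triples(
    instance, "<https://example.org/study/1>", "https://example.org/terms/"
)
for s, p, o in triples:
    print(s, p, o, ".")
```

Each printed line has the shape of an N-Triples statement, so the nested Study instance becomes a small graph with the study as the root, the principal investigator and institution as blank nodes, and the field values as literals.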
17
Model drives CEDAR Workbench
CEDAR Template Model Controlled terminologies
18
Template Designer provides Template Creation
Basic Template Designer screen showing a Study template being created, with ontology terms being specified as constraints for the disease field
19
Metadata Editor automatically generates an Acquisition Interface
20
Metadata Editor Adds Semantics
Basic Metadata Editor screen showing a Study instance being populated and the Disease controlled-term field being auto-completed
21
Initial Results
Public alpha release in September 2016
Represented all public metadata in ImmPort repository (146 studies)
Represented an array of public ISA-created biomedical studies (~300)
Represented 60k ISO-based Common Data Elements from NCI
Currently working with Stanford Digital Repository and several research groups
22
Summary
We have developed a standards-based template model for representing, publishing, and sharing templates and metadata
Provides strong interoperation with Linked Open Data
Metadata easy to create/consume using off-the-shelf tools
Very easy to work with using CEDAR tools
23
CEDAR Resources
Web site: http://metadatacenter.org
Workbench:
GitHub: