Stanford University, Stanford, CA, USA

Presentation transcript:

An Open Repository Model for Acquiring Knowledge about Scientific Experiments
EKAW 2016, November 21st, 2016, Bologna, Italy
Martin O’Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, and Mark A. Musen
Stanford University, Stanford, CA, USA
metadatacenter.org
Science has a reproducibility problem: many scientific experiments cannot be reproduced. I am going to talk about a metadata model and an associated system, called CEDAR, that aim to address part of that challenge.

Reproducibility Problem in Science
The reproducibility problem made headlines in 2012, when researchers at Amgen announced that they had been unable to reproduce the findings of 47 out of 53 published cancer studies.

Metadata Key to Addressing Problem
- Crucial for reproducibility in biomedicine
- Locate experimental datasets online
- Understand how the experiments were performed
- Reuse the data to perform new analyses
- Journals and funding agencies increasingly require making experimental data and metadata available

Many Metadata Standards have been Developed

However: Metadata Submission is Hard

Metadata Submission is Hard - II
[Figure labels: Summary, Data Matrix, Submission Interface, Raw Data]

Result: Poor Metadata
Variants of the ‘age’ metadata field in the Gene Expression Omnibus (GEO) repository:
age; Age; AGE; `Age; age (after birth); age (in years); age (y); age (year); age (years); Age (years); Age (Years); age (yr); age (yr-old); age (yrs); Age (yrs); age [y]; age [year]; age [years]; age in years; age of patient; Age of patient; age of subjects; age(years); Age(years); Age(yrs.); Age, year; age, years; age, yrs; age.year; age_years
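To make the cost of these variants concrete, the sketch below shows the kind of ad hoc cleanup a data consumer has to write before the 30 spellings can be treated as one field. It is an illustration only: the function name `normalize_field_name` and the stop-word list are assumptions made for this example and are not part of GEO or CEDAR.

```python
import re

# The 30 spellings of the "age" field listed on the slide.
AGE_VARIANTS = [
    "age", "Age", "AGE", "`Age", "age (after birth)", "age (in years)",
    "age (y)", "age (year)", "age (years)", "Age (years)", "Age (Years)",
    "age (yr)", "age (yr-old)", "age (yrs)", "Age (yrs)", "age [y]",
    "age [year]", "age [years]", "age in years", "age of patient",
    "Age of patient", "age of subjects", "age(years)", "Age(years)",
    "Age(yrs.)", "Age, year", "age, years", "age, yrs", "age.year",
    "age_years",
]

# Hypothetical stop list: unit and filler words that this example discards.
STOP_WORDS = {
    "in", "of", "y", "yr", "yrs", "year", "years", "old",
    "after", "birth", "patient", "subjects",
}


def normalize_field_name(raw: str) -> str:
    """Collapse a free-text field name to a crude canonical key."""
    # Lower-case, then treat every run of non-letters as a word separator.
    words = re.sub(r"[^a-z]+", " ", raw.lower()).split()
    # Keep only the words that are not units or filler.
    return "_".join(word for word in words if word not in STOP_WORDS)


canonical_keys = {normalize_field_name(name) for name in AGE_VARIANTS}
print(canonical_keys)
```

The stop list is brittle and throws away distinctions such as "after birth", which is why fixing field names at submission time is preferable to repairing them afterwards.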

Our Solution: CEDAR - A Metadata Ecosystem
- Overcome the impediments to creating high-quality metadata
- Facilitate metadata creation, acquisition, use, evaluation, and refinement
- Key goal: create a sharable metadata exchange format (a template model) for publishing, searching, and exchanging metadata

CEDAR Template Model Goals
- Must describe the composite structure of templates
- Implemented using standard formats
- Express semantics
- Metadata instances:
  - Linked to controlled terms
  - Easily serializable
  - Easily validated
  - Easily indexed
  - Interchange with RDF
  - Highly readable
- Produced/consumed via REST APIs and usable in JavaScript front ends
- Meets FAIR goals, both in the model and in the standards used to implement it

Using JSON Schema and JSON-LD for CEDAR Template Model
[Diagram labels: JSON Schema + JSON-LD; JSON-LD]

What is JSON Schema?
- Technology for describing and validating the structure of JSON documents
- Provides a structural description of any JSON document
- JSON documents that are specified with JSON Schema can be structurally validated against their associated schemas
- Analogous to XML Schema

What is JSON-LD?
- A lightweight syntax to serialize Linked Data in JSON
- Allows existing JSON to be interpreted as Linked Data with minimal changes
- JSON-LD is primarily intended to be a way to:
  - use Linked Data in Web-based programming environments
  - build interoperable Web services
  - store Linked Data in JSON-based storage engines
- Core contribution: add semantics to JSON documents
- W3C Recommendation: https://www.w3.org/TR/json-ld/
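As a rough illustration of "minimal changes", the sketch below turns an ordinary JSON object into a JSON-LD document by adding a single `@context` entry. The helper name `as_json_ld` and the sample record are assumptions made for this example; the two IRIs are the schema.org mappings that appear later in this deck.

```python
import json

# Existing JSON, as an application might already produce it.
plain_record = {"name": "Stanford", "zip": "94305"}

# A context that maps each key to an IRI (mappings borrowed from later slides).
context = {
    "name": "https://schema.org/name",
    "zip": "https://schema.org/postalCode",
}


def as_json_ld(document: dict, context: dict) -> dict:
    """Return a copy of `document` with an @context entry added.

    The original keys and values are left untouched (hypothetical helper).
    """
    return {"@context": context, **document}


linked_record = as_json_ld(plain_record, context)
print(json.dumps(linked_record, indent=2))
```

Code that only understands plain JSON can keep reading `name` and `zip` as before, while a JSON-LD processor can use `@context` to interpret the same keys as IRIs.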

Using JSON Schema to Define Template Structure
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "@type": "https://repo.metadatacenter.org/core/Template",
  "@id": "https://repo.metadatacenter.org/templates/434334",
  "title": "Study",
  "description": "Study template",
  "type": "object",
  "_ui": {...},
  "properties": {
    "title": {...},
    "description": {...},
    "principalInvestigator": {...}
  },
  "required": ["title", "description", "principalInvestigator"],
  "additionalProperties": false
}
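The structural constraints in this template can be sketched with a few lines of Python. This is not a JSON Schema validator: the hypothetical function `check_structure` handles only the three keywords used above (`type: object`, `required`, `additionalProperties: false`), and the empty sub-schemas stand in for the parts the slide elides with `{...}`.

```python
# Simplified stand-in for the Study template; elided sub-schemas are left empty.
STUDY_TEMPLATE = {
    "type": "object",
    "properties": {
        "title": {},
        "description": {},
        "principalInvestigator": {},
    },
    "required": ["title", "description", "principalInvestigator"],
    "additionalProperties": False,
}


def check_structure(instance, schema):
    """Return a list of structural problems (an empty list means none found)."""
    # This sketch only supports object schemas.
    if not isinstance(instance, dict):
        return ["instance is not an object"]
    problems = []
    # Every name listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in instance:
            problems.append(f"missing required property: {name}")
    # With additionalProperties false, only declared properties are allowed.
    if schema.get("additionalProperties") is False:
        declared = schema.get("properties", {})
        for name in instance:
            if name not in declared:
                problems.append(f"unexpected property: {name}")
    return problems


good = {
    "title": {"@value": "Immune biomarkers study"},
    "description": {"@value": "Immune biomarkers"},
    "principalInvestigator": {"name": {"@value": "Dr. P.I"}},
}
bad = {"title": {"@value": "Immune biomarkers study"}, "pi": {}}

print(check_structure(good, STUDY_TEMPLATE))
print(check_structure(bad, STUDY_TEMPLATE))
```

A real deployment would use a complete JSON Schema validator rather than hand-written checks; the point here is only to show what "structurally validated" means for this template.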

Using JSON-LD to add Semantics to Metadata Instances
{
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}

Using JSON-LD to add Semantics to Metadata Instances - II
{
  "@type": "http://semantic-dicom.org/dcm#Study",
  "@id": "https://repo.metadatacenter.org/template_instances/55417",
  "@context": {
    "title": "https://schema.org/title",
    "name": "https://schema.org/name",
    "description": "https://schema.org/description",
    "zip": "https://schema.org/postalCode",
    "principalInvestigator": "https://myschema.org/property/hasPI",
    "institution": "https://myschema.org/property/hasInstitution"
  },
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "@type": "https://schema.org/Person",
    "@id": "https://repo.metadatacenter.org/template_elements/557",
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "@type": "https://schema.org/Organization",
      "@id": "https://repo.metadatacenter.org/template_elements/37",
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}

CEDAR Metadata Instances can be transformed to an RDF Graph
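A minimal sketch of that transformation, assuming a trimmed-down instance in which every node carries an `@id` and every literal is wrapped in `@value`. The function `to_triples` is hypothetical and covers only that shape; a real conversion would go through a JSON-LD processor, which also handles blank nodes, datatypes, and the rest of the specification.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Trimmed-down instance for illustration (title plus one nested node).
instance = {
    "@id": "https://repo.metadatacenter.org/template_instances/55417",
    "@type": "http://semantic-dicom.org/dcm#Study",
    "@context": {
        "title": "https://schema.org/title",
        "name": "https://schema.org/name",
        "zip": "https://schema.org/postalCode",
        "institution": "https://myschema.org/property/hasInstitution",
    },
    "title": {"@value": "Immune biomarkers study"},
    "institution": {
        "@id": "https://repo.metadatacenter.org/template_elements/37",
        "@type": "https://schema.org/Organization",
        "name": {"@value": "Stanford"},
        "zip": {"@value": "94305"},
    },
}


def to_triples(node, context):
    """Return (subject, predicate, object) tuples for a node with an @id."""
    subject = node["@id"]
    triples = []
    if "@type" in node:
        triples.append((subject, RDF_TYPE, node["@type"]))
    for key, value in node.items():
        if key.startswith("@"):
            continue  # keywords such as @id, @type, @context are not properties
        predicate = context[key]  # expand the short key to its IRI
        if "@value" in value:
            # Literal value: the object is the wrapped value itself.
            triples.append((subject, predicate, value["@value"]))
        else:
            # Nested node: link to its @id, then describe it recursively.
            triples.append((subject, predicate, value["@id"]))
            triples.extend(to_triples(value, context))
    return triples


triples = to_triples(instance, instance["@context"])
for triple in triples:
    print(triple)
```

Each tuple corresponds to one edge of the RDF graph: the two `@id` values become nodes, and the context supplies the IRI used as the edge label.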

Model drives CEDAR Workbench
[Diagram labels: CEDAR Template Model, Controlled terminologies]

Template Designer provides Template Creation
Basic Template Designer screen showing a Study template being created, with ontology terms specified as constraints for the Disease field

Metadata Editor automatically generates an Acquisition Interface

Metadata Editor Adds Semantics
Basic Metadata Editor screen showing a Study instance being populated, with the Disease controlled-term field being auto-completed

Initial Results
- Public alpha release in September 2016
- Represented all public metadata in ImmPort repository (146 studies)
- Represented an array of public ISA-created biomedical studies (~300)
- Represented 60k ISO 11179-based Common Data Elements from NCI
- Currently working with Stanford Digital Repository and several research groups

Summary
- We have developed a standards-based template model for representing, publishing, and sharing templates and metadata
- Provides strong interoperation with Linked Open Data
- Metadata easy to create/consume using off-the-shelf tools
- Very easy to work with using CEDAR tools

CEDAR Resources
- Web site: http://metadatacenter.org
- Workbench: https://cedar.metadatacenter.net
- GitHub: https://metadatacenter.github.io