FAIR Sample and Data Access

FAIR Sample and Data Access David van Enckevort david.van.enckevort@umcg.nl ISBER 2017

Introduction: Project Manager, Genomics Coordination Centre, Dept. of Genetics. Member of the group led by Prof. M.A. Swertz. My focus: sharing of biobank data and material.

Outline: FAIR Principles; Making your data and samples FAIR; MOLGENIS: how our tools can help you; Examples of FAIR samples and data.

FAIR Principles: make resources sustainable for reuse. Findability, Accessibility, Interoperability, Reusability. doi: 10.1038/sdata.2016.18

Findable: Resources are assigned a globally unique and persistent identifier. Resources are described with rich metadata. Metadata clearly and explicitly include the identifier of the resource they describe. Resources can be found through other systems.

Accessible: Resources are retrievable by their identifier using a standardized communications protocol. The protocol is open, free, and universally implementable. The protocol allows for an authentication and authorization procedure, where necessary. Metadata are accessible, even when the data are no longer available.

Interoperable: Use a formal, accessible, shared, and broadly applicable language for knowledge representation. Use vocabularies that follow FAIR principles. Include qualified references to other (meta)data.

Reusable: Richly described with a plurality of accurate and relevant attributes. Released with a clear and accessible usage license. Associated with detailed provenance. Meets domain-relevant community standards.

What is FAIR not? A standard; a specific technology or format; a data management system or analysis tool; the same as Open Access; trivial to implement.

Make your samples & data FAIR, step 1: Make knowledge explicit (do not assume people will know things that are obvious to you). Include sufficient metadata (units, SOPs, access conditions, consent). Provide the raw data (e.g. when you need BMI, also provide height and weight).
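The BMI remark can be made concrete with a short sketch. It assumes height is stored in metres and weight in kilograms; the record layout and field names are hypothetical and do not come from the slides.

```python
# Hypothetical record: field names and values are illustrative only.
record = {"height_m": 1.80, "weight_kg": 81.0}


def bmi(height_m: float, weight_kg: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Keep the raw measurements and derive BMI on demand, so that others can
# recompute it or derive different measures from the same data.
record_bmi = bmi(record["height_m"], record["weight_kg"])
print(round(record_bmi, 1))
```

If only the derived BMI were published, height and weight could not be recovered from it, which is why the raw values belong in the data set.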

Make your samples & data FAIR, step 2: Use standard information models. Encode data using ontologies.

Standard information models: Define the minimal information you should capture to make your data (re)usable. Provide structure to the information. Common information models: MIABIS, MIAPE, MIAME. https://biosharing.org/

Ontologies: Provide a well-defined and unambiguous meaning to a term. Provide relations between terms, e.g. ‘Breast Cancer’ is a ‘Cancer’ is a ‘Disease’. Common ontologies include: OBIB, HPO, OMIM. http://bioportal.bioontology.org/
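The is-a chain on this slide can be sketched as a tiny lookup structure. This is only an illustration of how such relations allow a query for a broad term to match narrower ones; the dictionary and function names are hypothetical, and real ontologies such as those on BioPortal are far richer (multiple parents, identifiers, synonyms).

```python
# Toy is-a hierarchy taken from the slide example; one parent per term.
IS_A = {"Breast Cancer": "Cancer", "Cancer": "Disease"}


def ancestors(term: str) -> list[str]:
    """Follow is-a links upward and return all broader terms in order."""
    result = []
    while term in IS_A:
        term = IS_A[term]
        result.append(term)
    return result


def is_a(term: str, broader: str) -> bool:
    """True if `term` equals `broader` or is a descendant of it."""
    return term == broader or broader in ancestors(term)


print(ancestors("Breast Cancer"))
print(is_a("Breast Cancer", "Disease"))
```

With such relations, a search for samples annotated with ‘Cancer’ can also return samples annotated with ‘Breast Cancer’.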

The information model defines what information to include. Ontologies define acceptable values. Together they improve interoperability and reusability.
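The division of labour between an information model and a controlled vocabulary can be sketched as a small validation step. The required fields and allowed values below are assumptions made for illustration; they are not the MIABIS model or any real ontology.

```python
# Hypothetical minimal information model: which fields must be present.
REQUIRED_FIELDS = {"sample_id", "material_type", "sex"}

# Hypothetical controlled vocabularies: which values are acceptable.
ALLOWED_VALUES = {
    "material_type": {"DNA", "Leukocyte", "Lymphoblast"},
    "sex": {"F", "M"},
}


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name in sorted(REQUIRED_FIELDS - set(record)):
        problems.append(f"missing field: {name}")
    for name, allowed in ALLOWED_VALUES.items():
        if name in record and record[name] not in allowed:
            problems.append(f"invalid value for {name}: {record[name]!r}")
    return problems


print(validate({"sample_id": "1", "material_type": "DNA", "sex": "F"}))
print(validate({"sample_id": "D7", "material_type": "dna"}))
```

The first check corresponds to the information model (what to include), the second to the vocabulary (which values are acceptable).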

Make your samples & data FAIR, step 3: Publish metadata about your samples and data collections. Make it available to others.

MOLGENIS: how we help you to make your data FAIR

Platform for scientific data (http://www.molgenis.org/)
Data request: find and request (biobank) data sets and items
Genome browser: data sharing and integration, DAS protocol
Upload format: import data and metadata using EMX format
Model registry: metadata registry of models for biobanks and molecular data
Annotators: data integration for diagnostics and personalized medicine
Compute: large-scale computation on computational clusters, grids and clouds
Connect: harmonisation tools
RNA pipeline: NGS data quantitation, structure, eQTL, allele-specific expression
Impute pipeline: GWAS harmonization and imputation
R statistics: use R data API to up/download data and integrate graphics
Data explorer: filter and download for further analysis
DNA pipeline: NGS data alignment, SNV/SV calling, QC, NIPT

MOLGENIS/connect toolbox: a ‘FAIRifier’ system for retrospective interoperability of data. BiobankConnect: make data attributes interoperable. SORTA: make data values interoperable.

Problem solved with MOLGENIS/connect: different data at the source

Source 1:
Sample  Material     Sex  Clinical Diagnosis
1       DNA          F    Ring chromosome 14
2       Leukocyte
3
4       Lymphoblast

Source 2 (Dutch):
ID   Type  Diagnose  Geslacht
D7   dna   Ring 14   Vrouw
D8   wbc             Man
D9
D10  lbc

Code data to a common standard: identifiers, ontologies, codes.
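Recoding local values to a common standard can be sketched as a lookup. The correspondences below (for example that ‘wbc’ stands for leukocyte and ‘lbc’ for lymphoblast, and the target codes ‘F’ and ‘M’) are assumptions read off the example tables on the previous slide; they are not an official code list and not the MOLGENIS implementation.

```python
# Assumed correspondences between local values and common terms.
MATERIAL_CODES = {"dna": "DNA", "wbc": "Leukocyte", "lbc": "Lymphoblast"}
SEX_CODES = {"vrouw": "F", "man": "M", "f": "F", "m": "M"}


def recode(value, codes):
    """Return the common term for a local value, or None if unmapped or empty."""
    if value is None:
        return None
    return codes.get(value.strip().lower())


# One row from the second (Dutch) source table.
source_row = {"ID": "D7", "Type": "dna", "Diagnose": "Ring 14", "Geslacht": "Vrouw"}

harmonised = {
    "sample": source_row["ID"],
    "material": recode(source_row["Type"], MATERIAL_CODES),
    "sex": recode(source_row["Geslacht"], SEX_CODES),
}
print(harmonised)
```

Values that return None are the ones that still need a human decision, which is where curation tooling comes in.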

SORTA: make data values interoperable

SORTA workflow: Upload data using Excel. SORTA shortlists candidate codes (lexical matching, semantic matching). A human expert decides (and so trains SORTA). SORTA automatically recodes when the matching score is high (e.g. 80%). Use an n-gram matching threshold (e.g. 80%).
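The shortlist-then-threshold idea can be sketched with a simple character-bigram similarity. This is not SORTA's actual algorithm; the Dice coefficient, the function names and the candidate list are assumptions chosen to illustrate n-gram matching with an 80% cut-off.

```python
def bigrams(text: str) -> set[str]:
    """Character bigrams of the lower-cased text with whitespace normalised."""
    s = " ".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def similarity(a: str, b: str) -> float:
    """Dice coefficient on bigram sets, as a percentage from 0 to 100."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 100.0 * 2 * len(x & y) / (len(x) + len(y))


def shortlist(value: str, candidates: list[str], threshold: float = 80.0):
    """Rank candidates; auto-accept the best one only at or above the threshold."""
    ranked = sorted(((similarity(value, c), c) for c in candidates), reverse=True)
    if not ranked:
        return None, []
    best_score, best_term = ranked[0]
    auto = best_term if best_score >= threshold else None
    return auto, ranked


terms = ["Ring chromosome 14", "Breast cancer"]  # hypothetical candidate codes
print(shortlist("ring chromosome 14", terms)[0])
print(shortlist("Ring 14", terms)[0])
```

A value that only differs in case is recoded automatically, while a looser match such as ‘Ring 14’ stays below the threshold and is left on the ranked shortlist for the expert.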

Expert curation of the matches: original text; candidate standardized terms; confidence scores; select the right match.

BiobankConnect: make data attributes interoperable

Software generates mappings: standard model to conform to; mapping rules for your data; you can map multiple datasets; colour indicates the state of the mapping.

Curation of mappings: create rules for conversion; on-the-fly validation; mark a mapping as ‘Curated’ or ‘To be discussed’.

Benefits: Tools reduce the burden of harmonising data. Expert curation provides high-quality data. Data become usable for pooling and aggregation.

FAIR Sample and Data: examples of how it solves problems

Pooling heterogeneous data. Standard variable wanted: ‘History of hypertension’. Source variables: ‘CM ever had high blood pressure’ (516 data items); ‘Are you taking medication for high blood pressure?’ (353 data items); ‘Hypertension’ (6401 data items); ‘Increased in blood pressure’ (224 data items); ‘Have you ever been told that you have elevated or high blood pressure?’ (75 data items). PREVEND.

How FAIR helps: Common models standardize the data that you capture. Ontologies standardize the way you express the values.

Finding samples or data: biobank catalogues / directories
Descriptive data: study name, contact info
Aggregated data: age high, age low, sex, etc.
Sample and donor data: sampled date, storage temp., material type, disease, age

BBMRI-ERIC Directory: World's largest biobank directory. Listing over 1000 collections. Millions of samples. Federation of BBMRI National Nodes. Part of the Common Services for IT. Negotiator. Locator. https://directory.bbmri-eric.eu/

Directory federation model (diagram labels: push, BBMRI-ERIC directory, pull, biobank network). Biobanks provide data to the national node. The BBMRI-ERIC directory receives data from the BBMRI National Nodes.
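The two-level federation can be sketched as follows. This is a toy model, not the BBMRI-ERIC Directory software: the class, function and field names and the identifiers are hypothetical, and the sketch does not try to say which of the two steps is a push and which a pull.

```python
from dataclasses import dataclass, field


@dataclass
class NationalNode:
    """A national node that collects collection descriptions from its biobanks."""
    country: str
    collections: list = field(default_factory=list)

    def register(self, collection: dict) -> None:
        """A biobank provides a collection description to its national node."""
        self.collections.append(collection)


def aggregate(nodes: list) -> list:
    """Build the central directory: every collection, tagged with its node."""
    return [dict(c, node=n.country) for n in nodes for c in n.collections]


# Hypothetical identifiers and values, for illustration only.
nl = NationalNode("NL")
nl.register({"id": "example:NL:1", "material": "DNA"})
de = NationalNode("DE")
de.register({"id": "example:DE:1", "material": "Leukocyte"})

directory = aggregate([nl, de])
dna_collections = [c["id"] for c in directory if c["material"] == "DNA"]
print(dna_collections)
```

Because every node describes its collections with the same fields and terms, the central directory can be searched with a single query.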

All biobanks describe their samples with the same metadata https://directory.bbmri-eric.eu/

Enabling structured search using common terms

Send a request to the biobank for access

How FAIR helps: Common structure and protocols allow the Directory to aggregate data from the national nodes. Common terms allow researchers to find the right data and samples. Specifying access conditions gives insight into the availability of samples and data. Unique identifiers facilitate requests for access.

Summary: FAIR enables better use of biobank samples and data by making them findable and accessible and by promoting reuse. We offer tools to help you make your data FAIR.

To learn more
Software: MOLGENIS - http://www.molgenis.org/
Reading: FAIR principles - DOI: 10.1038/sdata.2016.18
Reading: BiobankConnect - DOI: 10.1136/amiajnl-2013-002577
Reading: SORTA - DOI: 10.1093/database/bav089
Reading: MOLGENIS/connect - DOI: 10.1093/bioinformatics/btw155
Movies: Upload - https://www.youtube.com/watch?v=VSZNXdaGIl4
Movies: SORTA - https://www.youtube.com/watch?v=Wq81S-jR3l8
Movies: BiobankConnect - https://www.youtube.com/watch?v=Gc1VKRCmTWU

Thank you for your attention!