Download presentation
Presentation is loading. Please wait.
1
FAIR Sample and Data Access
David van Enckevort ISBER 2017
2
Introduction Project Manager
Genomics Coordination Centre at Dept. Genetics Member of the group lead by Prof. M.A. Swertz My focus: sharing of biobank data and material
3
Outline FAIR Principles Making your data and samples FAIR
MOLGENIS: how our tools can help you Examples of FAIR samples and data
4
Make resources sustainable for reuse
FAIR Principles Make resources sustainable for reuse Findability Accessibility Interoperability Reusability doi: /sdata
5
Findable Resources are assigned a globally unique and persistent identifier Resources are described with rich metadata Metadata clearly and explicitly include the identifier of the resource it describes Resource can by found through other systems
6
Accessible Retrievable by their identifier using a standardized communications protocol The protocol is open, free, and universally implementable The protocol allows for an authentication and authorization procedure, where necessary Metadata are accessible, even when the data are no longer available
7
Interoperable Use a formal, accessible, shared, and broadly applicable language for knowledge representation. Use vocabularies that follow FAIR principles Include qualified references to other (meta)data
8
Reusable Richly described with a plurality of accurate and relevant attributes Released with a clear and accessible usage license Associated with detailed provenance Meet domain-relevant community standards
9
What is FAIR not? A standard A specific technology or format
A data management system or analysis tool The same as Open Access Trivial to implement 11 November November 2018
10
Make your samples & data FAIR
Step 1 Make knowledge explicit (do not assume people will know things that are obvious to you) Include sufficient metadata (units, SOPs, access conditions, consent) Provide the raw data (e.g. when you need BMI also provide length and weight)
11
Make your samples & data FAIR
Step 2 Use standards information models Encode data using ontologies
12
Standard information models
Define the minimal information you should capture to make your data (re)usable Provide structure to the information Common information models: MIABIS, MIAPE, MIAME
13
Ontologies http://bioportal.bioontology.org/
Provide a well defined and unambiguous meaning to a term Provide relations between terms, e.g. ’Breast Cancer’ is a ‘Cancer’ is a ‘Disease’ Common ontologies include: OBIB, HPO, OMIM
14
Information model defines what information to include
Ontologies define acceptable values Improve interoperability and reusability
15
Make your samples & data FAIR
Step 3 Publish metadata about your samples and data collections Make it available to others
16
How we help you to make your data FAIR
MOLGENIS How we help you to make your data FAIR
17
Platform for scientific data
Data request Find and request (biobank) data sets and items Genome browser Data sharing and integration DAS protocol Upload format Import data and meta data using EMX format Model registry Meta-data registry of models for biobanks and molecular data Annotators Data integration for diagnostics and personalized medicine Compute Large scale computation on computational clusters, grids and clouds Connect Harmonisation tools RNA pipeline NGS data quantitation, structure, eQTL allele specific expression Impute pipeline GWAS harmonization and imputation R statistics Use R data API to up/download data and integrate graphics Data explorer Filter and download for further analysis DNA pipeline NGS data alignment, SNV/SV calling, QC, NIPT
18
MOLGENIS/connect toolbox
‘FAIRifier’ system for retrospective interoperability of data Biobank Connect Make data attributes interoperable <ID> SORTA Make data values interoperable
19
Problem solved with MOLGENIS/connect
Different data at the source Sample Material Sex Clinical Diagnosis 1 DNA F Ring chromosome 14 2 Leukocyte 3 4 Lymphoblast ID Type Diagnose Geslacht D7 dna Ring 14 Vrouw D8 wbc Man D9 D10 lbc
20
Code data to a common standard Identifiers, ontologies, codes
21
Make data values interoperable
SORTA <ID> SORTA Make data values interoperable
22
SORTA Workflow Upload data using Excel
SORTA shortlists candidate codes Lexical matching Semantic matching Human expert decides (and so trains SORTA) SORTA automatically recodes when high matching score (e.g. 80%) Use n-gram matching treshold (e.g 80%)
23
Expert curation of the matches
24
Expert curation of the matches
Original text
25
Expert curation of the matches
Candidate standardized terms
26
Expert curation of the matches
Confidence scores
27
Expert curation of the matches
Select the right match
28
Make data attributes interoperable
BiobankConnect <ID> Biobank Connect Make data attributes interoperable
29
Software generates mappings
Standard model to conform to
30
Software generates mappings
Mapping rules for your data
31
Software generates mappings
Mapping rules for your data You can map multiple datasets
32
Software generates mappings
Colour indicates state of the mapping
33
Curation of mappings
34
Create rules for conversion
Curation of mappings Create rules for conversion
35
Curation of mappings On the fly validation
36
Curation of mappings Mark mapping as: Curated To be discussed
37
Benefits Tools reduce the burden of harmonising data
Allowing expert curation to provide high quality data Make data usable for pooling and aggregation
38
Examples how it solves problems
FAIR Sample and Data Examples how it solves problems
39
Pooling heterogeneous data
CM ever had high blood pressure 516 data items Are you taking medication for high blood pressure? 353 data items Standard variable wanted: ‘History of hypertension’ Hypertension 6401 data items Increased in blood pressure 224 data items Have you ever been told that you have elevated or high blood pressure? 75 data items PREVEND
40
How FAIR helps Common models standardize the data that you capture
Ontologies standardize the way you express the values
41
Finding samples or data
Descriptive data Aggregated data Sample and donor data Study name Contact info Age high, Age low, Sex, etc. Sampled date, storage temp., Material type, Disease, Age Biobank catalogues / directories
42
BBMRI-ERIC Directory https://directory.bbmri-eric.eu/
World largest biobank directory Listing over 1000 collections Millions of samples Federation of BBMRI National Nodes Part of the Common Services for IT Negotiator Locator
43
Directory federation model
push BBMRI-ERIC directory pull biobank network Biobanks provide data to the national node BBMRI-ERIC directory receives data from the BBMRI National Nodes
44
All biobanks describe their samples with the same metadata
45
Enabling structured search using common terms
46
Send a request to the biobank for access
47
How FAIR helps Common structure and protocols allows the Directory to aggregate data from the national nodes Common terms allows researchers to find the right data and samples Specifying access conditions gives insight into the availability of samples and data Unique identifiers facilitate requests for access
48
Summary FAIR enables better use of biobank samples and data
Making them findable and accessible and promote reuse We offer tools to help you make your data FAIR
49
To learn more Software MOLGENIS - http://www.molgenis.org/ Reading
FAIR principles - DOI: /sdata BiobankConnect - DOI: /amiajnl SORTA - DOI: /database/bav089 MOLGENIS/connect - DOI: /bioinformatics/btw155 Movies Upload - SORTA - BiobankConnect -
50
Thank you for your attention!
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.