Presentation is loading. Please wait.

Presentation is loading. Please wait.

FAIR Sample and Data Access

Similar presentations


Presentation on theme: "FAIR Sample and Data Access"— Presentation transcript:

1 FAIR Sample and Data Access
David van Enckevort ISBER 2017

2 Introduction Project Manager
Genomics Coordination Centre at Dept. Genetics Member of the group lead by Prof. M.A. Swertz My focus: sharing of biobank data and material

3 Outline FAIR Principles Making your data and samples FAIR
MOLGENIS: how our tools can help you Examples of FAIR samples and data

4 Make resources sustainable for reuse
FAIR Principles Make resources sustainable for reuse Findability Accessibility Interoperability Reusability doi:  /sdata

5 Findable Resources are assigned a globally unique and persistent identifier Resources are described with rich metadata Metadata clearly and explicitly include the identifier of the resource it describes Resource can by found through other systems

6 Accessible Retrievable by their identifier using a standardized communications protocol The protocol is open, free, and universally implementable The protocol allows for an authentication and authorization procedure, where necessary Metadata are accessible, even when the data are no longer available

7 Interoperable Use a formal, accessible, shared, and broadly applicable language for knowledge representation. Use vocabularies that follow FAIR principles Include qualified references to other (meta)data

8 Reusable Richly described with a plurality of accurate and relevant attributes Released with a clear and accessible usage license Associated with detailed provenance Meet domain-relevant community standards

9 What is FAIR not? A standard A specific technology or format
A data management system or analysis tool The same as Open Access Trivial to implement 11 November November 2018

10 Make your samples & data FAIR
Step 1 Make knowledge explicit (do not assume people will know things that are obvious to you) Include sufficient metadata (units, SOPs, access conditions, consent) Provide the raw data (e.g. when you need BMI also provide length and weight)

11 Make your samples & data FAIR
Step 2 Use standards information models Encode data using ontologies

12 Standard information models
Define the minimal information you should capture to make your data (re)usable Provide structure to the information Common information models: MIABIS, MIAPE, MIAME

13 Ontologies http://bioportal.bioontology.org/
Provide a well defined and unambiguous meaning to a term Provide relations between terms, e.g. ’Breast Cancer’ is a ‘Cancer’ is a ‘Disease’ Common ontologies include: OBIB, HPO, OMIM

14 Information model defines what information to include
Ontologies define acceptable values Improve interoperability and reusability

15 Make your samples & data FAIR
Step 3 Publish metadata about your samples and data collections Make it available to others

16 How we help you to make your data FAIR
MOLGENIS How we help you to make your data FAIR

17 Platform for scientific data
Data request Find and request (biobank) data sets and items Genome browser Data sharing and integration DAS protocol Upload format Import data and meta data using EMX format Model registry Meta-data registry of models for biobanks and molecular data Annotators Data integration for diagnostics and personalized medicine Compute Large scale computation on computational clusters, grids and clouds Connect Harmonisation tools RNA pipeline NGS data quantitation, structure, eQTL allele specific expression Impute pipeline GWAS harmonization and imputation R statistics Use R data API to up/download data and integrate graphics Data explorer Filter and download for further analysis DNA pipeline NGS data alignment, SNV/SV calling, QC, NIPT

18 MOLGENIS/connect toolbox
‘FAIRifier’ system for retrospective interoperability of data Biobank Connect Make data attributes interoperable <ID> SORTA Make data values interoperable

19 Problem solved with MOLGENIS/connect
Different data at the source Sample Material Sex Clinical Diagnosis 1 DNA F Ring chromosome 14 2 Leukocyte 3 4 Lymphoblast ID Type Diagnose Geslacht D7 dna Ring 14 Vrouw D8 wbc Man D9 D10 lbc

20 Code data to a common standard Identifiers, ontologies, codes

21 Make data values interoperable
SORTA <ID> SORTA Make data values interoperable

22 SORTA Workflow Upload data using Excel
SORTA shortlists candidate codes Lexical matching Semantic matching Human expert decides (and so trains SORTA) SORTA automatically recodes when high matching score (e.g. 80%) Use n-gram matching treshold (e.g 80%)

23 Expert curation of the matches

24 Expert curation of the matches
Original text

25 Expert curation of the matches
Candidate standardized terms

26 Expert curation of the matches
Confidence scores

27 Expert curation of the matches
Select the right match

28 Make data attributes interoperable
BiobankConnect <ID> Biobank Connect Make data attributes interoperable

29 Software generates mappings
Standard model to conform to

30 Software generates mappings
Mapping rules for your data

31 Software generates mappings
Mapping rules for your data You can map multiple datasets

32 Software generates mappings
Colour indicates state of the mapping

33 Curation of mappings

34 Create rules for conversion
Curation of mappings Create rules for conversion

35 Curation of mappings On the fly validation

36 Curation of mappings Mark mapping as: Curated To be discussed

37 Benefits Tools reduce the burden of harmonising data
Allowing expert curation to provide high quality data Make data usable for pooling and aggregation

38 Examples how it solves problems
FAIR Sample and Data Examples how it solves problems

39 Pooling heterogeneous data
CM ever had high blood pressure 516 data items Are you taking medication for high blood pressure? 353 data items Standard variable wanted: ‘History of hypertension’ Hypertension 6401 data items Increased in blood pressure 224 data items Have you ever been told that you have elevated or high blood pressure? 75 data items PREVEND

40 How FAIR helps Common models standardize the data that you capture
Ontologies standardize the way you express the values

41 Finding samples or data
Descriptive data Aggregated data Sample and donor data Study name Contact info Age high, Age low, Sex, etc. Sampled date, storage temp., Material type, Disease, Age Biobank catalogues / directories

42 BBMRI-ERIC Directory https://directory.bbmri-eric.eu/
World largest biobank directory Listing over 1000 collections Millions of samples Federation of BBMRI National Nodes Part of the Common Services for IT Negotiator Locator

43 Directory federation model
push BBMRI-ERIC directory pull biobank network Biobanks provide data to the national node BBMRI-ERIC directory receives data from the BBMRI National Nodes

44 All biobanks describe their samples with the same metadata

45 Enabling structured search using common terms

46 Send a request to the biobank for access

47 How FAIR helps Common structure and protocols allows the Directory to aggregate data from the national nodes Common terms allows researchers to find the right data and samples Specifying access conditions gives insight into the availability of samples and data Unique identifiers facilitate requests for access

48 Summary FAIR enables better use of biobank samples and data
Making them findable and accessible and promote reuse We offer tools to help you make your data FAIR

49 To learn more Software MOLGENIS - http://www.molgenis.org/ Reading
FAIR principles - DOI:  /sdata BiobankConnect - DOI:  /amiajnl SORTA - DOI:  /database/bav089 MOLGENIS/connect - DOI:  /bioinformatics/btw155 Movies Upload - SORTA - BiobankConnect -

50 Thank you for your attention!


Download ppt "FAIR Sample and Data Access"

Similar presentations


Ads by Google