For EGI/EUDAT EMBL/ELIXIR use-cases Tony Wildish

1 for EGI/EUDAT EMBL/ELIXIR use-cases Tony Wildish

2 What is EMBL-EBI? Europe’s home for biological data services, research and training. A trusted data provider for the life sciences. Part of the European Molecular Biology Laboratory, an intergovernmental research organisation. International: 570 members of staff from 57 nations. Home of the ELIXIR Technical Hub.

3 A distributed data infrastructure for Europe. EMBL-EBI is a founding member of ELIXIR, Europe’s distributed research infrastructure for biological information. Mission: to support life-science research and its translation to medicine, the environment, the bioindustries and society. ELIXIR Nodes represent centres of excellence throughout Europe.

4 Data resources available from EMBL-EBI
Genes, genomes & variation: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal
Gene, protein & metabolite expression: RNAcentral, ArrayExpress, Expression Atlas, MetaboLights, PRIDE
Protein sequences, families & motifs: InterPro, Pfam, UniProt
Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
Chemical biology: ChEMBL, ChEBI
Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
Systems: BioModels, Enzyme Portal, BioSamples
Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology

5 ELIXIR: driven by 4 scientific use-cases: Marine Metagenomics; Genomic & Phenotypic Data for Crop and Forest Plants; Rare Diseases; Human Genetic Data. The pilot will not start with human data, because of security constraints. All scientific use-cases require private or public datasets to be replicated, either from the source or between analysis sites.

6 Use-case characteristics. Data volumes from 10s to several 100s of GB per month, with human data likely to be the largest in volume and traffic. Replication between a handful of sites. Periodic updates to reference datasets, hence a need for metadata handling to describe datasets consistently. Smaller subsets downloaded for individual analyses. End-users widely distributed.
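To put these volumes in perspective, here is a back-of-the-envelope estimate of the sustained rate needed to move 500 GB in a month. The 500 GB figure is an illustrative assumption within the "several 100s of GB" range, and the transfer is assumed to be spread evenly over a 30-day month:

```latex
\frac{500 \times 10^{9}\,\text{B} \times 8\,\text{bit/B}}{30 \times 86\,400\,\text{s}}
  = \frac{4 \times 10^{12}\,\text{bit}}{2.592 \times 10^{6}\,\text{s}}
  \approx 1.5\,\text{Mbit/s}
```

This is an estimate, not a measurement; real transfers of periodic reference updates are bursty, so peak rates will be higher than this average.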

7 Use-case characteristics. Metadata replication is not a target for the pilot: the metadata is complex, domain-specific and well established, and there is no clear gain in replicating it at this time. Decouple dataset-description metadata from file-location and transfer metadata, so that file distribution can be explored and understood without digging into the details of what the data is about.
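One way to picture this decoupling is a file-location catalog that knows only opaque dataset-version IDs, sites and file replicas, and never what the data describes. The following Python is a minimal sketch under that assumption; `FileReplica`, `ReplicaCatalog`, the example dataset ID and the checksum value are hypothetical names for illustration, not part of the prototype.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileReplica:
    """One file of a dataset version; carries no scientific metadata (hypothetical)."""
    path: str        # path relative to the dataset root
    size_bytes: int
    checksum: str    # digest assumed to be supplied by the source archive


@dataclass
class ReplicaCatalog:
    """Hypothetical file-location catalog keyed only by opaque dataset-version IDs.

    What an ID means stays in the source archive; this catalog only records
    where the files for that ID are held.
    """
    # opaque dataset-version ID -> site name -> replicas held at that site
    entries: dict[str, dict[str, set[FileReplica]]] = field(default_factory=dict)

    def register(self, dataset_id: str, site: str, replica: FileReplica) -> None:
        """Record that `site` holds `replica` for `dataset_id`."""
        self.entries.setdefault(dataset_id, {}).setdefault(site, set()).add(replica)

    def sites_holding(self, dataset_id: str) -> list[str]:
        """Sites holding at least one file of this dataset version, sorted by name."""
        return sorted(self.entries.get(dataset_id, {}))

    def total_bytes(self, dataset_id: str, site: str) -> int:
        """Total bytes of this dataset version held at one site."""
        return sum(r.size_bytes for r in self.entries.get(dataset_id, {}).get(site, set()))


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    replica = FileReplica("reads/sample1.fastq.gz", 1_000_000, "example-digest")
    catalog.register("ds-0001.v2", "EBI-Embassy", replica)
    catalog.register("ds-0001.v2", "CESNET", replica)
    print(catalog.sites_holding("ds-0001.v2"))  # ['CESNET', 'EBI-Embassy']
```

Because the catalog never inspects what a dataset contains, the domain-specific description metadata can stay in the EBI archives while file distribution is studied on its own.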

8 Use-case characteristics. Subscription-based model: datasets are subscribed to a destination, and new versions are distributed automatically as they become available. Need to understand the metadata requirements that make this possible. Need an opaque ID for data, shared between EBI and EUDAT/EGI, to identify dataset versions; rely on the EBI source archive to determine what the ID represents. The file-transfer system needs to handle overlapping datasets (partial updates to existing datasets).
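The subscription model and the overlapping-dataset requirement can be sketched together: when a new version of a subscribed dataset is published under an opaque ID, each destination is sent only the files it does not already hold with the same checksum. The following Python is a minimal sketch under several assumptions: `SubscriptionService` and its methods are hypothetical, each version is described by a manifest mapping relative path to checksum, paths are unique across datasets at a given destination, and every planned transfer is treated as having succeeded.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Manifest for one dataset version: relative file path -> checksum (assumed format).
Manifest = dict[str, str]


@dataclass
class SubscriptionService:
    """Hypothetical registry of dataset subscriptions and destination holdings."""
    subscriptions: dict[str, set[str]] = field(default_factory=dict)  # dataset -> destinations
    holdings: dict[str, Manifest] = field(default_factory=dict)       # destination -> files held
    latest: dict[str, str] = field(default_factory=dict)              # dataset -> opaque version ID

    def subscribe(self, dataset: str, destination: str) -> None:
        """Subscribe a destination so future versions of `dataset` are pushed to it."""
        self.subscriptions.setdefault(dataset, set()).add(destination)
        self.holdings.setdefault(destination, {})

    def publish(self, dataset: str, version_id: str, manifest: Manifest) -> dict[str, list[str]]:
        """Record a new version and return, per destination, the files it still needs."""
        self.latest[dataset] = version_id
        plan: dict[str, list[str]] = {}
        for dest in sorted(self.subscriptions.get(dataset, set())):
            held = self.holdings[dest]
            # Skip files already present with an identical checksum, whether unchanged
            # from the previous version or delivered earlier by an overlapping dataset.
            missing = sorted(p for p, digest in manifest.items() if held.get(p) != digest)
            plan[dest] = missing
            # Simplification: treat planned transfers as completed immediately.
            held.update({p: manifest[p] for p in missing})
        return plan
```

For example, after subscribing `CESNET` to a hypothetical `reference-genome` dataset and publishing `{"chr1.fa": "aaa", "chr2.fa": "bbb"}`, publishing a second version in which only `chr2.fa` changed yields `{'CESNET': ['chr2.fa']}`. A real system would update holdings only after the transfer service confirms completion.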

9 Initial prototype. Standalone prototype to investigate metadata issues first. Provide a flat list of files to transfer. Use globus-connect endpoints and the CLI to perform transfers, side-stepping the dependency on AAI for now; switch to using AAI (ELIXIR, EGI, EUDAT) as soon as possible. Currently works on EBI-Embassy, CESNET and Amazon. Integrate with the ELIXIR portal, allowing data discovery followed by subscription to ELIXIR/EGI/EUDAT destinations.
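The flat list of files to transfer can be produced directly from a dataset manifest. The following Python is a minimal sketch: `build_transfer_list` and the example paths are hypothetical, and it assumes the transfer tool accepts one quoted `SOURCE DESTINATION` path pair per line, which is the general shape of Globus CLI batch input; the exact syntax should be checked against the CLI version actually in use.

```python
import posixpath
import shlex
from collections.abc import Iterable


def build_transfer_list(files: Iterable[str], src_root: str, dst_root: str) -> str:
    """Return one 'SRC DST' line per file, sorted and de-duplicated (hypothetical helper).

    `files` are paths relative to the dataset root; directories are not expanded,
    so the result is a flat list of individual file transfers.
    """
    lines = []
    for rel in sorted(set(files)):
        # Refuse absolute paths and parent-directory hops so the destination
        # path cannot escape dst_root.
        if rel.startswith("/") or ".." in rel.split("/"):
            raise ValueError(f"unsafe relative path: {rel!r}")
        src = posixpath.join(src_root, rel)
        dst = posixpath.join(dst_root, rel)
        # shlex.quote protects paths containing spaces or shell metacharacters.
        lines.append(f"{shlex.quote(src)} {shlex.quote(dst)}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(build_transfer_list(["chr2.fa", "chr1.fa"], "/archive/ref/v2", "/data/ref"), end="")
    # /archive/ref/v2/chr1.fa /data/ref/chr1.fa
    # /archive/ref/v2/chr2.fa /data/ref/chr2.fa
```

Sorting and de-duplicating makes the list reproducible, which helps when comparing what was requested against what a destination reports as received.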

10 Summary. Initial pilot to investigate issues. Data-description metadata is out of scope for the pilot. File distribution based on AAI from multiple providers. Start with globus-connect for simplicity; move to GridFTP once AAI is in place. File-replica metadata to be handled by the prototype (TBD: how to do this, tools, technologies…). Integrate with the ELIXIR cloud portal (under development). Early days, lots to learn...

11 Questions?
