Integration of EGA secure data access into Galaxy


Youri Hoogstrate[1], Alexander Senf[2], Jochem Bijlard[3], Saskia Hiltemann[1], David van Enckevort[4], Chao Zhang[5], Remond Fijneman[6], Jan-Willem Boiten[7], Gerrit Meijer[6], Andrew Stubbs[1], Jordi Rambla[8], Dylan Spalding[2], Sanne Abeln[5]

[1]ErasmusMC Rotterdam (NL), [2]EMBL-EBI (UK), [3]The Hyve (NL), [4]UMC Groningen (NL), [5]VU University Medical Center (NL), [6]Netherlands Cancer Institute (NL), [7]Lygature (NL), [8]Centre for Genomic Regulation (ESP)

Bio-molecular high-throughput data is privacy sensitive and cannot easily be made accessible to the entire outside world. To manage access to the long-term archival of such data, the EGA project was initiated: it facilitates data access and management for funded projects after their completion, so that these data remain accessible. Strict protocols govern how information is managed, stored, transferred and distributed, and each data provider is responsible for ensuring that a Data Access Committee is in place to grant access to the data. Moreover, the transfer of privacy-sensitive data should be encrypted.

Entire IT-infrastructure

The aim of the CTMM-TraIT project is to set up a multi-domain IT infrastructure in which researchers can track, share and reproduce their entire study, including metadata on wet-lab experiments. To achieve this, CTMM-TraIT uses large, community-driven open-source software, including TranSMART and Galaxy.

EGA & Galaxy

In a collaboration between ELIXIR, TraIT and EGA, a full ecosystem was designed to connect the storage of raw experimental molecular profiling data with processed data and computational workflows (fig. 1). Part of this ecosystem is Galaxy, a popular and user-friendly bioinformatics analysis platform. It provides an intuitive user interface for molecular biologists and bioinformaticians to run and design workflows, to perform integrated analyses within the browser, and to share and communicate both results and methodologies.
By integrating EGA into Galaxy, a user can perform an entire analysis that includes (privacy-sensitive) data from EGA and make it available in a reproducible manner to other researchers.

Fig. 1. A flowchart of the ecosystem: an entire study is set up so that all molecular data, metadata and software are tracked and versioned, while access to personal data is provided in a secure and administered manner. This addresses challenges of molecular data such as accessibility, reproducibility, security and privacy. The entire ecosystem is accessible within the browser, uses free and open-source software (FOSS), and has EGA as its central storage facility.

Proof of concept study

To demonstrate the ecosystem, we use cell-line data, which avoids privacy-related issues, and show fusion gene detection in RNA-Seq data of the prostate cancer cell line VCaP using STAR-Fusion (fig. 2). The workflow and the results are explained in more detail in a published Galaxy page [*], including interactive views of the detected fusion genes and access to reference data.

Fig. 2. The end-to-end workflow starts by obtaining paired-end RNA-Seq data of the VCaP cell line via EGA. Galaxy determines the file type automatically with its built-in format detection system. The reads are clipped before alignment with STAR, which produces several output files that can be used for different analyses. The last tool is STAR-Fusion, which was able to confirm the TMPRSS2-ERG fusion gene [**].

[*] [**] doi: /s

Youri Hoogstrate (ErasmusMC), Sanne Abeln (VUmc)

