GBCS ecosystem for NGS Data: Data Management with emBASE, Data analysis with Galaxy

Presentation transcript:

0 GBCS ecosystem for NGS Data
Data Management with emBASE
Data analysis with Galaxy

The big picture 1
[Diagram: Data, File servers, NGS GB]
1. emBASE is a database with a web front-end that stores all metadata about your data files (e.g. FASTQ)

The big picture 2
2. Your data files remain on your group file server, in your “NGS data library”, and are accessible directly

The big picture 3
– Annotate data: sample description, protocol description
– Manage data sets: link files to experiments/projects
– Publish data to a public repository: upon publication
– Export to tape: long-term storage

The big picture 4
[Diagram adds: GeneCore Online Ordering, GCBridge]
GCBridge: automated data transfer from GC servers to emBASE, to avoid:
– file renaming, i.e. lack of data traceability
– duplication of data files in several places (with different names!)
– unreliable or unknown storage places (your laptop…)
– data not being loaded into the system

NGS ecosystem by GBCS 5
[Diagram: Data, File servers, GeneCore Online Ordering, GCBridge, IT LSF Cluster, NGS Analysis, R studio Server, GB Servers, NGS GB]
– IT LSF Cluster: jobs run on cluster
– NGS Analysis: build/store workflows
– R studio Server, GB Servers: access files directly, fetch info with JemBASEAPI, SEPP libraries

0. What is the “data” 6

NGS GB : Data model
The typical user view of the “model”
Sample → Sequencing → File (FASTQ, BAM)
– Send my sample to sequencing
– Download the file
– Mail the bioinformatician where the file is

A more realistic view of the process
Sample (e.g. embryos, cells) → Extract (e.g. DNA, mRNA) → Library → Sequencing → File (FASTQ, BAM)
– Protocols: growth, treatment, extraction, amplification, sequencing, …
– Annotations
– Annotations and protocols need to be controlled as much as possible (AMAP)

A more realistic view of the process
– Annotations and protocols need to be controlled AMAP
– Replicates need to be described properly (sample replicates vs library re-sequencing)
– Biol Rep: Sample1 → Extract1 → Library1 → Sequencing → File1 and Sample2 → Extract2 → Library2 → Sequencing → File2
– Tech Rep: Sample1 → Extract1 → Library1 → Sequencing → File1 and Extract2 → Library2 → Sequencing → File2
– Library re-sequencing: Sample1 → Extract1 → Library1 → Sequencing → File1, File2
– Biol Rep ≠ Tech Rep ≠ library re-sequencing
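The distinction between the three cases can be sketched in a few lines of Python. This is only an illustration of the idea, not emBASE code: the `SequencingFile` class and the `relation` function are hypothetical names, and the sketch assumes both files come from the same experimental condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequencingFile:
    """One sequenced file and the chain of items it was produced from."""
    name: str
    sample: str
    extract: str
    library: str


def relation(a: SequencingFile, b: SequencingFile) -> str:
    """Classify how two files relate, based on the items they share."""
    if a.sample != b.sample:
        return "biological replicate"
    if a.library != b.library:
        # Same sample, but a different extract and/or library.
        return "technical replicate"
    # Same library sequenced more than once.
    return "library re-sequencing"


f1 = SequencingFile("File1", "Sample1", "Extract1", "Library1")
biol = SequencingFile("File2", "Sample2", "Extract2", "Library2")
tech = SequencingFile("File2", "Sample1", "Extract2", "Library2")
reseq = SequencingFile("File2", "Sample1", "Extract1", "Library1")

for other in (biol, tech, reseq):
    print(relation(f1, other))
```

The point of the sketch is that the relation can only be derived if every file is linked back to its library, extract and sample, which is what the data model is meant to guarantee.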

A complete view of the situation
– Samples are commonly multiplexed: Sample1 → Extract1 → Library1, Sample2 → Extract2 → Library2, Sample3 → Extract3 → Library3, Sample4 → Extract4 → Library4, … share one Sequencing run and File (FASTQ, BAM), described by a Barcode Info File
– Projects are mixed in the same lane (Exp X / Project P, Exp Y / Project Q)
– Analysis: stored (meta)data must be readily accessible for analysis
– Publish (e.g. EBI): model and vocabulary should match standards for final publishing

1. emBASE 11
“Data management, organization, annotations and publication”

emBASE Items 12
[Diagram: mapping of the data model onto emBASE items]
– Data model: Sample (e.g. embryos, cells) → Extract (e.g. DNA, mRNA) → Library → Sequencing → File (FASTQ, BAM), with Protocols, Annotations and a Barcode Info File
– emBASE: Sample → Extract → Library → NGS Assay (+ SeqLane File(s)) → RawBioAssay1, RawBioAssay2 (+ File: BAM, FASTQ), with Protocols, Sample Annotations and Workflow

NGS GB :: Data Management :: emBASE 13
– Developed in-house using BASE
– Initially a LIMS for arrays
– Has been running for 9 years

emBASE Modules 14
Access: please request a login
– Controlled Vocabulary
– Sample, Extract, Libraries…
– Assays grouped in Experiments and Projects
– NGS Assays
– Microarrays
– In Situ Images

emBASE NGS Assay List Page 15
Lists all NGS Assays (an assay == a lane)

emBASE NGS Assay List Page 16
Access rights are set for each assay (Unix-like)

Search NGS Assays 17
– Powerful search on all “list” pages
– Customize the table view
– Locate your assay and follow the link for details

NGS Assay: Example of a multiplexed lane 18
– Lane file & location
– Sequencing run info
– Assay (= lane) info & rights
– Related raw data sets are grouped in “experiments”
– Individual data sets & de-multiplexed files

NGS Assay: Example of a multiplexed lane 19
Link to Libraries, i.e. Samples

Biomaterials NGS GB 20

Sample Annotation 21
Sample Annotation Types (SATs):
– are typed: free text, number (int, float), pre-defined values (enum)
– are owned
– can be created as needed by authorized users, e.g. as required by ICGC
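To make the idea of typed and owned annotation types concrete, here is a minimal Python sketch. It is not the emBASE implementation: the `AnnotationType` class, its fields and the example annotation names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnnotationType:
    """A hypothetical typed annotation: free text, int, float or enum."""
    name: str
    kind: str                                  # "text", "int", "float" or "enum"
    owner: str                                 # who owns this annotation type
    allowed: Optional[Tuple[str, ...]] = None  # only used for "enum"

    def parse(self, raw: str):
        """Convert a raw string to the declared type, or raise ValueError."""
        if self.kind == "text":
            return raw
        if self.kind == "int":
            return int(raw)
        if self.kind == "float":
            return float(raw)
        if self.kind == "enum":
            if self.allowed is None or raw not in self.allowed:
                raise ValueError(f"{raw!r} is not an allowed value for {self.name}")
            return raw
        raise ValueError(f"unknown annotation kind: {self.kind}")


stage = AnnotationType("stage", "enum", "group_a", ("embryo", "larva", "adult"))
age = AnnotationType("age_hours", "float", "group_a")

print(stage.parse("embryo"))
print(age.parse("6.5"))
```

Rejecting values at entry time, as `parse` does here, is one way to keep annotations controlled rather than cleaning them up before publication.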

Sample Annotation NGS GB 22 Select SATs

Custom sample annotations 23
– Unlimited number of annotations
– Annotation types can be customized (per group)

Grouping data sets into Experiments 24
An experiment has a single ‘type’, e.g. ChIP-seq, RNA-seq

Grouping data sets into Experiments 25
Search raw data sets and add them to, or remove them from, an experiment

New emBASE Project Layer 26
– An experiment is tied to a single type, e.g. ChIP-seq, RNA-seq, iCLIP-seq
– Group related experiments into a project

Wait a sec... 27
Do we really have to fill in all these web forms?! No!
1. GCBridge: all “items” are pre-created for you
2. Protocols and sample annotations remain to be done

2. Decentralized NGS File Data Lib 28 “Your data lives on your file server and is readily accessible”

NGS data Library 29
– NGS data library root folder (can be anywhere you like)
– Sub-folders containing the FASTQ files are organized by “Sequencer Run”
– Everything in your data library is managed by emBASE and is read-only, to avoid data deletion, renaming or moving
1. emBASE is a database with a web front-end that stores all metadata about your data files (e.g. FASTQ)
2. Your data files remain on your group file server, in your “NGS data library”, and are accessible directly

NGS data Library 30
NGS Data Library extended to better support demultiplexed files
Lane directory: one per (existing) lane; read-only

NGS data Library 31
Library directory (named after the immutable internal emBASE id); read-only

NGS data Library 32
Data file directory, one per file type; read-write until you lock it, then read-only
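As a rough illustration of such a layout, the sketch below builds the path of a data file directory. The nesting order (run, then lane, then library id, then file type) and all folder names are assumptions for the example; the slides do not spell out the exact naming scheme.

```python
from pathlib import PurePosixPath


def data_file_dir(root: str, run: str, lane: str,
                  library_id: int, file_type: str) -> PurePosixPath:
    """Build <root>/<sequencer run>/<lane>/<library id>/<file type>."""
    return PurePosixPath(root) / run / lane / str(library_id) / file_type


# Hypothetical values, for illustration only.
path = data_file_dir("/g/mygroup/ngs_library", "run_0421", "lane3", 15234, "fastq")
print(path)
```

Naming the library directory after an immutable id rather than a sample name means the path stays valid even if the library is renamed in the database.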

Locking / Unlocking concept 33
1. Library file sub-directories are unlocked (writable for the group)
   – you can work and replace files as you wish
2. At some point, files are ready and directories can be locked (read-only):
   1. from this point on, emBASE tracks these files
   2. emBASE will allow lane file deletion once all its multiplexed libraries are locked
3. Locking is done via the web interface, on the whole lane or per library (case of shared lanes)
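The locking rules can be sketched with plain POSIX permissions. This is an assumption about how such a scheme could work, not a description of what emBASE does internally; the function names are hypothetical and the example runs in a temporary directory.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def lock(directory: str) -> None:
    """Make a directory read-only by clearing every write bit."""
    mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, mode & ~WRITE_BITS)


def unlock(directory: str) -> None:
    """Make a directory writable again for owner and group."""
    mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, mode | stat.S_IWUSR | stat.S_IWGRP)


def is_locked(directory: str) -> bool:
    """A directory counts as locked when no write bit is set."""
    return (stat.S_IMODE(os.stat(directory).st_mode) & WRITE_BITS) == 0


def lane_file_deletable(library_dirs) -> bool:
    """The lane file may be deleted only once every library is locked."""
    return all(is_locked(d) for d in library_dirs)


with tempfile.TemporaryDirectory() as lane:
    libs = [os.path.join(lane, name) for name in ("lib_a", "lib_b")]
    for d in libs:
        os.mkdir(d)
        unlock(d)
    lock(libs[0])
    one_locked = lane_file_deletable(libs)
    lock(libs[1])
    all_locked = lane_file_deletable(libs)
    for d in libs:
        unlock(d)  # restore write access before the temporary tree is removed

print(one_locked, all_locked)
```

With one of two libraries locked the lane file is still protected; it only becomes deletable once both are locked, which mirrors rule 2.2.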

3. GC Bridge 34
“Ensuring smooth data transfer from GeneCore to emBASE”

GCBridge : Making your life as easy as possible 35-38
NGS GB :: Automated Data Transfer
[Diagram: GeneCore Online Ordering, NGS Lib]
1. Transfer file
2. Call GC Bridge
– fetch info from GC Db
– User gets notified upon transfer completion
– User gets notified when demultiplexing has been performed

3. Practical steps 39
– Validate GCBridge Transfer Form
– Annotate Samples, link protocols

Data released 40 Click the link to get to the GCBridge Transfer Form

Single Library Form 41 Lane File(s) The Bridge is connected to emBASE experiments

Single Library Form 42

Single Library Form 43

Single Library Form 44
– Sample names can be matched against existing Samples or Libraries
– Search is performed ignoring prefix
– New entries are created by default: Sample1 → Extract1 → Library1 → NGSAssay
– Match on Sample1, i.e. tech. replicate: a new Extract2 → Library2 → NGSAssay is added to Sample1
– Match on Library1, i.e. lib. resequencing: a new NGSAssay is added to Library1

Multiplexed Library Form 45 Identical Multiplex specific

Multiplexed Library Form 46 Tell us about lib number, so we can control submissions…

Easy demultiplexing in Data Lib Directly 47
– Request demultiplexing (runs on cluster); it starts when submission is complete
– Jemultiplexer is emBASE-aware (i.e. it knows where files go in the Data Library)
– Jemultiplexer can also be (re)launched from the command line

Easy selection of lane mates 48
– Select all lane-mates

Re-use emBASE samples and libraries 49
Step-by-step tutorial at (Quick Links)
– Select search level: sample or library

Re-use emBASE samples and libraries 50
– Select search level: sample or library
– Select appropriate items
– Match levels can be mixed
– Allows replicates to be modelled accurately (tech. vs biol.)

Re-use emBASE samples and libraries 51
– Select search level: sample or library
Step-by-step tutorial at (Quick Links)

Already demultiplexed samples NGS GB 52

Automatic notification NGS GB 53

Now what? 54
1. GCBridge: all “items” are pre-created for you
2. Protocols and sample annotations remain to be done in emBASE

Working in batch with emBASE 55-60
1. Narrow your search to locate the samples you want
2. Select the ones you want, or all of them (N.B.: increase the number of items per page in the GUI settings if needed)
3. Associate protocols or change access rights for all selected samples in one click
4. Download a pre-filled Excel file for batch annotation
5. In Excel: keep the columns you need, fill in your annotations, then save back as text
6. Batch (re)annotate your samples using this file
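For those who prefer to script the spreadsheet step, the column trimming can be sketched in Python on a tab-delimited export. The column names below are invented for the example, and the sketch assumes the file has already been saved as tab-delimited text; it does not read Excel files directly.

```python
import csv
import io


def keep_columns(tsv_text: str, wanted: list) -> str:
    """Return tab-delimited text restricted to the wanted columns, in order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=wanted, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in `wanted` are dropped
    return out.getvalue()


# Hypothetical pre-filled export: names are filled in, annotations are empty.
exported = (
    "Name\tOrganism\tTissue\tComment\n"
    "Sample1\t\t\t\n"
    "Sample2\t\t\t\n"
)
trimmed = keep_columns(exported, ["Name", "Tissue"])
print(trimmed)
```

The annotation values themselves would then be filled into the remaining columns before the file is used for batch (re)annotation.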

emBASE Advanced Features (for the command line user) 61

Working with emBASE 62
1. Export experiment or project views using the web interface
2. Use the new command-line emBASE API to learn where files are or should be placed
   – These commands extract all info from emBASE for a lane, an experiment or a project
Documentation at :

Concept : work as you like 63

Concept : work as you like 64
[Diagram: NGS Lib (real files) and Database (samples, libs, RBAs, exp, project)]
– link real files
– pull info as needed

Export Project View to disk 65

emBASE API Example 66
Assume you want to discover all libraries and associated files in a given lane …

emBASE API Example 67
– Available from anywhere
– The logged-in user is used to authenticate in emBASE
– Rights apply the same way as in emBASE

emBASE API Example 68
Example: create symlinks on the fly to the NGS data lib for all libs of a new lane
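The transcript does not include the actual API commands, so the sketch below only illustrates the symlink step in Python. It assumes a mapping from library name to file path has already been obtained (in practice from the emBASE API); the `link_libraries` function and all file names are hypothetical, and a temporary directory stands in for the NGS data library.

```python
import os
import tempfile


def link_libraries(files_by_library: dict, target_dir: str) -> list:
    """Create one symlink per library in target_dir, pointing at the real file."""
    os.makedirs(target_dir, exist_ok=True)
    links = []
    for library, real_path in sorted(files_by_library.items()):
        link = os.path.join(target_dir, f"{library}_{os.path.basename(real_path)}")
        if not os.path.lexists(link):
            os.symlink(real_path, link)
        links.append(link)
    return links


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the NGS data library; real paths would come from emBASE.
    data_lib = os.path.join(tmp, "data_lib")
    os.makedirs(data_lib)
    files = {}
    for lib in ("libA", "libB"):
        path = os.path.join(data_lib, f"{lib}.fastq")
        with open(path, "w") as handle:
            handle.write("@read\nACGT\n+\nIIII\n")
        files[lib] = path

    links = link_libraries(files, os.path.join(tmp, "analysis"))
    link_names = [os.path.basename(link) for link in links]
    all_resolve = all(
        os.path.realpath(link) == os.path.realpath(files[lib])
        for lib, link in zip(sorted(files), links)
    )

print(link_names, all_resolve)
```

Working through symlinks keeps a single copy of each file in the data library while letting each analysis directory use whatever names are convenient.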

Archiving of emBASE Data 69
Goal: save space by moving data offline when projects are finished
– Fill in options
– emBASE admin is warned

Archiving of emBASE Data 70
This is a couple of clicks on your side, but remember that you still pay the bill!
What happens next?
– All data files connected to the experiments are exported
– IT performs the backup on tape
– We delete ‘deletable’ files (concept of active experiment):
   – emBASE knows which files can be deleted, which ones have been deleted and how to get them back, if needed
   – deleted files are replaced locally with a small file containing backup information
– You can follow the archiving status in emBASE
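The "small file containing backup information" idea can be illustrated as follows. The stub format, the `.archived.json` suffix and the `tape_label` field are assumptions made for this sketch; the slides do not describe what the real placeholder contains.

```python
import json
import os
import tempfile


def replace_with_stub(path: str, tape_label: str) -> dict:
    """Delete a data file and leave a small JSON stub describing its backup."""
    info = {
        "original_name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "tape_label": tape_label,  # hypothetical pointer to the tape backup
    }
    stub = path + ".archived.json"
    with open(stub, "w") as handle:
        json.dump(info, handle)
    os.remove(path)  # only remove the data once the stub has been written
    return info


with tempfile.TemporaryDirectory() as tmp:
    data_file = os.path.join(tmp, "lane1.fastq")
    with open(data_file, "w") as handle:
        handle.write("ACGT" * 1000)
    info = replace_with_stub(data_file, "TAPE-0001")
    remaining = sorted(os.listdir(tmp))

print(info["size_bytes"], remaining)
```

Leaving a stub in place of the data means anyone browsing the directory can still see that a file existed and where to ask for it.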

Galaxy (First Steps) 71
“Powerful data analysis made easy and reproducible”

Galaxy is a web-based job management platform 72
NGS GB :: Data Analysis :: Galaxy
[Screenshot labels: Tools, History (active analysis), Launch Analysis Jobs]
Log in with your EMBL account

Finding your data 73
Select your group library

Run jobs 74

Jobs can be assembled into workflows 75

Apply workflows to each demultiplexed data set in one click 76

Each data set analysis is well identified 77

Galaxy Summary 78-80
1. Galaxy is a job management / analysis platform
   – Run standard analyses (trimming, QC, mapping, peak calling, …)
   – Assemble workflows and perform parallel processing
2. Jobs are sent to the new LSF EMBL cluster
   – We implement cluster good practices (copy to local /tmp, …)
   – Tools are available under BCR/SEPP
3. Continuous update/addition of tools & indices
4. Open source and very active project
5. Galaxy uses the data from your NGS Data library directly
6. Easy transfer of results from Galaxy to your own disks
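The "copy to local /tmp" practice mentioned above can be sketched generically in Python. This is not how Galaxy or the cluster setup implements it; `run_on_local_scratch` and `count_lines` are hypothetical names, and the line-counting step merely stands in for a real analysis tool.

```python
import os
import shutil
import tempfile


def run_on_local_scratch(input_path: str, output_path: str, process) -> None:
    """Copy the input to local scratch, process it there, copy the result back."""
    with tempfile.TemporaryDirectory() as scratch:
        local_in = os.path.join(scratch, os.path.basename(input_path))
        local_out = os.path.join(scratch, "result")
        shutil.copy(input_path, local_in)
        process(local_in, local_out)
        shutil.copy(local_out, output_path)


def count_lines(src: str, dst: str) -> None:
    """Toy processing step: write the number of lines in src to dst."""
    with open(src) as handle:
        n = sum(1 for _ in handle)
    with open(dst, "w") as handle:
        handle.write(f"{n}\n")


with tempfile.TemporaryDirectory() as shared:
    # Stand-in for a shared file server directory.
    reads = os.path.join(shared, "reads.fastq")
    with open(reads, "w") as handle:
        handle.write("@r1\nACGT\n+\nIIII\n")
    result = os.path.join(shared, "line_count.txt")
    run_on_local_scratch(reads, result, count_lines)
    with open(result) as handle:
        line_count = handle.read().strip()

print(line_count)
```

The shared file system is touched only twice, once to read the input and once to write the result, while all intermediate I/O stays on the node's local disk.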

Conclusion 81
There are absolutely no drawbacks in using our system, only benefits!

Thank you 82
Joscha Sauer, Shu-yi Su, Laura O’Donovan, Matthias Monfort
Alumni: Aziz Moussa, M. Chaturvedi, L-A Schmitt, Nicolas Delhomme, Leila Tlili, Arnaud Huaulme
GeneCore: Jonathon Blake, Juergen Zimmermann, Markus Fritz, Vladimir Benes
Eileen Furlong
IT Services: Michael Wahlers, Andres Lindau
All GB members
Chenchen Zhu, Simon Anders, Tobias Rausch, Frank Thommen (CBB)