Published by Arron Jackson. Modified over 8 years ago.
1
Managing your NGS Data with emBASE: sample annotation; NGS Assays; grouping data sets into experiments and projects; programmatic access; adding, deleting and archiving files
2
GBCS Services Overview [diagram labels: GeneCore, Online Ordering, Data; GC Bridge; Annotate data, Manage data sets, (Analyze arrays), Export to EBI; IT LSF Cluster, jobs run on cluster; NGS Analysis, Build/Store Workflows; RStudio Server; GB Servers; File servers; SEPP libraries]
3
NGS Data @ GB :: Data Management :: emBASE [diagram labels: NGS data library; Data File Server; web app / MySQL]
1. emBASE is a database with a web front-end that stores all metadata about your data files (e.g. fastq)
2. Your data files remain on your group file server, in your “NGS data library”, and are accessible directly
4
NGS Data @ GB :: Data Management :: emBASE [diagram label: NGS data library]
The NGS data library root folder can be anywhere you like. Sub-folders containing the fastq files are organized by “Sequencer Run”. Everything in your data library is managed by emBASE and is read-only, to prevent data from being deleted, renamed or moved.
1. emBASE is a database with a web front-end that stores all metadata about your data files (e.g. fastq)
2. Your data files remain on your group file server, in your “NGS data library”, and are accessible directly
5
Part I: Fundamental concepts. How is the data stored and represented in emBASE?
6
A detailed view of an NGS experiment (NGS Data @ GB). Sample (e.g. embryos, cells) → Extract (e.g. DNA, mRNA) → Library → Sequencing → File (FASTQ, BAM). Protocols (growth, treatment, extraction, amplification, sequencing, …) and annotations are attached along the way. The ready-to-sequence library is in fact obtained after several steps, each following a precise protocol.
7
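The complete/real view of an NGS experiment (NGS Data @ GB). Several chains (Sample1 → Extract1 → Library1, Sample2 → Extract2 → Library2, Sample3 → Extract3 → Library3, Sample4 → Extract4 → Library4, …), each with its protocols and annotations, go into one sequencing run and one file, accompanied by a barcode info file. The libraries may belong to different experiments and projects (Exp X / Project P, Exp Y / Project Q). Samples are commonly multiplexed and projects mixed in lanes.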
8
Publication of your NGS experiment requires all this information (NGS Data @ GB). Samples (e.g. embryos, cells), extracts (e.g. DNA, mRNA), libraries, protocols, annotations, the sequencing files (FASTQ, BAM) and the barcode info file must all be described in a specific format, e.g. MAGE-TAB, in order to publish, e.g. at EBI. emBASE models all these different “items”. Data management, annotation and publication are the reasons emBASE exists.
9
emBASE “objects” (NGS Data @ GB) [diagram: the real-world items (Sample1/Sample2 (e.g. embryos, cells), Extract1/Extract2 (e.g. DNA, mRNA), Library1/Library2, protocols, annotations, sequencing file (FASTQ, BAM), barcode info file) are mapped to emBASE objects: Sample, Extract, Library, Protocols, Sample Annotations, NGS Assay (SeqLane, with its lane file(s)), RawBioAssay1 and RawBioAssay2 (each with a file, BAM or FASTQ), Workflow]
10
emBASE “objects” (2) (NGS Data @ GB) [diagram: Library1, Library2, … LibraryN feed an NGS Assay (SeqLane, with its lane file(s)), which yields RawBioAssay1, RawBioAssay2, … RawBioAssayN, each with a file (BAM, FASTQ); RawBioAssays are grouped into ExperimentA [RNA-seq] and ExperimentB [ChIP-seq], both in Project X]
Experiments should contain raw data sets of the same type, e.g. RNA-seq => experiments are exported as MAGE-TAB documents for submission. Projects group related experiments together.
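The grouping on this slide can be pictured with a few plain Python classes. This is only a sketch of the relationships, not emBASE's actual schema or API: the class and field names are assumptions, and rejecting a mismatched type outright is a choice made for illustration (the slide only says experiments "should" be homogeneous).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RawBioAssay:
    """One raw data set: the file(s) obtained for one library."""
    name: str
    assay_type: str                      # e.g. "RNA-seq" or "ChIP-seq"
    files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    """Groups raw data sets of a single type."""
    name: str
    assay_type: str
    raw_bioassays: List[RawBioAssay] = field(default_factory=list)

    def add(self, rba: RawBioAssay) -> None:
        # Assumption: this sketch refuses to mix types in one experiment.
        if rba.assay_type != self.assay_type:
            raise ValueError(
                f"{rba.name} is {rba.assay_type}, "
                f"but {self.name} holds {self.assay_type} data"
            )
        self.raw_bioassays.append(rba)


@dataclass
class Project:
    """Groups related experiments together."""
    name: str
    experiments: List[Experiment] = field(default_factory=list)


# Hypothetical names, mirroring the slide's diagram.
exp_a = Experiment("ExperimentA", "RNA-seq")
exp_a.add(RawBioAssay("RawBioAssay1", "RNA-seq", ["library1.fastq"]))
exp_b = Experiment("ExperimentB", "ChIP-seq")
project = Project("Project X", [exp_a, exp_b])
```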
11
GBCS web site: primary information source. All about emBASE; all online tutorials and documents.
12
All tutorials and more. The whole of today’s tutorial is available here.
13
First steps in emBASE
Let’s see this for real. In this section:
1. Login
2. Menus: basic / expert mode
3. Change your defaults, reset your password
4. Adapt GUI displays
14
The different emBASE sections (NGS Data @ GB) => Go to http://gbcs.embl.de/base and log in. [screenshot labels: Sample, Extract, NGS Library; Protocols; Annotations; Raw data sets; Experiment linking; Experiments, Projects; Experiment export (EBI submission); Archiving; NGS Assays (Lane); Account and Default Settings]
15
Your account settings: defaults; reset your password
16
Tune your display: customize the display; adapt the number of rows shown in tables
17
Managing your “Biomaterial”
Let’s see this for real. In this section:
1. Using the search interface to narrow down to the samples of interest
2. Customizing list pages (number of items, columns)
3. Changing sample properties in batch
4. Sample annotation: individually; in batch online; using a file
5. Adding protocols
18
Go to the sample list page: open “Biomaterials” and click “Samples”
19
Narrow down the sample search. Filter on owner (notice the use of wildcard search). Tools: select samples and “Delete”, “Annotate” or “Merge” them. Batch edit the properties of the selected samples.
20
Customize table display. Filter on sample name: additional filters are combined with ‘AND’. Click “Customize table view” and tune your display by hiding columns. Use GUI Settings > Profile to control the number of table rows displayed; this is particularly useful when you need to select lots of samples. When you come back to this page, notice how emBASE remembers your “filters”.
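To make the ‘AND’ behaviour concrete, here is a small Python sketch of wildcard filters applied to a sample list. It is not emBASE code: the shell-style `*` wildcard, the column names and the sample records are all assumptions chosen for illustration, and the wildcard character used by the real interface may differ.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def filter_samples(samples: List[Dict[str, str]],
                   filters: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep the samples that match every (column, pattern) filter."""
    return [
        sample for sample in samples
        # all(...) is the 'AND': one failing filter excludes the sample.
        if all(fnmatchcase(sample.get(column, ""), pattern)
               for column, pattern in filters.items())
    ]


# Hypothetical sample records.
samples = [
    {"name": "dvir_RNA_01", "owner": "pierre"},
    {"name": "dvir_ChIP_01", "owner": "pierre"},
    {"name": "dmel_RNA_01", "owner": "test"},
]

# Filter on owner first, then add a second filter on the sample name.
selected = filter_samples(samples, {"owner": "pi*", "name": "*RNA*"})
```

Each added filter can only shrink the selection, which is why narrowing down works well with short wildcard patterns.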
21
Change sample rights in batch. Select all the samples you want to modify (notice the [A N] controls in the header to select “All” or “None”; the selection only applies to displayed samples). Change Group Access to RW (read-write), then click the “Ok” button.
22
Modification refused: this ‘test’ user does not have enough privileges to change samples owned by Pierre => emBASE has ‘Linux-like’ rights management
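The refusal above follows the familiar owner/group logic. The Python sketch below shows that logic in its simplest form; it is an assumption about how such a check could look, not emBASE's real permission model, and the names (`Sample`, `can_modify`, the group "furlong") are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Sample:
    name: str
    owner: str
    group: str
    group_access: str = "R"   # "R" (read-only) or "RW" (read-write)


def can_modify(user: str, user_groups: Set[str], sample: Sample) -> bool:
    """The owner can always modify; group members only with RW access."""
    if user == sample.owner:
        return True
    return sample.group in user_groups and "W" in sample.group_access


# Hypothetical data: a sample owned by 'pierre', read-only for the group.
private = Sample("sample1", owner="pierre", group="furlong")
# The same sample once Group Access has been set to RW.
shared = Sample("sample1", owner="pierre", group="furlong", group_access="RW")
```

With these records, the ‘test’ user is refused on `private` but allowed on `shared`, provided ‘test’ belongs to the group.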
23
For this training, I now log in as a privileged user who owns the data we are playing with (just as you own your data)
24
Batch modification now works. Select a Growth protocol and click “Ok”.
25
Batch modification now works
26
Protocols and properties must be set on samples, extracts and NGS libraries. We won’t demonstrate each of these; it works exactly as for samples.
27
The Sample Annotation View. Switch to the Annotation View and notice the lack of annotations (“No Annotations”); we’ll see next how to annotate all samples at once.
28
Let’s annotate samples. Select all the samples you want to annotate and click “Annotate”.
29
Batch Sample Annotation Interface. Add an annotation type => a new column is added to the table. Select “SampleType”.
30
Batch Sample Annotation Interface. Select ‘frozen_sample’ in the first cell.
31
Batch Sample Annotation Interface. Notice the green message => there is NO save button; the database is changed on the fly.
32
Batch Sample Annotation Interface
33
Batch Sample Annotation Interface. Dragging down the corner of a cell copies its value over the cells below (like Excel); only drag down over 3-4 rows. Then double-click the bottom-right corner of the last cell with a value: this copies the value into the remaining empty cells.
34
Batch Sample Annotation Interface. You can now add other annotation types.
35
Excel-like Annotation Table. Select a cell and drag its bottom-right corner to copy the value over the cells below, or select a cell and double-click its bottom-right corner to copy the value down the whole column (only if all the cells below are empty).
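The double-click rule (copy down only if every cell below is empty) is easy to state in code. The Python function below is a sketch of that rule on a plain list standing in for one table column; it is not taken from emBASE, and the function name is hypothetical.

```python
from typing import List, Optional


def fill_down(column: List[Optional[str]], row: int) -> List[Optional[str]]:
    """Copy column[row] into every cell below it, if they are all empty.

    Returns a new list; the column is returned unchanged when any cell
    below `row` already holds a value.
    """
    below = column[row + 1:]
    if any(cell not in (None, "") for cell in below):
        return list(column)
    return column[:row + 1] + [column[row]] * len(below)


# One annotation column ("SampleType") for four samples.
sample_type = ["frozen_sample", None, None, None]
filled = fill_down(sample_type, 0)
```

Refusing to overwrite existing values is what makes the double-click safe on a partly annotated column.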
36
The next slides show how to batch annotate samples using a file you created in Excel
37
[COPY SLIDES IN]
38
Managing your “NGS Assay” and “RBA” (emBASE::Storing Sequencing Assay)
Let’s see this for real:
1. Understanding NGS Assay content: multiplexed libraries; QC flag
2. RawBioAssay (aka RBA): data file relationship; QC flag
3. Understanding how files are stored in your NGS Library
4. File locking/unlocking concept
5. Lane and RBA file deletion philosophy
6. Getting and adding RBA files from the command line
39
Go to the NGS Assay list page (emBASE::Storing Sequencing Assay). List all NGS Assays (== Lanes). The 20 RNA-seq samples come from 2 lanes => click lane 7.
40
NGS Assay: example of a multiplexed lane [screenshot sections: Lane file & location; Individual raw data sets & de-multiplexed files; Sequencing run info (notice the link to the “run”); Assay (= Lane) info & rights]. Let’s zoom in on the raw data sets.
41
Raw data sets section. Demultiplexed files must be added to emBASE: they are needed for data submission, and they are needed before lane files can be trashed to save space!
42
Raw data sets section. Click one data set (we call these RawBioAssays).
43
Set the quality of a raw data set
44
NGS Assay storage on your file server. Lane directory: one per (existing) lane; read-only. Run directory: one per flowcell; read-only.
45
NGS Assay storage on your file server. Library directory (named after the immutable internal emBASE id); read-only.
46
NGS Assay storage on your file server. Data file directory: one per file type; read-write until “locked”, then read-only. No files are in the directory [we’ll come back to this locking concept later].
47
NGS Data Library extended to better support demultiplexed files (emBASE::Storing Sequencing Assay::File Organization)
48
Adding demultiplexed files. Time to stop clicking! Log on to spinoza [as galaxy], then cd /g/furlong/project/21_dvir/fastq/RNA-seq … One can of course manually copy files into these directories (fastq or bam dir) or use our command line utilities.
49
SCREENSHOT OF LOCKING AND LANE FILE DELETION
50
Locking / Unlocking concept (emBASE::Storing Sequencing Assay::File Organization)
1. Library file sub-directories start unlocked (writable for the group)
– you can work and replace files as you wish
2. At some point, files are ready and directories can be locked (read-only):
a) from this point on, emBASE tracks these files
b) emBASE allows lane file deletion once all the lane’s multiplexed libraries are locked
3. Locking is done via the web interface, on the whole lane or per library (the case of shared lanes)
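The deletion rule in point 2 boils down to "every library of the lane is locked". Below is a minimal Python sketch of that rule, assuming hypothetical `Lane` and `Library` classes; it models only the bookkeeping, not the file permissions or the web interface, and it is not emBASE's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Library:
    name: str
    locked: bool = False      # unlocked: directory writable for the group


@dataclass
class Lane:
    name: str
    libraries: List[Library] = field(default_factory=list)

    def lock(self, library_name: Optional[str] = None) -> None:
        """Lock one library, or the whole lane when no name is given."""
        for library in self.libraries:
            if library_name is None or library.name == library_name:
                library.locked = True

    def lane_file_deletable(self) -> bool:
        """The lane file may go only once all its libraries are locked."""
        return bool(self.libraries) and all(
            library.locked for library in self.libraries
        )


# A shared lane: the libraries are locked one owner at a time.
lane = Lane("lane7", [Library("Library1"), Library("Library2")])
lane.lock("Library1")
after_one_lock = lane.lane_file_deletable()    # False: Library2 still open
lane.lock()                                    # lock the whole lane
after_full_lock = lane.lane_file_deletable()   # True
```

Per-library locking matters on shared lanes, where each group finishes its demultiplexed files at its own pace.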
51
Organizing your RBA (files) into Experiments and Projects (emBASE::Working With Data File Sets)
Let’s see this for real:
1. Experiments
2. Adding/removing RawBioAssays to/from Experiments
3. Syncing with Galaxy
4. Exporting an experiment to MAGE-TAB (for submission) [not demonstrated: too long]
5. Grouping Experiments in Projects
6. Archiving Experiments/Projects to tape: what happens? (price, replacement file, duration); how is the archiving info stored and accessed?
52
Grouping data sets into Experiments (emBASE::Working With Data File Sets::Experiments). An experiment has a single ‘type’, e.g. ChIP-seq or RNA-seq.
53
Grouping data sets into Experiments (emBASE::Working With Data File Sets::Experiments). Search for raw data sets and add them to, or remove them from, an experiment.
54
Other actions with Experiments: Galaxy sync; MAGE-TAB export (emBASE::Working With Data File Sets::Experiments)
55
Grouping Experiments into Projects [show Project page] (emBASE::Working With Data File Sets::Projects)
56
Archiving of emBASE Data (emBASE::Working With Data File Sets::Archiving). Goal: save space by moving data offline when projects are finished. Fill in the options; the emBASE admin is then notified.
57
Archiving of emBASE Data (emBASE::Working With Data File Sets::Archiving). Please see the online tutorial at http://gbcs.embl.de/portal/tiki-index.php?page=archivingTutorial
58
Archiving of emBASE Data: what happens next? (emBASE::Working With Data File Sets::Archiving)
All data files connected to the experiments are exported
IT performs the backup to tape
We delete ‘deletable’ files (concept of active experiment):
– emBASE knows which files can be deleted, which ones have been deleted and how to get them back, if needed
– deleted files are replaced locally with a small file containing the backup information
You can follow the archiving status in emBASE
This is a couple of clicks on your side, but remember that you still pay the bill!
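To picture the "small replacement file" idea, here is a Python sketch that swaps a data file for a stub holding the backup information. Everything specific in it is an assumption: the `.archived.json` suffix, the JSON layout and the field names are invented for illustration, and the real replacement files written during archiving may look quite different.

```python
import json
from pathlib import Path
from typing import Dict


def replace_with_stub(data_file: Path, backup_info: Dict[str, str]) -> Path:
    """Delete `data_file` and leave a small stub describing its backup."""
    stub = data_file.with_name(data_file.name + ".archived.json")
    # Write the stub first, so the backup info exists before the deletion.
    stub.write_text(json.dumps(backup_info, indent=2))
    data_file.unlink()
    return stub


def read_stub(stub: Path) -> Dict[str, str]:
    """Return the backup information stored in a stub file."""
    return json.loads(stub.read_text())
```

A call such as `replace_with_stub(Path("lib1.fastq"), {"archive_id": "hypothetical-id"})` would leave `lib1.fastq.archived.json` in place of the original, so the directory still shows where the data went.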
59
More for the command line user (emBASE::Working With Data File Sets::JemBASEAPI)
60
Working with this new structure (emBASE::Storing Sequencing Assay::API)
1. Use the command line emBASE API to learn where files are or should be placed
– these commands extract all info from emBASE for a lane, an experiment or a project
2. Use the command line emBASE API to add RBA files (FASTQ, BAM) to emBASE storage
Documentation at: http://gbcs.embl.de/tikiwiki/tiki-index.php?page=BASEJavaCmdLineUtilities
61
emBASE API: Get* utilities (emBASE::Storing Sequencing Assay::API). Assume you want to discover all libraries and associated files in a given lane …
62
A Get* example (emBASE::Storing Sequencing Assay::API). Available from anywhere. The logged-in user is used to authenticate in emBASE. Rights apply the same way as in emBASE.
63
A Get* example (emBASE::Storing Sequencing Assay::API). Example: create symlinks on the fly to the NGS data library for all libraries of a new lane.
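The symlink step itself can be sketched in Python, assuming the library directories have already been obtained (for instance from the output of a Get* utility, whose exact format is not shown in these slides). The function name and the mapping from library name to directory are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def link_libraries(library_dirs: Dict[str, Path], work_dir: Path) -> List[Path]:
    """Create one symlink per library in `work_dir`; return the new links.

    `library_dirs` maps a library name to its directory in the NGS data
    library. Existing entries in `work_dir` are left untouched.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name, target in sorted(library_dirs.items()):
        link = work_dir / name
        if link.is_symlink() or link.exists():
            continue          # never overwrite what is already there
        link.symlink_to(target, target_is_directory=True)
        created.append(link)
    return created
```

Symlinks keep the working directory light and leave the read-only data library untouched, which fits the rule that everything in the library is managed by emBASE.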
64
Loading your data with GCBridge (GCBridge::Batch Data Upload)
1. The 3 common situations:
a) single-sample lane
b) internally multiplexed lane
c) lanes demultiplexed using the Illumina index
2. Re-using existing samples
3. Handling lane mates in situations b) and c)
4. Coping with multiple-project lanes, i.e. situation c) only
5. Data demultiplexing, i.e. situation c) only
6. Handling mistakes
Step-by-step tutorial at http://gbcs.embl.de (then GC Bridge menu)
65
Thank you. Joscha Sauer Shu-yi Su Aziz Moussa M Chaturvedi Alumni L-A Schmitt Nicolas Delhomme Leila Tlili Arnaud Huaulme GeneCore Jonathon Blake Juergen Zimmermann Markus Fritz Vladimir Benes Eileen Furlong and Lars Steinmetz IT Services Michael Wahlers Andres Lindau All GB members Julien Gagneur (now LMU) Chenchen Zhu Lin Gen Simon Anders Tobias Rausch