Published by Calvin Chandler. Modified over 9 years ago.
1
Presentation of the CRG Bioinformatics Core Facility
Jean-François Taly
2
People in the BioCore
Jean-François: @CRG 2009, @BioCore 2012. Acting head, structural bioinformatics, MSA, NGS analyst, Galaxy server, training
Luca: @BioCore 2010. NGS analyst, small ncRNA prediction, motif analysis, training
Toni: @BioCore 2009. Wikis, web/DB development, DB mirrors, structural bioinformatics, training
Sarah: @BioCore 2014. Microarrays, NGS analyst, Galaxy, training
3
Our mission
Expertise in bioinformatics: service, consultation, training (internal and external)
Support in infrastructure, in collaboration with the SIT and TIC
Part of the CRG bioinformaticians network: 83 at the bioinformatics retreat, many more in PRBB/CNAG
4
Our services
Analysis: microarray, ChIP-seq, RNA-seq (DE and assembly), genome assembly, variant calling
Informatics support: wiki, web, server, API
Training: Galaxy, Perl, Linux, advanced bioinformatics
5
Fee per service
6
Our contribution to projects
Project conception; bioinfo exp. design; bioinfo exp. realization; bioinfo output interpretation; project conclusions
7
Our contribution to projects
Project conception; bioinfo exp. design; bioinfo exp. realization; bioinfo output interpretation; project conclusions
Apply defined procedures
8
Our contribution to projects
Project conception; bioinfo exp. design; bioinfo exp. realization; bioinfo output interpretation; project conclusions
Customized analysis
9
CRG bioinformatics community
Big Data WG: EGA initiative, data engineering, NoSQL, HPC
NGS Tech. Sem.: RNA-seq, G. assembly, variant annot., metagenomics
Other topics: integrated -omics, good practice in code dev., Galaxy dev., …
10
Micro-arrays
Gene expression array data analysis: background correction and normalization; differential expression analysis; Gene Ontology and pathway analysis; various graphics / plots
Additional array-based technologies the Bioinformatics unit supports include: qPCR arrays; Comparative Genomic Hybridization arrays
Main tools are based on the R / Bioconductor environment
Image source: Creative Commons, Wikipedia
11
RNA-seq
13
DNA-seq
14
Pevzner P A et al. PNAS 2001;98:9748-9753
15
ChIP-seq
17
Growing to the next level
From gene DE to transcript DE: users now have access to longer reads and deeper coverage
Metagenomics: 16S ribosomal amplicon sequencing with MiSeq
Data integration framework: combining different data types into a single analysis (RNA-seq DE, histone marks, metabolomics data, proteomics)
Data analysis workflows on Galaxy: leave the basic processing to users and focus on advanced analysis
18
Database mirroring
Biological file sources: ENSEMBL, UCSC, NCBI Blast DBs, UniProt, PDB, iGenomes (Illumina, human only for now, the rest is upcoming)
All indexed and formatted for: NCBI BLAST+ (makeblastdb for proteins and nucleic acids), Bowtie & Bowtie2, BWA, fastaindex (Exonerate), GEM, faToTwoBit
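As a rough illustration of what "indexed and formatted" involves, the sketch below builds the command lines for a few of the indexers named on the slide for one FASTA file. It only returns argument lists and runs nothing. The function name `index_commands` and the example paths are hypothetical, not taken from the BioCore setup. fastaindex and GEM are left out rather than guessing their exact invocation.

```python
from pathlib import PurePosixPath


def index_commands(fasta, out_dir, molecule="nucl"):
    """Return argv lists for indexing one FASTA file; nothing is executed.

    molecule is passed to makeblastdb as -dbtype ("nucl" or "prot").
    """
    if molecule not in ("nucl", "prot"):
        raise ValueError("molecule must be 'nucl' or 'prot'")
    fasta_path = PurePosixPath(fasta)
    # Index files share a prefix named after the FASTA file (assumed layout).
    prefix = str(PurePosixPath(out_dir) / fasta_path.stem)
    source = str(fasta_path)
    commands = {
        "blast": ["makeblastdb", "-in", source, "-dbtype", molecule, "-out", prefix],
    }
    if molecule == "nucl":
        # Read aligners and 2bit conversion only apply to nucleotide sequences.
        commands["bowtie"] = ["bowtie-build", source, prefix]
        commands["bowtie2"] = ["bowtie2-build", source, prefix]
        commands["bwa"] = ["bwa", "index", "-p", prefix, source]
        commands["2bit"] = ["faToTwoBit", source, prefix + ".2bit"]
    return commands


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for name, argv in index_commands("/tmp/example/genome.fa", "/tmp/example/index").items():
        print(name, " ".join(argv))
```

Each list could be handed to `subprocess.run(argv, check=True)` on a machine where the corresponding tool is installed.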
19
Where are they stored?
In CRG common storage: /db
More information: http://biocore.crg.cat/wiki/Category:Mirrors
IMPORTANT: /db/seq (former /seq) IS DEPRECATED
20
WEB and Database services
Applications: data and project management; platforms for big data analysis and complex information querying; promotion and publication of scientific results
21
WEB and Database services
Example: Superfly, for Yogi Jaëger
Visual catalogue of gene embryo development of different fly species.
22
WEB and Database services
Example: PRGDB, with Walter Sanseverino
Wiki-based database of plant resistance genes.
23
Activity per category in 2014
24
Presentation of the Galaxy platform
Jean-François Taly, Bioinformatics Core Facility, CRG (Barcelona, Catalonia, Spain)
September 18th 2014
EMBO Global Exchange Course, Pasteur Institute of Tunis, Tunisia
25
Why Should I Use Galaxy?
Biologists: Linux-free data analysis with a graphical interface
Bioinformaticians: ensure reproducibility when sharing analyses and workflows; teach their knowledge to a broad audience; get access to workflows for topics they are not familiar with
Software developers: distribute their tools on a standardized platform
26
The Galaxy Team
Galaxy is developed by: the Nekrutenko lab in the Center for Comparative Genomics and Bioinformatics at Penn State University; the Taylor lab at Johns Hopkins University; the community
https://wiki.galaxyproject.org/GalaxyTeam
27
Rationale behind Galaxy
From Goecks et al., Genome Biol. 2010:
“Computation has become an essential tool in life science research. This is exemplified in genomics, where first microarrays and now massively parallel DNA sequencing have enabled a variety of genome-wide functional assays, such as ChIP-seq and RNA-seq (and many others), that require increasingly complex analysis tools. However, sudden reliance on computation has created an 'informatics crisis' for life science researchers: computational resources can be difficult to use, and ensuring that computational experiments are communicated well and hence reproducible is challenging. Galaxy helps to address this crisis by providing an open, web-based platform for performing accessible, reproducible, and transparent genomic science.”
28
Why Should I Use Galaxy?
Biologists: Linux-free data analysis with a graphical interface
Bioinformaticians: ensure reproducibility when sharing analyses and workflows; teach their knowledge to a broad audience; get access to workflows for topics they are not familiar with
Software developers: distribute their tools on a standardized platform
29
Makes bioinformatics accessible
30
From a command line …
31
… to a graphical interface
32
One step
33
Multi-step protocol (steps 1 to 5)
34
Workflow
35
Galaxy Tutorials
https://usegalaxy.org/u/jeremy/p/galaxy-rna-seq-analysis-exercise
https://wiki.galaxyproject.org/Learn
36
NGS on a laptop
MinION brings NGS to your laptop
http://youtu.be/UtXlr19xTh8
37
Why Should I Use Galaxy?
Biologists: Linux-free data analysis with a graphical interface
Bioinformaticians: ensure reproducibility when sharing analyses and workflows; teach their knowledge to a broad audience; get access to workflows for topics they are not familiar with
Software developers: distribute their tools on a standardized platform
38
Reproducibility
Bioinformaticians suffer from that too!
Results can change depending on: library and software versions; genome annotations
Results are published without the code
Want to share your findings with everybody? Freeze an environment in a virtual machine; use an application container (Docker); prepare a Galaxy workflow
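Since results depend on library versions and genome annotations, one lightweight complement to the options on this slide is to record those details next to each result. The sketch below is only an illustration under stated assumptions: the function name `environment_manifest`, the package names, and the annotation label are hypothetical, and it covers Python packages only, not external command-line tools.

```python
import json
import platform
from importlib import metadata


def environment_manifest(packages, annotation=None):
    """Collect interpreter, platform and package versions for an analysis."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "annotation": annotation,  # free-text label for the annotation release used
    }


if __name__ == "__main__":
    # Package names and annotation label are placeholders.
    manifest = environment_manifest(["numpy", "pysam"], annotation="example annotation release")
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON alongside the output files gives readers a record of the environment that produced them; a virtual machine, a Docker image or a Galaxy workflow goes further by making that environment re-runnable.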
39
Improve the visibility of a paper
Why not have this as well?
“A Galaxy workflow and the corresponding wrappers are available to download at https://mylab.com. A virtual machine containing a pre-set-up server can be downloaded at the same address.”
40
Galaxy Workflows
41
Why Should I Use Galaxy?
Biologists: Linux-free data analysis with a graphical interface
Bioinformaticians: ensure reproducibility when sharing analyses and workflows; teach their knowledge to a broad audience; get access to workflows for topics they are not familiar with
Software developers: distribute their tools on a standardized platform
42
Wrapping software
Software, XML file
The wrapper prepares the command line
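The idea that a wrapper turns an XML description plus user-supplied values into a command line can be sketched as follows. This is a simplified stand-in, not Galaxy's own machinery: Galaxy fills the `<command>` text with the Cheetah template engine, whereas this sketch uses Python's `string.Template`. The tool definition, the function name `build_command`, and the file names are made up for the illustration.

```python
import shlex
import xml.etree.ElementTree as ET
from string import Template

# A made-up, minimal tool description in the style of a Galaxy wrapper.
TOOL_XML = """
<tool id="line_count" name="Line count">
  <command>wc -l $input > $output</command>
  <inputs>
    <param name="input" type="data" label="Text file"/>
  </inputs>
  <outputs>
    <data name="output" format="txt"/>
  </outputs>
</tool>
"""


def build_command(tool_xml, values):
    """Fill the <command> template with shell-quoted parameter values."""
    tool = ET.fromstring(tool_xml)
    # Every declared input and output must receive a value.
    declared = [p.get("name") for p in tool.findall("./inputs/param")]
    declared += [d.get("name") for d in tool.findall("./outputs/data")]
    missing = [name for name in declared if name not in values]
    if missing:
        raise KeyError("missing values for: " + ", ".join(missing))
    # Quote each value so spaces or shell metacharacters stay inside one argument.
    quoted = {name: shlex.quote(str(values[name])) for name in declared}
    template = Template(tool.findtext("command").strip())
    return template.substitute(quoted)


if __name__ == "__main__":
    print(build_command(TOOL_XML, {"input": "reads.txt", "output": "count.txt"}))
```

The XML file declares which parameters exist, and the command template decides where each value lands on the command line.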
43
Simple wrapper example
44
Wrappers can launch scripts
venn_diagram.sh
45
TopHat wrapper (1)
XML file describing TopHat parameters
46
TopHat wrapper (2)
XML file describing TopHat parameters
47
Community Tools/Wrappers
48
Should I install Galaxy?
Galaxy public servers
Good points: free; no IT tasks; come with reference genomes and workflows
Bad points: offer limited resources (disk/CPUs); data transfer may take a long time; give access only to the tools they choose; data security may not be guaranteed
49
Galaxy Public Servers
https://wiki.galaxyproject.org/PublicGalaxyServers
50
Should I install Galaxy?
Galaxy local server
Good points: total control over data and tools; your own disk and CPU limits; some companies sell a ready-to-use infrastructure; the Tool Shed helps to install wrappers and software
Bad points: cost of installation and maintenance; needs IT support for an advanced multi-user setup
51
Get Galaxy
https://wiki.galaxyproject.org/Admin/GetGalaxy
Can be installed only on Linux or Mac
52
[Architecture diagram. Labels: User; Galaxy server; Tools; DATA; Software; HPC; FTP; NFS; NFS:/software; NFS:/db; /scratch; Sequences; Indexes; Files, Back-up, tmp; 30 days max.; Files > 2Gb]
53
Database engine: the Galaxy team recommends PostgreSQL, but it can be MySQL; stores user details and data information
Tools = wrappers: a file describing all possible parameters of a piece of software; a script preparing the correct command line
Apache server
55
Shared file system: NFS (2 PB), 10 €/TB/group/month
Access to the shared biological resources: Ensembl, UCSC; genomes and indexes; UniProt, Pfam, SMART, PDB
Access to the shared software repository
High Performance Computing: 7 cores, 8 CPUs each (56 total); 47 GB memory
57
FTP server
ProFTPD on the server side
I recommend FileZilla for the client (multiplatform)
Upload from Galaxy: files are moved to the shared file system
58
Summary
Galaxy is an open, web-based platform for computational biomedical research.
Accessible: users without programming experience can run tools and workflows
Reproducible: Galaxy captures analysis details
Transparent: users can share and publish analyses
Wiki: https://wiki.galaxyproject.org/FrontPage
59
Demo on Galaxy@CRG
http://galaxy.crg.es/