Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

People in the BioCore: Jean-François, Luca, Sarah
Roles and topics on the slide: acting head; structural bioinformatics; multiple sequence alignment (MSA); NGS analysis; Galaxy server and training; small ncRNA prediction; motif analysis; micro-arrays; wikis; web/database development; database mirrors (years shown: 2009, 2010)

Our mission
- Expertise in bioinformatics: service, consultation, trainings (internal and external)
- Support in infrastructures, in collaboration with the SIT and TIC
- Part of the CRG bioinformaticians network (bioinformatics retreat), with many more in the PRBB/CNAG

Our services
- Analysis: microarray, ChIP-seq, RNA-seq (DE and assembly), genome assembly, variant calling
- Informatics support: wiki, web server, API
- Trainings: Galaxy, Perl, Linux, advanced bioinformatics

Fee per service

Our contribution to projects
Project conception → bioinformatics experimental design → bioinformatics experimental realization → interpretation of the bioinformatics output → project conclusions
Depending on the project, we either apply a defined procedure or perform a customized analysis.

CRG bioinformatics community
- Big Data working group: EGA initiative, data engineering, NoSQL, HPC
- NGS technical seminars: RNA-seq, genome assembly, variant annotation, metagenomics
- Other topics: integrated -omics, good practice in code development, Galaxy development, …

Micro-arrays
Gene expression array data analysis:
- Background correction and normalization
- Differential expression analysis
- Gene Ontology and pathway analysis
- Various graphics/plots
Additional array-based technologies the Bioinformatics unit supports include qPCR arrays and Comparative Genomic Hybridization (CGH) arrays.
Main tools are based on the R/Bioconductor environment.
(Image source: Wikipedia, Creative Commons)
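
The slide does not say which normalization method is used. Purely as an illustration, here is a minimal pure-Python sketch of quantile normalization, one widely used between-array method; in practice these analyses run in R/Bioconductor, and the function name `quantile_normalize` is hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of intensities per sample).

    After normalization every array has the same distribution of values:
    the mean, across arrays, of the sorted intensities at each rank.
    Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Mean intensity at each rank, computed over the sorted arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity in this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# → [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Each probe keeps its rank within its own array; only the values change, so differences in overall intensity distribution between arrays are removed before differential expression analysis.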

RNA-seq

DNA-seq

Pevzner P A et al. PNAS 2001;98:
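
The cited work by Pevzner et al. proposed assembling DNA fragments by finding an Eulerian path in a de Bruijn graph of k-mers. A toy sketch of that idea, assuming error-free reads from a single strand and no repeats longer than k; real assemblers must handle sequencing errors, reverse complements and repeats, and all function names here are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: each distinct k-mer is an edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def eulerian_path(graph):
    """Hierholzer's algorithm on a directed graph given as adjacency lists."""
    adj = {node: list(succ) for node, succ in graph.items()}
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for node, succ in graph.items():
        out_deg[node] += len(succ)
        for nxt in succ:
            in_deg[nxt] += 1
    # Start where out-degree exceeds in-degree by one (path), else anywhere (cycle).
    starts = [n for n in graph if out_deg[n] - in_deg[n] == 1]
    stack, path = [starts[0] if starts else next(iter(graph))], []
    while stack:
        node = stack[-1]
        if adj.get(node):
            stack.append(adj[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell a contig from the Eulerian path through the (k-1)-mer nodes."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble(["ACGTT", "GTTGA"], k=3))  # → ACGTTGA
```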

Chip-seq

Growing to the next level
- From gene DE to transcript DE: users now have access to longer reads and deeper coverage
- Metagenomics: 16S ribosomal RNA amplicon sequencing with MiSeq
- Data integration framework: combining different data types (RNA-seq DE, histone marks, metabolomics data, proteomics) into one single analysis
- Data analysis workflows on Galaxy: leave the basic processing to users and focus on advanced analysis

Databases mirroring
- Biological file sources: ENSEMBL, UCSC, NCBI BLAST DBs, UniProt, PDB, iGenomes (Illumina; only human for now, the rest is upcoming)
- All indexed and formatted for: NCBI BLAST+ (makeblastdb, for proteins and nucleic acids), Bowtie and Bowtie2, BWA, fastaindex (Exonerate), GEM, faToTwoBit
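
A minimal sketch of how such indexes could be generated for one FASTA file: it only builds the command lines, without running them. The file name `genome.fa` and the helper `index_commands` are hypothetical, and GEM and fastaindex are left out because their exact options are not given here.

```python
import shlex
from pathlib import Path


def index_commands(fasta, dbtype="nucl"):
    """Return argv lists that index one FASTA file for several tools.

    dbtype is "nucl" or "prot", the two values accepted by makeblastdb -dbtype.
    """
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    fasta = Path(fasta)
    prefix = str(fasta.with_suffix(""))  # e.g. genome.fa -> genome
    cmds = [["makeblastdb", "-in", str(fasta), "-dbtype", dbtype, "-out", prefix]]
    if dbtype == "nucl":
        # Read aligners and the UCSC 2bit format only make sense for nucleotides.
        cmds += [
            ["bowtie-build", str(fasta), prefix],
            ["bowtie2-build", str(fasta), prefix],
            ["bwa", "index", str(fasta)],  # index files are written next to the FASTA
            ["faToTwoBit", str(fasta), prefix + ".2bit"],
        ]
    return cmds


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True) to execute it.
    for cmd in index_commands("genome.fa"):
        print(shlex.join(cmd))
```

Returning argv lists rather than shell strings keeps file names with spaces safe when the commands are later passed to `subprocess.run`.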

Where are they stored?
- In the CRG common storage: /db
- More information:
- IMPORTANT: /db/seq (formerly /seq) is DEPRECATED
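
Since /db/seq (formerly /seq) is deprecated, old pipelines and configuration files may still point there. A small hypothetical helper for auditing such paths; the example paths are made up for illustration.

```python
from pathlib import PurePosixPath

# Locations flagged as deprecated on the slide.
DEPRECATED_ROOTS = (PurePosixPath("/db/seq"), PurePosixPath("/seq"))


def uses_deprecated_location(path):
    """True if path is /db/seq, /seq, or anything below them."""
    p = PurePosixPath(path)
    return any(p == root or root in p.parents for root in DEPRECATED_ROOTS)


for candidate in ["/seq/genomes/genome.fa", "/db/some_resource/genome.fa"]:
    print(candidate, uses_deprecated_location(candidate))
```

Comparing path components (rather than string prefixes) avoids false positives such as `/db/sequences`.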

Web and database services
Applications:
- Data and project management
- Platforms for big data analysis and complex information querying
- Promotion and publication of scientific results

Web and database services, example: SuperFly, for Yogi Jaëger
Visual catalogue of gene expression in the embryonic development of different fly species.

Web and database services, example: PRGDB, with Walter Sanseverino
Wiki-based database of plant resistance genes.

Activity per category in 2014

Presentation of the Galaxy platform
Jean-François Taly, Bioinformatics Core Facility, CRG (Barcelona, Catalonia, Spain)
September 18th, 2014. EMBO Global Exchange Course, Pasteur Institute of Tunis, Tunisia

Why Should I Use Galaxy?
- Biologists: Linux-free data analysis with a graphical interface
- Bioinformaticians: ensure reproducibility when sharing analyses and workflows; teach their knowledge to a broad audience; get access to workflows for topics they are not familiar with
- Software developers: distribute their tools on a standardized platform

The Galaxy Team
Galaxy is developed by:
- The Nekrutenko lab in the Center for Comparative Genomics and Bioinformatics at Penn State University
- The Taylor lab at Johns Hopkins University
- The community

Rationale behind Galaxy (from Goeks et al., Genome Biol.):
“Computation has become an essential tool in life science research. This is exemplified in genomics, where first microarrays and now massively parallel DNA sequencing have enabled a variety of genome-wide functional assays, such as ChIP-seq and RNA-seq (and many others), that require increasingly complex analysis tools. However, sudden reliance on computation has created an 'informatics crisis' for life science researchers: computational resources can be difficult to use, and ensuring that computational experiments are communicated well and hence reproducible is challenging. Galaxy helps to address this crisis by providing an open, web-based platform for performing accessible, reproducible, and transparent genomic science.”

Makes bioinformatics accessible

From a command line …

… to a graphical interface

One step

Multi-step protocol

Workflow
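
To make the "one step → multi-step protocol → workflow" progression concrete outside Galaxy, here is a toy Python sketch: a workflow as an ordered list of parameterized steps, each output feeding the next step, with every step and its parameters recorded. This is not Galaxy's workflow engine (Galaxy workflows are graphs of tools operating on datasets); `Step`, `run_workflow` and the toy read-processing steps are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    name: str
    func: Callable[..., Any]
    params: dict = field(default_factory=dict)


def run_workflow(steps, data):
    """Run steps in order, feeding each output into the next step.

    Returns the final output and a history of (step name, parameters),
    loosely mirroring how a Galaxy history records what was run.
    """
    history = []
    for step in steps:
        data = step.func(data, **step.params)
        history.append((step.name, dict(step.params)))
    return data, history


# Hypothetical toy steps operating on a list of read sequences.
def trim(reads, length):
    return [r[:length] for r in reads]


def drop_short(reads, min_len):
    return [r for r in reads if len(r) >= min_len]


def count(reads):
    return len(reads)


workflow = [
    Step("trim", trim, {"length": 4}),
    Step("filter", drop_short, {"min_len": 3}),
    Step("count", count),
]
result, history = run_workflow(workflow, ["ACGTAC", "AC", "GGGTT"])
print(result)  # → 2
```

Because the workflow is plain data (steps plus parameters), it can be rerun on new inputs unchanged, which is the property that makes shared workflows reproducible.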

Galaxy Tutorials

NGS in a laptop: MinION brings NGS to your laptop

Reproducibility: bioinformaticians suffer from it too!
- Results can change depending on library and software versions, and on genome annotations
- Results are published without the code
Want to share your findings with everybody?
- Freeze an environment in a virtual machine
- Use an application container (Docker)
- Prepare a Galaxy workflow
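
Alongside a virtual machine, a Docker container or a Galaxy workflow, it helps to record the exact versions the results depend on. A minimal sketch, assuming the analysis runs in Python; `environment_manifest` and the annotation label are hypothetical.

```python
import json
import platform
from importlib import metadata


def environment_manifest(packages, genome_annotation=None):
    """Record interpreter, OS, package versions and the annotation release used."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "genome_annotation": genome_annotation,
    }


if __name__ == "__main__":
    manifest = environment_manifest(["numpy", "pysam"], genome_annotation="example-annotation-v1")
    print(json.dumps(manifest, indent=2))
```

Saving this JSON next to the results gives readers the information needed to rebuild a matching environment.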

Improve the visibility of a paper:
“A Galaxy workflow and the corresponding wrappers are available to download at […]. A virtual machine containing a pre-set-up server can be downloaded at the same address.”
Why not have […] as well?

Galaxy Workflows

Wrapping software
The wrapper, an XML file, prepares the command line that runs the software.
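
Galaxy wrappers are XML files whose command section is a Cheetah template filled in with the user's parameters. The sketch below mimics that idea with Python's `string.Template` as a stand-in for Cheetah, on a hypothetical and heavily simplified tool; it is not Galaxy's actual parser.

```python
import shlex
import xml.etree.ElementTree as ET
from string import Template

# A simplified, hypothetical tool description in the spirit of a Galaxy wrapper.
TOOL_XML = """<tool id="line_count" name="Count lines" version="0.1">
  <command>wc -l $input > $output</command>
  <inputs>
    <param name="input" type="data" format="txt" label="Text file"/>
  </inputs>
  <outputs>
    <data name="output" format="txt"/>
  </outputs>
</tool>"""


def build_command(tool_xml, values):
    """Fill the <command> template with one value per declared input/output."""
    root = ET.fromstring(tool_xml)
    declared = {p.get("name") for p in root.iter("param")}
    declared |= {d.get("name") for d in root.iter("data")}
    missing = declared - set(values)
    if missing:
        raise ValueError(f"missing values for: {sorted(missing)}")
    # Quote every value so file names with spaces cannot break the shell command.
    quoted = {name: shlex.quote(str(v)) for name, v in values.items()}
    return Template(root.findtext("command").strip()).substitute(quoted)


print(build_command(TOOL_XML, {"input": "reads.txt", "output": "count.txt"}))
# → wc -l reads.txt > count.txt
```

Keeping the parameter declarations and the command template in one file is what lets Galaxy both draw the tool form and build the command line from the same description.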

Simple wrapper example

venn_diagram.sh: a wrapper can launch scripts

TopHat wrapper (1): XML file describing the TopHat parameters

TopHat wrapper (2): XML file describing the TopHat parameters
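
What the TopHat wrapper ultimately produces is a command line. A hypothetical Python sketch of that translation step: the options used (`-o`, `-p`, `-G`, `-r`) and the positional order `<index> <reads1> [<reads2>]` follow TopHat's command-line usage, but the function, its parameter names and the file names are assumptions, not the actual wrapper.

```python
def tophat_command(index, reads1, reads2=None, output_dir="tophat_out",
                   threads=1, gtf=None, mate_inner_dist=None):
    """Build a TopHat argv list from wrapper-style parameters.

    reads2=None means single-end; passing reads2 switches to paired-end,
    mirroring a single/paired conditional in a tool form.
    """
    cmd = ["tophat", "-o", output_dir, "-p", str(threads)]
    if gtf is not None:
        cmd += ["-G", gtf]  # known transcript annotations
    if reads2 is not None and mate_inner_dist is not None:
        cmd += ["-r", str(mate_inner_dist)]  # expected inner distance between mates
    cmd += [index, reads1]
    if reads2 is not None:
        cmd.append(reads2)
    return cmd


print(" ".join(tophat_command("hg19", "r1.fq", "r2.fq", threads=4,
                              gtf="genes.gtf", mate_inner_dist=200)))
# → tophat -o tophat_out -p 4 -G genes.gtf -r 200 hg19 r1.fq r2.fq
```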

Community Tools/Wrappers

Should I install Galaxy? Galaxy public servers
Good points:
- Free
- No IT tasks
- Come with reference genomes and workflows
Bad points:
- Limited resources (disk/CPUs)
- Data transfer may be slow
- Only the tools chosen by the server administrators are available
- Data security may not be guaranteed

Galaxy Public Servers

Should I install Galaxy? Galaxy local server
Good points:
- Total control over data and tools
- Disk and CPU limits are those of your own infrastructure
- Some companies sell a ready-to-use infrastructure
- The Tool Shed helps to install wrappers and software
Bad points:
- Cost of installation and maintenance
- IT support needed for an advanced multi-user setup

Get Galaxy
- Can only be installed on Linux or Mac

Diagram: CRG Galaxy server infrastructure. Components shown: Galaxy server, Tools, Software, DATA, User, FTP (files > 2 GB), NFS:/software, NFS:/db (sequences, indexes), HPC, /scratch (30 days max.), files, back-up, tmp.
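
Reading "30 days max." on /scratch as a retention period (an assumption: the diagram does not say how it is enforced), a minimal sketch that lists files past that age. It only reports candidates and deletes nothing; `expired_files` and the demo file names are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def expired_files(root, max_age_days=30, now=None):
    """List files under root not modified for more than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.stat().st_mtime < cutoff)


if __name__ == "__main__":
    # Demo in a throwaway directory: backdate one file by 40 days.
    with tempfile.TemporaryDirectory() as scratch:
        old, new = Path(scratch, "old.bam"), Path(scratch, "new.bam")
        old.touch()
        new.touch()
        forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
        os.utime(old, (forty_days_ago, forty_days_ago))
        print([p.name for p in expired_files(scratch)])  # → ['old.bam']
```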

- Database engine: the Galaxy team recommends PostgreSQL, but MySQL can also be used. It stores user details and information about the data.
- Tools = wrappers: a file describing all possible parameters of a software, plus a script preparing the correct command line
- Apache server
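
Galaxy reads the location of this database from the `database_connection` option of its configuration file, written as an SQLAlchemy-style URL. A small hypothetical helper that builds such a URL; the user, password and host names are placeholders.

```python
from urllib.parse import quote


def galaxy_db_url(user, password, host, dbname, engine="postgresql", port=None):
    """Build an SQLAlchemy-style URL such as postgresql://user:pass@host/db."""
    if engine not in ("postgresql", "mysql"):
        raise ValueError("engine must be 'postgresql' or 'mysql'")
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    if port is not None:
        netloc += f":{port}"
    return f"{engine}://{netloc}/{dbname}"


print(galaxy_db_url("galaxy", "s3cr@t", "localhost", "galaxy"))
# → postgresql://galaxy:s3cr%40t@localhost/galaxy
```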

- Shared file system: NFS (2 PB), 10 € per TB per group per month
- Access to the shared biological resources: Ensembl and UCSC genomes and indexes; UniProt, Pfam, SMART, PDB
- Access to the shared software repository
- High-performance computing: 7 nodes with 8 CPUs each (56 in total), 47 GB of memory
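
A worked example of the storage rate: a group keeping 5 TB for 12 months pays 10 €/TB/month × 5 TB × 12 months = 600 €. The sketch below assumes billing is linear in terabytes and months (the slide does not say how partial terabytes or months are rounded); the volumes are hypothetical.

```python
EUR_PER_TB_PER_MONTH = 10  # rate quoted for the shared NFS storage


def storage_cost_eur(terabytes, months=1):
    """Cost for one group, assuming linear (pro-rata) billing."""
    if terabytes < 0 or months < 0:
        raise ValueError("terabytes and months must be non-negative")
    return EUR_PER_TB_PER_MONTH * terabytes * months


# Example: a group keeping 5 TB for a year.
print(storage_cost_eur(5, months=12))  # → 600
```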

- FTP server: ProFTPD on the server side; I recommend FileZilla for the client (multiplatform)
- Upload from Galaxy: files are moved to the shared file system
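
The infrastructure diagram routes files larger than 2 GB through FTP. A minimal sketch of that decision plus an upload with Python's standard `ftplib`, doing what a client such as FileZilla does; reading "2Gb" as 2 GiB is an assumption, and the host and credentials in the usage comment are placeholders.

```python
from ftplib import FTP
from pathlib import Path

# Assumption: the "Files > 2Gb" limit in the diagram read as 2 GiB.
FTP_THRESHOLD_BYTES = 2 * 1024**3


def needs_ftp(size_bytes):
    """True if a file is above the size limit for a direct upload."""
    return size_bytes > FTP_THRESHOLD_BYTES


def ftp_upload(local_path, host, user, password):
    """Upload one file to the server's working directory after login."""
    local_path = Path(local_path)
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        with local_path.open("rb") as fh:
            ftp.storbinary(f"STOR {local_path.name}", fh)


# Usage (host and credentials are placeholders):
# if needs_ftp(Path("sample.fastq.gz").stat().st_size):
#     ftp_upload("sample.fastq.gz", "ftp.example.org", "me@example.org", "secret")
```

Once on the FTP server, the file is imported from Galaxy's upload tool and moved to the shared file system.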

Summary
- Galaxy is an open, web-based platform for computational biomedical research
- Accessible: users without programming experience can run tools and workflows
- Reproducible: Galaxy captures analysis details
- Transparent: users can share and publish analyses
- Wiki:

Demo on