EMBRACE An example of Grid Integration (I): The EMBRACE project Jean SALZEMANN CNRS/IN2P3.



Introduction
EMBRACE is an EU-sponsored Network of Excellence aimed at enabling bioinformatics research through better operability of databases, servers, and services.

Example
You want to predict phosphorylation sites just outside transmembrane helices in 1329 membrane proteins.
Yesterday:
1) Obtain software to predict transmembrane helices;
2) Obtain software to predict phosphorylation sites;
3) Install both programs;
4) Write software that calls both programs;
5) Write software that combines outputs and presents results.
Tomorrow:
1) Import APIs for the two services;
2) Write software that combines outputs and presents results.
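The "tomorrow" workflow can be sketched as follows. This is a minimal illustration, not EMBRACE code: `predict_tm_helices` and `predict_phospho_sites` are hypothetical stand-ins for the two remote services (here they return canned data), and the 10-residue window used to define "just outside" a helix is an assumption.

```python
from typing import List, Tuple

Helix = Tuple[int, int]  # (start, end) residue positions, 1-based, inclusive


def predict_tm_helices(sequence: str) -> List[Helix]:
    """Hypothetical stand-in for a remote transmembrane-helix service."""
    return [(20, 40), (60, 80)]  # canned result, for illustration only


def predict_phospho_sites(sequence: str) -> List[int]:
    """Hypothetical stand-in for a remote phosphorylation-site service."""
    return [5, 15, 30, 43, 55, 95]  # canned result, for illustration only


def sites_near_helices(sites: List[int], helices: List[Helix],
                       window: int = 10) -> List[int]:
    """Keep sites outside every helix but within `window` residues of one."""
    selected = []
    for site in sites:
        inside = any(start <= site <= end for start, end in helices)
        near = any(start - window <= site < start or end < site <= end + window
                   for start, end in helices)
        if near and not inside:
            selected.append(site)
    return selected


def analyse(sequence: str) -> List[int]:
    """Combine the outputs of the two (stubbed) services for one protein."""
    helices = predict_tm_helices(sequence)
    sites = predict_phospho_sites(sequence)
    return sites_near_helices(sites, helices)


if __name__ == "__main__":
    print(analyse("M" * 100))  # placeholder sequence; the stubs ignore it
```

With real services, only the two stub functions would change; the combining step stays local, which is the point of the slide.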

The Goal Of EMBRACE
EMBRACE aims at building a « knowledge grid » allowing integrated exploitation of data:
– collection, curation and provision of biomolecular information
– availability of most of the popular databases and software products
– tools and programming interfaces to exploit that information
– taking away the need for maintaining local copies of databases and software

Data
EMBRACE includes nearly all European bioinformaticians with longstanding track records in providing databases, servers, and services.
Data types that they will make available: DNA sequences, protein sequences, macromolecular structures, SNPs, expression information, alignments, untranslated regions, structure domains, protein families, literature, electron micrographs, orthologs, ORFs, genome annotation, proteomics patterns, GPCRs, protein interactions, nucleotid

Software
EMBRACE includes nearly all large European bioinformatics centers, all of which will make their servers, services, and computational tools available using the EMBRACE-GRID.
Computational facilities that all European bioinformaticians will get at their fingertips include: DNA sequence analysis, genome annotation, homology searches at sequence and structure level, structure analysis, visualization, protein sequence analysis, phylogeny, protein domain mapping, pattern matching, HMM, neural nets, micro-arrays, workflow management, text-mining, systems biology, database techno

Education
The EMBRACE portal lists the courses that EMBRACE has presented and will present:
July 2005 | France | Grid technology
October 2005 | England | Data modeling and integration
February 2006 | England | Portal tools
October 2006 | Finland | Tools for grid usage
February 2007 | Denmark | Bioinformatics of immunology
April 2007 | Sweden | Regulatory sequence motifs
(10 more courses not listed)
July 2009 | Spain | Databases and gene annotation

The EMBRACE Challenge
– Applied bioinformatics needs various computer resources
– The number and size of databases and tools are growing rapidly
– Systems biology is predicted to become more important
– There are many existing tools and data sources to integrate

Technology Recommendation
– Use Web Services, especially the WS-I profile
– Use XML Schema to describe data types
– Give standard definitions to data types
– Use standardized database interfaces (make workflows with the EMBRACE services)

Web Services advantages
– Replace local resources with remote resources
– Web Services provide a standardized access method
– Web Services are widely adopted in the bioinformatics community
– They are evolving constantly with new specifications

The EMBRACE VO on EGEE
– Infrastructure to deploy CPU-intensive and data-intensive applications
– Testbed to validate the technology recommendation
– 400 CPUs and 3 TB of data storage

Statistics on EGEE

Sites of EMBRACE VO
Site | City, country | Number of nodes | Related EMBRACE partner
CSC | Espoo, Finland | 124 |
National Supercomputer Center | Linköping, Sweden | 97 | -
Umea University | Umea, Sweden | 94 | -
Center for Parallel Computers (PDC) | Stockholm, Sweden | 82 | -
Centro Extremeño de Tecnologías Avanzadas, CIEMAT | Trujillo, Spain | 40 | -
Life sciences Institut Universitaire de Technologie | Aurillac, France | 38 | 7 (CNRS)
Institute of Biomedical Technology | Bari, Italy | 22 | 2 (CNR-ITB Bari)
Center for Molecular and Biomolecular Informatics | Nijmegen, NL | 17 | CMBI
Centro Nacional de Biotecnología | Madrid, Spain | 16 | 9 (CNB)

An Example of Application: PDB Database Refinement
Recomputation of protein structures in 3 steps, using the WISDOM Environment.
Deployment on EGEE (Spring 2007):
– 673 CPUs used
– 70000 jobs submitted
– 17 CPU years
– 500 GB of data produced
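A back-of-the-envelope reading of these deployment figures, as a rough sketch. The averages below are derived here and are not stated on the slide; they assume a 365-day year, 1 GB = 1000 MB, and (for the wall-clock bound) that all 673 CPUs were busy the whole time, which a real grid run would not achieve.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year

# Figures taken from the slide
cpus_used = 673
jobs_submitted = 70_000
cpu_years = 17
data_produced_gb = 500

# Derived quantities (not stated in the source)
total_cpu_hours = cpu_years * HOURS_PER_YEAR
avg_cpu_hours_per_job = total_cpu_hours / jobs_submitted
avg_output_mb_per_job = data_produced_gb * 1000 / jobs_submitted  # 1 GB = 1000 MB
min_wall_clock_days = cpu_years * 365 / cpus_used  # ideal, fully parallel case

if __name__ == "__main__":
    print(f"average CPU time per job: {avg_cpu_hours_per_job:.2f} h")
    print(f"average output per job:   {avg_output_mb_per_job:.2f} MB")
    print(f"lower bound on duration:  {min_wall_clock_days:.1f} days")
```

Under these assumptions an average job takes roughly 2.1 CPU hours and produces about 7 MB, and the whole campaign could not have finished in much less than about 9 days.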

Application example: WISDOM
– Type of computations: docking against databases of proteins and ligands
– Web service interface to submit jobs
– Users can use the interface to send docking jobs without specific knowledge of the grid, and embed dockings into their workflows
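To give an idea of what a web service job submission could look like, here is a minimal sketch that builds a SOAP 1.1 request in Python. The slides do not describe the actual WISDOM interface: the `submitDocking` operation, its `target` and `ligandDatabase` parameters, and the `urn:example:docking` namespace are all hypothetical. SOAP is assumed because the WS-I Basic Profile is built around SOAP and WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:docking"  # hypothetical namespace, not the real service


def build_docking_request(target_id: str, ligand_db: str) -> bytes:
    """Build a SOAP envelope for a hypothetical `submitDocking` operation."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}submitDocking")
    # Parameter names are illustrative assumptions
    ET.SubElement(operation, f"{{{SERVICE_NS}}}target").text = target_id
    ET.SubElement(operation, f"{{{SERVICE_NS}}}ligandDatabase").text = ligand_db
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    # Placeholder identifiers; a real client would POST this to the service URL
    print(build_docking_request("target-demo", "ligands-demo").decode("utf-8"))
```

The caller only supplies a target and a ligand database; everything about grid job submission stays behind the service, which is what lets dockings be embedded in a workflow.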

Application example: Automatic update of databases
– Service that automatically replicates and updates biological databases (file databases)
– Web service interface to deploy new databases or retrieve the status of a deployment
– The service can also be used in a workflow to run an update before an experiment
– Hides data management from users who are not grid experts
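The "update before experiment" pattern can be sketched as a workflow step. This is an illustration under stated assumptions: `ReplicationService` is an in-memory stand-in, and its `status` and `deploy` methods are hypothetical names mirroring the two operations mentioned on the slide, not the real service API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ReplicationService:
    """In-memory stand-in for the replication web service (hypothetical API)."""
    deployed: Dict[str, str] = field(default_factory=dict)  # database name -> release

    def status(self, name: str) -> str:
        """Return the deployed release of a database, or 'not deployed'."""
        return self.deployed.get(name, "not deployed")

    def deploy(self, name: str, release: str) -> None:
        """Deploy (or update to) the given release of a database."""
        self.deployed[name] = release


def run_with_fresh_database(service: ReplicationService, name: str,
                            latest_release: str,
                            experiment: Callable[[str, str], str]) -> str:
    """Update the database if it is stale, then run the experiment on it."""
    if service.status(name) != latest_release:
        service.deploy(name, latest_release)
    return experiment(name, service.status(name))


if __name__ == "__main__":
    service = ReplicationService()
    # Placeholder database name and release label
    print(run_with_fresh_database(service, "demo_db", "release-2",
                                  lambda db, rel: f"ran on {db} ({rel})"))
```

The experiment code never touches replication details; it only receives a database that is known to be current.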

Contacts
EMBRACE is coordinated by Graham Cameron and Kerstin Nyberg at the EBI.
– Peter Rice coordinates the content integration
– Alan Bleasby coordinates the tools integration
– Vincent Breton coordinates technology recommendation
– Erik Bongcam Rudloff coordinates the test cases
– Gert Vriend coordinates outreach and education

Acknowledgements
The EMBRACE project is funded by the European Commission within its FP6 Programme, under the thematic area "Life sciences, genomics and biotechnology for health," contract number LHSG- CT