FP6−2004−Infrastructures−6-SSA-026634 Biological applications in GRID: the EUChinaGRID experience F. Polticelli – University Roma Tre EUChinaGRID WP4-Applications.

Presentation transcript:

FP6−2004−Infrastructures−6-SSA Biological applications in GRID: the EUChinaGRID experience F. Polticelli – University Roma Tre EUChinaGRID WP4-Applications Manager Budapest,

Fabio Polticelli, UROM3, EGEE07, Budapest

Outline
- EUChinaGRID overview
- The structural genomics challenge
- Biological applications in EUChinaGRID
  - The "never born proteins"
  - Protein structure prediction using GRID
    - Rosetta integration within the GENIUS portal
    - Early/Late stage integration in the GridSphere portal
  - Structure validation using GRID
    - AMBER deployment on GRID
- Conclusions and perspectives
  - Function recognition and catalytic site identification tools
  - In silico structural genomics

EUChinaGRID overview
- Aim
  - Provide support actions to foster the integration and interoperability of the Grid infrastructures in Europe (EGEE) and China (CNGrid).
  - Promote the migration of new applications to the Grid infrastructures by training new user communities and supporting the adoption of grid tools for scientific applications.
- Applications
  - Validate the intercontinental infrastructure using scientific applications.
  - Facilitate the porting of new applications relevant to scientific and industrial collaboration between Europe and China.
  - Three main application fields:
    - EGEE applications (ATLAS and CMS)
    - Astroparticle physics applications (the ARGO experiment)
    - Biology applications ("never born proteins")

The structural genomics challenge
- The combination of the 20 natural amino acids in a specific sequence in a protein chain dictates the three-dimensional structure of the protein.
- Protein function is linked to the specific three-dimensional arrangement of amino acid functional groups.
- With the advancement of molecular biology techniques, a huge amount of information on protein sequences has become available, but far less is known about the structure and function of these proteins.
- Prediction of protein structure and function is a key instrument to better understand the principles of protein folding and to exploit the information provided by the "genomic revolution".

The test case: never born proteins
- With 20 different comonomers, a protein chain of just 60 amino acids can theoretically exist in 20^60 (about 10^78) chemically and structurally unique combinations.
- But the number of natural proteins (10^9 to a maximum of …) is just a tiny fraction of all possible proteins.
- There exists a huge number of protein sequences that have never been exploited by biological systems, in other words an enormous number of "never born proteins" (NBP). These raise the following questions:
  - By which criteria have the existing proteins been selected?
  - Do natural proteins have peculiar properties, for example in terms of thermal stability, solubility in water or amino acid composition?
  - Can NBP be exploited for biomedical and/or biotechnological purposes?
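
The size of this sequence space follows from the figures on the slide. The short Python check below is only a worked calculation, assuming 20 amino acids and a chain length of 60 as stated, and using the slide's lower estimate of 10^9 natural proteins; the names are illustrative.

```python
import math

AMINO_ACIDS = 20    # natural amino acids, as stated on the slide
CHAIN_LENGTH = 60   # chain length used in the slide's example


def sequence_space(alphabet: int, length: int) -> int:
    """Number of distinct linear sequences of the given length."""
    return alphabet ** length


total = sequence_space(AMINO_ACIDS, CHAIN_LENGTH)
order = math.floor(math.log10(total))
print(f"20^60 is on the order of 10^{order}")

# Share of the sequence space covered by 10^9 natural proteins
# (the lower estimate quoted on the slide), as a power of ten.
natural_lower = 10 ** 9
fraction_exp = math.log10(natural_lower) - math.log10(total)
print(f"10^9 natural proteins cover roughly 10^{fraction_exp:.0f} of that space")
```

Even with a much larger estimate for the number of natural proteins, the covered fraction stays vanishingly small, which is the point the slide makes.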

Never born proteins and GRID
- The problem is tackled by a "high throughput" approach made feasible by the use of the GRID infrastructure.
- A huge library of random amino acid sequences of fixed length (n=70) is generated.
- "Ab initio" protein structure prediction software is used.
- The structural characteristics of the resulting proteins are analysed:
  - frequency of compact and yet unknown folds
  - presence of putative catalytic sites
  - experimental validation on "interesting" cases
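
The library generation step can be sketched as follows. This is a minimal illustration, not the project's actual generator: it assumes uniform sampling over the 20 one-letter amino acid codes, and any filtering against known natural sequences is not shown. The function names and the `NBP` identifier prefix are hypothetical.

```python
import random
from typing import List

# One-letter codes of the 20 natural amino acids
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_sequence(length: int, rng: random.Random) -> str:
    """Draw one sequence, each residue chosen uniformly at random (assumption)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def build_library(size: int, length: int = 70, seed: int = 0) -> List[str]:
    """Return `size` distinct random sequences of fixed length (n=70 on the slide)."""
    rng = random.Random(seed)  # fixed seed so that a library can be regenerated
    library = set()
    while len(library) < size:
        library.add(random_sequence(length, rng))
    return sorted(library)


def to_fasta(sequences: List[str], prefix: str = "NBP") -> str:
    """Write the library in FASTA format with hypothetical identifiers."""
    records = [f">{prefix}_{i:06d}\n{seq}" for i, seq in enumerate(sequences)]
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    print(to_fasta(build_library(3)), end="")
```

Each record of the resulting file would then be the input of one structure prediction job.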

The tool: Rosetta abinitio
- Developed by David Baker, University of Washington.
- Based on a "fragment assembly" strategy.
- Uses a semi-empirical force field to evaluate the thermodynamics of the predicted structure.
- Particularly successful in the prediction of novel folds in the CASP competitions (Critical Assessment of Structure Prediction).
- Rosetta abinitio has been deployed in GRID through the GENIUS interface, with the option of parametric job submission to run a large number of jobs (structure predictions) at the same time.

First step: integration on the GILDA facility
- Single job execution on GILDA
  - A shell script has been prepared which:
    - registers the program executable and the required input files (fragment libraries and secondary structure prediction file) in the LFC catalogue
    - calls the Rosetta executable and proceeds with workflow execution.
  - A JDL file was created to run the application on the GILDA worker nodes, which use the gLite middleware.

Integration on the GENIUS web portal
- To facilitate the use of the Rosetta abinitio application within the grid environment by the computational biology community, the application was integrated within the GENIUS portal.
- After MyProxy server initialization, the upload of input files and executable, JDL file preparation, application execution, run status monitoring and download of the output file are all carried out from within the portal.
- Given the huge number of "never born proteins" to be simulated, a procedure for the automatic generation of parametric JDL files has been set up within the GENIUS environment.
- More than 2×10^4 never born protein structures have been predicted so far.
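
A parametric JDL generator of the kind mentioned above might look like the sketch below. It is not the GENIUS procedure itself. The attribute names (`JobType = "Parametric"`, `Parameters`, `ParameterStart`, `ParameterStep`, and the `_PARAM_` placeholder) are given as recalled from the gLite JDL documentation and should be checked against the middleware version in use. The executable and file names are hypothetical.

```python
from typing import Sequence


def parametric_jdl(
    n_jobs: int,
    executable: str = "rosetta_run.sh",                 # hypothetical wrapper script
    input_files: Sequence[str] = ("fragments.tar.gz",),  # hypothetical input bundle
) -> str:
    """Render a parametric JDL in which _PARAM_ is replaced per sub-job."""
    sandbox = ", ".join(f'"{name}"' for name in (executable, *input_files))
    lines = [
        'JobType = "Parametric";',
        f"Parameters = {n_jobs};",
        "ParameterStart = 0;",
        "ParameterStep = 1;",
        f'Executable = "{executable}";',
        # The sub-job index is passed to the wrapper, which can use it
        # to select one sequence from the library.
        'Arguments = "_PARAM_";',
        'StdOutput = "nbp__PARAM_.out";',
        'StdError = "nbp__PARAM_.err";',
        f"InputSandbox = {{{sandbox}}};",
        'OutputSandbox = {"nbp__PARAM_.out", "nbp__PARAM_.err"};',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(parametric_jdl(1000), end="")
```

A single submission of such a file would expand into one sub-job per parameter value, so that thousands of predictions need not be described one by one.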

GENIUS screenshots

Never born protein structure examples

Early/Late stage
- Developed by the group of Irena Roterman, Jagiellonian University.
- A program for protein folding simulation rather than structure prediction (an approach complementary to Rosetta).
- Based on two stages:
  - early stage: statistics using a database of known sequences;
  - late stage: energy minimization in alternating potentials; this is the most computationally expensive stage.
- Early/Late stage has been deployed in GRID through the GridSphere Portal Framework and the Gridwise Tech LCG API package, which provides access to the gLite middleware.

Early/Late stage: deployment
- A self-contained bundle of the programs and libraries needed to run the application was created and registered in the LFC catalogue.
- A script was created to install the application on site each time a job is started.
- A JDL file was created to run the application on the grid, which uses the gLite middleware.
- Finally, to let users who are not familiar with the grid run the application, it was integrated in a web portal based on the GridSphere Portal Framework.
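
The "install on site at job start" step can be illustrated with the sketch below. It assumes the bundle is a tar archive that has already been retrieved to the worker node; the download from the LFC catalogue is deliberately left out. The function name, arguments and command are hypothetical, not the project's actual script.

```python
import subprocess
import tarfile
import tempfile
from pathlib import Path
from typing import List, Optional


def install_and_run(bundle: Path, command: List[str],
                    workdir: Optional[Path] = None) -> int:
    """Unpack a self-contained application bundle and run a command inside it.

    Returns the exit code of the command so that the calling job wrapper
    can report success or failure.
    """
    # Install into a fresh scratch directory unless one is given.
    target = Path(workdir) if workdir else Path(tempfile.mkdtemp(prefix="app_"))
    target.mkdir(parents=True, exist_ok=True)

    # The bundle is assumed to come from a trusted source (the project's
    # own catalogue entry), so it is extracted as-is.
    with tarfile.open(bundle) as archive:
        archive.extractall(target)

    completed = subprocess.run(command, cwd=target)
    return completed.returncode
```

Installing at every job start costs some transfer and unpacking time, but it means the application does not have to be pre-installed at each site.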

Never born proteins: what's next?
- Build consensus
- Experimental validation
  - Nuclear magnetic resonance (NMR) data are acquired in the NMR centre of Peking University.
  - Experimental data contain all the information about the primary structure of the protein, its topology and bonds.
  - NMR structure calculation and refinement is an iterative process which, for a single protein, involves many starting structures (normally 200 structures per round), and each protein may need … (or more) rounds of calculations.

AMBER porting on GRID
- A simple JDL file and a set of scripts to run the program have been developed:

    Executable = "amber_serial.sh";
    StdOutput = "testJob.out";
    StdError = "testJob.err";
    InputSandbox = {"amber_serial.sh","amber_test/amber_grid.tar"};
    OutputSandbox = {"testJob.out","testJob.err","out.tar"};
    Requirements = other.GlueCEUniqueID == "gridce.roma3.infn.it:2119/jobmanager-lcgpbs-grid";

- The program is currently being tested by the Peking University NMR group.

What we have done
- Structural bioinformatics is a key instrument to exploit the huge amount of data available on human and pathogen genes.
- In the EUChinaGRID project we set up a system to predict the three-dimensional structure of a large number of protein sequences, to validate the predictions and to test them experimentally.

What we plan to do
- We are currently refining function recognition (ASSIST) and catalytic site identification tools (Early/Late Stage).
- In silico structural genomics of bacterial and viral pathogens:
  - Low-cost activity
  - High potential biomedical impact with small investments
  - Application to endemic human and animal pathogens of developing countries
  - Synergy with the pharmaceutical industry

Acknowledgements
- Prof. Luisi for the original idea of "never born proteins"
- INFN Catania (Rosetta deployment)
- Jagiellonian Univ. (Early/Late Stage deployment)
- INFN Roma Tre (AMBER deployment)

Thank you for your attention!