FP6−2004−Infrastructures−6-SSA-026634 Biological applications in GRID: the EUChinaGRID experience F. Polticelli – University Roma Tre EUChinaGRID WP4-Applications.

FP6−2004−Infrastructures−6-SSA-026634 Biological applications in GRID: the EUChinaGRID experience F. Polticelli – University Roma Tre EUChinaGRID WP4-Applications Manager Budapest, 1.10.2007

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  2/tot  Outline  EUChinaGrid Overview  The structural genomics challenge  Biological Applications in EUChinaGRID The “never born proteins” Protein structure prediction using GRID – Rosetta integration within the Genius portal – Early/Late stage integration in the Gridsphere portal Structure validation using GRID – AMBER deployment on GRID  Conclusions and perspectives function recognition and catalytic site identification tools In silico structural genomics

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  3/tot  EUChinaGRID Overview  Aim provide support actions to foster the integration and interoperability of the Grid infrastructures in Europe (EGEE) and China (CNGrid). promote the migration of new applications on the Grid infrastructures by training new user communities and supporting the adoption of grid tools for scientific applications.  Applications validate the intercontinental infrastructure using scientific applications facilitate porting of new applications relevant for scientific and industrial collaboration between Europe and China. three main application fields: – EGEE Applications (ATLAS and CMS) – Astroparticle Physics applications (the ARGO experiment) – Biology applications (“Never born Proteins)

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  4/tot  The structural genomics challenge  The combination of the 20 natural amino acids in a specific sequence in a protein chain dictates the three-dimensional structure of the protein  Protein function is linked to the specific three-dimensional arrangement of amino acids functional groups.  With the advancement of molecular biology techniques a huge amount of information on protein sequences has been made available but far less information is available on structure and function of these proteins.  Prediction of protein structure and function is a key instrument to better understand the protein folding principles and successfully exploit the information provided by the “genomic revolution”.

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  5/tot  The test case: Never born proteins  With 20 different comonomers, a protein chain of just 60 amino acids can theoretically exist in 20 60 chemically and structurally unique combinations  But the number of natural proteins (10 9 to a maximum of 10 13 ) is just a tiny fraction of all possible proteins  There exist a huge number of protein sequences that have never been exploited by biological systems, in other words and enormous number of “never born proteins” (NBP). These pose the following questions: – Which are the criteria with which the existing proteins have been selected? – Natural proteins have peculiar properties in terms for example of thermal stability, solubility in water or amino acid composition? – Can NBP be exploited for biomedical and/or biotechnological purposes?

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  6/tot  Never born proteins and GRID  The problem is tackled by a “high throughput” approach made feasible by the use of the GRID infrastructure  A huge library of random amino acid sequences of fixed length is generated (n=70)  “ab initio” protein structure prediction software is used.  Analysis of the structural characteristics of the resulting proteins frequency of compact and yet unknown folds presence of putative catalytic sites experimental validation on “interesting” cases

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  7/tot  The tool: Rosetta abinitio  Developed by David Baker – University of Washington  Based on a “fragment assembly” strategy  semi-empirical force field for the evaluation of the thermodinamics of the predicted structure  Particularly successful in the prediction of novel folds in the CASP competitions (Critical Assesment of Structure Prediction)  Rosetta abinitio has been deployed in GRID through the use of the GENIUS interface with the option of parametric jobs submission to run a large number of jobs (structure predictions) at the same time.

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  8/tot  First step: Integration on the GILDA facility  Single job execution on GILDA A shell script has been prepared which: – registers the program executable and the required input files (fragment libraries and secondary structure prediction file) on the LFC catalog – calls the Rosetta executable and proceeds with workflow execution. A JDL file was created to run the application on the GILDA working nodes which use the gLite middleware

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  9/tot  Integration on the GENIUS web portal  To facilitate the use of the Rosetta abinitio application within the grid environment by the computational biology community, the application was integrated within the GENIUS portal (https://glite-tutor.ct.infn.it).https://glite-tutor.ct.infn.it  After MyProxy server initialization, input files and executable uploading, JDL file preparation, application running, run status monitoring and download of the output file are carried out from within the portal.  Given the huge number of “never born proteins” to be simulated, a parametric JDL file automatic generation procedure has been set up within the GENIUS environment.  More than 2x10 4 never born protein structures predicted so far

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  10/tot  GENIUS screenshots

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  11/tot  Never born proteins structure examples

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  12/tot  Early/Late stage  Developed by Irena Roterman group – Jagellonian University  A program for protein folding simulation not structure prediction (complementary approach to Rosetta)  based on early stage - statistics using a database of known sequences;  late stage - energy minimization in alternating potentials; this stage is the most computationally expensive;  Early/Late stage has been deployed in GRID through the use of the GridSphere Portal Framework and Gridwise Tech LCG API package, that provides access to gLite middleware.

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  13/tot  Early/Late stage  A self-containing bundle of programs and libraries needed for application running was created and registered in the LFC catalogue.  A script was created to install the application on site each time a job is started.  A JDL file was created to run application on the grid, that use the gLite middleware.  Finally, to enable running the application for users that are not familiar with the grid, it was decided to integrate it in a web portal based on the GridSphere Portal Framework.

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  14/tot  Never born proteins. What’s next ?  Build consensus  Experimental validation Nuclear magnetic resonance (NMR) data acquired in the NMR centre of Peking University Experimental data contain all the information about the primary structure of the protein, about topology and bonds. NMR structure calculation and refinement is an iterative process which, for a single protein, involves many starting structures, normally 200 structures per round, and each protein may need 10-30 (or more) rounds of calculations.

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  15/tot  AMBER porting on GRID  A simple.JDL file and a set of scripts to run the program have been developed Executable = "amber_serial.sh"; StdOutput = "testJob.out"; StdError = "testJob.err"; InputSandbox = {"amber_serial.sh","amber_test/amber_grid.tar"}; OutputSandbox = {"testJob.out","testJob.err","out.tar"}; Requirements = other.GlueCEUniqueID == "gridce.roma3.infn.it:2119/jobmanager-lcgpbs-grid";  The program is currently under testing by the Peking University NMR group

Fabio Polticelli UROM3  EGEE07  Budapest 1-10-2007  16/tot  What we have done  Structural bioinformatics is a key instrument to exploit the huge amount of data available on human and pathogens genes.  In the EUChinaGRID project we set up a system to predict the three-dimensional structure of a high number of protein sequences, to validate the predictions and to test them experimentally  We are currently refining function recognition (ASSIST) and catalytic site identification tools (Early/Late Stage)  In silico structural genomics of bacterial and viral pathogens Low-cost activity High potential biomedical impact with small investments Application to endemic human and animal pathogens of developing countries Sinergy with pharmaceutical industry What we plan to do

FP6−2004−Infrastructures−6-SSA-026634 Acknowledgements - Prof. Luisi for the original idea of “never born proteins” - INFN Catania (Rosetta deployment) - Jagellonian Univ. (Early/Late Stage deployment) - INFN Roma Tre (AMBER deployment)

FP6−2004−Infrastructures−6-SSA-026634 Thank you for your attention !

FP6−2004−Infrastructures−6-SSA-026634 Biological applications in GRID: the EUChinaGRID experience F. Polticelli – University Roma Tre EUChinaGRID WP4-Applications.

Similar presentations

Presentation on theme: "FP6−2004−Infrastructures−6-SSA-026634 Biological applications in GRID: the EUChinaGRID experience F. Polticelli – University Roma Tre EUChinaGRID WP4-Applications."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

FP6−2004−Infrastructures−6-SSA-026634 Biological applications in GRID: the EUChinaGRID experience F. Polticelli – University Roma Tre EUChinaGRID WP4-Applications.

Similar presentations

Presentation on theme: "FP6−2004−Infrastructures−6-SSA-026634 Biological applications in GRID: the EUChinaGRID experience F. Polticelli – University Roma Tre EUChinaGRID WP4-Applications."— Presentation transcript:

Similar presentations

About project

Feedback