1
Milanesi Luciano Catania, Italy 13/03/2007
Bioinformatics challenges in European Grid projects
Milanesi Luciano, National Research Council, Institute for Biomedical Technologies, Milan, Italy, luciano.milanesi@itb.cnr.it
Alessandro Orro, National Research Council, Institute for Biomedical Technologies, Milan, Italy, alessandro.orro@itb.cnr.it
2
Related EU projects
–EUGRID
–ISSeG
–BEinGRID
–EUIndia
3
Introduction
Bioinformatics has become an ideal research area in which computer scientists can apply, and further develop, new intelligent computation methods in both experimental and theoretical settings.
Bioinformatics needs
–Data storage (sequencing, genotyping, microarray data)
–Connection with HPC infrastructure
–Data sharing and distribution
European bioinformatics initiatives based on the infrastructure created by EGEE and BioinfoGRID try to address these issues.
Duration: 2 years, 1 Jan 2006 to 31 Dec 2007
4
BioinfoGRID Project
The BioinfoGRID project aims to
–Promote Grid applications for the life sciences within the bioinformatics community
–Evaluate and adopt high-level user interfaces
–Evaluate bioinformatics applications in five main fields: genomics, proteomics, transcriptomics, molecular dynamics and biological databases
Partners
5
BioinfoGRID
www.bioinfogrid.eu
6
Research
Main research fields
–Genomics
–Proteomics
–Transcriptomics
–Molecular dynamics
–Biological databases
7
Genomics applications
Genomics applications are typically data-driven and have long running times, because they must integrate many different biological databases and tools.
Comparative approach: sequence search, multiple alignment, domain search
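Sequence search, the first step of the comparative approach, ranks database sequences by how well they align with a query. As an illustration only, here is a minimal sketch of Smith-Waterman local alignment scoring. The scoring values (match +2, mismatch -1, linear gap -2) and the toy sequences are assumptions; production tools such as BLAST use substitution matrices, affine gap penalties and heuristics rather than this exhaustive dynamic programme.

```python
from typing import List, Tuple


def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (hypothetical scoring scheme, linear gaps)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix; local alignment never drops below 0
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query: str, database: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Score the query against every (name, sequence) entry, best hit first."""
    hits = [(name, smith_waterman_score(query, seq)) for name, seq in database]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    toy_db = [("seq1", "ACGTTGCA"), ("seq2", "TTTTTTTT"), ("seq3", "GGACGTTG")]
    print(rank_database("ACGTTG", toy_db))
```

Because each database entry is scored independently, a large search splits naturally into independent jobs, which is what makes this kind of workload a good fit for a Grid.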
8
Genomics applications
Validation of the W3H-Task-System
9
Proteomics Applications
Evaluation of different programs and databases for high-throughput proteomics analysis on the Grid, in order to tackle genome-scale analysis both in sequence-based functional identification and in structural studies of the three-dimensional configuration of atoms.
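Genome-scale analysis becomes tractable on a Grid by splitting the input sequence set into many independent jobs. A minimal sketch, assuming plain FASTA input and a hypothetical `sequences_per_job` parameter; the actual job submission through the Grid middleware is deliberately left out.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(parts)


def split_into_jobs(records: List[Record], sequences_per_job: int) -> List[str]:
    """Group records into FASTA chunks, one chunk per (hypothetical) Grid job."""
    if sequences_per_job < 1:
        raise ValueError("sequences_per_job must be >= 1")
    chunks = []
    for start in range(0, len(records), sequences_per_job):
        batch = records[start:start + sequences_per_job]
        chunks.append("".join(f">{h}\n{s}\n" for h, s in batch))
    return chunks


if __name__ == "__main__":
    toy_fasta = ">p1\nMKT\nAYI\n>p2\nMSE\n>p3\nMQR\n"
    for i, chunk in enumerate(split_into_jobs(list(parse_fasta(toy_fasta)), 2)):
        print(f"job {i}:\n{chunk}")
```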
10
Proteomics Applications
Pipeline for protein functional domain analysis
–BlastProDom is a wrapper script around the BLAST package, used to search against ProDom families.
–FPrintScan is used to search against the PRINTS collection of protein signatures.
–HMMPfam is used to search against the Pfam HMM database, the SMART HMM database and the TIGRFAMs collection of HMMs.
–ScanRegExp is used to search against the PROSITE pattern collection and to verify the matches with statistically significant CONFIRM patterns.
–Superfamily is used to search against the SUPERFAMILY database of structures.
–SignalPHMM predicts signal peptides and the location of their cleavage sites using HMMs.
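ScanRegExp matches PROSITE patterns against protein sequences. The sketch below shows the core idea by translating a subset of PROSITE pattern syntax (single residues, `x`, `[...]`, `{...}`, `(n)` and `(n,m)` repeats, and `<`/`>` terminal anchors) into Python regular expressions. It is not ScanRegExp's implementation: CONFIRM-pattern verification and forms such as `[G>]` are not handled, and the example sequence is made up. The pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern (PS00001).

```python
import re
from typing import List, Tuple

# One element: residue, [allowed], {forbidden} or x, optionally followed by (n) or (n,m)
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        start_anchor = element.startswith("<")  # N-terminal anchor
        end_anchor = element.endswith(">")      # C-terminal anchor
        element = element.lstrip("<").rstrip(">")
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            rx = "."
        elif residue.startswith("{"):
            rx = "[^" + residue[1:-1] + "]"
        else:
            rx = residue  # a residue or [..] class has the same meaning in regex syntax
        if low is not None:
            rx += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(("^" if start_anchor else "") + rx + ("$" if end_anchor else ""))
    return "".join(parts)


def scan(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """(1-based start, matched text) for every match; the lookahead allows overlaps."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in rx.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(scan("MNKSANLTVQNPTA", "N-{P}-[ST]-{P}."))
```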
11
Proteomics Applications
Protein surface calculation: the Grid will be used to compute a volumetric description of the protein, from which a precise representation of the corresponding surface is obtained.
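One simple way to build a volumetric description is to place the atoms on a regular 3D grid of voxels and read the surface off the boundary of the occupied region. A minimal sketch under stated assumptions: atoms are hard spheres given as (x, y, z, radius), the voxel spacing is a free parameter, and the surface is approximated by occupied voxels that have an empty face neighbour. This illustrates the general technique, not the method used in the project; a solvent-accessible or molecular surface would additionally roll a probe sphere over the atoms.

```python
import math
from itertools import product
from typing import List, Set, Tuple

Atom = Tuple[float, float, float, float]  # x, y, z, radius (same length unit throughout)
Voxel = Tuple[int, int, int]

_FACE_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def voxelize(atoms: List[Atom], spacing: float) -> Set[Voxel]:
    """Voxels whose centres (i*spacing, j*spacing, k*spacing) lie inside any atom sphere."""
    occupied: Set[Voxel] = set()
    for x, y, z, r in atoms:
        # Only test voxels inside the atom's bounding box.
        lo = [math.floor((c - r) / spacing) for c in (x, y, z)]
        hi = [math.ceil((c + r) / spacing) for c in (x, y, z)]
        for i, j, k in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            dx, dy, dz = i * spacing - x, j * spacing - y, k * spacing - z
            if dx * dx + dy * dy + dz * dz <= r * r:
                occupied.add((i, j, k))
    return occupied


def surface_voxels(occupied: Set[Voxel]) -> Set[Voxel]:
    """Occupied voxels with at least one empty face neighbour approximate the surface."""
    return {
        (i, j, k)
        for i, j, k in occupied
        if any((i + a, j + b, k + c) not in occupied for a, b, c in _FACE_NEIGHBOURS)
    }


if __name__ == "__main__":
    spacing = 0.25
    toy_molecule = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.55)]  # hypothetical two-atom example
    vox = voxelize(toy_molecule, spacing)
    shell = surface_voxels(vox)
    print(f"volume ~ {len(vox) * spacing ** 3:.2f}, surface voxels: {len(shell)}")
```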
12
Transcriptomics Applications
Computational Grids to analyse transcriptomics data
Description: deploy algorithmic tools for gene expression data analysis on the Grid, and evaluate computational tools for extracting biologically significant information from gene expression data. The algorithms will focus on clustering steady-state and time-series gene expression data, on multiple testing, and on meta-analysis of different microarray experiments.
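Multiple testing matters because a microarray experiment tests thousands of genes at once, so a per-gene threshold of p < 0.05 alone would yield many false positives. One widely used correction, shown as an example rather than as the project's chosen method, is the Benjamini-Hochberg procedure, which controls the false discovery rate: sort the m p-values, find the largest rank k with p_(k) <= k·alpha/m, and reject the k hypotheses with the smallest p-values. The p-values in the example are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject/keep flag per hypothesis, controlling the false discovery rate at alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    # Step-up rule: largest 1-based rank k with p_(k) <= k * alpha / m
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    toy_pvalues = [0.01, 0.045, 0.025, 0.005, 0.20]  # one p-value per gene
    print(benjamini_hochberg(toy_pvalues))
```

Note the step-up behaviour: a gene can be rejected even if a smaller-ranked p-value individually failed its threshold, as long as some larger rank passes.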
13
Transcriptomics Applications
Figure: gene expression matrix of gene expression levels (genes × samples), with sample annotations and gene annotations.
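The elements of this figure map onto a small data structure: a genes × samples matrix of expression levels plus per-gene and per-sample annotations. A minimal sketch, assuming levels are compared with a per-gene reference level and summarised as log2 ratios, then mapped to the usual red/green/black heat-map colours; the class and field names, the two-fold threshold and the toy values are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpressionMatrix:
    genes: List[str]            # row labels
    samples: List[str]          # column labels
    values: List[List[float]]   # values[g][s] = expression level of gene g in sample s
    gene_annotations: Dict[str, dict] = field(default_factory=dict)    # e.g. function, pathway
    sample_annotations: Dict[str, dict] = field(default_factory=dict)  # e.g. tissue, condition

    def log2_ratios(self, reference: List[float]) -> List[List[float]]:
        """log2(level / reference level) for every cell; reference[g] belongs to gene g."""
        return [[math.log2(v / ref) for v in row] for row, ref in zip(self.values, reference)]


def colour(log_ratio: float, threshold: float = 1.0) -> str:
    """Map a log2 ratio to a heat-map colour; threshold 1.0 (two-fold) is an assumption."""
    if log_ratio >= threshold:
        return "red"    # high with respect to the reference
    if log_ratio <= -threshold:
        return "green"  # low with respect to the reference
    return "black"      # comparable to the reference


if __name__ == "__main__":
    em = ExpressionMatrix(
        genes=["geneA", "geneB"], samples=["s1", "s2"],
        values=[[200.0, 50.0], [100.0, 110.0]],  # toy values
        sample_annotations={"s1": {"condition": "treated"}},
    )
    for gene, row in zip(em.genes, em.log2_ratios([100.0, 100.0])):
        print(gene, [colour(v) for v in row])
```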
14
Transcriptomics Applications
Green = expression level low with respect to the reference sample
Red = expression level high with respect to the reference sample
Black = expression level comparable to the reference sample
The columns are ordered so that similar expression profiles are adjacent (Eisen et al., PNAS 1998).
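Placing similar expression profiles next to each other is typically done with hierarchical clustering. A minimal sketch in that spirit, assuming average linkage and the distance 1 - r, where r is the Pearson correlation between two profiles; it is not a reimplementation of the Eisen et al. software, and the naive search over all cluster pairs is only suitable for small examples.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def cluster_order(profiles: List[Sequence[float]]) -> List[int]:
    """Average-linkage agglomerative clustering with distance 1 - r; returns the leaf order."""
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # each cluster keeps its members in display order
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]  # concatenation keeps both subtrees contiguous
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters[0] if clusters else []


if __name__ == "__main__":
    toy_profiles = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 5], [5, 3, 2, 1]]
    print(cluster_order(toy_profiles))  # rising and falling profiles end up grouped
```

For real datasets with thousands of genes, an optimised routine such as SciPy's `scipy.cluster.hierarchy.linkage` (with `method="average"` and `metric="correlation"`) followed by `scipy.cluster.hierarchy.leaves_list` would be the practical choice.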
15
Transcriptomics Applications
Case studies: breast cancer
16
Molecular applications in GRID
Molecular dynamics = computation of the motion of atoms within a molecular system using molecular mechanics
Molecular dynamics is commonly used for drug design and drug discovery:
–Molecular modelling of drugs
–Measurement of binding energies between ligands and biological targets
Grids offer promising perspectives for in silico drug discovery:
–Identification of drug candidates using computing tools
–Virtual screening (docking) = rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates
Result from docking a diphenyl urea compound against plasmepsins (WISDOM-I, credit: V. Kasam)
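Molecular dynamics advances atomic positions by numerically integrating Newton's equations of motion, with forces taken from a molecular mechanics potential. A minimal sketch, assuming just two atoms interacting through a Lennard-Jones potential in reduced units (epsilon = sigma = mass = 1) and the velocity Verlet integrator; real simulations use full force fields, many thousands of atoms, periodic boundaries and thermostats.

```python
from typing import List, Tuple

Vec = List[float]


def lj_force_and_energy(r1: Vec, r2: Vec, eps: float = 1.0, sigma: float = 1.0) -> Tuple[Vec, float]:
    """Lennard-Jones force on atom 1 (atom 2 feels the opposite force) and the pair energy."""
    d = [a - b for a, b in zip(r1, r2)]
    r_sq = sum(c * c for c in d)
    sr6 = (sigma * sigma / r_sq) ** 3
    energy = 4.0 * eps * (sr6 * sr6 - sr6)
    # F1 = -dU/dr * d/r = 24*eps*(2*sr12 - sr6) / r^2 * d
    scale = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r_sq
    return [scale * c for c in d], energy


def velocity_verlet(pos, vel, mass: float, dt: float, steps: int):
    """Integrate the two-atom system; returns final positions, velocities and energy per step."""
    pos = [list(p) for p in pos]  # work on copies of the inputs
    vel = [list(v) for v in vel]
    f, u = lj_force_and_energy(pos[0], pos[1])
    forces = [f, [-c for c in f]]
    energies = []
    for _ in range(steps):
        for p in range(2):
            for k in range(3):
                vel[p][k] += 0.5 * dt * forces[p][k] / mass  # first half kick
                pos[p][k] += dt * vel[p][k]                  # drift
        f, u = lj_force_and_energy(pos[0], pos[1])           # forces at the new positions
        forces = [f, [-c for c in f]]
        for p in range(2):
            for k in range(3):
                vel[p][k] += 0.5 * dt * forces[p][k] / mass  # second half kick
        kinetic = sum(0.5 * mass * sum(c * c for c in v) for v in vel)
        energies.append(kinetic + u)
    return pos, vel, energies


if __name__ == "__main__":
    start = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]  # two atoms 1.5 sigma apart, at rest
    pos, vel, energies = velocity_verlet(start, [[0.0] * 3, [0.0] * 3], mass=1.0, dt=0.001, steps=2000)
    print(f"separation {pos[1][0] - pos[0][0]:.3f}, energy spread {max(energies) - min(energies):.2e}")
```

A nearly constant total energy is a quick sanity check that the time step is small enough.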
17
Molecular applications in GRID
Aim: to run docking and molecular dynamics simulations, which usually take a very long time to complete, on the Grid.
Description: WISDOM-II (Wide In Silico Docking On Malaria initiative). This project performs docking and molecular dynamics simulations on the Grid platform to discover new targets for neglected diseases. The analysis can make use, in particular, of the data generated by the WISDOM application on the EGEE infrastructure.
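Once the docking jobs finish, their outputs must be merged and the best-scoring compounds selected, for example as starting points for molecular dynamics refinement. A minimal sketch, assuming each job returns (compound id, score) pairs and that lower scores mean better predicted binding, as with the estimated binding energies many docking programs report; the compound ids and scores are hypothetical, and this is not the WISDOM workflow itself.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Result = Tuple[str, float]  # (compound id, docking score); lower = better is assumed


def merge_job_results(job_outputs: Iterable[List[Result]]) -> Dict[str, float]:
    """Keep the best (lowest) score per compound across all jobs.

    A compound may appear in several jobs, e.g. after a failed job is resubmitted.
    """
    best: Dict[str, float] = {}
    for results in job_outputs:
        for compound, score in results:
            if compound not in best or score < best[compound]:
                best[compound] = score
    return best


def top_hits(best: Dict[str, float], k: int) -> List[Result]:
    """The k best-scoring compounds, best first."""
    return heapq.nsmallest(k, best.items(), key=lambda item: item[1])


if __name__ == "__main__":
    toy_jobs = [
        [("cmpd-001", -7.2), ("cmpd-002", -5.1)],
        [("cmpd-003", -9.4), ("cmpd-001", -7.9)],
        [("cmpd-004", -6.0)],
    ]
    print(top_hits(merge_job_results(toy_jobs), 2))
```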
18
Influenza A Neuraminidase
Grid-enabled high-throughput in silico screening against influenza A neuraminidase
Encouraged by the success of the first EGEE biomedical data challenge against malaria (WISDOM), a second data challenge, targeting avian flu, was launched in April 2006 to identify new drugs for potential variants of the influenza A virus. The project demonstrated that a worldwide Grid infrastructure can efficiently deploy large-scale virtual screening and thereby speed up the drug design process.
19
Influenza A Neuraminidase
Results
Completed dockings: 308,585
Estimated duration on 1 CPU: 16.7 years
Duration of experiment: 30 days
Number of jobs: 2,580
Max number of concurrent CPUs: 240
Number of CEs (computing elements): 36
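The figures in the results table can be related with some back-of-the-envelope arithmetic. The derived quantities below (effective speed-up, average CPU time per docking, dockings per job, and average CPU usage relative to the 240-CPU peak) are estimates computed from the table, assuming a 365.25-day year; they are not numbers reported by the project.

```python
DAYS_PER_YEAR = 365.25  # assumption used to convert "16.7 years" into days

completed_dockings = 308_585
cpu_years_single = 16.7     # estimated duration on 1 CPU
wall_clock_days = 30        # duration of the experiment
jobs = 2_580
max_concurrent_cpus = 240

cpu_days = cpu_years_single * DAYS_PER_YEAR
speedup = cpu_days / wall_clock_days                          # sequential time / wall-clock time
seconds_per_docking = cpu_days * 86_400 / completed_dockings  # average CPU time per docking
dockings_per_job = completed_dockings / jobs
avg_busy_fraction = speedup / max_concurrent_cpus             # mean busy CPUs / peak CPUs

print(f"effective speed-up   ~ {speedup:.0f}x")
print(f"CPU time per docking ~ {seconds_per_docking / 60:.1f} min")
print(f"dockings per job     ~ {dockings_per_job:.0f}")
print(f"mean/peak CPU usage  ~ {avg_busy_fraction:.0%}")
```

This gives an effective speed-up of roughly 200, about 28 minutes of CPU time per docking and about 120 dockings per job; the speed-up also implies that, on average, roughly 85% of the peak number of CPUs was busy over the 30 days.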
20
Conclusions
2007 activities
–Interface improvements
–Dissemination and training
–Extension to new targets (medical and biomedical informatics)
21
Acknowledgments
BioinfoGRID http://www.bioinfogrid.eu
EGEE: Enabling Grids for E-sciencE project http://www.eu.egee.org
EELA: e-Infrastructure between Europe and Latin America project http://www.eu-eela.org/index.htm
Euchinagrid: Interconnection & Interoperability of Grids between Europe & China project http://www.euchinagrid.org/
FIRB-MIUR LITBIO: Laboratory for Interdisciplinary Technologies in Bioinformatics http://www.litbio.org