Milanesi Luciano CAPI 16-17 Milan, Italy HPC AND GRID BIOCOMPUTING APPLICATIONS IN LIFE SCIENCE Milanesi Luciano National Research Council Institute of.

Slides:



Advertisements
Similar presentations
Biological pathway and systems analysis An introduction.
Advertisements

Creating NCBI The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research.
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
MAGNet Center: Andrea Califano NCIBI: Brian Athey Simbios: Russ Altman Creating a DBP Community to Enhance the NCBC Biomedical Impact NCBC Work Group Report,
Bioinformatics at WSU Matt Settles Bioinformatics Core Washington State University Wednesday, April 23, 2008 WSU Linux User Group (LUG)‏
Bioinformatics at IU - Ketan Mane. Bioinformatics at IU What is Bioinformatics? Bioinformatics is the study of the inherent structure of biological information.
Bioinformatics Needs for the post-genomic era Dr. Erik Bongcam-Rudloff The Linnaeus Centre for Bioinformatics.
Systems Biology Existing and future genome sequencing projects and the follow-on structural and functional analysis of complete genomes will produce an.
Jeffery Loo NLM Associate Fellow ’03 – ’05 chemicalinformaticsforlibraries.
Bioinformatics: a Multidisciplinary Challenge Ron Y. Pinter Dept. of Computer Science Technion March 12, 2003.
Data-intensive Computing: Case Study Area 1: Bioinformatics B. Ramamurthy 6/17/20151.
Introduction to Genomics, Bioinformatics & Proteomics Brian Rybarczyk, PhD PMABS Department of Biology University of North Carolina Chapel Hill.
Scientific Data Mining: Emerging Developments and Challenges F. Seillier-Moiseiwitsch Bioinformatics Research Center Department of Mathematics and Statistics.
ExPASy - Expert Protein Analysis System The bioinformatics resource portal and other resources An Overview.
B IOMEDICAL T EXT M INING AND ITS A PPLICATION IN C ANCER R ESEARCH Henry Ikediego
Computational Molecular Biology Biochem 218 – BioMedical Informatics Gene Regulatory.
Medical Informatics Basics
Bioinformatics Jan Taylor. A bit about me Biochemistry and Molecular Biology Computer Science, Computational Biology Multivariate statistics Machine learning.
Overview of Bioinformatics A/P Shoba Ranganathan Justin Choo National University of Singapore A Tutorial on Bioinformatics.
Bioinformatics Grid Application for Life Science. COMMUNICATION NETWORK DEVELOPMENT SPECIFIC SUPPORT ACTION BIOINFOGRID Luciano Milanesi CNR-ITB.
9/30/2004TCSS588A Isabelle Bichindaritz1 Introduction to Bioinformatics.
Knowledgebase Creation & Systems Biology: A new prospect in discovery informatics S.Shriram, Siri Technologies (Cytogenomics), Bangalore S.Shriram, Siri.
Development of Bioinformatics and its application on Biotechnology
Tae-Hyung Kim 1 Gil-Mi Ryu 1,2 InSong Koh 2 Jong Park 3 1.
Biomedical Computation at NIH Michael Marron National Center for Research Resources National Institutes of Health.
Beyond the Human Genome Project Future goals and projects based on findings from the HGP.
Chapter 13. The Impact of Genomics on Antimicrobial Drug Discovery and Toxicology CBBL - Young-sik Sohn-
Ranking-Aware Integration and Explorative Search of Distributed Bio-Data Dipartimento di Elettronica e Informazione NETTAB 2012 Integrated Bio-Search November.
Medical Informatics Basics
Bioinformatics and medicine: Are we meeting the challenge?
IST E-infrastructure shared between Europe and Latin America Biomedical Applications in EELA Esther Montes Prado CIEMAT (Spain)
Medical Informatics Basics Lection 1 Associated professor Andriy Semenets Department of Medical Informatics.
EGEE’ September 2009 BARCELLONA,SPAINMilanesi Luciano Bioinformatics GRID and HPC challenges in Biomedicine and Biosciences. Milanesi Luciano National.
A New Oklahoma Bioinformatics Company. Microarray and Bioinformatics.
Integrated Biomedical Information for Better Health Workprogramme Call 4 IST Conference- Networking Session.
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Molecular Science in NPACI Russ B. Altman NPACI Molecular Science Thrust Stanford Medical.
BIOINFOGRID: Bioinformatics Grid Application for Life Science Giorgio Maggi INFN and Politecnico di Bari
INFSO-RI Enabling Grids for E-sciencE V. Breton, 30/08/05, seminar at SERONO Grid added value to fight malaria Vincent Breton EGEE.
Function first: a powerful approach to post-genomic drug discovery Stephen F. Betz, Susan M. Baxter and Jacquelyn S. Fetrow GeneFormatics Presented by.
Agent-based methods for translational cancer multilevel modelling Sylvia Nagl PhD Cancer Systems Science & Biomedical Informatics UCL Cancer Institute.
Page 1 SCAI Dr. Marc Zimmermann Department of Bioinformatics Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) Grid-enabled drug discovery.
Cell Signaling Ontology Takako Takai-Igarashi and Toshihisa Takagi Human Genome Center, Institute of Medical Science, University of Tokyo.
Integrating the Bioinformatic Technology Group into your research programme Introduction People and Skills Examples Integrating the BTG Contacts BHRC Away.
Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.
Harbin Institute of Technology Computer Science and Bioinformatics Wang Yadong Second US-China Computer Science Leadership Summit.
Systems Biology ___ Toward System-level Understanding of Biological Systems Hou-Haifeng.
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
Overview of Bioinformatics 1 Module Denis Manley..
EGEE-II INFSO-RI Enabling Grids for E-sciencE WISDOM in EGEE-2, biomed meeting, 2006/04/28 WISDOM : Grid-enabled Virtual High Throughput.
Other biological databases and ontologies. Biological systems Taxonomic data Literature Protein folding and 3D structure Small molecules Pathways and.
Bioinformatics MEDC601 Lecture by Brad Windle Ph# Office: Massey Cancer Center, Goodwin Labs Room 319 Web site for lecture:
Cooperative experiments in VL-e: from scientific workflows to knowledge sharing Z.Zhao (1) V. Guevara( 1) A. Wibisono(1) A. Belloum(1) M. Bubak(1,2) B.
Epidemiology 217 Molecular and Genetic Epidemiology Bioinformatics & Proteomics John Witte.
EB3233 Bioinformatics Introduction to Bioinformatics.
An overview of Bioinformatics. Cell and Central Dogma.
INFSO-RI Enabling Grids for E-sciencE EGEE Review WISDOM demonstration Vincent Bloch, Vincent Breton, Matteo Diarena, Jean Salzemann.
A collaborative tool for sequence annotation. Contact:
Bioinformatics and Computational Biology
An approach to carry out research and teaching in Bioinformatics in remote areas Alok Bhattacharya Centre for Computational Biology & Bioinformatics JAWAHARLAL.
B i o i n f o r m a t i c s / B i o m e d i c a l A p p l i c a t i o n s i n E E L A Mexico, D.F., october 22 – 26, e – s c i e n c e M e x i c.
BIOINFOGRID: Bioinformatics Grid Application for life science MILANESI, Luciano National Research Council Institute of.
Biotechnology and Bioinformatics: Bioinformatics Essential Idea: Bioinformatics is the use of computers to analyze sequence data in biological research.
High throughput biology data management and data intensive computing drivers George Michaels.
Milanesi Luciano Catania, Italy 13/03/2007 Bioinformatics challenges in European projects in Grid. Milanesi Luciano National Research Council Institute.
Bioinformatics Grid Application for Life Science. COMMUNICATION NETWORK DEVELOPMENT SPECIFIC SUPPORT ACTION BIOINFOGRID Andreas Gisel & Luciano Milanesi.
신기술 접목에 의한 신약개발의 발전전망과 전략 LGCI 생명과학 기술원. Confidential LGCI Life Science R&D 새 시대 – Post Genomic Era Genome count ‘The genomes of various species including.
Semantic Web - caBIG Abstract: 21st century biomedical research is driven by massive amounts of data: automated technologies generate hundreds of.
Bioinformatics Madina Bazarova. What is Bioinformatics? Bioinformatics is marriage between biology and computer. It is the use of computers for the acquisition,
APPLICATIONS OF BIOINFORMATICS IN DRUG DISCOVERY
Introduction to Bioinformatic
Presentation transcript:

Milanesi Luciano CAPI Milan, Italy HPC AND GRID BIOCOMPUTING APPLICATIONS IN LIFE SCIENCE Milanesi Luciano National Research Council Institute of Biomedical Technologies, Milan, Italy CAPI 2006 Milan, 16-17

CAPI Milan, Italy Milanesi Luciano 2 Introduction: Post-genomic “Post-genomic” focuses on the new tools and new methodologies emerging from the knowledge of genome sequences. Production and use of DNA micro arrays, analysis of transciptome, proteome, metabolome are the different topics developed in this class.

CAPI Milan, Italy Milanesi Luciano 3 The human organism: ~ 3 billion nucleotides ~ 30,000 genes coding for ~ 100, ,000 transcripts ~ 1-2 million proteins ~ 60 trillion cells of ~ 300 cell types in ~14,000 distinguishable morphological structures

CAPI Milan, Italy Milanesi Luciano 4 Human Genome and Medicine As research progresses, investigators will also uncover the mechanisms for diseases caused by several genes or by a gene interacting with environmental factors. The identification of these genes and their proteins will be useful in finding more-effective therapies and preventive measures. Investigators determining the underlying biology of genome organization and gene regulation will also begin to understand how humans develop from single cells to adults. A new level of experiments are required to obtain an overall picture of when, where, and how gene are expressed.

CAPI Milan, Italy Milanesi Luciano 5 A typical gene lab can produce 100 terabytes of information a year, the equivalent of 1 million encyclopedias. Few biologists have the computational skills needed to fully explore such an astonishing amount of data; nor do they have the skills to explore the exploding amount of data being generated from clinical trials. The immense amount of data that are available, and the knowledge is the tip of the data iceberg. Bioinformatics: Emerging Opportunities and Emerging Gaps11 Paula E.Stephan and Grant Black Emerging Opportunites

CAPI Milan, Italy Milanesi Luciano 6 ICT and Genomics A key development in the computational world has been the arrival of de novo design algorithms that use all available spatial information to be found within the target to design novel drugs. Coupling these algorithms to the rapidly growing body of information from structural genomics together with the new ICT technology (eg. HPC, GRID, Web Services, ecc.) provides a powerful new possibility for exploring design to a broad spectrum of genomics targets, including more challenging techniques such as: protein–protein interactions, docking, molecular dynamics, system biology, gene network ecc.

CAPI Milan, Italy Milanesi Luciano 7 DNA High Throughput Sequencing DNA High Throughput Sequencing MSMS EST HTS Microsatellite SNP’s Microarray High Throughput Data Project

CAPI Milan, Italy Milanesi Luciano 8 NCBI initiative for the creation of 7 National Centre for Integrative Biomedical Informatics in USA Informatics for Integrating Biology and the Bedside (i2b2) Isaac Kohane, PI Center for Computational Biology (CCB) Arthur Toga, PI Multiscale Analysis of Genomic and Cellular Networks (MAGNet) Andrea Califano, PI National Alliance for Medical Imaging Computing (NA-MIC) Ron Kikinis, PI The National Center For Biomedical Ontology (NCBO) Mark Musen, PI Physics-Based Simulation of Biological Structures (SIMBIOS) Russ Altman, PI National Center for Integrative Biomedical Informatics (NCIBI) Brian D. Athey, PI

CAPI Milan, Italy Milanesi Luciano 9 Related EU projects EUGRID ISS e G BEinGRID EUIndia

CAPI Milan, Italy Milanesi Luciano 10 BioinfoGRID Project. The BIOINFOGRID project proposes to combine the Bioinformatics services and applications for molecular biology users with the Grid Infrastructure by EGEE and EGEEII projects. In the BIOINFOGRID initiative we plan to evaluate genomics, transcriptomics, proteomics and molecular dynamics applications studies based on GRID technology. The project start date: 1st January 2006 The project finish date: 31 December 2007

CAPI Milan, Italy Milanesi Luciano 11 The grid application aspects. The massive potential of Grid technology will be indispensable when dealing with both the complexity of models and the enormous quantity of data, for example, in searching the human genome or when carry out simulations of molecular dynamics for the study of new drugs. The BIOINFOGRID projects proposes to combine the Bioinformatics services and applications for molecular biology users with the Grid Infrastructure created by EGEE Enabling Grids for E-sciencE

CAPI Milan, Italy Milanesi Luciano 12 EGEE: > 180 sites, 40 countries > 24,000 processors, ~ 5 PB storage EGEE Grid Sites : Q sites CPU EGEE: Steady growth over the lifetime of the project

CAPI Milan, Italy Milanesi Luciano 13 Genomics applications in GRID Aim : use of computational GRID to analyse molecular biological data at the genomic scale Description the GRID Portal system: unification of larger groups of bioinformatics tools into single analytical steps and their optimization for GRID GRID analysis of cDNA data: computer- aided functional annotation of cDNAs in order to optimize sensitivity and specificity

CAPI Milan, Italy Milanesi Luciano 14 Genomics applications in GRID GRID analysis of genomic databases: integration of precomputed data, gene identification, differentiation of pseudogenes, comparative genome analysis, etc. Multiple alignments: testing of new algorithms for computationally very demanding alignment procedures, optimization for GRID.

CAPI Milan, Italy Milanesi Luciano 15 Proteomics Applications in GRID Aim : use of computational GRIDs to analysis molecular biological data in proteomics Description Perform functional protein analysis in GRID by using the functional protein domain annotations on large protein families using GRID and related databases.

CAPI Milan, Italy Milanesi Luciano 16 Proteomics Applications in GRID Protein surface calculation in GRID. : the grid will be used to elaborate the volumetric description of the protein obtaining a precise representation of the corresponding surface.

CAPI Milan, Italy Milanesi Luciano 17 Transcriptomics applications in GRID Aim : use of computational GRIDs to analyse trascriptomics data and to perform application of Phylogenetic methods based on estimates trees. Description To perform algorithmic tools for gene expression data analysis in GRID: evaluate the computational tools for extracting biologically significant information from gene expression data. Algorithms will focus on clustering steady state and time series gene expression data, multiple testing and meta analysis of different microarray experiments from different groups, and identification of transcription sites.

CAPI Milan, Italy Milanesi Luciano 18 Transcriptomics applications in GRID Data analysis specific for bioinformatics allow the GRID user to store and search genetics data, with direct access to the data files stored on Data Storage element on GRID servers. Researchers perform their activities regardless geographical location, interact with colleagues, share and access data Scientific instruments and experiments provide huge amount of data from microarray

CAPI Milan, Italy Milanesi Luciano 19 Phylogenetic application in GRID Phylogenetics : Reconstructing the evolutionary history of a group of taxa is major research thrust in computational biology and a standard part of exploratory sequence analysis. An evolutionary history not only gives relationships among taxa, but also an important tool for inferring the universal tree of life, inferring structural, physiological, and biochemical properties of sequences from other similar sequences, and reconstruction of tissue evolution.

CAPI Milan, Italy Milanesi Luciano 20 Database Applications in GRID Aim : To mange the biological database, by using the GRID EGEE infrastructure. Description Biological database on GRID: these databases will be complemented by others that are publicly available in Internet, by using GRID and web services where appropriate. Functional Analogous Finder: By using the GO terms and the associations to gene products it is possible to compare the total associated GO terms and their ascending parents to validate the functional analogy between two gene products

CAPI Milan, Italy Milanesi Luciano 21 Molecular applications in GRID Aim : The objective is to docking and Molecular Dynamics simulations, which usually take a very long time to complete the analysis. Description Wide In Silico Docking On Malaria initiative WISDOM- II:This project perform the docking and molecular dynamics simulation on the GRID platform for discovery new targets for neglected diseases. Analysis can be performed notably using the data generated by the WISDOM application on the EGEE infrastructure.

CAPI Milan, Italy Milanesi Luciano 22 Wide In Silico Docking On Malaria Ligand Loops variation between structures Active site ~ 40 millions complexes target-compound were produced during the DC

CAPI Milan, Italy Milanesi Luciano 23 Influenza A Neuraminidase Grid-enabled High-throughput in-silico Screening against Influenza A Neuraminidase Encouraged by the success of the first EGEE biomedical data challenge against malaria (WISDOM), the second data challenge battling avian flu was kicked off in April 2006 to identify new drugs for the potential variants of the Influenza A virus. Mobilizing thousands of CPUs on the Grid, the 6-weeks high-throughput screening activity has fulfilled over 100 CPU years of computing power. In this project, the impact of a world-wide Grid infrastructure to efficiently deploy large scale virtual screening to speed up the drug design process has been demonstrated.

CAPI Milan, Italy Milanesi Luciano 24 LITBIO FIRB-MIUR LITBIO: Laboratory for Interdisciplinary Technologies in Bioinformatics Istituto Nazionale per la ricerca sul Cancro - Genova Consiglio Nazionale delle Ricerche DIST- Università di Genova Unversità di Camerino CEINGE - Università di Napoli Exadron – Eurotech S.p.A CONSORZIO INTERUNIVERSITARIO LOMBARDO PER L'ELABORAZIONE AUTOMATICA, Segrate, Italy

CAPI Milan, Italy Milanesi Luciano 25 System Biology for Health

CAPI Milan, Italy Milanesi Luciano System Biology Cell cycle is a complex biological process that implies the interaction of a large number of genes Disease studies on tumour proliferation are related with the de-regulation of cell cycle It will be useful finding as quickly as possible information related to all the genes involved in this cellular process We implement a new resource which collects useful information about the human cell cycle to support studies on genetic diseases related to this crucial biological process

CAPI Milan, Italy Milanesi Luciano 27 Human Cell Cycle Data Integration Data integration system from many biological resources: NCBI, Ensemble, Kegg, Reactome, dbSNP, MGC, DBTSS, Unigene, QPPD, TRANSFAC UniProt, InterPro, PDB, TRANSPATH, BIND, MINT, IntAct Data Warehouse Approach

CAPI Milan, Italy Milanesi Luciano 28 Data Warehouse WHY DATA WAREHOUSE: High efficiency to retrieve specific information related to a specific query More information availability in unique resource Immediate access to different kind of information through a single query Better information accuracy and better control on the information sources

CAPI Milan, Italy Milanesi Luciano 29 Text Mining: Cyclin D1 Literature searching develeped in ORIEL and based on the E-Biosci searching tool List of abstract related to cyclin D1 description

CAPI Milan, Italy Milanesi Luciano 30 Syntetic Biology Molecular Interaction Maps are becoming the equivalent of an anatomy atlas to map specific measurements in a functional context; e.g. QTLs, expression profiles, etc. Barrett et al. Current Opinion in Biotechnology 2006, 17:488–492

CAPI Milan, Italy Milanesi Luciano 31 Conclusion New technologies have been introduced to automate the analysis, and annotation of genomic, proteomic and Systems Biology data (eg. Web services, Workflow, Data Mining, Agent, GRID, Ontology, Semantic Web). A new generation of algorithms and data mining needs to be developed in order to be capable of connecting the biological information of genes, proteins and metabolic pathways with the patients’ disease. The dedicated HPC and GRID infrastructure will be in a position to tackle the important role of developing new strategies for production and analysis of data in the fields of biotechnology and biomedicine. The massive potential of HPC and Grid technology will be indispensable when dealing with both the complexity of models and the enormous quantity of data.

CAPI Milan, Italy Milanesi Luciano 32 Acknowledgments This work was supported by the: Italian FIRB-MIUR LITBIO: Laboratory for Interdisciplinary Technologies in Bioinformatics BIOINFOGRID EGEE Enabling Grid for E- science project

CAPI Milan, Italy Milanesi Luciano 33 Thank you EUGRID ISS e G