WISDOM project – Grid and neglected diseases CoreGRID Summer School 2006 Dr. Marc Zimmermann.

Slides:



Advertisements
Similar presentations
Symantec 2010 Windows 7 Migration EMEA Results. Methodology Applied Research performed survey 1,360 enterprises worldwide SMBs and enterprises Cross-industry.
Advertisements

Symantec 2010 Windows 7 Migration Global Results.
Design and Evaluation of an Autonomic Workflow Engine Thomas Heinis, Cesare Pautasso, Gustavo Alsonso Dept. of Computer Science Swiss Federal Institute.
Pricing for Utility-driven Resource Management and Allocation in Clusters Chee Shin Yeo and Rajkumar Buyya Grid Computing and Distributed Systems (GRIDS)
Distributed Systems Architectures
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
1 Copyright © 2013 Elsevier Inc. All rights reserved. Chapter 4 Computing Platforms.
11 Application of CSF4 in Avian Flu Grid: Meta-scheduler CSF4. Lab of Grid Computing and Network Security Jilin University, Changchun, China Hongliang.
Design of Dose Response Clinical Trials
UNITED NATIONS Shipment Details Report – January 2006.
Towards Automating the Configuration of a Distributed Storage System Lauro B. Costa Matei Ripeanu {lauroc, NetSysLab University of British.
T H O M S O N S C I E N T I F I C Ann Wescott April 8, 2008 Prous Science Update SLA DPHT Spring Meeting 2008.
INFSO-RI Enabling Grids for E-sciencE The WISDOM initiative Wide In Silico Docking On Malaria Yannick Legré, CNRS/IN2P3 on behalf.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Why Grids Matter to Europe Bob Jones EGEE.
1. 2 Why are Result & Impact Indicators Needed? To better understand the positive/negative results of EC aid. The main questions are: 1.What change is.
Fighting Malaria With The Grid. Computing on The Grid The Internet allows users to share information across vast geographical distances. Using similar.
1 Chapter 12 File Management Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles,
Chapter 1 Introduction Copyright © Operating Systems, by Dhananjay Dhamdhere Copyright © Introduction Abstract Views of an Operating System.
1 Click here to End Presentation Software: Installation and Updates Internet Download CD release NACIS Updates.
Andrew McNaughton 1 Radical Change is Entirely Possible! 2 nd November 2011.
4.1 © 2004 Pearson Education, Inc. Exam Managing and Maintaining a Microsoft® Windows® Server 2003 Environment Lesson 4: Organizing a Disk for Data.
13 Copyright © 2005, Oracle. All rights reserved. Monitoring and Improving Performance.
Chapter 1: Introduction to Scaling Networks
PP Test Review Sections 6-1 to 6-6
1 IMDS Tutorial Integrated Microarray Database System.
Making Time-stepped Applications Tick in the Cloud Tao Zou, Guozhang Wang, Marcos Vaz Salles*, David Bindel, Alan Demers, Johannes Gehrke, Walker White.
Target Costing If you cannot find the time to do it right, how will you find the time to do it over?
1 Developing a Predictive Model for Internet Video Quality-of-Experience Athula Balachandran, Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica,
Universität Kaiserslautern Institut für Technologie und Arbeit / Institute of Technology and Work 1 Q16) Willingness to participate in a follow-up case.
1 Using Bayesian Network for combining classifiers Leonardo Nogueira Matos Departamento de Computação Universidade Federal de Sergipe.
Chapter 10: The Traditional Approach to Design
Systems Analysis and Design in a Changing World, Fifth Edition
Essential Cell Biology
Intracellular Compartments and Transport
PSSA Preparation.
Essential Cell Biology
From Model-based to Model-driven Design of User Interfaces.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks World-wide in silico drug discovery against.
INFSO-RI Enabling Grids for E-sciencE WISDOM mini-workshop Vincent Breton (CNRS-IN2P3, LPC Clermont-Ferrand) ISGC 2007 March 28th,
Biomedical research methods. What are biomedical research methods? An integrated approach using chemical, mathematical and computer simulations, in vitro.
The South African Malaria Initiative A Case Study E Jane Morris Bridging the Gap in Global Health Innovation - from Needs to.
Figure 4.1 NEW PRODUCT DEVELOPMENT PROCESS Finance Corporate strategy and portfolio decisions Regulatory affairs Marketing and sales + market research.
Asia’s Largest Global Software & Services Company Genomes to Drugs: A Bioinformatics Perspective Sharmila Mande Bioinformatics Division Advanced Technology.
KISTI’s Activities on the NA4 Biomed Cluster Soonwook Hwang, Sunil Ahn, Jincheol Kim, Namgyu Kim and Sehoon Lee KISTI e-Science Division.
FKPPL workshop May 2012 BUI The Quang Prof. Vincent Breton Prof. Doman Kim Prof. NGUYEN Hong Quang Prof. PHAM Quoc Long Grid enabled in silico drug discovery.
Test Of Distributed Data Quality Monitoring Of CMS Tracker Dataset H->ZZ->2e2mu with PileUp - 10,000 events ( ~ 50,000 hits for events) The monitoring.
IST E-infrastructure shared between Europe and Latin America Biomedical Applications in EELA Esther Montes Prado CIEMAT (Spain)
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Building Grid-enabled Virtual Screening Service.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGEE Application Case Study: Distributed.
INFSO-RI Enabling Grids for E-sciencE WHO, 2/11/04 Grid enabled in silico drug discovery Vincent Breton CNRS/IN2P3 Credit for the.
Grid Technologies  Slide text. What is Grid?  The World Wide Web provides seamless access to information that is stored in many millions of different.
Protein Molecule Simulation on the Grid G-USE in ProSim Project Tamas Kiss Joint EGGE and EDGeS Summer School.
INFSO-RI Enabling Grids for E-sciencE V. Breton, 30/08/05, seminar at SERONO Grid added value to fight malaria Vincent Breton EGEE.
Grid Enabled High Throughput Virtual Screening Against Four Different Targets Implicated in Malaria Presented by Vinod.
Page 1 SCAI Dr. Marc Zimmermann Department of Bioinformatics Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) Grid-enabled drug discovery.
EGEE-II INFSO-RI Enabling Grids for E-sciencE WISDOM, a grid enabled virtual screening initiative Yannick Legré LPC Clermont-Ferrand,
INFSO-RI Enabling Grids for E-sciencE In silico docking on EGEE infrastructure, the case of WISDOM Nicolas Jacq LPC of Clermont-Ferrand,
EGEE-II INFSO-RI Enabling Grids for E-sciencE WISDOM in EGEE-2, biomed meeting, 2006/04/28 WISDOM : Grid-enabled Virtual High Throughput.
INFSO-RI Enabling Grids for E-sciencE Grid-enabled drug discovery to address neglected diseases N. Jacq – CNRS-IN2P3 EGAAP meeting.
Avian Flu Data Challenge Hsin-Yen Chen ASGC 29 Aug APAN24.
INFSO-RI Enabling Grids for E-sciencE EGEE Review WISDOM demonstration Vincent Bloch, Vincent Breton, Matteo Diarena, Jean Salzemann.
Milanesi Luciano Catania, Italy 13/03/2007 Bioinformatics challenges in European projects in Grid. Milanesi Luciano National Research Council Institute.
Page 1 Computer-aided Drug Design —Profacgen. Page 2 The most fundamental goal in the drug design process is to determine whether a given compound will.
Clouds , Grids and Clusters
Nicolas Jacq LPC, IN2P3/CNRS, France
APPLICATIONS OF BIOINFORMATICS IN DRUG DISCOVERY
WISDOM-II, status of preparation
In silico docking on grid infrastructures
Grid computing Assaf Gottlieb Tel-Aviv University
Presentation transcript:

WISDOM project – Grid and neglected diseases CoreGRID Summer School 2006 Dr. Marc Zimmermann

Page 2 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Burden of Diseases in Developing World

Page 3 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Screen in culture Screen in animal models pharmacokinetics analysis High throughput screening InhibitorsValidated hitsLeads Drug candidate Iterative medicinal chemistry Optimise efficacy and pharmaceutical qualities Drug Discovery Process Target selection Validated? Robust assay system? Structural Information? Lead inhibitors? Product development Target More detailed potency/efficacy studies; pharmacokinetics; early toxicology Nwaka & Ridley 2003 Natural products/traditional medicines

Page 4 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Gaps in Drug R&D Tropical / Neglected Diseases Valid. drug targets New compounds Good animal models to assess safety and efficacy Good evaluation tools Clinical trials capacity in developing countries Capacity for uptake of new medicines Capacity for post- approval processes Research and discovery Preclinical development Phase IPhase IIPhase III Registration, launch, utilisation GAP I GAP II GAP III GAP IV

Page 5 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Potential Impact of an Distributed IT Infrastructure Speed-up the development of new drugs and vaccines in silico research on grids Collection of epidemiological data for research (modeling, molecular biology) Ease the deployment of clinical trials in endemic areas Improve disease monitoring Collect data on drug distribution and treatment follow-up Evaluate impact of policies and programs Improve alert and monitoring system for epidemics Ease integration of African research laboratories in world research Offer access to IT resources Offer access to data and services (telemedicine, life sciences)

Page 6 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases From a single PC to a Grid Farm of PCs Example: Novartis Examples: Example: EGEE Enterprise grid: Mutualization of resources in a company Volunteer computing: CPU cycles made available by PC owners Grid infrastructure: Internet + disk and storage resources + services for information management ( data collection, transfer and analysis)

Page 7 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Grid Impact on Drug Discovery Workflow down to Drug Delivery (1/2) Grids provide the necessary tools and data to identify new biological targets Bioinformatics services (database replication, workflow,...) Resources for CPU intensive tasks such as genomics comparative analysis, inverse docking,... Grids provide the resources to speed up lead discovery Large scale in silico docking to identify potentially promising compounds Molecular Dynamics computations to refine virtual screening and further assess selected compounds

Page 8 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Grid Impact on Drug Discovery Workflow down to Drug Delivery (2/2) Grids provide environments for epidemiology Federation of databases to collect data in endemic areas to study a disease and to evaluate impact of vaccine, vector control measures, Resources for data analysis and mathematical modeling Grids provide the services needed for clinical trials Federation of database to collect data in the centers participating to the clinical trials Grids provide the tools to monitor drug delivery Federation of database to monitor drug delivery

Page 9 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases WISDOM : Wide In Silico Docking On Malaria Biological goal Proposition of new inhibitors for a family of proteins produced by Plasmodium falciparum Biomedical informatics goal Deployment of in silico virtual docking on the grid Grid goal Deployment of a CPU consuming application generating large data flows to test the grid operation and services => data challenge

Page 10 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Introduction to the Disease : Malaria ~300 million people worldwide are affected million people die every year Widely spread Caused by protozoan parasites of the genus Plasmodium Complex life cycle with multiple stages

Page 11 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases There is a Real Need for New Drugs to Fight Malaria (WHO) Drug resistance has emerged for all classes of antimalarials except artemisinins. Resistance to chloroquine, the cheapest and the most used drug, is spreading in almost all the endemic countries. Resistance to the combination of sulfadoxine-pyrimethamine which was already present in South America and in South-East Asia is now emerging in East Africa (65% in Western Tanzania) All countries experiencing resistance to conventional monotherapies should use ACTs (artemisinin-based combination therapies) But there is even the threat of resistance to artemisinin too, as it is already observed in murine Plasmodium yoelii

Page 12 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Identification of New Antimalarial Targets The available drugs focus on a limited number of biological targets => cross-resistance to antimalarials With the advent of the plasmodium genome, many targets came into light The potential antimalarial drug targets are broadly classified into three categories: Targets involved in hemoglobin degradation (proteases like plasmepsins, falcipains) Targets involved in metabolism Targets engaged in membrane transport and signaling (choline transporter etc ). The present project WISDOM focuses on hemoglobin metabolism and especially on Plasmepsin II and Plasmepsin IV

Page 13 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Docking Dataflow hit crystal structure ligand data base junk placing the ligand Structure optimization Ranking Protein surface Ligand Water molecule Scoring

Page 14 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases High Throughput Virtual Docking Chemical compounds (ZINC): Chembridge – 500,000 Drug like – 500,000 Targets (PDB): Plasmepsin II (1lee,1lf2,1lf3) Plasmepsin IV (1ls5) Millions of chemical compounds available in laboratories High Throughput Screening 1-10$/compound, nearly impossible Molecular docking (FlexX, AutoDock) ~80 CPU years, 1 TB data Data challenge on EGEE ~6 weeks on ~1700 computers Hits screening using assays performed on living cells Leads Clinical testing Drug

Page 15 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Experimental setup for HT-experimentation 2 different assays: ie. docking tools FlexX and AutoDock 5+3 (5+5) different targets from two families (Plm II and IV): 1lee, 1lf2, 1lf3, 1ls5 A and B chain, including 3 different crystal water setups 4 (2) different assay conditions, ie. parameter variations (place particles, overlap volume, 2 genetic algorithms) multiple point measuring, i.e. pose clustering and select 10 cluster centres 52 large scale experiments 26 million measurements 52 large scale experiments 26 million measurements 4 x x 10

Page 16 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Testing under Controlled Conditions redocking studies of 5 co-crystallized ligands and 9 known inhibitors mixed with ZINC compounds ZINC control test quality filter 2000

Page 17 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases EGEE, International Project of Grid Infrastructure Started in 2004, >70 partners in the world Project leader : CERN 6 scientific domains with >20 applications deployed 170 grid nodes, CPUs, several PetaBytes of data, jobs by day Countries with nodes contributing to the data challenge WISDOM

Page 18 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Simplified Grid Workflow 3000 floating licenses given by BioSolveIT to SCAI Maximum number of used licenses was 1008 StorageElement ComputingElement Site1 Site2 StorageElement User interface ComputingElement Compounds database Parameter settings Target structures Compounds sub lists Results Statistics Compounds list RessourceBroker Software FlexX License Server EGEE

Page 19 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases WISDOM Architecture GRID Grid services (RB, RLS…) Grid resources (CE, SE) Application components (Software, database) wisdom_install InstallerTester wisdom_test wisdom_execution Workload definition Job submission Job monitoring Job bookkeeping Fault tracking Fault fixing Job resubmission Set of jobs User wisdom_collect Accounting data Superviser wisdom_site wisdom_db License server

Page 20 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Objective of the WISDOM Development Objective Producing a large amount of data in a limited time with a minimal human cost during the data challenge. Need an optimized environment Limited time Performance goal Need a fault tolerant environment Grid is heterogeneous and dynamic Stress usage of the grid during the DC Need an automatic production environment Execution with the Biomedical Task Force Grid API are not fully adapted for a bulk use at a large scale

Page 21 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Number of Docked Ligands (millions) vs. Time (days) : Intensive submission of FlexX jobs with Chembridge ligands base 2: Resubmission 3: Intensive submission of FlexX jobs with drug like ligands base 4: Resubmission 5: Intensive submission of Autodock jobs with Chembridge ligands base 6: Resubmission

Page 22 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Number of Running and Waiting Jobs vs. Time

Page 23 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Total Amount of CPU Provided by EGEE Federation

Page 24 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Exploitation Metrics MetricsFlexX + Autodock phases Total CPU time80 years Number of jobs72751 Number of grid nodes58 Number of jobs running in parallel on the grid 1643 Volume of output data946 GB Volume of transferred data (input + output)6302 GB

Page 25 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Performance Metrics MetricsFlexX + Autodock phases Cumulated millions number of docked ligands 41,27 Number of docked ligands / h46475 Effective CPU time67,15 years Effective duration37 days Crunching factor662 Average transfer rate0,8 MB/s Peak rate62,1 MB/s

Page 26 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Efficiency Metrics (1/2) MetricsFlexX + Autodock phases Success rate77,0 % Success rate after results checking 46,2 % Success rate after results checking without WISDOM failures 63,0 % Efficiency depends on : Heterogeneous and dynamic nature of the grid Stress usage Automatic jobs (re)submission (sink-hole effect)

Page 27 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Efficiency Metrics (2/2) Successful jobs46 % Workload Management failure10 % - Overload, Disk failure - Misconfiguration, disk space problem - Air-conditioning, electrical cut Data Management failure4 % - Network / connection - Electrical cut - Unknown Sites failure9 % - Misconfiguration, Tar command, disk space - Information system update - Jobs number limitation in the waiting queue - Air-conditioning, Electrical cut Unclassified4 % - lost jobs - Unknown Server license failure23 % - Server failure - Electric cut - Server stop WISDOM failure4 %- Jobs distribution - Human failure - Script

Page 28 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Influence of the Assay Conditions with / without crystal waterdifferent pocketsparameter settings score correlation increasing

Page 29 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Different Docking Tools AutoDock: removal of internal stress term scaling of units AutoDock FlexX

Page 30 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Best Scoring Compounds for 1lee in Parameter Set 1, FlexX WISDOM , -48,6 WISDOM , -48,1 WISDOM , -47,9 WISDOM , -47,7 WISDOM , -45,6 WISDOM , -45,0 WISDOM , -43,6 WR100400, 0.11 μm

Page 31 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Lessons Learned and Conclusions Biological goal Top scoring compounds possess basic chemical groups like thiourea, guanidino, and amino acrolein as core structure. Identified compounds are non peptidic and low molecular weight compounds. More insights are needed and further analysis has to be done Check additional information resources Biomedical informatics goal WISDOM (Wide In-Silico Docking On Malaria) is the first large scale drug discovery initiative on an open grid infrastructure Docking tools are not black boxes There are many different variations of the same problem leading to different results Grid goal About 80 CPU years to produce TB of data Dont trust databases and forget flat files Some tools are still missing

Page 32 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Future Works Extension of in silico workflow Virtual docking service with further docking tools (qualitative comparisons) Molecular dynamics for reranking Setting up a relational results database Ligand similarity based clustering of results combinatorial library design (using combinatorial docking) A second data challenge is possible in 2006 (autumn) With the new EGEE middleware, gLite With a better quality process (efficiency, security…) Need to define target, docking software, compounds database finally in vitro testing and structure activity relationships

Page 33 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases The Challenges Related to Deployment in Africa Challenge in terms of infrastructure in Africa Bandwidth is a prerequisite Challenges in terms of technology Grid technology must provide the services for data and knowledge management Challenges in terms of human involvement Research laboratories and hospitals must have the expertise and the commitment It is time for an international initiative to address neglected diseases using grid infrastructures and volunteer computing

Page 34 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Acknowledgements: people Vinod Kumar Kasam Antje Wolf Astrid Maaß Mahendrakar Sridhar Horst Schwichtenberg people Matthieu Reichstadt Jean Salzemann Yannick Legré Florence Jacq cooperation partners