WISDOM project – Grid and neglected diseases CoreGRID Summer School 2006 Dr. Marc Zimmermann
Page 2 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Burden of Diseases in Developing World
Page 3 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Screen in culture Screen in animal models pharmacokinetics analysis High throughput screening InhibitorsValidated hitsLeads Drug candidate Iterative medicinal chemistry Optimise efficacy and pharmaceutical qualities Drug Discovery Process Target selection Validated? Robust assay system? Structural Information? Lead inhibitors? Product development Target More detailed potency/efficacy studies; pharmacokinetics; early toxicology Nwaka & Ridley 2003 Natural products/traditional medicines
Page 4 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Gaps in Drug R&D Tropical / Neglected Diseases Valid. drug targets New compounds Good animal models to assess safety and efficacy Good evaluation tools Clinical trials capacity in developing countries Capacity for uptake of new medicines Capacity for post- approval processes Research and discovery Preclinical development Phase IPhase IIPhase III Registration, launch, utilisation GAP I GAP II GAP III GAP IV
Page 5 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Potential Impact of an Distributed IT Infrastructure Speed-up the development of new drugs and vaccines in silico research on grids Collection of epidemiological data for research (modeling, molecular biology) Ease the deployment of clinical trials in endemic areas Improve disease monitoring Collect data on drug distribution and treatment follow-up Evaluate impact of policies and programs Improve alert and monitoring system for epidemics Ease integration of African research laboratories in world research Offer access to IT resources Offer access to data and services (telemedicine, life sciences)
Page 6 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases From a single PC to a Grid Farm of PCs Example: Novartis Examples: Example: EGEE Enterprise grid: Mutualization of resources in a company Volunteer computing: CPU cycles made available by PC owners Grid infrastructure: Internet + disk and storage resources + services for information management ( data collection, transfer and analysis)
Page 7 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Grid Impact on Drug Discovery Workflow down to Drug Delivery (1/2) Grids provide the necessary tools and data to identify new biological targets Bioinformatics services (database replication, workflow,...) Resources for CPU intensive tasks such as genomics comparative analysis, inverse docking,... Grids provide the resources to speed up lead discovery Large scale in silico docking to identify potentially promising compounds Molecular Dynamics computations to refine virtual screening and further assess selected compounds
Page 8 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Grid Impact on Drug Discovery Workflow down to Drug Delivery (2/2) Grids provide environments for epidemiology Federation of databases to collect data in endemic areas to study a disease and to evaluate impact of vaccine, vector control measures, Resources for data analysis and mathematical modeling Grids provide the services needed for clinical trials Federation of database to collect data in the centers participating to the clinical trials Grids provide the tools to monitor drug delivery Federation of database to monitor drug delivery
Page 9 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases WISDOM : Wide In Silico Docking On Malaria Biological goal Proposition of new inhibitors for a family of proteins produced by Plasmodium falciparum Biomedical informatics goal Deployment of in silico virtual docking on the grid Grid goal Deployment of a CPU consuming application generating large data flows to test the grid operation and services => data challenge
Page 10 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Introduction to the Disease : Malaria ~300 million people worldwide are affected million people die every year Widely spread Caused by protozoan parasites of the genus Plasmodium Complex life cycle with multiple stages
Page 11 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases There is a Real Need for New Drugs to Fight Malaria (WHO) Drug resistance has emerged for all classes of antimalarials except artemisinins. Resistance to chloroquine, the cheapest and the most used drug, is spreading in almost all the endemic countries. Resistance to the combination of sulfadoxine-pyrimethamine which was already present in South America and in South-East Asia is now emerging in East Africa (65% in Western Tanzania) All countries experiencing resistance to conventional monotherapies should use ACTs (artemisinin-based combination therapies) But there is even the threat of resistance to artemisinin too, as it is already observed in murine Plasmodium yoelii
Page 12 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Identification of New Antimalarial Targets The available drugs focus on a limited number of biological targets => cross-resistance to antimalarials With the advent of the plasmodium genome, many targets came into light The potential antimalarial drug targets are broadly classified into three categories: Targets involved in hemoglobin degradation (proteases like plasmepsins, falcipains) Targets involved in metabolism Targets engaged in membrane transport and signaling (choline transporter etc ). The present project WISDOM focuses on hemoglobin metabolism and especially on Plasmepsin II and Plasmepsin IV
Page 13 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Docking Dataflow hit crystal structure ligand data base junk placing the ligand Structure optimization Ranking Protein surface Ligand Water molecule Scoring
Page 14 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases High Throughput Virtual Docking Chemical compounds (ZINC): Chembridge – 500,000 Drug like – 500,000 Targets (PDB): Plasmepsin II (1lee,1lf2,1lf3) Plasmepsin IV (1ls5) Millions of chemical compounds available in laboratories High Throughput Screening 1-10$/compound, nearly impossible Molecular docking (FlexX, AutoDock) ~80 CPU years, 1 TB data Data challenge on EGEE ~6 weeks on ~1700 computers Hits screening using assays performed on living cells Leads Clinical testing Drug
Page 15 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Experimental setup for HT-experimentation 2 different assays: ie. docking tools FlexX and AutoDock 5+3 (5+5) different targets from two families (Plm II and IV): 1lee, 1lf2, 1lf3, 1ls5 A and B chain, including 3 different crystal water setups 4 (2) different assay conditions, ie. parameter variations (place particles, overlap volume, 2 genetic algorithms) multiple point measuring, i.e. pose clustering and select 10 cluster centres 52 large scale experiments 26 million measurements 52 large scale experiments 26 million measurements 4 x x 10
Page 16 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Testing under Controlled Conditions redocking studies of 5 co-crystallized ligands and 9 known inhibitors mixed with ZINC compounds ZINC control test quality filter 2000
Page 17 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases EGEE, International Project of Grid Infrastructure Started in 2004, >70 partners in the world Project leader : CERN 6 scientific domains with >20 applications deployed 170 grid nodes, CPUs, several PetaBytes of data, jobs by day Countries with nodes contributing to the data challenge WISDOM
Page 18 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Simplified Grid Workflow 3000 floating licenses given by BioSolveIT to SCAI Maximum number of used licenses was 1008 StorageElement ComputingElement Site1 Site2 StorageElement User interface ComputingElement Compounds database Parameter settings Target structures Compounds sub lists Results Statistics Compounds list RessourceBroker Software FlexX License Server EGEE
Page 19 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases WISDOM Architecture GRID Grid services (RB, RLS…) Grid resources (CE, SE) Application components (Software, database) wisdom_install InstallerTester wisdom_test wisdom_execution Workload definition Job submission Job monitoring Job bookkeeping Fault tracking Fault fixing Job resubmission Set of jobs User wisdom_collect Accounting data Superviser wisdom_site wisdom_db License server
Page 20 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Objective of the WISDOM Development Objective Producing a large amount of data in a limited time with a minimal human cost during the data challenge. Need an optimized environment Limited time Performance goal Need a fault tolerant environment Grid is heterogeneous and dynamic Stress usage of the grid during the DC Need an automatic production environment Execution with the Biomedical Task Force Grid API are not fully adapted for a bulk use at a large scale
Page 21 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Number of Docked Ligands (millions) vs. Time (days) : Intensive submission of FlexX jobs with Chembridge ligands base 2: Resubmission 3: Intensive submission of FlexX jobs with drug like ligands base 4: Resubmission 5: Intensive submission of Autodock jobs with Chembridge ligands base 6: Resubmission
Page 22 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Number of Running and Waiting Jobs vs. Time
Page 23 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Total Amount of CPU Provided by EGEE Federation
Page 24 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Exploitation Metrics MetricsFlexX + Autodock phases Total CPU time80 years Number of jobs72751 Number of grid nodes58 Number of jobs running in parallel on the grid 1643 Volume of output data946 GB Volume of transferred data (input + output)6302 GB
Page 25 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Performance Metrics MetricsFlexX + Autodock phases Cumulated millions number of docked ligands 41,27 Number of docked ligands / h46475 Effective CPU time67,15 years Effective duration37 days Crunching factor662 Average transfer rate0,8 MB/s Peak rate62,1 MB/s
Page 26 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Efficiency Metrics (1/2) MetricsFlexX + Autodock phases Success rate77,0 % Success rate after results checking 46,2 % Success rate after results checking without WISDOM failures 63,0 % Efficiency depends on : Heterogeneous and dynamic nature of the grid Stress usage Automatic jobs (re)submission (sink-hole effect)
Page 27 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Efficiency Metrics (2/2) Successful jobs46 % Workload Management failure10 % - Overload, Disk failure - Misconfiguration, disk space problem - Air-conditioning, electrical cut Data Management failure4 % - Network / connection - Electrical cut - Unknown Sites failure9 % - Misconfiguration, Tar command, disk space - Information system update - Jobs number limitation in the waiting queue - Air-conditioning, Electrical cut Unclassified4 % - lost jobs - Unknown Server license failure23 % - Server failure - Electric cut - Server stop WISDOM failure4 %- Jobs distribution - Human failure - Script
Page 28 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Influence of the Assay Conditions with / without crystal waterdifferent pocketsparameter settings score correlation increasing
Page 29 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Different Docking Tools AutoDock: removal of internal stress term scaling of units AutoDock FlexX
Page 30 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Best Scoring Compounds for 1lee in Parameter Set 1, FlexX WISDOM , -48,6 WISDOM , -48,1 WISDOM , -47,9 WISDOM , -47,7 WISDOM , -45,6 WISDOM , -45,0 WISDOM , -43,6 WR100400, 0.11 μm
Page 31 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Lessons Learned and Conclusions Biological goal Top scoring compounds possess basic chemical groups like thiourea, guanidino, and amino acrolein as core structure. Identified compounds are non peptidic and low molecular weight compounds. More insights are needed and further analysis has to be done Check additional information resources Biomedical informatics goal WISDOM (Wide In-Silico Docking On Malaria) is the first large scale drug discovery initiative on an open grid infrastructure Docking tools are not black boxes There are many different variations of the same problem leading to different results Grid goal About 80 CPU years to produce TB of data Dont trust databases and forget flat files Some tools are still missing
Page 32 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Future Works Extension of in silico workflow Virtual docking service with further docking tools (qualitative comparisons) Molecular dynamics for reranking Setting up a relational results database Ligand similarity based clustering of results combinatorial library design (using combinatorial docking) A second data challenge is possible in 2006 (autumn) With the new EGEE middleware, gLite With a better quality process (efficiency, security…) Need to define target, docking software, compounds database finally in vitro testing and structure activity relationships
Page 33 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases The Challenges Related to Deployment in Africa Challenge in terms of infrastructure in Africa Bandwidth is a prerequisite Challenges in terms of technology Grid technology must provide the services for data and knowledge management Challenges in terms of human involvement Research laboratories and hospitals must have the expertise and the commitment It is time for an international initiative to address neglected diseases using grid infrastructures and volunteer computing
Page 34 Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases Acknowledgements: people Vinod Kumar Kasam Antje Wolf Astrid Maaß Mahendrakar Sridhar Horst Schwichtenberg people Matthieu Reichstadt Jean Salzemann Yannick Legré Florence Jacq cooperation partners