1
EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org
EGEE and gLite are registered trademarks
World-wide in silico drug discovery against neglected and emerging diseases on grid infrastructures
Nicolas Jacq, HealthGrid Association, France
Credit: WISDOM initiative
2
Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Jacq, 16.04.2007
Content
– Overview of the WISDOM application
– Deployment on the EGEE grid and experience
– Conclusion
3
WISDOM
WISDOM (http://wisdom.healthgrid.org/)
– Developing new drugs for neglected and emerging diseases, with a particular focus on malaria
– Reduced R&D costs and accelerated R&D for emerging and neglected diseases
Three large calculations:
– WISDOM-I (Summer 2005)
– Avian Flu (Spring 2006)
– WISDOM-II (Autumn 2006)
4
In silico drug discovery presents unique challenges for information technologists and computer scientists
[Figure labels: Drug discovery; In silico drug discovery; Clinical Phases (I-III)]
5
Docking: predict how small molecules bind to a receptor of known 3D structure
Simplified virtual screening process by docking
Successful examples
– rapid
– cost effective…
But there are limitations
– Need for CPU and storage
6
Grid-enabled high throughput virtual screening by docking
– A few target structures
– Millions of chemical compounds
– Docking software
– 1 to 30 min per docking
– A few MB per output
– 100 CPU years, 1 TB
Large scale deployment on grid infrastructure
Challenges:
– Speed up the process
– Manage the data
7
Example: In silico drug discovery on avian flu
The goal is to study in silico the impact of selected point mutations on the efficiency of existing drugs, and to find new potential drugs
A collaboration of 5 grid projects: Auvergrid, BioinfoGrid, EGEE-II, Embrace, TWGrid
Significant parameters:
– 1 docking software: Autodock
– 8 conformations of the target (N1 neuraminidase)
– 300,000 selected compounds
– 105 CPU years to dock all conformations against all compounds
Timescale:
– First contacts: March 1st 2006
– Kick-off: April 1st 2006
– Duration: 6 weeks
[Figure: N1H5. Credit: Y-T Wu]
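As a rough sanity check on the figures above, the sketch below derives the number of docking runs and the mean CPU time per docking. It assumes that every conformation is docked against every compound as one independent run, and that a CPU year is 365.25 days; neither assumption is stated on the slide, and the function names are illustrative.

```python
# Figures quoted on the avian flu slide.
CONFORMATIONS = 8
COMPOUNDS = 300_000
CPU_YEARS = 105

# Assumption: one CPU year = 365.25 days of a single CPU.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def total_dockings(conformations: int, compounds: int) -> int:
    """Number of runs if each conformation is docked against each compound."""
    return conformations * compounds


def mean_minutes_per_docking(cpu_years: float, dockings: int) -> float:
    """Average CPU minutes spent on a single docking run."""
    return cpu_years * MINUTES_PER_YEAR / dockings


dockings = total_dockings(CONFORMATIONS, COMPOUNDS)
mean_minutes = mean_minutes_per_docking(CPU_YEARS, dockings)
print(dockings)                 # 2400000
print(round(mean_minutes, 1))   # 23.0
```

Under these assumptions the mean of about 23 minutes per docking falls inside the 1 to 30 minute range quoted for a single docking.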
8
Results
Data challenge       Targets              Compounds  CPU-years  Duration (wk)  Max. CPUs  Size of results (TB)
WISDOM-I (Q3’05)     Plasmepsin           1M         80         6              1,700      1
Avian Flu (Q2’06)    H5N1                 300k       105        6              1,700      0.750
WISDOM-II (Q4’06)    GST, DHFR, Tubulin   4.2M       420        8              5,000      2
9
Example: In silico results from avian flu data challenge
5 out of 6 known effective inhibitors can be identified in the first 15% of the ranking and in the first 5% reranked (2,250 compounds)
– Enrichment = 5.5 and 111 (<1 in most cases)
Most known effective inhibitors lose their affinity in binding with a mutated target
[Figure labels: Original type; E119A mutated type; 15% cut-off; GNA 2.4%; E119A 11.5%; GNA 11.5%]
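The two enrichment values can be reproduced with the usual enrichment-factor ratio: the fraction of known actives recovered, divided by the fraction of the library that had to be inspected. The sketch below assumes this definition, assumes the library is the 300,000 compounds quoted for the avian flu challenge, and assumes the same 5 inhibitors are still recovered after reranking; the function name is illustrative.

```python
def enrichment_factor(actives_found: int, actives_total: int,
                      fraction_screened: float) -> float:
    """Fraction of actives recovered divided by fraction of library inspected."""
    if not 0 < fraction_screened <= 1:
        raise ValueError("fraction_screened must be in (0, 1]")
    return (actives_found / actives_total) / fraction_screened


LIBRARY_SIZE = 300_000  # compounds in the avian flu challenge

# First 15% of the ranking.
ef_top15 = enrichment_factor(5, 6, 0.15)

# First 5% of that subset after reranking: 2,250 compounds.
ef_reranked = enrichment_factor(5, 6, 2_250 / LIBRARY_SIZE)

print(f"{ef_top15:.2f}")     # 5.56
print(f"{ef_reranked:.1f}")  # 111.1
```

The first value is about 5.56, which the slide quotes as 5.5; the second matches the quoted 111.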
10
Example: In vitro results from avian flu data challenge
Experimental assays confirmed 7 actives out of 123 purchased “potential hits” (interacting complexes with higher affinities and proper docked poses), which demonstrates the usefulness of the work.
[Figure label: NA]
11
Content
– Overview of the WISDOM application
– Deployment on the EGEE grid and experience
– Conclusion
12
Requirements for a large scale deployment on grid
– Adaptation of the application to the grid
– Access to a large infrastructure providing maintained resources
– Use of a production system providing automated and fault-tolerant job and file management
13
Adaptation of the application to the grid
– The applications are not designed for grid computing.
– The application code cannot be modified.
– A common strategy is to split the application into shorter tasks.
– License management for commercial software is not yet adapted to large infrastructures.
14
Access to a large infrastructure (1/3)
– A resource estimation is needed before the deployment
– The application package requires installation (and testing)
– Efficient and responsive user support for the infrastructure is required
15
Access to a large infrastructure (2/3): the EGEE infrastructure
[Figure: Real Time Monitor]
EGEE added value:
– Large computing and storage resources (>30,000 CPUs, 50 PB)
– 24 hours a day availability of resources
– User support
– Job and Data Management
– Information and Monitoring
– Security
Limitations for life science applications
– Short jobs
– Data confidentiality
– Reliability of services
– …
16
Access to a large infrastructure (3/3): Biomedical Virtual Organization status
Biomed VO leader: V. Breton
~80 participants, see http://egeena4.lal.in2p3.fr
Three active subgroups
– Medical imaging (J. Montagnat)
– Bioinformatics (C. Blanchet)
– Drug discovery (V. Breton)
Biomedical VO manager: Y. Legré, legre@clermont.in2p3.fr
See http://cic.in2p3.fr (VO information, publication of data challenge…)
1 VOMS server, 1 LFC, +20 RBs
+100 CEs, +8,000 CPUs (but many users)
+110 SEs, ~tens of TB available on disk
27 countries
17
Use of a production system
Managing thousands of jobs and files by hand is a labor-intensive task
– Job preparation, submission and monitoring, output retrieval, failure identification and resolution, job resubmission…
The rate of submitted jobs must be carefully monitored
– In order to avoid overloading the Resource Brokers
– In order to use the resources efficiently
The amount of transferred data impacts grid performance
– The data must be installed on the grid
– Store subsets of the database instead of large single compound files
The grid process introduces significant delays
– The submitted jobs must be sufficiently long in order to reduce the impact of this middleware overhead
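The two data-related points above (store subsets of the compound database, and keep jobs long enough to amortize middleware overhead) can be combined when sizing jobs. The following is a minimal sketch and not the WISDOM production code: the function names, the 20-hour target and the 20-minute per-docking estimate are all hypothetical values chosen for illustration.

```python
from typing import Iterator, List, Sequence


def compounds_per_job(target_job_minutes: float,
                      minutes_per_docking: float) -> int:
    """How many compounds one job should dock to reach the target duration."""
    if target_job_minutes <= 0 or minutes_per_docking <= 0:
        raise ValueError("durations must be positive")
    return max(1, int(target_job_minutes // minutes_per_docking))


def split_into_subsets(compounds: Sequence[str],
                       subset_size: int) -> Iterator[List[str]]:
    """Yield consecutive subsets of the compound library, one per job."""
    if subset_size < 1:
        raise ValueError("subset_size must be at least 1")
    for start in range(0, len(compounds), subset_size):
        yield list(compounds[start:start + subset_size])


# Hypothetical library of 1,000 compound identifiers.
library = [f"compound_{i:04d}" for i in range(1000)]

# Hypothetical sizing: 20-hour jobs, about 20 minutes per docking.
size = compounds_per_job(target_job_minutes=20 * 60, minutes_per_docking=20)
subsets = list(split_into_subsets(library, size))

print(size)              # 60
print(len(subsets))      # 17
print(len(subsets[-1]))  # 40
```

Larger subsets mean fewer, longer jobs and less submission overhead, at the cost of losing more work when a single job fails.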
18
Use of a production system
Other production systems from HEP experiments on EGEE
– The ATLAS production system - The ATLAS experiment
– BOSS and CRAB - The CMS experiment
– AliEn - The ALICE experiment
– DIRAC - The LHCb experiment
– DIANE - CERN
– Ganga, a user interface
– GridICE and MonALISA, two monitoring services for users
19
Schema of the WISDOM production environment
[Figure: diagram of the production environment. Recoverable labels:]
– User Interface: WISDOM production system (submits the jobs, checks job status, resubmits)
– WMS
– CEs & WNs: FlexX job
– SEs: structure file and compounds file (inputs), output file (outputs)
– DMS
– Local server: FLEXlm license
– HealthGrid server: web site, WISDOM DB, output DB (docking information, statistics)
20
A huge international effort for WISDOM-II
– Significant contributions from EELA, EUMedGRID and EUChinaGRID
– Over 420 CPU years in 10 weeks
– A record throughput of 100,000 docked compounds per hour
– WISDOM calculations used FlexX from BioSolveIT (6k free, floating licenses)
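To put "420 CPU years in 10 weeks" in perspective, the figure can be converted into an average number of CPUs kept busy over the whole period. This is a back-of-the-envelope sketch assuming a CPU year of 365.25 days; since the slide says "over 420", the result is a lower bound. The function name is illustrative.

```python
DAYS_PER_YEAR = 365.25  # assumption: one CPU year = 365.25 CPU days


def average_busy_cpus(cpu_years: float, duration_weeks: float) -> float:
    """Average number of CPUs working continuously over the period."""
    if duration_weeks <= 0:
        raise ValueError("duration_weeks must be positive")
    cpu_days = cpu_years * DAYS_PER_YEAR
    return cpu_days / (duration_weeks * 7)


wisdom2_average = average_busy_cpus(cpu_years=420, duration_weeks=10)
print(round(wisdom2_average))  # 2192
```

Roughly 2,200 CPUs were therefore busy on average throughout the ten weeks, under the stated assumption.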
21
Origin of failures during the WISDOM-I deployment
Category                                   Rate   Reasons
Success rate after checking output data    46%
Server license failure                     23%    Server failure; power cut; server stop
WISDOM failure                             4%     Job distribution; human error; script failure
Workload Management failure                10%    Overload, disk failure; mis-configuration, disk space problem; air-conditioning, power cut
Data Management failure                    4%     Network / connection; power cut; other unknown causes
Sites failure                              9%     Mis-configuration, tar command, disk space; information system update; job number limitation in the waiting queue; air-conditioning, electrical cut
Unclassified                               4%     Lost jobs; other unknown causes
Grid success rate: 63% (after subtracting license server and WISDOM failures)
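The 63% grid success rate follows from the table if the license server and WISDOM failures are removed from the denominator, so that only jobs whose outcome depended on the grid itself are counted. The sketch below assumes that interpretation; the dictionary keys are illustrative names for the table rows.

```python
# Rates in percent of all submitted jobs, read from the table.
rates = {
    "success": 46,
    "license_server_failure": 23,
    "wisdom_failure": 4,
    "workload_management_failure": 10,
    "data_management_failure": 4,
    "site_failure": 9,
    "unclassified": 4,
}


def grid_success_rate(rates: dict) -> float:
    """Success rate among jobs not lost to license server or WISDOM failures."""
    non_grid = rates["license_server_failure"] + rates["wisdom_failure"]
    grid_related_total = sum(rates.values()) - non_grid
    return rates["success"] / grid_related_total


total = sum(rates.values())
rate = grid_success_rate(rates)
print(total)           # 100
print(f"{rate:.0%}")   # 63%
```

That is 46 successful jobs out of the 73 per hundred that were not stopped by the license server or by WISDOM itself.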
22
Success rates of the deployments
WISDOM-I
– User success rate: 46%
  License server is a bottleneck
– Grid success rate: 63%
  Heterogeneous and dynamic nature of the grid
  Power cut, air-conditioning, mis-configuration, overload…
  Stress usage
  Automatic job (re)submission (“sink-hole” effect)
WISDOM against avian flu
– Grid success rate: 80%
  Constant and slower job submission flow
  Manual control of resubmission process
  WISDOM fault-tolerance improved
  Grid reliability improved (Workload Management System)
23
Content
– Overview of the WISDOM application
– Deployment on the EGEE grid and experience
– Conclusion
24
Summary (1/2)
The experiments demonstrated that grid infrastructures have a tremendous capacity to mobilize very large CPU resources for well targeted goals during a significant period of time
– 1st large scale deployment of a life sciences application on a grid infrastructure
The deployments have been a very useful experience in identifying the limitations and bottlenecks of the EGEE infrastructure and middleware
Reliability is still the major issue for the WISDOM production system and the EGEE middleware
Large scale deployment still requires grid expertise
25
Summary (2/2)
The WISDOM data challenge has demonstrated that collaborative production grids can be used for steps in the drug discovery process
– 1st production of biochemical results on a grid infrastructure
The impact has significantly raised the interest of the research community in malaria.
Output data collection and presentation require improvements to speed up the post-docking analysis
– Storage of output metadata from the jobs in a relational database
– Access to this database and to the docking output files is required
26
Thank you
– To all members of the WISDOM collaboration for their contribution to the project
– To all grid nodes which committed resources and allowed the success of the initiative
– To all projects which supported the initiative by providing either computing resources or manpower to develop the WISDOM environment
– To BioSolveIT for offering up to 6,000 free licenses of FlexX