Presentation is loading. Please wait.

Presentation is loading. Please wait.

EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Building Grid-enabled Virtual Screening Service.

Similar presentations


Presentation on theme: "EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Building Grid-enabled Virtual Screening Service."— Presentation transcript:

1 EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Building Grid-enabled Virtual Screening Service for Drug Discovery Ying-Ta Wu 1 and Hurng-Chun Lee 2 1 Academia Sinica Genomic Research Center 2 Academia Sinica Grid Computing Center (ASGC)

2 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Outlines Avian flu drug analysis on the Grid Developing grid-enabled virtual screening service The next large-scale virtual screening on avian flu

3 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 The virtual screening 300,000 Chemical compounds: ZINC Chemical combinatorial library Target (PDB) : Neuraminidase (8 structures) Millions of chemical compounds available in laboratories High Throughput Screening $2/compound, nearly impossible Molecular docking (Autodock) ~137 CPU years, 600 GB data Data challenge on EGEE, Auvergrid, TWGrid ~6 weeks on ~2000 computers In vitro screening of 100 hits Hits sorting and refining

4 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 1 st large-scale avian flu virtual screening on the Grid In 2006, a grid-enabled high-throughput screening against the H5N1 virus was performed –Matching 300,000 ligands against 8 targets using AutoDock –The computing requirement of 137 CPU years was tackled by the 6-weeks high-throughput screening (HTS) activity on EGEE, AuverGrid and TWGrid –Two different computing models (WISDOM and DIANE) were adopted for submitting docking jobs concerning two different user aspects (scalability and interactivity) –The goal is to analyze the efficiency of the known drugs to the possible Neuraminidases mutations

5 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 High-throughput screening using WISDOM WISDOM: Wide In-Silico Docking On Malaria The platform has been successfully tested in previous challenge a workflow of Grid job handling: automatic job submission, status check and report, error recovery push model job scheduling + batch mode job handling StorageElement ComputingElement Site1 Site2 StorageElement User interface ComputingElement Compound s database Software Parameter settings Target structures Compounds sublists Results Statistics Compounds list

6 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Interactive screening using DIANE + GANGA DIANE: Distributed Analysis Environment An overlay system on top of a variety of distributed computing environment, taking care of all synchronization, communication and workflow management details on behalf of application A lightweight framework for parallel scientific applications in master-worker model Pull model job scheduling + interactive mode job handling with flexible failure recovery mechanism

7 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 The grid statistics WISDOMDIANE Total number of completed dockings2 * 10 6 308,585 Estimated duration on 1 CPU88.3 years16.7 years Duration of the experience6 weeks4 weeks Cumulative number of the Grid jobs54,0002580 Max. number of concurrent CPUs2,000240 Crunching rate912203 Approximated distribution efficiency46 %84 % Approximated throughput2 sec/docking10 sec./ docking ~600 GBytes of docking results are produced and archived on the Grid ~83% were successfully completed according to the Grid Logging and Bookkeeping; only ~70% of results were really produced on the Grid storage element

8 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 GNA 2.4% 15% cut off GNA=zanamivir Original Type: T06 DAN 35% 4AM 13% pKd=5.3 pKd=7.3 pKd=7.5 Ki=4uM Ki=150nM Ki=1nM 2qwe: Zanamivir (known drug) five out of six known effective compounds can be identified in the first 15% of the ranking E = (5/6)/15% = 5.5 (< 1 in most cases) Enrichment of primary in silico HTS

9 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Mutation effects 300,000 x15% = 45,000 45,000 x 5% = 2,250 autodock re-rank

10 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Effects of point mutation Most known effective inhibitors lose their affinity in binding with a mutated target E119A T01 DNA 4AM 55% 11.5% 2qwe: 2.4%  11.5% 1f8c: 13%  55%

11 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Q: How to deliver an user-friendly service integrating the high-throughput virtual screening and the data analysis?

12 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Lessons learnt The grid-enabled virtual screening application does benefit the drug analysis in terms of money and time. –137 CPU years in 6 weeks using about 2000 grid worker nodes –Primary HTS helps filter out 85% of compounds –Global enrichment rate: 5.5 –Mutations do affect the efficiency of the know drugs and potential hits Gaps between the current system and a real end-user application –Lack of a well-annotated ligand database –Lack of a friendly user interface to run the virtual screening process on the Grid –Lack of an easy-to-use interface to access the produced docking results for further analysis –Lack of an automatic refinement pipeline

13 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 GUI - first step to real end-user application Job History Interactive analysis Progress monitoring

14 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Refinement pipeline

15 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Common database Chemical properties to better annotate the compounds Results essential for further analysis are extracted and stored in a result database Database access through AMGA –for access control –for data replication

16 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Proposal of the 2nd data challenge Proposed plan: –Testing phase: May, 2007 –Official launch: June, 2007 Biology goals –Further analysis on the effect of the mutations –Further analysis on the open conformation of NA Grid goals –Improving the service usability –Enabling the refinement pipeline –Reducing researchers’ effort in data analysis

17 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Credit Docking workflow preparation –Contact point: Y.T. Wu –E. Rovida –P. D'Ursi –N. Jacq Grid resource management –Contact point: J. Salzemann –TWGrid : H.C. Lee, H. Y. Chen –AuverGrid : E. Medernach –EGEE : Y. Legré Platform deployment on the Grid –Contact point: H.C. Lee, J. Salzemann –M. Reichstadt –N. Jacq Users (deputy) –J. Salzemann (N. Jacq) –M. Reichstadt (E. Medernach) –L. Y. Ho (H. C. Lee) –I. Merelli, C. Arlandini (L. Milanesi) –J. Montagnat (T. Glatard) –R. Mollon (C. Blanchet) –I. Blanque (D. Segrelles) –D. Garcia

18 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Mini Workshop tomorrow afternoon from 2 pm at Conference Room 4 Discussions on the collaboration issues of the 2nd avian flu data challenge

19 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 DIANE Directory Service Improving the scalability of the DIANE framework The Directory Service is a server containing a list of all the masters The Master register itself to the Directory Service The Workers obtain a Master through the Directory Service Directory Service has an algorithm for the load balancing of the workers and prioritization of the masters


Download ppt "EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Building Grid-enabled Virtual Screening Service."

Similar presentations


Ads by Google