EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Building Grid-enabled Virtual Screening Service.

Presentation transcript:

EGEE-II INFSO-RI Enabling Grids for E-sciencE. EGEE and gLite are registered trademarks.
Building Grid-enabled Virtual Screening Service for Drug Discovery
Ying-Ta Wu (1) and Hurng-Chun Lee (2)
(1) Academia Sinica Genomic Research Center
(2) Academia Sinica Grid Computing Center (ASGC)

Outline
Avian flu drug analysis on the Grid
Developing a grid-enabled virtual screening service
The next large-scale virtual screening on avian flu

The virtual screening
300,000 chemical compounds: ZINC, chemical combinatorial library
Target (PDB): neuraminidase (8 structures)
Millions of chemical compounds available in laboratories
High-throughput screening: $2/compound, nearly impossible
Molecular docking (AutoDock): ~137 CPU years, 600 GB data
Data challenge on EGEE, AuverGrid, TWGrid: ~6 weeks on ~2000 computers
In vitro screening of 100 hits
Hits sorting and refining

1st large-scale avian flu virtual screening on the Grid
In 2006, a grid-enabled high-throughput screening against the H5N1 virus was performed
– Matching 300,000 ligands against 8 targets using AutoDock
– The computing requirement of 137 CPU years was tackled by a 6-week high-throughput screening (HTS) activity on EGEE, AuverGrid and TWGrid
– Two different computing models (WISDOM and DIANE) were adopted for submitting docking jobs, addressing two different user concerns (scalability and interactivity)
– The goal was to analyze the effectiveness of the known drugs against possible neuraminidase mutations
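
The scale quoted on this slide can be checked with a little arithmetic. The sketch below uses only the figures given above (300,000 ligands, 8 targets, 137 CPU years, 6 weeks); the 365.25-day year and all variable names are assumptions made for illustration.

```python
# Back-of-the-envelope check of the figures quoted on the slide.
LIGANDS = 300_000        # compounds screened
TARGETS = 8              # neuraminidase structures
CPU_YEARS = 137          # total computing requirement
WALL_WEEKS = 6           # duration of the data challenge

DAYS_PER_YEAR = 365.25   # assumption: average calendar year

dockings = LIGANDS * TARGETS
cpu_seconds = CPU_YEARS * DAYS_PER_YEAR * 24 * 3600
wall_seconds = WALL_WEEKS * 7 * 24 * 3600

# Average CPU time per docking, and the average number of CPUs that must
# be busy at all times to finish within the wall-clock window.
seconds_per_docking = cpu_seconds / dockings
average_busy_cpus = cpu_seconds / wall_seconds

print(f"dockings: {dockings:,}")
print(f"CPU time per docking: {seconds_per_docking / 60:.1f} min")
print(f"average busy CPUs: {average_busy_cpus:.0f}")
```

Under these assumptions each docking takes about 30 CPU minutes, and roughly 1,200 CPUs have to be busy on average for the whole six weeks.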

High-throughput screening using WISDOM
WISDOM: Wide In-Silico Docking On Malaria
The platform has been successfully tested in a previous challenge
A workflow of Grid job handling: automatic job submission, status check and report, error recovery
Push model job scheduling + batch mode job handling
[Diagram labels: User interface; Compounds database; Compounds list; Compounds sublists; Software; Parameter settings; Target structures; Site1 and Site2, each with a ComputingElement and a StorageElement; Results; Statistics]
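
The push model with batch handling described on this slide can be sketched as follows. This is a minimal illustration, not WISDOM code: `split_into_sublists`, `run_campaign` and `fake_submit` are hypothetical names, the compound identifiers are made up, and `fake_submit` stands in for a real grid submission so that the retry path can be exercised locally.

```python
def split_into_sublists(compounds, size):
    """Cut the full compound list into sublists of at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [compounds[i:i + size] for i in range(0, len(compounds), size)]


def run_campaign(sublists, targets, submit, max_attempts=3):
    """Push one batch job per (sublist, target) pair and resubmit failures."""
    done, abandoned = {}, []
    for index, sublist in enumerate(sublists):
        for target in targets:
            job_id = (index, target)
            for attempt in range(1, max_attempts + 1):
                ok, output = submit(sublist, target, attempt)
                if ok:
                    done[job_id] = output
                    break
            else:
                # Every attempt failed: report the job instead of looping forever.
                abandoned.append(job_id)
    return done, abandoned


def fake_submit(sublist, target, attempt):
    """Stand-in for a grid submission: the first attempt of some jobs fails."""
    if attempt == 1 and len(sublist) % 2 == 1:
        return False, None
    return True, [f"{ligand}@{target}" for ligand in sublist]


compounds = [f"ZINC{i:05d}" for i in range(10)]   # made-up identifiers
sublists = split_into_sublists(compounds, 3)      # sublist sizes: 3, 3, 3, 1
done, abandoned = run_campaign(sublists, ["T01", "T06"], fake_submit)
print(len(done), "jobs completed,", len(abandoned), "abandoned")
```

The submitter decides up front which work goes into which job, which is what distinguishes this push model from the pull model used by DIANE.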

Interactive screening using DIANE + GANGA
DIANE: Distributed Analysis Environment
An overlay system on top of a variety of distributed computing environments, taking care of all synchronization, communication and workflow management details on behalf of the application
A lightweight framework for parallel scientific applications in the master-worker model
Pull model job scheduling + interactive mode job handling with a flexible failure recovery mechanism
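
A pull-model master-worker loop of the kind described here can be sketched with Python threads. This is not the DIANE API: `dock`, `run_master` and the `flaky` parameter are hypothetical, local threads stand in for grid worker agents, and the simulated failure only illustrates how a failed task can be handed back to the queue.

```python
import queue
import threading


def dock(task):
    """Placeholder for one docking run (hypothetical, no real docking)."""
    ligand, target = task
    return f"{ligand}@{target}"


def run_master(task_list, n_workers=4, flaky=frozenset()):
    """Serve tasks from a shared queue; workers pull whenever they are idle."""
    tasks = queue.Queue()
    for task in task_list:
        tasks.put((task, 0))          # (task, number of failed attempts)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task, attempt = tasks.get_nowait()   # pull the next task
            except queue.Empty:
                return                               # nothing left to do
            try:
                if task in flaky and attempt == 0:
                    raise RuntimeError("simulated worker failure")
                value = dock(task)
            except RuntimeError:
                # Failure recovery: hand the task back so any worker can retry.
                tasks.put((task, attempt + 1))
                continue
            with lock:
                results[task] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


tasks = [(f"lig{i}", "T01") for i in range(20)]
results = run_master(tasks, flaky={tasks[3], tasks[7]})
print(len(results), "dockings completed")
```

Because workers ask for work only when they are free, fast workers naturally process more tasks than slow ones, which is one plausible reason a pull model distributes work efficiently.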

The grid statistics
Metric | WISDOM | DIANE
Total number of completed dockings | 2 * […] | […],585
Estimated duration on 1 CPU | 88.3 years | 16.7 years
Duration of the experiment | 6 weeks | 4 weeks
Cumulative number of Grid jobs | 54,[…] | […]
Max. number of concurrent CPUs | 2,[…] | […]
Crunching rate | […] | […]
Approximated distribution efficiency | 46% | 84%
Approximated throughput | 2 sec/docking | 10 sec/docking
~600 GBytes of docking results were produced and archived on the Grid
~83% of the jobs completed successfully according to the Grid Logging and Bookkeeping service; only ~70% of the results were actually stored on the Grid storage element

Enrichment of primary in silico HTS
[Figure labels: Original Type: T06; 15% cut-off; GNA 2.4% (GNA = zanamivir; 2qwe: zanamivir, a known drug); DAN 35%; 4AM 13%; pKd=5.3, pKd=7.3, pKd=7.5; Ki=4uM, Ki=150nM, Ki=1nM]
Five out of six known effective compounds can be identified in the first 15% of the ranking
E = (5/6)/15% ≈ 5.5 (< 1 in most cases)
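
The enrichment figure on this slide follows from a one-line formula: the fraction of known actives recovered divided by the fraction of the ranked list that was inspected. The function name and its argument checks below are illustrative assumptions; the inputs are the slide's own numbers.

```python
def enrichment_factor(actives_found, actives_total, fraction_screened):
    """Fraction of known actives recovered per fraction of the ranking inspected."""
    if actives_total <= 0:
        raise ValueError("actives_total must be positive")
    if not 0 < fraction_screened <= 1:
        raise ValueError("fraction_screened must be in (0, 1]")
    return (actives_found / actives_total) / fraction_screened


# Slide values: 5 of 6 known effective compounds in the first 15% of the ranking.
ef = enrichment_factor(5, 6, 0.15)
print(f"enrichment factor: {ef:.2f}")
```

The expression evaluates to about 5.56, which the slide quotes as 5.5. A value of 1 would mean the ranking is no better than picking compounds at random.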

Mutation effects
300,000 x 15% = 45,000
45,000 x 5% = 2,250
AutoDock re-rank
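
The two-stage filtering on this slide is simple arithmetic, sketched below with integer percentages to avoid floating-point rounding. Reading the second stage as "keep the top 5% after the AutoDock re-rank" is an interpretation of the slide, and `funnel` is a hypothetical helper name.

```python
def funnel(initial, keep_percentages):
    """Return the number of compounds left after each successive cut."""
    sizes = [initial]
    for percent in keep_percentages:
        sizes.append(sizes[-1] * percent // 100)
    return sizes


# Slide values: keep the top 15%, then the top 5% of what remains.
stages = funnel(300_000, [15, 5])
print(stages)
```

Starting from 300,000 compounds this leaves 45,000 after the first cut and 2,250 after the second, matching the slide.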

Effects of point mutation
Most known effective inhibitors lose their affinity in binding with a mutated target
[Figure labels: E119A; T01; DNA; 4AM; 55%; 11.5%]
2qwe: 2.4% → 11.5%
1f8c: 13% → 55%

Q: How can we deliver a user-friendly service that integrates high-throughput virtual screening and data analysis?

Lessons learnt
The grid-enabled virtual screening application does benefit drug analysis in terms of money and time.
– 137 CPU years in 6 weeks using about 2000 grid worker nodes
– Primary HTS helps filter out 85% of compounds
– Global enrichment rate: 5.5
– Mutations do affect the efficiency of the known drugs and potential hits
Gaps between the current system and a real end-user application
– Lack of a well-annotated ligand database
– Lack of a friendly user interface to run the virtual screening process on the Grid
– Lack of an easy-to-use interface to access the produced docking results for further analysis
– Lack of an automatic refinement pipeline

GUI - first step to a real end-user application
[Screenshot labels: Job History; Interactive analysis; Progress monitoring]

Refinement pipeline

Common database
Chemical properties to better annotate the compounds
Results essential for further analysis are extracted and stored in a result database
Database access through AMGA
– for access control
– for data replication

Proposal of the 2nd data challenge
Proposed plan:
– Testing phase: May 2007
– Official launch: June 2007
Biology goals
– Further analysis of the effect of the mutations
– Further analysis of the open conformation of NA
Grid goals
– Improving the service usability
– Enabling the refinement pipeline
– Reducing researchers' effort in data analysis

Credit
Docking workflow preparation
– Contact point: Y.T. Wu
– E. Rovida
– P. D'Ursi
– N. Jacq
Grid resource management
– Contact point: J. Salzemann
– TWGrid: H.C. Lee, H.Y. Chen
– AuverGrid: E. Medernach
– EGEE: Y. Legré
Platform deployment on the Grid
– Contact point: H.C. Lee, J. Salzemann
– M. Reichstadt
– N. Jacq
Users (deputy)
– J. Salzemann (N. Jacq)
– M. Reichstadt (E. Medernach)
– L.Y. Ho (H.C. Lee)
– I. Merelli, C. Arlandini (L. Milanesi)
– J. Montagnat (T. Glatard)
– R. Mollon (C. Blanchet)
– I. Blanquer (D. Segrelles)
– D. Garcia

Mini Workshop
Tomorrow afternoon from 2 pm in Conference Room 4
Discussions on the collaboration issues of the 2nd avian flu data challenge

DIANE Directory Service
Improving the scalability of the DIANE framework
The Directory Service is a server containing a list of all the Masters
Each Master registers itself with the Directory Service
The Workers obtain a Master through the Directory Service
The Directory Service has an algorithm for load balancing of the Workers and prioritization of the Masters
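
The registration and lookup steps described on this slide can be sketched as a small in-memory registry. The slide does not say which load-balancing or prioritization algorithm is used, so the rule below (send each new Worker to the Master with the lowest workers-per-priority ratio) is purely an assumption, and `DirectoryService`, `MasterRecord` and the addresses are hypothetical names rather than DIANE interfaces.

```python
from dataclasses import dataclass


@dataclass
class MasterRecord:
    address: str
    priority: int = 1     # assumption: higher value means a larger share of Workers
    workers: int = 0      # Workers handed to this Master so far


class DirectoryService:
    """Keeps the list of registered Masters and hands them out to Workers."""

    def __init__(self):
        self._masters = {}

    def register_master(self, address, priority=1):
        if priority <= 0:
            raise ValueError("priority must be positive")
        self._masters[address] = MasterRecord(address, priority)

    def assign_worker(self):
        """Return the address of the Master a new Worker should contact."""
        if not self._masters:
            raise LookupError("no Master registered")
        # Assumed rule: pick the Master with the lowest load relative to its
        # priority; ties go to the Master that registered first.
        chosen = min(self._masters.values(),
                     key=lambda m: m.workers / m.priority)
        chosen.workers += 1
        return chosen.address


directory = DirectoryService()
directory.register_master("master-a:9000", priority=2)
directory.register_master("master-b:9000", priority=1)
assignments = [directory.assign_worker() for _ in range(6)]
print(assignments)
```

With these two Masters, the higher-priority one ends up with twice as many Workers, which illustrates one way load balancing and prioritization can be combined in a single rule.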