EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks
World-wide in silico drug discovery against neglected and emerging diseases on grid infrastructures

World-wide in silico drug discovery against neglected and emerging diseases on grid infrastructures
Nicolas Jacq, HealthGrid Association, France
Credit: WISDOM initiative

Content
Overview of the WISDOM application
Deployment on the EGEE grid and experience
Conclusion

WISDOM
WISDOM (
– Developing new drugs for neglected and emerging diseases, with a particular focus on malaria
– Reduced R&D costs and accelerated R&D for emerging and neglected diseases
Three large calculations:
– WISDOM-I (Summer 2005)
– Avian Flu (Spring 2006)
– WISDOM-II (Autumn 2006)

In silico drug discovery presents unique challenges for information technologists and computer scientists
[Figure labels: Clinical Phases (I-III); Drug discovery; In silico drug discovery]

Docking: predict how small molecules bind to a receptor of known 3D structure
Simplified virtual screening process by docking
Successful examples
– rapid
– cost-effective…
But there are limitations
– Need for CPU and storage

Grid-enabled high-throughput virtual screening by docking
A few target structures
Millions of chemical compounds
1 to 30 min per docking
A few MB per output
100 CPU-years, 1 TB
Large-scale deployment on grid infrastructure
Challenges: speed up the process; manage the data
Docking software

Example: In silico drug discovery on avian flu
The goal is to study in silico the impact of selected point mutations on the efficiency of existing drugs and to find new potential drugs
A collaboration of 5 grid projects: Auvergrid, BioinfoGrid, EGEE-II, Embrace, TWGrid
Significant parameters:
– 1 docking software: Autodock
– 8 conformations of the target (N1 neuraminidase)
– 300,000 selected compounds
– 105 CPU-years to dock all configurations on all compounds
Timescale:
– First contacts: March 1st 2006
– Kick-off: April 1st 2006
– Duration: 6 weeks
[Figure: N1H5. Credit: Y-T Wu]
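The resource figures on this slide can be cross-checked with simple arithmetic. The sketch below is not part of the original slides; it assumes every compound is docked against every target conformation and uses a 365.25-day year. The function names are illustrative only.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes in an average year


def cpu_years(n_targets: int, n_compounds: int, minutes_per_docking: float) -> float:
    """CPU time, in years, to dock every compound against every target."""
    return n_targets * n_compounds * minutes_per_docking / MINUTES_PER_YEAR


def implied_minutes_per_docking(n_targets: int, n_compounds: int,
                                total_cpu_years: float) -> float:
    """Average docking time implied by a total CPU budget."""
    return total_cpu_years * MINUTES_PER_YEAR / (n_targets * n_compounds)


# Avian flu figures from the slide: 8 conformations, 300,000 compounds, 105 CPU-years.
dockings = 8 * 300_000
avg_minutes = implied_minutes_per_docking(8, 300_000, 105)
print(dockings, round(avg_minutes, 1))
```

Under these assumptions the slide's figures imply roughly 23 minutes per docking on average, which sits inside the "1 to 30 min per docking" range quoted earlier in the deck.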

Results
                    Targets             Compounds  CPU-years  Duration (wk)  Max. CPUs  Size of results (TB)
WISDOM-I (Q3’05)    Plasmepsin          1M         80         6              1,700      1
Avian Flu (Q2’06)   H5N1                300k       105        6              1,
WISDOM-II (Q4’06)   GST, DHFR, Tubulin  4.2M       420        8              5,000      2

Example: In silico results from avian flu data challenge
5 out of 6 known effective inhibitors can be identified in the first 15% of the ranking and in the first 5% reranked (2,250 compounds)
– Enrichment = 5.5 and 111 (<1 in most cases)
Most known effective inhibitors lose their affinity in binding with a mutated target
[Figure labels: GNA 2.4%; 15% cut off; E119A 11.5%; E119A mutated type; GNA 11.5%; Original type]
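The slide does not define "enrichment". A minimal sketch, assuming the usual virtual-screening definition (fraction of known actives recovered divided by fraction of the library selected) and a library of 300,000 compounds as stated for the avian flu challenge, approximately reproduces both quoted values. The function name is illustrative.

```python
def enrichment_factor(actives_found: int, actives_total: int,
                      n_selected: int, n_library: int) -> float:
    """Fraction of known actives recovered, divided by fraction of library selected."""
    return (actives_found / actives_total) / (n_selected / n_library)


library = 300_000
top_15_percent = library * 15 // 100   # first 15% of the ranking
reranked = top_15_percent * 5 // 100   # first 5% of that subset after reranking

ef_ranking = enrichment_factor(5, 6, top_15_percent, library)
ef_reranked = enrichment_factor(5, 6, reranked, library)
print(top_15_percent, reranked, round(ef_ranking, 2), round(ef_reranked, 1))
```

With these assumptions the reranked subset comes out at 2,250 compounds, matching the slide, and the two enrichment values are about 5.56 and 111.1, close to the 5.5 and 111 quoted.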

Example: In vitro results from avian flu data challenge
Experimental assay confirms 7 actives out of 123 purchased “potential hits” (interacting complexes with higher affinities and proper docked poses), which proved the usefulness of our work.
[Figure label: NA]

Content
Overview of the WISDOM application
Deployment on the EGEE grid and experience
Conclusion

Requirements for a large-scale deployment on grid
Adaptation of the application to the grid
Access to a large infrastructure providing maintained resources
Use of a production system providing automated and fault-tolerant job and file management

Adaptation of the application to the grid
The applications are not designed for grid computing.
The application code cannot be modified.
A common strategy is to split the application into shorter tasks.
License management for commercial software is not yet adapted to large infrastructures.
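Splitting into shorter tasks can be pictured as cutting the compound library into fixed-size subsets, each handed to an unmodified docking run. The sketch below is an illustration under that assumption, not the WISDOM implementation; the compound identifiers and the chunk size are placeholders.

```python
from typing import Iterator, List, Sequence


def split_into_tasks(compounds: Sequence[str], per_task: int) -> Iterator[List[str]]:
    """Yield consecutive subsets of the compound list, one subset per grid job."""
    if per_task < 1:
        raise ValueError("per_task must be at least 1")
    for start in range(0, len(compounds), per_task):
        yield list(compounds[start:start + per_task])


# Placeholder compound identifiers, for illustration only.
compounds = [f"cmpd_{i:04d}" for i in range(10)]
tasks = list(split_into_tasks(compounds, per_task=4))
print([len(task) for task in tasks])
```

The last subset is simply shorter when the library size is not a multiple of the chunk size, so no compound is dropped or docked twice.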

Access to a large infrastructure (1/3)
A resource estimation is needed before the deployment
The application package requires installation (and testing)
Efficient and responsive user support from the infrastructure is required

Access to a large infrastructure (2/3): the EGEE infrastructure
EGEE added value:
– Large computing and storage resources (>30,000 CPUs, 50 PB)
– 24-hours-a-day availability of resources
– User support
– Job and Data Management
– Information and Monitoring
– Security
Limitations for life science applications:
– Short jobs
– Data confidentiality
– Reliability of services
– …
[Figure: Real Time Monitor]

Access to a large infrastructure (3/3): Biomedical Virtual Organization status
Biomed VO leader: V. Breton
~80 participants, see
Three active subgroups
– Medical imaging (J. Montagnat)
– Bioinformatics (C. Blanchet)
– Drug discovery (V. Breton)
Biomedical VO manager: Y. Legré, see (VO information, publication of data challenge…)
1 VOMS server, 1 LFC, +20 RBs
+100 CEs, +8,000 CPUs (but many users)
+110 SEs, ~tens of TB available on disk
27 countries

Use of a production system
Managing thousands of jobs and files by hand is a labor-intensive task
– Job preparation, submission and monitoring, output retrieval, failure identification and resolution, job resubmission…
The rate of submitted jobs must be carefully monitored
– To avoid overloading the Resource Brokers
– To use the resources efficiently
The amount of transferred data impacts grid performance
– The data must be installed on the grid
– Storing subsets of the database instead of large unique compound files
The grid process introduces significant delays
– The submitted jobs must be sufficiently long to reduce the impact of this middleware overhead
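The point about job length can be made concrete with a small model: if each job pays a fixed middleware overhead, the useful fraction of its wall time is compute / (compute + overhead). The sketch below assumes that simple model; the 20-minute overhead and 10-minute docking time are illustrative values, not measurements from the slides.

```python
import math


def job_efficiency(compute_minutes: float, overhead_minutes: float) -> float:
    """Fraction of a job's wall time spent on useful computation."""
    return compute_minutes / (compute_minutes + overhead_minutes)


def dockings_per_job(target_efficiency: float, overhead_minutes: float,
                     minutes_per_docking: float) -> int:
    """Smallest number of dockings per job that reaches the target efficiency."""
    if not 0 < target_efficiency < 1:
        raise ValueError("target_efficiency must be strictly between 0 and 1")
    # From e = c / (c + o), the required compute time is c = e * o / (1 - e).
    needed_minutes = target_efficiency * overhead_minutes / (1 - target_efficiency)
    return math.ceil(needed_minutes / minutes_per_docking)


# Illustrative assumption: 20 min of overhead per job, 10 min per docking.
single = job_efficiency(10, 20)        # one docking per job
batch = dockings_per_job(0.75, 20, 10)  # dockings needed for 75% efficiency
print(round(single, 2), batch, job_efficiency(batch * 10, 20))
```

In this toy setting a job holding a single docking spends only a third of its time computing, while grouping six dockings per job brings the useful fraction to 75%.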

Use of a production system
Other production systems from HEP experiments on EGEE
– The ATLAS production system - The ATLAS experiment
– BOSS and CRAB - The CMS experiment
– Alien - The Alice experiment
– DIRAC - The LHCb experiment
– DIANE - CERN
– Ganga, a user interface
– GridICE and Monalisa, two monitoring services for users

Schema of the WISDOM production environment
[Figure labels: User Interface; WISDOM production system (submits the jobs, checks job status, resubmits); WMS; CEs & WNs (FlexX job); SEs (inputs: structure file, compounds file; outputs: output file); FLEXlm license server; DMS; local HealthGrid server (web site, WISDOM DB, output DB, docking information, statistics)]
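The "submits the jobs, checks job status, resubmits" cycle in the schema can be sketched as a loop with a bounded number of attempts. This is a hypothetical illustration, not the WISDOM code: `run_task` stands in for a real submit-and-check call to the grid middleware, and the cap on attempts is an assumption added here.

```python
from typing import Callable, Dict, List, Tuple


def run_with_resubmission(
    task_ids: List[str],
    run_task: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[List[str], List[str], Dict[str, int]]:
    """Submit every task, resubmitting failures until success or max_attempts."""
    attempts = {task: 0 for task in task_ids}
    done: List[str] = []
    pending = list(task_ids)
    while pending:
        still_pending = []
        for task in pending:
            attempts[task] += 1
            if run_task(task):            # hypothetical submit + status check
                done.append(task)
            elif attempts[task] < max_attempts:
                still_pending.append(task)  # queue for resubmission
        pending = still_pending
    abandoned = [task for task in task_ids if task not in done]
    return done, abandoned, attempts


# Scripted outcomes standing in for a real grid, for illustration only.
scripted = {
    "job-a": iter([True]),
    "job-b": iter([False, True]),
    "job-c": iter([False, False, False]),
}
done, abandoned, attempts = run_with_resubmission(
    list(scripted), lambda task: next(scripted[task]), max_attempts=3
)
print(done, abandoned, attempts)
```

Capping the attempts keeps a persistently failing task from being resubmitted forever, which is one simple way to limit the load that automatic resubmission places on the brokers.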

A huge international effort for WISDOM-II
Significant contributions from EELA, EUMedGRID and EUChinaGRID
Over 420 CPU-years in 10 weeks
A record throughput of 100,000 docked compounds per hour
WISDOM calculations used FlexX from BioSolveIT (6k free, floating licenses)

Origin of failures during the WISDOM-I deployment
                                          Rate  Reasons
Success rate after checking output data   46%
License server failure                    23%   Server failure; power cut; server stop
WISDOM failure                            4%    Job distribution; human error; script failure
Workload Management failure               10%   Overload, disk failure; mis-configuration, disk space problem; air-conditioning, power cut
Data Management failure                   4%    Network / connection; power cut; other unknown causes
Sites failure                             9%    Mis-configuration, tar command, disk space; information system update; job number limitation in the waiting queue; air-conditioning, electrical cut
Unclassified                              4%    Lost jobs; other unknown causes
Grid success rate: 63%, after subtracting license server and WISDOM failures
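The 63% grid success rate follows from the other rates if "after subtracting" is read as removing the license server and WISDOM failures from the denominator. The sketch below works under that assumed reading; the dictionary keys are labels chosen here for illustration.

```python
# Outcome rates (percent of all jobs) from the WISDOM-I failure breakdown.
outcomes = {
    "success": 46,
    "license_server_failure": 23,
    "wisdom_failure": 4,
    "workload_management_failure": 10,
    "data_management_failure": 4,
    "site_failure": 9,
    "unclassified": 4,
}

total = sum(outcomes.values())
# Failures not attributable to the grid itself are excluded from the denominator.
non_grid = outcomes["license_server_failure"] + outcomes["wisdom_failure"]
grid_success_rate = 100 * outcomes["success"] / (total - non_grid)
print(total, round(grid_success_rate, 1))
```

The rates add up to 100%, and 46 out of the remaining 73 points gives about 63%, consistent with the figure on the slide.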

Success rates of the deployments
WISDOM-I
– User success rate: 46%
  License server is a bottleneck
– Grid success rate: 63%
  Heterogeneous and dynamic nature of the grid: power cut, air-conditioning, mis-configuration, overload…
  Stress usage
  Automatic job (re)submission (“sink-hole” effect)
WISDOM against avian flu
– Grid success rate: 80%
  Constant and slower job submission flow
  Manual control of the resubmission process
  WISDOM fault-tolerance improved
  Grid reliability improved (Workload Management System)

Content
Overview of the WISDOM application
Deployment on the EGEE grid and experience
Conclusion

Summary (1/2)
The experiments demonstrated that grid infrastructures have a tremendous capacity to mobilize very large CPU resources for well-targeted goals over a significant period of time
– 1st large-scale deployment of a life sciences application on a grid infrastructure
The deployments have been a very useful experience in identifying the limitations and bottlenecks of the EGEE infrastructure and middleware
Reliability is still the major issue for the WISDOM production system and the EGEE middleware
Large-scale deployment still requires grid expertise

Summary (2/2)
The WISDOM data challenge has demonstrated that collaborative production grids can be used for steps in the drug discovery process
– 1st production of biochemical results on a grid infrastructure
The impact has significantly raised the interest of the research community on malaria.
Output data collection and presentation require improvements to speed up the post-docking analysis
– Storage of output metadata from the jobs in a relational database
– Access to this database and to the docking output files is required
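Storing job output metadata in a relational database could look like the sketch below, which uses Python's built-in `sqlite3` module with an in-memory database. The table layout, column names, scores and file paths are all hypothetical placeholders chosen for illustration, and the assumption that a lower score means a better docking is also made here, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE docking_result (
        compound_id TEXT NOT NULL,
        target      TEXT NOT NULL,
        score       REAL NOT NULL,   -- assumed: lower means better
        output_file TEXT NOT NULL,   -- where the full docking output is stored
        PRIMARY KEY (compound_id, target)
    )
    """
)

# Placeholder rows, not real results.
rows = [
    ("cmpd_0001", "target_A", -21.5, "out/cmpd_0001.txt"),
    ("cmpd_0002", "target_A", -12.0, "out/cmpd_0002.txt"),
    ("cmpd_0003", "target_A", -30.0, "out/cmpd_0003.txt"),
]
conn.executemany("INSERT INTO docking_result VALUES (?, ?, ?, ?)", rows)

# Post-docking analysis: the two best-scoring compounds for one target.
best = conn.execute(
    "SELECT compound_id, score FROM docking_result "
    "WHERE target = ? ORDER BY score ASC LIMIT 2",
    ("target_A",),
).fetchall()
print(best)
```

Keeping only metadata and a pointer to each output file in the database lets the ranking queries run without opening the docking output files themselves.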

Thank you
To all members of the WISDOM collaboration for their contribution to the project
To all grid nodes which committed resources and allowed the success of the initiative
To all projects which supported the initiative by providing either computing resources or manpower to develop the WISDOM environment
To BioSolveIT for offering up to 6000 free licenses of FlexX