EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks EGEE Application Case Study: Distributed.

Slides:



Advertisements
Similar presentations
MapReduce Online Created by: Rajesh Gadipuuri Modified by: Ying Lu.
Advertisements

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks World-wide in silico drug discovery against.
INFSO-RI Enabling Grids for E-sciencE WISDOM mini-workshop Vincent Breton (CNRS-IN2P3, LPC Clermont-Ferrand) ISGC 2007 March 28th,
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
Using the WS-PGRADE Portal in the ProSim Project Protein Molecule Simulation on the Grid Tamas Kiss, Gabor Testyanszky, Noam.
KISTI’s Activities on the NA4 Biomed Cluster Soonwook Hwang, Sunil Ahn, Jincheol Kim, Namgyu Kim and Sehoon Lee KISTI e-Science Division.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Configuring and Maintaining EGEE Production.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Building Grid-enabled Virtual Screening Service.
INFSO-RI Enabling Grids for E-sciencE Logging and Bookkeeping and Job Provenance Services Ludek Matyska (CESNET) on behalf of the.
EGEE is a project funded by the European Union under contract IST Testing processes Leanne Guy Testing activity manager JRA1 All hands meeting,
Grid Technologies  Slide text. What is Grid?  The World Wide Web provides seamless access to information that is stored in many millions of different.
Protein Molecule Simulation on the Grid G-USE in ProSim Project Tamas Kiss Joint EGGE and EDGeS Summer School.
Contents 1.Introduction, architecture 2.Live demonstration 3.Extensibility.
INFSO-RI Enabling Grids for E-sciencE Workload Management System Mike Mineter
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Information System on gLite middleware Vincent.
INFSO-RI Enabling Grids for E-sciencE V. Breton, 30/08/05, seminar at SERONO Grid added value to fight malaria Vincent Breton EGEE.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
November SC06 Tampa F.Fanzago CRAB a user-friendly tool for CMS distributed analysis Federica Fanzago INFN-PADOVA for CRAB team.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Provenance Challenge gLite Job Provenance.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Using gLite API Vladimir Dimitrov IPP-BAS “gLite middleware Application Developers.
Enabling Grids for E-sciencE EGEE-III INFSO-RI Using DIANE for astrophysics applications Ladislav Hluchy, Viet Tran Institute of Informatics Slovak.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks WMSMonitor: a tool to monitor gLite WMS/LB.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks AMGA PHP API Claudio Cherubino INFN - Catania.
INFSO-RI Enabling Grids for E-sciencE Biomedical applications V. Breton, CNRS-IN2P3.
INFSO-RI Enabling Grids for E-sciencE In silico docking on EGEE infrastructure, the case of WISDOM Nicolas Jacq LPC of Clermont-Ferrand,
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Performance Improvements to BDII - Grid Information.
EGEE-II INFSO-RI Enabling Grids for E-sciencE WISDOM in EGEE-2, biomed meeting, 2006/04/28 WISDOM : Grid-enabled Virtual High Throughput.
Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Usage of virtualization in gLite certification Andreas Unterkircher.
Avian Flu Data Challenge Hsin-Yen Chen ASGC 29 Aug APAN24.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE User Forum, Manchester, 10 May ‘07 Nicola Venuti
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks INFSO-RI Enabling Grids for E-sciencE.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Services for advanced workflow programming.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Web Portal for Chemists M. Sterzel,
INFSO-RI Enabling Grids for E-sciencE EGEE Review WISDOM demonstration Vincent Bloch, Vincent Breton, Matteo Diarena, Jean Salzemann.
INFSO-RI Enabling Grids for E-sciencE A Grid Approach to Distributed Image Analysis for Early Diagnosis of Alzheimer Disease Livia.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Ricardo Rocha CERN (IT/GS) EGEE’08, September 2008, Istanbul, TURKEY Experiment.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Activités biomédicales dans EGEE-II Nicolas.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE Site Architecture Resource Center Deployment Considerations MIMOS EGEE Tutorial.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MSG - A messaging system for efficient and.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Using GStat 2.0 for Information Validation.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Progress on first user scenarios Stephen.
Development of e-Science Application Portal on GAP WeiLong Ueng Academia Sinica Grid Computing
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks WMPROXY usage Álvaro Fernández IFIC (CSIC)
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid2Win : gLite for Microsoft Windows Roberto.
Testing and integrating the WLCG/EGEE middleware in the LHC computing Simone Campana, Alessandro Di Girolamo, Elisa Lanciotti, Nicolò Magini, Patricia.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks CRAB: the CMS tool to allow data analysis.
INFSO-RI Enabling Grids for E-sciencE Use Case of gLite Services Utilization. Multiple Ligand Trajectory Docking Study Jan Kmuníček.
User Scenarios in VENUS-C Focus on Structural Analysis Ignacio Blanquer I3M - UPV.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Computational chemistry with ECCE on EGEE.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Introduction to P-GRADE Portal hands-on Miklos Kozlovszky MTA SZTAKI
EGEE-II INFSO-RI Enabling Grids for E-sciencE Storage Accounting for Grid Environments Fabio Scibilia INFN - Catania.
INFSO-RI Enabling Grids for E-sciencE Using of GANGA interface for Athena applications A. Zalite / PNPI.
EGI Technical Forum Amsterdam, 16 September 2010 Sylvain Reynaud.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Practical using WMProxy advanced job submission.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks gLite – UNICORE interoperability Daniel Mallmann.
Markus Frank (CERN) & Albert Puig (UB).  An opportunity (Motivation)  Adopted approach  Implementation specifics  Status  Conclusions 2.
EGEE-II INFSO-RI Enabling Grids for E-sciencE A Glance Towards the Future Mike Mineter Training Outreach and Education University.
INFSO-RI Enabling Grids for E-sciencE File Transfer Software and Service SC3 Gavin McCance – JRA1 Data Management Cluster Service.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Overview of gLite, the EGEE middleware Mike Mineter Training Outreach Education National.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Management Claudio Grandi.
1 An unattended, fault-tolerant approach for the execution of distributed applications Manuel Rodríguez-Pascual, Rafael Mayo-García CIEMAT Madrid, Spain.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
2 nd EGEE/OSG Workshop Data Management in Production Grids 2 nd of series of EGEE/OSG workshops – 1 st on security at HPDC 2006 (Paris) Goal: open discussion.
Enabling Grids for E-sciencE Agreement-based Workload and Resource Management Tiziana Ferrari, Elisabetta Ronchieri Mar 30-31, 2006.
Enabling Grids for E-sciencE Claudio Cherubino INFN DGAS (Distributed Grid Accounting System)
Module 01 ETICS Overview ETICS Online Tutorials
In silico docking on grid infrastructures
Presentation transcript:

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGEE Application Case Study: Distributed Drug Analysis Platform Hurng-Chun Lee Academia Sinica Grid Computing Centre (ASGC) Coordinator of Drug discovery in EGEE: Vincent Breton

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks General introduction

Enabling Grids for E-sciencE EGEE-II INFSO-RI Drug discovery at initial step Screening is the first measure to take for the biological activity of each compound in a large compound collection against an disease target. HTS: 10 4 – 10 5 cpd/day uHTS: >10 5 cpd/day “A needle in a haystack” How to reduced pre-screening cost$ ? The wet-lab cost –more than 1 year for assay development –US$ 2 for 1 compound –operational cost of the screening machine

Enabling Grids for E-sciencE EGEE-II INFSO-RI Computer-based modeling Modified from DDT vol. 3, 4, (1998) focused library screening focused library  hit rate *  cost To improve hit rate $ using the computer simulation

Enabling Grids for E-sciencE EGEE-II INFSO-RI How Grid helps ,000 Chemical compounds: ZINC Chemical combinatorial library Target (PDB) : Neuraminidase (8 structures) Millions of chemical compounds available in laboratories High Throughput Screening $2/compound, nearly impossible Molecular docking (Autodock) ~137 CPU years, 600 GB data Data challenge on EGEE, Auvergrid, TWGrid ~6 weeks on ~2000 computers Hits sorting and refining In vitro screening of 100 hits High-throughput screening using the Grid –large-scale and on-demand resources: shorten the response time to emergency disease –computer power is much cheaper: simulations could be repeated several times

Enabling Grids for E-sciencE EGEE-II INFSO-RI Computing model & workflow Fully distributable grid jobs (parametric-sweep type) Collective analysis on distributed results Interactive pipeline (run → analysis → run)

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Application architecture

Enabling Grids for E-sciencE EGEE-II INFSO-RI System architecture integrating EGEE grid services –AMGA, UI, WMS, SE, CE Client –light-weight, portable and application-oriented interfaces built on top of a same set of Java APIs application oriented job management –integrating existing approaches (DIANE, WISDOM) –application platform for team work

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Implementation details

Enabling Grids for E-sciencE EGEE-II INFSO-RI Approach 1: the WISDOM platformStorageElement ComputingElement Site1 Site2 Storage Elemen t User interface ComputingElement Compound s database Software Parameter settings Target structures Compounds sublists Results Statistics Compounds list Preparation activity 1. docking program deployment on grid computing elements 2. compound library deployment on grid storage elements Runtime activity 1. splitting and wrapping simulations in grid jobs 2. job management automation (submission, book-keeping through the grid WMS) 3. failure check and recovery (e.g. resubmission of failed simulations)

Enabling Grids for E-sciencE EGEE-II INFSO-RI Approach 2: the DIANE framework Preparation activity 1. DIANE framework installation on the grid user interface Runtime activity 1. a task queue holding the docking simulations (the DIANE master runs on the grid user interface) 2. grid job submission through the grid WMS (jobs for installing DIANE workers on the grid worker node) 3. establishing extra communication channel between DIANE workers and the DIANE master 4. DIANE workers run the following cycle until the task queue is emptied  pull simulation from the task queue ⇒ run simulation ⇒ return simulation result to the master 5. automatic load balancing and failure recovery

Enabling Grids for E-sciencE EGEE-II INFSO-RI Test of the 2 approaches WISDOMDIANE Total number of completed dockings2 * ,585 Estimated duration on 1 CPU88.3 years16.7 years Duration of the experience6 weeks4 weeks Cumulative number of the Grid jobs54, Max. number of concurrent CPUs2, Crunching rate Approximated distribution efficiency46%84% Approximated throughput2 sec/docking10 sec./ docking First data challenge for avian flu drug analysis –8 targets vs. 300,000 compounds (2,400,000 simulations) WISDOM wins in the throughput by submitting huge amount of grid jobs DIANE wins in the efficiency by controlling the utilization of the grid resources

Enabling Grids for E-sciencE EGEE-II INFSO-RI AMGA for result metadata User’s analysis model –giving me the compounds with docking energy lower than -9.0 eV with simulation results distributed on the Grid –user has to download all results from the SEs (~ bytes) –parsing the results one by one just a simple query on metadata: –SELECT compound_name IN simulation_results WHERE docking_energy < -9.0 AMGA aims to host the metadata of the simulation results –essential attributes of simulation results are extracted at simulation runtime and inserted into AMGA –user performs data analysis by AMGA queries –on-line metadata backup by AMGA’s replication feature

Enabling Grids for E-sciencE EGEE-II INFSO-RI GAP for integration and user interface GAP: Grid Application Platform developed by ASGC Historically it intended to integrate distributed resources and services to form a large bioinformatics computing facility in Taiwan. Now it’s grid enabled and supporting the EGEE grid Features: –3-tier architecture implementing the Model-View-Controller pattern in developing grid applications  with the flexibility to adopt new technology in each tier –service oriented approach fully implemented in Java  thin client ready to be integrated with end-user’s desktop tools

Enabling Grids for E-sciencE EGEE-II INFSO-RI The productive environment multiple DIANE instances deployed on multiple grid UIs (distributed on Asian sites) new WebServices interface of WISDOM AMGA stores the essential data from the simulation results GAP manages the simulations performed by distributed DIANE instances and WISDOM GAP provides a set of client APIs in pure Java

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks User interface

Enabling Grids for E-sciencE EGEE-II INFSO-RI command-line console (1) the main user interface based on BeanShell Java interpreter a set of BeanShell scripts wraps the client APIs user benefits by –application-oriented commands in configure and run applications –an interactive environment for building/testing developments with the client APIs

Enabling Grids for E-sciencE EGEE-II INFSO-RI command-line console (2) login(); –login the Grid Application Platform –initiate and delegate the grid proxy s = app(“DIANE_AUTODOCK”); –load an application instance s.loadJobs(jobs(“name”,”=”,”production”)); –bind historical production jobs with the application s.config(); –configure the application in an interactive and graphic way

Enabling Grids for E-sciencE EGEE-II INFSO-RI command-line console (3) s.run(); –generate and submit application jobs (high-level jobs) s.progress(); –check the overall progress of the simulation s.listSimulations(); –list all docking simulations s.viewComplex( ); –online visualization of the docking result of the given simulation id s.histogram([controls]); –draw the docking energy histogram

Enabling Grids for E-sciencE EGEE-II INFSO-RI command-line console (4) s.downloadSimulations(); –downloads all the simulation outputs (from many distributed storages) to local disk screens = s.getScreens([energy_threshold]); –gets the screen results with a given energy threshold configure simulationanalysis focused compound listnew simulation

Enabling Grids for E-sciencE EGEE-II INFSO-RI command-line console (5) jobs = s.getJobs(); –gets the job list created by (or associated with) the application –use j = jobs.get(0); to get the first job in the list more Job operations available... try online help: apidoc(j); collaboration by sharing jobs it = jobs.iterator(); while ( it.hasNext() ) { j = it.next(); j.addShareWithUser(“hclee”); j.addShareWithVO(“biomed”); } Virtual Organization Group Individual user

Enabling Grids for E-sciencE EGEE-II INFSO-RI Web portal Target selection Compound selection Docking parameter setter Energy table Complex visualization Re-using the same set of the GAP client APIs