Published by Sheena Dalton, modified over 9 years ago
1
EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
Biomedical activities in EGEE-II
Nicolas Jacq, HealthGrid
nicolas.jacq@healthgrid.org
EGEE User Training, Clermont-Ferrand, 10-11 January 2007
2
Enabling Grids for E-sciencE, EGEE-II INFSO-RI-031688, Nicolas Jacq, HealthGrid, EGEE Training 10-11.01.2007
Content
Biomedical Virtual Organization status
– Credit: Christophe Blanchet, Johan Montagnat, Vincent Breton
WISDOM, an example of a biomedical application
– Credit: WISDOM collaboration
3
Biomedical Virtual Organization status
Biomed VO management
– Biomed VO leader: V. Breton
– Deputies: J. Montagnat and C. Blanchet
– ~80 participants
– http://egeena4.lal.in2p3.fr
Three active subgroups
– Medical imaging (J. Montagnat)
– Bioinformatics (C. Blanchet)
– Drug discovery (V. Breton)
Active relationships with EGEE-related projects and other EC projects
– BioinfoGRID
– Embrace
– EELA, EUChinaGRID, EUMedGrid
4
Medical Imaging
Services are available on EGEE for the medical imaging community
– Medical data management
– Workflow engines: Moteur, DAGMan
– Portals: P-GRADE, GENIUS
Several applications are in production mode
– Bronze standard, GATE, 3D MRI simulation, pharmacokinetics
– GPTM3D, Clinical Decision Support System
New applications are under development
– SEE++ strabismus surgery planning
– SPM-based early diagnosis of Alzheimer's disease
– FreeSurfer-based brain image analysis
Contact: johan@i3s.unice.fr
5
Bioinformatics
10 bioinformatics applications
– In production: Splatche
– Prototype: GPS@, bioDCV, Dengue Docking
– Porting: Large Scale Pathway, BiG, 3DEM, …
Key activities:
– Data virtualization: enabling legacy bioinformatics applications with grid and secure data access (EncFile, GFAL, Perroquet) and with large-scale data capability (3DEM)
– Grid-enabling bioinformatics tools with special requirements: short jobs (GPS@), large jobs, workflows (Large Scale Pathway, Splatche, BiG)
– End-user interfaces: providing biologists with Web portals and Web services (BiG, GPS@, bioDCV)
Contact: Christophe.Blanchet@ibcp.fr
6
Drug Discovery
Summer 2005: WISDOM, the first large-scale biomedical deployment, against malaria
– Results analyzed; further processing using molecular dynamics
Spring 2006: large-scale deployment against avian flu
– Results under analysis; a second data challenge is needed
Autumn 2006: second WISDOM deployment against 4 malaria targets
– 5 infrastructures are contributing: Auvergrid, EGEE, EELA, EUChinaGRID, EUMedGRID
– 2 other EC projects involved: BioinfoGRID, Embrace
Contact: breton@clermont.in2p3.fr
7
Current major issues
Short jobs (<5 min): SDJ working group
– The working group has defined CE setup rules that reduce grid middleware overhead to ~2 min
– But only one site (LAL) has enabled them (or at least is the only one publishing it!)
– ⇒ Deploy the SDJ recommendations on other biomed sites, with adequate publication (CE named with an "sdj" tag)
Data confidentiality
– Data security is addressed through gLiteIO + Fireman (ACLs) + Hydra (encryption)
– Only the clients are available in gLite 3.0: the gLiteIO, Fireman and Hydra servers have to be installed by the users themselves
– Only limited security through GFAL + LFC
Data management
– No tool is available in gLite to allow database integration
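Why short jobs suffer can be made concrete with a back-of-the-envelope efficiency figure. A minimal sketch, assuming the middleware overhead is a fixed delay simply added to each job's run time (a simplification; real queueing delays vary): a 5-minute job with the ~2-minute overhead targeted by the SDJ rules spends only about 71% of its elapsed time computing. The function names are hypothetical.

```python
def efficiency(run_min: float, overhead_min: float) -> float:
    """Fraction of a job's elapsed time spent on useful computation.

    Assumes the middleware overhead is a fixed delay added to each job.
    """
    return run_min / (run_min + overhead_min)


def min_run_time(target_eff: float, overhead_min: float) -> float:
    """Shortest job length (minutes) reaching target_eff for a given overhead.

    Solving t / (t + o) >= E for t gives t >= o * E / (1 - E).
    """
    if not 0 < target_eff < 1:
        raise ValueError("target efficiency must be strictly between 0 and 1")
    return overhead_min * target_eff / (1 - target_eff)


if __name__ == "__main__":
    print(f"{efficiency(5, 2):.2f}")          # 5-min job, ~2-min overhead: 0.71
    print(f"{min_run_time(0.9, 2):.1f} min")  # job length for 90%: 18.0 min
```

The inverse function shows the other side of the trade-off: under the same assumption, reaching 90% efficiency with a 2-minute overhead requires jobs of at least 18 minutes.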
8
Biomedical Virtual Organization
Biomedical VO manager: Y. Legré, legre@clermont.in2p3.fr
See http://cic.in2p3.fr (VO information, publication of data challenges…)
– 100 CEs, 8,000 CPUs (but many users)
– 117 SEs, some tens of TB available on disk
– 27 countries
– 1 VOMS server (bottleneck)
– 1 LFC (bottleneck)
– 20 RBs (but several unavailable)
9
Content
Biomedical Virtual Organization status
WISDOM, an example of a biomedical application
– Components of the WISDOM application
– Achieved deployments on the EGEE infrastructure
– Perspectives
10
In silico drug discovery against neglected and emerging diseases
Grids are unique tools for:
– Collecting and sharing information (epidemiology, genomics)
– Networking experts
– Mobilizing resources routinely or in emergencies (vaccine and drug discovery)
Grids open new perspectives for in silico drug discovery
– Reduced R&D costs against neglected diseases
– An accelerating factor for R&D against emerging diseases
CPU-intensive grid deployments exploring the impact of grids
– Data challenge against malaria in summer 2005
– Data challenge against avian flu in April-May 2006
– Data challenge against malaria in autumn 2006
11
Requirements for a large-scale deployment on the grid
– Adaptation of the application to the grid
– Access to a large infrastructure providing maintained resources
– Use of a production system providing automated, fault-tolerant job and file management
12
Docking: predicting how small molecules bind to a receptor of known 3D structure
Simplified virtual screening process by docking
Successful examples
– Rapid
– Cost-effective…
But there are limitations
– Need for CPU and storage
13
Grid-enabled high-throughput virtual screening by docking
– A few target structures
– Millions of chemical compounds
– Docking software: 1 to 30 min per docking, a few MB per output
– In total: 100 CPU years, 1 TB
Large-scale deployment on a grid infrastructure
Challenges:
– Speed up the process
– Manage the data
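The arithmetic behind such an estimate is simple enough to sketch. The helpers below are hypothetical (not part of WISDOM): they multiply the number of dockings by the average time per docking, convert to CPU years, and then divide by a CPU count assuming perfect parallelism. The example inputs are illustrative assumptions, not the figures the WISDOM team used for its 100 CPU-year estimate.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 (non-leap year)


def cpu_years(n_targets: int, n_compounds: int, minutes_per_docking: float) -> float:
    """Sequential CPU time, in years, to dock every compound against every target."""
    return n_targets * n_compounds * minutes_per_docking / MINUTES_PER_YEAR


def wall_days(total_cpu_years: float, n_cpus: int) -> float:
    """Elapsed days on n_cpus, assuming perfect parallelism and no grid overhead."""
    return total_cpu_years * 365 / n_cpus


if __name__ == "__main__":
    # Illustrative inputs only: 1 target, 1 million compounds, 15 min per docking.
    total = cpu_years(1, 1_000_000, 15)
    print(f"{total:.1f} CPU years")                             # 28.5 CPU years
    print(f"{wall_days(total, 1_000):.1f} days on 1,000 CPUs")  # 10.4 days
```

Real elapsed times are longer than `wall_days` suggests, since it ignores middleware overhead, failures and resubmissions.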
14
Adaptation of the application to the grid
– The applications are not designed for grid computing
– The application code cannot be modified
– A common strategy is to split the application into shorter tasks
– License management for commercial software is not yet adapted to large infrastructures
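Since the docking code itself cannot be modified, splitting usually happens on the input side: the compound library is cut into chunks, and each chunk becomes one grid job. A minimal sketch, assuming compounds are identified by string IDs and that every docking takes roughly the same time; the chunk size is derived from a hypothetical target job length, and all names are illustrative.

```python
from typing import Iterator, List, Sequence


def chunk_size_for(target_job_min: float, minutes_per_docking: float) -> int:
    """Dockings per job so that one job runs roughly target_job_min minutes (at least 1)."""
    return max(1, int(target_job_min // minutes_per_docking))


def split_library(compound_ids: Sequence[str], per_job: int) -> Iterator[List[str]]:
    """Yield consecutive slices of the compound library, one slice per grid job."""
    for start in range(0, len(compound_ids), per_job):
        yield list(compound_ids[start:start + per_job])


if __name__ == "__main__":
    library = [f"CMPD{i:06d}" for i in range(10)]  # hypothetical compound IDs
    per_job = chunk_size_for(target_job_min=60, minutes_per_docking=20)  # 3
    for job_id, chunk in enumerate(split_library(library, per_job)):
        print(job_id, chunk)  # four jobs: 3 + 3 + 3 + 1 compounds
```

Choosing the target job length is a trade-off: longer jobs amortize middleware overhead, while shorter ones lose less work when a job fails.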
15
Access to a large infrastructure
– The infrastructure provides maintained computing and storage resources
– A resource estimation is needed before the deployment
– The application package requires installation (and testing)
– Efficient and responsive user support from the infrastructure is required
16
Access to a large infrastructure: the EGEE infrastructure
Real Time Monitor: http://gridportal.hep.ph.ic.ac.uk/rtm/
EGEE added value:
– Large computing and storage resources (>30,000 CPUs, 50 PB)
– Resources available 24 hours a day
– User support
– Job and data management
– Information and monitoring
EGEE limitations:
– Security
– Reliability of services
17
Use of a production system
Managing thousands of jobs and files by hand is a labor-intensive task
– Job preparation, submission and monitoring, output retrieval, failure identification and resolution, job resubmission…
The rate of submitted jobs must be carefully monitored
– To avoid overloading the Resource Brokers
– To use the resources efficiently
The amount of transferred data affects grid performance
– The data must be installed on the grid beforehand
– Store subsets of the database instead of large single compound files
The grid process introduces significant delays
– Submitted jobs must be long enough to reduce the impact of this middleware overhead
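The submit, monitor and resubmit cycle described above can be sketched as a loop that submits a bounded batch per round and resubmits failed jobs up to a retry limit. This is an offline simulation under stated assumptions: there are no real grid calls (the comments mark where a production system would invoke its own submission and status clients), and each attempt's outcome comes from a caller-supplied `succeeds` function. `run_production`, `max_in_flight` and `max_retries` are hypothetical names.

```python
import random
from typing import Callable, Dict, List


def run_production(
    tasks: List[str],
    succeeds: Callable[[str, int], bool],
    max_in_flight: int = 4,
    max_retries: int = 3,
) -> Dict[str, str]:
    """Submit tasks in bounded batches and resubmit failures.

    `succeeds(task, attempt)` simulates the grid's verdict for one attempt
    (attempt numbers start at 1). Returns task -> "done" or "failed".
    """
    pending = list(tasks)               # waiting to be (re)submitted
    attempts = {t: 0 for t in tasks}
    status: Dict[str, str] = {}

    while pending:
        # Throttle: submit at most max_in_flight jobs per round so the
        # submission flow stays steady and the broker is not flooded.
        batch, pending = pending[:max_in_flight], pending[max_in_flight:]
        for task in batch:
            attempts[task] += 1         # a real system would submit here
        # Monitor: collect each job's outcome and decide on resubmission.
        for task in batch:
            if succeeds(task, attempts[task]):
                status[task] = "done"
            elif attempts[task] < max_retries:
                pending.append(task)    # queue for resubmission
            else:
                status[task] = "failed"  # give up after max_retries attempts
    return status


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the simulated run is reproducible
    jobs = [f"job{i:02d}" for i in range(10)]
    # Assume each attempt succeeds independently with probability 0.7.
    result = run_production(jobs, lambda task, attempt: rng.random() < 0.7)
    done = sum(1 for s in result.values() if s == "done")
    print(f"{done} of {len(jobs)} jobs done")
```

Capping the batch size plays the role of the "constant and slower job submission flow", and the retry limit keeps persistently failing jobs from cycling forever.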
18
Schema of the WISDOM production environment
[Figure: User Interface running the WISDOM production system (submits the jobs, checks job status, resubmits); WMS; CEs & WNs running the docking jobs; SEs holding input files (DB subsets) and output files, accessed through DMS/GFTP; local server with the FLEXlm license server, docking software and statistics; web site with the WISDOM DB and statistics; output DB]
3,000 floating FlexX licenses given by BioSolveIT to SCAI for the work against malaria
19
Production systems for particle physics experiments on EGEE (1/2)
The ATLAS production system: the ATLAS experiment
– Uses EGEE components as much as possible
– User interface: Ganga
– Monitoring tool: GridICE
BOSS and CRAB: the CMS experiment
– CRAB is a user interface to prepare and submit jobs
– BOSS monitors the jobs from the logs of the WNs
– Monitoring tools: GridICE and MonALISA
20
Production systems for particle physics experiments on EGEE (2/2)
AliEn: the ALICE experiment
– Pull model service: a job agent, submitted through an RB to a WN of a CE, fetches a set of jobs if the site is reliable and has free WNs
– Monitoring tool: MonALISA
DIRAC: the LHCb experiment
– Similar to AliEn, with Ganga as the user interface
DIANE: http://cern.ch/diane
GridICE and MonALISA, two monitoring services for users
– Collect information from agents deployed on the grid nodes and from the Information System
– Web interface
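The pull model inverts the usual push submission: a lightweight agent lands on a worker node first, checks it, and only then asks a central task queue for work, so real payloads only go to nodes that are up and have free capacity. A minimal single-process sketch, assuming an in-memory `queue.Queue` stands in for the central service and a boolean stands in for the site checks; `job_agent` and `free_slots` are hypothetical names, not AliEn or DIRAC APIs.

```python
import queue
from typing import List, Tuple


def job_agent(node: str, healthy: bool, central: "queue.Queue[str]",
              free_slots: int) -> List[Tuple[str, str]]:
    """Pilot-style agent started on a worker node.

    Only after its own checks pass does it pull up to free_slots tasks
    from the central queue; a broken node never takes any work.
    """
    ran: List[Tuple[str, str]] = []
    if not healthy:
        return ran
    for _ in range(free_slots):
        try:
            task = central.get_nowait()
        except queue.Empty:
            break                     # nothing left to do
        ran.append((node, task))      # a real agent would execute the payload here
    return ran


if __name__ == "__main__":
    tasks: "queue.Queue[str]" = queue.Queue()
    for i in range(5):
        tasks.put(f"docking-{i}")
    # Agents land on nodes one after another; the unhealthy one pulls nothing.
    for node, ok in [("wn-a", True), ("wn-bad", False), ("wn-b", True)]:
        print(node, job_agent(node, ok, tasks, free_slots=3))
```

The design choice shown is that a failing site costs only an empty agent, not a lost docking task, which is one reason pull-based systems tolerate unreliable sites well.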
21
Content
Biomedical Virtual Organization status
WISDOM, an example of a biomedical application
– Components of the WISDOM application
– Achieved deployments on the EGEE infrastructure
– Perspectives
22
Deployment of WISDOM on the EGEE infrastructure: significant numbers, summer 2005
– Target: plasmepsin (PDB)
– Software: FlexX and AutoDock
– Compounds: 1,000,000 (ZINC)
– Duration: 6 weeks instead of 80 years
– 1 TB of data produced
– Up to 1,700 computers in 15 countries used simultaneously
– Crunching factor: 600
[Charts: total amount of CPU provided by each EGEE federation; number of docked compounds vs. time]
23
Origin of failures during the WISDOM-I deployment (outcome: rate; reasons)
– Success after checking output data: 46%
– License server failure: 23%; server failure, power cut, server stop
– WISDOM failure: 4%; job distribution, human error, script failure
– Workload management failure: 10%; overload, disk failure, misconfiguration, disk space problem, air conditioning, power cut
– Data management failure: 4%; network/connection, power cut, other unknown causes
– Site failure: 9%; misconfiguration, tar command, disk space, information system update, job number limit in the waiting queue, air conditioning, power cut
– Unclassified: 4%; lost jobs, other unknown causes
Grid success rate: 63%, after subtracting license server and WISDOM failures
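The 63% grid success rate follows directly from the failure breakdown: license-server and WISDOM failures are not caused by the grid, so they are removed from the denominator before computing the rate. A short check, using the percentages listed above (the dictionary keys are shortened labels chosen for this example):

```python
# Outcome shares (%) of the WISDOM-I jobs, from the failure breakdown above.
outcomes = {
    "success": 46,
    "license_server": 23,
    "wisdom": 4,
    "workload_mgmt": 10,
    "data_mgmt": 4,
    "sites": 9,
    "unclassified": 4,
}

# Failures not caused by the grid itself are excluded from the denominator.
NON_GRID = ("license_server", "wisdom")

assert sum(outcomes.values()) == 100
grid_jobs = 100 - sum(outcomes[k] for k in NON_GRID)  # 73
grid_success = 100 * outcomes["success"] / grid_jobs   # about 63.0
print(f"Grid success rate: {grid_success:.0f}%")       # Grid success rate: 63%
```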
24
Deployment of WISDOM on EGEE, TWGrid and Auvergrid: significant numbers, spring 2006
– Target: neuraminidase (H5N1)
– Software: AutoDock
– Compounds: 300,000 (ZINC + private chemical library)
– Duration: 6 weeks instead of 105 years
– Up to 1,700 computers in 17 countries mobilized
– 750 GB of data produced
– Crunching factor: 767
[Charts: distribution of jobs on EGEE federations, Auvergrid and TWGrid; number of docked compounds vs. time]
25
Deployment improvement
Grid success rate: 80%
Reasons for the improved grid success rate of the WISDOM production system:
– Constant and slower job submission flow
– Manual control of the resubmission process
– Improved WISDOM fault tolerance
– Improved grid reliability (Workload Management System)
Less than 3 months between the first contacts and the completion of all the required computations
26
Content
Biomedical Virtual Organization status
WISDOM, an example of a biomedical application
– Components of the WISDOM application
– Achieved deployments on the EGEE infrastructure
– Perspectives
27
Summary (1/2)
The experiments demonstrated that grid infrastructures have a tremendous capacity to mobilize very large CPU resources for well-targeted goals over a significant period of time
The deployments were very useful for identifying the limitations and bottlenecks of the EGEE infrastructure and middleware
Reliability is still the major issue for the WISDOM production system and the EGEE middleware
– Migration to the new EGEE middleware: gLite
Output data collection needs to be improved
– Storage of output metadata from the jobs in a relational database
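Storing per-job output metadata in a relational database turns a post-docking question such as "which compounds scored best against this target?" into a single SQL query instead of a scan over thousands of output files. A minimal sketch using Python's built-in `sqlite3` module; the table layout, column names, logical file names and the score convention (lower is better, as with docking energies) are assumptions for illustration, not the WISDOM schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE docking_result (
           job_id      TEXT NOT NULL,
           target      TEXT NOT NULL,
           compound_id TEXT NOT NULL,
           score       REAL NOT NULL,  -- assumed: lower = better binding
           output_lfn  TEXT,           -- hypothetical logical file name of the full output
           PRIMARY KEY (target, compound_id)
       )"""
)

rows = [  # hypothetical metadata, as jobs might report it on completion
    ("job1", "plasmepsin", "CMPD000001", -9.1, "lfn:/grid/biomed/out/job1.tgz"),
    ("job1", "plasmepsin", "CMPD000002", -6.4, "lfn:/grid/biomed/out/job1.tgz"),
    ("job2", "plasmepsin", "CMPD000003", -11.2, "lfn:/grid/biomed/out/job2.tgz"),
]
conn.executemany("INSERT INTO docking_result VALUES (?, ?, ?, ?, ?)", rows)

# Best two compounds for one target, with a pointer to the full docking output.
top = conn.execute(
    "SELECT compound_id, score, output_lfn FROM docking_result "
    "WHERE target = ? ORDER BY score LIMIT 2",
    ("plasmepsin",),
).fetchall()
print(top)
```

Keeping only a pointer to the full output file in the database keeps the database small, while still letting analysts retrieve the complete docking pose for the compounds they select.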
28
Summary (2/2)
The WISDOM data challenge has demonstrated that collaborative production grids can be used for steps in the drug discovery process
Its impact has significantly raised the interest of the research community in malaria. A second, larger computing challenge against malaria is currently running.
Output data presentation requires improvements to speed up the post-docking analysis
– Access to the output metadata database and to the docking output files is required
The deployment still requires grid expertise
The next step after docking, molecular dynamics, is currently being deployed on the EGEE infrastructure
29
Long-term vision: a grid for malaria
Use grid technology to foster research and development on malaria and other neglected diseases
– Univ. Los Andes: biological targets, malaria biology
– LPC Clermont-Ferrand: biomedical grid
– SCAI Fraunhofer: knowledge extraction, chemoinformatics
– Univ. Modena: biological targets, molecular dynamics
– ITB CNR: bioinformatics, molecular modelling
– Univ. Pretoria: bioinformatics, malaria biology
– Academia Sinica: grid user interface
– HealthGrid: biomedical grid, dissemination
– CEA, Acamba project: biological targets, chemogenomics
Contacts also established with WHO, Microsoft, TATRC, Argonne, SDSC, SERONO, NOVARTIS, Sanofi-Aventis, and hospitals in sub-Saharan Africa