EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks. Biomedical activities in EGEE-II (Activités biomédicales dans EGEE-II), Nicolas Jacq.

Presentation transcript:

Biomedical activities in EGEE-II (Activités biomédicales dans EGEE-II)
Nicolas Jacq, HealthGrid
EGEE user training (Formation Utilisateur EGEE), Clermont-Ferrand, 10-11 January 2007

Content
Biomedical Virtual Organization status
  – Credit: Christophe Blanchet, Johan Montagnat, Vincent Breton
WISDOM, example of biomedical application
  – Credit: WISDOM collaboration

Biomedical Virtual Organization status
Biomed VO management
  – Biomed VO leader: V. Breton
  – Deputies: J. Montagnat and C. Blanchet
  – ~80 participants
Three active subgroups
  – Medical imaging (J. Montagnat)
  – Bioinformatics (C. Blanchet)
  – Drug discovery (V. Breton)
Active relationships with EGEE-related projects and other EC projects
  – BioinfoGRID
  – Embrace
  – EELA, EUChinaGRID, EUMedGrid

Medical Imaging
Services are available on EGEE for the medical imaging community
  – Medical Data Management
  – Workflow engines: Moteur, DAGMAN
  – Portals: P-GRADE, GENIUS
Several applications are in production mode
  – Bronze standard, GATE, 3D MRI simulation, pharmacokinetics, GPTM3D, Clinical Decision Support System
New applications are under development
  – SEE++ strabismus surgery planning
  – SPM-based early diagnosis of Alzheimer's disease
  – FreeSurfer-based brain image analysis
Contact:

Bioinformatics
10 bioinformatics applications
  – In production: Splatche
  – Prototype: bioDCV, Dengue Docking
  – Porting: Large Scale Pathway, BiG, 3DEM, …
Key activities:
  – Data virtualization: enabling legacy bioinformatics applications
       with grid and secure data access (EncFile, GFAL, Perroquet)
       with large-scale data capability (3DEM)
  – Grid-enabling bioinformatics tools with special requirements:
       short jobs
       large jobs, workflows (Large Scale Pathway, Splatche, BiG)
  – End-user interfaces: providing biologists with a Web portal and Web services (BiG, bioDCV)
Contact:

Drug Discovery
Summer 2005: WISDOM, first large-scale biomedical deployment against malaria
  – Results analyzed, further processing using Molecular Dynamics
Spring 2006: Large-scale deployment against avian flu
  – Results under analysis, need for a second data challenge
Autumn 2006: Second WISDOM deployment against 4 targets of malaria
  – 5 infrastructures are contributing: Auvergrid, EGEE, EELA, EUChinaGRID, EUMedGRID
  – 2 other EC projects involved: BioinfoGRID, Embrace
Contact:

Current major issues
Short jobs (<5 min): SDJ workgroup
  – The workgroup has defined CE setup rules that decrease the grid middleware overhead to ~2 min
  – But only one site (LAL) is enabled (or at least is the only one publishing it)
  – ⇒ The SDJ recommendations should be deployed on other biomed sites, with adequate publication (CE named with an « sdj » tag)
Data confidentiality
  – Data security is addressed through gLiteIO + Fireman (ACLs) + Hydra (encryption)
  – Only the clients are available in gLite 3.0: the gLiteIO, Fireman and Hydra servers must be installed by the users
  – Limited security through GFAL + LFC
Data management
  – No tool is available in gLite to allow database integration

Biomedical Virtual Organization
Biomedical VO manager: Y. Legré
See (VO information, publication of data challenge…)
100 CEs, 8,000 CPUs (but many users)
117 SEs, tens of TB available on disk
27 countries
1 VOMS server (bottleneck)
1 LFC (bottleneck) + 20 RBs (but several unavailable)

Content
Biomedical Virtual Organization status
WISDOM, example of biomedical applications
  – Components of the WISDOM application
  – Achieved deployments on the EGEE infrastructure
  – Perspectives

In silico drug discovery against neglected and emerging diseases
Grids are unique tools for:
  – Collecting and sharing information (epidemiology, genomics)
  – Networking experts
  – Mobilizing resources routinely or in emergency (vaccines & drug discovery)
Grids open new perspectives for in silico drug discovery
  – Reduced cost for R&D against neglected diseases
  – Accelerating factor for R&D against emerging diseases
CPU-intensive grid deployments exploring grid impact
  – Data challenge against malaria in summer 2005
  – Data challenge against avian flu in April-May 2006
  – Data challenge against malaria in autumn 2006

Requirements for a large-scale deployment on grid
Adaptation of the application to the grid
Access to a large infrastructure providing maintained resources
Use of a production system providing automated and fault-tolerant job and file management

Docking: predict how small molecules bind to a receptor of known 3D structure
Simplified virtual screening process by docking
Successful examples
  – Rapid
  – Cost-effective…
But there are limitations
  – Need for CPU and storage

Grid-enabled high throughput virtual screening by docking
A few target structures
Millions of chemical compounds
1 to 30 min per docking
A few MB per output
100 CPU years, 1 TB
Large-scale deployment on grid infrastructure
Challenges:
  – Speed up the process
  – Manage the data
Docking software
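The orders of magnitude on this slide can be checked with a back-of-the-envelope estimate. The sketch below is illustrative only: the function names are hypothetical, and it assumes every compound is docked once against every target, with a uniform time and output size per docking.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def cpu_years(n_compounds: int, n_targets: int, minutes_per_docking: float) -> float:
    """Sequential CPU time, in years, to dock every compound against every target."""
    total_minutes = n_compounds * n_targets * minutes_per_docking
    return total_minutes / MINUTES_PER_YEAR


def output_volume_tb(n_compounds: int, n_targets: int, mb_per_output: float) -> float:
    """Total output size in TB (decimal units: 1 TB = 1,000,000 MB)."""
    return n_compounds * n_targets * mb_per_output / 1_000_000


if __name__ == "__main__":
    # One million compounds against a single target, at the two ends of the
    # "1 to 30 min per docking" range quoted on the slide.
    print(f"{cpu_years(1_000_000, 1, 1):.1f} to {cpu_years(1_000_000, 1, 30):.1f} CPU years")
    # Assumed 1 MB per docking output (the slide only says "a few MB").
    print(f"{output_volume_tb(1_000_000, 1, 1.0):.1f} TB")
```

With several targets or docking parameter sets, the estimate scales linearly, which is how a campaign can reach the order of 100 CPU years.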

Adaptation of the application to the grid
The applications are not designed for grid computing.
The application code cannot be modified.
A common strategy is to split the application into shorter tasks.
License management for commercial software is not yet adapted to large infrastructures.

Access to a large infrastructure
The infrastructure will provide maintained computing and storage resources
A resource estimation is needed before the deployment
The application package requires installation (and testing)
Efficient and responsive user support from the infrastructure is required

Access to a large infrastructure: the EGEE infrastructure
(Figure: Real Time Monitor)
EGEE added value:
  – Large computing and storage resources (>30,000 CPUs, 50 PB)
  – 24 hours a day availability of resources
  – User support
  – Job and Data Management
  – Information and Monitoring
EGEE limitations
  – Security
  – Reliability of services

Use of a production system
Managing thousands of jobs and files by hand is a labor-intensive task
  – Job preparation, submission and monitoring, output retrieval, failure identification and resolution, job resubmission…
The rate of submitted jobs must be carefully monitored
  – In order to avoid overloading the Resource Brokers
  – In order to use the resources efficiently
The amount of transferred data impacts grid performance
  – The data must be installed on the grid
  – Storing subsets of the database instead of large unique compound files
The grid process introduces significant delays
  – The submitted jobs must be sufficiently long in order to reduce the impact of this middleware overhead
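The last two points (database subsets and sufficiently long jobs) can be sketched as a sizing rule. This is a minimal illustration, not the WISDOM implementation: the function names are hypothetical, and the overhead and docking times in the example are assumed values.

```python
import math


def compounds_per_job(minutes_per_docking: float,
                      overhead_minutes: float,
                      max_overhead_fraction: float) -> int:
    """Smallest number of dockings per job that keeps the middleware
    overhead at or below the given fraction of the total job time.

    overhead / (overhead + n * t) <= f   is equivalent to
    n >= overhead * (1 - f) / (f * t)
    """
    n = overhead_minutes * (1 - max_overhead_fraction) / (
        max_overhead_fraction * minutes_per_docking)
    return max(1, math.ceil(n))


def split_into_subsets(compound_ids: list, subset_size: int) -> list:
    """Cut the compound database into consecutive subsets, one per job."""
    return [compound_ids[i:i + subset_size]
            for i in range(0, len(compound_ids), subset_size)]


if __name__ == "__main__":
    # Assumed values: 3 min per docking, 10 min of grid overhead per job,
    # and at most 20% of each job spent on overhead.
    size = compounds_per_job(3, 10, 0.2)
    subsets = split_into_subsets(list(range(100)), size)
    print(size, len(subsets))
```

Larger subsets reduce the relative overhead but increase the work lost when a job fails, so the subset size is a trade-off rather than a value to maximize.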

Schema of the WISDOM production environment
(Diagram; only the labels are recoverable.)
User Interface: WISDOM production system
  – Submits the jobs
  – Checks job status
  – Resubmits
WMS
CEs & WNs: docking job
SEs: input files, DB subsets (inputs); output file (outputs)
Other labels: local server, Web site, WISDOM DB, statistics, FLEXlm license, docking software, output DB, DMS/GFTP
3,000 floating FlexX licenses given by BioSolveIT to SCAI against malaria
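The submit / check status / resubmit cycle of the production system can be sketched as follows. This is a toy model under stated assumptions: `FakeGrid` is a hypothetical stand-in for the workload management system (it is not a gLite API), and `max_attempts` and `batch_size` are illustrative parameters for bounded resubmission and throttled submission.

```python
from collections import deque


class FakeGrid:
    """Hypothetical stand-in for the WMS: each job fails a fixed number of
    times before it succeeds."""

    def __init__(self, failures_before_success: dict):
        self._remaining_failures = dict(failures_before_success)

    def run(self, job_id: str) -> str:
        if self._remaining_failures.get(job_id, 0) > 0:
            self._remaining_failures[job_id] -= 1
            return "FAILED"
        return "DONE"


def run_campaign(job_ids, grid, max_attempts=3, batch_size=2):
    """Submit jobs in small batches, check their status, and resubmit
    failed jobs until they succeed or run out of attempts."""
    pending = deque((job_id, 1) for job_id in job_ids)
    done, abandoned = [], []
    while pending:
        # Throttle: handle at most batch_size jobs per cycle.
        batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
        for job_id, attempt in batch:
            status = grid.run(job_id)
            if status == "DONE":
                done.append(job_id)
            elif attempt < max_attempts:
                pending.append((job_id, attempt + 1))  # resubmit later
            else:
                abandoned.append(job_id)  # needs manual inspection
    return done, abandoned


if __name__ == "__main__":
    grid = FakeGrid({"b": 1, "c": 5})
    print(run_campaign(["a", "b", "c"], grid))
```

A real production system would poll asynchronously and record each status change; the loop above only shows the control flow.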

Production systems for particle physics experiments on EGEE (1/2)
The ATLAS production system (ATLAS experiment)
  – Uses EGEE components as much as possible
  – User interface: Ganga
  – Monitoring tool: GridICE
BOSS and CRAB (CMS experiment)
  – CRAB is a user interface to prepare and submit jobs
  – BOSS monitors the jobs from the logs of the WNs
  – Monitoring tools: GridICE and MonALISA

Production systems for particle physics experiments on EGEE (2/2)
AliEn (ALICE experiment)
  – Pull model service: a job agent submitted to a WN of a CE via an RB pulls a set of jobs if the site is reliable and has free WNs
  – Monitoring tool: MonALISA
DIRAC (LHCb experiment)
  – Similar to AliEn, with Ganga as user interface
DIANE
GridICE and MonALISA, two monitoring services for users
  – Collect information from agents deployed on the grid nodes and from the Information System
  – Web interface
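The pull model can be illustrated with a small sketch: instead of pushing each job to a site, a generic agent lands on a worker node and fetches work from a central queue only while the site looks healthy. The names `pilot_agent`, `site_is_healthy` and `execute` are hypothetical and do not come from AliEn or DIRAC.

```python
import queue


def pilot_agent(task_queue: queue.Queue, site_is_healthy, execute) -> list:
    """Run on a worker node: pull tasks from the central queue until it is
    empty or the local site check fails, and return the results."""
    results = []
    while site_is_healthy():
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to pull
        results.append(execute(task))
    return results


if __name__ == "__main__":
    central = queue.Queue()
    for compound_subset in (1, 2, 3):
        central.put(compound_subset)
    # A healthy site drains the queue; an unhealthy one pulls nothing,
    # so no task is lost on a misconfigured node.
    print(pilot_agent(central, lambda: True, lambda task: task * 2))
```

The benefit over a push model is that a task is only bound to a node once that node has proven it can run the agent.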

Content
Biomedical Virtual Organization status
WISDOM, example of biomedical applications
  – Components of the WISDOM application
  – Achieved deployments on the EGEE infrastructure
  – Perspectives

Deployment of WISDOM on EGEE infrastructure: significant numbers, summer 2005
Target: Plasmepsin (PDB)
Software: FlexX and Autodock
Compounds: 1,000,000 (ZINC)
Duration: 6 weeks instead of 80 years
1 TB of data produced
Up to 1,700 computers in 15 countries used simultaneously
Crunching factor: 600
(Figures: Total amount of CPU provided by EGEE federation; Number of docked compounds vs time)

Origin of failures during the WISDOM-I deployment
Category | Rate | Reasons
Success rate after checking output data | 46% |
Server license failure | 23% | Server failure; power cut; server stop
WISDOM failure | 4% | Job distribution; human error; script failure
Workload Management failure | 10% | Overload, disk failure; mis-configuration, disk space problem; air-conditioning, power cut
Data Management failure | 4% | Network / connection; power cut; other unknown causes
Sites failure | 9% | Mis-configuration, tar command, disk space; Information System update; job number limitation in the waiting queue; air-conditioning, electrical cut
Unclassified | 4% | Lost jobs; other unknown causes
Grid success rate: 63%, after subtracting license server and WISDOM failures
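The 63% figure is consistent with the table if the license server and WISDOM failures are removed from the denominator. The short check below reproduces it; the dictionary keys are illustrative names, and the reading of "after subtracting" as a change of denominator is an assumption that happens to match the slide.

```python
# Rates from the table, in percent of all submitted jobs.
rates = {
    "success": 46,
    "license_server_failure": 23,
    "wisdom_failure": 4,
    "workload_management_failure": 10,
    "data_management_failure": 4,
    "site_failure": 9,
    "unclassified": 4,
}

# The categories cover all jobs.
total = sum(rates.values())

# Failures that are not attributable to the grid itself.
non_grid = rates["license_server_failure"] + rates["wisdom_failure"]

# Success rate among the jobs whose outcome depended only on the grid.
grid_success_rate = 100 * rates["success"] / (total - non_grid)

if __name__ == "__main__":
    print(total, round(grid_success_rate))
```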

Deployment of WISDOM on EGEE, TWGrid & Auvergrid: significant numbers, spring 2006
Target: neuraminidase (H5N1)
Software: Autodock
Compounds: 300,000 (ZINC + private chemolibrary)
Duration: 6 weeks instead of 105 years
Up to 1,700 computers in 17 countries mobilized
750 GB of data produced
Crunching factor: 767
(Figures: Distribution of jobs on EGEE federations, Auvergrid and TWGrid; Number of docked compounds vs time)

Deployment improvement
Grid success rate: 80%
Reasons for the improved grid success rate of the WISDOM production system:
  – Constant and slower job submission flow
  – Manual control of the resubmission process
  – WISDOM fault tolerance improved
  – Grid reliability improved (Workload Management System)
Less than 3 months between the first contacts and the completion of all the required computations

Content
Biomedical Virtual Organization status
WISDOM, example of biomedical applications
  – Components of the WISDOM application
  – Achieved deployments on the EGEE infrastructure
  – Perspectives

Summary (1/2)
The experiments demonstrated that grid infrastructures have a tremendous capacity to mobilize very large CPU resources for well-targeted goals over a significant period of time
The deployments have been a very useful experience in identifying the limitations and bottlenecks of the EGEE infrastructure and middleware
Reliability is still the major issue for the WISDOM production system and the EGEE middleware
  – Migration to the new EGEE middleware: gLite
The output data collection needs to be improved
  – Storage of output metadata from the jobs in a relational database
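Storing job output metadata in a relational database could look like the sketch below. Everything here is hypothetical: the table name, the columns, and the assumption that a lower docking score is better (as with binding energies) are illustrative choices, not the WISDOM schema. SQLite is used only because it ships with Python.

```python
import sqlite3


def create_results_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) result table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS docking_result (
               job_id      TEXT NOT NULL,
               compound_id TEXT NOT NULL,
               target      TEXT NOT NULL,
               score       REAL,
               output_file TEXT,
               PRIMARY KEY (compound_id, target)
           )"""
    )
    return conn


def record_result(conn, job_id, compound_id, target, score, output_file):
    """Insert one docking result; a resubmitted job overwrites the old row."""
    conn.execute(
        "INSERT OR REPLACE INTO docking_result VALUES (?, ?, ?, ?, ?)",
        (job_id, compound_id, target, score, output_file),
    )
    conn.commit()


def best_compounds(conn, target, limit=10):
    """Return (compound_id, score) pairs, best first (assumes lower is better)."""
    rows = conn.execute(
        "SELECT compound_id, score FROM docking_result "
        "WHERE target = ? ORDER BY score ASC LIMIT ?",
        (target, limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = create_results_db()
    record_result(conn, "job-1", "cmpd-A", "target-1", -7.5, "out/A.tar.gz")
    record_result(conn, "job-1", "cmpd-B", "target-1", -9.1, "out/B.tar.gz")
    print(best_compounds(conn, "target-1", limit=1))
```

Keeping only metadata and a file reference in the database leaves the bulky docking outputs on the storage elements while still allowing ranked queries during post-docking analysis.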

Summary (2/2)
The WISDOM data challenge has demonstrated that collaborative production grids can be used for steps in the drug discovery process
Its impact has significantly raised the research community's interest in malaria
A second, larger computing challenge against malaria is currently running
Output data presentation requires improvements to speed up the post-docking analysis
  – Access to the output metadata database and to the docking output files is required
The deployment requires grid expertise
The next step after docking, molecular dynamics, is currently being deployed on the EGEE infrastructure

Long-term vision: a grid for malaria
Use grid technology to foster research and development on malaria and other neglected diseases
Univ. Los Andes: Biological targets, Malaria biology
LPC Clermont-Ferrand: Biomedical grid
SCAI Fraunhofer: Knowledge extraction, Chemoinformatics
Univ. Modena: Biological targets, Molecular Dynamics
ITB CNR: Bioinformatics, Molecular modelling
Univ. Pretoria: Bioinformatics, Malaria biology
Academia Sinica: Grid user interface
HealthGrid: Biomedical grid, Dissemination
CEA, Acamba project: Biological targets, Chemogenomics
Contacts also established with WHO, Microsoft, TATRC, Argonne, SDSC, SERONO, NOVARTIS, Sanofi-Aventis, hospitals in sub-Saharan Africa