Published by Allyson Gregory. Modified over 9 years ago.
1
Docking and molecular dynamics – complete 16-stage workflow with gLite WMS
Astrid Maaß, Jisamma Kallumadikal
Fraunhofer Institute SCAI, St. Augustin, Germany
Enabling Grids for E-sciencE, EGEE-II INFSO-RI-031688, www.eu-egee.org
EGEE and gLite are registered trademarks
2
Application Background
Contact: {astrid.maass, jisamma.kallumadikal}@scai.fraunhofer.de
Wide in Silico Docking on Malaria
ZINC compound db (500,000 compounds) → docking → plasmepsin (plm) hits: 5000 → further analysis
Targets: P.f. PLM, P.f. GST, P.f. DHFR, P.f. Tub, P.v. DHFR
[Figure: Plasmodium falciparum in human red blood cells]
3
Motivation
Docking information
– binding mode → drug design
– estimate of inhibitory potential
Pro:
– reduces the number of candidate molecules rapidly
Contra:
– results may be inaccurate in detail
For promising candidates:
– combine docking with a more realistic scoring function: all-in-one workflow = FlexX + Amber
– gain accuracy, maintain all information
4
Reference Data
Quality of predictions: correlation R with exp. affinities / average rmsd (quality of placements)

                 Trypsin (20)     Avidin (16)      HIV-Pr (15)
after docking    0.86 / 1.88 Å    0.62 / 2.19 Å    0.76 / 5.91 Å
after MD         0.90 / 1.81 Å    0.95 / 1.00 Å    0.88 / 2.38 Å

[Figure annotations: rmsd ≤ 1.5 Å; exp vs. calc, R ≤ 1]
5
Docking-MD-Workflow: Purpose
$\Delta G_{bind} = \Delta E = E_{PL} - (E_P + E_L)$
Terms: $\Delta E_{VDW} + \Delta E_{ELE} + \Delta E_{SOLV} + \Delta n_{rot}$
Reference: D. Huang & A. Caflisch: Efficient evaluation of binding free energy using continuum electrostatics solvation; J. Med. Chem., 2004, 47, 5791-5797
Workflow steps (tools: FlexX, Amber):
– Docking
– Set up system for MD (assign charges, atom types)
– Energy minimization
– Format conversion
– Clustering
– Energy minimization
– MD 1
– MD 2
– Calc. complex energy
– Calc. ligand energy
– Calc. protein energy
– Calc. ΔG
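The last four workflow steps (complex, ligand and protein energies, then ΔG) amount to the difference ΔE = E_PL − (E_P + E_L) given on this slide. A minimal sketch of that arithmetic follows. The class and function names, the energy values and the kcal/mol unit are illustrative assumptions, not taken from the presentation, and the sketch does not reproduce how Amber reports these energies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointEnergies:
    """Energies of the three systems, all in one unit (kcal/mol assumed)."""
    complex_pl: float  # E_PL: protein-ligand complex
    protein: float     # E_P: protein alone
    ligand: float      # E_L: ligand alone


def binding_energy(e: EndpointEnergies) -> float:
    """Return Delta E = E_PL - (E_P + E_L)."""
    return e.complex_pl - (e.protein + e.ligand)


# Hypothetical numbers, chosen only to show the sign convention:
# a negative Delta E means the complex is lower in energy than its parts.
example = EndpointEnergies(complex_pl=-5230.0, protein=-5100.0, ligand=-95.0)
delta_e = binding_energy(example)
print(f"Delta E = {delta_e:.1f}")
```
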
6
Preliminary Results
Current status:
– Test run completed successfully for avidin with default settings: R: 0.51 → 0.71, rmsd: 2.91 Å → 2.51 Å
– Reranking in progress
[Figure: ZINC01073995]
7
Workflow
The workflow consists of 16 stages
– each stage depends on the previous stage
– the stages execute the following software:
   FlexX 2.0.0: commercial product; the licence manager FLEXlm is used; limited access control
   Amber 8 (Sander & Tleap + Elan)
   Convert (in-house development)
   Cluster (in-house development)
[Diagram: FlexX → Tleap → Sander → Convert → Cluster → …]
8
Architecture
Components: Input XML, gLite UI, gLite WMS, gLite CE, Storage Element, Software Applications, Intermediate Job, Meta-data management, EGEE Infrastructure
9
Middleware
gLite 3.0.2
– gLite WMS provides the 'DAG' job feature
   Procedures become nodes of the DAG: FlexX, Tleap, Sander, Convert, Cluster and Intermediate
Dependency feature
   The output of the previous job is the input to the next job
   Each procedure is responsible for downloading its inputs from, and uploading its outputs to, an appropriate storage element
Distributed Data Management
   Data can be accessed irrespective of the location of the sites where the jobs are running
[Diagram: ComputingElement, StorageElement]
10
Workflow
The 'BIO' workflow
– The number of conformations FlexX writes is not known in advance
– Cluster reduces the number of conformations
Output of each stage is unpredictable
Dynamic distribution of conformations is required
The huge quantity of produced data has to be organised
11
Processing
How it works (Grid)
12
Workflow
The 'All in One' workflow
– The intermediate job divides the work dynamically among the jobs
– Higher performance
– Organisation of data
The intermediate job:
   goes through each ligand
   keeps a count of the generated conformations
   divides this count by the number of nodes
   packs the input
   registers the input on the Grid (LFC)
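The counting and dividing steps of the intermediate job can be sketched as an even split of the generated conformations over the available nodes. This is only an illustration under assumptions: the function name, the conformation labels and the sample values (25 conformations, 9 nodes) are hypothetical, and packing and LFC registration are left out.

```python
from typing import List, Sequence


def partition_evenly(items: Sequence[str], n_nodes: int) -> List[List[str]]:
    """Split items into n_nodes contiguous chunks whose sizes differ by at most one."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(len(items), n_nodes)
    chunks: List[List[str]] = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional item.
        size = base + (1 if node < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


# Hypothetical conformation labels: 5 ligands with 5 conformations each.
conformations = [
    f"ligand{lig:02d}_conf{conf}" for lig in range(1, 6) for conf in range(1, 6)
]
chunks = partition_evenly(conformations, 9)
sizes = [len(chunk) for chunk in chunks]
print(sizes)
```

Contiguous chunks keep the conformations of one ligand together where possible, which may simplify packing. A round-robin split would balance equally well.
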
13
Workflow
The 'All in One' workflow
– Subjobs: 9 (reserved)
– Input: 1 protein, 10 ligands
– FlexX writes: 25 conformations (varies)
The dynamic 'Bio' workflow is combined with the gLite 'DAG' feature
Nodes are equally loaded
[Figure labels: 25, 24]
14
User Input
XML (user input)
– Experiment name
– VO name
– Middleware
– SE, LFC
– Proteins, ligands, sites & receptor
– Job information
   Job type
   Subjobs required (approximate)
   Input
   Script (batch files)
15
Monitoring / Meta-data Management
Meta-data management
   Invokes the handle "http://scaimeta.scai.fraunhofer.de/ogsa/services/ogsadai/flemm"
   Runs the query
   – Select, Insert or Update (Id, JobId, NodeName, JobType, InputLfn, OutputLfn, Destination, Hostname, StartTime, StopTime, CPUTime, SizeOfInput, SizeOfOutput, SizeOfApps, SizeOfSandboxStart, SizeOfSandboxStop)
   Returns the response of the request (result set)
[Diagram: the web service on scaimeta.scai.fraunhofer.de is invoked (request) and returns the response]
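The insert, update and select cycle over the listed columns can be sketched as follows. The presentation uses an OGSA-DAI web service for this; the sketch swaps in an in-memory SQLite database as a stand-in, so the table name, the column types, the helper names and all sample values are assumptions, and none of this is the OGSA-DAI interface.

```python
import sqlite3

# Column names follow the slide; the types are assumed for illustration.
SCHEMA = """
CREATE TABLE job_metadata (
    Id INTEGER PRIMARY KEY,
    JobId TEXT, NodeName TEXT, JobType TEXT,
    InputLfn TEXT, OutputLfn TEXT, Destination TEXT, Hostname TEXT,
    StartTime TEXT, StopTime TEXT, CPUTime REAL,
    SizeOfInput INTEGER, SizeOfOutput INTEGER, SizeOfApps INTEGER,
    SizeOfSandboxStart INTEGER, SizeOfSandboxStop INTEGER
)
"""


def open_store() -> sqlite3.Connection:
    """Create the stand-in metadata store in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def record_start(conn, job_id, node_name, job_type, input_lfn, hostname, start_time):
    """Insert a row when a node starts; returns the new row Id."""
    cur = conn.execute(
        "INSERT INTO job_metadata "
        "(JobId, NodeName, JobType, InputLfn, Hostname, StartTime) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (job_id, node_name, job_type, input_lfn, hostname, start_time),
    )
    conn.commit()
    return cur.lastrowid


def record_stop(conn, row_id, output_lfn, stop_time, cpu_time):
    """Update the same row when the node finishes."""
    conn.execute(
        "UPDATE job_metadata SET OutputLfn = ?, StopTime = ?, CPUTime = ? WHERE Id = ?",
        (output_lfn, stop_time, cpu_time, row_id),
    )
    conn.commit()


# Hypothetical identifiers and paths, for illustration only.
conn = open_store()
row_id = record_start(conn, "job-0001", "flexx_1", "FlexX",
                      "lfn:/example/input_1.tar.gz", "worker.example.org",
                      "2007-01-01T10:00:00")
record_stop(conn, row_id, "lfn:/example/output_1.tar.gz",
            "2007-01-01T10:06:00", 355.0)
row = conn.execute(
    "SELECT JobType, OutputLfn, CPUTime FROM job_metadata WHERE Id = ?", (row_id,)
).fetchone()
print(row)
```
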
16
Estimation
Stages
– Considering 10 ligands (20 conformations) as input, let the number of subjobs (nodes) for each stage be 5
   The DAG consists of 80 nodes: 16 stages × 5 = 80
   Each node gets 2 ligands: 10 ligands / 5 nodes = 2
The approximate time required for a workflow to complete is 15 hours
   without getting stuck in long queues or at failed sites
   without any application / software failures
   without any grid problems
Achieved on the DECH VO by extending the proxy limit to 72 hours
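The node and ligand counts above can be checked with a small sketch that builds the staged DAG as plain data. The node naming scheme is hypothetical, and the dependency rule (every node waits for all nodes of the previous stage) is an assumption made for illustration; the presentation only states that each stage depends on the previous one. No gLite API is used here.

```python
from typing import List, Tuple

N_STAGES = 16
NODES_PER_STAGE = 5
N_LIGANDS = 10


def build_staged_dag(n_stages: int, nodes_per_stage: int) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (nodes, dependencies) for a DAG of consecutive stages.

    Assumption: each node depends on every node of the preceding stage.
    """
    stages = [
        [f"stage{s:02d}_node{n}" for n in range(1, nodes_per_stage + 1)]
        for s in range(1, n_stages + 1)
    ]
    nodes = [name for stage in stages for name in stage]
    dependencies = [
        (parent, child)
        for previous, current in zip(stages, stages[1:])
        for parent in previous
        for child in current
    ]
    return nodes, dependencies


nodes, dependencies = build_staged_dag(N_STAGES, NODES_PER_STAGE)
ligands_per_node = N_LIGANDS // NODES_PER_STAGE

print(len(nodes), "nodes,", ligands_per_node, "ligands per node")
```
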
17
Estimation
(L = ligand)

Job name        Jobs   Input/job   Approx. time/L   Approx. time/job
--------------------------------------------------------------------
Flexx           5      2L          3 min            6 min
Tleap           5      2L          15 min           30 min
Sander(Min1)    5      2L          21 min           42 min
Convert         5      2L          42 sec           2 min
Cluster         5      2L          6 sec            12 sec
Sander(Min2)    5      2L          3.5 min          7 min
Sander(Md1)     5      2L          1.5 hrs          3 hrs
Sander(Md2)     5      2L          3.8 hr           8 hrs
Convert         5      2L          6 sec            12 sec
Tleap2          5      2L          9 min            18 min
Sander(Min3)    5      2L          6 sec            12 sec
Convert         5      2L          12 sec           24 sec
Tleap3          5      2L          8 min            16 min
Sander(Min4)    5      2L          7 min            14 min
Tleap4          5      2L          6 min            12 min
Sander(Min4)    5      2L          6 min            12 min
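As a rough cross-check, the per-job times listed for the 16 stages can be added up, since the stages run one after another. The sketch below transcribes the "time/job" column into seconds; the variable names are illustrative. The sum comes to 820 minutes (about 13.7 hours) of pure run time, which seems compatible with the 15 hours quoted for a complete run if some scheduling and transfer overhead is allowed for. That reading of the gap is an assumption, not a statement from the presentation.

```python
MIN = 60
HOUR = 3600

# (stage name, approximate time per job in seconds), in workflow order.
STAGE_TIMES = [
    ("Flexx", 6 * MIN),
    ("Tleap", 30 * MIN),
    ("Sander(Min1)", 42 * MIN),
    ("Convert", 2 * MIN),
    ("Cluster", 12),
    ("Sander(Min2)", 7 * MIN),
    ("Sander(Md1)", 3 * HOUR),
    ("Sander(Md2)", 8 * HOUR),
    ("Convert", 12),
    ("Tleap2", 18 * MIN),
    ("Sander(Min3)", 12),
    ("Convert", 24),
    ("Tleap3", 16 * MIN),
    ("Sander(Min4)", 14 * MIN),
    ("Tleap4", 12 * MIN),
    ("Sander(Min4)", 12 * MIN),
]

total_seconds = sum(seconds for _, seconds in STAGE_TIMES)
total_hours = total_seconds / HOUR

# The two MD stages dominate the run time.
md_seconds = sum(s for name, s in STAGE_TIMES if name.startswith("Sander(Md"))

print(f"{len(STAGE_TIMES)} stages, {total_seconds / MIN:.0f} min, {total_hours:.2f} h")
print(f"MD share: {md_seconds / total_seconds:.0%}")
```
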
18
Limitation
Time constraint (Biomed)
– Limited proxy (24 hrs)
   To run the workflow successfully, we split it into 3 parts and submitted them one after the other to finish within the available time frame
– Successfully run on the DECH VO
   The proxy was extended to 72 hours
gLite updates create problems that require a restart of the WMS, which aborts the running jobs
– Updates are often not tested for complex jobs
Long black list
   Jobs are forwarded to sites that have queues of length 4444 (failed sites). These jobs remain in the scheduled state for a long time
Jobs are aborted due to the following errors
– Job got an error while in the CondorG queue.
– File not available.Cannot read JobWrapper output, both from Condor and from Maradona.
– Got a job held event, reason: Repeated submit attempts (GAHP reports:)
– 93 the gatekeeper failed to find the requested service
– Got a job held event, reason: Unspecified gridmanager error
– unable to register job – system load too high
19
Challenges
Developing a GUI
– for the ease of the user
Integrating the gLite API
Automatic restart of aborted jobs
Control of FlexX licences
20
Thank You
Thanks to...
the EGEE team at SCAI:
– Jiri Kraus
– Andre Gemünd
– Christoph Lenzen
– Horst Schwichtenberg
– Klaere Cassirer
BioSolveIT for providing FlexX licences