INFSO-RI
Enabling Grids for E-sciencE
Using Grid Computing to Accelerate Structure-Based Design Against Influenza A Neuraminidases
Hurng-Chun Lee, Li-Yung Ho, and Ying-Ta Wu*
*Genomics Research Center, Academia Sinica, Taiwan
EGEE User Forum, CERN
Outline: Influenza A Pandemic
[Figure: influenza A subtypes H1N1, H2N2, H3N2, H1N1, H9N2, H7N7, H5N1; surface proteins NA and HA labeled. Caption: (count missing) deaths / 170 cases, Feb 26, 2006]
Neuraminidases cleave host receptors and help release new virions
Neuraminidase and Inhibitors
Structure-Based Drug Design
[Figure: inhibitor structures with substituents R and R’ in the binding pocket; labels as extracted: Zanamivir, R=guanidine, Oseltamivir, R=H, R’=amine]
Drug-resistant variants and Point Mutation
Table columns: Mutation, N1, N2 (cell-to-column assignment partly lost in extraction)
R292K: oseltamivir, Zanamivir
H274Y(F): oseltamivir
N294S: oseltamivir?, oseltamivir
E119V: oseltamivir?, oseltamivir
E119(G;A;D): oseltamivir?, Zanamivir
Legend (marker symbols lost in extraction): predicted mutation site by structure overlay and sequence alignment; reported mutation site
Docking Engine: AutoDock 3.0.5
1. Prepare the Target Protein
-- add polar hydrogen atoms
-- assign charges to atoms
-- decide range of binding site
2. Run AutoGrid
3. Prepare the Ligand
-- assign charges to atoms
-- decide flexible bonds (run AutoTors)
4. Run AutoDock
5. Evaluate Results and Rank Score
Tools: AutoGrid, AutoTors, AutoDock
Authors: Garrett M. Morris, David S. Goodsell, Ruth Huey, William E. Hart, Scott Halliday, Rik Belew, Arthur J. Olson
Morris et al. (1998), J. Computational Chemistry, 19 :
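Steps 2 and 4 of the workflow above can be sketched as command lines. This is a minimal sketch, assuming the AutoDock 3 executables are installed as `autogrid3` and `autodock3` and accept a parameter file with `-p` and a log file with `-l`. The file names and the helper functions are hypothetical and are not taken from the talk. The commands are only built here, not executed.

```python
def autogrid_command(grid_parameter_file, grid_log_file):
    """Step 2: build the AutoGrid call for one target structure."""
    return ["autogrid3", "-p", grid_parameter_file, "-l", grid_log_file]


def autodock_command(docking_parameter_file, docking_log_file):
    """Step 4: build the AutoDock call for one ligand/target pair."""
    return ["autodock3", "-p", docking_parameter_file, "-l", docking_log_file]


def docking_commands(target, ligand):
    """Return the two commands for one docking task (hypothetical file naming)."""
    pair = f"{ligand}_{target}"
    return [
        autogrid_command(f"{target}.gpf", f"{target}.glg"),
        autodock_command(f"{pair}.dpf", f"{pair}.dlg"),
    ]


for command in docking_commands("target", "ligand"):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run` once the parameter files from steps 1 and 3 exist.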
Application Characteristic
Virtual screening based on molecular docking is the most time-consuming part of the structure-based drug design workflow
Number of docking tasks = N x M
–N: number of ligands
–M: number of target structures
CPU-bound application, huge amount of output, no communication between tasks
Task complexity is unpredictable
–difficult to split the tasks with a trivial domain decomposition method
The pitiful …
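The N x M task count can be illustrated by enumerating every ligand/target pair. This is a minimal sketch: the names are hypothetical placeholders, and the sizes (100 ligands, 5 target structures) are chosen to match the test case reported later in the talk.

```python
from itertools import product


def docking_tasks(ligands, targets):
    """Return one independent docking task per (ligand, target) pair."""
    return list(product(ligands, targets))


# Hypothetical placeholder names; only the counts matter here.
ligands = [f"ligand_{i:03d}" for i in range(100)]      # N = 100
targets = [f"conformation_{j}" for j in range(5)]      # M = 5

tasks = docking_tasks(ligands, targets)
print(len(tasks))  # 500
```

Because no task depends on another, the list can be handed out in any order.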
Issues of the Grid applications
Because the Grid is loosely coupled, distributing application jobs on it is not trivial
–extra work is needed for efficient job handling and result gathering
–effort is also needed to handle transient network or site problems
–complexities should be hidden, and the interface to the end user should be application oriented
The significant Grid system overhead means that only jobs with long computing times benefit from the Grid
–not suitable for pilot jobs used for decision making
What is DIANE?
DIANE = Distributed Analysis Environment
A lightweight framework for parallel scientific applications in the master-worker model
–ideal for applications without communication between parallel tasks (e.g. most Bioinformatics applications that analyze huge numbers of independent datasets)
The framework takes care of all synchronization, communication and workflow management details on behalf of the application
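The master-worker model described above can be sketched with a shared task queue from which workers pull tasks until none remain. This is only an illustration of the pattern, not DIANE's API: local threads stand in for remote worker processes, and `run_master_worker` and `square` are hypothetical names.

```python
import queue
import threading


def run_master_worker(tasks, work, n_workers=4):
    """Hand out independent tasks to workers that pull from a shared queue."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    results_lock = threading.Lock()

    def worker():
        # Each worker keeps pulling tasks, so fast workers naturally do more.
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return
            outcome = work(task)
            with results_lock:
                results[task] = outcome

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def square(task):
    """Stand-in for one docking run."""
    return task * task


results = run_master_worker(range(20), square)
print(len(results))  # 20
```

Pulling tasks on demand, rather than splitting the list up front, is what lets the pattern cope with tasks of unpredictable length.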
Distributing AutoDock tasks on the Grid using DIANE
DIANE/AutoDock
A generic framework into which applications can easily plug in
Application-specific job attributes and job-level failure recovery definition (autodock.job):
    # -*- python -*-
    Application = 'Autodock'
    JobInitData = {'macro_repos' :'/home/hclee/diane_demo/autodock/macro',
                   'ligand_repos':'/home/hclee/diane_demo/autodock/ligand',
                   'ftprotocol':'gass',
                   'output_prefix':'autodock_test'
                  }
    ## The input files will be staged in to workers
    InputFiles = []
    ## The definition of failure recovery
    def failRecovery(self):
        print '*'*30
        for t in self.master.tasks.failed():
            print "ignoring failed task:",t
            t.ignore()
        print '*'*30
        return 1
Intuitive job execution command:
    % diane.startjob –-job autodock.job –ganga –w
Possible to mix heterogeneous computing backends
DIANE/AutoDock – integrated user interface
Performance Evaluation
Test case
–target: 1 protein in 5 conformations
–ligands: 100 small compounds (including 7 positives)
–500 docking tasks in total
Test environment
–DIANE backend handler: SSH
–Hardware: traditional PC cluster with NFS (2 x Intel Xeon 2.8 GHz + 2 GB memory per node)
–Grid: LCG
Test Results: DIANE/AutoDock framework on Cluster
Duration time: total elapsed time of a DIANE job
Each DIANE job contains 500 tasks (5 protein conformations x 100 compounds)
Test Results: DIANE/AutoDock framework on Cluster
Handling docking jobs on a traditional PC cluster
[Figure labels: good load balance; a DIANE/AutoDock task]
DIANE/AutoDock framework on LCG-GRID
[Figure label: terminated]
Without redundant scheduling
With redundant scheduling
[Figure label: job was reassigned to other nodes]
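The reassignment noted on this slide can be illustrated with a small simulation in which a task sent to an unresponsive node goes back into the queue and is later handed to another node. This is a sketch under stated assumptions, not DIANE's scheduler: the round-robin assignment, the `max_attempts` limit, and all names are hypothetical.

```python
from collections import deque


def run_with_reassignment(tasks, workers, is_alive, max_attempts=3):
    """Simulate a master that requeues tasks whose worker stopped responding."""
    pending = deque((task, 0) for task in tasks)
    completed = {}  # task -> worker that finished it
    failed = []     # tasks given up on after max_attempts tries
    turn = 0
    while pending:
        task, attempts = pending.popleft()
        worker = workers[turn % len(workers)]  # simple round-robin choice
        turn += 1
        if is_alive(worker):
            completed[task] = worker
        elif attempts + 1 < max_attempts:
            pending.append((task, attempts + 1))  # reassign later
        else:
            failed.append(task)
    return completed, failed


workers = ["node_a", "node_b", "node_c"]
terminated = {"node_b"}  # hypothetical node that was terminated
tasks = [f"task_{i}" for i in range(6)]

completed, failed = run_with_reassignment(
    tasks, workers, lambda worker: worker not in terminated
)
print(completed["task_1"], failed)  # node_a []
```

Without the requeue branch, every task first sent to the terminated node would be lost, which is the situation the slide without redundant scheduling depicts.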
Compound library enrichment
AutoDock parameters:
–translation step = 2.0 Å
–quaternion step = 20 degrees
–torsion step = 20 degrees
–number of energy evaluations = 1.5 x 10^6
–max. number of generations = 2.7 x 10^4
–number of runs = 10
[Figure: red = positives]
All positives were docked within RMSD < 1.5 Å
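The parameters on this slide can be written out as lines of an AutoDock docking parameter file. This is a minimal sketch, assuming the slide values map onto the AutoDock 3 keywords `tstep`, `qstep`, `dstep`, `ga_num_evals`, `ga_num_generations` and `ga_run`. The mapping is an assumption, and a complete parameter file needs further entries that the slide does not show.

```python
# Slide values paired with the assumed AutoDock 3 keywords.
SLIDE_PARAMETERS = [
    ("tstep", 2.0),                  # translation step, in angstroms
    ("qstep", 20.0),                 # quaternion step, in degrees
    ("dstep", 20.0),                 # torsion step, in degrees
    ("ga_num_evals", 1_500_000),     # 1.5 x 10^6 energy evaluations
    ("ga_num_generations", 27_000),  # 2.7 x 10^4 generations at most
    ("ga_run", 10),                  # number of runs
]


def dpf_lines(parameters):
    """Render (keyword, value) pairs as 'keyword value' lines."""
    return [f"{keyword} {value}" for keyword, value in parameters]


print("\n".join(dpf_lines(SLIDE_PARAMETERS)))
```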
Probe effects due to minor changes in target’s binding sites
Summary
Modeling compound-protein complexes can be sped up by distributing molecular docking processes on the Grid.
With the DIANE framework, distributing molecular docking tasks on the Grid is easy to implement and gives the end user an intuitive interface.
The DIANE framework also provides functionality for tuning the system to tackle the issues of distributing molecular docking tasks on the loosely coupled Grid.
This simple test case demonstrated that huge compound databases can be effectively enriched by executing docking tasks on the Grid.
However, more resources are required to build a real high-throughput (HTP) docking service for the life science community.
Acknowledgements
Li-Yung Ho, Hurng-Chun Lee, Hsing-Yen Chen, Dr. Simon Lin, Jakub Moscicki, Dr. Massimo Lamanna
LCG-ARDA, CERN
Support from the Genomics Research Center, Academia Sinica, and the National Science Council, Taiwan, is highly appreciated
Interacting Complexes
A key step to structure-based inhibitor design
[Figure: PDB 1F8B]