Parameter Sweep Workflows for Modelling Carbohydrate Recognition
ProSim Project
Tamas Kiss, Gabor Terstyanszky, Noam Weingarten, Pamela Greenwell, Hans Heindl
AHM’09, Oxford, UK, December 2009
The research interest
- The motivation: understanding how sugars interact with their protein partners may lead to the development of new treatment methods for many diseases.
- The obstacle: investigating the binding of proteins to sugars in “wet laboratory” (in vitro) experiments is expensive and time consuming.
  - Expensive substrates
  - Sophisticated machinery
- The solution: use “in silico” tools (computer simulation) to select the best binding candidates.
  - In vitro work only on selected candidates
The research task
[Figure labels: binding pocket, sugar (ligand), protein (receptor)]
The research interest
- Advantages of in silico methods:
  - Better focusing of wet laboratory resources: better planning of experiments by selecting the best molecules to investigate in vitro
  - Reduced time and cost
  - Increased number of molecules screened
- Problems of in silico experiments:
  - Time consuming: weeks or months on a single computer
  - Simulation tools are too complex for bio-scientists: Unix command line interfaces + software packages (Amber, GROMACS)
  - Bio-molecular simulation tools are not widely tested and validated: are the results really useful and accurate?
What can we gain via the simulation?
1. Validation and refinement of in-silico modelling tools
2. Filter potential scenarios for wet lab experiments
The researcher’s interest
What does the researcher want?
- Run the simulations faster: use compute resources (National Grid Service, NGS)
- Run the simulations using seamless access to compute resources: web based interface
- Combine many simulation, analysis and visualisation tools: workflows
- Run multiple docking experiments to investigate different protein and sugar combinations: parameter study
Westminster Grid Application Support Service (W-GRASS)
Applications Ported by W-GRASS
Bio- and Life Science
- Molecular Dynamics Simulation using CHARMm
- Patient Readmission Analysis with R
- GAMESS-UK: ab initio molecular electronic structure program
- MultiBayes: program for analysing DNA sequences of genes
- ProSim: Modelling Protein Carbohydrate Recognition in-silico
- In silico Modelling Using AutoDock
Engineering
- DASP: Digital Alias-free Signal Processing
- Extraction of X-RAY Diffraction Profiles
- Cellular Automata-Based Laser Dynamics
Multi-media
- Rendering portal: Grid-based on-line rendering service
Physics
- VisIVO: Visualisation Interface to the Virtual Observatory
ProSim – Protein Molecule Simulation on the Grid
- Funded by the JISC-ENGAGE program (Engaging Research with e-Infrastructure) to promote the greater engagement of academic researchers in the UK with the UK's e-Infrastructure
- ProSim objectives:
  - define user requirements and user scenarios of protein molecule simulation
  - identify, test and select software packages for protein molecule simulation
  - automate the protein molecule simulation by creating workflows and parameter study support
  - develop application-specific graphical user interfaces
  - run protein molecule simulation on the UK National Grid Service and make it available to the bioscience research community
The User Scenario
[Workflow diagram. Inputs: PDB file 1 (Receptor), PDB file 2 (Ligand). Steps: Energy Minimization (Gromacs), Validate (Molprobity), Check (Molprobity), Perform docking (AutoDock), Molecular Dynamics (Gromacs). Stages: Phase 1, Phase 2, Phase 3, Phase 4.]
The User Scenario in detail
- Phase 1: selection and preparation of receptor
  - Source: public repository, local database, or user provided
  - Preparation and standardisation
  - Solvation and charge neutralization
  - Energy minimisation
  - Validation
- Phase 2: selection and preparation of ligand
  - Source: built using SMILES, public repository, local database, or user provided
  - Solvation
  - Energy minimisation
The User Scenario
- Phase 3: docking ligand to receptor
  - Prepare docking: docking parameters and grid-space (AutoGrid)
  - Docking and selection of best results according to total energy (AutoDock)
  - 10 AutoDock executions, 100 genetic algorithm runs each
- Phase 4: refining the ligand-receptor molecule (performed on the 10 best results of the AutoDock simulation)
  - Solvation of the ligand-receptor structure
  - Energy minimisation (GROMACS)
  - Molecular dynamics (GROMACS, MPI version)
  - Molecule trajectory data analysis
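The hand-over from phase 3 to phase 4 (keep the 10 best docking results by total energy) can be illustrated with a minimal sketch. This is not ProSim code: the `DockingPose` record, its field names and the `select_best` helper are hypothetical, and the sketch assumes that a lower total energy means a better result.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DockingPose:
    """One docking result (hypothetical record, not an AutoDock data structure)."""
    execution: int       # which AutoDock execution produced the pose
    ga_run: int          # genetic algorithm run within that execution
    total_energy: float  # assumed: lower is better


def select_best(poses: Iterable[DockingPose], n: int = 10) -> List[DockingPose]:
    """Return the n poses with the lowest total energy, best first."""
    return sorted(poses, key=lambda p: p.total_energy)[:n]


# Tiny made-up example; the energies are placeholders, not simulation output.
example = [
    DockingPose(execution=0, ga_run=0, total_energy=-3.0),
    DockingPose(execution=0, ga_run=1, total_energy=-7.5),
    DockingPose(execution=1, ga_run=0, total_energy=-1.2),
    DockingPose(execution=1, ga_run=1, total_energy=-9.1),
]
top_two = select_best(example, n=2)
```

With 10 executions of 100 genetic algorithm runs each, the same call with `n=10` would reduce up to 1000 candidate poses to the 10 structures passed on to phase 4.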
The Workflow in g-USE
- A combination of GEMLCA and standard g-USE jobs
- Executed on 5 different sites of the UK NGS
- Parameter sweeps in phases 3 and 4
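A parameter sweep expands one workflow node into many independent job instances that can run on different sites. The sketch below is not g-USE or GEMLCA code: the `fan_out` helper, the site names and the round-robin assignment are assumptions made for illustration. Only the counts (10 instances per sweep, 5 sites) come from the slides.

```python
from itertools import cycle
from typing import Dict, List, Sequence


def fan_out(phase: int, n_jobs: int, sites: Sequence[str]) -> List[Dict[str, object]]:
    """Create one job description per sweep instance, assigning sites round-robin."""
    site_cycle = cycle(sites)
    return [
        {"phase": phase, "index": i, "site": next(site_cycle)}
        for i in range(n_jobs)
    ]


# Hypothetical site names standing in for the 5 NGS sites.
SITES = [f"ngs-site-{k}" for k in range(1, 6)]

phase3_jobs = fan_out(phase=3, n_jobs=10, sites=SITES)  # 10 AutoDock executions
phase4_jobs = fan_out(phase=4, n_jobs=10, sites=SITES)  # one refinement per selected result
```

Round-robin is only the simplest possible placement policy; a real scheduler would also take queue lengths and site availability into account.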
Running simulations
- Set input parameters
- Upload input files
- Select executor sites
- Follow execution progress
- Typical execution time: 24 hours
User views
- Researchers (or End-Users)
  - Minimal computer, Grid and portal skills
  - Only interested in running their own research
  - Import, parameterize, execute and visualise workflows
- Application Developers (and/or Expert Users)
  - Computer literate researcher or software engineer
  - Define user scenarios and design new experiments
  - Create, test, deploy and modify workflows
  - Communicate with end-users and consider their requirements
The ProSim visualiser
- Visualisation in a newly developed portlet
- Allows visualisation of receptor, ligand and docked molecules at any phase during and after simulation (if the necessary files have already been generated)
- Allows two molecules to be visualised and compared at a time
- Energy, pressure, temperature and other important statistics are also displayed
- Uses the KiNG (Kinemage, Next Generation) visualisation tool
Lessons learned
- Communication between scientists and Grid experts is extremely difficult
  - More than 50% of the total project time was spent on communication and on describing/understanding user requests/requirements
- Novice Grid users require totally transparent access to Grid resources
  - Users are interested in their research and not in Globus, MPI or WMS
Future plans
- Make the workflow more flexible to accommodate numerous different user scenarios
- Investigate further scenarios such as virtual screening of many ligands against one selected receptor
Thank you for your attention! Any questions?