Using DOCK to characterize protein ligand interactions Sudipto Mukherjee Robert C. Rizzo Lab.

Slides:



Advertisements
Similar presentations
K T A U Kernel Tuning and Analysis Utilities Department of Computer and Information Science Performance Research Laboratory University of Oregon.
Advertisements

Introductions to Parallel Programming Using OpenMP
AutoDock 4 and AutoDock Vina -Brief Intruction
SAN DIEGO SUPERCOMPUTER CENTER Blue Gene for Protein Structure Prediction (Predicting CASP Targets in Record Time) Ross C. Walker.
Samford University Virtual Supercomputer (SUVS) Brian Toone 4/14/09.
Distributed Indexed Outlier Detection Algorithm Status Update as of March 11, 2014.
A 3-D reference frame can be uniquely defined by the ordered vertices of a non- degenerate triangle p1p1 p2p2 p3p3.
Two Examples of Docking Algorithms With thanks to Maria Teresa Gil Lucientes.
Using Metacomputing Tools to Facilitate Large Scale Analyses of Biological Databases Vinay D. Shet CMSC 838 Presentation Authors: Allison Waugh, Glenn.
Extending a molecular docking tool to run simulations on clouds Damjan Temelkovski Dr. Tamas Kiss Dr. Gabor Terstyanszky University of Westminster.
Protein Structure and Drug Discovery Workshop To be held at Monash University, Mebourne, Australia October 3 rd to 4 th 2006 Molecular Visualization Learn.
HPCC Mid-Morning Break Interactive High Performance Computing Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery.
U.S. Department of the Interior U.S. Geological Survey David V. Hill, Information Dynamics, Contractor to USGS/EROS 12/08/2011 Satellite Image Processing.
The Asynchronous Dynamic Load-Balancing Library Rusty Lusk, Steve Pieper, Ralph Butler, Anthony Chan Mathematics and Computer Science Division Nuclear.
‘Tis not folly to dream: Using Molecular Dynamics to Solve Problems in Chemistry Christopher Adam Hixson and Ralph A. Wheeler Dept. of Chemistry and Biochemistry,
DockoMatic: Automated Tool for Homology Modeling and Docking Studies DOCKOMATIC-Student Procedure-Homology Modeling & Molecular Docking Tutorial.
Using the WS-PGRADE Portal in the ProSim Project Protein Molecule Simulation on the Grid Tamas Kiss, Gabor Testyanszky, Noam.
LLNL-PRES-XXXXXX This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Seaborg Cerise Wuthrich CMPS Seaborg  Manufactured by IBM  Distributed Memory Parallel Supercomputer  Based on IBM’s SP RS/6000 Architecture.
Application of e-infrastructure to real research.
High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud Sally R. Ellingson Graduate Research Assistant Center.
Parallel Computing Through MPI Technologies Author: Nyameko Lisa Supervisors: Prof. Elena Zemlyanaya, Prof Alexandr P. Sapozhnikov and Tatiana F. Sapozhnikov.
Slide 1 MIT Lincoln Laboratory Toward Mega-Scale Computing with pMatlab Chansup Byun and Jeremy Kepner MIT Lincoln Laboratory Vipin Sachdeva and Kirk E.
G. Terstyanszky, T. Kukla, T. Kiss, S. Winter, J.: Centre for Parallel Computing School of Electronics and Computer Science, University of.
InCoB August 30, HKUST “Speedup Bioinformatics Applications on Multicore- based Processor using Vectorizing & Multithreading Strategies” King.
Protein Molecule Simulation on the Grid G-USE in ProSim Project Tamas Kiss Joint EGGE and EDGeS Summer School.
Parameter Sweep Workflows for Modelling Carbohydrate Recognition ProSim Project Tamas Kiss, Gabor Terstyanszky, Noam Weingarten.
BlueGene/L Facts Platform Characteristics 512-node prototype 64 rack BlueGene/L Machine Peak Performance 1.0 / 2.0 TFlops/s 180 / 360 TFlops/s Total Memory.
Evaluation of Non-Uniqueness in Contaminant Source Characterization based on Sensors with Event Detection Methods Jitendra Kumar 1, E. M. Zechman 1, E.
Efficient Data Accesses for Parallel Sequence Searches Heshan Lin (NCSU) Xiaosong Ma (NCSU & ORNL) Praveen Chandramohan (ORNL) Al Geist (ORNL) Nagiza Samatova.
Stochastic optimization of energy systems Cosmin Petra Argonne National Laboratory.
1/30/2003 BARC1 Profile-Guided I/O Partitioning Yijian Wang David Kaeli Electrical and Computer Engineering Department Northeastern University {yiwang,
In silico discovery of inhibitors using structure-based approaches Jasmita Gill Structural and Computational Biology Group, ICGEB, New Delhi Nov 2005.
SimBioSys Inc.© Slide #1 Enrichment and cross-validation studies of the eHiTS high throughput screening software package.
Using SWARM service to run a Grid based EST Sequence Assembly Karthik Narayan Primary Advisor : Dr. Geoffrey Fox 1.
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
1/20 Study of Highly Accurate and Fast Protein-Ligand Docking Method Based on Molecular Dynamics Reporter: Yu Lun Kuo
Sep 08, 2009 SPEEDUP – Optimization and Porting of Path Integral MC Code to New Computing Architectures V. Slavnić, A. Balaž, D. Stojiljković, A. Belić,
Brent Gorda LBNL – SOS7 3/5/03 1 Planned Machines: BluePlanet SOS7 March 5, 2003 Brent Gorda Future Technologies Group Lawrence Berkeley.
Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.
INFSO-RI Enabling Grids for E-sciencE EGEE Review WISDOM demonstration Vincent Bloch, Vincent Breton, Matteo Diarena, Jean Salzemann.
Nara Institute of Science and Technology, Nara Prefecture, Japan CONFIGURATION AND DEPLOYMENT OF A SCALABLE VIRTUAL MACHINE CLUSTER FOR MOLECULAR DOCKING.
MapReduce & Hadoop IT332 Distributed Systems. Outline  MapReduce  Hadoop  Cloudera Hadoop  Tutorial 2.
Polish Infrastructure for Supporting Computational Science in the European Research Space EUROPEAN UNION Examining Protein Folding Process Simulation and.
Single Node Optimization Computational Astrophysics.
GRIDP: Web-enabled Drug Discovery Is there any way I can use computational tools to reduce the number of molecules I have to screen to a manageable number,
Surflex: Fully Automatic Flexible Molecular Docking Using a Molecular Similarity-Based Search Engine Ajay N. Jain UCSF Cancer Research Institute and Comprehensive.
Performance Analysis on Blue Gene/P Tulin Kaman Department of Applied Mathematics and Statistics Stony Brook University.
Parallel IO for Cluster Computing Tran, Van Hoai.
Large-scale accelerator simulations: Synergia on the Grid turn 1 turn 27 turn 19 turn 16 C++ Synergia Field solver (FFT, multigrid) Field solver (FFT,
HPC University Requirements Analysis Team Training Analysis Summary Meeting at PSC September Mary Ann Leung, Ph.D.
Autoligand is a script that comes with autodock tools. It has two modes: Find #n binding site within the grid region. Define shape and volume of binding.
Elon Yariv Graduate student in Prof. Nir Ben-Tal’s lab Department of Biochemistry and Molecular Biology, Tel Aviv University.
FESR Consorzio COMETA - Progetto PI2S2 Molecular Modelling Applications Laura Giurato Gruppo di Modellistica Molecolare (Prof.
Condor on Dedicated Clusters Peter Couvares and Derek Wright Computer Sciences Department University of Wisconsin-Madison
1st EELA-2 Grid School 2-15 November Jérôme Verleyen Juan Manuel Hurtado Rámirez Enrique Merino Pérez META-DOCK.
IBM India Presentation subtitle: 20pt Arial Regular, teal R045 | G182 | B179 Recommended maximum length: 2 lines Confidentiality/date line: 13pt Arial.
APPLICATIONS OF BIOINFORMATICS IN DRUG DISCOVERY
Is System X for Me? Cal Ribbens Computer Science Department
DATA MINING FOR SMALL MOLECULE ALLOSTERIC INHIBITORS
Molecular Docking Profacgen. The interactions between proteins and other molecules play important roles in various biological processes, including gene.
CICC Combines Grid Computing with Chemical Informatics
Ligand Docking to MHC Class I Molecules
University of Westminster Centre for Parallel Computing
FOUNDATIONS OF MODERN SCIENTIFIC PROGRAMMING
Cheminformatics Basics
Mr.Halavath Ramesh 16-MCH-001 Dept. of Chemistry Loyola College University of Madras-Chennai.
Mr.Halavath Ramesh 16-MCH-001 Dept. of Chemistry Loyola College University of Madras-Chennai.
Mr.Halavath Ramesh 16-MCH-001 Dept. of Chemistry Loyola College University of Madras-Chennai.
Mr.Halavath Ramesh 16-MCH-001 Dept. of Chemistry Loyola College University of Madras-Chennai.
Presentation transcript:

Using DOCK to characterize protein ligand interactions Sudipto Mukherjee Robert C. Rizzo Lab

Acknowledgements The Rizzo Lab Dr. Robert C. Rizzo Brian McGillick Rashi Goyal Yulin Huang Support Stony Brook Department of Applied Mathematics and Statistics New York State Office of Science, Technology & Academic Research Computational Science Center at Brookhaven National Laboratory National Institutes of Health (NIGMS) NIH National Research Service Award. Grant Number:1F31CA (Trent E. Balius) IBM Rochester Carlos P. Sosa Amanda Peters

Introduction What is Docking? Compilation of DOCK on BG Scaling Benchmarks

Docking as a Drug Discovery Tool Receptor: TrypsinLigand: BenzamidineComplex Docking : Computational Search for energetically favorable binding poses of a ligand with a receptor. Find origins of ligand binding which drive molecular recognition. Finding the correct pose, given a ligand and a receptor. Finding the best molecule, given a database and a receptor. Conformer Generation Shape Fitting Scoring Functions Pose Ranking

Docking Resources Small Molecule Databases –NCI (National Cancer Institute) –UCSF ZINC zinc.docking.org Protein receptor structure –Protein Data Bank Docking Tutorials –Rizzo Lab Wiki –UCSF Tutorials dock.compbio.ucsf.edu/DOCK_6/index.htm –AMS Comp Bio Course Sequence Modeling Tools –Chimera (UCSF)

Compiling DOCK6 on BlueGene IBM XL Compiler Optimizations O5 Level Optimization –qhot Loop analysis optimization –qipa Enable interprocedural analysis PowerPC Double Hummer (2 FPU) –qtune=440 qarch=440d MASSV Mathematical Acceleration Subsystem –-lmassv DOCK Accessory programs not ported –Energy Grid files must be computed on FEN, not on regular Linux cluster because of endian issues High Throughput Computing Validation for Drug Discovery Using the DOCK Program on a Massively Parallel System Thanks to Amanda Peters, Carlos P. Sosa (IBM) for compilation help

Compiling Dock on BG/L Cross-compile on Front End Node with Makefile parameters for IBM XL Compilers CC = /opt/ibmcmp/vac/bg/8.0/bin/blrts_xlc CXX = /opt/ibmcmp/vacpp/bg/8.0/bin/blrts_xlC BGL_SYS= /bgl/BlueLight/ppcfloor/bglsys CFLAGS = -qcheck=all -DBUILD_DOCK_WITH_MPI -DMPICH_IGNORE_CXX_SEEK -I$(BGL_SYS)/include -lmassv -qarch=440d -qtune=440 -qignprag=omp -qinline -qflag=w:w -O5 -qlist -qsource -qhot FC = /opt/ibmcmp/xlf/bg/10.1/bin/blrts_xlf90 FFLAGS = -fno-automatic -fno-second-underscore LOAD = /opt/ibmcmp/vacpp/bg/8.0/bin/blrts_xlC LIBS = -lm -L$(BGL_SYS)/lib -lmpich.rts -lmsglayer.rts -lrts.rts -ldevices.rts Note that library files and compiler binaries are located in different paths on BG/L and BG/P

Compiling Dock on BG/P CC= /opt/ibmcmp/vac/bg/9.0/bin/bgxlc CXX= /opt/ibmcmp/vacpp/bg/9.0/bin/bgxlC BGP_SYS= /bgsys/drivers/ppcfloor CFLAGS= -L/opt/ibmcmp/xlmass/bg/4.4/bglib -lmassv -L-qcheck=all $(XLC_TRACE_LIB) -qarch=440d -qtune=440 -qignprag=omp -qinline -qflag=w:w -DBUILD_DOCK_WITH_MPI -DMPICH_IGNORE_CXX_SEEK -I$(BGP_SYS)/comm/include -O5 -qlist -qsource -qhot FC= /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf90 FFLAGS= $(XLC_TRACE_LIB) -O3 -qlist -qsource -qhot -fno-automatic -fno-second-underscore -qarch-440d -O3 -qlist -qsource -qhot -qlist -fno-automatic -fno-second-underscore LOAD= /opt/ibmcmp/vacpp/bg/9.0/bin/bgxlC LIBS= -lm -L$(BGP_SYS)/comm/lib -lmpich.cnk -ldcmfcoll.cnk -ldcmf.cnk -L$(BGP_SYS)/runtime/SPI -lSPI.cna -lrt -lpthread -lmass

Dock scaling background Embarrassingly parallel simulation –No comm required between MPI processes –Each molecule can be docked independently as a serial process –VN mode should always be better Scaling bottlenecks –Disk I/O (need to read and write molecules and output file) –MPI master node is a compute node Scaling benchmarks were done with a database of 100,000 molecules with 48 hour time limit. –# of molecules docked is used to determine performance Typical virtual screening run uses ca. 5 million molecules.

Virtual Node mode Mode (#of CPU's) Molecules Docked CO (128)13363 VN (256)24657 Protein = 2PK4, B128 BG/L block This is a check to verify that VN mode is about twice as fast as CO mode. ModeMolecules Docked SMP (64)15287 DUAL (128)26152 VN (256)40316 Protein = 2PK4, B064 BG/P block All simulations were allowed to run for the limit of 48 hours and benchmarked on the # of molecules docked within that time. BG/P has three modes with 1,2 or 4 processors available. BG/P B064 is almost twice as fast as BG/L B128 even though both have same # of CPU's

BG/P VN mode provides best scaling Same simulation with 5 different system shows that BG/P in VN mode is best suited for virtual screening simulations. [ B064 BG/P block ] Docking Time (Hours) Protein PDB Code BG/P B512 block VN mode = 2048 cpus Timing varies widely with type of protein target Timing in hours for Production Run of 100,000 molecules docked

Scaling Benchmark on BG/L Blocksize (VN mode) Docked molecules # of molecules docked in 48hours BG/L Block size Virtual Screening was performed with the protein target 2PK4 (PDB code) with a database of 100,000 molecules run for the limit of 48 hours. For 5 million molecule screen, assuming 48 hr jobs 512 BG/L blocks, VN mode 50,000 molecule chunks = 100 jobs 128 BG/L blocks, VN mode 20,000 molecule chunks = 250 jobs i.e about 2 million node hours for a virtual screen On BG/P 512 block VN mode, 100,000 molecules docked in 20 hours i.e. we can use 200,000 molecule chunks = 25 jobs!

TODO: Future Plans for Optimization Streamline I/O operations to use fewer disk writes The HTC mode (High Throughput Computing) available on BG/P provides better scaling for embarrassingly parallel simulations. Implement multi-threading using OpenMP to take advantage of BG/P Sorting small molecules by # of rotatable bonds leads to better load balancing (Suggestion by IBM researchers)