Molecular Dynamics Simulations of Proteins with Petascale Special-Purpose Computer MDGRAPE-3 Makoto Taiji Deputy Project Director Computational & Experimental.

Slides:



Advertisements
Similar presentations
Learning Computer Simulation with an MDGRAPE Accelerator Tetsu Narumi 1, Fumikazu Konishi 2, Ryo Umetsu 2, Makoto Taiji 2, Toshikazu Ebisuzaki 3, Kenji.
Advertisements

©2009 HP Confidential template rev Ed Turkel Manager, WorldWide HPC Marketing 4/7/2011 BUILDING THE GREENEST PRODUCTION SUPERCOMPUTER IN THE.
1 A Self-Tuning Cache Architecture for Embedded Systems Chuanjun Zhang*, Frank Vahid**, and Roman Lysecky *Dept. of Electrical Engineering Dept. of Computer.
One-day Meeting, INI, September 26th, 2008 Role of spectral turbulence simulations in developing HPC systems YOKOKAWA, Mitsuo Next-Generation Supercomputer.
Early Linpack Performance Benchmarking on IPE Mole-8.5 Fermi GPU Cluster Xianyi Zhang 1),2) and Yunquan Zhang 1),3) 1) Laboratory of Parallel Software.
Petaflops Special-Purpose Computer for Molecular Dynamics Simulations Makoto Taiji High-Performance Molecular Simulation Team Computational & Experimental.
10/21/20091 Protein Explorer: A Petaflops Special-Purpose Computer System for Molecular Dynamics Simulations Makoto Taiji, Tetsu Narumi, Yousuke Ohno,
Special-purpose computers for scientific simulations Makoto Taiji Processor Research Team RIKEN Advanced Institute for Computational Science Computational.
Implementation methodology for Emerging Reconfigurable Systems With minimum optimization an appreciable speedup of 3x is achievable for this program with.
Future CAMD Workloads and their Implications for Computer System Design IEEE 6th Annual Workshop on Workload Characterization.
Molecular dynamics refinement and rescoring in WISDOM virtual screenings Gianluca Degliesposti University of Modena and Reggio Emilia Molecular Modelling.
Claude TADONKI Mines ParisTech – LAL / CNRS / INP 2 P 3 University of Oujda (Morocco) – October 7, 2011 High Performance Computing Challenges and Trends.
Heterogeneous Computing: New Directions for Efficient and Scalable High-Performance Computing Dr. Jason D. Bakos.
A many-core GPU architecture.. Price, performance, and evolution.
An Integrated Approach to Protein-Protein Docking
Parallel Algorithms - Introduction Advanced Algorithms & Data Structures Lecture Theme 11 Prof. Dr. Th. Ottmann Summer Semester 2006.
Lecture 1: Introduction to High Performance Computing.
Real Parallel Computers. Background Information Recent trends in the marketplace of high performance computing Strohmaier, Dongarra, Meuer, Simon Parallel.
Network-on-Chip: Communication Synthesis Department of Computer Science Texas A&M University.
Heterogeneous Computing Dr. Jason D. Bakos. Heterogeneous Computing 2 “Traditional” Parallel/Multi-Processing Large-scale parallel platforms: –Individual.
Computing Platform Benchmark By Boonyarit Changaival King Mongkut’s University of Technology Thonburi (KMUTT)
3.1Introduction to CPU Central processing unit etched on silicon chip called microprocessor Contain tens of millions of tiny transistors Key components:
1 Extending Summation Precision for Network Reduction Operations George Michelogiannakis, Xiaoye S. Li, David H. Bailey, John Shalf Computer Architecture.
Real Parallel Computers. Modular data centers Background Information Recent trends in the marketplace of high performance computing Strohmaier, Dongarra,
Serono Science Scientific computing and high performance applications
Computer-Assisted Drug Design (1) i)Random Screening ii)Lead Development and Optimization using Multivariate Statistical Analyses. iii)Lead Generation.
Motivation “Every three minutes a woman is diagnosed with Breast cancer” (American Cancer Society, “Detailed Guide: Breast Cancer,” 2006) Explore the use.
GPU Programming with CUDA – Accelerated Architectures Mike Griffiths
CCS machine development plan for post- peta scale computing and Japanese the next generation supercomputer project Mitsuhisa Sato CCS, University of Tsukuba.
Seaborg Cerise Wuthrich CMPS Seaborg  Manufactured by IBM  Distributed Memory Parallel Supercomputer  Based on IBM’s SP RS/6000 Architecture.
Application of e-infrastructure to real research.
Scheduling Many-Body Short Range MD Simulations on a Cluster of Workstations and Custom VLSI Hardware Sumanth J.V, David R. Swanson and Hong Jiang University.
Advanced Computer Architecture, CSE 520 Generating FPGA-Accelerated DFT Libraries Chi-Li Yu Nov. 13, 2007.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
BlueGene/L Facts Platform Characteristics 512-node prototype 64 rack BlueGene/L Machine Peak Performance 1.0 / 2.0 TFlops/s 180 / 360 TFlops/s Total Memory.
Problem is to compute: f(latitude, longitude, elevation, time)  temperature, pressure, humidity, wind velocity Approach: –Discretize the.
PDCS 2007 November 20, 2007 Accelerating the Complex Hessenberg QR Algorithm with the CSX600 Floating-Point Coprocessor Yusaku Yamamoto 1 Takafumi Miyata.
Anton Supercomputer Brandon Dean 4/28/15. History Named after Antonie van Leeuwenhoek – “father of microbiology” Molecular Dynamics (MD) simulations were.
80-Tile Teraflop Network-On- Chip 1. Contents Overview of the chip Architecture ▫Computational Core ▫Mesh Network Router ▫Power save features Performance.
In silico discovery of inhibitors using structure-based approaches Jasmita Gill Structural and Computational Biology Group, ICGEB, New Delhi Nov 2005.
- Rohan Dhamnaskar. Overview  What is a Supercomputer  Some Concepts  Couple of examples.
NIH Resource for Biomolecular Modeling and Bioinformatics Beckman Institute, UIUC NAMD Development Goals L.V. (Sanjay) Kale Professor.
NIH Resource for Biomolecular Modeling and Bioinformatics Beckman Institute, UIUC NAMD Development Goals L.V. (Sanjay) Kale Professor.
Protein Explorer: A Petaflops Special Purpose Computer System for Molecular Dynamics Simulations David Gobaud Computational Drug Discovery Stanford University.
An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines By: David Chui Supervisor: Professor P. Chow.
SimBioSys Inc.© 2004http:// Conformational sampling in protein-ligand complex environment Zsolt Zsoldos SimBioSys Inc., © 2004 Contents:
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
Virtual Screening C371 Fall INTRODUCTION Virtual screening – Computational or in silico analog of biological screening –Score, rank, and/or filter.
1/20 Study of Highly Accurate and Fast Protein-Ligand Docking Method Based on Molecular Dynamics Reporter: Yu Lun Kuo
Hierarchical Database Screenings for HIV-1 Reverse Transcriptase Using a Pharmacophore Model, Rigid Docking, Solvation Docking, and MM-PB/SA Junmei Wang,
ANTON D.E Shaw Research.
THE BRIEF HISTORY OF 8085 MICROPROCESSOR & THEIR APPLICATIONS
Home - Distributed Parallel Protein folding Chris Garlock.
Anton, a Special-Purpose Machine for Molecular Dynamics Simulation By David E. Shaw et al Presented by Bob Koutsoyannis.
Computer Architecture Lecture 26 Past and Future Ralph Grishman November 2015 NYU.
Computer Organization IS F242. Course Objective It aims at understanding and appreciating the computing system’s functional components, their characteristics,
Multi-cellular paradigm The molecular level can support self- replication (and self- repair). But we also need cells that can be designed to fit the specific.
Molecular Modeling in Drug Discovery: an Overview
FESR Consorzio COMETA - Progetto PI2S2 Molecular Modelling Applications Laura Giurato Gruppo di Modellistica Molecolare (Prof.
History of Computers and Performance David Monismith Jan. 14, 2015 Based on notes from Dr. Bill Siever and from the Patterson and Hennessy Text.
Structural Bioinformatics Elodie Laine Master BIM-BMC Semester 3, Genomics of Microorganisms, UMR 7238, CNRS-UPMC e-documents:
Page 1 Molecular Modeling Service in Profacgen. Page 2 The three-dimensional structure of a protein provides essential information about its biological.
Super Computing By RIsaj t r S3 ece, roll 50.
Architecture & Organization 1
Molecular Docking Profacgen. The interactions between proteins and other molecules play important roles in various biological processes, including gene.
Development of the Nanoconfinement Science Gateway
Architecture & Organization 1
BIC 10503: COMPUTER ARCHITECTURE
An Integrated Approach to Protein-Protein Docking
Presentation transcript:

Molecular Dynamics Simulations of Proteins with Petascale Special-Purpose Computer MDGRAPE-3 Makoto Taiji Deputy Project Director Computational & Experimental Systems Biology Group Genomic Sciences Center, RIKEN

What is GRAPE? GRAvity PipE Special-purpose accelerator for classical particle simulations  Astrophysical N-body simulations  Molecular Dynamics Simulations MDGRAPE-3 : Petaflops GRAPE for Molecular Dynamics simulations J. Makino & M. Taiji, Scientific Simulations with Special-Purpose Computers, John Wiley & Sons, 1997.

MDGRAPE-3 (aka Protein Explorer) Petaflops special- purpose computer for molecular dynamics simulations Started at April 2002, Finished at June 2006 Part of Protein 3000 project – a project to determine 3,000 protein structures M. Taiji et al, Proc. Supercomputing 2003, on CDROM. M. Taiji, Proc. Hot Chips 16, on CDROM (2004). EGFRTT RNA Polymerase

Molecular Dynamics Simulations Folding of Chignolin, 10-residue β-hairpin design peptide (by Dr. A. Suenaga) Bonding + - Coulomb VdW Force calculation dominates computational time Require large computational power

How GRAPE works Accelerator to calculate forces Host Computer GRAPE Most of Calculation → GRAPE Others → Host computer Particle Data Results Communication = O (N) << Calculation = O (N 2 ) Easy to build, Easy to use Cost Effective

History of GRAPE computers Eight Gordon Bell Prizes `95, `96, `99, `00 (double), `01, `03, `06

Why we build special-purpose computers? Bottleneck of high-performance computing: Parallelization limit / Memory bandwidth Power Consumption = Heat Dissipation These problems will become more serious in future. Special-purpose approach:  can solve parallelization limit for some applications  relax power consumption  ~100 times better cost-performance

Broadcast Parallelization Molecular Dynamics Case Two-body forces For parallel calculation of F i, we can use the same Broadcast Parallelization - relax Bandwidth Problem Pipeline 1 Pipeline 2 Pipeline i

Highly-Parallel Operations in Molecular Dynamics Processors For special-purpose computers  Broadcast Memory Architecture  Efficient : 720 operations/cycle/chip in MDGRAPE-3 chip  possible to increase according to Moore’s law In case of MD: MDGRAPE600 nm1 pipeline1Gflops MDGRAPE-2250 nm4 pipelines16Gflops MDGRAPE-3130 nm20 pipelines180Gflops

Power Efficiency of Special-Purpose Computers If we compare at the same technology  Pentium 4 (0.13  m, 3GHz, FSB800) … 14W/Gflops  MDGRAPE-3 chip (0.13  m) … 0.1W/Gflops Why ?  Highly-parallel at low frequency MDGRAPE-3: 250MHz, 720-equivalent operations for example, single-precision multiplier has 3 pipeline stages  Tuning accuracy Most of calculations are done in single precision  Slow I/O 84-bit wide input and output port at 125 MHz (GTL)

Force Pipeline Calculate two-body central forces 8 multipliers, 9 adders, and 1 function evaluator = 33 equivalent operations for Coulomb force calculation A. H. Karp, Scientific Programming, 1, pp133–141 (1992) Function Evaluator: approximate arbitrary functions by segmented fourth-order polynomials Multipliers : floating-point, single precision Adders: floating-point, single precision / fixed-point 40 or 80 bit

Block Diagram of MDGRAPE-3 chip Memory-in-a-chip Architecture Memory for 32,768 particles The same data is broadcasted to each pipeline

MDGRAPE-3 chip W at 300 MHz Hitachi HDL4N 130 nm Vcore=+1.2V 15.7 mm X 15.7 mm 6.1 M random gates + 9 Mbit memory 1444 pin FCBGA

MDGRAPE-3 Board 12 Chips/Board 2 boards/2U subrack = 5 Tflops Connected to PCI-X bus via LVDS 10Gbit/s interface

MDGRAPE-3 system 4,778 dedicated LSI “MDGRAPE-3 chip”  300MHz(216Gflops) 3,890  250MHz(180Gflops) 888 Nominal Peak Performance: 1 Petaflops Total 400 boards with 12(some 11) MDGRAPE-3 chips Host : Intel Xeon Cluster, 370 cores  Dual-core Xeon 5150(Woodcrest 2.66GHz) 2way server x 85 Nodes provided by Intel Corporation  Xeon 3.2DGHz 2way server x 15 Nodes  System Integration: Japan SGI Power Consumption : 200kW Size : 22 standard 19inch racks Cost : 8.6 M$ (including Labor)

MDGRAPE-3 system

Gordon Bell Prize 2006 : 185 TFLOPS simulations of Amyloid-Forming Peptides from Yeast Sup35 Proteins

Sustained Performance of Parallel System Gordon Bell 2006 Honorable Mention, Peak Performance Amyloid forming process of Yeast Sup 35 peptides Systems with 17 million atoms Cutoff simulations (Rcut = 45 Å) Nominal peak : 860 Tflops Running speed : 370 Tflops Sustained performance: 185 Tflops Efficiency ~ 45 %

Gordon Bell Prize 2007 Finalist 281 TFLOPS for X-ray protein structure analysis GA-DS (Genetic Algorithm- Direct Space) method Discrete Fourier Transformation by MDGRAPE- 3 Peak Performance: 303 TFLOPS Efficiency ~ 93%

What can be done by large-scale molecular dynamics simulations? Predictions of  Protein-Protein interactions  Protein-Ligand interactions (screening)  Protein-DNA/RNA interactions  Effect of SNPs  Effect of chemical modifications Molecular mechanism of proteins in atomic level Protein Folding – dynamics of folding pathways We provide methods to bridge knowledge on molecular structures and functions

Ligand screening using MD High-precision estimation of binding free energy of protein- ligand complex with MD 1000 poses, 200 ligands/day Suitable for Lead Optimization  Evaluate affinities before chemical synthesis Currently we mainly focus on MMPBSA which is suitable for large-scale applications N. Okimoto et al., in preparation.

Combining Molecular Docking and Molecular Dynamics Simulations Combination of molecular docking and high-speed MD simulations improve these drawbacks. Molecular docking techniques  Advantages: they explore the vast conformational space of ligands in a short time.  Drawbacks: predictions of binding free energies are less reliable. Molecular Dynamics (MD) simulation techniques  Advantages: they can treat both ligand and protein in a flexible way, consider the solvation effect, and predict accurate binding free energies.  Drawbacks: they are time-consuming and the system are often trapped in local minima. MDGRAPE-3 High-performance special purpose computer, MDGRAPE-3, enable high-speed and high-accuracy MD simulations.

Our Screening Approach Library of Chemical Compounds (6,000,000) Enriched Library Target Protein X-ray, NMR, Modeling Successfully Docked Compounds Lead Compound Lead Optimization Drug Candidates Experimental testing Drug Molecular Docking MDGRAPE-3 MD Simulations Using MDGRAPE-3 Drug-like compounds De Novo Design Pre-filtering Lipinski’s rule Similarity analysis Pharmacophore analysis Test of Our Approach Lead Discovery

Speed of Screening Multiconformer MD plus MM-PBSA+PCA calculations 1200 poses/128 CPUs/a day ~ 300 ligands/128 CPUs/ a day Singleconformer MD plus MM-PBSA+PCA calculations 1200 ligands/128 CPUs/a day A MD simulation 2.5 hours MM-PBSA 1.25 hoursPCA 1.25 hours Core 1 Core 2 1CPU 2.5 hours/1pose/1 CPUs Screening of 10,000 compounds Screening of 10,000 compounds by molecular docking : 1 days Screening of top 1,000 compounds by MD (5,000 poses): 4-5 days

Screening Performance: Trypsin - Enrichment Curves - (goldscore/ligprep) % of Top 1,000 compounds % of active compounds 3 times5 times

Other Applications Protein/Peptide Designs  Transcription Factors  Growth Factors  Fusion Proteins  Enzymes Large-Scale Calculations  Virus Capsid Proteins  Molecular Motors  Ribosome  Membranes

Next Generation: MDGRAPE-4 Tile Processor with special-purpose tile 200 General-purpose Processor Tiles 0.5TFlops (DP), 1TFlops (SP) 20 Special-Purpose Tiles for Molecular Dynamics 0.7TFlops Fast on-chip network with 200~300GB/s router bandwidth 5 Memory Access Tiles  160~320GB/s 45nm <<100W peak power with Duraid Madeina, Univ. Tokyo

Block Diagram of MDGRAPE-4 Processor

Acknowledgements Dr. Yousuke Ohno Dr. Atsushi Suenaga Dr. Noriyuki Futatsugi Dr. Gentaro Morimoto Ms. Hiroko Kondo Ms. Ryoko Yanai Dr. Tetsu Narumi (Keio Univ.) Mr. Duraid Madeina (U. Tokyo) Ministry of Education, Culture, Sports, Science & Technology Intel Corporation for early processor support Japan SGI for system integration