Molecular Dynamics Simulations on a GPU in OpenCL Alex Cappiello.

Slides:



Advertisements
Similar presentations
Instructor Notes Lecture discusses parallel implementation of a simple embarrassingly parallel nbody algorithm We aim to provide some correspondence between.
Advertisements

Explaining motion P4. Big picture How forces arise How forces arise Friction and normal reaction Friction and normal reaction Adding forces Adding forces.
N-Body Simulation Michael Mersic CS680.
Christopher McCabe, Derek Causon and Clive Mingham Centre for Mathematical Modelling & Flow Analysis Manchester Metropolitan University MANCHESTER M1 5GD.
IIAA GPMAD A beam dynamics code using Graphics Processing Units GPMAD (GPU Processed Methodical Accelerator Design) utilises Graphics Processing Units.
HOW MACHINES DO WORK? Key Concepts How do machines make work easier? What is a machine’s mechanical advantage? How can you calculate the efficiency of.
PARALLEL PROCESSING COMPARATIVE STUDY 1. CONTEXT How to finish a work in short time???? Solution To use quicker worker. Inconvenient: The speed of worker.
Analysis and Performance Results of a Molecular Modeling Application on Merrimac Erez, et al. Stanford University 2004 Presented By: Daniel Killebrew.
Parallel Computation of the Minimum Separation Distance of Bezier Curves and Surfaces Lauren Bissett, Nicholas Woodfield,
CS 732: Advance Machine Learning Usman Roshan Department of Computer Science NJIT.
Big Kernel: High Performance CPU-GPU Communication Pipelining for Big Data style Applications Sajitha Naduvil-Vadukootu CSC 8530 (Parallel Algorithms)
Molecular Dynamics Classical trajectories and exact solutions
Fluid Simulation using CUDA Thomas Wambold CS680: GPU Program Optimization August 31, 2011.
Computing Platform Benchmark By Boonyarit Changaival King Mongkut’s University of Technology Thonburi (KMUTT)
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Programming Massively Parallel Processors.
The hybird approach to programming clusters of multi-core architetures.
GPGPU overview. Graphics Processing Unit (GPU) GPU is the chip in computer video cards, PS3, Xbox, etc – Designed to realize the 3D graphics pipeline.
GPGPU platforms GP - General Purpose computation using GPU
Antigone Engine Kevin Kassing – Period
Calculating Discrete Logarithms John Hawley Nicolette Nicolosi Ryan Rivard.
Lecture 10 CSS314 Parallel Computing
Chapter 6 Work, Energy, Power Work  The work done by force is defined as the product of that force times the parallel distance over which it acts. 
Work, Energy, Power. Work  The work done by force is defined as the product of that force times the parallel distance over which it acts.  The unit.
Computationally Efficient Histopathological Image Analysis: Use of GPUs for Classification of Stromal Development Olcay Sertel 1,2, Antonio Ruiz 3, Umit.
BY: ALI AJORIAN ISFAHAN UNIVERSITY OF TECHNOLOGY 2012 GPU Architecture 1.
Extracted directly from:
Revisiting Kirchhoff Migration on GPUs Rice Oil & Gas HPC Workshop
By Arun Bhandari Course: HPC Date: 01/28/12. GPU (Graphics Processing Unit) High performance many core processors Only used to accelerate certain parts.
GPU Programming with CUDA – Optimisation Mike Griffiths
Molecular Dynamics Collection of [charged] atoms, with bonds – Newtonian mechanics – Relatively small #of atoms (100K – 10M) At each time-step – Calculate.
Porting the physical parametrizations on GPUs using directives X. Lapillonne, O. Fuhrer, Cristiano Padrin, Piero Lanucara, Alessandro Cheloni Eidgenössisches.
GPU in HPC Scott A. Friedman ATS Research Computing Technologies.
P ARALLELIZATION IN M OLECULAR D YNAMICS By Aditya Mittal For CME346A by Professor Eric Darve Stanford University.
GPU Architecture and Programming
Dense Image Over-segmentation on a GPU Alex Rodionov 4/24/2009.
An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines By: David Chui Supervisor: Professor P. Chow.
Introduction to Particle Simulations Daniel Playne.
LLNL-PRES DRAFT This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract.
Molecular Modelling - Lecture 2 Techniques for Conformational Sampling Uses CHARMM force field Written in C++
6.1 – Work. Objectives Explain the meaning of work. Explain the meaning of work. Describe how work and energy are related. Describe how work and energy.
1)Leverage raw computational power of GPU  Magnitude performance gains possible.
Experiences with Achieving Portability across Heterogeneous Architectures Lukasz G. Szafaryn +, Todd Gamblin ++, Bronis R. de Supinski ++ and Kevin Skadron.
Big data Usman Roshan CS 675. Big data Typically refers to datasets with very large number of instances (rows) as opposed to attributes (columns). Data.
8 th Grade Science. * Examples anyone? * Consider this: Before coming to school this morning you probably put your books in your backpack. You lifted.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Spring 2010 Lecture 13: Basic Parallel.
MSc in High Performance Computing Computational Chemistry Module Parallel Molecular Dynamics (i) Bill Smith CCLRC Daresbury Laboratory
Particles and their home in Geometry Shaders Paul Taylor 2010.
FORCES OF MOTION Georgia Shared Resources. STANDARDS.
Sunpyo Hong, Hyesoon Kim
Development of a GPU based PIC
Chapter 5 Matter In Motion Section 4 – Gravity: A Force of Attraction pp
CUDA Compute Unified Device Architecture. Agent Based Modeling in CUDA Implementation of basic agent based modeling on the GPU using the CUDA framework.
CUDA Simulation Benjy Kessler.  Given a brittle substance with a crack in it.  The goal is to study how the crack propagates in the substance as a function.
GPGPU introduction. Why is GPU in the picture Seeking exa-scale computing platform Minimize power per operation. – Power is directly correlated to the.
3/12/2013Computer Engg, IIT(BHU)1 CUDA-3. GPGPU ● General Purpose computation using GPU in applications other than 3D graphics – GPU accelerates critical.
L-8 Physics of Collisions Work and Energy Collisions can be very complicated –two objects crash into each other and exert strong forces over short time.
Parallel Molecular Dynamics A case study : Programming for performance Laxmikant Kale
Image Fusion In Real-time, on a PC. Goals Interactive display of volume data in 3D –Allow more than one data set –Allow fusion of different modalities.
S. Pardi Frascati, 2012 March GPGPU Evaluation – First experiences in Napoli Silvio Pardi.
GPU Acceleration of Particle-In-Cell Methods B. M. Cowan, J. R. Cary, S. W. Sides Tech-X Corporation.
A Little Gas Problem Ideal Gas Behavior.
GPU Architecture and Its Application
Free Science Videos for Kids
Free Science Videos for Kids
Implementing Simplified Molecular Dynamics Simulation in Different Parallel Paradigms Chao Mei April 27th, 2006 CS498LVK.
Computer-Generated Force Acceleration using GPUs: Next Steps
CS 179 Lecture 14.
Bell Ringer What is the formula for work?
What is work Chapter 4 section 1.
Free Science Videos for Kids
Presentation transcript:

Molecular Dynamics Simulations on a GPU in OpenCL Alex Cappiello

Background  How do a group of molecules interact with one another?  Useful for determining thermodynamic properties.  Can be advantageous over carrying out an experiment to measure.  Simulation is broken into two steps based on Newton’s Laws of Motion.  Find the forces exerted on each particle (hard).  Use the forces to update position (easy).

What’s So Interesting?  A tradeoff between time and accuracy.  Low accuracy limits scientific usefulness.  Timestep on the order of femtoseconds ( s) or smaller to be meaningful.  Small inputs are also not meaningful.  Past work mostly done using MPI & friends on clusters. Less on GPUs.  However, there’s lots of independent parallelism on the table and MPI has to worry about communication.

Approach  Perform the calculations with OpenCL kernels and render with OpenGL.  Use OpenCL-OpenGL interoperability to eliminate CPU-GPU memory transfer.  Naïve solution: for each particle, loop over all other particles and the force between them.  Can we ignore particles beyond a certain distance (F ≈ 0)?  Divergence.

A Tile Decomposition  Still doing the same amount of work.  An OpenCL local group (think CUDA block) handles each of these N/p blocks.  Use memory locality to our advantage—load the tile’s particle positions into __local memory (__shared__ in CUDA terms).  How big should a block be? Image credit: NVIDIA

Embedded video removed. See

Conclusions  Despite divergence, ignoring long distance interactions made a much bigger difference than the tiling method.  The opposite of my expectations.  Would likely be amplified by a more complex force calculation.  Tiling marginally better than naïve method.  No clear ideal size for the local group.  Should be at least 64 (double the natural SIMD width).  Larger groups (512 and 1024) generally not as good.  Although I didn’t measure the benefit of OpenCL-OpenGL interoperability, it was definitely a huge potential bottleneck avoided.