An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines
By: David Chui
Supervisor: Professor P. Chow
Overview
1. Introduction and Motivation
2. Background and Previous Work
3. Hardware Compute Engines
4. Results and Performance
5. Conclusions and Future Work
1. Introduction and Motivation
What is Molecular Dynamics (MD) simulation?
- Biomolecular simulations: the structure and behavior of biological systems
- Uses classical mechanics to model a molecular system: Newtonian equations of motion (F = ma)
- Compute forces and integrate acceleration through time to move the atoms
- A large-scale MD system takes years to simulate
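To make the force-then-integrate loop concrete, here is a minimal sketch of one velocity Verlet time step. The Vec3/Atom types and the computeForces callback are hypothetical stand-ins for the simulator's own data structures and for the non-bonded kernels discussed later; this is an illustration, not the thesis code.

```cpp
#include <vector>

struct Vec3 { double x, y, z; };

struct Atom {
    Vec3 pos, vel, force;
    double mass;
};

// One velocity Verlet time step: advance velocities half a step with the
// current forces, advance positions a full step, recompute forces at the
// new positions (F = ma), then finish the velocity update.
void velocityVerletStep(std::vector<Atom>& atoms, double dt,
                        void (*computeForces)(std::vector<Atom>&)) {
    for (Atom& a : atoms) {
        a.vel.x += 0.5 * dt * a.force.x / a.mass;
        a.vel.y += 0.5 * dt * a.force.y / a.mass;
        a.vel.z += 0.5 * dt * a.force.z / a.mass;
        a.pos.x += dt * a.vel.x;
        a.pos.y += dt * a.vel.y;
        a.pos.z += dt * a.vel.z;
    }
    computeForces(atoms);   // e.g. Lennard-Jones and Ewald direct space kernels
    for (Atom& a : atoms) {
        a.vel.x += 0.5 * dt * a.force.x / a.mass;
        a.vel.y += 0.5 * dt * a.force.y / a.mass;
        a.vel.z += 0.5 * dt * a.force.z / a.mass;
    }
}
```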
Why is this an interesting computational problem?
- Physical time for simulation: 1e-4 sec
- Time-step size: 1e-15 sec
- Number of time-steps: 1e11
- Number of atoms in a protein system: 32,000
- Number of interactions: 1e9
- Number of instructions per force calculation: 1e3
- Total number of machine instructions: 1e23
- Estimated simulation time on a petaflop/sec capacity machine: 3 years
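As a rough check, the totals above follow directly from the first few figures:

\[
\begin{aligned}
\text{time-steps} &= \frac{10^{-4}\,\mathrm{s}}{10^{-15}\,\mathrm{s/step}} = 10^{11},\\
\text{instructions} &\approx 10^{11}\ \text{steps} \times 10^{9}\,\tfrac{\text{interactions}}{\text{step}} \times 10^{3}\,\tfrac{\text{instructions}}{\text{interaction}} = 10^{23},\\
\text{time at }10^{15}\,\tfrac{\text{instructions}}{\text{s}} &= 10^{8}\,\mathrm{s} \approx 3\ \text{years}.
\end{aligned}
\]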
Motivation
- Special-purpose computers for MD simulation have become an interesting application
- FPGA technology:
  - Reconfigurable
  - Low cost for system prototyping
  - Short turnaround time and development cycle
  - Latest technology
  - Design portability
Objectives
- Implement the compute engines on an FPGA
- Calculate the non-bonded interactions in an MD simulation (Lennard-Jones and Ewald Direct Space)
- Explore the hardware resource requirements
- Study the trade-off between hardware resources and computational precision
- Analyze the hardware pipeline performance
- Serve as components of a larger project in the future
2. Background and Previous Work
Lennard-Jones Potential
- Attraction due to instantaneous dipoles of molecules
- Pair-wise non-bonded interactions: O(N²)
- Short-range force: use a cut-off radius to reduce the computation
- Reduced complexity close to O(N)
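For reference, the standard 12-6 form of the Lennard-Jones pair potential, truncated at the cut-off radius r_c:

\[
V_{\mathrm{LJ}}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right],
\qquad V_{\mathrm{LJ}}(r) = 0 \ \ \text{for } r > r_c .
\]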
Lennard-Jones Potential of Argon gas
Electrostatic Potential
- Attraction and repulsion due to the electrostatic charges of particles (long-range force)
- Reformulated using the Ewald Summation: decomposed into Direct Space and Reciprocal Space
- Direct Space computation is similar to Lennard-Jones
- Direct Space complexity close to O(N)
Ewald Summation - Direct Space
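The slide's equation is not reproduced in the extracted text; in its standard form (omitting unit prefactors and periodic images beyond the cut-off), the direct-space term of the Ewald sum is

\[
E_{\text{dir}} \;=\; \frac{1}{2}\sum_{i \neq j} q_i\, q_j\, \frac{\operatorname{erfc}(\beta\, r_{ij})}{r_{ij}},
\]

where β is the Ewald splitting parameter. Because erfc decays rapidly, this term is short-ranged and can be truncated at a cut-off radius just like the Lennard-Jones interaction, which is what brings its complexity close to O(N).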
Previous Hardware Developments (Project / Technology / Year)
- MD-GRAPE: 0.6 µm, 1996
- MD-Engine: 0.8 µm, 1997
- BlueGene/L: 0.13 µm, 2003
- MD-GRAPE3: 0.13 µm, 2004
Recent Work: FPGA-based MD Simulators
- Transmogrifier-3 FPGA system, University of Toronto (2003): estimated speedup of over 20x over software, given better hardware resources; fixed-point arithmetic, function table lookup, and interpolation
- Xilinx Virtex-II Pro XC2VP70 FPGA, Boston University (2005): achieved a speedup of over 88x over software; fixed-point arithmetic, function table lookup, and interpolation
MD Simulation Software: NAMD
- Parallel runtime system (Charm++/Converse)
- Highly scalable: largest system simulated has over 300,000 atoms on 1000 processors
- Spatial decomposition
- Double-precision floating point
NAMD - Spatial Decomposition
3. Hardware Compute Engines
Purpose and Design Approach
- Implement the functionality of the software compute object
- Calculate the non-bonded interactions given the particle information
- Fixed-point arithmetic, function table lookup, and interpolation
- Pipelined architecture
Compute Engine Block Diagram
Function Lookup Table
- The value to be looked up is a function of |r|², where |r| is the separation distance between a pair of atoms
- Block floating point lookup
- The function is partitioned into regions of different precision
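A software sketch of the lookup scheme described above: the exponent of |r|² selects a block, so each power-of-two range of separations gets its own table segment (the block floating point idea), and a first-order interpolation refines the value within an entry. The segment size, exponent range, and tabulated function are illustrative assumptions, not the thesis parameters.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative block floating point function table indexed by r2 = |r|^2.
struct BlockLookupTable {
    static constexpr int kEntriesPerBlock = 64;   // assumed segment size
    int minExp = 0, maxExp = 0;
    std::vector<double> value, slope;             // f(x) and local slope per entry

    // Tabulate an arbitrary f(r2) over blocks [2^e, 2^(e+1)) for e in [minE, maxE].
    template <typename F>
    void build(F f, int minE, int maxE) {
        minExp = minE; maxExp = maxE;
        int nBlocks = maxExp - minExp + 1;
        value.assign(nBlocks * kEntriesPerBlock, 0.0);
        slope.assign(nBlocks * kEntriesPerBlock, 0.0);
        for (int b = 0; b < nBlocks; ++b) {
            double lo = std::ldexp(1.0, minExp + b);     // block covers [lo, 2*lo)
            double step = lo / kEntriesPerBlock;
            for (int i = 0; i < kEntriesPerBlock; ++i) {
                double x = lo + i * step;
                int k = b * kEntriesPerBlock + i;
                value[k] = f(x);
                slope[k] = (f(x + step) - f(x)) / step;  // first-order interpolation slope
            }
        }
    }

    // Look up f(r2): exponent of r2 picks the block, offset picks the entry,
    // and the remainder is used for linear interpolation.
    double lookup(double r2) const {
        int e = 0;
        std::frexp(r2, &e);                              // r2 lies in [2^(e-1), 2^e)
        int b = std::clamp(e - 1 - minExp, 0, maxExp - minExp);
        double lo = std::ldexp(1.0, minExp + b);
        double step = lo / kEntriesPerBlock;
        int i = std::clamp(int((r2 - lo) / step), 0, kEntriesPerBlock - 1);
        int k = b * kEntriesPerBlock + i;
        return value[k] + slope[k] * (r2 - (lo + i * step));
    }
};
```

The per-block spacing means small separations, where the tabulated functions vary fastest, automatically get finer resolution than large ones.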
Function Lookup Table
Hardware Testing Configuration
4. Results and Performance
Simulation Overview
- Software model of the hardware computation
- Varied coordinate precisions and lookup table sizes
- Obtained the error relative to computation in double precision
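A sketch of the kind of software model this implies: quantize the inputs to a chosen fixed-point precision, evaluate through the table-lookup path, and measure the error against a double-precision reference. The names and signatures here are hypothetical.

```cpp
#include <cmath>

// Model the precision loss of the hardware input path: round a coordinate
// (or |r|^2 value) to a fixed-point grid with `fracBits` fractional bits,
// then return it as a double.
double quantize(double x, int fracBits) {
    double scale = std::ldexp(1.0, fracBits);   // 2^fracBits
    return std::llround(x * scale) / scale;
}

// RMS relative error of the fixed-point/table-lookup evaluation `fx`
// against the double-precision reference `ref` over n sample separations.
template <typename FixedEval, typename RefEval>
double rmsRelativeError(FixedEval fx, RefEval ref, const double* r2, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double e = (fx(r2[i]) - ref(r2[i])) / ref(r2[i]);
        sum += e * e;
    }
    return std::sqrt(sum / n);
}
```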
Total Energy Fluctuation
Average Total Energy
Operating Frequency (Compute Engine / Arithmetic Core)
- Lennard-Jones: 43.6 MHz / 80.0 MHz
- Ewald Direct Space: 47.5 MHz / 82.2 MHz
Latency and Throughput
- Lennard-Jones: latency 59 clocks, throughput 33.33%
- Ewald Direct Space: latency 44 clocks, throughput 100%
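One reading of these figures (an interpretation, not stated explicitly in the slides): a throughput of 33.33% corresponds to accepting a new atom pair every third clock cycle because of multiplier sharing, giving sustained pair rates of roughly

\[
R_{\mathrm{LJ}} \approx 43.6\ \mathrm{MHz} \times \tfrac{1}{3} \approx 14.5\times 10^{6}\ \text{pairs/s},
\qquad
R_{\mathrm{Ewald}} \approx 47.5\ \mathrm{MHz} \times 1 = 47.5\times 10^{6}\ \text{pairs/s}.
\]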
Hardware Improvement
- Operating frequency: place-and-route constraints, more pipeline stages
- Throughput: more hardware resources, avoid sharing of multipliers
Comparison with Previous Work
- Pipelined adders and multipliers
- Block floating point memory lookup
- Support for different types of atoms
- Lennard-Jones compute engine on the Xilinx Virtex-II: latency 59 clocks, operating frequency 80.0 MHz (compared against the Transmogrifier-3 design)
5. Conclusions and Future Work
Hardware Precision
- A combination of fixed-point arithmetic, function table lookup, and interpolation can achieve high precision
- Similar results to double precision in RMS energy fluctuation and average energy
- Coordinate precision of {7.41}
- Table lookup size of 1K
- Block floating point memory: data precision maximized, supports different types of functions
Hardware Performance
- Compute engine operating frequencies: Ewald Direct Space 82.2 MHz, Lennard-Jones 80.0 MHz
- Achieving 100 MHz is feasible with newer FPGAs
Future Work
- Study different types of MD systems
- Simulate the computation error with different table lookup sizes and interpolation orders
- Hardware usage: store data in block RAMs instead of external ZBT memory