
1 An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines
By: David Chui
Supervisor: Professor P. Chow

2 Overview
- Introduction and Motivation
- Background and Previous Work
- Hardware Compute Engines
- Results and Performance
- Conclusions and Future Work

3 1. Introduction and Motivation

4 What is Molecular Dynamics (MD) simulation?
- Biomolecular simulations: structure and behavior of biological systems
- Uses classical mechanics to model a molecular system
- Newtonian equations of motion (F = ma)
- Compute forces and integrate acceleration through time to move atoms (sketched below)
- A large-scale MD system takes years to simulate
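A minimal sketch of that time-stepping loop in C (illustrative only: the array layout, the placeholder compute_forces routine, and the choice of velocity-Verlet are assumptions, not details from the thesis):

```c
#include <stddef.h>

#define N  32000     /* atoms in the example protein system (slide 5) */
#define DT 1e-15     /* time-step of 1e-15 s, as on slide 5           */

/* Placeholder: fills f[i][d] with the net force on atom i (d = x,y,z). */
extern void compute_forces(double (*pos)[3], double (*f)[3]);

/* One velocity-Verlet step: integrate a = F/m through time to move atoms. */
void md_step(double (*pos)[3], double (*vel)[3], double (*f)[3],
             const double *mass)
{
    for (size_t i = 0; i < N; i++)
        for (int d = 0; d < 3; d++) {
            vel[i][d] += 0.5 * DT * f[i][d] / mass[i];  /* half kick  */
            pos[i][d] += DT * vel[i][d];                /* drift      */
        }
    compute_forces(pos, f);                             /* new forces */
    for (size_t i = 0; i < N; i++)
        for (int d = 0; d < 3; d++)
            vel[i][d] += 0.5 * DT * f[i][d] / mass[i];  /* half kick  */
}
```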

5 Why is this an interesting computational problem?

Physical time for simulation:                        1e-4 sec
Time-step size:                                      1e-15 sec
Number of time-steps:                                1e11
Number of atoms in a protein system:                 32,000
Number of interactions:                              1e9
Number of instructions per force calculation:        1e3
Total number of machine instructions:                1e23
Estimated simulation time on a petaflop/sec machine: 3 years

(Check: 1e-4 s at 1e-15 s per step = 1e11 steps; 1e11 steps x 1e9 interactions x 1e3 instructions = 1e23 instructions; at 1e15 instructions/sec that is about 1e8 sec, roughly 3 years.)

6 Motivation
- Special-purpose computers for MD simulation have become an interesting application area
- FPGA technology:
  - Reconfigurable
  - Low cost for system prototyping
  - Short turnaround time and development cycle
  - Latest technology
  - Design portability

7 Objectives
- Implement the compute engines on an FPGA
- Calculate the non-bonded interactions in an MD simulation (Lennard-Jones and Ewald Direct Space)
- Explore the hardware resources
- Study the trade-off between hardware resources and computational precision
- Analyze the hardware pipeline performance
- Serve as components of a larger project in the future

8 2. Background and Previous Work

9 Lennard-Jones Potential
- Attraction due to instantaneous dipoles of molecules
- Pair-wise non-bonded interactions: O(N²)
- Short-range force
- Use a cut-off radius to reduce computations
- Reduces complexity to close to O(N) (see the sketch below)
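For reference, a software version of the pair force the compute engine evaluates, with the cut-off test applied to |r|² so no square root is needed (a minimal sketch; the argon-like epsilon/sigma values and the cut-off radius are illustrative, not the thesis's parameters):

```c
#define EPS   0.238    /* well depth, kcal/mol (argon-like, illustrative) */
#define SIGMA 3.405    /* zero-crossing distance, angstrom (argon-like)   */
#define RCUT  10.0     /* cut-off radius, angstrom (illustrative)         */

/* Adds atom j's contribution to the force on atom i; (dx,dy,dz) = r_i - r_j.
 * Pairs beyond the cut-off contribute nothing, which is what lets a
 * neighbour-list implementation approach O(N). */
void lj_pair_force(double dx, double dy, double dz, double f[3])
{
    double r2 = dx*dx + dy*dy + dz*dz;      /* only |r|^2 is ever needed */
    if (r2 >= RCUT * RCUT)
        return;                             /* outside cut-off: skip     */
    double s2  = SIGMA * SIGMA / r2;        /* (sigma/r)^2               */
    double s6  = s2 * s2 * s2;              /* (sigma/r)^6               */
    double s12 = s6 * s6;                   /* (sigma/r)^12              */
    /* F = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * r_vec           */
    double fr = 24.0 * EPS * (2.0 * s12 - s6) / r2;
    f[0] += fr * dx;  f[1] += fr * dy;  f[2] += fr * dz;
}
```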

10 Lennard-Jones Potential of Argon gas

11 Electrostatic Potential
- Attraction and repulsion due to electrostatic charge of particles (long-range force)
- Reformulated using Ewald Summation
- Decomposes into Direct Space and Reciprocal Space
- Direct Space computation is similar to Lennard-Jones
- Direct Space complexity is close to O(N)

12 Ewald Summation - Direct Space
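The slide shows the Direct Space term. In its standard textbook form (not transcribed from the slide image), with charges q_i, pair separation r_ij, and Ewald splitting parameter α, the Direct Space energy is:

```latex
E_{\text{dir}} \;=\; \frac{1}{2} \sum_{i \neq j} q_i q_j \,
    \frac{\operatorname{erfc}(\alpha\, r_{ij})}{r_{ij}}
```

The complementary error function erfc decays rapidly, which is why the Direct Space sum can be cut off like Lennard-Jones.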

13 Previous Hardware Developments

Project     Technology  Year
MD-GRAPE    0.6 um      1996
MD-Engine   0.8 um      1997
BlueGene/L  0.13 um     2003
MD-GRAPE3   0.13 um     2004

14 Recent Work - FPGA-based MD Simulators

Transmogrifier-3 FPGA system (University of Toronto, 2003)
- Estimated speedup of over 20 times over software, given better hardware resources
- Fixed-point arithmetic, function table lookup, and interpolation

Xilinx Virtex-II Pro XC2VP70 FPGA (Boston University, 2005)
- Achieved a speedup of over 88 times over software
- Fixed-point arithmetic, function table lookup, and interpolation

15 MD Simulation Software - NAMD
- Parallel runtime system (Charm++/Converse)
- Highly scalable: largest system simulated has over 300,000 atoms on 1,000 processors
- Spatial decomposition
- Double-precision floating point

16 NAMD - Spatial Decomposition

17 3. Hardware Compute Engines

18 Purpose and Design Approach
- Implement the functionality of the software compute object
- Calculate the non-bonded interactions given the particle information
- Fixed-point arithmetic, function table lookup, and interpolation
- Pipelined architecture

19 Compute Engine Block Diagram

20 Function Lookup Table
- The function to be looked up is a function of |r|², the squared separation distance between a pair of atoms
- Block floating-point lookup
- Partition the function into blocks based on precision requirements (see the sketch below)
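A software sketch of that block floating-point lookup with linear interpolation, keyed on the fixed-point |r|² (the 32 x 32 split of a 1K table, the bit widths, and the field layout are illustrative assumptions, not the thesis's actual parameters):

```c
#include <stdint.h>

#define BLOCKS   32   /* one block per leading-bit position of r^2     */
#define ENTRIES  32   /* entries per block: 32 x 32 = a 1K-entry table */
#define FRACBITS 16   /* fraction width used for interpolation         */

static int32_t tbl_val[BLOCKS][ENTRIES];    /* f at each sample point   */
static int32_t tbl_slope[BLOCKS][ENTRIES];  /* slope toward next sample */

static int msb32(uint32_t x)                /* index of highest set bit */
{
    int n = -1;
    while (x) { x >>= 1; n++; }
    return n;
}

/* Evaluate f(r^2) by block floating-point lookup plus linear
 * interpolation: each power-of-two range of r^2 gets its own table,
 * so small separations keep as many significant bits as large ones. */
int32_t lookup(uint32_t r2)
{
    if (r2 == 0)
        return tbl_val[0][0];
    int blk = msb32(r2);                           /* block = "exponent"   */
    uint32_t mant = r2 << (31 - blk);              /* normalize: MSB at 31 */
    uint32_t idx  = (mant >> 26) & (ENTRIES - 1);  /* next 5 bits -> index */
    int32_t frac  = (mant >> (26 - FRACBITS)) & ((1u << FRACBITS) - 1);
    return tbl_val[blk][idx]
         + (int32_t)(((int64_t)tbl_slope[blk][idx] * frac) >> FRACBITS);
}
```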

21 Function Lookup Table

22 Hardware Testing Configuration

23 4. Results and Performance

24 Simulation Overview
- Software model
- Different coordinate precisions and lookup table sizes
- Obtain the error relative to computation in double precision

25 Total Energy Fluctuation

26 Average Total Energy

27 Operating Frequency

                    Compute Engine  Arithmetic Core
Lennard-Jones       43.6 MHz        80.0 MHz
Ewald Direct Space  47.5 MHz        82.2 MHz

28 Latency and Throughput

                    Latency    Throughput
Lennard-Jones       59 clocks  33.33%
Ewald Direct Space  44 clocks  100%
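A back-of-the-envelope reading of these numbers (my arithmetic, not a figure from the thesis), taking sustained pair rate as clock frequency times throughput:

```latex
\text{Ewald: } 47.5\,\text{MHz} \times 1.00 \approx 4.8 \times 10^{7}\ \text{pairs/s},
\qquad
\text{LJ: } 43.6\,\text{MHz} \times 0.333 \approx 1.5 \times 10^{7}\ \text{pairs/s}
```

The 33.33% Lennard-Jones throughput (one result every three clocks) presumably reflects the multiplier sharing noted on the next slide.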

29 Hardware Improvement
Operating frequency:
- Place-and-route constraints
- More pipeline stages
Throughput:
- More hardware resources
- Avoid sharing of multipliers

30 Compared with Previous Work
- Pipelined adders and multipliers
- Block floating-point memory lookup
- Support for different types of atoms

Lennard-Jones System  Latency (clocks)  Operating Frequency (MHz)
Transmogrifier-3      11                26.0
Xilinx Virtex-II      59                80.0

31 5. Conclusions and Future Work

32 Hardware Precision
- A combination of fixed-point arithmetic, function table lookup, and interpolation can achieve high precision
- Similar results to double precision in RMS energy fluctuation and average energy
- Coordinate precision of {7.41}
- Table lookup size of 1K
- Block floating-point memory lookup:
  - Data precision maximized
  - Supports different types of functions

33 Hardware Performance
- Compute engine operating frequencies:
  - Ewald Direct Space: 82.2 MHz
  - Lennard-Jones: 80.0 MHz
- Achieving 100 MHz is feasible with newer FPGAs

34 Future Work
- Study different types of MD systems
- Simulate computation error with different table lookup sizes and interpolation orders
- Hardware usage: store data in block RAMs instead of external ZBT memory

