1
An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines
By: David Chui
Supervisor: Professor P. Chow
2
Overview
Introduction and Motivation
Background and Previous Work
Hardware Compute Engines
Results and Performance
Conclusions and Future Work
3
1. Introduction and Motivation
4
What is Molecular Dynamics (MD) simulation?
Biomolecular simulations
Structure and behavior of biological systems
Uses classical mechanics to model a molecular system
Newtonian equations of motion (F = ma)
Compute forces and integrate acceleration through time to move atoms
A large-scale MD system takes years to simulate
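As a point of reference (not part of the slides), a minimal velocity-Verlet step in Python shows the force-and-integrate loop described above; the names and the compute_forces callback are illustrative placeholders:

```python
import numpy as np

def velocity_verlet_step(pos, vel, forces, masses, dt, compute_forces):
    """Advance positions and velocities by one time-step dt (F = ma)."""
    acc = forces / masses[:, None]                 # a = F / m
    pos = pos + vel * dt + 0.5 * acc * dt ** 2     # move the atoms
    new_forces = compute_forces(pos)               # non-bonded terms dominate this cost
    new_acc = new_forces / masses[:, None]
    vel = vel + 0.5 * (acc + new_acc) * dt
    return pos, vel, new_forces
```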
5
Why is this an interesting computational problem?
Physical time for simulation: 1e-4 sec
Time-step size: 1e-15 sec
Number of time-steps: 1e11
Number of atoms in a protein system: 32,000
Number of interactions: 1e9
Number of instructions per force calculation: 1e3
Total number of machine instructions: 1e23
Estimated simulation time on a petaflop/sec capacity machine: 3 years
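The estimate on this slide can be reproduced with a quick back-of-the-envelope calculation (values taken directly from the table above; the script is plain arithmetic, not thesis code):

```python
time_steps = 1e-4 / 1e-15                  # physical time / time-step size = 1e11
instructions = time_steps * 1e9 * 1e3      # interactions per step * instructions per force
seconds = instructions / 1e15              # on a petaflop/sec machine
print(seconds / (3600 * 24 * 365))         # ~3.2 years
```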
6
Motivation
Special-purpose computers for MD simulation have become an interesting application
FPGA technology:
Reconfigurable
Low cost for system prototypes
Short turnaround time and development cycle
Latest technology
Design portability
7
Objectives
Implement the compute engines on an FPGA
Calculate the non-bonded interactions in an MD simulation (Lennard-Jones and Ewald Direct Space)
Explore the hardware resources
Study the trade-off between hardware resources and computational precision
Analyze the hardware pipeline performance
Serve as components of a larger project in the future
8
2. Background and Previous Work
9
Lennard-Jones Potential
Attraction due to instantaneous dipoles of molecules
Pair-wise non-bonded interactions: O(N²)
Short-range force
Use a cut-off radius to reduce computations
Reduced complexity close to O(N)
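A small Python sketch (not the thesis implementation, which is fixed-point hardware) illustrates the pair-wise evaluation and the cut-off test on |r|²; in practice a neighbour or cell list limits which pairs are even considered, which is what brings the complexity close to O(N):

```python
import numpy as np

def lj_energy(pos, epsilon, sigma, r_cut):
    """Sum the 12-6 Lennard-Jones energy over pairs within the cut-off radius."""
    n = len(pos)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r2 = np.sum((pos[i] - pos[j]) ** 2)    # squared separation |r|^2
            if r2 < r_cut ** 2:                    # cut-off test, no sqrt needed
                s6 = (sigma ** 2 / r2) ** 3        # (sigma/r)^6
                energy += 4.0 * epsilon * (s6 * s6 - s6)
    return energy
```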
10
Lennard-Jones Potential of Argon gas
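The curve on this slide is presumably the standard 12-6 Lennard-Jones potential; for reference, its textbook form (not transcribed from the slide) is:

```latex
V_{\mathrm{LJ}}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]
```

For argon, commonly quoted parameters are σ ≈ 3.4 Å and ε/k_B ≈ 120 K, which would be the natural choice for a plot like this.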
11
Electrostatic Potential
Attraction and repulsion due to the electrostatic charge of particles (long-range force)
Reformulate using the Ewald Summation
Decompose into Direct Space and Reciprocal Space
Direct Space computation is similar to Lennard-Jones
Direct Space complexity close to O(N)
12
Ewald Summation - Direct Space
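This slide presumably shows the direct-space term of the Ewald decomposition; in its standard textbook form (not transcribed from the slide) it is:

```latex
E_{\mathrm{dir}} = \frac{1}{2} \sum_{i \ne j} q_i q_j \, \frac{\operatorname{erfc}(\beta r_{ij})}{r_{ij}}
```

Because erfc(βr) decays rapidly with distance, this term can be truncated at a cut-off radius just like Lennard-Jones, which is why the two compute engines share the same structure.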
13
Previous Hardware Developments
(Project / Technology / Year)
MD-GRAPE: 0.6 um, 1996
MD-Engine: 0.8 um, 1997
BlueGene/L: 0.13 um, 2003
MD-GRAPE3: 0.13 um, 2004
14
Recent work - FPGA-based MD simulators
Transmogrifier-3 FPGA system, University of Toronto (2003):
Estimated speedup of over 20 times over software, given better hardware resources
Fixed-point arithmetic, function table lookup, and interpolation
Xilinx Virtex-II Pro XC2VP70 FPGA, Boston University (2005):
Achieved a speedup of over 88 times over software
Fixed-point arithmetic, function table lookup, and interpolation
15
MD Simulation software - NAMD
Parallel runtime system (Charm++/Converse)
Highly scalable
Largest system simulated has over 300,000 atoms on 1,000 processors
Spatial decomposition
Double-precision floating point
16
NAMD - Spatial Decomposition
17
3. Hardware Compute Engines
18
Purpose and Design Approach
Implement the functionality of the software compute object
Calculate the non-bonded interactions given the particle information
Fixed-point arithmetic, function table lookup, and interpolation
Pipelined architecture
19
Compute Engine Block Diagram
20
Function Lookup Table
The function to be looked up is a function of |r|², where |r| is the separation distance between a pair of atoms
Block floating-point lookup
Partition the function based on different precisions
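A software sketch of the lookup-plus-interpolation scheme, indexing the table by |r|² and interpolating linearly between entries; the table size, spacing, and interpolation order here are illustrative and do not reproduce the block floating-point partitioning used in the hardware:

```python
import numpy as np

def build_table(func, r2_min, r2_max, size):
    """Tabulate func on a uniform grid of |r|^2 values."""
    r2_axis = np.linspace(r2_min, r2_max, size)
    return r2_axis, func(r2_axis)

def lookup_interp(r2_axis, values, r2):
    """Linearly interpolate the tabulated function at a given |r|^2."""
    step = r2_axis[1] - r2_axis[0]
    idx = int((r2 - r2_axis[0]) / step)
    idx = max(0, min(idx, len(values) - 2))        # clamp to the valid table range
    frac = (r2 - r2_axis[idx]) / step
    return values[idx] + frac * (values[idx + 1] - values[idx])
```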
21
Function Lookup Table
22
Hardware Testing Configuration
23
4. Results and Performance
24
Simulation Overview
Software model
Different coordinate precisions and lookup table sizes
Obtain the error compared to computation using double precision
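A sketch of the kind of comparison described here: evaluate a quantity in full double precision and through a reduced-precision path, then report the relative RMS error. The quantize() helper and its bit width are illustrative stand-ins for the fixed-point coordinate formats studied in the thesis:

```python
import numpy as np

def quantize(x, frac_bits):
    """Model a fixed-point value with the given number of fractional bits."""
    scale = 2.0 ** frac_bits
    return np.round(np.asarray(x) * scale) / scale

def rms_relative_error(exact, approx):
    """Relative RMS error of the reduced-precision result vs. double precision."""
    exact = np.asarray(exact)
    approx = np.asarray(approx)
    return np.sqrt(np.mean(((approx - exact) / exact) ** 2))
```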
25
Total Energy Fluctuation
26
Average Total Energy
27
Operating Frequency
(Compute Engine / Arithmetic Core)
Lennard-Jones: 43.6 MHz / 80.0 MHz
Ewald Direct Space: 47.5 MHz / 82.2 MHz
28
Latency and Throughput
(Latency / Throughput)
Lennard-Jones: 59 clocks / 33.33%
Ewald Direct Space: 44 clocks / 100%
29
Hardware Improvement
Operating frequency:
Place-and-route constraints
More pipeline stages
Throughput:
More hardware resources
Avoid sharing of multipliers
30
Compared with previous work
Pipelined adders and multipliers
Block floating-point memory lookup
Support for different types of atoms
Lennard-Jones (System / Latency in clocks / Operating frequency in MHz):
Transmogrifier-3: 11 clocks, 26.0 MHz
Xilinx Virtex-II (this work): 59 clocks, 80.0 MHz
31
5. Conclusions and Future Work
32
Hardware Precision
A combination of fixed-point arithmetic, function table lookup, and interpolation can achieve high precision
Similar results in RMS energy fluctuation and average energy with:
Coordinate precision of {7.41}
Table lookup size of 1K
Block floating-point memory:
Data precision maximized
Different types of functions supported
33
Hardware Performance
Compute engine operating frequencies:
Ewald Direct Space: 82.2 MHz
Lennard-Jones: 80.0 MHz
Achieving 100 MHz is feasible with newer FPGAs
34
Future Work
Study different types of MD systems
Simulate computation error with different table lookup sizes and interpolation orders
Hardware usage: storing data in block RAMs instead of external ZBT memory