Published by Hilary Easter Cook. Modified over 9 years ago.
1
Co-processors for speeding up drug design algorithms
Advait Jain, Priyanka Jindal, Pulkit Gambhir
Under the guidance of: Prof. M Balakrishnan, Prof. Kolin Paul
2
Objective
To design FPGA-based hardware accelerators for speeding up the energy minimization process.
3
Approach to the problem
Familiarization with the code
Software profiling
  Identifying bottleneck procedures/loops
  Compiler-level optimizations
H/W - S/W partitioning
  Where to partition
  APIs to export
Hardware design
Performance analysis
4
Overall Control Flow
5
Bottleneck Functions
7
Split-up of code
                     Eval_Energy_for_step (%)   Diff_Energy (%)
Non-bonded pairs     68.61                      29.10
Dihedrals            00.54                      00.56
Angles               00.17                      00.12
Bonded               00.00
8
Bottleneck Functions
Functions: Eval energy, Eval Energy for step, Diff energy
Iterate over list of bonds {O(N) elements}
Iterate over list of angles {O(N) elements}
Iterate over list of dihedrals {O(N) elements}
Iterate over list of non-bonded pairs {O(N^2) elements}
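The non-bonded loop dominates because it is the only one that runs over O(N^2) elements. A minimal sketch of such a loop follows. The slides name the per-pair coefficients A, B and C but do not give the energy expression, so the 12-6 Lennard-Jones plus Coulomb form used here is an assumption, and `nonbonded_energy` is a hypothetical name.

```python
import math

def nonbonded_energy(coords, pairs):
    """Sum pair energies over a non-bonded list.

    coords: list of (x, y, z) tuples, one per atom.
    pairs:  list of (a1, a2, A, B, C) nodes, mirroring the node fields
            named on the slides.
    ASSUMPTION: E = A/r^12 - B/r^6 + C/r (not stated in the slides).
    """
    total = 0.0
    for a1, a2, A, B, C in pairs:
        # Squared distance between the two atoms of the pair.
        r2 = sum((p - q) ** 2 for p, q in zip(coords[a1], coords[a2]))
        r6 = r2 ** 3
        total += A / (r6 * r6) - B / r6 + C / math.sqrt(r2)
    return total
```

Every node costs one distance computation and a handful of divisions, so total work scales with the length of the pair list.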
9
Molecule Size v/s Time (log plot)
Average slope = 2.03
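A slope of about 2 on a log-log plot means run time grows roughly as N^2, which matches the O(N^2) non-bonded list. The sketch below shows one way such a slope could be estimated, using a least-squares fit. The slides do not say how the "average slope" was computed, `loglog_slope` is a hypothetical name, and the timings are synthetic values generated from an exact quadratic, not measurements.

```python
import math

def loglog_slope(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic timings generated from t = k * N^2 (NOT measured data).
sizes = [100, 200, 400, 800]
times = [3e-6 * n ** 2 for n in sizes]
slope = loglog_slope(sizes, times)
```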
10
Energy v/s CG Steps (plot)
Annotation on plot: "We are here"
11
Non-bonded List
Node structure
  Float A, B, C (4*3 bytes)
  Int a1, a2
C is a function of the charges q1 and q2 of the atoms.
  471,282 distinct Cs (an index fits in 3 bytes)
A, B are a function of the radius and epsilon of the atoms.
  192 distinct pairs of A,B (an index fits in 1 byte)
12
New Data Structure
Vector of distinct Cs
Vector of distinct (A,B) pairs
New node structure
3D coordinates of atoms
Int a1, a2
Unsigned common_index (3 bytes | 1 byte)
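The "3" and "1" next to common_index appear to describe a 4-byte field split into a 3-byte index into the C vector and a 1-byte index into the (A,B) vector. A minimal sketch of such packing follows. Which part occupies the high bits is an assumption, and `pack_index` / `unpack_index` are hypothetical names.

```python
C_BITS = 24   # 3 bytes for the index into the vector of distinct Cs
AB_BITS = 8   # 1 byte for the index into the vector of distinct (A,B) pairs

def pack_index(c_idx, ab_idx):
    """Pack both indices into one 32-bit unsigned value.

    ASSUMPTION: the C index sits in the upper 3 bytes.
    """
    if not 0 <= c_idx < (1 << C_BITS):
        raise ValueError("C index does not fit in 3 bytes")
    if not 0 <= ab_idx < (1 << AB_BITS):
        raise ValueError("(A,B) index does not fit in 1 byte")
    return (c_idx << AB_BITS) | ab_idx

def unpack_index(common_index):
    """Return (c_idx, ab_idx) from a packed 32-bit value."""
    return common_index >> AB_BITS, common_index & ((1 << AB_BITS) - 1)
```

The counts quoted for the original list (471,282 distinct Cs and 192 distinct (A,B) pairs) both fit within these widths.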
13
Result of new data structure
Molecule size: 2008
VanderList: 2,008,417
AB_Vander list: 136
C_Vanderlist: 21,651
Old data structure:        2,008,417 * 20 ~ 40 MB
New data structure:        2,008,417 * 12 + 136 * 8 + 21,651 * 4 ~ 24 MB
Projected data structure:  2,008,417 * 8 + 136 * 8 + 21,651 * 4 ~ 16 MB
Improvement in cache performance
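The three footprints can be reproduced directly from the counts on this slide. The helper below is a hypothetical sketch. It takes the per-node sizes (20, 12 and 8 bytes) from the slide, with 8 bytes per (A,B) entry and 4 bytes per C entry.

```python
def footprint_bytes(n_nodes, node_bytes, n_ab=0, n_c=0):
    """Bytes for the node list plus the shared (A,B) and C vectors."""
    return n_nodes * node_bytes + n_ab * 8 + n_c * 4

N_NODES, N_AB, N_C = 2_008_417, 136, 21_651

old = footprint_bytes(N_NODES, 20)                    # coefficients stored in every node
new = footprint_bytes(N_NODES, 12, N_AB, N_C)         # nodes hold a packed index instead
projected = footprint_bytes(N_NODES, 8, N_AB, N_C)    # projected 8-byte node

for name, size in (("old", old), ("new", new), ("projected", projected)):
    print(f"{name}: {size:,} bytes (~{size / 1e6:.0f} MB)")
```

The shared vectors add under 90 KB in total, so almost all of the saving comes from the smaller node.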
14
Sorting to improve performance
Consecutive nodes of the van-der list can point anywhere in the C and (A,B) vectors
Scope for further improving cache performance
Radix sort on the van-der list
  First, bucket sort on the C-index
  Second, stable bucket sort on the (A,B)-index
Sequential access of the (A,B) vector
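The two passes described above form a least-significant-key-first radix sort: because the second pass is stable, nodes end up ordered by (A,B)-index, and by C-index within each (A,B) group. A minimal sketch follows. The node layout `(a1, a2, c_idx, ab_idx)` and the function names are hypothetical.

```python
def bucket_sort_by(nodes, key, n_buckets):
    """Stable bucket sort: nodes with equal keys keep their relative order."""
    buckets = [[] for _ in range(n_buckets)]
    for node in nodes:
        buckets[key(node)].append(node)
    return [node for bucket in buckets for node in bucket]

def sort_vander_list(nodes, n_c, n_ab):
    """Sort nodes (a1, a2, c_idx, ab_idx) by ab_idx, then by c_idx."""
    by_c = bucket_sort_by(nodes, lambda n: n[2], n_c)     # first pass: C-index
    return bucket_sort_by(by_c, lambda n: n[3], n_ab)     # second pass: (A,B)-index
```

For example, `sort_vander_list([(0, 1, 2, 1), (0, 2, 0, 1), (1, 2, 1, 0), (0, 3, 2, 0)], 3, 2)` places both nodes with (A,B)-index 0 first, in increasing C-index order.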
15
Cache Profiling (unsorted vs sorted)
            L1D refs          L1D misses            L2 refs
Unsorted    1,773,145,080     44,016,787            44,754,341
  Rd        1,451,802,230     43,429,781 (3%)       44,167,335
  Wr        321,342,785       587,006 (.1826%)      587,006
Sorted      1,842,686,500     29,287,877            30,152,893
  Rd        1,495,124,238     28,470,590 (1.9%)     29,335,606
  Wr        347,562,262       817,287 (.235%)       817,287
Test case: molecule of size 413 atoms with 25 SD and 100 CG steps
16
Converting to single-precision floating point
All the code is written in double precision
Double precision is difficult to replicate in hardware
Need to test the feasibility of converting to single precision
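A quick way to get a feel for what single precision can lose is to round values to 32 bits and compare. The sketch below illustrates general float32 behaviour only and is not a diagnosis of the minimization code. The energy values are hypothetical and `to_float32` is a helper introduced here.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hypothetical energies that differ in the third decimal place.
e_before = 100000.002
e_after = 100000.001

diff_double = e_before - e_after                          # about 0.001
diff_float = to_float32(e_before) - to_float32(e_after)   # 0.0
```

Near 100000 adjacent single-precision values are about 0.0078 apart, so both energies round to the same number and their difference vanishes.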
17
Single Precision
Call diagram: minEnergyCG(), diffEnergy(), evalEnergy_for_step(), moveStep()
Annotations on diagram: "Precision lost here", "Instability introduced here", "Resulting in NaN"
18
Single Precision
Removed the instability
Parabolic interpolation is replaced by lnSearch() whenever the points are collinear.
Time taken to evaluate the energy increased.
Increase in the number of calls to evalEnergy_for_step().
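The fix above can be sketched as a guard around the usual three-point parabolic formula: when the three points are collinear the denominator is zero, so the step is handed to a line search instead. This is a sketch under assumptions, not the project's code. The relative tolerance, the function names and the `line_search` callback are hypothetical.

```python
def parabolic_minimum(a, fa, b, fb, c, fc, rel_eps=1e-6):
    """Abscissa of the vertex of the parabola through (a,fa), (b,fb), (c,fc).

    Returns None when the points are (nearly) collinear, in which case the
    denominator below is (nearly) zero. Does not check that the vertex is a
    minimum rather than a maximum.
    """
    t1 = (b - a) * (fb - fc)
    t2 = (b - c) * (fb - fa)
    den = t1 - t2
    if abs(den) <= rel_eps * (abs(t1) + abs(t2)):
        return None
    num = (b - a) * t1 - (b - c) * t2
    return b - 0.5 * num / den

def choose_step(a, fa, b, fb, c, fc, line_search):
    """Use parabolic interpolation, falling back to line_search() if collinear."""
    x = parabolic_minimum(a, fa, b, fb, c, fc)
    return line_search() if x is None else x
```

For f(x) = (x - 2)^2 sampled at 0, 1 and 3, `parabolic_minimum(0, 4, 1, 1, 3, 1)` returns 2.0. For the collinear points (0, 0), (1, 1), (2, 2) it returns None and `choose_step` calls the fallback.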
19
Slow Float Vs Double: Time Plot
20
Control Flow
21
Single Precision (Molecule Size: 2008, SD: 100, CG: 150)
                                       Double    Slow Float
# of calls to EvalEnergyforStep()      642       893
  From minEnergyCG()                   450       450
  From lnSearch()                      192       443
# of calls to lnSearch()               100       177
evalEnergyForStep() per lnSearch()     1.92      2.5
22
Reducing the number of calls
minEnergyCG(): parabolic interpolation depends on which 3 points are chosen.
lnSearch(): iteratively calculates the step size. When to stop iterating is determined by 2 tolerances.
What we did:
  Chose points for parabolic interpolation that are further apart
  Increased the tolerances until the time to minimize the energy was the same as for double
  Then profiled to check the actual energy
23
Fast Float Vs Double: Time Plot
24
Fast Float Vs Double: Energy Plot
25
Our conclusions from this exercise
Located the source of the instability.
However, converting to float increased the time required for the code to run.
Increasing the tolerances made the code fast again.
The energy computed in float did not agree well with the double computation.
26
Feedback from SCF-Bio team
They are interested primarily in "relaxing" the molecule.
The actual energy is not of any consequence.
To check the float code, the metric should be the error between the molecular structures (float vs double).
27
New Checking Methodology
Start structure -> double relaxed structure
Start structure -> float relaxed structure
RMS distance between the double and float relaxed structures
Acceptance: < 0.5
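The check compares the two relaxed structures atom by atom. A minimal sketch of the RMS distance follows, assuming both structures list the same atoms in the same order and that no superposition or alignment is applied first; the slides do not say whether one is. The function names are hypothetical, and the 0.5 threshold is taken from the slide, which gives no units.

```python
import math

def rms_distance(struct_a, struct_b):
    """Root-mean-square distance between corresponding atoms.

    struct_a, struct_b: equal-length lists of (x, y, z) tuples.
    ASSUMPTION: same atom ordering, no alignment step.
    """
    if len(struct_a) != len(struct_b) or not struct_a:
        raise ValueError("structures must be non-empty and of equal length")
    sq_sum = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(struct_a, struct_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(sq_sum / len(struct_a))

def is_acceptable(double_relaxed, float_relaxed, threshold=0.5):
    """Accept the float result if its RMS distance from the double result is below threshold."""
    return rms_distance(double_relaxed, float_relaxed) < threshold
```

Two structures that differ by 1.0 in one coordinate of one of two atoms have an RMS distance of about 0.707 and would be rejected.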
28
RMS Distance vs CG Steps (plot)
Annotation on plot: "We are here"
29
Comparison with new metric
30
Tasks completed this semester
Software profiling
  No. of calls
  Cache misses
  Effect of parameters
Control flow analysis
  Flow diagram
  Data parallelism
Floating point precision requirement
Exploring H/W options
  Platform selection
  S/W - H/W partitioning
31
Ongoing work + next semester
Setting up building blocks
  ZBT RAM access
  PCI interface
  Floating point unit
Combining blocks for a simple implementation
Refining the implementation
  Multiple compute engines
  Multiple PCI cards