Co-processors for speeding up drug design algorithms Advait Jain Priyanka Jindal Pulkit Gambhir.

Under the guidance of: Prof. M Balakrishnan Prof. Kolin.

Objective
To design FPGA-based hardware accelerators for speeding up the energy minimization process.

Bottleneck Functions
Functions: Eval energy, Eval Energy for step, Diff energy
- Iterate over list of bonds {O(N) elements}
- Iterate over list of angles {O(N) elements}
- Iterate over list of dihedrals {O(N) elements}
- Iterate over list of non-bonded pairs {O(N^2) elements}
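To make the O(N^2) term concrete, the sketch below counts the largest possible non-bonded list for a molecule of N atoms, assuming one entry per unordered atom pair. The function name is illustrative only, and a real list is usually somewhat shorter because some pairs are excluded.

```python
def max_nonbonded_pairs(n_atoms: int) -> int:
    """Upper bound on the non-bonded list length: every unordered atom pair."""
    return n_atoms * (n_atoms - 1) // 2


# The bonded terms grow linearly with N, the non-bonded term quadratically.
for n in (100, 1000, 2008):
    print(n, max_nonbonded_pairs(n))
```

For the 2008-atom molecule quoted in the results, this bound sits slightly above the reported non-bonded list length, which is consistent with a list that skips a small number of pairs.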

Mathematical operations
- EvalEnergy_for_step()
- DiffEnergy()

Non-bonded List
Node structure:
- Float A, B, C (4*3 bytes)
- Int a1, a2
C is a function of the charges q1 and q2 of the atoms: 471,282 distinct Cs (3 bytes).
A and B are a function of the radius and epsilon of the atoms: 192 distinct (A, B) pairs (1 byte).
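The byte counts in parentheses follow from how many distinct values need an index. As a small check, the sketch below computes the smallest whole number of bytes that can index a given number of distinct entries; the helper name is hypothetical.

```python
def index_bytes(n_distinct: int) -> int:
    """Smallest whole number of bytes able to index n_distinct values."""
    if n_distinct < 1:
        raise ValueError("need at least one distinct value")
    # The largest index is n_distinct - 1; round its bit width up to bytes.
    return max(1, ((n_distinct - 1).bit_length() + 7) // 8)


print(index_bytes(471_282))  # distinct C values
print(index_bytes(192))      # distinct (A, B) pairs
```

Together the two indices fit in a single 4-byte unsigned integer.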

New Data Structure
- Vector of distinct Cs
- Vector of distinct (A, B) pairs
New node structure:
- 3D coordinates of atoms
- Int a1, a2
- Unsigned common_index (3 bytes for the C index, 1 byte for the (A, B) index)
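One way to hold both indices in a single unsigned field is bit packing. The sketch below is an assumption about the layout (C index in the upper 3 bytes, (A, B) index in the low byte); the slides give only the byte widths, and the function names are hypothetical.

```python
AB_BITS = 8   # assumed: low byte holds the (A, B) index
C_BITS = 24   # assumed: upper three bytes hold the C index


def pack_common_index(c_index: int, ab_index: int) -> int:
    """Combine a C index and an (A, B) index into one 32-bit value."""
    if not 0 <= c_index < (1 << C_BITS):
        raise ValueError("c_index does not fit in 3 bytes")
    if not 0 <= ab_index < (1 << AB_BITS):
        raise ValueError("ab_index does not fit in 1 byte")
    return (c_index << AB_BITS) | ab_index


def unpack_common_index(common_index: int) -> tuple[int, int]:
    """Recover (c_index, ab_index) from a packed value."""
    return common_index >> AB_BITS, common_index & ((1 << AB_BITS) - 1)


# Looking up the coefficients for one node from the two shared vectors.
distinct_cs = [0.25, 2.0]
distinct_ab = [(1.0, 3.0), (4.0, 5.0)]
c_index, ab_index = unpack_common_index(pack_common_index(1, 0))
c = distinct_cs[c_index]
a, b = distinct_ab[ab_index]
```

Each node then carries 4 bytes of indices instead of 12 bytes of floats, at the cost of two extra vector lookups per pair.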

Generating the proposed Data Structure
1. Given q1 and q2, calculate C.
2. Insert (C, C_index) into the hash table (key: C, data: C_index).
3. Set Node.common_index = C_index (corresponding to C).
4. Repeat for all non-bonded pairs.
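The steps above can be sketched with a dictionary standing in for the hash table. The coefficient function is an assumption (a plain product of the two charges); the slides say only that C depends on q1 and q2, and all names here are illustrative.

```python
def build_c_table(charge_pairs, coeff=lambda q1, q2: q1 * q2):
    """Deduplicate C values; return (hash table, per-node C indices)."""
    c_to_index = {}    # key: C, data: C_index
    node_indices = []  # C_index stored for each non-bonded pair
    for q1, q2 in charge_pairs:
        c = coeff(q1, q2)
        # A new C gets the next free index; a repeated C reuses its index.
        c_index = c_to_index.setdefault(c, len(c_to_index))
        node_indices.append(c_index)
    return c_to_index, node_indices


table, indices = build_c_table([(1.0, 2.0), (2.0, 1.0), (0.5, 0.5), (1.0, 2.0)])
```

Keys are compared by exact floating-point equality here, so two C values that differ only by rounding would get separate entries.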

Hash Table to distinct C vector
Each hash table entry (C, C_index) is inverted to (C_index, C) and written into the vector of distinct Cs.
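A minimal sketch of that inversion, assuming the hash table maps each C to a dense index in the range 0 to n-1 (the function name is hypothetical):

```python
def hash_table_to_vector(c_to_index):
    """Turn {C: C_index} into a list where position C_index holds C."""
    distinct_cs = [0.0] * len(c_to_index)
    for c, c_index in c_to_index.items():
        distinct_cs[c_index] = c
    return distinct_cs


vector = hash_table_to_vector({2.0: 0, 0.25: 1})
```

After this step the hash table is no longer needed; the energy loops only read the vector.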

Result of new data structure
Molecule size: 2008
VanderList: 2,008,417
AB_Vander list: 136
C_Vanderlist: 21,651
Old data structure: 2,008,417 * 20 = 40,168,340 bytes
New data structure: 2,008,417 * 12 + 136 * 8 + 21,651 * 4 = 24,188,696 bytes
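The totals can be reproduced with a few lines of arithmetic. The per-entry sizes for the new layout are inferred rather than stated: a 12-byte node (two 4-byte ints plus a 4-byte common index), 8 bytes per (A, B) pair, and 4 bytes per C. They are used here because they reproduce the reported total exactly.

```python
PAIRS = 2_008_417     # non-bonded list length
DISTINCT_AB = 136     # distinct (A, B) pairs
DISTINCT_C = 21_651   # distinct C values

OLD_NODE_BYTES = 3 * 4 + 2 * 4  # floats A, B, C plus ints a1, a2
NEW_NODE_BYTES = 2 * 4 + 4      # assumed: ints a1, a2 plus common_index

old_bytes = PAIRS * OLD_NODE_BYTES
new_bytes = PAIRS * NEW_NODE_BYTES + DISTINCT_AB * 8 + DISTINCT_C * 4
saving = 1 - new_bytes / old_bytes

print(old_bytes, new_bytes, f"{saving:.1%}")
```

The shared vectors add under 100 KB, so nearly all of the saving comes from the smaller node.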

Cache Profiling (old vs new)
D1 misses
- old: 3,158,603,092 (3,152,690,177 rd + 5,912,915 wr)
- new: 2,872,958,414 (2,868,217,925 rd + 4,740,489 wr)
L2d misses
- old: 1,270,584,560 (1,266,933,599 rd + 3,650,961 wr)
- new: 503,167,419 (500,920,315 rd + 2,247,104 wr)
L2 misses
- old: 1,270,606,180 (1,266,955,219 rd + 3,650,961 wr)
- new: 503,188,994 (500,941,890 rd + 2,247,104 wr)
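To read these counts as relative improvements, the sketch below computes the percentage drop in misses at each level, taking the first figure of each pair as the old data structure and the second as the new one (an assumption based on the slide title).

```python
# (old, new) total miss counts copied from the profile above.
misses = {
    "D1": (3_158_603_092, 2_872_958_414),
    "L2d": (1_270_584_560, 503_167_419),
    "L2": (1_270_606_180, 503_188_994),
}


def reduction_percent(old: int, new: int) -> float:
    """Percentage by which the miss count dropped."""
    return 100.0 * (old - new) / old


reductions = {level: reduction_percent(old, new)
              for level, (old, new) in misses.items()}
for level, pct in reductions.items():
    print(f"{level}: {pct:.1f}% fewer misses")
```

The first-level data cache improves by roughly 9%, while second-level misses fall by about 60%.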

Bottleneck Functions
Functions: Eval energy, Eval Energy for step, Diff energy
- Iterate over list of bonds {O(N) elements}
- Iterate over list of angles {O(N) elements}
- Iterate over list of dihedrals {O(N) elements}
- Iterate over list of non-bonded pairs {O(N^2) elements}

Split Up code
Columns: Eval_Energy_for_step (%), Diff_Energy (%)
Rows: Non-bonded pairs, Dihedrals, Angles, Bonded
Bonded: 0, 0.00

Ongoing Work
- Multiple threads operating on the non-bonded list together.
- Floating point precision requirement.
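As a rough illustration of the first item, the sketch below splits a non-bonded list into contiguous chunks and sums a per-pair energy term in a thread pool. The function names and the per-pair callback are hypothetical, not the project's actual design. In CPython this shows the partitioning, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_items: int, n_chunks: int):
    """Split range(n_items) into n_chunks contiguous, nearly equal slices."""
    base, extra = divmod(n_items, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_energy(pairs, pair_energy, n_threads: int = 4):
    """Sum pair_energy over pairs, one partial sum per thread."""
    def partial(bounds):
        start, stop = bounds
        return sum(pair_energy(p) for p in pairs[start:stop])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(partial, chunk_bounds(len(pairs), n_threads)))


total = parallel_energy(list(range(100)), lambda p: p * 2)
```

With floating-point terms, summing per chunk changes the order of additions, so the result can differ from a serial sum in the last bits. That ties the threading question to the precision requirement in the second item.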

Tentative Schedule
- Software Profiling (August)
  - No. of calls
  - Cache misses
  - Effect of parameters
- Control Flow Analysis (August - September)
  - Flow Diagram
  - Data parallelism
  - Floating point precision requirement
- Exploring H/W Options (September - October)
  - Platform Selection
  - S/W H/W Partitioning
- Implementation (October onwards)
  - Analysis