Parallel Computing in Computational Chemistry

Why? What happens at the molecular level?

Computational chemistry is a branch of chemistry that uses computer simulation to assist in solving chemical problems. It uses methods of theoretical chemistry, incorporated into efficient computer programs, to calculate the structures and properties of molecules.

J. Chem. Phys. 27, 1208 (1957); doi: 10.1063/1.1743957. An early molecular dynamics (MD) simulation: 96 particles, periodic boundary conditions (PBC), run on an IBM 704.

IBM 704: the first mass-produced computer with floating-point arithmetic hardware, introduced by IBM in 1954. A 36-bit machine, it executed up to about 12,000 floating-point operations per second (on the order of kFLOPS). Today: petaFLOPS, 10^15 (a quadrillion, i.e. a thousand trillion, calculations per second). Future: exaFLOPS, 10^18 (a billion billion calculations per second).

1960: Vineyard group, MD simulation of radiation damage in a Cu crystal
1964: Rahman, MD simulation of liquid Ar
1969: Barker and Watts, Monte Carlo simulation of water
1971: Rahman and Stillinger, MD simulation of water
Hardware milestones: Cray-1 (1976), Cray T3E (1995)

Typical CPU clock speeds by year (Ref: www.maximumpc.com):
1985: 33 MHz
1989: 100 MHz
1993: 233 MHz
1996: 385 MHz
1997: 450 MHz
1999: 570 MHz
2000-2006: roughly 1.4 to 3.2 GHz

PARALLEL COMPUTING is a type of computation in which many calculations are carried out simultaneously, based on the principle that large problems can often be divided into smaller ones, which are then solved at the same time.

Instructions:
1. Clean the windows
2. Clean the door
3. Clean the roof
4. Clean the table

Diagram: serial computing (one problem, a single stream of instructions, one processor) versus parallel computing (the problem split into parts, each with its own instruction stream running on its own processor).
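As a minimal illustration of this idea, the following C sketch (not from the slides; the array size and contents are arbitrary placeholders) splits a summation across OpenMP threads: each thread handles part of the data and the partial results are combined at the end.

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double x[N];
    for (int i = 0; i < N; i++)            /* fill with some data */
        x[i] = 0.001 * i;

    double sum = 0.0;
    /* Each thread sums its own chunk of the array; OpenMP combines the partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum = %f (using up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}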

Required conditions for parallel processing:
1. Suitable hardware
2. A problem that can be parallelized
3. A suitable algorithm

HARDWARE: parallel hardware architectures. Diagrams: shared memory (several CPUs attached to one common memory) versus distributed memory (each CPU with its own memory, connected over a network), plus a CPU block diagram (control unit, arithmetic/logic unit, memory, input/output).
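A hedged sketch (not from the slides) of how the two architectures are typically programmed in C: OpenMP threads share one address space, while MPI processes each hold their own memory and communicate over the network. It assumes an MPI installation and is built with an MPI compiler wrapper plus an OpenMP flag (for example mpicc -fopenmp).

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                /* distributed memory: one process per node or core */
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel                    /* shared memory: threads inside each process */
    {
        printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}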

HARDWARE: computational units (CPU). CPU: central processing unit; it performs the basic arithmetic, logic, control, and input/output operations. Single-core, dual-core, and quad-core CPUs.

HARDWARE: computational units (GPU). GPU: graphics processing unit. Diagram comparing the CPU and GPU architectures.

HARDWARE: computational units (GPU). Example: molecular dynamics simulation of a protein insertion process; NCSA Lincoln cluster performance with 8 Intel cores and 2 NVIDIA Tesla GPUs per node on a 1-million-atom system. Ref: www.ks.uiuc.edu/Research/namd

HARDWARE: computational units (GPU). GPUs have a fundamentally different architecture: an application has to be programmed specifically for the GPU, using different techniques. GPU constraints: new programming languages and a new programming paradigm are needed (a small offload example follows the list below). GPU-accelerated packages include:
NAMD (www.ks.uiuc.edu/Research/namd)
LAMMPS (lammps.sandia.gov)
Gromacs (www.gromacs.org)
DL_POLY 4 (www.stfc.ac.uk//research/app/ccg/software/DL_POLY/44516.aspx)
GAMESS 2012, closed-shell MP2 and closed-shell CCSD(T) energies (www.msg.ameslab.gov/gamess)
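To give a flavour of what "a different paradigm" means, here is a minimal C sketch using OpenMP target offload (an assumption: a compiler with OpenMP 4.5+ GPU offload support; CUDA or OpenCL would express the same idea in their own syntax). The point is that data must be moved to and from the device explicitly and the loop is mapped onto many lightweight GPU threads.

#include <stdio.h>

#define N 1000000

int main(void) {
    static float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Copy a and b to the GPU, run the loop there, copy c back to the host. */
    #pragma omp target teams distribute parallel for \
            map(to: a[0:N], b[0:N]) map(from: c[0:N])
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[0] = %f\n", c[0]);
    return 0;
}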

Required conditions for parallel processing:
1. Suitable hardware
2. A problem that can be parallelized
3. A suitable algorithm

The problem can be parallelized

Not every problem can. In the Fortran loop below each iteration depends on the result of the previous one (x(i) uses x(i-1)), so the iterations must run one after another:

      x(1) = 100.
      DO 10 i = 2, 1000
         x(i) = sin(x(i-1))
10    CONTINUE

i=2: x(2) = sin(x(1))
i=3: x(3) = sin(x(2))
i=4: x(4) = sin(x(3))
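By contrast, when the iterations are independent the loop parallelizes directly. A small C/OpenMP sketch (an illustration, not from the slides; the array and the function applied to it are arbitrary):

#include <math.h>
#include <stdio.h>

#define N 1000

int main(void) {
    double x[N], y[N];
    for (int i = 0; i < N; i++) x[i] = (double)i;

    /* y[i] depends only on x[i], so the iterations can run in any order, on any thread. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = sin(x[i]);

    printf("y[10] = %f\n", y[10]);
    return 0;
}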

Matrix multiplication C = A * B is another example: every element C[i][j] can be computed independently of the others.

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) {
        C[i][j] = 0;
        for (k = 0; k < n; k++)
            C[i][j] += A[i][k] * B[k][j];
    }

Worked example from the slide: two small numeric matrices A and B are multiplied, showing that each entry of the product is an independent row-by-column dot product.
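A hedged sketch of how that independence is exploited in practice: the two outer loops are shared out over threads with OpenMP (illustrative C, not from the slides; the matrix size and contents are placeholders).

#include <stdio.h>

#define N 512

static double A[N][N], B[N][N], C[N][N];

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
        }

    /* Every C[i][j] is independent, so the (i, j) pairs are distributed over threads. */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }

    printf("C[0][0] = %f\n", C[0][0]);
    return 0;
}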

Required conditions for parallel processing:
1. Suitable hardware
2. A problem that can be parallelized
3. A suitable algorithm

Parallel algorithms in computational chemistry: quantum mechanics (QM)

Parallel algorithms in computational chemistry: QM. The self-consistent field (SCF) cycle:
1. Obtain an initial guess for the density matrix
2. Form the Fock matrix (dominated by two-electron integral evaluation)
3. Diagonalize the Fock matrix
4. Form a new density matrix
5. Iterate steps 2-4 to convergence
The slide also shows where the time goes: integral evaluation, Fock matrix formation, density formation, annihilation, and other steps (a toy parallel Fock-build sketch follows below).
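The expensive step is the two-electron part of the Fock matrix, and its contributions are independent, which is what parallel QM codes exploit. A toy, heavily simplified C/OpenMP sketch (not an actual quantum chemistry code; eri() is a hypothetical stand-in for a real two-electron integral routine, and no integral symmetry or screening is used):

#include <stdio.h>

#define NBF 16   /* number of basis functions (placeholder size) */

static double D[NBF][NBF], F[NBF][NBF], h[NBF][NBF];

/* Hypothetical stand-in for the two-electron repulsion integral (ij|kl). */
static double eri(int i, int j, int k, int l) {
    return 1.0 / (1.0 + i + j + k + l);
}

int main(void) {
    /* Fake core Hamiltonian and density matrix, just so the sketch runs. */
    for (int i = 0; i < NBF; i++)
        for (int j = 0; j < NBF; j++) {
            h[i][j] = (i == j) ? -1.0 : 0.0;
            D[i][j] = (i == j) ?  1.0 : 0.0;
        }

    /* Closed-shell Fock matrix: F_ij = h_ij + sum_kl D_kl [ 2 (ij|kl) - (ik|jl) ].
       Each F_ij is independent, so the (i, j) pairs are distributed over threads. */
    #pragma omp parallel for collapse(2) schedule(dynamic)
    for (int i = 0; i < NBF; i++)
        for (int j = 0; j < NBF; j++) {
            double fij = h[i][j];
            for (int k = 0; k < NBF; k++)
                for (int l = 0; l < NBF; l++)
                    fij += D[k][l] * (2.0 * eri(i, j, k, l) - eri(i, k, j, l));
            F[i][j] = fij;
        }

    printf("F[0][0] = %f\n", F[0][0]);
    return 0;
}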

Parallel algorithms in computational chemistry: QM. (Slide: parallel performance figures; Ref: DOI: 10.1039/c002859b)

Parallelization Strategies in MD. Molecular dynamics (MD) is a computer simulation technique in which the time evolution of a set of interacting atoms is followed by integrating their equations of motion.

Parallelization Strategies in MD. The basic MD loop: initialize; force calculation (pair forces and other forces); motion (integrate the equations of motion); analysis; repeat, then summarize. A minimal skeleton of this loop is sketched below.
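A minimal serial skeleton of that loop in C (an illustration only; compute_forces() is a hypothetical toy routine, and a real MD code adds neighbor lists, thermostats, trajectory output, and so on). The force calculation is where essentially all of the time goes, so it is the part that the strategies on the following slides parallelize.

#include <stdio.h>

#define NATOMS 64
#define NSTEPS 1000
#define DT     0.001

static double x[NATOMS], v[NATOMS], f[NATOMS], m[NATOMS];

/* Hypothetical force routine: in a real code this evaluates bonded and
   non-bonded interactions and dominates the run time. */
static void compute_forces(void) {
    for (int i = 0; i < NATOMS; i++)
        f[i] = -x[i];                       /* toy harmonic force, 1D for brevity */
}

int main(void) {
    for (int i = 0; i < NATOMS; i++) { x[i] = 0.1 * i; v[i] = 0.0; m[i] = 1.0; }

    compute_forces();
    for (int step = 0; step < NSTEPS; step++) {
        /* velocity Verlet: half kick, drift, new forces, half kick */
        for (int i = 0; i < NATOMS; i++) v[i] += 0.5 * DT * f[i] / m[i];
        for (int i = 0; i < NATOMS; i++) x[i] += DT * v[i];
        compute_forces();
        for (int i = 0; i < NATOMS; i++) v[i] += 0.5 * DT * f[i] / m[i];
        /* analysis (energies, trajectory output) would go here */
    }
    printf("x[0] after %d steps: %f\n", NSTEPS, x[0]);
    return 0;
}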

Parallelization Strategies in MD: Replicated Data. (Slide: schematic of the replicated-data approach, in which every processor holds a complete copy of all atomic coordinates and computes part of the forces. Ref: Rom. J. Biochem., 46, 2, 129-148 (2009))

Parallelization Strategies in MD: Replicated Data.
Advantages: simplicity; this is a relatively easy parallel strategy to implement, requiring only minor changes to the scalar code.
Disadvantages: memory usage is high (the data are duplicated on every processor), and the communication costs are quite high (see the sketch below).
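A hedged MPI sketch of the replicated-data idea (illustrative only, with a toy pairwise force in one dimension): every rank stores all coordinates, computes the forces for its subset of atoms, and a global sum then rebuilds the full force array on every rank. That system-sized global communication on every step is exactly the cost mentioned above.

#include <stdio.h>
#include <mpi.h>

#define NATOMS 1024

static double x[NATOMS], f_local[NATOMS], f[NATOMS];

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < NATOMS; i++) { x[i] = 0.01 * i; f_local[i] = 0.0; }

    /* Each rank handles a subset of atoms i, but needs the coordinates of ALL atoms. */
    for (int i = rank; i < NATOMS; i += size)
        for (int j = 0; j < NATOMS; j++)
            if (j != i)
                f_local[i] += x[j] - x[i];   /* toy pair force */

    /* Global sum of a force array of size NATOMS on every step: the dominant communication cost. */
    MPI_Allreduce(f_local, f, NATOMS, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) printf("f[0] = %f\n", f[0]);
    MPI_Finalize();
    return 0;
}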

Parallelization Strategies in MD: Force Decomposition.
Properties (for N atoms on P processors):
Communication operations scale as N/sqrt(P) rather than N.
The memory cost for the position and force vectors is reduced by a factor of sqrt(P).
It retains much of the simplicity of the replicated-data (RD) technique.
Ref: DOI: 10.1007/1-4020-2670-5_15

Parallelization Strategies in MD: Spatial Decomposition. Each processor owns a region of the simulation box and only needs to exchange atoms that lie within the cutoff distance r_cut of its boundaries.
Properties:
The communication costs can be minimized.
It needs more sophisticated programming (a sketch of the cell bookkeeping follows below).
Ref: DOI: 10.1002/(SICI)1096-987X(199703)18:4<478::AID-JCC3>3.0.CO;2-Q
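A hedged sketch of the bookkeeping behind spatial decomposition (illustrative C, not from the slides; the coordinates, box size, and cutoff are placeholders): the box is divided into cells at least r_cut wide and atoms are binned into them, so interactions only need to be searched in a cell and its neighbors, which keeps each domain's communication local.

#include <stdio.h>
#include <math.h>

#define NATOMS 500
#define BOX    10.0
#define RCUT   1.5

int main(void) {
    double x[NATOMS];
    for (int i = 0; i < NATOMS; i++)
        x[i] = fmod(0.37 * i, BOX);          /* toy 1D coordinates */

    int ncell = (int)(BOX / RCUT);           /* cells at least RCUT wide */
    double cellw = BOX / ncell;

    int count[64] = {0};                     /* plenty for this toy box */
    for (int i = 0; i < NATOMS; i++) {
        int c = (int)(x[i] / cellw);         /* bin atom i into its cell */
        if (c >= ncell) c = ncell - 1;       /* guard against rounding at the edge */
        count[c]++;
    }

    /* In spatial decomposition each processor owns a group of cells and only
       exchanges atoms near its boundaries (within RCUT) with neighboring processors. */
    for (int c = 0; c < ncell; c++)
        printf("cell %d: %d atoms\n", c, count[c]);
    return 0;
}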

How to run a parallel program efficiently? Two common situations:
1. Many independent programs, each run serially, occupy many CPUs at once (embarrassingly parallel).
2. One problem is divided into parts and each part runs on its own CPU.
In the second case the efficiency depends on load balancing, the ratio of communication cost to computation cost, the number of CPUs, the amount of memory, and the chosen algorithm.
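A standard way to quantify these trade-offs (a textbook result, not from the slides) is Amdahl's law: if a fraction f of the work can be parallelized over P processors and the rest stays serial, the speedup is bounded by

S(P) = \frac{1}{(1 - f) + f/P} \le \frac{1}{1 - f}

For example, with f = 0.9 even an unlimited number of processors gives at most a tenfold speedup, which is why load balance and communication overhead matter as much as the raw CPU count.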


How to run a parallel program efficiently? Benchmark example: hexanitroethane (C2N6O12), B3LYP/6-31G(df,pd), single-point energy calculation.


THANKS!