N-Body Simulation Michael Mersic CS680.

What is N-Body Simulation? Simulating the interaction of some number N of objects in a system. A physics interpretation is the movement of stars under the influence of gravity in a galaxy. An introduction to N-Body Simulation that I relied on extensively is available here: http://www.artcompsci.org/msa/web/index.html

How is N-Body Simulation done? A simulation starts with the bodies at some initial position and with some initial velocity (for this project, these are randomly generated). Then, for each time step, the acceleration of each body is calculated from the gravitational influence of every other body. Velocity is updated based on acceleration, and position is updated based on velocity.

Acceleration Calculation The slow part of N-Body simulation is the acceleration calculation. Each body is under the influence of gravity from every other body in the system. In the serial version, this is an n(n-1)/2 calculation, since the acceleration from i to j and from j to i can be calculated in the same loop iteration:

    double rjix, rjiy, rjiz;
    rjix = px[j] - px[i];
    rjiy = py[j] - py[i];
    rjiz = pz[j] - pz[i];
    double r2 = rjix*rjix + rjiy*rjiy + rjiz*rjiz;
    double r3 = r2*sqrt(r2);
    ax[i] += m[j] * rjix / r3;
    ay[i] += m[j] * rjiy / r3;
    az[i] += m[j] * rjiz / r3;
    ax[j] -= m[i] * rjix / r3;
    ay[j] -= m[i] * rjiy / r3;
    az[j] -= m[i] * rjiz / r3;

N-Body Simulation Position and Velocity update The simplest update step is the Forward Euler algorithm: r_{i+1} = r_i + v_i dt and v_{i+1} = v_i + a_i dt. While these equations are simple to implement, the method is not very accurate: for the entire time step dt a body moves in the v_i direction, which is only correct at time i.
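The Forward Euler update might be sketched in C as follows. The function name euler_step and the velocity arrays vx, vy, vz are hypothetical names chosen to match the px/ax naming used in the acceleration code; the accelerations are assumed to have been computed already for the current positions.

```c
#include <assert.h>

/* One Forward Euler step for n bodies:
     r_{i+1} = r_i + v_i * dt
     v_{i+1} = v_i + a_i * dt
   Positions are advanced with the velocity from the start of the step. */
void euler_step(int n, double dt,
                double *px, double *py, double *pz,
                double *vx, double *vy, double *vz,
                const double *ax, const double *ay, const double *az)
{
    assert(n >= 0 && dt > 0.0);
    for (int i = 0; i < n; i++) {
        /* Position first, so it uses the old velocity v_i. */
        px[i] += vx[i] * dt;
        py[i] += vy[i] * dt;
        pz[i] += vz[i] * dt;

        /* Then velocity, using the acceleration a_i at the old position. */
        vx[i] += ax[i] * dt;
        vy[i] += ay[i] * dt;
        vz[i] += az[i] * dt;
    }
}
```

Updating position before velocity matters here: swapping the two lines would advance the position with v_{i+1}, which is a different integrator.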

Leap-Frog Algorithm The problem with Forward Euler is that it is not very accurate. It is a first-order method: as dt is made 10 times smaller, the error shrinks only about 10 times. Better methods achieve better accuracy for the same step size. The Leap-Frog algorithm is second order, so we expect the error to shrink about 100 times as dt is made 10 times smaller.

Leap-Frog Algorithm Positions are defined at integer time steps and velocities at integer + ½ time steps. Written in terms of integer steps only, the velocity is updated by (a_i + a_{i+1}) / 2, which approximates the value of a halfway between time steps i and i + 1.
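A small sketch comparing the two integrators on a test problem. To keep it short, a one-dimensional acceleration a(x) = -x stands in for gravity (an assumption made here, not something from the slides); its exact solution x(t) = cos(t) makes the error easy to measure. The names accel, euler_error and leapfrog_error are hypothetical. The leap-frog loop is written as half velocity update, position update, new acceleration, half velocity update, which is equivalent to updating velocity with (a_i + a_{i+1}) / 2.

```c
#include <assert.h>
#include <math.h>

/* Test acceleration: a(x) = -x. With x(0) = 1, v(0) = 0 the exact
   solution is x(t) = cos(t). */
static double accel(double x) { return -x; }

/* Error in x at t = 1 after `steps` Forward Euler steps. */
double euler_error(int steps)
{
    assert(steps > 0);
    double dt = 1.0 / steps, x = 1.0, v = 0.0;
    for (int i = 0; i < steps; i++) {
        double a = accel(x);
        x += v * dt;            /* uses the old velocity */
        v += a * dt;
    }
    return fabs(x - cos(1.0));
}

/* Error in x at t = 1 after `steps` leap-frog steps. */
double leapfrog_error(int steps)
{
    assert(steps > 0);
    double dt = 1.0 / steps, x = 1.0, v = 0.0;
    double a = accel(x);        /* initialization: a at time step 0 */
    for (int i = 0; i < steps; i++) {
        v += 0.5 * a * dt;      /* half update with the old acceleration */
        x += v * dt;            /* position update with the half-step velocity */
        a = accel(x);           /* acceleration at the new position */
        v += 0.5 * a * dt;      /* half update with the new acceleration */
    }
    return fabs(x - cos(1.0));
}
```

Calling each function with 100 and then 1000 steps and dividing the two errors gives a rough measure of the order: a ratio near 10 indicates first order, a ratio near 100 second order.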

How to Parallelize N-Body Position and velocity are given. An initialization step then sets a at time step 0 based on the given positions, masses, and gravity. Then for each time step t:
1. Update each body's velocity based on ½ its acceleration at t-1.
2. Update each body's position based on its velocity.
3. Update each body's acceleration based on the position and mass of every other body.
4. Update each body's velocity based on ½ its acceleration at time t.
As mentioned before, the acceleration update is an n(n-1)/2 operation. This is the part of the algorithm that needs to be parallelized. The idea is for each MPI process to hold N/p of the bodies in the N-Body system. In each time step, each process first updates its accelerations based on its local bodies. Each process then passes bodies around a ring, sending to the next process and receiving from the previous process. This is an n(n-1) algorithm, not n(n-1)/2 like the original: split over p processes, each performs about n(n-1)/p interactions, which is fewer than the serial n(n-1)/2 only when p > 2. Therefore, at least 3 processors are needed to achieve improved performance.

CUDA Parallelization The CUDA parallelization is straightforward. Assign one GPU per MPI process, then have k of the n body updates occur in parallel on k CUDA threads. Enough blocks are launched so that there is one CUDA thread for each of the n bodies.
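The launch arithmetic behind "one thread per body" can be sketched in plain C by emulating the grid with loops. This is not CUDA code and not the project's kernel; the names blocks_needed and covers_each_body_once are hypothetical, and the block size is left as a parameter because the slides do not state one.

```c
#include <assert.h>

/* Number of blocks so that blocks * threads_per_block >= n (ceiling division). */
int blocks_needed(int n, int threads_per_block)
{
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Emulate the launch serially. visits must have room for n ints.
   Returns 1 if every body index is handled by exactly one "thread". */
int covers_each_body_once(int n, int threads_per_block, int *visits)
{
    int blocks = blocks_needed(n, threads_per_block);

    for (int i = 0; i < n; i++)
        visits[i] = 0;

    for (int block = 0; block < blocks; block++) {
        for (int thread = 0; thread < threads_per_block; thread++) {
            /* Same index a kernel would form from its block and thread ids. */
            int i = block * threads_per_block + thread;
            if (i < n)          /* surplus threads in the last block do nothing */
                visits[i]++;
        }
    }

    for (int i = 0; i < n; i++)
        if (visits[i] != 1)
            return 0;
    return 1;
}
```

The bounds check is needed whenever n is not a multiple of the block size, since the last block then contains threads with no body to update.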

Correctness? Serial Version – To verify correctness I implemented the serial algorithm and used examples from http://www.artcompsci.org/msa/web/index.html to show that I seem to be getting reasonable results. The implementation may not be perfect; however, I believe it is correct enough to give accurate timings. Parallel Version – I consider the parallel version correct if it matches the output of the serial version for at least a few time steps. Because the floating-point math is done in a different order in the parallel version, numeric errors creep in and the answers diverge. Currently the MPI version matches the serial version to 6 decimal places at 20 time steps for a 9-body system with 3 processes, but it does differ at 100 time steps.

Note: Timings, Speedup and Performance graphs are based on a run of 10 steps in the simulation. For example, it took the serial version 1400 seconds to run 10 steps at size 64,000.

Timings

Speedup
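Assuming the usual definitions (the slides do not state them), speedup is the serial time divided by the parallel time, and parallel efficiency is the speedup divided by the number of processes. The helper names below are hypothetical. The pair count is included for scale: at size 64,000 the serial n(n-1)/2 loop evaluates 2,047,968,000 pairs per acceleration update.

```c
#include <assert.h>

/* Number of body pairs in the serial n(n-1)/2 acceleration loop. */
long long pair_count(long long n)
{
    assert(n >= 0);
    return n * (n - 1) / 2;
}

/* Speedup: serial run time divided by parallel run time. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency: speedup per process, 1.0 being ideal. */
double efficiency(double t_serial, double t_parallel, int p)
{
    assert(p > 0);
    return speedup(t_serial, t_parallel) / p;
}
```

For the timing quoted in the note, t_serial would be 1400 seconds for the size 64,000 run; the parallel times would come from the corresponding MPI or CUDA runs.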

Performance