Setup distribution of N particles

Presentation transcript:

1. Set up distribution of N particles
2. Compute forces between particles
3. Evolve positions using ODE solver
4. Display/analyze results

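Leap-frog scheme. Define positions (x) and forces (F) at time level n, and velocities (v) at time level n+1/2. Then, for the ith particle:

$$x_i^{n+1} = x_i^{n} + v_i^{n+1/2}\,\Delta t$$

$$v_i^{n+1/2} = v_i^{n-1/2} + \frac{F_i^{n}}{m_i}\,\Delta t$$

[Timeline figure: x and F are defined at time levels n-1, n, n+1; v is defined at levels n-1/2 and n+1/2.]

To start the integration, initial x and v are needed at two separate time levels. Specify x0 and v0, then integrate v to Δt/2 using a high-order scheme.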
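The update above can be sketched in a few lines of Python. This is a minimal illustration for one particle in 1D, not the lecturer's code: the function name `leapfrog`, the `accel` callback (returning F/m), and the simple half-step kick used for start-up are all assumptions. The slides ask for a high-order scheme for the start-up step; the half-step kick is only the simplest stand-in.

```python
import math


def leapfrog(x0, v0, accel, dt, n_steps):
    """Integrate one particle in 1D with the staggered leap-frog scheme.

    accel(x) is a hypothetical callback returning F(x)/m.
    Positions live at integer time levels, velocities at half-integer levels.
    """
    x = x0
    # Start-up (assumption): advance v from t = 0 to t = dt/2 with a half-step kick.
    v_half = v0 + 0.5 * dt * accel(x)
    xs = [x]
    for _ in range(n_steps):
        x = x + dt * v_half               # x^{n+1} = x^n + v^{n+1/2} dt
        v_half = v_half + dt * accel(x)   # v^{n+3/2} = v^{n+1/2} + (F^{n+1}/m) dt
        xs.append(x)
    return xs


# Usage: harmonic oscillator x'' = -x, integrated to t = 6.28 (about one period).
xs = leapfrog(1.0, 0.0, lambda x: -x, 0.01, 628)
print(xs[-1], math.cos(6.28))
```

Only the current position and the current half-step velocity are stored, which is why the scheme needs no extra time levels in memory.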

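Accuracy of leap-frog scheme. One can show that the truncation error in leap-frog is second order in Δt. Evolution equations:

$$x^{n+1} = x^{n} + v^{n+1/2}\,\Delta t, \qquad v^{n+1/2} = v^{n-1/2} + \frac{F^{n}}{m}\,\Delta t$$

Replace $v^{n-1/2}$ in the second equation using the first (written one level earlier, $v^{n-1/2} = (x^{n} - x^{n-1})/\Delta t$):

$$v^{n+1/2} = \frac{x^{n} - x^{n-1}}{\Delta t} + \frac{F^{n}}{m}\,\Delta t$$

Substitute this back into the first equation:

$$x^{n+1} = x^{n} + \left(x^{n} - x^{n-1}\right) + \frac{F^{n}}{m}\,\Delta t^{2}$$

Rearrange:

$$\frac{x^{n+1} - 2x^{n} + x^{n-1}}{\Delta t^{2}} = \frac{F^{n}}{m}$$

This is the central difference formula for F = ma.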

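Let X(t) be the "true" (analytic) solution. Inserting X into the central difference formula leaves a residual ε, the truncation error (Eq. 1):

$$\frac{X^{n+1} - 2X^{n} + X^{n-1}}{\Delta t^{2}} = \frac{F^{n}}{m} + \epsilon$$

Use a Taylor expansion about time level n to compute $X^{n+1}$ and $X^{n-1}$:

$$X^{n\pm 1} = X^{n} \pm \Delta t\,\dot{X} + \frac{\Delta t^{2}}{2}\ddot{X} \pm \frac{\Delta t^{3}}{6}\dddot{X} + \frac{\Delta t^{4}}{24}\ddddot{X} + O(\Delta t^{5})$$

Thus

$$X^{n+1} - 2X^{n} + X^{n-1} = \Delta t^{2}\,\ddot{X} + \frac{\Delta t^{4}}{12}\ddddot{X} + O(\Delta t^{6})$$

Substitute these back into Eq. 1 and use $\ddot{X} = F/m$:

$$\epsilon = \frac{\Delta t^{2}}{12}\ddddot{X} + O(\Delta t^{4})$$

The truncation error is O(Δt²).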
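The second-order claim can be checked numerically. The sketch below is an illustration under assumptions: it uses the harmonic oscillator x'' = -x with x(0) = 1, v(0) = 0 as the test problem, a half-step kick for start-up, and a hypothetical helper name `leapfrog_error`. Halving Δt should cut the error by roughly a factor of four.

```python
import math


def leapfrog_error(dt, t_end=1.0):
    """Error of leap-frog at t_end for x'' = -x, x(0) = 1, v(0) = 0."""
    n_steps = round(t_end / dt)
    x = 1.0
    v_half = 0.5 * dt * (-x)      # half-step kick from v0 = 0 (start-up assumption)
    for _ in range(n_steps):
        x += dt * v_half          # drift with the half-step velocity
        v_half += dt * (-x)       # kick with the new acceleration
    return abs(x - math.cos(n_steps * dt))


errors = [leapfrog_error(dt) for dt in (0.1, 0.05, 0.025)]
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(errors)
print(ratios)   # each ratio should be close to 4 for a second-order scheme
```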

Truncation versus round-off error. Note that the error we have just derived is truncation error. It is the unavoidable result of approximating the solution to some order in x or t. It is completely unrelated to round-off error, which results from representing the continuous set of real numbers with a finite number of bits. Truncation error can be reduced by using a smaller step Δt or a higher-order algorithm. Round-off error can be reduced by using higher precision (64 bit rather than 32, etc.) and by ordering operations carefully. In general, truncation error is much larger than round-off error.
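A small experiment makes the round-off side of this concrete. The sketch below is an illustration, not part of the lecture: it emulates 32-bit arithmetic in pure Python by rounding through `struct`, and the helper names `to_float32` and `repeated_sum` are hypothetical. It sums 0.1 many times in 32-bit and 64-bit precision, then uses `math.fsum` as an example of ordering (compensating) the operations carefully.

```python
import math
import struct


def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def repeated_sum(term, count, single_precision=False):
    """Add `term` to a running total `count` times, optionally in 32-bit precision."""
    if single_precision:
        term = to_float32(term)
    total = 0.0
    for _ in range(count):
        total += term
        if single_precision:
            total = to_float32(total)   # round the running total after every add
    return total


count = 100_000
exact = count * 0.1
err32 = abs(repeated_sum(0.1, count, single_precision=True) - exact)
err64 = abs(repeated_sum(0.1, count) - exact)
err_fsum = abs(math.fsum([0.1] * count) - exact)
print(err32, err64, err_fsum)
```

None of these errors depends on a step size Δt, which is the sense in which round-off is unrelated to truncation error.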

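Stability of leap-frog scheme. Easiest to illustrate with an example. Suppose the force is given by a harmonic oscillator, that is:

$$F = -m\,\Omega^{2} x$$

Then the "true" (analytic) solution is

$$x(t) = x_0\, e^{i\Omega t}$$

Substitute the force law into the leap-frog finite difference equation for F = ma:

$$\frac{x^{n+1} - 2x^{n} + x^{n-1}}{\Delta t^{2}} = -\Omega^{2} x^{n}$$

Look for oscillatory solutions of the form $x = x_0 e^{i\omega t}$, giving $e^{i\omega\Delta t} - 2 + e^{-i\omega\Delta t} = -\Omega^{2}\Delta t^{2}$. The frequency that satisfies this, written ω', obeys

$$\sin\!\left(\frac{\omega' \Delta t}{2}\right) = \pm\frac{\Omega\,\Delta t}{2}$$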

This is good! Leap-frog (correctly) gets oscillatory solutions, but at a modified frequency ω'. Note this gives the correct solution (ω' → Ω) as Δt → 0. For ΩΔt > 2, the frequency becomes complex: the real part of ω' gives an oscillatory solution, and the imaginary part gives an exponentially growing (unstable) solution. So the stability limit is Δt < 2/Ω, or Δt < 2/[|dF/dx|/m]^{1/2} in general. The above is a simple example of a von Neumann stability analysis.
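The stability limit is easy to see numerically. The sketch below is an illustration under assumptions: Ω is set to 1 so that the step size equals ΩΔt, start-up uses a half-step kick, and `max_amplitude` is a hypothetical helper. One run sits just below the limit ΩΔt = 2 and one just above it.

```python
def max_amplitude(omega_dt, n_steps=200):
    """Largest |x| reached by leap-frog for x'' = -x (Omega = 1) with dt = omega_dt."""
    dt = omega_dt
    x = 1.0
    v_half = -0.5 * dt * x        # half-step kick from v0 = 0 (start-up assumption)
    peak = abs(x)
    for _ in range(n_steps):
        x += dt * v_half          # drift
        v_half -= dt * x          # kick with acceleration -x
        peak = max(peak, abs(x))
    return peak


stable = max_amplitude(1.9)     # Omega*dt < 2: amplitude stays bounded
unstable = max_amplitude(2.1)   # Omega*dt > 2: amplitude grows exponentially
print(stable, unstable)
```

Below the limit the solution is inaccurate (the frequency is badly shifted) but bounded; above it the amplitude grows without bound.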

Consistency of leap-frog scheme. Leap-frog is consistent in the sense that as Δt → 0, the difference equations converge to the differential equations. Leap-frog is also symplectic and time symmetric: the scheme has the same accuracy for negative Δt.

Efficiency of leap-frog scheme. Leap-frog is extremely efficient in terms of computational cost (only 12 flops per particle, excluding force evaluation). It is also extremely efficient in terms of memory storage (it does not require storing multiple time levels). All the work (and memory) is in the force evaluation: 10N flops per particle for direct summation, or 10N² flops per step in total. Updating all particle positions in one second on a 1 Gflop processor therefore requires N < 10^4. Extra efficiency can be gained by using different timesteps for each particle (more later).
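A minimal sketch of direct summation shows where the N² cost comes from: a double loop over all pairs. The function name `direct_accelerations`, the choice G = 1, and the optional softening length `eps` are assumptions for illustration and are not taken from the slides. The last lines redo the cost arithmetic quoted above.

```python
import math


def direct_accelerations(pos, mass, G=1.0, eps=0.0):
    """Gravitational acceleration on every particle by direct summation, O(N^2).

    pos is a list of [x, y, z]; mass is a list of masses.
    eps is an optional softening length (an assumption, not from the slides).
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                      # no self-force
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * mass[j] * dx[k] * inv_r3
    return acc


# Two unit masses one unit apart attract each other with unit acceleration.
acc = direct_accelerations([[0, 0, 0], [1, 0, 0]], [1.0, 1.0])

# Cost estimate from the text: 10 N^2 flops per step on a 1 Gflop processor.
flops_per_second = 1e9
max_particles = int(math.sqrt(flops_per_second / 10))
print(acc, max_particles)
```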

Variable time steps with leap-frog. For efficiency, we need to take variable time steps (evolve particles at the center of a cluster on a smaller timestep than particles at the edge). However, this destroys the symmetry of leap-frog and greatly increases the truncation error.

But variable-timestep leap-frog can be symmetrized (Hut, Makino, & McMillan 1995).

Force evaluation with variable time steps. Now particle positions are known at different time levels, which greatly complicates the force calculation. We must compute derivatives of the force with respect to time, and use a Taylor expansion to compute the total force on a particle at its current position. The Good: this allows higher-order (Hermite) integration methods. The Bad: this makes force evaluation even more expensive! The Ugly: direct N-body must be optimized if we are to go beyond 10^4 particles.

Solving the force problem with hardware. Special-purpose hardware to compute the force (Jun Makino, U. Tokyo).

GRAPE-6. The 6th generation of the GRAPE (Gravity Pipe) project. Gravity calculation at 31 Gflops/chip; 32 chips/board ⇒ 0.99 Tflops/board. The full system of 64 boards is installed at the University of Tokyo ⇒ 63 Tflops. On each board, all particle data are loaded into SRAM, each target particle's data is injected into the pipeline, and the acceleration is then calculated. Gordon Bell Prize at SC2000 and SC2001 (Prof. Makino, U. Tokyo); also nominated at SC2002.

Do we really need to compute the force from every star for distant objects? [Image: Andromeda, 2 million light years away.]

Solving the force problem with software: tree codes. [Figure label: Distance = 25 times size.]

Organize particles into a tree. In the Barnes-Hut algorithm, use a quadtree in 2D.

In 3D, Barnes-Hut uses an octree

If the angle subtended by the particles contained in any node of the tree is smaller than some criterion, then treat all particles in that node as one. This results in an N log(N) algorithm.
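The tree build and the opening criterion can be sketched in 2D as follows. This is a minimal illustration, not the Barnes-Hut reference implementation: the names `Node`, `build_tree`, `accel`, and the opening parameter `theta` are assumptions, the "angle" is approximated as cell size divided by distance to the cell's centre of mass, and bodies are assumed to have positive masses and distinct positions inside the root cell.

```python
import math


class Node:
    """Square quadtree cell with centre (cx, cy) and side length size."""

    def __init__(self, cx, cy, size):
        self.cx, self.cy, self.size = cx, cy, size
        self.mass = 0.0
        self.mx = self.my = 0.0   # mass-weighted position sums
        self.body = None          # (x, y, m) when this leaf holds exactly one body
        self.children = None      # four child cells once subdivided

    def _child_for(self, x, y):
        return self.children[(x >= self.cx) + 2 * (y >= self.cy)]

    def insert(self, x, y, m):
        if self.body is None and self.children is None:
            self.body = (x, y, m)                 # empty leaf: store the body here
        else:
            if self.children is None:             # occupied leaf: split into four
                q = self.size / 4
                self.children = [Node(self.cx + sx * q, self.cy + sy * q, self.size / 2)
                                 for sy in (-1, 1) for sx in (-1, 1)]
                old, self.body = self.body, None
                self._child_for(old[0], old[1]).insert(*old)
            self._child_for(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y


def build_tree(bodies, cx, cy, size):
    root = Node(cx, cy, size)
    for x, y, m in bodies:
        root.insert(x, y, m)
    return root


def accel(node, x, y, theta=0.5, G=1.0):
    """Acceleration at (x, y) due to the bodies in node, using the opening criterion."""
    if node.children is None:
        if node.body is None:
            return (0.0, 0.0)                     # empty cell
        px, py, m = node.body
    else:
        m = node.mass
        px, py = node.mx / m, node.my / m         # centre of mass of the cell
    dx, dy = px - x, py - y
    d = math.hypot(dx, dy)
    if node.children is None or (d > 0.0 and node.size / d < theta):
        if d == 0.0:
            return (0.0, 0.0)                     # the body itself: no self-force
        f = G * m / d ** 3
        return (f * dx, f * dy)                   # treat the whole cell as one particle
    ax = ay = 0.0
    for child in node.children:                   # cell too close: open it
        cax, cay = accel(child, x, y, theta, G)
        ax += cax
        ay += cay
    return (ax, ay)
```

With `theta=0` every cell is opened and the result equals direct summation; larger `theta` trades accuracy for fewer interactions.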

An alternative to Barnes-Hut is the KD tree. The KD tree is binary and extremely efficient. It requires N to be a power of 2, and the number of nodes is N_nodes = 2N-1.

Parallelizing tree code. The best strategy is to distribute particles across processors. That way, the work of computing forces and integrating is distributed across processors. The challenge is load balancing: equal particles ≠ equal work. Solution: assign costs to particles based on the work they do. Problem: the work is unknown and changes with time-steps. Insight: the system evolves slowly. Solution: count the work per particle, and use it as the cost for the next time-step.
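One way to turn measured per-particle costs into a processor assignment is to cut an ordered particle list into contiguous chunks of roughly equal total cost. The sketch below is an illustration under assumptions: `partition_by_cost` is a hypothetical helper, the particles are assumed to be already ordered (for example by tree or space-filling-curve order), and `costs[i]` is assumed to be the work counted for particle i during the previous time-step.

```python
def partition_by_cost(costs, n_procs):
    """Split an ordered list of per-particle costs into n_procs contiguous chunks.

    Returns a list of (start, end) index pairs, end exclusive, chosen so that
    each chunk carries roughly total_cost / n_procs.
    """
    target = sum(costs) / n_procs
    bounds = []
    running = 0.0
    for i, cost in enumerate(costs):
        running += cost
        # Close a chunk each time the running cost passes the next multiple of target.
        while len(bounds) < n_procs - 1 and running >= (len(bounds) + 1) * target:
            bounds.append(i + 1)
    while len(bounds) < n_procs:
        bounds.append(len(costs))     # the last chunk takes whatever remains
    starts = [0] + bounds[:-1]
    return list(zip(starts, bounds))


# One expensive particle (e.g. in a dense cluster core) and four cheap ones.
chunks = partition_by_cost([4, 1, 1, 1, 1], 2)
print(chunks)
```

In this example the first processor gets one particle and the second gets four, yet both carry the same total cost.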

A Partitioning Approach: ORB. Orthogonal Recursive Bisection: recursively bisect space into subspaces with equal work. Work is associated with bodies, as before. Continue until there is one partition per processor. High overhead for a large number of processors.
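The recursion can be sketched as follows in 2D. This is an illustration under assumptions, not a production partitioner: `orb` is a hypothetical function, bodies are `(x, y, cost)` tuples, the number of partitions is assumed to be a power of two, and the cut is placed perpendicular to the longer extent of the current region (other axis choices are possible).

```python
def orb(bodies, n_parts):
    """Orthogonal recursive bisection of (x, y, cost) bodies into n_parts groups.

    n_parts is assumed to be a power of two. Each cut splits the current
    group into two halves of roughly equal total cost.
    """
    if n_parts == 1 or len(bodies) <= 1:
        return [bodies] + [[] for _ in range(n_parts - 1)]
    xs = [b[0] for b in bodies]
    ys = [b[1] for b in bodies]
    # Cut perpendicular to the longer extent of the region (an assumption).
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(bodies, key=lambda b: b[axis])
    half = sum(b[2] for b in ordered) / 2
    running = 0.0
    split = len(ordered)
    for i, body in enumerate(ordered):
        running += body[2]
        if running >= half:           # weighted median reached
            split = i + 1
            break
    return orb(ordered[:split], n_parts // 2) + orb(ordered[split:], n_parts // 2)


# Eight unit-cost bodies on a 4 x 2 grid, divided among four processors.
grid = [(float(i), float(j), 1.0) for i in range(4) for j in range(2)]
parts = orb(grid, 4)
print(parts)
```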

Another Approach: Costzones. Insight: the tree already contains an encoding of spatial locality. Costzones is low-overhead and very easy to program.

Space-filling curves: Morton order and Peano-Hilbert order.
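Morton order is the simpler of the two to compute: interleave the bits of the integer cell coordinates and sort by the resulting key. The sketch below is a 2D illustration; the function name `morton_key` and the 16-bit coordinate width are assumptions. Peano-Hilbert order needs an extra rotation step per level and is not shown.

```python
def morton_key(ix, iy, bits=16):
    """Interleave the bits of integer cell coordinates (ix, iy) into one key.

    Bit b of ix goes to bit 2b of the key, bit b of iy to bit 2b + 1.
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


# Sorting the cells of a 4 x 4 grid by key traces out the Z-shaped Morton curve.
cells = [(ix, iy) for iy in range(4) for ix in range(4)]
ordered = sorted(cells, key=lambda c: morton_key(c[0], c[1]))
print(ordered)
```

Cells that are close along the curve tend to be close in space, so cutting the sorted list into contiguous pieces gives each processor a spatially compact set of particles.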