Daniel Blackburn Load Balancing in Distributed N-Body Simulations.



N-Body Simulations
A simulation of a dynamic system of particles interacting through a distance-mediated force
For instance, a simulation of the stars within a cluster or galaxy under the force of gravity

Barnes-Hut Algorithm
N-Body simulations can be computed by direct summation
  - For each particle, calculate the interaction with every other particle
  - Running time is O(n^2)
There are many more efficient algorithms for N-Body simulations
The Barnes-Hut algorithm is based on treating groups of distant particles as a single entity
  - Running time is O(n log n)
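
The direct O(n^2) approach can be sketched in a few lines of Python. This is only an illustration: the function name `direct_accelerations`, the unit gravitational constant `G`, and the `softening` parameter are assumptions made for the example and do not come from the slides.

```python
import math

G = 1.0  # gravitational constant in simulation units (assumed for this sketch)

def direct_accelerations(positions, masses, softening=1e-3):
    """Acceleration on every particle by direct pairwise summation, O(n^2)."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            # vector from particle i to particle j
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            # softened squared distance avoids division by zero at close range
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc
```

Every particle performs the same n - 1 interactions, which is why this version needs no load balancing beyond giving each process an equal number of particles.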

Barnes-Hut Cont.
An oct-tree is constructed to contain the particles
  - Each node corresponds to a cubic segment of simulation space
  - The root contains the entire simulation space
  - The children of each node subdivide the node's space into 8 equally sized cubic segments
  - The nodes of any layer are non-overlapping
  - Each leaf holds at most 1 particle
  - Each non-leaf stores the total mass and center of gravity of all the particles stored beneath it
Forces are calculated between a particle and a tree node
  - Let L be the side length of the node and D be the distance between the node's center of gravity and the particle
  - If L/D < 1, treat the node as a single body and calculate the force it exerts on the particle
  - Otherwise, compute the interaction between the particle and each of the 8 children of the node
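
A minimal Python sketch of the tree and the L/D test described above follows. The names `Node`, `build_tree`, `acceleration`, the parameter `theta` (the threshold, 1 in the slides), the softening `eps`, and the choice G = 1 are all assumptions for illustration. The sketch also assumes particles have distinct positions and lie inside the root cube.

```python
import math

class Node:
    """One cubic cell of the oct-tree."""
    def __init__(self, center, size):
        self.center = center          # geometric center of the cube
        self.size = size              # side length L
        self.mass = 0.0               # total mass beneath this node
        self.com = [0.0, 0.0, 0.0]    # center of gravity of that mass
        self.children = None          # 8 children, or None for a leaf
        self.particle = None          # (position, mass) for an occupied leaf

    def _child_for(self, pos):
        idx = sum(1 << k for k in range(3) if pos[k] >= self.center[k])
        return self.children[idx]

    def _split(self):
        q = self.size / 4.0
        self.children = [
            Node([self.center[k] + (q if (i >> k) & 1 else -q) for k in range(3)],
                 self.size / 2.0)
            for i in range(8)
        ]

    def insert(self, pos, mass):
        if self.children is None and self.particle is None:
            self.particle = (pos, mass)            # empty leaf: store the particle
            self.mass, self.com = mass, list(pos)
            return
        if self.children is None:                  # occupied leaf: split it
            old_pos, old_mass = self.particle
            self.particle = None
            self._split()
            self._child_for(old_pos).insert(old_pos, old_mass)
        self._child_for(pos).insert(pos, mass)
        total = self.mass + mass                   # update the aggregate
        self.com = [(self.com[k] * self.mass + pos[k] * mass) / total for k in range(3)]
        self.mass = total

def build_tree(particles, center, size):
    root = Node(center, size)
    for pos, mass in particles:
        root.insert(pos, mass)
    return root

def acceleration(node, pos, theta=1.0, eps=0.0):
    """Acceleration at pos (with G = 1) due to all particles beneath node."""
    if node.mass == 0.0:
        return [0.0, 0.0, 0.0]                     # empty cell
    d = [node.com[k] - pos[k] for k in range(3)]
    dist = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
    is_leaf = node.children is None
    if is_leaf and dist == 0.0:
        return [0.0, 0.0, 0.0]                     # skip self-interaction
    if is_leaf or (dist > 0.0 and node.size / dist < theta):
        f = node.mass / (dist * dist + eps * eps) ** 1.5
        return [d[k] * f for k in range(3)]        # node treated as one body
    total = [0.0, 0.0, 0.0]
    for child in node.children:                    # otherwise open the node
        a = acceleration(child, pos, theta, eps)
        total = [total[k] + a[k] for k in range(3)]
    return total
```

With `theta=0.0` no internal node is ever accepted, so the traversal degenerates to the direct sum; larger values trade accuracy for fewer interactions.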

Interactions for one particle

Distributed Barnes-Hut
Naïve N-Body simulations do not require load balancing
  - Equal computation is required for every particle
But in Barnes-Hut
  - Particles in high-density areas require more computations than particles in low-density areas
  - Nearby particles are treated as individuals, but far away particles are calculated as groups
  - Particles move during the simulation, so a good partitioning at the start may become a poor partitioning by the end of the simulation

Data Shipping vs. Function Shipping
Each process constructs a local tree for the particles it controls
Interactions must also be computed with particles that reside on other processes
Two approaches, plus a hybrid
  - Data shipping: each process requests enough of every other process's tree to compute the interactions between its own particles and the remote particles
  - Function shipping: each process sends a list of its particles to every other process, which computes the interactions against its own tree
  - Hybrid: processes share some information about their trees, and use function shipping otherwise

Static Partitioning, Static Assignment
The simulation space is broken into k * N equally sized segments, where N is the number of processes
Each process is statically assigned k segments and is responsible for all particles within its segments
Particles may move from one process to another as they cross segment boundaries
Relies on each process's segments being spread across the simulation space to even out load imbalance
Gives up some locality to achieve balance
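
As a rough illustration, the sketch below maps a particle to its owning process. The slides do not say how segments are mapped to processes; this example assumes a unit-cube simulation space, a regular `grid` x `grid` x `grid` mesh of segments, and a cyclic (round-robin) mapping as one simple way to scatter each process's segments. The function names are hypothetical.

```python
def segment_index(pos, grid):
    """Map a position in the unit cube to a segment id on a grid^3 mesh."""
    # clamp so that a coordinate of exactly 1.0 falls in the last cell
    ix, iy, iz = (min(int(c * grid), grid - 1) for c in pos)
    return (iz * grid + iy) * grid + ix

def owner(pos, grid, num_procs):
    """Process that owns the segment containing pos (assumed cyclic mapping)."""
    return segment_index(pos, grid) % num_procs
```

With `grid=4` and `num_procs=8` there are 64 segments, so k = 8 segments per process. The mapping never changes, so a particle changes owner only when it crosses a segment boundary.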

Static Partitioning, Dynamic Assignment
The simulation space is broken into k * N equally sized segments
A load is calculated for each segment based on the number of calculations done in the last step
Each process is dynamically assigned a contiguous run of segments and is responsible for all particles within its segments
Uses a Morton ordering (Z-order curve) so that segments adjacent in the ordering are, as far as possible, physically adjacent
Improves locality and load balancing over static assignment, but at increased cost
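
One way this could look in Python: segments are sorted by a Morton key and the sorted list is cut into contiguous runs of roughly equal load. The function names, the `bits` parameter, and the midpoint rule used to place the cuts are assumptions for this sketch; the slides do not specify them. Loads are assumed to be non-negative with a positive total.

```python
def morton3(ix, iy, iz, bits=10):
    """Interleave the bits of three grid coordinates into one Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key

def assign_contiguous(loads, num_procs):
    """loads maps (ix, iy, iz) -> load measured during the last step.

    Returns a dict mapping each segment to a process so that every process
    owns one contiguous run of the Z-order curve.
    """
    order = sorted(loads, key=lambda seg: morton3(*seg))
    total = sum(loads.values()) or 1.0   # guard against an all-zero load
    assignment = {}
    running = 0.0
    for seg in order:
        # place the segment by the midpoint of its interval of cumulative load
        mid = running + loads[seg] / 2.0
        assignment[seg] = min(int(mid * num_procs / total), num_procs - 1)
        running += loads[seg]
    return assignment
```

Because the process index never decreases along the sorted order, each process receives a single contiguous stretch of the curve, and the stretch lengths shrink where the measured load is high.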

Dynamic Partitioning, Dynamic Assignment
Processes coordinate to construct a combined tree
Each node contains the load experienced during the last step
Each process does a walk through the tree, claiming nodes up to its share of the total load
When a process has claimed its share of the load, it signals the next process where it left off, and the next process begins its walk from that point
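
The walk can be simulated sequentially as in the sketch below. It is a simplification under stated assumptions: the class `LoadNode` and function `claim_leaves` are hypothetical, only leaves are claimed (the slides speak of claiming nodes in general), a single loop stands in for the hand-off between processes, and the total load is assumed positive.

```python
class LoadNode:
    """Tree node annotated with the load measured during the last step."""
    def __init__(self, load=0.0, children=()):
        self.children = list(children)
        # an internal node's load is the sum of its children's loads
        self.load = sum(c.load for c in self.children) if self.children else load
        self.owner = None

def claim_leaves(root, num_procs):
    """Assign leaves to processes in depth-first order; return owners in walk order."""
    share = root.load / num_procs      # each process's target load
    proc = 0                           # process currently walking the tree
    claimed = 0.0                      # load claimed so far by all processes
    owners = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            # push children reversed so they are visited left to right
            stack.extend(reversed(node.children))
            continue
        # hand the walk to the next process once the current share is filled
        while claimed >= (proc + 1) * share and proc < num_procs - 1:
            proc += 1
        node.owner = proc
        owners.append(proc)
        claimed += node.load
    return owners
```

Because leaves are visited in tree order, each process ends up with a set of leaves that are neighbours in the tree, which tends to keep its particles spatially close.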

K-Means Clustering
Simulation space is divided into N clusters using a K-Means clustering algorithm
Ensures that close particles are assigned to the same process regardless of where they are in simulation space
Centroids and cluster assignments are recomputed each step
The distance function for each centroid is scaled based on the load of the cluster during the last step
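
A single iteration of such a load-scaled K-Means step might look like the sketch below. The slides do not give the scaling rule, so the factor used here (a cluster's load divided by the mean load, applied to the squared distance) is purely an assumption, as are the function names. Loads are assumed positive.

```python
def dist2(a, b):
    """Squared Euclidean distance between two 3-D points."""
    return sum((a[k] - b[k]) ** 2 for k in range(3))

def assign(points, centroids, loads):
    """Assign each point to the centroid with the smallest load-scaled distance."""
    mean = sum(loads) / len(loads)
    # assumed scaling: overloaded clusters look farther away, so they shrink
    scale = [load / mean for load in loads]
    labels = []
    for p in points:
        best = min(range(len(centroids)),
                   key=lambda c: scale[c] * dist2(p, centroids[c]))
        labels.append(best)
    return labels

def update(points, labels, centroids):
    """Move each centroid to the mean of its members; keep empty clusters in place."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centroids.append(list(old))
            continue
        new_centroids.append(
            [sum(p[k] for p in members) / len(members) for k in range(3)]
        )
    return new_centroids
```

With equal loads this reduces to ordinary K-Means assignment; when one cluster reports a much higher load than the others, points near its boundary are handed to neighbouring clusters on the next step.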